educe.stac.sanity package
Subpackages
Submodules
educe.stac.sanity.common module
Functionality and report types common to sanity checker
-
class educe.stac.sanity.common.ContextItem(doc, contexts)
Bases: educe.stac.sanity.report.ReportItem
Report item involving EDU contexts
-
class educe.stac.sanity.common.RelationItem(doc, contexts, rel, naughty)
Bases: educe.stac.sanity.common.ContextItem
Errors which involve Glozz relation annotations
-
annotations()
-
html()
-
-
class educe.stac.sanity.common.SchemaItem(doc, contexts, schema, naughty)
Bases: educe.stac.sanity.common.ContextItem
Errors which involve Glozz schema annotations
-
annotations()
-
html()
-
-
class educe.stac.sanity.common.UnitItem(doc, contexts, unit)
Bases: educe.stac.sanity.common.ContextItem
Errors which involve Glozz unit-level annotations
-
annotations()
-
html()
-
-
educe.stac.sanity.common.anno_code(anno)
Short code giving a clue as to what the annotation is
-
educe.stac.sanity.common.is_default(anno)
True if the annotation has type ‘default’
-
educe.stac.sanity.common.is_glozz_relation(anno)
True if the annotation is a Glozz relation
-
educe.stac.sanity.common.is_glozz_schema(anno)
True if the annotation is a Glozz schema
-
educe.stac.sanity.common.is_glozz_unit(anno)
True if the annotation is a Glozz unit
-
educe.stac.sanity.common.rough_type(anno)
Return either
- “EDU”
- “relation”
- or the annotation type
-
educe.stac.sanity.common.search_for_glozz_relations(inputs, k, pred, endpoint_is_naughty=None)
Return a ReportItem for any Glozz relation that satisfies the given predicate. If endpoint_is_naughty is supplied, note which of the endpoints can be considered naughty.
-
educe.stac.sanity.common.search_for_glozz_schema(inputs, k, pred, member_is_naughty=None)
Search for schemas that satisfy a condition
-
educe.stac.sanity.common.search_glozz_units(inputs, k, pred)
Return an item for every unit-level annotation in the given document that satisfies some predicate
Return type: ReportItem
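The predicate-driven search described above can be pictured with a small standalone sketch. It does not import educe: `Unit`, `UnitHit` and `search_units_sketch` are hypothetical stand-ins, and the real function takes `(inputs, k, pred)` rather than a plain list of units.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration; these are not educe classes.
Unit = namedtuple("Unit", ["local_id", "type", "text"])
UnitHit = namedtuple("UnitHit", ["unit"])  # plays the role of a report item


def search_units_sketch(units, pred):
    """Return one report-like item per unit that satisfies `pred`."""
    return [UnitHit(u) for u in units if pred(u)]


units = [
    Unit("u1", "Segment", "hello there"),
    Unit("u2", "default", ""),
    Unit("u3", "Turn", "anyone got wood?"),
]

# Example predicate: flag units whose text is empty.
hits = search_units_sketch(units, lambda u: not u.text)
print([h.unit.local_id for h in hits])
```

The point of the design is that the check itself lives entirely in the predicate, so one search function can serve many different sanity checks.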
-
educe.stac.sanity.common.search_in_glozz_schema(inputs, k, stype, pred, member_is_naughty=None)
Search for schemas whose members satisfy a condition. Not to be confused with search_for_glozz_schema
-
educe.stac.sanity.common.summarise_anno(doc, light=False)
Return a function that returns a short text summary of an annotation
-
educe.stac.sanity.common.summarise_anno_html(doc, contexts)
Return a function that creates HTML descriptions of an annotation given document and contexts
educe.stac.sanity.html module
Helpers for building HTML
Hint: import the ET for the ET package too
-
educe.stac.sanity.html.br(parent)
Create and return an HTML br tag under the parent node
-
educe.stac.sanity.html.elem(parent, tag, text=None, attrib=None, **kwargs)
Create an HTML element under the given parent node, with some text inside of it
-
educe.stac.sanity.html.span(parent, text=None, attrib=None, **kwargs)
Create and return an HTML span under the given parent node
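As a rough illustration of how helpers with these signatures might sit on top of the standard library's `xml.etree.ElementTree`, here is a minimal sketch. The `*_sketch` functions are hypothetical re-creations written for this example, not the module's actual code, and the use of `ET.SubElement` is an assumption based on the hint above.

```python
import xml.etree.ElementTree as ET


def elem_sketch(parent, tag, text=None, attrib=None, **kwargs):
    """Create a child element under `parent`, optionally with text inside."""
    child = ET.SubElement(parent, tag, attrib or {}, **kwargs)
    if text is not None:
        child.text = text
    return child


def span_sketch(parent, text=None, attrib=None, **kwargs):
    """Create and return a span under `parent`."""
    return elem_sketch(parent, "span", text, attrib, **kwargs)


def br_sketch(parent):
    """Create and return a br tag under `parent`."""
    return ET.SubElement(parent, "br")


root = ET.Element("div")
# "class" is a Python keyword, so it goes through the attrib dict.
span_sketch(root, "e_123", attrib={"class": "annoid"})
br_sketch(root)
print(ET.tostring(root, encoding="unicode"))
```

Because every helper returns the node it created, calls can be nested to build up a small tree without keeping track of intermediate variables.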
educe.stac.sanity.main module
Check the corpus for any consistency problems
-
class educe.stac.sanity.main.SanityChecker(args)
Bases: object
Sanity checker settings and state
-
output_is_temp()
True if we are writing to an output directory
-
run()
Perform sanity checks and write the output
-
-
educe.stac.sanity.main.add_element(settings, k, html, descr, mk_path)
Add a link to a report element for a given document, but only if it actually exists
-
educe.stac.sanity.main.copy_parses(settings)
Copy relevant stanford parser outputs from corpus to report
-
educe.stac.sanity.main.create_dirname(path)
Create the directory beneath a path if it does not exist
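One plausible reading of the description above is "make sure the directory that will contain `path` exists before writing to it". The sketch below follows that reading; `create_dirname_sketch` is a hypothetical name, and the interpretation itself is an assumption rather than something the documentation states.

```python
import os
import tempfile


def create_dirname_sketch(path):
    """Ensure the directory that would contain `path` exists."""
    dirname = os.path.dirname(path)
    if dirname:
        # exist_ok=True makes this a no-op when the directory is already there.
        os.makedirs(dirname, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "report", "doc1", "index.html")
    create_dirname_sketch(target)
    created = os.path.isdir(os.path.dirname(target))
    file_exists = os.path.exists(target)

print(created, file_exists)
```

Only the parent directories are created; the file itself is left for the caller to write.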
-
educe.stac.sanity.main.easy_settings(args)
Modify args to reflect user-friendly defaults.
Terminates the program if args.corpus is set but does not point to an existing folder; otherwise args.doc must be set and everything else is expected to be empty.
Parameters: args (Namespace) – Arguments of the argparser.
See also: educe.stac.util.args.check_easy_settings()
-
educe.stac.sanity.main.first_or_none(itrs)
Return the first element or None if there isn’t one
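The behaviour described here maps onto a common Python idiom. The following is a minimal sketch under the assumption that the argument is any iterable; `first_or_none_sketch` is a hypothetical name and not the module's implementation.

```python
def first_or_none_sketch(items):
    """Return the first element of an iterable, or None if it is empty."""
    return next(iter(items), None)


print(first_or_none_sketch([3, 1, 2]))
print(first_or_none_sketch([]))
print(first_or_none_sketch(x for x in "ab"))
```

Using `next` with a default avoids both indexing (which fails on generators) and a `try`/`except StopIteration` block.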
-
educe.stac.sanity.main.generate_graphs(settings)
Draw SVG graphs for each of the documents in the corpus
-
educe.stac.sanity.main.issues_descr(report, k)
Return a string characterising a report as containing either warnings or errors (helps the user scan the index to figure out what needs clicking on)
-
educe.stac.sanity.main.main()
Sanity checker CLI entry point
-
educe.stac.sanity.main.run_checks(inputs, k)
Run sanity checks for a given document
-
educe.stac.sanity.main.sanity_check_order(k)
Sort key for file ids. We want to sort them by
- doc
- subdoc
- annotator
- stage (unannotated < unit < discourse)
The point of this ordering is that unit and discourse files sharing the same first three fields (doc, subdoc, annotator) should end up next to each other
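The ordering above can be expressed as a tuple-valued sort key. This is a sketch only: `FileIdSketch` is a hypothetical stand-in for the real file id type, and the stage labels in `STAGE_RANK` are taken from the list above and may not match the exact strings used in a corpus.

```python
from collections import namedtuple

# Hypothetical stand-in for a file id; field names follow the list above.
FileIdSketch = namedtuple("FileIdSketch", ["doc", "subdoc", "stage", "annotator"])

# Assumed stage labels, ranked as unannotated < unit < discourse.
STAGE_RANK = {"unannotated": 0, "unit": 1, "discourse": 2}


def order_sketch(k):
    """Sort key: doc, subdoc, annotator, then stage rank."""
    return (
        k.doc,
        k.subdoc or "",
        k.annotator or "",
        STAGE_RANK.get(k.stage, len(STAGE_RANK)),  # unknown stages go last
    )


keys = [
    FileIdSketch("game1", "02", "discourse", "ann_a"),
    FileIdSketch("game1", "01", "discourse", "ann_a"),
    FileIdSketch("game1", "01", "unit", "ann_a"),
    FileIdSketch("game1", "01", "unannotated", None),
]
ordered = sorted(keys, key=order_sketch)
for k in ordered:
    print(k.doc, k.subdoc, k.annotator, k.stage)
```

Placing the stage last in the tuple is what keeps the unit and discourse entries for the same doc, subdoc and annotator adjacent.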
-
educe.stac.sanity.main.write_index(settings)
Write the report index
educe.stac.sanity.report module
Reporting component of sanity checker
-
class educe.stac.sanity.report.HtmlReport(anno_files, output_dir)
Bases: object
Representation of a report that we would like to generate. Output will be dumped to a directory
-
anchor_name(k, header)
HTML anchor name for a report section
-
css = '\n.annoid { font-family: monospace; font-size: small; }\n.feature { font-family: monospace; }\n.snippet { font-style: italic; }\n.indented { margin-left:1em; }\n.hidden { display:none; }\n.naughty { color:red; }\n.spillover { color:red; font-weight: bold; } /* needs help to be visible */\n.missing { color:red; }\n.excess { color:blue; }\n'
-
delete(k)
Delete the subreport for a given key. This can be used if you want to iterate through lots of different keys, generating reports incrementally and then deleting them to avoid building up memory.
No-op if we don’t have a sub-report for the given key
-
flush_subreport(k)
Write and delete (to save memory)
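The write-then-delete pattern described for `delete` and `flush_subreport` can be sketched with a toy class. `ReportSketch` is a hypothetical stand-in that stores plain lines of text; its method names echo the ones above, but the body, the simplified `report` signature and the `.report.txt` file name are assumptions made for this example.

```python
import os
import tempfile


class ReportSketch:
    """Hypothetical stand-in that keeps one list of lines per key."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.subreports = {}

    def report(self, k, line):
        self.subreports.setdefault(k, []).append(line)

    def write(self, k, path):
        # No-op if there is no subreport for this key.
        if k not in self.subreports:
            return
        with open(path, "w", encoding="utf-8") as out:
            out.write("\n".join(self.subreports[k]))

    def delete(self, k):
        # No-op if there is no subreport for this key.
        self.subreports.pop(k, None)

    def flush_subreport(self, k):
        # Write, then delete so memory does not grow with the corpus.
        self.write(k, os.path.join(self.output_dir, k + ".report.txt"))
        self.delete(k)


with tempfile.TemporaryDirectory() as tmp:
    rep = ReportSketch(tmp)
    for key in ["doc1", "doc2"]:
        rep.report(key, "something looks off in " + key)
        rep.flush_subreport(key)
    rep.flush_subreport("doc3")  # never reported, so nothing is written
    written = sorted(os.listdir(tmp))
    remaining = len(rep.subreports)

print(written, remaining)
```

Flushing inside the loop means at most one document's report is held in memory at a time.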
-
has_errors(k)
True if we have error-level reports for the given key
-
javascript = '\nfunction has(xs, x) {\n for (e in xs) {\n if (xs[e] === x) { return true; }\n }\n return false;\n}\n\n\nfunction toggle_hidden(name) {\n var ele = document.getElementById(name);\n var anc = document.getElementById(\'anc_\' + name);\n if (has(ele.classList, "hidden")) {\n ele.classList.remove("hidden");\n anc.innerText = "[hide]";\n } else {\n ele.classList.add("hidden");\n anc.innerText = "[show]";\n }\n}\n'
Attach some javascript and html to the given block-level element that turns it into a hide/show toggle block, starting out in the hidden state
-
mk_or_get_subreport(k)
Initialise and cache the subreport for a key, including the subreports for each severity level below it
If already cached, retrieve from cache
-
classmethod mk_output_path(odir, k, extension='')
Generate a path within a parent directory, given a fileid
-
report(k, err_type, severity, header, items, noisy=False)
Append bullet points for each item to the appropriate section of the appropriate report in progress
-
set_has_errors(k)
Note that this report has seen at least one error-level severity message
-
subreport_path(k, extension='.report.html')
Report for a single document
-
write(k, path)
Write the subreport for a given key to the path. No-op if we don’t have a sub-report for the given key
-
-
class educe.stac.sanity.report.ReportItem
Bases: object
An individual reportable entry (usually involves a list of annotations), rendered as a block of text in the report
-
annotations()
The annotations which this report item is about
-
html()
Return an HTML element corresponding to the visualisation for this item
-
text()
If you don’t want to create an HTML visualisation for a report item, you can fall back to just generating lines of text
Return type: [string]
-
-
class educe.stac.sanity.report.Severity
Bases: enum.Enum
Severity of a sanity check error block
-
error = 2
-
warning = 1
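For readers unfamiliar with `enum.Enum`, the sketch below mirrors the two members and values listed above and shows one way to pick the most severe level from a collection. `SeveritySketch` and `worst` are hypothetical names written for this example, not part of the module.

```python
from enum import Enum


class SeveritySketch(Enum):
    """Mirrors the two members and values listed above."""
    warning = 1
    error = 2


def worst(severities):
    """Pick the most severe level by comparing member values."""
    return max(severities, key=lambda s: s.value)


seen = [SeveritySketch.warning, SeveritySketch.error, SeveritySketch.warning]
print(worst(seen).name)
print(SeveritySketch(1))
```

Members of a plain `Enum` do not support `<` or `>` directly, which is why the comparison goes through `.value`.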
-
-
class educe.stac.sanity.report.SimpleReportItem(lines)
Bases: educe.stac.sanity.report.ReportItem
Report item which just consists of lines of text
-
text()
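The relationship between a general report item and a text-only one can be sketched as a tiny class hierarchy. Both classes below are hypothetical stand-ins that only imitate the shape described above (an item that can return lines of text); they are not the module's code.

```python
class ReportItemSketch:
    """Hypothetical base: an item that can describe itself as lines of text."""

    def annotations(self):
        return []

    def text(self):
        return []


class SimpleReportItemSketch(ReportItemSketch):
    """Item that just consists of the lines it was given."""

    def __init__(self, lines):
        self.lines = list(lines)

    def text(self):
        return self.lines


item = SimpleReportItemSketch(["turn 12: missing addressee", "turn 13: ok"])
print("\n".join(item.text()))
```

A text-only item like this is handy for checks whose findings are not tied to specific annotations.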
-
-
educe.stac.sanity.report.html_anno_id(parent, anno, bracket=False)
Create and return an HTML span parent node displaying the local annotation id for an annotation item
-
educe.stac.sanity.report.mk_microphone(report, k, err_type, severity)
Return a convenience function that generates report entries at a fixed error type and severity level
Return type: (string, [ReportItem]) -> string
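The "convenience function at a fixed error type and severity" is a closure pattern, sketched below. `RecorderSketch` is a hypothetical stand-in for a report object, `"ANNOTATION"` and `"warning"` are placeholder values, and returning the header from `report` is an assumption made only so the sketch has a string to return.

```python
class RecorderSketch:
    """Hypothetical stand-in for a report object; it only records calls."""

    def __init__(self):
        self.calls = []

    def report(self, k, err_type, severity, header, items, noisy=False):
        self.calls.append((k, err_type, severity, header, list(items)))
        return header  # placeholder return value (assumption)


def mk_microphone_sketch(report, k, err_type, severity):
    """Fix the key, error type and severity; leave header and items open."""
    def speak(header, items, noisy=False):
        return report.report(k, err_type, severity, header, items, noisy)
    return speak


rec = RecorderSketch()
squawk = mk_microphone_sketch(rec, "doc1", "ANNOTATION", "warning")
squawk("Empty units", ["u2"])
squawk("Dangling relations", ["r7", "r9"])
print(len(rec.calls), rec.calls[0][:4])
```

A group of related checks can then share one such function and pass only what varies between them, the header and the items.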
-
educe.stac.sanity.report.snippet(txt, stop=50)
Truncate a string if it is longer than stop characters
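A truncation helper of this kind might look like the sketch below. `snippet_sketch` is a hypothetical name, and appending `...` to mark the cut is an assumption: the description above does not say whether any marker is added.

```python
def snippet_sketch(txt, stop=50):
    """Truncate `txt` to `stop` characters, marking the cut with '...'."""
    if len(txt) > stop:
        return txt[:stop] + "..."
    return txt


print(snippet_sketch("short text"))
print(snippet_sketch("a" * 60, stop=5))
```

Strings at or below the limit are returned unchanged, so the helper is safe to apply to every piece of text shown in a report.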