educe.stac.sanity package

Submodules

educe.stac.sanity.common module

Functionality and report types common to the sanity checker

class educe.stac.sanity.common.ContextItem(doc, contexts)

Bases: educe.stac.sanity.report.ReportItem

Report item involving EDU contexts

class educe.stac.sanity.common.RelationItem(doc, contexts, rel, naughty)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz relation annotations

annotations()
html()
class educe.stac.sanity.common.SchemaItem(doc, contexts, schema, naughty)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz schema annotations

annotations()
html()
class educe.stac.sanity.common.UnitItem(doc, contexts, unit)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz unit-level annotations

annotations()
html()
educe.stac.sanity.common.anno_code(anno)

Short code giving a clue as to what the annotation is

educe.stac.sanity.common.is_default(anno)

True if the annotation has type ‘default’

educe.stac.sanity.common.is_glozz_relation(anno)

True if the annotation is a Glozz relation

educe.stac.sanity.common.is_glozz_schema(anno)

True if the annotation is a Glozz schema

educe.stac.sanity.common.is_glozz_unit(anno)

True if the annotation is a Glozz unit

educe.stac.sanity.common.rough_type(anno)

Return either

  • “EDU”
  • “relation”
  • or the annotation type
educe.stac.sanity.common.search_for_glozz_relations(inputs, k, pred, endpoint_is_naughty=None)

Return a ReportItem for any Glozz relation that satisfies the given predicate.

If endpoint_is_naughty is supplied, note which of the endpoints can be considered naughty

educe.stac.sanity.common.search_for_glozz_schema(inputs, k, pred, member_is_naughty=None)

Search for schemas that satisfy a condition

educe.stac.sanity.common.search_glozz_units(inputs, k, pred)

Return an item for every unit-level annotation in the given document that satisfies some predicate

Return type: ReportItem
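The predicate-driven search described above can be illustrated with a small sketch. This is not the educe implementation: the real function takes `inputs` and `k` and returns ReportItem objects, whereas `search_units`, the `type` attribute, and the `local_id` attribute used here are stand-ins assumed for illustration.

```python
from types import SimpleNamespace


def is_default(anno):
    """Stand-in predicate: true if the annotation's type is 'default'."""
    return anno.type == "default"


def search_units(units, pred):
    """Stand-in search: keep the unit-level annotations that satisfy pred."""
    return [anno for anno in units if pred(anno)]


# Hypothetical annotations; real ones would come from a loaded document
units = [
    SimpleNamespace(local_id="u1", type="Segment"),
    SimpleNamespace(local_id="u2", type="default"),
]
flagged = search_units(units, is_default)
```

Any callable that accepts an annotation and returns a boolean could serve as the predicate in this sketch.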
educe.stac.sanity.common.search_in_glozz_schema(inputs, k, stype, pred, member_is_naughty=None)

Search for schemas whose members satisfy a condition. Not to be confused with search_for_glozz_schema

educe.stac.sanity.common.summarise_anno(doc, light=False)

Return a function that returns a short text summary of an annotation

educe.stac.sanity.common.summarise_anno_html(doc, contexts)

Return a function that creates HTML descriptions of an annotation given document and contexts

educe.stac.sanity.html module

Helpers for building HTML

Hint: import the ET for the ET package too

educe.stac.sanity.html.br(parent)

Create and return an HTML br tag under the parent node

educe.stac.sanity.html.elem(parent, tag, text=None, attrib=None, **kwargs)

Create an HTML element under the given parent node, with some text inside of it

educe.stac.sanity.html.span(parent, text=None, attrib=None, **kwargs)

Create and return an HTML span under the given parent node
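As a rough illustration of how helpers with these signatures might behave, the sketch below builds them on `xml.etree.ElementTree.SubElement` from the standard library. The function bodies are stand-ins written for this example, not the educe source, and the choice of ElementTree as the backing library is an assumption.

```python
import xml.etree.ElementTree as ET


def elem(parent, tag, text=None, attrib=None, **kwargs):
    """Stand-in: create a child element under parent, optionally with text."""
    child = ET.SubElement(parent, tag, attrib or {}, **kwargs)
    if text is not None:
        child.text = text
    return child


def span(parent, text=None, attrib=None, **kwargs):
    """Stand-in: create a span element under parent."""
    return elem(parent, "span", text, attrib, **kwargs)


def br(parent):
    """Stand-in: create an empty br element under parent."""
    return ET.SubElement(parent, "br")


# Build a tiny fragment: a styled span followed by a line break
body = ET.Element("body")
span(body, "hello", attrib={"class": "snippet"})
br(body)
rendered = ET.tostring(body, encoding="unicode")
```

Passing `attrib` as a dictionary avoids the clash between the HTML `class` attribute and the Python keyword of the same name.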

educe.stac.sanity.main module

Check the corpus for any consistency problems

class educe.stac.sanity.main.SanityChecker(args)

Bases: object

Sanity checker settings and state

output_is_temp()

True if we are writing to a temporary output directory

run()

Perform sanity checks and write the output

educe.stac.sanity.main.add_element(settings, k, html, descr, mk_path)

Add a link to a report element for a given document, but only if it actually exists

educe.stac.sanity.main.copy_parses(settings)

Copy relevant Stanford parser outputs from corpus to report

educe.stac.sanity.main.create_dirname(path)

Create the directory beneath a path if it does not exist
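A minimal sketch of this behaviour, reading "the directory beneath a path" as the parent directory of the given path (that reading is an assumption, and the body below is a stand-in rather than the educe source):

```python
import os
import tempfile


def create_dirname(path):
    """Stand-in: make sure the parent directory of path exists."""
    dirname = os.path.dirname(path)
    if dirname:
        # exist_ok makes repeated calls harmless
        os.makedirs(dirname, exist_ok=True)


# Demonstration inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "reports", "doc_a", "index.html")
    create_dirname(target)
    create_dirname(target)  # second call is a no-op
    parent_exists = os.path.isdir(os.path.dirname(target))
    file_exists = os.path.exists(target)
```

Only the directory is created in this sketch; the file named by the path is left for the caller to write.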

educe.stac.sanity.main.easy_settings(args)

Modify args to reflect user-friendly defaults.

Terminates the program if args.corpus is set but does not point to an existing folder; otherwise args.doc must be set and everything else is expected to be empty.

Parameters: args (Namespace) – Arguments of the argparser.

See also

educe.stac.util.args.check_easy_settings()

educe.stac.sanity.main.first_or_none(itrs)

Return the first element or None if there isn’t one
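One common way to get this behaviour in Python is `next` with a default value. The sketch below is a stand-in under that assumption, not necessarily how educe implements it:

```python
def first_or_none(itrs):
    """Stand-in: first element of an iterable, or None if it is empty."""
    return next(iter(itrs), None)


first_of_list = first_or_none([3, 4, 5])
first_of_empty = first_or_none([])
first_of_generator = first_or_none(x * x for x in range(2, 5))
```

Because it only calls `iter` and `next`, this version works on lists and generators alike and consumes at most one element.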

educe.stac.sanity.main.generate_graphs(settings)

Draw SVG graphs for each of the documents in the corpus

educe.stac.sanity.main.issues_descr(report, k)

Return a string characterising a report as containing either warnings or errors (helps the user scan the index to figure out what needs clicking on)

educe.stac.sanity.main.main()

Sanity checker CLI entry point

educe.stac.sanity.main.run_checks(inputs, k)

Run sanity checks for a given document

educe.stac.sanity.main.sanity_check_order(k)

Sort key for file ids. We want to sort file ids by, in order:

  1. doc
  2. subdoc
  3. annotator
  4. stage (unannotated < unit < discourse)

The important idea here is that unit and discourse files which agree on 1-3 should probably be grouped together
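The ordering above can be sketched as a tuple-valued sort key. Everything named here is an assumption made for illustration: the `FileId` namedtuple is a hypothetical stand-in for the real file id type, the stage names are taken from the list above, and `order_key` is not the educe function.

```python
from collections import namedtuple

# Hypothetical stand-in for a corpus file id
FileId = namedtuple("FileId", ["doc", "subdoc", "stage", "annotator"])

# Assumed stage names, ranked as unannotated < unit < discourse
STAGE_RANK = {"unannotated": 0, "unit": 1, "discourse": 2}


def order_key(k):
    """Sort key: doc, then subdoc, then annotator, then stage rank."""
    # Missing fields (None) become "" so that the tuples stay comparable
    stage = STAGE_RANK.get(k.stage, len(STAGE_RANK))
    return (k.doc or "", k.subdoc or "", k.annotator or "", stage)


keys = [
    FileId("pilot02", "01", "discourse", "alice"),
    FileId("pilot02", "01", "unit", "alice"),
    FileId("pilot01", "02", "unit", "bob"),
    FileId("pilot02", "01", "unannotated", None),
]
ordered = sorted(keys, key=order_key)
```

Putting the stage last in the tuple is what keeps the unit and discourse files of one annotator next to each other for a given document and subdocument.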

educe.stac.sanity.main.write_index(settings)

Write the report index

educe.stac.sanity.report module

Reporting component of the sanity checker

class educe.stac.sanity.report.HtmlReport(anno_files, output_dir)

Bases: object

Representation of a report that we would like to generate. Output will be dumped to a directory

anchor_name(k, header)

HTML anchor name for a report section

css = '\n.annoid { font-family: monospace; font-size: small; }\n.feature { font-family: monospace; }\n.snippet { font-style: italic; }\n.indented { margin-left:1em; }\n.hidden { display:none; }\n.naughty { color:red; }\n.spillover { color:red; font-weight: bold; } /* needs help to be visible */\n.missing { color:red; }\n.excess { color:blue; }\n'
delete(k)

Delete the subreport for a given key. This can be used if you want to iterate through lots of different keys, generating reports incrementally and then deleting them so that memory use does not build up.

No-op if we don’t have a sub-report for the given key

flush_subreport(k)

Write and delete (to save memory)
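The write-then-delete pattern described for delete and flush_subreport can be sketched with a toy class. `TinyReport` is a hypothetical stand-in that stores plain lines of text per key; it only mirrors the method names listed here and makes no claim about how HtmlReport stores or renders its subreports.

```python
import os
import tempfile


class TinyReport:
    """Hypothetical stand-in: caches one list of text lines per key."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.subreports = {}

    def mk_or_get_subreport(self, k):
        # Create on first use, otherwise return the cached list
        return self.subreports.setdefault(k, [])

    def write(self, k, path):
        if k not in self.subreports:
            return  # no-op for unknown keys
        with open(path, "w") as out:
            out.write("\n".join(self.subreports[k]))

    def delete(self, k):
        self.subreports.pop(k, None)  # no-op for unknown keys

    def flush_subreport(self, k):
        # Write first, then drop the cached content to free memory
        self.write(k, os.path.join(self.output_dir, k + ".report.html"))
        self.delete(k)


with tempfile.TemporaryDirectory() as tmp:
    report = TinyReport(tmp)
    for key in ["doc_a", "doc_b"]:
        report.mk_or_get_subreport(key).append("checked " + key)
        report.flush_subreport(key)
    written = sorted(os.listdir(tmp))
    remaining = dict(report.subreports)
```

Flushing inside the loop means at most one subreport is held in memory at a time in this sketch.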

has_errors(k)

True if we have error-level reports for the given key

javascript = '\nfunction has(xs, x) {\n for (e in xs) {\n if (xs[e] === x) { return true; }\n }\n return false;\n}\n\n\nfunction toggle_hidden(name) {\n var ele = document.getElementById(name);\n var anc = document.getElementById(\'anc_\' + name);\n if (has(ele.classList, "hidden")) {\n ele.classList.remove("hidden");\n anc.innerText = "[hide]";\n } else {\n ele.classList.add("hidden");\n anc.innerText = "[show]";\n }\n}\n'
mk_hidden_with_toggle(parent, anchor)

Attach some JavaScript and HTML to the given block-level element, turning it into a hide/show toggle block that starts out in the hidden state

mk_or_get_subreport(k)

Initialise and cache the subreport for a key, including the subreports for each severity level below it

If already cached, retrieve from cache

classmethod mk_output_path(odir, k, extension='')

Generate a path within a parent directory, given a fileid

report(k, err_type, severity, header, items, noisy=False)

Append bullet points for each item to the appropriate section of the appropriate report in progress

set_has_errors(k)

Note that this report has seen at least one error-level severity message

subreport_path(k, extension='.report.html')

Path to the report for a single document

write(k, path)

Write the subreport for a given key to the path. No-op if we don’t have a sub-report for the given key

class educe.stac.sanity.report.ReportItem

Bases: object

An individual reportable entry (usually involves a list of annotations), rendered as a block of text in the report

annotations()

The annotations which this report item is about

html()

Return an HTML element corresponding to the visualisation for this item

text()

If you don’t want to create an HTML visualisation for a report item, you can fall back to just generating lines of text

Return type: [string]
class educe.stac.sanity.report.Severity

Bases: enum.Enum

Severity of a sanity check error block

error = 2
warning = 1
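The two members listed above correspond to a plain `enum.Enum` definition. The sketch below reproduces them; `worst` is a hypothetical helper added for illustration and is not part of the documented module.

```python
from enum import Enum


class Severity(Enum):
    """Severity levels with the values listed above."""
    warning = 1
    error = 2


def worst(severities):
    """Hypothetical helper: pick the most severe level by numeric value."""
    # Plain Enum members are not orderable, so compare on .value
    return max(severities, key=lambda s: s.value)


most_severe = worst([Severity.warning, Severity.error, Severity.warning])
```

Comparing on `.value` is needed in this sketch because members of a plain Enum do not support `<` or `>` directly.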
class educe.stac.sanity.report.SimpleReportItem(lines)

Bases: educe.stac.sanity.report.ReportItem

Report item which just consists of lines of text

text()
educe.stac.sanity.report.html_anno_id(parent, anno, bracket=False)

Create and return an HTML span under the parent node, displaying the local annotation id for an annotation item

educe.stac.sanity.report.mk_microphone(report, k, err_type, severity)

Return a convenience function that generates report entries at a fixed error type and severity level

Return type: (string, [ReportItem]) -> string
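The "convenience function" idea is a closure that fixes some arguments in advance. A minimal sketch, assuming the report object exposes a `report` method with the signature documented for HtmlReport: `RecordingReport` is a hypothetical stand-in that only records calls, the key and error type strings are made up, and the value the real function returns is not modelled here.

```python
class RecordingReport:
    """Hypothetical stand-in that just records calls to report()."""

    def __init__(self):
        self.calls = []

    def report(self, k, err_type, severity, header, items, noisy=False):
        self.calls.append((k, err_type, severity, header, list(items)))


def mk_microphone(report, k, err_type, severity):
    """Stand-in: fix k, err_type and severity; leave header and items free."""
    def inner(header, items, noisy=False):
        return report.report(k, err_type, severity, header, items, noisy)
    return inner


recorder = RecordingReport()
# Made-up key, error type, and severity label for the demonstration
squawk = mk_microphone(recorder, "doc_a", "STRUCTURE", "error")
squawk("Missing features", ["item 1"])
squawk("Overlapping units", ["item 2", "item 3"])
```

Each check can then emit entries with just a header and a list of items, without repeating the key, error type, and severity at every call site.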
educe.stac.sanity.report.snippet(txt, stop=50)

Truncate a string if it’s longer than stop chars
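A minimal sketch of this kind of truncation. The trailing "..." marker is an assumption made for illustration; the description above only says that the string is truncated.

```python
def snippet(txt, stop=50):
    """Stand-in: cut txt to `stop` chars and append '...' if it was longer."""
    if len(txt) > stop:
        return txt[:stop] + "..."
    return txt


short = snippet("hello")
cut = snippet("abcdefghij", stop=4)
exact = snippet("abcd", stop=4)
```

Strings of exactly `stop` characters are returned unchanged in this sketch, since the test is strictly "longer than".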