educe.rst_dt package¶
Conventions specific to the RST discourse treebank project
Submodules¶
educe.rst_dt.annotation module¶
Educe-style representation for RST discourse treebank trees
-
class
educe.rst_dt.annotation.
EDU
(num, span, text, context=None, origin=None)¶ Bases:
educe.annotation.Standoff
An RST leaf node
-
context
= None¶ See the RSTContext object
-
identifier
()¶ A global identifier (assuming the origin can be used to uniquely identify an RST tree)
-
is_left_padding
()¶ Returns True for left padding EDUs
-
classmethod
left_padding
(context=None, origin=None)¶ Return a left padding EDU
-
num
= None¶ EDU number (as used in tree node edu_span)
-
raw_text
= None¶ text that was in the EDU annotation itself
This is not the same as the text that was in the annotated document, on which all standoff annotations and spans are based.
-
set_context
(context)¶ Update the context of this annotation.
-
set_origin
(origin)¶ Update the origin of this annotation and any contained within
Parameters: origin (FileId) – File identifier of the origin of this annotation.
-
span
= None¶ text span
-
text
()¶ Return the text associated with this EDU. We try to return the underlying annotated text if we have the necessary context; if not, we just fall back to the raw EDU text.
-
-
class
educe.rst_dt.annotation.
Node
(nuclearity, edu_span, span, rel, context=None)¶ Bases:
object
A node in an RSTTree or SimpleRSTTree.
-
context
= None¶ See the RSTContext object
-
edu_span
= None¶ pair of integers denoting edu span by count
-
is_nucleus
()¶ A node can either be a nucleus, a satellite, or a root node. It may be easier to work with SimpleRSTTree, in which nodes can only be a nucleus or a satellite or, much more rarely, a root.
-
is_satellite
()¶ A node can either be a nucleus, a satellite, or a root node.
-
nuclearity
= None¶ one of Nucleus, Satellite, Root
-
rel
= None¶ relation label (see SimpleRSTTree for a note on the different interpretation of rel with this and RSTTree)
-
span
= None¶ span
-
-
class
educe.rst_dt.annotation.
RSTContext
(text, sentences, paragraphs)¶ Bases:
object
Additional annotations or contextual information that could accompany an RST tree proper. The idea is to have each subtree pointing back to the same context object for easy retrieval.
-
paragraphs
= None¶ Paragraph annotations pointing back to the text
-
sentences
= None¶ sentence annotations pointing back to the text
-
text
(span=None)¶ Return the text associated with these annotations (or None), optionally limited to a span
-
-
class
educe.rst_dt.annotation.
RSTTree
(node, children, origin=None, verbose=False)¶ Bases:
educe.external.parser.SearchableTree, educe.annotation.Standoff
Representation of RST trees which sticks fairly closely to the raw RST discourse treebank one.
-
edu_span
()¶ Return the span of the tree in terms of EDU count. See also self.span, which refers to the character offsets.
-
get_spans
(subtree_filter=None, exclude_root=False, span_type='edus')¶ Get the spans of a constituency tree.
Each span is described by a triplet (edu_span, nuclearity, relation).
Parameters: - subtree_filter (function, defaults to None) – Function to filter all local trees.
- exclude_root (boolean, defaults to False) – If True, exclude the span of the root node. This cannot be expressed with subtree_filter because the latter is limited to properties local to each subtree in isolation. Or maybe I just missed something.
- span_type (one of {'edus', 'chars'}) – Whether each span is expressed on EDU or character indices. Character indices are useful to compare spans from trees whose EDU segmentation differs.
Returns: spans – List of tuples, each describing a span with a tuple ((edu_start, edu_end), nuclearity, relation).
Return type: list of tuple((int, int), str, str)
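As an illustration of the (edu_span, nuclearity, relation) triplets, here is a minimal sketch that does not use educe itself. The nested-tuple tree encoding and the helper name collect_spans are assumptions made for this example only; the real method works on RSTTree objects and may return the spans in a different order.

```python
# Hypothetical stand-in for a constituency tree: (nuclearity, relation, kids),
# where the "kids" of a leaf is a single EDU number instead of a list.
def collect_spans(tree, exclude_root=False):
    """Return a list of ((edu_start, edu_end), nuclearity, relation) triplets."""
    spans = []

    def walk(node, is_root):
        nuc, rel, kids = node
        if isinstance(kids, int):
            # Leaf: the span covers exactly one EDU.
            edu_span = (kids, kids)
        else:
            # Internal node: from the first EDU of the first kid
            # to the last EDU of the last kid.
            kid_spans = [walk(kid, False) for kid in kids]
            edu_span = (kid_spans[0][0], kid_spans[-1][1])
        if not (is_root and exclude_root):
            spans.append((edu_span, nuc, rel))
        return edu_span

    walk(tree, True)
    return spans


toy_tree = ("Root", "---", [
    ("Nucleus", "span", 1),
    ("Satellite", "elaboration", [
        ("Nucleus", "list", 2),
        ("Nucleus", "list", 3),
    ]),
])
toy_spans = collect_spans(toy_tree, exclude_root=True)
```

In this sketch the spans of the children are listed before the span of their parent; with exclude_root=True the triplet for the (1, 3) root span is left out.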
-
set_origin
(origin)¶ Update the origin of this annotation and any contained within
Parameters: origin (FileId) – File identifier of the origin of this annotation.
-
text
()¶ Return the text corresponding to this RST subtree. If the context is set, we return the appropriate segment from the subset of the text. If not, we just concatenate the raw text of all EDU leaves.
-
text_span
()¶
-
to_pdf
(filename)¶ Image representation in PDF.
-
to_ps
(filename)¶ Export as a PostScript image.
This function is used by _repr_png_.
-
-
exception
educe.rst_dt.annotation.
RSTTreeException
(msg)¶ Bases:
exceptions.Exception
Exceptions related to RST trees not looking like we would expect them to
-
class
educe.rst_dt.annotation.
SimpleRSTTree
(node, children, origin=None)¶ Bases:
educe.external.parser.SearchableTree, educe.annotation.Standoff
Possibly easier representation of RST trees to work with:
- binary
- relation labels on parent nodes instead of children
Note that RSTTree and SimpleRSTTree share the same Node type but because of the subtle difference in interpretation you should be extremely careful not to mix and match.
-
classmethod
from_rst_tree
(tree)¶ Build and return a SimpleRSTTree from an RSTTree
-
get_spans
(subtree_filter=None, exclude_root=False, span_type='edus')¶ Get the spans of a constituency tree.
Each span is described by a triplet (edu_span, nuclearity, relation).
Parameters: - subtree_filter (function, defaults to None) – Function to filter all local trees.
- exclude_root (boolean, defaults to False) – If True, exclude the span of the root node. This cannot be expressed with subtree_filter because the latter is limited to properties local to each subtree in isolation. Or maybe I just missed something.
- span_type (one of {'edus', 'chars'}) – Whether each span is expressed on EDU or character indices. Character indices are useful to compare spans from trees whose EDU segmentation differs.
Returns: spans – List of tuples, each describing a span with a tuple ((edu_start, edu_end), nuclearity, relation).
Return type: list of tuple((int, int), str, str)
-
classmethod
incorporate_nuclearity_into_label
(tree)¶ Integrate nuclearity of the children into each node’s label.
Nuclearity of the children is incorporated in one of two forms, NN for multi- and NS for mono-nuclear relations.
Parameters: tree (SimpleRSTTree) – The tree of which we want a version with nuclearity incorporated
Returns: mod_tree – The same tree but with the type of nuclearity incorporated
Return type: SimpleRSTTree
Note
This is probably not the best way to provide this functionality. In other words, refactoring is much needed here.
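The NN/NS convention can be sketched on plain strings. This is only an illustration: the helper name label_with_nuclearity is hypothetical, and the hyphen separator is taken from the elaboration-NS example given for nuc_in_label elsewhere in this documentation.

```python
def label_with_nuclearity(rel, kid_nuclearities):
    """Append NN (multi-nuclear) or NS (mono-nuclear) to a relation label."""
    nb_nuclei = sum(1 for nuc in kid_nuclearities if nuc == "Nucleus")
    # Several nuclei: multi-nuclear relation; otherwise mono-nuclear.
    suffix = "NN" if nb_nuclei > 1 else "NS"
    return "{}-{}".format(rel, suffix)


mono_label = label_with_nuclearity("elaboration", ["Nucleus", "Satellite"])
multi_label = label_with_nuclearity("list", ["Nucleus", "Nucleus"])
```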
-
set_origin
(origin)¶ Recursively update the origin for this annotation, i.e. a little link to the document metadata for this annotation.
Parameters: origin (FileId) – File identifier of the origin of this annotation.
-
text_span
()¶
-
classmethod
to_binary_rst_tree
(tree, rel='---', nuc='Root')¶ Build and return a binary RSTTree from a SimpleRSTTree.
This function is recursive: it essentially pushes the relation label from the parent to the satellite child (for mononuclear relations) or to all nucleus children (for multinuclear relations).
Parameters: - tree (SimpleRSTTree) – SimpleRSTTree to convert
- rel (string, optional) – Relation for the root node of the output
- nuc (string, optional) – Nuclearity for the root node of the output
Returns: rtree – The (binary) RSTTree that corresponds to the given SimpleRSTTree
Return type: RSTTree
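The label-pushing step can be sketched on nested tuples rather than on real tree objects. Everything below is an assumption made for illustration: the tuple encoding, the helper name push_labels_down, and the use of "span" as the label given to the nucleus of a mononuclear relation (the usual RST-DT convention, not something stated here).

```python
# Hypothetical "simple" form: (nuclearity, relation_on_this_node, kids),
# where the kids of a leaf is the EDU text (a str).
def push_labels_down(node, rel="---", nuc="Root"):
    """Move each relation label from a parent onto its children."""
    _, own_rel, kids = node
    if isinstance(kids, str):
        # Leaf: it simply receives the label handed down by its parent.
        return (nuc, rel, kids)
    kid_nucs = [kid[0] for kid in kids]
    multinuclear = all(kid_nuc == "Nucleus" for kid_nuc in kid_nucs)
    new_kids = []
    for kid, kid_nuc in zip(kids, kid_nucs):
        if multinuclear or kid_nuc == "Satellite":
            kid_rel = own_rel
        else:
            kid_rel = "span"  # assumed label for the nucleus of a mononuclear relation
        new_kids.append(push_labels_down(kid, kid_rel, kid_nuc))
    return (nuc, rel, new_kids)


simple = ("Root", "elaboration", [
    ("Nucleus", None, "foo"),
    ("Satellite", None, "bar"),
])
pushed = push_labels_down(simple)
```

The rel and nuc defaults mirror the defaults of the documented signature: they label the root of the output.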
-
educe.rst_dt.annotation.
is_binary
(tree)¶ True if the given RST tree or SimpleRSTTree is indeed binary
educe.rst_dt.corpus module¶
Corpus management (re-exported by educe.rst_dt)
-
class
educe.rst_dt.corpus.
Reader
(corpusdir)¶ Bases:
educe.corpus.Reader
See educe.corpus.Reader for details
-
files
(doc_glob=None)¶ Parameters: doc_glob (str, optional) – Glob for document names, i.e. file basenames. A common pattern is doc_glob='wsj_*' to exclude documents whose file basenames are of the form fileX. fileX documents are damaged compared to wsj_XX documents: their text does not match that of the corresponding document in the PTB, and their text formatting is scrambled. For example, the figures reported in the paper of (Li et al., 2014) indicate they only consider wsj_XX files.
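To show the effect of such a glob without touching a corpus, here is a small sketch using the standard fnmatch module. Whether the reader itself relies on fnmatch is an assumption, and the basenames below are illustrative only.

```python
import fnmatch

# Illustrative basenames; a real corpus directory holds many more.
basenames = ["wsj_0601", "wsj_1100", "file1", "file5"]

# Keep only the documents whose basename matches the glob.
wsj_only = fnmatch.filter(basenames, "wsj_*")
```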
-
slurp_subcorpus
(cfiles, verbose=False)¶ See educe.rst_dt.parse for a description of RSTTree
-
-
class
educe.rst_dt.corpus.
RstDtParser
(corpus_dir, args, coarse_rels=False, fix_pseudo_rels=False, nary_enc='chain', nuc_in_label=False, exclude_file_docs=False)¶ Bases:
object
Fake parser that gets annotation from the RST-DT.
Parameters: - corpus_dir (string) – TODO
- args (TODO) – TODO
- coarse_rels (boolean, optional) – If True, relation labels are converted to their coarse-grained equivalent.
- nary_enc (string, optional) – Conversion method from constituency to dependency tree, for n-ary spans, n > 2, whose kids are all nuclei: ‘tree’ picks the leftmost nucleus as the head of all the others (effectively a tree), ‘chain’ attaches each nucleus to its predecessor (effectively a chain).
- nuc_in_label (boolean, optional) – If True, incorporate nuclearity into the label (e.g. elaboration-NS); currently BROKEN (defined on SimpleRSTTree only).
- exclude_file_docs (boolean, default False) – If True, ignore fileX files.
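The difference between the two nary_enc values can be sketched on a plain list of nucleus identifiers. The helper name nuclei_to_dependencies is hypothetical, and the sketch only covers the attachment pattern described above, not the full conversion.

```python
def nuclei_to_dependencies(nuclei, nary_enc="chain"):
    """Return (governor, dependent) pairs for the nuclei of an n-ary span."""
    if nary_enc == "tree":
        # The leftmost nucleus governs all the others.
        head = nuclei[0]
        return [(head, dep) for dep in nuclei[1:]]
    if nary_enc == "chain":
        # Each nucleus is attached to its predecessor.
        return list(zip(nuclei, nuclei[1:]))
    raise ValueError("nary_enc must be 'chain' or 'tree'")


tree_deps = nuclei_to_dependencies([1, 2, 3, 4], nary_enc="tree")
chain_deps = nuclei_to_dependencies([1, 2, 3, 4], nary_enc="chain")
```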
-
decode
(doc_key)¶ Decode a document from the RST-DT (gold)
Parameters: doc_key (string ?) – Identifier (in corpus) of the document we want to decode.
Returns: doc – Bunch of information about this document, notably its list of EDUs and the structures defined on them: RSTTree, SimpleRSTTree, RstDepTree.
Return type: DocumentPlus
-
parse
(doc)¶ Parse the document using the RST-DT (gold).
-
segment
(doc)¶ Segment the document into EDUs using the RST-DT (gold).
-
class
educe.rst_dt.corpus.
RstRelationConverter
(relmap_file)¶ Bases:
object
Converter for RST relations (labels)
Known to work on RSTTree, possibly SimpleRSTTree (untested).
-
convert_dtree
(dtree)¶ Change relation labels in an RstDepTree using the label mapping.
See attribute self.convert_label.
Parameters: dtree (RstDepTree) – RST dtree
Returns: dtree – RST dtree with mapped labels.
Return type: RstDepTree
-
convert_label
(label)¶ Convert a label following the mapping, lowercased otherwise
-
convert_tree
(rst_tree)¶ Change relation labels in rst_tree using the mapping
-
-
educe.rst_dt.corpus.
id_to_path
(k)¶ Given a fleshed out FileId (none of the fields are None), return a filepath for it following RST Discourse Treebank conventions.
You will likely want to add your own filename extensions to this path
-
educe.rst_dt.corpus.
mk_key
(doc)¶ Return a corpus key for a given document name
educe.rst_dt.deptree module¶
Convert RST trees to dependency trees and back.
-
class
educe.rst_dt.deptree.
RstDepTree
(edus=[], origin=None, nary_enc='chain')¶ Bases:
object
RST dependency tree
-
edus
¶ list of EDU – List of the EDUs of this document.
-
origin
¶ Document?, optional – TODO
-
nary_enc
¶ one of {‘chain’, ‘tree’}, optional – Type of encoding used for n-ary relations: ‘chain’ or ‘tree’. This determines for example how fragmented EDUs are resolved.
-
add_dependencies
(gov_num, dep_nums, labels=None, nucs=None, rank=None)¶ Add a set of dependencies with a unique governor and rank.
Parameters: - gov_num (int) – Number of the head EDU
- dep_nums (list of int) – Numbers of the modifier EDUs
- labels (list of string, optional) – Labels of the dependencies
- nucs (list of string, each one of [NUC_S, NUC_N]) – Nuclearity of the modifiers
- rank (integer, optional) – Rank of the modifiers in the order of attachment to the head. None means it is not given declaratively and it is instead inferred from the rank of modifiers previously attached to the head.
-
add_dependency
(gov_num, dep_num, label=None, nuc='Satellite', rank=None)¶ Add a dependency between two EDUs.
Parameters: - gov_num (int) – Number of the head EDU
- dep_num (int) – Number of the modifier EDU
- label (string, optional) – Label of the dependency
- nuc (string, one of [NUC_S, NUC_N]) – Nuclearity of the modifier
- rank (integer, optional) – Rank of the modifier in the order of attachment to the head. None means it is not given declaratively and it is instead inferred from the rank of modifiers previously attached to the head.
-
append_edu
(edu)¶ Append an EDU to the list of EDUs
-
deps
(gov_idx)¶ Get the ordered list of dependents of an EDU
-
fragmented_edus
()¶ Get the fragmented EDUs in this RST tree.
Fragmented EDUs are made of two or more EDUs linked by “same-unit” relations.
Returns: frag_edus – Each fragmented EDU is given as a tuple of the indices of the fragments.
Return type: list of tuple of int
-
classmethod
from_rst_tree
(rtree, nary_enc='tree')¶ Converts an RSTTree to an RstDepTree.
Parameters: nary_enc (one of {'chain', 'tree'}) – If ‘chain’, the given RSTTree is binarized first.
-
classmethod
from_simple_rst_tree
(rtree)¶ Converts a SimpleRSTTree to an RstDepTree
-
get_dependencies
(lbl_type='rel')¶ Get the list of dependencies in this dependency tree.
Each dependency is a 3-tuple (gov, dep, label), gov and dep being EDUs.
Parameters: lbl_type (one of {'rel', 'rel+nuc'} (TODO 'rel+nuc+rnk'?)) – Type of the labels.
-
real_roots_idx
()¶ Get the list of the indices of the real roots
-
set_origin
(origin)¶ Update the origin of this annotation.
Parameters: origin (FileId) – File identifier of the origin of this annotation.
-
set_root
(root_num)¶ Designate an EDU as a real root of the RST tree structure
-
spans
()¶ For each EDU, get the tree span it dominates (on EDUs).
Dominance here is recursively defined.
Returns: - span_beg (array of int) – Index of the leftmost EDU dominated by an EDU.
- span_end (array of int) – Index of the rightmost EDU dominated by an EDU.
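The recursive notion of dominance can be sketched from a plain list of governors. This is not the library's implementation: the helper name dominated_spans, the heads encoding, and the convention that index 0 is a fake root with no governor are all assumptions made for this example.

```python
def dominated_spans(heads):
    """heads[i] is the index of the governor of EDU i (None for the fake root).

    Returns (span_beg, span_end): for each EDU, the leftmost and rightmost
    EDU indices among itself and everything it transitively governs.
    Assumes the heads describe a tree (no cycles).
    """
    nb_edus = len(heads)
    span_beg = list(range(nb_edus))
    span_end = list(range(nb_edus))
    for i in range(nb_edus):
        # Walk up from EDU i and widen the span of every ancestor.
        node = heads[i]
        while node is not None:
            span_beg[node] = min(span_beg[node], i)
            span_end[node] = max(span_end[node], i)
            node = heads[node]
    return span_beg, span_end


# EDU 2 is attached to the fake root; EDUs 1 and 3 are attached to EDU 2.
span_beg, span_end = dominated_spans([None, 2, 0, 2])
```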
-
-
exception
educe.rst_dt.deptree.
RstDtException
(msg)¶ Bases:
exceptions.Exception
Exceptions related to conversion between RST and DT trees. The general expectation is that we only raise these on bad input, but in practice, you may see them more in cases of implementation error somewhere in the conversion process.
-
educe.rst_dt.deptree.
binary_to_nary
(nary_enc, pairs)¶ Retrieve nary relations from a set of binary relations.
Parameters: - nary_enc (one of {"chain", "tree"}) – Encoding from n-ary to binary relations.
- pairs (iterable of pairs of identifier (ex: integer, string...)) – Binary relations.
Returns: nary_rels – Nary relations.
Return type: list of tuples of identifiers
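One possible way to recover n-ary groups from binary pairs is sketched below. It is a simplified illustration under stated assumptions, not the library's algorithm: the helper name group_binary_relations is hypothetical, pairs are assumed to be sortable, and each pair is assumed to be oriented from left to right.

```python
def group_binary_relations(nary_enc, pairs):
    """Merge binary pairs into n-ary tuples, given the encoding that produced them."""
    if nary_enc not in ("chain", "tree"):
        raise ValueError("nary_enc must be 'chain' or 'tree'")
    groups = []
    for src, tgt in sorted(pairs):
        for group in groups:
            # 'tree': every pair starts at the first member of the group;
            # 'chain': every pair starts at the last member added so far.
            anchor = group[0] if nary_enc == "tree" else group[-1]
            if anchor == src:
                group.append(tgt)
                break
        else:
            # No existing group continues with this pair: start a new one.
            groups.append([src, tgt])
    return [tuple(group) for group in groups]


from_chain = group_binary_relations("chain", [(1, 2), (2, 3), (5, 6)])
from_tree = group_binary_relations("tree", [(1, 2), (1, 3), (5, 6)])
```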
educe.rst_dt.document_plus module¶
This submodule implements a document with additional information.
-
class
educe.rst_dt.document_plus.
DocumentPlus
(key, grouping, rst_context)¶ Bases:
object
A document and relevant contextual information
-
align_with_doc_structure
()¶ Align EDUs with the document structure (paragraph and sentence).
Determine which paragraph and sentence (if any) surrounds this EDU. Try to accommodate the occasional off-by-a-smidgen error by folks marking these EDU boundaries, e.g. original text:
Para1: “Magazines are not providing us in-depth information on circulation,” said Edgar Bronfman Jr., .. “How do readers feel about the magazine?... Research doesn’t tell us whether people actually do read the magazines they subscribe to.”
Para2: Reuben Mark, chief executive of Colgate-Palmolive, said...
Marked up EDU is wide to the left by three characters: “
Reuben Mark, chief executive of Colgate-Palmolive, said...
-
align_with_raw_words
()¶ Compute for each EDU the raw tokens it contains
This is a dirty temporary hack to enable backwards compatibility. There should be one clean text per document, one tokenization and so on, but, well.
-
align_with_tokens
(verbose=False)¶ Compute for each EDU the overlapping tokens.
-
align_with_trees
(strict=False)¶ Compute for each EDU the overlapping trees
-
all_edu_pairs
(ordered=True)¶ Generate all EDU pairs of a document.
Parameters: ordered (boolean, defaults to True) – If True, generate all ordered pairs of EDUs, otherwise (half as many) unordered pairs.
Returns: all_pairs – All pairs of EDUs in this document.
Return type: [(EDU, EDU)]
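The difference between ordered and unordered pairs can be sketched with itertools on plain strings standing in for EDUs. The helper name all_pairs is hypothetical, and the order in which the real method yields pairs may differ.

```python
import itertools


def all_pairs(edus, ordered=True):
    """Return all pairs of distinct items, ordered or unordered."""
    if ordered:
        # (a, b) and (b, a) are both generated.
        return list(itertools.permutations(edus, 2))
    # Only one of (a, b) and (b, a) is generated.
    return list(itertools.combinations(edus, 2))


ordered_pairs = all_pairs(["e1", "e2", "e3"], ordered=True)
unordered_pairs = all_pairs(["e1", "e2", "e3"], ordered=False)
```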
-
relations
(du_pairs, lbl_type='rel', ordered=True)¶ Get the relation that holds in each of the DU pairs.
As of 2016-09-30, this function has a unique caller: doc_vectorizer.DocumentLabelExtractor._extract_labels() .
Parameters: - du_pairs ([(DU, DU)]) – List of DU pairs.
- lbl_type (one of {'rel', 'rel+nuc'}) – Type of label.
- ordered (boolean, defaults to True) – If True, du_pairs are considered ordered, otherwise the label of either (edu1, edu2) or (edu2, edu1) is returned (if not None).
Returns: erels – Relation for each pair of DUs.
Return type: list of str
-
same_unit_candidates
()¶ Generate all EDU pairs that could be a same-unit.
We use the following filters:
- right-attachment: i < j,
- same sentence: edu2sent[i] == edu2sent[j],
- len > 1: i + 1 < j
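The three filters can be sketched on EDU indices alone. The helper name same_unit_candidate_indices is hypothetical, and the sketch returns index pairs rather than EDU objects.

```python
def same_unit_candidate_indices(edu2sent):
    """Return the (i, j) index pairs that pass the three filters.

    edu2sent[i] is the index of the sentence containing EDU i.
    """
    nb_edus = len(edu2sent)
    return [
        (i, j)
        for i in range(nb_edus)
        # j starts at i + 2, which enforces both i < j and i + 1 < j.
        for j in range(i + 2, nb_edus)
        if edu2sent[i] == edu2sent[j]
    ]


# Six EDUs: the first three in sentence 0, the last three in sentence 1.
candidates = same_unit_candidate_indices([0, 0, 0, 1, 1, 1])
```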
-
set_syn_ctrees
(tkd_trees, lex_heads=None)¶ Set syntactic constituency trees for this document.
Parameters: - tkd_trees (list of nltk.tree.Tree) – Syntactic constituency trees for this document.
- lex_heads (list of (TODO: see find_lexical_heads), optional) – List of lexical heads for each node of each tree.
-
set_tokens
(tokens)¶ Set tokens for this document.
Parameters: tokens (list of Token) – List of tokens for this document.
-
-
educe.rst_dt.document_plus.
align_edus_with_paragraphs
(doc_edus, doc_paras, text, strict=False)¶ Align EDUs with paragraphs, if any.
Parameters: - doc_edus –
- doc_paras –
- strict –
Returns: edu2para – Map each EDU to the index of its enclosing paragraph. If an EDU is not properly enclosed in a paragraph, the associated index is None. For files with no paragraph marking (e.g. fileX files), returns None.
Return type: list(int or None) or None
-
educe.rst_dt.document_plus.
containing
(span)¶ span -> anno -> bool
if this annotation encloses the given span
educe.rst_dt.graph module¶
Converter from RST Discourse Treebank trees to educe-style hypergraphs
-
class
educe.rst_dt.graph.
DotGraph
(anno_graph)¶ Bases:
educe.graph.DotGraph
A dot representation of this graph for visualisation. The to_string() method is most likely to be of interest here
-
class
educe.rst_dt.graph.
Graph
¶ Bases:
educe.graph.Graph
-
classmethod
from_doc
(corpus, doc_key)¶
educe.rst_dt.parse module¶
From RST discourse treebank trees to Educe-style objects (reading the format from Di Eugenio’s corpus of instructional texts).
The main classes of interest are RSTTree and EDU. RSTTree can be treated as an NLTK Tree structure. It is also an educe Standoff object, which means that it points to other RST trees (their children) or to EDU.
-
educe.rst_dt.parse.
parse_lightweight_tree
(tstr)¶ Parse lightweight RST debug syntax into SimpleRSTTree, e.g.
(R:attribution (N:elaboration (N foo) (S bar) (S quux)))
This is mostly useful for debugging or for knocking out quick examples.
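To make the bracketed syntax concrete, here is a small standalone reader that turns such a string into nested (label, children) tuples. It is only a sketch of the notation: the helper name read_bracketed is hypothetical, and the real function builds SimpleRSTTree objects rather than tuples.

```python
import re


def read_bracketed(tstr):
    """Read '(label kid kid ...)' syntax into nested (label, kids) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", tstr)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            # Open a new node; its label and kids accumulate here.
            stack.append([])
        elif tok == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            if not done:
                raise ValueError("empty node")
            stack[-1].append((done[0], done[1:]))
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one balanced tree")
    return stack[0][0]


toy = read_bracketed("(R:attribution (N:elaboration (N foo) (S bar) (S quux)))")
```

In the result, the first character of each label (R, N or S) stands for the nuclearity, and the optional part after the colon is the relation.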
-
educe.rst_dt.parse.
parse_rst_dt_tree
(tstr, context=None)¶ Read a single RST tree from its RST DT string representation. If context is set, align the tree with it. You should really try to pass in a context if you can (see RSTContext); the None case is really intended for testing, or for cases where you don’t have an original text.
-
educe.rst_dt.parse.
read_annotation_file
(anno_filename, text_filename)¶ Read a single RST tree
educe.rst_dt.ptb module¶
Alignment of the RST-WSJ-corpus with the Penn Treebank
-
class
educe.rst_dt.ptb.
PtbParser
(corpus_dir)¶ Bases:
object
Gold parser that gets annotations from the PTB.
It uses an instantiated NLTK BracketedParseCorpusReader for the PTB section relevant to the RST DT corpus.
Note that the path you give to this will probably end with something like parsed/mrg/wsj
-
parse
(doc)¶ Parse a document, using the gold PTB annotation.
Given a document, return a list of educified PTB parse trees (one per sentence).
These are almost the same as the trees that would be returned by the parsed_sents method, except that each leaf/node is associated with a span within the RST DT text.
Note: does nothing if there is no associated PTB corpus entry.
Parameters: doc (DocumentPlus) – Rich representation of the document.
Returns: doc – Rich representation of the document, with syntactic constituency trees.
Return type: DocumentPlus
-
tokenize
(doc)¶ Tokenize the document text using the PTB gold annotation.
Parameters: doc (DocumentPlus) – Rich representation of the document.
Returns: doc – Rich representation of the document, with tokenization.
Return type: DocumentPlus
-
-
educe.rst_dt.ptb.
align_edus_with_sentences
(edus, syn_trees, strict=False)¶ Map each EDU to its sentence.
If an EDU span overlaps with more than one sentence span, the sentence with maximal overlap is chosen.
Parameters: - edus (list(EDU)) – List of EDUs.
- syn_trees (list(Tree)) – List of syntactic trees, one per sentence.
- strict (boolean, default False) – If True, raise an error if an EDU does not map to exactly one sentence.
Returns: edu2sent – Map from EDU to (0-based) sentence index or None.
Return type: list(int or None)
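The maximal-overlap rule can be sketched on plain character spans. The helper name align_spans_to_sentences and the (start, end) tuple encoding are assumptions for this example; the real function takes EDU objects and syntactic trees, and its tie-breaking behaviour is not specified here.

```python
def align_spans_to_sentences(edu_spans, sent_spans):
    """Map each EDU span to the index of the sentence it overlaps most, or None."""
    edu2sent = []
    for e_beg, e_end in edu_spans:
        best_idx, best_overlap = None, 0
        for idx, (s_beg, s_end) in enumerate(sent_spans):
            # Length of the intersection; zero or negative means no overlap.
            overlap = min(e_end, s_end) - max(e_beg, s_beg)
            if overlap > best_overlap:
                best_idx, best_overlap = idx, overlap
        edu2sent.append(best_idx)
    return edu2sent


sentences = [(0, 20), (21, 50)]
edus = [(0, 10), (11, 20), (18, 40), (60, 70)]
edu2sent = align_spans_to_sentences(edus, sentences)
```

The third EDU straddles both sentences and is assigned to the second one, where most of its characters lie; the fourth overlaps no sentence and maps to None.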
educe.rst_dt.rst_wsj_corpus module¶
This module provides loaders for file formats found in the RST-WSJ-corpus.
-
educe.rst_dt.rst_wsj_corpus.
load_rst_wsj_corpus_edus_file
(f)¶ Load a file that contains the EDUs of a document.
Return clean text and the list of EDU offsets on the clean text.
-
educe.rst_dt.rst_wsj_corpus.
load_rst_wsj_corpus_text_file
(f)¶ Load a text file from the RST-WSJ-CORPUS.
Return the text plus its sentences and paragraphs.
The corpus contains two types of text files, so this function is mainly an entry point that delegates to the appropriate function.
-
educe.rst_dt.rst_wsj_corpus.
load_rst_wsj_corpus_text_file_file
(f)¶ Load a text file whose name is of the form file##
These files do not mark paragraphs. Each line contains a sentence preceded by two or three leading spaces.
-
educe.rst_dt.rst_wsj_corpus.
load_rst_wsj_corpus_text_file_wsj
(f)¶ Load a text file whose name is of the form wsj_##
By convention:
- paragraphs are separated by double newlines
- sentences by single newlines
Note that this segmentation isn’t particularly reliable, and seems to both over- (e.g. cut at some abbreviations, like “Prof.”) and under-segment (e.g. not separate contiguous sentences). It shouldn’t be taken too seriously, but if you need some sort of rough approximation, it may be helpful.
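The two conventions can be sketched as a function that returns character offsets. This is an illustration only: the helper name split_paragraphs_and_sentences is hypothetical, and the real loader returns annotation objects and may clean the text in ways not shown here.

```python
def split_paragraphs_and_sentences(text):
    """Return, for each paragraph, the list of (start, end) offsets of its sentences."""
    paragraphs = []
    offset = 0
    for para in text.split("\n\n"):
        sentences = []
        sent_offset = offset
        for sent in para.split("\n"):
            if sent:
                sentences.append((sent_offset, sent_offset + len(sent)))
            # Skip the sentence and the single newline that follows it.
            sent_offset += len(sent) + 1
        if sentences:
            paragraphs.append(sentences)
        # Skip the paragraph and the double newline that follows it.
        offset += len(para) + 2
    return paragraphs


sample = "A b.\nC d.\n\nE f."
layout = split_paragraphs_and_sentences(sample)
```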
educe.rst_dt.sdrt module¶
Convert RST trees to SDRT style EDU/CDU annotations.
The core of the conversion is rst_to_sdrt which produces an intermediary pointer based representation (a single CDU pointing to other CDUs and EDUs).
A fancier variant, rst_to_glozz_sdrt wraps around this core and further converts the CDU into a Glozz-friendly form
-
class
educe.rst_dt.sdrt.
CDU
(members, rel_insts)¶ Complex Discourse Unit.
A CDU contains one or more discourse units, and tracks relation instances between its members. Both CDU and EDU are discourse units.
-
members
¶ list of Unit or Scheme – Immediate member units (EDUs and CDUs) of this CDU.
-
rel_insts
¶ list of Relation – Relation instances between immediate members of this CDU.
-
-
class
educe.rst_dt.sdrt.
RelInst
(source, target, type)¶ Relation instance.
educe.annotation calls these ‘Relation’s, which is really more in keeping with how Glozz classes them, but properly speaking relation instance is a better name.
-
source
¶ Unit? – Source of the relation instance.
-
target
¶ Unit? – Target of the relation instance.
-
type
¶ string – Name of the relation.
-
-
educe.rst_dt.sdrt.
debug_du_to_tree
(m)¶ Tree representation of CDU.
The set of relation instances is treated as the parent of each node. Loses information; should only be used for debugging purposes.
-
educe.rst_dt.sdrt.
rst_to_glozz_sdrt
(rst_tree, annotator='ldc')¶ From an RST tree to a STAC-like version using Glozz annotations. Uses rst_to_sdrt
-
educe.rst_dt.sdrt.
rst_to_sdrt
(tree)¶ From RSTTree to CDU or EDU (recursive, top-down transformation). We recognise three patterns walking down the tree (anything else is considered to be an error):
- Pre-terminal nodes: Return the leaf EDU
- Mono-nuclear, N satellites: Return a CDU with a relation instance from the nucleus to each satellite. As an informal example, given X(attribution:S1, N, explanation-argumentative:S2), we return a CDU with sdrt(N) – attribution –> sdrt(S1) and sdrt(N) – explanation-argumentative –> sdrt(S2)
- Multi-nuclear, 0 satellites: Return a CDU with a relation instance across each successive nucleus (assume the same relation). As an informal example, given X(List:N1, List:N2, List:N3), we return a CDU containing sdrt(N1) –List–> sdrt(N2) – List –> sdrt(N3).
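The mono-nuclear and multi-nuclear patterns can be sketched for a single level of the tree. The helper name relation_instances and the (label, nuclearity, name) triples are assumptions for this example; the real function recurses over RSTTree objects and returns CDU and EDU objects.

```python
def relation_instances(children):
    """Return (source, relation, target) triples for the children of one node.

    children is a list of (label, nuclearity, name) triples.
    """
    nuclei = [kid for kid in children if kid[1] == "Nucleus"]
    satellites = [kid for kid in children if kid[1] == "Satellite"]
    if len(nuclei) == 1 and satellites:
        # Mono-nuclear: one instance from the nucleus to each satellite,
        # labelled with the relation carried by the satellite.
        nucleus = nuclei[0]
        return [(nucleus[2], sat[0], sat[2]) for sat in satellites]
    if len(nuclei) > 1 and not satellites:
        # Multi-nuclear: one instance between each pair of successive nuclei.
        return [(left[2], right[0], right[2])
                for left, right in zip(nuclei, nuclei[1:])]
    raise ValueError("unexpected nuclearity pattern")


mono = relation_instances([
    ("attribution", "Satellite", "S1"),
    ("span", "Nucleus", "N"),
    ("explanation-argumentative", "Satellite", "S2"),
])
multi = relation_instances([
    ("List", "Nucleus", "N1"),
    ("List", "Nucleus", "N2"),
    ("List", "Nucleus", "N3"),
])
```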
educe.rst_dt.text module¶
Educe-style annotations for RST discourse treebank text objects (paragraphs and sentences)
-
class
educe.rst_dt.text.
Paragraph
(num, sentences)¶ Bases:
educe.annotation.Standoff
A paragraph is a sequence of `Sentence`s (also standoff annotations).
-
classmethod
left_padding
(sentences)¶ Return a left padding Paragraph
-
num
= None¶ paragraph ID in document
-
sentences
= None¶ sentence-level annotations
-
class
educe.rst_dt.text.
Sentence
(num, span)¶ Bases:
educe.annotation.Standoff
Just a text span really
-
classmethod
left_padding
()¶ Return a left padding Sentence
-
num
= None¶ sentence ID in document
-
text_span
()¶
-
educe.rst_dt.text.
clean_edu_text
(text)¶ Strip metadata from EDU text and compress extraneous whitespace