educe.pdtb package¶
Conventions specific to the Penn Discourse Treebank (PDTB) project
Subpackages¶
Submodules¶
educe.pdtb.corpus module¶
PDTB Corpus management (re-exported by educe.pdtb)
-
class
educe.pdtb.corpus.
Reader
(corpusdir)¶ Bases:
educe.corpus.Reader
See educe.corpus.Reader for details
-
files
(doc_glob=None)¶ Parameters: doc_glob (str, optional) – Glob expression for document (folder) names; if None, the wildcard '*/*' is used for folder names and file basenames.
-
slurp_subcorpus
(cfiles, verbose=False)¶ See educe.rst_dt.parse for a description of RSTTree
-
-
educe.pdtb.corpus.
id_to_path
(k)¶ Given a fleshed out FileId (none of the fields are None), return a filepath for it following Penn Discourse Treebank conventions.
You will likely want to add your own filename extension to this path.
-
educe.pdtb.corpus.
mk_key
(doc)¶ Return a corpus key for a given document name.
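To make the glob behaviour of Reader.files more concrete, here is a minimal sketch using only the standard library. It assumes that the default wildcard is '*/*' and that the corpus follows the usual PDTB layout of one folder per section holding files such as wsj_0003.pdtb. The helper find_pdtb_files and the temporary demo corpus are hypothetical and are not part of educe.

```python
import glob
import os
import tempfile


def find_pdtb_files(corpusdir, doc_glob=None):
    """Map document names to .pdtb paths (hypothetical helper, not educe's code).

    Assumption: when doc_glob is None, the wildcard '*/*' is used, matching
    every folder name and every file basename.
    """
    if doc_glob is None:
        doc_glob = '*/*'
    pattern = os.path.join(corpusdir, doc_glob + '.pdtb')
    found = {}
    for path in sorted(glob.glob(pattern)):
        doc = os.path.splitext(os.path.basename(path))[0]
        found[doc] = path
    return found


# Build a tiny fake corpus so the sketch runs without the real PDTB.
demo_dir = tempfile.mkdtemp()
for section, doc in [('00', 'wsj_0003'), ('00', 'wsj_0004'), ('23', 'wsj_2315')]:
    os.makedirs(os.path.join(demo_dir, section), exist_ok=True)
    open(os.path.join(demo_dir, section, doc + '.pdtb'), 'w').close()

all_docs = find_pdtb_files(demo_dir)                      # every section
section_00 = find_pdtb_files(demo_dir, doc_glob='00/*')   # one section only
print(sorted(all_docs))
print(sorted(section_00))
```

With educe installed, the equivalent step would presumably be to build a Reader on the corpus directory and call its files method with the same kind of glob expression.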
educe.pdtb.parse module¶
Standalone parser for PDTB files.
The function parse takes a single .pdtb file and returns a list of Relation objects, each belonging to one of the following subtypes:
Relation | selection | features | sup? |
---|---|---|---|
ExplicitRelation | Selection | attr, 1 connhead | Y |
ImplicitRelation | InferenceSite | attr, 2 conn | Y |
AltLexRelation | Selection | attr, 2 semclass | Y |
EntityRelation | InferenceSite | none | N |
NoRelation | InferenceSite | none | N |
These relation subtypes are stitched together (and inherit members) from two or three components:
- arguments: always arg1 and arg2, although in some cases the arguments carry supplementary information
- selection: see either Selection or InferenceSite
- some features (see e.g. ExplicitRelationFeatures)
The simplest way to get to grips with this may be to try the parse function on some sample relations and print the resulting objects.
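The stitching described above can be pictured with multiple inheritance. The following is a simplified sketch with stand-in classes (all prefixed Toy, all hypothetical). It only mirrors the constructor signatures listed on this page and should not be read as educe's actual implementation.

```python
class ToySelection:
    """Stand-in for Selection(span, gorn, text)."""
    def __init__(self, span, gorn, text):
        self.span = span
        self.gorn = gorn
        self.text = text


class ToyExplicitFeatures:
    """Stand-in for ExplicitRelationFeatures(attribution, connhead)."""
    def __init__(self, attribution, connhead):
        self.attribution = attribution
        self.connhead = connhead


class ToyRelation:
    """Stand-in for Relation(args): always exactly arg1 and arg2."""
    def __init__(self, args):
        self.arg1, self.arg2 = args


class ToyExplicitRelation(ToySelection, ToyExplicitFeatures, ToyRelation):
    """Stitch the three components into one object."""
    def __init__(self, selection, features, args):
        # Copy the members of each component onto the combined object.
        ToySelection.__init__(self, selection.span, selection.gorn,
                              selection.text)
        ToyExplicitFeatures.__init__(self, features.attribution,
                                     features.connhead)
        ToyRelation.__init__(self, args)


rel = ToyExplicitRelation(
    ToySelection(span=[(9, 12)], gorn=[(0, 1)], text='but'),
    ToyExplicitFeatures(attribution=None, connhead='but'),
    args=('first argument', 'second argument'),
)
# Members inherited from all three components are available directly.
print(rel.text, rel.connhead, rel.arg1, rel.arg2)
```

The point of this design is that a relation can be handled as a selection, as a feature bundle, or as a pair of arguments, depending on what the calling code needs.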
-
class
educe.pdtb.parse.
AltLexRelation
(selection, features, args)¶ Bases:
educe.pdtb.parse.Selection
,educe.pdtb.parse.AltLexRelationFeatures
,educe.pdtb.parse.Relation
-
class
educe.pdtb.parse.
AltLexRelationFeatures
(attribution, semclass1, semclass2)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
Arg
(selection, attribution=None, sup=None)¶ Bases:
educe.pdtb.parse.Selection
-
class
educe.pdtb.parse.
Attribution
(source, type, polarity, determinacy, selection=None)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
Connective
(text, semclass1, semclass2=None)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
EntityRelation
(infsite, args)¶ Bases:
educe.pdtb.parse.InferenceSite
,educe.pdtb.parse.Relation
-
class
educe.pdtb.parse.
ExplicitRelation
(selection, features, args)¶ Bases:
educe.pdtb.parse.Selection
,educe.pdtb.parse.ExplicitRelationFeatures
,educe.pdtb.parse.Relation
-
class
educe.pdtb.parse.
ExplicitRelationFeatures
(attribution, connhead)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
GornAddress
(parts)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
ImplicitRelation
(infsite, features, args)¶ Bases:
educe.pdtb.parse.InferenceSite
,educe.pdtb.parse.ImplicitRelationFeatures
,educe.pdtb.parse.Relation
-
class
educe.pdtb.parse.
ImplicitRelationFeatures
(attribution, connective1, connective2=None)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
InferenceSite
(strpos, sentnum)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
NoRelation
(infsite, args)¶ Bases:
educe.pdtb.parse.InferenceSite
,educe.pdtb.parse.Relation
-
class
educe.pdtb.parse.
PdtbItem
¶ Bases:
object
-
class
educe.pdtb.parse.
Relation
(args)¶ Bases:
educe.pdtb.parse.PdtbItem
-
arg1
¶ TODO – TODO
-
arg2
¶ TODO – TODO
-
-
class
educe.pdtb.parse.
Selection
(span, gorn, text)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
SemClass
(klass)¶ Bases:
educe.pdtb.parse.PdtbItem
-
class
educe.pdtb.parse.
Sup
(selection)¶ Bases:
educe.pdtb.parse.Selection
-
educe.pdtb.parse.
parse
(path)¶ Retrieve the list of relations found in a single .pdtb file.
Parameters: path (str) – Path to the .pdtb file (?)
Returns: relations – List of relations found.
Return type: list of Relation
-
educe.pdtb.parse.
parse_relation
(s)¶ Parse a single relation or raise a ParseException.
-
educe.pdtb.parse.
split_relations
(s)¶
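Since parse returns a plain list of Relation objects, one quick way to inspect a file is to count the relations by subtype. A minimal sketch follows. The helper count_relation_types is hypothetical and works on any list of objects; the two stand-in classes exist only so that the example runs without educe or the corpus.

```python
from collections import Counter


def count_relation_types(relations):
    """Count objects by class name (hypothetical helper, not part of educe)."""
    return Counter(type(rel).__name__ for rel in relations)


# Stand-ins so the sketch runs on its own; with educe installed one would
# instead pass in the list returned by educe.pdtb.parse.parse(path).
class ExplicitRelation:
    pass


class EntityRelation:
    pass


sample = [ExplicitRelation(), EntityRelation(), ExplicitRelation()]
counts = count_relation_types(sample)
for name, total in sorted(counts.items()):
    print(name, total)
```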
educe.pdtb.pdtbx module¶
PDTB in an ad hoc (educe-grown) XML format: unfortunately not a standard, just a small homegrown language using XML syntax. I’ll call it pdtbx. There is no reason it can’t be used outside of educe.
Informal DTD:
- SpanList is attribute spanList in PDTB string convention
- GornAddressList is attribute gornList in PDTB string convention
- SemClass is attribute semclass1 (and optional attribute semclass2) in PDTB string convention
- text in <text> elements with usual XML escaping conventions
- args in <arg> elements in order (arg1 before arg2)
- implicitRelations can have multiple connectives
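The conventions in this informal DTD can be illustrated with a short sketch based on xml.etree.ElementTree. Only the attribute names spanList, gornList, semclass1 and the elements <text> and <arg> come from the list above. The element names implicitRelation and connective, the text attribute on connectives, and the exact string formats ('9..35;40..52' for spans, '0,1;0,2,1' for Gorn addresses) are assumptions made for illustration, so the output is not guaranteed to match what educe writes.

```python
import xml.etree.ElementTree as ET


def span_list_str(spans):
    """Render [(9, 35), (40, 52)] as '9..35;40..52' (assumed convention)."""
    return ';'.join('%d..%d' % (start, end) for start, end in spans)


def gorn_list_str(addresses):
    """Render [(0, 1), (0, 2, 1)] as '0,1;0,2,1' (assumed convention)."""
    return ';'.join(','.join(str(part) for part in addr)
                    for addr in addresses)


def selection_xml(tag, spans, gorn, text):
    """Build an element carrying spanList, gornList and a <text> child."""
    elem = ET.Element(tag, spanList=span_list_str(spans),
                      gornList=gorn_list_str(gorn))
    ET.SubElement(elem, 'text').text = text
    return elem


# 'implicitRelation' and 'connective' are guessed element names.
rel = ET.Element('implicitRelation')

# An implicit relation may carry more than one connective.
for conn_text, semclass in [('because', 'Contingency.Cause.Reason'),
                            ('in particular', 'Expansion.Restatement')]:
    ET.SubElement(rel, 'connective', text=conn_text, semclass1=semclass)

# Arguments are appended in order: arg1 first, then arg2.
rel.append(selection_xml('arg', [(0, 17)], [(0, 0)], 'R&D spending rose'))
rel.append(selection_xml('arg', [(19, 40), (45, 60)], [(1, 0), (1, 2, 1)],
                         'profits were strong'))

xml_str = ET.tostring(rel, encoding='unicode')
print(xml_str)
```

Note that ElementTree applies the usual XML escaping when serialising, so the ampersand in the first argument's text is written out as an entity.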
-
educe.pdtb.pdtbx.
Relation_xml
(itm)¶
-
educe.pdtb.pdtbx.
Relations_xml
(itms)¶
-
educe.pdtb.pdtbx.
read_Relation
(node)¶
-
educe.pdtb.pdtbx.
read_Relations
(node)¶
-
educe.pdtb.pdtbx.
read_pdtbx_file
(filename)¶
-
educe.pdtb.pdtbx.
write_pdtbx_file
(filename, relations)¶
educe.pdtb.ptb module¶
Alignment with the Penn Treebank
-
educe.pdtb.ptb.
parse_trees
(corpus, k, ptb)¶ Given a PDTB document and an NLTK PTB reader, return the PTB trees.
Note that a future version of this function will try to educify the trees as well, but for now things will be fairly rudimentary.
-
educe.pdtb.ptb.
reader
(corpus_dir)¶ An instantiated NLTK BracketedParseCorpusReader for the PTB section relevant to the PDTB corpus.
Note that the path you give to this will probably end with something like parsed/mrg/wsj
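A Gorn address identifies a node in a tree by the sequence of child indices leading to it from the root. As a rough illustration of how the parts of a GornAddress could be lined up with PTB trees, here is a sketch that uses nested lists in place of NLTK trees. It assumes that the first part of the address selects the sentence and the remaining parts walk down that sentence's tree. The functions resolve_gorn and leaves and the toy tree are hypothetical.

```python
def resolve_gorn(trees, parts):
    """Follow a Gorn address through a list of per-sentence trees.

    trees: list of nested lists, one per sentence; each node is
           [label, child, child, ...] and words are plain strings.
    parts: sequence of ints; parts[0] picks the sentence (an assumption),
           the rest are 0-based child indices.
    """
    node = trees[parts[0]]
    for index in parts[1:]:
        children = node[1:]  # skip the label stored in position 0
        node = children[index]
    return node


def leaves(node):
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


toy_trees = [
    ['S',
     ['NP', ['DT', 'The'], ['NN', 'market']],
     ['VP', ['VBD', 'fell']]],
]

np_node = resolve_gorn(toy_trees, (0, 0))       # sentence 0, first child
verb_node = resolve_gorn(toy_trees, (0, 1, 0))  # sentence 0, VP, first child
print(leaves(np_node))
print(leaves(verb_node))
```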