nltk.sem package¶
Submodules¶
nltk.sem.boxer module¶
An interface to Boxer.
This interface relies on the latest development (subversion) version of C&C and Boxer.
- Usage:
Set the environment variable CANDCHOME to the bin directory of your CandC installation. The models directory should be in the CandC root directory. For example:
- /path/to/candc/
- bin/
- candc boxer
- models/
- boxer/
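As a minimal sketch of the setup described above, the variable can also be set from Python before a Boxer instance is constructed. The root /path/to/candc is the placeholder from the layout above, not a real location, and the commented-out call assumes a working C&C and Boxer installation.

```python
import os

# Placeholder root taken from the layout above; replace with a real install.
candc_root = "/path/to/candc"

# CANDCHOME must point at the bin directory, not at the root.
os.environ["CANDCHOME"] = candc_root + "/bin"

# With C&C and Boxer installed, the interface could then be used like this:
# from nltk.sem.boxer import Boxer
# drs = Boxer().interpret("John sees Mary")
print(os.environ["CANDCHOME"])
```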
- class nltk.sem.boxer.Boxer(boxer_drs_interpreter=None, elimeq=False, bin_dir=None, verbose=False)[source]¶
Bases: builtins.object
This class is an interface to Johan Bos’s program Boxer, a wide-coverage semantic parser that produces Discourse Representation Structures (DRSs).
- batch_interpret(inputs, discourse_ids=None, question=False, verbose=False)[source]¶
Use Boxer to give a first order representation.
Parameters: - inputs – list of str Input sentences to parse as individual discourses
- occur_index – bool Should predicates be occurrence indexed?
- discourse_ids – list of str Identifiers to be inserted to each occurrence-indexed predicate.
Returns: list of drt.AbstractDrs
- batch_interpret_multisentence(inputs, discourse_ids=None, question=False, verbose=False)[source]¶
Use Boxer to give a first order representation.
Parameters: - inputs – list of list of str Input discourses to parse
- occur_index – bool Should predicates be occurrence indexed?
- discourse_ids – list of str Identifiers to be inserted to each occurrence-indexed predicate.
Returns: drt.AbstractDrs
- interpret(input, discourse_id=None, question=False, verbose=False)[source]¶
Use Boxer to give a first order representation.
Parameters: - input – str Input sentence to parse
- occur_index – bool Should predicates be occurrence indexed?
- discourse_id – str An identifier to be inserted to each occurrence-indexed predicate.
Returns: drt.AbstractDrs
- interpret_multisentence(input, discourse_id=None, question=False, verbose=False)[source]¶
Use Boxer to give a first order representation.
Parameters: - input – list of str Input sentences to parse as a single discourse
- occur_index – bool Should predicates be occurrence indexed?
- discourse_id – str An identifier to be inserted to each occurrence-indexed predicate.
Returns: drt.AbstractDrs
- class nltk.sem.boxer.BoxerCard(discourse_id, sent_index, word_indices, var, value, type)[source]¶
Bases: nltk.sem.boxer.BoxerIndexed
- class nltk.sem.boxer.BoxerDrs(label, refs, conds, consequent=None)[source]¶
Bases: nltk.sem.boxer.AbstractBoxerDrs
- unicode_repr()¶
- class nltk.sem.boxer.BoxerDrsParser(discourse_id=None)[source]¶
Bases: nltk.sem.drt.DrtParser
Reparse the str form of subclasses of AbstractBoxerDrs
- class nltk.sem.boxer.BoxerEq(discourse_id, sent_index, word_indices, var1, var2)[source]¶
Bases: nltk.sem.boxer.BoxerIndexed
- class nltk.sem.boxer.BoxerIndexed(discourse_id, sent_index, word_indices)[source]¶
Bases: nltk.sem.boxer.AbstractBoxerDrs
- unicode_repr()¶
- class nltk.sem.boxer.BoxerNamed(discourse_id, sent_index, word_indices, var, name, type, sense)[source]¶
Bases: nltk.sem.boxer.BoxerIndexed
- class nltk.sem.boxer.BoxerNot(drs)[source]¶
Bases: nltk.sem.boxer.AbstractBoxerDrs
- unicode_repr()¶
- class nltk.sem.boxer.BoxerOr(discourse_id, sent_index, word_indices, drs1, drs2)[source]¶
Bases: nltk.sem.boxer.BoxerIndexed
- class nltk.sem.boxer.BoxerOutputDrsParser(discourse_id=None)[source]¶
Bases: nltk.sem.drt.DrtParser
- class nltk.sem.boxer.BoxerPred(discourse_id, sent_index, word_indices, var, name, pos, sense)[source]¶
Bases: nltk.sem.boxer.BoxerIndexed
- class nltk.sem.boxer.BoxerProp(discourse_id, sent_index, word_indices, var, drs)[source]¶
Bases: nltk.sem.boxer.BoxerIndexed
- class nltk.sem.boxer.BoxerRel(discourse_id, sent_index, word_indices, var1, var2, rel, sense)[source]¶
Bases: nltk.sem.boxer.BoxerIndexed
- class nltk.sem.boxer.BoxerWhq(discourse_id, sent_index, word_indices, ans_types, drs1, variable, drs2)[source]¶
Bases: nltk.sem.boxer.BoxerIndexed
nltk.sem.chat80 module¶
Overview¶
Chat-80 was a natural language system which allowed the user to interrogate a Prolog knowledge base in the domain of world geography. It was developed in the early ‘80s by Warren and Pereira; see http://www.aclweb.org/anthology/J82-3002.pdf for a description and http://www.cis.upenn.edu/~pereira/oldies.html for the source files.
This module contains functions to extract data from the Chat-80 relation files (‘the world database’), and convert them into a format that can be incorporated in the FOL models of nltk.sem.evaluate. The code assumes that the Prolog input files are available in the NLTK corpora directory.
The Chat-80 World Database consists of the following files:
world0.pl
rivers.pl
cities.pl
countries.pl
contain.pl
borders.pl
This module uses a slightly modified version of world0.pl, in which a set of Prolog rules has been omitted. The modified file is named world1.pl. Currently, the file rivers.pl is not read in, since it uses a list rather than a string in the second field.
Reading Chat-80 Files¶
Chat-80 relations are like tables in a relational database. The relation acts as the name of the table; the first argument acts as the ‘primary key’; and subsequent arguments are further fields in the table. In general, the name of the table provides a label for a unary predicate whose extension is all the primary keys. For example, relations in cities.pl are of the following form:
'city(athens,greece,1368).'
Here, 'athens' is the key, and will be mapped to a member of the unary predicate city.
The fields in the table are mapped to binary predicates. The first argument of the predicate is the primary key, while the second argument is the data in the relevant field. Thus, in the above example, the third field is mapped to the binary predicate population_of, whose extension is a set of pairs such as '(athens, 1368)'.
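The mapping just described can be sketched in plain Python. This is only an illustration of the idea, not the module's own code: parse_clause is a hypothetical helper, and the name country_of is an assumption made by analogy with population_of.

```python
import re

def parse_clause(clause):
    """Split a simple Prolog fact into its relation name and list of fields."""
    match = re.match(r"\s*(\w+)\((.*)\)\.\s*$", clause)
    if match is None:
        raise ValueError("not a simple Prolog fact: %r" % clause)
    relation, args = match.groups()
    return relation, [field.strip() for field in args.split(",")]

relation, record = parse_clause("city(athens,greece,1368).")

# Unary predicate: the primary key is a member of 'city'.
city = {record[0]}

# Binary predicates: (primary key, field value) pairs.
country_of = {(record[0], record[1])}      # assumed label
population_of = {(record[0], record[2])}

print(relation, city, population_of)
```

Note that the fields stay strings here; whether the population is converted to a number is left open in this sketch.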
An exception to this general framework is required by the relations in the files borders.pl and contain.pl. These contain facts of the following form:
'borders(albania,greece).'
'contains0(africa,central_africa).'
We do not want to form a unary concept out of the element in the first field of these records, and we want the label of the binary relation just to be 'border'/'contain' respectively.
In order to drive the extraction process, we use ‘relation metadata bundles’ which are Python dictionaries such as the following:
city = {'label': 'city',
'closures': [],
'schema': ['city', 'country', 'population'],
'filename': 'cities.pl'}
According to this, the file city['filename'] contains a list of relational tuples (or more accurately, the corresponding strings in Prolog form) whose predicate symbol is city['label'] and whose relational schema is city['schema']. The notion of a closure is discussed in the next section.
Concepts¶
In order to encapsulate the results of the extraction, a class of Concept objects is introduced. A Concept object has a number of attributes, in particular a prefLabel and extension, which make it easier to inspect the output of the extraction. In addition, the extension can be further processed: in the case of the 'border' relation, we check that the relation is symmetric, and in the case of the 'contain' relation, we carry out the transitive closure. The closure properties associated with a concept are indicated in the relation metadata, as described earlier.
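The two closure properties can be illustrated with a small sketch over sets of pairs. The function names are hypothetical and the code is not taken from the Concept class; the pair ('central_africa', 'chad') is made-up illustration data.

```python
def symmetric_closure(pairs):
    """Add the inverse of every pair, so the relation holds in both directions."""
    return set(pairs) | {(y, x) for (x, y) in pairs}

def transitive_closure(pairs):
    """Repeatedly chain pairs (x, y) and (y, w) into (x, w) until nothing new appears."""
    closure = set(pairs)
    while True:
        new_pairs = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

borders = symmetric_closure({("albania", "greece")})
contains = transitive_closure(
    {("africa", "central_africa"), ("central_africa", "chad")}
)
print(sorted(borders))
print(sorted(contains))
```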
The extension of a Concept object is then incorporated into a Valuation object.
Persistence¶
The functions val_dump and val_load are provided to allow a valuation to be stored in a persistent database and re-loaded, rather than having to be re-computed each time.
Individuals and Lexical Items¶
As well as deriving relations from the Chat-80 data, we also create a set of individual constants, one for each entity in the domain. The individual constants are string-identical to the entities. For example, given a data item such as 'zloty', we add to the valuation a pair ('zloty', 'zloty'). In order to parse English sentences that refer to these entities, we also create a lexical item such as the following for each individual constant:
PropN[num=sg, sem=<\P.(P zloty)>] -> 'Zloty'
The set of rules is written to the file chat_pnames.cfg in the current directory.
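A rule of this shape could be generated from an individual constant roughly as follows. The helper make_rule is hypothetical, and capitalizing each underscore-separated part of the symbol is an assumption about how the proper name is formed.

```python
def make_rule(symbol):
    """Build a lexical rule string for one individual constant."""
    # Assumption: 'south_africa' would become 'South_Africa'.
    pname = "_".join(part.capitalize() for part in symbol.split("_"))
    return "PropN[num=sg, sem=<\\P.(P %s)>] -> '%s'" % (symbol, pname)

print(make_rule("zloty"))
```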
- class nltk.sem.chat80.Concept(prefLabel, arity, altLabels=[], closures=[], extension=set())[source]¶
Bases: builtins.object
A Concept class, loosely based on SKOS (http://www.w3.org/TR/swbp-skos-core-guide/).
- augment(data)[source]¶
Add more data to the Concept's extension set.
Parameters: data (string or pair of strings) – a new semantic value Return type: set
- close()[source]¶
Close a binary relation in the Concept's extension set.
Returns: a new extension for the Concept in which the relation is closed under a given property
- unicode_repr()¶
- nltk.sem.chat80.binary_concept(label, closures, subj, obj, records)[source]¶
Make a binary concept out of the primary key and another field in a record.
A record is a list of entities in some relation, such as ['france', 'paris'], where 'france' is acting as the primary key, and 'paris' stands in the 'capital_of' relation to 'france'.
More generally, given a record such as ['a', 'b', 'c'], where label is bound to 'B', and obj bound to 1, the derived binary concept will have label 'B_of', and its extension will be a set of pairs such as ('a', 'b').
Parameters: - label (str) – the base part of the preferred label for the concept
- closures (list) – closure properties for the extension of the concept
- subj (int) – position in the record of the subject of the predicate
- obj (int) – position in the record of the object of the predicate
- records (list of lists) – a list of records
Returns: Concept of arity 2
Return type: Concept
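The derivation described above can be sketched without the Concept class. Here binary_extension is a hypothetical stand-in that returns a plain (label, set) pair rather than a Concept, and the second record is illustration data.

```python
def binary_extension(label, subj, obj, records):
    """Return the derived label and the set of (subject, object) pairs."""
    pairs = {(record[subj], record[obj]) for record in records}
    return label + "_of", pairs

records = [["france", "paris"], ["greece", "athens"]]
name, extension = binary_extension("capital", 0, 1, records)
print(name, sorted(extension))
```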
- nltk.sem.chat80.cities2table(filename, rel_name, dbname, verbose=False, setup=False)[source]¶
Convert a file of Prolog clauses into a database table.
This is not generic, since it doesn’t allow arbitrary schemas to be set as a parameter.
Intended usage:
cities2table('cities.pl', 'city', 'city.db', verbose=True, setup=True)
Parameters: - filename (str) – filename containing the relations
- rel_name (str) – name of the relation
- dbname – filename of persistent store
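The kind of table this produces, and the kind of query that can then be run against it, might look like the following sketch using the standard sqlite3 module. The table name city_table and its column names are assumptions, and an in-memory database stands in for the persistent store.

```python
import sqlite3

# Assumed table and column names; an in-memory database replaces 'city.db'.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE city_table (City TEXT, Country TEXT, Population INTEGER)"
)

# One row per Prolog clause, e.g. city(athens,greece,1368).
rows = [("athens", "greece", 1368)]
connection.executemany("INSERT INTO city_table VALUES (?, ?, ?)", rows)
connection.commit()

result = connection.execute(
    "SELECT City FROM city_table WHERE Country = ?", ("greece",)
).fetchall()
print(result)
connection.close()
```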
- nltk.sem.chat80.clause2concepts(filename, rel_name, schema, closures=[])[source]¶
Convert a file of Prolog clauses into a list of Concept objects.
Parameters: - filename (str) – filename containing the relations
- rel_name (str) – name of the relation
- schema (list) – the schema used in a set of relational tuples
- closures (list) – closure properties for the extension of the concept
Returns: a list of Concept objects
Return type: list
- nltk.sem.chat80.concepts(items=('borders', 'circle_of_lat', 'circle_of_long', 'city', 'contains', 'continent', 'country', 'ocean', 'region', 'sea'))[source]¶
Build a list of concepts corresponding to the relation names in items.
Parameters: items (list of strings) – names of the Chat-80 relations to extract Returns: the Concept objects which are extracted from the relations Return type: list
- nltk.sem.chat80.label_indivs(valuation, lexicon=False)[source]¶
Assign individual constants to the individuals in the domain of a Valuation.
Given a valuation with an entry of the form {'rel': {'a': True}}, add a new entry {'a': 'a'}.
Return type: Valuation
- nltk.sem.chat80.make_lex(symbols)[source]¶
Create lexical CFG rules for each individual symbol.
Given a valuation with an entry of the form {'zloty': 'zloty'}, create a lexical rule for the proper name ‘Zloty’.
Parameters: symbols (sequence) – a list of individual constants in the semantic representation Return type: list
- nltk.sem.chat80.make_valuation(concepts, read=False, lexicon=False)[source]¶
Convert a list of Concept objects into a list of (label, extension) pairs; optionally create a Valuation object.
Parameters: - concepts (list(Concept)) – concepts
- read (bool) – if True, (symbol, set) pairs are read into a Valuation
Return type: list or Valuation
- nltk.sem.chat80.process_bundle(rels)[source]¶
Given a list of relation metadata bundles, make a corresponding dictionary of concepts, indexed by the relation name.
Parameters: rels (list of dict) – bundle of metadata needed for constructing a concept Returns: a dictionary of concepts, indexed by the relation name. Return type: dict
- nltk.sem.chat80.sql_query(dbname, query)[source]¶
Execute an SQL query over a database.
Parameters: - dbname (str) – filename of persistent store
- query (str) – SQL query
- nltk.sem.chat80.unary_concept(label, subj, records)[source]¶
Make a unary concept out of the primary key in a record.
A record is a list of entities in some relation, such as ['france', 'paris'], where 'france' is acting as the primary key.
Parameters: - label (string) – the preferred label for the concept
- subj (int) – position in the record of the subject of the predicate
- records (list of lists) – a list of records
Returns: Concept of arity 1
Return type: Concept
- nltk.sem.chat80.val_dump(rels, db)[source]¶
Make a Valuation from a list of relation metadata bundles and dump to persistent database.
Parameters: - rels (list of dict) – bundle of metadata needed for constructing a concept
- db (string) – name of file to which data is written. The suffix ‘.db’ will be automatically appended.
nltk.sem.cooper_storage module¶
- class nltk.sem.cooper_storage.CooperStore(featstruct)[source]¶
Bases: builtins.object
A container for handling quantifier ambiguity via Cooper storage.
- s_retrieve(trace=False)[source]¶
Carry out S-Retrieval of binding operators in store. If hack=True, serialize the bindop and core as strings and reparse. Ugh.
Each permutation of the store (i.e. list of binding operators) is taken to be a possible scoping of quantifiers. We iterate through the binding operators in each permutation, and successively apply them to the current term, starting with the core semantic representation, working from the inside out.
Binding operators are of the form:
bo(\P.all x.(man(x) -> P(x)),z1)
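The permutation idea can be sketched with strings standing in for logical expressions. This is a simplification under stated assumptions: the core, the two quantifiers, and the quantifier helper are made up for illustration, and each binding operator quantifies directly over its variable instead of going through lambda application.

```python
from itertools import permutations

def quantifier(template, var):
    """Return a function that wraps a term in a quantifier binding var."""
    def apply(term):
        return template.format(var=var, body=term)
    return apply

core = "chase(z1,z2)"
store = [
    quantifier("all {var}.(girl({var}) -> {body})", "z1"),
    quantifier("exists {var}.(dog({var}) & {body})", "z2"),
]

readings = []
for ordering in permutations(store):
    term = core
    # Apply the operators one after another, working from the inside out.
    for bindop in ordering:
        term = bindop(term)
    readings.append(term)

for reading in readings:
    print(reading)
```

With two binding operators there are two orderings, so the sketch yields one reading in which the existential takes wide scope and one in which the universal does.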
nltk.sem.drt module¶
- class nltk.sem.drt.AbstractDrs[source]¶
Bases: builtins.object
This is the base abstract DRT Expression from which every DRT Expression extends.
- equiv(other, prover=None)[source]¶
Check for logical equivalence. Pass the expression (self <-> other) to the theorem prover. If the prover says it is valid, then self and other are equivalent.
Parameters: - other – an AbstractDrs to check equality against
- prover – a nltk.inference.api.Prover
- class nltk.sem.drt.DRS(refs, conds, consequent=None)[source]¶
Bases: nltk.sem.drt.AbstractDrs, nltk.sem.logic.Expression
A Discourse Representation Structure.
- replace(variable, expression, replace_bound=False, alpha_convert=True)[source]¶
Replace all instances of variable v with expression E in self, where v is free in self.
- unicode_repr()¶
- class nltk.sem.drt.DrsDrawer(drs, size_canvas=True, canvas=None)[source]¶
Bases: builtins.object
- BUFFER = 3¶
- OUTERSPACE = 6¶
- TOPSPACE = 10¶
- class nltk.sem.drt.DrtAbstractVariableExpression(variable)[source]¶
Bases: nltk.sem.drt.AbstractDrs, nltk.sem.logic.AbstractVariableExpression
- class nltk.sem.drt.DrtApplicationExpression(function, argument)[source]¶
Bases: nltk.sem.drt.AbstractDrs, nltk.sem.logic.ApplicationExpression
- class nltk.sem.drt.DrtBinaryExpression(first, second)[source]¶
Bases: nltk.sem.drt.AbstractDrs, nltk.sem.logic.BinaryExpression
- class nltk.sem.drt.DrtBooleanExpression(first, second)[source]¶
Bases: nltk.sem.drt.DrtBinaryExpression, nltk.sem.logic.BooleanExpression
- class nltk.sem.drt.DrtConcatenation(first, second, consequent=None)[source]¶
Bases: nltk.sem.drt.DrtBooleanExpression
DRS of the form ‘(DRS + DRS)’
- replace(variable, expression, replace_bound=False, alpha_convert=True)[source]¶
Replace all instances of variable v with expression E in self, where v is free in self.
- unicode_repr()¶
- class nltk.sem.drt.DrtConstantExpression(variable)[source]¶
Bases: nltk.sem.drt.DrtAbstractVariableExpression, nltk.sem.logic.ConstantExpression
- class nltk.sem.drt.DrtEqualityExpression(first, second)[source]¶
Bases: nltk.sem.drt.DrtBinaryExpression, nltk.sem.logic.EqualityExpression
- class nltk.sem.drt.DrtEventVariableExpression(variable)[source]¶
Bases: nltk.sem.drt.DrtIndividualVariableExpression, nltk.sem.logic.EventVariableExpression
- class nltk.sem.drt.DrtFunctionVariableExpression(variable)[source]¶
Bases: nltk.sem.drt.DrtAbstractVariableExpression, nltk.sem.logic.FunctionVariableExpression
- class nltk.sem.drt.DrtIndividualVariableExpression(variable)[source]¶
Bases: nltk.sem.drt.DrtAbstractVariableExpression, nltk.sem.logic.IndividualVariableExpression
- class nltk.sem.drt.DrtLambdaExpression(variable, term)[source]¶
Bases: nltk.sem.drt.AbstractDrs, nltk.sem.logic.LambdaExpression
- class nltk.sem.drt.DrtNegatedExpression(term)[source]¶
Bases: nltk.sem.drt.AbstractDrs, nltk.sem.logic.NegatedExpression
- class nltk.sem.drt.DrtOrExpression(first, second)[source]¶
Bases: nltk.sem.drt.DrtBooleanExpression, nltk.sem.logic.OrExpression
- class nltk.sem.drt.DrtParser[source]¶
Bases: nltk.sem.logic.LogicParser
A lambda calculus expression parser.
- get_BooleanExpression_factory(tok)[source]¶
This method serves as a hook for other logic parsers that have different boolean operators
- handle(tok, context)[source]¶
This method is intended to be overridden for logics that use different operators or expressions
- class nltk.sem.drt.DrtProposition(variable, drs)[source]¶
Bases: nltk.sem.drt.AbstractDrs, nltk.sem.logic.Expression
- unicode_repr()¶
- class nltk.sem.drt.DrtTokens[source]¶
Bases: nltk.sem.logic.Tokens
- CLOSE_BRACKET = ']'¶
- COLON = ':'¶
- DRS = 'DRS'¶
- DRS_CONC = '+'¶
- OPEN_BRACKET = '['¶
- PRONOUN = 'PRO'¶
- PUNCT = ['+', '[', ']', ':']¶
- SYMBOLS = ['&', '^', '|', '->', '=>', '<->', '<=>', '=', '==', '!=', '\\', '.', '(', ')', ',', '-', '!', '+', '[', ']', ':']¶
- TOKENS = ['and', '&', '^', 'or', '|', 'implies', '->', '=>', 'iff', '<->', '<=>', '=', '==', '!=', 'some', 'exists', 'exist', 'all', 'forall', '\\', '.', '(', ')', ',', 'not', '-', '!', 'DRS', '+', '[', ']', ':']¶
- nltk.sem.drt.DrtVariableExpression(variable)[source]¶
This is a factory method that instantiates and returns a subtype of DrtAbstractVariableExpression appropriate for the given variable.
- class nltk.sem.drt.PossibleAntecedents[source]¶
Bases: builtins.list, nltk.sem.drt.AbstractDrs, nltk.sem.logic.Expression
- replace(variable, expression, replace_bound=False, alpha_convert=True)[source]¶
Replace all instances of variable v with expression E in self, where v is free in self.
- unicode_repr None¶
x.__repr__() <==> repr(x)
nltk.sem.drt_glue_demo module¶
- class nltk.sem.drt_glue_demo.DrtGlueDemo(examples)[source]¶
Bases: builtins.object
nltk.sem.evaluate module¶
This module provides data structures for representing first-order models.
- class nltk.sem.evaluate.Assignment(domain, assign=None)[source]¶
Bases: builtins.dict
A dictionary which represents an assignment of values to variables.
An assignment can only assign values from its domain.
If an unknown expression a is passed to a model M's interpretation function i, i will first check whether M's valuation assigns an interpretation to a as a constant, and if this fails, i will delegate the interpretation of a to g. g only assigns values to individual variables (i.e., members of the class IndividualVariableExpression in the logic module). If a variable is not assigned a value by g, it will raise an Undefined exception.
A variable Assignment is a mapping from individual variables to entities in the domain. Individual variables are usually indicated with the letters 'x', 'y', 'w' and 'z', optionally followed by an integer (e.g., 'x0', 'y332'). Assignments are created using the Assignment constructor, which also takes the domain as a parameter.
>>> from nltk.sem.evaluate import Assignment
>>> dom = set(['u1', 'u2', 'u3', 'u4'])
>>> g3 = Assignment(dom, [('x', 'u1'), ('y', 'u2')])
>>> g3
{'y': 'u2', 'x': 'u1'}
There is also a print format for assignments which uses a notation closer to that in logic textbooks:
>>> print(g3)
g[u2/y][u1/x]
It is also possible to update an assignment using the add method:
>>> dom = set(['u1', 'u2', 'u3', 'u4'])
>>> g4 = Assignment(dom)
>>> g4.add('x', 'u1')
{'x': 'u1'}
With no arguments, purge() is equivalent to clear() on a dictionary:
>>> g4.purge()
>>> g4
{}
Parameters: - domain (set) – the domain of discourse
- assign (list) – a list of (varname, value) associations
- purge(var=None)[source]¶
Remove one or all keys (i.e. logic variables) from an assignment, and update self.variant.
Parameters: var – a Variable acting as a key for the assignment.
- unicode_repr None¶
x.__repr__() <==> repr(x)
- class nltk.sem.evaluate.Model(domain, valuation)[source]¶
Bases: builtins.object
A first order model is a domain D of discourse and a valuation V.
A domain D is a set, and a valuation V is a map that associates expressions with values in the model. The domain of V should be a subset of D.
Construct a new Model.
Parameters: - domain (set) – A set of entities representing the domain of discourse of the model.
- valuation (Valuation) – the valuation of the model.
- prop – If this is set, then we are building a propositional model and don’t require the domain of V to be a subset of D.
- evaluate(expr, g, trace=None)[source]¶
Call the LogicParser to parse input expressions, and provide a handler for satisfy that blocks further propagation of the Undefined error.
Parameters: - expr – An Expression of logic.
- g (Assignment) – an assignment to individual variables.
Return type: bool or ‘Undefined’
- i(parsed, g, trace=False)[source]¶
An interpretation function.
Assuming that parsed is atomic:
- if parsed is a non-logical constant, calls the valuation V
- else if parsed is an individual variable, calls assignment g
- else returns Undefined.
Parameters: - parsed – an Expression of logic.
- g (Assignment) – an assignment to individual variables.
Returns: a semantic value
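The three cases above can be sketched with plain dictionaries standing in for the valuation and the assignment. The function interpret and the local Undefined class are hypothetical stand-ins, and the sketch raises the exception in the third case, in line with the description of Assignment.

```python
class Undefined(Exception):
    """Stand-in for the module's Undefined exception."""

def interpret(symbol, valuation, g):
    """Look a symbol up in the valuation first, then in the assignment."""
    if symbol in valuation:      # non-logical constant
        return valuation[symbol]
    if symbol in g:              # individual variable
        return g[symbol]
    raise Undefined("cannot interpret %r" % symbol)

valuation = {"adam": "b1", "boy": {("b1",), ("b2",)}}
g = {"x": "b2"}

print(interpret("adam", valuation, g))
print(interpret("x", valuation, g))
```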
- satisfiers(parsed, varex, g, trace=None, nesting=0)[source]¶
Generate the entities from the model’s domain that satisfy an open formula.
Parameters: - parsed (Expression) – an open formula
- varex (VariableExpression or str) – the relevant free individual variable in parsed.
- g (Assignment) – a variable assignment
Returns: a set of the entities that satisfy parsed.
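Conceptually, this amounts to trying each entity in the domain as the value of the free variable and keeping those for which the formula comes out true. The sketch below assumes the open formula is given as a Python function over an assignment dictionary, which is a simplification of the real Expression objects; all names and data are illustrative.

```python
def satisfiers(formula, var, domain, g):
    """Collect the entities that make formula true when assigned to var."""
    result = set()
    for entity in domain:
        candidate = dict(g)       # copy, so the caller's assignment is untouched
        candidate[var] = entity
        if formula(candidate):
            result.add(entity)
    return result

domain = {"b1", "b2", "g1"}
boy = {("b1",), ("b2",)}

def open_formula(assignment):
    """Stands for the open formula boy(x)."""
    return (assignment["x"],) in boy

print(sorted(satisfiers(open_formula, "x", domain, {})))
```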
- satisfy(parsed, g, trace=None)[source]¶
Recursive interpretation function for a formula of first-order logic.
Raises an Undefined error when parsed is an atomic string but is not a symbol or an individual variable.
Returns: Returns a truth value or Undefined if parsed is complex, and calls the interpretation function i if parsed is atomic.
Parameters: - parsed – An expression of logic.
- g (Assignment) – an assignment to individual variables.
- unicode_repr()¶
- exception nltk.sem.evaluate.Undefined[source]¶
Bases: nltk.sem.evaluate.Error
- class nltk.sem.evaluate.Valuation(iter)[source]¶
Bases: builtins.dict
A dictionary which represents a model-theoretic Valuation of non-logical constants. Keys are strings representing the constants to be interpreted, and values correspond to individuals (represented as strings) and n-ary relations (represented as sets of tuples of strings).
An instance of Valuation will raise a KeyError exception (i.e., just behave like a standard dictionary) if indexed with an expression that is not in its list of symbols.
- unicode_repr None¶
x.__repr__() <==> repr(x)
- nltk.sem.evaluate.arity(rel)[source]¶
Check the arity of a relation.
Parameters: rel (set of tuples) – a relation Return type: int
- nltk.sem.evaluate.demo(num=0, trace=None)[source]¶
Run existing demos.
- num = 1: propositional logic demo
- num = 2: first order model demo (only if trace is set)
- num = 3: first order sentences demo
- num = 4: satisfaction of open formulas demo
- any other value: run all the demos
Parameters: trace – trace = 1, or trace = 2 for more verbose tracing
- nltk.sem.evaluate.foldemo(trace=None)[source]¶
Interpretation of closed expressions in a first-order model.
- nltk.sem.evaluate.is_rel(s)[source]¶
Check whether a set represents a relation (of any arity).
Parameters: s (set) – a set containing tuples of str elements Return type: bool
- nltk.sem.evaluate.satdemo(trace=None)[source]¶
Satisfiers of an open formula in a first order model.
- nltk.sem.evaluate.set2rel(s)[source]¶
Convert a set containing individuals (strings or numbers) into a set of unary tuples. Any tuples of strings already in the set are passed through unchanged.
- For example:
- set(['a', 'b']) => set([('a',), ('b',)])
- set([3, 27]) => set([('3',), ('27',)])
Return type: set of tuple of str
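The conversion can be sketched in a few lines. The function to_relation is a hypothetical re-implementation written for illustration, not the module's own code.

```python
def to_relation(s):
    """Wrap individuals in unary tuples; leave existing tuples unchanged."""
    relation = set()
    for element in s:
        if isinstance(element, tuple):
            relation.add(element)
        else:
            relation.add((str(element),))
    return relation

print(sorted(to_relation({"a", "b"})))
print(sorted(to_relation({3, 27})))
```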
nltk.sem.glue module¶
- class nltk.sem.glue.DrtGlue(semtype_file=None, remove_duplicates=False, depparser=None, verbose=False)[source]¶
Bases: nltk.sem.glue.Glue
- class nltk.sem.glue.DrtGlueDict(filename, encoding=None)[source]¶
Bases: nltk.sem.glue.GlueDict
- class nltk.sem.glue.DrtGlueFormula(meaning, glue, indices=None)[source]¶
Bases: nltk.sem.glue.GlueFormula
- class nltk.sem.glue.Glue(semtype_file=None, remove_duplicates=False, depparser=None, verbose=False)[source]¶
Bases: builtins.object
- class nltk.sem.glue.GlueDict(filename, encoding=None)[source]¶
Bases: builtins.dict
- get_label(node)[source]¶
Pick an alphabetic character as identifier for an entity in the model.
Parameters: value (int) – where to index into the list of characters
- get_meaning_formula(generic, word)[source]¶
Parameters: - generic – A meaning formula string containing the parameter “<word>”
- word – The actual word to replace “<word>”
- get_semtypes(node)[source]¶
Based on the node, return a list of plausible semtypes in order of plausibility.
- lookup_unique(rel, node, depgraph)[source]¶
Lookup ‘key’. There should be exactly one item in the associated relation.
- unicode_repr None¶
x.__repr__() <==> repr(x)
nltk.sem.hole module¶
An implementation of the Hole Semantics model, following Blackburn and Bos, Representation and Inference for Natural Language (CSLI, 2005).
The semantic representations are built by the grammar hole.fcfg. This module contains driver code to read in sentences and parse them according to a hole semantics grammar.
After parsing, the semantic representation is in the form of an underspecified representation that is not easy to read. We use a “plugging” algorithm to convert that representation into first-order logic formulas.
- class nltk.sem.hole.Constants[source]¶
Bases: builtins.object
- ALL = 'ALL'¶
- AND = 'AND'¶
- EXISTS = 'EXISTS'¶
- HOLE = 'HOLE'¶
- IFF = 'IFF'¶
- IMP = 'IMP'¶
- LABEL = 'LABEL'¶
- LEQ = 'LEQ'¶
- MAP = {'AND': <class 'nltk.sem.logic.AndExpression'>, 'IMP': <class 'nltk.sem.logic.ImpExpression'>, 'ALL': <function <lambda> at 0x150439738>, 'IFF': <class 'nltk.sem.logic.IffExpression'>, 'EXISTS': <function <lambda> at 0x150439490>, 'NOT': <class 'nltk.sem.logic.NegatedExpression'>, 'PRED': <class 'nltk.sem.logic.ApplicationExpression'>, 'OR': <class 'nltk.sem.logic.OrExpression'>}¶
- NOT = 'NOT'¶
- OR = 'OR'¶
- PRED = 'PRED'¶
- class nltk.sem.hole.Constraint(lhs, rhs)[source]¶
Bases: builtins.object
This class represents a constraint of the form (L =< N), where L is a label and N is a node (a label or a hole).
- unicode_repr()¶
- class nltk.sem.hole.HoleSemantics(usr)[source]¶
Bases: builtins.object
This class holds the broken-down components of a hole semantics, i.e. it extracts the holes, labels, logic formula fragments and constraints out of a big conjunction of the kind produced by the hole semantics grammar. It then provides some operations on the semantics dealing with holes, labels and finding legal ways to plug holes with labels.
nltk.sem.lfg module¶
nltk.sem.linearlogic module¶
- class nltk.sem.linearlogic.ApplicationExpression(function, argument, argument_indices=None)[source]¶
Bases: nltk.sem.linearlogic.Expression
- simplify(bindings=None)[source]¶
Since function is an implication, return its consequent. There should be no need to check that the application is valid since the checking is done by the constructor.
Parameters: bindings – BindingDict A dictionary of bindings used to simplify Returns: Expression
- unicode_repr()¶
- class nltk.sem.linearlogic.AtomicExpression(name, dependencies=None)[source]¶
Bases: nltk.sem.linearlogic.Expression
- compile_neg(index_counter, glueFormulaFactory)[source]¶
From Iddo Lev’s PhD Dissertation p108-109
Parameters: - index_counter – Counter for unique indices
- glueFormulaFactory – GlueFormula for creating new glue formulas
Returns: (Expression,set) for the compiled linear logic and any newly created glue formulas
- compile_pos(index_counter, glueFormulaFactory)[source]¶
From Iddo Lev’s PhD Dissertation p108-109
Parameters: - index_counter – Counter for unique indices
- glueFormulaFactory – GlueFormula for creating new glue formulas
Returns: (Expression,set) for the compiled linear logic and any newly created glue formulas
- simplify(bindings=None)[source]¶
If ‘self’ is bound by ‘bindings’, return the atomic to which it is bound. Otherwise, return self.
Parameters: bindings – BindingDict A dictionary of bindings used to simplify Returns: AtomicExpression
- unicode_repr()¶
- class nltk.sem.linearlogic.BindingDict(bindings=None)[source]¶
Bases: builtins.object
- unicode_repr()¶
- class nltk.sem.linearlogic.ConstantExpression(name, dependencies=None)[source]¶
Bases: nltk.sem.linearlogic.AtomicExpression
- unify(other, bindings)[source]¶
If ‘other’ is a constant, then it must be equal to ‘self’. If ‘other’ is a variable, then it must not be bound to anything other than ‘self’.
Parameters: - other – Expression
- bindings – BindingDict A dictionary of all current bindings
Returns: BindingDict A new combined dictionary of ‘bindings’ and any new binding
Raises UnificationException: If ‘self’ and ‘other’ cannot be unified in the context of ‘bindings’
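The rule for constants can be sketched with names as strings and bindings as a plain dictionary. Everything here is a hypothetical stand-in for the real classes, and treating names that start with an uppercase letter as variables is an assumption of the sketch.

```python
class UnificationError(Exception):
    """Stand-in for UnificationException."""

def is_variable(name):
    # Assumption: uppercase-initial names are variables, others are constants.
    return name[0].isupper()

def unify_constant(constant, other, bindings):
    """Unify a constant with a constant or variable; return extended bindings."""
    if is_variable(other):
        bound = bindings.get(other)
        if bound is not None and bound != constant:
            raise UnificationError("%s is already bound to %s" % (other, bound))
        new_bindings = dict(bindings)    # do not mutate the caller's bindings
        new_bindings[other] = constant
        return new_bindings
    if other != constant:
        raise UnificationError("%s and %s are different constants" % (constant, other))
    return dict(bindings)

print(unify_constant("g", "G", {}))
print(unify_constant("g", "g", {"H": "h"}))
```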
- class nltk.sem.linearlogic.ImpExpression(antecedent, consequent)[source]¶
Bases: nltk.sem.linearlogic.Expression
- compile_neg(index_counter, glueFormulaFactory)[source]¶
From Iddo Lev’s PhD Dissertation p108-109
Parameters: - index_counter – Counter for unique indices
- glueFormulaFactory – GlueFormula for creating new glue formulas
Returns: (Expression,list of GlueFormula) for the compiled linear logic and any newly created glue formulas
- compile_pos(index_counter, glueFormulaFactory)[source]¶
From Iddo Lev’s PhD Dissertation p108-109
Parameters: - index_counter – Counter for unique indices
- glueFormulaFactory – GlueFormula for creating new glue formulas
Returns: (Expression,set) for the compiled linear logic and any newly created glue formulas
- unicode_repr()¶
- unify(other, bindings)[source]¶
Both the antecedent and consequent of ‘self’ and ‘other’ must unify.
Parameters: - other – ImpExpression
- bindings – BindingDict A dictionary of all current bindings
Returns: BindingDict A new combined dictionary of ‘bindings’ and any new bindings
Raises UnificationException: If ‘self’ and ‘other’ cannot be unified in the context of ‘bindings’
- class nltk.sem.linearlogic.LinearLogicParser[source]¶
Bases: nltk.sem.logic.LogicParser
A linear logic expression parser.
- class nltk.sem.linearlogic.Tokens[source]¶
Bases: builtins.object
- CLOSE = ')'¶
- IMP = '-o'¶
- OPEN = '('¶
- PUNCT = ['(', ')']¶
- TOKENS = ['(', ')', '-o']¶
- exception nltk.sem.linearlogic.UnificationException(a, b, bindings)[source]¶
Bases: builtins.Exception
- class nltk.sem.linearlogic.VariableExpression(name, dependencies=None)[source]¶
Bases: nltk.sem.linearlogic.AtomicExpression
- unify(other, bindings)[source]¶
‘self’ must not be bound to anything other than ‘other’.
Parameters: - other – Expression
- bindings – BindingDict A dictionary of all current bindings
Returns: BindingDict A new combined dictionary of ‘bindings’ and the new binding
Raises UnificationException: If ‘self’ and ‘other’ cannot be unified in the context of ‘bindings’
nltk.sem.logic module¶
A version of first order predicate logic, built on top of the typed lambda calculus.
- class nltk.sem.logic.AbstractVariableExpression(variable)[source]¶
Bases: nltk.sem.logic.Expression
This class represents a variable to be used as a predicate or entity
- replace(variable, expression, replace_bound=False, alpha_convert=True)[source]¶
See: Expression.replace()
- unicode_repr()¶
- class nltk.sem.logic.AndExpression(first, second)[source]¶
Bases: nltk.sem.logic.BooleanExpression
This class represents conjunctions
- class nltk.sem.logic.AnyType[source]¶
Bases: nltk.sem.logic.BasicType, nltk.sem.logic.ComplexType
- unicode_repr()¶
- class nltk.sem.logic.ApplicationExpression(function, argument)[source]¶
Bases: nltk.sem.logic.Expression
This class is used to represent two related types of logical expressions.
The first is a Predicate Expression, such as “P(x,y)”. A predicate expression is comprised of a FunctionVariableExpression or ConstantExpression as the predicate and a list of Expressions as the arguments.
The second is an application of one expression to another, such as “(\x.dog(x))(fido)”.
The reason Predicate Expressions are treated as Application Expressions is that the Variable Expression predicate of the expression may be replaced with another Expression, such as a LambdaExpression, which would mean that the Predicate should be thought of as being applied to the arguments.
The LogicParser will always curry arguments in an application expression. So, “\x y.see(x,y)(john,mary)” will be represented internally as “((\x y.(see(x))(y))(john))(mary)”. This simplifies the internals since there will always be exactly one argument in an application.
The str() method will usually print the curried forms of application expressions. The one exception is when the application expression is really a predicate expression (i.e., the underlying function is an AbstractVariableExpression). This means that the example from above will be returned as “(\x y.see(x,y)(john))(mary)”.
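Currying can be illustrated without NLTK. The sketch below assumes a toy encoding in which an application is the tuple ("app", function, argument) and names are plain strings; curry and uncurry are hypothetical helpers, not part of nltk.sem.logic.

```python
from functools import reduce


def curry(function, arguments):
    """Nest a multi-argument application into one-argument applications."""
    return reduce(lambda fn, arg: ("app", fn, arg), arguments, function)


def uncurry(expression):
    """Recover the base function and the flat argument list."""
    arguments = []
    while isinstance(expression, tuple) and expression[0] == "app":
        _, expression, argument = expression
        arguments.append(argument)
    # Arguments were collected outermost first, so reverse them.
    return expression, arguments[::-1]


curried = curry("see", ["john", "mary"])
base, args = uncurry(curried)
```

Here curried is the nested structure (see applied to john) applied to mary, so every application node carries exactly one argument, and uncurry walks back down to the base function.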
- is_atom()[source]¶
Is this expression an atom (as opposed to a lambda expression applied to a term)?
- pred None[source]¶
Return uncurried base-function. If this is an atom, then the result will be a variable expression. Otherwise, it will be a lambda expression.
- unicode_repr()¶
- class nltk.sem.logic.BasicType[source]¶
Bases: nltk.sem.logic.Type
- class nltk.sem.logic.BinaryExpression(first, second)[source]¶
Bases: nltk.sem.logic.Expression
- unicode_repr()¶
- class nltk.sem.logic.ComplexType(first, second)[source]¶
Bases: nltk.sem.logic.Type
- unicode_repr()¶
- class nltk.sem.logic.ConstantExpression(variable)[source]¶
Bases: nltk.sem.logic.AbstractVariableExpression
This class represents variables that do not take the form of a single character followed by zero or more digits.
- type = e¶
- class nltk.sem.logic.EntityType[source]¶
Bases: nltk.sem.logic.BasicType
- unicode_repr()¶
- class nltk.sem.logic.EqualityExpression(first, second)[source]¶
Bases: nltk.sem.logic.BinaryExpression
This class represents equality expressions like “(x = y)”.
- class nltk.sem.logic.EventType[source]¶
Bases: nltk.sem.logic.BasicType
- unicode_repr()¶
- class nltk.sem.logic.EventVariableExpression(variable)[source]¶
Bases: nltk.sem.logic.IndividualVariableExpression
This class represents variables that take the form of a single lowercase ‘e’ character followed by zero or more digits.
- type = v¶
- class nltk.sem.logic.Expression[source]¶
Bases: nltk.sem.logic.SubstituteBindingsI
This is the base abstract object for all logical expressions
- constants()[source]¶
Return a set of individual constants (non-predicates).
Returns: set of Variable objects
- equiv(other, prover=None)[source]¶
Check for logical equivalence. Pass the expression (self <-> other) to the theorem prover. If the prover says it is valid, then self and other are logically equivalent.
Parameters: - other – an Expression to check equality against
- prover – a nltk.inference.api.Prover
- findtype(variable)[source]¶
Find the type of the given variable as it is used in this expression. For example, finding the type of “P” in “P(x) & Q(x,y)” yields “<e,t>”
Parameters: variable – Variable
- free()[source]¶
Return a set of all the free (non-bound) variables. This includes both individual and predicate variables, but not constants.
Returns: set of Variable objects
- predicates()[source]¶
Return a set of predicates (constants, not variables).
Returns: set of Variable objects
- replace(variable, expression, replace_bound=False, alpha_convert=True)[source]¶
Replace every instance of ‘variable’ with ‘expression’.
Parameters: - variable – Variable The variable to replace
- expression – Expression The expression with which to replace it
- replace_bound – bool Should bound variables be replaced?
- alpha_convert – bool Alpha convert automatically to avoid name clashes?
- typecheck(signature=None)[source]¶
Infer and check types. Raise exceptions if necessary.
Parameters: signature – dict that maps variable names to types (or string representations of types) Returns: the signature, plus any additional type mappings
- unicode_repr()¶
- variables()[source]¶
Return a set of all the variables for binding substitution. The variables returned include all free (non-bound) individual variables and any variable starting with ‘?’ or ‘@’.
Returns: set of Variable objects
- visit(function, combinator)[source]¶
Recursively visit subexpressions. Apply ‘function’ to each subexpression and pass the result of each function application to the ‘combinator’ for aggregation:
return combinator(map(function, self.subexpressions))
Bound variables are neither applied upon by the function nor given to the combinator.
Parameters: - function – Function<Expression,T> to call on each subexpression
- combinator – Function<list<T>,R> to combine the results of the function calls
Returns: result of combination R
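The function/combinator pattern can be sketched on a toy tree. Node below is a hypothetical stand-in for Expression with only a label and a subexpressions attribute; it is not the NLTK class.

```python
class Node:
    """Toy expression: a label plus zero or more subexpressions."""

    def __init__(self, label, *subexpressions):
        self.label = label
        self.subexpressions = subexpressions

    def visit(self, function, combinator):
        # Apply `function` to each child and aggregate with `combinator`.
        return combinator(map(function, self.subexpressions))


def count_nodes(node):
    """Count a node plus everything below it by recursing through visit()."""
    return 1 + node.visit(count_nodes, sum)


# Stands for P(x) & Q(x,y): six nodes in total.
expr = Node("&", Node("P", Node("x")), Node("Q", Node("x"), Node("y")))
total = count_nodes(expr)
```

The recursion lives in the function passed to visit, so the same method supports counting, collecting variables, or any other fold by swapping the combinator.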
- visit_structured(function, combinator)[source]¶
Recursively visit subexpressions. Apply ‘function’ to each subexpression and pass the result of each function application to the ‘combinator’ for aggregation. The combinator must have the same signature as the constructor. The function is not applied to bound variables, but they are passed to the combinator.
Parameters: - function – Function to call on each subexpression
- combinator – Function with the same signature as the constructor, to combine the results of the function calls
Returns: result of combination
- class nltk.sem.logic.FunctionVariableExpression(variable)[source]¶
Bases: nltk.sem.logic.AbstractVariableExpression
This class represents variables that take the form of a single uppercase character followed by zero or more digits.
- type = ?¶
- class nltk.sem.logic.IffExpression(first, second)[source]¶
Bases: nltk.sem.logic.BooleanExpression
This class represents biconditionals
- exception nltk.sem.logic.IllegalTypeException(expression, other_type, allowed_type)[source]¶
Bases: nltk.sem.logic.TypeException
- class nltk.sem.logic.ImpExpression(first, second)[source]¶
Bases: nltk.sem.logic.BooleanExpression
This class represents implications
- exception nltk.sem.logic.InconsistentTypeHierarchyException(variable, expression=None)[source]¶
Bases: nltk.sem.logic.TypeException
- class nltk.sem.logic.IndividualVariableExpression(variable)[source]¶
Bases: nltk.sem.logic.AbstractVariableExpression
This class represents variables that take the form of a single lowercase character (other than ‘e’) followed by zero or more digits.
- type None¶
- class nltk.sem.logic.LambdaExpression(variable, term)[source]¶
Bases: nltk.sem.logic.VariableBinderExpression
- unicode_repr()¶
- class nltk.sem.logic.LogicParser(type_check=False)[source]¶
Bases: builtins.object
A lambda calculus expression parser.
- attempt_ApplicationExpression(expression, context)[source]¶
Attempt to make an application expression. If the next tokens are a list of arguments in parens, then the argument expression is a function being applied to the arguments. Otherwise, return the argument expression.
- attempt_BooleanExpression(expression, context)[source]¶
Attempt to make a boolean expression. If the next token is a boolean operator, then a BooleanExpression will be returned. Otherwise, the parameter will be returned.
- attempt_EqualityExpression(expression, context)[source]¶
Attempt to make an equality expression. If the next token is an equality operator, then an EqualityExpression will be returned. Otherwise, the parameter will be returned.
- get_BooleanExpression_factory(tok)[source]¶
This method serves as a hook for other logic parsers that have different boolean operators
- get_QuantifiedExpression_factory(tok)[source]¶
This method serves as a hook for other logic parsers that have different quantifiers
- handle(tok, context)[source]¶
This method is intended to be overridden for logics that use different operators or expressions
- make_EqualityExpression(first, second)[source]¶
This method serves as a hook for other logic parsers that have different equality expression classes
- parse(data, signature=None)[source]¶
Parse the expression.
Parameters: - data – str for the input to be parsed
- signature – dict<str, str> that maps variable names to type strings
Returns: a parsed Expression
- parse_Expression(context)[source]¶
Parse the next complete expression from the stream and return it.
- token(location=None)[source]¶
Get the next waiting token. If a location is given, then return the token at currentIndex+location without advancing currentIndex; setting it gives lookahead/lookback capability.
- type_check = None¶
A list of tuples of quote characters. The 4-tuple is comprised of the start character, the end character, the escape character, and a boolean indicating whether the quotes should be included in the result. Quotes are used to signify that a token should be treated as atomic, ignoring any special characters within the token. The escape character allows the quote end character to be used within the quote. If True, the boolean indicates that the final token should contain the quote and escape characters. This method exists to be overridden
- unicode_repr()¶
- class nltk.sem.logic.NegatedExpression(term)[source]¶
Bases: nltk.sem.logic.Expression
- unicode_repr()¶
- class nltk.sem.logic.OrExpression(first, second)[source]¶
Bases: nltk.sem.logic.BooleanExpression
This class represents disjunctions
- class nltk.sem.logic.QuantifiedExpression(variable, term)[source]¶
Bases: nltk.sem.logic.VariableBinderExpression
- unicode_repr()¶
- class nltk.sem.logic.StringTrie(strings=None)[source]¶
Bases: collections.defaultdict
- LEAF = '<leaf>'¶
- class nltk.sem.logic.SubstituteBindingsI[source]¶
Bases: builtins.object
An interface for classes that can perform substitutions for variables.
- class nltk.sem.logic.Tokens[source]¶
Bases: builtins.object
- ALL = 'all'¶
- ALL_LIST = ['all', 'forall']¶
- AND = '&'¶
- AND_LIST = ['and', '&', '^']¶
- BINOPS = ['and', '&', '^', 'or', '|', 'implies', '->', '=>', 'iff', '<->', '<=>']¶
- CLOSE = ')'¶
- COMMA = ','¶
- DOT = '.'¶
- EQ = '='¶
- EQ_LIST = ['=', '==']¶
- EXISTS = 'exists'¶
- EXISTS_LIST = ['some', 'exists', 'exist']¶
- IFF = '<->'¶
- IFF_LIST = ['iff', '<->', '<=>']¶
- IMP = '->'¶
- IMP_LIST = ['implies', '->', '=>']¶
- LAMBDA = '\\'¶
- LAMBDA_LIST = ['\\']¶
- NEQ = '!='¶
- NEQ_LIST = ['!=']¶
- NOT = '-'¶
- NOT_LIST = ['not', '-', '!']¶
- OPEN = '('¶
- OR = '|'¶
- OR_LIST = ['or', '|']¶
- PUNCT = ['.', '(', ')', ',']¶
- QUANTS = ['some', 'exists', 'exist', 'all', 'forall']¶
- SYMBOLS = ['&', '^', '|', '->', '=>', '<->', '<=>', '=', '==', '!=', '\\', '.', '(', ')', ',', '-', '!']¶
- TOKENS = ['and', '&', '^', 'or', '|', 'implies', '->', '=>', 'iff', '<->', '<=>', '=', '==', '!=', 'some', 'exists', 'exist', 'all', 'forall', '\\', '.', '(', ')', ',', 'not', '-', '!']¶
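The *_LIST attributes group alternative spellings of the same operator. As an illustration only, the sketch below copies those values into a hypothetical lookup table that maps every spelling to the canonical token; it does not claim that LogicParser normalizes tokens this way.

```python
# Values copied from the Tokens attributes listed above.
SYNONYMS = {
    "&": ["and", "&", "^"],
    "|": ["or", "|"],
    "->": ["implies", "->", "=>"],
    "<->": ["iff", "<->", "<=>"],
    "-": ["not", "-", "!"],
    "all": ["all", "forall"],
    "exists": ["some", "exists", "exist"],
}

# Invert the table: every accepted spelling points at its canonical token.
CANONICAL = {
    spelling: canonical
    for canonical, spellings in SYNONYMS.items()
    for spelling in spellings
}


def normalize(tokens):
    """Replace operator synonyms; leave every other token untouched."""
    return [CANONICAL.get(token, token) for token in tokens]


normalized = normalize(["forall", "x", ".", "(", "dog", "(", "x", ")", "implies"])
```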
- class nltk.sem.logic.TruthValueType[source]¶
Bases: nltk.sem.logic.BasicType
- unicode_repr()¶
- exception nltk.sem.logic.TypeResolutionException(expression, other_type)[source]¶
Bases: nltk.sem.logic.TypeException
- exception nltk.sem.logic.UnexpectedTokenException(index, unexpected=None, expected=None, message=None)[source]¶
- class nltk.sem.logic.VariableBinderExpression(variable, term)[source]¶
Bases: nltk.sem.logic.Expression
This is an abstract class for any Expression that binds a variable in an Expression. This includes LambdaExpressions and QuantifiedExpressions.
- alpha_convert(newvar)[source]¶
Rename all occurrences of the variable introduced by this variable binder in the expression to newvar.
Parameters: newvar – Variable, for the new variable
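A minimal sketch of alpha conversion on a toy term encoding, assuming variables are strings, a binder is ("lambda", var, body) and an application is ("app", function, argument). The helper names are hypothetical, and unlike a full implementation this sketch does not check whether newvar would capture another variable.

```python
def rename(term, old, new):
    """Rename free occurrences of `old` in `term` to `new`."""
    if isinstance(term, str):
        return new if term == old else term
    if term[0] == "lambda":
        _, var, body = term
        if var == old:
            return term  # `old` is re-bound here, so leave the inner scope alone
        return ("lambda", var, rename(body, old, new))
    # Any other node: keep the tag and rename inside each part.
    return (term[0],) + tuple(rename(part, old, new) for part in term[1:])


def alpha_convert(binder, newvar):
    """Rename the variable introduced by a ('lambda', var, body) binder."""
    _, var, body = binder
    return ("lambda", newvar, rename(body, var, newvar))


# \x.dog(x) becomes \z.dog(z)
converted = alpha_convert(("lambda", "x", ("app", "dog", "x")), "z")
```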
- nltk.sem.logic.VariableExpression(variable)[source]¶
This is a factory method that instantiates and returns a subtype of AbstractVariableExpression appropriate for the given variable.
- nltk.sem.logic.is_eventvar(expr)[source]¶
An event variable must be a single lowercase ‘e’ character followed by zero or more digits.
Parameters: expr – str Returns: bool True if expr is of the correct form
- nltk.sem.logic.is_funcvar(expr)[source]¶
A function variable must be a single uppercase character followed by zero or more digits.
Parameters: expr – str Returns: bool True if expr is of the correct form
- nltk.sem.logic.is_indvar(expr)[source]¶
An individual variable must be a single lowercase character other than ‘e’, followed by zero or more digits.
Parameters: expr – str Returns: bool True if expr is of the correct form
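The three naming rules can be expressed as regular expressions. The sketch below is an independent restatement of the descriptions above, not the NLTK source; restricting letters to ASCII is an assumption, and classify is a hypothetical helper.

```python
import re

EVENT_VAR = re.compile(r"e\d*")        # 'e' followed by zero or more digits
FUNC_VAR = re.compile(r"[A-Z]\d*")     # one uppercase letter, then digits
IND_VAR = re.compile(r"[a-df-z]\d*")   # one lowercase letter other than 'e'


def classify(name):
    """Return which kind of variable name `name` is, or 'constant'."""
    if EVENT_VAR.fullmatch(name):
        return "event"
    if FUNC_VAR.fullmatch(name):
        return "function"
    if IND_VAR.fullmatch(name):
        return "individual"
    return "constant"
```

For example, classify("e1") gives "event", classify("P") gives "function", classify("x12") gives "individual", and a longer name such as "john" falls through to "constant".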
- nltk.sem.logic.parse_logic(s, logic_parser=None, encoding=None)[source]¶
Convert a file of First Order Formulas into a list of Expressions.
Parameters: - s (str) – the contents of the file
- logic_parser (LogicParser) – The parser to be used to parse the logical expression
- encoding (str) – the encoding of the input string, if it is binary
Returns: a list of parsed formulas.
Return type: list(Expression)
- nltk.sem.logic.skolem_function(univ_scope=None)[source]¶
Return a Skolem function over the variables in univ_scope.
Parameters: univ_scope
nltk.sem.relextract module¶
Code for extracting relational triples from the ieer and conll2002 corpora.
Relations are stored internally as dictionaries (‘reldicts’).
The two serialization outputs are “rtuple” and “clause”.
- An rtuple is a tuple of the form (subj, filler, obj), where subj and obj are pairs of Named Entity mentions, and filler is the string of words occurring between subj and obj (with no intervening NEs). Strings are printed via repr() to circumvent locale variations in rendering utf-8 encoded strings.
- A clause is an atom of the form relsym(subjsym, objsym), where the relation, subject and object have been canonicalized to single strings.
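The two output shapes can be sketched from a single relation dictionary. The values below are hypothetical, the keys follow the reldict keys documented for semi_rel2reldict, and reading each "pair" as (class, text) is an assumption; as_rtuple and as_clause are illustrative helpers, not the module's rtuple() and clause().

```python
# Hypothetical relation dictionary.
reldict = {
    "subjclass": "ORGANIZATION", "subjtext": "WHYY", "subjsym": "whyy",
    "filler": "in",
    "objclass": "LOCATION", "objtext": "Philadelphia", "objsym": "philadelphia",
}


def as_rtuple(rel):
    """Build (subj, filler, obj) with subj and obj as (class, text) pairs."""
    subj = (rel["subjclass"], rel["subjtext"])
    obj = (rel["objclass"], rel["objtext"])
    return (subj, rel["filler"], obj)


def as_clause(rel, relsym):
    """Build an atom of the form relsym(subjsym, objsym)."""
    return f"{relsym}({rel['subjsym']}, {rel['objsym']})"


triple = as_rtuple(reldict)
atom = as_clause(reldict, "IN")
```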
- nltk.sem.relextract.class_abbrev(type)[source]¶
Abbreviate an NE class name.
Parameters: type (str)
Return type: str
- nltk.sem.relextract.clause(reldict, relsym)[source]¶
Print the relation in clausal form.
Parameters: - reldict (defaultdict) – a relation dictionary
- relsym (str) – a label for the relation
- nltk.sem.relextract.conllned(trace=1)[source]¶
Find the copula+’van’ relation (‘of’) in the Dutch tagged training corpus from CoNLL 2002.
- nltk.sem.relextract.descape_entity(m, defs={'zwnj': '\u200c', 'aring': 'å', 'gt': '>', 'yen': '¥', 'ograve': 'ò', 'Chi': 'Χ', 'delta': 'δ', 'rang': '〉', 'trade': '™', 'Ntilde': 'Ñ', 'upsih': 'ϒ', 'Yacute': 'Ý', 'Atilde': 'Ã', 'radic': '√', 'otimes': '⊗', 'aelig': 'æ', 'oelig': 'œ', 'equiv': '≡', 'ni': '∋', 'Psi': 'Ψ', 'auml': 'ä', 'cup': '∪', 'Acirc': 'Â', 'Epsilon': 'Ε', 'minus': '−', 'otilde': 'õ', 'lt': '<', 'Icirc': 'Î', 'Eacute': 'É', 'Oacute': 'Ó', 'sbquo': '‚', 'Prime': '″', 'oslash': 'ø', 'psi': 'ψ', 'Kappa': 'Κ', 'rsaquo': '›', 'acute': '´', 'uacute': 'ú', 'sigmaf': 'ς', 'lrm': '\u200e', 'zwj': '\u200d', 'cedil': '¸', 'Xi': 'Ξ', 'uml': '¨', 'not': '¬', 'ensp': '\u2002', 'AElig': 'Æ', 'prime': '′', 'Tau': 'Τ', 'lceil': '⌈', 'iquest': '¿', 'alefsym': 'ℵ', 'laquo': '«', 'dArr': '⇓', 'rdquo': '”', 'ge': '≥', 'Igrave': 'Ì', 'reg': '®', 'micro': 'µ', 'shy': '\xad', 'sdot': '⋅', 'nbsp': '\xa0', 'lfloor': '⌊', 'lArr': '⇐', 'Auml': 'Ä', 'brvbar': '¦', 'Otilde': 'Õ', 'szlig': 'ß', 'clubs': '♣', 'agrave': 'à', 'Ocirc': 'Ô', 'Theta': 'Θ', 'Pi': 'Π', 'OElig': 'Œ', 'Scaron': 'Š', 'thetasym': 'ϑ', 'egrave': 'è', 'sub': '⊂', 'iexcl': '¡', 'frac12': '½', 'ordf': 'ª', 'sum': '∑', 'frac14': '¼', 'prop': '∝', 'Uuml': 'Ü', 'ntilde': 'ñ', 'sup': '⊃', 'asymp': '≈', 'theta': 'θ', 'prod': '∏', 'nsub': '⊄', 'hArr': '⇔', 'rArr': '⇒', 'Oslash': 'Ø', 'nu': 'ν', 'THORN': 'Þ', 'yuml': 'ÿ', 'infin': '∞', 'Mu': 'Μ', 'le': '≤', 'thinsp': '\u2009', 'ecirc': 'ê', 'bdquo': '„', 'Sigma': 'Σ', 'fnof': 'ƒ', 'Aring': 'Å', 'tilde': '˜', 'frac34': '¾', 'nabla': '∇', 'mdash': '—', 'uarr': '↑', 'permil': '‰', 'Ugrave': 'Ù', 'eta': 'η', 'Agrave': 'À', 'sup1': '¹', 'forall': '∀', 'circ': 'ˆ', 'eth': 'ð', 'rceil': '⌉', 'iuml': 'ï', 'gamma': 'γ', 'lambda': 'λ', 'harr': '↔', 'Egrave': 'È', 'xi': 'ξ', 'real': 'ℜ', 'divide': '÷', 'Ouml': 'Ö', 'image': 'ℑ', 'hellip': '…', 'igrave': 'ì', 'Yuml': 'Ÿ', 'ang': '∠', 'sube': '⊆', 'loz': '◊', 'frasl': '⁄', 'ETH': 'Ð', 'lowast': '∗', 'Nu': 'Ν', 'plusmn': '±', 
'omega': 'ω', 'chi': 'χ', 'sup2': '²', 'sup3': '³', 'Aacute': 'Á', 'cent': '¢', 'Iacute': 'Í', 'oline': '‾', 'Ecirc': 'Ê', 'Beta': 'Β', 'perp': '⊥', 'emsp': '\u2003', 'there4': '∴', 'pi': 'π', 'iota': 'ι', 'empty': '∅', 'euml': 'ë', 'notin': '∉', 'Upsilon': 'Υ', 'para': '¶', 'epsilon': 'ε', 'Delta': 'Δ', 'weierp': '℘', 'uuml': 'ü', 'part': '∂', 'icirc': 'î', 'bull': '•', 'omicron': 'ο', 'upsilon': 'υ', 'copy': '©', 'Iuml': 'Ï', 'Lambda': 'Λ', 'spades': '♠', 'ndash': '–', 'kappa': 'κ', 'ccedil': 'ç', 'Ucirc': 'Û', 'cap': '∩', 'ocirc': 'ô', 'mu': 'μ', 'scaron': 'š', 'lsquo': '‘', 'isin': '∈', 'Zeta': 'Ζ', 'supe': '⊇', 'deg': '°', 'and': '∧', 'tau': 'τ', 'pound': '£', 'curren': '¤', 'int': '∫', 'ucirc': 'û', 'rfloor': '⌋', 'crarr': '↵', 'ugrave': 'ù', 'exist': '∃', 'cong': '≅', 'Dagger': '‡', 'oplus': '⊕', 'times': '×', 'atilde': 'ã', 'piv': 'ϖ', 'iacute': 'í', 'Euml': 'Ë', 'Phi': 'Φ', 'raquo': '»', 'lsaquo': '‹', 'quot': '"', 'Uacute': 'Ú', 'Omicron': 'Ο', 'ne': '≠', 'Iota': 'Ι', 'eacute': 'é', 'rarr': '→', 'yacute': 'ý', 'Rho': 'Ρ', 'darr': '↓', 'Alpha': 'Α', 'zeta': 'ζ', 'Omega': 'Ω', 'acirc': 'â', 'sim': '∼', 'phi': 'φ', 'diams': '♦', 'macr': '¯', 'larr': '←', 'Ccedil': 'Ç', 'ordm': 'º', 'uArr': '⇑', 'beta': 'β', 'Eta': 'Η', 'rho': 'ρ', 'aacute': 'á', 'alpha': 'α', 'rlm': '\u200f', 'middot': '·', 'Gamma': 'Γ', 'euro': '€', 'lang': '〈', 'dagger': '†', 'amp': '&', 'rsquo': '’', 'thorn': 'þ', 'ouml': 'ö', 'or': '∨', 'Ograve': 'Ò', 'sect': '§', 'ldquo': '“', 'hearts': '♥', 'sigma': 'σ', 'oacute': 'ó'})[source]¶
Translate one entity to its ISO Latin value. Inspired by example from effbot.org
- nltk.sem.relextract.extract_rels(subjclass, objclass, doc, corpus='ace', pattern=None, window=10)[source]¶
Filter the output of semi_rel2reldict according to specified NE classes and a filler pattern.
The parameters subjclass and objclass can be used to restrict the Named Entities to particular types (any of ‘LOCATION’, ‘ORGANIZATION’, ‘PERSON’, ‘DURATION’, ‘DATE’, ‘CARDINAL’, ‘PERCENT’, ‘MONEY’, ‘MEASURE’).
Parameters: - subjclass (str) – the class of the subject Named Entity.
- objclass (str) – the class of the object Named Entity.
- doc (ieer document or a list of chunk trees) – input document
- corpus (str) – name of the corpus to take as input; possible values are ‘ieer’ and ‘conll2002’
- pattern (SRE_Pattern) – a regular expression for filtering the fillers of retrieved triples.
- window (int) – filters out fillers which exceed this threshold
Returns: see mk_reldicts
Return type: list(defaultdict)
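The filtering step can be sketched over plain dictionaries. This is not the extract_rels implementation: filter_rels is a hypothetical helper, measuring the window in whitespace-separated words and applying the pattern with match() are assumptions, and the sample data is invented.

```python
import re


def filter_rels(reldicts, subjclass, objclass, pattern=None, window=10):
    """Keep relations with the requested NE classes and an acceptable filler."""
    kept = []
    for rel in reldicts:
        if rel["subjclass"] != subjclass or rel["objclass"] != objclass:
            continue  # wrong Named Entity classes
        if len(rel["filler"].split()) > window:
            continue  # filler exceeds the window threshold
        if pattern is not None and not pattern.match(rel["filler"]):
            continue  # filler does not match the regular expression
        kept.append(rel)
    return kept


candidates = [
    {"subjclass": "ORGANIZATION", "objclass": "LOCATION", "filler": "in"},
    {"subjclass": "PERSON", "objclass": "LOCATION", "filler": "in"},
    {"subjclass": "ORGANIZATION", "objclass": "LOCATION",
     "filler": "announced a merger with"},
]
IN = re.compile(r".*\bin\b")
matches = filter_rels(candidates, "ORGANIZATION", "LOCATION", pattern=IN)
```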
- nltk.sem.relextract.in_demo(trace=0, sql=True)[source]¶
Select pairs of organizations and locations whose mentions occur with an intervening occurrence of the preposition “in”.
If the sql parameter is set to True, then the entity pairs are loaded into an in-memory database, and subsequently pulled out using an SQL “SELECT” query.
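The database round trip can be sketched with the standard sqlite3 module. The table name, column names and entity pairs below are hypothetical and are not taken from in_demo.

```python
import sqlite3

# Hypothetical (organization, location) pairs.
pairs = [("WHYY", "Philadelphia"), ("Freedom Forum", "Arlington")]

# ":memory:" creates a database that lives only for this connection.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Locations (OrgName TEXT, LocationName TEXT)")
connection.executemany("INSERT INTO Locations VALUES (?, ?)", pairs)

# Pull matching organizations back out with a parameterized SELECT.
rows = connection.execute(
    "SELECT OrgName FROM Locations WHERE LocationName = ?", ("Arlington",)
).fetchall()
connection.close()
```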
- nltk.sem.relextract.list2sym(lst)[source]¶
Convert a list of strings into a canonical symbol.
Parameters: lst (list)
Returns: a Unicode string without whitespace
Return type: unicode
- nltk.sem.relextract.rtuple(reldict, lcon=False, rcon=False)[source]¶
Pretty print the reldict as an rtuple.
Parameters: reldict (defaultdict) – a relation dictionary
- nltk.sem.relextract.semi_rel2reldict(pairs, window=5, trace=False)[source]¶
Converts the pairs generated by _tree2semi_rel into a ‘reldict’: a dictionary which stores information about the subject and object NEs plus the filler between them. Additionally, a left and right context of length <= window are captured (within a given input sentence).
Parameters: - pairs – a pair of list(str) and Tree, as generated by
- window (int) – a threshold for the number of items to include in the left and right context
Returns: ‘relation’ dictionaries whose keys are ‘lcon’, ‘subjclass’, ‘subjtext’, ‘subjsym’, ‘filler’, ‘objclass’, ‘objtext’, ‘objsym’ and ‘rcon’
Return type: list(defaultdict)
nltk.sem.skolemize module¶
nltk.sem.util module¶
Utility functions for batch-processing sentences: parsing and extraction of the semantic representation of the root node of the syntax tree, followed by evaluation of the semantic representation in a first-order model.
- nltk.sem.util.batch_evaluate(inputs, grammar, model, assignment, trace=0)[source]¶
Add the truth-in-a-model value to each semantic representation for each syntactic parse of each input sentence.
Parameters: - inputs – a list of sentences
- grammar – FeatureGrammar or name of feature-based grammar
Returns: a mapping from sentences to lists of triples (parse-tree, semantic-representations, evaluation-in-model)
Return type: dict
- nltk.sem.util.batch_interpret(inputs, grammar, semkey='SEM', trace=0)[source]¶
Add the semantic representation to each syntactic parse tree of each input sentence.
Parameters: - inputs – a list of sentences
- grammar – FeatureGrammar or name of feature-based grammar
Returns: a mapping from sentences to lists of pairs (parse-tree, semantic-representations)
Return type: dict
- nltk.sem.util.batch_parse(inputs, grammar, trace=0)[source]¶
Convert input sentences into syntactic trees.
Parameters: - inputs (list of str) – sentences to be parsed
- grammar – FeatureGrammar or name of feature-based grammar
Return type: dict
Returns: a mapping from input sentences to a list of Trees
- nltk.sem.util.demo_legacy_grammar()[source]¶
Check that batch_interpret() is compatible with legacy grammars that use a lowercase ‘sem’ feature.
Define ‘test.fcfg’ to be the following
- nltk.sem.util.parse_valuation(s, encoding=None)[source]¶
Convert a valuation file into a valuation.
Parameters: - s (str) – the contents of a valuation file
- encoding (str) – the encoding of the input string, if it is binary
Returns: a nltk.sem valuation
Return type: Valuation
- nltk.sem.util.parse_valuation_line(s, encoding=None)[source]¶
Parse a line in a valuation file.
Lines are expected to be of the form:
noosa => n
girl => {g1, g2}
chase => {(b1, g1), (b2, g1), (g1, d1), (g2, d2)}
Parameters: - s (str) – input line
- encoding (str) – the encoding of the input string, if it is binary
Returns: a pair (symbol, value)
Return type: tuple
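The line format can be handled with a few string operations. The sketch below is a simplified stand-in, not the parse_valuation_line implementation; parse_line is a hypothetical helper that assumes well-formed input and performs no error handling.

```python
import re


def parse_line(line):
    """Split 'symbol => value' into a string, a set, or a set of tuples."""
    symbol, value = (part.strip() for part in line.split("=>", 1))
    if not value.startswith("{"):
        return symbol, value  # a single entity
    body = value[1:-1]  # drop the surrounding braces
    tuples = re.findall(r"\(([^)]*)\)", body)
    if tuples:
        # A relation: each parenthesized group becomes a tuple of entities.
        return symbol, {
            tuple(item.strip() for item in group.split(",")) for group in tuples
        }
    # A property: a plain set of entities.
    return symbol, {item.strip() for item in body.split(",") if item.strip()}
```

Each call returns a pair (symbol, value), matching the return type documented above.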
- nltk.sem.util.root_semrep(syntree, semkey='SEM')[source]¶
Find the semantic representation at the root of a tree.
Parameters: - syntree – a parse Tree
- semkey – the feature label to use for the root semantics in the tree
Returns: the semantic representation at the root of a Tree
Return type: sem.Expression
Module contents¶
NLTK Semantic Interpretation Package
This package contains classes for representing semantic structure in formulas of first-order logic and for evaluating such formulas in set-theoretic models.
>>> from nltk.sem import logic
>>> logic._counter._value = 0
The package has two main components:
- logic provides a parser for analyzing expressions of First Order Logic (FOL).
- evaluate allows users to recursively determine truth in a model for formulas of FOL.
A model consists of a domain of discourse and a valuation function, which assigns values to non-logical constants. We assume that entities in the domain are represented as strings such as 'b1', 'g1', etc. A Valuation is initialized with a list of (symbol, value) pairs, where values are entities, sets of entities or sets of tuples of entities. The domain of discourse can be inferred from the valuation, and a model is then created with domain and valuation as parameters.
>>> from nltk.sem import Valuation, Model
>>> v = [('adam', 'b1'), ('betty', 'g1'), ('fido', 'd1'),
... ('girl', set(['g1', 'g2'])), ('boy', set(['b1', 'b2'])),
... ('dog', set(['d1'])),
... ('love', set([('b1', 'g1'), ('b2', 'g2'), ('g1', 'b1'), ('g2', 'b1')]))]
>>> val = Valuation(v)
>>> dom = val.domain
>>> m = Model(dom, val)
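How a domain can be inferred from the (symbol, value) pairs is sketched below in plain Python. This is an illustration of the idea, not the Valuation.domain implementation; infer_domain is a hypothetical helper.

```python
def infer_domain(pairs):
    """Collect every entity mentioned in the values of (symbol, value) pairs."""
    domain = set()
    for _symbol, value in pairs:
        if isinstance(value, str):
            domain.add(value)        # the value is a single entity
            continue
        for item in value:
            if isinstance(item, tuple):
                domain.update(item)  # a tuple of entities
            else:
                domain.add(item)     # an entity inside a set
    return domain


v = [('adam', 'b1'), ('betty', 'g1'), ('fido', 'd1'),
     ('girl', set(['g1', 'g2'])), ('boy', set(['b1', 'b2'])),
     ('dog', set(['d1'])),
     ('love', set([('b1', 'g1'), ('b2', 'g2'), ('g1', 'b1'), ('g2', 'b1')]))]
domain = infer_domain(v)
```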