nltk.model package

Submodules

nltk.model.api module

class nltk.model.api.ModelI

Bases: builtins.object

A processing interface for assigning a probability to the next word.

choose_random_word(context)

Randomly select a word that is likely to appear in this context.

entropy(text)

Evaluate the total entropy of a message with respect to the model. This is the sum of the negative log probabilities of the words in the message.
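As a rough illustration of the description above, the sketch below sums per-word negative log probabilities over a message. The probability table, the `logprob` helper, and the choice of base-2 logarithms are assumptions made for the example. It ignores context for brevity and is not the interface's implementation.

```python
import math

# Hypothetical context-free word probabilities, for illustration only.
PROBS = {"the": 0.5, "cat": 0.25, "sat": 0.25}


def logprob(word):
    """Negative base-2 log probability of a word (the base is an assumption)."""
    return -math.log2(PROBS[word])


def total_entropy(message):
    """Sum the negative log probabilities of every word in the message."""
    return sum(logprob(word) for word in message)


print(total_entropy(["the", "cat", "sat"]))  # 1 + 2 + 2 bits -> 5.0
```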

generate(n)

Generate n words of text from the language model.
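One plausible shape for generation is a loop that repeatedly samples a next word given the most recent word and appends it to the output. The sketch below assumes a hypothetical bigram table `NEXT_WORDS`, plus `start` and `seed` arguments that the interface does not have. It only mirrors the method names listed here and is not NLTK's implementation.

```python
import random

# Hypothetical bigram table: previous word -> {next word: probability}.
NEXT_WORDS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"the": 1.0},
}


def choose_random_word(context, rng):
    """Sample a next word in proportion to its probability after context[-1]."""
    candidates = NEXT_WORDS[context[-1]]
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def generate(n, start="the", seed=0):
    """Generate n words, each sampled given the word before it."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    text = [start]
    while len(text) < n:
        text.append(choose_random_word(text, rng))
    return text[:n]


print(generate(5))
```

Passing the random generator in explicitly keeps the sampling step testable; a real model would condition on more than one previous word when n is larger than 2.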

logprob(word, context)

Evaluate the (negative) log probability of this word in this context.

prob(word, context)

Evaluate the probability of this word in this context.

nltk.model.ngram module

class nltk.model.ngram.NgramModel(n, train, pad_left=True, pad_right=False, estimator=None, *estimator_args, **estimator_kwargs)

Bases: nltk.model.api.ModelI

A processing interface for assigning a probability to the next word.
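To make the constructor arguments more concrete, the sketch below counts padded n-grams from a token list and derives a maximum-likelihood next-word estimate. The padding symbol, the helper names, and the maximum-likelihood estimate are all assumptions for illustration. The class itself takes an `estimator` argument whose behavior is not described on this page, so this is not a reproduction of NgramModel.

```python
from collections import Counter, defaultdict

PAD = "<pad>"  # hypothetical padding symbol, chosen for this example


def padded_ngrams(tokens, n, pad_left=True, pad_right=False):
    """Return the n-grams of tokens, optionally padded on either side."""
    seq = list(tokens)
    if pad_left:
        seq = [PAD] * (n - 1) + seq
    if pad_right:
        seq = seq + [PAD] * (n - 1)
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def train_counts(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for gram in padded_ngrams(tokens, n):
        counts[gram[:-1]][gram[-1]] += 1
    return counts


def mle_prob(counts, word, context):
    """Maximum-likelihood estimate of P(word | context); 0.0 if context is unseen."""
    seen = counts.get(tuple(context))
    if not seen:
        return 0.0
    return seen[word] / sum(seen.values())


counts = train_counts("the cat sat on the mat".split(), n=2)
# "the" is followed once by "cat" and once by "mat" in the training tokens.
print(mle_prob(counts, "cat", ["the"]))
```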

choose_random_word(context)

Randomly select a word that is likely to appear in this context.

Parameters: context (list(str)) – the context the word is in

entropy(text)

Calculate the approximate cross-entropy of the n-gram model for a given evaluation text. This is the average negative log probability of the words in the text.

Parameters: text (list(str)) – words to use for evaluation

generate(num_words, context=())

Generate random text based on the language model.

Parameters:
  • num_words (int) – number of words to generate
  • context (list(str)) – initial words in generated string

logprob(word, context)

Evaluate the (negative) log probability of this word in this context.

Parameters:
  • word (str) – the word to get the probability of
  • context (list(str)) – the context the word is in

perplexity(text)

Calculate the perplexity of the given text. This is 2 ** cross-entropy for the text.

Parameters: text (list(str)) – words to calculate perplexity of

prob(word, context)

Evaluate the probability of this word in this context using Katz Backoff.

Parameters:
  • word (str) – the word to get the probability of
  • context (list(str)) – the context the word is in

unicode_repr()

nltk.model.ngram.teardown_module(module=None)
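The prob, entropy, and perplexity methods of NgramModel described above fit together roughly as sketched below. This is a simplified Katz-style backoff for bigrams that uses a fixed absolute discount in place of the Good-Turing discounting usually associated with Katz backoff. The discount `D`, the toy corpus, and every function name are assumptions for illustration, and the code is not NLTK's implementation.

```python
import math
from collections import Counter

D = 0.5  # hypothetical absolute discount taken from each seen bigram count

tokens = "the cat sat on the mat the cat ran".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
total = sum(unigrams.values())
# How often each word appears as a bigram context (every position but the last).
context_counts = Counter(tokens[:-1])


def unigram_prob(word):
    """Maximum-likelihood unigram probability."""
    return unigrams[word] / total


def backoff_prob(word, prev):
    """P(word | prev): discounted bigram estimate if seen, else scaled unigram."""
    if context_counts[prev] == 0:
        return unigram_prob(word)  # unknown context: rely on the unigram alone
    if bigrams[(prev, word)] > 0:
        return (bigrams[(prev, word)] - D) / context_counts[prev]
    followers = [w for (p, w) in bigrams if p == prev]
    # Probability mass freed by discounting the seen bigrams of this context.
    reserved = D * len(followers) / context_counts[prev]
    # Unigram mass of the words never seen after prev, used to renormalize.
    unseen_mass = 1.0 - sum(unigram_prob(w) for w in followers)
    return reserved * unigram_prob(word) / unseen_mass


def cross_entropy(text):
    """Average negative log2 probability per predicted word."""
    pairs = list(zip(text, text[1:]))
    return -sum(math.log2(backoff_prob(w, p)) for p, w in pairs) / len(pairs)


def perplexity(text):
    """Perplexity as 2 raised to the cross-entropy."""
    return 2 ** cross_entropy(text)


sample = ["the", "cat", "sat"]
print(cross_entropy(sample), perplexity(sample))
```

The sketch assumes every evaluation word occurs in the training tokens; an out-of-vocabulary word would receive probability zero and make the logarithm undefined.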

Module contents