ngram package

Submodules

ngram.Corpus module

class ngram.Corpus.Corpus(filename)[source]

Defines a corpus data file and reads information from it, restricted to the Tamil letters it contains.

next_tamil_letter()[source]

ngram.Distance module

ngram.Distance.Dice_coeff(wordA, wordB)[source]

Calculates a distance between wordA and wordB by implementing the Dice coefficient. Ref: https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient The returned value lies between 0 and 1.0 and can be used as a similarity match.
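The following is a minimal sketch of a Sørensen-Dice coefficient computed over adjacent letter pairs, included only to illustrate the measure described above. The function name dice_similarity and the choice of bigrams as the compared units are assumptions; the actual implementation of ngram.Distance.Dice_coeff may differ. Because a Tamil letter often spans several Unicode code points, the sketch accepts any sequence, so a list of already-tokenized letters can be passed instead of a raw string.

```python
def dice_similarity(seq_a, seq_b):
    """Sørensen-Dice coefficient over adjacent pairs (illustrative only).

    seq_a and seq_b may be strings or lists of letters. This is a
    hypothetical helper, not the library's Dice_coeff.
    """
    def pairs(seq):
        # Set of adjacent pairs; tuples keep this hashable for lists too.
        return {tuple(seq[i:i + 2]) for i in range(len(seq) - 1)}

    set_a, set_b = pairs(seq_a), pairs(seq_b)
    if not set_a and not set_b:
        # Sequences shorter than two items have no pairs to compare.
        return 1.0 if list(seq_a) == list(seq_b) else 0.0
    return 2.0 * len(set_a & set_b) / (len(set_a) + len(set_b))


print(dice_similarity("night", "nacht"))  # one shared pair out of 4 + 4
```

A result of 1.0 means the two pair sets are identical and 0.0 means they share nothing.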

ngram.Distance.Jaccard_coeff(*args)[source]

ngram.Distance.edit_distance(wordA, wordB)[source]

Implements the Damerau-Levenshtein edit distance algorithm. Ref: https://en.wikipedia.org/wiki/Edit_distance Ref: https://en.wikipedia.org/wiki/Levenshtein_distance
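As an illustration of the kind of computation an edit distance involves, here is a minimal sketch of the plain Levenshtein distance (insertions, deletions, and substitutions, each with cost 1) using the standard dynamic-programming recurrence. The function name levenshtein is hypothetical, and the sketch does not handle transpositions; it is not taken from ngram.Distance.edit_distance, whose details are not shown on this page.

```python
def levenshtein(seq_a, seq_b):
    """Plain Levenshtein distance between two sequences (illustrative only)."""
    # previous[j] holds the distance between the processed prefix of seq_a
    # and the first j items of seq_b.
    previous = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]  # distance from the first i items of seq_a to an empty seq_b
        for j, item_b in enumerate(seq_b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second sequence rather than with the product of both lengths.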

ngram.LetterModels module

class ngram.LetterModels.Bigram(filename)[source]

Bases: ngram.LetterModels.Unigram

language_model(verbose=True)[source]

Builds a Tamil bigram letter model.
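To make the idea of a bigram letter model concrete, the sketch below counts how often each adjacent pair of letters occurs in a sequence. The name bigram_counts is hypothetical, and the input is assumed to be an iterable of already-separated letters; how the Bigram class actually reads its corpus and stores its model is not documented here.

```python
from collections import Counter


def bigram_counts(letters):
    """Count adjacent letter pairs in a sequence (illustrative only)."""
    letters = list(letters)
    # zip pairs each letter with its successor; the last letter has none.
    return Counter(zip(letters, letters[1:]))


counts = bigram_counts("abab")
print(counts.most_common())
```

The same pattern extends to trigrams by zipping three offset copies of the sequence.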

save(filename)[source]

class ngram.LetterModels.Letters(filename)[source]

save(filename)[source]

class ngram.LetterModels.Trigram(filename)[source]

Bases: ngram.LetterModels.Unigram

language_model(verbose=True)[source]

Builds a Tamil trigram letter model.

save(filename)[source]

class ngram.LetterModels.Unigram(filename)[source]

Bases: ngram.LetterModels.Letters

frequency_model()[source]

Builds a letter frequency model for Tamil letters from a corpus.
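A letter frequency model can be sketched as a mapping from each letter to its share of all letters seen. The helper letter_frequencies below is hypothetical, and returning relative frequencies rather than raw counts is an assumption; this page does not state which form Unigram.frequency_model produces.

```python
from collections import Counter


def letter_frequencies(letters):
    """Relative frequency of each letter in a sequence (illustrative only)."""
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        return {}  # an empty corpus has no frequencies to report
    return {letter: count / total for letter, count in counts.items()}


freqs = letter_frequencies("aab")
print(freqs)
```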

save(filename)[source]

ngram.WordModels module

ngram.WordModels.get_ngram_groups(word, n=1)[source]

Module contents