ngram package

Submodules

ngram.Corpus module

class ngram.Corpus.Corpus(filename)

Bases: object

Represents a corpus data file and reads only the Tamil letters from it.

next_tamil_letter()

ngram.Distance module

ngram.Distance.Dice_coeff(wordA, wordB)

Implements the Dice coefficient for two words. The result lies between 0 and 1.0 and can be used as a similarity match. Ref: https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
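As an illustration of the coefficient itself, here is a minimal sketch computed over character bigrams. The names `bigrams` and `dice_sketch` are hypothetical and not part of this package, and the sketch splits on Python characters (code points) rather than Tamil letters, so it may not match how Dice_coeff tokenizes its input.

```python
def bigrams(word):
    """Return the set of adjacent character pairs in word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_sketch(word_a, word_b):
    """Dice coefficient over character bigrams: 2*|A & B| / (|A| + |B|)."""
    if word_a == word_b:
        return 1.0
    a, b = bigrams(word_a), bigrams(word_b)
    if not a and not b:
        # Two different words that are too short to contain any bigram.
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


print(dice_sketch("night", "nacht"))  # one shared bigram ("ht") out of 4 + 4
```

A value of 1.0 means the two bigram sets are identical, and 0.0 means they share nothing.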

ngram.Distance.Jaccard_coeff(*args)
ngram.Distance.edit_distance(wordA, wordB)

Implements the Damerau-Levenshtein edit distance algorithm. Ref: https://en.wikipedia.org/wiki/Edit_distance Ref: https://en.wikipedia.org/wiki/Levenshtein_distance
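For orientation, the following is a minimal sketch of the classic Levenshtein distance described in the second reference above. It counts insertions, deletions and substitutions only (no transpositions), works on Python characters rather than Tamil letters, and uses the hypothetical name `levenshtein_sketch`; it is not the code behind edit_distance.

```python
def levenshtein_sketch(word_a, word_b):
    """Minimum number of single-character edits turning word_a into word_b."""
    # previous[j] holds the distance between the processed prefix of word_a
    # and the first j characters of word_b.
    previous = list(range(len(word_b) + 1))
    for i, ch_a in enumerate(word_a, start=1):
        current = [i]  # deleting all i characters reaches the empty prefix
        for j, ch_b in enumerate(word_b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("kitten", "sitting"))  # 3
```

Keeping only two rows of the table keeps memory proportional to the length of the second word.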

ngram.LetterModels module

class ngram.LetterModels.Bigram(filename)

Bases: ngram.LetterModels.Unigram

language_model(verbose=True)

Builds a Tamil bigram letter model.
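The idea of a bigram letter model can be sketched as counting adjacent letter pairs. This is an assumption about the general technique, not the class's actual implementation; `bigram_counts_sketch` is a hypothetical helper that expects the text already split into letters.

```python
from collections import Counter


def bigram_counts_sketch(letters):
    """Count each pair of adjacent letters in a sequence of letters."""
    letters = list(letters)
    # zip pairs every letter with its successor.
    return Counter(zip(letters, letters[1:]))


# Letters are passed as separate strings because one Tamil letter
# can span several code points.
counts = bigram_counts_sketch(["அ", "ம்", "மா"])
print(counts[("அ", "ம்")])  # 1
```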

save(filename)
class ngram.LetterModels.Letters(filename)

Bases: object

save(filename)
update_file(filename)
class ngram.LetterModels.Trigram(filename)

Bases: ngram.LetterModels.Unigram

language_model(verbose=True)

Builds a Tamil trigram letter model.

save(filename)
class ngram.LetterModels.Unigram(filename)

Bases: ngram.LetterModels.Letters

frequency_model()

Builds a letter frequency model for Tamil letters from a corpus.
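A letter frequency model can be pictured as a table of how often each letter occurs. The sketch below assumes relative frequencies (counts divided by the total); whether frequency_model stores raw counts or ratios is not stated here, and `letter_frequencies_sketch` is a hypothetical name.

```python
from collections import Counter


def letter_frequencies_sketch(letters):
    """Map each letter to its share of all letters seen."""
    counts = Counter(letters)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {letter: count / total for letter, count in counts.items()}


freqs = letter_frequencies_sketch(["a", "b", "a", "a"])
print(freqs["a"])  # 0.75
```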

save(filename)

ngram.WordModels module

ngram.WordModels.get_ngram_groups(word, n=1)
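The function carries no description here. Judging from its name and signature alone, it presumably splits a word into groups of n consecutive letters. The sliding-window sketch below illustrates that reading; it is a guess at the behavior, and `ngram_groups_sketch` is a hypothetical name that takes a sequence of letters rather than a raw word.

```python
def ngram_groups_sketch(letters, n=1):
    """Return every run of n consecutive letters, joined into one string."""
    letters = list(letters)
    # The window starts at each index that still leaves n letters to the right.
    return ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]


print(ngram_groups_sketch(["a", "b", "c", "d"], n=2))  # ['ab', 'bc', 'cd']
```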

Module contents