solthiruthi package¶

A trie in which the number of letters stored at each node grows as words are added. The implementation uses a dictionary, and each node carries a count attribute that records the frequency of its letter.

add(word)[source]¶

getAllWords()[source]¶

getAllWordsAndCount()[source]¶

getAllWordsHelper(ref_trie, prefix, all_words)[source]¶

getAllWordsIterable()[source]¶

getAllWordsIterableHelper(ref_trie, prefix)[source]¶

getAllWordsPrefix(prefix)[source]¶

getWordCount(word)[source]¶

hasWordPrefix(wrd_prefix)[source]¶

isWord(word, ret_ref_trie=False)[source]¶: Returns a boolean as the first output; the second output is the reference trie.

isWordAndTrie(word, prefix=False)[source]¶
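The description above (dictionary-backed nodes that grow on demand, each with a count) can be illustrated with a minimal sketch. This is not the package's source: the names `CountingTrie`, `_Node`, `terminal`, `has_prefix` and `words_with_prefix` are hypothetical, and reading count as "number of added words passing through this letter" is an assumption.

```python
class _Node:
    """One trie node; hypothetical helper, not part of solthiruthi."""

    def __init__(self):
        self.children = {}     # letter -> _Node; grows only when a new letter is seen
        self.count = 0         # assumed meaning: words added that pass through this letter
        self.terminal = False  # True when a complete word ends here


class CountingTrie:
    """Dictionary-backed trie sketch with per-letter counts."""

    def __init__(self):
        self.root = _Node()

    def add(self, word):
        node = self.root
        for letter in word:
            if letter not in node.children:
                node.children[letter] = _Node()
            node = node.children[letter]
            node.count += 1
        node.terminal = True

    def _find(self, prefix):
        # Walk down the trie; return None if the prefix is absent.
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return None
        return node

    def is_word(self, word):
        node = self._find(word)
        return node is not None and node.terminal

    def has_prefix(self, prefix):
        return self._find(prefix) is not None

    def words_with_prefix(self, prefix):
        # Depth-first walk below the prefix node, yielding complete words.
        node = self._find(prefix)
        if node is None:
            return
        stack = [(prefix, node)]
        while stack:
            text, current = stack.pop()
            if current.terminal:
                yield text
            for letter, child in current.children.items():
                stack.append((text + letter, child))


trie = CountingTrie()
for w in ("car", "cart", "cat"):
    trie.add(w)
print(sorted(trie.words_with_prefix("car")))
```

Because children live in a dictionary, a node costs memory only for the letters actually seen below it, which matters for a large alphabet such as Tamil.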

class solthiruthi.datastore.Node[source]¶

class solthiruthi.datastore.Queue[source]¶

Bases: list

ExceptionMsg = u'Queue does not support list method %s'¶

append(obj)[source]¶: L.append(object) – append object to end

insert(obj)[source]¶: L.insert(index, object) – insert object before index

isempty()[source]¶

peek()[source]¶: Look at the next item in line.

remove()[source]¶: L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.

reverse()[source]¶: L.reverse() – reverse IN PLACE

sort()[source]¶: L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1
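The method list above suggests a list subclass that exposes queue operations and rejects some inherited list methods with ExceptionMsg. A minimal sketch of that pattern follows. The class name `FifoQueue`, the choice of which end is the front, the set of rejected methods, and the use of `TypeError` are all assumptions, not the behaviour of solthiruthi.datastore.Queue.

```python
class FifoQueue(list):
    """Hypothetical FIFO queue built on list; illustration only."""

    ExceptionMsg = "Queue does not support list method %s"

    def insert(self, obj):
        # Assumption: new items join at the back of the queue.
        list.append(self, obj)

    def remove(self):
        # Assumption: items leave from the front (first in, first out).
        return list.pop(self, 0)

    def peek(self):
        # Return the front item without removing it.
        return self[0]

    def isempty(self):
        return len(self) == 0

    # Assumption: these inherited list methods are rejected.
    def append(self, obj):
        raise TypeError(self.ExceptionMsg % "append")

    def reverse(self):
        raise TypeError(self.ExceptionMsg % "reverse")

    def sort(self, *args, **kwargs):
        raise TypeError(self.ExceptionMsg % "sort")


queue = FifoQueue()
queue.insert("first")
queue.insert("second")
print(queue.peek(), queue.remove(), queue.isempty())
```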

class solthiruthi.datastore.RTrie(is_tamil=False)[source]¶

Bases: solthiruthi.datastore.DTrie

add(word)[source]¶

getAllWordsIterable()[source]¶

getAllWordsPrefix(pfx)[source]¶

getWordsEndingWith(sfx)[source]¶

reverse(word)[source]¶
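The combination of reverse(word) and getWordsEndingWith(sfx) points to a common technique: store each word reversed, so that a suffix query becomes a prefix query. The sketch below shows that idea under stated assumptions. `ReversedTrie` and `words_ending_with` are hypothetical names, and the sketch reverses by code point, so it is only safe for plain ASCII; Tamil words would need letter-aware reversal, which the is_tamil flag presumably selects.

```python
_END = object()  # sentinel key marking the end of a stored word


class ReversedTrie:
    """Hypothetical suffix-lookup trie; not the RTrie source."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in reversed(word):  # store the word back to front
            node = node.setdefault(ch, {})
        node[_END] = True

    def words_ending_with(self, suffix):
        # Follow the reversed suffix down from the root.
        node = self.root
        for ch in reversed(suffix):
            if ch not in node:
                return []
            node = node[ch]
        # Collect every stored word below that node.
        found = []
        stack = [(suffix[::-1], node)]
        while stack:
            rev, current = stack.pop()
            for key, child in current.items():
                if key is _END:
                    found.append(rev[::-1])  # undo the reversal
                else:
                    stack.append((rev + key, child))
        return found


rt = ReversedTrie()
for w in ("walking", "talking", "walked"):
    rt.add(w)
print(sorted(rt.words_ending_with("ing")))
```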

class solthiruthi.datastore.TamilTrie(get_idx=<function getidx>, invert_idx=<function tamil>, alphabet_len=323)[source]¶

Bases: solthiruthi.datastore.Trie

Store a list of words into the Trie data structure

add(word)[source]¶

static buildEnglishTrie()[source]¶

getAllWords()[source]¶

getAllWordsHelper(ref_trie, ref_word_limits, prefix, all_words)[source]¶

getAllWordsIterable()[source]¶

getAllWordsPrefix(prefix)[source]¶

hasWordPrefix(prefix)[source]¶

isWord(word, ret_ref_trie=False)[source]¶: Returns a boolean as the first output; the second output is the reference trie.

class solthiruthi.datastore.Trie[source]¶

Bases: object

add(word)[source]¶

static deserializeFromFile()[source]¶

getAllWords()[source]¶

getAllWordsIterable()[source]¶

getAllWordsPrefix(prefix)[source]¶

hasWordPrefix(prefix)[source]¶

isWord(word, ret_ref_trie=False)[source]¶: Returns a boolean as the first output; the second output is the reference trie.

loadWordFile(filename)[source]¶

static mk_empty_trie()[source]¶

static serializeToFile(filename)[source]¶

solthiruthi.datastore.do_load()[source]¶: 4 GB program - very inefficient

solthiruthi.datastore.do_stuff()[source]¶

solthiruthi.dictionary module¶

class solthiruthi.dictionary.Agarathi(dictionary_path, reverse=False)[source]¶

Bases: solthiruthi.dictionary.Dictionary

add(word)[source]¶

finalize()[source]¶

getAllWords()[source]¶

getAllWordsIterable()[source]¶

getDictionaryPath()[source]¶

getWordsEndingWith(sfx)[source]¶

getWordsStartingWith(pfx, limit=inf)[source]¶

hasWordsStartingWith(pfx)[source]¶

isWord(word)[source]¶

class solthiruthi.dictionary.Dictionary[source]¶

Bases: object

add(word)[source]¶

getAllWords()[source]¶

getAllWordsIterable()[source]¶

getDictionaryPath()[source]¶

getSize()[source]¶

getWordsEndingWith(sfx)[source]¶

getWordsStartingWith(pfx)[source]¶

hasWordsStartingWith(pfx)[source]¶

isWord(word)[source]¶

loadWordFile(pre_processor=None)[source]¶

class solthiruthi.dictionary.DictionaryBuilder[source]¶

static create()[source]¶

static createUsingWordList()[source]¶

class solthiruthi.dictionary.EmptyAgarathi[source]¶: Bases: solthiruthi.dictionary.Agarathi

class solthiruthi.dictionary.EnglishLinux[source]¶

Bases: solthiruthi.dictionary.Agarathi

add(word)[source]¶

isWord(word)[source]¶

class solthiruthi.dictionary.Madurai[source]¶: Bases: solthiruthi.dictionary.Agarathi

class solthiruthi.dictionary.TamilVU[source]¶: Bases: solthiruthi.dictionary.Agarathi

class solthiruthi.dictionary.Wikipedia[source]¶: Bases: solthiruthi.dictionary.Agarathi

solthiruthi.dictionary.reverse_Madurai()[source]¶

solthiruthi.dictionary.reverse_TamilVU()[source]¶

solthiruthi.dictionary.reverse_Wikipedia()[source]¶

solthiruthi.dom module¶

class solthiruthi.dom.Document(filename)[source]¶

Bases: solthiruthi.datastore.Queue

Opens the contents of a file on load.

tokenize()[source]¶

class solthiruthi.dom.Entity(word, flagged=False, **kwargs)[source]¶

Bases: solthiruthi.dom.Position

getLetters()[source]¶

isFlagged()[source]¶

isWord()[source]¶

class solthiruthi.dom.NonEntity(word, **kwargs)[source]¶

Bases: solthiruthi.dom.Entity, solthiruthi.dom.Position

isWord()[source]¶

class solthiruthi.dom.Position(row, col)[source]¶: Bases: object

class solthiruthi.dom.WordEntity(word, **kwargs)[source]¶

Bases: solthiruthi.dom.Entity

isWord()[source]¶

solthiruthi.heuristics module¶

class solthiruthi.heuristics.AdjacentConsonants(freq=2)[source]¶

Bases: solthiruthi.heuristics.Rule

Do not allow adjacent consonants in the word. This may not be as useful as the AdjacentVowels rule.

agaram_letters = set([u'\u0ba3', u'\u0ba4', u'\u0ba9', u'\u0ba8', u'\u0baa', u'\u0baf', u'\u0bae', u'\u0bb1', u'\u0bb0', u'\u0bb3', u'\u0bb2', u'\u0b95', u'\u0bb4', u'\u0b99', u'\u0bb5', u'\u0b9a', u'\u0b9f', u'\u0b9e'])¶

apply(word, ctx=None)[source]¶: The ctx information is currently ignored.

mei_letters = set([u'\u0b9a\u0bcd', u'\u0baf\u0bcd', u'\u0ba4\u0bcd', u'\u0b99\u0bcd', u'\u0bae\u0bcd', u'\u0ba3\u0bcd', u'\u0bb5\u0bcd', u'\u0b95\u0bcd', u'\u0baa\u0bcd', u'\u0b9f\u0bcd', u'\u0bb4\u0bcd', u'\u0ba9\u0bcd', u'\u0b9e\u0bcd', u'\u0bb3\u0bcd', u'\u0ba8\u0bcd', u'\u0bb2\u0bcd', u'\u0bb1\u0bcd', u'\u0bb0\u0bcd'])¶

reason = u'\u0b92\u0ba9\u0bcd\u0bb1\u0bc8\u0ba4\u0bcd\u0ba4\u0bca\u0b9f\u0bb0\u0bcd\u0ba8\u0bcd\u0ba4\u0bc1\u0b92\u0ba9\u0bcd\u0bb1\u0bc1 \u0bae\u0bc6\u0baf\u0bcd \u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1\u0b95\u0bcd\u0b95\u0bb3\u0bcd \u0bb5\u0bb0\u0b95\u0bcd\u0b95\u0bc2\u0b9f\u0bbe\u0ba4\u0bc1. \u0b87\u0ba4\u0bc1 \u0baa\u0bc6\u0bb0\u0bc1\u0bae\u0bcd\u0baa\u0bbe\u0bb2\u0bc1\u0bae\u0bcd \u0baa\u0bbf\u0bb4\u0bc8\u0baf\u0bbe\u0b95 \u0b87\u0bb0\u0bc1\u0b95\u0bcd\u0b95\u0bc1\u0bae\u0bcd.'¶

class solthiruthi.heuristics.AdjacentVowels[source]¶

Bases: solthiruthi.heuristics.Rule

Do not allow adjacent vowels in the word. For example, ஆஅக்காள் (originally அக்காள்) will be flagged.

apply(word, ctx=None)[source]¶: The ctx information is currently ignored.

reason = u'\u0b92\u0ba9\u0bcd\u0bb1\u0bc8\u0ba4\u0bcd\u0ba4\u0bca\u0b9f\u0bb0\u0bcd\u0ba8\u0bcd\u0ba4\u0bc1\u0b92\u0ba9\u0bcd\u0bb1\u0bc1 \u0b89\u0baf\u0bbf\u0bb0\u0bc6\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1\u0b95\u0bcd\u0b95\u0bb3\u0bcd \u0bb5\u0bb0\u0b95\u0bcd\u0b95\u0bc2\u0b9f\u0bbe\u0ba4\u0bc1. \u0b87\u0ba4\u0bc1 \u0baa\u0bc6\u0bb0\u0bc1\u0bae\u0bcd\u0baa\u0bbe\u0bb2\u0bc1\u0bae\u0bcd \u0baa\u0bbf\u0bb4\u0bc8\u0baf\u0bbe\u0b95 \u0b87\u0bb0\u0bc1\u0b95\u0bcd\u0b95\u0bc1\u0bae\u0bcd.'¶

uyir_letters = set([u'\u0b85', u'\u0b87', u'\u0b86', u'\u0b89', u'\u0b88', u'\u0b8a', u'\u0b8f', u'\u0b8e', u'\u0b90', u'\u0b93', u'\u0b92', u'\u0b94'])¶

class solthiruthi.heuristics.BadIME[source]¶

Bases: solthiruthi.heuristics.Rule

Do not allow vowels followed by a kombu, thunaikaal, etc. in the word. For example, ஆாள் (originally intended as ஆள்) will be flagged.

apply(word, ctx=None)[source]¶: The ctx information is currently ignored.

reason = u'\u0b9a\u0bca\u0bb2\u0bcd\u0bb2\u0bbf\u0bb2\u0bcd \u0baa\u0bbf\u0bb4\u0bc8 \u0b95\u0bbe\u0bb0\u0ba3\u0bae\u0bcd, \u0b87\u0bb2\u0bcd\u0bb2\u0bbe\u0ba4 \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1..'¶

uyir_letters = set([u'\u0b85', u'\u0b87', u'\u0b86', u'\u0b89', u'\u0b88', u'\u0b8a', u'\u0b8f', u'\u0b8e', u'\u0b90', u'\u0b93', u'\u0b92', u'\u0b94'])¶

class solthiruthi.heuristics.RepeatedLetters[source]¶

Bases: solthiruthi.heuristics.Rule

Do not allow more than one repetition of a letter in a word.

apply(word, ctx=None)[source]¶: The ctx information is currently ignored.

reason = u'\u0b92\u0bb0\u0bc7 \u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1 \u0baa\u0bb2 \u0bae\u0bc1\u0bb0\u0bc8 (>= 2) \u0ba4\u0bca\u0b9f\u0bb0\u0bcd\u0b9a\u0bcd\u0b9a\u0bbf\u0baf\u0bbe\u0b95 \u0bb5\u0ba8\u0bcd\u0ba4\u0bbe\u0bb2\u0bcd \u0b85\u0ba4\u0bc1 \u0baa\u0bbf\u0bb4\u0bc8\u0baf\u0bbe\u0ba9 \u0b9a\u0bca\u0bb2\u0bcd \u0b86\u0b95\u0bc1\u0bae\u0bcd'¶

class solthiruthi.heuristics.Rule[source]¶

Bases: object

apply(word, ctx)[source]¶: @word is the word to check. @ctx is a dict holding the number of previous words, the number of next words, and lists of the surrounding words, e.g. ctx = {'NPrev': 4, 'Prev': [w1, w2, w3, w4], 'NNext': 2, 'Next': [w1, w2]}. The return value should be a boolean (False if an error is found) and an optional reason as the second output.
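To make the apply(word, ctx) contract concrete, here is a minimal sketch of a rule that actually reads ctx. The rule `NoImmediateRepeat` is hypothetical and does not exist in solthiruthi.heuristics; it also assumes that the last element of ctx['Prev'] is the word immediately before the one being checked.

```python
class NoImmediateRepeat:
    """Hypothetical rule: flag a word that repeats the previous word."""

    reason = "word repeats the previous word"

    def apply(self, word, ctx=None):
        # ctx may be missing; treat that as "no surrounding words known".
        previous = (ctx or {}).get("Prev", [])
        # Assumption: previous[-1] is the immediate predecessor.
        if previous and previous[-1] == word:
            return False, self.reason  # False signals an error, with a reason
        return True, None              # no error, so no reason is needed


rule = NoImmediateRepeat()
ctx = {"NPrev": 2, "Prev": ["over", "the"], "NNext": 1, "Next": ["fence"]}
print(rule.apply("the", ctx))
print(rule.apply("tall", ctx))
```

Returning a (boolean, reason) pair keeps every rule interchangeable, so a caller can run a list of rules and report the first reason that comes back with False.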

class solthiruthi.heuristics.Sequential[source]¶

static in_sequence(ref_set, ref_reason, freq_threshold=2)[source]¶: The ctx information is currently ignored. If the repetition/match length is >= @freq_threshold, the word is flagged.
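The threshold check described for in_sequence can be sketched as a run-length scan over the letters of a word. Everything below is an illustration under assumptions: the signature suggests that in_sequence returns a checker function, but that is a guess, and `split_letters` is a simplified hypothetical stand-in for get_letters that attaches Tamil vowel signs and the pulli to the preceding consonant.

```python
def split_letters(word):
    """Group Tamil code points into letters (simplified, hypothetical helper)."""
    letters = []
    for ch in word:
        # U+0BBE..U+0BCD are vowel signs and the pulli; U+0BD7 is the au length mark.
        is_mark = "\u0bbe" <= ch <= "\u0bcd" or ch == "\u0bd7"
        if letters and is_mark:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def in_sequence(ref_set, ref_reason, freq_threshold=2):
    """Return a checker that flags runs of letters drawn from ref_set."""

    def check(word, ctx=None):  # ctx is accepted but ignored
        run = 0
        for letter in split_letters(word):
            run = run + 1 if letter in ref_set else 0
            if run >= freq_threshold:
                return False, ref_reason
        return True, None

    return check


# The twelve uyir (vowel) letters, as listed in uyir_letters above.
UYIR = set("\u0b85\u0b86\u0b87\u0b88\u0b89\u0b8a\u0b8e\u0b8f\u0b90\u0b92\u0b93\u0b94")

adjacent_vowels = in_sequence(UYIR, "adjacent vowels")
flagged = "\u0b86\u0b85\u0b95\u0bcd\u0b95\u0bbe\u0bb3\u0bcd"  # ஆஅக்காள்
clean = "\u0b85\u0b95\u0bcd\u0b95\u0bbe\u0bb3\u0bcd"          # அக்காள்
print(adjacent_vowels(flagged)[0], adjacent_vowels(clean)[0])
```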

solthiruthi.heuristics.get_letters(word)[source]¶

solthiruthi.morphology module¶

class solthiruthi.morphology.CaseFilter(*filter_obj_list)[source]¶

Bases: object

apply(word_in)[source]¶

class solthiruthi.morphology.RemoveCaseSuffix[source]¶

Bases: solthiruthi.morphology.RemoveSuffix

apply(word)[source]¶

setSuffixes()[source]¶

class solthiruthi.morphology.RemoveHyphenatesNumberDate[source]¶

Bases: solthiruthi.morphology.RemoveCaseSuffix

Done correctly (மேல்) 65536-மேல், ivan paritchayil இரண்டாவது, 2-வது

class solthiruthi.morphology.RemoveNegationSuffix[source]¶

Bases: solthiruthi.morphology.RemoveCaseSuffix

setSuffixes()[source]¶

class solthiruthi.morphology.RemovePluralSuffix[source]¶

Bases: solthiruthi.morphology.RemoveSuffix

apply(word)[source]¶

setSuffixes()[source]¶

class solthiruthi.morphology.RemovePrefix[source]¶

Bases: solthiruthi.morphology.RemoveSuffix

apply(word)[source]¶

removePrefix(word)[source]¶

setSuffixes()[source]¶

class solthiruthi.morphology.RemoveSuffix[source]¶

Bases: object

apply(word)[source]¶

prepareSuffixes()[source]¶

removeSuffix(word)[source]¶

setSuffixes()[source]¶

class solthiruthi.morphology.RemoveVerbSuffixTense[source]¶

Bases: solthiruthi.morphology.RemoveCaseSuffix

setSuffixes()[source]¶

solthiruthi.morphology.xkcd()[source]¶

solthiruthi.resources module¶

solthiruthi.resources.get_data_categories()[source]¶

solthiruthi.resources.get_data_dictionaries()[source]¶

solthiruthi.resources.get_data_dir()[source]¶

solthiruthi.resources.mk_path(srcfile)[source]¶

solthiruthi.scoring module¶

class solthiruthi.scoring.NGStats[source]¶

bigram_score(letters)[source]¶

load()[source]¶

unigram_score(letters)[source]¶

solthiruthi.scoring.bigram_scores(letters)[source]¶

solthiruthi.scoring.unigram_score(letters)[source]¶

solthiruthi.solthiruthi module¶

class solthiruthi.solthiruthi.Solthiruthi[source]¶

static get_CLI_options(DEBUG=False)[source]¶

solthiruthi.suggestions module¶

solthiruthi.suggestions.kombu_suggestor()[source]¶

solthiruthi.suggestions.mayangoli_suggestor()[source]¶: Rules:

ண, ன - mayakkam (confusable letters)

ல, ழ, ள - mayakkam

ர, ற - mayakkam

The same confusions can also be seen in the uyirmei series of these letters.
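The rules above can be sketched as single-letter substitutions within each confusable group. This is an illustration, not the package's implementation; `mayangoli_alternatives` and `CONFUSABLE_GROUPS` are hypothetical names. Because the swap replaces only the base consonant code point and leaves any following vowel sign in place, the uyirmei forms of these letters are covered as well.

```python
# Each string holds one group of letters that are easily confused.
CONFUSABLE_GROUPS = [
    "\u0ba3\u0ba9",        # ண, ன
    "\u0bb2\u0bb4\u0bb3",  # ல, ழ, ள
    "\u0bb0\u0bb1",        # ர, ற
]


def mayangoli_alternatives(word):
    """Return spellings that differ from word by one confusable consonant."""
    alternatives = []
    for index, ch in enumerate(word):
        for group in CONFUSABLE_GROUPS:
            if ch not in group:
                continue
            for other in group:
                if other != ch:
                    # Swap only the base consonant; vowel signs after it stay put.
                    alternatives.append(word[:index] + other + word[index + 1:])
    return alternatives


# "பல" has one confusable letter (ல), so two alternatives are produced.
print(mayangoli_alternatives("\u0baa\u0bb2"))
```

A caller would typically keep only the alternatives that a dictionary accepts as real words.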

solthiruthi.suggestions.norvig_suggestor(word, alphabets=None, nedits=1, limit=inf)[source]¶
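The name and parameters of norvig_suggestor suggest the edit-distance candidate generation popularised by Peter Norvig's spelling corrector. A minimal sketch of that technique for a single edit follows. The functions `edits1` and `suggest` and the `known` word set are hypothetical, and the sketch edits individual code points, so it suits ASCII input; Tamil input would need to operate on whole letters.

```python
def edits1(word, alphabet):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, known, alphabet, limit=None):
    """Return known words within one edit of word, sorted, at most limit of them."""
    candidates = sorted(edits1(word, alphabet) & set(known))
    return candidates[:limit]  # limit=None keeps every candidate


ALPHABET = "abcdefghijklmnopqrstuvwxyz"
KNOWN = {"cat", "cart", "coat", "dog"}
print(suggest("cta", KNOWN, ALPHABET))
```

A larger nedits value could be handled by applying `edits1` again to each candidate, at the cost of a much larger candidate set.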

solthiruthi.vinaisorkal module¶

class solthiruthi.vinaisorkal.VerbClass(classify, words)[source]¶

class solthiruthi.vinaisorkal.VinaiSorkal[source]¶

Doublets = <solthiruthi.vinaisorkal.VerbClass instance>¶

IrregularVerbs = <solthiruthi.vinaisorkal.VerbClass instance>¶

class solthiruthi.vinaisorkal.struct[source]¶

Bases: object

static build()[source]¶

solthiruthi package¶

Submodules¶

solthiruthi.Ezhimai module¶

solthiruthi.WordSpeller module¶

solthiruthi.data_parser module¶

solthiruthi.datastore module¶

solthiruthi.dictionary module¶

solthiruthi.dom module¶

solthiruthi.heuristics module¶

solthiruthi.morphology module¶

solthiruthi.resources module¶

solthiruthi.scoring module¶

solthiruthi.solthiruthi module¶

solthiruthi.suggestions module¶

solthiruthi.vinaisorkal module¶

Module contents¶
