solthiruthi package

Submodules

solthiruthi.Ezhimai module

class solthiruthi.Ezhimai.PattiyalThiruthi(option)

Bases: solthiruthi.WordSpeller.ISpeller

static loadWordFile()
process_word(word)

solthiruthi.WordSpeller module

class solthiruthi.WordSpeller.ISpeller

Bases: object

get_return_obj(word)
process_word(word)

solthiruthi.data_parser module

class solthiruthi.data_parser.DataParser(files)
analysis()
parse_data(filename)
process()
static run()
class solthiruthi.data_parser.WordList(cat)
add(word)

solthiruthi.datastore module

class solthiruthi.datastore.DTrie

Bases: solthiruthi.datastore.Trie

A trie in which the number of letters at each node grows over time. The implementation uses a dictionary, and each node carries a count attribute recording letter frequency.

add(word)
getAllWords()
getAllWordsAndCount()
getAllWordsHelper(ref_trie, prefix, all_words)
getAllWordsIterable()
getAllWordsIterableHelper(ref_trie, prefix)
getAllWordsPrefix(prefix)
getWordCount(word)
hasWordPrefix(wrd_prefix)
isWord(word, ret_ref_trie=False)

Returns a boolean as the first output; the second output is the reference trie.

isWordAndTrie(word, prefix=False)
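The sketch below makes the DTrie description concrete. It shows a dictionary-based trie whose per-node alphabet grows as words are added. `MiniDTrie` and `_Node` are hypothetical names, and this is not solthiruthi's code. The method names mirror the listing above, but their behaviour here is assumed. In particular, the count is kept on the node where a word ends (a per-word frequency), which is only one reading of the count attribute described above. The sketch also walks Python code points rather than Tamil letters.

```python
from typing import Dict, List, Optional, Tuple


class _Node:
    """One trie node: children keyed by letter, plus a count of words ending here."""

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}  # grows only as new letters are seen
        self.count = 0


class MiniDTrie:
    """Hypothetical dictionary-based trie; not solthiruthi's DTrie."""

    def __init__(self) -> None:
        self.root = _Node()

    def add(self, word: str) -> None:
        node = self.root
        for letter in word:  # iterates code points, not Tamil letters
            node = node.children.setdefault(letter, _Node())
        node.count += 1  # frequency of this word

    def _walk(self, prefix: str) -> Optional[_Node]:
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return None
        return node

    def isWord(self, word: str, ret_ref_trie: bool = False):
        node = self._walk(word)
        found = node is not None and node.count > 0
        return (found, node) if ret_ref_trie else found

    def getWordCount(self, word: str) -> int:
        node = self._walk(word)
        return node.count if node is not None else 0

    def hasWordPrefix(self, prefix: str) -> bool:
        return self._walk(prefix) is not None

    def getAllWordsPrefix(self, prefix: str) -> List[str]:
        start = self._walk(prefix)
        if start is None:
            return []
        words: List[str] = []
        stack: List[Tuple[_Node, str]] = [(start, prefix)]
        while stack:  # iterative depth-first walk below the prefix node
            node, sofar = stack.pop()
            if node.count > 0:
                words.append(sofar)
            for letter, child in node.children.items():
                stack.append((child, sofar + letter))
        return sorted(words)
```

Each node stores only the letters that actually follow it, so memory tracks the data rather than the size of the alphabet.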
class solthiruthi.datastore.Node
class solthiruthi.datastore.Queue

Bases: list

ExceptionMsg = u'Queue does not support list method %s'
append(obj)

L.append(object) – append object to end

insert(obj)

L.insert(index, object) – insert object before index

isempty()
peek()

Looks at the next item in the queue (the one that will be removed next).

remove()

L.remove(value) – remove first occurrence of value. Raises ValueError if the value is not present.

reverse()

L.reverse() – reverse IN PLACE

sort()

L.sort(cmp=None, key=None, reverse=False) – stable sort IN PLACE; cmp(x, y) -> -1, 0, 1
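The peek docstring suggests first-in, first-out behaviour. Assuming that, a list-based queue with the documented method names could look like the minimal sketch below. `MiniQueue` is a hypothetical name. The FIFO semantics of `insert`/`remove` and the choice of which list methods to block are also assumptions, not taken from the solthiruthi source. Note that the signatures insert(obj) and remove() differ from the list docstrings shown above, which autodoc appears to have inherited from list.

```python
class MiniQueue(list):
    """Hypothetical FIFO queue on top of list; only the method names follow the docs."""

    ExceptionMsg = "Queue does not support list method %s"

    def insert(self, obj):
        """Enqueue obj at the back (assumed semantics)."""
        super().append(obj)

    def remove(self):
        """Dequeue and return the front item (assumed semantics)."""
        return self.pop(0)

    def peek(self):
        """Look at the next item remove() would return, without removing it."""
        return self[0]

    def isempty(self) -> bool:
        return len(self) == 0

    def reverse(self):
        # Reordering would break FIFO order, so it is rejected.
        raise Exception(MiniQueue.ExceptionMsg % "reverse")

    def sort(self, *args, **kwargs):
        raise Exception(MiniQueue.ExceptionMsg % "sort")
```

Popping from the front of a list is O(n). If that matters, `collections.deque` offers O(1) `popleft()`.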

class solthiruthi.datastore.RTrie(is_tamil=False)

Bases: solthiruthi.datastore.DTrie

add(word)
getAllWordsIterable()
getAllWordsPrefix(pfx)
getWordsEndingWith(sfx)
reverse(word)
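RTrie's getWordsEndingWith(sfx) and reverse(word) suggest the usual trick for suffix search. Each word is stored reversed, so "ends with sfx" becomes "starts with reverse(sfx)". The sketch below illustrates that idea using a sorted list and the standard `bisect` module instead of a trie. `MiniRTrie` is hypothetical, and it reverses by code point. A Tamil-aware version would presumably reverse by letter; that this is what `is_tamil` controls is an assumption.

```python
import bisect
from typing import List


class MiniRTrie:
    """Hypothetical reverse index: words are stored reversed, so suffix queries become prefix queries."""

    def __init__(self) -> None:
        self._rev: List[str] = []  # reversed words, kept sorted and unique

    @staticmethod
    def reverse(word: str) -> str:
        return word[::-1]

    def add(self, word: str) -> None:
        rw = self.reverse(word)
        i = bisect.bisect_left(self._rev, rw)
        if i == len(self._rev) or self._rev[i] != rw:
            self._rev.insert(i, rw)

    def getWordsEndingWith(self, sfx: str) -> List[str]:
        rsfx = self.reverse(sfx)
        # All reversed words starting with rsfx are contiguous from this index on.
        i = bisect.bisect_left(self._rev, rsfx)
        out = []
        while i < len(self._rev) and self._rev[i].startswith(rsfx):
            out.append(self.reverse(self._rev[i]))
            i += 1
        return out
```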
class solthiruthi.datastore.TamilTrie(get_idx=<function getidx>, invert_idx=<function tamil>, alphabet_len=323)

Bases: solthiruthi.datastore.Trie

Stores a list of words in a trie data structure.

add(word)
static buildEnglishTrie()
getAllWords()
getAllWordsHelper(ref_trie, ref_word_limits, prefix, all_words)
getAllWordsIterable()
getAllWordsPrefix(prefix)
hasWordPrefix(prefix)
isWord(word, ret_ref_trie=False)

Returns a boolean as the first output; the second output is the reference trie.
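TamilTrie's constructor takes an index function, its inverse, and an alphabet size (323 by default). This points to a fixed-width node layout with one child slot per letter. A minimal sketch of such an array-backed trie follows. `MiniArrayTrie` is hypothetical. The usage plugs in a 26-letter English index in the spirit of buildEnglishTrie(), not the library's actual getidx/tamil functions.

```python
from typing import Callable, List, Optional


class ArrayTrieNode:
    def __init__(self, alphabet_len: int) -> None:
        # One slot per letter index; None means "no child for this letter".
        self.children: List[Optional["ArrayTrieNode"]] = [None] * alphabet_len
        self.is_word = False


class MiniArrayTrie:
    """Hypothetical fixed-alphabet trie; not solthiruthi's TamilTrie."""

    def __init__(self, get_idx: Callable[[str], int],
                 invert_idx: Callable[[int], str], alphabet_len: int) -> None:
        self.get_idx = get_idx
        self.invert_idx = invert_idx
        self.alphabet_len = alphabet_len
        self.root = ArrayTrieNode(alphabet_len)

    def add(self, letters) -> None:
        node = self.root
        for letter in letters:
            i = self.get_idx(letter)
            if node.children[i] is None:
                node.children[i] = ArrayTrieNode(self.alphabet_len)
            node = node.children[i]
        node.is_word = True

    def isWord(self, letters) -> bool:
        node = self.root
        for letter in letters:
            node = node.children[self.get_idx(letter)]
            if node is None:
                return False
        return node.is_word

    def getAllWords(self) -> List[str]:
        words: List[str] = []

        def walk(node: ArrayTrieNode, prefix: str) -> None:
            if node.is_word:
                words.append(prefix)
            for i, child in enumerate(node.children):  # index order = alphabet order
                if child is not None:
                    walk(child, prefix + self.invert_idx(i))

        walk(self.root, "")
        return words
```

Every node allocates `alphabet_len` slots up front, so memory grows quickly with a large alphabet. A dict-based node only stores the letters that actually occur.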

class solthiruthi.datastore.Trie

Bases: object

add(word)
static deserializeFromFile()
getAllWords()
getAllWordsIterable()
getAllWordsPrefix(prefix)
hasWordPrefix(prefix)
isWord(word, ret_ref_trie=False)

Returns a boolean as the first output; the second output is the reference trie.

loadWordFile(filename)
static mk_empty_trie()
static serializeToFile(filename)
solthiruthi.datastore.do_load()

Uses 4 GB of memory; very inefficient.

solthiruthi.datastore.do_stuff()

solthiruthi.dictionary module

class solthiruthi.dictionary.Agarathi(dictionary_path, reverse=False)

Bases: solthiruthi.dictionary.Dictionary

add(word)
finalize()
getAllWords()
getAllWordsIterable()
getDictionaryPath()
getWordsEndingWith(sfx)
getWordsStartingWith(pfx, limit=inf)
hasWordsStartingWith(pfx)
isWord(word)
class solthiruthi.dictionary.Dictionary

Bases: object

add(word)
getAllWords()
getAllWordsIterable()
getDictionaryPath()
getSize()
getWordsEndingWith(sfx)
getWordsStartingWith(pfx)
hasWordsStartingWith(pfx)
isWord(word)
loadWordFile(pre_processor=None)
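The Dictionary interface above amounts to a word set with prefix and suffix queries. The sketch below implements the same method names over a sorted Python list using `bisect`. `MiniDictionary` is hypothetical. Unlike the documented loadWordFile(pre_processor=None), which takes no input argument, this version receives the lines to load explicitly. The `limit` default of infinity mirrors getWordsStartingWith(pfx, limit=inf) in Agarathi.

```python
import bisect
from typing import Callable, Iterable, List, Optional


class MiniDictionary:
    """Hypothetical sorted-list dictionary using the method names listed above."""

    def __init__(self) -> None:
        self._words: List[str] = []  # kept sorted and free of duplicates

    def add(self, word: str) -> None:
        i = bisect.bisect_left(self._words, word)
        if i == len(self._words) or self._words[i] != word:
            self._words.insert(i, word)

    def loadWordFile(self, lines: Iterable[str],
                     pre_processor: Optional[Callable[[str], str]] = None) -> None:
        for line in lines:
            word = line.strip()
            if pre_processor is not None:
                word = pre_processor(word)  # e.g. normalization before insertion
            if word:
                self.add(word)

    def isWord(self, word: str) -> bool:
        i = bisect.bisect_left(self._words, word)
        return i < len(self._words) and self._words[i] == word

    def getSize(self) -> int:
        return len(self._words)

    def getWordsStartingWith(self, pfx: str, limit: float = float("inf")) -> List[str]:
        # Words sharing a prefix are contiguous in sorted order.
        out: List[str] = []
        i = bisect.bisect_left(self._words, pfx)
        while i < len(self._words) and self._words[i].startswith(pfx) and len(out) < limit:
            out.append(self._words[i])
            i += 1
        return out

    def hasWordsStartingWith(self, pfx: str) -> bool:
        return bool(self.getWordsStartingWith(pfx, limit=1))

    def getWordsEndingWith(self, sfx: str) -> List[str]:
        # Linear scan; a reversed index (as RTrie suggests) would avoid this.
        return [w for w in self._words if w.endswith(sfx)]
```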
class solthiruthi.dictionary.DictionaryBuilder
static create()
static createUsingWordList()
class solthiruthi.dictionary.EmptyAgarathi

Bases: solthiruthi.dictionary.Agarathi

class solthiruthi.dictionary.EnglishLinux

Bases: solthiruthi.dictionary.Agarathi

add(word)
isWord(word)
class solthiruthi.dictionary.Madurai

Bases: solthiruthi.dictionary.Agarathi

class solthiruthi.dictionary.TamilVU

Bases: solthiruthi.dictionary.Agarathi

class solthiruthi.dictionary.Wikipedia

Bases: solthiruthi.dictionary.Agarathi

solthiruthi.dictionary.reverse_Madurai()
solthiruthi.dictionary.reverse_TamilVU()
solthiruthi.dictionary.reverse_Wikipedia()

solthiruthi.dom module

class solthiruthi.dom.Document(filename)

Bases: solthiruthi.datastore.Queue

Opens the contents of a file on load.

tokenize()
class solthiruthi.dom.Entity(word, flagged=False, **kwargs)

Bases: solthiruthi.dom.Position

getLetters()
isFlagged()
isWord()
class solthiruthi.dom.NonEntity(word, **kwargs)

Bases: solthiruthi.dom.Entity, solthiruthi.dom.Position

isWord()
class solthiruthi.dom.Position(row, col)

Bases: object

class solthiruthi.dom.WordEntity(word, **kwargs)

Bases: solthiruthi.dom.Entity

isWord()

solthiruthi.heuristics module

class solthiruthi.heuristics.AdjacentConsonants(freq=2)

Bases: solthiruthi.heuristics.Rule

Disallows adjacent consonants in a word. This rule may be less useful than the AdjacentVowels rule.

agaram_letters = set([u'\u0ba3', u'\u0ba4', u'\u0ba9', u'\u0ba8', u'\u0baa', u'\u0baf', u'\u0bae', u'\u0bb1', u'\u0bb0', u'\u0bb3', u'\u0bb2', u'\u0b95', u'\u0bb4', u'\u0b99', u'\u0bb5', u'\u0b9a', u'\u0b9f', u'\u0b9e'])
apply(word, ctx=None)

The ctx argument is currently ignored.

mei_letters = set([u'\u0b9a\u0bcd', u'\u0baf\u0bcd', u'\u0ba4\u0bcd', u'\u0b99\u0bcd', u'\u0bae\u0bcd', u'\u0ba3\u0bcd', u'\u0bb5\u0bcd', u'\u0b95\u0bcd', u'\u0baa\u0bcd', u'\u0b9f\u0bcd', u'\u0bb4\u0bcd', u'\u0ba9\u0bcd', u'\u0b9e\u0bcd', u'\u0bb3\u0bcd', u'\u0ba8\u0bcd', u'\u0bb2\u0bcd', u'\u0bb1\u0bcd', u'\u0bb0\u0bcd'])
reason = u'\u0b92\u0ba9\u0bcd\u0bb1\u0bc8\u0ba4\u0bcd\u0ba4\u0bca\u0b9f\u0bb0\u0bcd\u0ba8\u0bcd\u0ba4\u0bc1\u0b92\u0ba9\u0bcd\u0bb1\u0bc1 \u0bae\u0bc6\u0baf\u0bcd \u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1\u0b95\u0bcd\u0b95\u0bb3\u0bcd \u0bb5\u0bb0\u0b95\u0bcd\u0b95\u0bc2\u0b9f\u0bbe\u0ba4\u0bc1. \u0b87\u0ba4\u0bc1 \u0baa\u0bc6\u0bb0\u0bc1\u0bae\u0bcd\u0baa\u0bbe\u0bb2\u0bc1\u0bae\u0bcd \u0baa\u0bbf\u0bb4\u0bc8\u0baf\u0bbe\u0b95 \u0b87\u0bb0\u0bc1\u0b95\u0bcd\u0b95\u0bc1\u0bae\u0bcd.'
class solthiruthi.heuristics.AdjacentVowels

Bases: solthiruthi.heuristics.Rule

Disallows adjacent vowels in a word. For example, ஆஅக்காள் (intended: அக்காள்) is flagged.

apply(word, ctx=None)

The ctx argument is currently ignored.

reason = u'\u0b92\u0ba9\u0bcd\u0bb1\u0bc8\u0ba4\u0bcd\u0ba4\u0bca\u0b9f\u0bb0\u0bcd\u0ba8\u0bcd\u0ba4\u0bc1\u0b92\u0ba9\u0bcd\u0bb1\u0bc1 \u0b89\u0baf\u0bbf\u0bb0\u0bc6\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1\u0b95\u0bcd\u0b95\u0bb3\u0bcd \u0bb5\u0bb0\u0b95\u0bcd\u0b95\u0bc2\u0b9f\u0bbe\u0ba4\u0bc1. \u0b87\u0ba4\u0bc1 \u0baa\u0bc6\u0bb0\u0bc1\u0bae\u0bcd\u0baa\u0bbe\u0bb2\u0bc1\u0bae\u0bcd \u0baa\u0bbf\u0bb4\u0bc8\u0baf\u0bbe\u0b95 \u0b87\u0bb0\u0bc1\u0b95\u0bcd\u0b95\u0bc1\u0bae\u0bcd.'
uyir_letters = set([u'\u0b85', u'\u0b87', u'\u0b86', u'\u0b89', u'\u0b88', u'\u0b8a', u'\u0b8f', u'\u0b8e', u'\u0b90', u'\u0b93', u'\u0b92', u'\u0b94'])
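Below is a minimal sketch of the check AdjacentVowels describes. It assumes the rule flags two independent vowels (uyir letters) that sit next to each other. Scanning code points is enough here: the vowel signs on uyirmei letters are separate code points outside the uyir set, so they never match. The function name and the English reason string are hypothetical; the real rule returns the Tamil reason shown above.

```python
from typing import Optional, Tuple

# Independent vowels (uyir letters): the same code points as uyir_letters above.
UYIR_LETTERS = {chr(cp) for cp in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                                   0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)}


def adjacent_vowels_check(word: str, ctx=None) -> Tuple[bool, Optional[str]]:
    """Return (False, reason) if two independent vowels are adjacent; ctx is ignored."""
    for prev, cur in zip(word, word[1:]):
        if prev in UYIR_LETTERS and cur in UYIR_LETTERS:
            return False, "adjacent vowels: " + prev + cur
    return True, None
```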
class solthiruthi.heuristics.BadIME

Bases: solthiruthi.heuristics.Rule

Disallows vowels followed by vowel signs such as kombu, thunaikaal, etc. in a word. For example, ஆாள் (intended: ஆள்) is flagged.

apply(word, ctx=None)

The ctx argument is currently ignored.

reason = u'\u0b9a\u0bca\u0bb2\u0bcd\u0bb2\u0bbf\u0bb2\u0bcd \u0baa\u0bbf\u0bb4\u0bc8 \u0b95\u0bbe\u0bb0\u0ba3\u0bae\u0bcd, \u0b87\u0bb2\u0bcd\u0bb2\u0bbe\u0ba4 \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1..'
uyir_letters = set([u'\u0b85', u'\u0b87', u'\u0b86', u'\u0b89', u'\u0b88', u'\u0b8a', u'\u0b8f', u'\u0b8e', u'\u0b90', u'\u0b93', u'\u0b92', u'\u0b94'])
class solthiruthi.heuristics.RepeatedLetters

Bases: solthiruthi.heuristics.Rule

Disallows more than one consecutive repetition of a letter in a word.

apply(word, ctx=None)

The ctx argument is currently ignored.

reason = u'\u0b92\u0bb0\u0bc7 \u0b8e\u0bb4\u0bc1\u0ba4\u0bcd\u0ba4\u0bc1 \u0baa\u0bb2 \u0bae\u0bc1\u0bb0\u0bc8 (>= 2) \u0ba4\u0bca\u0b9f\u0bb0\u0bcd\u0b9a\u0bcd\u0b9a\u0bbf\u0baf\u0bbe\u0b95 \u0bb5\u0ba8\u0bcd\u0ba4\u0bbe\u0bb2\u0bcd \u0b85\u0ba4\u0bc1 \u0baa\u0bbf\u0bb4\u0bc8\u0baf\u0bbe\u0ba9 \u0b9a\u0bca\u0bb2\u0bcd \u0b86\u0b95\u0bc1\u0bae\u0bcd'
class solthiruthi.heuristics.Rule

Bases: object

apply(word, ctx)

@word is the word being checked. @ctx is a dict holding the number of previous words, the number of next words, and the lists of surrounding words, e.g. ctx = {'NPrev': 4, 'Prev': [w1, w2, w3, w4], 'NNext': 2, 'Next': [w1, w2]}. The return value should be a boolean (False if an error is found), optionally followed by a reason as a second value.
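Following the apply(word, ctx) contract described above, a custom rule might look like the sketch below. `BaseRule` and `NoDuplicateOfPreviousWord` are hypothetical stand-ins, not solthiruthi classes. Treating the last item of `ctx['Prev']` as the word immediately before @word is an assumption about the list's ordering.

```python
from typing import Any, Dict, Optional, Tuple

Ctx = Dict[str, Any]


class BaseRule:
    """Stand-in for the Rule protocol described above."""

    def apply(self, word: str, ctx: Optional[Ctx] = None) -> Tuple[bool, Optional[str]]:
        raise NotImplementedError


class NoDuplicateOfPreviousWord(BaseRule):
    """Hypothetical rule: flag a word identical to the word just before it."""

    reason = "word repeats the previous word"

    def apply(self, word: str, ctx: Optional[Ctx] = None) -> Tuple[bool, Optional[str]]:
        # Assumes ctx['Prev'] is ordered oldest first, so [-1] is the adjacent word.
        if ctx and ctx.get("NPrev", 0) > 0 and ctx["Prev"][-1] == word:
            return False, self.reason
        return True, None
```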

class solthiruthi.heuristics.Sequential
static in_sequence(ref_set, ref_reason, freq_threshold=2)

The ctx information is currently ignored. If the repetition/match length is >= @freq_threshold, the word is flagged.

solthiruthi.heuristics.get_letters(word)
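Sequential.in_sequence flags a word when a repetition reaches freq_threshold, and get_letters(word) splits a word into letters. The sketch below approximates both. It groups each base character with the combining marks that follow it (Unicode categories Mn/Mc, which cover the Tamil vowel signs and the pulli), then measures the longest run of one repeated letter. All three function names are hypothetical. This segmentation is a simplification, not open-tamil's get_letters.

```python
import unicodedata
from typing import List


def get_letters_sketch(word: str) -> List[str]:
    """Group each base character with the combining marks that follow it."""
    letters: List[str] = []
    for ch in word:
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch  # attach vowel sign or pulli to the previous letter
        else:
            letters.append(ch)
    return letters


def longest_run(letters: List[str]) -> int:
    """Length of the longest run of one letter repeated consecutively."""
    best = run = 0
    prev = None
    for letter in letters:
        run = run + 1 if letter == prev else 1
        prev = letter
        best = max(best, run)
    return best


def flag_repeats(word: str, freq_threshold: int = 2) -> bool:
    """True if some letter repeats consecutively at least freq_threshold times."""
    return longest_run(get_letters_sketch(word)) >= freq_threshold
```

Because க் and கா count as different letters, a doubled consonant such as the one in அக்கா is not mistaken for a repetition.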

solthiruthi.morphology module

class solthiruthi.morphology.CaseFilter(*filter_obj_list)

Bases: object

apply(word_in)
class solthiruthi.morphology.RemoveCaseSuffix

Bases: solthiruthi.morphology.RemoveSuffix

apply(word)
setSuffixes()
class solthiruthi.morphology.RemoveHyphenatesNumberDate

Bases: solthiruthi.morphology.RemoveCaseSuffix

Handled correctly: 65536-மேல் (reduced to மேல்), and 2-வது, the numeric form of இரண்டாவது ("second"), as in "ivan paritchayil இரண்டாவது" ("he came second in the exam").

class solthiruthi.morphology.RemoveNegationSuffix

Bases: solthiruthi.morphology.RemoveCaseSuffix

setSuffixes()
class solthiruthi.morphology.RemovePluralSuffix

Bases: solthiruthi.morphology.RemoveSuffix

apply(word)
setSuffixes()
class solthiruthi.morphology.RemovePrefix

Bases: solthiruthi.morphology.RemoveSuffix

apply(word)
removePrefix(word)
setSuffixes()
class solthiruthi.morphology.RemoveSuffix

Bases: object

apply(word)
prepareSuffixes()
removeSuffix(word)
setSuffixes()
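RemoveSuffix lists prepareSuffixes, removeSuffix and apply, and CaseFilter takes a list of filter objects. A minimal sketch of that pattern follows: longest-suffix-first stripping, plus a chain that feeds each filter's output into the next. `SuffixStripper` and `FilterChain` are hypothetical names, and the suffix lists in the usage are illustrative only. Real Tamil stemming also has to undo sandhi changes, which plain string stripping ignores.

```python
from typing import List, Tuple


class SuffixStripper:
    """Hypothetical suffix remover that tries the longest matching suffix first."""

    def __init__(self, suffixes: List[str]) -> None:
        # Longest first, so a longer suffix wins over a shorter one it ends with.
        self.suffixes = sorted(set(suffixes), key=len, reverse=True)

    def removeSuffix(self, word: str) -> Tuple[str, bool]:
        for sfx in self.suffixes:
            if word.endswith(sfx) and len(word) > len(sfx):  # never strip to empty
                return word[: -len(sfx)], True
        return word, False

    def apply(self, word: str) -> str:
        return self.removeSuffix(word)[0]


class FilterChain:
    """Hypothetical analogue of CaseFilter(*filter_obj_list): apply filters in order."""

    def __init__(self, *filter_obj_list) -> None:
        self.filters = filter_obj_list

    def apply(self, word_in: str) -> str:
        word = word_in
        for f in self.filters:
            word = f.apply(word)
        return word
```

For example, stripping the plural suffix கள் turns வீடுகள் into வீடு.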
class solthiruthi.morphology.RemoveVerbSuffixTense

Bases: solthiruthi.morphology.RemoveCaseSuffix

setSuffixes()
solthiruthi.morphology.xkcd()

solthiruthi.resources module

solthiruthi.resources.get_data_categories()
solthiruthi.resources.get_data_dictionaries()
solthiruthi.resources.get_data_dir()
solthiruthi.resources.mk_path(srcfile)

solthiruthi.scoring module

class solthiruthi.scoring.NGStats
bigram_score(letters)
load()
unigram_score(letters)
solthiruthi.scoring.bigram_scores(letters)
solthiruthi.scoring.unigram_score(letters)
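unigram_score(letters) and bigram_score(letters) suggest scoring a word by how typical its letters and letter pairs are. One common way to do this is to sum log-probabilities estimated from corpus counts with add-one smoothing. The sketch below shows that approach; it is not NGStats' actual formula. `MiniNGStats` and its constructor, which takes a corpus already split into letters, are hypothetical (the documented class has a load() method instead).

```python
import math
from collections import Counter
from typing import Iterable, List


class MiniNGStats:
    """Hypothetical letter n-gram scorer; not solthiruthi's NGStats."""

    def __init__(self, corpus: Iterable[List[str]]) -> None:
        self.uni: Counter = Counter()  # letter counts
        self.ctx: Counter = Counter()  # counts of letters that start a bigram
        self.bi: Counter = Counter()   # letter-pair counts
        for letters in corpus:
            self.uni.update(letters)
            self.ctx.update(letters[:-1])
            self.bi.update(zip(letters, letters[1:]))
        self.total = sum(self.uni.values())
        self.vocab = len(self.uni) + 1  # one extra slot for unseen letters

    def unigram_score(self, letters: List[str]) -> float:
        # Sum of log P(letter), add-one smoothed.
        return sum(math.log((self.uni[x] + 1) / (self.total + self.vocab)) for x in letters)

    def bigram_score(self, letters: List[str]) -> float:
        # Sum of log P(cur | prev), add-one smoothed.
        return sum(
            math.log((self.bi[(p, c)] + 1) / (self.ctx[p] + self.vocab))
            for p, c in zip(letters, letters[1:])
        )
```

Scores are at most zero, and values closer to zero mean a more typical letter sequence. They are not length-normalized.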

solthiruthi.solthiruthi module

class solthiruthi.solthiruthi.Solthiruthi
static get_CLI_options(DEBUG=False)

solthiruthi.suggestions module

solthiruthi.suggestions.kombu_suggestor()
solthiruthi.suggestions.mayangoli_suggestor()
Rules (mayakkam: letters that are commonly confused):

ண, ன - mayakkam
ல, ழ, ள - mayakkam
ர, ற - mayakkam

Such confusions can also be seen in these letters and in their uyirmei (consonant-vowel) series.
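The rules above can be turned into candidate spellings by swapping each confusable consonant for the others in its group. A Tamil uyirmei letter is stored as a consonant code point followed by a vowel-sign code point. Swapping only the consonant therefore also covers the uyirmei series (for example ணா and னா). The function below is a hypothetical sketch, not the library's mayangoli_suggestor, and it does not check candidates against a dictionary.

```python
from itertools import product
from typing import List

# Confusable consonant groups from the rules above, as code points:
# ண/ன, ல/ழ/ள, ர/ற
MAYANGOLI_GROUPS = [
    ("\u0ba3", "\u0ba9"),
    ("\u0bb2", "\u0bb4", "\u0bb3"),
    ("\u0bb0", "\u0bb1"),
]
_LOOKUP = {ch: grp for grp in MAYANGOLI_GROUPS for ch in grp}


def mayangoli_variants(word: str, limit: int = 100) -> List[str]:
    """Spellings obtained by swapping confusable consonants (the word itself excluded)."""
    options = [_LOOKUP.get(ch, (ch,)) for ch in word]  # non-confusable chars stay fixed
    out: List[str] = []
    for combo in product(*options):
        cand = "".join(combo)
        if cand != word:
            out.append(cand)
            if len(out) >= limit:  # the number of combinations grows multiplicatively
                break
    return out
```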
solthiruthi.suggestions.norvig_suggestor(word, alphabets=None, nedits=1, limit=inf)
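norvig_suggestor is presumably named after Peter Norvig's spelling corrector. That corrector generates every string within a small number of edits (deletion, adjacent transposition, replacement, insertion). A sketch of the technique over sequences of letters follows. `edits1` and `suggest` are hypothetical names, and the alphabet must be passed explicitly. The `nedits` and `limit` parameters are modelled on the documented signature, not taken from its code.

```python
from typing import Iterable, List, Set, Tuple

Letters = Tuple[str, ...]


def edits1(word: Letters, alphabet: List[str]) -> Set[Letters]:
    """All letter sequences one edit away: delete, transpose, replace, insert."""
    out: Set[Letters] = set()
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            out.add(left + right[1:])                         # delete one letter
            for a in alphabet:
                out.add(left + (a,) + right[1:])              # replace one letter
        if len(right) > 1:
            out.add(left + (right[1], right[0]) + right[2:])  # swap adjacent letters
        for a in alphabet:
            out.add(left + (a,) + right)                      # insert one letter
    return out


def suggest(word: List[str], alphabet: Iterable[str], nedits: int = 1,
            limit: float = float("inf")) -> List[str]:
    """Sorted candidates within `nedits` edits of `word`, excluding the word itself."""
    alphabet = list(alphabet)
    frontier: Set[Letters] = {tuple(word)}
    seen: Set[Letters] = set(frontier)
    for _ in range(nedits):
        nxt: Set[Letters] = set()
        for cand in frontier:
            nxt |= edits1(cand, alphabet)
        nxt -= seen  # only expand sequences not reached before
        seen |= nxt
        frontier = nxt
    results = sorted({"".join(c) for c in seen} - {"".join(word)})
    return results if limit == float("inf") else results[: int(limit)]
```

Working on lists of letters rather than code points lets the same code handle Tamil, provided the word and alphabet are split into letters first.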

solthiruthi.vinaisorkal module

class solthiruthi.vinaisorkal.VerbClass(classify, words)
class solthiruthi.vinaisorkal.VinaiSorkal
Doublets = <solthiruthi.vinaisorkal.VerbClass instance>
IrregularVerbs = <solthiruthi.vinaisorkal.VerbClass instance>
class solthiruthi.vinaisorkal.struct

Bases: object

static build()

Module contents