tamil package

Submodules

tamil.date module

class tamil.date.BasicTamilTimeFormat

Bases: object

static format(year, month, month_day, week_day, hour, minute, second)
class tamil.date.DateUtils

Bases: object

DAY = 'நாள்'
DAY_SUFFIX = 'கிழமை'
HOUR = 'மணி'
MINUTE = 'நிமிடம்'
MONTH = 'மாதம்'
MONTHS = {'April': 'ஏப்ரல்', 'August': 'ஆகஸ்ட்', 'December': 'டிசம்பர்', 'February': 'பிப்ரவரி', 'January': 'ஜனவரி', 'July': 'ஜூலை', 'June': 'ஜூன்', 'March': 'மார்ச்', 'May': 'மே', 'November': 'நவம்பர்', 'October': 'அக்டோபர்', 'September': 'செப்டம்பர்'}
MONTHS_INDEX = [None, 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
TIME = 'நேரம்'
WEEK = 'வாரம்'
WEEKDAYS = {'friday': 'வெள்ளி', 'monday': 'திங்கள்', 'saturday': 'சனிக்கிழமை', 'sunday': 'ஞாயிறு', 'thursday': 'வியாழன்', 'tuesday': 'செவ்வாய்', 'wednesday': 'புதன்'}
WEEKDAYS_INDEX = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
YEAR = 'ஆண்டு'
static get_hour_prefix(hour)
static get_time(local_time=None, fmt=None)
static tamil_month(month)
static tamil_weekday(week_day)
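
The index and lookup tables above suggest a two-step translation: number to English name, then English name to Tamil name. The following is a minimal sketch of that idea only. The helper names `month_in_tamil` and `weekday_in_tamil` are hypothetical, and the assumptions that months are 1-based and that weekday 0 is Monday (as in `datetime.date.weekday()`) are inferred from the shape of `MONTHS_INDEX` and `WEEKDAYS_INDEX`, not confirmed by this reference.

```python
import datetime

# Tables copied from the DateUtils attributes listed above.
MONTHS_INDEX = [None, 'January', 'February', 'March', 'April', 'May', 'June',
                'July', 'August', 'September', 'October', 'November', 'December']
MONTHS = {'April': 'ஏப்ரல்', 'August': 'ஆகஸ்ட்', 'December': 'டிசம்பர்',
          'February': 'பிப்ரவரி', 'January': 'ஜனவரி', 'July': 'ஜூலை',
          'June': 'ஜூன்', 'March': 'மார்ச்', 'May': 'மே',
          'November': 'நவம்பர்', 'October': 'அக்டோபர்', 'September': 'செப்டம்பர்'}
WEEKDAYS_INDEX = ['monday', 'tuesday', 'wednesday', 'thursday',
                  'friday', 'saturday', 'sunday']
WEEKDAYS = {'friday': 'வெள்ளி', 'monday': 'திங்கள்', 'saturday': 'சனிக்கிழமை',
            'sunday': 'ஞாயிறு', 'thursday': 'வியாழன்', 'tuesday': 'செவ்வாய்',
            'wednesday': 'புதன்'}


def month_in_tamil(month_number):
    """Hypothetical helper: 1-based month number -> Tamil month name."""
    return MONTHS[MONTHS_INDEX[month_number]]


def weekday_in_tamil(weekday_number):
    """Hypothetical helper: 0 = Monday, as in datetime.date.weekday()."""
    return WEEKDAYS[WEEKDAYS_INDEX[weekday_number]]


day = datetime.date(2014, 9, 22)
print(month_in_tamil(day.month), weekday_in_tamil(day.weekday()))
```

The `None` placeholder at index 0 of `MONTHS_INDEX` is what lets a calendar month number be used directly as the list index.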
class tamil.date.long

Bases: int

tamil.iscii module

tamil.iscii.convert_to_unicode(tscii_input)

Convert a byte-encoded ASCII string into the equivalent Unicode string in UTF-8 notation.

tamil.iscii.print_table()

tamil.numeral module

class tamil.numeral.long

Bases: int

tamil.numeral.num2tamilstr(*args)

Works up to one lakh crore, i.e. 1e5 * 1e7 = 1e12. Turns a number into a numeral string, Indian style. Fractions are supported down to 1e-30.

tamil.numeral.num2tamilstr_american(*args)
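
The difference between the Indian and American styles is in how digits are grouped before they are named. The sketch below shows only that grouping arithmetic. It is not the library's implementation, the function names are hypothetical, and it produces no Tamil words.

```python
def indian_groups(n):
    """Split a non-negative integer into crore / lakh / thousand / units."""
    crore, rest = divmod(n, 10**7)      # 1 crore = 1e7
    lakh, rest = divmod(rest, 10**5)    # 1 lakh  = 1e5
    thousand, units = divmod(rest, 10**3)
    return {"crore": crore, "lakh": lakh, "thousand": thousand, "units": units}


def american_groups(n):
    """Split a non-negative integer into billion / million / thousand / units."""
    billion, rest = divmod(n, 10**9)
    million, rest = divmod(rest, 10**6)
    thousand, units = divmod(rest, 10**3)
    return {"billion": billion, "million": million,
            "thousand": thousand, "units": units}


print(indian_groups(12345678))    # 1 crore, 23 lakh, 45 thousand, 678
print(american_groups(12345678))  # 12 million, 345 thousand, 678
```

One lakh crore is 10**5 * 10**7 = 10**12, which is the upper limit quoted for `num2tamilstr`.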

tamil.regexp module

tamil.regexp.expand_sequence(start, end, seq)
tamil.regexp.expand_tamil(start, end)

Expand an uyir or mei letter range; e.g. அ-ஔ is converted to அ,ஆ,இ,ஈ,உ,ஊ,எ,ஏ,ஐ,ஒ,ஓ,ஔ.
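
Range expansion of this kind amounts to slicing an ordered letter list between two endpoints. A minimal sketch, assuming the twelve uyir letters in their usual order; `expand_range` is a hypothetical name and the error handling is an illustrative choice, not the library's behaviour.

```python
# The twelve uyir (vowel) letters, given as code points to avoid ambiguity.
UYIR = [chr(cp) for cp in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                           0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)]


def expand_range(start, end, seq):
    """Return every letter of seq from start to end, inclusive."""
    i, j = seq.index(start), seq.index(end)  # ValueError if a letter is absent
    if i > j:
        raise ValueError("start must not come after end")
    return seq[i:j + 1]


print(",".join(expand_range(UYIR[0], UYIR[-1], UYIR)))
```

Note that the uyir code points are not contiguous in Unicode (U+0B8B to U+0B8D, for example, are unassigned), so expanding by raw code-point range would not give a clean letter list.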

tamil.regexp.is_containing_seq(start, end, seq)
tamil.regexp.make_pattern(patt, flags=0)

Returns a compiled regular expression object.

tamil.regexp.match(patt, inputstr)
tamil.regexp.search(patt, inputstr)

tamil.tscii module

tamil.tscii.convert_to_unicode(tscii_input)

Convert a byte-encoded ASCII string into the equivalent Unicode string in UTF-8 notation.

tamil.tscii.print_table()

tamil.tscii2utf8 module

tamil.tscii2utf8.usage()

tamil.tweetparser module

class tamil.tweetparser.TamilTweetParser(timeline_owner, tweet)

Bases: tamil.tweetparser.TweetParser

static cleanupPunct(tweet)

NonEnglishOrTamilOr

static getTamilWords(tweet)

Each word needs to be entirely in Tamil.

static isTamilPredicate(word)

Returns True if the word is a Tamil word, False otherwise.

class tamil.tweetparser.TweetParser(timeline_owner, tweet)

Bases: object

static getAttributeMT(tweet)

Check whether the tweet is an MT (modified tweet).

static getAttributeRT(tweet)

Check whether the tweet is an RT (retweet).

static getHashtags(tweet)

return all hashtags

static getURLs(tweet)

URL : [http://]?[w.?/]+

static getUserHandles(tweet)

Given a tweet, extract all user handles in order of occurrence.
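
The extraction helpers above can be pictured as simple regular-expression scans over the tweet text. The sketch below is an illustration under stated assumptions: the patterns, the function names and the sample tweet are all hypothetical, and whether the library returns the leading `@` / `#` is not stated in this reference.

```python
import re

HANDLE = re.compile(r"@\w+")   # handles are ASCII word characters
# \w does not match Tamil vowel signs, so hashtags are taken as
# runs of non-whitespace instead; trailing punctuation is not stripped.
HASHTAG = re.compile(r"#\S+")
RETWEET = re.compile(r"\s*RT\b")


def user_handles(tweet):
    """All @handles, in order of occurrence."""
    return HANDLE.findall(tweet)


def hashtags(tweet):
    """All #hashtags, in order of occurrence."""
    return HASHTAG.findall(tweet)


def is_retweet(tweet):
    """True if the tweet starts with the RT marker."""
    return RETWEET.match(tweet) is not None


sample = "RT @some_user: trying out #opentamil #python"
print(user_handles(sample), hashtags(sample), is_retweet(sample))
```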

tamil.utf8 module

tamil.utf8.accent_len()
tamil.utf8.agaram(idx)
tamil.utf8.agaram_len()
tamil.utf8.all_tamil(word_in)

Predicate that checks whether all letters of the input word are Tamil letters.

tamil.utf8.ayudha_len()
tamil.utf8.classify_letter(letter)
tamil.utf8.cmp(x, y)
tamil.utf8.compare_words_lexicographic(word_a, word_b)

compare words in Tamil lexicographic order

tamil.utf8.get_letters(word)

Splits the word into a list of the Tamil/English letters present in the input.

tamil.utf8.get_letters_elementary(word, symmetric=False)
tamil.utf8.get_letters_elementary_iterable(word, symmetric=False)
tamil.utf8.get_letters_iterable(word)

Splits the word into a list of the Tamil/English letters present in the input.
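
Splitting by letters matters because one Tamil letter is often several Unicode code points: a base consonant followed by a vowel sign or a pulli. The sketch below approximates letter splitting with the standard library only, by attaching every combining mark to the character before it. `split_letters` is a hypothetical name, and this is not claimed to be how the library does it.

```python
import unicodedata


def split_letters(word):
    """Group each base character with the combining marks that follow it."""
    letters = []
    for ch in word:
        # Tamil vowel signs and the pulli have Unicode category Mc or Mn.
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"   # the word "Tamil" in Tamil script
print(len(word), split_letters(word))      # 5 code points, 3 letters
```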

tamil.utf8.get_tamil_words(letters)

Given a list of letters, return only the Tamil words in it.

tamil.utf8.get_words(letters, tamil_only=False)
tamil.utf8.get_words_iterable(letters, tamil_only=False)

Given a list of UTF-8 letters, section them into words, grouping them at spaces.
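
Grouping at spaces can be sketched as a single pass that accumulates letters until whitespace is seen. `group_words` is a hypothetical name; the `tamil_only` filtering offered by the functions above is left out of this illustration.

```python
def group_words(letters):
    """Join a list of letters into words, breaking at whitespace letters."""
    words, current = [], []
    for letter in letters:
        if letter.isspace():
            if current:                      # close the word in progress
                words.append("".join(current))
                current = []
        else:
            current.append(letter)
    if current:                              # flush the last word
        words.append("".join(current))
    return words


# Letters may be multi-code-point strings, e.g. consonant + vowel sign.
letters = ["\u0ba4", "\u0bae\u0bbf", "\u0bb4\u0bcd", " ", "o", "k"]
print(group_words(letters))
```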

tamil.utf8.getidx(letter)
tamil.utf8.has_english(word_in)

Return True if word_in contains any English letters.

tamil.utf8.has_tamil(word)

Check if the word has any occurrence of a Tamil letter.
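
Predicates such as `all_tamil`, `has_english` and `has_tamil` can be approximated by testing each character against a code-point range. The sketch below uses the Tamil Unicode block (U+0B80 to U+0BFF) as a simplification; the `_sketch` names are hypothetical, and the library may instead check against its own letter tables.

```python
def is_tamil_char(ch):
    """True if ch lies in the Tamil Unicode block."""
    return "\u0b80" <= ch <= "\u0bff"


def has_tamil_sketch(word):
    return any(is_tamil_char(ch) for ch in word)


def all_tamil_sketch(word):
    # An empty word is treated as not Tamil (an illustrative choice).
    return bool(word) and all(is_tamil_char(ch) for ch in word)


def has_english_sketch(word):
    return any("a" <= ch.lower() <= "z" for ch in word)


mixed = "a\u0ba4"
print(has_tamil_sketch(mixed), all_tamil_sketch(mixed), has_english_sketch(mixed))
```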

tamil.utf8.is_normalized(text)
tamil.utf8.is_tamil_unicode(sequence)
tamil.utf8.is_tamil_unicode_predicate(x)
tamil.utf8.istamil(tchar)

Check if the letter tchar is a prefix of any Tamil letter. It suggests we have a Tamil identifier.

tamil.utf8.istamil_alnum(tchar)

Check if the character is alphanumeric or Tamil. This saves time compared with running the istamil() check.

tamil.utf8.istamil_prefix(word)

Check if the given word has a Tamil prefix. Returns a True/False flag.

tamil.utf8.joinMeiUyir(mei_char, uyir_char)

This function joins a mei character and an uyir character, and returns the compound uyirmei Unicode character.

Inputs:
mei_char : must be a Unicode Tamil mei character.
uyir_char : must be a Unicode Tamil uyir character.

Written By : Arulalan.T Date : 22.09.2014

tamil.utf8.join_letters_elementary(elements)
tamil.utf8.letters_to_py(_letters)

return list of letters e.g. uyir_letters as a Python list

tamil.utf8.mei(idx)
tamil.utf8.mei_len()
tamil.utf8.mei_to_agaram(in_syllable)
tamil.utf8.print_tamil_words(tatext, use_frequencies=False)
tamil.utf8.reverse_word(word)

Reverse a Tamil word according to letters, not Unicode code points.
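
Reversing by code point detaches vowel signs from their consonants, which is why a letter-aware reversal is needed. A minimal sketch using a regular expression over the Tamil dependent-sign range; `reverse_by_letters` is a hypothetical name and the pattern is an approximation, not the library's letter table.

```python
import re

# One letter = a base code point plus any Tamil vowel signs or pulli after it.
# U+0BBE..U+0BCD and U+0BD7 cover the Tamil dependent signs.
# A sign with no base before it (malformed input) is skipped by this pattern.
LETTER = re.compile("[^\u0bbe-\u0bcd\u0bd7][\u0bbe-\u0bcd\u0bd7]*")


def reverse_by_letters(word):
    """Reverse the order of letters, keeping each sign with its consonant."""
    return "".join(reversed(LETTER.findall(word)))


word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
print(reverse_by_letters(word))   # letters reversed, signs still attached
print(word[::-1] == reverse_by_letters(word))   # naive reversal differs
```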

tamil.utf8.splitMeiUyir(uyirmei_char)

This function splits an uyirmei compound character into mei + uyir characters and returns them as a tuple.

Input : must be a Unicode Tamil uyirmei character.

Written By : Arulalan.T Date : 22.09.2014
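
The join and split operations described for `joinMeiUyir` and `splitMeiUyir` are inverses of each other. The sketch below illustrates the underlying Unicode mechanics: a mei is a consonant plus pulli, and an uyirmei is the same consonant plus the vowel sign matching the uyir. The names `join_mei_uyir` and `split_mei_uyir` are hypothetical, and multi-code-point consonants (such as Grantha conjuncts) are not handled.

```python
PULLI = "\u0bcd"  # mark that turns a consonant into a pure consonant (mei)

# Independent vowels (uyir) and the dependent vowel signs they map to.
UYIR = ["\u0b85", "\u0b86", "\u0b87", "\u0b88", "\u0b89", "\u0b8a",
        "\u0b8e", "\u0b8f", "\u0b90", "\u0b92", "\u0b93", "\u0b94"]
SIGNS = ["", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
         "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]
UYIR_TO_SIGN = dict(zip(UYIR, SIGNS))
SIGN_TO_UYIR = dict(zip(SIGNS, UYIR))


def join_mei_uyir(mei_char, uyir_char):
    """Consonant + pulli, plus a vowel -> consonant + vowel sign."""
    if len(mei_char) != 2 or mei_char[1] != PULLI:
        raise ValueError("expected a single consonant followed by pulli")
    return mei_char[0] + UYIR_TO_SIGN[uyir_char]


def split_mei_uyir(uyirmei_char):
    """Consonant + vowel sign -> (consonant + pulli, vowel)."""
    base, sign = uyirmei_char[0], uyirmei_char[1:]
    return base + PULLI, SIGN_TO_UYIR[sign]


ka = join_mei_uyir("\u0b95\u0bcd", "\u0b86")   # க் + ஆ
print(ka, split_mei_uyir(ka))
```

The first uyir (அ) maps to an empty sign because the bare consonant already carries that inherent vowel.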

tamil.utf8.tamil(idx)

retrieve Tamil letter at canonical index from array utf8.tamil_letters

tamil.utf8.tamil_len()
tamil.utf8.tamil_sorted(list_data)
tamil.utf8.to_unicode_repr(_letter)

Helpful in situations where a browser/app may recognize Unicode encoding in the \u0b8e type syntax but not the actual Unicode glyph/code point.
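
An escape-style representation of this kind can be produced by formatting each code point as four hex digits. A minimal sketch; `to_escape_repr` is a hypothetical name, and the exact output format of `to_unicode_repr` (case, padding) is an assumption.

```python
def to_escape_repr(text):
    """Render each code point as a backslash-u escape with 4 hex digits."""
    # Code points above U+FFFF would need more digits; Tamil is in the BMP.
    return "".join("\\u%04x" % ord(ch) for ch in text)


print(to_escape_repr("\u0b8e"))         # a single vowel letter
print(to_escape_repr("\u0bae\u0bbf"))   # consonant + vowel sign, two escapes
```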

tamil.utf8.unicode_normalize(cplxchar)
tamil.utf8.uyir(idx)
tamil.utf8.uyir_len()
tamil.utf8.uyirmei(idx)
tamil.utf8.uyirmei_constructed(mei_idx, uyir_idx)

Construct an uyirmei letter given a mei index and an uyir index.

tamil.utf8.uyirmei_len()
tamil.utf8.word_intersection(word_a, word_b)

return a list of tuples where word_a, word_b intersect

tamil.wordutils module

class tamil.wordutils.DictionaryFixedWordList(wlist)

Bases: object

hasWordsStartingWith(pfx)
isWord(word)
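
The two methods above describe a dictionary interface: exact membership and prefix lookup. The sketch below shows one way such an interface could behave, assuming `wlist` is an iterable of words. The class name `FixedWordList` and the snake_case method names are hypothetical, and the linear prefix scan is chosen for brevity rather than speed.

```python
class FixedWordList:
    """Hypothetical stand-in for a fixed word-list dictionary."""

    def __init__(self, wlist):
        self._words = set(wlist)

    def is_word(self, word):
        """Exact membership test."""
        return word in self._words

    def has_words_starting_with(self, pfx):
        """True if any stored word begins with pfx (linear scan)."""
        return any(w.startswith(pfx) for w in self._words)


words = FixedWordList(["\u0b85\u0bae\u0bcd\u0bae\u0bbe", "\u0b85\u0baa\u0bcd\u0baa\u0bbe"])
print(words.is_word("\u0b85\u0bae\u0bcd\u0bae\u0bbe"))
print(words.has_words_starting_with("\u0b85"))
```

For large word lists a sorted list with binary search, or a trie, would make the prefix query cheaper than this scan.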
class tamil.wordutils.DictionaryWithPredicate(isWord)

Bases: tuple

isWord

Alias for field number 0

tamil.wordutils.all_plaindromes(dictionary)
tamil.wordutils.anagrams(word, dictionary, permutations=<function tamil_permutations>)
tamil.wordutils.anagrams_in_dictionary(dictionary)
tamil.wordutils.combinagrams(word, dictionary, limit=inf)
tamil.wordutils.combinations(symbols_in)
tamil.wordutils.default_true(*args)
tamil.wordutils.greedy_split(inword, dictionary)
tamil.wordutils.is_anagram(wordA, wordB)
tamil.wordutils.is_palindrome(*args)
tamil.wordutils.minnal(word_list, use_grantham=False)
tamil.wordutils.palindrome(symbols_in)
tamil.wordutils.permutagrams(word, dictionary)
tamil.wordutils.permutations(symbols, predicate=<function default_true>, prefix='')
tamil.wordutils.rhymes_with(inword, reverse_dictionary)
tamil.wordutils.tamil_permutations(inword)
tamil.wordutils.word_split(inword, dictionary)

Module contents