Transliterations

You can set up automatic transliterations in your corpus. Transliterations are functions that transform the text of words and sentences before they are sent to the user. If you use transliterations, you have to list their names in the transliterations list in corpus.json. The user can choose a transliteration in the settings dialog. Although the list of transliterations is defined globally for the whole corpus, each transliteration can work differently depending on the language/tier chosen.
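As a rough illustration, the snippet below embeds a hypothetical fragment of corpus.json and parses it with Python's json module. Only the key name transliterations comes from the description above. The listed name "meillet" is an assumption made for this example, and a real corpus.json contains many other settings.

```python
import json

# Hypothetical fragment of corpus.json. Only the "transliterations" key is
# taken from the documentation; the listed name is an illustrative assumption.
corpus_json_fragment = '''
{
    "transliterations": ["meillet"]
}
'''

settings = json.loads(corpus_json_fragment)
transliteration_names = settings["transliterations"]
print(transliteration_names)
```

Each name in this list is expected to correspond to a transliteration the user can pick in the settings dialog.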

For each language and each transliteration, you have to create a function that takes one argument, the text to be transliterated, and returns the transliterated string. The function must be placed in a Python file located in /search/transliterators (feel free to add new files). By Tsakorpus convention, such functions are named %LANGUAGE_NAME%_translit_%TRANSLITERATION%(text), but the name is up to you.

This is a simple example from /search/transliterators/armenian.py:

    # Character-by-character mapping from Armenian script to Latin.
    dictArm2Lat = {'խ': 'x', 'ու': 'u', 'ւ': 'w',
                   'է': 'ē', 'ր': 'r', 'տ': 't',
                   'ե': 'e', 'ը': 'ə', 'ի': 'i',
                   'ո': 'o', 'պ': 'p', 'չ': 'č‘',
                   'ջ': 'ĵ', 'ա': 'a', 'ս': 's',
                   'դ': 'd', 'ֆ': 'f', 'ք': 'k‘',
                   'հ': 'h', 'ճ': 'č', 'կ': 'k',
                   'լ': 'l', 'թ': 't‘', 'փ': 'p‘',
                   'զ': 'z', 'ց': 'c‘', 'գ': 'g',
                   'վ': 'v', 'բ': 'b', 'ն': 'n',
                   'մ': 'm', 'շ': 'š', 'ղ': 'ġ',
                   'ծ': 'c', 'ձ': 'j', 'յ': 'y',
                   'օ': 'ō', 'ռ': 'ŕ', 'ժ': 'ž',
                   'և': 'ew', ':': '.'}

    def armenian_translit_meillet(text):
        # Replace the digraph first, since the loop below works on single characters.
        text = text.replace('ու', 'u')
        text = text.replace('ու'.upper(), 'U')  # fully upper-case digraph
        text = text.replace('Ու', 'U')          # title-case digraph
        textTrans = ''
        for c in text:
            try:
                c = dictArm2Lat[c]
            except KeyError:
                try:
                    # Upper-case letter: look up the lower-case form, then restore case.
                    c = dictArm2Lat[c.lower()].upper()
                except KeyError:
                    # Characters missing from the table are left unchanged.
                    pass
            textTrans += c
        return textTrans
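If you write transliterators for several languages, the two steps used above (multi-character sequences first, then single characters with case restored) can be factored into a helper. The following is a minimal sketch, not part of Tsakorpus: the names transliterate_with_table and armenian_translit_sketch are hypothetical, and the tables are a small subset of the Armenian mapping above.

```python
def transliterate_with_table(text, digraphs, table):
    """Transliterate text: digraphs first, then single characters."""
    # Multi-character sequences must be replaced before the per-character pass.
    for source, target in digraphs.items():
        text = text.replace(source, target)
        text = text.replace(source.upper(), target.upper())
        text = text.replace(source.capitalize(), target.capitalize())
    result = []
    for c in text:
        if c in table:
            result.append(table[c])
        elif c.lower() in table:
            # Upper-case letter: transliterate the lower-case form, restore case.
            result.append(table[c.lower()].upper())
        else:
            # Anything not in the table (digits, punctuation, Latin) passes through.
            result.append(c)
    return ''.join(result)


# Small subset of the Armenian table above, for illustration only.
DIGRAPHS = {'ու': 'u'}
TABLE = {'հ': 'h', 'ա': 'a', 'յ': 'y', 'ն': 'n'}


def armenian_translit_sketch(text):
    # Hypothetical name that follows the
    # %LANGUAGE_NAME%_translit_%TRANSLITERATION% convention.
    return transliterate_with_table(text, DIGRAPHS, TABLE)


print(armenian_translit_sketch('հայ'))  # prints: hay
```

With such a helper, each new language needs only its own tables and a one-line function.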

When you are done, you have to import your functions in /search/web_app/transliteration.py and add function calls to trans_%TRANSLITERATION%_baseline under a condition like if lang == '%LANGUAGE_NAME%'. If there is no existing function for your transliteration name, you can add one. The transliterations will be applied to the sentence text (“baseline”) and certain fields, such as word form and lemma. Applying transliterations to some other fields, such as glosses, might require slightly different rules. Separate functions for such cases will probably be added in one of the later releases.
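The dispatch described above might look roughly like the sketch below. The signature of trans_meillet_baseline (text plus language name) and the fall-through that returns the text unchanged are assumptions made for illustration, so check the existing functions in /search/web_app/transliteration.py before copying this. The Armenian function here is a two-letter stand-in so that the example runs on its own; in the real file it would be imported from the transliterators package.

```python
def armenian_translit_meillet(text):
    # Stand-in for the real function, which would be imported from the
    # Armenian transliterator module; it handles only two letters here.
    return text.replace('ա', 'a').replace('ս', 's')


def trans_meillet_baseline(text, lang):
    # Assumed signature: the text and the name of the language/tier.
    if lang == 'armenian':
        return armenian_translit_meillet(text)
    # Assumption: for languages without a function, return the text as is.
    return text


print(trans_meillet_baseline('աս', 'armenian'))
```

Adding support for another language would then mean adding one more condition of the same shape.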

If no function is found for some transliteration or some language, nothing bad will happen.

Also see Input methods.