Unified Transliteration Scheme for Carnatic Music Compositions

Synopsis

The Unified Transliteration Scheme for Carnatic Music Compositions is an English transliteration scheme for representing the lyrics of Carnatic music compositions in Sanskrit, Telugu, Tamil, Kannada and Malayalam, the five Indic languages used in Carnatic music. (Malayalam is not yet fully supported in the scheme.)

Click here to view the scheme.

Introduction

The primary goals of the scheme are:

1. Aid fans and practitioners of Carnatic music in grasping the pronunciation of compositions as accurately as possible.
   - A secondary goal of the scheme is that it should be easy to read in English, and hence it aims for a fair phonetic representation of the underlying pronunciation, although within the limitations and constraints of (a) the capability of the English language to represent these sounds, (b) conformance to established but not necessarily phonetically accurate spellings for popular words/combinations, and (c) conformance to the most common already established transliteration rules, which again may not be phonetically accurate.
   - While writing text in the scheme, a general rule of thumb is to try to represent the underlying sound of the word, and not how the word is written in a specific target language.
2. Enable Carnatic music fans to view the text of the compositions in any of the five target Indic languages. The assumption is that many Carnatic music fans and practitioners may relate better to lyrics represented in their native language, even for compositions in a different language.

An immediate implication of Goal #1 is that the scheme should be able to represent the sounds of all the languages. A less obvious implication of Goal #2 is that a single representation of a composition should be renderable in all five languages. This is the reason for the "unified" part of the scheme's name.

This makes the scheme markedly different from the common transliteration schemes already in existence for these languages. All those schemes have a primary (if not always explicitly stated) and overriding goal of being able to unambiguously represent the target language in the transliteration source. This invariably leads to script-specific idiosyncrasies right in the transliteration scheme itself (explained below). This of course means that text in such a scheme is of little use to people who do not know the language, as they would not be able to grasp the pronunciation from the English transliteration text. This in turn makes it harder for people who know different languages to share lyrical information effectively.

The unified scheme strives to avoid script-specific idiosyncrasies as much as possible. For a sound (phoneme) that is common to all languages, there is almost always a single representation in the scheme, and thus no matter which language it figures in, a reader who understands the scheme should be able to grasp the pronunciation. In some cases, for ease of use, the scheme does allow more than one way of specifying a particular sound (or combination of sounds). In all these cases, one way is the preferred, language-neutral way, and the other may be specific to one or more target languages. Using the latter representation will usually not affect translation to other languages, and hence it can be used, but it is not preferred, as it may make the input text "less phonetic" and hence harder for people who do not know the specific target language to grasp the underlying pronunciation.

Another big difference between this scheme and standard schemes is that this scheme requires a "smarter" transliteration engine to apply language-specific rules. In fact, this requirement is the reason why the scheme is able to rid itself of script-specific idiosyncrasies and still be easy to read.

Conformance to common existing transliteration scheme rules

As stated above, the transliteration scheme is intended to be as phonetic (i.e. phonetic in English) as possible. However, it adheres to most of the already accepted norms, such as A, E, I, O, U for long vowels, and t vs T and d vs D for softer vs harder consonants. These norms are not phonetically ideal, but they are fairly well established among most if not all existing schemes for the target languages. The scheme conforms to them because most users are expected to be familiar with them.

Avoidance of language specific artifacts

Here are some examples of how the scheme avoids language specific artifacts:

In words like candra, languages such as Kannada and Telugu use the anuswara instead of the standard na consonant. Standard transliteration schemes may require this to be explicitly specified, as in caMdra. The presence of the anuswara does not affect the pronunciation, and hence the change makes the transliterated text a "less fair" representation phonetically. Also, the anuswara is not needed in all languages, e.g. Tamil. In this scheme, the above word can simply be specified as candra (it can also be specified as caMdra). In almost all cases, there is no need to explicitly specify the anuswara, and the engine will automatically figure out where it is applicable.
Similarly, pArtasArati is rendered in Tamil as பார்த்தசாரதி, as Tamil requires the extra t (த்) after pAr (பார்) to make the following த take the harder sound (i.e. t as opposed to d). Standard schemes require this to be explicitly specified, as in pArttasArati. This is not required here, and the text can remain as pArtasArati, making it easily translatable to other Indic languages that do not require the extra character.
Consider the word sundari. Standard Telugu and Kannada schemes may require this to be specified as suMdari. Here the M makes it render as ಸುಂದರಿ in Kannada, or సుందరి in Telugu, with the anuswara following su as in ಸುಂ/సుం. Even a Tamil transliteration scheme may require this to be specified as su%ndari, with, say, %n (an arbitrarily chosen representation) required for the character ந், differentiating it from n, which would stand for ன். Both Tamil characters carry the same sound, but the former is used only when preceding a த variant, as here. However, both suMdari and su%ndari diverge from a "fairer" phonetic representation. In the scheme here, you can simply specify sundari, and it will be rendered correctly in all the languages.
Another example is the Tamil word for poison, which is written as நஞ்சு. Here the nj sound is represented as ஞ்ச், and some schemes may require this to be specified explicitly (e.g. na~jcu/na~ncu, with ~j/~n for ஞ் and cu for சு). However, na~jcu/na~ncu, as it appears in English, is far from a fair phonetic representation of the pronunciation of the word. It also leads to incorrect pronunciation in other languages, which would need the ja letter (whereas the ja letter would be inappropriate for Tamil here). In the scheme here, you can simply specify nanju, which is phonetically correct, and the smart engine will make sure it is rendered correctly in all target languages.
Finally, let us take the example of காற்று in Tamil, meaning wind. The sound of the ற் here is really ட். Many schemes may, however, require the transliteration text to be specified as kARRu, again to make it represent how it is written. In the unified scheme, this is not desirable for two reasons: (a) kARRu is a poor phonetic representation of the underlying word, and (b) it will translate incorrectly to other languages unless special logic is added. Hence, in the scheme, you specify it as kATRu. This is an example of the general rule of thumb mentioned above: try to represent the underlying sound of the word, and not how the word is written in a target language.

Anuswaras - When to specify them explicitly?

The scripts of all target languages except Tamil have the anuswara character, and the scheme does allow for explicit specification of the anuswara. However, it should be used only in places where it is absolutely needed. The reasons for this are explained below.

Firstly, anuswara usage varies significantly among the target languages. Secondly, in these languages (except perhaps classical Sanskrit), the anuswara does not represent a separate sound/phoneme, but instead stands for #n, ~n, N, n, or m. Hence, the anuswara can be considered an artifact of the script and avoided in the input text, which should try to represent the underlying sound of a word rather than how it is written in any particular target language. This is all the more important because the different languages follow different rules, and an anuswara figuring in a certain context in one language does not mean it will also figure in that context in a different language.

However, in some contexts it is not possible for the editor to easily figure out whether the anuswara should be used. Hence, explicit specification of the anuswara is not completely avoidable. It is nevertheless strongly recommended that it be avoided wherever possible, as explained later.

Explicit Anuswara Specifier Representation in the Scheme

The scheme provides three different ways of specifying the anuswara: `n, `N, and M, as opposed to just the M that many standard schemes employ. The reason for this variety is to keep the explicit anuswara specification from hiding the underlying phoneme, so that the input text is still a phonetically fair (in English) representation of the underlying word. For example, sa`ngIta is better than saMgIta. Here are the recommendations as to which anuswara specifier to use when:

Use `n when the anuswara represents the #n, ~n, or n sound. For example, sa`ngIta instead of saMgIta, sa`ncAri instead of saMcAri, and sa`ntOsha instead of saMtOsha.
Use `N when the anuswara represents the N sound. For example, sa`NDIna instead of saMDIna.
Use M when the anuswara represents the m sound.
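
The recommendations above amount to a small lookup from the underlying nasal sound to the specifier. The following is a minimal sketch in Python, not the engine's actual code; the names SPECIFIER_FOR_NASAL and anuswara_specifier are hypothetical and chosen only for illustration.

```python
# Hypothetical helper illustrating the recommendations above.
# Keys are the nasal sounds as written in the scheme; values are the
# explicit anuswara specifiers recommended for them.
SPECIFIER_FOR_NASAL = {
    "#n": "`n",  # as in sa`ngIta
    "~n": "`n",  # as in sa`ncAri
    "n": "`n",   # as in sa`ntOsha
    "N": "`N",   # as in sa`NDIna
    "m": "M",
}


def anuswara_specifier(nasal: str) -> str:
    """Return the recommended explicit anuswara specifier for a nasal sound."""
    try:
        return SPECIFIER_FOR_NASAL[nasal]
    except KeyError:
        raise ValueError(f"not a nasal the anuswara stands for: {nasal!r}")
```

For instance, anuswara_specifier("N") returns `N, so the underlying retroflex nasal stays visible in the input text.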

Using explicit Anuswara Specifier in Kannada/Telugu

In Kannada and Telugu, the anuswara in the script follows these rules:

Anuswara is always used instead of m at end of words.
Anuswara is always used instead of #n when preceding k/kh/g/gh.
Anuswara is always used instead of ~n when preceding c/ch/j/jh.
Anuswara is always used instead of n when preceding t/th/d/dh.
Anuswara is always used instead of N when preceding T/Th/D/Dh.
Anuswara is always used instead of m when preceding p/ph/b/bh.
Anuswara is sometimes used (depending on the word) instead of m when preceding other consonants such as y, r etc. (for example, saMyukta)

Since the anuswara is always implied for the first 6 rules, it is strongly recommended that you not use the explicit anuswara specifier in these contexts. Note that this does imply that certain Sanskrit-based words, when translated to Sanskrit, may not appear with the anuswara when they should. For example, it is sa#nga but sa`ngIta in Sanskrit, while both would be written with the anuswara in Kannada and Telugu. A careful user can explicitly specify the anuswara if a better translation to Sanskrit is desired.

For cases covered by the 7th rule, the explicit anuswara specifier needs to be specified wherever applicable.
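
The "always" rules listed above can be sketched as a simple check on a nasal and the consonant that follows it. This is only an illustration under stated assumptions, not the engine's actual logic: the names ALWAYS_BEFORE and anuswara_is_implied are hypothetical, the input is assumed to be already split into sounds, and None stands for the end of a word.

```python
from typing import Optional

# Consonants before which each nasal is always written as the anuswara
# in Kannada and Telugu, per the "always" rules listed above.
ALWAYS_BEFORE = {
    "#n": {"k", "kh", "g", "gh"},
    "~n": {"c", "ch", "j", "jh"},
    "n": {"t", "th", "d", "dh"},
    "N": {"T", "Th", "D", "Dh"},
    "m": {"p", "ph", "b", "bh"},
}


def anuswara_is_implied(nasal: str, following: Optional[str]) -> bool:
    """Return True if the Kannada/Telugu script always uses the anuswara here.

    `following` is the next consonant, or None at the end of a word.
    """
    if following is None:
        # Word-final m is always written as the anuswara.
        return nasal == "m"
    return following in ALWAYS_BEFORE.get(nasal, set())
```

Under this sketch, the n in sundari (followed by d) is implied and needs no specifier, while the m in saMyukta (followed by y) falls under the last, word-dependent rule and returns False, which is why it must be written explicitly.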

Using explicit Anuswara Specifier in Sanskrit

In Sanskrit, there are no contexts in which the anuswara always figures. Instead, it depends on the word. The anuswara at the end of words also follows different rules according to different conventions:

Anuswara is used for words ending in m which are in the middle of a sentence. Anuswara is not used at the end of sentences.
Anuswara is always used for words ending in m.
Anuswara is never used for words ending in m.

In Sanskrit, it is up to the user to explicitly specify the anuswara where it is applicable.
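
The three word-final conventions listed above can be modelled as alternative policies. The sketch below is illustrative only; the enum FinalMConvention and the function final_m_as_anuswara are hypothetical names, and nothing here is taken from the scheme's engine.

```python
from enum import Enum


class FinalMConvention(Enum):
    """The three conventions for a word-final m in Sanskrit described above."""
    MID_SENTENCE_ONLY = 1  # anuswara inside a sentence, not at its end
    ALWAYS = 2             # anuswara for every word ending in m
    NEVER = 3              # never anuswara for a word ending in m


def final_m_as_anuswara(convention: FinalMConvention, at_sentence_end: bool) -> bool:
    """Return True if a word-final m is written as the anuswara."""
    if convention is FinalMConvention.ALWAYS:
        return True
    if convention is FinalMConvention.NEVER:
        return False
    # MID_SENTENCE_ONLY: anuswara unless the word closes the sentence.
    return not at_sentence_end
```

Because the choice depends on the convention being followed and not on the sound itself, the scheme leaves it to the user to specify the anuswara explicitly.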

Using explicit Anuswara Specifier in Malayalam

Malayalam is similar to Sanskrit, except that it does not use the anuswara at the end of words, and it uses the anuswara in the middle of words in fewer contexts.

In Malayalam, like Sanskrit, it is up to the user to explicitly specify the anuswara where it is applicable.

Qualifiers

The unified transliteration scheme for Carnatic music uses qualifiers in the form of superscript numbers for the following purposes:

1. Represent sounds that are not part of the language
2. Resolve any ambiguities in the letters of the script (as in the case of Tamil)

For languages other than Tamil, there is only one case where qualifiers are used. That is the Kannada and Telugu representation of the za/zgha, which occurs in Tamil and Malayalam. For Tamil, qualifiers are needed in many more contexts because both case #1 and case #2 arise when representing lyrics from other languages. For information on when qualifiers apply, please refer to the scheme.

