The Unified Transliteration Scheme for Carnatic Music Compositions
An English transliteration scheme for representing the lyrics of Carnatic music
compositions in Sanskrit, Telugu, Tamil, Kannada and Malayalam, the five Indic
languages used in Carnatic music. (Malayalam is not yet fully supported in the
scheme.)
Click here to view the scheme.
The primary goals of the scheme are:
- Help fans and practitioners of Carnatic music grasp the
pronunciation of compositions as accurately as possible.
- A secondary goal of the scheme is that it should be easy to read in
English, and hence it aims at a fair phonetic representation of the
underlying pronunciation, although within the limitations and
constraints of (a) the ability of the English language to represent
these sounds, (b) conformance to established but not necessarily
phonetically accurate spellings for popular words/combinations, and
(c) conformance to the most common established transliteration
rules, which again may not be phonetically accurate.
- While writing text in the scheme, a general rule of thumb to
follow is to try to represent the underlying sound of the word, and
not how the word is written in a specific target language.
- Enable Carnatic music fans to view the text of the compositions in
any of the five target Indic languages. The assumption is that many
Carnatic music fans and practitioners may relate better to lyrics represented
in their native language, even for compositions in a different language.
An immediate implication of Goal #1 is that the scheme should be able to
represent the sounds of all the languages. A less obvious implication of
Goal #2 is that a single representation of a composition
should be renderable in all five languages. This
is the reason for the "unified" part of the scheme.
This makes the scheme markedly different from the common transliteration
schemes already in existence for these languages. All those schemes have a
primary (if not always explicitly stated) and overriding goal of being able to
unambiguously represent the target language in the transliteration source.
This invariably leads to script-specific idiosyncrasies in the
transliteration scheme itself (explained below). This of course means
that a text in such a scheme is of little use to people who don't know the
language, as they would not be able to grasp the pronunciation from the
English transliteration text. This in turn makes it harder for people (who
know different languages) to share lyrical information effectively.
The unified scheme strives to avoid script-specific idiosyncrasies as
much as possible. For a sound (phoneme) that is common to all languages, there
is almost always a single representation in the scheme, and thus, no matter
which language it figures in, a reader who understands the scheme should be
able to grasp the pronunciation. In some cases, for ease of use, the
scheme does allow for various ways of specifying a particular sound (or
combination of sounds). In all these cases, one way is the preferred,
language-neutral way, and the others may be specific to one or more target
languages. Using the latter representation will usually not affect translation
to other languages, and hence can be used, but it is not preferred, as it may
make the input text "less phonetic" and hence harder for people who do not
know the specific target language to grasp the underlying pronunciation.
Another big difference between this scheme and standard schemes is that this
scheme requires a "smarter" transliteration engine to apply language-specific
rules. In fact, this requirement is what allows the scheme to rid itself of
script-specific idiosyncrasies and also to be easy to read.
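As a minimal sketch of what such a "smarter" engine does, the Python below renders one scheme input, nanju (the Tamil word for poison, discussed below), into Tamil, Telugu and Kannada. The per-language tables, the `render` function and its single rule (how a nasal before j is written) are hypothetical and cover only this word; a real engine would need complete tables and many more rules.

```python
# Hypothetical tables covering only the letters of "nanju".
# The inherent vowel "a" needs no sign after a consonant.
SCRIPTS = {
    # Tamil writes the nasal as ஞ் and the following sound with ச.
    "tamil": {"n": "ந", "j": "ச", "u": "ு", "nasal_before_j": "ஞ்"},
    # Telugu and Kannada write an anuswara followed by the ja letter.
    "telugu": {"n": "న", "j": "జ", "u": "ు", "nasal_before_j": "ం"},
    "kannada": {"n": "ನ", "j": "ಜ", "u": "ು", "nasal_before_j": "ಂ"},
}


def render(tokens: list[str], lang: str) -> str:
    """Render scheme tokens in one script (consonant + vowel syllables only)."""
    table = SCRIPTS[lang]
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "n" and nxt == "j":
            out.append(table["nasal_before_j"])  # language-specific rule
        elif tok == "a":
            continue  # inherent vowel: nothing to write
        else:
            out.append(table[tok])
    return "".join(out)


word = ["n", "a", "n", "j", "u"]  # the scheme input "nanju"
for lang in SCRIPTS:
    print(lang, render(word, lang))
```

The loop prints நஞ்சு, నంజు and ನಂಜು: the input is identical for all three, and only the per-language rule decides between ஞ் and the anuswara.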
Conformance to common existing transliteration scheme rules
As stated above, the transliteration scheme is intended to be as phonetic
as possible (i.e. phonetic in English). However, it
adheres to most of the already accepted norms, such as A, E, I, O, U for
long vowels, and t vs T, d vs D for softer vs harder consonants. These norms
are not phonetically ideal, but they are nevertheless fairly well established
amongst most if not all existing schemes for the target languages. The scheme
conforms to these norms because most users are expected to be
familiar with them.
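Because the scheme keeps these case conventions, an engine has to read the input case-sensitively and split it into tokens that may span two letters. The following is a minimal Python sketch of greedy longest-match tokenization; the token inventory is a small hypothetical subset, not the scheme's full table.

```python
# Hypothetical subset of scheme tokens. Case is significant: "A" (long)
# differs from "a" (short), and "T" (harder) from "t" (softer).
TOKENS = sorted(
    ["a", "A", "i", "I", "u", "U", "e", "E", "o", "O",
     "k", "kh", "g", "gh", "c", "ch", "j", "jh",
     "t", "th", "d", "dh", "T", "Th", "D", "Dh",
     "n", "N", "p", "ph", "b", "bh", "m",
     "y", "r", "l", "v", "s", "sh", "h"],
    key=len,
    reverse=True,  # try two-letter tokens before one-letter ones
)


def tokenize(text: str) -> list[str]:
    """Split romanized input into scheme tokens by greedy longest match."""
    out, i = [], 0
    while i < len(text):
        for tok in TOKENS:
            if text.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"unknown symbol {text[i]!r} at position {i}")
    return out


print(tokenize("pArtasArati"))
```

Trying longer tokens first ensures that "Th" is read as one consonant rather than T followed by h.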
Avoidance of language specific artifacts
Here are some examples of how the scheme avoids language specific artifacts:
- In words like candra, languages such as Kannada and Telugu
use the anuswara instead of the standard na consonant. Standard
transliteration schemes may require this to be explicitly specified, as in
caMdra. The presence of the anuswara does not affect the
pronunciation, and hence the change makes the transliterated text a "less
fair" representation phonetically. Also, the anuswara is not needed in all
languages, e.g. Tamil. In this scheme, the above word can simply
be specified as candra (it can also be specified as
caMdra). In almost all cases, there is no need to explicitly specify
the anuswara, and the engine will automatically figure out where it is needed.
- Similarly, pArtasArati is rendered in Tamil as
பார்த்தசாரதி, as Tamil requires the extra
t (த்) after pAr
(பார்) to make the
following த take the harder sound (i.e.
t as opposed to d). Standard schemes require this to be explicitly
specified, as in pArttasArati. This is not required here, and the text
can remain as pArtasArati, making it easily translatable to other Indic
languages that do not require the extra character.
- Consider the word sundari. Standard Telugu and Kannada schemes
may require this to be specified as suMdari. Here the M stands for the
anuswara that follows su in the Telugu and Kannada spelling.
Even a Tamil transliteration scheme may require this to be specified as
su%ndari, with, say, %n (an arbitrarily chosen representation) required
for the character ந், differentiating
it from n, which would stand for
ன். Both Tamil characters carry
the same sound, but the former is used only when preceding
த, as here. However, both
suMdari and su%ndari diverge from a "fairer" phonetic
representation. In this scheme, you can simply specify sundari, and
it will be rendered correctly in all the languages.
- Another example is the Tamil word for poison, which is written as
நஞ்சு. Here the
nj sound is written as
ஞ்ச, and some schemes
may require this to be specified explicitly (e.g. na~jcu/na~ncu,
with ~j/~n for ஞ் and
cu for சு). However,
na~jcu/na~ncu, as it appears in English, is far from being a fair
phonetic representation of the pronunciation of the word it represents. It
also leads to incorrect pronunciation in other languages, which would need
the ja letter (but the ja letter would be inappropriate for
Tamil here). In this scheme, you can simply specify nanju, which
is phonetically correct, and the smart engine will make sure it is rendered
correctly in all target languages.
- Finally, let us take the Tamil word for wind. The sound of the
ற் in it is really
ட். Many schemes may however require
the transliteration text to be specified as kARRu, again to make it
represent how it is written. In the unified scheme, this is not desirable for
two reasons: (a) kARRu is a poor phonetic representation of the
underlying word, and (b) it will translate incorrectly to other languages unless
special logic is added. Hence, in this scheme, you specify it as kATRu.
This is an example of the general rule of thumb mentioned above: try to
represent the underlying sound of the word, and not how the word is written in
a target language.
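The Tamil-only rules in the examples above can be sketched as a spelling pass that runs before letters are mapped to script. In the hypothetical Python below, `tamil_spelling` applies three of those rules (the extra த் in pArtasArati, ந் before த in sundari, and ற் for the T sound in kATRu), and `render_tamil` maps the result to Tamil letters. The tables cover only these words and assume every vowel follows a consonant; this is an illustration under those assumptions, not the scheme's actual engine.

```python
# Hypothetical Tamil tables, covering only the example words.
# "s" is written with ச here, as in the rendering of pArtasArati above.
CONSONANT = {"p": "ப", "r": "ர", "t": "த", "d": "த", "s": "ச", "k": "க",
             "T": "ட", "R": "ற", "n": "ன", "n_dental": "ந"}
VOWEL_SIGN = {"a": "", "A": "ா", "i": "ி", "u": "ு"}
PULLI = "்"  # marks a consonant with no following vowel


def tamil_spelling(tokens: list[str]) -> list[str]:
    """Apply Tamil-only spelling rules to phonetic scheme tokens."""
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "T" and nxt == "R":
            out.append("R")  # kATRu: the T sound is written with ற்
        elif tok == "n" and nxt in ("t", "d"):
            out.append("n_dental")  # sundari: ந் before the த series
        else:
            out.append(tok)
        if tok == "r" and nxt == "t":
            out.append("t")  # pArtasArati: extra த் keeps the next த hard
    return out


def render_tamil(tokens: list[str]) -> str:
    """Map spelling tokens to Tamil letters, adding the pulli where needed."""
    out = []
    for i, tok in enumerate(tokens):
        if tok in VOWEL_SIGN:
            out.append(VOWEL_SIGN[tok])  # assumes a consonant precedes it
            continue
        out.append(CONSONANT[tok])
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt not in VOWEL_SIGN:
            out.append(PULLI)
    return "".join(out)


for word in ["pArtasArati", "sundari", "kATRu"]:
    # list() works only because these words use one-letter tokens
    print(word, render_tamil(tamil_spelling(list(word))))
```

This prints பார்த்தசாரதி, சுந்தரி and காற்று, while the input words themselves stay phonetic and carry no Tamil-specific marks.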
Anuswaras - When to specify them explicitly?
The scripts of all target languages except Tamil have the anuswara character, and the scheme does
allow for explicit specification of the anuswara character. However, it should be used only after
careful consideration, and only in places where it is absolutely needed, for the reasons below.
Firstly, anuswara usage varies significantly among the target languages. Secondly, in these languages
(except possibly classical Sanskrit), the anuswara does not represent a separate sound (phoneme); instead,
it stands for one of the nasal sounds #n, ~n, n, N or m, depending on the context.
Hence, the anuswara can be considered an artifact of the script and avoided in the input text, which
should try to represent the underlying sound of a word rather than how it is written in any particular
target language. This is all the more important because the different languages follow different rules:
an anuswara appearing in a certain context in one language does not mean it will also appear
in that context in another language.
However, in some contexts it is not possible for the editor to easily figure out whether the anuswara
should be used. Hence, explicit specification of the anuswara is not completely avoidable, but
it is strongly recommended that it be avoided wherever possible, as explained below.
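Since the anuswara stands for an underlying nasal, an engine could map an explicit M back to the sound it represents: the nasal of the same class as the following consonant, or m otherwise. This hypothetical Python sketch (`normalize_anuswara` is an illustrative name) shows candra and caMdra normalizing to the same tokens. The class table follows the standard grouping of Indic stops using the scheme's symbols #n, ~n, n, N and m; the aspirate spellings such as kh are assumed.

```python
# Standard grouping of Indic stops by place of articulation, keyed by
# the scheme's nasal of the same class (aspirate spellings are assumed).
CLASSES = {
    "#n": "k kh g gh",
    "~n": "c ch j jh",
    "N": "T Th D Dh",
    "n": "t th d dh",
    "m": "p ph b bh",
}
NASAL_FOR = {stop: nasal for nasal, stops in CLASSES.items() for stop in stops.split()}


def normalize_anuswara(tokens: list[str]) -> list[str]:
    """Replace an explicit anuswara 'M' with the nasal it sounds as."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "M":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            out.append(NASAL_FOR.get(nxt, "m"))  # default: an m sound
        else:
            out.append(tok)
    return out


print(normalize_anuswara(["c", "a", "M", "d", "r", "a"]))  # caMdra
```

Under these assumptions caMdra yields the same tokens as candra, and saMgIta yields sa#ngIta, so the script artifact disappears from the phonetic form.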
Explicit Anuswara Specifier Representation in the Scheme
The scheme provides three different ways of specifying the anuswara
(`n, `N and M), as opposed to just the M
that many standard schemes employ. The reason for this
variety is to keep the explicit anuswara specification from hiding the underlying phoneme, so that the
input text is still a phonetically fair (in English) representation of the underlying word. For example,
sa`ngIta is better than saMgIta. Here are
the recommendations as to which anuswara specifier to use:
- Use `n when the anuswara represents the
#n, ~n or n sound. For example,
sa`ngIta instead of saMgIta, sa`ncAri
instead of saMcAri, and sa`ntOsha instead of saMtOsha.
- Use `N when the anuswara represents the N sound. For example,
sa`NDIna instead of saMDIna.
- Use M when the anuswara represents the m sound.
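A minimal Python sketch of how the three specifiers could be read: each marks an explicit anuswara, and dropping the backtick (or reading M as m) recovers the sound a reader should pronounce. `parse_specifiers` is a hypothetical helper, and the regular expression assumes the backtick and M appear only as anuswara specifiers.

```python
import re

# Explicit anuswara specifiers: `n, `N, or M.
SPECIFIER = re.compile(r"`[nN]|M")


def parse_specifiers(text: str) -> tuple[str, int]:
    """Return (text as pronounced, number of explicit anuswaras)."""
    def sound(match: re.Match) -> str:
        spec = match.group()
        return "m" if spec == "M" else spec[1]  # drop the backtick
    return SPECIFIER.sub(sound, text), len(SPECIFIER.findall(text))


for word in ["sa`ngIta", "sa`NDIna", "saMgIta"]:
    print(word, parse_specifiers(word))
```

Note that saMgIta reads back as samgIta, which illustrates why `n is preferred there: M hides the nasal that is actually pronounced.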
Using explicit Anuswara Specifier in Kannada/Telugu
In Kannada and Telugu, the anuswara in the script follows these rules:
- The anuswara is always used instead of m at the end of words.
- The anuswara is always used instead of #n when preceding the velar consonants (the k series).
- The anuswara is always used instead of ~n when preceding the palatal consonants (the c series).
- The anuswara is always used instead of n when preceding the dental consonants (the t series).
- The anuswara is always used instead of N when preceding the retroflex consonants (the T series).
- The anuswara is always used instead of m when preceding the labial consonants (the p series).
- The anuswara is sometimes used (depending on the word) instead of m when preceding
other consonants such as y, r etc. (for example, saMyukta).
Since the anuswara is always implied for the first 6 rules, it is strongly recommended that
you not use the explicit anuswara specifier in these contexts. Note that this does imply that
certain Sanskrit-based words, when rendered in Sanskrit, may not appear with the anuswara where
they should. For example, Sanskrit writes sa#nga (with the #n letter) but sa`ngIta (with the
anuswara), while both would be written with the anuswara in Kannada and Telugu. A careful user can
explicitly specify the anuswara if a better rendering in Sanskrit is desired.
For cases covered by the 7th rule, the explicit anuswara specifier must be given wherever applicable.
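The always-implied rules for Kannada and Telugu can be expressed as a small predicate. In this hypothetical Python sketch, `implied_anuswara` returns True for rules 1 to 6 and False otherwise, so the word-dependent 7th rule (e.g. saMyukta) is left to an explicit M from the user. The consonant spellings such as kh and Dh are assumed.

```python
from typing import Optional

# Stops that follow each nasal in rules 2 to 6 (aspirate spellings assumed).
SAME_CLASS = {
    "#n": {"k", "kh", "g", "gh"},
    "~n": {"c", "ch", "j", "jh"},
    "n": {"t", "th", "d", "dh"},
    "N": {"T", "Th", "D", "Dh"},
    "m": {"p", "ph", "b", "bh"},
}


def implied_anuswara(nasal: str, nxt: Optional[str]) -> bool:
    """True if Kannada/Telugu always write `nasal` as an anuswara before `nxt`.

    `nxt` is None at the end of a word. Rule 7 (m before y, r, etc.) depends
    on the word, so it returns False and needs an explicit M from the user.
    """
    if nasal == "m" and nxt is None:
        return True  # rule 1: word-final m
    return nxt in SAME_CLASS.get(nasal, set())  # rules 2 to 6


print(implied_anuswara("n", "d"), implied_anuswara("m", "y"))
```

An editor could use such a check to flag explicit specifiers that are redundant in these two languages.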
Using explicit Anuswara Specifier in Sanskrit
In Sanskrit, there are no contexts in which the anuswara always
appears; instead, it depends
on the word. The anuswara at the end of words also follows different rules according to different
conventions:
- The anuswara is used for words ending in m which are in
the middle of a sentence, but not at the end of a sentence.
- The anuswara is always used for words ending in m.
- The anuswara is never used for words ending in m.
In Sanskrit, it is up to the user to explicitly specify the anuswara where it is applicable.
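For instance, the first convention could be applied with a helper like the hypothetical `mark_final_m` below, a Python sketch that assumes the caller passes the words of exactly one sentence and writes the anuswara with the M specifier.

```python
def mark_final_m(words: list[str]) -> list[str]:
    """First convention: write a word-final m as the anuswara M, except in
    the last word of the sentence. Assumes `words` is exactly one sentence."""
    last = len(words) - 1
    return [
        w[:-1] + "M" if w.endswith("m") and i != last else w
        for i, w in enumerate(words)
    ]


print(mark_final_m(["aham", "gacchAmi", "vanam"]))
```

Here aham becomes ahaM, while the sentence-final vanam keeps its m.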
Using explicit Anuswara Specifier in Malayalam
Malayalam is similar to Sanskrit, except that it does not use the anuswara at the end of words,
and it uses the anuswara in the middle of words in fewer contexts.
In Malayalam, as in Sanskrit, it is up to the user to explicitly specify the anuswara where
it is applicable.
Qualifiers
The unified transliteration scheme for Carnatic music uses qualifiers
in the form of superscript numbers for the following purposes:
- Represent sounds that are not part of the language
- Resolve ambiguities in the letters of the script (as in the case of Tamil)
For languages other than Tamil, there is only one case where qualifiers
are used: the Kannada and Telugu representation of the za/zgha sound,
which occurs in Tamil and Malayalam. For Tamil, qualifiers are needed in many
more contexts, because both case #1 and case #2 arise when representing lyrics
from other languages. For information on when qualifiers apply, please refer
to the scheme.
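As an illustration only, the Python sketch below attaches superscript qualifiers to Tamil stop letters so that k, kh, g and gh remain distinguishable even though Tamil writes all four with க. The numbering (1 to 4 in the traditional row order, with 1 left unmarked) and the `tamil_with_qualifier` helper are assumptions; the scheme itself defines when and how qualifiers apply. Palatals are omitted because Tamil commonly uses the letter ஜ for j.

```python
# Hypothetical qualifier numbering: position in the traditional row order
# (voiceless, aspirated, voiced, voiced aspirated); position 1 is unmarked.
SUPERSCRIPT = {2: "\u00b2", 3: "\u00b3", 4: "\u2074"}
TAMIL_ROWS = {
    "க": ["k", "kh", "g", "gh"],
    "ட": ["T", "Th", "D", "Dh"],
    "த": ["t", "th", "d", "dh"],
    "ப": ["p", "ph", "b", "bh"],
}


def tamil_with_qualifier(token: str) -> str:
    """Tamil letter for a stop, with a superscript qualifier if needed."""
    for letter, row in TAMIL_ROWS.items():
        if token in row:
            position = row.index(token) + 1
            return letter + SUPERSCRIPT.get(position, "")
    raise KeyError(f"no Tamil stop row for {token!r}")


print([tamil_with_qualifier(t) for t in ["k", "kh", "g", "gh"]])
```

The list printed is க, க², க³ and க⁴: one Tamil letter, disambiguated only by the qualifier.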