Note: This web page contains characters in Devanagari, Telugu, Kannada, Tamil and Malayalam. If they do not display correctly, you need to install the proper fonts on your computer. On Firefox, if the Indian language characters do appear but are in the wrong order, please follow the instructions given on this page: http://en.wikipedia.org/wiki/Wikipedia:Enabling_complex_text_support_for_Indic_scripts#Check_for_existing_support

Unified Transliteration Scheme for Carnatic Music Compositions
(version 1.2)

Synopsis

The Unified Transliteration Scheme for Carnatic Music Compositions is an English transliteration scheme for representing lyrics of Carnatic music compositions in Sanskrit, Telugu, Tamil, Kannada and Malayalam, the five Indic languages used in Carnatic music. (Malayalam is not yet fully supported in the scheme.)

Click here to view the scheme.

Introduction

The primary goals of the scheme are:
  1. Help fans and practitioners of Carnatic music grasp the pronunciation of compositions as accurately as possible.
    1. A secondary goal is that the scheme should be easy to read in English, and hence it aims for a fair phonetic representation of the underlying pronunciation, within the limitations and constraints of (a) the capability of English to represent these sounds, (b) conformance to established but not necessarily phonetically accurate spellings for popular words/combinations, and (c) conformance to the most common established transliteration rules, which again may not be phonetically accurate.
    2. When writing text in the scheme, a general rule of thumb is to represent the underlying sound of the word, not how the word is written in a specific target language.
  2. Enable Carnatic music fans to view the text of compositions in any of the five target Indic languages. The assumption is that many Carnatic music fans and practitioners may relate better to lyrics represented in their native language, even for compositions in a different language.
An immediate implication of Goal #1 is that the scheme should be able to represent the sounds of all the languages. A less obvious implication of Goal #2 is that a single representation of a composition should be renderable in all five languages. This is the reason for the "unified" part of the scheme's name.

This makes the scheme markedly different from the common transliteration schemes already in existence for these languages. All those schemes have a primary (if not always explicitly stated) and overriding goal of being able to unambiguously represent the target language in the transliteration source. This invariably leads to script-specific idiosyncrasies right in the transliteration scheme itself (explained later below). This of course means that text in such a scheme is of little use to people who do not know the language, as they would not be able to grasp the pronunciation from the English transliteration text. This in turn makes it harder for people who know different languages to share lyrical information effectively.

The unified scheme strives to avoid script-specific idiosyncrasies as much as possible. For a sound (phoneme) that is common to all languages, there is almost always a single representation in the scheme, so no matter which language it figures in, a reader who understands the scheme should be able to grasp the pronunciation. In some cases, for ease of use, the scheme does allow more than one way of specifying a particular sound (or combination of sounds). In all these cases, one way is the preferred, language-neutral way, and the other may be specific to one or more target languages. Using the latter representation will usually not affect translation to other languages, and hence it can be used, but it is not preferred because it may make the input text "less phonetic" and hence harder for people who do not know the specific target language to grasp the underlying pronunciation.

Another big difference between this scheme and standard schemes is that this scheme requires a "smarter" transliteration engine to apply language-specific rules. In fact, this requirement is the reason why the scheme is able to rid itself of script-specific idiosyncrasies while remaining easy to read.

Conformance to common existing transliteration scheme rules

As stated above, the transliteration scheme is intended to be as phonetic as possible (i.e. phonetic in English). However, it adheres to most of the already accepted norms, such as A, E, I, O, U for long vowels, and t vs T, d vs D for softer vs harder consonants. These norms are not phonetically ideal, but they are nevertheless fairly well established amongst most if not all existing schemes for the target languages. The scheme conforms to these norms because most consumers are expected to be familiar with them.

Avoidance of language specific artifacts

Here are some examples of how the scheme avoids language specific artifacts:

Anuswaras - When to specify them explicitly?

The scripts of all target languages except Tamil have the anuswara character, and the scheme does allow for explicit specification of the anuswara character. However, its usage should be carefully considered, and it should be used only in places where it is absolutely needed. The reasons for this are explained below.

Firstly, anuswara usage varies significantly among the target languages. Secondly, in these languages (except classical Sanskrit?), the anuswara does not represent a separate sound/phoneme, but instead stands for #n, ~n, n, N, or m. Hence, the anuswara can be considered an artifact of the script and avoided in the input text, which should try to represent the underlying sound of a word rather than how it is written in any particular target language. This is all the more important because the different languages follow different rules, and an anuswara figuring in a certain context in one language does not mean it will also figure in that context in a different language.

However, for some contexts, it is not possible for the editor to easily figure out if anuswara should be used or not. Hence, the explicit specification of anuswara is not completely avoidable. However, it is strongly recommended that it be avoided wherever possible as explained later.
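
For reference, the anuswara sign itself has a distinct Unicode code point in each script that uses it. The sketch below simply lists those code points; the dictionary name ANUSWARA and its language keys are illustrative and not part of the scheme. Tamil is left out because, as stated above, its script has no anuswara character.

```python
import unicodedata

# Unicode anusvara signs for the four target scripts that use one.
# The dictionary name and keys are illustrative, not part of the scheme.
ANUSWARA = {
    "sanskrit": "\u0902",   # written in Devanagari
    "telugu": "\u0c02",
    "kannada": "\u0c82",
    "malayalam": "\u0d02",
}

for language, sign in ANUSWARA.items():
    # Print the code point and the official Unicode character name.
    print(f"{language:10} U+{ord(sign):04X} {unicodedata.name(sign)}")
```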

Explicit Anuswara Specifier Representation in the Scheme
The scheme provides three different ways of specifying the anuswara: `n, `N and M, as opposed to just the M that many standard schemes employ. The reason for this variety is to prevent the explicit anuswara specification from hiding the underlying phoneme, so that the input text is still a phonetically fair (in English) representation of the underlying word. For example, sa`ngIta is better than saMgIta. The recommendations on which anuswara specifier to use in each language follow.
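
As an aside on the notation, a transliteration engine first has to recognise these specifiers in the input. The following is a minimal sketch of that step only; the function name find_explicit_anuswaras is hypothetical, and it assumes that a backtick and a capital M are not used for anything else in the input, which the full scheme may or may not guarantee.

```python
import re

# The three explicit anuswara specifiers named in the text: `n, `N and M.
# Assumption: backtick and capital M have no other meaning in the input.
EXPLICIT_ANUSWARA = re.compile(r"`n|`N|M")

def find_explicit_anuswaras(text):
    """Return (index, specifier) pairs for each explicit anuswara in text."""
    return [(m.start(), m.group()) for m in EXPLICIT_ANUSWARA.finditer(text)]

# Both spellings from the text mark an anuswara at the same position;
# only the specifier differs.
print(find_explicit_anuswaras("sa`ngIta"))
print(find_explicit_anuswaras("saMgIta"))
```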
Using explicit Anuswara Specifier in Kannada/Telugu
In Kannada and Telugu, the anuswara in the script follows these rules:
  1. Anuswara is always used instead of m at the end of words.
  2. Anuswara is always used instead of #n when preceding k/kh/g/gh.
  3. Anuswara is always used instead of ~n when preceding c/ch/j/jh.
  4. Anuswara is always used instead of n when preceding t/th/d/dh.
  5. Anuswara is always used instead of N when preceding T/Th/D/Dh.
  6. Anuswara is always used instead of m when preceding p/ph/b/bh.
  7. Anuswara is sometimes used (depending on the word) instead of m when preceding other consonants such as y, r etc. (for example, saMyukta)
Since the anuswara is always implied for the first 6 rules, it is strongly recommended that you not use the explicit anuswara specifier in these contexts. Note that this does imply that certain Sanskrit-based words, when translated to Sanskrit, may not appear with the anuswara where they should. For example, it is sa#nga but sa`ngIta in Sanskrit, whereas both would be written with anuswara in Kannada and Telugu. A careful user can explicitly specify the anuswara if a better translation to Sanskrit is desired.

For cases covered by the 7th rule, where the anuswara is not implied, the explicit anuswara specifier needs to be given wherever applicable.
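
The six "always" rules are mechanical enough to sketch in code. The following is only an illustration of how an engine might find the implied anuswaras in a single word for Kannada/Telugu output, not the actual engine: the names NASAL_CONTEXTS and implied_anuswaras are hypothetical, and the words other than sa#nga and saMyukta are illustrative inputs rather than examples from the scheme.

```python
import re

# Nasal -> consonants before which the Kannada/Telugu script writes it as
# anuswara (rules 2-6). Matching is by prefix and case-sensitive, so "k"
# also covers "kh" while "t" does not match "T".
NASAL_CONTEXTS = {
    "#n": ("k", "g"),
    "~n": ("c", "j"),
    "n": ("t", "d"),
    "N": ("T", "D"),
    "m": ("p", "b"),
}

# Explicit specifiers (`n, `N) are matched first so that their "n"/"N"
# is not mistaken for a plain nasal.
NASAL = re.compile(r"`[nN]|#n|~n|[nNm]")

def implied_anuswaras(word):
    """Return (index, nasal) pairs that rules 1-6 turn into anuswara."""
    hits = []
    for match in NASAL.finditer(word):
        nasal = match.group()
        if nasal.startswith("`"):
            continue  # already explicit; nothing to infer
        rest = word[match.end():]
        if nasal == "m" and rest == "":
            hits.append((match.start(), nasal))  # rule 1: word-final m
        elif rest.startswith(NASAL_CONTEXTS[nasal]):
            hits.append((match.start(), nasal))  # rules 2-6
    return hits

for word in ["sa#nga", "pa~nca", "kaNTha", "rAmam", "saMyukta"]:
    print(word, implied_anuswaras(word))
```

Note that saMyukta yields no implied anuswara here: it falls under rule 7, so the sketch relies on the explicit M in the input.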

Using explicit Anuswara Specifier in Sanskrit
In Sanskrit, there are no contexts in which the anuswara always figures. Instead, it depends on the word. Anuswara at the end of words also follows different rules according to different conventions.

In Sanskrit, it is up to the user to explicitly specify the anuswara where it is applicable.

Using explicit Anuswara Specifier in Malayalam
Malayalam is similar to Sanskrit, except that it does not use the anuswara at the end of words, and it uses the anuswara in the middle of words in fewer contexts.

In Malayalam, like Sanskrit, it is up to the user to explicitly specify the anuswara where it is applicable.

Qualifiers

The Unified Transliteration Scheme for Carnatic Music uses qualifiers in the form of superscript numbers for the following purposes:
  1. Represent sounds that are not part of the language
  2. Resolve any ambiguities in the letters of the script (as in the case of Tamil)
For languages other than Tamil, there is only one case where qualifiers are used. That is the Kannada and Telugu representation of the za/zgha, which occurs in Tamil and Malayalam. For Tamil, qualifiers are needed in many more contexts because both case #1 and case #2 arise when representing lyrics from other languages. For information on when qualifiers apply, please refer to the scheme.

