Scripts + Languages

Internationalised Domain Names (IDNs) are second-level or third-level domain names or web addresses registered in any character set or script defined in Unicode.


Understanding how Verisign IDNs support domain name registration in hundreds of native languages with a single Shared Registration System (SRS) requires an understanding of how characters and scripts are used in written languages and represented in computing.

Relationship between Script, Character and Language

SCRIPT      Latin     Arabic    Han       Greek
CHARACTER   L         س         漢字      Ω
LANGUAGE    English   Farsi     Chinese   Greek

SCRIPT

A script is a collection of symbols used to represent textual information in a language. Examples of scripts: Latin, Arabic, Han and Greek.

CHARACTER

A character is the basic building block of any script and thus of any written language. It conveys meaning at a fundamental level; you cannot break a character down any further and still have meaning.

WRITTEN LANGUAGE

A written language utilises characters from one or more scripts to communicate meaning. Examples of languages: English, Farsi, Chinese and Greek.
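The relationship between characters and scripts can be explored with a short Python sketch. It is only an illustration: the Python standard library does not expose the Unicode Script property directly, so the hypothetical helper `script_hint` uses the first word of each character's Unicode name as a rough hint. For Han characters that hint reads "CJK" rather than "Han". The Greek sample assumes the capital letter omega (U+03A9).

```python
import unicodedata


def script_hint(ch: str) -> str:
    """Return the first word of a character's Unicode name.

    This is a rough heuristic, not the official Unicode Script property.
    """
    return unicodedata.name(ch).split()[0]


# One sample character per language, written as escapes to avoid ambiguity.
samples = {
    "English": "L",        # Latin capital letter L
    "Farsi": "\u0633",     # Arabic letter seen
    "Chinese": "\u6f22",   # Han ideograph (first character of the word for "Han characters")
    "Greek": "\u03a9",     # Greek capital letter omega (assumed)
}

for language, ch in samples.items():
    print(f"{language}: U+{ord(ch):04X} {script_hint(ch)}")
```

Note that the mapping runs one way only: a character belongs to a script, but a script such as Latin or Arabic serves many languages, which is why a language cannot be inferred reliably from the characters alone.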

ADAPTING LANGUAGE TO COMPUTERS

Different scripts use different keyboards or soft keyboards for input into computing devices. Computer operating systems have Input Method Editors (IMEs) that facilitate the input of different scripts. IDNs are a similar type of adaptation, allowing people to use their local language script to navigate the web, send and receive e-mail, transfer files and use other applications that require domain names.

UNICODE

A computer represents characters by encoding them as numbers. Each character within a coded character set is assigned a unique number. For example, in the ASCII coded character set, the uppercase "A" is assigned the number 65. Most domain names are registered in ASCII characters (A to Z, 0 to 9 and the hyphen “-”). However, words in languages that require diacritics, such as Spanish and French, and languages written in non-Latin scripts, such as Han (Kanji) and Arabic, cannot be rendered in ASCII. Unicode is a universal coded character set, which covers as many as 350 different native languages. For this reason, IDNs use Unicode.
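As a minimal Python sketch of the idea that each character maps to a number, the snippet below lists the code points of a label and checks whether it stays within the ASCII letters, digits and hyphen. The function names `code_points` and `is_ascii_label` are illustrative only and are not part of any Verisign interface.

```python
import string

# Characters usable in a traditional ASCII domain label.
ASCII_LABEL_CHARS = set(string.ascii_letters + string.digits + "-")


def code_points(label: str) -> list[str]:
    """List the Unicode code point of each character in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in label]


def is_ascii_label(label: str) -> bool:
    """Return True if every character is an ASCII letter, digit or hyphen."""
    return all(ch in ASCII_LABEL_CHARS for ch in label)


print(ord("A"))                      # the number assigned to uppercase "A"
print(code_points("caf\u00e9"))      # "e" with an acute accent lies outside ASCII
print(is_ascii_label("example"))
print(is_ascii_label("caf\u00e9"))
```

A label that fails the ASCII check is not invalid; it simply needs a coded character set that goes beyond ASCII, which is the role Unicode plays for IDNs.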

LANGUAGE TAGS

The Verisign IDN infrastructure complies with ICANN Registry Implementation Committee (RIC) guidelines and requires that each IDN be associated with a specific language using a “language tag”. The registrant selects the IDN language tag during the registration process. If an IDN combines more than one language, the registrant must select the most appropriate language. (Not all language tags are referenced today; however, capturing the information during the registration process allows the adoption of language tables in the future.)

Download the PDF list of valid Verisign language tags

LANGUAGE TABLES

When an IDN registration is requested, the language tag is checked against a list of languages that have character inclusion tables or character variant mapping tables. These tables are applied to the Unicode code points that make up a registration to determine whether the registration is valid for a specific language. If a registration fails for one language, the same string of characters may still be available with a different language tag.
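The inclusion-table check described above can be sketched in Python as follows. Everything here is an assumption made for illustration: the tags `ENG` and `GRE`, the contents of `INCLUSION_TABLES`, and the function names are hypothetical and do not reproduce Verisign's actual tables or registration system. The sketch also assumes that a tag without a table imposes no table-based restriction, and it does not model variant mapping tables.

```python
# Hypothetical inclusion tables: language tag -> set of allowed code points.
DIGITS_AND_HYPHEN = set(range(0x30, 0x3A)) | {0x2D}

INCLUSION_TABLES = {
    # Lowercase Latin letters a-z plus digits and hyphen (illustrative only).
    "ENG": set(range(0x61, 0x7B)) | DIGITS_AND_HYPHEN,
    # Lowercase Greek letters alpha to omega plus digits and hyphen (illustrative only).
    "GRE": set(range(0x03B1, 0x03CA)) | DIGITS_AND_HYPHEN,
}


def invalid_code_points(label: str, tag: str) -> list[int]:
    """Return the code points in label that the table for tag does not include."""
    table = INCLUSION_TABLES.get(tag)
    if table is None:
        # Assumption: no table exists for this tag, so nothing is rejected here.
        return []
    return [cp for cp in map(ord, label) if cp not in table]


def is_valid(label: str, tag: str) -> bool:
    """Return True if every code point in label passes the table for tag."""
    return not invalid_code_points(label, tag)


label = "λογος"
print(is_valid(label, "GRE"))   # checked against the Greek table
print(is_valid(label, "ENG"))   # same characters, different language tag
```

The two calls show the behaviour the text describes: the same characters can pass under one language tag and fail under another, because validity depends on the table selected by the tag rather than on the characters alone.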
