CHARACTER VARIANTS

Verisign has worked to address the issue of character variants with interested stakeholders. Registrants normally register domain names that are meaningful in their own language, such as a name, word or phrase. However, a single script may be used by more than one language.


As a result, a domain name may have different meanings in the context of other languages or cultures. This variant phenomenon has been classified into four different categories: character, orthographic, lexemic and contextual variants. Verisign has determined that addressing character variants is essential to enable users to navigate the Internet in their own languages. The other variants require difficult linguistic judgements that are not essential to delivering a robust IDN solution.

CHINESE CHARACTER VARIANTS

Many languages have character variants that could cause end-user confusion. For example, the Chinese language has two written forms: Simplified Chinese, used primarily in Mainland China, and Traditional Chinese, used primarily in Taiwan, Hong Kong and other Southeast Asian countries. The two written forms share many characters; however, a simplified character in Simplified Chinese may have the same meaning as a more complex character in Traditional Chinese. These characters, called character variants, have the same meaning and pronunciation but do not look the same.
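
As a rough illustration of the paragraph above, the following Python sketch shows that a Traditional character and its Simplified variant are distinct Unicode code points, and how a variant mapping can treat two labels as equivalent. The two-entry VARIANTS table and the function names are hypothetical and exist only for this example; real variant tables are far larger and are not necessarily one-to-one, and this is not Verisign's actual table.

```python
# Hypothetical variant table for illustration only: each Traditional
# character maps to its Simplified variant.
VARIANTS = {
    "\u570b": "\u56fd",  # 國 (U+570B) -> 国 (U+56FD), "country"
    "\u9ad4": "\u4f53",  # 體 (U+9AD4) -> 体 (U+4F53), "body"
}


def fold_variants(label: str) -> str:
    """Replace every character that has a listed variant with that variant."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


def are_variants(a: str, b: str) -> bool:
    """Two labels are variants of each other if they fold to the same string."""
    return fold_variants(a) == fold_variants(b)


if __name__ == "__main__":
    traditional = "\u4e2d\u570b"  # 中國
    simplified = "\u4e2d\u56fd"   # 中国
    print(traditional == simplified)              # distinct code point sequences
    print(are_variants(traditional, simplified))  # equivalent under the table
```

Folding both labels to a single form before comparing them is only one possible design; it keeps the comparison symmetric and independent of which written form the registrant typed.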

A CHARACTER VARIANT SOLUTION

Various thought leaders in the technical community have suggested different approaches to addressing the character variant issue, each with both positive and negative aspects. However, the IDN community agrees that the character variant issue may never be fully addressed, because languages are always changing and new character variants will continue to be introduced. Verisign has adopted language tags that reference language tables to address the character variant issue.

Verisign has worked to address the issue of character variants with interested stakeholders, including the China Network Information Centre (CNNIC) (.cn), the Taiwan Network Information Centre (TWNIC) (.tw), the National Internet Development Agency of Korea (.kr), the Japan Registry Service (JPRS) (.jp), the Chinese Domain Name Consortium (CDNC) and the IDN Implementation Committee established by ICANN.

LANGUAGE TAGS

The Verisign IDN infrastructure complies with ICANN Registry Implementation Committee (RIC) guidelines and requires that each IDN be associated with a specific language using a "language tag". The registrant selects the IDN language tag during the registration process. If an IDN combines more than one language, the registrant must select the most appropriate language. (Not all language tags are referenced today; however, capturing the information during the registration process allows the adoption of language tables in the future.) Download the list of Verisign Valid Language Tags (PDF).

LANGUAGE TABLES

When an IDN registration is requested, the language tag is checked against a list of languages that have character inclusion tables or character variant mapping tables. These tables are applied to the Unicode code points that make up a registration to determine whether the registration is valid for a specific language. If a registration fails for one language, the character set may still be available with a different language tag.

Language Tables Deployed in the Verisign Character Variant Solution

LANGUAGE      CODE POINTS
Chinese
Japanese
Polish        Only the Latin characters
Greek         U+002D, U+0030 through U+0039, U+0370 through U+03FF
Russian       U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Belarusian    U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Ukrainian     U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Serbian       U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Macedonian    U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
Bulgarian     U+002D, U+0030 through U+0039, U+0400 through U+04FF, U+0500 through U+052F
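
The inclusion-table check described in this section can be sketched in Python as follows. The code point ranges for Greek and Russian are taken from the table above; the tag strings "GREEK" and "RUSSIAN", the function name, and the choice to accept any label whose tag has no table are assumptions made for this illustration and do not describe the actual Verisign implementation. The sketch covers only the language-table step, not IDNA2008 validation.

```python
# Inclusive (first, last) code point ranges per language tag.
# Ranges come from the table above; the tag names are hypothetical.
INCLUSION_TABLES = {
    "GREEK": [(0x002D, 0x002D), (0x0030, 0x0039), (0x0370, 0x03FF)],
    "RUSSIAN": [
        (0x002D, 0x002D),
        (0x0030, 0x0039),
        (0x0400, 0x04FF),
        (0x0500, 0x052F),
    ],
}


def is_valid_for_language(label: str, tag: str) -> bool:
    """Return True if every code point in label is allowed for the tag."""
    ranges = INCLUSION_TABLES.get(tag)
    if ranges is None:
        # Assumption: a tag with no table imposes no language-specific check.
        return True
    return all(
        any(first <= ord(ch) <= last for first, last in ranges)
        for ch in label
    )


if __name__ == "__main__":
    greek_label = "\u03b5\u03bb\u03bb\u03ac\u03b4\u03b1"  # ελλάδα
    print(is_valid_for_language(greek_label, "RUSSIAN"))  # rejected for this tag
    print(is_valid_for_language(greek_label, "GREEK"))    # accepted for this tag
```

The usage lines mirror the last point of the section: a label that fails under one language tag may still be valid under another.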

IDN CODE POINTS

The Verisign Shared Registration System (SRS) allows a registrant to register IDNs via a registrar in any script identified within Unicode 5.2 and passed through the IDNA2008 Protocol specification (RFC 5891). To allow for rare scripts, musical notations and other special characters, Verisign has specified permissible, restricted and prohibited code points in its Policy for IDN Code Points.
