Realize the Power of .com and .net in Local Languages
Verisign Internationalized Domain Names (IDNs) enable businesses to say .com and .net in local language characters. It’s a friendlier, more meaningful way to connect with customers.
In 1996, approximately two-thirds of all Internet users were in the United States, so English or Latin-based (also known as ASCII) characters served as the foundation for navigating the Web. Since that time, however, according to the 2012 State of the Global and Local Internet Comscore report, the non-English-speaking Internet population has grown to 87 percent of Internet users, with the Asia-Pacific region accounting for 41.1 percent of Internet users worldwide.
Originally, domain names supported only ASCII characters (A to Z, 0 to 9, and the hyphen “-”), which meant that non-English words requiring diacritics (e.g., accent mark, umlaut, breve, dots) and languages based on non-Latin characters (e.g., Hangul, Arabic, Thai, Simplified Chinese) could not be used to navigate the Internet.
With more and more Internet activity taking place outside of Western countries, the introduction of non-Latin based characters is a timely advance for registrars and their customers, bringing new market opportunities to both regional and global registrars seeking to expand their business.
In 2000, Verisign introduced Internationalized Domain Names (IDNs) at the second level (to the left of the dot) for .com and .net. This meant domain names such as 스타벅.com could be created, registered, and searched for, making the Internet more accessible and relevant to millions of users.
In 2012, Verisign applied to operate registries for nine transliterations of .com and three of .net (to the right of the dot) as part of ICANN’s new generic top-level domain (gTLD) program, which will allow Verisign to bring businesses full domain names in local language characters.
Verisign’s proposed approach for these new IDN gTLDs will help ensure a ubiquitous end-user experience and help protect consumers and businesses from having to register purely defensive domain names in our TLDs. In practice, Verisign’s proposed approach means that the registrant of a second-level domain name in our IDN.IDN, IDN.com or IDN.net will have the sole right (subject to applicable rights protection mechanisms), but will not be required, to register that identical second-level domain name in any of the top-level IDNs, .com or .net, as applicable.
In order to illustrate our approach, we have identified two use cases below:
Use Case No. 1: Bob Smith already has a registration for an IDN.net second-level domain name. That second-level domain name will be unavailable in all of the new .net TLDs except to Bob Smith. Bob Smith may choose not to register that second-level domain name in any of the new transliterations of the .net TLDs.
Use Case No. 2: John Doe does not have a registration for an IDN.com second-level domain name. John Doe registers a second-level domain name in our Thai transliteration of .com but in no other TLD. That second-level domain name will be unavailable in all other transliterations of .com IDN TLDs and in the .com registry unless and until John Doe (and only John Doe) registers it in another .com IDN TLD or in the .com registry.
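The two use cases can be modeled with a small sketch. This is an illustration only, not Verisign's registry logic: the class name `TldFamily`, the placeholder TLD names ("com", "com-thai", "com-hebrew") and the registrant identifiers are all hypothetical, and rights protection mechanisms are ignored.

```python
from typing import Dict, Set


class TldFamily:
    """Hypothetical model of one TLD family, e.g. .com plus its transliterations."""

    def __init__(self, tlds: Set[str]) -> None:
        self.tlds = set(tlds)
        # label -> registrant who holds the sole right across the family
        self.holder: Dict[str, str] = {}
        # label -> TLDs in which the label is actually registered
        self.registered: Dict[str, Set[str]] = {}

    def can_register(self, label: str, tld: str, registrant: str) -> bool:
        if tld not in self.tlds:
            return False
        if tld in self.registered.get(label, set()):
            return False  # already taken in this specific TLD
        current = self.holder.get(label)
        # Free label, or the same registrant extending to another TLD.
        return current is None or current == registrant

    def register(self, label: str, tld: str, registrant: str) -> bool:
        if not self.can_register(label, tld, registrant):
            return False
        self.holder.setdefault(label, registrant)
        self.registered.setdefault(label, set()).add(tld)
        return True


# Use Case No. 2, with placeholder names.
com_family = TldFamily({"com", "com-thai", "com-hebrew"})
print(com_family.register("idn-label", "com-thai", "john-doe"))   # True
print(com_family.register("idn-label", "com", "someone-else"))    # False
print(com_family.register("idn-label", "com", "john-doe"))        # True
```

The registrant is never forced to register in the other TLDs; the label is simply unavailable to everyone else.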
Verisign applied for nine transliterations for .com and three for .net as part of the new generic top-level domain (gTLD) program to bring businesses full .com and .net domain names in local language characters.
A registrant requests an IDN from a registrar that supports IDNs. The registrar converts the local-language characters into a sequence of supported letters using an ASCII-compatible encoding (ACE). The registrar submits the ACE string to the Verisign® Shared Registration System (SRS) where it is validated. The IDN is added to the .com and .net TLD zone files and propagated across the Internet.
When a user enters an IDN using native scripts into a Web browser or follows a link, IDN-enabled applications encode the characters into an ACE string that the DNS understands. The DNS processes the request and returns the information to the application. Although the process sounds simple, supporting different languages and scripts in IDN-enabled applications and the DNS has required significant research and development.
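As a rough sketch of the conversion step described above, Python's built-in "idna" codec turns a Unicode domain name into an ACE string and back. Note the assumptions: this codec implements the older IDNA2003 rules (RFC 3490), it is not the software a registrar or the SRS runs, and the helper names `to_ace` and `to_unicode` are invented for this example.

```python
def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ACE form, label by label."""
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """Convert an ACE domain name back to its Unicode form."""
    return ace_domain.encode("ascii").decode("idna")


ace = to_ace("b\u00fccher.com")      # "bücher.com"
print(ace)                           # xn--bcher-kva.com
print(to_unicode(ace))               # bücher.com

# Plain ASCII names pass through unchanged.
print(to_ace("example.net"))         # example.net
```

Only the label containing non-ASCII characters receives the ACE prefix; the ".com" label is left as it is.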
The Internet Engineering Task Force (IETF) led the effort to create standards for using non-ASCII characters in the Domain Name System (DNS).
The DNS only recognizes ASCII characters A-Z, 0-9 and '-'. This limits the number of characters that can be utilized to build domain names to 37 of the more than 96,000 characters identified within Unicode. To create domain names from the range of Unicode characters, a character-encoding scheme that uniquely maps Unicode code points to an ASCII representation must be used and standardized.
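The 37-character limit can be checked with a few lines of Python. The helper name `is_ldh_label` is hypothetical, and the length limit (63) and the ban on leading or trailing hyphens come from general DNS hostname conventions rather than from this page.

```python
import string

# Letters (case-insensitive), digits and the hyphen: 26 + 10 + 1 = 37.
LDH_CHARS = set(string.ascii_lowercase + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only letters, digits and hyphens."""
    if not label.isascii():
        return False  # reject before lowercasing to avoid Unicode case quirks
    lowered = label.lower()
    return (
        0 < len(lowered) <= 63
        and all(ch in LDH_CHARS for ch in lowered)
        and not lowered.startswith("-")
        and not lowered.endswith("-")
    )


print(len(LDH_CHARS))                  # 37
print(is_ldh_label("Example-1"))       # True
print(is_ldh_label("b\u00fccher"))     # False: "ü" is outside ASCII
print(is_ldh_label("xn--bcher-kva"))   # True: the ACE form is plain LDH
```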
The IETF published these standards related to Internationalized Domain Names (IDN): Encoding Schemes, Framework, Protocol, Unicode and Right-to-Left Scripts.
The encoding scheme for IDNs uses Punycode, an ASCII Compatible Encoding (ACE) that encodes local language characters into ASCII characters such that DNS can accurately answer a request for an address record. To select Punycode as the ACE standard, IETF considered the balance between compression and implementation. Punycode allows the greatest number of characters (code points) to be represented and is not difficult to deploy.
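To see what Punycode itself does to a single label, the sketch below uses Python's built-in "punycode" codec and adds the ACE prefix by hand. It deliberately skips the normalization and validation steps that a full IDNA implementation performs, and the function names are made up for this example.

```python
ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Punycode-encode one label and add the ACE prefix (no normalization)."""
    if label.isascii():
        return label  # nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def ace_to_label(ace: str) -> str:
    """Reverse of label_to_ace for a single label."""
    if ace.lower().startswith(ACE_PREFIX):
        return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return ace


# The ASCII letters are copied through; the position and value of "ü"
# are packed into the short suffix after the last hyphen.
print("b\u00fccher".encode("punycode"))   # b'bcher-kva'
print(label_to_ace("m\u00fcnchen"))       # xn--mnchen-3ya
print(ace_to_label("xn--mnchen-3ya"))     # münchen
```

The compact suffix is the compression the paragraph refers to: one non-ASCII character costs only a few ASCII characters in the encoded label.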
This RFC is one of a collection that, together, describe the protocol and usage context for a revision of Internationalized Domain Names for Applications (IDNA) that was largely completed in 2008, known within the series and elsewhere as "IDNA2008." The series replaces an earlier version of IDNA [RFC 3490] [RFC 3491]. For convenience, that version of IDNA is referred to as "IDNA2003." The newer version continues to use the Punycode algorithm [RFC 3492] and the ACE (ASCII-Compatible Encoding) prefix from the earlier version.
This RFC describes the core IDNA2008 protocol and its operations. In combination with the "bi-directional" (Bidi) document described below, it explicitly updates and replaces [RFC 3490].
This RFC specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an IDN. It is part of the specification of IDNA2008.
The use of right-to-left scripts in Internationalized Domain Names (IDNs) has presented several challenges. This RFC provides new Bidi rules for Internationalized Domain Names for Applications (IDNA) labels, based on the problems encountered with some scripts and some shortcomings in the 2003 IDNA Bidi criterion.
This RFC provides the background, explanation and rationale for the need of new RFCs to tackle issues that have risen out of the previous version(s) of IDNA. The need to update the version of Unicode supported in IDNs is also discussed in this RFC.
These standards have been published and are now available:
Verisign is committed to following the IETF standards and supporting rapid deployment of this new technology.
Internationalized Domain Names (IDNs) are second- or third-level domain names or Web addresses registered in any character set or script defined in Unicode.
Understanding how Verisign IDNs support domain name registration in hundreds of native languages with a single Shared Registration System (SRS) requires an understanding of how characters and script are used in written language and translated for computing.
A script is a collection of symbols used to represent textual information in a language. Examples of scripts: Latin, Arabic, Han, Greek.
A character is the basic building block of any script, and thus any written language. It invokes a meaning at a fundamental level; you cannot break a character down any further and still have meaning.
A written language utilizes characters from one or more scripts to communicate meaning. Examples of languages: English, Farsi, Chinese, Greek.
Different scripts use different keyboards or soft keyboards for input into computing devices. Computer operating systems have Input Method Editors (IMEs) that facilitate the input of different scripts. IDNs are a similar type of adaptation, allowing people to use their local-language script to navigate the Web, send and receive email, transfer files and use other applications that require domain names.
A computer uses encoding of characters to understand them. Each character within a character set is assigned a unique number. For example, in the ASCII-coded character set, the uppercase "A" is assigned the number 65. Most domain names are registered in ASCII characters (A to Z, 0 to 9 and the hyphen “-”). However, words that require diacritics, as in Spanish and French, and languages that use non-Latin scripts, such as Kanji and Arabic, cannot be rendered in ASCII. Unicode is a universal coded character set, which covers as many as 350 different native languages. For this reason, IDNs use Unicode.
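The character-to-number mapping is easy to inspect in Python. The helper `describe` below is a hypothetical name; it relies only on the standard `ord` function and the `unicodedata` module.

```python
import unicodedata
from typing import List, Tuple


def describe(text: str) -> List[Tuple[str, str, str]]:
    """List each character with its Unicode code point and official name."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


print(ord("A"))                 # 65, the same number in ASCII and Unicode
print(describe("\u00e9"))       # [('é', 'U+00E9', 'LATIN SMALL LETTER E WITH ACUTE')]
print("\u00e9".isascii())       # False: "é" has no ASCII code
```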
The Verisign IDN infrastructure complies with ICANN Registry Implementation Committee (RIC) guidelines and requires that each IDN be associated with a specific language using a “language tag.” The registrant selects the IDN language tag during the registration process. If an IDN combines more than one language, the registrant must select the most appropriate language. Not all language tags are referenced today; however, capturing the information during the registration process allows the adoption of language tables in the future. A PDF list of valid Verisign language tags is available for download.
When an IDN registration is requested, the language tag is checked against a list of languages that have character inclusion tables or character-variant mapping tables. These tables are applied to the Unicode code points that make up a registration to determine whether the registration is valid for a specific language. If a registration fails for one language, the character set may still be available with a different language tag.
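A minimal sketch of an inclusion-table check might look like the following. The tag names ("GER", "GRE"), the truncated tables and the function name are illustrative assumptions, not Verisign's published tables, and the rule that a tag without a table skips the check is inferred from the text above.

```python
from typing import Dict, Set

ASCII_LDH: Set[int] = (
    set(range(ord("a"), ord("z") + 1))
    | set(range(ord("0"), ord("9") + 1))
    | {ord("-")}
)

# Hypothetical, heavily truncated inclusion tables keyed by language tag.
INCLUSION_TABLES: Dict[str, Set[int]] = {
    "GER": ASCII_LDH | {0x00E4, 0x00F6, 0x00FC, 0x00DF},    # ä ö ü ß
    "GRE": set(range(0x03B1, 0x03CA)) | {ord("-")}
    | set(range(ord("0"), ord("9") + 1)),                   # α..ω, digits, hyphen
}


def is_valid_for_language(label: str, tag: str) -> bool:
    """Check every code point of the label against the tag's inclusion table."""
    table = INCLUSION_TABLES.get(tag)
    if table is None:
        return True  # assumption: tags without a table are not checked
    return all(ord(ch) in table for ch in label)


print(is_valid_for_language("b\u00fccher", "GER"))   # True
print(is_valid_for_language("b\u00fccher", "GRE"))   # False: fails for this tag
print(is_valid_for_language("\u03b1\u03b2\u03b3", "GRE"))  # True
```

The second call shows the behavior described above: a label rejected under one language tag may still be valid under another.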
Verisign has worked to address the issue of character variants with interested stakeholders. Registrants typically register domain names that have meaning in their own language such as a name, word or phrase. However, a single script may be used by more than one language.
As a result, a domain name may have different meanings in the context of other languages or cultures. The variant phenomenon has been classified into four different categories: character, orthographic, lexemic and contextual variants. Verisign has determined that addressing character variants is essential to enable users to navigate the Internet in their own languages. The other variants require difficult linguistic judgments that are not essential to delivering a robust IDN solution.
Many languages may have character variants that could potentially cause end-user confusion. For example, the Chinese language has two written forms: Simplified Chinese, used primarily in Mainland China, and Traditional Chinese, used primarily in Taiwan, Hong Kong and other Southeast Asian countries. The two written forms share many characters; however, simplified characters in Simplified Chinese may have the same meaning as complex characters in Traditional Chinese. These characters, called character variants, have the same meaning and pronunciation, but they do not look the same.
Different thought leaders in the technical community have suggested different approaches to address the character variant issue. Each approach has both positive and negative aspects. However, the IDN community is in agreement that the character variant issue may never fully be addressed because languages are always in a state of change, and new character variants will continue to be introduced. Verisign has adopted language tags that reference language tables to address the character variant issue.
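One common way to reason about character variants is to map each variant to a single preferred code point and compare the results. The sketch below assumes a tiny hand-written variant table with two Simplified/Traditional Chinese pairs; it is not Verisign's variant-mapping table or its actual registration logic, and the function names are hypothetical.

```python
from typing import Dict

# Hypothetical variant table: Traditional form -> Simplified form.
VARIANT_TABLE: Dict[str, str] = {
    "\u570b": "\u56fd",  # 國 -> 国
    "\u5b78": "\u5b66",  # 學 -> 学
}


def canonical_form(label: str) -> str:
    """Replace every known variant with its preferred code point."""
    return "".join(VARIANT_TABLE.get(ch, ch) for ch in label)


def are_variants(first: str, second: str) -> bool:
    """Two labels are variants if they share the same canonical form."""
    return canonical_form(first) == canonical_form(second)


simplified = "\u4e2d\u56fd"    # 中国
traditional = "\u4e2d\u570b"   # 中國
print(simplified == traditional)              # False: different code points
print(are_variants(simplified, traditional))  # True: same meaning, variant forms
```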
Verisign has worked to address the issue of character variants with interested stakeholders, including China Network Information Center (CNNIC) (.cn), Taiwan Network Information Center (TWNIC) (.tw), National Internet Development Agency of Korea (.kr), Japan Registry Service (JPRS) (.jp), the Chinese Domain Name Consortium (CDNC) and the IDN Implementation Committee established by ICANN.
Verisign has developed a policy for IDN registrations specifying permissible and prohibited code points.
The Verisign Shared Registration System (SRS) allows the creation of Internationalized Domain Names (IDNs) that contain Unicode supported non-ASCII scripts.
Understand the five validation rules through which the policy is implemented.
After validating an IDN, Verisign executes some further logic based on the Language Tag of the registration.