What are Internationalized Domain Names?

Internationalized Domain Names (IDNs) are domain names written in non-Latin scripts such as Cyrillic, Arabic, Chinese, or Japanese. IDNs have not always been a part of the Internet. Traditionally, domain names were restricted to a subset of ASCII: the Latin letters A to Z, the digits 0 to 9, and the hyphen. This original design excluded languages written in other scripts, so the native languages of much of Asia, Eastern Europe, and the Middle East could not be represented in the DNS. As the Internet expanded internationally, this constraint became an evident barrier to online participation. In response, the Internet’s technical community began to develop standards to incorporate non-Latin scripts in the DNS.

The goal of the IDN standards is to maintain the universal consistency of the Domain Name System while allowing individuals to access the Internet using their own languages and scripts. Universality is important in the DNS because it ensures that people can find who or what they are looking for online. If the DNS were not universally consistent, there would be no guarantee that accessing www.ICANN.org would take a user to ICANN’s website. IDN standards therefore work to preserve this consistency while allowing users to access content, develop content, and send emails in their own native scripts.

Put simply, IDN standards work by encoding international scripts in the restricted ASCII character set, using an encoding known as Punycode. Every label that contains non-ASCII characters is converted into a string of ASCII characters prefixed with xn-- (see example below). Although the xn-- text is meaningless to human readers, software uses it to convert the label back into a meaningful human-readable name.

 

Machine Readable Domain: xn--80abnh6an9b.xn--p1ai

International Human Readable Domain: образец.рф
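
The conversion in the example above can be reproduced in a few lines of Python. This is a minimal sketch rather than a production implementation: it relies on Python's built-in idna codec, which follows the older IDNA 2003 rules, and the helper names to_ascii and to_unicode are chosen here purely for illustration.

```python
# Sketch only: uses Python's standard "idna" codec (IDNA 2003 rules).
# The helper names below are illustrative, not part of any standard API.

def to_ascii(domain: str) -> str:
    """Convert a human-readable domain to its machine-readable xn-- form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert a machine-readable xn-- domain back to its human-readable form."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("образец.рф"))                    # xn--80abnh6an9b.xn--p1ai
print(to_unicode("xn--80abnh6an9b.xn--p1ai"))    # образец.рф
```

Each label (the parts separated by dots) is converted independently, which is why both the second-level label and the top-level label carry their own xn-- prefix.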

IDNs: From Past to Present

The implementation of IDNs began in 2000. When they were first introduced, IDNs only allowed non-ASCII characters at the second level (for example, παράδειγμα.eu, where the “.eu” remains in Latin script). By the mid-2000s, some countries had begun experimenting with multilingual top-level domains. China, for example, developed Chinese-character generic top-level domains such as .china (中国) and .publicinterest (公益). However, because these IDNs were not officially part of the DNS hierarchy, not everyone could resolve the addresses without installing additional software. This development put pressure on ICANN to internationalize generic TLDs in addition to the hybrid IDNs, so that the universal consistency and interoperability of the Internet could be maintained.

In 2009, the ICANN Board approved a fast track process for IDN country code TLDs. By 2011, 17 IDN country code TLDs had been launched. Since then, the number has increased steadily, with additions including .한국 (Republic of Korea), .قطر (Qatar), .الجزائر (Algeria), .香港 (Hong Kong), .қаз (Kazakhstan), and .срб (Serbia).

In 2013, as part of the new gTLD program, ICANN signed its first contracts for generic IDNs, including .شبكة (.web), .游戏 (.games), .сайт (.site), and .онлайн (.online). By the end of 2015, more than 400 new generic TLDs were offering internationalized names, including nearly 80 new IDN gTLDs.

Current Policy Challenges 

Language Diversity and Universal Acceptance

There is little doubt that the introduction of IDNs has helped diversify the languages of Internet content. According to the 2016 IDN World Report, where IDNs are in use, the language of web content is more diverse than with traditional ASCII domains (see graphic below). However, there remains a substantial gap between the diversity of languages spoken offline and those represented on the Internet. For example, English is the native language of only 5% of the world’s population, yet it remains the language of more than half of the Internet’s content.

 

Part of the reason for this continuing lack of language diversity is that IDNs still do not have universal acceptance. The aim of universal acceptance is to make sure that all relevant actors have adopted standards ensuring that IDNs are usable anywhere, and in any context, that an ASCII domain name would be used. Universal acceptance goes beyond the infrastructure of the DNS. For example, Internet browsers each have their own ways of displaying and converting IDNs, and challenges remain for email clients, mobile applications, and other software that have traditionally assumed ASCII characters. As innovation continues in areas such as the Internet of Things, universal acceptance will become a bigger challenge as more “things” use the DNS to communicate with one another.
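
As an illustration of this application-level gap, the sketch below contrasts a hypothetical legacy validator that accepts only ASCII letters, digits, and hyphens with one that first converts the name to its xn-- form. The function names and the regular expression are assumptions made for this example and are not taken from any particular product; the conversion uses Python's built-in idna codec, which follows the older IDNA 2003 rules.

```python
import re

# Hypothetical legacy rule: each label is ASCII letters, digits, and hyphens,
# 1 to 63 characters long, and does not start or end with a hyphen.
ASCII_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def legacy_is_valid(domain: str) -> bool:
    """ASCII-only check of the kind found in older software."""
    return all(ASCII_LABEL.match(label) for label in domain.split("."))


def idn_aware_is_valid(domain: str) -> bool:
    """Convert to the xn-- form first, then apply the same ASCII rule."""
    try:
        ascii_form = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised by the codec for empty, overlong, or prohibited labels.
        return False
    return legacy_is_valid(ascii_form)


print(legacy_is_valid("образец.рф"))      # False: rejected by the legacy check
print(idn_aware_is_valid("образец.рф"))   # True: accepted after conversion
```

Validation is only one step: a universally accepting application would also need to store, process, and display such names correctly.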

What is ICANN doing? 

There are several groups in ICANN working across various issues of IDN standardization, training and promotion, and universal acceptance. Below is a summary of a few of these groups’ activities.  

Universal Acceptance Steering Group (UASG)

The Universal Acceptance Steering Group was established by ICANN in 2015 to help promote the universal acceptance of all domain names and email addresses. The UASG defines universal acceptance as “the state where all valid domain names and email addresses are accepted, validated, stored, processed and displayed correctly and consistently by all Internet enabled applications, devices and systems”. The UASG considers itself more of an advocacy group that will work over a ten-year period to coordinate outreach, best practices and knowledge repositories surrounding all top-level domains. A large part of the UASG’s work is focused on internationalized domain names and email addresses. In particular, the UASG will work to improve business cooperation with browser companies, search engines, IDNs and domestic marketing communications.

Task Force on Arabic IDNs (TF-AIDN)

In 2013, the Middle East Strategy Working Group formed the Task Force on Arabic Script Internationalized Domain Names (TF-AIDN), which focuses on technical issues and solutions relevant to the deployment of Arabic IDNs. There are a number of technical challenges related to the registration of Arabic IDNs. For example, Tashkeel and Shadda are diacritical marks placed above or below Arabic letters to indicate proper pronunciation. They distinguish words that share the same base characters but differ in meaning. However, neither Tashkeel nor Shadda is permitted in the zone files when registering domain names (RFC 5564). The TF-AIDN works to create standards and frameworks to deal effectively with challenges such as this, as well as training material for Arabic IDN user groups.
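
To make the Tashkeel issue concrete, the sketch below shows how a hypothetical pre-registration check might detect these marks in a label. It assumes that the Unicode range U+064B to U+0652 (the short-vowel marks, Shadda, and Sukun) is a reasonable approximation of the marks in question; the function names are illustrative, and this is not the TF-AIDN's or any registry's actual procedure.

```python
import unicodedata

# Assumption for this sketch: treat U+064B..U+0652 as the Tashkeel marks,
# which includes Shadda (U+0651).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def find_tashkeel(label: str) -> list[str]:
    """Return the Unicode names of any Tashkeel marks found in the label."""
    return [unicodedata.name(ch) for ch in label if ch in TASHKEEL]


def strip_tashkeel(label: str) -> str:
    """Return the label with Tashkeel marks removed (base letters only)."""
    return "".join(ch for ch in label if ch not in TASHKEEL)


# An Arabic word written with its diacritics, spelled out as escapes so the
# combining marks are visible in the source.
marked = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"

print(find_tashkeel(marked))
print(strip_tashkeel(marked) == "\u0645\u062D\u0645\u062F")
```

Because two words with different meanings can collapse to the same base letters once the marks are removed, a check like this only flags the problem; it does not resolve the resulting ambiguity.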

Label Generation Rules Panels (LGR)

The DNS root zone file is the master file that maintains a list of all of the generic and country code top-level domains. This master list is important for maintaining the universal consistency of all domain names and their associated IP addresses. However, there are specific rules for how information is stored in this file. Traditionally, only ASCII characters could be used in the root zone, but the introduction of IDN labels has led to the development of new rules for storing information in the root. Label Generation Rules (LGR) Panels each develop the rules for storing labels in a specific script in the root zone file. There are currently 15 LGR panels, working on the following scripts: Arabic, Armenian, Chinese, Cyrillic, Ethiopic, Georgian, Greek, Japanese, Khmer, Korean, Lao, Latin, Myanmar, Neo-Brahmi, and Thai. Each of these panels is at a different phase in developing rules for the root zone.
