This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.
LGR Version | 1.3 |
---|---|
Date | 2017-04-26 |
Language | ara-Arab |
Scope | domain: tld |
Unicode Version | 6.3.0 |
Label Generation Rules for Arabic Language

[Overview] This document specifies a set of Label Generation Rules for the Arabic language using a limited repertoire as appropriate for a second level domain. This is a DRAFT document released for public comments and not final. Please see the announcement on the ICANN website for public comments on Second Level Reference LGRs for details on how to submit comments. Note: it is anticipated that any code points that were excluded but listed for review will be removed in the finalized version of this file unless their status is changed as a result of public comments.

[Repertoire] All references converge on 36 Arabic letters and 10 Arabic-Indic digits (in addition to the European digits and U+002D (-) HYPHEN-MINUS). The repertoire, analysis and documentation provided in "RFC5564: Linguistic Guidelines for the Use of the Arabic Language in Internet Domains" [RFC5564] and the "Proposal for Arabic script Root Zone LGR" [ARABIC-PROPOSAL] were consulted in making this determination. Note: the cited proposal is being adopted for the Arabic script portion of the Root Zone LGR, a process that is expected to be finalized in time for the completion of the Second Level Reference LGR project.

[Correlation between repertoire and Bidi rules] Because this LGR contains only a subset of PVALID code points, the Bidi rules expressed in section 2 of RFC5893 [121] (the Bidi Rule) can be simplified to two constraints:
• Prevent digits (European, Arabic-Indic, or Extended-Arabic-Indic) from starting a label.
• Because three digit sets are in use (European, Arabic-Indic, and Extended-Arabic-Indic digits), do not allow a mixture of these sets within a label.

There is an IDN table for the Arabic language published in the IANA Repository of IDN Practices by .السعودية (xn--mgberp4a5d4ar, the Saudi Arabia IDN ccTLD) [700]. It is complemented by a guidelines document [IDN-GUIDE]. These documents refer to the Arabic language as used in Saudi Arabia.
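The two constraints from the Bidi rules discussion above can be checked with a few lines of code. The following is a minimal Python sketch, not part of the LGR: the names `DIGIT_SETS`, `digit_set` and `meets_digit_constraints` are hypothetical, and the three digit sets are assumed to be the contiguous ranges U+0030..U+0039, U+0660..U+0669 and U+06F0..U+06F9 (the last set is not in this repertoire but can appear in variant labels).

```python
from typing import Optional

# The three digit sets named in this LGR, as contiguous code point ranges.
DIGIT_SETS = {
    "european": range(0x0030, 0x003A),               # 0-9
    "arabic-indic": range(0x0660, 0x066A),           # U+0660..U+0669
    "extended-arabic-indic": range(0x06F0, 0x06FA),  # U+06F0..U+06F9
}


def digit_set(ch: str) -> Optional[str]:
    """Return the name of the digit set containing ch, or None for a non-digit."""
    cp = ord(ch)
    for name, block in DIGIT_SETS.items():
        if cp in block:
            return name
    return None


def meets_digit_constraints(label: str) -> bool:
    """True if the label neither starts with a digit nor mixes digit sets."""
    if not label:
        return False  # an empty label is never acceptable
    if digit_set(label[0]) is not None:
        return False  # constraint 1: no leading digit
    sets_used = {digit_set(ch) for ch in label} - {None}
    return len(sets_used) <= 1  # constraint 2: at most one digit set
```

For example, "مكة2" passes, "2مكة" fails the first constraint, and "مكة2٣" fails the second because it combines a European and an Arabic-Indic digit.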
[Variants] Variants are based on the variants listed in the proposed Arabic script LGR for the root zone [ARABIC-PROPOSAL]; they are limited to the variants contained in the sub-repertoire used in this document. See [112] for the full definition of Arabic variants and [RFC6365] for the case of U+0629 (ة) ARABIC LETTER TEH MARBUTA and U+0647 (ه) ARABIC LETTER HEH. See also the comments given in the listing. The variant tables were generated using the following concepts:

a) When building a variant table, a full study is conducted for each code point in the Arabic language repertoire across the whole Arabic script, in order to identify all possible variants of that code point. This gives more protection to the registry's TLD space and minimizes the effort of re-studying when the registry decides to support more than one language written in the Arabic script. Conducting the variant study across the whole Arabic script thus provides an IDN registry with the following benefits:
- Protection of the registry namespace regardless of the supported languages.
- The work is done only once per language: there is no need to re-study the relationships between code points each time the registry supports a new language (or a new set of code points).
- Flexibility to add more languages as they become ready, without affecting the languages already supported.
This is consistent with the principle of identifying all possible variants and documenting every similarity rather than overlooking it.

b) One of the main principles for the stability of the Internet and IDNs is that end users should be able to reach the website connected to a domain name regardless of their location. In addition, an end user reads website addresses in terms of his/her own language's alphabet and types them with whatever is available on his/her keyboard.
Therefore, to uphold this principle, the input devices (language tables) that users may use to reach a domain name, depending on their location, should be carefully considered when defining variants. Otherwise, reachability problems may arise and user acceptance may suffer. For example, suppose someone registers the domain name “مكة” (all characters from the Arabic language) and a user tries to reach the website connected to this domain name from an Internet café or airport in, say, Pakistan. The user will not be able to reach that website unless the variant “مکۃ” (Urdu variant) has already been allocated and activated. Thus, variants need to be studied both from the point of view of similarity (by the language community) and from the point of view of reachability (based on the input devices used by other language communities).

c) Consistency is a very important concept in variant generation. Regardless of which label is applied for, the list of generated allocatable variants should be the same. Since we are dealing with ordinary users at the second level (SLD), their first choice (the applied-for label) might not be the one that ends up being used by the Internet community. Therefore, a registry should give the registrant the possibility to "correct" his/her choice if the first try was not successful. For example, the word “Internet” is written as (أنترنت) in most Arab countries in North Africa, while it is written as (إنترنت) in other Arab countries, and end users often write it as (انترنت). Therefore, if someone registered "أنترنت", he/she should be able to enable "إنترنت" or "انترنت". Additionally, many words have two correct spellings; for example, (آدم and أدم) are both widely used for the name Adam. Hence, if someone registered "أدم", he/she should be able to enable "آدم" or "ادم". This is achieved by making variants allocatable in both directions in the LGR.
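Point c) depends on variant sets being closed: whichever member of a set is applied for, the same collection of labels is generated. The minimal Python sketch below illustrates this under stated assumptions: `VARIANTS` is a hypothetical excerpt of variant set 11 (the ALEF forms), limited to the four repertoire members, with the mapping types copied from the variant tables in this document; `generate_variants` is an illustrative name, not part of any LGR tool.

```python
from itertools import product
from typing import Dict, FrozenSet

# Hypothetical excerpt of variant set 11 (ALEF forms), limited to the four
# repertoire members; blocked mappings to U+0671..U+0673 are omitted.
# VARIANTS[source][target] is the type of the mapping source -> target.
VARIANTS: Dict[str, Dict[str, str]] = {
    "\u0622": {"\u0623": "allocatable", "\u0625": "allocatable", "\u0627": "allocatable"},
    "\u0623": {"\u0622": "allocatable", "\u0625": "allocatable", "\u0627": "activated"},
    "\u0625": {"\u0622": "allocatable", "\u0623": "allocatable", "\u0627": "activated"},
    "\u0627": {"\u0622": "allocatable", "\u0623": "allocatable", "\u0625": "allocatable"},
}


def generate_variants(label: str) -> Dict[str, FrozenSet[str]]:
    """Map each variant label (the label itself included) to the set of
    variant types used to derive it; the label itself maps to an empty set."""
    # Each position either keeps its code point (no type) or takes a mapped target.
    choices = [[(ch, None)] + list(VARIANTS.get(ch, {}).items()) for ch in label]
    result: Dict[str, FrozenSet[str]] = {}
    for combo in product(*choices):
        variant = "".join(ch for ch, _ in combo)
        result[variant] = frozenset(t for _, t in combo if t is not None)
    return result
```

Applied to "أدم", "آدم" or "ادم", the sketch yields the same four labels. The types differ by direction, though: "ادم" is reached from "أدم" through an activated mapping, while "أدم" is reached from "ادم" through an allocatable one, matching the asymmetric rows of set 11.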
d) Even though we are constructing the LGR at the language level, and even though we are addressing international reachability, a valid variant must not mix code points from different language tables. This is because, when the variants of a given label are generated, some of them will be composed of code points that do not all belong to a single language, or that are not easily available on an input device (keyboard). Therefore, from a practical and realistic point of view, and to significantly minimize the number of allocatable variants and maximize the number of blocked variants, it is good practice to block these unrealistic variants. In some cases, variants blocked because of language mixing make up more than 99% of the total. Hence, this rule considerably improves the accuracy of the set of allocatable variants.

Note: the set of supported languages used for implementing the above concepts (namely international reachability and no mixing between languages) is preliminary. It should be updated to add new languages as they become ready.

[Rules] Common rules:
• Hyphen Restrictions: restrictions on the allowable placement of hyphens (no leading or trailing hyphen, and no consecutive hyphens in positions 3 and 4). These restrictions are described in section 4.2.3.1 of RFC5891 [120]. They are implemented here as a context rule on U+002D (-) HYPHEN-MINUS.
Right-To-Left rules:
• Leading digit: restrictions on the allowable placement of digits (no leading digit). This rule is described in section 2 (rule 1) of RFC5893 [121].
• No mix between different digit sets (European, Arabic-Indic, and Extended-Arabic-Indic digits): restrictions on mixing digits. This rule is described in section 2 (rule 4) of RFC5893 [121].
Arabic language specific rules: these rules aim at reducing the allocation of redundant labels.
• No connected ALEF MAKSURA in the Arabic language: restriction on having ALEF MAKSURA (U+0649) before a right-joining or dual-joining code point.
• No language mixing in the generated variants: restriction on mixing code points from different language tables. Any variant must be generated using code points taken from a single supported language table.

[Actions] The actions included are the default actions for LGRs, together with a set of Arabic language specific actions that enforce the Arabic language specific whole label evaluation (WLE) rules.

[Methodology and Contributors] This reference LGR for the Arabic language at the second level was initially developed by Michel Suignard and Asmus Freytag, and then verified in expert reviews by Michael Everson, Nicholas Ostler, and Wil Tan. Afterwards, it was rewritten by a TF-AIDN sub-group (in alphabetical order: Abdalmonem Tharwat Galila, Abdeslam Nasri, Abdulaziz Al-Zoman, Abdulrahman Alghadir, Hazem Hezzah, Nabil Benamar, Raed Alfayez, Tarik Merghani) to suit the needs of the Arabic language.
[References] General references for the language:
• Wikipedia: Arabic alphabet https://en.wikipedia.org/wiki/Arabic_alphabet
• Omniglot: Arabic http://www.omniglot.com/writing/arabic.htm

Other references cited in this document:
[ARABIC-PROPOSAL] TF-AIDN, "Proposal for Arabic Script Root Zone LGR", Version 3.4, 18 November 2015 https://www.icann.org/en/system/files/files/arabic-lgr-proposal-18nov15-en.pdf
[IDN-GUIDE] Saudi Network Information Center, "Guidelines Rules for writing Arabic IDNs under the IDN ccTLD (السعودية.)" http://nic.net.sa/docs/Guidelines_for_writing_Arabic_IDNs_under_the_IDN_ccTLD_V1.2-en.pdf
[RFC5564] RFC 5564, "Linguistic Guidelines for the Use of the Arabic Language in Internet Domains" https://tools.ietf.org/html/rfc5564
[RFC6365] RFC 6365, "Terminology Used in Internationalization in the IETF", section 7.2 "Character Relationships and Variants" https://tools.ietf.org/html/rfc6365
[VIP] Internationalized Domain Names Variant Issues Project, Arabic Case Study Team Issues Report, http://archive.icann.org/en/topics/new-gtlds/arabic-vip-issues-report-07oct11-en.pdf

In the listing of the repertoire by code point, references starting from [0] refer to the version of the Unicode Standard in which the corresponding code point was initially encoded. Other references (starting from [100]) document usage of code points. For more details, see the Table of References below.
Number of elements in Repertoire | 59 |
---|---|
Number of extended elements | 0 |
Number of excluded elements | 0 |
Total entries in table | 59 |
Number of code point sequences | 0 |
The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name columns are extracted from the Unicode Character Database. Where the comment in the original LGR is equal to the character name, it has been suppressed.
For any code point or sequence for which a variant is defined, the link to the associated variant set, or if mapped to itself, the variant type of that mapping is provided in the Variants column.
# | Code Point | Glyph | Script | Name | Tags | Required Context | Part of Repertoire | Variants | Comment | References |
---|---|---|---|---|---|---|---|---|---|---|
1 | 002D | - | Common | HYPHEN-MINUS | sc:Zyyy | not: hyphen-minus-disallowed | ✔ | | HYPHEN-MINUS | [0] |
2 | 0030 | 0 | Common | DIGIT ZERO | sc:Zyyy | not: leading-digit | ✔ | set 1 | DIGIT ZERO | [0] |
3 | 0031 | 1 | Common | DIGIT ONE | sc:Zyyy | not: leading-digit | ✔ | set 2 | DIGIT ONE | [0] |
4 | 0032 | 2 | Common | DIGIT TWO | sc:Zyyy | not: leading-digit | ✔ | set 3 | DIGIT TWO | [0] |
5 | 0033 | 3 | Common | DIGIT THREE | sc:Zyyy | not: leading-digit | ✔ | set 4 | DIGIT THREE | [0] |
6 | 0034 | 4 | Common | DIGIT FOUR | sc:Zyyy | not: leading-digit | ✔ | set 5 | DIGIT FOUR | [0] |
7 | 0035 | 5 | Common | DIGIT FIVE | sc:Zyyy | not: leading-digit | ✔ | set 6 | DIGIT FIVE | [0] |
8 | 0036 | 6 | Common | DIGIT SIX | sc:Zyyy | not: leading-digit | ✔ | set 7 | DIGIT SIX | [0] |
9 | 0037 | 7 | Common | DIGIT SEVEN | sc:Zyyy | not: leading-digit | ✔ | set 8 | DIGIT SEVEN | [0] |
10 | 0038 | 8 | Common | DIGIT EIGHT | sc:Zyyy | not: leading-digit | ✔ | set 9 | DIGIT EIGHT | [0] |
11 | 0039 | 9 | Common | DIGIT NINE | sc:Zyyy | not: leading-digit | ✔ | set 10 | DIGIT NINE | [0] |
12 | 0621 | ء | Arabic | ARABIC LETTER HAMZA | sc:Arab | | ✔ | | ARABIC LETTER HAMZA | [0], [100], [130], [201], [401], [600], [700] |
13 | 0622 | آ | Arabic | ARABIC LETTER ALEF WITH MADDA ABOVE | sc:Arab | | ✔ | set 11 | ARABIC LETTER ALEF WITH MADDA ABOVE | [0], [100], [130], [201], [401], [600], [700] |
14 | 0623 | أ | Arabic | ARABIC LETTER ALEF WITH HAMZA ABOVE | sc:Arab | | ✔ | set 11 | ARABIC LETTER ALEF WITH HAMZA ABOVE | [0], [100], [130], [201], [401], [600], [700] |
15 | 0624 | ؤ | Arabic | ARABIC LETTER WAW WITH HAMZA ABOVE | sc:Arab | | ✔ | | ARABIC LETTER WAW WITH HAMZA ABOVE | [0], [100], [130], [201], [401], [600], [700] |
16 | 0625 | إ | Arabic | ARABIC LETTER ALEF WITH HAMZA BELOW | sc:Arab | | ✔ | set 11 | ARABIC LETTER ALEF WITH HAMZA BELOW | [0], [100], [130], [201], [401], [600], [700] |
17 | 0626 | ئ | Arabic | ARABIC LETTER YEH WITH HAMZA ABOVE | sc:Arab | | ✔ | set 12 | ARABIC LETTER YEH WITH HAMZA ABOVE | [0], [100], [130], [201], [401], [600], [700] |
18 | 0627 | ا | Arabic | ARABIC LETTER ALEF | sc:Arab | | ✔ | set 11 | ARABIC LETTER ALEF | [0], [100], [130], [201], [401], [600], [700] |
19 | 0628 | ب | Arabic | ARABIC LETTER BEH | sc:Arab | | ✔ | | ARABIC LETTER BEH | [0], [100], [130], [201], [401], [600], [700] |
20 | 0629 | ة | Arabic | ARABIC LETTER TEH MARBUTA | sc:Arab | | ✔ | set 13 | ARABIC LETTER TEH MARBUTA | [0], [100], [130], [201], [401], [600], [700] |
21 | 062A | ت | Arabic | ARABIC LETTER TEH | sc:Arab | | ✔ | set 14 | ARABIC LETTER TEH | [0], [100], [130], [201], [401], [600], [700] |
22 | 062B | ث | Arabic | ARABIC LETTER THEH | sc:Arab | | ✔ | set 15 | ARABIC LETTER THEH | [0], [100], [130], [201], [401], [600], [700] |
23 | 062C | ج | Arabic | ARABIC LETTER JEEM | sc:Arab | | ✔ | | ARABIC LETTER JEEM | [0], [100], [130], [201], [401], [600], [700] |
24 | 062D | ح | Arabic | ARABIC LETTER HAH | sc:Arab | | ✔ | | ARABIC LETTER HAH | [0], [100], [130], [201], [401], [600], [700] |
25 | 062E | خ | Arabic | ARABIC LETTER KHAH | sc:Arab | | ✔ | | ARABIC LETTER KHAH | [0], [100], [130], [201], [401], [600], [700] |
26 | 062F | د | Arabic | ARABIC LETTER DAL | sc:Arab | | ✔ | | ARABIC LETTER DAL | [0], [100], [130], [201], [401], [600], [700] |
27 | 0630 | ذ | Arabic | ARABIC LETTER THAL | sc:Arab | | ✔ | | ARABIC LETTER THAL | [0], [100], [130], [201], [401], [600], [700] |
28 | 0631 | ر | Arabic | ARABIC LETTER REH | sc:Arab | | ✔ | | ARABIC LETTER REH | [0], [100], [130], [201], [401], [600], [700] |
29 | 0632 | ز | Arabic | ARABIC LETTER ZAIN | sc:Arab | | ✔ | | ARABIC LETTER ZAIN | [0], [100], [130], [201], [401], [600], [700] |
30 | 0633 | س | Arabic | ARABIC LETTER SEEN | sc:Arab | | ✔ | | ARABIC LETTER SEEN | [0], [100], [130], [201], [401], [600], [700] |
31 | 0634 | ش | Arabic | ARABIC LETTER SHEEN | sc:Arab | | ✔ | | ARABIC LETTER SHEEN | [0], [100], [130], [201], [401], [600], [700] |
32 | 0635 | ص | Arabic | ARABIC LETTER SAD | sc:Arab | | ✔ | | ARABIC LETTER SAD | [0], [100], [130], [201], [401], [600], [700] |
33 | 0636 | ض | Arabic | ARABIC LETTER DAD | sc:Arab | | ✔ | | ARABIC LETTER DAD | [0], [100], [130], [201], [401], [600], [700] |
34 | 0637 | ط | Arabic | ARABIC LETTER TAH | sc:Arab | | ✔ | | ARABIC LETTER TAH | [0], [100], [130], [201], [401], [600], [700] |
35 | 0638 | ظ | Arabic | ARABIC LETTER ZAH | sc:Arab | | ✔ | | ARABIC LETTER ZAH | [0], [100], [130], [201], [401], [600], [700] |
36 | 0639 | ع | Arabic | ARABIC LETTER AIN | sc:Arab | | ✔ | | ARABIC LETTER AIN | [0], [100], [130], [201], [401], [600], [700] |
37 | 063A | غ | Arabic | ARABIC LETTER GHAIN | sc:Arab | | ✔ | | ARABIC LETTER GHAIN | [0], [100], [130], [201], [401], [600], [700] |
40 | 0641 | ف | Arabic | ARABIC LETTER FEH | sc:Arab | | ✔ | set 16 | ARABIC LETTER FEH | [0], [100], [130], [201], [401], [600], [700] |
41 | 0642 | ق | Arabic | ARABIC LETTER QAF | sc:Arab | | ✔ | | ARABIC LETTER QAF | [0], [100], [130], [201], [401], [600], [700] |
42 | 0643 | ك | Arabic | ARABIC LETTER KAF | sc:Arab | | ✔ | set 17 | ARABIC LETTER KAF | [0], [100], [130], [201], [401], [600], [700] |
43 | 0644 | ل | Arabic | ARABIC LETTER LAM | sc:Arab | | ✔ | | ARABIC LETTER LAM | [0], [100], [130], [201], [401], [600], [700] |
44 | 0645 | م | Arabic | ARABIC LETTER MEEM | sc:Arab | | ✔ | | ARABIC LETTER MEEM | [0], [100], [130], [201], [401], [600], [700] |
45 | 0646 | ن | Arabic | ARABIC LETTER NOON | sc:Arab | | ✔ | set 18 | ARABIC LETTER NOON | [0], [100], [130], [201], [401], [600], [700] |
46 | 0647 | ه | Arabic | ARABIC LETTER HEH | sc:Arab | | ✔ | set 13 | ARABIC LETTER HEH | [0], [100], [130], [201], [401], [600], [700] |
47 | 0648 | و | Arabic | ARABIC LETTER WAW | sc:Arab | | ✔ | | ARABIC LETTER WAW | [0], [100], [130], [201], [401], [600], [700] |
48 | 0649 | ى | Arabic | ARABIC LETTER ALEF MAKSURA | sc:Arab | | ✔ | set 19 | ARABIC LETTER ALEF MAKSURA | [0], [100], [130], [201], [401], [600], [700] |
49 | 064A | ي | Arabic | ARABIC LETTER YEH | sc:Arab | | ✔ | set 19 | ARABIC LETTER YEH | [0], [100], [130], [201], [401], [600], [700] |
50 | 0660 | ٠ | Arabic | ARABIC-INDIC DIGIT ZERO | sc:Arab | not: leading-digit | ✔ | set 1 | ARABIC-INDIC DIGIT ZERO | [0], [100], [130], [201], [401], [600], [700] |
51 | 0661 | ١ | Arabic | ARABIC-INDIC DIGIT ONE | sc:Arab | not: leading-digit | ✔ | set 2 | ARABIC-INDIC DIGIT ONE | [0], [100], [130], [201], [401], [600], [700] |
52 | 0662 | ٢ | Arabic | ARABIC-INDIC DIGIT TWO | sc:Arab | not: leading-digit | ✔ | set 3 | ARABIC-INDIC DIGIT TWO | [0], [100], [130], [201], [401], [600], [700] |
53 | 0663 | ٣ | Arabic | ARABIC-INDIC DIGIT THREE | sc:Arab | not: leading-digit | ✔ | set 4 | ARABIC-INDIC DIGIT THREE | [0], [100], [130], [201], [401], [600], [700] |
54 | 0664 | ٤ | Arabic | ARABIC-INDIC DIGIT FOUR | sc:Arab | not: leading-digit | ✔ | set 5 | ARABIC-INDIC DIGIT FOUR | [0], [100], [130], [201], [401], [600], [700] |
55 | 0665 | ٥ | Arabic | ARABIC-INDIC DIGIT FIVE | sc:Arab | not: leading-digit | ✔ | set 6 | ARABIC-INDIC DIGIT FIVE | [0], [100], [130], [201], [401], [600], [700] |
56 | 0666 | ٦ | Arabic | ARABIC-INDIC DIGIT SIX | sc:Arab | not: leading-digit | ✔ | set 7 | ARABIC-INDIC DIGIT SIX | [0], [100], [130], [201], [401], [600], [700] |
57 | 0667 | ٧ | Arabic | ARABIC-INDIC DIGIT SEVEN | sc:Arab | not: leading-digit | ✔ | set 8 | ARABIC-INDIC DIGIT SEVEN | [0], [100], [130], [201], [401], [600], [700] |
58 | 0668 | ٨ | Arabic | ARABIC-INDIC DIGIT EIGHT | sc:Arab | not: leading-digit | ✔ | set 9 | ARABIC-INDIC DIGIT EIGHT | [0], [100], [130], [201], [401], [600], [700] |
59 | 0669 | ٩ | Arabic | ARABIC-INDIC DIGIT NINE | sc:Arab | not: leading-digit | ✔ | set 10 | ARABIC-INDIC DIGIT NINE | [0], [100], [130], [201], [401], [600], [700] |
Number of variant sets | 19 |
---|---|
Largest variant set | 8 |
Ordinary Variants by Type | allocatable (21) blocked (137) |
Reflexive Variants by Type |
The following tables list each pair of variant mappings on one row. For each pair of code points, by convention, the lower code point is taken as the source of the mapping in the forward → direction and the reverse direction ← is not listed separately. The variant mappings defined in an LGR are required to be symmetric, that is, both the forward and reverse mappings must be specified.
A mapping where source and target are the same is reflexive. Variant sets consisting of only a single reflexive mapping are not shown as a set. Instead, the variant type of the mapping is listed in the Variants column of the Repertoire by Code Point table. Reflexive mappings that are part of a larger set are indicated with a “≡”.
Where the forward and reverse mappings have the same type, a single value is given in the Type(s) column; otherwise the types for the forward and reverse mappings are given in that order, as indicated by the arrows. The same applies to any comments.
In a properly specified LGR, all members of each variant set are variants of each other, a property called transitivity. Because of that, all variant sets are necessarily disjoint. In each set, shading is used to group mappings from the same source code point or sequence.
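The symmetry and transitivity requirements described above can be verified mechanically. The following Python sketch is a hypothetical checker (the function name `check_variant_mappings` and the dictionary keyed by (source, target) code point strings are assumptions for illustration, not part of any LGR toolset). It reports missing reverse mappings and, for each group of code points connected by mappings, any pair that is not mapped in either direction. It does not check the mapping types.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def check_variant_mappings(mappings: Dict[Tuple[str, str], str]) -> List[str]:
    """Report symmetry and transitivity violations in variant mappings
    keyed by (source, target) code point strings."""
    problems = []
    # Symmetry: each forward mapping needs a reverse mapping (types may differ).
    for src, tgt in mappings:
        if (tgt, src) not in mappings:
            problems.append(f"missing reverse mapping {tgt} -> {src}")
    # Transitivity: group code points connected by any mapping ...
    neighbours = defaultdict(set)
    for src, tgt in mappings:
        neighbours[src].add(tgt)
        neighbours[tgt].add(src)
    seen = set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            cp = stack.pop()
            if cp not in group:
                group.add(cp)
                stack.extend(neighbours[cp])
        seen |= group
        # ... then every pair inside a group must be mapped in some direction.
        members = sorted(group)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if (a, b) not in mappings and (b, a) not in mappings:
                    problems.append(f"{a} and {b} share a variant set but are not mapped")
    return problems
```

Fed the six directed mappings of variant set 1 (digit zero), the checker returns an empty list; dropping the 0660/06F0 pair would be reported as a transitivity gap.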
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0030 | 0 | 0660 | ٠ | ↔ | activated | For international reachability | |
2 | 0030 | 0 | 06F0 | ۰ | ↔ | activated | For international reachability | |
3 | 0660 | ٠ | 06F0 | ۰ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0031 | 1 | 0661 | ١ | ↔ | activated | For international reachability | |
2 | 0031 | 1 | 06F1 | ۱ | ↔ | activated | For international reachability | |
3 | 0661 | ١ | 06F1 | ۱ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0032 | 2 | 0662 | ٢ | ↔ | activated | For international reachability | |
2 | 0032 | 2 | 06F2 | ۲ | ↔ | activated | For international reachability | |
3 | 0662 | ٢ | 06F2 | ۲ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0033 | 3 | 0663 | ٣ | ↔ | activated | For international reachability | |
2 | 0033 | 3 | 06F3 | ۳ | ↔ | activated | For international reachability | |
3 | 0663 | ٣ | 06F3 | ۳ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0034 | 4 | 0664 | ٤ | ↔ | activated | For international reachability | |
2 | 0034 | 4 | 06F4 | ۴ | ↔ | activated | For international reachability | |
3 | 0664 | ٤ | 06F4 | ۴ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0035 | 5 | 0665 | ٥ | ↔ | activated | For international reachability | |
2 | 0035 | 5 | 06F5 | ۵ | ↔ | activated | For international reachability | |
3 | 0665 | ٥ | 06F5 | ۵ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0036 | 6 | 0666 | ٦ | ↔ | activated | For international reachability | |
2 | 0036 | 6 | 06F6 | ۶ | ↔ | activated | For international reachability | |
3 | 0666 | ٦ | 06F6 | ۶ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0037 | 7 | 0667 | ٧ | ↔ | activated | For international reachability | |
2 | 0037 | 7 | 06F7 | ۷ | ↔ | activated | For international reachability | |
3 | 0667 | ٧ | 06F7 | ۷ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0038 | 8 | 0668 | ٨ | ↔ | activated | For international reachability | |
2 | 0038 | 8 | 06F8 | ۸ | ↔ | activated | For international reachability | |
3 | 0668 | ٨ | 06F8 | ۸ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0039 | 9 | 0669 | ٩ | ↔ | activated | For international reachability | |
2 | 0039 | 9 | 06F9 | ۹ | ↔ | activated | For international reachability | |
3 | 0669 | ٩ | 06F9 | ۹ | ↔ | activated | For international reachability |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0622 | آ | 0623 | أ | ↔ | allocatable | Language variant | |
2 | 0622 | آ | 0625 | إ | ↔ | allocatable | Language variant | |
3 | 0622 | آ | 0627 | ا | ↔ | allocatable | U+0622 ALEF WITH MADDA ABOVE is simplified to U+0627 ALEF in Arabic language | |
4 | 0622 | آ | 0671 | ٱ | ↔ | blocked | Typo variant | |
5 | 0622 | آ | 0672 | ٲ | ↔ | blocked | Typo variant | |
6 | 0622 | آ | 0673 | ٳ | ↔ | blocked | Transitivity variant * | |
7 | 0623 | أ | 0625 | إ | ↔ | allocatable | Language variant | |
8 | 0623 | أ | 0627 | ا | → / ← | activated / allocatable | For international reachability and since U+0623 ALEF WITH HAMZA ABOVE is simplified to U+0627 ALEF in Arabic language | |
9 | 0623 | أ | 0671 | ٱ | ↔ | blocked | Typo variant | |
10 | 0623 | أ | 0672 | ٲ | ↔ | blocked | Typo variant | |
11 | 0623 | أ | 0673 | ٳ | ↔ | blocked | Transitivity variant * | |
12 | 0625 | إ | 0627 | ا | → / ← | activated / allocatable | For international reachability and since U+0625 ALEF WITH HAMZA BELOW is simplified to U+0627 ALEF in Arabic language | |
13 | 0625 | إ | 0671 | ٱ | ↔ | blocked | Transitivity variant * | |
14 | 0625 | إ | 0672 | ٲ | ↔ | blocked | Transitivity variant * | |
15 | 0625 | إ | 0673 | ٳ | ↔ | blocked | Typo variant | |
16 | 0627 | ا | 0671 | ٱ | ↔ | blocked | Typo variant | |
17 | 0627 | ا | 0672 | ٲ | ↔ | blocked | Typo variant | |
18 | 0627 | ا | 0673 | ٳ | ↔ | blocked | Typo variant |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0626 | ئ | 06D3 | ۓ | ↔ | blocked | Typo variant |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0629 | ة | 0647 | ه | ↔ | allocatable | In the Arabic language, U+0647 HEH may be substituted for U+0629 TEH MARBUTA. [RFC6365] | |
2 | 0629 | ة | 06BE | ھ | ↔ | blocked | Typo variant | |
3 | 0629 | ة | 06C1 | ہ | ↔ | blocked | Typo variant | |
4 | 0629 | ة | 06C3 | ۃ | → / ← | activated / blocked | For international reachability | |
5 | 0629 | ة | 06D5 | ە | ↔ | blocked | Typo variant | |
6 | 0647 | ه | 06BE | ھ | → / ← | activated / blocked | For international reachability | |
7 | 0647 | ه | 06C1 | ہ | → / ← | activated / blocked | For international reachability | |
8 | 0647 | ه | 06C3 | ۃ | ↔ | blocked | Transitivity variant * | |
9 | 0647 | ه | 06D5 | ە | ↔ | blocked | Typo/Exact variant |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 062A | ت | 063E | ؾ | ↔ | blocked | Typo variant | |
2 | 062A | ت | 067A | ٺ | ↔ | blocked | Typo variant |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 062B | ث | 063F | ؿ | ↔ | blocked | Exact variant | |
2 | 062B | ث | 067D | ٽ | ↔ | blocked | Typo variant |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0641 | ف | 06A7 | ڧ | ↔ | blocked | Typo/Exact variant |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0643 | ك | 06A9 | ک | → / ← | activated / blocked | For international reachability | |
2 | 0643 | ك | 06AA | ڪ | ↔ | blocked | Typo variant |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0646 | ن | 06BA | ں | ↔ | blocked | Exact variant |
# | Source | Glyph | Target | Glyph | | Type(s) | Comment | Ref |
---|---|---|---|---|---|---|---|---|
1 | 0649 | ى | 064A | ي | ↔ | allocatable | Language variant | |
2 | 0649 | ى | 066E | ٮ | ↔ | blocked | Exact variant | |
3 | 0649 | ى | 067B | ٻ | ↔ | blocked | Transitivity variant * | |
4 | 0649 | ى | 06CC | ی | → / ← | activated / blocked | For international reachability | |
5 | 0649 | ى | 06CD | ۍ | ↔ | blocked | Typo variant | |
6 | 0649 | ى | 06D0 | ې | ↔ | blocked | Transitivity variant * | |
7 | 0649 | ى | 06D2 | ے | ↔ | blocked | Typo variant | |
8 | 064A | ي | 066E | ٮ | ↔ | blocked | Transitivity variant * | |
9 | 064A | ي | 067B | ٻ | ↔ | blocked | Typo variant | |
10 | 064A | ي | 06CC | ی | → / ← | activated / blocked | For international reachability | |
11 | 064A | ي | 06CD | ۍ | ↔ | blocked | Typo variant | |
12 | 064A | ي | 06D0 | ې | ↔ | blocked | Typo variant | |
13 | 064A | ي | 06D2 | ے | ↔ | blocked | Typo variant |
The following table lists all top-level classes with their definition and a list of their members intersected with the current repertoire.
Name | Definition | Count | Members | Comment |
---|---|---|---|---|
implicit | Tag=sc:Zyyy | 11 Elements | { 002D 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 } | |
implicit | Tag=sc:Arab | 79 Elements | { 0621 0622 0623 0624 0625 0626 0627 0628 0629 062A 062B 062C 062D ... } | |
transparent | Unicode Property=jt:T | |||
right-joining | Unicode Property=jt:R | |||
left-joining | Unicode Property=jt:L | |||
dual-joining | Unicode Property=jt:D | |||
non-joining | Unicode Property=jt:U | |||
arabic-language | 57 Elements | { 0621 0622 0623 0624 0625 0626 0627 0628 0629 062A 062B 062C ... } | ||
urdu-language | 61 Elements | { 0621 0622 0626 0627 0628 062A 062B 062C 062D 062E 062F 0630 ... } | ||
persian-language | 59 Elements | { 0621 0622 0623 0624 0626 0627 0628 0629 062A 062B 062C 062D ... } | ||
malay-language | 52 Elements | { 0621 0623 0625 0626 0627 0628 0629 062A 062B 062C 062D 062E ... } | ||
pashto-language | 70 Elements | { 0621 0622 0623 0624 0626 0627 0628 0629 062A 062B 062C 062D ... } | ||
arabic-digits | 1 Elements | { 0030-0039 } | ||
arabic-indic-digits | 1 Elements | { 0660-0669 } | ||
extended-arabic-indic-digits | 1 Elements | { 06F0-06F9 } |
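The language classes above drive the no-mixing-languages-in-variants rule: a label is acceptable only if every code point belongs to one and the same language class. A minimal Python sketch follows; `EXAMPLE_TABLES` holds tiny, hypothetical excerpts of the arabic-language and urdu-language classes (the complete member lists are defined in the normative XML), and `single_language` and `keep_single_language` are illustrative names.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical, heavily abbreviated excerpts of two language classes; the
# complete member lists are defined in the normative XML file.
EXAMPLE_TABLES: Dict[str, Set[str]] = {
    "arabic-language": {"\u0645", "\u0643", "\u0629"},  # MEEM, KAF, TEH MARBUTA
    "urdu-language": {"\u0645", "\u06A9", "\u06C3"},    # MEEM, KEHEH, TEH MARBUTA GOAL
}


def single_language(label: str, tables: Dict[str, Set[str]]) -> Optional[str]:
    """Name of the first language table containing every code point of label, else None."""
    if not label:
        return None
    for name, members in tables.items():
        if set(label) <= members:
            return name
    return None


def keep_single_language(variants: Iterable[str], tables: Dict[str, Set[str]]) -> List[str]:
    """Drop variant labels that mix code points from different language tables."""
    return [v for v in variants if single_language(v, tables) is not None]
```

With these excerpts, of the four candidate labels "مكة", "مکة", "مكۃ" and "مکۃ", only the all-Arabic "مكة" and the all-Urdu "مکۃ" survive; the two mixed forms are rejected, which is the mechanism behind the large share of blocked variants described in point d).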
The following table lists all the top-level, or named, rules defined in the LGR and indicates whether they are used as a trigger in an action or as a context (when or not-when) for a code point. (Any use of context rules for variants is not indicated.)
Name | Regular Expression | Used as Trigger | Used as Context | Anchor | Ref | Comment |
---|---|---|---|---|---|---|
hyphen-minus-disallowed | (choice [((start))(anchor)][(anchor)((end))][((start)(any)(any)(cp: 002D))(anchor)]) | | ✔ | ✔ | [120] | RFC5891 restrictions on placement of U+002D |
no-connected-alef-maksura | (start)(0+ any)(1+ cp: 0649)(choice (count:1+) [(right-joining)][(dual-joining)](0+ any)(end)) | ✔ | | | | a label which has connected alef maksura is blocked |
no-mixing-languages-in-variants | (choice (count:1+) [(start)((count:1+) arabic-language)(end)][(start)((count:1+) persian-language)(end)][(start)((count:1+) urdu-language)(end)][(start)((count:1+) pashto-language)(end)][(start)((count:1+) malay-language)(end)]) | ✔ | | | | a label/variant can only be written using characters from one of the 5 languages |
leading-digit | ((start))(anchor) | | ✔ | ✔ | [121] | RFC5893 RTL labels cannot start with a digit |
no-digit-mixing | (choice (count:1+) [(start)(0+ any)(arabic-digits)(0+ any)(choice (count:1+) [(arabic-indic-digits)][(extended-arabic-indic-digits)](0+ any)(end))][(start)(0+ any)(arabic-indic-digits)(0+ any)(choice (count:1+) [(arabic-digits)][(extended-arabic-indic-digits)](0+ any)(end))][(start)(0+ any)(extended-arabic-indic-digits)(0+ any)(choice (count:1+) [(arabic-digits)][(arabic-indic-digits)](0+ any)(end))]) | ✔ | | | | |
Note: The terminology used in the regular expressions follows RFC7940.
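Two of the rules above, hyphen-minus-disallowed and no-connected-alef-maksura, reduce to simple string tests. The Python sketch below is an illustration, not the normative definition: the function names are hypothetical, and instead of deriving joining types from the jt property classes as the LGR does, it hard-codes the joining types of this repertoire's letters as transcribed from the Unicode Character Database (ArabicShaping.txt).

```python
# Joining types of this repertoire's letters, transcribed from the Unicode
# Character Database (ArabicShaping.txt). U+0621 HAMZA, the digits and
# U+002D are non-joining and therefore appear in neither set.
RIGHT_JOINING = {chr(cp) for cp in (0x0622, 0x0623, 0x0624, 0x0625, 0x0627, 0x0629,
                                    0x062F, 0x0630, 0x0631, 0x0632, 0x0648)}
DUAL_JOINING = {chr(cp) for cp in (0x0626, 0x0628, *range(0x062A, 0x062F),
                                   *range(0x0633, 0x063B), *range(0x0641, 0x0648),
                                   0x0649, 0x064A)}
ALEF_MAKSURA = "\u0649"


def hyphen_placement_ok(label: str) -> bool:
    """RFC 5891 section 4.2.3.1: no leading or trailing hyphen-minus and
    no "--" in the third and fourth positions."""
    return not (label.startswith("-") or label.endswith("-") or label[2:4] == "--")


def has_connected_alef_maksura(label: str) -> bool:
    """True if U+0649 is directly followed by a right- or dual-joining letter,
    i.e. the label matches no-connected-alef-maksura (action 2: invalid)."""
    joining = RIGHT_JOINING | DUAL_JOINING
    return any(cur == ALEF_MAKSURA and nxt in joining
               for cur, nxt in zip(label, label[1:]))
```

A label such as "مصطفى", where ALEF MAKSURA is final, does not match the rule, whereas a label with ALEF MAKSURA followed by ALEF does. ALEF MAKSURA followed by a digit or by HAMZA (both non-joining) does not match either.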
The following table lists the actions that are used to assign dispositions to labels and variant labels, based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.
# | Condition | Rule / Variant Set | | Disposition | Ref | Comment |
---|---|---|---|---|---|---|
1 | if label does not match | no-mixing-languages-in-variants | → | invalid | | |
2 | if label matches | no-connected-alef-maksura | → | invalid | | |
3 | if label matches | no-digit-mixing | → | invalid | | |
4 | if at least one variant is in | {blocked} | → | blocked | | default action |
5 | if at least one variant is in | {security-blocked} | → | blocked | | default action for similarity variants |
6 | if at least one variant is in | {activated} | → | activated | | default action - For international reachability |
7 | if at least one variant is in | {allocatable} | → | allocatable | | default action for allocatable variants |
8 | if any label (catch-all) | | → | valid | | catch all (default action) |
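Because the first triggered action determines the disposition, the order of the table matters: for instance, a variant label derived through both an allocatable and a blocked mapping ends up blocked. The Python sketch below walks the actions in order under simplifying assumptions: the rule results are passed in precomputed, and `variant_types` is the set of mapping types used to derive the label from the applied-for label (empty for the applied-for label itself). The tuple encoding of `ACTIONS` and the function name `disposition` are illustrative, not the RFC 7940 XML syntax.

```python
from typing import Dict, Set

# Actions in the order of the table above: (kind, argument, disposition).
# "not-match"/"match" test a named whole-label rule, "any-variant" tests the
# variant types used to derive the label, and "any" is the catch-all.
ACTIONS = [
    ("not-match", "no-mixing-languages-in-variants", "invalid"),
    ("match", "no-connected-alef-maksura", "invalid"),
    ("match", "no-digit-mixing", "invalid"),
    ("any-variant", {"blocked"}, "blocked"),
    ("any-variant", {"security-blocked"}, "blocked"),
    ("any-variant", {"activated"}, "activated"),
    ("any-variant", {"allocatable"}, "allocatable"),
    ("any", None, "valid"),
]


def disposition(rule_matches: Dict[str, bool], variant_types: Set[str]) -> str:
    """Return the disposition of the first action that the label triggers."""
    for kind, arg, result in ACTIONS:
        if kind == "not-match" and not rule_matches[arg]:
            return result
        if kind == "match" and rule_matches[arg]:
            return result
        if kind == "any-variant" and variant_types & arg:
            return result
        if kind == "any":
            return result
    raise AssertionError("unreachable: the catch-all action always triggers")
```

The applied-for label itself uses no variant mappings, so unless one of the first three rules fires it falls through to the catch-all and is valid.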
[0] | The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5) Any code point cited was originally encoded in Unicode Version 1.1 |
[1] | The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5) Any code point cited was originally encoded in Unicode Version 2.0 |
[5] | The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5) Any code point cited was originally encoded in Unicode Version 3.2 |
[100] | Internetstiftelsen i Sverige (IIS), Arabic https://github.com/dotse/IDN-ref-tables/blob/master/language-tables/arabic-lang-ref-table.txt |
[107] | MSR-2 Maximum Starting Repertoire https://www.icann.org/en/system/files/files/msr-2-overview-14apr15-en.pdf Code points cited are obsolete |
[120] | RFC5891, Internationalized Domain Names in Applications (IDNA): Protocol http://tools.ietf.org/html/rfc5891 |
[121] | RFC5893, Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA) http://tools.ietf.org/html/rfc5893 |
[130] | RFC5564, Linguistic Guidelines for the Use of the Arabic Language in Internet Domains http://tools.ietf.org/html/rfc5564 |
[201] | Omniglot Arabic http://www.omniglot.com/writing/arabic.htm |
[401] | The Unicode Consortium, Common Locale Data Repository, CLDR Version 28 (2015-09-16), Locale Data Summary for Arabic [ar], http://www.unicode.org/cldr/charts/28/summary/ar.html Code points cited are from the set of Main Letters |
[600] | Wikipedia Arabic alphabet https://en.wikipedia.org/wiki/Arabic_alphabet accessed 2015-10-31 Code points cited are from the set of Basic letters |
[601] | Wikipedia Arabic alphabet https://en.wikipedia.org/wiki/Arabic_alphabet accessed 2015-10-31 Code points cited are from the set of Regional variations |
[603] | Wikipedia Arabic Che https://en.wikipedia.org/wiki/Che_(Persian_letter)#Other_uses accessed 2015-10-31 Code points cited are used for the sound Che (loan words) |
[700] | Saudi Network Information Center (.sa, Saudi Arabia ccTLD) http://www.iana.org/domains/idn-tables/tables/sa_ar_2.0.pdf |