LGR for ara-Arab

This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.

LGR Version 1.3
Date 2017-04-26
Language ara-Arab
Scope domain: tld
Unicode Version 6.3.0

Description

Label Generation Rules for Arabic Language

[Overview]

This document specifies a set of Label Generation Rules for the Arabic language using a limited repertoire as appropriate for a second level domain.

This is a DRAFT document released for public comment; it is not final. Please see the announcement on the ICANN website for public comments on Second
Level Reference LGRs for details on how to submit comments. Note: it is anticipated that any code points that were excluded but listed for review
will be removed in the finalized version of this file unless their status is changed as a result of public comments.

[Repertoire]

All references converge on 36 Arabic letters and 10 Arabic-Indic digits (in addition to the European digits and U+002D (-) HYPHEN-MINUS). 

The repertoire, analysis and documentation provided in the "RFC5564: Linguistic Guidelines for the Use of the Arabic Language in Internet Domains"  [RFC5564]
and the "Proposal for Arabic script Root Zone LGR" [ARABIC-PROPOSAL] were consulted in making this determination.

Note: the proposal cited is being adopted for the Arabic script portion of the Root Zone LGR, a process that is expected to be finalized in time for 
the completion of the Second Level Reference LGR project.

[Correlation between repertoire and Bidi rules]

Because this LGR contains only a subset of PVALID code points, the Bidi rules expressed in section 2 of RFC5893 [121] (Bidi Rule) can be simplified to
two constraints:

•	Prevent digits (either European, Arabic-Indic, or Extended-Arabic-Indic) from starting a label.
•	As there are three digit sets being used (European, Arabic-Indic, and Extended-Arabic-Indic digits), do not allow a mixture among these sets. 
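The two simplified constraints above can be sketched as follows. This is only an illustration, not part of the normative XML: the function names are hypothetical, and the three digit ranges are taken from the digit classes listed under "Character Classes" in this document.

```python
# Digit ranges as listed in the character classes of this LGR.
EUROPEAN = set(range(0x0030, 0x003A))          # 0-9
ARABIC_INDIC = set(range(0x0660, 0x066A))      # U+0660..U+0669
EXT_ARABIC_INDIC = set(range(0x06F0, 0x06FA))  # U+06F0..U+06F9
DIGIT_SETS = (EUROPEAN, ARABIC_INDIC, EXT_ARABIC_INDIC)


def starts_with_digit(label: str) -> bool:
    """True if the first code point belongs to any of the three digit sets."""
    return bool(label) and any(ord(label[0]) in s for s in DIGIT_SETS)


def mixes_digit_sets(label: str) -> bool:
    """True if digits from more than one digit set occur in the label."""
    used = {i for ch in label for i, s in enumerate(DIGIT_SETS) if ord(ch) in s}
    return len(used) > 1


def passes_simplified_bidi(label: str) -> bool:
    """Hypothetical helper combining both constraints."""
    return not starts_with_digit(label) and not mixes_digit_sets(label)
```

For instance, a label of Arabic letters followed by a single European digit passes, while the same label with the digit moved to the front, or with an Arabic-Indic digit appended, does not.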

There is an IDN table published in the IANA Repository of IDN Practices for Arabic language by .السعودية (xn--mgberp4a5d4ar, the Saudi Arabia IDN ccTLD)
[700]. It is complemented by a guidelines document [IDN-GUIDE]. These documents refer to Arabic language as used in Saudi Arabia.

[Variants]

Variants are based on the variants listed in the proposed Arabic script LGR for the root zone [ARABIC-PROPOSAL]; they are limited to the variants 
contained in the sub-repertoire used in this document. See [112] for the full definition of Arabic variants and [RFC6365] for the case of U+0629 (ة) 
ARABIC LETTER TEH MARBUTA and U+0647 (ه) ARABIC LETTER HEH. See also the comments given in the listing.

To generate the variant tables, a number of concepts have been used:

a)	When building a variant table, a full study is conducted across the whole Arabic script for each code point in the Arabic language repertoire, in
order to identify all possible variants corresponding to that code point. This gives more protection to the registry’s TLD space and minimizes
the overhead of re-studying when the registry decides to support more than one language written in the Arabic script. Hence, conducting the variant study across
the whole Arabic script provides an IDN registry with the following benefits:

-	Protection of the registry name space regardless of the supported languages.
-	Doing the work for a language only once. There is no need to re-study a code point’s relationships each time a new language (or new set of code points) is
supported by the registry.
-	Flexibility to add more languages as they become ready without affecting existing supported languages.
-	Consistency with the principle of identifying all possible variants, so that no similarity is neglected or left undocumented.

b)	One of the main principles for the stability of the Internet and IDNs is that an end user should be able to reach a website connected to his/her
domain name regardless of location. Additionally, an end user reads website addresses based on the alphabet of his/her language and types them with whatever is available on
his/her keyboard. Therefore, to uphold this principle, the input devices (language tables) that a user may use to reach a domain name (depending
on the user's location) should be carefully considered when defining variants. Otherwise, reachability problems may arise and reduce user acceptance.
For example, suppose someone registered the domain name “مكة” (all characters from the Arabic language) and a user tries to reach the website connected to this
domain name from an Internet café or airport in, say, Pakistan. He/she will not be able to reach that website unless the variant “مکۃ” (Urdu variant)
is already allocated and activated. Thus, variants need to be studied from both a similarity point of view (by language community) and a reachability point
of view (based on input devices used by other language communities).
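As a rough illustration of the reachability example, the sketch below maps the Arabic-language spelling to the form a user would type on an Urdu keyboard. The mapping table is a small excerpt of the "For international reachability" entries in Variant Sets 13 and 17 of this document; the function name is hypothetical, and the code points assumed for the two example labels are U+0645 U+0643 U+0629 and U+0645 U+06A9 U+06C3.

```python
# Excerpt of reachability mappings (Variant Sets 13 and 17 in this LGR).
REACHABILITY = {
    0x0643: 0x06A9,  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    0x0629: 0x06C3,  # ARABIC LETTER TEH MARBUTA -> ARABIC LETTER TEH MARBUTA GOAL
}


def reachability_variant(label: str) -> str:
    """Replace each code point by its reachability variant, if one is listed."""
    return "".join(chr(REACHABILITY.get(ord(ch), ord(ch))) for ch in label)


arabic_label = "\u0645\u0643\u0629"          # label as registered
urdu_keyboard_form = reachability_variant(arabic_label)
```

If the registry has activated the resulting variant, both spellings lead to the same registration; otherwise the user typing the second form cannot reach it.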

c)	Consistency is a very important concept in variant generation. Regardless of which label is selected as the applied-for label, the list of generated allocatable variants
should be the same. As we are dealing with ordinary users at the SLD, their first-choice (applied-for) label might not be the one that will be used by the Internet
community. Therefore, a registry should give the registrant the possibility to "correct" his/her choice if the first try was not successful.
For example, the word “Internet” is written in most Arab countries in North Africa as (أنترنت) and in other Arab countries as (إنترنت), while
end users often write it as (انترنت). Therefore, if someone registered "أنترنت", he/she should be able to enable "إنترنت" or "انترنت". Additionally, many words
have two correct spellings, e.g., (آدم and أدم) are both widely used for the name Adam. Hence, if someone registered "أدم", he/she should be able to
enable "آدم" or "ادم". This is achieved by making variants allocatable in both directions in the LGR.
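The consistency property can be illustrated with a minimal sketch that looks only at the four ALEF forms of the repertoire (U+0622, U+0623, U+0625, U+0627), which Variant Set 11 lists as allocatable or activated variants of each other. The function name is hypothetical, and all other variant sets are deliberately ignored here.

```python
from itertools import product

# The four ALEF forms in the repertoire (members of Variant Set 11).
ALEF_FORMS = ("\u0622", "\u0623", "\u0625", "\u0627")


def alef_variant_set(label: str) -> set:
    """Return the label together with all labels obtained by swapping ALEF forms."""
    choices = [ALEF_FORMS if ch in ALEF_FORMS else (ch,) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


with_hamza_above = "\u0623\u0646\u062A\u0631\u0646\u062A"
with_plain_alef = "\u0627\u0646\u062A\u0631\u0646\u062A"
same_set = alef_variant_set(with_hamza_above) == alef_variant_set(with_plain_alef)
```

Because the mappings are defined in both directions, the resulting set is the same whichever spelling is taken as the applied-for label.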

d)	Even though this LGR is constructed at the language level, because we are addressing international reachability we need to make sure that code points
from different language tables are not mixed when generating a valid variant. When generating the variants of a given label,
some of them will be composed of characters (code points) that do not all belong to a single language, or that are not easily available on an input device
(keyboard). Therefore, from a practical and realistic point of view, and to significantly reduce the number of allocatable variants and increase the
number of blocked variants, it is good practice to block these unrealistic variants. In some cases, variants blocked due to language mixing
represent more than 99% of the generated variants. Hence, the rule makes the set of allocatable variants considerably more accurate.
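The no-mixing principle can be sketched as a subset test against per-language tables. The tables below are toy subsets chosen only for illustration; they are not the actual arabic-language and urdu-language classes of this LGR (which have 57 and 61 elements), and the function name is hypothetical.

```python
# Toy language tables (illustrative subsets only, NOT the LGR's real classes).
TOY_TABLES = {
    "arabic": {0x0645, 0x0643, 0x0629},  # MEEM, KAF, TEH MARBUTA
    "urdu": {0x0645, 0x06A9, 0x06C3},    # MEEM, KEHEH, TEH MARBUTA GOAL
}


def languages_covering(label: str, tables=TOY_TABLES) -> list:
    """Return the names of the language tables that contain every code point of the label."""
    if not label:
        return []
    code_points = {ord(ch) for ch in label}
    return sorted(name for name, table in tables.items() if code_points <= table)


def mixes_languages(label: str, tables=TOY_TABLES) -> bool:
    """True if no single language table covers the whole label."""
    return not languages_covering(label, tables)
```

With these toy tables, a variant that combines KEHEH with TEH MARBUTA is covered by neither table and would therefore be blocked.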

Note: The set of supported languages used for implementing the above concepts (namely international reachability and no mixing between languages)
is preliminary. It should be updated, and new languages added, whenever they become ready.


[Rules]

Common rules:
•	Hyphen Restrictions — restrictions on the allowable placement of hyphens (no leading/ending hyphen and no consecutive hyphens in positions 3 and 4). 
These restrictions are described in section 4.2.3.1 of RFC5891 [120]. They are implemented here as a context rule on U+002D (-) HYPHEN-MINUS.
Right-To-Left rules:
•	Leading digit — restrictions on the allowable placement of digit (no leading digit). This rule is described in section 2.1 of RFC5893 [121].
•	No mix between different digit sets (European, Arabic-Indic, and Extended-Arabic-Indic digits) — restrictions on the mix of digits. This rule is 
described in section 2.4 of RFC5893 [121].
Arabic language specific rules — These rules aim at reducing the allocation of redundant labels. 
•	No connected ALEF MAKSURA in the Arabic language — restriction on having ALEF MAKSURA (0649) before a right joining or dual joining code point.
•	No languages mixing in the generated variants—restriction on mixing code points from different language tables. Any variant must be generated using 
code points taken from a single supported language table.
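Two of the rules listed above, the hyphen restrictions and the connected ALEF MAKSURA restriction, can be sketched as plain predicates. The function names are hypothetical. The set of right-joining or dual-joining code points is an assumption limited to the Arabic letters of this repertoire (U+0622..U+063A and U+0641..U+064A, which have Unicode joining type R or D, while U+0621 HAMZA is non-joining); variant code points outside the repertoire are not covered.

```python
ALEF_MAKSURA = "\u0649"

# Repertoire letters whose Unicode joining type is R or D (assumption: this
# covers only the Arabic-language letters of this LGR; U+0621 is excluded).
RIGHT_OR_DUAL_JOINING = set(range(0x0622, 0x063B)) | set(range(0x0641, 0x064B))


def hyphen_placement_ok(label: str) -> bool:
    """No leading or trailing hyphen, and no hyphens in both positions 3 and 4."""
    if label.startswith("-") or label.endswith("-"):
        return False
    return label[2:4] != "--"


def has_connected_alef_maksura(label: str) -> bool:
    """True if ALEF MAKSURA is directly followed by a right- or dual-joining letter."""
    return any(
        cur == ALEF_MAKSURA and ord(nxt) in RIGHT_OR_DUAL_JOINING
        for cur, nxt in zip(label, label[1:])
    )
```

A label ending in ALEF MAKSURA is unaffected by the second predicate; only an ALEF MAKSURA that would join to a following letter triggers it.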

[Actions]

Actions included are the default actions for LGRs. Also included is a set of Arabic language specific actions to enforce the Arabic language specific WLE 
rules.

[Methodology and Contributors]

This reference LGR for the Arabic language for the 2nd Level was developed initially by Michel Suignard and Asmus Freytag, and then verified in expert reviews
by Michael Everson, Nicholas Ostler, and Wil Tan. Afterwards, it was rewritten by a TF-AIDN sub-group (in alphabetical order: Abdalmonem Tharwat
Galila, Abdeslam Nasri, Abdulaziz Al-Zoman, Abdulrahman Alghadir, Hazem Hezzah, Nabil Benamar, Raed Alfayez, Tarik Merghani) to suit the needs of the Arabic
language.
 
[References]

General references for the language:
•	Wikipedia: Arabic alphabet 
https://en.wikipedia.org/wiki/Arabic_alphabet
•	Omniglot: Arabic 
http://www.omniglot.com/writing/arabic.htm
Other references cited in this document:
[ARABIC-PROPOSAL]
TF-AIDN, "Proposal for Arabic Script Root Zone LGR", Version 3.4, 18 November 2015 
https://www.icann.org/en/system/files/files/arabic-lgr-proposal-18nov15-en.pdf
[IDN-GUIDE]
Saudi Network Information Center, "Guidelines Rules for writing Arabic IDNs under the IDN ccTLD (السعودية.)" 
http://nic.net.sa/docs/Guidelines_for_writing_Arabic_IDNs_under_the_IDN_ccTLD_V1.2-en.pdf
[RFC5564]
RFC 5564, "Linguistic Guidelines for the Use of the Arabic Language in Internet Domains" 
https://tools.ietf.org/html/rfc5564 
[RFC6365]
RFC 6365, "Terminology Used in Internationalization in the IETF", section 7.2 "Character Relationships and Variants" 
https://tools.ietf.org/html/rfc6365
[VIP]
Internationalized Domain Names Variant Issues Project, Arabic Case Study Team Issues Report, 
http://archive.icann.org/en/topics/new-gtlds/arabic-vip-issues-report-07oct11-en.pdf 

In the listing of the repertoire by code point, references starting from [0] refer to the version of the Unicode Standard in which the corresponding code
point was initially encoded. Other references (starting from [100]) document usage of code points. For more details, see the Table of References below.
        
    

Repertoire

Summary

Number of elements in Repertoire 59
Number of extended elements 0
Number of excluded elements 0
Total entries in table 59
Number of code point sequences 0

Repertoire by Code Point

The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name column are extracted from the Unicode character database. Where the comment in the original LGR is equal to the character name, it has been suppressed.

For any code point or sequence for which a variant is defined, the link to the associated variant set, or if mapped to itself, the variant type of that mapping is provided in the Variants column.

# Code Point Glyph Script Name Tags Required Context Part of Repertoire Variants Comment References
1 002D - Common HYPHEN-MINUS sc:Zyyy not: hyphen-minus-disallowed HYPHEN-MINUS [0]
2 0030 0 Common DIGIT ZERO sc:Zyyy not: leading-digit set 1 DIGIT ZERO [0]
3 0031 1 Common DIGIT ONE sc:Zyyy not: leading-digit set 2 DIGIT ONE [0]
4 0032 2 Common DIGIT TWO sc:Zyyy not: leading-digit set 3 DIGIT TWO [0]
5 0033 3 Common DIGIT THREE sc:Zyyy not: leading-digit set 4 DIGIT THREE [0]
6 0034 4 Common DIGIT FOUR sc:Zyyy not: leading-digit set 5 DIGIT FOUR [0]
7 0035 5 Common DIGIT FIVE sc:Zyyy not: leading-digit set 6 DIGIT FIVE [0]
8 0036 6 Common DIGIT SIX sc:Zyyy not: leading-digit set 7 DIGIT SIX [0]
9 0037 7 Common DIGIT SEVEN sc:Zyyy not: leading-digit set 8 DIGIT SEVEN [0]
10 0038 8 Common DIGIT EIGHT sc:Zyyy not: leading-digit set 9 DIGIT EIGHT [0]
11 0039 9 Common DIGIT NINE sc:Zyyy not: leading-digit set 10 DIGIT NINE [0]
12 0621 ء Arabic ARABIC LETTER HAMZA sc:Arab   ARABIC LETTER HAMZA [0], [100], [130], [201], [401], [600], [700]
13 0622 آ Arabic ARABIC LETTER ALEF WITH MADDA ABOVE sc:Arab   set 11 ARABIC LETTER ALEF WITH MADDA ABOVE [0], [100], [130], [201], [401], [600], [700]
14 0623 أ Arabic ARABIC LETTER ALEF WITH HAMZA ABOVE sc:Arab   set 11 ARABIC LETTER ALEF WITH HAMZA ABOVE [0], [100], [130], [201], [401], [600], [700]
15 0624 ؤ Arabic ARABIC LETTER WAW WITH HAMZA ABOVE sc:Arab   ARABIC LETTER WAW WITH HAMZA ABOVE [0], [100], [130], [201], [401], [600], [700]
16 0625 إ Arabic ARABIC LETTER ALEF WITH HAMZA BELOW sc:Arab   set 11 ARABIC LETTER ALEF WITH HAMZA BELOW [0], [100], [130], [201], [401], [600], [700]
17 0626 ئ Arabic ARABIC LETTER YEH WITH HAMZA ABOVE sc:Arab   set 12 ARABIC LETTER YEH WITH HAMZA ABOVE [0], [100], [130], [201], [401], [600], [700]
18 0627 ا Arabic ARABIC LETTER ALEF sc:Arab   set 11 ARABIC LETTER ALEF [0], [100], [130], [201], [401], [600], [700]
19 0628 ب Arabic ARABIC LETTER BEH sc:Arab   ARABIC LETTER BEH [0], [100], [130], [201], [401], [600], [700]
20 0629 ة Arabic ARABIC LETTER TEH MARBUTA sc:Arab   set 13 ARABIC LETTER TEH MARBUTA [0], [100], [130], [201], [401], [600], [700]
21 062A ت Arabic ARABIC LETTER TEH sc:Arab   set 14 ARABIC LETTER TEH [0], [100], [130], [201], [401], [600], [700]
22 062B ث Arabic ARABIC LETTER THEH sc:Arab   set 15 ARABIC LETTER THEH [0], [100], [130], [201], [401], [600], [700]
23 062C ج Arabic ARABIC LETTER JEEM sc:Arab   ARABIC LETTER JEEM [0], [100], [130], [201], [401], [600], [700]
24 062D ح Arabic ARABIC LETTER HAH sc:Arab   ARABIC LETTER HAH [0], [100], [130], [201], [401], [600], [700]
25 062E خ Arabic ARABIC LETTER KHAH sc:Arab   ARABIC LETTER KHAH [0], [100], [130], [201], [401], [600], [700]
26 062F د Arabic ARABIC LETTER DAL sc:Arab   ARABIC LETTER DAL [0], [100], [130], [201], [401], [600], [700]
27 0630 ذ Arabic ARABIC LETTER THAL sc:Arab   ARABIC LETTER THAL [0], [100], [130], [201], [401], [600], [700]
28 0631 ر Arabic ARABIC LETTER REH sc:Arab   ARABIC LETTER REH [0], [100], [130], [201], [401], [600], [700]
29 0632 ز Arabic ARABIC LETTER ZAIN sc:Arab   ARABIC LETTER ZAIN [0], [100], [130], [201], [401], [600], [700]
30 0633 س Arabic ARABIC LETTER SEEN sc:Arab   ARABIC LETTER SEEN [0], [100], [130], [201], [401], [600], [700]
31 0634 ش Arabic ARABIC LETTER SHEEN sc:Arab   ARABIC LETTER SHEEN [0], [100], [130], [201], [401], [600], [700]
32 0635 ص Arabic ARABIC LETTER SAD sc:Arab   ARABIC LETTER SAD [0], [100], [130], [201], [401], [600], [700]
33 0636 ض Arabic ARABIC LETTER DAD sc:Arab   ARABIC LETTER DAD [0], [100], [130], [201], [401], [600], [700]
34 0637 ط Arabic ARABIC LETTER TAH sc:Arab   ARABIC LETTER TAH [0], [100], [130], [201], [401], [600], [700]
35 0638 ظ Arabic ARABIC LETTER ZAH sc:Arab   ARABIC LETTER ZAH [0], [100], [130], [201], [401], [600], [700]
36 0639 ع Arabic ARABIC LETTER AIN sc:Arab   ARABIC LETTER AIN [0], [100], [130], [201], [401], [600], [700]
37 063A غ Arabic ARABIC LETTER GHAIN sc:Arab   ARABIC LETTER GHAIN [0], [100], [130], [201], [401], [600], [700]
40 0641 ف Arabic ARABIC LETTER FEH sc:Arab   set 16 ARABIC LETTER FEH [0], [100], [130], [201], [401], [600], [700]
41 0642 ق Arabic ARABIC LETTER QAF sc:Arab   ARABIC LETTER QAF [0], [100], [130], [201], [401], [600], [700]
42 0643 ك Arabic ARABIC LETTER KAF sc:Arab   set 17 ARABIC LETTER KAF [0], [100], [130], [201], [401], [600], [700]
43 0644 ل Arabic ARABIC LETTER LAM sc:Arab   ARABIC LETTER LAM [0], [100], [130], [201], [401], [600], [700]
44 0645 م Arabic ARABIC LETTER MEEM sc:Arab   ARABIC LETTER MEEM [0], [100], [130], [201], [401], [600], [700]
45 0646 ن Arabic ARABIC LETTER NOON sc:Arab   set 18 ARABIC LETTER NOON [0], [100], [130], [201], [401], [600], [700]
46 0647 ه Arabic ARABIC LETTER HEH sc:Arab   set 13 ARABIC LETTER HEH [0], [100], [130], [201], [401], [600], [700]
47 0648 و Arabic ARABIC LETTER WAW sc:Arab   ARABIC LETTER WAW [0], [100], [130], [201], [401], [600], [700]
48 0649 ى Arabic ARABIC LETTER ALEF MAKSURA sc:Arab   set 19 ARABIC LETTER ALEF MAKSURA [0], [100], [130], [201], [401], [600], [700]
49 064A ي Arabic ARABIC LETTER YEH sc:Arab   set 19 ARABIC LETTER YEH [0], [100], [130], [201], [401], [600], [700]
50 0660 ٠ Arabic ARABIC-INDIC DIGIT ZERO sc:Arab not: leading-digit set 1 ARABIC-INDIC DIGIT ZERO [0], [100], [130], [201], [401], [600], [700]
51 0661 ١ Arabic ARABIC-INDIC DIGIT ONE sc:Arab not: leading-digit set 2 ARABIC-INDIC DIGIT ONE [0], [100], [130], [201], [401], [600], [700]
52 0662 ٢ Arabic ARABIC-INDIC DIGIT TWO sc:Arab not: leading-digit set 3 ARABIC-INDIC DIGIT TWO [0], [100], [130], [201], [401], [600], [700]
53 0663 ٣ Arabic ARABIC-INDIC DIGIT THREE sc:Arab not: leading-digit set 4 ARABIC-INDIC DIGIT THREE [0], [100], [130], [201], [401], [600], [700]
54 0664 ٤ Arabic ARABIC-INDIC DIGIT FOUR sc:Arab not: leading-digit set 5 ARABIC-INDIC DIGIT FOUR [0], [100], [130], [201], [401], [600], [700]
55 0665 ٥ Arabic ARABIC-INDIC DIGIT FIVE sc:Arab not: leading-digit set 6 ARABIC-INDIC DIGIT FIVE [0], [100], [130], [201], [401], [600], [700]
56 0666 ٦ Arabic ARABIC-INDIC DIGIT SIX sc:Arab not: leading-digit set 7 ARABIC-INDIC DIGIT SIX [0], [100], [130], [201], [401], [600], [700]
57 0667 ٧ Arabic ARABIC-INDIC DIGIT SEVEN sc:Arab not: leading-digit set 8 ARABIC-INDIC DIGIT SEVEN [0], [100], [130], [201], [401], [600], [700]
58 0668 ٨ Arabic ARABIC-INDIC DIGIT EIGHT sc:Arab not: leading-digit set 9 ARABIC-INDIC DIGIT EIGHT [0], [100], [130], [201], [401], [600], [700]
59 0669 ٩ Arabic ARABIC-INDIC DIGIT NINE sc:Arab not: leading-digit set 10 ARABIC-INDIC DIGIT NINE [0], [100], [130], [201], [401], [600], [700]

Legend

Code Point
A code point or code point sequence.
Name
Shows the character or sequence name from the Unicode Character Database.
Glyph
The shape displayed depends on the fonts available to your browser.
Script
Shows the script property value from the Unicode Character Database. Combining marks may have the value Inherited and code points used with more than one script may have the value Common.
References
Links to the references associated with the code point or sequence, if any.
Tags
LGR-defined tag values. Any tags matching the Unicode script property are suppressed in this view.
Required Context
Link to the rule defining the required context a code point or sequence must satisfy. If prefixed by "not:", identifies a context that must not occur.
Variants
A link to the variant set the code point or sequence is a member of, except where a code point or sequence maps only to itself, in which case the type of that mapping is listed.
Comment
If the comment in this row consists only of the code point or sequence name it is suppressed in this view.
✔ - core repertoire
A check mark in the Part-of-Repertoire column indicates a code point is part of the core repertoire.
◯ - extended repertoire
An open circle indicates a code point is part of an optional extended repertoire, which is normally disabled but could be supported by deleting the relevant context restriction.
✗ - excluded from repertoire
A code point shown with ✗ is considered excluded from the repertoire. It is shown only for review purposes.

Variant Sets

Summary

Number of variant sets 19
Largest variant set 8
Ordinary Variants by Type allocatable (21), blocked (137)
Reflexive Variants by Type  

The following tables list each pair of variant mappings on one row. For each pair of code points, by convention, the lower code point is taken as the source of the mapping in the forward → direction and the reverse direction ← is not listed separately. The variant mappings defined in an LGR are required to be symmetric, that is, both the forward and reverse mappings must be specified.

A mapping where source and target are the same is reflexive. Variant sets consisting of only a single reflexive mapping are not shown as a set. Instead, the variant type of the mapping is listed in the Variants column of the Repertoire by Code Point table. Reflexive mappings that are part of a larger set are indicated with a “≡”.

Where the type of both forward and reverse mappings are the same, a single value is given in the Type(s) column, otherwise the types for forward and reverse mapping are given in that order, as indicated by the arrows. The same applies to any comments.

In a properly specified LGR, all members of each variant set are variants of each other, a property called transitivity. Because of that, all variant sets are necessarily disjoint. In each set, shading is used to group mappings from the same source code point or sequence.
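The way symmetric and transitive mappings partition code points into disjoint variant sets can be sketched with a small union-find structure. This is an illustration only: the function name is hypothetical, and the input pairs are taken from the rows listed for Variant Sets 12, 17 and 18 in this document.

```python
def build_variant_sets(pairs):
    """Group code points into disjoint sets from a list of (source, target) mappings."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), set()).add(cp)
    return sorted(sorted(members) for members in groups.values())


listed_pairs = [
    (0x0626, 0x06D3),  # Variant Set 12
    (0x0643, 0x06A9),  # Variant Set 17
    (0x0643, 0x06AA),  # Variant Set 17
    (0x0646, 0x06BA),  # Variant Set 18
]
variant_sets = build_variant_sets(listed_pairs)
```

Here U+06A9 and U+06AA land in the same set although only their mappings to U+0643 are supplied, which is the effect of transitivity.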

Variant Set 1 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0030 0 0660 ٠ activated   For international reachability
2 0030 0 06F0 ۰ activated   For international reachability
3 0660 ٠ 06F0 ۰ activated   For international reachability

Variant Set 2 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0031 1 0661 ١ activated   For international reachability
2 0031 1 06F1 ۱ activated   For international reachability
3 0661 ١ 06F1 ۱ activated   For international reachability

Variant Set 3 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0032 2 0662 ٢ activated   For international reachability
2 0032 2 06F2 ۲ activated   For international reachability
3 0662 ٢ 06F2 ۲ activated   For international reachability

Variant Set 4 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0033 3 0663 ٣ activated   For international reachability
2 0033 3 06F3 ۳ activated   For international reachability
3 0663 ٣ 06F3 ۳ activated   For international reachability

Variant Set 5 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0034 4 0664 ٤ activated   For international reachability
2 0034 4 06F4 ۴ activated   For international reachability
3 0664 ٤ 06F4 ۴ activated   For international reachability

Variant Set 6 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0035 5 0665 ٥ activated   For international reachability
2 0035 5 06F5 ۵ activated   For international reachability
3 0665 ٥ 06F5 ۵ activated   For international reachability

Variant Set 7 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0036 6 0666 ٦ activated   For international reachability
2 0036 6 06F6 ۶ activated   For international reachability
3 0666 ٦ 06F6 ۶ activated   For international reachability

Variant Set 8 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0037 7 0667 ٧ activated   For international reachability
2 0037 7 06F7 ۷ activated   For international reachability
3 0667 ٧ 06F7 ۷ activated   For international reachability

Variant Set 9 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0038 8 0668 ٨ activated   For international reachability
2 0038 8 06F8 ۸ activated   For international reachability
3 0668 ٨ 06F8 ۸ activated   For international reachability

Variant Set 10 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0039 9 0669 ٩ activated   For international reachability
2 0039 9 06F9 ۹ activated   For international reachability
3 0669 ٩ 06F9 ۹ activated   For international reachability

Variant Set 11 — 7 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0622 آ 0623 أ allocatable   Language variant
2 0622 آ 0625 إ allocatable   Language variant
3 0622 آ 0627 ا allocatable   U+0622 ALEF WITH MADDA ABOVE is simplified to U+0627 ALEF in Arabic language
4 0622 آ 0671 ٱ blocked   Typo variant
5 0622 آ 0672 ٲ blocked   Typo variant
6 0622 آ 0673 ٳ blocked   Transitivity variant *
7 0623 أ 0625 إ allocatable   Language variant
8 0623 أ 0627 ا → activated, ← allocatable   For international reachability and since U+0623 ALEF WITH HAMZA ABOVE is simplified to U+0627 ALEF in Arabic language
9 0623 أ 0671 ٱ blocked   Typo variant
10 0623 أ 0672 ٲ blocked   Typo variant
11 0623 أ 0673 ٳ blocked   Transitivity variant *
12 0625 إ 0627 ا → activated, ← allocatable   For international reachability and since U+0625 ALEF WITH HAMZA BELOW is simplified to U+0627 ALEF in Arabic language
13 0625 إ 0671 ٱ blocked   Transitivity variant *
14 0625 إ 0672 ٲ blocked   Transitivity variant *
15 0625 إ 0673 ٳ blocked   Typo variant
16 0627 ا 0671 ٱ blocked   Typo variant
17 0627 ا 0672 ٲ blocked   Typo variant
18 0627 ا 0673 ٳ blocked   Typo variant

Variant Set 12 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0626 ئ 06D3 ۓ blocked   Typo variant

Variant Set 13 — 6 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0629 ة 0647 ه allocatable   In the Arabic language, U+0647 HEH may be substituted for U+0629 TEH MARBUTA. [RFC6365]
2 0629 ة 06BE ھ blocked   Typo variant
3 0629 ة 06C1 ہ blocked   Typo variant
4 0629 ة 06C3 ۃ → activated, ← blocked   For international reachability
5 0629 ة 06D5 ە blocked   Typo variant
6 0647 ه 06BE ھ → activated, ← blocked   For international reachability
7 0647 ه 06C1 ہ → activated, ← blocked   For international reachability
8 0647 ه 06C3 ۃ blocked   Transitivity variant *
9 0647 ه 06D5 ە blocked   Typo/Exact variant

Variant Set 14 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 062A ت 063E ؾ blocked   Typo variant
2 062A ت 067A ٺ blocked   Typo variant

Variant Set 15 — 4 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 062B ث 063F ؿ blocked   Exact variant
2 062B ث 067D ٽ blocked   Typo variant

Variant Set 16 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0641 ف 06A7 ڧ blocked   Typo/Exact variant

Variant Set 17 — 3 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0643 ك 06A9 ک → activated, ← blocked   For international reachability
2 0643 ك 06AA ڪ blocked   Typo variant

Variant Set 18 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0646 ن 06BA ں blocked   Exact variant

Variant Set 19 — 8 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0649 ى 064A ي allocatable   Language variant
2 0649 ى 066E ٮ blocked   Exact variant
3 0649 ى 067B ٻ blocked   Transitivity variant *
4 0649 ى 06CC ی → activated, ← blocked   For international reachability
5 0649 ى 06CD ۍ blocked   Typo variant
6 0649 ى 06D0 ې blocked   Transitivity variant *
7 0649 ى 06D2 ے blocked   Typo variant
8 064A ي 066E ٮ blocked   Transitivity variant *
9 064A ي 067B ٻ blocked   Typo variant
10 064A ي 06CC ی → activated, ← blocked   For international reachability
11 064A ي 06CD ۍ blocked   Typo variant
12 064A ي 06D0 ې blocked   Typo variant
13 064A ي 06D2 ے blocked   Typo variant

Classes, Rules and Actions

Character Classes

The following table lists all top-level classes with their definition and a list of their members intersected with the current repertoire.

Name Definition Count Members Comment
implicit Tag=sc:Zyyy 11 Elements { 002D 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 }  
implicit Tag=sc:Arab 79 Elements { 0621 0622 0623 0624 0625 0626 0627 0628 0629 062A 062B 062C 062D ... }  
transparent Unicode Property=jt:T      
right-joining Unicode Property=jt:R      
left-joining Unicode Property=jt:L      
dual-joining Unicode Property=jt:D      
non-joining Unicode Property=jt:U      
arabic-language   57 Elements { 0621 0622 0623 0624 0625 0626 0627 0628 0629 062A 062B 062C ... }  
urdu-language   61 Elements { 0621 0622 0626 0627 0628 062A 062B 062C 062D 062E 062F 0630 ... }  
persian-language   59 Elements { 0621 0622 0623 0624 0626 0627 0628 0629 062A 062B 062C 062D ... }  
malay-language   52 Elements { 0621 0623 0625 0626 0627 0628 0629 062A 062B 062C 062D 062E ... }  
pashto-language   70 Elements { 0621 0622 0623 0624 0626 0627 0628 0629 062A 062B 062C 062D ... }  
arabic-digits   1 Elements { 0030-0039 }  
arabic-indic-digits   1 Elements { 0660-0669 }  
extended-arabic-indic-digits   1 Elements { 06F0-06F9 }  

Legend

Members
Lists the members of the class as code points (xxx). Any class too numerous to list in full is linked with "..." to all the members.
Tag=ttt
A named class is defined by all code points that share the given tag value (ttt).
Implicit
An anonymous class implicitly defined based on a tag value.

Whole label evaluation and context rules

The following table lists all the top-level, or named rules defined in the LGR and indicates whether they are used as trigger in an action or as context (when or not-when) for a code point. (Any use of context rules for variants is not indicated).

Name Regular Expression Used as Trigger Used as Context Anchor Ref Comment
hyphen-minus-disallowed (choice [((start))(anchor)][(anchor)((end))][((start)(any)(any)(cp: 002D))(anchor)])   [120] RFC5891 restrictions on placement of U+002D
no-connected-alef-maksura (start)(0+ any)(1+ cp: 0649)(choice (count:1+) [(right-joining)][(dual-joining)](0+ any)(end))       a label which has connected alef maksura is blocked
no-mixing-languages-in-variants (choice (count:1+) [(start)((count:1+) arabic-language)(end)][(start)((count:1+) persian-language)(end)][(start)((count:1+) urdu-language)(end)][(start)((count:1+) pashto-language)(end)][(start)((count:1+) malay-language)(end)])       a label/variant can only be written using characters from one of the 5 languages
leading-digit ((start))(anchor)   [121] RFC5893 RTL labels cannot start with a digit
no-digit-mixing (choice (count:1+) [(start)(0+ any)(arabic-digits)(0+ any)(choice (count:1+) [(arabic-indic-digits)][(extended-arabic-indic-digits)](0+ any)(end))][(start)(0+ any)(arabic-indic-digits)(0+ any)(choice (count:1+) [(arabic-digits)][(extended-arabic-indic-digits)](0+ any)(end))][(start)(0+ any)(extended-arabic-indic-digits)(0+ any)(choice (count:1+) [(arabic-digits)][(arabic-indic-digits)](0+ any)(end))])        

Legend

Regular Expression
A regular expression equivalent to the rule, shown in the standard notation with some extensions as noted.
[] - a choice
When there are various choices in a rule, each choice is represented by a set enclosed in square brackets.
[∩,−,Δ,∪] - set operators
Sets may be combined by set operators (∩ = intersection, − = difference, Δ = symmetric difference and ∪ = union).
()= - empty set
Indicates that the following set is empty because of the result of set operations, or because none of its elements are part of the repertoire defined here. An empty set that is not optional means that a rule can never match.

Note: The terminology used in the regular expressions follows RFC7940.

Actions

The following table lists the actions that are used to assign dispositions to labels and variant labels, based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.

# Condition Rule / Variant Set   Disposition Ref Comment
1 if label does not match no-mixing-languages-in-variants invalid    
2 if label matches no-connected-alef-maksura invalid    
3 if label matches no-digit-mixing invalid    
4 if at least one variant is in {blocked} blocked   default action
5 if at least one variant is in {security-blocked} blocked   default action for similarity variants
6 if at least one variant is in {activated} activated   default action - For international reachability
7 if at least one variant is in {allocatable} allocatable   default action for allocatable variants
8 if any label (catch-all) valid   catch all (default action)

Legend

{...} - variant type set
In the "Rule/Variant Set" column the notation {...} means a set of variant types.
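The precedence of the actions table can be sketched as a chain of checks in which the first matching condition decides the disposition. The function and parameter names are hypothetical; the three boolean inputs stand for the results of evaluating the corresponding whole-label rules, and variant_types is the set of variant types that occur in the (variant) label.

```python
# Variant-type actions in table order (actions 4 to 7): (variant type, disposition).
VARIANT_ACTIONS = (
    ("blocked", "blocked"),
    ("security-blocked", "blocked"),
    ("activated", "activated"),
    ("allocatable", "allocatable"),
)


def disposition(single_language: bool,
                connected_alef_maksura: bool,
                digit_mixing: bool,
                variant_types: set) -> str:
    """Return the disposition assigned by the first action that triggers."""
    if not single_language:          # action 1
        return "invalid"
    if connected_alef_maksura:       # action 2
        return "invalid"
    if digit_mixing:                 # action 3
        return "invalid"
    for variant_type, result in VARIANT_ACTIONS:
        if variant_type in variant_types:
            return result
    return "valid"                   # action 8 (catch-all)
```

Because the blocked check comes before the activated and allocatable checks, a variant label containing even one blocked mapping is blocked regardless of its other mappings.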

Table of References

[0] The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)
Any code point cited was originally encoded in Unicode Version 1.1
[1] The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)
Any code point cited was originally encoded in Unicode Version 2.0
[5] The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)
Any code point cited was originally encoded in Unicode Version 3.2
[100] Internetstiftelsen i Sverige (IIS), Arabic https://github.com/dotse/IDN-ref-tables/blob/master/language-tables/arabic-lang-ref-table.txt
[107] MSR-2 Maximum Starting Repertoire https://www.icann.org/en/system/files/files/msr-2-overview-14apr15-en.pdf
Code points cited are obsolete
[120] RFC5891, Internationalized Domain Names in Applications (IDNA): Protocol http://tools.ietf.org/html/rfc5891
[121] RFC5893, Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA) http://tools.ietf.org/html/rfc5893
[130] RFC5564, Linguistic Guidelines for the Use of the Arabic Language in Internet Domains http://tools.ietf.org/html/rfc5564
[201] Omniglot Arabic http://www.omniglot.com/writing/arabic.htm
[401] The Unicode Consortium, Common Locale Data Repository.- CLDR Version 28 (2015-09-16)- Locale Data Summary for Arabic [ar]- http://www.unicode.org/cldr/charts/28/summary/ar.html
Code points cited are from the set of Main Letters
[600] Wikipedia Arabic alphabet https://en.wikipedia.org/wiki/Arabic_alphabet accessed 2015-10-31
Code points cited are from the set of Basic letters
[601] Wikipedia Arabic alphabet https://en.wikipedia.org/wiki/Arabic_alphabet accessed 2015-10-31
Code points cited are from the set of Regional variations
[603] Wikipedia Arabic Che https://en.wikipedia.org/wiki/Che_(Persian_letter)#Other_uses accessed 2015-10-31
Code points cited are used for the sound Che (loan words)
[700] Saudi Network Information Center (.sa, Saudi Arabia ccTLD) http://www.iana.org/domains/idn-tables/tables/sa_ar_2.0.pdf