This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.
LGR Version | 1 |
---|---|
Date | 2017-04-26 |
Unicode Version | 6.3.0 |
Language | urd-Arab |
This document specifies a reference set of Label Generation Rules for Urdu using a limited repertoire as appropriate for a second level domain. This is a DRAFT document for public comments released by the Task Force for Arabic Script IDNs - TF-AIDN.
The repertoire consists of the 39 letters of the Urdu alphabet, two sets of digits, and the hyphen. The code point U+0626 (ئ) is allowed only in initial or medial position.
The following pairs of code points are blocked variants: U+0646 (ن) / U+06BA (ں) and U+06C1 (ہ) / U+06BE (ھ).
Corresponding members of the two sets of digits (ASCII digits and Extended Arabic-Indic digits) are allocatable variants of each other.
This LGR does not include any script level variants. In case this LGR is integrated with other LGRs using Arabic script, the variant sets defined in the Root Zone LGR (see https://www.icann.org/sites/default/files/lgr/lgr-1-arabic-script-24feb16-en.html) should be considered.
Number of elements in Repertoire | 61 |
---|---|
Number of extended elements | 0 |
Number of excluded elements | 0 |
Total entries in table | 61 |
Number of code point sequences | 0 |
The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name columns are extracted from the Unicode Character Database. Where the comment in the original LGR is equal to the character name, it has been suppressed.
For any code point or sequence for which a variant is defined, the link to the associated variant set, or if mapped to itself, the variant type of that mapping is provided in the Variants column.
# | Code Point | Glyph | Script | Name | Tags | Required Context | Part of Repertoire | Variants | Comment | References |
---|---|---|---|---|---|---|---|---|---|---|
1 | 002D | - | Common | HYPHEN-MINUS | | | ✔ | | | |
2 | 0030 | 0 | Common | DIGIT ZERO | ASCII-digit | | ✔ | set 1 | | |
3 | 0031 | 1 | Common | DIGIT ONE | ASCII-digit | | ✔ | set 2 | | |
4 | 0032 | 2 | Common | DIGIT TWO | ASCII-digit | | ✔ | set 3 | | |
5 | 0033 | 3 | Common | DIGIT THREE | ASCII-digit | | ✔ | set 4 | | |
6 | 0034 | 4 | Common | DIGIT FOUR | ASCII-digit | | ✔ | set 5 | | |
7 | 0035 | 5 | Common | DIGIT FIVE | ASCII-digit | | ✔ | set 6 | | |
8 | 0036 | 6 | Common | DIGIT SIX | ASCII-digit | | ✔ | set 7 | | |
9 | 0037 | 7 | Common | DIGIT SEVEN | ASCII-digit | | ✔ | set 8 | | |
10 | 0038 | 8 | Common | DIGIT EIGHT | ASCII-digit | | ✔ | set 9 | | |
11 | 0039 | 9 | Common | DIGIT NINE | ASCII-digit | | ✔ | set 10 | | |
12 | 0621 | ء | Arabic | ARABIC LETTER HAMZA | | | ✔ | | | |
13 | 0622 | آ | Arabic | ARABIC LETTER ALEF WITH MADDA ABOVE | | | ✔ | | | |
14 | 0626 | ئ | Arabic | ARABIC LETTER YEH WITH HAMZA ABOVE | | precedes-right-joining | ✔ | | | |
15 | 0627 | ا | Arabic | ARABIC LETTER ALEF | | | ✔ | | | |
16 | 0628 | ب | Arabic | ARABIC LETTER BEH | | | ✔ | | | |
17 | 062A | ت | Arabic | ARABIC LETTER TEH | | | ✔ | | | |
18 | 062B | ث | Arabic | ARABIC LETTER THEH | | | ✔ | | | |
19 | 062C | ج | Arabic | ARABIC LETTER JEEM | | | ✔ | | | |
20 | 062D | ح | Arabic | ARABIC LETTER HAH | | | ✔ | | | |
21 | 062E | خ | Arabic | ARABIC LETTER KHAH | | | ✔ | | | |
22 | 062F | د | Arabic | ARABIC LETTER DAL | | | ✔ | | | |
23 | 0630 | ذ | Arabic | ARABIC LETTER THAL | | | ✔ | | | |
24 | 0631 | ر | Arabic | ARABIC LETTER REH | | | ✔ | | | |
25 | 0632 | ز | Arabic | ARABIC LETTER ZAIN | | | ✔ | | | |
26 | 0633 | س | Arabic | ARABIC LETTER SEEN | | | ✔ | | | |
27 | 0634 | ش | Arabic | ARABIC LETTER SHEEN | | | ✔ | | | |
28 | 0635 | ص | Arabic | ARABIC LETTER SAD | | | ✔ | | | |
29 | 0636 | ض | Arabic | ARABIC LETTER DAD | | | ✔ | | | |
30 | 0637 | ط | Arabic | ARABIC LETTER TAH | | | ✔ | | | |
31 | 0638 | ظ | Arabic | ARABIC LETTER ZAH | | | ✔ | | | |
32 | 0639 | ع | Arabic | ARABIC LETTER AIN | | | ✔ | | | |
33 | 063A | غ | Arabic | ARABIC LETTER GHAIN | | | ✔ | | | |
34 | 0641 | ف | Arabic | ARABIC LETTER FEH | | | ✔ | | | |
35 | 0642 | ق | Arabic | ARABIC LETTER QAF | | | ✔ | | | |
36 | 0644 | ل | Arabic | ARABIC LETTER LAM | | | ✔ | | | |
37 | 0645 | م | Arabic | ARABIC LETTER MEEM | | | ✔ | | | |
38 | 0646 | ن | Arabic | ARABIC LETTER NOON | | | ✔ | set 11 | | |
39 | 0648 | و | Arabic | ARABIC LETTER WAW | | | ✔ | | | |
40 | 067E | پ | Arabic | ARABIC LETTER PEH | | | ✔ | | | |
41 | 0686 | چ | Arabic | ARABIC LETTER TCHEH | | | ✔ | | | |
42 | 0688 | ڈ | Arabic | ARABIC LETTER DDAL | | | ✔ | | | |
43 | 0691 | ڑ | Arabic | ARABIC LETTER RREH | | | ✔ | | | |
44 | 0698 | ژ | Arabic | ARABIC LETTER JEH | | | ✔ | | | |
45 | 06A9 | ک | Arabic | ARABIC LETTER KEHEH | | | ✔ | | | |
46 | 06AF | گ | Arabic | ARABIC LETTER GAF | | | ✔ | | | |
47 | 06BA | ں | Arabic | ARABIC LETTER NOON GHUNNA | | | ✔ | set 11 | | |
48 | 06BE | ھ | Arabic | ARABIC LETTER HEH DOACHASHMEE | | | ✔ | set 12 | | |
49 | 06C1 | ہ | Arabic | ARABIC LETTER HEH GOAL | | | ✔ | set 12 | | |
50 | 06CC | ی | Arabic | ARABIC LETTER FARSI YEH | | | ✔ | | | |
51 | 06D2 | ے | Arabic | ARABIC LETTER YEH BARREE | | | ✔ | | | |
52 | 06F0 | ۰ | Arabic | EXTENDED ARABIC-INDIC DIGIT ZERO | extended-arabic-indic-digit | | ✔ | set 1 | | |
53 | 06F1 | ۱ | Arabic | EXTENDED ARABIC-INDIC DIGIT ONE | extended-arabic-indic-digit | | ✔ | set 2 | | |
54 | 06F2 | ۲ | Arabic | EXTENDED ARABIC-INDIC DIGIT TWO | extended-arabic-indic-digit | | ✔ | set 3 | | |
55 | 06F3 | ۳ | Arabic | EXTENDED ARABIC-INDIC DIGIT THREE | extended-arabic-indic-digit | | ✔ | set 4 | | |
56 | 06F4 | ۴ | Arabic | EXTENDED ARABIC-INDIC DIGIT FOUR | extended-arabic-indic-digit | | ✔ | set 5 | | |
57 | 06F5 | ۵ | Arabic | EXTENDED ARABIC-INDIC DIGIT FIVE | extended-arabic-indic-digit | | ✔ | set 6 | | |
58 | 06F6 | ۶ | Arabic | EXTENDED ARABIC-INDIC DIGIT SIX | extended-arabic-indic-digit | | ✔ | set 7 | | |
59 | 06F7 | ۷ | Arabic | EXTENDED ARABIC-INDIC DIGIT SEVEN | extended-arabic-indic-digit | | ✔ | set 8 | | |
60 | 06F8 | ۸ | Arabic | EXTENDED ARABIC-INDIC DIGIT EIGHT | extended-arabic-indic-digit | | ✔ | set 9 | | |
61 | 06F9 | ۹ | Arabic | EXTENDED ARABIC-INDIC DIGIT NINE | extended-arabic-indic-digit | | ✔ | set 10 | | |
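As an illustration of how the repertoire might be consumed, the following minimal sketch checks whether every code point of a candidate label appears in the table. The names `REPERTOIRE` and `out_of_repertoire` are hypothetical and not part of the LGR; the ranges are transcribed by hand from the table and should be checked against the normative XML.

```python
# Inclusive code point ranges transcribed from the repertoire table.
# This is an illustrative transcription, not the normative data.
_RANGES = [
    (0x002D, 0x002D),  # HYPHEN-MINUS
    (0x0030, 0x0039),  # ASCII digits
    (0x0621, 0x0622),
    (0x0626, 0x0628),
    (0x062A, 0x063A),
    (0x0641, 0x0642),
    (0x0644, 0x0646),
    (0x0648, 0x0648),
    (0x067E, 0x067E),
    (0x0686, 0x0686),
    (0x0688, 0x0688),
    (0x0691, 0x0691),
    (0x0698, 0x0698),
    (0x06A9, 0x06A9),
    (0x06AF, 0x06AF),
    (0x06BA, 0x06BA),
    (0x06BE, 0x06BE),
    (0x06C1, 0x06C1),
    (0x06CC, 0x06CC),
    (0x06D2, 0x06D2),
    (0x06F0, 0x06F9),  # Extended Arabic-Indic digits
]

REPERTOIRE = {cp for lo, hi in _RANGES for cp in range(lo, hi + 1)}


def out_of_repertoire(label):
    """Return the code points of `label` that are not in the repertoire."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in REPERTOIRE]


if __name__ == "__main__":
    print(len(REPERTOIRE))
    # ALEF, REH, DAL, WAW: all listed in the table.
    print(out_of_repertoire("\u0627\u0631\u062F\u0648"))
    # U+0643 ARABIC LETTER KAF is not listed (the table has U+06A9 KEHEH).
    print(out_of_repertoire("\u0643\u062A\u0627\u0628"))
```

A membership check of this kind covers only the repertoire; context rules, variants and actions are evaluated separately.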
Number of variant sets | 12 |
---|---|
Largest variant set | 2 |
Ordinary Variants by Type | allocatable (20) blocked (4) |
Reflexive Variants by Type |
The following tables list each pair of variant mappings on one row. For each pair of code points, by convention, the lower code point is taken as the source of the mapping in the forward → direction and the reverse direction ← is not listed separately. The variant mappings defined in an LGR are required to be symmetric, that is, both the forward and reverse mappings must be specified.
A mapping where source and target are the same is reflexive. Variant sets consisting of only a single reflexive mapping are not shown as a set. Instead, the variant type of the mapping is listed in the Variants column of the Repertoire by Code Point table. Reflexive mappings that are part of a larger set are indicated with a “≡”.
Where the type of both forward and reverse mappings are the same, a single value is given in the Type(s) column, otherwise the types for forward and reverse mapping are given in that order, as indicated by the arrows. The same applies to any comments.
In a properly specified LGR, all members of each variant set are variants of each other, a property called transitivity. Because of that, all variant sets are necessarily disjoint. In each set, shading is used to group mappings from the same source code point or sequence.
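The symmetry and transitivity properties described above can be checked mechanically. The sketch below is one possible way to do so, assuming the mappings are held as a dictionary keyed by (source, target) pairs; the function names are hypothetical and the data is transcribed from the variant pairs stated in this document.

```python
from itertools import combinations

# Variant mappings in both directions, as the LGR requires.
MAPPINGS = {}
for d in range(10):
    ascii_digit, ext_digit = 0x0030 + d, 0x06F0 + d
    MAPPINGS[(ascii_digit, ext_digit)] = "allocatable"
    MAPPINGS[(ext_digit, ascii_digit)] = "allocatable"
for a, b in [(0x0646, 0x06BA), (0x06BE, 0x06C1)]:
    MAPPINGS[(a, b)] = "blocked"
    MAPPINGS[(b, a)] = "blocked"


def missing_reverse(mappings):
    """Return (target, source) pairs whose reverse mapping is not defined."""
    return [(t, s) for (s, t) in mappings if (t, s) not in mappings]


def variant_sets(mappings):
    """Group code points into sets by following mappings in any direction."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, tgt in mappings:
        parent[find(src)] = find(tgt)
    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), set()).add(cp)
    return sorted(sorted(g) for g in groups.values())


def non_transitive(mappings):
    """Return pairs in the same set that lack a direct mapping in either direction."""
    gaps = []
    for group in variant_sets(mappings):
        for a, b in combinations(group, 2):
            if (a, b) not in mappings or (b, a) not in mappings:
                gaps.append((a, b))
    return gaps


if __name__ == "__main__":
    sets = variant_sets(MAPPINGS)
    print(len(sets), max(len(s) for s in sets))
    print(missing_reverse(MAPPINGS), non_transitive(MAPPINGS))
```

Because every set here has exactly two members, transitivity holds trivially once symmetry does; the check becomes meaningful for LGRs with larger sets.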
Set | # | Source | Glyph | Target | Glyph | Direction | Type(s) | Ref | Comment |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0030 | 0 | 06F0 | ۰ | ↔ | allocatable | | |
2 | 1 | 0031 | 1 | 06F1 | ۱ | ↔ | allocatable | | |
3 | 1 | 0032 | 2 | 06F2 | ۲ | ↔ | allocatable | | |
4 | 1 | 0033 | 3 | 06F3 | ۳ | ↔ | allocatable | | |
5 | 1 | 0034 | 4 | 06F4 | ۴ | ↔ | allocatable | | |
6 | 1 | 0035 | 5 | 06F5 | ۵ | ↔ | allocatable | | |
7 | 1 | 0036 | 6 | 06F6 | ۶ | ↔ | allocatable | | |
8 | 1 | 0037 | 7 | 06F7 | ۷ | ↔ | allocatable | | |
9 | 1 | 0038 | 8 | 06F8 | ۸ | ↔ | allocatable | | |
10 | 1 | 0039 | 9 | 06F9 | ۹ | ↔ | allocatable | | |
11 | 1 | 0646 | ن | 06BA | ں | ↔ | blocked | | |
12 | 1 | 06BE | ھ | 06C1 | ہ | ↔ | blocked | | |
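To show how these mappings give rise to variant labels, the following sketch enumerates every variant of a label by substituting, at each position, either the original code point or its variant. It is a simplified illustration rather than the procedure defined in RFC 7940: the names `VARIANTS` and `variant_labels` are hypothetical, and whole-label rules such as mixed-digits are not applied here.

```python
from itertools import product

# code point -> (variant code point, variant type), in both directions.
VARIANTS = {}
for d in range(10):
    ascii_digit, ext_digit = chr(0x0030 + d), chr(0x06F0 + d)
    VARIANTS[ascii_digit] = (ext_digit, "allocatable")
    VARIANTS[ext_digit] = (ascii_digit, "allocatable")
for a, b in [("\u0646", "\u06BA"), ("\u06BE", "\u06C1")]:
    VARIANTS[a] = (b, "blocked")
    VARIANTS[b] = (a, "blocked")


def variant_labels(label):
    """Yield (variant_label, types_used) for every variant except the original."""
    choices = []
    for ch in label:
        options = [(ch, None)]  # keep the original code point
        if ch in VARIANTS:
            options.append(VARIANTS[ch])  # or substitute its variant
        choices.append(options)
    for combo in product(*choices):
        text = "".join(ch for ch, _ in combo)
        types = {t for _, t in combo if t is not None}
        if text != label:
            yield text, types


if __name__ == "__main__":
    # NOON followed by DIGIT ONE: one blocked and one allocatable mapping.
    for text, types in variant_labels("\u0646" + "1"):
        print([f"U+{ord(c):04X}" for c in text], sorted(types))
```

The set of variant types used to produce each variant label is what the action table later inspects to assign a disposition.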
The following table lists all top-level classes with their definition and a list of their members intersected with the current repertoire.
Name | Definition | Count | Members | Comment |
---|---|---|---|---|
implicit | Tag=ASCII-digit | 10 Elements | { 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 } | |
implicit | Tag=extended-arabic-indic-digit | 10 Elements | { 06F0 06F1 06F2 06F3 06F4 06F5 06F6 06F7 06F8 06F9 } |
The following table lists all the top-level, or named rules defined in the LGR and indicates whether they are used as trigger in an action or as context (when or not-when) for a code point. (Any use of context rules for variants is not indicated).
Name | Regular Expression | Used as Trigger | Used as Context | Anchor | Ref | Comment |
---|---|---|---|---|---|---|
leading-combining-mark | (start)((class property:gc:Mn) \| (class property:gc:Mc)) | ✔ | | | [120] | RFC5891 restrictions on placement of combining marks |
mixed-digits | (choice [(ASCII-digit)(0+ any)(extended-arabic-indic-digit)][(extended-arabic-indic-digit)(0+ any)(ASCII-digit)]) | ✔ | | | | Labels with a mix of European and Extended-Arabic-Indic digits are invalid |
precedes-right-joining | (anchor)(((class property:jt:D) \| (class property:jt:R))) | | ✔ | ✔ | | must precede a code point joining on the right |
Note: The terminology used in the regular expressions follows RFC 7940.
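The three rules can be approximated in ordinary code. The sketch below is an informal reading of the rules, not a conforming RFC 7940 evaluator. The function names are hypothetical, and because Python's `unicodedata` module does not expose Joining_Type, the joining check relies on a hand-made assumption that holds only for labels already restricted to this repertoire.

```python
import re
import unicodedata

# mixed-digits: an ASCII digit and an Extended Arabic-Indic digit in either order.
_MIXED = re.compile("[0-9].*[\u06F0-\u06F9]|[\u06F0-\u06F9].*[0-9]", re.DOTALL)


def mixed_digits(label):
    return _MIXED.search(label) is not None


def leading_combining_mark(label):
    """True if the label starts with a code point of General_Category Mn or Mc."""
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc")


def _jt_d_or_r(ch):
    # Assumption: within this repertoire, every letter from U+0622 to U+06D2
    # has Joining_Type D or R, while U+0621 HAMZA, the digits and the hyphen
    # do not. This shortcut is NOT valid for arbitrary Unicode input.
    return "\u0622" <= ch <= "\u06D2"


def yeh_hamza_context_ok(label):
    """precedes-right-joining: each U+0626 must be followed by a jt:D or jt:R code point."""
    for i, ch in enumerate(label):
        if ch == "\u0626":
            following = label[i + 1:i + 2]  # empty string at end of label
            if not (following and _jt_d_or_r(following)):
                return False
    return True


if __name__ == "__main__":
    print(mixed_digits("1\u06F2"))                 # both digit sets present
    print(yeh_hamza_context_ok("\u0626\u06CC"))    # U+0626 before FARSI YEH
    print(yeh_hamza_context_ok("\u0634\u0626"))    # U+0626 in final position
```

The context check explains why U+0626 is limited to initial and medial positions: in final position nothing follows it, so the rule cannot match.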
The following table lists the actions that are used to assign dispositions to labels and variant labels, based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.
# | Condition | Rule / Variant Set | | Disposition | Ref | Comment |
---|---|---|---|---|---|---|
1 | if label matches | leading-combining-mark | → | invalid | | by default, labels with leading combining marks are invalid |
2 | if label matches | mixed-digits | → | invalid | | RTL labels with a mix of European and Arabic-Indic digits are invalid |
3 | if at least one variant is in | {out-of-repertoire-var} | → | invalid | | disallow code points that are out of repertoire |
4 | if at least one variant is in | {similar} | → | blocked | | default action for similarity variants |
5 | if at least one variant is in | {blocked} | → | blocked | | default action for blocked variants |
6 | if at least one variant is in | {allocatable} | → | allocatable | | default action for allocatable variants |
7 | if any label (catch-all) | | → | valid | | catch all (default action) |
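The first-match precedence of the action table can be sketched as an ordered list that is scanned until one entry fires. This is an illustrative model under simplifying assumptions: the names `ACTIONS` and `disposition` are hypothetical, rule matching is supplied by the caller as a set of matched rule names, and variant types as the set of types used to produce the label.

```python
# (kind, rule or variant type, resulting disposition), in table order.
ACTIONS = [
    ("match", "leading-combining-mark", "invalid"),
    ("match", "mixed-digits", "invalid"),
    ("any-variant", "out-of-repertoire-var", "invalid"),
    ("any-variant", "similar", "blocked"),
    ("any-variant", "blocked", "blocked"),
    ("any-variant", "allocatable", "allocatable"),
    ("catch-all", None, "valid"),
]


def disposition(matched_rules, variant_types):
    """Return the disposition given by the first action that is triggered."""
    for kind, name, result in ACTIONS:
        if kind == "match" and name in matched_rules:
            return result
        if kind == "any-variant" and name in variant_types:
            return result
        if kind == "catch-all":
            return result
    raise AssertionError("unreachable: the catch-all action always fires")


if __name__ == "__main__":
    print(disposition(set(), set()))                       # original label, no rule matched
    print(disposition(set(), {"allocatable"}))             # digit variants only
    print(disposition(set(), {"allocatable", "blocked"}))  # blocked is tested first
    print(disposition({"mixed-digits"}, {"allocatable"}))  # rule match is tested first
```

Because the blocked action precedes the allocatable one, a variant label that uses even a single blocked mapping is blocked regardless of how many allocatable mappings it also uses.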
[120] | RFC 5891, Internationalized Domain Names in Applications (IDNA): Protocol, http://tools.ietf.org/html/rfc5891 |