LGR for urd-Arab

This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.

LGR Version 1
Date 2017-04-26
Unicode Version 6.3.0
Language urd-Arab

Description

Label Generation Rules for Urdu

Overview

This document specifies a reference set of Label Generation Rules for Urdu using a limited repertoire as appropriate for a second level domain. This is a DRAFT document for public comments released by the Task Force for Arabic Script IDNs - TF-AIDN.

Repertoire

The repertoire consists of the 39 letters of the Urdu alphabet, two sets of digits and the hyphen. The code point U+0626 (ئ) is only allowed in initial or medial position.
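The positional restriction on U+0626 is expressed later in this document as the context rule precedes-right-joining: the code point must be followed by a code point of joining type D or R. A minimal Python sketch of that check follows. The set JOINS_ON_RIGHT and the function name are hypothetical, and the joining types are transcribed by hand for the letters of this repertoire (an assumption that should be checked against the Unicode ArabicShaping data).

```python
# Repertoire letters assumed to have Joining_Type D or R (hand-transcribed).
# U+0621 HAMZA is non-joining, so it is deliberately left out, as are the
# digits and the hyphen.
JOINS_ON_RIGHT = {
    0x0622, 0x0626, 0x0627, 0x0628, 0x062A, 0x062B, 0x062C, 0x062D, 0x062E,
    0x062F, 0x0630, 0x0631, 0x0632, 0x0633, 0x0634, 0x0635, 0x0636, 0x0637,
    0x0638, 0x0639, 0x063A, 0x0641, 0x0642, 0x0644, 0x0645, 0x0646, 0x0648,
    0x067E, 0x0686, 0x0688, 0x0691, 0x0698, 0x06A9, 0x06AF, 0x06BA, 0x06BE,
    0x06C1, 0x06CC, 0x06D2,
}

YEH_WITH_HAMZA = 0x0626


def yeh_hamza_context_ok(label):
    """Return True if every U+0626 is followed by a D or R joining letter."""
    cps = [ord(ch) for ch in label]
    for i, cp in enumerate(cps):
        if cp != YEH_WITH_HAMZA:
            continue
        # A final U+0626, or one followed by a non-joining code point,
        # fails the required context.
        if i + 1 >= len(cps) or cps[i + 1] not in JOINS_ON_RIGHT:
            return False
    return True
```

Under these assumptions a label fails the check whenever U+0626 is its last code point or is followed by U+0621, a digit or the hyphen.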

Variants

Blocked variants

The following pairs of code points are blocked variants: U+0646 (ن) / U+06BA (ں) and U+06C1 (ہ) / U+06BE (ھ).

Allocatable variants

Corresponding members of the two sets of digits (ASCII digits and Extended Arabic-Indic digits) are allocatable variants of each other.
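As an illustration of this pairing, the following Python sketch builds the digit variant mapping in both directions. The names DIGIT_VARIANT and digit_variant are hypothetical and are not part of the LGR; the normative mappings are those in the XML file.

```python
ASCII_DIGITS = "0123456789"
# U+06F0..U+06F9, EXTENDED ARABIC-INDIC DIGIT ZERO..NINE
EXT_ARABIC_INDIC_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))

# Each digit maps to its counterpart in the other set, in both directions,
# so the mapping is symmetric by construction.
DIGIT_VARIANT = {}
for ascii_digit, ext_digit in zip(ASCII_DIGITS, EXT_ARABIC_INDIC_DIGITS):
    DIGIT_VARIANT[ascii_digit] = ext_digit
    DIGIT_VARIANT[ext_digit] = ascii_digit


def digit_variant(ch):
    """Return the allocatable digit variant of ch, or None if ch is not a digit."""
    return DIGIT_VARIANT.get(ch)
```

The dictionary holds 20 directed mappings, which matches the count of allocatable variants given in the Variant Sets summary.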

Out-of-repertoire Variants

Script-level Variants

This LGR does not include any script-level variants. If this LGR is integrated with other LGRs using the Arabic script, the variant sets defined in the Root Zone LGR (see https://www.icann.org/sites/default/files/lgr/lgr-1-arabic-script-24feb16-en.html) should be considered.

Rules

Repertoire

Summary

Number of elements in Repertoire 61
Number of extended elements 0
Number of excluded elements 0
Total entries in table 61
Number of code point sequences 0

Repertoire by Code Point

The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name column are extracted from the Unicode character database. Where the comment in the original LGR is equal to the character name, it has been suppressed.

For any code point or sequence for which a variant is defined, the link to the associated variant set, or if mapped to itself, the variant type of that mapping is provided in the Variants column.

# Code Point Glyph Script Name Tags Required Context Part of Repertoire Variants Comment References
1 002D - Common HYPHEN-MINUS        
2 0030 0 Common DIGIT ZERO ASCII-digit   set 1    
3 0031 1 Common DIGIT ONE ASCII-digit   set 2    
4 0032 2 Common DIGIT TWO ASCII-digit   set 3    
5 0033 3 Common DIGIT THREE ASCII-digit   set 4    
6 0034 4 Common DIGIT FOUR ASCII-digit   set 5    
7 0035 5 Common DIGIT FIVE ASCII-digit   set 6    
8 0036 6 Common DIGIT SIX ASCII-digit   set 7    
9 0037 7 Common DIGIT SEVEN ASCII-digit   set 8    
10 0038 8 Common DIGIT EIGHT ASCII-digit   set 9    
11 0039 9 Common DIGIT NINE ASCII-digit   set 10    
12 0621 ء Arabic ARABIC LETTER HAMZA        
13 0622 آ Arabic ARABIC LETTER ALEF WITH MADDA ABOVE        
14 0626 ئ Arabic ARABIC LETTER YEH WITH HAMZA ABOVE   precedes-right-joining     
15 0627 ا Arabic ARABIC LETTER ALEF        
16 0628 ب Arabic ARABIC LETTER BEH        
17 062A ت Arabic ARABIC LETTER TEH        
18 062B ث Arabic ARABIC LETTER THEH        
19 062C ج Arabic ARABIC LETTER JEEM        
20 062D ح Arabic ARABIC LETTER HAH        
21 062E خ Arabic ARABIC LETTER KHAH        
22 062F د Arabic ARABIC LETTER DAL        
23 0630 ذ Arabic ARABIC LETTER THAL        
24 0631 ر Arabic ARABIC LETTER REH        
25 0632 ز Arabic ARABIC LETTER ZAIN        
26 0633 س Arabic ARABIC LETTER SEEN        
27 0634 ش Arabic ARABIC LETTER SHEEN        
28 0635 ص Arabic ARABIC LETTER SAD        
29 0636 ض Arabic ARABIC LETTER DAD        
30 0637 ط Arabic ARABIC LETTER TAH        
31 0638 ظ Arabic ARABIC LETTER ZAH        
32 0639 ع Arabic ARABIC LETTER AIN        
33 063A غ Arabic ARABIC LETTER GHAIN        
34 0641 ف Arabic ARABIC LETTER FEH        
35 0642 ق Arabic ARABIC LETTER QAF        
36 0644 ل Arabic ARABIC LETTER LAM        
37 0645 م Arabic ARABIC LETTER MEEM        
38 0646 ن Arabic ARABIC LETTER NOON     set 11    
39 0648 و Arabic ARABIC LETTER WAW        
40 067E پ Arabic ARABIC LETTER PEH        
41 0686 چ Arabic ARABIC LETTER TCHEH        
42 0688 ڈ Arabic ARABIC LETTER DDAL        
43 0691 ڑ Arabic ARABIC LETTER RREH        
44 0698 ژ Arabic ARABIC LETTER JEH        
45 06A9 ک Arabic ARABIC LETTER KEHEH        
46 06AF گ Arabic ARABIC LETTER GAF        
47 06BA ں Arabic ARABIC LETTER NOON GHUNNA     set 11    
48 06BE ھ Arabic ARABIC LETTER HEH DOACHASHMEE     set 12    
49 06C1 ہ Arabic ARABIC LETTER HEH GOAL     set 12    
50 06CC ی Arabic ARABIC LETTER FARSI YEH        
51 06D2 ے Arabic ARABIC LETTER YEH BARREE        
52 06F0 ۰ Arabic EXTENDED ARABIC-INDIC DIGIT ZERO extended-arabic-indic-digit   set 1    
53 06F1 ۱ Arabic EXTENDED ARABIC-INDIC DIGIT ONE extended-arabic-indic-digit   set 2    
54 06F2 ۲ Arabic EXTENDED ARABIC-INDIC DIGIT TWO extended-arabic-indic-digit   set 3    
55 06F3 ۳ Arabic EXTENDED ARABIC-INDIC DIGIT THREE extended-arabic-indic-digit   set 4    
56 06F4 ۴ Arabic EXTENDED ARABIC-INDIC DIGIT FOUR extended-arabic-indic-digit   set 5    
57 06F5 ۵ Arabic EXTENDED ARABIC-INDIC DIGIT FIVE extended-arabic-indic-digit   set 6    
58 06F6 ۶ Arabic EXTENDED ARABIC-INDIC DIGIT SIX extended-arabic-indic-digit   set 7    
59 06F7 ۷ Arabic EXTENDED ARABIC-INDIC DIGIT SEVEN extended-arabic-indic-digit   set 8    
60 06F8 ۸ Arabic EXTENDED ARABIC-INDIC DIGIT EIGHT extended-arabic-indic-digit   set 9    
61 06F9 ۹ Arabic EXTENDED ARABIC-INDIC DIGIT NINE extended-arabic-indic-digit   set 10    

Legend

Code Point
A code point or code point sequence.
Name
Shows the character or sequence name from the Unicode Character Database.
Glyph
The shape displayed depends on the fonts available to your browser.
Script
Shows the script property value from the Unicode Character Database. Combining marks may have the value Inherited and code points used with more than one script may have the value Common.
References
Links to the references associated with the code point or sequence, if any.
Tags
LGR-defined tag values. Any tags matching the Unicode script property are suppressed in this view.
Required Context
Link to the rule defining the required context a code point or sequence must satisfy. If prefixed by "not:", identifies a context that must not occur.
Variants
A link to the variant set the code point or sequence is a member of, except where a code point or sequence maps only to itself, in which case the type of that mapping is listed.
Comment
If the comment in this row consists only of the code point or sequence name it is suppressed in this view.
✔ - core repertoire
A check mark in the Part-of-Repertoire column indicates a code point is part of the core repertoire.
◯ - extended repertoire
An open circle indicates a code point is part of an optional extended repertoire, which is normally disabled but could be supported by deleting the relevant context restriction.
✗ - excluded from repertoire
A code point shown with ✗ is considered excluded from the repertoire. It is shown only for review purposes.

Variant Sets

Summary

Number of variant sets 12
Largest variant set 2
Ordinary Variants by Type allocatable (20), blocked (4)
Reflexive Variants by Type  

The following tables list each pair of variant mappings on one row. For each pair of code points, by convention, the lower code point is taken as the source of the mapping in the forward → direction and the reverse direction ← is not listed separately. The variant mappings defined in an LGR are required to be symmetric, that is, both the forward and reverse mappings must be specified.

A mapping where source and target are the same is reflexive. Variant sets consisting of only a single reflexive mapping are not shown as a set. Instead, the variant type of the mapping is listed in the Variants column of the Repertoire by Code Point table. Reflexive mappings that are part of a larger set are indicated with a “≡”.

Where the types of the forward and reverse mappings are the same, a single value is given in the Type(s) column; otherwise the types for the forward and reverse mappings are given in that order, as indicated by the arrows. The same applies to any comments.

In a properly specified LGR, all members of each variant set are variants of each other, a property called transitivity. Because of that, all variant sets are necessarily disjoint. In each set, shading is used to group mappings from the same source code point or sequence.
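The symmetry and transitivity requirements described above can be checked mechanically. The Python sketch below is one possible way to do so, not the tooling used to produce this document: it represents each mapping as a directed (source, target) pair of code points, groups code points into variant sets with a small union-find, and tests both properties. All function names are hypothetical.

```python
def is_symmetric(mappings):
    """Every forward mapping must have a matching reverse mapping."""
    return all((t, s) in mappings for (s, t) in mappings)


def variant_sets(mappings):
    """Group code points into disjoint sets connected by variant mappings."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for s, t in mappings:
        parent[find(s)] = find(t)

    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), set()).add(cp)
    return sorted(groups.values(), key=min)


def is_transitive(mappings):
    """Within each set, every member must map to every other member."""
    return all(
        (a, b) in mappings
        for group in variant_sets(mappings)
        for a in group
        for b in group
        if a != b
    )


# Mappings of this LGR, written out in both directions.
PAIRS = [(0x0030 + i, 0x06F0 + i) for i in range(10)]
PAIRS += [(0x0646, 0x06BA), (0x06BE, 0x06C1)]
MAPPINGS = {(s, t) for s, t in PAIRS} | {(t, s) for s, t in PAIRS}
```

Applied to MAPPINGS, variant_sets yields 12 sets of 2 members each, consistent with the summary figures above.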

Variant Set 1 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0030 0 06F0 ۰ allocatable    

Variant Set 2 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0031 1 06F1 ۱ allocatable    

Variant Set 3 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0032 2 06F2 ۲ allocatable    

Variant Set 4 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0033 3 06F3 ۳ allocatable    

Variant Set 5 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0034 4 06F4 ۴ allocatable    

Variant Set 6 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0035 5 06F5 ۵ allocatable    

Variant Set 7 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0036 6 06F6 ۶ allocatable    

Variant Set 8 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0037 7 06F7 ۷ allocatable    

Variant Set 9 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0038 8 06F8 ۸ allocatable    

Variant Set 10 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0039 9 06F9 ۹ allocatable    

Variant Set 11 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 0646 ن 06BA ں blocked    

Variant Set 12 — 2 Members

# Source Glyph Target Glyph   Type(s) Ref Comment
1 06BE ھ 06C1 ہ blocked    

Classes, Rules and Actions

Character Classes

The following table lists all top-level classes with their definition and a list of their members intersected with the current repertoire.

Name Definition Count Members Comment
implicit Tag=ASCII-digit 10 Elements { 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 }  
implicit Tag=extended-arabic-indic-digit 10 Elements { 06F0 06F1 06F2 06F3 06F4 06F5 06F6 06F7 06F8 06F9 }  

Legend

Members
Lists the members of the class as code points (xxx). Any class too numerous to list in full is linked with "..." to all the members.
Tag=ttt
A named class is defined by all code points that share the given tag value (ttt).
Implicit
An anonymous class implicitly defined based on tag value.
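To illustrate how a tag value implicitly defines a class, the following Python sketch groups code points by their tag. The TAGS dictionary reproduces only the two tags used in this repertoire, and classes_by_tag is a hypothetical helper name.

```python
from collections import defaultdict

# Tag assignments as listed in the Repertoire by Code Point table.
TAGS = {cp: "ASCII-digit" for cp in range(0x0030, 0x003A)}
TAGS.update({cp: "extended-arabic-indic-digit" for cp in range(0x06F0, 0x06FA)})


def classes_by_tag(tags):
    """Return a dict mapping each tag value to the set of code points carrying it."""
    classes = defaultdict(set)
    for cp, tag in tags.items():
        classes[tag].add(cp)
    return dict(classes)
```

Each of the two resulting classes has 10 members, as shown in the Count column of the table above.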

Whole label evaluation and context rules

The following table lists all the top-level, or named, rules defined in the LGR and indicates whether they are used as a trigger in an action or as context (when or not-when) for a code point. (Any use of context rules for variants is not indicated.)

Name Regular Expression Used as Trigger Used as Context Anchor Ref Comment
leading-combining-mark (start)((class property:gc:Mn) | (class property:gc:Mc))     [120] RFC5891 restrictions on placement of combining marks
mixed-digits (choice [(ASCII-digit)(0+ any)(extended-arabic-indic-digit)][(extended-arabic-indic-digit)(0+ any)(ASCII-digit)])       Labels with a mix of European and Extended-Arabic-Indic digits are invalid
precedes-right-joining (anchor)(((class property:jt:D) | (class property:jt:R)))     must precede a code point joining on the right

Legend

Regular Expression
A regular expression equivalent to the rule, shown in the standard notation with some extensions as noted.
[] - a choice
When there are various choices in a rule, each choice is represented by a set enclosed in square brackets.
[∩,−,Δ,∪] - set operators
Sets may be combined by set operators (∩ = intersection, − = difference, Δ = symmetric difference and ∪ = union).
()= - empty set
Indicates that the following set is empty because of the result of set operations, or because none of its elements are part of the repertoire defined here. An empty set that is not optional means that a rule can never match.

Note: The terminology used in the regular expressions follows RFC7940.
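The two whole-label rules, mixed-digits and leading-combining-mark, can be approximated with Python's standard re and unicodedata modules. The sketch below is an illustration under that assumption rather than a normative implementation; the function names are hypothetical, and the general category values come from the Unicode version bundled with the Python interpreter, which may differ from Unicode 6.3.0.

```python
import re
import unicodedata

ASCII_DIGIT = "[0-9]"
EXT_DIGIT = "[\u06F0-\u06F9]"  # EXTENDED ARABIC-INDIC DIGIT ZERO..NINE

# Either an ASCII digit followed anywhere later by an Extended Arabic-Indic
# digit, or the reverse order, mirroring the two choices of the rule.
MIXED_DIGITS = re.compile(
    f"{ASCII_DIGIT}.*{EXT_DIGIT}|{EXT_DIGIT}.*{ASCII_DIGIT}", re.DOTALL
)


def has_mixed_digits(label):
    """True if the label contains digits from both digit sets."""
    return MIXED_DIGITS.search(label) is not None


def has_leading_combining_mark(label):
    """True if the first code point has general category Mn or Mc."""
    return bool(label) and unicodedata.category(label[0]) in ("Mn", "Mc")
```

A label for which either function returns True would be assigned the disposition invalid by the first two actions listed in the Actions table.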

Actions

The following table lists the actions that are used to assign dispositions to labels and variant labels, based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.

# Condition Rule / Variant Set   Disposition Ref Comment
1 if label matches leading-combining-mark invalid   by default, labels with leading combining marks are invalid
2 if label matches mixed-digits invalid   RTL labels with a mix of European and Arabic-Indic digits are invalid
3 if at least one variant is in {out-of-repertoire-var} invalid   disallow code points that are out of repertoire
4 if at least one variant is in {similar} blocked   default action for similarity variants
5 if at least one variant is in {blocked} blocked   default action for blocked variants
6 if at least one variant is in {allocatable} allocatable   default action for allocatable variants
7 if any label (catch-all) valid   catch all (default action)
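Because the first triggered action determines the disposition, the table above can be read as an ordered list that is scanned from top to bottom. The Python sketch below models that precedence. It assumes the caller has already determined which whole-label rules the label matches and which variant types were used to derive it; the ACTIONS structure and the disposition function are hypothetical names chosen for illustration.

```python
# (kind, rule or variant type, resulting disposition), in order of precedence.
ACTIONS = [
    ("match", "leading-combining-mark", "invalid"),
    ("match", "mixed-digits", "invalid"),
    ("any-variant", "out-of-repertoire-var", "invalid"),
    ("any-variant", "similar", "blocked"),
    ("any-variant", "blocked", "blocked"),
    ("any-variant", "allocatable", "allocatable"),
    ("always", None, "valid"),
]


def disposition(matched_rules, variant_types):
    """Return the disposition set by the first action that is triggered.

    matched_rules: set of whole-label rule names the label matches.
    variant_types: set of variant types present in the variant label.
    """
    for kind, name, result in ACTIONS:
        if kind == "match" and name in matched_rules:
            return result
        if kind == "any-variant" and name in variant_types:
            return result
        if kind == "always":
            return result
    raise AssertionError("unreachable: the catch-all action always fires")
```

For example, a variant label that uses both an allocatable and a blocked mapping is blocked, because action 5 precedes action 6.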

Legend

{...} - variant type set
In the "Rule/Variant Set" column the notation {...} means a set of variant types.

Table of References

[120] RFC5891, Internationalized Domain Names in Applications (IDNA): Protocol http://tools.ietf.org/html/rfc5891