11 November 2014

Dear all,

Thank you for the opportunity to provide feedback on the draft final report.

I want to echo all the points Emily has collected. (Emily, thank you!). My colleagues discussed the draft report and came to similar conclusions/questions so I don’t want to duplicate the report. We do have one addition.

On page 14, second bullet, there is a recommendation to have language tags to allow easy identification of what the different data entries represent. Here are the questions/comments that came up:

- EPP already supports both the localized form and the internationalized (ASCII) form of the contact data. Why does the data need to be tagged with the language? What benefit is there? The EPP RFC does not support passing anything other than the two forms of contact data, so there is no way to pass language or script tag information; adding one would introduce yet another non-essential complexity.

- If there is a compelling reason to tag all data elements, it should be clearly articulated.  
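
As a rough illustration of the two forms mentioned above: the EPP contact mapping (RFC 5733) carries postal information in a `postalInfo` element whose `type` attribute is either `int` (restricted to 7-bit US-ASCII) or `loc` (unrestricted UTF-8). The sketch below is only a minimal, incomplete fragment; the helper name `postal_info` and the sample values are assumptions made for illustration, and a real contact object needs further elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the EPP contact mapping (RFC 5733).
CONTACT_NS = "urn:ietf:params:xml:ns:contact-1.0"


def postal_info(form: str, name: str, city: str, country_code: str) -> ET.Element:
    """Build a partial <contact:postalInfo> element (hypothetical helper)."""
    if form not in ("int", "loc"):
        raise ValueError("form must be 'int' or 'loc'")
    # The "int" form is limited to characters representable in US-ASCII.
    if form == "int" and not (name + city).isascii():
        raise ValueError("'int' form must be ASCII only")
    info = ET.Element(f"{{{CONTACT_NS}}}postalInfo", {"type": form})
    ET.SubElement(info, f"{{{CONTACT_NS}}}name").text = name
    addr = ET.SubElement(info, f"{{{CONTACT_NS}}}addr")
    ET.SubElement(addr, f"{{{CONTACT_NS}}}city").text = city
    ET.SubElement(addr, f"{{{CONTACT_NS}}}cc").text = country_code
    return info


# Sample values are invented for illustration.
localized = postal_info("loc", "Иван Петров", "Москва", "RU")
internationalized = postal_info("int", "Ivan Petrov", "Moscow", "RU")
print(ET.tostring(localized, encoding="unicode"))
print(ET.tostring(internationalized, encoding="unicode"))
```

Note that neither form has a slot for a language or script tag, which is the gap the comment above refers to.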

Sincerely,

Dennis Tan Tanaka

From: owner-gnso-contactinfo-pdp-wg@icann.org [mailto:owner-gnso-contactinfo-pdp-wg@icann.org] On Behalf Of Emily Taylor
Sent: Tuesday, November 11, 2014 6:20 AM
To: Dillon, Chris
Cc: Lars Hoffmann; gnso-contactinfo-pdp-wg@icann.org
Subject: Re: [gnso-contactinfo-pdp-wg] Wednesday 12 November 23:59 UTC soft deadline for comments

Dear Chris

Thank you for this timely reminder.  Over the past few days, I have been gathering input from colleagues in the Registrar Stakeholder group.  There was a rich discussion on the list, with many participants.  These are less comments on the paper itself than contributions to the general discussion of the issues.

Here is a synthesis of the comments. I hope that they will be useful in cross-checking against the "arguments opposing mandatory transformation" on pages 11-12:

1. Costs:  This proposal essentially externalises translation costs from LEA/IP to Registrars, and none of the commentators were convinced that the costs for contracted parties are justified by benefits to others.  Those requesting the data can pay for the translation.

2. Scale:  Why translate/transliterate all WHOIS data, rather than simply those names that are of interest on-the-fly?  The status quo is several orders of magnitude more efficient.

3. Accuracy and responsibility: If the premise of  WHOIS data is that it is provided (and declared accurate) by the Registrant, then who accepts responsibility if Registrars are required to alter that data? How would the proposals impact whois data accuracy complaints and whois verification requirements? 

4. Data integrity: The whois should be displaying what the client entered.   Our trying to interpret that only leads to more data errors, and less accurate data. If we change what the client enters it will only lead to errors:

a.       Will there be rules on how to transliterate non-ASCII characters so that it can be done programmatically? Is there some standard system to be used, or are we all just counting on Google Translate?

b.       If human judgment is required, who is responsible for doing it?

c.       If the registrant is responsible, what if they do not know what it should be?

d.       What if a third-party disagrees with the accuracy of a transliteration? 

e.       Is the registrant’s consent required before a transliteration is published in the whois?

f.       Can a registrant withhold consent?

g.       What if a registrant wants to change an “approved” transliteration?

h.       Is a whois verification required every time one of these transliterated fields is updated?

i.    Where does the requirement for data transformation end? Could Chinese LEA require a contracted party to translate/transliterate existing English contact details into Mandarin? Or, what if the original registration was in a third language/script (Russian Cyrillic), would that skip English and go directly to Chinese?  

5.  Compliance: "Who will police this, and how?"  If ICANN are making cutbacks in their budget, how are they going to afford the human resources to check that every Whois transliteration is correct? It doesn’t make much operational sense, and will likely end up with the registrant paying higher fees for something that they never asked for.

6. Internationalisation: The concept starts to erode the “my language, my Internet” / IDN principle of ICANN, by compelling the use of English/Latin/ASCII by people and locations not using those language/script combinations.  One commentator put it as: "Sadly, it is North American thinking I suspect. 'We must translate everything into English'."

7. Competition: If a contracted party does not want to support a language, that should be their prerogative. They can turn away business if they decide that they won’t be able to service that customer appropriately.

---------------

General comments

Taking into account the above input, I have the following observations to make on the draft paper.

First, thank you Chris and the ICANN team for your work in the unenviable task of fairly summarising the arguments on both sides.  I appreciate that it is an important step in the process to try and understand the arguments on both sides.

A general point: I have no sense from the paper, or from the discussions in the group, of the scale of the problem we are addressing here.  Do we have any stats for the following:

(1) a breakdown of WHOIS data by country of registrant - and can we infer what language WHOIS data is likely to be in?  The nearest I can get to is this map from OII which shows the predominance of Latin script / English language countries in the current domain market (http://geography.oii.ox.ac.uk/?page=geography-of-top-level-domain-names) .  However, if you look at growth potential, clearly that is not the case.  And IDN registrations by country show a different pattern (see page 17 at http://www.eurid.eu/files/publ/IDNWorldReport2014_Interactive.pdf)

(2) an estimate of what is likely to be the language of WHOIS data if multiple languages were enabled in these fields.  For example, we could perhaps draw some inferences from the IDN registrations in ASCII TLDs.  Approximately 1% of .com and .net registrations are IDNs, and the majority of those are Latin script.  This may not be representative in that the Latin script ending for .com is more likely to be attractive to Latin script IDNs than, say, right to left scripts or pictograms.  There are currently just shy of 900,000 Russian ccTLD IDNs.  Of these, over 800,000 have a registrant based in Russia, and uptake in other countries is low (even former Soviet Union).  See http://statdom.ru/tld/%D1%80%D1%84/report/summary/. There are approximately 12,000 IDNs in Arabic script ccTLDs.  Uptake of IDN new gTLDs has been fairly limited.  I don't think that anyone is claiming that the IDN market has even nearly fulfilled its market potential, but can we have some statement of the scale of the problem?

(3) Do we have a sense of how many WHOIS look-ups are performed by law enforcement and IP interests, what percentage that represents of all WHOIS look ups, and how many prove to be problematic in terms of language of contact?  On the other hand, what problems are currently created by not having the ability to record contact details in the script of the domain name (eg for IDNs)?

(4) There have been a number of studies on different aspects of WHOIS data in the last couple of years - do any of these help to guide us?

Specific comments

Page 11 - as you say there is disagreement on "ease" of search.  If you're English mother tongue, then it might be "easier" to understand the output of a search, but any string is searchable, and you can interpret the search results whatever their script/language.

I find the first bullet point unconvincing - it's like saying "why doesn't everyone just learn English?  It's such a mess having all these languages"

On the second bullet point, p11 - I appreciate that a counter argument is stated to the "transformation will to some extent facilitate communication" argument.  The communication argument is a difficult one.  On one level - as demonstrated within this working group and many others - we default to English in order to communicate with one another across different languages.  However, this is also (to some extent) a factor that deters input from those who are not confident in English as a second language - who may be able to give valuable insights into the debate.  I believe that this is captured in "to some extent" but would welcome more acknowledgement that this cuts both ways.

The third bullet point does not explain why it is also necessary to transliterate/translate *all* data for this benefit to be felt. We need some consideration of proportionality here.

Fourth bullet - define "least translatable" - for whom? Is this truly posed as a barrier to law enforcement and others?

To balance the "cyberflight" argument in the fourth bullet point, could we also point out that in general people tend to register and host locally?  This is perhaps a surprising phenomenon given the strength of some registrars internationally.  For example, on page 5 at http://www.eurid.eu/files/publ/IDNWorldReport2014_Interactive.pdf we have an analysis of country of hosting for gTLD IDNs plus .eu IDNs.  This was done based on the IP ranges associated with the domain names.  You can see that countries and regions with strong international registrars (eg North America, UK) don't really show any "winner" script.  In contrast, Chinese script, Cyrillic, Han (plus Katakana, Hiragana), Thai, Hangul, Arabic script domains tend to be hosted in countries where associated languages are spoken.

Could I also add that you can see within large IDN namespaces which offer multiple scripts (eg .com and .net) that registrations cluster strongly around popular scripts.  There are very small numbers indeed outside of them.  I can produce some more analysis on that point if people like.

I hope these inputs are helpful to the working group in its deliberations, and I look forward to joining the discussions.

Best wishes,

Emily

From: owner-gnso-contactinfo-pdp-wg@icann.org [mailto:owner-gnso-contactinfo-pdp-wg@icann.org] On Behalf Of Lars Hoffmann
Sent: 07 November 2014 12:35
To: gnso-contactinfo-pdp-wg@icann.org
Subject: [gnso-contactinfo-pdp-wg] soft DEADLINE
Importance: High

Dear all,

In order to move our effort forward as smoothly as possible, we would like to gather as many comments as possible on the latest version of the draft initial report (attached) in preparation for next week’s call.

Please provide your feedback by Wednesday 12 November 23:59 UTC – if you don’t provide feedback we will assume that you are content with the report as it stands. If you need more time, please let us know.

If you do provide feedback, please do so in track changes and send it back to the list or to lars.hoffmann@icann.org – so that we can collect all comments and discuss a collated version on next week’s call. 

If you missed yesterday’s call, you can listen to the MP3 or read the Transcript, as Chris gave some very useful background information/explanations on the latest draft. Please note that a clean version is attached; a red-line one should be in your inbox (sent by Chris earlier this week).

Looking forward to hearing back from you – have a great weekend and best wishes,

Lars

--

Emily Taylor

MA(Cantab), MBA
Director

Netistrar Ltd - Domain Names at Trade Prices
W: http://www.netistrar.com | M: 07540 049322 | T: 01283 617808


11 November 2014

Dear Petter

Thank you for your message, and apologies for the delay in responding to your points.

I wanted to address the claim that because contracted parties had not made noises about ICANN’s advisory they must be okay with it. I’ve attached a letter that I'm informed was provided by the RySG to ICANN staff as a result of the RySG being provided an early version of the advisory for comment. I understand that none of these comments were taken into account by ICANN when they published the advisory and, despite being asked why, I don’t believe any answer was forthcoming.

In short, there have been expressions of concern over the recent advisory, and my understanding from discussions on the RrSG list is that many have concerns over transliteration and translation of WHOIS data.

Kind regards

Emily 

--

Emily Taylor

MA(Cantab), MBA
Director

Netistrar Ltd - Domain Names at Trade Prices
W: http://www.netistrar.com | M: 07540 049322 | T: 01283 617808 

 

On 30 October 2014 13:20, Petter Rindforth ... wrote:

Dear All,

Just a last minute summary of 

Some further comments/questions/inputs/suggestions:

(collected from the IP point of view)

Note that ICANN issued an advisory last month clarifying technical aspects of provisions of the 2013 RAA and new gTLD Registry Agreement regarding uniform requirements for presenting Whois data: https://www.icann.org/resources/pages/registry-agreement-spec4-raa-rdds-2014-09-12-en . Significantly, it states that “Registries and Registrars are encouraged to only use US-ASCII encoding and character repertoire for WHOIS port 43 output.” The purpose is to facilitate parsing of Whois data by automated tools such as ICANN’s centralized Whois data portal, http://whois.icann.org/ . Similar arguments would apply to facilitating machine translation.

 

Thus the status quo is (or will be, by February 2015) that contracted parties are at least “encouraged” to transliterate into ASCII if Whois data is submitted in some other script. 

Has anyone heard any howls of outrage from registries and registrars over this?  

The advisory also states: “All domain name labels in the values of any of the fields described in section 1.4.2 of the 2013 RAA, and sections 1.5, 1.6, and 1.7 of Specification 4 of the Registry Agreement (e.g., Domain Name, Name Server, email) MUST be shown in ASCII-compatible form (A-Label).

 

For example, a name server with an IDN label should be shown as:

Name Server: ns1.xn--caf-dma.example.”

 

The referenced fields include virtually all the registrant data we are concerned with.  See the listing in section 1.4.2 of Specification 3 of the 2013 RAA, https://www.icann.org/resources/pages/approved-with-specs-2013-09-17-en .

I’m not certain whether this ASCII requirement applies only to the labels (e.g., “Name Server”) or to the content following the label --- the example given suggests the latter—which further solidifies the idea that contracted parties are already required to transliterate Whois data into ASCII.  But I could be misreading this requirement.   
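
For reference, the A-label in the advisory's example is the ASCII-compatible encoding that IDNA defines for domain name labels. The sketch below shows the round trip using Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the function names are assumptions made for illustration. It only demonstrates how a domain label such as the one in the example is encoded, and does not by itself settle how the advisory should be read for other field content.

```python
def to_a_label(hostname: str) -> str:
    """Convert each label of a hostname to its ASCII-compatible (A-label) form."""
    return hostname.encode("idna").decode("ascii")


def to_u_label(hostname: str) -> str:
    """Convert an A-label hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


print("Name Server:", to_a_label("ns1.café.example"))
print("Unicode form:", to_u_label("ns1.xn--caf-dma.example"))
```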

§§§

·         "I think it would be useful to suggest the requirement that all Whois text be machine-readable text. I’m not sure if that’s already a recommendation of the EWG report, but as one can imagine, the Whois systems that substitute graphics for the e-mail (which, for all we know, could spread to other fields) would stymie attempts at automated translation by users of Whois.

 

·         Does anyone have any ideas for avoiding flight by bad actors to the least translatable languages? One idea would be to require:

 

·        Whois info to be in either the language of the registrar or registrant (i.e. can’t pick some random language just to make it hard to translate), and

 

·        translation or transliteration is required if it’s not in a) Latin characters, b) one of the six U.N. languages, or c) possibly some larger but reasonable set of well-known and widely translatable languages (say, 20 or so)."


--
Petter Rindforth, LL M

Fenix Legal KB
Stureplan 4c, 4tr
114 35 Stockholm
Sweden
Fax: +46(0)8-4631010
Direct phone: +46(0)702-369360
...
www.fenixlegal.eu

13 November 2014

Regarding the statement “Registries and Registrars are encouraged to only use US-ASCII encoding and character repertoire for WHOIS port 43 output.”, the full paragraph from the ICANN advisory of 12 September states:

 

As described in RFC 3912, the WHOIS protocol (port-43) has not been internationalized. While a substitute protocol is being developed in the IETF, Registries and Registrars are encouraged to only use US-ASCII encoding and character repertoire for WHOIS (port-43) output. If the RegistryOperator/Registrar uses characters outside of the US-ASCII repertoire, the output MUST be encoded in UTF-8 to maximize the chances of interoperability.

Although ICANN encourages the use of US-ASCII, it does not exclude the use of other character sets as long as they are encoded in UTF-8.
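
A minimal sketch of what the quoted paragraph allows, assuming a registrar builds its port-43 response line by line: US-ASCII text is byte-for-byte identical when encoded as UTF-8, so a single UTF-8 encoding step covers both the encouraged case and the permitted one. The function names and sample values are hypothetical.

```python
def uses_only_ascii(line: str) -> bool:
    """True if the line stays within the US-ASCII repertoire the advisory encourages."""
    return line.isascii()


def encode_whois_line(line: str) -> bytes:
    """Encode one line of port-43 output, terminated by CR LF as in RFC 3912."""
    # ASCII characters encode to the same bytes in UTF-8, so pure-ASCII
    # output is unchanged and anything else is valid UTF-8.
    return (line + "\r\n").encode("utf-8")


# Sample registrant names are invented for illustration.
ascii_line = "Registrant Name: Ivan Petrov"
utf8_line = "Registrant Name: Иван Петров"

print(uses_only_ascii(ascii_line), encode_whois_line(ascii_line))
print(uses_only_ascii(utf8_line), encode_whois_line(utf8_line))
```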

 

From: owner-gnso-contactinfo-pdp-wg@icann.org [mailto:owner-gnso-contactinfo-pdp-wg@icann.org] On Behalf Of Petter Rindforth
Sent: Wednesday, November 12, 2014 6:41 PM
To: Emily Taylor
Cc: gnso-contactinfo-pdp-wg@icann.org
Subject: Re: [gnso-contactinfo-pdp-wg] Re: Translation and Transliteration of Contact Information PDP Working Group Thursday 30 October 2014 / some further comments/questions, etc

 

Thanks, Emily.

 

I'll have a meeting within 20 min from now to further discuss this topic (at INTA).

 

Best,

Petter

--
Petter Rindforth, LL M

Fenix Legal KB
Stureplan 4c, 4tr
114 35 Stockholm
Sweden
Fax: +46(0)8-4631010
Direct phone: +46(0)702-369360
...
www.fenixlegal.eu
