Public Comment Close: 06 August 2018
Statement Name:
Status: ADOPTED (13Y, 0N, 0A)
Assignee(s):
Call for Comments Open: 26 July 2018
Call for Comments Close: 03 August 2018
Vote Open: 06 August 2018
Vote Close: 09 August 2018
Date of Submission: 06 August 2018
Staff Contact and Email:
Statement Number: AL-ALAC-ST-0818-01-01-EN




FINAL VERSION SUBMITTED (IF RATIFIED)

The final version to be submitted, if the draft is ratified, will be placed here upon completion of the vote.



FINAL DRAFT VERSION TO BE VOTED UPON BY THE ALAC

The final draft version to be voted upon by the ALAC will be placed here before the vote is to begin.

This draft was posted by Alan Greenberg, 5 August 2018, 09:44 UTC-4. It is a minor modification of the comment drafted by Justine Chew, which in turn was based on an original draft by Alan Greenberg plus many further comments.


The ALAC appreciates the opportunity to comment on ICANN’s Open Data Initiative. The ALAC applauds this ICANN initiative to keep the ICANN Community informed of the data it collects, and its resolve to publish collected data assets in as open a form as reasonably permissible.

Centralized, easy access to properly organized data repository

It is noted that the identified datasets are published at various locations. While the ALAC understands that different groups within the ICANN Community, and even within ICANN Org, have varying interests in and uses for different datasets, it recommends that all the datasets be published at a single, centralized online location that is easily accessible to all interested parties.

Descriptions for each dataset should be specific and unambiguous, and perhaps supported by a simple keyword-based taxonomy that allows each dataset to be tagged, providing supplemental user-guided context to otherwise general descriptions. This would also make the datasets easier to understand and to search.

Of great interest to the ALAC are the online means made available to query the collected data. While we appreciate that it may be difficult for ICANN to develop and/or provide a common tool that would satisfy the data querying and analysis needs of every group within the ICANN Community, the ALAC nevertheless proposes that ICANN make some effort to develop or license a tool that would enable the ICANN Community to undertake basic querying of user-selected datasets. Alternatively, the ALAC would appreciate it if ICANN could suggest readily available, cost-effective online tools for querying and analysing the datasets. Education on the recommended tool(s) is also crucial. Paramount to both approaches, however, and to the overall success of this initiative, is the continued adoption of the three dimensions of data openness, which the ALAC supports.

Types and value of data collected, lack of discernable information

While ICANN has made a laudable start with 231 named datasets, from the ALAC’s perspective it is difficult not only to identify those of most interest to our group, but also those which possess discernable derivative value.

Certainly, ICANN meeting demographics and the data specifically associated with At-Large participants/members rank high on our list, as do those related to competition, consumer trust and consumer choice. But of greater interest to the ALAC is data that is not readily identifiable or discernable from the datasets listed in https://www.icann.org/en/system/files/files/odi-data-asset-inventory-11jun18-en.pdf.

Most obvious is the lack of exhaustive data about contractual compliance and the actions it takes. This is arguably one of the most critical areas of ICANN’s operations, and other than some specific datasets compiled for the CCT Review, there appears to be nothing.

Another example of interest to At-Large is data associated with the Fellowship. The URL listed implies that the only information to be provided is a list of fellows along with their country and interest area. Absent, however, are the demographics of the Fellowship applicants (i.e. those who succeeded versus those who did not). Such data are critical to indicating the extent to which information about the Fellowship Programme is reaching certain parts of the world, which would in turn support fact-driven corrective action (if necessary) and planning.

Yet another example of interest to us is data on the participation rates of At-Large members.

Taking the above-mentioned examples further, there is a need to identify and capture (if not already present) metrics-based downstream data for datasets where there is a sequence of actions to be taken, or where some level of success or effectiveness needs to be measured for programme assessment and planning purposes. For our purposes, downstream data that can certainly inform on the effectiveness of various programmes include, but are not limited to, the following:

  • Contractual compliance: measurements of corresponding action taken, time taken to resolve, patterns of non-compliance, plausible trigger events/reasons for non-compliance
  • CCT-related complaints: types of complaints, time to resolve, patterns of domain name abuse etc, plausible trigger events
  • Fellowship programme: participation metrics of returning fellows versus first-time or one-time fellows, transition from fellows to active community membership
  • Membership, related to ALS and individual members:
    • diversity metrics by country, region, gender, economy, disability status, etc.
    • participation metrics in At-Large in policy development, education & outreach activities, direct & remote participation in meetings
    • travel-related metrics such as difficulties in obtaining travel support, visas, difficulties with Travel Constituency etc.

Uniformity of and responsibility for data

Understanding the methodology by which data of interest to us will be accumulated is also an important consideration. It should be noted that data which is or may be of interest to the ALAC currently resides in separate repositories, e.g. data collected and controlled exclusively by ICANN Org for ICANN operations versus data collected by ICANN staff for the ALAC, which resides, for all intents and purposes, behind the ALAC website and wiki (“the ALAC’s repositories”).

In this context, some preliminary questions arise:

  • For the data that already exists on the web, are there conceivably duplicates of data residing in separate repositories?
  • Will new data continue to be collected and stored in the existing manner? If so, how will ICANN ensure that the separate repositories stay in sync with each other?
  • For the purposes of the open data platform, will ICANN Org be querying data in the ALAC’s repositories?  

Privacy rights

The ALAC supports the need to consider privacy rights and recognizes ICANN’s legal obligations in processing and publishing data containing personal elements, but cautions against withholding personal data to the point of rendering the data worthless. Anonymizing data may be called for even if such data is NOT made publicly available, and this approach should be applied in general.

In very specific cases where personal data needs to be shared, and where withholding it would render the data worthless to a user, ICANN should consider placing confidentiality obligations on users who have been specifically identified and authorised to receive data containing personal elements, on a limited-license basis. For example, sharing and use of Fellowship participant data could be limited to the ALAC rather than At-Large as a whole.

Conclusion

Thus, it would be useful if ICANN Org could assist in re-generating a list of datasets with suggestions on what downstream or upstream information can possibly be gleaned from each dataset. The ALAC believes such an exercise would assist both ICANN Org and the ICANN Community to better understand whether the range of data being collected is sufficiently complete, what related data is available to explain changes in the data, and, where gaps exist, what data can and ought to be collected.

Once a revised list of datasets is established, it should be submitted for public comment. It is far easier to critique such a list than create it from scratch.




DRAFT SUBMITTED FOR DISCUSSION

The first draft submitted will be placed here before the call for comments begins. The draft should be preceded by the name of the person submitting the draft and the date/time. If, during the discussion, the draft is revised, the older version(s) should be left in place and the new version, along with a header line identifying the drafter and date/time, should be placed above the older version(s), separated by a horizontal rule (available via the + Insert More Content control).

Posted by Alan Greenberg, 26 July 2018


The ALAC appreciates the opportunity to comment on ICANN’s Open Data Initiative.

Although a number of the data assets are of interest to At-Large, with 231 entries in the list it is difficult to identify those of most interest to our group. Certainly, easy access to ICANN meeting demographics and the data associated with At-Large are high on our list. The ALAC suggests that once initial priorities are established, this ordered list be submitted for public comment. It is far easier to critique such a list that create it from scratch.

Perhaps of more importance is the data that is not identified here. The most obvious gap (unless it is there under a cryptic name) is exhaustive data about contractual compliance and the actions it takes. This is arguably one of the most critical areas of ICANN’s operations and other than some specific data sets compiled for the CCT Review, there appears to be nothing.

Another example of interest to At-Large is data associated with the Fellowship. The URL listed implies that the only information to be provided is a list of fellows along with their country and interest area. Absent, however, are the demographics of the Fellowship applicants (i.e. those who succeeded plus those who did not). This could provide critical data on the extent to which information about the Fellowship is reaching certain parts of the world.

Understanding the methodology of how the data will be accumulated is of interest. For the data that already exists on the web, will:

  • the new data be derived from it (scraped from the web); or
  • the web data be constructed from the data tables; or
  • the two reside independently?

If the latter, how will ICANN ensure that the two stay in sync with each other?

Lastly, of great interest are the tools that will be made available for the ICANN community to use to extract and process the data. The utility of the entire project will greatly hinge on the availability and capabilities of such tools.

23 Comments

  1. Some of what is being discussed here is WAY beyond my level of understanding. Project Open Data Metadata Schema V1.1 is an example.

    But other parts are of great interest and importance. https://www.icann.org/en/system/files/files/odi-data-asset-inventory-spreadsheet-11jun18-en.csv is an example (also available as a PDF - see above). It lists all of the data that ICANN is considering making available. It will take a LONG time to do this, so prioritizing will be important, and the community needs to identify what is important. Much of this information is already available on the web or in reports (and the inventory points to some of it). Lines 189-199 refer to At-Large. The intent is to make this information query-able.

    It is not at all clear whether, once that is done, the web or report will be driven from the database, or whether there will be two parallel repositories that need to be maintained. This is a problem that we have with the At-Large Wiki and Web.

    They have in many (but not all) cases relied on data that is already out there, but not on what might be of interest. For instance, in the current PC discussion on the Fellowship, the question came up of the distribution of countries that applied for fellowships. All that is currently published is those selected. The item here (line 62) seems to simply rely on the currently published information.

  2. We could point out that data publication in and of itself privileges only those with the capacity to interrogate the data. We should recommend that ICANN partner with another nonprofit that builds user-friendly data query tools for public use. I would think that the ask would be easy...build a tool to help understand the internet...

  3. ICANN's ODI is a welcome step towards enhancing the transparency of operations, improving stakeholder engagement and ensuring inclusiveness. Open data published on a continuing basis can be used innovatively to spot trends, identify blind spots & pain points, and track the effectiveness of corrective actions.

    Most of the 2015 Open Data Principles for Governments are perhaps also relevant for the ICANN ODI. These include:

    1. Open by Default (ICANN needs to justify data sets that it does not want to share as open data)
    2. Timely and Comprehensive (Publish versioned data, within a publicly declared period, on the basis of a schedule, and ensure it is complete in terms of data and relevant metadata)
    3. Accessible & Usable (Ensure data is easily located, machine-parsable, free of charge, and released under an open data license that permits free use, reuse and redistribution, such as the Creative Commons CC0-PDD or the Open Data Commons PDDL)
    4. Comparable and Interoperable (Standards-based data, in comparable units across different organizational units)

    ICANN has made a good start with 75+ data sets, although there are minor issues with the data which will hopefully be fixed in forthcoming versions. Additional data sets that could be released include:

    1. Policy Submissions by At-Large ALS members and Individual Members
    2. ICANN Meeting Participants by Region, Gender, Stakeholder Group
    3. ICANN Meeting Physical and Remote Participants by Session
    4. ICANN At-Large participants who had visa issues for different meetings
    5. ICANN Fellowship Participants and their continued presence at ICANN Public and Online AC/SO Meetings
    6. NomComm Applicants by Region, Gender and Position; Successful applicants on the same basis


      2-5 sound like you are asking for names of people. In a post-GDPR world, that is not likely.

      1. Thanks for the comment, Alan.

        Since we are looking at identifying patterns and trends, names and other personal data are not required.

        In any case, most Open Data datasets have identity information stripped from them or, in some cases, are actively anonymized. ICANN, keeping in mind not just GDPR but also other data protection frameworks including its own, is unlikely to default on this parameter.

      2. But isn't this data (including names) already mentioned in the 'ICANN data asset inventory' document above? For example, on page 3, #15.

        So, I agree with Sarah's comment and suggestion to include this GDPR-related concern to the public comment.

  4. I think that, among the data sets in the data asset inventory, ‘meeting related’ data (e.g. meeting registration data, meeting session data, meeting schedule app data) are important to publish. ICANN spends a lot of money on meetings, and it is really important to gain insights from meeting data. Providing a tool to generate insights is as important as publishing the data.

  5. Thank you, Alan Greenberg for the initial draft. Please see comments below:

    • Please edit paragraph 1/2: "...critique such a list that create it from scratch..." Change "that" to "than".

    I also believe that our statement should mention the Open Data Initiative in a post-GDPR world. You mentioned it in response to Satish, but it would be good to point it out in the statement.

    Also, to what extent (if at all) will users be allowed to input data into the portal? Some portals allow users to insert data while others only allow them to view it. Of most interest, though, will be the ability to interact with the data in the form of charts, graphs and tables, so that it is not another complex wiki/website.

    In addition to the fellowship data (as suggested by you) and meeting details (mentioned by Eranga), I would add public comments. How many responses were received, etc? I have seen such numbers revealed through mailing lists but have struggled to find the data on the website.


  6. It seems that Satish's comments are not fully integrated into the statement. Dataset 6 is not listed in your comments, just the Fellowship. In the other datasets, Satish is not asking for names, but for regions: for example, for NARALO, how many male/female candidates applied for the various positions. Currently all the positions are combined together, but the specifics are interesting and helpful in deciding which regions need more emphasis.

    Can we request the inclusion of the datasets that Satish has requested? I, too, am very interested in the tools that will be made available for the ICANN community to use to extract and process the data. More guidance on this would be helpful.



  7. Also, is this only on Open Data and not on the whole ITI initiative?

    1. I think ITI and ODI are two distinct projects, so this public comment process is unlikely to touch upon ITI.

  8. Thanks for the draft, Alan. I agree with the general direction of the draft.

    I am of the opinion that, as At-Large, we could be more specific on certain aspects (maybe on another appropriate occasion, if not this public comment). For instance, given our interest in diversity, we would be interested in information on region, gender, economy, disability status, etc., for different activities of interest to At-Large, such as policy contributions, direct & remote participation in meetings, visa difficulties and more.

    Most open data initiatives provide data sets, but not usually tools, which are left to the consumers of the data. In fact, any dataset that is tool-specific (for instance, one that can only be opened using software from a particular vendor) may not even be desirable. IMO, what we should ask for is data sets in open, vendor-neutral data formats that make any kind of downstream processing possible, together with a permissive license that legally allows the community to use the data in any way it proposes.

    1. We have an extension of a week, so please keep the comments coming.

      I will add some language specific to diversity.

      I do not believe I implied, or intended to imply, that the data would be tool-specific, that it must use provided tools, or that we should provide query tools in general. I asked about tools to be provided to our community. Perhaps I was not sufficiently clear.

  9. I'm adding a note to: 1) endorse the principle of producing and publishing more compliance results datasets under OD rules; 2) endorse the additional high-interest-to-At-Large datasets proposed by Satish; 3) endorse publication of the data schema in such a way as to make it tool-agnostic; and 4) endorse ICANN providing some kind of visual comparability tool for data on the portal.


    -Carlton

  10. I like the comment, especially as it supports comments we have made relating to other areas that are still relevant, e.g. the need to introduce more data relating to the Fellowship programme.

    There is a need for tools that we can use to collect metrics data to gauge the effectiveness of our outreach and engagement and other activities. Data becomes valuable when we can use this as evidence for regional strategic plans and as justification for associated funding requests, etc. 

  11. I am in agreement with most, if not all, of the comments made so far. In particular:

    • Alan's comment about a subsequent opportunity to critique an orderly list once priorities are established is a good one. It would be great if I*org could assist in generating such an orderly list, as well as suggestions on what downstream information can possibly be gleaned from each item on such a list. For example, could CCT-metrics raw data collected over time, with some massaging, allow a consumer of such data to understand what circumstances or events may have led to changes over time? This exercise would assist both I*org and consumers of data to identify whether the range of important data being collected is sufficiently complete and what related data can be collected or is available to explain changes in the data.
    • Review to balance privacy demands in publishing data, but not to the point of rendering the data worthless. Anonymizing approaches may or may not work; it will be case by case, I think. Perhaps another way is to place confidentiality obligations on data consumers who are specifically identified and authorised to receive data containing personal elements, or a limited license as Satish suggested.
    • Maureen's comment about the ability to collect metrics data: this should apply to all data of a transactional nature, i.e. data related to events for which there is a sequence of actions to be taken (contractual compliance as Alan highlighted, CCT-related complaints; even elements such as the average time it took to resolve are useful info) or for which some level of success/effectiveness needs to be measured for programme assessment and planning purposes.
    • John's comment about the ability to interrogate data easily is also a good one. Satish's point about the ODI's concentration on datasets rather than tools is correct, in my opinion. Further, as a novice data analyst myself, I think it would be difficult for I*org to develop a tool that would meet the needs of every data consumer, so perhaps limiting its scope to providing a basic tool would be reasonable. But more importantly, I would go one step back and inquire about the ability to easily search for and find what data is available to data consumers. As it is, the data collected by ICANN is published in various locations, and the ODI data asset inventory spreadsheet is the first collated source of data that has crossed my desk. This is a very useful directory!

  12. Dear all, I have taken the liberty of revising the first draft statement in an attempt to incorporate the contributions posted on this wiki page. I am, however, unable to post the said revision without losing the desired formatting, so I have sent my revision to Evin/At-Large staff to post on my behalf. I invite everyone here to review and comment on the revision, given that I may not have understood or fully represented your specific contribution. Thanks, Justine
  13. Thank you, Justine, for the original overview which was posted to our emails. I hope Evin puts the statement back online again soon. I think you not only synthesised really well the contributions that were posted but also clearly articulated what is important data and why, for ALAC and At-Large end-users, in a way that anyone could understand. Your statement could form the basis of an At-Large policy fact sheet for our ALS and individual members "What is useful ICANN data for At-Large?" 

  14. This is Justine's original document, reformatted:

    Revised Draft Statement

    The ALAC appreciates the opportunity to comment on ICANN’s Open Data Initiative. The ALAC applauds this ICANN initiative to keep the ICANN Community informed of the data it collects, and its resolve to publish collected data assets in as open a form as reasonably permissible.

    Centralized, easy access to properly organized data repository

    It is noted that the identified datasets are published at various locations. While the ALAC understands that different groups within the ICANN Community, and even within ICANN Org, have varying interests in and uses for different datasets, it recommends that all the datasets be published at a single, centralized online location that is easily accessible to all interested parties.

    Descriptions for each dataset should be specific and unambiguous, and perhaps supported by a simple keyword-based taxonomy that allows each dataset to be tagged, providing supplemental user-guided context to otherwise general descriptions. This would also make the datasets easier to understand and to search.

    Of great interest to the ALAC are the online means made available to query the collected data. While we appreciate that it may be difficult for ICANN to develop and provide a common tool that would satisfy the data querying and analysis needs of every group within the ICANN Community, the ALAC nevertheless proposes that ICANN make some effort to develop or license a universal tool that would enable the ICANN Community to undertake basic querying of user-selected datasets. Alternatively, the ALAC would appreciate it if ICANN could suggest readily available, cost-effective online tools for querying and analysing the datasets. Paramount to both approaches, however, and to the overall success of this initiative, is the continued adoption of the three dimensions of data openness, which the ALAC supports.

    Types and value of data collected, lack of discernable information

    While ICANN has made a laudable start with 231 named datasets, from the ALAC’s perspective it is difficult not only to identify those of most interest to our group, but also those which possess discernable derivative value.

    Certainly, ICANN meeting demographics and the data specifically associated with At-Large participants/members rank high on our list, as do those related to competition, consumer trust and consumer choice. But of greater interest to the ALAC is data that is not readily identifiable or discernable from the datasets listed in https://www.icann.org/en/system/files/files/odi-data-asset-inventory-11jun18-en.pdf.

    Most obvious is the lack of exhaustive data about contractual compliance and the actions it takes. This is arguably one of the most critical areas of ICANN’s operations, and other than some specific datasets compiled for the CCT Review, there appears to be nothing.

    Another example of interest to At-Large is data associated with the Fellowship. The URL listed implies that the only information to be provided is a list of fellows along with their country and interest area. Absent, however, are the demographics of the Fellowship applicants (i.e. those who succeeded versus those who did not). Such data are critical to informing us of the extent to which information about the Fellowship Programme is reaching certain parts of the world, which would in turn support data-driven corrective action (if necessary) and planning.

    Yet another example that is of interest to us is data associated with the membership of At-Large, in terms of participation rates.

    Taking the above-mentioned examples further, there is a need to identify and capture (if not already present) metrics-based downstream data for datasets where there is a sequence of actions to be taken, or where some level of success or effectiveness needs to be measured for programme assessment and planning purposes. For our purposes, downstream data that can certainly inform on the effectiveness of various programmes include, but are not limited to, the following:

    • Contractual compliance: measurements of corresponding action taken, time taken to resolve, patterns of non-compliance, plausible trigger events/reasons for non-compliance

    • CCT-related complaints: types of complaints, time to resolve, patterns of domain name abuse etc, plausible trigger events

    • Fellowship programme: participation metrics of returning fellows versus first-time or one-time fellows, transition from fellows to active community membership

    • Membership, related to ALS and individual members:

      • diversity metrics by country, region, gender, economy, disability status, etc.

      • participation metrics in At-Large in policy development, education & outreach activities, direct & remote participation in meetings

      • travel-related metrics such as difficulties in obtaining travel support, visas, difficulties with Travel Constituency etc

    Uniformity of and responsibility for data

    Understanding the methodology by which data of interest to us will be accumulated is also an important consideration. It should be noted that data which is or may be of interest to the ALAC currently resides in separate repositories, e.g. data collected and controlled exclusively by ICANN Org for ICANN operations versus data collected by ICANN staff for the ALAC, which resides, for all intents and purposes, behind the ALAC website and wiki (“the ALAC’s repositories”).

    In this context, some preliminary questions arise:

    • For the data that already exists on the web, are there conceivably duplicates of data residing in separate repositories?

    • Will new data continue to be collected and stored in the existing manner? If so, how will ICANN ensure that the separate repositories stay in sync with each other?

    • For the purposes of the open data platform, will ICANN Org be querying data in the ALAC’s repositories?  

    Privacy rights

    The ALAC supports the need to consider privacy rights and recognizes ICANN’s legal obligations in processing and publishing data containing personal elements, but cautions against withholding personal data to the point of rendering the data worthless. Anonymizing data may be called for even if such data is NOT made publicly available, and this approach should be applied in general.

    In very specific cases where personal data needs to be shared, and where withholding it would render the data worthless to a user, ICANN should consider placing confidentiality obligations on users who have been specifically identified and authorised to receive data containing personal elements, on a limited-license basis. For example, sharing and use of Fellowship participant data could be limited to the ALAC rather than At-Large as a whole.

    Conclusion

    Thus, it would be useful if ICANN Org could assist in re-generating a list of datasets with suggestions on what downstream or upstream information can possibly be gleaned from each dataset. The ALAC believes such an exercise would assist both ICANN Org and the ICANN Community to better understand whether the range of data being collected is sufficiently complete, what related data is available to explain changes in the data, and, where gaps exist, what data can and ought to be collected.

    Once a revised list of datasets is established, it should be submitted for public comment. It is far easier to critique such a list than create it from scratch.

    1. This formulation appears to take into account the comments made during the discussions, and looks good to me.

  15. It is due today. I suggest that Justine's document be submitted as the final draft to be voted on by the ALAC.

    1. It is due to be completed by us today and as the pen holder of record, I will do that.

      The ICANN Public Comment closes tomorrow.

  16. Great idea, Maureen Hilyard. I agree. I really like it.