

Testing of new translation tool for LACRALO mailing lists

ICANN Staff have created two mailing lists (New-transbot-en and New-transbot-es) with a select number of persons on those lists for testing purposes.

Some of the key changes implemented in the new translation tool:

  • The lack of punctuation was identified as a key cause of poorly translated emails. The translation tool can send only a limited number of characters to the Google Translate API at a time; without punctuation, the tool has to cut the text mid-sentence. One of the outcomes from the LACRALO translation WG was the "Proposed Notice when email is not translated" message, which would be sent to the user if the email had formatting issues.
  • Subject lines would not be translated, so that the conversation thread is not lost and the chance of garbled subject lines is reduced.
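
The punctuation requirement in the first point above can be illustrated with a minimal sketch. It assumes the tool groups whole sentences into chunks that fit under a per-request character limit; the function name `split_into_chunks` and the limit of 200 characters are hypothetical, since this page does not state the actual limit or how the tool is implemented.

```python
import re

# Hypothetical limit; the real per-request limit is not stated on this page.
MAX_CHARS = 200

def split_into_chunks(text, max_chars=MAX_CHARS):
    """Group whole sentences into chunks no longer than max_chars."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence) > max_chars:
            # No usable sentence boundary: the text could only be cut mid-sentence.
            raise ValueError("sentence exceeds limit; cannot split cleanly")
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

In this sketch, a long email without punctuation raises an error instead of being cut mid-sentence, which corresponds to the case where the sender would receive the "not translated" notice.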

 

Below are some of the bug reports/observations noted during testing:

 

| Date Added | Description | Status | Additional Notes |
|---|---|---|---|
| Nov 27 2014 | Emails sent to new-transbot-en resulted in a "Sentence punctuation must be followed by a space" error message | FIXED | "The Sentence punctuation error I believe was being caused by the footer being added to the list which is normally removed." |
| Dec 11 2014 | Periods aren't used when a URL is cited at the end of a sentence (or on its own line), since a trailing period would make it a different URL. However, this results in the "Sentence punctuation must be followed by a space" error message | | |
| Dec 12 2014 | The parser is unable to handle HTML formatted emails with URLs: the URLs are missing from the translation. See http://mm.icann.org/transbot2_archive/efc07cecca.html ; many emails from ALAC_announce use HTML formatting | | |
| Jan 30 2015 | The error message from transbot-no-reply@icann.org with the subject line "Unable to translate your email to ICANN lists" is generic. It gives no indication of which email the error refers to. | | |
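
The Dec 11 2014 observation suggests that URLs need to be excluded from the punctuation check. The following is only a sketch of one way to do that: the tool's actual rule is not documented on this page, and `punctuation_ok` and both regular expressions are hypothetical names introduced for illustration.

```python
import re

# Matches http/https URLs up to the next whitespace character.
URL_RE = re.compile(r"https?://\S+")
# Sentence punctuation immediately followed by a letter (no space in between).
BAD_PUNCT_RE = re.compile(r"[.!?][^\W\d_]")

def punctuation_ok(text):
    """Return True if sentence punctuation is never directly followed by a letter,
    ignoring any punctuation that appears inside URLs."""
    # Mask URLs first so the dots inside them are not treated as sentence ends.
    without_urls = URL_RE.sub("URL", text)
    return BAD_PUNCT_RE.search(without_urls) is None
```

With this approach, a line consisting only of a URL passes the check, while text such as "Hello.World" is still flagged.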

  

 

 

History

  • The LACRALO list in English: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/ and LACRALO list in Spanish: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/ were machine translated around September 2008. The translation engine was based on Systran.
  • The quality of the translation left a lot to be desired. There were also technical issues, such as the subject lines of translated emails becoming garbled.
  • During the LACRALO meeting in Cartagena in 2010, there were intense discussions regarding the quality of mailing list translations. Some wanted to disable the translation entirely. Others wanted to keep the imperfect translation for mobile users and argued that disabling translation would send the wrong signal to ICANN, when ICANN should be giving language services a higher priority. ICANN staff acknowledged the issues and stressed that IT staff were aware of them and would be looking to solve the translation problems.
  • Around May 2011, the translation engine was switched from Systran to Google Translate.
  • In May 2011, Google announced that it would be deprecating the (then) free Google Translate API (http://googlecode.blogspot.com/2011/05/spring-cleaning-for-some-of-our-apis.html) in a few months, but later (due to public outcry) announced that a paid version of the Translate API would be offered.
  • Other issues were identified around June 2011. One was that the translation engine would stop translating emails when they became too big. As a result, later emails in a conversation thread on one list were not translated and were missing from the other list. See http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2011/004308.html for the solution to this (the translation tool would stop translating quoted text).
  • Around August 2011, the translation engine was switched back to Systran due to Google rejecting calls to the Google Translate API. See http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2011/004368.html
  • Update to the translation engine: September 2011, http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2011/004498.html
  • June 2012, detailed analysis of translation issues (see below)
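
The June 2011 fix listed above (no longer translating quoted text) can be sketched as follows. This assumes that quoted text is marked with a leading ">" on each line, which is a common email convention but is not confirmed on this page; `strip_quoted_text` is a hypothetical name.

```python
def strip_quoted_text(body):
    """Drop lines that quote earlier messages (lines starting with '>')."""
    kept = [line for line in body.splitlines()
            if not line.lstrip().startswith(">")]
    return "\n".join(kept).strip()
```

Only the new text in a reply would then be sent for translation, which keeps long threads from growing past the size at which translation stopped.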

 

 

Background - Identification of LACRALO translation issues 

(this section is from a June 18 2012 email)

Dear David,

With the implementation of the custom machine translation tool for the Latin American and Caribbean Regional At-Large Organisation (LACRALO) mailing lists since mid last year (2011), several issues or factors continue to negatively impact the working of the machine translation tool and, in turn, have led to great difficulty in communication and collaboration between the English and Spanish speaking communities in the LAC region.

To recap, LACRALO has two mailing lists:

Emails in English sent to lac-discuss-en@atlarge-lists.icann.org are machine translated via your custom tool using Google Translate and posted to lac-discuss-es@atlarge-lists.icann.org.

Similarly, emails in Spanish sent to the lac-discuss-es@atlarge-lists.icann.org are translated and posted to lac-discuss-en@atlarge-lists.icann.org.

 

To date, several issues have been detected

1) Attachments in emails sent to a list are not received on the other list.

When an email with attachments such as PDFs is sent to one list, the subject line and body of the email are translated and sent to the other list BUT without the attachment.

2) Subject lines of translated emails from ES to EN become garbled.

The subject line of translated emails (seemingly) from the lac-discuss-ES list to the lac-discuss-EN list is often translated to garbled text.

Examples abound from a review of the archives.

 

As One Example

(a) First email posted to lac-discuss-en list :
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005932.html
Subject line: [lac-discuss-en] ICANN full list of applied for gTLD strings

(b) which is translated and posted to lac-discuss-es list as:
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004552.html
Subject line: Lista completa de la ICANN solicitó cadenas de gTLD

(c) Someone on the lac-discuss-es list responds, and the reply is posted to the lac-discuss-es list as:
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004553.html
Subject line: [lac-discuss-es] Lista completa de la ICANN solicitó cadenas de gTLD

(d) which is translated and posted to the en list as:
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005933.html
Subject: [lac-discuss-en] =? Iso-8859-1? Q? Lista_completa_de_la_ICANN_solici? == Iso-8859-1? Q? T = F3_cadenas_de_gTLD? =

Another example:

Email on lac-discuss-es list : http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004518.html
Subject line: [lac-discuss-es] RES: Alerta de Noticias de la ICANN - Aviso de Prórroga del período que abarca la ICANN: ICANN FY13 Proyecto de Plan Operativo y Presupuesto

gets translated and posted as an email on lac-discuss-en list: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005897.html
Subject line: [lac-discuss-en] =? Utf-8? Q? RES = 3A_Alerta_de_Noticias_de_la_ICANN_? == Utf-8? Q?-_Aviso_de_Pr = C3 = C3 = ADodo_que_abarca_la_ B3rroga_del_per =? == Utf-8? Q? ICANN 3A_ICANN_FY13_Proyecto_de_Plan_Operativo_y_Presupu =? == utf-8? q? this? =

Note the difference with "Utf-8? Q?" in this example as compared to "Iso-8859-1? Q?" in the previous example.

 

Such gibberish in the subject lines can get even worse if someone responds on the EN list and the translation further scrambles the subject line on the other list.

 

Again, examples abound from a review of the archives but as one example, consider the subject line for an email on lac-discuss-es list


Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004039.html
Subject line: [lac-discuss-es] =? Iso-8859-1? Q? Invitación = F3n_a_la_reuni = F3n_/_LAC? == Iso-8859-1? Q? RALO_Costa_Rica_Eventos_rueda_de_prensa_Grupo_de_Tr? == Iso-8859-1? Q? Abajo_el_martes_06_de_marzo_2012_a_las_20 = 3A00_UTC? =

which gets translated and posted to the EN list as
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005357.html
Subject line: [lac-discuss-en] =? Iso-8859-1? Q? = 3D = 3F_Iso-8859-1 = 3F_Q = 3F_Invitac? == Iso-8859-1? Q? I = F3n_ = 3D_F3n = 5FA = 5Fla = 5Freuni_ = 3D_F3n = 5F / = 5FLAC = 3F? == iso-8859-1? q? _ = 3D = 3D_Iso-8859-1 = 3F_Q = 3F_RALO = 5FCosta = 5FRica = 5FEv? == iso-8859-1? q ? ents = 5Frueda = 5Fde = 5Fprensa = 5FGrupo = 5Fde = 5FTr = 3F_? == iso-8859-1? q? = 3D = 3D_Iso-8859-1 = 3F_Q = 3F_Abajo = 5Fel 5Fmartes = 5F06 =? == iso-8859-1? q? = 5Fde = 5Fmarzo = 5F2012 = 5FA = 5Flas = 5F20_ = 3D_3A00 = 5FUTC? == iso-8859-1? q? = 3F_ = 3D? =

 

3) Missing [lac-discuss-es] in subject lines of translated emails posted to the lac-discuss-es list


Consider example #1 again -
First email posted to lac-discuss-en list :
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005932.html
Subject line: [lac-discuss-en] ICANN full list of applied for gTLD strings

which is translated and posted to lac-discuss-es list as:
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004552.html
Subject line: Lista completa de la ICANN solicitó cadenas de gTLD

The email at http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004552.html shows that the subject line is missing the [lac-discuss-es]. This hampers filtering by ES users and makes it difficult to track threaded conversations.

4) Unusual superscript and other odd characters in translated emails

There have been numerous complaints about the quality of the translation of the actual body of emails with strange characters, some of which are superscript characters appearing in the translated version.
Examples abound (repeating phrase, I'm afraid) on the LACRALO list archives; here is one example:
Email to lac-discuss-en : http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005858.html
got translated to this on the lac-discuss-es list: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004483.html

As you can see,
* a character like a double quote " is translated to "
* a word like "organisation" is translated to organización
* a sentence like "The highest decision making body in any organisation is also subject to rules." is translated to
"El más alto órgano de decisión en cualquier la organización también está sujeto a reglas."

 

This is a summary of the key issues affecting the machine translation of emails in LACRALO. I hope to have the opportunity to chat with you in Prague and trust the identification of the issues in this email is sufficient to clarify the problems so that solutions can be developed.


Kind Regards,

Dev Anand Teelucksingh
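
The garbled subject lines in issues 2 and 3 of the email above resemble RFC 2047 encoded words (for example `=?iso-8859-1?Q?...?=`) with spaces inserted, which would be consistent with the encoded header being passed through the translator as ordinary text. That is an interpretation, not something the email confirms. Under that assumption, a minimal sketch using Python's standard `email.header` module shows how a subject can be decoded before any processing, re-encoded afterwards, and given the target list's tag. The helper names `decode_subject`, `encode_subject` and `retag` are hypothetical.

```python
from email.header import Header, decode_header, make_header

def decode_subject(raw_subject):
    """Turn an RFC 2047 encoded Subject header into plain Unicode text."""
    return str(make_header(decode_header(raw_subject)))

def encode_subject(text, charset="utf-8"):
    """Encode Unicode text back into a header-safe RFC 2047 form."""
    return Header(text, charset).encode()

def retag(subject, source_tag, target_tag):
    """Remove the source list tag and make sure the target list tag is present."""
    # Drop the source tag and collapse any leftover double spaces.
    subject = " ".join(subject.replace(source_tag, "").split())
    if not subject.startswith(target_tag):
        subject = f"{target_tag} {subject}"
    return subject
```

For instance, `decode_subject("=?iso-8859-1?Q?Lista_completa_de_la_ICANN_solicit=F3_cadenas_de_gTLD?=")` yields the readable Spanish subject from the first example, and `retag` would add the `[lac-discuss-es]` prefix that issue 3 reports as missing.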

 


 
