Jump to the Background for the history and background of this issue.


Bug Reports/Observations with the new translation engine (updated January 2019)

Below are some of the bug reports/observations noted during testing. 


 

Description of Issue | Noted by | Date Added | Status | Additional Notes on testing, fixes
1

Better identification of

  • which email gives the transbot problems,
  • what and where in the email gives the transbot problems

in its error response emails.

When the translation tool has issues with the email, an email is sent from transbot-no-reply@icann.org with
the static subject line "Unable to translate your email to ICANN lists" and a template text like

Dear <sender>

Thank you for your participation in the ICANN email list new-transbot-en.
You are getting this email because we were unable to translate your post automatically.
It violated one or more of the formatting rules we must impose to make translation possible.

A complete description of the formatting rules is available at:
https://community.icann.org/x/aYtEAg

In preparing your post for translation, we found the following format violations in your message:
    <issue, usually Sentence punctuation must be followed by a space>

Please edit your post and send it again.
Thank you.


The problem is that the error email does not identify which email it refers to. Perhaps the subject line of the problem email should be included, either in the subject line or in the body of the error email.

The format violations message doesn't say WHERE the error is in the email. If it's a long email, how can users identify and correct such issues? People who get these messages and can't easily see where the problem is aren't likely to learn how to write future emails better, and the warning messages become more of a hindrance.

Perhaps a workaround is to include the text of the email in the error email and identify which section the transbot has issues with.
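
As a rough illustration of the workaround suggested above, the sketch below reports the line and column of each suspected violation along with the subject of the affected email. It is not the transbot's code: the regular expression is only a hypothetical approximation of the rule "Sentence punctuation must be followed by a space", and the function names and report wording are invented for this example.

```python
import re
from typing import List, Tuple

# Hypothetical approximation of the rule "Sentence punctuation must be
# followed by a space": flag . ! or ? immediately followed by a letter.
VIOLATION = re.compile(r"[.!?](?=[A-Za-z])")


def locate_violations(body: str) -> List[Tuple[int, int, str]]:
    """Return (line number, column, line text) for each suspected violation."""
    found = []
    for lineno, line in enumerate(body.splitlines(), start=1):
        for match in VIOLATION.finditer(line):
            # Columns are 1-based so they match what an editor displays.
            found.append((lineno, match.start() + 1, line))
    return found


def format_report(subject: str, body: str) -> str:
    """Build the part of an error email that says which email and where."""
    lines = [f"Subject of the affected email: {subject}"]
    for lineno, col, text in locate_violations(body):
        lines.append(f"  line {lineno}, column {col}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Hello all.\nPlease review this.Thanks for reading."
    print(format_report("Test message", sample))
```

A rule this naive would also flag a bare domain name such as icann.org, so a real implementation would need an explicit exemption for such cases.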


Other questions :

  • Why is the test required? What is this test trying to solve?
  • How are domain names handled? Since domain names can't contain spaces, do domain names that don't begin with http:// trigger the error?

  

updated  

Status: IN PROGRESS

 Tool now identifies the subject line of the problem email in the body of the email

2

<DNT> tag handling is case-sensitive.

Using <DNT> tag works to not translate text, but <dnt> does not.
The tool should be able to treat the DNT tag, regardless of case.
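
One way to treat the tag regardless of case is a case-insensitive regular expression. The sketch below is an illustration only: it assumes protected text is wrapped in a <DNT>...</DNT> pair (the closing-tag form is not confirmed on this page), and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: protected text is wrapped as <DNT>...</DNT>. The closing-tag
# form is a guess for this sketch. re.IGNORECASE makes <dnt>, <Dnt> and
# <DNT> equivalent; re.DOTALL lets a protected span cover several lines.
DNT_PATTERN = re.compile(r"<dnt>(.*?)</dnt>", re.IGNORECASE | re.DOTALL)


def split_protected(text: str) -> List[Tuple[str, bool]]:
    """Split text into (segment, protected) pairs, dropping the tags."""
    parts = []
    pos = 0
    for match in DNT_PATTERN.finditer(text):
        if match.start() > pos:
            parts.append((text[pos:match.start()], False))
        parts.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):
        parts.append((text[pos:], False))
    return parts


if __name__ == "__main__":
    print(split_protected("Visit <dnt>ICANN</dnt> today"))
```

Only the segments marked False would be sent for translation; the protected segments would be copied through unchanged.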

07/03/2017

3

The <DNT> tag will be seen in the original email but not in the translated one.

See EN and ES messages.

Q: Should the <DNT> tags be removed in the original email? I'm thinking it should

07/03/2017
Status: ON HOLD

To achieve this, it would be necessary to re-architect the transbot code.

Upon investigating the issue it was discovered this is a known limitation, rather than a bug in the existing code.

The request has been recorded as something to consider in a future iteration of the translation tool.

4

ES translated emails seem to remove several line breaks from the EN
See EN and ES messages
 


5

Re: attachments, new-transbot lists have a message and attachment limit of 200K

Given many PDFs will be larger, will be hard to test unless message size limit is raised

Tested with a smaller attachment, the attachment does go through. See EN and ES

08/03/2017

Status: FIXED

The new-transbot list email size limit has been increased to 400K.

This is enforced for the entire email, including text and attachment.

6

Handling an email sent to both new-transbot-en and new-transbot-es lists at the same time.

When such an email to both lists happens, some emails don't get translated.

09/03/2017














Fixed Bug Reports/Observations with the new translation engine (updated January 2019)



Description of Issue | Noted by | Date Added | Status | Additional Notes on testing, fixes

A decimal number in an email, e.g. 3.9MB, will trigger the error message

"Sentence punctuation must be followed by a space"
See EN email

  

Status: FIXED

Tool now handles numbers with decimal points.
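
To illustrate why a decimal such as 3.9MB could trip a punctuation check and how an exemption might look, the sketch below compares a naive rule with a digit-aware one. Both patterns are assumptions made for this example; the actual rule and fix used by the translation tool are not documented here.

```python
import re

# Naive rule (assumed): any period directly followed by a non-space character.
NAIVE = re.compile(r"\.(?=\S)")

# Digit-aware rule (assumed): the same, except that a period sitting between
# two digits is treated as a decimal point and ignored.
DIGIT_AWARE = re.compile(r"(?<!\d)\.(?=\S)|\.(?=[^\s\d])")


def violates(text: str, pattern: "re.Pattern[str]") -> bool:
    """Return True if the text breaks the given punctuation rule."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ("The file is 3.9MB.", "Done.Next steps follow."):
        print(sample, violates(sample, NAIVE), violates(sample, DIGIT_AWARE))
```

With these patterns the decimal in "3.9MB" is flagged only by the naive rule, while a genuinely missing space after a sentence is flagged by both.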

Re: attachments, new-transbot lists have a message and attachment limit of 200K

Given many PDFs will be larger, will be hard to test unless message size limit is raised

Tested with a smaller attachment, the attachment does go through. See EN and ES

08/03/2017
Dev Anand Teelucksingh
07/03/2017

2

Dev Anand Teelucksingh
07/03/2017
Status: ON HOLD

To achieve this, it would be necessary to re-architect the transbot code.

Upon investigating the issue it was discovered this is a known limitation, rather than a bug in the existing code.

The request has been recorded as something to consider in a future iteration of the translation tool.

3

ES translated emails seem to remove several line breaks from the EN
See EN and ES messages
Dev Anand Teelucksingh   
4

Re: attachments, new-transbot lists have a message and attachment limit of 200K

Given many PDFs will be larger, will be hard to test unless message size limit is raised

Tested with a smaller attachment, the attachment does go through. See EN and ES

Dev Anand Teelucksingh
08/03/2017

Status: FIXED

The new-transbot list email size limit has been increased to 400K.

This is enforced for the entire email, including text and attachment.

5

Handling an email sent to both new-transbot-en and new-transbot-es lists at the same time.

When such an email to both lists happens, some emails don't get translated.

Dev Anand Teelucksingh,
based on testing from Alfredo Calderon
09/03/2017  
6

Dev Anand Teelucksingh

Status: IN PROGRESS

 Tool now identifies the subject line of the problem email in the body of the email

7

A decimal number in an email, e.g. 3.9MB, will trigger the error message

"Sentence punctuation must be followed by a space"
See EN email

Dev Anand Teelucksingh  

Status: FIXED

Tool now handles numbers with decimal points.

...

The new-transbot list email size limit has been increased to 400K.

This is enforced for the entire email, including text and attachment.










Best practices and other notes on the new translation engine (March 2017)

  • As a best practice, emails in Spanish should be sent to the -es list only, and emails in English should be sent to the -en list only. Unusual behavior has been observed when an email is sent to both the -es and -en lists at the same time. In some cases, this results in one list receiving one un-translated and two translated emails; in other cases, the lists receive only un-translated emails. Until we have a solution for this, sending emails to only one list will reduce the risk of errors.
  • Formatting changes by email clients have been observed in translated emails. In these cases, the email engine delivered the email to the translation tool in a different format than seen in the original. Format changes have been inconsistent and generally minor, not impacting the email content. Avoiding extensive formatting and using plain text in the original email has been shown to minimize the impact on formatting.
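
The first best practice above could be checked automatically before a message is processed. The sketch below is a hedged illustration using Python's standard email package; the list addresses under example.org are placeholders (this page does not give the lists' full addresses), and the function names are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses
from typing import Set

# Placeholder addresses: the real list domains are not stated on this page.
EN_LIST = "new-transbot-en@example.org"
ES_LIST = "new-transbot-es@example.org"


def recipients(raw_email: str) -> Set[str]:
    """Collect the lower-cased addresses from the To and Cc headers."""
    msg = message_from_string(raw_email)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(headers) if addr}


def sent_to_both_lists(raw_email: str) -> bool:
    """Return True if the message is addressed to both the -en and -es lists."""
    rcpts = recipients(raw_email)
    return EN_LIST in rcpts and ES_LIST in rcpts


if __name__ == "__main__":
    raw = (
        "To: new-transbot-en@example.org\n"
        "Cc: new-transbot-es@example.org\n"
        "Subject: test\n\nHello"
    )
    print(sent_to_both_lists(raw))
```

A message that trips this check could be bounced with a note asking the sender to post to one list only.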

...


Background

A custom machine translation tool for the Latin American and Caribbean Regional At-Large Organisation (LACRALO) mailing lists was implemented around early 2011. However, several issues continue to negatively impact the working of the machine translation tool and, in turn, have led to great difficulty in communication and collaboration between the English- and Spanish-speaking communities in the LAC region.


See the presentation (LACRALO-Mailing-List-Presentation-ICANN53.pdf) by the At-Large Technology Taskforce Working Group for the ICANN53 meeting. The working group has worked to identify the issues and continues to follow this issue with ICANN Staff.

 


To recap, LACRALO has two mailing lists

...

Examples abound from a review of the archives.

 


As One Example

(a) First email posted to lac-discuss-en list :
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005932.html
Subject line: [lac-discuss-en] ICANN full list of applied for gTLD strings

(b) which is translated and posted to lac-discuss-es list as:
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004552.html
Subject line: Lista completa de la ICANN solicitó cadenas de gTLD

(c) Someone on the lac-discuss-es list responds, and the response is posted to the lac-discuss-es list as:
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004553.html
Subject line: [lac-discuss-es] Lista completa de la ICANN solicitó cadenas de gTLD

(d) which is translated and posted to the en list as:
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005933.html
Subject: [lac-discuss-en] =? Iso-8859-1? Q? Lista_completa_de_la_ICANN_solici? == Iso-8859-1? Q? T = F3_cadenas_de_gTLD? =

Another example:

Email on lac-discuss-es list : http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004518.html
Subject line: [lac-discuss-es] RES: Alerta de Noticias de la ICANN - Aviso de Prórroga del período que abarca la ICANN: ICANN FY13 Proyecto de Plan Operativo y Presupuesto

gets translated and posted as an email on the lac-discuss-en list: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005897.html
Subject line: [lac-discuss-en] =? Utf-8? Q? RES = 3A_Alerta_de_Noticias_de_la_ICANN_? == Utf-8? Q?-_Aviso_de_Pr = C3 = C3 = ADodo_que_abarca_la_ B3rroga_del_per =? == Utf-8? Q? ICANN 3A_ICANN_FY13_Proyecto_de_Plan_Operativo_y_Presupu =? == utf-8? q? this? =

Note the difference with "Utf-8? Q?" in this example as compared to "Iso-8859-1? Q?" in the previous example.
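
The fragments "Iso-8859-1? Q?" and "Utf-8? Q?" look like pieces of RFC 2047 encoded words, which have the form =?charset?encoding?text?= and may not contain spaces. The sketch below, using Python's standard email.header module, shows that an intact encoded word decodes to a readable subject, while the subject from example (d) above, with its inserted spaces, no longer decodes. The intact string is a reconstruction made for illustration, not a header taken from the archives.

```python
from email.header import decode_header, make_header


def decode_subject(raw: str) -> str:
    """Decode any RFC 2047 encoded words found in a raw subject header."""
    return str(make_header(decode_header(raw)))


# Reconstructed for illustration: how the subject might have looked before
# spaces were inserted (an assumption, not a header from the archives).
intact = "=?iso-8859-1?Q?Lista_completa_de_la_ICANN_solicit=F3_cadenas_de_gTLD?="

# The subject as archived in example (d), spaces included.
mangled = (
    "=? Iso-8859-1? Q? Lista_completa_de_la_ICANN_solici? "
    "== Iso-8859-1? Q? T = F3_cadenas_de_gTLD? ="
)

if __name__ == "__main__":
    print(decode_subject(intact))   # readable Spanish subject
    print(decode_subject(mangled))  # comes back unchanged: nothing to decode
```

Because the inserted spaces break the =?charset?encoding?text?= pattern, a mail client finds no encoded word and displays the raw text, which would explain the gibberish seen in the archives.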

 


Such gibberish in the subject lines can get even worse if someone responds on the EN list and the translation further scrambles the subject line on the other list. 



Again, examples abound from a review of the archives but as one example, consider the subject line for an email on lac-discuss-es list


Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-es/2012/004039.html
Subject line: [lac-discuss-es] =? Iso-8859-1? Q? Invitación = F3n_a_la_reuni = F3n_/_LAC? == Iso-8859-1? Q? RALO_Costa_Rica_Eventos_rueda_de_prensa_Grupo_de_Tr? == Iso-8859-1? Q? Abajo_el_martes_06_de_marzo_2012_a_las_20 = 3A00_UTC? =

which gets translated and posted to the EN list as
Email: http://atlarge-lists.icann.org/pipermail/lac-discuss-en/2012/005357.html
Subject line: [lac-discuss-en] =? Iso-8859-1? Q? = 3D = 3F_Iso-8859-1 = 3F_Q = 3F_Invitac? == Iso-8859-1? Q? I = F3n_ = 3D_F3n = 5FA = 5Fla = 5Freuni_ = 3D_F3n = 5F / = 5FLAC = 3F? == iso-8859-1? q? _ = 3D = 3D_Iso-8859-1 = 3F_Q = 3F_RALO = 5FCosta = 5FRica = 5FEv? == iso-8859-1? q ? ents = 5Frueda = 5Fde = 5Fprensa = 5FGrupo = 5Fde = 5FTr = 3F_? == iso-8859-1? q? = 3D = 3D_Iso-8859-1 = 3F_Q = 3F_Abajo = 5Fel 5Fmartes = 5F06 =? == iso-8859-1? q? = 5Fde = 5Fmarzo = 5F2012 = 5FA = 5Flas = 5F20_ = 3D_3A00 = 5FUTC? == iso-8859-1? q? = 3F_ = 3D? =

...

Testers of new translation tool for LACRALO mailing lists


 

 

 

 

...

 



...




Previous bug reports against older version of the translation tool.

 

Status | Date Added | Description | Additional Notes

Status: FIXED
28 Feb 2016

Subject line in body of translated email has the sender and first line of the email on the same line, when the sender and first line should be on separate lines. Two examples:

The converted body text was included immediately after the subject and from line in translated emails. A new line character was inserted between sender name and first body text line to separate them.

Initial testing complete; additional testing in progress.

Example: https://community.icann.org/x/AofDAw

 

 



(Noted by Satish Babu)
Status: FIXED
28 Feb 2016

There is an empty space in the beginning of most (but not all) lines in the translated email

After investigation it appears empty spaces may have been added by the email client or by Google Translate API.

Space removed before all translated lines in mailing list emails to resolve the issue.

Tested with mail IDs Outlook, Yahoo, Gmail. Verified that translated emails are not indented; empty spaces are not appearing at the beginning of lines.

Example: https://community.icann.org/x/vYPDAw

(Noted by Satish Babu)
Status: FIXED
28 February 2016

At the end of the message, the sender's name starts with a lower case letter ('dev Anand'), although it is 'Dev Anand' in the original message
EN (original): http://mm.icann.org/pipermail/new-transbot-en/2016-February/000080.html ES (translated): http://mm.icann.org/pipermail/new-transbot-es/2016-February/000068.html

Google Translate understands 'Dev' as an abbreviation and as a rule converts it to all lowercase.

Name was hardcoded to fix 'Dev' to begin with a capital letter.

Tested with mail IDs Outlook, Yahoo, Gmail. Verified that name is appearing correctly with first letter capitalized.

Example: https://community.icann.org/x/ooPDAw

(Noted by Satish Babu)
Status: FIXED
28 February 2016

'Transbot' is mis-spelled as 'tansbot' (third line from the bottom)
EN (original): http://mm.icann.org/pipermail/new-transbot-en/2016-February/000080.html ES (translated): http://mm.icann.org/pipermail/new-transbot-es/2016-February/000068.html

Misspelling was hardcoded. Applied hardcode fix to correct spelling of 'tansbot' to 'transbot' in translated emails.

Tested with mail IDs Outlook, Yahoo, Gmail. Verified that spelling is now appearing correctly as 'transbot.'

Example: https://community.icann.org/x/oILDAw

Status: FIXED
April 17 2016

The transbot can't handle the cedilla. Because the At-Large Staff signature lists a staff member with a cedilla in her name, any message from At-Large Staff will not be translated.
See

The April thread with the subject line "CALL FOR MEMBERS: At-Large Public Interest Working Group" on EN : http://mm.icann.org/pipermail/new-transbot-en/2016-April/thread.html and ES : http://mm.icann.org/pipermail/new-transbot-es/2016-April/thread.html showed how the issue was isolated after several variations of the original email were tried.

This is a critical bug, as any cedilla in any word in an email would result in the email not being translated.

The issue has been resolved specific to the reported cases of broken emails caused by the cedilla character in the AL Staff signature.

As a larger issue, it is still in progress. Efforts around the reported case led to wider investigation into how the transbot and email applications handle Unicode characters. This is important UTF-8 compliance work and requires extensive testing.

Recent tests with a wider set of characters using Outlook have been successful. Tests with those same characters have been inconsistent with Gmail and Yahoo. The team is continuing to research, test, and make progress.

Examples: https://community.icann.org/x/pYrDAw

Status: FIXED
April 17 2016

The phrase "This Working Group is open to interested members of the At-Large community." gets translated to

"Este grupo de trabajo está abierto a los miembros interesados \u200b\u200bde la comunidad de alcance."

Not sure why it repeatedly happens for that phrase: See
EN: http://mm.icann.org/pipermail/new-transbot-en/2016-April/000090.html
ES: http://mm.icann.org/pipermail/new-transbot-es/2016-April/000078.html

The issue was related to zero-width space, which was being injected by Google Translate API.

A zero-width space is used after characters that aren't followed by a visible space, but after which there may be a line break. It is the Unicode character U+200B and was appearing in the translated text as \u200b.

To fix, it was replaced in the translated text with no space.
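
A minimal sketch of such a replacement is shown below. It is an assumption about how the fix could be written, not the tool's actual code: it removes both the real U+200B character and the literal six-character text \u200b, since the translated phrase above shows the escape sequence appearing as visible text.

```python
ZERO_WIDTH_SPACE = "\u200b"   # the actual Unicode character U+200B
LITERAL_ESCAPE = "\\u200b"    # the six visible characters \u200b


def strip_zero_width_spaces(text: str) -> str:
    """Remove zero-width spaces, whether real or left as a literal escape."""
    return text.replace(ZERO_WIDTH_SPACE, "").replace(LITERAL_ESCAPE, "")


if __name__ == "__main__":
    translated = "los miembros interesados \\u200b\\u200bde la comunidad"
    print(strip_zero_width_spaces(translated))
```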

Tested with mail IDs Outlook, Yahoo, Gmail. Verified that phrase is translated correctly without additional characters.

Example: https://community.icann.org/x/t4PDAw

 

  



Email subject lines can get jumbled and distorted along threads of translated emails

This issue is related to extra spaces in subject lines, and research shows it is a known issue with Microsoft Office/Mail-man server. There is no known resolution at this time. As a workaround, the new test lists were designed so that subject lines aren't translated and the original subject lines are retained.
  
 



Attachments are not retained on translated emails

Changes were made to support attachments on translated emails. The file formats that will be retained between lists are TXT, PDF, WORD, JPEG, PPT, PNG, GIF

 

 

...