Tuesday, November 30, 2010

How confident is Google about its language detection?


Google provides a very handy online tool for language detection: http://www.google.com/uds/samples/language/detect.html. After you enter some text and hit the Detect Language button, the result appears below: the language Google thinks it is, whether Google considers that result reliable, and how confident it is.

The confidence level is a value between 0 and 1, so 0.08 means Google is 8% confident.

I played with it and found some interesting results. First, I tried the word bell. Google thought it was English with a confidence level of 4.75%. Then I tried Jingle bells. Surprisingly, the two English words together got a lower confidence level: 1.36%. Taken alone, jingle was detected as English with a confidence level of only 0.27%, and bells (the added s lowered it from the 4.75% of bell) got 1.62%. So the confidence for Jingle bells is roughly that of bells minus (yes, minus, not plus) that of jingle: 1.62% - 0.27% = 1.35%.

Let us continue --
  • Jingle bells, jingle bells, (1.36%. Repetition does not increase the confidence.)
  • Jingle all the way; (32.91%)
  • Oh! what fun it is to ride (59.7%)
  • In a one-horse open sleigh. (34.12%. The confidence drops.)
  • the whole thing (Jingle bells, jingle bells, Jingle all the way; Oh! what fun it is to ride In a one-horse open sleigh.) is 81.57%.
So, here are the rules --
  • plural form (or other forms) lowers the confidence;
  • more words may lower the confidence;
  • repetition does not increase the confidence;
  • your input history does not help Google to build up the confidence;
  • overconfidence is bad: Google is not 100% confident about any input, so it stays conservative.

Monday, November 15, 2010

Whole-page translation problem in the early versions of Google Dictionary and Google Translate


(The issue is fixed in version 1.2.1.)

Although I had tested all the supported languages with the whole-page translation feature of the Google Dictionary and Google Translate extension for Firefox, the reviewers said that whole-page translation did not work at all. This puzzled me for quite some time, and I rewrote most of that part of the code.

The last error they got was the exception "this.c[0] is undefined" from "https://translate.googleapis.com/translate_static/js/element/main.js Line: 96". It was raised inside Google Translate's own code. I really appreciate that they sent me this information, as it gave me a hint that Google Translate was expecting something from the request. What could it be? I had no clue.

That day, when I was checking the add-on's statistics, I suddenly found that some browsers had their locale set to "Null". What??

The Google Dictionary and Google Translate extension tries to auto-detect the user's language if he/she has not set it, but this feature was not very robust. By setting the browser's locale (general.useragent.locale in the configuration) to "Null" (a string), I could reproduce the same error the reviewers got. Why would some browsers have such a strange locale? It is still a mystery to me. I soon found out that not only "Null", but also many of the locales that have flavors could cause the problem.

It is always easier to fix a bug than to find it. After I strengthened the language auto-detection feature in version 1.2.1, the problem was gone. It seems that language auto-detection works only on the Windows platform, but at least the locale, whatever its value, no longer bothers the add-on.


Wednesday, November 10, 2010

Crash, core dump and gdb


When a Linux program crashes, a core file may be generated in the directory where you ran the program. A core file contains very useful information to help you debug the program. If there is no file named core.XXXX, the maximum size of core files may be set to 0. Check it with this command:

    $ ulimit -c

If the result is 0, set a size for it, e.g.

    $ ulimit -c 100000
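
The check and the fix above can be combined into one small shell function. This is only a sketch: the function name enable_core_dumps is made up for illustration, and 100000 is simply the example value used above. It also assumes the hard limit allows raising the soft limit; if it does not, the function reports that instead of aborting.

```shell
# Hypothetical helper: raise the soft core-file size limit only when it is 0.
# 100000 is the example value from the text; in bash (outside POSIX mode)
# the unit for "ulimit -c" is 1024-byte blocks.
enable_core_dumps() {
    current=$(ulimit -c)
    if [ "$current" = "0" ]; then
        # Raising the limit fails when the hard limit is also 0.
        if ! ulimit -c 100000 2>/dev/null; then
            echo "cannot raise core limit (hard limit: $(ulimit -H -c))" >&2
            return 1
        fi
    fi
    echo "core file size limit: $(ulimit -c)"
}
```

Call it in the same shell that will start the program, because the limit applies to the current shell and the processes it launches.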

If the core file is generated, you can run gdb to examine it and see why and where the crash happened.

    $ gdb /path/to/your/program core.XXXX

If your source code is somewhere else, include the location of the source code in the gdb command line.

    $ gdb -d /path/to/source-code /path/to/program core.XXXX

gdb brings you to the crash location in your source code. If you want to see a backtrace, run the gdb command bt.

    (gdb) bt
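
If you only want the backtrace, the two steps can be run non-interactively. The sketch below relies on gdb's -batch and -ex options; the function name print_backtrace and its return codes are made up for illustration, and the two arguments stand for the program and core file paths used above.

```shell
# Hypothetical helper: print the backtrace from a core file and exit.
# Usage: print_backtrace /path/to/your/program core.XXXX
print_backtrace() {
    program=$1
    core=$2
    if [ ! -x "$program" ]; then
        echo "not an executable: $program" >&2
        return 2
    fi
    if [ ! -r "$core" ]; then
        echo "core file not readable: $core" >&2
        return 2
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found in PATH" >&2
        return 3
    fi
    # -batch makes gdb exit after running the commands given with -ex.
    gdb -batch -ex bt "$program" "$core"
}
```

Checking the arguments first keeps the error message short when the core file was never written, which is the case the ulimit setting above is meant to prevent.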

 