Sunday, December 19, 2010

Why Google Ngram isn't ready for prime time

I read about Google Ngram Viewer on reddit a week or two ago, and immediately found it fascinating. Imagine that! A tool to let you see when a word achieved currency and how its popularity changed over time compared to other words! I predict that this will become an essential tool for people who are interested in language, both privately and professionally.

Leaving aside the rather noisy section before around 1650 (small sample size, mis-dated books and poor print quality make this a rough patch), it still has some problems. Let's see, for example, when people started talking about singularities:


Wow, nothing before 1800, except a bit of early chatter? How strange. Here's the first quirk of Ngram: it's case-sensitive, and English used to do what German still does today, which is to capitalize most nouns. Let's add the graph of "Singularity" to the mix:


Fascinating! Now we get the early uses, but there's a really weird gap in the usage levels. Surely mathematics didn't go through a frightful drought in the late 1700s? Here's where a bit of detective work is needed. English printers used the "long s" (ſ) until around 1800:


So now we can build the full picture:


So, to make Ngram work properly, Google needs to add:
  • A case-insensitive version
  • Ideally, better recognition of variant letter forms, or at leaſt a warning for miſsing variants.
  • An option to sum different forms (for example, to show a graph of the sum of "singularity" and "ſingularity" versus "infinity").
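
To make the wish list a bit more concrete, here is a minimal Python sketch of folding case and long-s variants together and summing their counts. Everything in it is an assumption for illustration: the function names (long_s_form, variant_forms, summed_counts) are made up, the per-year counts are passed in as a plain dictionary rather than read from any real Google data format, and the long-s rule (every lowercase "s" except a word-final one becomes "ſ") is a simplification of what printers actually did.

```python
from collections import Counter

LONG_S = "\u017f"  # the "long s" character: ſ


def long_s_form(word):
    """Replace every lowercase 's' except a word-final one with the long s.

    This is a simplified rule; real typographic practice had more exceptions.
    """
    if len(word) < 2:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", LONG_S) + last


def variant_forms(word):
    """Return the lowercase, capitalized and long-s spellings of a word."""
    lower = word.lower()
    forms = {lower, lower.capitalize()}
    forms |= {long_s_form(form) for form in forms}
    return forms


def summed_counts(word, counts_by_form):
    """Add up per-year counts over all variant spellings of a word.

    counts_by_form maps a spelling to a {year: count} dictionary
    (a hypothetical input format, not anything Ngram provides).
    """
    total = Counter()
    for form in variant_forms(word):
        total.update(counts_by_form.get(form, {}))
    return dict(total)
```

Calling summed_counts("singularity", counts) would then combine whatever counts exist for "singularity", "Singularity" and "ſingularity" into a single series that can be compared against the summed series for "infinity".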

Wednesday, November 17, 2010

Samsung are actually quite cool!

I bought a cheap Samsung all-in-one printer yesterday, and not having much time, didn't go reading all over the web for Linux installation instructions.  I just downloaded their Linux driver pack from the website (a .tar.gz file, about 30 MB!) and read through the install script.  It looked quite friendly, so I ran it as root (!!!) on my Debian Lenny box.

It installed the right PPD file for CUPS, checked that normal users were in the correct user group to be able to print, set up SANE correctly and so on.

As part of my testing, I tried "ls | lpr" - WTF, a GUI?

It turns out it installs its own GUI version of lpr, but renames the old one to "lpr.orig".  So lpr.orig works as normal, and lp still works normally, but when you print things from OpenOffice or xpdf with the default command, you get a typical "printer settings" screen, which isn't a bad thing for normal users.

There's also a GUI "printer settings" application which for some reason it puts on the KDE desktop (oh well), and if you do "print test page" it actually prints out a CUPS test page!  SANE works very nicely with the scanner, and knows when you put something in the document feeder (the preview window vanishes!)

Bottom line: Samsung actually pay people who know something about *nix to hack away at making their printer installation beginner-friendly. And if that's not good enough for you because you don't want "all that other stuff", you can do your own custom install, for example by grabbing the PPD file out of their install tarball.
(Edit: I have since learned that free and open drivers are available through the splix project.)
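
For the do-it-yourself route, a small Python sketch like the following could find the PPD files inside the driver archive before you pull one out. It is only an illustration: the function name list_ppd_members is made up, and it assumes the PPD files in the tarball end in ".ppd" or ".ppd.gz", which I haven't verified for every Samsung driver pack.

```python
import tarfile


def list_ppd_members(tar_path):
    """Return the names of archive members that look like PPD files.

    Assumes PPD files are named *.ppd or *.ppd.gz (not checked against
    any particular driver pack).
    """
    # "r:*" lets tarfile detect the compression (.tar.gz, .tar.bz2, ...).
    with tarfile.open(tar_path, "r:*") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile()
            and member.name.lower().endswith((".ppd", ".ppd.gz"))
        ]
```

Once you know the member name, the standard tarfile module can extract just that one file, and you can hand it to CUPS yourself.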

That's very cool in my book.

Monday, November 15, 2010

Success through doing things badly


Where would the science of sweeteners be without shockingly bad lab practice?

Saccharin was first produced in 1878 by Constantin Fahlberg, a chemist working on coal tar derivatives in Ira Remsen's laboratory at the Johns Hopkins University. The sweet taste of saccharin was discovered when Fahlberg noticed a sweet taste on his hand one evening, and connected this with the compound which he had been working on that day.

Cyclamate was discovered in 1937 at the University of Illinois by graduate student Michael Sveda, who was working in the lab on the synthesis of anti-fever medication. He put his cigarette down on the lab bench, and, when he put it back in his mouth, he discovered the sweet taste of cyclamate.

Aspartame was discovered in 1965 by James M. Schlatter, a chemist working for G.D. Searle & Company. Schlatter had synthesized aspartame in the course of producing an antiulcer drug candidate. He accidentally discovered its sweet taste when he licked his finger, which had become contaminated with aspartame, to pick up a piece of paper.

Acesulfame potassium was developed after the accidental discovery of a similar compound (5,6-dimethyl-1,2,3-oxathiazin-4(3H)-one 2,2-dioxide) in 1967 by Karl Clauss and Harald Jensen at Hoechst AG. After accidentally dipping his fingers into the chemicals that he was working with, Clauss licked them to pick up a piece of paper. (Do we see a pattern here?)

Sucralose was discovered in 1976 by scientists from Tate & Lyle, working with researchers Leslie Hough and Shashikant Phadnis at Queen Elizabeth College (now part of King's College London). While researching ways to use sucrose as a chemical intermediate in non-traditional areas, Phadnis was told to test a chlorinated sugar compound. Phadnis thought that Hough asked him to taste it, so he did. He found the compound to be exceptionally sweet.

It’s pretty much only neotame and alitame that were discovered through a structured design process.