PocketSphinx on Android via the NDK

Originally published at The Pædantic Programmer. Please leave any comments there.

While working on my project for the Spring ’10 “NLP on Mobile Devices” course, I put together a PocketSphinx NDK build. You can pull it down from my git repo:

$ git clone git://colliertech.org/colliertech/PocketSphinx.git

I haven’t written any of the JNI marshaling functions yet, though.

Logs from talk with Daniel & Zalmai

I had a phone conversation with Daniel Mills and Zalmai Zahir today in order to vet our test sentences for ling 567. I’ve got to say that Daniel is really showing his worth as a linguist here. I’m happy to bask in his glow and make sure the fonts render right.

Go Daniel! The write-up is going to be awesome for lab4, I’m sure ;)


Lushootseed Test Suite as of 1/29/2010


Font for composing Lushootseed

At the recommendation of David Beck, I have installed a TTF font from http://www.languagegeek.com/. It took a few minutes for me to figure out how to get it going, but it was pretty straightforward after that. Here are some quick instructions for those of you running Debian variants such as Ubuntu.

  1. Fetch the zip files from languagegeek.com and unpack:
    $ sudo mkdir -m777 -p /usr/local/share/fonts/truetype/languagegeek/
    $ cd /usr/local/share/fonts/truetype/languagegeek/
    $ for zip in AboriginalSerif.zip AboriginalSans.zip; do
        wget "http://www.languagegeek.com/font/$zip" &&
        unzip "$zip" &&
        rm "$zip"
      done
    $ sudo chmod -R 755 .
  2. Register these fonts with the system using defoma (the Debian font manager):
    $ for font in *.ttf; do
        sudo defoma-font register truetype "$PWD/$font"
      done

While performing the defoma registration, I was presented with a number of warning-ish-looking messages for each file processed. For better or worse, I am ignoring them:

No CIDSupplement specified for Dotum-Bold, defaulting to 0.
No CIDSupplement specified for Batang-Regular, defaulting to 0.
No CIDSupplement specified for ZenHei-CNS, defaulting to 0.
No CIDSupplement specified for Batang-Bold, defaulting to 0.
No CIDSupplement specified for ZenHei, defaulting to 0.
No CIDSupplement specified for Dotum-Regular, defaulting to 0.
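
One way to check that the new fonts are usable despite those messages is to ask fontconfig which families it can see. This is only a sketch: it assumes the fontconfig tools (which provide fc-list) are installed, which the steps above do not require, and family_listed is a hypothetical helper name of my own.

```shell
# Hypothetical helper (not part of fontconfig or defoma): succeeds if any
# line on stdin contains the name given in $1, ignoring case.
family_listed() {
  grep -qiF -- "$1"
}

# Assumes fontconfig is installed; "fc-list : family" prints family names only.
if fc-list : family 2>/dev/null | family_listed 'Aboriginal Sans'; then
  echo 'Aboriginal Sans is visible to fontconfig'
else
  echo 'Aboriginal Sans was not found'
fi
```

If the family is not found, running the same fc-list command without the filter shows what fontconfig does know about.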

I used gnome-appearance-properties (System → Preferences → Appearance) to set my document font to Aboriginal Sans.

After telling Chromium that it should use these fonts, it renders all of the Lushootseed characters quite nicely.

Lushootseed characters

Here are some of the characters used to represent text in the Lushootseed languages. This is an imperfect representation. There doesn’t seem to be a COMBINING LATIN SMALL LETTER W, so I’m using a second character in these cases. I also can’t find any fonts that render a c with both a caron and a comma.

ʔ – glottal stop
ƛ̕ – barred lambda with comma above, right
á – lower-case a with acute
à – lower-case a with grave
í – lower-case i with acute
ì – lower-case i with grave
č – c with caron
č̓ – c with caron and comma
c̕ – c with comma above, right
ə – lower-case schwa
gʷ – g with raised lower-case w
ǰ – j with caron
kʷ – k with raised lower-case w
k̕ – k with comma above, right
k̕ʷ – k with comma above, right and raised lower-case w
ɬ – lower-case l with stroke
l̕ – l with comma above, right
p̕ – p with comma above, right
qʷ – q with raised lower-case w
q̕ – q with comma above, right
q̕ʷ – q with comma above, right and raised lower-case w
š – s with caron
t̕ – t with comma above, right
ù – lower-case u with grave
ú – lower-case u with acute
w̕ – w with comma above, right
xʷ – x with raised lower-case w
x̣ʷ – x with raised lower-case w and dot below
y̕ – y with comma above, right
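
To see exactly which code points make up one of these sequences, and where a second character is standing in for a missing combining mark, you can dump a string's code points from the shell. This is a rough sketch that assumes iconv and od are available; codepoints is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: print each Unicode code point of the UTF-8 string in $1
# on its own line, as U+XXXX.
codepoints() {
  printf '%s' "$1" |
    iconv -f UTF-8 -t UTF-32BE |   # four big-endian bytes per code point
    od -An -v -tx1 |               # hex dump, bytes separated by spaces
    tr -s ' ' '\n' |               # one byte per line
    awk 'NF {
      hex = hex toupper($1)
      if (++n % 4 == 0) {
        # Trim the zero padding so that BMP characters print as four digits.
        if (!sub(/^0000/, "", hex) && !sub(/^000/, "", hex)) sub(/^00/, "", hex)
        print "U+" hex
        hex = ""
      }
    }'
}

codepoints 'kʷ'
```

For the two-character string in the example this prints U+006B and U+02B7: a plain k followed by MODIFIER LETTER SMALL W, which is a separate spacing character and not a combining mark.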

I’ve checked this in to the University of Washington Linguistics Department’s Subversion server. Ping me for credentials if you’d like access.


A quick update – I’m a grad student!

Hey all!

I’m sorry I haven’t been very active with my packages recently. I all-of-a-sudden started grad school and have been swamped with studying. I also started a contract and have been busy trying to learn a new codebase while contributing something other than snark.

I promise I’ll get back to packaging IronRuby and IronPython on Mono for Debian as soon as things start settling down. Getting an A in the class is higher priority, though, sorry…

Don’t worry. I haven’t forgotten about you ;)



PS, I am implementing a Perl library to exercise my understanding of the class. You can follow along at the search.cpan.org page for Lingua::HPSG or by cloning the git repo:

$ git clone git://karma.colliertech.org/colliertech/langparser

Well, that was an eventful day!

*whew* I did a bunch of things yesterday. We took our kindergärtner to her first Friday at her new school (and were about 10 minutes tardy. oops). We then took our toddler to a nearby playground with swings and slides and let her expend some energy. After she had been sufficiently exercised, we walked back home, stopping at a coffee shop on the way. The baristo (you call male baristas “baristos,” right? :) ) recognized my MC Frontalot shirt and asked whether I had caught him the previous weekend at PAX. Unfortunately, I have not attended PAX since 2006, but I *did* purchase the tee directly from The Front himself ;)

When we got home, I worked a bit on an English-language parser implementation and then went to the University of Washington to meet with Emily Bender about getting into the Professional Master’s program in Computational Linguistics. It all looks good, and I even got the good news that the GRE is no longer required!

After the meeting, I headed home and poked at the parser for a little while longer. I then picked Scarlet up from after-school care and brought her home. After that, I hopped in the car and drove toward Bellevue to meet up with Monty while he was in town. I over-estimated the amount of time traffic would steal on my way to Bellevue, and had an extra hour to blow. So I dropped by building 41 and shot the IronPython bull with Dino. It turns out he’s got an Android phone, too. I told him it was possible to put a Debian chroot on it and that he should even be able to ‘apt-get install ironpython’ to his phone soon ;) We talked briefly about the CodePlex Foundation and Sam Ramji’s departure from The Evil Empire. Dino seems skeptical about the project. I don’t have enough information to have much of an opinion. However, it sounds like some folks I trust are involved, so I’m hopeful.

I left MS just in time to make it to the wrong address at the specified time. My phone had just enough juice to call Monty to get the right address and then use the navigation system to find my way there. I wasn’t able to make reservations at the place we intended to go for dinner until 8:15, so we went to the Barnes & Noble for a bit. They only had one NLP book in stock and the examples are all in Python. I should learn that language one of these days… As we were leaving the Pacific Place, Monty mentioned to me that he is on the advisory board for the CodePlex Foundation, and that they have been responsive enough to his input that they changed the mission statement, at his recommendation, just one day before the Foundation was publicized. He feels that this is a very good direction for Microsoft to be heading.

My brother Chris was kind enough to watch the kids while we went out to dinner. Quick note: he recently graduated from UW with a BA in Electrical Engineering and is looking for work using his acquired knowledge, in case anyone needs one of those ;)

We met up with my wife, Hannah, and our friends Mike & Cynthia at our place. Monty graciously avoided mentioning the terrible state in which our apartment has recently found itself. The kids were super cute and polite and said hi/bye.

Over dinner we discussed building an Android app (Monty has one, too ;) ) to automate the process of creating bounties for apps and getting folks to implement them. We also talked about MySQL and MariaDB, of course. Hannah and I recalled my time working for MySQL, Inc. on the MaxDB project and some subtle cultural differences we noticed while traveling. It was interesting getting the inside scoop about the Sun acquisition and some of the recent goings-on in the MySQL/Sun/Oracle world. I wasn’t aware, for instance, that the EU is balking at the merger because of monopoly concerns.

Looking to get Iron* and the DLR into Red Hat

I sent an email to the Fedora Legal list asking whether they will accept software released under the MS-PL license. My friend and former colleague Brett Lentz had mentioned that he was concerned the Fedora folks might not accept it, so I asked. I also bcc’d a certain troll on said mail so as to get lots of flame mail. I’m practicing to become a master twitterbaiter.

14:43 < cj> wakko666: so… we are building ironruby/ironpython debian packages over on OFTC/#debian-cli
14:43 < wakko666> k
14:43 < cj> meebey just packaged up mono in .deb
14:44 < cj> with some backported patches required to get the DLR language engines running correctly
14:44 < wakko666> k
14:44 < cj> we’re using xbuild to perform the build, thanks to ankit’s recent patches.
14:44 < cj> alarm went off. need to address food.
14:44 < wakko666> i know that mono is already in Fedora.
14:45 < cj> great. any idea what version?
14:45 < wakko666> http://koji.fedoraproject.org/koji/packageinfo?packageID=30
14:45 < cj> we’ll need + some patches. This is pretty bleeding edge, but I expect the fedora packagers are as ‘on it’ as the debian folks
14:46 < wakko666> fedora tends to be a bit further ahead of the curve than the debian folks
14:46 < cj> we can supply them the patches required. they are also being merged into the 2.4 branch, so should be in the next official release
14:46 < wakko666> k.. shouldn’t be a problem.
14:47 < cj> here is the tarball we’re using to build the .deb
14:47 < cj> http://github.com/mletterle/ironruby/tarball/20090805+git.e6b28d27
14:49 < cj> most of the stuff you’ll need as far as build commands go are in debian/rules:
14:49 < cj> I’ve got to finish making lunch for kids ;)
14:49 < cj> back shortly.
14:55 < wakko666> cj: my main concern about packaging ironruby is licensing. Fedora will accept packages under the MS-Shared-Source license [ed: this is not at all true.], but the MS-PL isn’t on their list of acceptable license. [ed: it is now.]
14:58 < cj> wakko666: alrighty. jschementi is the guy to talk with about licensing issues. He’ll be back some time soon, I’m sure
14:58 < wakko666> of course, i can always write the spec file and you guys can host your own rpms, but it would be nice to actually get it into Fedora proper.
14:59 < cj> also, MS-PL is dfsg compliant and OSL-approved. Is it a decision to deny MS-PL or that it just hasn’t been reviewed yet?
14:59 < wakko666> not sure. we’d need to ask on the fedora-legal-list mailing list
14:59 < wakko666> http://fedoraproject.org/wiki/Licensing#SoftwareLicenses
15:00 < cj> alright. at another time. it’s nap time for scarlet and zelda. ;)
15:01 < wakko666> sure thing. if you ping the fedora-legal list, let me know what they have to say.
19:49 < cj> wakko666: firestorm initiated.

CI build server produces IronPython bins, too!

Ivan has updated the build script to produce IronPython binaries as well as IronRuby. Get your fresh DLR-powered, mono-friendly, dynamic language implementations here:


There are now links to source and binary tarballs as well as source zips on each of the build result status messages:


I’ve been putting some effort into getting the .deb of IronRuby put together, as well. Here are the current problems :)

* xbuild in sid is not new enough to successfully build IronRuby
* rake requires a gem called pathname2, which is not otherwise packaged, and Debian policy strictly disallows using ‘gem install foo’ during build
* mono 2.4.2 does not include mono-api-diff, which is keeping us from making a .deb of xbuild and friends

Using en_US.UTF-8 locale on Debian/Ubuntu

NB: this write-up has been superseded. You should be reading this article instead.

It took me a while to figure out how to make Larry’s postings to #perl6 not break my screen session. Now that I’ve figured it out, I’ll see if I can share it with y’all. In addition to screen and irssi displaying garbage, I also got a lot of these errors (I hope this helps those of you who use Google to search for error strings):

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = "en_US.UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory

First of all, install the locales package:

$ sudo apt-get install locales

Then re-configure it and tick the box next to ‘en_US.UTF-8’:

$ sudo dpkg-reconfigure locales


At this point, you should be good. If you’re using screen, remember to pass the -U argument when re-attaching. I usually do something like this:

$ screen -rxU irc
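
To confirm that the locale was actually generated, you can look for it in the output of locale -a. A minimal sketch: locale_listed is a hypothetical helper name, and the case and hyphen normalisation is there because glibc systems usually list the locale as en_US.utf8 rather than en_US.UTF-8.

```shell
# Hypothetical helper: succeeds if the locale named in $1 appears in the list
# on stdin. Case and hyphens are ignored, so en_US.UTF-8 matches en_US.utf8.
locale_listed() {
  want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
  tr 'A-Z' 'a-z' | sed 's/-//g' | grep -qxF -- "$want"
}

if locale -a 2>/dev/null | locale_listed en_US.UTF-8; then
  echo 'en_US.UTF-8 is available'
else
  echo 'en_US.UTF-8 is missing; re-run dpkg-reconfigure locales'
fi
```

If the locale shows up here but perl still complains, the warning is probably coming from a remote host you are logged in to, which needs the same locale generated.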