Saturday, April 12, 2014

Finding a minimal set of English vocabulary with WordNet, Perl, and GLPK integer programming

Preface:
This was my semester project for the class "Principles of Optimization", one of my absolute favorite classes in college. It uses WordNet, a fantastic database for looking at relationships between English words. WordNet seems to be the basis of many online tools, such as the Visual Thesaurus that is constantly advertised on dictionary.com. It was fun to see how the graphs I made for this project exactly matched the content of Visual Thesaurus, a website that seems to be just begging for an open-source ripoff. To access WordNet, I used the Perl library WordNet::QueryData, which seemed to work quite well. The integer programming gave me a much harder time. I used the library Math::GLPK, which has bugs that confused me for quite a while (I wish I remembered what they were, but all I remember is a general feeling of frustration with the library). If I were to redo this project, I'd take a different approach: maybe writing LP text files and solving them with the GLPK command-line interface, or maybe using Python to interface with an IP solver. I used Cytoscape to generate the network graphs.

For those just here for example code for WordNet::QueryData or Math::GLPK,
you can find it in the Perl file that you can download here. All others, read on!