Breath Mints for Penguins

Tuesday, March 25, 2014

building a django app that uses ZeroMQ: an annotated webliography

Introduction:
I wanted to build a website that allows people to search their data against a database. (Edit: the website is now live and the code is available; the ZeroMQ stuff is mostly at "/nmr/management/commands".) Each search takes a few seconds, so in order to serve multiple clients at a time and allow scaling, I wanted a system where the main wsgi process does not block, but instead passes the search request off to another process that puts it in a queue and executes the queued requests one by one.

I ended up following a simple approach using ZeroMQ. There is a scheduler that runs as a thread in the main wsgi process. When the search input view receives a search request, it writes the search parameters into the database, opens a connection to the scheduler thread, and passes it the unique ID of the database record storing the search parameters. There are one or more worker processes, each running as a subprocess and permanently attached to the scheduler via a socket. When a worker completes a job, the scheduler sends it the ID of the next job in the queue; the worker executes that job, writes the results to a database table, and tells the scheduler it is ready for another job. Many workers can be attached to the scheduler, so multiple searches can run concurrently.
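To make the hand-off concrete, here is a rough sketch of the scheduler/worker loop described above. It is not the code from the site: it uses threads and the standard library's queue.Queue in place of ZeroMQ sockets and subprocesses, plain dicts in place of the database tables, and every name in it (run_search, worker, scheduler, run_demo) is made up for illustration. Only the shape of the protocol is the point: a worker says it is ready, the scheduler sends it one job ID, and the worker writes its result and asks again.

```python
import queue
import threading


def run_search(job_id, params_table, results_table):
    """Hypothetical stand-in for the real search: upper-case the stored query."""
    results_table[job_id] = params_table[job_id].upper()


def worker(name, ready, inbox, params_table, results_table):
    """Announce readiness, wait for a job ID, run it, repeat until told to stop."""
    while True:
        ready.put(name)          # "I am ready for another job"
        job_id = inbox.get()     # block until the scheduler sends a job ID
        if job_id is None:       # None is this sketch's shutdown signal
            return
        run_search(job_id, params_table, results_table)


def scheduler(jobs, ready, inboxes):
    """Hand each queued job ID to whichever worker reports ready first."""
    while True:
        job_id = jobs.get()
        if job_id is None:       # no more jobs: stop every worker and exit
            for inbox in inboxes.values():
                inbox.put(None)
            return
        name = ready.get()       # wait for a free worker
        inboxes[name].put(job_id)


def run_demo(params_table, n_workers=2):
    """Run every job in params_table through the scheduler and return the results."""
    results_table = {}
    jobs, ready = queue.Queue(), queue.Queue()
    inboxes = {"worker-%d" % i: queue.Queue() for i in range(n_workers)}

    threads = [threading.Thread(target=scheduler, args=(jobs, ready, inboxes))]
    for name, inbox in inboxes.items():
        threads.append(threading.Thread(
            target=worker, args=(name, ready, inbox, params_table, results_table)))
    for t in threads:
        t.start()

    # The view's role: the parameters are already stored (params_table),
    # so only the record IDs are passed to the scheduler.
    for job_id in params_table:
        jobs.put(job_id)
    jobs.put(None)

    for t in threads:
        t.join()
    return results_table


if __name__ == "__main__":
    print(run_demo({1: "caffeine", 2: "taxol", 3: "menthol"}))
```

Because the scheduler only sends a job to a worker that has announced it is ready, a slow search never holds up the others, and adding workers is just a matter of raising n_workers. In the real thing the two queues would be ZeroMQ sockets and the dicts would be database tables.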

Here then is a list of (some of) the websites I used for reference while writing this program.

ChemDraw ChemAxon synergy

In my previous post, I was complaining that there wasn't any free software with a nice command line interface to reliably convert molfiles to InChI strings, and back again. I also mentioned that there was an issue with the way ChemDraw converts structures to InChI strings that made it unacceptable for my purposes. The technical issue with ChemDraw is that it doesn't preserve the isoform of tautomers. Some of the molecules I'm interested in contain amides that are typically found in the amide form, rather than the imidic acid form. When I copy these molecules as InChI from ChemDraw, and then paste them back in, the amide is changed to the imidic acid form, which I don't want. It turns out that this is due to a feature of the InChI format (a format that is still mostly opaque to me) called "Mobile H Perception", where it simplifies a molecule's encoding by not specifying which tautomer it is (thereby saving 1 bit of information, I guess). Many programs have the option to export InChI with Mobile H Perception off, which is what I want, but I can't find that option in ChemDraw.

Automating ChemDraw: win32 com scripting with python pywin32

Once again I find myself adventuring in the land of Windows COM scripts. Last time I tried it, I had mixed success. This time, I'm using it to control ChemDraw to help me convert molecule formats. Anytime I use COM scripts, it feels pretty clumsy, like it would be a lot slicker just to have a command line utility, but in this case I couldn't find anything that seemed like it would work well enough (update: I later found out about ChemAxon Molconverter, which is quite nice), so COM scripts it is. Python is currently the language I do general purpose programming in, and pywin32 works quite well for COM scripting.

The problem: I want to take molfiles from multiple sources and normalize them so that they can be rendered with the same style settings by ChemDoodle Web Components. While I'm at it, I'd like to generate SMILES strings, InChI strings, and InChIKey strings (I'd also like to generate IUPAC names, but I gave up on that part) for the molecules in these molfiles.

Only listening to music in the right ear while working

I used to have a lot of trouble listening to music and doing homework or programming at the same time. It was distracting and I had trouble thinking while the music was going. Sometimes I'd have it on for tedious parts of a task and then when I got to a part that took a lot of careful thinking, I'd turn it down until I got to another boring part. I probably wasted a lot of time just flipping the music on and off. I also noticed that music with words was much more distracting than music without words, so sometimes I'd just listen to light orchestral music instead of the rock and folk I usually listen to.

In any case, what I noticed eventually is that if I listen to music through a headphone on my right ear only, I find it easier to tune out, and less distracting than when I use both ears, or just my left ear. In fact, I find having music in only the left ear to be rather obnoxious.

Why the right ear? It's hard to know, and with a sample size of one, and an investigation that is hardly double blind (or even single blind), it's not at all likely that I'll ever figure it out for sure. But it strikes me as an interesting phenomenon and there's no harm in speculating so here are some ideas:

It could be that the particular ear is not relevant, and the reason I favor one ear over the other is random. I don't remember quite when this habit started (sometime in the last 3 years I think), or why I originally chose to use only my right ear and not the left. It's possible that it was random, that I thought the music was too loud or distracting so I took one headphone off and it seemed better, so the next time I took the same one off even though it wouldn't have made a difference. However, I vaguely remember that even at the beginning the reason I chose the right ear is that the music was only unpleasant in my left ear, hence there was negative feedback encouraging me to remove the headphone from my left ear, but not from the right.

It may be that my right ear just happens to be slightly less sensitive than my left ear. This isn't at all obvious in everyday life, but perhaps there is some 5% or 20% difference in sensitivity peculiar to me.

My favorite hypothesis, however, is that it has something to do with the division of labor in the brain (and hopefully I don't fall victim to too many pop-neuroscience misconceptions here). I'm right-handed, so when I type, use the mouse, or write or draw, as I would while working, my left hemisphere has to work harder than my right hemisphere. The left hemisphere contains the center for language, and it was the words in songs that I found most distracting, so maybe the one level of indirection is enough to prevent the music from unduly interfering with my internal dialog. Stretching a bit more, a lot of what I do is highly analytical (like computer programming, or math and science homework), so if the left brain is the center of logic then I may be relying on it more than on the right brain. (When doing so-called right-brain-centric activities, especially drawing, but also writing poetry, I often listen to music from speakers or with both headphones, so there may be some connection there too.)

Google turns up a few other reports of things like this, but it's all anecdotal (although I haven't yet tried to search the scientific literature; maybe there is something there, though I'm not sure how much of it I'd understand).

What about you, oh vast teeming readership, do you think this is a real phenomenon? Or is it just stochastic noise (pun intended!) and confirmation bias?

Friday, December 20, 2013

Public funding of databases

I've been meaning to comment on this ever since I read about it. The grants that support TAIR are running out and they are going to switch to a subscription fee system. Some formerly open databases such as KEGG have already partially or completely moved their data behind a paywall, and it seems to be a better option than closing up shop completely. You can't really blame the administrators; they're just doing what they have to do to keep the information available to those who need it most. And I'm not particularly worried about my own access to TAIR because, given the number of labs working on Arabidopsis at my institution, I'm sure they'll get a subscription. What I'm worried about is the fragmentation of data: groups may have trouble justifying paying for access to a database that is outside their core specialty, even if the database may be of great help to one or two particular projects. It also has an impact on reproducibility: researchers are more likely to verify, and then follow up and build on, previous studies if the data behind those studies is freely available. Finally, I think paywalls will discourage scientists from using database resources even when they might be part of the best and most efficient way to approach a problem. Science is about exploring, and the easier and cheaper it is to explore different approaches to a problem, the better the resulting research will be.

I don't know what the best solution to this problem is. Maybe user fees really are the way to go. It seems like one way or another the grant agencies are going to end up paying for database maintenance, either through direct support or through support of the projects that have to pay subscription fees. Maybe database grants should include some kind of an endowment to keep the information open for many years. Whatever the case, I think it is a reflection of the sad state of society that important resources like TAIR, with users across the country and world who are using it as a tool to help improve people's lives, have to struggle to find funding, while the NSA has a black hole budget to fund massive spy databases where the vast majority of the data will never be looked at or used by anyone at all. The American government has some strange priorities, and I don't think representatives spend enough time worrying about opportunity costs.

Thursday, December 19, 2013

Combining the contents of multiple word files with a win32 com Python script

A former member of the lab used to keep his lab notebook as Word files. One Word file per day. For 8 years. Searching through these files was a real pain, so I decided to try to combine them all into one single massive file that will hopefully be easier to search through and convert to other formats.

anaconda python 3.3 on Windows

I've been using python 2.7, however it seems that now most of the libraries I use are compatible with 64-bit python 3.3 (for some, such as Biopython, there is no official python 3.3 binary, but they are available here). The one package that keeps me from breaking away from python 2.7 for good is Gurobi, which still has no python 3 support.

My main operating systems are Windows 7 and Windows 8; when I need Linux (which is often) I run it through VirtualBox.

After a few years of playing around with different Python distributions, I've found Anaconda python to be the least frustrating option; it also makes it easy to switch between two versions of Python.