Friday, December 20, 2013

Public funding of databases

I've been meaning to comment on this ever since I read about it. The grants that support TAIR are running out, and the database is going to switch to a subscription fee system. Some formerly open databases such as KEGG have already partially or completely moved their data behind a paywall, and that seems to be a better option than closing up shop completely. You can't really blame the administrators; they're just doing what they have to do to keep the information available to those who need it most. And I'm not particularly worried about my own access to TAIR, because given the number of labs working on Arabidopsis at my institution I'm sure they'll get a subscription. What I'm worried about is the fragmentation of data: groups may have trouble justifying paying for access to a database that is outside their core specialty, even if the database may be of great help to one or two particular projects. It also has an impact on reproducibility. Researchers are more likely to verify, follow up on, and build on previous studies if the data behind those studies is freely available. Finally, I think paywalls will discourage scientists from using database resources even when they might be part of the best and most efficient way to approach a problem. Science is about exploring, and the easier and cheaper it is to explore different approaches to a problem, the better the resulting research will be.

I don't know what the best solution to this problem is. Maybe user fees really are the way to go. It seems like one way or another the grant agencies are going to end up paying for database maintenance, either through direct support or through support of the projects that have to pay subscription fees. Maybe database grants should include some kind of endowment to keep the information open for many years. Whatever the case, I think it is a reflection of the sad state of society that important resources like TAIR, with users across the country and the world who rely on it as a tool to help improve people's lives, have to struggle to find funding, while the NSA has a black hole budget to fund massive spy databases where the vast majority of the data will never be looked at or used by anyone at all. The American government has some strange priorities, and I don't think representatives spend enough time worrying about opportunity costs.

Thursday, December 19, 2013

Combining the contents of multiple Word files with a win32com Python script

A former member of the lab used to keep his lab notebook as Word files. One Word file per day. For 8 years. Searching through these files was a real pain, so I decided to try to combine them all into a single massive file that will hopefully be easier to search through and convert to other formats.
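
Here is a minimal sketch of how such a merge could look with the win32com package from pywin32. It is an illustration, not the exact script I ran. It assumes Windows with Word installed, and it assumes the daily files carry a date such as 2008-03-14 in their names; that naming scheme and the function names sorted_word_files and combine_word_files are made up for the example. The Word calls used (Documents.Add, Selection.EndKey, Selection.TypeText, Selection.InsertFile, Selection.InsertBreak, SaveAs) are standard parts of Word's COM object model.

```python
import os
import re

# Assumed naming scheme (hypothetical): each file name contains YYYY-MM-DD.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

WD_STORY = 6       # Word constant wdStory: the whole document
WD_PAGE_BREAK = 7  # Word constant wdPageBreak


def sorted_word_files(filenames):
    """Keep .doc/.docx names, skip Word lock files, order by date in name."""
    docs = [
        name for name in filenames
        if name.lower().endswith((".doc", ".docx"))
        and not name.startswith("~$")
    ]

    def sort_key(name):
        match = DATE_PATTERN.search(name)
        # Files without a recognizable date are sorted to the end.
        date = match.groups() if match else ("9999", "99", "99")
        return (date, name)

    return sorted(docs, key=sort_key)


def combine_word_files(folder, output_path):
    """Insert every Word file in `folder` into one new document."""
    # Imported here so the sorting helper can be used without pywin32.
    import win32com.client

    folder = os.path.abspath(folder)  # Word needs absolute paths
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        combined = word.Documents.Add()
        selection = word.Selection
        for name in sorted_word_files(os.listdir(folder)):
            selection.EndKey(WD_STORY)   # jump to the end of the document
            selection.TypeText(name)     # label each entry with its file name
            selection.TypeParagraph()
            selection.InsertFile(os.path.join(folder, name))
            selection.EndKey(WD_STORY)
            selection.InsertBreak(WD_PAGE_BREAK)
        combined.SaveAs(os.path.abspath(output_path))
        combined.Close()
    finally:
        word.Quit()  # always shut down the background Word process
```

Calling combine_word_files("notebook", "combined.doc") would then write one file with the entries in date order, each starting on its own page. Labeling each entry with its source file name makes it easy to trace a search hit back to the original day.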

Anaconda Python 3.3 on Windows

I've been using Python 2.7, but it seems that most of the libraries I use are now compatible with 64-bit Python 3.3 (for some, such as Biopython, there is no official Python 3.3 binary, but they are available here). The one package that keeps me from breaking away from Python 2.7 for good is Gurobi, which still has no Python 3 support.

My main operating systems are Windows 7 and Windows 8. When I need Linux (which is often), I run it through VirtualBox.

After a few years of playing around with different Python distributions, I've found Anaconda Python to be the least frustrating option. It also makes it easy to switch between two versions of Python.