Friday, December 20, 2013

Public funding of databases

I've been meaning to comment on this ever since I read about it. The grants that support TAIR are running out, and TAIR is going to switch to a subscription fee system. Some formerly open databases such as KEGG have already partially or completely moved their data behind a paywall, and it seems to be a better option than closing up shop completely. You can't really blame the administrators; they're just doing what they have to do to keep the information available to those who need it most. And I'm not particularly worried about my own access to TAIR, because given the number of labs working on Arabidopsis at my institution I'm sure they'll get a subscription. What I'm worried about is the fragmentation of data: groups may have trouble justifying paying for access to a database that is outside their core specialty, even if the database may be of great help to one or two particular projects. It also has an impact on reproducibility: researchers are more likely to verify, follow up on, and build on previous studies if the data behind those studies is freely available. Finally, I think paywalls will discourage scientists from using database resources even when they might be part of the best and most efficient way to approach a problem. Science is about exploring, and the easier and cheaper it is to explore different approaches to a problem, the better the resulting research will be.

I don't know what the best solution to this problem is. Maybe user fees really are the way to go. It seems like one way or another the grant agencies are going to end up paying for database maintenance, either through direct support or through support of the projects that have to pay subscription fees. Maybe database grants should include some kind of an endowment to keep the information open for many years. Whatever the case, I think it is a reflection of the sad state of society that important resources like TAIR, with users across the country and the world who are using it as a tool to help improve people's lives, have to struggle to find funding, while the NSA has a black hole budget to fund massive spy databases where the vast majority of the data will never be looked at or used by anyone at all. The American government has some strange priorities, and I don't think representatives spend enough time worrying about opportunity costs.

Thursday, December 19, 2013

Combining the contents of multiple word files with a win32 com Python script

A former member of the lab used to keep his lab notebook as Word files. One Word file per day. For 8 years. Searching through these files was a real pain, so I decided to combine them all into a single massive file that will hopefully be easier to search through and convert to other formats.

anaconda python 3.3 on Windows

I've been using Python 2.7, but it seems that most of the libraries I use are now compatible with 64-bit Python 3.3 (for some, such as Biopython, there is no official Python 3.3 binary, but they are available here). The one package that keeps me from breaking away from Python 2.7 for good is Gurobi, which still has no Python 3 support.

My main operating systems are Windows 7 and Windows 8. When I need Linux (which is often), I run it through VirtualBox.

After a few years of playing around with different Python distributions, I've found Anaconda Python to be the least frustrating option. It also makes it easy to switch between two versions of Python.

Wednesday, September 25, 2013

Learning about DNA topology by playing with string

The only way I was ever able to get any kind of intuitive feel for DNA topology was to play with string, twisting it, wrapping it, untwisting it etc.
I made a video demonstrating some ways to model DNA structures with string, so grab some string and follow along!

(also my first foray into youtube, I'm not sure if or when I'll ever post anything else there though, but I think this turned out ok)

Friday, September 6, 2013

python crossplatform handling of wildcard command line arguments

Windows and Linux shells handle wildcard arguments differently. Linux (at least under bash) expands wildcard arguments before passing them to a program. Windows passes them without expanding them. This leads to trouble if you want to write a command line utility that will work correctly in both Linux and Windows (or even one compatible with wildcard arguments at all in Windows).
In Python, the typical way to expand a wildcard is with the glob module: either glob.glob, which returns a list, or glob.iglob, which returns an iterator (which may be preferable if a large list is expected).

Here's a solution that uses the argparse and glob modules:
import argparse
from glob import glob
def main(file_names):
    print(file_names)
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # nargs='*' tells it to combine all positional arguments into a single list
    parser.add_argument("file_names", nargs='*')
    args = parser.parse_args()
    file_names = list()
    # go through all of the arguments and replace ones with wildcards with the expansion
    # glob returns an empty list when nothing matches (including a plain name
    # that does not exist), so fall back to the argument as is in that case
    for arg in args.file_names:
        file_names += glob(arg) or [arg]
    main(file_names)
One caveat: I have noticed that Python and bash don't sort the expanded lists in the same way, so if you need deterministic ordering of the input, you should sort the resulting list yourself.
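
The sorting caveat can be folded into the expansion step. This is a minimal sketch, not part of the script above: expand_patterns is a hypothetical helper name, and it assumes that an argument matching nothing should be kept as a literal. Note that sorted uses Python's plain string ordering, which may still differ from the locale-aware ordering bash uses.

```python
from glob import glob


def expand_patterns(patterns):
    """Expand each pattern with glob, sorting matches for a stable order.

    expand_patterns is a hypothetical helper name used for illustration.
    A pattern that matches nothing is kept as a literal argument.
    """
    file_names = []
    for pattern in patterns:
        # glob makes no ordering promise, so sort each expansion explicitly
        matches = sorted(glob(pattern))
        file_names.extend(matches if matches else [pattern])
    return file_names
```

Sorting per pattern keeps the arguments in the order the user typed them while making the files within each expansion reproducible across platforms.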


Thursday, August 22, 2013

Instantiating generic BioCyc (MetaCyc) reactions

Most databases of metabolic reactions include reactions that are generic. That is, some of the reactants and/or products are not specific molecules, but classes of molecules. This is a wonderfully abstract way of representing a large number of chemical transformations with a simple notation.
An example of a generic reaction is the CARBOXYLESTERASE-RXN in MetaCyc, which has the formula:  a carboxylic ester + water <=> an alcohol + a carboxylate + a proton

As useful and elegant as generic reactions are, they present a problem for people trying to generate mathematical models from a database.  The problem is that most mathematical modeling frameworks do not know how to deal with generic reactions.*  So reactions in mathematical models should generally be balanced and unambiguous.

Latendresse et al. (2012) describe a strategy for generating specific reactions ("instances") from generic reactions in the BioCyc family of databases.  Their strategy is to enumerate all possible combinations of reactants and products for a generic reaction, then check each combination for mass balance. The set of mass-balanced instances is further filtered by removing instances that include a reactant or product that appears in more than one balanced instance, as such instances are regarded as ambiguous. They also treat polymerization reactions as a special case and handle them differently from other instantiations.
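
The enumerate, balance, and filter steps can be sketched in a few lines of Python. This is a rough illustration of the description above, not the published implementation: the compound table, the class memberships, and the function names are all invented for the example, charge is ignored, polymerization is not handled, and the ambiguity filter is applied only to compounds substituted for a class (an assumption, since fixed compounds such as water appear in every instance).

```python
from collections import Counter
from itertools import product

# Toy data invented for illustration (not taken from MetaCyc).
# Each formula maps element symbol -> atom count; charge is ignored.
FORMULAS = {
    "water": {"H": 2, "O": 1},
    "proton": {"H": 1},
    "methyl formate": {"C": 2, "H": 4, "O": 2},
    "ethyl butanoate": {"C": 6, "H": 12, "O": 2},
    "methanol": {"C": 1, "H": 4, "O": 1},
    "ethanol": {"C": 2, "H": 6, "O": 1},
    "propanol": {"C": 3, "H": 8, "O": 1},
    "formate": {"C": 1, "H": 1, "O": 2},
    "propanoate": {"C": 3, "H": 5, "O": 2},
    "butanoate": {"C": 4, "H": 7, "O": 2},
}
CLASSES = {
    "a carboxylic ester": ["methyl formate", "ethyl butanoate"],
    "an alcohol": ["methanol", "ethanol", "propanol"],
    "a carboxylate": ["formate", "propanoate", "butanoate"],
}


def atom_totals(compounds):
    """Sum the element counts over a tuple of compound names."""
    totals = Counter()
    for name in compounds:
        totals.update(FORMULAS[name])
    return totals


def instantiate(reactants, products):
    """Return (balanced, unambiguous) instances of a generic reaction."""
    n = len(reactants)
    # A class name expands to its members; a specific compound stays fixed.
    choices = [CLASSES.get(name, [name]) for name in reactants + products]
    balanced = []
    for combo in product(*choices):
        left, right = combo[:n], combo[n:]
        if atom_totals(left) == atom_totals(right):
            balanced.append((left, right))
    # Count the balanced instances in which each class member takes part.
    members = {m for group in CLASSES.values() for m in group}
    usage = Counter()
    for left, right in balanced:
        usage.update(set(left + right) & members)
    # Keep an instance only if all of its class members are used exactly once.
    unambiguous = [
        (left, right)
        for left, right in balanced
        if all(usage[m] == 1 for m in set(left + right) & members)
    ]
    return balanced, unambiguous
```

With this toy data, instantiate(["a carboxylic ester", "water"], ["an alcohol", "a carboxylate", "proton"]) finds three mass-balanced instances. Two of them start from ethyl butanoate (ethanol + butanoate and propanol + propanoate both balance), so both are discarded as ambiguous and only the methyl formate instance survives.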

Thursday, July 25, 2013

Is frivolous spending good for the economy?

Basically it comes down to how you define that vague concept of a "good economy". If you define it as sheer material/monetary throughput in a particular day or month, then it is hard to argue that indiscriminate consumption isn't good for the "economy". But I tend to define a "good economy" in terms of the health, happiness, and longevity of the people within that economy. And I think that politicians, advertisers, and economists (perhaps not economists), and the people who pay attention to them and are influenced by them, implicitly hold a definition of a "good economy" that is similar to mine. When politicians (or whoever) promise to "stimulate" the economy, what they mean (or at least what I think most people take them to mean) is that they will cause people to produce more stuff and exchange more money and therefore quality of life will increase. If they only meant the first part without the second part, that is: "production will increase but quality of life will stay the same or decrease", then there would be no reason to elect them to office, follow their advice, or pay any attention to them, because what they would essentially be saying is "I promise you will work harder but not be any happier." And there would be nothing "good" about that kind of an economy. So when people talk about "stimulating" the economy, I'm sure they do mean "causing people to produce more stuff and exchange more money," but there is a widespread assumption (a kind of cultural myth) that more production/consumption/money_exchange (regardless of what exactly is being produced and consumed) leads to a higher quality of life. On examination, that assumption is revealed to be totally false; in fact it matters a great deal what is being produced/consumed, and how much time is being spent to produce it.
In a truly "good" economy, we'd all work very few hours, enough to produce whatever food and technology we need to be happy and healthy, and spend the rest of our time doing whatever we enjoy, which I think for many people would not include making things or handing money around (there is a counter argument that could be made here, contending that excess leisure time would mean wasted potential, i.e. that people could be spending their time improving medical technology etc. instead of just taking it easy, but that argument would still have to agree with the main point I'm making here, which is that producing and consuming "frivolous" consumer goods is bad for everyone). To determine whether a particular product or service is frivolous, you only have to ask yourself one question: "would I be any less happy/healthy/long-lived if I had just given that person my money in exchange for nothing rather than in exchange for whatever they had to spend time producing?" In light of the above discussion, it can be seen that the other person would be better off had they not had to waste their time producing stuff for you, so if you would not be proportionately worse off without the item, then you have just spent frivolously and not helped anyone at all (notably, nearly all government military spending falls into this category, but that's another discussion entirely...), not even the economy.

(as is probably obvious from the style here, this was written for the purpose of arguing with someone on Facebook, not as an essay for a class like the previous four posts)

Saturday, July 20, 2013

Smaller is Better: Iceland and the American Dream

Iceland is a small and ancient nation, first colonized near the year 870 A.D. (1), that has preserved its culture and language for more than a millennium and, despite the recent turmoil, today enjoys one of the highest standards of living in the world.  The United States by contrast is a huge nation, with the third largest population in the world (2).  Despite their differences, the U.S. and Iceland have many parallels in terms of why they were founded and the goals of the people who subsequently moved there: the Icelandic Dream of the Viking immigrant is similar to the American Dream of the pilgrim, the poor European, the Chinese laborer, and all the other immigrants to America.  Both nations were founded by independent-minded people seeking new lands for farming, opportunities for upward social mobility, and an alternative to the oppressive monarchical societies of their countries of origin (3, 4 p. 89).  Both established representative government at times when such governments were rare in the world.  Iceland’s low population has been both an asset and a detriment to its ability to facilitate the Icelandic Dream.  The U.S. is in the fortunate position of having an overarching federal government with subordinate Iceland-sized municipal and state governments.  By taking into account the ways in which small government has been a boon for Iceland as well as the ways in which it has been a challenge, the U.S. can more optimally divide power among its different levels of government.

Iceland: The Problem of Sustainability in an International Economy

In a world of rapidly increasing numbers of people and rapidly decreasing amounts of natural resources, sustainable resource management is one of the most important problems currently facing humanity.  Iceland is a small country that, throughout its history, has been forced to come to terms with the limitations of its own resources (1).  In some ways, Iceland can be viewed as a microcosm of the world, except, for the time being, on a more extreme level.  While in much of the developed world the problem of unsustainability is not yet a pressing issue in people’s everyday lives, in Iceland the entire national economy hinges on the continued health of its fragile fisheries, so stewardship of marine resources is a top priority for all Icelanders.  Another way Iceland is a smaller, more extreme model for the world is in the recent economic crash, which hit Iceland particularly hard.  Some Icelandic strategies for sustainability may be applicable on a more global scale, but in the long term they will prove insufficient both for Iceland and for the world.

Joan of Arc and Martin Luther

            Their births separated by 71 years and about 400 miles, Joan of Arc and Martin Luther may seem an unlikely pair to compare.  Joan was an illiterate who was executed at the age of 19; Luther was an academic who knew at least four languages and at 19 had no idea he was destined to become a monk, let alone a monk who would change the world.  Despite their differences, Luther and Joan were similar in their unshakable faith in God and their dogged determination to follow God’s will rather than the will of earthly authorities.

Was Machiavelli a Christian?

Machiavelli was certainly a humanist (in the Renaissance sense) and possibly a Christian, but probably not a Roman Catholic.  Machiavelli wrote as though he were a Christian and respected the spiritual authority of the papacy.  His short work An Exhortation to Penitence, in which he espouses humility, charity, and obedience to God, at first glance seems to be a clear endorsement of Christianity.  But for Machiavelli, endorsement does not imply adherence.  Machiavelli wanted desperately to see Italy unite and return to the glories of the Roman republic.  Encouraging religious unity was one way he could contribute towards that goal.

College Essays

I was an undergraduate once and had to take some non-science related courses.  In the seven such courses I took, I wrote a lot of essays, some of which I think would make good reading material.  I'll start editing and posting some of the ones I think are the best (or excerpts from them).  If you're here for computational biology, please excuse the interruption.

Also, I'll preface these by saying I don't necessarily agree with all the opinions in these essays.  Opinions can be such capricious little devils.

Sunday, July 7, 2013

A python port of the nonlinear optimization method of Hooke and Jeeves

I ported the Hooke and Jeeves algorithm to Python from C.  I ported it because I've been writing a Python toolkit for kinetic modeling of biochemical systems.  There are certain modeling methods I want to use that are difficult or impossible to do with COPASI (I'll try to write more about this in a later post), so I'm making a library specific to my needs.  I found the Hooke and Jeeves parameter fitting routine from COPASI to be the most useful for fitting parameters to my current system of interest (Nelder-Mead and Levenberg-Marquardt also seemed to work pretty well; the other ones not nearly as much).
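
For readers who have not met the method, here is a minimal sketch of the general Hooke and Jeeves pattern search idea. It is not the port described above and not COPASI's routine; the function name, parameter names, and default values are all assumptions chosen for illustration, and it assumes the objective is bounded below.

```python
def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-6, max_iter=10000):
    """Minimize f (a function of a list of floats) by pattern search.

    Illustrative sketch only; names and defaults are assumptions.
    """

    def explore(start, start_val, h):
        # Exploratory move: try +h and -h on each coordinate in turn,
        # keeping any change that lowers the objective.
        x, fx = list(start), start_val
        for i in range(len(x)):
            for delta in (h, -h):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    base, f_base = list(x0), f(x0)
    h = step
    for _ in range(max_iter):
        if h < tol:
            break
        x, fx = explore(base, f_base, h)
        if fx >= f_base:
            # No coordinate step helped, so search on a finer scale.
            h *= shrink
            continue
        while fx < f_base:
            # Pattern move: jump further along the direction base -> x,
            # adopt x as the new base, then explore around the jump point.
            pattern = [2 * xi - bi for xi, bi in zip(x, base)]
            base, f_base = x, fx
            x, fx = explore(pattern, f(pattern), h)
    return base, f_base
```

For example, hooke_jeeves(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0]) walks to the minimum at (1, -2). The method needs only function values, no derivatives, which is part of why it is convenient for parameter fitting.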

Wednesday, May 29, 2013

The high school math club problem that bugged me for entirely too long

 There is a grid of digits and arithmetic operators, certain paths through the grid result in true mathematical statements.  The challenge is to find all of the true statements in the grid.

It's really not that hard of a problem to solve by brute force.  At the time of the competition I tried writing a program to solve it (in the Blitzmax language, because that was the language I knew best at the time).  But for whatever reason, I couldn't get it to work quite right.

Fast forward a few years, and it was still bugging me that I never solved that problem, so I wrote up a perl script to do it.

which can be found here:

Basically, it's a recursive algorithm that checks every possible path through the puzzle.

The problem can be thought of as an undirected graph. Solutions consist of sets of connected nodes, where each node in the set is passed just once in the traversal.  A set of connected nodes is a solution if the values of the nodes, in order, make a valid mathematical statement.  Even though the graph is not directional, the solutions are, because, for example:
15*3 = 45, but 54 != 3*51   
(at first I had written:  12*3 = 36, but 63 != 3*21...   :-P )
The reverse of a true mathematical equation is usually not true, so every path must be checked in both directions.

There are multiple ways to do this, but this algorithm does it by iterating through all of the nodes in the graph (tiles in the puzzle), and finding all possible sets of connected nodes that have that node as the starting point, and which of those sets spell correct equations.
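
The same idea can be sketched in Python. This is not the Perl script, just a rough illustration under some assumptions: the grid is given as a list of equal-length strings, tiles are adjacent only horizontally and vertically, an equation has exactly one "=" with plain integers joined by + - * / on each side, and the names find_equations and is_true_equation are made up for the example.

```python
import re

# One side of an equation: integers joined by single binary operators.
SIDE = re.compile(r"\d+(?:[+\-*/]\d+)*")


def is_true_equation(text):
    """Return True if text has the form 'lhs=rhs' and both sides are equal."""
    sides = text.split("=")
    if len(sides) != 2 or not all(SIDE.fullmatch(s) for s in sides):
        return False
    try:
        # eval is only reached for strings made of digits and + - * /
        return eval(sides[0]) == eval(sides[1])
    except (SyntaxError, ZeroDivisionError):
        # e.g. a number with a leading zero, or division by zero
        return False


def find_equations(grid):
    """Collect every true equation spelled by a path through the grid."""
    rows, cols = len(grid), len(grid[0])
    found = set()

    def extend(r, c, visited, text):
        text += grid[r][c]
        if is_true_equation(text):
            found.add(text)
        # Recurse into every unvisited orthogonal neighbour.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                extend(nr, nc, visited | {(nr, nc)}, text)

    # Use every tile as a starting point, so both directions of each
    # path get checked.
    for r in range(rows):
        for c in range(cols):
            extend(r, c, {(r, c)}, "")
    return found
```

For the two-row grid ["1+2", "53="], the search finds "1+2=3" and "3=2+1", while the reversed strings "3=2+1" read backwards ("1+2=3") and, say, "2=3+1" are each judged on their own. Passing a fresh visited set down each branch keeps the recursion simple at the cost of some copying, which does not matter for small puzzles.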

Looking at that Perl script now, I see lots of things that make me cringe: for example, the regex to parse the input is not quite correct, and the data structures I used are probably not the best for this problem (it would probably be better to preprocess the data into lists), among other things. Nevertheless, it does solve the problem it was intended to solve, and since none of the tile-sets given were very large, it solves them instantaneously.

Tuesday, May 28, 2013

The internet as the hive mind of the 21st century: a rationale for blogging

I've learned so much from googling various topics and reading random blogs, forums, and Stack Exchange answers.  I think that the rapid and public sharing of ideas and solutions through the internet is the hallmark of the 21st century.  My hope for this blog is that somebody will find something useful or interesting (or even just entertaining) here, and I can thereby contribute something worthwhile to the hive mind of humanity.