Thursday, August 22, 2013

Instantiating generic BioCyc (MetaCyc) reactions

Most databases of metabolic reactions include reactions that are generic. That is, some of the reactants and/or products are not specific molecules, but classes of molecules. This is a wonderfully abstract way of representing a large number of chemical transformations with a simple notation.
One example is CARBOXYLESTERASE-RXN in MetaCyc, which has the formula:  a carboxylic ester + water <=> an alcohol + a carboxylate + a proton

As useful and elegant as generic reactions are, they present a problem for anyone trying to generate mathematical models from a database: most mathematical modeling frameworks do not know how to deal with generic reactions.*  Reactions in a mathematical model should therefore generally be balanced and unambiguous.

Latendresse et al. (2012) describe a strategy for generating specific reactions ("instances") from the generic reactions in the BioCyc family of databases.  Their strategy is to enumerate all possible combinations of reactants and products for a generic reaction, then check each combination for mass balance. The set of mass-balanced instances is then filtered further: any instance that includes a reactant or product appearing in more than one balanced instance is regarded as ambiguous and removed. Polymerization reactions are treated as a special case and handled differently from other instantiations.
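
To make the enumerate, balance-check, and filter steps concrete, here is a minimal Python sketch of that strategy applied to the carboxylesterase reaction. It is not the code from Latendresse et al.; the class memberships, the formula table, and the function names (atoms, instantiate, drop_ambiguous) are all hypothetical toy choices. It also makes two assumptions of its own: only elemental composition is checked (not charge), and only compounds substituted for a class count toward ambiguity, so water and the proton are ignored by the filter. Polymerization reactions are not handled.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data; a real run would read these from a BioCyc database.
CLASS_MEMBERS = {
    "a carboxylic ester": ["ethyl acetate", "methyl propanoate", "benzyl benzoate"],
    "an alcohol": ["ethanol", "methanol", "benzyl alcohol"],
    "a carboxylate": ["acetate", "propanoate", "benzoate"],
}
FORMULAS = {
    "ethyl acetate": {"C": 4, "H": 8, "O": 2},
    "methyl propanoate": {"C": 4, "H": 8, "O": 2},
    "benzyl benzoate": {"C": 14, "H": 12, "O": 2},
    "ethanol": {"C": 2, "H": 6, "O": 1},
    "methanol": {"C": 1, "H": 4, "O": 1},
    "benzyl alcohol": {"C": 7, "H": 8, "O": 1},
    "acetate": {"C": 2, "H": 3, "O": 2},
    "propanoate": {"C": 3, "H": 5, "O": 2},
    "benzoate": {"C": 7, "H": 5, "O": 2},
    "water": {"H": 2, "O": 1},
    "proton": {"H": 1},
}

# CARBOXYLESTERASE-RXN written as (reactants, products).
GENERIC = (
    ["a carboxylic ester", "water"],
    ["an alcohol", "a carboxylate", "proton"],
)


def atoms(compounds):
    """Sum the elemental composition of a list of compound names."""
    total = Counter()
    for name in compounds:
        total.update(FORMULAS[name])
    return total


def instantiate(reactants, products):
    """Enumerate every substitution of class members and keep the balanced ones."""
    slots = reactants + products
    # A class expands to its members; a specific compound stays as itself.
    choices = [CLASS_MEMBERS.get(slot, [slot]) for slot in slots]
    n = len(reactants)
    balanced = [
        combo
        for combo in product(*choices)
        if atoms(combo[:n]) == atoms(combo[n:])
    ]
    return slots, balanced


def drop_ambiguous(slots, balanced):
    """Remove instances whose substituted compounds occur in another instance."""
    # Assumption: only positions that held a class are considered.
    generic_idx = [i for i, slot in enumerate(slots) if slot in CLASS_MEMBERS]
    usage = Counter(inst[i] for inst in balanced for i in generic_idx)
    return [
        inst
        for inst in balanced
        if all(usage[inst[i]] == 1 for i in generic_idx)
    ]


slots, balanced = instantiate(*GENERIC)
unambiguous = drop_ambiguous(slots, balanced)
for inst in unambiguous:
    print(" + ".join(inst[:2]), "<=>", " + ".join(inst[2:]))
```

With this toy data, ethyl acetate and methyl propanoate are isomers, so each of them balances against both ethanol + acetate and methanol + propanoate. Mass balance alone cannot tell the chemically sensible pairing from the wrong one, which is the kind of case the ambiguity filter is meant to catch. Under the assumptions above, the filter discards all four of those instances (including the two correct ones) and keeps only the benzyl benzoate instance.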