Monday, July 4, 2016

Conversion of the Arabidopsis model by Arnold and Nikoloski, 2014, into a model using MetaCyc metabolite names

I think the stoichiometric model of Arabidopsis thaliana by Anne Arnold and Zoran Nikoloski (2014) is a great model. However, it uses idiosyncratic, non-standardized metabolite names. I translated these names into MetaCyc-compatible names. This post describes the conversion and links to it. The whole conversion table is available on Google Sheets, here.




Arnold-2014 uses "starch1", "starch2", etc. for starch polymer elongation intermediates. There is no good equivalent in MetaCyc, so I just keep the names. Similarly for "Cellulose1" etc.


In MetaCyc, metabolites such as Lipoamide, Acetyldihydrolipoamide, and Succinyldihydrolipoamide are defined as being covalently bound to an enzyme. The free forms of these molecules also exist in MetaCyc, but they are not annotated as participating in any reaction. A good direct substitution from Arnold-2014 to MetaCyc names is therefore not possible. It would be best to map each of these Arnold-2014 metabolites to one of several MetaCyc metabolites, depending on the reaction. Because I want to do a direct substitution, I use the free forms from MetaCyc, but a better solution would be to change the reactions as well as the metabolite names.
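The reaction-dependent alternative mentioned above is not what this post implements, but it could look roughly like the sketch below. The function name and every identifier in the example data are placeholders I made up for illustration; they are not real Arnold-2014 or MetaCyc names.

```python
def pick_name(reaction, metabolite, free_forms, per_reaction):
    """Prefer a reaction-specific name; otherwise fall back to the free form."""
    key = (reaction, metabolite)
    if key in per_reaction:
        return per_reaction[key]
    return free_forms[metabolite]

# Placeholder identifiers only (assumptions, not taken from either database).
free_forms = {"lipoamide": "FREE-FORM-ID"}
per_reaction = {("RXN_A", "lipoamide"): "ENZYME-BOUND-FORM-ID"}

in_rxn_a = pick_name("RXN_A", "lipoamide", free_forms, per_reaction)
in_rxn_b = pick_name("RXN_B", "lipoamide", free_forms, per_reaction)
```

A direct substitution corresponds to leaving `per_reaction` empty, so that every reaction gets the free form.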


Oxidized plastoquinone: PLASTOQUINONE-9
Reduced plastoquinone: CPD-12829
Ubiquinone: UBIQUINONE-8
Ubiquinol: CPD-9956 (ubiquinol-8)
Oxidized cytochrome c: Cytochromes-C-Oxidized
Reduced cytochrome c: Cytochromes-C-Reduced
3-Hydroxy-3-methyl-2-oxobutanoic acid: HMOB (I can't find a MetaCyc equivalent of this metabolite, so I keep the original name)
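The naming choices above amount to a lookup table with a "keep the original name" fallback for metabolites that have no MetaCyc equivalent. Here is a minimal sketch of that idea. The keys are the descriptive labels from the list, which are not necessarily the exact identifiers used in the Arnold-2014 model, and `to_metacyc` is a helper name I chose for illustration.

```python
# Entries copied from the list above. The keys are descriptive labels,
# not necessarily the identifiers used in the model file (assumption).
METACYC_NAMES = {
    "Oxidized plastoquinone": "PLASTOQUINONE-9",
    "Reduced plastoquinone": "CPD-12829",
    "Ubiquinone": "UBIQUINONE-8",
    "Ubiquinol": "CPD-9956",
    "Oxidized cytochrome c": "Cytochromes-C-Oxidized",
    "Reduced cytochrome c": "Cytochromes-C-Reduced",
}

def to_metacyc(name, table=METACYC_NAMES):
    """Return the MetaCyc name if one is listed, otherwise keep the name as is."""
    return table.get(name, name)

names = ["Ubiquinol", "starch1", "HMOB"]
translated = [to_metacyc(n) for n in names]
```

Names without an entry, such as "starch1" and "HMOB", pass through unchanged.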



I used a yasmenv script, "metabolites_to_metacyc.py", to convert the original model to one with MetaCyc names:

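import pandas as pd
from yasmenv import model

# Conversion table and the original model.
mets = pd.read_table('metabolites.tsv')
m = model.from_tsv("Arnold_2014.tsv")

# Map each Arnold-2014 metabolite name to its MetaCyc name.
met_dict = dict()

for (i, r) in mets.iterrows():
  # Drop the last three characters of the name (presumably a compartment suffix).
  met = r['Metabolite name'][:-3]
  metacyc = r['MetaCyc']
  if met in met_dict:
    # The same metabolite must not map to two different MetaCyc names.
    if metacyc != met_dict[met]:
      print("inconsistent naming: %s, %s, %s" % (met, met_dict[met], metacyc))
  else:
    met_dict[met] = metacyc

# Invert the mapping: MetaCyc name -> list of Arnold-2014 names to merge into it.
merge_dict = dict()
# items() works on both Python 2 and 3; iteritems() exists only on Python 2.
for (arnold, metacyc) in met_dict.items():
  if metacyc in merge_dict:
    if arnold not in merge_dict[metacyc]:
      merge_dict[metacyc].append(arnold)
  else:
    merge_dict[metacyc] = [arnold]

m2 = m.merge_metabolites(merge_dict, True)

m2.to_tsv('Arnold_metacyc.tsv')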
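
The core of the script is the inversion of the name mapping, which groups all Arnold-2014 names that share one MetaCyc name. The following is a stand-alone sketch of just that step, without yasmenv, so it can be run on its own. The source names in the example are hypothetical, and `invert_mapping` is a helper name chosen for illustration.

```python
from collections import defaultdict

def invert_mapping(met_dict):
    """Group source names by the target name they map to."""
    merge_dict = defaultdict(list)
    # Sort so that the order of names within each group is reproducible.
    for source, target in sorted(met_dict.items()):
        merge_dict[target].append(source)
    return dict(merge_dict)

# Hypothetical source names; only the target identifiers appear in this post.
example = {
    "ubiquinone_a": "UBIQUINONE-8",
    "ubiquinone_b": "UBIQUINONE-8",
    "plastoquinone": "PLASTOQUINONE-9",
}
merged = invert_mapping(example)
```

Any MetaCyc name whose list has more than one entry marks a place where two metabolites of the original model collapse into one after translation.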



I ran flux variability analysis (FVA) on the original model and on the model with the translated metabolite names to verify that the model structure was identical. The FVA solution was the same for both models, apart from rounding differences in very small numbers. So the translation to MetaCyc appears to have been successful.
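
A comparison like this can be automated with a small tolerance so that rounding noise is ignored. The sketch below assumes the FVA results of each model are available as a dictionary from reaction name to a (minimum, maximum) flux pair. That data layout, the function name `compare_fva`, the tolerance, and the example numbers are all my assumptions, not output from the actual models.

```python
import math

def compare_fva(fva_a, fva_b, abs_tol=1e-9):
    """Return the reactions whose flux ranges differ by more than abs_tol."""
    mismatches = []
    for rxn in sorted(set(fva_a) | set(fva_b)):
        # A reaction present in only one result counts as a mismatch.
        if rxn not in fva_a or rxn not in fva_b:
            mismatches.append(rxn)
            continue
        lo_a, hi_a = fva_a[rxn]
        lo_b, hi_b = fva_b[rxn]
        same = (math.isclose(lo_a, lo_b, rel_tol=0.0, abs_tol=abs_tol)
                and math.isclose(hi_a, hi_b, rel_tol=0.0, abs_tol=abs_tol))
        if not same:
            mismatches.append(rxn)
    return mismatches

# Made-up flux ranges for illustration only.
original = {"R1": (0.0, 10.0), "R2": (-5.0, 5.0)}
translated = {"R1": (1e-13, 10.0), "R2": (-5.0, 5.0)}
diff = compare_fva(original, translated)
```

An empty list means the two models agree on every reaction within the chosen tolerance.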
