Thursday, December 10, 2015

Extracting gene ontology categories using python pandas and rdflib

I've got a table I exported from Blast2GO, which associates transcript IDs with GO term IDs (an "annot" file in Blast2GO terminology). I want to separate the GO IDs by category: biological_process, cellular_component, or molecular_function. (Note: in this example I use rdflib. A faster way to do the same thing would be to use the more specialized goatools, which is probably the best GO library for Python. I used rdflib because I wanted to practice SPARQL.)
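
To make the goal concrete, here is a minimal sketch of the end result in pandas alone. It is not the rdflib/SPARQL approach: the GO-ID-to-category lookup is a hand-written dictionary covering only the three GO root terms, standing in for whatever an ontology query would return. The sample data, the column names `transcript_id` and `go_id`, the function name `split_by_category`, and the two-column tab-separated layout are all assumptions for illustration; a real annot file may carry extra columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for a Blast2GO annot export. Assumed layout:
# tab-separated, no header row, transcript ID followed by GO ID.
ANNOT_TEXT = (
    "transcript_1\tGO:0008150\n"
    "transcript_1\tGO:0003674\n"
    "transcript_2\tGO:0005575\n"
)

# Hand-written lookup for the three GO root terms only. In practice this
# mapping would come from the ontology itself (e.g. via rdflib or goatools).
GO_NAMESPACE = {
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
    "GO:0003674": "molecular_function",
}


def split_by_category(annot_source, namespace_lookup):
    """Return a dict mapping each GO category to its rows of the annot table."""
    annot = pd.read_csv(
        annot_source,
        sep="\t",
        header=None,
        names=["transcript_id", "go_id"],
    )
    # GO IDs missing from the lookup get NaN here, and groupby drops them.
    annot["category"] = annot["go_id"].map(namespace_lookup)
    return {
        category: group.drop(columns="category").reset_index(drop=True)
        for category, group in annot.groupby("category")
    }


by_category = split_by_category(io.StringIO(ANNOT_TEXT), GO_NAMESPACE)
for category, table in by_category.items():
    print(category)
    print(table.to_string(index=False))
```

Each value in `by_category` is a small DataFrame, so it can be written back out with `to_csv` or merged with other per-transcript tables.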