Wednesday, August 13, 2014

ChemSpider Python scripting

In the past, I've shown how to script ChemDraw and ChemAxon to convert chemical information among different file formats. In this post I'll show how to achieve (vaguely) similar results using the ChemSpider web API. ChemSpider is nice because (unlike SciFinder) it is free, they don't mind if you reuse their data, and they don't mind if you access their website with scripts. It also has the advantage of being a huge database that provides cross-references to other databases, along with images, mol files, and lots of other data about each compound.




Starting Materials:
A bunch of SMILES strings in individual files with the ".smiles" extension. (Other identifiers can be searched too. Unfortunately, you can't programmatically search with a molfile yet without becoming a ChemSpider "Service Subscriber", whatever that is.)
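If your structures start out as a list rather than as individual files, a few lines of Python can produce the starting materials. This is only a minimal sketch: the helper name write_smiles_files and the example compounds are made up for illustration, and the default "mols" directory is chosen simply because it matches the glob used in chemspider.py further down.

```python
import os

def write_smiles_files(compounds, out_dir="mols"):
    """Write one <name>.smiles file per compound and return the paths written.

    `compounds` maps a file-friendly name to a SMILES string.
    (Hypothetical helper, not part of ChemSpiPy.)
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, smiles in sorted(compounds.items()):
        path = os.path.join(out_dir, name + ".smiles")
        with open(path, "w") as handle:
            handle.write(smiles + "\n")  # one SMILES string on the first line
        paths.append(path)
    return paths

# Example call (illustrative compounds only):
# write_smiles_files({"ethanol": "CCO", "benzene": "c1ccccc1"})
```

Each file holds a single SMILES string on its first line, which is the layout the script in this post reads.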


Goal:
Get a mol file and ChemSpider ID (CSID) for each SMILES string.

Steps:
First, make a ChemSpider account and get your "service token", which will be listed in your profile after you make an account.
Now, download the ChemSpider Python API (we could also use a generic SOAP library, or just use "GET" commands with wget, but the API is super convenient and easy to use, so why not use it?). Put the API directory in your PYTHONPATH, rename the file "private_token_example.py" to "private_token.py", and paste your private token into the correct spot. Now make and execute chemspider.py:

chemspider.py

from ChemSpiPy import chemspipy
from glob import glob
import os

### settings ###
smiles_glob = "mols/*.smiles"
smiles_files = glob(smiles_glob)
mol_dir = "new_mols"
mol_suffix = ".mol"
csid_file = "csids.txt"

### make output directories if they don't already exist ###
if not os.path.exists(mol_dir):
  os.makedirs(mol_dir)
print(smiles_files)
  
### iterate through smiles files and grab the data for each one ###
with open(csid_file, "w") as csid_out:
  for smiles_file in smiles_files:
    with open(smiles_file, "r") as smile:
      smiles_string = smile.readline().strip() #strip the trailing newline so it isn't sent as part of the query
      c = chemspipy.find_one(smiles_string)
      if c is None:
        print("Warning: could not find chemspider hit for %s" % smiles_file) #if it can't find a hit, print a warning to the console
      else:
        csid_out.write(smiles_file + "\t" + str(c.csid) + "\n") #write the csid to a file (str() in case it comes back as an integer)
        with open(os.path.join(mol_dir, os.path.basename(smiles_file) + mol_suffix), "w") as mol_out:
          mol_out.write(c.mol)
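
For the curious, here is roughly what the plain "GET" route mentioned in the steps might look like without the wrapper. Treat it as a sketch under assumptions rather than a reference: I'm assuming the SimpleSearch operation listed on the Search.asmx page, that it takes "query" and "token" parameters, and that it answers with an XML list of <int> elements in the http://www.chemspider.com/ namespace. The function names are mine, and the sample response and its numbers are placeholders so the parsing can be tried offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and XML namespace (check the Search.asmx page before relying on them)
SEARCH_URL = "http://www.chemspider.com/Search.asmx/SimpleSearch"
CS_NS = "{http://www.chemspider.com/}"

def build_search_url(query, token):
    """Build the GET URL; urlencode escapes SMILES characters such as '=' and '#'."""
    return SEARCH_URL + "?" + urlencode({"query": query, "token": token})

def parse_csids(xml_text):
    """Pull every <int> element out of the (assumed) response format."""
    root = ET.fromstring(xml_text)
    return [int(elem.text) for elem in root.iter(CS_NS + "int")]

def search_csids(query, token):
    """Run the search over HTTP (needs network access and a valid token)."""
    with urlopen(build_search_url(query, token)) as response:
        return parse_csids(response.read())

# Placeholder response for trying the parser offline (the IDs are made up)
SAMPLE_RESPONSE = (
    '<ArrayOfInt xmlns="http://www.chemspider.com/">'
    "<int>123</int><int>456</int>"
    "</ArrayOfInt>"
)
```

Calling parse_csids(SAMPLE_RESPONSE) exercises the parsing without touching the network. The wrapper does this kind of work for you and also hands back mol files, which is why I stuck with it above.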


References:
http://www.chemspider.com/Search.asmx
https://github.com/mwormleonhard/ChemSpiPy
http://blog.matt-swain.com/post/16893587098/chemspipy-a-python-wrapper-for-the-chemspider-api

