Starting Materials:
A set of SMILES strings, each in its own file with the ".smiles" extension. (Other identifiers can be searched too. Unfortunately, you can't yet search by molfile programmatically without becoming a ChemSpider "Service Subscriber", whatever that is.)
Goal:
Get a mol file and a ChemSpider ID for each SMILES string.
Steps:
First, make a ChemSpider account and get your "service token", which will be listed in your profile once the account exists.
Now, download the ChemSpider Python API (we could also use a generic SOAP library, or just issue "GET" requests with wget, but the API is super convenient and easy to use, so why not use it?). Put the API directory in your PYTHONPATH, rename the file "private_token_example.py" to "private_token.py", and paste your service token into the correct spot. Now make and execute chemspider.py:
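As an aside on the plain "GET" route mentioned above, here is a minimal sketch of how such a request URL could be put together. The operation name "SimpleSearch" and the parameter names "query" and "token" are assumptions based on the Search.asmx service listed in the references, and build_search_url is a hypothetical helper, not part of the API. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed endpoint: the SimpleSearch operation of the Search.asmx service.
SEARCH_URL = "http://www.chemspider.com/Search.asmx/SimpleSearch"


def build_search_url(smiles, token):
    """Return a GET URL for a simple search (hypothetical helper).

    The SMILES string is percent-encoded, since characters such as
    '=' and '#' are meaningful inside a URL.
    """
    params = urlencode({"query": smiles, "token": token})
    return SEARCH_URL + "?" + params


if __name__ == "__main__":
    # "YOUR-TOKEN" is a placeholder for the service token from your profile.
    print(build_search_url("C=O", "YOUR-TOKEN"))
```

The resulting URL could then be handed to wget or any HTTP client. The script below uses the Python API instead, which hides these details.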
chemspider.py
from ChemSpiPy import chemspipy
from glob import glob
import os

### settings ###
smiles_glob = "mols/*.smiles"
smiles_files = glob(smiles_glob)
mol_dir = "new_mols"
mol_suffix = ".mol"
csid_file = "csids.txt"

### make output directories if they don't already exist ###
if not os.path.exists(mol_dir):
    os.makedirs(mol_dir)

print(smiles_files)

### iterate through smiles files and grab the data for each one ###
with open(csid_file, "w") as csid_out:
    for smiles_file in smiles_files:
        with open(smiles_file, "r") as smile:
            # strip the trailing newline so it is not sent as part of the query
            smiles_string = smile.readline().strip()
        c = chemspipy.find_one(smiles_string)
        if c is None:
            # if it can't find a hit, print a warning to the console
            print("Warning: could not find chemspider hit for %s" % smiles_file)
        else:
            # write the csid to a file; str() guards against a non-string csid
            csid_out.write(smiles_file + "\t" + str(c.csid) + "\n")
            mol_path = os.path.join(mol_dir, os.path.basename(smiles_file) + mol_suffix)
            with open(mol_path, "w") as mol_out:
                mol_out.write(c.mol)
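The file handling in the script can be tried out without a ChemSpider account. The sketch below pulls the two local steps into small functions: reading the SMILES string from a file and working out where the mol file goes. The names read_smiles and mol_path_for are hypothetical helpers introduced here for illustration. Note one deliberate difference from the script: this version drops the ".smiles" extension, so "ethanol.smiles" maps to "ethanol.mol" rather than "ethanol.smiles.mol".

```python
import os


def read_smiles(path):
    """Return the first line of a .smiles file without surrounding whitespace."""
    with open(path, "r") as handle:
        return handle.readline().strip()


def mol_path_for(smiles_file, mol_dir="new_mols", mol_suffix=".mol"):
    """Map e.g. mols/ethanol.smiles to new_mols/ethanol.mol."""
    stem = os.path.splitext(os.path.basename(smiles_file))[0]
    return os.path.join(mol_dir, stem + mol_suffix)


if __name__ == "__main__":
    print(mol_path_for("mols/ethanol.smiles"))
```

Stripping the line matters because readline() keeps the trailing newline, which would otherwise become part of the search query.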
References:
http://www.chemspider.com/Search.asmx
https://github.com/mwormleonhard/ChemSpiPy
http://blog.matt-swain.com/post/16893587098/chemspipy-a-python-wrapper-for-the-chemspider-api