The ChemAxon program MolConverter, does have the option to turn mobile H perception off. It is also a convenient, easy to use, free, program for converting molecule formats, and had I known about it earlier, I wouldn't have made my previous post because I would have used MolConverter instead.
What MolConverter doesn't do so well is generate 2d coordinates for atoms from InChI strings. Converting an InChI string of a big molecule into an image or a molfile winds up looking pretty bad in complicated regions. It's likely that there are ways to tweak the settings to make nicer renderings, but I haven't figured it out, and I was quite happy with the ChemDraw renderings, so those are the ones I'd like to use.
The problem: Same as the previous post. I still want "normalized" molfiles. Now, just because I can, I also want svg images of all of the molecules after normalization.
The solution: Use ChemAxon MolConverter to generate tautomerically unambiguous InChI strings (and InChI keys and SMILES) from molfiles. Then use pywin32 COM scripts to paste them into ChemDraw and save the resulting molfile. Then use MolConverter to convert the new molfiles into svg images.
Setup: ChemDraw Pro 13 (other versions may work too but I haven't tried any) should be installed. Also install ChemAxon MarvinBeans. It's also helpful to add the MarvinBeans bin directory to your PATH system variable (for me this was "C:\Program Files (x86)\ChemAxon\MarvinBeans\bin").
Steps (for each molfile):
- Run molconvert: molconvert inchi:"FixedH SAbs AuxNone Key" input.mol -o output.inchi
- The FixedH option turns off Mobile H perception, and the SAbs option forces it to use absolute stereochemistry (which is what I want). AuxNone because I don't want to save the coordinates, and ChemDraw can't read that anyways. Key, because I also want the InChI key.
- Run molconvert: molconvert smiles:"u" input.mol -o output.smiles
- u option generates a unique smiles
- Copy the SMILES (smiles seem to end up rendering a little nicer than InChI) string onto the clipboard
- Paste the SMILES string into ChemDraw
- Save the rendered molecule as a new molfile
- Run molconvert: molconvert svg:"chiral_all" normalized.mol -o image.svg
- chiral_all labels the chiral centers with their absolute stereochemistry
# Note: for this program to work, you must be windows and have ChemDraw installed
# Also, this program can't run in the background. When you run it, you should't be doing other things
# that will shift the window focus from chemdraw, or use the clipboard
import subprocess
import os
import win32com.client as win32
import re
import time
import win32clipboard as clipboard
import win32con
### configuration ###
root_path = 'E:\\databases\\molecules'
mol_dir = 'mols'
new_mol_dir = 'new_mols'
inchi_dir = 'inchi'
inchi_key_dir = 'inchi_key'
smiles_dir = 'smiles'
image_dir = 'svg'
sleep_time = 0.5 #how long to wait when the sleep function is called. We need to wait periodically so the script doesn't get ahead of program loading
short_sleep_time = 0.25 #how long to wait between key presses
debug = False #If true then only do one file
### program ###
def sleep():
time.sleep(sleep_time)
#### get all the input file names ####
# maybe just as easy or easier with glob module
all_files = [os.path.join(root, name) for root, dirs, files in os.walk(os.path.join(root_path, mol_dir)) for name in files if name.endswith(('.mol'))]
mol_files = list() #for the whole file name
mol_root_names = list() # for the filename without the extension
for f in files:
m = re.search('^(.+)\.mol$', f)
if m:
mol_files.append(f)
root_name = m.group(1)
# to be more cross platform, you'd want to avoid the backslashes and use os.sep (or whatever it is), but this code
# is pretty inherently bound to Windows, so it's ok to use \ as a literal I think
mol_root_names.append(root_name)
####generate smiles and inchi####
inchi_list = list()
smiles_list = list()
inchi_key_list = list()
for f in mol_files:
mol_string = ""
#In many cases this part won't be necessary, but I have some molfiles that have extraneous data defined in the bonds section that has to be gotten rid of for SMILES conversion to work
with open(os.path.join(root_path, mol_dir, f), "rb") as infile:
lines = infile.readlines()
for line in lines:
line = line.rstrip()
if len(line) == 21:
mol_string += line[0:15]
else:
mol_string += line
mol_string += "\n"
if debug == True:
print mol_string
exec_string = 'molconvert inchi:"FixedH SAbs AuxNone Key"'
process = subprocess.Popen(exec_string, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out, err = process.communicate(mol_string)
if err:
print("InChI conversion error" + " " + f + " " + err)
fields = out.splitlines()
inchi_list.append(fields[0])
inchi_key_list.append(fields[1])
exec_string = 'molconvert smiles:"u"'
process = subprocess.Popen(exec_string, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out, err = process.communicate(mol_string)
if err:
print("SMILES conversion error" + " " + f + " " + err)
smiles_list.append(out)
if debug == True:
break
#### save smiles and inchi to files ####
for (index, name) in enumerate(mol_root_names):
with open(os.path.join(root_path, inchi_dir, name)+".inchi","w") as outfile:
outfile.write(inchi_list[index])
with open(os.path.join(root_path, inchi_key_dir, name)+".inchikey","w") as outfile:
outfile.write(inchi_key_list[index])
with open(os.path.join(root_path, smiles_dir, name)+".smiles","w") as outfile:
outfile.write(smiles_list[index])
if debug == True:
break
#### Use Chemdraw to render structures and save molfiles####
#initialize windows shell (so we can send keyboard commands)
shell = win32.Dispatch("WScript.Shell")
#initialize chemdraw
chemdraw = win32.gencache.EnsureDispatch('ChemDraw.Application') #connect to chemdraw
chemdraw.Visible = True #it's kind of fun to watch...
time.sleep(sleep_time)
def hit_keys(keys): #function to wait, then send a keypress signal
time.sleep(short_sleep_time)
shell.SendKeys(keys,1)
doc = chemdraw.Documents.Add()
doc.Activate()
for (index, name) in enumerate(mol_root_names):
sleep()
print(name)
clipboard.OpenClipboard()
clipboard.EmptyClipboard()
clipboard.SetClipboardData(win32con.CF_TEXT, smiles_list[index])
clipboard.CloseClipboard()
#chemdraw must be the active application, so it can receive keyboard commands
shell.AppActivate("ChemDraw Pro") #this should be the name that appears at the top of the ChemDraw window bar
sleep()
hit_keys("^a") #select all
hit_keys("{Delete}") #delete (if there's anything here we want it gone
hit_keys("%e") #alt+e opens the edit menu
hit_keys("s") #paste special
hit_keys("s") #smiles
sleep()
doc.SaveAs(os.path.join(root_path, new_mol_dir, mol_files[index])) #save the new mol file
if debug == True:
break
doc.Close() #close the window
#### Generate Images ####
for (index, name) in enumerate(mol_root_names):
exec_string = 'molconvert svg:"chiral_all" %s -o %s' % (os.path.join(root_path, new_mol_dir, mol_files[index]), os.path.join(root_path, image_dir, name)+".svg")
process = subprocess.Popen(exec_string, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
out, err = process.communicate(mol_string)
if err:
print(f + " " + err)
if debug == True:
break
Result:
I think it worked pretty well. Below is the rendering of the drug Paclitaxel. It's not perfect, for example you can see where two of the methyl groups in the middle get drawn overlapping part of the ring. But it's still perfectly readable and unambiguous and looks pretty nice. So I think this adventure was successful.
References:
https://www.chemaxon.com/marvin/help/applications/molconvert.html
http://www.chemaxon.com/marvin/help/formats/inchi-doc.html
http://www.chemaxon.com/marvin/help/formats/images-doc.html
http://breathmintsforpenguins.blogspot.com/2014/03/automatingchemdraw-win32-com-scripting.html
No comments:
Post a Comment