Tuesday, March 18, 2014

ChemDraw ChemAxon synergy

In my previous post, I was complaining that there wasn't any free software with a nice command line interface to reliably convert molfiles to InChI strings, and back again. I also mentioned that there was an issue with the way ChemDraw converts structures to InChI strings that made it unacceptable for my purposes. The technical issue with ChemDraw is that it doesn't preserve the isoform of tautomers. Some of the molecules I'm interested in contain amides that are typically found in the amide form, rather than the imidic acid form. When I copy these molecules as InChI from ChemDraw, and then paste them back in, the amide is changed to the imidic acid form, which I don't want. It turns out that this is due to a feature of the InChI format (a format that is still mostly opaque to me) called "Mobile H Perception", where it simplifies a molecule encoding by not specifying the which tautomer it is (thereby saving 1 bit of information I guess). Many programs have the option to export InChI with Mobile H perception off, which is what I want, but I can't find that option in Chemdraw.

The ChemAxon program MolConverter, does have the option to turn mobile H perception off. It is also a convenient, easy to use, free, program for converting molecule formats, and had I known about it earlier, I wouldn't have made my previous post because I would have used MolConverter instead.

What MolConverter doesn't do so well is generate 2d coordinates for atoms from InChI strings. Converting an InChI string of a big molecule into an image or a molfile winds up looking pretty bad in complicated regions. It's likely that there are ways to tweak the settings to make nicer renderings, but I haven't figured it out, and I was quite happy with the ChemDraw renderings, so those are the ones I'd like to use.

The problem: Same as the previous post. I still want "normalized" molfiles. Now, just because I can, I also want svg images of all of the molecules after normalization.

The solution: Use ChemAxon MolConverter to generate tautomerically unambiguous InChI strings (and InChI keys and SMILES) from molfiles. Then use pywin32 COM scripts to paste them into ChemDraw and save the resulting molfile. Then use MolConverter to convert the new molfiles into svg images.

Setup: ChemDraw Pro 13 (other versions may work too but I haven't tried any) should be installed. Also install ChemAxon MarvinBeans. It's also helpful to add the MarvinBeans bin directory to your PATH system variable (for me this was "C:\Program Files (x86)\ChemAxon\MarvinBeans\bin").

Steps (for each molfile):
  1. Run molconvert: molconvert inchi:"FixedH SAbs AuxNone Key" input.mol -o output.inchi
    • The FixedH option turns off Mobile  H perception, and the SAbs option forces it to use absolute stereochemistry (which is what I want). AuxNone because I don't want to save the coordinates, and ChemDraw can't read that anyways. Key, because I also want the InChI key.
  2. Run molconvert: molconvert smiles:"u" input.mol -o output.smiles
    • u option generates a unique smiles
  3. Copy the SMILES (smiles seem to end up rendering a little nicer than InChI) string onto the clipboard
  4. Paste the SMILES string into ChemDraw
  5. Save the rendered molecule as a new molfile
  6. Run molconvert:  molconvert svg:"chiral_all" normalized.mol -o image.svg
    • chiral_all labels the chiral centers with their absolute stereochemistry
The code (borrowing heavily from the previous version...):
# Note: for this program to work, you must be windows and have ChemDraw installed
# Also, this program can't run in the background. When you run it, you should't be doing other things
# that will shift the window focus from chemdraw, or use the clipboard

import subprocess
import os
import win32com.client as win32
import re
import time
import win32clipboard as clipboard
import win32con
### configuration ###
root_path = 'E:\\databases\\molecules'
mol_dir = 'mols'
new_mol_dir = 'new_mols'
inchi_dir = 'inchi'
inchi_key_dir = 'inchi_key'
smiles_dir = 'smiles'
image_dir = 'svg'
sleep_time = 0.5 #how long to wait when the sleep function is called. We need to wait periodically so the script doesn't get ahead of program loading
short_sleep_time = 0.25 #how long to wait between key presses
debug = False #If true then only do one file

### program ###
def sleep():

#### get all the input file names ####
# maybe just as easy or easier with glob module
all_files = [os.path.join(root, name) for root, dirs, files in os.walk(os.path.join(root_path, mol_dir)) for name in files if name.endswith(('.mol'))]
mol_files = list() #for the whole file name
mol_root_names = list() # for the filename without the extension
for f in files:
    m = re.search('^(.+)\.mol$', f)
    if m:
        root_name = m.group(1)
        # to be more cross platform, you'd want to avoid the backslashes and use os.sep (or whatever it is), but this code
        # is pretty inherently bound to Windows, so it's ok to use \ as a literal I think

####generate smiles and inchi####
inchi_list = list()
smiles_list = list()
inchi_key_list = list()
for f in mol_files:
    mol_string = ""
    #In many cases this part won't be necessary, but I have some molfiles that have extraneous data defined in the bonds section that has to be gotten rid of for SMILES conversion to work
    with open(os.path.join(root_path, mol_dir, f), "rb") as infile:
        lines = infile.readlines()
        for line in lines:
            line = line.rstrip()
            if len(line) == 21:
                mol_string += line[0:15]
                mol_string += line
            mol_string += "\n"
    if debug == True:
        print mol_string
    exec_string = 'molconvert inchi:"FixedH SAbs AuxNone Key"'
    process = subprocess.Popen(exec_string,  stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    out, err = process.communicate(mol_string)
    if err:
        print("InChI conversion error" + " " + f + " " + err)
    fields = out.splitlines()
    exec_string = 'molconvert smiles:"u"'
    process = subprocess.Popen(exec_string, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    out, err = process.communicate(mol_string)
    if err:
        print("SMILES conversion error" + " " + f + " " + err)
    if debug == True:

#### save smiles and inchi to files ####
for (index, name) in enumerate(mol_root_names):
    with open(os.path.join(root_path, inchi_dir, name)+".inchi","w") as outfile:
    with open(os.path.join(root_path, inchi_key_dir, name)+".inchikey","w") as outfile:
    with open(os.path.join(root_path, smiles_dir, name)+".smiles","w") as outfile:
    if debug == True:

#### Use Chemdraw to render structures and save molfiles####
#initialize windows shell (so we can send keyboard commands)
shell = win32.Dispatch("WScript.Shell")

#initialize chemdraw
chemdraw = win32.gencache.EnsureDispatch('ChemDraw.Application') #connect to chemdraw
chemdraw.Visible = True #it's kind of fun to watch...

def hit_keys(keys): #function to wait, then send a keypress signal

doc = chemdraw.Documents.Add()
for (index, name) in enumerate(mol_root_names):
    clipboard.SetClipboardData(win32con.CF_TEXT, smiles_list[index])
    #chemdraw must be the active application, so it can receive keyboard commands
    shell.AppActivate("ChemDraw Pro") #this should be the name that appears at the top of the ChemDraw window bar
    hit_keys("^a") #select all
    hit_keys("{Delete}") #delete (if there's anything here we want it gone
    hit_keys("%e") #alt+e opens the edit menu
    hit_keys("s") #paste special
    hit_keys("s") #smiles
    doc.SaveAs(os.path.join(root_path, new_mol_dir, mol_files[index])) #save the new mol file
    if debug == True:
doc.Close() #close the window

#### Generate Images ####
for (index, name) in enumerate(mol_root_names):
    exec_string = 'molconvert svg:"chiral_all" %s -o %s' % (os.path.join(root_path, new_mol_dir, mol_files[index]), os.path.join(root_path, image_dir, name)+".svg")
    process = subprocess.Popen(exec_string, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    out, err = process.communicate(mol_string)
    if err:
        print(f + " " + err)
    if debug == True:

I think it worked pretty well. Below is the rendering of the drug Paclitaxel. It's not perfect, for example you can see where two of the methyl groups in the middle get drawn overlapping part of the ring. But it's still perfectly readable and unambiguous and looks pretty nice. So I think this adventure was successful.



No comments:

Post a Comment