Thursday, August 17, 2017

targetP wrapper for large queries

As far as I know, TargetP is still (17 years after its original publication!) the best software for predicting the subcellular localization of plant proteins, as well as the location of their truncation sites.

Without any modifications, TargetP works well with small (by modern standards) queries of fewer than 2,000 sequences at a time, but it becomes glitchy with larger queries, such as the 30k-100k genes that are typical of a plant transcriptome assembly.

To adapt TargetP for larger queries, I wrote a Python script called targetp_all.py that acts as a wrapper around TargetP. The script works by separating the input into smaller subsets of sequences, running TargetP on each subset, and combining the output.
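
As a rough illustration of the split-and-combine idea (this is not the targetp_all.py source, which relies on BioPython), here is a minimal standard-library sketch. The names read_fasta, batches, and run_in_batches are hypothetical, and run_batch is a stand-in for whatever function actually invokes TargetP on one subset and parses its output.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_in_batches(records, size, run_batch):
    """Run `run_batch` on each subset and concatenate the results in order.

    `run_batch` is an assumption: any callable that takes a list of
    (header, sequence) tuples and returns a list of result rows.
    """
    results = []
    for chunk in batches(records, size):
        results.extend(run_batch(chunk))
    return results
```

With a batch size below the roughly 2,000-sequence limit mentioned above, each call to run_batch stays within the range where TargetP behaves well, and the combined list keeps the original input order.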

The interface is the same as the original program's, with a few additional options. The output is somewhat simplified and is written in tab-separated format.

It would also be nice to parallelize the execution of TargetP so that it runs on multiple cores at once, but I haven't attempted this yet. I believe there will be complications involving conflicting temporary files, which may require careful modification of the original source code.

Source code follows. BioPython is a dependency.

Saturday, April 29, 2017

Mira4 assembly of 454 reads from SRA

I want to make an assembly of the Annona squamosa fruit transcriptome data from this paper (http://dx.doi.org/10.1186/s12864-015-1248-3). The paper gives a link to a web resource (http://www.annonatranscriptome.nabi.res.in/), but the resource now appears to be defunct, so to get contigs, I will have to assemble the reads myself. The reads are from two different cultivars of Annona squamosa, so I'm going to assemble each cultivar separately first, and then if that works, I'll try a combined assembly.

MIRA is a nice, free software package that can assemble 454 data. I've had success with it before, so that's what I'll use for this project too.


Monday, March 20, 2017

Tips for Methods Development and Optimization in Biology



I haven't posted anything in quite a while, mostly because I haven't written anything that I thought would be of general interest. I once again have a young lab assistant to preach at, so I might as well preach at the world too. Here is what I've written for her about methods development in the biology laboratory.