Tuesday, June 24, 2014

Maximum blastx/tblastn translation length?


Run Blastx on the sequence below against SwissProt. The sequence contains an ORF that is 493 amino acids long, but the best HSP Blastx turns up (against a protein that is completely identical to the ORF) is only 453 amino acids long. The same is true when you run tblastn in the other direction. However, if you first translate the ORF (for instance here) and then Blastp it against SwissProt, you get an HSP covering the whole ORF. What's going on here? Does Blast have a maximum length at which it stores translations? Hopefully someday I'll have time to investigate this further, or maybe some knowledgeable stranger will pass by this blog and let me know.

(Edit: This seems to be a freak occurrence with this particular sequence. I have since seen other Blastx searches return HSPs longer than this one. There is a truncated version of this protein in the NCBI Protein database, so maybe that has something to do with it.)
GCTGAAAAAAAGTGTAACGTCTCTAGCGGGACGGAGGTAGTATTTATTAAGTTCTGGCGT
GAGAAATAGGAATAAATCTGGACCACCGATCTTAGTTAGATCTATGGTTGATATTTATTT
AAGCATAGTTACAAGATACAATATGCATGATTTTATTTAAGCATTGATTTTACATTAAAA
AGCACAATCGTATAATAAGCATAATATAATGCGTATTTAAGCATGACTTTGTATAGTTTT
AAACATTATTCAAATTTTCTCACGTGAAGATTTATTTACACACGAACCCTTCCCTTCCGT
GTGTATATGTATATAAATATTTAATCAAGATTGACGTGGAGTAGCAAGCACAAGCAAAGG
AGACTTCTTATGGACTACAAATCCAGGAGCTTCAGTCATGTCCAAATCCTCCGCTCTATC
TCCATTACCCAATCTGAAATCGAACTCGTTTACCAGCTTGGATAGTGCAAGCTCGTATAA
AGCCATCGCAAACGTGGATCCGGGGCACCCTCTTCGACCCGACCCGAACGGAAGCATCTC
AAAATGCAACCCTTTATAGTCTATGCTCGTCTCGAGGAACCTTTCTGGACGAAATTCTTC
GGGATTTTCCCACAACGAGGGGTCTCTCGATATGGCCCAGTTGTTGACCAACACGACCGT
GCCACGTGGGATGTCGTAGCCGAGCATATTGGCGTCTTGAGTCAATTCTCGAGGGAGCAG
AATTGCGAAAGGTGGATGTAGGCGTAGAATCTCCTTGGATACTGCTTTCAGATATGGCAT
CTTGTCCACGTCATCCTCGGTAATCCCACCTTTGTTTCTAGAAACTTCTCGCACCTCGTT
CTGCAAAGTTTTTAGGGTACGCGGGTTTTTTATGAGCTCCGCCATCGTCCACTCTAGAGC
CGCGAAAGTCGTATCGGTTCCGGCAGAAACCATGTCGAAGATTAGAGCTTTGATTACGTC
ATCCTCGACGGGGTCAGTATCTTTACTCTCTCTCTGAAACTGAAGCAATGTGTCTACGAA
ATTCGTTTCATCATCACCCACCTTCTTCCTTCTATATTTTCGAAGAATACCCTCCATTGA
TCCATCCAACTTTGTACCGACTTTTTCCACTTCTGCATCGACGCCATTTATCCGGTTGAT
CCAAGACAGCCATGGAACGTAATCCCCCACGTTGAAACTTCCCAAGAGCTTGATAACCTT
GATCAGAATCCGATTAAAATCATCTCCGCCGTCGCCCTTCCTCCCTAACACCGCCCTGTG
AATTACGCCGTTCGTCAGCGCCATGAACATCTCGCTCAAGTTCACGACCGTCGTCGGCTT
CGATCGCCTGATCTTCTCAATCATAGCCGACGTTTCCTCTTCTCGAATCCCGCCGAACGA
CTGGACCCTCTTAGCGCTGAGCAGCTGCAGCATGCACATGCTCCGCGCGTTGCGCCAGTG
CTCGCCGTAGGGGGCGAAGGCCACGCCCTTGCCGCTGTACATCAGCCTGTCGAAGATGCT
CAGCCTCGGCCTGCTCGCGAAGATCACGTCTTGGTTCTTCATGATCTCACGCGCCGCCGC
CGCTGAGGAGGCCACTAGGACAGGAGCGCTGCCGAAATGGAGTAGCATCACCTCGCCGTA
GCGCTTGGATAAGGAGGTGAAGGAGCGGTGGGAGAGGGCTCCGATCAGGTGGAAATGGCC
GATCACCGGAAGCCTTAATGGAGACGGCGGCGGCCTCTTTCTTGAGGAAAGACTGGACTT
TCGTTTATGGAAAAGGACCGCCAGTAAGATTAAAGAGACAGAGAAAAATACTAGAAGAGC
GGCCATTTTCTCTTCAGTTGAATATAATGATGGACAAGTTCTTGGAGAAGG
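
For anyone who wants to check the ORF length locally before comparing it with the HSP lengths, here is a minimal sketch of a six-frame translation in plain Python. It is not what Blast does internally. The function names (reverse_complement, translate, longest_orf) are made up for this example, it assumes the standard genetic code, and it treats an ORF as running from the first M in a stop-free stretch to the next stop codon (or to the end of the sequence if no stop follows).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate frame 1 of dna; unknown codons become X, stops become *."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def longest_orf(dna):
    """Return (length, frame, protein) for the longest ORF in all six frames.

    Frames are numbered +1..+3 on the given strand and -1..-3 on the
    reverse complement. Assumption: an ORF starts at the first M of a
    stop-free stretch; a stretch that runs off the end without a stop
    codon is still counted.
    """
    dna = "".join(dna.split()).upper()  # drop line breaks and whitespace
    best = (0, 0, "")
    for strand, seq in ((1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            for segment in protein.split("*"):
                start = segment.find("M")
                if start == -1:
                    continue
                orf = segment[start:]
                if len(orf) > best[0]:
                    best = (len(orf), strand * (offset + 1), orf)
    return best


if __name__ == "__main__":
    # Toy input; replace with the sequence from this post to check its ORF.
    length, frame, protein = longest_orf("CCATGGCTTGGTAACC")
    print(length, frame, protein)
```

To try it on the sequence above, paste the sequence into a string (line breaks are fine) and pass it to longest_orf. The reported length can then be compared with the lengths of the Blastx and Blastp HSPs.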