Wednesday, April 1, 2015

Clustal Omega not good for nucleotide alignment

Clustal Omega claims that "the quality of alignments is superior to previous versions, as measured by a range of popular benchmarks." But whenever I try to align nucleotide sequences, it seems to do worse than ClustalW and Kalign. I suspect I'm just using the wrong settings, but even after playing with the settings a bit it still gives the same results, so I'm kind of at a loss. My guess is that it's optimized for aligning huge numbers of long sequences and ends up not performing as well on small-scale alignments. (If you know what's going on here, please let me know in the comments.)
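The post doesn't say how "worse" was judged. One crude, reference-free yardstick (an assumption on my part, not what the Clustal Omega benchmarks use) is to export each aligner's result as aligned FASTA and give it a sum-of-pairs score: for every pair of rows, +1 per identical base, -1 per mismatch, and -2 per base aligned to a gap. The function names and scoring values below are hypothetical choices for illustration. Scores are only comparable between alignments of the same input sequences.

```python
from itertools import combinations


def read_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}


def sum_of_pairs(aligned, match=1, mismatch=-1, gap=-2):
    """Score aligned rows ('-' = gap) with a linear gap penalty.

    Every pair of rows is compared column by column; columns where both
    rows have a gap score 0. The default scores are illustrative only.
    """
    rows = list(aligned.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    score = 0
    for a, b in combinations(rows, 2):
        for x, y in zip(a, b):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score


# Toy example: three short, already-aligned sequences.
toy = ">a\nACGT-A\n>b\nACGTTA\n>c\nAC-TTA\n"
print(sum_of_pairs(read_fasta(toy)))  # → 6
```

To compare tools, run sum_of_pairs(read_fasta(open(path).read())) on each aligner's FASTA output for the same input. A more trustworthy evaluation would compare each result against a known reference alignment rather than rely on a fixed scoring scheme.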


Here are some example sequences that I tried aligning:

>LS_comp37_c0_seq1_len=2308_path=[1:0-2307]
AAAAAAAAAAAAAAAAAAAAAAATGTTAAAAATCAAAAGACTCTTAACTTACAAGATATG
ATAGTATCATATCATCCATAAATACTGTATGATATCATAATAAGATGCCACATATAACAA
AAATATATAATGTTGCCAAAGACAGAAAGTCACACTCGATAAATTCAAATACATAGTTAA
ATATGCAGAAGGCTCTTGAACTTGAACTTGAACTTGAACTTAATTATTACATGCATAAAA
ATTGAATAGTTCATCCATAAACGACGACAGACTCTCCCTTGTGCTCACGTGGTGGCAAAT
GCCACGTTGTTTGAGCCGATTCTCCTTATTTCATTTTGGTGCTTGAAGAATTACGTACGA
ACTATGCCTTGGAAAAACTTTGGTAGAATTTAAGTAAGTAAACGATGGCTCATCATCATC
TCTCATGCAAAGGGCTCGAATAAGGTTCTGGTCATTTGTTGATGTATTATAGGATGTTGT
GTGCCGTGCCCATCTCCATTATGGTACATCAACTGCGCCATCCTTCCTAAATCAACTGCA
CATCCTATAAAATCTTTGCCAAATGGAGAATCCTTCGACACCCTCTCCGCATTCATCTTC
TTCCACACCTCCGCTATCAGCCATTTCACGTGCTTCCGCGCCTCCGCCTCCGATGCATTG
TAGTCACTCATGTAGCACTGAAGTGATTTCGGCACATCCCCTCTGCTCACCTCTTCCACC
GAGGTTCCCAAATCATCAGCAAGCCGCAGAACGAAGGATGACCAGCGAACTAAATCGTGG
TATTTGTACAAACTGTCGACGGTCTCTTTTGTGAACGAATCTGTTACTCGGAAGAATATG
TGCGTTAACATACAGGGCCCACTTATCGACTGCCATGAGTTCTCCAAATACTCTTCCAAA
CTTGGTTTGTGTCCGCCGTAGAACCACCGTGCCTCTACCATATACTTATCCGCCAAATCC
ACCCACGATTGCCGCAGGTAGGGTATAACGTTGACGCCTTTCTCCTTCATAACATCGTAC
GATGTATCATCGACGAAGTTGTTGAGTGCAAGAAAGCACAGTTGCATGTAATCGGGAAGT
TGGTCGATTGAGTCTATATCCCATCTCCGAATGAGTTCTGTGAATTGTTCGAGTTCTTCT
AAGGTGCCGTAGACATCATAAATATCATCGATCACCGTAATCAGAGCGTTGACTTTGCCC
ATCATTATCCTTGCACTTGCATGCTGACGTGGCTCGATGATCCCAGTATTCCAAAAGTAG
CATTCCACCAGTCTATCCCTTGCGAAGGGCAGCTTCTCAACAAACCCAGTATTTCTCCAC
CACCGGAAGGATTCTTTGAGTTCTTGTTGAAAATGAGCTTGAACAATATTTAAGTCGAGT
ATGGCAAGCTCCAACACTACTGGATTCATGTCGGGCCTCTTCCTATACCATTCGATCCAC
ACAGGTGCATTTGGCCTTTTAATCCTCCAATGAAGTGGGATTTCTAAAGAATATGCGATT
CTTGTTAAAAGGTCGCCATCAACACCACCCTCGTTCACTCTTCCCTCCAAAAATTTGGTG
GCGAATTCCCTCGCTGACTCGAGCGTGGTTTCGCCTTCCGTCAACAGAAAGGAAGCTTCA
TACAGTTGCAACAATCCTCTGGTGTCGTCGCTAAGGCTTTCTTTGAACTCACCCTCCTCG
TTCTTGAAACTGTCGAATACCTCTTGTGCGACTTGAAAACCATGTTCTCTGAGGAGCCTA
AATGCAAGAGATGTGGAGTAGAGATCCCTTTCTTCTTTTGGAAAAGGGTTCTTGTAATAG
TGATGGTCGAGATATATAGAGGACAAGATTTCTTTGAACTCATTCTGGAAATGATCGGAC
AGCCCCATCCTCTGCAAGTCATCAATCAATTCAAGCTGTCGAATTTGATCCGTTTCTTTC
TCCAATTCCATCTTCACCAAAGTGACCAGCTCAGAAGCCCTAATCGCGTGTTTGTACTCC
TTATAATCACTGTGGAGGGATTGGATGAATTCGACATCCCAACGAGAAGGGTTGTAGTTT
CCGGATCGTCTTTCAGTAGTGAGTTGCGAGGAGGAGCAAGACACACGGAGGCGAGATCGA
CTAGTACCCTTAGTGTTAGAGAGCAATTTAGGAGAAGAATTCAAGTGTGAGGGTTGAAGA
TAGGTCGTTAGCTTGCTAGGAATTGGCATTTGCATTGCACCACTAAACACTGTGAGAGCC
ATGATTAATTAATCTTTGCTTTCTCCTTTTCTTCCACTCTCTTTCTATGTTTTTCTAGGA
CGTACTGGTTTCTAAAATTTGAGATGCA
>LS_spe_comp37_c0_seq1
AAAAAAAAAAAAAAAAAAAAAAATGTTAAAAATCGACTCTTAACTTACAAGATATGATAG
TATCATATCATCCATAAATACTGTATGATATCATAATAAGATGCCACATATACAAAAATA
TATAATGTTGCCAAAGACAGAAAGTCACACTCGATAAATTCATACATAGTTATATGCAGA
AGGCTCTTGAACTTGAACTTGAACTTGAACTTAATTATTACATGCAATAATTGAATAGTT
CATCCATAAACGACGACAGACTCTCCCTTGTGCTCACGTGGTGGCAAATGCCACGTTGTT
TGAGCCGATTCTCCTTATTTCATTTTGGTGCTTGAAGAATTACGTACGAACTATGCCTTG
GAAAAACTTTGGTAGAATTTAAGTAAGTAAACGATGGCTCATCATCATCTCTCATGCAAA
GGGCTCGAATAAGGTTCTGGTCATTTGTTGATGTATTATAGGATGTTGTGTGCCGTGCCC
ATCTCCATTATGGTACATCAACTGCGCCATCCTTCCTAAATCAACTGCACATCCTATAAA
ATCTTTGCCAAATGGAGAATCCTTCGACACCCTCTCCGCATTCATCTTCTTCCACACCTC
CGCTATCAGCCATTTCACGTGCTTCCGCGCCTCCGCCTCCGATGCATTGTAGTCACTCAT
GTAGCACTGAAGTGATTTCGGCACATCCTCTGCTCACCTCTTCCACCGAGGTTCCCATCA
TCAGCAAGCCGCAGAACGAAGGATGACCAGCGAACTAAATCGTGGTATTTGTACAACTGT
CGACGGTCTCCTTTGTGAACGAATCTGTTACTCGGAAGAATATGTGCGTTAACATACAGG
GCCCACTTATCGACTGCCATGAGTTCTCCAAATACTCTTCCAAACTTGGCGTGTCCGCCG
TAGAACCACCGTGCCTCTACCATATACTTATCCGCCAAATCCACCCACGATTGCCGCAGG
TAGGGTATAACGTTGACGCCTTTCTCCTTCATAACATCGTACGATGTATCATCGACGAAG
TTGTTGAGTGCAAGAAAGCACAGTTGCATGTAATCGGGAAGTTGGTCGATTGAGTCTATA
TCCCATCTCCGAATGAGTTCTGTGAATTGTTCGAGTTCTTCTAAGGTGCCGTAGACATCA
TAAATATCATCGATCACCGTAATCAGAGCGTTGACTTTGCCCATCATTATCCTTGCACTT
GCATGCTGACGTGGCTCGATGATCCCAGTATTCCAAAAGTAGCATTCCACCAGTCTATCC
CTTGCGAAGGGCAGCTTCTCAACAAACCCAGTATTTCTCCACCACCGGAAGGATTCTTTG
AGTTCTTGTTGAAAATGAGCTTGAACAATATTTAAGTCGAGTATGGCAAGCTCCAACACT
ACTGGATTCATGTCGGCCTCTTCCTATACCATTCGATCCACACAGGTGCATTTGGCCTTT
TAATCCTCCAATGAAGTGGGATTTCTAAAGAATATGCGATTCTTGTTAAAAGGTCGCCAT
CAACACCACCCTCGTTCACTCTTCCCTCCAAAAATTTGGTGGCGAATTCCCTCGCTGACT
CGAGCGTGGTTTCGCCTTCTGTCAACAGAAAGGAAGCTTCATACAGTTGCAACAATCCTC
TGGTGTCGTCGCTAAGGCTCTTTGAACTCACTCCTCGTTCTTGAAACTGTCGAATACCTC
TTGTGCGACTTGAACCATGTTCTCTGAGGAGCCTAAATGCAAGAGATGTGGAGTAGAGAT
CCCTCTTCTTTTCAAAGGGTTCTTGTAATAGTGATGGTCGAGATATATAGAGGACAAGAT
TTCTTTGAACTCATTCTGATGATCGGACAGCCCCATCCTCTGCAAGTCATCGATCAATCA
AGCTGTCGAATTTGATCCGTTTCTTTCTCCAATTCCATCACACCAAAGTGACTAGCTCAG
AAGCCCTAATCGCGTGTTTGTACTCCTTATAATCACTGTGGAGGGATTGGATGAATTCGA
CATCCCAACGAGAAATGTAGTTTCCGGATCGTCTTTCAGTAGTGAGTTGCGAGGAGGAGC
AAGACACACGGAGGCGAGATCGACTAGTACCCTTAGTGTTAGAGAGCAATTTAGGAGAAG
AATTCAAGTGTGAGGGTTGAAGATATGTCGTTAGCTTGCCGTACTGGTTTCTAAAATTTG
AGATGCA
>LS_wat_comp37_c0_seq1
AAAAAAAAAAAAAAAAAAAAAAATGTTAAAAATCGACTCTTAACTTACAAGATATGATAG
TATCATATCATCCATAAATACTGTATGATATCATAATAAGATGCCACATATAACAAAAAT
ATATAATGTTGCCAAAGACAGAAAGTCACACTCGATAAATTCAAATACATAGTTAAATAT
GCAGAAGGCTCTTGAACTTGAACTTGAACTTGAACTTAATTATTACATGCATAAAAATTG
AATAGTTCATCCATAAACGACGACAGACTCTCCCTTGTGCTCACGTGGTGGCAAATGCCA
CGTTGTTTGAGCCGATTCTCCTTATTTCATTTTGGTGCTTGAAGAATTACGTACGAACTA
TGCCTTGGAAAAACTTTGGTAGAATTAAGTAAGTAAACGATGGCTCATCATCATCTCTCA
TGCAAAGGGCTCGAATAAGGCTGGTCATTTGTTGATGTATTATAGGATGTTGTGTGCCAT
GCCCATCTCCATTATGGTACATCAACTGCGCCATCCTTCCCATATCAACTGCACATCCTA
TAAAATCTTTGCCAAATGGAGAATCCTTCGACACCCTCTCTGCATTCATCTTCTTCCACA
CCTCTGCTATCAGCCATTTCACGTGCTTTCGCGCCTCCGCCTCCGATGCATTGTAGTCAC
TCATGTAGCACTGAAGTGATTTCGGCACGTCTCTGCTCACCTCTTCCACCGAGGTTCCCA
AATCATCAGCAAGCCGCAGAACGAAGGATGACCAGCGAACTAAATCGTGGTATTTGTACA
AACTGTCGACCGTCTCCTTTGTGAACGAATCTGTTACTCGGAAGAATATGTGCGTTAACA
TACAGGGCCCACTTATCGACTGCCATGAGTTCTCCAAATACTCTTCCAAACTTGGTTTGT
GTCCGCCGTAGAACCACCGTGCCTCTACCATATACTTATCCGCCAAATCCACCCACGATT
GCCGCAGGTAGGGTATAACGTTGACGCCTTTCTCCTTCATAACATCGTACGATGTATCAT
CGACGAAGTTGTTGAGTGCGAGAAAGCACAGTTGCATGTAATCGGGAAGTTGGTCGATTG
AGTCTATATCCCATCTCCGAATGAGGTCTGTGAAGTGTTCGAGTTCTTCTAAGGTGCCGT
AGACATCATAAATATCATCGATCACCGTAATCAGAGCGTTGACTTTGCCCATCATTATCC
TTGCACTTGCATGCTGACGTGGCTCGATGATCCCAGTATTCCAAAAGTAGCATTCCACCA
GTCTATCCCTTGCGAAGGGCAGCTTCTCAACAAACCCAGTATTTCTCCACCACCGGAAGG
ATTCTTTGAGTTCTTGTTGAAAATGAGCTTGAACAATATTTAAGTCGAGTATGGCAAGGT
CCAACACTACTGGATTCATGTCGGGCCTCTTCCTATACGAATCGATCCACACAGGTGCAT
TTGGCCTTTTAATCCTCCAATGAAGTGGGATTTCTAAAGAATATGCGATTCTTGTTAAAA
GGTCGCCATCAACACCACCCTCGTTCACTCTTCCCTCCAAAAATTTGGTGGCGAATCTCG
CTGACTCGAGCGTCGTTTCGCCTTCCGTCAACAGAAAGAAGCTTCATACAGTTGCAACAA
TCCTCTGGTGTCGTCGCTAAGGCTTTCTTTGAACTCACCCTCCTCGTTCTTGAAACTGTC
GAATACCTCTTGTGCGACTTGAAAACCATGTTCTCTGAAGGAGCCTAAATGCAAGAGATG
TGGAGTAGAGATCTTTCTTCTTTTGGAAAAGGGTTCTTGTAATAGTGATGGTCGAGATAT
ATAGAGGACAAGATTTCTTTGAACTCATTCTGGAAATGATCGGACAGCCCCATCCTCTGC
AAGTCATCAATCAATTCAAGCTGTCGAATTGATCCGTTTCTTTCTCCAATTCCATCTTCA
CCAAAGTGACCAGCTCAGAAGCCCTAATCGCGTGCTTGTACTCCTTATAATCACTGTGGA
GGGATTGGATGAATTCGACATCCCAACGAGAAGGGTTGTAGTTTCCGGATCGTCTTTCCG
GTAGTGAGTTGCGAGGAGGAGCAAGACACACGGAGGCGAGATCGACTAGTACCGTTAGTG
TTAGAGAGCAATTTAGGAGAAGAATTCAAGTGTGAGGGTTGAAGATAGGTCGTTAGCTTG
CTAGGAATTGGCATTTGCATTGCACCACTAAACACTGTGAGAGCCATGATTAATTAATCT
CTGCTTTCT
>LS_pep_comp37_c0_seq1
AAAAAAAAAAAAAAAAAAAAAAATGTTAAAAATCAAAAGACTCTTAACTTACAAGATATG
AAGTATCATATCATCCATAAATACTGTATGATATCATAATAAGATGCCACATATAACAAA
AATATATAATGTTGCCAAAGACAGAAAGTCACACTCGATAAATTCAAGTACATAGTTAAA
TATGCAGAAGGCTCTTGAACTTGAACTTGAACTTGAACTTAATTATTACATGCATAAAAA
TTGAATAGTTCATCCATAAACGACGACAGACTCTCCCTTGTGCTCACGTGGTGGCAAATG
CCACGTTGTTTGAGCCGATTCTCCTTATTTCATTTTGGTGCTTGAAGAATTACGTACGAA
CTATGCCTTGGAAAAACTTTGGTAGAATTTAAGTAAGTAAACGATGGCTCATCATCATCA
CCTCTCATGCAAAGGGCTCGAATAAGGTTCTGGTCATTTGTTGATGTATTATAGGATGTT
GTGTGCCGTGCCCATCTCCATTATGGTACATCAACTGCGCCATCCTTCCCATATCAACTG
CACATCCTATAAAATCTTTGCCAAATGGAGAATCCTTCGACACCCTCTCTGCATTCATCT
TCTTCCACACCTCTGCTATCAGCCATTTCACGTGCTTTCGCGCCTCCGCCTCCGACGCAT
TGTAGTCACTCATGTAGCACTGAAGTGATTTCGGCACATCGCCTCTGCTCACCTCTTCCA
CCGAGGTTCCCATCATCAGCAAGCCGCAGAACGAAGGATGACCAGCGAACTATCGTGGTA
TTTGTACAAACTGTCGACGGTCTCCTTTGTGAACGAATCTGTTACTCGGAAGAATATGTG
CGTTAACATACAGGGCCCACTTATCGACTGCCATGAGTTCTCCAAATACTCTTCCAAACT
TGGTTTGTGTCCGCCGTAGAACCACCGTGCCTCTACCATATACTTATCCGCCAAATCCAC
CCACGATTGCCGCAGGTAGGGTATAACGTTGACGCCTTTCTCCTTCATAACATCGTACGA
TGTATCATCGACGAAGTTGTTGAGTGCAAGAAAGCACAGTTGCATGTAATCGGGAAGTTG
GTCGATTGAGTCTATATCCCATCTCCGAATGAGGTCTGTGAATGTTCGAGTTCTTCTAAG
GTGCCGTAGACATCATAAATATCATCGATCACCGTAATCAGAGCGTTGACTTTGCCCATC
ATTATCCTTGCACTTGCATGCTGACGTGGCTCGATGATCCCAGTATTCCAAAAGTAGCAT
TCCACCAGTCTATCCCTTGCGAAGGGCAGCTTCTCAACAAACCCAGTATTTCTCCACCAC
CGGAAGGATTCTTTGAGTTCTTGTTGAAAATGAGCTTGAACATATTTAAGTCGAGTATGG
CAAGCTCCAACACTACTGGATTCATGTCGGGCCTCTTCCTATACGAATCGATCCACACAG
GTGCATTTGGCCTTTTAATCCTCCAATGAAGTGGGATTTCTAAAGAATATGCGATTCTTG
TTAAAAGGTCGCCATCAACACCACCCTCGTTCACTCTTCCCTCCAAAAATTTGGTGGCGA
ATCTCGCTGACTCGAGCGTCGTTTCGCCTTCCGTCAACAGAAAGAAGCTTCATACAGTTG
CAACAATCCTCTGGTGTCGTCGCTAAGGCTTTCTATGAACTCACCCTCCTCGTTCTTGAA
ACTGTCGAATAGCTCTTGTGCGACTTGAAAACCATGTTCTCTGAGGAGCCTAAATGCAAG
AGATGTGGAGTAGAGATCACTTCTTTTGGAAAAGGGTTCTTGTAATAGTGATGGTCGAGA
TATATAGAGGACAAGATTTCTTTGAACTCATTCTGGAAATGATCGGACAGCCCCATCCTC
TGCAAGTCATCAATCAATTCAAGCTGTCGAATTTGATCCGTTTCTTTCTCCAATTCCATC
TTCACCAAAGTGACTAGCTCAGAAGCCCTAATCGCGTGTTTGTACTCCTTATAATCACTG
TGGAGGGATTGGATGAATTCGACATCCCAACGAGAAGGGTTGTAGTTTCCGGATCGTCTT
TCAGGTAGTGAGTTGCGAGGAGGAGCAAGACACACGGAGGCGAGATCGACTAGTACCCTT
AGTGTTAGAGAGCAATTTAGGAGGAAGAATTCAAGTGTGAGGGTTGAAGATAGGTCGTTA
GCTTGCTAGGAATTGGCATTTGCATTGCACCACTAAACACTGTGAGAGCCATGATTAATT
AATCTGCTTT 