Paralelização da Geração de Constraints Lists em Alinhamentos Múltiplos de Sequências Genéticas
Resumo
O Alinhamento Múltiplo de Sequências genéticas é uma etapa essencial na resolução de vários problemas da área de Bioinformática. Devido à sua complexidade exponencial, heurísticas são utilizadas. A que obtém os melhores resultados, mas possui o maior custo computacional, é o Alinhamento Baseado em Consistência. Este trabalho apresenta a paralelização da geração da consistência, fase fundamental para esta heurística, determinando qual a melhor política de escalonamento OpenMP a ser utilizada. Os resultados obtidos mostram o excelente desempenho da paralelização proposta, reduzindo o tempo de execução da geração da consistência, para alinhamentos de tamanhos cientificamente relevantes, de mais de 4 horas para menos 9 minutos.Referências
Blazewicz, J., Frohmberg, W., Kierzynka, M., and Wojciechowski, P. (2013). G-MSA A GPU-based, fast and accurate algorithm for multiple sequence alignment. Journal of Parallel and Distributed Computing, 73(1):32–41.
Do, C. B., Mahabhashyam, M. S. P., Brudno, M., and Batzoglou, S. (2005). Prob-Cons: Probabilistic consistency-based multiple sequence alignment. Genome research, 15(2):330–40.
Edgar, R. C. and Batzoglou, S. (2006). Multiple sequence alignment. Current Opinion in Structural Biology, 16(3):368–373.
Feng, D.-F. and Doolittle, R. F. (1987). Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees. Journal of Molecular Evolution, 25:351–360.
Finn, R. D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E. L., Tate, J., and Punta, M. (2014). Pfam: the protein families database. Nucleic acids research, 42.
Gotoh, O. (1990). Consistency of optimal sequence alignments. Bulletin of Mathematical Biology, 52(4):509–525.
Gudyś, A. and Deorowicz, S. (2017). QuickProbs 2: Towards rapid construction of high- quality alignments of large protein families. Scientific Reports, 7.
Hogeweg, P. and Hesper, B. (1984). The alignment of sets of sequences and the construction of phyletic trees: An integrated method. Journal of molecular evolution, 20:175–86.
Hung, L.-W., Wang, I. X., Nikaido, K., Liu, P.-Q., Ames, G. F.-L., and Kim, S.-H. (1998). Crystal structure of the ATP-binding subunit of an ABC transporter. Nature, 396(6712):703–707.
João Jr, M., Sena, A. C., and Rebello, V. E. F. (2023). On closing the inopportune gap with consistency transformation and iterative refinement. PLoS ONE, 18(7):1–24.
Linaro Limited (2025). Linaro forge user guide. Version 25.0.1.
Mirarab, S. and Warnow, T. (2011). FastSP: linear time calculation of alignment accuracy. Bioinformatics, 27(23):3250–3258.
Notredame, C., Higgins, D. G., and Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1):205 – 217.
The UniProt Consortium (2024). UniProt: The Universal Protein Knowledgebase in 2025. Nucleic Acids Research, 53(D1):D609–D617.
Thompson, J. D., Linard, B., Lecompte, O., and Poch, O. (2011). A comprehensive benchmark study of multiple sequence alignment methods: Current challenges and future perspectives. PLoS ONE, 6(3).
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269.
Wang, L. and Jiang, T. (1994). On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337–348.
Zola, J., Yang, X., Rospondek, S., and Aluru, S. (2007). Parallel T-Coffee: A parallel multiple sequence aligner. ISCA International Conference on Parallel and Distributed Computing and Systems (PDCS), pages 248–253.
Do, C. B., Mahabhashyam, M. S. P., Brudno, M., and Batzoglou, S. (2005). Prob-Cons: Probabilistic consistency-based multiple sequence alignment. Genome research, 15(2):330–40.
Edgar, R. C. and Batzoglou, S. (2006). Multiple sequence alignment. Current Opinion in Structural Biology, 16(3):368–373.
Feng, D.-F. and Doolittle, R. F. (1987). Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees. Journal of Molecular Evolution, 25:351–360.
Finn, R. D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E. L., Tate, J., and Punta, M. (2014). Pfam: the protein families database. Nucleic acids research, 42.
Gotoh, O. (1990). Consistency of optimal sequence alignments. Bulletin of Mathematical Biology, 52(4):509–525.
Gudyś, A. and Deorowicz, S. (2017). QuickProbs 2: Towards rapid construction of high- quality alignments of large protein families. Scientific Reports, 7.
Hogeweg, P. and Hesper, B. (1984). The alignment of sets of sequences and the construction of phyletic trees: An integrated method. Journal of molecular evolution, 20:175–86.
Hung, L.-W., Wang, I. X., Nikaido, K., Liu, P.-Q., Ames, G. F.-L., and Kim, S.-H. (1998). Crystal structure of the ATP-binding subunit of an ABC transporter. Nature, 396(6712):703–707.
João Jr, M., Sena, A. C., and Rebello, V. E. F. (2023). On closing the inopportune gap with consistency transformation and iterative refinement. PLoS ONE, 18(7):1–24.
Linaro Limited (2025). Linaro forge user guide. Version 25.0.1.
Mirarab, S. and Warnow, T. (2011). FastSP: linear time calculation of alignment accuracy. Bioinformatics, 27(23):3250–3258.
Notredame, C., Higgins, D. G., and Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1):205 – 217.
The UniProt Consortium (2024). UniProt: The Universal Protein Knowledgebase in 2025. Nucleic Acids Research, 53(D1):D609–D617.
Thompson, J. D., Linard, B., Lecompte, O., and Poch, O. (2011). A comprehensive benchmark study of multiple sequence alignment methods: Current challenges and future perspectives. PLoS ONE, 6(3).
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269.
Wang, L. and Jiang, T. (1994). On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337–348.
Zola, J., Yang, X., Rospondek, S., and Aluru, S. (2007). Parallel T-Coffee: A parallel multiple sequence aligner. ISCA International Conference on Parallel and Distributed Computing and Systems (PDCS), pages 248–253.
Publicado
28/10/2025
Como Citar
JOÃO JR., Mario; SENA, Alexandre C.; REBELLO, Vinod E. F..
Paralelização da Geração de Constraints Lists em Alinhamentos Múltiplos de Sequências Genéticas. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 26. , 2025, Bonito/MS.
Anais [...].
Porto Alegre: Sociedade Brasileira de Computação,
2025
.
p. 145-156.
DOI: https://doi.org/10.5753/sscad.2025.15870.
