Paralelização da Geração de Constraints Lists em Alinhamentos Múltiplos de Sequências Genéticas

Mario João Jr.; Alexandre C. Sena; Vinod E. F. Rebello

doi:10.5753/sscad.2025.15870

Mario João Jr. UERJ / UFF
Alexandre C. Sena UERJ
Vinod E. F. Rebello UFF

DOI: https://doi.org/10.5753/sscad.2025.15870

Resumo

O Alinhamento Múltiplo de Sequências genéticas é uma etapa essencial na resolução de vários problemas da área de Bioinformática. Devido à sua complexidade exponencial, heurísticas são utilizadas. A que obtém os melhores resultados, mas possui o maior custo computacional, é o Alinhamento Baseado em Consistência. Este trabalho apresenta a paralelização da geração da consistência, fase fundamental para esta heurística, determinando qual a melhor política de escalonamento OpenMP a ser utilizada. Os resultados obtidos mostram o excelente desempenho da paralelização proposta, reduzindo o tempo de execução da geração da consistência, para alinhamentos de tamanhos cientificamente relevantes, de mais de 4 horas para menos 9 minutos.

Referências

Blazewicz, J., Frohmberg, W., Kierzynka, M., and Wojciechowski, P. (2013). G-MSA A GPU-based, fast and accurate algorithm for multiple sequence alignment. Journal of Parallel and Distributed Computing, 73(1):32–41.

Do, C. B., Mahabhashyam, M. S. P., Brudno, M., and Batzoglou, S. (2005). Prob-Cons: Probabilistic consistency-based multiple sequence alignment. Genome research, 15(2):330–40.

Edgar, R. C. and Batzoglou, S. (2006). Multiple sequence alignment. Current Opinion in Structural Biology, 16(3):368–373.

Feng, D.-F. and Doolittle, R. F. (1987). Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees. Journal of Molecular Evolution, 25:351–360.

Finn, R. D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E. L., Tate, J., and Punta, M. (2014). Pfam: the protein families database. Nucleic acids research, 42.

Gotoh, O. (1990). Consistency of optimal sequence alignments. Bulletin of Mathematical Biology, 52(4):509–525.

Gudyś, A. and Deorowicz, S. (2017). QuickProbs 2: Towards rapid construction of high- quality alignments of large protein families. Scientific Reports, 7.

Hogeweg, P. and Hesper, B. (1984). The alignment of sets of sequences and the construction of phyletic trees: An integrated method. Journal of molecular evolution, 20:175–86.

Hung, L.-W., Wang, I. X., Nikaido, K., Liu, P.-Q., Ames, G. F.-L., and Kim, S.-H. (1998). Crystal structure of the ATP-binding subunit of an ABC transporter. Nature, 396(6712):703–707.

João Jr, M., Sena, A. C., and Rebello, V. E. F. (2023). On closing the inopportune gap with consistency transformation and iterative refinement. PLoS ONE, 18(7):1–24.

Linaro Limited (2025). Linaro forge user guide. Version 25.0.1.

Mirarab, S. and Warnow, T. (2011). FastSP: linear time calculation of alignment accuracy. Bioinformatics, 27(23):3250–3258.

Notredame, C., Higgins, D. G., and Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1):205 – 217.

The UniProt Consortium (2024). UniProt: The Universal Protein Knowledgebase in 2025. Nucleic Acids Research, 53(D1):D609–D617.

Thompson, J. D., Linard, B., Lecompte, O., and Poch, O. (2011). A comprehensive benchmark study of multiple sequence alignment methods: Current challenges and future perspectives. PLoS ONE, 6(3).

Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269.

Wang, L. and Jiang, T. (1994). On the complexity of multiple sequence alignment. Journal of Computational Biology, 1(4):337–348.

Zola, J., Yang, X., Rospondek, S., and Aluru, S. (2007). Parallel T-Coffee: A parallel multiple sequence aligner. ISCA International Conference on Parallel and Distributed Computing and Systems (PDCS), pages 248–253.