Opti-SW: An improved gene sequence alignment algorithm

Opti-SW: An improved gene sequence alignment algorithm

Leixiao Li Jing Gao Yanfeng Liu 

College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China

College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China

Inner Mongolia Autonomous Region Engineering & Technology Research Center of Big Data Based Software Service, Hohhot 010080, China

Corresponding Author Email: 
gaojing@imau.edu.cn
Page: 
73-85
|
DOI: 
https://doi.org/10.3166/ISI.23.6.73-85
Received: 
| |
Accepted: 
| | Citation

OPEN ACCESS

Abstract: 

This paper aims to improve the speed and complexity of Smith-Waterman (SW) algorithm. For this purpose, the SW algorithm was improved by reducing the complexity and task load of the computation of the scoring matrix without sacrificing the alignment accuracy. Then, the optimized algorithm, denoted as the Opti-SW, was verified through experiment. The results show that the Opti-SW boasts low time complexity, fast computing speed and light computing load. The research findings shed new light on the database search for gene sequences.

Keywords: 

gene sequence alignment, smith-waterman (SW) algorithm, optimization, opti-SW

1. Introduction
2. SW algorithm
3. Improvement of the SW algorithm
4. Experiment and results analysis
5. Conclusions
Acknowledgement

The work is funded in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61462070, the Doctoral research fund project of Inner Mongolia Agricultural University under Grant No. BJ09-44 and the Inner Mongolia Autonomous Region Key Laboratory of big data research and application for agriculture and animal husbandry.

  References

Balaji P., Feng W., Archuleta J., Lin H. (2008). Semantics-based distributed I/O for mpiBLAST. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 293-294. https://doi.org/10.1145/1345206.1345262

Benkrid K., Liu Y., Benkrid A. S. (2009). A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment. IEEE Transactions on Very Large Scale Integration Systems, Vol. 17, No. 4, pp. 561-570. http://dx.doi.org/10.1109/TVLSI.2008.2005314

Cao L., Xu Y., Deng C. (2015). Algorithm for DNA double sequence alignment problem. Application of Computer System, Vol. 24, No. 9, pp. 112-117.

Chen M. (2016). Bioinformatics. Beijing: The Science Publishing Company, pp. 13-18.

Daily J. (2016). Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinformatics, Vol. 17, pp. 124-129. https://doi.org/10.1186/s12859-016-0930-z

Farrar M. (2007). Striped Smith–Waterman speeds database searches six times over other SIMD implementations. Bioinformatics, Vol. 23, pp. 156-161. https://doi.org/10.1093/bioinformatics/btl582

Feng B. (2015). Distributed parallel optimization of needleman-wunsch algorithm for double sequence alignment. Hohhot: Inner Mongolia Agricultural University, pp. 38-41.

Feng B., Gao J. (2016). Distributed parallel Needleman-Wunsch algorithm on heterogeneous cluster system. International Conference on Network and Information Systems for Computers.IEEE, pp. 358-361. https://doi.org/10.1109/ICNISC.2015.145

Gao J., Jiao Y., Zhang W. G. (2014). Review of high throughput sequencing sequence alignment. Life Science Research, Vol. 18, No. 5, pp. 458-464.

Jain C., Kumar S. (2014). Fine-grained GPU parallelization of pairwise local sequence alignment. International Conference on High Performance Computing. IEEE Computer Society, pp. 1-10. http://dx.doi.org/10.1109/HiPC.2014.7116912

Li D. W. (2011). Research of Parallel algorithm for sequence alignment based on dynamic programming. Journal of Jinggangshan University (Natural Science Edition), Vol. 32, No. 3, pp. 80-84.

Liu Y., Wang X., Li J., Mao Y., Zhao D. (2012). Research progress of local sequence alignment algorithm and parallel acceleration. Military Medicine, Vol. 36, No. 7, pp. 556-560. Http://dx.chinadoi.cn/10.3969/j.issn.1674-9960.2012.07.018

Liu Y., Wirawan A., Schmidt B. (2013). CUDASW++3.0: Accelerating Smith–Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformatics, Vol. 14, pp. 117. https://doi.org/10.1186/1471-2105-14-117

Wang C., Li X., Chen P., Wang A., Zhou X., Yu H. (2015). Heterogeneouscloud framework for big data genome sequencing. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 12, pp. 166-178. https://doi.org/10.1109/TCBB.2014.2351800

Wei G., Ma C., Pei S., Wu B. (2009). The accelerating implementation of BLAST with stream processor. IEEE, International Conference on Computer-Aided Industrial Design & Conceptual Design, Caid & Cd. IEEE, pp. 2245-2250. http://dx.doi.org/10.1109/CAIDCD.2009.5375228

Xu B., Li C., Zhuang H., Wang J., Wang Q., Zhou X. (2017). Efficient distributed Smith-Waterman algorithm based on apache spark. IEEE International Conference on Cloud Computing, pp. 608-615. http://dx.doi.org/10.1109/CLOUD.2017.83

Xue Q. F. (2015). Research and application of parallel algorithm for DNA sequence alignment. Shanghai: Shanghai University, pp. 7-11.

Zhang D. Y. (2015). Bioinformatics. Beijing: The Science Publishing Company, pp. 28-43.

Zhao M., Lee W. P., Garrison E. P., Marth G. T. (2013). SSW library: An SIMD Smith-Waterman  C/C++ library for use in genomic applications. Plo S one, Vol. 8, pp. 2138-2142. https://doi.org/10.1371/journal.pone.0082138