Research Article
BibTex RIS Cite

A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms

Year 2018, Volume: 23 Issue: 3, 91 - 102, 26.10.2018
https://doi.org/10.17482/uumfd.425094

Abstract

 In
this study, we present a new web interface for major bioinformatics algorithms and
introduce a novel approximate string matching algorithm. Our web interface executes
major algorithms on the field for the use of computational biologists, students
or any other interested researchers. In the web interface, algorithms come
under three sections: Sequence alignment, pattern matching and motif finding. In
each section, we introduce algorithms in order to find best fitting one for
specific dataset and problem. The interface introduces execution time, memory
usage and context specific results of algorithms such as alignment score. The
interface utilizes emerging open source languages and tools. In order to
develop light and user-friendly interface, all parts of the interface coded
with Python language. On the other hand, Django is used for web interface. Second
contribution of the study is novel A-BOM algorithm, which is designed for
approximate pattern matching problem. The algorithm is approximate matching variation
of Backward Oracle Matching. We compare our algorithm with popular approximate
string matching algorithms. Results denote that A-BOM introduces %30 to %80 short
runtime improvement when compared to current approximate pattern matching algorithms
on long patterns.

References

  • Alluzen, C., Crochemore, M. and Raffinot, M. (1999) Factor Oracle: A New Structure for Pattern Matching, SOFSEM’99: Theory and Practice of Informatics, Lecture Notes in Computer Science, Berlin, 291-306. doi: 10.1007/3-540-47849-3_18
  • Bishop, C. M. (2006) Machine learning and pattern recognition. Information Science and Statistics. Springer, Heidelberg.
  • Boyer, R.S., Moore, J.S and Pratt, W.R. (1977) A Fast String Searching Algorithm, Journal of Molecular Biology, Communications of the ACM, New York, 762-772. doi: 10.1145/359842.359859
  • Burrows, W. and Wheeler, D. J. (1994) A block-sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, Digital Equipment Corporation, California.
  • D'haeseleer, P. (2006) How does DNA sequence motif discovery work?. Nature biotechnology, 24(8), 959-961
  • Durbin, R., Eddy, S. R., Krogh, A. and Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge.
  • Ji, H. and Shendure, J. (2008) Next-generation DNA sequencing, Nature biotechnology volume 26, Nature Publishing Group, London, 1135-1145. doi: 10.1038/nbt1486
  • Knuth, D.E., Morris, J.H and Pratt, W.R. (1977) Fast Pattern Matching in Strings, Journal of Molecular Biology, SIAM Journal on Computing, Philadelphia, 323-350. doi: 10.1137/0206024
  • Langmead, B., and Salzberg, S. L. (2012) Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), 357.
  • Navarro, R. and Raffinot, M. (2002) Flexible Pattern Matching in String, The press Syndicate of The University of Cambridge, Cambridge.
  • Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, Academic Press Incorporated, London, 443-453. doi: 10.1016/0022-2836(70)90057-4
  • Özcan, G. (2016) Detection of P53 Consensus Sequence: A Novel String Matching With Classes Algorithm, Uludag University Journal of The Faculty of Engineering 21 (2), Bursa, 269-282.
  • Özcan, G., and Ünsal, O. S. (2015). Fast bitwise pattern-matching algorithm for DNA sequences on modern hardware. Turkish Journal of Electrical Engineering & Computer Sciences, 23(5), 1405-1417.
  • Pevsner, J. (2015) Bioinformatics and functional genomics, John Wiley & Sons, UK
  • Smith, T.F. and Waterman, M.S. (1981) Identification of common molecular subsequences, Journal of Molecular Biology, Academic Press Incorporated, London, 40-48. doi: 10.1016/0022-2836(81)90087-5

BAŞLICA BİYOİNFORMATİK ALGORİTMALARI İÇİN WEB ARA YÜZÜ VE YENİ OTOMAT TABANLI YAKLAŞIK DESEN EŞLEŞTİRME YAKLAŞIMI

Year 2018, Volume: 23 Issue: 3, 91 - 102, 26.10.2018
https://doi.org/10.17482/uumfd.425094

Abstract

Bu çalışmada temel biyoinformatik algoritmaları
için yeni bir web ara yüzü ve özgün bir yaklaşık desen eşleştirme algoritması
sunmaktayız. Web ara yüzümüz biyologlar, öğrenciler ve ilgili araştırmacılar
için bu alandaki temel algoritmaları çalıştırmaktadır. Web ara yüzünde
algoritmalar üç bölüm altında toplanmaktadır: Dizilim hizalama, desen eşleştirme
ve motif bulma. Her bir bölümde, özgül veri seti ve problemlere en iyi uyan
algoritmanın bulunabilmesi için sonuçlarını karşılaştırabilecekleri algoritmalar
sunulmaktadır. Web ara yüzü çalışma süreleri, hafıza kullanımı ve hizalama
skoru gibi konuya özel sonuçları sunmaktadır. Ara yüz yeni geliştirilen açık
kaynak kodlu dilleri ve araçları kullanmaktadır. Hafif ve kullanıcı dostu bir
ara yüz olması amacıyla ara yüzün tüm kısımları Python dili ile kodlanmıştır.
Diğer yandan web ara yüzü için Django kullanılmıştır. Çalışmanın ikinci
katkısı, yaklaşık desen eşleştirme için tasarlanmış yeni A-BOM algoritmasıdır.
Bu algoritma Backwards Oracle Matching algoritmasının yaklaşık varyasyonudur.
Algoritmamızı popüler yaklaşık desen eşleştirme algoritmaları ile kıyasladık. Sonuçlar,
A-BOM algoritmasını güncel yaklaşık desen eşleştirme algoritmaları ile uzun
desenler üzerinde karşılaştırdığımızda, çalışma süresinde %30 ile %80 arasında kısalma
gelişimi olduğunu göstermektedir. 

References

  • Alluzen, C., Crochemore, M. and Raffinot, M. (1999) Factor Oracle: A New Structure for Pattern Matching, SOFSEM’99: Theory and Practice of Informatics, Lecture Notes in Computer Science, Berlin, 291-306. doi: 10.1007/3-540-47849-3_18
  • Bishop, C. M. (2006) Machine learning and pattern recognition. Information Science and Statistics. Springer, Heidelberg.
  • Boyer, R.S., Moore, J.S and Pratt, W.R. (1977) A Fast String Searching Algorithm, Journal of Molecular Biology, Communications of the ACM, New York, 762-772. doi: 10.1145/359842.359859
  • Burrows, W. and Wheeler, D. J. (1994) A block-sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, Digital Equipment Corporation, California.
  • D'haeseleer, P. (2006) How does DNA sequence motif discovery work?. Nature biotechnology, 24(8), 959-961
  • Durbin, R., Eddy, S. R., Krogh, A. and Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge.
  • Ji, H. and Shendure, J. (2008) Next-generation DNA sequencing, Nature biotechnology volume 26, Nature Publishing Group, London, 1135-1145. doi: 10.1038/nbt1486
  • Knuth, D.E., Morris, J.H and Pratt, W.R. (1977) Fast Pattern Matching in Strings, Journal of Molecular Biology, SIAM Journal on Computing, Philadelphia, 323-350. doi: 10.1137/0206024
  • Langmead, B., and Salzberg, S. L. (2012) Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), 357.
  • Navarro, R. and Raffinot, M. (2002) Flexible Pattern Matching in String, The press Syndicate of The University of Cambridge, Cambridge.
  • Needleman, S.B. and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology, Academic Press Incorporated, London, 443-453. doi: 10.1016/0022-2836(70)90057-4
  • Özcan, G. (2016) Detection of P53 Consensus Sequence: A Novel String Matching With Classes Algorithm, Uludag University Journal of The Faculty of Engineering 21 (2), Bursa, 269-282.
  • Özcan, G., and Ünsal, O. S. (2015). Fast bitwise pattern-matching algorithm for DNA sequences on modern hardware. Turkish Journal of Electrical Engineering & Computer Sciences, 23(5), 1405-1417.
  • Pevsner, J. (2015) Bioinformatics and functional genomics, John Wiley & Sons, UK
  • Smith, T.F. and Waterman, M.S. (1981) Identification of common molecular subsequences, Journal of Molecular Biology, Academic Press Incorporated, London, 40-48. doi: 10.1016/0022-2836(81)90087-5
There are 15 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Burak Koca This is me

Gıyasettin Özcan 0000-0002-1166-5919

Publication Date October 26, 2018
Submission Date May 22, 2018
Acceptance Date October 16, 2018
Published in Issue Year 2018 Volume: 23 Issue: 3

Cite

APA Koca, B., & Özcan, G. (2018). A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, 23(3), 91-102. https://doi.org/10.17482/uumfd.425094
AMA Koca B, Özcan G. A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. UUJFE. December 2018;23(3):91-102. doi:10.17482/uumfd.425094
Chicago Koca, Burak, and Gıyasettin Özcan. “A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23, no. 3 (December 2018): 91-102. https://doi.org/10.17482/uumfd.425094.
EndNote Koca B, Özcan G (December 1, 2018) A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23 3 91–102.
IEEE B. Koca and G. Özcan, “A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms”, UUJFE, vol. 23, no. 3, pp. 91–102, 2018, doi: 10.17482/uumfd.425094.
ISNAD Koca, Burak - Özcan, Gıyasettin. “A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23/3 (December 2018), 91-102. https://doi.org/10.17482/uumfd.425094.
JAMA Koca B, Özcan G. A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. UUJFE. 2018;23:91–102.
MLA Koca, Burak and Gıyasettin Özcan. “A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, vol. 23, no. 3, 2018, pp. 91-102, doi:10.17482/uumfd.425094.
Vancouver Koca B, Özcan G. A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. UUJFE. 2018;23(3):91-102.

Announcements:

30.03.2021-Beginning with our April 2021 (26/1) issue, in accordance with the new criteria of TR-Dizin, the Declaration of Conflict of Interest and the Declaration of Author Contribution forms fulfilled and signed by all authors are required as well as the Copyright form during the initial submission of the manuscript. Furthermore two new sections, i.e. ‘Conflict of Interest’ and ‘Author Contribution’, should be added to the manuscript. Links of those forms that should be submitted with the initial manuscript can be found in our 'Author Guidelines' and 'Submission Procedure' pages. The manuscript template is also updated. For articles reviewed and accepted for publication in our 2021 and ongoing issues and for articles currently under review process, those forms should also be fulfilled, signed and uploaded to the system by authors.