Abstract

Finding a common pattern among nucleic acid sequences in a given database is an important yet relatively difficult problem in computational biology. Such a pattern is useful for describing the characteristics of a family of nucleic acid sequences, and it can also be used to classify sequences and to examine the closeness of two organisms. In this paper, we present a global pattern extraction tool named GAPE, which can be applied in computational biology to describe a family of nucleic acid sequences with common features. The algorithm uses an optimized Genetic Algorithm (GA) framework to drive the evolution of desirable patterns. A specialized pair-wise alignment algorithm is also introduced to efficiently examine the closeness of a sequence to a regular expression pattern. Experimental results on real biological data indicate the effectiveness of the tool.
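To make the general idea concrete, the following is a minimal Python sketch of a genetic algorithm that evolves a short pattern scored against a set of sequences. It is not GAPE's implementation: the abstract does not specify the pattern encoding, the GA operators, or the alignment algorithm, so everything here is an assumption made for illustration. In particular, the pattern is a fixed-width string over `A`, `C`, `G`, `T` plus a single-character wildcard `.`, closeness is a simple ungapped best-window match rather than the specialized pair-wise alignment described above, and all function names and parameters (`match_score`, `fitness`, `evolve`, `width`, `pop_size`, and so on) are hypothetical.

```python
import random

ALPHABET = "ACGT"
WILDCARD = "."  # assumption: matches any single nucleotide, as in a regular expression


def match_score(pattern, sequence):
    """Best ungapped match of `pattern` over all windows of `sequence`, in [0, 1]."""
    width = len(pattern)
    if width == 0 or len(sequence) < width:
        return 0.0
    best = 0.0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # A wildcard always matches but earns less than a specific base,
        # so the search does not collapse to an all-wildcard pattern.
        score = sum(0.5 if p == WILDCARD else float(p == c)
                    for p, c in zip(pattern, window))
        best = max(best, score / width)
    return best


def fitness(pattern, sequences):
    """Mean closeness of the pattern to every sequence in the family."""
    return sum(match_score(pattern, s) for s in sequences) / len(sequences)


def mutate(pattern, rng, rate=0.1):
    """Replace each symbol with a random one with probability `rate`."""
    symbols = ALPHABET + WILDCARD
    return "".join(rng.choice(symbols) if rng.random() < rate else p
                   for p in pattern)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length patterns."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(sequences, width=6, pop_size=40, generations=60, seed=0):
    """Evolve a fixed-width pattern; the top quarter survives each generation."""
    rng = random.Random(seed)
    symbols = ALPHABET + WILDCARD
    population = ["".join(rng.choice(symbols) for _ in range(width))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda p: fitness(p, sequences), reverse=True)
        elite = population[:pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=lambda p: fitness(p, sequences))


if __name__ == "__main__":
    family = ["TTGACGTCAA", "CCGACGTCTT", "AAGACGACGG", "GACGTCATAT"]
    best = evolve(family)
    print(best, round(fitness(best, family), 3))
```

Keeping the elite unchanged between generations means the best fitness never decreases, which is a common but not universal GA design choice. A tool that handles real regular expressions with gaps would need a variable-length encoding and a proper alignment-based score in place of `match_score`.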
