Abstract
In-vitro methods for protein structure determination are time-consuming, costly, and failure-prone. Because of these costs, alternative computer-based predictive methods have emerged. Predicting a protein's 3-D structure from only its amino acid sequence, also known as ab initio protein structure prediction (PSP), is computationally demanding because the search space is astronomically large and the energy models are extremely complex. Predictive methods have achieved some success, but only for small proteins (around 100 amino acids); studying larger proteins therefore requires efficient algorithms, a reduced search space, and effective search-guidance heuristics. An on-lattice model can provide a better testbed for rapidly developing a new algorithm and measuring its performance, and hence we adopt this model for larger proteins (>150 amino acids) to enhance the genetic algorithm (GA) framework. In this paper, we formulate PSP as a combinatorial optimization problem that uses 3-D face-centered-cubic lattice coordinates to reduce the search space and the hydrophobic-polar energy model to guide the search. The whole optimization process is controlled by an enhanced GA framework with four features: 1) an exhaustive generation approach to diversify the search; 2) a novel hydrophobic core-directed macro-mutation operator to intensify the search; 3) a per-generation duplication elimination strategy to prevent early convergence; and 4) a random-walk technique to recover from stagnation. On a set of standard benchmark proteins, our algorithm significantly outperforms state-of-the-art algorithms. We also show experimentally that our algorithm is robust, producing very similar results across different parameter settings.
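To make the optimization objective concrete, the following is a minimal sketch of how a candidate conformation might be scored under the hydrophobic-polar model on the face-centered-cubic lattice. It assumes the commonly used convention in which each pair of hydrophobic (H) residues that are lattice neighbors but not adjacent in the sequence contributes -1 to the energy. The names `FCC_NEIGHBORS`, `are_neighbors`, `is_valid`, and `hp_energy` are hypothetical and chosen for illustration; this is not the implementation used in the paper.

```python
from itertools import combinations

# The 12 nearest-neighbor vectors of the FCC lattice:
# every arrangement of (+-1, +-1, 0) over the three axes.
FCC_NEIGHBORS = [
    (1, 1, 0), (1, -1, 0), (-1, 1, 0), (-1, -1, 0),
    (1, 0, 1), (1, 0, -1), (-1, 0, 1), (-1, 0, -1),
    (0, 1, 1), (0, 1, -1), (0, -1, 1), (0, -1, -1),
]
_NEIGHBOR_SET = set(FCC_NEIGHBORS)


def are_neighbors(p, q):
    """Return True if lattice points p and q are nearest neighbors on the FCC lattice."""
    return (q[0] - p[0], q[1] - p[1], q[2] - p[2]) in _NEIGHBOR_SET


def is_valid(coords):
    """Check that coords form a self-avoiding walk with unit FCC steps."""
    if len(set(coords)) != len(coords):  # two residues occupy the same point
        return False
    return all(are_neighbors(a, b) for a, b in zip(coords, coords[1:]))


def hp_energy(sequence, coords):
    """HP energy (assumed convention): -1 per non-consecutive H-H lattice contact."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if (j - i > 1
                and sequence[i] == "H" and sequence[j] == "H"
                and are_neighbors(coords[i], coords[j])):
            energy -= 1
    return energy


# Illustrative four-residue conformation (hypothetical data, not a benchmark protein).
example_coords = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (1, 0, 1)]
example_energy = hp_energy("HPHH", example_coords)
```

Under this formulation, a search method such as a GA looks for the valid conformation with the lowest energy; the pairwise loop above is quadratic in the sequence length and is meant for clarity, not speed.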