Electrical and Computer Engineering
at the University of Maine

 

An Intelligent System for Base Calling

Collaborators:  University of Maine
Funded by:      NATO (North Atlantic Treaty Organization), National Science Foundation

Contact:        Mohamad Musavi, Cristian Domnisoru

Project Summary

The Human Genome Project (HGP) is one of the most challenging and important scientific initiatives of the twentieth century. Mapping the entire human genome poses difficult technical problems while promising great rewards. Significant progress toward the project's goals has been made in the past decade, and several have already been achieved. For instance, the genetic mapping goals for both the human and the mouse have been met. Progress toward the human and mouse physical mapping goals is steady, with sufficient support in place to achieve these goals ahead of schedule. The question now is whether we can find a cost-efficient, high-throughput sequencing solution, not only for the HGP but also for the critical role that DNA sequencing will continue to play in biological research after the HGP's goals are achieved.

Current sequencing technology produces error rates between 3.5% and 6% [1], which corresponds to 35 to 60 errors in a 1000-base read. A human operator must correct these errors, a time-consuming process that takes an average of 10 minutes per 1000 bases. As the costs of machines and chemicals decrease, operator time accounts for a large share of the cost per base. To improve the cost-effectiveness of DNA sequencing and expedite the mapping of the human genome, we must greatly improve sequencing accuracy and reduce human intervention as much as possible, or even eliminate it.
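
The arithmetic behind these figures can be sketched in a few lines of Python. The function name review_cost and its parameters are illustrative only; the 3.5% to 6% error range and the 10-minutes-per-1000-bases review rate are the figures quoted above, and the sketch assumes review time scales linearly with read length.

```python
def review_cost(read_length: int, error_rate: float, minutes_per_1000: float = 10.0):
    """Return (expected errors, operator review minutes) for one read.

    Hypothetical helper: assumes errors and review time both scale
    linearly with read length, using the averages quoted in the text.
    """
    expected_errors = read_length * error_rate
    review_minutes = read_length / 1000 * minutes_per_1000
    return expected_errors, review_minutes


# The low and high ends of the quoted error-rate range for a 1000-base read.
for rate in (0.035, 0.06):
    errors, minutes = review_cost(1000, rate)
    print(f"error rate {rate:.1%}: {errors:.0f} expected errors, {minutes:.0f} min of review")
```

For a 1000-base read this gives 35 expected errors at 3.5% and 60 at 6%, each needing about 10 minutes of operator review; increasing read_length shows how quickly operator time grows with sequencing volume.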

Another issue worth mentioning is that current base-calling software has not kept pace with sequencing hardware. For example, ABI's automated sequencing machines, the most commonly used in the biological research community, can produce reads of more than 1,000 bases, yet ABI's base-calling software can reliably identify only half of them. The other half contains many errors and is either discarded or must be corrected by an operator. Because the software is proprietary, end users cannot optimize it. Several alternative base-calling packages that are open to the public have been developed by different groups. Although some claim to outperform ABI's software, they still obtain only about 500 to 600 high-confidence bases from the roughly 1,000 bases of available data. Our preliminary study, discussed in detail later, indicates that the proposed software can produce longer high-quality segments. This reduces the need to re-sequence low-quality segments, saving considerable time and money.
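
To make "high-quality segment length" concrete, the sketch below finds the longest contiguous run of bases whose confidence meets a threshold. It assumes, hypothetically, that each base carries a confidence value between 0 and 1 and that a single cutoff defines "high confidence"; the function name and threshold are illustrative, and this is not the scoring used by ABI's software or by the proposed system.

```python
from typing import Sequence, Tuple


def longest_high_confidence_segment(confidences: Sequence[float],
                                    threshold: float) -> Tuple[int, int]:
    """Return (start, end) of the longest run with confidence >= threshold.

    `end` is exclusive, so the segment length is end - start.
    Returns (0, 0) if no base meets the threshold.
    """
    best_start, best_end = 0, 0
    run_start = None  # start index of the current qualifying run, if any
    for i, c in enumerate(confidences):
        if c >= threshold:
            if run_start is None:
                run_start = i
            # Keep the first run that reaches the maximum length.
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None  # a low-confidence base breaks the run
    return best_start, best_end


# Hypothetical per-base confidence values for a short read.
read_confidences = [0.2, 0.95, 0.97, 0.99, 0.5, 0.96, 0.98]
start, end = longest_high_confidence_segment(read_confidences, threshold=0.9)
print(f"longest high-confidence segment: bases {start}..{end - 1} ({end - start} bases)")
```

With these made-up values the longest qualifying run covers bases 1 to 3, three bases long; on a real read, the same measurement over roughly 1,000 bases is what distinguishes a 500-base high-quality segment from a longer one.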

To meet these challenges, the objective of this proposal is to develop a hybrid neuro-fuzzy software system that reduces sequencing costs, increases sequencing throughput, and improves the accuracy of DNA sequencing. The proposed system is intended to achieve error rates on the order of a fraction of 1%, which is made possible by novel adaptive learning systems. By applying artificial neural networks and fuzzy systems, we expect initially to achieve an error rate below 2%. Adding adaptive capabilities to the intelligent system will allow the operator to train it further, driving the error rate below 1%. The software is also designed to be independent of any specific sequencing hardware platform, so it can readily accommodate new sequencing technologies such as capillary electrophoresis. Finally, it will assign each base a confidence value, which will facilitate assessing the local and long-range accuracy of DNA sequences and of sequence assembly.
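
The page does not specify how the per-base confidence value will be expressed. One widely used convention, assumed here purely for illustration, is the Phred quality scale, Q = -10 log10(P_error), where P_error is the estimated probability that a base call is wrong. The sketch below converts between the two and shows where the error-rate targets above fall on that scale; the function names are hypothetical.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(P_error)."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error_probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse mapping: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


# Error-rate targets mentioned in the project summary, plus 0.1% for comparison.
for target in (0.02, 0.01, 0.001):
    print(f"error rate {target:.1%} -> Q{phred_quality(target):.1f}")
```

Under this convention a 1% error probability corresponds to Q20 and 0.1% to Q30, while the initial 2% target sits near Q17. A logarithmic scale like this makes per-base confidence values easy to compare and to threshold when trimming or assembling reads.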
