Machine learning applications in genetics and genomics
Machine learning has demonstrated potential in analyzing large, complex biological data. In practice, however, successful application requires biological information in addition to machine learning. In the not-so-distant past, data generation was the bottleneck; now it is data mining, that is, extracting useful biological insights from large, complicated datasets. In the past decade, technological advances in data generation have driven progress in the study of complex biological phenomena. In particular, next-generation sequencing (NGS) technologies have allowed researchers to screen changes at varying biological scales, such as genome-wide genetic variation, gene expression and small RNA abundance, epigenetic modifications, protein binding motifs, and chromosome conformation, in a high-throughput and cost-efficient manner (Fig.).
Machine learning and complex biological data
Machine learning, a subfield of computer science involving the development of algorithms that learn how to make predictions based on data, has a number of emerging applications in the field of bioinformatics. Bioinformatics deals with computational and mathematical approaches for understanding and processing biological data. Prior to the emergence of machine learning algorithms, bioinformatics algorithms had to be explicitly programmed by hand, which proves extremely difficult for problems such as protein structure prediction. Machine learning systems that learn patterns in the input data through multiple layers, as deep learning does, can make quite complex predictions when trained on large datasets. In recent years, the size and number of available biological datasets have skyrocketed, enabling bioinformatics researchers to make use of these machine learning systems. Genomics is the study of the genome, the complete DNA sequence of an organism.
In the past decade, precision genomics-based medicine has emerged to provide tailored and effective healthcare for patients depending on their genetic features. Genome-wide association studies (GWAS) have also identified population-based genetic risk variants for common and complex diseases. To meet the full promise of precision medicine, research is attempting to leverage our increasing genomic understanding and further develop personalized healthcare through ever more accurate disease risk prediction models. Polygenic risk scoring and machine learning are the two primary approaches to disease risk prediction. Despite recent improvements, the results of polygenic risk scoring remain limited by the approaches that are currently used. By contrast, machine learning algorithms have shown increased predictive ability for complex disease risk.
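To make the idea of polygenic risk scoring concrete, here is a minimal sketch in Python. It assumes the common additive formulation, in which a score is the sum of a person's risk-allele counts (0, 1, or 2 per variant) weighted by per-variant effect sizes such as those estimated in an association study. The text above does not specify a formula, so this form is an assumption; the function name `polygenic_risk_score` and all numbers are hypothetical and chosen only for illustration.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Return the weighted sum of risk-allele counts across variants.

    dosages: risk-allele counts per variant (0, 1, or 2) for one person.
    effect_sizes: per-allele weights, one per variant (assumed to come
    from an association study; the values below are invented).
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical data: three variants with made-up effect sizes.
effect_sizes = [0.5, -0.25, 1.0]
person_a = [2, 0, 1]  # carries risk alleles at the first and third variants
person_b = [0, 2, 0]  # carries only the protective allele at the second

score_a = polygenic_risk_score(person_a, effect_sizes)
score_b = polygenic_risk_score(person_b, effect_sizes)
print(score_a, score_b)
```

Because a score of this kind is a fixed linear combination, it cannot capture interactions between variants. That may be one reason a machine learning model, which can learn such interactions from data, is considered as an alternative.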