Big Data and Systems Biology

The explosion of available clinical and scientific data on microbes, including genomic, transcriptomic, proteomic, and metabolomic data, can be coupled with machine learning for more accurate diagnoses. Implementing big data analytic methods can fundamentally impact antibiotic discovery and infectious disease medicine by elucidating systems-level antimicrobial resistance (AMR) and virulence mechanisms.

The Power of Genomics

Thanks to the genomics revolution, tens of thousands of strain-specific whole-genome sequences are now accessible for a wide range of disease-associated bacteria. This availability enables big data informatics and machine learning approaches to be used to study the spread and acquisition of AMR. Models that are further linked to bacterial gene expression and metabolism can predict antibiotic resistance with no a priori information about the underlying gene content or resistance phenotypes of the strains. This makes it possible to identify AMR determinants, diagnose infections rapidly, and prioritize antibiotic use directly from the organism's sequence. Employing such tools to diagnose and limit the spread of resistance-conferring mechanisms could help ameliorate the looming AMR crisis.
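As a rough illustration of how resistance determinants might be flagged from sequence-derived data, the sketch below ranks genes by how strongly their presence is associated with a resistant phenotype. It is a deliberately simple stand-in (a one-sided Fisher's exact test on gene presence/absence), not the machine learning models described above. The strain names, gene names, and phenotypes are hypothetical toy data, and `fisher_exact_greater` and `rank_genes` are illustrative helpers rather than functions from any CHARM tool.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: gene present, resistant      b: gene present, susceptible
    c: gene absent, resistant       d: gene absent, susceptible
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    present = a + b          # strains carrying the gene
    resistant = a + c        # resistant strains
    total = a + b + c + d
    upper = min(present, resistant)
    numerator = sum(
        comb(present, k) * comb(total - present, resistant - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, resistant)


def rank_genes(strains):
    """Rank genes by association with resistance (smallest p-value first).

    strains maps a strain name to (set of gene names, is_resistant).
    """
    all_genes = set().union(*(genes for genes, _ in strains.values()))
    results = []
    for gene in sorted(all_genes):
        a = sum(1 for genes, res in strains.values() if gene in genes and res)
        b = sum(1 for genes, res in strains.values() if gene in genes and not res)
        c = sum(1 for genes, res in strains.values() if gene not in genes and res)
        d = sum(1 for genes, res in strains.values() if gene not in genes and not res)
        results.append((gene, fisher_exact_greater(a, b, c, d)))
    return sorted(results, key=lambda item: item[1])


# Hypothetical toy data: gene content and resistance phenotype per strain.
strains = {
    "s1": ({"geneA", "geneB"}, True),
    "s2": ({"geneA"}, True),
    "s3": ({"geneA", "geneC"}, True),
    "s4": ({"geneA", "geneB"}, True),
    "s5": ({"geneB"}, False),
    "s6": ({"geneC"}, False),
    "s7": ({"geneB", "geneC"}, False),
    "s8": (set(), False),
}

for gene, p_value in rank_genes(strains):
    print(f"{gene}: p = {p_value:.4f}")
```

In this toy set, the gene carried only by resistant strains ranks first. A real analysis would also need to correct for multiple testing and for population structure, since closely related strains share genes for reasons unrelated to resistance.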

Finding the Melody Amidst the Noise


The use of genome-scale models enables flux balance analysis to be coupled with genome-wide association studies, relating a multi-gene function (such as a biochemical or gene regulation pathway) to an AMR phenotype. CHARM faculty have also applied Independent Component Analysis (ICA) to bacterial transcriptomes, akin to isolating a single voice from a crowded room or a single instrument from an orchestral ensemble. This tool has extraordinary power of resolution and enormous potential to help us understand the mechanistic basis of an organism's responses to environmental variables such as host immune response or antibiotic exposure.
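For readers who want the underlying mathematics, both techniques can be written compactly in their standard textbook forms. The symbols below are notational assumptions for illustration and do not come from the CHARM publications: $S$ is the stoichiometric matrix of the metabolic network, $v$ the vector of reaction fluxes with bounds $v^{\mathrm{lb}}$ and $v^{\mathrm{ub}}$, $c$ the objective weights (for example, selecting the biomass reaction), $X$ the gene-by-condition expression matrix, $M$ the matrix whose columns are the independent components (gene weights), and $A$ the matrix of component activities across conditions.

```latex
% Flux balance analysis: a linear program at metabolic steady state
\max_{v} \; c^{\top} v
\quad \text{subject to} \quad
S v = 0, \qquad v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}

% ICA of a transcriptome compendium: expression as a mixture of
% statistically independent signals
X \approx M A
```

In the orchestra analogy, each column of $M$ plays the role of one instrument (a set of co-regulated genes) and the corresponding row of $A$ records how loudly it plays in each experimental condition.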

Opportunities and Challenges

The increasing amount of multi-omics data becoming available for human pathogens brings new data analytic challenges. The NIH has invested significant resources in data storage repositories (e.g., GEO and GenBank), creating a critical need for data science methodologies capable of fully utilizing these data. The novel data science methods being developed by UC San Diego CHARM have been demonstrated to scale successfully to the global number of new genome and transcriptome sequences available; indeed, these methods have exhibited increased utility with scale. Our data analytic methods are enabling new discoveries in large data sets, leading to new hypotheses and ambitious experimental frameworks that would not otherwise be formulated or contemplated.


Systematic pan-genome analysis highlights novel antibacterial targets

The ESKAPEE group of antibiotic-resistant bacterial pathogens is the leading cause of health care-associated infections worldwide. Pan-genome analysis by CHARM investigators indicates that removal of two-component systems significantly affects the fitness of the cell, owing to their roles in managing vital functions such as antibiotic resistance, virulence, biofilm formation, and quorum sensing. Hence, these systems are promising targets for novel antibacterials.

Read the mSystems paper here