Acknowledgments
My sincere and heartfelt thanks to Dr. Ravi Patel for giving me the opportunity to perform research at the Falk Cardiovascular Center at Stanford University, and for providing insight and direction to my research interests. I am grateful to Daryl Waggott, my primary mentor, who has been a constant pillar of support throughout this incredible experience. Daryl was always eager and readily available to answer my numerous questions with a smile. Thank you so much for putting up with me. Additional thanks to Terra Coakley for her patience; I couldn’t have done this without her help and support. I am honored and immensely thankful that the Ashley Lab accommodated a high school junior amidst their overwhelmingly busy schedules. I am grateful to have been given the chance to perform research in such a world-class institution among some of the most renowned and pioneering scientists in their fields. Thanks to my computer science teacher, Mr. Christopher Kuszmaul, for encouraging my passion to take wing. He has always supported my strong belief that computer science has the power to provide simple solutions to complex global problems. Last but not least, thanks to my wonderful parents and brother for their support and unconditional love.
Abstract
The purpose of this research is to identify potential new biomarkers for the diagnosis of heart failure. Heart failure, a chronic condition in which the heart is unable to supply sufficient oxygen for the needs of the tissues, is one of the leading causes of mortality in the United States. Finding an informative biomarker is imperative. Though a small number of biomarkers for the diagnosis of heart failure exist, they lack discriminatory power and are not appropriate for all patient populations. Biomarkers are considered superior to other methods of diagnosis because they are objective and enable non-invasive diagnosis.
This study created a generalizable computational genetics framework that combined publicly available RNA-Seq and microarray datasets pertaining to heart failure. The framework includes pre-processing and statistical analysis steps and features a prioritization algorithm. The R programming environment was used. Extensive quality control and validity checks were performed, as well as a meta-analysis of the data. The study yielded thirty-eight potential gene candidates, the protein products of which we hypothesize could serve as new biomarkers of heart failure. Of the thirty-eight gene candidates, FHOD3 and DSG2 are most notable due to their secreted nature, their known ties to heart conditions, and their high expression values in the plasma of heart failure patients. The findings have implications for the diagnosis of heart failure in patients with genetic susceptibility to the disease. Better biomarkers, especially those that can help predict treatment options, are the future of this field. These candidates lay the foundation for moving this research forward.
Introduction
Heart failure (HF) is a chronic condition in which the heart is unable to supply sufficient oxygen for the normal functions of body tissues. Its prevalence and high mortality rate make it a major health problem not only in the United States but also worldwide. An estimated 23 million people worldwide are living with HF (Gaggin & Januzzi, 2013). Finding non-invasive means for the accurate diagnosis, prognosis, and guided therapy of HF has become imperative. Biomarkers are non-invasive, objective tools that indicate a biological state or condition. Ideal biomarkers are safe, easy to measure, and consistent across sex, ethnic groups, and other stratifying factors. In the case of HF, biomarkers have changed the way that the illness is diagnosed and monitored (Gaggin & Januzzi, 2013).
Since natriuretic peptides (NPs) were discovered in 1985, they have been established as the best biomarkers of HF (Loncar et al., 2014). Their prognostic abilities and therapeutic value are well established. Although NPs are considered the “gold standard” biomarkers of HF, they have several limitations. Factors known to influence the clinical interpretation of NPs include age, obesity, atrial arrhythmias, renal failure, and structural heart disease beyond the clinical diagnosis of HF (Loncar et al., 2014). In 2007, the National Academy of Clinical Biochemistry (NACB) set goals in a consensus document stating that a biomarker in HF ideally enables clinicians to: “identify possible underlying (and potentially reversible) causes of HF; confirm the presence or absence of the HF syndrome; and estimate the severity of HF and the risk of disease progression” (Tang et al., 2007). Notably, NPs fail to meet the standards set by the NACB, and no other available biomarker for HF satisfies these guidelines. The lack of a sufficiently capable biomarker for HF underscores the need to discover new biomarkers that have discriminatory power across a variety of demographic and clinical settings. We hypothesize that it is possible to identify candidate biomarkers of heart failure by combining multiple datasets in an analytically structured framework.
Materials and Methods
This study combines three microarray datasets from studies pertaining to HF with one RNA-Seq HF dataset through the use of an analytically structured framework. It jointly pre-processes and analyzes the data to produce a list of possible biomarkers for heart failure. Joint analysis of RNA-Seq and microarray data has proved difficult in the past because the two platforms rely on vastly different technical modalities.
This study overcame the barrier to combining RNA-Seq datasets with microarray datasets, so that every candidate biomarker is supported by evidence from both platforms. The R programming language (R Development Core Team, 2008) and several Bioconductor packages were used to perform all statistical analysis steps in the framework. The R environment was chosen because it makes statistical computation and graphics straightforward and supports a large number of statistical procedures. Figure 1 gives a breakdown of the study design and framework.
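The text above does not spell out how measurements from the two platforms were placed on a common footing, so the following is only a minimal sketch of one widely used approach, not the study's actual R/Bioconductor code. It is written in Python purely for illustration. For a single gene, each dataset is reduced to a unitless standardized mean difference (HF versus control), and the per-dataset effects are then pooled with an inverse-variance weighted fixed-effect meta-analysis. The function names, dataset labels, and expression values are all hypothetical.

```python
import math
from statistics import mean, stdev


def standardized_mean_difference(cases, controls):
    """Return (Cohen's d, approximate variance of d) for one gene in one dataset."""
    n1, n2 = len(cases), len(controls)
    pooled_var = (
        (n1 - 1) * stdev(cases) ** 2 + (n2 - 1) * stdev(controls) ** 2
    ) / (n1 + n2 - 2)
    d = (mean(cases) - mean(controls)) / math.sqrt(pooled_var)
    # Standard large-sample approximation to the variance of d.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var_d


def fixed_effect_meta(effects):
    """Pool (effect, variance) pairs using inverse-variance weights."""
    weights = [1.0 / var for _, var in effects]
    combined = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return combined, se, combined / se


# Hypothetical expression values for ONE gene (assumed, not from the study).
# Microarray values are log2 intensities; RNA-Seq counts are log2(count + 1).
datasets = {
    "microarray_A": {"cases": [8.1, 8.4, 8.9, 8.6], "controls": [7.2, 7.5, 7.1, 7.4]},
    "microarray_B": {"cases": [10.3, 10.9, 10.6], "controls": [9.8, 10.1, 9.7]},
    "rnaseq_C": {
        "cases": [math.log2(c + 1) for c in [310, 280, 350, 295]],
        "controls": [math.log2(c + 1) for c in [150, 170, 140, 160]],
    },
}

# Each dataset contributes a unitless effect, so platform scale drops out.
effects = [
    standardized_mean_difference(d["cases"], d["controls"])
    for d in datasets.values()
]
combined, se, z = fixed_effect_meta(effects)
print(f"combined d = {combined:.2f}, SE = {se:.2f}, z = {z:.2f}")
```

Because the effect size is computed within each dataset before pooling, raw intensities and read counts are never compared directly. A random-effects model would be a reasonable alternative if the datasets were expected to differ systematically.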