Pictured are (from left to right) Cecilia Arighi, Cathy Wu, Hongzhan Huang, Vijay Shanker and Jung-Youn Lee.

Major advances in molecular biology and genomic technology have given the scientific community a wealth of new biological information.

Easy access to this information, and a reliable way to extract only what is needed to answer specific biological questions, are critical to future scientific advances.

University of Delaware Prof. Cathy Wu is principal investigator of two new research grants aimed at improving computerized databases to store, organize and index bioinformatics data and to create specialized tools to view and analyze the data.

In particular, Wu’s grants will develop a new bioinformatics research infrastructure to increase understanding of proteins. While genomes provide the genetic blueprints of organisms, proteomes, the complete sets of proteins an organism produces, serve as the key mechanical components of life’s processes.

Under a four-year $1.6 million grant from the Advances in Biological Informatics Program of the National Science Foundation, Wu will lead a UD research team to establish a centralized resource the plant research community can use to answer important questions about post-translational modifications (PTMs): the biochemical modifications of proteins, how they are brought about and what impact they have on the functions of plant cells and organisms.

The cross-cutting resource will bring together a huge volume of publicly available yet disconnected data from disparate data sources and the scientific literature to advance basic understanding of fundamental biological processes in plant development and stress responses. This will be accomplished through an integrative bioinformatics approach that combines text mining, data mining, data analysis and visualization tools with databases and ontologies.

“Recent advancements in high-throughput proteomic technologies make our integrative bioinformatics approach timely to study protein PTMs and their networks of enzymes, substrates and interacting partners, which play a pivotal role in signaling pathways in all the kingdoms of life, from humans, animals and plants to microbes,” said Wu, Edward G. Jefferson Chair and director of the Center for Bioinformatics and Computational Biology (CBCB) and professor in the departments of Computer and Information Sciences (CIS) and Biological Sciences.

The project is an interdisciplinary research collaboration facilitated by the CBCB. Colleagues Jung-Youn Lee, associate professor of plant and soil sciences, and Vijay K. Shanker, professor of computer and information sciences, serve as co-principal investigators on the project. Bioinformatics scientists Cecilia Arighi, CIS research assistant professor, and Hongzhan Huang, CIS research associate professor, are co-investigators.

“Our development of a text mining system will focus on mining full-length articles for PTM-related information. The work will involve developing new techniques for extracting information from individual sentences and for putting together pieces of information extracted from different sections of an article. Information extracted from tens of thousands of articles will then be made available to biologists as well as data mining programs,” said Shanker, an expert in natural language processing (NLP) and biological text mining.

“The planned web portal will unify the fragmented and sporadic PTM information into a biologically meaningful context, and provide a gateway for plant scientists to search, browse, visualize and explore PTM networks. This is going to be a valuable resource and tool for the plant community to keep up on the deluge of literature and information, simply making that job easy and reliable,” said Lee, who studies how plant cells communicate with each other.

Specifically, Wu’s research team will support the study of the four most important PTMs in plants: phosphorylation, glycosylation, acetylation and ubiquitination. Although the NSF grant focuses on plant PTMs, the bioinformatics framework will also apply to the study of PTM networks important for understanding human biology and disease. In particular, conserved protein phosphorylation networks are implicated in multiple diseases, and the phosphorylation enzymes, protein kinases, are among the most popular classes of drug targets for the pharmaceutical industry.

NIH research grant

A separate $3.2 million renewal grant from the National Institutes of Health (NIH) targets the development of biomedical ontologies, virtual libraries that organize biological knowledge using a universal language. Ontologies are increasingly important in systems biology research, where complex data need to be integrated and scientific data need to be represented accurately.

This work aims to further develop the Protein Ontology (PRO), a virtual reference library for proteins developed by Wu and her colleagues. PRO provides a structured representation of protein classes that reflect evolutionary relatedness; protein forms resulting from alternative splicing, mutations and PTMs; and protein complexes, and it precisely defines the relationships among them.

Proteins are central components of biological processes and, as such, are key targets for investigation and integration of biomedical knowledge. PRO will provide a research infrastructure for modeling biological systems, improving the understanding of human disease and aiding in the identification of potential diagnostic and therapeutic targets. To support the clinical translation of this work, the grant will fund two clinical driving projects, one on Alzheimer’s disease biomarker identification and one on immune system modeling.

“This grant will help us improve the PRO framework and extend it to new biomedical areas, broaden PRO’s scientific impact and strengthen its community adoption for discovery and reasoning in the health sciences,” said Wu.

Collaborating institutions on the project include Georgetown University, the Jackson Laboratory, the State University of New York and New York University School of Medicine.

About Cathy Wu

Wu, who joined the UD faculty in 2009, is a renowned bioinformatics researcher. Since 2001, she has led the Protein Information Resource (PIR), a major bioinformatics resource that supports genomics, proteomics and systems biology research. At UD, she established the Center for Bioinformatics and Computational Biology (CBCB) as a cross-campus initiative to promote, coordinate and support interdisciplinary research and education in bioinformatics and computational biology.

CBCB launched a master’s program in bioinformatics and computational biology in 2010 and has established a Bioinformatics Core Facility at the Delaware Biotechnology Institute (DBI). A doctoral program in bioinformatics and systems biology is planned for fall 2012.

Prior to joining UD, Wu was professor of oncology and of biochemistry and molecular biology at the Georgetown University Medical Center. She earned her doctoral degree at Purdue University and has served on several advisory boards for agencies such as NIH and NSF and for professional associations such as the Human Proteome Organization (HUPO) and the Association for Computing Machinery (ACM).

Wu has served on 50 conference organizing or scientific committees; published more than 160 peer-reviewed papers, six books and conference proceedings; and given more than 120 invited lectures.

Article by Karen B. Roberts

Photo by Kathy F. Atkinson