Knowledge Representation Resources for Animal Agricultural Researchers

Objective

While genomic tools such as microarrays, SNP chips, linkage maps and HapMaps are increasingly available to agricultural researchers, translating the use of these tools into gains in agriculture requires supporting biocomputational resources. The overall objective of this proposal is to link high-throughput data sets and existing knowledge about gene function to phenotypes and functions of importance to agriculture. We will achieve this by
(1) providing targeted Gene Ontology (GO) biocuration for agricultural animals; 
(2) developing computational pipelines to support rapid, functional annotation; and 
(3) developing resources and tools to link phenotypes and traits to functions.
The expected outcomes of this proposal are:
(a) core sets of biocurated data that support modeling of genomic and genetics data; 
(b) computational tools that use this data; 
(c) expanded outreach and training for agricultural researchers who wish to model their genomics and genetics datasets; 
(d) rapid analysis of existing literature to support community biocuration projects; 
(e) computational tools to provide 'first pass' biocuration data for different types of experimental data; 
(f) improved ability for databases to manage and share annotated data; 
(g) improved ability for researchers to link traits to function; and 
(h) omics education for the next generation of agricultural researchers, including minority undergraduate students not traditionally engaged in agricultural research.

More information

NON-TECHNICAL SUMMARY: For the first time in history, biologists have access to technologies that enable them to rapidly generate enormous amounts of data about the genomes of our agricultural species. However, researchers using these technologies now face a major bottleneck in deriving knowledge from these data and using it to improve agricultural productivity. Our goal is to enable researchers to accelerate knowledge delivery from research investments by giving them the tools to avoid the current bottleneck. We will do this by linking existing information about how genes work to biological data; developing novel and improved methods for predicting links between our existing knowledge and biological data; and providing new tools for viewing how biological data relates to different species. The tools, training and resources that we develop are easily extended to other species. Not only will we provide data and tools, but we will also provide integrated, practical training for the next generation of US researchers. This training ensures that researchers are able to use the resources we provide. Our training component also specifically targets traditionally under-represented minorities in science and technology. The outcome of this project is that researchers will be able to more effectively and efficiently convert the power of genomic research into gains for US agriculture and consumers. Overall, the impact of our work is the improved ability for researchers to benefit society through improved agricultural systems, renewable energy, aquaculture, human nutrition, food safety and biotechnology. The societal impact of our education initiative is recruitment of minorities to emerging areas of biology via novel education and training opportunities.
APPROACH: Firstly, we will provide targeted Gene Ontology (GO) biocuration for agricultural animals by computationally mining published literature to determine knowledge representation for each agricultural species; using this information to provide targeted manual GO biocuration for key genes being intensively studied in each species; and providing training for agricultural researchers who wish to model their omics data. This fulfills the need for high quality functional annotations to support biological modeling of agricultural animal data. Secondly, we will develop computational pipelines for rapid functional annotation by developing a novel computational 'first pass' annotation pipeline based on extracting data from published literature; scaling up our existing computational pipelines to deal with increasing sequence data that requires functional annotation; and incorporating training about the use of these new annotation data in our GO training workshops. This aim fulfills the need for new computational annotation methods to address the massive scale of genomics data that agricultural researchers face. Thirdly, we will link phenotypes and traits to functions by developing appropriate comparative genomic browsers to allow simultaneous visualization of genomic data across multiple species; and new resources that use computational text mining and knowledge extraction tools to provide large-scale prediction of candidate genes for known QTLs. These tools will be incorporated into our GO training workshops as they become available. This aim fulfills the need for researchers to integrate genomic and genetic data to determine underlying mechanisms of traits and phenotypes.
PROGRESS: 2012/04 TO 2013/03
Target Audience: Provided functional modeling training for agricultural researchers, postdoctoral associates and students. Provided bioinformatics training for undergraduate students at an HBCU.
Changes/Problems: While the initial proposal included a focus on developing resources and tools to link phenotypes and traits to functions, current developments with the NSF-funded Phenotype RCN project and with developing the iAnimal portal based upon the iPlant Collaborative make this goal somewhat obsolete. To support both of these initiatives, we instead plan to develop animal-oriented resources on the iPlant cyberinfrastructure. In addition, we are modifying the content of our training workshops to respond to changing needs as researchers try to model larger and more complex data sets. We are working to incorporate modules on network and pathway analysis and to introduce information about how to develop a first pass functional annotation for species that currently have none.
What opportunities for training and professional development has the project provided? This project provides PhD training for two students, and provides professional development via training workshops for researchers and postdoctoral associates. In addition, we have developed bioinformatics training resources for undergraduates.
How have the results been disseminated to communities of interest? We are disseminating data and tools directly by providing training workshops for the research community. We also report on our progress at the USDA NRSP8 and NC1170 project meetings.
What do you plan to do during the next reporting period to accomplish the goals?
(a) targeted Gene Ontology (GO) biocuration for agricultural animals - Continue biocuration of literature for agricultural animals by developing prioritization lists for additional species (e.g., horse, pig, aquaculture species). Use the existing prioritization lists to expand biocuration efforts for the initial species.
(b) developing computational pipelines to support rapid, functional annotation - Develop iTerm mapping files for anatomy and cell/tissue ontologies.
(c) developing resources and tools to link phenotypes and traits to functions - We expect that during the next reporting period the focus will change to developing functional analysis tools within the iPlant cyberinfrastructure. While this goal was not included in the initial project outline, it is a natural and logical extension of this proposal that will benefit the broader animal agriculture research community.
PROGRESS: 2011/04/01 TO 2012/03/31
OUTPUTS: Our outputs for Aim 1 include development of the eGIFT tool to analyze existing literature for each species and use this to develop a ranked list of prioritized genes for manual curation. The eGIFT prototype uses chicken as its target species and we are currently evaluating this and expanding our literature to include other species. We are also providing continued and expanded training for agricultural researchers who wish to model their omics data. We recently held a training workshop ("Genomic Annotation and Functional Modeling Workshop") at the Maxwell H. Gluck Equine Research Center, University of Kentucky (15-16 November, 2011). 32 registered participants attended, including graduate students and postdoctoral researchers. We also participated in a mini-workshop at International Plant & Animal Genome XX (January 14-18, 2012) with the aim of disseminating this knowledge to a wider audience within the equine genomics community. Drs McCarthy and Gao offered the first "Introduction to Bioinformatics" class at Alcorn State during the Fall semester. The course is offered as a split level graduate/undergraduate course at both MSU and ASU and forms part of ASU's Biotechnology Certificate Course. Due to the logistics of setting up a new course, this class was only offered to undergraduate students late and only seven undergraduate students enrolled. We are seeking feedback from the initial undergraduate students in this course and expect to be able to expand enrollment during Fall semester of 2012. Students in this class also participated in the Community Assessment of Community Annotation with Ontologies (CACAO) initiative.
Outcomes for Aim 2 include the development and deployment of Genome2Seq, a tool that rapidly looks up genome co-ordinates generated from RNA-Seq data and returns genes and Gene Ontology (GO) annotation when the co-ordinates map to annotated genes, and a FASTA sequence file when the co-ordinates do not map to previously annotated genes. Genome2Seq is available via the AgBase website and the NRSP8 Bioinformatics website. We are also adapting a previously published method that produces more informative summaries by slimming GO terms based on the experimental data set used. This tool is called AutoSlim and is in the final stages of testing prior to its release on AgBase. Two existing AgBase ID mapping tools (ArrayIDer and AffyIDer) are being reconfigured to handle a larger number of accession types and combined for ease of use. This will enable researchers to more easily access existing tools for functional analysis of large data sets. The existing eGIFT is being expanded to allow users to request eGIFT analysis of genes that have not yet entered the database. Upon receiving a request, eGIFT identifies the gene-specific iTerms, creates the corresponding gene page and then notifies the requestor that the job is complete.
Outcomes for Aim 3 are that we are currently investigating how to apply our existing text-mining tool to QTL analysis.
PARTICIPANTS: Individuals: Fiona M McCarthy (PI) - worked on developing GO annotation priority lists, tool development and teaching training workshops. Carl J. Schmidt (coPI) - worked on development of eGIFT and how to apply our existing text-mining tool to QTL analysis. K. Vijay Shanker (coPI) - worked on development of eGIFT. C. Oana Tudor (Postdoc) - worked on development of eGIFT. Ashique Mahmood (Postdoc) - worked on development of eGIFT. Teresia Buza (Postdoc) - quality control of GO annotations in AgBase. Cathy Gresham (RA) - AgBase database management, including tool development and deployment. Tony Arick (RA) - tool development and deployment. Samuel Camacaro - tool development and deployment.
Partner organizations: University of Delaware is also listed on this project and provides key personnel for the development of eGIFT and comparative genomics tools. Alcorn State University provides training support for undergraduate informatics classes. Texas A&M provides CACAO systems for undergraduate training. University of Arizona provides infrastructure support via iPlant and collaborates on developing tools for avian comparative genomics.
Collaborators and contacts: Bindu Nanduri (MSU) provides data for molecular interaction for agricultural host-pathogens. Jim Reecy (ISU) provides support via NRSP8 Bioinformatics funding.
Training and professional development: We provided training workshops for agricultural researchers (including postdocs and graduate students from University of Kentucky) and equine researchers attending PAG XX. We also provided bioinformatics education for undergraduate and graduate students from MSU and Alcorn State University.
TARGET AUDIENCES: Researchers - we are targeting agricultural researchers from US universities by providing on-site training workshops about functional modeling of their data sets. Students - we are targeting minority students by partnering with Alcorn State University, an HBCU with >80% enrollment of African-Americans.
PROJECT MODIFICATIONS: Not relevant to this project.
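
The co-ordinate lookup described for Genome2Seq under Aim 2 can be illustrated with a minimal sketch. This is not the Genome2Seq implementation, which is not described in this report; the `Gene` record, the `lookup` function, the gene name, the toy sequence and the 1-based inclusive co-ordinate convention are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotated gene record (not the AgBase schema)."""
    name: str
    chrom: str
    start: int            # 1-based, inclusive (assumed convention)
    end: int              # 1-based, inclusive (assumed convention)
    go_terms: tuple = ()


def lookup(chrom, start, end, genes, genome):
    """Return overlapping annotated genes with their GO terms,
    or a FASTA record for the interval when no annotated gene overlaps."""
    hits = [
        g for g in genes
        if g.chrom == chrom and g.start <= end and start <= g.end
    ]
    if hits:
        return {"genes": [(g.name, list(g.go_terms)) for g in hits]}
    # No annotated gene: extract the sequence so it can be annotated later.
    seq = genome[chrom][start - 1:end]
    return {"fasta": f">{chrom}:{start}-{end}\n{seq}"}


# Toy data, invented for this example.
genes = [Gene("GENE_A", "chr1", 5, 12, ("GO:0008150",))]
genome = {"chr1": "ACGTACGTACGTACGTACGT"}

annotated = lookup("chr1", 10, 14, genes, genome)    # overlaps GENE_A
unannotated = lookup("chr1", 15, 18, genes, genome)  # no overlap, returns FASTA
```

A linear scan over genes is adequate for a sketch; a tool handling whole RNA-Seq data sets would more plausibly use an indexed structure such as an interval tree, which is again an assumption and not something this report states.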

Investigators

Shanker, Vijay; Schmidt, Carl J; McCarthy, Fiona M; Gao, Ming; Burgess, Shane

Institution

Mississippi State University

Start date

2011

End date

2014

Funding Source

Nat'l. Inst. of Food and Agriculture

Project number

MIS-391110

Accession number

224891