The Core for Applied Genomics and Ecology: Applications of Next Generation Sequencing to Agricultural Sciences

Objective

Next Generation pyrosequencing platforms (Next Gen for short) rely on PCR amplification of the highly conserved 16S ribosomal RNA gene from complex communities of microorganisms as their basis for quantification. The Next Gen methodology parallelizes the DNA sequencing process, allowing simultaneous sequence analysis of tens of thousands to millions of these PCR products from a single sample. By comparing thousands of sequence reads from a given sample to known databases, or by using phylogenetic approaches to assign taxonomic status, the species composition of the sample can be inferred simply by counting the number of reads that fall into each taxonomic "bin." Because significant information about the community composition can be gathered from counting the top 99.99% of the population (amounting to 10,000 sequence reads), the capacity of the Next Gen sequencer (up to 1 million sequence reads per run) allows one to pool PCR products from multiple samples into a single run on the machine. The PCR primers for each sample carry a unique 8-base barcode, allowing the sequences to be easily parsed into the appropriate sample bins. While this approach has proven very practical in studies with relatively small numbers of samples (n=20-100), significant hurdles exist in adapting this methodology as a high-throughput phenotyping method, where thousands of samples across large study populations must be processed for a single experiment. The most significant hurdles require automation and standardization in order to produce repeatable data. In addition, the massive amounts of data must be warehoused and analyzed, a daunting challenge in itself. The objectives of this proposal are therefore to develop and validate methods to overcome these hurdles and ultimately to develop a workflow for high-throughput and standardized microbiome phenotyping.
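As a rough illustration of the barcode parsing step described above, the following Python sketch sorts pooled reads into per-sample bins by their leading 8-base barcode. The barcode sequences, sample names, and reads are hypothetical, and the sketch assumes exact barcode matches at the start of each read; a production pipeline would also need to handle sequencing errors and quality filtering.

```python
from collections import defaultdict

BARCODE_LENGTH = 8  # the text specifies a unique 8-base barcode per sample


def demultiplex(reads, barcode_to_sample):
    """Group pooled reads by the sample whose barcode they start with.

    Returns (bins, unassigned): bins maps sample name -> list of reads with
    the barcode trimmed off; unassigned holds reads with no known barcode.
    """
    bins = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            bins[sample].append(insert)
    return dict(bins), unassigned


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTACGT": "sample_A", "TGCATGCA": "sample_B"}
reads = [
    "ACGTACGTGGATTAGATACCC",
    "TGCATGCAGGATTAGATACCC",
    "ACGTACGTCCTACGGGAGGCA",
    "NNNNNNNNGGATTAGATACCC",  # no matching barcode, so left unassigned
]
bins, unassigned = demultiplex(reads, barcodes)
reads_per_sample = {sample: len(seqs) for sample, seqs in bins.items()}
```

Counting the reads in each bin gives the per-sample sequencing depth, which is the starting point for deciding whether a sample reached the read count needed for analysis.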
One of the first bottlenecks in high-throughput microbiome or metagenome analyses is sample preparation. Sample prep is currently a multi-step process, including extraction of microbial DNA from fecal samples, target gene amplification (16S rRNA gene), and preparation of amplicons for Next Gen sequencing. Likewise, at the output stage, a second bottleneck exists in processing and analyzing the massive amounts of raw 16S rRNA sequence data that are generated, and in managing the raw data, the processed data, and all of the associated sample information. Processing and analyzing the data requires transforming millions of 16S rRNA sequence reads into a normalized taxonomy, followed by calculation of the relative abundance of each taxon in each sample. The enormous size of these 16S rRNA data sets exceeds the algorithmic capability of many of the current methods for taxonomic assignment, posing significant computational challenges. Lastly, a database architecture must be developed that automates filtering, processing, and storage of raw data and automatically associates the data with sample information. This database must be linked to pipelines that transform the sequence data into statistics-ready formats with normalized taxonomy and relative abundances of each taxonomic unit.
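The relative abundance calculation mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming each read has already been assigned a taxon label by some upstream classifier; the function name, sample names, and counts are hypothetical.

```python
from collections import Counter


def relative_abundance(taxon_assignments):
    """Convert per-read taxon labels into fractions that sum to 1."""
    counts = Counter(taxon_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}  # no reads, so no abundances to report
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical per-read taxon labels for two samples.
assignments = {
    "sample_A": ["Firmicutes"] * 6 + ["Bacteroidetes"] * 3 + ["Proteobacteria"],
    "sample_B": ["Firmicutes"] * 2 + ["Bacteroidetes"] * 2,
}
table = {sample: relative_abundance(taxa) for sample, taxa in assignments.items()}
```

Dividing by the per-sample total puts samples sequenced to different depths on a common scale, which is what makes the resulting table usable for statistical comparison.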

More information

Non-Technical Summary: Among the many ecosystems that are becoming objects of metagenomic studies, microbial communities are among the most attractive because the genetic diversity of their organisms has not been well explored. The reason for this is that microbial ecosystems, in particular, have previously only been amenable to ecological analyses with traditional culture-based analytical techniques. Since most microbial species from even relatively simple ecosystems are not culturable, microbial ecosystems are currently not well defined and their characteristics are the least understood. At the same time, the importance of microbe-dominated ecosystems to agriculture and agricultural production can hardly be overstated, and metagenomics is rapidly supplying new insights into their structure and characteristics. In the GI tract, metagenome studies paired with work in germ-free animals have already shown that microbiome composition is important for normal development and animal performance. Moreover, aberrant composition of the microbiome predisposes individual hosts to complex lifestyle diseases such as obesity and inflammatory bowel disease later in life. Another role of the microbiome revealed with the help of metagenomic methods - the association of microbiome composition with energy-harvesting capacity and energy balance - portends significant applications targeting health and performance characteristics in production animals. It is possible that discovering the factors that shape the composition of the gut microbiome will lead to manipulation of microbiome composition (and its metabolic potential) as a means for maximizing energy utilization and balance in animals. Similarly, insights into the gut microbial communities may provide a means for controlling animal health and even carriage of zoonotic diseases.
Beyond the potential applications in animal breeding, knowledge about the microbiomes of the rhizosphere and the phyllosphere will open new opportunities to control health and performance characteristics in plants. Understanding the microbiome composition and the forces that shape it requires study of large populations of plants and animals for epidemiological analyses, QTL analyses, and other experiments that involve large numbers of samples. Presently, Next Gen sequencing platforms available at UNL provide the needed analytical capability. However, high-throughput processing of samples on the front end and efficient storage and analysis of data downstream create bottlenecks in the discovery process. The objectives outlined in this proposal therefore focus on eliminating these bottlenecks. Specifically, we propose to:
1. Develop and validate automated systems for high-throughput sample preparation.
2. Develop quality control standards for PCR reactions and quantitative methods for thresholding the microbiome data.
3. Develop database architecture for storing sample information and filtered microbiome data.
4. Develop pipelines for analysis of microbiome data.
Approach: The Next Gen methodology parallelizes DNA sequencing, allowing simultaneous sequence analysis of tens of thousands to millions of PCR products from a single sample. By comparing thousands of sequence reads from a given sample to known databases, or by using phylogenetic approaches to assign taxonomic status, the species composition of the sample can be inferred simply by counting the number of reads that fall into each taxonomic "bin."
We will develop and validate DNA extractions in 96-well formats using the BioSprint 96 Workstation (Qiagen). Fecal samples are placed directly into the corresponding wells of a 96-well deep-well plate filled with 200 µl of glass beads and 1 ml of lysis buffer. Bacterial cells are then lysed using a Qiagen TissueLyser, which homogenizes particles in the samples, lysing the microbial cells by the crushing action of the 200 micron glass beads in the homogenate. Glass beads, crude sample, and bacterial debris are moved to the bottom of the plate by centrifugation. Magnetic beads are added to the lysate, and the mixture is incubated at room temperature so that DNA from the lysate attaches to the beads. The beads feature silica-coated surfaces and magnetic cores, which allow efficient attachment of DNA to the surface, followed by concentration of the beads by magnetic attachment. Once the DNA is attached to the beads, the plate is placed into the BioSprint 96 Workstation and the beads are removed from the lysate solution by the magnetic tines of the BioSprint 96-well magnetic head. The head moves the tines and attached magnetized beads through successive wash solutions. After three cycles of washing, the head moves to a new 96-well plate containing elution buffer, which removes the DNA from the beads. The DNA at this point is ready for the next stage of processing, PCR amplification.
Experience has shown us that the efficiency of the PCR reaction is not linear with respect to all species, as unsaturated PCR reactions lead to artificial inflation of the Firmicutes and apparent deflation of the Proteobacteria and Bacteroidetes. In order to determine exactly where the bias occurs in the dose-response curves, titrations will be performed using serial dilution of known DNA templates from five different samples. A series of 5-fold dilutions will be used and the PCR products evaluated by agarose gel electrophoresis and image analysis.
Finally, a MySQL database, GutMicro DB, will be constructed on a Linux system to manage the microbiome data and associated sample information. Sample information will be uploaded through forms that contain fields such as sample type, sample species, sample origin, gender, experiment type, sample prep date, sample prep method, automated plate barcode, sample prep technician, 16S PCR primers + barcode, emPCR date, emPCR technician, and 454 run date.
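One way the sample fields listed above might map onto a relational schema is sketched below. This is an assumption-laden illustration, not the project's actual GutMicro DB design: the table and column names are hypothetical, and Python's built-in sqlite3 module stands in for MySQL so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: one row per sample, plus per-sample taxon read counts.
SCHEMA = """
CREATE TABLE sample (
    sample_id         INTEGER PRIMARY KEY,
    sample_type       TEXT NOT NULL,
    sample_species    TEXT,
    sample_origin     TEXT,
    gender            TEXT,
    experiment_type   TEXT,
    prep_date         TEXT,
    prep_method       TEXT,
    plate_barcode     TEXT,
    prep_technician   TEXT,
    primer_barcode    TEXT,   -- 16S PCR primers + barcode
    empcr_date        TEXT,
    empcr_technician  TEXT,
    run_date          TEXT    -- 454 run date
);
CREATE TABLE taxon_abundance (
    sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
    taxon       TEXT NOT NULL,
    read_count  INTEGER NOT NULL,
    PRIMARY KEY (sample_id, taxon)
);
"""

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL server
conn.executescript(SCHEMA)

# Hypothetical sample record and counts, for illustration only.
cur = conn.execute(
    "INSERT INTO sample (sample_type, sample_species, primer_barcode) "
    "VALUES (?, ?, ?)",
    ("fecal", "mouse", "ACGTACGT"),
)
sample_id = cur.lastrowid
conn.executemany(
    "INSERT INTO taxon_abundance (sample_id, taxon, read_count) VALUES (?, ?, ?)",
    [(sample_id, "Firmicutes", 600), (sample_id, "Bacteroidetes", 300)],
)

# Join the microbiome data back to its sample information.
rows = conn.execute(
    "SELECT s.sample_type, t.taxon, t.read_count "
    "FROM sample s JOIN taxon_abundance t USING (sample_id) "
    "ORDER BY t.read_count DESC"
).fetchall()
```

Keeping sample metadata and taxon counts in separate tables linked by a sample identifier is one common way to let the data be associated with sample information automatically, as the text requires.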

Investigators

Benson, Andrew

Institution

University of Nebraska - Lincoln

Start date

2010

End date

2015

Funding Source

Nat'l. Inst. of Food and Agriculture

Project number

NEB-31-129

Accession number

223501