Clostridium difficile (CD) has been known for 30 years as a cause of CD-associated disease (CDAD) in healthcare facilities (HA-CDAD). However, the incidence of community-associated disease (CA-CDAD; in individuals not hospitalized for ≥ 12 months and not taking antimicrobials for ≥ 3 months) now nearly equals that of HA-CDAD. CD strains of ribotype 078 and North American pulsed-field (NAP) type 7 or 8 are frequently implicated. <P> Porcine CDAD is perhaps the most common uncontrolled cause of neonatal enteritis in pigs, and CD infection occurs frequently in diarrhetic calves. Ribotype 078 strains also account for > 90% of isolates from these species in the US. Food-animal and human disease may be connected, in that > 40% of retail meats are culture positive (> 70% of these with ribotype 078). Microarray analysis has revealed that ribotype 078 strains from various sources are related, but not identical. <P> Genomic sequencing of multiple strains of similar genotype, but from different sources, will allow identification of source-, genotype-, and perhaps species-specific genes in ribotype 078, in addition to teaching us more about the genome structure of CD.
NON-TECHNICAL SUMMARY:<BR> Our working hypothesis is that a group of C. difficile strains commonly found in food animals has become increasingly common in human C. difficile-associated disease. Therefore, we would like to obtain genomic sequences of 7 strains from humans, piglets, calves, and retail meats and to determine their genetic relationships.
<P> APPROACH: <BR> Strain selection: rationale for focusing on ribotype 078. We propose to sequence ribotype 078 strains from food-animal and human hosts (n = 5) and from retail meats (n = 2). Virulence cannot be compared in every host: there is no established protocol for infection of calves, and our Institutional Review Board has been lukewarm on virulence testing in humans. However, all of the selected strains are virulent, to varying extents, in piglets. The NAP types were selected in proportion to their representation in the collection (Table 2: type 7 and type 8 strains from pigs, humans, and foods, and type 7 from calves). When examined by microarray-based comparative phylogenomics, these strains fall into the clades described above. We have selected the most representative strains possible. We will produce draft rather than finished sequences because of the substantial cost differential (approximately $30K per genome for finished vs. < $20K for draft). Draft sequencing will focus primarily on accuracy, allowing reliable gene prediction and a high-quality gene list so that important physiological properties of the bacterium can be analyzed at the genetic level. The approach (Figure 2) includes directed pre-finishing to fill gaps, automated to control costs. We expect a very high-quality assembly prior to automated finishing; although most gaps will fall at repeats and other regions of low information content, some will fall within genes, and annotation will benefit from filling them. A second strategic decision is to perform the whole-genome shotgun (WGS) sequencing on the Roche-454 Life Sciences pyrosequencing platform (27a). One fragment run of single-end WGS reads and one run of paired-end WGS reads will be performed on the GS-FLX. The paired-end protocol will provide scaffolding information, linking contigs and yielding a more contiguous assembly. The current FLX paired-end protocol produces read pairs of variable length, ranging from 250 bases down to very few bases.
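To illustrate how paired-end reads provide scaffolding information, here is a minimal Python sketch of greedy scaffolding from read-pair links. It assumes the mate pairs have already been mapped, so that each pair whose mates land on different contigs yields one `(contig_a, contig_b)` link. The function name `build_scaffolds`, the input format, and the `min_links` threshold are hypothetical and do not describe the WUGSC pipeline; the sketch also ignores contig orientation and gap-size estimates, which real scaffolders use.

```python
from collections import Counter, defaultdict


def build_scaffolds(pair_links, contigs, min_links=2):
    """Greedily chain contigs into scaffolds using paired-end read links.

    pair_links: iterable of (contig_a, contig_b) tuples, one per read pair whose
        mates map to different contigs (hypothetical input format).
    contigs: every contig name, so unlinked contigs become singleton scaffolds.
    min_links: assumed minimum number of supporting pairs needed to join contigs.
    Orientation and gap sizes are ignored in this sketch.
    """
    # Count supporting read pairs for each unordered contig pair.
    support = Counter(tuple(sorted(link)) for link in pair_links if link[0] != link[1])

    # Union-find over contigs, used to reject joins that would close a cycle.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    degree = defaultdict(int)
    neighbours = defaultdict(list)
    # Accept the best-supported joins first; a contig may have at most two neighbours.
    for (a, b), n in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_links or degree[a] >= 2 or degree[b] >= 2:
            continue
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # joining would create a cycle
        parent[root_a] = root_b
        degree[a] += 1
        degree[b] += 1
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Each accepted component is a simple path; walk it from one end.
    seen, scaffolds = set(), []
    for start in sorted(contigs):
        if start in seen or degree[start] == 2:
            continue
        chain, prev, cur = [], None, start
        while cur is not None:
            chain.append(cur)
            seen.add(cur)
            nxt = [nb for nb in neighbours[cur] if nb != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        scaffolds.append(chain)
    return scaffolds


links = [("c1", "c2")] * 3 + [("c2", "c3")] * 2 + [("c3", "c4")]
print(build_scaffolds(links, ["c1", "c2", "c3", "c4"]))  # prints [['c1', 'c2', 'c3'], ['c4']]
```

In the usage example, the single read pair linking c3 and c4 falls below the assumed `min_links` threshold, so c4 remains a separate scaffold while c1, c2, and c3 are joined in order.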
We continue to explore the best use of these paired-end data and now complement them with the standard WGS run. In addition, we will add data from a single lane on the Solexa GA to correct errors in the 454 sequence. These methodologies are firmly established at the WUGSC. We predict cost savings with this approach over a Sanger-based whole-genome shotgun, while achieving sequence of the same or higher accuracy and a genome of equal or greater contiguity. The final product will be amenable to automated annotation and gene analysis.
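As an illustration of how short Solexa reads can be used to correct a 454 draft, the following Python sketch performs majority-vote polishing over a read pileup. It is a simplification under stated assumptions: reads are taken as already placed on the draft without gaps, so only substitution errors are corrected. The characteristic 454 errors are homopolymer-length insertions and deletions, which would need a gapped alignment step not shown here. The function name `polish_substitutions`, the `(start, read_seq)` input, and the `min_depth`/`min_fraction` thresholds are hypothetical.

```python
from collections import Counter


def polish_substitutions(draft, aligned_reads, min_depth=3, min_fraction=0.8):
    """Replace draft bases where ungapped short-read placements strongly disagree.

    draft: draft assembly sequence (str), e.g. a 454 contig.
    aligned_reads: iterable of (start, read_seq) tuples giving 0-based ungapped
        placements of short reads on the draft (hypothetical input; a real
        pipeline would obtain placements from a read aligner).
    min_depth, min_fraction: assumed thresholds for accepting a correction.
    Substitutions only; insertions and deletions are not handled.
    """
    # Build a per-position tally of the bases observed in the short reads.
    pileup = [Counter() for _ in draft]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(draft):
                pileup[pos][base] += 1

    polished = list(draft)
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little short-read evidence; keep the draft base
        base, n = counts.most_common(1)[0]
        if base != draft[pos] and n / depth >= min_fraction:
            polished[pos] = base
    return "".join(polished)


reads = [(0, "ACGATGCA")] * 4
print(polish_substitutions("ACGTTGCA", reads))  # prints ACGATGCA
```

Requiring both a minimum depth and a strong majority keeps the draft base wherever short-read support is thin or mixed, which is a conservative choice for a sketch like this.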