ENVIRONMENTALLY AWARE DEEP LEARNING BASED GENOMIC SELECTION AND MANAGEMENT OPTIMIZATION FOR MAIZE YIELD

Objective

The main goals of this project are to improve the accuracy of phenotypic prediction in crops grown in diverse environments, to incorporate genotype-by-environment interactions and biological theory into phenotypic models, and to make deep learning phenotypic prediction models usable in crops lacking sufficient data to train these models. The objectives and sub-objectives below support the realization of these goals.

Goal 1: Development of State-of-the-Art Deep Learning (DL) Phenotypic Prediction Models

1.a Optimize Deep Learning Models using Genomic Data
  1.a.1 Using maize yield data from the Genomes to Fields Initiative, design fully connected (FCN), convolutional (CNN), and recurrent neural networks (RNN).
  1.a.2 Train models using raw genomic data (FCN, CNN, RNN), genomic data represented as a Hilbert curve (CNN), or genomic data following data reduction. Data will be reduced to loci significantly associated with yield (FCN), to autoencoder embeddings (FCN), or to the loci of annotated genes (FCN, CNN, RNN).
  1.a.3 Best linear unbiased predictor models (BLUPs) will be created using comparable input data.
  1.a.4 Trained DL models will be evaluated against each other and against BLUPs. Performance differences between model types and data treatments (reduction, modification) will be considered.
1.b Optimize Deep Learning Models using Environmental and Management Data
  1.b.1 DL models (CNN, RNN) using environmental and management data will be designed. Inclusion of residual connections between layers will be tested.
  1.b.2 DL models will be trained on data represented as a time series (CNN, RNN) and as a Hilbert curve (CNN), with and without pre-training on low resolution (e.g., farm or county instead of plot) data.
  1.b.3 Performance of the created models will be evaluated.

Goal 2: Incorporate Genotype-by-Environment Interactions (GxE) and Biological Theory

2.a Design a DL model processing genomic data with connections informed by known gene pathways in maize.
2.b Create six models with GxE interactions by combining either the best genome processing model from 1.a or the pathway informed model from 2.a with the best environmental and management processing model from 1.b, using one of three interaction networks:
  2.b.1 Directly predicting yield.
  2.b.2 Producing outputs corresponding to variables in a simple crop growth model.
  2.b.3 Producing the same number of outputs as in 2.b.2, but without requiring them to correspond to physiological variables.
2.c Train benchmarking BLUP and machine learning models.
2.d Evaluate model performance relative to benchmarking models and non-interaction models created in Goal 1.
2.e For the best performing DL model, SNPs and environmental variables of high importance to the predicted yield will be identified and compared with those reported in the literature.

Goal 3: Assess Transferability of Trained Models to Other Crops and Multi-Crop Models

3.a Prepare a wheat dataset complete with yield in multiple environments and genomic data.
3.b Evaluate Effectiveness of Transfer Learning
  3.b.1 Using the design of the best DL model identified, train a model exclusively on wheat.
  3.b.2 Fit a BLUP model using these data for use as a benchmark.
  3.b.3 Retrain copies of the maize model and wheat model on different percentages of the other species' data (5%, 25%, 50%, 75%, or 100%).
  3.b.4 Compare model accuracy with respect to percent data available.
3.c Evaluate Effectiveness of Multitarget Learning
  3.c.1 Extend the design of the best DL model identified by adding a second GxE interaction subnetwork so that each species has a distinct interaction subnetwork.
  3.c.2 Train models on half or all of both species' datasets, tracking accuracy for each species.
  3.c.3 Train models using differing percentages (5%, 25%, 50%, 75%, or 100%) of the total samples available for each crop.
3.d Compare the performance of models using transfer learning and multitarget learning with respect to data availability and with respect to non-retrained single species DL models and benchmark models.
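
Sub-objectives 1.a.2 and 1.b.2 mention representing genomic or time series data as a Hilbert curve for CNN input. The project record does not say how that representation is built, so the following is only a minimal sketch of one common approach: the standard distance-to-coordinate Hilbert mapping, which places a one-dimensional sequence on a square grid so that neighbors in the sequence stay neighbors on the grid. The function names `d2xy` and `hilbert_grid`, the padding value, and the toy input are assumptions made for illustration.

```python
def d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a
    (2**order) x (2**order) grid, using the standard iterative mapping."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_grid(values, fill=0.0):
    """Lay a 1D sequence (hypothetically, SNP calls or daily weather values)
    onto the smallest power-of-two square grid that holds it.
    Cells beyond the end of the sequence keep the `fill` value (an assumed
    padding choice, not something the project specifies)."""
    order = 0
    while (1 << order) ** 2 < len(values):
        order += 1
    side = 1 << order
    grid = [[fill] * side for _ in range(side)]
    for d, value in enumerate(values):
        x, y = d2xy(order, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Toy example: four sequence positions fill a 2 x 2 grid.
    for row in hilbert_grid([1, 2, 3, 4]):
        print(row)
```

For the toy input above, positions 1 through 4 trace a U shape across the 2 x 2 grid, giving rows `[1, 4]` and `[2, 3]`. The appeal of this layout for a CNN is that a two-dimensional convolution window covers positions that were close together in the original sequence, which a simple row-by-row reshape does not guarantee at row boundaries.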

Investigators
Kick, D. R.
Institution
AGRICULTURAL RESEARCH SERVICE
Start date
2023
End date
2025
Project number
ILLW-2022-09713
Accession number
1030152