How Is Genomic Selection Changing the Speed of Crop Improvement in Commercial Breeding Programs?

phenome-networks

Genomic selection is arguably the most transformative methodological advance in plant breeding since the Green Revolution. By using genome-wide marker data to predict the breeding values of untested lines, genomic selection allows breeders to make selection decisions before time-consuming and expensive phenotypic evaluation is complete, potentially reducing breeding cycle times by half or more. In 2025, this technology is moving from specialized research programs into mainstream commercial operations across a growing range of crop species.
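Why a shorter cycle matters can be seen from the textbook breeder's equation. The symbols below are standard quantitative-genetics notation rather than anything defined in this article, so treat this as an illustrative sketch:

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_A}{L}
```

Here i is the selection intensity, r is the accuracy of selection, \sigma_A is the additive genetic standard deviation, and L is the length of the breeding cycle in years. If i, r, and \sigma_A were unchanged, halving L would double the expected gain per year. In practice the accuracy of a genomic prediction can be lower than that of a full phenotypic evaluation, so the net benefit depends on how much r is given up for the shorter cycle.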

What Is Genomic Selection and How Does It Work?

Genomic selection is a form of marker-assisted breeding that uses statistical models trained on the observed relationships between genome-wide SNP marker data and phenotypic performance to predict the genetic merit of individuals that have been genotyped but not phenotyped. Unlike earlier marker-assisted selection approaches that relied on a small number of markers associated with specific traits, genomic selection captures the cumulative effect of all markers simultaneously, including those with individually small effects that would not meet significance thresholds in conventional association studies.
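The idea of fitting all markers at once can be sketched with ridge regression on simulated data. This is a minimal illustration, not a production pipeline: the function names, the simulated genotypes, and the shrinkage value are all assumptions made for the example, and in real programs the shrinkage parameter is typically estimated from the data (for example by REML) rather than known in advance.

```python
import numpy as np

def fit_marker_effects(Z, y, lam):
    """Jointly estimate every marker effect by ridge regression.

    Z   : (n_lines, n_markers) genotypes coded 0/1/2 (no missing calls assumed)
    y   : (n_lines,) phenotypes of the training lines
    lam : shrinkage parameter, assumed given for this sketch
    """
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                      # center each marker column
    mu = y.mean()
    n = Zc.shape[0]
    # Dual form of ridge regression: solve an n x n system, which stays
    # cheap when markers greatly outnumber lines.
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(n), y - mu)
    effects = Zc.T @ alpha
    return mu, col_means, effects

def predict_genetic_value(Z_new, mu, col_means, effects):
    """Predict performance of genotyped-but-unphenotyped lines."""
    return mu + (Z_new - col_means) @ effects

# Simulated example: many small-effect markers, heritability near 0.5.
rng = np.random.default_rng(42)
n_train, n_new, n_markers = 400, 100, 300
Z = rng.binomial(2, 0.5, size=(n_train + n_new, n_markers)).astype(float)
true_effects = rng.normal(0.0, 0.05, size=n_markers)
g = Z @ true_effects                                  # true genetic values
y = g + rng.normal(0.0, g.std(), size=g.size)         # add environmental noise

# Ratio of residual to marker-effect variance; known only because we simulated.
lam = g.var() / 0.05**2
mu, col_means, effects = fit_marker_effects(Z[:n_train], y[:n_train], lam)
predicted = predict_genetic_value(Z[n_train:], mu, col_means, effects)

# Accuracy: correlation between predictions and the simulated true values.
accuracy = np.corrcoef(predicted, g[n_train:])[0, 1]
print(f"prediction accuracy on unphenotyped lines: {accuracy:.2f}")
```

No single marker is tested for significance here; every marker receives a small, shrunken effect, and the prediction is the sum of those effects across the genome.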

The methodology was first formally described for animal breeding by Meuwissen, Hayes, and Goddard in 2001 and was rapidly adopted by the dairy cattle industry, where it produced dramatic improvements in genetic gain per year. Its application to plant breeding followed as the cost of high-density genotyping fell and statistical methods were adapted for the specific features of plant breeding programs, including inbreeding, clonal propagation, and complex experimental designs. A comprehensive review of genomic selection in plant breeding is available from Nature Plants, which has published extensively on the methodological and practical aspects of this technology.

What Genotyping Technologies Are Used for Genomic Selection?

Several genotyping platforms are currently used to generate the marker data required for genomic selection, each with different tradeoffs between cost, marker density, and analytical flexibility. SNP arrays offer fixed panels of markers distributed across the genome and provide highly reproducible results at relatively low cost per sample, making them suitable for large-scale routine genotyping in commercial programs. Genotyping by sequencing approaches sample a random subset of the genome through restriction enzyme digestion and short-read sequencing, providing flexibility in marker density and the ability to discover novel polymorphisms.

The cost of genotyping has fallen dramatically over the past decade, making genomic selection economically viable for a much wider range of breeding programs and crop species. According to Illumina, advances in sequencing technology continue to drive down per-sample costs while increasing data quality, expanding the practical reach of genomic approaches into smaller breeding programs and less commercially prioritized crops. Current costs for SNP array genotyping range from approximately USD 20 to USD 80 per sample depending on array density and sample volume, placing comprehensive genomic selection programs within reach of mid-sized commercial breeders.
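As a rough illustration of what the quoted per-sample range implies for a budget, the following sketch multiplies it out. The function name and the figure of 5,000 samples per year are hypothetical and chosen only for the example; real quotes depend on array density and volume.

```python
def genotyping_budget(n_samples, cost_low=20.0, cost_high=80.0):
    """Return the (low, high) total SNP-array genotyping cost in USD.

    The default per-sample costs are the approximate range quoted in the
    text; n_samples is whatever the program plans to genotype.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return n_samples * cost_low, n_samples * cost_high

# Hypothetical program genotyping 5,000 candidate lines per year.
low, high = genotyping_budget(5000)
print(f"annual genotyping cost: USD {low:,.0f} to USD {high:,.0f}")
```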

How Are Training Populations Designed for Maximum Predictive Accuracy?

The predictive accuracy of a genomic selection model depends critically on the size, genetic diversity, and phenotypic quality of the training population used to calibrate it. A training population should be genetically representative of the breeding population to which predictions will be applied. If the training population does not capture the allele frequency spectrum of the target population, the model will have poor predictive power for materials with genetic backgrounds outside the training range.
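One simple way to check representativeness is to compare per-marker allele frequencies between the two populations. The sketch below is an assumption-laden illustration: the function names and the two summary statistics are choices made for this example, genotypes are assumed to be coded 0/1/2 with no missing calls, and real programs often also examine relationship matrices or principal components.

```python
import numpy as np

def allele_frequencies(Z):
    """Frequency of the counted allele per marker, from 0/1/2 genotype codes."""
    return np.asarray(Z, dtype=float).mean(axis=0) / 2.0

def compare_populations(Z_train, Z_target):
    """Summarize how well training allele frequencies match the target.

    Returns (mean_abs_diff, unseen_fraction), where unseen_fraction is the
    share of markers segregating in the target population that are fixed
    (monomorphic) in the training population.
    """
    p_train = allele_frequencies(Z_train)
    p_target = allele_frequencies(Z_target)
    mean_abs_diff = float(np.mean(np.abs(p_train - p_target)))
    target_poly = (p_target > 0) & (p_target < 1)
    train_mono = (p_train == 0) | (p_train == 1)
    n_poly = int(target_poly.sum())
    unseen = float((target_poly & train_mono).sum() / n_poly) if n_poly else 0.0
    return mean_abs_diff, unseen

# Tiny made-up example: marker 1 segregates in the target but not in training.
Z_train = np.array([[0, 2, 0], [0, 1, 0], [0, 0, 0], [0, 1, 0]])
Z_target = np.array([[1, 1, 0], [1, 1, 0]])
diff, unseen = compare_populations(Z_train, Z_target)
print(f"mean |freq difference| = {diff:.3f}, unseen segregating markers = {unseen:.0%}")
```

Markers that are fixed in the training set contribute no information to the model, so a high unseen fraction is a warning that predictions for the target material rest on little evidence.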

Training population size requirements vary by crop species, marker density, and the genetic architecture of the target traits. For highly heritable, oligogenically controlled traits, smaller training populations may suffice. For complex, low-heritability traits influenced by many loci of small effect, training sets of thousands or even tens of thousands of individuals may be needed to achieve acceptable prediction accuracy. Cross-validation, using k-fold or leave-one-out designs, should be used to assess model accuracy before genomic estimated breeding values (GEBVs) are used for selection decisions.
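A k-fold check of this kind can be sketched as follows. The ridge predictor, function names, simulated data, and shrinkage value are all assumptions for illustration. Predictions are pooled across folds before computing a single correlation, so that setting k equal to the number of lines gives a leave-one-out design.

```python
import numpy as np

def ridge_predict(Z_train, y_train, Z_test, lam):
    """Fit ridge regression on training lines and predict test lines."""
    col_means = Z_train.mean(axis=0)
    Zc = Z_train - col_means
    mu = y_train.mean()
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(len(y_train)), y_train - mu)
    return mu + (Z_test - col_means) @ (Zc.T @ alpha)

def kfold_predictive_ability(Z, y, lam, k=5, seed=0):
    """Correlation between held-out predictions and observed phenotypes.

    Each line is predicted exactly once by a model that never saw its
    phenotype; k == len(y) corresponds to leave-one-out.
    """
    n = len(y)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of lines")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    y_hat = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        y_hat[test_idx] = ridge_predict(Z[train_mask], y[train_mask], Z[test_idx], lam)
    return float(np.corrcoef(y_hat, y)[0, 1])

# Simulated phenotyped training population (heritability near 0.5).
rng = np.random.default_rng(1)
n_lines, n_markers = 300, 200
Z = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)
g = Z @ rng.normal(0.0, 0.05, size=n_markers)
y = g + rng.normal(0.0, g.std(), size=n_lines)
lam = g.var() / 0.05**2        # known only because the data are simulated

ability = kfold_predictive_ability(Z, y, lam, k=5)
print(f"5-fold predictive ability: {ability:.2f}")
```

Because the correlation is taken against observed phenotypes rather than true genetic values, it understates accuracy; a common rough correction is to divide by the square root of the trait heritability.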

What Computational Infrastructure Does Genomic Selection Require?

Genomic selection models involve the analysis of datasets that may include hundreds of thousands of markers scored across thousands of individuals. Several statistical models are in common use, including ridge regression best linear unbiased prediction (rrBLUP), Bayesian approaches, and machine learning methods including neural networks. Each has different computational demands and statistical properties that affect its suitability for different genetic architectures and population structures.
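For reference, the ridge regression BLUP model is usually written as below. The notation is the conventional textbook form and is an assumption here, since the article does not define symbols:

```latex
\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{I}\sigma_u^2),
\qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)

\hat{\mathbf{u}} = \mathbf{Z}^{\top}\left(\mathbf{Z}\mathbf{Z}^{\top} + \lambda\mathbf{I}\right)^{-1}\left(\mathbf{y} - \mathbf{1}\hat{\mu}\right),
\qquad \lambda = \frac{\sigma_e^2}{\sigma_u^2}
```

Here y holds the phenotypes, Z the marker genotypes, u the marker effects, and e the residuals. The matrix being inverted has one row and column per individual, which is why the number of phenotyped lines, more than the number of markers, drives the cost of this model. Bayesian approaches generally keep the same linear model but replace the common-variance normal prior on u with priors that allow some markers much larger effects, typically at the price of iterative sampling.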

Cloud computing has made the computational infrastructure for genomic selection accessible to organizations that lack dedicated high-performance computing resources. Modern breeding data management platforms increasingly offer integrated genomic analysis modules that can submit computation jobs to cloud infrastructure, returning results directly to the breeding database without requiring users to manage data transfer between systems. This integration is critical for making genomic selection operationally practical rather than a specialized research activity requiring dedicated bioinformatics support.

How Can Phenotypic and Genomic Data Be Integrated Effectively?

The full value of genomic selection is realized when genomic predictions are combined with phenotypic information from field trials and greenhouse experiments in a unified analytical framework. Multi-trait models that simultaneously analyze correlated phenotypic traits can increase prediction accuracy for difficult-to-measure traits by leveraging genetic correlations with more easily observed traits. Integration of environmental covariates into genomic prediction models can improve accuracy for genotype-by-environment interaction effects, supporting targeted recommendations for specific deployment environments.

The practical implementation of these integrated analyses requires a data infrastructure that maintains all relevant genomic, phenotypic, and environmental information in a consistent, linked format. Platforms that store genotypic and phenotypic data in a unified database eliminate the data management overhead of maintaining separate systems and enable the kind of rapid, iterative model updating that maximizes the genetic gain per unit time and investment. The CGIAR Excellence in Breeding Platform provides open-access resources and decision tools for genomic selection implementation that are widely used by both public and private breeding programs.

How Does Phenome Networks Enable Genomic Selection Workflows?

The genomics module within the PhenomeOne platform, developed by Phenome Networks, is designed to support the full workflow of genomic selection implementation within commercial breeding programs. The module enables the management of genotypic data alongside phenotypic records in a single platform, supporting QTL mapping, genome-wide association studies, and genomic selection analyses. The PhenoGene module extends these capabilities with an integrated breeding optimizer that calculates optimal crossing schemes to combine favorable genetic markers from multiple parent lines into a target ideotype in the shortest and most cost-effective way. This integration between genomic data management and decision-support tools represents the practical realization of genomic selection in operational breeding programs.

By connecting genomic characterization with pedigree management, field trial data, and breeding decision tools within a single operational platform, PhenomeOne eliminates the technical barriers that have historically limited genomic selection to programs with dedicated bioinformatics capacity. The result is a more accessible pathway to genomic acceleration for a broader range of commercial breeding organizations.

Genomic Selection as the New Standard for Commercial Plant Breeding

The trajectory of genomic selection adoption in commercial plant breeding is clear: this technology is transitioning from competitive advantage to operational baseline across major crop species and well-capitalized breeding programs. The organizations that build the data infrastructure, training populations, and analytical workflows to implement genomic selection effectively today will sustain significant advantages in genetic gain per year, the fundamental metric of breeding program productivity, through the next decade and beyond. For breeding programs still relying primarily on phenotypic selection, the question in 2025 is not whether to adopt genomic approaches, but how quickly the transition can be completed without disrupting ongoing variety development pipelines.

For more information, visit https://phenome-networks.com/