A Genome-Wide Association Analysis of 15,000 Soybean Germplasm Accessions to Discover Novel Genes for Seed Yield, Protein, and Oil

Impact on Soybean Production and Value

First, the eventual discovery of the gene affecting protein/oil content could allow breeders to manipulate protein and oil without affecting yield. Second, knowledge of how genetic variation is distributed within the collection will allow breeders to make more informed choices when introducing new genetic variation into their programs, which is required for genetic gain in yield and other traits. Genetic variability is limited in soybean, so anything that helps facilitate the introduction of new genetic variation could have a large impact.

Aaron Lorenz (2014)

Key Terminology

A single nucleotide polymorphism (SNP) is a common variation at a single nucleotide position in the DNA sequence among members of a species. SNPs occur throughout the genome and can be used to map genes.

Genome refers to the entirety of the genetic material of an individual.

Accession refers to an individual plant or seed sample held in a genebank.

Germplasm refers to the total genetic material available for study, mainly the genetic material represented in the available accessions.

Quantitative traits are traits, or phenotypes, that are influenced by two or more genes and the environment.

Quantitative trait loci (QTL) are regions of DNA that contain, or are closely linked to, genes of quantitative traits.

Study Objectives

(1) Identify specific SNP-marker-flanked genomic regions that contain genes governing seed yield, protein, oil, and seed size in about 15,000 soybean accessions that have, to date, been measured (phenotyped) for yield, protein, and oil.

(2) Elucidate the genetic relationships among seed yield, protein, and oil concentration, specifically whether these accessions carry genes that govern one trait without a concurrent effect on either or both of the other two.

(3) Determine gene effects for each of the three traits relative to their distribution in this population of 15,000 germplasm lines.

(4) Provide the new gene discovery information to U.S. soybean breeders for use in their variety development programs.
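To make objectives (2) and (3) concrete, the sketch below estimates one SNP's effect on each trait with a simple least-squares marker regression. It is an illustration under stated assumptions, not the analysis used in this project: the genotypes and trait values are hypothetical, accessions are assumed to be homozygous inbred lines coded 0 (reference allele) or 1 (alternate allele), and a real genome-wide scan of this collection would also need to account for population structure and relatedness, for example with a mixed model.

```python
from statistics import mean


def snp_effect(genotypes, phenotypes):
    """Least-squares slope of a trait on SNP genotype.

    Genotypes are assumed coded 0 (reference homozygote) or 1 (alternate
    homozygote); with this coding the slope equals the difference between
    the two genotype-class means. Population structure is NOT modeled here.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")
    g_bar = mean(genotypes)
    y_bar = mean(phenotypes)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no genotypic variation to test")
    return sxy / sxx


# Hypothetical data for six accessions at one SNP
genotype = [0, 0, 0, 1, 1, 1]
traits = {
    "protein": [40.0, 40.5, 40.2, 41.8, 42.0, 41.9],
    "oil": [20.0, 19.8, 20.1, 19.0, 19.2, 18.9],
    "yield": [3.1, 3.3, 3.2, 3.2, 3.1, 3.3],
}

for trait, values in traits.items():
    print(f"{trait}: estimated SNP effect = {snp_effect(genotype, values):+.3f}")
```

In this toy example the SNP raises protein and lowers oil while leaving yield essentially unchanged, which is the kind of pattern objective (2) looks for: a gene acting on some traits but not on the others.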


Using data on more than 12,000 soybean accessions, we identified a major QTL on chromosome 20 that affects both protein and oil concentration. Although this QTL had been identified previously, our larger population allowed us to refine its position. We also characterized the population structure of the entire collection, so we now have a better understanding of how the genetic variation in the USDA soybean collection is distributed across countries of origin and maturity groups.
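One simple way to summarize how variation at a single SNP is partitioned among groups, such as countries of origin or maturity groups, is Nei's G_ST, which compares the average gene diversity within groups to the diversity implied by the pooled allele frequency. The sketch below is our illustration, not necessarily the method behind the population-structure analysis described above; the maturity-group labels and genotypes are hypothetical, and accessions are assumed to be homozygous inbred lines coded 0/1. Genome-wide, such statistics would be combined across many SNPs, and model-based clustering or principal component analysis are common alternatives.

```python
from collections import defaultdict


def group_allele_freqs(genotypes, groups):
    """Alternate-allele frequency of one SNP within each group.

    Genotypes are assumed coded 0/1 for homozygous inbred accessions.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [alternate count, total]
    for g, grp in zip(genotypes, groups):
        counts[grp][0] += g
        counts[grp][1] += 1
    return {grp: alt / n for grp, (alt, n) in counts.items()}


def nei_gst(freqs):
    """Nei's G_ST for one biallelic SNP, weighting all groups equally."""
    ps = list(freqs.values())
    h_s = sum(2 * p * (1 - p) for p in ps) / len(ps)  # mean within-group diversity
    p_bar = sum(ps) / len(ps)
    h_t = 2 * p_bar * (1 - p_bar)  # diversity of the pooled frequency
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical accessions labeled by maturity group
genotypes = [0, 0, 0, 1, 1, 1, 1, 0]
groups = ["MG_I"] * 4 + ["MG_IV"] * 4
freqs = group_allele_freqs(genotypes, groups)
print(freqs, round(nei_gst(freqs), 3))
```

Here the alternate allele is at 25% frequency in one group and 75% in the other, giving G_ST = 0.25; a value near 0 would mean the groups carry nearly the same allele frequencies at this SNP.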

Additional Information

A Population Structure and Genome-Wide Association Analysis on the USDA Soybean Germplasm Collection