SOYBEAN GENOMICS RESEARCH
A STRATEGIC PLAN FOR 2008 – 2012
This Report Documents a 5-Year Strategic Plan for Soybean Genomics Research. The Plan was Co-Authored by a Representative Group of 45+ Scientists Who Attended a 30-31 May 2007 Planning Meeting Held in St. Louis, Missouri. (a list of all meeting participants can be found in the appendix)
Table of Contents
Introduction
The Soybean Sequencing Effort
Main Topic A – Genome Sequence
Sub-Topic A.1 – Genome Informatics
Sub-Topic A.2 – Genome Finishing
Sub-Topic A.3 – Transformation/Transgenics (moved to D)
Sub-Topic A.4 – Genome Re-sequencing
Sub-Topic A.5 – Phaseoloid Genomics
Sub-Topic A.6 – Breeder Needs as to Soybean Sequence
Main Topic B – Gene Function
Sub-Topic B.1 – Gene Function Annotation/Informatics
Sub-Topic B.2 – Discovery via Mutagenesis
Sub-Topic B.3 – Functional Genomics Approaches
Sub-Topic B.4 – Transformation/Transgenics (moved to D)
Sub-Topic B.5 – Breeder Perspectives on Gene Function
Sub-Topic B.6 – Soybean Producer Expectations - Genomics
Main Topic C – Germplasm Genomics
Sub-Topic C.1 – Association Mapping
Sub-Topic C.2 – Track Breeding-Induced Genomic Change
Sub-Topic C.3 – Mining Yield QTLs in Exotic Germplasm
Sub-Topic C.4 – Marker-Assisted Selection Resources
Sub-Topic C.5 – Germplasm Genomics Informatics
Sub-Topic C.6 – Transformation/Transgenics (moved to D)
Main Topic D – Transformation/Transgenics
Sub-Topic D.1 – Create a Transgenic Event Repository
Sub-Topic D.2 – Create a Virtual Center for Transgenics/Transformation
Sub-Topic D.3 – Establish A Soybean Regulatory Promoter Set
Sub-Topic D.4 – Improve Soybean Transformation Efficiency
Report Writing Team
Acknowledgements
Appendix – Meeting Participants
Introduction
A Genome Strategic Planning Workshop was planned by the Soybean Genetics
Executive Committee and convened in St. Louis, MO on 30-31 May 2007. This
workshop was attended by 48 people, including many genomics, genetics, and
breeding experts drawn from a wide range of scientific disciplines. Also attending
were representatives of the United Soybean Board, the North Central Soybean Research
Program, and one Soybean Producer. Previous soybean strategic plans have dealt with
identifying tools and resources needed to prepare for the eventual sequencing of the
genome. This Workshop represented a paradigm shift for soybean genomics researchers.
The joint announcement in January of 2006 by DOE-JGI and USDA-CSREES that
sequencing the soybean genome was to be undertaken has led to an acceleration of the
attainable goals and targets in many soybean research programs. This announcement has
caused the soybean research community to rethink and reassess its strategic research
objectives given the impact that the imminent availability of genomic DNA sequence
will have on genomic research. A report to the meeting participants by Jeremy Schmutz,
Stanford, the leader of the DOE-JGI soybean sequence assembly effort (see next section),
indicated that the work was proceeding exceptionally well, despite the ancient polyploidy
of a now well-diploidized soybean. A 4X shotgun sequence of the genome has
been generated and a draft assembly has been created. This assembly is being evaluated
to determine the optimal means for obtaining the final goal of an 8X coverage. The 8X
sequence is slated to be complete by the end of 2007, with a final assembly expected to
be completed in mid-2008.
Soybean genomic sequence brings a vast amount of data for use in optimizing the rate of
scientific discovery and its translation into technological innovation in the production and
use of soybean, so the primary objective of this Meeting was to assess the current status
of soybean genomics, to identify the resources needed to take advantage of soybean
sequence data, and to lay out a strategic plan for soybean genomics from 2008 to 2012.
The Meeting Agenda was split into four half-day sessions. Three major topic areas (A -
Soybean Genome Sequencing, B - Soybean Gene Function, and C - Soybean Germplasm
Genomics) were covered in the first three sessions. In each, the topic area leader
presented a brief report of “where-do-we-stand-now” followed by a charge to the
participants to develop a “where-do-we-want-to-go” strategic plan for the given main
topic. Thereafter, the participants split into sub-topic groups for roundtable discussions
led by a person with expertise in the sub-topic. These discussions were intended to
identify strategic needs of the research community and to identify milestones needed to
achieve the objectives, with one person in each discussion asked to capture the
“discussion bullets and desired milestone dates” on a flip chart. During the last half-day
session, each sub-topic round-table leader/recorder provided an oral report to all meeting
participants and provided a written electronic report to a 5-member writing team who
stayed one additional day to assemble the sub-topic written reports into a single
document that after review/revision became this Genomics Strategy Planning Document.
Meeting participants were requested to review the 2005 Soybean Strategic Plan prior to
their arrival at this 2007 Meeting. The 2005 Plan can be found at this SoyBase web site:
http://soybase.org/SoyGenStrat2005/Soy_Genome_Strat_Plan_2005.html. As the writing
team prepared the 2007 Strategic Plan while reviewing the 2005 Strategic Plan, it became
apparent that several objectives identified in the prior plan, such as delivery of large
DNA constructs (BAC-sized) for genetic transformation, or a centralized TILLING facility,
were not deemed as high a priority in 2007 as they were in 2005. Moreover, some
targeted 2005 Plan goals have not advanced as far as the completion dates originally
proposed by the 2005 strategic plan participants called for. These included the generation of
large numbers of independent Tnt1 and Ds insertions for functional analysis of genes in
soybean. Other 2005 Plan targets included methodical characterization of abiotic and
biotic stresses using various expression, proteomic and metabolomic approaches. These
approaches have apparently not achieved the momentum that the 2005 Plan developers
thought would occur (likely due to limited funding). Still, a gratifying number of
objectives and milestones identified in the 2005 Plan have been achieved on schedule.
The number of SNPs and STSs proposed for discovery and development was exceeded.
Inbred mapping resources (RILs) have been developed. Transformation technologies
have improved and various gene knock-out systems are working. Thanks to funding by
the United Soybean Board and the National Science Foundation, physical and transcript
maps are now in stages of completion. Bioinformatic resources and staffing have more
than doubled, just in time to receive the whole genome shotgun sequence of soybean.
Given these technological developments and the diminishing cost of many genomic-
based technologies, the outlook for the next half-decade of soybean genomic research is
quite optimistic.
Unless otherwise noted in this report, the year (e.g., 2008, 2009, etc.) associated with a
goal or scheduled activity denotes that the goal or activity will be completed by
December of the indicated year. For this report, participants were asked to project goals
and activities over the next half-decade (2008 – 2012), recognizing of course, that near-
term projections were likely to be more certain than long-term ones.
The Soybean Sequencing Effort
Department of Energy Joint Genome Institute (DOE-JGI).
(http://www.jgi.doe.gov/sequencing/why/soybean.html)
Jeremy Schmutz (of JGI) provided the Meeting Participants with some statistics about the
soybean sequencing effort. The ancient tetraploid nature of the soybean genome does not
appear to be generating problems, at least based on the 4x data accumulated to date.
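As a back-of-the-envelope illustration of what moving from 4x to 8x coverage buys, the sketch below applies the idealized Lander-Waterman model, in which reads land uniformly at random and the expected fraction of bases covered at least once is 1 - e^(-c) for coverage c. This model is an assumption added here for illustration and is not part of the JGI report; it ignores repeats, cloning bias, and the duplicated regions of the soybean genome, so real gaps will be larger than it suggests.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Idealized Lander-Waterman model: reads placed uniformly at random,
    so the chance that a given base is missed is exp(-coverage).
    """
    return 1.0 - math.exp(-coverage)


# The two coverage depths named in the sequencing plan.
for depth in (4, 8):
    fraction = expected_fraction_covered(depth)
    print(f"{depth}x coverage: about {fraction:.2%} of bases expected to be sampled")
```

Under these assumptions the uncovered fraction falls from roughly 2% at 4x to well under 0.1% at 8x, which is one way to see why the deeper shotgun is worth the additional sequencing.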
Soybean Sequencing Effort Targets:
JGI Goal: A near-complete, ordered and oriented genome sequence that covers at least
98% of the euchromatic soybean sequence.
JGI Goal: 80-100 Mb of the genome finished and included in the 8x release.
Collaborative Goal: High quality automated gene annotation and public genome
browser.
DOE-JGI Soybean Sequencing Schedule to Date and Forward (subject to revision)
Date               Activity
May 07             Evaluate shotgun, set coverage and choose new clones
Jun 07 - Sep 07    QC new 8x library and sequence additional 4x
Oct 07             8x shotgun and BAC ends complete
Dec 07             Build shotgun assembly
Jan 08 - Feb 08    Order and orientate final assembly
Mar 08 - Jun 08    Final O & O assembly and Annotation
Jul 08 - Oct 08    Release Annotation Browser, manual annotation and analysis
Nov 08             Collate and work on publications
Dec 08             Submit publication(s)
Main Topic A – Genome Sequence
Sub-Topic A.1 – Genome Informatics
The Community goals of the Genome Informatics group are the establishment of
integrated data, informatics tools for end users, and cyber-infrastructure resources to
assist in: i) annotating genes and genomes, ii) merging maps, and iii) integrating the
various soybean genomic resources along with those of other plant species. One concern
that arose repeatedly is the need for long-term support for genome databases and
informatics, not only for soybean but also for other legumes and crop plants.
A.1.a - Annotation Needs.
• 2008 – Establish a soybean Informatics Steering Committee to address the
community’s current and future informatics needs.
• 2008 – Establish the International Soybean Genome Annotation Group (ISGAG),
which will serve as a community body to interface with JGI for soybean genome
annotation and the establishment of a controlled vocabulary nomenclature.
• 2008 – Establish community standards for expression, protein and metabolite
profiling platforms and data.
• 2009 – Implement a HapMap browser that spans the levels from linkage group to ORF
to SNP and so helps connect the sequence to polymorphisms for breeders. Example:
Genomic Explorer y Survey of Immune Response (GEYSIR) Software.
• 2010 – Broadly enable the ability to go from expression data to QTL data.
Example: Provide users with an informatic means of mapping microarray
expression data onto the genetic QTL data present in SoyBase.
• 2012 – Integrate genome sequence with physical and genetic maps, with the goal
of integrating functional and phenotypic data.
o Establish tools for the identification of candidate genes underlying QTLs.
o Integrate plant traits and phenotypes (e.g., digital image or measurement
data) with genetic maps and other genetic data.
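To make the candidate-gene goal concrete, here is a minimal sketch of the simplest form such a tool could take: once genes and QTLs share one coordinate system, list every gene whose span overlaps a QTL interval. The `Interval` class, the gene and QTL names, and all coordinates are hypothetical placeholders chosen for illustration; they are not SoyBase records or a SoyBase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named feature on a chromosome; coordinates are base pairs, inclusive."""
    name: str
    chromosome: int
    start: int
    end: int


def candidate_genes(qtl: Interval, genes: list[Interval]) -> list[str]:
    """Return names of genes overlapping the QTL interval on the same chromosome."""
    return [
        g.name
        for g in genes
        if g.chromosome == qtl.chromosome and g.start <= qtl.end and g.end >= qtl.start
    ]


# Hypothetical data for illustration only; not real soybean annotations.
genes = [
    Interval("geneA", 1, 1_000, 5_000),
    Interval("geneB", 1, 48_000, 52_000),
    Interval("geneC", 2, 10_000, 12_000),
]
qtl = Interval("qtl_example", 1, 40_000, 90_000)

print(candidate_genes(qtl, genes))
```

A production tool would also need to translate QTL positions from genetic (cM) to physical (bp) coordinates, which is exactly what the map-merging goals in A.1.b are meant to enable.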
A.1.b - Merge Maps (Genetic, Physical and Sequence).
• 2008 – Merge cytological, genetic and physical maps with the draft whole
genome sequence (WGS) of the soybean genome. The goal is to make the
sequence as useful as possible by integrating it with existing data sets.
• 2008 – Convert linkage group and chromosome names to a common numbering (1-20).
• 2009 to 2010 – Populate a database that overlays physical, genetic, and cytological
maps onto the draft genome sequence to a level of “75% consistency”.
A.1.c - Integrate Soybean Genomic Data with that of Related or Other Species.
Coordinate soybean genomic data with data now available in other species to identify and
confirm orthologous genes. Here are some specific goals and target dates:
• 2008 – After the release of the soybean genome sequence, generate syntenic
comparisons with the sequences of the species below to confirm gene predictions
and models and to enable functional annotation of other non-coding sequences.
o Model Species (Arabidopsis, Medicago, Lotus).
o Poplar (Populus trichocarpa – Western Black Cottonwood).
• 2010 – Obtain genome sequence of Phaseolus vulgaris – dry bean.
o Use the soybean and dry bean sequence to enable sequence transfer to the
pulse crops, e.g., Vigna radiata – mung bean and Vigna unguiculata – cowpea.
• 2012 – Begin finishing draft sequences of many other legumes to:
o Enable orthologous comparisons: make functional inferences between
related genomes.
o Enable ortholog comparisons between Galegoid and Phaseoloid legumes.
A.1.d - Genomic Database Convenient to Access by ALL USERS.
An integrated database is needed for use by all users, including breeders, geneticists,
genomicists, comparative biologists, molecular biologists, biochemists, etc. This
represents a long-term ongoing activity that will be necessary for the community to
leverage the genome sequence data for use in all scientific disciplines. Specific goals
are:
• 2008 to 2012 – Develop a soybean genomic database that has:
o The ability to navigate from maps to genes to traits.
o Different entry portals at a unified web site providing scientists of various
backgrounds a user-friendly interface that enables direct access to relevant
data (i.e., multiple ways to access and manipulate the data well).
o Expansion of the Soybean Breeders Toolbox.
o Augmented databases with phenotypic data.
o Expression data QTLs.
o Links across data and/or data types (trait to gene or gene to trait).
• 2010 – Integrate all databases, including the “Seed and Population” databases that
exist for available genetic stocks, mutants and germplasm collection, into the
soybean genomic database.
A.1.e - Transposon and Repeat Sequence Databases.
Transposons – known as McClintock’s ‘jumping genes’ – are ubiquitous in plant
genomes and confound the assembly and annotation of the genomes. Therefore, a
comprehensive database is necessary.
• 2008 – Establish a support system to expedite the creation of the transposon
database. This is an immediate need.
• 2009 – Release of an expert-curated transposon database for manual-based
genome annotation.
Note: Additional annotation/informatics bullets were developed in the B.1 and C.5
sub-topic sessions; see the Main Topic B and C sections to view these bullets.
Sub-Topic A.2 – Genome Finishing
What is meant by a “finished genome”? By the end of 2008, the genome sequence will
not be “finished” per se (i.e., as an end-to-end sequence), but it should be of high enough
quality to be sufficient for most research purposes. Still, many gaps will remain in highly
repetitive areas, centromeres, and within many scaffolds. The following are steps we feel
are necessary to make the sequence as usable as possible for soybean researchers.
A.2.a - Initial Genome Assembly.
• 2008 – 99% of all genes will be sequenced with high accuracy.
• 2008 – 20,000 full length cDNAs sequenced (will support annotation).
• 2008 – 100% of scaffolds > 100 kb ordered and oriented within pseudomolecules.
o Each scaffold > 100 kb will have at least two map-consistent markers.
• 2008 – Pseudomolecule assemblies will be publicly available.
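The scaffold-anchoring criterion above (every scaffold longer than 100 kb carries at least two map-consistent markers) lends itself to a simple automated check. The sketch below is one way to flag scaffolds that still fall short; the scaffold names, lengths, and marker counts are hypothetical, and how a marker is judged "map-consistent" is left outside the example.

```python
MIN_SCAFFOLD_BP = 100_000  # the "> 100 kb" threshold stated in the plan
MIN_MARKERS = 2            # "at least two map-consistent markers"


def scaffolds_needing_markers(scaffolds: dict[str, tuple[int, int]]) -> list[str]:
    """Return large scaffolds that lack enough map-consistent markers.

    `scaffolds` maps a scaffold name to (length in bp, marker count).
    Scaffolds at or below the size threshold are outside the goal and ignored.
    """
    return sorted(
        name
        for name, (length_bp, n_markers) in scaffolds.items()
        if length_bp > MIN_SCAFFOLD_BP and n_markers < MIN_MARKERS
    )


# Hypothetical assembly summary for illustration only.
example = {
    "scaffold_1": (2_500_000, 7),
    "scaffold_2": (150_000, 1),
    "scaffold_3": (80_000, 0),
}

print(scaffolds_needing_markers(example))
```

Two markers is the minimum needed to orient a scaffold as well as place it, which is presumably why the goal is stated that way; a single marker fixes position but not direction.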
A.2.b - Initial Annotation of Genome Sequence.
• 2009 – Annotations will be available for download and via a browser at JGI.
• 2009 to 2012 – Gene expression support will accompany annotations where
possible (cDNA, EST, homology to transcripts from other species).
A.2.c - Selective Re-Sequencing.
• 2010 – BAC libraries will be created and BAC-end sequenced to low coverage
(~5x) for ~20 diverse accessions.
• 2010 to 2012 – Targeted re-sequencing will be carried out from these accessions
for regions of interest.
• 2009 to 2012 – A deep transcript sequencing project for gene discovery and
gene-model validation (e.g., using 454 or Solexa) was suggested (but due to evolving
technologies there was less consensus on this goal than on the above two).
Sub-Topic A.3 – Transformation/Transgenics (moved to D)
Note: Bullets in this discussion section were moved to a “new” section - Main Topic D.
Sub-Topic A.4 – Genome Re-sequencing
Once the genome sequence is available, the first goal will be using it for the improvement
of soybean via development of markers and mapping of traits in order to develop the
most efficient tools for the application of marker-assisted selection in soybean breeding.
To do this, some limited ‘resequencing’ of related genomes will be necessary for marker
development and trait mapping.
A.4.a - SNP Genotyping.
• 2007 – The best current platform is the Illumina BeadStation using the GoldenGate
assay, now used by the Beltsville ARS group (Cregan and Hyten). Significantly lower
costs for high-throughput mapping are expected if other genomicists and breeders
adopt the technology.
• 2007 – A set of 1536 loci with high minor allele frequencies will be genotyped
across a core germplasm collection, and in 500 RILs of the inter-specific mating of
Williams 82 (G. max) x PI 468.916 (G. soja).
• 2008 – SNPs will be mapped using one or more of these mapping populations:
o Beltsville: RIL populations: 500 Williams 82 x PI 468.916 (G. soja), 300
Harosoy x Clark, 233 Minsoy x Noir, 233 Minsoy x Archer.
o Missouri: 1,300 Forrest x Williams 82, 600 G. soja x G. max.
o Virginia Tech: 800 PI96.983 (G. max) x Lee68, 300 PI407.162 (G. soja)
x V71-370 (G. max).
o SIU: RIL populations: 975 Resnik x Hartwig, 500 Essex x Forrest.
• 2008 – The Illumina Infinium assay for genotyping 25,000 SNPs will be ready for
association mapping, using a core collection of genotypes.
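The preference in this sub-section for loci with high minor allele frequencies (MAF) can be illustrated with a short sketch: compute the MAF of each biallelic SNP across a panel of inbred lines and keep only loci above a cutoff. The SNP names, allele calls, and the 0.2 cutoff are hypothetical values chosen for the example; the plan itself does not state a threshold.

```python
from collections import Counter


def minor_allele_frequency(alleles: str) -> float:
    """MAF of a biallelic SNP, given one allele call per inbred line.

    `alleles` is a string such as "AAGA"; missing calls are written "N"
    and ignored. Assumes at most two distinct alleles are present.
    """
    counts = Counter(a for a in alleles if a != "N")
    if len(counts) < 2:
        return 0.0  # monomorphic locus or no data: there is no minor allele
    return min(counts.values()) / sum(counts.values())


# Hypothetical calls across ten lines; not real genotype data.
calls = {
    "snp_1": "AAAAGGGGAA",  # 4 of 10 calls are the minor allele
    "snp_2": "CCCCCCCCCT",  # 1 of 10
    "snp_3": "TTTTTTTTTT",  # monomorphic
}
MAF_CUTOFF = 0.2  # illustrative threshold only

selected = [name for name, a in calls.items() if minor_allele_frequency(a) >= MAF_CUTOFF]
print(selected)
```

High-MAF loci are favored because they are more likely to be polymorphic between any two parents, so a fixed panel of 1536 loci stays informative across many different crosses.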
A.4.b - Resequencing for SNP Discovery.
• 2008 – Discover a minimum of 15,000 SNPs (one every ~50kb) – or as many as
25,000 if costs fall and technology improves.
• 2010 to 2012 – Discover 120,000 SNPs located across genome to permit a
successful haplotyping of the entire germplasm collection of 18,000+ accessions.
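The marker densities implied by these targets follow from simple arithmetic. The sketch below takes the one stated pairing (15,000 SNPs at one every ~50 kb) to infer the span being covered, then computes the mean spacing for the larger targets. It assumes evenly spaced SNPs over that same span, which is an idealization introduced here; the plan gives a spacing only for the 15,000-SNP case.

```python
def implied_span_mb(n_snps: int, spacing_kb: float) -> float:
    """Span (in Mb) covered by n_snps markers placed spacing_kb apart."""
    return n_snps * spacing_kb / 1_000


def mean_spacing_kb(span_mb: float, n_snps: int) -> float:
    """Mean distance (in kb) between markers spread evenly over span_mb."""
    return span_mb * 1_000 / n_snps


# Figures taken from the bullets above: 15,000 SNPs at ~50 kb spacing.
span = implied_span_mb(15_000, 50)
print(f"implied span: {span:.0f} Mb")

for target in (15_000, 25_000, 120_000):
    print(f"{target:>7,} SNPs -> one every ~{mean_spacing_kb(span, target):g} kb")
```

On these assumptions the 25,000-SNP stretch goal corresponds to about one SNP per 30 kb, and the 120,000-SNP goal to about one per 6 kb.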
A.4.c - Other Technologies – Simultaneous SNP Discovery and Genotyping.
• 2008 – Current re-sequencing projects will automatically identify new SSR loci,
which would allow 2,000 more SSRs to be placed on the map along with SNPs.
• 2008 – Several groups are currently working with Single Feature Polymorphisms
(SFPs), using Affymetrix data, and will use these to genotype RILs and NILs.
• 2008 – Sequenom SNP genotyping is now available for about 1000 existing SNPs,
but for any newly discovered SNPs, this platform requires the redesign of the
primers for the SNP-containing amplicons.
• 2008 – Re-sequencing via the Sanger / Solexa / 454 / ABI SOLiD platforms will
be necessary for SNP discovery in specific targeted genotypes (2-5).
• 2008 – BAC-based re-sequencing should be investigated as it provides positional
information across larger segments of the genome.
Sub-Topic A.5 – Phaseoloid Genomics
The group expressed primary interest in Phaseolus vulgaris (bean) as a diploid model
species for syntenic sequence comparisons with soybean. Phaseolus and its allies (e.g.,
Vigna species) are 2n = 22, with relatively small genomes (roughly half that of soybean),
and both Phaseolus and Vigna diverged from Glycine about 19-23 mya. Phaseolus has
much in common with Glycine (e.g., determinate nodules), and may be very useful for
the syntenic discovery of genes involved in abiotic stresses (e.g., phosphorus, drought).
These expectations for Phaseolus genomics are conditioned on its status as a sequencing
target; it is currently being considered by JGI, and is being sequenced at a more limited
level by other groups (e.g., S. Jackson lab – USA, and the V. Geffroy lab – INRA, in
collaboration with the R. Innes lab – PGRP R-gene project).
A.5.a - Genomic Research Goals for Phaseolus.
Genetic redundancy, so prevalent in soybean due to genome duplication(s), is not so
abundant in Phaseolus; therefore, genetic dissection of difficult agronomic traits may be
more efficient in Phaseolus, and the results transferred rapidly back to soybean (e.g.,
drought, rust resistance, etc.). In most cases, the future role of the soybean research
community will be indirect, primarily one of voicing support for genomics initiatives by
the bean research community for the following goals:
• 2008 – Targeted sequencing of orthologous regions involved in stress and
resistance responses; primarily sequencing of BAC libraries by various groups.
• 2008 to 2009 – Deep sampling of ESTs for key traits such as root and various
nodule developmental stages, including late stages of such development.
• 2010 – Production of a draft sequence of Phaseolus vulgaris.
• 2008 to 2012 – Integration of Phaseolus data in databases with soybean (e.g.,
through the Legume Information System).
A.5.b - Other Genera.
Other genera were discussed more briefly since these can fill the temporal gap between
Phaseolus and Glycine. A BAC library exists for one of the closest generic relatives of
Glycine, which is Teramnus (diverged 10-12 mya, close to the divergence of
homoeologous genomes in Glycine). Unfortunately, Teramnus species have relatively
large genomes (nearly the size of soybean) despite a relatively low chromosome number
(2n = 28) and are not of economic interest. Pachyrhizus (jicama) diverged from
Glycine 15-18 mya and is of some economic importance in the developing world.
Pueraria lobata (kudzu) is a weed that is closer to Glycine (13-15 mya) than
Pachyrhizus. The perennial Glycine species are of interest because of the nature of
polyploidy in Glycine as a whole, and thus for understanding the duplicated nature of the
soybean genome. Because these species diverged from soybean (and its progenitor,
G. soja) around 5 mya, they afford the opportunity to localize changes in homoeologous
regions to events shared among species and thus potentially due to the polyploid event
vs. later changes that have occurred since separation from their common ancestor.
Moreover, the perennial species, constituting the secondary germplasm pool for soybean,
represent an untapped resource for a wide range of agronomically important traits such
as drought tolerance and rust resistance. Some of these have been studied (e.g., rust
resistance in G. canescens); crosses have been made between soybean and one of the
perennial species, G. tomentella.
• 2010-2012 – Library construction and targeted sequencing of one or more
perennial Glycine species.
Sub-Topic A.6 – Breeder Needs as to Soybean Sequence
In this session, breeders addressed many items, such as genes (traits/phenotypes) in the
same linkage blocks, the need for additional genic markers, specific genes controlling
traits of interest, and alleles currently available in germplasm. Also discussed was the
need for genomics information to be integrated in a way that breeders can query it
quickly and thus better use markers to select desirable lines and cross combinations.
Additional needs were: precise map positions for the E1 through E8 maturity genes; ways
to push phenotypic information onto genomic databases; the need to map the genes for
deleterious traits (to better select for the non-deleterious alleles); a breeder-friendly SNP
detection system; and breeder-useful “quick” assays for SNP markers.
• 2008 to 2009 – Construct a 1536-SNP Oligo Pool Assay (OPA) to provide to
breeders (on a cost-recovery basis) for use in all aspects of breeding.
• 2008 to 2009 – Re-sequence 17 well-chosen genotypes for thousands of SNPs to
evaluate sequence-based genic and other diversity in soybeans.
• 2008 to 2009 – Assay 1000 elite lines for allelic composition at 10,000 SNP loci.
• 2009 to 2012 – Design highly polymorphic breeder-friendly marker assays for
those SNP loci linked to key genes/QTLs, to be used for formal MAS, or for
allele-specific frequency enrichment in progenies or populations.
• 2009 to 2012 – Begin the development of an inexpensive yet convenient
10,000-SNP assay to allow the soybean breeder to routinely examine the diversity of
each year’s selected parents and progeny.