Plant Bioinformatics Resources for FAIR Agricultural Data Discovery and Reuse workshop

AgBioData will be at the Plant Biology 2023 Conference in Savannah (GA)! On Sunday, August 6, at 11:30 AM ET, we will have a two-hour workshop about the digital bioinformatics resources available in plant biology. In particular, this workshop will introduce plant biologists to, and update them on, community databases, data repositories, online bioinformatics tools, and knowledge bases that provide equitable access to structured, integrated data and analytic tools.

 

Workshop description:

Title: "Plant Bioinformatics Resources for FAIR Agricultural Data Discovery and Reuse"

Location: Meeting Room 201/202
Date: Sunday, August 6, 11:30 AM
Duration: 2 hours

About

Advances in agricultural research increasingly depend on data-driven discovery, and the amount of data being generated is increasing exponentially. For this data to be maximally useful, researchers need to know what data is out there, how to use it, and how to share it back with the community. This workshop will introduce plant biologists to, and update them on, a variety of digital bioinformatics resources, such as community databases, data repositories, online bioinformatics tools, and knowledge bases that provide equitable access to structured, integrated data and analytic tools. These resources enable researchers to generate hypotheses, design experiments, and analyze data more quickly. This workshop will feature overview presentations from the AgBioData RCN and member digital resources, followed by a panel discussion on community needs, challenges, and database support for Findable, Accessible, Interoperable, and Reusable (FAIR) data compliance.

Learning Objectives:

  • Identify and use relevant digital tools and bioinformatics resources to access and analyze a variety of public genetic, genomic, and breeding data and conduct research.
  • Identify and use public databases, understand how data can be reused to make new discoveries, understand the challenges that inhibit data reuse, and learn how to help define standards for data types.
  • Understand FAIR data principles and the role of community databases in the FAIR ecosystem, know how to make their data FAIR, and find resources for educating trainees in FAIR data.

 

Workshop agenda:

11:30 AM ET  "The AgBioData Consortium and Research Coordination Network: Enhancing Agricultural Research and Discovery through Improved Data Management and Education", Annarita Marrano (Phoenix Bioinformatics)

AgBioData is a consortium of agricultural biological databases that works to consolidate standards and best practices for acquiring, displaying, and reusing genomic, genetic, and breeding (GGB) data. In 2021, we received funding from the National Science Foundation for a Research Coordination Network grant to enable Findable, Accessible, Interoperable, and Reusable (FAIR) data in agricultural research. We have four main goals: (1) develop community-based standards for GGB data sharing and management; (2) expand the AgBioData network by recruiting key stakeholders in agricultural research; (3) provide educational material to train researchers on FAIR data sharing; and (4) develop a roadmap for a sustainable GGB database ecosystem. This talk will summarize the work and achievements of the first two years of the NSF RCN grant and provide updates on the future directions of the consortium.
| Slides |

11:45 AM ET  "Breedbase, the digital ecosystem for plant breeding", Christine Nyaga (Boyce Thompson Institute, Cornell University)

Breedbase is a comprehensive open-source digital ecosystem for plant breeding that covers field trial, phenotype, accession, and genotyping data management, in conjunction with apps from the PhenoApps family. In this presentation, we will discuss some new and upcoming features in Breedbase. One recently released feature is the GPCP (genomic prediction of cross performance) tool, which complements the solGS tool that predicts the performance of lines from genotypic data. The GPCP tool predicts the best combinations of next parents from genotypic data and ranks the combinations according to the cross prediction merit. An upcoming feature will allow spatial corrections to be run for each trial and used in subsequent analyses. An overview of other features will also be given (https://breedbase.org/).
| Slides |

12:00 PM ET  "Bio-Analytic Resource for Plant Biology", Asher Pasha (University of Toronto)

The BAR is a widely used, open-access plant biology genomics database resource. The BAR has several databases, including gene expression, protein-protein interactions, and protein structures. The data in the BAR databases are provided to researchers via modern web applications such as eFP Browsers, ePlants, and ThaleMine. The BAR tool GAIA can search genes and BAR tools, and can BLAST sequences. Feedback from researchers has helped develop and improve the BAR apps.
| Slides |

12:15 PM ET  "The 2023 TAIR update: From basics to the progress with the community-developed v12 of the genome", Sabarinath Subramaniam (Phoenix Bioinformatics)

The Arabidopsis Information Resource (TAIR) is a curated database and discovery resource for the reference genome and genes of Arabidopsis thaliana. Since 1999, TAIR has provided the most comprehensive and current set of plant gene function data to the research community. This presentation will consist of a brief overview of TAIR basics as well as updates on newly added data and features, including JBrowse2 and the ongoing community-based v12 reannotation of the A. thaliana Col-0 genome. Whereas all previous versions were produced by teams specifically grant-funded for this purpose, v12 will be produced by combining the efforts and strengths of independent labs around the world to create a resource for the good of the community. The process encompasses five stages: genome assembly, automated annotation, manual review, submission to GenBank/EBI/DDBJ, and dissemination. We will update the community on progress and plans and will solicit feedback and participation from additional interested stakeholders.

12:30 PM ET  "Gramene 2023: A comparative functional genomics resource", Sunita Kumari (Cold Spring Harbor Laboratory)

The Gramene Knowledgebase (https://www.gramene.org/) is a curated, open-source, integrated data resource for comparative functional genomics in crops and model plant species. The resource is committed to open access and reproducible science based on the FAIR data principles. It hosts 128 reference genomes and four crop-specific pangenome sites on maize, rice, grapevine, and sorghum. The reference genome databases are a mirror of EMBL-EBI Ensembl Plants and are built in collaboration with Expression Atlas and UniProt. The Plant Reactome database is produced in collaboration with the NIH-funded human Reactome project. Gramene is funded by USDA ARS 8062-21000-041-00D.
| Slides |

12:45 PM ET  "SorghumBase: Public Genetic and Genomic Database for the Sorghum Community", Nicholas Gladman (USDA-ARS)

SorghumBase (https://www.sorghumbase.org) is a USDA-ARS funded resource for curated multi-omic and genetic data sets. It also hosts relevant news and events for the sorghum research community. SorghumBase works closely with the community to support stewardship of sorghum genomics data, while establishing best practices on data management and coordination with the community. We maintain a database of scientific publications, genome sequences, gene structure annotations, and comparative genomic analyses integrated with curated data from the gene Expression Atlas at EBI, the Plant Reactome pathway database, and the QTL Atlas at OZ Sorghum. The community coordination aims to improve reference structural gene annotations and develop standards for nomenclature and support through FAIR (Findable, Accessible, Interoperable, Reusable) practices with genetic variation data. We are also working with sorghum researchers, producers, and service providers to develop a marker panel for breeding and molecular biology programs. In addition, weekly news items related to sorghum include highlights on recent publications, conferences, and community events. Our outreach activities include user guides and training sessions. We welcome any and all feedback. SorghumBase offers sorghum investigators improved data collation and exposure that will aid in building an even more robust research community in support of genomics-assisted breeding. Supported by USDA-ARS #8062-21000-041-00D.
| Slides |

1:00 PM ET Panel discussion on community needs, challenges, and database support for FAIR data compliance