Working group description and goals:
Accurate and efficient data biocuration is critical to ensuring FAIR (Findable, Accessible, Interoperable, and Reusable) data. Biocuration, however, can be time-consuming and overwhelming, especially given the rapid pace at which new data are now generated. Natural Language Processing (NLP) models can support biocurators in finding, organizing, integrating, interpreting, and validating diverse information and converting it into a structured form suitable for databases and knowledge bases. Most NLP models and applications have been developed for human data or data from other model organisms, and they may present challenges when applied to plant and livestock datasets.
This working group will:
- Define use cases for applying NLP in biocuration for AgBioData databases (e.g., key research questions).
- Identify common entities curated across AgBioData databases for NLP-driven extraction and curation.
- Summarize existing NLP models, tools, and curated training sets, and identify their limitations when applied to AgBioData-curated content.
- Recommend to the consortium strategies and next steps for addressing these limitations and advancing NLP for biocuration.
Chair: Tanya Berardini
Co-Chair: Adam Wright
Members:
- Adam Wright
- Andrew Olson
- Bob Cottingham
- Carson Andorf
- Edwin Ong Jun Kiat
- Irene Cobo Simón
- James Koltes
- Jodi Callwood
- Kapeel Chougule
- Larmande Pierre
- Laurel Cooper
- Parul Gupta
- Qi Li
- Rex Nelson
- Sook Jung
- Srikanth Kumar
- Sudhansu Dash
- Sushma Naithani
- Taner Sen
- Tanya Berardini
- Trish Whetzel
- Zhiliang Hu
- Doreen Ware
For more information, please email agbiodata@gmail.com.