February 7th 2024 Webinar

Tuesday, January 23, 2024
Alex Ignatchenko and Paul D. Thomas on TreeGrafter and the Gene Ontology Annotation (GOA) project

The Gene Ontology (GO) Annotation (GOA) project at EMBL-EBI aims to provide high-quality GO annotations for proteins in the UniProt Knowledgebase (UniProtKB), RNA molecules from RNAcentral, and protein complexes from the Complex Portal. Currently, the GOA database hosts 5 million manually curated GO annotations from over 70 research groups. This set serves as the foundation for 15 automatic GO annotation pipelines, whose output is regenerated every 2 months and is commonly referred to as Inferred from Electronic Annotation (IEA). The IEA pipelines use a range of statistical, rule-based and machine learning algorithms to extend existing GO annotation coverage. The resulting IEA set of over 1.1 billion GO annotations is subject to more than 130 checks, constraints and filters to ensure the quality of the predicted annotations. GOA data are publicly available from the GOA FTP site and from the GO annotation browser QuickGO. The GOA team is constantly looking for ways to improve both the quality of GO annotations and the coverage of gene products.
TreeGrafter is a method for predicting GO annotations based on PANTHER families/subfamilies and InterPro signatures. The project is a collaboration between PANTHER and the InterPro team at EMBL-EBI. The algorithm was published in 2019 and was incorporated into InterPro in the second half of 2023. Shortly afterwards, the TreeGrafter mappings were processed and added to the GOA database for testing. This implementation yielded about 301 million GO annotations after the GOA pipeline checks and filters. More importantly, over 200 million of these annotations are not predicted by any other IEA method. The GOA team plans to integrate the TreeGrafter GO annotation pipeline into the GOA database and release it to the public in the first half of 2024.

Recording available here!