Reactions inferred for thousands of genomes (draft metabolic networks)
We regularly build the set of reactions whose presence can be inferred across thousands of complete bacterial and archaeal genomes. All reactions are defined in the RhEA database and are inferred through the following approaches:
A. Inference via InterPro and GO
- genome sequence and protein-coding gene annotations submitted to the International Nucleotide Sequence Database Consortium and subsequently represented in the UniProt Knowledgebase.
- InterPro annotations of the protein sequences, indicating the presence of protein domains.
- curated associations of InterPro entries and Gene Ontology (GO) terms, indicating protein functions.
- curated associations of Gene Ontology functions to enzymatic functions, as defined by the Enzyme Commission, and reactions.
B. Protein-enzyme-reaction associations directly curated in UniProtKB
C. Curated and predicted genome-protein-reaction associations imported from the Microscope platform
D. Reactions and pathways imported from EcoCyc, and projected to orthologous proteins identified by Microme through the use of the Ensembl Compara platform.
E. Protein-enzyme-reaction associations, as defined by the Enzyme Commission, predicted by PRIAM.
F. Pathways imported from MetaCyc.
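Approach A is essentially a chain of lookups: protein to InterPro entries, InterPro entries to GO terms, GO terms to reactions. The following is a minimal sketch of that chain, not the actual pipeline. All identifiers (`P1`, `IPR_A`, `GO_X`, `RHEA_1`, and so on) and the function name `infer_reactions` are hypothetical placeholders, and the three mapping tables are assumed to be already loaded as dictionaries.

```python
from collections import defaultdict

# Hypothetical toy mappings; the identifiers are placeholders, not real records.
protein_to_interpro = {
    "P1": {"IPR_A", "IPR_B"},
    "P2": {"IPR_C"},
}
interpro_to_go = {
    "IPR_A": {"GO_X"},
    "IPR_B": {"GO_Y"},
}
go_to_reactions = {
    "GO_X": {"RHEA_1", "RHEA_2"},
}


def infer_reactions(protein_to_interpro, interpro_to_go, go_to_reactions):
    """Follow protein -> InterPro -> GO -> reaction and collect reactions per protein."""
    inferred = defaultdict(set)
    for protein, entries in protein_to_interpro.items():
        for entry in entries:
            # An InterPro entry without a curated GO association contributes nothing.
            for go_term in interpro_to_go.get(entry, ()):
                # A GO term without a curated reaction association contributes nothing.
                inferred[protein] |= go_to_reactions.get(go_term, set())
    # Keep only proteins for which at least one reaction could be inferred.
    return {protein: rxns for protein, rxns in inferred.items() if rxns}


protein_reactions = infer_reactions(protein_to_interpro, interpro_to_go, go_to_reactions)
```

In this toy input only `P1` ends up with reactions, because `P2` has no InterPro entry linked to a GO term and `GO_Y` has no linked reaction. This mirrors the point that the method yields a template rather than a complete classification.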
These data are periodically loaded and combined to produce a matrix of reactions whose presence can be inferred in each annotated genome. The matrix is currently used to build draft metabolic networks and models for a selection of species. The pipeline can be rerun quickly to update the matrix with new genomes, new InterPro methods and new associations (via GO) between InterPro entries and reactions. Because InterPro classification of genomes is a computational process rather than a manual curation effort, the method scales. It does not provide a complete classification of all reactions, but it does provide a basic template of reactions that can be supplemented with more detailed knowledge of particular species.
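As an illustration of the combination step, the sketch below merges (genome, reaction) pairs from several sources into a binary presence matrix and records which sources support each cell. It is an assumption-laden toy, not the production pipeline: the source labels, genome names, reaction identifiers and the function `build_matrix` are all hypothetical.

```python
def build_matrix(sources):
    """Combine per-source (genome, reaction) pairs into a presence matrix.

    sources: dict mapping a source label to an iterable of (genome, reaction) pairs.
    Returns the sorted genome list (rows), the sorted reaction list (columns),
    the 0/1 matrix, and a dict recording which sources support each pair.
    """
    support = {}
    for source, pairs in sources.items():
        for genome, reaction in pairs:
            support.setdefault((genome, reaction), set()).add(source)

    genomes = sorted({genome for genome, _ in support})
    reactions = sorted({reaction for _, reaction in support})
    matrix = [
        [1 if (genome, reaction) in support else 0 for reaction in reactions]
        for genome in genomes
    ]
    return genomes, reactions, matrix, support


# Hypothetical input: two sources reporting overlapping associations.
sources = {
    "interpro_go": [("genome_1", "RHEA_1"), ("genome_2", "RHEA_1")],
    "uniprotkb": [("genome_1", "RHEA_1"), ("genome_1", "RHEA_2")],
}
genomes, reactions, matrix, support = build_matrix(sources)
```

Keeping the supporting sources alongside the 0/1 matrix is a design choice of this sketch: it lets a later step distinguish reactions backed by several approaches from those backed by only one.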
Genome-Reaction Matrix Browser
The inferred reactions in the genomes can be visually explored in our new prototype interface.
Summary statistics:
- Number of genomes: 5575
- Number of CDSs associated to reactions: 4249948
- Number of unique (RhEA) reactions: 2608
- Number of unique gene-reaction associations: 4889117
- Number of genome-reaction associations: 2214967
- Number of reactions per genome: avg 397.3748, max 1428
- Number of pathways per genome: avg 127.1134, max 545
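Per-genome figures of this kind can be derived from a list of genome-reaction associations. The sketch below shows one way to do so on toy data; the function name `reactions_per_genome_stats`, the dictionary keys and the identifiers are hypothetical, and the code does not attempt to reproduce the figures reported above.

```python
from collections import defaultdict
from statistics import mean


def reactions_per_genome_stats(associations):
    """Summarise an iterable of (genome, reaction) pairs; duplicate pairs count once."""
    per_genome = defaultdict(set)
    for genome, reaction in associations:
        per_genome[genome].add(reaction)

    counts = [len(rxns) for rxns in per_genome.values()]
    all_reactions = set().union(*per_genome.values()) if per_genome else set()
    return {
        "genomes": len(per_genome),
        "genome_reaction_associations": sum(counts),
        "unique_reactions": len(all_reactions),
        "avg_reactions_per_genome": mean(counts) if counts else 0.0,
        "max_reactions_per_genome": max(counts, default=0),
    }


# Hypothetical associations; the repeated ("g1", "R2") pair is counted once.
associations = [("g1", "R1"), ("g1", "R2"), ("g1", "R2"), ("g2", "R1")]
stats = reactions_per_genome_stats(associations)
```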