Organization overview
The reference pathways and variants, the species-specific pathway assemblies, and the global metabolic models produced by Microme are different facets of a coherent resource that will provide unprecedented functional, comparative and evolutionary perspective on the microbial world. This resource will be made accessible to the community at large through a single web portal.
This portal will support access to, querying of and analysis of pathways and models through several modes of use, corresponding to the typical use cases of microbiologists and computational biologists described in the figure below.
Microme will provide a robust, scalable infrastructure integrating a curated repository of reference pathways with genome-scale constraint-based models and downstream applications in comparative genomics and biotechnology. Crucially, the output of these downstream applications will be fed back into the curation process, enhancing the quality of the reference pathways and models. Although many prior endeavours have already been undertaken in all of these areas, the absence of a well-engineered, coherent framework to support their interaction has proven a fundamental obstacle to the development of high-throughput, pathway-aware systems for automatic genome annotation. By linking these components coherently, Microme will advance beyond the state of the art at a time when the explosion of genome sequencing makes such progress a necessity if the growing wealth of data it produces is to be understood and exploited.
State of the art:
A number of established databases already represent information about biological pathways. The Kyoto Encyclopedia of Genes and Genomes (KEGG) has an excellent collection of reactions, but does not have a well-defined pathway model.
EcoCyc is widely considered to hold a definitive collection of pathways for Escherichia coli, and these pathways have additionally been projected to other species in EcoCyc’s sister database MetaCyc. The MetaCyc collection includes an extensive set of pathways from all species, together with a limited notion of variant; however, it is only available under a restrictive license agreement, and its links to external resources are sometimes weak. One project that has attracted a great deal of interest in the community is the SEED, whose position paper proposed to significantly boost annotation throughput and quality by using the notions of metabolic subsystems and variants (crucial to permitting the computational exploitation of pathway data), and by annotating subsystem by subsystem across species rather than genome by genome. These notions, however, are neither formally defined nor rigorously enforced on the SEED’s curators, which limits the current usefulness of its curated dataset. IMG is linked to the JGI sequencing program, but its notion of pathway is again too limited to allow the full range of downstream computational analyses. Reactome is a rich resource of information about metabolic and signaling pathways, is available without license and uses an open-source software infrastructure, but it is focused solely on human biology.
Genome-scale metabolic models have been reconstructed for nearly twenty organisms to date, mostly through painstaking curation efforts. Various schemes have been proposed to speed up the reconstruction process, but no credible systematic and scalable strategy has emerged so far (Notebaart et al., 2007; DeJongh et al., 2007). Although several publications have argued that genome-scale constraint-based metabolic models are a natural continuation of genome annotation and the foundation of choice for integrating experimental data and extending towards more detailed models of cellular processes, no systematic pipeline for metabolic model reconstruction exists yet. The only metabolic model database to date is BIGG (UCSD), which (currently through restricted access) provides simple search and query functions, with no analysis layer, for only a limited number of microorganisms.
Existing academic tools that handle genome-scale metabolic models (e.g., FluxAnalyzer (Klamt et al., PMID: 12538248) and the COBRA toolbox (Becker et al., PMID: 17406635)) typically focus on analysing already reconstructed models rather than on the reconstruction process itself. Algorithms for metabolic network completion, or gap-filling, typically explore a very large space of possible networks following a given heuristic, but do not take advantage of the pathways known from other species to structure and reduce that space. Published methods dedicated to refining models with experimental data are likewise typically blind to the existence of models or pathways in other species.
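To make the contrast with unstructured gap-filling concrete, the following Python sketch shows one way known pathway variants could restrict the candidate space: instead of searching over all possible reactions, only the missing reactions of reference variants that are already partly present in a draft model are proposed. Representing pathways and models as plain sets of reaction identifiers, the function name `gapfill_candidates` and the `min_coverage` threshold are hypothetical simplifications, not part of any existing tool.

```python
from typing import Dict, List, Set, Tuple


def gapfill_candidates(
    draft: Set[str],
    variants: Dict[str, Set[str]],
    min_coverage: float = 0.5,
) -> List[Tuple[str, float, Set[str]]]:
    """Propose gap-filling reactions taken only from partially present pathway variants.

    draft: reaction identifiers in a draft model (hypothetical representation).
    variants: reference pathway variant id -> set of its reaction identifiers.
    Returns (variant id, coverage, missing reactions) for every variant whose
    coverage by the draft is at least min_coverage but not yet complete.
    """
    results = []
    for name, reactions in variants.items():
        if not reactions:  # skip empty variants to avoid dividing by zero
            continue
        present = draft & reactions
        coverage = len(present) / len(reactions)
        if min_coverage <= coverage < 1.0:
            results.append((name, coverage, reactions - present))
    # Most nearly complete variants first; ties broken by variant id.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Illustrative, made-up identifiers.
draft_model = {"R1", "R2", "R3", "R7"}
reference_variants = {
    "pathwayA_v1": {"R1", "R2", "R3", "R4"},
    "pathwayA_v2": {"R1", "R4", "R5", "R6"},
    "pathwayB_v1": {"R7", "R8"},
}
for name, coverage, missing in gapfill_candidates(draft_model, reference_variants):
    print(name, coverage, sorted(missing))
# pathwayA_v1 0.75 ['R4']
# pathwayB_v1 0.5 ['R8']
```

Only R4 and R8 are proposed; the weakly supported variant pathwayA_v2 contributes no candidates, which illustrates the kind of search-space reduction that pathway knowledge could provide.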
Baseline:
The databases and software tools mentioned above constitute the baseline. None of these resources possesses the full combination of qualities necessary to facilitate the large-scale automated reconstruction of microbial metabolism.
Advances over the state of the art: Microme will deliver the following improvements over the current baseline:
- Establishment of a strong European pathway resource.
- Engagement with the European research, genome sequencing, and bioinformatics communities for pathway annotation and quality control.
- Development of an open-source software suite (as an extension of the existing Reactome software) for re-use by the community.
- Direct application of pathway knowledge in the annotation of novel genome sequence.
- Regular cycles of data updates and tight integration with leading public bioinformatics repositories.
- A data model capable of capturing information about the taxonomic spread of pathway variants, enabling the effective exploitation of known pathway information in automatic annotation.
- Presentation of an integrated resource containing pathway and stoichiometric model data.
Microme will be the first major European pathway database with extensive coverage of microbial diversity.
There is currently no world-class European resource in the field of microbial pathway databases; Microme will fill this gap. Moreover, only one of the key non-EU pathway resources (BioCyc) focuses on bacteria, and BioCyc has several limitations delineated above, e.g., the lack of an efficient projection/curation scheme and the lack of support for models.
Microme will be completely open source. Microme’s infrastructure will be developed as an extension of the open source Reactome software, and both the data and software will be freely distributed without restriction.
Microme will deliver a projection-curation infrastructure dedicated to the reconstruction of pathways from genomes.
Additionally, curation of pathways in Microme will be undertaken through a proven, user-friendly, distributed curation infrastructure, a key asset in enabling an efficient curation process.
Microme will offer compatibility with Reactome, a leading repository of pathways relevant to human biology. The development of Microme will ensure that the full diversity of pathway data from across the taxonomic range, from bacteria to humans, are available through a common data model and interface.
A semi-automated pipeline enabling systematic model reconstruction:
By taking into account the specific requirements of metabolic model construction, Microme will incorporate all the necessary elements (e.g., metabolites, reactions, flux balances) for the semi-automatic generation of models, the execution of consistency checks, and the design of computational methods for model refinement.
Microme will be structured around a rigorous definition and computational representation of pathways, allowing pathways to be compared and assembled, and variants of pathways to be identified. The initial cross-species curation of reference pathways will capture the invariant characteristics of biochemical pathways, enabling complex cross-species queries across hundreds of organisms. For example, metabolic distances between species can be computed on the basis of the variable components of pathway maps only, much as in multiple sequence alignment, where the variable residue positions determine sequence relationships. By providing access to a large repository of variants and by integrating model reconstruction methods that exploit it, Microme will allow the space of alternative models to be explored computationally and tested against experimental data, opening the way to comparative reconstruction.
Microme will allow comparative analysis of pathway variants across multiple species through its rigorous definition of variants. Comparative and phenotypic analyses will be performed and utilised to enhance the Microme resource. The comparative analysis of pathways and models will benefit from the incorporation of genomic context information.
Through these strategies, Microme aims to be the first large-scale reconstruction system and repository for metabolic models. Microme will systematically generate reliable drafts of metabolic models and, through its modular architecture, analysis tools and relational databases, provide a computational resource for refining models with different types of experimental data.
Microme models for a selected set of species deposited in the DSMZ collection will be tested and improved with systematically generated growth phenotype data: the corresponding strains will undergo a standardised phenotypic screen, a high-throughput substrate utilisation assay (BIOLOG). The outcome of the tests will be fed back into the models, providing a standard measure of model accuracy and a common “minimal validation standard” for benchmarking the model-generation pipeline. Inconsistencies between model predictions and experiments will provide clues that help curators fill gaps and expand the metabolic network, refine the gene-reaction correspondence, or correct the biomass function.
Microme will become the first resource worldwide to integrate access to, and generation of, pathways and models. Microme will exploit the fact that stoichiometric models are very close to “pure” metabolic networks and that both can be decomposed into pathways, to provide a unique pipeline and resource for both. We expect the reliability of the model pool and validation resources to improve significantly as the number of reconstructed models increases and as pathway curation progresses, whether performed directly within Microme or integrated from external sources.
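As an illustration of the kind of consistency check mentioned above, the following Python sketch tests whether a reaction is element-balanced, i.e. whether the stoichiometry-weighted atom counts of substrates and products cancel for every element. The dictionary-based representation of reactions and formulas and the function name `element_imbalance` are assumptions made for this sketch; charge balance and other model-level checks are not covered.

```python
from collections import defaultdict
from typing import Dict, Mapping

# Hypothetical representations: a formula maps element symbol -> atom count,
# a reaction maps metabolite id -> stoichiometric coefficient
# (negative for substrates, positive for products).
Formula = Mapping[str, int]
Reaction = Mapping[str, float]


def element_imbalance(
    reaction: Reaction, formulas: Mapping[str, Formula], tol: float = 1e-9
) -> Dict[str, float]:
    """Return the net count of each unbalanced element; an empty dict means balanced."""
    net: Dict[str, float] = defaultdict(float)
    for metabolite, coefficient in reaction.items():
        for element, atoms in formulas[metabolite].items():
            net[element] += coefficient * atoms
    # Keep only elements whose net count differs from zero.
    return {element: n for element, n in net.items() if abs(n) > tol}


formulas = {
    "co2": {"C": 1, "O": 2},
    "h2o": {"H": 2, "O": 1},
    "h2co3": {"H": 2, "C": 1, "O": 3},
}
# CO2 + H2O -> H2CO3 is balanced.
print(element_imbalance({"co2": -1, "h2o": -1, "h2co3": 1}, formulas))  # {}
# Omitting water leaves H and O unbalanced on the product side.
print(element_imbalance({"co2": -1, "h2co3": 1}, formulas))  # {'O': 1.0, 'H': 2.0}
```

Applied to every reaction of a draft model, such a check flags reactions whose formulas or stoichiometry need curator attention before the model is used for flux analysis.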
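The variant-based distance described above can be sketched in Python by analogy with a p-distance computed over the variable columns of a sequence alignment. Each species is assumed to be described by a hypothetical profile mapping pathway identifiers to the variant it carries; pathways showing the same variant in every species are discarded, and the distance between two species is the fraction of the remaining variable pathways, present in both, for which their variants differ. This particular distance and all identifiers are illustrative assumptions, not a definition adopted by Microme.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical profile: pathway id -> variant id carried by one species.
Profile = Dict[str, str]


def variable_pathways(profiles: Dict[str, Profile]) -> Set[str]:
    """Pathways for which at least two different variants occur across species."""
    observed: Dict[str, Set[str]] = {}
    for profile in profiles.values():
        for pathway, variant in profile.items():
            observed.setdefault(pathway, set()).add(variant)
    return {p for p, variants in observed.items() if len(variants) > 1}


def variant_distance(a: Profile, b: Profile, variable: Set[str]) -> float:
    """Fraction of variable pathways present in both species whose variants differ."""
    shared = [p for p in variable if p in a and p in b]
    if not shared:
        return float("nan")  # no basis for comparison
    return sum(a[p] != b[p] for p in shared) / len(shared)


def distance_matrix(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], float]:
    """Pairwise variant-based distances, computed on variable pathways only."""
    variable = variable_pathways(profiles)
    return {
        (x, y): variant_distance(profiles[x], profiles[y], variable)
        for x, y in combinations(sorted(profiles), 2)
    }


# Made-up example: P1 is invariant, P2 and P3 vary between species.
profiles = {
    "sp1": {"P1": "a", "P2": "a", "P3": "a"},
    "sp2": {"P1": "a", "P2": "a", "P3": "b"},
    "sp3": {"P1": "a", "P2": "b", "P3": "b"},
}
print(distance_matrix(profiles))
# {('sp1', 'sp2'): 0.5, ('sp1', 'sp3'): 1.0, ('sp2', 'sp3'): 0.5}
```

Had the invariant pathway P1 been counted, sp1 and sp2 would be at distance 1/3 rather than 0.5: restricting the comparison to variable pathways keeps shared invariant pathways from diluting the differences, just as conserved columns carry no information about sequence relationships.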
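One simple way to express the agreement between model predictions and a growth phenotype screen is a confusion table over the tested substrates, as in the Python sketch below. It assumes, hypothetically, that both the model predictions and the BIOLOG results have already been reduced to a growth/no-growth call per substrate; the function names and the example calls are made up. Disagreements are then grouped as curation clues: false negatives (growth observed but not predicted) suggest missing reactions, while false positives suggest content such as reactions, gene-reaction associations or biomass components that may need correction.

```python
from typing import Dict, List, Mapping


def phenotype_agreement(
    predicted: Mapping[str, bool], observed: Mapping[str, bool]
) -> Dict[str, float]:
    """Confusion counts and accuracy over substrates present in both mappings."""
    tested = set(predicted) & set(observed)
    tp = sum(1 for s in tested if predicted[s] and observed[s])
    tn = sum(1 for s in tested if not predicted[s] and not observed[s])
    fp = sum(1 for s in tested if predicted[s] and not observed[s])
    fn = sum(1 for s in tested if not predicted[s] and observed[s])
    accuracy = (tp + tn) / len(tested) if tested else float("nan")
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy}


def curation_clues(
    predicted: Mapping[str, bool], observed: Mapping[str, bool]
) -> Dict[str, List[str]]:
    """Substrates where model and experiment disagree, grouped by error type."""
    tested = sorted(set(predicted) & set(observed))
    return {
        # growth observed but not predicted: candidate gaps in the network
        "false_negatives": [s for s in tested if observed[s] and not predicted[s]],
        # growth predicted but not observed: candidate over-permissive content
        "false_positives": [s for s in tested if predicted[s] and not observed[s]],
    }


# Made-up growth calls for one strain on four substrates.
predicted = {"glucose": True, "acetate": True, "citrate": False, "xylose": False}
observed = {"glucose": True, "acetate": False, "citrate": True, "xylose": False}
print(phenotype_agreement(predicted, observed))
# {'tp': 1, 'tn': 1, 'fp': 1, 'fn': 1, 'accuracy': 0.5}
print(curation_clues(predicted, observed))
# {'false_negatives': ['citrate'], 'false_positives': ['acetate']}
```

Accuracy is only one candidate for a shared benchmark figure; computed the same way for every strain, it would allow successive versions of the model-generation pipeline to be compared on identical data.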




