Allard Group | Department of Biology | University of Fribourg

COMMONS Lab

COMputational Metabolomics & Open science for Natural products reSearch

The COMMONS Lab research is focused on the development of computational solutions for natural products research. Particular emphasis is placed on the development of tools for the organization, annotation, visualization and interpretation of mass spectrometry data in order to efficiently identify metabolites in complex biological matrices. Linked Open Data and semantic web technologies are explored to organize and connect publicly available and newly acquired knowledge. These results are then used to answer questions in a wide range of research topics spanning from drug discovery to chemical ecology. We are committed to improving knowledge sharing in natural products research and strive to follow the principles of Open Science and Open Notebook Science.

What are the commons?

The commons is the cultural and natural resources accessible to all members of a society, including natural materials such as air, water, and a habitable earth. These resources are held in common, not owned privately. Commons can also be understood as natural resources that groups of people (communities, user groups) manage for individual and collective benefit. Characteristically, this involves a variety of informal norms and values (social practice) employed for a governance mechanism. Commons can also be defined as a social practice of governing a resource not by state or market but by a community of users that self-governs the resource through institutions that it creates. (Source: Commons in Wikipedia)

We found COMMONS to be a well-suited acronym for COMputational Metabolomics & Open science for Natural products reSearch. The definition of the commons is aligned with our vision of Open Science and our strong interest in exploring new ways to create and share natural products research knowledge.

We break things into tiny pieces ...

In order to study the chemicals present in living organisms (plants, fungi, animals or bacteria), we use a set of tools and techniques that break these organisms down into very small parts.

We break things down at the bench. For this we use different extraction techniques to selectively access the molecular contents of these biological matrices. These techniques can be quite simple (think of the tea leaves that you add to hot water, or your morning coffee) or more elaborate. Once we have these extracts, we take advantage of chromatographic techniques such as UHPLC (Ultra-High Performance Liquid Chromatography) or GC (Gas Chromatography) to physically separate the thousands of molecules present inside. Finally, we use mass spectrometry, because it's cool and also because it is an extremely sensitive tool that lets us acquire information on the thousands of molecules we extracted in a short amount of time. Here too, we use mass spectrometers to physically break the molecules and measure the mass of all the fragments produced. This process yields a spectrum (a kind of ghost fingerprint of the structure of a molecule).

We also break things using computers. For example, we use software (such as CFM-ID, http://cfmid4.wishartlab.com/) that has been trained to predict the spectrum of a molecule from its structure. Using this approach, we have transformed large collections of natural product structures into large collections of natural product spectra (see Integration of Molecular Networking and In-Silico MS/MS Fragmentation for Natural Products Dereplication). These spectral databases are freely available online at the COMMONS Lab Zenodo repository (ISDB: In Silico Spectral Databases of Natural Products).

... and put them back together!

Most of the tools we develop and work on in the COMMONS Lab aim to link and put back together the things we previously broke (see the previous section), just like a big puzzle. Hopefully, we also learn some interesting things along the way!

Putting things together, finding patterns and making links is how humans have learned to read Nature and, more specifically, how they have historically encountered natural products of interest. It is, for example, the typical knowledge acquisition scheme in most traditional medicine systems. In the last century, we have been exploiting reductionist approaches (see the section above for an example). These approaches are very powerful but have their limits. See this paper (Pharmacognosy in the digital era: shifting to contextualized metabolomics) for our views on knowledge acquisition strategies in pharmacognosy.

Central among the goals of the COMMONS Lab is the development of novel computational strategies to efficiently organize, annotate and visualize mass spectrometric data. For this, we need to establish links. Links between spectra. Links between spectra and structures. Links between spectra, structures and bioactivities. Links between structures and their biological sources. Links ... everywhere. Putting things back into context. We believe that this is fundamental to harnessing the power of reductionist approaches. There is still much to be done in corner D of the previous figure. This is where we are currently putting our efforts.

Here are some examples of such tools and approaches:

Molecular networking (which could in fact also be called spectral networking): By comparing spectra acquired during non-targeted mass spectrometry analyses, this strategy makes it possible to build networks (or graphs) of spectrally related analytes. As the spectra reflect the structures of the analytes, what is actually established is families of (potentially) structurally related analytes. This can be done without the need for any metabolite annotation strategy. It is an extremely powerful approach to organizing the complex chemistry of natural extracts. Please refer to the seminal paper Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking and the GNPS Platform for more details.

Metabolite annotation: It is useful to organize the spectra, but it is also very important (and quite complicated) to link the spectra to the structures. This means annotating the spectra with molecular structures, a first step towards identifying the metabolites in a given extract. We have created huge spectral libraries of natural products that can be automatically compared to sets of experimental spectra (see the section "We break things into tiny pieces ..."). To improve this metabolite annotation process, we believe that meta-correction systems, which take into account multiple orthogonal sources of information, are needed. See Deep metabolome annotation in natural products research: towards a virtuous cycle in metabolite identification. We are primarily interested in the characterization of specialized metabolites. The distribution of these metabolites in the tree of life is not ubiquitous, but is in fact governed (at least in part) by the genetics of the producing organisms and thus by their phylogeny. We exploit this fact to inform the metabolite annotation process. We have shown that such a strategy significantly improves the correct annotation rate of state-of-the-art computational metabolite annotation solutions. See Taxonomically Informed Scoring Enhances Confidence in Natural Products Annotation for further details and implementation.

Open resource of natural product occurrences: As briefly discussed in the previous point, it is important for natural products chemists to have access to large and reliable resources documenting natural product occurrences (which biological organisms contain which chemical structures). Such resources are valuable not only for natural products chemists but also for any researcher dealing with living organisms and their chemistry (ecologists, biologists, etc.). Until recently, such resources were scattered, poorly formatted and poorly accessible. But that is the past! We recently launched the LOTUS Initiative with two colleagues (Adriano Rutz at the University of Geneva and Jonathan Bisson at the University of Chicago) and a team of researchers and Wikipedians. This initiative aims to propose better knowledge management solutions in natural products research. LOTUS data is hosted at Wikidata and currently represents the most comprehensive open data source of natural product occurrences. You can read the LOTUS paper for more details or directly browse https://lotus.naturalproducts.net/.
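To make the molecular networking idea above more concrete, here is a minimal Python sketch of how a network of spectrally related analytes could be built. It is a toy illustration and not the GNPS implementation: it uses a plain cosine score on binned peaks (GNPS relies on a more elaborate modified cosine), and the function names, the bin width, the 0.7 threshold and the three example spectra are all assumptions made up for this example.

```python
from itertools import combinations
from math import sqrt


def bin_spectrum(peaks, bin_width=0.1):
    """Group (m/z, intensity) peaks into m/z bins and sum their intensities."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.1):
    """Plain cosine score between two binned spectra (0 = unrelated, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, threshold=0.7, bin_width=0.1):
    """Return edges (id_a, id_b, score) for every pair scoring at or above the threshold."""
    edges = []
    for (id_a, peaks_a), (id_b, peaks_b) in combinations(sorted(spectra.items()), 2):
        score = cosine_similarity(peaks_a, peaks_b, bin_width)
        if score >= threshold:
            edges.append((id_a, id_b, score))
    return edges


# Hypothetical toy spectra: A and B share all their fragments, C shares none.
spectra = {
    "A": [(100.0, 10.0), (150.0, 5.0), (200.0, 1.0)],
    "B": [(100.0, 9.0), (150.0, 6.0), (200.0, 1.0)],
    "C": [(80.0, 3.0), (120.0, 7.0)],
}

edges = build_network(spectra)
for id_a, id_b, score in edges:
    print(f"{id_a} -- {id_b} (cosine = {score:.2f})")
```

In this toy example only A and B end up connected, which mirrors the idea that analytes with similar fragmentation spectra (and thus potentially similar structures) cluster together, while C stays on its own. No annotation is needed at any point: the network is built from the spectra alone.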

Current research projects

Most of our current research projects are embedded within the Digital Botanical Gardens Initiative (DBGI), an Open Science initiative to explore and establish robust and scalable workflows for the digitization of chemo- and biodiversity from botanical collections. The ongoing research in the DBGI can be followed at https://www.dbgi.org/dendron-dbgi/.

The DBGI is in fact designed as a pilot for a larger and more ambitious project: the Earth Metabolome Initiative (EMI), a global effort to profile the metabolic content of all currently known species on our planet.

Research outputs and material

Most of the COMMONS Lab research outputs and material (slides used for teaching, presentations, etc.) are available at the following Zenodo Community repository: https://zenodo.org/communities/commons-lab-repository

Outputs specifically related to the LOTUS Initiative are available here: https://zenodo.org/communities/the-lotus-initiative/

Outputs specifically related to the Digital Botanical Gardens Initiative are available here: https://zenodo.org/communities/dbgi-zenodo-repository/

Pierre-Marie Allard

PER 04 - 0.104
+41 26 300 8808
E-mail

Department of Biology

Chemin du Musée 10
CH-1700 Fribourg
Switzerland

Specialized metabolism: importance in chemical ecology and human health

Course material is available here

Course SBL.20004

Introduction to metabolomics: data acquisition and processing

Course material is available here