Computational metabolomics

Molecules are everywhere. Within this research theme, we focus on the diverse structures and functions of small and specialised molecules.
Specialised small molecules play crucial roles in nature. They act as signals between species – and even between different kingdoms of life – or serve as chemical defences against competing organisms. They also influence the growth, resilience and nutritional quality of crops. If we could identify all these small molecules and understand what they do, it would greatly advance fields such as antibiotic discovery and sustainable crop production.
Focus
Specialised small molecules rarely appear on their own. They are usually part of complex mixtures containing metabolites from many different organisms and from the environment. To understand such mixtures, we need methods that can quickly and accurately capture the full small-molecule profile.
In our work, we focus on three main routes to improve our understanding of complex metabolite mixtures:
• strengthening our ability to annotate metabolites,
• performing chemically informed comparative metabolomics, and
• linking metabolite profiles to biological functions and to genomic information.
Metabolomics aims to map all small molecules in an organism and is therefore ideally positioned to explore the chemical diversity of specialised molecules. Modern analytical technologies, especially advanced mass spectrometers, now provide extremely information-rich metabolic profiles. Yet these profiles consist of mass spectra rather than molecular structures. Existing spectral libraries are growing, but they still cover only a small fraction of known natural products. As a result, only a limited portion of experimental data – typically between 2 and 25 per cent – can currently be annotated through library matching. Most metabolomics data therefore still contain hidden, unexplored chemical information.
Our ambition
Our ambition is to close the gap between the data we can generate and the biochemical insights we can extract. We want to learn how to interpret spectral data in a way that reveals both molecular structure and biological function.
To achieve this, we aim to determine:
• which types of structural information are embedded in metabolomics data,
• how we can detect entirely new chemistry in spectral profiles, and
• how we can identify meaningful groups of metabolites within complex mixtures.
Our research agenda is therefore centred on developing algorithms and computational models that improve structural annotation and enable direct biochemical interpretation of metabolomics profiles. We build on concepts from both natural language processing (NLP) and genomics. For example, we have shown that topic-modelling algorithms from NLP can help detect molecular substructures in metabolomics data, and we are pioneering the use of word-embedding approaches to enhance metabolomics analysis.
At the same time, genomics has produced a vast ecosystem of analytical tools over the past decade. By adapting metabolomics data so that these tools can process them, we can reuse powerful genomic methods and benefit from the rapidly growing amount of curated sample information. Moreover, linking genomic and metabolomic results accelerates natural-product discovery: it allows us to connect biosynthetic gene clusters to the molecules they produce and to exchange structural information between the two data types.
We apply these approaches primarily to the plant root microbiome and the human food metabolome. Both contain highly complex metabolite mixtures that still hold many unknown molecules. By unravelling them, we aim to gain deeper insight into the molecular mechanisms that regulate growth, development and health.