Making the Most of 'Missingness'
When you start a jigsaw puzzle, all of the pieces are at your disposal, and the object is to connect them to reveal the complete picture. But what happens when a piece is missing? Most would toss the table over and lament the hours lost.
This is simply not an option for critical illnesses like sepsis, a severe inflammatory response to an invading pathogen, and acute respiratory distress syndrome (ARDS), a high-mortality complication of sepsis. Neither has a simple treatment, and both time and lives are at stake. Of the patients who die in the hospital, 35% have sepsis. The mortality of ARDS patients is 40%.
A multidisciplinary MCIRCC research team, including Christopher E. Gillies, PhD, Emergency Medicine; Kevin Ward, MD, Emergency Medicine, Biomedical Engineering; Kathleen Stringer, PharmD, Pharmacy; Xudong Fan, PhD, Biomedical Engineering; Ruchi Sharma, Biomedical Engineering; and Theodore Jennaro, Pharmacy and Bioinformatics, is taking a different approach.
The “Probabilistic Modeling of Missing Data to Improve Predictions Using Metabolomics Data” project was funded for the first time in December of 2019, receiving $90,000 from the Michigan Institute for Data Science (MIDAS) in its first round of Propelling Original Data Science (PODS) Grants.
The funding is now being used to develop a Bayesian methodology that correctly models missing-data patterns in metabolomics data within a left-censoring framework, in which values fall below detection limits.
Metabolomics is highly relevant to both sepsis and ARDS. Everyone has a set of metabolites present within their cells. Metabolites are formed in metabolism, the process by which food and drink are converted into energy. Examples include lactate, glucose, and creatinine. These metabolites provide insight into the patient’s current physiological state, and metabolomics is a way of measuring these small compounds in a biosample such as breath or blood. These samples can provide a more precise diagnosis, improved risk stratification, improved understanding of underlying mechanistic biology, and an informed rationale for drug discovery.
“One of the big challenges with metabolomics is that there is a lot of missingness, and we need a way to leverage this data in the presence of all the missing data,” shared Gillies, co-Principal Investigator and MCIRCC Assistant Research Scientist. “The belief is that these metabolomics data sets could be used for predictive modeling, and to better characterize sepsis and ARDS.”
The ‘missingness’ occurs in a pattern known as left censoring. This means that there is a metabolite concentration threshold above which we can measure accurately, but below which we cannot. How does this left-censored ‘missingness’ occur? Either highly abundant metabolites obscure the quantification of less abundant metabolites, or the less abundant metabolites are simply undetectable.
Gillies explained that the whole idea is to write an algorithm that makes use of the missing data rather than discarding it. When the missing data is ignored, the result is a biased picture of which metabolites are associated with an outcome of interest, such as a patient dying. Knowing which metabolites are actually associated is imperative.
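To see why ignoring left-censored values introduces bias, consider a small simulation. This is only an illustrative sketch, not the team's algorithm: the normal distribution of log-concentrations, the sample size, and the detection limit are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical log-concentrations of one metabolite (arbitrary units).
# The normal distribution here is an assumption made for illustration.
true_values = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Assumed detection limit: anything below it is reported as missing.
detection_limit = -0.5
observed = [v for v in true_values if v >= detection_limit]

missing_rate = 1 - len(observed) / len(true_values)
mean_all = statistics.mean(true_values)
mean_observed_only = statistics.mean(observed)

print(f"missing rate: {missing_rate:.1%}")
print(f"mean of all values: {mean_all:.3f}")
print(f"mean after dropping censored values: {mean_observed_only:.3f}")
```

Because every dropped value lies below every retained value, the average of what remains is always higher than the true average. Any analysis built on the retained values alone inherits that upward shift.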
“For example,” said Gillies, “let’s take the metabolite lactate. Because of the measurement technology, you may be unable to measure the lactate properly in a biospecimen. But you could maybe use all of the other information you have to predict what your lactate would have been if you could have measured it. It’s using that type of approach to make a better prediction about sepsis death and ARDS detection.”
Data previously gathered by two co-Principal Investigators are especially helpful in moving this project forward. Dr. Fan’s micro-gas chromatography technology, a breathalyzer created to detect volatile organic compounds, showed that in breath sampled from ARDS patients, 56.9% of detected compounds had a missing rate greater than 30%. In Dr. Stringer’s blood metabolomics data from sepsis patients, quantified using nuclear magnetic resonance spectroscopy, 38.6% of metabolites were excluded. This ‘missingness’ may influence conclusions, and thus the course of action.
Currently, the common approach is to substitute half of the minimum known value for each missing measurement, which yields a biased and likely inaccurate result because it does not account for the uncertainty in the missing data. The MCIRCC research team’s solution involves making a better prediction by modeling that uncertainty in the likelihood function using the cumulative distribution function, or as Gillies said, “a bunch of math,” recognizing that the metabolite could be any value below the censoring threshold.
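The role of the cumulative distribution function can be sketched in a few lines. In a left-censored likelihood, each measured value contributes its probability density, while each missing value contributes the probability of falling anywhere below the threshold, which is the cumulative distribution function evaluated at that threshold. The example below is a simplified illustration and not the team's Bayesian method: it assumes normally distributed log-concentrations, a single known detection limit, simulated data, and a plain grid search for the best-fitting parameters. All function and variable names are hypothetical.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def normal_cdf(x, mu, sigma):
    """Probability that a normal variable falls at or below x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def censored_loglik(observed, n_censored, limit, mu, sigma):
    """Log-likelihood of left-censored normal data.

    Measured values use the density; each censored value uses the
    probability of lying anywhere below the detection limit.
    """
    ll = sum(normal_logpdf(x, mu, sigma) for x in observed)
    ll += n_censored * math.log(normal_cdf(limit, mu, sigma))
    return ll

# Simulated log-concentrations (assumed true mean 0, standard deviation 1).
random.seed(1)
limit = -0.5  # assumed detection limit on the log scale
draws = [random.gauss(0.0, 1.0) for _ in range(500)]
observed = [x for x in draws if x >= limit]
n_censored = len(draws) - len(observed)

# Coarse grid search for the parameters that maximize the likelihood.
grid_mu = [i / 20 for i in range(-40, 41)]       # -2.0 to 2.0
grid_sigma = [0.5 + i / 20 for i in range(31)]   # 0.5 to 2.0
_, mu_hat, sigma_hat = max(
    (censored_loglik(observed, n_censored, limit, m, s), m, s)
    for m in grid_mu
    for s in grid_sigma
)

print(f"censored values: {n_censored} of {len(draws)}")
print(f"estimated mean: {mu_hat:.2f}, estimated sd: {sigma_hat:.2f}")
```

The censored measurements are never replaced by a single guessed number. They enter the fit only through the statement that they lie below the limit, which is what allows the uncertainty about their true values to be carried into the estimate.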
This new approach could help identify a subgroup of patients who respond to treatments based on their metabolic profile, thereby advancing precision medicine.
Depending on the findings, the ultimate goal is to use this methodology and unique data to build a blood biomarker device that can identify and measure the metabolites specific to a patient. This would inform how to move forward with individualized treatment.
For now, the team’s table stands firm as they choose to work with the puzzle’s missing pieces.