Guest blog

What is statistical mapping?

By Hannah Hussain

Reading Time: 6 minutes

I started my PhD in the wake of the pandemic, coming mainly from a community pharmacy background, with an MSc on top. I spent a good few months writing my proposal – but nobody really sticks to those, do they…? Things change and that's inevitable, and your supervisory team are likely to influence the path of your research. For me, my amazing and talented supervisors have a variety of incredible research strengths and skills, but it was very clear from the get-go that this was going to be a quantitative PhD. Lucky for me, my quantitative skills were exactly what I wanted to work on and develop. I saw my PhD years as an opportunity to build those skills – kind of like an extremely long training programme? However, having done little-to-no quantitative analysis before, the idea was extremely daunting! How could I conduct these analyses?! Your PhD is a piece of almost entirely independent work – how could I possibly do this?? I remember when I'd just submitted my systematic review and was dreading what was to come next. But I look back now and just think: I wish I hadn't worried quite so much!

One of the key analysis chapters for me was the one where I would be conducting “mapping”. Up until I actually did the analyses, I was in the dark about what mapping really is and what it would involve. I hope this blog can shine a light for anyone who may be feeling the same way, or anyone who wants to learn a little more about this methodological technique.

For context, my central research question relates to the gap we see between how people living with dementia (PlwD) self-report their own quality of life (QoL) and how a proxy (typically a family member or friend) reports on their behalf. We frequently see that PlwD self-report optimistically while proxies tend to under-report, providing worse scores. There is an array of reasons for this: for example, the proxy might be projecting their own burden or expectations onto the PlwD when providing these QoL reports, resulting in lower scores, while the PlwD may have shifted their expectations in line with their condition and therefore provide relatively higher scores. The theory and concepts are well explored within the field; however, when it comes to how to interpret these two distinct sets of QoL scores, the research is sparse. The result is two sets of analyses: one performed using the PlwD's reports, and one performed using the proxy reports.

When it comes to analyses such as economic evaluations, the QoL measure used is critical and carefully considered, as ultimately resource allocation decisions hinge on these metrics.

So whose data should we use? My thesis has worked to understand whether there are specific dimensions of QoL that should be answered by the PlwD themselves, and others that are perhaps better captured by a proxy. The goal was to generate a combined QoL score that considers both important sources of data in an empirically sound way. The mapping part comes in because I wanted existing and future studies to be able to derive this combined QoL score in the absence of data from either rater: mapping to the PlwD's responses using proxy data, and mapping to the proxy's responses using PlwD data.

But what actually is mapping, right…? It sounds good, but what tangibly is it?? These are questions that I had too.

Statistical mapping is essentially the application of regression models to develop algorithms that translate one measured outcome into another. So, what do we practically need to get started with a mapping study? I’ll tell you:

Mapping checklist:

  • Existing dataset(s)
  • Knowledge of the data
  • Knowledge of the theory
  • Patience

The algorithms are typically developed using clinical trial data containing the target measure – the measure we want to arrive at, aka the dependent variable – as well as any predictive measures that will be used in the regression equations as independent variables. The reason I said dataset(s) is that it is desirable to have multiple similar datasets: one that can be used to develop the mapping equation, and one that can be used for testing to see if it works.
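
To make this concrete, here's a minimal sketch in Python with statsmodels – the file names, column names and simple linear model are all made up for illustration, not taken from my thesis:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Two similar datasets: one to develop the mapping equation, one to test it.
# File and column names here are hypothetical.
development = pd.read_csv("trial_a.csv")
validation = pd.read_csv("trial_b.csv")

# 'target_score' is the measure we want to arrive at (the dependent variable);
# the right-hand side holds the predictive measures (independent variables).
mapping = smf.ols("target_score ~ proxy_score + age + sex", data=development).fit()
print(mapping.summary())

# Apply the fitted equation to the held-back dataset to see if it works.
predicted = mapping.predict(validation)
mae = (predicted - validation["target_score"]).abs().mean()
print(f"Mean absolute error on the validation dataset: {mae:.3f}")
```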

Mapping requires prior knowledge of the data, as there are established models that are best applied according to the data's distribution. For example, I used response mapping methods with ordered probit models because I wanted to predict responses for each QoL dimension. If you are aiming to predict an overall scale score, a different model would be better. It's therefore important to understand both the data you have and the goal of your mapping study.
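
For a flavour of what response mapping can look like in practice, here's a hedged sketch building on the hypothetical datasets above – one ordered probit model per QoL dimension, with illustrative dimension and predictor names:

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

development = pd.read_csv("trial_a.csv")   # as in the sketch above
validation = pd.read_csv("trial_b.csv")

# Response mapping: fit one ordered probit per QoL dimension, so each model
# predicts the probability of every response level for that dimension.
dimensions = ["mobility", "self_care", "mood"]   # hypothetical dimension names
fits = {}
for dim in dimensions:
    model = OrderedModel(
        development[dim],                        # ordinal responses, e.g. 1-4
        development[["proxy_score", "age"]],     # illustrative predictors
        distr="probit",
    )
    fits[dim] = model.fit(method="bfgs", disp=False)

# Predicted probabilities for each response level on the validation data.
probabilities = fits["mobility"].predict(validation[["proxy_score", "age"]])
print(probabilities[:5])
```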

An understanding of the theory is important, as mapping is a systematic, iterative process whereby predictor variables are omitted and brought back in. This step also requires patience. During this process, the fit of the model needs to be scrutinised via the appropriate model fit statistics, alongside an assessment of the model's predictive ability. The coefficients generated for the included independent variables need to be carefully considered: are they behaving as expected according to the theory? For example, where the CSDD (Cornell Scale for Depression in Dementia) is the target measure, we might expect a variable such as “physical health status” to have a negative impact on depression, and therefore a negative coefficient.
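
As a rough illustration of that iterative step – again with made-up variable names and a simple linear model rather than anything from my thesis – you might compare candidate specifications on a fit statistic such as AIC while checking each coefficient's sign against the theory:

```python
import pandas as pd
import statsmodels.formula.api as smf

development = pd.read_csv("trial_a.csv")   # hypothetical dataset, as above

# Candidate sets of predictors, omitted and brought back in step by step.
candidate_sets = [
    ["physical_health"],
    ["physical_health", "age"],
    ["physical_health", "age", "cognition"],
]

for predictors in candidate_sets:
    formula = "csdd_score ~ " + " + ".join(predictors)
    fit = smf.ols(formula, data=development).fit()
    # Theory check: better physical health should mean less depression,
    # so we expect a negative coefficient on physical_health.
    sign_ok = fit.params["physical_health"] < 0
    print(f"{formula}: AIC = {fit.aic:.1f}, "
          f"physical_health sign as expected: {sign_ok}")
```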

The output from mapping studies can then be used in other analyses, allowing you to estimate what the target measure would have been in studies where it is absent. However, it's important to remember that the output is NOT DATA. It is in fact a series of predictions – meaning the results will show a different distribution to actual data, with the predictions clustered around the mean.
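
A tiny simulated example makes the point – the predictions and the observed scores share a mean, but the predictions have a much narrower spread:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate an outcome with a known amount of unexplained noise.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(size=500)})
df["y"] = 2 * df["x"] + rng.normal(scale=2, size=500)

fit = smf.ols("y ~ x", data=df).fit()
predicted = fit.predict(df)

print(f"Observed:  mean = {df['y'].mean():.2f}, SD = {df['y'].std():.2f}")
print(f"Predicted: mean = {predicted.mean():.2f}, SD = {predicted.std():.2f}")
# The predicted SD is smaller because the residual variation is lost -
# something to bear in mind if downstream analyses rely on the full spread.
```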

The key thing that stuck out to me about statistical mapping is that it is, in fact, an art rather than a science. Two different researchers could take the same data and the same mapping question and produce different outputs, owing to their personal understanding and prior knowledge. There isn't really a complete right or wrong; however, each decision you make must be statistically backed up and justified. In my opinion, the best way to learn a skill like statistical mapping is by reading existing studies and by having a go! The University of Oxford has created an incredibly useful resource in its online database of mapping studies: you can search your measure or instrument of interest and it'll fast-track you to the relevant literature. I've linked the page below. Happy mapping!

Useful resources: https://www.herc.ox.ac.uk/downloads/herc-database-of-mapping-studies


Hannah Hussain

Author

Hannah Hussain is a PhD student in Health Economics at The University of Sheffield. A proud third-generation migrant and British-Asian, her career path has been ever evolving: she originally qualified as a Pharmacist in Nottingham, then moved into Health Economics in Birmingham. Her studies have opened up a world of Psychology, Mental Health and other areas of health, and through that, along with personal influences, she found her passion for dementia.

 
