Statistical models are something you hear about from all sides in these bizarre “corona days”, and many people are working on them. But what exactly are they? Based on my expertise, I would like to explain.
What is a statistical model?
A statistical model is a set of statistical assumptions that together approximate an actual process as closely as possible, whether this is a chemical process, a production process or, currently, the spread of the SARS-CoV-2 virus.
Why do we use a statistical model?
In addition to explaining the past, a good model usually also lets you make predictions about the future course of your process, provided of course that no unexpected factors come into play.
How does a statistical model work?
The basis of a model is facts and figures; after all, we don’t say “to measure is to know” for nothing. And that’s where the problem lies in the case of Covid-19, because you need reliable data. By reliable I mean correct, of course, so that when we say “5” the true value really is 5, but also that, as far as possible, the data is always collected under the same conditions and represents the same thing. If countries test based on different criteria or do not report all deaths, you cannot make a good comparison: you are comparing apples and oranges. So if your measurement conditions change, you have to take this into account.
What is the link between a statistical model and a regression?
How you arrive at a model from that data depends on how many input and output parameters you have, but the basis is always regression. For production processes, there is often much more data available for modelling than one might think at first sight, because computers and measuring devices generate an enormous amount of it. As a result, models are becoming increasingly complex, and the traditional statistical techniques are not always sufficient.
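To make the idea of regression as the basis of a model concrete, here is a minimal sketch with one input and one output, fitted by ordinary least squares. The data is synthetic and the names (`temperature`, `yield_pct`, `predict`) are hypothetical, chosen only for illustration; a real process would need its own data and checks.

```python
import numpy as np

# Synthetic, illustrative data (an assumption, not real measurements):
# one input (temperature) and one output (yield) with random noise.
rng = np.random.default_rng(0)
temperature = np.linspace(20.0, 80.0, 30)
yield_pct = 1.5 * temperature + 10.0 + rng.normal(0.0, 2.0, size=30)

# Design matrix: a column of ones for the intercept, then the input.
X = np.column_stack([np.ones_like(temperature), temperature])

# Ordinary least squares: find the coefficients that minimise the
# sum of squared differences between model and measurements.
coef, *_ = np.linalg.lstsq(X, yield_pct, rcond=None)
intercept, slope = coef


def predict(t):
    """Predict the output for a new input value using the fitted line."""
    return intercept + slope * t


print(f"fitted model: yield = {intercept:.2f} + {slope:.3f} * temperature")
print(f"prediction at 90 degrees: {predict(90.0):.1f}")
```

The fitted line explains the measurements already taken, and `predict` extends it to a new input. The prediction at 90 lies outside the measured range, which is exactly where unexpected factors can make a model unreliable.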
When do we use additional techniques?
In cases where we have many variables, we need multivariate techniques. A common approach is to generate artificial variables (principal components) as linear combinations of your actual variables, and then to construct a model from those components via regression.
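The two steps described above, building principal components and then regressing on them, can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the six "measured" variables are generated from two hidden factors, and the number of components `k = 2` is picked by hand rather than by a formal criterion.

```python
import numpy as np

# Synthetic, illustrative data (an assumption): two hidden factors
# drive six correlated measured variables and one output.
rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=(n, 2))
loadings = np.array([
    [1.0, 0.8, 0.6, 0.0, 0.2, 0.5],
    [0.0, 0.3, 0.5, 1.0, 0.9, 0.4],
])
X = latent @ loadings + 0.05 * rng.normal(size=(n, 6))
y = 3.0 * latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=n)

# Step 1: centre the data and find the principal components via SVD.
X_mean = X.mean(axis=0)
y_mean = y.mean()
Xc = X - X_mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Share of the total variance captured by each component.
explained = S**2 / np.sum(S**2)

# Keep the first k components; each score column is a linear
# combination of the original six variables.
k = 2
scores = Xc @ Vt[:k].T

# Step 2: ordinary least-squares regression of the output on the scores.
gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)


def predict(X_new):
    """Project new data onto the kept components, then apply the regression."""
    return y_mean + ((X_new - X_mean) @ Vt[:k].T) @ gamma


# Goodness of fit on the data used to build the model.
residual = y - predict(X)
r2 = 1.0 - np.sum(residual**2) / np.sum((y - y_mean) ** 2)

print("variance explained by first two components:", explained[:k].sum())
print("R^2 of the regression on the components:", r2)
```

Here six correlated variables are reduced to two components before the regression step, which keeps the model small and sidesteps the instability that strongly correlated inputs cause in a direct regression.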