Possible problems with dependent data
Ignoring the dependence of observations generally causes standard errors to be underestimated. The mechanism behind this underestimation is easily explained. Imagine an extreme case of 10 groups of 100 identical observations each. Classical statistical methods would calculate the standard errors as if there were 1000 independent observations. However, since each group contains 100 totally dependent observations, the useful information in the sample is really limited to only 10 observations. Standard errors based on 10 observations will obviously be much greater than those based on 1000, indicating less precision.
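The extreme case above can be checked numerically. The following is a minimal sketch in Python and is not part of the original report; the helper name standard_error, the fixed seed and the choice of normally distributed group values are illustrative assumptions.

```python
import math
import random
import statistics


def standard_error(values):
    """Classical standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


rng = random.Random(0)  # arbitrary fixed seed, assumed only for repeatability

# 10 groups; every observation within a group is identical.
group_values = [rng.gauss(0.0, 1.0) for _ in range(10)]
observations = [v for v in group_values for _ in range(100)]

# Naive analysis: treats all 1000 observations as independent.
naive_se = standard_error(observations)

# Analysis that respects the dependence: only 10 distinct pieces of information.
honest_se = standard_error(group_values)

ratio = honest_se / naive_se
print(f"naive SE  (n = 1000): {naive_se:.4f}")
print(f"honest SE (n = 10):   {honest_se:.4f}")
print(f"ratio honest / naive: {ratio:.2f}")
```

In this setup the ratio works out to the square root of 111, roughly 10.5, as long as the group values are not all identical. In other words, the naive calculation understates the standard error by about a factor of ten.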
In reality, observations are more likely to be similar to a certain degree than to be identical. But this dependence will still lead to an underestimation of the standard errors. The p-values resulting from analysing such data with classical methods (t-tests, regression analyses, etc.) will therefore seriously underestimate the risk that a particular pattern of results was observed purely by chance.
As a solution, several techniques have been suggested to model the dependence in data. The aim is not to remove dependencies, as if they were just a nuisance, but to use them as a source of information in themselves. This is done by making the structure of the analysis model reflect the structure that is found in the data. While classical models assume that there is only one source of random variation, multilevel models introduce random variation at each level of the data hierarchy, and time series models introduce random variation that is specific to the changes from one point in time to the next.
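As an illustration, the three kinds of model can be written in a simple form. The notation and the specific model forms are assumptions chosen here for clarity and are not taken from the report: a classical regression with a single random term, a two-level random intercept model with an extra group-level term, and a local level time series model in which the level itself changes randomly from one time point to the next.

```latex
% Classical regression: a single random term e_i
y_i = \beta_0 + \beta_1 x_i + e_i, \qquad e_i \sim N(0, \sigma_e^2)

% Two-level random intercept model: observation i in group j,
% with random variation at both levels (u_j and e_{ij})
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)

% Local level time series model: the level \mu_t changes randomly over time
y_t = \mu_t + \varepsilon_t, \qquad \mu_{t+1} = \mu_t + \xi_t, \qquad
\varepsilon_t \sim N(0, \sigma_\varepsilon^2), \quad \xi_t \sim N(0, \sigma_\xi^2)
```

In the second model the variance of u_j captures how much groups differ from one another; in the third, the variance of \xi_t captures how much the level shifts between consecutive time points.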
The goal of WP7 is to give an overview of methods for dealing with dependencies in data. There is a basic distinction between multilevel analyses, which are intended for data with hierarchical dependencies, and time series models, which are intended for time-dependent data. The report Multilevel modelling and time series analysis in traffic safety research – Methodology is structured according to this distinction.