ERSO
 

Definition of dependent data

A basic assumption underlying all classical statistical tests is that the observations involved are independent of one another: throughout the whole population of interest, each observation should be equally likely to show a particular value, regardless of the values of the other observations. The two examples below show that this assumption is unrealistic in many, if not most, situations in road safety research. Generally, two types of dependency in data can be distinguished: hierarchically dependent data and time dependent data. We describe why such dependent data are problematic for the classical methods of statistical analysis and present a variety of methods to deal with these problems.

Hierarchically dependent data (nested data)

A recent study of driving speed in Belgium illustrates the concept of hierarchical data. In this study, cameras were set up at a large number of randomly sampled road sites, and the speed of every car passing through was registered. To apply classical statistical methods, one would need to assume that each car measured anywhere in the country had the same chance of driving at a particular speed. This is obviously not true, as the road site has a large influence on the speed recorded. For example, if the first car recorded at a site drove 30 km/h, the probability that the next car passes at 110 km/h is much smaller than if the first car had driven 120 km/h. In short, measurements at the same road site resemble each other much more than measurements taken across the country do. We therefore do not have the independent and identically distributed data that inferential statistics require.
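The resemblance between measurements at the same road site can be quantified with the intraclass correlation (ICC), the share of the total variance that lies between sites. The following is a minimal sketch on synthetic data, not the Belgian study's data or method: the number of sites, the speed distributions and the function names are all assumptions chosen for illustration. It estimates the ICC with the usual one-way ANOVA formula for equally sized groups and derives the design effect 1 + (n - 1) * ICC, which indicates how much the variance of the overall mean is inflated relative to an independent sample of the same size.

```python
import random
import statistics


def simulate_sites(n_sites=50, cars_per_site=40, seed=1):
    """Synthetic speeds (km/h): each site has its own mean (assumed values)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_sites):
        site_mean = rng.gauss(80.0, 20.0)  # between-site spread (assumption)
        # within-site spread around the site mean (assumption)
        data.append([rng.gauss(site_mean, 10.0) for _ in range(cars_per_site)])
    return data


def intraclass_correlation(groups):
    """One-way ANOVA estimate of the ICC for equally sized groups."""
    k = len(groups)
    n = len(groups[0])
    grand_mean = statistics.fmean([x for g in groups for x in g])
    group_means = [statistics.fmean(g) for g in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


speeds = simulate_sites()
icc = intraclass_correlation(speeds)
# Variance inflation of the overall mean compared with independent data.
design_effect = 1 + (len(speeds[0]) - 1) * icc

print(f"estimated ICC: {icc:.2f}")
print(f"design effect: {design_effect:.1f}")
```

An ICC well above zero means that cars at the same site carry partly redundant information, so treating all measurements as independent would understate the uncertainty of any country-wide estimate.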

Time dependent data (time-series)

In the SafetyNet project, the annual or monthly number of road traffic accidents is collected. Here, too, classical statistical methods would assume that a particular value is equally likely to be observed at every point in time. In fact, however, accident numbers at successive points in time resemble each other much more than, for example, the present numbers and those from 10 years ago. A number of factors change gradually over time, making measurements that are close in time more similar than those separated by a long period. Again, we do not have the independent and identically distributed data that inferential statistics require.
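The similarity of successive measurements can be made visible with the sample autocorrelation, the correlation of a series with a shifted copy of itself. The sketch below uses a synthetic monthly series, not SafetyNet data: the first-order autoregressive process, its parameters and the function names are assumptions made purely for illustration. It compares the autocorrelation at a lag of one month with that at a lag of 120 months (10 years).

```python
import random
import statistics


def simulate_monthly_counts(n_months=240, phi=0.8, seed=7):
    """Synthetic monthly accident numbers: a fixed level plus a deviation
    that carries over from month to month (AR(1), assumed parameters)."""
    rng = random.Random(seed)
    deviation = 0.0
    counts = []
    for _ in range(n_months):
        deviation = phi * deviation + rng.gauss(0.0, 30.0)
        counts.append(1000.0 + deviation)
    return counts


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = statistics.fmean(series)
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


counts = simulate_monthly_counts()
r_lag1 = autocorrelation(counts, 1)      # neighbouring months
r_lag120 = autocorrelation(counts, 120)  # months 10 years apart

print(f"autocorrelation at lag 1:   {r_lag1:.2f}")
print(f"autocorrelation at lag 120: {r_lag120:.2f}")
```

For independent observations the autocorrelation would be close to zero at every lag; a clearly positive value at short lags is the signature of time dependent data.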

 

A general introduction to analysis of linked data is provided in the report Multilevel modelling and time series analysis in traffic safety research – Methodology.

 

   
 
© 2007 SafetyNet. All rights reserved