Wednesday, 4 December 2013

Dirty Data

Westminster recently feigned shock and horror that the UK's coppers cook the crime figures. But Simon Jenkins says we've known for years that the numbers are meaningless, and that they should be banned as "they spread confusion and fear".

But 'plod' is not alone in mis-classifying, mis-recording, ignoring or otherwise presenting data in a way that suits himself. We've had many financial trading scandals where banks apparently had no idea of the exposures they faced, either because transactions were concealed or because no one was looking hard enough. The global financial crisis was itself a function of poor due diligence.

A possible root cause of the problem is that humans are involved too early in the data collection and reporting processes. Rarely are we responding to the 'raw' data, as opposed to figures that have been 'gathered' and 'rolled up' through a series of other people's filters, manipulations and interpretations (which are often taken out of context). It's puzzling why regulators' systems don't receive a feed of the actual trades straight from bank trading desks - or from peer-to-peer lending or crowdfunding platforms - rather than relying on periodic reporting of summary data.

Maybe GCHQ can help...

At any rate, we should focus more on 'clean' mechanisms for capturing and presenting raw data rather than someone else's interpretation of it.

Image from TraceyNolte.

