Data in the right format
Wednesday, December 4, 2019
By Jean Mallo, Performance Manager, Children’s Services Performance Team, Wandsworth Borough Council and co-chair of LIEG (London Information Exchange Group).
This post is part of a series published by the Rees Centre on data (see below for related posts).
The five structural and formatting barriers to turning numbers into intelligence, or Data to Insight.
Local authority resources are dwindling. There is an increased demand on children’s services to make better and quicker decisions to ensure that the right action is taken promptly to improve the lives of children and families. As a result, there is a renewed pressure on performance teams to turn numbers into intelligence, or Data to Insight.
Each manual step taken to manipulate data into a usable structure and format before analysis is an opportunity for human error. More critically, it leads to duplicated effort across the 151 local authorities in England. To put this into perspective, if a table of published data requires just 5 minutes of manual preparation in each authority, that adds up to more than 12 hours, or almost two days of an analyst’s time across England, spent on formatting rather than on understanding or learning.
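The arithmetic behind that estimate can be checked in a few lines. The figures for the number of authorities and the minutes per table come from the paragraph above; the length of a working day is an assumption made only for this sketch.

```python
# Back-of-envelope estimate of the duplicated effort described above.
LOCAL_AUTHORITIES = 151        # from the post
MINUTES_PER_TABLE = 5          # from the post
HOURS_PER_WORKING_DAY = 7.5    # assumption: not stated in the post

total_minutes = LOCAL_AUTHORITIES * MINUTES_PER_TABLE
total_hours = total_minutes / 60
working_days = total_hours / HOURS_PER_WORKING_DAY

print(f"{total_minutes} minutes = {total_hours:.1f} hours = {working_days:.1f} working days")
# prints: 755 minutes = 12.6 hours = 1.7 working days
```

This is for a single published table; the total grows with every table that needs the same treatment.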
The five most common types of structural and formatting problems taking up analysts’ time:
Hidden or blank rows and columns
What you see is not what you get. Copying what appears to be a simple five-by-five table results in a messy muddle that requires manual deletion of rows and/or columns and unmerging cells before it can be used in reports or used as underlying data for analysis.
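As a minimal sketch of how the first part of this clean-up might be automated, the function below removes rows and columns that are entirely blank from a pasted table held as a list of rows. The function name and the sample data are hypothetical, and it does not deal with hidden rows or merged cells, which need a spreadsheet-aware library.

```python
def drop_blank_rows_and_columns(grid):
    """Remove rows and columns in which every cell is empty or whitespace."""
    def is_blank(cell):
        return cell is None or str(cell).strip() == ""

    # Keep only rows that contain at least one non-blank cell.
    rows = [list(row) for row in grid if not all(is_blank(c) for c in row)]
    if not rows:
        return []

    # Pad ragged rows so every row has the same number of cells.
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]

    # Keep only columns that contain at least one non-blank cell.
    keep = [i for i in range(width) if not all(is_blank(row[i]) for row in rows)]
    return [[row[i] for i in keep] for row in rows]


# Hypothetical example of what a "simple" table can look like once pasted.
pasted = [
    ["", "", "", ""],
    ["Area", "", "2018", "2019"],
    ["", "", "", ""],
    ["Borough A", "", "120", "135"],
    ["Borough B", " ", "98", "101"],
]
cleaned = drop_blank_rows_and_columns(pasted)
for row in cleaned:
    print(row)
```

The blank first and third rows and the empty second column are dropped, leaving a three-by-three table.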
Numbers and dates stored as text or strings
For a number or a date to become insight it needs to be understood in the wider context. For example – is the number high or low, is there an upward trend or a decline, what is the duration between the two dates? Numbers and dates reported as text or strings need to be converted to numeric and date types before these questions can be answered.
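The conversion step could look something like the sketch below. The helper names, the list of accepted date formats and the sample values are all assumptions for illustration; a real publication may use other formats.

```python
from datetime import datetime

# Assumed date formats; extend to match the publication being processed.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d")


def parse_number(text):
    """Convert text such as ' 1,234 ' or '12.5%' to a float, or None if not numeric."""
    cleaned = text.strip().replace(",", "").rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_date(text):
    """Try each assumed format in turn; return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


# Once values are typed, the questions above become one-line calculations.
start = parse_date("01/04/2019")
end = parse_date("2019-12-04")
duration_days = (end - start).days

print(parse_number(" 1,234 "))   # 1234.0
print(duration_days)             # 247
```

Returning `None` for anything that cannot be parsed keeps unreadable values visible instead of silently turning them into zeros.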
Dealing with missing data
Turning what appears to be ‘no data’ into genuine ‘no data’ can be a time-consuming task – a space may take the place of what appears to be an empty cell, or it may be filled with a letter or a symbol. The rules change from one source to another, sometimes even within the same organisation or publication.
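One way to tame this is to keep a single, shared list of the markers that stand in for ‘no data’ and map them all to one value. The markers listed here are illustrative assumptions, not a definitive list from any particular publisher, and some symbols (for example suppression markers) may deserve their own category instead of being treated as missing.

```python
# Assumed set of 'no data' markers; each source would need its own list.
MISSING_MARKERS = {"", "-", "..", ":", "x", "n/a", "na", "null"}


def normalise_missing(value):
    """Return None for anything that only looks like data, else the trimmed text."""
    if value is None:
        return None
    # strip() also removes non-breaking spaces, a common invisible culprit.
    text = str(value).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return text


raw_cells = ["42", " ", "\u00a0", "x", "N/A", "..", "17.5"]
cleaned_cells = [normalise_missing(cell) for cell in raw_cells]
print(cleaned_cells)
# prints: ['42', None, None, None, None, None, '17.5']
```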
Machine readable is not always human readable
On the one hand, the ‘tidy data format’ is an analyst’s dream for converting numbers into tables and visualisations using the latest software and technology. On the other, the standard table format is easy to read and understand not only by local authority analysts but also by the public.
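The two formats hold the same information, and moving from the readable table to the tidy one is mechanical. The sketch below is a hand-rolled illustration using hypothetical column names and figures; in practice a data analysis library would normally do this reshaping.

```python
def wide_to_tidy(header, rows, id_column=0, key_name="period", value_name="value"):
    """Turn a one-row-per-area table into one record per area and period."""
    id_name = header[id_column]
    tidy = []
    for row in rows:
        for i, column_name in enumerate(header):
            if i == id_column:
                continue  # the identifier is repeated on every record instead
            tidy.append({id_name: row[id_column], key_name: column_name, value_name: row[i]})
    return tidy


# Hypothetical human-readable table: one row per area, one column per year.
header = ["area", "2018", "2019"]
rows = [
    ["Borough A", 120, 135],
    ["Borough B", 98, 101],
]

for record in wide_to_tidy(header, rows):
    print(record)
# {'area': 'Borough A', 'period': '2018', 'value': 120}
# {'area': 'Borough A', 'period': '2019', 'value': 135}
# {'area': 'Borough B', 'period': '2018', 'value': 98}
# {'area': 'Borough B', 'period': '2019', 'value': 101}
```

Because the conversion is this simple in one direction, a publisher could in principle release the tidy version for machines and generate the readable table from it for people.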
Inconsistencies between publications
Regardless of the format or structure chosen by an organisation, the one thing that carries the most weight is the consistency between publications. The children’s social care sector has developed its own data warehouses and analysis tools. Each time the structures and formats change, local authorities carry the further burden of rewriting the code that manipulates the data back into the required structure.
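To illustrate the kind of code that has to be rewritten, here is a minimal sketch of a header-mapping step that absorbs small naming changes between releases and fails loudly on anything it does not recognise. The aliases and canonical names are hypothetical and do not come from any real publication.

```python
# Hypothetical aliases seen across releases, mapped to one canonical name.
HEADER_ALIASES = {
    "la name": "local_authority",
    "la_name": "local_authority",
    "local authority": "local_authority",
    "year": "year",
    "time_period": "year",
    "number of children": "children",
    "children": "children",
}


def canonical_headers(headers):
    """Map each header to its canonical name, or raise if any are unknown."""
    result, unknown = [], []
    for header in headers:
        key = header.strip().lower()
        if key in HEADER_ALIASES:
            result.append(HEADER_ALIASES[key])
        else:
            unknown.append(header)
    if unknown:
        # Failing here is deliberate: a changed header should be noticed, not guessed.
        raise ValueError(f"Unrecognised column headers: {unknown}")
    return result


print(canonical_headers(["LA Name ", "time_period", "Number of children"]))
# prints: ['local_authority', 'year', 'children']
```

Every new alias is another line that each authority would otherwise add on its own, which is why consistency at source saves more time than any local fix.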
The first three barriers have been solved in other fields and their solutions are starting to attract the attention of children’s services. There are common standards on how data should be defined, organised and described, and accessing information through Application Programming Interfaces (APIs) proves to be more consistent than using spreadsheets. We know that the Department for Education (DfE) is looking at both solutions, although children’s social care data appears to be behind education data in the queue for improvements. Whilst we wait for these to be implemented, there could be merit in developing and sharing scripts that normalise data, to help automate the clean-up of issues such as inconsistent approaches to dates and missing data.
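One reason an API response can be more consistent than a spreadsheet export is that a format such as JSON carries types with the values. The sketch below reads the same made-up record from CSV text and from JSON text; both payloads are hypothetical and are not taken from any real DfE service.

```python
import csv
import io
import json

# The same hypothetical record, once as a CSV export and once as an API-style payload.
csv_text = "area,year,rate\nBorough A,2019,\n"
json_text = '{"area": "Borough A", "year": 2019, "rate": null}'

csv_record = next(csv.DictReader(io.StringIO(csv_text)))
json_record = json.loads(json_text)

# CSV: everything arrives as text, and 'no data' is just an empty string.
print(repr(csv_record["year"]), repr(csv_record["rate"]))    # '2019' ''

# JSON: the number is a number and 'no data' is an explicit null (None in Python).
print(repr(json_record["year"]), repr(json_record["rate"]))  # 2019 None
```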
By contrast, inconsistencies between publications feel like a challenge to be met with intelligent and inclusive partnership working rather than technology. Increasingly, we see organisations that provide data or define data standards, such as the DfE and Ofsted, starting to recognise that seemingly minor decisions about how to label and organise data can have very significant impacts for local government. We have seen an openness to involving children’s services in those decisions, so that we can show them how best to help us, and to working collaboratively. Formalising and expanding those arrangements could open the possibility of saving substantial amounts of time for local government staff, and also enable a new generation of analytical tools to be built upon more predictable and comparable data.
Improvements in technology and partnership working continue to drive the change to centralised data manipulation. The benefit is that children’s services can focus on providing valuable and insightful local analysis that makes a real difference to decisions in their area.
This post is part of a series published by the Rees Centre on data. The Rees Centre welcomes guest blog posts from professionals across the sector. Views expressed are the authors’ own and do not represent those of the Rees Centre.