Tracking lockdown, supporting recovery: the potential and the pitfalls of big data
The Government’s response to the COVID-19 crisis is being influenced by a wealth of data, including novel analyses using new forms of data.
The daily Downing Street briefings illustrate the potential of big data to provide timely information to guide public policy. But they also underline the pitfalls of relying on statistics where we have no control over their production. In this blog, Nick Bailey argues that we need a stronger public infrastructure to support policy makers at this critical time and highlights how the Urban Big Data Centre is contributing to this challenge.
The COVID-19 crisis has been quantified like no previous crisis. In the early stages, the focus was understandably on health: cases diagnosed, hospital admissions and the grim tally of deaths. Authoritative sources such as the Johns Hopkins University site quickly emerged, supporting epidemiological analysis by governments, academics, journalists and the general public – much of it (if not all!) providing timely input to policy making.
As the crisis has progressed, attention has shifted to more difficult analytical challenges. Most immediately, governments want to understand compliance with social distancing measures but, as we move into the recovery phase, they will need a much wider range of insights into social and economic conditions across the country. They are faced with an unprecedented economic shock. To develop appropriate strategies and to learn quickly about their impacts, they will need intelligence with shorter lags and higher frequencies than is usually provided by official statistics.
The need is not for ‘real-time’ data current to the hour, minute or second, but rather for ‘near-real-time’ feedback within days or perhaps a week or so. And this data is needed for local areas – local authorities certainly, as the key service providers in many domains, and perhaps smaller areas as well.
New forms of data (big data) have much to offer in this context, given their key characteristics of volume and velocity (scale and flow). In the UK, the daily Downing Street briefings have begun to draw on some of this, citing among others Apple’s Mobility Trends Reports and Google’s Community Mobility Reports. These data illustrate some of the strengths of new forms of data but also their limitations.
On the positive side, these statistics provide daily measures of activities which are otherwise hard to capture, including daily population movements. They are produced as a by-product of other services, requiring no additional infrastructure for data capture: Apple base their measure on requests for directions while Google base theirs on mobile phone locations. The data have also been captured for some time, so we can use the past to gauge the scale of the current change.
The data are timely and have high temporal detail. Apple produce statistics daily, with just one or two days' lag, while Google also reports for each day, updating once a week with a few days' lag.
The data offer both wide geographic coverage and spatial detail. Apple covers around 60 countries plus 90 cities within these while Google covers even more: 131 countries plus, for the UK alone, 152 sub-national locations.
Lastly, both companies should be given credit for making the statistics available in a machine-readable format with a consistent URL and stable file structure which makes re-analysis straightforward. There has been a struggle to get UK governments to make some of the basic health statistics available in this format, so there is something for the public sector to learn here.
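The practical benefit of a stable, machine-readable file is that re-analysis takes only a few lines of code. The sketch below works on a tiny illustrative extract shaped like Google's Community Mobility CSV; the column names and values here are assumptions for illustration, not a copy of the real file, which in practice would be downloaded from the report's published URL.

```python
import csv
import io

# Illustrative extract in the shape of Google's Community Mobility CSV.
# Column names approximate the published file; values are invented.
sample = """country_region,sub_region_1,date,workplaces_percent_change_from_baseline
United Kingdom,Greater Manchester,2020-04-05,-62
United Kingdom,Greater Manchester,2020-04-06,-65
United Kingdom,Greater London,2020-04-06,-70
"""

# A stable structure means the same parsing code works every week.
rows = list(csv.DictReader(io.StringIO(sample)))
manchester = [r for r in rows if r["sub_region_1"] == "Greater Manchester"]
latest = max(manchester, key=lambda r: r["date"])
print(latest["date"], latest["workplaces_percent_change_from_baseline"])
```

Because the URL and file layout do not change from release to release, a script like this can be scheduled to run daily with no manual intervention – exactly the property the UK's health statistics publications have often lacked.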
At the same time, these two sources highlight the limitations, or pitfalls, of big data for public policy.
Most obviously, the companies control whether these series continue and can adjust the underlying methods without having to explain what has been done or why. There is no requirement for continuity or consistency here, nor even transparency.
Important features of the data are left to the companies to determine. Take geographies for example. Google’s sub-national data cover the whole of the UK but Apple’s just a few select cities. Google seems to use metropolitan areas for cities (mostly), providing measures for e.g. ‘Greater Manchester’. Apple provides measures for something called ‘Manchester’, but it is unclear whether that is the local authority, the city-region or just part of the search term entered by the user.
Both companies offer measures of ‘relative mobility’ but there is no consistency in how they construct these. Apple provides measures based on how people travel (driving, public transport and walking) whereas Google provides them based on where people travel (e.g. parks, workplaces or transit stations). Neither provides an overall measure.
Both companies have sought to convert the raw data into more useful intelligence by looking at changes against an earlier reference period but again they take different approaches. Apple compares activity levels for the current day with a single earlier day (13 January 2020) while Google compares with the average for the same day of the week in a five-week period in January/February 2020.
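The difference between the two baselines is easy to state in code. The sketch below uses invented daily counts for a single location (all values are illustrative assumptions); the Apple-style index expresses activity as a percentage of the single reference day, while the Google-style index is a percentage change against the mean for the same day of the week across a five-week January/February window (taken here as 3 January to 6 February 2020).

```python
from datetime import date, timedelta

# Hypothetical daily trip counts for one location (illustrative values only).
counts = {}
baseline_start = date(2020, 1, 3)            # start of five-week baseline window
for i in range(35):                          # 3 Jan - 6 Feb 2020
    counts[baseline_start + timedelta(days=i)] = 1000 + (i % 7) * 50

current_day = date(2020, 4, 6)               # a Monday under lockdown
counts[current_day] = 350

# Apple-style index: activity as a percentage of one reference day (13 Jan 2020).
apple_style = 100 * counts[current_day] / counts[date(2020, 1, 13)]

# Google-style index: percent change against the mean for the same weekday
# across the five-week baseline window.
same_weekday = [v for d, v in counts.items()
                if baseline_start <= d < baseline_start + timedelta(days=35)
                and d.weekday() == current_day.weekday()]
baseline = sum(same_weekday) / len(same_weekday)
google_style = 100 * (counts[current_day] - baseline) / baseline

print(round(apple_style, 1), round(google_style, 1))   # 30.4 -69.6
```

Note that the two indices are not directly comparable: the same underlying behaviour yields "30% of baseline" on one measure and "-70% from baseline" on the other, and the day-of-week baseline also absorbs weekly seasonality that the single-day comparison does not.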
In short, we have remarkably little information on how these measures are constructed. We do not get access to the raw data, nor even the code which describes the many decisions made in converting data into a single indicator. Yet our governments are making decisions, at least partly, on the basis of these numbers.
Towards a public infrastructure of near-real-time intelligence
So, the challenge is to find ways to take the advantages of new forms of data with fewer of the disadvantages that occur when control is left with unaccountable private corporations. We need the up-to-date, spatially-detailed measures that big data offer, but we also need transparency and consistency in how data are converted into measures, and these measures need to be focussed on the major intelligence gaps confronting policy makers at national and local levels. Stakeholders need a chance to shape how measures are constructed and how they are reported.
At UBDC, we are trying to address this challenge. First, we are working to understand what the intelligence needs of local authorities and others are in this context. We have had inputs from many stakeholders but are always happy to have suggestions from others - please do get in touch!
Second, we are drawing on our existing data collection and exploring diverse new data sources which can provide near-real-time feedback. Early outputs from this work cover cycling, public transport, pedestrian footfall, road traffic and the short-term rental market (Airbnb). Many more are in the pipeline but if you have suggestions for additions, please let us know.
Third, we are working on approaches to render data into intelligence to address key questions: how different are things now than they would have been in the absence of lockdown? And as recovery progresses, which areas or groups are being left behind? We will make our methods transparent so that others can critique them or contribute alternative approaches.
Fourth, we will make this intelligence accessible to a broad, non-technical audience. Initially, we are publishing results in the form of blogs to highlight quickly what is possible, but we recognise this is not the long-term solution. We are looking to develop a proper interface that enables policy makers to scrutinise the data they want at the scale they need. We will welcome your inputs and feedback.
The current crisis is underlining to policy makers the value of data and the intelligence it can provide, particularly when it can be delivered quickly. Big data have a crucial role to play, both in the immediate lockdown period and as we move on to the recovery phase. UBDC looks forward to supporting efforts at national and local levels.
Nick Bailey is Director of the Centre. He is a Professor in Urban Studies, based in the School of Social and Political Sciences at the University of Glasgow.