Big data in the pandemic: strengthening the response
As the UK Government’s National Data Strategy recently noted, the pandemic has established a “high water mark of data use”. This blog is based on a presentation I gave at the Campaign for Social Science seminar on ‘How can social statistics help us fight COVID-19?’, which took place on 19 September 2020.
This event examined the role of social statistics in informing the COVID-19 response and recovery.
Data in the pandemic
Data are being used by health and social researchers to understand the drivers of the pandemic and the risk factors for individuals, and to measure the social and economic consequences. More immediately, data have been used in the construction of a wide array of indicators which enable us to track developments, informing public policy and wider public debate.
These indicators have provided insights into a wide range of areas or domains. Health outcomes have understandably been most prominent, but there have been indicators of economic activity and the labour market, the environment and social needs, as well as novel indicators capturing population movements or mobility.
Many indicators have levels of temporal and spatial detail which official statistics rarely, if ever, match. They are being produced on a daily or weekly basis, giving near-real-time insights often down to local authority level. Many have been made widely accessible to non-technical users through rapidly constructed public interfaces or dashboards.
The role of big data
A key feature of these indicators is that most stem from new forms of data or big data. They are not the product of the usual carefully curated processes for producing statistics, drawing on censuses, surveys or administrative extracts.
Some are user-generated, captured through crowdsourcing efforts. The most high-profile is the COVID-19 Symptom Study, which produces daily estimates of the symptomatic population down to local authority level.
Others are produced by administrative or business systems. On the health side, we have daily reports from NHS systems, the basis of this Scottish dashboard, for example. On the economic side, researchers have licensed data from Adzuna, a jobs listing firm, to produce weekly analyses of local vacancies. Citizens Advice is drawing on its own records to produce monthly summaries of the problems people seek help with.
Publicly owned sensor networks have long produced data on air pollution or traffic volumes. Since the pandemic began, images from traffic and CCTV cameras have been re-purposed to provide novel measures of pedestrian activity, as this work by UBDC and by ONS has shown.
Most powerful of all, perhaps, are the data which can be harvested through the use of private sensor networks, in the form of mobile phones. Tech giants like Google and Apple have shown the way here, providing local indicators with near-global coverage, examined in this UBDC blog. Smaller companies also contribute, collating data from a myriad of different apps to track activity in particular retail or leisure locations; see, for example, work by Huq.io.
The mixed economy of data
These indicators are produced in a mixed economy, with contributions from public, private and third sectors. This is in many ways a strength as it has harnessed the energy, creativity and resources of diverse groups, enabling a rapid response that the public sector alone could never have managed.
But it is also a potential weakness, as efforts are piecemeal, fragmented and impermanent. Several of the features we would normally demand of official statistics are absent or limited, so there are many questions we should be asking before we rely too heavily on the representations these indicators provide, compelling as they may seem.
Transparency and continuity
Can we see how the data were processed from raw to finished indicator? Can we be sure that the methods have remained consistent so that changes over time represent change on the ground rather than artefacts of data processing decisions? On 22 September, for example, Google unilaterally suspended its widely-used Community Mobility indicators pending “updates” to the methods used.
How do we know that these data give an unbiased representation of any phenomenon? Which groups or places are under- or over-represented? What efforts have been made to validate the data against other sources?
What conditions are placed on access, e.g. regarding data owners' rights to limit or vet what can be said with the data? Will the data continue to be available free of charge?
Strengthening the big data contribution
So how can we ensure we get the most out of the opportunity offered by big data? I think there are five actions we can take.
Data stewardship by the public sector
We need to recognise the importance and value of data which already lie in public ownership and ensure they can be widely used. The National Data Strategy has called for "a radical transformation of how the government understands … the value of its own data" and that is to be welcomed. We now need to work out how to incentivise better performance here, as well as monitoring and, where necessary, penalising failure to deliver.
Data collection through regulation
Beyond its own activities, the Government has many opportunities to secure data from others, especially where it is already regulating activities. Too often, however, such information reporting requirements are seen only as a cost on business rather than as generating wider public benefits. We need to develop new ways to assess data collection and judge what is appropriate.
Enhanced public collection for validation
Some novel data sources, notably those from mobile phones, have enormous potential value. But if we are to take significant policy decisions on the back of these, we need to understand their biases. One means to do this is to integrate new forms of data collection into some of the 'gold standard' household surveys conducted by the Government and academia, providing a representative baseline against which to judge other sources.
Systematic acquisition through licensing for public use
It will continue to be necessary for the Government to acquire rights to use data through licensing. But it could do more to pool the collective purchasing power of the many public bodies, national and local, to secure deals which enable much wider and more enduring access.
Review and collation of the existing array of indicators
There is already a wealth of intelligence out there for local organisations, but it is fragmented across multiple sites and of variable quality. More could be done to review indicators and collate them in a single location, focusing on the needs of local authorities and their partners who bear the burden of responding to the social and economic fallout of the current crisis.
Nick Bailey is Director of the Centre. He is a Professor in Urban Studies, based in the School of Social and Political Sciences at the University of Glasgow.