

Using Data Science in Policy

The first report from BIT’s Data Science team



The techniques that make up data science – new tools for analysing data, new datasets, and novel forms of data – have great potential to be used in public policy. However, to date, these tools have principally been the domain of academics, and, where they have been put to use, the private sector has led the way.

At the same time, many of the uses of machine learning have been of fairly abstract interest to government. For example, identifying trends on Twitter is helpful but not inherently valuable. Projects showcasing the power of new data and new tools, such as using machine learning algorithms to beat human experts at the game Go, or to identify the prevalence of cat videos supporting one political candidate or another, have been some distance from application to government ends. Even where they have been applicable, they have often not been adequately tested in the field, and the tools built from them have not been based on an understanding of the needs of end users.

Hence, over the past year, we – along with many others – have been conducting rapid exemplar projects in data science, designed to produce actionable intelligence or insight: not simply a tool for understanding the world or monitoring performance, but a basis for practical interventions that governments can put in place.

We have conducted eight such exemplars, focused on four areas: targeting inspections, improving the quality of randomised controlled trials (RCTs), helping professionals to make better decisions, and predicting which traffic collisions are most likely to lead to someone being killed or seriously injured. This report covers six of these eight exemplars.

Targeting inspections

  • We found that 65 per cent of ‘requires improvement’ and ‘inadequate’ schools were within the 10 per cent of schools identified as highest risk by our model. Increasing this to the riskiest 20 per cent, our model captured 87 per cent of these schools.
  • Using publicly available data published by the Care Quality Commission (CQC) and other sources, 95 per cent of inadequate GP practices can be identified by inspecting only one in five practices.
  • Using only the public part of the CQC’s Intelligent Monitoring system, which is based on several clinical indicators, a similar model would pick up just 30 per cent of inadequate practices for the same inspection effort.
  • We have also built a model to predict the inspection results of care homes, but this model is much less successful, suggesting either that more data are needed or that machine learning techniques could be of limited use here.
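The inspection-targeting results above can be summarised as "recall in the riskiest slice": train a model on past inspection outcomes, rank providers by predicted risk, and measure what share of poor outcomes fall in the top 10 or 20 per cent. The sketch below illustrates this metric on synthetic data; the report does not specify which model or features were used, so the classifier and feature set here are illustrative assumptions.

```python
# Illustrative sketch of risk-based inspection targeting on SYNTHETIC data.
# The model, features, and figures here are invented for illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                        # stand-in provider features
risk = 1 / (1 + np.exp(-(2 * X[:, 0] + X[:, 1])))  # latent failure risk
y = rng.binomial(1, risk)                          # 1 = 'inadequate' outcome

model = GradientBoostingClassifier(random_state=0).fit(X, y)
scores = model.predict_proba(X)[:, 1]

def recall_at_top(scores, y, frac):
    """Share of all poor outcomes found in the top `frac` riskiest cases."""
    k = int(len(scores) * frac)
    top = np.argsort(scores)[::-1][:k]
    return y[top].sum() / y.sum()

print(f"recall in riskiest 10%: {recall_at_top(scores, y, 0.10):.2f}")
print(f"recall in riskiest 20%: {recall_at_top(scores, y, 0.20):.2f}")
```

A random inspection schedule would, by definition, find only 10 per cent of poor outcomes in any 10 per cent slice, so values above that line measure the gain from targeting.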

Improving randomised controlled trials (RCTs)

  • Previously, we have used RCT data to study how the effectiveness of interventions varies for specific sub-groups, enabling interventions to be better targeted.
  • These sub-groups tended to be broadly defined by one or two pre-determined characteristics, and combinations of characteristics were largely ignored.
  • By applying causal machine learning algorithms to data from RCTs, we are able to identify differential impacts of an intervention across all observable characteristics, ensuring that people get the best intervention for them, and helping to prevent backfires.
  • We replicated an experiment conducted in 2016 with King’s College London, in which students were encouraged to attend a Welcome Fair by being sent text messages emphasising either employability or social belonging, with the belonging condition performing best.
  • In our replication study, participants were assigned either to one of the messaging arms at random or to the message that the machine learning algorithm predicted would give them the best outcome, based on their observable characteristics.
  • In our first study using these techniques, we found a small positive but not statistically significant effect from allocating messages by algorithm, which we believe is due to insufficiently regularised model complexity. We are improving the design of our targeting by using a consensus of models rather than a single model.
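One common way to estimate differential impacts across observable characteristics is a "T-learner": fit a separate outcome model per message arm, then recommend to each person the arm with the higher predicted outcome. The report does not name the specific causal machine learning algorithm it used, so the following is a generic sketch on synthetic data, not the team's actual method.

```python
# T-learner sketch on SYNTHETIC two-arm trial data (the report's actual
# causal ML algorithm is unspecified; this is one common approach).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 3000
X = rng.normal(size=(n, 3))          # synthetic participant characteristics
arm = rng.integers(0, 2, size=n)     # 0 = employability, 1 = belonging
# True effect of 'belonging' flips sign with the first covariate.
effect = np.where(X[:, 0] > 0, 0.8, -0.8)
y = 0.5 * X[:, 0] + arm * effect + rng.normal(0, 0.5, n)

# Fit one outcome model per arm, then compare predictions person by person.
m0 = RandomForestRegressor(random_state=0).fit(X[arm == 0], y[arm == 0])
m1 = RandomForestRegressor(random_state=0).fit(X[arm == 1], y[arm == 1])

cate = m1.predict(X) - m0.predict(X)   # estimated individual treatment effect
best_arm = (cate > 0).astype(int)      # recommended message per person
print("share recommended 'belonging':", best_arm.mean())
```

Because the models are flexible, they can pick up effect differences across any combination of covariates, rather than only across one or two pre-specified sub-groups, which is the advantage the bullets above describe.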

Helping professionals to make better decisions

  • Social workers need to make a large number of decisions, very often with little time and incomplete information.
  • Our previous work in this area has shown that high caseloads for assessment social workers can influence the decisions that are taken.
  • Working with one local authority, we used natural language processing to predict which cases that were flagged for no further action would return within three months and result either in a child protection plan or a child being taken into care.
  • Analysis using both text and structured data allowed us to predict, 8.3 times better than chance, which cases were likely to be referred back into the system.
  • Using text analysis alone, we can detect 45.6 per cent of the cases that will return by reviewing just under 6 per cent of all cases, allowing interventions to be precisely targeted at the families most in need.
  • We are working with social workers to build a digital tool that can be used to help inform their decisions.
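The "times better than chance" figure above is a lift metric: how much more concentrated returning cases are in the flagged slice than in the caseload overall. A minimal sketch of text-based case screening, using a TF-IDF and logistic regression pipeline on invented case notes (the report does not disclose its actual model or data, and the phrases below are fabricated for illustration):

```python
# Hypothetical text-screening sketch. Case notes and labels are INVENTED;
# the local authority's real model and data are not public.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "missed school repeatedly, home visit refused",
    "parent engaged well with support services",
    "concerns about neglect raised by neighbour",
    "family settled, no concerns at follow-up",
] * 50
returned = np.array([1, 0, 1, 0] * 50)  # 1 = case returned within 3 months

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(notes, returned)
scores = model.predict_proba(notes)[:, 1]

# Lift: how much likelier a flagged case is to return than an average case.
frac_flagged = 0.25
k = int(len(scores) * frac_flagged)
flagged = np.argsort(scores)[::-1][:k]
lift = returned[flagged].mean() / returned.mean()
print(f"lift over chance in flagged {frac_flagged:.0%}: {lift:.1f}x")
```

A lift of 8.3, as reported, means the return rate among flagged cases was 8.3 times the base rate across all cases flagged for no further action.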

Predicting serious traffic collisions

  • Traffic collisions in East Sussex have bucked the national trend of falling numbers of killed and seriously injured (KSI) casualties.
  • We are able to predict which collisions will result in someone being killed or seriously injured, with drivers’ behavioural factors, rather than road conditions, contributing the most to the explanation.
  • We have been able to bust some myths – for example, about older drivers and goods vehicles.
  • Motorcyclists, the young, and people in early middle age are disproportionately more likely to be involved in KSI incidents in East Sussex.
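Attributing predictions to behavioural factors versus road conditions is typically done by inspecting a fitted model's feature importances. The sketch below shows the idea on synthetic collision records in which, mirroring the finding above, behaviour drives the outcome; the features, effect sizes, and model choice are all assumptions, not the team's actual analysis.

```python
# Illustrative feature-importance sketch on SYNTHETIC collision data.
# Features, effect sizes, and model are assumptions for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
n = 5000
speeding = rng.binomial(1, 0.3, n)     # behavioural factor
impairment = rng.binomial(1, 0.1, n)   # behavioural factor
wet_road = rng.binomial(1, 0.4, n)     # road condition
# Synthetic KSI probability driven mainly by behaviour, not road state.
p = 0.05 + 0.25 * speeding + 0.35 * impairment + 0.03 * wet_road
ksi = rng.binomial(1, p)               # 1 = killed or seriously injured

X = np.column_stack([speeding, impairment, wet_road])
model = GradientBoostingClassifier(random_state=0).fit(X, ksi)
names = ["speeding", "impairment", "wet_road"]
for name, imp in zip(names, model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

In this toy setup the road-condition feature receives the smallest importance, which is the shape of result the East Sussex analysis reports for real collision data.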

Date Published

December 14, 2017