Collecting and Storing Data from Internet-based Sources

Monday 11 June 2018
13:15 – 16:45 BST
Jura Teaching Lab, Level 4 Annexe, University of Glasgow Library, Hillhead Street, Glasgow G12 8QE

Collecting and Storing Data from Internet-based Sources will be an afternoon session providing researchers with the essential skills required to effectively use Application Programming Interfaces (APIs) for downloading data from a variety of online data sources. It will then cover the use of databases for storing and retrieving data and demonstrate how to automate the collection processes. Full course details below.

Course instructor: Peter Smyth, Research Associate, University of Manchester

Course duration: Half day (Monday 11th June, 2018, 1:15pm – 4:45pm)

Course location: Jura teaching lab, Level 4 Annexe, Glasgow University Library

Audience: Researchers who need to collect Internet-based data (e.g. from social media) and store it over a period of time


Course fees:

  • £25 - For UK registered students
  • £35 - For staff at UK academic institutions, Research Councils UK funded researchers, UK public sector staff and staff at UK registered charity organisations
  • £50 - For all other participants

Pre-requisite knowledge: Some knowledge of Python would be useful but is not essential, as all code used will be provided.

Course summary:

Many websites allow researchers and developers to download data using their Application Programming Interface (API). This data is often in formats that social scientists are unfamiliar with (e.g. JSON). Downloaded data can be processed immediately or stored in a database for later processing in a package like R or Stata. Data can be collected at regular intervals over a period of time, using the built-in functionality of the Windows or Linux operating systems.

Course content:

Course participants will be introduced to the following:

  • The JSON data format
  • Using APIs to collect data
  • Data storage and retrieval using a database (SQLite)
  • Setting up automated procedures to collect data
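
As a rough illustration of how these topics fit together, the following minimal Python sketch parses a JSON document and stores its records in an SQLite database. It is not taken from the course materials: the sample response, the `posts` table, its field names and the `store_posts` function are all hypothetical, and a hard-coded string stands in for the text a real API would return.

```python
import json
import sqlite3

# Hypothetical API response body (an assumption for illustration);
# a real API would return text like this over HTTP.
SAMPLE_RESPONSE = '''
{"posts": [
  {"id": 1, "user": "alice", "text": "Hello", "likes": 3},
  {"id": 2, "user": "bob", "text": "Data!", "likes": 7}
]}
'''


def store_posts(conn, response_text):
    """Parse a JSON response and insert each post into the `posts` table."""
    # json.loads turns the JSON text into Python dicts and lists.
    records = json.loads(response_text)["posts"]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, user TEXT, text TEXT, likes INTEGER)"
    )
    # INSERT OR REPLACE keeps repeated collection runs from duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO posts (id, user, text, likes) "
        "VALUES (:id, :user, :text, :likes)",
        records,
    )
    conn.commit()
    return len(records)


# An in-memory database keeps the sketch self-contained;
# pass a file path instead to keep data between runs.
conn = sqlite3.connect(":memory:")
stored = store_posts(conn, SAMPLE_RESPONSE)
rows = conn.execute(
    "SELECT user, likes FROM posts ORDER BY likes DESC"
).fetchall()
print(stored, rows)
```

A script along these lines, pointed at a database file rather than `:memory:`, could be run at regular intervals by a scheduler such as cron on Linux or Task Scheduler on Windows.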

Payment and registration:

Registration is available via Eventbrite.

For any queries regarding registration, please contact Keith Maynard.

Short Presenter Bio:

Peter Smyth is a Research Associate at the University of Manchester, based in the Cathie Marsh Institute. He spent 35 years working in IT at various large and small commercial organisations before taking an MSc in Big Data Analytics at Sheffield Hallam University and moving into academia. In his previous roles he used whatever programming environment was to hand to solve problems. Now he teaches a variety of programming languages to help others do the same.

He is a qualified Data and Software Carpentry instructor.