UBDC Summer Training 2017: Getting started with Data Management

Tuesday 29 August 2017
10:00 – 13:00 BST
Jura Teaching Lab, Level 4 Annexe, University of Glasgow Library, Hillhead Street, Glasgow G12 8QE

This course is designed to guide those with limited or no experience of using data-supported research methods through the fundamental processes from acquiring data through to best practices in data management.

To get the most from your datasets, it is important to prepare and organise data carefully to ensure accuracy, efficiency and compliance with usage terms. Course participants will gain a clear understanding of the main sharing and access issues, and of the most appropriate platforms and conventions for storing and structuring data. The course will provide time for practical data management exercises in R, using a variety of example datasets from UBDC's open data collection and other sources to demonstrate the issues and techniques being taught. These data may include aggregate health data from ISD, Scottish Census data, SIMD and other Scottish Government open data.

A separate course, “Beyond Excel”: PostgreSQL for Data Management, will run in the afternoon. Those attending will learn more about working with databases on a larger scale using PostgreSQL, including data loading, linkage, queries and producing various outputs.

Course instructors

Marta Nicholson and Mirjam Allik, UBDC, University of Glasgow

Course duration

Half day (Tuesday 29th August 2017, 10:00am – 1:00pm)

Course location

Jura Teaching Lab, Level 4 Annexe, University of Glasgow Library


Who is this course for?

Social scientists, students and practitioners: anyone with responsibility for producing and managing datasets.


Cost

  • £25 - For UK registered students
  • £35 - For staff at UK academic institutions, Research Councils UK funded researchers, UK public sector staff and staff at UK registered charities
  • £50 - For all other participants

Pre-requisite knowledge

Prior experience of using R would be useful but is not essential.

Course content

Part 1 - Sourcing data, best data management practice and understanding licensing

  • How to find and where to get data to help answer a research question (e.g. UBDC, open data repositories, APIs, Google)
  • How to approach the practical challenges of obtaining data:
    • Where to put it (e.g. formats, database platforms, folder structure)
    • Legal issues – types of data licences and associated restrictions
    • How to manage it (naming conventions, versioning, understanding metadata)
    • Use of digital tools to record annotations about data
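
As a small illustration of the naming and versioning point above, the following is a minimal sketch in base R and is not taken from the course materials. The helper `versioned_name` and the dataset label are hypothetical; the pattern (label, date, zero-padded version number) is only one of many reasonable conventions.

```r
# Hypothetical helper: build a file name from a dataset label,
# a date and a version number, e.g. label_YYYYMMDD_vNN.csv
versioned_name <- function(label, date, version, ext = "csv") {
  sprintf("%s_%s_v%02d.%s",
          label,
          format(as.Date(date), "%Y%m%d"),  # date as YYYYMMDD so names sort chronologically
          version,                          # zero-padded so v02 sorts before v10
          ext)
}

# Example call with a made-up dataset label
file_name <- versioned_name("census_households", "2017-08-29", 1)
print(file_name)
```

Putting the date and a zero-padded version in the name keeps files in a sensible order in a folder listing without relying on file-system timestamps.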

Part 2 - Preparing data for analysis

  • Removing extraneous text and symbols from data
  • Restructuring and aggregating datasets
  • Merging data
  • Separating metadata from datasets and compiling metadata
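
To give a flavour of the Part 2 topics, here is a minimal sketch in base R covering three of them: removing extraneous symbols, aggregating and merging. It is not taken from the course materials; the data frames, column names and values are all made up for illustration.

```r
# Hypothetical example data: scores by zone, with stray symbols (made-up values)
scores <- data.frame(
  zone  = c("Z01", "Z02", "Z03", "Z04"),
  area  = c("North", "North", "South", "South"),
  score = c("12.5%", " 8.0 %", "15.5*", "10.0"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: population per area (made-up values)
population <- data.frame(
  area       = c("North", "South"),
  population = c(1200, 950),
  stringsAsFactors = FALSE
)

# 1. Remove extraneous text and symbols: keep only digits and the decimal point,
#    then convert the cleaned strings to numbers
scores$score <- as.numeric(gsub("[^0-9.]", "", scores$score))

# 2. Aggregate: mean score per area
area_means <- aggregate(score ~ area, data = scores, FUN = mean)

# 3. Merge: attach the population figures using the shared 'area' column
combined <- merge(area_means, population, by = "area")

print(combined)
```

The regular expression here is deliberately simple and would also strip minus signs, so it suits only non-negative values; real datasets usually need a pattern chosen after inspecting the data.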