Administering UBDC’s Data Service
The administration of UBDC’s data service is an extremely interesting and varied role, which includes managing and sharing complex urban data with users, sourcing high-quality new datasets and creating accurate metadata to help users discover and use our collections.
In this blog, our Information Services Officer Heather Sinclair outlines the processes of supplying our users with data, acquiring new datasets and cataloguing the data.
A day in the life of administering our data service
When a new user contacts the service, we welcome them and provide background information about the Centre and its goal of supporting research to improve social, economic and environmental well-being in cities.
We then send out a project summary sheet. This gives potential service users an opportunity to summarise their projects, which enables us to verify their eligibility to access data, identify the most relevant datasets for their project and decide whether any data extracts will be required. We also use the project summaries to report on dataset usage to data owners and other interested parties.
After checking that the usage is permitted within the licence terms and conditions, we send out the licences for the end-user to sign. The licences for safeguarded data vary by dataset and stipulate how the data can and cannot be used (for example, for non-commercial academic research, commercial purposes or teaching).
Once the signed licences are received, we share the data. The user journey does not stop there, however: we maintain regular contact with users to ensure that they can access the data successfully and make the most of it.
We are always interested to hear how users have utilised the data in their research, whether they have produced publications or presented at conferences as a result, and whether the data has had an impact on policy making. One of the terms and conditions that data users agree to is to inform us of publications and other outputs, and of any other outcomes or impact achieved.
Acquiring new datasets
We often receive requests for new types of data and we actively invite suggestions for data acquisitions. On receipt of a request, we search for and source new forms of data, typically those which align with UBDC’s main research projects: Education and Skills; Housing and Neighbourhoods; Transport and Mobility; and Urban Governance. Strong candidate datasets are those with potential for academic benefits and impact for UBDC, with terms and conditions that permit suitably widespread academic research use.
We evaluate datasets to ensure they meet a high standard of quality and utility. Possible evaluation criteria include data collection methodology and provenance, recency, frequency of updates, data coverage (often we prioritise coverage in areas already well served with existing data to support comparison and linking) as well as basic data completeness and correctness. We also consider whether any special software is required for the user to view the data. A further critical consideration is whether data supply can be sustained: excessively costly datasets, or those that offer only a moment-in-time snapshot, may be less attractive as there may be less scope to refresh or add to the data over time.
Creating metadata for cataloguing data
After liaising with new data owners and acquiring a new dataset or creating a derived dataset, we immediately create a metadata record, which accurately describes the version of the data and records information such as title, description, spatial and temporal coverage and the variables. All metadata records can be found in the Data Portal, where potential users can view them to establish whether a dataset will meet their project needs. The records also provide a citation to use when referencing the data in a publication, as well as information about licensed usage and technical information about the dataset.
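To make the idea of a metadata record more concrete, here is a minimal sketch in Python. It is only an illustration: the class name MetadataRecord, its field names and the example values are all hypothetical and do not reflect the actual schema used in the Data Portal.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class MetadataRecord:
    """Hypothetical minimal record; field names are illustrative only."""
    title: str
    description: str
    version: str
    spatial_coverage: str
    temporal_coverage: str
    variables: List[str] = field(default_factory=list)
    licence: str = ""
    citation: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Placeholder values, invented purely for illustration.
record = MetadataRecord(
    title="Example dataset",
    description="Placeholder description for illustration only.",
    version="1.0",
    spatial_coverage="Example city",
    temporal_coverage="2020-01-01/2020-12-31",
    variables=["site_id", "timestamp", "count"],
)

# Lists the fields that still need to be filled in before publishing.
print(record.missing_fields())
```

A simple completeness check like missing_fields() is one possible way to spot records that lack licence or citation details before they are published.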
There are some important principles to consider when creating metadata. The FAIR Data Principles aim to increase Findability, Accessibility, Interoperability and Reusability. They recommend the use of persistent identifiers, indexing of metadata to enable its retrieval and accurate recording of provenance. There are also various metadata schemas available, such as DataCite, which provides metadata schemas for datasets to ensure standardisation of records. DataCite also offers a minting service, which produces a Digital Object Identifier (DOI). A DOI is a persistent identifier used to identify objects (including datasets) uniquely. An example is 10.1109/5.771073. The benefit of having a DOI is that if the URL changes over time, e.g., the resource moves to a different location, the same DOI will continue to resolve to the correct resource at its new location.
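As a small illustration of how a DOI is used in practice, the sketch below turns a DOI into a link through the public doi.org resolver. The function name doi_to_url is our own invention, and the pattern check is a deliberately simplified assumption (a "10." prefix, a registrant code of digits, a slash and a suffix) rather than a full validation of DOI syntax.

```python
import re
from urllib.parse import quote

# Simplified shape check (an assumption, not the full DOI syntax):
# "10." + registrant code + "/" + a non-empty suffix without spaces.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI using the doi.org proxy."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slash separator.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("10.1109/5.771073"))
```

Because the link goes through the resolver rather than pointing at the host directly, it keeps working if the resource later moves.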
The Data Catalog Vocabulary (DCAT) is another useful metadata standard, as the use of a standard vocabulary can increase interoperability between different organisations’ data catalogues. The UBDC Data Portal is our data catalogue. The main purpose of the data portal is to host open data files, which are freely available and accessible without logging in.
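To give a flavour of what a DCAT description can look like, here is a minimal sketch that builds a JSON-LD document for one dataset using DCAT and Dublin Core terms. The helper name dcat_dataset, the example.org URLs and the choice of properties are assumptions made for illustration; this is not how the UBDC Data Portal generates its records.

```python
import json


def dcat_dataset(title, description, download_url, licence_url):
    """Describe one dataset with a single distribution as JSON-LD."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dct:license": {"@id": licence_url},
            }
        ],
    }


# Placeholder values; the example.org URLs are not real resources.
doc = dcat_dataset(
    title="Example open dataset",
    description="Placeholder description for illustration only.",
    download_url="https://example.org/data/example.csv",
    licence_url="https://example.org/licences/open",
)
print(json.dumps(doc, indent=2))
```

Because the property names come from a shared vocabulary, another organisation's catalogue can interpret the record without needing to know our internal field names.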
I hope that this brief overview of the types of activities involved in administering a data service has been informative. However, there are many complex issues to consider regarding open data, data acquisitions, data curation and metadata creation. If you have any queries about these processes, please comment below or get in touch.
Heather Sinclair is the Information Services Officer for the Urban Big Data Centre. She provides information management services to enhance data collections and support data programmes.