Improving QDR's Dataverse for Qualitative Data


DOI: https://doi.org/10.59350/5q4p1-ew181

When QDR adopted the Dataverse platform in early 2017, one of our goals was to improve the software, which was developed primarily with quantitative data in mind, for qualitative data and the researchers using it. A little more than one year into using Dataverse software at QDR, we have made significant strides in this direction. Here is a quick overview of some of our biggest additions. I also talk about some of these in the video below.

Qualitative data come in many formats – text, audio, video, images, and more – yet textual data make up by far the largest component of QDR’s (and other qualitative data repositories’) holdings. We therefore care deeply that these data are easy to find. To make them findable, we need to go beyond the description of projects: we need to allow users to search what’s inside the data files. For tabular, quantitative data, Dataverse already allows for this by extracting variable-level metadata and including it in searches. The analog for qualitative data is straightforward: users need to be able to search for text in files.

Full text search of text and PDF files is now available in QDR and, based on our contributions, in other repositories using Dataverse software such as the Harvard Dataverse. Many of the data files in QDR, however, are restricted so that only registered users can view them (and some carry further restrictions). How do we make these findable without exposing potentially sensitive contents? Any search on QDR shows only the results the searching user has access to: a guest user’s search mostly covers documentation, an authenticated user’s search covers most files, and an admin’s search covers all files, in both published and unpublished projects. Full text search is available both across data projects and within any project.
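To make the idea concrete, here is a minimal sketch in Python of access-aware full text search. It is an illustration only, not the code running in Dataverse or at QDR: the three access levels, the `IndexedFile` record, the `search` function, and the sample files are all assumptions invented for this example.

```python
from dataclasses import dataclass

# Hypothetical access levels, ordered from least to most privileged.
GUEST, REGISTERED, ADMIN = 0, 1, 2


@dataclass(frozen=True)
class IndexedFile:
    name: str       # file name shown in search results
    text: str       # full text extracted from the file
    min_level: int  # lowest access level allowed to see this file


def search(index, query, user_level):
    """Return names of files that contain the query AND that the user may see."""
    needle = query.lower()
    return [
        f.name
        for f in index
        # The permission check is part of the query itself, so a restricted
        # file can never appear in the results of an unauthorized search.
        if f.min_level <= user_level and needle in f.text.lower()
    ]


# Made-up sample index: documentation is public, data files are restricted.
INDEX = [
    IndexedFile("readme.txt", "Interview protocol and documentation", GUEST),
    IndexedFile("transcript_01.txt", "Interview with a city official", REGISTERED),
    IndexedFile("draft_notes.txt", "Unpublished interview notes", ADMIN),
]

if __name__ == "__main__":
    for label, level in [("guest", GUEST), ("registered", REGISTERED), ("admin", ADMIN)]:
        print(label, search(INDEX, "interview", level))
```

The same query returns a different result list depending on who is asking, which is the behavior described above: the permission filter is applied while searching rather than afterwards, so no hint of a restricted file reaches a user who cannot open it.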

Multimedia Viewers

Other common file types for qualitative data are images, audio recordings, and videos. In the standard Dataverse software, the only way to view such files is to download them and then open them in a viewer on your computer: that’s a lot of steps! Moreover, for larger video files, this means downloading massive files before you even know whether you’ll find them useful.

QDR therefore implemented a set of lightweight viewers, which allow you to open a large variety of files (text, HTML, PDF, image, audio, video) in a new tab for quick viewing. They’re easily accessible from a button next to the file. While QDR is currently the only Dataverse installation using these viewers, their code is open and easy to run, and several other repositories have already indicated that they will use them.

Benefitting from a Strong Open Source Community

While we at QDR focus our development efforts on Dataverse features that are particularly important for qualitative data, many such features also are developed by an active open source community, coordinated by the wonderful team at Harvard’s Institute for Quantitative Social Science (IQSS). Here are some of the biggest gains for us, as a repository for qualitative data, from the last year:

  1. Data projects with many files work much better now, and it is easy to select and download all files in such projects. Many of our projects have hundreds of files, and some have thousands, so this is of particular importance for qualitative data.
  2. Individual files in QDR now have Digital Object Identifiers (DOIs), making them clearly citable. Files in qualitative data projects often make sense by themselves (think of an individual interview transcript or a historical document), and we’ve had requests for this in the past.
  3. We are now able to display and make available for download data files organized in folder structures. If you upload a ZIP file containing folders to QDR, these are automatically preserved. Given the large number of files in projects, organization such as this is particularly important for our data. See e.g. the recently published deposits from Alisha Holland and Matt Hitt for two examples of using folders effectively to organize and display data.
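As a rough illustration of what preserving folders from a ZIP upload involves, the following Python sketch reads an archive and records, for each file, the folder path it sits in. It is not the Dataverse implementation; the function names and the example file names are hypothetical and chosen only for this sketch.

```python
import io
import zipfile
from pathlib import PurePosixPath


def folder_listing(zip_bytes):
    """Return (folder, file name) pairs for every file in a ZIP archive."""
    pairs = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # explicit directory entries carry no file content
            path = PurePosixPath(info.filename)  # ZIP paths use forward slashes
            parent = str(path.parent)
            # Files at the top level of the archive get an empty folder label.
            pairs.append(("" if parent == "." else parent, path.name))
    return pairs


def make_example_zip():
    """Build a small in-memory archive with made-up file names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("README.txt", "Project overview")
        archive.writestr("interviews/2017/transcript_01.txt", "Transcript text")
        archive.writestr("archival/letters/letter_03.txt", "Letter text")
    return buffer.getvalue()


if __name__ == "__main__":
    for folder, name in folder_listing(make_example_zip()):
        print(f"{folder or '(top level)'}: {name}")
```

Keeping the folder path as a label attached to each file, rather than flattening everything into one list, is what lets a repository later display and offer the files for download in the same structure the depositor chose.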

Are there any other areas in which you think we should do better for qualitative data? We’re always happy to hear from you by email or on Twitter.