Preparing a Data Project: Organization and Documentation


Research data that are well-organized and well-documented when they are shared are easier for other scholars to interpret, and more readily employed for secondary research, for evaluating published research with which the data are associated, and for pedagogical purposes. QDR works closely with depositors to curate the empirical materials they deposit with us in order to facilitate their re-use. Nonetheless, there are many steps that you can take in advance of depositing your data to facilitate the curation process and enhance the quality of your data. We encourage potential depositors to be in touch with QDR from the early stages of data project development so we can offer guidance and assistance with data management and deposit preparation.

QDR prioritizes data projects comprising purely qualitative data, or comprising both qualitative and quantitative data. Most of the advice we offer below applies to both types of data.

Data files – Organizational principles

QDR emphasizes good data organization. We encourage you to offer information about the organization of your project during the deposit process, and to ensure that your file naming convention reflects that organization (see below). Identify the types of data sources your data project includes (e.g., individual interview transcripts, location-based focus group recordings, public documents obtained from archives, articles on a given topic arranged chronologically from a single source publication). Also, try to identify the primary and secondary dimensions along which your data project is organized. We suggest that the top-level organization of data files reflect the first dimension, and lower-level organization reflect the second dimension. For instance, if data were generated through repeated visits to several sites, the first dimension for organizing the resulting files might be location, while the second could be the dates on which the data were collected.
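As an illustration only, the two-dimension layout described above can be sketched in Python. The function name, the root folder `project`, the site name `SiteA`, and the `.docx` extension are hypothetical choices made for this example, not part of QDR's guidance; the sketch assumes location is the first dimension and collection date the second.

```python
from datetime import date
from pathlib import PurePosixPath


def organized_path(root: str, site: str, collected_on: date, filename: str) -> PurePosixPath:
    """Build root/<site>/<YYYYMMDD>/<filename>: location first, date second."""
    # Top-level folder reflects the first dimension (location, assumed here).
    # The next level reflects the second dimension (collection date, assumed here).
    return PurePosixPath(root) / site / collected_on.strftime("%Y%m%d") / filename


# Hypothetical example: an interview collected at "SiteA" on 31 December 2015.
example = organized_path("project", "SiteA", date(2015, 12, 31), "Pietro_Interview 1_20151231.docx")
print(example)  # project/SiteA/20151231/Pietro_Interview 1_20151231.docx
```

If your project is organized along different dimensions (for example, interviewee type and then interview wave), the same pattern applies with those values substituted for site and date.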

The naming convention for a project should be based on the organization that you have chosen for the materials, as described above. We recommend that you include a few additional items in all file names. We offer two models, and discuss their elements below:

Ex: Camp_Interview protocol_DOCUMENTATION_20170825 QDR: 10055
Ex: Pietro_Interview 1_20151231 QDR: 100XX

  1. The first element in each file name should be the last name of the primary author of the data project; this may be you if you are depositing the data. In addition, for file formats that allow editable text (e.g., Microsoft Word, Excel, plain-text files), we recommend that you insert a header or documentation tab in each file listing the author’s last name and a narrative version of the file name.

    [Image: sample document header from http://doi.org/10.5064/F6MS3QNV]

    Alternatively, for many file types, similar functionality can be achieved by entering the information in the file properties fields; this information will only be visible in the digital version of the file, not in hard copies. QDR staff can assist with this during curation.

  2. An underscore (_) should be used as the separator between elements in the file name.
  3. For all documentation files (e.g., consent scripts, IRB application, de-identification protocol), the suffix “DOCUMENTATION” should be used to indicate that those files are not data files.
  4. Where applicable, the last element in the file name should be the date of creation in YYYYMMDD format (the ISO 8601 basic date format).
  5. If a file corresponds to an interviewee or an organization whose identity is not to be revealed, do not use the name or any other direct or indirect identifier as part of the file name or header.

By applying these conventions when you are first naming files (as you collect them), you can greatly expedite the deposit and curation process.
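The naming elements listed above can be assembled programmatically. The following is a minimal Python sketch, not a QDR tool: the function name `build_file_name` and its parameters are hypothetical, and the check that rejects underscores inside an element is our own assumption, added so that the underscore remains an unambiguous separator.

```python
from datetime import date
from typing import Optional


def build_file_name(author_last_name: str,
                    description: str,
                    created_on: Optional[date] = None,
                    is_documentation: bool = False) -> str:
    """Join the naming elements with underscores, in the recommended order."""
    parts = [author_last_name, description]
    if is_documentation:
        # Marks the file as documentation rather than data.
        parts.append("DOCUMENTATION")
    if created_on is not None:
        # Date of creation in YYYYMMDD format, placed last.
        parts.append(created_on.strftime("%Y%m%d"))
    for part in parts:
        # Assumption for this sketch: elements are non-empty and contain no underscore.
        if not part or "_" in part:
            raise ValueError(f"invalid file name element: {part!r}")
    return "_".join(parts)


# Reproduces the two model file names given earlier in this section.
print(build_file_name("Camp", "Interview protocol", date(2017, 8, 25), is_documentation=True))
print(build_file_name("Pietro", "Interview 1", date(2015, 12, 31)))
```

The sketch deliberately takes the description as free text, so it cannot detect identifying information; ensuring that names of confidential interviewees or organizations stay out of file names remains a manual check.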

Data files – Formats

Particular data file formats enhance usability and facilitate long-term preservation. QDR has a list of recommended formats for different types of files. QDR staff can assist with format migration where files are not in the recommended formats. If you have analog material (e.g., hard copies of documents, tape recordings) that you would like to digitize for deposit, please contact QDR for guidance.

Data files – Personal and/or confidential information disclosure

All scholars are required to adhere to ethical and legal constraints when sharing data. If you have collected data that are confidential and/or sensitive, extra attention will be necessary to ensure that they are shared ethically and legally. QDR offers detailed guidance on managing and sharing sensitive data that applies to various stages of the research process:

  • When planning your research, consider data sharing in your IRB application and in particular when writing your informed consent protocol.
  • During your research, make sure to manage and handle data responsibly and keep all data, and especially sensitive data, secure.
  • As you prepare your data for sharing, make sure you follow best practices in de-identifying data if you have promised to keep participants’ identities confidential.
  • As you deposit your data, talk with QDR about options for setting access controls that limit who can view and download some or all of the data.

QDR is happy to consult with you on each of these issues, at any point in your research trajectory.

The copyright status of one or more elements of a data project may pose challenges to sharing them via QDR. Using copyrighted works (e.g., archival items and creative publications used as secondary materials) in academic writing usually falls unambiguously under the various exceptions of copyright law, primarily “fair use” in the U.S. context. As such, you may not have spent much time considering the legal implications of dealing with copyrighted works until now. However, different considerations apply when a digital copy of a work that is under copyright is shared – because several other digital copies might have to be produced for preservation and administration purposes, and because dissemination to a larger group of users is often intended.

QDR’s curators can help you to find legal ways to provide access to data that are under copyright. For instance, we can aid you in obtaining necessary licenses and permissions to publish the materials in question. Alternatively, you might provide a detailed data listing (e.g., full bibliographical information) for data files that cannot be shared. Asking QDR for advice before you begin collecting copyrighted materials may help you to avoid having to contact copyright holders retroactively for permissions.

Preparation of Quantitative Data

Much of the advice we offered above is relevant for preparing quantitative data for sharing and archiving, and QDR curation staff can advise and assist you with questions on preparing quantitative data for sharing. Nonetheless, we recommend that you consult the more detailed and well-established guidelines provided by ICPSR.

Preparation of CAQDAS Data

Qualitative researchers use Computer Assisted Qualitative Data Analysis Software (CAQDAS) such as NVivo or ATLAS.ti. Such software can facilitate the organization and analysis of complex research relying on qualitative data. Currently, no standard format for archiving and sharing CAQDAS data is available, although QDR is working with the developers of various tools to generate a common format.

QDR recommends that you share CAQDAS data in two different forms. The first form is the raw full export from your project. Make a copy of your project and delete any information you do not want to share, such as confidential information or personal notes. Then export in the proprietary format of the vendor (e.g., .nvp for NVivo, .xml with full Hermeneutic Unit for ATLAS.ti, full project .xml for Dedoose) and deposit that file with QDR.

In order to allow users of other software (or users who do not employ software) to access your data, we recommend that you also create a “human-readable” export from that same project copy. Export all relevant files in widely used formats (such as RTF, PDF, Excel, or common video, image, and audio formats). Also export all relevant memos as RTF or PDF files. Deposit all exported files with QDR. The recommendations offered above regarding file organization and naming are also applicable to CAQDAS data.

Documentation Files

Extensive documentation is essential to make data understandable and re-usable by others. Documentation files should be clearly labeled (as described above) so that they can be displayed in a way that introduces and contextualizes the data files of a project.

The specific files that might best document the context in which information was gathered and/or data were created, the collection and generation processes, and (when applicable) how the data were analyzed, differ from project to project. QDR encourages depositors to include documentation files of the following types with their data projects:

  • Questionnaires used for surveys or semi-structured interviews
  • Guidance materials used for team-based fieldwork
  • Instructions for focus group facilitation
  • Consent forms and information sheets
  • Approved IRB application
  • Permissions or licenses from copyright holders
  • Description of methods used to analyze the data
  • Description of fieldwork and project context
  • Description of how derived materials (individual files or variables) were created
  • Coding schemas

When you deposit your data project, you should provide information about any published or in-progress written products that you would like to be linked to the data project. Full bibliographic information for such work is displayed alongside the data project published on QDR following best practices for such linkages. You should also include a full citation to the data project published via QDR as part of the regular bibliography of any publications in which you use or reference the data project. QDR users who view or download your data project will be encouraged to do the same, and you should likewise encourage anyone who contacts you about using the data project in their own research to cite it appropriately.