Use of Artificial Intelligence (AI) and Large Language Models (LLMs) for Secondary Use of QDR-published Data


1. Purpose and Principles

This policy sets out how researchers may and may not use artificial intelligence (AI) tools, including large language models (LLMs), with data obtained from the Qualitative Data Repository (QDR). It interprets existing QDR access agreements, licenses, and data classification rules in the context of new technologies for secondary use.

QDR is committed to:

  • Upholding general restrictions on re-dissemination that apply to the vast majority of data in the repository;
  • Enabling responsible and innovative secondary analysis while minimizing risks to research participants and data confidentiality; and
  • Protecting the rights and expectations of data depositors.

2. Key Constraint: Prohibition on Re‑dissemination

Almost all data distributed by QDR are licensed under agreements (Standard Access and Controlled Access) that prohibit re‑dissemination of the data to third parties. For the purposes of this policy, providing QDR data to an AI system may constitute re‑dissemination if that system retains, reuses, or incorporates user-provided data beyond the immediate analytic task. Because of this, the permissibility of using AI tools depends primarily on how those tools handle user-submitted data.

3. Categories of AI and LLM Tools

For clarity, QDR distinguishes among three broad categories of LLM-based tools. These categories are based on publicly stated terms of use and common deployment models, rather than on technical specifications.

Group 1: Online LLMs That May Retain or Reuse Submitted Data

These are online AI tools that, according to their terms of service, may use prompts or uploaded content for purposes such as service improvement or model training.

  • Examples: public or free versions of widely available chat-based AI systems

Policy: No QDR data may be used with Group 1 LLMs.
Use of QDR data in these tools risks incorporating the data into future AI models, which would constitute impermissible re‑dissemination and would violate depositor agreements.

Group 2: Online LLMs That Contractually Commit Not to Retain User Data

These are online AI services that explicitly commit, by contract or terms of use, not to retain, reuse, or train on user-provided data. They might also allow users to request deletion of submitted content.

  • Examples: paid API-based services or enterprise versions (e.g., university subscriptions) of commercial LLMs.

Policy: Standard-access (non-restricted) QDR data may be used with Group 2 LLMs. Restricted or controlled-access QDR data may not be used with Group 2 LLMs.
QDR considers standard-access data to be "safe data" with relatively low barriers to access and limited storage requirements. While companies' claims about data handling cannot be directly verified, this level of trust is considered comparable to commonly used commercial storage and collaboration platforms (e.g., cloud storage services) that are not explicitly prohibited by QDR. However, because controlled and restricted data require stronger safeguards, the use of such data with online systems that are not verifiably isolated from the internet is not permitted.

Group 3: Offline or Locally Run LLMs

These are AI models that are run locally on a researcher's machine or within a secure institutional environment; have no external network connections during use; and do not transmit data to third parties.

  • Example: Locally installed LLMs

Policy: Group 3 LLMs may be used with any QDR data that the researcher is otherwise authorized to download, provided that all existing QDR data security and storage requirements are met.
QDR treats offline LLMs as comparable to other local analysis software. When run in compliance with applicable data security plans, they do not raise additional re‑dissemination concerns. Researchers applying for access to controlled data must describe the setup of these tools and the associated data security measures in their application.
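
For readers who find a decision rule easier to scan than prose, the three group policies above can be sketched as a small lookup. This is an illustrative summary only, not an official QDR tool: the names ToolGroup, AccessLevel, and is_use_permitted are hypothetical, and the sketch assumes that "restricted" and "controlled-access" data are treated alike. It does not check the data security and storage requirements that still apply to Group 3, and it does not cover university-hosted or hybrid services.

```python
from enum import Enum


class ToolGroup(Enum):
    """Hypothetical labels for the three tool categories in this section."""
    MAY_RETAIN_DATA = 1    # Group 1: online, may retain or reuse submitted data
    NO_RETENTION = 2       # Group 2: online, contractually commits not to retain
    LOCAL_OFFLINE = 3      # Group 3: run locally or offline


class AccessLevel(Enum):
    """Hypothetical labels; restricted and controlled data are assumed alike."""
    STANDARD = "standard"
    CONTROLLED = "controlled"


def is_use_permitted(group: ToolGroup, access: AccessLevel) -> bool:
    """Return True if the policy text permits this tool/data combination."""
    if group is ToolGroup.MAY_RETAIN_DATA:
        # Group 1: no QDR data may be used.
        return False
    if group is ToolGroup.NO_RETENTION:
        # Group 2: standard-access data only.
        return access is AccessLevel.STANDARD
    # Group 3: any data the researcher is authorized to download, provided
    # existing security and storage requirements are met (not checked here).
    return True
```

A "True" result here means only that the combination is not ruled out by the group policies; the researcher's data access agreement and any applicable data security plan still govern actual use.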

4. University-Hosted or Hybrid AI Services

Some universities operate AI services that fall between Groups 2 and 3 (e.g., institutionally hosted tools with strong contractual and technical safeguards).

Policy: At present, QDR evaluates such services on a case-by-case basis. Researchers considering the use of such tools with restricted-access data should contact QDR for a consultation prior to applying for access.

As a general principle, QDR is more likely to permit such use where the university classifies the tool as permissible for sensitive or confidential data under its own data policy and the tool operates within a controlled and audited institutional environment.

5. Researcher Responsibilities

Researchers using AI tools with QDR data are responsible for:

  • Selecting tools that comply with this policy and with the terms of their QDR data access agreement;
  • Ensuring that AI use does not increase disclosure risk (e.g., by reconstructing identities or making sensitive linkages);
  • Maintaining appropriate documentation of AI use, particularly where required by journals, funders, or institutional review boards;
  • Remaining accountable for all analytic outputs produced with the assistance of AI tools.

Use of AI tools should not shift responsibility for data protection, analytic validity, or ethical research conduct away from the human researcher.

6. Disclosure and Transparency

QDR encourages transparency in the use of AI tools. Researchers should disclose their use of AI-assisted methods when appropriate, including in publications, codebooks, or methodological appendices, consistent with disciplinary norms and publisher requirements.

7. Enforcement, Monitoring, and Communication

QDR recognizes that monitoring the use of data in certain online AI systems is difficult. As a result:

  • This policy relies primarily on clear communication, researcher good faith, and professional norms.
  • Violations of this policy may constitute violations of QDR data use agreements and may result in suspension or termination of access. QDR may also pursue professional repercussions through the violating user's institution.

QDR will continue to update guidance as AI technologies and community standards evolve and welcomes feedback from depositors and users. This policy will be reviewed periodically to reflect changes in AI technology, legal frameworks, and best practices in social science research data stewardship.

Researchers with questions or uncertainty about specific tools or use cases should contact QDR staff before using AI tools with QDR data.

This policy emerged in part from discussions with ICPSR, which has similar policies for its holdings.