Data Project Types

QDR accepts three types of data projects, each a different aggregate of qualitative data: active citation compilations, data collections, and topic clusters. Although in some respects the three differ only in degree, they differ in important ways in how they are constructed and in how they are typically used.

Active Citation Compilations

“Active citation” was originally developed by Andrew Moravcsik (2010, 2012, 2014). The approach entails digitally enhancing the citations (in-text references, footnotes, and/or endnotes) in a piece of scholarship based on qualitative or multi-method research. Citations that support important claims, conclusions, and inferences are “activated”: they are hyperlinked to a Transparency Appendix (TRAX) containing more rigorous, annotated versions of those citations. In addition, when the sources underlying the activated citations can be shared legally and ethically, the TRAX includes hyperlinks to these sources, which are gathered in an active citation compilation.

The TRAX begins with a short overview of the research trajectory, the data collection, and the type of inference or interpretation underlying the qualitative methods employed. Thereafter, for each citation a scholar wishes to activate, the TRAX contains a “citation entry” that includes the original citation (as it appears in the piece of scholarship), an optional “citation annotation,” and eight elements for each source referenced in the citation: (a) an identification label; (b) a source excerpt; (c) a source annotation explaining the connection between the source and the textual claim it supports; (d) a full bibliographic reference and additional information for locating the source; (e) an electronic link to the source material, if it is available online (optional); (f) the source file (optional); (g) information on whether the provided source may be shared; and (h) information on why a source is not provided, where applicable.
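For readers who find it helpful to see the TRAX layout as a record structure, the following is a minimal, hypothetical sketch in Python. The class and field names (`Trax`, `CitationEntry`, `SourceElement`, `shared_sources`) are illustrative labels of our own choosing, not a QDR file format or schema; they simply mirror the overview, the citation entry, and elements (a) through (h) described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceElement:
    """Hypothetical record of the eight elements kept for each cited source."""
    label: str                          # (a) identification label
    excerpt: str                        # (b) source excerpt
    annotation: str                     # (c) how the source supports the textual claim
    reference: str                      # (d) full bibliographic reference and location info
    url: Optional[str] = None           # (e) link to the source online (optional)
    source_file: Optional[str] = None   # (f) source file (optional)
    shareability: str = ""              # (g) whether the provided source may be shared
    non_provision_note: str = ""        # (h) why the source is not provided, if applicable


@dataclass
class CitationEntry:
    """Hypothetical record of one activated citation."""
    original_citation: str                     # as it appears in the publication
    sources: List[SourceElement]               # one record per source in the citation
    citation_annotation: Optional[str] = None  # optional citation annotation

    def shared_sources(self) -> List[SourceElement]:
        """Return the sources for which a link or a file is provided."""
        return [s for s in self.sources if s.url or s.source_file]


@dataclass
class Trax:
    """Hypothetical TRAX: an overview followed by citation entries."""
    overview: str
    entries: List[CitationEntry] = field(default_factory=list)


# Usage with bracketed placeholder text (not real data):
entry = CitationEntry(
    original_citation="[citation as printed in the footnote]",
    sources=[
        SourceElement(
            label="S1",
            excerpt="[quoted passage]",
            annotation="[how the passage supports the claim]",
            reference="[full bibliographic reference, page]",
            source_file="S1.pdf",
            shareability="[permission status]",
        )
    ],
)
trax = Trax(overview="[overview of the research]", entries=[entry])
print(len(entry.shared_sources()))  # prints 1
```

In this sketch, a source with neither a link nor a file would be documented only through its excerpt, annotation, and the note in element (h).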

Active citation makes qualitative and multi-method research transparent, facilitating its evaluation. Research transparency consists of:

  • production transparency: describing and explaining the processes employed to generate the data deployed in a social science publication; and
  • analytic transparency: describing and explaining how the data cited in a publication support the claims, inferences, and interpretations made therein.

A Guide to Active Citation offers instructions for using this technique to enhance research transparency and to share data. QDR provides instructions for constructing a TRAX for an existing research publication.

Active citation compilations often include an unrepresentative subset of the data a scholar consulted, collected, or generated while carrying out the research and analysis on which the scholarship is based. While the materials may share a unifying logic, at the limit they might be a set of ephemera with no intrinsic connection other than their use in the author’s narrative. For this reason, active citation compilations are most often used to evaluate the claims made in (or, when appropriate, to replicate) qualitative or multi-method research.

Data Collections

A data collection is a coherent group of data that relate to each other in identifiable, describable ways and represent a stand-alone resource that could be analytically useful to scholars other than the one who collected or generated the data. A data collection is organized by some categorization or logic that makes it more than an aggregation of materials that happened to be used.

Data collections vary widely in structure. For instance, they may contain formalized information gathered within pre-set categories and come in the form of a preconfigured database (i.e., organized in rows and columns), or they may be a particular group of documents or a specific set of interview transcripts. They may contain many different types of data, including data relevant to different aspects of a research project: some data may measure a variable, while other data may be used for process tracing, representing a sequence of information leading up to an outcome. The key is that the logic connecting the data can be elucidated.

Data collections are most useful for secondary analysis. 

Topic Clusters

A topic cluster is an unstructured amalgamation of materials on a particular issue or subject. When conducting research, qualitative political scientists invariably gather considerably more information than they ever carefully organize and analyze. The materials in the “everything else” box can be a windfall for other scholars who are addressing the same or a similar topic or searching for relevant background information.

Topic clusters are most helpful to scholars interested in gaining background information as they develop a related research project. 

Data Project Types Compared

Active citation compilations differ from data collections and topic clusters in three ways. First, the primary purpose of an active citation compilation is to support the claims made in a piece of scholarship based on qualitative or multi-method research. Its main use is to “thicken” those claims, increasing their persuasiveness and (where appropriate) easing replication. Although an active citation compilation may be used for secondary data analysis outside the context of the original research project, that is not its main goal. By contrast, neither a data collection nor a topic cluster need be connected to a particular piece of scholarship; the data included in these projects may never have been analyzed.

Second, while a data collection has an underlying logic that unites the data it includes, neither an active citation compilation nor a topic cluster has such a logic. An active citation compilation may (and usually will) include sources with no intrinsic connection among them other than their citation in a particular piece of scholarship. A topic cluster may include sources connected only by their relation to the same topic.

Finally and relatedly, because of the objective for which it is assembled, an active citation compilation generally includes only a subset of the data a scholar generated in the course of the research and analysis underlying the scholarship whose citations have been activated. Of course, a data collection may also include only a subset of the data a scholar generated in relation to a particular project. Nevertheless, a data collection is typically viewed as “complete” in the sense that it can be treated as a stand-alone resource. A topic cluster is less structured but more complete than an active citation compilation, because it often includes everything a scholar happened to collect on a particular topic.