Formatting Data

A file format is a specific way of structuring information for storage in a digital medium so that it is “understandable” in a machine context. Formats should be readable by as many types of systems as possible, and ideally by humans as well, without compromising the purpose of the data.

These recommendations are meant to guide the way data are created following best practices. The formats in which data are created and stored determine their legal and technical fit for sharing in QDR and for the longevity of access to them.

The submission formats recommended here ensure that data are stored in the most appropriate ways for digital display, long-term archiving, and prevention of technological obsolescence. We recommend that you use these formats when creating and storing your data. QDR will, however, accept data in all common data formats. Please include information about any unusual or proprietary file format in your first communication with QDR. We may request that you convert data stored in such formats, and we are happy to assist you in doing so.

Recommended Formats for Submission of Born-Digital Data Files

Data Type: Recommended Formats
Text: PDF/A, Rich Text Format (.rtf), plain text or HTML (.txt, .html), Extensible Markup Language (.xml) following an appropriate schema.
Images: Tagged Image File Format, uncompressed (.tif); JPEG (.jpg) when the image is created as JPEG (e.g., digital camera images); Scalable Vector Graphics (.svg) or Encapsulated PostScript (.eps) for vector graphics (e.g., figures created by many statistical tools).
Audio: Broadcast Wave Format (.bwf; ideal when available), Wave audio format (.wav), MP3 (.mp3), OGG (.ogg). Lossless formats like BWF and WAV are preferable when they are used from recording through deposit, but do not convert audio recorded in a compressed format (MP3, OGG) to WAV, since conversion does not restore the discarded information.
Video: MP4 (.mp4)
Geospatial Data: ESRI Shapefiles (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn), georeferenced TIFF (GeoTIFF, .tif)
Tabular Data: Stata, SPSS, and R data files (.dta, .sav, .RData) for data with rich metadata; comma- or tab-separated values (.csv or .tab) for data with little metadata.
CAQDAS: There is currently no good storage or exchange format for CAQDAS (computer-assisted qualitative data analysis software) projects. Deposit both the proprietary file produced by your software of choice, with full version information, and an export in a common data format (like .rtf), even though such an export involves some data loss. If your project includes analysis via a CAQDAS package, please note this during your first communication with QDR and we will suggest appropriate export alternatives.
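Depositors who want to check a folder of files against the list above before contacting QDR could do so with a short script. The following is a minimal sketch in Python, not a QDR tool: the dictionary grouping, the function names, and the use of .pdf as a stand-in for PDF/A are all assumptions made for illustration. An extension match only suggests a format; it does not confirm, for example, that a PDF conforms to PDF/A or that a TIFF is uncompressed.

```python
from pathlib import PurePath

# Extensions copied from the recommended-formats list. The grouping and all
# names below are hypothetical and for illustration only.
RECOMMENDED = {
    "text": {".pdf", ".rtf", ".txt", ".html", ".xml"},  # .pdf stands in for PDF/A
    "image": {".tif", ".jpg", ".svg", ".eps"},
    "audio": {".bwf", ".wav", ".mp3", ".ogg"},
    "video": {".mp4"},
    "geospatial": {".shp", ".shx", ".dbf", ".prj", ".sbx", ".sbn", ".tif"},
    "tabular": {".dta", ".sav", ".rdata", ".csv", ".tab"},
}

# The three components the list marks as essential for an ESRI Shapefile.
SHAPEFILE_REQUIRED = {".shp", ".shx", ".dbf"}


def data_types_for(filename):
    """Return the data types whose recommended extensions match the file."""
    ext = PurePath(filename).suffix.lower()
    return sorted(kind for kind, exts in RECOMMENDED.items() if ext in exts)


def missing_shapefile_parts(filenames):
    """Map each shapefile base name to the essential components it lacks."""
    found = {}
    for name in filenames:
        path = PurePath(name)
        ext = path.suffix.lower()
        if ext in SHAPEFILE_REQUIRED:
            base = str(path.with_suffix(""))
            found.setdefault(base, set()).add(ext)
    # Only report groups that actually contain a .shp file.
    return {
        base: sorted(SHAPEFILE_REQUIRED - exts)
        for base, exts in found.items()
        if ".shp" in exts and SHAPEFILE_REQUIRED - exts
    }
```

Calling data_types_for("notes.docx") returns an empty list, which would flag the file as worth mentioning in your first communication with QDR. A .tif file matches both the image and geospatial types, because the extension alone cannot distinguish a GeoTIFF from an ordinary TIFF.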

If you have analog material (e.g., hard copies of documents, tape recordings) that you would like to digitize for deposit, please contact QDR for guidance.

As part of QDR’s policies for ensuring sustainable access to its deposits such that they remain understandable and usable, repository staff will also periodically monitor data projects for possible obsolescence of formats. Obsolescence will be evaluated by comparison with common software used in university social science departments, as well as by format-related support requests to the site. If QDR decides to change the format of a file, the original depositor – if available – will be consulted concerning minimizing modification to the intellectual content of the original. However, as part of the original deposit process all depositors grant QDR the right to modify their data files in accordance with this policy.