"Working" file formats, those used in the course of collecting and working with project data, are not always ideal for re-use or long-term preservation, and may not meet the requirements of data archives or repositories or satisfy the expectations of research funders. However, planning for and adopting file formats that are well supported early in your research will help later on when you prepare your data for curation and sharing.
In the absence of specific directives from funders or repositories, it can be unclear which file formats to use. For general advice, see the library's guide to recommended file formats for long-term data curation.
See also DataONE's Document and Store Data Using Stable File Formats for useful information about file formats.
Select Open, Non-Proprietary Formats
Open, non-proprietary formats are far more likely to remain usable even if the software that created them is unavailable or no longer functional. Formats whose documentation is complete and freely available are also more likely to be preserved over the long term. If the only way to read or access a file is with the program that created it, the format is likely proprietary and non-open. As a general rule, plain text formats, such as comma- or tab-delimited files, are open formats and are typically better for re-use and long-term preservation.
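As an illustration of the plain text advice above, here is a minimal sketch that writes tabular data to comma-delimited text and reads it back using only Python's standard library. The sample records and the function names (to_csv_text, from_csv_text) are hypothetical and stand in for your own project data and tooling.

```python
import csv
import io

# Hypothetical sample records standing in for project data.
rows = [
    {"site": "A", "date": "2021-03-01", "temp_c": "12.5"},
    {"site": "B", "date": "2021-03-01", "temp_c": "14.1"},
]


def to_csv_text(records):
    """Serialize a list of dicts to comma-delimited text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse comma-delimited text back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = to_csv_text(rows)
restored = from_csv_text(csv_text)

# The data survives the round trip without the program that created it.
print(restored == rows)
```

Because the result is ordinary text, it can be opened in any text editor or spreadsheet program. Note that CSV stores every value as text, so units and data types should be documented separately.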
Select "Lossless" Formats
Formats that compress the information in a file are often smaller, but some compression methods permanently remove data from the file. These formats are "lossy," while formats that reproduce the original information exactly when uncompressed are "lossless." For example, JPEG images and MP3 audio are typically lossy, while PNG images and FLAC audio are lossless.
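One way to confirm that a compression step is lossless is to compress the data, uncompress it, and compare checksums of the original and the restored copy. The sketch below does this with Python's standard zlib and hashlib modules. The sample data and the helper names (sha256_hex, lossless_round_trip) are assumptions for illustration only.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def lossless_round_trip(data: bytes) -> bool:
    """Compress and decompress the data, then compare checksums."""
    compressed = zlib.compress(data, 9)  # 9 = strongest compression level
    restored = zlib.decompress(compressed)
    return sha256_hex(restored) == sha256_hex(data)


# Hypothetical sample: a header line plus many repeated rows.
sample = b"site,date,temp_c\n" + b"A,2021-03-01,12.5\n" * 1000

compressed_size = len(zlib.compress(sample, 9))
round_trip_ok = lossless_round_trip(sample)

print(len(sample), compressed_size, round_trip_ok)
```

A lossy format would fail this kind of check, because the restored data would no longer match the original byte for byte.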
Select Unencrypted and Uncompiled Formats
If the encryption key, passphrase, or password to a file is lost, there may be no way to retrieve the data from the file later, rendering it unusable to you and to others. Uncompiled source code is more readily re-usable by others and is far more likely to remain usable over time, since it can be recompiled on different architectures and platforms.
For more information, contact the Digital Commons Team at (912) 478-4056 or firstname.lastname@example.org. A member of the Digital Commons Team will contact you as soon as possible during regular business hours.
Portions of this guide are adapted from the Cornell University Research Data Management Services Group website under a Creative Commons Attribution 4.0 International License.