"Working" file formats, those used in the course of collecting and working with project data, are not always ideal for re-use or long-term preservation, and may not meet the requirements of data archives or repositories or satisfy the expectations of research funders. However, planning for and adopting file formats that are well supported early in your research will help later on when you prepare your data for curation and sharing.
In the absence of specific directives from funders or repositories, it can be unclear which file formats to use. For general advice, see the library's guide to recommended file formats for long-term data curation.
See also DataONE's Document and Store Data Using Stable File Formats for useful information about file formats.
Open, non-proprietary formats are far more likely to remain usable even if the software that created them is unavailable or no longer works. Formats whose documentation is complete and freely available are also more likely to be preserved over the long term. If the program that created a file is the only way to read or access its data, the file is probably in a proprietary, non-open format. As a general rule, plain text formats, such as comma- or tab-delimited files, are open formats and are typically better for re-use and long-term preservation.
Examples of proprietary formats: Excel .xlsx file, Photoshop .psd file
Examples of open formats: comma separated .csv file, .tiff image file
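As a rough illustration of why plain text formats are easy to re-use, the following Python sketch writes a small table to comma-delimited text and reads it back using only the standard library. The column names, values, and helper function names are made up for this example and are not drawn from any particular dataset or tool.

```python
import csv
import io

# Hypothetical sample data; the column names and values are illustrative only.
rows = [
    {"site": "A", "temperature_c": "12.5"},
    {"site": "B", "temperature_c": "14.1"},
]


def write_csv(records):
    """Serialize a list of dicts to comma-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["site", "temperature_c"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def read_csv(text):
    """Parse comma-delimited text back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = write_csv(rows)   # plain text that any editor can open
restored = read_csv(csv_text)
```

Because the result is ordinary text with a header row, it can be opened in a text editor or imported by most spreadsheet and statistics programs without the software that produced it.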
Formats that compress the information in a file often produce smaller files, but the compression often permanently removes data from the file. Such formats are "lossy," while formats that allow all of the original information to be restored when the file is uncompressed are "lossless."
Examples of lossy formats: .mp3 audio file, .jpeg image file
Examples of lossless formats: .wav audio file, .tiff image file
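The difference between lossless and lossy handling can be sketched in a few lines of Python. The example below uses the standard library's zlib module as one instance of lossless compression, and a simple rounding step as a toy stand-in for lossy reduction. The rounding step is only an assumption made for illustration; it is not how .mp3 or .jpeg encoding actually works, and the sample data is invented.

```python
import zlib

# Illustrative data: a repetitive byte string, which compresses well.
original = b"temperature,12.5\n" * 1000

# Lossless: zlib (DEFLATE) restores exactly the bytes that went in.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Toy "lossy" step (NOT a real audio or image codec): rounding discards
# detail, and the original values cannot be recovered from the result.
samples = [0.12, 0.57, 0.91]
quantized = [round(s, 1) for s in samples]

print(len(original), len(compressed), restored == original)
print(samples, quantized)
```

The compressed bytes are smaller than the original yet decompress to an identical copy, whereas the rounded values no longer carry the digits that were dropped.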
If the encryption key, passphrase, or password to a file is lost, there may be no way to retrieve the data from the file later, rendering it unusable to you and to others.
Uncompiled source code is more readily re-usable by others and is far more likely to remain usable over time, since it can be recompiled on different architectures and platforms.