Data Management Services

Recommended File Formats for Long-Term Data Curation: Overview

Given the current pace of change in digital technology, preserving the complete content and original functionality (e.g., “look and feel”) of certain file formats over the long term may not be practical or possible. While most repositories, including Georgia Southern Commons, take reasonable measures to preserve this content and functionality, the best way to ensure that your files retain their use and value over time is to prepare and deposit them in formats with the highest probability of long-term preservation. This is especially important for non-text formats, including images, audio and video files, spreadsheets and databases, and software.

For more information about curating your data, see our guide to curating and sharing data. Contact Jeffrey Mortimore, Digital Scholarship Librarian, for help selecting an appropriate repository and preparing your data for deposit.


File Format Characteristics

The likelihood of long-term preservation of content and functionality is higher when submitted formats possess the following characteristics:

  • complete and open documentation

  • platform-independence

  • non-proprietary (vendor-independent)

  • no “lossy” or proprietary compression

  • no embedded files, programs or scripts

  • no full or partial encryption

  • no password protection


Table of Recommended File Formats

Below is a table of file formats organized by probability of long-term preservation of content and functionality. Those formats in column A exhibit the characteristics above and thus have a higher probability of long-term preservation. Those in column C have a lower probability. Formats in column B are preferred over those in column C; however, the likelihood of their long-term preservation is not as high as for those in column A.

The library recommends that researchers depositing works in Georgia Southern Commons, OpenICPSR, or any other repository submit files in column A formats if at all possible, and consider converting files from formats with a lower probability of long-term preservation to formats with a higher probability. Contact Jeffrey Mortimore, Digital Scholarship Librarian, for help selecting file formats and with file conversion.
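As one illustration of such a conversion, the table below places plain text in ISO 8859-1 encoding in the medium-probability column and plain text in UTF-8 in the high-probability column. The following is a minimal Python sketch of that re-encoding, not a library-endorsed procedure. The function name `reencode_to_utf8` and its parameters are hypothetical, and it assumes you already know the source file's encoding.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> None:
    """Write a UTF-8 copy of a plain-text file (hypothetical helper).

    Assumes the caller knows the source encoding; nothing is auto-detected.
    """
    # Decode the raw bytes explicitly so line endings are not translated.
    text = src.read_bytes().decode(source_encoding)
    # Write the UTF-8 bytes to a new file; the original is left untouched.
    dst.write_bytes(text.encode("utf-8"))
```

Because every byte value is valid in ISO 8859-1, decoding with the wrong source encoding will not raise an error; it will silently produce incorrect characters. Spot-check the converted file before deposit, and keep the original until the result is verified.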

This table is also available as a downloadable PDF.

For each content type below, formats are grouped by column, in this order: A (high probability for long-term preservation), B (medium probability for long-term preservation), and C (low probability for long-term preservation).

Text

A (high probability):
• Plain text (encoding: US-ASCII, UTF-8, UTF-16 with BOM)
• XML (includes XSD/XSL/XHTML, etc.; with included or accessible schema)
• PDF/A-1 (ISO 19005-1) (*.pdf)

B (medium probability):
• Cascading Style Sheets (*.css)
• DTD (*.dtd)
• Plain text (ISO 8859-1 encoding)
• PDF (*.pdf) (embedded fonts)
• Rich Text Format 1.x (*.rtf)
• HTML (include a DOCTYPE declaration)
• SGML (*.sgml)
• OpenOffice (*.sxw/*.odt)
• OOXML (ISO/IEC DIS 29500) (*.docx)

C (low probability):
• PDF (*.pdf) (encrypted)
• Microsoft Word (*.doc)
• WordPerfect (*.wpd)
• DVI (*.dvi)
• All other text formats not listed here

Raster Image

A (high probability):
• TIFF (uncompressed)
• JPEG2000 (lossless) (*.jp2)
• PNG (*.png)

B (medium probability):
• BMP (*.bmp)
• JPEG/JFIF (*.jpg)
• JPEG2000 (lossy) (*.jp2)
• TIFF (compressed)
• GIF (*.gif)
• Digital Negative DNG (*.dng)

C (low probability):
• MrSID (*.sid)
• TIFF (in Planar format)
• FlashPix (*.fpx)
• Photoshop (*.psd)
• RAW
• JPEG 2000 Part 2 (*.jpf, *.jpx)
• All other raster image formats not listed here

Vector Graphics

A (high probability):
• SVG (no JavaScript binding) (*.svg)

B (medium probability):
• Computer Graphic Metafile (CGM, WebCGM) (*.cgm)

C (low probability):
• Encapsulated PostScript (EPS)
• Macromedia Flash (*.swf)
• All other vector image formats not listed here

Audio

A (high probability):
• AIFF (PCM) (*.aif, *.aiff)
• WAV (PCM) (*.wav)

B (medium probability):
• SUN Audio (uncompressed) (*.au)
• Standard MIDI (*.mid, *.midi)
• Ogg Vorbis (*.ogg)
• Free Lossless Audio Codec (*.flac)
• Advanced Audio Coding (*.mp4, *.m4a, *.aac)
• MP3 (MPEG-1/2, Layer 3) (*.mp3)

C (low probability):
• AIFC (compressed) (*.aifc)
• NeXT SND (*.snd)
• RealNetworks 'Real Audio' (*.ra, *.rm, *.ram)
• Windows Media Audio (*.wma)
• Protected AAC (*.m4p)
• WAV (compressed) (*.wav)
• All other audio formats not listed here

Video

A (high probability):
• Motion JPEG 2000 (ISO/IEC 15444-4) (*.mj2)
• AVI (uncompressed, motion JPEG) (*.avi)
• QuickTime Movie (uncompressed, motion JPEG) (*.mov)

B (medium probability):
• Ogg Theora (*.ogg)
• MPEG-1, MPEG-2 (*.mpg, *.mpeg, wrapped in AVI, MOV)
• MPEG-4 (H.263, H.264) (*.mp4, wrapped in AVI, MOV)

C (low probability):
• AVI (others) (*.avi)
• QuickTime Movie (others) (*.mov)
• RealNetworks 'Real Video' (*.rv)
• Windows Media Video (*.wmv)
• All other video formats not listed here

Spreadsheet/Database

A (high probability):
• Comma Separated Values (*.csv)
• Delimited Text (*.txt)
• SQL DDL

B (medium probability):
• DBF (*.dbf)
• OpenOffice (*.sxc/*.ods)
• OOXML (ISO/IEC DIS 29500) (*.xlsx)

C (low probability):
• Excel (*.xls)
• All other spreadsheet/database formats not listed here

Virtual Reality

A (high probability):
• X3D (*.x3d)

B (medium probability):
• VRML (*.wrl, *.vrml)
• U3D (Universal 3D file format)

C (low probability):
• All other virtual reality formats not listed here

Computer Programs

A (high probability):
• Computer program source code, uncompiled (*.c, *.c++, *.java, *.js, *.jsp, *.php, *.pl, etc.)

B (medium probability):
• (none listed)

C (low probability):
• Compiled / Executable files (EXE, *.class, COM, DLL, BIN, DRV, OVL, SYS, PIF)

Presentation

A (high probability):
• (none listed)

B (medium probability):
• OpenOffice (*.sxi/*.odp)
• OOXML (ISO/IEC DIS 29500) (*.pptx)

C (low probability):
• PowerPoint (*.ppt)
• All other presentation formats not listed here
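
When reviewing many files before deposit, the table can be approximated as a lookup from file extension to column. The Python sketch below is an illustration under stated assumptions, not an official tool. The names `SINGLE_COLUMN`, `DEPENDS_ON_CONTENT`, and `column_for` are hypothetical, and the extension sets are a partial transcription of the table. Extensions that the table lists in more than one column, or only under a condition (for example *.pdf, *.wav, and *.svg), are reported as "depends" because the extension alone cannot settle the column.

```python
from pathlib import Path

# Partial transcription of the table: extensions that appear in only one column.
SINGLE_COLUMN = {
    "A": {".csv", ".png", ".aif", ".aiff", ".x3d"},
    "B": {".css", ".dtd", ".rtf", ".sgml", ".odt", ".docx", ".bmp", ".jpg",
          ".gif", ".dng", ".flac", ".mp3", ".ods", ".xlsx", ".odp", ".pptx"},
    "C": {".doc", ".wpd", ".dvi", ".psd", ".swf", ".wma", ".wmv", ".xls", ".ppt"},
}

# Extensions whose column depends on encoding, compression, encryption,
# or embedded scripts, so the extension alone is not enough.
DEPENDS_ON_CONTENT = {".pdf", ".txt", ".jp2", ".wav", ".avi", ".mov", ".svg"}


def column_for(filename: str) -> str:
    """Return 'A', 'B', 'C', 'depends', or 'not in sketch' for a file name."""
    ext = Path(filename).suffix.lower()
    if ext in DEPENDS_ON_CONTENT:
        return "depends"
    for column, extensions in SINGLE_COLUMN.items():
        if ext in extensions:
            return column
    # This sketch does not cover every row of the table, so an unknown
    # extension is reported as such rather than assumed to be column C.
    return "not in sketch"
```

For example, `column_for("survey.xls")` falls in column C, which suggests converting the file to a column A format such as CSV before deposit. A file extension is only a hint: the file's actual encoding and compression still need to be checked by hand.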

For help, contact the GS Commons Team at digitalcommons@georgiasouthern.edu. A team member will respond as soon as possible during regular business hours.