Given the current pace of change in digital technology, the long-term preservation of the complete content and original functionality (e.g., “look and feel”) of certain file formats may not be practical or possible. While most repositories, including Georgia Southern Commons, take reasonable measures to preserve this content and functionality, the best way to ensure that your files retain their use and value over time is to prepare and deposit them in the formats with the highest probability of long-term preservation. This is especially important for non-text formats, including images, audio and video files, spreadsheets and databases, and software.
For more information about curating your data, see our guide to curating and sharing data. Contact Jeffrey Mortimore, Digital Scholarship Librarian, for help selecting an appropriate repository and preparing your data for deposit.
The likelihood of long-term preservation of content and functionality is higher when submitted formats possess the following characteristics:
complete and open documentation
platform-independence
non-proprietary (vendor-independent)
no “lossy” or proprietary compression
no embedded files, programs or scripts
no full or partial encryption
no password protection
Below is a table of file formats organized by probability of long-term preservation of content and functionality. Formats in column A (high probability) exhibit the characteristics above and therefore have the highest probability of long-term preservation; formats in column C (low probability) have the lowest. Formats in column B (medium probability) are preferred over those in column C, but their likelihood of long-term preservation is not as high as that of formats in column A.
The library recommends that researchers depositing works in Georgia Southern Commons, OpenICPSR, or any other repository submit files in column A formats whenever possible, and consider converting files in formats with a lower probability of long-term preservation to formats with a higher probability. Contact Jeffrey Mortimore, Digital Scholarship Librarian, for help selecting file formats or converting files.
This table is also available as a downloadable PDF.
Content | A: High probability for long-term preservation | B: Medium probability for long-term preservation | C: Low probability for long-term preservation
Text | Plain text (encoding: US-ASCII, UTF-8, UTF-16 with BOM) | Cascading Style Sheets (*.css) | PDF (*.pdf) (encrypted)
Raster Image | TIFF (uncompressed) | BMP (*.bmp) | MrSID (*.sid)
Vector Graphics | SVG (no JavaScript binding) (*.svg) | Computer Graphics Metafile (CGM, WebCGM) (*.cgm) | Encapsulated PostScript (EPS)
Audio | AIFF (PCM) (*.aif, *.aiff) | Sun Audio (uncompressed) (*.au) | AIFC (compressed) (*.aifc)
Video | Motion JPEG 2000 (ISO/IEC 15444-4) (*.mj2) | Ogg Theora (*.ogg) | AVI (others) (*.avi)
Spreadsheet/Database | Comma-Separated Values (*.csv) | DBF (*.dbf) | Excel (*.xls)
Virtual Reality | X3D (*.x3d) | VRML (*.wrl, *.vrml) | All other virtual reality formats not listed here
Computer Programs | Computer program source code, uncompiled (*.c, *.c++, *.java, *.js, *.jsp, *.php, *.pl, etc.) | Compiled/executable files (EXE, *.class, COM, DLL, BIN, DRV, OVL, SYS, PIF) |
Presentation | OpenOffice (*.sxi/*.odp) | PowerPoint (*.ppt)
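Before depositing a large set of files, it can help to get a rough first pass at which files fall outside column A. The sketch below is a minimal, hypothetical example, not a Georgia Southern Commons tool: the names `PRESERVATION_COLUMN`, `classify`, and `report` are illustrative, and the mapping covers only a subset of the table, based on file extension alone. Because an extension cannot reveal compression, encryption, or embedded scripts, treat its output as a starting point for review rather than a final judgment.

```python
from pathlib import Path

# Hypothetical mapping from file extension to table column, covering only a
# subset of the table. An extension cannot reveal compression, encryption, or
# embedded scripts (e.g., *.svg is column A only without JavaScript binding,
# and *.au is column B only when uncompressed). Plain text, TIFF, PDF,
# compiled files, and presentation formats are deliberately left out.
PRESERVATION_COLUMN = {
    # Column A: high probability of long-term preservation
    ".svg": "A", ".aif": "A", ".aiff": "A", ".mj2": "A", ".csv": "A",
    ".x3d": "A", ".c": "A", ".c++": "A", ".java": "A", ".js": "A",
    ".jsp": "A", ".php": "A", ".pl": "A",
    # Column B: medium probability (*.ogg assumed to hold Ogg Theora video)
    ".css": "B", ".bmp": "B", ".cgm": "B", ".au": "B", ".ogg": "B",
    ".dbf": "B", ".wrl": "B", ".vrml": "B",
    # Column C: low probability
    ".sid": "C", ".eps": "C", ".aifc": "C", ".avi": "C", ".xls": "C",
}


def classify(path):
    """Return "A", "B", "C", or None if the extension is not in the mapping."""
    return PRESERVATION_COLUMN.get(Path(path).suffix.lower())


def report(directory):
    """List (file, column) pairs for every file under `directory`, sorted by path."""
    return [
        (str(p), classify(p))
        for p in sorted(Path(directory).rglob("*"))
        if p.is_file()
    ]
```

For example, `report("my_dataset")` pairs each file with "A", "B", "C", or None. Files marked "B" or "C" are candidates for conversion, and None only means the extension is not covered by this sketch, so those files still need a manual check.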
For help, contact the GS Commons Team at digitalcommons@georgiasouthern.edu. A team member will respond as soon as possible during regular business hours.