A readme file provides information about a data file and is intended to help ensure that the data can be correctly interpreted, by you at a later date or by others when you share or publish the data. Standards-based metadata is generally preferable, but where no appropriate standard exists, or for internal use, writing “readme” style metadata is an appropriate strategy.
Create one readme file for each data file, whenever possible. It is also appropriate to describe a "dataset" that has multiple, related, identically formatted files, or files that are logically grouped together for use (e.g. a collection of Matlab scripts). When appropriate, also describe the file structure that holds the related data files (see Example 2 in the PDF version).
Name the readme so that it is easily associated with the data file(s) it describes.
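One way to keep that association obvious is to derive the readme name from the data file name. The sketch below assumes a hypothetical convention of appending "_README.txt" to the data file's stem; the function name, the suffix, and the example path are illustrative only and are not part of any standard.

```python
from pathlib import Path

def readme_path_for(data_file: str, suffix: str = "_README.txt") -> Path:
    """Return a readme path that sits beside the data file and shares its stem."""
    data_path = Path(data_file)
    # Keep the same folder; replace the extension with the readme suffix
    return data_path.with_name(data_path.stem + suffix)

readme = readme_path_for("survey/stream_temps_2012.csv")
print(readme.name)
```

Because the readme sorts next to the file it describes in a directory listing, the pairing stays visible even when many data files share a folder.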
Write your readme document as a plain text file, avoiding proprietary formats such as MS Word whenever possible. Format the readme document so it is easy to understand (e.g. separate important pieces of information with blank lines, rather than having all the information in one long paragraph).
Format multiple readme files identically. Present the information in the same order, using the same terminology.
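Use standardized date formats. Suggested format: the ISO 8601 date standard, which in its basic form is written YYYYMMDD or YYYYMMDDThhmmss. The extended form with separators (YYYY-MM-DD or YYYY-MM-DDThh:mm:ss), which the W3C date and time profile uses, is equally valid; pick one form and apply it consistently.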
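As a minimal sketch of producing such dates, the following uses Python's standard datetime module. The variable names and the sample dates are made up for illustration.

```python
from datetime import date, datetime

# Hypothetical example values
collected_on = date(2012, 4, 13)
file_created = datetime(2012, 4, 13, 11, 54, 38)

# Basic ISO 8601 form (no separators): YYYYMMDD and YYYYMMDDThhmmss
basic_date = collected_on.strftime("%Y%m%d")
basic_datetime = file_created.strftime("%Y%m%dT%H%M%S")

# Extended ISO 8601 form (with separators): YYYY-MM-DD and YYYY-MM-DDThh:mm:ss
extended_date = collected_on.isoformat()
extended_datetime = file_created.isoformat()

# A collection period can be written as an ISO 8601 interval: start/end
collection_range = f"{date(2012, 4, 1).isoformat()}/{collected_on.isoformat()}"

print(basic_date, basic_datetime)
print(extended_date, extended_datetime)
print(collection_range)
```

Generating the strings in code rather than typing them by hand helps keep every readme in a collection on the same notation.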
Follow the scientific conventions for your discipline for taxonomic, geospatial and geologic names and keywords. Whenever possible, use terms from standardized taxonomies and vocabularies, a few of which are listed below.
GCMD Keywords - Earth & climate sciences, instruments, sensors, services, data centers, etc.
Gene Ontology Vocabulary - gene product characteristics, gene product annotation
Getty Research Institute Vocabularies - geographic names, art & architecture, cultural objects, artist names
Integrated Taxonomic Information System - taxonomic information on plants, animals, fungi, microbes
NASA Thesaurus - engineering, physics, astronomy, astrophysics, planetary science, Earth sciences, biological sciences
USGS Thesaurus - agriculture, forest, fisheries, Earth sciences, life sciences, engineering, planetary sciences, social sciences etc.
Recommended minimum content for data re-use is in bold.
For each filename, a short description of what data it contains
Format of the file if not obvious from the file name
If the data set includes multiple files that relate to one another, the relationship between the files or a description of the file structure that holds them (possible terminology might include "dataset" or "study" or "data package")
Name, institution, address, and email information for:
Principal investigator (or person responsible for collecting the data)
Associate or co-investigators
Contact person for questions
Date of data collection (can be a single date, or a range)
Information about geographic location of data collection
Date that the file was created
Date(s) that the file(s) were updated and the nature of the update(s), if applicable
Keywords used to describe the data topic
Method description, links or references to publications or other documentation containing experimental design or protocols used in data collection
Any instrument-specific information needed to understand or interpret the data
Standards and calibration information, if appropriate
Description of any quality-assurance procedures performed on the data
Definitions of codes or symbols used to flag low-quality or questionable values, or outliers, that users should be aware of
People involved with sample collection, processing, analysis and/or submission
Full names and definitions (spell out abbreviated words) of column headings for tabular data
Units of measurement
Definitions for codes or symbols used to record missing data
Specialized formats or abbreviations used
Licenses or restrictions placed on the data
Links to publications that cite or use the data
Links to other publicly accessible locations of the data
Recommended citation for the data
Information about funding sources that supported the collection of the data
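The content list above, together with the advice to format multiple readme files identically, can be sketched as a small template generator. This is one possible approach, not a prescribed layout: the section headings, the shortened field labels, the placeholder text, and the function name are all assumptions chosen for illustration, and the field list is a subset of the items above.

```python
# Sections and fields in a fixed order, so every readme presents
# the same information in the same sequence with the same terminology.
# Headings and labels are illustrative groupings of the items listed above.
TEMPLATE_SECTIONS = [
    ("GENERAL INFORMATION", [
        "File name(s) and short description",
        "File format",
        "Principal investigator",
        "Contact person",
        "Date(s) of data collection",
        "Geographic location",
        "Date file created",
        "Keywords",
    ]),
    ("METHODS", [
        "Methods and protocols",
        "Instrument-specific information",
        "Quality-assurance procedures",
    ]),
    ("DATA-SPECIFIC INFORMATION", [
        "Column headings (full names and definitions)",
        "Units of measurement",
        "Missing data codes",
    ]),
    ("SHARING AND ACCESS", [
        "Licenses or restrictions",
        "Recommended citation",
        "Funding sources",
    ]),
]

PLACEHOLDER = "[not yet recorded]"

def build_readme(values: dict) -> str:
    """Render a plain-text readme; fields without a value keep a visible placeholder."""
    lines = []
    for heading, fields in TEMPLATE_SECTIONS:
        lines.append(heading)
        for field in fields:
            lines.append(f"{field}: {values.get(field, PLACEHOLDER)}")
        lines.append("")  # blank line separates sections for readability
    return "\n".join(lines)

# Hypothetical values for two of the fields
text = build_readme({"File format": "CSV", "Keywords": "stream temperature"})
print(text)
```

Leaving unfilled fields visible, rather than omitting them, makes gaps easy to spot before the data are shared.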
The preceding guidelines have been adapted from several sources, including:
Recommendations for authors. Dryad. 2012. http://datadryad.org/depositing (archived at https://web.archive.org/web/20120413115438/http://www.datadryad.org/depositing)
Introduction to Ecological Metadata Language (EML). The Knowledge Network for Biocomplexity. 2012. https://web.archive.org/web/20120424124714/http://knb.ecoinformatics.org/eml_metadata_guide.html