Data Management Planning

Data are valuable and often unique assets that should be properly managed to be FAIR - findable, accessible, interoperable, and re-usable into the future. Scientists, data managers, library staff, and data users can take actions across the research life cycle that contribute to effective and sustainable data management.

The National Agricultural Library provides guidance on data management as outlined in P&P 630: Data Management & Public Access Requirements for ARS. These guidelines incorporate US Federal public access and open data directives and comply with a broad range of current funding agency requirements for Data Management Plans (DMPs).

The following sections provide information for researchers as they embark on creating their data management plans.

Create a DMP

DMPs created within ARS program areas with USDA funding must follow the structure outlined in P&P 630. These DMPs contain 6 distinct sections. We provide guidelines covering typical DMP components below, with examples for many agricultural research domains. You can also view a recorded Creating a Data Management Plan webinar outlining these guidelines.

You can download this example of a well-formed DMP (.docx) and modify as needed:

Data_Management_Plan_Example.docx

Download and view the same DMP with annotations to explain each section:

Data_Management_Plan_Example_annotated.docx

Core DMP components

1. Expected Data Types

Describe the types of data you will produce (e.g. digital, non-digital) and how they will be generated (lab work, field work, surveys, etc.).

For example,

  • You may collect environmental data from real-time sensors, or images from phenocams.
  • You may conduct interviews with digital video and audio recordings and subsequent digital transcriptions.
  • You may have field notebooks from crop management experiments or field trials that are not "born digital."
  • You may generate sequence data for whole genomes or metagenomics.
  • During analysis or modeling, you may create customized computer code or scripts for transformation or data cleaning.

Describe the metadata you will generate. Best practices encourage metadata to facilitate wider understanding and re-use of the data. You should record metadata describing the data you have collected for each experiment, and/or each physical sample. Sometimes this metadata is embedded in the files produced by the sensors or sequencing machines.
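
As an illustration of recording metadata for each sample, the following minimal Python sketch builds a small metadata record and serializes it as JSON to sit alongside the data file. The field names (`sample_id`, `collected_on`, and so on), the sample values, and the sidecar-file idea are assumptions made for this example; they are not prescribed by P&P 630 or by any particular metadata standard.

```python
import json
from datetime import date


def build_sample_metadata(sample_id, collected_on, collector, instrument, variables):
    """Return a metadata record (a plain dict) for one sample or experiment.

    All field names here are hypothetical; map them to your community's
    metadata standard where one exists.
    """
    return {
        "sample_id": sample_id,
        "collected_on": collected_on.isoformat(),  # ISO 8601 calendar date
        "collector": collector,
        "instrument": instrument,
        "variables": variables,  # list of {"name": ..., "unit": ...} dicts
    }


record = build_sample_metadata(
    sample_id="FIELD-A-0001",
    collected_on=date(2024, 5, 14),
    collector="J. Doe",
    instrument="soil moisture probe",
    variables=[{"name": "soil_moisture", "unit": "percent"}],
)

# The text could be saved next to the data, e.g. as FIELD-A-0001.metadata.json
sidecar_text = json.dumps(record, indent=2, sort_keys=True)
print(sidecar_text)
```

Keeping the record in a plain, machine-readable format makes it easier to map into a repository's metadata form later.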

If you plan to re-use raw or processed data from other studies, name the anticipated sources.

2. Data Formats and Standards

Describe the data formats (e.g. csv, pdf, doc) for both raw and processed data you will produce. Note any plans to digitize data created in a non-digital format. Most U.S. funding agencies require machine-readable formats where possible.

For example:

  • Written material formats such as Microsoft Word doc and LaTeX files, with TXT being most machine-readable
  • Spreadsheets such as Microsoft Excel, with CSV files being most machine-readable
  • Curriculum or instructional material such as Microsoft PowerPoint
  • Digital image formats such as TIFF, JPEG
  • Digital video formats such as MPEG, MOV
  • Databases such as MySQL, Microsoft Access
  • Code/Software such as Matlab m files, R scripts
  • Other/Specialized data formats such as fasta, shp, kmz
  • Specimen or observation data in Darwin Core
  • GIS map layers in a variety of formats
  • Results of Application Programming Interface (API) or web service calls in JSON or XML
  • Find a comprehensive list of file formats at https://www.fileformat.info/format/all.htm
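
To make the idea of machine-readable formats concrete, here is a short Python sketch that converts JSON records, such as those returned by a web service call, into CSV text using only the standard library. The function name, the column names, and the sample records are hypothetical and serve only as an illustration.

```python
import csv
import io
import json


def json_records_to_csv(json_text, fieldnames):
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        # Missing keys become empty cells; keys not in fieldnames are dropped.
        writer.writerow({name: record.get(name, "") for name in fieldnames})
    return buffer.getvalue()


# Hypothetical API response: the second record lacks a temperature reading.
api_response = (
    '[{"site": "A1", "date": "2024-05-14", "temp_c": 18.2},'
    ' {"site": "B2", "date": "2024-05-14"}]'
)
csv_text = json_records_to_csv(api_response, ["site", "date", "temp_c"])
print(csv_text)
```

Listing the columns explicitly keeps the column order stable across runs, which helps when the CSV is later described in a data dictionary.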

Describe the standards or schemas that will be used to structure, store, and share the data and metadata. We strongly encourage community-recognized and non-proprietary standards for maximum interoperability and reusability. Name and link to any published data dictionaries, data standards, or ontologies that you are using. For example:

If you plan to deposit data in an existing database or repository, refer to their data and metadata standards. For instance, data deposited in Ag Data Commons is described with metadata that conforms to the DataCite metadata standard, as well as Project Open Data metadata for records forwarded to data.gov. You may require additional metadata to describe your research. We strongly encourage depositing data in subject-specific databases that follow community-recognized metadata standards.

3. Data Storage and Preservation of Access

Describe provisions for depositing data in a trusted/certified long-term preservation and archiving environment. This includes your plan for backups, cloud storage, access protocols, obsolescence avoidance, data migration strategy, persistent identifiers, etc.

Describe where data will be stored during and after the life of the project. Name specific workspaces and repositories as appropriate. For example:

We strongly encourage depositing data in discipline-specific databases or repositories that follow community-recognized metadata standards. For USDA-funded data without a discipline-specific repository, the NAL maintains the Ag Data Commons as a generalist ag repository. This enables the USDA's compliance with both public access and open data requirements to make federally funded research data open, accessible, and machine-readable. Data stored in the Ag Data Commons will receive a DOI (digital object identifier) for persistent access.

Describe the technical infrastructure and staff expertise.

Describe the plans for long-term preservation. Items to cover include:

  • Amount and size of data expected to be archived for both short- and long-term. Ideally this includes raw data and/or minimally processed data (e.g., with quality control)
  • The planned retention period for the data
  • Strategies, tools, and contingency plans to avoid data loss, degradation, or damage
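
One common strategy for detecting data loss or degradation is to record a checksum for every file and re-check it after backups or migrations. The sketch below shows one possible approach using SHA-256 from the Python standard library. The function names and the temporary example file are assumptions for illustration; a repository or institution may already provide its own fixity tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files from the manifest that are now missing or have changed."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])


# Demonstration with a throwaway directory and a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "plot_yields.csv"
    data_file.write_text("plot,yield\n1,5.2\n")
    manifest = build_manifest(tmp)
    data_file.write_text("plot,yield\n1,9.9\n")  # simulate accidental alteration
    damaged = changed_files(tmp, manifest)

print(damaged)
```

Storing the manifest separately from the data, for example with the backup copy, allows either copy to be verified against it.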

4. Data Sharing and Public Access

Describe your data access and sharing procedures both during the project and after the data collection process is complete, as well as plans for publication or public release. Name specific repositories, databases, and catalogs as appropriate. Many repositories for storage and preservation also offer public access functionality (e.g. the Ag Data Commons).

Explain any restrictions, embargo periods, license, or public access level that apply to the data (see Project Open Data for more information). Data generated by federal employees should carry either US Public Domain or Creative Commons CCZero status, while federally funded data and non-federal data may vary depending on funder requirements. Find license definitions and additional information at opendefinition.org.

The USDA strongly discourages limiting distribution of data to project or personal websites only. Similarly, in most cases it is insufficient to make data available only on request. The USDA prefers researchers deposit data in a trusted/certified long-term preservation and archiving environment.

Other items to cover include:

  • Specify plans to create a catalog record for publicly available datasets in the Ag Data Commons if funded by USDA, regardless of where you plan to publish datasets.
  • Outline restrictions such as copyright, proprietary and company secrets, confidentiality, patent, appropriate credit, disclaimers, or conditions for use of the data.
  • Indicate how you will ensure that appropriate funding project numbers (e.g. CRIS numbers such as NIFA award numbers or ARS project numbers) will be cited with the data.

5. Roles and Responsibilities

Describe the project team members and the tasks associated with data management activities over the course of the project.

  • Note who will primarily ensure DMP implementation.
    • This is particularly important for multi-investigator and multi-institutional projects. This may consist of a named data manager or responsibility via role.
  • Define key roles of the DMP team.
    • For larger-scale projects, identify who will do which tasks.
  • Provide a contingency plan in case key personnel leave the project.
    • For example, if data are managed individually or collaboratively on a platform such as ARS SCINet, and an investigator leaves, note who becomes responsible for the data.
  • Describe what resources are needed to carry out the DMP.
    • If the DMP execution requires funds, add them to the budget request and budget narrative. Projects must budget for sufficient resources to implement the proposed DMP. For example, there may be data publication charges, data storage charges, or salary for data managers.

6. Monitoring and reporting

Include information about how the researcher plans to monitor and report on implementation of the DMP during and after the project, as required by the funder. This may include progress in data sharing (publications, database, software, etc.).

DMP Feedback and Review

Submit your DMP for review

NAL's Knowledge Services Division provides a DMP draft review service for ARS researchers based on the provisions and structure outlined in ARS P&P 630. The review may also incorporate aspects of industry and subject-specific standards for data and metadata management.

After preparing your DMP draft, contact NAL for feedback at https://www.nal.usda.gov/ask-question. Include a copy of your DMP draft as well as a copy of the project proposal. Please allow at least 5 business days for this service.

You can also check your own DMP using the checklist for DMP peer review below. Reviewers of USDA DMPs may download the DMP Reviewer Checklist document to receive tips and resources for the review process, and use the following questions to guide their evaluation. If you have suggestions or comments on this review checklist, please visit https://www.nal.usda.gov/ask-question.

Checklist for data management plan peer review

General Considerations

  1. Does the DMP cover the full life cycle of the data?
  2. Are best practices in the scientific discipline available and, if so, are they referenced and/or followed?
  3. Does the budget adequately cover activities in the DMP?
  4. Is the DMP thoughtful and specific such that it is apparent that the project has personnel with appropriate knowledge and experience to manage the data?

Find a Data Repository

Researchers should ideally determine if an appropriate domain repository exists for their data, and tools such as FAIRsharing.org and re3data.org can help with this determination. Alternatively, there are multiple "generalist" repositories available with broader focus including the Ag Data Commons, the preferred repository for USDA-supported data that does not fit into an existing subject-specific repository. Researchers need to consider the requirements of their community, funder, institution, publisher, and possibly other factors to select an appropriate repository.

Data Management Glossary

This glossary provides definitions of key terms related to scientific/technical data curation and management as broadly adopted in the research data and repository community. The authoritative sources cited are gratefully acknowledged.

Glossary

A

access

The ability for a user to view and interact with data stored on a computer or computer system. (abc-clio/ODLIS)

access level

see: public access level

administrative metadata

(= access and use metadata)

Provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that sometimes are listed as separate metadata types are: Rights management metadata, which deals with intellectual property rights, and Preservation metadata, which contains information needed to archive and preserve a resource. (LTER)

API

(= application programming interface)

A set of software instructions and standards that allows machine to machine communication - like when a website uses a widget to share a link on Twitter or Facebook. (NALT)

author

(= creator)

The main researchers involved in producing the data, or the authors of the publication, in priority order. May include those responsible for software creation. (DataCite Metadata Schema)

B

big data

An accumulation of data that is too large and complex for processing by traditional database management tools. (Definition of BIG DATA, 2020)

C

catalog

see: data catalog

catalog record

see: metadata record

citation

Support for the ability to establish provenance and attribute credit to research data sources, which allows for easier access to research data within journals and on the Internet. (CODATA-ICSTI; NNLM Data Thesaurus)

collection

A grouping of science data that all come from the same source, such as a modeling group or institution. Series/collections have information that is common across all the datasets/granules they contain. (EOSDIS)

contact name

(= ContactPerson; Responsible Party)

Person with knowledge of how to access, troubleshoot, or otherwise field issues related to the resource. (DataCite Metadata Schema)

controlled vocabulary

see: vocabulary

CSV

A standard format for spreadsheet data. Data is represented in a plain text file, with each data row on a new line and commas separating the values on each row. As a very simple open format it is easy to consume and is widely used for publishing open data. (Open Data Handbook)

curator

(= data curator)

Person tasked with reviewing, enhancing, cleaning, or standardizing metadata and the associated data submitted for storage, use, and maintenance within a data centre or repository. (DataCite Metadata Schema, DataCurator)

D

data catalog

(= catalog)

A searchable and browsable online collection of data sets. A data catalog informs customers about available data sets and metadata around a topic and assists users in locating it quickly. (Dataversity; NYU)

data dictionary

A data dictionary provides a detailed description for each element or variable in your dataset and data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description. (DataONE)
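
As a rough illustration, a data dictionary can itself be kept in machine-readable form and used to check incoming rows. The sketch below is one possible Python representation; the variable names, allowed values, and units are invented for this example and do not come from any published dictionary.

```python
# Hypothetical data dictionary: one entry per variable in the dataset.
DATA_DICTIONARY = {
    "plot_id": {
        "label": "Plot identifier",
        "type": str,
        "description": "Unique code for the field plot.",
    },
    "treatment": {
        "label": "Treatment",
        "type": str,
        "allowed": {"control", "irrigated"},
        "description": "Management treatment applied to the plot.",
    },
    "yield_t_ha": {
        "label": "Grain yield",
        "type": float,
        "unit": "t/ha",
        "description": "Harvested grain yield per plot.",
    },
}


def validate_row(row, dictionary):
    """Return a list of problems found in one row (empty if the row is valid)."""
    problems = []
    for name, spec in dictionary.items():
        if name not in row:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        try:
            typed = spec["type"](value)  # e.g. float("5.2") -> 5.2
        except (TypeError, ValueError):
            problems.append(f"{name}: expected {spec['type'].__name__}, got {value!r}")
            continue
        allowed = spec.get("allowed")
        if allowed is not None and typed not in allowed:
            problems.append(f"{name}: {value!r} is not an allowed value")
    return problems


good = validate_row({"plot_id": "P1", "treatment": "control", "yield_t_ha": "5.2"}, DATA_DICTIONARY)
bad = validate_row({"plot_id": "P2", "treatment": "flooded", "yield_t_ha": "n/a"}, DATA_DICTIONARY)
print(good)
print(bad)
```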

data integrity

Assuring information will not be accidentally or maliciously altered or destroyed. (NSA)

data life cycle

The data lifecycle represents all the stages of data throughout its life from its creation for a study to its distribution, preservation, and reuse. (DataONE; NNLM Data Thesaurus)

data management plan

(= DMP)

A data management plan describes the data that will be authored and how the data will be managed and made accessible throughout its lifetime. The contents of the data management plan should include: the types of data to be authored; the standards that would be applied, for example format and metadata content; provisions for archiving and preservation; access policies and provisions; and plans for eventual transition or termination of the data collection in the long-term future. (DataONE)

data paper

A factual and objective publication with a focused intent to identify and describe specific data, sets of data, or data collections to facilitate discoverability. (DataCite)

data publishing

Data publishing (also data publication) is the act of releasing research data in published form for use by others. It is a practice consisting in preparing certain data or data set(s) for public use thus to make them available to everyone to use as they wish. This practice is an integral part of the open science movement. There is a large and multidisciplinary consensus on the benefits resulting from this practice. (Wikipedia)

data repository

see: repository

data resource

(= resource)

Resources are the actual files, APIs or links that are being shared. (DKAN)

dataset

A dataset is the term for a collection of research data files produced in the course of research for a paper or project, plus accompanying metadata: describing the data, and indicating who produced the data, and who may access it - i.e. title, description, categories, contributors, license and so forth. Usage of the term dataset varies considerably across disciplinary communities. (Mendeley; Renear et al., 2011)

dataset doi

see: digital object identifier

description

(= summary, abstract)

A rich summary of the dataset: how and why it was generated and how it should (or should not) be used. This can be modified from article text, but should focus on characterizing the data, not the research project. Analogous to an abstract for a paper. (ESIP/NetCDF; Ag Data Commons)

digital object identifier

(= dataset doi, DOI)

Globally unique character strings that reference physical, digital, or abstract objects. They provide actionable, interoperable, persistent links to information about the objects they reference. (USGS)
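
For instance, a DOI is made actionable by appending it to the `https://doi.org/` resolver address. The Python sketch below normalizes a few common ways of writing a DOI into such a link. The regular expression is only a loose heuristic chosen for this example rather than an official syntax check, and the DOI shown is made up.

```python
import re

# Loose heuristic: "10.", a numeric registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def strip_doi_prefix(text):
    """Remove a leading resolver URL or 'doi:' label, if present."""
    cleaned = text.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            return cleaned[len(prefix):]
    return cleaned


def looks_like_doi(text):
    """Return True if the text resembles a DOI under the heuristic above."""
    return bool(DOI_PATTERN.match(strip_doi_prefix(text)))


def doi_to_url(text):
    """Return a resolvable https://doi.org/ link for a DOI-like string."""
    if not looks_like_doi(text):
        raise ValueError(f"not a DOI-like string: {text!r}")
    return "https://doi.org/" + strip_doi_prefix(text)


# Hypothetical DOI used only for illustration.
print(doi_to_url("doi:10.1234/example-dataset"))
```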

E

embargo

(= scheduling option)

A specified period of time during which the dataset is inaccessible. At the end of the embargo period, the dataset will be made available. Metadata describing the dataset is publicly available during this period. (Subject guides: Data Publication: Home, 2020)

endpoint

An association between a binding and a network address, specified by a URI, that may be used to communicate with an instance of a service. An end point indicates a specific location for accessing a service using a specific protocol and data format. (W3C)

F

FAIR data

A set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. (FORCE11)

format

(= file type, file format, resource format)

A digital resource encoded for storage in a computer file in a standard way. File formats may be either proprietary or free and may be either unpublished or open. (Wikipedia; DCMI)

G

geographic extent

The spatial (horizontal and/or vertical) delineation of the resource. (NOAA)

geospatial data

Data about objects, events, or phenomena that have a location on the surface of the earth. (Stock & Guesgen)

GIS

(= Geographic Information System)

A framework for gathering, managing, and analyzing data. Rooted in the science of geography, GIS integrates many types of data. It analyzes spatial location and organizes layers of information into visualizations using maps and 3D scenes. (Esri)

H

harvest

To use the public feed or API of another data portal to import items from that portal's catalog into your own. For example, Data.gov harvests all of its datasets from the data.json files of hundreds of U.S. federal, state and local data portals. (DKAN)
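
A harvest of this kind can be sketched in a few lines of Python. The example below reads a small inline catalog shaped like a Project Open Data data.json file, with a top-level `dataset` list whose entries carry `identifier`, `title`, and `accessLevel`, and keeps only the public entries. The sample records and the function name are hypothetical, and a real harvester would fetch the file from the portal and handle many more fields.

```python
import json

# Hypothetical, heavily trimmed catalog in the data.json layout.
SAMPLE_CATALOG = """
{
  "dataset": [
    {"identifier": "example-001", "title": "Example soil moisture data", "accessLevel": "public"},
    {"identifier": "example-002", "title": "Example restricted survey", "accessLevel": "restricted public"}
  ]
}
"""


def harvest_public(catalog_text):
    """Return identifier and title for every dataset marked as public."""
    catalog = json.loads(catalog_text)
    return [
        {"identifier": entry["identifier"], "title": entry["title"]}
        for entry in catalog.get("dataset", [])
        if entry.get("accessLevel") == "public"
    ]


harvested = harvest_public(SAMPLE_CATALOG)
print(harvested)
```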

L

license

A legal document under which the resource is made available, typically indicated by URL. (DCAT, schema.org)

local resource

Data files stored and served from an internally managed repository. (DKAN, USGS)

M

metadata

Documentation of important aspects of data that describe where, when, and why the data were collected; who collected the data; what types of data were collected; what processes were used to create the data; what quality assurance controls were used; and where the collected data are located. Metadata are provided in a human-readable form as well as in a format that is machine readable (for example, XML) for automated use. (USGS)

metadata record

(= catalog record)

An item-level metadata record details the characteristics of a digital object for the purposes of description, resource discovery, and preservation. It typically includes: Descriptive information; Access points; Contextual information; Reference to the original item and collection; Administrative and preservation information. (Xie & Matusiak, 2015)

metadata schema

A unified and structured set of rules developed for object documentation and functional activities. (Drake, 2003)

O

open data

In general, consistent with the following principles: Public; Accessible; Described; Reusable; Complete; Timely; Managed Post-Release. (Project Open Data)

P

peer review

The process in which a new book, article, software program, etc., is submitted by the prospective publisher to experts in the field for critical evaluation prior to publication, a standard procedure in scholarly publishing. (abc-clio/ODLIS)

processed data

Data that has been edited, cleaned or modified from the raw data. (MGDS)

product type

(= resource type)

A high-level categorization of the most important part of the dataset's actual content – for example, Audiovisual; Collection; Dataset; Image; Model; Software. (Ag Data Commons)

public access level

(= access level)

The degree to which this dataset could be made available to the public, regardless of whether it is currently available to the public [e.g. under embargo]. (Project Open Data)

published (moderation state)

see: data publishing

R

raw data

Refers to data that have not been changed since acquisition. (MGDS)

registry

Authoritative, centrally controlled store of information. (W3C)

remote resource

(= external resource)

Associated data stored in external data repositories, or code stored in external software repositories. (Dryad)

repository

(= data repository, metadata repository)

A place that holds data, makes data available to use, and organizes data in a logical manner. A data repository may also be defined as an appropriate, subject-specific location where researchers can submit their data. (NLM)

resource format

see: format

S

self-citation

Reference made in a written work to one or more of the author's previous publications (book, periodical article, conference paper, etc.), an accepted practice in scholarly communication, provided important works written on the subject by other authors are not neglected or ignored. (abc-clio/ODLIS)

T

taxonomy

Typically a controlled vocabulary with a hierarchical structure, with the understanding that there are different definitions of a hierarchy. Terms within a taxonomy have relations to other terms within the taxonomy. These are typically: parent/broader term, child/narrower term, or often both if the term is at mid-level within a hierarchy. (American Society for Indexing)

temporal coverage

(= temporal extent)

The time period that the dataset covers. An interval of time that is named or defined by its start and end. (DCAT)
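
One common way to write such an interval is the ISO 8601 `start/end` notation. The following Python sketch parses that notation into two dates and checks that the end does not precede the start. It assumes calendar dates only (no times or durations), and the dates shown are invented for the example.

```python
from datetime import date


def parse_temporal_coverage(interval):
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' into a (start, end) pair of dates."""
    start_text, end_text = interval.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError("temporal coverage ends before it starts")
    return start, end


# Hypothetical coverage for a multi-season field trial.
start, end = parse_temporal_coverage("2019-04-01/2021-10-31")
print(start, end, (end - start).days)
```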

U

use limitations

Limitations regarding the dataset's usability. Example statements include "estimates biased over water," "equipment malfunctioned during a specified time," or "granularity makes data unsuitable for certain kinds of analysis". (Ag Data Commons)

V

version

A new version of a dataset is created when there is a change in the structure, contents, or condition of the resource. In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or appended with additional data. (ANDS)

vocabulary

(= controlled vocabulary; see also: taxonomy)

A controlled vocabulary, also called an authority file, is an authoritative list of terms to be used in indexing (human or automated). Controlled vocabularies do not necessarily have any structure or relationships between terms within the list and are often used for name authorities (proper nouns), such as persons, organization names, company names, etc. Controlled vocabularies are the broadest category, which includes thesauri and taxonomies. (American Society for Indexing)

References

Data Management Videos

The National Agricultural Library has recorded several webinars that researchers may find helpful when creating their data management plans. Visit the NAL Data Management YouTube Playlist for the complete series. Highlights include:

Creating a Data Management Plan

https://youtu.be/qtobwSChX7k

Topics covered include a review of the 6 expected sections of the

P&P 630: Data Management & Public Access Requirements for ARS

https://youtu.be/R_GlzMF-gg8

Topics covered include a comprehensive overview of P&P 630, which was released in 2020.

Submitting Data to the Ag Data Commons

https://youtu.be/rRfsEE-L1J0

Topics covered include an overview of the Ag Data Commons, essential fields for submitting data, and a live demonstration of the Ag Data Commons data submission process.
