Data Management Planning

Data are valuable and often unique assets that should be properly managed to be FAIR - findable, accessible, interoperable, and re-usable into the future. Scientists, data managers, library staff, and data users can take actions across the research life cycle which contribute to effective and sustainable data management.

The National Agricultural Library provides guidance on data management as outlined in P&P 630: Data Management & Public Access Requirements for ARS. These guidelines incorporate US Federal public access and open data directives and comply with a broad range of current funding agency requirements for Data Management Plans (DMPs).

The following sections provide information for researchers as they embark on creating their data management plans.

Create a DMP

DMPs created within ARS program areas with USDA funding must follow the structure outlined in P&P 630. These DMPs contain 6 distinct sections. We provide guidelines covering typical DMP components below, with examples for many agricultural research domains. You can also view a recorded Creating Data Management Plan webinar outlining these guidelines.

You can download this example of a well-formed DMP (.docx) and modify it as needed:

Data_Management_Plan_Example.docx

Download and view the same DMP with annotations to explain each section:

Data_Management_Plan_Example_annotated.docx

Core DMP components

1. Expected Data Types

Describe the types of data you will produce (e.g. digital, non-digital) and how they will be generated (lab work, field work, surveys, etc.).

For example,

You may collect environmental data from real-time sensors, or images from phenocams.
You may conduct interviews with digital video and audio recordings and subsequent digital transcriptions.
You may have field notebooks from crop management experiments or field trials that are not "born digital."
You may generate sequence data for whole genomes or metagenomics.
During analysis or modeling, you may create customized computer code or scripts for transformation or data cleaning.

Describe the metadata you will generate. Best practices encourage metadata to facilitate wider understanding and re-use of the data. You should record metadata describing the data you have collected for each experiment, and/or each physical sample. Sometimes this metadata is embedded in the files produced by the sensors or sequencing machines.

If you plan to re-use raw or processed data from other studies, name the anticipated sources.

2. Data Formats and Standards

Describe the data formats (e.g. csv, pdf, doc) for both raw and processed data you will produce. Note any plans to digitize data created in a non-digital format. Most U.S. funding agencies require machine-readable formats where possible.

For example:

Written material formats such as Microsoft Word doc and LaTeX files, with TXT being most machine-readable
Spreadsheets such as Microsoft Excel, with CSV files being most machine-readable
Curriculum or instructional material such as Microsoft PowerPoint
Digital image formats such as TIFF, JPEG
Digital video formats such as MPEG, MOV
Databases such as MySQL, Microsoft Access
Code/Software such as Matlab m files, R scripts
Other/Specialized data formats such as fasta, shp, kmz
Specimen or observation data in Darwin Core
GIS map layers in a variety of formats
Results of Application Programming Interface (API) or web service calls in JSON or XML
Find a comprehensive list of file formats
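
As an illustration of the machine-readable formats listed above, the following minimal Python sketch writes tabular observations to CSV text and reads them back using only the standard library. The column names and values are hypothetical and are not taken from any USDA dataset.

```python
import csv
import io

# Hypothetical field observations; names and values are illustrative only.
observations = [
    {"plot_id": "A1", "date": "2021-06-01", "soil_moisture_pct": 23.5},
    {"plot_id": "A2", "date": "2021-06-01", "soil_moisture_pct": 19.8},
]


def to_csv_text(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = to_csv_text(observations)
print(csv_text)

round_trip = from_csv_text(csv_text)
print(round_trip[0]["plot_id"])  # A1
```

Because CSV stores every value as text, a data dictionary or README that records the type and units of each column helps others interpret the file correctly.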

Describe the standards or schemas that will be used to structure, store, and share the data and metadata. We strongly encourage community-recognized and non-proprietary standards for maximum interoperability and reusability. Name and link to any published data dictionaries, data standards, or ontologies that you are using. For example:

The ICASA Master Variable list, a naming convention for agricultural model variables
Gene Ontology, the world's largest source of information on the functions of genes
Integrated Taxonomic Information System, authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world
ISO 19115, the required metadata standard for all USDA geospatial data

If you plan to deposit data in an existing database or repository, refer to their data and metadata standards. For instance, data deposited in Ag Data Commons is described with metadata that conforms to the DataCite metadata standard, as well as Project Open Data metadata for records forwarded to data.gov. You may require additional metadata to describe your research. We strongly encourage depositing data in subject-specific databases that follow community-recognized metadata standards.
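
Before deposit, it can help to check a draft record against the fields a repository requires. The Python sketch below assumes a simplified set of field names loosely modeled on the mandatory DataCite properties (identifier, creator, title, publisher, publication year, resource type). The field names, the record, and the creator are hypothetical, and the actual requirements should be confirmed against the metadata documentation of the target repository.

```python
# Assumed field names, loosely modeled on the mandatory DataCite properties.
# Verify against the schema version your repository actually uses.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a draft record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical draft record for illustration only.
draft = {
    "title": "Soil moisture observations from a hypothetical field trial",
    "creators": ["Doe, Jane"],
    "publisher": "Ag Data Commons",
    "publication_year": 2021,
}

print(missing_fields(draft))  # ['identifier', 'resource_type']
```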

3. Data Storage and Preservation of Access

Describe provisions for depositing data in a trusted/certified long-term preservation and archiving environment. This includes your plan for backups, cloud storage, access protocols, obsolescence avoidance, data migration strategy, persistent identifiers, etc.

Describe where data will be stored during and after the life of the project. Name specific workspaces and repositories as appropriate. For example:

You may initially manage data on local or network hard drives and then transfer it to a repository for long-term access and preservation.
You may maintain data on a high-speed computing platform such as SCINet or CyVerse or on a shared workspace like Open Science Framework during analysis.
You may deposit data in a subject-specific repository (e.g. NCBI for genomics data, AgCROS for geospatial data) or an institutional repository (e.g. Purdue University Research Repository).
You may maintain data using your own infrastructure beyond the life of the project.

We strongly encourage depositing data in discipline-specific databases or repositories that follow community-recognized metadata standards. For USDA-funded data without a discipline-specific repository, the NAL maintains the Ag Data Commons as a generalist ag repository. This enables the USDA's compliance with both public access and open data requirements to make federally funded research data open, accessible, and machine-readable. Data stored in the Ag Data Commons will receive a DOI (digital object identifier) for persistent access.

Describe the technical infrastructure and staff expertise.

Describe the plans for long-term preservation. Items to cover include:

Amount and size of data expected to be archived for both the short and long term. Ideally this includes raw data and/or minimally processed data (e.g., with quality control).
The planned retention period for the data
Strategies, tools, and contingency plans to avoid data loss, degradation, or damage
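
One common strategy for detecting data degradation or accidental changes is to record a checksum for each file and re-verify it periodically. The following Python sketch shows one way to do this with SHA-256. The function names and the example file are hypothetical, and a repository may provide its own fixity tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to the directory) to its checksum."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(directory, manifest):
    """List manifest entries that are now missing or have a different checksum."""
    current = build_manifest(directory)
    return sorted(
        name for name, checksum in manifest.items() if current.get(name) != checksum
    )


# Demonstration on a temporary directory with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "plot_a1.csv"
    data_file.write_text("plot_id,yield\nA1,4.2\n")
    manifest = build_manifest(tmp)

    data_file.write_text("plot_id,yield\nA1,9.9\n")  # simulate an unintended edit
    print(changed_files(tmp, manifest))  # ['plot_a1.csv']
```

Storing the manifest alongside each backup copy makes it possible to confirm that a restored file matches what was originally archived.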

4. Data Sharing and Public Access

Describe your data access and sharing procedures both during the project and after the data collection process is complete, as well as plans for publication or public release. Name specific repositories, databases, and catalogs as appropriate. Many repositories for storage and preservation also offer public access functionality (e.g. the Ag Data Commons).

Explain any restrictions, embargo periods, license, or public access level that apply to the data (see Project Open Data for more information). Data generated by federal employees should carry either US Public Domain or Creative Commons CCZero status, while federally funded data and non-federal data may vary depending on funder requirements. Find license definitions and additional information at opendefinition.org.

The USDA strongly discourages limiting distribution of data to project or personal websites only. Similarly, in most cases it is insufficient to make data available only on request. The USDA prefers researchers deposit data in a trusted/certified long-term preservation and archiving environment.

Other items to cover include:

Specify plans to create a catalog record for publicly available datasets in the Ag Data Commons if funded by USDA, regardless of where you plan to publish datasets.
Outline restrictions such as copyright, proprietary and company secrets, confidentiality, patent, appropriate credit, disclaimers, or conditions for use of the data.
Indicate how you will ensure that appropriate funding project numbers (e.g. CRIS numbers such as NIFA award numbers or ARS project numbers) will be cited with the data.

5. Roles and Responsibilities

Describe the project team members and the tasks associated with data management activities over the course of the project.

Note who will primarily ensure DMP implementation.
- This is particularly important for multi-investigator and multi-institutional projects. This may consist of a named data manager or responsibility via role.
Define key roles of the DMP team.
- Appropriate for larger scale projects – identify who will do which tasks.
Provide a contingency plan in case key personnel leave the project.
- For example, if data are managed individually or collaboratively on a platform such as ARS SCINet, and an investigator leaves, note who becomes responsible for the data.
Describe what resources are needed to carry out the DMP.
- If the DMP execution requires funds, add them to the budget request and budget narrative. Projects must budget for sufficient resources to implement the proposed DMP. For example, there may be data publication charges, data storage charges, or salary for data managers.

6. Monitoring and Reporting

Include information about how the researcher plans to monitor and report on implementation of the DMP during and after the project, as required by the funder. This may include progress in data sharing (publications, databases, software, etc.).

DMP Feedback and Review

Submit your DMP for review

NAL's Knowledge Services Division provides a DMP draft review service for ARS researchers based on the provisions and structure outlined in ARS P&P 630. The review may also incorporate aspects of industry and subject-specific standards for data and metadata management.

After preparing your DMP draft, contact NAL for feedback. Include a copy of your DMP draft as well as a copy of the project proposal. Please allow at least 5 business days for this service.

You can also check your own DMP using the checklist for DMP peer review below. Reviewers of USDA DMPs may download the DMP Reviewer Checklist document to receive tips and resources for the review process and use the following questions to guide their evaluation. If you have suggestions or comments on this review checklist, please contact NAL.

Checklist for data management plan peer review

General Considerations

Does the DMP cover the full life cycle of the data?
Are best practices in the scientific discipline available and, if so, are they referenced and/or followed?
Does the budget adequately cover activities in the DMP?
Is the DMP thoughtful and specific such that it is apparent that the project has personnel with appropriate knowledge and experience to manage the data?

Find a Data Repository

Researchers should ideally determine if an appropriate domain repository exists for their data, and tools such as FAIRsharing.org and re3data.org can help with this determination. Alternatively, there are multiple "generalist" repositories available with broader focus including the Ag Data Commons, the preferred repository for USDA-supported data that does not fit into an existing subject-specific repository. Researchers need to consider the requirements of their community, funder, institution, publisher, and possibly other factors to select an appropriate repository.

Find links to data resources

re3data.org

re3data is a global registry of research data repositories from different academic disciplines. It includes repositories that enable permanent storage of and access to data sets for researchers, funding bodies, publishers, and scholarly institutions. re3data.org promotes a culture of sharing, increased access, and better visibility of research data.

FAIRsharing.org

FAIRsharing.org is a curated, informative, and educational resource on worldwide data and metadata standards, inter-related to databases and data policies. Researchers can use FAIRsharing as a lookup resource to identify and cite the standards, databases, or repositories that exist for their data and discipline. For example, they can consult it when creating a data management plan for a grant proposal or funded project, or when submitting a manuscript to a journal, to identify the recommended databases and repositories as well as the standards they implement to ensure all relevant information about the data is collected at the source.

Journal publisher recommendations

Some major academic publishers list recommended research data repositories:

Springer Nature/BioMed Central: https://www.springernature.com/gp/authors/research-data-policy/recommended-repositories
Nature Scientific Data: https://www.nature.com/sdata/policies/repositories
Elsevier: https://www.elsevier.com/authors/author-resources/research-data/data-base-linking#repositories
DataONE member repositories: https://www.dataone.org/network/
PLOS ONE: https://journals.plos.org/plosone/s/recommended-repositories

Repository Finder

Repository Finder, a pilot project of the Enabling FAIR Data Project led by the American Geophysical Union (AGU) in partnership with DataCite and the Earth, space and environment sciences community, can help you find an appropriate repository to deposit your research data. The tool is hosted by DataCite and queries the re3data registry of research data repositories.

DataSeer

Designed primarily for journal publisher and funder use cases, DataSeer uses Natural Language Processing (NLP) on an uploaded manuscript to suggest which datasets from the article should be shared, what format they should be in, and which repository is most suitable.

OpenDOAR

OpenDOAR is the quality-assured, global Directory of Open Access Repositories that provide free, open access to academic outputs and resources. Each repository record within OpenDOAR is curated by an editorial team to offer a trusted service for the community. Criteria for listing include open access worldwide without fees, registration, or logins. A variety of academic content types are included (e.g. journal articles, theses/dissertations, reports, working papers, conference proceedings, books/book chapters), as are academic resources with sufficient metadata or documentation to make the material re-usable (e.g. archival material, datasets, software, images, videos, learning material).

Download a copy of the "Where do I put my data?" Word document, which includes search instructions for the resources listed: Finding_Repository.docx

Open Data Sharing Exceptions

Exceptions to making data publicly accessible

Important Note: Even if you don't share all your data, you still need to create a Data Management Plan.

As per USDA DR 1020-006, datasets that are outside the scope of the Public Access to Scholarly Publications and Digital Scientific Research Data policy, and that researchers therefore may not be required to share, include:

Computational models and computational model-related content (including parameters, inputs, outputs, and other derived output products).
Data from secondary sources (i.e., secondary outside data).
Data that would not be necessary for validation of scientific research findings (i.e., trivial data).

As per USDA DR 1020-006, researchers are not required to share research data that fall into one of the following categories:

Personally identifiable information (PII) or other information that could enable re-identification of individuals or businesses, alone or in combination with other publicly available information.
Proprietary data assets.
Data related to protecting critical infrastructure.
Data related to the physical location of threatened or endangered species or sensitive archaeological sites.
Data for which release would be inconsistent with U.S. national, homeland, or economic security or would have significant negative impact on intellectual property rights, innovation, and U.S. competitiveness.
Data for which public access is inconsistent with the agency or staff office mission (e.g., documents designated as Controlled Unclassified Information (CUI)).
Other data assets whose release is limited by law, regulation, contract, agreement, national security requirements, or policy (e.g., classified data or dual-use research data).
Data covered by a Freedom of Information Act (FOIA) exemption.

If the dataset does not qualify for an exemption, but the data authors feel that the data should not be shared publicly, then they may request a waiver or an extension of the timeline. The waiver process is detailed in USDA DR 1020-006.

Data Management Glossary

This glossary provides definitions of key terms related to scientific/technical data curation and management as broadly adopted in the research data and repository community. The authoritative sources cited are gratefully acknowledged.

Glossary

A

access

The ability for a user to view and interact with data stored on a computer or computer system. (abc-clio/ODLIS)

access level

see: public access level

administrative metadata

(= access and use metadata)

Provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that sometimes are listed as separate metadata types are: Rights management metadata, which deals with intellectual property rights, and Preservation metadata, which contains information needed to archive and preserve a resource. (LTER)

API

(= application programming interface)

A set of software instructions and standards that allows machine to machine communication - like when a website uses a widget to share a link on Twitter or Facebook. (NALT)

author

(= creator)

The main researchers involved in producing the data, or the authors of the publication, in priority order. May include those responsible for software creation. (DataCite Metadata Schema)

B

big data

An accumulation of data that is too large and complex for processing by traditional database management tools. (Definition of BIG DATA, 2020)

C

catalog

see: data catalog

catalog record

see: metadata record

citation

Support for the ability to establish provenance and attribute credit to research data sources, which allows for easier access to research data within journals and on the Internet. (CODATA-ICSTI; NNLM Data Thesaurus)

collection

A grouping of science data that all come from the same source, such as a modeling group or institution. Series/collections have information that is common across all the datasets/granules they contain. (EOSDIS)

contact name

(= ContactPerson; Responsible Party)

Person with knowledge of how to access, troubleshoot, or otherwise field issues related to the resource. (DataCite Metadata Schema)

controlled vocabulary

see: vocabulary

CSV

A standard format for spreadsheet data. Data is represented in a plain text file, with each data row on a new line and commas separating the values on each row. As a very simple open format it is easy to consume and is widely used for publishing open data. (Open Data Handbook)

curator

(= data curator)

Person tasked with reviewing, enhancing, cleaning, or standardizing metadata and the associated data submitted for storage, use, and maintenance within a data centre or repository. (DataCite Metadata Schema, DataCurator)

D

data catalog

(= catalog)

A searchable and browsable online collection of data sets. A data catalog informs customers about available data sets and metadata around a topic and assists users in locating it quickly. (Dataversity; NYU)

data dictionary

A data dictionary provides a detailed description for each element or variable in your dataset and data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description. (DataONE)
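
As a minimal sketch of how a data dictionary can be put to work, the Python example below encodes two variables and checks data rows against them. The variable names, allowed values, and ranges are hypothetical and are shown only to illustrate the idea.

```python
# Hypothetical data dictionary: one entry per variable in the dataset.
DATA_DICTIONARY = {
    "plot_id": {
        "description": "Field plot identifier",
        "type": str,
        "allowed": {"A1", "A2", "B1"},
    },
    "soil_moisture_pct": {
        "description": "Volumetric soil moisture",
        "type": float,
        "units": "percent",
        "min": 0.0,
        "max": 100.0,
    },
}


def validate_row(row):
    """Return a list of problems found when checking a row against the dictionary."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in row:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: value not allowed")
        if "min" in spec and not (spec["min"] <= value <= spec["max"]):
            problems.append(f"{name}: out of range")
    return problems


print(validate_row({"plot_id": "A1", "soil_moisture_pct": 23.5}))
# []
print(validate_row({"plot_id": "Z9", "soil_moisture_pct": 140.0}))
# ['plot_id: value not allowed', 'soil_moisture_pct: out of range']
```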

data integrity

Assuring information will not be accidentally or maliciously altered or destroyed. (NSA)

data life cycle

The data lifecycle represents all the stages of data throughout its life from its creation for a study to its distribution, preservation, and reuse. (DataONE; NNLM Data Thesaurus)

data management plan

(= DMP)

A data management plan describes the data that will be authored and how the data will be managed and made accessible throughout its lifetime. The contents of the data management plan should include: the types of data to be authored; the standards that would be applied, for example format and metadata content; provisions for archiving and preservation; access policies and provisions; and plans for eventual transition or termination of the data collection in the long-term future. (DataONE)

data paper

A factual and objective publication with a focused intent to identify and describe specific data, sets of data, or data collections to facilitate discoverability. (DataCite)

data publishing

Data publishing (also data publication) is the act of releasing research data in published form for use by others. It is a practice consisting in preparing certain data or data set(s) for public use thus to make them available to everyone to use as they wish. This practice is an integral part of the open science movement. There is a large and multidisciplinary consensus on the benefits resulting from this practice. (Wikipedia)

data repository

see: repository

data resource

(= resource)

Resources are the actual files, APIs or links that are being shared. (DKAN)

dataset

A dataset is the term for a collection of research data files produced in the course of research for a paper or project, plus accompanying metadata: describing the data, and indicating who produced the data, and who may access it - i.e. title, description, categories, contributors, license and so forth. Usage of the term dataset varies considerably across disciplinary communities. (Mendeley; Renear et al., 2011)

dataset doi

see: digital object identifier

description

(= summary, abstract)

A rich summary of the dataset: how and why it was generated and how it should (or should not) be used. This can be modified from article text, but should focus on characterizing the data, not the research project. Analogous to an abstract for a paper. (ESIP/NetCDF; Ag Data Commons)

digital object identifier

(= dataset doi, DOI)

Globally unique character strings that reference physical, digital, or abstract objects. They provide actionable, interoperable, persistent links to information about the objects they reference. (USGS)
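
To make the "actionable link" idea concrete, the Python sketch below turns a DOI string into a link through the https://doi.org/ resolver. The helper name is hypothetical, and the sketch assumes the DOI contains no characters that need URL encoding. The example DOI is the one cited for the DataCite Metadata Schema in the references.

```python
def doi_to_url(doi):
    """Turn a DOI string into a link through the https://doi.org/ resolver."""
    cleaned = doi.strip()
    # Strip common prefixes so that bare DOIs and full links are handled alike.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # DOI names begin with the directory indicator "10.".
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


print(doi_to_url("doi:10.14454/7xq3-zf69"))
# https://doi.org/10.14454/7xq3-zf69
```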

E

embargo

(= scheduling option)

A specified period of time during which the dataset is inaccessible. At the end of the embargo period, the dataset will be made available. Metadata describing the dataset is publicly available during this period. (Subject guides: Data Publication: Home, 2020)

endpoint

An association between a binding and a network address, specified by a URI, that may be used to communicate with an instance of a service. An end point indicates a specific location for accessing a service using a specific protocol and data format. (W3C)

F

FAIR data

A set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. (FORCE11)

format

(= file type, file format, resource format)

A digital resource encoded for storage in a computer file in a standard way. File formats may be either proprietary or free and may be either unpublished or open. (Wikipedia; DCMI)

G

geographic extent

The spatial (horizontal and/or vertical) delineation of the resource. (NOAA)

geospatial data

Data about objects, events, or phenomena that have a location on the surface of the earth. (Stock & Guesgen)

GIS

(= Geographic Information System)

A framework for gathering, managing, and analyzing data. Rooted in the science of geography, GIS integrates many types of data. It analyzes spatial location and organizes layers of information into visualizations using maps and 3D scenes. (Esri)

H

harvest

To use the public feed or API of another data portal to import items from that portal's catalog into your own. For example, Data.gov harvests all of its datasets from the data.json files of hundreds of U.S. federal, state and local data portals. (DKAN)
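
As a rough illustration of harvesting, the Python sketch below parses a small catalog shaped like a Project Open Data data.json file and keeps only the entries marked public. The sample catalog is hypothetical and heavily abbreviated, and the sketch assumes the "dataset", "identifier", "title", and "accessLevel" field names from the Project Open Data Metadata Schema v1.1. A real harvester would also fetch the file over HTTP and handle many more fields.

```python
import json

# Hypothetical, abbreviated catalog; real data.json files carry many more fields.
SAMPLE_DATA_JSON = """
{
  "dataset": [
    {"identifier": "example-001", "title": "Hypothetical soil survey",
     "accessLevel": "public"},
    {"identifier": "example-002", "title": "Hypothetical farm interviews",
     "accessLevel": "non-public"}
  ]
}
"""


def harvest_public(catalog_text):
    """Return identifier and title for each dataset whose accessLevel is 'public'."""
    catalog = json.loads(catalog_text)
    return [
        {"identifier": entry["identifier"], "title": entry["title"]}
        for entry in catalog.get("dataset", [])
        if entry.get("accessLevel") == "public"
    ]


print(harvest_public(SAMPLE_DATA_JSON))
# [{'identifier': 'example-001', 'title': 'Hypothetical soil survey'}]
```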

L

license

A legal document under which the resource is made available, typically indicated by URL. (DCAT, schema.org)

local resource

Data files stored and served from an internally managed repository. (DKAN, USGS)

M

metadata

Documentation of important aspects of data that describe where, when, and why the data were collected; who collected the data; what types of data were collected; what processes were used to create the data; what quality assurance controls were used; and where the collected data are located. Metadata are provided in a human-readable form as well as in a format that is machine readable (for example, XML) for automated use. (USGS)

metadata record

(= catalog record)

An item-level metadata record details the characteristics of a digital object for the purposes of description, resource discovery, and preservation. It typically includes: Descriptive information; Access points; Contextual information; Reference to the original item and collection; Administrative and preservation information. (Xie & Matusiak, 2015)

metadata schema

A unified and structured set of rules developed for object documentation and functional activities. (Drake, 2003)

O

open data

In general, consistent with the following principles: Public; Accessible; Described; Reusable; Complete; Timely; Managed Post-Release. (Project Open Data)

P

peer review

The process in which a new book, article, software program, etc., is submitted by the prospective publisher to experts in the field for critical evaluation prior to publication, a standard procedure in scholarly publishing. (abc-clio/ODLIS)

processed data

Data that has been edited, cleaned or modified from the raw data. (MGDS)

product type

(= resource type)

A high-level categorization of the most important part of the dataset's actual content – for example, Audiovisual; Collection; Dataset; Image; Model; Software. (Ag Data Commons)

public access level

(= access level)

The degree to which this dataset could be made available to the public, regardless of whether it is currently available to the public [e.g. under embargo]. (Project Open Data)

published (moderation state)

see: data publishing

R

raw data

Refers to data that have not been changed since acquisition. (MGDS)

registry

Authoritative, centrally controlled store of information. (W3C)

remote resource

(= external resource)

Associated data stored in external data repositories, or code stored in external software repositories. (Dryad)

repository

(= data repository, metadata repository)

A place that holds data, makes data available to use, and organizes data in a logical manner. A data repository may also be defined as an appropriate, subject-specific location where researchers can submit their data. (NLM)

resource format

see: format

S

self-citation

Reference made in a written work to one or more of the author's previous publications (book, periodical article, conference paper, etc.), an accepted practice in scholarly communication, provided important works written on the subject by other authors are not neglected or ignored. (abc-clio/ODLIS)

T

taxonomy

Typically, a controlled vocabulary with a hierarchical structure, with the understanding that there are different definitions of a hierarchy. Terms within a taxonomy have relations to other terms within the taxonomy. These are typically: parent/broader term, child/narrower term, or often both if the term is at mid-level within a hierarchy. (American Society for Indexing)

temporal coverage

(= temporal extent)

The time period that the dataset covers. An interval of time that is named or defined by its start and end. (DCAT)

U

use limitations

Limitations regarding the dataset's usability. Example statements include "estimates biased over water," "equipment malfunctioned during a specified time," or "granularity makes data unsuitable for certain kinds of analysis". (Ag Data Commons)

V

version

A new version of a dataset is created when there is a change in the structure, contents, or condition of the resource. In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or appended with additional data. (ANDS)

vocabulary

(= controlled vocabulary; see also: taxonomy)

A controlled vocabulary, also called an authority file, is an authoritative list of terms to be used in indexing (human or automated). Controlled vocabularies do not necessarily have any structure or relationships between terms within the list and are often used for name authorities (proper nouns), such as persons, organization names, company names, etc. Controlled vocabularies are the broadest category, which includes thesauri and taxonomies. (American Society for Indexing)

References

American Society for Indexing, Taxonomies & Controlled Vocabularies SIG (nd). About Taxonomies & Controlled Vocabularies. https://www.taxonomies-sig.org/about.htm
CODATA-ICSTI Task Group on Data Citation Standards and Practices (2013). Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. Data Science Journal, 12(0), CIDCR1-CIDCR75. https://datascience.codata.org/articles/abstract/253/
Consultative Committee for Space Data Systems (2012). Reference Model for an Open Archival Information System (OAIS), Recommended Practice, Issue 2. CCSDS 650.0-M-2. https://public.ccsds.org/pubs/650x0m2.pdf
Data versioning - ANDS. (2020). Retrieved 19 August 2020, from https://www.ands.org.au/working-with-data/data-management/data-versioni…
DataCite Metadata Working Group (2019). DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Version 4.3. DataCite e.V. https://doi.org/10.14454/7xq3-zf69
DataONE (nd). Create a Data Dictionary. https://dataoneorg.github.io/Education/bestpractices/create-a-data
DataONE (nd). Data Life Cycle. https://www.dataone.org/data-life-cycle
DataONE (nd). Data Management Planning. https://www.dataone.org/data-management-planning
Dataverse Project (nd). About The Project. https://dataverse.org/about
Definition of BIG DATA. (2020). Retrieved 12 August 2020, from https://www.merriam-webster.com/dictionary/big%20data
DKAN (2017). DKAN Open Data Portal. https://dkan.readthedocs.io/en/latest/
Drake, M. (2003). Encyclopedia of Library and Information Science, Second Edition, Vol. 3, CRC Press, USA
Dryad (2019). Fair Data: Best practices for creating reusable data publications. https://datadryad.org/stash/best_practices
Dublin Core Metadata Initiative (2020). DCMI Metadata Terms. Release 2020-01-20. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
Earth Science Information Partners (ESIP) (2020). Attribute Convention for Data Discovery (ACDD) 1-3. ESIP Federation. https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3
Federal Enterprise Architecture (FEA) Program (2007). FEA Consolidated Reference Model Version 2.3. https://www.reginfo.gov/public/jsp/Utilities/FEA_CRM_v23_Final_Oct_2007…
figshare.com
Food and Agriculture Organization of the United Nations (2020). FAO Term Portal. http://www.fao.org/faoterm/en/
FORCE11 (2016). The FAIR Data Principles. https://www.force11.org/group/fairgroup/fairprinciples
GIS Mapping Software, Location Intelligence & Spatial Analytics | Esri. (2020). Retrieved 12 August 2020, from https://www.esri.com
Haas, H. & Brown, A. (2004). Web Services Glossary. W3C Working Group Note 11 February 2004. http://www.w3.org/TR/ws-gloss/
Knight, M. (2017). Data Topics: What is a Data Catalog? Dataversity: Data Topics. https://www.dataversity.net/what-is-a-data-catalog/
Knight, M. (2017). Data Topics: What is a Data Dictionary? Dataversity: Data Topics. https://www.dataversity.net/what-is-a-data-dictionary/
Knight, M. (2017). Data Topics: What is a Metadata Repository? Dataversity: Data Topics. https://www.dataversity.net/what-is-a-metadata-repository/
LTER General Data Use Agreement (2005). In LTER Network Data Access Policy Revision 3, Data Access Requirements, and General Data Use Agreement. https://lternet.edu/documents/lter-network-data-access-policy-revision-…
LTER (2004). Understanding Metadata. NISO Press, Bethesda. ISBN 1-880124-62-9 https://www.lter.uaf.edu/metadata_files/UnderstandingMetadata.pdf
Marine Geoscience Data System (MGDS) (nd). Frequently-Asked Questions about data terminology. http://www.marine-geo.org/help/data_FAQ.php
Mendeley Ltd. (2019). FAQ. https://data.mendeley.com/faq
Mize, J. (2012). ISO 19115 Geographic information — Metadata Workbook. National Oceanic and Atmospheric Administration
NASA EarthData (2020). Common Metadata Repository (CMR). https://earthdata.nasa.gov/eosdis/science-system-description/eosdis-com…
NASA EarthData (2020). Data Rights and Related Issues. https://earthdata.nasa.gov/collaborate/open-data-services-and-software/…
NASA EarthData (2020). EOSDIS Glossary. https://earthdata.nasa.gov/learn/user-resources/glossary
National Library of Medicine (2020). https://nnlm.gov/guides/data-thesaurus/data-repository
National Security Agency (1998). NSA Glossary of Terms Used in Security and Intrusion Detection
NNLM Data Thesaurus (nd). https://nnlm.gov/guides/data-thesaurus/data-citation
NNLM Data Thesaurus (nd). https://nnlm.gov/guides/data-thesaurus/data-lifecycle
NYU Health Sciences Library (nd). NYU Data Catalog. https://datacatalog.med.nyu.edu/about
Open Data Handbook (nd). https://opendatahandbook.org/glossary/en/terms/csv
Open Knowledge Foundation (nd). The Open Definition. https://opendefinition.org/
Project Open Data (2014). Project Open Data Metadata Schema v1.1 https://project-open-data.cio.gov/v1.1/schema/
Project Open Data (2014). Project Open Data. Principles. https://project-open-data.cio.gov/principles/
Reitz, J. M. (2004). Online Dictionary for Library and Information Science. ABC-CLIO. https://products.abc-clio.com/ODLIS/odlis_about.aspx
Renear, A.H., Sacchi, S. & Wickett, K.M. (2010). Definitions of dataset in the scientific and technical literature. Proc. Am. Soc. Info. Sci. Tech., 47: 1-4. https://doi.org/10.1002/meet.14504701240
Resources.data.gov (nd). A repository of Federal Enterprise Data Resources. Glossary. https://resources.data.gov/ [formerly Project Open Data]
Riley, J. (2017). Understanding metadata. NISO Primer. National Information Standards Organization (NISO). ISBN 978-1-937522-72-8. https://www.niso.org/publications/understanding-metadata-2017
schema.org
Stock, K., & Guesgen, H. (2020). Chapter 10 - Geospatial Reasoning With Open Data. In Automating Open Source Intelligence (pp. 171-204). Syngress. Retrieved from https://doi.org/10.1016/B978-0-12-802916-9.00010-5
Subject guides: Data Publication: Home. (2020). Retrieved 19 August 2020, from https://libguides.library.usyd.edu.au/datapublication
The National Agricultural Library Agricultural Thesaurus. (2002). Retrieved 12 August 2020, from https://agclass.nal.usda.gov/
USDA National Agricultural Library (nd). Ag Data Commons Data Submission Manual. https://data.nal.usda.gov/ag-data-commons-data-submission-manual#descri…
U.S. Geological Survey (nd). Data Management. https://www.usgs.gov/products/data-and-tools/data-management
U.S. Geological Survey (nd). Data Management: Digital Object Identifiers. https://www.usgs.gov/products/data-and-tools/data-management/digital-ob…
U.S. Geological Survey (nd). Data Management: Data Dictionaries. https://www.usgs.gov/products/data-and-tools/data-management/data-dicti…
Wikipedia contributors (2020). Data publishing. Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Data_publishing&oldid=963630…
Wikipedia contributors (2020). File format. Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=File_format&oldid=974314346
Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018. https://doi.org/10.1038/sdata.2016.18
W3C (2019). Library terminology informally explained. https://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained
W3C Dataset Exchange Working Group (2020). Data Catalog Vocabulary (DCAT) – Version 2. W3C Recommendation 04 February 2020. https://www.w3.org/TR/2020/REC-vocab-dcat-2-20200204/
Xie, I. & Matusiak, K. K. (2015). Discover Digital Libraries: Theory and Practice. Elsevier. 388 pp. ISBN 978-0-12-417112-1. https://www.sciencedirect.com/book/9780124171121/discover-digital-libra…

Data Management Videos

The National Agricultural Library has recorded several webinars that researchers may find helpful when creating their data management plans. Visit the NAL Data Management YouTube Playlist for the complete series. Highlights include:

Creating a Data Management Plan

https://youtu.be/qtobwSChX7k

Topics covered include a review of the 6 expected sections of the DMP.

P&P 630: Data Management & Public Access Requirements for ARS

https://youtu.be/R_GlzMF-gg8

Topics covered include a comprehensive overview of P&P 630, which was released in 2020.

Submitting Data to the Ag Data Commons

https://youtu.be/rRfsEE-L1J0

Topics covered include an overview of the Ag Data Commons, essential fields for submitting data, and a live demonstration of the Ag Data Commons data submission process.

The current data management policy builds on the 2014 Implementation Plan to Increase Public Access to Results of USDA-funded Scientific Research [pdf, 24 pages] and on Data Management Plan for NIFA-Funded Research, Education, and Extension Projects, which in 2019 requested DMPs for all competitive grant programs.