This glossary provides definitions of key terms related to scientific/technical data curation and management as broadly adopted in the research data and repository community. The authoritative sources cited are gratefully acknowledged.
A
- access
- The ability for a user to view and interact with data stored on a computer or computer system. (abc-clio/ODLIS)
- access level
- see: public access level
- administrative metadata
- (= access and use metadata)
Provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that sometimes are listed as separate metadata types are: Rights management metadata, which deals with intellectual property rights, and Preservation metadata, which contains information needed to archive and preserve a resource. (LTER)
- API
- (= application programming interface)
A set of software instructions and standards that allows machine-to-machine communication, such as when a website uses a widget to share a link on Twitter or Facebook. (NALT)
- author
- (= creator)
The main researchers involved in producing the data, or the authors of the publication, in priority order. May include those responsible for software creation. (DataCite Metadata Schema)
B
- big data
- An accumulation of data that is too large and complex for processing by traditional database management tools. (Definition of BIG DATA, 2020)
C
- catalog
- see: data catalog
- catalog record
- see: metadata record
- citation
- Support for the ability to establish provenance and attribute credit to research data sources, which allows for easier access to research data within journals and on the Internet. (CODATA-ICSTI; NNLM Data Thesaurus)
- collection
- A grouping of science data that all come from the same source, such as a modeling group or institution. Series/collections have information that is common across all the datasets/granules they contain. (EOSDIS)
- contact name
- (= ContactPerson; Responsible Party)
Person with knowledge of how to access, troubleshoot, or otherwise field issues related to the resource. (DataCite Metadata Schema)
- controlled vocabulary
- see: vocabulary
- CSV
- A standard format for spreadsheet data. Data is represented in a plain text file, with each data row on a new line and commas separating the values on each row. As a very simple open format it is easy to consume and is widely used for publishing open data. (Open Data Handbook)
- curator
- (= data curator)
Person tasked with reviewing, enhancing, cleaning, or standardizing metadata and the associated data submitted for storage, use, and maintenance within a data centre or repository. (DataCite Metadata Schema, DataCurator)
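As a rough illustration of the CSV entry in this section, the sketch below parses a small piece of CSV text with Python's standard `csv` module. The sample text, its column names, and the `read_rows` helper are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical sample: a header row followed by two data rows,
# one row per line, with commas separating the values.
SAMPLE_CSV = "site,year,yield_kg\nA,2019,412\nB,2020,388\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
```

Every value comes back as a string, because a CSV file carries no type information. Types, units, and allowed values have to be documented elsewhere, for example in a data dictionary.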
D
- data catalog
- (= catalog)
A searchable and browsable online collection of data sets. A data catalog informs customers about available data sets and metadata around a topic and assists users in locating them quickly. (Dataversity; NYU)
- data dictionary
- A data dictionary provides a detailed description for each element or variable in your dataset and data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description. (DataONE)
- data integrity
- Assuring information will not be accidentally or maliciously altered or destroyed. (NSA)
- data life cycle
- The data life cycle represents all the stages of data throughout its life, from its creation for a study to its distribution, preservation, and reuse. (DataONE; NNLM Data Thesaurus)
- data management plan
- (= DMP)
A data management plan describes the data that will be authored and how the data will be managed and made accessible throughout its lifetime. The contents of the data management plan should include: the types of data to be authored; the standards that would be applied, for example format and metadata content; provisions for archiving and preservation; access policies and provisions; and plans for eventual transition or termination of the data collection in the long-term future. (DataONE)
- data paper
- A factual and objective publication with a focused intent to identify and describe specific data, sets of data, or data collections to facilitate discoverability. (DataCite)
- data publishing
- Data publishing (also data publication) is the act of releasing research data in published form for use by others. The practice consists of preparing data or data sets for public use so that they are available to everyone to use as they wish. It is an integral part of the open science movement, and there is a broad, multidisciplinary consensus on its benefits. (Wikipedia)
- data repository
- see: repository
- data resource
- (= resource)
Resources are the actual files, APIs or links that are being shared. (DKAN)
- dataset
- A dataset is a collection of research data files produced in the course of research for a paper or project, plus accompanying metadata that describe the data and indicate who produced the data and who may access it (i.e., title, description, categories, contributors, license, and so forth). Usage of the term dataset varies considerably across disciplinary communities. (Mendeley; Renear et al., 2010)
- dataset doi
- see: digital object identifier
- description
- (= summary, abstract)
A rich summary of the dataset: how and why it was generated and how it should (or should not) be used. This can be modified from article text, but should focus on characterizing the data, not the research project. Analogous to an abstract for a paper. (ESIP/NetCDF; Ag Data Commons)
- digital object identifier
- (= dataset doi, DOI)
Globally unique character strings that reference physical, digital, or abstract objects. They provide actionable, interoperable, persistent links to information about the objects they reference. (USGS)
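To make the data dictionary entry in this section more concrete, the sketch below represents a dictionary as a Python mapping and checks one record against it. The variable names, the fields kept for each variable (`label`, `type`, `allowed`, `units`), and the `check_record` helper are all assumptions made for illustration, not a prescribed layout.

```python
# Hypothetical data dictionary: one entry per variable, recording a
# descriptive name, the data type, allowed values, and units.
DATA_DICTIONARY = {
    "site": {"label": "Site code", "type": str, "allowed": {"A", "B"}, "units": None},
    "year": {"label": "Harvest year", "type": int, "allowed": None, "units": "year"},
    "yield_kg": {"label": "Yield", "type": float, "allowed": None, "units": "kg"},
}


def check_record(record, dictionary):
    """Return a list of problems found when comparing a record to the dictionary."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
        elif spec["allowed"] is not None and value not in spec["allowed"]:
            problems.append(f"{name}: value not allowed")
    return problems
```

A record that matches the dictionary yields an empty list. Each mismatch is reported by variable name, in the order the variables appear in the dictionary.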
E
- embargo
- (= scheduling option)
A specified period of time during which the dataset is inaccessible. At the end of the embargo period, the dataset will be made available. Metadata describing the dataset is publicly available during this period. (Subject guides: Data Publication: Home, 2020)
- endpoint
- An association between a binding and a network address, specified by a URI, that may be used to communicate with an instance of a service. An end point indicates a specific location for accessing a service using a specific protocol and data format. (W3C)
F
- FAIR data
- A set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. (FORCE11)
- format
- (= file type, file format, resource format)
A digital resource encoded for storage in a computer file in a standard way. File formats may be either proprietary or free and may be either unpublished or open. (Wikipedia; DCMI)
G
- geographic extent
- The spatial (horizontal and/or vertical) delineation of the resource. (NOAA)
- geospatial data
- Data about objects, events, or phenomena that have a location on the surface of the earth. (Stock & Guesgen)
- GIS
- (= Geographic Information System)
A framework for gathering, managing, and analyzing data. Rooted in the science of geography, GIS integrates many types of data. It analyzes spatial location and organizes layers of information into visualizations using maps and 3D scenes. (Esri)
H
- harvest
- To use the public feed or API of another data portal to import items from that portal's catalog into your own. For example, Data.gov harvests all of its datasets from the data.json files of hundreds of U.S. federal, state and local data portals. (DKAN)
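The harvest entry above can be sketched in a few lines. The example assumes a feed shaped like a `data.json` file, with a top-level `dataset` list whose records carry an `identifier`. The sample records, the in-memory feed text, and the `harvest` helper are hypothetical; a real harvester would fetch the feed over HTTP and handle updates and deletions with more care.

```python
import json

# Hypothetical feed text standing in for another portal's data.json file.
REMOTE_FEED = json.dumps({
    "dataset": [
        {"identifier": "ex-001", "title": "Soil moisture", "accessLevel": "public"},
        {"identifier": "ex-002", "title": "Crop yields", "accessLevel": "public"},
    ]
})


def harvest(feed_text, local_catalog):
    """Copy each dataset record from the feed into local_catalog.

    Records are keyed by identifier, so harvesting the same feed again
    overwrites existing entries instead of duplicating them.
    Returns the number of records that were new to the catalog.
    """
    added = 0
    for record in json.loads(feed_text).get("dataset", []):
        key = record["identifier"]
        if key not in local_catalog:
            added += 1
        local_catalog[key] = record
    return added
```

Keying on the identifier is one possible design choice. It keeps repeated harvests idempotent, which matters when a portal re-imports the same source on a schedule.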
L
- license
- A legal document under which the resource is made available, typically indicated by URL. (DCAT; schema.org)
- local resource
- Data files stored and served from an internally managed repository. (DKAN; USGS)
M
- metadata
- Documentation of important aspects of data that describe where, when, and why the data were collected; who collected the data; what types of data were collected; what processes were used to create the data; what quality assurance controls were used; and where the collected data are located. Metadata are provided in a human-readable form as well as in a format that is machine readable (for example, XML) for automated use. (USGS)
- metadata record
- (= catalog record)
An item-level metadata record details the characteristics of a digital object for the purposes of description, resource discovery, and preservation. It typically includes: Descriptive information; Access points; Contextual information; Reference to the original item and collection; Administrative and preservation information. (Xie & Matusiak, 2015)
- metadata schema
- A unified and structured set of rules developed for object documentation and functional activities. (Drake, 2003)
O
- open data
- Data that are, in general, consistent with the following principles: Public; Accessible; Described; Reusable; Complete; Timely; Managed Post-Release. (Project Open Data)
P
- peer review
- The process in which a new book, article, software program, etc., is submitted by the prospective publisher to experts in the field for critical evaluation prior to publication, a standard procedure in scholarly publishing. (abc-clio/ODLIS)
- processed data
- Data that has been edited, cleaned or modified from the raw data. (MGDS)
- product type
- (= resource type)
A high-level categorization of the most important part of the dataset's actual content – for example, Audiovisual; Collection; Dataset; Image; Model; Software. (Ag Data Commons)
- public access level
- (= access level)
The degree to which this dataset could be made available to the public, regardless of whether it is currently available to the public [e.g. under embargo]. (Project Open Data)
- published (moderation state)
- see: data publishing
R
- raw data
- Data that have not been changed since acquisition. (MGDS)
- registry
- Authoritative, centrally controlled store of information. (W3C)
- remote resource
- (= external resource)
Associated data stored in external data repositories, or code stored in external software repositories. (Dryad)
- repository
- (= data repository, metadata repository)
A place that holds data, makes data available to use, and organizes data in a logical manner. A data repository may also be defined as an appropriate, subject-specific location where researchers can submit their data. (NLM)
- resource format
- see: format
S
- self-citation
- Reference made in a written work to one or more of the author's previous publications (book, periodical article, conference paper, etc.), an accepted practice in scholarly communication, provided important works written on the subject by other authors are not neglected or ignored. (abc-clio/ODLIS)
T
- taxonomy
- Typically a controlled vocabulary with a hierarchical structure, with the understanding that there are different definitions of a hierarchy. Terms within a taxonomy have relations to other terms within the taxonomy. These are typically: parent/broader term, child/narrower term, or often both if the term is at mid-level within a hierarchy. (American Society for Indexing)
- temporal coverage
- (= temporal extent)
The time period that the dataset covers. An interval of time that is named or defined by its start and end. (DCAT)
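The temporal coverage entry above describes an interval defined by its start and end. A minimal sketch of that idea follows; the `TemporalCoverage` class and its method names are hypothetical and are not taken from DCAT.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporalCoverage:
    """A closed interval of time defined by its start and end dates."""
    start: date
    end: date

    def contains(self, day):
        """True if the given day falls inside the interval (inclusive)."""
        return self.start <= day <= self.end

    def overlaps(self, other):
        """True if the two intervals share at least one day."""
        return self.start <= other.end and other.start <= self.end
```

A check like `overlaps` is the kind of comparison a catalog search could use to find datasets that cover a requested period.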
U
- use limitations
- Limitations regarding the dataset's usability. Example statements include "estimates biased over water," "equipment malfunctioned during a specified time," or "granularity makes data unsuitable for certain kinds of analysis". (Ag Data Commons)
V
- version
- A new version of a dataset is created when there is a change in the structure, contents, or condition of the resource. In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or appended with additional data. (ANDS)
- vocabulary
- (= controlled vocabulary; see also: taxonomy)
A controlled vocabulary, also called an authority file, is an authoritative list of terms to be used in indexing (human or automated). Controlled vocabularies do not necessarily have any structure or relationships between terms within the list and are often used for name authorities (proper nouns), such as persons, organization names, company names, etc. Controlled vocabularies are the broadest category, which includes thesauri and taxonomies. (American Society for Indexing)
References
- American Society for Indexing, Taxonomies & Controlled Vocabularies SIG (nd). About Taxonomies & Controlled Vocabularies. https://www.taxonomies-sig.org/about.htm
- CODATA-ICSTI Task Group on Data Citation Standards and Practices (2013). Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. Data Science Journal, 12(0), CIDCR1-CIDCR75. https://datascience.codata.org/articles/abstract/253/
- Consultative Committee for Space Data Systems (2012). Reference Model for an Open Archival Information System (OAIS), Recommended Practice, Issue 2. CCSDS 650.0-M-2. https://public.ccsds.org/pubs/650x0m2.pdf
- Data versioning - ANDS. (2020). Retrieved 19 August 2020, from https://www.ands.org.au/working-with-data/data-management/data-versioni…
- DataCite Metadata Working Group (2019). DataCite Metadata Schema Documentation for the Publication and Citation of Research Data. Version 4.3. DataCite e.V. https://doi.org/10.14454/7xq3-zf69
- DataONE (nd). Create a Data Dictionary. https://dataoneorg.github.io/Education/bestpractices/create-a-data
- DataONE (nd). Data Life Cycle. https://www.dataone.org/data-life-cycle
- DataONE (nd). Data Management Planning. https://www.dataone.org/data-management-planning
- Dataverse Project (nd). About The Project. https://dataverse.org/about
- Definition of BIG DATA. (2020). Retrieved 12 August 2020, from https://www.merriam-webster.com/dictionary/big%20data
- DKAN (2017). DKAN Open Data Portal. https://dkan.readthedocs.io/en/latest/
- Drake, M. (2003). Encyclopedia of Library and Information Science, Second Edition, Vol. 3, CRC Press, USA
- Dryad (2019). Fair Data: Best practices for creating reusable data publications. https://datadryad.org/stash/best_practices
- Dublin Core Metadata Initiative (2020). DCMI Metadata Terms. Release 2020-01-20. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
- Earth Science Information Partners (ESIP) (2020). Attribute Convention for Data Discovery (ACDD) 1-3. ESIP Federation. https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3
- Federal Enterprise Architecture (FEA) Program (2007). FEA Consolidated Reference Model Version 2.3. https://www.reginfo.gov/public/jsp/Utilities/FEA_CRM_v23_Final_Oct_2007…
- figshare.com
- Food and Agriculture Organization of the United Nations (2020). FAO Term Portal. http://www.fao.org/faoterm/en/
- FORCE11 (2016). The FAIR Data Principles. https://www.force11.org/group/fairgroup/fairprinciples
- GIS Mapping Software, Location Intelligence & Spatial Analytics | Esri. (2020). Retrieved 12 August 2020, from https://www.esri.com
- Haas, H. & Brown, A. (2004). Web Services Glossary. W3C Working Group Note 11 February 2004. http://www.w3.org/TR/ws-gloss/
- Knight, M. (2017). Data Topics: What is a Data Catalog? Dataversity: Data Topics. https://www.dataversity.net/what-is-a-data-catalog/
- Knight, M. (2017). Data Topics: What is a Data Dictionary? Dataversity: Data Topics. https://www.dataversity.net/what-is-a-data-dictionary/
- Knight, M. (2017). Data Topics: What is a Metadata Repository? Dataversity: Data Topics. https://www.dataversity.net/what-is-a-metadata-repository/
- LTER General Data Use Agreement (2005). In LTER Network Data Access Policy Revision 3, Data Access Requirements, and General Data Use Agreement. https://lternet.edu/documents/lter-network-data-access-policy-revision-…
- LTER (2004). Understanding Metadata. NISO Press, Bethesda. ISBN 1-880124-62-9 https://www.lter.uaf.edu/metadata_files/UnderstandingMetadata.pdf
- Marine Geoscience Data System (MGDS) (nd). Frequently-Asked Questions about data terminology. http://www.marine-geo.org/help/data_FAQ.php
- Mendeley Ltd. (2019). FAQ. https://data.mendeley.com/faq
- Mize, J. (2012). ISO 19115 Geographic information — Metadata Workbook. National Oceanic and Atmospheric Administration
- NASA EarthData (2020). Common Metadata Repository (CMR). https://earthdata.nasa.gov/eosdis/science-system-description/eosdis-com…
- NASA EarthData (2020). Data Rights and Related Issues. https://earthdata.nasa.gov/collaborate/open-data-services-and-software/…
- NASA EarthData (2020). EOSDIS Glossary. https://earthdata.nasa.gov/learn/user-resources/glossary
- National Library of Medicine (2020). https://nnlm.gov/guides/data-thesaurus/data-repository
- National Security Agency (1998). NSA Glossary of Terms Used in Security and Intrusion Detection
- NNLM Data Thesaurus (nd). https://nnlm.gov/guides/data-thesaurus/data-citation
- NNLM Data Thesaurus (nd). https://nnlm.gov/guides/data-thesaurus/data-lifecycle
- NYU Health Sciences Library (nd). NYU Data Catalog. https://datacatalog.med.nyu.edu/about
- Open Data Handbook (nd). https://opendatahandbook.org/glossary/en/terms/csv
- Open Knowledge Foundation (nd). The Open Definition. https://opendefinition.org/
- Project Open Data (2014). Project Open Data Metadata Schema v1.1 https://project-open-data.cio.gov/v1.1/schema/
- Project Open Data (2014). Project Open Data. Principles. https://project-open-data.cio.gov/principles/
- Reitz, J. M. (2004). Online Dictionary for Library and Information Science. ABC-CLIO. https://products.abc-clio.com/ODLIS/odlis_about.aspx
- Renear, A.H., Sacchi, S. & Wickett, K.M. (2010). Definitions of dataset in the scientific and technical literature. Proc. Am. Soc. Info. Sci. Tech., 47: 1-4. https://doi.org/10.1002/meet.14504701240
- Resources.data.gov (nd). A repository of Federal Enterprise Data Resources. Glossary. https://resources.data.gov/ [formerly Project Open Data]
- Riley, J. (2017). Understanding metadata. NISO Primer. National Information Standards Organization (NISO). ISBN 978-1-937522-72-8. https://www.niso.org/publications/understanding-metadata-2017
- schema.org
- Stock, K., & Guesgen, H. (2020). Chapter 10 - Geospatial Reasoning With Open Data. In Automating Open Source Intelligence (pp. 171-204). Syngress. Retrieved from https://doi.org/10.1016/B978-0-12-802916-9.00010-5
- Subject guides: Data Publication: Home. (2020). Retrieved 19 August 2020, from https://libguides.library.usyd.edu.au/datapublication
- The National Agricultural Library Agricultural Thesaurus. (2002). Retrieved 12 August 2020, from https://agclass.nal.usda.gov/
- USDA National Agricultural Library (nd). Ag Data Commons Data Submission Manual. https://data.nal.usda.gov/ag-data-commons-data-submission-manual#descri…
- U.S. Geological Survey (nd). Data Management. https://www.usgs.gov/products/data-and-tools/data-management
- U.S. Geological Survey (nd). Data Management: Digital Object Identifiers. https://www.usgs.gov/products/data-and-tools/data-management/digital-ob…
- U.S. Geological Survey (nd). Data Management: Data Dictionaries. https://www.usgs.gov/products/data-and-tools/data-management/data-dicti…
- Wikipedia contributors (2020). Data publishing. Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Data_publishing&oldid=963630…
- Wikipedia contributors (2020). File format. Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=File_format&oldid=974314346
- Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018. https://doi.org/10.1038/sdata.2016.18
- W3C (2019). Library terminology informally explained. https://www.w3.org/2001/sw/wiki/Library_terminology_informally_explained
- W3C Dataset Exchange Working Group (2020). Data Catalog Vocabulary (DCAT) – Version 2. W3C Recommendation 04 February 2020. https://www.w3.org/TR/2020/REC-vocab-dcat-2-20200204/
- Xie, I. & Matusiak, K. K. (2015). Discover Digital Libraries: Theory and Practice. Elsevier. 388 pp. ISBN 978-0-12-417112-1. https://www.sciencedirect.com/book/9780124171121/discover-digital-libra…