% Datasheet for Dataset Template
% Author: Olamilekan Wahab
% Last uploaded: 5 years ago
% License: Creative Commons CC BY 4.0
% Abstract: This template allows easy creation of datasheets based on the work by Gebru, Timnit et al., "Datasheets for Datasets," arXiv:1803.09010 (2018).
\documentclass[letterpaper, 10pt, conference]{ieeeconf} % Comment this line out if you need a4paper
%\documentclass[a4paper, 10pt, conference]{ieeeconf} % Use this line for a4 paper
\IEEEoverridecommandlockouts % This command is only needed if you want to use the \thanks command
\overrideIEEEmargins
\usepackage{graphicx}
\usepackage{lipsum}
\usepackage{xcolor}
\graphicspath{ {images/} }
\title{\LARGE \bf
Datasheet Template
}
\begin{document}
\maketitle
\thispagestyle{empty}
\pagestyle{empty}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Motivation for Dataset Creation}
\textcolor{blue}{\subsection{Why was the dataset created? (e.g., was there a specific task in mind? Was there a specific gap that needed to
be filled?)}}
\lipsum[1]
\textcolor{blue}{\subsection{Has the dataset been used already? If so, where are the results so others can compare
(e.g., links to published papers)?}}
\lipsum[1]
\textcolor{blue}{\subsection{What (other) tasks could the dataset be used for?}}
\lipsum[1]
\textcolor{blue}{\subsection{Who funded the creation of the dataset?}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments?}}
\lipsum[1]
\section{Dataset Composition}
\textcolor{blue}{\subsection{What are the instances? (that is, examples; e.g., documents, images, people, countries) Are there multiple types
of instances? (e.g., movies, users, ratings; people, interactions between them; nodes, edges)}}
\lipsum[1]
\textcolor{blue}{\subsection{How many instances are there in total (of each type, if appropriate)?}}
\lipsum[1]
Citation example~\cite{latex:companion,latex2e}.
\textcolor{blue}{\subsection{What data does each instance consist of? ``Raw''
data (e.g., unprocessed text or images)? Features/attributes? Is there a label/target associated with
instances? If the instances relate to people, are subpopulations identified (e.g., by age, gender, etc.) and what is
their distribution?}}
\lipsum[1]
Here is an example of a footnote\footnote{I'm a footnote.} and of a second one.\footnote{I'm another footnote.}
\textcolor{blue}{\subsection{Is there a label or target associated with each instance? If so, please
provide a description.}}
\lipsum[1]
\textcolor{blue}{\subsection{Is any information missing from individual instances? If so, please
provide a description, explaining why this information is missing (e.g., because it was unavailable). This does not include intentionally removed
information, but might include, e.g., redacted text.}}
\lipsum[1]
\textcolor{blue}{\subsection{Are relationships between individual instances made explicit (e.g.,
users' movie ratings, social network links)? If so, please describe
how these relationships are made explicit.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset contain all possible instances or is it a sample (not
necessarily random) of instances from a larger set? If the dataset is
a sample, then what is the larger set? Is the sample representative of the
larger set (e.g., geographic coverage)? If so, please describe how this
representativeness was validated/verified. If it is not representative of the
larger set, please describe why not (e.g., to cover a more diverse range of
instances, because instances were withheld or unavailable).}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there recommended data splits (e.g., training, development/validation, testing)? If so, please provide a description of these
splits, explaining the rationale behind them.}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there any errors, sources of noise, or redundancies in the
dataset? If so, please provide a description.}}
\lipsum[1]
\textcolor{blue}{\subsection{Is the dataset self-contained, or does it link to or otherwise rely on
external resources (e.g., websites, tweets, other datasets)? If it links
to or relies on external resources, a) are there guarantees that they will
exist, and remain constant, over time; b) are there official archival versions
of the complete dataset (i.e., including the external resources as they existed at the time the dataset was created); c) are there any restrictions
(e.g., licenses, fees) associated with any of the external resources that
might apply to a future user? Please provide descriptions of all external
resources and any restrictions associated with them, as well as links or
other access points, as appropriate.}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments?}}
\lipsum[1]
\section{Collection Process}
\textcolor{blue}{\subsection{What mechanisms or procedures were used to collect the data (e.g.,
hardware apparatus or sensor, manual human curation, software program, software API)? How were these mechanisms or procedures validated?}}
\lipsum[1]
\textcolor{blue}{\subsection{How was the data associated with each instance acquired? Was the
data directly observable (e.g., raw text, movie ratings), reported by subjects (e.g., survey responses), or indirectly inferred/derived from other data
(e.g., part-of-speech tags, model-based guesses for age or language)?
If data was reported by subjects or indirectly inferred/derived from other
data, was the data validated/verified? If so, please describe how.}}
\lipsum[1]
\textcolor{blue}{\subsection{If the dataset is a sample from a larger set, what was the sampling strategy (e.g., deterministic, probabilistic with specific sampling probabilities)?}}
\lipsum[1]
\textcolor{blue}{\subsection{Who was involved in the data collection process (e.g., students,
crowdworkers, contractors) and how were they compensated (e.g.,
how much were crowdworkers paid)?}}
\lipsum[1]
\textcolor{blue}{\subsection{Over what timeframe was the data collected? Does this timeframe
match the creation timeframe of the data associated with the instances
(e.g., recent crawl of old news articles)? If not, please describe the timeframe in which the data associated with the instances was created.}}
\lipsum[1]
\section{Data Preprocessing}
\textcolor{blue}{\subsection{Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, tokenization, part-of-speech tagging, SIFT
feature extraction, removal of instances, processing of missing values)? If so, please provide a description. If not, you may skip the remainder of the questions in this section.}}
\lipsum[1]
\textcolor{blue}{\subsection{Was the ``raw'' data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated
future uses)? If so, please provide a link or other access point to the
``raw'' data.}}
\lipsum[1]
\textcolor{blue}{\subsection{Is the software used to preprocess/clean/label the instances available? If so, please provide a link or other access point.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does this dataset collection/processing procedure
achieve the motivation for creating the dataset
stated in the first section of this datasheet? If not,
what are the limitations?}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments?}}
\lipsum[1]
\section{Dataset Distribution}
\textcolor{blue}{\subsection{How will the dataset be distributed? (e.g., tarball on
website, API, GitHub; does the data have a DOI and is it
archived redundantly?)}}
\lipsum[1][1-10]
\textcolor{blue}{\subsection{When will the dataset be released/first distributed?
What license (if any) is it distributed under?}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there any copyrights on the data?}}
\lipsum[1]
\textcolor{blue}{\subsection{Are there any fees or access/export restrictions?}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments?}}
\lipsum[1]
\section{Dataset Maintenance}
\textcolor{blue}{\subsection{Who is supporting/hosting/maintaining the
dataset?}}
\lipsum[1]
\textcolor{blue}{\subsection{Will the dataset be updated? If so, how often and
by whom?}}
\lipsum[1]
\textcolor{blue}{\subsection{How will updates be communicated? (e.g., mailing
list, GitHub)}}
\lipsum[1]
\textcolor{blue}{\subsection{If the dataset becomes obsolete, how will this be
communicated?}}
\lipsum[1]
\textcolor{blue}{\subsection{Is there a repository to link to any/all papers/systems that use this dataset?}}
\lipsum[1]
\textcolor{blue}{\subsection{If others want to extend/augment/build on this
dataset, is there a mechanism for them to do so?
If so, is there a process for tracking/assessing the
quality of those contributions? What is the process
for communicating/distributing these contributions
to users?}}
\lipsum[1]
\section{Legal and Ethical Considerations}
\textcolor{blue}{\subsection{Were any ethical review processes conducted (e.g., by an institutional review board)? If so, please provide a description of these review
processes, including the outcomes, as well as a link or other access point
to any supporting documentation.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset contain data that might be considered confidential
(e.g., data that is protected by legal privilege or by doctor-patient confidentiality, data that includes the content of individuals' non-public
communications)? If so, please provide a description.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety? If so,
please describe why.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset relate to people? If not, you may skip the remaining
questions in this section.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset identify any subpopulations (e.g., by age, gender)?
If so, please describe how these subpopulations are identified and provide
a description of their respective distributions within the dataset.}}
\lipsum[1]
\textcolor{blue}{\subsection{Is it possible to identify individuals (i.e., one or more natural persons), either directly or indirectly (i.e., in combination with other
data) from the dataset? If so, please describe how.}}
\lipsum[1]
\textcolor{blue}{\subsection{Does the dataset contain data that might be considered sensitive in
any way (e.g., data that reveals racial or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or
locations; financial or health data; biometric or genetic data; forms of
government identification, such as social security numbers; criminal
history)? If so, please provide a description.}}
\lipsum[1]
\textcolor{blue}{\subsection{Did you collect the data from the individuals in question directly, or
obtain it via third parties or other sources (e.g., websites)?}}
\lipsum[1]
\textcolor{blue}{\subsection{Were the individuals in question notified about the data collection?
If so, please describe (or show with screenshots or other information) how
notice was provided, and provide a link or other access point to, or otherwise reproduce, the exact language of the notification itself.}}
\lipsum[1]
\textcolor{blue}{\subsection{Did the individuals in question consent to the collection and use of
their data? If so, please describe (or show with screenshots or other
information) how consent was requested and provided, and provide a link
or other access point to, or otherwise reproduce, the exact language to
which the individuals consented.}}
\lipsum[1]
\textcolor{blue}{\subsection{If consent was obtained, were the consenting individuals provided
with a mechanism to revoke their consent in the future or for certain
uses? If so, please provide a description, as well as a link or other access
point to the mechanism (if appropriate).}}
\lipsum[1]
\textcolor{blue}{\subsection{Has an analysis of the potential impact of the dataset and its use
on data subjects (e.g., a data protection impact analysis) been conducted? If so, please provide a description of this analysis, including the
outcomes, as well as a link or other access point to any supporting documentation.}}
\lipsum[1]
\textcolor{blue}{\subsection{Any other comments?}}
\lipsum[1]
\medskip
\bibliographystyle{unsrt}
\bibliography{sample}
\end{document}