NeurIPS 2022 Datasets and Benchmarks Track
We are immensely grateful for the tremendous contributions of the 92 area chairs, 1,064 reviewers, and 39 ethics reviewers to make this new endeavor a success.
The Datasets and Benchmarks track serves as a novel venue for high-quality publications, talks, and posters on highly valuable machine learning datasets and benchmarks, as well as a forum for discussions on how to improve dataset development. Datasets and benchmarks are crucial for the development of machine learning methods, but they also require their own publishing and reviewing guidelines. For instance, datasets often cannot be reviewed in a double-blind fashion, and hence full anonymization will not be required. On the other hand, they do require additional specific checks, such as a proper description of how the data was collected, whether it shows intrinsic bias, and whether it will remain accessible.
The 2021 Datasets and Benchmarks track was a big success; you can view the accepted papers from 2021 NeurIPS Datasets and Benchmarks here, and the winners of the best paper awards for the track here.
CRITERIA. We are aiming for a review process equally stringent as that of the main conference, yet better suited to datasets and benchmarks. Submissions to this track will be reviewed according to a set of criteria and best practices specifically designed for datasets and benchmarks, as described below. A key criterion is accessibility: datasets should be available and accessible, i.e. the data can be found and obtained without a personal request to the PI, and any required code should be open source. Next to a scientific paper, authors should also submit supplementary materials such as details on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, as well as how it will be made available and maintained.
RELATIONSHIP TO NEURIPS. Submissions to the track will be part of the main NeurIPS conference, presented alongside the main conference papers. Accepted papers will be officially published in associated proceedings clearly linked to, yet separate from, the NeurIPS proceedings. The proceedings will be called Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks and they will be hosted on the NeurIPS website next to the main NeurIPS proceedings. We will maintain a page on the NeurIPS website with all accepted datasets and additional information.
SUBMISSIONS. There will be one deadline this year. It is also still possible to submit datasets and benchmarks to the main conference (under the usual review process), but dual submission to both is not allowed (unless you retract your paper from the main conference). Submission is single-blind, and the review process is open during the discussion phase. However, only accepted papers will remain visible after the review phase, and the datasets themselves can be released at a later date (see below). If it is possible to properly review the submission double-blind, i.e., reviewers do not need access to non-anonymous repositories to review the work, then authors can also choose to submit the work anonymously.
SCOPE. In addition to new datasets and benchmarks on new or existing datasets, we welcome submissions that detail advanced practices in data collection and curation that are of general interest even if the data itself cannot be shared. Data generators, reinforcement learning environments, and benchmarking tools are also in scope, as are frameworks for responsible dataset development, audits of existing datasets, work identifying significant problems with existing datasets and their use, and systematic analyses of existing systems on novel datasets that yield important new insights.
Read our blog post from last year for more about why we started this track.
Important dates
- Abstract submission deadline: Monday, June 6, 2022 01:00 PM PDT.
- Full paper submission and co-author registration deadline: Thursday, June 9, 2022 01:00 PM PDT
- Supplementary materials submission deadline: Thursday, June 16, 2022 01:00 PM PDT
- End of the reviewing process: July 27, 2022 01:00 PM PDT
- Start of author discussions on OpenReview: Wednesday, August 2, 2022 01:00 PM PDT
- End of author/reviewer discussions on OpenReview: Monday, August 29, 2022 01:00 PM PDT (extended to allow more time to respond to ethics reviews)
- Author notification: Friday, September 16, 2022
- Camera-ready deadline: Wednesday, October 12, 2022 01:00 PM PDT
- Video submission deadline: October 22, 2022 01:00 PM PDT
- Poster submission deadline: before the conference.
Note: The site will start accepting submissions on April 16, 2022.
FREQUENTLY ASKED QUESTIONS
Q: My work is in scope for this track but possibly also for the main conference. Where should I submit it?
A: This is ultimately your choice. Consider the main contribution of the submission and how it should be reviewed. If the main contribution is a new dataset, benchmark, or other work that falls into the scope of the track (see above), then it is ideally reviewed accordingly. As discussed in our blog post, the reviewing procedures of the main conference are focused on algorithmic advances, analysis, and applications, while the reviewing in this track is equally stringent but designed to properly assess datasets and benchmarks. More practical considerations are that this track allows single-blind reviewing (since anonymization is often impossible for hosted datasets) and the intended audience: submitting here makes your work more visible to people looking for datasets and benchmarks.
Q: How will papers accepted to this track be cited?
A: Accepted papers will appear in official proceedings hosted on the NeurIPS website, next to (yet separate from) the main conference proceedings. The official name will be Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks.
Q: Do I need to submit an abstract beforehand?
A: Yes, please check the important dates section for more information.
Q: My dataset requires open credentialized access. Can I submit to this track?
A: This will be possible on the condition that credentialization is necessary for the public good (e.g. because of ethically sensitive medical data), and that an established credentialization procedure is in place that 1) is open to a large section of the public, 2) provides rapid response and access to the data, and 3) is guaranteed to be maintained for many years. A good example here is PhysioNet Credentialing, where users must first understand how to handle data with human subjects, yet access is open to anyone who has learned and agrees with the rules. This should be seen as an exceptional measure, and NOT as a way to limit access to data for other reasons (e.g. to shield data behind a Data Transfer Agreement). Misuse would be grounds for desk rejection. During submission, you can indicate that your dataset involves open credentialized access, in which case the necessity, openness, and efficiency of the credentialization process itself will also be checked.
Q: What is the guidance on the design and resolution of the thumbnail to be uploaded?
A: Thumbnails are crucial for virtual poster sessions. Please follow the guidelines here for creating and uploading the thumbnails.
SUBMISSION INSTRUCTIONS
A submission consists of a paper and, where applicable, supplementary materials (see below). Submissions are limited to 9 content pages in NeurIPS format, including all figures and tables; additional pages containing the paper checklist, references, and acknowledgements are allowed. If your submission is accepted, you will be allowed an additional content page for the camera-ready version.
Please carefully follow the LaTeX template for this track when preparing submissions. We follow the NeurIPS format, but with the appropriate headings, and without hiding the names of the authors. Download the template as a bundle here.
Reviewing is in principle single-blind, hence the paper should not be anonymized. In cases where the work can be reviewed equally well anonymously, anonymous submission is also allowed.
During submission, you can add a public link to the dataset or benchmark data. If the dataset can only be released later, you must include instructions for reviewers on how to access the dataset. This can only be done after the first submission, by sending an official note to the reviewers in OpenReview. We highly recommend making the dataset publicly available immediately, or at the latest before the start of the NeurIPS conference. In select cases, with solid motivation, the release date can be extended up to a year after the submission deadline.
Submissions introducing new datasets must include the following in the supplementary materials (as a separate PDF):
Dataset documentation and intended uses. Recommended documentation frameworks include datasheets for datasets, dataset nutrition labels, data statements for NLP, and accountability frameworks.
URL to website/platform where the dataset/benchmark can be viewed and downloaded by the reviewers.
Author statement that they bear all responsibility in case of violation of rights, etc., and confirmation of the data license.
Hosting, licensing, and maintenance plan. The choice of hosting platform is yours, as long as you ensure access to the data (possibly through a curated interface) and will provide the necessary maintenance.
To ensure accessibility, we largely follow the NeurIPS guidelines for data submission, while allowing more freedom for non-static datasets. The supplementary materials for datasets must include the following:
Links to access the dataset and its metadata. This can be hidden upon submission if the dataset is not yet publicly available, but must be added in the camera-ready version. In select cases, e.g. when the data can only be released at a later date, this can be added afterward (up to a year after the submission deadline). Simulation environments should link to open source code repositories.
The dataset itself should ideally use an open and widely used data format. Provide a detailed explanation of how the dataset can be read. For simulation environments, use existing frameworks or explain how they can be used.
Long-term preservation: It must be clear that the dataset will be available for a long time, either by uploading to a data repository or by explaining how the authors themselves will ensure this.
Explicit license: Authors must choose a license, ideally a CC license for datasets, or an open source license for code (e.g. RL environments). An overview of licenses can be found here: https://paperswithcode.com/datasets/license
Add structured metadata to a dataset's meta-data page using Web standards (like schema.org and DCAT): this allows it to be discovered and organized by anyone. A guide can be found here: https://developers.google.com/search/docs/data-types/dataset. If you use an existing data repository, this is often done automatically.
Highly recommended: a persistent dereferenceable identifier (e.g. a DOI minted by a data repository or a prefix on identifiers.org) for datasets, or a code repository (e.g. GitHub, GitLab, ...) for code. If this is not possible or useful, please explain why.
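To make the structured-metadata recommendation above concrete, here is a minimal sketch of a schema.org Dataset description serialized as JSON-LD, which a dataset landing page would typically embed inside a <script type="application/ld+json"> tag; all names, URLs, and field values below are hypothetical placeholders, not requirements of the track.

```python
import json

# A minimal schema.org "Dataset" description expressed as JSON-LD.
# Search engines and dataset indexes read this markup from the
# dataset's web page. Every value here is an illustrative placeholder.
metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example Benchmark Dataset",
    "description": "What the dataset contains and how it was collected.",
    "url": "https://example.org/datasets/example-benchmark",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "Jane Doe"},
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/datasets/example-benchmark.csv",
    },
}

# Serialize to the JSON-LD string that would be embedded in the page.
print(json.dumps(metadata, indent=2))
```

If you host the data on an established repository (e.g. Zenodo or OpenML), this markup is usually generated for you, so hand-writing it is only needed for self-hosted pages.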
For benchmarks, the supplementary materials must ensure that all results are easily reproducible. Where possible, use a reproducibility framework such as the ML reproducibility checklist, or otherwise guarantee that all results can be easily reproduced, i.e. all necessary datasets, code, and evaluation procedures must be accessible and documented.
For papers introducing best practices in creating or curating datasets and benchmarks, the above supplementary materials are not required.
For papers resubmitted after being retracted from another venue: a brief discussion on the main concerns raised by previous reviewers and how you addressed them. You do not need to share the original reviews.
REVIEWING AND SELECTION PROCESS
Reviewing will be single-blind. A datasets and benchmarks program committee will be formed, consisting of experts on machine learning, dataset curation, and ethics. We will ensure diversity in the program committee, both in terms of background as well as technical expertise (e.g., data, ML, data ethics, social science expertise). Each paper will be reviewed by the members of the committee. In select cases that are flagged by reviewers, an ethics review may be performed as well.
The review process will be open: papers and reviews will be publicly visible during the review phase to allow community feedback. They will be hidden again after the review phase, unless they are accepted or when authors opt-in. Authors can choose to keep the datasets themselves hidden until a later release date, as long as reviewers have access.
The factors that will be considered when evaluating papers include:
Utility and quality of the submission: impact, originality, novelty, and relevance to the NeurIPS community will all be considered.
Completeness of the relevant documentation: For datasets, sufficient detail must be provided on how the data was collected and organized, what kind of information it contains, how it should be used ethically and responsibly, as well as how it will be made available and maintained. For benchmarks, best practices on reproducibility should be followed.
Accessibility and accountability: For datasets, there should be a convincing hosting, licensing, and maintenance plan.
Ethics and responsible use: Any ethical implications should be addressed and guidelines for responsible use should be provided where appropriate. Note that, if your submission includes publicly available datasets (e.g. as part of a larger benchmark), you should also check these datasets for ethical issues. You remain responsible for the ethical implications of including existing datasets or other data sources in your work.
ADVISORY COMMITTEE
The following committee will provide advice on the organization of the track over the coming years: Sergio Escalera, Isabelle Guyon, Neil Lawrence, Dina Machuve, Olga Russakovsky, Joaquin Vanschoren, Serena Yeung.
DATASETS AND BENCHMARKS CHAIRS
Anmol Kalia, Meta AI Research (Assistant to the PC)
Deepti Ghadiyaram, Meta AI Research
Joaquin Vanschoren, Eindhoven University of Technology
Contact: neurips-2022-datasets-benchmarks@googlegroups.com