HEDS Form

Download to file
download json

Press the button to download your current form in JSON format.
Upload from file


upload json

Press the button to upload a JSON file. Warning: This will clear your current form completely and then populate it with the contents of the file.
Count of errors
Updates every 60 seconds.
41 blank fields.

Instructions

This is the Human Evaluation Datasheet (HEDS) form which is designed to record full details of human evaluation experiments in Natural Language Processing (NLP), addressing a history of details often going unreported in the field (in extreme cases, no details at all are reported). Reporting such details is crucial for gauging the reliability of results, determining comparability with other experiments, and for assessing reproducibility (Belz et al., 2023a,b; Thomson et al., 2024; Thomson and Belz, 2024). Having a standard set of questions to answer (as provided by HEDS) means not having to worry about what information to include or in what detail, as well as the information being in a format directly comparable to information reported for other human evaluation experiments. To maximise standardisation, questions are in multiple-choice format where possible.

The HEDS form is divided into five main sections, containing questions that record information about resources, evaluated system(s), test set sampling, quality criteria assessed, and ethics, respectively. Within each of the main sections there can be multiple subsections which can be expanded or collapsed.

Each HEDS question comes with instructions and notes to help with answering it, except where the task is exceedingly simple (e.g. when a contact email address is asked for).

HEDS Section 4 needs to be completed for each quality criterion that is evaluated in the experiment. Instructions on how to do this are shown at the start of HEDS Section 4.

The form is not submitted to any server when it is completed; instead it needs to be downloaded to a local file. A tool is available in the GitHub repository for converting the file to LaTeX format (which we used to generate the next section). Please use the "download json" button in the "Download to file" section. This will download a file (in .json format) that contains the current values from each form field. You can also upload a json file (see the "Upload from file" section on the left of the screen). Warning: this will delete your current form content, then populate the blank form with content from the file. It is advisable to download files as a backup while you are completing the form. The form saves the field values in your browser's local storage; they will be deleted if you clear the local storage, or if you are in a private/incognito window and then close it.
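
If you want to inspect or back up a downloaded file programmatically, a minimal sketch is shown below. It assumes (hypothetically) that the file is a flat JSON object mapping form field identifiers to string values; the actual structure of the exported file may differ, so check a downloaded example first.

```python
import json

# Load a previously downloaded HEDS form file (the filename is an example).
with open("heds_form.json", encoding="utf-8") as f:
    form = json.load(f)

# Count fields that are still blank, assuming a flat {field_id: value} layout.
blank = [key for key, value in form.items() if isinstance(value, str) and not value.strip()]
print(f"{len(blank)} blank fields")
```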

The form will not prevent you from downloading your save file, even when there are error or warning messages. Yellow warning messages indicate fields that have not been completed. If a field is not relevant for your experiment, enter N/A, and ideally also explain why. Red messages are errors: for example, if the form expects an integer and you have entered something else, a red message will be shown. Errors will also not prevent you from saving the form.

You can generate a list of all current errors/warnings, along with their section numbers, in the "all form errors" tab at the bottom of the form. A count of errors will also be refreshed every 60 seconds on the panel on the left side of the screen.

We recognise that completing a form of this length and level of detail constitutes an overhead in terms of time and effort, especially the first time a HEDS form is completed when the learning curve is steepest. However, this overhead does go down substantially with each use of HEDS, and, we believe, is far outweighed by the benefits: increased scientific rigour, reliability and repeatability.

We envisage the main uses of HEDS to be as follows. Ideally, it should be completed before a human evaluation experiment is run, at the point when the design is final, as part of a formal preregistration process. Once the experiment has been run, the information in the sheet can be updated if necessary, e.g. if the final number of evaluators had to change due to unforeseen circumstances.

Another use is for the purpose of reporting the details of a completed experiment. For this, the completed HEDS sheet can be automatically converted to LaTeX, ready for inclusion in the supplementary material.

A third use is for carrying out reproducibility studies, as has been done extensively in the ReproGen and ReproNLP shared tasks (Belz et al., 2022; Belz & Thomson, 2024). Here, the HEDS sheets were used to ensure that the original work and the reproduction experiment had the same properties and hence could be expected to produce similar results.


How to cite
The paper describing HEDS 3.0 is Belz & Thomson 2024.



Question 1.1.1:  Where can the main reference for the evaluation experiment be found?

Multiple-choice options (select one)

Referring to the main reference entered for Question 1.1.1, identify the experiment that you’re completing this form for (see the instructions section at the start for an explanation of the term ‘experiment’), in particular to differentiate this experiment from any others that you are carrying out as part of the same overall work: (a) if a link for a published paper was entered under Question 1.1.1, give here the section(s) and/or table(s) that best identify the experiment, plus a brief description for clarity; (b) if ‘preregistration’ or ‘unpublished’ was selected, enter a brief description of the experiment, mentioning quality criteria, dataset and systems.

1.1.2:  Please complete this question.


Question 1.2:  Where can the resources that were used in the evaluation experiment be found?

Multiple-choice options (select one)



1.3.1.1:  Please complete this question.

1.3.1.2:  Please complete this question.

1.3.1.3:  Please complete this question.


1.3.2.1:  Please complete this question.

1.3.2.2:  Please complete this question.

1.3.2.3:  Please complete this question.

Notes: Questions 2.1–2.5 record information about the system(s) evaluated in the experiment that this sheet is being completed for. The input, output, and task questions in this section are closely interrelated: the value for one partially determines the others, as indicated for some combinations in Question 2.3.


Question 2.1:  What type of input do the evaluated system(s) take?

Notes: The term ‘input’ here refers to the text, representations and/or data structures that all of the evaluated systems take as input (including prompts). This question is about input type, regardless of number. E.g. if the input is a set of documents, you would still select ‘text: document’ below.

Check-box options (select all that apply)

Please provide further details for your above selection(s)
2.1:  Please select at least 1 of the above options.

Question 2.2:  What type of output do the evaluated system(s) generate?

Notes: The term ‘output’ here refers to the text, representations and/or data structures that all of the evaluated systems produce as output. This question is about output type, regardless of number. E.g. if the output is a set of documents, you would still select ‘text: document’ below.

Check-box options (select all that apply)

Please provide further details for your above selection(s)
2.2:  Please select at least 1 of the above options.

Question 2.3:  What is the task that the evaluated system(s) perform in mapping the inputs in Question 2.1 to the outputs in Question 2.2?

Notes: This question is about the task(s) performed by the system(s) being evaluated. This is independent of the application domain (financial reporting, weather forecasting, etc.), or the specific method (rule-based, neural, etc.) implemented in the system. We indicate mutual constraints between inputs, outputs and task for some of the options below.

Check-box options (select all that apply)

Please provide further details for your above selection(s)
2.3:  Please select at least 1 of the above options.

Question 2.4:  What are the language(s) of the inputs accepted by the system(s)?

Notes: Select any language(s) that apply from this list of standardised full language names as per ISO 639-1 (2019). If language is not (part of) the input, select ‘N/A’.

Check-box options (select all that apply)

Please provide further details for your above selection(s)
2.4:  Please select at least 1 of the above options.

Question 2.5:  What are the language(s) of the outputs produced by the system?

Notes: Select any language(s) that apply from this list of standardised full language names as per ISO 639-1 (2019). If language is not (part of) the output, select ‘N/A’.

Check-box options (select all that apply)

Please provide further details for your above selection(s)
2.5:  Please select at least 1 of the above options.


Questions 3.1.1–3.1.3 record information about the size of the sample of outputs (or human-authored stand-ins) evaluated per system, how the sample was selected, and what its statistical power is.


The number of system outputs (or other evaluation items) that are evaluated per system by at least one evaluator in the experiment. For most experiments this should be a single integer. If the number of outputs varies, please explain how and why.

3.1.1:  Please complete this question.

Question 3.1.2:  How are system outputs (or other evaluation items) selected for inclusion?

Multiple-choice options (select one)

Please provide further details for your above selection(s)
3.1.2:  Please select at least 1 of the above options.

Notes: All evaluation experiments should perform a power analysis to determine an appropriate sample size. If none was performed, enter ‘N/A’ in Questions 3.1.3.1–3.1.3.3.
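
As a hedged illustration only (not part of the form): if your design compared two systems with an independent-samples t-test, a prospective power analysis might look like the sketch below, using the statsmodels library. The effect size, alpha and target power are placeholder assumptions that you would replace with values appropriate to your own experiment.

```python
import math

# Prospective power analysis for a two-sample t-test design (illustrative assumptions).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Assumed medium effect size (Cohen's d = 0.5), alpha = 0.05, target power = 0.8.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Required evaluation items per system (rounded up): {math.ceil(n_per_group)}")
```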


The name of the method used, and a URL linking to a reference for the method.

3.1.3.1:  Please complete this question.

The numerical results of the statistical power calculation on the output sample obtained with the method in Question 3.1.3.1.

3.1.3.2:  Please complete this question.

A URL linking to any code used in the calculation in Question 3.1.3.2.

3.1.3.3:  Please complete this question.


A single integer representing the total number of evaluators whose assessments contribute to results in the experiment. Don’t count evaluators who performed some evaluations but who were subsequently excluded.

3.2.1:  Please complete this question.


Question 3.2.2.1:  Are the evaluators in this experiment domain experts?

Multiple-choice options (select one)

Please provide further details for your above selection(s)
3.2.2.1:  Please select at least 1 of the above options.

Question 3.2.2.2:  Did participants receive any form of payment?

Multiple-choice options (select one)

Please provide further details for your above selection(s)
3.2.2.2:  Please select at least 1 of the above options.

Question 3.2.2.3:  Were any of the participants previously known to the authors?

Multiple-choice options (select one)

Please provide further details for your above selection(s)
3.2.2.3:  Please select at least 1 of the above options.

Question 3.2.2.4:  Were any of the researchers running the experiment among the participants?

Multiple-choice options (select one)

Please provide further details for your above selection(s)
3.2.2.4:  Please select at least 1 of the above options.

Explain how your evaluators are recruited. Do you send emails to a given list? Do you post invitations on social media? Posters on university walls? Are there any gatekeepers involved?

3.2.3:  Please complete this question.

Describe any training evaluators were given to prepare them for the evaluation task, including any practice evaluations they did. This includes introductory explanations, e.g. on the start page of an online evaluation tool.

3.2.4:  Please complete this question.

Use this space to list any characteristics not covered in previous questions that the evaluators are known to have, e.g. because of information collected during the evaluation. This might include geographic location, educational level, or demographic information such as gender, age, etc. Where characteristics differ among evaluators (e.g. gender, age, location etc.), also give numbers for each subgroup.

3.2.5:  Please complete this question.


Question 3.3.1:  Has the experimental design been preregistered?

Notes: If the answer is yes, also give a link to the registration page for the experiment.

Multiple-choice options (select one)

Please provide further details for your above selection(s)
3.3.1:  Please select at least 1 of the above options.

Describe the platform or other medium used to collect responses, e.g. paper forms, Google forms, SurveyMonkey, Mechanical Turk, CrowdFlower, audio/video recording, etc.

3.3.2:  Please complete this question.

Notes: Question 3.3.3.1 records information about the type(s) of quality assurance employed, and Question 3.3.3.2 records the details of the corresponding quality assurance methods.


Question 3.3.3.1:  What types of quality assurance methods are used to ensure that evaluators are sufficiently qualified and/or their responses are of sufficient quality?

If any quality assurance methods other than those listed were used, select ‘other’, and describe why below. If no methods were used, select none of the above.

Check-box options (select all that apply)

Please provide further details for your above selection(s)
3.3.3.1:  Please select at least 1 of the above options.

Give details of the methods used for each of the quality assurance types selected in the last question. E.g. if quality checks were used, give details of the check. If no quality assurance methods were used, enter ‘N/A’.

3.3.3.2:  Please complete this question.


Enter a URL linking to a screenshot or copy of the form if possible. If there are many files, please create a signpost page (e.g. on GitHub) that contains links to all applicable files. If there is a separate introductory interface/page, include it under Question 3.2.4.


Describe the types of information (the evaluation item, a rating instrument, instructions, definitions, etc.) evaluators can see while carrying out each assessment. In particular, explain any variation that cannot be seen from the information linked to in Question 3.3.4.1.

3.3.4.2:  Please complete this question.

Question 3.3.5:  How free are evaluators regarding when and how quickly to carry out evaluations?

Check-box options (select all that apply)

Please provide further details for your above selection(s)
3.3.5:  Please select at least 1 of the above options.

Question 3.3.6:  Are evaluators told they can ask questions about the evaluation and/or provide feedback?

Check-box options (select all that apply)

Please provide further details for your above selection(s)
3.3.6:  Please select at least 1 of the above options.

Question 3.3.7:  What are the conditions in which evaluators carry out the evaluations?

Multiple-choice options (select one)

Please provide further details for your above selection(s)
3.3.7:  Please select at least 1 of the above options.

For those conditions that are not controlled to be the same, describe the variation that can occur. For conditions that are controlled to be the same, enter ‘N/A’.

3.3.8:  Please complete this question.

Notes: Questions in this section record information about each quality criterion (Fluency, Grammaticality, etc.) assessed in the human evaluation experiment that this sheet is being completed for.

If multiple quality criteria are evaluated, the form creates a subsection for each criterion, headed by the criterion name. These are implemented as overlaid windows with tabs for navigating between them.


In this section you can create a named subsection for each criterion that is being evaluated; the form questions are then duplicated for each criterion. To create a criterion, type its name in the field and press the New button; it will then appear on a tab that lets you switch the active criterion. To delete the current criterion, press the Delete current button.



Notes: Questions 4.1.1–4.1.3 capture aspects of quality assessed by a given quality criterion in terms of three orthogonal properties: (i) what type of quality is being assessed; (ii) what aspect of the system output is being assessed; and (iii) whether system outputs are assessed in their own right or with reference to some system-internal or system-external frame of reference. For full explanations see Belz et al. (2020).


Question 4.1.1:  What type of quality is assessed by the quality criterion?

Multiple-choice options (select one)

Please provide further details for your above selection(s)

Question 4.1.2:  Which aspect of system outputs is assessed by the quality criterion?

Multiple-choice options (select one)

Please provide further details for your above selection(s)

Question 4.1.3:  Is each output assessed for quality in its own right, or with reference to a system-internal or external frame of reference?

Multiple-choice options (select one)

Please provide further details for your above selection(s)

Notes: Questions 4.2.1–4.2.3 record properties that are orthogonal to quality criterion properties (preceding section), i.e. any given quality criterion can in principle be combined with any of the modes (although some combinations are much more common than others).


Question 4.2.1:  Does an individual assessment involve an objective or a subjective judgment?

Multiple-choice options (select one)

Please provide further details for your above selection(s)

Question 4.2.2:  Are outputs assessed in absolute or relative terms?

Multiple-choice options (select one)

Please provide further details for your above selection(s)

Question 4.2.3:  Is the evaluation intrinsic or extrinsic?

Multiple-choice options (select one)

Please provide further details for your above selection(s)

Notes: The questions in this section concern response elicitation, by which we mean how the ratings or other measurements that represent assessments for the quality criterion in question are obtained. This includes what is presented to evaluators, how they select a response, and via what type of tool, etc.



The name you use to refer to the quality criterion in explanations and/or interfaces created for evaluators. Examples of quality criterion names include Fluency, Clarity, Meaning Preservation. If no name is used, state ‘no name given’.


Map the quality criterion name used in the evaluation experiment to its equivalent in a standardised set of quality criterion names and definitions such as QCET (Belz et al. 2024, Belz et al. 2025), and enter the standardised name and reference to the paper here. In performing this mapping, the information given in Questions 4.3.7 (question/prompt), 3.3.4.1–3.3.4.2 (interface/information shown to evaluators), 4.3.2 (QC definition), 3.2.4 (training/practice), and 4.3.1.1 (verbatim QC name) should be taken into account, in this order of precedence.


Copy and paste the verbatim definition you give to evaluators to explain the quality criterion they’re assessing. If you don’t explicitly call it a definition, enter the nearest thing to a definition you give them. If you don’t give any definition, state ‘no definition given’.


An integer representing the number of different possible response values obtained with the scale or rating instrument. Enter ‘continuous’ if the number of response values is not finite. Enter ‘N/A’ if there is no scale or rating instrument. E.g. for a 5-point rating scale, enter ‘5’; for a slider that can return 100 different values (even if it looks continuous), enter ‘100’. If no rating instrument is used (e.g. when evaluation gathers post-edits or qualitative feedback only), enter ‘N/A’.


List, or give the range of, the possible response values returned by the rating instrument. The list or range should be of the size specified in Question 4.3.3. If there are too many to list, use a range. E.g. for two-way forced-choice preference judgments collected via a slider, the list entered might be ‘[-50,+50]’. If no rating instrument is used, enter ‘N/A’.


Question 4.3.5:  How is the scale or other rating instrument presented to evaluators?

Multiple-choice options (select one)

Please provide further details for your above selection(s)

If (and only if) there is no rating instrument, i.e. you entered ‘N/A’ for Questions 4.3.3–4.3.5, use this space to describe the task evaluators perform, and what information is recorded. Tasks that don’t use rating instruments include ranking multiple outputs, finding information, playing a game, etc. If there is a rating instrument, enter ‘N/A’.


Copy and paste the verbatim text that evaluators see during each assessment that is intended to convey the evaluation task to them. E.g. ‘Which of these texts do you prefer?’ or ‘Make any corrections to this text that you think are necessary in order to improve it to the point where you would be happy to provide it to a client.’


Question 4.3.8:  What form of response elicitation is used in collecting assessments from evaluators?

The terms and explanations in this section have been adapted from Howcroft et al. (2020).

Multiple-choice options (select one)

Please provide further details for your above selection(s)

Normally a set of separate assessments is collected from evaluators and then converted to the results as reported. Describe here the method(s) used in the conversion(s). E.g. macro-averages or micro-averages are computed from numerical scores to provide summarising, per-system results. If no such method was used, enter ‘results were not processed or aggregated before being reported’.
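
For illustration only (the form does not prescribe any particular method), the sketch below shows the difference between a macro-average and a micro-average over per-evaluator ratings for one system; the data layout and values are hypothetical assumptions.

```python
from statistics import mean

# Hypothetical ratings for one system, keyed by evaluator.
ratings_by_evaluator = {
    "evaluator_1": [5, 4, 4],
    "evaluator_2": [3, 3],          # this evaluator rated fewer items
    "evaluator_3": [4, 5, 4, 4],
}

# Macro-average: average each evaluator's mean, so every evaluator counts equally.
macro = mean(mean(scores) for scores in ratings_by_evaluator.values())

# Micro-average: average over all individual ratings, so every rating counts equally.
all_scores = [s for scores in ratings_by_evaluator.values() for s in scores]
micro = mean(all_scores)

print(f"macro-average: {macro:.2f}, micro-average: {micro:.2f}")
```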


The list of methods used for calculating the effect size and significance of any results for this quality criterion, as reported in the paper given in Question 1.1. If neither was calculated, enter ‘None’.



The method(s) used for measuring inter-annotator agreement. If inter-annotator agreement was not measured, enter ‘InterAA not assessed’.


The inter-annotator agreement score(s) obtained with the method(s) in Question 4.3.11.1. Enter ‘InterAA not assessed’ if applicable.
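
As an illustrative example only (Cohen’s kappa is just one common choice for two annotators, not a requirement of the form), the sketch below computes agreement between two annotators’ categorical judgements using scikit-learn; the label data is hypothetical.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical categorical judgements from two annotators on the same ten items.
annotator_1 = ["good", "good", "bad", "good", "bad", "bad", "good", "good", "bad", "good"]
annotator_2 = ["good", "bad", "bad", "good", "bad", "good", "good", "good", "bad", "good"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```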



The method(s) used for measuring intra-annotator agreement. If intra-annotator agreement was not measured, enter ‘IntraAA not assessed’.


The intra-annotator agreement score(s) obtained with the method(s) in Question 4.3.12.1. Enter ‘IntraAA not assessed’ if applicable.



Normally, research organisations, universities and other higher-education institutions require some form of ethical approval before experiments involving human participants, however innocuous, are permitted to proceed. Please provide here the name of the body that approved the experiment, or state ‘No ethical approval obtained’ if applicable.

5.1:  Please complete this question.

Question 5.2:  Does personal data (as defined in GDPR Art. 4, §1: https://gdpr.eu/article-4-definitions) occur in any of the system outputs (or human-authored stand-ins) evaluated, or responses collected, in the experiment this sheet is being completed for?

Multiple-choice options (select one)

Please provide further details for your above selection(s)
5.2:  Please select at least 1 of the above options.

Question 5.3:  Does special category information (as defined in GDPR Art. 9, §1: https://gdpr.eu/article-9-processing-special-categories-of-personal-data-prohibited) occur in any of the evaluation items evaluated, or responses collected, in the evaluation experiment this sheet is being completed for?

Multiple-choice options (select one)

Please provide further details for your above selection(s)
5.3:  Please select at least 1 of the above options.

If an ex ante or ex post impact assessment has been carried out, and the assessment plan and process, as well as the outcomes, were captured in written form, describe them here and link to the report. Otherwise enter ‘no impact assessment carried out’. Types of impact assessment include data protection impact assessments, e.g. under GDPR. Environmental and social impact assessment frameworks are also available.

5.4:  Please complete this question.

List of all errors
refresh list of all errors

Press the button to refresh the list of all errors.