Missing Value Reporting

1) Usage of Missing Value Reporting Terms:

The International Nucleotide Sequence Database Collaboration (INSDC) has a standardized missing/null value reporting language to be used when a value of the expected format cannot be provided in sample metadata.

The controlled vocabulary takes into account different reasons a value may be unavailable (for example, privacy concerns, endangered species, or pre-2023 data agreements). Submitters are strongly encouraged to always provide true values. However, if missing/null value reporting is required, submitters should use the term with the finest granularity for their situation. See the table below for the accepted missing value reporting terms.

  • For mandatory fields, if a specific value cannot be provided, do not enter an arbitrary value*. Instead, use one of the missing value reporting terms defined by the INSDC.

    • Examples of arbitrary values: ?, n/a, na, none, not available, not determined, not recorded, null, unk, unknown, unspecified, etc.

  • For the Collection date and Geographic location fields, if a specific value cannot be provided, select the most appropriate term from missing value reporting terms 1–13.

  • For cell line samples, the Collection date field may be filled with "not applicable" or "missing: lab stock".

  • For mandatory fields other than Collection date and Geographic location, if a specific value cannot be provided, choose the most appropriate term from terms 1–5.

  • However, for mandatory fields such as NCBI taxonomy ID, Organism, Strain and Isolate, valid values must be provided. Missing value reporting terms cannot be used for these fields.

  • For optional fields, if a specific value cannot be provided, the field may be left blank.

2) How to Report Missing Values:

The International Nucleotide Sequence Database Collaboration (INSDC) has established a standardized vocabulary for reporting missing or null values in sample metadata.

Key Guidelines:

  • True values should always be provided whenever possible.

  • Do not enter arbitrary values such as ?, n/a, na, none, not available, not determined, not recorded, null, unk, unknown, unspecified, etc.

  • If a value is unavailable, use the most specific missing value reporting term applicable to the situation.
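The guidelines above can be checked mechanically before submission. The following is a minimal sketch, assuming a value should be flagged when it matches one of the arbitrary placeholders listed in this guide after trimming whitespace and ignoring case (that normalization is an assumption of the sketch, not a stated INSDC rule). Because the guide's list ends with "etc.", the set is not exhaustive.

```python
# Placeholder strings this guide lists as arbitrary values that must not be entered.
# The guide's list ends with "etc.", so this set is illustrative, not exhaustive.
ARBITRARY_VALUES = {
    "?", "n/a", "na", "none", "not available", "not determined",
    "not recorded", "null", "unk", "unknown", "unspecified",
}


def is_arbitrary_value(value: str) -> bool:
    """Return True if value looks like an ad hoc placeholder instead of a real value.

    Matching ignores case and surrounding whitespace (an assumption of this sketch).
    """
    return value.strip().lower() in ARBITRARY_VALUES


for v in ["N/A", " unknown ", "not provided", "2021-06-15"]:
    print(repr(v), is_arbitrary_value(v))
```

Note that "not provided" passes because it is an INSDC term, while "N/A" and "unknown" are flagged and should be replaced with the most specific applicable term.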

Reporting Rules by Field Type:

  • Mandatory fields

    • If a specific value cannot be provided, use the designated INSDC missing value reporting terms instead of arbitrary values.

    • The following fields must always have valid values and cannot use missing value terms:

      • NCBI Taxonomy ID, Organism, Strain, Isolate

    • For Collection date and Geographic location, choose from terms 1–13.

    • For Collection date in cell line samples, use "not applicable" or "missing: lab stock".

    • For other mandatory fields, choose the most appropriate term from terms 1–5.


  • Optional fields

    • If a specific value cannot be provided, the field may be left blank.
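The field-type rules above amount to a small lookup: which missing value terms, if any, a mandatory field may take. The sketch below encodes them, assuming hypothetical field keys such as collection_date and host (real submission portals use their own attribute names). It reads the cell line rule as restricting Collection date to the two listed terms; that is an interpretation. It only checks missing-term usage and blank values, not value formats or arbitrary placeholders.

```python
from __future__ import annotations

from typing import Optional

# INSDC missing value reporting terms, numbered as in the table in this guide.
MISSING_TERMS = [
    "not applicable",                                # 1
    "not collected",                                 # 2
    "not provided",                                  # 3
    "restricted access",                             # 4
    "missing",                                       # 5
    "missing: control sample",                       # 6
    "missing: sample group",                         # 7
    "missing: synthetic construct",                  # 8
    "missing: lab stock",                            # 9
    "missing: third party data",                     # 10
    "missing: data agreement established pre-2023",  # 11
    "missing: endangered species",                   # 12
    "missing: human-identifiable",                   # 13
]

# Hypothetical field keys; real submission portals use their own attribute names.
NO_MISSING_ALLOWED = {"ncbi_taxonomy_id", "organism", "strain", "isolate"}
DATE_AND_LOCATION = {"collection_date", "geographic_location"}


def allowed_missing_terms(field: str, is_cell_line: bool = False) -> set[str]:
    """Return the missing value terms this sketch accepts for a mandatory field."""
    if field in NO_MISSING_ALLOWED:
        return set()                         # a real value is always required
    if field == "collection_date" and is_cell_line:
        # Interpretation: cell line samples use one of these two terms.
        return {"not applicable", "missing: lab stock"}
    if field in DATE_AND_LOCATION:
        return set(MISSING_TERMS[:13])       # terms 1-13
    return set(MISSING_TERMS[:5])            # terms 1-5 for other mandatory fields


def check_field(field: str, value: Optional[str], mandatory: bool,
                is_cell_line: bool = False) -> Optional[str]:
    """Return an error message, or None if the value passes these checks."""
    value = (value or "").strip()
    if not value:
        # Optional fields may be left blank; mandatory fields may not.
        return f"{field}: a value is required" if mandatory else None
    if mandatory and value in MISSING_TERMS:
        if value not in allowed_missing_terms(field, is_cell_line):
            return f"{field}: {value!r} is not an accepted missing value term here"
    return None


print(check_field("collection_date", "missing: endangered species", mandatory=True))
# None
print(check_field("organism", "not collected", mandatory=True))
# organism: 'not collected' is not an accepted missing value term here
```

Keeping the allowed ranges in one function makes the exceptions (no-missing fields, the cell line case) explicit and easy to adjust if a submission system's checklist differs.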

Refer to the table below for the accepted missing value reporting terms and their appropriate usage.

INSDC Missing Value Reporting Terms:

No | Term | Description

1 | not applicable | Information is inappropriate to report; can indicate that the standard itself fails to model or represent the information appropriately

2 | not collected | Information of an expected format was not given because it has not been collected

3 | not provided | Information of an expected format was not given; a value may be given at a later stage

4 | restricted access | Information exists but cannot be released openly because of privacy concerns

5 | missing | Information does not exist

6 | missing: control sample | Information is not applicable as the sample represents a negative control sample collected in a lab

7 | missing: sample group | Information is not applicable as the sample represents a group of samples that do not have a single origin, e.g. for co-assembly or transcriptome assembly

8 | missing: synthetic construct | Information does not exist as the sample represents an ab initio synthetic construct

9 | missing: lab stock | Information was not collected as the sample represents a cultured cell line or model organism under long-term lab control

10 | missing: third party data | Information does not exist as the metadata was not collected or reported in records predating the 2023 agreement; for use in Third Party Annotation (TPA) data submissions

11 | missing: data agreement established pre-2023 | Data agreements were established before the 2023 INSDC standard and metadata cannot be provided; a value may be given at a later stage

12 | missing: endangered species | Information cannot be reported as the target organism is endangered, e.g. on the IUCN Red List

13 | missing: human-identifiable | Information cannot be reported as the metadata would make the sample human-identifiable
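Terms must match the table exactly, so a common pre-submission step is to map near-miss spellings (different case, extra spaces, no space after the colon) back to the table spelling. The sketch below does that, assuming it is safe to ignore case and whitespace when matching; this guide does not say whether archives accept variant spellings, so the sketch always returns the exact table form. Arbitrary values such as "unknown" are deliberately not mapped to any term.

```python
import re
from typing import Optional

# Terms from the INSDC table, keyed by their row number.
TERMS = {
    1: "not applicable",
    2: "not collected",
    3: "not provided",
    4: "restricted access",
    5: "missing",
    6: "missing: control sample",
    7: "missing: sample group",
    8: "missing: synthetic construct",
    9: "missing: lab stock",
    10: "missing: third party data",
    11: "missing: data agreement established pre-2023",
    12: "missing: endangered species",
    13: "missing: human-identifiable",
}


def _normalize(text: str) -> str:
    """Lower-case, collapse whitespace, and put exactly one space after each colon."""
    text = re.sub(r"\s+", " ", text.strip().lower())
    return re.sub(r"\s*:\s*", ": ", text)


# Map each normalized spelling back to the exact spelling used in the table.
_CANONICAL = {_normalize(term): term for term in TERMS.values()}


def canonical_term(text: str) -> Optional[str]:
    """Return the table spelling of a missing value term, or None if it is not one."""
    return _CANONICAL.get(_normalize(text))


print(canonical_term("Missing:Lab  Stock"))  # missing: lab stock
print(canonical_term("unknown"))             # None
```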
