Validation Rules

SSUB-R0001

  • level: warning

  • Name: Invalid missing value

  • Message: Invalid missing value. Please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible).

  • Description: The International Nucleotide Sequence Database Collaboration (INSDC) has developed a standardised missing/null value reporting language to be used where a value of an expected format for sample information reporting cannot be provided. Please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible). Note that the reporting level term is required for "collection_date" and "geo_loc_name".
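
The accepted forms can be illustrated with a minimal Python sketch. The function name `is_valid_missing_value`, the case-sensitive matching, and the single space after the colon are assumptions for illustration, not the validator's actual implementation. The sketch also does not check the reporting level term against the INSDC vocabulary.

```python
import re

# Plain missing values named in the rule text.
PLAIN_MISSING = {"not collected", "not applicable", "missing"}
# Attributes for which the rule says a reporting level term is required.
TERM_REQUIRED = {"collection_date", "geo_loc_name"}
# "missing: <reporting level term>", e.g. "missing: control sample".
WITH_TERM = re.compile(r"missing: \S.*")


def is_valid_missing_value(attribute: str, value: str) -> bool:
    """Hypothetical check of a missing value for a mandatory attribute."""
    value = value.strip()
    if WITH_TERM.fullmatch(value):
        return True
    if attribute in TERM_REQUIRED:
        # Plain values are not enough for these two attributes.
        return False
    return value in PLAIN_MISSING
```

Under these assumptions, "missing" is accepted for an attribute such as host but rejected for collection_date, where "missing: control sample" would pass.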

SSUB-R0002

  • level: error

  • Name: Invalid Attribute value for controlled terms

  • Message: Attribute value is not in controlled terms.

  • Description: Values are controlled in several attributes.

SSUB-R0003

  • level: error

  • Name: Duplicated sample title in this submission

  • Message: Sample title is duplicated in the submission.

  • Description: To distinguish samples, label each sample with a title that is unique within the submission.

SSUB-R0004

  • level: error

  • Name: Taxonomy name and id not match

  • Message: Organism and taxonomy id do not match.

  • Description: Enter a pair of organism and taxonomy id registered in NCBI Taxonomy. For a novel organism, type Unclassified (32644).

SSUB-R0005

  • level: error

  • Name: Invalid datetime

  • Message: Invalid datetime. Datetimes must be in Coordinated Universal Time (UTC) and follow ISO 8601 standard formats "YYYY-mm-dd", "YYYY-mm" or "YYYY-mm-ddThh:mm:ssZ".

  • Description: Date format must follow ISO 8601 standard "YYYY-mm-dd", "YYYY-mm" or "YYYY-mm-ddThh:mm:ssZ" (e.g., 1990-10-30, 1990-10 or 1990-10-30T14:41:36Z). Collection times must be in Coordinated Universal Time (UTC). Times without time zone are processed as UTC. Non-UTC times are converted to UTC. When you do not report the date, enter missing values in format "missing: reporting level term" (e.g. "missing: control sample") to declare both the absence of a true value as well as the reason.
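
The UTC handling described above can be sketched in Python as follows. The function name `normalize_collection_date` is hypothetical, and the sketch only mirrors the three formats listed in the rule. It is not the validator's own code.

```python
from datetime import datetime, timezone


def normalize_collection_date(value: str) -> str:
    """Hypothetical normalizer for YYYY-mm-dd, YYYY-mm and full timestamps."""
    # Date-only forms are returned in their own precision.
    for fmt in ("%Y-%m-%d", "%Y-%m"):
        try:
            return datetime.strptime(value, fmt).strftime(fmt)
        except ValueError:
            pass
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)  # raises ValueError if invalid
    if parsed.tzinfo is None:
        # Times without a time zone are processed as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Non-UTC times are converted to UTC.
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, a time given with a +09:00 offset is shifted back nine hours and rendered with a trailing Z.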

SSUB-R0006

  • level: error

  • Name: Invalid country

  • Message: Entered country is not in controlled terms. When you do not report the location, enter missing values in format "missing: reporting level term" (e.g. "missing: control sample") to declare both the absence of a true value as well as the reason.

  • Description: Country name must be in the country list. When you do not report the location, enter missing values in format "missing: reporting level term" (e.g. "missing: control sample") to declare both the absence of a true value as well as the reason.

SSUB-R0007

  • level: error

  • Name: Invalid lat_lon format

  • Message: Invalid lat_lon format. Specify as degrees latitude and longitude in format "d[d.dddddddd] N|S d[dd.dddddddd] W|E".

  • Description: Latitude and longitude must be in "d[d.dddddddd] N|S d[dd.dddddddd] W|E" format.
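
One way to read the format string is as a regular expression, sketched below in Python. The function name `parse_lat_lon`, the single-space separators, and the range limits (90 for latitude, 180 for longitude) are assumptions for illustration.

```python
import re

# d[d.dddddddd] N|S d[dd.dddddddd] W|E
LAT_LON = re.compile(
    r"(\d{1,2}(?:\.\d{1,8})?) ([NS]) (\d{1,3}(?:\.\d{1,8})?) ([WE])"
)


def parse_lat_lon(value: str):
    """Return (latitude, longitude) as signed floats, or None if invalid."""
    match = LAT_LON.fullmatch(value)
    if match is None:
        return None
    lat, ns, lon, ew = match.groups()
    latitude = float(lat) * (1 if ns == "N" else -1)
    longitude = float(lon) * (1 if ew == "E" else -1)
    # Range limits are an assumption; the rule text only gives the pattern.
    if abs(latitude) > 90 or abs(longitude) > 180:
        return None
    return latitude, longitude
```

A value such as "35.6895 N 139.6917 E" parses, while a comma-separated pair of signed decimals does not match the pattern.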

SSUB-R0008

  • level: warning

  • Name: Special character included

  • Message: Special character is included.

SSUB-R0009

  • level: warning

  • Name: Invalid data format

  • Message: Invalid data format.

SSUB-R0010

  • level: warning

  • Name: Invalid host organism name

  • Message: Invalid host organism name.

  • Description: Describe host with the scientific name in NCBI Taxonomy.

SSUB-R0011

  • level: error

  • Name: Missing sample name

  • Message: Sample name is missing.

SSUB-R0012

  • level: error

  • Name: Missing organism

  • Message: Organism is missing.

SSUB-R0013

  • level: warning

  • Name: Identical Attributes

  • Message: You should have one BioSample for each specimen, and each of your BioSamples must have differentiating information (excluding sample name, title, bioproject accession and description). This check was implemented to encourage submitters to include distinguishing information in their samples. If the distinguishing information is in the sample name, title or description, please recode it into an appropriate attribute, either one of the predefined attributes or a custom attribute you define. If it is necessary to represent true biological replicates as separate BioSamples, you might add an 'aliquot' or 'replicate' attribute, e.g., 'replicate = biological replicate 1', as appropriate. Note that multiple assay types, e.g., RNA-seq and ChIP-seq data may reference the same BioSample if appropriate.

  • Description: To distinguish samples in a structured way, differentiate them with distinct attributes other than sample name, title, bioproject_id and description.
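
The comparison can be sketched in Python by grouping samples on every attribute except the excluded ones. The attribute keys (`sample_name`, `sample_title`, `bioproject_id`, `description`) and the function name are assumptions. The real validator may name or normalize these fields differently.

```python
from collections import defaultdict

# Attributes ignored when comparing samples (key names are assumed).
EXCLUDED = {"sample_name", "sample_title", "bioproject_id", "description"}


def find_identical_samples(samples):
    """Return lists of sample names whose remaining attributes are identical."""
    groups = defaultdict(list)
    for sample in samples:
        key = frozenset(
            (name, value) for name, value in sample.items() if name not in EXCLUDED
        )
        groups[key].append(sample["sample_name"])
    return [names for names in groups.values() if len(names) > 1]
```

Adding a distinguishing attribute such as replicate to one of two otherwise identical samples removes them from the reported group.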

SSUB-R0014

  • level: error

  • Name: Missing mandatory attribute

  • Message: Sample has missing mandatory attribute(s). If you do not have information for the required field(s), please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible). Note that the reporting level term is required for "collection_date" and "geo_loc_name".

  • Description: Please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible). Note that the reporting level term is required for "collection_date" and "geo_loc_name". See 'Missing value reporting' for details.

SSUB-R0015

  • level: error

  • Name: Duplicate Sample Names

  • Message: The following Sample names were used in submission more than one time. Please provide unique Sample names.

  • Description: Enter a sample name that is unique within the BioProject.
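
A pre-submission check for repeated names could look like the Python sketch below. The function name is hypothetical, and trimming surrounding whitespace before comparing is an assumption.

```python
from collections import Counter


def duplicated_sample_names(names):
    """Return the sample names that occur more than once, sorted."""
    counts = Counter(name.strip() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)
```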

SSUB-R0016

  • level: error

  • Name: Missing Attribute name

  • Message: Attribute name is missing.

SSUB-R0017

  • level: error

  • Name: Missing group of at least one required Attributes

  • Message: Sample has missing attribute(s), at least one of the following attributes is required. If you do not have information for the required field(s), please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible).

SSUB-R0018

  • level: error

  • Name: Future collection date

  • Message: Sample collection date is a future date, please specify a date from the past.
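
A minimal Python sketch of this comparison follows. The function name, the optional `today` parameter, and the choice to treat a "YYYY-mm" value as the first day of that month are assumptions. The input is expected to already be in one of the accepted ISO 8601 forms.

```python
from datetime import date, datetime, timezone


def is_future_collection_date(value: str, today=None) -> bool:
    """Return True if the date portion of value lies after today (UTC)."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    day_part = value[:10]  # drop any time portion
    if len(day_part) == 7:
        # "YYYY-mm": compare using the first day of the month (assumption).
        day_part += "-01"
    return date.fromisoformat(day_part) > today
```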

SSUB-R0019

  • level: warning

  • Name: Latlon versus country

  • Message: Values provided for 'latitude and longitude' and 'geographic location' contradict each other

  • Description: The country reverse-geocoded from latitude and longitude by the Google Maps Geocoding API and the country in geo_loc_name are not the same. Geocoding sometimes does not work for areas such as international waters.

SSUB-R0020

  • level: error

  • Name: Package versus Organism

  • Message: Organism is inappropriate for package. Please either specify a different sample type package or edit the organism according to the 'Appropriate organism and package' rules described at

  • Description: Provide an organism name which is appropriate for the selected Sample package

SSUB-R0021

  • level: warning

  • Name: Sex for bacteria

  • Message: Attribute 'sex' is not appropriate.

SSUB-R0022

  • level: error

  • Name: Multiple Attribute values

  • Message: Multiple values detected. Only one value is allowed. First value was used for subsequent validation.

SSUB-R0023

  • level: warning

  • Name: Multiple vouchers

  • Message: Multiple voucher attributes (specimen voucher, culture collection or biologic material) detected with the same institution code. Only one value is allowed.

SSUB-R0024

  • level: warning

  • Name: Redundant taxonomy attributes

  • Message: Redundant values are detected in at least two of the following fields: organism; host; isolation source. For example, the value you supply for 'host' should not be identical to the value supplied for 'isolation source'. This check is case-insensitive and ignores white-space.
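
The case-insensitive, whitespace-ignoring comparison can be sketched in Python as follows. The attribute keys (`organism`, `host`, `isolation_source`) and the function name are assumptions. How the validator treats missing-value terms in these fields is not stated, so the sketch does not handle them specially.

```python
from itertools import combinations

FIELDS = ("organism", "host", "isolation_source")  # assumed key names


def _normalize(value: str) -> str:
    # Remove all white-space and ignore case, as the message describes.
    return "".join(value.split()).lower()


def redundant_pairs(sample):
    """Return pairs of fields whose normalized values are identical."""
    present = {f: _normalize(sample[f]) for f in FIELDS if sample.get(f)}
    return [(a, b) for a, b in combinations(present, 2) if present[a] == present[b]]
```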

SSUB-R0025

  • level: warning

  • Name: Invalid Attribute value format

  • Message: Attribute value format is invalid.

SSUB-R0026

  • level: error

  • Name: Attribute value is not integer

  • Message: Attribute value must be integer.

SSUB-R0027

  • level: warning

  • Name: Format of geo_loc_name is invalid

  • Message: Format of geo_loc_name is invalid.

  • Description: Describe geographic location in the specified format.

SSUB-R0028

  • level: error

  • Name: Taxonomy at species or infra-specific rank

  • Message: Taxonomy should be species or infra-specific level.

  • Description: Taxonomy should be species or infra-specific level in NCBI Taxonomy.

SSUB-R0029

  • level: warning

  • Name: Missing values provided for optional attribute

  • Message: Missing values are not necessary for optional attributes. Leave values empty when there is no information.

SSUB-R0030

  • level: error

  • Name: Invalid Sample Name format.

  • Message: Maximum length of Sample Name is 100 characters (alphanumeric characters, spaces and (){}[]+-_.)
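
The character set and length limit translate into a single regular expression, sketched here in Python. The function name is hypothetical, and restricting "alphanumeric" to ASCII letters and digits is an assumption.

```python
import re

# Alphanumeric characters, spaces and (){}[]+-_. with a length of 1 to 100.
SAMPLE_NAME = re.compile(r"[A-Za-z0-9 (){}\[\]+\-_.]{1,100}")


def is_valid_sample_name(name: str) -> bool:
    """Return True if name uses only allowed characters and fits the limit."""
    return SAMPLE_NAME.fullmatch(name) is not None
```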

SSUB-R0031

  • level: warning

  • Name: Taxonomy warning

  • Message: An organism name in the taxonomy database should be used. If applicable, the organism will be corrected to the scientific name. When the organism is novel, please enter a proposed name in the component_organism.

SSUB-R0032

  • level: error

  • Name: Invalid metagenome source

  • Message: A metagenomic organism name in the taxonomy database should be used. For example, 'soil metagenome'.

SSUB-R0033

  • level: warning

  • Name: Invalid institution code

  • Message: An institution code with an appropriate type ('c' for culture collection and 's' for specimen voucher) in the NCBI BioCollections database should be used.

  • Description: An institution code with an appropriate type ('c' for culture_collection, 's' for specimen_voucher and 'b' for bio_material) should be used. For the institution code, please see the NCBI BioCollections database or the BioCollections list file.

SSUB-R0034

  • level: error

  • Name: Invalid culture_collection format

  • Message: Valid culture collection format is "<institution-code>:[<collection-code>:]<culture_id>"

  • Description: The institution_code and the identifier for the culture from which the nucleic acid sequence was obtained, with optional collection code. Value format is '<institution_code>:[<collection_code>:]<culture_id>'
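
The value format can be read as two or three colon-separated parts, as in this Python sketch. The function name and the returned dictionary keys are hypothetical. The sketch checks only the shape of the value, not whether the institution code exists in the NCBI BioCollections database.

```python
def parse_culture_collection(value: str):
    """Split '<institution_code>:[<collection_code>:]<culture_id>' or return None."""
    parts = [part.strip() for part in value.split(":")]
    # The institution code is mandatory, so at least two parts are needed.
    if len(parts) not in (2, 3) or not all(parts):
        return None
    return {
        "institution_code": parts[0],
        "collection_code": parts[1] if len(parts) == 3 else None,
        "culture_id": parts[-1],
    }
```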

SSUB-R0035

SSUB-R0036

  • level: error

  • Name: Specimen voucher for bacteria and unclassified sequences

  • Message: Attribute 'specimen_voucher' is not appropriate for bacteria and unclassified sequences.

SSUB-R0037

  • level: error

  • Name: Invalid specimen_voucher format

  • Message: Valid specimen voucher format is "[<institution-code>:[<collection-code>:]]<specimen_id>"

  • Description: The institution_code and the identifier for the specimen (a part or whole of an animal or plant) from which the sequence was obtained. Value format is '[<institution_code>:[<collection_code>:]]<specimen_id>'.
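
Unlike culture_collection, the institution code is optional here, so one to three colon-separated parts are possible. A Python sketch under that reading follows. The function name and the tuple layout are hypothetical, and only the shape of the value is checked.

```python
def parse_specimen_voucher(value: str):
    """Split '[<institution_code>:[<collection_code>:]]<specimen_id>'.

    Returns (institution_code, collection_code, specimen_id) or None.
    """
    parts = [part.strip() for part in value.split(":")]
    if not 1 <= len(parts) <= 3 or not all(parts):
        return None
    institution = parts[0] if len(parts) >= 2 else None
    collection = parts[1] if len(parts) == 3 else None
    return institution, collection, parts[-1]
```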

SSUB-R0038

SSUB-R0039

  • level: error

  • Name: Invalid bio_material format

  • Message: Valid bio_material format is "[<institution-code>:[<collection-code>:]]<material_id>"

  • Description: The institution_code and the identifier for the biological material (living individual or strain) from which the nucleic acid sequence was obtained. Value format is '[<institution_code>:[<collection_code>:]]<material_id>'.

SSUB-R0040

SSUB-R0041

  • level: error

  • Name: Null value for infra-specific identifier

  • Message: Enter non-null value (other than missing etc) for infra-specific identifiers (strain, isolate, cultivar and ecotype) for samples of assembled genome sequences and clinical isolates.

SSUB-R0042

  • level: warning

  • Name: Non-identical identifiers among organism/strain/isolate

  • Message: The identifier in the organism name after 'sp./bacterium/archaeon' differs from strain/isolate. Use strain/isolate as the identifier for the organism not registered in the NCBI Taxonomy.

SSUB-R0043

  • level: error

  • Name: Invalid strain value

  • Message: The following values are not allowed: 'bacteria', 'clinical isolate', 'environmental', 'microbial', 'no', 'soil', 'sp', 'sp.', 'strain', 'whole organism', 'yes'. Additionally, strain name should not start with 'subsp.', 'serovar' or the organism name. All checks are case-insensitive. Provide a valid strain name rather than a descriptive term. This is generally the identifier that you use in your lab work for this sample.
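
The checks listed in the message can be sketched in Python as follows. The function name is hypothetical, and treating an empty value as invalid is an assumption. The disallowed terms and prefixes are taken from the message itself.

```python
DISALLOWED = {
    "bacteria", "clinical isolate", "environmental", "microbial", "no",
    "soil", "sp", "sp.", "strain", "whole organism", "yes",
}
DISALLOWED_PREFIXES = ("subsp.", "serovar")


def is_valid_strain(strain: str, organism: str) -> bool:
    """Apply the case-insensitive strain checks described in the message."""
    value = strain.strip().lower()
    if not value or value in DISALLOWED:  # empty is rejected (assumption)
        return False
    if value.startswith(DISALLOWED_PREFIXES):
        return False
    organism_name = organism.strip().lower()
    # A strain name should not start with the organism name.
    return not (organism_name and value.startswith(organism_name))
```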

SSUB-R0044

  • level: warning

  • Name: Invalid datetime format

  • Message: Invalid datetime format. Follow ISO 8601 standard "YYYY-mm-dd", "YYYY-mm" or "YYYY-mm-ddThh:mm:ssZ" (e.g., 1990-10-30, 1990-10 or 1990-10-30T14:41:36Z). Collection times must be in Coordinated Universal Time (UTC). Times without time zone are processed as UTC. Non-UTC times are converted to UTC.

SSUB-R0045

  • level: error

  • Name: Missing reporting level term

  • Message: When you do not report "collection_date" and "geo_loc_name", provide missing value in format "missing: reporting level term" (e.g. "missing: control sample") to declare both the absence of a true value as well as the reason. A reporting level term is required in these two attributes.

SSUB-R0046

  • level: warning

  • Name: New line included

  • Message: New line included: value possibly contains new line(s).

SSUB-R0047

  • level: error

  • Name: Non-ASCII header line

  • Message: Unable to parse file: the header line contains non-ASCII characters. Please check that uploaded file is in text format, not Excel.

SSUB-R0048

  • level: error

  • Name: Empty column name

  • Message: Unable to parse batch file: the header line has empty column name.

SSUB-R0049

  • level: error

  • Name: Non-UTF8 input

  • Message: Unable to parse batch file: some values cannot be converted to UTF8 encoding.

SSUB-R0050

  • level: error

  • Name: Non-ASCII Attribute value

  • Message: Non-ASCII format characters detected. Please check for non-standard characters in your attribute values around [### Non-ASCII character ###] and reformat as ASCII-only so that data can be properly consumed by dependent databases.
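
To locate the offending characters before submitting, a short Python sketch such as the one below may help. The function name is hypothetical. It reports each non-ASCII character together with its position in the value.

```python
def find_non_ascii(value: str):
    """Return (index, character) pairs for every non-ASCII character."""
    return [(index, char) for index, char in enumerate(value) if ord(char) > 127]
```

A value containing a degree sign, for instance, is reported at the index of that sign, which could then be rewritten in ASCII (for example as "25 degrees C").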
