Validation Rules
SSUB-R0001
level: warning
Name: Invalid missing value
Message: Invalid missing value. Please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible).
Description: The International Nucleotide Sequence Database Collaboration (INSDC) has developed a standardised missing/null value reporting language to be used where a value of an expected format for sample information reporting cannot be provided. Please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible). Note that the reporting level term is required for "collection_date" and "geo_loc_name".
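As an illustration of SSUB-R0001, the check could be sketched as follows. This is a minimal sketch, not the validator's actual code: the function name is hypothetical, matching is assumed to be exact and lowercase, and the reporting level term is only checked for presence, not against any controlled list.

```python
import re

# Bare terms quoted in the SSUB-R0001 message.
BARE_TERMS = {"not collected", "not applicable", "missing"}
# Attributes for which the rule text says a reporting level term is required.
NEEDS_REPORTING_LEVEL = {"collection_date", "geo_loc_name"}
# "missing: <reporting level term>"; the term is not validated here (assumption).
REPORTING_LEVEL_RE = re.compile(r"missing: \S.*")


def is_valid_missing_value(attribute: str, value: str) -> bool:
    """Return True if `value` is an acceptable missing value for `attribute`."""
    value = value.strip()
    if REPORTING_LEVEL_RE.fullmatch(value):
        return True
    if attribute in NEEDS_REPORTING_LEVEL:
        # A bare term is not enough for these two attributes.
        return False
    return value in BARE_TERMS
```

For example, is_valid_missing_value("collection_date", "missing") is False in this sketch, because a bare term is not accepted for that attribute.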
SSUB-R0002
level: error
Name: Invalid Attribute value for controlled terms
Message: Attribute value is not in controlled terms.
Description: Values are controlled in several attributes.
SSUB-R0003
level: error
Name: Duplicated sample title in this submission
Message: Sample title is duplicated in the submission.
Description: To distinguish samples, label each sample with a title unique within the submission.
SSUB-R0004
level: error
Name: Taxonomy name and id do not match
Message: Organism and taxonomy id do not match.
Description: Enter a pair of organism and taxonomy id registered in NCBI Taxonomy. For a novel organism, type Unclassified (32644).
SSUB-R0005
level: error
Name: Invalid datetime
Message: Invalid datetime. Datetimes must be in Coordinated Universal Time (UTC) and follow ISO 8601 standard formats "YYYY-mm-dd", "YYYY-mm" or "YYYY-mm-ddThh:mm:ssZ".
Description: Date format must follow ISO 8601 standard "YYYY-mm-dd", "YYYY-mm" or "YYYY-mm-ddThh:mm:ssZ" (e.g., 1990-10-30, 1990-10 or 1990-10-30T14:41:36Z). Collection times must be in Coordinated Universal Time (UTC). Times without time zone are processed as UTC. Non-UTC times are converted to UTC. When you do not report the date, enter missing values in format "missing: reporting level term" (e.g. "missing: control sample") to declare both the absence of a true value as well as the reason.
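The three accepted shapes in SSUB-R0005 can be checked roughly as below. This is a sketch under assumptions: the function name is hypothetical, zero-padded fields are assumed to be mandatory, and time zone conversion of non-UTC values is not covered.

```python
import re
from datetime import datetime

# (shape, strptime format) pairs for the three formats named in the rule.
_ACCEPTED = (
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), "%Y-%m-%d"),
    (re.compile(r"[0-9]{4}-[0-9]{2}"), "%Y-%m"),
    (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"),
     "%Y-%m-%dT%H:%M:%SZ"),
)


def is_valid_collection_date(value: str) -> bool:
    """Check the shape first, then let strptime reject impossible dates."""
    for shape, fmt in _ACCEPTED:
        if shape.fullmatch(value):
            try:
                datetime.strptime(value, fmt)
            except ValueError:
                return False
            return True
    return False
```

The regular expression is applied before strptime because strptime alone would also accept unpadded values such as "1990-1-3".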
SSUB-R0006
level: error
Name: Invalid country
Message: Entered country is not in controlled terms. When you do not report the location, enter missing values in format "missing: reporting level term" (e.g. "missing: control sample") to declare both the absence of a true value as well as the reason.
Description: Country name must be in the country list. When you do not report the location, enter missing values in format "missing: reporting level term" (e.g. "missing: control sample") to declare both the absence of a true value as well as the reason.
SSUB-R0007
level: error
Name: Invalid lat_lon format
Message: Invalid lat_lon format. Specify as degrees latitude and longitude in format "d[d.dddddddd] N|S d[dd.dddddddd] W|E".
Description: Latitude and longitude must be in "d[d.dddddddd] N|S d[dd.dddddddd] W|E" format.
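One possible reading of the lat_lon pattern in SSUB-R0007 is sketched below. The interpretation is an assumption: one or two integer digits for latitude, one to three for longitude, each with up to eight optional decimal places, single spaces as separators, and a range check that the rule text does not state explicitly. The function name is hypothetical.

```python
import re

LAT_LON_RE = re.compile(
    r"(?P<lat>[0-9]{1,2}(?:\.[0-9]{1,8})?) (?P<ns>[NS]) "
    r"(?P<lon>[0-9]{1,3}(?:\.[0-9]{1,8})?) (?P<ew>[WE])"
)


def is_valid_lat_lon(value: str) -> bool:
    """Match the 'd[d.dddddddd] N|S d[dd.dddddddd] W|E' shape and check ranges."""
    match = LAT_LON_RE.fullmatch(value)
    if match is None:
        return False
    # Range check is an added assumption, not part of the quoted format.
    return float(match["lat"]) <= 90.0 and float(match["lon"]) <= 180.0
```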
SSUB-R0008
level: warning
Name: Special character included
Message: Special character is included.
SSUB-R0009
level: warning
Name: Invalid data format
Message: Invalid data format.
SSUB-R0010
level: warning
Name: Invalid host organism name
Message: Invalid host organism name.
Description: Describe host with the scientific name in NCBI Taxonomy.
SSUB-R0011
level: error
Name: Missing sample name
Message: Sample name is missing.
SSUB-R0012
level: error
Name: Missing organism
Message: Organism is missing.
SSUB-R0013
level: warning
Name: Identical Attributes
Message: You should have one BioSample for each specimen, and each of your BioSamples must have differentiating information (excluding sample name, title, bioproject accession and description). This check was implemented to encourage submitters to include distinguishing information in their samples. If the distinguishing information is in the sample name, title or description, please recode it into an appropriate attribute, either one of the predefined attributes or a custom attribute you define. If it is necessary to represent true biological replicates as separate BioSamples, you might add an 'aliquot' or 'replicate' attribute, e.g., 'replicate = biological replicate 1', as appropriate. Note that multiple assay types, e.g., RNA-seq and ChIP-seq data may reference the same BioSample if appropriate.
Description: To distinguish samples in a structured way, please differentiate samples with distinct attributes other than sample name, title, bioproject_id and description.
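The idea behind SSUB-R0013 can be sketched by grouping samples on their attributes after dropping the excluded fields. The attribute keys ("sample_name", "sample_title", "bioproject_id", "description") and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Fields the rule excludes from the comparison (key names are assumed).
EXCLUDED = {"sample_name", "sample_title", "bioproject_id", "description"}


def find_identical_samples(samples):
    """Return lists of sample names whose remaining attributes are identical."""
    groups = defaultdict(list)
    for sample in samples:
        key = frozenset(
            (name, value) for name, value in sample.items() if name not in EXCLUDED
        )
        groups[key].append(sample["sample_name"])
    return [names for names in groups.values() if len(names) > 1]
```

Adding a 'replicate' attribute with distinct values to each sample in a reported group would make the keys differ, which matches the remedy suggested in the message.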
SSUB-R0014
level: error
Name: Missing mandatory attribute
Message: Sample has missing mandatory attribute(s). If you do not have information for the required field(s), please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible). Note that the reporting level term is required for "collection_date" and "geo_loc_name".
Description: Please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible). Note that the reporting level term is required for "collection_date" and "geo_loc_name". See 'Missing value reporting' for details.
SSUB-R0015
level: error
Name: Duplicate Sample Names
Message: The following Sample names were used in submission more than one time. Please provide unique Sample names.
Description: Enter a sample name that is unique within the BioProject.
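A duplicate check of the kind described in SSUB-R0003 and SSUB-R0015 can be sketched with a counter. The function name is hypothetical, and comparison is assumed to be exact (case-sensitive, no whitespace trimming).

```python
from collections import Counter


def find_duplicates(values):
    """Return the values that occur more than once, sorted for stable output."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)
```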
SSUB-R0016
level: error
Name: Missing Attribute name
Message: Attribute name is missing.
SSUB-R0017
level: error
Name: Missing group of at least one required Attributes
Message: Sample has missing attribute(s), at least one of the following attributes is required. If you do not have information for the required field(s), please provide missing value as either 'not collected', 'not applicable', 'missing' or in format "missing: reporting level term" (e.g. "missing: control sample") for mandatory attribute to declare both the absence of a true value as well as the reason (when possible).
SSUB-R0018
level: error
Name: Future collection date
Message: Sample collection date is a future date, please specify a date from the past.
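The future-date check in SSUB-R0018 might look like the sketch below. It assumes the value has already passed the format check of SSUB-R0005, treats a "YYYY-mm" value as the first day of that month, and takes the reference date as a parameter so the behaviour is reproducible. The function name is hypothetical.

```python
from datetime import date


def is_future_collection_date(value: str, today: date) -> bool:
    """Compare the date part of a well-formed collection date with `today`."""
    day_part = value.split("T", 1)[0]          # drop any time component
    fields = [int(x) for x in day_part.split("-")]
    while len(fields) < 3:                      # "YYYY-mm" -> first of month
        fields.append(1)
    return date(*fields) > today
```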
SSUB-R0019
level: warning
Name: Latlon versus country
Message: Values provided for 'latitude and longitude' and 'geographic location' contradict each other.
Description: The country reverse geocoded from the latitude and longitude by the Google Maps Geocoding API differs from the country in geo_loc_name. Geocoding sometimes fails for areas such as international waters.
SSUB-R0020
level: error
Name: Package versus Organism
Message: Organism is inappropriate for package. Please either specify a different sample type package or edit the organism according to the 'Appropriate organism and package' rules described at
Description: Provide an organism name that is appropriate for the selected sample package.
SSUB-R0021
level: warning
Name: Sex for bacteria
Message: Attribute 'sex' is not appropriate.
SSUB-R0022
level: error
Name: Multiple Attribute values
Message: Multiple values detected. Only one value is allowed. First value was used for subsequent validation.
SSUB-R0023
level: warning
Name: Multiple vouchers
Message: Multiple voucher attributes (specimen voucher, culture collection or biologic material) detected with the same institution code. Only one value is allowed.
SSUB-R0024
level: warning
Name: Redundant taxonomy attributes
Message: Redundant values are detected in at least two of the following fields: organism; host; isolation source. For example, the value you supply for 'host' should not be identical to the value supplied for 'isolation source'. This check is case-insensitive and ignores white-space.
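The comparison described in SSUB-R0024 (case-insensitive, ignoring white-space) can be sketched as follows. The attribute keys and the function name are assumptions, and whether missing-value terms are excluded from the comparison is not stated in the rule, so this sketch does not exclude them.

```python
from collections import defaultdict

FIELDS = ("organism", "host", "isolation_source")  # key names are assumed


def find_redundant_fields(sample):
    """Return groups of field names that share the same normalised value."""
    by_value = defaultdict(list)
    for field in FIELDS:
        # Remove all white-space and fold case before comparing.
        key = "".join(sample.get(field, "").split()).casefold()
        if key:
            by_value[key].append(field)
    return [fields for fields in by_value.values() if len(fields) > 1]
```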
SSUB-R0025
level: warning
Name: Invalid Attribute value format
Message: Attribute value format is invalid.
SSUB-R0026
level: error
Name: Attribute value is not integer
Message: Attribute value must be integer.
SSUB-R0027
level: warning
Name: Format of geo_loc_name is invalid
Message: Format of geo_loc_name is invalid.
Description: Describe geographic location in the specified format.
SSUB-R0028
level: error
Name: Taxonomy at species or infra-specific rank
Message: Taxonomy should be species or infra-specific level.
Description: Taxonomy should be species or infra-specific level in NCBI Taxonomy.
SSUB-R0029
level: warning
Name: Missing values provided for optional attribute
Message: Missing values are not necessary for optional attributes. Leave values empty when there is no information.
SSUB-R0030
level: error
Name: Invalid Sample Name format
Message: Maximum length of Sample Name is 100 characters (alphanumeric characters, spaces and (){}[]+-_.).
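The constraint in SSUB-R0030 maps naturally onto a single regular expression. This is a sketch: the function name is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and an empty name is treated as invalid here (the missing-name case has its own rule, SSUB-R0011).

```python
import re

# ASCII letters, digits, space and the punctuation listed in the message.
SAMPLE_NAME_RE = re.compile(r"[A-Za-z0-9 (){}\[\]+\-_.]{1,100}")


def is_valid_sample_name(name: str) -> bool:
    return SAMPLE_NAME_RE.fullmatch(name) is not None
```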
SSUB-R0031
level: warning
Name: Taxonomy warning
Message: An organism name in the taxonomy database should be used. If applicable, the organism will be corrected to the scientific name. When the organism is novel, please enter a proposed name in the component_organism.
SSUB-R0032
level: error
Name: Invalid metagenome source
Message: A metagenomic organism name in the taxonomy database should be used. For example, 'soil metagenome'.
SSUB-R0033
level: warning
Name: Invalid institution code
Message: An institution code with an appropriate type ('c' for culture collection and 's' for specimen voucher) in the NCBI BioCollections database should be used.
Description: An institution code with an appropriate type ('c' for culture_collection, 's' for specimen_voucher and 'b' for bio_material) should be used. For the institution code, please see the NCBI BioCollections database or the BioCollections list file.
SSUB-R0034
level: error
Name: Invalid culture_collection format
Message: Valid culture collection format is "<institution-code>:[<collection-code>:]<culture_id>".
Description: The institution_code and the identifier for the culture from which the nucleic acid sequence was obtained, with optional collection code. Value format is '<institution_code>:[<collection_code>:]<culture_id>'
SSUB-R0035
level: error
Name: Invalid culture_collection
Message: The institution-code and/or collection-code must be registered in the NCBI BioCollections.
Description: For valid institution and collection codes, please see the NCBI BioCollections database or the BioCollections list file.
SSUB-R0036
level: error
Name: Specimen voucher for bacteria and unclassified sequences
Message: Attribute 'specimen_voucher' is not appropriate for bacteria and unclassified sequences.
SSUB-R0037
level: error
Name: Invalid specimen_voucher format
Message: Valid specimen voucher format is "[<institution-code>:[<collection-code>:]]<specimen_id>".
Description: The institution_code and the identifier for the specimen (a part or whole of an animal or plant) from which the sequence was obtained. Value format is '[<institution_code>:[<collection_code>:]]<specimen_id>'.
SSUB-R0038
level: warning
Name: Invalid specimen_voucher
Message: The institution-code and/or collection-code must be registered in the NCBI BioCollections. Valid specimen voucher format is "[<institution-code>:[<collection-code>:]]<specimen_id>".
Description: For valid institution and collection codes, please see the NCBI BioCollections database or the BioCollections list file.
SSUB-R0039
level: error
Name: Invalid bio_material format
Message: Valid bio_material format is "[<institution-code>:[<collection-code>:]]<material_id>".
Description: The institution_code and the identifier for the biological material (living individual or strain) from which the nucleic acid sequence was obtained. Value format is '[<institution_code>:[<collection_code>:]]<material_id>'.
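The three value formats in SSUB-R0034, SSUB-R0037 and SSUB-R0039 differ only in whether the institution code is mandatory, so one parser can illustrate all of them. This is a sketch: the names Voucher and parse_voucher are hypothetical, identifiers are assumed not to contain colons, the example values are illustrative, and the lookup against NCBI BioCollections (SSUB-R0035, R0038, R0040) is out of scope.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution_code: Optional[str]
    collection_code: Optional[str]
    identifier: str


def parse_voucher(value: str, institution_required: bool) -> Optional[Voucher]:
    """Split 'inst:[coll:]id' style values; return None if the shape is wrong."""
    parts = value.split(":")
    if any(part.strip() == "" for part in parts):
        return None                      # empty component, e.g. "ATCC:"
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Voucher(parts[0], None, parts[1])
    if len(parts) == 1 and not institution_required:
        return Voucher(None, None, parts[0])
    return None
```

Under these assumptions, culture_collection would be parsed with institution_required=True, while specimen_voucher and bio_material would use institution_required=False.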
SSUB-R0040
level: warning
Name: Invalid bio_material
Message: The institution-code and/or collection-code must be registered in the NCBI BioCollections. Valid bio_material format is "[<institution-code>:[<collection-code>:]]<material_id>".
Description: For valid institution and collection codes, please see the NCBI BioCollections database or the BioCollections list file.
SSUB-R0041
level: error
Name: Null value for infra-specific identifier
Message: Enter non-null value (other than missing etc) for infra-specific identifiers (strain, isolate, cultivar and ecotype) for samples of assembled genome sequences and clinical isolates.
SSUB-R0042
level: warning
Name: Non-identical identifiers among organism/strain/isolate
Message: The identifier in the organism name after 'sp./bacterium/archaeon' differs from strain/isolate. Use strain/isolate as the identifier for the organism not registered in the NCBI Taxonomy.
SSUB-R0043
level: error
Name: Invalid strain value
Message: The following values are not allowed: 'bacteria', 'clinical isolate', 'environmental', 'microbial', 'no', 'soil', 'sp', 'sp.', 'strain', 'whole organism', 'yes'. Additionally, strain name should not start with 'subsp.', 'serovar' or the organism name. All checks are case-insensitive. Provide a valid strain name rather than a descriptive term. This is generally the identifier that you use in your lab work for this sample.
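The checks listed in SSUB-R0043 can be sketched as below. The function name is hypothetical, and leading or trailing white-space is assumed to be ignored, which the rule does not state.

```python
# Values the message lists as not allowed (compared case-insensitively).
DISALLOWED = {
    "bacteria", "clinical isolate", "environmental", "microbial", "no",
    "soil", "sp", "sp.", "strain", "whole organism", "yes",
}
FORBIDDEN_PREFIXES = ("subsp.", "serovar")


def is_valid_strain(strain: str, organism: str) -> bool:
    value = strain.strip().casefold()
    if value in DISALLOWED:
        return False
    if value.startswith(FORBIDDEN_PREFIXES):
        return False
    organism_name = organism.strip().casefold()
    # The strain must not start with the organism name either.
    if organism_name and value.startswith(organism_name):
        return False
    return True
```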
SSUB-R0044
level: warning
Name: Invalid datetime format
Message: Invalid datetime format. Follow ISO 8601 standard "YYYY-mm-dd", "YYYY-mm" or "YYYY-mm-ddThh:mm:ssZ" (e.g., 1990-10-30, 1990-10 or 1990-10-30T14:41:36Z). Collection times must be in Coordinated Universal Time (UTC). Times without time zone are processed as UTC. Non-UTC times are converted to UTC.
SSUB-R0045
level: error
Name: Missing reporting level term
Message: When you do not report "collection_date" or "geo_loc_name", provide missing value in format "missing: reporting level term" (e.g. "missing: control sample") to declare both the absence of a true value as well as the reason. A reporting level term is required in these two attributes.
SSUB-R0046
level: warning
Name: New line included
Message: New line included: value possibly contains new line(s).
SSUB-R0047
level: error
Name: Non-ASCII header line
Message: Unable to parse file: the header line contains non-ASCII characters. Please check that uploaded file is in text format, not Excel.
SSUB-R0048
level: error
Name: Empty column name
Message: Unable to parse batch file: the header line has empty column name.
SSUB-R0049
level: error
Name: Non-UTF8 input
Message: Unable to parse batch file: some values cannot be converted to UTF8 encoding.
SSUB-R0050
level: error
Name: Non-ASCII Attribute value
Message: Non-ASCII format characters detected. Please check for non-standard characters in your attribute values around [### Non-ASCII character ###] and reformat as ASCII-only so that data can be properly consumed by dependent databases.
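Locating the offending characters reported by SSUB-R0050 can be sketched in a few lines. The function name is hypothetical, and how the validator itself marks the position in its message is not shown here.

```python
def non_ascii_positions(value: str):
    """Return (index, character) pairs for every character outside ASCII."""
    return [(i, ch) for i, ch in enumerate(value) if ord(ch) > 127]
```

For instance, a temperature written with a degree sign, such as "37 \u00b0C", yields one hit at index 3, which could then be rewritten as "37 degree C".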