SV
1. SV VCF Header
Metadata Field Descriptions
fileformat: The current VCF version ID (e.g., VCFv4.1)
fileDate: The date that the file was generated or the date when the file was updated. Use YYYYMMDD format (e.g., 20120201)
reference: The RefSeq Assembly accession number from NCBI or the genome accession number from KBDS KNA on which the variation position is based (e.g., GCF_000001405.40).
Note: If you do not place the required metadata in the VCF file header, your submission files will be returned to you for correction and resubmission.
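Before submitting, you may want to confirm that the three required metadata lines are present. The following is a minimal sketch, not an official KVar validator; the function name missing_metadata is hypothetical, and the check assumes each metadata line has the form ##key=value.

```python
import re

REQUIRED_KEYS = ("fileformat", "fileDate", "reference")

def missing_metadata(header_lines):
    """Return the required metadata keys that are absent or malformed."""
    found = {}
    for line in header_lines:
        # Metadata lines look like "##key=value"
        if line.startswith("##") and "=" in line:
            key, value = line[2:].split("=", 1)
            found[key] = value.strip()
    problems = [key for key in REQUIRED_KEYS if not found.get(key)]
    # fileDate must use the YYYYMMDD format, e.g. 20120201
    date = found.get("fileDate")
    if date and not re.fullmatch(r"\d{8}", date):
        problems.append("fileDate")
    return problems
```

An empty list means all three required lines were found and fileDate has eight digits. The sketch does not check that the date is a real calendar date or that the accession exists.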
Metadata Header Example
Example of KVar Required Metadata in a VCF formatted file:
##fileformat=VCFv4.1
##fileDate=20250101
##reference=GCF_000001405.40
ALT Tag Descriptions
Define the ALT symbolic alleles used for SVs in the header. Place ALT tag/value descriptions in the header following the required metadata; they will serve to define the data you place in the ALT column of the submission data table.
These descriptions let users viewing your data in VCF format identify each tag in the ALT column and look up the meaning of its values. Without them, the contents of the ALT column will be meaningless to some users.
ALT Tag Definitions:
INFO Tag Descriptions
The VCF header continues with tag/value descriptions for required and optional KVar INFO tags. These descriptions should be placed in the header following the ALT tag descriptions.
The INFO tag/value descriptions you provide in the VCF header define the data you place in the INFO column of the data table. They let users viewing your data in VCF format identify each tag in the INFO column and look up the meaning of its values. Without them, the contents of the INFO column will be meaningless to some users.
Note: For detailed descriptions and examples of all available INFO tags, see the INFO Tag Descriptions and Examples section below.
2. SV VCF Data Table
Data Table Structure
Create a tab-delimited table to hold the variations and variation data for your submission. The table header should include these eight fixed, mandatory columns (in order): CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO.
These columns represent eight fixed fields that must be filled out for each submitted variant. If you do not have data for a particular field, use a dot (".") to represent the missing value.
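The eight-column layout and the dot convention for missing values can be sketched in a few lines. This is an illustrative reader, not KVar's own parser; the function name parse_record is hypothetical, and mapping "." to None is a convenience chosen here.

```python
COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")

def parse_record(line):
    """Split one tab-delimited data line into the eight fixed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    # A dot marks a missing value; expose it as None for convenience.
    return {name: (None if value == "." else value) for name, value in record.items()}
```

A line with fewer than eight tab-separated fields raises ValueError, which usually indicates spaces were used in place of tabs.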
VCF Data Table Examples
VCF Data Table Field Values
CHROM
This field contains the chromosome identifier from the reference genome where the variant is located. KVar accepts only the "chr" prefix format (e.g., chr1, chr2, chr3, chrX, chrY, chrM). Entries for a specific CHROM should form a contiguous block within the VCF file.
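The two CHROM rules above (the "chr" prefix and contiguous blocks) can be checked in one pass over the file. The sketch below is an assumption-laden illustration: check_chrom_blocks is a hypothetical name, and it only tests for the prefix, not whether the chromosome exists in the reference.

```python
def check_chrom_blocks(chroms):
    """Return (bad_names, split_chroms) for CHROM values given in file order.

    bad_names: values that lack the "chr" prefix.
    split_chroms: values that reappear after a different CHROM intervened.
    """
    bad_names = []
    split_chroms = []
    finished = set()   # CHROM values whose block has already ended
    previous = None
    for chrom in chroms:
        if not chrom.startswith("chr") and chrom not in bad_names:
            bad_names.append(chrom)
        if chrom != previous:
            if chrom in finished and chrom not in split_chroms:
                split_chroms.append(chrom)
            if previous is not None:
                finished.add(previous)
            previous = chrom
    return bad_names, split_chroms
```

Both lists are empty for a file whose CHROM column is well formed.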
POS
This field contains the reference position of the variant, which is the first base of the variation event. Positions are sorted numerically within each reference sequence chromosome (CHROM) in increasing order. All coordinates should be 1-based. Multiple records of different structural variation types (SVTYPE) may share the same POS; list the shortest variants first. Telomeres are indicated by using positions 1 (p-arm) or chromosome length (q-arm).
Note: Single nucleotide variants and small (≤ 50 bp) insertions and deletions must be submitted following the SNP page guidelines.
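The ordering rule can be verified with a short scan. This is a sketch under the assumption that records arrive as (chrom, pos) tuples in file order; unsorted_positions is a hypothetical helper name. Equal positions are accepted because several records may share a POS.

```python
def unsorted_positions(records):
    """Yield (chrom, pos) pairs whose POS is lower than the previous POS on the same CHROM.

    `records` is an iterable of (chrom, pos) tuples in file order.
    """
    last_pos = {}
    for chrom, pos in records:
        if pos < 1:
            raise ValueError("coordinates are 1-based, so POS must be >= 1")
        if chrom in last_pos and pos < last_pos[chrom]:
            yield chrom, pos
        last_pos[chrom] = pos
```

The sketch does not check the "shortest variants first" rule for records that share a POS, since that requires the variant lengths as well.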
ID
This field contains a unique identifier (ID) for the variant and is required. The ID provided here combined with the handle must be unique for a particular submitter.
REF
This field contains the reference allele of the variant. The bases representing the reference allele can be any of the following: A, C, G, T, or N (case insensitive).
Although the standard VCF specification requires the literal sequence of the REF allele, this is often impractical for structural variants: they can be very large, and the locations of their breakpoints are often ambiguous. It is therefore acceptable to list only the first base of the REF allele.
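The permitted REF alphabet lends itself to a simple pattern check. The sketch below is illustrative only; is_valid_ref is a hypothetical name, and whether a dot placeholder is accepted in the REF column is not stated here, so the sketch treats it as invalid.

```python
import re

# A, C, G, T, or N in either case, one or more bases
REF_PATTERN = re.compile(r"[ACGTN]+", re.IGNORECASE)

def is_valid_ref(ref):
    """True if REF is a non-empty string of A, C, G, T, or N in either case."""
    return bool(REF_PATTERN.fullmatch(ref))
```

A single base such as "N" passes, which matches the allowance for listing only the first base of the REF allele.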
ALT
This field contains the alternate allele of the variant. Although standard VCF specification requires that literal sequence representing the ALT allele should be provided, it can be difficult to represent the ALT allele as literal sequence since structural variants can be quite large and there is often ambiguity in the locations of their breakpoints. It is therefore preferable to provide one of several ALT tags, surrounded by angle brackets, to indicate the nature of variation at the ALT allele. If there are no alternative alleles, put a dot (".") placeholder in the ALT column.
Note: Although use of ALT tags is not required, it is strongly recommended when it is not possible to list the full sequence of the submitted structural variation. See the ALT Tag Definitions section above for the complete list of available ALT tags and their descriptions.
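The three kinds of ALT value described above (an angle-bracketed tag, a literal sequence, or a dot) can be told apart mechanically. This is a rough sketch: classify_alt is a hypothetical name, the check does not verify that a tag is defined in your header, and the literal-sequence alphabet is assumed to match the one given for REF.

```python
import re

def classify_alt(alt):
    """Classify an ALT value as 'missing', 'symbolic', 'sequence', or 'other'."""
    if alt == ".":
        return "missing"
    # An ALT tag is enclosed in angle brackets with no whitespace inside
    if re.fullmatch(r"<[^<>\s]+>", alt):
        return "symbolic"
    # Assumption: literal ALT sequences use the same alphabet as REF
    if re.fullmatch(r"[ACGTN]+", alt, re.IGNORECASE):
        return "sequence"
    return "other"
```

A result of "other" most often points to a tag with a missing bracket.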
QUAL
This field contains the quality score for the assertion if available.
FILTER
This field contains the filter status if available.
INFO
This field contains additional information for the reported variation. INFO fields are encoded as a semicolon-separated series of short keys with optional values in the format: <key>=<data>[,data]
See the INFO Tag Descriptions and Examples section of this document for examples of the required and optional INFO Tags that KVar supports.
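The semicolon-separated key=data encoding can be read as follows. This is a minimal sketch with a hypothetical function name, parse_info; keys that carry no value (flags) map to an empty list.

```python
def parse_info(info):
    """Parse an INFO string into a dict mapping each key to a list of values.

    Keys without "=" (flags) map to an empty list.
    """
    parsed = {}
    if info in ("", "."):
        return parsed
    for entry in info.split(";"):
        if not entry:
            continue  # tolerate a trailing semicolon
        key, sep, data = entry.partition("=")
        parsed[key] = data.split(",") if sep else []
    return parsed
```

Because every value is split on commas, free-text values that contain commas would be broken into several items; a stricter parser would consult the tag descriptions in the header first.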
INFO Tag Descriptions and Examples
Required KVar VCF INFO Tags
Place the required tags in the INFO column of the data table and place the corresponding tag descriptions in the file header.
Structural Variant Type (SVTYPE) INFO Tag
The required "SVTYPE" INFO tag allows you to define the kind of structural variation you are submitting to KVar. Failure to include this required INFO tag will delay your submission.
SVTYPE Tag/Value Description:
SVTYPE Data Format Example:
SVTYPE Value Descriptions:
DEL: Deletion relative to the reference
INS: Insertion of sequence relative to the reference
DUP: Region of elevated copy number relative to the reference
INV: Inversion of reference sequence
CNV: Copy number polymorphic region
BND: Breakend; used to represent complex rearrangements where breakpoints form novel adjacencies between different genomic locations
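The six SVTYPE values listed above can be enforced with a lookup. The sketch below uses a hypothetical function name, get_svtype, and assumes the comparison is case-sensitive, which is not stated here.

```python
SVTYPE_VALUES = frozenset({"DEL", "INS", "DUP", "INV", "CNV", "BND"})

def get_svtype(info):
    """Return the SVTYPE value from an INFO string, or raise ValueError."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "SVTYPE":
            if value not in SVTYPE_VALUES:
                raise ValueError(f"unsupported SVTYPE: {value!r}")
            return value
    raise ValueError("required INFO tag SVTYPE is missing")
```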
End Position (END) INFO Tag
The required "END" INFO tag specifies the end position of the variant described in this record.
END Tag/Value Description:
END Data Format Example:
Structural Variant Length (SVLEN) INFO Tag
The required "SVLEN" INFO tag specifies the difference in length between REF and ALT alleles.
SVLEN Tag/Value Description:
SVLEN Data Format Example:
Experiment ID (EXPERIMENT) INFO Tag
The required "EXPERIMENT" INFO tag specifies the Experiment ID from the metadata that generated this call.
EXPERIMENT Tag/Value Description:
EXPERIMENT Data Format Example:
SampleSet ID (SAMPLESET) INFO Tag
The required "SAMPLESET" INFO tag specifies the SampleSet ID from the metadata in which the variant was observed.
SAMPLESET Tag/Value Description:
SAMPLESET Data Format Example:
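The five required tags described above (SVTYPE, END, SVLEN, EXPERIMENT, and SAMPLESET) can be checked together before submission. This is a sketch with a hypothetical helper name, missing_required_tags; it tests only for the presence of each tag, not for valid values.

```python
REQUIRED_INFO_TAGS = ("SVTYPE", "END", "SVLEN", "EXPERIMENT", "SAMPLESET")

def missing_required_tags(info):
    """Return the required INFO tags that do not appear in an INFO string."""
    present = {entry.partition("=")[0] for entry in info.split(";")}
    return [tag for tag in REQUIRED_INFO_TAGS if tag not in present]
```

An empty list means every required tag is present in the record.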
Imprecise Variant Coordinates
For imprecise structural variants whose breakpoints are not known to basepair resolution, KVar supports both standard VCF confidence intervals (CIPOS/CIEND) and dbVar-specific inner/outer coordinate ranges (POSrange/ENDrange), following the dbVar VCF submission format.
Note:
Use whichever pair of tags (CIPOS/CIEND or POSrange/ENDrange) is best supported by your underlying data.
One POSrange value must equal POS; one ENDrange value must equal END.
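The rule that one range value must equal the anchoring coordinate can be sketched as below. The exact value syntax is an assumption: the sketch takes POSrange and ENDrange to be comma-separated integers in which a member may be left empty when a bound is unknown. The function name range_includes is hypothetical.

```python
def range_includes(anchor, range_value):
    """True if one member of a comma-separated range equals the anchor coordinate.

    Empty members (an unknown inner or outer bound) are skipped.
    """
    members = [int(part) for part in range_value.split(",") if part.strip()]
    return anchor in members
```

Call it once with POS and the POSrange value, and once with END and the ENDrange value; both calls should return True for a conforming record.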
Optional KVar VCF INFO Tags
The following INFO tags are optional and need only be used if they describe available data. If you want to include any of the following INFO tags with your submitted data, place the tag in the INFO column of the data table and place the corresponding tag description in the file header.
Optional VCF INFO tags for KVar submissions include:
Mate Breakend ID (MATEID)
Description (DESC)
Genetic Origin (ORIGIN)
Phenotype (PHENO)
Links to External Databases (LINKS)
Validation Experiment (valEXPERIMENT)
Validated Flag (VALIDATED)
Mate Breakend ID (MATEID)
The optional "MATEID" INFO tag specifies the ID of mate breakends for complex rearrangements. MATEID is used when representing structural variants as breakends (SVTYPE=BND), where each breakend in a novel adjacency references its mate breakend. This tag is important for linking variant calls together, particularly when grouping variant calls into variant regions.
MATEID Tag/Value Description:
MATEID Data Format Example:
For breakend variants representing complex rearrangements:
For breakends with multiple mates:
Note: When a breakend has a single mate, provide one ID. When a breakend has multiple mates (e.g., due to breakend reuse or uncertainty in measurement), provide a comma-separated list of mate IDs. For more detailed examples of MATEID usage, refer to the 1000 Genomes Project VCF format documentation.
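Because each breakend references its mate, a consistency check can confirm that the references point back. The sketch below assumes that mate references are expected to be reciprocal, which is an interpretation rather than a stated KVar rule; unpaired_breakends and the IDs in the usage note are hypothetical.

```python
def unpaired_breakends(mates):
    """Return IDs whose mates are missing or do not point back.

    `mates` maps each breakend ID to its MATEID value (comma-separated).
    """
    problems = []
    for bnd_id, mateid in mates.items():
        for mate in mateid.split(","):
            # Look up the mate's own MATEID list; a missing mate yields [""]
            back = mates.get(mate, "").split(",")
            if bnd_id not in back:
                problems.append(bnd_id)
                break
    return problems
```

For example, a mapping of bnd_A to "bnd_B" and bnd_B to "bnd_A" produces an empty list, while a breakend whose mate is absent from the file is reported.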
Description (DESC)
The optional "DESC" INFO tag allows you to provide KVar with any additional information about this call that is not covered elsewhere.
DESC Tag/Value Description:
DESC Data Format Example:
Genetic Origin (ORIGIN)
The optional "ORIGIN" INFO tag allows you to provide KVar with the origin of the allele if known.
ORIGIN Tag/Value Description:
ORIGIN Data Format Example:
Phenotype (PHENO)
The optional "PHENO" INFO tag allows you to provide KVar with phenotype(s) associated with this call. Note: All clinical assertions should be submitted to ClinVar, not KVar.
PHENO Tag/Value Description:
PHENO Data Format Example:
Links to External Databases (LINKS)
The optional "LINKS" INFO tag allows you to point to this variant in external databases or to other relevant online information about your submission.
LINKS Tag/Value Description:
LINKS Data Format Example:
Validation Experiment (valEXPERIMENT)
The optional "valEXPERIMENT" INFO tag allows you to provide KVar with the Experiment ID(s) from metadata of the experiment(s) used to validate this call, followed by a colon and 'Pass' or 'Fail'.
valEXPERIMENT Tag/Value Description:
valEXPERIMENT Data Format Example:
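The ID-colon-outcome pattern described for valEXPERIMENT can be read with a small parser. The sketch assumes that multiple experiments are separated by commas, as with other multi-valued INFO tags, and that outcomes are spelled exactly 'Pass' or 'Fail'; parse_val_experiment is a hypothetical name.

```python
def parse_val_experiment(value):
    """Parse a valEXPERIMENT value into a list of (experiment_id, passed) pairs."""
    results = []
    for entry in value.split(","):
        experiment_id, sep, outcome = entry.partition(":")
        if not sep or outcome not in ("Pass", "Fail"):
            raise ValueError(f"expected <id>:Pass or <id>:Fail, got {entry!r}")
        results.append((experiment_id, outcome == "Pass"))
    return results
```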
Validated Flag (VALIDATED)
The optional "VALIDATED" INFO tag is a flag indicating that the variant was validated by a follow-up experiment.
VALIDATED Tag/Value Description:
VALIDATED Data Format Example:
3. Examples of Structural Variation in KVar VCF Format
Insertion (precise)
An insertion of 981 base pairs immediately to the right of coordinate 14588694.
Insertion (imprecise)
An insertion of approximately 1500 base pairs, as determined by a paired-end mapping (PEM) experiment. The insertion took place somewhere between the coordinates corresponding to the mapped sequence reads.
Deletion (precise)
A deletion of 387 base pairs (the deleted bases are 2599384 through 2599770).
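The 387 bp figure follows from counting both endpoints of a 1-based interval. The helper below, with the hypothetical name inclusive_length, is a small sketch of that arithmetic and says nothing about the sign convention of SVLEN.

```python
def inclusive_length(first_base, last_base):
    """Number of bases in a 1-based, inclusive interval."""
    if last_base < first_base:
        raise ValueError("last_base must not precede first_base")
    # Both endpoints are part of the interval, hence the + 1
    return last_base - first_base + 1

print(inclusive_length(2599384, 2599770))  # 387
```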
Deletion (imprecise)
A deletion of approximately 8000 base pairs, as determined by a paired-end mapping (PEM) experiment.
Inversion (precise)
An inversion of 863 base pairs.
Duplication (imprecise, inners and outers)
A duplication determined by oligo array CGH. The nature of arrays is such that breakpoints cannot be determined to base pair resolution, only to a range defined by probes on the array.
Inners only:
Outers only:
Inners and Outers:
4. Limitations and Notes
Variation Size and Data Submission Limitations
Submit variations longer than 50 bp to KVar using the SV format. Small variants (≤ 50 bp) must follow the SNP submission format.
Synthetic mutations are not accepted
Variations ascertained from cross-species alignments and analysis are not accepted
Personal human data cannot be accepted due to current policy unless the participant is enrolled in a study with institutional oversight
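The size threshold at the start of this section places a variant of exactly 50 bp on the SNP side. The sketch below makes that boundary explicit; submission_format is a hypothetical name, and it covers only the length rule, not the other restrictions listed above.

```python
SV_THRESHOLD_BP = 50  # variants longer than this use the SV format

def submission_format(length_bp):
    """Return 'SV' for variants longer than 50 bp and 'SNP' otherwise."""
    if length_bp < 1:
        raise ValueError("length must be at least 1 bp")
    return "SV" if length_bp > SV_THRESHOLD_BP else "SNP"
```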