2.3 Variation File Format Guide

Introduction

This document provides detailed guidelines on the accepted variation file formats for data submission to KVar. The specifications are based on international standards, primarily referencing:

  • The 1000 Genomes Project Variant Call Format (VCF 4.1)

  • The dbSNP VCF Submission Format Guidelines

  • The dbVar VCF Submission Format Guidelines

KVar adopts and harmonizes these formats to ensure compatibility with global variant databases while maintaining metadata structures consistent with the K-BDS framework.
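For orientation, a VCF 4.1 file has three parts. It starts with meta-information lines beginning with `##`, and the first of these must be `##fileformat=VCFv4.1`. Next comes a single header line beginning with `#CHROM`. The rest are tab-separated data lines whose first eight columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO. The Python sketch below reads that layout. The function name `parse_vcf` and the example record are hypothetical. This is not KVar's validator: it checks only the basic file structure and does not interpret INFO or FORMAT definitions.

```python
from typing import Dict, List

# The eight mandatory fixed columns defined by VCF 4.1.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf(text: str) -> List[Dict[str, str]]:
    """Parse a minimal VCF 4.1 document into one dict per data line.

    Hypothetical sketch: checks the fileformat line, the header columns and
    the column count of each data line only.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "##fileformat=VCFv4.1":
        raise ValueError("first line must be ##fileformat=VCFv4.1")

    header = None
    records = []
    for line in lines[1:]:
        if line.startswith("##"):
            continue  # other meta-information lines (INFO, FILTER, contig, ...)
        if line.startswith("#CHROM"):
            header = line[1:].split("\t")
            if header[:8] != FIXED_COLUMNS:
                raise ValueError("header must start with the 8 fixed VCF columns")
            continue
        if header is None:
            raise ValueError("data line found before the #CHROM header line")
        fields = line.split("\t")
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}")
        records.append(dict(zip(header, fields)))
    return records


# Hypothetical example: one small insertion on chromosome 1.
example = "\n".join([
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t10177\t.\tA\tAC\t100\tPASS\tDP=35",
])

for rec in parse_vcf(example):
    print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])  # 1 10177 A AC
```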

Depending on the Study variant type, the accepted file formats differ as follows:

  • If the Study variant type is SNP, only VCF format is accepted. The file must follow the VCF 4.1 specification and contain small variants (≤ 50 bp).

  • If the Study variant type is SV (Structural Variation), both VCF and Excel (.xlsx) submission formats are supported. The Variant Call file is mandatory and must describe every individual structural variant event (> 50 bp). The Variant Region file is optional. Submitters who wish to provide region-level information should use the Excel template provided by KVar.
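The 50 bp boundary above can be expressed as a simple check that compares a record's size with the Study variant type. The following Python sketch is illustrative only. The function names are hypothetical, and this guide does not define how variant size is measured. The sketch assumes that size is |SVLEN| when an SVLEN value is given, and otherwise the length of the longer allele, with a single ALT allele per record.

```python
from typing import Optional

# Boundary stated in this guide: SNP studies carry small variants (<= 50 bp);
# SV studies carry structural variant events (> 50 bp).
SIZE_THRESHOLD_BP = 50


def variant_length(ref: str, alt: str, svlen: Optional[int] = None) -> int:
    """Estimate a variant's size in base pairs.

    Assumption (not from the KVar specification): use |SVLEN| when present,
    otherwise the longer of the REF and ALT strings (single ALT allele only).
    """
    if svlen is not None:
        return abs(svlen)
    return max(len(ref), len(alt))


def size_matches_study(study_type: str, ref: str, alt: str,
                       svlen: Optional[int] = None) -> bool:
    """Return True if the record's estimated size fits the Study variant type."""
    length = variant_length(ref, alt, svlen)
    if study_type == "SNP":
        return length <= SIZE_THRESHOLD_BP
    if study_type == "SV":
        return length > SIZE_THRESHOLD_BP
    raise ValueError(f"unknown Study variant type: {study_type!r}")


# A single-base substitution fits an SNP study but not an SV study.
print(size_matches_study("SNP", "A", "G"))                  # True
print(size_matches_study("SV", "A", "G"))                   # False
# A symbolic deletion with SVLEN=-1200 fits an SV study.
print(size_matches_study("SV", "N", "<DEL>", svlen=-1200))  # True
```

Using SVLEN first matters for symbolic alleles such as `<DEL>`, where the ALT string length says nothing about the event size.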

These guidelines aim to standardize variant file submissions across projects and to ensure that all variant records, whether SNPs or SVs, can be accurately parsed, validated, and visualized in the KVar browser.
