1. Overview
Introduction
The Korean Nucleotide sequence Archive (KNA) is a repository that collects, stores, and manages nucleotide sequence data of many kinds, including genomic, transcriptomic, and synthetic sequences.
KNA accommodates multiple categories of nucleotide data:
Nucleotide sequences: general DNA or RNA sequences such as genes, mRNAs, non-coding RNAs, and organelle genomes (mitochondria, chloroplasts, etc.).
Genome assemblies: complete or draft-level genomic sequences from prokaryotic and eukaryotic organisms, including reference genomes and scaffolds of high quality.
MAGs (Metagenome-Assembled Genomes): reconstructed genomes derived from environmental metagenomic samples through binning and assembly, accompanied by quality metrics such as completeness and contamination.
TSA (Transcriptome Shotgun Assemblies): assembled transcript sequences obtained through RNA sequencing, including de novo or reference-based assemblies and novel isoforms.
TLS (Targeted Locus Sequences): marker gene sequences such as 16S rRNA, 18S rRNA, ITS, and COI, used for taxonomic and phylogenetic studies.
Synthetic constructs and PCR primers: artificially designed or experimentally applied sequences such as plasmids, vectors, or primer sets used in molecular biology research.
KNA functions as a component of the Korean BioData Station (K-BDS), providing a standardized platform for the submission, management, and dissemination of nucleotide sequence data generated through biological and genomic research. Through harmonized metadata and accession systems, KNA aims to ensure national-scale data archiving and interoperability with international repositories (e.g., GenBank, ENA, DDBJ).
Key features of KNA
Integrated Nucleotide Data Management: KNA provides a centralized platform to collect, store, and manage diverse nucleotide sequence data, including genomic sequences, transcriptomes, marker sequences, synthetic constructs, and PCR primers.
Compliance and Alignment with International Standards: KNA is aligning its data standards and submission framework with INSDC formats (GenBank, ENA, DDBJ) to support global data sharing and interoperability. This ensures that submitted data can be efficiently integrated with international repositories once the archive joins INSDC.
Support for Non-Human and Environmental Samples: KNA accepts sequences from humans, non-human organisms, and environmental samples, including metagenomes and organelle genomes, expanding research utility across biodiversity and ecology studies.
Secure and Scalable Data Storage: KNA maintains robust infrastructure for secure storage, backup, and scalable management of large-scale nucleotide datasets, ensuring long-term preservation and accessibility.
Facilitating Transparent and Reproducible Research: Through standardized accession systems and harmonized metadata, KNA promotes reproducibility and transparency in genomic research, while currently preparing for integration with international standards and repositories.
Data Validation and Quality Assurance: Submitted sequences undergo automated and manual validation to ensure integrity, completeness, and adherence to submission guidelines. Metadata curation ensures high-quality, standardized information.
Cite
To describe data deposition in your manuscript, use the following sentence: "The nucleotide sequence data reported in this paper have been deposited in the Korea Nucleotide Sequence Archive (KNA) in Korea Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology (KAPxxxxxxx) that are publicly accessible at https://kbds.re.kr/KNA."
Contact KNA staff for assistance at kna@kribb.re.kr