2-5. File format guide
Introduction
This page describes the submission file formats currently supported by KNA and gives submitters guidance on current and future file formats and policies for KNA submissions.
KNA accepts text formats such as FASTA (.fasta) for nucleotide sequences.
Only FASTA files with the .fasta extension can be uploaded.
Files with the .fa or .fna extension cannot be uploaded, so check the extension before uploading.
KNA also accepts GFF (.gff) files for annotation.
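The extension rule above can be checked before upload with a few lines of Python. This is only a minimal sketch: the helper name `is_accepted_filename` is hypothetical, and the exact, case-sensitive match is an assumption, since this page does not say how KNA treats names such as `.FASTA`.

```python
from pathlib import Path

# Extensions listed as accepted on this page.
# Assumption: the match is exact and case-sensitive.
ACCEPTED_EXTENSIONS = {".fasta", ".gff"}


def is_accepted_filename(filename: str) -> bool:
    """Return True if the file name ends in an extension listed as accepted above."""
    # Path.suffix returns only the final extension, e.g. ".fa" for "genome.fa".
    return Path(filename).suffix in ACCEPTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("genome.fasta", "genome.fa", "genome.fna", "annotation.gff"):
        status = "accepted" if is_accepted_filename(name) else "rejected"
        print(f"{name}: {status}")
```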
FASTA file (.fasta)
FASTA files are used to submit unannotated nucleotide sequences, which can represent contigs, scaffolds, or entire chromosomes. Each sequence record in a FASTA file consists of a header line (identifier) followed by one or more sequence lines.
File Structure
Header line (identifier)
Starts with a > character.
Followed by a sequence identifier and, optionally, additional information (e.g., description, organism).
No line breaks are allowed within the header.
Sequence line(s)
Contains the nucleotide sequence for that record.
Only valid nucleotide characters are allowed:
Standard bases: A, C, G, T (uppercase or lowercase)
Unknown bases: N or n
No whitespace should appear within a sequence line.
A sequence may be split across multiple lines for readability; the lines are concatenated and together represent a single record.
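The structure rules above can be sketched as a small Python check. This is an illustrative sketch, not KNA's actual validation: the function name `parse_fasta` is hypothetical, and skipping blank lines is an assumption that this page does not confirm.

```python
from typing import Iterable, List, Tuple

# Characters allowed in sequence lines, as listed above.
ALLOWED_BASES = set("ACGTNacgtn")


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (identifier, sequence) pairs.

    Raises ValueError when a line breaks one of the rules described above.
    """
    records: List[Tuple[str, List[str]]] = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # Assumption: blank lines are ignored.
        if line.startswith(">"):
            # The identifier is the first token after ">"; the rest is description.
            fields = line[1:].split(maxsplit=1)
            if not fields:
                raise ValueError(f"line {number}: header has no identifier")
            records.append((fields[0], []))
        else:
            if not records:
                raise ValueError(f"line {number}: sequence data before any header")
            invalid = set(line) - ALLOWED_BASES
            if invalid:
                raise ValueError(
                    f"line {number}: invalid characters {sorted(invalid)}"
                )
            records[-1][1].append(line)
    # Lines belonging to one record are concatenated into a single sequence.
    return [(identifier, "".join(chunks)) for identifier, chunks in records]
```

Because a space is not in the allowed set, a sequence line containing whitespace is rejected, while a sequence wrapped over several lines is joined into one record.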
Example
GFF file (.gff)
A GFF (Generic Feature Format) file is a 9-column tab-delimited annotation file used to describe genomic features such as genes, transcripts, exons, CDS, and regulatory elements, along with their genomic coordinates and attributes. GFF files submitted to KNA should conform to GFF3 or GTF specifications.
File Structure
Each row represents a single genomic feature with the following 9 columns:
seqid: Sequence or chromosome name
source: Annotation source or software
type: Feature type (e.g., gene, mRNA, exon, CDS)
start: Start position of the feature on the sequence
end: End position of the feature
score: Numeric score (use . if not applicable)
strand: Strand (+ or -)
phase: Reading frame for CDS (0, 1, 2); . if not applicable
attributes: Key-value pairs such as ID, Parent, Name, Note
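The nine-column layout above can be illustrated with a short Python sketch that splits one feature row and checks each field. It is a hedged example only: `GffFeature` and `parse_gff3_line` are hypothetical names, the attribute parsing assumes GFF3-style `key=value` pairs separated by semicolons (GTF attributes use a different syntax and are not handled), and the strand check follows the list above, although the GFF3 specification also permits `.` and `?`.

```python
from typing import Dict, NamedTuple, Optional


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> GffFeature:
    """Parse one tab-delimited GFF3 feature row into a GffFeature."""
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = columns

    start_pos, end_pos = int(start), int(end)
    if start_pos > end_pos:
        # GFF3 requires start to be less than or equal to end.
        raise ValueError("start must not be greater than end")
    if strand not in {"+", "-"}:
        # Follows the list above; GFF3 itself also allows "." and "?".
        raise ValueError(f"unexpected strand value: {strand!r}")
    if phase not in {"0", "1", "2", "."}:
        raise ValueError(f"unexpected phase value: {phase!r}")

    # Assumption: GFF3-style attributes, e.g. "ID=gene1;Name=demo".
    attributes: Dict[str, str] = {}
    for pair in attrs.split(";"):
        if pair.strip():
            key, _, value = pair.partition("=")
            attributes[key.strip()] = value.strip()

    return GffFeature(
        seqid=seqid,
        source=source,
        type=ftype,
        start=start_pos,
        end=end_pos,
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attributes,
    )
```

For instance, calling `parse_gff3_line("chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=demo")` on this made-up row yields a feature whose `score` and `phase` are `None` and whose `attributes` hold `ID` and `Name`.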
Example (GFF3)
Reference
Last updated