4. How to search and download
Browse
Click [Browse] in the top menu of the KRA page to open the browsing page, which provides:
A search bar for entering search terms
Filters for detailed search (Access, Strategy, Selection, Layout, File type, Platform, Source)
The number of data entries registered in KRA and their release status
Download of metadata/raw data for checked items
Navigation to the detail page when you select a title
When you select an Experiment title, you can view the Experiment information and download raw data

File Download
Click the [Download] button in Runs to download raw data
To check the quality of a raw data file, open the [KBQC report] in Runs
If you need to copy a data file link to your My GBox account for high-speed file download or for analysis with BioExpress, click [Copy data my GBox]. Once the link is copied, you can download the raw data related to the BioProject (KAP#) from GBox Browser/Explorer

KBQC report
KRA provides a quality control (QC) report for raw data. The report is generated with FastQC, and each metric is rated PASS, WARN, or FAIL. Each metric is described below.
A FAIL in the QC report does not necessarily mean that the data are unusable. However, if a specific metric fails, it is important to review the experimental design, data quality, and pre-processing steps (e.g., trimming).
Some issues arising from the experimental design or sequencing process can cause certain metrics to FAIL without preventing use of the data. For example, duplication rates can be high with some sequencing strategies and library preparation methods; this reflects the experimental design rather than poor data quality.
Therefore, analysis can often proceed even when FastQC reports FAILs, and many issues can be resolved through pre-processing or later analysis steps.
Basic Statistics
Description: Provides general information such as the number of sequences, GC content, and read length.
PASS: No WARN/FAIL classification (informational only).
Per Base Sequence Quality
Description: Displays the average and distribution of Phred quality scores (Q-scores) for each base position
PASS: Q-score ≥ 28 across all positions
WARN: Some positions have Q-scores between 20 and 28
FAIL: Multiple positions have Q-scores ≤ 20. If quality is low only in some regions, the data can be used after trimming; if it is low overall, the data are less reliable.
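A Phred score relates to the base-call error probability P by Q = -10 log10(P), so Q20 means a 1% chance of error and Q28 about 0.16%. The minimal Python 3.9+ sketch below reproduces this check from FASTQ quality strings (the fourth line of each record), assuming Phred+33 encoding (Sanger / Illumina 1.8+). It applies the thresholds as stated in this guide, not FastQC's internal rules, and reads "multiple positions" as two or more; that cut-off (FAIL_MIN_POSITIONS) is a hypothetical choice.

```python
from statistics import mean

# Thresholds as described in this guide (not FastQC's internal defaults).
PASS_MIN_Q = 28
FAIL_MAX_Q = 20
FAIL_MIN_POSITIONS = 2  # hypothetical reading of "multiple positions"


def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer Q-scores."""
    return [ord(ch) - 33 for ch in quality]


def per_base_mean_quality(qualities: list[str]) -> list[float]:
    """Mean Q-score at each base position, over the reads that cover it."""
    decoded = [phred33_scores(q) for q in qualities]
    max_len = max(len(d) for d in decoded)
    return [mean(d[i] for d in decoded if len(d) > i) for i in range(max_len)]


def classify_per_base_quality(qualities: list[str]) -> str:
    means = per_base_mean_quality(qualities)
    # Count positions whose mean quality is at or below the FAIL threshold.
    low_positions = sum(1 for q in means if q <= FAIL_MAX_Q)
    if low_positions >= FAIL_MIN_POSITIONS:
        return "FAIL"
    if any(q < PASS_MIN_Q for q in means):
        return "WARN"
    return "PASS"
```

For example, `classify_per_base_quality(["III5", "III5"])` returns "WARN": only the last position (Q20) is low, so it is not yet "multiple positions".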
Per Tile Sequence Quality (Illumina platforms only)
Description: Monitors Q-score variation across sequencing tiles (flow cell regions)
PASS: Even Q-score distribution
WARN/FAIL: Some tiles show low quality, potentially affecting results
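The tile is encoded in Illumina read names, so this metric can be approximated by grouping quality scores by tile. The Python 3.9+ sketch below assumes Casava 1.8+ read names (instrument:run:flowcell:lane:tile:x:y) and Phred+33 qualities. FastQC's documentation describes warning when a tile's mean is more than 2 below the all-tile mean for a base, and failing when it is more than 5 below; this sketch borrows those numbers but, as a simplification, compares whole-tile means rather than per-base means.

```python
from collections import defaultdict
from statistics import mean

WARN_DROP = 2  # borrowed from FastQC's documentation (per-base in FastQC)
FAIL_DROP = 5


def tile_from_header(header: str) -> str:
    """Tile field of a Casava 1.8+ read name, e.g.
    '@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG' -> '2104'."""
    return header.split()[0].split(":")[4]


def classify_per_tile(records: list[tuple[str, str]]) -> str:
    """records: (read name, Phred+33 quality string) pairs."""
    per_tile: dict[str, list[int]] = defaultdict(list)
    for header, quality in records:
        per_tile[tile_from_header(header)].extend(ord(c) - 33 for c in quality)
    # Mean over every base in every tile, then the largest shortfall of any tile.
    overall = mean(q for scores in per_tile.values() for q in scores)
    worst_drop = max(overall - mean(scores) for scores in per_tile.values())
    if worst_drop > FAIL_DROP:
        return "FAIL"
    if worst_drop > WARN_DROP:
        return "WARN"
    return "PASS"
```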
Per Sequence Quality Scores
Description: Assesses the distribution of mean Q-scores per read
PASS: Most sequences have Q-scores ≥ 28
WARN: Some sequences have Q-scores < 20
FAIL: A large number of sequences have Q-scores < 20. If the mean Phred score is too low, the analysis results may be less reliable.
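Unlike the per-base check, this metric averages quality within each read. A minimal Python 3.9+ sketch, assuming Phred+33 quality strings: the guide does not quantify "a large number", so LARGE_FRACTION (10% of reads) is a hypothetical cut-off, and "most sequences" is read as more than half.

```python
from statistics import mean

LOW_Q = 20
GOOD_Q = 28
LARGE_FRACTION = 0.10  # hypothetical cut-off for "a large number" of reads


def read_mean_quality(quality: str) -> float:
    """Mean Phred+33 Q-score of a single read."""
    return mean(ord(ch) - 33 for ch in quality)


def classify_per_sequence_quality(qualities: list[str]) -> str:
    means = [read_mean_quality(q) for q in qualities]
    n = len(means)
    frac_low = sum(m < LOW_Q for m in means) / n
    frac_good = sum(m >= GOOD_Q for m in means) / n
    if frac_low > LARGE_FRACTION:
        return "FAIL"
    # Any low-quality reads, or no clear majority of good reads, gives WARN.
    if frac_low > 0 or frac_good <= 0.5:
        return "WARN"
    return "PASS"
```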
Per Base Sequence Content
Description: Analyzes the proportion of A, T, G, and C bases at each position. In a random library, the A:T:G:C ratios are expected to be similar at all positions.
PASS: Nucleotide composition remains relatively even across all positions
WARN: Some positions show a nucleotide bias of more than 10%
FAIL: Extreme base composition bias is observed. The bias may come from over- or under-representation of specific bases, or from adapter or other artificial sequences included in the reads.
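One way to quantify "bias" is the approach described in FastQC's documentation: at each position, compare the proportion of A with T and of G with C. The Python 3.9+ sketch below uses the guide's 10% WARN threshold; the 20% FAIL threshold is an assumption taken from FastQC's documentation, since the guide only says "extreme". N and other non-ACGT symbols are ignored.

```python
from collections import Counter

WARN_DIFF = 0.10  # "more than 10%" from this guide
FAIL_DIFF = 0.20  # assumption: FastQC's documented failure threshold


def base_proportions(reads: list[str]) -> list[dict[str, float]]:
    """Proportion of A, C, G, T at each position (non-ACGT symbols ignored)."""
    max_len = max(len(r) for r in reads)
    result = []
    for i in range(max_len):
        counts = Counter(r[i] for r in reads if len(r) > i and r[i] in "ACGT")
        total = sum(counts.values()) or 1  # avoid division by zero
        result.append({b: counts[b] / total for b in "ACGT"})
    return result


def max_composition_bias(reads: list[str]) -> float:
    """Largest |A - T| or |G - C| difference over all positions."""
    return max(
        max(abs(p["A"] - p["T"]), abs(p["G"] - p["C"]))
        for p in base_proportions(reads)
    )


def classify_base_content(reads: list[str]) -> str:
    bias = max_composition_bias(reads)
    if bias > FAIL_DIFF:
        return "FAIL"
    if bias > WARN_DIFF:
        return "WARN"
    return "PASS"
```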
Per Sequence GC Content
Description: Compares the GC content distribution to the expected range for the dataset
PASS: GC content distribution falls within the expected range
WARN: Some sequences deviate slightly from the expected GC content
FAIL: The GC content distribution is highly abnormal (indicating possible contamination). The sample may have been contaminated, or GC-rich or AT-rich sequences may have been over-amplified during PCR.
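FastQC's documentation describes this check as comparing the observed per-read GC distribution with a normal distribution modelled on the data, warning when the summed deviation exceeds 15% of reads and failing above 30%. The Python 3.9+ sketch below is a simplified version of that idea; fitting the normal curve with the sample mean and standard deviation is our assumption, and FastQC's exact model may differ. A contaminated sample often shows two GC peaks, which a single normal curve fits poorly.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev

WARN_DEVIATION = 15.0  # % of reads; thresholds from FastQC's documentation
FAIL_DEVIATION = 30.0


def gc_percent(read: str) -> int:
    """GC content of one read, rounded to a whole percent (N ignored)."""
    bases = [b for b in read.upper() if b in "ACGT"]
    if not bases:
        return 0
    return round(100 * sum(b in "GC" for b in bases) / len(bases))


def gc_deviation(reads: list[str]) -> float:
    """Summed |observed - expected| read counts across GC bins, as % of reads."""
    values = [gc_percent(r) for r in reads]
    n = len(values)
    sigma = stdev(values)  # needs at least two reads
    if sigma == 0:
        raise ValueError("all reads have identical GC content")
    model = NormalDist(mean(values), sigma)
    observed = Counter(values)
    total_diff = 0.0
    for g in range(101):
        # Expected number of reads whose GC rounds to g under the normal model.
        expected = n * (model.cdf(g + 0.5) - model.cdf(g - 0.5))
        total_diff += abs(observed[g] - expected)
    return 100 * total_diff / n


def classify_gc(reads: list[str]) -> str:
    dev = gc_deviation(reads)
    if dev > FAIL_DEVIATION:
        return "FAIL"
    if dev > WARN_DEVIATION:
        return "WARN"
    return "PASS"
```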
Per Base N Content
Description: Measures the percentage of N (ambiguous) bases at each position
PASS: N content is < 1% at all positions
WARN: Some positions have N content between 1–5%
FAIL: Multiple positions show N content > 5%. A high proportion of N at a particular position may indicate sequencing errors, contamination, or problems with sample quality. If the N ratio is high only at the 3' end, the reads can be analyzed after trimming; if it is high throughout, consider re-sequencing.
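A minimal Python 3.9+ sketch of the N-content check with the 1% and 5% thresholds stated in this guide. As in the per-base quality check, "multiple positions" is read as two or more (FAIL_MIN_POSITIONS is a hypothetical choice), so a single position above 5% yields WARN.

```python
WARN_N = 0.01  # 1%, from this guide
FAIL_N = 0.05  # 5%, from this guide
FAIL_MIN_POSITIONS = 2  # hypothetical reading of "multiple positions"


def per_base_n_fraction(reads: list[str]) -> list[float]:
    """Fraction of N calls at each position, over the reads that cover it."""
    max_len = max(len(r) for r in reads)
    fractions = []
    for i in range(max_len):
        column = [r[i].upper() for r in reads if len(r) > i]
        fractions.append(column.count("N") / len(column))
    return fractions


def classify_n_content(reads: list[str]) -> str:
    fractions = per_base_n_fraction(reads)
    if sum(f > FAIL_N for f in fractions) >= FAIL_MIN_POSITIONS:
        return "FAIL"
    if any(f >= WARN_N for f in fractions):
        return "WARN"
    return "PASS"
```

When the Ns cluster at the 3' end, `read.rstrip("Nn")` removes only the trailing run; the matching quality string must then be cut to the same length.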
Sequence Length Distribution
Description: Examines the distribution of read lengths to check for consistency
PASS: Reads have a uniform length distribution
WARN: A mixture of different sequence lengths is present
FAIL: The sequence length distribution is highly irregular (possible trimming artifacts). Many short reads may point to PCR amplification errors, sample degradation, or sequencing errors.
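The guide does not give numeric cut-offs for this metric, so the Python 3.9+ sketch below uses the rules described in FastQC's documentation instead: WARN when reads are not all the same length, FAIL when any read has zero length. The short_read_fraction helper and its min_length parameter are hypothetical, for checking how many reads fall below a length you choose.

```python
from collections import Counter


def length_distribution(reads: list[str]) -> Counter:
    """Number of reads at each length."""
    return Counter(len(r) for r in reads)


def classify_length_distribution(reads: list[str]) -> str:
    # Rules follow FastQC's documentation, not thresholds from this guide.
    dist = length_distribution(reads)
    if dist.get(0, 0) > 0:
        return "FAIL"  # at least one zero-length read
    if len(dist) > 1:
        return "WARN"  # reads are not all the same length
    return "PASS"


def short_read_fraction(reads: list[str], min_length: int) -> float:
    """Fraction of reads shorter than a user-chosen (hypothetical) length."""
    return sum(1 for r in reads if len(r) < min_length) / len(reads)
```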
Sequence Duplication Levels
Description: Evaluates how frequently identical sequences appear in the dataset
PASS: The majority of sequences are unique
WARN: More than 20% of sequences are duplicates
FAIL: Over 50% of sequences are duplicates (indicating possible PCR duplication). In RNA-Seq, certain reads may have been over-amplified during PCR. The data can be analyzed after removing duplicate reads; if there are too many duplicates, review the experimental design.
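A minimal Python 3.9+ sketch of the duplication thresholds in this guide. Here a "duplicate" is any read whose exact sequence has already been seen, i.e. the fraction that exact-match deduplication would remove; FastQC estimates duplication differently (for example, on a subset of reads), so numbers will not match its report exactly.

```python
WARN_DUP = 0.20  # from this guide
FAIL_DUP = 0.50


def duplicate_fraction(reads: list[str]) -> float:
    """Fraction of reads that repeat an earlier exact sequence."""
    return 1 - len(set(reads)) / len(reads)


def classify_duplication(reads: list[str]) -> str:
    frac = duplicate_fraction(reads)
    if frac > FAIL_DUP:
        return "FAIL"
    if frac > WARN_DUP:
        return "WARN"
    return "PASS"


def deduplicate(reads: list[str]) -> list[str]:
    """Keep the first occurrence of each exact sequence, preserving order."""
    return list(dict.fromkeys(reads))
```

Exact-sequence deduplication is only a rough illustration; for aligned data, tools such as Picard MarkDuplicates or samtools markdup identify duplicates by alignment position.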
Overrepresented Sequences
Description: Identifies sequences that occur more frequently than expected
PASS: No single sequence represents more than 0.1% of total reads
WARN: Some sequences account for 0.1–1% of total reads
FAIL: One or more sequences make up more than 1% of total reads (indicating possible contamination). This typically occurs when adapter sequences have not been removed; the data can be analyzed after trimming.
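A minimal Python 3.9+ sketch of the overrepresentation thresholds in this guide (0.1% and 1% of total reads), counting exact whole-read matches. This is a simplification; FastQC's own module works on a subset of reads and may truncate long reads before counting.

```python
from collections import Counter

WARN_FRACTION = 0.001  # 0.1%, from this guide
FAIL_FRACTION = 0.01   # 1%, from this guide


def overrepresented(reads: list[str], min_fraction: float = WARN_FRACTION) -> list[tuple[str, float]]:
    """Sequences above min_fraction of all reads, most frequent first."""
    n = len(reads)
    return [(seq, c / n) for seq, c in Counter(reads).most_common() if c / n > min_fraction]


def classify_overrepresented(reads: list[str]) -> str:
    hits = overrepresented(reads)
    if not hits:
        return "PASS"
    top_fraction = hits[0][1]  # most_common() sorts by count, descending
    return "FAIL" if top_fraction > FAIL_FRACTION else "WARN"
```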
Adapter Content
Description: Detects adapter sequences remaining in reads
PASS: Minimal or no adapter contamination
WARN/FAIL: Adapter sequences are present and require trimming. The data can be used without problems after trimming.
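To illustrate what trimming does, the Python 3.9+ sketch below measures how many reads contain an adapter and cuts each read, together with its quality string, at the first exact match. It assumes the Illumina Universal Adapter prefix AGATCGGAAGAG (the sequence FastQC lists for that adapter); other library kits use different adapters. Dedicated trimmers such as Cutadapt or Trimmomatic also handle mismatches and partial adapters at the 3' end, which this exact-match sketch does not.

```python
ILLUMINA_UNIVERSAL = "AGATCGGAAGAG"  # assumed adapter; depends on the library kit


def adapter_fraction(reads: list[str], adapter: str = ILLUMINA_UNIVERSAL) -> float:
    """Fraction of reads containing a full exact copy of the adapter."""
    return sum(adapter in r for r in reads) / len(reads)


def trim_adapter(read: str, quality: str, adapter: str = ILLUMINA_UNIVERSAL) -> tuple[str, str]:
    """Cut the read and its quality string at the first exact adapter match."""
    pos = read.find(adapter)
    if pos == -1:
        return read, quality  # no adapter found: leave the read unchanged
    return read[:pos], quality[:pos]
```

For example, `trim_adapter("ACGTAGATCGGAAGAGCAC", "I" * 19)` keeps only the first four bases, `("ACGT", "IIII")`.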