4. How to search and download

Browse

Click [Browse] in the top menu of the KRA page to open the browse page, which provides:

  • A search bar for entering search terms

  • Filters for detailed search (Access, Strategy, Selection, Layout, File type, Platform, Source)

  • The number of data entries registered in KRA and their release status

  • Download of metadata or raw data for the checked items

  • A link from each title to its detail page

Selecting an Experiment title opens its detail page, where you can view the Experiment information and download raw data.

KRA browse page

File Download

To download raw data, click the [Download] button in Runs.

To check the quality of a raw data file, open its [KBQC report] in Runs.

To copy a data file link to your My GBox account, for high-speed file download or for analysis with BioExpress, click [Copy data my GBox]. Once the link is copied, you can download the raw data related to the BioProject (KAP#) from GBox Browser/Explorer.

Detail page & file download

KBQC report

KRA provides a quality control (QC) report for raw data. The report is generated with FastQC, and each metric is rated PASS, WARN, or FAIL. Each metric is described below.

A FAIL in the QC report does not necessarily mean that the data are unusable. However, if specific metrics FAIL, review the experimental design, data quality, and pre-processing steps (e.g., trimming).

Some metrics can FAIL because of the experimental design or the sequencing process without preventing the data from being used. For example, some library preparation methods naturally produce high duplication rates; this reflects the experimental design rather than poor data quality.

Therefore, analysis can often proceed despite FAILs in FastQC, and many issues can be resolved through pre-processing or later analysis steps.
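If you also run FastQC yourself on downloaded raw data, its output archive includes a summary.txt file with one line per module. The minimal sketch below assumes that file's tab-separated layout (status, module name, input file name) and collects the modules that received a given status, so that each FAIL can be checked against the descriptions below. It is not part of KRA, the function names are illustrative, and the sample text in the usage lines is hypothetical.

```python
from collections import Counter


def parse_fastqc_summary(text: str) -> list[tuple[str, str, str]]:
    """Split FastQC summary.txt content into (status, module, file name) tuples.

    Assumes one tab-separated line per module: STATUS, module name, file name.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        status, module, filename = line.split("\t")
        rows.append((status, module, filename))
    return rows


def modules_with_status(rows: list[tuple[str, str, str]], status: str = "FAIL") -> list[str]:
    """Names of the modules that received the given status."""
    return [module for s, module, _ in rows if s == status]


def status_counts(rows: list[tuple[str, str, str]]) -> Counter:
    """How many modules received each status (PASS, WARN, FAIL)."""
    return Counter(s for s, _, _ in rows)


# Usage with hypothetical summary.txt content
sample = (
    "PASS\tBasic Statistics\tsample.fastq.gz\n"
    "FAIL\tSequence Duplication Levels\tsample.fastq.gz\n"
    "WARN\tAdapter Content\tsample.fastq.gz\n"
)
rows = parse_fastqc_summary(sample)
print(modules_with_status(rows))  # ['Sequence Duplication Levels']
print(status_counts(rows))
```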

Basic Statistics

  • Description: Provides general information such as the number of sequences, GC content, and read length.

  • PASS: Always PASS; this module is informational only and never raises WARN or FAIL

Per Base Sequence Quality

  • Description: Displays the average and distribution of Phred quality scores (Q-scores) for each base position

  • PASS: Q-score ≥ 28 across all positions

  • WARN: Some positions have Q-scores between 20 and 28

  • FAIL: Multiple positions have Q-scores ≤ 20. If quality is low only in some regions, the data can be used after trimming; if it is low overall, the data are unreliable.
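The Q-score is defined as Q = -10·log10(P), where P is the probability that a base call is wrong, so Q20 corresponds to a 1% error rate and Q30 to 0.1%. FASTQ files store one ASCII character per base; the sketch below assumes the common Phred+33 encoding, decodes the quality strings, averages each position across reads, and applies the thresholds listed above. FastQC's own criteria for this module are based on per-position quartiles and medians, so this is an illustration of the metric rather than a reimplementation; the function names are hypothetical, and treating a single position at or below Q20 as WARN is an assumption.

```python
from statistics import mean


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Q-scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def per_base_mean_quality(qualities: list[str]) -> list[float]:
    """Mean Q-score at each base position across all reads."""
    decoded = [phred_scores(q) for q in qualities]
    length = max(len(q) for q in decoded)
    return [
        mean(q[pos] for q in decoded if pos < len(q))  # skip reads shorter than pos
        for pos in range(length)
    ]


def classify_per_base_quality(means: list[float]) -> str:
    """Thresholds as described in this guide: PASS if every position is >= 28,
    FAIL if more than one position is <= 20, otherwise WARN."""
    if sum(m <= 20 for m in means) > 1:
        return "FAIL"
    if all(m >= 28 for m in means):
        return "PASS"
    return "WARN"


# 'I' encodes Q40, '5' encodes Q20 and '+' encodes Q10 in Phred+33
means = per_base_mean_quality(["IIII", "II5+"])
print(classify_per_base_quality(means))  # WARN
```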

Per Tile Sequence Quality (Illumina platforms only)

  • Description: Monitors Q-score variation across sequencing tiles (flow cell regions)

  • PASS: Q-scores are evenly distributed across tiles

  • WARN/FAIL: Some tiles show low quality, potentially affecting results
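Tile information comes from the read names. The sketch below assumes Illumina's CASAVA 1.8+ header layout (@instrument:run:flowcell:lane:tile:x:y followed by read:filter:control:index), groups Q-scores by tile, and flags tiles whose average falls below the average of all tiles. The 2 and 5 Q-score margins loosely follow FastQC's documented defaults for this module, but FastQC compares tiles at each base position while this sketch compares whole-tile averages, so treat the margins and the function names as assumptions.

```python
from collections import defaultdict
from statistics import mean


def tile_from_header(header: str) -> str:
    """Return the tile field of an Illumina CASAVA 1.8+ read name."""
    read_id = header.lstrip("@").split()[0]  # drop the part after the space
    return read_id.split(":")[4]             # instrument:run:flowcell:lane:TILE:x:y


def per_tile_mean_quality(reads: list[tuple[str, str]], offset: int = 33) -> dict[str, float]:
    """Mean Q-score over all bases of the reads from each tile.

    reads holds (header, quality string) pairs; Phred+33 encoding is assumed.
    """
    scores: dict[str, list[int]] = defaultdict(list)
    for header, quality in reads:
        scores[tile_from_header(header)].extend(ord(ch) - offset for ch in quality)
    return {tile: mean(values) for tile, values in scores.items()}


def flag_tiles(tile_means: dict[str, float],
               warn_margin: float = 2.0, fail_margin: float = 5.0) -> dict[str, str]:
    """Flag tiles whose mean is below the average of all tiles by more than a margin."""
    overall = mean(tile_means.values())
    flags = {}
    for tile, value in tile_means.items():
        deficit = overall - value
        if deficit > fail_margin:
            flags[tile] = "FAIL"
        elif deficit > warn_margin:
            flags[tile] = "WARN"
        else:
            flags[tile] = "PASS"
    return flags


# Hypothetical reads from two tiles: tile 1101 at Q40, tile 1102 at Q10
reads = [
    ("@M1:1:FC1:1:1101:10:20 1:N:0:1", "IIII"),
    ("@M1:1:FC1:1:1102:10:20 1:N:0:1", "++++"),
]
print(flag_tiles(per_tile_mean_quality(reads)))  # {'1101': 'PASS', '1102': 'FAIL'}
```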

Per Sequence Quality Scores

  • Description: Assesses the distribution of mean Q-scores per read

  • PASS: Most sequences have a mean Q-score ≥ 28

  • WARN: Some sequences have a mean Q-score < 20

  • FAIL: A large number of sequences have a mean Q-score < 20. If the mean Phred score is too low, analysis results may be less reliable.
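Unlike the per-base module, this metric works on one mean score per read. The sketch below (Phred+33 assumed; function names hypothetical) computes that mean, builds a histogram of rounded means similar to the one the module plots, and reports the fraction of reads below Q20. Because the guide does not quantify "a large number", the sketch stops at the fraction rather than assigning a FAIL cutoff.

```python
from collections import Counter


def read_mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Q-score of one read (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores)


def mean_quality_histogram(qualities: list[str]) -> Counter:
    """Number of reads at each rounded mean Q-score."""
    return Counter(round(read_mean_quality(q)) for q in qualities)


def fraction_below(qualities: list[str], threshold: float = 20) -> float:
    """Fraction of reads whose mean Q-score is below the threshold."""
    means = [read_mean_quality(q) for q in qualities]
    return sum(m < threshold for m in means) / len(means)


# Reads with mean Q40, Q10 and Q25
qualities = ["IIII", "++++", "II++"]
print(mean_quality_histogram(qualities))
print(fraction_below(qualities))
```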

Per Base Sequence Content

  • Description: Analyzes the proportion of A, T, G, and C bases at each position. In a random library, the base proportions should be similar at every position.

  • PASS: Nucleotide composition remains relatively even across all positions

  • WARN: Some positions show a nucleotide bias of more than 10%

  • FAIL: Extreme base composition bias is observed. The bias may come from over- or under-representation of specific bases, or from adapter or other artificial sequences included in the reads.
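FastQC's documentation describes this check as the difference between A and T, or between G and C, at each position. The sketch below follows that description: it tabulates the percentage of each base per position (ignoring N) and returns the largest complementary-pair difference, which can then be compared with the 10% WARN level given above. The function names are hypothetical.

```python
from collections import Counter


def per_base_composition(seqs: list[str]) -> list[dict[str, float]]:
    """Percentage of A, C, G and T at each position (N and other symbols ignored)."""
    length = max(len(s) for s in seqs)
    table = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in seqs if pos < len(s))
        called = sum(counts[b] for b in "ACGT")  # bases actually called at this position
        table.append({b: 100 * counts[b] / called if called else 0.0 for b in "ACGT"})
    return table


def max_pair_difference(table: list[dict[str, float]]) -> float:
    """Largest |A - T| or |G - C| difference, in percentage points, at any position."""
    return max(max(abs(p["A"] - p["T"]), abs(p["G"] - p["C"])) for p in table)


# Position 0 is 75% A and 25% T, a 50-point imbalance
table = per_base_composition(["AC", "AG", "AC", "TG"])
print(max_pair_difference(table))  # 50.0
```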

Per Sequence GC Content

  • Description: Compares the GC content distribution to the expected range for the dataset

  • PASS: GC content distribution falls within the expected range

  • WARN: Some sequences deviate slightly from the expected GC content

  • FAIL: The GC content distribution is highly abnormal, possibly indicating contamination. The sample may be contaminated, or GC-rich or AT-rich sequences may have been over-amplified during PCR.
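The per-read values behind this metric are simple to compute. FastQC compares the observed distribution with a theoretical normal distribution modelled from the data; the sketch below only computes each read's GC percentage (counting A, C, G and T, ignoring N) and the mean and spread of those values, which is enough to spot a shifted or unusually wide distribution. The function names are hypothetical.

```python
from statistics import mean, pstdev


def gc_percent(seq: str) -> float:
    """GC percentage of one read, counting only A, C, G and T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0  # read made entirely of N or other symbols
    return 100 * sum(b in "GC" for b in bases) / len(bases)


def gc_summary(seqs: list[str]) -> tuple[float, float]:
    """Mean and population standard deviation of per-read GC percentage."""
    values = [gc_percent(s) for s in seqs]
    return mean(values), pstdev(values)


print(gc_percent("ATGC"))              # 50.0
print(gc_summary(["GGCC", "AATT"]))
```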

Per Base N Content

  • Description: Measures the percentage of N (ambiguous) bases at each position

  • PASS: N content is < 1% at all positions

  • WARN: Some positions have N content between 1–5%

  • FAIL: Multiple positions show N content > 5%. A high proportion of N at particular positions can indicate sequencing errors, contamination, or problems with the sample. If N content is high only at the 3' end, the reads can be analyzed after trimming; if it is high throughout, consider re-sequencing.
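The per-position N percentage and the thresholds listed above can be sketched as follows. The guide does not say how a single position above 5% is rated, so this sketch assumes WARN in that case; the function names are hypothetical.

```python
def per_base_n_percent(seqs: list[str]) -> list[float]:
    """Percentage of N calls at each position across all reads."""
    length = max(len(s) for s in seqs)
    result = []
    for pos in range(length):
        column = [s[pos] for s in seqs if pos < len(s)]
        result.append(100 * sum(b in "Nn" for b in column) / len(column))
    return result


def classify_n_content(percents: list[float]) -> str:
    """Guide thresholds: PASS if every position < 1%, FAIL if several positions > 5%."""
    if sum(p > 5 for p in percents) > 1:
        return "FAIL"
    if all(p < 1 for p in percents):
        return "PASS"
    return "WARN"  # assumption: covers 1-5% positions and a single position above 5%


percents = per_base_n_percent(["ACGN", "ACGT", "ACGT", "ACGT"])
print(percents)                      # [0.0, 0.0, 0.0, 25.0]
print(classify_n_content(percents))  # WARN
```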

Sequence Length Distribution

  • Description: Examines the distribution of read lengths to check for consistency

  • PASS: Reads have a uniform length distribution

  • WARN: A mixture of different sequence lengths is present

  • FAIL: The sequence length distribution is highly irregular (possibly a trimming artifact). Many short reads can indicate PCR amplification errors, sample degradation, or sequencing errors.
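A length distribution is a count of reads per length. The sketch below builds it, checks whether all reads share one length (the uniform case described above), and reports the share of short reads. The 50 bp cutoff and the function names are hypothetical; choose a cutoff suited to the run's read length.

```python
from collections import Counter


def length_distribution(seqs: list[str]) -> Counter:
    """Number of reads at each length."""
    return Counter(len(s) for s in seqs)


def is_uniform(distribution: Counter) -> bool:
    """True when every read has the same length."""
    return len(distribution) == 1


def short_read_fraction(seqs: list[str], min_length: int = 50) -> float:
    """Fraction of reads shorter than min_length (a hypothetical cutoff)."""
    return sum(len(s) < min_length for s in seqs) / len(seqs)


seqs = ["A" * 100, "C" * 100, "G" * 30]
print(length_distribution(seqs))              # Counter({100: 2, 30: 1})
print(is_uniform(length_distribution(seqs)))  # False
```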

Sequence Duplication Levels

  • Description: Evaluates how frequently identical sequences appear in the dataset

  • PASS: The majority of sequences are unique

  • WARN: More than 20% of sequences are duplicates

  • FAIL: Over 50% of sequences are duplicates, indicating possible PCR duplication. In RNA-Seq, certain reads may have been over-amplified during PCR; such data can be analyzed after removing duplicate reads. If duplication is excessive, review the experimental design.
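A simple way to see the duplication level is the share of reads that would be dropped by keeping only one copy of each exact sequence. The sketch below computes that share, applies the 20% and 50% levels given above, and removes exact duplicates. This exact-match approach is a simplification of FastQC's estimate; in practice duplicates are often marked after alignment with tools such as Picard MarkDuplicates. The function names are hypothetical.

```python
def duplicate_fraction(seqs: list[str]) -> float:
    """Share of reads that are exact copies of another read already counted."""
    return 1 - len(set(seqs)) / len(seqs)


def classify_duplication(fraction: float) -> str:
    """Guide thresholds: WARN above 20% duplicates, FAIL above 50%."""
    if fraction > 0.5:
        return "FAIL"
    if fraction > 0.2:
        return "WARN"
    return "PASS"


def deduplicate(seqs: list[str]) -> list[str]:
    """Keep the first occurrence of each sequence, preserving order."""
    return list(dict.fromkeys(seqs))


seqs = ["AAA", "AAA", "CCC", "GGG"]
print(duplicate_fraction(seqs))                        # 0.25
print(classify_duplication(duplicate_fraction(seqs)))  # WARN
print(deduplicate(seqs))                               # ['AAA', 'CCC', 'GGG']
```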

Overrepresented Sequences

  • Description: Identifies sequences that occur more frequently than expected

  • PASS: No single sequence represents more than 0.1% of total reads

  • WARN: Some sequences account for 0.1–1% of total reads

  • FAIL: One or more sequences make up more than 1% of total reads, indicating possible contamination. This typically occurs when adapter sequences have not been removed, and the data can be analyzed after trimming.
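The 0.1% and 1% levels above translate directly into a count over exact sequences. The sketch below lists every sequence whose share of all reads exceeds 0.1% and rates the set using the guide's thresholds; the function names and the synthetic reads in the usage lines are hypothetical.

```python
from collections import Counter


def overrepresented(seqs: list[str], min_fraction: float = 0.001) -> list[tuple[str, float]]:
    """Sequences whose share of all reads exceeds min_fraction, most frequent first."""
    total = len(seqs)
    return [(s, n / total) for s, n in Counter(seqs).most_common() if n / total > min_fraction]


def classify_overrepresented(seqs: list[str]) -> str:
    """Guide thresholds: WARN if any sequence exceeds 0.1% of reads, FAIL above 1%."""
    hits = overrepresented(seqs)
    if any(fraction > 0.01 for _, fraction in hits):
        return "FAIL"
    return "WARN" if hits else "PASS"


# 1,000 reads: 980 unique reads plus 20 copies (2%) of one sequence
seqs = [f"READ{i}" for i in range(980)] + ["AGATCGGAAGAGC"] * 20
print(overrepresented(seqs))           # [('AGATCGGAAGAGC', 0.02)]
print(classify_overrepresented(seqs))  # FAIL
```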

Adapter Content

  • Description: Detects adapter sequences remaining in reads

  • PASS: Minimal or no adapter contamination

  • WARN/FAIL: Adapter sequences are present and should be trimmed; the data can be used without problems after trimming
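Adapter read-through happens when the insert is shorter than the read, so the adapter appears toward the 3' end. The sketch below counts reads containing the 13 bp Illumina Universal Adapter prefix AGATCGGAAGAGC, one of the sequences FastQC searches for, and cuts each read at the first exact match. Dedicated trimmers such as cutadapt or Trimmomatic also handle partial adapters at the read end and sequencing errors, so treat this exact-match version as an illustration only; the function names are hypothetical.

```python
ILLUMINA_UNIVERSAL = "AGATCGGAAGAGC"  # Illumina Universal Adapter prefix


def adapter_fraction(seqs: list[str], adapter: str = ILLUMINA_UNIVERSAL) -> float:
    """Fraction of reads that contain the full adapter sequence."""
    return sum(adapter in s for s in seqs) / len(seqs)


def trim_adapter(seq: str, adapter: str = ILLUMINA_UNIVERSAL) -> str:
    """Cut the read at the first exact adapter match; return it unchanged otherwise."""
    idx = seq.find(adapter)
    return seq if idx == -1 else seq[:idx]


reads = ["ACGTAGATCGGAAGAGCTT", "ACGTACGT"]
print(adapter_fraction(reads))            # 0.5
print([trim_adapter(r) for r in reads])   # ['ACGT', 'ACGTACGT']
```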
