1. Introduction

What is a BioProject?

BioProject is a central framework in K-BDS that connects and organizes diverse biological data generated from a single research initiative. It acts as a top-level container that provides an overview of the research, offering a single entry point to access related datasets across multiple K-BDS databases.

A BioProject can be established by a single research group or a consortium. It serves as a reference unit that links all associated data (sequencing, omics, chemical, imaging, and generalist data) under one research context.


Why Register a BioProject?

Registering a BioProject ensures that various types of data produced from the same study are contextually connected and traceable. This linkage provides several important benefits:

  • Integrated data access: Easily find and access all related datasets (e.g., genome, transcriptome, proteome) under one project.

  • Reproducibility and citation: Provide a unified reference ID that can be cited in publications, enhancing reproducibility and scientific transparency.

  • Data interoperability: Enable cross-database linking, improving the discoverability and reusability of your data.

  • Efficient project management: Organize complex multi-omics and cross-disciplinary data under a single project framework.


What Types of Data Can a BioProject Include?

A BioProject supports a wide range of biological and biomedical data generated from comprehensive research efforts, including:

  • Genomics data (sequence reads, functional genomics, nucleotide sequences and assemblies, variation)

  • Proteomics data from mass spectrometry (MS)

  • Metabolomics data from MS/NMR

  • Chemical data (structure, assay, profiling)

  • Bio-imaging data (optical, EM, MR, CT, EPhys, etc.)

  • Pre-clinical data

  • Others (generalist)


Relationship with Other Registration Units

In K-BDS, a BioProject serves as the top-level registration unit and connects to other components in the data submission ecosystem:

  • BioSample: Provides detailed information about the biological source materials (e.g., organism, tissue, environment).

  • KRA (Korea Sequence Read Archive): Stores raw sequencing data.

  • KNA (Korea Nucleotide Sequence Archive): Stores assembled nucleotide sequences.

  • KEA (Korea Expression Archive): Manages gene expression and functional genomics data.

  • KSO (Korea Spatial Omics): Handles spatial transcriptomics and multi-omics spatial data.

  • Other databases: Archives for proteomics, metabolomics, chemical profiling, bio-imaging, and pre-clinical data.

This hierarchical structure ensures that each dataset is properly linked back to the research context, making the BioProject the central node in the K-BDS data ecosystem.
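
The linkage described above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, field names, and identifier strings (such as "PROJECT-0001") are hypothetical placeholders, not the K-BDS schema or its accession format. It shows one BioProject acting as the parent record, with each dataset recording which archive stores it and which BioSample it came from.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# Hypothetical record types for illustration; not the actual K-BDS schema.
@dataclass
class Dataset:
    archive: str                     # archive holding the data, e.g. "KRA" or "KEA"
    accession: str                   # placeholder identifier, not a real accession format
    biosample: Optional[str] = None  # placeholder BioSample identifier


@dataclass
class BioProject:
    accession: str
    title: str
    datasets: list[Dataset] = field(default_factory=list)

    def link(self, dataset: Dataset) -> None:
        """Attach a dataset so it can be traced back to this project."""
        self.datasets.append(dataset)

    def by_archive(self) -> dict[str, list[str]]:
        """Group linked dataset accessions by the archive that stores them."""
        grouped: dict[str, list[str]] = {}
        for ds in self.datasets:
            grouped.setdefault(ds.archive, []).append(ds.accession)
        return grouped


project = BioProject("PROJECT-0001", "Example multi-omics study")
project.link(Dataset("KRA", "RUN-0001", biosample="SAMPLE-0001"))
project.link(Dataset("KRA", "RUN-0002", biosample="SAMPLE-0002"))
project.link(Dataset("KEA", "EXPR-0001", biosample="SAMPLE-0001"))

print(project.by_archive())
# {'KRA': ['RUN-0001', 'RUN-0002'], 'KEA': ['EXPR-0001']}
```

In this sketch, every dataset is reachable from the single project record, which mirrors how a BioProject gives one entry point to data held in several archives.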


💡 Tip: Think of a BioProject as the "home page" for your research. It tells the story of your study and organizes all associated data, no matter how diverse, into one accessible and citable reference point.

[Figure: BioProject as a Central Hub for Integrated Biological Data]
