1. Introduction
What is a BioProject?
BioProject is a central framework in K-BDS that connects and organizes diverse biological data generated from a single research initiative. It acts as a top-level container that provides an overview of the research, offering a single entry point to access related datasets across multiple K-BDS databases.
A BioProject can be established by a single research group or a consortium. It serves as a reference unit that links all associated data, including sequencing, omics, chemical, imaging, and generalist data, under one research context.
Why Register a BioProject?
Registering a BioProject ensures that various types of data produced from the same study are contextually connected and traceable. This linkage provides several important benefits:
Integrated data access: Easily find and access all related datasets (e.g., genome, transcriptome, proteome) under one project.
Reproducibility and citation: Provide a unified reference ID that can be cited in publications, enhancing reproducibility and scientific transparency.
Data interoperability: Enable cross-database linking, improving the discoverability and reusability of your data.
Efficient project management: Organize complex multi-omics and cross-disciplinary data under a single project framework.
What Types of Data Can a BioProject Include?
A BioProject supports a wide range of biological and biomedical data types, including:
Genomics data (sequence reads, functional genomics, nucleotide sequences and assemblies, variation)
Proteomics data from mass spectrometry (MS)
Metabolomics data from MS/NMR
Chemical data (structure, assay, profiling)
Bio-imaging data (optical, EM, MR, CT, EPhys, etc.)
Pre-clinical data
Others (generalist)
Relationship with Other Registration Units
In K-BDS, a BioProject serves as the top-level registration unit and connects to other components in the data submission ecosystem:
BioSample: Provides detailed information about the biological source materials (e.g., organism, tissue, environment).
KRA (Korea Sequence Read Archive): Stores raw sequencing data.
KNA (Korea Nucleotide Sequence Archive): Stores assembled nucleotide sequences.
KEA (Korea Expression Archive): Manages gene expression and functional genomics data.
KSO (Korea Spatial Omics): Handles spatial transcriptomics and multi-omics spatial data.
Other databases: proteomics, metabolomics, chemical profiling, bio-imaging, and pre-clinical archives.
This hierarchical structure ensures that each dataset is properly linked back to the research context, making the BioProject the central node in the K-BDS data ecosystem.
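To make the hierarchy concrete, here is a minimal Python sketch of how a BioProject might act as the top-level node that links BioSamples and archive records. The class names, fields, and accession strings (such as `BP-0001`, `BS-0001`, and `KRA-0001`) are hypothetical illustrations, not the actual K-BDS schema, accession format, or submission API.

```python
# Hypothetical data model for illustration only; not the K-BDS schema or API.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BioSample:
    """Describes a biological source material (hypothetical fields)."""
    accession: str
    organism: str
    tissue: Optional[str] = None


@dataclass
class ArchiveRecord:
    """A dataset deposited in an archive such as KRA, KNA, KEA, or KSO."""
    archive: str
    accession: str
    biosample: Optional[str] = None  # accession of the BioSample it derives from


@dataclass
class BioProject:
    """Top-level container linking every dataset from one study."""
    accession: str
    title: str
    biosamples: Dict[str, BioSample] = field(default_factory=dict)
    records: List[ArchiveRecord] = field(default_factory=list)

    def add_biosample(self, sample: BioSample) -> None:
        self.biosamples[sample.accession] = sample

    def add_record(self, record: ArchiveRecord) -> None:
        # If a record names a BioSample, that BioSample must already be
        # registered under this project, so the dataset links back to it.
        if record.biosample is not None and record.biosample not in self.biosamples:
            raise ValueError(f"unknown BioSample {record.biosample!r}")
        self.records.append(record)

    def records_in(self, archive: str) -> List[ArchiveRecord]:
        """Return all of this project's datasets held in one archive."""
        return [r for r in self.records if r.archive == archive]


# Usage with hypothetical accessions.
project = BioProject("BP-0001", "Hypothetical multi-omics study")
project.add_biosample(BioSample("BS-0001", "Homo sapiens", "liver"))
project.add_record(ArchiveRecord("KRA", "KRA-0001", biosample="BS-0001"))
project.add_record(ArchiveRecord("KEA", "KEA-0001", biosample="BS-0001"))
print([r.accession for r in project.records_in("KRA")])  # ['KRA-0001']
```

Keeping the BioProject as the single owner of both samples and records means any dataset can be traced back to its study and source material from one entry point, which mirrors the role described above.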
💡 Tip: Think of a BioProject as the "home page" for your research. It tells the story of your study and organizes all associated data, however diverse, into one accessible and citable reference point.
Figure: BioProject as a Central Hub for Integrated Biological Data