1. Introduction

What is a BioSample?

A BioSample represents the biological source material from which data in K-BDS are derived. It provides detailed contextual information about the organism, tissue, cell line, environment, or other biological origin used in an experiment. Each BioSample serves as a unique descriptor of the biological material, ensuring that associated datasets (e.g., sequencing, imaging, metabolomics) can be accurately interpreted and reused.

In K-BDS, a BioSample acts as the fundamental descriptive unit that captures metadata about the sample’s source, condition, and collection context, forming the basis for biological reproducibility and interoperability.

The BioSample database stores structured descriptions and metadata of biological materials used in research, ensuring that each dataset retains clear biological and experimental context.

For many data types, such as genomic data, the BioSample submission form must be completed before the data themselves are submitted. This form covers not only basic information about the biological material (e.g., species name, supplier) but also detailed biological attributes such as growth conditions, treatment details, and disease states. In this way, each dataset is registered with the characteristics of the sample and the context of the experiment preserved.


Why Register a BioSample?

Registering a BioSample ensures that biological source information is clearly defined, standardized, and linked to all derived datasets. This connection enhances the scientific value and reusability of the data by maintaining traceability from the source material to the analytical results.

Key benefits include:

  • Traceability: Establish a clear lineage between data and its biological origin (organism, tissue, environment, etc.).

  • Metadata standardization: Support consistent and structured descriptions aligned with international standards (e.g., INSDC, MIxS).

  • Cross-database integration: Automatically link BioSamples to associated data in other K-BDS databases (e.g., KRA, KNA, KEA).

  • Reproducibility: Provide researchers with enough contextual information to replicate experiments or analyses accurately.


What Information Does a BioSample Include?

A BioSample record captures comprehensive metadata describing the biological source and experimental context, including (but not limited to):

  • Organism information: Scientific name, taxonomy ID, strain, isolate, and genetic background.

  • Sample type: Tissue, organ, cell line, environmental sample, etc.

  • Collection details: Geographic location, date, collection method, habitat, and conditions.

  • Experimental context: Treatment, disease state, developmental stage, or environmental exposure.

  • Host association: Host species, health status, and relationship to microbiome or symbiotic samples.

  • Linked identifiers: Connections to BioProject, KRA, KNA, KEA, and other archives.

This standardized metadata structure ensures the biological and environmental context of each dataset is captured with precision.
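
As a rough illustration, the metadata categories listed above can be pictured as a single structured record. The Python sketch below is not the K-BDS submission schema: the class name, field names, and the choice of "recommended" fields are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BioSampleRecord:
    """Hypothetical BioSample record; field names are illustrative, not the K-BDS schema."""
    # Organism information
    scientific_name: str
    taxonomy_id: int
    # Sample type (tissue, organ, cell line, environmental sample, ...)
    sample_type: str
    strain: Optional[str] = None
    isolate: Optional[str] = None
    # Collection details
    geographic_location: Optional[str] = None
    collection_date: Optional[str] = None
    # Experimental context
    treatment: Optional[str] = None
    disease_state: Optional[str] = None
    # Host association
    host_species: Optional[str] = None
    # Linked identifiers, keyed by archive name (e.g. "BioProject", "KRA")
    linked_ids: Dict[str, List[str]] = field(default_factory=dict)


# Assumption: these two fields are treated as "recommended" purely for this example.
RECOMMENDED_FIELDS = ("geographic_location", "collection_date")


def missing_recommended(record: BioSampleRecord) -> List[str]:
    """Return the names of recommended fields that are empty on the record."""
    return [name for name in RECOMMENDED_FIELDS if not getattr(record, name)]


sample = BioSampleRecord(
    scientific_name="Homo sapiens",
    taxonomy_id=9606,
    sample_type="cell line",
    collection_date="2024-05-17",
)
print(missing_recommended(sample))
```

A completeness check of this kind can flag gaps before submission; the fields that are actually required depend on the sample type and on the K-BDS submission form itself.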


Relationship with Other Registration Units

In K-BDS, the BioSample is directly linked to other registration components:

  • BioProject: The overarching research framework that defines the study context.

  • KRA (Korea Sequence Read Archive): Stores raw sequencing reads derived from the BioSample.

  • KNA (Korea Nucleotide Sequence Archive): Stores assembled nucleotide sequences generated from the sample.

  • KEA (Korea Expression Archive): Contains expression and functional genomics data from the same sample.

  • Other archives: Archives for genomic variation, metabolomics, bio-imaging, and generalist datasets connected to the same biological material.
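
The relationships above amount to a one-to-many mapping from a BioSample to records in other archives. The following Python sketch shows one way such links could be indexed and traced back to their sample. The accession strings, the `LINKS` table, and both function names are hypothetical; real K-BDS accession formats and linking mechanisms may differ.

```python
from collections import defaultdict

# Hypothetical (sample, archive, record) links; the identifiers are placeholders.
LINKS = [
    ("SAMPLE-0001", "BioProject", "PROJECT-0001"),
    ("SAMPLE-0001", "KRA", "READS-0001"),
    ("SAMPLE-0001", "KRA", "READS-0002"),
    ("SAMPLE-0001", "KNA", "ASSEMBLY-0001"),
    ("SAMPLE-0002", "KRA", "READS-0003"),
]


def index_by_sample(links):
    """Group linked record IDs by sample, then by archive name."""
    index = defaultdict(lambda: defaultdict(list))
    for sample_id, archive, record_id in links:
        index[sample_id][archive].append(record_id)
    return index


def sample_for_record(links, record_id):
    """Trace a derived record back to its BioSample, or return None if unlinked."""
    for sample_id, _archive, linked_id in links:
        if linked_id == record_id:
            return sample_id
    return None


index = index_by_sample(LINKS)
print(index["SAMPLE-0001"]["KRA"])
print(sample_for_record(LINKS, "ASSEMBLY-0001"))
```

The forward index answers "which datasets came from this sample?", while the reverse lookup answers "which sample produced this dataset?", which is the traceability the BioSample is meant to provide.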
