Overview
Metadata prediction models infer biological metadata from observed expression data. Given a gene expression profile, the model predicts the likely biological characteristics such as cell type, tissue, disease state, and more. This is useful when you want to:
- Annotate samples of unknown origin
- Validate sample labels against expression patterns
- Discover potentially mislabeled or contaminated samples
- Understand the biological characteristics captured in expression data
Available Models
- gem-1-bulk_predict-metadata: Bulk RNA-seq metadata prediction model
- gem-1-sc_predict-metadata: Single-cell RNA-seq metadata prediction model
These endpoints may require 1-2 minutes of startup time if they have been scaled down. Plan accordingly for interactive use.
How It Works
Metadata prediction encodes your expression data into the model’s latent space and then uses classifiers to predict the most likely metadata values for each sample. The model returns:
- Classifier probabilities: For each categorical metadata field, the probability distribution over possible values
- Predicted labels: The most likely value for each metadata field
- Latent representations: The biological, technical, and perturbation latent vectors
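The link between the first two outputs can be illustrated with a minimal sketch: the predicted label for a field is the value with the highest probability, and that probability serves as a rough confidence score. It assumes one field's distribution is available as a mapping from candidate value to probability; the helper `top_prediction`, the field name, and the numbers are hypothetical and not part of the Synthesize Bio client.

```python
from typing import Dict, Tuple


def top_prediction(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the most likely value and its probability for one metadata field.

    Hypothetical helper: `probs` is assumed to map each candidate value
    to its probability, e.g. {"liver": 0.72, "kidney": 0.18}.
    """
    if not probs:
        raise ValueError("empty probability distribution")
    # The predicted label is the value with the highest probability.
    label = max(probs, key=probs.get)
    return label, probs[label]


# Hypothetical classifier output for a single sample's "tissue" field.
tissue_probs = {"liver": 0.72, "kidney": 0.18, "lung": 0.10}
label, confidence = top_prediction(tissue_probs)
print(label, confidence)  # liver 0.72
```

A low top probability, or two values with similar probabilities, suggests the expression profile does not clearly support a single label for that field.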
Creating a Query
Metadata prediction queries are simpler than those for other model types: you only need to provide expression counts.
- inputs: A list of count vectors, where each element is a named list with a counts field containing expression values
- seed (optional): Random seed for reproducibility
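The query structure described above can be sketched as a JSON-like Python dict: an `inputs` list holding one `{"counts": [...]}` entry per sample, plus an optional `seed`. This mirrors the named-list layout in the text but is an assumption about how the payload looks on the wire; `build_metadata_query` is a hypothetical helper, and the official client (see get_example_query()) may assemble the query for you.

```python
import json
from typing import Dict, Optional, Sequence


def build_metadata_query(
    count_vectors: Sequence[Sequence[int]], seed: Optional[int] = None
) -> Dict[str, object]:
    """Assemble a metadata prediction query (hypothetical helper).

    Each sample becomes one element of `inputs`, holding its expression
    values under a `counts` key; `seed` is only included when given.
    """
    query: Dict[str, object] = {
        "inputs": [{"counts": list(counts)} for counts in count_vectors]
    }
    if seed is not None:
        query["seed"] = seed
    return query


# Two toy samples with three genes each; real vectors must match the
# model's expected gene count and gene order.
query = build_metadata_query([[10, 0, 5], [3, 8, 0]], seed=42)
print(json.dumps(query))
# {"inputs": [{"counts": [10, 0, 5]}, {"counts": [3, 8, 0]}], "seed": 42}
```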
Example: Predicting Sample Metadata
Here’s a complete example predicting metadata for expression samples:
Example: Single Sample Prediction
For predicting metadata of a single sample:
Query Parameters
inputs (list, required)
A list of expression count vectors. Each element should be a named list containing:
- counts: A vector of non-negative integers representing gene expression counts
seed (integer, optional)
Random seed for reproducibility.
Understanding the Results
The results from metadata prediction include several components:
Predicted Metadata
The metadata data frame contains the predicted values for each sample.
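Because the predicted metadata has one row of values per sample, it can be compared directly with labels you already have, which is the idea behind the quality-control use case. In this sketch each row is a Python dict keyed by field name, and rows are assumed to follow the same sample order as the query inputs; the field name `tissue`, the sample values, and `flag_label_mismatches` are hypothetical. A mismatch only marks a sample for closer inspection; it does not prove mislabeling.

```python
from typing import Dict, List


def flag_label_mismatches(
    predicted: List[Dict[str, str]],
    provided: List[Dict[str, str]],
    field: str,
) -> List[int]:
    """Return indices of samples whose provided label for `field` differs
    from the predicted one. Samples lacking the field on either side are skipped."""
    if len(predicted) != len(provided):
        raise ValueError("predicted and provided must describe the same samples")
    mismatches = []
    for i, (pred, given) in enumerate(zip(predicted, provided)):
        if field in pred and field in given and pred[field] != given[field]:
            mismatches.append(i)
    return mismatches


# Hypothetical rows: one dict per sample, in the same order as the query inputs.
predicted = [{"tissue": "liver"}, {"tissue": "lung"}, {"tissue": "kidney"}]
provided = [{"tissue": "liver"}, {"tissue": "liver"}, {"tissue": "kidney"}]
print(flag_label_mismatches(predicted, provided, "tissue"))  # [1]
```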
Classifier Probabilities
For categorical metadata fields, the model returns probability distributions over all possible values. These are useful for understanding prediction confidence:
Latent Representations
The model also returns latent vectors that capture biological, technical, and perturbation characteristics:
Use Cases
Sample Annotation
Annotate unlabeled samples with predicted metadata:
Quality Control
Validate existing sample labels against predicted metadata:
Important Notes
Counts Vector Length
The counts vector for each sample must match the model’s expected number of genes. If the length doesn’t match, the API will return a validation error. Use get_example_query() to see the expected structure.
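A client-side check before sending a query can report every sample with the wrong length at once, rather than surfacing one validation error at a time. This sketch assumes the inputs are held as dicts with a `counts` key and that you already know the expected gene count (presumably the length of the model's gene order); `samples_with_wrong_length` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def samples_with_wrong_length(
    inputs: Sequence[Dict[str, Sequence[int]]], expected_genes: int
) -> List[int]:
    """Return indices of inputs whose `counts` length differs from expected_genes."""
    return [
        i for i, item in enumerate(inputs) if len(item["counts"]) != expected_genes
    ]


# Toy example: a model expecting 3 genes is assumed here; real models
# expect far more.
inputs = [{"counts": [10, 0, 5]}, {"counts": [3, 8]}]
print(samples_with_wrong_length(inputs, expected_genes=3))  # [1]
```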
Gene Order
Ensure your counts are in the same gene order expected by the model. The gene order should match what the baseline model expects; you can retrieve this from any prediction result’s gene_order field.
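If your counts are keyed by gene identifier, they can be rearranged to match a gene order list before building the query. In this sketch, `align_to_gene_order` is a hypothetical helper and the gene names are placeholders. Filling genes that are absent from your data with 0 is an assumption, not documented API behavior, so the helper also returns the missing genes for you to review.

```python
from typing import Dict, List, Sequence, Tuple


def align_to_gene_order(
    sample_counts: Dict[str, int], gene_order: Sequence[str], fill: int = 0
) -> Tuple[List[int], List[str]]:
    """Arrange one sample's counts in the model's gene order.

    Genes in gene_order but absent from sample_counts are filled with `fill`
    (an assumption; 0 by default) and reported as missing. Genes not in
    gene_order are ignored.
    """
    aligned = [sample_counts.get(gene, fill) for gene in gene_order]
    missing = [gene for gene in gene_order if gene not in sample_counts]
    return aligned, missing


# Toy gene order; in practice, take it from a prediction result's gene_order field.
gene_order = ["GENE_A", "GENE_B", "GENE_C"]
sample = {"GENE_C": 5, "GENE_A": 10, "GENE_X": 7}
aligned, missing = align_to_gene_order(sample, gene_order)
print(aligned, missing)  # [10, 0, 5] ['GENE_B']
```

Because the output is built by iterating over the gene order, its length always equals the model's expected gene count, which also satisfies the length requirement above.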
Non-Negative Counts
All count values must be non-negative integers. Floats that are whole numbers (like 10.0) are accepted, but negative values will cause validation errors.
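Counts that arrive as floats can be checked and converted before building a query. The sketch below is a hypothetical client-side helper, `to_valid_counts`, that mirrors the rules stated above; it is not the API's own validator. Rejecting fractional values such as 2.5 is our reading of the "non-negative integers" requirement.

```python
from typing import List, Sequence, Union


def to_valid_counts(values: Sequence[Union[int, float]]) -> List[int]:
    """Convert a counts vector to ints, rejecting values the rules above disallow.

    Whole-number floats such as 10.0 become ints; negative values raise
    ValueError. Fractional values are also rejected (an assumption).
    Only plain Python int and float values are accepted.
    """
    result: List[int] = []
    for i, v in enumerate(values):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"position {i}: {v!r} is not a number")
        # is_integer() is False for fractional values, NaN, and infinities.
        if isinstance(v, float) and not v.is_integer():
            raise ValueError(f"position {i}: {v} is not a whole number")
        if v < 0:
            raise ValueError(f"position {i}: {v} is negative")
        result.append(int(v))
    return result


print(to_valid_counts([10.0, 0, 3]))  # [10, 0, 3]
```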