Overview
Reference conditioning models generate expression data conditioned on a real reference sample. This lets you anchor to an existing expression profile while applying perturbations or modifications. This is useful when you want to:
- Simulate the effect of a perturbation on a specific sample
- Generate expression profiles that preserve the biological and technical characteristics of a reference
- Create synthetic treated versus control pairs
Available models
- gem-1-bulk_reference-conditioning: Bulk RNA-seq reference conditioning model
- gem-1-sc_reference-conditioning: Single-cell RNA-seq reference conditioning model
These endpoints may require 1 to 2 minutes of startup time if they have been scaled down. Plan accordingly for interactive use.
How it works
Reference conditioning encodes the biological and technical characteristics from a real expression sample, then generates new expression data that:
- Preserves the biological and technical latent space of the reference
- Applies any perturbation metadata you specify
- Returns synthetic expression that reflects the perturbation effect on that specific sample
Creating a query
Reference conditioning queries require different inputs than baseline models:
- inputs: A list where each input contains:
  - counts: The reference expression counts
  - metadata: Perturbation-only metadata
  - num_samples: How many samples to generate
- conditioning: Which latent spaces to condition on, typically ["biological", "technical"]
- sampling_strategy: "mean estimation" or "sample generation"
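The fields listed above can be assembled into a plain Python dictionary. The sketch below is illustrative only: `build_query` is a hypothetical helper, not part of any client library, and the exact nesting of `counts` is an assumption, so compare the result against the output of get_example_query() before sending it.

```python
def build_query(counts, metadata, num_samples=1,
                conditioning=("biological", "technical"),
                sampling_strategy="mean estimation"):
    """Assemble a reference conditioning query as a plain dict (hypothetical helper)."""
    if sampling_strategy not in ("mean estimation", "sample generation"):
        raise ValueError(f"Unknown sampling_strategy: {sampling_strategy!r}")
    return {
        "inputs": [
            {
                # Assumption: counts is passed as a flat list of per-gene values.
                "counts": list(counts),
                "metadata": dict(metadata),
                "num_samples": num_samples,
            }
        ],
        "conditioning": list(conditioning),
        "sampling_strategy": sampling_strategy,
    }


# Toy reference vector; a real one must have the model's expected number of genes.
query = build_query([0, 12, 7, 300], {"perturbation_type": "control"}, num_samples=2)
```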
Perturbation-only metadata
Unlike baseline models, reference conditioning queries accept only perturbation metadata fields:
- perturbation_ontology_id
- perturbation_type
- perturbation_time
- perturbation_dose
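To illustrate the restriction, the sketch below builds perturbation-only metadata for two scenarios, a compound treatment and a CRISPR knockout, and rejects any other field locally. `perturbation_only` is a hypothetical helper, the ontology IDs are illustrative, and the exact ID string format the API expects is an assumption.

```python
ALLOWED_FIELDS = {
    "perturbation_ontology_id",
    "perturbation_type",
    "perturbation_time",
    "perturbation_dose",
}


def perturbation_only(metadata):
    """Return a copy of metadata, raising if it contains non-perturbation fields."""
    extra = sorted(set(metadata) - ALLOWED_FIELDS)
    if extra:
        raise ValueError(f"Not perturbation metadata fields: {extra}")
    return dict(metadata)


# Compound treatment: illustrative ChEBI-style ID, dose and time as "number unit".
drug = perturbation_only({
    "perturbation_ontology_id": "CHEBI:45783",
    "perturbation_type": "compound",
    "perturbation_time": "24 hours",
    "perturbation_dose": "10 um",
})

# CRISPR knockout: Ensembl gene ID for TP53, no dose or time supplied.
knockout = perturbation_only({
    "perturbation_ontology_id": "ENSG00000141510",
    "perturbation_type": "crispr",
})
```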
Example: simulating a drug treatment
A complete example simulating a drug treatment effect on a reference sample:
Example: CRISPR knockout simulation
Simulate the effect of knocking out a specific gene:
Query parameters
conditioning (list, optional)
Controls which latent spaces are conditioned on the reference. Default is ["biological", "technical"].
When both are conditioned, the model preserves both biological identity and technical characteristics from the reference sample.
sampling_strategy (str, required)
Controls the type of prediction:
- "sample generation": Generates realistic-looking synthetic data with measurement error. Bulk only.
- "mean estimation": Provides stable mean estimates. Bulk and single-cell.
fixed_total_count (bool, optional)
Controls whether to preserve the reference’s library size:
- False (default): The output’s total count is taken from the reference expression sum
- True: Forces the model to use the total_count parameter value, or its default, instead of the reference’s library size
total_count (int, optional)
Library size used when converting predicted log CPM back to raw counts. Only effective when fixed_total_count = True.
- Default: 10,000,000 for bulk; 10,000 for single-cell
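The interaction between fixed_total_count and total_count can be summarized in a few lines. This is a sketch of the rule as described above, not the service's implementation: both function names and the modality labels are hypothetical, and because the log transform used for log CPM is not specified here, the second helper starts from plain CPM values.

```python
DEFAULT_TOTAL_COUNT = {"bulk": 10_000_000, "single-cell": 10_000}


def effective_total_count(reference_counts, modality,
                          fixed_total_count=False, total_count=None):
    """Library size that applies to the output, following the documented rule."""
    if not fixed_total_count:
        # Default behaviour: reuse the reference sample's library size.
        return sum(reference_counts)
    # fixed_total_count=True: use total_count, or the per-modality default.
    return total_count if total_count is not None else DEFAULT_TOTAL_COUNT[modality]


def cpm_to_counts(cpm, library_size):
    """Scale counts-per-million values to a given library size."""
    return [value * library_size / 1_000_000 for value in cpm]


reference = [1, 2, 3]
size_from_reference = effective_total_count(reference, "bulk")
size_fixed_default = effective_total_count(reference, "bulk", fixed_total_count=True)
```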
deterministic_latents (bool, optional)
If True, the model uses the mean of each latent distribution instead of sampling. This produces deterministic, reproducible outputs.
- Default: False
seed (int, optional)
Random seed for reproducibility.
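As a rough summary of the parameters in this section, the sketch below fills in the documented defaults and checks that the chosen sampling strategy is available for the model type. `finalize_query` and the modality labels are hypothetical, and whether the API accepts defaults being sent explicitly is an assumption. No default is documented for seed, so it is added only when you supply one.

```python
OPTIONAL_DEFAULTS = {
    "conditioning": ["biological", "technical"],
    "fixed_total_count": False,
    "deterministic_latents": False,
}

# "sample generation" is documented as bulk only.
STRATEGIES = {
    "bulk": {"sample generation", "mean estimation"},
    "single-cell": {"mean estimation"},
}


def finalize_query(query, modality, seed=None):
    """Return a copy of query with documented defaults applied (hypothetical helper)."""
    strategy = query.get("sampling_strategy")
    if strategy is None:
        raise ValueError("sampling_strategy is required")
    if strategy not in STRATEGIES[modality]:
        raise ValueError(f"{strategy!r} is not available for {modality} models")
    out = {**OPTIONAL_DEFAULTS, **query}
    out["conditioning"] = list(out["conditioning"])  # avoid sharing the default list
    if seed is not None:
        out["seed"] = seed
    return out


final = finalize_query({"inputs": [], "sampling_strategy": "mean estimation"},
                       "single-cell", seed=42)
```

Setting deterministic_latents to True, or supplying a seed, are the two documented routes to reproducible output.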
Valid perturbation metadata
| Field | Description / format |
|---|---|
| perturbation_ontology_id | Ensembl gene ID, ChEBI ID, ChEMBL ID, or NCBI Taxonomy ID |
| perturbation_type | One of "coculture", "compound", "control", "crispr", "genetic", "infection", "other", "overexpression", "peptide or biologic", "shrna", "sirna" |
| perturbation_time | Time since perturbation as a number and unit separated by a space, such as "24 hours" |
| perturbation_dose | Dose as a number and unit separated by a space, such as "10 um" or "1 mg/kg" |
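The formats in the table can be checked locally before a query is sent. The sketch below is an assumption-laden pre-check, not the API's own validation: `metadata_problems` is a hypothetical helper, and the regular expression encodes one reading of "number and unit separated by a space" that may be stricter or looser than what the service accepts.

```python
import re

PERTURBATION_TYPES = {
    "coculture", "compound", "control", "crispr", "genetic", "infection",
    "other", "overexpression", "peptide or biologic", "shrna", "sirna",
}

# Assumption: a non-negative number, one space, then a unit without spaces.
NUMBER_UNIT = re.compile(r"^\d+(\.\d+)? \S+$")


def metadata_problems(metadata):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    ptype = metadata.get("perturbation_type")
    if ptype is not None and ptype not in PERTURBATION_TYPES:
        problems.append(f"perturbation_type {ptype!r} is not a listed value")
    for field in ("perturbation_time", "perturbation_dose"):
        value = metadata.get(field)
        if value is None:
            continue
        if not (isinstance(value, str) and NUMBER_UNIT.match(value)):
            problems.append(f"{field} {value!r} is not '<number> <unit>'")
    return problems


ok = metadata_problems({
    "perturbation_type": "compound",
    "perturbation_time": "24 hours",
    "perturbation_dose": "1 mg/kg",
})
bad = metadata_problems({"perturbation_type": "knockout", "perturbation_dose": "10um"})
```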
Working with results
The result structure is similar to baseline models:
Differential expression
When conditioning on both biological and technical latents, you can directly compare the generated expression to your reference to identify perturbation effects:
Important notes
Counts vector length
The reference counts vector must match the model’s expected number of genes. If the length does not match, the API returns a validation error. Use get_example_query() to see the expected structure and ensure your counts vector has the correct length.
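A local check can surface a length mismatch before the request is sent. This minimal sketch assumes you have already read the expected number of genes from an example query; `check_counts_length` is a hypothetical helper and the gene count used here is a toy value.

```python
def check_counts_length(counts, expected_genes):
    """Raise early if the reference vector does not have the expected length."""
    if len(counts) != expected_genes:
        raise ValueError(
            f"counts has {len(counts)} values, expected {expected_genes}"
        )
    return counts


# Toy example with an expected length of 4 genes.
checked = check_counts_length([0, 12, 7, 300], expected_genes=4)
```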
Gene order
Ensure your reference counts are in the gene order the model expects. The response includes a gene_order field that specifies the expected order.
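Once reference and generated counts share one gene order, the comparison described under Differential expression reduces to an element-wise calculation. The sketch below reorders a gene-keyed mapping and computes a per-gene log2 fold change. Both helpers are hypothetical, the gene names are toy values, filling missing genes with zero is an assumption you may not want, and a fold change on its own is not a statistical test.

```python
import math


def align_to_gene_order(counts_by_gene, gene_order, fill=0):
    """Reorder a {gene: count} mapping into the expected gene order."""
    # Assumption: genes absent from the mapping are filled with `fill`.
    return [counts_by_gene.get(gene, fill) for gene in gene_order]


def log2_fold_change(generated, reference, pseudocount=1.0):
    """Per-gene log2((generated + p) / (reference + p)) for equal-length vectors."""
    if len(generated) != len(reference):
        raise ValueError("generated and reference must have the same length")
    return [
        math.log2((g + pseudocount) / (r + pseudocount))
        for g, r in zip(generated, reference)
    ]


gene_order = ["GENE_A", "GENE_B", "GENE_C"]
reference = align_to_gene_order({"GENE_B": 1, "GENE_A": 1}, gene_order)
generated = [3, 1, 0]  # toy generated counts in the same gene order
lfc = log2_fold_change(generated, reference)
```

With fixed_total_count left at False the output takes its library size from the reference, so raw counts are directly comparable. If you fix a different total count, normalize both vectors first.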