Getting started - Synthesize Bio

rsynthbio is an R package that provides a convenient interface to the Synthesize Bio API, allowing users to generate realistic gene expression data based on specified biological conditions. This package enables researchers to easily access AI-generated transcriptomic data for various modalities including bulk RNA-seq and single-cell RNA-seq. To generate datasets without code, use our web platform.

Authentication

Before using the Synthesize Bio API, you need to set up your API token. The package provides a secure way to handle authentication:

# Securely prompt for and store your API token
# The token will not be visible in the console
set_synthesize_token()

# You can also store the token in your system keyring for persistence
# across R sessions (requires the 'keyring' package)
set_synthesize_token(use_keyring = TRUE)

To load a stored API token in a later session:

# In future sessions, load the stored token
load_synthesize_token_from_keyring()

# Check if a token is already set
has_synthesize_token()
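The two calls above can be combined into a check that runs at the start of a session. The sketch below wraps them in a hypothetical helper, ensure_synthesize_token(), which is not part of rsynthbio. It assumes that has_synthesize_token() returns a single TRUE or FALSE and that a token was previously stored with use_keyring = TRUE.

```r
# Hypothetical helper (not part of rsynthbio): load the token from the
# keyring only when none is set in the current session.
ensure_synthesize_token <- function() {
  if (!isTRUE(rsynthbio::has_synthesize_token())) {
    rsynthbio::load_synthesize_token_from_keyring()
  }
  # Report whether a token is available after the attempt
  invisible(isTRUE(rsynthbio::has_synthesize_token()))
}
```

Calling ensure_synthesize_token() at the top of a script avoids reloading the token when one is already set.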

You can manually set the token, but don’t commit it to version control!

set_synthesize_token(token = "your-token-here")
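One way to keep the token out of your scripts is to read it from an environment variable, for example one defined in your user-level .Renviron file. The sketch below is an illustration under assumptions: the helper use_token_from_env() and the variable name MY_SYNTHESIZE_TOKEN are hypothetical and are not defined by the package.

```r
# Hypothetical helper (not part of rsynthbio). MY_SYNTHESIZE_TOKEN is an
# illustrative variable name, e.g. set in ~/.Renviron as:
#   MY_SYNTHESIZE_TOKEN=your-token-here
use_token_from_env <- function(var = "MY_SYNTHESIZE_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop(var, " is not set; add it to .Renviron or call set_synthesize_token().")
  }
  # Pass the value explicitly so the token never appears in the script itself
  rsynthbio::set_synthesize_token(token = token)
  invisible(TRUE)
}
```

With this in place, a script calls use_token_from_env() and the token value stays out of the file; remember to keep .Renviron out of version control as well.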

You can obtain an API token by registering at Synthesize Bio.

Available Model Types

Synthesize Bio provides several types of models for different use cases:

Baseline Models

Generate synthetic gene expression data from metadata alone. You describe the biological conditions (tissue type, disease state, perturbations, etc.) and the model generates realistic expression profiles.

gem-1-bulk: Bulk RNA-seq baseline model
gem-1-sc: Single-cell RNA-seq baseline model

See the Baseline Models vignette for detailed usage.

Reference Conditioning Models

Generate expression data conditioned on a real reference sample. This allows you to “anchor” to an existing expression profile while applying perturbations or modifications.

gem-1-bulk_reference-conditioning: Bulk RNA-seq reference conditioning model
gem-1-sc_reference-conditioning: Single-cell RNA-seq reference conditioning model

See the Reference Conditioning vignette for detailed usage.

Metadata Prediction Models

Infer metadata from observed expression data. Given a gene expression profile, predict the likely biological characteristics (cell type, tissue, disease state, etc.).

gem-1-bulk_predict-metadata: Bulk RNA-seq metadata prediction model
gem-1-sc_predict-metadata: Single-cell RNA-seq metadata prediction model

See the Metadata Prediction vignette for detailed usage.

Only baseline models are available to all users. To check which models are available to you, call list_models(). Contact us at support@synthesize.bio if you have any questions.

Listing Available Models

You can check which models are available programmatically:

# Check available models
list_models()

Exploring Available Metadata

Each model accepts a specific set of metadata fields with defined vocabularies (valid ontology IDs, cell lines, tissues, etc.). You can browse and download these vocabularies at app.synthesize.bio/docs/vocab. See the Available Metadata vignette for more details.

Quick Start

Here’s a quick example using a baseline model:

# Get an example query structure
query <- get_example_query(model_id = "gem-1-bulk")$example_query

# Submit the query and get results
result <- predict_query(query, model_id = "gem-1-bulk")

# Access the results
metadata <- result$metadata
expression <- result$expression
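Before using the output, it can help to check that the two pieces of the result line up. The sketch below defines a hypothetical helper, summarize_result(), which is not part of rsynthbio. It assumes that metadata and expression are data frames or matrices with one row per sample and that expression has one column per gene; verify this against your own result, for example with str(result).

```r
# Hypothetical helper (not part of rsynthbio): summarize a result list
# that has `metadata` and `expression` elements.
# Assumes one row per sample in both, and one column per gene in `expression`.
summarize_result <- function(result) {
  stopifnot(all(c("metadata", "expression") %in% names(result)))
  list(
    n_samples = nrow(result$expression),
    n_genes = ncol(result$expression),
    metadata_fields = colnames(result$metadata),
    # TRUE when every expression row has a matching metadata row
    rows_match = nrow(result$metadata) == nrow(result$expression)
  )
}
```

Calling summarize_result(result) on the object returned by predict_query() gives a quick view of how many samples and genes were generated and which metadata fields came back.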

For more detailed examples and advanced usage, see the model-specific vignettes linked above.
