rsynthbio is an R package that provides a convenient interface to the Synthesize Bio API, allowing users to generate realistic gene expression data based on specified biological conditions. This package enables researchers to easily access AI-generated transcriptomic data for various modalities including bulk RNA-seq and single-cell RNA-seq. To generate datasets without code, use our web platform.

Authentication

Before using the Synthesize Bio API, you need to set up your API token. The package provides a secure way to handle authentication:
# Securely prompt for and store your API token
# The token will not be visible in the console
set_synthesize_token()

# You can also store the token in your system keyring for persistence
# across R sessions (requires the 'keyring' package)
set_synthesize_token(use_keyring = TRUE)
To load a stored token in a later session:
# In future sessions, load the stored token
load_synthesize_token_from_keyring()

# Check if a token is already set
has_synthesize_token()
You can also pass the token directly, but never commit it to version control:
set_synthesize_token(token = "your-token-here")
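One way to keep the token out of scripts is to read it from an environment variable and pass it to set_synthesize_token(token = ...). The following is a minimal sketch under stated assumptions: the helper read_token_from_env() and the variable name MY_SYNTHESIZE_TOKEN are hypothetical and not part of rsynthbio.

```r
# Hypothetical helper, not part of rsynthbio: reads a token from an
# environment variable and fails loudly if it is missing.
# MY_SYNTHESIZE_TOKEN is an assumed variable name chosen for this sketch.
read_token_from_env <- function(var = "MY_SYNTHESIZE_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is empty or not set.", call. = FALSE)
  }
  token
}

# Register the token only if the package is installed and the variable is set
if (requireNamespace("rsynthbio", quietly = TRUE) &&
    nzchar(Sys.getenv("MY_SYNTHESIZE_TOKEN"))) {
  rsynthbio::set_synthesize_token(token = read_token_from_env())
}
```

With this approach the script contains no secret, so it can be committed safely, while the variable itself is defined outside version control (for example in a user-level .Renviron file).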
You can obtain an API token by registering at Synthesize Bio.

Available Model Types

Synthesize Bio provides three types of models for different use cases:

Baseline Models

Generate synthetic gene expression data from metadata alone. You describe the biological conditions (tissue type, disease state, perturbations, etc.) and the model generates realistic expression profiles.
  • gem-1-bulk: Bulk RNA-seq baseline model
  • gem-1-sc: Single-cell RNA-seq baseline model
See the Baseline Models vignette for detailed usage.

Reference Conditioning Models

Generate expression data conditioned on a real reference sample. This allows you to “anchor” to an existing expression profile while applying perturbations or modifications.
  • gem-1-bulk_reference-conditioning: Bulk RNA-seq reference conditioning model
  • gem-1-sc_reference-conditioning: Single-cell RNA-seq reference conditioning model
See the Reference Conditioning vignette for detailed usage.

Metadata Prediction Models

Infer metadata from observed expression data. Given a gene expression profile, predict the likely biological characteristics (cell type, tissue, disease state, etc.).
  • gem-1-bulk_predict-metadata: Bulk RNA-seq metadata prediction model
  • gem-1-sc_predict-metadata: Single-cell RNA-seq metadata prediction model
See the Metadata Prediction vignette for detailed usage.

Only baseline models are available to all users. To check which models are available programmatically, call list_models(). Contact us at support@synthesize.bio if you have any questions.

Listing Available Models

You can check which models are available programmatically:
# Check available models
list_models()
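The six model IDs listed in the previous section follow a visible naming pattern: a gem-1-bulk or gem-1-sc base, optionally followed by an underscore and a task suffix. As an illustration only, the hypothetical helper below (not part of rsynthbio) builds an ID from that pattern. It assumes the pattern holds exactly as listed and does not check whether a model is available to your account.

```r
# Hypothetical helper, not part of rsynthbio: builds a model ID from the
# naming pattern of the IDs listed in this document.
build_model_id <- function(modality = c("bulk", "sc"),
                           task = c("baseline",
                                    "reference-conditioning",
                                    "predict-metadata")) {
  modality <- match.arg(modality)  # rejects anything other than bulk/sc
  task <- match.arg(task)          # rejects unknown task names
  base <- paste0("gem-1-", modality)
  if (task == "baseline") base else paste0(base, "_", task)
}

build_model_id("sc", "predict-metadata")
# [1] "gem-1-sc_predict-metadata"
```

Using match.arg() means a typo in the modality or task raises an error locally instead of producing an ID that the API would reject.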

Exploring Available Metadata

Each model accepts a specific set of metadata fields with defined vocabularies (valid ontology IDs, cell lines, tissues, etc.). You can browse and download these vocabularies at app.synthesize.bio/docs/vocab. See the Available Metadata vignette for more details.

Quick Start

Here’s a quick example using a baseline model:
# Get an example query structure
query <- get_example_query(model_id = "gem-1-bulk")$example_query

# Submit the query and get results
result <- predict_query(query, model_id = "gem-1-bulk")

# Access the results
metadata <- result$metadata
expression <- result$expression
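Because predict_query() depends on a valid token and a network request, a script may want to handle failures instead of stopping. The sketch below shows one possible pattern using base R's tryCatch(). The wrapper safe_predict() and its predict_fn argument are hypothetical and not part of rsynthbio, and the stub only stands in for a failing request so the example runs offline.

```r
# Hypothetical wrapper, not part of rsynthbio. `predict_fn` defaults to
# rsynthbio::predict_query and can be replaced by a stub for offline testing.
safe_predict <- function(query, model_id,
                         predict_fn = rsynthbio::predict_query) {
  tryCatch(
    predict_fn(query, model_id = model_id),
    error = function(e) {
      # Report the failure and return NULL so the caller can decide what to do
      message("Request for model '", model_id, "' failed: ",
              conditionMessage(e))
      NULL
    }
  )
}

# Offline demonstration with a stub that always fails
failing_stub <- function(query, model_id) stop("no token set")
result <- safe_predict(list(), "gem-1-bulk", predict_fn = failing_stub)
is.null(result)
# [1] TRUE
```

In real use you would call safe_predict(query, model_id = "gem-1-bulk") and check for NULL before accessing result$metadata or result$expression.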
For more detailed examples and advanced usage, see the model-specific vignettes linked above.