Sends a query to the Synthesize Bio API for prediction, retrieves the predicted gene expression samples, and processes the response into usable data frames.

Usage

predict_query(
  query,
  model_id,
  api_base_url = NULL,
  poll_interval_seconds = DEFAULT_POLL_INTERVAL_SECONDS,
  poll_timeout_seconds = DEFAULT_POLL_TIMEOUT_SECONDS,
  return_download_url = FALSE,
  raw_response = FALSE,
  self_hosted = NULL,
  ...
)

Arguments

  • query: A list representing the query data to send to the API. Use get_example_query() to generate an example. The query supports these additional optional fields:
      • total_count (integer): Library size used when converting predicted log CPM back to raw counts. Higher values scale counts up proportionally.
      • deterministic_latents (logical): If TRUE, the model uses the mean of each latent distribution instead of sampling, producing deterministic outputs for the same inputs. Useful for reproducibility.
      • seed (integer): Random seed for reproducibility.
  • model_id: Character string specifying the model ID (e.g., "gem-1-bulk", "gem-1-sc"). Use list_models() to see available models.
  • api_base_url: The base URL for the API server. When NULL (default), it is resolved in order from the per-model environment variable SYNTHESIZE_API_BASE_URL__<MODEL> (e.g. SYNTHESIZE_API_BASE_URL__GEM_1_BULK), then the global SYNTHESIZE_API_BASE_URL, then the production default (API_BASE_URL). The per-model variable lets you point each self-hosted model at its own container once and omit api_base_url on every call.
  • poll_interval_seconds: Seconds between polling attempts of the status endpoint. Default is DEFAULT_POLL_INTERVAL_SECONDS (2).
  • poll_timeout_seconds: Maximum total seconds to wait before timing out. Default is DEFAULT_POLL_TIMEOUT_SECONDS (900 = 15 minutes).
  • return_download_url: Logical, if TRUE, returns a list containing the signed download URL instead of parsing into data frames. Default is FALSE.
  • raw_response: Logical, if TRUE, returns the raw (unformatted) response from the API without applying any output transformers. For the production path this is the parsed JSON; for self_hosted = TRUE it is the parsed Arrow Table together with its schema metadata. Default is FALSE.
  • self_hosted: Logical, if TRUE, sends a single synchronous request to a self-hosted model container that returns predictions as an Apache Arrow IPC stream (no polling, no download URL). Requires the optional arrow package and an api_base_url pointing at the container. Unlike the production path, no API key is required (one is only sent if configured). When NULL (default), it is resolved from the SYNTHESIZE_SELF_HOSTED environment variable (truthy for 1/true/yes/on), defaulting to FALSE.
  • ...: Additional parameters to include in the query body. These are passed directly to the API and validated server-side.
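
The two polling arguments bound how long the production path waits for a prediction. The pattern can be sketched as below. This is a minimal illustration, not the package's implementation: poll_until_ready(), make_fake_status(), and the status strings "ready", "running", and "failed" are all hypothetical stand-ins for the real status endpoint.

```r
# Hypothetical helper: call `check_status` every `poll_interval_seconds`
# until it reports "ready", or stop once `poll_timeout_seconds` has elapsed.
# `check_status` stands in for a request to the status endpoint.
poll_until_ready <- function(check_status,
                             poll_interval_seconds = 2,
                             poll_timeout_seconds = 900) {
  start <- Sys.time()
  repeat {
    status <- check_status()
    if (identical(status, "ready")) {
      return(invisible(TRUE))
    }
    if (identical(status, "failed")) {
      stop("Prediction job failed.")
    }
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed >= poll_timeout_seconds) {
      stop("Timed out after ", poll_timeout_seconds, " seconds.")
    }
    Sys.sleep(poll_interval_seconds)
  }
}

# Fake status source (assumption for the demo): "running" until the
# `ready_after`-th check, then "ready".
make_fake_status <- function(ready_after = 3) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls >= ready_after) "ready" else "running"
  }
}

# Short interval and timeout so the demo finishes quickly.
poll_until_ready(make_fake_status(3),
                 poll_interval_seconds = 0.01,
                 poll_timeout_seconds = 1)
```

A shorter interval detects completion sooner at the cost of more status requests; the timeout caps the total wait regardless of the interval.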

Returns

A list. For the production path, if return_download_url is FALSE (default) the list contains metadata and expression data frames; if TRUE it contains download_url and empty data frames. For self_hosted = TRUE, the list contains the transformed data frames (metadata, expression, and latents; plus classifier_probs for metadata-prediction models) with model_version and request_type attached as attributes.
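
Because the shape of the returned list depends on the arguments, calling code may want to check what it received before using it. The sketch below works on mock objects built by hand, with made-up values; describe_result() is a hypothetical helper, and placing model_version and request_type as attributes on the list itself is an assumption about the description above.

```r
# Mock of a self-hosted result: data frames plus attributes (values invented).
result_mock <- list(
  metadata   = data.frame(sample_id = c("s1", "s2")),
  expression = data.frame(GENE_A = c(10, 0), GENE_B = c(3, 7)),
  latents    = data.frame(z1 = c(0.1, -0.2))
)
attr(result_mock, "model_version") <- "0.0.0-example"
attr(result_mock, "request_type") <- "example"

# Mock of a production result requested with return_download_url = TRUE:
# a download_url and empty data frames.
url_mock <- list(
  download_url = "https://download.example/signed",
  metadata     = data.frame(),
  expression   = data.frame()
)

# Hypothetical helper: summarise whichever shape was returned.
# [[ ]] is used instead of $ to avoid partial name matching on lists.
describe_result <- function(result) {
  if (!is.null(result[["download_url"]])) {
    return(paste("download URL:", result[["download_url"]]))
  }
  sprintf("%d samples x %d genes",
          nrow(result[["expression"]]), ncol(result[["expression"]]))
}

describe_result(result_mock)
describe_result(url_mock)
attr(result_mock, "model_version")
```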

Examples

# To use rsynthbio, you first need an account with synthesize.bio.
# Create one at https://app.synthesize.bio/

# Set your API key (in practice, use a more secure method)
set_synthesize_token()

# Get available models
models <- list_models()

# Create a query for a specific model
query <- get_example_query(model_id = "gem-1-bulk")$example_query

# Request raw counts
result <- predict_query(query, model_id = "gem-1-bulk")

# Access the results
metadata <- result$metadata
expression <- result$expression

# Explore the top expressed genes in the first sample
# (unlist() turns the one-row data frame into a named vector that sort() can order)
head(sort(unlist(expression[1, ]), decreasing = TRUE))

# Use deterministic latents for reproducible results
query$deterministic_latents <- TRUE
result_det <- predict_query(query, model_id = "gem-1-bulk")

# Specify a custom total count (library size)
query$total_count <- 5000000
result_custom <- predict_query(query, model_id = "gem-1-bulk")

# Self-hosted container returning a synchronous Apache Arrow IPC stream
result_sh <- predict_query(
  query,
  model_id = "gem-1-bulk",
  api_base_url = "https://gem-1-bulk.internal.partner.example",
  self_hosted = TRUE
)
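
Instead of passing api_base_url and self_hosted on every call, the environment variables described under Arguments can be set once per session. The sketch below mirrors the documented resolution order so the precedence is easy to see. It is not the package's code: resolve_api_base_url(), resolve_self_hosted(), and model_env_suffix() are hypothetical helpers, the default_url value is a placeholder, and the rule for turning a model ID into an environment variable suffix is inferred from the single documented example (gem-1-bulk becomes GEM_1_BULK).

```r
# Assumed mapping from model ID to env var suffix: uppercase, with
# non-alphanumeric runs replaced by "_" ("gem-1-bulk" -> "GEM_1_BULK").
model_env_suffix <- function(model_id) {
  toupper(gsub("[^A-Za-z0-9]+", "_", model_id))
}

# Explicit argument, then per-model variable, then global variable,
# then a default (placeholder URL here, not the real production URL).
resolve_api_base_url <- function(model_id, api_base_url = NULL,
                                 default_url = "https://production.example") {
  if (!is.null(api_base_url)) return(api_base_url)
  per_model <- Sys.getenv(
    paste0("SYNTHESIZE_API_BASE_URL__", model_env_suffix(model_id)),
    unset = ""
  )
  if (nzchar(per_model)) return(per_model)
  global <- Sys.getenv("SYNTHESIZE_API_BASE_URL", unset = "")
  if (nzchar(global)) return(global)
  default_url
}

# Explicit argument wins; otherwise 1/true/yes/on (case-insensitive in this
# sketch) counts as TRUE and anything else, including unset, as FALSE.
resolve_self_hosted <- function(self_hosted = NULL) {
  if (!is.null(self_hosted)) return(isTRUE(self_hosted))
  value <- tolower(Sys.getenv("SYNTHESIZE_SELF_HOSTED", unset = ""))
  value %in% c("1", "true", "yes", "on")
}

# Configure the session once.
Sys.setenv(
  SYNTHESIZE_API_BASE_URL__GEM_1_BULK = "https://gem-1-bulk.internal.partner.example",
  SYNTHESIZE_SELF_HOSTED = "true"
)

resolve_api_base_url("gem-1-bulk")
resolve_self_hosted()
```

With these variables set, the Arguments section indicates that predict_query(query, model_id = "gem-1-bulk") should reach the same container without the two extra arguments.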

Source

Generated from R/call_model_api.R and the package help files in man/.