# predict_query

> Predict Gene Expression.

Sends a query to the Synthesize Bio API for prediction, retrieves the
predicted gene expression samples, and processes the response into
data frames.

## Usage

```r theme={null}
predict_query(
  query,
  model_id,
  api_base_url = NULL,
  poll_interval_seconds = DEFAULT_POLL_INTERVAL_SECONDS,
  poll_timeout_seconds = DEFAULT_POLL_TIMEOUT_SECONDS,
  return_download_url = FALSE,
  raw_response = FALSE,
  self_hosted = NULL,
  ...
)
```
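
The two polling arguments in the signature suggest a submit-then-poll pattern on the production path. The following is a minimal sketch of such a loop, not the package's implementation: `poll_until_ready()`, `check_status()`, and the status strings `"running"`, `"ready"`, and `"failed"` are all hypothetical names chosen for illustration.

```r
# Hypothetical helper, not part of rsynthbio: call `check_status()` until it
# reports completion, reports failure, or the timeout elapses.
poll_until_ready <- function(check_status,
                             poll_interval_seconds = 2,
                             poll_timeout_seconds = 900) {
  started <- Sys.time()
  repeat {
    status <- check_status()
    if (identical(status, "ready")) {
      return(invisible(TRUE))
    }
    if (identical(status, "failed")) {
      stop("Prediction failed.", call. = FALSE)
    }
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    if (elapsed >= poll_timeout_seconds) {
      stop("Timed out waiting for the prediction.", call. = FALSE)
    }
    Sys.sleep(poll_interval_seconds)
  }
}

# Stand-in status function that becomes ready on the third call.
calls <- 0
fake_status <- function() {
  calls <<- calls + 1
  if (calls < 3) "running" else "ready"
}
ready <- poll_until_ready(fake_status, poll_interval_seconds = 0)
```

A shorter interval returns results sooner at the cost of more status requests; the timeout bounds the total wait regardless of the interval.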

## Arguments

* **`query`**: A list representing the query data to send to the API.
  Use `get_example_query()` to generate an example. The query supports additional
  optional fields:
  * `total_count` (integer): Library size used when converting predicted log CPM
    back to raw counts. Higher values scale counts up proportionally.
  * `deterministic_latents` (logical): If `TRUE`, the model uses the mean of each
    latent distribution instead of sampling, producing deterministic outputs for
    the same inputs. Useful for reproducibility.
  * `seed` (integer): Random seed for reproducibility.
* **`model_id`**: Character string specifying the model ID (e.g., `"gem-1-bulk"`, `"gem-1-sc"`).
  Use `list_models()` to see available models.
* **`api_base_url`**: The base URL for the API server. When `NULL` (default), it
  is resolved in order from the per-model environment variable
  `SYNTHESIZE_API_BASE_URL__<MODEL>` (e.g.
  `SYNTHESIZE_API_BASE_URL__GEM_1_BULK`), then the global
  `SYNTHESIZE_API_BASE_URL`, then the production default (`API_BASE_URL`).
  The per-model variable lets you point each self-hosted model at its own
  container once and omit `api_base_url` on every call.
* **`poll_interval_seconds`**: Seconds between polling attempts of the status endpoint.
  Default is `DEFAULT_POLL_INTERVAL_SECONDS` (2).
* **`poll_timeout_seconds`**: Maximum total seconds to wait before timing out.
  Default is `DEFAULT_POLL_TIMEOUT_SECONDS` (900 seconds, i.e. 15 minutes).
* **`return_download_url`**: Logical. If `TRUE`, returns a list containing the signed
  download URL instead of parsing the results into data frames. Default is `FALSE`.
* **`raw_response`**: Logical. If `TRUE`, returns the raw (unformatted) response
  from the API without applying any output transformers. For the
  production path this is the parsed JSON; for `self_hosted = TRUE` it is
  the parsed Arrow `Table` together with its schema metadata. Default is `FALSE`.
* **`self_hosted`**: Logical. If `TRUE`, sends a single synchronous request to a
  self-hosted model container that returns predictions as an Apache Arrow
  IPC stream (no polling, no download URL). Requires the optional `arrow`
  package and an `api_base_url` pointing at the container. Unlike the
  production path, no API key is required (one is only sent if configured).
  When `NULL` (default), it is resolved from the `SYNTHESIZE_SELF_HOSTED`
  environment variable (truthy for 1/true/yes/on), defaulting to `FALSE`.
* **`...`**: Additional parameters to include in the query body. These are passed
  directly to the API and validated server-side.
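
The lookup order described for `api_base_url` and `self_hosted` can be sketched as below. This is an illustration under stated assumptions, not the package's code: `resolve_base_url()` and `resolve_self_hosted()` are hypothetical helpers, the placeholder `default_url` stands in for the production default, and both the rule for deriving the per-model suffix (upper-case, non-alphanumerics replaced by `_`) and the case-insensitive matching of truthy values are guesses consistent with the examples given above.

```r
# Hypothetical helpers illustrating the documented lookup order; the real
# implementation in rsynthbio may differ.
resolve_base_url <- function(model_id, api_base_url = NULL,
                             default_url = "https://example.invalid") {
  # An explicit argument always wins.
  if (!is.null(api_base_url)) {
    return(api_base_url)
  }
  # Assumed naming rule: upper-case the model ID and replace anything that is
  # not a letter or digit with "_", so "gem-1-bulk" becomes "GEM_1_BULK".
  suffix <- toupper(gsub("[^A-Za-z0-9]", "_", model_id))
  candidates <- c(
    Sys.getenv(paste0("SYNTHESIZE_API_BASE_URL__", suffix), unset = ""),
    Sys.getenv("SYNTHESIZE_API_BASE_URL", unset = "")
  )
  # Per-model variable first, then the global one, then the default.
  found <- candidates[nzchar(candidates)]
  if (length(found) > 0) found[[1]] else default_url
}

resolve_self_hosted <- function(self_hosted = NULL) {
  if (!is.null(self_hosted)) {
    return(isTRUE(self_hosted))
  }
  # Assumption: the comparison ignores case and surrounding whitespace.
  value <- tolower(trimws(Sys.getenv("SYNTHESIZE_SELF_HOSTED", unset = "")))
  value %in% c("1", "true", "yes", "on")
}

# An explicit argument bypasses the environment entirely.
resolve_base_url("gem-1-bulk", api_base_url = "http://localhost:8080")
```

Setting the per-model variable once (for example in `.Renviron`) means later calls can omit `api_base_url` for that model.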

## Returns

A list. For the production path, if `return_download_url` is `FALSE`
(default) the list contains `metadata` and `expression` data frames;
if `TRUE` it contains `download_url` and empty data frames. For
`self_hosted = TRUE`, the list contains the transformed data frames
(`metadata`, `expression`, and `latents`; plus `classifier_probs` for
metadata-prediction models) with `model_version` and `request_type`
attached as attributes.
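
Because the shape of the returned list depends on the arguments, calling code may want to branch on it. The sketch below uses a hypothetical helper, `describe_result()`, and a mock list in place of a real response. It assumes that rows of `expression` are samples and columns are genes, and that `model_version` is an attribute of the list itself rather than of the data frames inside it.

```r
# Hypothetical helper: summarise whichever shape predict_query() returned.
describe_result <- function(result) {
  # return_download_url = TRUE: report the URL instead of the data frames.
  if (!is.null(result$download_url)) {
    return(paste("Download URL:", result$download_url))
  }
  summary_text <- sprintf(
    "%d samples x %d genes",
    nrow(result$expression), ncol(result$expression)
  )
  # Self-hosted results carry extra attributes (assumed to sit on the list).
  version <- attr(result, "model_version")
  if (!is.null(version)) {
    summary_text <- paste0(summary_text, " (model version ", version, ")")
  }
  summary_text
}

# Mock result standing in for a real API response.
mock <- list(
  metadata = data.frame(sample_id = c("s1", "s2")),
  expression = data.frame(GENE_A = c(10, 0), GENE_B = c(3, 7), GENE_C = c(1, 2))
)
describe_result(mock)
# "2 samples x 3 genes"
```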

## Examples

```r theme={null}
# To start using rsynthbio, you first need an account with synthesize.bio.
# Create one here: https://app.synthesize.bio/

# Set your API key (in practice, use a more secure method)
set_synthesize_token()

# Get available models
models <- list_models()

# Create a query for a specific model
query <- get_example_query(model_id = "gem-1-bulk")$example_query

# Request raw counts
result <- predict_query(query, model_id = "gem-1-bulk")

# Access the results
metadata <- result$metadata
expression <- result$expression

# Explore the most highly expressed genes in the first sample.
# unlist() turns the one-row data frame into a named numeric vector for sort().
head(sort(unlist(expression[1, ]), decreasing = TRUE))

# Use deterministic latents for reproducible results
query$deterministic_latents <- TRUE
result_det <- predict_query(query, model_id = "gem-1-bulk")

# Specify a custom total count (library size)
query$total_count <- 5000000
result_custom <- predict_query(query, model_id = "gem-1-bulk")

# Self-hosted container returning a synchronous Apache Arrow IPC stream
result_sh <- predict_query(
  query,
  model_id = "gem-1-bulk",
  api_base_url = "https://gem-1-bulk.internal.partner.example",
  self_hosted = TRUE
)
```
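
The effect of `total_count` can be illustrated with a small worked calculation. This is a sketch of the arithmetic only: `log_cpm_to_counts()` is a hypothetical helper, and the `log1p` transform and the rounding step are assumptions, since the exact server-side conversion is not specified here.

```r
# Hypothetical helper: convert log1p(CPM) values to approximate raw counts.
# The log1p transform and the rounding are assumptions for illustration.
log_cpm_to_counts <- function(log_cpm, total_count) {
  cpm <- expm1(log_cpm)           # undo log1p to recover counts per million
  round(cpm * total_count / 1e6)  # rescale from one million to total_count
}

# Three genes whose CPM values sum to one million.
log_cpm <- log1p(c(GENE_A = 500000, GENE_B = 300000, GENE_C = 200000))

counts_1m <- log_cpm_to_counts(log_cpm, total_count = 1e6)
counts_5m <- log_cpm_to_counts(log_cpm, total_count = 5e6)
```

Under these assumptions the relative proportions between genes are unchanged; a fivefold larger `total_count` yields counts five times as large.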

## Source

Generated from [`R/call_model_api.R`](https://github.com/synthesizebio/rsynthbio/blob/main/R/call_model_api.R) and the package help files in `man/`.
