Cartesia - Pipecat

Overview

Cartesia provides two STT service implementations:

CartesiaSTTService for real-time speech recognition using Cartesia’s WebSocket API with the ink-whisper model, supporting streaming transcription with both interim and final results for low-latency applications
CartesiaTurnsSTTService for turn-based speech recognition using Cartesia’s v2 WebSocket API with the ink-2 model, where the server drives turn boundaries and pushes structured events for turn lifecycle management including start, updates, eager end predictions, resume, and final turn completion

Cartesia STT API Reference

Pipecat’s API methods for Cartesia STT integration

Cartesia Turns STT API Reference

Pipecat’s API methods for Cartesia Turns STT integration

Standard STT Example

Complete example with transcription logging

Turns STT Example

Complete example with turn-based transcription

Cartesia Documentation

Official Cartesia STT documentation and features

Cartesia Platform

Access API keys and transcription models

Installation

To use Cartesia services, install the required dependency:

uv add "pipecat-ai[cartesia]"

Prerequisites

Cartesia Account Setup

Before using Cartesia STT services, you need:

Cartesia Account: Sign up at Cartesia
API Key: Generate an API key from your account dashboard
Model Access: Ensure access to the transcription model you plan to use (ink-whisper for CartesiaSTTService, ink-2 for CartesiaTurnsSTTService)

Required Environment Variables

CARTESIA_API_KEY: Your Cartesia API key for authentication
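Both services take the key as `api_key=`, and the examples below read it with `os.getenv`. If the variable is missing, the service silently receives `api_key=None`. The sketch below fails fast instead. `require_cartesia_api_key` is a hypothetical helper, not part of Pipecat. It accepts any mapping, so you can exercise it without touching the real environment.

```python
import os
from collections.abc import Mapping


def require_cartesia_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return CARTESIA_API_KEY from env, or raise if it is missing or blank.

    Hypothetical helper: pass its result as api_key= to CartesiaSTTService
    or CartesiaTurnsSTTService instead of calling os.getenv() directly.
    """
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; generate a key in the Cartesia dashboard"
        )
    return key
```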

CartesiaSTTService

api_key

str

required

Cartesia API key for authentication.

base_url

str

default:""

Custom API endpoint URL. If empty, defaults to "api.cartesia.ai". Override for proxied deployments.

encoding

str

default:"pcm_s16le"

Audio encoding format.

sample_rate

int

default:"None"

Audio sample rate in Hz.

live_options

CartesiaLiveOptions | None

default:"None"

deprecated

Configuration options for the transcription service. Deprecated in v0.0.105. Use settings=CartesiaSTTService.Settings(...) for model/language and direct init parameters for encoding/sample_rate instead.

settings

CartesiaSTTService.Settings

default:"None"

Runtime-configurable settings for the STT service. See Settings below.

ttfs_p99_latency

float

default:"CARTESIA_TTFS_P99"

P99 latency from speech end to final transcript in seconds. Override for your deployment.

Settings

Runtime-configurable settings passed via the settings constructor argument using CartesiaSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame, which triggers an automatic reconnection with the new parameters. See Service Settings for details.

Parameter	Type	Default	Description
`model`	`str`	`"ink-whisper"`	The transcription model to use. (Inherited from base STT settings.)
`language`	`Language | str`	`"en"`	Target language for transcription. (Inherited from base STT settings.)

Usage

Basic Setup

import os

from pipecat.services.cartesia.stt import CartesiaSTTService

# Reads the key from the CARTESIA_API_KEY environment variable
stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)

With Custom Options

import os

from pipecat.services.cartesia.stt import CartesiaSTTService

stt = CartesiaSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    # Runtime-configurable settings (model/language)
    settings=CartesiaSTTService.Settings(
        model="ink-whisper",
        language="es",
    ),
    # Audio format is set with direct init parameters
    sample_rate=16000,
)

Notes

Inactivity timeout: Cartesia disconnects WebSocket connections after 3 minutes of inactivity. The timeout resets with each message sent. Silence-based keepalive is enabled by default to prevent disconnections.
Auto-reconnect on send: If the connection is closed (e.g., due to timeout), the service automatically reconnects when the next audio data is sent.
Runtime settings updates: Changing settings (e.g., language or model) via STTUpdateSettingsFrame triggers a reconnection with the new parameters. To avoid audio loss, reconnection is deferred until the current user turn ends (i.e., until UserStoppedSpeakingFrame is received). Audio frames arriving during the reconnect are buffered and replayed once the new connection is ready. This enables safe dynamic language switching mid-conversation.
Finalize on VAD stop: When the pipeline’s VAD detects the user has stopped speaking, the service sends a "finalize" command to flush the transcription session and produce a final result.
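The deferred-reconnect behavior in these notes can be modeled as a small state machine. This is a toy model for reasoning about ordering. It is not Pipecat's implementation. `DeferredReconnectModel` and its methods are hypothetical, and `connection_ready()` stands in for the new WebSocket finishing its handshake. The model also assumes that an update arriving while no user turn is in progress reconnects immediately.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeferredReconnectModel:
    """Toy model (not Pipecat code) of the settings-update reconnect policy.

    Settings changes are held while the user is speaking and applied only after
    the turn ends. Audio sent while reconnecting is buffered, then replayed in
    arrival order once the new connection is ready.
    """

    settings: dict
    user_speaking: bool = False
    reconnecting: bool = False
    pending: Optional[dict] = None
    buffer: list = field(default_factory=list)
    sent: list = field(default_factory=list)  # audio delivered to the "socket"

    def update_settings(self, **changes) -> None:
        # Merge with any update still waiting for the turn to end
        self.pending = {**(self.pending or {}), **changes}
        if not self.user_speaking:
            self._start_reconnect()

    def user_started_speaking(self) -> None:
        self.user_speaking = True

    def user_stopped_speaking(self) -> None:
        # Stand-in for UserStoppedSpeakingFrame: apply any deferred update now
        self.user_speaking = False
        if self.pending:
            self._start_reconnect()

    def send_audio(self, chunk: bytes) -> None:
        if self.reconnecting:
            self.buffer.append(chunk)  # hold audio until the new socket is ready
        else:
            self.sent.append(chunk)

    def connection_ready(self) -> None:
        self.reconnecting = False
        self.sent.extend(self.buffer)  # replay buffered audio in order
        self.buffer.clear()

    def _start_reconnect(self) -> None:
        self.settings = {**self.settings, **self.pending}
        self.pending = None
        self.reconnecting = True
```

In this model, no audio is dropped across a mid-conversation language switch. Audio that arrives during the turn goes out on the old connection, and audio that arrives during the reconnect goes out on the new one.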

The InputParams / params= / live_options= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.

Event Handlers

Cartesia STT supports the standard service connection events:

Event	Description
`on_connected`	Connected to Cartesia WebSocket
`on_disconnected`	Disconnected from Cartesia WebSocket

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to Cartesia STT")

CartesiaTurnsSTTService

The server drives turn boundaries with the ink-2 model, pushing structured events for turn lifecycle management including start, updates, eager end predictions, resume, and final turn completion.

api_key

str

required

Cartesia API key for authentication.

url

str

default:"wss://api.cartesia.ai/stt/turns/websocket"

WebSocket URL for the Cartesia Streaming ASR v2 endpoint.

sample_rate

int | None

default:"None"

Audio sample rate in Hz. If None, uses the pipeline sample rate.

should_interrupt

bool

default:"True"

Whether to broadcast an interruption when the server signals the start of a new turn.

watchdog_min_timeout

float

default:"0.5"

Minimum idle timeout (in seconds) before sending silence to prevent dangling turns. The actual threshold is max(chunk_duration * 2, watchdog_min_timeout).
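Here is the threshold formula in runnable form. The sketch assumes that `chunk_duration` is the playback length of one audio chunk sent to the server, and it computes that length for 16-bit PCM. `chunk_duration_s` and `watchdog_threshold` are hypothetical helper names, not Pipecat APIs.

```python
def chunk_duration_s(
    chunk_bytes: int, sample_rate: int, channels: int = 1, sample_width: int = 2
) -> float:
    """Playback length of one PCM chunk in seconds (16-bit samples by default)."""
    return chunk_bytes / (sample_rate * channels * sample_width)


def watchdog_threshold(chunk_duration: float, watchdog_min_timeout: float = 0.5) -> float:
    """Idle time before silence is sent: max(chunk_duration * 2, watchdog_min_timeout)."""
    return max(chunk_duration * 2, watchdog_min_timeout)
```

Consider 20 ms chunks (640 bytes at 16 kHz mono). Here the floor wins, and the watchdog fires after 0.5 s without audio. Only chunks longer than 250 ms push the threshold above the default.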

extra_headers

dict[str, str] | None

default:"None"

Optional additional HTTP headers to send with the WebSocket handshake.

settings

CartesiaTurnsSTTService.Settings

default:"None"

Runtime-updatable settings. See Settings below.

Settings

Runtime-configurable settings passed via the settings constructor argument using CartesiaTurnsSTTService.Settings(...). The ink-2 model family is English-only and does not support runtime model or language switching. Attempts to update these fields will be reported as unhandled.

Parameter	Type	Default	Description
`model`	`str`	`"ink-2"`	The transcription model to use. (Inherited from base STT settings.)
`language`	`Language | str`	`None`	Target language (fixed to English). (Inherited from base STT settings.)

Usage

Basic Setup

import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

# Reads the key from the CARTESIA_API_KEY environment variable
stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)

With Custom Configuration

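import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    sample_rate=16000,          # otherwise the pipeline sample rate is used
    should_interrupt=True,      # broadcast an interruption on turn.start
    watchdog_min_timeout=1.0,   # floor (seconds) for the dangling-turn watchdog
)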

With Event Handlers

import os

from pipecat.services.cartesia.turns.stt import CartesiaTurnsSTTService

stt = CartesiaTurnsSTTService(
    api_key=os.getenv("CARTESIA_API_KEY"),
)

@stt.event_handler("on_turn_start")
async def on_turn_start(service, transcript):
    print(f"User started speaking: {transcript}")

@stt.event_handler("on_turn_end")
async def on_turn_end(service, transcript):
    print(f"Final transcript: {transcript}")

Turn-Based Protocol

The service speaks the v2 turn-based wire protocol:

connected → turn.start → turn.update* → (turn.eager_end → turn.resume?)* → turn.end → ...

turn.start: Server detected the start of a turn. Pushes UserStartedSpeakingFrame and optionally broadcasts an interruption.
turn.update: Incremental transcript update. Pushes InterimTranscriptionFrame.
turn.eager_end: Server eagerly predicted the end of turn. Available via event handler for speculative downstream processing.
turn.resume: User resumed speaking after an eager end. Available via event handler.
turn.end: Final transcript for the completed turn. Pushes TranscriptionFrame and UserStoppedSpeakingFrame.
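You can check the wire grammar mechanically, which helps when logging raw events during debugging. This sketch encodes the diagram literally. Each turn is `turn.start`, then any number of `turn.update`, then any number of `turn.eager_end` (each optionally followed by `turn.resume`), then `turn.end`. The token letters and `is_valid_session` are illustrative and not part of Pipecat.

```python
import re

# One letter per wire event so the grammar can be written as a regular expression.
TOKENS = {
    "connected": "C",
    "turn.start": "S",
    "turn.update": "U",
    "turn.eager_end": "E",
    "turn.resume": "R",
    "turn.end": "F",
}

# connected, then any number of complete turns:
#   turn.start turn.update* (turn.eager_end turn.resume?)* turn.end
SESSION = re.compile(r"C(?:SU*(?:ER?)*F)*")


def is_valid_session(events: list) -> bool:
    """Return True if the event names follow the turn grammar drawn above."""
    try:
        encoded = "".join(TOKENS[name] for name in events)
    except KeyError:
        return False  # unknown event name
    return SESSION.fullmatch(encoded) is not None
```

A session that ends after `turn.start` without `turn.end` fails the check. That is exactly the dangling turn the watchdog exists to prevent.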

Transcripts are cumulative per turn. There is no is_final flag and no finalize command — closing the socket ends the session.
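Each `turn.update` carries the whole transcript so far. A consumer that wants only newly spoken words must therefore diff consecutive updates and start over at each `turn.start`. In this minimal sketch, `CumulativeTranscript` is a hypothetical helper. This page does not say whether the server ever revises earlier words, so the helper falls back to the full transcript when a new update does not extend the previous one.

```python
class CumulativeTranscript:
    """Track one turn's cumulative transcript and return only the new text."""

    def __init__(self) -> None:
        self._seen = ""

    def reset(self) -> None:
        """Call on turn.start: transcripts accumulate per turn, not per session."""
        self._seen = ""

    def delta(self, transcript: str) -> str:
        """Return the text added since the previous update in this turn."""
        if transcript.startswith(self._seen):
            added = transcript[len(self._seen):]
        else:
            # Earlier words changed; hand back the whole transcript instead of a diff.
            added = transcript
        self._seen = transcript
        return added
```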

Notes

English-only: The ink-2 model family supports English transcription only at launch.
No runtime model switching: Unlike the v1 API, the ink-2 model does not support runtime model or language switching.
Watchdog for dangling turns: If audio stops flowing after a turn.start, the service sends silence to prevent the turn from hanging indefinitely. Configure the threshold with watchdog_min_timeout.
Server-driven turns: The server controls turn boundaries. There is no client-side finalize command.
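Interruption support: should_interrupt defaults to True, so the service broadcasts an interruption whenever the server signals the start of a new turn (that is, when the user starts speaking), enabling natural turn-taking. Set it to False to disable this.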

Event Handlers

Cartesia Turns STT supports the following event handlers:

Event	Handler Signature	Description
`on_connected`	`async def(service)`	Connected to Cartesia WebSocket
`on_disconnected`	`async def(service)`	Disconnected from Cartesia WebSocket
`on_connection_error`	`async def(service, error_msg)`	Connection error occurred
`on_turn_start`	`async def(service, transcript: str)`	Server detected start of a turn
`on_turn_update`	`async def(service, transcript: str)`	Incremental transcript update
`on_turn_eager_end`	`async def(service, transcript: str)`	Server eagerly predicted end of turn
`on_turn_resume`	`async def(service)`	User resumed speaking after an eager end
`on_turn_end`	`async def(service, transcript: str)`	Final transcript for the completed turn

Example:

@stt.event_handler("on_turn_eager_end")
async def on_turn_eager_end(service, transcript):
    print(f"Eager end prediction: {transcript}")
    # Optionally start processing speculatively

@stt.event_handler("on_turn_resume")
async def on_turn_resume(service):
    print("User resumed speaking, discard speculative processing")
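One way to act on the eager-end and resume events is to start work speculatively and cancel it if the user keeps talking. This asyncio sketch assumes `work` is your own coroutine, such as a call that drafts a reply. `SpeculativeResponder` is hypothetical and not part of Pipecat. It reuses the speculative result only when the final `turn.end` transcript matches the one it speculated on.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SpeculativeResponder:
    """Start work on turn.eager_end, cancel on turn.resume, reuse on turn.end."""

    def __init__(self, work: Callable[[str], Awaitable[str]]) -> None:
        self._work = work
        self._task: Optional[asyncio.Task] = None
        self._speculated: Optional[str] = None

    async def on_eager_end(self, transcript: str) -> None:
        await self.on_resume()  # drop any earlier speculation first
        self._speculated = transcript
        self._task = asyncio.create_task(self._work(transcript))

    async def on_resume(self) -> None:
        # The user kept talking: cancel the speculative task and wait for it to stop.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
        self._task = None
        self._speculated = None

    async def on_end(self, transcript: str) -> str:
        if self._task is not None and transcript == self._speculated:
            result = await self._task  # the prediction held: reuse the work
        else:
            await self.on_resume()  # stale or missing speculation: redo it
            result = await self._work(transcript)
        self._task = None
        self._speculated = None
        return result
```

To wire it up, await `on_eager_end`, `on_resume`, and `on_end` from the corresponding `on_turn_eager_end`, `on_turn_resume`, and `on_turn_end` handlers. How the result feeds back into the pipeline is outside this sketch.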
