Turn detection
Build responsive voice applications by detecting when users finish speaking.
Benefits
- Create natural conversational experiences with proper turn-taking
- Reduce response latency in voice assistants and chatbots
- Improve user experience with timely system responses
- Enable more human-like interactions in voice applications
Use cases
- Voice AI - Detect when to generate responses in conversational agents
- Real-time translation - Deliver translations as soon as speakers complete thoughts
- Dictation - Determine when users have finished speaking to finalize transcription
How it works
A turn, or utterance, is a continuous piece of speech from a single speaker, typically separated by pauses. In conversation systems, detecting the end of an utterance helps determine when it's appropriate for another speaker (or AI system) to respond.
Speechmatics offers two complementary approaches to detect when a speaker has finished their turn:
- Silence-based detection - Identifies pauses between speech
- Semantic detection - Analyzes linguistic context to identify natural endpoints
Silence-based detection
Detect natural pauses in speech by configuring the silence threshold in your transcription request.
Configuration
Add the end_of_utterance_silence_trigger parameter to your StartRecognition message:
{
  "type": "transcription",
  "transcription_config": {
    "conversation_config": {
      "end_of_utterance_silence_trigger": 0.5
    },
    "language": "en"
  }
}
The end_of_utterance_silence_trigger parameter specifies the silence duration (0-2 seconds) that triggers end of utterance detection. Setting end_of_utterance_silence_trigger to 0 disables detection.
Recommended settings
- Voice AI applications: 0.5-0.8 seconds
- Dictation applications: 0.8-1.2 seconds
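For example, with the speechmatics-python SDK used in the code examples below, these thresholds map directly onto ConversationConfig. This is a minimal sketch; the exact values and max_delay are illustrative, not prescriptive:
import speechmatics

# Illustrative thresholds only - tune for your application
VOICE_AI_TRIGGER = 0.6   # within the suggested 0.5-0.8s range for voice AI
DICTATION_TRIGGER = 1.0  # within the suggested 0.8-1.2s range for dictation

conversation_config = speechmatics.models.ConversationConfig(
    end_of_utterance_silence_trigger=VOICE_AI_TRIGGER  # 0 disables detection
)

conf = speechmatics.models.TranscriptionConfig(
    language="en",
    enable_partials=True,
    max_delay=1,  # keep end_of_utterance_silence_trigger below max_delay
    conversation_config=conversation_config,
)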
Response format
When an end of utterance is detected, you'll receive:
- A Final transcript message
- An EndOfUtterance message
{
  "message": "EndOfUtterance",
  "format": "2.9",
  "metadata": {
    "start_time": 1.07,
    "end_time": 1.07
  }
}
Keep in mind:
- Keep end_of_utterance_silence_trigger lower than the max_delay value
- Messages are only sent after speech is recognized
- Duplicate messages are never sent for the same silence period
- Messages don't contain speaker information from diarization
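In a voice AI pipeline, a common pattern is to buffer Final transcripts and treat the buffered text as the complete turn once the EndOfUtterance message arrives. Below is a minimal sketch using the same event handlers as the full examples further down; respond_to_user is a hypothetical placeholder for your own response logic:
utterance_parts = []  # Final transcript segments for the current turn

def handle_final_transcript(msg):
    # AddTranscript carries the finalized text for a segment of speech
    utterance_parts.append(msg["metadata"]["transcript"])

def handle_end_of_utterance(msg):
    # EndOfUtterance marks the end of the turn - hand the full text to your app
    full_utterance = "".join(utterance_parts).strip()
    utterance_parts.clear()
    if full_utterance:
        respond_to_user(full_utterance)  # hypothetical response hook

def respond_to_user(text):
    print(f"Complete turn: {text}")

# Register on an existing WebsocketClient (ws), as in the examples below:
# ws.add_event_handler(
#     event_name=speechmatics.models.ServerMessageType.AddTranscript,
#     event_handler=handle_final_transcript,
# )
# ws.add_event_handler(
#     event_name=speechmatics.models.ServerMessageType.EndOfUtterance,
#     event_handler=handle_end_of_utterance,
# )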
Semantic end of turn
For more natural conversations, combine silence detection with linguistic context analysis. This approach understands when a speaker has completed their thought based on the content of their speech.
Semantic end of turn detection is available through our Flow service, which combines multiple signals for optimal turn detection:
- Silence duration
- Linguistic completeness
- Question detection
- Prosodic features
Try semantic end of turn detection with our free Flow service demo or read our implementation guide.
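Flow handles this combination for you, but to illustrate the idea conceptually (a toy heuristic, not how Flow works internally), a client could adapt its silence threshold based on whether the latest transcript reads like a finished thought:
import re

def looks_complete(transcript: str) -> bool:
    # Toy heuristic: sentence-final punctuation suggests a finished thought,
    # while a trailing filler or conjunction suggests the speaker will continue.
    # Real semantic detection (as in Flow) uses richer signals, including prosody.
    text = transcript.strip()
    if not text:
        return False
    if text.endswith((".", "?", "!")):
        return True
    return not re.search(r"\b(and|but|so|because|um|uh)$", text, re.IGNORECASE)

def should_respond(transcript: str, silence_seconds: float) -> bool:
    # Respond sooner when the text looks complete; wait longer when it does not
    threshold = 0.5 if looks_complete(transcript) else 1.2
    return silence_seconds >= threshold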
Code examples
Real-time streaming from microphone - ideal for voice AI applications.
import asyncio

import pyaudio
import speechmatics

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# Audio recording parameters
SAMPLE_RATE = 16000
CHUNK_SIZE = 1024
FORMAT = pyaudio.paFloat32


class AudioProcessor:
    def __init__(self):
        self.wave_data = bytearray()
        self.read_offset = 0

    async def read(self, chunk_size):
        while self.read_offset + chunk_size > len(self.wave_data):
            await asyncio.sleep(0.001)
        new_offset = self.read_offset + chunk_size
        data = self.wave_data[self.read_offset : new_offset]
        self.read_offset = new_offset
        return data

    def write_audio(self, data):
        self.wave_data.extend(data)


class VoiceAITranscriber:
    def __init__(self):
        self.ws = speechmatics.client.WebsocketClient(
            speechmatics.models.ConnectionSettings(
                url=CONNECTION_URL,
                auth_token=API_KEY,
            )
        )
        self.audio = pyaudio.PyAudio()
        self.stream = None
        self.is_recording = False
        self.audio_processor = AudioProcessor()

        # Set up event handlers
        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
            event_handler=self.handle_partial_transcript,
        )
        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.AddTranscript,
            event_handler=self.handle_final_transcript,
        )
        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.EndOfUtterance,
            event_handler=self.handle_end_of_utterance,
        )

    def handle_partial_transcript(self, msg):
        transcript = msg["metadata"]["transcript"]
        print(f"[Listening...] {transcript}")

    def handle_final_transcript(self, msg):
        transcript = msg["metadata"]["transcript"]
        print(f"[Complete] {transcript}")

    def handle_end_of_utterance(self, msg):
        print("🔚 End of utterance detected - ready for AI response!")
        # This is where your voice AI would process the complete utterance
        # and generate a response

    def stream_callback(self, in_data, frame_count, time_info, status):
        self.audio_processor.write_audio(in_data)
        return in_data, pyaudio.paContinue

    def start_streaming(self):
        try:
            # Set up pyaudio stream with callback
            self.stream = self.audio.open(
                format=FORMAT,
                channels=1,
                rate=SAMPLE_RATE,
                input=True,
                frames_per_buffer=CHUNK_SIZE,
                stream_callback=self.stream_callback,
            )

            # Configure audio settings
            settings = speechmatics.models.AudioSettings()
            settings.encoding = "pcm_f32le"
            settings.sample_rate = SAMPLE_RATE
            settings.chunk_size = CHUNK_SIZE

            # Configure transcription with end-of-utterance detection
            conversation_config = speechmatics.models.ConversationConfig(
                end_of_utterance_silence_trigger=0.75  # Adjust as needed
            )
            conf = speechmatics.models.TranscriptionConfig(
                operating_point="enhanced",
                language=LANGUAGE,
                enable_partials=True,
                max_delay=1,
                conversation_config=conversation_config,
            )

            print("🎤 Voice AI ready - start speaking!")
            print("Press Ctrl+C to stop...")

            # Start transcription
            self.ws.run_synchronously(
                transcription_config=conf,
                stream=self.audio_processor,
                audio_settings=settings,
            )
        except KeyboardInterrupt:
            print("\n🛑 Stopping voice AI transcriber...")
        except Exception as e:
            print(f"Error in transcription: {e}")
        finally:
            self.stop_streaming()

    def stop_streaming(self):
        self.is_recording = False
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
        self.audio.terminate()


# Usage
if __name__ == "__main__":
    transcriber = VoiceAITranscriber()
    transcriber.start_streaming()
Transcribe a pre-recorded audio file - copy in your API key and file name to get started.
import speechmatics

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# Create a transcription client
ws = speechmatics.client.WebsocketClient(
    speechmatics.models.ConnectionSettings(
        url=CONNECTION_URL,
        auth_token=API_KEY,
    )
)

# Define an event handler to print the partial transcript
def print_partial_transcript(msg):
    print(f"[partial] {msg['metadata']['transcript']}")

# Define an event handler to print the full transcript
def print_transcript(msg):
    print(f"[ FULL] {msg['metadata']['transcript']}")

# Define an event handler for the end-of-utterance event
def print_eou(msg):
    print("EndOfUtterance")

# Register the event handler for partial transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
    event_handler=print_partial_transcript,
)

# Register the event handler for full transcript
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=print_transcript,
)

# Register the event handler for end of utterance
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.EndOfUtterance,
    event_handler=print_eou,
)

settings = speechmatics.models.AudioSettings()

# Define transcription parameters
# Full list of parameters described here: https://speechmatics.github.io/speechmatics-python/models
conversation_config = speechmatics.models.ConversationConfig(
    end_of_utterance_silence_trigger=0.75  # Adjust as needed
)
conf = speechmatics.models.TranscriptionConfig(
    operating_point="enhanced",
    language=LANGUAGE,
    enable_partials=True,
    max_delay=1,
    conversation_config=conversation_config,
)

print("Starting transcription (type Ctrl-C to stop):")
with open(PATH_TO_FILE, "rb") as fd:
    try:
        ws.run_synchronously(fd, conf, settings)
    except KeyboardInterrupt:
        print("\nTranscription stopped.")