Turn detection

Build responsive voice applications by detecting when users finish speaking.

Benefits

  • Create natural conversational experiences with proper turn-taking
  • Reduce response latency in voice assistants and chatbots
  • Improve user experience with timely system responses
  • Enable more human-like interactions in voice applications

Use cases

  • Voice AI - Detect when to generate responses in conversational agents
  • Real-time translation - Deliver translations as soon as speakers complete thoughts
  • Dictation - Determine when users have finished speaking to finalize transcription

How it works

A turn, or utterance, is a continuous span of speech from a single speaker, typically bounded by pauses. In conversational systems, detecting the end of an utterance helps determine when it's appropriate for another speaker (or an AI system) to respond.

Speechmatics offers two complementary approaches to detect when a speaker has finished their turn:

  1. Silence-based detection - Identifies pauses between speech
  2. Semantic detection - Analyzes linguistic context to identify natural endpoints

Silence-based detection

Detect natural pauses in speech by configuring the silence threshold in your transcription request.

Configuration

Add the end_of_utterance_silence_trigger parameter to your StartRecognition message:

{
  "type": "transcription",
  "transcription_config": {
    "conversation_config": {
      "end_of_utterance_silence_trigger": 0.5
    },
    "language": "en"
  }
}

The end_of_utterance_silence_trigger parameter specifies the duration of silence, from 0 to 2 seconds, that triggers end-of-utterance detection.

INFO

Setting end_of_utterance_silence_trigger to 0 disables detection.

Recommended values:

  • Voice AI applications: 0.5-0.8 seconds
  • Dictation applications: 0.8-1.2 seconds
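
If you're using the Python SDK (as in the full example at the end of this page), the same configuration can be expressed with the SDK's ConversationConfig and TranscriptionConfig models. A minimal sketch, assuming a voice AI use case:

import speechmatics

# Equivalent of the JSON configuration above, using the SDK models
# that also appear in the full example below
conversation_config = speechmatics.models.ConversationConfig(
    end_of_utterance_silence_trigger=0.5  # 0 disables detection
)

config = speechmatics.models.TranscriptionConfig(
    language="en",
    conversation_config=conversation_config,
)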

Response format

When an end of utterance is detected, you'll receive:

  1. A final transcript (AddTranscript) message
  2. An EndOfUtterance message

{
  "message": "EndOfUtterance",
  "format": "2.9",
  "metadata": {
    "start_time": 1.07,
    "end_time": 1.07
  }
}

TIP
  • Keep end_of_utterance_silence_trigger lower than the max_delay value
  • Messages are only sent after speech is recognized
  • Duplicate messages are never sent for the same silence period
  • Messages don't contain speaker information from diarization
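
A minimal sketch of consuming these messages with the event handler pattern from the full example below (ws is a speechmatics.client.WebsocketClient): buffer final transcripts until EndOfUtterance signals that the utterance is complete.

utterance_parts = []

def handle_final_transcript(msg):
    # Each final transcript carries a finalized fragment of the utterance
    utterance_parts.append(msg["metadata"]["transcript"])

def handle_end_of_utterance(msg):
    # The utterance is complete - hand it to your response logic
    utterance = " ".join(utterance_parts).strip()
    utterance_parts.clear()
    print(f"Utterance complete: {utterance}")

ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.AddTranscript,
    event_handler=handle_final_transcript,
)
ws.add_event_handler(
    event_name=speechmatics.models.ServerMessageType.EndOfUtterance,
    event_handler=handle_end_of_utterance,
)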

Semantic end of turn

For more natural conversations, combine silence detection with linguistic context analysis. This approach understands when a speaker has completed their thought based on the content of their speech.

Semantic end of turn detection is available through our Flow service, which combines multiple signals for optimal turn detection:

  • Silence duration
  • Linguistic completeness
  • Question detection
  • Prosodic features

Try semantic end of turn detection with our free Flow service demo or read our implementation guide.

Code examples

Real-time streaming from a microphone - ideal for voice AI applications.

import asyncio

import pyaudio
import speechmatics

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
CONNECTION_URL = "wss://eu2.rt.speechmatics.com/v2"

# Audio recording parameters
SAMPLE_RATE = 16000
CHUNK_SIZE = 1024
FORMAT = pyaudio.paFloat32


class AudioProcessor:
    """Buffers microphone audio and exposes an async read interface."""

    def __init__(self):
        self.wave_data = bytearray()
        self.read_offset = 0

    async def read(self, chunk_size):
        # Wait until enough audio has been buffered to satisfy the request
        while self.read_offset + chunk_size > len(self.wave_data):
            await asyncio.sleep(0.001)

        new_offset = self.read_offset + chunk_size
        data = self.wave_data[self.read_offset : new_offset]
        self.read_offset = new_offset
        return data

    def write_audio(self, data):
        self.wave_data.extend(data)


class VoiceAITranscriber:
    def __init__(self):
        self.ws = speechmatics.client.WebsocketClient(
            speechmatics.models.ConnectionSettings(
                url=CONNECTION_URL,
                auth_token=API_KEY,
            )
        )
        self.audio = pyaudio.PyAudio()
        self.stream = None
        self.is_recording = False
        self.audio_processor = AudioProcessor()

        # Set up event handlers
        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.AddPartialTranscript,
            event_handler=self.handle_partial_transcript,
        )

        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.AddTranscript,
            event_handler=self.handle_final_transcript,
        )

        self.ws.add_event_handler(
            event_name=speechmatics.models.ServerMessageType.EndOfUtterance,
            event_handler=self.handle_end_of_utterance,
        )

    def handle_partial_transcript(self, msg):
        transcript = msg["metadata"]["transcript"]
        print(f"[Listening...] {transcript}")

    def handle_final_transcript(self, msg):
        transcript = msg["metadata"]["transcript"]
        print(f"[Complete] {transcript}")

    def handle_end_of_utterance(self, msg):
        print("🔚 End of utterance detected - ready for AI response!")
        # This is where your voice AI would process the complete utterance
        # and generate a response

    def stream_callback(self, in_data, frame_count, time_info, status):
        # PyAudio invokes this on its own thread for each captured chunk
        self.audio_processor.write_audio(in_data)
        return in_data, pyaudio.paContinue

    def start_streaming(self):
        try:
            # Set up the PyAudio input stream with a callback
            self.stream = self.audio.open(
                format=FORMAT,
                channels=1,
                rate=SAMPLE_RATE,
                input=True,
                frames_per_buffer=CHUNK_SIZE,
                stream_callback=self.stream_callback,
            )

            # Configure audio settings to match the microphone stream
            settings = speechmatics.models.AudioSettings()
            settings.encoding = "pcm_f32le"
            settings.sample_rate = SAMPLE_RATE
            settings.chunk_size = CHUNK_SIZE

            # Configure transcription with end-of-utterance detection
            conversation_config = speechmatics.models.ConversationConfig(
                end_of_utterance_silence_trigger=0.75  # Adjust as needed
            )

            conf = speechmatics.models.TranscriptionConfig(
                operating_point="enhanced",
                language=LANGUAGE,
                enable_partials=True,
                max_delay=1,
                conversation_config=conversation_config,
            )

            print("🎤 Voice AI ready - start speaking!")
            print("Press Ctrl+C to stop...")

            # Start transcription; this blocks until the session ends
            self.ws.run_synchronously(
                transcription_config=conf,
                stream=self.audio_processor,
                audio_settings=settings,
            )

        except KeyboardInterrupt:
            print("\n🛑 Stopping voice AI transcriber...")
        except Exception as e:
            print(f"Error in transcription: {e}")
        finally:
            self.stop_streaming()

    def stop_streaming(self):
        self.is_recording = False
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
        self.audio.terminate()


# Usage
if __name__ == "__main__":
    transcriber = VoiceAITranscriber()
    transcriber.start_streaming()
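
Running this example requires the Speechmatics Python SDK and PyAudio (typically installed with pip install speechmatics-python pyaudio), plus a valid API key in API_KEY.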