Perfect Diarization
Supported Platforms
Perfect diarization is currently supported for Zoom, Meet, and Teams bots. It is not available for the Desktop Recording SDK.
Perfect diarization addresses inaccurate speaker attribution in meeting transcripts. Meeting platforms sometimes attribute words to the wrong speaker, especially when several people talk at once. With this feature, each speaker's words are attributed accurately, even when participants talk over each other.
How It Works
Perfect diarization transcribes separate audio streams for each participant instead of using the combined audio stream for the entire meeting, significantly improving the accuracy of speaker attribution.
This feature is compatible with all AI transcription providers supported by Recall.ai and can be used for real-time transcription.
Usage
Real-time transcription
To configure perfect diarization in a Create Bot request, set use_separate_streams_when_available to true in your recording_config.transcript.diarization config:
curl --request POST \
  --url https://us-west-2.recall.ai/api/v1/bot/ \
  --header "Authorization: $RECALLAI_API_KEY" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  --data '
{
  "recording_config": {
    "transcript": {
      "diarization": {
        "use_separate_streams_when_available": true
      },
      "provider": {
        ...
      }
    }
  }
}
'
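When the provider object is kept in a variable, the request body above can be assembled by a small shell function instead of being written inline. The sketch below is only illustrative: build_bot_payload is a hypothetical helper name, the provider object is passed in by the caller and its contents are not shown here, and other Create Bot fields are omitted, as in the request above.

```shell
# Hypothetical helper: prints a Create Bot request body with perfect
# diarization enabled.
# $1 is the provider object as a JSON string, supplied by the caller.
build_bot_payload() {
  provider_json="$1"
  printf '{"recording_config":{"transcript":{"diarization":{"use_separate_streams_when_available":true},"provider":%s}}}\n' "$provider_json"
}

# Example use (not executed here), with PROVIDER_JSON set by the caller:
#   curl --request POST \
#     --url https://us-west-2.recall.ai/api/v1/bot/ \
#     --header "Authorization: $RECALLAI_API_KEY" \
#     --header "content-type: application/json" \
#     --data "$(build_bot_payload "$PROVIDER_JSON")"
```

Keeping the diarization flag in one function makes it harder to drop it by accident when the provider settings change.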
Async transcription
To configure perfect diarization in a Create Async Transcript request, set use_separate_streams_when_available to true in your diarization config:
curl --request POST \
  --url https://us-east-1.recall.ai/api/v1/recording/id/create_transcript/ \
  --header "Authorization: $RECALLAI_API_KEY" \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --data '
{
  "provider": {...},
  "diarization": {
    "use_separate_streams_when_available": true
  }
}
'
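The async request can also be wrapped in a function so that the region and recording are parameters. This is a sketch under two assumptions: that id in the URL path above stands for the recording's ID, and that the region is the subdomain, as in the example URLs on this page. The names async_transcript_url and create_async_transcript are hypothetical, and the provider object is supplied by the caller.

```shell
# Hypothetical helper: builds the Create Async Transcript URL.
# $1 = region (for example us-east-1), $2 = recording ID.
async_transcript_url() {
  region="$1"
  recording_id="$2"
  printf 'https://%s.recall.ai/api/v1/recording/%s/create_transcript/\n' "$region" "$recording_id"
}

# Hypothetical helper: sends the request with perfect diarization enabled.
# $1 = region, $2 = recording ID, $3 = provider object as a JSON string.
create_async_transcript() {
  curl --request POST \
    --url "$(async_transcript_url "$1" "$2")" \
    --header "Authorization: $RECALLAI_API_KEY" \
    --header 'accept: application/json' \
    --header 'content-type: application/json' \
    --data "{\"provider\": $3, \"diarization\": {\"use_separate_streams_when_available\": true}}"
}
```

Calling create_async_transcript with a region, a recording ID, and a provider JSON string sends the same request as the curl command above.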
Cost Considerations
For real-time transcription with separate streams, we typically see ~1.8x the transcription credit usage of standard transcription, which uses the combined audio stream.
For async transcription with separate streams, we trim out sections of audio without speech to optimize your transcription costs. As a result, async transcription can use anywhere from 0.6x to 1.2x the transcription credit usage of standard transcription, averaging about 1x (roughly no change in cost).
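To turn these multipliers into a rough budget, scale your own baseline credit usage by each figure. The sketch below assumes a baseline of 100 credits purely for illustration; estimate_credits is a hypothetical helper, and the multipliers are the typical values quoted above, not guarantees.

```shell
# Hypothetical helper: scales a baseline credit figure by a multiplier.
# $1 = baseline transcription credits without separate streams (your own figure)
# $2 = multiplier taken from the figures quoted above
estimate_credits() {
  awk -v base="$1" -v mult="$2" 'BEGIN { printf "%.1f\n", base * mult }'
}

estimate_credits 100 1.8   # real-time, typical: prints 180.0
estimate_credits 100 0.6   # async, low end:     prints 60.0
estimate_credits 100 1.2   # async, high end:    prints 120.0
```

Because async usage averages about 1x, the low and high figures are best read as bounds for planning, not as an expected cost.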