Perfect Diarization

📘
Supported Platforms
Perfect diarization is currently supported for Zoom, Google Meet, and Microsoft Teams bots. It is not available for the Desktop Recording SDK.

Perfect diarization is a feature designed to address the problem of inaccurate speaker attribution in meeting transcripts. Meeting platforms can sometimes attribute words to the wrong speaker, especially when multiple people are talking at once. This feature ensures that each speaker's words are accurately identified, even when participants are talking over each other.

How It Works

Perfect diarization transcribes separate audio streams for each participant instead of using the combined audio stream for the entire meeting, significantly improving the accuracy of speaker attribution.

This feature is compatible with all AI transcription providers supported by Recall.ai and can be used for real-time transcription.

Usage

Real-time transcription

To enable perfect diarization in a Create Bot request, set use_separate_streams_when_available to true in your recording_config.transcript.diarization config:

curl --request POST \
     --url https://us-west-2.recall.ai/api/v1/bot/ \
     --header "Authorization: $RECALLAI_API_KEY" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     --data '
{
  "recording_config": {
    "transcript": {
      "diarization": {
        "use_separate_streams_when_available": true
      },
      "provider": {
        ...
      }
    }
  }
}
'

Async transcription

To enable perfect diarization in a Create Async Transcript request, set use_separate_streams_when_available to true in your diarization config:

curl --request POST \
     --url https://us-east-1.recall.ai/api/v1/recording/id/create_transcript/ \
     --header "Authorization: $RECALLAI_API_KEY" \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --data '
{
  "provider": {...},
  "diarization": {
    "use_separate_streams_when_available": true
  }
}
'

Cost Considerations

For real-time transcription with separate streams, we typically see ~1.8x the transcription credit usage of standard transcription on the combined audio stream.

For async transcription with separate streams, we trim out sections of audio without speech to optimize your transcription costs. As a result, async transcription can use anywhere from 0.6x to 1.2x the transcription credit usage of standard transcription, averaging about 1x (that is, roughly the same cost).
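
To get a feel for what these multipliers mean in practice, here is a minimal sketch that applies them to a baseline credit figure. The estimate_credits helper and the baseline of 100 credits are hypothetical, chosen only for illustration. The multipliers are the typical values quoted above, not guarantees, so treat the output as a rough range rather than a bill.

```shell
#!/usr/bin/env bash
# Rough estimate of transcription credit usage with separate streams.
# The multipliers are the typical figures quoted in this section;
# actual usage depends on the meeting and may differ.

# estimate_credits BASELINE MULTIPLIER
# Hypothetical helper: prints BASELINE * MULTIPLIER to one decimal place.
estimate_credits() {
  LC_ALL=C awk -v base="$1" -v mult="$2" 'BEGIN { printf "%.1f\n", base * mult }'
}

# Hypothetical credit usage for the same audio with standard transcription.
baseline=100

echo "real-time (typical ~1.8x): $(estimate_credits "$baseline" 1.8)"
echo "async (low end, 0.6x):     $(estimate_credits "$baseline" 0.6)"
echo "async (high end, 1.2x):    $(estimate_credits "$baseline" 1.2)"
```

With a baseline of 100 credits, this gives about 180 credits for real-time and somewhere between 60 and 120 credits for async.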