EA202091595A1 - METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER
Info
- Publication number
- EA202091595A1
- Authority
- EA
- Eurasian Patent Office
- Prior art keywords
- voice
- voice model
- speakers
- target
- telephone conversations
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/75—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Probability & Statistics with Applications (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention relates to voice biometrics, in particular to automatically estimating speakers' voice models from recordings of their telephone conversations and automatically binding each speaker's voice model to a telephone number. A method for obtaining a voice model of a target speaker is proposed, in which: at least two recordings (phonograms) of telephone conversations are segmented by speaker voice to obtain speech segments; voice models of the speakers are built from the resulting speech segments; the built voice models are clustered using the metadata of the telephone conversations to obtain clusters; links between clusters are determined on the basis of the telephone conversation recordings; and the cluster with the greatest number of links is selected as the target speaker's cluster. A device for obtaining a voice model of a target speaker is also proposed.
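The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes segmentation and voice-model extraction have already produced one fixed-length vector per speaker per call, it uses a simple greedy cosine-similarity clustering, and it assumes the only metadata used is the call identifier (two voices from the same call are never merged). It also assumes that two clusters are "linked" when they appear in the same call. The names `cluster_models`, `target_cluster`, and the `threshold` value are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_models(models, threshold=0.9):
    """Greedily cluster (call_id, vector) voice models.

    Assumption: the call_id metadata forbids merging two models
    that come from the same call, since they are different speakers.
    """
    clusters = []
    for call_id, vec in models:
        best, best_sim = None, threshold
        for cluster in clusters:
            if any(cid == call_id for cid, _ in cluster):
                continue  # same call, so a different speaker
            sim = max(cosine(vec, v) for _, v in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([(call_id, vec)])
        else:
            best.append((call_id, vec))
    return clusters


def target_cluster(clusters):
    """Return the index of the cluster linked to the most other clusters.

    Assumption: two clusters are linked if they share at least one call.
    """
    calls = [{cid for cid, _ in c} for c in clusters]
    links = [0] * len(clusters)
    for i, j in combinations(range(len(clusters)), 2):
        if calls[i] & calls[j]:
            links[i] += 1
            links[j] += 1
    return max(range(len(clusters)), key=links.__getitem__)


# Toy data: three calls from one phone number, each with the target
# speaker (vectors near [1, 0]) and a different interlocutor.
models = [
    ("c1", [1.0, 0.0]), ("c1", [0.0, 1.0]),
    ("c2", [0.99, 0.05]), ("c2", [-1.0, 0.0]),
    ("c3", [0.98, -0.03]), ("c3", [0.0, -1.0]),
]
clusters = cluster_models(models)
target = target_cluster(clusters)
```

In this toy data the target speaker's cluster collects one model from each call, so it is linked to every interlocutor's cluster, while each interlocutor's cluster is linked only to the target's.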
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/RU2017/000990 WO2019132690A1 (en) | 2017-12-27 | 2017-12-27 | Method and device for building voice model of target speaker |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EA202091595A1 (en) | 2020-09-18 |
Family
ID=67067964
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EA202091595A EA202091595A1 (en) | 2017-12-27 | 2017-12-27 | METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER |
Country Status (3)
| Country | Link |
|---|---|
| KR (1) | KR20200140235A (en) |
| EA (1) | EA202091595A1 (en) |
| WO (1) | WO2019132690A1 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111865926A (en) * | 2020-06-24 | 2020-10-30 | 深圳壹账通智能科技有限公司 | Call channel construction method and device based on double models and computer equipment |
| CN111785291B (en) * | 2020-07-02 | 2024-07-02 | 北京捷通华声科技股份有限公司 | Voice separation method and voice separation device |
| US11790921B2 (en) * | 2020-08-04 | 2023-10-17 | OTO Systems Inc. | Speaker separation based on real-time latent speaker state characterization |
| CN112750440B (en) * | 2020-12-30 | 2023-12-29 | 北京捷通华声科技股份有限公司 | Information processing method and device |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP3745403B2 (en) * | 1994-04-12 | 2006-02-15 | ゼロックス コーポレイション | Audio data segment clustering method |
| US9043207B2 (en) * | 2009-11-12 | 2015-05-26 | Agnitio S.L. | Speaker recognition from telephone calls |
| RU2530314C1 (en) * | 2013-04-23 | 2014-10-10 | Общество с ограниченной ответственностью "ЦРТ-инновации" | Method for hybrid generative-discriminative segmentation of speakers in audio-flow |
2017
- 2017-12-27 WO PCT/RU2017/000990 patent/WO2019132690A1/en not_active Ceased
- 2017-12-27 EA EA202091595A patent/EA202091595A1/en unknown
- 2017-12-27 KR KR1020207021848A patent/KR20200140235A/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| KR20200140235A (en) | 2020-12-15 |
| WO2019132690A1 (en) | 2019-07-04 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| TWI643184B (en) | Method and apparatus for speaker diarization | |
| EA202091595A1 (en) | METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER | |
| JP6954680B2 (en) | Speaker confirmation method and speaker confirmation device | |
| Bee | Treefrogs as animal models for research on auditory scene analysis and the cocktail party problem | |
| GB2566215A (en) | Voice user interface | |
| CN104036774B (en) | Tibetan dialect recognition methods and system | |
| EP2806425A3 (en) | System and method for speaker verification | |
| DE602004023134D1 | LANGUAGE RECOGNITION AND SYSTEM ADAPTED TO THE CHARACTERISTICS OF NON-NATIVE SPEAKERS | |
| EP4235646A3 (en) | Adaptive audio enhancement for multichannel speech recognition | |
| EP4084000A3 (en) | Neural networks for speaker verification | |
| WO2016126768A3 (en) | Conference word cloud | |
| Sun et al. | Speaker diarization system for RT07 and RT09 meeting room audio | |
| WO2021074721A3 (en) | System for automatic assessment of fluency in spoken language and a method thereof | |
| WO2012129255A3 (en) | Systems and methods for segmenting and/or classifying an audio signal from transformed audio information | |
| GB201105415D0 (en) | A speech processing system and method | |
| WO2014025682A3 (en) | Acoustic data selection for training the parameters of an acoustic model | |
| EP2963643A3 (en) | Entity name recognition | |
| EP4425488A3 (en) | Acoustic model training using corrected terms | |
| CN103730112B (en) | Multi-channel voice simulation and acquisition method | |
| WO2014115115A3 (en) | Determining apnea-hypopnia index ahi from speech | |
| US11528571B1 (en) | Microphone occlusion detection | |
| RU2015141805A (en) | SIMULATION OF ACOUSTIC PULSE RESPONSE | |
| CN102831890A (en) | Method for recognizing text-independent voice prints | |
| JP6480124B2 (en) | Biological detection device, biological detection method, and program | |
| CN113921026B (en) | Voice enhancement method and device |