CN115376516B - Voiceprint recognition method, device, equipment and storage medium based on twin voiceprint pairs - Google Patents

Voiceprint recognition method, device, equipment and storage medium based on twin voiceprint pairs

Info

Publication number
CN115376516B
CN115376516B (application CN202110514062.4A)
Authority
CN
China
Prior art keywords
voiceprint
twin
pair
speaker
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110514062.4A
Other languages
Chinese (zh)
Other versions
CN115376516A (en)
Inventor
于乐
张峰
李祥军
张弘扬
马禹昇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110514062.4A priority Critical patent/CN115376516B/en
Publication of CN115376516A publication Critical patent/CN115376516A/en
Application granted granted Critical
Publication of CN115376516B publication Critical patent/CN115376516B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04: Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the technical field of voice recognition and discloses a voiceprint recognition method, device, equipment and storage medium based on twin voiceprint pairs. The method comprises the steps of: extracting a voiceprint feature set to be identified from the voice to be recognized; determining the voiceprint target group to be matched that corresponds to the voice to be recognized, the group comprising a plurality of voiceprint clusters; performing twin voiceprint pair matching on the voiceprint feature set to be identified according to the voiceprint clusters; determining, from the matching result, the overall coverage information of the voiceprint target group to be matched over the voiceprint feature set to be identified; and, when the overall coverage information meets a preset condition, judging that the voice to be recognized belongs to the voiceprint target group to be matched.

Description

Voiceprint recognition method, device, equipment and storage medium based on twin voiceprint pairs
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a voiceprint recognition method, apparatus, device, and storage medium based on a twin voiceprint pair.
Background
Voiceprint recognition (also known as speaker recognition) enables a machine to automatically identify a speaker from speech. Each person has a unique voiceprint: on one hand, people's vocal organs differ in shape and size, so their voices differ in pitch, timbre and the like; on the other hand, each person has unique speaking habits, so wording, rhythm, pronunciation and the like differ from speaker to speaker. This uniqueness of the voiceprint makes it feasible to identify a speaker by speech.
Voiceprint technology is widely used in the judicial and investigative fields, where recordings are critical and unique evidence in many major cases such as kidnapping for ransom and terrorist threats. By comparing collected speech with a criminal suspect's voice through voiceprint recognition, a relatively objective identity judgment can be obtained and used as an auxiliary basis for judicial decisions.
With the development of telecommunication network technology in China, many criminal gangs have in recent years rapidly developed and spread non-contact fraud carried out by means of communication tools such as mobile phones, fixed-line telephones and the Internet, causing great losses to the public. Telecommunication fraud currently shows a trend toward gang crime: multiple criminals divide the work clearly, each take their own responsibilities, and cooperate according to a prepared 'script', which brings great difficulty to operators and regulators.
The prior art is designed and developed only for a single speaker; applied to gang crime, it can only identify suspects one by one, which is time-consuming, labor-intensive, inefficient, and not highly accurate.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide a voiceprint recognition method, device, equipment and storage medium based on twin voiceprint pairs, aiming to solve the technical problem that existing voice recognition technology is essentially limited to recognizing and detecting individuals, and is therefore time-consuming, labor-intensive and inefficient.
In order to achieve the above object, the present invention provides a voiceprint recognition method based on a twin voiceprint pair, the method comprising the steps of:
extracting a voiceprint feature set to be identified from the voice to be recognized;
determining a voiceprint target group to be matched corresponding to the voice to be recognized, wherein the voiceprint target group to be matched comprises a plurality of voiceprint clusters;
performing twin voiceprint pair matching on the voiceprint feature set to be identified according to the voiceprint clusters, and determining overall coverage information of the voiceprint target group to be matched over the voiceprint feature set to be identified according to a matching result;
and when the overall coverage information meets a preset condition, judging that the voice to be recognized belongs to the voiceprint target group to be matched.
Preferably, the step of performing twin voiceprint pair matching on the voiceprint feature set to be identified according to the voiceprint cluster, and determining overall coverage information of the voiceprint target group to be matched on the voiceprint feature set to be identified according to a matching result includes:
traversing the plurality of voiceprint clusters;
obtaining the twin voiceprint pair contained in the currently traversed voiceprint cluster, wherein the twin voiceprint pair contains at least two speakers, and each speaker is pre-configured with a voiceprint recognition model;
determining the currently traversed voiceprint cluster's coverage of the voiceprint feature set to be identified according to the voiceprint recognition models corresponding to the twin voiceprint pair;
and when the traversal is finished, determining the overall coverage information of the voiceprint target group to be matched over the voiceprint feature set to be identified according to each voiceprint cluster's acquired coverage of the voiceprint feature set to be identified.
Preferably, the voiceprint feature set to be identified comprises a plurality of voiceprint features to be matched;
the step of determining the currently traversed voiceprint cluster's coverage of the voiceprint feature set to be identified according to the voiceprint recognition models corresponding to the twin voiceprint pair comprises the following steps:
traversing the voiceprint feature set to be identified to obtain currently traversed voiceprint features to be matched;
obtaining hit conditions of currently traversed voiceprint features to be matched according to the corresponding voiceprint recognition model of the twin voiceprint pair, wherein the hit conditions comprise hit or miss;
when the traversal of the voiceprint feature set to be identified is finished, counting the proportion of hit voiceprint features to be matched in the voiceprint feature set to be identified;
and determining the currently traversed voiceprint cluster's coverage of the voiceprint feature set to be identified according to the proportion.
Preferably, the step of obtaining the hit condition of the currently traversed voiceprint feature to be matched according to the voiceprint recognition models corresponding to the twin voiceprint pair includes:
Respectively calculating model matching scores corresponding to currently traversed voiceprint features to be matched according to different voiceprint recognition models corresponding to the twin voiceprint pairs;
comparing the calculated model matching score with an initial threshold value;
if any calculated model matching score is smaller than the initial threshold value, judging that the currently traversed voiceprint cluster does not hit the currently traversed voiceprint feature to be matched;
if no calculated model matching score is smaller than the initial threshold value, selecting the maximum score from the calculated model matching scores;
comparing the maximum model matching score with a preset judgment threshold value;
If the maximum model matching score is greater than or equal to the preset judgment threshold value, judging that the currently traversed voiceprint cluster hits the currently traversed voiceprint feature to be matched;
And if the maximum model matching score is smaller than the preset judgment threshold value, judging that the currently traversed voiceprint cluster does not hit the currently traversed voiceprint feature to be matched.
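The scoring logic above (miss if any model score falls below the initial threshold; otherwise hit only when the best score reaches the decision threshold) can be sketched as follows. The function name and threshold values are illustrative placeholders, not values from the patent:

```python
def is_hit(scores, initial_threshold=0.5, decision_threshold=0.7):
    """Decide whether the currently traversed voiceprint cluster hits one
    voiceprint feature to be matched.

    scores: model matching scores of the feature under each voiceprint
    recognition model of the cluster's twin voiceprint pair.
    """
    # Miss if any model's matching score is smaller than the initial threshold.
    if any(s < initial_threshold for s in scores):
        return False
    # Otherwise, hit only when the maximum score reaches the decision threshold.
    return max(scores) >= decision_threshold
```

For instance, `is_hit([0.8, 0.75])` hits, while `is_hit([0.8, 0.4])` misses at the initial threshold and `is_hit([0.6, 0.65])` misses at the decision threshold.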
Preferably, after the step of determining that the currently traversed voiceprint cluster does not hit the currently traversed voiceprint feature to be matched, the method further includes:
traversing the rest voiceprint clusters in the plurality of voiceprint clusters, and returning to the step of acquiring the twin voiceprint pairs contained in the currently traversed voiceprint clusters.
Preferably, before the step of extracting the to-be-recognized voiceprint feature set of the to-be-recognized voice, the method further includes:
Acquiring voice samples of each speaker in a speaker group;
modeling each speaker, and training the constructed model according to the voice sample to obtain a voiceprint recognition model;
In the speaker group, calculating a model matching score corresponding to the voiceprint characteristics of each speaker by using the voiceprint recognition model corresponding to each speaker one by one;
And determining a twin voiceprint pair of the speaker group according to the calculated model matching score, and constructing a voiceprint target group of the speaker group according to the twin voiceprint pair.
Preferably, the step of determining a twin voiceprint pair of the speaker group according to the calculated model matching score and constructing a voiceprint target group of the speaker group according to the twin voiceprint pair comprises:
constructing a model matching score set corresponding to each voiceprint recognition model according to the calculated model matching scores;
Traversing the speaker group, and acquiring a target voiceprint recognition model and target voiceprint characteristics corresponding to the currently traversed speaker;
searching a target model matching score set corresponding to the target voiceprint recognition model from the model matching score set;
reading target model matching scores corresponding to the target voiceprint features from the target model matching score set;
searching the maximum model matching scores except the target model matching scores in the target model matching score set;
determining the target speaker to which the maximum model matching score belongs, and constructing a twin voiceprint pair from the currently traversed speaker and the target speaker;
when the traversal of the speaker group is completed, a twin voiceprint pair corresponding to each speaker is obtained;
and constructing a voiceprint cluster to which each twin voiceprint pair belongs, and constructing a voiceprint target group of the speaker group according to all the voiceprint clusters.
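As a hedged illustration of the twin-pair construction above: for each speaker m, the matching scores under m's recognition model are scanned and the best-scoring other speaker becomes m's twin; mutual twins are then grouped into a cluster. The nested-dict score layout and the restriction to mutual pairs are simplifying assumptions of this sketch:

```python
def build_twin_pairs(score, speakers):
    """score[m][s]: matching score of speaker s's voiceprint feature under
    speaker m's voiceprint recognition model. For each speaker m, the twin
    m' is the other speaker whose feature scores highest under m's model."""
    pairs = {}
    for m in speakers:
        others = [s for s in speakers if s != m]
        pairs[m] = max(others, key=lambda s: score[m][s])
    return pairs


def build_clusters(pairs):
    """Group twins into voiceprint clusters; simplified here to the mutual
    case: m and m' form a cluster when each is the other's twin."""
    return sorted((m, t) for m, t in pairs.items() if pairs[t] == m and m < t)
```

With three speakers A, B, C where A and B score highest under each other's models, `build_twin_pairs` pairs A with B (and C one-sidedly with its best match), and `build_clusters` keeps only the mutual pair (A, B) as a cluster.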
In addition, in order to achieve the above object, the present invention also proposes a voiceprint recognition device based on a twin voiceprint pair, the device comprising:
The feature extraction module is used for extracting a voiceprint feature set to be recognized of the voice to be recognized;
The voiceprint group acquisition module is used for determining a voiceprint target group to be matched corresponding to the voice to be recognized, wherein the voiceprint target group to be matched comprises a plurality of voiceprint clusters;
The voiceprint pair matching module is used for carrying out twin voiceprint pair matching on the voiceprint feature set to be identified according to the voiceprint cluster, and determining the overall coverage information of the voiceprint target group to be matched on the voiceprint feature set to be identified according to a matching result;
And the result judging module is used for judging that the voice to be recognized belongs to the voiceprint target group to be matched when the overall coverage information meets a preset condition.
In addition, to achieve the above object, the invention also provides a voiceprint recognition device based on twin voiceprint pairs, comprising a memory, a processor, and a voiceprint recognition program based on twin voiceprint pairs that is stored in the memory and executable on the processor, the program being configured to implement the steps of the voiceprint recognition method based on twin voiceprint pairs described above.
In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon a voiceprint recognition program based on a twin voiceprint pair, which when executed by a processor, implements the steps of the voiceprint recognition method based on a twin voiceprint pair as described above.
By extracting the voiceprint feature set to be identified of the voice to be recognized, determining the corresponding voiceprint target group to be matched (comprising a plurality of voiceprint clusters), performing twin voiceprint pair matching according to the voiceprint clusters, and judging from the resulting overall coverage information whether the voice to be recognized belongs to the voiceprint target group to be matched, the method recognizes a speaker through the twin voiceprint pairs in pre-constructed voiceprint clusters. Whether a speaker belongs to a speaker group can therefore be judged as a whole, without recognizing the group's speakers one by one, which improves recognition efficiency and gives a clear advantage in voiceprint group detection.
Drawings
FIG. 1 is a schematic structural diagram of a voiceprint recognition device based on a twin voiceprint pair in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of a voiceprint recognition method based on twin voiceprint pairs according to the present invention;
FIG. 3 is a schematic structural diagram of a voiceprint cluster in a first embodiment of a voiceprint recognition method based on twin voiceprint pairs according to the present invention;
FIG. 4 is a flowchart of a second embodiment of a voiceprint recognition method based on a twin voiceprint pair according to the present invention;
FIG. 5 is a flowchart of a third embodiment of a voiceprint recognition method based on a twin voiceprint pair according to the present invention;
fig. 6 is a block diagram of a first embodiment of a voiceprint recognition device based on twin voiceprint pairs according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a voiceprint recognition device based on a twin voiceprint pair in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the voiceprint recognition device based on twin voiceprint pairs may include a processor 1001 such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a wireless fidelity (WI-FI) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM) such as disk storage. The memory 1005 may optionally also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of a voiceprint recognition device based on twin voiceprint pairs, and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a data storage module, a network communication module, a user interface module, and a voiceprint recognition program based on a pair of twin voiceprints may be included in the memory 1005 as one storage medium.
In the voiceprint recognition device based on twin voiceprint pairs shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. Through the processor 1001, the device invokes the voiceprint recognition program based on twin voiceprint pairs stored in the memory 1005 and executes the voiceprint recognition method based on twin voiceprint pairs provided by the embodiments of the invention.
The embodiment of the invention provides a voiceprint recognition method based on a twin voiceprint pair, and referring to fig. 2, fig. 2 is a flow chart of a first embodiment of the voiceprint recognition method based on the twin voiceprint pair.
In this embodiment, the voiceprint recognition method based on the twin voiceprint pair includes the following steps:
Step S10, extracting a voiceprint feature set to be recognized of the voice to be recognized;
It should be noted that, the execution body of the method of the present embodiment may be a computing service device having functions of voice collection, data processing, network communication and program running, for example, a smart phone, a tablet computer, a personal computer, or other electronic devices having the same or similar functions, which is not limited in this embodiment.
It should be understood that the voice to be recognized may be collected user voice; for example, when a user makes a call, the user's voice may be collected to obtain the voice to be recognized. The voiceprint feature set to be identified is the feature set obtained after voiceprint feature extraction from the voice to be identified, and may contain a plurality of voiceprint features to be matched, such as [x1, x2, ...].
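As a toy illustration of what the feature set [x1, x2, ...] might look like, the sketch below frames a signal and computes one simple feature (log energy) per frame. A real system would extract MFCC, i-vector or x-vector features; the frame sizes here are arbitrary assumptions:

```python
import math

def extract_feature_set(signal, frame_len=400, hop=160):
    """Frame the signal and compute one log-energy value per frame, standing
    in for the voiceprint features to be matched [x1, x2, ...]."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean frame energy
        features.append(math.log1p(energy))             # compress dynamic range
    return features
```

A 1600-sample signal with these settings yields 8 overlapping frames, hence a feature set of 8 values.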
Step S20, determining a voiceprint target group to be matched corresponding to the voice to be recognized, wherein the voiceprint target group to be matched comprises a plurality of voiceprint clusters;
it should be noted that, before executing the step S10 in this embodiment, the voiceprint group corresponding to the voice to be recognized needs to be constructed, that is, the voiceprint target group to be matched, and generally, there may be a plurality of voiceprint clusters or only one voiceprint cluster in one voiceprint group, and the number of voiceprint clusters may be different according to different actual situations.
In this embodiment, the voiceprint group may be constructed by first collecting voice samples of all speakers in a speaker group (for example, a fraud phone group), then modeling and identifying the speakers in the group, determining a (twin) voiceprint cluster according to the identification result, and finally constructing the voiceprint group according to the (twin) voiceprint cluster. In practical application, the voiceprint group can be constructed for fraud partners in different areas according to the call records of the checked or suspected fraud telephone partners, so that subsequent voiceprint recognition can be carried out on individuals in the fraud telephone partners.
It is emphasized that, since fraud telephone gangs are geographically dispersed and the voiceprint groups of different gangs differ (for example, the voiceprint groups of gangs A and B in different areas can essentially never be the same), if the currently acquired voice to be identified originates from gang A, matching it against gang B's voiceprint group would obviously be inaccurate. In this step, the voiceprint target group to be matched may be determined as follows: first determine the voice source of the voice to be recognized; then query the call terminal (such as a mobile phone, computer, or smart wearable device) corresponding to the voice source; then determine the fraud telephone gang closest to the call terminal according to its position (such as geographic position or IP address) or daily activity area; and finally take the voiceprint group of that gang as the voiceprint target group to be matched.
Step S30, twin voiceprint pair matching is carried out on the voiceprint feature set to be identified according to the voiceprint cluster, and overall coverage information of the voiceprint target group to be matched to the voiceprint feature set to be identified is determined according to a matching result;
It should be noted that a voiceprint cluster is formed from twin voiceprint pairs, and a twin voiceprint pair can be determined by the voiceprint recognition model (built and trained in advance) corresponding to each speaker. Specifically, the voiceprint features of every speaker in the group can be scored one by one with each speaker's voiceprint recognition model; then, according to each model's scoring results, the other speaker m' whose voiceprint feature scores highest under the current speaker m's model is found in the group, and speaker m' is called the twin voiceprint pair of speaker m. Referring to fig. 3, fig. 3 is a schematic structural diagram of a voiceprint cluster in the first embodiment of the voiceprint recognition method based on twin voiceprint pairs according to the present invention; as shown in fig. 3, the voiceprint group includes three voiceprint clusters, where speaker m in one voiceprint cluster forms a twin voiceprint pair with speaker m'.
In practical application, traversing the speaker group in the above manner yields all twin voiceprint pairs in the group. These pairs are then analyzed: if, in a twin voiceprint pair (m, m'), speaker m has a twin voiceprint pair relationship only with speaker m' and with no speaker outside the pair, speakers m and m' are said to form a voiceprint cluster.
It should be understood that, in this embodiment, the voiceprint recognition model constructed for a speaker may be any one of a GMM mean-supervector model, an identity-vector (i-vector) based model, or an x-vector model (a deep neural network trained to distinguish speakers, mapping variable-length utterances to fixed-dimensional embeddings); this embodiment and the following embodiments do not limit the specific model.
In a specific implementation, after the voiceprint feature set to be identified is obtained, it can be matched against the twin voiceprint pairs contained in each voiceprint cluster. First, the voiceprint recognition model (built and trained in advance) corresponding to each speaker in a twin voiceprint pair is obtained; then the voiceprint features to be matched [x1, x2, ...] in the feature set are scored by each model, the scores are compared with preset thresholds, and whether the voiceprint cluster hits each voiceprint feature to be matched is judged from the comparison. If the proportion of features in the set [x1, x2, ...] hit by all speakers in a cluster's twin voiceprint pair exceeds a set threshold (such as 80%), that cluster's coverage of the feature set to be identified is deemed to reach the standard.
For example, suppose voiceprint cluster c contains a twin voiceprint pair (m, m') and its hit rate over the voiceprint feature set to be identified [x1, x2, ..., x10] is 80%, that is, 8 of the 10 voiceprint features are hit by cluster c; then cluster c is determined to reach the coverage standard for the feature set [x1, x2, ..., x10].
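The 8-of-10 example reduces to a simple proportion check; the sketch below (function name and default threshold are illustrative, and whether the comparison is strict or inclusive is an implementation choice — inclusive is assumed here) computes a cluster's hit rate and whether its coverage reaches the standard:

```python
def cluster_coverage(hits, coverage_threshold=0.8):
    """hits: one boolean per voiceprint feature to be matched, True when the
    cluster hit that feature. Returns the hit rate and whether the cluster's
    coverage of the feature set reaches the standard."""
    hit_rate = sum(hits) / len(hits)
    return hit_rate, hit_rate >= coverage_threshold
```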
It can be understood that once each voiceprint cluster's coverage of the voiceprint feature set to be identified is obtained, the overall coverage information of the voiceprint target group to be matched over the feature set can be determined.
Step S40, when the overall coverage information meets a preset condition, judging that the voice to be recognized belongs to the voiceprint target group to be matched.
It should be noted that, in this embodiment, the preset conditions may be set according to actual situations, for example, when the number of voiceprint clusters with coverage reaching the standard of the voiceprint feature set to be identified in the entire voiceprint group exceeds a set percentage (for example, 80%) of the total number of voiceprint clusters in the entire voiceprint group, it is determined that the overall coverage information meets the preset conditions, that is, the voice to be identified belongs to the voiceprint target group to be matched, so as to realize quick determination on whether the speaker belongs to the specific target group.
In this embodiment, the voiceprint feature set to be identified is extracted from the voice to be recognized; the corresponding voiceprint target group to be matched, comprising a plurality of voiceprint clusters, is determined; twin voiceprint pair matching is performed on the feature set according to the voiceprint clusters; and the overall coverage information of the target group over the feature set is determined from the matching result, the voice to be recognized being judged to belong to the target group when that information meets the preset condition. Because a speaker is recognized through the twin voiceprint pairs in pre-constructed voiceprint clusters, whether the speaker belongs to a speaker group can be judged as a whole, without recognizing the members of the group one by one; this improves recognition efficiency and gives a clear advantage in voiceprint group detection.
Referring to fig. 4, fig. 4 is a flowchart of a second embodiment of a voiceprint recognition method based on a twin voiceprint pair according to the present invention.
Based on the first embodiment, in this embodiment, the step S30 includes:
Step S301, traversing the plurality of voiceprint clusters;
It should be understood that there may be a plurality of voiceprint clusters in the voiceprint target group to be matched. To accurately obtain each cluster's coverage of the voiceprint feature set to be identified, one voiceprint cluster is selected at a time by traversal, and the following operations of this embodiment are performed on it.
Step S302, a twin voiceprint pair contained in a voiceprint cluster traversed currently is obtained, wherein the twin voiceprint pair contains at least two speakers, and each speaker is pre-configured with a voiceprint recognition model;
As described in the first embodiment, the voiceprint clusters are formed by twin voiceprint pairs, so that each twin voiceprint pair contains at least two speakers, and each speaker is preconfigured and trained with a voiceprint recognition model during the voiceprint cluster construction stage. For example, in the pair of twin voiceprints (m and m '), speaker m has a pre-configured voiceprint recognition model A1, speaker m' has a pre-configured voiceprint recognition model A2, and so on.
Step S303, determining the voiceprint cluster coverage condition of the currently traversed voiceprint cluster with respect to the voiceprint feature set to be identified according to the voiceprint recognition models corresponding to its twin voiceprint pairs;
It should be noted that the voiceprint cluster coverage condition is the hit condition of the voiceprint cluster on the voiceprint feature set to be identified: when a voiceprint feature to be matched is scored by the voiceprint recognition models of all twin voiceprint pairs of a voiceprint cluster, if no score is smaller than the initial threshold and the highest score exceeds the decision threshold, the voiceprint cluster is considered to hit that voiceprint feature. Similarly, if the hit rate of a voiceprint cluster over all the voiceprint features to be matched in the voiceprint feature set to be identified exceeds a set threshold (e.g. 80%), the coverage of the voiceprint feature set by that cluster is judged to meet the standard.
Step S304, when the traversal is finished, determining the overall coverage information of the voiceprint target group to be matched with respect to the voiceprint feature set to be identified according to the acquired voiceprint cluster coverage condition of each voiceprint cluster.
It can be understood that when all the voiceprint clusters in the voiceprint target group to be matched have been traversed in the above manner, the voiceprint cluster coverage condition of each voiceprint cluster with respect to the voiceprint feature set to be identified is obtained. For example, suppose the voiceprint target group to be matched includes 5 voiceprint clusters (M1, M2, M3, M4, M5). If the coverage of the voiceprint feature set [x1, x2, ..., x5] by clusters M1, M2, M3 and M5 meets the standard while the coverage by M4 does not, the overall coverage information is that the voiceprint target group covers 80% of the voiceprint feature set; if this overall coverage exceeds the set percentage, it can be determined that the voice to be recognized belongs to the voiceprint target group to be matched.
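The overall-coverage decision described above can be sketched as follows; the cluster names, coverage flags, and the 80% threshold are illustrative assumptions, not values fixed by this embodiment.

```python
def overall_coverage(cluster_coverage_ok, min_overall=0.8):
    """cluster_coverage_ok maps each voiceprint cluster to whether its
    coverage of the voiceprint feature set met the standard."""
    ratio = sum(cluster_coverage_ok.values()) / len(cluster_coverage_ok)
    # The voice belongs to the target group when overall coverage
    # reaches the set percentage.
    return ratio, ratio >= min_overall

# The 5-cluster example from the text: M4 fails, the other four pass.
coverage = {"M1": True, "M2": True, "M3": True, "M4": False, "M5": True}
ratio, belongs = overall_coverage(coverage)
print(ratio, belongs)  # 0.8 True
```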
Further, in order to ensure accurate acquisition of the voiceprint cluster coverage, in this embodiment, the step S303 may specifically include:
Step S3031, traversing the voiceprint feature set to be identified to obtain the currently traversed voiceprint feature to be matched;
Step S3032, obtaining the hit condition of the currently traversed voiceprint feature to be matched according to the voiceprint recognition models corresponding to the twin voiceprint pairs, wherein the hit condition is either hit or miss;
Step S3033, counting, when the traversal of the voiceprint feature set to be identified is finished, the proportion of hit voiceprint features to be matched in the voiceprint feature set to be identified;
Step S3034, determining the voiceprint cluster coverage condition of the currently traversed voiceprint cluster with respect to the voiceprint feature set to be identified according to that proportion.
It should be noted that, to ensure the accuracy of the recognition result, this embodiment preferably matches the voiceprint features to be matched in the voiceprint feature set to be recognized one by one in a traversal manner; there may be a plurality of such features, e.g. [x1, x2, ..., x5]. In this embodiment, the currently traversed voiceprint feature to be matched refers to the voiceprint feature currently input into the voiceprint recognition models for scoring.
It should be understood that, for one voiceprint cluster (assuming it contains only one twin voiceprint pair), if all the voiceprint features in the voiceprint feature set to be identified [x1, x2, ..., x5] are hit by the voiceprint cluster, the coverage of the voiceprint feature set by that cluster is determined to meet the standard; if the proportion of hit features falls below the set value (e.g. less than 80%), the coverage is determined not to meet the standard.
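As a minimal sketch of steps S3031-S3034, the per-cluster coverage check might look like the following; the `is_hit` stub and the 80% ratio are assumptions for illustration only.

```python
def cluster_coverage(features, is_hit, min_ratio=0.8):
    """Traverse the voiceprint feature set, count hits, and compare the
    proportion of hit features against the set value (steps S3031-S3034)."""
    hits = sum(1 for x in features if is_hit(x))
    return hits / len(features) >= min_ratio

# Stub hit test standing in for the twin-voiceprint-pair model scoring.
hit_set = {"x1", "x2", "x4", "x5"}
print(cluster_coverage(["x1", "x2", "x3", "x4", "x5"], hit_set.__contains__))
# True: 4 of 5 features hit, i.e. an 80% proportion
```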
Further, the specific implementation manner of the step S3032 in this embodiment may include the following steps:
Step S1, respectively calculating the model matching scores corresponding to the currently traversed voiceprint feature to be matched according to the different voiceprint recognition models corresponding to the twin voiceprint pair;
It should be appreciated that once the currently traversed voiceprint cluster is determined, its twin voiceprint pairs are determined, and so are the voiceprint recognition models corresponding to the speakers of those pairs. For example, suppose the voiceprint cluster c contains the twin voiceprint pair (m, m'), with voiceprint recognition model A1 corresponding to speaker m and model A2 corresponding to speaker m', and the currently traversed voiceprint feature to be matched is x1. By inputting x1 into the voiceprint recognition models A1 and A2, the corresponding model matching scores A1(x1) and A2(x1) can be calculated.
Step S2, comparing the calculated model matching scores with an initial threshold;
In a specific implementation, after the model matching scores A1(x1) and A2(x1) are calculated, they can be compared with an initial threshold (whose specific value can be set according to actual conditions), and it is then judged from the comparison result whether either of A1(x1) and A2(x1) is smaller than the initial threshold.
Of course, when the twin voiceprint pairs correspond to a larger number of voiceprint recognition models, this embodiment may also adopt a traversal manner: one model is selected at a time to calculate the model matching score of the currently traversed voiceprint feature to be matched, and that score is immediately compared with the initial threshold. Once a calculated score is smaller than the initial threshold, the feature matching operation on the current voiceprint cluster is abandoned and the cluster is directly determined to miss the currently traversed voiceprint feature to be matched, which saves feature matching time and improves voiceprint recognition efficiency.
Step S3, if any calculated model matching score is smaller than the initial threshold, judging that the currently traversed voiceprint cluster does not hit the currently traversed voiceprint feature to be matched;
It can be understood that, as described above, if any calculated model matching score, for example A2(x1), is smaller than the initial threshold, it can be directly determined that the currently traversed voiceprint cluster does not hit the currently traversed voiceprint feature to be matched; the current voiceprint cluster is then abandoned and feature matching continues with the next voiceprint cluster, that is, the flow returns to step S301 to traverse the remaining voiceprint clusters.
Step S4, if none of the calculated model matching scores is smaller than the initial threshold, selecting the maximum model matching score from the calculated model matching scores;
Accordingly, if none of the calculated model matching scores is smaller than the initial threshold, that is, both A1(x1) and A2(x1) are greater than the initial threshold, the largest of them, namely the maximum model matching score, is selected, and the following step S5 is performed.
Step S5, comparing the maximum model matching score with a preset decision threshold;
Step S6, if the maximum model matching score is greater than or equal to the preset decision threshold, judging that the currently traversed voiceprint cluster hits the currently traversed voiceprint feature to be matched;
Step S7, if the maximum model matching score is smaller than the preset decision threshold, judging that the currently traversed voiceprint cluster does not hit the currently traversed voiceprint feature to be matched.
It should be noted that in this embodiment the value of the preset decision threshold is adjustable, and its absolute value is greater than the absolute value of the initial threshold. In practical application, if none of the calculated model matching scores is smaller than the initial threshold but the largest of them is still lower than the preset decision threshold, the currently traversed voiceprint cluster is still judged to miss the currently traversed voiceprint feature to be matched.
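Steps S1-S7 amount to a two-threshold hit test per voiceprint feature. A sketch under assumed score values (the thresholds and scores here are made up for illustration):

```python
def is_hit(model_scores, init_threshold, decision_threshold):
    """model_scores: matching scores of one voiceprint feature under every
    voiceprint recognition model of the cluster's twin voiceprint pairs."""
    if any(s < init_threshold for s in model_scores):
        return False  # step S3: a score fell below the initial threshold
    # steps S5-S7: the highest score must reach the decision threshold
    return max(model_scores) >= decision_threshold

print(is_hit([0.62, 0.91], 0.50, 0.85))  # True: no score below 0.50, max 0.91 >= 0.85
print(is_hit([0.40, 0.91], 0.50, 0.85))  # False: 0.40 is below the initial threshold
print(is_hit([0.62, 0.70], 0.50, 0.85))  # False: max 0.70 is below the decision threshold
```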
In addition, for the voiceprint feature set to be identified [x1, x2, ..., x5], if the voiceprint feature to be matched x1 is hit by some voiceprint cluster c in the voiceprint target group to be matched, a traversal manner can be adopted to directly select one of the remaining voiceprint clusters to perform feature matching on the next voiceprint feature to be matched x2, until all the voiceprint clusters in the voiceprint target group to be matched have been traversed.
In this embodiment, by traversing all the voiceprint clusters in the voiceprint target group to be matched and determining the hit condition of each cluster on the voiceprint features in the voiceprint feature set to be identified according to the pre-constructed voiceprint recognition models, the voiceprint cluster coverage condition of each cluster can be confirmed accurately and comprehensively. At the same time, since the overall coverage information of the voiceprint target group is determined from those voiceprint cluster coverage conditions, the accuracy and reliability of the final recognition result are effectively guaranteed.
Referring to fig. 5, fig. 5 is a flowchart of a third embodiment of a voiceprint recognition method based on a twin voiceprint pair according to the present invention.
Based on the above embodiments, in this embodiment, before the step S10, the method further includes constructing a voiceprint target group, which specifically includes the following steps:
Step S01, obtaining voice samples of each speaker in a speaker group;
It should be noted that the speaker group in this embodiment may be determined by the specific application scenario, for example a fraud phone group, and the voice samples of each speaker in the group may be obtained during the speakers' calls.
Step S02, modeling each speaker, and training the constructed model according to the voice sample to obtain a voiceprint recognition model;
In this step, the speaker may be modeled by taking one of a GMM mean supervector model, an i-vector model and an x-vector model as the initial voiceprint recognition model, then training that model on the corresponding voice samples, and taking the converged model as the voiceprint recognition model.
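The step above names GMM mean supervector, i-vector and x-vector models. As a drastically simplified, purely illustrative stand-in, a single diagonal Gaussian per speaker already shows the train-then-score pattern; the feature dimensions and synthetic data below are assumptions, not part of this embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_voiceprint_model(frames):
    """Fit one diagonal Gaussian per speaker -- a toy stand-in for the
    GMM supervector / i-vector / x-vector models named in the text."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # floor to avoid division by zero
    def score(x):
        # Average per-frame log-likelihood plays the role of the
        # model matching score.
        return float(np.mean(-0.5 * (np.log(2 * np.pi * var)
                                     + (x - mean) ** 2 / var).sum(axis=1)))
    return score

speaker_frames = rng.normal(1.0, 0.5, size=(200, 13))    # MFCC-like features
impostor_frames = rng.normal(-1.0, 0.5, size=(200, 13))

model = train_voiceprint_model(speaker_frames)
# The trained model scores its own speaker's frames higher than an impostor's.
print(model(speaker_frames) > model(impostor_frames))  # True
```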
Step S03, in the speaker group, calculating, one by one with the voiceprint recognition model corresponding to each speaker, the model matching scores corresponding to the voiceprint features of every speaker;
It should be noted that, for the trained voiceprint recognition model, after any voiceprint feature is input into the model, a model matching score that can represent the matching degree between the voiceprint feature and the voiceprint recognition model can be obtained.
For a (voiceprint recognition) model whose output grows the more likely the input is the target person, a higher model matching score indicates that the speaker to whom the voiceprint feature belongs is more likely to be the actual person corresponding to the model. Conversely, for a model whose output shrinks in that case, the opposite number of its output may be taken as the score.
Step S04, determining the twin voiceprint pairs of the speaker group according to the calculated model matching scores, and constructing the voiceprint target group of the speaker group according to the twin voiceprint pairs.
It will be appreciated that the model matching score reflects the degree of matching between the current voiceprint feature and a voiceprint recognition model, that is, between the speaker providing the current voiceprint feature and the actual person corresponding to that model. If, apart from the score computed by the speaker's own voiceprint recognition model, the score computed by some other voiceprint recognition model is the highest, then the speaker providing the current voiceprint feature and the speaker corresponding to that other highest-scoring model can be called a twin voiceprint pair.
In order to achieve accurate acquisition of the twin voiceprint pair, as an implementation manner, step S04 in this embodiment may specifically include:
Step S041, constructing a model matching score set corresponding to each voiceprint recognition model according to the calculated model matching scores;
Step S042, traversing the speaker group and acquiring a target voiceprint recognition model and target voiceprint characteristics corresponding to the currently traversed speaker;
Step S043, searching a target model matching score set corresponding to the target voiceprint recognition model from the model matching score set;
Step S044, reading target model matching scores corresponding to the target voiceprint features from the target model matching score set;
Step S045, searching for the maximum model matching score, other than the target model matching score, in the target model matching score set;
Step S046, determining a target speaker to which the maximum model matching score belongs, and constructing a twin voiceprint pair according to the current traversed speaker and the target speaker;
Step S047, when the traversal of the speaker group is completed, obtaining the twin voiceprint pair corresponding to each speaker;
Step S048, constructing the voiceprint cluster to which each twin voiceprint pair belongs, and constructing the voiceprint target group of the speaker group from all the voiceprint clusters.
The above steps S041-S048 are described here with a specific example. Suppose the speaker group includes three speakers (A, B, C), the voiceprint recognition models corresponding to the speakers are f_A(·), f_B(·), f_C(·) respectively, and the voiceprint feature (sets) provided by the speakers are (X_A, X_B, X_C) respectively.
After the calculation in steps S01-S03, the model matching score set corresponding to each voiceprint recognition model can be obtained: {[f_A(X_A), f_A(X_B), f_A(X_C)], [f_B(X_A), f_B(X_B), f_B(X_C)], [f_C(X_A), f_C(X_B), f_C(X_C)]}.
If the speaker traversed at the current moment is speaker A, the corresponding target voiceprint recognition model is f_A(·) and the target voiceprint feature is X_A. The target model matching score set [f_A(X_A), f_A(X_B), f_A(X_C)] corresponding to f_A(·) can then be looked up among the model matching score sets, the target model matching score f_A(X_A) corresponding to X_A is read from it, and the maximum model matching score other than f_A(X_A) is searched for in that set. Assuming it is f_A(X_C), the target speaker C to whom that maximum score belongs can be determined, and the twin voiceprint pair (A, C) can be constructed from the currently traversed speaker A and the target speaker C.
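Under the three-speaker example above, steps S041-S046 reduce to a row-wise maximum that excludes the speaker's own score; the numeric scores below are invented for illustration.

```python
# scores[p][q] stands in for f_p(X_q): the match score of speaker q's
# voiceprint features under speaker p's voiceprint recognition model.
scores = {
    "A": {"A": 9.1, "B": 2.3, "C": 5.7},
    "B": {"A": 1.8, "B": 8.8, "C": 2.1},
    "C": {"A": 5.2, "B": 2.0, "C": 9.4},
}

def twin_pair(speaker):
    row = scores[speaker]  # the target model matching score set
    # Maximum model matching score, excluding the target model matching
    # score of the speaker's own features (steps S044-S046).
    partner = max((q for q in row if q != speaker), key=row.get)
    return (speaker, partner)

print([twin_pair(s) for s in scores])  # [('A', 'C'), ('B', 'C'), ('C', 'A')]
```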
In a specific implementation, after all the twin voiceprint pairs corresponding to the speaker group have been constructed in the above manner, the twin voiceprint pairs can be analyzed, voiceprint clusters are formed according to the analysis result, and the voiceprint target group of the speaker group is finally constructed from the voiceprint clusters.
It should be noted that, when analyzing the twin voiceprint pairs, if a speaker m has a twin voiceprint pair relationship only with the speaker m' of its pair, and has no twin voiceprint pair relationship with any other speaker outside the pair, then m and m' are said to form a voiceprint cluster. In addition, the construction of twin voiceprint pairs in this embodiment is not limited to the specific manner described above; any other manner of determining the similarity of speakers' voiceprint features and constructing twin voiceprint pairs from that similarity may also be used, and the present invention is not limited in this respect.
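One way to read the cluster rule above: m and m' form a voiceprint cluster exactly when each is paired only with the other. A small sketch under that reading, with an invented pair list:

```python
from collections import defaultdict

def voiceprint_clusters(twin_pairs):
    # Collect, for every speaker, everyone they appear in a pair with.
    partners = defaultdict(set)
    for a, b in twin_pairs:
        partners[a].add(b)
        partners[b].add(a)
    clusters = []
    for a, b in twin_pairs:
        # m and m' cluster only when neither pairs with anyone else.
        if partners[a] == {b} and partners[b] == {a} and {a, b} not in clusters:
            clusters.append({a, b})
    return clusters

pairs = [("A", "C"), ("B", "C"), ("C", "A"), ("D", "E"), ("E", "D")]
# A, B and C are entangled with each other, so only (D, E) clusters.
print([sorted(c) for c in voiceprint_clusters(pairs)])  # [['D', 'E']]
```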
In this embodiment, voice samples of each speaker in the speaker group are obtained; each speaker is modeled and the constructed model is trained on the voice samples to obtain a voiceprint recognition model; in the speaker group, the model matching scores corresponding to the voiceprint features of every speaker are calculated one by one with the voiceprint recognition model corresponding to each speaker; finally, the twin voiceprint pairs of the speaker group are determined from the calculated model matching scores, and the voiceprint target group of the speaker group is constructed from the twin voiceprint pairs. Because this embodiment performs twin voiceprint pair analysis on the speaker group and then determines the voiceprint cluster structure in the voiceprint target group according to the analysis result, the fit between the finally constructed voiceprint group and the speaker group can be guaranteed.
In addition, an embodiment of the present invention further provides a storage medium, on which a voiceprint recognition program based on a twin voiceprint pair is stored; when executed by a processor, the program implements the steps of the voiceprint recognition method based on a twin voiceprint pair described above.
Referring to fig. 6, fig. 6 is a block diagram of a first embodiment of a voiceprint recognition device based on a twin voiceprint pair according to the present invention.
As shown in fig. 6, a voiceprint recognition device based on a twin voiceprint pair according to an embodiment of the present invention includes:
The feature extraction module 601 is configured to extract a voiceprint feature set to be recognized of a voice to be recognized;
The voiceprint group acquisition module 602 is configured to determine a voiceprint target group to be matched corresponding to the voice to be recognized, where the voiceprint target group to be matched includes a plurality of voiceprint clusters;
The voiceprint pair matching module 603 is configured to perform twin voiceprint pair matching on the voiceprint feature set to be identified according to the voiceprint cluster, and determine overall coverage information of the voiceprint target group to be matched on the voiceprint feature set to be identified according to a matching result;
And a result determining module 604, configured to determine that the voice to be recognized belongs to the target group of voiceprints to be matched when the overall coverage information meets a preset condition.
In this apparatus, the voiceprint feature set to be recognized is extracted from the voice to be recognized; the voiceprint target group to be matched, which comprises a plurality of voiceprint clusters, is determined for that voice; twin voiceprint pair matching is performed on the voiceprint feature set according to the voiceprint clusters; overall coverage information of the voiceprint target group with respect to the feature set is determined from the matching result; and when the overall coverage information meets a preset condition, the voice to be recognized is judged to belong to the voiceprint target group to be matched. Because speaker recognition is carried out through the twin voiceprint pairs contained in the pre-constructed voiceprint clusters, whether a speaker belongs to a voiceprint cluster can be judged as a whole, without matching the voice to be recognized against the speakers in the group one by one. This improves recognition efficiency and offers a clear advantage in voiceprint group detection.
Based on the first embodiment of the voiceprint recognition device based on the twin voiceprint pair, a second embodiment of the voiceprint recognition device based on the twin voiceprint pair is provided.
In this embodiment, the voiceprint pair matching module 603 is further configured to: traverse the plurality of voiceprint clusters; obtain the twin voiceprint pairs contained in the currently traversed voiceprint cluster, where each twin voiceprint pair contains at least two speakers and each speaker is pre-configured with a voiceprint recognition model; determine, according to the voiceprint recognition models corresponding to the twin voiceprint pairs, the voiceprint cluster coverage condition of the currently traversed voiceprint cluster with respect to the voiceprint feature set to be recognized; and, when the traversal is finished, determine the overall coverage information of the voiceprint target group to be matched with respect to the voiceprint feature set to be recognized according to the obtained voiceprint cluster coverage condition of each voiceprint cluster.
Further, the voiceprint pair matching module 603 is further configured to: traverse the voiceprint feature set to be identified to obtain the currently traversed voiceprint feature to be matched; obtain, according to the voiceprint recognition models corresponding to the twin voiceprint pairs, the hit condition of the currently traversed voiceprint feature to be matched, where the hit condition is either hit or miss; count, when the traversal of the voiceprint feature set to be identified is completed, the proportion of hit voiceprint features to be matched in the voiceprint feature set to be identified; and determine, according to that proportion, the voiceprint cluster coverage condition of the currently traversed voiceprint cluster with respect to the voiceprint feature set to be identified.
Further, the voiceprint pair matching module 603 is further configured to: respectively calculate, according to the different voiceprint recognition models corresponding to the twin voiceprint pair, the model matching scores corresponding to the currently traversed voiceprint feature to be matched; compare the calculated model matching scores with an initial threshold; if any calculated model matching score is smaller than the initial threshold, determine that the currently traversed voiceprint cluster does not hit the currently traversed voiceprint feature to be matched; if none of the calculated model matching scores is smaller than the initial threshold, select the maximum model matching score from the calculated model matching scores and compare it with a preset decision threshold; if the maximum model matching score is greater than or equal to the preset decision threshold, determine that the currently traversed voiceprint cluster hits the currently traversed voiceprint feature to be matched; and if the maximum model matching score is smaller than the preset decision threshold, determine that the currently traversed voiceprint cluster does not hit it.
Further, the voiceprint pair matching module 603 is further configured to traverse remaining voiceprint clusters in the plurality of voiceprint clusters, and perform an operation of obtaining a twin voiceprint pair included in the currently traversed voiceprint cluster.
Further, the voiceprint recognition device based on the twin voiceprint pair further comprises a voiceprint group construction module, configured to: obtain voice samples of each speaker in a speaker group; model each speaker and train the constructed model on the voice samples to obtain a voiceprint recognition model; in the speaker group, calculate, one by one with the voiceprint recognition model corresponding to each speaker, the model matching scores corresponding to the voiceprint features of every speaker; determine the twin voiceprint pairs of the speaker group according to the calculated model matching scores; and construct the voiceprint target group of the speaker group from the twin voiceprint pairs.
Further, the voiceprint group construction module is further configured to: construct a model matching score set corresponding to each voiceprint recognition model according to the calculated model matching scores; traverse the speaker group and acquire the target voiceprint recognition model and target voiceprint feature corresponding to the currently traversed speaker; look up, among the model matching score sets, the target model matching score set corresponding to the target voiceprint recognition model; read the target model matching score corresponding to the target voiceprint feature from that set; search for the maximum model matching score, other than the target model matching score, in that set; determine the target speaker to whom the maximum model matching score belongs and construct a twin voiceprint pair from the currently traversed speaker and the target speaker; when the traversal of the speaker group is completed, obtain the twin voiceprint pair corresponding to each speaker; and construct the voiceprint cluster to which each twin voiceprint pair belongs and build the voiceprint target group of the speaker group from all the voiceprint clusters.
Other embodiments or specific implementation manners of the voiceprint recognition device based on the twin voiceprint pair of the present invention may refer to the above method embodiments, and will not be described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises that element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. read-only memory/random-access memory, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. A voiceprint recognition method based on a twin voiceprint pair, the method comprising:
Extracting a voiceprint feature set to be recognized of the voice to be recognized;
Determining a voiceprint target group to be matched corresponding to the voice to be recognized, wherein the voiceprint target group to be matched comprises a plurality of voiceprint clusters, the voiceprint clusters are formed by twin voiceprint pairs, and the twin voiceprint pairs are determined through the voiceprint recognition models corresponding to speakers, that is, in the group where the speakers are located, the voiceprint features of every speaker are scored one by one by using the voiceprint recognition model of each speaker, and, for each voiceprint recognition model, the other speaker m' whose voiceprint features score highest, excluding the current speaker m, is searched for in the group according to the scoring result, the other speaker m' being called the twin voiceprint pair of the current speaker m;
Twin voiceprint pair matching is carried out on the voiceprint feature set to be identified according to the voiceprint cluster, and overall coverage information of the voiceprint target group to be matched to the voiceprint feature set to be identified is determined according to a matching result;
And when the overall coverage information meets a preset condition, judging that the voice to be recognized belongs to the voiceprint target group to be matched.
2. The method for identifying a voiceprint based on a twin voiceprint pair according to claim 1, wherein the step of performing twin voiceprint pair matching on the voiceprint feature set to be identified according to the voiceprint cluster, and determining overall coverage information of the voiceprint target group to be matched on the voiceprint feature set to be identified according to a matching result includes:
traversing the plurality of voiceprint clusters;
Obtaining a twin voiceprint pair contained in a currently traversed voiceprint cluster, wherein the twin voiceprint pair contains at least two speakers, and each speaker is pre-configured with a voiceprint recognition model;
Determining the voiceprint cluster coverage condition of the currently traversed voiceprint cluster with respect to the voiceprint feature set to be identified according to the voiceprint recognition model corresponding to the twin voiceprint pair;
and when the traversal is finished, determining the overall coverage information of the voiceprint target group to be matched to the voiceprint feature set to be identified according to the acquired voiceprint cluster coverage condition of each voiceprint cluster to the voiceprint feature set to be identified.
3. The voiceprint recognition method based on a twin voiceprint pair according to claim 2, wherein the voiceprint feature set to be recognized comprises a plurality of voiceprint features to be matched;
the step of determining the voiceprint cluster coverage of the currently traversed voiceprint cluster over the voiceprint feature set to be recognized according to the voiceprint recognition models corresponding to the twin voiceprint pair comprises the following steps:
Traversing the voiceprint feature set to be recognized to obtain the currently traversed voiceprint feature to be matched;
Acquiring the hit status of the currently traversed voiceprint feature to be matched according to the voiceprint recognition models corresponding to the twin voiceprint pair, wherein the hit status is either hit or miss;
When the traversal of the voiceprint feature set to be recognized is finished, counting the proportion of hit voiceprint features to be matched within the voiceprint feature set to be recognized;
And determining the voiceprint cluster coverage of the currently traversed voiceprint cluster over the voiceprint feature set to be recognized according to the proportion.
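The per-cluster coverage computation of claim 3 reduces to a hit proportion. The following Python sketch is illustrative only: `is_hit` is a hypothetical predicate standing in for the threshold-based hit decision of claim 4, and `cluster_coverage` is an assumed helper name.

```python
def cluster_coverage(is_hit, cluster_models, features_to_match):
    """Coverage of one voiceprint cluster over the feature set to be
    recognized: the proportion of features to be matched that the
    cluster's twin-pair models hit."""
    # Count the features declared a hit by the cluster's models.
    hits = sum(1 for f in features_to_match if is_hit(cluster_models, f))
    # Assumes a non-empty feature set, as implied by the claim.
    return hits / len(features_to_match)
```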
4. The voiceprint recognition method based on a twin voiceprint pair according to claim 3, wherein the step of acquiring the hit status of the currently traversed voiceprint feature to be matched according to the voiceprint recognition models corresponding to the twin voiceprint pair comprises:
Calculating, with each of the voiceprint recognition models corresponding to the twin voiceprint pair, a model matching score for the currently traversed voiceprint feature to be matched;
Comparing the calculated model matching scores with an initial threshold;
If every calculated model matching score is smaller than the initial threshold, determining that the currently traversed voiceprint cluster misses the currently traversed voiceprint feature to be matched;
If at least one calculated model matching score is not smaller than the initial threshold, selecting the maximum model matching score from the calculated model matching scores;
Comparing the maximum model matching score with a preset decision threshold;
If the maximum model matching score is greater than or equal to the preset decision threshold, determining that the currently traversed voiceprint cluster hits the currently traversed voiceprint feature to be matched;
And if the maximum model matching score is smaller than the preset decision threshold, determining that the currently traversed voiceprint cluster misses the currently traversed voiceprint feature to be matched.
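The two-threshold hit decision of claim 4 can be sketched directly. This is an illustrative Python rendering under the assumption that the model matching scores have already been computed; the function name `twin_pair_hit` is hypothetical.

```python
def twin_pair_hit(scores, initial_threshold, decision_threshold):
    """Hit decision for one voiceprint feature against a twin voiceprint
    pair.  `scores` holds the model matching scores from each recognition
    model in the pair."""
    # If every score falls below the initial threshold, declare a miss.
    if all(s < initial_threshold for s in scores):
        return False
    # Otherwise the maximum score must also reach the decision threshold.
    return max(scores) >= decision_threshold
```

Note the asymmetry: the initial threshold screens out features no model responds to at all, while the decision threshold is applied only to the best-responding model.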
5. The voiceprint recognition method based on a twin voiceprint pair according to claim 4, wherein after the step of determining that the currently traversed voiceprint cluster misses the currently traversed voiceprint feature to be matched, the method further comprises:
Traversing the remaining voiceprint clusters among the plurality of voiceprint clusters, and returning to the step of acquiring the twin voiceprint pair contained in the currently traversed voiceprint cluster.
6. The voiceprint recognition method based on a twin voiceprint pair according to any one of claims 1 to 5, wherein before the step of extracting the voiceprint feature set to be recognized of the voice to be recognized, the method further comprises:
Acquiring a voice sample of each speaker in a speaker group;
Constructing a model for each speaker, and training the constructed model on the corresponding voice sample to obtain a voiceprint recognition model;
In the speaker group, calculating, one by one with the voiceprint recognition model corresponding to each speaker, a model matching score for the voiceprint features of every speaker;
And determining the twin voiceprint pairs of the speaker group according to the calculated model matching scores, and constructing a voiceprint target group of the speaker group according to the twin voiceprint pairs.
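The cross-scoring step of claim 6 (every speaker's features scored by every speaker's model) produces the score matrix that twin-pair selection operates on. A minimal sketch, with model training elided and `models` assumed to be already-trained callables; `cross_score` is a hypothetical helper name.

```python
def cross_score(models, features):
    """Score every speaker's voiceprint features with every speaker's
    voiceprint recognition model, yielding a nested mapping
    model-owner -> (speaker -> model matching score)."""
    return {m: {s: model(feats) for s, feats in features.items()}
            for m, model in models.items()}
```

Each inner mapping corresponds to the "model matching score set" of one recognition model referred to in claim 7.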
7. The method of claim 6, wherein the step of determining the twin voiceprint pairs of the speaker group according to the calculated model matching scores and constructing the voiceprint target group of the speaker group according to the twin voiceprint pairs comprises:
Constructing a model matching score set corresponding to each voiceprint recognition model from the calculated model matching scores;
Traversing the speaker group, and acquiring the target voiceprint recognition model and the target voiceprint features corresponding to the currently traversed speaker;
Searching the model matching score sets for the target model matching score set corresponding to the target voiceprint recognition model;
Reading, from the target model matching score set, the target model matching score corresponding to the target voiceprint features;
Searching the target model matching score set for the maximum model matching score other than the target model matching score;
Determining the target speaker to which the maximum model matching score belongs, and constructing a twin voiceprint pair from the currently traversed speaker and the target speaker;
When the traversal of the speaker group is finished, obtaining the twin voiceprint pair corresponding to each speaker;
And constructing the voiceprint cluster to which each twin voiceprint pair belongs, and constructing the voiceprint target group of the speaker group from all the voiceprint clusters.
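Claim 7 does not spell out how twin voiceprint pairs are merged into voiceprint clusters; one plausible reading is that speakers linked by twin pairs, directly or transitively, fall into the same cluster. The following Python sketch makes that assumption explicit, grouping speakers with a union-find structure; the helper name and the grouping rule are both illustrative, not taken from the patent.

```python
def build_voiceprint_target_group(twin_pairs):
    """Group speakers linked by twin voiceprint pairs into voiceprint
    clusters; the set of clusters forms the voiceprint target group.
    `twin_pairs` maps each speaker to its twin."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Each twin pair links two speakers into the same cluster.
    for speaker, twin in twin_pairs.items():
        union(speaker, twin)

    clusters = {}
    for speaker in parent:
        clusters.setdefault(find(speaker), set()).add(speaker)
    return list(clusters.values())
```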
8. A voiceprint recognition device based on a twin voiceprint pair, the device comprising:
The feature extraction module is used for extracting the voiceprint feature set to be recognized of the voice to be recognized;
The voiceprint group acquisition module is used for determining a voiceprint target group to be matched corresponding to the voice to be recognized, wherein the voiceprint target group to be matched comprises a plurality of voiceprint clusters, each voiceprint cluster is formed by twin voiceprint pairs, and the twin voiceprint pairs are determined by the voiceprint recognition models corresponding to the speakers; that is, within the group where the speakers are located, the voiceprint features of every speaker are scored one by one using each speaker's voiceprint recognition model, and for each voiceprint recognition model, the speaker m' whose voiceprint features score highest among all speakers other than the current speaker m is found in the group according to the scoring results, and this speaker m' is called the twin voiceprint pair of the current speaker m;
The voiceprint pair matching module is used for performing twin voiceprint pair matching on the voiceprint feature set to be recognized according to the voiceprint clusters, and determining, according to the matching result, the overall coverage information of the voiceprint target group to be matched over the voiceprint feature set to be recognized;
And the result determination module is used for determining that the voice to be recognized belongs to the voiceprint target group to be matched when the overall coverage information meets a preset condition.
9. A voiceprint recognition apparatus based on a twin voiceprint pair, comprising a memory, a processor, and a voiceprint recognition program based on a twin voiceprint pair that is stored on the memory and executable on the processor, the program being configured to implement the steps of the voiceprint recognition method based on a twin voiceprint pair according to any one of claims 1 to 7.
10. A storage medium having stored thereon a voiceprint recognition program based on a twin voiceprint pair, which, when executed by a processor, implements the steps of the voiceprint recognition method based on a twin voiceprint pair according to any one of claims 1 to 7.
CN202110514062.4A 2021-05-11 2021-05-11 Voiceprint recognition method, device, equipment and storage medium based on twin voiceprint pairs Active CN115376516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110514062.4A CN115376516B (en) 2021-05-11 2021-05-11 Voiceprint recognition method, device, equipment and storage medium based on twin voiceprint pairs

Publications (2)

Publication Number Publication Date
CN115376516A CN115376516A (en) 2022-11-22
CN115376516B true CN115376516B (en) 2025-08-26

Family

ID=84059578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110514062.4A Active CN115376516B (en) 2021-05-11 2021-05-11 Voiceprint recognition method, device, equipment and storage medium based on twin voiceprint pairs

Country Status (1)

Country Link
CN (1) CN115376516B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119580709A (en) * 2024-11-29 2025-03-07 中国船舶集团有限公司第七〇九研究所 Intelligent voice and video command and dispatch method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900725A (en) * 2018-05-29 2018-11-27 平安科技(深圳)有限公司 A kind of method for recognizing sound-groove, device, terminal device and storage medium
CN109637547A (en) * 2019-01-29 2019-04-16 北京猎户星空科技有限公司 Audio data mask method, device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9368109B2 (en) * 2013-05-31 2016-06-14 Nuance Communications, Inc. Method and apparatus for automatic speaker-based speech clustering
CN111243601B (en) * 2019-12-31 2023-04-07 北京捷通华声科技股份有限公司 Voiceprint clustering method and device, electronic equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant