
CN114596863A - Interactive voiceprint clustering method and system, electronic equipment and storage medium - Google Patents

Interactive voiceprint clustering method and system, electronic equipment and storage medium

Info

Publication number
CN114596863A
Authority
CN
China
Prior art keywords
clustering
voiceprint
interactive
similarity
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210127098.1A
Other languages
Chinese (zh)
Other versions
CN114596863B (en
Inventor
洪国强
肖龙源
李稀敏
叶志坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202210127098.1A
Publication of CN114596863A
Application granted
Publication of CN114596863B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of voiceprint clustering, and in particular to an interactive voiceprint clustering method, system, electronic device and storage medium. The method comprises the following steps: extracting feature vectors from the voice files and clustering them based on a preset clustering threshold; reviewing each class of clustered voice files to confirm the final number of speakers contained in the voice files; re-clustering according to the confirmed number of speakers, computing the mean of the voiceprint features of each class as the central feature of that class, computing the similarity between each feature in the class and the central feature, and marking files whose similarity is below a preset similarity threshold as hard-to-classify files; and reviewing the hard-to-classify files again and reassigning them to the correct classes. By combining human and algorithm interactively, the invention brings together the strengths of each, compensates for the shortcomings of the algorithm, and improves the final clustering result.

Description

An interactive voiceprint clustering method, system, electronic device and storage medium

Technical field

The invention relates to the technical field of voiceprint clustering, and in particular to an interactive voiceprint clustering method, system, electronic device and storage medium.

Background

Voiceprint clustering groups the voice files of the same person into one class. In the prior art, an algorithm extracts voiceprint features from speech, and files whose features are highly similar are grouped into one class by feature comparison. Clustering generally terminates on either a similarity threshold or a specified number of speakers.

The accuracy of clustering is limited by the algorithm itself. Using the number of speakers as the termination condition gives better clustering results, but the algorithm cannot accurately obtain the number of speakers. Using a similarity threshold as the termination condition gives comparatively worse results.

Summary of the invention

In the prior art, voiceprint clustering that terminates on a similarity threshold or on a set number of speakers has limited accuracy, and the number of speakers cannot be obtained accurately. To address this, the present invention provides an interactive voiceprint clustering method, system, electronic device and storage medium that use interaction with a human to confirm the number of speakers and to handle hard-to-classify speech.

To achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

In a first aspect, an embodiment of the present invention provides an interactive voiceprint clustering method comprising the following steps:

Extracting feature vectors from the voice files and clustering them based on a preset clustering threshold;

Reviewing each class of voice files after clustering to confirm the final number of speakers contained in the voice files;

Re-clustering according to the confirmed number of speakers; computing the mean of the voiceprint features of each class as the central feature of that class; computing the similarity between each feature in the class and the central feature; and, based on a preset similarity threshold, marking files whose similarity is below the threshold as hard-to-classify files;

Reviewing the hard-to-classify files again and reassigning them to the correct classes.

In some embodiments of the present invention, the feature vector of a voice file is extracted by a voiceprint extraction algorithm, and the extraction comprises the following steps:

Obtaining the voice file to be processed;

Performing speech recognition on the obtained voice file to obtain the audio features of the voice file;

Inputting the obtained audio features into a model to obtain the feature vector of the voice file.

In some embodiments of the present invention, voiceprint feature extraction is performed on the voice file using the ivector or xvector algorithm to obtain the feature vector of the voice file.

In some embodiments of the present invention, when clustering based on the preset clustering threshold, similarity is computed between the different feature vectors extracted from the voice files, using the cosine or plda similarity algorithm.
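As an illustration of the cosine option, the following minimal sketch computes the cosine similarity of two feature vectors in plain Python. The function name `cosine_similarity` and the list-of-floats representation are assumptions made for this example and do not come from the patent; plda scoring needs a trained model and is not sketched here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate that two voiceprints point in nearly the same direction, and values near 0 indicate unrelated ones; the score does not depend on vector length.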

In some embodiments of the present invention, when clustering based on the preset clustering threshold, the AHC clustering or SC clustering method is used.
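The threshold-terminated case can be sketched as follows, reading AHC as agglomerative hierarchical clustering. This is a minimal sketch under stated assumptions, not the patented implementation: average linkage over cosine similarity is one possible choice, and the names `cosine` and `ahc_by_threshold` are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ahc_by_threshold(vectors, threshold):
    """Merge the most similar pair of clusters until no pair reaches `threshold`.

    Returns a sorted list of clusters, each a sorted list of indices into `vectors`.
    Cluster-to-cluster similarity is the average pairwise cosine (average linkage).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, (p, q) = max(
            (linkage(clusters[p], clusters[q]), (p, q))
            for p, q in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break  # termination condition: no remaining pair is similar enough
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return sorted(clusters)
```

Raising `threshold` stops merging earlier, which yields more and purer clusters.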

In some embodiments of the present invention, the preset clustering threshold is a threshold confirmed through testing. Before the threshold is fixed, clustering tests are run on given test files, and the threshold with the best result is selected as the default threshold for later use.
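One way to run such a test is sketched below. The patent does not say how the "best result" is measured, so the pairwise F1 score used here is an assumption, as are the names `pairwise_f1` and `pick_threshold` and the `cluster_fn(vectors, threshold)` callable, which is expected to return one label per test file.

```python
from itertools import combinations


def pairwise_f1(predicted, truth):
    """F1 over file pairs: a pair counts as positive when both files share a label."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(cluster_fn, vectors, truth, candidates):
    """Return the candidate threshold whose clustering scores best on the test files."""
    return max(candidates, key=lambda t: pairwise_f1(cluster_fn(vectors, t), truth))
```

The selected value would then serve as the default threshold; any other clustering quality metric could be substituted for `pairwise_f1`.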

In some embodiments of the present invention, the review of each class of voice files after clustering is performed manually: a reviewer judges by listening to the speaker's voice in the recordings and merges classes that differ but belong to the same person.

In some embodiments of the present invention, the hard-to-classify files are reviewed again manually to compensate for the shortcomings of the algorithm, and are reassigned to the correct classes.

In a second aspect, another embodiment of the present invention provides an interactive voiceprint clustering system comprising:

a feature extraction unit, configured to perform speech recognition on the obtained voice files and extract feature vectors;

a clustering unit, configured to cluster the extracted feature vectors based on a preset clustering threshold;

a similarity comparison unit, configured to take the mean of the voiceprint features of each class after re-clustering by the number of speakers as the central feature of that class, compute the similarity between each feature in the class and the central feature, and mark files whose similarity is below a preset similarity threshold as hard-to-classify files; and

a review unit, configured to review each class of voice files after clustering to confirm the final number of speakers contained in the voice files, and further configured to review the hard-to-classify files again and reassign them to the correct classes.

In a third aspect, yet another embodiment of the present invention provides an electronic device comprising a memory and a processor. The memory stores a computer program, and the processor implements the steps of the interactive voiceprint clustering method when it loads and executes the computer program.

In a fourth aspect, a further embodiment of the present invention provides a storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the interactive voiceprint clustering method.

The technical solution provided by the present invention has the following beneficial effects:

In the interactive voiceprint clustering method, system, electronic device and storage medium of the present invention, feature vectors are extracted from the voice files and clustered, and a manual review follows: classes that differ but belong to the same person are merged, and the final number of speakers is confirmed from the number of classes remaining after merging. An algorithm then marks hard-to-classify files according to similarity, and a second manual review reassigns those files to the correct classes. Combining human and algorithm interactively brings together the strengths of each and compensates for the shortcomings of the algorithm, thereby improving the final clustering result.

These and other aspects of the invention will be clearer from the description of the following embodiments. It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the invention.

Description of the drawings

The accompanying drawings provide a further understanding of the present invention and constitute a part of the specification. Together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:

FIG. 1 is a flowchart of an interactive voiceprint clustering method according to the present invention.

FIG. 2 is a flowchart of feature vector extraction in the interactive voiceprint clustering method according to an embodiment of the present invention.

FIG. 3 is a system block diagram of an interactive voiceprint clustering system according to an embodiment of the present invention.

FIG. 4 is a structural block diagram of an electronic device according to an embodiment of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

Some of the processes described in the specification, the claims and the above drawings contain multiple operations that appear in a particular order. It should be clearly understood, however, that these operations need not be executed in the order in which they appear here and may be executed in parallel. Operation numbers such as 101 and 102 only distinguish the operations from one another and do not in themselves imply any execution order. In addition, these processes may include more or fewer operations, and the operations may be executed sequentially or in parallel.

The technical solutions in the exemplary embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

FIG. 1 is a flowchart of an interactive voiceprint clustering method provided by the present invention.

Referring to FIG. 1, the present invention provides an interactive voiceprint clustering method that uses interaction with a human to confirm the number of speakers and to handle hard-to-classify speech.

In this embodiment, the interactive voiceprint clustering method comprises the following steps:

S1. Extract feature vectors from the voice files and cluster them based on a preset clustering threshold;

S2. Review each class of voice files after clustering to confirm the final number of speakers contained in the voice files;

S3. Re-cluster according to the confirmed number of speakers; compute the mean of the voiceprint features of each class as the central feature of that class; compute the similarity between each feature in the class and the central feature; and, based on a preset similarity threshold, mark files whose similarity is below the threshold as hard-to-classify files;

S4. Review the hard-to-classify files again and reassign them to the correct classes.

The interactive voiceprint clustering method of this embodiment is carried out in the four main steps above. Clustering the voice files and marking hard-to-classify files are voiceprint clustering operations performed by the algorithm, whereas reviewing each class of clustered voice files and re-reviewing the hard-to-classify files are performed manually. By combining human and algorithm interactively, the method brings together the strengths of each and compensates for the shortcomings of the algorithm, thereby improving the final clustering result.

In step S1 of this embodiment, the feature vector of a voice file is extracted by a voiceprint extraction algorithm. Referring to FIG. 2, the extraction comprises the following steps:

S101. Obtain the voice file to be processed;

S102. Perform speech recognition on the obtained voice file to obtain the audio features of the voice file;

S103. Input the obtained audio features into a model to obtain the feature vector of the voice file.

In this embodiment, the voiceprint extraction algorithm is one of the current mainstream algorithms such as ivector or xvector. Concretely, front-end features of the audio file, such as mfcc or fbank, are computed and then input into the ivector or xvector model, which outputs the feature vector of the voice file.

In this embodiment, during the clustering operation, similarity between different feature vectors is computed with the plda algorithm or the cosine algorithm, both of which are existing similarity algorithms. The similarity algorithm used for clustering includes, but is not limited to, the plda algorithm or the cosine algorithm.

Therefore, when clustering based on the preset clustering threshold, similarity is computed between the different feature vectors extracted from the voice files, using the cosine or plda similarity algorithm.

In this embodiment, when clustering based on the preset clustering threshold, the AHC clustering or SC clustering method is used. It should be noted that the clustering method includes, but is not limited to, AHC clustering or SC clustering.

In this embodiment, the preset clustering threshold is a threshold confirmed through testing. Before the threshold is fixed, clustering tests are run on given test files, and the threshold with the best result is selected as the default threshold for later use.

When clustering based on the preset clustering threshold, a relatively high value is used so that each class contains a single person. After this clustering, the number of classes is usually larger than the actual number of speakers but much smaller than the number of voice files. The clustering operation therefore saves much of the time that manual grouping would take, improves the efficiency of voiceprint clustering, and makes the subsequent manual review easier.

For example, in one embodiment of the present invention, the voiceprint extraction algorithm has a preset clustering threshold of 0.7, and a higher threshold of 0.8 is used. Twenty voice files are clustered into five classes; because of the higher threshold, each class contains essentially one person's speech. The reviewer then only needs to listen to any one file in each class, which with five classes means five recordings, and merges the classes that belong to the same person. Assuming the reviewer merges two of the classes, the twenty files end up in four classes. Because different algorithms compute similarity differently, "higher threshold" is a relative notion.
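The bookkeeping in this example can be sketched with a small union-find structure. Everything named here is hypothetical: the file names, the cluster ids, the function `merge_reviewed_clusters`, and the assumption that the reviewer's verdicts arrive as pairs of cluster ids judged to be the same speaker.

```python
def merge_reviewed_clusters(clusters, same_speaker_pairs):
    """Merge clusters that a reviewer judged to be the same speaker.

    clusters: dict mapping cluster id -> list of file names.
    same_speaker_pairs: iterable of (cluster_id, cluster_id) verdicts.
    Returns the merged clusters as a list of sorted file lists.
    """
    parent = {cid: cid for cid in clusters}

    def find(cid):
        while parent[cid] != cid:
            parent[cid] = parent[parent[cid]]  # path halving
            cid = parent[cid]
        return cid

    for a, b in same_speaker_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    merged = {}
    for cid, files in clusters.items():
        merged.setdefault(find(cid), []).extend(files)
    return [sorted(files) for files in merged.values()]


# 20 hypothetical files in 5 clusters of 4; the reviewer says clusters 1 and 3 match.
clusters = {c: [f"file_{c * 4 + k:02d}.wav" for k in range(4)] for c in range(5)}
speakers = merge_reviewed_clusters(clusters, [(1, 3)])
print(len(speakers))  # confirmed number of speakers: 4
```

The length of the returned list is the confirmed number of speakers that the re-clustering step takes as input.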

In step S2 of this embodiment, the review of each class of voice files after clustering is performed manually: a reviewer judges by listening to the speaker's voice in the recordings and merges classes that differ but belong to the same person.

During manual review, the reviewer listens to the voice files of each class, merges classes that differ but belong to the same person, and confirms the final number of speakers from the number of classes remaining after merging. Because the threshold in the previous step is relatively high, one person may be split into two classes, and this step merges them manually to confirm the number of speakers. Compared with a normal threshold, the higher threshold yields more classes, but each class is purer, which combines better with the manual work.

In this embodiment, manual review judges by listening to the speaker's voice in the recordings. There may be many files to cluster, but the actual number of speakers is not large, so manual judgment is feasible. For example, two voice files may receive a low similarity score because of noise, differences in recording equipment, or defects in the algorithm itself, yet a person can compensate for these problems and judge well whether the two recordings come from the same person.

The files are re-clustered according to the number of speakers confirmed in step S2, and the mean of the voiceprint features of each class is computed as the central feature of that class. In this embodiment, a voiceprint feature is a one-dimensional vector, computed by an algorithm, that characterizes the voiceprint. For example, if a class has three files whose extracted voiceprint vectors are A, B and C, the central feature is 1/3*(A+B+C).
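The central feature is an element-wise mean, which the following minimal sketch spells out. The function name `class_centroid` and the two-dimensional toy vectors are assumptions for illustration; real voiceprint vectors have many more dimensions.

```python
def class_centroid(vectors):
    """Element-wise mean of equal-length voiceprint vectors."""
    if not vectors:
        raise ValueError("a class needs at least one vector")
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


A, B, C = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
center = class_centroid([A, B, C])  # 1/3 * (A + B + C) = [3.0, 4.0]
```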

When computing the similarity between each feature in a class and the central feature, the similarity calculation method is the same as in step S2; for example, the cosine or plda algorithm can be used. A file whose similarity is below the preset low threshold is a hard-to-classify file, also called a low-similarity file.
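A minimal sketch of the flagging rule follows, assuming cosine similarity and dictionaries keyed by file name. The names `cosine` and `flag_hard_files`, the toy vectors and the threshold value 0.7 are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def flag_hard_files(features, labels, sim_threshold):
    """Return the files whose similarity to their own class centroid is below the threshold.

    features: dict mapping file name -> feature vector.
    labels: dict mapping file name -> class id after re-clustering.
    """
    members = {}
    for name, label in labels.items():
        members.setdefault(label, []).append(features[name])
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in members.items()
    }
    return sorted(
        name
        for name, label in labels.items()
        if cosine(features[name], centroids[label]) < sim_threshold
    )


features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
labels = {"a": 0, "b": 0, "c": 0, "d": 1}  # "c" was forced into class 0
hard = flag_hard_files(features, labels, 0.7)
```

In this toy data only "c" falls below the threshold, so it is the one file that would be passed to the reviewer.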

In step S3, clustering by number of speakers generally performs better than clustering by similarity threshold, so once the number of speakers is confirmed, clustering is run again to improve accuracy. Because this step forces the files into the given number of classes, some files with low similarity will be grouped into a class. Such files may have been misclassified by the algorithm and need further confirmation.
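The forced, count-terminated clustering can be sketched as follows. This is an illustration under assumptions rather than the patented procedure: it uses average-linkage agglomerative clustering over a precomputed similarity matrix, and the name `ahc_by_count` is hypothetical.

```python
from itertools import combinations


def ahc_by_count(sim, n_speakers):
    """Average-linkage agglomerative clustering that stops at `n_speakers` clusters.

    sim: square matrix (list of lists) of pairwise similarities between files.
    Returns a sorted list of clusters, each a sorted list of file indices.
    """
    n = len(sim)
    if not 1 <= n_speakers <= n:
        raise ValueError("n_speakers must be between 1 and the number of files")
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_speakers:
        # No similarity floor here: the closest pair is merged even if it is a poor match.
        _, (p, q) = max(
            (linkage(clusters[p], clusters[q]), (p, q))
            for p, q in combinations(range(len(clusters)), 2)
        )
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return sorted(clusters)
```

Because merging continues until the requested count is reached, a file that resembles no class still lands in one, which is why the low-similarity check afterwards matters.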

Therefore, for the manual re-review of hard-to-classify voice files, step S3 marks low-similarity recordings as hard to classify, and step S4 compensates for the shortcomings of the algorithm manually, further improving classification accuracy.

In this embodiment, the hard-to-classify files are reviewed again manually to compensate for the shortcomings of the algorithm, and are reassigned to the correct classes.
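Applying the reviewer's decisions amounts to a label update, sketched below. The function `apply_review` and the shape of its inputs (a dict of current labels, a collection of flagged file names, and a dict of reviewer-chosen classes) are assumptions for illustration.

```python
def apply_review(labels, hard_files, reviewer_decisions):
    """Return a new label mapping with reviewed hard files reassigned.

    labels: dict mapping file name -> class id from re-clustering.
    hard_files: file names that were flagged as hard to classify.
    reviewer_decisions: dict mapping file name -> class id chosen by the reviewer.
    Files without a decision keep their current class.
    """
    unexpected = set(reviewer_decisions) - set(hard_files)
    if unexpected:
        raise ValueError(f"decisions given for files that were not flagged: {sorted(unexpected)}")
    updated = dict(labels)  # leave the input mapping untouched
    updated.update(reviewer_decisions)
    return updated
```

Restricting updates to the flagged files keeps the manual step limited to the cases the algorithm itself marked as doubtful.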

应该理解的是,上述虽然是按照某一顺序描述的,但是这些步骤并不是必然按照上述顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,本实施例的一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the above is described in a certain order, these steps are not necessarily performed sequentially in the above order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order, and these steps may be performed in other orders. Moreover, a part of the steps in this embodiment may include multiple steps or multiple stages. These steps or stages are not necessarily executed and completed at the same time, but may be executed at different times, and the execution sequence of these steps or stages is also different. It is necessarily performed sequentially, but may be performed alternately or alternately with other steps or at least a portion of the steps or stages within the other steps.

在一个实施例中,参见图3所示,本发明实施例提供了一种交互式的声纹聚类系统,该系统包括特征提取单元100、聚类单元200、相似度比较单元300以及审核单元400。In one embodiment, as shown in FIG. 3 , an embodiment of the present invention provides an interactive voiceprint clustering system, the system includes a feature extraction unit 100 , a clustering unit 200 , a similarity comparison unit 300 and an auditing unit 400.

其中,所述特征提取单元100,用于对获取的语音文件进行语音识别提取特征向量。在本实施例中,提取声纹的算法采用目前主流的ivector,xvector等。其中,在本实施例中,语音文件进行声纹特征提取,采用ivector或xvector算法提取语音文件的特征向量,然后输入模型中,最终获取特征向量。Wherein, the feature extraction unit 100 is configured to perform voice recognition on the acquired voice file to extract feature vectors. In this embodiment, the algorithm for extracting voiceprint adopts the current mainstream ivector, xvector, and the like. Wherein, in this embodiment, voiceprint feature extraction is performed on the voice file, and the ivector or xvector algorithm is used to extract the feature vector of the voice file, and then input into the model to finally obtain the feature vector.

所述聚类单元200用于对提取的特征向量基于预设的聚类阈值进行聚类。所述聚类单元200在基于预设的聚类阈值进行聚类时,采用AHC聚类或SC聚类方法进行聚类。The clustering unit 200 is configured to cluster the extracted feature vectors based on a preset clustering threshold. When performing clustering based on a preset clustering threshold, the clustering unit 200 adopts AHC clustering or SC clustering method for clustering.

需要说明的时,基于预设的聚类阈值进行聚类时,聚类方法包括但不局限于AHC聚类或SC聚类方法。It should be noted that, when performing clustering based on a preset clustering threshold, the clustering method includes but is not limited to AHC clustering or SC clustering method.

所相似度比较单元300用于将人数重新聚类后每类的声纹特征均值作为该类声纹特征的中心特征,求取该类中每个特征和中心特征的相似度,并基于预设的相似度阈值比较,将低于所述相似度阈值的标记为难分类文件。The similarity comparison unit 300 is used for re-clustering the number of people and the voiceprint feature mean of each class as the central feature of the voiceprint feature of this class, to obtain the similarity of each feature and the central feature in the class, and based on the preset The similarity threshold is compared, and the files below the similarity threshold are marked as difficult to classify.

所相似度比较单元300进行聚类操作时,不同的特征向量根据plda算法或cosine算法进行相似度计算,其中,plda算法或cosine算法为现有驻留的相似度计算算法。在本实施例中,所述聚类时相似度的算法包括但不局限于plda算法或cosine算法。When the similarity comparison unit 300 performs the clustering operation, different feature vectors perform similarity calculation according to the plda algorithm or the cosine algorithm, wherein the plda algorithm or the cosine algorithm is an existing resident similarity calculation algorithm. In this embodiment, the clustering similarity algorithm includes but is not limited to the plda algorithm or the cosine algorithm.

基于预设的聚类阈值进行聚类时,将语音文件提取的不同特征向量进行相似度计算,其中,不同特征向量根据cosine或plda的相似度算法计算相似度。When performing clustering based on a preset clustering threshold, the similarity calculation is performed on different feature vectors extracted from the speech file, wherein the similarity is calculated according to the similarity algorithm of cosine or plda.

The review unit 400 is configured to review each class of clustered voice files to confirm the final number of people contained in the voice files, and is further configured to review the difficult-to-classify files again and reassign them to the correct classes.

Review by the review unit 400 is usually performed manually. The reviewer listens to the voice files in each class, merges classes that are different but belong to the same person into one class, and confirms the final number of people from the number of classes remaining after merging. Because the threshold in the previous step is relatively high, the same person may be split into two classes; this step relies on manual review to merge such classes and confirm the number of people. Compared with a normal threshold, the higher threshold produces more classes, but each class is also purer, which combines better with manual operation.
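The bookkeeping that follows a manual review can be sketched as below. This is an illustration under assumptions: the reviewer's decisions are taken to arrive as pairs of class labels judged to be the same person, and the function name `merge_clusters` and the union-find approach are choices of this sketch, not part of the specification.

```python
def merge_clusters(labels, same_speaker_pairs):
    """Merge classes that a reviewer judged to be the same person.

    labels: one class label per voice file.
    same_speaker_pairs: (label_a, label_b) pairs supplied by the reviewer
        (a hypothetical input format); both labels must occur in `labels`.
    Returns (merged_labels, confirmed_number_of_people).
    """
    parent = {label: label for label in set(labels)}

    def find(x):
        # Follow parent links to the representative class, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_speaker_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    roots = sorted({find(label) for label in parent})
    new_id = {root: i for i, root in enumerate(roots)}
    merged = [new_id[find(label)] for label in labels]
    return merged, len(roots)


# Four classes from the high-threshold clustering; the reviewer decides that
# classes 0 and 1 are one person and classes 2 and 3 are another.
merged, n_people = merge_clusters([0, 0, 1, 2, 2, 3], [(0, 1), (2, 3)])
# merged == [0, 0, 0, 1, 1, 1], n_people == 2
```

The confirmed count `n_people` is the value that the subsequent re-clustering step would take as its target number of classes.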

In this embodiment, the interactive voiceprint clustering system executes the steps of the interactive voiceprint clustering method described above, so the operation of the system is not described again in detail here.

In one embodiment, FIG. 4 shows a structural block diagram of an electronic device according to an embodiment of the present invention. In one possible design, as shown in FIG. 4, an embodiment of the present invention provides an electronic device 600 comprising a memory 601 and a processor 602. The memory 601 stores a computer program, and the processor 602 is configured to execute the computer program stored in the memory 601. The memory 601 is used to store one or more computer instructions, which are executed by the processor 602 to implement the steps of the above method embodiments:

extracting feature vectors of the voice files, and clustering them based on a preset clustering threshold;

reviewing each class of clustered voice files to confirm the final number of people contained in the voice files;

re-clustering according to the confirmed number of people, calculating the mean of the voiceprint features of each class as the central feature of that class, computing the similarity between each feature in the class and the central feature, and, based on a preset similarity threshold, marking files whose similarity is below the similarity threshold as difficult-to-classify files;

reviewing the difficult-to-classify files again and reassigning them to the correct classes.
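The re-clustering and reassignment steps listed above can be sketched as follows. This is a minimal sketch under assumptions: re-clustering is shown as average-linkage agglomerative clustering that stops at the confirmed number of people, with cosine similarity as the score, and the reviewer's corrections are taken to arrive as a mapping from file index to class label. The names `recluster_to_count` and `apply_reassignments` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 for an all-zero vector (a convention chosen here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recluster_to_count(embeddings, n_people):
    """Merge the most similar clusters until exactly `n_people` classes remain."""
    if not 1 <= n_people <= len(embeddings):
        raise ValueError("n_people must be between 1 and the number of files")
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        # Average linkage: mean pairwise similarity (an assumption of this sketch).
        total = sum(cosine(embeddings[i], embeddings[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_people:
        pairs = [
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        ]
        x, y = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def apply_reassignments(labels, corrections):
    """Return a copy of `labels` with the reviewer's corrections applied.

    corrections: {file_index: correct_label}, a hypothetical input format.
    """
    updated = list(labels)
    for file_index, correct_label in corrections.items():
        updated[file_index] = correct_label
    return updated


# Hypothetical data: five files, two people confirmed by the first review.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.6, 0.4]]
labels = recluster_to_count(vectors, n_people=2)
# Suppose the second review decides that file 4 belongs to class 1.
final_labels = apply_reassignments(labels, {4: 1})
```

Fixing the number of classes in the second pass removes the dependence on the clustering threshold, and the manual corrections then touch only the files flagged as difficult to classify.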

It should be noted in particular that, according to embodiments of the present invention, the method described above with reference to the accompanying drawings may be implemented as a computer software program. For example, embodiments of the present invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the drawings. In such an embodiment, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a unit, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

An embodiment of the present invention provides a storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the above method embodiments:

extracting feature vectors of the voice files, and clustering them based on a preset clustering threshold;

reviewing each class of clustered voice files to confirm the final number of people contained in the voice files;

re-clustering according to the confirmed number of people, calculating the mean of the voiceprint features of each class as the central feature of that class, computing the similarity between each feature in the class and the central feature, and, based on a preset similarity threshold, marking files whose similarity is below the similarity threshold as difficult-to-classify files;

reviewing the difficult-to-classify files again and reassigning them to the correct classes.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program, expressed as computer instructions, that instructs the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory.

Non-volatile memory may include read-only memory, magnetic tape, floppy disk, flash memory, or optical memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory or dynamic random access memory.

In summary, the interactive voiceprint clustering method, system, electronic device, and storage medium of the present invention extract the feature vectors of voice files and cluster them, then apply manual review, in which classes that are different but belong to the same person are merged and the final number of people is confirmed from the number of classes remaining after merging. An algorithm then marks difficult-to-classify files according to similarity, and a further manual review reassigns those files to the correct classes. By combining people and algorithms interactively, the method draws on the respective strengths of each and compensates for the shortcomings of the algorithm, thereby improving the final clustering result.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An interactive voiceprint clustering method, comprising:
extracting feature vectors of voice files, and clustering them based on a preset clustering threshold;
reviewing each class of clustered voice files to confirm the final number of people contained in the voice files;
re-clustering according to the confirmed number of people, calculating the mean of the voiceprint features of each class as the central feature of that class, computing the similarity between each feature in the class and the central feature, and, based on a preset similarity threshold, marking files whose similarity is below the similarity threshold as difficult-to-classify files; and
reviewing the difficult-to-classify files again and reassigning them to the correct classes.
2. The interactive voiceprint clustering method of claim 1, wherein the feature vector of the voice file is extracted by a voiceprint extraction algorithm, and extracting the feature vector of the voice file comprises:
acquiring a voice file to be extracted;
performing speech recognition on the acquired voice file to obtain the audio features of the voice file; and
inputting the obtained audio features into a model to obtain the feature vector of the voice file.
3. The interactive voiceprint clustering method of claim 2, wherein voiceprint feature extraction is performed on the voice file, and the feature vector of the voice file is extracted using an ivector or xvector algorithm.
4. The interactive voiceprint clustering method of claim 1, wherein, when clustering based on the preset clustering threshold, similarity is calculated between the different feature vectors extracted from the voice files, using the cosine or plda similarity algorithm.
5. The interactive voiceprint clustering method of claim 1, wherein, when clustering based on the preset clustering threshold, clustering is performed using the AHC or SC clustering method.
6. The interactive voiceprint clustering method of claim 5, wherein the preset clustering threshold is a threshold determined through testing.
7. The interactive voiceprint clustering method of claim 6, wherein the review of each class of clustered voice files is performed manually, and classes that are different but belong to the same person are merged into one class, as judged by listening to the speaker's voice in the recording files.
8. An interactive voiceprint clustering system, wherein the interactive voiceprint clustering system adopts the interactive voiceprint clustering method of any one of claims 1 to 7 to improve the final clustering result, the interactive voiceprint clustering system comprising:
a feature extraction unit, configured to perform speech recognition on the acquired voice file to extract feature vectors;
a clustering unit, configured to cluster the extracted feature vectors based on a preset clustering threshold;
a similarity comparison unit, configured to take the mean of the voiceprint features of each class obtained after re-clustering according to the number of people as the central feature of that class, compute the similarity between each feature in the class and the central feature, compare it with a preset similarity threshold, and mark files whose similarity is below the similarity threshold as difficult-to-classify files; and
a review unit, configured to review each class of clustered voice files to confirm the final number of people contained in the voice files, and further configured to review the difficult-to-classify files again and reassign them to the correct classes.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, wherein the steps of the method of any one of claims 1 to 7 are implemented when the computer program is loaded and executed by the processor.
10. A storage medium storing a computer program, wherein the computer program is loaded and executed by a processor to implement the steps of the method of any one of claims 1 to 7.
CN202210127098.1A 2022-02-11 2022-02-11 Interactive voiceprint clustering method, system, electronic device and storage medium Active CN114596863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210127098.1A CN114596863B (en) 2022-02-11 2022-02-11 Interactive voiceprint clustering method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210127098.1A CN114596863B (en) 2022-02-11 2022-02-11 Interactive voiceprint clustering method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114596863A true CN114596863A (en) 2022-06-07
CN114596863B CN114596863B (en) 2024-12-24

Family

ID=81805039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210127098.1A Active CN114596863B (en) 2022-02-11 2022-02-11 Interactive voiceprint clustering method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114596863B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
WO2019237517A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 Speaker clustering method and apparatus, and computer device and storage medium
WO2021164515A1 (en) * 2020-02-17 2021-08-26 中国银联股份有限公司 Detection method and apparatus for tampered image
WO2021237570A1 (en) * 2020-05-28 2021-12-02 深圳市欢太科技有限公司 Image auditing method and apparatus, device, and storage medium
CN111933147A (en) * 2020-06-22 2020-11-13 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN112735437A (en) * 2020-12-15 2021-04-30 厦门快商通科技股份有限公司 Voiceprint comparison method, system and device and storage mechanism
CN112669855A (en) * 2020-12-17 2021-04-16 北京沃东天骏信息技术有限公司 Voice processing method and device
CN113851136A (en) * 2021-09-26 2021-12-28 平安科技(深圳)有限公司 Cluster-based speaker recognition method, device, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
付瑞;牛泰龙;常琳;: "声纹识别技术在广播监测领域的应用探究", 现代电视技术, no. 03, 15 March 2020 (2020-03-15) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708850A (en) * 2022-02-24 2022-07-05 厦门快商通科技股份有限公司 Interactive voice segmentation and clustering method, device and equipment
CN114708850B (en) * 2022-02-24 2025-08-12 厦门快商通科技股份有限公司 Interactive voice segmentation and clustering method, device and equipment
CN116013322A (en) * 2022-12-08 2023-04-25 北京奇艺世纪科技有限公司 Method and device for determining character corresponding to line, and electronic equipment
CN116013322B (en) * 2022-12-08 2025-10-03 北京奇艺世纪科技有限公司 Method, device and electronic equipment for determining characters corresponding to lines
WO2025138958A1 (en) * 2023-12-29 2025-07-03 杭州阿里云飞天信息技术有限公司 File classification method and apparatus, electronic device, storage medium, and program product

Also Published As

Publication number Publication date
CN114596863B (en) 2024-12-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant