CN105933272A - Voiceprint recognition method capable of preventing recording attack, server, terminal, and system
- Publication number
- CN105933272A (application CN201511020257.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- voice
- characters
- character
- voiceprint authentication
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0861—Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials

Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention provides a voiceprint authentication method, server, terminal, and system capable of preventing recording attacks. The voiceprint authentication method includes: generating a character combination and pronunciation rules for the characters according to a user's voiceprint authentication request; sending the character combination and the pronunciation rules to the requesting terminal; receiving the user voice input at the requesting terminal according to the character combination and the pronunciation rules; performing voiceprint authentication according to the user voice, the character combination, and the pronunciation rules; and sending the voiceprint authentication result to the requesting terminal. The invention can effectively prevent recording attacks.
Description
Technical Field
The invention belongs to the field of voiceprint recognition, and in particular relates to a voiceprint authentication method, server, terminal, and system capable of preventing recording attacks.
Background
Like a fingerprint, a voiceprint is an important biometric feature that can characterize a person's identity. Compared with traditional means such as password authentication, voiceprint authentication offers high security and convenience. The most common attacks on voiceprint authentication are recording playback attacks, speaker impersonation attacks, and forged authentication voice attacks.
In a recording playback attack, the attacker obtains samples of the user's voice by various means using high-fidelity recording equipment, then either uses the original recording or processes it by cropping, splicing, and similar means to synthesize the "speaker's real voice". When the authentication system collects the user's voice, the attacker plays the result back through a high-fidelity amplifier. In a speaker impersonation attack, an attacker who is good at imitating other people's voices attacks by imitating the speaker's manner of speaking and pronunciation characteristics. In a forged authentication voice attack, the attacker forges the victim's voice by technical means such as synthesis, conversion, and splicing.
A speaker impersonation attack requires the attacker to have good imitation ability, and a forged authentication voice attack often requires considerable professional skill, so both attacks are inherently difficult to carry out. Moreover, neither an imitated voice nor a forged voice is the real voice, and existing voiceprint recognition technology can basically cope with these two types of attack.
The recording playback attack is a very important problem facing voiceprint recognition: the attacker obtains the voice and then attacks through software synthesis. Recording attacks arise in two ways. In one, the user's voice is stolen while the user is speaking in some other setting; in the other, malware records the user's voice while the user is performing voiceprint recognition.
In the prior art there are two main solutions to recording attacks:
The first solution distinguishes recorded content by analyzing the difference in channel characteristic patterns between a recording and the original voice. The second solution verifies the content of the speech as well as the speaker's voiceprint, because the recording attacker does not know in advance what has to be said this time.
However, the first solution places high demands on sound signal quality, signal-to-noise ratio, and channel quality, and its results in practical applications are not very good.
In the second solution, if the user is asked to read out a large passage of random text each time, the user experience is poor. The amount of voice input can be reduced, as in the patent with application number 201310123555.0, titled "Identity confirmation system and method based on dynamic password voice", which selects combinations from the 26 English letters and 10 digits: a dynamic password is generated by random combination each time and the user inputs it by voice. Because the dynamic password generated each time is not known in advance, this resists simple recording attacks and is a relatively good solution. However, because that patent only combines 36 characters (26 English letters and 10 digits) at random, an attacker who obtains each of these 36 characters by segmenting recordings can respond to any random string simply by splicing together the recorded characters.
Summary of the Invention
The invention provides a voiceprint authentication method, server, and terminal with the function of preventing recording attacks, to remedy the defect that prior-art methods for preventing recording attacks have loopholes and cannot effectively prevent such attacks.
To solve the above technical problem, the invention provides a voiceprint authentication method capable of preventing recording attacks, including:
generating a character combination and pronunciation rules for the characters according to a user's voiceprint authentication request;
sending the character combination and the pronunciation rules to the requesting terminal;
receiving the user voice input at the requesting terminal according to the character combination and the pronunciation rules;
performing voiceprint authentication according to the user voice, the character combination, and the pronunciation rules;
sending the voiceprint authentication result to the requesting terminal.
The invention further provides a voiceprint authentication method capable of preventing recording attacks, including:
sending a user's voiceprint authentication request to a server;
receiving and displaying the character combination and the pronunciation rules for the characters sent by the server;
receiving the user voice input by the user according to the character combination and the pronunciation rules;
sending the user voice to the server;
receiving the voiceprint authentication result sent by the server.
The invention further provides a voiceprint authentication server capable of preventing recording attacks, including:
a generating unit, configured to generate a character combination and pronunciation rules for the characters according to a user's request;
a sending unit, configured to send the character combination and the pronunciation rules to the requesting terminal, and to send the voiceprint authentication result to the requesting terminal;
a receiving unit, configured to receive the user voice input at the requesting terminal according to the character combination and the pronunciation rules;
a sound detection unit, configured to perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rules.
The invention further provides a voiceprint authentication terminal capable of preventing recording attacks, including:
a request unit, configured to send a user's voiceprint authentication request to a server;
a receiving unit, configured to receive and display the character combination and the pronunciation rules for the characters sent by the server, and to receive the voiceprint authentication result sent by the server;
an input unit, configured to receive the user voice input by the user according to the character combination and the pronunciation rules;
a sending unit, configured to send the user voice to the server.
The invention further provides a voiceprint authentication system capable of preventing recording attacks. The system includes a server and a requesting terminal. The server is configured to: generate a character combination and pronunciation rules for the characters according to a user's voiceprint authentication request; send the character combination and the pronunciation rules to the requesting terminal; receive the user voice input at the requesting terminal according to the character combination and the pronunciation rules; perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rules; and send the voiceprint authentication result to the requesting terminal.
The requesting terminal is configured to: send a user's voiceprint authentication request to the server; receive and display the character combination and the pronunciation rules sent by the server; receive the user voice input by the user according to the character combination and the pronunciation rules; send the user voice to the server; and receive the voiceprint authentication result sent by the server.
The voiceprint authentication method, server, terminal, and system proposed by the invention verify whether the characters and the manner of pronunciation in the user voice are consistent with the character combination and pronunciation rules generated by the server, and can thereby effectively prevent recording attacks: even if an attacker obtains, through other channels, user speech that satisfies the required content, that speech cannot satisfy the required manner of pronunciation. Further, to guard against a recording attack that reuses speech the user has input before, after determining that the characters and manner of pronunciation in the user voice are consistent with the character combination and pronunciation rules generated by the server, the method also determines whether the voice currently being verified is consistent with that user's voice in a historical voice library; if it is, a recording attack is present. The invention can effectively prevent recording attacks in voiceprint authentication.
Description of the Drawings
To describe the technical solutions of the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a voiceprint authentication method capable of preventing recording attacks according to an embodiment of the invention;
FIG. 2 is a flowchart of a voiceprint authentication process capable of preventing recording attacks according to an embodiment of the invention;
FIG. 3 is a flowchart of a voiceprint authentication process capable of preventing recording attacks according to an embodiment of the invention;
FIG. 4 is a waveform diagram corresponding to the pronunciation of the digit "0" according to an embodiment of the invention;
FIG. 5 is a flowchart of a voiceprint authentication method capable of preventing recording attacks according to an embodiment of the invention;
FIG. 6 shows a voiceprint authentication server capable of preventing recording attacks according to an embodiment of the invention;
FIG. 7 shows a voiceprint authentication terminal capable of preventing recording attacks according to an embodiment of the invention;
FIG. 8 shows a voiceprint authentication system capable of preventing recording attacks according to an embodiment of the invention;
FIG. 9 is a flowchart of a voiceprint authentication method with the function of preventing recording attacks according to an embodiment of the invention.
Detailed Description
To make the technical features and effects of the invention clearer, the technical solution of the invention is further described below with reference to the drawings. The invention may also be described or implemented through other specific examples, and any equivalent transformation made by a person skilled in the art within the scope of the claims falls within the protection scope of the invention.
FIG. 1 is a flowchart of a voiceprint authentication method capable of preventing recording attacks according to an embodiment of the invention.
This embodiment describes the voiceprint authentication method from the server side: voiceprint authentication is performed according to the user voice returned by the terminal and the character combination and pronunciation rules generated by the server. This embodiment can prevent recording attacks to a certain extent.
Specifically, the voiceprint authentication method capable of preventing recording attacks includes the following steps:
Step 101: generate a character combination and pronunciation rules for the characters according to a user's voiceprint authentication request.
The character combination includes, but is not limited to, letters, digits, and Chinese characters. The pronunciation rules include, but are not limited to, the tone and the length of the pronunciation. In one embodiment, each character in the character combination corresponds to one pronunciation rule; in another embodiment, every two characters in the character combination correspond to one pronunciation rule. The invention does not limit the specific form of the character combination or of the pronunciation rules for its characters.
In an embodiment of the present application, the character combination and the pronunciation rules are randomly generated.
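The random generation in step 101 can be illustrated with a minimal sketch. The digit alphabet, the rule vocabulary (`TONES`, `LENGTHS`), and the function name `generate_challenge` are hypothetical choices made for illustration only; the text names tone and length merely as examples of pronunciation rules and does not fix their values. The sketch assumes the embodiment in which each character carries its own rule.

```python
import secrets

# Hypothetical rule vocabulary: the text only says rules may cover tone and length.
TONES = ("high", "low")
LENGTHS = ("short", "long")
ALPHABET = "0123456789"  # assumed alphabet; letters or Chinese characters would work the same way


def generate_challenge(num_chars=6, alphabet=ALPHABET):
    """Return a list of (character, rule) pairs, one pronunciation rule per character."""
    challenge = []
    for _ in range(num_chars):
        char = secrets.choice(alphabet)  # unpredictable choice of character
        rule = {
            "tone": secrets.choice(TONES),
            "length": secrets.choice(LENGTHS),
        }
        challenge.append((char, rule))
    return challenge
```

Because both the characters and the rules are drawn afresh for each request, a recording that happens to contain the right characters is unlikely to also match the requested manner of pronunciation.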
Step 102: send the character combination and the pronunciation rules to the requesting terminal.
The terminal described in the invention includes, but is not limited to, mobile phones, tablets, computers, and notebooks.
Step 103: receive the user voice input at the requesting terminal according to the character combination and the pronunciation rules.
Step 104: perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rules.
Step 105: send the voiceprint authentication result to the requesting terminal.
In this embodiment, even if an attacker can obtain recordings of the spoken characters, the attacker cannot obtain the pronunciation rules of the characters. Adding verification of the pronunciation rules therefore effectively prevents recording attacks.
In detail, step 104 further includes:
determining whether the user voice and the voice historically input by the user belong to the same person;
determining whether the characters in the user voice are the same as the characters in the character combination;
determining whether the manner of pronunciation of the characters in the user voice matches the pronunciation rules of the characters.
Voiceprint authentication passes only when all three conditions hold: the user voice and the user's historically input voice belong to the same person, the characters in the user voice are the same as the characters in the character combination, and the manner of pronunciation of the characters in the user voice matches the pronunciation rules. In every other case voiceprint authentication fails; that is, if any one or more of the three conditions is not met, voiceprint authentication does not pass.
The invention does not limit the order of the above determinations; they can be combined in any order to reach the voiceprint authentication decision.
Preferably, as shown in FIG. 2, step 104 further includes:
Step 201: first determine whether the user voice and the voice historically input by the user belong to the same person. If they do not, voiceprint authentication fails; if they do, continue to step 202.
In a specific implementation, before step 202 is performed, the user voice uploaded by the client is first segmented by character, and the characters in the user voice are then extracted.
Step 202: determine whether the characters in the user voice are the same as the characters in the character combination.
If the characters in the user voice differ from the characters in the character combination, voiceprint authentication fails.
If the characters in the user voice are the same as the characters in the character combination, continue to step 203.
Step 203: determine whether the manner of pronunciation of the characters in the user voice matches the pronunciation rules of the characters.
If the manner of pronunciation of the characters in the user voice does not match the pronunciation rules, voiceprint authentication fails.
If the manner of pronunciation of the characters in the user voice matches the pronunciation rules, voiceprint authentication passes.
Performing voiceprint authentication in the order described in this embodiment speeds up authentication, preventing recording attacks while improving the user experience. In the following embodiments, unless otherwise specified, voiceprint authentication is performed in the order described in this embodiment.
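The ordered checks of steps 201 to 203 amount to a short-circuit evaluation: a later check runs only if every earlier one passed. The sketch below shows that control flow only. The three check functions are hypothetical placeholders supplied by the caller; the text does not specify how speaker verification, character recognition, or rule matching are implemented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Checks:
    """Caller-supplied checks (hypothetical interfaces, for illustration only)."""
    same_speaker: Callable[[bytes], bool]                 # step 201
    recognize_chars: Callable[[bytes], Sequence[str]]     # used by step 202
    rules_match: Callable[[bytes, Sequence[dict]], bool]  # step 203


def authenticate(voice, expected_chars, expected_rules, checks):
    """Return True only if all three checks pass, evaluated in the order 201, 202, 203."""
    # Step 201: is this the same person as in the user's earlier input?
    if not checks.same_speaker(voice):
        return False
    # Step 202: do the spoken characters equal the generated character combination?
    if list(checks.recognize_chars(voice)) != list(expected_chars):
        return False
    # Step 203: does the manner of pronunciation match the generated rules?
    return bool(checks.rules_match(voice, expected_rules))
```

A failed early check returns immediately, so the later checks are skipped for requests that are already rejected.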
Referring again to FIG. 2, after it is determined that the user voice and the user's historically input voice belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the manner of pronunciation of the characters in the user voice matches the pronunciation rules, the method further includes storing the user voice in a historical voice library so that the voice information input by the user can be retrieved later.
As shown in FIG. 3, in an embodiment of the present application, after it is determined that the user voice and the user's historically input voice belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the manner of pronunciation of the characters in the user voice matches the pronunciation rules, the method further includes:
Step 204: determine whether the user voice is consistent with the user's voice in the historical voice library.
If the user voice is consistent with the user's voice in the historical voice library, voiceprint authentication fails.
If the user voice is not consistent with the user's voice in the historical voice library, voiceprint authentication passes and the user voice is stored in the historical voice library.
Verifying whether the user voice is consistent with that user's voice in the historical voice library prevents a recording attack in which the same user voice is input in different authentication sessions of the same user.
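Step 204 can be sketched as follows. The idea is that a live speaker never reproduces an earlier utterance exactly, so a new sample that is consistent with a stored one is treated as a replay. The function name, the list used as the historical voice library, and the `is_consistent` callback are assumptions made for illustration.

```python
def replay_check_and_store(voice_features, history, is_consistent):
    """Step 204 sketch: fail if the new sample matches a stored one, otherwise store it.

    voice_features: features of the current utterance (any comparable object).
    history:        list standing in for the user's historical voice library.
    is_consistent:  caller-supplied predicate deciding whether two samples match.
    """
    for past in history:
        if is_consistent(voice_features, past):
            return False  # matches an earlier utterance: treated as a recording attack
    history.append(voice_features)  # authentication passed: keep the sample for later checks
    return True
```

Note that only samples that pass are stored, so a rejected replay does not enlarge the library.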
In an embodiment of the invention, step 204 of the previous embodiment further includes:
extracting feature parameters of the user voice;
calculating the Euclidean distance between the feature parameters of the user voice and the feature parameters of the user's voice in the historical voice library. When the Euclidean distance is less than a predetermined threshold, the user voice is consistent with the user's voice in the historical voice library; when the Euclidean distance is greater than the predetermined threshold, the user voice is not consistent with the user's voice in the historical voice library.
The predetermined threshold in this embodiment can be determined from the natural variation between repetitions of the same sound by the same person.
In a specific implementation, the detailed process of determining whether the user voice is consistent with the user's voice in the historical voice library is as follows:
1) Divide the user voice into multiple speech segments by character and preprocess each segment, including framing, pre-emphasis, and windowing, to obtain a segment of sound suitable for further calculation.
2) Find the start point and end point of the valid speech portion of each segment.
FIG. 4 is a waveform diagram corresponding to the pronunciation of the digit "0". As FIG. 4 shows, there are many silent segments or faint noise segments before and after the sound. If these invalid portions of the signal are not removed, an attacker can manipulate the invalid segments of a recording and thereby degrade recording detection.
In a specific implementation, the start point and end point of the valid portion of the speech can be determined from the short-time energy and the short-time zero-crossing rate.
Short-time energy is the sum of the signal strength over one frame of speech. The short-time energy E_n of the n-th frame is:
$$E_n = \sum_{m=0}^{N-1} x_n^2(m)$$
where m is the index of a sampling point within the n-th frame, N is the frame size, and x_n(m) is the normalized amplitude of the m-th sampling point of the n-th frame.
The short-time zero-crossing rate is the number of times the speech waveform crosses the horizontal axis within one frame, denoted Z_n:
$$Z_n = \frac{1}{2} \sum_{m=1}^{N-1} \left| \operatorname{sgn}[x_n(m)] - \operatorname{sgn}[x_n(m-1)] \right|$$
where sgn is the sign function, and m, N, and x_n(m) are as defined above.
When the short-time energy E_n exceeds a threshold E or the short-time zero-crossing rate Z_n exceeds a threshold Z, the frame marks the start of valid speech; when the short-time energy E_n falls below the threshold E or the short-time zero-crossing rate Z_n falls below the threshold Z, the frame marks the end of valid speech.
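The endpoint detection of step 2) can be sketched in plain Python. Several details here are assumptions rather than statements from the text: short-time energy is taken as the sum of squared samples, the zero-crossing rate as half the sum of sign changes with sgn(0) treated as +1, frames do not overlap, and the valid segment is taken to run from the first to the last frame in which either quantity exceeds its threshold. The function names and thresholds are illustrative.

```python
def frames(samples, frame_size):
    """Split samples into consecutive non-overlapping frames (a short tail is dropped)."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def short_time_energy(frame):
    """Sum of squared sample values in one frame (assumed definition of E_n)."""
    return sum(x * x for x in frame)


def sign(x):
    """Sign function with sign(0) = +1 (an assumed convention)."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(frame):
    """Number of sign changes between neighbouring samples in one frame (Z_n)."""
    return sum(abs(sign(frame[m]) - sign(frame[m - 1]))
               for m in range(1, len(frame))) / 2


def find_endpoints(samples, frame_size, energy_threshold, zcr_threshold):
    """Return (start, end) sample indices of the valid speech, or None if there is none."""
    active = [i for i, f in enumerate(frames(samples, frame_size))
              if short_time_energy(f) > energy_threshold
              or zero_crossing_rate(f) > zcr_threshold]
    if not active:
        return None
    return active[0] * frame_size, (active[-1] + 1) * frame_size
```

Trimming to the returned range removes the leading and trailing silence, so differences confined to those portions do not affect the later comparison.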
3) Extract feature parameters from the valid speech using Mel-frequency cepstral coefficients (MFCC). This is a commonly used feature extraction method in current sound processing and is not described further here.
After these three steps (preprocessing, removal of the invalid portions of the speech, and feature extraction), the speech for one character of the user's current input is denoted T:
T consists of N frame vectors, T = {T(1), T(2), …, T(n), …, T(N)}, where T(n) is the speech feature vector of the n-th frame.
The pronunciation of the same character by the user in the historical library, after the same preprocessing, removal of the invalid portions, and feature extraction, is denoted R:
R consists of M frame vectors, R = {R(1), R(2), …, R(m), …, R(M)}, where R(m) is the speech feature vector of the m-th frame.
4) Compute the similarity between the user's voice and the voices stored in the historical voice library, that is, the similarity between T and R. This similarity can be assessed by computing the Euclidean distance between T and R.
$d(T(i_n), R(i_m))$ denotes the Euclidean distance between the feature of the $i_n$-th frame of T and the feature of the $i_m$-th frame of R. If the two waveforms coincide exactly in a given frame, the distance d is 0. To compare T and R, the distance D[T,R] between them can be computed; the smaller the distance, the higher the similarity.
If N = M, that is, the two utterances have the same length, the distance between the user's speech and the speech stored in the historical voice library can be computed directly as D[T,R] = d(1,1) + d(2,2) + … + d(N,N). If the two utterances are exactly identical, D[T,R] = 0. This approach can only tell whether T and R are exactly the same. In practice, however, a recording attacker often stretches, shortens, or deletes parts of the original recording, so simply computing the frame-by-frame distance does not defend well against such attacks.
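For the equal-length case, the frame-by-frame sum can be sketched as follows. The names `frame_distance` and `aligned_distance` are hypothetical, and each frame is assumed to be a plain list of feature values (for example MFCCs).

```python
import math
from typing import Sequence


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance d between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def aligned_distance(T: Sequence[Sequence[float]], R: Sequence[Sequence[float]]) -> float:
    """D[T,R] = d(1,1) + d(2,2) + ... + d(N,N); only defined when N == M."""
    if len(T) != len(R):
        raise ValueError("aligned_distance requires sequences of equal length")
    return sum(frame_distance(t, r) for t, r in zip(T, R))
```

A result of 0 means the two feature sequences are identical frame for frame; any edit that changes the length makes this measure inapplicable.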
When N and M differ, T(n) and R(m) must be aligned. Alignment can use linear expansion: if N < M, T can be linearly mapped to a sequence of M frames, and the distance between that sequence and {R(1), R(2), …, R(M)} is then computed. However, an attacker typically modifies only parts of the recording rather than the whole of it, so this method would report a very low similarity between the two.
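One possible reading of the linear expansion is sketched below. The text says only that T is "linearly mapped" to M frames, so the choice of nearest-frame (floor) resampling rather than interpolation is an assumption, and `linear_stretch` is a hypothetical name.

```python
from typing import List, Sequence


def linear_stretch(T: Sequence[Sequence[float]], M: int) -> List[List[float]]:
    """Stretch the N frames of T uniformly to M frames (requires M >= N >= 1)."""
    N = len(T)
    if N == 0 or M < N:
        raise ValueError("need a non-empty T and M >= len(T)")
    # Output frame j copies source frame floor(j * N / M),
    # so every part of T is stretched by the same factor.
    return [list(T[(j * N) // M]) for j in range(M)]
```

Because the stretch is uniform, a recording that was lengthened in only one place stays misaligned everywhere after that place, which is the weakness the paragraph above describes.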
Comparing the similarity of T and R therefore requires combining time warping with distance measurement: find a function $i_m = \Phi(i_n)$ that nonlinearly maps the time axis n of T onto the time axis m of R, such that the distance D[T,R] between T and R satisfies:
$$D[T,R] = \min_{\Phi} \sum_{i_n=1}^{N} d\big(T(i_n), R(\Phi(i_n))\big)$$
where:
$$\Phi(1) = 1, \quad \Phi(N) = M$$
$$\Phi(i_n+1) \ge \Phi(i_n)$$
$$\Phi(i_n+1) - \Phi(i_n) \le 1$$
This clearly satisfies the conditions for dynamic programming, so it can be solved with a dynamic programming algorithm, using the recurrence:
$$D(i_n, i_m) = d(T(i_n), R(i_m)) + \min\{D(i_n-1, i_m),\ D(i_n-1, i_m-1),\ D(i_n-1, i_m-2)\}$$
Starting the search from the point (1, 1) (with D(1,1) = 0) and applying the recurrence repeatedly until (N, M) yields the optimal path, and D(N, M) is the matching distance of the best matching path.
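The recurrence can be sketched as a small dynamic time warping routine. This is an illustration under stated assumptions: `dtw_distance` and `frame_distance` are hypothetical names, frames are lists of feature values, and D(1,1) is set to 0 as the text states (many DTW formulations start from d(T(1), R(1)) instead). The code follows the recurrence as written, so R may advance by 0, 1, or 2 frames per frame of T, which is slightly more permissive than a slope limit of 1.

```python
import math
from typing import Sequence


def frame_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance d between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(T: Sequence[Sequence[float]], R: Sequence[Sequence[float]]) -> float:
    """Matching distance D(N, M); returns inf if no admissible path exists."""
    N, M = len(T), len(R)
    if N == 0 or M == 0:
        raise ValueError("both sequences must be non-empty")
    # D[i][j] uses 1-based indices; row 0 and column 0 are padding set to inf.
    D = [[math.inf] * (M + 1) for _ in range(N + 1)]
    D[1][1] = 0.0  # starting value taken from the text
    for i in range(2, N + 1):
        for j in range(1, M + 1):
            candidates = [D[i - 1][j], D[i - 1][j - 1]]
            if j >= 2:
                candidates.append(D[i - 1][j - 2])
            best = min(candidates)
            if best < math.inf:
                D[i][j] = frame_distance(T[i - 1], R[j - 1]) + best
    return D[N][M]
```

With this recurrence a recording that has been locally stretched, for example by repeating one frame, can still reach a distance of 0 against the original, which a plain frame-by-frame comparison cannot do.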
Because each person's speech is affected by many factors, no one can repeat the same character with completely identical sound waves; some difference always exists. This difference is defined as the predetermined threshold for the judgment. If D(N,M) = 0, the utterances T and R are exactly identical, which shows that they are the same recording, and a recording attack may exist. If D(N,M) < threshold, T and R are highly similar, and a recording attack may likewise exist. If D(N,M) ≥ threshold, T and R are not the same recording, and no recording attack exists.
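The three-way decision can be written down directly. The function names and the returned labels are hypothetical, and the threshold is a value that would have to be calibrated; the text does not specify one.

```python
def classify_match(distance: float, threshold: float) -> str:
    """Map a matching distance D(N, M) to one of the three cases in the text."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance == 0:
        return "identical"        # same recording: attack suspected
    if distance < threshold:
        return "highly_similar"   # closer than natural variation: attack suspected
    return "distinct"             # a fresh utterance: no attack


def replay_suspected(distance: float, threshold: float) -> bool:
    """True when the utterance is too close to a stored one."""
    return classify_match(distance, threshold) != "distinct"
```

Note the inverted logic compared with ordinary speaker matching: here a very small distance is the suspicious outcome.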
The voiceprint authentication method proposed by the present invention prevents recording attacks by verifying whether the characters and pronunciation in the user's speech match the character combination and character pronunciation rules generated by the server. Even if an attacker obtains, through other channels, user speech that satisfies the required content, that speech cannot satisfy the required pronunciation. Further, to guard against replay of speech that the user has previously entered, after determining that the characters and pronunciation in the user's speech match the server-generated character combination and pronunciation rules, the method also determines whether the speech currently being verified matches that user's speech in the historical voice library; a match indicates a recording attack. The present invention can thus effectively prevent recording attacks in voiceprint authentication.
FIG. 5 is a flowchart of a voiceprint authentication method capable of preventing recording attacks according to an embodiment of the present invention. The method is described from the requesting terminal side. Specifically, the voiceprint authentication method includes:
Step 501: send a user's voiceprint authentication request to the server;
Step 502: receive and display the character combination and character pronunciation rules sent by the server;
Step 503: receive the user speech that the user inputs according to the character combination and character pronunciation rules;
Step 504: send the user speech to the server;
Step 505: receive the voiceprint authentication result sent by the server.
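Steps 501 to 505 can be sketched from the terminal's point of view as follows. Everything named here is hypothetical: the `Challenge` type, the callables standing in for the network, display, and microphone, and the representation of pronunciation rules as one string per character.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Challenge:
    characters: str                       # character combination generated by the server
    pronunciation_rules: Tuple[str, ...]  # assumed format: one rule per character


def run_terminal_flow(
    user_id: str,
    request_challenge: Callable[[str], Challenge],
    display: Callable[[Challenge], None],
    record: Callable[[Challenge], bytes],
    submit: Callable[[str, bytes], bool],
) -> bool:
    """Run the terminal-side sequence and return the authentication result."""
    challenge = request_challenge(user_id)  # steps 501-502: request, then receive the challenge
    display(challenge)                      # step 502: show characters and rules to the user
    audio = record(challenge)               # step 503: capture the user's speech
    return submit(user_id, audio)           # steps 504-505: send speech, receive the result
```

Passing the transport and I/O in as callables keeps the sketch independent of any particular network protocol or audio API.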
FIG. 6 shows a voiceprint authentication server capable of preventing recording attacks according to an embodiment of the present invention. The server 600 includes:
a generating unit 601, configured to generate a character combination and character pronunciation rules in response to a user's request;
a sending unit 602, configured to send the character combination and character pronunciation rules to the requesting terminal, and to send the voiceprint authentication result to the requesting terminal;
a receiving unit 603, configured to receive the user speech input at the requesting terminal according to the character combination and character pronunciation rules;
a voice detection unit 604, configured to perform voiceprint authentication according to the user speech, the character combination, and the character pronunciation rules.
FIG. 7 shows a voiceprint authentication terminal capable of preventing recording attacks according to an embodiment of the present invention. Specifically, the authentication terminal 700 includes:
a request unit 701, configured to send a user's voiceprint authentication request to the server;
a receiving unit 702, configured to receive and display the character combination and character pronunciation rules sent by the server, and to receive the voiceprint authentication result sent by the server;
an input unit 703, configured to receive the user speech that the user inputs according to the character combination and character pronunciation rules;
a sending unit 704, configured to send the user speech to the server.
FIG. 8 shows a voiceprint authentication system capable of preventing recording attacks according to an embodiment of the present invention.
The voiceprint authentication system includes the server 600 and the requesting terminal 700. The server 600 is configured to: generate a character combination and character pronunciation rules in response to a user's voiceprint authentication request; send the character combination and character pronunciation rules to the requesting terminal; receive the user speech input at the requesting terminal according to the character combination and character pronunciation rules; perform voiceprint authentication according to the user speech, the character combination, and the character pronunciation rules; and send the voiceprint authentication result to the requesting terminal.
The requesting terminal 700 is configured to: send a user's voiceprint authentication request to the server; receive and display the character combination and character pronunciation rules sent by the server; receive the user speech that the user inputs according to the character combination and character pronunciation rules; send the user speech to the server; and receive the voiceprint authentication result sent by the server.
The voiceprint authentication method, server, terminal, and system proposed by the present invention prevent recording attacks by verifying whether the characters and pronunciation in the user's speech match the character combination and character pronunciation rules generated by the server. Even if an attacker obtains, through other channels, user speech that satisfies the required content, that speech cannot satisfy the required pronunciation. Further, to guard against replay of speech that the user has previously entered, after determining that the characters and pronunciation in the user's speech match the server-generated character combination and pronunciation rules, the method also determines whether the speech currently being verified matches that user's speech in the historical voice library; a match indicates a recording attack. The present invention can thus effectively prevent recording attacks in voiceprint authentication.
To illustrate the technical solution of the present application more clearly, a specific embodiment is described below. With reference to FIG. 9, the workflow of the system for preventing recording attacks is as follows:
Step 901: the client sends an identity authentication request to the server;
Step 902: the server receives the identity authentication request;
Step 903: in response to the identity authentication request, the server randomly generates a verification character combination and the pronunciation of each character, and sends them to the client;
Step 904: after receiving the character combination to be verified and the character pronunciation rules issued by the server, the client prompts the user to read the characters aloud as required;
Step 905: the client receives the user speech and sends it to the server;
Step 906: the server performs voiceprint verification, determining whether the received user speech and the pre-stored speech of that user come from the same person; a conventional voiceprint verification algorithm may be used in a specific implementation;
If voiceprint verification finds that the speaker is not the same person, a user authentication failure is returned directly to the client;
If voiceprint verification finds that the speaker is the same person, recording detection continues;
Step 907: verify whether the characters in the user's speech are the same as the characters in the character combination generated by the server; if they are not the same, character verification fails and a user authentication failure is returned to the client; if they are the same, character verification passes and the flow continues to step 908;
Step 908: verify whether the pronunciation of the characters in the user's speech is the same as the character pronunciation generated by the server; if it is not the same, pronunciation verification fails and a user authentication failure is returned to the client; if it is the same, pronunciation verification passes and the flow continues to step 909;
Step 909: verify whether the user's speech exists in the historical voice library; if it does, a recording attack is established, authentication fails, and the failure result is sent to the client; if it does not, voiceprint authentication passes, the user's speech is stored in the historical voice library, and the success result is sent to the client.
The process of verifying whether the user's speech exists in the historical voice library has been described in detail in the embodiments above and is not repeated here. After voiceprint authentication passes, the client proceeds with the corresponding operation, which the present invention does not restrict.
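The order of the server-side checks in steps 906 to 909 can be sketched as a short pipeline. The individual checks are passed in as callables because the text leaves their algorithms open; the `Checks` type, the function name, and the result strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = bytes  # placeholder type for a recorded utterance


@dataclass
class Checks:
    same_speaker: Callable[[Audio], bool]             # step 906
    characters_match: Callable[[Audio], bool]         # step 907
    pronunciation_matches: Callable[[Audio], bool]    # step 908
    in_history: Callable[[Audio, List[Audio]], bool]  # step 909


def verify_on_server(audio: Audio, history: List[Audio], checks: Checks) -> str:
    """Apply the checks in order and stop at the first failure."""
    if not checks.same_speaker(audio):
        return "failed: speaker mismatch"
    if not checks.characters_match(audio):
        return "failed: wrong characters"
    if not checks.pronunciation_matches(audio):
        return "failed: wrong pronunciation"
    if checks.in_history(audio, history):
        return "failed: recording attack suspected"
    history.append(audio)  # store the accepted utterance for future comparisons
    return "passed"
```

Only accepted utterances are appended to the history, so a later replay of the same audio is caught by the final check.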
The voiceprint authentication method, server, terminal, and system proposed by the present invention prevent recording attacks by verifying whether the characters and pronunciation in the user's speech match the character combination and character pronunciation rules generated by the server. Even if an attacker obtains, through other channels, user speech that satisfies the required content, that speech cannot satisfy the required pronunciation. Further, to guard against replay of speech that the user has previously entered, after determining that the characters and pronunciation in the user's speech match the server-generated character combination and pronunciation rules, the method also determines whether the speech currently being verified matches that user's speech in the historical voice library; a match indicates a recording attack. The present invention can thus effectively prevent recording attacks in voiceprint authentication.
The above merely illustrates the technical solution of the present application. Any person of ordinary skill in the art may modify and vary the above embodiments without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the claims.
Claims (11)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201511020257.4A CN105933272A (en) | 2015-12-30 | 2015-12-30 | Voiceprint recognition method capable of preventing recording attack, server, terminal, and system | 
| PCT/CN2016/111714 WO2017114307A1 (en) | 2015-12-30 | 2016-12-23 | Voiceprint authentication method capable of preventing recording attack, server, terminal, and system | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201511020257.4A CN105933272A (en) | 2015-12-30 | 2015-12-30 | Voiceprint recognition method capable of preventing recording attack, server, terminal, and system | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| CN105933272A true CN105933272A (en) | 2016-09-07 | 
Family
ID=56839979
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201511020257.4A Pending CN105933272A (en) | 2015-12-30 | 2015-12-30 | Voiceprint recognition method capable of preventing recording attack, server, terminal, and system | 
Country Status (2)
| Country | Link | 
|---|---|
| CN (1) | CN105933272A (en) | 
| WO (1) | WO2017114307A1 (en) | 
Cited By (29)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| WO2017114307A1 (en) * | 2015-12-30 | 2017-07-06 | 中国银联股份有限公司 | Voiceprint authentication method capable of preventing recording attack, server, terminal, and system | 
| CN109087647A (en) * | 2018-08-03 | 2018-12-25 | 平安科技(深圳)有限公司 | Application on Voiceprint Recognition processing method, device, electronic equipment and storage medium | 
| CN109218269A (en) * | 2017-07-05 | 2019-01-15 | 阿里巴巴集团控股有限公司 | Identity authentication method, device, equipment and data processing method | 
| CN109935233A (en) * | 2019-01-29 | 2019-06-25 | 天津大学 | A recording attack detection method based on amplitude and phase information | 
| CN110169014A (en) * | 2017-01-03 | 2019-08-23 | 诺基亚技术有限公司 | Device, method and computer program product for certification | 
| CN111316668A (en) * | 2017-11-14 | 2020-06-19 | 思睿逻辑国际半导体有限公司 | Detection of loudspeaker playback | 
| CN111524528A (en) * | 2020-05-28 | 2020-08-11 | Oppo广东移动通信有限公司 | Voice wake-up method and device for anti-recording detection | 
| US10984083B2 (en) | 2017-07-07 | 2021-04-20 | Cirrus Logic, Inc. | Authentication of user using ear biometric data | 
| CN112735426A (en) * | 2020-12-24 | 2021-04-30 | 深圳市声扬科技有限公司 | Voice verification method and system, computer device and storage medium | 
| US11017252B2 (en) | 2017-10-13 | 2021-05-25 | Cirrus Logic, Inc. | Detection of liveness | 
| US11023755B2 (en) | 2017-10-13 | 2021-06-01 | Cirrus Logic, Inc. | Detection of liveness | 
| US11037574B2 (en) | 2018-09-05 | 2021-06-15 | Cirrus Logic, Inc. | Speaker recognition and speaker change detection | 
| CN113012684A (en) * | 2021-03-04 | 2021-06-22 | 电子科技大学 | Synthesized voice detection method based on voice segmentation | 
| US11042617B2 (en) | 2017-07-07 | 2021-06-22 | Cirrus Logic, Inc. | Methods, apparatus and systems for biometric processes | 
| US11042616B2 (en) | 2017-06-27 | 2021-06-22 | Cirrus Logic, Inc. | Detection of replay attack | 
| US11042618B2 (en) | 2017-07-07 | 2021-06-22 | Cirrus Logic, Inc. | Methods, apparatus and systems for biometric processes | 
| US11164588B2 (en) | 2017-06-28 | 2021-11-02 | Cirrus Logic, Inc. | Magnetic detection of replay attack | 
| US11264037B2 (en) | 2018-01-23 | 2022-03-01 | Cirrus Logic, Inc. | Speaker identification | 
| US11270707B2 (en) | 2017-10-13 | 2022-03-08 | Cirrus Logic, Inc. | Analysing speech signals | 
| US11276409B2 (en) | 2017-11-14 | 2022-03-15 | Cirrus Logic, Inc. | Detection of replay attack | 
| CN114826709A (en) * | 2022-04-15 | 2022-07-29 | 马上消费金融股份有限公司 | Identity authentication and acoustic environment detection method, system, electronic device and medium | 
| US11475899B2 (en) | 2018-01-23 | 2022-10-18 | Cirrus Logic, Inc. | Speaker identification | 
| US11631402B2 (en) | 2018-07-31 | 2023-04-18 | Cirrus Logic, Inc. | Detection of replay attack | 
| US11705135B2 (en) | 2017-10-13 | 2023-07-18 | Cirrus Logic, Inc. | Detection of liveness | 
| US11704397B2 (en) | 2017-06-28 | 2023-07-18 | Cirrus Logic, Inc. | Detection of replay attack | 
| US11735189B2 (en) | 2018-01-23 | 2023-08-22 | Cirrus Logic, Inc. | Speaker identification | 
| US11748462B2 (en) | 2018-08-31 | 2023-09-05 | Cirrus Logic Inc. | Biometric authentication | 
| US11755701B2 (en) | 2017-07-07 | 2023-09-12 | Cirrus Logic Inc. | Methods, apparatus and systems for authentication | 
| US11829461B2 (en) | 2017-07-07 | 2023-11-28 | Cirrus Logic Inc. | Methods, apparatus and systems for audio playback | 
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN109754817B (en) * | 2017-11-02 | 2025-02-18 | 北京三星通信技术研究有限公司 | Signal processing method and terminal equipment | 
| CN112365895B (en) * | 2020-10-09 | 2024-04-19 | 深圳前海微众银行股份有限公司 | Audio processing method, device, computing equipment and storage medium | 
| CN114023331B (en) * | 2021-10-20 | 2025-07-15 | 中国工商银行股份有限公司 | Performance detection method, device, equipment and storage medium of voiceprint recognition system | 
| CN119989323B (en) * | 2025-04-15 | 2025-09-12 | 支付宝(杭州)信息技术有限公司 | Data processing method, device and glasses-type wearable device | 
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN1808567A (en) * | 2006-01-26 | 2006-07-26 | 覃文华 | Voice-print authentication device and method of authenticating people presence | 
| CN102457845A (en) * | 2010-10-14 | 2012-05-16 | 阿里巴巴集团控股有限公司 | Wireless service identity authentication method, equipment and system | 
| CN102543084A (en) * | 2010-12-29 | 2012-07-04 | 盛乐信息技术(上海)有限公司 | Online voiceprint recognition system and implementation method thereof | 
| CN102737634A (en) * | 2012-05-29 | 2012-10-17 | 百度在线网络技术(北京)有限公司 | Authentication method and device based on voice | 
| CN104717219A (en) * | 2015-03-20 | 2015-06-17 | 百度在线网络技术(北京)有限公司 | Vocal print login method and device based on artificial intelligence | 
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US9202460B2 (en) * | 2008-05-14 | 2015-12-01 | At&T Intellectual Property I, Lp | Methods and apparatus to generate a speech recognition library | 
| CN104901808A (en) * | 2015-04-14 | 2015-09-09 | 时代亿宝(北京)科技有限公司 | Voiceprint authentication system and method based on time type dynamic password | 
| CN105185379B (en) * | 2015-06-17 | 2017-08-18 | 百度在线网络技术(北京)有限公司 | voiceprint authentication method and device | 
| CN105096121B (en) * | 2015-06-25 | 2017-07-25 | 百度在线网络技术(北京)有限公司 | voiceprint authentication method and device | 
| CN105933272A (en) * | 2015-12-30 | 2016-09-07 | 中国银联股份有限公司 | Voiceprint recognition method capable of preventing recording attack, server, terminal, and system | 
- 2015-12-30: CN application CN201511020257.4A (publication CN105933272A), status: Pending
- 2016-12-23: WO application PCT/CN2016/111714 (publication WO2017114307A1), status: Application Filing
Non-Patent Citations (1)
| Title | 
|---|
| 赵力: "《语音信号处理》", 31 May 2009 * | 
Cited By (38)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| WO2017114307A1 (en) * | 2015-12-30 | 2017-07-06 | 中国银联股份有限公司 | Voiceprint authentication method capable of preventing recording attack, server, terminal, and system | 
| US11283631B2 (en) | 2017-01-03 | 2022-03-22 | Nokia Technologies Oy | Apparatus, method and computer program product for authentication | 
| CN110169014A (en) * | 2017-01-03 | 2019-08-23 | 诺基亚技术有限公司 | Device, method and computer program product for certification | 
| US12026241B2 (en) | 2017-06-27 | 2024-07-02 | Cirrus Logic Inc. | Detection of replay attack | 
| US11042616B2 (en) | 2017-06-27 | 2021-06-22 | Cirrus Logic, Inc. | Detection of replay attack | 
| US11704397B2 (en) | 2017-06-28 | 2023-07-18 | Cirrus Logic, Inc. | Detection of replay attack | 
| US11164588B2 (en) | 2017-06-28 | 2021-11-02 | Cirrus Logic, Inc. | Magnetic detection of replay attack | 
| CN109218269A (en) * | 2017-07-05 | 2019-01-15 | 阿里巴巴集团控股有限公司 | Identity authentication method, device, equipment and data processing method | 
| US11829461B2 (en) | 2017-07-07 | 2023-11-28 | Cirrus Logic Inc. | Methods, apparatus and systems for audio playback | 
| US12248551B2 (en) | 2017-07-07 | 2025-03-11 | Cirrus Logic Inc. | Methods, apparatus and systems for audio playback | 
| US10984083B2 (en) | 2017-07-07 | 2021-04-20 | Cirrus Logic, Inc. | Authentication of user using ear biometric data | 
| US11714888B2 (en) | 2017-07-07 | 2023-08-01 | Cirrus Logic Inc. | Methods, apparatus and systems for biometric processes | 
| US11755701B2 (en) | 2017-07-07 | 2023-09-12 | Cirrus Logic Inc. | Methods, apparatus and systems for authentication | 
| US11042617B2 (en) | 2017-07-07 | 2021-06-22 | Cirrus Logic, Inc. | Methods, apparatus and systems for biometric processes | 
| US12135774B2 (en) | 2017-07-07 | 2024-11-05 | Cirrus Logic Inc. | Methods, apparatus and systems for biometric processes | 
| US11042618B2 (en) | 2017-07-07 | 2021-06-22 | Cirrus Logic, Inc. | Methods, apparatus and systems for biometric processes | 
| US11017252B2 (en) | 2017-10-13 | 2021-05-25 | Cirrus Logic, Inc. | Detection of liveness | 
| US12380895B2 (en) | 2017-10-13 | 2025-08-05 | Cirrus Logic Inc. | Analysing speech signals | 
| US11023755B2 (en) | 2017-10-13 | 2021-06-01 | Cirrus Logic, Inc. | Detection of liveness | 
| US11705135B2 (en) | 2017-10-13 | 2023-07-18 | Cirrus Logic, Inc. | Detection of liveness | 
| US11270707B2 (en) | 2017-10-13 | 2022-03-08 | Cirrus Logic, Inc. | Analysing speech signals | 
| US11276409B2 (en) | 2017-11-14 | 2022-03-15 | Cirrus Logic, Inc. | Detection of replay attack | 
| CN111316668B (en) * | 2017-11-14 | 2021-09-28 | 思睿逻辑国际半导体有限公司 | Detection of loudspeaker playback | 
| CN111316668A (en) * | 2017-11-14 | 2020-06-19 | 思睿逻辑国际半导体有限公司 | Detection of loudspeaker playback | 
| US11051117B2 (en) | 2017-11-14 | 2021-06-29 | Cirrus Logic, Inc. | Detection of loudspeaker playback | 
| US11475899B2 (en) | 2018-01-23 | 2022-10-18 | Cirrus Logic, Inc. | Speaker identification | 
| US11694695B2 (en) | 2018-01-23 | 2023-07-04 | Cirrus Logic, Inc. | Speaker identification | 
| US11264037B2 (en) | 2018-01-23 | 2022-03-01 | Cirrus Logic, Inc. | Speaker identification | 
| US11735189B2 (en) | 2018-01-23 | 2023-08-22 | Cirrus Logic, Inc. | Speaker identification | 
| US11631402B2 (en) | 2018-07-31 | 2023-04-18 | Cirrus Logic, Inc. | Detection of replay attack | 
| CN109087647A (en) * | 2018-08-03 | 2018-12-25 | 平安科技(深圳)有限公司 | Application on Voiceprint Recognition processing method, device, electronic equipment and storage medium | 
| US11748462B2 (en) | 2018-08-31 | 2023-09-05 | Cirrus Logic Inc. | Biometric authentication | 
| US11037574B2 (en) | 2018-09-05 | 2021-06-15 | Cirrus Logic, Inc. | Speaker recognition and speaker change detection | 
| CN109935233A (en) * | 2019-01-29 | 2019-06-25 | 天津大学 | A recording attack detection method based on amplitude and phase information | 
| CN111524528A (en) * | 2020-05-28 | 2020-08-11 | Oppo广东移动通信有限公司 | Voice wake-up method and device for anti-recording detection | 
| CN112735426A (en) * | 2020-12-24 | 2021-04-30 | 深圳市声扬科技有限公司 | Voice verification method and system, computer device and storage medium | 
| CN113012684A (en) * | 2021-03-04 | 2021-06-22 | 电子科技大学 | Synthesized voice detection method based on voice segmentation | 
| CN114826709A (en) * | 2022-04-15 | 2022-07-29 | 马上消费金融股份有限公司 | Identity authentication and acoustic environment detection method, system, electronic device and medium | 
Also Published As
| Publication number | Publication date | 
|---|---|
| WO2017114307A1 (en) | 2017-07-06 | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| CN105933272A (en) | Voiceprint recognition method capable of preventing recording attack, server, terminal, and system | |
| Mukhopadhyay et al. | All your voices are belong to us: Stealing voices to fool humans and machines | |
| CN107104803B (en) | User identity authentication method based on digital password and voiceprint joint confirmation | |
| Chen et al. | Towards understanding and mitigating audio adversarial examples for speaker recognition | |
| WO2017197953A1 (en) | Voiceprint-based identity recognition method and device | |
| US20190013026A1 (en) | System and method for efficient liveness detection | |
| KR101757990B1 (en) | Method and device for voiceprint indentification | |
| US11979398B2 (en) | Privacy-preserving voiceprint authentication apparatus and method | |
| Gałka et al. | Playback attack detection for text-dependent speaker verification over telephone channels | |
| EP2364495B1 (en) | Method for verifying the identify of a speaker and related computer readable medium and computer | |
| CN104509065B (en) | Human interaction proof is used as using the ability of speaking | |
| US7447632B2 (en) | Voice authentication system | |
| CN104217149B (en) | Biometric authentication method and equipment based on voice | |
| US20130132091A1 (en) | Dynamic Pass Phrase Security System (DPSS) | |
| WO2017215558A1 (en) | Voiceprint recognition method and device | |
| EP1962280A1 (en) | Method and network-based biometric system for biometric authentication of an end user | |
| CN101231737A (en) | A system and method for enhancing the security of online banking transactions | |
| WO2017206375A1 (en) | Voiceprint registration and authentication methods and devices | |
| Saquib et al. | A survey on automatic speaker recognition systems | |
| CN113012684B (en) | Synthesized voice detection method based on voice segmentation | |
| Turner et al. | Attacking speaker recognition systems with phoneme morphing | |
| JP7339116B2 (en) | Voice authentication device, voice authentication system, and voice authentication method | |
| RU2351023C2 (en) | User verification method in authorised access systems | |
| US20230153815A1 (en) | Methods and systems for training a machine learning model and authenticating a user with the model | |
| Chen et al. | An HASM-assisted voice disguise scheme for emotion recognition of IoT-enabled voice interface | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20160907 | |
| RJ01 | Rejection of invention patent application after publication |