
CN107993071A - Electronic device, voiceprint-based identity verification method and storage medium - Google Patents

Electronic device, voiceprint-based identity verification method and storage medium Download PDF

Info

Publication number
CN107993071A
Authority
CN
China
Prior art keywords
voiceprint
voice data
voiceprint feature
discriminant vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711161344.0A
Other languages
Chinese (zh)
Inventor
赵峰
王健宗
程宁
郑斯奇
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201711161344.0A
Priority to PCT/CN2018/076113 (WO2019100606A1)
Publication of CN107993071A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401Transaction verification
    • G06Q20/4014Identity check for transactions
    • G06Q20/40145Biometric identity checks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/08Use of distortion metrics or a particular distance between probe pattern and reference templates
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/18Artificial neural networks; Connectionist approaches
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Computation (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The present invention relates to an electronic device, a voiceprint-based identity verification method, and a storage medium. The method includes: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data; processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the voice data from those features; feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data; and calculating the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on that distance, and generating a verification result. The present invention can improve the accuracy and efficiency of identity verification.

Description

Electronic device, voiceprint-based identity verification method and storage medium
Technical field
The present invention relates to the field of communication technology, and more particularly to an electronic device, a voiceprint-based identity verification method, and a storage medium.
Background
At present, the business scope of many large financial companies covers several lines such as insurance, banking, and investment, and each line usually needs to communicate with the same client; verifying a client's identity has therefore become an important part of guaranteeing business security. To meet real-time business demands, such companies currently tend to verify client identity manually, but because the client base is huge, manual discrimination and analysis is not only time-consuming, labor-intensive, and prone to error, it also greatly increases operating costs. Some financial companies have tried automatic speech recognition to discriminate user identity automatically; however, the accuracy of the existing automatic approaches is low and needs improvement. How to provide a highly accurate automatic speech recognition scheme has thus become a technical problem to be solved urgently.
Summary of the invention
The object of the present invention is to provide an electronic device, a voiceprint-based identity verification method, and a storage medium, aiming to improve the accuracy and efficiency of identity verification.
To achieve the above object, the present invention provides an electronic device. The electronic device includes a memory and a processor connected to the memory; the memory stores a processing system that can run on the processor, and the processing system, when executed by the processor, realizes the following steps:
a framing and sampling step: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data;
an extraction step: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the voice data from those features;
a construction step: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data;
a verification step: calculating the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on the spatial distance, and generating a verification result.
Preferably, the framing and sampling step specifically includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data;
performing convolution on the two-dimensional voice data using a convolution kernel of a preset specification and a first preset stride;
performing max-pooling sampling on the convolved voice data with a second preset stride to obtain the voice sample data.
Preferably, the extraction step specifically includes:
performing pre-emphasis and windowing on the voice sample data, performing a Fourier transform on each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output the Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the corresponding voiceprint feature vector from the MFCCs.
Preferably, the verification step specifically includes:
calculating the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, $\cos\theta = \frac{\vec{A} \cdot \vec{B}}{\lVert\vec{A}\rVert \, \lVert\vec{B}\rVert}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating a message that verification has passed;
if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
To achieve the above object, the present invention also provides a voiceprint-based identity verification method, which includes:
S1: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data;
S2: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the voice data from those features;
S3: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data;
S4: calculating the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on the spatial distance, and generating a verification result.
Preferably, step S1 includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data;
performing convolution on the two-dimensional voice data using a convolution kernel of a preset specification and a first preset stride;
performing max-pooling sampling on the convolved voice data with a second preset stride to obtain the voice sample data.
Preferably, step S2 includes:
performing pre-emphasis and windowing on the voice sample data, performing a Fourier transform on each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output the Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the corresponding voiceprint feature vector from the MFCCs.
Preferably, step S4 includes:
calculating the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, $\cos\theta = \frac{\vec{A} \cdot \vec{B}}{\lVert\vec{A}\rVert \, \lVert\vec{B}\rVert}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating a message that verification has passed;
if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
Preferably, the background channel model is a Gaussian mixture model, and before step S3 the method includes:
obtaining a preset number of voice data samples, processing each voice data sample to obtain voiceprint features of the preset type, and constructing the corresponding voiceprint feature vector from the voiceprint features of each voice data sample;
dividing the voiceprint feature vectors into a training set of a first proportion and a validation set of a second proportion, the sum of the first proportion and the second proportion being less than or equal to 1;
training the Gaussian mixture model with the voiceprint feature vectors in the training set, and after training is complete, verifying the accuracy of the trained Gaussian mixture model with the validation set;
if the accuracy is greater than a preset threshold, ending model training and using the trained Gaussian mixture model as the background channel model; otherwise, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining based on the increased samples.
The present invention also provides a computer-readable storage medium that stores a processing system; when the processing system is executed by a processor, the steps of the above voiceprint-based identity verification method are realized.
The beneficial effects of the invention are as follows: when performing voiceprint-based identity verification of a target user, the present invention uses a convolutional neural network model to frame and sample the voice data, which quickly and efficiently obtains the useful local data in the voice data; it then extracts voiceprint features from the voice sample data, builds the voiceprint feature vector, and verifies the target user's identity, which can improve the accuracy and efficiency of identity verification.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware architecture of one embodiment of the electronic device of the present invention;
Fig. 2 is a flow diagram of one embodiment of the voiceprint-based identity verification method of the present invention.
Detailed description of the embodiments
To make the purpose, technical scheme, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein only explain the present invention and do not limit it. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work belong to the protection scope of the present invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for description purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but only on the basis that they can be implemented by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that the combination does not exist and is not within the protection scope claimed by the present invention.
As shown in Fig. 1, Fig. 1 is a schematic diagram of the hardware architecture of one embodiment of the electronic device of the present invention. The electronic device 1 is an apparatus capable of automatically performing numerical computation and/or information processing according to instructions that are set or stored in advance. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: one super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can communicate with each other through a system bus, the memory 11 storing a processing system that can run on the processor 12. It should be pointed out that Fig. 1 only shows the electronic device 1 with components 11-13; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random-access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, or optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as the hard disk of the electronic device 1; in other embodiments, it may also be an external storage device of the electronic device 1, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is commonly used to store the operating system and various application software of the electronic device 1, for example the program code of the processing system in one embodiment of the present invention. The memory 11 can also be used to temporarily store various data that have been output or will be output.
The processor 12 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is commonly used to control the overall operation of the electronic device 1, for example performing control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is used to run the program code or process the data stored in the memory 11, for example to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface and is commonly used to establish a communication connection between the electronic device 1 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the electronic device 1 with other devices and establish a data transmission channel and communication connection, in order to receive the voice data of the target user whose identity is to be verified.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11; the at least one computer-readable instruction can be executed by the processor 12 to realize the methods of the embodiments of the present application, and can be divided into different logic modules according to the functions realized by its parts.
In one embodiment, the following steps are realized when the above processing system is executed by the processor 12:
Framing and sampling step: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data.
In this embodiment, the voice data is collected by a voice capture device (for example, a microphone). When collecting voice data, interference from ambient noise and from the voice capture device itself should be prevented as far as possible: the voice capture device should be kept at a suitable distance from the target user, a low-distortion device should be used where possible, and the power supply should preferably be mains electricity with a stable current; a sensor should be used when recording over the telephone. Before framing and sampling, the voice data may be denoised to further reduce interference. So that voiceprint features can be extracted, the collected voice data has a preset data length or is longer than the preset data length.
In a preferred embodiment, the received voice data is one-dimensional voice data, and the framing and sampling step specifically includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data; performing convolution on the two-dimensional voice data using a convolution kernel of a preset specification and a first preset stride; performing max-pooling sampling on the convolved voice data with a second preset stride to obtain the voice sample data.
A voice signal is stationary only over a short period of time, so framing divides one segment of the voice signal into N short-time voice signals; to avoid losing the continuity characteristics of the voice signal, adjacent voice frames share a region of overlap, generally 1/2 of the frame length. After framing, each frame is processed as a stationary signal.
The convolution kernel of the preset specification may be a 5*5 convolution kernel, the first preset stride may be 1*1, and the second preset stride may be 2*2.
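As a concrete illustration of this framing and sampling step, the following is a minimal NumPy sketch using the preferred values above (half-frame overlap, a 5*5 convolution kernel, a 1*1 convolution stride, and a 2*2 pooling stride). The frame length, hop size, and random kernel weights are illustrative assumptions, not parameters disclosed by the patent.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=200):
    """Split 1-D voice data into overlapping frames (hop = frame_len / 2,
    i.e. adjacent frames overlap by half a frame, as described above).
    Rows are frames, columns are intra-frame samples."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def conv2d(x, kernel, stride=(1, 1)):
    """Naive valid 2-D convolution over the two-dimensional voice data."""
    kh, kw = kernel.shape
    sh, sw = stride
    oh = (x.shape[0] - kh) // sh + 1
    ow = (x.shape[1] - kw) // sw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*sh:i*sh+kh, j*sw:j*sw+kw] * kernel)
    return out

def max_pool(x, stride=(2, 2)):
    """Max-pooling sampling with a window equal to the stride."""
    sh, sw = stride
    oh, ow = x.shape[0] // sh, x.shape[1] // sw
    return x[:oh*sh, :ow*sw].reshape(oh, sh, ow, sw).max(axis=(1, 3))

# 1-D voice data -> 2-D frames -> 5*5 convolution -> 2*2 max-pooling sampling.
voice = np.random.randn(16000)           # stand-in for captured audio
two_dim = frame_signal(voice)            # frames as rows, intra-frame data as columns
kernel = np.random.randn(5, 5)           # a trained CNN would supply these weights
voice_samples = max_pool(conv2d(two_dim, kernel, stride=(1, 1)), stride=(2, 2))
```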
Extraction step: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the voice data from those features.
Voiceprint features come in many types, such as wideband voiceprint, narrowband voiceprint, and amplitude voiceprint. In this embodiment, the preset-type voiceprint features are preferably the Mel-frequency cepstral coefficients (MFCC) of the voice sample data, and the predetermined filter is a Mel filter. When building the corresponding voiceprint feature vector, the voiceprint features of the voice sample data are composed into a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
Construction step: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data.
In this embodiment, the background channel model is preferably a Gaussian mixture model; the Gaussian mixture model is used to compute the voiceprint feature vector and derive the corresponding current voiceprint discriminant vector (i.e., the i-vector).
Specifically, the computing process includes:
1) Selecting Gaussian models: first, the parameters of the universal background channel model are used to compute, for every frame of data, the log-likelihood values under the different Gaussian models; the columns of the log-likelihood matrix are sorted in parallel and the top N Gaussian models are chosen; finally a matrix of per-frame values under the mixed Gaussian model is obtained:
$\text{Loglike} = E(X)\,D(X)^{-1}X^{T} - 0.5\,D(X)^{-1}(X^{.2})^{T}$,
where Loglike is the log-likelihood matrix, $E(X)$ is the mean matrix trained by the universal background channel model, $D(X)$ is the covariance matrix, $X$ is the data matrix, and $X^{.2}$ is the matrix with each value squared.
The per-model log-likelihood formula is $\text{loglikes}_i = C_i + E_i\,\mathrm{Cov}_i^{-1}X_i - X_i^{T}X_i\,\mathrm{Cov}_i^{-1}$, where $\text{loglikes}_i$ is the $i$-th row vector of the log-likelihood matrix, $C_i$ is the constant term of the $i$-th model, $E_i$ is the mean matrix of the $i$-th model, $\mathrm{Cov}_i$ is the covariance matrix of the $i$-th model, and $X_i$ is the $i$-th frame of data.
2) Computing posterior probabilities: each frame of data $X$ is used to compute $XX^{T}$, giving a symmetric matrix that can be reduced to a lower triangular matrix whose elements are arranged in order into one row, so that the computation becomes a vector of N frames times the number of lower-triangular elements; the vectors of all frames are combined into a new data matrix. Meanwhile the covariance matrices used for the probability computation in the universal background model are likewise each reduced to a lower triangular matrix, becoming matrices of the same kind as the new data matrix. Using the mean matrix and covariance matrix in the universal background channel model, the log-likelihood value of every frame of data under its selected Gaussian models is computed; a Softmax regression is then performed and finally a normalization operation, which yields the posterior probability distribution of every frame over the mixed Gaussian model. The per-frame probability distribution vectors are composed into a probability matrix.
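The Softmax normalisation in step 2) amounts to the following sketch, under the assumption that `loglikes` holds each frame's log-likelihoods over its selected Gaussian models; the shapes are illustrative, not taken from the patent.

```python
import numpy as np

def frame_posteriors(loglikes):
    """Row-wise Softmax over per-Gaussian log-likelihoods, yielding each
    frame's posterior probability distribution over the mixture."""
    shifted = loglikes - loglikes.max(axis=1, keepdims=True)  # for numerical stability
    probs = np.exp(shifted)
    return probs / probs.sum(axis=1, keepdims=True)           # rows sum to 1

# Probability matrix for 79 frames over N = 8 selected Gaussian models.
prob_matrix = frame_posteriors(np.random.randn(79, 8))
```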
3) Extracting the current voiceprint discriminant vector: first the first-order and second-order coefficients are computed. The first-order coefficients can be obtained by row summation of the probability matrix:
$\Gamma_i = \sum_j \text{loglikes}_{ji}$,
where $\Gamma_i$ is the $i$-th element of the first-order coefficient vector and $\text{loglikes}_{ji}$ is the element in row $j$, column $i$ of the log-likelihood matrix.
The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix:
$X = \text{Loglike}^{T} \cdot \text{feats}$,
where $X$ is the second-order coefficient matrix, Loglike is the log-likelihood matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients have been computed, the linear term and the quadratic term are computed in parallel, and the current voiceprint discriminant vector is then computed from the linear term and the quadratic term.
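The coefficient accumulation just described reduces to one column summation and one matrix product. The sketch below assumes `prob_matrix` is the frames-by-Gaussians posterior probability matrix and `feats` the feature data matrix; the shapes are illustrative.

```python
import numpy as np

def accumulate_coefficients(prob_matrix, feats):
    """First-order coefficients: Gamma_i = sum over frames j of
    prob_matrix[j, i]. Second-order coefficients: the transpose of the
    probability matrix times the feature data matrix."""
    gamma = prob_matrix.sum(axis=0)
    second_order = prob_matrix.T @ feats
    return gamma, second_order

rng = np.random.default_rng(0)
prob_matrix = rng.random((79, 8))
prob_matrix /= prob_matrix.sum(axis=1, keepdims=True)  # rows behave like posteriors
feats = rng.standard_normal((79, 12))                  # one 12-dim MFCC row per frame
gamma, second_order = accumulate_coefficients(prob_matrix, feats)
# An i-vector extractor would then derive the current voiceprint
# discriminant vector from these linear and quadratic terms.
```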
Preferably, the process of training the Gaussian mixture model includes:
obtaining a preset number (for example, 100,000) of voice data samples, processing each voice data sample to obtain voiceprint features of the preset type, and constructing the corresponding voiceprint feature vector from the voiceprint features of each voice data sample;
dividing the voiceprint feature vectors into a training set of a first proportion (for example, 0.75) and a validation set of a second proportion (for example, 0.25), the sum of the first proportion and the second proportion being less than or equal to 1;
training the Gaussian mixture model with the voiceprint feature vectors in the training set, and after training is complete, verifying the accuracy of the trained Gaussian mixture model with the validation set;
if the accuracy is greater than a preset threshold, ending model training and using the trained Gaussian mixture model as the aforementioned background channel model; otherwise, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining based on the increased samples.
When the Gaussian mixture model is trained with the voiceprint feature vectors in the training set, the likelihood probability corresponding to an extracted D-dimensional voiceprint feature can be expressed with K Gaussian components as
$P(x) = \sum_{k=1}^{K} w_k \, p(x \mid k)$,
where $P(x)$ is the probability that a voice data sample is generated by the Gaussian mixture model, $w_k$ is the weight of each Gaussian model, $p(x \mid k)$ is the probability that the sample is generated by the $k$-th Gaussian model, and $K$ is the number of Gaussian models.
The parameters of the whole Gaussian mixture model can be expressed as $\{w_i, \mu_i, \Sigma_i\}$, where $w_i$ is the weight of the $i$-th Gaussian model, $\mu_i$ is the mean of the $i$-th Gaussian model, and $\Sigma_i$ is the covariance of the $i$-th Gaussian model. The Gaussian mixture model can be trained with the unsupervised EM algorithm, using maximum likelihood estimation as the objective, i.e., choosing the parameters that maximize the log-likelihood function. After training is complete, the weight vector, constant vector, N covariance matrices, means multiplied by covariance matrices, and so on of the Gaussian mixture model are obtained; these constitute a trained Gaussian mixture model.
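As an illustration of this training loop, here is a sketch using scikit-learn's GaussianMixture (EM with a maximum-likelihood objective). The 0.75/0.25 split matches the example proportions above; the component count, the use of mean validation log-likelihood as the accuracy measure, and its threshold are assumptions, since the patent does not fix them.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

def train_background_model(feature_vectors, n_components=64, threshold=-40.0):
    """Fit a GMM on the training split, then check it on the validation split.
    Returns the model if it passes, or None to signal that more voice data
    samples should be gathered and training repeated."""
    train, valid = train_test_split(feature_vectors, test_size=0.25)
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(train)             # unsupervised EM, maximizing the log-likelihood
    score = gmm.score(valid)   # mean per-sample log-likelihood on the validation set
    return gmm if score > threshold else None

# Rows stand in for voiceprint feature vectors built from the voice samples.
features = np.random.standard_normal((1000, 12))
background_model = train_background_model(features)
```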
The background channel model of this embodiment is trained in advance by mining and comparative training over a large amount of voice data. This model can accurately portray the background voiceprint characteristics present when a user speaks while retaining the user's own voiceprint features to the greatest extent, remove those background characteristics during recognition, and extract the inherent features of the user's voice, which can significantly improve the accuracy and efficiency of user identity verification.
Verification step: calculating the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on the spatial distance, and generating a verification result.
In this embodiment, there are many kinds of vector-to-vector distance, including cosine distance and Euclidean distance; preferably, the spatial distance of this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as the measure of the difference between the two individuals.
The standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance; it carries the identification information of its corresponding user when stored and can accurately characterize that user's identity. Before the spatial distance is calculated, the stored standard voiceprint discriminant vector is obtained according to the identification information supplied by the user.
When the calculated spatial distance is less than or equal to the preset distance threshold, verification passes; otherwise, verification fails.
Compared with the prior art, when performing voiceprint-based identity verification of a target user, this embodiment uses a convolutional neural network model to frame and sample the voice data, which quickly and efficiently obtains the useful local data in the voice data; it extracts voiceprint features from the voice sample data and builds the voiceprint feature vector to verify the target user's identity, which can improve the accuracy and efficiency of identity verification. In addition, this embodiment makes full use of the voiceprint features related to the vocal tract in speech, and these voiceprint features need not be restricted to any particular text, so there is considerable flexibility in the recognition and verification process.
In a preferred embodiment, on the basis of the embodiment of Fig. 1 above, the extraction step includes:
performing pre-emphasis and windowing on the voice sample data, performing a Fourier transform on each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output the Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the corresponding voiceprint feature vector from the MFCCs.
In this embodiment, the pre-emphasis is in fact a high-pass filtering that filters out low-frequency data so that the high-frequency characteristics of the voice data stand out more. Specifically, the transfer function of the high-pass filter is $H(z) = 1 - \alpha z^{-1}$, where $z$ is the voice data and $\alpha$ is a constant coefficient, preferably with the value 0.97. Because the framed voice sample data deviates to some extent from the original voice, windowing of the voice sample data is necessary.
In this embodiment, performing cepstral analysis on the Mel spectrum means, for example, taking the logarithm and applying an inverse transform; the inverse transform is usually realized by a discrete cosine transform (DCT), and the 2nd to the 13th coefficients after the DCT are taken as the Mel-frequency cepstral coefficients. The MFCCs are the voiceprint features of each frame of voice sample data; the per-frame MFCCs are composed into a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
This embodiment uses the MFCCs of the voice sample data to compose the corresponding voiceprint feature vector; because Mel-frequency bands approximate the human auditory system more closely than the linearly spaced bands of the normal cepstrum, this can improve the accuracy of identity verification.
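A minimal sketch of this extraction pipeline (pre-emphasis with α = 0.97, windowed Fourier transform, Mel filter bank, logarithm, DCT, coefficients 2 through 13), written with librosa and SciPy; the sample rate, FFT length, hop size, and filter-bank size are illustrative assumptions.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_features(signal, sr=16000, alpha=0.97, n_fft=400, hop=200, n_mels=26):
    """Pre-emphasis -> windowed FFT -> Mel filter bank -> log -> DCT,
    keeping the 2nd..13th coefficients as the per-frame MFCCs."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])  # H(z) = 1 - 0.97 z^-1
    spectrum = np.abs(librosa.stft(emphasized, n_fft=n_fft, hop_length=hop,
                                   window="hamming")) ** 2               # spectrum of each window
    mel_spectrum = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels) @ spectrum
    cepstra = dct(np.log(mel_spectrum + 1e-10), axis=0, norm="ortho")    # cepstral analysis
    return cepstra[1:13].T   # one 12-dim MFCC row per frame

# Stacked rows form the feature data matrix, i.e. the voiceprint feature vector.
feats = mfcc_features(np.random.randn(16000))
```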
In a preferred embodiment, on the basis of the embodiment of Fig. 1 above, the verification step specifically includes:
calculating the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, $\cos\theta = \frac{\vec{A} \cdot \vec{B}}{\lVert\vec{A}\rVert \, \lVert\vec{B}\rVert}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector; if the cosine distance is less than or equal to a preset distance threshold, generating a message that verification has passed; if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
In this embodiment, the target user's identification information can be carried when the target user's standard voiceprint discriminant vector is stored. When verifying the user's identity, the corresponding standard voiceprint discriminant vector is obtained by matching the identification information of the current voiceprint discriminant vector, and the cosine distance between the current voiceprint discriminant vector and the matched standard voiceprint discriminant vector is calculated; verifying the target user's identity with the cosine distance improves the accuracy of identity verification.
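The verification step itself is a few lines. In this sketch the distance threshold is an illustrative assumption (the patent leaves its value open), and the cosine distance is taken as one minus the cosine similarity so that "less than or equal to the threshold" indeed means "more similar".

```python
import numpy as np

def verify_identity(current, standard, threshold=0.4):
    """Cosine distance between the current and the matched standard
    voiceprint discriminant vectors, compared against a preset threshold."""
    cos_sim = np.dot(current, standard) / (np.linalg.norm(current) * np.linalg.norm(standard))
    cos_dist = 1.0 - cos_sim   # 0 = identical direction, 2 = opposite
    return "verification passed" if cos_dist <= threshold else "verification failed"

result = verify_identity(np.random.randn(400), np.random.randn(400))
```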
As shown in Fig. 2, Fig. 2 is a flow diagram of one embodiment of the voiceprint-based identity verification method of the present invention; the voiceprint-based identity verification method comprises the following steps:
Step S1: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data.
In this embodiment, the voice data is collected by a voice capture device (for example, a microphone). When collecting voice data, interference from ambient noise and from the voice capture device itself should be prevented as far as possible: the voice capture device should be kept at a suitable distance from the target user, a low-distortion device should be used where possible, and the power supply should preferably be mains electricity with a stable current; a sensor should be used when recording over the telephone. Before framing and sampling, the voice data may be denoised to further reduce interference. So that voiceprint features can be extracted, the collected voice data has a preset data length or is longer than the preset data length.
In a preferred embodiment, the received voice data is one-dimensional voice data, and the framing and sampling step specifically includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data; performing convolution on the two-dimensional voice data using a convolution kernel of a preset specification and a first preset stride; performing max-pooling sampling on the convolved voice data with a second preset stride to obtain the voice sample data.
A voice signal is stationary only over a short period of time, so framing divides one segment of the voice signal into N short-time voice signals; to avoid losing the continuity characteristics of the voice signal, adjacent voice frames share a region of overlap, generally 1/2 of the frame length. After framing, each frame is processed as a stationary signal.
The convolution kernel of the preset specification may be a 5*5 convolution kernel, the first preset stride may be 1*1, and the second preset stride may be 2*2.
Step S2: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the voice data from those features.
Voiceprint features come in many types, such as wideband voiceprint, narrowband voiceprint, and amplitude voiceprint. In this embodiment, the preset-type voiceprint features are preferably the Mel-frequency cepstral coefficients (MFCC) of the voice sample data, and the predetermined filter is a Mel filter. When building the corresponding voiceprint feature vector, the voiceprint features of the voice sample data are composed into a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
Step S3: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data.
In this embodiment, the background channel model is preferably a Gaussian mixture model; the Gaussian mixture model is used to compute the voiceprint feature vector and derive the corresponding current voiceprint discriminant vector (i.e., the i-vector).
Specifically, the computing process includes:
1) Selecting Gaussian models: first, the parameters of the universal background channel model are used to compute, for every frame of data, the log-likelihood values under the different Gaussian models; the columns of the log-likelihood matrix are sorted in parallel and the top N Gaussian models are chosen; finally a matrix of per-frame values under the mixed Gaussian model is obtained:
$\text{Loglike} = E(X)\,D(X)^{-1}X^{T} - 0.5\,D(X)^{-1}(X^{.2})^{T}$,
where Loglike is the log-likelihood matrix, $E(X)$ is the mean matrix trained by the universal background channel model, $D(X)$ is the covariance matrix, $X$ is the data matrix, and $X^{.2}$ is the matrix with each value squared.
The per-model log-likelihood formula is $\text{loglikes}_i = C_i + E_i\,\mathrm{Cov}_i^{-1}X_i - X_i^{T}X_i\,\mathrm{Cov}_i^{-1}$, where $\text{loglikes}_i$ is the $i$-th row vector of the log-likelihood matrix, $C_i$ is the constant term of the $i$-th model, $E_i$ is the mean matrix of the $i$-th model, $\mathrm{Cov}_i$ is the covariance matrix of the $i$-th model, and $X_i$ is the $i$-th frame of data.
2) Computing posterior probabilities: each frame of data $X$ is used to compute $XX^{T}$, giving a symmetric matrix that can be reduced to a lower triangular matrix whose elements are arranged in order into one row, so that the computation becomes a vector of N frames times the number of lower-triangular elements; the vectors of all frames are combined into a new data matrix. Meanwhile the covariance matrices used for the probability computation in the universal background model are likewise each reduced to a lower triangular matrix, becoming matrices of the same kind as the new data matrix. Using the mean matrix and covariance matrix in the universal background channel model, the log-likelihood value of every frame of data under its selected Gaussian models is computed; a Softmax regression is then performed and finally a normalization operation, which yields the posterior probability distribution of every frame over the mixed Gaussian model. The per-frame probability distribution vectors are composed into a probability matrix.
3) Extracting the current voiceprint discriminant vector: first the first-order and second-order coefficients are computed. The first-order coefficients can be obtained by row summation of the probability matrix:
$\Gamma_i = \sum_j \text{loglikes}_{ji}$,
where $\Gamma_i$ is the $i$-th element of the first-order coefficient vector and $\text{loglikes}_{ji}$ is the element in row $j$, column $i$ of the log-likelihood matrix.
The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix:
$X = \text{Loglike}^{T} \cdot \text{feats}$,
where $X$ is the second-order coefficient matrix, Loglike is the log-likelihood matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients have been computed, the linear term and the quadratic term are computed in parallel, and the current voiceprint discriminant vector is then computed from the linear term and the quadratic term.
Preferably, the process of training the Gaussian mixture model includes:
obtaining a preset number (for example, 100,000) of voice data samples, processing each voice data sample to obtain voiceprint features of the preset type, and constructing the corresponding voiceprint feature vector from the voiceprint features of each voice data sample;
dividing the voiceprint feature vectors into a training set of a first proportion (for example, 0.75) and a validation set of a second proportion (for example, 0.25), the sum of the first proportion and the second proportion being less than or equal to 1;
training the Gaussian mixture model with the voiceprint feature vectors in the training set, and after training is complete, verifying the accuracy of the trained Gaussian mixture model with the validation set;
if the accuracy is greater than a preset threshold, ending model training and using the trained Gaussian mixture model as the aforementioned background channel model; otherwise, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining based on the increased samples.
When the Gaussian mixture model is trained with the voiceprint feature vectors in the training set, the likelihood probability corresponding to an extracted D-dimensional voiceprint feature can be expressed with K Gaussian components as
$P(x) = \sum_{k=1}^{K} w_k \, p(x \mid k)$,
where $P(x)$ is the probability that a voice data sample is generated by the Gaussian mixture model, $w_k$ is the weight of each Gaussian model, $p(x \mid k)$ is the probability that the sample is generated by the $k$-th Gaussian model, and $K$ is the number of Gaussian models.
The parameters of the whole Gaussian mixture model can be expressed as $\{w_i, \mu_i, \Sigma_i\}$, where $w_i$ is the weight of the $i$-th Gaussian model, $\mu_i$ is the mean of the $i$-th Gaussian model, and $\Sigma_i$ is the covariance of the $i$-th Gaussian model. The Gaussian mixture model can be trained with the unsupervised EM algorithm, using maximum likelihood estimation as the objective, i.e., choosing the parameters that maximize the log-likelihood function. After training is complete, the weight vector, constant vector, N covariance matrices, means multiplied by covariance matrices, and so on of the Gaussian mixture model are obtained; these constitute a trained Gaussian mixture model.
The background channel model of this embodiment is trained in advance by mining and comparative training over a large amount of voice data. This model can accurately portray the background voiceprint characteristics present when a user speaks while retaining the user's own voiceprint features to the greatest extent, remove those background characteristics during recognition, and extract the inherent features of the user's voice, which can significantly improve the accuracy and efficiency of user identity verification.
Step S4: calculating the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on the spatial distance, and generating a verification result.
In this embodiment, there are many kinds of vector-to-vector distance, including cosine distance and Euclidean distance; preferably, the spatial distance of this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as the measure of the difference between the two individuals.
The standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance; it carries the identification information of its corresponding user when stored and can accurately characterize that user's identity. Before the spatial distance is calculated, the stored standard voiceprint discriminant vector is obtained according to the identification information supplied by the user.
When the calculated spatial distance is less than or equal to the preset distance threshold, verification passes; otherwise, verification fails.
Compared with the prior art, when performing voiceprint-based identity verification of a target user, this embodiment uses a convolutional neural network model to frame and sample the voice data, which quickly and efficiently obtains the useful local data in the voice data; it extracts voiceprint features from the voice sample data and builds the voiceprint feature vector to verify the target user's identity, which can improve the accuracy and efficiency of identity verification. In addition, this embodiment makes full use of the voiceprint features related to the vocal tract in speech, and these voiceprint features need not be restricted to any particular text, so there is considerable flexibility in the recognition and verification process.
In a preferred embodiment, on the basis of the embodiment of Fig. 2 above, step S2 includes:
performing pre-emphasis and windowing on the voice sample data, performing a Fourier transform on each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output the Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the corresponding voiceprint feature vector from the MFCCs.
In this embodiment, the pre-emphasis is in fact a high-pass filtering that filters out low-frequency data so that the high-frequency characteristics of the voice data stand out more. Specifically, the transfer function of the high-pass filter is $H(z) = 1 - \alpha z^{-1}$, where $z$ is the voice data and $\alpha$ is a constant coefficient, preferably with the value 0.97. Because the framed voice sample data deviates to some extent from the original voice, windowing of the voice sample data is necessary.
In this embodiment, performing cepstral analysis on the Mel spectrum means, for example, taking the logarithm and applying an inverse transform; the inverse transform is usually realized by a discrete cosine transform (DCT), and the 2nd to the 13th coefficients after the DCT are taken as the Mel-frequency cepstral coefficients. The MFCCs are the voiceprint features of each frame of voice sample data; the per-frame MFCCs are composed into a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
This embodiment uses the MFCCs of the voice sample data to compose the corresponding voiceprint feature vector; because Mel-frequency bands approximate the human auditory system more closely than the linearly spaced bands of the normal cepstrum, this can improve the accuracy of identity verification.
In a preferred embodiment, on the basis of the embodiment of Fig. 2 above, step S4 specifically includes:
calculating the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, $\cos\theta = \frac{\vec{A} \cdot \vec{B}}{\lVert\vec{A}\rVert \, \lVert\vec{B}\rVert}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector; if the cosine distance is less than or equal to a preset distance threshold, generating a message that verification has passed; if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
In this embodiment, the target user's identification information can be carried when the target user's standard voiceprint discriminant vector is stored. When verifying the user's identity, the corresponding standard voiceprint discriminant vector is obtained by matching the identification information of the current voiceprint discriminant vector, and the cosine distance between the current voiceprint discriminant vector and the matched standard voiceprint discriminant vector is calculated; verifying the target user's identity with the cosine distance improves the accuracy of identity verification.
The present invention also provides a computer-readable storage medium that stores a processing system; when the processing system is executed by a processor, the steps of the above voiceprint-based identity verification method are realized.
The embodiments of the present invention above are numbered for description only; the numbering does not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be realized by software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such understanding, the part of the technical scheme of the present invention that in essence contributes to the prior art can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc), including several instructions for making a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention; every equivalent structure or equivalent flow transformation made using the contents of the specification and drawings of the present invention, whether used directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.

Claims (10)

1. An electronic device, characterized in that the electronic device includes a memory and a processor connected to the memory, the memory storing a processing system that can run on the processor, the processing system, when executed by the processor, realizing the following steps:
a framing and sampling step: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data;
an extraction step: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the voice data from those features;
a construction step: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data;
a verification step: calculating the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on the spatial distance, and generating a verification result.
2. The electronic device according to claim 1, characterized in that the framing and sampling step specifically includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data;
performing convolution on the two-dimensional voice data using a convolution kernel of a preset specification and a first preset stride;
performing max-pooling sampling on the convolved voice data with a second preset stride to obtain the voice sample data.
3. The electronic device according to claim 1 or 2, characterized in that the extraction step specifically includes:
performing pre-emphasis and windowing on the voice sample data, performing a Fourier transform on each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output the Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the corresponding voiceprint feature vector from the MFCCs.
4. The electronic device according to claim 1 or 2, characterized in that the verification step specifically includes:
calculating the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, $\cos\theta = \frac{\vec{A} \cdot \vec{B}}{\lVert\vec{A}\rVert \, \lVert\vec{B}\rVert}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating a message that verification has passed;
if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
5. A voiceprint-based identity verification method, characterized in that the voiceprint-based identity verification method includes:
S1: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data;
S2: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the voice data from those features;
S3: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data;
S4: calculating the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on the spatial distance, and generating a verification result.
6. The voiceprint-based identity verification method according to claim 5, characterized in that step S1 comprises:
performing framing on the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain two-dimensional voice data corresponding to the voice data;
performing convolution on the two-dimensional voice data using a convolution kernel of a preset size and a first preset stride;
performing max-pooling sampling on the convolved voice data with a second preset stride to obtain the voice sample data.
7. The voiceprint-based identity verification method according to claim 5 or 6, characterized in that step S2 comprises:
performing pre-emphasis and windowing on the voice sample data, performing a Fourier transform on each window to obtain a corresponding spectrum, and inputting the spectrum into a Mel filter bank to output a Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCCs), and forming the corresponding voiceprint feature vector based on the MFCCs.
8. The voiceprint-based identity verification method according to claim 5 or 6, characterized in that step S4 comprises:
calculating the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user, $d = 1 - \frac{\vec{w}_1 \cdot \vec{w}_2}{\lVert \vec{w}_1 \rVert \, \lVert \vec{w}_2 \rVert}$, where $\vec{w}_1$ denotes the standard voiceprint discriminant vector and $\vec{w}_2$ denotes the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating information that the verification passes;
if the cosine distance is greater than the preset distance threshold, generating information that the verification fails.
9. The voiceprint-based identity verification method according to claim 5 or 6, characterized in that the background channel model is a Gaussian mixture model, and before step S3 the method comprises:
obtaining a preset number of voice data samples, processing each voice data sample to obtain voiceprint features of the preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features corresponding to each voice data sample;
dividing the voiceprint feature vectors into a training set of a first ratio and a validation set of a second ratio, wherein the sum of the first ratio and the second ratio is less than or equal to 1;
training a Gaussian mixture model using the voiceprint feature vectors in the training set, and after the training is completed, verifying the accuracy of the trained Gaussian mixture model using the validation set;
if the accuracy is greater than a preset threshold, ending the model training and taking the trained Gaussian mixture model as the background channel model; or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and re-training based on the increased voice data samples.
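Claim 9's training loop is sketched below with scikit-learn's GaussianMixture. The claim does not define how "accuracy" is measured for an unsupervised background model, so the sketch substitutes mean validation log-likelihood compared against a preset floor; the split ratios, component count, and floor value are all assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_background_model(features, first_ratio=0.7, second_ratio=0.3,
                           n_components=64, ll_floor=-50.0):
    """Split voiceprint feature vectors (2-D array, one row per sample)
    into training/validation sets, fit a GMM, and accept it only if it
    scores well enough on the validation set."""
    features = np.asarray(features, dtype=float)
    n = len(features)  # needs to comfortably exceed n_components
    rng = np.random.default_rng(0)
    idx = rng.permutation(n)
    n_train = int(n * first_ratio)
    n_valid = int(n * second_ratio)
    train = features[idx[:n_train]]
    valid = features[idx[n_train:n_train + n_valid]]

    gmm = GaussianMixture(n_components=n_components).fit(train)
    # Stand-in "accuracy": mean per-sample log-likelihood on the
    # validation set, compared against a preset floor.
    if gmm.score(valid) > ll_floor:
        return gmm   # use as the background channel model
    return None      # caller should add voice data samples and retrain
```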
10. A computer-readable storage medium, characterized in that a processing system is stored on the computer-readable storage medium, and when executed by a processor, the processing system implements the steps of the voiceprint-based identity verification method according to any one of claims 5 to 9.
CN201711161344.0A 2017-11-21 2017-11-21 Electronic device, auth method and storage medium based on vocal print Pending CN107993071A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711161344.0A CN107993071A (en) 2017-11-21 2017-11-21 Electronic device, auth method and storage medium based on vocal print
PCT/CN2018/076113 WO2019100606A1 (en) 2017-11-21 2018-02-10 Electronic device, voiceprint-based identity verification method and system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711161344.0A CN107993071A (en) 2017-11-21 2017-11-21 Electronic device, auth method and storage medium based on vocal print

Publications (1)

Publication Number Publication Date
CN107993071A true CN107993071A (en) 2018-05-04

Family

ID=62031709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711161344.0A Pending CN107993071A (en) 2017-11-21 2017-11-21 Electronic device, auth method and storage medium based on vocal print

Country Status (2)

Country Link
CN (1) CN107993071A (en)
WO (1) WO2019100606A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106205606A (en) * 2016-08-15 2016-12-07 南京邮电大学 A kind of dynamic positioning and monitoring method based on speech recognition and system
CN106847309A (en) * 2017-01-09 2017-06-13 华南理工大学 A kind of speech-emotion recognition method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101197130A (en) * 2006-12-07 2008-06-11 华为技术有限公司 Voice activity detection method and voice activity detector
CN101923855A (en) * 2009-06-17 2010-12-22 复旦大学 Text-independent Voiceprint Recognition System
CN101894566A (en) * 2010-07-23 2010-11-24 北京理工大学 Visualization method of Chinese mandarin complex vowels based on formant frequency
CN103310273A (en) * 2013-06-26 2013-09-18 南京邮电大学 Method for articulating Chinese vowels with tones and based on DIVA model
CN106682574A (en) * 2016-11-18 2017-05-17 哈尔滨工程大学 One-dimensional deep convolution network underwater multi-target recognition method
CN106847302A (en) * 2017-02-17 2017-06-13 大连理工大学 Single-channel Mixed Speech Separation Method in Time Domain Based on Convolutional Neural Network
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition
CN107240397A (en) * 2017-08-14 2017-10-10 广东工业大学 A kind of smart lock and its audio recognition method and system based on Application on Voiceprint Recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HU Qing: "Research on the Application of Convolutional Neural Networks in Voiceprint Recognition", China Master's Theses Full-text Database *
HU Qing et al.: "Speaker Recognition Algorithm Based on Convolutional Neural Networks", Netinfo Security (《信息网络安全》) *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108806696A (en) * 2018-05-08 2018-11-13 平安科技(深圳)有限公司 Establish method, apparatus, computer equipment and the storage medium of sound-groove model
CN108806696B (en) * 2018-05-08 2020-06-05 平安科技(深圳)有限公司 Method, apparatus, computer equipment and storage medium for establishing voiceprint model
CN108650266B (en) * 2018-05-14 2020-02-18 平安科技(深圳)有限公司 Server, voiceprint verification method and storage medium
CN108648759A (en) * 2018-05-14 2018-10-12 华南理工大学 A kind of method for recognizing sound-groove that text is unrelated
CN108650266A (en) * 2018-05-14 2018-10-12 平安科技(深圳)有限公司 Server, the method for voice print verification and storage medium
WO2019218512A1 (en) * 2018-05-14 2019-11-21 平安科技(深圳)有限公司 Server, voiceprint verification method, and storage medium
WO2019237518A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 Model library establishment method, voice recognition method and apparatus, and device and medium
WO2020073519A1 (en) * 2018-10-11 2020-04-16 平安科技(深圳)有限公司 Voiceprint verification method and apparatus, computer device and storage medium
CN110634492A (en) * 2019-06-13 2019-12-31 中信银行股份有限公司 Login verification method and device, electronic equipment and computer readable storage medium
CN110634492B (en) * 2019-06-13 2023-08-25 中信银行股份有限公司 Login verification method, login verification device, electronic equipment and computer readable storage medium
CN110265037A (en) * 2019-06-13 2019-09-20 中信银行股份有限公司 Auth method, device, electronic equipment and computer readable storage medium
CN110556126A (en) * 2019-09-16 2019-12-10 平安科技(深圳)有限公司 Voice recognition method and device and computer equipment
CN110556126B (en) * 2019-09-16 2024-01-05 平安科技(深圳)有限公司 Speech recognition method and device and computer equipment
CN110782879A (en) * 2019-09-18 2020-02-11 平安科技(深圳)有限公司 Sample size-based voiceprint clustering method, device, equipment and storage medium
CN113177816A (en) * 2020-01-08 2021-07-27 阿里巴巴集团控股有限公司 Information processing method and device
CN111552832A (en) * 2020-04-01 2020-08-18 深圳壹账通智能科技有限公司 Risk user identification method and device based on voiceprint features and associated graph data
CN111477235A (en) * 2020-04-15 2020-07-31 厦门快商通科技股份有限公司 Voiceprint acquisition method, device and equipment
CN111524525B (en) * 2020-04-28 2023-06-16 平安科技(深圳)有限公司 Voiceprint recognition method, device, equipment and storage medium of original voice
CN111524525A (en) * 2020-04-28 2020-08-11 平安科技(深圳)有限公司 Original voice voiceprint recognition method, device, equipment and storage medium
CN111862933A (en) * 2020-07-20 2020-10-30 北京字节跳动网络技术有限公司 Method, apparatus, apparatus and medium for generating synthetic speech
CN112331217A (en) * 2020-11-02 2021-02-05 泰康保险集团股份有限公司 Voiceprint recognition method and device, storage medium and electronic equipment
CN112331217B (en) * 2020-11-02 2023-09-12 泰康保险集团股份有限公司 Voiceprint recognition method and device, storage medium and electronic equipment
CN112669820A (en) * 2020-12-16 2021-04-16 平安科技(深圳)有限公司 Examination cheating recognition method and device based on voice recognition and computer equipment
CN112669820B (en) * 2020-12-16 2023-08-04 平安科技(深圳)有限公司 Examination cheating recognition method and device based on voice recognition and computer equipment
CN114780787A (en) * 2022-04-01 2022-07-22 杭州半云科技有限公司 Voiceprint retrieval method, identity verification method, identity registration method and device
CN115086045A (en) * 2022-06-17 2022-09-20 海南大学 Data security protection method and device based on voiceprint forgery detection
CN115358749A (en) * 2022-08-09 2022-11-18 平安银行股份有限公司 Identity verification method, identity verification device, server and computer readable storage medium
CN118568701A (en) * 2024-07-30 2024-08-30 青岛大学 A secure authentication method based on secure computer

Also Published As

Publication number Publication date
WO2019100606A1 (en) 2019-05-31

Similar Documents

Publication Publication Date Title
CN107993071A (en) Electronic device, auth method and storage medium based on vocal print
CN107527620B (en) Electronic device, the method for authentication and computer readable storage medium
CN110556126B (en) Speech recognition method and device and computer equipment
TWI641965B (en) Method and system of authentication based on voiceprint recognition
CN107680586B (en) Far-field speech acoustic model training method and system
CN107481717B (en) Acoustic model training method and system
CN111933154B (en) Method, equipment and computer readable storage medium for recognizing fake voice
WO2019136912A1 (en) Electronic device, identity authentication method and system, and storage medium
CN109147798B (en) Speech recognition method, device, electronic equipment and readable storage medium
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
CN110265035B (en) Speaker recognition method based on deep learning
CN108630208B (en) Server, voiceprint-based identity authentication method and storage medium
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN108694952B (en) Electronic device, identity authentication method and storage medium
CN108281158A (en) Voice biopsy method, server and storage medium based on deep learning
CN111161713A (en) Voice gender identification method and device and computing equipment
CN111798047A (en) Wind control prediction method and device, electronic equipment and storage medium
CN109378014A (en) A method and system for source identification of mobile devices based on convolutional neural network
CN108650266B (en) Server, voiceprint verification method and storage medium
CN116913304A (en) Real-time voice stream noise reduction method and device, computer equipment and storage medium
CN116504276A (en) Emotion classification method and device based on artificial intelligence, computer equipment and medium
CN115223569B (en) Speaker verification method, terminal and storage medium based on deep neural network
CN114048770B (en) Automatic detection method and system for digital audio deletion and insertion tampering operation
CN113035176A (en) Voice data processing method and device, computer equipment and storage medium
CN114067834A (en) Bad preamble recognition method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180504