CN107993071A - Electronic device, voiceprint-based identity verification method and storage medium - Google Patents
Electronic device, voiceprint-based identity verification method and storage medium
- Publication number
- CN107993071A (application CN201711161344.0A)
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- voice data
- data
- voiceprint feature
- discriminant vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4014—Identity check for transactions
- G06Q20/40145—Biometric identity checks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Artificial Intelligence (AREA)
- Computer Security & Cryptography (AREA)
- Evolutionary Computation (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Collating Specific Patterns (AREA)
Abstract
The present invention relates to an electronic device, a voiceprint-based identity verification method, and a storage medium. The method includes: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data; processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and building the voiceprint feature vector of the voice data from those features; feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data; and computing the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on that distance, and generating a verification result. The present invention can improve the accuracy and efficiency of identity verification.
Description
Technical field
The present invention relates to the field of communication technology, and more particularly to an electronic device, a voiceprint-based identity verification method, and a storage medium.
Background technology
At present, the business scope of many large financial companies covers insurance, banking, investment, and other lines, and each line must communicate with the same client; identity verification of clients has therefore become an important part of ensuring business security. To meet real-time business demands, such companies generally verify client identity manually, but because the client base is huge, relying solely on manual discriminant analysis is not only time-consuming, labor-intensive, and prone to error, it also greatly increases operating costs. Some financial companies have tried automatic speech recognition to verify user identity, but the accuracy of existing automatic speech recognition approaches is low and needs improvement. How to provide a high-accuracy automatic speech recognition scheme has thus become a technical problem in urgent need of a solution.
Summary of the invention
The object of the present invention is to provide an electronic device, a voiceprint-based identity verification method, and a storage medium, aimed at improving the accuracy and efficiency of identity verification.
To achieve the above object, the present invention provides an electronic device comprising a memory and a processor connected to the memory; the memory stores a processing system runnable on the processor, and the processing system, when executed by the processor, implements the following steps:
a framing and sampling step: after receiving the voice data of a target user whose identity is to be verified, invoke a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data;
an extraction step: process the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and build the voiceprint feature vector of the voice data from those features;
a construction step: feed the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data;
a verification step: compute the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verify the user's identity based on that distance, and generate a verification result.
Preferably, the framing and sampling step specifically includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data;
convolving the two-dimensional voice data with a convolution kernel of a preset size at a first preset stride;
max-pooling (maxpooling) the convolved voice data at a second preset stride to obtain the voice sample data.
Preferably, the extraction step specifically includes:
pre-emphasizing and windowing the voice sample data, applying a Fourier transform to each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output a Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCC.
Preferably, the verification step specifically includes:
computing the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, with the angle cosine $\cos\theta = \frac{\vec{A}\cdot\vec{B}}{\|\vec{A}\|\,\|\vec{B}\|}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating a message that verification has passed;
if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
To achieve the above object, the present invention also provides a voiceprint-based identity verification method, which includes:
S1: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data;
S2: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and building the voiceprint feature vector of the voice data from those features;
S3: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data;
S4: computing the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on that distance, and generating a verification result.
Preferably, step S1 includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data;
convolving the two-dimensional voice data with a convolution kernel of a preset size at a first preset stride;
max-pooling (maxpooling) the convolved voice data at a second preset stride to obtain the voice sample data.
Preferably, step S2 includes:
pre-emphasizing and windowing the voice sample data, applying a Fourier transform to each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output a Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCC.
Preferably, step S4 includes:
computing the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, with the angle cosine $\cos\theta = \frac{\vec{A}\cdot\vec{B}}{\|\vec{A}\|\,\|\vec{B}\|}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating a message that verification has passed;
if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
Preferably, the background channel model is a Gaussian mixture model, and before step S3 the method includes:
obtaining a preset number of voice data samples, processing each voice data sample to obtain voiceprint features of the preset type, and building a corresponding voiceprint feature vector from the voiceprint features of each voice data sample;
dividing the voiceprint feature vectors into a training set of a first ratio and a validation set of a second ratio, the sum of the first ratio and the second ratio being less than or equal to 1;
training a Gaussian mixture model with the voiceprint feature vectors in the training set and, after training, validating the accuracy of the trained Gaussian mixture model against the validation set;
if the accuracy exceeds a preset threshold, ending model training and taking the trained Gaussian mixture model as the background channel model; otherwise, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
The present invention also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the above voiceprint-based identity verification method.
The beneficial effects of the present invention are as follows: when verifying a target user's identity based on voiceprint, the present invention uses a convolutional neural network model to frame and sample the voice data, which quickly and efficiently extracts the useful local data in the voice data; it then extracts voiceprint features from the voice sample data and builds a voiceprint feature vector with which the target user's identity is verified, improving the accuracy and efficiency of identity verification.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware architecture of one embodiment of the electronic device of the present invention;
Fig. 2 is a flow diagram of one embodiment of the voiceprint-based identity verification method of the present invention.
Detailed description of the embodiments
To make the purpose, technical scheme, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described here only explain the present invention and do not limit it. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the present invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are used only for description and cannot be interpreted as indicating or implying relative importance or the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with each other, but only insofar as a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be realized, that combination should be understood as absent and outside the scope of protection claimed by this application.
As shown in Fig. 1, Fig. 1 is a schematic diagram of the hardware architecture of one embodiment of the electronic device of the present invention. The electronic device 1 is an apparatus capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: one super virtual computer composed of a group of loosely coupled computers.
In the present embodiment, the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can communicate with each other through a system bus; the memory 11 stores a processing system runnable on the processor 12. It should be noted that Fig. 1 only shows the electronic device 1 with components 11-13, but it should be understood that not all of the shown components must be implemented; more or fewer components may be implemented instead.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), random access memory (RAM), static random-access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as the hard disk of the electronic device 1; in other embodiments, it may also be an external storage device of the electronic device 1, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the electronic device 1. In the present embodiment, the readable storage medium of the memory 11 is commonly used to store the operating system and various application software installed on the electronic device 1, such as the program code of the processing system in one embodiment of the present invention. In addition, the memory 11 can also be used to temporarily store various data that have been output or will be output.
The processor 12 may be, in some embodiments, a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is commonly used to control the overall operation of the electronic device 1, for example performing control and processing related to data interaction or communication with other devices. In the present embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface, and is commonly used to establish a communication connection between the electronic device 1 and other electronic equipment. In the present embodiment, the network interface 13 is mainly used to connect the electronic device 1 with other equipment and to establish a data transmission channel and communication connection, so as to receive the voice data of the target user whose identity is to be verified.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11; the at least one computer-readable instruction can be executed by the processor 12 to realize the methods of the embodiments of the present application, and can be divided into different logical modules according to the functions its parts realize.
In one embodiment, the processing system, when executed by the processor 12, implements the following steps:
Framing and sampling step: after receiving the voice data of a target user whose identity is to be verified, invoke a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data.
In the present embodiment, the voice data is collected by a voice capture device (for example, a microphone). When collecting voice data, interference from ambient noise and from the voice capture device itself should be prevented as far as possible: the voice capture device should be kept at a suitable distance from the target user, low-distortion equipment should be used where possible, mains power is preferred and the current should be kept stable, and a telephone-line sensor should be used when recording calls. Before framing and sampling, the voice data may be denoised to further reduce interference. So that the voiceprint features can be extracted, the collected voice data has a preset data length, or is longer than the preset data length.
In a preferred embodiment, the received voice data is one-dimensional voice data, and the framing and sampling step specifically includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data; convolving the two-dimensional voice data with a convolution kernel of a preset size at a first preset stride; and max-pooling (maxpooling) the convolved voice data at a second preset stride to obtain the voice sample data.
A voice signal is stationary only over short periods, so framing divides a segment of voice signal into N short-time voice signals; to avoid losing the continuity of the voice signal, adjacent voice frames share an overlap region, generally 1/2 of the frame length. After framing, each frame is treated as a stationary signal.
The convolution kernel of the preset size may be a 5*5 kernel, the first preset stride may be 1*1, and the second preset stride may be 2*2.
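As an illustration only, the following is a minimal NumPy sketch of this framing-plus-sampling pipeline; the frame length, overlap, and kernel values are assumptions (only the 5*5 kernel and the 1*1 and 2*2 strides come from the text above), and a real model would use trained CNN weights rather than random ones.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=200):
    # Split 1-D audio into overlapping frames (50% overlap): frames as rows,
    # intra-frame samples as columns -> two-dimensional voice data.
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i*hop : i*hop + frame_len] for i in range(n_frames)])

def conv2d_valid(x, kernel, stride=(1, 1)):
    # Plain "valid" 2-D convolution at the given stride.
    kh, kw = kernel.shape
    sh, sw = stride
    oh = (x.shape[0] - kh) // sh + 1
    ow = (x.shape[1] - kw) // sw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*sh:i*sh+kh, j*sw:j*sw+kw] * kernel)
    return out

def max_pool(x, size=(2, 2)):
    # Max pooling with stride equal to the pool size (the 2*2 second stride).
    h, w = x.shape[0] // size[0], x.shape[1] // size[1]
    x = x[:h*size[0], :w*size[1]]
    return x.reshape(h, size[0], w, size[1]).max(axis=(1, 3))

signal = np.random.randn(16000)            # 1 s of audio at 16 kHz (dummy data)
two_d = frame_signal(signal)               # frames as rows, samples as columns
kernel = np.random.randn(5, 5)             # 5*5 kernel; real weights come from training
sampled = max_pool(conv2d_valid(two_d, kernel, stride=(1, 1)), size=(2, 2))
```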
Extraction step: process the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and build the voiceprint feature vector of the voice data from those features.
Voiceprint features come in many types, such as wideband voiceprint, narrowband voiceprint, and amplitude voiceprint. In the present embodiment, the preset-type voiceprint feature is preferably the Mel-frequency cepstral coefficients (MFCC) of the voice sample data, and the predetermined filter is a Mel filter. When building the corresponding voiceprint feature vector, the voiceprint features of the voice sample data form a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
Construction step: feed the voiceprint feature vector into the pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data.
In the present embodiment, the background channel model is preferably a Gaussian mixture model; the voiceprint feature vector is evaluated with the Gaussian mixture model to derive the corresponding current voiceprint discriminant vector (i.e., the i-vector).
Specifically, the construction step includes:
1) Selecting Gaussian components: first, using the parameters of the universal background channel model, the log-likelihood of every frame of data under each Gaussian component is computed; the columns of the log-likelihood matrix are sorted in parallel and the top-N Gaussian components are chosen, finally yielding for every frame a matrix of values under the Gaussian mixture model:
$$\mathrm{Loglike} = E(X)\,D(X)^{-1}X^{T} - 0.5\,D(X)^{-1}(X^{.2})^{T}$$
where Loglike is the log-likelihood matrix, E(X) is the mean matrix trained from the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X^{.2} denotes squaring every element of the matrix.
The per-component log-likelihood formula is:
$$\mathrm{loglikes}_{i} = C_{i} + E_{i}\,\mathrm{Cov}_{i}^{-1}X_{i} - X_{i}^{T}X_{i}\,\mathrm{Cov}_{i}^{-1}$$
where loglikes_i is the i-th row vector of the log-likelihood matrix, C_i is the constant term of the i-th component, E_i is the mean matrix of the i-th component, Cov_i is the covariance matrix of the i-th component, and X_i is the i-th frame of data.
2) Computing posterior probabilities: for each frame of data X, the product X X^T is computed, giving a symmetric matrix that can be reduced to a lower triangular matrix whose elements are arranged in order into one row, so that each frame becomes a vector with as many dimensions as the lower triangular matrix has elements; the vectors of all frames are combined into a new data matrix. Meanwhile, the covariance matrices used for computing probabilities in the universal background model are likewise reduced to lower triangular matrices and assembled into a matrix analogous to the new data matrix. Using the mean matrix and covariance matrix of the universal background channel model, the log-likelihood of every frame of data under its selected Gaussian components is computed, a softmax regression is then performed, and finally a normalization operation yields each frame's posterior probability distribution over the Gaussian mixture model; the per-frame probability distribution vectors form the probability matrix.
3) Extracting the current voiceprint discriminant vector: the first-order and second-order coefficients are computed first. The first-order coefficient can be obtained by summing the probability matrix over its rows:
$$\mathrm{Gamma}_{i} = \sum_{j}\mathrm{loglikes}_{ji}$$
where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_{ji} is the element in row j, column i of the log-likelihood matrix.
The second-order coefficient can be obtained by multiplying the transpose of the probability matrix by the data matrix:
$$X = \mathrm{Loglike}^{T}\cdot\mathrm{feats}$$
where X is the second-order coefficient matrix, Loglike is the log-likelihood matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients are computed, the first-order and second-order terms are computed in parallel, and the current voiceprint discriminant vector is then computed from these terms.
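For intuition only, here is a rough NumPy sketch of the per-frame likelihood, posterior (softmax), and coefficient accumulation described above, written in conventional i-vector notation with diagonal covariances assumed; it is not the patent's exact matrix layout.

```python
import numpy as np

def frame_statistics(feats, weights, means, variances):
    """feats: (T, D) feature matrix; weights: (K,); means, variances: (K, D).
    Returns per-frame posteriors plus zeroth/first-order statistics."""
    # Per-frame log-likelihood of each Gaussian component (diagonal covariance).
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)      # (K,)
    diff = feats[:, None, :] - means[None, :, :]                     # (T, K, D)
    loglikes = log_norm + np.log(weights) - 0.5 * ((diff**2) / variances).sum(axis=2)
    # Softmax across components -> posterior probability matrix (T, K).
    loglikes -= loglikes.max(axis=1, keepdims=True)
    post = np.exp(loglikes)
    post /= post.sum(axis=1, keepdims=True)
    # Column sums give per-component occupancies; the transposed probability
    # matrix times the data matrix gives the higher-order statistics.
    gamma = post.sum(axis=0)        # (K,)
    first = post.T @ feats          # (K, D)
    return post, gamma, first
```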
Preferably, the process of training the Gaussian mixture model includes:
obtaining a preset number of voice data samples (for example, 100,000), processing each voice data sample to obtain voiceprint features of the preset type, and building a corresponding voiceprint feature vector from the voiceprint features of each voice data sample;
dividing the voiceprint feature vectors into a training set of a first ratio (for example, 0.75) and a validation set of a second ratio (for example, 0.25), the sum of the first ratio and the second ratio being less than or equal to 1;
training a Gaussian mixture model with the voiceprint feature vectors in the training set and, after training, validating the accuracy of the trained Gaussian mixture model against the validation set;
if the accuracy exceeds a preset threshold, ending model training and taking the trained Gaussian mixture model as the aforementioned background channel model; otherwise, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
When the Gaussian mixture model is trained with the voiceprint feature vectors in the training set, the likelihood of an extracted D-dimensional voiceprint feature under K Gaussian components can be expressed as:
$$P(x) = \sum_{k=1}^{K} w_{k}\,p(x \mid k)$$
where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of each Gaussian component, p(x|k) is the probability that the sample is generated by the k-th Gaussian component, and K is the number of Gaussian components.
The parameters of the whole Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i is its mean, and Σ_i is its covariance. The Gaussian mixture model can be trained with the unsupervised EM algorithm, using maximum likelihood estimation as the objective, i.e., choosing parameters that maximize the log-likelihood function. After training, the weight vectors, constant vectors, N covariance matrices, and the matrices of means multiplied by covariances of the Gaussian mixture model are obtained; these constitute a trained Gaussian mixture model.
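A minimal sketch of this train/validate loop using scikit-learn's GaussianMixture is given below; the 0.75/0.25 split matches the example ratios above, while the component count and the accuracy proxy (held-out average log-likelihood against a threshold) are illustrative assumptions, since the text does not fix the accuracy metric.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

def train_background_model(features, n_components=64, threshold=-45.0):
    # Split voiceprint feature vectors into a 0.75 training set and 0.25 validation set.
    train, val = train_test_split(features, train_size=0.75, test_size=0.25)
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
    gmm.fit(train)                      # EM training with maximum-likelihood objective
    score = gmm.score(val)              # mean log-likelihood on the validation set
    if score > threshold:               # proxy for "accuracy exceeds preset threshold"
        return gmm                      # use as the background channel model
    return None                         # otherwise: gather more samples and retrain

feats = np.random.randn(5000, 13)       # dummy MFCC vectors; real data replaces this
model = train_background_model(feats)
```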
The background channel model trained in advance in the present embodiment is obtained by mining and comparative training over a large amount of voice data. While preserving the user's voiceprint features to the greatest extent, this model can accurately characterize the background voiceprint features present when the user speaks, remove them during recognition, and extract the inherent features of the user's voice, which can significantly improve the accuracy and efficiency of user identity verification.
Verification step: compute the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verify the user's identity based on that distance, and generate a verification result.
In the present embodiment, there are many distances between vectors, including cosine distance and Euclidean distance; preferably, the spatial distance of the present embodiment is the cosine distance, which uses the cosine of the angle between two vectors in vector space as a measure of the difference between two individuals.
The standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance; it is stored together with the identification information of its corresponding user and can accurately represent that user's identity. Before computing the spatial distance, the stored standard voiceprint discriminant vector is obtained according to the identification information provided by the user.
When the computed spatial distance is less than or equal to the preset distance threshold, verification passes; otherwise, verification fails.
Compared with the prior art, when verifying a target user's identity based on voiceprint, the present embodiment uses a convolutional neural network model to frame and sample the voice data, which quickly and efficiently extracts the useful local data in the voice data; it then extracts voiceprint features from the voice sample data and builds a voiceprint feature vector with which the target user's identity is verified, improving the accuracy and efficiency of identity verification. In addition, the present embodiment makes full use of the voiceprint features related to the vocal tract in the voice; such voiceprint features impose no restriction on the spoken text, so recognition and verification enjoy considerable flexibility.
In a preferred embodiment, on the basis of the embodiment of Fig. 1 above, the extraction step includes:
pre-emphasizing and windowing the voice sample data, applying a Fourier transform to each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output a Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCC.
In the present embodiment, pre-emphasis is really a high-pass filtering that filters out low-frequency data so that the high-frequency characteristics of the voice data stand out. Specifically, the transfer function of the high-pass filter is $H(z) = 1 - \alpha z^{-1}$, where z denotes the voice data and α is a constant coefficient, preferably α = 0.97. Because the framed voice sample data deviates to some extent from the original voice, the voice sample data also needs to be windowed.
In the present embodiment, the cepstral analysis on the Mel spectrum consists, for example, of taking the logarithm and applying an inverse transform; the inverse transform is usually realized by the discrete cosine transform (DCT), and the 2nd through 13th DCT coefficients are taken as the MFCC. The MFCC is the voiceprint feature of that frame of voice sample data; the per-frame MFCCs form a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
The present embodiment takes the MFCC of the voice sample data to form the corresponding voiceprint feature vector; because Mel-spaced frequency bands approximate the human auditory system more closely than the linearly spaced bands of the normal cepstrum, the accuracy of identity verification can be improved.
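For illustration, a compact NumPy sketch of this MFCC pipeline follows (pre-emphasis with α = 0.97, Hamming windowing, FFT, a Mel filterbank, logarithm, DCT, coefficients 2-13); the filterbank size, FFT length, and sample rate are assumptions, not values fixed by the text.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(frames, sr=16000, n_fft=512, n_mels=26, alpha=0.97):
    """frames: (T, frame_len) framed audio. Returns the (T, 12) MFCC feature matrix."""
    # Pre-emphasis H(z) = 1 - alpha * z^-1, applied within each frame.
    emph = np.concatenate([frames[:, :1], frames[:, 1:] - alpha * frames[:, :-1]], axis=1)
    windowed = emph * np.hamming(frames.shape[1])          # windowing
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2      # spectrum per window
    # Triangular Mel filterbank between 0 Hz and sr/2.
    mel = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz = 700 * (10 ** (mel / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        fbank[m - 1, bins[m - 1]:bins[m]] = np.linspace(0, 1, bins[m] - bins[m - 1], endpoint=False)
        fbank[m - 1, bins[m]:bins[m + 1]] = np.linspace(1, 0, bins[m + 1] - bins[m], endpoint=False)
    mel_spec = np.log(power @ fbank.T + 1e-10)             # Mel spectrum, then log
    # Cepstral analysis via DCT; keep the 2nd-13th coefficients as the MFCC.
    return dct(mel_spec, type=2, axis=1, norm='ortho')[:, 1:13]
```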
In a preferred embodiment, on the basis of the embodiment of Fig. 1 above, the verification step specifically includes:
computing the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, with the angle cosine $\cos\theta = \frac{\vec{A}\cdot\vec{B}}{\|\vec{A}\|\,\|\vec{B}\|}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector; if the cosine distance is less than or equal to the preset distance threshold, generating a message that verification has passed; if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
In the present embodiment, the identification information of the target user can be carried when storing the target user's standard voiceprint discriminant vector. When verifying the user's identity, the corresponding standard voiceprint discriminant vector is obtained by matching against the identification information associated with the current voiceprint discriminant vector, and the cosine distance between the current voiceprint discriminant vector and the matched standard voiceprint discriminant vector is computed; verifying the target user's identity by cosine distance improves the accuracy of identity verification.
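A minimal sketch of this verification step follows; the threshold value is an assumption for illustration, and "cosine distance" is taken here as one minus the angle cosine so that smaller values mean more similar, matching the pass condition above.

```python
import numpy as np

def verify(current_vec, standard_vec, threshold=0.3):
    # Angle cosine between the current and standard voiceprint discriminant vectors.
    cos_theta = np.dot(current_vec, standard_vec) / (
        np.linalg.norm(current_vec) * np.linalg.norm(standard_vec))
    distance = 1.0 - cos_theta          # smaller distance = more similar voiceprints
    if distance <= threshold:
        return "verification passed"
    return "verification failed"

a = np.random.randn(400)                # stored standard i-vector (dummy)
b = a + 0.05 * np.random.randn(400)     # current i-vector from the same speaker (dummy)
print(verify(b, a))
```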
As shown in Fig. 2, Fig. 2 is a flow diagram of one embodiment of the voiceprint-based identity verification method of the present invention, which comprises the following steps:
Step S1: after receiving the voice data of a target user whose identity is to be verified, invoke a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data.
In the present embodiment, the voice data is collected by a voice capture device (for example, a microphone). When collecting voice data, interference from ambient noise and from the voice capture device itself should be prevented as far as possible: the voice capture device should be kept at a suitable distance from the target user, low-distortion equipment should be used where possible, mains power is preferred and the current should be kept stable, and a telephone-line sensor should be used when recording calls. Before framing and sampling, the voice data may be denoised to further reduce interference. So that the voiceprint features can be extracted, the collected voice data has a preset data length, or is longer than the preset data length.
In a preferred embodiment, the received voice data is one-dimensional voice data, and the framing and sampling step specifically includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data; convolving the two-dimensional voice data with a convolution kernel of a preset size at a first preset stride; and max-pooling (maxpooling) the convolved voice data at a second preset stride to obtain the voice sample data.
A voice signal is stationary only over short periods, so framing divides a segment of voice signal into N short-time voice signals; to avoid losing the continuity of the voice signal, adjacent voice frames share an overlap region, generally 1/2 of the frame length. After framing, each frame is treated as a stationary signal.
The convolution kernel of the preset size may be a 5*5 kernel, the first preset stride may be 1*1, and the second preset stride may be 2*2.
Step S2: process the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and build the voiceprint feature vector of the voice data from those features.
Voiceprint features come in many types, such as wideband voiceprint, narrowband voiceprint, and amplitude voiceprint. In the present embodiment, the preset-type voiceprint feature is preferably the Mel-frequency cepstral coefficients (MFCC) of the voice sample data, and the predetermined filter is a Mel filter. When building the corresponding voiceprint feature vector, the voiceprint features of the voice sample data form a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
Step S3: feed the voiceprint feature vector into the pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data.
In the present embodiment, the background channel model is preferably a Gaussian mixture model; the voiceprint feature vector is evaluated with the Gaussian mixture model to derive the corresponding current voiceprint discriminant vector (i.e., the i-vector).
Specifically, this step includes:
1) Selecting Gaussian components: first, using the parameters of the universal background channel model, the log-likelihood of every frame of data under each Gaussian component is computed; the columns of the log-likelihood matrix are sorted in parallel and the top-N Gaussian components are chosen, finally yielding for every frame a matrix of values under the Gaussian mixture model:
$$\mathrm{Loglike} = E(X)\,D(X)^{-1}X^{T} - 0.5\,D(X)^{-1}(X^{.2})^{T}$$
where Loglike is the log-likelihood matrix, E(X) is the mean matrix trained from the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X^{.2} denotes squaring every element of the matrix.
The per-component log-likelihood formula is:
$$\mathrm{loglikes}_{i} = C_{i} + E_{i}\,\mathrm{Cov}_{i}^{-1}X_{i} - X_{i}^{T}X_{i}\,\mathrm{Cov}_{i}^{-1}$$
where loglikes_i is the i-th row vector of the log-likelihood matrix, C_i is the constant term of the i-th component, E_i is the mean matrix of the i-th component, Cov_i is the covariance matrix of the i-th component, and X_i is the i-th frame of data.
2) Computing posterior probabilities: for each frame of data X, the product X X^T is computed, giving a symmetric matrix that can be reduced to a lower triangular matrix whose elements are arranged in order into one row, so that each frame becomes a vector with as many dimensions as the lower triangular matrix has elements; the vectors of all frames are combined into a new data matrix. Meanwhile, the covariance matrices used for computing probabilities in the universal background model are likewise reduced to lower triangular matrices and assembled into a matrix analogous to the new data matrix. Using the mean matrix and covariance matrix of the universal background channel model, the log-likelihood of every frame of data under its selected Gaussian components is computed, a softmax regression is then performed, and finally a normalization operation yields each frame's posterior probability distribution over the Gaussian mixture model; the per-frame probability distribution vectors form the probability matrix.
3) Extracting the current voiceprint discriminant vector: the first-order and second-order coefficients are computed first. The first-order coefficient can be obtained by summing the probability matrix over its rows:
$$\mathrm{Gamma}_{i} = \sum_{j}\mathrm{loglikes}_{ji}$$
where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_{ji} is the element in row j, column i of the log-likelihood matrix.
The second-order coefficient can be obtained by multiplying the transpose of the probability matrix by the data matrix:
$$X = \mathrm{Loglike}^{T}\cdot\mathrm{feats}$$
where X is the second-order coefficient matrix, Loglike is the log-likelihood matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients are computed, the first-order and second-order terms are computed in parallel, and the current voiceprint discriminant vector is then computed from these terms.
Preferably, the process of training the Gaussian mixture model includes:
obtaining a preset number of voice data samples (for example, 100,000), processing each voice data sample to obtain voiceprint features of the preset type, and building a corresponding voiceprint feature vector from the voiceprint features of each voice data sample;
dividing the voiceprint feature vectors into a training set of a first ratio (for example, 0.75) and a validation set of a second ratio (for example, 0.25), the sum of the first ratio and the second ratio being less than or equal to 1;
training a Gaussian mixture model with the voiceprint feature vectors in the training set and, after training, validating the accuracy of the trained Gaussian mixture model against the validation set;
if the accuracy exceeds a preset threshold, ending model training and taking the trained Gaussian mixture model as the aforementioned background channel model; otherwise, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
When the Gaussian mixture model is trained with the voiceprint feature vectors in the training set, the likelihood of an extracted D-dimensional voiceprint feature under K Gaussian components can be expressed as:
$$P(x) = \sum_{k=1}^{K} w_{k}\,p(x \mid k)$$
where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of each Gaussian component, p(x|k) is the probability that the sample is generated by the k-th Gaussian component, and K is the number of Gaussian components.
The parameters of the whole Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i is its mean, and Σ_i is its covariance. The Gaussian mixture model can be trained with the unsupervised EM algorithm, using maximum likelihood estimation as the objective, i.e., choosing parameters that maximize the log-likelihood function. After training, the weight vectors, constant vectors, N covariance matrices, and the matrices of means multiplied by covariances of the Gaussian mixture model are obtained; these constitute a trained Gaussian mixture model.
The background channel model trained in advance in the present embodiment is obtained by mining and comparative training over a large amount of voice data. While preserving the user's voiceprint features to the greatest extent, this model can accurately characterize the background voiceprint features present when the user speaks, remove them during recognition, and extract the inherent features of the user's voice, which can significantly improve the accuracy and efficiency of user identity verification.
Step S4: compute the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verify the user's identity based on that distance, and generate a verification result.
In the present embodiment, there are many distances between vectors, including cosine distance and Euclidean distance; preferably, the spatial distance of the present embodiment is the cosine distance, which uses the cosine of the angle between two vectors in vector space as a measure of the difference between two individuals.
The standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance; it is stored together with the identification information of its corresponding user and can accurately represent that user's identity. Before computing the spatial distance, the stored standard voiceprint discriminant vector is obtained according to the identification information provided by the user.
When the computed spatial distance is less than or equal to the preset distance threshold, verification passes; otherwise, verification fails.
Compared with the prior art, when verifying a target user's identity based on voiceprint, the present embodiment uses a convolutional neural network model to frame and sample the voice data, which quickly and efficiently extracts the useful local data in the voice data; it then extracts voiceprint features from the voice sample data and builds a voiceprint feature vector with which the target user's identity is verified, improving the accuracy and efficiency of identity verification. In addition, the present embodiment makes full use of the voiceprint features related to the vocal tract in the voice; such voiceprint features impose no restriction on the spoken text, so recognition and verification enjoy considerable flexibility.
In a preferred embodiment, on the basis of the embodiment of Fig. 2 above, step S2 includes:
pre-emphasizing and windowing the voice sample data, applying a Fourier transform to each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output a Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCC.
In the present embodiment, pre-emphasis is really a high-pass filtering that filters out low-frequency data so that the high-frequency characteristics of the voice data stand out. Specifically, the transfer function of the high-pass filter is $H(z) = 1 - \alpha z^{-1}$, where z denotes the voice data and α is a constant coefficient, preferably α = 0.97. Because the framed voice sample data deviates to some extent from the original voice, the voice sample data also needs to be windowed.
In the present embodiment, the cepstral analysis on the Mel spectrum consists, for example, of taking the logarithm and applying an inverse transform; the inverse transform is usually realized by the discrete cosine transform (DCT), and the 2nd through 13th DCT coefficients are taken as the MFCC. The MFCC is the voiceprint feature of that frame of voice sample data; the per-frame MFCCs form a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
The present embodiment takes the MFCC of the voice sample data to form the corresponding voiceprint feature vector; because Mel-spaced frequency bands approximate the human auditory system more closely than the linearly spaced bands of the normal cepstrum, the accuracy of identity verification can be improved.
In a preferred embodiment, on the basis of the embodiment of Fig. 2 above, step S4 specifically includes:
computing the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, with the angle cosine $\cos\theta = \frac{\vec{A}\cdot\vec{B}}{\|\vec{A}\|\,\|\vec{B}\|}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector; if the cosine distance is less than or equal to the preset distance threshold, generating a message that verification has passed; if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
In the present embodiment, the identification information of the target user can be carried when storing the target user's standard voiceprint discriminant vector. When verifying the user's identity, the corresponding standard voiceprint discriminant vector is obtained by matching against the identification information associated with the current voiceprint discriminant vector, and the cosine distance between the current voiceprint discriminant vector and the matched standard voiceprint discriminant vector is computed; verifying the target user's identity by cosine distance improves the accuracy of identity verification.
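Putting steps S1-S4 together, a hypothetical end-to-end flow might look like the sketch below; frame_signal, mfcc, frame_statistics, and verify are the illustrative helpers sketched earlier in this description, the ubm parameter dictionary is an assumed structure, and extract_ivector is a deliberately simplified stand-in for the step-S3 discriminant-vector construction, whose full algebra is not reproduced here.

```python
import numpy as np

def extract_ivector(gamma, first):
    # Placeholder for the step-S3 projection; a real system would apply the
    # trained total-variability matrix here. We just flatten and normalize
    # the first-order statistics for illustration.
    return first.flatten() / (gamma.sum() + 1e-10)

def authenticate(raw_audio, stored_standard_vec, ubm):
    frames = frame_signal(raw_audio)                 # S1: framing (CNN sampling omitted)
    feats = mfcc(frames)                             # S2: voiceprint feature vectors
    post, gamma, first = frame_statistics(           # S3: statistics for the i-vector
        feats, ubm['weights'], ubm['means'], ubm['variances'])
    current_vec = extract_ivector(gamma, first)      # simplified discriminant vector
    return verify(current_vec, stored_standard_vec)  # S4: cosine-distance decision
```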
The present invention also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the above voiceprint-based identity verification method.
The numbering of the above embodiments of the present invention is for description only and does not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be realized by software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical scheme of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and including several instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Every equivalent structure or equivalent flow transformation made using the contents of the specification and accompanying drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.
Claims (10)
1. An electronic device, characterised in that the electronic device comprises a memory and a processor connected to the memory, the memory storing a processing system runnable on the processor, and the processing system, when executed by the processor, implementing the following steps:
a framing and sampling step: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data;
an extraction step: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and building the voiceprint feature vector of the voice data from those features;
a construction step: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data;
a verification step: computing the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on that distance, and generating a verification result.
2. The electronic device according to claim 1, characterised in that the framing and sampling step specifically includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data;
convolving the two-dimensional voice data with a convolution kernel of a preset size at a first preset stride;
max-pooling (maxpooling) the convolved voice data at a second preset stride to obtain the voice sample data.
3. The electronic device according to claim 1 or 2, characterised in that the extraction step specifically includes:
pre-emphasizing and windowing the voice sample data, applying a Fourier transform to each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output a Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCC.
4. The electronic device according to claim 1 or 2, characterised in that the verification step specifically includes:
computing the cosine distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, with the angle cosine $\cos\theta = \frac{\vec{A}\cdot\vec{B}}{\|\vec{A}\|\,\|\vec{B}\|}$, where $\vec{A}$ is the standard voiceprint discriminant vector and $\vec{B}$ is the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating a message that verification has passed;
if the cosine distance is greater than the preset distance threshold, generating a message that verification has failed.
5. A voiceprint-based identity verification method, characterised in that the voiceprint-based identity verification method includes:
S1: after receiving the voice data of a target user whose identity is to be verified, invoking a predetermined convolutional neural network (CNN) model to frame and sample the voice data, obtaining voice sample data;
S2: processing the voice sample data with a predetermined filter to extract voiceprint features of a preset type, and building the voiceprint feature vector of the voice data from those features;
S3: feeding the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discriminant vector of the voice data;
S4: computing the spatial distance between the current voiceprint discriminant vector and the target user's pre-stored standard voiceprint discriminant vector, verifying the user's identity based on that distance, and generating a verification result.
6. The voiceprint-based identity verification method according to claim 5, characterised in that step S1 includes:
framing the voice data, and arranging the framed voice data with frames as rows and intra-frame data as columns to obtain the two-dimensional voice data corresponding to the voice data;
convolving the two-dimensional voice data with a convolution kernel of a preset size at a first preset stride;
max-pooling (maxpooling) the convolved voice data at a second preset stride to obtain the voice sample data.
7. The voiceprint-based identity verification method according to claim 5 or 6, characterised in that step S2 includes:
pre-emphasizing and windowing the voice sample data, applying a Fourier transform to each window to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter to output a Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCC.
8. The voiceprint-based identity verification method according to claim 5 or 6, characterized in that step S4 comprises:
calculating the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user, $d(\vec{A},\vec{B}) = 1 - \frac{\vec{A}\cdot\vec{B}}{\lVert\vec{A}\rVert\,\lVert\vec{B}\rVert}$, where $\vec{A}$ denotes the standard voiceprint discriminant vector and $\vec{B}$ denotes the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;
if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.
9. The voiceprint-based identity verification method according to claim 5 or 6, characterized in that the background channel model is a Gaussian mixture model, and before step S3 the method further comprises:
obtaining a preset number of voice data samples, processing each voice data sample to obtain voiceprint features of the preset type, and constructing the corresponding voiceprint feature vector based on the voiceprint features of each voice data sample;
dividing the voiceprint feature vectors into a training set of a first ratio and a validation set of a second ratio, the sum of the first ratio and the second ratio being less than or equal to 1;
training a Gaussian mixture model using the voiceprint feature vectors in the training set, and after training is complete, verifying the accuracy of the trained Gaussian mixture model using the validation set;
if the accuracy is greater than a preset threshold, ending the model training and using the trained Gaussian mixture model as the background channel model; or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and re-training based on the increased voice data samples.
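As a concrete reading of this training procedure, a hedged sketch using scikit-learn's GaussianMixture follows. The 0.8/0.2 split, the 16-component mixture and the accuracy threshold are illustrative assumptions, and since the claim does not define how a GMM's validation accuracy is measured, the sketch substitutes a log-likelihood proxy.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

def train_background_model(feature_vectors: np.ndarray, accuracy_threshold: float = 0.9):
    # First ratio + second ratio <= 1; an 0.8 / 0.2 split is an assumed choice.
    train_x, val_x = train_test_split(feature_vectors, test_size=0.2)
    gmm = GaussianMixture(n_components=16).fit(train_x)  # component count assumed

    # Proxy for the claim's validation "accuracy": the share of validation vectors
    # the model scores at least as well as the median training vector.
    baseline = np.median(gmm.score_samples(train_x))
    accuracy = float(np.mean(gmm.score_samples(val_x) >= baseline))

    if accuracy > accuracy_threshold:
        return gmm   # adopt as the background channel model
    return None      # caller should add voice data samples and retrain
```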
10. A computer-readable storage medium, characterized in that a processing system is stored on the computer-readable storage medium, and when the processing system is executed by a processor, the steps of the voiceprint-based identity verification method according to any one of claims 5 to 9 are implemented.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711161344.0A CN107993071A (en) | 2017-11-21 | 2017-11-21 | Electronic device, auth method and storage medium based on vocal print |
| PCT/CN2018/076113 WO2019100606A1 (en) | 2017-11-21 | 2018-02-10 | Electronic device, voiceprint-based identity verification method and system, and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711161344.0A CN107993071A (en) | 2017-11-21 | 2017-11-21 | Electronic device, auth method and storage medium based on vocal print |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN107993071A true CN107993071A (en) | 2018-05-04 |
Family
ID=62031709
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201711161344.0A Pending CN107993071A (en) | 2017-11-21 | 2017-11-21 | Electronic device, auth method and storage medium based on vocal print |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN107993071A (en) |
| WO (1) | WO2019100606A1 (en) |
Cited By (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108650266A (en) * | 2018-05-14 | 2018-10-12 | 平安科技(深圳)有限公司 | Server, the method for voice print verification and storage medium |
| CN108648759A (en) * | 2018-05-14 | 2018-10-12 | 华南理工大学 | A kind of method for recognizing sound-groove that text is unrelated |
| CN108806696A (en) * | 2018-05-08 | 2018-11-13 | 平安科技(深圳)有限公司 | Establish method, apparatus, computer equipment and the storage medium of sound-groove model |
| CN110265037A (en) * | 2019-06-13 | 2019-09-20 | 中信银行股份有限公司 | Auth method, device, electronic equipment and computer readable storage medium |
| CN110556126A (en) * | 2019-09-16 | 2019-12-10 | 平安科技(深圳)有限公司 | Voice recognition method and device and computer equipment |
| WO2019237518A1 (en) * | 2018-06-11 | 2019-12-19 | 平安科技(深圳)有限公司 | Model library establishment method, voice recognition method and apparatus, and device and medium |
| CN110634492A (en) * | 2019-06-13 | 2019-12-31 | 中信银行股份有限公司 | Login verification method and device, electronic equipment and computer readable storage medium |
| CN110782879A (en) * | 2019-09-18 | 2020-02-11 | 平安科技(深圳)有限公司 | Sample size-based voiceprint clustering method, device, equipment and storage medium |
| WO2020073519A1 (en) * | 2018-10-11 | 2020-04-16 | 平安科技(深圳)有限公司 | Voiceprint verification method and apparatus, computer device and storage medium |
| CN111477235A (en) * | 2020-04-15 | 2020-07-31 | 厦门快商通科技股份有限公司 | Voiceprint acquisition method, device and equipment |
| CN111524525A (en) * | 2020-04-28 | 2020-08-11 | 平安科技(深圳)有限公司 | Original voice voiceprint recognition method, device, equipment and storage medium |
| CN111552832A (en) * | 2020-04-01 | 2020-08-18 | 深圳壹账通智能科技有限公司 | Risk user identification method and device based on voiceprint features and associated graph data |
| CN111862933A (en) * | 2020-07-20 | 2020-10-30 | 北京字节跳动网络技术有限公司 | Method, apparatus, apparatus and medium for generating synthetic speech |
| CN112331217A (en) * | 2020-11-02 | 2021-02-05 | 泰康保险集团股份有限公司 | Voiceprint recognition method and device, storage medium and electronic equipment |
| CN112669820A (en) * | 2020-12-16 | 2021-04-16 | 平安科技(深圳)有限公司 | Examination cheating recognition method and device based on voice recognition and computer equipment |
| CN113177816A (en) * | 2020-01-08 | 2021-07-27 | 阿里巴巴集团控股有限公司 | Information processing method and device |
| CN114780787A (en) * | 2022-04-01 | 2022-07-22 | 杭州半云科技有限公司 | Voiceprint retrieval method, identity verification method, identity registration method and device |
| CN115086045A (en) * | 2022-06-17 | 2022-09-20 | 海南大学 | Data security protection method and device based on voiceprint forgery detection |
| CN115358749A (en) * | 2022-08-09 | 2022-11-18 | 平安银行股份有限公司 | Identity verification method, identity verification device, server and computer readable storage medium |
| CN118568701A (en) * | 2024-07-30 | 2024-08-30 | 青岛大学 | A secure authentication method based on secure computer |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101197130A (en) * | 2006-12-07 | 2008-06-11 | 华为技术有限公司 | Voice activity detection method and voice activity detector |
| CN101894566A (en) * | 2010-07-23 | 2010-11-24 | 北京理工大学 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
| CN101923855A (en) * | 2009-06-17 | 2010-12-22 | 复旦大学 | Text-independent Voiceprint Recognition System |
| CN103310273A (en) * | 2013-06-26 | 2013-09-18 | 南京邮电大学 | Method for articulating Chinese vowels with tones and based on DIVA model |
| CN106682574A (en) * | 2016-11-18 | 2017-05-17 | 哈尔滨工程大学 | One-dimensional deep convolution network underwater multi-target recognition method |
| CN106847302A (en) * | 2017-02-17 | 2017-06-13 | 大连理工大学 | Single-channel Mixed Speech Separation Method in Time Domain Based on Convolutional Neural Network |
| CN107068154A (en) * | 2017-03-13 | 2017-08-18 | 平安科技(深圳)有限公司 | The method and system of authentication based on Application on Voiceprint Recognition |
| CN107240397A (en) * | 2017-08-14 | 2017-10-10 | 广东工业大学 | A kind of smart lock and its audio recognition method and system based on Application on Voiceprint Recognition |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106205606A (en) * | 2016-08-15 | 2016-12-07 | 南京邮电大学 | A kind of dynamic positioning and monitoring method based on speech recognition and system |
| CN106847309A (en) * | 2017-01-09 | 2017-06-13 | 华南理工大学 | A kind of speech-emotion recognition method |
- 2017-11-21: CN CN201711161344.0A patent/CN107993071A/en, active (Pending)
- 2018-02-10: WO PCT/CN2018/076113 patent/WO2019100606A1/en, not_active (Ceased)
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101197130A (en) * | 2006-12-07 | 2008-06-11 | 华为技术有限公司 | Voice activity detection method and voice activity detector |
| CN101923855A (en) * | 2009-06-17 | 2010-12-22 | 复旦大学 | Text-independent Voiceprint Recognition System |
| CN101894566A (en) * | 2010-07-23 | 2010-11-24 | 北京理工大学 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
| CN103310273A (en) * | 2013-06-26 | 2013-09-18 | 南京邮电大学 | Method for articulating Chinese vowels with tones and based on DIVA model |
| CN106682574A (en) * | 2016-11-18 | 2017-05-17 | 哈尔滨工程大学 | One-dimensional deep convolution network underwater multi-target recognition method |
| CN106847302A (en) * | 2017-02-17 | 2017-06-13 | 大连理工大学 | Single-channel Mixed Speech Separation Method in Time Domain Based on Convolutional Neural Network |
| CN107068154A (en) * | 2017-03-13 | 2017-08-18 | 平安科技(深圳)有限公司 | The method and system of authentication based on Application on Voiceprint Recognition |
| CN107240397A (en) * | 2017-08-14 | 2017-10-10 | 广东工业大学 | A kind of smart lock and its audio recognition method and system based on Application on Voiceprint Recognition |
Non-Patent Citations (2)
| Title |
|---|
| 胡青: "Research on the Application of Convolutional Neural Networks in Voiceprint Recognition" (卷积神经网络在声纹识别中的应用研究), China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) * |
| 胡青等: "Speaker Recognition Algorithm Based on Convolutional Neural Networks" (基于卷积神经网络的说话人识别算法), Netinfo Security (《信息网络安全》) * |
Cited By (28)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108806696A (en) * | 2018-05-08 | 2018-11-13 | 平安科技(深圳)有限公司 | Establish method, apparatus, computer equipment and the storage medium of sound-groove model |
| CN108806696B (en) * | 2018-05-08 | 2020-06-05 | 平安科技(深圳)有限公司 | Method, apparatus, computer equipment and storage medium for establishing voiceprint model |
| CN108650266B (en) * | 2018-05-14 | 2020-02-18 | 平安科技(深圳)有限公司 | Server, voiceprint verification method and storage medium |
| CN108648759A (en) * | 2018-05-14 | 2018-10-12 | 华南理工大学 | A kind of method for recognizing sound-groove that text is unrelated |
| CN108650266A (en) * | 2018-05-14 | 2018-10-12 | 平安科技(深圳)有限公司 | Server, the method for voice print verification and storage medium |
| WO2019218512A1 (en) * | 2018-05-14 | 2019-11-21 | 平安科技(深圳)有限公司 | Server, voiceprint verification method, and storage medium |
| WO2019237518A1 (en) * | 2018-06-11 | 2019-12-19 | 平安科技(深圳)有限公司 | Model library establishment method, voice recognition method and apparatus, and device and medium |
| WO2020073519A1 (en) * | 2018-10-11 | 2020-04-16 | 平安科技(深圳)有限公司 | Voiceprint verification method and apparatus, computer device and storage medium |
| CN110634492A (en) * | 2019-06-13 | 2019-12-31 | 中信银行股份有限公司 | Login verification method and device, electronic equipment and computer readable storage medium |
| CN110634492B (en) * | 2019-06-13 | 2023-08-25 | 中信银行股份有限公司 | Login verification method, login verification device, electronic equipment and computer readable storage medium |
| CN110265037A (en) * | 2019-06-13 | 2019-09-20 | 中信银行股份有限公司 | Auth method, device, electronic equipment and computer readable storage medium |
| CN110556126A (en) * | 2019-09-16 | 2019-12-10 | 平安科技(深圳)有限公司 | Voice recognition method and device and computer equipment |
| CN110556126B (en) * | 2019-09-16 | 2024-01-05 | 平安科技(深圳)有限公司 | Speech recognition method and device and computer equipment |
| CN110782879A (en) * | 2019-09-18 | 2020-02-11 | 平安科技(深圳)有限公司 | Sample size-based voiceprint clustering method, device, equipment and storage medium |
| CN113177816A (en) * | 2020-01-08 | 2021-07-27 | 阿里巴巴集团控股有限公司 | Information processing method and device |
| CN111552832A (en) * | 2020-04-01 | 2020-08-18 | 深圳壹账通智能科技有限公司 | Risk user identification method and device based on voiceprint features and associated graph data |
| CN111477235A (en) * | 2020-04-15 | 2020-07-31 | 厦门快商通科技股份有限公司 | Voiceprint acquisition method, device and equipment |
| CN111524525B (en) * | 2020-04-28 | 2023-06-16 | 平安科技(深圳)有限公司 | Voiceprint recognition method, device, equipment and storage medium of original voice |
| CN111524525A (en) * | 2020-04-28 | 2020-08-11 | 平安科技(深圳)有限公司 | Original voice voiceprint recognition method, device, equipment and storage medium |
| CN111862933A (en) * | 2020-07-20 | 2020-10-30 | 北京字节跳动网络技术有限公司 | Method, apparatus, apparatus and medium for generating synthetic speech |
| CN112331217A (en) * | 2020-11-02 | 2021-02-05 | 泰康保险集团股份有限公司 | Voiceprint recognition method and device, storage medium and electronic equipment |
| CN112331217B (en) * | 2020-11-02 | 2023-09-12 | 泰康保险集团股份有限公司 | Voiceprint recognition method and device, storage medium and electronic equipment |
| CN112669820A (en) * | 2020-12-16 | 2021-04-16 | 平安科技(深圳)有限公司 | Examination cheating recognition method and device based on voice recognition and computer equipment |
| CN112669820B (en) * | 2020-12-16 | 2023-08-04 | 平安科技(深圳)有限公司 | Examination cheating recognition method and device based on voice recognition and computer equipment |
| CN114780787A (en) * | 2022-04-01 | 2022-07-22 | 杭州半云科技有限公司 | Voiceprint retrieval method, identity verification method, identity registration method and device |
| CN115086045A (en) * | 2022-06-17 | 2022-09-20 | 海南大学 | Data security protection method and device based on voiceprint forgery detection |
| CN115358749A (en) * | 2022-08-09 | 2022-11-18 | 平安银行股份有限公司 | Identity verification method, identity verification device, server and computer readable storage medium |
| CN118568701A (en) * | 2024-07-30 | 2024-08-30 | 青岛大学 | A secure authentication method based on secure computer |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2019100606A1 (en) | 2019-05-31 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN107993071A (en) | Electronic device, auth method and storage medium based on vocal print | |
| CN107527620B (en) | Electronic device, the method for authentication and computer readable storage medium | |
| CN110556126B (en) | Speech recognition method and device and computer equipment | |
| TWI641965B (en) | Method and system of authentication based on voiceprint recognition | |
| CN107680586B (en) | Far-field speech acoustic model training method and system | |
| CN107481717B (en) | Acoustic model training method and system | |
| CN111933154B (en) | Method, equipment and computer readable storage medium for recognizing fake voice | |
| WO2019136912A1 (en) | Electronic device, identity authentication method and system, and storage medium | |
| CN109147798B (en) | Speech recognition method, device, electronic equipment and readable storage medium | |
| CN113223536B (en) | Voiceprint recognition method and device and terminal equipment | |
| CN110265035B (en) | Speaker recognition method based on deep learning | |
| CN108630208B (en) | Server, voiceprint-based identity authentication method and storage medium | |
| CN110929836B (en) | Neural network training and image processing method and device, electronic equipment and medium | |
| CN108694952B (en) | Electronic device, identity authentication method and storage medium | |
| CN108281158A (en) | Voice biopsy method, server and storage medium based on deep learning | |
| CN111161713A (en) | Voice gender identification method and device and computing equipment | |
| CN111798047A (en) | Wind control prediction method and device, electronic equipment and storage medium | |
| CN109378014A (en) | A method and system for source identification of mobile devices based on convolutional neural network | |
| CN108650266B (en) | Server, voiceprint verification method and storage medium | |
| CN116913304A (en) | Real-time voice stream noise reduction method and device, computer equipment and storage medium | |
| CN116504276A (en) | Emotion classification method and device based on artificial intelligence, computer equipment and medium | |
| CN115223569B (en) | Speaker verification method, terminal and storage medium based on deep neural network | |
| CN114048770B (en) | Automatic detection method and system for digital audio deletion and insertion tampering operation | |
| CN113035176A (en) | Voice data processing method and device, computer equipment and storage medium | |
| CN114067834A (en) | Bad preamble recognition method and device, storage medium and computer equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180504 |