CN107993071A

CN107993071A - Electronic device, auth method and storage medium based on vocal print

Info

Publication number: CN107993071A
Application number: CN201711161344.0A
Authority: CN
Inventors: 赵峰; 王健宗; 程宁; 郑斯奇; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2017-11-21
Filing date: 2017-11-21
Publication date: 2018-05-04
Also published as: WO2019100606A1

Abstract

Auth method and storage medium the present invention relates to a kind of electronic device, based on vocal print, this method include：After the voice data of targeted customer of pending authentication is received, call predetermined convolutional neural networks CNN models to carry out framing and sampling to the voice data, obtain speech sample data；The speech sample data is handled using Predetermined filter to extract preset kind vocal print feature, and the corresponding vocal print feature vector of the voice data is built based on the preset kind vocal print feature；The background channel model that vocal print feature vector input is trained in advance, to construct the current vocal print discriminant vectors of the voice data；The space length between the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer to prestore is calculated, authentication is carried out to the user based on the space length, and generate verification result.The present invention can improve the accuracy and efficiency of authentication.

Description

Electronic device, auth method and storage medium based on vocal print

Technical field

The present invention relates to field of communication technology, more particularly to a kind of electronic device, the auth method based on vocal print and Storage medium.

Background technology

At present, the scope of business of many large-scale financing corporations is related to multiple business such as insurance, bank, investment, and every A business is usually required for same client to be linked up, and therefore, the authentication to client, which also just becomes, ensures service security Important component.In order to meet the real-time demand of business, current this kind of financing corporation's generally use manual type is to visitor The identity at family carries out analysis verification, but since customer group is huge, single by manually carrying out, discriminant analysis is not only time-consuming to take Power, easily error, and operating cost can be greatly increased；In addition, some financing corporations attempt to use the automatic recognition of speech Mode the identity of user is differentiated automatically, however, the accuracy rate of this kind of existing automatic recognition of speech mode is low, need Improve.Therefore, how the high automatic recognition of speech scheme of accuracy is provided and has become a technical problem urgently to be resolved hurrily.

The content of the invention

Auth method and storage medium it is an object of the invention to provide a kind of electronic device, based on vocal print, purport Improving the accuracy and efficiency of authentication.

To achieve the above object, the present invention provides a kind of electronic device, the electronic device include memory and with it is described The processor of memory connection, is stored with the processing system that can be run on the processor, the processing in the memory System realizes following steps when being performed by the processor：

Framing sampling step, after the voice data of targeted customer of pending authentication is received, is called predetermined Convolutional neural networks CNN models carry out framing and sampling to the voice data, obtain speech sample data；

Extraction step, is handled the speech sample data using Predetermined filter to extract preset kind vocal print spy Sign, and the corresponding vocal print feature vector of the voice data is built based on the preset kind vocal print feature；

Construction step, the background channel model that vocal print feature vector input is trained in advance, to construct the voice The current vocal print discriminant vectors of data；

Verification step, calculate the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer that prestore it Between space length, authentication is carried out to the user based on the space length, and generates verification result.

Preferably, the framing sampling step, specifically includes：

Framing is carried out to the voice data, by the voice data after framing using frame as row, using intraframe data as row, is somebody's turn to do The corresponding two-dimentional voice data of voice data；

Using the convolution kernel of default specification, and based on the first default step-length, convolution is carried out to the two dimension voice data；

Maximum pond maxpooling samplings are carried out according to the second default step-length to the voice data after convolution, are obtained described Speech sample data.

Preferably, the extraction step, specifically includes：

Preemphasis and windowing process are carried out to the speech sample data, carrying out Fourier transform to each adding window obtains Corresponding frequency spectrum, inputs Meier wave filter to export to obtain Meier frequency spectrum by the frequency spectrum；

Cepstral analysis is carried out on Meier frequency spectrum to obtain mel-frequency cepstrum coefficient MFCC, is fallen based on the mel-frequency Spectral coefficient MFCC forms corresponding vocal print feature vector.

Preferably, the verification step, specifically includes：

Calculate the cosine between the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer to prestore Distance：For the standard vocal print discriminant vectors,For current vocal print discriminant vectors；

If the COS distance is less than or equal to default distance threshold, the information being verified is generated；

If the COS distance is more than default distance threshold, the information that generation verification does not pass through.

To achieve the above object, it is described based on vocal print the present invention also provides a kind of auth method based on vocal print Auth method includes：

S1, after the voice data of targeted customer of pending authentication is received, calls predetermined convolutional Neural net Network CNN models carry out framing and sampling to the voice data, obtain speech sample data；

S2, is handled the speech sample data using Predetermined filter to extract preset kind vocal print feature, and base The corresponding vocal print feature vector of the voice data is built in the preset kind vocal print feature；

S3, the background channel model that vocal print feature vector input is trained in advance, to construct the voice data Current vocal print discriminant vectors；

S4, calculates the sky between the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer to prestore Between distance, authentication is carried out to the user based on the space length, and generates verification result.

Preferably, the step S1 includes：

Preferably, the step S2 includes：

Preferably, the step S4 includes：

Preferably, the background channel model is gauss hybrid models, is included before the step S3：

The voice data sample of default quantity is obtained, which is handled to obtain preset kind vocal print spy Sign, and corresponding vocal print feature vector is built based on the corresponding vocal print feature of each voice data sample；

The vocal print feature vector is divided into the training set of the first ratio and the verification collection of the second ratio, first ratio and Second ratio and less than or equal to 1；

Gauss hybrid models are trained using the vocal print feature vector in the training set, and after the completion of training, Verified using the accuracy rate of the gauss hybrid models after the verification set pair training；

If the accuracy rate is more than predetermined threshold value, model training terminates, and institute is used as using the gauss hybrid models after training Background channel model is stated, if alternatively, the accuracy rate is less than or equal to predetermined threshold value, increases the number of the voice data sample Amount, and training is re-started based on the voice data sample after increase.

The present invention also provides a kind of computer-readable recording medium, processing is stored with the computer-readable recording medium The step of system, the processing system realizes above-mentioned auth method based on vocal print when being executed by processor.

The beneficial effects of the invention are as follows：The present invention to targeted customer based on vocal print when carrying out authentication, using convolution Neural network model carries out voice data the speech processes of framing and sampling, can quickly and efficiently obtain in voice data Useful local data, extracts vocal print feature based on speech sample data and builds the body that vocal print feature vector carries out targeted customer Part verification, it is possible to increase the accuracy and efficiency of authentication.

Brief description of the drawings

Fig. 1 is the schematic diagram of the hardware structure of one embodiment of electronic device of the present invention；

Fig. 2 is the flow diagram of auth method one embodiment of the invention based on vocal print.

Embodiment

In order to make the purpose , technical scheme and advantage of the present invention be clearer, with reference to the accompanying drawings and embodiments, it is right The present invention is further elaborated.It should be appreciated that specific embodiment described herein is only to explain the present invention, not For limiting the present invention.Based on the embodiments of the present invention, those of ordinary skill in the art are not before creative work is made All other embodiments obtained are put, belong to the scope of protection of the invention.

It should be noted that the description for being related to " first ", " second " etc. in the present invention is only used for description purpose, and cannot It is interpreted as indicating or implies its relative importance or imply the quantity of the technical characteristic indicated by indicating.Thus, define " the One ", at least one this feature can be expressed or be implicitly included to the feature of " second ".In addition, the skill between each embodiment Art scheme can be combined with each other, but must can be implemented as basis with those of ordinary skill in the art, when technical solution It will be understood that the combination of this technical solution is not present with reference to there is conflicting or can not realize when, also not in application claims Protection domain within.

As shown in fig.1, Fig. 1 is the schematic diagram of the hardware structure of one embodiment of electronic device of the present invention.Electronic device 1 is It is a kind of to carry out numerical computations and/or the equipment of information processing automatically according to the instruction for being previously set or storing.The electricity Sub-device 1 can be computer, can also be single network server, the server group or base of multiple webservers composition In the cloud being made of a large amount of hosts or the webserver of cloud computing, wherein cloud computing is one kind of Distributed Calculation, by one One super virtual computer of the computer collection composition of group's loose couplings.

In the present embodiment, electronic device 1 may include, but be not limited only to, and can be in communication with each other connection by system bus Memory 11, processor 12, network interface 13, memory 11 are stored with the processing system that can be run on the processor 12.Need , it is noted that Fig. 1 illustrate only the electronic device 1 with component 11-13, it should be understood that being not required for implementing all The component shown, what can be substituted implements more or less components.

Wherein, memory 11 includes memory and the readable storage medium storing program for executing of at least one type.Inside save as the fortune of electronic device 1 Row provides caching；Readable storage medium storing program for executing can be if flash memory, hard disk, multimedia card, card-type memory are (for example, SD or DX memories Deng), random access storage device (RAM), static random-access memory (SRAM), read-only storage (ROM), electric erasable can compile Journey read-only storage (EEPROM), programmable read only memory (PROM), magnetic storage, disk, CD etc. it is non-volatile Storage medium.In certain embodiments, readable storage medium storing program for executing can be the internal storage unit of electronic device 1, such as the electronics The hard disk of device 1；In further embodiments, which can also be that the external storage of electronic device 1 is set Plug-in type hard disk that is standby, such as being equipped with electronic device 1, intelligent memory card (Smart Media Card, SMC), secure digital (Secure Digital, SD) blocks, flash card (Flash Card) etc..In the present embodiment, the readable storage medium storing program for executing of memory 11 The operating system and types of applications software of electronic device 1, such as the place in one embodiment of the invention are installed on commonly used in storage Program code of reason system etc..Export or will export each in addition, memory 11 can be also used for temporarily storing Class data.

The processor 12 can be in certain embodiments central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor or other data processing chips.The processor 12 is commonly used in the control electricity The overall operation of sub-device 1, such as perform and carry out data interaction or communicate relevant control and processing with the other equipment Deng.In the present embodiment, the processor 12 is used to run the program code stored in the memory 11 or processing data, example Such as run processing system.

The network interface 13 may include radio network interface or wired network interface, which is commonly used in Communication connection is established between the electronic device 1 and other electronic equipments.In the present embodiment, network interface 13 is mainly used for electricity Sub-device 1 is connected with other equipment, establishes data transmission channel and communication connection, to receive the target of pending authentication use The voice data at family.

The processing system is stored in memory 11, including it is at least one be stored in it is computer-readable in memory 11 Instruction, which can be performed by processor device 12, to realize the method for each embodiment of the application；With And at least one computer-readable instruction is different according to the function that its each several part is realized, can be divided into different logic moulds Block.

In one embodiment, following steps are realized when above-mentioned processing system is performed by the processor 12：

In the present embodiment, voice data is collected (voice capture device is, for example, microphone) by voice capture device. When gathering voice data, should try one's best prevents the interference of ambient noise and voice capture device.Voice capture device is used with target Family keeps suitable distance, and does not have to the big voice capture device of distortion as far as possible, and power supply keeps electric current steady preferably using alternating current It is fixed；Sensor should be used when carrying out telephonograph.Before framing and sampling, voice data can be carried out at noise Reason, is disturbed with further reduce.In order to extract to obtain the vocal print feature of voice data, the voice data gathered is default The voice data of data length, or be the voice data more than preset data length.

In a preferred embodiment, the voice data received is one-dimensional voice data, and framing sampling step is specific to wrap Include：

Framing is carried out to the voice data, by the voice data after framing using frame as row, using intraframe data as row, is somebody's turn to do The corresponding two-dimentional voice data of voice data；Using the convolution kernel of default specification, and based on the first default step-length, to the two dimension language Sound data carry out convolution；Maximum pond maxpooling samplings are carried out according to the second default step-length to the voice data after convolution, Obtain the speech sample data.

Wherein, only stationarity is presented in voice signal within a short period of time, and framing is that one section of voice signal is divided into N sections in short-term Between voice signal, and have one section of duplicate block in order to avoid losing the continuity Characteristics of voice signal, between adjacent speech frame Domain, repeat region are generally the 1/2 of frame length.After framing, each frame is handled all as stationary signal.

Wherein, the convolution kernel for presetting specification can be the convolution kernel of 5*5, and the first default step-length can be 1*1, and second is default Step-length can be 2*2.

Vocal print feature includes polytype, such as broadband vocal print, arrowband vocal print, amplitude vocal print etc., and the present embodiment presets class Type vocal print feature is preferably mel-frequency cepstrum coefficient (the Mel Frequency Cepstrum of speech sample data Coefficient, MFCC), Predetermined filter is Meier wave filter.When building corresponding vocal print feature vector, voice is adopted The vocal print feature composition characteristic data matrix of sample data, this feature data matrix be speech sample data vocal print feature to Amount.

In the present embodiment, which is preferably gauss hybrid models, is calculated using the gauss hybrid models Vocal print feature vector, draws corresponding current vocal print discriminant vectors (i.e. i-vector).

Specifically, which includes：

1) Gauss model, is selected：First, every frame data are calculated using the parameter in common background channel model in difference The likelihood of Gauss model is to numerical value, by, to numerical matrix each column sorting in parallel, choosing top n Gauss model, finally to likelihood Obtain a matrix per frame data numerical value in mixed Gauss model：

Loglike=E (X) * D (X)^-1*X^T-0.5*D(X)^-1*(X.²)^T,

Wherein, Loglike trains the average square come to numerical matrix, E (X) for likelihood for common background channel model Battle array, D (X) are covariance matrix, and X is data matrix, X.²Each it is worth for matrix and is squared.

Wherein, likelihood is to numerical computational formulas：loglikes_i=C_i+E_i*Cov_i ^-1*X_i-X_i ^T*X_i*Cov_i ^-1, loglikes_i The i-th row vector for likelihood to numerical matrix, C_iFor the constant term of i-th of model, E_iFor the Mean Matrix of i-th of model, Cov_i For the covariance matrix of i-th of model, X_iFor the i-th frame data.

2) posterior probability, is calculated：X*XT calculating will be carried out per frame data X, and obtain a symmetrical matrix, three can be reduced to down Angular moment battle array, and element is arranged as to 1 row in order, become the vector that a N frame is multiplied by the lower triangular matrix number latitude Calculated, the vector of all frames is combined into new data matrix, while the association for probability being calculated in universal background model Variance matrix, each matrix are also reduced to lower triangular matrix, become with matrix as new data matrix class, passing through common background Mean Matrix and covariance matrix in channel model calculate the likelihood logarithm under the Gauss model of the selection of every frame data Value, then carries out Softmax recurrence, operation is finally normalized, and obtains every frame in mixed Gauss model Posterior probability distribution, The ProbabilityDistribution Vector of every frame is formed into probability matrix.

3) current vocal print discriminant vectors, are extracted：Carry out single order first, the calculating of second order coefficient, coefficient of first order calculates can be with Obtained by probability matrix row summation：

Wherein, Gamma_iFor i-th of element of coefficient of first order vector, loglikes_jiFor Likelihood is to the jth row of numerical matrix, i-th of element.

Second order coefficient can be multiplied by data matrix acquisition by the transposition of probability matrix：

X=Loglike^T* feats, wherein, X is second order coefficient matrix, and loglike is likelihood to numerical matrix, feats It is characterized data matrix.

It is being calculated single order, after second order coefficient, parallel computation first order and quadratic term, then pass through first order and two Secondary item calculates current vocal print discriminant vectors.

Preferably, the process of training gauss hybrid models includes：

The voice data sample of default quantity (such as 100,000) is obtained, which is handled to obtain pre- If type vocal print feature, and corresponding vocal print feature vector is built based on the corresponding vocal print feature of each voice data sample；

The vocal print feature vector is divided into the training set and the second ratio (such as 0.25) of the first ratio (such as 0.75) Verification collection, first ratio and the second ratio and less than or equal to 1；

If the accuracy rate is more than predetermined threshold value, model training terminates, using the gauss hybrid models after training before The background channel model stated, if alternatively, the accuracy rate is less than or equal to predetermined threshold value, increases the number of the voice data sample Amount, and training is re-started based on the voice data sample after increase.

Wherein, when the vocal print feature vector in using training set is trained gauss hybrid models, the D that extracts The corresponding likelihood probability of dimension vocal print feature can be expressed as with K Gaussian component：

Wherein, P (x) is the probability (mixing that voice data sample is generated by gauss hybrid models Gauss model), w_kFor the weight of each Gauss model, probability that p (x | k) generate for sample by k-th of Gauss model, K is high This model quantity.

The parameter of whole gauss hybrid models can be expressed as：{w_i,μ_i,Σ_i, w_iFor the weight of i-th of Gauss model, μ_i For the average of i-th of Gauss model, ∑_iFor the covariance of i-th of Gauss model.The training gauss hybrid models can use non-prison The EM algorithms superintended and directed, object function use maximal possibility estimation, i.e., make log-likelihood function maximum by selection parameter.Training is completed Afterwards, the weight vectors of gauss hybrid models, constant vector, N number of covariance matrix, average are obtained and is multiplied by matrix of covariance etc., An as trained gauss hybrid models.

The background channel model that the present embodiment is trained in advance is by the excavation to a large amount of voice data and compares trained Arrive, this model can accurately portray background sound when user speaks while the vocal print feature of user is retained to greatest extent Line feature, and can remove this feature in identification, and the inherent feature of user voice is extracted, it can significantly improve use The accuracy rate and efficiency of family authentication.

In the present embodiment, vector has a variety of, including COS distance and Euclidean distance etc. with the distance between vector, preferably Ground, the space length of the present embodiment is COS distance, and COS distance is to utilize two vectorial angle cosine values in vector space Measurement as the size for weighing two inter-individual differences.

Wherein, standard vocal print discriminant vectors are the vocal print discriminant vectors for being obtained ahead of time and storing, standard vocal print discriminant vectors The identification information of its corresponding user is carried in storage, it is capable of the identity of the corresponding user of accurate representation.Calculating space Before distance, the identification information provided according to user obtains the standard vocal print discriminant vectors of storage.

Wherein, when the space length being calculated is less than or equal to pre-determined distance threshold value, it is verified, conversely, then verifying Failure.

Compared with prior art, the present embodiment to targeted customer based on vocal print when carrying out authentication, using convolution god The speech processes of framing and sampling are carried out to voice data through network model, can quickly and efficiently obtain in voice data has Local data, extracts vocal print feature based on speech sample data and builds the identity that vocal print feature vector carries out targeted customer Verification, it is possible to increase the accuracy and efficiency of authentication；In addition, the present embodiment take full advantage of it is relevant with sound channel in voice Vocal print feature, this vocal print feature need not simultaneously be any limitation as text, thus be identified with verification during have compared with Big flexibility.

In a preferred embodiment, on the basis of the embodiment of above-mentioned Fig. 1, above-mentioned extraction step includes：

In the present embodiment, preemphasis processing is really high-pass filtering processing, filters out low-frequency data so that in voice data High frequency characteristics more highlights, and specifically, the transmission function of high-pass filtering is：H (Z)=1- α Z^-1, wherein, Z is voice data, and α is Constant factor, it is preferable that the value of α is 0.97；Due to speech sample data after framing to a certain extent away from original Voice, therefore, it is necessary to carry out windowing process to speech sample data.

In the present embodiment, it is, for example, to take the logarithm, do inverse transformation that cepstral analysis is carried out on Meier frequency spectrum, and inverse transformation is usually Realized by DCT discrete cosine transforms, take the 2nd after DCT to the 13rd coefficient as mel-frequency cepstrum coefficient MFCC. Mel-frequency cepstrum coefficient MFCC is the vocal print feature of this frame speech sample data, by the mel-frequency cepstrum coefficient of every frame MFCC composition characteristic data matrixes, this feature data matrix are the vocal print feature vector of speech sample data.

The present embodiment takes speech sample data mel-frequency cepstrum coefficient MFCC to form corresponding vocal print feature vector, due to Its than the frequency band for the linear interval in normal cepstrum more can subhuman auditory system, therefore can improve The accuracy of authentication.

In a preferred embodiment, on the basis of the embodiment of above-mentioned Fig. 1, the verification step, specifically includes：

Calculate the cosine between the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer to prestore Distance：For the standard vocal print discriminant vectors,For current vocal print discriminant vectors；If the cosine Distance is less than or equal to default distance threshold, then generates the information being verified；If the COS distance is more than default Distance threshold, the then information that generation verification does not pass through.

In the present embodiment, the mark letter of targeted customer can be carried when storing the standard vocal print discriminant vectors of targeted customer Breath, when verifying the identity of user, obtains corresponding standard vocal print according to the identification information match of current vocal print discriminant vectors and reflects It is not vectorial, and the COS distance between current vocal print discriminant vectors and the standard vocal print discriminant vectors matched is calculated, with remaining Chordal distance verifies the identity of targeted customer, improves the accuracy of authentication.

As shown in Fig. 2, Fig. 2 is the flow diagram of auth method one embodiment of the invention based on vocal print, the base Comprise the following steps in the auth method of vocal print：

Step S1, after the voice data of targeted customer of pending authentication is received, calls predetermined convolution god Framing and sampling are carried out to the voice data through network C NN models, obtain speech sample data；

Step S2, is handled the speech sample data to extract preset kind vocal print feature using Predetermined filter, And the corresponding vocal print feature vector of the voice data is built based on the preset kind vocal print feature；

Step S3, the background channel model that vocal print feature vector input is trained in advance, to construct the voice number According to current vocal print discriminant vectors；

Specifically, which includes：

Loglike=E (X) * D (X)^-1*X^T-0.5*D(X)^-1*(X.²)^T,

Preferably, the process of training gauss hybrid models includes：

Step S4, calculates between the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer to prestore Space length, authentication is carried out to the user based on the space length, and generates verification result.

In a preferred embodiment, on the basis of the embodiment of above-mentioned Fig. 2, above-mentioned step S2 includes：

In a preferred embodiment, on the basis of the embodiment of above-mentioned Fig. 2, the step S4, specifically includes：

The embodiments of the present invention are for illustration only, do not represent the quality of embodiment.

Through the above description of the embodiments, those skilled in the art can be understood that above-described embodiment side Method can add the mode of required general hardware platform to realize by software, naturally it is also possible to by hardware, but in many cases The former is more preferably embodiment.Based on such understanding, technical scheme substantially in other words does the prior art Going out the part of contribution can be embodied in the form of software product, which is stored in a storage medium In (such as ROM/RAM, magnetic disc, CD), including some instructions are used so that a station terminal equipment (can be mobile phone, computer, takes Be engaged in device, air conditioner, or network equipment etc.) perform method described in each embodiment of the present invention.

It these are only the preferred embodiment of the present invention, be not intended to limit the scope of the invention, it is every to utilize this hair The equivalent structure or equivalent flow shift that bright specification and accompanying drawing content are made, is directly or indirectly used in other relevant skills Art field, is included within the scope of the present invention.

Claims

1. a kind of electronic device, it is characterised in that the electronic device includes memory and the processing being connected with the memory Device, is stored with the processing system that can be run on the processor in the memory, the processing system is by the processor Following steps are realized during execution：

Framing sampling step, after the voice data of targeted customer of pending authentication is received, calls predetermined convolution Neutral net CNN models carry out framing and sampling to the voice data, obtain speech sample data；

Extraction step, is handled the speech sample data to extract preset kind vocal print feature using Predetermined filter, and The corresponding vocal print feature vector of the voice data is built based on the preset kind vocal print feature；

Construction step, the background channel model that vocal print feature vector input is trained in advance, to construct the voice data Current vocal print discriminant vectors；

Verification step, calculates between the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer to prestore Space length, authentication is carried out based on the space length to the user, and generates verification result.

2. electronic device according to claim 1, it is characterised in that the framing sampling step, specifically includes：

Framing is carried out to the voice data, by the voice data after framing using frame as row, using intraframe data as row, obtains the voice The corresponding two-dimentional voice data of data；

Maximum pond maxpooling samplings are carried out according to the second default step-length to the voice data after convolution, obtain the voice Sampled data.

3. electronic device according to claim 1 or 2, it is characterised in that the extraction step, specifically includes：

Preemphasis and windowing process are carried out to the speech sample data, carrying out Fourier transform to each adding window is corresponded to Frequency spectrum, the frequency spectrum is inputted into Meier wave filter to export to obtain Meier frequency spectrum；

Cepstral analysis is carried out on Meier frequency spectrum to obtain mel-frequency cepstrum coefficient MFCC, based on the mel-frequency cepstrum system Number MFCC forms corresponding vocal print feature vector.

4. electronic device according to claim 1 or 2, it is characterised in that the verification step, specifically includes：

Calculate the COS distance between the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer to prestore： For the standard vocal print discriminant vectors,For current vocal print discriminant vectors；

5. a kind of auth method based on vocal print, it is characterised in that the auth method based on vocal print includes：

S1, after the voice data of targeted customer of pending authentication is received, calls predetermined convolutional neural networks CNN Model carries out framing and sampling to the voice data, obtains speech sample data；

S2, is handled the speech sample data using Predetermined filter to extract preset kind vocal print feature, and is based on being somebody's turn to do Preset kind vocal print feature builds the corresponding vocal print feature vector of the voice data；

S3, the background channel model that vocal print feature vector input is trained in advance, to construct the current of the voice data Vocal print discriminant vectors；

S4, calculate space between the current vocal print discriminant vectors and the standard vocal print discriminant vectors of the targeted customer to prestore away from From carrying out authentication to the user based on the space length, and generate verification result.

6. the auth method according to claim 5 based on vocal print, it is characterised in that the step S1 includes：

7. the auth method based on vocal print according to claim 5 or 6, it is characterised in that the step S2 includes：

8. the auth method based on vocal print according to claim 5 or 6, it is characterised in that the step S4 includes：

9. the auth method based on vocal print according to claim 5 or 6, it is characterised in that the background channel mould Type is gauss hybrid models, is included before the step S3：

The voice data sample of default quantity is obtained, which is handled to obtain preset kind vocal print feature, And corresponding vocal print feature vector is built based on the corresponding vocal print feature of each voice data sample；

Gauss hybrid models are trained using the vocal print feature vector in the training set, and after the completion of training, are utilized The accuracy rate of gauss hybrid models after the verification set pair training is verified；

If the accuracy rate is more than predetermined threshold value, model training terminates, and the back of the body is used as using the gauss hybrid models after training Scape channel model, if alternatively, the accuracy rate is less than or equal to predetermined threshold value, increases the quantity of the voice data sample, and Training is re-started based on the voice data sample after increase.

10. a kind of computer-readable recording medium, it is characterised in that be stored with processing system on the computer-readable recording medium System, realizes that the identity based on vocal print as any one of claim 5 to 9 is tested when the processing system is executed by processor The step of card method.