Voiceprint recognition method, apparatus, electronic device, and medium
Technical Field
The present application belongs to the technical field of identity authentication, and in particular relates to a voiceprint recognition method, apparatus, electronic device, and medium.
Background
Voiceprint recognition, also known as speaker recognition, is used to determine which of several persons spoke a given segment of speech, or to confirm whether a given segment of speech was spoken by a designated person. It is a technology that automatically identifies a speaker's identity from speech parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. At present, voiceprint recognition is widely applied in the Internet, banking systems, public security, and judicial fields. A voiceprint is the sound-wave spectrum, displayed by electro-acoustic instruments, that carries speech information. Each person's acoustic speech characteristics are relatively stable yet variable, rather than absolute and immutable; such variation may arise from physiology, pathology, psychology, imitation, or disguise, and is also related to environmental interference.
Mainstream voiceprint recognition methods in the industry generally require the speaker's voiceprint to be modeled first, typically by pre-training a universal background model. Existing voiceprint models mainly use a Gaussian mixture model to train the universal background model. Because a Gaussian mixture background model obtained by unsupervised training contains no class information about the sample data, it merely represents the characteristics of all speakers in the speaker space and is a single, speaker-independent background model. It is therefore difficult to accurately discriminate the distinguishing characteristics of individual speakers, which ultimately leads to low accuracy when recognizing a speaker's voiceprint.
Technical Problem
Embodiments of the present invention provide a voiceprint recognition method, apparatus, electronic device, and medium, so as to solve the problem in the prior art that the distinguishing characteristics of speakers are difficult to discriminate accurately, resulting in low voiceprint recognition accuracy.
Technical Solution
A first aspect of the embodiments of the present invention provides a voiceprint recognition method, including:
preprocessing each of K input speech segments to obtain the valid speech in each speech segment, the speech segments including training speech and speech to be recognized;
extracting Mel-frequency cepstral coefficient (MFCC) acoustic features from the valid speech of each training speech segment, and outputting a first feature matrix containing the MFCC dimension and the number of frames of each training speech segment;
constructing a long short-term memory (LSTM) recurrent neural network model, and inputting the first feature matrix into the neural network model to obtain output parameters of the neural network model;
training, using the output parameters of the neural network model and the speaker feature corresponding to each training speech segment, N feature extraction matrices for the N training speech segments, each feature extraction matrix corresponding to a speaker model of one training speech segment;
extracting MFCC acoustic features from the valid speech of the speech to be recognized, and outputting a second feature matrix containing the MFCC dimension and the number of frames of the speech to be recognized;
selecting, from the N speaker models and according to a preset similarity measurement algorithm, the speaker model that matches the second feature matrix, and outputting the speaker corresponding to the selected speaker model as the voiceprint recognition result of the speech to be recognized;
wherein K and N are integers greater than zero, and K is greater than N.
A second aspect of the embodiments of the present invention provides a voiceprint recognition apparatus, including:
a preprocessing module, configured to preprocess each of K input speech segments to obtain the valid speech in each speech segment, the speech segments including training speech and speech to be recognized;
a first extraction module, configured to extract MFCC acoustic features from the valid speech of each training speech segment, and to output a first feature matrix containing the MFCC dimension and the number of frames of each training speech segment;
a construction module, configured to construct an LSTM recurrent neural network model, and to input the first feature matrix into the neural network model to obtain output parameters of the neural network model;
a training module, configured to train, using the output parameters of the neural network model and the speaker feature corresponding to each training speech segment, N feature extraction matrices for the N training speech segments, each feature extraction matrix corresponding to a speaker model of one training speech segment;
a second extraction module, configured to extract MFCC acoustic features from the valid speech of the speech to be recognized, and to output a second feature matrix containing the MFCC dimension and the number of frames of the speech to be recognized;
a recognition module, configured to select, from the N speaker models and according to a preset similarity measurement algorithm, the speaker model that matches the second feature matrix, and to output the speaker corresponding to the selected speaker model as the voiceprint recognition result of the speech to be recognized;
wherein K and N are integers greater than zero, and K is greater than N.
A third aspect of the embodiments of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
preprocessing each of K input speech segments to obtain the valid speech in each speech segment, the speech segments including training speech and speech to be recognized;
extracting MFCC acoustic features from the valid speech of each training speech segment, and outputting a first feature matrix containing the MFCC dimension and the number of frames of each training speech segment;
constructing an LSTM recurrent neural network model, and inputting the first feature matrix into the neural network model to obtain output parameters of the neural network model;
training, using the output parameters of the neural network model and the speaker feature corresponding to each training speech segment, N feature extraction matrices for the N training speech segments, each feature extraction matrix corresponding to a speaker model of one training speech segment;
extracting MFCC acoustic features from the valid speech of the speech to be recognized, and outputting a second feature matrix containing the MFCC dimension and the number of frames of the speech to be recognized;
selecting, from the N speaker models and according to a preset similarity measurement algorithm, the speaker model that matches the second feature matrix, and outputting the speaker corresponding to the selected speaker model as the voiceprint recognition result of the speech to be recognized;
wherein K and N are integers greater than zero, and K is greater than N.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, the computer program implementing the following steps when executed by at least one processor:
preprocessing each of K input speech segments to obtain the valid speech in each speech segment, the speech segments including training speech and speech to be recognized;
extracting MFCC acoustic features from the valid speech of each training speech segment, and outputting a first feature matrix containing the MFCC dimension and the number of frames of each training speech segment;
constructing an LSTM recurrent neural network model, and inputting the first feature matrix into the neural network model to obtain output parameters of the neural network model;
training, using the output parameters of the neural network model and the speaker feature corresponding to each training speech segment, N feature extraction matrices for the N training speech segments, each feature extraction matrix corresponding to a speaker model of one training speech segment;
extracting MFCC acoustic features from the valid speech of the speech to be recognized, and outputting a second feature matrix containing the MFCC dimension and the number of frames of the speech to be recognized;
selecting, from the N speaker models and according to a preset similarity measurement algorithm, the speaker model that matches the second feature matrix, and outputting the speaker corresponding to the selected speaker model as the voiceprint recognition result of the speech to be recognized;
wherein K and N are integers greater than zero, and K is greater than N.
Beneficial Effects
In the embodiments of the present invention, the voiceprint background model is trained in a supervised manner. By incorporating speaker features, a more suitable set of acoustic features can be mined from the original training speech data, so that the distinguishing characteristics of speakers can be discriminated more accurately, and better voiceprint recognition results can be obtained in scenarios with overlapping speech. Because the main recognition process is implemented on the basis of a deep neural network model, a more robust speaker model can be learned, solving the problem of low recognition accuracy in existing voiceprint recognition methods.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present invention, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is a flowchart of an implementation of a voiceprint recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a specific implementation of step S101 of the voiceprint recognition method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a specific implementation of step S102 of the voiceprint recognition method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a specific implementation of step S103 of the voiceprint recognition method according to an embodiment of the present invention;
FIG. 5 is a flowchart of a specific implementation of step S104 of the voiceprint recognition method according to an embodiment of the present invention;
FIG. 6 is a structural block diagram of a voiceprint recognition apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Embodiments of the Invention
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
In order to explain the technical solutions of the present invention, specific embodiments are described below.
Embodiments of the present invention are implemented on the basis of a time-recurrent deep neural network. The training of the speaker models relies on the acoustic features of the training speech to estimate and optimize the model parameters, with different speaker models representing the individual characteristics of different speakers. After the feature extraction matrix of the speech to be recognized is obtained, it is matched against multiple speaker models in turn, speaker models that do not satisfy the matching condition are eliminated, and finally the speaker corresponding to the speaker model that satisfies the matching condition is accepted as the voiceprint recognition result.
FIG. 1 shows an implementation flow of the voiceprint recognition method provided by an embodiment of the present invention, detailed as follows:
In S101, each of K input speech segments is preprocessed to obtain the valid speech in each speech segment, the speech segments including training speech and speech to be recognized.
In this embodiment, different speaker models are established by inputting a sufficiently large number of training speech segments. The training speech consists of labeled speech samples of known speaker identity, used to adjust the parameters of the speaker models so that, on the basis of supervised learning, the models achieve the required recognition performance in practical applications.
When it is necessary to determine which of several persons spoke a given segment of speech, or to confirm whether a given segment of speech was spoken by a designated person, that segment is the speech to be recognized. The training speech and the speech to be recognized serve different purposes and may be different or identical speech data. When they are identical, the speech to be recognized can be used to verify the performance of the finally obtained speaker models, testing whether they can accurately identify the speaker of the speech to be recognized.
The speech is preprocessed to reduce the background noise level in each continuous speech signal and to output valid speech of practical analytical significance. This provides a training set with a high signal-to-noise ratio for subsequent speaker model training, increases the speed of model training, and achieves a more accurate training result.
As another embodiment of the present invention, FIG. 2 shows a specific implementation flow of step S101 of the voiceprint recognition method provided by an embodiment of the present invention, detailed as follows:
In S201, pre-emphasis is applied to each of the K input speech segments to boost the high-frequency band of each speech signal.
In this embodiment, in order to reduce the influence of lip radiation and emphasize the high-frequency formants, each speech signal is passed through a high-pass filter that emphasizes the high-frequency portion of the speech, making the spectrum of the speech signal smoother.
In S202, a framing and windowing algorithm is used to convert each pre-emphasized speech signal into short-time stationary signals.
An appropriate number of sampling points is selected to divide each pre-emphasized speech signal into frames, so that each speech segment is converted into multiple frames of short-time speech signals. Each frame of the signal can be regarded as a stationary process, i.e., its statistical characteristics are stable.
In this embodiment, windowing means taking the original short-time speech signal as an integrand and multiplying it by a specific window function. A window function is a real-valued function that is zero everywhere outside a given interval; window functions include, but are not limited to, the rectangular, triangular, Hanning, and Hamming windows.
Preferably, in this embodiment the window function is a Hanning window.
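By way of illustration only, the pre-emphasis of S201 and the framing and windowing of S202 may be sketched as follows. The pre-emphasis coefficient 0.97 and the 400-sample frame with a 160-sample hop (25 ms / 10 ms at a 16 kHz sampling rate) are common choices assumed here; the embodiment does not fix specific values.

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    """High-pass pre-emphasis filter: y[n] = x[n] - alpha * x[n-1] (S201)."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames and apply a Hanning window (S202)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * window

x = preemphasis(np.random.randn(16000))   # 1 s of audio at an assumed 16 kHz
frames = frame_and_window(x)              # 25 ms frames, 10 ms hop
print(frames.shape)                       # (98, 400)
```

Each row of `frames` is one short-time stationary signal; the Hanning window tapers the frame edges to zero, which reduces spectral leakage in the later FFT step.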
In S203, noise and speech in the short-time stationary signals are distinguished on the basis of an endpoint detection algorithm, and the speech in the short-time stationary signals is output as the valid speech of each speech segment.
First, a relatively high short-time energy decision threshold is selected on the short-time power spectrum profile corresponding to the short-time speech signals, and a first coarse decision is made: the start and end points of the valid speech signal lie outside the time interval bounded by the intersections of this threshold with the short-time energy envelope.
Then, according to the average energy of the background noise, a lower short-time energy decision threshold is selected, and the two points where the short-time energy envelope intersects this threshold are taken as the start and end points of the valid speech signal, so that the valid speech can be extracted and output.
By applying pre-emphasis to the multiple input speech segments, the embodiments of the present invention avoid a marked drop in the output signal-to-noise ratio in the high-frequency band. By extracting the valid speech from the speech signals and filtering the noise out of the short-time stationary signals, the amount of computation during speaker model training is reduced, the speech processing time of the subsequent steps is shortened, noise interference from silent segments is eliminated, and the accuracy of speech recognition is improved.
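A minimal sketch of the double-threshold endpoint detection of S203 follows. The two energy thresholds are expressed here as illustrative fractions of the peak frame energy; the embodiment ties the lower threshold to the average background-noise energy but does not fix numeric values.

```python
import numpy as np

def endpoint_detect(frames, high_ratio=0.5, low_ratio=0.1):
    """Double-threshold endpoint detection on per-frame short-time energy.

    A coarse pass with a high threshold locates the speech core; a lower
    threshold then widens the boundaries outward to the final start/end
    frames of the valid speech.
    """
    energy = np.sum(frames ** 2, axis=1)      # short-time energy envelope
    high = energy.max() * high_ratio          # assumed coarse threshold
    low = energy.max() * low_ratio            # assumed fine threshold
    core = np.where(energy > high)[0]
    if core.size == 0:
        return None                           # no speech detected
    start, end = int(core[0]), int(core[-1])
    while start > 0 and energy[start - 1] > low:            # extend left
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] > low:  # extend right
        end += 1
    return start, end

frames = np.zeros((100, 400))
frames[40:60] = 1.0                           # synthetic "speech" burst
print(endpoint_detect(frames))                # (40, 59)
```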
In S102, Mel-frequency cepstral coefficient (MFCC) acoustic features are extracted from the valid speech of each training speech segment, and a first feature matrix containing the MFCC dimension and the number of frames of each training speech segment is output.
The Mel frequency, proposed on the basis of the auditory characteristics of the human ear, has a nonlinear correspondence with frequency in Hz; using this nonlinear relationship, the Hz spectral features are computed.
The conversion formula between Hz and Mel frequency is: F_mel = 2595 * lg(1 + f_Hz / 700)
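The Hz-to-Mel conversion above, and its inverse, can be computed directly:

```python
import math

def hz_to_mel(f_hz):
    """F_mel = 2595 * lg(1 + f_Hz / 700), per the formula above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    """Inverse mapping from the Mel scale back to Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

print(round(hz_to_mel(1000)))               # ~1000 Mel near 1 kHz
print(round(mel_to_hz(hz_to_mel(4000))))    # round-trip recovers 4000 Hz
```

The near-linear behavior below 1 kHz and logarithmic compression above it is what lets the filter bank in S302 allocate finer resolution to the perceptually important low frequencies.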
As another embodiment of the present invention, FIG. 3 shows a specific implementation flow of step S102 of the voiceprint recognition method provided by an embodiment of the present invention, as follows:
In S301, the valid speech of each training speech segment is analyzed by fast Fourier transform to obtain the power spectrum of the valid speech.
After the valid speech extracted in the above embodiment undergoes a fast Fourier transform, the spectrum of each frame of valid speech is obtained; the magnitude of the spectrum is taken and then squared to obtain the power spectrum of each frame of valid speech. The different energy distributions exhibited in the power spectrum represent different characteristics of the speech.
In S302, the power spectrum is filtered by a Mel-scale filter bank containing M triangular filters, and the log energy output by each triangular filter is obtained.
The center frequencies of the M triangular filters are f(m), m = 1, 2, ..., M, where M preferably takes a value from 22 to 26.
In S303, the log energies are subjected to a discrete cosine transform, and the MFCC acoustic features of the valid speech are output.
In S304, according to the MFCC acoustic features, a first feature matrix containing the MFCC dimension and the number of frames of each training speech segment is output.
The energy of each frame of the valid speech signal, together with the log energies, forms a two-dimensional MFCC acoustic feature. Additional acoustic features such as pitch, zero-crossing rate, and formants are incorporated in this process, so that the output first feature matrix can be expressed as "MFCC dimension × number of frames", where the number of frames is the number of frames into which each originally input speech signal was divided during framing and windowing.
In the embodiments of the present invention, filtering the power spectrum of the valid speech through the triangular filters smooths the spectrum of each frame of valid speech, eliminates the effect of harmonics, and highlights the formants of the original speech signal corresponding to each frame of valid speech. Taking the first feature matrix containing the MFCC acoustic feature dimension as the input of the neural network model prevents the training of the neural network model from being affected by the pitch of the input speech, and reduces the amount of computation.
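As a sketch of S301 through S304 under assumed parameter values (a 512-point FFT, 26 triangular filters, 13 cepstral coefficients, 16 kHz sampling rate — none of which are fixed by the embodiment), the MFCC pipeline may look like:

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters with centers evenly spaced on the Mel scale (S302)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):                     # rising edge
            fb[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                     # falling edge
            fb[m - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(frames, n_fft=512, n_filters=26, n_ceps=13):
    """S301-S304: FFT power spectrum -> Mel filter bank -> log -> DCT."""
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft   # power spectrum
    logE = np.log(spec @ mel_filterbank(n_filters, n_fft).T + 1e-10)
    # Type-II DCT of the log filter-bank energies, keeping the first n_ceps
    n = logE.shape[1]
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * np.arange(n) + 1) / (2 * n))
    return logE @ basis.T

frames = np.random.randn(98, 400)      # windowed frames from S202
features = mfcc(frames)
print(features.shape)                  # (98, 13): frames x MFCC dimension
```

Transposed, `features` is exactly the "MFCC dimension × number of frames" first feature matrix described above.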
In S103, a long short-term memory recurrent neural network model is constructed, and the first feature matrix is input into the neural network model to obtain output parameters of the neural network model.
As another embodiment of the present invention, FIG. 4 shows a specific implementation flow of step S103 of the voiceprint recognition method provided by an embodiment of the present invention, detailed as follows:
In S401, an LSTM recurrent neural network model is initialized, the neural network model including an input layer, recurrent layers containing long short-term memory units, and an output layer.
In this embodiment, the neural network model contains multiple layers, and different layers serve different roles. Here, a five-layer network is taken as an example to describe the network structure of the LSTM recurrent neural network; it should be understood that in a practically applied network structure, the number of layers of the neural network is not limited to five.
In this embodiment, the open-source deep learning toolkit CNTK is used to initialize a five-layer LSTM recurrent neural network model. The network structure of this deep neural network (DNN) is: one input layer, three recurrent layers containing long short-term memory (LSTM) units, and one output layer. Each recurrent layer contains 1024 nodes and has a two-level structure, one level of which is a projection layer with 512 nodes.
The input to the LSTM recurrent layers is an 83-dimensional speech feature vector. Based on the context of the current frame, the preceding five frames, and the following five frames of valid speech, the input window advances by only one frame of valid speech per iteration, so a 913-dimensional feature vector in total (11 frames × 83 dimensions) serves as the input to the LSTM. After entering the LSTM recurrent layers, the 913-dimensional feature vector passes in turn through the 1024 hidden-layer memory units. The input and output feature vector dimensions of an LSTM recurrent layer are therefore the same.
The neural network structure can be trained using a stochastic gradient descent optimization method.
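The splicing of the current frame with its five left and five right neighbors into the 913-dimensional LSTM input can be sketched as follows; repeating the first and last frames at the segment edges is an assumption made here for illustration, as the embodiment does not specify its edge handling.

```python
import numpy as np

def stack_context(features, left=5, right=5):
    """Splice each 83-dim frame with its 5 left and 5 right neighbours
    (11 frames x 83 dims = 913 dims), advancing one frame at a time.
    Segment edges are padded by repeating the first/last frame (an
    illustrative assumption)."""
    padded = np.concatenate([np.repeat(features[:1], left, axis=0),
                             features,
                             np.repeat(features[-1:], right, axis=0)])
    n = len(features)
    win = left + right + 1
    return np.stack([padded[i : i + win].reshape(-1) for i in range(n)])

feats = np.random.randn(200, 83)     # 200 frames of 83-dim features
x = stack_context(feats)
print(x.shape)                       # (200, 913)
```

The center slice of each spliced vector (columns 415 to 497) is the original frame itself, so the window slides exactly one frame per step as described above.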
In S402, the first feature matrix is input into the neural network model.
In S403, a Softmax classifier is used to classify the frame feature vectors in the first feature matrix, and state clustering is performed according to the classification result to obtain multiple classes of frame feature vectors.
In S404, the posterior probability of each class of frame feature vectors is calculated; the posterior probabilities of the classes of frame feature vectors are the output parameters of the neural network model.
The DNN output parameter is the posterior probability γ_i(k) = p(k | f_i, θ), where i denotes the i-th frame of valid speech, θ denotes the text information corresponding to the speech, f_i denotes the first feature matrix input to the deep neural network, and k denotes the k-th output class, corresponding to the number of Gaussian components in a conventional Gaussian mixture model.
In S104, using the output parameters of the neural network model and the speaker feature corresponding to each training speech segment, N feature extraction matrices for the N training speech segments are trained, each feature extraction matrix corresponding to a speaker model of one training speech segment.
As another embodiment of the present invention, FIG. 5 shows a specific implementation flow of step S104 of the voiceprint recognition method provided by an embodiment of the present invention, detailed as follows:
In S501, training parameters of the neural network model are obtained, the training parameters being the mixture weights, means, and variances of the output parameters.
Based on the DNN output parameters in the above embodiment, where γ_i(k) = p(k | f_i, θ) is the posterior for frame i and class k, the three training parameters are calculated as follows:
mixture weight: w_k = Σ_i γ_i(k) / Σ_j Σ_i γ_i(j)
mean: μ_k = Σ_i γ_i(k) · f_i / Σ_i γ_i(k)
variance: σ_k² = Σ_i γ_i(k) · f_i² / Σ_i γ_i(k) − μ_k²
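One way the weight, mean, and variance statistics of S501 could be accumulated from DNN posteriors is sketched below; the array shapes and random inputs are illustrative only, and the function is an assumed vectorized form of the standard soft-count statistics, not code from the embodiment.

```python
import numpy as np

def mixture_stats(gamma, feats):
    """Per-class mixture weight, mean, and variance from DNN posteriors.

    gamma: (n_frames, K) posterior gamma_i(k) per frame
    feats: (n_frames, D) acoustic feature vector per frame
    """
    occ = gamma.sum(axis=0)                        # soft occupation counts N_k
    weights = occ / occ.sum()                      # w_k
    means = (gamma.T @ feats) / occ[:, None]       # mu_k
    second = (gamma.T @ feats ** 2) / occ[:, None]
    variances = second - means ** 2                # sigma_k^2
    return weights, means, variances

rng = np.random.default_rng(0)
gamma = rng.random((100, 4))
gamma /= gamma.sum(axis=1, keepdims=True)          # rows sum to 1 (Softmax output)
feats = rng.standard_normal((100, 13))
w, mu, var = mixture_stats(gamma, feats)
print(w.shape, mu.shape, var.shape)                # (4,) (4, 13) (4, 13)
print(round(w.sum(), 6))                           # 1.0
```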
In S502, according to the training parameters and the speaker feature corresponding to each training speech segment, the feature vector of the speaker corresponding to each training speech segment is calculated using a forward-backward algorithm.
In this embodiment, the speaker feature corresponding to a training speech segment represents the speaker identity label information of that training speech. According to the mixture weights, means, and variances of the DNN output parameters above, together with the label information of the training speech, the Baum-Welch algorithm, which is based on the forward-backward algorithm, is used to iteratively estimate the feature vector of the speaker corresponding to each training speech segment.
In S503, the training parameters of the neural network model and the feature vector of the speaker corresponding to each training speech segment are iterated to convergence, yielding the feature extraction matrix of each training speech segment.
In S105, the Mel-frequency cepstral coefficient (MFCC) acoustic features of the valid speech in the speech to be recognized are extracted, and a second feature matrix containing the dimension of the MFCCs and the number of frames of the speech to be recognized is output.
The specific embodiments described herein for S102 apply equally to S105; the difference is that the original speech signal processed in this step is the speech to be recognized, whereas the signal processed in S102 is the training speech. The remaining implementation principles are the same and are not repeated here.
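For illustration, the feature extraction pipeline shared by S102 and S105 (framing, power spectrum via FFT, Mel-scale triangular filter bank, log energies, discrete cosine transform) can be sketched in numpy as follows. The function name `mfcc_matrix` and all default parameter values are assumptions for this sketch, not values fixed by the embodiment:

```python
import numpy as np

def mfcc_matrix(signal, sr=16000, n_fft=512, hop=160, win=400,
                n_mels=26, n_ceps=13):
    """MFCC sketch. Returns an (n_ceps, n_frames) feature matrix, i.e.
    MFCC dimension by number of frames, matching the text's feature matrix."""
    # frame the signal with a Hamming window
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    # power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel-scale triangular filter bank
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    # log filter-bank energies, then DCT-II to decorrelate
    log_e = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return (log_e @ dct.T).T
```

With a one-second 16 kHz signal and the assumed defaults this yields a 13-by-98 matrix: 13 cepstral coefficients over 98 frames.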
In S106, among the N speaker models, the speaker model matching the second feature matrix is selected according to a preset similarity measurement algorithm, and the speaker corresponding to the selected speaker model is output as the voiceprint recognition result of the speech to be recognized.
The similarity measurement algorithm includes, but is not limited to, distance measures, similarity measures and matching measures, which quantify how close the second feature matrix and a speaker model are in their objective feature representations.
As another embodiment of the present invention, the speaker model matching the second feature matrix is obtained by the cosine measure among the similarity measures.
In this embodiment, the cosine of the angle between two vectors in the vector space measures the difference between the second feature matrix and each of the N speaker models. The similarity of the two vectors (the second feature matrix representing the speech to be recognized, and a speaker model) is judged by comparing the cosine distance between the two input low-dimensional i-vectors against a set threshold. The lines connecting each feature point to the origin intersect at the origin; the smaller the angle between them, the more similar the two features, and the larger the angle, the less similar they are.
Among the N speaker models, the speaker model with the greatest similarity is selected; the original speaker of that model is taken as the speaker of the speech to be recognized, yielding the voiceprint recognition result of the speech to be recognized.
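The cosine-measure matching and maximum-similarity selection described above can be sketched as follows. This is a simplified illustration; the function name, argument names and the threshold value are assumptions, not part of the embodiment:

```python
import numpy as np

def best_speaker(test_vec, speaker_vecs, threshold=0.5):
    """Score a test i-vector against N speaker model vectors by cosine
    similarity; return (index, score) of the closest model, or
    (None, score) if no model clears the decision threshold."""
    test = test_vec / np.linalg.norm(test_vec)
    models = speaker_vecs / np.linalg.norm(speaker_vecs, axis=1, keepdims=True)
    scores = models @ test          # cosine of the angle to each model
    k = int(np.argmax(scores))      # model with the greatest similarity
    return (k, scores[k]) if scores[k] >= threshold else (None, scores[k])
```

A smaller angle gives a cosine closer to 1, i.e. a higher score; the argmax therefore implements the "greatest similarity" selection of S106.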
In the embodiment of the present invention, the voiceprint background model is trained by supervised learning. By incorporating speaker features, a more suitable acoustic feature set can be mined from the original training speech data, so that the differentiating characteristics of speakers can be distinguished more accurately and a better voiceprint recognition effect is obtained in scenarios with overlapping speech. Since the main recognition process is implemented on a deep neural network model, a more robust speaker model can be learned, solving the problem of low recognition accuracy in existing voiceprint recognition methods.
It should be understood that the magnitudes of the step numbers in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
Corresponding to the voiceprint recognition method described in the above embodiments, FIG. 6 shows a structural block diagram of a voiceprint recognition apparatus provided by an embodiment of the present invention. The apparatus may be a software module, a hardware module, or a module combining software and hardware. For convenience of explanation, only the parts related to the present embodiment are shown.
Referring to FIG. 6, the apparatus includes:
a pre-processing module 61, configured to pre-process each of the K input speeches to obtain the valid speech in each speech, where the speeches include training speeches and the speech to be recognized;

a first extraction module 62, configured to extract the MFCC acoustic features of the valid speech in each training speech, and to output a first feature matrix containing the dimension of the MFCCs and the number of frames of each training speech;

a construction module 63, configured to construct a long short-term memory recurrent neural network model and to input the first feature matrix into the neural network model to obtain the output parameters of the neural network model;

a training module 64, configured to train, using the output parameters of the neural network model and the speaker feature corresponding to each training speech, the N feature extraction matrices of the N training speeches, where each feature extraction matrix corresponds to the speaker model of one training speech;

a second extraction module 65, configured to extract the MFCC acoustic features of the valid speech in the speech to be recognized, and to output a second feature matrix containing the dimension of the MFCCs and the number of frames of the speech to be recognized;

an identification module 66, configured to select, among the N speaker models and according to a preset similarity measurement algorithm, the speaker model matching the second feature matrix, and to output the speaker corresponding to the selected speaker model as the voiceprint recognition result of the speech to be recognized.

K and N are integers greater than zero, and K is greater than N.
Optionally, the pre-processing module 61 includes:
a pre-emphasis submodule, configured to pre-emphasize each of the K input speeches to boost the high-frequency band of each speech;

a conversion submodule, configured to convert each pre-emphasized speech into short-time stationary signals using a framing-and-windowing algorithm;

a detection submodule, configured to distinguish noise from speech in the short-time stationary signals based on an endpoint detection algorithm, and to output the speech portion of the short-time stationary signals as the valid speech of each input speech.
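A minimal sketch of the pre-emphasis and framing-and-windowing steps performed by the first two submodules above. The pre-emphasis coefficient 0.97 and the frame sizes are common defaults assumed for illustration, not values fixed by the embodiment:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1],
    boosting the high-frequency band of the speech signal."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, win=400, hop=160):
    """Split the signal into overlapping frames and apply a Hamming
    window, so each frame can be treated as short-time stationary."""
    n_frames = 1 + max(len(signal) - win, 0) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(win)
```

At 16 kHz the assumed defaults correspond to 25 ms frames with a 10 ms shift, a common configuration for short-time speech analysis.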
Optionally, the first extraction module 62 includes:
an acquisition submodule, configured to analyze the valid speech in each training speech by fast Fourier transform to obtain the power spectrum of the valid speech;

a filtering submodule, configured to filter the power spectrum with a Mel-scale filter bank containing M triangular filters, and to obtain the log energy output by each triangular filter, where M is an integer greater than zero;

a transform submodule, configured to apply a discrete cosine transform to the log energies and to output the MFCC acoustic features of the valid speech;

an output submodule, configured to output, from the MFCC acoustic features, the first feature matrix containing the dimension of the MFCCs and the number of frames of each training speech.
Optionally, the construction module 63 includes:
an initialization submodule, configured to initialize a long short-term memory recurrent neural network model comprising an input layer, a recurrent layer containing long short-term memory units, and an output layer;

an input submodule, configured to input the first feature matrix into the neural network model;

a classification submodule, configured to classify the frame feature vectors in the first feature matrix with a Softmax classifier, and to perform state clustering on the classification results to obtain multiple classes of frame feature vectors;

a calculation submodule, configured to calculate the posterior probability of each class of frame feature vectors, the posterior probabilities of the classes of frame feature vectors being the output parameters of the neural network model.
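The posterior probabilities produced by the calculation submodule correspond to a Softmax over the model's output layer. A minimal, numerically stable numpy sketch (illustrative only; the function name is an assumption):

```python
import numpy as np

def softmax_posteriors(logits):
    """Row-wise Softmax: each row of logits (one frame's output-layer
    activations) becomes a posterior distribution over the classes."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

Each output row is non-negative and sums to one, as required of the frame-level posteriors used as training parameters in S501.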
Optionally, the training module 64 includes:
a parameter acquisition submodule, configured to acquire the training parameters of the neural network model, the training parameters being the mixture weights, means and variances of the output parameters;

a feature acquisition submodule, configured to calculate, with a forward-backward algorithm and according to the training parameters and the speaker feature corresponding to the training speech, the feature vector of the speaker corresponding to each training speech;

an iteration submodule, configured to iterate the training parameters of the neural network model and the feature vector of the speaker corresponding to each training speech to convergence, yielding the feature extraction matrix of each training speech.
In the embodiment of the present invention, the voiceprint background model is trained by supervised learning. By incorporating speaker features, a more suitable acoustic feature set can be mined from the original training speech data, so that the differentiating characteristics of speakers can be distinguished more accurately and a better voiceprint recognition effect is obtained in scenarios with overlapping speech. Since the main recognition process is implemented on a deep neural network model, a more robust speaker model can be learned, solving the problem of low recognition accuracy in existing voiceprint recognition methods.
FIG. 7 is a schematic diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 7, the electronic device 7 of this embodiment includes a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processor 70, such as a voiceprint recognition program. When executing the computer program 72, the processor 70 implements the steps in the above voiceprint recognition method embodiments, such as steps 101 to 106 shown in FIG. 1. Alternatively, when executing the computer program 72, the processor 70 implements the functions of the modules/units in the above apparatus embodiments, such as the functions of modules 61 to 66 shown in FIG. 6.
Illustratively, the computer program 72 may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer program 72 in the electronic device 7.
The electronic device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The electronic device 7 may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that FIG. 7 is merely an example of the electronic device 7 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 7 may further include input/output devices, network access devices, buses, and the like.
The processor 70 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may be an internal storage unit of the electronic device 7, such as a hard disk or memory of the electronic device 7. The memory 71 may also be an external storage device of the electronic device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the electronic device 7. The memory 71 is used to store the computer program and other programs and data required by the electronic device 7, and may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division of the functional modules described above is exemplified. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above. The functional modules in the embodiments may be integrated into one processing module, each module may exist physically alone, or two or more modules may be integrated into one module; the integrated module may be implemented in the form of hardware or in the form of software functional modules. In addition, the specific names of the functional modules are only for convenience of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Those of ordinary skill in the art will appreciate that the modules and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or modules, and may be electrical, mechanical or in other forms.
The modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist physically alone, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of software functional modules.
If the integrated module is implemented in the form of software functional modules and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.