CN103903634B - Activation tone detection and method and device for activation tone detection - Google Patents
- Publication number
- CN103903634B (application CN201210570563.5A)
- Authority
- CN
- China
- Prior art keywords
- tonality
- characteristic parameter
- value
- frame
- signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Noise Elimination (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Description
Technical field
The present invention relates to voice activity detection (VAD), to methods used in VAD (including background noise detection, tonal signal detection, correction of the number of active-voice hangover frames of the current frame in the VAD decision, and adjustment of the signal-to-noise ratio (SNR) threshold in the VAD decision), and to the corresponding devices.
Background art
In a normal voice call, a user sometimes talks and sometimes listens, so inactive-voice periods occur during the call. Normally, the total inactive-voice time of both parties exceeds 50% of their total speech coding time. During inactive periods there is only background noise, which usually carries no useful information. Exploiting this fact, speech and audio processing uses a VAD algorithm to distinguish active voice from inactive voice and processes each with a different method. Many modern speech coding standards, such as AMR and AMR-WB, support VAD. In terms of efficiency, however, the VAD of these encoders does not perform well under all typical background noises; under non-stationary noise in particular, their VAD efficiency is low. For music signals, these VADs sometimes make false detections, which significantly degrades the quality of the corresponding processing algorithms.
Summary of the invention
The technical problem to be solved by the present invention is to provide VAD, methods used in VAD (including background noise detection, tonal signal detection, correction of the number of active-voice hangover frames in the VAD decision, and adjustment of the SNR threshold in the VAD decision) and the corresponding devices, so as to improve VAD accuracy.
To solve the above technical problem, the present invention provides a VAD method, comprising:
obtaining the subband signals and spectral amplitudes of the current frame;
computing the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signals, and computing the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
computing the SNR parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the energies of the SNR subbands;
computing the tonality flag of the current frame from its frame energy parameter and its spectral centroid, time-domain stability, spectral flatness and tonality feature parameters;
computing the VAD decision from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter.
To solve the above technical problem, the present invention provides a VAD device, comprising:
a filter bank, configured to obtain the subband signals of the current frame;
a spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
a feature parameter acquisition unit, configured to compute the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signals, and the spectral flatness and tonality feature parameters from the spectral amplitudes;
a flag calculation unit, configured to compute the tonality flag of the current frame from its frame energy parameter and its spectral centroid, time-domain stability, spectral flatness and tonality feature parameters;
an SNR calculation unit, configured to compute the SNR parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the energies of the SNR subbands;
a VAD decision unit, configured to compute the VAD decision from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter.
To solve the above technical problem, the present invention provides a background noise detection method, comprising:
obtaining the subband signals and spectral amplitudes of the current frame;
computing the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signals, and the values of the spectral flatness and tonality feature parameters from the spectral amplitudes;
performing background noise detection from the spectral centroid, time-domain stability, spectral flatness and tonality feature parameters and the frame energy parameter of the current frame, to determine whether the current frame is background noise.
To solve the above technical problem, the present invention provides a background noise detection device, comprising:
a filter bank, configured to obtain the subband signals of the current frame;
a spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
a feature parameter calculation unit, configured to compute the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signals, and the values of the spectral flatness and tonality feature parameters from the spectral amplitudes;
a background noise judging unit, configured to perform background noise detection from the spectral centroid, time-domain stability, spectral flatness and tonality feature parameters and the frame energy parameter of the current frame, and to determine whether the current frame is background noise.
To solve the above technical problem, the present invention provides a tonal signal detection method, comprising:
obtaining the subband signals and spectral amplitudes of the current frame;
computing the values of the spectral centroid and time-domain stability feature parameters from the subband signals, and the values of the spectral flatness and tonality feature parameters from the spectral amplitudes;
determining whether the current frame is a tonal signal from the tonality, time-domain stability, spectral flatness and spectral centroid feature parameters.
To solve the above technical problem, the present invention provides a tonal signal detection device, comprising:
a filter bank, configured to obtain the subband signals of the current frame;
a spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
a feature parameter calculation unit, configured to compute the values of the spectral centroid and time-domain stability feature parameters from the subband signals, and the values of the spectral flatness and tonality feature parameters from the spectral amplitudes;
a tonal signal judging unit, configured to determine whether the current frame is a tonal signal from the tonality, time-domain stability, spectral flatness and spectral centroid feature parameters.
To solve the above technical problem, the present invention provides a method for correcting the number of active-voice hangover frames of the current frame in the VAD decision, comprising:
computing the long-term SNR lt_snr and the average full-band SNR SNR2_lt_ave;
correcting the current number of active-voice hangover frames according to the decisions of several previous frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR of the current frame and the VAD decision of the current frame.
To solve the above technical problem, the present invention provides a device for correcting the number of active-voice hangover frames in the VAD decision, comprising:
a long-term SNR calculation unit, configured to compute the long-term SNR lt_snr;
an average full-band SNR calculation unit, configured to compute the average full-band SNR SNR2_lt_ave;
an active-voice hangover correction unit, configured to correct the current number of active-voice hangover frames according to the decisions of several previous frames, lt_snr, SNR2_lt_ave, the SNR parameter of the current frame and the VAD decision of the current frame.
To solve the above technical problem, the present invention provides a method for adjusting the SNR threshold in the VAD decision, comprising:
computing the spectral centroid feature parameter of the current frame from the subband signals;
computing the ratio of the average long-term active-signal energy to the average long-term background noise energy, both as computed in the previous frame, to obtain the long-term SNR lt_snr;
adjusting the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of preceding consecutive active-voice frames and the number of preceding consecutive noise frames continuous_noise_num.
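The two inputs of the threshold adjustment that are fully defined above, the long-term SNR and the run-length counters, can be sketched as follows. This is a minimal sketch under assumptions: lt_snr is returned as a plain linear ratio (the text does not say whether a logarithm is applied), eps is a hypothetical guard against division by zero, and trailing_run is a hypothetical helper; the threshold-adjustment rule itself is not described here and is not sketched.

```python
def lt_snr(avg_lt_active_energy, avg_lt_noise_energy, eps=1e-12):
    """Long-term SNR: previous-frame average long-term active-signal energy
    divided by average long-term background noise energy (linear ratio, assumed)."""
    return avg_lt_active_energy / (avg_lt_noise_energy + eps)


def trailing_run(flags, value):
    """Count the most recent consecutive frames whose VAD flag equals `value`."""
    count = 0
    for f in reversed(flags):
        if f != value:
            break
        count += 1
    return count


# Hypothetical usage with past VAD flags (True = active voice):
prev_flags = [True, True, False, False, False]
continuous_noise_num = trailing_run(prev_flags, False)   # 3
continuous_active_num = trailing_run(prev_flags, True)   # 0 (name is hypothetical)
```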
To solve the above technical problem, the present invention provides a device for adjusting the SNR threshold in the VAD decision, comprising:
a feature parameter acquisition unit, configured to compute the spectral centroid feature parameter of the current frame from the subband signals;
a long-term SNR calculation unit, configured to compute the ratio of the average long-term active-signal energy to the average long-term background noise energy, both as computed in the previous frame, to obtain the long-term SNR lt_snr;
an SNR threshold adjustment unit, configured to adjust the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of preceding consecutive active-voice frames and the number of preceding consecutive noise frames continuous_noise_num.
The method and device of the present invention overcome the shortcomings of existing VAD algorithms: they improve VAD detection efficiency for non-stationary noise while also improving the accuracy of music detection, so that speech and audio processing algorithms using this VAD achieve better performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of Embodiment 1 of the VAD method of the present invention;
Fig. 2 is a schematic diagram of Embodiment 2 of the VAD method of the present invention;
Fig. 3 is a schematic diagram of the process of obtaining the VAD decision in Embodiments 1 and 2 of the present invention;
Fig. 4 is a schematic module structure diagram of Embodiment 1 of the VAD device of the present invention;
Fig. 5 is a schematic module structure diagram of Embodiment 2 of the VAD device of the present invention;
Fig. 6 is a schematic module structure diagram of the VAD decision unit in the VAD device of the present invention;
Fig. 7 is a schematic diagram of an embodiment of the background noise detection method of the present invention;
Fig. 8 is a schematic module structure diagram of the background noise detection device of the present invention;
Fig. 9 is a schematic diagram of an embodiment of the tonal signal detection method of the present invention;
Fig. 10 is a schematic module structure diagram of the tonal signal detection device of the present invention;
Fig. 11 is a schematic module structure diagram of the tonal signal judging unit of the tonal signal detection device of the present invention;
Fig. 12 is a schematic diagram of an embodiment of the method for correcting the number of active-voice hangover frames in the VAD decision of the present invention;
Fig. 13 is a schematic module structure diagram of the device for correcting the number of active-voice hangover frames in the VAD decision of the present invention;
Fig. 14 is a schematic diagram of an embodiment of the method for adjusting the SNR threshold in the VAD decision of the present invention;
Fig. 15 is a schematic flowchart of adjusting the SNR threshold in the present invention;
Fig. 16 is a schematic module structure diagram of the device for adjusting the SNR threshold in the VAD decision of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Provided there is no conflict, the embodiments of this application and the features in them may be combined with each other.
Embodiment 1 of the VAD method of the present invention, shown in Fig. 1, comprises the following steps:
Step 101: obtain the subband signals and spectral amplitudes of the current frame.
This embodiment uses an audio stream with a frame length of 20 ms and a sampling rate of 32 kHz as an example; the method of the present invention applies equally to other frame lengths and sampling rates.
Feed the time-domain signal of the current frame into the filter bank and perform subband filtering to obtain the filter bank subband signals.
This embodiment uses a 40-channel filter bank; the present invention applies equally to filter banks with other numbers of channels.
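The frame length, sampling rate and channel count above fit together as follows (all values come from the text; only the arithmetic is added):

$$N = f_s \cdot T = 32\,000\ \text{Hz} \times 0.020\ \text{s} = 640\ \text{samples per frame}$$

$$L = \frac{N}{K} = \frac{640}{40} = 16\ \text{time samples per subband}$$

Running step 101b once per block of 40 new samples, 16 times per frame, therefore consumes exactly one frame and yields the 16 time samples 0 ≤ l < 16 of X[k,l].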
Feed the time-domain signal of the current frame into the 40-channel filter bank and perform subband filtering to obtain the filter bank subband signals X[k,l] of 40 subbands at 16 time samples, 0 ≤ k < 40, 0 ≤ l < 16, where k is the filter bank subband index (indicating the subband a coefficient belongs to) and l is the time-sample index within each subband. The steps are as follows:
101a: Store the most recent 640 audio samples in a data buffer.
101b: Shift the data in the buffer by 40 positions, discarding the 40 oldest samples, and store 40 new samples in positions 0 to 39.
Multiply the buffered data x by the window coefficients to obtain the array z:
$$z[n] = x[n]\cdot W_{qmf}[n],\quad 0 \le n < 640$$
where W_qmf denotes the filter bank window coefficients.
Compute an 80-point array u with the following pseudocode:
Then compute the arrays r and i:
$$r[n] = u[n] - u[79-n],\qquad i[n] = u[n] + u[79-n],\qquad 0 \le n < 40$$
101c: Repeat step 101b until all data of the current frame has passed through the filter bank; the final output is the filter bank subband signal X[k,l].
101d: After this process, the filter bank subband signals X[k,l] of 40 subbands at 16 time samples are obtained, 0 ≤ k < 40, 0 ≤ l < 16.
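The buffer update and windowing of step 101b, together with the r/i folding, can be sketched in Python as below. This is a minimal sketch under stated assumptions: W_qmf is passed in because its coefficients are not given here, the order of samples inside each new 40-sample block is assumed to be kept as given, and the pseudocode that turns z into the 80-point array u is not reproduced in this text, so fold_u simply takes u as input. push_and_window and fold_u are hypothetical names.

```python
BUF_LEN = 640  # buffer length from step 101a
HOP = 40       # new samples per time slot (step 101b)


def push_and_window(buf, new_block, w_qmf):
    """Shift the buffer by HOP, store new samples at positions 0..39, and window it.

    Returns (shifted_buffer, z) with z[n] = x[n] * W_qmf[n], 0 <= n < 640.
    """
    if len(buf) != BUF_LEN or len(new_block) != HOP or len(w_qmf) != BUF_LEN:
        raise ValueError("unexpected buffer, block or window length")
    # Older samples move up by 40 positions; the 40 oldest (positions 600..639)
    # fall off the end. Sample order inside new_block is kept as given (assumption).
    shifted = list(new_block) + list(buf[:-HOP])
    z = [x * w for x, w in zip(shifted, w_qmf)]
    return shifted, z


def fold_u(u):
    """Given the 80-point array u, return r[n] = u[n] - u[79-n] and i[n] = u[n] + u[79-n]."""
    if len(u) != 80:
        raise ValueError("u must have 80 points")
    r = [u[n] - u[79 - n] for n in range(40)]
    i = [u[n] + u[79 - n] for n in range(40)]
    return r, i
```

Per step 101c, push_and_window would be called 16 times per 20 ms frame, once per block of 40 new samples.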
Apply a time-frequency transform to the filter bank subband signals and compute the spectral amplitudes.
The embodiments of the present invention can be implemented by applying the time-frequency transform to all or only some of the filter bank subbands. The transform may be a DFT, FFT, DCT or DST; this embodiment uses the DFT as an example. The computation is as follows:
Apply a 16-point DFT to the 16 time samples in each filter bank subband with index 0 to 9 to further improve spectral resolution, and compute the amplitude of each frequency bin to obtain the spectral amplitude X_DFT_AMP.
The amplitude of each frequency bin is computed as follows:
First, compute the energy of the array X_DFT[k][j] at each point:
$$X_{DFT\_POW}[k,j] = \left(\operatorname{real}(X_{DFT}[k,j])\right)^2 + \left(\operatorname{imag}(X_{DFT}[k,j])\right)^2,\quad 0 \le k < 10,\ 0 \le j < 16$$
where real(X_DFT[k,j]) and imag(X_DFT[k,j]) are the real and imaginary parts of the spectral coefficient X_DFT[k,j].
If k is even, the spectral amplitude at each frequency bin is computed with the following equation:
If k is odd, the spectral amplitude at each frequency bin is computed with the following equation:
X_DFT_AMP is the spectral amplitude after the time-frequency transform.
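The 16-point DFT and the per-bin power X_DFT_POW can be sketched as follows. Assumptions: X is a 40 × 16 nested list X[k][l] of real or complex subband samples, a direct DFT is used for readability (an FFT gives the same values), and the sketch stops at X_DFT_POW because the even-k and odd-k amplitude equations are not reproduced in this text. Function names are hypothetical.

```python
import cmath
import math

N_DFT = 16        # DFT length = time samples per subband
N_DFT_BANDS = 10  # only subbands 0..9 are transformed


def dft(x):
    """Direct N-point DFT: X[j] = sum_n x[n] * exp(-2*pi*i*j*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * j * n / n_pts) for n in range(n_pts))
        for j in range(n_pts)
    ]


def subband_dft_power(X):
    """Return X_DFT_POW[k][j] = real^2 + imag^2 of the DFT of subband k, 0 <= k < 10."""
    power = []
    for k in range(N_DFT_BANDS):
        spec = dft(X[k][:N_DFT])
        power.append([c.real ** 2 + c.imag ** 2 for c in spec])
    return power
```

A constant subband concentrates all its power in bin 0, which makes a quick sanity check of the transform.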
Step 102: compute the values of the frame energy parameter and the spectral centroid feature parameter of the current frame from the subband signals.
The values of the frame energy parameter, the spectral centroid feature parameter and the tonality feature parameter can be obtained with prior-art methods; preferably, each parameter is obtained as follows.
The frame energy parameter is a weighted or unweighted sum of the subband signal energies; specifically:
a) Compute the energy of each filter bank subband from the subband signal X[k,l] with the following equation:
b) Accumulate the energies of some perceptually sensitive filter bank subbands, or of all subbands, to obtain the frame energy parameter.
According to the psychoacoustic model, the human ear is relatively insensitive to very low frequencies (e.g., below 100 Hz) and high frequencies (e.g., above 20 kHz). The present invention therefore treats the filter bank subbands, ordered from low to high frequency, from the second subband to the second-to-last subband as the main perceptually sensitive subbands, and accumulates the energies of some or all of them to obtain frame energy parameter 1, as follows:
where e_sb_start is the start subband index, in the range [0, 6], and e_sb_end is the end subband index, greater than 6 and less than the total number of subbands.
Frame energy parameter 2 is obtained by adding to frame energy parameter 1 the weighted energies of some or all of the filter bank subbands not used in computing frame energy parameter 1:
where e_scale1 and e_scale2 are weighting scale factors, each in the range [0, 1], and num_band is the total number of subbands.
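A sketch of the frame energy computation. The equations for the subband energy in a) and for frame energy parameters 1 and 2 are not reproduced in this text, so the forms below are assumptions consistent with the description: E_sb[k] is taken as the sum of |X[k,l]|² over the time samples, e_sb_end is treated as inclusive, e_scale1 weights the subbands below e_sb_start and e_scale2 weights those above e_sb_end. Function names are hypothetical.

```python
def subband_energy(X):
    """Assumed E_sb[k] = sum over l of |X[k][l]|^2 (not the patent's stated equation)."""
    return [sum(abs(s) ** 2 for s in row) for row in X]


def frame_energies(e_sb, e_sb_start, e_sb_end, e_scale1, e_scale2):
    """Return (frame energy parameter 1, frame energy parameter 2).

    Assumptions: e_sb_end is inclusive; e_scale1 weights subbands below
    e_sb_start and e_scale2 weights subbands above e_sb_end.
    """
    num_band = len(e_sb)
    if not (0 <= e_sb_start <= 6 < e_sb_end < num_band):
        raise ValueError("need 0 <= e_sb_start <= 6 < e_sb_end < num_band")
    et1 = sum(e_sb[e_sb_start:e_sb_end + 1])
    et2 = (et1
           + e_scale1 * sum(e_sb[:e_sb_start])
           + e_scale2 * sum(e_sb[e_sb_end + 1:num_band]))
    return et1, et2
```

With e_sb_start = 1 and e_sb_end = num_band − 2, frame energy parameter 1 covers exactly the second through second-to-last subbands named in the text.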
The spectral centroid feature parameter is obtained either as the ratio of the weighted sum of filter bank subband energies to the unweighted sum of subband energies, or by smoothing other spectral centroid feature values.
The spectral centroid feature parameter can be computed in the following sub-steps:
a: Divide the subband intervals used for the spectral centroid computation as follows:
b: Using the interval division in a and the following formula, compute two spectral centroid values: the first-interval spectral centroid and the second-interval spectral centroid.
Delta1 and Delta2 are small offsets, each in the range (0, 1), and k is the spectral centroid index.
c: Smooth the first-interval spectral centroid sp_center[0] to obtain the smoothed spectral centroid, i.e., the smoothed value of the first-interval spectral centroid:
$$sp\_center[2] = sp\_center_{-1}[2]\cdot spc\_sm\_scale + sp\_center[0]\cdot(1 - spc\_sm\_scale)$$
where spc_sm_scale is the smoothing scale factor for the spectral centroid, and sp_center_{-1}[2] is the smoothed spectral centroid of the previous frame, with an initial value of 1.6.
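Sub-step c is fully specified above, but the interval table of sub-step a and the centroid formula of sub-step b are not reproduced in this text. The sketch therefore implements the smoothing exactly as given and uses an assumed centroid form: the index-weighted energy sum over an interval divided by the plain energy sum, with offsets Delta1 and Delta2 added to numerator and denominator. Function names and the (k + 1) weights are hypothetical.

```python
SP_CENTER2_INIT = 1.6  # initial smoothed centroid sp_center_{-1}[2], from the text


def spectral_centroid(e_sb, lo, hi, delta1, delta2):
    """Assumed form over subbands lo <= k < hi:
    (sum (k+1) * E[k] + delta1) / (sum E[k] + delta2)."""
    num = sum((k + 1) * e_sb[k] for k in range(lo, hi)) + delta1
    den = sum(e_sb[k] for k in range(lo, hi)) + delta2
    return num / den


def smooth_centroid(prev_smoothed, sp_center0, spc_sm_scale):
    """sp_center[2] = sp_center_{-1}[2] * spc_sm_scale + sp_center[0] * (1 - spc_sm_scale)."""
    return prev_smoothed * spc_sm_scale + sp_center0 * (1.0 - spc_sm_scale)


# First frame: start from the initial value 1.6 (spc_sm_scale = 0.7 is illustrative).
smoothed = smooth_centroid(SP_CENTER2_INIT, 2.0, 0.7)
```

The smoothed value is a first-order recursive average, so a larger spc_sm_scale makes sp_center[2] track frame-to-frame changes more slowly.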
步骤103:根据前一帧估计得到的背景噪声能量、当前帧的帧能量参数及信噪比子带能量计算得到当前帧的信噪比参数;Step 103: Calculate the SNR parameter of the current frame according to the background noise energy estimated in the previous frame, the frame energy parameter of the current frame, and the SNR subband energy;
前一帧的背景噪声能量可通过现有方法获得。The background noise energy of the previous frame can be obtained by existing methods.
If the current frame is the first frame, the SNR subband background noise energies take default initial values. The SNR subband background noise energies of the previous frame are estimated in the same way as those of the current frame, which is described in step 207 of Embodiment 2 below. The SNR parameters of the current frame may be computed with any existing SNR calculation method. Preferably, the following method is used:
First, the filter bank subbands are regrouped into several SNR subbands, with the division indexes given in the following table:
Next, the average subband SNR, SNR1, is computed from the energy of each SNR subband of the current frame and the background noise energy of each SNR subband of the previous frame. The calculation equation is as follows:
where E_sb2_bg is the estimated background noise energy of each SNR subband of the previous frame and num_band is the number of SNR subbands. These energies are obtained in the same way as the current frame's SNR subband background noise energies; see step 207 of Embodiment 2 below.
Finally, the full-band SNR, SNR2, is computed from the estimated full-band background noise energy of the previous frame and the frame energy parameter of the current frame:
where E_t_bg is the estimated full-band background noise energy of the previous frame, obtained in the same way as the current frame's full-band background noise energy (see step 207 of Embodiment 2 below).
In this embodiment, the SNR parameters are the average subband SNR SNR1 and the full-band SNR SNR2. The full-band background noise energy and the per-subband background noise energies are collectively referred to as the background noise energy.
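The two SNR parameters can be sketched as follows. The equations are missing from this text, so the sketch assumes that SNR1 averages the base-2 logarithms of the per-subband energy ratios and that SNR2 is the base-2 logarithm of E_t1 / E_t_bg. The log base and the `eps` guard are assumptions, not formulas confirmed by the source.

```python
import math


def snr_parameters(e_sb2, e_sb2_bg, e_t1, e_t_bg, eps=1e-12):
    """Return (SNR1, SNR2) for the current frame.

    e_sb2:    energy of each SNR subband in the current frame
    e_sb2_bg: previous frame's background noise energy per SNR subband
    e_t1:     frame energy parameter of the current frame
    e_t_bg:   previous frame's full-band background noise energy
    Assumption: both SNRs are base-2 log energy ratios; eps avoids log(0).
    """
    if not e_sb2 or len(e_sb2) != len(e_sb2_bg):
        raise ValueError("subband lists must be non-empty and of equal length")
    num_band = len(e_sb2)
    # SNR1: average of the per-subband log energy ratios
    snr1 = sum(math.log2((e + eps) / (bg + eps))
               for e, bg in zip(e_sb2, e_sb2_bg)) / num_band
    # SNR2: full-band log energy ratio
    snr2 = math.log2((e_t1 + eps) / (e_t_bg + eps))
    return snr1, snr2
```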
Step 104: Obtain the VAD decision from the tonality flag, the SNR parameters, the spectral centroid features, and the frame energy parameter.
Embodiment 2
In Embodiment 2 of the voice activity detection (VAD) method of the present invention, the input audio signal is divided into frames and polyphase-filtered to obtain filter bank subband signals. The subband signals are further transformed from the time domain to the frequency domain, and the spectral amplitudes are computed. Features are then extracted from the subband signals and from the spectral amplitudes to obtain the feature parameter values, from which the background noise flag and the tonality flag of the current frame are derived. The SNR parameters of the current frame are computed from the frame energy parameter of the current frame and the background noise energy, and whether the current frame is an active frame is decided from these SNR parameters, the VAD decisions of previous frames, and the feature parameters. The background noise flag is then corrected according to the VAD decision, and the corrected flag determines whether the background noise estimate is updated. The detailed VAD procedure is as follows:
As shown in Figure 2, Embodiment 2 of the method includes:
Step 201: Obtain the subband signals and the spectral amplitudes of the current frame.
Step 202: From the subband signals, compute the frame energy parameter, the spectral centroid features, and the time-domain stability feature of the current frame; from the spectral amplitudes, compute the spectral flatness features and the tonality features.
The frame energy parameter is the weighted or unweighted sum of the energies of the subband signals.
The spectral centroid feature is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the subband signals.
Specifically, the spectral centroid features are computed from the energies of the filter bank subbands, either as the ratio of the weighted sum of the subband energies to their unweighted sum, or by smoothing another spectral centroid feature value.
The spectral centroid features may be computed with the following sub-steps:
a: The subband intervals used for computing the spectral centroid features are divided as follows:
b: Using the interval division of sub-step a and the following formula, two spectral centroid values are computed: the first-interval spectral centroid and the second-interval spectral centroid.
Delta1 and Delta2 are small offsets, each in the range (0, 1), and k is the spectral centroid index.
c: Smooth the first-interval spectral centroid sp_center[0] to obtain the smoothed spectral centroid, that is, the smoothed value of the first-interval spectral centroid, as follows:
sp_center[2] = sp_center_{-1}[2]·spc_sm_scale + sp_center[0]·(1 - spc_sm_scale)
where spc_sm_scale is the smoothing factor for the spectral centroid and sp_center_{-1}[2] is the smoothed spectral centroid of the previous frame, whose initial value is 1.6.
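Sub-steps b and c can be sketched as below. The interval table and the centroid formula are not reproduced in this text, so the (n + 1) weighting, the placement of Delta1/Delta2, the default offsets, the example interval, and spc_sm_scale = 0.9 are all illustrative assumptions. Only the smoothing recursion and the initial value 1.6 come from the text.

```python
def spectral_centroid(e_sb, start, end, delta1=0.5, delta2=0.5):
    """Weighted-to-unweighted ratio of subband energies over start..end (inclusive).

    Assumptions: subband n is weighted by (n + 1); delta1 and delta2 are the
    small offsets in (0, 1) added to the numerator and denominator.
    """
    bands = range(start, end + 1)
    weighted = sum((n + 1) * e_sb[n] for n in bands) + delta1
    unweighted = sum(e_sb[n] for n in bands) + delta2
    return weighted / unweighted


def smooth_centroid(prev_smoothed, sp_center0, spc_sm_scale=0.9):
    """sp_center[2] = sp_center_{-1}[2]*spc_sm_scale + sp_center[0]*(1 - spc_sm_scale)."""
    return prev_smoothed * spc_sm_scale + sp_center0 * (1.0 - spc_sm_scale)


# Usage: the smoothed centroid starts at 1.6, as stated in the text.
sp_center2 = 1.6
sp_center0 = spectral_centroid([0.0, 0.0, 1.0, 1.0], 0, 3)
sp_center2 = smooth_centroid(sp_center2, sp_center0)
```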
The time-domain stability feature is the ratio of the variance of the amplitude sums to the expected value of the squared amplitude sums, optionally multiplied by a coefficient.
Specifically, the time-domain stability feature is computed from the frame energy parameters of the most recent frames; this embodiment uses the 40 most recent frames. The computation steps are:
First, compute the energy amplitudes of the 40 most recent frames:
where e_offset is an offset in the range [0, 0.1].
Next, add the energy amplitudes of adjacent frame pairs, from the current frame back to the 40th frame, to obtain 20 amplitude sums:
Amp_t2(n) = Amp_t1(-2n) + Amp_t1(-2n-1);  0 ≤ n < 20
where Amp_t1(0) is the energy amplitude of the current frame and Amp_t1(-m), for m > 0, is the energy amplitude of the m-th frame before the current frame.
Finally, the time-domain stability feature lt_stable_rate0 is obtained as the ratio of the variance of these 20 amplitude sums to their average energy. The calculation formula is as follows:
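A sketch of the time-domain stability computation follows. The amplitude formula is missing from this text, so Amp_t1 = sqrt(E) + e_offset is an assumption, as is the default e_offset = 0.05. The pairing of adjacent frames and the variance-to-mean-square ratio follow the definition given above.

```python
import math


def time_domain_stability(frame_energies, e_offset=0.05):
    """lt_stable_rate0 from the 40 most recent frame energy parameters.

    frame_energies[0] is the current frame and frame_energies[i] the frame
    i frames earlier. Assumption: energy amplitude = sqrt(E) + e_offset.
    """
    if len(frame_energies) < 40:
        raise ValueError("need the 40 most recent frame energies")
    amp_t1 = [math.sqrt(e) + e_offset for e in frame_energies[:40]]
    # Amp_t2(n) = Amp_t1(-2n) + Amp_t1(-2n-1), 0 <= n < 20
    amp_t2 = [amp_t1[2 * n] + amp_t1[2 * n + 1] for n in range(20)]
    mean = sum(amp_t2) / 20
    variance = sum((a - mean) ** 2 for a in amp_t2) / 20
    mean_square = sum(a * a for a in amp_t2) / 20
    return variance / mean_square
```

A perfectly steady energy track yields 0, and the value grows as the paired amplitudes fluctuate, which is why a small lt_stable_rate0 is later used as evidence of stationary background noise.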
The spectral flatness feature is the ratio of the geometric mean to the arithmetic mean of a set of spectral amplitudes, optionally multiplied by a coefficient.
Specifically, the spectral amplitudes X_DFT_AMP are divided into several frequency bands, and the spectral flatness of each band of the current frame is computed to obtain the spectral flatness features of the current frame.
In this embodiment, the spectral amplitudes are divided into 3 frequency bands, and the spectral flatness of each band is computed as follows:
First, divide X_DFT_AMP into 3 frequency bands according to the indexes in the table below.
Next, compute the spectral flatness of each band to obtain the spectral flatness features SMR(k) of the current frame, using the following equation:
Finally, smooth the spectral flatness features of the current frame to obtain its final spectral flatness features:
sSMR(k) = smr_scale·sSMR_{-1}(k) + (1 - smr_scale)·SMR(k);  0 ≤ k < 3
where smr_scale is a smoothing factor in the range [0.6, 1] and sSMR_{-1}(k) is the k-th smoothed spectral flatness feature of the previous frame.
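The band-wise flatness and its smoothing can be sketched as below. The band table is not reproduced, so the caller passes hypothetical (start, end) index ranges. The geometric mean is taken in the log domain to avoid underflow, and the `eps` guard and the default smr_scale = 0.9 (inside the stated [0.6, 1]) are assumptions.

```python
import math


def spectral_flatness(x_amp, bands, eps=1e-12):
    """SMR(k) = geometric mean / arithmetic mean of x_amp over each band.

    bands: list of (start, end) index pairs with end exclusive; the patent's
    own band table is not reproduced here, so these ranges are hypothetical.
    """
    smr = []
    for start, end in bands:
        seg = [v + eps for v in x_amp[start:end]]  # eps keeps log() defined
        geo = math.exp(sum(math.log(v) for v in seg) / len(seg))
        arith = sum(seg) / len(seg)
        smr.append(geo / arith)
    return smr


def smooth_flatness(prev_ssmr, smr, smr_scale=0.9):
    """sSMR(k) = smr_scale * sSMR_{-1}(k) + (1 - smr_scale) * SMR(k)."""
    return [smr_scale * p + (1.0 - smr_scale) * s
            for p, s in zip(prev_ssmr, smr)]
```

A flat band gives a value near 1, while a peaky (tonal) band pushes it toward 0.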
The tonality feature is obtained by computing the correlation between the intra-frame spectral difference coefficients of the previous and current frames, optionally followed by smoothing of that correlation.
Specifically, the tonality features are computed from the spectral amplitudes, using either all of them or a subset, in the following steps:
a. Take the difference between each of some (at least 8) or all of the spectral amplitudes and the adjacent amplitude, and set negative differences to 0, giving a set of non-negative spectral difference coefficients.
In this embodiment, the frequency bins with position indexes 3 to 61 are used as an example to compute the tonality features, as follows:
The differences between adjacent spectral amplitudes from bin 3 to bin 61 are:
spec_dif[n-3] = X_DFT_AMP(n+1) - X_DFT_AMP(n);  3 ≤ n < 62
Negative values in spec_dif are then set to zero.
b. Compute the correlation coefficient between the non-negative spectral difference coefficients of the current frame (from step a) and those of the previous frame to obtain the first tonality feature, tonality_rate1. The calculation formula is as follows:
where pre_spec_dif holds the non-negative spectral difference coefficients of the previous frame.
c. Smooth the first tonality feature to obtain the second tonality feature:
tonality_rate2 = tonal_scale·tonality_rate2_{-1} + (1 - tonal_scale)·tonality_rate1
where tonal_scale is the tonality smoothing factor, in the range [0.1, 1], and tonality_rate2_{-1} is the second tonality feature of the previous frame, whose initial value lies in the range [0, 1].
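Sub-steps a to c can be sketched in Python. The text calls the measure a correlation coefficient without giving its formula, so a normalized cross-correlation without mean removal is an assumption here, as are the `eps` guard and the default tonal_scale = 0.5 (inside the stated [0.1, 1]).

```python
import math


def spectral_difference(x_amp, lo=3, hi=62):
    """Sub-step a: spec_dif[n - lo] = max(X(n+1) - X(n), 0) for lo <= n < hi.

    x_amp must hold at least hi + 1 spectral amplitudes.
    """
    return [max(x_amp[n + 1] - x_amp[n], 0.0) for n in range(lo, hi)]


def tonality_rate1(spec_dif, pre_spec_dif, eps=1e-12):
    """Sub-step b: normalized correlation of the current and previous
    difference coefficients (assumed form; eps avoids division by zero)."""
    num = sum(a * b for a, b in zip(spec_dif, pre_spec_dif))
    energy = sum(a * a for a in spec_dif) * sum(b * b for b in pre_spec_dif)
    return num / (math.sqrt(energy) + eps)


def tonality_rate2(prev_rate2, rate1, tonal_scale=0.5):
    """Sub-step c: first-order recursive smoothing of tonality_rate1."""
    return tonal_scale * prev_rate2 + (1.0 - tonal_scale) * rate1
```

Stable spectral peaks produce rising edges at the same bins in consecutive frames, so the correlation approaches 1 for tonal signals and stays low for noise.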
Step 203: Compute the SNR parameters of the current frame from the background noise energy estimated in the previous frame, the frame energy parameter of the current frame, and the SNR subband energies.
Step 204: Compute the initial background noise flag and the tonality flag of the current frame from its frame energy parameter, spectral centroid features, time-domain stability feature, spectral flatness features, and tonality features.
Step 205: Obtain the VAD decision from the tonality flag, the SNR parameters, the spectral centroid features, and the frame energy parameter.
The implementation of step 205 is described below with reference to Figure 3.
The steps preceding the VAD decision in step 205 may be performed in a different order as long as none of them depends on the result of another; for example, step 204, which yields the initial background noise flag and the tonality flag, may be performed before the SNR computation in step 203.
Because the initial background noise flag of the current frame is corrected and then used to compute the SNR parameters of the next frame, it may also be obtained after the VAD decision.
Step 206: Correct the initial background noise flag according to the VAD decision, the tonality feature, the SNR parameters, the tonality flag, and the time-domain stability feature of the current frame.
If the SNR parameter SNR2 is less than a preset threshold SNR2_redec_thr1, SNR1 is less than SNR1_redec_thr1, the VAD flag vad_flag equals 0, the tonality feature tonality_rate2 is less than tonality_rate2_thr1, the tonality flag tonality_flag equals 0, and the time-domain stability feature lt_stable_rate0 is less than lt_stable_rate0_redec_thr1 (set to 0.1), the background noise flag is set to 1.
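The correction rule of step 206 is a single conjunction, sketched below. Only lt_stable_rate0_redec_thr1 = 0.1 is given in the text; the other thresholds are passed in by the caller through a dictionary, which is an illustrative convention rather than part of the method.

```python
def correct_background_flag(bg_flag, snr1, snr2, vad_flag, tonality_rate2,
                            tonality_flag, lt_stable_rate0, thr):
    """Step 206: set the background noise flag to 1 when all conditions hold.

    thr maps threshold names to values. Only lt_stable_rate0_redec_thr1 = 0.1
    is given in the text; the other values must be supplied by the caller.
    """
    if (snr2 < thr["SNR2_redec_thr1"]
            and snr1 < thr["SNR1_redec_thr1"]
            and vad_flag == 0
            and tonality_rate2 < thr["tonality_rate2_thr1"]
            and tonality_flag == 0
            and lt_stable_rate0 < thr.get("lt_stable_rate0_redec_thr1", 0.1)):
        return 1
    return bg_flag  # otherwise keep the initial flag
```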
Step 207: Obtain the background noise energy of the current frame from the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame. This background noise energy is used to compute the SNR parameters of the next frame.
The background noise flag determines whether the background noise estimate is updated: if the flag is 1, the background noise is updated, with the update depending on how the estimated full-band background noise energy compares with the energy of the current frame. Background noise energy estimation comprises subband background noise energy estimation and full-band background noise energy estimation.
a. Subband background noise energy estimation:
E_sb2_bg(k) = E_sb2_bg_pre(k)·α_bg_e + E_sb2(k)·(1 - α_bg_e);  0 ≤ k < num_sb
where num_sb is the number of frequency-domain subbands, E_sb2_bg_pre(k) is the background noise energy of the k-th SNR subband of the previous frame, and E_sb2(k) is the energy of the k-th SNR subband of the current frame.
α_bg_e is the background noise update factor. Its value is determined by the full-band background noise energy of the previous frame and the frame energy parameter of the current frame, as follows:
If the full-band background noise energy E_t_bg of the previous frame is less than the frame energy parameter E_t1 of the current frame, α_bg_e = 0.96; otherwise, α_bg_e = 0.95.
b. Full-band background noise energy estimation:
If the background noise flag of the current frame is 1, update the accumulated background noise energy E_t_sum and the accumulated background noise frame count N_Et_counter:
E_t_sum = E_t_sum_{-1} + E_t1;
N_Et_counter = N_Et_counter_{-1} + 1;
where E_t_sum_{-1} is the accumulated background noise energy of the previous frame and N_Et_counter_{-1} is the accumulated background noise frame count computed in the previous frame.
c. The full-band background noise energy is the ratio of the accumulated background noise energy E_t_sum to the accumulated frame count N_Et_counter:
E_t_bg = E_t_sum / N_Et_counter
If N_Et_counter equals 64, both E_t_sum and N_Et_counter are multiplied by 0.75.
d. The subband background noise energies and the accumulated background noise energy are adjusted according to the tonality flag, the frame energy parameter, and the full-band background noise energy, as follows:
If the tonality flag tonality_flag equals 1 and the frame energy parameter E_t1 is less than the full-band background noise energy E_t_bg multiplied by a gain factor gain,
then E_t_sum = E_t_sum·gain + delta and E_sb2_bg(k) = E_sb2_bg(k)·gain + delta;
where gain lies in the range [0.3, 1].
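Sub-steps a to d can be sketched as one update function. Several points are assumptions: the subband recursion uses the current frame's subband energy E_sb2(k); α_bg_e is chosen before E_t_bg is refreshed; sub-step d runs on every frame using the refreshed E_t_bg, since the text does not say whether it is gated by the background noise flag; and the defaults gain = 0.5 and delta = 0.0 are placeholders, because delta is not specified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackgroundState:
    e_sb2_bg: List[float]  # E_sb2_bg(k): background noise energy per SNR subband
    e_t_sum: float         # E_t_sum: accumulated background noise energy
    n_et_counter: float    # N_Et_counter: accumulated background frame count
    e_t_bg: float          # E_t_bg: full-band background noise energy


def update_background(st, bg_flag, e_sb2, e_t1, tonality_flag,
                      gain=0.5, delta=0.0):
    """Step 207 sketch: update st in place and return it."""
    if bg_flag == 1:
        # (a) choose the update factor from the previous E_t_bg, then smooth
        alpha = 0.96 if st.e_t_bg < e_t1 else 0.95
        st.e_sb2_bg = [pre * alpha + cur * (1.0 - alpha)
                       for pre, cur in zip(st.e_sb2_bg, e_sb2)]
        # (b) accumulate the frame energy and the frame count
        st.e_t_sum += e_t1
        st.n_et_counter += 1
        # (c) full-band estimate as a ratio; rescaling both keeps the ratio
        st.e_t_bg = st.e_t_sum / st.n_et_counter
        if st.n_et_counter == 64:
            st.e_t_sum *= 0.75
            st.n_et_counter *= 0.75
    # (d) a tonal frame far below the noise estimate scales the estimates down
    if tonality_flag == 1 and e_t1 < st.e_t_bg * gain:
        st.e_t_sum = st.e_t_sum * gain + delta
        st.e_sb2_bg = [e * gain + delta for e in st.e_sb2_bg]
    return st
```

Scaling both accumulators by 0.75 at 64 frames leaves E_t_bg unchanged while letting newer frames carry more weight, which is why the order of the ratio and the rescaling does not matter in sub-step c.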
In Embodiments 1 and 2, the VAD decision is obtained from the tonality flag, the SNR parameters, the spectral centroid features, and the frame energy parameter through the following steps, as shown in Figure 3:
Step 301: Compute the long-term SNR lt_snr from the ratio of the average long-term active-signal energy to the average long-term background noise energy, both computed in the previous frame.
The average long-term active-signal energy E_fg and the average long-term background noise energy E_bg are defined and computed as described after step 307. lt_snr is computed as follows:
In this formula, lt_snr is expressed in logarithmic form.
Step 302: Compute the average of the full-band SNR SNR2 over several recent frames to obtain the average full-band SNR SNR2_lt_ave.
The calculation equation is as follows:
where SNR2(n) is the full-band SNR of the n-th frame before the current frame and F_num, in the range [8, 64], is the total number of frames averaged.
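Steps 301 and 302 can be sketched as below. The log base for lt_snr is not stated, so base 10 is an assumption exposed as a parameter. The SNR2 history is kept in a fixed-length deque, and averaging over fewer than F_num frames at start-up is also an assumption.

```python
import math
from collections import deque


def long_term_snr(e_fg, e_bg, base=10.0):
    """Step 301: lt_snr = log(E_fg / E_bg); the default base 10 is an assumption."""
    return math.log(e_fg / e_bg, base)


class SNR2Average:
    """Step 302: mean of SNR2 over the F_num most recent frames."""

    def __init__(self, f_num=16):
        if not 8 <= f_num <= 64:
            raise ValueError("F_num must lie in [8, 64]")
        self.history = deque(maxlen=f_num)  # oldest values drop out automatically

    def push(self, snr2):
        """Add the current frame's SNR2 and return SNR2_lt_ave."""
        self.history.append(snr2)
        return sum(self.history) / len(self.history)
```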
Step 303: Obtain the SNR threshold snr_thr for the VAD decision from the spectral centroid features, the long-term SNR lt_snr, the number of preceding consecutive active frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num.
The steps are as follows:
First, set the initial value of snr_thr, in the range [0.1, 2] and preferably 1.06.
Second, make a first adjustment of snr_thr according to the spectral centroid features: if sp_center[2] is greater than a preset threshold spc_vad_dec_thr1, add an offset to snr_thr, preferably 0.05; otherwise, if sp_center[1] is greater than spc_vad_dec_thr2, add an offset, preferably 0.10; otherwise, add an offset, preferably 0.40. The thresholds spc_vad_dec_thr1 and spc_vad_dec_thr2 lie in the range [1.2, 2.5].
Third, make a second adjustment of snr_thr according to continuous_speech_num, continuous_noise_num, the average full-band SNR SNR2_lt_ave, and the long-term SNR lt_snr: if continuous_speech_num is greater than a preset threshold cpn_vad_dec_thr1, subtract 0.2 from snr_thr; otherwise, if continuous_noise_num is greater than a preset threshold cpn_vad_dec_thr2 and SNR2_lt_ave is greater than an offset plus lt_snr multiplied by the coefficient lt_tsnr_scale, add an offset, preferably 0.1; otherwise, if continuous_noise_num is greater than a preset threshold cpn_vad_dec_thr3, add an offset, preferably 0.2; otherwise, if continuous_noise_num is greater than a preset threshold cpn_vad_dec_thr4, add an offset, preferably 0.1. The thresholds cpn_vad_dec_thr1, cpn_vad_dec_thr2, cpn_vad_dec_thr3, and cpn_vad_dec_thr4 lie in the range [2, 500], and lt_tsnr_scale lies in the range [0, 2]. This step may be skipped, going directly to the final step, and the invention can still be implemented.
Finally, make the final adjustment of snr_thr according to lt_snr to obtain the SNR threshold of the current frame, using the following correction equation:
snr_thr = snr_thr + (lt_tsnr - thr_offset)·thr_scale;
where lt_tsnr denotes the long-term SNR lt_snr, thr_offset is an offset in the range [0.5, 3], and thr_scale is a gain factor in the range [0.1, 1].
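Step 303 can be sketched as a single function. The initial value and preferred offsets come from the text. The dictionary key `snr2_offset` is a hypothetical name for the unnamed offset in the second adjustment, `lt_snr` is used where the correction equation writes lt_tsnr, and `second_adjustment=False` reflects the note that the second adjustment may be skipped.

```python
def snr_threshold(sp_center, lt_snr, snr2_lt_ave, continuous_speech_num,
                  continuous_noise_num, p, second_adjustment=True):
    """Step 303 sketch: return snr_thr for the current frame.

    p maps threshold names to values; "snr2_offset" is a hypothetical name
    for the unnamed offset in the second adjustment.
    """
    snr_thr = 1.06  # preferred initial value, range [0.1, 2]
    # First adjustment: spectral centroid features
    if sp_center[2] > p["spc_vad_dec_thr1"]:
        snr_thr += 0.05
    elif sp_center[1] > p["spc_vad_dec_thr2"]:
        snr_thr += 0.10
    else:
        snr_thr += 0.40
    # Second (optional) adjustment: runs of speech/noise frames and SNR history
    if second_adjustment:
        if continuous_speech_num > p["cpn_vad_dec_thr1"]:
            snr_thr -= 0.2
        elif (continuous_noise_num > p["cpn_vad_dec_thr2"]
              and snr2_lt_ave > p["snr2_offset"] + lt_snr * p["lt_tsnr_scale"]):
            snr_thr += 0.1
        elif continuous_noise_num > p["cpn_vad_dec_thr3"]:
            snr_thr += 0.2
        elif continuous_noise_num > p["cpn_vad_dec_thr4"]:
            snr_thr += 0.1
    # Final adjustment driven by the long-term SNR
    return snr_thr + (lt_snr - p["thr_offset"]) * p["thr_scale"]
```

Writing each adjustment as an if/elif chain keeps the text's "otherwise" semantics explicit: within an adjustment, only the first matching branch changes the threshold.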
Step 304: Obtain the initial VAD decision from the decision threshold snr_thr and the SNR parameters SNR1 and SNR2 computed for the current frame.
The computation is as follows:
The VAD flag vad_flag indicates whether the current frame is an active frame; in this embodiment, 1 means active and 0 means inactive. If SNR1 is greater than snr_thr, the current frame is judged active and vad_flag is set to 1; otherwise, it is judged inactive and vad_flag is set to 0.
If SNR2 is greater than a preset threshold snr2_thr, the current frame is judged active and vad_flag is set to 1. snr2_thr lies in the range [1.2, 5.0].
Step 305: Correct the VAD decision according to the tonality flag, the average full-band SNR SNR2_lt_ave, the spectral centroid, and the long-term SNR lt_snr.
The steps are as follows:
If the tonality flag indicates that the current frame is a tonal signal (tonality_flag equals 1), the current frame is judged active and vad_flag is set to 1.
If SNR2_lt_ave is greater than a preset threshold SNR2_lt_ave_t_thr1 plus lt_snr multiplied by the coefficient lt_tsnr_tscale, the current frame is judged active and vad_flag is set to 1.
In this embodiment, SNR2_lt_ave_t_thr1 lies in the range [1, 4] and lt_tsnr_tscale lies in the range [0.1, 0.6].
If SNR2_lt_ave is greater than a preset threshold SNR2_lt_ave_t_thr2, the spectral centroid feature sp_center[2] is greater than a preset threshold sp_center_t_thr1, and lt_snr is less than a preset threshold lt_tsnr_t_thr1, the current frame is judged active and vad_flag is set to 1. SNR2_lt_ave_t_thr2 lies in the range [1.0, 2.5], sp_center_t_thr1 in [2.0, 4.0], and lt_tsnr_t_thr1 in [2.5, 5.0].
If SNR2_lt_ave is greater than a preset threshold SNR2_lt_ave_t_thr3, sp_center[2] is greater than a preset threshold sp_center_t_thr2, and lt_snr is less than a preset threshold lt_tsnr_t_thr2, the current frame is judged active and vad_flag is set to 1. SNR2_lt_ave_t_thr3 lies in the range [0.8, 2.0], sp_center_t_thr2 in [2.0, 4.0], and lt_tsnr_t_thr2 in [2.5, 5.0].
If SNR2_lt_ave is greater than a preset threshold SNR2_lt_ave_t_thr4, sp_center[2] is greater than a preset threshold sp_center_t_thr3, and lt_snr is less than a preset threshold lt_tsnr_t_thr3, the current frame is judged active and vad_flag is set to 1. SNR2_lt_ave_t_thr4 lies in the range [0.6, 2.0], sp_center_t_thr3 in [3.0, 6.0], and lt_tsnr_t_thr3 in [2.5, 5.0].
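Steps 304 and 305 can be sketched together, since every correction in step 305 can only switch a frame to active. The values in `EXAMPLE_THRESHOLDS` are illustrative picks inside the ranges stated above, not values taken from the text.

```python
# Illustrative thresholds chosen inside the stated ranges (hypothetical values).
EXAMPLE_THRESHOLDS = {
    "snr2_thr": 3.0,
    "SNR2_lt_ave_t_thr1": 2.0, "lt_tsnr_tscale": 0.3,
    "SNR2_lt_ave_t_thr2": 1.5, "sp_center_t_thr1": 3.0, "lt_tsnr_t_thr1": 4.0,
    "SNR2_lt_ave_t_thr3": 1.2, "sp_center_t_thr2": 3.0, "lt_tsnr_t_thr2": 4.0,
    "SNR2_lt_ave_t_thr4": 1.0, "sp_center_t_thr3": 4.0, "lt_tsnr_t_thr3": 4.0,
}


def vad_decision(snr1, snr2, snr_thr, tonality_flag, snr2_lt_ave,
                 sp_center2, lt_snr, p):
    """Steps 304 and 305: initial VAD decision followed by its corrections.

    Returns vad_flag (1 = active frame, 0 = inactive frame).
    """
    # Step 304: decide from SNR1; a high SNR2 can also force an active frame
    vad_flag = 1 if snr1 > snr_thr else 0
    if snr2 > p["snr2_thr"]:
        vad_flag = 1
    # Step 305: corrections that can only switch the frame to active
    if tonality_flag == 1:
        vad_flag = 1
    if snr2_lt_ave > p["SNR2_lt_ave_t_thr1"] + lt_snr * p["lt_tsnr_tscale"]:
        vad_flag = 1
    for ave_thr, spc_thr, lt_thr in (
            ("SNR2_lt_ave_t_thr2", "sp_center_t_thr1", "lt_tsnr_t_thr1"),
            ("SNR2_lt_ave_t_thr3", "sp_center_t_thr2", "lt_tsnr_t_thr2"),
            ("SNR2_lt_ave_t_thr4", "sp_center_t_thr3", "lt_tsnr_t_thr3")):
        if (snr2_lt_ave > p[ave_thr] and sp_center2 > p[spc_thr]
                and lt_snr < p[lt_thr]):
            vad_flag = 1
    return vad_flag
```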
步骤306:根据前面若干帧的判决结果、长时信噪比lt_snr、平均全带信噪比SNR2_lt_ave、当前帧的信噪比参数和当前帧的VAD判决结果,修正激活音保持帧数;Step 306: according to the judgment results of several previous frames, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameter of the current frame and the VAD judgment result of the current frame, modify the number of frames to keep the active tone;
The specific calculation steps are as follows:
The hangover count is corrected only if the VAD flag indicates that the current frame is an active-speech frame. If this condition is not met, the current hangover count num_speech_hangover is left unchanged and the method proceeds directly to step 307.
The hangover count is corrected as follows:
If the number of preceding consecutive speech frames, continuous_speech_num, is less than a preset threshold continuous_speech_num_thr1 and lt_tsnr is less than a preset threshold lt_tsnr_h_thr1, num_speech_hangover is set to the minimum number of consecutive active frames minus continuous_speech_num. Otherwise, if SNR2_lt_ave is greater than a preset threshold SNR2_lt_ave_thr1 and continuous_speech_num is greater than a preset threshold continuous_speech_num_thr2, num_speech_hangover is set according to the long-term SNR lt_tsnr, as described below. Otherwise, num_speech_hangover is left unchanged. In this embodiment the minimum number of consecutive active frames is 8; it may take any value in [6, 20].
Specifically:
If the long-term SNR lt_snr is greater than 2.6, num_speech_hangover is set to 3; otherwise, if lt_snr is greater than 1.6, num_speech_hangover is set to 4; otherwise, num_speech_hangover is set to 5.
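The correction rule above can be sketched in Python as follows. This is a minimal illustration, not the patent's reference code: the function name is hypothetical, the four thresholds whose values the text does not give are passed in as arguments, and lt_tsnr and lt_snr are assumed to denote the same long-term SNR.

```python
MIN_CONTINUOUS_SPEECH_FRAMES = 8  # embodiment value; the text allows [6, 20]


def correct_hangover(vad_flag, num_speech_hangover, continuous_speech_num,
                     lt_snr, snr2_lt_ave,
                     continuous_speech_num_thr1, lt_tsnr_h_thr1,
                     snr2_lt_ave_thr1, continuous_speech_num_thr2):
    """Return the corrected hangover count num_speech_hangover.

    Assumption: lt_tsnr and lt_snr in the text are the same long-term SNR.
    """
    # The correction applies only to frames flagged active; otherwise go to step 307.
    if vad_flag != 1:
        return num_speech_hangover
    if (continuous_speech_num < continuous_speech_num_thr1
            and lt_snr < lt_tsnr_h_thr1):
        # Extend short speech bursts so they total the minimum active length.
        return MIN_CONTINUOUS_SPEECH_FRAMES - continuous_speech_num
    if (snr2_lt_ave > snr2_lt_ave_thr1
            and continuous_speech_num > continuous_speech_num_thr2):
        # Higher long-term SNR -> shorter hangover.
        if lt_snr > 2.6:
            return 3
        if lt_snr > 1.6:
            return 4
        return 5
    return num_speech_hangover
```

The short-burst branch is checked first, so a brief speech segment in a low-SNR condition always receives enough hangover to reach the minimum active length before the SNR-dependent values apply.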
Step 307: Apply speech hangover based on the decision for the current frame and the hangover count num_speech_hangover, to obtain the VAD decision for the current frame.
The method is as follows:
If the current frame has been judged inactive (VAD flag = 0) and num_speech_hangover is greater than 0, hangover is applied: the VAD flag is set to 1 and num_speech_hangover is decremented by 1.
This yields the final VAD decision for the current frame.
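Step 307 amounts to a small piece of per-frame state. The sketch below assumes a hypothetical helper `apply_hangover` that returns the final VAD flag together with the updated hangover count.

```python
def apply_hangover(vad_flag, num_speech_hangover):
    """Step 307: return (final_vad_flag, updated_num_speech_hangover)."""
    if vad_flag == 0 and num_speech_hangover > 0:
        # Keep the frame marked active and consume one hangover frame.
        return 1, num_speech_hangover - 1
    # Active frames, or inactive frames with no hangover left, pass through.
    return vad_flag, num_speech_hangover
```

Called on three consecutive inactive frames with num_speech_hangover = 2, it returns flags 1, 1, 0: the first two frames are bridged by hangover and the third is reported inactive.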
Preferably, after step 304, the method further includes computing the average long-term active-speech energy E_fg from the initial VAD decision; after step 307, it further includes computing the average long-term background noise energy E_bg from the VAD decision. These values are used in the VAD decision for the next frame.
The average long-term active-speech energy E_fg is computed as follows:
a) If the initial VAD decision indicates that the current frame is active (VAD flag = 1) and E_t1 exceeds a multiple of E_bg (6 times in this embodiment), the accumulated active-speech energy fg_energy and its frame count fg_energy_count are updated: E_t1 is added to fg_energy, and fg_energy_count is incremented by 1.
b) So that E_fg tracks the most recent active-speech energy, when fg_energy_count reaches a preset value fg_max_frame_num, both fg_energy_count and fg_energy are multiplied by an attenuation coefficient attenu_coef1. In this embodiment, fg_max_frame_num is 512 and attenu_coef1 is 0.75.
c) E_fg is obtained by dividing fg_energy by fg_energy_count:
$$E_{fg} = \frac{\mathrm{fg\_energy}}{\mathrm{fg\_energy\_count}}$$
The average long-term background noise energy E_bg is computed as follows:
Let bg_energy_count be the number of frames accumulated in the background noise energy sum, i.e., how many frames contribute to the current accumulated value, and let bg_energy be the accumulated recent background noise energy.
a) If the current frame is judged inactive (VAD flag = 0) and SNR2 is less than 1.0, bg_energy and bg_energy_count are updated: E_t1 is added to bg_energy, and bg_energy_count is incremented by 1.
b) When bg_energy_count reaches the maximum frame count for the average long-term background noise energy, both bg_energy_count and bg_energy are multiplied by an attenuation coefficient attenu_coef2. In this embodiment the maximum frame count is 512 and attenu_coef2 is 0.75.
c) E_bg is obtained by dividing bg_energy by bg_energy_count:
$$E_{bg} = \frac{\mathrm{bg\_energy}}{\mathrm{bg\_energy\_count}}$$
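Both long-term averages follow the same accumulate, attenuate, divide pattern (steps a to c above) and differ only in their gate conditions. The Python sketch below shares one hypothetical class between them; the 6x ratio, the SNR2 < 1.0 gate, 512 frames and 0.75 come from this embodiment, while returning None before any frame has been accumulated is an assumption.

```python
class LongTermEnergyAverage:
    """Leaky running mean used for both E_fg and E_bg (steps a to c)."""

    def __init__(self, max_frames=512, attenu_coef=0.75):
        self.energy_sum = 0.0   # plays the role of fg_energy or bg_energy
        self.frame_count = 0.0  # plays the role of fg_energy_count or bg_energy_count
        self.max_frames = max_frames
        self.attenu_coef = attenu_coef

    def update(self, accumulate, e_t1):
        # a) add the current frame energy only when the gate condition holds
        if accumulate:
            self.energy_sum += e_t1
            self.frame_count += 1
        # b) once the count reaches the maximum, scale sum and count together
        if self.frame_count >= self.max_frames:
            self.energy_sum *= self.attenu_coef
            self.frame_count *= self.attenu_coef

    def average(self):
        # c) sum / count; None until a frame has been accumulated (assumption)
        if self.frame_count > 0:
            return self.energy_sum / self.frame_count
        return None


def update_energies(fg, bg, initial_vad_flag, final_vad_flag, e_t1, e_bg, snr2):
    """Apply the gates from the text; e_bg is the previous frame's E_bg."""
    fg.update(initial_vad_flag == 1 and e_t1 > 6.0 * e_bg, e_t1)
    bg.update(final_vad_flag == 0 and snr2 < 1.0, e_t1)
    return fg.average(), bg.average()
```

Scaling the sum and the count by the same factor leaves the current average unchanged; it only lowers the weight of past frames relative to future ones, so the result behaves like a leaky running mean rather than an all-time average.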
To implement VAD method embodiments 1 and 2 above, the present invention also provides embodiment 1 of a voice activity detection (VAD) apparatus. As shown in Figure 4, the apparatus includes:
a filter bank, configured to obtain the subband signals of the current frame;
a spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
a feature parameter acquisition unit, configured to compute the frame energy parameter and the spectral centroid feature parameter of the current frame from the subband signals;
an SNR calculation unit, configured to compute the SNR parameters of the current frame from the background noise energy estimated in the previous frame, the frame energy parameter of the current frame, and the SNR subband energies;
a VAD decision unit, configured to compute the VAD decision from the tonality flag, the SNR parameters, the spectral centroid feature parameter, and the frame energy parameter.
Corresponding to method embodiment 2, the feature parameter acquisition unit is further configured to compute the time-domain stability feature parameter from the subband signals, and the spectral flatness and tonality feature parameters from the spectral amplitudes.
Each feature parameter may be obtained with existing methods or as follows:
the frame energy parameter is a weighted or unweighted sum of the subband signal energies;
the spectral centroid feature parameter is the ratio of a weighted sum to an unweighted sum of the energies of all or some of the subband signals, or a smoothed version of this ratio;
the time-domain stability feature parameter is the ratio of the variance of the amplitude sums to the expectation of the squared amplitude sums, optionally multiplied by a coefficient;
the spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, optionally multiplied by a coefficient;
the tonality feature parameter is the correlation between the intra-frame spectral difference coefficients of the previous and current frames, optionally smoothed.
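The feature definitions above can be illustrated with plain-Python helpers. This is a sketch under stated assumptions, not the patent's exact formulas: the centroid weights (1-based subband indices), the caller-chosen history of amplitude sums fed to the stability measure, the plain first difference used as the intra-frame spectral difference, the normalized correlation, and the small `eps` guards are all illustrative choices; smoothing and scaling coefficients are omitted.

```python
import math


def frame_energy(subband_energies, weights=None):
    """Frame energy: plain or weighted sum of subband energies."""
    if weights is None:
        return sum(subband_energies)
    return sum(w * e for w, e in zip(weights, subband_energies))


def spectral_centroid(subband_energies, eps=1e-12):
    """Weighted / unweighted energy sum; weights are 1-based indices (assumption)."""
    weighted = sum((k + 1) * e for k, e in enumerate(subband_energies))
    total = sum(subband_energies)
    return (weighted + eps) / (total + eps)


def time_domain_stability(amp_sums, eps=1e-12):
    """Variance of recent amplitude sums divided by the mean of their squares."""
    n = len(amp_sums)
    mean = sum(amp_sums) / n
    variance = sum((a - mean) ** 2 for a in amp_sums) / n
    mean_square = sum(a * a for a in amp_sums) / n
    return variance / (mean_square + eps)


def spectral_flatness(amplitudes, eps=1e-12):
    """Geometric mean / arithmetic mean of spectral amplitudes (1.0 = flat)."""
    n = len(amplitudes)
    geometric = math.exp(sum(math.log(a + eps) for a in amplitudes) / n)
    arithmetic = sum(amplitudes) / n
    return geometric / (arithmetic + eps)


def tonality_correlation(prev_amplitudes, cur_amplitudes):
    """Normalized correlation of the intra-frame spectral differences of two frames."""
    d_prev = [b - a for a, b in zip(prev_amplitudes, prev_amplitudes[1:])]
    d_cur = [b - a for a, b in zip(cur_amplitudes, cur_amplitudes[1:])]
    num = sum(p * c for p, c in zip(d_prev, d_cur))
    den = math.sqrt(sum(p * p for p in d_prev) * sum(c * c for c in d_cur))
    return num / den if den > 0 else 0.0
```

A stationary tone keeps the same spectral peaks from frame to frame, so its difference patterns correlate strongly (close to 1), whereas noise spectra change randomly and give low correlation.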
As shown in Figure 5, embodiment 2 of the VAD apparatus differs from embodiment 1 in that the apparatus further includes a flag calculation unit and a background noise energy processing unit, where:
the flag calculation unit is configured to compute the tonality flag of the current frame from its frame energy parameter and its spectral centroid, time-domain stability, spectral flatness, and tonality feature parameters;
the background noise energy processing unit includes:
a noise flag calculation module, configured to compute the initial background noise flag of the current frame from its frame energy parameter and its spectral centroid, time-domain stability, spectral flatness, and tonality feature parameters;
a noise flag correction module, configured to correct the initial background noise flag according to the VAD decision for the current frame, the tonality feature parameter, the SNR parameters, the tonality flag, and the time-domain stability feature parameter;
a background noise energy acquisition module, configured to obtain the background noise energy of the current frame from the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame; the background noise energy of the current frame is used to compute the SNR parameters of the next frame.
Corresponding to method embodiments 1 and 2, and as shown in Figure 6, the VAD decision unit includes:
a long-term SNR calculation module, configured to compute the long-term SNR lt_snr as the ratio of the average long-term active-speech energy to the average long-term background noise energy, both computed in the previous frame;
an average full-band SNR calculation module, configured to average the full-band SNR SNR2 over a number of recent frames to obtain the average full-band SNR SNR2_lt_ave;
an SNR threshold calculation module, configured to obtain the SNR threshold snr_thr for the VAD decision from the spectral centroid feature parameter, the long-term SNR lt_snr, the number of preceding consecutive active frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num;
an initial VAD decision module, configured to compute the initial VAD decision from the decision threshold snr_thr and the SNR parameters SNR1 and SNR2 of the current frame;
a VAD result correction module, configured to correct the VAD decision according to the tonality flag, the average full-band SNR SNR2_lt_ave, the spectral centroid, and the long-term SNR lt_snr;
a hangover correction module, configured to correct the hangover count according to the decisions for several preceding frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR of the current frame, and the VAD decision for the current frame;
a VAD decision module, configured to apply speech hangover based on the decision for the current frame and the hangover count num_speech_hangover, to obtain the VAD decision for the current frame.
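The first two modules of the VAD decision unit reduce to a ratio and a moving average. In this sketch the window length of 8 frames stands in for "a number of recent frames" and is hypothetical, and lt_snr is kept as the plain ratio E_fg / E_bg, exactly as the module description states.

```python
from collections import deque


def long_term_snr(e_fg, e_bg, eps=1e-12):
    """lt_snr as the plain ratio of the previous frame's E_fg and E_bg."""
    return e_fg / (e_bg + eps)


class AverageFullBandSnr:
    """SNR2_lt_ave: mean of SNR2 over the most recent `window` frames."""

    def __init__(self, window=8):
        # deque with maxlen drops the oldest SNR2 value automatically
        self.history = deque(maxlen=window)

    def update(self, snr2):
        self.history.append(snr2)
        return sum(self.history) / len(self.history)
```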
More preferably, the VAD decision unit further includes an energy calculation module, configured to compute the average long-term active-speech energy E_fg from the initial VAD decision and to update the average long-term background noise energy E_bg from the VAD decision; the updated values are used in the VAD decision for the next frame.
The present invention also provides an embodiment of a background noise detection method. As shown in Figure 7, the method includes:
Step 701: obtain the subband signals and spectral amplitudes of the current frame;
Step 702: compute the frame energy parameter, spectral centroid feature parameter, and time-domain stability feature parameter from the subband signals, and the spectral flatness and tonality feature parameters from the spectral amplitudes;
Preferably, the frame energy parameter is a weighted or unweighted sum of the subband signal energies.
The spectral centroid feature parameter is the ratio of a weighted sum to an unweighted sum of the energies of all or some of the subband signals, or a smoothed version of this ratio.
The time-domain stability parameter is the ratio of the variance of the frame energy amplitudes to the expectation of the squared amplitude sums, optionally multiplied by a coefficient.
The spectral flatness parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, optionally multiplied by a coefficient.
Specifically, steps 701 and 702 may use the same methods as described above, which are not repeated here.
Step 703: perform background noise detection based on the spectral centroid, time-domain stability, spectral flatness, and tonality feature parameters and the current frame energy parameter, to determine whether the current frame is background noise.
Preferably, the current frame is determined not to be a noise signal if any of the following conditions holds:
the time-domain stability parameter lt_stable_rate0 is greater than a preset threshold;
the smoothed spectral centroid of the first interval is greater than a preset threshold, and the time-domain stability feature parameter is also greater than a preset threshold;
the tonality feature parameter, or its smoothed value, is greater than a preset threshold, and the time-domain stability feature parameter lt_stable_rate0 is greater than its preset threshold;
the spectral flatness feature parameters of the subbands, or their smoothed values, are all less than their respective preset thresholds;
or the frame energy parameter E_t1 is greater than a preset threshold E_thr1.
Specifically, the current frame is initially assumed to be background noise.
In this embodiment, a background noise flag background_flag indicates whether the current frame is background noise: background_flag is set to 1 if the current frame is judged to be background noise, and to 0 otherwise.
Whether the current frame is a noise signal is checked using the time-domain stability, spectral centroid, spectral flatness, and tonality feature parameters and the current frame energy parameter. If it is not a noise signal, background_flag is set to 0.
The specific procedure is as follows:
Check whether the time-domain stability parameter lt_stable_rate0 is greater than a preset threshold lt_stable_rate_thr1. If so, the current frame is judged not to be a noise signal and background_flag is set to 0. In this embodiment, lt_stable_rate_thr1 lies in [0.8, 1.6].
Check whether the smoothed spectral centroid is greater than a preset threshold sp_center_thr1 and the time-domain stability feature parameter is also greater than a preset threshold lt_stable_rate_thr2. If so, the current frame is judged not to be a noise signal and background_flag is set to 0. sp_center_thr1 lies in [1.6, 4]; lt_stable_rate_thr2 lies in (0, 0.1].
Check whether the tonality feature parameter tonality_rate2 is greater than a preset threshold tonality_rate_thr1 and the time-domain stability feature parameter lt_stable_rate0 is greater than a preset threshold lt_stable_rate_thr3. If both conditions hold, the current frame is judged not to be background noise and background_flag is set to 0. tonality_rate_thr1 lies in [0.4, 0.66]; lt_stable_rate_thr3 lies in [0.06, 0.3].
Check whether the spectral flatness feature parameters sSMR[0], sSMR[1], and sSMR[2] are less than preset thresholds sSMR_thr1, sSMR_thr2, and sSMR_thr3, respectively. If all three conditions hold, the current frame is judged not to be background noise and background_flag is set to 0; sSMR_thr1, sSMR_thr2, and sSMR_thr3 lie in [0.88, 0.98]. Likewise, check whether sSMR[0], sSMR[1], and sSMR[2] are less than preset thresholds sSMR_thr4, sSMR_thr5, and sSMR_thr6, respectively. If any one of these conditions holds, the current frame is judged not to be background noise and background_flag is set to 0; sSMR_thr4, sSMR_thr5, and sSMR_thr6 lie in [0.80, 0.92].
Check whether the frame energy parameter E_t1 is greater than a preset threshold E_thr1. If so, the current frame is judged not to be background noise and background_flag is set to 0. E_thr1 is chosen according to the dynamic range of the frame energy parameter.
If none of the above checks classifies the current frame as non-noise, background_flag remains 1 and the current frame is judged to be background noise.
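The step 703 checks can be written as a single function that starts from the background-noise assumption and clears it on the first check that fires. The threshold values below are illustrative picks inside the stated ranges, E_thr1 is a pure placeholder because it depends on the scale of E_t1, and the third test of each flatness group is assumed to use sSMR[2]; the function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseThresholds:
    # Example values chosen inside the ranges stated in the text (hypothetical).
    lt_stable_rate_thr1: float = 1.0          # range [0.8, 1.6]
    sp_center_thr1: float = 2.5               # range [1.6, 4]
    lt_stable_rate_thr2: float = 0.05         # range (0, 0.1]
    tonality_rate_thr1: float = 0.5           # range [0.4, 0.66]
    lt_stable_rate_thr3: float = 0.1          # range [0.06, 0.3]
    sSMR_thr_all: tuple = (0.9, 0.9, 0.9)     # sSMR_thr1..3, range [0.88, 0.98]
    sSMR_thr_any: tuple = (0.85, 0.85, 0.85)  # sSMR_thr4..6, range [0.80, 0.92]
    E_thr1: float = 1e6                       # placeholder; depends on E_t1 scale


def background_flag(lt_stable_rate0, sp_center_smooth, tonality_rate2,
                    sSMR, E_t1, th=None):
    """Return 1 if the frame is judged background noise, otherwise 0."""
    if th is None:
        th = NoiseThresholds()
    # Start from "noise"; any single check below is enough to clear the flag.
    if lt_stable_rate0 > th.lt_stable_rate_thr1:
        return 0  # frame energy fluctuates too much over time
    if sp_center_smooth > th.sp_center_thr1 and lt_stable_rate0 > th.lt_stable_rate_thr2:
        return 0  # high spectral centroid combined with some instability
    if tonality_rate2 > th.tonality_rate_thr1 and lt_stable_rate0 > th.lt_stable_rate_thr3:
        return 0  # tonal and not fully stable
    if all(s < t for s, t in zip(sSMR, th.sSMR_thr_all)):
        return 0  # all three subbands moderately non-flat
    if any(s < t for s, t in zip(sSMR, th.sSMR_thr_any)):
        return 0  # at least one subband clearly non-flat
    if E_t1 > th.E_thr1:
        return 0  # too loud to be background noise
    return 1
```

Returning early is equivalent to the text's sequence of independent checks, because every check can only set background_flag to 0 and none sets it back to 1.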
Corresponding to the above method, the present invention also provides a background noise detection apparatus. As shown in Figure 8, the apparatus includes:
a filter bank, configured to obtain the subband signals of the current frame;
a spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
a feature parameter calculation unit, configured to compute the frame energy parameter, spectral centroid feature parameter, and time-domain stability feature parameter from the subband signals, and the spectral flatness and tonality feature parameters from the spectral amplitudes;
Preferably, the frame energy parameter is a weighted or unweighted sum of the subband signal energies.
The spectral centroid feature parameter is the ratio of a weighted sum to an unweighted sum of the energies of all or some of the subband signals, or a smoothed version of this ratio.
The time-domain stability parameter is the ratio of the variance of the frame energy amplitudes to the expectation of the squared amplitude sums, optionally multiplied by a coefficient.
The spectral flatness parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, optionally multiplied by a coefficient.
a background noise judgment unit, configured to perform background noise detection based on the spectral centroid, time-domain stability, spectral flatness, and tonality feature parameters and the current frame energy parameter, to determine whether the current frame is background noise.
Preferably, the background noise judgment unit determines that the current frame is not a noise signal if any of the following conditions holds:
the time-domain stability parameter lt_stable_rate0 is greater than a preset threshold;
the smoothed spectral centroid of the first interval is greater than a preset threshold, and the time-domain stability feature parameter is also greater than a preset threshold;
the tonality feature parameter, or its smoothed value, is greater than a preset threshold, and the time-domain stability feature parameter lt_stable_rate0 is greater than its preset threshold;
the spectral flatness feature parameters of the subbands, or their smoothed values, are all less than their respective preset thresholds;
or the frame energy parameter E_t1 is greater than a preset threshold E_thr1.
The present invention also provides a tonal signal detection method. As shown in Figure 9, the method includes:
Step 901: obtain the subband signals and spectral amplitudes of the current frame;
Step 902: compute the spectral centroid and time-domain stability feature parameters of the current frame from the subband signals, and the spectral flatness and tonality feature parameters from the spectral amplitudes;
Preferably, the spectral centroid feature parameter is the ratio of a weighted sum to an unweighted sum of the energies of all or some of the subband signals, or a smoothed version of this ratio; the time-domain stability feature parameter is the ratio of the variance of the amplitude sums to the expectation of the squared amplitude sums, optionally multiplied by a coefficient;
the spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, optionally multiplied by a coefficient;
the tonality feature parameter is the correlation between the intra-frame spectral difference coefficients of the previous and current frames, optionally smoothed.
Step 903: determine whether the current frame is a tonal signal based on the tonality, time-domain stability, spectral flatness, and spectral centroid feature parameters.
In step 903, the following operations are performed to determine whether the current frame is a tonal signal:
A) Initially assume that the current frame is a non-tonal signal, and use a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame.
In this embodiment, tonality_frame = 1 indicates a tonal frame and tonality_frame = 0 a non-tonal frame;
B) Check whether the tonality feature parameter tonality_rate1, or its smoothed value tonality_rate2, is greater than the corresponding preset threshold tonality_decision_thr1 or tonality_decision_thr2, respectively. If either condition holds, go to step C); otherwise, go to step D).
Here, tonality_decision_thr1 lies in [0.5, 0.7] and tonality_decision_thr2 lies in [0.7, 0.99].
C) If the time-domain stability feature parameter lt_stable_rate0 is less than a preset threshold lt_stable_decision_thr1, the spectral centroid feature parameter sp_center[1] is greater than a preset threshold spc_decision_thr1, and the spectral flatness feature parameters of the subbands are less than their respective preset thresholds (specifically, sSMR[0] is less than a preset threshold sSMF_decision_thr1, or sSMR[1] is less than sSMF_decision_thr2, or sSMR[2] is less than sSMF_decision_thr3), the current frame is judged to be a tonal frame and tonality_frame is set to 1; otherwise, it is judged to be a non-tonal frame and tonality_frame is set to 0. Then continue with step D).
Here, lt_stable_decision_thr1 lies in [0.01, 0.25], spc_decision_thr1 in [1.0, 1.8], sSMF_decision_thr1 in [0.6, 0.9], sSMF_decision_thr2 in [0.6, 0.9], and sSMF_decision_thr3 in [0.7, 0.98].
D) Update the tonality degree feature parameter tonality_degree according to tonality_frame. The initial value of tonality_degree is set when the VAD apparatus starts operating and lies in [0, 1]. tonality_degree is computed differently depending on the case:
If tonality_frame indicates that the current frame is a tonal frame, tonality_degree is updated as follows:
$$\mathrm{tonality\_degree} = \mathrm{tonality\_degree}_{-1} \cdot \mathrm{td\_scale\_A} + \mathrm{td\_scale\_B}$$
where tonality_degree_{-1} is the tonality degree of the previous frame, whose initial value lies in [0, 1]; td_scale_A is the attenuation coefficient, with a value in [0, 1]; and td_scale_B is the accumulation coefficient, with a value in [0, 1].
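The recursive update of step D can be sketched as below. The coefficient values `td_scale_A = 0.9` and `td_scale_B = 0.1` are assumptions chosen from the stated ranges, and the function name is hypothetical. The excerpt only gives the formula for tonal frames, so the non-tonal branch simply carries the previous value over as a labelled placeholder.

```python
def update_tonality_degree(tonality_degree_prev: float,
                           tonality_frame: int,
                           td_scale_A: float = 0.9,
                           td_scale_B: float = 0.1) -> float:
    """Step D: recursive update of tonality_degree (coefficient values assumed)."""
    if tonality_frame == 1:
        # tonality_degree = tonality_degree_{-1} * td_scale_A + td_scale_B
        return tonality_degree_prev * td_scale_A + td_scale_B
    # Placeholder: the non-tonal branch is not given in this excerpt,
    # so the previous value is carried over unchanged.
    return tonality_degree_prev


degree = 0.0                       # initial value, chosen from [0, 1]
for flag in (1, 1, 1):
    degree = update_tonality_degree(degree, flag)
print(round(degree, 3))            # 0.271
```

With non-negative coefficients satisfying td_scale_A + td_scale_B ≤ 1, a value in [0, 1] stays in [0, 1], because tonality_degree_{-1} · td_scale_A + td_scale_B ≤ td_scale_A + td_scale_B ≤ 1. Consecutive tonal frames therefore push tonality_degree toward td_scale_B / (1 − td_scale_A) without overshooting.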
E) Determine from the updated tonality_degree whether the current frame is a tonal signal, and set the tonality flag tonality_flag accordingly.
Specifically, if tonality_degree is greater than a set threshold, the current frame is judged to be a tonal signal; otherwise, it is judged to be a non-tonal signal.
Corresponding to the tonal signal detection method above, the present invention also provides a tonal signal detection device. As shown in Figure 10, the device comprises:
a filter bank, configured to obtain the sub-band signals of the current frame;
a spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
a feature calculation unit, configured to calculate the current spectral centroid and time-domain stability features from the sub-band signals, and the spectral flatness and tonality features from the spectral amplitudes;
as described above, the spectral centroid feature is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the sub-band signals, or a smoothed version of that ratio;
the time-domain stability feature is the ratio of the variance of the summed amplitudes to the expectation of the squared summed amplitudes, or that ratio multiplied by a coefficient;
the spectral flatness feature is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, or that ratio multiplied by a coefficient;
the tonality feature is obtained by computing the correlation between the intra-frame spectral difference coefficients of the previous and current frames, optionally followed by smoothing of that correlation; and
a tonal signal decision unit, configured to determine whether the current frame is a tonal signal based on the tonality, time-domain stability, spectral flatness, and spectral centroid features.
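The four feature definitions can be sketched with the Python standard library as follows. All function names are hypothetical, and several details are assumptions the text does not fix: centroid weights of (index + 1), a caller-supplied window of per-frame amplitude sums for the stability ratio, adjacent-bin differences as the "intra-frame spectral difference coefficients", a normalised correlation for the tonality feature (tonality_rate1), and first-order smoothing with `alpha = 0.9` for smoothed variants such as tonality_rate2.

```python
import math

EPS = 1e-12  # guard against division by zero and log(0)


def spectral_centroid(subband_energy, start=0, end=None):
    """Weighted over unweighted sum of sub-band energies.

    Weights of (index + 1) within the chosen band range are an assumption.
    """
    bands = list(subband_energy[start:end])
    weighted = sum((i + 1) * e for i, e in enumerate(bands))
    return (weighted + EPS) / (sum(bands) + EPS)


def time_domain_stability(amp_sums):
    """Variance of recent per-frame amplitude sums divided by E[amp_sum ** 2]."""
    n = len(amp_sums)
    mean = sum(amp_sums) / n
    var = sum((x - mean) ** 2 for x in amp_sums) / n
    mean_sq = sum(x * x for x in amp_sums) / n
    return var / (mean_sq + EPS)


def spectral_flatness(amplitudes):
    """Geometric mean over arithmetic mean of spectral amplitudes."""
    n = len(amplitudes)
    geo = math.exp(sum(math.log(a + EPS) for a in amplitudes) / n)
    arith = sum(amplitudes) / n
    return geo / (arith + EPS)


def tonality_correlation(prev_amp, cur_amp):
    """Normalised correlation of intra-frame spectral differences of two frames.

    Taking adjacent-bin differences as the 'spectral difference coefficients'
    is an assumption.
    """
    d_prev = [b - a for a, b in zip(prev_amp, prev_amp[1:])]
    d_cur = [b - a for a, b in zip(cur_amp, cur_amp[1:])]
    num = sum(p * c for p, c in zip(d_prev, d_cur))
    den = math.sqrt(sum(p * p for p in d_prev) * sum(c * c for c in d_cur))
    return num / (den + EPS)


def smooth(prev_value, value, alpha=0.9):
    """First-order recursive smoothing for the optional smoothed variants."""
    return alpha * prev_value + (1.0 - alpha) * value
```

A flat spectrum gives a flatness close to 1 and a peaky one close to 0, which is why low sSMR values point toward tonal content; a stationary tone repeats its spectral shape from frame to frame, so its difference correlation approaches 1.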
As shown in Figure 11, the tonal signal decision unit comprises:
a tonal signal initialization module, configured to set the current frame as a non-tonal signal and to use a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame;
a tonality feature decision module, configured to determine whether the tonality feature tonality_rate1, or its smoothed version tonality_rate2, is greater than the corresponding set threshold;
a tonal signal decision module, configured to: when the tonality feature decision module returns yes, judge the current frame to be a tonal frame if the time-domain stability feature is less than a set threshold, the spectral centroid feature is greater than a set threshold, and the spectral flatness feature of each sub-band is less than its corresponding preset threshold, and then determine whether the current frame is a tonal signal from the computed tonality_degree; and, when the tonality feature decision module returns no, determine whether the current frame is a tonal signal from the updated tonality_degree and set the value of the tonality flag tonality_flag; and
a tonality degree update module, configured to update tonality_degree according to the tonal frame flag when tonality_rate1 and its smoothed version tonality_rate2 are both less than their corresponding set thresholds, where the initial value of tonality_degree is set when the voice activity detection device starts operating.
Specifically, if the tonal frame flag indicates that the current frame is a tonal frame, the tonality degree update module updates tonality_degree as follows:
$$\mathrm{tonality\_degree} = \mathrm{tonality\_degree}_{-1} \cdot \mathrm{td\_scale\_A} + \mathrm{td\_scale\_B}$$
where tonality_degree_{-1} is the tonality degree of the previous frame, whose initial value lies in [0, 1]; td_scale_A is the attenuation coefficient, with a value in [0, 1]; and td_scale_B is the accumulation coefficient, with a value in [0, 1].
If tonality_degree is greater than a set threshold, the tonal signal decision module judges the current frame to be a tonal signal; otherwise, it judges the current frame to be a non-tonal signal.
Specifically, if tonality_degree is greater than a threshold of 0.5, the current frame is judged to be a tonal signal and tonality_flag is set to 1; otherwise, the current frame is judged to be a non-tonal signal and tonality_flag is set to 0. The threshold for the tonal signal decision lies in the range [0.3, 0.7].
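The final tonal decision reduces to a strict comparison against the threshold. In this small Python sketch, the function name is hypothetical and the range check simply enforces the [0.3, 0.7] interval stated above.

```python
def tonality_flag(tonality_degree: float, threshold: float = 0.5) -> int:
    """Step E: 1 if tonality_degree exceeds the decision threshold, else 0."""
    if not 0.3 <= threshold <= 0.7:
        raise ValueError("decision threshold should lie in [0.3, 0.7]")
    # Strictly greater: a value equal to the threshold counts as non-tonal.
    return 1 if tonality_degree > threshold else 0


for d in (0.2, 0.5, 0.6):
    print(d, tonality_flag(d))
# 0.2 0
# 0.5 0
# 0.6 1
```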
The present invention further provides a method for correcting the number of active-speech hangover frames in the VAD decision. As shown in Figure 12, the method comprises:
Step 1201: calculate the long-term signal-to-noise ratio lt_snr from the sub-band signals.
Specifically, lt_snr is calculated from the ratio of the average long-term active speech energy to the average long-term background noise energy computed in the previous frame; lt_snr may be expressed in logarithmic form.
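A minimal sketch of step 1201 in Python. The text only says lt_snr "may be expressed in logarithmic form"; expressing it in decibels (10·log10) is an assumption, as is the small `eps` guard. How the two long-term averages are maintained is outside this excerpt, so they are taken as inputs.

```python
import math


def long_term_snr(avg_active_energy: float,
                  avg_noise_energy: float,
                  eps: float = 1e-12) -> float:
    """lt_snr in dB from the previous frame's long-term average energies.

    The dB scale (10 * log10) is an assumption; a plain log10 would differ
    only by a constant factor of 10.
    """
    return 10.0 * math.log10((avg_active_energy + eps) / (avg_noise_energy + eps))


print(round(long_term_snr(100.0, 1.0), 6))  # 20.0
```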
Step 1202: calculate the average full-band signal-to-noise ratio SNR2_lt_ave.
The full-band SNR SNR2 is averaged over a number of recent frames to obtain SNR2_lt_ave.
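Step 1202 amounts to a moving average of SNR2 over the most recent frames. The Python sketch below uses a bounded `collections.deque`; the class name and the default window of 8 frames are assumptions, since the text only says "a number of recent frames".

```python
from collections import deque


class AverageFullBandSNR:
    """Running mean of SNR2 over the most recent `window` frames (window assumed)."""

    def __init__(self, window: int = 8):
        # A deque with maxlen drops the oldest SNR2 value automatically.
        self._history = deque(maxlen=window)

    def update(self, snr2: float) -> float:
        """Add the current frame's SNR2 and return SNR2_lt_ave."""
        self._history.append(snr2)
        return sum(self._history) / len(self._history)


avg = AverageFullBandSNR(window=3)
for value in (1.0, 2.0, 3.0, 6.0):
    snr2_lt_ave = avg.update(value)
print(round(snr2_lt_ave, 4))  # 3.6667, the mean of 2.0, 3.0 and 6.0
```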
Step 1203: correct the current number of active-speech hangover frames according to the decision results of several previous frames, lt_snr, SNR2_lt_ave, the SNR parameters of the current frame, and the VAD decision result of the current frame.
Naturally, the hangover count is corrected only when the active speech flag indicates that the current frame is an active speech frame.
Preferably, the current hangover count is corrected as follows: if the number of preceding consecutive speech frames is less than a set threshold 1 and lt_snr is less than a set threshold 2, the current hangover count is set to the minimum number of consecutive active frames minus the number of preceding consecutive speech frames; otherwise, if SNR2_lt_ave is greater than a set threshold 3 and the number of preceding consecutive speech frames is greater than a set threshold 4, the hangover count is set according to the magnitude of lt_snr; otherwise, the current hangover count num_speech_hangover is left unchanged.
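The preferred correction rule can be sketched in Python as below. Every numeric threshold here is a hypothetical placeholder, since the text names "threshold 1" to "threshold 4" and a minimum run length without giving values. The mapping from lt_snr to a hangover count is also unspecified, so it is passed in as a caller-supplied function rather than invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HangoverParams:
    # Hypothetical placeholder values; the text gives no numbers for these.
    thr1_speech_frames: int = 8       # threshold 1
    thr2_lt_snr: float = 4.0          # threshold 2
    thr3_snr2_lt_ave: float = 12.0    # threshold 3
    thr4_speech_frames: int = 40      # threshold 4
    min_continuous_speech: int = 8    # minimum number of consecutive active frames


def correct_hangover(num_speech_hangover: int,
                     vad_flag: int,
                     continuous_speech_num: int,
                     lt_snr: float,
                     snr2_lt_ave: float,
                     hangover_from_lt_snr: Callable[[float], int],
                     p: HangoverParams = HangoverParams()) -> int:
    """Return the corrected num_speech_hangover for the current frame."""
    if vad_flag != 1:
        return num_speech_hangover  # correction applies only to active frames
    if continuous_speech_num < p.thr1_speech_frames and lt_snr < p.thr2_lt_snr:
        # Short speech run at low long-term SNR: extend to the minimum run length.
        return p.min_continuous_speech - continuous_speech_num
    if snr2_lt_ave > p.thr3_snr2_lt_ave and continuous_speech_num > p.thr4_speech_frames:
        return hangover_from_lt_snr(lt_snr)
    return num_speech_hangover


print(correct_hangover(2, 1, 3, 1.0, 5.0, lambda s: 3))  # 5
```

Keeping the lt_snr mapping as a parameter makes it explicit that only the control flow, not the mapping, comes from the text.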
Corresponding to the hangover correction method above, the present invention also provides a device for correcting the number of active-speech hangover frames in the VAD decision. As shown in Figure 13, the device comprises:
a long-term SNR calculation unit, configured to calculate the long-term signal-to-noise ratio lt_snr;
specifically, the long-term SNR calculation unit calculates lt_snr from the ratio of the average long-term active speech energy to the average long-term background noise energy computed in the previous frame;
an average full-band SNR calculation unit, configured to calculate the average full-band signal-to-noise ratio SNR2_lt_ave;
specifically, the average full-band SNR calculation unit averages the full-band SNR SNR2 over a number of recent frames to obtain SNR2_lt_ave; and
a hangover correction unit, configured to correct the current number of active-speech hangover frames according to the decision results of several previous frames, lt_snr, SNR2_lt_ave, the SNR parameters of the current frame, and the VAD decision result of the current frame.
As stated above, the hangover count is corrected only when the active speech flag indicates that the current frame is an active speech frame.
Preferably, when the hangover correction unit corrects the current hangover count: if the number of preceding consecutive speech frames is less than a set threshold 1 and lt_snr is less than a set threshold 2, the current hangover count is set to the minimum number of consecutive active frames minus the number of preceding consecutive speech frames; otherwise, if SNR2_lt_ave is greater than a set threshold 3 and the number of preceding consecutive speech frames is greater than a set threshold 4, the hangover count is set according to the magnitude of lt_snr; otherwise, the current hangover count num_speech_hangover is left unchanged.
The present invention further provides a method for adjusting the SNR threshold in the VAD decision. As shown in Figure 14, the method comprises:
Step 1401: calculate the spectral centroid feature of the current frame from the sub-band signals.
Specifically, the spectral centroid feature is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the sub-band signals, or a smoothed version of that ratio.
Step 1402: calculate the long-term SNR lt_snr from the ratio of the average long-term active speech energy to the average long-term background noise energy computed in the previous frame.
Step 1403: adjust the SNR threshold of the VAD decision according to the spectral centroid feature, the long-term SNR, the number of preceding consecutive active frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num.
Specifically, as shown in Figure 15, adjusting the SNR threshold comprises:
Step 1501: set the initial value of the SNR threshold snr_thr.
Step 1502: make a first adjustment to snr_thr according to the spectral centroid feature.
Step 1503: make a second adjustment to snr_thr according to continuous_speech_num, continuous_noise_num, the average full-band SNR SNR2_lt_ave, and lt_snr.
Step 1504: make a final correction to snr_thr according to lt_snr, yielding the SNR threshold snr_thr of the current frame.
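The text fixes the order of the four stages in Figure 15 and the inputs each stage uses, but not the rules inside them. The Python sketch below therefore captures only that ordering: each adjustment rule is a hypothetical caller-supplied function, and no particular rule should be read into it.

```python
from typing import Callable


def adjust_snr_threshold(
    snr_thr_init: float,
    sp_center: float,
    continuous_speech_num: int,
    continuous_noise_num: int,
    snr2_lt_ave: float,
    lt_snr: float,
    by_centroid: Callable[[float, float], float],
    by_history: Callable[[float, int, int, float, float], float],
    by_lt_snr: Callable[[float, float], float],
) -> float:
    """Apply the four stages of Figure 15 in order; each rule is caller-supplied."""
    snr_thr = snr_thr_init                                   # step 1501
    snr_thr = by_centroid(snr_thr, sp_center)                # step 1502
    snr_thr = by_history(snr_thr, continuous_speech_num,     # step 1503
                         continuous_noise_num, snr2_lt_ave, lt_snr)
    snr_thr = by_lt_snr(snr_thr, lt_snr)                     # step 1504
    return snr_thr


# Toy rules, purely to show the data flow: (10 + 1 - 2) * 0.5 = 4.5
print(adjust_snr_threshold(10.0, 1.5, 5, 0, 12.0, 8.0,
                           lambda t, c: t + 1,
                           lambda t, s, n, a, l: t - 2,
                           lambda t, l: t * 0.5))  # 4.5
```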
Corresponding to the SNR threshold adjustment method above, the present invention also provides a device for adjusting the SNR threshold in the VAD decision. As shown in Figure 16, the device comprises:
a feature acquisition unit, configured to calculate the spectral centroid feature of the current frame from the sub-band signals;
preferably, the spectral centroid feature is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the sub-band signals, or a smoothed version of that ratio;
a long-term SNR calculation unit, configured to calculate lt_snr from the ratio of the average long-term active speech energy to the average long-term background noise energy computed in the previous frame; and
an SNR threshold adjustment unit, configured to adjust the SNR threshold of the VAD decision according to the spectral centroid feature, the long-term SNR, the number of preceding consecutive active frames, and the number of preceding consecutive noise frames continuous_noise_num.
Specifically, the SNR threshold adjustment unit sets the initial value of the SNR threshold snr_thr; makes a first adjustment to snr_thr according to the spectral centroid feature; makes a second adjustment according to continuous_speech_num, continuous_noise_num, SNR2_lt_ave, and lt_snr; and finally corrects snr_thr according to lt_snr to obtain the SNR threshold snr_thr of the current frame.
Many modern speech coding standards, such as AMR and AMR-WB, support VAD. However, the VAD in these encoders does not perform well under all typical background noises; under non-stationary noise such as office noise in particular, their VAD efficiency is low. For music signals, these VADs sometimes make false detections, noticeably degrading the quality of the corresponding processing algorithms.
The method of the present invention overcomes these shortcomings of existing VAD algorithms: it improves VAD efficiency on non-stationary noise while also improving the accuracy of music detection. As a result, speech and audio processing algorithms that use this VAD can achieve better performance.
The background noise detection method provided by the present invention makes background noise estimation more accurate and stable, which helps improve VAD accuracy. The tonal signal detection method improves the accuracy of detecting tonal music. The hangover correction method gives the VAD algorithm a better balance between performance and efficiency under different noise types and SNRs. The method for adjusting the SNR threshold in the VAD decision lets the decision algorithm achieve good accuracy across different SNRs, further improving efficiency while maintaining quality.
Those of ordinary skill in the art will understand that all or some of the steps of the above methods may be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module or unit in the above embodiments may be implemented in hardware or as a software functional module. The present invention is not limited to any specific combination of hardware and software.
Claims (20)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201810622976.0A CN109119096B (en) | 2012-12-25 | 2012-12-25 | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment | 
| CN201210570563.5A CN103903634B (en) | 2012-12-25 | 2012-12-25 | Activation tone detection and method and device for activation tone detection | 
| CN202110060370.4A CN112992188B (en) | 2012-12-25 | 2012-12-25 | Method and device for adjusting signal-to-noise ratio threshold in activated voice detection VAD judgment | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201210570563.5A CN103903634B (en) | 2012-12-25 | 2012-12-25 | Activation tone detection and method and device for activation tone detection | 
Related Child Applications (2)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201810622976.0A Division CN109119096B (en) | 2012-12-25 | 2012-12-25 | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment | 
| CN202110060370.4A Division CN112992188B (en) | 2012-12-25 | 2012-12-25 | Method and device for adjusting signal-to-noise ratio threshold in activated voice detection VAD judgment | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN103903634A CN103903634A (en) | 2014-07-02 | 
| CN103903634B true CN103903634B (en) | 2018-09-04 | 
Family
ID=50994913
Family Applications (3)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201810622976.0A Active CN109119096B (en) | 2012-12-25 | 2012-12-25 | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment | 
| CN201210570563.5A Active CN103903634B (en) | 2012-12-25 | 2012-12-25 | Activation tone detection and method and device for activation tone detection | 
| CN202110060370.4A Active CN112992188B (en) | 2012-12-25 | 2012-12-25 | Method and device for adjusting signal-to-noise ratio threshold in activated voice detection VAD judgment | 
Family Applications Before (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201810622976.0A Active CN109119096B (en) | 2012-12-25 | 2012-12-25 | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment | 
Family Applications After (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202110060370.4A Active CN112992188B (en) | 2012-12-25 | 2012-12-25 | Method and device for adjusting signal-to-noise ratio threshold in activated voice detection VAD judgment | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (3) | CN109119096B (en) | 
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN109119096B (en) * | 2012-12-25 | 2021-01-22 | ZTE Corporation | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment | 
| CN105810201B (en) * | 2014-12-31 | 2019-07-02 | Spreadtrum Communications (Shanghai) Co., Ltd. | Voice activity detection method and its system | 
| CN106328169B (en) | 2015-06-26 | 2018-12-11 | ZTE Corporation | A kind of acquisition methods, activation sound detection method and the device of activation sound amendment frame number | 
| CN107393558B (en) * | 2017-07-14 | 2020-09-11 | Shenzhen Yongshunzhi Information Technology Co., Ltd. | Voice activity detection method and device | 
| CN111724808A (en) * | 2019-03-18 | 2020-09-29 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Audio signal processing method, device, terminal and storage medium | 
| CN110431625B (en) | 2019-06-21 | 2023-06-23 | Shenzhen Goodix Technology Co., Ltd. | Voice detection method, voice detection device, voice processing chip and electronic equipment | 
| CN112634921B (en) * | 2019-10-09 | 2024-02-13 | Beijing Zhongguancun Kejin Technology Co., Ltd. | Voice processing method, device and storage medium | 
| CN112669877B (en) * | 2020-09-09 | 2023-09-29 | Zhuhai Jieli Technology Co., Ltd. | Noise detection and suppression method and device, terminal equipment, system and chip | 
| CN113192531B (en) * | 2021-05-28 | 2024-04-16 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Method, terminal and storage medium for detecting whether audio is pure audio | 
| CN114297580B (en) * | 2021-12-23 | 2025-05-13 | Guizhou Aerospace Electronic Technology Co., Ltd. | A low-latency signal-to-noise ratio calculation method based on FPGA | 
| CN115273913B (en) * | 2022-07-27 | 2024-07-30 | Goertek Technology Co., Ltd. | Voice endpoint detection method, device, equipment and computer readable storage medium | 
| CN115881167B (en) * | 2022-11-25 | 2025-08-12 | Goertek Technology Co., Ltd. | Speech detection method, apparatus and computer-readable storage medium | 
| CN115862685B (en) * | 2023-02-27 | 2023-09-15 | Quanshi Cloud Business Service Co., Ltd. | Real-time voice activity detection method and device and electronic equipment | 
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US6317711B1 (en) * | 1999-02-25 | 2001-11-13 | Ricoh Company, Ltd. | Speech segment detection and word recognition | 
| CN101379548A (en) * | 2006-02-10 | 2009-03-04 | Telefonaktiebolaget LM Ericsson (publ) | A voice detector and a method for suppressing sub-bands in a voice detector | 
| CN102044242A (en) * | 2009-10-15 | 2011-05-04 | Huawei Technologies Co., Ltd. | Method, device and electronic equipment for voice activity detection | 
| CN102687196A (en) * | 2009-10-08 | 2012-09-19 | Telefónica, S.A. | Method for the detection of speech segments | 
| CN102741918A (en) * | 2010-12-24 | 2012-10-17 | Huawei Technologies Co., Ltd. | Method and apparatus for voice activity detection | 
Family Cites Families (16)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| WO1995015550A1 (en) * | 1993-11-30 | 1995-06-08 | At & T Corp. | Transmitted noise reduction in communications systems | 
| FI100840B (en) * | 1995-12-12 | 1998-02-27 | Nokia Mobile Phones Ltd | Noise cancellation and background noise canceling method in a noise and a mobile telephone | 
| US6453291B1 (en) * | 1999-02-04 | 2002-09-17 | Motorola, Inc. | Apparatus and method for voice activity detection in a communication system | 
| CA2420129A1 (en) * | 2003-02-17 | 2004-08-17 | Catena Networks, Canada, Inc. | A method for robustly detecting voice activity | 
| US7366658B2 (en) * | 2005-12-09 | 2008-04-29 | Texas Instruments Incorporated | Noise pre-processor for enhanced variable rate speech codec | 
| KR101151746B1 (en) * | 2006-01-02 | 2012-06-15 | 삼성전자주식회사 | Noise suppressor for audio signal recording and method apparatus | 
| CN101197130B (en) * | 2006-12-07 | 2011-05-18 | Huawei Technologies Co., Ltd. | Sound activity detecting method and detector thereof | 
| CN101320559B (en) * | 2007-06-07 | 2011-05-18 | Huawei Technologies Co., Ltd. | Sound activation detection apparatus and method | 
| CN101236742B (en) * | 2008-03-03 | 2011-08-10 | ZTE Corporation | Music/ non-music real-time detection method and device | 
| CN102044243B (en) * | 2009-10-15 | 2012-08-29 | Huawei Technologies Co., Ltd. | Method and device for voice activity detection (VAD) and encoder | 
| EP2491548A4 (en) * | 2009-10-19 | 2013-10-30 | Ericsson Telefon Ab L M | Method and voice activity detector for a speech encoder | 
| CN102194457B (en) * | 2010-03-02 | 2013-02-27 | ZTE Corporation | Audio encoding and decoding method, system and noise level estimation method | 
| EP3252771B1 (en) * | 2010-12-24 | 2019-05-01 | Huawei Technologies Co., Ltd. | A method and an apparatus for performing a voice activity detection | 
| CN102097095A (en) * | 2010-12-28 | 2011-06-15 | Tianjin Yaan Technology Electronic Co., Ltd. | Speech endpoint detecting method and device | 
| CN102074246B (en) * | 2011-01-05 | 2012-12-19 | AAC Acoustic Technologies (Shenzhen) Co., Ltd. | Dual-microphone based speech enhancement device and method | 
| CN109119096B (en) * | 2012-12-25 | 2021-01-22 | ZTE Corporation | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment | 
- 2012-12-25 CN CN201810622976.0A (CN109119096B), Active
- 2012-12-25 CN CN201210570563.5A (CN103903634B), Active
- 2012-12-25 CN CN202110060370.4A (CN112992188B), Active
Non-Patent Citations (1)
| Title | 
|---|
| Digital cellular telecommunications system (Phase 2+); Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) speech traffic channels; General description (GSM 06.94 version 7.1.0 Release 1998). Draft ETSI EN 301 708, vol. SMG11, no. V7.1.0. IEEE, LIS, Sophia Antipolis Cedex, France, July 1999 (1999-07-01). * | 
Also Published As
| Publication number | Publication date | 
|---|---|
| CN109119096B (en) | 2021-01-22 | 
| CN112992188B (en) | 2024-06-18 | 
| CN103903634A (en) | 2014-07-02 | 
| CN109119096A (en) | 2019-01-01 | 
| CN112992188A (en) | 2021-06-18 | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |