CN113327634B - A voice activity detection method and system for low power consumption circuit - Google Patents
- Publication number
- CN113327634B
- Authority
- CN
- China
- Prior art keywords
- voice
- activity detection
- classification
- voice activity
- detection method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mobile Radio Communication Systems (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of speech algorithm design, and in particular to a voice activity (silence) detection method and system based on a linear SVM (support vector machine).
Background Art
Voice Activity Detection (VAD), also known as voice endpoint detection, determines where speech starts and ends. It was first used in communications, for example in telephone transmission and detection, and is now widely used in speech recognition and speech compression as an important speech preprocessing technique.
Existing speech endpoint detection algorithms can be divided, according to the speech features they classify on, into time-domain-feature VAD algorithms and frequency-domain-feature VAD algorithms. Time-domain features include short-time energy, short-time zero-crossing rate and short-time autocorrelation; frequency-domain features include pitch period and Mel-cepstral distance. Speech endpoint detection is essentially a binary classification problem: the signal is classified as speech or noise according to how speech segments and noise segments differ in their time-domain or frequency-domain features. Different VAD algorithms may therefore use different classifiers. The classic dual-threshold detection algorithm, for example, makes threshold decisions, while other algorithms use more complex classifiers such as decision trees, finite state machines or neural networks.
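As an illustration of the two most common time-domain features named above, the following Python sketch computes short-time energy and short-time zero-crossing rate for one frame. The function names are hypothetical, and counting a sample of exactly 0 as non-negative is an assumption; this is not code from the patent.

```python
def short_time_energy(frame):
    """Short-time energy: sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# A rapidly alternating frame crosses zero at every step; a constant frame never does.
print(short_time_energy([1, -1, 1, -1]), zero_crossing_rate([1, -1, 1, -1]))  # prints: 4 1.0
```

Low-energy, high-ZCR frames tend to be noise or unvoiced sounds, which is why simple detectors threshold these two values; at low SNR both features are distorted by the noise, as the next paragraph explains.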
Speech features must clearly reflect the difference between speech and noise. Time-domain features work well at high signal-to-noise ratio (SNR), but in noisy environments the noise can drown out the speech signal, causing decisions based on features such as energy or zero-crossing rate to fail. Frequency-domain features are, to some extent, less affected by noise than time-domain features, but they are more expensive to compute.
Summary of the Invention
To resolve the trade-off between computational complexity and accuracy in the prior art, and to achieve good classification accuracy at low algorithmic complexity, the present invention proposes a voice activity detection method and system for low-power circuits.
The technical problem of the present invention is solved by the following technical solutions:
The present invention proposes a voice activity detection method for low-power circuits, characterized by comprising the following steps: S1: receive input speech and perform speech feature extraction, using sub-band energy features with a reduced number of sub-bands; the feature values obtained from feature extraction are passed on for classification; S2: train and classify with a linear support vector machine (SVM) classifier and output the speech classification result, completing voice activity detection.
In some embodiments, in step S1, the speech is divided into frames of a given frame length, with the frame shift equal to the frame length; framing and windowing are implemented by applying a rectangular window to the speech signal, with the window length equal to the number of samples in one frame.
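A minimal sketch of non-overlapping framing with a rectangular window, assuming the signal is a plain Python list of samples. Because every rectangular-window weight is 1, windowing reduces to slicing. The function name `frame_signal` is hypothetical, and dropping a trailing partial frame is an assumption, since the text does not say how the tail is handled.

```python
def frame_signal(samples, frame_len):
    """Split a signal into non-overlapping frames using a rectangular window.

    The frame shift equals the frame length, so consecutive frames share no samples.
    A trailing partial frame is dropped (assumption).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    hop = frame_len                 # frame shift == frame length
    window = [1.0] * frame_len      # rectangular window: every weight is 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        segment = samples[start:start + frame_len]
        frames.append([w * x for w, x in zip(window, segment)])
    return frames


print(frame_signal(list(range(10)), 4))  # prints: [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
```

Each input sample lands in exactly one frame, so no sample has to be stored for reuse by the next frame.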
In some embodiments, after framing and windowing, band-pass filtering is performed and the short-time energy is calculated.
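The patent performs band-pass filtering in analog circuitry and gives neither filter order nor exact band edges. As a digital stand-in only, the sketch below uses a second-order band-pass biquad (coefficients from the widely used RBJ Audio EQ Cookbook, constant 0 dB peak gain) and sums the squared filter output over each non-overlapping frame. The filter state is carried across frame boundaries, which is meant to mirror an always-on analog filter; the function names and the mapping from band edges to centre frequency and Q are assumptions.

```python
import math


def bandpass_coeffs(f_low, f_high, fs):
    """Second-order band-pass biquad (RBJ cookbook, 0 dB peak gain), normalised by a0."""
    f0 = math.sqrt(f_low * f_high)      # geometric centre of the band
    q = f0 / (f_high - f_low)           # quality factor from the bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)               # b0, b1, b2
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)   # a1, a2
    return b, a


def filter_signal(samples, b, a):
    """Direct-form-I biquad; state is carried across frames like a continuous filter."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def band_energies(samples, f_low, f_high, fs, frame_len):
    """Short-time energy of the band-passed signal for each non-overlapping frame."""
    b, a = bandpass_coeffs(f_low, f_high, fs)
    y = filter_signal(samples, b, a)
    return [sum(v * v for v in y[i:i + frame_len])
            for i in range(0, len(y) - frame_len + 1, frame_len)]
```

For example, with a 500 Hz to 1000 Hz band at fs = 16000, a 707 Hz tone passes almost unattenuated, while a 4 kHz tone is attenuated by roughly 20 dB, so its per-frame band energy is about a hundred times smaller.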
In some embodiments, in step S1, speech feature extraction is implemented entirely in analog circuitry.
In some embodiments, in step S1, a background noise feature based on recursive average estimation is added as a new feature.
In some embodiments, the background noise based on recursive average estimation is calculated as follows:

$$
NL(i)=\begin{cases}
\beta_1\,NL(i-1)+(1-\beta_1)\,E(i), & E(i)\ge NL(i-1)\\
\beta_2\,NL(i-1)+(1-\beta_2)\,E(i), & E(i)<NL(i-1)
\end{cases}
$$

where β1 and β2 take values between 0 and 1, and NL(i) and E(i) are the background noise and the short-time energy of the i-th frame. The smoothing factor β is chosen by a threshold method: a different value of β is selected depending on the relationship between NL(i-1) and E(i).
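A sketch of the recursive-average noise estimate in Python, assuming β1 is used when E(i) ≥ NL(i-1) and β2 when E(i) < NL(i-1), and assuming NL is initialised to the first frame's energy (the text does not state an initial value). The function name and the β values in the usage line are illustrative only.

```python
def track_noise_level(energies, beta1, beta2, nl0=None):
    """Recursive-average background-noise estimate NL(i) for each frame energy E(i).

    beta1 is used when E(i) >= NL(i-1), beta2 when E(i) < NL(i-1) (assumed reading).
    """
    if not energies:
        return []
    if not (0.0 <= beta1 <= 1.0 and 0.0 <= beta2 <= 1.0):
        raise ValueError("beta1 and beta2 must lie between 0 and 1")
    nl = energies[0] if nl0 is None else nl0   # initial NL: assumption
    history = []
    for e in energies:
        beta = beta2 if e < nl else beta1       # threshold choice of the smoothing factor
        nl = beta * nl + (1.0 - beta) * e
        history.append(nl)
    return history


print(track_noise_level([1.0, 1.0, 11.0, 0.0], beta1=0.9, beta2=0.5))
```

This yields approximately [1.0, 1.0, 2.0, 1.0]: the jump in energy to 11 raises NL only slightly, while the drop to 0 pulls it down faster, because the two branches use different smoothing factors.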
In some embodiments, the analog-domain features are quantized into 8-bit digital-domain features.
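One possible reading of the 8-bit quantization step, assuming a uniform unsigned quantizer over a known full-scale range [0, full_scale]. The patent does not specify the converter range, coding or rounding, so these are assumptions, and `quantize_8bit` is a hypothetical name. Note that Python's `round` rounds exact halves to the nearest even integer.

```python
def quantize_8bit(value, full_scale):
    """Uniform unsigned 8-bit quantizer: maps [0, full_scale] to codes 0..255."""
    if full_scale <= 0:
        raise ValueError("full_scale must be positive")
    code = round(value / full_scale * 255)
    return max(0, min(255, code))   # clip values outside the assumed range


features = [0.0, 0.5, 1.0, 1.2]     # illustrative analog feature values
print([quantize_8bit(f, full_scale=1.0) for f in features])  # prints: [0, 128, 255, 255]
```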
In some embodiments, the bit width of the weights of the linear SVM classifier is limited to reduce complexity.
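The text does not say how the weight bit width is limited. A common approach, assumed here, is symmetric fixed-point quantization: scale the weights so the largest magnitude maps to the largest signed code, round, and clip. Because the scale is positive, the sign of the integer dot product, and hence the speech/noise decision, matches the floating-point decision up to rounding error. `limit_weight_bits` and `decide` are hypothetical names, and the bias is assumed to be pre-scaled into the same integer domain.

```python
def limit_weight_bits(weights, bits):
    """Symmetric fixed-point quantization of SVM weights to signed `bits`-bit codes.

    Returns (codes, scale) with w ~= code * scale.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed code")
    q_max = 2 ** (bits - 1) - 1                  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / q_max
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale


def decide(features, codes, bias_code):
    """Integer-only linear decision: True (speech) if sum(code * x) + bias_code >= 0."""
    return sum(c * x for c, x in zip(codes, features)) + bias_code >= 0


codes, scale = limit_weight_bits([0.8, -1.0, 0.3], bits=4)
print(codes)                                  # prints: [6, -7, 2]
print(decide([10, 0, 0], codes, bias_code=-30))  # prints: True
```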
The present invention also proposes a voice activity detection system for low-power circuits, characterized by comprising a feature extraction module and a classification module. The feature extraction module receives input speech, performs speech feature extraction, and uses sub-band energy and background noise based on recursive average estimation as classification features. The classification module uses a linear SVM classifier for training and classification. The feature values produced by the feature extraction module enter the classification module for training and classification, which outputs the speech classification result, completing voice activity detection.
The present invention further proposes a computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of any of the above methods.
Compared with the prior art, the beneficial effects of the present invention include: compared with traditional algorithms using sub-band energy features, the number of sub-bands is greatly reduced, which lowers algorithmic complexity; the classifier is an SVM, which balances accuracy and complexity; and the introduction of the NL feature keeps accuracy from dropping significantly relative to traditional algorithms. The invention thus achieves good classification accuracy at low SNR with low implementation complexity.
In some embodiments, a further beneficial effect compared with the prior art is that the speech feature extraction part of the method is implemented entirely in analog circuitry and the feasibility of a real circuit implementation was considered from the design stage, so the method can additionally meet the requirements of low-power circuits.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the processing flow of an embodiment of the present invention;
FIG. 2 is a schematic speech diagram for the feature extraction module of an embodiment of the present invention;
FIG. 3 is a schematic speech diagram for the classification module of an embodiment of the present invention.
Detailed Description of the Embodiments
The present invention is further described below with reference to the accompanying drawings and preferred embodiments. Provided they do not conflict, the embodiments of the present application and the features in those embodiments may be combined with each other.
It should be noted that directional terms in this embodiment, such as left, right, up, down, top and bottom, are relative concepts or refer to the normal use state of the product, and should not be regarded as limiting.
The purpose of the present invention is to provide a relatively simple voice activity detection method and system that still achieves good classification accuracy at low SNR. In addition, because the speech feature extraction part of the method can be implemented entirely in analog circuitry, the method can also meet the requirements of low-power circuits.
The processing flow of the voice activity detection method of an embodiment of the present invention is shown in FIG. 1. It consists of a feature extraction module and a classification module: the feature extraction module uses sub-band energy and the background noise NL based on recursive average estimation as classification features, and the classification module uses a linear SVM as the classifier.
A voice activity detection method for low-power circuits according to an embodiment of the present invention comprises the following steps: S1: receive input speech and perform speech feature extraction, using sub-band energy features with a reduced number of sub-bands; the feature values obtained from feature extraction are passed on for classification; S2: train and classify with a linear SVM classifier and output the speech classification result, completing voice activity detection.
In step S1, the speech is divided into frames of a given frame length, with the frame shift equal to the frame length; framing and windowing are implemented by applying a rectangular window to the speech signal, with the window length equal to the number of samples in one frame.
During framing and windowing, band-pass filtering is performed and the short-time energy is calculated.
In step S1, speech feature extraction is implemented entirely in analog circuitry.
In step S1, a background noise feature based on recursive average estimation is added as a new feature.
The processing flow of the method is as follows:
First stage: the speech is divided into frames with a frame length of 25 ms. If the frame shift were smaller than the frame length, the circuit implementation would require digital circuitry with registers, which is inconvenient for a low-power analog implementation. In this design the frame shift therefore equals the frame length, so consecutive frames do not overlap. A rectangular window is applied to the speech signal to perform framing; the window length is the number of samples in 25 ms, which at the typical speech sampling rate of 16 kHz gives N = 400.
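The frame-length arithmetic and the overlap argument above can be checked with a few lines of Python. `buffered_samples` counts the samples that the following frame would reuse when the frame shift is smaller than the frame length, which is our reading of the remark about needing registers; the 10 ms hop in the second call is a common alternative used only for comparison and does not come from the patent. The function name is hypothetical.

```python
def frame_params(fs_hz, frame_ms, hop_ms=None):
    """Frame length N, hop, samples to buffer for overlap, and frame rate."""
    n = round(fs_hz * frame_ms / 1000)                 # samples per frame
    hop = n if hop_ms is None else round(fs_hz * hop_ms / 1000)
    overlap = max(0, n - hop)                          # samples reused by the next frame
    return {"N": n, "hop": hop, "buffered_samples": overlap, "frames_per_s": fs_hz / hop}


print(frame_params(16000, 25))
# prints: {'N': 400, 'hop': 400, 'buffered_samples': 0, 'frames_per_s': 40.0}
print(frame_params(16000, 25, hop_ms=10))
# prints: {'N': 400, 'hop': 160, 'buffered_samples': 240, 'frames_per_s': 100.0}
```

With the patent's choice (shift equal to length), nothing has to be held between frames, and the classifier runs 40 times per second instead of 100.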
For speech feature selection, we use sub-band energy features but greatly reduce the number of sub-bands: only 4 frequency bands between 100 Hz and 5 kHz of the sound spectrum are selected for calculating short-time energy. This lowers classification accuracy to some extent; to compensate, we add the background noise estimate NL as a new feature. The background noise based on recursive average estimation is calculated as follows:

$$
NL(i)=\begin{cases}
\beta_1\,NL(i-1)+(1-\beta_1)\,E(i), & E(i)\ge NL(i-1)\\
\beta_2\,NL(i-1)+(1-\beta_2)\,E(i), & E(i)<NL(i-1)
\end{cases}
$$

This feature is obtained by time-constant recursive averaging. Recursive averaging is a common method in speech enhancement, where it is mostly applied to the signal-to-noise ratio (SNR); it suppresses abrupt changes so that the estimated quantity varies more smoothly. The formula above computes the background noise NL by recursive averaging, where β1 and β2 take values between 0 and 1, NL(i) and E(i) are the background noise and short-time energy of the i-th frame, and β is the smoothing factor. The smoothing factor is chosen by a threshold method: a different value of β is selected depending on the relationship between NL(i-1) and E(i). Because of this smoothing property, the feature reduces, to some extent, the fluctuation of feature values in transition zones within a speech segment (the parts of FIG. 2 where the signal amplitude drops suddenly but the signal is still speech), thereby reducing misjudgments.
After the feature extraction module, the resulting 5 feature values (four sub-band energies and NL) are quantized and then sent to the classification module for training and classification.
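A sketch of how the five per-frame feature values could be assembled. The patent only says that 4 bands lie between 100 Hz and 5 kHz, so the logarithmically spaced band edges below are an assumption, as are the function names.

```python
def log_spaced_bands(f_min=100.0, f_max=5000.0, n_bands=4):
    """Split [f_min, f_max] into n_bands contiguous bands with equal frequency ratios."""
    ratio = (f_max / f_min) ** (1.0 / n_bands)
    edges = [f_min * ratio ** k for k in range(n_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))


def feature_vector(band_energies, noise_level):
    """Five features per frame: four sub-band energies followed by the NL estimate."""
    if len(band_energies) != 4:
        raise ValueError("expected exactly 4 sub-band energies")
    return [*band_energies, noise_level]


for lo, hi in log_spaced_bands():
    print(f"{lo:7.1f} Hz - {hi:7.1f} Hz")
```

Under this assumption the bands are roughly 100 to 266 Hz, 266 to 707 Hz, 707 to 1880 Hz and 1880 to 5000 Hz; any other split of the same range would fit the text equally well.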
Second stage: for the classifier, we choose a linear support vector machine (SVM). Compared with threshold decisions and decision trees, a linear SVM achieves higher accuracy; compared with a deep neural network (DNN), it has lower complexity. SVM classifiers include kernel SVMs and linear SVMs. Algorithm simulations showed that, for the selected speech features, a linear SVM performs very close to a kernel SVM while being simpler to implement in circuitry and consuming less power, so the linear SVM classifier was finally selected.
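The patent does not name a training algorithm for the linear SVM. As one possible choice, the sketch below uses Pegasos (stochastic sub-gradient descent on the hinge loss with L2 regularisation); appending a constant 1.0 feature so that the last weight acts as the bias is a simplification we assume. Only the inference rule, a sign test on a weighted sum, would need to run in the circuit. All names and the toy data are illustrative.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos stochastic sub-gradient training of a linear SVM.

    labels are +1 (speech) or -1 (noise). A constant 1.0 feature is appended so the
    last weight acts as the bias (it is regularised too, a simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                       # Pegasos step size
            x, y = data[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]    # shrink from the L2 term
            if margin < 1.0:                            # hinge loss active
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Linear decision rule: +1 (speech) if w . [x, 1] >= 0, else -1 (noise)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 2.5], [2.5, 3], [-2, -2], [-3, -2.5], [-2.5, -3]]  # toy features
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

A library implementation such as scikit-learn's `LinearSVC` would normally be used offline for training; the trained weights are then fixed and quantized for the circuit.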
Considering the circuit implementation, we quantize the analog-domain features into 8-bit digital-domain features and limit the bit width of the linear SVM weights w to reduce complexity. A frame is counted as correctly classified when its classification result matches the label provided by the dataset (see FIG. 3); classification accuracy is the ratio of correctly classified frames to the total number of frames. Computed over noise and speech segments together, the algorithm achieves 90% classification accuracy at 10 dB SNR and 85% at 5 dB SNR. Computed separately for noise and speech segments at 10 dB SNR, the accuracy is 89.87% on the noise part and 91.55% on the speech part.
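The frame-level accuracy defined above, overall and separately for noise and speech frames, can be computed as follows. The 0 = noise / 1 = speech label encoding and the function name are assumptions; the toy predictions are illustrative and do not reproduce the reported results.

```python
def frame_accuracy(predicted, labels):
    """Overall frame accuracy plus accuracy on noise (0) and speech (1) frames separately."""
    if len(predicted) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    correct = [p == t for p, t in zip(predicted, labels)]

    def class_accuracy(cls):
        hits = [c for c, t in zip(correct, labels) if t == cls]
        return sum(hits) / len(hits) if hits else float("nan")

    return {
        "overall": sum(correct) / len(correct),
        "noise": class_accuracy(0),
        "speech": class_accuracy(1),
    }


print(frame_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))
# prints: {'overall': 0.8, 'noise': 0.6666666666666666, 'speech': 1.0}
```

Reporting the two per-class figures separately, as the text does, shows whether errors come mainly from false alarms on noise or from missed speech frames.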
A voice activity detection system for low-power circuits according to an embodiment of the present invention comprises a feature extraction module and a classification module. The feature extraction module receives input speech, performs speech feature extraction, and uses sub-band energy and background noise based on recursive average estimation as classification features. The classification module uses a linear SVM classifier for training and classification. The feature values produced by the feature extraction module enter the classification module for training and classification, which outputs the speech classification result, completing voice activity detection.
A computer-readable storage medium according to an embodiment of the present invention stores a computer program, characterized in that the computer program, when executed by a processor, implements the steps of any of the above methods.
The voice activity detection method proposed by the present invention achieves good classification accuracy at low SNR with relatively low implementation complexity. Moreover, the feasibility of a real circuit implementation was considered during design, so the feature extraction part of the method can be implemented entirely in analog circuitry to meet low-power requirements.
The above is a further detailed description of the present invention with reference to specific preferred embodiments, and the specific implementation of the present invention should not be regarded as limited to these descriptions. Those skilled in the art may make equivalent substitutions or obvious modifications without departing from the concept of the present invention; any such variations with the same performance or use shall be regarded as falling within the protection scope of the present invention.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110755667.2A CN113327634B (en) | 2021-07-05 | 2021-07-05 | A voice activity detection method and system for low power consumption circuit |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110755667.2A CN113327634B (en) | 2021-07-05 | 2021-07-05 | A voice activity detection method and system for low power consumption circuit |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113327634A CN113327634A (en) | 2021-08-31 |
| CN113327634B true CN113327634B (en) | 2024-06-21 |
Family
ID=77425516
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110755667.2A Active CN113327634B (en) | 2021-07-05 | 2021-07-05 | A voice activity detection method and system for low power consumption circuit |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113327634B (en) |
-
2021
- 2021-07-05 CN CN202110755667.2A patent/CN113327634B/en active Active
Non-Patent Citations (2)
| Title |
|---|
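| Speech endpoint detection algorithm fusing wavelet analysis and support vector machine; Zhu Hengjun et al.; Computer Science; main text, Sections 2-4 * |
| Support vector machine; Marco Croce et al.; IEEE JOURNAL OF SOLID-STATE CIRCUITS; Vol. 56, No. 3; abstract, main text Sections 2-3 * |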
Also Published As
| Publication number | Publication date |
|---|---|
| CN113327634A (en) | 2021-08-31 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109859767B (en) | Environment self-adaptive neural network noise reduction method, system and storage medium for digital hearing aid | |
| CN101599269B (en) | Phonetic end point detection method and device therefor | |
| CN102044244B (en) | Signal classifying method and device | |
| CN111223493A (en) | Voice signal noise reduction processing method, microphone and electronic equipment | |
| CN111540342B (en) | Energy threshold adjusting method, device, equipment and medium | |
| CN111739562B (en) | Voice activity detection method based on data selectivity and Gaussian mixture model | |
| US20230186943A1 (en) | Voice activity detection method and apparatus, and storage medium | |
| CN110265065A (en) | A kind of method and speech terminals detection system constructing speech detection model | |
| CN115116446B (en) | Speaker recognition model construction method in noise environment | |
| CN111722696B (en) | Voice data processing method and device for low-power-consumption equipment | |
| WO2021007841A1 (en) | Noise estimation method, noise estimation apparatus, speech processing chip and electronic device | |
| CN114937449B (en) | Voice keyword recognition method and system | |
| CN112053694A (en) | Voiceprint recognition method based on CNN and GRU network fusion | |
| CN113299308A (en) | Voice enhancement method and device, electronic equipment and storage medium | |
| CN118098255A (en) | Voice enhancement method based on neural network detection and related device thereof | |
| Abdulatif et al. | Investigating cross-domain losses for speech enhancement | |
| CN113327634B (en) | A voice activity detection method and system for low power consumption circuit | |
| CN114512128A (en) | Speech recognition method, device, equipment and computer readable storage medium | |
| TWI749547B (en) | Speech enhancement system based on deep learning | |
| CN110600019B (en) | Convolution neural network computing circuit based on speech signal-to-noise ratio pre-grading in real-time scene | |
| Asad et al. | Noise Suppression Using Gated Recurrent Units and Nearest Neighbor Filtering | |
| CN103337245B (en) | Based on the noise suppressing method of signal to noise ratio curve and the device of subband signal | |
| CN117198300A (en) | A bird sound recognition method and device based on attention mechanism | |
| CN114937450B (en) | Voice keyword recognition method and system | |
| Lee et al. | 37.8 A 13.5 µW 35-Keyword End-to-End Keyword Spotting System Featuring Personalized On-Chip Training in 28nm CMOS |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |