
CN116364115A - Sound breaking detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116364115A
Authority
CN
China
Prior art keywords
clipping
time domain signal
detection window
frame
Prior art date
Legal status
Pending
Application number
CN202310348811.XA
Other languages
Chinese (zh)
Inventor
包绎成
林勇平
熊贝尔
刘华平
赵翔宇
Current Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202310348811.XA
Publication of CN116364115A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

Embodiments of the present disclosure provide a broken sound detection method and device, electronic equipment, and a storage medium, belonging to the technical field of signal processing. The method includes: dividing an audio signal to be detected into N time-domain signal frames; determining, according to the statistical distribution of signal amplitudes within each time-domain signal frame, a first clipping confidence corresponding to that frame, the first clipping confidence being used to distinguish normal audio signals from clipped audio signals; determining, according to the first clipping confidences of M consecutive time-domain signal frames, a second clipping confidence corresponding to the detection window formed by those M frames; and, in response to the second clipping confidence being greater than a preset first threshold, determining that broken sound exists within the corresponding detection window. The disclosure addresses the high false detection rate and low accuracy of broken sound detection caused by existing fixed clipping detection thresholds.

Description

Broken sound detection method and device, electronic equipment, and storage medium

Technical Field

The present disclosure relates to the technical field of signal processing, and more specifically, embodiments of the present disclosure relate to a broken sound detection method and device, electronic equipment, and a storage medium.

Background

This section is intended to provide background or context for the embodiments of the present disclosure set out herein. The descriptions herein are not admitted to be prior art by inclusion in this section.

Broken sound, also known as popping sound, is a harsh sound that makes the original sound muddy and indistinct. Because broken sound in an audio signal seriously degrades audio quality, audio signals need to be checked for broken sound.

Since most broken sound is caused by signal clipping, related technologies usually detect clipping with a clipping threshold and then infer broken sound from the detected clipping. In practice, however, clipping does not always produce audible broken sound, so this approach is prone to false detections; moreover, for signals with large amplitude fluctuations, detection with a fixed clipping threshold has low accuracy.

Summary

Embodiments of the present disclosure provide a broken sound detection method and device, electronic equipment, and a storage medium.

In a first aspect of the embodiments of the present disclosure, a broken sound detection method is provided. The method includes: dividing an audio signal to be detected into N time-domain signal frames, where N is a positive integer; determining, according to the statistical distribution of signal amplitudes within each time-domain signal frame, a first clipping confidence corresponding to that frame, the first clipping confidence being used to distinguish normal audio signals from clipped audio signals; determining, according to the first clipping confidences of M consecutive time-domain signal frames, a second clipping confidence corresponding to the detection window formed by those M consecutive frames, where M is a positive integer smaller than N; and, in response to the second clipping confidence being greater than a preset first threshold, determining that broken sound exists within the corresponding detection window.
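As an illustration of the framing step, the following Python/NumPy sketch splits a signal into N frames and enumerates the M-frame detection windows. It is a minimal sketch, not the applicant's implementation: the frame length, the use of non-overlapping frames, dropping an incomplete trailing frame, and a one-frame hop between consecutive windows are all assumptions the text does not specify, and `split_frames` and `detection_windows` are hypothetical names.

```python
import numpy as np


def split_frames(signal, frame_len):
    """Split a 1-D signal into N non-overlapping frames of frame_len samples.

    Assumption: an incomplete trailing frame is discarded.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def detection_windows(n_frames, m):
    """Return (first, last_exclusive) frame indices of every M-frame window.

    Assumption: consecutive windows are offset by one frame.
    """
    if not 0 < m < n_frames:
        raise ValueError("M must be a positive integer smaller than N")
    return [(start, start + m) for start in range(n_frames - m + 1)]


frames = split_frames(np.arange(10), frame_len=3)
print(frames.shape)                          # (3, 3)
print(detection_windows(len(frames), m=2))   # [(0, 2), (1, 3)]
```

With a one-frame hop, N frames yield N - M + 1 detection windows, so every run of M consecutive frames is examined once.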

Optionally, determining the first clipping confidence according to the statistical distribution of signal amplitudes within each time-domain signal frame includes: for each time-domain signal frame, counting the number of samples in each amplitude segment to construct a statistical histogram, the amplitude segments being obtained by dividing the amplitude interval spanned by the maximum amplitude and the minimum amplitude of that frame; searching the statistical histogram for target blocks, a target block being a block whose two ends are both higher than its middle and whose back end is higher than its front end, the front end and back end being determined by the order of the search; determining the lateral distance between the two ends of each target block; and determining the first clipping confidence according to the ratio of the maximum of these lateral distances to the total number of amplitude segments.

Optionally, searching for the target blocks includes: starting from each of the two ends of the statistical histogram and moving step by step toward the middle to search for the target blocks.
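The histogram search can be sketched as follows. This is one possible reading of the target-block rule, not the applicant's code, under stated assumptions: the number of amplitude segments (`n_bins=64`) is hypothetical, front ends are taken only from the outer half of each walk, and the back end of a block is the first later bin that is not lower than its front end. For a heavily clipped frame, the samples piled up at the two rails form tall edge bins with lower bins between them, so the widest block spans almost the whole histogram; an unclipped, roughly Gaussian frame only yields narrow, noise-induced blocks.

```python
import numpy as np


def block_widths(counts, order):
    """Widths of the target blocks found while walking `order` inward.

    Assumed reading of the rule: a block runs from a front end at walk
    position a to a back end at position b > a + 1, every bin strictly
    between them is lower than both ends, and the back end is higher than
    the front end. Front ends are taken from the outer half of the walk only.
    """
    widths = []
    n = len(order)
    for a in range(n // 2):
        front = counts[order[a]]
        for b in range(a + 1, n):
            height = counts[order[b]]
            if height >= front:
                # All bins between a and b were strictly lower than the front end.
                if height > front and b > a + 1:
                    widths.append(b - a)
                break
    return widths


def first_clipping_confidence(frame, n_bins=64):
    """Widest target block divided by the number of amplitude segments."""
    # By default np.histogram splits [frame.min(), frame.max()] into n_bins segments.
    counts, _ = np.histogram(frame, bins=n_bins)
    left_to_right = list(range(n_bins))
    right_to_left = left_to_right[::-1]
    widths = block_widths(counts, left_to_right) + block_widths(counts, right_to_left)
    return max(widths, default=0) / n_bins


rng = np.random.default_rng(0)
clean = rng.standard_normal(20_000)            # unclipped, roughly Gaussian frame
clipped = np.clip(clean + 0.2, -1.0, 1.0)      # same frame, offset and hard-clipped
print(first_clipping_confidence(clean), first_clipping_confidence(clipped))
```

Because the segments are derived from each frame's own maximum and minimum, the effective threshold adapts per frame. Under this reading, a pure sinusoid (whose amplitude histogram is also U-shaped) would typically score high as well, which is one reason to combine the confidence with the window-level and clipping-ratio checks.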

Optionally, the method further includes: for each time-domain signal frame, determining a first clipping ratio of that frame according to the proportion of target samples in the frame, the target samples being the samples at the maximum amplitude value and at the minimum amplitude value; and determining a second clipping ratio of the corresponding detection window according to the first clipping ratios of M consecutive time-domain signal frames. Determining that broken sound exists within the detection window then includes: in response to the second clipping confidence being greater than the preset first threshold and the second clipping ratio being greater than a preset second threshold, determining that broken sound exists within the detection window.
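A sketch of the first clipping ratio and the combined time-domain decision, assuming that the target samples are identified by exact equality with the frame's maximum and minimum. The function names and the threshold values in the usage lines are hypothetical.

```python
import numpy as np


def first_clipping_ratio(frame):
    """Share of samples sitting exactly at the frame's maximum or minimum value."""
    frame = np.asarray(frame, dtype=float)
    at_extremes = (frame == frame.max()) | (frame == frame.min())
    # Note: a constant (e.g. silent) frame gives 1.0, so the ratio is only
    # meaningful together with the clipping confidence.
    return float(at_extremes.mean())


def window_has_broken_sound(conf2, ratio2, first_threshold, second_threshold):
    """Time-domain decision: both window-level statistics must exceed their thresholds."""
    return conf2 > first_threshold and ratio2 > second_threshold


print(first_clipping_ratio([0.1, 1.0, 1.0, -1.0, 0.5]))  # 0.6
# Hypothetical thresholds: 0.5 for the confidence, 0.05 for the ratio.
print(window_has_broken_sound(0.8, 0.1, 0.5, 0.05))       # True
```

A hard-clipped frame has many samples stuck at the rails, whereas an unclipped frame usually reaches its maximum and minimum only once each, so the ratio separates the two cases cheaply.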

Optionally, the method further includes: determining a frequency-domain energy feature of the time-domain signal frames within the detection window. Determining that broken sound exists within the detection window then includes: in response to the signal frames in the detection window not satisfying at least one of the following conditions, namely that the second clipping confidence is greater than the preset first threshold and that the second clipping ratio is greater than the preset second threshold, determining according to the frequency-domain energy feature whether broken sound exists within the detection window.

Optionally, the frequency-domain energy feature includes a cutoff frequency value, and determining the frequency-domain energy feature of the time-domain signal frames within the detection window includes: performing a time-frequency transform on each time-domain signal frame within the detection window to obtain a corresponding frequency-domain signal; and taking the energy centroid of the frequency-domain signal as the cutoff frequency value of that signal frame.

Optionally, determining according to the frequency-domain energy feature whether broken sound exists within the detection window includes: in response to the cutoff frequency value of a signal frame being greater than a frequency threshold, determining that clipping exists in that signal frame, the frequency threshold being determined based on the maximum frequency value of the frequency-domain signal; and, in response to at least a preset number of signal frames with clipping existing within the detection window, determining that broken sound exists in the detection window.
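The frequency-domain fallback might look like the sketch below. Assumptions, none of which the text fixes: the time-frequency transform is a plain real FFT without windowing, the energy centroid is weighted by power (squared magnitude), the "maximum frequency value" is taken as the Nyquist frequency, and `fraction=0.25` and `min_clipped_frames=3` are hypothetical placeholders for the unspecified frequency threshold and preset number of frames.

```python
import numpy as np


def cutoff_frequency(frame, sample_rate):
    """Energy centroid (Hz) of the frame's one-sided power spectrum."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    return float((freqs * power).sum() / total) if total > 0 else 0.0


def window_has_broken_sound_freq(frames, sample_rate, fraction=0.25, min_clipped_frames=3):
    """Frequency-domain fallback decision for one detection window.

    Hypothetical parameters: the frequency threshold is `fraction` times the
    maximum frequency (taken as Nyquist), and the window is flagged when at
    least `min_clipped_frames` of its frames have a cutoff above it.
    """
    freq_threshold = fraction * (sample_rate / 2)
    n_clipped = sum(cutoff_frequency(f, sample_rate) > freq_threshold for f in frames)
    return n_clipped >= min_clipped_frames


sr = 16_000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 500 * t)                           # energy concentrated at 500 Hz
noise = np.random.default_rng(1).standard_normal((4, 1024))  # broadband frames
print(round(cutoff_frequency(tone, sr)))                     # 500
print(window_has_broken_sound_freq(noise, sr))               # True
```

White noise here only stands in for a frame whose energy has spread toward high frequencies, as clipping tends to do by adding harmonics; it is not a clipped recording.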

Optionally, determining the second clipping confidence according to the first clipping confidences of M consecutive time-domain signal frames includes: performing first weighting processing on the first clipping confidences of the M time-domain signal frames within the detection window to obtain the second clipping confidence. Determining the second clipping ratio of the detection window according to the first clipping ratios of M consecutive time-domain signal frames includes: performing second weighting processing on the first clipping ratios of the M time-domain signal frames within the detection window to obtain the second clipping ratio.
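Both weighting steps can be expressed as a weighted sum over each M-frame window. The text does not give the weights, so this sketch assumes normalized weights (uniform by default) and a one-frame hop between windows; `weighted_window_values` is a hypothetical helper that works equally for first clipping confidences and first clipping ratios.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def weighted_window_values(frame_values, m, weights=None):
    """Weighted combination of M consecutive per-frame values for every window.

    Assumption: the weights are normalized to sum to 1; uniform when omitted.
    """
    values = np.asarray(frame_values, dtype=float)
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    if w.shape != (m,):
        raise ValueError("need exactly M weights")
    w = w / w.sum()
    # windows has shape (N - M + 1, M); row k holds frames k .. k + M - 1.
    windows = sliding_window_view(values, m)
    return windows @ w


conf1 = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]               # first clipping confidences
print(weighted_window_values(conf1, 3))              # uniform weights
print(weighted_window_values(conf1, 3, [1, 2, 1]))   # centre frame weighted more
```

Averaging over the window is what suppresses isolated clipped frames: one confident frame among M contributes at most its weight share to the second clipping confidence.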

Optionally, M is determined based on the minimum duration of continuous clipping that produces broken sound and the length of a time-domain signal frame.
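One way to derive M, assuming it is the smallest number of frames whose total length covers the minimum duration, i.e. M = ⌈T_min · f_s / L⌉ for a frame length of L samples at sample rate f_s. The 50 ms minimum duration in the usage line is a hypothetical value, not one given in the text.

```python
def frames_per_window(min_broken_duration_s, frame_len, sample_rate):
    """Smallest M whose window covers the minimum audible clipping duration."""
    min_samples = round(min_broken_duration_s * sample_rate)
    return max(1, -(-min_samples // frame_len))  # integer ceiling division


# Hypothetical numbers: 50 ms minimum duration, 1024-sample frames at 44.1 kHz.
print(frames_per_window(0.05, 1024, 44_100))  # prints 3
```

Converting the duration to whole samples first keeps the ceiling division in integers, avoiding floating-point off-by-one errors when the duration is an exact multiple of the frame length.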

In a second aspect of the embodiments of the present invention, a broken sound detection device is provided, including: a signal division module configured to divide an audio signal to be detected into N time-domain signal frames, where N is a positive integer; a first determination module configured to determine, according to the statistical distribution of signal amplitudes within each time-domain signal frame, a first clipping confidence corresponding to that frame, the first clipping confidence being used to distinguish normal audio signals from clipped audio signals; a second determination module configured to determine, according to the first clipping confidences of M consecutive time-domain signal frames, a second clipping confidence corresponding to the detection window formed by those M consecutive frames, where M is a positive integer smaller than N; and a broken sound determination module configured to determine, in response to the second clipping confidence being greater than a preset first threshold, that broken sound exists within the corresponding detection window.

Optionally, the first determination module includes: a histogram construction module configured to count, for each time-domain signal frame, the number of samples in each amplitude segment to construct a statistical histogram, the amplitude segments being obtained by dividing the amplitude interval spanned by the maximum amplitude and the minimum amplitude of that frame; a search module configured to search the statistical histogram for target blocks, a target block being a block whose two ends are both higher than its middle and whose back end is higher than its front end, the front end and back end being determined by the order of the search; a distance determination module configured to determine the lateral distance between the two ends of each target block; and a confidence determination module configured to determine the first clipping confidence according to the ratio of the maximum of these lateral distances to the total number of amplitude segments.

Optionally, the search module is further configured to start from each of the two ends of the statistical histogram and move step by step toward the middle to search for the target blocks.

Optionally, the device further includes a clipping ratio determination module configured to: for each time-domain signal frame, determine a first clipping ratio of that frame according to the proportion of target samples in the frame, the target samples being the samples at the maximum amplitude value and at the minimum amplitude value; and determine a second clipping ratio of the corresponding detection window according to the first clipping ratios of M consecutive time-domain signal frames. The broken sound determination module is further configured to determine, in response to the second clipping confidence being greater than the preset first threshold and the second clipping ratio being greater than a preset second threshold, that broken sound exists within the detection window.

Optionally, the device further includes a frequency-domain feature determination module configured to determine a frequency-domain energy feature of the time-domain signal frames within the detection window. The broken sound determination module is further configured to determine, according to the frequency-domain energy feature, whether broken sound exists within the detection window in response to the signal frames in the detection window not satisfying at least one of the following conditions: the second clipping confidence is greater than the preset first threshold; the second clipping ratio is greater than the preset second threshold.

Optionally, the frequency-domain energy feature includes a cutoff frequency value, and the frequency-domain feature determination module is further configured to: perform a time-frequency transform on each time-domain signal frame within the detection window to obtain a corresponding frequency-domain signal; and take the energy centroid of the frequency-domain signal as the cutoff frequency value of that signal frame.

Optionally, the broken sound determination module is further configured to: determine, in response to the cutoff frequency value of a signal frame being greater than a frequency threshold, that clipping exists in that signal frame, the frequency threshold being determined based on the maximum frequency value of the frequency-domain signal; and determine, in response to at least a preset number of signal frames with clipping existing within the detection window, that broken sound exists in the detection window.

Optionally, the second determination module is further configured to perform first weighting processing on the first clipping confidences of the M time-domain signal frames within the detection window to obtain the second clipping confidence; and the clipping ratio determination module is further configured to perform second weighting processing on the first clipping ratios of the M time-domain signal frames within the detection window to obtain the second clipping ratio.

Optionally, M is determined based on the minimum duration of continuous clipping that produces broken sound and the length of a time-domain signal frame.

In a third aspect of the embodiments of the present invention, a storage medium is provided, on which a program is stored; when executed by a processor, the program implements the method of any of the above embodiments.

In a fourth aspect of the embodiments of the present invention, an electronic device is provided, including a processor and a memory, the memory storing executable instructions and the processor being configured to call the executable instructions stored in the memory to perform the method of any of the above embodiments.

According to the broken sound detection method provided by the embodiments of the present disclosure, on the one hand, the first clipping confidence of each time-domain signal frame is determined from the statistical distribution of signal amplitudes within that frame, so as to distinguish normal audio signals from clipped audio signals. In effect, the clipping threshold is adjusted dynamically based on the statistical distribution of the signal, which addresses the limited clipping detection accuracy of existing fixed clipping detection thresholds and thereby ensures the accuracy of broken sound detection. On the other hand, the second clipping confidence of each detection window is determined from the first clipping confidences of M consecutive time-domain signal frames, and broken sound is detected with the detection window as the detection unit. This ensures that the detected broken sound is actually audible, avoids falsely flagging short-term clipping that produces no audible broken sound, and reduces the false detection rate. In addition, the present invention detects broken sound in the time domain, which keeps computational complexity low and detection efficiency high.

Description of Drawings

The above and other objects, features, and advantages of the exemplary embodiments of the present disclosure will become readily understood by reading the following detailed description with reference to the accompanying drawings, in which several embodiments of the present disclosure are shown by way of example rather than limitation:

Fig. 1 schematically shows a flowchart of a broken sound detection method according to an embodiment of the present disclosure.

Fig. 2 schematically shows a statistical histogram according to an embodiment of the present disclosure.

Fig. 3 schematically shows a flowchart of determining the first clipping confidence of a signal frame based on a statistical histogram according to an embodiment of the present disclosure.

Fig. 4 schematically shows a flowchart of a broken sound detection process according to an embodiment of the present disclosure.

Fig. 5 schematically shows a broken sound detection result for an audio signal according to an embodiment of the present disclosure.

Fig. 6 schematically shows a structural block diagram of a broken sound detection device according to an embodiment of the present disclosure.

Fig. 7 schematically shows the structure of an electronic device suitable for implementing an embodiment of the present invention.

In the drawings, identical or corresponding reference numerals denote identical or corresponding parts.

Detailed Description

The principle and spirit of the present disclosure are described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and implement the present disclosure, and do not limit its scope in any way. Rather, they are provided so that this disclosure is thorough and complete and fully conveys its scope to those skilled in the art.

Those skilled in the art will appreciate that the embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software.

According to embodiments of the present disclosure, a broken sound detection method and device, electronic equipment, and a storage medium are proposed.

Overview of the Invention

Broken sound: also known as popping sound, a phenomenon in which sound quality is noticeably impaired because the input audio signal exceeds the maximum range of the current digital representation of the audio signal. It manifests as a harsh sound that makes the original sound muddy and indistinct. On a spectrogram, the energy is high in all frequency bands.

Clipping: also known as amplitude limiting, a form of audio distortion in which a signal exceeding a specific threshold is limited to that threshold. On the waveform, the peaks and troughs are flattened and their values become the threshold. Recording equipment generally sets the threshold to 1 to prevent excessive power. Clipping beyond a certain degree produces audible broken sound.
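The definition can be illustrated with NumPy's `np.clip`: a sine with a peak of 1.5 (an arbitrary illustration value) is limited to the threshold of 1, and every sample that would exceed it is flattened to exactly ±1.

```python
import numpy as np

# A sine with peak 1.5 clipped at the conventional threshold of 1.0.
n = np.arange(64)
sine = 1.5 * np.sin(2 * np.pi * n / 32)          # two cycles, 32 samples each
clipped = np.clip(sine, -1.0, 1.0)               # peaks and troughs flattened to +/-1
flat = np.count_nonzero(np.abs(clipped) == 1.0)  # samples stuck at the threshold
print(clipped.max(), clipped.min(), flat)        # prints: 1.0 -1.0 36
```

More than half of the samples end up on the rails here, which is the flattened-peak shape that the statistical histogram and the clipping ratio described in the method are meant to pick up.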

Fast Fourier Transform (FFT): a fast algorithm for computing the discrete Fourier transform, used to transform a time-domain signal into the frequency domain. The IFFT is the inverse of the FFT, converting a frequency-domain signal back into the time domain.
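A short NumPy illustration of the FFT/IFFT pair for a real-valued signal, using `np.fft.rfft` and its inverse `np.fft.irfft`; the 500 Hz tone and the 16 kHz sample rate are arbitrary choices.

```python
import numpy as np

sample_rate = 16_000
t = np.arange(1024) / sample_rate
x = np.sin(2 * np.pi * 500 * t)            # 500 Hz tone, exactly 32 cycles

spectrum = np.fft.rfft(x)                  # FFT of a real signal: 513 bins, 0 to 8 kHz
freqs = np.fft.rfftfreq(len(x), d=1 / sample_rate)
peak_hz = freqs[np.argmax(np.abs(spectrum))]

x_back = np.fft.irfft(spectrum, n=len(x))  # IFFT: back to the time domain
print(peak_hz, np.allclose(x, x_back))
```

Passing `n=len(x)` to `irfft` matters: without it, an odd-length input would be reconstructed with the wrong length.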

Mel spectrum (Mel spectrogram): a two-dimensional signal representation (spectrogram) based on the short-time Fourier transform. A long signal is divided into frames, each frame is windowed and transformed with an FFT, and the per-frame results are stacked along another dimension; for a Mel spectrogram, the frequency axis of this spectrogram is additionally mapped onto the Mel scale with a Mel filter bank.
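The framing, windowing, FFT, and stacking steps of this definition can be sketched as below. The Hann window, `n_fft=512`, and `hop=256` are illustrative assumptions, `stft_magnitude` is a hypothetical helper, and the Mel filter bank step is deliberately left out.

```python
import numpy as np


def stft_magnitude(signal, n_fft=512, hop=256):
    """Frame, window, FFT, and stack: shape (n_frames, n_fft // 2 + 1).

    This covers the STFT spectrogram only; a Mel spectrogram would further
    multiply the power spectrum by a Mel filter bank (not shown).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per row (time axis first)


spec = stft_magnitude(np.random.default_rng(0).standard_normal(16_000))
print(spec.shape)  # (61, 257)
```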

Broken sound may occur because, while a microphone is capturing audio, the level of the audio signal exceeds the upper limit of the recording device's analog-to-digital converter. It may also occur because the audio signal was amplified during processing and the amplified result exceeds the maximum value representable at the audio's bit depth. Perceptually, broken sound manifests as a hoarse or noisy sound.

Broken sound detection determines whether broken sound exists in an audio signal and where it is located. Its results can be used to adjust recording parameters or audio processing parameters in real time to prevent broken sound, or to help a broken sound repair algorithm repair the detected positions in a targeted way.

In related technologies, broken sound detection can be divided by analysis domain into time-domain and frequency-domain detection: time-domain methods mainly analyze the values of the individual audio samples, while frequency-domain methods compute the spectrum and extract features from it for analysis. Both, however, detect broken sound by detecting whether clipped segments exist in the audio and where they are. In fact, short-term clipping does not cause audible broken sound; that is, audible broken sound is not fully equivalent to clipping.

To address these problems, the inventive concept of the present invention is as follows. The first clipping confidence of each time-domain signal frame is determined from the statistical distribution of the signal within that frame and used for clipping detection on that frame, which yields a dynamic, per-frame clipping detection threshold and improves clipping detection accuracy. In addition, a broken sound detection window formed by M consecutive time-domain signal frames is designed, and broken sound detection is performed per detection window, which reduces the rate at which short-term clipping is falsely detected as broken sound. Combining intra-frame features with features across multiple consecutive frames enables accurate detection of audible broken sound.

The broken sound detection method of the present application can be used in the audio processing of various applications with audio processing functions, such as social, music, video, and game applications. For social applications, the method can detect in real time whether and where broken sound occurs in speech during a call, so that device parameters or network status can be adjusted promptly. For music applications, the method can detect in real time whether and where broken sound occurs in the music being played, so that device parameters can be adjusted or another audio source switched to, preserving the user experience.

The methods and devices of the embodiments of the present disclosure can be applied to at least one electronic device, including but not limited to a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the broken sound detection method can be executed by software or hardware installed on a terminal device and/or a server device, and the software may be a blockchain platform. Servers include but are not limited to a single server, a server cluster, a cloud server, or a cloud server cluster. The present disclosure is described taking an electronic device as an example.

Exemplary Method

Preferred embodiments of the present disclosure are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention and are not intended to limit it; where no conflict arises, the embodiments of the present disclosure and the features in them may be combined with each other.

The broken sound detection method according to an exemplary embodiment of the present disclosure is described below with reference to Fig. 1 and may include steps S110 to S140.

Step S110: divide the audio signal to be detected into N time-domain signal frames, where N is a positive integer.

Step S120: determine a first clipping confidence for each time-domain signal frame according to the statistical distribution of the signal amplitudes within that frame. The first clipping confidence is used to distinguish normal audio signals from clipped audio signals.

Step S130: determine, according to the first clipping confidences of M consecutive time-domain signal frames, a second clipping confidence for the detection window formed by those M frames, where M is a positive integer smaller than N.

Step S140: in response to the second clipping confidence being greater than a preset first threshold, determine that broken sound is present in the corresponding detection window.

The broken sound detection method provided by the embodiments of the present disclosure has several advantages. First, the first clipping confidence of each frame is determined from the statistical distribution of the signal amplitudes within that frame, which distinguishes normal audio from clipped audio. The clipping threshold is thus effectively adjusted to the statistics of the signal itself. This solves the accuracy problem that fixed clipping-detection thresholds cause in existing methods and thereby ensures accurate broken sound detection. Second, the second clipping confidence of a detection window is determined from the first clipping confidences of M consecutive time-domain signal frames, and broken sound is detected with the detection window as the unit of detection. This ensures that only broken sound that is actually audible is detected. Short-lived clipping that produces no audible broken sound is therefore not falsely flagged, which lowers the false detection rate. Finally, the invention performs broken sound detection in the time domain, so its computational complexity is low and its detection efficiency is high.

Each of the above steps is described in detail below.

In step S110, the audio signal to be detected is divided into N time-domain signal frames.

In this exemplary embodiment, the audio signal to be detected may be a time-domain audio signal from any source, such as recorded audio, real-time call audio, or music audio to be played; this example does not limit the source. The signal may be divided into frames of a fixed duration, such as 5 ms, 10 ms, or 15 ms, or according to a set frame length and overlap rate; this example does not limit the choice. The overlap rate is the degree to which adjacent frames overlap. When frames overlap, two adjacent frames share a segment of a certain length, which compensates for the part of the real signal that window-function weighting attenuates near the frame edges. N is the number of frames obtained and may be 1 or any integer greater than 1. A timestamp may also be recorded for each frame to identify the frame and its position in the sequence.
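The framing in step S110 can be sketched as follows, as a minimal example that assumes a mono NumPy array. The helper name `split_into_frames`, the 10 ms default, the hop size computed as `frame_len * (1 - overlap)`, and dropping trailing samples that do not fill a whole frame are all illustrative assumptions rather than details from the text.

```python
import numpy as np


def split_into_frames(x, sr, frame_ms=10.0, overlap=0.0):
    """Split a mono signal into fixed-length, possibly overlapping frames.

    Hypothetical helper: trailing samples that do not fill a whole frame are
    dropped. Returns the frames (N x frame_len) and each frame's start time in seconds.
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(round(sr * frame_ms / 1000.0))
    # Hop between frame starts; overlap=0.5 means adjacent frames share half their samples.
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    if x.size < frame_len:
        return np.empty((0, frame_len)), np.empty(0)
    n_frames = 1 + (x.size - frame_len) // hop
    starts = np.arange(n_frames) * hop
    frames = np.stack([x[s:s + frame_len] for s in starts])
    timestamps = starts / sr  # per-frame timestamp, used to keep frame order
    return frames, timestamps
```

For example, one second of 16 kHz audio with 10 ms frames gives 100 frames of 160 samples without overlap, and 199 frames with 50% overlap.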

In step S120, the first clipping confidence of each frame is determined according to the statistical distribution of the signal amplitudes within that time-domain signal frame.

In this exemplary embodiment, the distribution of signal amplitudes is computed for each time-domain signal frame, and the first clipping confidence is determined from the result. The first clipping confidence is then used to distinguish normal audio signals from clipped audio signals.

For example, the first clipping confidence may be determined through the following steps.

Step 1: for each time-domain signal frame, count the number of samples falling into each amplitude segment to build a statistical histogram.

In this exemplary embodiment, the amplitude interval spanned by the maximum and minimum amplitudes of the time-domain signal frame may be divided into segments. For example, the interval may be divided evenly into K segments, K being a positive integer. The number of samples in each amplitude segment is counted to build a statistical histogram, as shown in FIG. 2. The horizontal axis of the histogram is the index of the amplitude segment (1, 2, ..., K), and the vertical axis is the number of samples.
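Step 1 maps directly onto NumPy's equal-width histogram. A minimal sketch follows; the function name and the default K = 64 are illustrative assumptions. By default, `np.histogram` takes its range from the array's own minimum and maximum, which matches the amplitude interval described above.

```python
import numpy as np


def amplitude_histogram(frame, K=64):
    """Count samples in K equal-width amplitude segments spanning [min, max] of the frame."""
    # np.histogram uses (frame.min(), frame.max()) as its range by default,
    # so the K bins cover exactly the frame's amplitude interval.
    counts, edges = np.histogram(np.asarray(frame, dtype=float), bins=K)
    return counts, edges
```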

Step 2: find the target blocks in the statistical histogram.

In this exemplary embodiment, a target block is a block of the statistical histogram whose two ends are both higher than its middle and whose back end is higher than its front end. The front end and back end are determined by the order in which the histogram is traversed: the bar visited first is the front end of the block, and the bar visited later is its back end. A target block is shown in the dashed box in FIG. 2. Target blocks can be found by iterative traversal. When searching from the left end of the histogram to the right end, a target block is a block whose two ends are higher than its middle and whose right end is higher than its left end. When searching from the right end to the left end, a target block is a block whose two ends are higher than its middle and whose left end is higher than its right end. The search may also start from both ends of the histogram at the same time and move step by step toward the middle, which speeds up the search.

Step 3: determine the lateral distance between the two ends of each target block.

In this exemplary embodiment, the lateral distance can be obtained by subtracting the index of the block's left end from the index of its right end. In FIG. 2, the lateral distance is k2 - k1, where 1 ≤ k1 < k2 ≤ K.

Step 4: determine the first clipping confidence from the ratio of the largest lateral distance to the total number of amplitude segments.

In this exemplary embodiment, when there is only one target block, the first clipping confidence may be obtained by dividing that block's lateral distance by the total number of amplitude segments K. When there is more than one target block, the target block with the largest lateral distance may be used to determine the first clipping confidence.

For example, let H(k) be the number of samples falling into the k-th of the K amplitude segments. As shown in FIG. 3, the first clipping confidence may be determined through the following steps.

Step S301: initialize the left pointer kl = 1 and the right pointer kr = K; the left reference height Yl0 = H(kl) and the right reference height Yr0 = H(kr); the left distance dl (the lateral distance from the left pointer to the left end of the current block, initially the left end of the histogram) and the right distance dr (the lateral distance from the right pointer to the right end of the current block, initially the right end of the histogram) to 0, i.e., dl = dr = 0; and the maximum lateral distance parameter Dmax = 0.

Step S302: increase the left pointer kl by 1 and the left distance dl by 1.

Step S303: check whether the bar height H(kl) at the left pointer is greater than Yl0. If so, go to step S304; otherwise, go to step S308.

Step S304: update Yl0 = H(kl) and reset dl to 0. Steps S302 to S304 form the traversal from the left end toward the middle.

Step S305: decrease the right pointer kr by 1 and increase the right distance dr by 1.

Step S306: check whether the bar height H(kr) at the right pointer is greater than Yr0. If so, go to step S307; otherwise, go to step S308.

Step S307: update Yr0 = H(kr) and reset dr to 0. Steps S305 to S307 form the traversal from the right end toward the middle.

Step S308: update Dmax to the largest of the current Dmax, dl, and dr.

Step S309: check whether kr is smaller than kl. If so, go to step S310; otherwise, return to steps S302 and S305 for the next iteration.

Step S310: determine the first clipping confidence of the frame as Rcl = Dmax / K.

Steps S302-S303 and S305-S306 may be performed in parallel. The first clipping confidence of every signal frame can be determined and recorded through this process.
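Steps S301 to S310 can be sketched as a single loop. This is a minimal illustration, assuming a histogram given as a Python sequence. It uses 0-based indices where the text uses 1-based ones; the comparison `kr < kl` is unaffected by that shift. The two pointer updates run one after the other in each iteration, which is one way to realize the "in parallel" execution mentioned above.

```python
def first_clipping_confidence(hist):
    """Two-pointer scan of an amplitude histogram, following steps S301-S310.

    hist[k] is the sample count of amplitude segment k (0-based here, 1-based
    in the text). Returns Rcl = Dmax / K.
    """
    K = len(hist)
    if K < 2:
        raise ValueError("need at least two amplitude segments")
    # S301: pointers at both ends, reference heights, distances, Dmax.
    kl, kr = 0, K - 1
    yl0, yr0 = hist[kl], hist[kr]
    dl = dr = 0
    dmax = 0
    while True:
        # S302-S304: left pointer moves right; a higher bar becomes the new reference.
        kl += 1
        dl += 1
        if hist[kl] > yl0:
            yl0, dl = hist[kl], 0
        # S305-S307: right pointer moves left, symmetrically.
        kr -= 1
        dr += 1
        if hist[kr] > yr0:
            yr0, dr = hist[kr], 0
        # S308: keep the largest lateral distance seen so far.
        dmax = max(dmax, dl, dr)
        # S309: stop once the pointers have crossed.
        if kr < kl:
            break
    # S310
    return dmax / K
```

On a bell-shaped histogram such as [1, 2, 4, 8, 16, 8, 4, 2, 1], the scan returns 1/9. A histogram with tall bars at both extremes, such as [50, 2, 1, 1, 2, 3, 2, 1, 1, 2, 50], returns 6/11. Frames whose samples pile up at the extreme amplitudes therefore score higher.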

In step S130, a second clipping confidence is determined for the detection window formed by M consecutive time-domain signal frames, according to the first clipping confidences of those M frames.

In this exemplary embodiment, M is a positive integer smaller than N. M may be determined from the length of a time-domain signal frame and the minimum duration of continuous clipping that produces broken sound. This minimum duration can be set from experience and the actual situation; for example, it may be 0.4 seconds. M may be the ratio of this minimum duration to the frame length, rounded up. Alternatively, the ratio may first be scaled up appropriately and then rounded. This example does not limit the choice. Every M consecutive time-domain signal frames form one detection window for broken sound, and broken sound detection is performed on each detection window.
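The unscaled choice of M can be written as follows. Here T_min denotes the minimum duration of continuous clipping that produces broken sound and T_frame the frame length; both symbol names are introduced only for illustration. The two numeric cases apply the 0.4 s example to frame lengths mentioned in step S110.

```latex
M = \left\lceil \frac{T_{\min}}{T_{\text{frame}}} \right\rceil,
\qquad
\left\lceil \frac{0.4\,\text{s}}{10\,\text{ms}} \right\rceil = 40,
\qquad
\left\lceil \frac{0.4\,\text{s}}{15\,\text{ms}} \right\rceil = \lceil 26.\overline{6} \rceil = 27.
```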

In this exemplary embodiment, an aggregation operation may be applied to the first clipping confidences of the M consecutive time-domain signal frames to determine the second clipping confidence. For example, the operation may be averaging, weighting, or another aggregation; this example does not limit it.

For example, first weighting processing may be applied to the first clipping confidences of the M time-domain signal frames within the detection window to obtain the second clipping confidence.

In this exemplary embodiment, the first weighting processing is a weighted sum, and the weights of the M signal frames may sum to 1. For example, Gaussian weighting may be applied to the first clipping confidences of the M signal frames to obtain the second clipping confidence. In Gaussian weighting, values at different positions receive different weights when the average is computed. The weights follow a Gaussian distribution: small at the two ends of the window and large in the middle.
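Gaussian weighting over a window of M per-frame values can be sketched as below. The helper name and the width parameter `sigma_frac` (the standard deviation as a fraction of M) are illustrative assumptions, because the text does not specify how wide the Gaussian is. The weights are normalized to sum to 1, as described.

```python
import numpy as np


def gaussian_weighted_score(values, sigma_frac=0.25):
    """Aggregate M per-frame values into one window value with Gaussian weights.

    Weights follow exp(-0.5 * ((n - c) / sigma)^2), centred on the middle of the
    window and normalised to sum to 1, so edge frames count less than central ones.
    sigma_frac (sigma as a fraction of M) is an illustrative choice.
    """
    v = np.asarray(values, dtype=float)
    m = v.size
    n = np.arange(m)
    centre = (m - 1) / 2.0
    sigma = sigma_frac * m
    w = np.exp(-0.5 * ((n - centre) / sigma) ** 2)
    w /= w.sum()
    return float(np.dot(w, v))
```

Because the weights sum to 1, a window of identical values keeps that value. A spike in the middle of the window counts for more than the same spike at its edge.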

In step S140, in response to the second clipping confidence being greater than the preset first threshold, it is determined that broken sound is present in the corresponding detection window.

In this exemplary embodiment, the first threshold is the broken sound threshold for a detection window and can be set according to the actual situation. When the second clipping confidence of a detection window is greater than the first threshold, the window can be considered to contain broken sound.

In some embodiments, a clipping ratio may be added as a broken sound detection feature in addition to the clipping confidence. In that case, the method may further include the following steps.

For each time-domain signal frame, determine a first clipping ratio of the frame according to the proportion of target samples in that frame. Then determine a second clipping ratio of the corresponding detection window according to the first clipping ratios of the M consecutive time-domain signal frames.

In this exemplary embodiment, the target samples are those whose values equal the frame's maximum or minimum amplitude. Clipping, in essence, cuts any sample whose absolute value exceeds a threshold down to that threshold. A clipped segment therefore necessarily contains multiple sampling points whose values equal the threshold, and the first clipping ratio exploits this. For each time-domain signal frame, first determine the maximum and minimum amplitude values, and count the number n1 of sampling points in the frame that take either value. The first clipping ratio of the frame is then n1/n2, where n2 is the total number of sampling points in the frame.
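The per-frame clipping ratio n1/n2 is a direct count. A minimal sketch follows (the function name is illustrative):

```python
import numpy as np


def first_clipping_ratio(frame):
    """n1 / n2: share of samples equal to the frame's maximum or minimum value."""
    x = np.asarray(frame, dtype=float)
    # Clipped segments hold many samples at exactly the clip level,
    # so exact equality with the extremes is what gets counted.
    n1 = np.count_nonzero((x == x.max()) | (x == x.min()))
    return n1 / x.size
```

Exact equality suits integer PCM input. For floating-point audio that was rescaled after clipping, a small tolerance around the extremes may be needed; this is an assumption beyond the text.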

In this exemplary embodiment, an aggregation operation may be applied to the first clipping ratios of the M consecutive time-domain signal frames to determine the second clipping ratio. For example, the operation may be averaging, weighting, or another aggregation; this example does not limit it. For instance, second weighting processing may be applied to the first clipping ratios of the M time-domain signal frames within the detection window to obtain the second clipping ratio. The second weighting processing may be Gaussian weighting, and it may be the same as or different from the first weighting processing; this example does not limit it.

The broken sound detection strategy based on both clipping confidence and clipping ratio is as follows. If the second clipping confidence is greater than the preset first threshold and the second clipping ratio is greater than a preset second threshold, broken sound is determined to be present in the detection window.

In this exemplary embodiment, the second threshold is the broken sound threshold along the clipping-ratio dimension and can be set according to the actual situation. Deciding whether a detection window contains broken sound from features in both dimensions, clipping confidence and clipping ratio, can improve detection accuracy.

In practice, besides segments with obvious clipping, there are also cases where the signal values only approach the clipping threshold and yet broken sound is audible. For such cases, the present invention relies on a property of clipping: clipping produces more high-frequency energy than normal audio does. These cases are detected by adding frequency-domain features.

In some embodiments, the method further includes determining a frequency-domain energy feature of the time-domain signal frames within the detection window.

In this exemplary embodiment, the frequency-domain energy feature may include various frequency-domain energy indicators, such as power, maximum energy, energy centroid, or cutoff frequency; this example does not limit the choice.

For example, when the frequency-domain energy feature is a cutoff frequency value, each time-domain signal frame within the detection window may first be transformed from the time domain to the frequency domain to obtain the corresponding frequency-domain signal. The energy centroid of that frequency-domain signal is then taken as the cutoff frequency value of the frame.

In this exemplary embodiment, each signal frame may be windowed and transformed with an FFT to obtain its Mel spectrum, and the centroid of the energy distribution of that Mel spectrum, i.e., the frequency Fc, is then determined. The centroid of the energy distribution may be determined as the ratio of the sum of the energies at all frequency positions to the total number of frequency positions, where the energy at a position is the product of its amplitude and its frequency value.
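A frame-level energy centroid can be sketched as follows. This sketch deliberately uses the conventional spectral centroid: it works on a linear-frequency FFT magnitude spectrum rather than a Mel spectrum, and it normalizes by the total magnitude rather than by the number of frequency positions, so that the result comes out in Hz and can be compared with a frequency threshold. The Hann window and the function name are illustrative assumptions.

```python
import numpy as np


def spectral_centroid_hz(frame, sr):
    """Magnitude-weighted mean frequency of one Hann-windowed frame, in Hz."""
    x = np.asarray(frame, dtype=float)
    mag = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sr)
    total = mag.sum()
    if total == 0.0:
        return 0.0  # silent frame: no energy to locate
    return float(np.dot(freqs, mag) / total)
```

A pure 1 kHz tone yields a centroid close to 1 kHz. White noise, with energy spread across the whole band, yields a centroid near the middle of the band, which is the kind of high-frequency shift that clipping produces.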

When the signal frames in a detection window do not satisfy the condition that the second clipping confidence is greater than the preset first threshold and/or the second clipping ratio is greater than the preset second threshold, whether the detection window contains broken sound can be determined based on the cutoff frequency value.

For example, the cutoff frequency value of each signal frame may be compared with a frequency threshold. If the cutoff frequency value is greater than the threshold, the frame is determined to contain clipping. The frequency threshold may be determined from the maximum frequency of the frequency-domain signal; for example, it may be set to 1/4 to 1/3 of that maximum. Next, determine whether the number of clipped signal frames in the detection window reaches a preset number, such as 2. If it does, the detection window is determined to contain broken sound. This frequency-domain energy feature prevents missed detections in the special case where there is no clipping yet broken sound is audible, which improves detection accuracy.
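The frequency-based window check can be sketched as follows. Two points here are assumptions. First, the "maximum frequency" is taken to be the Nyquist frequency sr / 2. Second, `fraction` defaults to 1/3, the upper end of the stated range. As in this paragraph, the count only has to reach `preset_count`.

```python
def window_flagged_by_centroid(centroids_hz, sr, fraction=1 / 3, preset_count=2):
    """Return True if at least preset_count frames exceed the frequency threshold.

    The threshold is fraction * (sr / 2), i.e. a fraction of the highest
    representable frequency (assumed to be the Nyquist frequency).
    """
    threshold = fraction * sr / 2.0
    n_clipped = sum(1 for c in centroids_hz if c > threshold)
    return n_clipped >= preset_count
```

At 16 kHz the threshold is about 2667 Hz. A window whose frame centroids are [1000, 3000, 500, 2800] Hz has two frames above it and is flagged; dropping the 2800 Hz frame leaves only one, and the window is not flagged.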

The specific flow of a broken sound detection method according to an embodiment of the present application is described below with reference to FIG. 4.

Step S401: divide the audio signal to be detected into N time-domain signal frames.

Step S402: determine the first clipping confidence of each frame according to the statistical distribution of the signal amplitudes within that time-domain signal frame.

Step S403: apply the first weighting processing to the first clipping confidences of the M time-domain signal frames within the detection window to obtain the second clipping confidence.

Step S404: for each time-domain signal frame, determine the first clipping ratio of the frame according to the proportion of target samples in it, the target samples being those whose values equal the maximum or minimum amplitude.

Step S405: apply the second weighting processing to the first clipping ratios of the M time-domain signal frames within the detection window to obtain the second clipping ratio.

Step S406: apply a time-frequency transform to each time-domain signal frame within the detection window to obtain the corresponding frequency-domain signal.

Step S407: take the energy centroid of the frequency-domain signal as the cutoff frequency value of the signal frame.

Step S408: determine whether the second clipping confidence of the current detection window is greater than the first threshold and whether the second clipping ratio is greater than the second threshold. If both hold, go to step S409; otherwise, go to step S410.

Step S409: determine that broken sound is present in the current detection window.

Step S410: determine whether the number of target signal frames in the current detection window is greater than the preset number. If so, go to step S409; otherwise, go to step S411. A target signal frame is a signal frame whose cutoff frequency value is greater than the frequency threshold.

Step S411: slide the window forward by one step (for example, one frame), return to step S402, and perform broken sound detection on the next detection window.

In the above embodiment, the three broken sound features can be computed in parallel to improve detection efficiency: the clipping confidence (S402, S403), the clipping ratio (S404, S405), and the frequency-domain energy feature (S406, S407).

The details of this embodiment have already been described for the broken sound detection method above and are not repeated here.
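The window-level decision of steps S403 to S411 can be sketched as a loop over precomputed per-frame features: first clipping confidence, first clipping ratio, and centroid in Hz. The function name and every threshold default (`t1`, `t2`, `freq_fraction`, `preset_count`, `sigma_frac`) are hypothetical placeholders, not values from the text. The Gaussian weighting and the Nyquist-based frequency threshold are the same assumptions as in the earlier sketches. Step S410 is implemented as a strict "greater than" comparison, as that step is worded.

```python
import numpy as np


def detect_broken_sound_windows(rcl, clip_ratio, centroid_hz, sr, M,
                                t1=0.35, t2=0.05, freq_fraction=1 / 3,
                                preset_count=2, sigma_frac=0.25):
    """Slide an M-frame window one frame at a time over per-frame features and
    return the start indices of windows judged to contain broken sound.

    All threshold defaults are illustrative placeholders, not values from the text.
    """
    rcl = np.asarray(rcl, dtype=float)
    clip_ratio = np.asarray(clip_ratio, dtype=float)
    centroid_hz = np.asarray(centroid_hz, dtype=float)
    # Gaussian weights shared by S403 and S405, normalised to sum to 1.
    n = np.arange(M)
    w = np.exp(-0.5 * ((n - (M - 1) / 2.0) / (sigma_frac * M)) ** 2)
    w /= w.sum()
    f_threshold = freq_fraction * sr / 2.0
    flagged = []
    for start in range(len(rcl) - M + 1):  # S411: step of one frame
        win = slice(start, start + M)
        conf2 = float(np.dot(w, rcl[win]))          # S403: second clipping confidence
        ratio2 = float(np.dot(w, clip_ratio[win]))  # S405: second clipping ratio
        if conf2 > t1 and ratio2 > t2:              # S408 -> S409
            flagged.append(start)
        elif np.count_nonzero(centroid_hz[win] > f_threshold) > preset_count:  # S410 -> S409
            flagged.append(start)
    return flagged
```

In this sketch, a window is reported as soon as either branch fires. The clipping branch catches sustained clipping, and the centroid branch catches windows whose frames are not clipped but carry unusually high-frequency energy.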

Experimental verification

An audio signal was input into an electronic device implementing the broken sound detection method proposed by the present invention. The detection result is shown in FIG. 5. The parts enclosed by dashed boxes are the detected broken sound segments, located at the beginning and end of the audio signal, and these parts contain audible broken sound. The detected segments also show clipping sustained over a relatively long time. The middle part of the audio signal also contains clipping, but the method did not flag it as broken sound, because that part is clipped without audible broken sound. An existing approach that detects broken sound directly from clipping would clearly judge the middle part to contain broken sound. The experiment therefore shows that the method of the present invention detects broken sound more accurately.

Although the vast majority of broken sound is caused by signal clipping, clipping is not equivalent to broken sound, because only clipping that lasts for a certain duration produces audible broken sound. On this basis, the present invention first extracts multiple clipping features within the detection window, such as clipping confidence and clipping ratio. It then analyzes these features jointly over signal frames spanning a certain duration (the detection window). This effectively avoids false detection of audio segments that are clipped but have no broken sound. In addition, the invention adds frequency-domain energy features to handle the scenario in which the signal is not clipped but its energy is high enough that broken sound is audible. This effectively prevents missed detections in that scenario. Specifically, a high energy centroid is a common characteristic both of clipped frames and of frames in this special scenario, so the scenario is detected based on the energy centroid, which lowers the missed detection rate.

The present invention also determines the clipping confidence from the statistical distribution of the signal. This removes the need to set a clipping threshold and thus avoids clipping detection failures caused by an unsuitable threshold choice. Moreover, the clipping confidence is determined from time-domain signal frames, which greatly reduces computational complexity compared with a frequency-domain procedure. The resulting clipping confidence also varies continuously between 0 and 1. Compared with a conventional binary decision, this gives the subsequent broken sound detection strategy greater flexibility.

Exemplary device

The broken sound detection method provided by the embodiments of the present disclosure may be executed by a corresponding device. A broken sound detection device according to an exemplary embodiment of the present disclosure is described below with reference to FIG. 6.

FIG. 6 schematically shows a block diagram of a broken sound detection device according to an embodiment of the present invention.

Referring to FIG. 6, a broken sound detection device 600 according to an embodiment of the present invention may include a signal division module 610, a first determination module 620, a second determination module 630, and a broken sound determination module 640, where:
- the signal division module 610 may be configured to divide the audio signal to be detected into N time-domain signal frames, N being a positive integer;
- the first determination module 620 may be configured to determine the first clipping confidence of each time-domain signal frame according to the statistical distribution of the signal amplitudes within that frame, the first clipping confidence being used to distinguish normal audio signals from clipped audio signals;
- the second determination module 630 may be configured to determine, according to the first clipping confidences of M consecutive time-domain signal frames, the second clipping confidence of the detection window formed by those M frames, M being a positive integer smaller than N; and
- the broken sound determination module 640 may be configured to determine that broken sound is present in the corresponding detection window in response to the second clipping confidence being greater than the preset first threshold.

在本公开的一些实施例中,基于前述方案,第一确定模块620包括:直方图构建模块,被配置为对于每帧时域信号帧,统计各幅值区段内的信号数量,以构建统计直方图;幅值区段为对由该时域信号帧的最大幅值与最小幅值形成的幅值区间进行划分获得的;查找模块,被配置为查找统计直方图中的目标区块,目标区块为区块两端均高于区块中间,且区块后端高于区块前端的区块,区块前端和区块后端为根据查找的先后顺序确定的;距离确定模块,被配置为确定各目标区块中区块两端的横向距离;置信度确定模块,被配置为根据各横向距离中的最大值与幅值区段的总数量的比值,确定第一削波置信度。In some embodiments of the present disclosure, based on the foregoing solution, the first determination module 620 includes: a histogram construction module configured to, for each time-domain signal frame, count the number of signals in each amplitude segment to construct statistics Histogram; the amplitude section is obtained by dividing the amplitude interval formed by the maximum amplitude and minimum amplitude of the time domain signal frame; the search module is configured to search for the target block in the statistical histogram, the target A block is a block in which both ends of the block are higher than the middle of the block, and the back end of the block is higher than the front end of the block. The front end of the block and the back end of the block are determined according to the order of search; the distance determination module is It is configured to determine the lateral distance between both ends of the block in each target block; the confidence degree determining module is configured to determine the first clipping confidence degree according to the ratio of the maximum value in each lateral distance to the total number of amplitude segments.

In some embodiments of the present disclosure, based on the foregoing solution, the search module is further configured to search for target blocks starting from each end of the statistical histogram and moving step by step toward the middle.
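
One possible reading of the histogram search in the two paragraphs above is sketched below, assuming NumPy. It is an interpretation, not the authors' code: from each end of the histogram, the edge bin is taken as the front end, the search moves inward over bins lower than it, and the first bin strictly higher than the front end is taken as the back end. A block with at least one lower bin between its ends counts as a target block, and its lateral distance is measured in bins. The default of 100 amplitude segments and the name `first_clipping_confidence` are hypothetical.

```python
import numpy as np

def first_clipping_confidence(frame, num_bins=100):
    """Histogram-based first clipping confidence for one frame (interpretive sketch)."""
    frame = np.asarray(frame, dtype=float)
    lo, hi = frame.min(), frame.max()
    if hi <= lo:
        return 0.0  # constant frame: the amplitude interval cannot be divided
    # Amplitude segments: num_bins equal slices of this frame's [min, max].
    counts, _ = np.histogram(frame, bins=num_bins, range=(lo, hi))

    def valley_width(h):
        """Lateral distance of the target block starting at h[0], or 0 if none."""
        front = h[0]                      # first bin reached by the search
        j = 1
        while j < len(h) and h[j] < front:
            j += 1                        # middle of the block: lower than the front end
        # The back end must exist, be strictly higher than the front end,
        # and enclose at least one lower bin.
        if j < len(h) and j >= 2 and h[j] > front:
            return j
        return 0

    # Search inward from the low-amplitude end and from the high-amplitude end.
    widths = (valley_width(counts), valley_width(counts[::-1]))
    return max(widths) / num_bins
```

The idea this reading captures is that clipping piles samples into the outermost bins and empties the bins just inside them, leaving a wide valley next to each edge, whereas a peaky, roughly Laplacian music-like distribution rises steadily from the edges and yields no valley. Under this reading, a frame so heavily clipped that its edge spike exceeds every other bin also scores 0, because no back end is higher than the front end.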

In some embodiments of the present disclosure, based on the foregoing solution, the apparatus 600 further includes a clipping ratio determination module configured to: for each time-domain signal frame, determine a first clipping ratio of the frame from the proportion of target samples in that frame, the target samples being the samples at the frame's maximum and minimum amplitude values; and determine a second clipping ratio of the corresponding detection window from the first clipping ratios of the M consecutive time-domain signal frames. The broken sound determination module 640 is further configured to determine that broken sound is present in the detection window in response to the second clipping confidence being greater than the preset first threshold and the second clipping ratio being greater than a preset second threshold.
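
The first clipping ratio can be sketched as the share of samples sitting at the frame's extreme values. Exact equality suits hard-clipped PCM; the `atol` tolerance for decoded floating-point audio is an added assumption, as is the function name. Note that under this definition a constant (for example, digitally silent) frame scores 1.0, so the ratio should not be used on its own, only together with the clipping confidence as described above.

```python
import numpy as np

def first_clipping_ratio(frame, atol=0.0):
    """Fraction of samples at the frame's maximum or minimum amplitude.

    atol is a hypothetical tolerance for floating-point audio whose clipped
    samples may not be bit-identical; 0.0 requires exact equality.
    """
    frame = np.asarray(frame, dtype=float)
    at_max = np.abs(frame - frame.max()) <= atol
    at_min = np.abs(frame - frame.min()) <= atol
    return np.count_nonzero(at_max | at_min) / frame.size
```

For the nine samples `[-1, -1, -1, -0.5, 0, 0.5, 1, 1, 1]` the ratio is 6/9, while an unclipped ramp from -1 to 1 with nine samples gives 2/9.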

In some embodiments of the present disclosure, based on the foregoing solution, the apparatus 600 further includes a frequency-domain feature determination module configured to determine the frequency-domain energy features of the time-domain signal frames within the detection window. The broken sound determination module 640 is further configured so that, if the detection window fails at least one of the following conditions: (1) the second clipping confidence is greater than the preset first threshold; (2) the second clipping ratio is greater than the preset second threshold, it determines from the frequency-domain energy features whether broken sound is present in that detection window.
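
Failing "at least one" of the two conditions means the frequency-domain check acts as a fallback that runs only when the two time-domain tests do not both pass. A minimal sketch of that control flow follows; passing the spectral test as a zero-argument callable (so it is evaluated only when needed) and the name `decide_window` are illustrative choices, not details from the patent.

```python
from typing import Callable

def decide_window(second_conf: float, second_ratio: float,
                  first_threshold: float, second_threshold: float,
                  spectral_check: Callable[[], bool]) -> bool:
    """Broken-sound decision for one detection window."""
    if second_conf > first_threshold and second_ratio > second_threshold:
        return True  # both time-domain conditions hold
    # At least one time-domain condition failed: fall back to the spectrum.
    return spectral_check()
```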

In some embodiments of the present disclosure, based on the foregoing solution, the frequency-domain energy features include a cutoff frequency value, and the frequency-domain feature determination module is further configured to: apply a time-frequency transform to each time-domain signal frame within the detection window to obtain the corresponding frequency-domain signal; and take the energy center of gravity of the frequency-domain signal as the cutoff frequency value of that signal frame.
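
Reading "energy center of gravity" as the spectral centroid (the energy-weighted mean frequency), the cutoff frequency value of a frame could be computed with NumPy's real FFT as below. The Hann window and the function name are assumptions; the text only says a time-frequency transform is applied.

```python
import numpy as np

def cutoff_frequency(frame, sample_rate):
    """Energy centre of gravity (spectral centroid) of one frame, in Hz."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))       # taper to reduce leakage (assumption)
    energy = np.abs(np.fft.rfft(windowed)) ** 2     # power spectrum from DC to Nyquist
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = energy.sum()
    if total == 0.0:
        return 0.0                                   # silent frame: no energy to weight
    return float(np.dot(freqs, energy) / total)
```

Clipping adds odd harmonics, so a clipped tone has a higher centroid than the clean tone it came from: for a 500 Hz sine sampled at 16 kHz the clean centroid sits at about 500 Hz, and clipping the same sine at one third of its peak moves it noticeably higher.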

In some embodiments of the present disclosure, based on the foregoing solution, the broken sound determination module 640 is further configured to: determine that a signal frame is clipped in response to its cutoff frequency value being greater than a frequency threshold, the frequency threshold being determined from the maximum frequency value of the frequency-domain signal; and determine that broken sound is present in the detection window in response to the window containing at least a preset number of clipped signal frames.
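
The frame-level and window-level frequency tests can be sketched as follows. The text states only that the frequency threshold is derived from the maximum frequency of the frequency-domain signal; expressing it as a fixed fraction `ratio` of that maximum (typically the Nyquist frequency), and the defaults `ratio=0.25` and `min_frames=3`, are hypothetical.

```python
def spectral_window_check(cutoffs, max_freq, ratio=0.25, min_frames=3):
    """Flag a detection window from the cutoff frequencies of its frames.

    Assumptions: frequency threshold = ratio * max_freq; ratio and
    min_frames (the preset number of clipped frames) are tuning values.
    """
    freq_threshold = ratio * max_freq
    clipped_frames = sum(1 for c in cutoffs if c > freq_threshold)  # frames judged clipped
    return clipped_frames >= min_frames
```

With cutoffs of 500, 2500, 3000, 2600 and 400 Hz and a maximum frequency of 8000 Hz, the threshold is 2000 Hz and three frames exceed it, so the window is flagged with `min_frames=3` but not with `min_frames=4`.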

In some embodiments of the present disclosure, based on the foregoing solution, the second determination module 630 is further configured to apply a first weighting to the first clipping confidences of the M time-domain signal frames within the detection window to obtain the second clipping confidence; and the clipping ratio determination module is further configured to apply a second weighting to the first clipping ratios of the M time-domain signal frames within the detection window to obtain the second clipping ratio.
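
The first and second weightings are not specified. A minimal sketch, assuming a normalized weighted average over the M frames with uniform weights by default, serves for both the confidence and the ratio; the same hypothetical function is simply called twice, possibly with different weight vectors.

```python
import numpy as np

def weighted_window_score(values, weights=None):
    """Normalized weighted average of M per-frame values (confidences or ratios)."""
    values = np.asarray(values, dtype=float)
    weights = np.ones_like(values) if weights is None else np.asarray(weights, dtype=float)
    if weights.shape != values.shape or weights.sum() <= 0:
        raise ValueError("need one weight per frame and a positive weight sum")
    return float(np.dot(weights, values) / weights.sum())
```

For instance, `weighted_window_score([0, 0, 1], [1, 2, 1])` returns 0.25, while uniform weights would give 1/3.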

In some embodiments of the present disclosure, based on the foregoing solution, M is determined from the minimum duration of continuous clipping that produces broken sound and the length of a time-domain signal frame.
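
One natural way to turn the minimum broken-sound duration into a frame count is to divide it by the frame duration and round up, so that a detection window spans at least that duration. The rounding rule, the function name and the example values (a 50 ms minimum duration, 1024-sample frames at 44.1 kHz) are assumptions, not figures from the patent.

```python
import math

def frames_per_window(min_duration_s, frame_len, sample_rate):
    """M = number of frames needed to cover the minimum broken-sound duration."""
    frame_duration_s = frame_len / sample_rate
    return max(1, math.ceil(min_duration_s / frame_duration_s))
```

With those example values each frame lasts about 23.2 ms, so M = 3.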

Specific details of each module or unit of the above broken sound detection apparatus have been described in detail for the corresponding broken sound detection method and are not repeated here.

Exemplary medium

Having introduced the method of the exemplary embodiments of the present invention, the medium of the exemplary embodiments of the present invention is described next.

In some possible implementations, aspects of the present invention may also be implemented as a storage medium storing program code which, when executed by a processor of a device, implements the steps of the broken sound detection method according to the various exemplary embodiments of the present invention described in the "Exemplary method" section of this specification.

Specifically, when executing the program code, the processor of the device implements the following steps:

Divide the audio signal to be detected into N time-domain signal frames, N being a positive integer; determine, from the statistical distribution of the signal amplitudes within each time-domain signal frame, a first clipping confidence for that frame, the first clipping confidence being used to distinguish normal audio signals from clipped audio signals; determine, from the first clipping confidences of M consecutive time-domain signal frames, a second clipping confidence for the detection window formed by those M frames, M being a positive integer less than N; and, in response to the second clipping confidence being greater than a preset first threshold, determine that broken sound is present in the corresponding detection window.

The above is a schematic solution of a computer-readable storage medium in this embodiment. The technical solution of the storage medium and that of the broken sound detection method described above share the same concept; for details of the storage medium's technical solution that are not described here, refer to the description of the broken sound detection method.

Note that the above storage medium may be a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific (non-exhaustive) examples of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

Program code contained on the readable storage medium may be transmitted over any appropriate medium, including but not limited to wireless, wireline, optical cable or RF, or any suitable combination of the above.

Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented languages such as Java and C++ as well as conventional procedural languages such as "C" or similar languages. The program code may execute entirely on the user's electronic device, partly on the user's electronic device and partly on a remote electronic device, or entirely on a remote electronic device or server. Where a remote electronic device is involved, it may be connected to the user's electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external electronic device (for example, through the Internet using an Internet service provider).

Exemplary electronic device

Having introduced the method, medium and apparatus of the exemplary embodiments of the present disclosure, an electronic device according to another exemplary embodiment of the present disclosure is described next.

Those skilled in the art will understand that aspects of the present invention may be implemented as a system, method or program product. Accordingly, aspects of the present invention may take the form of an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software aspects, all of which may be referred to collectively herein as a "circuit", "module" or "system".

An electronic device 700 according to this embodiment of the present invention is described below with reference to FIG. 7. The electronic device 700 shown in FIG. 7 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 7, the electronic device 700 takes the form of a general-purpose electronic device. Its components may include, but are not limited to, the at least one processing unit 710, the at least one storage unit 720, and a bus 730 connecting the different system components (including the storage unit 720 and the processing unit 710).

The storage unit stores program code that can be executed by the processing unit 710, causing the processing unit 710 to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary method" section of this specification.

The storage unit 720 may include readable media in the form of volatile storage units, such as a random access storage unit (RAM) 7201 and/or a cache storage unit 7202, and may further include a read-only storage unit (ROM) 7203.

The storage unit 720 may also include a program/utility 7204 having a set of (at least one) program modules 7205. Such program modules 7205 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus architectures.

The electronic device 700 may also communicate with one or more external devices (such as a keyboard, pointing device or Bluetooth device), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (such as a router or modem) that enables the electronic device 700 to communicate with one or more other electronic devices. Such communication may take place through the display unit 740 and an input/output (I/O) interface 750 connected to the display unit 740. The electronic device 700 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through the network adapter 760. As shown, the network adapter 760 communicates with the other modules of the electronic device 700 through the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.

From the above description of the implementations, those skilled in the art will readily understand that the example implementations described here may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solutions according to the embodiments of the present disclosure may be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive or removable hard disk) or on a network, and which includes instructions that cause an electronic device (such as a personal computer, server, terminal device or network device) to execute the method according to the embodiments of the present disclosure.

The above is a schematic solution of the electronic device 700 in this embodiment. The technical solution of the electronic device 700 and that of the broken sound detection method described above share the same concept; for details of the electronic device's technical solution that are not described here, refer to the description of the broken sound detection method.

It should be noted that although several modules or sub-modules of the broken sound detection apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.

Furthermore, although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve the desired result. Additionally or alternatively, some steps may be omitted, several steps may be combined into one, and/or one step may be split into several.

Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the invention is not limited to the specific embodiments disclosed, and the division into aspects does not mean that features in these aspects cannot be combined to advantage; the division is merely for convenience of presentation. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A broken sound detection method, characterized in that the method comprises:
dividing an audio signal to be detected into N time-domain signal frames, N being a positive integer;
determining, from the statistical distribution of the signal amplitudes within each time-domain signal frame, a first clipping confidence for that frame, the first clipping confidence being used to distinguish normal audio signals from clipped audio signals;
determining, from the first clipping confidences of M consecutive time-domain signal frames, a second clipping confidence for the detection window formed by those M consecutive frames, M being a positive integer less than N; and
in response to the second clipping confidence being greater than a preset first threshold, determining that broken sound is present in the corresponding detection window.

2. The method according to claim 1, wherein determining the first clipping confidence from the statistical distribution of the signal amplitudes within each time-domain signal frame comprises:
for each time-domain signal frame, counting the number of samples in each amplitude segment to construct a statistical histogram, the amplitude segments being obtained by dividing the amplitude interval spanned by the maximum and minimum amplitudes of that frame;
searching the statistical histogram for target blocks, a target block being a block whose two ends are both higher than its middle and whose back end is higher than its front end, the front end and back end being determined by the order of the search;
determining the lateral distance between the two ends of each target block; and
determining the first clipping confidence from the ratio of the largest of the lateral distances to the total number of amplitude segments.

3. The method according to claim 2, wherein searching for the target blocks comprises:
starting from each end of the statistical histogram and moving step by step toward the middle to search for the target blocks.

4. The method according to claim 1, further comprising:
for each time-domain signal frame, determining a first clipping ratio of the frame from the proportion of target samples in that frame, the target samples being the samples at the frame's maximum and minimum amplitude values; and
determining a second clipping ratio of the corresponding detection window from the first clipping ratios of the M consecutive time-domain signal frames;
wherein determining that broken sound is present in the detection window comprises:
in response to the second clipping confidence being greater than the preset first threshold and the second clipping ratio being greater than a preset second threshold, determining that broken sound is present in the detection window.

5. The method according to claim 4, further comprising:
determining frequency-domain energy features of the time-domain signal frames within the detection window;
wherein determining that broken sound is present in the detection window comprises:
in response to the detection window failing at least one of the following conditions:
the second clipping confidence is greater than the preset first threshold;
the second clipping ratio is greater than the preset second threshold;
determining, from the frequency-domain energy features, whether broken sound is present in the detection window.

6. The method according to claim 5, wherein the frequency-domain energy features comprise a cutoff frequency value, and determining the frequency-domain energy features of the time-domain signal frames within the detection window comprises:
applying a time-frequency transform to each time-domain signal frame within the detection window to obtain a corresponding frequency-domain signal; and
taking the energy center of gravity of the frequency-domain signal as the cutoff frequency value of that signal frame.

7. The method according to claim 6, wherein determining, from the frequency-domain energy features, whether broken sound is present in the detection window comprises:
in response to the cutoff frequency value of a signal frame being greater than a frequency threshold, determining that the signal frame is clipped, the frequency threshold being determined from the maximum frequency value of the frequency-domain signal; and
in response to the detection window containing at least a preset number of clipped signal frames, determining that broken sound is present in the detection window.

8. The method according to claim 4, wherein determining the corresponding second clipping confidence from the first clipping confidences of M consecutive time-domain signal frames comprises:
applying a first weighting to the first clipping confidences of the M time-domain signal frames within the detection window to obtain the second clipping confidence;
and determining the second clipping ratio of the corresponding detection window from the first clipping ratios of M consecutive time-domain signal frames comprises:
applying a second weighting to the first clipping ratios of the M time-domain signal frames within the detection window to obtain the second clipping ratio.

9. A broken sound detection apparatus, characterized in that the apparatus comprises:
a signal division module configured to divide an audio signal to be detected into N time-domain signal frames, N being a positive integer;
a first determination module configured to determine, from the statistical distribution of the signal amplitudes within each time-domain signal frame, a first clipping confidence for that frame, the first clipping confidence being used to distinguish normal audio signals from clipped audio signals;
a second determination module configured to determine, from the first clipping confidences of M consecutive time-domain signal frames, a second clipping confidence for the detection window formed by those M consecutive frames, M being a positive integer less than N; and
a broken sound determination module configured to determine, in response to the second clipping confidence being greater than a preset first threshold, that broken sound is present in the corresponding detection window.

10. An electronic device, comprising a processor and a memory, the memory storing executable instructions, and the processor being configured to invoke the executable instructions stored in the memory to perform the method according to any one of claims 1 to 8.
CN202310348811.XA 2023-03-28 2023-03-28 Sound breaking detection method and device, electronic equipment and storage medium Pending CN116364115A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310348811.XA CN116364115A (en) 2023-03-28 2023-03-28 Sound breaking detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310348811.XA CN116364115A (en) 2023-03-28 2023-03-28 Sound breaking detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116364115A true CN116364115A (en) 2023-06-30

Family

ID=86931234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310348811.XA Pending CN116364115A (en) 2023-03-28 2023-03-28 Sound breaking detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116364115A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117998254A (en) * 2024-04-07 2024-05-07 腾讯科技(深圳)有限公司 Broken sound restoration method, device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007072005A (en) * 2005-09-05 2007-03-22 Nippon Telegr & Teleph Corp <Ntt> Non-stationary noise discrimination method, apparatus thereof, program thereof and recording medium thereof
CN101563902A (en) * 2006-12-21 2009-10-21 Lm爱立信电话有限公司 Method and apparatus for reducing signal peak-to-average ratio
CN104167209A (en) * 2014-08-06 2014-11-26 华为软件技术有限公司 Method and device for detecting audio distortion
CN106384599A (en) * 2016-08-31 2017-02-08 广州酷狗计算机科技有限公司 Cracking voice identification method and device
CN106847307A (en) * 2016-12-21 2017-06-13 广州酷狗计算机科技有限公司 Signal detecting method and device
CN110335623A (en) * 2019-07-09 2019-10-15 上海艾为电子技术股份有限公司 Audio data processing method and device
CN114299994A (en) * 2022-01-04 2022-04-08 中南大学 Popping detection method, device and medium for laser Doppler remote interception of voice
CN114566169A (en) * 2022-02-28 2022-05-31 腾讯音乐娱乐科技(深圳)有限公司 Wheat spraying detection method, audio recording method and computer equipment

Similar Documents

Publication Publication Date Title
CN110335593B (en) Voice endpoint detection method, device, equipment and storage medium
CN109074814B (en) Noise detection method and terminal equipment
EP2437256A1 (en) Method and device for realizing trace of background noise in communication system
CN112004177A (en) Howling detection method, microphone volume adjustment method and medium
CN109346062B (en) Voice endpoint detection method and device
EP3451697A1 (en) Method and device for howling detection
US10665248B2 (en) Device and method for classifying an acoustic environment
CN104078051B (en) A kind of voice extracting method, system and voice audio frequency playing method and device
CN110503944B (en) Method and device for training and using voice wake-up model
CN110838301A (en) Method, device terminal and non-transitory computer readable storage medium for suppressing howling
CN115910018B (en) Method and device for improving voice privacy of silence cabin
CN116364115A (en) Sound breaking detection method and device, electronic equipment and storage medium
CN114627899A (en) Sound signal detection method and device, computer readable storage medium and terminal
CN113345439A (en) Subtitle generating method, device, electronic equipment and storage medium
CN113393862B (en) Method, device, equipment and storage medium for detecting sound distortion
US12418747B2 (en) Method and apparatus for switching main microphone, voice detection method and apparatus for microphone, microphone-loudspeaker integrated device, and readable storage medium
CN110890104B (en) Voice endpoint detection method and system
CN110491413B (en) Twin network-based audio content consistency monitoring method and system
CN113270118A (en) Voice activity detection method and device, storage medium and electronic equipment
WO2024099359A1 (en) Voice detection method and apparatus, electronic device and storage medium
CN113270099B (en) Intelligent voice extraction method and device, electronic equipment and storage medium
CN111816217B (en) Self-adaptive endpoint detection voice recognition method and system and intelligent device
WO2023193573A1 (en) Audio processing method and apparatus, storage medium, and electronic device
CN112885380A (en) Method, device, equipment and medium for detecting unvoiced and voiced sounds
TWI756817B (en) Voice activity detection device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination