CN116962955A - Multi-channel sound mixing method, equipment and medium - Google Patents

Publication number
CN116962955A
Authority
CN
China
Prior art keywords
channel
energy
frame
audio
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210414876.5A
Other languages
Chinese (zh)
Inventor
周永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202210414876.5A (CN116962955A)
Priority to PCT/CN2023/087077 (WO2023197967A1)
Publication of CN116962955A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

This application relates to the field of audio processing technology, and specifically to a multi-channel mixing method, device and medium. The method includes: acquiring first multi-channel audio data, which includes audio data of M channels to be mixed; determining that the first multi-channel audio data contains audio data whose energy meets a preset energy threshold, and performing energy reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold; obtaining second multi-channel audio data according to the energy reduction result; and downmixing the second multi-channel audio data to obtain mixed output data with N mixing channels, where M>N and N≥1. The multi-channel mixing method provided in the embodiments of this application resolves the distortion caused by excessive energy in some audio frames during channel downmixing, yields a better downmixing result, and improves the user's listening experience.

Description

Multi-channel mixing method, device and medium

Technical Field

The present application relates to the field of audio processing technology, and in particular to a multi-channel mixing method, device and medium.

Background Art

In many audio playback scenarios, the number of channels in the audio data does not match the number of channels of the audio output device, so real-time multi-channel mixing is often required at output time. This usually means converting multi-channel audio data into audio data with fewer channels, that is, channel downmixing. For example, when a large-screen device plays sources such as AiMax, the audio data may have multiple channels such as 3.1, 5.1 or 7.1, but when the output is switched to a digital audio interface (Sony/Philips Digital Interface, S/PDIF), an audio return channel (Audio Return Channel, ARC) or Bluetooth, only two channels may be output. To retain as much of the audio stream's information as possible, the data of the multiple channels must be downmixed into two-channel data.

Two schemes are currently in general use for downmixing multi-channel audio to two channels:

1) Output the first two channels of the multi-channel data and discard the center, surround, bass and other channels. Because some vocal audio data is carried in the discarded channels, vocals are lost, and because only two of the original channels are used as output, the user's listening experience is degraded.

2) Use the Dolby downmixing scheme, which computes a weighted sum of the data related to the left and right channels in the multi-channel audio data to obtain two-channel output. However, for audio data that does not meet Dolby specifications, for example audio data whose bass content has high energy, downmixing with the Dolby scheme produces distortion, which results in a poor listening experience.

Summary of the Invention

The embodiments of the present application provide a multi-channel mixing method, device and medium, which solve the problem in current channel downmixing schemes that the downmixed audio data is distorted and degrades the user's listening experience.

In a first aspect, an embodiment of the present application provides a multi-channel mixing method, applied to an electronic device, including: obtaining first multi-channel audio data, the first multi-channel audio data including audio data of M channels to be mixed; determining that the first multi-channel audio data contains audio data whose energy meets a preset energy threshold, and performing energy reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold; obtaining second multi-channel audio data according to the energy reduction result; and downmixing the second multi-channel audio data to obtain mixed output data with N mixing channels, where M>N and N≥1.

It can be understood that the first multi-channel audio data is the input data of the method, and the second multi-channel audio data is the energy-reduced data on which channel downmixing is then performed to produce the mixed output data.

In some embodiments, the preset energy threshold is a preset minimum energy value that may cause distortion in the mixed audio. In other embodiments, the preset energy threshold is another preset energy value that may cause distortion in the mixed audio and affect the user's listening experience. This application places no limitation on this.

In some embodiments, the first multi-channel audio data may be 2.1-channel, 3.1-channel, 5.1-channel, 7.1-channel or other multi-channel audio data, and the mixed output data may be mono audio data, two-channel audio data, or other multi-channel audio data with fewer channels than the first multi-channel audio data.

It can be understood that the multi-channel mixing method in this application mixes multi-channel audio data (the first multi-channel audio data) into audio data with fewer channels (the mixed output data). Before the audio data of the channels is downmixed, the energy of each channel's audio data is tracked to identify the audio data that exceeds the preset energy threshold, and that audio data is energy-suppressed. The result is the second multi-channel audio data, which is then downmixed. The method adapts to and supports channel downmixing of various multi-channel audio data, resolves the distortion caused by excessive energy in some audio frames during downmixing, yields a better downmixing result, and improves the user's listening experience.

In a possible implementation of the first aspect, determining that the first multi-channel audio data contains audio data whose energy is greater than the preset energy threshold includes: framing the first multi-channel audio data to obtain multiple audio frames, and determining the frame energy of those audio frames; and determining that the first multi-channel audio data contains a high-energy audio frame whose frame energy is greater than the preset energy threshold.

It can be understood that, in some embodiments, an audio frame in the first multi-channel audio data whose frame energy does not exceed the preset energy threshold is a low-energy audio frame, and energy reduction processing need not be performed on low-energy audio frames.

In a possible implementation of the first aspect, performing energy reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold, to obtain the second multi-channel audio data, includes: determining a target gain of the high-energy audio frame, and determining a frame gain of the high-energy audio frame according to the target gain; and determining, according to the frame gain of the high-energy audio frame, the target audio frame that corresponds to the high-energy audio frame after energy reduction processing.

It can be understood that the target gain is the energy suppression factor used when performing energy reduction processing on a high-energy audio frame; applying this factor reduces the energy of the high-energy audio frame.

In some embodiments, a low-energy audio frame may also have a target gain. The target gain of a low-energy audio frame is 1, that is, its energy is not reduced.

In a possible implementation of the first aspect, the frame energy of a high-energy audio frame is determined by the following formula: [formula not reproduced in this text]. In the formula, the high-energy audio frame includes L sampling points; β denotes the frame energy smoothing coefficient; x_i(n)(k) denotes the audio data of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed. The formula also uses a symbol for the energy of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed, and a symbol for the frame energy of the n-th audio frame of the i-th channel to be mixed.

In some embodiments, each audio frame may include L=512 sampling points, that is, the frame length of the audio frame is 512. In other embodiments, L may take other values, which is not limited in the present application.
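The frame energy formula itself did not survive in this text, so the following is only a minimal sketch of one plausible form: a per-sample energy that is recursively smoothed with the coefficient β, with the value after the last of the L sampling points taken as the frame energy. The function name `frame_energy`, the recursion, and the default `beta` are assumptions for illustration and are not taken from the application.

```python
def frame_energy(samples, beta=0.9, prev_energy=0.0):
    """Recursively smoothed energy over one frame (assumed form, not the patent's formula)."""
    energy = prev_energy
    for x in samples:
        # Blend the running energy with the instantaneous energy x^2 of this sampling point.
        energy = beta * energy + (1.0 - beta) * x * x
    return energy


FRAME_LEN = 512  # L = 512 sampling points per frame, as in the text

quiet_frame = [0.01] * FRAME_LEN
loud_frame = [0.9] * FRAME_LEN
print(frame_energy(quiet_frame), frame_energy(loud_frame))
```

With a constant input the smoothed value settles at the squared amplitude, so a threshold on this quantity behaves like a threshold on mean signal power.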

In a possible implementation of the first aspect, the preset energy threshold includes a first threshold and/or a second threshold, and a high-energy audio frame is an audio frame that satisfies at least one of the following: among the audio frames of the M channels to be mixed, the average frame energy of the audio frames that have the same index and correspond to the same mixing channel is greater than the first threshold; or, among the audio frames of the same channel to be mixed, the maximum frame energy of at least two audio frames consecutive with the audio frame is greater than the second threshold.

It can be understood that the index of an audio frame is the sequence number of that frame within any one of the M channels to be mixed. For example, the n-th audio frame of the i-th channel to be mixed has index n.

In a possible implementation of the first aspect, the maximum frame energy of each audio frame of the M channels to be mixed is determined from the frame energy of the highest-energy frame among the audio frames that correspond to the same mixing channel and have the same index as that audio frame.
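The two criteria above can be sketched as follows. This is one reading of the text, under stated assumptions: all channels passed in feed the same mixing channel, the "consecutive" frames are taken to be the current frame and the frames just before it, and the function name `high_energy_frames` and the `window` parameter are hypothetical.

```python
def high_energy_frames(energies, first_threshold, second_threshold, window=2):
    """Flag high-energy frame indices for channels that feed one mixing channel.

    energies[c][n] is the frame energy of channel c at frame index n.
    """
    num_frames = len(energies[0])
    # Criterion 1 input: average energy of the same-index frames across the channels.
    avg = [sum(ch[n] for ch in energies) / len(energies) for n in range(num_frames)]
    # Per-index maximum frame energy across the channels.
    peak = [max(ch[n] for ch in energies) for n in range(num_frames)]

    flags = []
    for n in range(num_frames):
        start = max(0, n - window + 1)
        # Criterion 2 input: maximum over `window` consecutive frames ending at n.
        windowed_peak = max(peak[start:n + 1])
        flags.append(avg[n] > first_threshold or windowed_peak > second_threshold)
    return flags


energies = [
    [0.1, 0.1, 0.1],  # e.g. left channel
    [0.1, 0.9, 0.1],  # e.g. bass channel with one loud frame
]
print(high_energy_frames(energies, first_threshold=0.6, second_threshold=0.8))
```

Looking at a window of consecutive frames keeps a frame flagged for a short time after a loud frame, which avoids releasing the suppression abruptly.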

In a possible implementation of the first aspect, the target gain of a high-energy audio frame is determined according to the preset energy threshold and the maximum frame energy of at least two audio frames consecutive with that high-energy audio frame.
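The text states which quantities the target gain depends on but not the exact mapping. As a hedged illustration only, the sketch below assumes the gain is chosen so that the peak energy is scaled down to the threshold; because energy grows with the square of amplitude, the amplitude gain is the square root of the energy ratio. The function name `target_gain` and this mapping are assumptions.

```python
import math


def target_gain(recent_energies, threshold):
    """Amplitude gain that would bring the peak recent energy down to the threshold (assumed mapping)."""
    peak = max(recent_energies)
    if peak <= threshold:
        return 1.0  # low-energy case: no reduction, matching the target gain of 1 in the text
    return math.sqrt(threshold / peak)


print(target_gain([0.1, 0.2], threshold=0.5))  # below the threshold
print(target_gain([0.1, 2.0], threshold=0.5))  # above the threshold
```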

In a possible implementation of the first aspect, the frame gain is determined by the following formula: [formula not reproduced in this text]. In the formula, α denotes the frame gain smoothing coefficient, and the remaining symbols denote the target gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed, the frame gain of the (n-1)-th audio frame of the i-th channel to be mixed, and the frame gain of the n-th audio frame of the i-th channel to be mixed.

In some embodiments, the frame gain of a low-energy audio frame in the first multi-channel audio data, whose frame energy does not exceed the preset energy threshold, may also be calculated with the above formula, with the target gain of the low-energy audio frame set to 1.
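Since the frame gain formula is missing here, the following sketch assumes a first-order smoothing of the target gain, which is consistent with the listed terms (α, the target gain of frame n, and the frame gains of frames n-1 and n). Which term α multiplies is a guess, and `smooth_frame_gain` is a hypothetical name.

```python
def smooth_frame_gain(prev_frame_gain, target, alpha=0.5):
    """Move the frame gain part of the way from its previous value toward the target gain."""
    return alpha * prev_frame_gain + (1.0 - alpha) * target


gain = 1.0
history = []
# Target gains for five consecutive frames: one loud passage, then back to normal.
for target in [1.0, 0.5, 0.5, 0.5, 1.0]:
    gain = smooth_frame_gain(gain, target)
    history.append(gain)
print(history)
```

Under this assumption a larger α makes the gain change more slowly from frame to frame, and a low-energy frame (target gain 1) gradually restores the gain to 1.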

In a possible implementation of the first aspect, determining, according to the frame gain of the high-energy audio frame, the target audio frame that corresponds to the high-energy audio frame after energy reduction processing includes: determining the sampling point gain of each sampling point in the high-energy audio frame according to the frame gain of the high-energy audio frame; performing energy reduction processing on the audio data of each sampling point in the high-energy audio frame according to its sampling point gain, to obtain the audio data of each sampling point in the target audio frame; and generating the target audio frame from the audio data of its sampling points.

In a possible implementation of the first aspect, the sampling point gains are determined by the following formula: [formula not reproduced in this text].

In the formula, FrameLen denotes the frame length of the target audio frame; FrameGain_xi(n-1) denotes the frame gain of the (n-1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed; FrameGain_xi(n) denotes the frame gain of the n-th audio frame of the i-th channel to be mixed; GainBuff[i][k]_xi(n-1) denotes the sampling point gain of the k-th sampling point of the (n-1)-th audio frame of the i-th channel to be mixed; and GainBuff[i][k]_xi(n) denotes the sampling point gain of the k-th sampling point of the n-th audio frame of the i-th channel to be mixed.

In some embodiments, the sampling point gains of a low-energy audio frame in the first multi-channel audio data, whose frame energy does not exceed the preset energy threshold, may also be calculated with the above formula, where the frame gain of the low-energy audio frame is calculated with a target gain of 1.

In a possible implementation of the first aspect, the audio data of each sampling point in the target audio frame is determined from the audio data of the corresponding sampling point of the high-energy audio frame and the corresponding sampling point gain.
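The sampling point gain formula is also missing from this text. One common way to turn two successive frame gains into per-sample gains is linear interpolation across the frame, and the sketch below assumes exactly that; it is not confirmed as the application's formula. The names `sample_gains` and `attenuate` are hypothetical, and multiplying each sample by its gain is an assumed reading of "determined from the audio data and the corresponding sampling point gain".

```python
def sample_gains(prev_frame_gain, frame_gain, frame_len):
    """Ramp linearly from the previous frame gain to the current one over frame_len samples."""
    if frame_len == 0:
        return []
    step = (frame_gain - prev_frame_gain) / frame_len
    return [prev_frame_gain + step * (k + 1) for k in range(frame_len)]


def attenuate(frame, prev_frame_gain, frame_gain):
    """Scale each sample of a frame by its interpolated sampling point gain."""
    gains = sample_gains(prev_frame_gain, frame_gain, len(frame))
    return [x * g for x, g in zip(frame, gains)]


print(sample_gains(1.0, 0.5, 4))
print(attenuate([1.0, 1.0, 1.0, 1.0], 1.0, 0.5))
```

Ramping the gain inside the frame avoids a step change at frame boundaries, which would otherwise be audible as a click.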

In a possible implementation of the first aspect, obtaining the second multi-channel audio data according to the energy reduction result includes: generating the second multi-channel audio data from the target audio frames and the low-energy audio frames in the first multi-channel audio data whose energy is not greater than the preset energy threshold.

In a possible implementation of the first aspect, downmixing the second multi-channel audio data to obtain mixed output data having N second channels (that is, the N mixing channels) includes: performing a weighted summation of the target audio frames and low-energy audio frames in the second multi-channel audio data that correspond to the same second channel, to obtain the mixed output data.

It can be understood that the mixed output data is obtained by downmixing the second multi-channel audio data with the Dolby downmixing method, where the weight coefficients of the weighted summation are preset parameters.

In a second aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and one or more memories. The one or more memories store one or more programs which, when executed by the one or more processors, cause the electronic device to perform the above multi-channel mixing method.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to perform the above multi-channel mixing method.

In a fourth aspect, an embodiment of the present application provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the above multi-channel mixing method.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of a scenario for the multi-channel mixing method provided in an embodiment of the present application;

FIG. 2 is a schematic flow chart of a mixing method for downmixing six channels into two channels provided in an embodiment of the present application;

FIG. 3 is a schematic flow chart of a multi-channel mixing method provided in an embodiment of the present application;

FIG. 4 is a schematic flow chart of another multi-channel mixing method provided in an embodiment of the present application;

FIG. 5 is a schematic flow chart of an energy suppression method provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of the bitstream waveform of multi-channel audio data provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of the bitstream waveform and the energy spectrum of the mixed channels after channel downmixing;

FIG. 8 is a schematic diagram of the hardware structure of a mobile phone provided in an embodiment of the present application.

Detailed Description

As mentioned above, downmixing audio data that does not meet Dolby specifications can produce distorted audio. Specifically, in such audio data the energy of the bass channel is too high, so the mixed audio data obtained by weighted summation also has excessive energy. When that audio data is output, the user hears distorted audio, and the listening experience is poor.
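A small numeric illustration of this effect, assuming 16-bit PCM samples and made-up sample values and weights (neither the sample format nor the numbers come from the application): when a loud bass sample is added to an already strong left-channel sample, the weighted sum exceeds the representable range and is clipped, which is heard as distortion.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def mix_and_clip(samples, weights):
    """Weighted sum of one sample per channel, then hard clipping to the 16-bit range."""
    total = sum(w * s for s, w in zip(samples, weights))
    clipped = max(INT16_MIN, min(INT16_MAX, round(total)))
    return total, clipped


# Hypothetical left-channel and bass-channel samples with illustrative weights.
total, clipped = mix_and_clip([20000, 30000], [1.0, 0.7071])
print(total > INT16_MAX, clipped)
```

Reducing the energy of the loud frames before the summation keeps the sum inside the representable range, so no clipping occurs.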

To solve the problem that downmixed audio data is distorted and degrades the user's listening experience, the present application proposes a multi-channel mixing method. In this method, the electronic device identifies, in each channel of the multi-channel audio data, the audio data that exceeds a preset energy threshold, suppresses its energy (that is, reduces its energy), and then computes the suppressed audio data of each channel. The suppressed multi-channel audio data is then weighted and summed according to a preset channel downmixing algorithm to obtain the downmixed audio data.

It can be understood that, in some embodiments, energy suppression can be performed by calculating the frame gain of each audio frame in each channel and reducing the frame gain with the corresponding suppression factor, which yields the suppressed audio frames and, from them, the energy-suppressed audio data.

It can be understood that the preset channel downmixing algorithm defines the correspondence between channels before and after downmixing, together with the weight coefficients of the weighted summation. For example, in a scheme that downmixes six channels (left, left surround, bass, center, right and right surround) to two channels, the preset channel downmixing algorithm is: left, left surround, bass and center are downmixed to the left channel; right, right surround, bass and center are downmixed to the right channel; and the weight coefficients of the weighted summation are the default weight coefficients of the Dolby downmixing scheme.
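The six-to-two mapping described above can be sketched as a table of weights plus a weighted sum. The application refers to the Dolby default weight coefficients but does not list them here, so the value 0.7071 (about -3 dB) used below is purely an illustrative assumption, as are the channel labels and the function name `downmix`.

```python
W = 0.7071  # illustrative weight only; the actual default coefficients are not given in the text

# Which input channels feed each output channel, and with what weight.
DOWNMIX = {
    "left": {"L": 1.0, "Ls": W, "LFE": W, "C": W},
    "right": {"R": 1.0, "Rs": W, "LFE": W, "C": W},
}


def downmix(channels, matrix=DOWNMIX):
    """Weighted sum of equally long input channels into the output channels named in matrix."""
    length = len(next(iter(channels.values())))
    return {
        out: [sum(w * channels[name][k] for name, w in weights.items()) for k in range(length)]
        for out, weights in matrix.items()
    }


six_channels = {
    "L": [1.0, 0.0], "Ls": [0.0, 0.0], "LFE": [0.0, 0.0],
    "C": [0.0, 1.0], "R": [0.0, 0.0], "Rs": [0.0, 0.0],
}
print(downmix(six_channels))
```

Keeping the mapping in a table means the same function covers other M-to-N cases by swapping in a different table.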

It can be understood that, in some embodiments, the multi-channel audio data can be divided into multiple audio frames, and energy tracking and energy suppression can then be performed per audio frame. Framing divides the audio data into multiple segments, each of which is one audio frame; adjacent audio frames overlap, and every channel is framed in the same way. In other embodiments, audio data whose energy exceeds the preset energy threshold can be energy-suppressed in other ways, which is not limited in this application.
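A minimal sketch of overlapped framing follows. The text only says that adjacent frames overlap; the 50% overlap (a hop of half the frame length), the handling of the tail, and the function name `split_frames` are assumptions for illustration.

```python
def split_frames(samples, frame_len=512, hop=256):
    """Split one channel into frames of frame_len samples that start every hop samples."""
    if not samples:
        return []
    frames = []
    last_start = max(len(samples) - frame_len, 0)
    for start in range(0, last_start + 1, hop):
        # Trailing samples that do not fit a full hop are left out in this sketch.
        frames.append(samples[start:start + frame_len])
    return frames


print(split_frames(list(range(8)), frame_len=4, hop=2))
```

Applying the same `frame_len` and `hop` to every channel keeps frame index n aligned across channels, which the per-index comparisons in the method rely on.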

The multi-channel mixing method provided in the embodiments of the present application tracks the energy of the audio frames of each channel before the weighted summation of the channels' audio data, identifies the audio frames that exceed the preset energy threshold, and suppresses their energy. The method adapts to a wide range of audio data, supports channel downmixing of various multi-channel audio data, resolves the distortion caused by excessive energy in some audio frames during downmixing, yields a better downmixing result, and improves the user's listening experience.

It can be understood that the electronic devices in the embodiments of the present application include, but are not limited to, mobile phones (including foldable phones), tablet computers, laptop computers, desktop computers, servers, wearable devices, head-mounted displays, mobile email devices, in-vehicle devices, portable game consoles, portable music players, reader devices, televisions in which one or more processors are embedded or to which they are coupled, and other electronic devices. For ease of description, the following uses a mobile phone as the example electronic device.

With reference to FIG. 1 and FIG. 2, the application scenario of the embodiments is introduced below, taking the output of audio data from the mobile phone 100 to the Bluetooth headset 200 as an example.

FIG. 1 is a schematic diagram of an application scenario of the multi-channel mixing method.

As shown in FIG. 1, the scenario includes a mobile phone 100 and a Bluetooth headset 200, which are connected wirelessly via Bluetooth.

When the user puts on the Bluetooth headset 200 and plays multi-channel audio data on the mobile phone 100, the mobile phone 100 needs to downmix the multi-channel audio data into two-channel audio data consisting of a left channel and a right channel, and send that audio data to the Bluetooth headset 200 via Bluetooth. The Bluetooth headset 200 plays the audio data after receiving it.

FIG. 2 is a schematic flow chart of a mixing method for downmixing six channels into two channels.

Specifically, as shown in FIG. 2, take six-channel audio data as an example, where the six channels are the left channel A1, left surround channel A2, bass channel A3, center channel A4, right channel A5 and right surround channel A6. Before downmixing the audio data, the mobile phone 100 can first frame the six-channel audio data and determine the frame energy of each audio frame. When the frame energy of an audio frame of any channel is greater than the preset energy threshold, the mobile phone 100 determines the energy suppression factor of that audio frame, suppresses the energy of the audio frame, and computes the energy-suppressed six-channel audio data from the suppressed audio frames. The energy-suppressed six-channel audio data is then downmixed with the preset channel downmixing algorithm to obtain two-channel audio data consisting of a left channel a1 and a right channel a2. The mobile phone 100 can then send the audio data of the left channel a1 to the left earphone of the Bluetooth headset 200 and the audio data of the right channel a2 to the right earphone.

It can be understood that, in addition to downmixing six channels to two channels as above, the multi-channel mixing method in this application can downmix any M channels to N channels, where M>N.

下面结合附图,对本申请实施例中的多声道的混音方法进行进一步介绍。The multi-channel mixing method in the embodiment of the present application is further introduced below with reference to the accompanying drawings.

图3所示为本申请实施例提供的多声道的混音方法的流程示意图。FIG3 is a schematic flow chart of a multi-channel mixing method provided in an embodiment of the present application.

如图3所示,多声道的混音方法包括:As shown in FIG3 , the multi-channel mixing method includes:

301:获取多声道音频数据。301: Obtain multi-channel audio data.

可以理解,多声道音频数据(即前文中的第一多声道音频数据)中的不同声道为不同的声道类型,其中每个声道为待混音声道,例如左声道、右声道、左环绕声道、右环绕声道等,不同声道的音频数据可以通过不同的扬声器输出,也可以通过声道下混,由相同的声道输出。It can be understood that different channels in the multi-channel audio data (i.e., the first multi-channel audio data in the previous text) are different channel types, wherein each channel is a channel to be mixed, such as a left channel, a right channel, a left surround channel, a right surround channel, etc. The audio data of different channels can be output through different speakers, or can be output through channel downmixing and output through the same channel.

In some embodiments, the multi-channel audio data may be 3.1, 5.1, 7.1, or other multi-channel audio data. The 3.1 layout includes a left channel, a bass channel, a center channel, and a right channel. The 5.1 layout includes a left channel, a left surround channel, a bass channel, a center channel, a right channel, and a right surround channel. The 7.1 layout includes a left channel, a left surround channel, a bass channel, a center channel, a right channel, a right surround channel, a left rear surround channel, and a right rear surround channel. In some embodiments, the multi-channel audio data may also have more or fewer channels than in these examples, which is not limited in this application.

It can be understood that, in some embodiments, the obtained multi-channel audio data further includes a mask of the multi-channel audio data. The mask is overlaid on the multi-channel audio data to select or block part of the audio data. The mask can also be used to determine the correspondence between the channels before and after mixing, that is, which pre-mix channels are mixed to form the audio data of each mixed channel.

In some embodiments, after the multi-channel audio data is obtained, it may also be initialized. Initialization may include determining the preset channel downmixing algorithm, which may specifically include determining the correspondence between the channels before and after mixing, and determining the weight coefficients used in the weighted summation of the multi-channel audio data that yields the audio data of each mixed channel. It can be understood that the one or more channels of the multi-channel audio data that correspond to the same mixed channel after downmixing (that is, the same output channel) may be treated as one joint detection channel. When the energy of an audio frame is judged, a preliminary judgment may first be made on the energy of the corresponding audio frames in the joint detection channel, for example by checking whether the average energy of the corresponding audio frames is greater than a set preliminary judgment energy threshold. In some embodiments, initialization may also include initializing the parameters of the formulas in the preset channel downmixing algorithm or the energy suppression algorithm.

302: Divide the multi-channel audio data into frames to obtain a plurality of audio frames.

It can be understood that the characteristics of multi-channel audio data, and the parameters that describe them, change over time; that is, audio data is time-varying. The energy tracking and energy suppression described below can therefore be performed on a short-time basis, that is, by short-time analysis. Specifically, the multi-channel audio data may be divided into multiple segments, each of which is one audio frame, such that the essential characteristics within a single audio frame remain unchanged or relatively stable.

Further, in some embodiments, the multi-channel audio data may be divided into frames directly, for example with a frame length of 10 ms to 30 ms per audio frame.

In other embodiments, the multi-channel audio data may be sampled to convert continuous multi-channel audio data into discrete multi-channel audio data, and the audio data of 512 consecutive sampling points forms one audio frame.
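The sample-based framing of step 302 can be sketched as follows. This is a minimal illustration, not the implementation of this application: the function name `split_into_frames`, the use of non-overlapping frames, and the zero-padding of the last frame are all assumptions.

```python
def split_into_frames(samples, frame_len=512):
    """Split one channel's samples (a list of floats) into consecutive frames.

    Frames do not overlap; this is an assumption, since the text only says
    that 512 consecutive sampling points form one audio frame.
    """
    frames = [samples[start:start + frame_len]
              for start in range(0, len(samples), frame_len)]
    # Assumption: zero-pad the final frame so every frame has frame_len samples.
    if frames and len(frames[-1]) < frame_len:
        frames[-1] = frames[-1] + [0.0] * (frame_len - len(frames[-1]))
    return frames
```

Each channel of the multi-channel audio data would be framed separately, so that frame index m refers to the same time span in every channel.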

303: Determine whether the frame energy of the m-th audio frame is greater than a preset energy threshold. The preset energy threshold is set in advance. If the frame energy of the m-th audio frame is greater than the preset energy threshold, the audio corresponding to this frame may be distorted after downmixing, so energy suppression is required and step 304 is performed. If the frame energy of the m-th audio frame is less than or equal to the preset energy threshold, the audio corresponding to this frame will not be distorted after downmixing, so energy suppression is not required and step 305 is performed.

It can be understood that the index of an audio frame is the sequence number of that audio frame within any one channel of the multi-channel audio data. For example, the m-th audio frame of the i-th channel among the M channels to be mixed has index m.

In some embodiments, the preset energy threshold may be determined from the minimum frame energy that could cause distortion in the audio data after downmixing. For example, the preset energy threshold is -6 dB or -3 dB. The specific value may be determined according to the electronic device, the audio output device, and the multi-channel audio data, and is not limited in this application.

In some embodiments, step 303 may calculate the average frame energy of the m-th audio frames of the channels in each joint detection channel and determine whether this average frame energy is greater than the set energy threshold.

In some embodiments, step 303 may simply calculate the frame energy of the m-th audio frame of each channel separately and compare it with the set energy threshold. In other embodiments, the maximum frame energy among the m-th audio frames of the channels in each joint detection channel may be taken as the frame energy of the m-th audio frame and compared with the set energy threshold. Further, after this maximum has been taken as the frame energy of the m-th audio frame, it may be determined whether the largest frame energy among one or more audio frames near the m-th audio frame is greater than the set energy threshold. For example, the larger of the frame energies of the m-th and (m-1)-th audio frames is compared with the set energy threshold to decide whether the frame energy of the m-th audio frame exceeds it.

In some embodiments, the set energy threshold includes a preliminary judgment energy threshold (the first threshold mentioned above) and a precise judgment energy threshold (the second threshold mentioned above). Step 303 may then first make a preliminary judgment on the frame energy of the m-th audio frame: the average frame energy of the m-th audio frames in each joint detection channel is calculated and compared with the preliminary judgment energy threshold, and if it is greater, a precise judgment follows. In the precise judgment, the maximum frame energy among the m-th audio frames of the channels in each joint detection channel is taken as the frame energy of the m-th audio frame, and it is then determined whether the largest frame energy among at least two consecutive audio frames near the m-th audio frame is greater than the precise judgment energy threshold. Step 303 is described further below with reference to the formulas.
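The two-stage judgment described for step 303 can be sketched as below. The function name `needs_suppression` and its parameters are hypothetical, and the sketch assumes that the frame energies and both thresholds are expressed in the same unit (for example, all in dB). It uses frames m and m-1 as the nearby frames, following the example given in the text.

```python
def needs_suppression(energies_now, energies_prev,
                      prelim_threshold, precise_threshold):
    """Decide whether frame m of one joint detection channel needs suppression.

    energies_now / energies_prev: per-channel frame energies of frame m and
    frame m-1 for the channels in one joint detection channel (assumed to be
    in the same unit as the thresholds).
    """
    # Preliminary judgment: average frame energy of frame m across the group.
    mean_energy = sum(energies_now) / len(energies_now)
    if mean_energy <= prelim_threshold:
        return False
    # Precise judgment: largest per-channel energy over frames m and m-1.
    peak_energy = max(max(energies_now), max(energies_prev))
    return peak_energy > precise_threshold
```

The cheap average check filters out most frames, so the maximum over channels and neighboring frames is evaluated only for frames that already look loud.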

In some embodiments, the energy spectrum of the multi-channel audio data may be obtained by applying a Fourier transform to the multi-channel audio data, and the frame energy of an audio frame composed of multiple sampling points is then calculated from the energy values of those consecutive sampling points. The specific calculation is described below with reference to the formulas.

304: Perform energy suppression on the m-th audio frame using an energy suppression algorithm to obtain the m-th target audio frame.

It can be understood that the energy suppression algorithm determines the energy-suppressed target audio frame from the frame energy of the m-th audio frame. In some embodiments, an energy suppression factor may be calculated from the frame energy of the m-th audio frame and a preset formula, and energy suppression is then applied to the m-th audio frame based on this factor to obtain the m-th target audio frame. In some embodiments, the energy suppression factor is a target gain; energy suppression then consists of calculating the frame gain of the m-th audio frame from the target gain and calculating the m-th target audio frame from that frame gain.

In some embodiments, when each audio frame includes multiple sampling points, the target audio frame may be calculated as follows: the frame gain of the m-th audio frame is calculated from the energy suppression factor; the gain of each sampling point in the m-th audio frame (the sampling point gain) is calculated from the frame gain; the audio data of each sampling point in the target audio frame is determined from its sampling point gain; and the signal is reconstructed to obtain the audio data of the m-th target audio frame.

305: Determine the m-th audio frame as the m-th target audio frame.

It can be understood that, in some embodiments, when the frame energy of the m-th audio frame does not exceed the preset energy threshold, the audio data corresponding to this frame will not be distorted by excessive energy when it is output after downmixing, and the user's listening experience is not affected. The audio frame therefore needs no energy suppression, and its original audio data can be retained.

306: Based on the preset channel downmixing algorithm, downmix the target audio frames of the multiple channels to obtain mixed output data.

It can be understood that the preset channel downmixing algorithm includes the correspondence between the channels before and after downmixing and the downmixing formula. The correspondence determines which channels' target audio frames contribute to the audio data of each mixed channel. The target audio frames determined to correspond to the same mixed channel are then substituted into the corresponding downmixing formula to obtain the mixed output data of that mixed channel. The mixed output data is the audio data of one channel output to the Bluetooth headset 200 as described above.

It can be understood that, in some embodiments, after the target audio frames are obtained in step 304 or step 305, a Dolby downmix algorithm may be used to perform the downmixing and obtain the mixed output data of each mixed channel. Specifically, the target audio frames with the same index in each joint detection channel are weighted and summed to obtain the corresponding mixed audio frame of the mixed output data, and the mixed audio frames belonging to the same mixed channel are then spliced together to obtain the output data of that mixed channel.
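The weighted summation and splicing of step 306 can be sketched as follows. The function names and the dictionary-based data layout are hypothetical, and the weights in the usage below are placeholder values rather than actual Dolby downmix coefficients.

```python
def downmix_frame(target_frames, weights):
    """Weighted sum of same-index target frames for one output channel.

    target_frames: {channel_name: list of samples} for one frame index.
    weights: {channel_name: weight} for the channels that feed this output
    channel (its joint detection channel).
    """
    frame_len = len(next(iter(target_frames.values())))
    mixed = [0.0] * frame_len
    for name, weight in weights.items():
        for k, sample in enumerate(target_frames[name]):
            mixed[k] += weight * sample
    return mixed


def downmix_channel(frames_per_index, weights):
    """Mix every frame index in order, then splice the mixed frames."""
    output = []
    for target_frames in frames_per_index:
        output.extend(downmix_frame(target_frames, weights))
    return output
```

For example, with two-sample frames and placeholder weights `{"L": 1.0, "C": 0.5}`, calling `downmix_channel` on two consecutive frame dictionaries returns one spliced list of four mixed samples.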

With the multi-channel mixing method described above, the embodiments of this application track the energy of the audio data in the multi-channel audio data, suppress the energy of high-energy audio data, and then downmix the energy-suppressed multi-channel audio data. The method applies both to multi-channel audio data that follows Dolby specifications and to multi-channel audio data that does not; that is, it can be used for downmixing a variety of channel layouts and achieves adaptive downmixing without distortion caused by excessive audio energy. In addition, the method suppresses only the portions of the audio data with higher energy and discards no audio data, so it resolves downmix distortion while retaining the audio data of every channel, which improves the user's listening experience.

The multi-channel mixing method in the embodiments of this application is further described below with reference to FIG. 4, taking the downmixing of two channels into a single channel as an example.

FIG. 4 is a schematic flowchart of another multi-channel mixing method in an embodiment of this application.

As shown in FIG. 4, the multi-channel audio data includes a stream of channel 1 audio data and a stream of channel 2 audio data. After obtaining the channel 1 and channel 2 audio data, the electronic device divides the audio data of both channels into frames, obtaining six audio frames per channel.

From the framed audio streams of the two channels, it can be seen that the second and fourth audio frames of channel 1 are not very stable, and the second and third audio frames of channel 2 are not very stable. To avoid distortion in the mixed data after downmixing, energy suppression is applied to the second and fourth audio frames of channel 1 and to the second and third audio frames of channel 2, yielding the stream of suppressed target audio frames for each channel. A Dolby downmix algorithm may then be used to weight and sum the corresponding target audio frames of the two energy-suppressed channels to obtain the corresponding mixed audio frames of the mixed channel; the six mixed audio frames make up the mixed output data.

An energy suppression method in an embodiment of this application is further described below with reference to FIG. 5.

FIG. 5 is a flowchart of an energy suppression method in an embodiment of this application.

As shown in FIG. 5, the method includes:

501: Obtain multi-channel data. The multi-channel data obtained in step 501 is the multi-channel audio data obtained in step 301. Step 501 is similar to step 301 and is not described again here.

502: Generate an initialized downmix algorithm according to the channel layout, and determine the mixing channels.

It can be understood that the channel layout is the types and number of channels in the multi-channel data. The initialized downmix algorithm is generated from the channel layout and comprises the correspondence between the pre-downmix channels and the mixed channels, together with the formula used to calculate the data of each mixed channel. Specifically, in some embodiments, generating the initialized downmix algorithm may include determining the correspondence between the M channels and the N channels from the mask of the multi-channel data. For example, when six channels are downmixed into two channels, the left, left surround, bass, and center channels are downmixed into the left channel, and the right, right surround, bass, and center channels are downmixed into the right channel. In some embodiments, generating the initialized downmix algorithm also includes determining the formula that yields each item of mixed output data in the mixed channels and initializing the parameters in that formula. The channels that are combined into the same mixing channel (that is, the same mixed channel) during downmixing may be referred to as a joint detection channel; the data of the channels in the joint detection channel corresponding to a mixing channel is weighted and summed to obtain the data that the mixing channel needs to output.
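The six-to-two correspondence in the example above can be represented as a small table, sketched below. The channel names, the `build_weight_matrix` helper, and the uniform default weight are hypothetical; the text does not state how the mask is encoded or which weight values are used.

```python
# Input channel order follows the six-channel example in the text.
INPUT_CHANNELS = ["left", "left_surround", "bass", "center",
                  "right", "right_surround"]

# Joint detection channel (set of input channels) for each mixing channel.
JOINT_DETECTION = {
    "left": ["left", "left_surround", "bass", "center"],
    "right": ["right", "right_surround", "bass", "center"],
}


def build_weight_matrix(joint_detection, input_channels, default_weight=1.0):
    """Return {output: [weight per input channel]}.

    Channels outside an output's joint detection channel get weight 0.0.
    The uniform default_weight is a placeholder assumption.
    """
    return {
        out: [default_weight if ch in members else 0.0
              for ch in input_channels]
        for out, members in joint_detection.items()
    }
```

Keeping the correspondence separate from the weights means the same table can drive both the joint energy check and the later weighted summation.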

503: Determine whether the VAD detection result is 1.

VAD stands for Voice Activity Detection, also referred to as voice endpoint detection. The VAD condition here may be that the frame energy of the audio data is greater than a set energy threshold.

In some embodiments, the multi-channel data may be sampled and divided into frames before VAD detection. In the framing, after the multi-channel data is sampled, a run of consecutive sampling points is grouped into one audio frame, for example 512 sampling points per audio frame. In some embodiments, the VAD condition may then be whether the frame energy of the audio frame is greater than a preset VAD detection threshold (the first threshold mentioned above), for example -6 dB. When the VAD decision is true, that is, the VAD detection result is 1, the frame energy of the audio frame exceeds the preset VAD detection threshold, and the audio frame may be distorted after downmixing. It can be understood that VAD detection is the preliminary judgment described above, and a precise judgment can follow based on its result.
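The energy-based VAD check can be sketched as below. The sketch assumes samples normalized to the range [-1, 1] and measures frame energy as mean-square value in dB relative to full scale; the text does not specify the dB reference, so both choices, along with the function names, are assumptions.

```python
import math


def frame_energy_db(frame, floor=1e-12):
    """Mean-square energy of one frame in dB (assumed full-scale reference).

    The floor avoids log10(0) for silent frames.
    """
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, floor))


def vad_result(frame, threshold_db=-6.0):
    """Return 1 when the frame energy exceeds the VAD threshold, else 0."""
    return 1 if frame_energy_db(frame) > threshold_db else 0
```

A result of 1 only marks the frame for energy tracking; whether it is actually suppressed depends on the later comparison with the set detection threshold.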

In some embodiments, it may be checked whether the average frame energy of the audio frames in the joint detection channel is greater than the preset VAD detection threshold. If the average frame energy is too high, the energy of each audio frame of each channel is tracked, and energy suppression is applied to the audio frames that meet the energy suppression condition.

It can be understood that energy tracking of the audio frame is needed only when the VAD detection result is 1. When the VAD decision is false, that is, the VAD detection result is 0, downmixing will not produce distortion due to excessive audio frame energy, and VAD detection can proceed to the next audio frame.

504: Calculate the frame energy, and track the maximum energy across the joint detection channel and across the preceding and following frames.

It can be understood that when the VAD detection result is 1, the corresponding audio frame may require energy suppression, so its frame energy is calculated and tracked.

Specifically, taking L sampling points as one audio frame, the frame energy of the audio frame can be calculated by formula (1).

In formula (1), the result is the frame energy of the n-th audio frame in the i-th channel, β is the smoothing coefficient used when calculating the frame energy, and $x_i(n)(k)$ denotes the input data at the k-th sampling point of the n-th audio frame in the i-th channel, that is, part of the obtained multi-channel data. Here i = 0, 1, 2, 3, 4, 5, ..., with the range of i determined by the number of channels; k is an integer between 0 and L; and n ranges over the audio frames obtained by dividing the multi-channel data into frames. In some embodiments, the smoothing coefficient β = 0.3.
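As an illustration, one recursive form consistent with these definitions is sketched here. The symbol $E_i(n)$ for the frame energy, the normalization by L, the indexing of the L samples from 0 to L-1, and which term β multiplies are all assumptions and are not taken from the published formula.

$$E_i(n) = \beta\,E_i(n-1) + (1-\beta)\,\frac{1}{L}\sum_{k=0}^{L-1}\bigl(x_i(n)(k)\bigr)^2$$

With β = 0.3, this assumed form would give the current frame's mean-square value a weight of 0.7 and the previous frame energy a weight of 0.3.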

It can be understood that, in step 504, calculating the frame energy makes it possible to track the frame energy of the data of each channel. When this tracking finds that the frame energy is greater than the set detection threshold, step 505 can be performed.

Specifically, the audio frame with the largest energy among the corresponding audio frames in the joint detection channel may be determined by formula (2).

In formula (2), the result is the maximum frame energy value of the n-th audio frame across the channels of the joint detection channel.

In some embodiments, after the maximum frame energy value of the current n-th audio frame has been determined by formula (2), the maximum energy value among the preceding and following audio frames can be determined by formula (3).

In formula (3), the result is the maximum energy among the audio frames before and after the n-th audio frame, and the other terms are the maximum frame energy values, across the channels of the joint detection channel, of the (n-1)-th audio frame and of the (n+1)-th audio frame.

In some embodiments, this maximum energy may instead be obtained as the larger of the maximum frame energy values of the n-th and (n-1)-th audio frames.
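The maximum tracking described for formulas (2) and (3) can be sketched as below. The function names and the nested-list layout `energies[i][n]` are hypothetical; the `look_ahead` flag switches between the variant that includes frame n+1 and the variant that uses only frames n and n-1.

```python
def channel_max(energies, n):
    """Largest frame energy of frame n across one joint detection channel.

    energies[i][n]: frame energy of channel i, frame n.
    """
    return max(channel[n] for channel in energies)


def neighbourhood_max(energies, n, look_ahead=True):
    """Largest channel-wise maximum over frames n-1, n and optionally n+1.

    Frame indices outside the available range are skipped (an assumption
    about how the first and last frames are handled).
    """
    num_frames = len(energies[0])
    indices = [n - 1, n] + ([n + 1] if look_ahead else [])
    return max(channel_max(energies, m)
               for m in indices if 0 <= m < num_frames)
```

Including frame n+1 requires one frame of look-ahead, whereas the variant restricted to frames n and n-1 can run without added latency.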

505: Calculate the target gain and the frame gain, then calculate the gain of each sampling point, multiply by a fixed gain, and output the result.

It can be understood that the output result is the input data for the channel downmixing.

In some embodiments, after the maximum energy value among the preceding and following frames is obtained, it may first be determined whether the energy of the current n-th audio frame exceeds the set detection threshold (the second threshold mentioned above). If it does, the audio frame is at risk of distortion after downmixing and requires energy suppression, so the target gain and the frame gain of the audio frame are calculated to determine the energy suppression factor.

It can be understood that the comparison with the set detection threshold is a precise judgment of the frame energy of each channel, used to decide whether an audio frame with excessive energy needs to be suppressed.

It can be understood that the target gain serves as the energy suppression factor: it reduces the frame gain and thereby reduces the energy of the audio frame.

In some embodiments, the target gain may be calculated by formula (4).

In formula (4), Threshold denotes the set detection threshold, with Threshold = -3 dB, and the result is the target gain of the n-th audio frame of the i-th channel.

According to formula (4), when the maximum frame energy value of the preceding and following frames is less than the set detection threshold Threshold, the frame energy of the audio frame is acceptable, and downmixing will not produce distortion due to excessive energy. When this maximum is greater than or equal to Threshold, the frame energy is too high and downmixing may produce distortion, so the target gain of the frame is calculated and used to suppress its frame gain.
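The two cases of the target gain can be sketched as below. The text fixes only the case split at Threshold; the expression used when the energy reaches the threshold is not recoverable from it, so the limiter-style rule here (attenuate by the excess over the threshold, with energies in dB) is purely an assumption, as is the function name.

```python
def target_gain(peak_energy_db, threshold_db=-3.0):
    """Linear amplitude gain for a frame whose tracked peak energy is given in dB.

    Below the threshold the frame is left untouched (gain 1.0).
    At or above it, an ASSUMED limiter-style rule scales the amplitude so the
    energy would land on the threshold: a difference of D dB in energy
    corresponds to an amplitude factor of 10 ** (D / 20).
    """
    if peak_energy_db < threshold_db:
        return 1.0
    return 10.0 ** ((threshold_db - peak_energy_db) / 20.0)
```

Under this assumed rule, a frame 6 dB over the threshold would receive a gain of about 0.5, and a frame exactly at the threshold would keep a gain of 1.0.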

In some embodiments, after the target gain is calculated, the frame gain of the audio frame may be determined by formula (5).

In formula (5), the result is the frame gain of the n-th audio frame of the i-th channel, the recursive term is the frame gain of the (n-1)-th audio frame of the i-th channel, and α is the smoothing coefficient used when calculating the frame gain. In some embodiments, α = 0.1.
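The frame-gain update can be sketched as a first-order recursive smoother. The function name and the choice to weight the target gain by α and the previous frame gain by 1 - α are assumptions; the text states only that α is a smoothing coefficient and that the previous frame gain enters the formula.

```python
def smooth_frame_gain(prev_frame_gain, target, alpha=0.1):
    """One step of an ASSUMED first-order smoother for the frame gain.

    The new frame gain moves a fraction alpha of the way from the previous
    frame gain toward the target gain.
    """
    return alpha * target + (1.0 - alpha) * prev_frame_gain
```

Smoothing the gain across frames avoids an abrupt level jump at the boundary between an unsuppressed frame and a suppressed one.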

In some embodiments, after the frame gain is calculated by formula (5), the sampling point gain of each sampling point in the audio frame may be calculated by formula (6).

In formula (6), the result is the sampling point gain of the k-th sampling point in the n-th audio frame of the i-th channel, the previous-frame term is the sampling point gain of the k-th sampling point in the (n-1)-th audio frame of the i-th channel, and FrameLen is the frame length of the audio frame. For example, if 512 sampling points form one audio frame, the frame length is 512.

According to formula (6), the sampling point gain of each sampling point depends on the frame gain of the audio frame it belongs to and on the sampling point gain of the sampling point with the same index in the previous audio frame. The index of a sampling point is its sequence number within its audio frame; for example, the k-th sampling point of an audio frame has index k.
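One way to combine the three quantities named above (the current frame gain, the same-index sampling point gain of the previous frame, and FrameLen) is a linear ramp across the frame, sketched below. The interpolation rule and the function name are assumptions; the text does not give the actual expression.

```python
def sample_gains(prev_sample_gains, frame_gain, frame_len=512):
    """Per-sample gains for the current frame (ASSUMED linear ramp).

    Sample k starts from the gain of sample k in the previous frame and
    moves toward frame_gain by the fraction (k + 1) / frame_len, so the
    last sample of the frame reaches frame_gain exactly.
    """
    return [
        prev_sample_gains[k]
        + (frame_gain - prev_sample_gains[k]) * (k + 1) / frame_len
        for k in range(frame_len)
    ]
```

With a four-sample frame, previous gains of 1.0 and a frame gain of 0.5, this ramp yields gains that fall in equal steps to 0.5 at the last sample.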

进一步地,在一些实施例中,根据公式6中得到的采样点增益计算得到目标音频帧的计算公式为:Further, in some embodiments, the calculation formula for obtaining the target audio frame according to the sampling point gain obtained in Formula 6 is:

Here, αi(n)[i][k] denotes the audio data of the kth sampling point of the nth target audio frame of the ith channel, and xi(n)[i][k] denotes the audio data of the kth sampling point of the nth audio frame of the ith channel to be mixed, that is, xi(n)(k) in formula (1).

It can be understood that, in some embodiments, the audio data of a continuous target audio frame can be obtained from the audio data of the discrete sampling points in that frame. The audio data of each target audio frame can then be used as input data for channel downmixing.
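As a minimal sketch of how a target audio frame could be assembled from its sampling points, the code below scales each input sample by its sampling point gain. The text only states that the target samples are determined from the input samples and the sampling point gains; the element-wise product used here is an assumption, and `apply_sample_gains` is a hypothetical name.

```python
def apply_sample_gains(frame, sample_gains):
    """Build the target frame by scaling each sample with its gain.

    Assumption: target[k] = gain[k] * x[k] (element-wise product).
    """
    if len(frame) != len(sample_gains):
        raise ValueError("frame and sample_gains must have equal length")
    return [g * x for g, x in zip(sample_gains, frame)]
```

The returned list is the sequence of samples of one target audio frame for one channel, ready to be passed to the downmix stage.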

In some embodiments, after the energy-suppressed audio frames are calculated, channel downmixing can be performed using the following formula:

Here, aj(n) denotes the output of the nth audio frame of the jth mixing channel, wi,j(n) denotes the mixing weight of the nth audio frame of the ith channel in the jth mixing channel, and αi(n) denotes the input data of the nth audio frame of the ith channel for channel downmixing. When downmixing to two channels, j = 0, 1; when downmixing to three channels, j = 0, 1, 2; and so on. A = 0, 1, 2, 3, 4, 5, ... indexes the channels before mixing. For example, A = 0, 1, 2, 3, 4, 5 denote the left channel, left surround channel, bass channel, center channel, right channel, and right surround channel, respectively. J denotes the number of pre-mixing channels that correspond to the jth mixing channel.
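The downmix described above is a weighted sum over the input channels that feed each mixing channel. The sketch below shows that structure for one frame. The data layout (a dict from channel index to samples, and one weight dict per mixing channel) and the name `downmix_frame` are assumptions made for illustration; whether the source formula also normalizes by J is not recoverable from this text, so no normalization is applied.

```python
def downmix_frame(frames, weights):
    """Weighted-sum downmix of one frame.

    frames:  dict mapping input channel index i -> list of samples
    weights: list with one dict per mixing channel j, mapping the
             input channel index i -> mixing weight w[i][j]
    Returns one list of samples per mixing channel.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    frame_len = len(next(iter(frames.values())))
    mixed_channels = []
    for channel_weights in weights:
        mixed = [0.0] * frame_len
        for i, w in channel_weights.items():
            for k, x in enumerate(frames[i]):
                mixed[k] += w * x
        mixed_channels.append(mixed)
    return mixed_channels
```

For example, with the channel numbering given above, a hypothetical stereo downmix could pass `weights = [{0: 1.0, 1: 0.5, 3: 0.5}, {4: 1.0, 5: 0.5, 3: 0.5}]`, so that the center channel (index 3) contributes to both mixing channels. These weight values are illustrative and are not the Dolby coefficients.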

To illustrate the benefits of the multi-channel mixing method in the embodiments of the present application more clearly, the method is simulated below with reference to FIG. 6 and FIG. 7. The simulation software is CLion. The simulation conditions are: 5.1-channel audio data as the multi-channel audio data, smoothing coefficients α = 0.1 and β = 0.3, and detection threshold Threshold = -3 dB.

FIG. 6 is a schematic diagram of the bitstream waveforms of multi-channel audio data in an embodiment of the present application.

FIG. 7 is a schematic diagram of the bitstream waveforms and energy spectra of the mixed channels after channel downmixing.

As shown in FIG. 6, the six bitstream waveforms represent, from top to bottom, the left channel, right channel, center channel, bass channel, left surround channel, and right surround channel. The horizontal axis represents time, and the vertical axis represents the value of the audio data. During bitstream pass-through, each sampling point is represented with 16 bits. With a fixed-coefficient downmix scheme, the sum of the channels must therefore stay within the range representable in 16 bits if the mix is not to distort; otherwise the data overflows and wraps around, producing abrupt jumps in the data that are heard as noise. As FIG. 6 shows, if Dolby downmix coefficients are used for mixing, some of the data inside the boxes has large energy peaks, so the downmix coefficients would have to be made very small to keep the downmix from distorting. Because volume is positively correlated with energy, the volume of each mixed channel would drop accordingly, lowering the overall volume of the downmixed audio and degrading the user's listening experience.
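The overflow behaviour described above can be reproduced with plain integer arithmetic. The sketch below models signed 16-bit two's-complement storage; `wrap_int16` and `saturate_int16` are hypothetical helper names, and saturation is shown only for contrast, since the text describes the wrapping case.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def wrap_int16(value):
    """Model what happens when a sum is stored in 16 bits without
    any protection: the value wraps around (two's complement)."""
    return (value + 32768) % 65536 - 32768


def saturate_int16(value):
    """Clamp to the 16-bit range instead of wrapping (for contrast)."""
    return max(INT16_MIN, min(INT16_MAX, value))


# Two loud samples from different channels, each valid on its own.
left, center = 20000, 20000
mixed = left + center            # 40000, outside the 16-bit range
wrapped = wrap_int16(mixed)      # -25536: sign flips, audible as a click
clamped = saturate_int16(mixed)  # 32767: clipped but no sign flip
```

The wrapped result jumps from a large positive value to a large negative one, which is the data jump that the passage identifies as the source of noise.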

As shown in FIG. 7, the first row is the bitstream waveform of the audio data of the left mixed channel obtained with the multi-channel mixing method of the present application, and the second row is the bitstream waveform of the audio data of the right mixed channel obtained with the Dolby mixing method. In the first and second rows, the horizontal axis represents time and the vertical axis represents the value of the audio data. The third row is the energy spectrum of the audio data of the left mixed channel in the first row, and the fourth row is the energy spectrum of the audio data of the right mixed channel in the second row. In the third and fourth rows, the horizontal axis represents time and the vertical axis represents energy.

As FIG. 7 shows, the bitstream waveform of the audio data of the left mixed channel, which was processed with the multi-channel mixing method in the embodiments of the present application, has a smoother envelope with very few large fluctuations, and its energy stripes are relatively stable and never rise so high that they spread across the entire frequency domain. In contrast, in the bitstream waveform of the audio data of the right mixed channel, which was not processed with the multi-channel mixing method of the present application, the high-energy portions (for example, the audio data inside the box in FIG. 7) have energy stripes that spread across the entire frequency domain; the sound is clearly distorted and the noise is also clearly audible.

It can be seen that the multi-channel mixing method in the embodiments of the present application tracks the energy of the audio data of each channel and suppresses the energy of audio data whose energy is too high. This reduces the risk of distortion in the mix without losing audio data and improves the user's listening experience.
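The energy tracking and suppression summarized above can be sketched as follows. The formulas for frame energy and target gain are not reproduced in this text, so every formula in the sketch is an assumption: frame energy is taken as the mean of squared samples, smoothing uses β on the previous value, and the target gain scales a frame back to the threshold energy. All function names are hypothetical.

```python
import math


def frame_energy(frame):
    """Mean squared sample value of one frame (assumed definition)."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)


def track_energy(frames, beta=0.3):
    """Smoothed frame energy for consecutive frames of one channel.

    Assumed form: E(n) = beta * E(n - 1) + (1 - beta) * energy(n),
    starting from E = 0.
    """
    tracked = []
    prev = 0.0
    for frame in frames:
        prev = beta * prev + (1.0 - beta) * frame_energy(frame)
        tracked.append(prev)
    return tracked


def target_gain(energy, threshold):
    """Gain that brings a frame's energy down to the threshold.

    Energy scales with the square of amplitude, so the amplitude
    gain is the square root of the energy ratio. Frames at or below
    the threshold are left unchanged.
    """
    if energy <= threshold:
        return 1.0
    return math.sqrt(threshold / energy)
```

Under these assumptions, a frame whose tracked energy is four times the threshold receives a target gain of 0.5, while frames below the threshold pass through with unity gain, so no audio data is discarded.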

FIG. 8 shows a schematic diagram of the hardware structure of a mobile phone 100 according to an embodiment of the present application.

The mobile phone 100 can execute the multi-channel mixing method provided in the embodiments of the present application. In FIG. 8, similar components have the same reference numerals. As shown in FIG. 8, the mobile phone 100 may include a processor 110, a power module 140, a memory 180, a camera 101, a mobile communication module 130, a wireless communication module 120, a sensor module 190, an audio module 150, an interface module 160, a display screen 102, and the like.

It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the mobile phone 100. In other embodiments of the present application, the mobile phone 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units, for example processing modules or processing circuits such as a central processing unit (CPU), a graphics processing unit (GPU), an image signal processor (ISP), a digital signal processor (DSP), a microprocessor (Micro-programmed Control Unit, MCU), an artificial intelligence (AI) processor, or a programmable logic device (Field Programmable Gate Array, FPGA). The different processing units may be independent devices or may be integrated in one or more processors. For example, in some embodiments of the present application, the processor 110 may be used to determine whether the energy of the mth audio frame is greater than the set energy threshold and to calculate the energy suppression factor. In some embodiments, the processor 110 may also be used to downmix the obtained target audio frames to obtain the mixed output data.

The memory 180 may be used to store data, software programs, and modules. It may be a volatile memory, such as a random-access memory (RAM); a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); a combination of the above types of memory; or a removable storage medium, such as a Secure Digital (SD) memory card. In some embodiments of the present application, the memory 180 is used to store the multi-channel audio data of the mobile phone 100 and a preset channel downmixing algorithm.

The power module 140 may include a power supply, power management components, and the like. The power supply may be a battery. The power management component manages the charging of the power supply and the supply of power from the power supply to other modules. The charging management module receives charging input from a charger; the power management module connects the power supply, the charging management module, and the processor 110.

The mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, a low noise amplifier (LNA), and the like. The mobile communication module 130 may provide wireless communication solutions for the mobile phone 100, including 2G/3G/4G/5G. The mobile communication module 130 may receive electromagnetic waves through the antenna, filter and amplify the received electromagnetic waves, and pass them to the modem processor for demodulation. The mobile communication module 130 may also amplify the signal modulated by the modem processor and radiate it as electromagnetic waves through the antenna. In some embodiments, at least some functional modules of the mobile communication module 130 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 130 may be provided in the same device as at least some modules of the processor 110.

The wireless communication module 120 may include an antenna, through which it transmits and receives electromagnetic waves. The wireless communication module 120 may provide wireless communication solutions for the mobile phone 100, including wireless local area network (WLAN) (such as a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The mobile phone 100 can communicate with networks and other devices through these wireless communication technologies.

In some embodiments, the mobile communication module 130 and the wireless communication module 120 of the mobile phone 100 may also be located in the same module.

The camera 101 is used to capture still images or video. An object produces an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element converts the optical signal into an electrical signal and passes it to the image signal processor (ISP), which converts it into a digital image signal. The mobile phone 100 can implement the shooting function through the ISP, the camera 101, the video codec, the graphics processing unit (GPU), the display screen 102, the application processor, and the like. For example, in some embodiments of the present application, the camera 101 is used to capture face images and QR code images so that the mobile phone 100 can perform face recognition, QR code recognition, and the like.

The display screen 102 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, a quantum dot light-emitting diode (QLED), or the like. For example, the display screen 102 is used to display the UI of the mobile phone 100 in landscape or portrait orientation in modes such as split screen, parallel view, and a single app occupying the whole screen.

The sensor module 190 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.

The audio module 150 can convert digital audio information into an analog audio signal for output, or convert analog audio input into a digital audio signal. The audio module 150 can also be used to encode and decode audio signals. In some embodiments, the audio module 150 may be provided in the processor 110, or some functional modules of the audio module 150 may be provided in the processor 110.

The interface module 160 includes an external memory interface, a Universal Serial Bus (USB) interface, a Subscriber Identification Module (SIM) card interface, and the like. The external memory interface can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone 100. The external memory card communicates with the processor 110 through the external memory interface to provide data storage. The USB interface is used for communication between the mobile phone 100 and other mobile phones. The SIM card interface is used to communicate with a SIM card installed in the mobile phone 100, for example to read a phone number stored in the SIM card or to write a phone number to the SIM card.

In some embodiments, the mobile phone 100 further includes buttons, a motor, indicators, and the like. The buttons may include a volume button, a power button, and the like. The motor is used to make the mobile phone 100 vibrate. The indicators may include a laser indicator, a radio frequency indicator, an LED indicator, and the like.

The embodiments of the mechanisms disclosed in the present application can be implemented in hardware, software, firmware, or a combination of these. The embodiments of the present application can be implemented as computer programs or program code executed on a programmable system that includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.

Program code can be applied to input instructions to perform the functions described in the present application and to generate output information. The output information can be applied to one or more output devices in a known manner. For the purposes of the present application, a processing system includes any system that has a processor such as a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.

Program code can be implemented in a high-level procedural language or an object-oriented programming language in order to communicate with the processing system. Where needed, program code can also be implemented in assembly language or machine language. In fact, the mechanisms described in the present application are not limited in scope to any particular programming language. In either case, the language may be a compiled language or an interpreted language. In some cases, the disclosed embodiments can be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments can also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (for example, computer-readable) storage media, which can be read and executed by one or more processors. For example, the instructions can be distributed over a network or through other computer-readable media. A machine-readable medium can therefore include any mechanism for storing or transmitting information in a form readable by a machine (for example, a computer), including but not limited to floppy disks, optical discs, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROM), random access memories (RAM), erasable programmable read-only memories (EPROM), electrically erasable programmable read-only memories (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used to transmit information over the Internet by means of electrical, optical, acoustic, or other forms of propagated signals (for example, carrier waves, infrared signals, digital signals, and the like). A machine-readable medium thus includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (for example, a computer).

In addition, the technical solution of the present application also provides a computer-readable storage medium on which instructions are stored. When executed on the electronic device 100, the instructions cause the electronic device 100 to execute the multi-channel mixing method provided by the technical solution of the present application.

In addition, the technical solution of the present application also provides a computer program product that includes instructions for implementing the multi-channel mixing method provided by the technical solution of the present application.

In addition, the technical solution of the present application also provides a chip device, which includes: a communication interface for inputting and/or outputting information; and a processor for executing a computer-executable program so that a device in which the chip device is installed executes the multi-channel mixing method provided by the technical solution of the present application.

In the drawings, some structural or method features may be shown in a specific arrangement and/or order. It should be understood, however, that such a specific arrangement and/or order may not be required. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. In addition, the inclusion of a structural or method feature in a particular figure does not imply that the feature is required in all embodiments; in some embodiments, the feature may be omitted or combined with other features.

It should be noted that the units/modules mentioned in the device embodiments of the present application are all logical units/modules. Physically, a logical unit/module may be one physical unit/module, part of a physical unit/module, or a combination of several physical units/modules. The physical implementation of these logical units/modules is not what matters most; the combination of functions they implement is the key to solving the technical problem addressed by the present application. In addition, to highlight the innovative part of the present application, the device embodiments above do not introduce units/modules that are not closely related to solving that technical problem. This does not mean that the device embodiments contain no other units/modules.

It should be noted that, in the examples and description of this patent, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a" does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Although the present application has been illustrated and described with reference to certain preferred embodiments, those of ordinary skill in the art should understand that various changes in form and detail may be made without departing from the spirit and scope of the present application.

Claims (16)

1.一种多声道的混音方法,应用于电子设备,其特征在于,包括:1. A multi-channel mixing method, applied to electronic equipment, characterized by including: 获取第一多声道音频数据,所述第一多声道音频数据包括M个待混音声道的音频数据;Obtain first multi-channel audio data, where the first multi-channel audio data includes audio data of M channels to be mixed; 确定出所述第一多声道音频数据中存在能量大于预设能量阈值的音频数据,并对所述第一多声道音频数据中能量大于所述预设能量阈值的所述音频数据进行能量降幅处理;Determine that there is audio data with an energy greater than a preset energy threshold in the first multi-channel audio data, and perform energy analysis on the audio data with an energy greater than the preset energy threshold in the first multi-channel audio data. Amplitude reduction processing; 根据所述能量降幅处理结果,得到第二多声道音频数据;Obtain second multi-channel audio data according to the energy reduction processing result; 对所述第二多声道音频数据进行下混,得到具有N个混音声道的混音输出数据,其中M>N,并且N≥1。The second multi-channel audio data is downmixed to obtain mixing output data having N mixing channels, where M>N, and N≥1. 2.根据权利要求1所述的多声道的混音方法,其特征在于,所述确定出所述第一多声道音频数据中存在能量大于预设能量阈值的音频数据,包括:2. The multi-channel mixing method according to claim 1, wherein determining that there is audio data with energy greater than a preset energy threshold in the first multi-channel audio data includes: 对所述第一多声道音频数据进行分帧处理,得到多个音频帧,并确定所述多个音频帧的帧能量;Perform frame segmentation processing on the first multi-channel audio data to obtain multiple audio frames, and determine the frame energy of the multiple audio frames; 确定出所述第一多声道音频数据中存在帧能量大于预设能量阈值的高能量音频帧。It is determined that there is a high-energy audio frame whose frame energy is greater than a preset energy threshold in the first multi-channel audio data. 3.根据权利要求2所述的多声道的混音方法,其特征在于,所述高能量音频帧的帧能量是通过下列公式确定的:3. 
The multi-channel mixing method according to claim 2, wherein the frame energy of the high-energy audio frame is determined by the following formula: 其中,所述高能量音频帧包括L个采样点;Wherein, the high-energy audio frame includes L sampling points; β表示帧能量平滑系数;β represents the frame energy smoothing coefficient; xi(n)(k)表示所述M个待混音声道中第i个待混音声道的第n个音频帧中的第k个采样点的音频数据;x i (n) (k) represents the audio data of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed; 表示所述M个待混音声道中第i个待混音声道的所述第n个音频帧中的所述第k个采样点的能量; Represents the energy of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed; 表示所述M个待混音声道中第i个待混音声道的所述第n个音频帧的帧能量。 Indicates the frame energy of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed. 4.根据权利要求2所述的多声道的混音方法,其特征在于,所述预设能量阈值包括第一阈值和/或第二阈值;4. The multi-channel mixing method according to claim 2, wherein the preset energy threshold includes a first threshold and/or a second threshold; 所述高能量音包括下列至少之一:The high-energy sound includes at least one of the following: 所述M个待混音声道的所述多个音频帧中,对应于同一混音声道的索引相同的至少一个音频帧的平均帧能量大于所述第一阈值的音频帧为所述高能量音频帧;Among the multiple audio frames of the M to-be-mixed channels, the audio frame corresponding to at least one audio frame with the same index of the same mixing channel whose average frame energy is greater than the first threshold is the high energy audio frames; 同一待混音声道的各音频帧中,与对应音频帧连续的至少两个音频帧的最大帧能量大于所述第二阈值的音频帧为所述高能量音频帧。Among the audio frames of the same channel to be mixed, the audio frame in which the maximum frame energy of at least two audio frames that are continuous with the corresponding audio frame is greater than the second threshold is the high-energy audio frame. 5.根据权利要求4所述的多声道的混音方法,其特征在于,所述M个待混音声道的各所述音频帧的最大帧能量是根据与各所述音频帧对应于同一混音声道且索引相同的音频帧中的帧能量最大的音频帧的帧能量确定的。5. 
The multi-channel mixing method according to claim 4, characterized in that the maximum frame energy of each audio frame of the M channels to be mixed is determined according to the frame energy of the audio frame having the largest frame energy among the audio frames that correspond to the same channel to be mixed and have the same index.

6. The multi-channel mixing method according to claim 2, characterized in that performing energy reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold comprises:
determining a target gain of the high-energy audio frame, and determining a frame gain of the high-energy audio frame according to the target gain;
determining, according to the frame gain of the high-energy audio frame, a target audio frame corresponding to the high-energy audio frame after the energy reduction processing.

7. The multi-channel mixing method according to claim 6, characterized in that the target gain of the high-energy audio frame is determined according to the preset energy threshold and the maximum frame energy of at least two audio frames consecutive with each high-energy audio frame.

8. The multi-channel mixing method according to claim 7, characterized in that the frame gain is determined by the following formula:
[formula image not reproduced]
where α denotes the frame gain smoothing coefficient, and the other symbols of the formula denote, respectively:
the target gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
the frame gain of the (n-1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
the frame gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed.

9. The multi-channel mixing method according to claim 6, characterized in that determining, according to the frame gain of the high-energy audio frame, the target audio frame corresponding to the high-energy audio frame after the energy reduction processing comprises:
determining a sampling-point gain of each sampling point in the high-energy audio frame according to the frame gain of the high-energy audio frame;
performing energy reduction processing on the audio data of each sampling point in the high-energy audio frame according to each sampling-point gain, to obtain the audio data of each sampling point in the target audio frame;
generating the target audio frame according to the audio data of each sampling point of the target audio frame.

10. The multi-channel mixing method according to claim 9, characterized in that each sampling-point gain is determined by the following formula:
[formula image not reproduced]
where FrameLen denotes the frame length of the target audio frame, and the other symbols of the formula denote, respectively:
the frame gain of the (n-1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
the frame gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
the sampling-point gain of the k-th sampling point of the (n-1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
the sampling-point gain of the k-th sampling point of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed.

11. The multi-channel mixing method according to claim 9, characterized in that the audio data of each sampling point in the target audio frame is determined from the audio data of each sampling point of the high-energy audio frame corresponding to the target audio frame and the corresponding sampling-point gain.

12. The multi-channel mixing method according to claim 6, characterized in that obtaining the second multi-channel audio data according to the energy reduction processing result comprises:
generating the second multi-channel audio data according to the target audio frame and the low-energy audio frames in the first multi-channel audio data whose energy is not greater than the preset energy threshold.

13. The multi-channel mixing method according to claim 12, characterized in that down-mixing the second multi-channel audio data to obtain mixing output data having N second channels comprises:
performing a weighted summation of the target audio frames and the low-energy audio frames corresponding to the same second channel in the second multi-channel audio data, to obtain the mixing output data.

14. An electronic device, characterized by comprising:
a memory for storing instructions to be executed by one or more processors of the electronic device, and
a processor, being one of the processors of the electronic device, for controlling execution of the multi-channel mixing method according to any one of claims 1 to 13.

15. A computer-readable storage medium, characterized in that instructions are stored on the storage medium, and the instructions, when executed on a computer, cause the computer to perform the multi-channel mixing method according to any one of claims 1 to 13.

16. A computer program product, characterized in that the computer program product comprises instructions for implementing the multi-channel mixing method according to any one of claims 1 to 13.
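The gain steps recited in claims 6 to 13 can be illustrated with a short Python sketch. It is not the claimed formulas, which are not reproduced in this text. It assumes that (1) frame energy is the mean of the squared samples, (2) the target gain is sqrt(threshold / max_energy) when the maximum frame energy exceeds the threshold and 1.0 otherwise, (3) the frame gain is a first-order smoothing of the target gain with coefficient alpha, and (4) sampling-point gains ramp linearly from the previous frame gain to the current one across the frame. All function names are hypothetical.

```python
import math

def frame_energy(frame):
    # Assumption: frame energy is the mean of the squared samples.
    return sum(x * x for x in frame) / len(frame)

def target_gain(max_energy, threshold):
    # Assumption: scale amplitude so that energy falls to the threshold.
    if max_energy <= threshold:
        return 1.0
    return math.sqrt(threshold / max_energy)

def smooth_frame_gain(prev_gain, tgt_gain, alpha):
    # Assumption: first-order smoothing with coefficient alpha in [0, 1].
    return alpha * prev_gain + (1.0 - alpha) * tgt_gain

def sample_gains(prev_gain, frame_gain, frame_len):
    # Assumption: linear ramp that reaches frame_gain at the last sample.
    step = (frame_gain - prev_gain) / frame_len
    return [prev_gain + step * (k + 1) for k in range(frame_len)]

def reduce_frame(frame, prev_gain, max_energy, threshold, alpha):
    # Returns the target audio frame and the frame gain to carry forward.
    gain = smooth_frame_gain(prev_gain, target_gain(max_energy, threshold), alpha)
    gains = sample_gains(prev_gain, gain, len(frame))
    return [x * g for x, g in zip(frame, gains)], gain

def downmix(frames, weights):
    # frames: M equal-length frames, one per channel to be mixed.
    # weights: N rows of M weights, one row per output (second) channel.
    n = len(frames[0])
    return [[sum(w * f[k] for w, f in zip(row, frames)) for k in range(n)]
            for row in weights]

# Usage: one high-energy frame, threshold 0.25, smoothing coefficient 0.5.
frame = [1.0, 1.0, 1.0, 1.0]
out, gain = reduce_frame(frame, prev_gain=1.0,
                         max_energy=frame_energy(frame),
                         threshold=0.25, alpha=0.5)
print(gain)  # 0.75
print(out)   # [0.9375, 0.875, 0.8125, 0.75]
```

Ramping the gain per sampling point rather than switching it at the frame boundary avoids an abrupt level step between consecutive frames. The ramp shape and the smoothing form used here are illustrative choices only.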
CN202210414876.5A 2022-04-15 2022-04-15 Multi-channel sound mixing method, equipment and medium Pending CN116962955A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210414876.5A CN116962955A (en) 2022-04-15 2022-04-15 Multi-channel sound mixing method, equipment and medium
PCT/CN2023/087077 WO2023197967A1 (en) 2022-04-15 2023-04-07 Multi-channel sound mixing method, and device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210414876.5A CN116962955A (en) 2022-04-15 2022-04-15 Multi-channel sound mixing method, equipment and medium

Publications (1)

Publication Number Publication Date
CN116962955A (en) 2023-10-27

Family

ID=88329055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210414876.5A Pending CN116962955A (en) 2022-04-15 2022-04-15 Multi-channel sound mixing method, equipment and medium

Country Status (2)

Country Link
CN (1) CN116962955A (en)
WO (1) WO2023197967A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118865999A (en) * 2024-09-25 2024-10-29 中央广播电视总台 Multi-channel mixing processing method and device, and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE0400998D0 (en) * 2004-04-16 2004-04-16 Coding Technologies Sweden Ab Method for representing multi-channel audio signals
CN103188595B (en) * 2011-12-31 2015-05-27 展讯通信(上海)有限公司 Method and system of processing multichannel audio signals
EP3761311A1 (en) * 2016-11-08 2021-01-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for downmixing or upmixing a multichannel signal using phase compensation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118865999A (en) * 2024-09-25 2024-10-29 中央广播电视总台 Multi-channel mixing processing method and device, and electronic equipment
CN118865999B (en) * 2024-09-25 2024-12-17 中央广播电视总台 Multichannel sound mixing processing method and device and electronic equipment

Also Published As

Publication number Publication date
WO2023197967A1 (en) 2023-10-19

Similar Documents

Publication Publication Date Title
CN110827843B (en) Audio processing method and device, storage medium and electronic equipment
US10834503B2 (en) Recording method, recording play method, apparatuses, and terminals
US20200258539A1 (en) Sound outputting device including plurality of microphones and method for processing sound signal using plurality of microphones
US20230171337A1 (en) Recording method of true wireless stereo earbuds and recording system
CN114203163A (en) Audio signal processing method and device
CN113160846B (en) Noise suppression method and electronic device
WO2019033438A1 (en) Audio signal adjustment method and device, storage medium, and terminal
CN107637095A (en) The loudspeaker of reservation privacy, energy efficient for personal voice
CN107240396B (en) Speaker self-adaptation method, device, equipment and storage medium
WO2022199405A1 (en) Voice control method and apparatus
US20200213732A1 (en) Volume adjusting method, device, and terminal device
US20250254466A1 (en) Sound field expansion method, audio device and computer-readable storage medium
CN116962955A (en) Multi-channel sound mixing method, equipment and medium
CN117953912B (en) Voice signal processing method and related equipment
CN118737111A (en) Speech processing method, device, vehicle, storage medium and program product
JP2022095689A (en) Voice data noise reduction method, device, equipment, storage medium, and program
CN116913328B (en) Audio processing method, electronic device and storage medium
CN102045619A (en) Recording apparatus, recording method, audio signal correction circuit, and program
CN107566595A (en) Volume control method, device and storage medium of mobile terminal and mobile terminal
CN114299923B (en) Audio identification method, device, electronic equipment and storage medium
CN117133303A (en) A voice noise reduction method, electronic device and medium
CN113889084B (en) Audio recognition method, device, electronic device and storage medium
CN114220454A (en) Audio noise reduction method, medium and electronic equipment
CN116320144A (en) A kind of audio playing method and electronic equipment
CN111163411B (en) Method for reducing influence of interference sound and sound playing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination