CN114073106B - Binaural beamforming microphone array - Google Patents
Binaural beamforming microphone array
- Publication number
- CN114073106B (application number CN202080005496.5A)
- Authority
- CN
- China
- Prior art keywords
- signal
- audio
- microphone array
- noise
- audio output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/175—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
- G10K11/178—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
- G10K11/1787—General system configurations
- G10K11/17879—General system configurations using both a reference signal and an error signal
- G10K11/17881—General system configurations using both a reference signal and an error signal the reference signal being an acoustic signal, e.g. recorded with a microphone
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/12—Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/002—Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
- H04S3/004—For headphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
- H04R2201/401—2D or 3D arrays of transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/027—Spatial or constructional arrangements of microphones, e.g. in dummy heads
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
Description
Technical Field
The present disclosure relates to microphone arrays and, in particular, to binaural beamforming microphone arrays.
Background Art
Microphone arrays have been used in a wide range of applications, including, for example, hearing aids, smart headsets, smart speakers, voice communications, automatic speech recognition (ASR), and human-machine interfaces. The performance of a microphone array depends largely on its ability to extract a signal of interest in a noisy and/or reverberant environment. Many techniques have therefore been developed to maximize the gain of the signal of interest and suppress the effects of noise, interference, and/or reflections. One such technique is beamforming, which filters the received signals based on the spatial configuration of the signal sources and the microphones so as to focus on sound originating from a specific location. In practice, however, conventional high-gain beamformers lack the ability to deal with noise amplification (e.g., amplification of white noise in certain frequency ranges).
Brief Description of the Drawings
The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
FIG. 1 is a simplified diagram illustrating an environment in which an example microphone array system may be configured to operate according to an embodiment of the present disclosure.
FIG. 2 is a simplified block diagram illustrating an example microphone array system according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating different phase relationships between a signal of interest and a noise signal and the effect of such phase relationships on the ambiguity of the signal of interest.
FIG. 4 is a simplified diagram illustrating an environment in which an example binaural beamformer may be configured to operate according to an embodiment of the present disclosure.
FIG. 5 is a flow chart illustrating a method that may be performed by an example binaural beamformer including two orthogonal beamforming filters.
FIG. 6 is a line graph showing the simulated output interaural coherence of an example binaural beamformer described herein and of a conventional beamformer, for a desired signal and a white noise signal.
FIG. 7 is a block diagram illustrating an example computer system according to an embodiment of the present disclosure.
Detailed Description
FIG. 1 is a simplified block diagram showing an environment 100 in which a microphone array 102 may be configured to operate. The microphone array 102 may be associated with one or more applications, including, for example, hearing aids, smart headsets, smart speakers, voice communications, automatic speech recognition (ASR), and human-machine interfaces. The environment 100 may include multiple audio signal sources. The audio signals may include a signal of interest 104 (e.g., a speech signal), a noise signal 106 (e.g., diffuse noise), an interference signal 108, a white noise signal 110 (e.g., noise generated by the microphone array 102 itself), and/or the like. The microphone array 102 may include a plurality of (e.g., M) microphones (e.g., acoustic sensors) configured to operate in tandem. The microphones may be placed on a platform (e.g., a straight or curved platform) so as to receive the signals 104, 106, 108, and/or 110 from their respective sources and/or locations. For example, the microphones may be arranged according to a specific geometric relationship to one another (e.g., along a line, on the same planar surface, or spaced apart by specific distances in three-dimensional space). Each microphone in the microphone array 102 may capture a version of an audio signal originating from a source at a specific time and at a specific angle of incidence relative to a reference point (e.g., a reference microphone position in the microphone array 102). The time of sound capture may be recorded in order to determine the time delay of each microphone relative to the reference point. The captured audio signals may be converted into one or more electronic signals for further processing.
The microphone array 102 may include, or be communicatively coupled to, a processing device such as a digital signal processor (DSP) or a central processing unit (CPU). The processing device may be configured to process (e.g., filter) the signals received from the microphone array 102 and generate an audio output 112 having certain characteristics (e.g., noise reduction, speech enhancement, sound source separation, dereverberation). For example, the processing device may be configured to filter the signals received via the microphone array 102 so that the signal of interest 104 is extracted and/or enhanced, while other signals (e.g., signals 106, 108, and/or 110) are suppressed to minimize their adverse effects on the signal of interest.
FIG. 2 is a simplified block diagram showing an example microphone array system 200 as described herein. As shown in FIG. 2, the system 200 may include a microphone array 202, an analog-to-digital converter (ADC) 204, and a processing device 206. The microphone array 202 may include a plurality of microphones arranged to receive audio signals from different sources and/or at different angles. In an example, the positions of the microphones may be specified relative to a coordinate system (x, y). The coordinate system may include an origin (O) with reference to which the microphone positions are specified, and the origin may coincide with the position of one of the microphones. The angular positions of the microphones may also be defined with reference to the coordinate system. A source signal may propagate as a plane wave from the far field at the speed of sound (e.g., c = 340 m/s) and impinge on the microphone array 202.
Each microphone in the microphone array 202 may receive a version of the source signal with a certain time delay and/or phase shift. The electronic components of the microphone may convert the received sound signal into an electronic signal that is fed to the ADC 204. In an example embodiment, the ADC 204 may further convert the electronic signal into one or more digital signals.
The processing device 206 may include an input interface (not shown) to receive the digital signals generated by the ADC 204. The processing device 206 may further include a pre-processor 208 configured to prepare the digital signals for further processing. For example, the pre-processor 208 may include hardware circuitry and/or software programs that convert the digital signals into a frequency-domain representation using, for example, a short-time Fourier transform or another suitable frequency-domain transform technique.
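The frequency-domain conversion performed by a pre-processor of this kind can be sketched as follows. This is a minimal illustration only, assuming a Hann window, a frame length of 256 samples, and 50% overlap; the function name `stft_frames` and all parameter values are hypothetical and are not taken from the disclosure.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Short-time Fourier transform of a multichannel signal.

    x has shape (num_channels, num_samples); the result has shape
    (num_channels, num_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (x.shape[-1] - frame_len) // hop
    # Cut each channel into overlapping, windowed frames.
    frames = np.stack(
        [x[..., k * hop:k * hop + frame_len] * window for k in range(num_frames)],
        axis=-2,
    )
    # One-sided FFT of every frame gives one spectrum per channel and frame.
    return np.fft.rfft(frames, axis=-1)

# Hypothetical input: 4 channels carrying the same 1 kHz tone sampled at 16 kHz.
fs = 16000.0
t = np.arange(1024) / fs
x = np.tile(np.sin(2.0 * np.pi * 1000.0 * t), (4, 1))
Y = stft_frames(x)
```

Each entry `Y[m, k, :]` then holds the spectrum of channel `m` in frame `k`, which is the per-frequency form on which the spatial filters discussed in this description operate.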
The output of the pre-processor 208 may be further processed by the processing device 206, for example, via a beamformer 210. The beamformer 210 may apply one or more filters (e.g., spatial filters) to the received signals to achieve spatial selectivity. In one embodiment, the beamformer 210 may be configured to process the phase and/or amplitude of the captured signals so that signals arriving at particular angles experience constructive interference while other signals experience destructive interference. The processing by the beamformer 210 may result in a desired beam pattern (e.g., a directivity pattern) that enhances audio signals from one or more specific directions. The ability of such a beam pattern to maximize the ratio of its sensitivity in the look direction (e.g., the angle of incidence of the audio signal associated with maximum sensitivity) to its sensitivity averaged over all directions may be quantified by one or more parameters, including, for example, the directivity factor (DF).
The processing device 206 may also include a post-processor 212 configured to transform the signal produced by the beamformer 210 into a form suitable for output. For example, the post-processor 212 may convert the estimates provided by the beamformer 210 for each frequency sub-band back into the time domain so that the output of the microphone array system 200 is intelligible to an aural receiver.
The signals and/or filtering described herein can be understood from the following description. For a source signal of interest that propagates as a plane wave from azimuth angle $\theta$ at the speed of sound (e.g., c = 340 m/s) in an anechoic acoustic environment and impinges on a microphone array (e.g., the microphone array 202) comprising $2M$ omnidirectional microphones, the corresponding steering vector of length $2M$ can be expressed as follows:
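Assuming a uniform linear array and the phase-sign convention common in the beamforming literature (both are assumptions here, chosen to match the symbols defined in this passage), the steering vector takes the form:

$$\mathbf{d}(\omega,\theta) = \left[\,1 \;\; e^{-J\omega\tau_0\cos\theta} \;\; \cdots \;\; e^{-J(2M-1)\omega\tau_0\cos\theta}\,\right]^T$$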
where $J$ may denote the imaginary unit (i.e., $J^2 = -1$), $\omega = 2\pi f$ may denote the angular frequency, $f > 0$ is the temporal frequency, $\tau_0 = \delta/c$ may denote the delay between two adjacent sensors at angle $\theta = 0$, $\delta$ is the inter-element spacing, and the superscript $T$ may denote the transpose operator. The acoustic wavelength is given by $\lambda = c/f$.
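As a numerical illustration of these definitions, the sketch below evaluates a steering vector for a uniform linear array. It assumes the common convention in which the m-th element carries the phase $e^{-Jm\omega\tau_0\cos\theta}$; the function name `steering_vector` and the example values (8 microphones, 1 cm spacing, 1 kHz) are hypothetical.

```python
import numpy as np

def steering_vector(f, theta, num_mics, delta, c=340.0):
    """Steering vector of a uniform linear array (assumed convention).

    f: temporal frequency in Hz, theta: azimuth in radians,
    num_mics: number of microphones (2M in the text),
    delta: inter-element spacing in metres, c: speed of sound in m/s.
    """
    omega = 2.0 * np.pi * f          # angular frequency
    tau0 = delta / c                 # delay between adjacent sensors at theta = 0
    m = np.arange(num_mics)          # sensor index 0 .. 2M-1
    return np.exp(-1j * m * omega * tau0 * np.cos(theta))

# Hypothetical example: 8 microphones, 1 cm apart, 1 kHz source at end-fire.
d = steering_vector(f=1000.0, theta=0.0, num_mics=8, delta=0.01)
```

Every element has unit magnitude, and at broadside ($\theta = \pi/2$) all elements reduce to 1 because the wavefront reaches every sensor at the same time.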
Based on the steering vector defined above, the frequency-domain observation signal vector of length $2M$ can be expressed as

$$\mathbf{y}(\omega) = \left[\,Y_1(\omega) \;\; Y_2(\omega) \;\; \cdots \;\; Y_{2M}(\omega)\,\right]^T = \mathbf{x}(\omega) + \mathbf{v}(\omega) = \mathbf{d}(\omega,\theta_s)\,X(\omega) + \mathbf{v}(\omega),$$
where $Y_m(\omega)$ may denote the $m$-th microphone signal, $\mathbf{x}(\omega) = \mathbf{d}(\omega,\theta_s)X(\omega)$, $X(\omega)$ may denote the zero-mean source signal of interest (e.g., the desired signal), $\mathbf{d}(\omega,\theta_s)$ may denote the signal propagation vector (which may take the same form as the steering vector), and $\mathbf{v}(\omega)$ may denote the zero-mean additive noise signal vector, defined analogously to $\mathbf{y}(\omega)$.
From the above, the $2M \times 2M$ covariance matrix of $\mathbf{y}(\omega)$ can be derived as
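Assuming that the source signal and the noise are uncorrelated, and using the quantities $\phi_X(\omega)$, $\mathbf{\Phi}_v(\omega)$, $\phi_{V_1}(\omega)$, and $\mathbf{\Gamma}_v(\omega)$ named in this passage, the covariance matrix expands as:

$$\mathbf{\Phi}_y(\omega) = E\!\left[\mathbf{y}(\omega)\mathbf{y}^H(\omega)\right] = \phi_X(\omega)\,\mathbf{d}(\omega,\theta_s)\mathbf{d}^H(\omega,\theta_s) + \mathbf{\Phi}_v(\omega) = \phi_X(\omega)\,\mathbf{d}(\omega,\theta_s)\mathbf{d}^H(\omega,\theta_s) + \phi_{V_1}(\omega)\,\mathbf{\Gamma}_v(\omega)$$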
where $E[\cdot]$ may denote mathematical expectation, the superscript $H$ may denote the conjugate-transpose operator, $\phi_X(\omega) = E[|X(\omega)|^2]$ may denote the variance of $X(\omega)$, $\mathbf{\Phi}_v(\omega) = E[\mathbf{v}(\omega)\mathbf{v}^H(\omega)]$ may denote the covariance matrix of $\mathbf{v}(\omega)$, $\phi_{V_1}(\omega) = E[|V_1(\omega)|^2]$ may denote the variance of the noise $V_1(\omega)$ at the first sensor or microphone, and $\mathbf{\Gamma}_v(\omega) = \mathbf{\Phi}_v(\omega)/\phi_{V_1}(\omega)$ (i.e., $\mathbf{\Phi}_v(\omega)$ normalized by $\phi_{V_1}(\omega)$) may denote the pseudo-coherence matrix of the noise. The variance of the noise may be assumed to be the same across multiple sensors or microphones (e.g., across all sensors or microphones).
The sensor spacing $\delta$ described herein may be assumed to be much smaller than the acoustic wavelength $\lambda$ (e.g., $\delta \ll \lambda$), where $\lambda = c/f$. This implies that $\omega\tau_0$ is much smaller than $2\pi$ (e.g., $\omega\tau_0 \ll 2\pi$), so that the true acoustic pressure differentials can be approximated by finite differences of the microphone outputs. It may further be assumed that the desired source signal propagates from the angle $\theta = 0$ (e.g., the end-fire direction). Accordingly, $\mathbf{y}(\omega)$ can be expressed as

$$\mathbf{y}(\omega) = \mathbf{d}(\omega,0)\,X(\omega) + \mathbf{v}(\omega),$$

and, at end-fire, the beam pattern of the beamformer may equal 1 or attain its maximum value.
In an example implementation of a beamformer filter, complex weights may be applied to the outputs of one or more microphones (e.g., of each microphone) of the microphone array 102. The weighted outputs may then be summed to obtain an estimate of the source signal, as follows:

$$Z(\omega) = \mathbf{h}^H(\omega)\,\mathbf{y}(\omega) = X(\omega)\,\mathbf{h}^H(\omega)\,\mathbf{d}(\omega,0) + \mathbf{h}^H(\omega)\,\mathbf{v}(\omega),$$
where $Z(\omega)$ may denote the estimate of the desired signal $X(\omega)$ and $\mathbf{h}(\omega)$ may denote a spatial linear filter of length $2M$ comprising the complex weights applied to the microphone outputs. The distortionless constraint in the direction of the signal source may be written as:

$$\mathbf{h}^H(\omega)\,\mathbf{d}(\omega,0) = 1,$$
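The filter-and-sum operation and the distortionless constraint can be checked numerically. The sketch below uses a delay-and-sum filter, $\mathbf{h} = \mathbf{d}(\omega,0)/(2M)$, purely as a simple example of a filter that satisfies the constraint; it is not presented as the filter used in the disclosure, and the value of $\omega\tau_0$ and the sample $X$ are hypothetical.

```python
import numpy as np

def filter_and_sum(h, y):
    """Return Z = h^H y for one frequency bin.

    np.vdot conjugates its first argument, so this is the Hermitian
    inner product used in the text.
    """
    return np.vdot(h, y)

num_mics = 4                       # 2M in the text
omega_tau0 = 0.2                   # hypothetical value of omega * tau0
d0 = np.exp(-1j * np.arange(num_mics) * omega_tau0)   # d(omega, 0), assumed form

h_ds = d0 / num_mics               # delay-and-sum filter, satisfies h^H d = 1
X = 0.7 - 0.2j                     # hypothetical source sample in this bin
y = d0 * X                         # noise-free observation d(omega, 0) X

Z = filter_and_sum(h_ds, y)        # recovers X because h^H d = 1
```

With noise present, the same filter would return $X(\omega)$ plus the residual term $\mathbf{h}^H(\omega)\mathbf{v}(\omega)$.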
and the directivity factor (DF) of the beamformer can be defined as:
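A standard form of the directivity factor, consistent with the distortionless constraint and with the diffuse-noise pseudo-coherence matrix $\mathbf{\Gamma}_d(\omega)$ used in this passage, is given below. The symbol $\mathcal{D}[\cdot]$ is a notational assumption.

$$\mathcal{D}\!\left[\mathbf{h}(\omega)\right] = \frac{\left|\mathbf{h}^H(\omega)\,\mathbf{d}(\omega,0)\right|^2}{\mathbf{h}^H(\omega)\,\mathbf{\Gamma}_d(\omega)\,\mathbf{h}(\omega)}$$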
where $\mathbf{\Gamma}_d(\omega)$ may denote the pseudo-coherence matrix of spherically isotropic (e.g., diffuse) noise, whose elements $[\mathbf{\Gamma}_d(\omega)]_{i,j}$, for $i, j = 1, 2, \ldots, 2M$, may be derived as:
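For spherically isotropic noise observed by a uniform linear array with inter-element delay $\tau_0$ (the geometry assumed throughout this passage), the well-known closed form of these elements is:

$$\left[\mathbf{\Gamma}_d(\omega)\right]_{i,j} = \frac{\sin\!\left[\omega (j-i)\tau_0\right]}{\omega (j-i)\tau_0} = \operatorname{sinc}\!\left[\omega (j-i)\tau_0\right], \qquad \left[\mathbf{\Gamma}_d(\omega)\right]_{i,i} = 1$$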
Based on the definitions and/or calculations above, maximizing the DF subject to the distortionless constraint yields a beamformer (referred to as a superdirective beamformer) that can be expressed as follows:
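Maximizing $\left|\mathbf{h}^H\mathbf{d}\right|^2 / \left(\mathbf{h}^H\mathbf{\Gamma}_d\mathbf{h}\right)$ subject to $\mathbf{h}^H\mathbf{d} = 1$ has the standard closed-form solution below. The subscript SD is a notational assumption.

$$\mathbf{h}_{\mathrm{SD}}(\omega) = \frac{\mathbf{\Gamma}_d^{-1}(\omega)\,\mathbf{d}(\omega,0)}{\mathbf{d}^H(\omega,0)\,\mathbf{\Gamma}_d^{-1}(\omega)\,\mathbf{d}(\omega,0)}$$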
The DF corresponding to such a beamformer (e.g., given the array geometry described herein) may have a maximum value, which may be expressed as:
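Substituting a DF-maximizing, distortionless filter into the usual definition of the directivity factor gives the expression below. For an end-fire uniform linear array, this quantity is known to approach $(2M)^2$ as the spacing $\delta$ tends to zero. The symbol $\mathcal{D}_{\max}$ is a notational assumption.

$$\mathcal{D}_{\max}(\omega) = \mathbf{d}^H(\omega,0)\,\mathbf{\Gamma}_d^{-1}(\omega)\,\mathbf{d}(\omega,0)$$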
The example beamformers described herein are capable of generating frequency-invariant beam patterns (e.g., owing to the increase or maximization of the DF). However, increasing the DF may result in greater noise amplification, such as amplification of the white noise generated by the hardware components of the microphones in the microphone array 102 (e.g., in the low-frequency range). To reduce the adverse effect of noise amplification on the signal of interest, one may consider deploying fewer microphones in the microphone array 102, regularizing the matrix $\mathbf{\Gamma}_d(\omega)$, and/or designing the microphone array 102 with an extremely low self-noise level. These approaches, however, may be costly and difficult to implement, or may degrade other aspects of beamformer performance (e.g., by reducing the DF, changing the shape of the beam pattern, and/or making the beam pattern more frequency dependent).
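The trade-off between directivity and white noise amplification can be illustrated numerically. The sketch below assumes a uniform linear array at end-fire, the sinc-form diffuse-noise coherence matrix, and the standard superdirective solution; the white noise gain $|\mathbf{h}^H\mathbf{d}|^2 / (\mathbf{h}^H\mathbf{h})$ is the usual measure of robustness to sensor self-noise and is introduced here for illustration. All function names and numeric values are hypothetical.

```python
import numpy as np

def diffuse_coherence(omega, tau0, num_mics):
    """Pseudo-coherence matrix of spherically isotropic noise (sinc model)."""
    idx = np.arange(num_mics)
    lag = idx[None, :] - idx[:, None]            # (j - i)
    # np.sinc(x) is sin(pi x) / (pi x), so the argument is divided by pi.
    return np.sinc(omega * lag * tau0 / np.pi)

def superdirective(d, gamma_d):
    """DF-maximizing filter under the constraint h^H d = 1."""
    g = np.linalg.solve(gamma_d, d)              # Gamma_d^{-1} d
    return g / np.vdot(d, g)                     # normalize so that h^H d = 1

def directivity_factor(h, d, gamma_d):
    return abs(np.vdot(h, d)) ** 2 / np.real(np.vdot(h, gamma_d @ h))

def white_noise_gain(h, d):
    return abs(np.vdot(h, d)) ** 2 / np.real(np.vdot(h, h))

# Hypothetical setup: 4 microphones, 2 cm spacing, 1 kHz, end-fire source.
num_mics, delta, c, f = 4, 0.02, 340.0, 1000.0
omega, tau0 = 2.0 * np.pi * f, delta / c
d = np.exp(-1j * np.arange(num_mics) * omega * tau0)
gamma_d = diffuse_coherence(omega, tau0, num_mics)

h_sd = superdirective(d, gamma_d)                # superdirective filter
h_ds = d / num_mics                              # delay-and-sum, for comparison

df_sd, df_ds = directivity_factor(h_sd, d, gamma_d), directivity_factor(h_ds, d, gamma_d)
wng_sd, wng_ds = white_noise_gain(h_sd, d), white_noise_gain(h_ds, d)
```

In this setup the superdirective filter has the higher directivity factor, but its white noise gain falls below 1, meaning that sensor self-noise is amplified rather than attenuated. The delay-and-sum filter shows the opposite behavior.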
Embodiments of the present disclosure address the noise amplification problem described herein by exploiting the effect that the perceived location and/or direction of audio signals has on signal intelligibility in the human auditory system (e.g., at frequencies below, for example, 1 kHz). The perception of a speech signal in the human binaural auditory system can be classified as in-phase or out-of-phase, while the perception of a noise signal (e.g., a white noise signal) can be classified as in-phase, random-phase, or out-of-phase. As referred to herein, "in-phase" may mean that the two signal streams arriving at a binaural receiver (e.g., a receiver with two receiving channels such as a pair of headphones, or a person with two ears) have substantially the same phase. "Out-of-phase" may mean that the phases of the two signal streams arriving at the binaural receiver differ by approximately 180°. "Random-phase" may mean that the phase relationship between the two signal streams arriving at the binaural receiver is random (e.g., the respective phases of the streams differ by a random amount).
FIG. 3 is a diagram showing different phase scenarios associated with a signal of interest (e.g., a speech signal) and a noise signal (e.g., white noise), and the effect of the interaural phase relationships on the localization of these signals. The left column shows that the phase relationship between the binaural noise streams can be classified as in-phase, random-phase, or out-of-phase. The top row shows that the phase relationship between the binaural speech streams can be classified as in-phase or out-of-phase. The remainder of FIG. 3 shows the combinations of speech and noise phase relationships perceived by the binaural receiver when the signals coexist in an environment. For example, cell 302 depicts a scenario in which both the speech streams and the white noise streams are in phase at the binaural receiver (e.g., as a result of monaural beamforming), and cell 304 depicts a scenario in which the speech streams arriving at the binaural receiver are in phase while the noise streams arriving at the receiver have a random phase relationship.
The intelligibility of a speech signal may vary with the combination of the phase relationships of the speech signal and the white noise. Table 1 below shows an intelligibility ranking based on the phase relationships between speech and noise, in which the antiphasic and heterophasic cases correspond to higher levels of intelligibility and the homophasic cases correspond to lower levels of intelligibility.
Table 1 - Intelligibility ranking based on speech/noise phase relationships
When the speech signal and the noise are perceived as coming from the same direction (e.g., as in the homophasic cases), the human auditory system has difficulty separating the speech from the noise, and the intelligibility of the speech signal suffers. Binaural filtering, such as binaural linear filtering, may therefore be performed in conjunction with beamforming (e.g., fixed beamforming) to generate binaural outputs (e.g., two output streams) whose phase relationships correspond to the antiphasic or heterophasic cases described above. Each of the binaural outputs may include a signal component corresponding to the signal of interest (e.g., a speech signal) and a noise component corresponding to a noise signal (e.g., white noise). The filtering may be applied such that the noise components of the output streams become uncorrelated (e.g., have a random phase relationship), while the signal components of the output streams remain correlated (e.g., in phase with each other) and/or are enhanced. As a result, the desired signal and the white noise may be perceived as coming from different directions and are better separated, which improves intelligibility.
FIG. 4 is a simplified block diagram showing a microphone array 402 in an environment 400, configured to apply binaural filtering to improve the intelligibility of a desired signal. The environment 400 may be similar to the environment 100 depicted in FIG. 1, with respective sources of a signal of interest 404 and a white noise signal 410 coexisting. Like the microphone array 102 of FIG. 1, the microphone array 402 may include a plurality of (e.g., M) microphones (e.g., acoustic sensors) configured to operate in tandem. The microphones may be positioned to capture different versions of the signal of interest 404 (e.g., a source audio signal) from their respective positions, for example at different angles and/or at different times. The microphones may also capture one or more other audio signals (e.g., noise 406 and/or interference 408), including the white noise 410 generated by the electronic components of the microphone array 402 itself.
The microphone array 402 may include, or be communicatively coupled to, a processing device such as a digital signal processor (DSP) or a central processing unit (CPU). The processing device may be configured to apply binaural filtering to the signal of interest 404 and/or the white noise signal 410 and to generate multiple outputs for a binaural receiver. For example, the processing device may apply a first beamformer filter $\mathbf{h}_1$ to the signal of interest 404 and the white noise signal 410 to generate a first audio output stream. The processing device may also apply a second beamformer filter $\mathbf{h}_2$ to the signal of interest 404 and the white noise signal 410 to generate a second audio output stream. Each of the first and second audio output streams may include a white noise component 412a and a desired signal component 412b. The white noise component 412a may correspond to the white noise signal 410 (e.g., a filtered version of the white noise signal), and the desired signal component 412b may correspond to the signal of interest 404 (e.g., a filtered version of the signal of interest). The filters $\mathbf{h}_1$ and $\mathbf{h}_2$ may be designed to be orthogonal to each other, so that the white noise components 412a in the first and second audio output streams become uncorrelated (e.g., have a random phase relationship, or an interaural coherence (IC) of approximately zero). The filters $\mathbf{h}_1$ and $\mathbf{h}_2$ may also be configured such that the desired signal components 412b in the first and second audio output streams are in phase with each other (e.g., have an IC of approximately one). A binaural receiver of the first and second audio outputs may therefore perceive the signal of interest 404 and the white noise signal 410 as coming from different locations and/or directions, which may improve the intelligibility of the signal of interest.
In one embodiment, binaural linear filtering may be performed in conjunction with fixed beamforming. Two complex-valued linear filters (e.g., h1(ω) and h2(ω)) may be applied to the observed signal vector (e.g., y(ω) as described herein). The respective lengths of the filters may depend on the number of microphones included in the associated microphone array. For example, if the associated microphone array includes 2M microphones, each filter may have length 2M.
Two estimates (e.g., Z1(ω) and Z2(ω)) of a source signal (e.g., X(ω)) can be obtained in response to binaural filtering of the signal. The estimates can be expressed as
and the variance of Zi(ω) can be expressed as
where Γv(ω), Φy(ω), Φv(ω), φX(ω), φV1(ω), and d(ω,0) have the meanings described herein.
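The displayed expressions for the two estimates and their variances are not shown here. As a hedged reconstruction, assuming the usual narrowband model y(ω) = d(ω,0)X(ω) + v(ω) and Φv(ω) = φV1(ω)Γv(ω) (both assumptions, since those definitions appear elsewhere in the description), the filter outputs and their variances would take a form such as:

```latex
\begin{align}
Z_i(\omega) &= \mathbf{h}_i^H(\omega)\,\mathbf{y}(\omega)
  = X(\omega)\,\mathbf{h}_i^H(\omega)\,\mathbf{d}(\omega,0)
  + \mathbf{h}_i^H(\omega)\,\mathbf{v}(\omega), \qquad i = 1, 2, \\
\phi_{Z_i}(\omega) &= \mathbf{h}_i^H(\omega)\,\boldsymbol{\Phi}_{\mathbf{y}}(\omega)\,\mathbf{h}_i(\omega)
  = \phi_X(\omega)\,\bigl|\mathbf{h}_i^H(\omega)\,\mathbf{d}(\omega,0)\bigr|^2
  + \phi_{V_1}(\omega)\,\mathbf{h}_i^H(\omega)\,\boldsymbol{\Gamma}_{\mathbf{v}}(\omega)\,\mathbf{h}_i(\omega).
\end{align}
```

Under this form, a filter for which h_i^H(ω)d(ω,0) = 1 would pass the source component X(ω) without distortion, leaving only the filtered noise term to be shaped by the choice of filter.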
Based on the above, two distortion-free constraints can be determined as
and the input signal-to-noise ratio (SNR) and the output SNR can be calculated, respectively, as
and
In at least some scenarios (e.g., when h1(ω) = i_i and h2(ω) = i_j, where i_i and i_j are the i-th and j-th columns, respectively, of the 2M×2M identity matrix I2M), the binaural output SNR may equal the input SNR (e.g., oSNR[i_i(ω), i_j(ω)] = iSNR(ω)). Based on the input SNR and the output SNR, the binaural SNR gain may be determined, for example, as
Other metrics associated with binaural beamforming may also be determined, including, for example, the binaural white noise gain (WNG), denoted W[h1(ω), h2(ω)]; the binaural directivity factor (DF), denoted D[h1(ω), h2(ω)]; and the binaural power beam pattern, denoted |B[h1(ω), h2(ω), θ]|². These metrics may be calculated as follows:
where the meaning of Γd(ω) has been explained above.
The localization of binaural signals in the human auditory system can depend on another metric, referred to herein as the interaural coherence (IC) of the signal. The value of the IC (or the modulus of the IC) can increase or decrease with the correlation of the binaural signals. For example, when the two audio streams of a source signal are highly correlated (e.g., when the two audio streams are in phase with each other, or when the human auditory system perceives the two audio streams as coming from a single signal source), the value of the IC can reach a maximum (e.g., 1). When the two audio streams of the source signal are substantially uncorrelated (e.g., when the two audio streams have a random phase relationship, or when the human auditory system perceives the two streams as coming from two independent sources), the value of the IC can reach a minimum (e.g., 0). The value of the IC can indicate, or can be related to, other binaural cues that the brain uses to localize sound (e.g., interaural time difference (ITD), interaural level difference (ILD), width of the sound field, etc.). As the IC of a sound decreases, the ability of the brain to localize the sound may decrease accordingly.
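The two extremes described above can be illustrated numerically. The sketch below is only an illustration: it uses the standard normalized cross-correlation E[AB*]/sqrt(E[|A|²]E[|B|²]) as the coherence measure, and the helper names `coherence` and `random_phasors`, the sample count, and the gain and phase offset are all hypothetical choices rather than anything specified in this description.

```python
import cmath
import math
import random


def coherence(a, b):
    """Sample estimate of E[A B*] / sqrt(E[|A|^2] E[|B|^2])."""
    cross = sum(x * y.conjugate() for x, y in zip(a, b))
    power_a = sum(abs(x) ** 2 for x in a)
    power_b = sum(abs(y) ** 2 for y in b)
    return cross / math.sqrt(power_a * power_b)


def random_phasors(n, rng):
    """n zero-mean, unit-modulus samples with uniformly random phase."""
    return [cmath.exp(2j * math.pi * rng.random()) for _ in range(n)]


rng = random.Random(0)
left = random_phasors(20000, rng)

# Same stream with a fixed gain and phase offset: a fixed phase relationship.
right_same = [0.5 * cmath.exp(0.3j) * x for x in left]
# Independent stream: random phase relationship with respect to `left`.
right_indep = random_phasors(20000, rng)

ic_same = abs(coherence(left, right_same))    # modulus close to one
ic_indep = abs(coherence(left, right_indep))  # modulus close to zero
print(ic_same, ic_indep)
```

The first pair behaves like a single source heard by both ears, while the second behaves like two independent sources, which is the distinction the binaural filters are meant to exploit.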
The effect of interaural coherence can be determined and/or understood as follows. Let A(ω) and B(ω) be two zero-mean complex-valued random variables. The coherence function (CF) between A(ω) and B(ω) can be defined as
where the superscript * denotes the complex conjugate operator. The value of γAB(ω) may satisfy 0 ≤ |γAB(ω)|² ≤ 1. For one or more pairs (e.g., for any pair) of microphones or sensors (i, j), the input IC of the noise may correspond to the CF between Vi(ω) and Vj(ω), as shown below.
The input IC of white noise, γw(ω), and the input IC of diffuse noise, γd(ω), can be given as follows:
$$\gamma_w(\omega) = 0$$
The output IC of the noise can be defined as the CF between the filtered noise in Z1(ω) and the filtered noise in Z2(ω), as shown below.
In at least some scenarios (e.g., when h1(ω) = i_i and h2(ω) = i_j), the input and output ICs may be equal, i.e., γ[i_i(ω), i_j(ω)] = γ[h1(ω), h2(ω)]. The output IC of white noise, γw[h1(ω), h2(ω)], and the output IC of diffuse noise, γd[h1(ω), h2(ω)], may be determined, respectively, as
and
When filters h1(ω) and h2(ω) are collinear, the following may hold:
where the proportionality factor may be a complex-valued number, and |γ[h1(ω), h2(ω)]|, |γw[h1(ω), h2(ω)]|, and |γd[h1(ω), h2(ω)]| may all have values close to 1 (e.g., |γ[h1(ω), h2(ω)]| = |γw[h1(ω), h2(ω)]| = |γd[h1(ω), h2(ω)]| = 1). Consequently, not only the desired source signal but also other signals (e.g., noise) will be perceived as coherent (e.g., fully coherent), and the combined signal (e.g., the desired source signal plus noise) may be perceived as coming from the same direction. As a result, the human auditory system will have difficulty separating the signals, and the intelligibility of the desired signal may suffer.
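The collinear case can be checked with a small numerical sketch. It assumes that the output IC of a noise field with coherence matrix Γ is h1^H Γ h2 / sqrt(h1^H Γ h1 · h2^H Γ h2), which follows from applying the coherence function to the two filtered noise outputs. The filter values, the scale factor `c`, and the 3×3 matrices are hypothetical.

```python
import math


def quad(h1, gamma, h2):
    """Return h1^H * gamma * h2 for lists of complex numbers."""
    return sum(
        h1[i].conjugate() * gamma[i][j] * h2[j]
        for i in range(len(h1))
        for j in range(len(h2))
    )


def output_ic(h1, h2, gamma):
    """Coherence between the noise filtered by h1 and the noise filtered by h2."""
    num = quad(h1, gamma, h2)
    den = math.sqrt(quad(h1, gamma, h1).real * quad(h2, gamma, h2).real)
    return num / den


# Hypothetical 3-microphone example.
h1 = [0.5 + 0.1j, -0.2 + 0.4j, 0.3 - 0.6j]
c = 0.7 - 1.2j                 # arbitrary complex scale factor
h2 = [c * w for w in h1]       # h2 is collinear with h1

# White noise: identity coherence matrix.
white = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
# Some spatially correlated noise: a positive definite coherence matrix.
correlated = [[1.0, 0.6, 0.2], [0.6, 1.0, 0.6], [0.2, 0.6, 1.0]]

ic_white = abs(output_ic(h1, h2, white))
ic_corr = abs(output_ic(h1, h2, correlated))
print(ic_white, ic_corr)
```

Whatever coherence matrix is supplied, the modulus comes out as one, because the scale factor cancels between numerator and denominator. This is why collinear filters cannot decorrelate the noise between the two outputs.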
When filters h1(ω) and h2(ω) are orthogonal to each other (e.g., h1^H(ω)h2(ω) = 0), the separation between the desired source signal and the noise (e.g., white noise) can be improved. The following explains how such orthogonal filters can be derived, and how they affect the separation between the desired signal and the noise and the resulting intelligibility of the desired signal.
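As a minimal illustration of why orthogonality helps, the sketch below builds two orthogonal filters by giving them disjoint microphone support. This is a deliberately simple construction chosen for illustration, not the derivation that follows in this description. The steering vector `d`, the phase step, and the assumed constraint h^H d = 1 are hypothetical.

```python
import cmath
import math


def inner(a, b):
    """Return a^H b for two lists of complex numbers."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


# Hypothetical steering vector of a 4-microphone array (unit-modulus entries).
phase = 0.9  # assumed inter-microphone phase step, in radians
d = [cmath.exp(-1j * phase * m) for m in range(4)]

# Two filters with disjoint support, hence orthogonal: h1^H h2 = 0.
h1 = [d[0] / 2, 0j, d[2] / 2, 0j]
h2 = [0j, d[1] / 2, 0j, d[3] / 2]

# Both filters pass the source direction with unit gain (h^H d = 1).
gain1 = inner(h1, d)
gain2 = inner(h2, d)

# White noise has an identity coherence matrix, so its output IC is the
# normalized inner product of the two filters.
ic_white = abs(inner(h1, h2)) / math.sqrt(inner(h1, h1).real * inner(h2, h2).real)

# The source components of the two outputs are gain1*X and gain2*X, so the
# modulus of their coherence depends only on the two gains.
ic_signal = abs(gain1 * gain2.conjugate()) / (abs(gain1) * abs(gain2))
print(ic_white, ic_signal)
```

Here the white noise IC is zero while the source IC is one, which is the combination the orthogonal design aims for.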
The matrix Γd(ω) described herein may be symmetric and may be diagonalized as
$$\mathbf{U}^T(\omega)\,\boldsymbol{\Gamma}_d(\omega)\,\mathbf{U}(\omega) = \boldsymbol{\Lambda}(\omega)$$
where
$$\mathbf{U}(\omega) = [\mathbf{u}_1(\omega)\ \ \mathbf{u}_2(\omega)\ \ \cdots\ \ \mathbf{u}_{2M}(\omega)]$$
may be an orthogonal matrix satisfying
$$\mathbf{U}^T(\omega)\,\mathbf{U}(\omega) = \mathbf{U}(\omega)\,\mathbf{U}^T(\omega) = \mathbf{I}_{2M}$$
and
$$\boldsymbol{\Lambda}(\omega) = \mathrm{diag}[\lambda_1(\omega), \lambda_2(\omega), \ldots, \lambda_{2M}(\omega)]$$
may be a diagonal matrix.
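The diagonalization above can be reproduced numerically. The sketch below assumes a spherically isotropic (diffuse) noise field on a uniform linear array, for which the coherence between two microphones is commonly modeled as sin(x)/x with x = ωδ/c. The array geometry, frequency, and speed of sound are illustrative assumptions, not values taken from this description.

```python
import numpy as np

# Assumed setup: 2M = 6 microphones in a uniform line, 4 cm spacing,
# frequency 2 kHz, speed of sound 340 m/s.
num_mics = 6
spacing = 0.04
freq = 2000.0
speed = 340.0

# Diffuse-noise coherence matrix: Gamma_ij = sin(x_ij) / x_ij,
# with x_ij = 2*pi*freq*distance_ij/speed.
idx = np.arange(num_mics)
dist = np.abs(idx[:, None] - idx[None, :]) * spacing
gamma_d = np.sinc(2.0 * freq * dist / speed)  # np.sinc(t) = sin(pi*t)/(pi*t)

# eigh returns eigenvalues in ascending order; reverse both outputs so that
# lambda_1 >= lambda_2 >= ... >= lambda_2M, matching the ordering in the text.
eigvals, eigvecs = np.linalg.eigh(gamma_d)
lam = eigvals[::-1]
U = eigvecs[:, ::-1]

# Largest deviations from U^T U = I and U^T Gamma_d U = Lambda.
orth_err = np.max(np.abs(U.T @ U - np.eye(num_mics)))
diag_err = np.max(np.abs(U.T @ gamma_d @ U - np.diag(lam)))
print(orth_err, diag_err)
```

Because Γd(ω) is real and symmetric under this model, a symmetric eigensolver is sufficient and the eigenvectors can be taken as real.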
The orthogonal vectors u1(ω), u2(ω), ..., u2M(ω) may be the eigenvectors corresponding, respectively, to the eigenvalues λ1(ω), λ2(ω), ..., λ2M(ω) of the matrix Γd(ω), where λ1(ω) ≥ λ2(ω) ≥ ··· ≥ λ2M(ω) > 0. The orthogonal filters that maximize the output IC of the diffuse noise described herein may thus be determined as
The first maximum mode of the CF may be as follows:
with corresponding vectors q+,1(ω) and q−,1(ω), where
All M maximum modes of the CF (for m = 1, 2, ..., M) may satisfy the following
with corresponding vectors q+,m(ω) and q−,m(ω), where
and
Based on the above, the following may hold:
From the two sets of vectors q+,m(ω) and q−,m(ω), m = 1, 2, ..., M, two semi-orthogonal matrices of size 2M×M can be formed as:
$$\mathbf{Q}_+(\omega) = [\mathbf{q}_{+,1}(\omega)\ \ \mathbf{q}_{+,2}(\omega)\ \ \cdots\ \ \mathbf{q}_{+,M}(\omega)],$$
$$\mathbf{Q}_-(\omega) = [\mathbf{q}_{-,1}(\omega)\ \ \mathbf{q}_{-,2}(\omega)\ \ \cdots\ \ \mathbf{q}_{-,M}(\omega)],$$
where
IM is the M×M identity matrix.
The following may also hold:
where
$$\boldsymbol{\Lambda}_-(\omega) = \mathrm{diag}[\lambda_{-,1}(\omega), \lambda_{-,2}(\omega), \ldots, \lambda_{-,M}(\omega)],$$
$$\boldsymbol{\Lambda}_+(\omega) = \mathrm{diag}[\lambda_{+,1}(\omega), \lambda_{+,2}(\omega), \ldots, \lambda_{+,M}(\omega)],$$
are two diagonal matrices of size M×M, with diagonal elements λ−,m(ω) = λm(ω) − λ2M−m+1(ω) and λ+,m(ω) = λm(ω) + λ2M−m+1(ω).
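One construction that is consistent with the diagonal elements λ−,m(ω) and λ+,m(ω) given above pairs the m-th largest eigenvector with the m-th smallest one: q±,m = (u_m ± u_{2M−m+1})/√2. This pairing, the filter form h1 = Q+ h′ and h2 = Q− h′, and the noise model below are assumptions made for illustration. The expressions actually used in this description are not reproduced here and may differ (for example, by a complex factor).

```python
import numpy as np

M = 3                 # 2M = 6 microphones (assumed)
n = 2 * M
idx = np.arange(n)
# Assumed diffuse-noise model: sinc coherence, 4 cm spacing, 2 kHz, c = 340 m/s.
gamma_d = np.sinc(2.0 * 2000.0 * 0.04 * np.abs(idx[:, None] - idx[None, :]) / 340.0)

eigvals, eigvecs = np.linalg.eigh(gamma_d)
lam, U = eigvals[::-1], eigvecs[:, ::-1]    # descending eigenvalue order

# Pair u_m with u_{2M-m+1} (assumed construction).
U_top = U[:, :M]              # u_1, ..., u_M
U_bot = U[:, ::-1][:, :M]     # u_2M, ..., u_{M+1}
Q_plus = (U_top + U_bot) / np.sqrt(2.0)
Q_minus = (U_top - U_bot) / np.sqrt(2.0)

lam_plus = lam[:M] + lam[::-1][:M]      # lambda_m + lambda_{2M-m+1}
lam_minus = lam[:M] - lam[::-1][:M]     # lambda_m - lambda_{2M-m+1}

# A common filter h' (hypothetical values) mapped to two orthogonal filters.
h_common = np.array([1.0, -0.5, 0.25])
h1 = Q_plus @ h_common
h2 = Q_minus @ h_common

# Output IC for white noise (identity matrix) and for the diffuse noise.
ic_white = (h1 @ h2) / np.sqrt((h1 @ h1) * (h2 @ h2))
ic_diffuse = (h1 @ gamma_d @ h2) / np.sqrt((h1 @ gamma_d @ h1) * (h2 @ gamma_d @ h2))
# The same diffuse-noise IC written with the diagonal matrices.
ic_diffuse_diag = (h_common @ (lam_minus * h_common)) / (h_common @ (lam_plus * h_common))
print(ic_white, ic_diffuse, ic_diffuse_diag)
```

Under this construction the two filters are orthogonal for any common filter h′, so the white noise IC vanishes, while the diffuse noise IC reduces to a ratio of quadratic forms in Λ− and Λ+.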
Let N be a positive integer with 2 ≤ N ≤ M. Two semi-orthogonal matrices of size 2M×N can be defined as follows:
$$\mathbf{Q}_{+,:N}(\omega) = [\mathbf{q}_{+,1}(\omega)\ \ \mathbf{q}_{+,2}(\omega)\ \ \cdots\ \ \mathbf{q}_{+,N}(\omega)],$$
$$\mathbf{Q}_{-,:N}(\omega) = [\mathbf{q}_{-,1}(\omega)\ \ \mathbf{q}_{-,2}(\omega)\ \ \cdots\ \ \mathbf{q}_{-,N}(\omega)]$$
In an example implementation, the orthogonal filters described herein may take the following form:
where
may represent a common complex-valued filter of length N. For such orthogonal filters, the output IC of the diffuse noise can be calculated as
where
$$\boldsymbol{\Lambda}_{-,N}(\omega) = \mathrm{diag}[\lambda_{-,1}(\omega), \lambda_{-,2}(\omega), \ldots, \lambda_{-,N}(\omega)]$$
$$\boldsymbol{\Lambda}_{+,N}(\omega) = \mathrm{diag}[\lambda_{+,1}(\omega), \lambda_{+,2}(\omega), \ldots, \lambda_{+,N}(\omega)]$$
and
Based on the above, the binaural WNG, DF, and power beam pattern can be determined, respectively, as follows:
and
where
may be a matrix of size N×2, and the distortion-free constraint may be
where N ≥ 2.
From the above, the variance of Zi(ω) follows as:
where Q±,:N(ω) = Q+,:N(ω) for φZ1(ω) and Q±,:N(ω) = Q−,:N(ω) for φZ2(ω). In the case of diffuse plus white noise (e.g., Γv(ω) = Γd(ω) + I2M), the variance of Zi(ω) can be simplified to
This indicates that φZ1(ω) may equal φZ2(ω) (e.g., φZ1(ω) = φZ2(ω)).
Furthermore, the cross-correlation between the two estimates Z1(ω) and Z2(ω) can be determined as follows:
In the case of diffuse plus white noise (e.g., Γv(ω) = Γd(ω) + I2M), the cross-correlation may become
which may not depend on the white noise. For Γv(ω) = Γd(ω) + I2M, the output IC of the estimated signals may be determined as
As can be seen from the above, in some scenarios (e.g., for large input SNRs), the localization cues of the estimated signals may depend (e.g., mainly) on the localization cues of the desired signal, while in other scenarios (e.g., for low SNRs), the localization cues of the estimated signals may depend (e.g., mainly) on the localization cues of the diffuse plus white noise. Therefore, a first binaural beamformer (e.g., a binaural superdirective beamformer) can be obtained by minimizing the sum of the filtered diffuse noise signals subject to the distortion-free constraint described herein. This may be done, for example, as follows:
from which the following can be obtained:
and the corresponding DF can be determined as:
Therefore, the first binaural beamformer can be represented as:
A second binaural beamformer (e.g., a second binaural superdirective beamformer) can be obtained by maximizing the DF described herein. For example, when
the DF shown above can be rewritten as:
where
C′(ω,0)C′^H(ω,0) may represent an N×N Hermitian matrix whose rank may equal 2. Since there are two constraints (e.g., distortion-free constraints) to satisfy, two eigenvectors, denoted t′1(ω) and t′2(ω), may be considered. These eigenvectors may correspond to the two nonzero eigenvalues of the matrix C′(ω,0)C′^H(ω,0), denoted λt′1(ω) and λt′2(ω). The filter that maximizes the DF as rewritten above, with two degrees of freedom (since two constraints are to be satisfied), may thus be as follows:
where
$$\boldsymbol{\alpha}'(\omega) = [\alpha'_1(\omega)\ \ \alpha'_2(\omega)]^T \neq \mathbf{0}$$
may be an arbitrary complex-valued vector of length 2, and T′1:2(ω) may be determined as:
$$\mathbf{T}'_{1:2}(\omega) = [\mathbf{t}'_1(\omega)\ \ \mathbf{t}'_2(\omega)]$$
Therefore, the filter that maximizes the above DF can be expressed as:
and the corresponding DF can be determined as:
Based on the above, the following can be obtained:
and the second binaural beamformer can be determined as:
By including two sub-beamforming filters in a binaural beamformer (e.g., one for each binaural channel) and making the filters orthogonal to each other, the IC of the white noise component in the binaural output of the beamformer can be reduced (e.g., minimized). In some embodiments, the IC of the diffuse noise component in the binaural output of the beamformer can also be increased (e.g., maximized). The signal components (e.g., the signal of interest) in the binaural output of the beamformer can be in phase, while the white noise components in the output can have a random phase relationship. In this way, upon receiving the binaural output from the beamformer, the human auditory system can better separate the signal of interest from the white noise and thereby mitigate the effect of white noise amplification.
FIG. 5 is a flow chart illustrating a method 500 that may be performed by an example beamformer (e.g., the beamformer 210 of FIG. 2) including two orthogonal filters. The method 500 may be performed by processing logic that includes hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processing device to perform hardware simulation), or a combination thereof.
For simplicity of explanation, the method is depicted and described as a series of acts. However, acts in accordance with this disclosure can occur in various orders and/or concurrently, and with other acts not presented and described herein. Furthermore, not all illustrated acts may be required to implement the method in accordance with the disclosed subject matter. In addition, the method could alternatively be represented as a series of interrelated states via a state diagram or events. It should also be understood that the methods disclosed in this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methods to computing devices. The term article of manufacture, as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage medium.
Referring to FIG. 5, at 502, the method 500 may be performed by a processing device (e.g., the processing device 206) associated with a microphone array (e.g., the microphone array 102 in FIG. 1, 202 in FIG. 2, or 402 in FIG. 4). At 504, the processing device may receive an audio input signal including a source audio signal (e.g., a signal of interest) and a noise signal (e.g., white noise). At 506, the processing device may apply a first beamformer filter to the audio input signal including the signal of interest and the noise signal to generate a first audio output designated for a first channel receiver. The first audio output may include a first source signal component (e.g., representing the signal of interest) and a first noise component (e.g., representing the white noise), characterized by respective first phases. At 508, the processing device may apply a second beamformer filter to the audio input signal including the signal of interest and the noise signal to generate a second audio output designated for a second channel receiver. The second audio output may include a second source signal component (e.g., representing the signal of interest) and a second noise component (e.g., representing the white noise), characterized by respective second phases.
The first and second beamformer filters may be constructed such that the noise components of the two outputs are uncorrelated (e.g., have a random phase relationship) and the source signal components of the two outputs are correlated (e.g., in phase with each other). At 510, the first and second audio outputs may be provided to respective channel receivers or respective audio channels. For example, the first audio output may be provided to a first channel receiver (e.g., for a left ear), while the second audio output may be provided to a second channel receiver (e.g., for a right ear). The interaural coherence (IC) of the white noise components in the outputs may be minimized (e.g., have a value of approximately zero), while the IC of the signal components in the outputs may be maximized (e.g., have a value of approximately one).
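The flow of method 500 can be sketched for a single frequency bin as follows. This is a toy simulation under stated assumptions, not the patented filter design: the two-microphone array, the broadside steering vector, the trivially orthogonal filters, and all helper names (`apply_filter`, `coherence_modulus`, `cgauss`) are hypothetical.

```python
import math
import random


def apply_filter(h, y):
    """Beamformer output h^H y for one frequency bin."""
    return sum(w.conjugate() * v for w, v in zip(h, y))


def coherence_modulus(a, b):
    """Modulus of the sample coherence between two complex sequences."""
    cross = sum(x * y.conjugate() for x, y in zip(a, b))
    power = sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b)
    return abs(cross) / math.sqrt(power)


rng = random.Random(1)


def cgauss():
    """One zero-mean complex Gaussian sample."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


# Assumed toy setup: 2 microphones, source at broadside (steering vector of ones).
d = [1 + 0j, 1 + 0j]
h1 = [1 + 0j, 0j]   # first beamformer filter (e.g., left ear)
h2 = [0j, 1 + 0j]   # second beamformer filter (e.g., right ear); h1^H h2 = 0

sig_left, sig_right, noise_left, noise_right = [], [], [], []
for _ in range(20000):
    # 504: receive an input made of a source component and white noise.
    x = cgauss()
    source = [dk * x for dk in d]
    noise = [cgauss() for _ in d]   # independent noise at each microphone
    # 506: the first filter yields the components of the first output.
    sig_left.append(apply_filter(h1, source))
    noise_left.append(apply_filter(h1, noise))
    # 508: the second filter yields the components of the second output.
    sig_right.append(apply_filter(h2, source))
    noise_right.append(apply_filter(h2, noise))

# 510: the two outputs that would be provided to the two channel receivers.
left = [s + v for s, v in zip(sig_left, noise_left)]
right = [s + v for s, v in zip(sig_right, noise_right)]

ic_signal = coherence_modulus(sig_left, sig_right)    # close to one
ic_noise = coherence_modulus(noise_left, noise_right)  # close to zero
print(ic_signal, ic_noise)
```

Keeping the source and noise components in separate lists is only a convenience of the simulation, since it allows the IC of each component to be measured separately. A real system would only have access to the mixed outputs `left` and `right`.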
FIG. 6 is a line graph comparing, for the desired signal and for white noise, the simulated output IC of an example binaural beamformer described herein with the simulated output IC of a conventional beamformer. The upper half of the graph shows that the output IC of the desired signal is equal to one for both the binaural and the conventional beamformers, while the lower half of the graph shows that the output IC of the white noise is equal to zero for the binaural beamformer and equal to one for the conventional beamformer. This indicates that, in the two output signals of the binaural beamformer, the signal components (e.g., the desired signal) are substantially correlated, while the white noise components are substantially uncorrelated. The output signals thus correspond to the out-of-phase case discussed herein, in which the desired signal and the white noise are perceived as coming from two separate directions/locations in space.
The binaural beamformers described herein may also have one or more other desirable characteristics. For example, although the beam pattern generated by a binaural beamformer may vary with the number of microphones included in the microphone array associated with the beamformer, the beam pattern may be substantially invariant with respect to frequency (e.g., substantially frequency invariant). Furthermore, when compared with conventional beamformers of the same order (e.g., first order, second order, third order, and fourth order), the binaural beamformer may not only provide better separation between the desired signal and the white noise signal, but may also yield a higher white noise gain (WNG).
FIG. 7 is a block diagram illustrating a machine in the example form of a computer system 700, within which a set or sequence of instructions may be executed to cause the machine to perform any one of the methods discussed herein, according to an example embodiment. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of either a server or a client machine in a server-client network environment, or it may act as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be an in-vehicle system, a wearable device, a personal computer (PC), a tablet PC, a hybrid tablet, a personal digital assistant (PDA), a mobile telephone, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methods discussed herein. Similarly, the term "processor-based system" shall be taken to include any set of one or more machines that are controlled or operated by a processor (e.g., a computer) to individually or jointly execute instructions to perform any one or more of the methods discussed herein.
The example computer system 700 includes at least one processor 702 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both, processor cores, compute nodes, etc.), a main memory 704, and a static memory 706, which communicate with each other via a link 708 (e.g., a bus). The computer system 700 may further include a video display unit 710, an alphanumeric input device 712 (e.g., a keyboard), and a user interface (UI) navigation device 714 (e.g., a mouse). In one embodiment, the video display unit 710, the input device 712, and the UI navigation device 714 are incorporated into a touch screen display. The computer system 700 may additionally include a storage device 716 (e.g., a drive unit), a signal generation device 718 (e.g., a speaker), a network interface device 720, and one or more sensors (not shown), such as a global positioning system (GPS) sensor, a compass, an accelerometer, a gyroscope, a magnetometer, or other sensors.
The storage device 716 includes a machine-readable medium 722 on which are stored one or more sets of data structures and instructions 724 (e.g., software) embodying or utilized by any one or more of the methods or functions described herein. The instructions 724 may also reside, completely or at least partially, within the main memory 704, the static memory 706, and/or the processor 702 during execution thereof by the computer system 700, with the main memory 704, the static memory 706, and the processor 702 also constituting machine-readable media.
While the machine-readable medium 722 is illustrated in an example embodiment as a single medium, the term "machine-readable medium" may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions 724. The term "machine-readable medium" shall also be taken to include any tangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methods of the present disclosure, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such instructions. The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories and optical and magnetic media. Specific examples of machine-readable media include volatile or non-volatile memory, including, by way of example and without limitation, semiconductor memory devices (e.g., electrically programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM)) and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
The instructions 724 may further be transmitted or received over a communication network 726 using a transmission medium via the network interface device 720, utilizing any one of a number of well-known transfer protocols (e.g., HTTP). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, mobile telephone networks, plain old telephone service (POTS) networks, and wireless data networks (e.g., Wi-Fi, 3G, and 4G LTE/LTE-A or WiMAX networks). The term "transmission medium" shall be taken to include any intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine, and includes digital or analog communication signals or other intangible media to facilitate communication of such software.
在前面的描述中,阐述了许多细节。然而,对于受益于本公开的本领域普通技术人员显而易见的是,可以在没有这些具体细节的情况下实践本公开。在一些实例中,以框图的形式而不是详细地示出了众所周知的结构和装置,以避免使本公开不清楚。In the foregoing description, many details are set forth. However, it will be apparent to those of ordinary skill in the art who benefit from the present disclosure that the present disclosure may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form rather than in detail to avoid making the present disclosure unclear.
已经根据计算机存储器内的数据位的操作的算法和符号表示来呈现详细描述的某些部分。这些算法描述和表示是数据处理领域的技术人员用来最有效地将其工作的实质传达给本领域其他技术人员的手段。这里,算法通常被认为是产生期望结果的自洽的步骤序列。这些步骤是需要对物理量进行物理操纵的步骤。通常,尽管不是必须的,这些量采取能够被存储、传输、组合、比较和以其他方式操纵的电或磁信号的形式。主要出于通用的原因,已经证明有时将这些信号称为位、值、元素、符号、字符、项、数字等是方便的。Certain portions of the detailed description have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the art of data processing to most effectively convey the substance of their work to others skilled in the art. Here, an algorithm is generally considered to be a self-consistent sequence of steps that produces a desired result. These steps are steps that require physical manipulation of physical quantities. Typically, although not necessarily, these quantities take the form of electrical or magnetic signals that can be stored, transmitted, combined, compared, and otherwise manipulated. Mainly for general reasons, it has proven convenient at times to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, etc.
然而,应当牢记,所有这些和类似术语均应与适当的物理量相关联,并且仅仅是应用于这些量的方便标签。除非从下面的讨论中另外明确指出,否则应理解,在整个描述中,利用诸如“分段”、“分析”、“确定”、“启用”、“识别”、“修改”等术语的讨论表示计算机系统或类似电子计算装置的动作和过程,其将表示为计算机系统寄存器和存储器中的物理(例如电子)量的数据,操纵和转换为其他表示为计算机系统存储器或其他此类信息存储、传输或显示装置中的物理量的数据。It should be kept in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless otherwise clearly indicated from the following discussion, it should be understood that throughout the description, discussions utilizing terms such as "segment," "analyze," "determine," "enable," "identify," "modify," etc., refer to actions and processes of a computer system or similar electronic computing device that manipulate and transform data represented as physical (e.g., electronic) quantities in the computer system's registers and memories into other data represented as physical quantities in the computer system's memories or other such information storage, transmission, or display devices.
词语“示例”或“示例性”在本文中用来表示充当示例、实例或说明。本文中被描述为“示例”或“示例性”的任何方面或设计不一定被解释为比其他方面或设计优选或有利。相反,词语“示例”或“示例性”的使用旨在以具体方式呈现概念。如在本申请中使用的,术语“或”旨在表示包含性的“或”而不是排他性的“或”。也就是说,除非另有说明或从上下文可以清楚地看出,否则“X包括A或B”旨在表示任何自然的包含性排列。也就是说,如果X包括A;X包括B;或X包括A和B,则在任何上述情况下均满足“X包括A或B”。另外,在本申请和所附权利要求中使用的冠词“一”和“一个”通常应被解释为意指“一个或多个”,除非另有说明或从上下文清楚地指向单数形式。此外,除非如此描述,否则贯穿全文使用术语“实施例”或“一个实施例”或“实施方式”或“一个实施方式”并不旨在表示相同的实施例或实施方式。The words "example" or "exemplary" are used herein to mean serving as an example, instance or illustration. Any aspect or design described herein as "example" or "exemplary" is not necessarily to be interpreted as being preferred or advantageous over other aspects or designs. On the contrary, the use of the words "example" or "exemplary" is intended to present concepts in a specific way. As used in this application, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless otherwise specified or clearly seen from the context, "X includes A or B" is intended to mean any natural inclusive arrangement. That is, if X includes A; X includes B; or X includes A and B, then "X includes A or B" is satisfied in any of the above cases. In addition, the articles "one" and "an" used in this application and the appended claims should generally be interpreted as meaning "one or more", unless otherwise specified or clearly pointed to the singular form from the context. In addition, unless so described, the use of the terms "embodiment" or "one embodiment" or "implementation" or "one implementation" throughout the text is not intended to represent the same embodiment or implementation.
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. In addition, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or".
It is to be understood that the above description is intended to be illustrative and not restrictive. Many other embodiments will be apparent to those skilled in the art upon reading and understanding the above description. The scope of the disclosure should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Claims (22)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2020/094296 WO2021243634A1 (en) | 2020-06-04 | 2020-06-04 | Binaural beamforming microphone array |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114073106A (en) | 2022-02-18 |
| CN114073106B (en) | 2023-08-04 |
Family
ID=78831552
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202080005496.5A (Active), CN114073106B (en) | 2020-06-04 | 2020-06-04 | Binaural beamforming microphone array |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US11546691B2 (en) |
| CN (1) | CN114073106B (en) |
| WO (1) | WO2021243634A1 (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102789155B1 (en) * | 2019-03-10 | 2025-04-01 | Cardome Technology Ltd. | Speech enhancement using clustering of cues |
| EP4147229A4 (en) * | 2020-05-08 | 2024-07-17 | Microsoft Technology Licensing, LLC | System and method for data augmentation for multi-microphone signal processing |
| CN115396800A (en) * | 2022-05-16 | 2022-11-25 | Xi'an Hepu Acoustics Technology Co., Ltd. (西安合谱声学科技有限公司) | Directional hearing aid method, directional hearing aid system, hearing aid and storage medium |
| CN118982988A (en) * | 2024-05-06 | 2024-11-19 | Southwestern University of Finance and Economics | Single-input dual-output speech separation method with reverse presentation |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8842861B2 (en) * | 2010-07-15 | 2014-09-23 | Widex A/S | Method of signal processing in a hearing aid system and a hearing aid system |
| CN109997375A (en) * | 2016-11-09 | 2019-07-09 | Northwestern Polytechnical University | Concentric circular differential microphone arrays and associated beamforming |
| US10567898B1 (en) * | 2019-03-29 | 2020-02-18 | Snap Inc. | Head-wearable apparatus to generate binaural audio |
Family Cites Families (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2005050618A2 (en) * | 2003-11-24 | 2005-06-02 | Koninklijke Philips Electronics N.V. | Adaptive beamformer with robustness against uncorrelated noise |
| WO2009151578A2 (en) * | 2008-06-09 | 2009-12-17 | The Board Of Trustees Of The University Of Illinois | Method and apparatus for blind signal recovery in noisy, reverberant environments |
| EP2360943B1 (en) | 2009-12-29 | 2013-04-17 | GN Resound A/S | Beamforming in hearing aids |
| US20120057717A1 (en) | 2010-09-02 | 2012-03-08 | Sony Ericsson Mobile Communications Ab | Noise Suppression for Sending Voice with Binaural Microphones |
| US9078057B2 (en) * | 2012-11-01 | 2015-07-07 | Csr Technology Inc. | Adaptive microphone beamforming |
| CN105075294B (en) * | 2013-04-30 | 2018-03-09 | Huawei Technologies Co., Ltd. | Audio signal processing device |
| US9980075B1 (en) * | 2016-11-18 | 2018-05-22 | Stages Llc | Audio source spatialization relative to orientation sensor and output |
| EP3753263B1 (en) * | 2018-03-14 | 2022-08-24 | Huawei Technologies Co., Ltd. | Audio encoding device and method |
| US10425745B1 (en) * | 2018-05-17 | 2019-09-24 | Starkey Laboratories, Inc. | Adaptive binaural beamforming with preservation of spatial cues in hearing assistance devices |
| WO2020014812A1 (en) * | 2018-07-16 | 2020-01-23 | Northwestern Polytechnical University | Flexible geographically-distributed differential microphone array and associated beamformer |
| US11276397B2 (en) * | 2019-03-01 | 2022-03-15 | DSP Concepts, Inc. | Narrowband direction of arrival for full band beamformer |
| US11276307B2 (en) * | 2019-09-24 | 2022-03-15 | International Business Machines Corporation | Optimized vehicle parking |
| US11330366B2 (en) * | 2020-04-22 | 2022-05-10 | Oticon A/S | Portable device comprising a directional system |
| US11425497B2 (en) * | 2020-12-18 | 2022-08-23 | Qualcomm Incorporated | Spatial audio zoom |
2020
- 2020-06-04: CN application CN202080005496.5A, patent CN114073106B (status: Active)
- 2020-06-04: US application US17/273,237, patent US11546691B2 (status: Active)
- 2020-06-04: WO application PCT/CN2020/094296, publication WO2021243634A1 (status: Ceased)
Non-Patent Citations (1)
| Title |
|---|
| A Simple Theory and New Method of Differential Beamforming With Uniform Linear Microphone Arrays; HUANG Gongping et al.; IEEE/ACM Transactions on Audio, Speech, and Language Processing; vol. 28; pp. 1079-1093 * |
Also Published As
| Publication number | Publication date |
|---|---|
| US11546691B2 (en) | 2023-01-03 |
| WO2021243634A1 (en) | 2021-12-09 |
| CN114073106A (en) | 2022-02-18 |
| US20220248135A1 (en) | 2022-08-04 |
Similar Documents
| Publication | Title |
|---|---|
| CN114073106B (en) | Binaural beamforming microphone array |
| EP3413589B1 (en) | A microphone system and a hearing device comprising a microphone system |
| CN102771144B (en) | Apparatus and method for direction dependent spatial noise reduction |
| CN104781880B (en) | Apparatus and method for providing an informed multichannel speech presence probability estimation |
| JP6074263B2 (en) | Noise suppression device and control method thereof |
| KR101555416B1 (en) | Apparatus and method for spatially selective sound acquisition by acoustic triangulation |
| EP3373602A1 (en) | A method of localizing a sound source, a hearing device, and a hearing system |
| CN102324237B (en) | Microphone-array speech-beam forming method as well as speech-signal processing device and system |
| WO2015035785A1 (en) | Voice signal processing method and device |
| CN110827846B (en) | Speech noise reduction method and device adopting weighted superposition synthesis beam |
| Buchris et al. | Frequency-domain design of asymmetric circular differential microphone arrays |
| CN111681665A (en) | Omnidirectional noise reduction method, equipment and storage medium |
| Wang et al. | On robust and high directive beamforming with small-spacing microphone arrays for scattered sources |
| CN115457971A (en) | A noise reduction method, electronic equipment and storage medium |
| Luo et al. | Design of fully steerable differential beamformers with linear superarrays |
| Wang et al. | Target speech extraction in cocktail party by combining beamforming and blind source separation |
| CN113491137B (en) | Flexible differential microphone array with fractional order |
| Jin et al. | On the design of robust linear differential microphone arrays against low-rank noise |
| Atkins et al. | Robust superdirective beamformer with optimal regularization |
| Farmani et al. | Sound source localization for hearing aid applications using wireless microphones |
| Bai et al. | Kalman filter-based microphone array signal processing using the equivalent source model |
| WO2022170541A1 (en) | First-order differential microphone array with steerable beamformer |
| US20250287160A1 (en) | Ear-worn device with neural network-based noise modification and/or spatial focusing |
| WO2024108515A1 (en) | Concentric circular microphone arrays with 3d steerable beamformers |
| Adebisi et al. | Acoustic signal gain enhancement and speech recognition improvement in smartphones using the REF beamforming algorithm |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |