
CN116965060A - Audio system using optical microphone - Google Patents


Info

Publication number
CN116965060A
Authority
CN
China
Prior art keywords
sound, user, light source, audio, skin
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180094032.0A
Other languages
Chinese (zh)
Inventor
安东尼奥·约翰·米勒
莉莉安娜·鲁伊斯·迪亚兹
安德鲁·约翰·欧德科克
迈克·安德烈·谢勒
莫尔塔扎·哈莱吉梅巴迪
罗宾·夏尔马
魏国华
穆罕默德·塔雷克·艾哈迈德·埃尔-哈达德
吉泽姆·塔巴克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meta Platforms Technologies LLC
Original Assignee
Meta Platforms Technologies LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/525,155 (published as US20220201403A1)
Application filed by Meta Platforms Technologies LLC
Priority claimed from PCT/US2021/063805 (published as WO2022133086A1)
Publication of CN116965060A
Legal status: Pending


Landscapes

  • Electrostatic, Electromagnetic, Magnetostrictive, and Variable-Resistance Transducers (AREA)

Abstract

An audio system includes an optical microphone and an audio controller. The optical microphone includes a light source and a detector. In some embodiments, the light source illuminates the user's skin. Alternatively, the optical microphone further includes a membrane, and the light source illuminates a portion of the membrane. Sound from a local area causes vibrations in the skin (or in the membrane). The detector may be in an interference configuration or a non-interference configuration with the light source. The audio controller uses the signal output from the detector to monitor the vibrations of the skin (or membrane), and uses the monitored vibrations to measure the sound.

Description

Audio system using an optical microphone

Cross-Reference to Related Applications

This application claims the benefit of U.S. Provisional Application No. 63/126,669, filed December 17, 2020, which is incorporated by reference in its entirety.

Technical Field

The present disclosure relates generally to audio systems, and more specifically to audio systems that use optical microphones.

Background

In noisy environments (e.g., a loud restaurant), conventional audio systems may have difficulty selectively capturing sound from a target source (e.g., a speaker, the user's own voice, etc.). Whether the user is speaking affects the selective capture of sound, but in noisy environments audio systems often cannot distinguish the user's speech from ambient noise. Conventional audio systems attempt to mitigate this problem using voice activity detectors, which rely on the temporal and spectral characteristics of the wearer's voice being audible among the interfering sounds (e.g., as detected via a conventional microphone). In low acoustic signal-to-noise ratio (SNR) environments (i.e., noisy environments), however, this approach often fails because the wearer's voice is completely masked by the noise.

Summary

Accordingly, the present invention discloses audio systems, methods, computer-readable media, and computer programs in accordance with the appended claims. The audio system uses an optical microphone. The audio system includes an audio controller and may also include a microphone array. In some embodiments, the audio system may be part of a headset, a necklace, a watch, a hearable device, etc. The optical microphone includes a light source and a detector. The light source is configured to emit light. The light includes a reference beam and a sensing beam, and the light source is configured to illuminate the user's skin with the sensing beam. Sound from the local area (e.g., the user's voice, other people, etc.) causes vibrations in the user's skin. The detector is in an interference configuration with the light source (e.g., self-mixing, low coherence interferometry (LCI), etc.), such that the detector is configured to detect a mixed signal. The mixed signal corresponds to the reference beam mixed with the portion of the sensing beam reflected by the skin. The audio controller is configured to use the signal to measure sound in the local area.

In an embodiment of the audio system according to the present invention, the audio system includes: an optical microphone including a light source and a detector, the light source configured to emit light including a reference beam and a sensing beam, and to illuminate the user's skin with the sensing beam, wherein sound from a local area causes vibrations in the skin, the detector being in an interference configuration with the light source such that the detector is configured to detect a mixed signal corresponding to the reference beam mixed with the portion of the sensing beam reflected by the skin; and an audio controller configured to use the mixed signal to measure the sound.

In embodiments of the audio system according to the present invention, the interference configuration may be such that the light source and the detector form at least one of the following: a self-mixing interferometer, a Michelson interferometer, a low-coherence interference system, a laser Doppler vibrometer (LDV), or some other type of interferometric system.

In embodiments of the audio system according to the present invention, the light source may be adjacent to the detector, and the optical microphone may further include a lens coupled to the light source. The lens may be configured to: split the light emitted from the light source into the reference beam and the sensing beam; direct the sensing beam to the skin; and reflect the reference beam to the detector.

In embodiments of the audio system according to the present invention, the system may further include: a second optical microphone including a second light source and a second detector, the second light source configured to emit light including a second reference beam and a second sensing beam and to illuminate the user's skin with the second sensing beam, the second detector being in an interference configuration with the second light source such that the second detector is configured to detect a second mixed signal corresponding to the second reference beam mixed with the portion of the second sensing beam reflected by the skin; a block including a first side coupled to the optical microphone and a second side coupled to the second optical microphone; and a lens coupled to the light source, the block, and the second light source, the lens configured to split the light emitted from the light source into the reference beam and the sensing beam, split the light emitted from the second light source into the second reference beam and the second sensing beam, direct the sensing beam and the second sensing beam to the skin, reflect the reference beam to the detector, and reflect the second reference beam to the second detector.

In embodiments of the audio system according to the present invention, the detector and the light source may be separated from each other by a threshold distance.

In embodiments of the audio system according to the present invention, the optical microphone may be located on a headset that includes a nose pad, the optical microphone may be integrated into the nose pad, and the light source may be configured to illuminate the skin of the user's nose with the sensing beam.

In embodiments of the audio system according to the present invention, the optical microphone may be located on a headset that includes an eyeglass frame, the optical microphone may be integrated into the eyeglass frame, and the light source may be configured to illuminate skin on the user's face with the sensing beam. In addition, the audio system may further include a second optical microphone integrated into the eyeglass frame at a different location than the first optical microphone, the second optical microphone being configured to illuminate, with a second sensing beam, a different portion of the skin on the user's face than the optical microphone.

In embodiments of the audio system according to the present invention, the optical microphone may be located on a headset, and the audio system may further include a microphone array located on the headset and configured to detect sound from the local area; the audio controller is further configured to use the detected sound to calibrate the optical microphone.

In embodiments of the audio system according to the present invention, the optical microphone may be located on a headset, and the audio system may further include a microphone array located on the headset and configured to detect sound from the local area; the audio controller may also be configured to enhance the measured sound based in part on the detected sound.

In embodiments of the audio system according to the present invention, the audio controller may be further configured to determine an expression of the user's face based in part on the measured sound.

In embodiments of the audio system according to the present invention, the optical microphone may be located on a headset, and the audio controller may be further configured to: identify noise in the measured sound; generate a sound filter to suppress the identified noise; and apply the sound filter to modify an audio signal corresponding to audio content. The audio system may further include a transducer array integrated into the headset and configured to present the modified audio signal to the user as modified audio content, the modified audio content including the audio content and a suppression component that suppresses the noise.
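The "sound filter" step above can be sketched as a simple per-frame gain: estimate the noise power in the measured sound, then attenuate frames whose energy is close to that noise floor. This is a minimal illustration only, not the patent's filter design; the Wiener-style gain rule, the function names, and the default parameters are all assumptions.

```python
def noise_gain(frame, noise_power, floor=0.1):
    """Wiener-style gain for one frame: frames near the noise floor are
    attenuated toward `floor`, strong-signal frames pass nearly unchanged."""
    power = sum(x * x for x in frame) / len(frame)
    if power <= 0.0:
        return floor
    return max(floor, 1.0 - noise_power / power)

def suppress_noise(frames, noise_power, floor=0.1):
    """Apply the per-frame gain to a list of frames (the 'sound filter')."""
    return [[noise_gain(f, noise_power, floor) * x for x in f] for f in frames]
```

A production system would typically work per frequency band (e.g., via an STFT) rather than on broadband frame energy, but the gain rule is the same idea.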

In embodiments of the audio system according to the present invention, the optical microphone may be located on a headset, and the audio system may further include a microphone array located on the headset and configured to detect sound from the local area, which may include the voice of a user of the audio system. The audio controller may be further configured to: use the measured sound to identify the user's voice within the detected sound; and update a sound filter based on the identified voice, wherein the updated sound filter may be used to modify audio content, and the modified audio content may be presented by at least one audio system. In addition, the updated sound filter may enhance the user's voice, and the audio controller may be further configured to: modify the audio content with the updated filter, wherein the modified audio content enhances the user's voice; and provide the modified audio content to a second audio system, which presents the modified audio content.

In embodiments of the audio system according to the present invention, the optical microphone may be located on a headset, and the audio system may further include a microphone array located on the headset and configured to detect sound from the local area, which may include the voice of a user of the audio system. The audio controller may be further configured to: use the measured sound to identify the user's voice within the detected sound; and update a sound filter based on the identified voice, wherein the updated sound filter may be used to modify audio content, and the modified audio content may be presented by at least one audio system. In addition, the updated sound filter may enhance the user's voice, and the audio controller may be further configured to: modify the audio content with the updated filter, wherein the modified audio content may enhance the user's voice; determine that the modified audio content includes a command; and perform an action in accordance with the command.

In some embodiments, a method for using an optical microphone that is part of an audio system is described. Light is emitted from a light source of the optical microphone. The emitted light includes a reference beam and a sensing beam. Sound from the local area (e.g., the user's voice, other people, etc.) causes vibrations in the user's skin. The user's skin (e.g., a portion of the face) is illuminated with the sensing beam. A detector in an interference configuration with the light source detects a mixed signal. The mixed signal corresponds to the reference beam mixed with the portion of the sensing beam reflected by the skin. The mixed signal is used to measure the sound.
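As a rough illustration of the last two steps (detect the mixed signal, then measure the sound from it), a toy model and its inversion are sketched below. The cosine intensity model, the quadrature bias of one eighth of a wavelength, and all names are assumptions for illustration; they are not the patent's implementation, and the inversion is only valid within a single interference fringe.

```python
import math

WAVELENGTH_NM = 850.0  # assumed source wavelength

def mixed_signal(displacement_nm, bias_nm=WAVELENGTH_NM / 8, modulation=0.5):
    """Toy interferometric intensity for a given skin displacement.
    The lambda/8 bias places the operating point at quadrature."""
    phase = 4 * math.pi * (bias_nm + displacement_nm) / WAVELENGTH_NM
    return 1.0 + modulation * math.cos(phase)

def recover_displacement(intensity, bias_nm=WAVELENGTH_NM / 8, modulation=0.5):
    """Invert the intensity model back to displacement.
    Valid only in the single-fringe regime (|d| < lambda/8)."""
    phase = math.acos((intensity - 1.0) / modulation)
    return phase * WAVELENGTH_NM / (4 * math.pi) - bias_nm
```

Sampling `recover_displacement` over time yields the skin's vibration waveform, which is the measured sound.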

In an embodiment of the method according to the present invention, the method includes: emitting light from a light source of an optical microphone, the light including a reference beam and a sensing beam; illuminating the user's skin with the sensing beam, wherein sound from a local area causes vibrations in the skin; detecting, by a detector in an interference configuration with the light source, a mixed signal corresponding to the reference beam mixed with the portion of the sensing beam reflected by the skin; and measuring the sound using the mixed signal.

In an embodiment of the method according to the present invention, the vibrations of the skin may be caused in part by the user's voice, and the method may further include: detecting sound from the local area with a microphone array; using the measured sound to identify the user's voice within the detected sound; and updating a sound filter based on the identified voice, wherein the updated sound filter may be used to modify audio content, and the modified audio content may be presented by at least one audio system.

In an embodiment of the method according to the present invention, the interference configuration may be such that the light source and the detector form at least one of the following: a self-mixing interferometer, a Michelson interferometer, a low-coherence interference system, a laser Doppler vibrometer, or some other type of interferometric system.

In an embodiment of the method according to the present invention, the measured sound may include the user's voice, high-frequency components of which may be attenuated relative to its low-frequency components, and the method may further include: reconstructing the high-frequency components of the voice; and updating the measured sound of the voice with the reconstructed high-frequency components.
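One crude way to reconstruct attenuated high frequencies is bandwidth extension by nonlinear harmonic regeneration: rectify the surviving low band to create harmonics, keep mostly the high band with a first-difference filter, and mix the scaled result back in. This is a hypothetical sketch, not the patent's algorithm; the gain and filter choices are assumptions.

```python
def reconstruct_high_band(samples, gain=0.3):
    """Bandwidth-extension sketch: a rectifier regenerates harmonics of the
    surviving low band; a first difference emphasizes the high band; the
    scaled result is added back to the attenuated voice signal."""
    harmonics = [abs(x) for x in samples]
    high = [0.0] + [harmonics[i] - harmonics[i - 1] for i in range(1, len(samples))]
    return [x + gain * h for x, h in zip(samples, high)]
```

More capable systems learn the low-band-to-high-band mapping from data, but the signal flow is the same: synthesize high-frequency content correlated with the low band, then add it to the measured voice.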

In some embodiments, a non-transitory computer-readable medium configured to store program code instructions is described. The instructions, when executed by a processor of an audio system, cause the audio system to perform the steps of the methods described above and/or other methods described herein.

In an embodiment of the computer-readable medium according to the present invention, the non-transitory computer-readable medium is configured to store program code instructions that, when executed by a processor of an audio system, cause the audio system to perform steps including: emitting light from a light source of an optical microphone, the light including a reference beam and a sensing beam; illuminating the user's skin with the sensing beam, wherein sound from a local area causes vibrations in the skin; detecting, by a detector in an interference configuration with the light source, a mixed signal corresponding to the reference beam mixed with the portion of the sensing beam reflected by the skin; and measuring the sound using the mixed signal.

In some embodiments, a computer program is described. The computer program includes instructions that, when executed by a processor of an audio system, cause the audio system to perform the steps of the methods described above and/or other methods described herein.

In some embodiments, the detector and the light source are not in an interference configuration, and the optical microphone measures the local sound based on intensity modulation of the light reflected and/or scattered from the skin (non-interference).

Note that in some embodiments, the optical microphone also includes a membrane, and the light source illuminates a portion of the membrane and/or diaphragm with the sensing beam (or, more generally, with light from the light source) instead of illuminating the skin. In these embodiments, sound from the local area causes the membrane to vibrate. The optical microphone therefore measures sound in the local area by monitoring the vibrations of the membrane caused by that sound. In these embodiments, the light source and the detector may be in an interference configuration or a non-interference configuration.

Brief Description of the Drawings

Figure 1A is a perspective view of a headset implemented as an eyewear device that includes at least one optical microphone, in accordance with one or more embodiments.

Figure 1B is a perspective view of a headset implemented as a head-mounted display (HMD) that includes at least one optical microphone, in accordance with one or more embodiments.

Figure 2 is a block diagram of an audio system, in accordance with one or more embodiments.

Figure 3 is an example nose pad that includes an optical microphone having a light source and a detector located at different positions within the nose pad, in accordance with one or more embodiments.

Figure 4 is an example optical microphone configured as a self-mixing interferometer, in accordance with one or more embodiments.

Figure 5A is an optical microphone configured as a self-mixing interferometer with its components in a series configuration, in accordance with one or more embodiments.

Figure 5B is an optical microphone configured as a self-mixing interferometer with its components in a parallel configuration, in accordance with one or more embodiments.

Figure 5C is an example of a paired optical microphone that includes two self-mixing interferometers, in accordance with one or more embodiments.

Figure 6 is an example of an optical microphone configured as a laser Doppler vibrometer, in accordance with one or more embodiments.

Figure 7 is an example of an optical microphone configured to use optical coherence tomography (OCT), in accordance with one or more embodiments.

Figure 8 is a flowchart illustrating a process for using an optical contact transducer in an interference configuration, in accordance with one or more embodiments.

Figure 9 is a system that includes a headset, in accordance with one or more embodiments.

The drawings depict various embodiments for purposes of illustration only. Those skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

Detailed Description

An audio system includes one or more optical microphones and an audio controller. In some embodiments, the audio system is part of a headset, and the one or more optical microphones are positioned to monitor vibrations of the headset user's skin caused by sound in the headset's local area (e.g., the user's voice, other people, noise sources, etc.). The one or more optical microphones may be positioned at one or more locations on the headset (e.g., a nose pad, an eyeglass frame, etc.). The audio system may use the monitored vibrations to measure sound in the local area and perform various actions based on the measured sound (e.g., active noise cancellation, user voice enhancement, voice activity detection (VAD), etc.). Note that although the above describes the one or more optical microphones as located on a headset, in other embodiments the one or more optical microphones and/or the audio system may be located on other devices (e.g., a necklace, a smart watch, etc.).

The optical microphone monitors vibrations of the user's skin. The optical microphone includes a light source and a detector. The light source (e.g., a vertical cavity surface emitting laser (VCSEL)) is configured to emit light. The emitted light is in a band that the skin primarily reflects (as opposed to, e.g., primarily absorbs), between 250 and 1800 nanometers (nm). In some embodiments, the emitted light is a continuous wave and includes a reference beam and a sensing beam. Sound from the audio system's local area causes vibrations in the user's skin. The light source is configured to illuminate the user's skin (e.g., a portion of the face) with the sensing beam.

The detector is in an interference configuration with the light source, such that the detector is configured to detect a mixed signal. In an interference configuration, the optical microphone becomes an interferometer-based system in which constructive or destructive interference between the light from the laser and the reflected light provides a signal that changes as the distance changes. For example, a wavelength of 850 nm can provide precision down to the 10 nm range or below. The interference configuration is such that the light source and the detector form an interferometric system (e.g., a self-mixing interferometer, a Michelson interferometer, low coherence interferometry (LCI), a laser Doppler vibrometer (LDV), etc.). The mixed signal corresponds to the reference beam mixed with the portion of the sensing beam reflected by the skin.
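The displacement scale in the passage above follows from the round-trip interferometric phase, d = φ·λ/(4π): one full fringe (2π of phase) corresponds to half a wavelength of skin travel, so sub-fringe phase resolution at 850 nm reaches the quoted ~10 nm regime. A small sketch of the arithmetic (the function names are illustrative):

```python
import math

def displacement_per_fringe(wavelength_nm):
    """One full interference fringe (2*pi of phase) corresponds to half a
    wavelength of travel, because of the round trip to the skin and back."""
    return wavelength_nm / 2.0

def phase_to_displacement(phase_rad, wavelength_nm=850.0):
    """Round-trip geometry: d = phase * wavelength / (4*pi)."""
    return phase_rad * wavelength_nm / (4.0 * math.pi)
```

At 850 nm, a 10 nm skin displacement corresponds to a phase swing of roughly 0.15 rad, comfortably within what a quadrature or fringe-counting detector can resolve.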

The audio controller processes the information from the detector. The audio controller is configured to measure the sound using the mixed signal. The audio controller analyzes the mixed signal to measure some or all of the sound that causes vibrations in the user's skin. The sound may include, for example, the user's voice and/or other sound sources within the local area (e.g., other people, noise sources such as fans, etc.).

In some embodiments, the audio system may also include a microphone array. The microphone array is configured to detect sound from the local area. The sound from the local area may include, for example, the voice of a user of the audio system, sound from other sound sources in the local area, or some combination thereof.

The audio system may perform various actions based in part on the sound measured by the one or more optical microphones, the sound detected by the microphone array, or some combination thereof. These actions may include, for example: enhancing the user's voice, using the one or more optical microphones for voice activity detection (VAD), performing active noise cancellation, capturing information used to identify the user's micro-expressions, monitoring the positioning of the headset, etc. Note that in some embodiments, some or all of the optical microphones may be coupled to a vibration-attenuating structure that is coupled to the headset. The vibration-attenuating structure reduces vibrations transmitted from the headset (or, more generally, the device to which the optical microphone is coupled) to the optical microphone.

Note that in some embodiments, the optics include a membrane, and sound from the local area is monitored by monitoring vibrations of the membrane rather than vibrations of the user's skin. Sound from the local area causes the membrane to vibrate. In these embodiments, the light source is configured to illuminate a portion of the membrane, and the membrane scatters and/or reflects a portion of the light. The light source and the detector may be in an interference or non-interference configuration. The detector detects the scattered and/or reflected light. The audio controller uses the signal from the detector to measure sound from the local area.

Conventional VAD does not work well in low acoustic SNR environments (e.g., a noisy, crowded restaurant). Such systems use microphones to detect sound from the local area and then attempt to separate the user's voice from the low-SNR environment. In low-SNR environments, however, this approach often fails because the wearer's voice is completely masked by other sounds (e.g., other people talking in a crowded restaurant). In contrast, the audio system described herein uses one or more contact optical microphones to monitor vibrations on the user's skin and uses those vibrations to measure sound. The noise in the detected signal is much lower than with conventional signals, which allows reliable identification of when the user is speaking. Moreover, because only relative distance changes are observed in an interferometric system, embodiments in which the detector is in an interference configuration with the light source require no calibration of absolute distance and no alignment when the distance changes (e.g., when the glasses move). Furthermore, conventional VAD devices (e.g., bone conduction microphones with a vibrating diaphragm) have a resonant frequency, and using them near or above that frequency can be difficult and inaccurate. In contrast, an optical microphone has no moving or vibrating elements, so these limitations of conventional VAD are not a problem for an optical contact microphone.
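The contrast drawn above (a low noise floor on the contact signal versus an acoustically masked microphone signal) is what lets even a trivial energy-threshold detector act as a VAD on the optical channel. A minimal sketch, with the framing and threshold assumed for illustration:

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of the contact-microphone signal."""
    return sum(x * x for x in frame) / len(frame)

def detect_voice_activity(frames, threshold=0.01):
    """Flag frames whose energy clears the (assumed) threshold as speech.
    On the skin-vibration signal the floor below the threshold is set by
    sensor noise rather than by ambient acoustic noise, so a fixed
    threshold can separate speech from silence even in a loud room."""
    return [frame_energy(f) > threshold for f in frames]
```

A conventional acoustic VAD must instead model the time-varying ambient noise, which is exactly what fails at low SNR.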

本发明的实施例可以包括人工现实系统或可以结合人工现实系统来实现。人工现实是在呈现给用户之前已经以某种方式进行了调整的现实形式,人工现实可以包括例如虚拟现实(virtual reality,VR)、增强现实(augmented reality,AR)、混合现实(mixed reality,MR)、混杂现实(hybrid reality)或它们的某种组合和/或衍生物。人工现实内容可以包括完全生成的内容或与捕获的(例如,真实世界的)内容相结合的生成内容。人工现实内容可以包括视频、音频、触觉反馈或它们的某种组合,其中的任何一种都可以以单通道呈现或以多通道呈现(例如向观看者产生三维效果的立体视频)。此外,在一些实施例中,人工现实还可以与应用程序、产品、附件、服务或它们的某种组合相关联,这些应用程序、产品、附件、服务或它们的某种组合用于在人工现实中创建内容和/或以其它方式在人工现实中使用。Embodiments of the present invention may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, and may include, for example, virtual reality (VR), augmented reality (AR), mixed reality (MR), hybrid reality, or some combination and/or derivative thereof. Artificial reality content may include fully generated content or generated content combined with captured (e.g., real-world) content. Artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereoscopic video that produces a three-dimensional effect for the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to create content in an artificial reality and/or are otherwise used in an artificial reality.
Artificial reality systems that provide artificial reality content may be implemented on a variety of platforms, including a wearable device (e.g., a headset) connected to a host computer system, a standalone wearable device (e.g., a headset), a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.

图1A为根据一个或多个实施例的被实现为眼镜设备的头戴式视图器100的立体图。在一些实施例中,眼镜设备是近眼显示器(near eye display,NED)。通常,头戴式视图器100可以被穿戴在用户的面部上,使得内容(例如,媒体内容)是使用显示组件和/或音频系统来呈现的。然而,头戴式视图器100还可以以如下这种方式使用:使得以不同的方式向用户呈现媒体内容。由头戴式视图器100呈现的媒体内容的示例包括一个或多个图像、视频、音频或它们的某种组合。头戴式视图器100包括眼镜框,并且除了其它部件可以包括显示组件(该显示组件包括一个或多个显示元件120)、深度摄像头组件(depth camera assembly,DCA)、音频系统以及定位传感器190。尽管图1A示出了在头戴式视图器100上的示例位置中的头戴式视图器100的部件,但是这些部件可以位于头戴式视图器100上的其它位置,位于与头戴式视图器100配对的外围设备上,或位于它们的某种组合。类似地,在头戴式视图器100上可以存在比图1A中示出的部件更多或更少的部件。FIG. 1A is a perspective view of a headset 100 implemented as an eyewear device, in accordance with one or more embodiments. In some embodiments, the eyewear device is a near eye display (NED). In general, the headset 100 may be worn on the face of a user such that content (e.g., media content) is presented using the display assembly and/or the audio system. However, the headset 100 may also be used such that media content is presented to the user in a different manner. Examples of media content presented by the headset 100 include one or more images, video, audio, or some combination thereof. The headset 100 includes an eyeglass frame, and may include, among other components, a display assembly including one or more display elements 120, a depth camera assembly (DCA), an audio system, and a positioning sensor 190. Although FIG. 1A shows the components of the headset 100 in example locations on the headset 100, the components may be located elsewhere on the headset 100, on a peripheral device paired with the headset 100, or some combination thereof. Similarly, there may be more or fewer components on the headset 100 than shown in FIG. 1A.

眼镜框110保持头戴式视图器100的其它部件。眼镜框110包括:保持一个或多个显示元件120的前部,以及附接到用户的头部的端件(例如,镜脚)。眼镜框110的前部跨过用户的鼻子的顶部。端件的长度可以是可调整的(例如,可调整的镜脚长度)以适合不同的用户。端件还可以包括在用户的耳朵后面弯曲的部分(例如,脚套、耳承)。Eyeglass frame 110 holds other components of headset 100 . Eyeglass frame 110 includes a front portion that holds one or more display elements 120, and end pieces (eg, temples) that attach to the user's head. The front of the eyeglass frame 110 spans the top of the user's nose. The length of the end piece may be adjustable (eg, adjustable temple length) to suit different users. The end piece may also include a portion that curves behind the user's ear (eg, foot cuff, earpiece).

一个或多个显示元件120向穿戴头戴式视图器100的用户提供光。如所示出的,对于用户的每只眼睛,头戴式视图器包括一显示元件120。在一些实施例中,显示元件120生成图像光,该图像光被提供到头戴式视图器100的适眼区(eyebox)。适眼区是用户在穿戴头戴式视图器100时眼睛占据的空间中的位置。例如,显示元件120可以是波导显示器。波导显示器包括光源(例如,二维源、一个或多个线源、一个或多个点源等)和一个或多个波导。来自光源的光被内耦合到一个或多个波导中,该一个或多个波导以如下这种方式输出光:使得在头戴式视图器100的适眼区中存在光瞳复制。来自一个或多个波导的光的内耦合和/或外耦合可以使用一个或多个衍射光栅来完成。在一些实施例中,波导显示器包括扫描元件(例如,波导、反射镜等),该扫描元件在来自光源的光被内耦合到一个或多个波导中时对该光进行扫描。注意,在一些实施例中,显示元件120中的一个或两个是不透明的,并且不透射来自头戴式视图器100周围的局部区域的光。该局部区域是头戴式视图器100周围的区域。例如,该局部区域可以是穿戴头戴式视图器100的用户处于其内部的房间,或穿戴头戴式视图器100的用户可能在外部并且该局部区域是外部区域。在这种背景下,头戴式视图器100生成VR内容。替代地,在一些实施例中,显示元件120中的一个或两个是至少部分透明的,使得来自局部区域的光可以与来自一个或多个显示元件的光组合,以产生AR内容和/或MR内容。One or more display elements 120 provide light to a user wearing headset 100 . As shown, the headset includes a display element 120 for each eye of the user. In some embodiments, display element 120 generates image light that is provided to an eyebox of headset 100 . The eye zone is the position in space occupied by the user's eyes when wearing the headset 100 . For example, display element 120 may be a waveguide display. A waveguide display includes a light source (eg, a two-dimensional source, one or more line sources, one or more point sources, etc.) and one or more waveguides. Light from the light source is incoupled into one or more waveguides that output the light in such a manner that there is pupil replication in the eye zone of the headset 100 . In-coupling and/or out-coupling of light from one or more waveguides may be accomplished using one or more diffraction gratings. In some embodiments, a waveguide display includes a scanning element (eg, waveguide, mirror, etc.) that scans light from a light source as the light is incoupled into one or more waveguides. Note that in some embodiments, one or both of the display elements 120 are opaque and do not transmit light from a localized area around the headset 100 . The local area is the area around the headset 100 . 
For example, the local area may be a room inside which the user wearing headset 100 is, or the user wearing headset 100 may be outside and the local area is an external area. In this context, the headset 100 generates VR content. Alternatively, in some embodiments, one or both of the display elements 120 are at least partially transparent such that light from the localized area can be combined with light from one or more display elements to produce AR content and/or MR content.

在一些实施例中,显示元件120不生成图像光,而是透镜将来自局部区域的光传输到适眼区。例如,显示元件120中的一个或两个可以是未矫正(非处方用)的透镜或处方用透镜(例如,单光透镜、双焦和三焦透镜或渐进式透镜),以帮助矫正用户的视力中的缺陷。在一些实施例中,显示元件120可以是偏光的和/或有色的,以保护用户的眼睛免受太阳伤害。In some embodiments, a display element 120 does not generate image light, and instead is a lens that transmits light from the local area to the eyebox. For example, one or both of the display elements 120 may be uncorrected (non-prescription) lenses or prescription lenses (e.g., single vision, bifocal and trifocal, or progressive lenses) to help correct defects in the user's vision. In some embodiments, the display element 120 may be polarized and/or tinted to protect the user's eyes from the sun.

在一些实施例中,显示元件120可以包括附加的光学器件块(optics block)(未示出)。光学器件块可以包括一个或多个光学元件(例如,透镜、菲涅耳透镜等),该一个或多个光学元件将来自显示元件120的光引导到适眼区。光学器件块可以例如校正一些或全部图像内容中的像差、放大一些或全部图像或它们的某种组合。In some embodiments, display element 120 may include additional optics blocks (not shown). The optics block may include one or more optical elements (eg, lenses, Fresnel lenses, etc.) that direct light from display element 120 to an eye zone. The optics block may, for example, correct aberrations in some or all of the image content, amplify some or all of the image, or some combination thereof.

DCA确定头戴式视图器100周围的局部区域的一部分的深度信息。DCA包括一个或多个成像设备130和DCA控制器(在图1A中未示出),并且还可以包括照明器140。在一些实施例中,照明器140用光照射局部区域的一部分。该光可以是例如红外(infrared,IR)结构光(例如,点状图案结构光、条形结构光等)、用于飞行时间(time-of-flight,ToF)的IR闪光等。在一些实施例中,该一个或多个成像设备130捕获局部区域的包括来自照明器140的光的部分的图像。如所示出的,图1A示出了单个照明器140和两个成像设备130。在替代实施例中,不存在照明器140且存在至少两个成像设备130。DCA determines depth information for a portion of a local area around the headset 100 . The DCA includes one or more imaging devices 130 and a DCA controller (not shown in Figure 1A), and may also include an illuminator 140. In some embodiments, illuminator 140 illuminates a portion of the local area with light. The light may be, for example, infrared (IR) structured light (eg, point pattern structured light, strip structured light, etc.), IR flash for time-of-flight (ToF), etc. In some embodiments, the one or more imaging devices 130 capture images of portions of the local area that include light from the illuminator 140 . As shown, Figure 1A shows a single illuminator 140 and two imaging devices 130. In an alternative embodiment, there is no illuminator 140 and at least two imaging devices 130 are present.

DCA控制器使用捕获的图像和一种或多种深度确定技术来计算局部区域的一部分的深度信息。深度确定技术可以是例如直接飞行时间(ToF)深度感测、间接ToF深度感测、结构光、被动立体分析、主动立体分析(使用由来自照明器140的光添加到场景的纹理)、确定场景的深度的某种其它技术或它们的某种组合。The DCA controller computes depth information for the portion of the local area using the captured images and one or more depth determination techniques. The depth determination technique may be, for example, direct time-of-flight (ToF) depth sensing, indirect ToF depth sensing, structured light, passive stereo analysis, active stereo analysis (using texture added to the scene by light from the illuminator 140), some other technique for determining the depth of a scene, or some combination thereof.
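The two ToF variants named above reduce to short formulas; the sketch below is illustrative only (the function names and the example modulation frequency are assumptions, not values from the patent).

```python
import math

C = 299_792_458.0  # speed of light, m/s

def direct_tof_depth(round_trip_seconds):
    """Direct ToF: depth is half the round-trip path, d = c * t / 2."""
    return C * round_trip_seconds / 2.0

def indirect_tof_depth(phase_radians, mod_freq_hz):
    """Indirect ToF: the phase shift of an amplitude-modulated signal
    maps to depth, d = c * phi / (4 * pi * f_mod), unambiguous only up
    to a range of c / (2 * f_mod)."""
    return C * phase_radians / (4.0 * math.pi * mod_freq_hz)

# A 10 ns round trip corresponds to roughly 1.5 m of depth.
depth = direct_tof_depth(10e-9)
```

The unambiguous-range limit of the indirect form is why ToF systems often combine two modulation frequencies; that refinement is beyond this sketch.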

音频系统提供了音频内容。该音频系统包括换能器阵列、传感器阵列、一个或多个光学传声器145以及音频控制器150。然而,在其它实施例中,音频系统可以包括不同的部件和/或附加的部件。类似地,在一些情况下,参考音频系统的多个部件描述的功能可以以与此处描述的方式不同的方式分布在该多个部件之中。例如,音频控制器的一些或全部功能可以由远程服务器执行。The audio system provides the audio content. The audio system includes a transducer array, a sensor array, one or more optical microphones 145, and an audio controller 150. However, in other embodiments, the audio system may include different components and/or additional components. Similarly, in some cases, functionality described with reference to multiple components of an audio system may be distributed among the multiple components in a manner different from that described herein. For example, some or all functions of the audio controller may be performed by the remote server.

换能器阵列向用户呈现声音。该换能器阵列包括多个换能器。换能器可以是扬声器160或组织换能器170(例如,骨传导换能器或软骨传导换能器)。尽管扬声器160被示出位于眼镜框110的外部,但是扬声器160可以被封入在眼镜框110中。在一些实施例中,头戴式视图器100包括扬声器阵列而不是用于每个耳朵的单独的扬声器,该扬声器阵列包括集成到眼镜框110中的多个扬声器,以改善所呈现的音频内容的方向性。组织换能器170耦合到用户的头部,并直接振动用户的组织(例如,骨或软骨),以生成声音。换能器的数量和/或位置可以与图1A中示出的数量和/或位置不同。The transducer array presents sound to the user. The transducer array includes a plurality of transducers. A transducer may be a speaker 160 or a tissue transducer 170 (e.g., a bone conduction transducer or a cartilage conduction transducer). Although the speakers 160 are shown exterior to the eyeglass frame 110, the speakers 160 may be enclosed in the eyeglass frame 110. In some embodiments, instead of individual speakers for each ear, the headset 100 includes a speaker array comprising multiple speakers integrated into the eyeglass frame 110 to improve the directionality of the presented audio content. The tissue transducer 170 couples to the head of the user and directly vibrates the tissue (e.g., bone or cartilage) of the user to generate sound. The number and/or locations of transducers may be different from what is shown in FIG. 1A.

传感器阵列对头戴式视图器100的局部区域内的声音进行检测。该传感器阵列包括多个声学传感器180。声学传感器180捕获从局部区域(例如,房间)中的一个或多个声源发出的声音。每个声学传感器被配置为检测声音并将检测到的声音转换为电子格式(模拟的或数字的)。声学传感器180可以是声波传感器、传声器、声音换能器或适于检测声音的类似传感器。The sensor array detects sounds within a local area of the headset 100 . The sensor array includes a plurality of acoustic sensors 180 . Acoustic sensor 180 captures sound emitted from one or more sound sources in a local area (eg, a room). Each acoustic sensor is configured to detect sound and convert the detected sound into an electronic format (analog or digital). Acoustic sensor 180 may be an acoustic wave sensor, a microphone, a sound transducer, or a similar sensor suitable for detecting sound.

在一些实施例中,一个或多个声学传感器180可以被放置在每个耳朵的耳道中(例如,充当双耳传声器)。在一些实施例中,声学传感器180可以被放置在头戴式视图器100的外表面上、被放置在头戴式视图器100的内表面上、与头戴式视图器100分离(例如,为某种其它设备的一部分)或它们的某种组合。声学传感器180的数量和/或位置可以与图1A中示出的数量和/或位置不同。例如,可以增加声学检测位置的数量,以增加收集到的音频信息的量以及信息的灵敏度和/或准确性。声学检测位置可以被定向为使得传声器能够对穿戴头戴式视图器100的用户周围的宽范围方向上的声音进行检测。In some embodiments, one or more acoustic sensors 180 may be placed in the ear canal of each ear (eg, acting as binaural microphones). In some embodiments, acoustic sensor 180 may be placed on an outer surface of headset 100 , placed on an inner surface of headset 100 , separate from headset 100 (e.g., part of some other device) or some combination thereof. The number and/or location of acoustic sensors 180 may differ from that shown in Figure 1A. For example, the number of acoustic detection locations can be increased to increase the amount of audio information collected as well as the sensitivity and/or accuracy of the information. The acoustic detection locations may be oriented so that the microphones can detect sounds in a wide range of directions around a user wearing headset 100 .

在一些实施例中,一个或多个光学传声器145对由局部区域中的声音引起的皮肤的基于组织的振动进行检测。这些声音可以例如包括:用户的语音、以及来自局部区域中的其它声源的声音。例如,当用户说话时,语音的一部分实际上经由组织传导传输通过用户的组织。该语音的一部分在用户的头部的皮肤上表现为轻微的基于组织的振动。一个或多个光学传声器145对这些基于组织的振动进行检测。类似地,用户外部的声源(例如,风扇、其他说话者等)产生声音,该声音也可以在用户的皮肤上表现为振动。光学传声器145包括至少一个光源和至少一个检测器,并且可以可选地包括一个或多个光学元件。可以以多种方式配置该光学传声器145。例如,光源和检测器可以被配置为处于串联配置或并联配置(例如,如下文关于图5A和图5B所描述的)。并且在一些情况下,光学传声器可以是包括至少两个自混合干涉仪的成对光学传声器(例如,如下文关于图5C所描述的)。并且在一些情况下,光学传声器的光源和检测器可以位于头戴式视图器100上的不同位置(例如,鼻托内的不同定位、眼镜框110上的不同定位等)。检测器和光源可以位于不同或同一管芯(die)上,这例如取决于它们是双路干涉仪还是共路干涉仪,在这种情况下,它们之间的阈值距离由干涉仪臂长确定。In some embodiments, one or more optical microphones 145 detect tissue-based vibrations of the skin caused by sound in a localized area. These sounds may include, for example, the user's voice and sounds from other sound sources in the local area. For example, when a user speaks, part of the speech is actually transmitted through the user's tissue via tissue conduction. Part of this speech appears as slight tissue-based vibrations on the skin of the user's head. One or more optical microphones 145 detect these tissue-based vibrations. Similarly, sound sources external to the user (e.g., fans, other speakers, etc.) produce sound, which may also manifest as vibrations on the user's skin. Optical microphone 145 includes at least one light source and at least one detector, and may optionally include one or more optical elements. The optical microphone 145 can be configured in a variety of ways. For example, the light source and detector may be configured in a series configuration or a parallel configuration (eg, as described below with respect to Figures 5A and 5B). And in some cases, the optical microphone may be a pair of optical microphones including at least two self-mixing interferometers (eg, as described below with respect to Figure 5C). 
And in some cases, the light source and detector of an optical microphone may be located at different positions on the headset 100 (e.g., different positions within a nose pad, different positions on the eyeglass frame 110, etc.). The detector and light source may be located on different dies or on the same die, depending, for example, on whether they form a two-path interferometer or a common-path interferometer, in which case a threshold distance between them is determined by the interferometer arm length.

光源被配置为发射光。该光源可以是:例如,垂直腔面发射激光器(VCSEL)、边缘发射激光器、可调谐激光器、某种其它相干光源或它们的某种组合。所发射的光的光带使得皮肤主要反射该光(相对于,例如,主要吸收它)。一个或多个光学传声器145可以发射例如850nm、940nm、1300nm、1050nm等的光。所发射的光是连续波,并且在一些实施例中,包括参考光束和感测光束。在一些实施例中,光源被配置为(例如,用感测光束)照射用户的皮肤(例如,面部的一个或多个相同部分或不同部分)。The light source is configured to emit light. The light source may be, for example, a vertical cavity surface emitting laser (VCSEL), an edge emitting laser, a tunable laser, some other coherent light source, or some combination thereof. The band of light emitted is such that the skin mainly reflects the light (as opposed to, for example, mainly absorbing it). One or more optical microphones 145 may emit light at, for example, 850 nm, 940 nm, 1300 nm, 1050 nm, etc. The emitted light is a continuous wave and, in some embodiments, includes a reference beam and a sensing beam. In some embodiments, the light source is configured to illuminate (eg, with a sensing beam) the user's skin (eg, one or more of the same or different portions of the face).

检测器对由光源发射的光的带中的光进行监测。该检测器可以是例如一个或多个光电检测器。在一些实施例中,检测器和光源被配置为形成干涉系统(例如,自混合干涉仪、迈克尔逊干涉仪、LCI(例如,光学相干层析成像)、LDV等)。因此,检测器被配置为对混合信号进行检测,该混合信号对应于与感测光束的(例如,经由菲涅耳反射和/或散射反射)被皮肤反射的部分混合的参考光束。在其它实施例中,光源和检测器处于非干涉配置。在此配置中,检测器对来自皮肤的反射光和/或散射光的强度调制进行测量。The detector monitors light in the band of light emitted by the light source. The detector may be, for example, one or more photodetectors. In some embodiments, the detector and light source are configured to form an interferometric system (eg, self-mixing interferometer, Michelson interferometer, LCI (eg, optical coherence tomography), LDV, etc.). The detector is therefore configured to detect a mixed signal corresponding to the reference beam mixed with a portion of the sensing beam that is reflected by the skin (eg via Fresnel reflection and/or scattering reflection). In other embodiments, the light source and detector are in a non-interference configuration. In this configuration, the detector measures the intensity modulation of reflected and/or scattered light from the skin.
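As a toy model of the interference case, the sketch below shows how a small skin displacement modulates the intensity of the mixed signal at the detector. It assumes an idealized two-beam interferometer with perfect fringe contrast and an 850 nm source (one of the wavelengths mentioned above); it is not a model of any specific interferometer configuration in the patent.

```python
import numpy as np

WAVELENGTH = 850e-9  # assumed source wavelength, m

def mixed_intensity(displacement_m):
    """Idealized two-beam interference: reference and sensing beams mix
    to I = 0.5 * (1 + cos(4*pi*d / wavelength)); the factor of 2 on the
    displacement d reflects the round trip to the skin and back."""
    return 0.5 * (1.0 + np.cos(4.0 * np.pi * displacement_m / WAVELENGTH))

# A 1 kHz skin vibration with 100 nm peak displacement (hypothetical)
# sweeps the detector output over a visible fraction of a fringe.
fs = 100_000
t = np.arange(fs // 100) / fs            # 10 ms window
d = 100e-9 * np.sin(2.0 * np.pi * 1000.0 * t)
intensity = mixed_intensity(d)
```

Recovering the displacement (and hence the sound) from such a signal is a phase-demodulation problem; since only changes in d matter, no absolute distance calibration is needed.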

在替代实施例中,光学传声器145包括膜,并且监测膜的振动,而不是监测用户的皮肤的振动。在此实施例中,光源被配置为照射膜的一部分,并且该膜对光的一部分进行散射和/或反射。在一些实施例中,该膜中至少被光源照射的部分在由光源发射的光的带中是高度反射的。光源和检测器可以处于干涉配置或非干涉配置。该检测器对来自膜的散射光和/或反射光进行检测,输出的信号(例如,混合信号、调制强度)可以用于监测局部区域中的声音。In an alternative embodiment, the optical microphone 145 includes a membrane, and the vibrations of the membrane are monitored rather than the vibrations of the user's skin. In this embodiment, the light source is configured to illuminate a portion of the film, and the film scatters and/or reflects a portion of the light. In some embodiments, at least the portion of the film illuminated by the light source is highly reflective in the band of light emitted by the light source. The light source and detector can be in an interference configuration or a non-interference configuration. The detector detects scattered and/or reflected light from the membrane, and the output signal (eg, mixed signal, modulation intensity) can be used to monitor sound in a local area.

在图1A所示的示例中,光学传声器145位于眼镜框110的如下区域中:该区域将与穿戴头戴式视图器100的用户的鼻子的一部分接触。例如,光学传声器145可以集成到一副眼镜的一个或两个鼻托中。在其它实施例中,多个光学传声器145中的一个或多个光学传声器可以替代地或附加地位于头戴式视图器100上的其它位置,和/或在头戴式视图器100上可以存在一个或多个附加的光学传声器145。例如,一个或多个光学传声器145中的一些或全部光学传声器可以被定位在眼镜框110的朝内侧上,位于侧向辐射定位(side fire position)147A、147B、147C、147D和/或中梁定位148中的一些或全部定位处。下文关于图2、图3、图4和图5A至图5C对光学传声器145的各种实施例进行描述。In the example shown in FIG. 1A, the optical microphone 145 is located in an area of the eyeglass frame 110 that would be in contact with a portion of the nose of a user wearing the headset 100. For example, the optical microphone 145 may be integrated into one or both nose pads of a pair of glasses. In other embodiments, one or more of the optical microphones 145 may alternatively or additionally be located at other positions on the headset 100, and/or there may be one or more additional optical microphones 145 on the headset 100. For example, some or all of the one or more optical microphones 145 may be positioned on the inward-facing side of the eyeglass frame 110, at some or all of side fire positions 147A, 147B, 147C, 147D and/or center beam position 148. Various embodiments of the optical microphone 145 are described below with respect to FIG. 2, FIG. 3, FIG. 4, and FIGS. 5A-5C.

音频控制器150对来自一个或多个光学传声器145的检测器的、检测到的一个或多个混合信号进行处理,以测量来自局部区域的声音。音频控制器150可以对来自各个光学传声器145中的一些或全部光学传声器的混合信号进行分析,以测量在用户的皮肤中引起振动的部分或全部声音。该声音可以包括:例如,用户的语音、和/或局部区域内的其它声源(例如,其他人、噪声源(例如,风扇)等)。The audio controller 150 processes the detected mixed signal(s) from the detector(s) of the optical microphone(s) 145 to measure sound from the local area. Audio controller 150 may analyze the mixed signal from some or all of the various optical microphones 145 to measure some or all of the sound that causes vibrations in the user's skin. The sound may include: for example, the user's voice, and/or other sound sources in the local area (eg, other people, noise sources (eg, fans), etc.).

音频控制器150可以包括处理器和计算机可读存储介质。音频控制器150可以被配置为生成波达方向(direction of arrival,DOA)估计、生成声学传递函数(例如,阵列传递函数(array transfer function,ATF)和/或头相关传递函数(head-related transfer function,HRTF))、追踪声源的位置、在声源的方向上形成波束、对声源进行分类、生成用于换能器阵列的声音滤波器、指示换能器阵列执行主动降噪、识别用户的语音、基于所识别的用户的语音来识别并执行命令、捕获可以用于识别用户微表情的信息或它们的某种组合。关于下文论述的附图对关于音频控制器150可以如何使用检测到的组织振动的附加细节进行详细描述。The audio controller 150 may include a processor and a computer-readable storage medium. The audio controller 150 may be configured to generate direction of arrival (DOA) estimates, generate acoustic transfer functions (e.g., array transfer functions (ATFs) and/or head-related transfer functions (HRTFs)), track the locations of sound sources, form beams in the directions of sound sources, classify sound sources, generate sound filters for the transducer array, instruct the transducer array to perform active noise cancellation, identify the user's speech, recognize and execute commands based on the identified user's speech, capture information that can be used to identify the user's micro-expressions, or some combination thereof. Additional details regarding how the audio controller 150 may use the detected tissue vibrations are described with respect to the figures discussed below.
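As an illustration of DOA estimation in its simplest form, the two-sensor far-field case reduces to estimating the inter-sensor delay and converting it to an angle. The function below is a hypothetical sketch (the sensor spacing and the assumed speed of sound are not values from the patent), not the method actually used by the audio controller 150.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def doa_from_pair(x1, x2, fs, spacing_m):
    """Two-sensor far-field DOA: find the inter-sensor delay by
    cross-correlation, then map it via sin(theta) = c * tau / d."""
    corr = np.correlate(x2, x1, mode="full")
    lag = int(np.argmax(corr)) - (len(x1) - 1)
    tau = lag / fs
    sin_theta = np.clip(SPEED_OF_SOUND * tau / spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Broadband source arriving 3 samples later at the second sensor.
rng = np.random.default_rng(1)
fs = 48_000
s = rng.standard_normal(4800)
x2 = np.concatenate([np.zeros(3), s[:-3]])
angle = doa_from_pair(s, x2, fs, spacing_m=0.1)
```

Practical arrays use more sensors and sub-sample delay estimators, but the delay-to-angle mapping shown here is the core of the estimate.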

定位传感器190响应于头戴式视图器100的运动而生成一个或多个测量信号。定位传感器190可以位于头戴式视图器100的眼镜框110的一部分上。定位传感器190可以包括惯性测量单元(inertial measurement unit,IMU)。定位传感器190的示例包括:一个或多个加速度计、一个或多个陀螺仪、一个或多个磁力计、检测运动的另一合适类型的传感器、用于IMU的误差校正的一种类型的传感器、或它们的某种组合。定位传感器190可以位于IMU外部、IMU内部或它们的某种组合。Positioning sensor 190 generates one or more measurement signals in response to movement of headset 100 . Positioning sensor 190 may be located on a portion of eyeglass frame 110 of headset 100 . The positioning sensor 190 may include an inertial measurement unit (IMU). Examples of positioning sensors 190 include: one or more accelerometers, one or more gyroscopes, one or more magnetometers, another suitable type of sensor that detects motion, a type of sensor used for error correction of the IMU , or some combination of them. Positioning sensor 190 may be located external to the IMU, internal to the IMU, or some combination thereof.

在一些实施例中,头戴式视图器100可以针对头戴式视图器100的定位以及局部区域的模型的更新而提供同步定位与地图构建(simultaneous localization and mapping,SLAM)。例如,头戴式视图器100可以包括生成彩色图像数据的无源摄像头组件(passivecamera assembly,PCA)。PCA可以包括检测局部区域中的部分或全部区域的图像的一个或多个RGB摄像头。在一些实施例中,DCA的成像设备130中的一些或全部成像设备也可以充当PCA。由PCA检测的图像和由DCA确定的深度信息可以用于确定局部区域的参数、生成局部区域的模型、更新局部区域的模型或它们的某种组合。此外,定位传感器190追踪头戴式视图器100在房间内的定位(例如,位置和姿态)。下文结合图4对关于头戴式视图器100的多个部件的附加细节进行论述。In some embodiments, the headset 100 may provide simultaneous localization and mapping (SLAM) for positioning the headset 100 and updating the model of the local area. For example, the headset 100 may include a passive camera assembly (PCA) that generates color image data. PCA may include one or more RGB cameras that detect images of some or all of the local area. In some embodiments, some or all of the DCA's imaging devices 130 may also serve as PCAs. The image detected by PCA and the depth information determined by DCA can be used to determine parameters of the local area, generate a model of the local area, update a model of the local area, or some combination thereof. Additionally, positioning sensor 190 tracks the positioning (eg, position and attitude) of headset 100 within the room. Additional details regarding various components of headset 100 are discussed below in conjunction with FIG. 4 .

图1B为根据一个或多个实施例的被实现为HMD的头戴式视图器105的立体图。在描述AR系统和/或MR系统的实施例中,HMD的前侧的多个部分在可见波段(~380nm至750nm)中是至少部分透明的,并且HMD中的位于HMD的前侧与用户的眼睛之间的多个部分是至少部分透明的(例如,部分透明的电子显示器)。HMD包括前部刚性体115和带175。头戴式视图器105包括许多上文参考图1A描述的相同的部件,但是这些部件被修改以与HMD形状因子(form factor)结合。例如,HMD包括显示组件、DCA、音频系统(该音频系统包括一个或多个光学传声器145)、以及定位传感器190。图1B示出了照明器140、多个扬声器160、多个成像设备130、多个声学传感器180以及定位传感器190。这些扬声器160可以位于各种位置:例如,被耦接到带175(如所示出的)、被耦接到前部刚性体115,或可以被配置为插入用户的耳道内。FIG. 1B is a perspective view of a headset 105 implemented as an HMD, in accordance with one or more embodiments. In embodiments that describe an AR system and/or an MR system, portions of a front side of the HMD are at least partially transparent in the visible band (~380 nm to 750 nm), and portions of the HMD that are between the front side of the HMD and the eyes of the user are at least partially transparent (e.g., a partially transparent electronic display). The HMD includes a front rigid body 115 and a band 175. The headset 105 includes many of the same components described above with reference to FIG. 1A, but modified to integrate with the HMD form factor. For example, the HMD includes a display assembly, a DCA, an audio system (including one or more optical microphones 145), and a positioning sensor 190. FIG. 1B shows the illuminator 140, a plurality of speakers 160, a plurality of imaging devices 130, a plurality of acoustic sensors 180, and the positioning sensor 190. The speakers 160 may be located in various locations, e.g., coupled to the band 175 (as shown), coupled to the front rigid body 115, or configured to be inserted within an ear canal of the user.

图2为根据一个或多个实施例的音频系统200的块图。图1A或图1B中的音频系统可以是音频系统200的实施例。音频系统200为用户生成一个或多个声学传递函数。然后,音频系统200可以使用该一个或多个声学传递函数来为用户生成音频内容。在图2的实施例中,音频系统200包括换能器阵列210、传感器阵列220、光学传声器组件222、以及音频控制器230。音频系统200的一些实施例具有与此处所描述的部件不同的部件。类似地,在一些情况下,多种功能可以以与此处所描述的方式不同的方式分布在多个部件之中。Figure 2 is a block diagram of an audio system 200 in accordance with one or more embodiments. The audio system in FIG. 1A or FIG. 1B may be an embodiment of audio system 200. Audio system 200 generates one or more acoustic transfer functions for the user. Audio system 200 may then use the one or more acoustic transfer functions to generate audio content for the user. In the embodiment of FIG. 2 , audio system 200 includes transducer array 210 , sensor array 220 , optical microphone assembly 222 , and audio controller 230 . Some embodiments of audio system 200 have different components than those described herein. Similarly, in some cases, functionality may be distributed among multiple components in a manner different from that described herein.

换能器阵列210被配置为呈现音频内容。换能器阵列210包括多个换能器。换能器是提供音频内容的设备。换能器可以是:例如,扬声器(例如,扬声器160)、组织换能器(例如,组织换能器170)、提供音频内容的某种其它设备、或它们的某种组合。组织换能器可以被配置为起到骨传导换能器或软骨传导换能器的作用。换能器阵列210可以经由以下方式来呈现音频内容:经由空气传导(例如,经由一个或多个扬声器)、经由骨传导(经由一个或多个骨传导换能器)、经由软骨传导音频系统(经由一个或多个软骨传导换能器)、或它们的某种组合。在一些实施例中,换能器阵列210可以包括一种或多种换能器,以覆盖频率范围的不同部分。例如,压电换能器可以用于覆盖频率范围的第一部分,并且动圈式换能器可以用于覆盖频率范围的第二部分。Transducer array 210 is configured to present audio content. Transducer array 210 includes a plurality of transducers. A transducer is a device that provides audio content. The transducer may be, for example, a speaker (eg, speaker 160), a tissue transducer (eg, tissue transducer 170), some other device that provides audio content, or some combination thereof. The tissue transducer may be configured to function as a bone conduction transducer or a cartilage conduction transducer. Transducer array 210 may present audio content via air conduction (e.g., via one or more speakers), via bone conduction (via one or more bone conduction transducers), via cartilage conduction audio systems (e.g., via one or more speakers). via one or more cartilage conduction transducers), or some combination thereof. In some embodiments, transducer array 210 may include one or more transducers to cover different portions of the frequency range. For example, piezoelectric transducers may be used to cover a first part of the frequency range, and moving coil transducers may be used to cover a second part of the frequency range.

骨传导换能器通过使用户头部中的骨/组织振动来生成声压波。骨传导换能器可以耦接到头戴式视图器的一部分,并且可以被配置为位于连接到用户的颅骨的一部分的耳廓之后。骨传导换能器接收来自音频控制器230的振动指令,并且基于接收到的指令使用户的颅骨的一部分振动。来自骨传导换能器的振动生成组织承受的声压波,该声压波绕过耳膜朝向用户的耳蜗传播。A bone conduction transducer generates sound pressure waves by vibrating bone/tissue in the head of the user. The bone conduction transducer may be coupled to a portion of the headset, and may be configured to be located behind the auricle, coupled to a portion of the user's skull. The bone conduction transducer receives vibration instructions from the audio controller 230, and vibrates a portion of the user's skull based on the received instructions. The vibrations from the bone conduction transducer generate tissue-borne sound pressure waves that propagate toward the user's cochlea, bypassing the eardrum.

软骨传导换能器通过使用户的耳朵的耳廓软骨的一个或多个部分振动来生成声压波。软骨传导换能器可以耦接到头戴式视图器的一部分,并且可以被配置为耦接到耳朵的耳廓软骨的一个或多个部分。例如,软骨传导换能器可以耦接到用户的耳朵的耳廓的背面。软骨传导换能器可以位于沿着外耳周围的耳廓软骨的任何位置(例如,耳廓、耳屏、耳廓软骨的某个其它部分、或它们的某种组合)。使耳廓软骨的一个或多个部分振动可以生成:耳道以外的空气传播的声压波;组织产生的声压波,该组织产生的声压波促使耳道的一些部分振动,从而生成耳道内的空气传播的声压波;或它们的某种组合。所生成的空气传播的声压波顺着耳道朝向耳膜传播。A cartilage conduction transducer generates sound pressure waves by vibrating one or more portions of the auricular cartilage of the user's ear. The cartilage conduction transducer may be coupled to a portion of the headset, and may be configured to couple to one or more portions of the auricular cartilage of the ear. For example, the cartilage conduction transducer may couple to the back of the auricle of the user's ear. The cartilage conduction transducer may be located anywhere along the auricular cartilage around the outer ear (e.g., the pinna, the tragus, some other portion of the auricular cartilage, or some combination thereof). Vibrating the one or more portions of auricular cartilage may generate: airborne sound pressure waves outside the ear canal; tissue-borne sound pressure waves that cause some portions of the ear canal to vibrate, thereby generating airborne sound pressure waves within the ear canal; or some combination thereof. The generated airborne sound pressure waves propagate down the ear canal toward the eardrum.

换能器阵列210根据来自音频控制器230的指令生成音频内容。在一些实施例中,音频内容被空间化。空间化的音频内容是似乎源自特定方向和/或目标区域(例如,局部区域中的对象、和/或虚拟对象)的音频内容。例如,空间化的音频内容可以使得对于音频系统200的用户来说声音似乎是源自房间对面的虚拟歌手。换能器阵列210可以耦接到可穿戴设备(例如,头戴式视图器100或头戴式视图器105)。在替代实施例中,换能器阵列210可以是与可穿戴设备分离(例如,耦接到外部控制台)的多个扬声器。Transducer array 210 generates audio content based on instructions from audio controller 230. In some embodiments, the audio content is spatialized. Spatialized audio content is audio content that appears to originate from a specific direction and/or target area (eg, objects in a local area, and/or virtual objects). For example, spatialized audio content may make the sound appear to a user of audio system 200 to be originating from a virtual singer across the room. Transducer array 210 may be coupled to a wearable device (eg, headset 100 or headset 105). In alternative embodiments, transducer array 210 may be a plurality of speakers separate from the wearable device (eg, coupled to an external console).

传感器阵列220对该传感器阵列220周围的局部区域内的声音进行检测。检测到的声音可以例如来自音频系统200的用户(例如,用户的语音)、和/或可以是来自局部区域中的其它声源(例如,其他人)的声音。传感器阵列220可以包括多个声学传感器,该多个声学传感器中的每个声学传感器对声波的气压变化进行检测,并将检测到的声音转换为电子格式(模拟的或数字的)。该多个声学传感器可以被定位在头戴式视图器(例如,头戴式视图器100和/或头戴式视图器105)上、被定位在用户上(例如,在用户的耳道中)、被定位在颈带上、或它们的某种组合。声学传感器可以是例如传声器、振动传感器、加速度计或它们的任何组合。在一些实施例中,传感器阵列220被配置为使用多个声学传感器中的至少一些声学传感器来监测由换能器阵列210生成的音频内容。增加传感器的数量可以提高如下信息的准确性:该信息(例如,方向性)描述了由换能器阵列210产生的声场、和/或来自局部区域的声音。The sensor array 220 detects sound in a local area around the sensor array 220 . The detected sounds may, for example, come from a user of audio system 200 (eg, the user's voice), and/or may be sounds from other sound sources in the local area (eg, other people). Sensor array 220 may include a plurality of acoustic sensors, each of which detects changes in air pressure of sound waves and converts the detected sound into an electronic format (analog or digital). The plurality of acoustic sensors may be positioned on the headset (eg, headset 100 and/or headset 105 ), on the user (eg, in the user's ear canal), be positioned on the neckband, or some combination thereof. An acoustic sensor may be, for example, a microphone, a vibration sensor, an accelerometer, or any combination thereof. In some embodiments, sensor array 220 is configured to monitor audio content generated by transducer array 210 using at least some of the plurality of acoustic sensors. Increasing the number of sensors can improve the accuracy of information (eg, directivity) describing the sound field generated by the transducer array 210, and/or the sound coming from a local area.
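The claim that additional sensors improve directionality can be sketched with the simplest array-processing step, delay-and-sum beamforming: once the channels are aligned toward a steering direction, the coherent source adds while uncorrelated noise averages down. The helper below is an illustrative toy with hypothetical values, not the beamforming used by the audio system 200.

```python
import numpy as np

def delay_and_sum(channels, steering_delays):
    """Align each channel by its integer steering delay (in samples)
    and average them -- the simplest beamformer for a sensor array."""
    n = min(len(ch) - d for ch, d in zip(channels, steering_delays))
    aligned = np.stack([ch[d:d + n] for ch, d in zip(channels, steering_delays)])
    return aligned.mean(axis=0)

# Four sensors receive the same source with different delays plus
# independent noise; averaging four aligned channels cuts the
# uncorrelated noise power by roughly a factor of four.
rng = np.random.default_rng(2)
s = rng.standard_normal(1000)
delays = [0, 2, 4, 6]
channels = [np.concatenate([np.zeros(d), s]) + 0.5 * rng.standard_normal(1000 + d)
            for d in delays]
out = delay_and_sum(channels, delays)
```

Steering toward a different direction simply means choosing a different set of delays, which is why more sensors (and hence more delay constraints) sharpen the array's spatial selectivity.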

In some embodiments, the optical microphone assembly 222 is configured to detect tissue-based (i.e., transient) vibrations of the skin caused by sound in the local area. The optical microphone assembly 222 includes one or more of the optical microphones 145. As described above with regard to FIG. 1A, an optical microphone 145 includes at least one light source and at least one detector, and may optionally include one or more optical elements (e.g., lenses). The optical microphone 145 may be configured in a variety of ways (e.g., as described below with regard to FIGS. 3-5C). Signals output from each optical microphone 145 (e.g., a mixed signal, a modulation intensity) may be used to monitor sound in the local area.

Note that, in some embodiments, the one or more optical microphones 145 each include a membrane, and vibrations of the membrane are monitored instead of vibrations of the user's skin. In such embodiments, the light source is configured to illuminate a portion of the membrane, and the membrane scatters and/or reflects a portion of the light. In some embodiments, at least the portion of the membrane illuminated by the light source is highly reflective. The light source and the detector may be in an interferometric configuration or a non-interferometric configuration. The detector detects the scattered and/or reflected light from the membrane, and the output signal (e.g., a mixed signal, a modulation intensity) may be used to monitor sound in the local area.

The audio controller 230 controls operation of the audio system 200. In the embodiment of FIG. 2, the audio controller 230 includes a data store 235, a DOA estimation module 240, a transfer function module 250, a tracking module 260, a beamforming module 270, a processing module 275, and a sound filter module 280. In some embodiments, the audio controller 230 may be located inside a headset. Some embodiments of the audio controller 230 have different components than those described here. Similarly, functions can be distributed among the components in different manners than described here. For example, some functions of the controller may be performed external to the headset. The user may opt in to allow the audio controller 230 to transmit data captured by the headset to systems external to the headset, and the user may select privacy settings controlling access to any such data.

The data store 235 stores data for use by the audio system 200. Data in the data store 235 may include sounds recorded in the local area of the audio system 200, audio content, head-related transfer functions (HRTFs), transfer functions for one or more sensors, array transfer functions (ATFs) for one or more of the acoustic sensors, sound source locations, a virtual model of the local area, direction-of-arrival estimates, sound filters, tissue vibrations detected by the one or more optical microphones 145, sounds detected by the sensor array 220, a model that maps light amplitude to distance from a detector of an optical microphone 145, other data relevant for use by the audio system 200, or any combination thereof.

The user may opt in to allow the data store 235 to record data captured by the sensor array 220 and/or the one or more optical microphones 145. In some embodiments, the audio system 200 may employ always-on recording, in which the audio system 200 records all sounds detected by the sensor array 220 and/or the optical microphone assembly 222. The user may opt in or opt out to allow or prevent the audio system 200 from recording, storing, or transmitting the recorded data to other entities.

The DOA estimation module 240 is configured to localize sound sources in the local area based in part on information from the sensor array 220. Localization is the process of determining where sound sources are located relative to the user of the audio system 200. The DOA estimation module 240 performs a DOA analysis to localize one or more sound sources within the local area. The DOA analysis may include analyzing the intensity, spectra, and/or arrival times of each sound at the sensor array 220 to determine the direction from which the sound originated. In some cases, the DOA analysis may include any suitable algorithm for analyzing the surrounding acoustic environment in which the audio system 200 is located.

For example, the DOA analysis may be designed to receive input signals from the sensor array 220 and apply digital signal processing algorithms to the input signals to estimate the direction of arrival. These algorithms may include, for example, delay-and-sum algorithms, in which the input signal is sampled and the resulting weighted and delayed versions of the sampled signals are averaged together to determine the DOA. A least-mean-squares (LMS) algorithm may also be implemented to create an adaptive filter. The adaptive filter may then be used to identify differences in, e.g., signal intensity or differences in arrival time. These differences may then be used to estimate the DOA. In another embodiment, the DOA may be determined by converting the input signals into the frequency domain and selecting specific bins within the time-frequency (TF) domain to process. Each selected TF bin may be processed to determine whether that bin includes a portion of the audio spectrum with a direct-path audio signal. Those bins having a portion of the direct-path signal may then be analyzed to identify the angle at which the sensor array 220 received the direct-path audio signal. The determined angle may then be used to identify the DOA of the received input signal. Other algorithms not listed above may also be used, alone or in combination with the above algorithms, to determine the DOA.
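The delay-and-sum idea above can be sketched with a minimal two-microphone example. The sample rate, microphone spacing, and angle grid below are illustrative assumptions, not parameters from this disclosure: each candidate angle implies an inter-microphone delay, and the angle whose delay-compensated sum carries the most energy is taken as the DOA.

```python
import numpy as np

# Minimal two-microphone delay-and-sum DOA sketch (sample rate, spacing,
# and angle grid are illustrative assumptions): delay one channel for each
# candidate angle and keep the angle whose summed output has the most energy.
FS = 16_000      # sample rate in Hz (assumed)
C = 343.0        # speed of sound in m/s
SPACING = 0.2    # microphone spacing in m (assumed)

def estimate_doa(mic_a, mic_b, angles_deg):
    best_angle, best_power = None, -np.inf
    for angle in angles_deg:
        # Plane-wave inter-microphone delay for this candidate angle.
        delay_samples = int(round(SPACING * np.sin(np.deg2rad(angle)) / C * FS))
        aligned = np.roll(mic_b, -delay_samples)
        power = np.mean((mic_a + aligned) ** 2)
        if power > best_power:
            best_angle, best_power = angle, power
    return best_angle

# Synthetic check: a 500 Hz tone arriving from 30 degrees.
t = np.arange(0, 0.1, 1 / FS)
tone = np.sin(2 * np.pi * 500 * t)
true_delay = int(round(SPACING * np.sin(np.deg2rad(30)) / C * FS))
doa = estimate_doa(tone, np.roll(tone, true_delay), np.arange(-90, 91, 5))
```

A production beamformer would use fractional-delay interpolation and frequency-domain weighting rather than integer sample shifts; the integer shift keeps the sketch short.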

In some embodiments, the DOA estimation module 240 may also determine the DOA with respect to an absolute position of the audio system 200 within the local area. The position of the sensor array 220 may be received from an external system (e.g., some other component of a headset, an artificial reality console, a mapping server, a position sensor (e.g., the position sensor 190), etc.). The external system may create a virtual model of the local area, in which the local area and the position of the audio system 200 are mapped. The received position information may include a location and/or an orientation of some or all of the audio system 200 (e.g., of the sensor array 220). The DOA estimation module 240 may update the estimated DOA based on the received position information.

The transfer function module 250 is configured to generate one or more acoustic transfer functions. Generally, a transfer function is a mathematical function giving a corresponding output value for each possible input value. Based on parameters of the detected sounds, the transfer function module 250 generates one or more acoustic transfer functions associated with the audio system. The acoustic transfer functions may be array transfer functions (ATFs), head-related transfer functions (HRTFs), other types of acoustic transfer functions, or some combination thereof. An ATF characterizes how a microphone receives a sound from a point in space.

An ATF includes a number of transfer functions that characterize a relationship between a sound source and the corresponding sound received by the acoustic sensors in the sensor array 220. Accordingly, for a sound source there is a corresponding transfer function for each of the acoustic sensors in the sensor array 220, and collectively the set of transfer functions is referred to as an ATF. Accordingly, for each sound source there is a corresponding ATF. Note that the sound source may be, e.g., someone or something generating sounds in the local area, the user, or one or more transducers of the transducer array 210. The ATF for a particular sound source location relative to the sensor array 220 may differ from user to user, because a person's anatomy (e.g., ear shape, shoulders, etc.) affects the sound as it travels to the person's ears. Accordingly, the ATFs of the sensor array 220 are personalized for each user of the audio system 200.

In some embodiments, the transfer function module 250 determines one or more HRTFs for the user of the audio system 200. An HRTF characterizes how an ear receives a sound from a point in space. The HRTF for a particular source location relative to a person is unique to each ear of the person (and is unique to the person), because the person's anatomy (e.g., ear shape, shoulders, etc.) affects the sound as it travels to the person's ears. In some embodiments, the transfer function module 250 may determine the HRTFs for the user using a calibration process. In some embodiments, the transfer function module 250 may provide information about the user to a remote system. The user may adjust privacy settings to allow or prevent the transfer function module 250 from providing the information about the user to any remote systems. The remote system determines a set of HRTFs that are customized to the user using, e.g., machine learning, and provides the customized set of HRTFs to the audio system 200.

The tracking module 260 is configured to track locations of one or more sound sources. The tracking module 260 may compare current DOA estimates with a stored history of previous DOA estimates. In some embodiments, the audio system 200 may recalculate DOA estimates on a periodic schedule, such as once per second or once per millisecond. The tracking module may compare the current DOA estimates against previous DOA estimates, and in response to a change in a DOA estimate for a sound source, the tracking module 260 may determine that the sound source moved. In some embodiments, the tracking module 260 may detect a change in location based on visual information received from the headset or some other external source. The tracking module 260 may track the movement of one or more sound sources over time. The tracking module 260 may store values of the number of sound sources and the location of each sound source at each point in time. In response to a change in the value of the number of sound sources or in the locations of the sound sources, the tracking module 260 may determine that a sound source moved. The tracking module 260 may calculate an estimate of a localization variance. The localization variance may be used as a confidence level for each determination of a change in movement.
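The history comparison above can be reduced to a toy check. The threshold rule and the specific numbers below are illustrative assumptions, not parameters from this disclosure: a source is flagged as moved when its current DOA estimate falls outside a confidence bound derived from the variance of its recent estimates.

```python
import numpy as np

# Toy sketch of the DOA-history comparison (threshold and confidence rule
# are illustrative assumptions): flag a source as moved when the current
# estimate leaves a variance-derived bound around the recent mean.
history = [30.0, 31.0, 29.5, 30.5]   # stored previous DOA estimates (degrees)
current = 45.0                        # new periodic DOA estimate (degrees)

mean = float(np.mean(history))
std = float(np.std(history))          # localization spread -> confidence level
moved = abs(current - mean) > 3.0 * std + 1.0
```

The `+ 1.0` degree floor is a hypothetical guard so that a very tight history does not flag sub-degree jitter as movement.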

The beamforming module 270 is configured to process one or more ATFs to selectively emphasize sounds from sound sources within a certain area while de-emphasizing sounds from other areas, thereby acting as an adaptive beamformer. In analyzing sounds detected by the sensor array 220 and, in some cases, the optical microphones 145, the beamforming module 270 may combine information from different acoustic sensors to emphasize sound associated with a particular region of the local area while de-emphasizing sound that is from outside of the region. The beamforming module 270 may isolate an audio signal associated with sound from a particular sound source from other sound sources in the local area based on, e.g., different DOA estimates from the DOA estimation module 240 and the tracking module 260. The beamforming module 270 may thus selectively analyze discrete sound sources in the local area. In some embodiments, the beamforming module 270 may enhance a signal from a sound source. For example, the beamforming module 270 may apply sound filters that eliminate signals above, below, or between certain frequencies. Signal enhancement acts to enhance sounds associated with a given identified sound source relative to other sounds detected by the sensor array 220.

The processing module 275 uses output signals from the optical microphone assembly 222 to measure sound in the local area. The output signal may be, e.g., a signal corresponding to an amplitude of the portion of the light reflected from the skin. For example, the audio controller may input the detected light into a model that maps light amplitude to distance from the detector (i.e., skin position). This is discussed in further detail below with regard to FIG. 3.

In cases where the one or more optical microphones 145 are in an interferometric configuration, the output signal may be, e.g., a mixed signal. The detected mixed signal includes a dynamic high-frequency component and a modulation component. The dynamic high-frequency component changes (frequency shifts) as the distance between the illuminated portion of the skin and the detector changes. Note that the amplitude of the measured vibrations may range from, e.g., 50 nm (e.g., the user whispering) to 1.5 microns (e.g., the user yelling, or some other loud noise in the local area). Accordingly, vibrations induced in the skin by sound in the local area cause changes in the dynamic high-frequency component. The processing module measures the sound by inferring, from the dynamic high-frequency component, the corresponding sound that would have caused the vibrations of the user's skin. Moreover, vibrations of the user's skin caused by sound have very different frequencies than vibrations caused by, e.g., motion of the user (e.g., walking, running), referred to as motion noise. The processing module 275 may isolate and/or filter out portions of the dynamic high-frequency component that correspond to motion noise.
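The frequency separation between motion noise and voice-driven vibration can be sketched as follows. The sample rate, frequencies, amplitudes, and 50 Hz cutoff are illustrative assumptions, and a hard FFT mask stands in for whatever calibrated filter a real system would use: a slow walking sway rides under a faster voice-driven skin vibration, and zeroing the low-frequency bins leaves only the voice component.

```python
import numpy as np

# Illustrative separation of low-frequency motion noise from speech-band
# skin vibration (rates, frequencies, amplitudes, and cutoff are assumed):
# a 2 Hz walking sway superposed on a 200 Hz voice-driven vibration.
FS = 8_000
t = np.arange(0, 1.0, 1 / FS)
motion = 5e-6 * np.sin(2 * np.pi * 2 * t)     # 5 um walking sway (motion noise)
voice = 1e-6 * np.sin(2 * np.pi * 200 * t)    # 1 um voice-driven vibration
displacement = motion + voice

spectrum = np.fft.rfft(displacement)
freqs = np.fft.rfftfreq(len(displacement), 1 / FS)
spectrum[freqs < 50] = 0                      # drop sub-50 Hz motion content
voice_only = np.fft.irfft(spectrum, n=len(displacement))

residual = np.max(np.abs(voice_only - voice)) # leftover motion noise
```

Because both tones complete an integer number of cycles over the window, the mask removes the motion term essentially exactly; real signals would need a proper high-pass filter with a transition band.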

Note that the amplitude of speech-driven tissue vibrations decreases at higher audio frequencies. This is likely due to the low-pass nature of bone-conducted speech as speech-driven vibrations propagate through bone and soft tissue. This structural low-pass filtering can affect how well an optical microphone senses speech content information at higher frequencies. Accordingly, in some cases, high-frequency components of speech (e.g., the user's speech) may be attenuated. The high-frequency components may be, e.g., audio content at frequencies above 2 kHz.

The processing module 275 may enhance and/or reconstruct the sensed speech content (e.g., for the higher frequencies). The processing module 275 may, e.g., reconstruct the high-frequency content of the sensed speech using a bandwidth-extension method based on matrix factorization. Clean speech recordings obtained using one or more acoustic microphones are used to learn wideband spectral bases for the user. These wideband bases include both the low-frequency and the high-frequency content of speech. Their low-frequency content is then used with the narrowband speech (obtained with the optical microphone) to learn how the wideband bases should be combined to obtain wideband speech.
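The factorization idea above can be sketched with toy dimensions and synthetic data. This is not the trained system described in the text: plain least squares stands in for the nonnegative update rules a real NMF-style method would use, and the "bases" are random rather than learned from clean speech. The point is the structure — activations are solved on the observed low-frequency rows of the wideband bases, then the full band is reconstructed from those activations.

```python
import numpy as np

# Toy matrix-factorization bandwidth-extension sketch (synthetic data;
# least squares stands in for nonnegative NMF updates): solve activations
# on the low-frequency rows of wideband bases W, reconstruct the full band.
rng = np.random.default_rng(0)
n_bins, n_low, n_bases, n_frames = 64, 16, 4, 10

W = np.abs(rng.normal(size=(n_bins, n_bases)))         # wideband spectral bases
H_true = np.abs(rng.normal(size=(n_bases, n_frames)))  # true activations
wideband = W @ H_true                                  # full-band magnitude spectrogram
narrowband = wideband[:n_low]                          # what the optical mic "sees"

# Fit activations using only the observed low-frequency rows of W.
H_est, *_ = np.linalg.lstsq(W[:n_low], narrowband, rcond=None)
reconstructed = W @ H_est                              # extended to the full band

error = np.max(np.abs(reconstructed - wideband))
```

Here the narrowband data lies exactly in the span of the low-frequency rows, so the reconstruction is exact; with real speech, the quality depends on how well the learned bases generalize.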

In another embodiment, the processing module 275 may reconstruct the high-frequency content using a neural-network-based audio super-resolution method. If the network is trained in the spectral domain, audio super-resolution enables extrapolating (inpainting) the high-frequency content from the low-frequency content. If the network is trained in the time domain, the narrowband waveform is interpolated in the time domain to obtain wideband speech with high-frequency content. Alternatively, it is also possible to jointly train two networks, one in the time domain and the other in the frequency domain. The results of the two networks may be combined with a fusion layer, or the two networks may be cascaded. Simultaneous recordings from one or more optical microphones and acoustic microphones may be used for training these neural networks. The learned networks may then be used to reconstruct the high-frequency content of speech obtained with the one or more optical microphones.

Note that similar methods have been used successfully in the literature to reconstruct wideband speech from narrowband telephone signals. In telephony applications, the low-frequency content of the narrowband and wideband speech is the same. However, the low-frequency content of speech captured with an optical microphone may differ from that of speech captured with an acoustic microphone. To account for this difference, a weighting matrix that maps the bases obtained from the optical microphone to those of the acoustic microphone may be included in the factorization-based method. This matrix can be learned during training. When training an audio super-resolution network, one or more convolutional layers may be inserted as input layers. With appropriate training, the one or more additional layers can help learn the mapping from the low-frequency content of the optical microphone to that of the acoustic microphone.

The processing module 275 may update the measured sound of the speech with the reconstructed high-frequency components. The reconstructed high-frequency components mitigate the attenuation of the high-frequency components.

The processing module 275 may use signals output from the optical microphone assembly 222 (e.g., mixed signals) to identify the user's speech within the detected sounds. These output signals describe vibrations occurring on the user's skin, e.g., when the user speaks and/or when sound from the local area causes the skin to vibrate. In some embodiments, the one or more optical microphones 145 may act as a VAD. Accordingly, the processing module 275 may input the output signals from the one or more optical microphones of the optical microphone assembly 222, along with the sounds from the local area, into a model that uses the inputs to identify the user's speech within the detected sounds from the local area.

In some embodiments, the processing module 275 may determine that the identified speech of the user includes a command, and the audio system 200 and/or the headset 100 may then perform an action in accordance with the command. The action may control some operation of the audio system 200 and/or the headset 100. The action may, e.g., designate a sound source, decrease/increase volume, be some other action that controls operation of the audio system 200 and/or the headset 100, or some combination thereof.

The processing module 275 uses the detected tissue vibrations (i.e., the output signals) from the optical microphone assembly 222 and the detected sounds from the sensor array 220 in various manners. In some embodiments, the processing module 275 uses the detected sounds from the sensor array 220 to calibrate the one or more optical microphones 145.

In some embodiments, the processing module 275 identifies one or more sounds (e.g., background noise) for suppression within the measured sounds from the optical microphones 145 and/or the sensor array 220. The processing module 275 may then provide this information to the sound filter module 280 as part of an active noise cancellation process. Sound filters may be applied to modify an audio signal corresponding to audio content, and the transducer array 210 may then present the modified audio signal to the user as modified audio content that includes the audio content and suppression components that suppress the noise.

In some embodiments, the processing module 275 may use the output signals from the optical microphone assembly 222 to monitor slippage of the headset on the user's head. For example, if the headset moves to a new resting position on the user's head, it would introduce an offset in distance. The processing module 275 may identify offsets in the output signals to identify and/or monitor the position of the headset on the user. The sound filter module 280 may, e.g., use the new position information to generate more accurate sound filters.

The sound filter module 280 determines sound filters for the transducer array 210. In some embodiments, the sound filters cause the audio content to be spatialized, such that the audio content appears to originate from a target region. In some embodiments, the sound filters may cause positive or negative amplification of sounds as a function of frequency. The sound filter module 280 may use HRTFs and/or acoustic parameters to generate the sound filters. The acoustic parameters describe acoustic properties of the local area. The acoustic parameters may include, e.g., a reverberation time, a reverberation level, a room impulse response, etc. In some embodiments, the sound filter module 280 calculates one or more of the acoustic parameters. In some embodiments, the sound filter module 280 requests the acoustic parameters from a mapping server (e.g., as described below with regard to FIG. 9).

The sound filter module 280 may update one or more sound filters based on the identified speech of the user within the detected sounds. The one or more updated sound filters may be applied to audio content to generate modified audio content. For example, the sound filter module 280 may update a sound filter such that, when the sound filter is applied to audio content, the modified audio content enhances the identified speech of the user. In some embodiments, the sound filter module 280 may update the sound filters to suppress one or more sounds detected by the one or more optical microphones 145 and/or the sensor array 220 (i.e., perform active noise cancellation). In some embodiments, the sound filter module 280 provides the sound filters and/or the modified audio content to the transducer array 210 and/or to one or more other audio systems in the local area. The sound filter module 280 may provide the one or more updated sound filters and/or the modified audio content to the one or more other audio systems via, e.g., a local wireless network (e.g., WIFI, Bluetooth, etc.). In this manner, the user's speech may be presented in real time to a user of another audio system, which is particularly useful in noisy environments where it would otherwise be difficult for other users to hear the user's speech (e.g., in a crowd at a football game or in some other low-speech-SNR environment).

FIG. 3 is an example nose pad 310 that includes an optical microphone 320 having a light source 330 and a detector 340 located at different positions within the nose pad 310, in accordance with one or more embodiments. The nose pad 310 is an example nose pad of a headset (e.g., the headset 100). The optical microphone 320 is an embodiment of the optical microphone 145 in which the light source 330 and the detector 340 are located at different positions within the nose pad 310 and are separated from each other by a threshold distance. The detector 340 and the light source 330 are located on different dies or on a same die, depending on whether they form a two-path interferometer or a common-path interferometer, in which case the threshold distance is determined by an interferometer arm length. As illustrated in FIG. 3, the light source 330 illuminates a portion of the nose with light, and the detector 340 detects portions of the light that are scattered and reflected by the nose.

Vibrations in the skin may affect how much of the emitted light is reflected and/or scattered from the skin of the user. In some embodiments, the audio controller processes the detected light to measure sound from the local area based on a modulation intensity of the detected light. For example, the audio controller may input the detected light into a model that maps light amplitude to distance from the detector 340 (i.e., skin position). For example, at a first time the detected light may have a relatively low amplitude signal, and at a second time the detected light may have an increased amplitude signal. Accordingly, the skin would be farther away during the first time than during the second time. In this manner, the audio controller can use the amplitude of the detected signal to monitor vibrations of the skin.
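The amplitude-to-distance mapping can be illustrated with a toy model. The inverse-square intensity falloff and the calibration constant below are assumptions for illustration only — a real mapping would be calibrated per device — but the sketch shows how inverting such a model recovers the skin's displacement waveform from the detected amplitude.

```python
import numpy as np

# Toy stand-in for the "light amplitude -> distance" model (inverse-square
# falloff and calibration constant K are assumptions): invert the model to
# track skin position, then read the vibration off the recovered distances.
K = 1e-6  # assumed calibration constant: amplitude = K / distance**2

def amplitude_from_distance(d_m):
    return K / d_m**2

def distance_from_amplitude(a):
    return np.sqrt(K / a)

base = 5e-3                                                   # 5 mm nominal standoff
vibration = 500e-9 * np.sin(np.linspace(0, 2 * np.pi, 100))   # ~500 nm skin sway
measured = amplitude_from_distance(base + vibration)          # detector output
recovered = distance_from_amplitude(measured)                 # inferred skin position

peak_to_peak_nm = (recovered.max() - recovered.min()) * 1e9
```

The recovered peak-to-peak displacement (~1000 nm) sits inside the 50 nm to 1.5 micron range of skin-vibration amplitudes discussed above.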

In some embodiments, the light emitted from the light source 330 is divided into a reference beam and a sensing beam, and the reference beam is provided to the detector 340. In these embodiments, the detector 340 is in an interferometric configuration with the light source 330, such that the detector 340 is configured to detect a mixed signal. For example, there may be an optical waveguide (e.g., an optical fiber) that provides the reference beam to the detector 340. The reference beam mixes with the portions of the sensing beam that are reflected and scattered from the nose to produce a mixed beam, which is detected by the detector as the mixed signal. The audio controller processes the detected mixed signal to measure sound from the local area.

FIG. 4 is an example optical microphone 400 that is configured as a self-mixing interferometer, in accordance with one or more embodiments. The optical microphone 400 includes a light source (not shown), a detector (not shown), and an optical element. The optical microphone 400 is an embodiment of the optical microphone 145 in which the light source and the detector are part of the same device and are configured to function as a self-mixing interferometer. The optical microphone 400 may be, e.g., embedded in a nose pad of a headset, coupled to an eyeglass frame of a headset (e.g., in a side-firing position), etc. Various embodiments of the optical microphone 400 are described in detail below with regard to FIGS. 5A, 5B, and 5C.

The optical microphone 400 is configured as a self-mixing interferometer. A system based on self-mixing interferometry is one in which reflected light is fed back into the laser cavity, modulating the power of the laser; the laser cavity acts as a lock-in amplifier, thereby increasing the microphone SNR. In a self-mixing interferometer system, a detector (e.g., a photodiode) can be placed on a laterally displaced or vertically displaced laser die to measure the laser intensity. In some embodiments, the light source divides the emitted light into a reference beam and a sensing beam and provides the reference beam to the detector. Additionally or alternatively, a portion of the light emitted from the light source may be reflected back to the detector by the optical element, and that reflected portion serves as the reference beam. The detector is in an interferometric configuration with the light source (to function as a self-mixing interferometer), and the detector is configured to detect a mixed signal. The mixed signal corresponds to a mixed beam formed by mixing the reference beam with the portion of the sensing beam that is reflected and scattered from the nose. The audio controller processes the detected mixed signal to measure sound from the local area.
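The self-mixing mechanism described above can be sketched numerically: the laser power is modulated by the round-trip interferometric phase to the vibrating skin. This is an idealized model, and the wavelength, feedback strength, and vibration parameters below are assumed for illustration:

```python
import numpy as np

WAVELENGTH = 850e-9  # m; an assumed VCSEL wavelength
P0 = 1.0             # steady-state laser power (arbitrary units)
M = 0.01             # self-mixing modulation index (assumed)

def self_mixing_power(t, d0=5e-3, amp=500e-9, freq=1000.0):
    """Idealized self-mixing signal: laser power modulated by the round-trip
    phase to a target vibrating as d(t) = d0 + amp*sin(2*pi*freq*t)."""
    d = d0 + amp * np.sin(2 * np.pi * freq * t)
    phase = 4 * np.pi * d / WAVELENGTH  # round-trip interferometric phase
    return P0 * (1 + M * np.cos(phase))

t = np.linspace(0.0, 1e-2, 100_000)  # 10 ms, finely sampled
p = self_mixing_power(t)
# The 500 nm vibration swings the round-trip phase over more than one
# fringe, so the detected power spans the full modulation depth 2*M*P0.
```

Demodulating this power fluctuation recovers the skin displacement, which is what the audio controller turns back into sound.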

Figure 5A is an optical microphone 500 configured as a self-mixing interferometer whose components are in a series configuration, in accordance with one or more embodiments. The optical microphone 500 includes a light source 510, an optical element 520, and a detector 530. In some embodiments, the light source 510 and the detector 530 are coupled to the same die. The optical microphone 500 is an embodiment of the optical microphone 400.

As shown, the light source 510 (e.g., a VCSEL) emits light. The emitted light is incident on the optical element 520 (e.g., a lens); a portion of the emitted light is transmitted by the optical element 520 as a sensing beam, and a portion is reflected to the detector 530 as a reference beam. Note that in some embodiments (not shown), the light source 510 may emit light as a sensing beam toward the optical element and emit light as a reference beam toward the detector 530. A portion of the sensing beam is scattered and/or reflected from the user's skin and then passes through the optical element 520 and the light source 510 to mix with the reference beam at the detector 530, forming a mixed beam. The detector 530 detects this mixed beam as a mixed signal. The audio controller processes the detected mixed signal to measure sound from the local area.

Figure 5B is an optical microphone 540 configured as a self-mixing interferometer whose components are in a parallel configuration, in accordance with one or more embodiments. The optical microphone 540 includes a light source 545, an optical element 520, and a detector 550. In some embodiments, the light source 545 and the detector 550 are coupled to the same die. The light source 545 and the detector 550 are substantially the same as the light source 510 and the detector 530, except that the light source 545 and the detector 550 are arranged in a parallel configuration and the detector 550 is now also coupled to the optical element 520. The optical microphone 540 is an embodiment of the optical microphone 400.

As shown, the light source 545 (e.g., a VCSEL) emits light. The emitted light is incident on the optical element 520 (e.g., a lens); a portion of the emitted light is transmitted by the optical element 520 as a sensing beam, and a portion is reflected to the detector 550 as a reference beam. A portion of the sensing beam is scattered and/or reflected from the user's skin and then passes through the optical element 520 to mix with the reference beam at the detector 550, forming a mixed beam. The detector 550 detects this mixed beam as a mixed signal. The audio controller processes the detected mixed signal to measure sound from the local area.

Figure 5C is an example of a paired optical microphone 560 that includes two self-mixing interferometers, in accordance with one or more embodiments. The optical microphone 560 includes a light source 510a, a light source 510b, an optical element 570, a detector 530a, a detector 530b, and a block 580. In some embodiments, the light source 510a, the light source 510b, the detector 530a, the detector 530b, and the block 580 are coupled to the same die. The light sources 510a and 510b are substantially the same as the light source 510, and the detectors 530a and 530b are substantially the same as the detector 530. The optical element 570 is substantially the same as the optical element 520, except that the optical element 570 is coupled to multiple light sources. The optical microphone 560 is an embodiment of the optical microphone 400.

The paired optical microphone 560 includes one or more optical elements, multiple light sources, and multiple corresponding detectors. Each light source is in a parallel or series configuration with its corresponding detector to form a respective optical microphone. For example, as shown, the light source 510a is in a series configuration with the corresponding detector 530a, and the light source 510b is in a series configuration with the corresponding detector 530b. In some embodiments, the light sources 510a and 510b emit at the same wavelength. In alternative embodiments, the light sources 510a and 510b emit at different wavelengths. For example, the light source 510a may emit light at 780 nm, and the light source 510b may emit light at 850 nm. Accordingly, the light source 510a, the detector 530a, and the optical element 570 form a first optical microphone configured as a self-mixing interferometer whose components are in a series configuration (e.g., as shown in Figure 5A); and the light source 510b, the detector 530b, and the optical element 570 form a second optical microphone configured as a self-mixing interferometer whose components are in a series configuration. Note that although two optical microphones are shown in the illustrated example, in other embodiments there may be additional optical microphones that are also coupled to the optical element 570.

Crosstalk between the two optical microphones is mitigated via the block 580. The block 580 is made of a material that does not transmit (e.g., is absorptive or reflective of) the light emitted by the light sources 510a and 510b. In some embodiments, the block 580 may be part of a semiconductor die, forming a single chip with the emitters. In some embodiments, the block 580 is metal used to bond two separate chips together. Note that the different positioning of the two light sources relative to the optical element 570 allows each of the optical microphones to have a different emission angle. For example, the optical microphone formed using the light source 510a and the detector 530a emits a sensing beam 585, and the optical microphone formed using the light source 510b and the detector 530b emits a sensing beam 590. The sensing beams 585 and 590 are emitted from the optical element 570 at different angles.

The paired optical microphone 560 can monitor two different locations simultaneously. In contrast, the optical microphones 500 and 540 each monitor a single location. Note that the optical microphone 540 may be easier to implement using two-element packaging techniques rather than monolithic processing.

Figure 6 is an example of an optical microphone 600 configured as an LDV, in accordance with one or more embodiments. The optical microphone 600 includes a light source 610 (e.g., an edge-emitting laser), a detector 620 (e.g., a photodiode), a waveguide structure 630, and an optical antenna 640, all located on a substrate 650 as part of a photonic integrated circuit. In some embodiments, the optical microphone 600 may include additional components. The optical microphone 600 is an embodiment of the optical microphone 145 that uses an LDV configuration. The optical microphone 600 may be, for example, embedded in a nose pad of a headset, coupled to an eyeglass frame of a headset (e.g., in a side-radiating position), or the like.

The waveguide structure 630 is an optical waveguide that guides light to the various components of the optical microphone 600. The waveguide structure 630 may, via multiple sections, couple together, for example, the light source 610, a beam splitter 692, one or more optical antennas 640, an optical combiner 695, the detector 620, one or more optical amplifiers, or some combination thereof. The multiple sections include a transmit section 660, a reference section 670, a transmit sensing section 680, a receive sensing section 685, and a mixing section 690. The waveguide structure also includes the beam splitter 692 and the optical combiner 695, and may also include a laser amplifier. The beam splitter 692 splits a portion (e.g., 50%) of the light from the transmit section 660 into the transmit sensing section 680 and splits the remainder of that light into the reference section 670. In some embodiments, some other portion of the optical power (e.g., 80%) is split into the transmit sensing section 680 rather than the reference section 670. Similarly, the optical combiner 695 combines light from the receive sensing section 685 with light from the reference section 670 into the mixing section 690. The optical microphone 600 may include one or more optical amplifiers that amplify the light. For example, an optical amplifier may be positioned to amplify the light before it is output by the optical antenna 640 and/or to amplify the light in-coupled by the optical antenna 640.

The optical antenna 640 out-couples light from, and in-couples light into, the optical microphone 600. The optical antenna 640 may be, for example, a grating coupler. Note that the illustrated embodiment uses a common input/output path for light via the optical antenna 640. In other embodiments (not shown), there may be one optical antenna for out-coupling the sensing beam and a separate optical antenna for receiving the portion of the sensing beam that is reflected and/or scattered from the user's skin.

The light source 610 emits light, which is coupled into the transmit section 660 of the waveguide structure 630. The beam splitter 692 divides the emitted light into a reference beam and a sensing beam, provides the reference beam to the reference section 670, and provides the sensing beam to the transmit sensing section 680. The transmit sensing section 680 provides the sensing beam to the optical antenna 640, which out-couples the light into the local area (e.g., to illuminate the user's skin). Note that in some embodiments, an optical amplifier may be used to amplify the light before it is emitted by the optical antenna 640. A portion of the sensing beam is reflected and/or scattered by the user's skin and is in-coupled into the waveguide structure 630 via the optical antenna 640. The receive sensing section 685 provides this light to the optical combiner 695. Note that in some embodiments, an optical amplifier may be used to amplify the light before it is passed to the optical combiner 695. The optical combiner 695 combines the received portion of the sensing beam with the reference beam to generate a mixed beam that is coupled into the mixing section 690. The detector 620 receives the mixed beam and detects the corresponding mixed signal. The audio controller processes the detected mixed signal to measure sound from the local area.
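For scale, the quantity an LDV measures is the Doppler shift of the returned light, f_D = 2v/λ for a surface moving toward the antenna at velocity v. The wavelength and vibration parameters below are assumed for illustration:

```python
import math

WAVELENGTH = 1.55e-6  # m; an assumed telecom-band wavelength for the PIC

def doppler_shift_hz(surface_velocity_m_s, wavelength_m=WAVELENGTH):
    """Doppler shift seen by a laser Doppler vibrometer: f_D = 2*v/lambda."""
    return 2.0 * surface_velocity_m_s / wavelength_m

# Skin vibrating sinusoidally at 1 kHz with 1 um amplitude peaks at
# v = 2*pi*f*A ~ 6.3 mm/s, i.e. a Doppler shift of several kHz.
v_peak = 2 * math.pi * 1000.0 * 1e-6
shift = doppler_shift_hz(v_peak)
```

Demodulating this shift from the mixed signal yields the skin velocity, and hence the sound, that the audio controller recovers.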

The substrate 650 may be formed of any standard chip substrate material, such as a semiconductor material, silicon, silicon-on-insulator, gallium arsenide, aluminum gallium arsenide, silicon-on-sapphire, or the like. The substrate 650 may also be formed of any material transparent in the visible band (400 nm to 700 nm), such as glass, plastic, a polymer, polymethyl methacrylate (PMMA), silicon dioxide, or any form of crystal (e.g., lithium niobate, tellurium dioxide, etc.). A surface of the substrate 650 may be bonded to a headset (e.g., the headset 100). The components of the optical microphone 600 may be bonded to the substrate 650 by any standard bonding technique and/or formed on the substrate by any standard etching or epitaxial growth technique.

Figure 7 is an example of an optical microphone 700 configured to use optical coherence tomography (OCT), in accordance with one or more embodiments. OCT is a form of LCI. The optical microphone 700 includes a light source 710, a detector 720, a waveguide structure 730, and an optical antenna 740, all of which are part of a photonic integrated circuit. In some embodiments, the optical microphone 700 may include additional components. The optical microphone 700 is an embodiment of the optical microphone 145 in an OCT configuration. The optical microphone 700 may be, for example, embedded in a nose pad of a headset, coupled to an eyeglass frame of a headset (e.g., in a side-radiating position), or the like. In some embodiments, some or all of the components of the optical microphone 700 may be bonded to and/or formed on a substrate (e.g., the substrate 650).

The waveguide structure 730 is an optical waveguide that guides light to the various components of the optical microphone 700. For example, the waveguide structure 730 may, via multiple sections, couple together the light source 710 (e.g., a tunable laser source), one or more beam splitters, one or more combiners, one or more optical antennas 740, the detector 720, one or more optical amplifiers, or some combination thereof. The multiple sections include a transmit section 760, a reference section 770, a transmit/receive section 780, a sensing section 785, and a mixing section 790.

The waveguide structure also includes a beam splitter 792, a beam splitter 794, and a beam splitter 796, and may also include a laser amplifier. The beam splitter 794 splits off a portion of the light from the transmit section 760, which is in-coupled into a K-clock. The beam splitter 792 splits a portion (e.g., 50%) of the light from the transmit section 760 into the transmit/receive section 780 and splits the remainder (e.g., the remaining 50%) into the reference section 770. In some embodiments, some other portion of the optical power is split into the transmit/receive section 780 rather than the reference section 770. Note that the beam splitter 792 also splits a portion (e.g., 50%) of light traveling in the opposite direction from the transmit/receive section 780 (i.e., light in-coupled from the optical antenna 740) into the sensing section 785. The beam splitter 796 combines a first portion (e.g., 50%) of the light from the sensing section 785 and a first portion (e.g., 50%) of the light from the reference section 770 into a first waveguide of the mixing section 790, and combines the remainder of the light from both the sensing section 785 and the reference section 770 into a second channel of the mixing section 790. The optical microphone 700 may include one or more optical amplifiers that amplify the light. For example, an optical amplifier may be positioned to amplify the light before it is output by the optical antenna 740 and/or to amplify the light in-coupled by the optical antenna 740.

The light source 710 (e.g., a tunable laser source) emits light, which is coupled into the transmit section 760 of the waveguide structure 730. The beam splitter 794 splits off a portion of the light from the transmit section 760, which is then in-coupled into the K-clock. The K-clock synchronizes the light source 710 so that the output wavelength is swept linearly, producing equal wavenumber intervals at an analog-to-digital converter (not shown, though its function may be performed by the audio controller), which processes the mixed signal detected by the detector 720 to convert it into a digital signal. The remaining light is transmitted by the transmit section 760 to the beam splitter 792. The beam splitter 792 divides the emitted light into a reference beam and a sensing beam, provides the reference beam to the reference section 770, and provides the sensing beam to the transmit/receive section 780. The transmit/receive section 780 directs the sensing beam to the optical antenna 740, which out-couples the light into the local area (e.g., to illuminate the user's skin). Note that in some embodiments, an optical amplifier may be used to amplify the light before it is emitted by the optical antenna 740.
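The equal-wavenumber-interval requirement that the K-clock enforces in hardware can also be approximated in software by resampling the acquired interferogram onto a uniform wavenumber grid. A sketch, with an assumed slightly nonlinear sweep:

```python
import numpy as np

def resample_to_linear_k(signal, k_actual):
    """Resample a spectral interferogram acquired at non-uniform wavenumbers
    onto an equally spaced wavenumber grid (the role the K-clock plays in
    producing equal wavenumber intervals at the ADC). k_actual must be
    monotonically increasing for np.interp."""
    k_uniform = np.linspace(k_actual[0], k_actual[-1], len(k_actual))
    return k_uniform, np.interp(k_uniform, k_actual, signal)

# An assumed, slightly nonlinear sweep with a fringe from a single reflector:
x = np.linspace(0.0, 1.0, 1024)
k_nonlinear = 7.0e6 + 4.0e5 * (x + 0.05 * x**2) / 1.05  # rad/m
fringe = np.cos(2 * k_nonlinear * 1.0e-3)
k_lin, fringe_lin = resample_to_linear_k(fringe, k_nonlinear)
```

After resampling, the fringe is periodic in the sample index, so an FFT yields a sharp depth peak rather than a smeared one.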

A portion of the sensing beam is reflected and/or scattered by the user's skin and is in-coupled via the optical antenna 740 into the transmit/receive section 780. The transmit/receive section 780 directs the in-coupled light to the beam splitter 792. The beam splitter 792 splits a portion of this light into the sensing section 785. The sensing section 785 directs the light to the beam splitter 796. Note that in some embodiments, an optical amplifier may be used to amplify the light before it is passed to the beam splitter 796. The beam splitter 796 combines a first portion (e.g., 50%) of the light from the sensing section 785 and a first portion (e.g., 50%) of the light from the reference section 770 into a first waveguide of the mixing section 790, and combines the remainder of the light from both the sensing section 785 and the reference section 770 into a second channel of the mixing section 790. The detector 720 receives the mixed beams and detects the corresponding mixed signal via a pair of balanced photodetectors.

The audio controller processes the detected mixed signal to measure sound from the local area. In the OCT configuration, the detected interference pattern (represented as the mixed signal) is a function of wavelength/wavenumber and provides an axial profile of the skin along the beam axis, where a fringe frequency corresponds to a depth of the skin and the amplitude at that fringe frequency corresponds to the reflectivity of the skin. Note that there are various forms of OCT, and in other embodiments the optical microphone 700 may be configured to operate in one of these other forms (e.g., phase-sensitive OCT).
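The fringe-frequency-to-depth relation described above can be sketched numerically: a reflector at depth z imprints a fringe cos(2kz) across the wavenumber sweep, and an FFT over a linear-in-k axis localizes the depth. The sweep range and depth below are assumed for illustration:

```python
import numpy as np

N = 2048
k = np.linspace(7.0e6, 7.4e6, N)        # wavenumber sweep in rad/m (assumed)
z_true = 1.0e-3                         # reflector ("skin") depth: 1 mm
interferogram = np.cos(2 * k * z_true)  # unit-reflectivity fringe

dk = k[1] - k[0]
spectrum = np.abs(np.fft.rfft(interferogram))
peak_bin = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
z_est = peak_bin * np.pi / (N * dk)          # depth axis of the k-space FFT
```

The peak's bin index recovers the depth, and its magnitude is proportional to the reflectivity, matching the relation stated in the text.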

Figure 8 is a flowchart illustrating a process for using an optical contact transducer in an interferometric configuration, in accordance with one or more embodiments. The process shown in Figure 8 may be performed by components of an audio system (e.g., the audio system 200). In other embodiments, other entities may perform some or all of the steps in Figure 8. Embodiments may include different and/or additional steps, or perform the steps in a different order.

The audio system emits 810 light, including a reference beam and a sensing beam, from a light source of an optical microphone. The optical microphone is an embodiment of the optical microphone 145 and may be constructed as described above with reference to Figures 3, 4, 5A, 5B, and 5C. The optical microphone may be integrated into a headset. The emitted light is a continuous wave, and its drive current may be modulated (e.g., at 10 kHz). The optical microphone is positioned to monitor vibrations of the user's skin caused by sound in the local area of the audio system (e.g., the user's voice, other people, noise sources, etc.).

The audio system illuminates 820 the user's skin (e.g., one or more different or identical portions of the face) with the sensing beam. For example, the sensing beam may be refracted through an optical element of the optical microphone to illuminate the user's skin.

A portion of the sensing beam is scattered and/or reflected from the user's skin and mixes with the reference beam at the detector to form a mixed beam.

The audio system detects 830 a mixed signal (the detected mixed beam) via a detector that is in an interferometric configuration with the light source. The interferometric configuration is such that the light source and the detector form an interference system (e.g., a self-mixing interferometer, a Michelson interferometer, OCT, LDV, etc.).

The audio system measures 840 sound in the local area using the mixed signal. The detected mixed signal includes a dynamic high-frequency component and a modulation component. The dynamic high-frequency component changes (frequency shifts) as the distance between the illuminated portion of the skin and the detector changes. Note that the amplitude of the measured vibration may range from, for example, 50 nm (e.g., the user whispering) to 1.5 microns (e.g., the user shouting). Accordingly, vibrations induced in the skin by sound in the local area cause changes in the dynamic high-frequency component. The audio system measures 840 the sound by inferring, from the dynamic high-frequency component, the corresponding sound that caused the vibrations of the user's skin. Moreover, because skin vibrations caused by sound occur at very different frequencies from vibrations caused by, for example, the user's movement (e.g., walking, running), the audio system can isolate and/or filter out the portion of the dynamic high-frequency component that corresponds to sound from the local area.
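The frequency-based separation described above can be sketched as a simple spectral mask. This is a hypothetical illustration: the sample rate and band edges below are assumed, and the 50 nm / 1.5 micron figures from the text are used only as rough amplitude scales:

```python
import numpy as np

FS = 48_000  # sample rate of the demodulated vibration signal (assumed)

def isolate_speech_band(vibration, fs=FS, lo=80.0, hi=8000.0):
    """Zero out spectral content outside an assumed speech band, discarding
    the slow, large-amplitude motion components (walking, running)."""
    spec = np.fft.rfft(vibration)
    freqs = np.fft.rfftfreq(len(vibration), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(vibration))

t = np.arange(FS) / FS                          # 1 s of samples
motion = 1.5e-6 * np.sin(2 * np.pi * 2.0 * t)   # 2 Hz gait motion, 1.5 um
speech = 50e-9 * np.sin(2 * np.pi * 440.0 * t)  # 440 Hz tone, 50 nm
clean = isolate_speech_band(motion + speech)
# The 2 Hz motion is rejected even though it is 30x larger in amplitude;
# the 440 Hz speech-band component survives.
```

A real implementation would use a causal filter rather than a whole-signal FFT, but the separation principle is the same.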

In some embodiments, the measured sound includes the user's speech, and high-frequency components of the speech are attenuated relative to its low frequencies. The audio system may reconstruct the high-frequency components of the speech through, for example, matrix-factorization-based bandwidth extension, neural-network-based audio super-resolution, or the like. The audio system may then update the measured sound of the speech with the reconstructed high-frequency components.
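As a toy stand-in for the bandwidth-extension methods named above (neither matrix factorization nor a neural network is implemented here), a naive spectral-replication sketch illustrates the idea of refilling the attenuated high band:

```python
import numpy as np

def naive_bandwidth_extension(signal, fs, cutoff_hz):
    """Toy spectral replication: copy the top of the surviving low band into
    the attenuated band above cutoff_hz, at reduced level. A crude stand-in
    for matrix-factorization or neural bandwidth extension."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    cut = int(np.searchsorted(freqs, cutoff_hz))
    width = min(cut, len(spec) - cut)
    spec[cut:cut + width] = 0.3 * spec[cut - width:cut]  # attenuated copy
    return np.fft.irfft(spec, n=len(signal))

fs = 16_000
t = np.arange(fs) / fs
lowpassed_speech = np.sin(2 * np.pi * 1000 * t)  # stand-in: a 1 kHz tone
extended = naive_bandwidth_extension(lowpassed_speech, fs, 3000.0)
```

The replicated energy above the cutoff restores a rough sense of "brightness"; the learned methods named in the text predict the high band far more faithfully.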

The audio system may perform various actions based in part on the sound measured by the optical microphone. In some embodiments in which the audio system also includes a microphone array, the audio system may perform actions based in part on the sound measured by the optical microphone, the sound detected by the microphone array, or some combination thereof. The actions may include, for example, enhancing the user's speech, using one or more optical microphones for voice activity detection (VAD), performing active noise reduction, and so on.

Figure 9 is a system 900 that includes a headset 905, in accordance with one or more embodiments. In some embodiments, the headset 905 may be the headset 100 of Figure 1A or the headset 105 of Figure 1B. The system 900 may operate in an artificial-reality environment (e.g., a virtual-reality environment, an augmented-reality environment, a mixed-reality environment, or some combination thereof). The system 900 shown in Figure 9 includes the headset 905, an input/output (I/O) interface 910 coupled to a console 915, a network 920, and a mapping server 925. Although Figure 9 shows an example system 900 that includes one headset 905 and one I/O interface 910, in other embodiments any number of these components may be included in the system 900. For example, there may be multiple headsets, each having an associated I/O interface 910, with each headset and I/O interface 910 communicating with the console 915. In alternative configurations, the system 900 may include different and/or additional components. Additionally, in some embodiments, the functionality described in connection with one or more of the components shown in Figure 9 may be distributed among the components in a manner different from that described in connection with Figure 9. For example, some or all of the functionality of the console 915 may be provided by the headset 905.

The headset 905 includes a display assembly 930, an optics block 935, one or more position sensors 990, and a DCA 945. Some embodiments of the headset 905 have components different from those described in conjunction with Figure 9. Additionally, in other embodiments, the functionality provided by the various components described in conjunction with Figure 9 may be distributed differently among the components of the headset 905 or be captured in separate assemblies remote from the headset 905.

The display assembly 930 displays content to the user based on data received from the console 915. The display assembly 930 displays the content using one or more display elements (e.g., the display element 120). A display element may be, for example, an electronic display. In various embodiments, the display assembly 930 comprises a single display element or multiple display elements (e.g., one display for each eye of the user). Examples of an electronic display include: a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, an active-matrix organic light-emitting diode (AMOLED) display, a waveguide display, some other display, or some combination thereof. Note that in some embodiments, the display element 120 may also include some or all of the functionality of the optics block 935.

The optics block 935 may magnify image light received from the electronic display, correct optical errors associated with the image light, and present the corrected image light to one or both eyeboxes of the headset 905. In various embodiments, the optics block 935 includes one or more optical elements. Example optical elements included in the optics block 935 include: an aperture, a Fresnel lens, a convex lens, a concave lens, a filter, a reflecting surface, or any other suitable optical element that affects image light. Moreover, the optics block 935 may include combinations of different optical elements. In some embodiments, one or more of the optical elements in the optics block 935 may have one or more coatings, such as partially reflective or anti-reflective coatings.

Magnification and focusing of the image light by the optics block 935 allows the electronic display to be physically smaller, weigh less, and consume less power than larger displays. Additionally, magnification may increase the field of view of the content presented by the electronic display. For example, the field of view of the displayed content is such that the displayed content is presented using almost all (e.g., approximately 110 degrees diagonal), and in some cases all, of the user's field of view. Additionally, in some embodiments, the amount of magnification may be adjusted by adding or removing optical elements.

In some embodiments, the optics block 935 may be designed to correct one or more types of optical error. Examples of optical error include barrel or pincushion distortion, and longitudinal or lateral chromatic aberration. Other types of optical errors may further include spherical aberration, chromatic aberration, errors due to lens field curvature or astigmatism, or any other type of optical error. In some embodiments, content provided to the electronic display for display is pre-distorted, and the optics block 935 corrects the distortion when it receives image light generated from the content by the electronic display.

The position sensor 940 is an electronic device that generates data indicating a position of the headset 905. The position sensor 940 generates one or more measurement signals in response to motion of the headset 905. The position sensor 190 is an embodiment of the position sensor 940. Examples of a position sensor 940 include: one or more IMUs, one or more accelerometers, one or more gyroscopes, one or more magnetometers, another suitable type of sensor that detects motion, or some combination thereof. The position sensor 940 may include multiple accelerometers to measure translational motion (forward/back, up/down, left/right) and multiple gyroscopes to measure rotational motion (e.g., pitch, yaw, roll). In some embodiments, an IMU rapidly samples the measurement signals and calculates the estimated position of the headset 905 from the sampled data. For example, the IMU integrates the measurement signals received from the accelerometers over time to estimate a velocity vector, and integrates the velocity vector over time to determine an estimated position of a reference point on the headset 905. The reference point is a point that may be used to describe the position of the headset 905. While the reference point may generally be defined as a point in space, in practice the reference point is defined as a point within the headset 905.
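For illustration only (not part of the claimed subject matter), the double integration described above — acceleration to velocity, velocity to position — can be sketched as follows. The function name and the assumption of world-frame, gravity-compensated accelerometer samples are ours, not the disclosure's:

```python
import numpy as np

def dead_reckon(accels, dt, v0=None, p0=None):
    """Integrate sampled accelerometer readings (assumed world-frame,
    gravity removed) into an estimated velocity vector and an estimated
    position of the reference point."""
    v = np.zeros(3) if v0 is None else np.asarray(v0, dtype=float)
    p = np.zeros(3) if p0 is None else np.asarray(p0, dtype=float)
    for a in accels:
        v = v + np.asarray(a, dtype=float) * dt  # integrate acceleration -> velocity
        p = p + v * dt                           # integrate velocity -> position
    return v, p
```

In practice such dead reckoning drifts quickly, which is why the tracking module below also fuses depth information from the DCA.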

The DCA 945 generates depth information for a portion of the local area. The DCA includes one or more imaging devices and a DCA controller. The DCA 945 may also include an illuminator. Operation and structure of the DCA 945 is described above with regard to FIG. 1A.

The audio system 950 provides audio content to a user of the headset 905. The audio system 950 is substantially the same as the audio system 200 described above. The audio system 950 may comprise one or more acoustic sensors (e.g., as part of a sensor array), one or more transducers (e.g., as part of a transducer array), one or more optical microphones, and an audio controller. As described above with regard to, e.g., FIGS. 1-6, the output signals from the one or more optical microphones facilitate the audio system 950 performing well in environments with a low sound SNR. In some embodiments, the output signals from the one or more optical microphones may be used, e.g., to calibrate the sensor array, for active noise reduction, VAD, etc. The audio system 950 may provide spatialized audio content to the user. In some embodiments, the audio system 950 may request acoustic parameters from the mapping server 925 over the network 920. The acoustic parameters describe one or more acoustic properties of the local area (e.g., a room impulse response, a reverberation time, a reverberation level, etc.). The audio system 950 may provide information describing at least a portion of the local area from, e.g., the DCA 945, and/or location information for the headset 905 from the position sensor 940. The audio system 950 may generate one or more sound filters using one or more of the acoustic parameters received from the mapping server 925, and use the sound filters to provide the audio content to the user.
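As an illustration only (the disclosure does not specify how its sound filters are built), one way a single acoustic parameter such as reverberation time can be turned into a filter is to synthesize a toy room impulse response with exponentially decaying noise and convolve audio with it. The function names and constants here are assumptions for the sketch:

```python
import numpy as np

def make_room_filter(rt60, fs=16000, length_s=0.5, seed=0):
    """Build a toy room impulse response from one acoustic parameter:
    RT60, the time for reverberant energy to decay by 60 dB."""
    n = int(length_s * fs)
    t = np.arange(n) / fs
    # exp(-ln(1000) * t / rt60) reaches 10**-3 (i.e. -60 dB) at t = rt60
    envelope = np.exp(-np.log(1000.0) * t / rt60)
    noise = np.random.default_rng(seed).standard_normal(n)
    return envelope * noise

def apply_room_filter(signal, h):
    """Convolve dry audio with the impulse response, trimmed to length."""
    return np.convolve(signal, h)[: len(signal)]
```

A real implementation would instead use measured room impulse responses or parametric reverberators, but the sketch shows the data flow: acoustic parameter in, sound filter out, filter applied to the audio content.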

The I/O interface 910 is a device that allows a user to send action requests to the console 915 and receive responses from it. An action request is a request to perform a particular action. For example, an action request may be an instruction to start or end capture of image or video data, or an instruction to perform a particular action within an application. The I/O interface 910 may include one or more input devices. Example input devices include: a keyboard, a mouse, a game controller, or any other suitable device for receiving action requests and communicating the action requests to the console 915. An action request received by the I/O interface 910 is communicated to the console 915, which performs an action corresponding to the action request. In some embodiments, the I/O interface 910 includes an IMU that captures calibration data indicating an estimated position of the I/O interface 910 relative to an initial position of the I/O interface 910. In some embodiments, the I/O interface 910 may provide haptic feedback to the user in accordance with instructions received from the console 915. For example, haptic feedback is provided when an action request is received, or the console 915 communicates instructions to the I/O interface 910 causing the I/O interface 910 to generate haptic feedback when the console 915 performs an action.

The console 915 provides content to the headset 905 for processing in accordance with information received from one or more of: the DCA 945, the headset 905, and the I/O interface 910. In the example shown in FIG. 9, the console 915 includes an application store 955, a tracking module 960, and an engine 965. Some embodiments of the console 915 have different modules or components than those described in conjunction with FIG. 9. Similarly, the functions further described below may be distributed among components of the console 915 in a different manner than described in conjunction with FIG. 9. In some embodiments, the functionality discussed herein with respect to the console 915 may be implemented in the headset 905, or a remote system.

The application store 955 stores one or more applications for execution by the console 915. An application is a group of instructions that, when executed by a processor, generates content for presentation to the user. Content generated by an application may be in response to inputs received from the user via movement of the headset 905 or the I/O interface 910. Examples of applications include: gaming applications, conferencing applications, video playback applications, or other suitable applications.

The tracking module 960 tracks movements of the headset 905 or of the I/O interface 910 using information from the DCA 945, the one or more position sensors 940, or some combination thereof. For example, the tracking module 960 determines a position of a reference point of the headset 905 in a mapping of the local area based on information from the headset 905. The tracking module 960 may also determine positions of an object or a virtual object. Additionally, in some embodiments, the tracking module 960 may use portions of the data indicating a position of the headset 905 from the position sensor 940, as well as representations of the local area from the DCA 945, to predict a future location of the headset 905. The tracking module 960 provides the estimated or predicted future position of the headset 905 or the I/O interface 910 to the engine 965.
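Purely as an illustrative sketch (the disclosure does not specify a prediction model), the simplest form of the future-location prediction mentioned above is a constant-acceleration extrapolation from the current position, velocity, and acceleration estimates:

```python
import numpy as np

def predict_future_position(p, v, a, dt):
    """Constant-acceleration extrapolation of a headset's position:
    p(t + dt) = p + v*dt + 0.5*a*dt**2."""
    p, v, a = (np.asarray(x, dtype=float) for x in (p, v, a))
    return p + v * dt + 0.5 * a * dt ** 2
```

A production tracker would typically fuse this kinematic prediction with DCA observations, e.g., in a Kalman filter.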

The engine 965 executes applications, and receives position information, acceleration information, velocity information, predicted future positions, or some combination thereof, of the headset 905 from the tracking module 960. Based on the received information, the engine 965 determines content to provide to the headset 905 for presentation to the user. For example, if the received information indicates that the user has looked to the left, the engine 965 generates content for the headset 905 that mirrors the user's movement in a virtual local area, or in a local area augmented with additional content. Additionally, the engine 965 performs an action within an application executing on the console 915 in response to an action request received from the I/O interface 910, and provides feedback to the user that the action was performed. The provided feedback may be visual or audible feedback via the headset 905, or haptic feedback via the I/O interface 910.

The network 920 couples the headset 905 and/or the console 915 to the mapping server 925. The network 920 may include any combination of local area and/or wide area networks using both wireless and/or wired communication systems. For example, the network 920 may include the Internet, as well as mobile telephone networks. In one embodiment, the network 920 uses standard communications technologies and/or protocols. Hence, the network 920 may include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 2G/3G/4G mobile communications protocols, digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc.
Similarly, the networking protocols used on the network 920 can include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc. The data exchanged over the network 920 can be represented using technologies and/or formats including image data in binary form (e.g., Portable Network Graphics (PNG)), hypertext markup language (HTML), extensible markup language (XML), etc. In addition, all or some of the links can be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), Internet Protocol security (IPsec), etc.

The mapping server 925 may include a database that stores a virtual model describing a plurality of spaces, wherein one location in the virtual model corresponds to a current configuration of a local area of the headset 905. The mapping server 925 receives, from the headset 905 via the network 920, information describing at least a portion of the local area and/or location information for the local area. The user may adjust privacy settings to allow or prevent the headset 905 from transmitting information to the mapping server 925. The mapping server 925 determines, based on the received information and/or the location information, a location in the virtual model that is associated with the local area of the headset 905. The mapping server 925 determines (e.g., retrieves) one or more acoustic parameters associated with the local area, based in part on the determined location in the virtual model and any acoustic parameters associated with the determined location. The mapping server 925 may transmit the location of the local area and any values of acoustic parameters associated with the local area to the headset 905.

One or more components of the system 900 may contain a privacy module that stores one or more privacy settings for user data elements. The user data elements describe the user or the headset 905. For example, the user data elements may describe a physical characteristic of the user, an action performed by the user, a location of the user of the headset 905, a location of the headset 905, an HRTF for the user, etc. Privacy settings (or "access settings") for a user data element may be stored in any suitable manner, such as, for example, in association with the user data element, in an index on an authorization server, in another suitable manner, or any suitable combination thereof.

A privacy setting for a user data element specifies how the user data element (or particular information associated with the user data element) can be accessed, stored, or otherwise used (e.g., viewed, shared, modified, copied, executed, surfaced, or identified). In some embodiments, the privacy settings for a user data element may specify a "blocked list" of entities that may not access certain information associated with the user data element. The privacy settings associated with the user data element may specify any suitable granularity of permitted access or denial of access. For example, some entities may have permission to see that a specific user data element exists, some entities may have permission to view the content of the specific user data element, and some entities may have permission to modify the specific user data element. The privacy settings may allow the user to permit other entities to access or store user data elements for a limited period of time.

The privacy settings may allow a user to specify one or more geographic locations from which user data elements can be accessed. Access or denial of access to a user data element may depend on the geographic location of the entity that is attempting to access it. For example, the user may allow access to a user data element and specify that the user data element is accessible to an entity only while the user is in a particular location. If the user leaves the particular location, the user data element may no longer be accessible to the entity. As another example, the user may specify that a user data element is accessible only to entities within a threshold distance from the user, such as another user of a headset within the same local area as the user. If the user subsequently changes location, an entity with access to the user data element may lose access, while a new group of entities may gain access as they come within the threshold distance of the user.
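For illustration only, the distance-threshold rule above reduces to a simple geometric check. The function name and the 2-D position representation are assumptions made for this sketch, not details from the disclosure:

```python
import math

def entity_may_access(user_pos, entity_pos, threshold_m):
    """Grant access to a user data element only while the requesting
    entity is within a threshold distance of the user.

    Positions are (x, y) coordinates in metres; a real system would
    re-evaluate this check whenever either party moves."""
    dx = user_pos[0] - entity_pos[0]
    dy = user_pos[1] - entity_pos[1]
    return math.hypot(dx, dy) <= threshold_m
```

Re-running the check on every location update is what produces the behavior described above: entities drop out of the access set as they leave the radius and join it as they enter.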

The system 900 may include one or more authorization/privacy servers for enforcing the privacy settings. A request from an entity for a particular user data element may identify the entity associated with the request, and the user data element may be sent to the entity only if the authorization server determines, based on the privacy settings associated with the user data element, that the entity is authorized to access it. If the requesting entity is not authorized to access the user data element, the authorization server may prevent the requested user data element from being retrieved, or may prevent it from being sent to the entity. Although this disclosure describes enforcing privacy settings in a particular manner, this disclosure contemplates enforcing privacy settings in any suitable manner.

Additional Configuration Information

The foregoing description of the embodiments has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the patent rights to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.

Some portions of this description describe the embodiments in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times, without loss of generality, to refer to these arrangements of operations as modules. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combination thereof.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.

Embodiments may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor, or may be architectures employing multiple processor designs for increased computing capability.

Embodiments may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the patent rights. It is therefore intended that the scope of the patent rights be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the patent rights, which is set forth in the appended claims.

Claims (20)

1.一种音频系统,包括:1. An audio system, comprising: 光学传声器,所述光学传声器包括:Optical microphone, the optical microphone includes: 光源,所述光源被配置为发射光,所述光包括参考光束和感测光束,所述光源被配置为用所述感测光束照射用户的皮肤,其中,来自局部区域的声音在所述皮肤中引起振动,a light source, the light source is configured to emit light, the light includes a reference beam and a sensing beam, the light source is configured to illuminate the skin of the user with the sensing beam, wherein the sound from the local area is heard on the skin causing vibration in 检测器,所述检测器与所述光源处于干涉配置,使得所述检测器被配置为对混合信号进行检测,所述混合信号对应于与所述感测光束的被所述皮肤反射的部分混合的所述参考光束;以及a detector in an interference configuration with the light source such that the detector is configured to detect a mixed signal corresponding to mixing with a portion of the sensing beam reflected by the skin the reference beam; and 音频控制器,所述音频控制器被配置为使用所述混合信号对所述声音进行测量。An audio controller configured to measure the sound using the mixed signal. 2.根据权利要求1所述的音频系统,其中,所述干涉配置使得所述光源和所述检测器形成以下中的至少一个:自混合干涉仪、迈克尔逊干涉仪、低相干干涉系统、激光多普勒测振仪或某种其它类型的干涉系统。2. The audio system of claim 1, wherein the interference configuration is such that the light source and the detector form at least one of: a self-mixing interferometer, a Michelson interferometer, a low-coherence interference system, a laser Doppler vibrometer or some other type of interferometric system. 3.根据权利要求1所述的音频系统,其中,所述光源与所述检测器邻接,所述光学传声器还包括:3. The audio system of claim 1, wherein the light source is adjacent the detector, the optical microphone further comprising: 透镜,所述透镜耦接到所述光源,所述透镜被配置为:a lens coupled to the light source, the lens configured to: 将从所述光源发射的所述光分成所述参考光束和所述感测光束;以及splitting the light emitted from the light source into the reference beam and the sensing beam; and 将所述感测光束引导至所述皮肤,并且将所述参考光束反射到所述检测器。The sensing beam is directed to the skin and the reference beam is reflected to the detector. 4.根据权利要求1所述的音频系统,还包括:4. 
The audio system of claim 1, further comprising: 第二光学传声器,包括:Second optical microphone, including: 第二光源,所述第二光源被配置为发射光,该光包括第二参考光束和第二感测光束,所述光源被配置为用所述感测光束照射所述用户的所述皮肤,a second light source configured to emit light including a second reference beam and a second sensing beam, the light source configured to illuminate the skin of the user with the sensing beam, 第二检测器,所述第二检测器与所述第二光源处于所述干涉配置,使得所述第二检测器被配置为对第二混合信号进行检测,所述第二混合信号对应于与所述第二感测光束的被所述皮肤反射的部分混合的所述第二参考光束;a second detector, the second detector and the second light source being in the interference configuration such that the second detector is configured to detect a second mixed signal, the second mixed signal corresponding to the second reference beam mixed with the portion of the second sensing beam reflected by the skin; 块,所述块包括第一侧和第二侧,所述第一侧耦接到所述光学传声器,并且所述第二侧耦接到所述第二光学传声器;以及a block including a first side coupled to the optical microphone and a second side coupled to the second optical microphone; and 透镜,所述透镜耦接到所述光源、所述块和所述第二光源,所述透镜被配置为:a lens coupled to the light source, the block, and the second light source, the lens being configured to: 将从所述光源发射的光分成所述参考光束和所述感测光束,并且将从所述第二光源发射的光分成所述第二参考光束和所述第二感测光束,splitting the light emitted from the light source into the reference beam and the sensing beam, and splitting the light emitted from the second light source into the second reference beam and the second sensing beam, 将所述感测光束和所述第二感测光束引导至所述皮肤,directing the sensing beam and the second sensing beam to the skin, 将所述参考光束反射到所述检测器,以及reflect the reference beam to the detector, and 将所述第二参考光束反射到所述第二检测器。The second reference beam is reflected to the second detector. 5.根据权利要求1所述的音频系统,其中,所述检测器和所述光源彼此分离开阈值距离。5. The audio system of claim 1, wherein the detector and the light source are separated from each other by a threshold distance. 6.根据权利要求1所述的音频系统,其中,所述光学传声器位于包括鼻托的头戴式视图器上,并且所述光学传声器集成到所述鼻托中,并且所述光源被配置为用所述感测光束照射所述用户的鼻子的所述皮肤。6. 
The audio system of claim 1, wherein the optical microphone is located on a headset including a nose pad, the optical microphone is integrated into the nose pad, and the light source is configured to The skin of the user's nose is illuminated with the sensing beam. 7.根据权利要求1所述的音频系统,其中,所述光学传声器位于包括眼镜框的头戴式视图器上,并且所述光学传声器集成到所述眼镜框中,并且所述光源被配置为用所述感测光束照射所述用户的面部上的所述皮肤,并且可选地,所述音频系统还包括第二光学传声器,所述第二光学传声器集成到所述眼镜框上的与所述第一光学传声器不同的位置,所述第二光学传声器被配置为用第二感测光束照射所述用户的所述面部上的所述皮肤的与所述光学传声器不同的部分。7. The audio system of claim 1, wherein the optical microphone is located on a headset including an eyeglass frame, and the optical microphone is integrated into the eyeglass frame, and the light source is configured to The skin on the user's face is irradiated with the sensing beam, and optionally, the audio system further includes a second optical microphone integrated into the glasses frame in connection with the The first optical microphone is in a different position, and the second optical microphone is configured to illuminate a portion of the skin on the face of the user with a second sensing beam that is different from the optical microphone. 8.根据权利要求1所述的音频系统,其中,所述光学传声器位于头戴式视图器上,所述音频系统还包括:8. The audio system of claim 1, wherein the optical microphone is located on a headset, the audio system further comprising: 传声器阵列,所述传声器阵列位于所述头戴式视图器上,所述传声器阵列被配置为对来自所述局部区域的所述声音进行检测;a microphone array located on the headset, the microphone array configured to detect the sound from the local area; 其中,所述音频控制器还被配置为使用检测到的声音来校准所述光学传声器。Wherein, the audio controller is further configured to use the detected sound to calibrate the optical microphone. 9.根据权利要求1所述的音频系统,其中,所述光学传声器位于头戴式视图器上,所述音频系统还包括:9. 
The audio system of claim 1, wherein the optical microphone is located on a headset, the audio system further comprising: 传声器阵列,所述传声器阵列位于所述头戴式视图器上,所述传声器阵列被配置为对来自所述局部区域的所述声音进行检测;a microphone array located on the headset, the microphone array configured to detect the sound from the local area; 其中,所述音频控制器还被配置为部分地基于检测到的声音来增强所测量的声音。wherein the audio controller is further configured to enhance the measured sound based in part on the detected sound. 10.根据权利要求1所述的音频系统,其中,所述音频控制器还被配置为部分地基于所测量的声音来确定所述用户的面部的表情。10. The audio system of claim 1, wherein the audio controller is further configured to determine an expression of the user's face based in part on the measured sound. 11.根据权利要求1所述的音频系统,其中,所述光学传声器位于头戴式视图器上,并且所述音频控制器还被配置为:11. The audio system of claim 1, wherein the optical microphone is located on a headset, and the audio controller is further configured to: 识别所测量的声音中的噪声;identify noise in measured sounds; 生成声音滤波器,以抑制所识别的噪声,以及Generate sound filters to suppress the identified noise, and 应用所述声音滤波器来修改对应于音频内容的音频信号,applying said sound filter to modify an audio signal corresponding to audio content, 其中,所述音频系统还包括:Wherein, the audio system also includes: 换能器阵列,所述换能器阵列集成到所述头戴式视图器中,所述换能器阵列被配置为将修改后的音频信号作为修改后的音频内容呈现给所述用户,所述修改后的音频内容包括所述音频内容和抑制所述噪声的抑制分量。a transducer array integrated into the headset, the transducer array configured to present a modified audio signal to the user as modified audio content, The modified audio content includes the audio content and a suppression component that suppresses the noise. 12.根据权利要求1所述的音频系统,其中,所述光学传声器位于头戴式视图器上,所述音频系统还包括:12. 
The audio system of claim 1, wherein the optical microphone is located on a headset, the audio system further comprising:
a microphone array located on the headset, the microphone array configured to detect the sound from the local area, the sound from the local area including speech of a user of the audio system;
wherein the audio controller is further configured to:
identify the user's speech in the detected sound using the measured sound; and
update a sound filter based on the identified speech of the user;
wherein audio content is modified using the updated sound filter, and the modified audio content is presented by at least one audio system.

13. The audio system of claim 12, wherein the updated sound filter enhances the user's speech, and the audio controller is further configured to:
modify the audio content with the updated filter, wherein the modified audio content enhances the user's speech; and
provide the modified audio content to a second audio system, wherein the second audio system presents the modified audio content.

14. The audio system of claim 12, wherein the updated sound filter enhances the user's speech, and the audio controller is further configured to:
modify the audio content with the updated filter, wherein the modified audio content enhances the user's speech;
determine that the modified audio content includes a command; and
perform an action according to the command.

15.
A method comprising:
emitting light from a light source of an optical microphone, the light including a reference beam and a sensing beam;
illuminating skin of a user with the sensing beam, wherein sound from a local area induces vibrations in the skin;
detecting, by a detector in an interference configuration with the light source, a mixed signal corresponding to the reference beam mixed with a portion of the sensing beam reflected by the skin; and
measuring the sound using the mixed signal.

16. The method of claim 15, wherein the vibrations of the skin are caused in part by speech of the user, the method further comprising:
detecting the sound from the local area by a microphone array;
identifying the speech of the user in the detected sound using the measured sound; and
updating a sound filter based on the identified speech of the user;
wherein audio content is modified using the updated sound filter, and the modified audio content is presented by at least one audio system.

17. The method of claim 15, wherein the interference configuration is such that the light source and the detector form at least one of: a self-mixing interferometer, a Michelson interferometer, a low-coherence interferometric system, a laser Doppler vibrometer, or some other type of interferometric system.

18.
The method of claim 15, wherein the measured sound includes speech of the user, and high-frequency components of the speech are attenuated relative to low frequencies of the speech, the method further comprising:
reconstructing the high-frequency components of the speech; and
updating the measured sound of the speech with the reconstructed high-frequency components.

19. A non-transitory computer-readable medium configured to store program code instructions that, when executed by a processor of an audio system, cause the audio system to perform the method of any one of claims 15 to 18, or steps comprising:
emitting light from a light source of an optical microphone, the light including a reference beam and a sensing beam;
illuminating skin of a user with the sensing beam, wherein sound from a local area induces vibrations in the skin;
detecting, by a detector in an interference configuration with the light source, a mixed signal corresponding to the reference beam mixed with a portion of the sensing beam reflected by the skin; and
measuring the sound using the mixed signal.

20.
A computer program comprising instructions which, when executed by a processor of an audio system, cause the audio system to perform the method of any one of claims 15 to 18, or steps comprising:
emitting light from a light source of an optical microphone, the light including a reference beam and a sensing beam;
illuminating skin of a user with the sensing beam, wherein sound from a local area induces vibrations in the skin;
detecting, by a detector in an interference configuration with the light source, a mixed signal corresponding to the reference beam mixed with a portion of the sensing beam reflected by the skin; and
measuring the sound using the mixed signal.
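The interferometric readout recited in claims 15 and 17 can be illustrated with a minimal homodyne model: the sensing beam reflected off vibrating skin mixes with the reference beam, and the detector intensity encodes the skin displacement. The sketch below is not the patent's implementation; the wavelength, fringe visibility, and quadrature bias are illustrative assumptions.

```python
import numpy as np

WAVELENGTH = 850e-9   # assumed near-infrared light source, metres
FS = 48_000           # audio sample rate, Hz

def detector_intensity(displacement, i0=1.0, visibility=0.8):
    """Mixed-signal intensity at a detector biased at quadrature."""
    phase = 4 * np.pi * displacement / WAVELENGTH   # round-trip optical phase
    return i0 * (1.0 + visibility * np.sin(phase))

def demodulate(intensity, i0=1.0, visibility=0.8):
    """Recover displacement from intensity (valid for sub-wavelength motion)."""
    phase = np.arcsin(np.clip(intensity / i0 - 1.0, -visibility, visibility) / visibility)
    return phase * WAVELENGTH / (4 * np.pi)

# A 200 Hz, 5 nm skin vibration survives the round trip through the detector:
t = np.arange(FS) / FS
skin = 5e-9 * np.sin(2 * np.pi * 200 * t)
recovered = demodulate(detector_intensity(skin))
```

In this small-displacement regime the arcsine inverts the sine exactly, so the recovered waveform matches the simulated skin vibration to numerical precision; larger-than-wavelength motion would require phase unwrapping, which this sketch omits.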
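Claim 8's calibration of the optical microphone against an acoustic microphone array can be sketched as a per-frequency gain estimate that maps the optical spectrum onto the acoustic reference. The frame length and the cross-spectrum averaging below are assumptions for illustration, not the patent's procedure.

```python
import numpy as np

def estimate_calibration(optical, acoustic, frame=1024, eps=1e-12):
    """Estimate H(f) such that H(f) * Optical(f) approximates Acoustic(f),
    averaging cross- and auto-spectra over non-overlapping frames."""
    num = np.zeros(frame // 2 + 1, dtype=complex)   # cross-spectrum accumulator
    den = np.zeros(frame // 2 + 1)                  # optical auto-spectrum accumulator
    n = min(len(optical), len(acoustic))
    for start in range(0, n - frame + 1, frame):
        o = np.fft.rfft(optical[start:start + frame])
        a = np.fft.rfft(acoustic[start:start + frame])
        num += a * np.conj(o)
        den += np.abs(o) ** 2
    return num / (den + eps)

# Sanity check: if the optical channel is the acoustic signal at half
# amplitude, the estimated gain is 2 at every frequency.
rng = np.random.default_rng(0)
acoustic = rng.standard_normal(4096)
H = estimate_calibration(0.5 * acoustic, acoustic)
```

The resulting H(f) would be applied to the optical microphone's spectrum before further processing; in practice the averaging would run over many frames so that the estimate is robust to uncorrelated noise in either channel.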
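Claim 11's step of generating a sound filter to suppress identified noise admits many realizations; one classical stand-in is spectral subtraction, sketched below. The frame length and zero-flooring are illustrative assumptions, not necessarily the filter the patent contemplates.

```python
import numpy as np

def spectral_subtraction(noisy, noise_estimate, frame=512):
    """Subtract the magnitude spectrum of an identified noise frame from each
    frame of the noisy signal, keeping the noisy phase."""
    noise_mag = np.abs(np.fft.rfft(noise_estimate[:frame]))
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame + 1, frame):
        spec = np.fft.rfft(noisy[start:start + frame])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)   # floor at zero
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame)
    return out

# A frame-periodic noise signal is cancelled almost exactly:
rng = np.random.default_rng(1)
noise_frame = rng.standard_normal(512)
residual = spectral_subtraction(np.tile(noise_frame, 4), noise_frame)
```

A production system would add overlapping windows and an over-subtraction factor to control musical noise; the suppression component presented by the transducer array in claim 11 could equally be an active-cancellation signal rather than a subtractive filter.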
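Claim 18 reconstructs high-frequency speech components that skin-vibration pickup attenuates. A deliberately naive illustration copies part of the low band up into the empty band above the cutoff; the 0.3 scaling and the plain spectral copy are assumptions for the sketch, and real bandwidth extension (and presumably the patent's) is considerably more sophisticated.

```python
import numpy as np

def reconstruct_high_band(lowpass_audio, fs=16_000, cutoff=4_000):
    """Fill the band above `cutoff` with a scaled copy of the band below it."""
    n = len(lowpass_audio)
    spec = np.fft.rfft(lowpass_audio)
    k_cut = int(cutoff * n / fs)            # bin index of the cutoff frequency
    width = min(k_cut, len(spec) - k_cut)   # bins available above the cutoff
    spec[k_cut:k_cut + width] = 0.3 * spec[k_cut - width:k_cut]
    return np.fft.irfft(spec, n)

# A 1 kHz tone gains a quieter copied component at 5 kHz:
fs = 16_000
t = np.arange(fs) / fs
extended = reconstruct_high_band(np.sin(2 * np.pi * 1_000 * t), fs=fs)
```

Because each low-band bin lands exactly `k_cut` bins higher, harmonics of voiced speech remain roughly harmonically placed, which is why simple spectral copying was an early bandwidth-extension technique before learned approaches.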
CN202180094032.0A 2020-12-17 2021-12-16 Audio system using optical microphone Pending CN116965060A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/126,669 2020-12-17
US17/525,155 2021-11-12
US17/525,155 US20220201403A1 (en) 2020-12-17 2021-11-12 Audio system that uses an optical microphone
PCT/US2021/063805 WO2022133086A1 (en) 2020-12-17 2021-12-16 Audio system that uses an optical microphone

Publications (1)

Publication Number Publication Date
CN116965060A true CN116965060A (en) 2023-10-27

Family

ID=88447754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180094032.0A Pending CN116965060A (en) 2020-12-17 2021-12-16 Audio system using optical microphone

Country Status (1)

Country Link
CN (1) CN116965060A (en)

Similar Documents

Publication Publication Date Title
US20220201403A1 (en) Audio system that uses an optical microphone
CN114223215B (en) Dynamic customization of head-related transfer functions for rendering audio content
CN114208208B (en) Wearer identification based on personalized acoustic transfer function
US12008700B1 (en) Spatial audio and avatar control at headset using audio signals
CN116134838A (en) Audio system using personalized sound profile
US10971130B1 (en) Sound level reduction and amplification
US11638110B1 (en) Determination of composite acoustic parameter value for presentation of audio content
US12094487B2 (en) Audio system for spatializing virtual sound sources
US20240211563A1 (en) User authentication using combination of vocalization and skin vibration
TW202310618A (en) Eye-tracking using embedded electrodes in a wearable device
KR20230041755A (en) Virtual microphone calibration based on displacement of the outer ear
EP4432277B1 (en) Modifying audio data associated with a speaking user based on a field of view of a listening user in an artificial reality environment
EP4264960A1 (en) Audio system that uses an optical microphone
EP4432053A1 (en) Modifying a sound in a user environment in response to determining a shift in user attention
US11012804B1 (en) Controlling spatial signal enhancement filter length based on direct-to-reverberant ratio estimation
CN118803334A (en) Synchronize the avatar's video with locally captured audio from the user corresponding to that avatar
CN118711605A (en) Self-speech Suppression in Wearable Devices
CN116965060A (en) Audio system using optical microphone
US20220180885A1 (en) Audio system including for near field and far field enhancement that uses a contact transducer
US11871198B1 (en) Social network based voice enhancement system
US12284499B1 (en) Augmented hearing via adaptive self-reinforcement
US11598962B1 (en) Estimation of acoustic parameters for audio system based on stored information about acoustic model
US12039991B1 (en) Distributed speech enhancement using generalized eigenvalue decomposition
US12108241B1 (en) Adjusting generation of spatial audio for a receiving device to compensate for latency in a communication channel between the receiving device and a sending device
US20240331677A1 (en) Active noise cancellation using remote sensing for open-ear headset

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination