HK40029546B - System and method for creating crosstalk canceled zones in audio playback - Google Patents
- Publication number
- HK40029546B
- Authority
- HK
- Hong Kong
- Prior art keywords
- listener
- crosstalk
- sound waves
- audio playback
- transducers
Description
Cross-reference to related applications
    This application claims priority from U.S. Provisional Patent Application No. 62/571,234, filed on October 11, 2017, the entire contents of which are incorporated herein by reference.
    Technical Field
      The present invention relates to the field of realistic three-dimensional sound reproduction and, more particularly, to a crosstalk cancellation (XTC) method and system.
    Background
      An ordinary person can hear and distinguish sounds coming from various directions and distances because the sound waves arriving at the left and right ears differ in arrival time (interaural time differences, ITDs) and in level (interaural level differences, ILDs, also called interaural intensity differences). The brain uses these auditory cues to perceive sound in three-dimensional (3D) space and to identify where a sound originates.
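To make the ITD cue concrete, the following minimal sketch uses Woodworth's rigid-sphere approximation, a well-known textbook model that is not part of this disclosure. The head radius and speed of sound are assumed typical values, not figures from the source.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly used average head radius (assumed)


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds for a distant source, using
    Woodworth's rigid-sphere approximation ITD = (a / c) * (theta + sin(theta)).

    Valid for azimuths between -90 and +90 degrees (0 = straight ahead,
    positive = toward the right ear, giving a positive ITD).
    """
    theta = math.radians(azimuth_deg)
    # Extra path to the far ear: an arc around the sphere (a * theta)
    # plus a straight segment (a * sin(theta)).
    return (radius / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg -> ITD {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

The model ignores frequency dependence and head shape, so it only indicates the order of magnitude (well under a millisecond) of the timing cue the brain relies on.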
      Based on this principle, binaural recording (also referred to as "dummy-head recording") embeds 3D audio cues in a recording by using two microphones arranged like the left and right ears of an average person, in order to create a 3D audio experience for the listener during playback. The problem is that ordinary stereo transducers are typically used to play back or reproduce such 3D recordings. Even when the recorded left- and right-channel signals are played back from the left- and right-channel transducers, respectively, there is no guarantee that the sound wave corresponding to the left-channel signal reaches only the listener's left ear, and vice versa. Each channel also reaches the opposite ear; this leakage is called crosstalk. Because of it, the time-delay and level-difference information of the original recording is not faithfully reproduced at the listener's ears, and the listener does not experience the 3D audio effect. Fig. 1 illustrates this crosstalk phenomenon.
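The crosstalk geometry can be quantified with a minimal free-field sketch. It assumes a hypothetical listener at the origin with loudspeakers 2 m away at plus and minus 30 degrees, treats each loudspeaker as a point source with 1/r spreading, and ignores head shadowing and room reflections. Under these assumptions the crosstalk path is only a fraction of a millisecond later and well under 1 dB quieter than the direct path, which is why it corrupts the recorded cues.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def crosstalk_paths(speakers, ears, c=SPEED_OF_SOUND):
    """Return {(speaker, ear): (delay_s, gain)} for every speaker/ear pair.

    Free-field point-source model: delay = r / c and gain = 1 / r.
    Head shadowing and room reflections are ignored (a deliberate simplification).
    """
    paths = {}
    for s_name, s_pos in speakers.items():
        for e_name, e_pos in ears.items():
            r = math.dist(s_pos, e_pos)
            paths[(s_name, e_name)] = (r / c, 1.0 / r)
    return paths


# Hypothetical layout in metres: listener at the origin facing +y, ears on the
# x-axis, loudspeakers 2 m away at +/-30 degrees (a common stereo triangle).
angle = math.radians(30)
ears = {"L": (-0.0875, 0.0), "R": (0.0875, 0.0)}
speakers = {"L": (-2.0 * math.sin(angle), 2.0 * math.cos(angle)),
            "R": (2.0 * math.sin(angle), 2.0 * math.cos(angle))}

paths = crosstalk_paths(speakers, ears)
direct_delay, direct_gain = paths[("L", "L")]
cross_delay, cross_gain = paths[("L", "R")]
print(f"crosstalk arrives {(cross_delay - direct_delay) * 1e6:.0f} us later, "
      f"{20 * math.log10(direct_gain / cross_gain):.2f} dB quieter")
```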
      Several prior-art proposals aim to eliminate this crosstalk so that an uncorrupted 3D audio experience can be reproduced for the listener. Binaural recordings can be reproduced either over loudspeakers (binaural audio over loudspeakers, BAL), where the crosstalk must be cancelled, or over headphones (binaural audio over headphones, BAH). Most BAL techniques achieve XTC by manipulating the time-domain and/or spectral content of the input audio signal, in effect creating an XTC filter. The spectral manipulation is accomplished by adjusting the parameters of the XTC filter to match the response of the sound reproduction system, which comprises the pair of transducers, the room in which reproduction takes place, the listener's location in the room and, in some cases, even the size and shape of the listener's head. In some implementations, this adjustment is made automatically by first measuring the response of the reproduction system; the inverse of that response is then convolved with the audio signal fed to the transducers to cancel the system response. Fig. 2 provides a simplified diagram of the operation of the XTC filter in a sound reproduction system.
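The inversion step can be sketched for a single frequency bin. This minimal sketch assumes a hypothetical symmetric plant in which each speaker-to-ear path is just a gain and a pure delay, and it uses Tikhonov-regularized inversion, a common choice in the XTC literature, rather than any specific prior-art filter. The matrix H has rows for ears and columns for loudspeakers; the filter C is chosen so that H times C is close to the identity, meaning each ear hears only its own channel.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def hermitian(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def xtc_filter(h, beta=1e-3):
    """Regularized inverse C = (H^H H + beta I)^-1 H^H of the plant H at one
    frequency. beta limits the gain boost where H is nearly singular."""
    hh = hermitian(h)
    a = matmul2(hh, h)
    a[0][0] += beta
    a[1][1] += beta
    return matmul2(inv2(a), hh)


def plant(freq_hz, ipsi=(1.0, 0.0), contra=(0.9, 2.5e-4)):
    """Hypothetical symmetric plant: each path is (gain, delay in seconds)."""
    w = 2 * math.pi * freq_hz
    d = ipsi[0] * cmath.exp(-1j * w * ipsi[1])
    x = contra[0] * cmath.exp(-1j * w * contra[1])
    return [[d, x], [x, d]]


H = plant(1000.0)
C = xtc_filter(H, beta=1e-6)
HC = matmul2(H, C)  # close to the identity when beta is small
for row in HC:
    print([f"{abs(v):.4f}" for v in row])
```

In practice the filter is computed for every frequency bin and converted back to an impulse response for convolution; larger beta values trade cancellation depth for robustness.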
      The biggest challenge for BAL is the influence of the listening room: both early reflections and later reflections reduce the degree of crosstalk cancellation that XTC algorithms can achieve in practice. One can try to mitigate reflections by treating the room with broadband absorbers or by using loudspeakers with a narrow dispersion pattern (output falling off significantly off-axis), but in many real-life installations neither solution is practical. There is also the problem of having only one sweet spot. Although XTC can be combined with listener head tracking, it still essentially has a single sweet spot and the listener has little freedom of movement. Multiple XTC sweet spots can be achieved with phased-array or beamforming techniques, but the design becomes very complex and the implementation cost very high. Such a system may provide a few sweet spots, but it is not feasible in an environment such as a movie theater.
      The BAH technique convolves the audio signal with a generic or individualized head-related transfer function (HRTF) so that the brain perceives the sound in 3D. However, the 3D experience produced by BAH is still not as convincing as that produced by BAL, and visual cues are often needed to help the brain accept the sound as realistic 3D sound. Compared with BAL, BAH consistently lacks the "physical" quality of sound that the listener experiences with BAL. Furthermore, BAH is difficult to implement because HRTFs are highly individual.
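The BAH rendering step amounts to convolving a source signal with a left and a right head-related impulse response (HRIR, the time-domain form of the HRTF). The minimal sketch below uses direct-form convolution and toy two- and five-tap HRIRs that are hypothetical, not measured data; real HRIRs are hundreds of taps long and would normally be applied with FFT-based convolution.

```python
def convolve(x, h):
    """Direct-form linear convolution; output has len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def render_binaural(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for a mono source at the HRIRs' position."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs for a source on the listener's right (hypothetical values):
# the right ear hears it first and louder, the left ear 3 samples later and weaker.
hrir_right = [1.0, 0.3]
hrir_left = [0.0, 0.0, 0.0, 0.5, 0.15]
mono = [1.0, 0.0, 0.0, 0.0]  # unit impulse, so the outputs reproduce the HRIRs
left, right = render_binaural(mono, hrir_left, hrir_right)
```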
      Fig. 3 shows an exemplary embodiment of a sound reproduction system with an XTC filter. A common practical drawback of these XTC techniques is that, to obtain the desired 3D audio experience, the listener must remain stationary at a single position with an unobstructed path to the transducers (the sweet spot), or else the system must know or track the listener's position throughout playback.
    Disclosure of Invention
      The present invention provides a method and system for providing one or more local crosstalk cancellation zones for 3D audio reproduction. The invention is intended to be applicable both to small audio reproduction environments, such as homes, and to large ones, such as indoor and outdoor theaters, so that multiple listeners at different positions in a theater can experience the same ideal 3D sound effect.
      According to one aspect, one or more transducers separate from the main transducers are used to generate a separate XTC sound signal that, on reaching the listener's ear, is synchronized with the main sound signal generated by the main transducers.
      According to one embodiment of the present invention, realistic 3D sound reproduction is provided by using close-range transducers (CPTs) associated with each listener to create multiple crosstalk-cancelled zones in a stereo reproduction environment. A CPT is a transducer that generates XTC sound waves: a miniature transducer (one per ear) made specifically to be worn near, or hung on, the listener's ear, and configured not to interfere with the listener hearing the main sound from the main transducers. In such an environment, each of the listener's ears receives only its own (same-side) channel of the stereo signal, so the listener experiences a realistic 3D audio scene. Optionally, the listener's position during playback can be tracked by the CPTs worn by the listener, so that the system response can be measured continuously and the XTC sound waves adjusted accordingly. The listener therefore does not need to stay still at a fixed position throughout the audio reproduction.
      According to one embodiment, a system for creating crosstalk-cancelled zones in audio playback is provided, the system comprising: two or more primary transducers for emitting the stereo sound of the audio playback; and a local system including one or more CPTs disposed near the entrances of the listener's left and right ear canals; wherein each CPT comprises a position tracking device for tracking the relative positions of the main transducers, the CPT itself, and other CPTs, and a control unit for receiving relative position data from the position tracking device; wherein the control unit is configured to process the relative position data and cause the CPT to generate XTC sound waves corresponding to the stereo sound waves arriving at the respective ear of the listener; and wherein the generated XTC sound waves are synchronized with the audio playback according to their corresponding relative positions.
      According to one embodiment, the position tracking device also tracks the relative positions of other local systems and uses one or more wireless communication technologies and standards, including but not limited to Bluetooth and Wi-Fi, together with signal triangulation techniques for tracking relative position; in addition, the control unit enables the CPT to emit a correction signal, and the CPTs may be installed in or integrated into furniture.
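The text leaves the wireless triangulation method open. One common approach, assumed here, is to turn signal measurements (for example Bluetooth signal strength or time of flight) into range estimates to three anchors at known positions, such as radios placed in the main speakers (a hypothetical arrangement), and then solve for position by 2-D trilateration. The function name and anchor layout below are illustrative only.

```python
import math


def trilaterate_2d(anchors, distances):
    """Estimate a 2-D position from three anchors at known positions and the
    measured ranges to them.

    Subtracting the first circle equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system, solved with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not uniquely determined")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Hypothetical anchors (metres) and noise-free ranges to a listener at (1, 2).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true_xy = (1.0, 2.0)
ranges = [math.dist(a, true_xy) for a in anchors]
est = trilaterate_2d(anchors, ranges)
```

Real range estimates are noisy, so a practical tracker would typically use more than three anchors with a least-squares fit and smooth the result over time.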
      According to another embodiment, one or more CPTs are connected to microphones placed near the ears of the corresponding listener. Each microphone is configured to receive and measure the sound waves of the audio playback and to generate a measurement data input signal for the control unit of the CPT. In this configuration, the microphones can optionally take the place of the position tracking device, supplying the relative position data used to process and generate the XTC sound waves.
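One way an ear microphone could yield timing data is to cross-correlate its recording with the known playback signal and take the lag with the highest correlation as the arrival delay. This technique, the function name, the test signals, and the 48 kHz sample rate are assumptions for illustration; the source does not specify how the measurement data are derived.

```python
def estimate_delay(reference, recorded, max_lag):
    """Return the lag in samples (0..max_lag) maximizing the cross-correlation
    sum over n of reference[n] * recorded[n + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            reference[n] * recorded[n + lag]
            for n in range(len(reference))
            if n + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical check: the ear microphone hears the reference 5 samples late at half level.
reference = [0.0, 1.0, -0.5, 0.25, 0.0, 0.7, -0.3, 0.0]
recorded = [0.0] * 5 + [0.5 * s for s in reference] + [0.0] * 3
lag = estimate_delay(reference, recorded, max_lag=10)
delay_s = lag / 48_000  # assuming a 48 kHz sample rate
```

The measured delay includes any playback latency, so it is most useful as a relative quantity, for example compared between the two ears or tracked over time as the listener moves.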
    Drawings
      Embodiments of the invention are described in more detail below with reference to the accompanying drawings, in which:
      Fig. 1 shows a listener listening to conventional stereo audio reproduced by two loudspeakers without XTC;
      Fig. 2 shows a listener listening to conventional XTC audio reproduced by two loudspeakers;
      Fig. 3 depicts an exemplary embodiment of a conventional audio system with an XTC filter;
      Fig. 4 shows a listener listening to audio reproduced by two loudspeakers and two XTC transducers according to one embodiment of the invention;
      Fig. 5 is a schematic diagram of a local XTC zone; and
      Fig. 6 is a close-up view of Fig. 5.
    Detailed Description
      In the following description, a system and method for creating crosstalk-cancelled zones in audio playback and similar applications are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions, may be made without departing from the scope and spirit of the invention. Some specific details may be omitted so as not to obscure the invention; however, this disclosure is intended to enable those skilled in the art to practice the teachings herein without undue experimentation.
      The present invention provides a method and system that provides one or more local crosstalk cancellation zones (LXCZ) for 3D audio reproduction. The object of the invention is to make the method and system applicable both to small audio reproduction environments, such as homes, and to large ones, such as indoor and outdoor theaters, so that multiple listeners at different locations in a theater can experience the same ideal 3D sound effect.
      According to one aspect, one or more transducers separate from the main transducers are used to generate independent XTC sound signals that, on reaching the listener's ear, are synchronized with the main sound signal generated by the main transducers. Fig. 4 provides a simplified schematic of this concept.
      In one embodiment, the XTC-wave-generating transducer is a purpose-built miniature transducer that the listener can wear near, or hang over, the ear (one transducer per ear), and it is configured not to obstruct the listener from hearing the primary sound from the primary transducers. Optionally, the listener's position during playback can be tracked by the CPTs worn by the listener, so that the system response can be measured continuously and the XTC sound waves adjusted accordingly. The listener therefore does not need to stay still at a fixed position throughout the audio reproduction.
      According to another embodiment, one or more of the XTC-wave-generating transducers are connected to a microphone placed near the corresponding listener's ear. The microphone receives and measures the main sound and generates a measurement data input signal for the control unit of the CPT. This configuration can optionally replace the position tracking device, with the listener's position information derived from the measurements used in processing and generating the XTC sound waves.
      In the following, the various systems and methods of the present invention are described by mathematical formulas that define the creation and relationship of ideal local crosstalk cancellation zones.
      
        Basic composition of the system
      
      Consider an acoustic environment $\Omega$ containing $n$ local systems $Q_j$ ($1 \le j \le n$) and $m$ point sound sources $S_i$ ($1 \le i \le m$), where $i$ and $j$ are integers.
      The acoustic environment $\Omega$ may be a closed room or an open space with various walls and environmental structures. Each local system $Q_j$ comprises: a set of receivers, each with a time-dependent position (the $k$-th receiver of system $Q_j$ is located at a given point at time $t$); examples of such receivers include the listener's ears and microphones; and a set of close-range transducers (CPTs) for emitting a local sound field, each likewise with a time-dependent position (the $l$-th transducer of system $Q_j$ at time $t$); examples of transducers include over-the-ear, on-the-ear and in-the-ear headphones, earbuds, other types of wearable speakers, and fixed and portable speakers.
      All sources $S_i$ ($1 \le i \le m$) together generate a sound field in $\Omega$. The sound pressure signal at the position of the $k$-th receiver of system $Q_j$ is $p_{jk}(t)$, and the signals $p_{jk}(t)$ for the different values of $k$ determine the acoustic experience that system $Q_j$ reproduces (for a human user). True 3D sound reproduction is defined as the receivers receiving a set of target signals, denoted here $\hat{p}_{jk}(t)$. A target signal can also be defined as the sound pressure signal that the sources $S_i$ would produce at the receiver in a simulated reference scene (e.g. a concert hall). The target signal may represent a real acoustic environment (e.g. listening to a live orchestra in a concert hall), conditioned audio (e.g. a real recording with modifications or added features), or completely artificial sound. The difference between the target signal $\hat{p}_{jk}(t)$ and the sound pressure signal $p_{jk}(t)$ is the correction signal $\Delta p_{jk}(t)$:

      $$\Delta p_{jk}(t) = \hat{p}_{jk}(t) - p_{jk}(t)$$

      The correction signal is delivered by the CPTs: the $l$-th CPT associated with system $Q_j$ emits a signal $x_{jl}(t)$ such that the correction signal $\Delta p_{jk}(t)$ is received at the $k$-th receiver.
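In sampled form, the correction signal at one receiver is simply the target pressure minus the pressure actually produced there by the main sources. The minimal sketch below computes it for hypothetical sample values and checks that, by linear superposition (assumed), adding the correction to the measured pressure restores the target. How the CPT actually produces that correction at the ear is the job of the LTV system described later.

```python
def correction_signal(target, measured):
    """Delta p_jk[n] = target[n] - measured[n] for one receiver (sampled signals)."""
    if len(target) != len(measured):
        raise ValueError("signals must have the same length")
    return [t - p for t, p in zip(target, measured)]


target = [0.0, 1.0, 0.5, -0.25]      # hypothetical target pressure at the ear
measured = [0.0, 0.75, 0.75, -0.25]  # hypothetical pressure produced by the main speakers
delta = correction_signal(target, measured)

# If the CPT reproduces delta exactly at the ear, superposition restores the target.
assert [p + d for p, d in zip(measured, delta)] == target
```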
      
        Configuration parameters
      
      The signal $x_{jl}(t)$ emitted by a CPT generally depends on the relative position of each receiver with respect to the transducer, as well as on the acoustic characteristics of the environment, including the locations of other systems and of the physical components of the current system. All of these variables are time-dependent. For this reason, each system $Q_j$ computes a vector of time-dependent variables $q_j(t)$ from which the signal $x_{jl}(t)$ to be transmitted is calculated. These variables include: the degrees of freedom describing the spatial configuration of the body of system $Q_j$; other internal parameters of the system, such as the head-related transfer function (HRTF) of a human user, expressed in a time-independent frame; and parameters, also in a time-independent frame, describing how the environment affects the propagation of sound from the sources $S_i$. These variables make it possible, at a minimum, to reconstruct the relative position of each receiver with respect to each transducer. Data collected by sensors associated with the system can be used to compute $q_j(t)$ in real time.
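The split between time-dependent degrees of freedom and time-independent internal parameters can be illustrated with a small data structure. This minimal 2-D sketch assumes that the time-dependent part of $q_j(t)$ is the head position and yaw and that the ear offsets in the head frame are fixed; the class name, fields, and numbers are hypothetical, and HRTF and environment parameters are omitted. It reconstructs the ear positions and their positions relative to a transducer, which the text names as the minimum that $q_j(t)$ must support.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfigVector:
    """Hypothetical 2-D stand-in for q_j(t) of one listener (metres, radians)."""
    head_x: float    # time-dependent: head position
    head_y: float
    head_yaw: float  # time-dependent: rotation counter-clockwise, 0 = facing +y
    ear_offsets: tuple = ((-0.0875, 0.0), (0.0875, 0.0))  # time-independent, head frame

    def ear_positions(self):
        """Rotate the head-frame ear offsets by the yaw and translate to world coordinates."""
        c, s = math.cos(self.head_yaw), math.sin(self.head_yaw)
        return [
            (self.head_x + c * ox - s * oy, self.head_y + s * ox + c * oy)
            for ox, oy in self.ear_offsets
        ]

    def relative_positions(self, transducer_xy):
        """Vector from a transducer to each ear (left ear first)."""
        tx, ty = transducer_xy
        return [(ex - tx, ey - ty) for ex, ey in self.ear_positions()]


q = ConfigVector(head_x=1.0, head_y=2.0, head_yaw=math.pi / 2)  # listener turned 90 degrees left
rel = q.relative_positions((1.0, 2.5))  # hypothetical fixed transducer position
```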
      
        Generation of correction signals
      
      Each local system $Q_j$ is associated with a multiple-input multiple-output (MIMO) linear time-varying (LTV) system $L_j$, which computes the output signals $x_{jl}(t)$ of the corresponding transducers needed to obtain the desired correction signals. Because the system must operate under time-varying conditions, $L_j$ is itself time-varying: its inputs are the correction signals $\Delta p_{jk}(t)$ to be produced by the transducers, and its outputs are the transducer signals $x_{jl}(t)$. Here, the indices $k$ and $l$ denote the set of receivers (the listener's ears) and the set of transducers of a single system $Q_j$, respectively. Collecting the signals $\Delta p_{jk}(t)$ of listener $j$ into a multichannel signal $\Delta p_j(t)$, and the signals $x_{jl}(t)$ into a multichannel signal $x_j(t)$, the relationship between input and output can be written as:
      $$x_j(t) = L_j\left[\Delta p_j(t);\ q_j(t)\right]$$
      where $q_j(t)$ is the vector of time-dependent variables defined above.
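The LTV relationship can be illustrated with a deliberately simplified model. This sketch assumes that the CPT-to-ear paths are memoryless 1/r gains whose distances are taken from $q_j$ at each sample, so $L_j$ reduces to solving a 2x2 linear system per sample; a real system would also have to account for propagation delay and filtering. The operator is linear in the correction signals, while its coefficients change whenever $q_j$ changes, which is what makes it linear time-varying. All function names and numbers are hypothetical.

```python
def gain_matrix(dist):
    """G[k][l] = 1 / dist[k][l]: assumed free-field 1/r gain from CPT l to ear k."""
    return [[1.0 / dist[k][l] for l in range(2)] for k in range(2)]


def solve2(g, rhs):
    """Solve the 2x2 system g @ x = rhs with Cramer's rule."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if abs(det) < 1e-12:
        raise ValueError("CPT-to-ear matrix is singular for this configuration")
    return [
        (rhs[0] * g[1][1] - g[0][1] * rhs[1]) / det,
        (g[0][0] * rhs[1] - g[1][0] * rhs[0]) / det,
    ]


def ltv_operator(delta_p, q):
    """Evaluate x_j[n] = L_j[delta_p_j; q_j][n] sample by sample.

    delta_p: sequence of (left_ear, right_ear) correction samples.
    q:       per-sample 2x2 CPT-to-ear distance matrices, standing in for q_j(t).
    Propagation delay is ignored, a simplification a real system could not make.
    """
    return [solve2(gain_matrix(d), dp) for dp, d in zip(delta_p, q)]


delta_p = [(0.2, -0.1), (0.0, 0.3)]
q = [
    [[0.02, 0.20], [0.20, 0.02]],   # sample 0: each CPT 2 cm from its own ear
    [[0.03, 0.18], [0.19, 0.025]],  # sample 1: the head has moved slightly
]
x = ltv_operator(delta_p, q)
```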
      
        Localization of the cancellation process
      
      The functional relationship defined above, combined with the restriction on the parameters $q_j(t)$ described above, means that the process is local: for a given local system, only its own target signals are considered, and crosstalk from other local systems into the correction signal generated by this local system is ignored. Here, the term "local" means that each local system $Q_j$ decides on the cancellation signal it emits independently of the other local systems. This allows an independent LTV to be designed for each subsystem. Optionally, the LTVs may include additional subsystems that detect interference between users and attenuate it if desired.
      In one embodiment, local system $Q_j$ may include a set of sensors, e.g. for tracking head movements in order to adjust the HRTFs, and for tracking the surrounding environment, including the locations of other local systems approaching or departing, so that a preloaded inter-user interference attenuation can be applied in advance.
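The source does not specify how the inter-user interference attenuation is chosen. The minimal sketch below assumes one possible policy: extrapolate the tracked positions of neighbouring local systems a short time ahead and, if one is predicted to come close, reduce this system's CPT output level before it arrives. The distance thresholds, the maximum attenuation, the linear-in-dB ramp, and both function names are hypothetical.

```python
import math


def predicted_min_distance(own_xy, others, lookahead_s):
    """Smallest distance to any other local system `lookahead_s` seconds ahead,
    extrapolating each one linearly from its tracked position and velocity.
    `others` is a list of ((x, y), (vx, vy)) tuples; returns math.inf if empty."""
    best = math.inf
    for (x, y), (vx, vy) in others:
        future = (x + vx * lookahead_s, y + vy * lookahead_s)
        best = min(best, math.dist(own_xy, future))
    return best


def interference_gain(distance, d_near=0.5, d_far=2.0, max_atten_db=12.0):
    """Hypothetical rule: no attenuation beyond d_far metres, max_atten_db at or
    inside d_near, linear in dB in between. Returns a linear gain factor."""
    if distance >= d_far:
        atten_db = 0.0
    elif distance <= d_near:
        atten_db = max_atten_db
    else:
        atten_db = max_atten_db * (d_far - distance) / (d_far - d_near)
    return 10 ** (-atten_db / 20)


# A neighbour 3 m away walking toward this listener at 1 m/s, looking 1.75 s ahead:
neighbours = [((3.0, 0.0), (-1.0, 0.0))]
d = predicted_min_distance((0.0, 0.0), neighbours, lookahead_s=1.75)
gain = interference_gain(d)  # would scale this system's CPT output in advance
```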
      According to one embodiment, a pair of spaced-apart transducers, the close-range transducers (CPTs), is provided close to the listener. The primary sound source is still a pair of external main stereo speakers located in front of the listener, while the CPTs provide the crosstalk cancellation signal. XTC is performed by the CPTs to give each listener a personalized XTC region or "bubble". Fig. 5 provides a diagram of the personalized XTC region, and Fig. 6 provides a close-up view of it.
      The CPTs provide XTC sound waves that cancel the crosstalk from the main external speakers. This gives the listener greater freedom of movement. Not only can each listener move freely, but because the CPTs are personal (localized), many listeners can share the same listening experience from the same set of main speakers.
      The CPTs of one system can create crosstalk for users of other systems, especially when users are close together, as may occur when open headphones are used as CPTs. In general, the definition of the correction signal given above does not include this comparatively minor effect. Optionally, the CPT may include additional functionality to handle such inter-user interference.
      Optionally, the XTC sound waves generated by the CPT may also incorporate timbre correction, equalization, and/or user-preset sound effects.
      According to another embodiment, the CPT may be a pair of open headphones (through which external sound can pass to reach the listener's ears), or another type of headphone or wearable speaker, such as the Sony PFR-V1 or the Bose SoundWear. However, the CPT is not limited to wearable devices; in a cinema application, for example, the CPT may be embedded in the headrest of a seat. An advantage of a wearable CPT is that its physical relationship to the listener's ears stays fixed; a headrest-mounted CPT is also possible, depending on how much positional variation the algorithm that calculates the crosstalk cancellation signal can tolerate.
      Although the present document describes the CPT of the present invention as being primarily applicable to headphones, one of ordinary skill in the art should be able to apply its various embodiments to other types of proximity devices, such as, but not limited to, embeddable devices that are applied to stationary objects, such as chairs, sofas, or neck pads, without undue experimentation.
      The position of the listener relative to the main speakers affects how effective XTC can be. Various techniques may be used to determine the listener's location; for example, Bluetooth-based triangulation. Other wireless technologies can also provide accurate positioning information. This positioning information can be used to calculate the delays required for the L and R channels of the CPT.
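Turning position information into a CPT channel delay can be sketched under simple assumptions: a free-field 1/r model with no head shadowing, a speed of sound of 343 m/s, and zero processing latency. Under these assumptions, the CPT at the left ear has to cancel the sound arriving from the right main speaker, so its feed is delayed by the difference in travel times and scaled with inverted polarity, making both waves reach the ear at the same moment with equal amplitude. The function name and positions are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def cpt_alignment(main_contra_xy, cpt_xy, ear_xy, c=SPEED_OF_SOUND):
    """Delay (s) and gain for one ear's CPT feed so that the CPT wave arrives
    together with, and in anti-phase to, the crosstalk from the contralateral
    main speaker. Free-field 1/r model; processing latency is ignored."""
    d_main = math.dist(main_contra_xy, ear_xy)
    d_cpt = math.dist(cpt_xy, ear_xy)
    delay = (d_main - d_cpt) / c
    if delay < 0:
        raise ValueError("CPT is farther from the ear than the main speaker")
    # Match the crosstalk amplitude 1/d_main at the ear, with inverted polarity.
    gain = -d_cpt / d_main
    return delay, gain


# Hypothetical positions in metres: right main speaker, left-ear CPT, left ear.
main_right = (1.0, 1.732)
cpt_left = (-0.10, 0.05)
ear_left = (-0.0875, 0.0)
delay, gain = cpt_alignment(main_right, cpt_left, ear_left)
```

The same calculation with the mirrored geometry gives the delay and gain for the R channel. In a real system the measured latency of the CPT signal chain would be subtracted from the computed delay.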
      The CPT may be a wired or wireless device. The main goal is to separate the XTC region from the traditional BAL setup and its main speakers, and instead to create a local XTC region for each individual.
      The embodiments disclosed herein may be implemented using a general purpose or special purpose computing device, a mobile communication device, a computer processor, or electronic circuits including, but not limited to, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), and other programmable logic devices configured or programmed according to the teachings of the present invention. Computer instructions or software code running in a general purpose or special purpose computing device, mobile communication device, computer processor, or programmable logic device may be readily prepared by a practitioner of software or electronics in light of the teachings of this disclosure.
      In some embodiments, the present invention includes a computer storage medium having computer instructions or software code stored therein which can be used to program a computer or microprocessor to perform any of the processes of the present invention. The storage medium may include, but is not limited to, floppy diskettes, optical disks, blu-ray disks, DVDs, CD-ROMs, and magneto-optical disks, ROMs, RAMs, flash memory devices, or any type of media or device suitable for storing instructions, code, and/or data.
      The foregoing description of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations will be apparent to practitioners skilled in the art.
      The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to understand the invention for various embodiments and with various modifications as are suited to the particular use contemplated. The scope of the invention is defined by the following claims and their equivalents.
    Claims (8)
1. A system for creating crosstalk-cancelled zones in audio playback, comprising:
      one or more primary transducers emitting stereo sound waves for audio playback;
      a local system, comprising two or more close-range transducers and a set of sensors for tracking the surrounding environment, including the locations of other local systems approaching or departing;
      wherein each of the close range transducers is disposed in the vicinity of one of the left and right ear canals of the listener;
      wherein each of the close-range transducers comprises:
      a position tracking device for tracking the relative positions of the primary transducers, the close-range transducer, and other close-range transducers;
      a control unit for receiving relative position data from the position tracking device and generating a control signal for generating a crosstalk cancellation sound wave according to the relative position data;
      wherein each of the close-range transducers is configured to generate crosstalk-cancelled acoustic waves corresponding to stereo acoustic waves that reach a respective ear of the listener;
      wherein the generated crosstalk-cancelled sound waves are synchronized with the audio playback in response to the relative position; and
      wherein a preloaded inter-user interference attenuation is applied in advance to the synchronized crosstalk-cancellation sound waves according to the tracked surrounding environment.
    2. The system of claim 1, wherein the position tracking device further tracks the relative position of other local systems.
    3. The system of claim 1, wherein the position tracking device comprises a wireless communication triangulation device for tracking relative position.
    4. The system of claim 1, wherein the close-range transducer additionally transmits one or more correction signals.
    5. The system of claim 1, wherein the close-range transducer comprises one or more of over-the-ear and in-the-ear headphones, earplugs, other types of wearable speakers, and fixed and portable speakers.
    6. A system for creating crosstalk-cancelled zones in audio playback, comprising:
      one or more primary transducers to emit stereo sound waves for audio playback;
      a local system comprising two or more close-range transducers, one or more microphones, and a set of sensors for tracking the surrounding environment, including the locations of other local systems approaching or departing;
      wherein each of the close range transducers is disposed in the vicinity of one of the left and right ear canals of the listener;
      wherein each of the microphones is positioned near an ear of the listener and is configured to receive and measure stereo sound waves for audio playback;
      wherein each of the close-range transducers comprises:
      a control unit for receiving relative position data of stereo sound waves for audio playback from a microphone and generating a control signal for generating crosstalk-cancelled sound waves from the relative position data of the sound waves;
      wherein each of the close-range transducers is configured to generate crosstalk-cancelled acoustic waves corresponding to stereo acoustic waves that reach a respective ear of the listener;
      wherein the generated crosstalk-cancelled sound waves are synchronized with the audio playback in response to the relative position; and
      wherein a preloaded inter-user interference attenuation is applied in advance to the synchronized crosstalk-cancellation sound waves according to the tracked surrounding environment.
    7. The system of claim 6, wherein the close-range transducer additionally transmits one or more correction signals.
    8. The system of claim 6, wherein the close-range transducer comprises one or more of over-the-ear and in-the-ear headphones, earplugs, other types of wearable speakers, and fixed and portable speakers.
    Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US62/571,234 | 2017-10-11 | 2017-10-11 | |
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| HK40029546A | 2021-02-19 |
| HK40029546B | 2022-04-22 |