
WO2024177629A1 - Dynamic audio mixing in a multiple wireless speaker environment - Google Patents

Dynamic audio mixing in a multiple wireless speaker environment

Info

Publication number
WO2024177629A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
component
output devices
audio output
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2023/013639
Other languages
French (fr)
Inventor
Kevin J. Bastyr
Todd S. Welti
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harman International Industries Inc
Original Assignee
Harman International Industries Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harman International Industries Inc filed Critical Harman International Industries Inc
Priority to PCT/US2023/013639 priority Critical patent/WO2024177629A1/en
Priority to CN202380094237.8A priority patent/CN120642347A/en
Publication of WO2024177629A1 publication Critical patent/WO2024177629A1/en
Anticipated expiration legal-status Critical
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/12 Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0083 Recording/reproducing or transmission of music for electrophonic musical instruments using wireless transmission, e.g. radio, light, infrared
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/46 Volume control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/002 Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/02 Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other

Definitions

  • the various embodiments relate generally to audio systems, and more specifically, to techniques for dynamic audio mixing in a multiple wireless speaker environment.
  • a feature of audio output devices that is growing in popularity is a “party mode,” in which multiple audio output devices can communicatively couple together to form an ad-hoc network of speakers that synchronizes together and outputs audio content as one speaker system.
  • the party mode can be coordinated and controlled via a mobile device. Content is output from the mobile device to the network of speakers.
  • audio output devices support stereo output via one or more left speakers and one or more right speakers that are separated by approximately 10-20 cm. While this amount of speaker separation may be adequate for a small number of users listening to a single audio output device, an issue arises when multiple such audio output devices are employed in party mode.
  • ten or more audio output devices are deployed in party mode and placed at various locations in a listening environment, where each of the audio output devices receives the same stereo signal. Each audio output device plays the left channel of the stereo signal on the left speaker(s) and the right channel of the stereo signal on the right speaker(s).
  • a user may hear multiple left channel audio signals from multiple audio output devices and may also hear multiple right channel audio signals from the same and/or different audio output devices. Due to different travel distances and orientations, the various left channel audio signals arrive at the user at different points of time. Depending on the time differences, some portions of the left channel audio signals may augment each other, causing a volume increase, while other portions of the left channel audio signals may diminish each other, causing a volume decrease. Likewise, the various right channel audio signals arrive at the user at different points of time, potentially with different arrival times than for the left channel audio signals. As a result, the user may perceive portions of the left channel audio as having a lower volume or a higher volume than portions of the right channel audio. This phenomenon, referred to as a combing effect, can lead to an undesirable listening experience.
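  • As a worked illustration of this combing effect, consider two copies of the same channel whose travel paths to a listener differ by a distance Δd: the arrival-time difference is Δt = Δd / c, and cancellation occurs at frequencies where the delay equals an odd number of half-periods. The sketch below is illustrative rather than part of this disclosure.

```python
# Illustrative sketch (not part of the disclosure): frequencies at which two
# copies of the same channel cancel when their travel paths differ.
SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air

def comb_null_frequencies(path_difference_m: float, count: int = 4) -> list:
    """First `count` destructive-interference frequencies (Hz) for two
    identical signals whose travel distances differ by path_difference_m."""
    delay = path_difference_m / SPEED_OF_SOUND  # arrival-time difference (s)
    # Cancellation occurs where the delay equals an odd number of half-periods:
    # f_k = (2k + 1) / (2 * delay)
    return [(2 * k + 1) / (2 * delay) for k in range(count)]

# A 1 m path difference nulls at ~171.5 Hz, ~514.5 Hz, ~857.5 Hz, ~1200.5 Hz.
print(comb_null_frequencies(1.0))
```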
  • Various embodiments of the present disclosure set forth a computer-implemented method for generating audio signals in an audio system.
  • the method includes receiving an audio input signal.
  • the method further includes separating a plurality of component audio signals from the audio input signal.
  • the method further includes, for a subset of component audio signals included in the plurality of component audio signals, mapping each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices.
  • the method further includes transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
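  • As a minimal, runnable sketch of these steps, the example below separates a toy stereo signal into mid and side components and maps them to devices. The mid/side split is a trivial stand-in for the source separation described in this disclosure, and all function and device names are illustrative assumptions.

```python
import numpy as np

# Toy walk-through of the claimed steps: receive, separate, map, transmit.
def separate_components(stereo: np.ndarray) -> dict:
    # Mid/side split as a trivial stand-in for real source separation.
    left, right = stereo[0], stereo[1]
    return {"mid": (left + right) / 2.0, "side": (left - right) / 2.0}

def map_to_devices(components: dict, devices: list) -> dict:
    # One-to-one mapping; many-to-one and one-to-many are equally valid.
    return {name: [devices[i % len(devices)]] for i, name in enumerate(components)}

stereo_input = np.random.randn(2, 48000)  # stand-in for 1 s of 48 kHz stereo
components = separate_components(stereo_input)
mapping = map_to_devices(components, ["speaker_A", "speaker_B"])
print(mapping)  # {'mid': ['speaker_A'], 'side': ['speaker_B']}
```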
  • At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, multiple audio output devices, such as personal speakers, can be deployed in party mode without generating undesirable combing effects, as is typical with conventional techniques.
  • Another technical advantage of the disclosed techniques relative to the prior art is that each user within the environment can have a different audio experience based on the location of the user relative to the locations of the multiple audio output devices.
  • the upmix transmitted to the multiple audio output devices is changed dynamically based on which speakers are in the network, which source audio is being used, and/or the like. More particularly, the techniques dynamically change the upmix as audio output devices move within the environment, as audio output devices leave the network, as new audio output devices enter the network, and so on.
  • the upmix is adapted to generate a suitable soundfield as changes in the audio output devices and source audio occur over time.
  • users enjoy a more interactive and immersive listening experience relative to conventional techniques.
  • FIG. 1 is a block diagram of a computer system configured to implement one or more aspects of the various embodiments
  • FIG. 2 illustrates a coordinated audio system, according to one or more aspects of the various embodiments
  • FIG. 3 illustrates an example of a listening environment for a coordinated audio system, according to one or more aspects of the various embodiments
  • FIG. 4 illustrates another example of a listening environment for a coordinated audio system, according to one or more aspects of the various embodiments
  • FIG. 5 illustrates yet another example of a listening environment for a coordinated audio system, according to one or more aspects of the various embodiments.
  • FIG. 6 is a flow chart of method steps for generating a set of audio streams for a coordinated audio system, according to one or more aspects of the various embodiments.
  • FIG. 1 illustrates a computer system 100 configured to implement one or more aspects of the various embodiments.
  • computer system 100 includes, without limitation, computing device(s) 180, input devices 122, output devices 124, audio output device(s) 126, network(s) 160, audio device network 162, and media content services 170.
  • Computing device 180 includes, without limitation, one or more processing units 102, I/O device interface 104, network interface 106, interconnect 112 (e.g., a bus), storage 114, and memory 116.
  • Memory 116 stores, without limitation, output device manager application 150 and audio upmix application 152.
  • Processing unit(s) 102 and memory 116 can be implemented in any technically feasible fashion.
  • processing unit(s) 102 and memory 116 can be implemented as a stand-alone chip or as part of a more comprehensive solution that is implemented as an application-specific integrated circuit (ASIC), a system-on-a-chip (SoC), and/or the like.
  • Processing unit(s) 102, I/O device interface 104, network interface 106, storage 114, and memory 116 can be communicatively coupled to each other via interconnect 112.
  • the one or more processing unit(s) 102 can include any suitable processor, such as a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a tensor processing unit (TPU), any other type of processing unit, or a combination of multiple processing units, such as a CPU configured to operate in conjunction with a GPU.
  • each of the one or more processing unit(s) 102 can be any technically feasible hardware unit capable of processing data and/or executing software applications and modules.
  • Storage 114 can include non-volatile storage for applications, software modules, and data, and can include fixed or removable disk drives, flash memory devices, and CD-ROM, DVD-ROM, Blu-Ray, HD-DVD, or other magnetic, optical, solid state storage devices, and/or the like. Storage 114 can be fully or partially located in a remote storage system, referred to herein as “the cloud,” and accessed through connections such as network 160.
  • Memory 116 can include a random access memory (RAM) module, a flash memory unit, or any other type of memory unit or combination thereof.
  • the one or more processing unit(s) 102, I/O device interface 104, and network interface 106 are configured to read data from and write data to memory 116.
  • Memory 116 includes various software programs and modules (e.g., an operating system, one or more applications) that can be executed by processing unit(s) 102 and application data (e.g., data loaded from storage 114) associated with said software programs.
  • one or more databases 142 are loaded from storage 114 into memory 116.
  • Databases 142 include application, user data, media content, etc. that are associated with one or more applications that can be executed by processing unit(s) 102.
  • computing device 180 is communicatively coupled to one or more networks 160.
  • Network(s) 160 can be any technically feasible type of communications network that allows data to be exchanged between computing device 180 and other systems or devices, such as a server, a cloud computing system, or other networked computing device or system (e.g., media content service(s) 170).
  • network 160 can include a wide area network (WAN), a local area network (LAN), a wireless network (e.g., a Wi-Fi network, a cellular data network, an ad-hoc network), and/or the Internet, among others.
  • Computing device 180 can connect with network(s) 160 via network interface 106.
  • network interface 106 is hardware, software, or a combination of hardware and software, that is configured to connect to and interface with network(s) 160.
  • network interface 106 facilitates communication with other devices or systems via one or more standard and/or proprietary protocols (e.g., Bluetooth, a proprietary protocol associated with a specific manufacturer, etc.).
  • Media content service(s) 170 include one or more computerized services configured to provide (e.g., distribute) media content to devices (e.g., to computing devices 180).
  • Examples of media content service(s) 170 include Spotify, Apple Music, Pandora, YouTube Music, Tidal, and/or the like.
  • Media or media content, as used herein, includes, without limitation, audio content (e.g., spoken and/or musical audio content, audio content files, streaming audio content, the audio track of a video, and/or the like) and/or video content.
  • Examples of media content service(s) 170 include, without limitation, media content streaming services, YouTube, digital media content sellers, media servers (local and/or remote), and/or the like.
  • media content services 170 include one or more computer systems (e.g., a server, a cloud computing system, a networked computing system, a distributed computing system, etc.) for storing and distributing media content.
  • Computing device 180 can communicatively couple with media content service(s) 170 via network(s) 160 and download and/or stream media content from media content services 170.
  • Input devices 122 include one or more devices capable of providing input. Examples of input devices 122 include, without limitation, a touch-sensitive surface (e.g., a touchpad), a microphone, a touch-sensitive screen, buttons, knobs, dials, a keyboard, a pointing device (e.g., a mouse), and/or the like.
  • Output devices 124 include one or more devices capable of providing output. Examples of output devices 124 include, without limitation, a display device, haptic devices, and/or the like. Examples of display devices include, without limitation, LCD, LED, OLED, and AMOLED displays, touch-sensitive displays, transparent displays, projection systems, and/or the like. Additionally, input devices 122 and/or output devices 124 may include devices capable of both receiving input and providing output, such as a touch-sensitive display, and/or the like.
  • Audio output device(s) 126 include one or more devices capable of outputting sound. Audio output device(s) 126 include, without limitation, portable speakers, bone conduction speakers, shoulder worn and shoulder mounted headphones, around-neck speakers, and/or the like. In some embodiments, an audio output device 126 can be coupled to computing device 180 via I/O device interface 104 and/or network interface 106 by wire or wirelessly in any technically feasible manner (e.g., Universal Serial Bus (USB), Bluetooth, ad hoc Wi-Fi).
  • audio output device(s) 126 also include computing, communications, and/or networking capability.
  • an audio output device 126 can also include one or more processing units similar to processing unit(s) 102, memory and/or storage, and a network interface similar to network interface 106.
  • An audio output device 126 can communicatively couple with one or more other audio output devices 126 and/or with computing device 180 via the network interface, and optionally store data.
  • multiple audio output devices 126 can communicatively couple with each other and/or the computing device 180 to form a coordinated audio system.
  • a coordinated audio system is an ad hoc network of audio output devices 126 communicatively coupled with each other and computing device 180 via an audio device network 162.
  • Audio device network 162 is typically a wireless network such as a Wi-Fi network, an ad-hoc Wi-Fi network, a Bluetooth network, and/or the like.
  • the audio output devices 126 in the coordinated audio system operate in a “party mode.”
  • audio output devices 126 output media content received from a computing device 180 via audio device network 162.
  • the audio output devices 126 output the media content in a synchronized or near-synchronized manner.
  • the coordinated audio system is initiated from a computing device 180 via output device manager application 150.
  • computing device 180 can send a media content item to audio output devices 126 to be synchronously output by audio output devices 126.
  • the coordinated audio system is described in further detail in FIG. 2.
  • Memory 116 includes an output device manager application 150 and one or more audio upmix applications 152.
  • Output device manager application 150 and audio upmix application 152 are stored in and loaded into memory 116 from storage 114.
  • audio upmix application 152 can also be loaded from the cloud, and/or executed in the cloud rather than executing locally on processing unit(s) 102.
  • audio upmix application 152 outputs (e.g., decodes for playback) locally stored media content (e.g., stored in storage 114) and/or media content from media content services 170 via audio output device 126 and/or output device(s) 124.
  • Audio upmix application 152 also can communicate with media content service(s) 170 to obtain (e.g., purchase, rent, and/or subscribe to download, stream, or otherwise retrieve) media content for output and/or storage at computing device 180.
  • Output device manager application 150 facilitates management of audio output devices 126.
  • portions or all of output device manager application 150 can execute on one or more audio output devices 126.
  • computing device 180 is communicatively coupled to audio output devices 126
  • a user can, via output device manager application 150, perform management functions for audio output devices 126, including but not limited to monitoring a status (e.g., battery level, volume level, firmware version, etc.) of audio output devices 126, configuring settings of audio output devices 126, updating a firmware of audio output devices 126, and/or the like.
  • output device manager application 150 can interface with audio upmix application 152 (e.g., via an application programming interface (API)) to cause or otherwise facilitate the sending of media content by audio upmix application 152 to audio output devices 126 and/or to obtain data associated with audio upmix application 152, including but not limited to media content library information.
  • output device manager application 150 facilitates creation and management of the coordinated audio system.
  • a user can, via output device manager application 150, command an audio output device 126 to join the coordinated audio system, thereby creating the coordinated audio system.
  • audio output devices 126 that have previously been part of the coordinated audio system can automatically rejoin the coordinated audio system simply by powering on.
  • a nearby audio output device 126 can automatically join the coordinated audio system when it is brought within communicative proximity of, or powered on near, the coordinated audio system.
  • Via output device manager application 150, the user can configure an individual audio output device 126 and/or configure the coordinated audio system, modify (e.g., add or remove audio output devices 126) or terminate the coordinated audio system, and perform other management functions associated with the coordinated audio system. Further, output device manager application 150 can generate a playlist of media contents to be sent by audio upmix application 152, whose output is sent to audio output devices 126.
  • audio upmix application 152 receives an audio input signal.
  • audio upmix application 152 receives the audio input signal from an input device 122.
  • audio upmix application 152 retrieves data representing the audio input signal from database 142 stored in memory 116, from storage 114, and/or the like.
  • the audio input signal is a stereophonic audio signal that includes a left audio channel and a right audio channel.
  • the audio input signal is a monaural audio signal with a single channel.
  • the audio input signal is a multichannel encoded audio signal that includes more than two channels, such as four channels, six channels, eight channels, and so on.
  • audio upmix application 152 generates a custom and dynamic upmix that includes multiple component audio signals and transmits those component audio signals to audio output devices 126, as described herein.
  • audio upmix application 152 executes on a cloud computing resource, and the output of audio upmix application 152 is transmitted to computing device 180 and then to audio output devices 126. Additionally or alternatively, the output of audio upmix application 152 bypasses computing device 180 and is transmitted directly to audio output devices 126.
  • audio upmix application 152 generates a more immersive acoustic environment, referred to as a soundfield, when streaming and transmitting audio signals to the multiple audio output devices 126. Further, audio upmix application 152 generates audio output signals that reduce or eliminate undesirable combing artifacts associated with conventional techniques. In that regard, audio upmix application 152 can generate the component audio signal in any technically feasible manner.
  • One way of mitigating this combing effect is for audio upmix application 152 to transmit the left channel audio signal to one audio output device that is positioned on one side of the listening environment, such that the left channel audio signal is sent to both the left and right speakers of the audio output device.
  • audio upmix application 152 transmits the right channel audio signal to another audio output device that is positioned on the other side of the listening environment, such that the right channel audio signal is sent to both the left and right speakers of the audio output device. While this technique can reduce combing effects, this approach does not scale to systems that include more than two audio output devices.
  • audio upmix application 152 performs a signal separation process, also referred to herein as a blind source separation process, on the audio input signal.
  • Audio upmix application 152 can perform other source separation techniques known to those of ordinary skill in the art to derive component audio signals from audio input signals.
  • Signal separation is functionally equivalent to or otherwise known as source separation, blind signal separation or blind source separation.
  • Approaches can include principal component analysis, independent component analysis, independent vector analysis, nonnegative matrix factorization, independent low-rank matrix analysis or the like. Any of the various source separation techniques can be performed using classical signal processing techniques or using machine learning techniques, deep learning techniques, artificial intelligence approaches, and/or the like.
  • signal separation separates a desired audio signal from an audio input signal that also includes additional signals.
  • signal separation can be employed in a cellphone to reduce or eliminate background noise, such as from HVAC noise, traffic sounds, background voices, and/or the like, and maintain the speech audio from the user of the cellphone. Further, signal separation can be employed in a hearing aid to maintain the speech audio from one person and reduce or eliminate speech audio from other nearby persons.
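  • As a self-contained illustration, the sketch below applies independent component analysis, one of the classical approaches listed above, to a synthetic two-source stereo mixture. The sources and mixing matrix are fabricated for the example; a production system could instead use a trained deep-learning separator to pull many stems from a stereo signal.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Blind source separation of a synthetic stereo mixture via independent
# component analysis. Two mixture channels can yield at most two independent
# components this way; recovering many stems generally needs learned models.
rng = np.random.default_rng(0)
n = 48000
tone = np.sin(2 * np.pi * 220.0 * np.arange(n) / 48000.0)  # stand-in "instrument"
noise = rng.laplace(size=n)                                # stand-in "percussion"
sources = np.column_stack([tone, noise])                   # shape (n, 2)

mixing = np.array([[0.8, 0.3],
                   [0.2, 0.7]])                            # unknown in practice
stereo_mix = sources @ mixing.T                            # shape (n, 2)

ica = FastICA(n_components=2, random_state=0)
estimated = ica.fit_transform(stereo_mix)                  # recovered up to scale/order
print(estimated.shape)                                     # (48000, 2)
```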
  • Audio upmix application 152 performs a signal separation process on an audio input signal, such as a stereo audio signal, to separate multiple component audio signals from the audio input signal.
  • the audio input signal includes a song performed by a rock band that has a vocalist, a lead guitar, a bass guitar, and drums.
  • Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into four component audio signals, one audio signal for each of the vocalist, lead guitar, bass guitar, and drums.
  • the audio input signal includes a song performed by a ten-piece jazz band that includes various horns and drums, along with a vocalist. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into eleven component audio signals, one audio signal for the vocalist and one audio signal for each of the instruments in the ten-piece band.
  • the audio input signal includes a song performed by a pop band that has a lead vocalist, two background vocalists, a lead guitar, a bass guitar, keyboards, and drums.
  • Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into seven component audio signals, one audio signal for each of the lead vocalist, the two background vocalists, lead guitar, bass guitar, keyboards, and drums.
  • audio upmix application 152 can merge the two backup vocalists or all three vocalists together into one component audio signal. In some examples, audio upmix application 152 can merge two guitars together into one component audio signal. In some examples, for a drum kit, audio upmix application 152 can merge the various toms together into one component audio signal, or audio upmix application 152 can separate each of the toms into separate signals. Further, audio upmix application 152 can merge the cymbals together into one component audio signal, or audio upmix application 152 can combine the cymbals with other individual drums in various ways to form an arbitrary number of component audio signals for the drum kit.
  • For each component audio signal, audio upmix application 152 maps the component audio signal to one or more audio output devices 126 in any technically feasible combination. In some examples, audio upmix application 152 maps each component audio signal to a different audio output device in a one-to-one mapping. As a result, each audio output device plays a different component audio signal. In some examples, audio upmix application 152 maps two or more component audio signals to the same audio output device in a two-to-one mapping, or a many-to-one mapping. As a result, the audio output device 126 plays a mix of two or more component audio signals.
  • audio upmix application 152 maps a component audio signal to two or more audio output devices in a one-to-two mapping, or a one-to-many mapping. As a result, multiple audio output devices play all or a portion of the same component audio signal. Audio upmix application 152 can employ these mapping techniques in any technically feasible combination, within the scope of this disclosure, as illustrated in the sketch below.
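  • The sketch below captures the three mapping shapes in a simple data structure and inverts it into the per-device mix list each speaker would render; the device and stem names are hypothetical.

```python
from collections import defaultdict

# Hypothetical component-to-devices mapping covering all three cases above.
mapping = {
    "drums":         ["speaker_A"],                # one-to-one
    "keyboard":      ["speaker_C", "speaker_D"],   # one-to-many (shared component)
    "backup_vocals": ["speaker_C"],                # many-to-one with keyboard at C
}

# Invert into per-device mix lists, i.e., what each speaker must play.
per_device = defaultdict(list)
for component, targets in mapping.items():
    for device in targets:
        per_device[device].append(component)

print(dict(per_device))
# {'speaker_A': ['drums'], 'speaker_C': ['keyboard', 'backup_vocals'],
#  'speaker_D': ['keyboard']}
```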
  • Audio upmix application 152, in conjunction with output device manager application 150, transmits each component audio signal to the corresponding audio output devices 126 based on the mapping described herein. In some examples, audio upmix application 152 assigns component audio signals to audio output devices 126 in a random manner.
  • audio upmix application 152 matches the frequency bandwidth and relative volume level of each component audio signal to the frequency bandwidth and maximum loudness of each audio output device 126.
  • audio upmix application 152 can match a component audio signal that includes low frequency audio, such as bass drum, bass guitar, baritone saxophone, tuba, baritone vocals, and/or the like to an audio output device 126 that is suitable for reproducing low frequency audio.
  • audio upmix application 152 can match a component audio signal that includes medium frequency audio, such as lead guitar, French horn, tenor saxophone, snare drum, alto vocals, and/or the like to an audio output device 126 that is suitable for reproducing medium frequency audio.
  • audio upmix application 152 can match a component audio signal that includes high frequency audio, such as trumpet, hi-hat cymbal, soprano vocals, and/or the like to an audio output device 126 that is suitable for reproducing high frequency audio.
  • audio upmix application 152 assigns component audio signals to audio output devices 126 based on the size of the audio output devices 126 and the volume level of the component audio signal. Audio upmix application 152 maps component audio signals that include loud (i.e., high) volume audio to audio output devices 126 that are well suited for reproducing loud volume audio. Similarly, audio upmix application 152 maps component audio signals that include soft volume audio to audio output devices 126 that are well suited for reproducing soft volume audio, such as those that cannot play at high audio output levels.
  • the volume level that a particular audio output device 126 is well suited to accommodate can be determined a priori from product specifications of the audio output device 126, from a user interface that receives user input regarding various audio output devices 126, from metadata associated with the audio output device 126, from measurements of frequency response after generating one or more audio frequency sweeps, and/or the like.
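  • One way to realize this matching, sketched below under assumed device capabilities, is to rank components by spectral centroid and devices by low-frequency cutoff, then pair them so bass-heavy components land on bass-capable devices. The cutoff values and stem names are assumptions for the example.

```python
import numpy as np

def spectral_centroid(signal: np.ndarray, sample_rate: int = 48000) -> float:
    """Amplitude-weighted mean frequency (Hz): a rough measure of whether a
    component is bass-heavy or treble-heavy."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

def match_by_band(components: dict, device_low_cutoff_hz: dict) -> dict:
    # Lowest-centroid component pairs with the device that reaches deepest.
    by_centroid = sorted(components, key=lambda name: spectral_centroid(components[name]))
    by_cutoff = sorted(device_low_cutoff_hz, key=device_low_cutoff_hz.get)
    return dict(zip(by_centroid, by_cutoff))

rng = np.random.default_rng(1)
stems = {"bass_guitar": np.sin(2 * np.pi * 60.0 * np.arange(48000) / 48000.0),
         "hi_hat": rng.standard_normal(48000)}
speakers = {"large_speaker": 40.0, "small_speaker": 120.0}  # assumed cutoffs (Hz)
print(match_by_band(stems, speakers))
# {'bass_guitar': 'large_speaker', 'hi_hat': 'small_speaker'}
```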
  • audio upmix application 152 assigns component audio signals to audio output devices 126 based on a channel assignment received from a graphical user interface.
  • the graphical user interface models the relative location of the audio output devices 126 in the environment.
  • the graphical user interface can be accessed on computing device 180, an input device 122, via I/O device interface 104, and/or the like.
  • the graphical user interface provides for phantom images, whereby audio upmix application 152 generates component audio signals and transmits them to two or more audio output devices to produce the impression that a particular component audio signal originates from a particular location where no audio output device is present. This phantom location can differ from the original image location of the component audio signal at the time the audio input signal was captured, or when the audio input signal was mixed and mastered.
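  • A common way to realize such a phantom image, shown below as a standard audio-engineering technique rather than one stated in this disclosure, is constant-power panning: the component is fed to two speakers with gains whose squares sum to one, which places the perceived image between them.

```python
import numpy as np

def pan_between(component: np.ndarray, position: float):
    """Constant-power pan: position 0.0 places the image at speaker 1,
    1.0 at speaker 2, and 0.5 midway between them. Since cos^2 + sin^2 = 1,
    total radiated power stays constant as the image moves."""
    theta = position * np.pi / 2.0
    return np.cos(theta) * component, np.sin(theta) * component

keyboard = np.sin(2 * np.pi * 440.0 * np.arange(48000) / 48000.0)
feed_1, feed_2 = pan_between(keyboard, 0.5)  # phantom image between the speakers
```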
  • Certain types of recordings such as classical music recordings, chorus recordings, and/or the like, are recorded on a soundstage when the full orchestra and/or chorus is present.
  • audio upmix application 152 assigns component audio signals to audio output devices 126 based on a mapping between the location of the captured sounds from the soundstage at the time of recording relative to the location of the audio output device 126 in the present environment.
  • audio upmix application 152 assigns audio components present in the audio source to audio output devices 126 based at least in part on metadata included in the audio source.
  • This metadata can include information that identifies the separate audio components present in the audio source. Audio upmix application 152 assigns these separate audio components to audio output devices 126 based on this metadata.
  • audio upmix application 152 tracks the audio output devices 126 and adjusts the component audio signals, or the mapping of component audio signals, as audio output devices 126 move within the environment, leave the environment, enter the environment, and/or the like.
  • various components of the coordinated audio system perform the upmix.
  • the upmix is generated locally on computing device 180.
  • audio upmix application 152 transmits the audio input signal to one or more audio output devices 126.
  • each audio output device 126 can perform a localized audio upmix based on at least one of a stereo audio channel received by the audio output device 126, a monaural audio channel received by the audio output device 126, or all audio channels sent to each of the audio output devices.
  • various components of the coordinated audio system perform the upmix via cloud connection from each audio output device 126, from a designated audio output device 126, from computing device 180, and/or the like.
  • audio upmix application 152 transmits component audio signals to each audio output device 126. Additionally or alternatively, audio upmix application 152 transmits component audio signals to a particular audio output device 126. This particular audio output device 126, in turn, transmits component audio signals to each of the other audio output devices 126. Additionally or alternatively, audio upmix application 152 serves as a remote control unit, where audio signals are transmitted directly to each audio output device 126, or to all audio output devices 126.
  • audio upmix application 152 generates component audio signals, where each component audio signal represents a different instrument or voice. Additionally or alternatively, audio upmix application 152 generates component audio signals, where each component audio signal represents a different region of the soundstage, either where the musicians were located during the recording, or at the apparent locations of the musicians resulting from the mixing and mastering process of the recording. Additionally or alternatively, audio upmix application 152 generates component audio signals, where each component audio signal represents a different region of the soundstage, such as an angular region of the soundstage.
  • In a “karaoke” mode, audio upmix application 152 omits a component audio signal from being transmitted to the audio output devices 126.
  • a user can select one or more component audio signals, such as an instrument, voice, or group of instruments or voices, to omit via an input device, such as an interactive graphical user interface implemented on a touchscreen.
  • audio upmix application 152 can omit one or more component audio signals that include vocals so that users can sing along with the audio.
  • audio upmix application 152 can omit one or more component audio signals that include a lead guitar or a bass guitar so that users can play guitar along with the audio.
  • audio upmix application 152 can omit one or more component audio signals that include drums so that users can play drums along with the audio. In some examples, audio upmix application 152 can omit one or more component audio signals that represent various audio processing noises, background noises, artificial or natural room reverb, or artifacts of the source separation process. Various other combinations are possible within the scope of the present disclosure.
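  • A minimal sketch of this omission step, with hypothetical stem names, filters the selected components out before mapping:

```python
def omit_components(components: dict, omitted: set) -> dict:
    """Drop user-selected components before mapping; the rest play normally."""
    return {name: sig for name, sig in components.items() if name not in omitted}

stems = {"vocals": [], "lead_guitar": [], "bass_guitar": [], "drums": []}
sing_along = omit_components(stems, {"vocals"})       # karaoke: user sings
jam_along = omit_components(stems, {"lead_guitar"})   # user plays lead guitar
print(sorted(sing_along))  # ['bass_guitar', 'drums', 'lead_guitar']
```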
  • audio upmix application 152 can guide users, via an interactive graphical user interface, to position audio output devices 126 for accurate soundfield reproduction.
  • audio upmix application 152 can position a component audio signal of a tuba player near the component audio signals playing the brass instruments. In this manner, audio upmix application 152 can match the position of these instruments during the orchestral or brass band recording.
  • the component audio signals can be arranged in a manner totally or partially different than their position during an orchestral recording or their original location on the sound stage resulting from the original recording, mixing, or mastering.
  • audio upmix application 152 generates a room reverberation effect that is inserted into each of the component audio signals in order to increase the sense of envelopment that users experience while listening to the audio.
  • audio upmix application 152 can optionally be informed by measurements of the room that each audio output device 126 determines from the 3D position and/or 3D orientation of the audio output device 126, once the audio output device 126 is placed into position. As a result, so called “dry” rooms can benefit from more added reverberation, while “live” rooms can benefit from less added reverberation.
  • audio upmix application 152 generates the room reverberation effect based on input audio received from microphones placed on or near one or more audio output devices 126.
  • FIG. 2 illustrates a coordinated audio system 200, according to one or more aspects of the various embodiments.
  • Coordinated audio system 200 includes multiple audio output devices 126-1 through 126-N (e.g., speakers) communicatively coupled together via audio device network 162.
  • Coordinated audio system 200 further includes computing device 180, communicatively coupled together via audio device network 162.
  • a computing device 180 is communicatively coupled to zero or more audio output devices 126.
  • Communications in coordinated audio system 200 can use standard and/or proprietary protocols.
  • computing device 180 can communicate with audio output devices 126 using standard protocols (e.g., Bluetooth, Wi-Fi), and audio output devices 126 can communicate with each other using standard or proprietary protocols (e.g., Bluetooth, a proprietary protocol associated with a specific manufacturer).
  • audio output devices 126 that communicate with each other using a proprietary protocol are audio output devices 126 from the same manufacturer (e.g., speakers of the same brand).
  • computing device 180 receives an audio input signal, and separates multiple component audio signals from the audio input signal. For each component audio signal included in the plurality of component audio signals, computing device 180 maps the component audio signal to one or more audio output devices 126. Computing device 180 transmits each component audio signal to corresponding audio output devices 126 based on the mapping. Audio output devices 126 output audio corresponding to the component audio signals from the audio input signal received from computing device 180. In some embodiments, the data corresponding to the component audio signals can be transmitted from the computing device 180 to at least one audio output device 126, and the data can be transmitted amongst audio output devices 126.
  • FIG. 3 illustrates an example of a listening environment 300 for a coordinated audio system, according to one or more aspects of the various embodiments.
  • the listening environment 300 includes four audio output devices 310, 312, 320, and 330.
  • the four audio output devices 310, 312, 320, and 330 play component audio signals corresponding to drums 340, bass guitar 352, lead guitar 350, and vocal/microphone 370, respectively. Therefore, audio output devices 310, 312, 320, and 330 are mapped in a one-to-one relationship with the component audio signals.
  • Listening environment 300 generated by audio output devices 310, 312, 320, and 330 provides a different audio experience depending on where a user is located within listening environment 300.
  • user 380 is located centrally among audio output devices 310, 312, 320, and 330. Therefore, user 380 hears a balanced mix of component audio signals for drums 340, bass guitar 352, lead guitar 350, and vocal/microphone 370.
  • User 382 is located near audio output device 312. Therefore, user 382 hears an increased level of component audio signal for bass guitar 352.
  • User 384 is located near audio output device 330. Therefore, user 384 hears an increased level of the component audio signal for vocals 370.
  • User 386 is located near audio output device 310. Therefore, user 386 hears an increased level of the component audio signal for drums 340.
  • audio upmix application 152 assigns component audio signals to audio output devices based on matching the frequency spectrum of the component audio signals with the frequency operating ranges of the various audio output devices.
  • some audio output devices have a frequency response such that these audio output devices cannot play the lowest notes at the same output level as middle and high frequency notes. As a result, these audio output devices are less suitable for reproducing signals from instruments that have a high amplitude of these low frequency notes.
  • drums 340 and bass guitar 352, which generate audio with a relatively high amplitude in the low frequency spectrum, are assigned to audio output devices 310 and 312, respectively. Audio output devices 310 and 312 are capable of operating with a high amplitude in this low frequency spectral range.
  • Lead guitar 350, which generates output primarily in the midrange frequency spectrum with less output in the low frequency range, is assigned to audio output device 320.
  • Audio output device 320 has a frequency operating range in this midrange frequency spectral range and is unable to output sound of high amplitude in the low frequency range.
  • Vocal/microphone 370, which generates audio with a relatively high frequency spectrum, is assigned to audio output device 330.
  • Audio output device 330 has a frequency operating range in this high frequency spectral range and may not play loudly in the midrange and low frequency range.
  • FIG. 4 illustrates another example of a listening environment 400 for a coordinated audio system, according to one or more aspects of the various embodiments.
  • the listening environment 400 includes eleven audio output devices 410, 412, 414, 420, 422, 424, 426, 430, 432, 434, and 436.
  • the eleven audio output devices 410, 412, 414, 420, 422, 424, 426, 430, 432, 434, and 436 play component audio signals corresponding to various instruments and voices.
  • Audio output device 410 primarily plays component audio signals corresponding to baritone saxophone 460.
  • Audio output device 412 primarily plays component audio signals corresponding to bass drum 444.
  • Audio output device 414 primarily plays component audio signals corresponding to tuba 456.
  • Audio output device 420 primarily plays component audio signals corresponding to snare drum 440.
  • Audio output device 422 primarily plays component audio signals corresponding to tenor saxophone 462.
  • Audio output device 424 primarily plays component audio signals corresponding to snare drum 442.
  • Audio output device 426 primarily plays component audio signals corresponding to French horn 454.
  • Audio output device 430 primarily plays component audio signals corresponding to trumpet 450.
  • Audio output device 432 primarily plays component audio signals corresponding to trombone 452.
  • Audio output device 434 primarily plays component audio signals corresponding to vocal/microphone 470.
  • Audio output device 436 primarily plays component audio signals corresponding to hi-hat 446.
  • Listening environment 400 generated by audio output devices 410, 412, 414, 420, 422, 424, 426, 430, 432, 434, and 436 provides a different audio experience depending on where a user is located within listening environment 400.
  • user 480 is located centrally among the eleven audio output devices 410, 412, 414, 420, 422, 424, 426, 430, 432, 434, and 436. Therefore, user 480 hears a balanced mix of component audio signals for all instruments and vocals.
  • User 482 is located near audio output devices 410, 422, and 424.
  • user 482 primarily hears the component audio signals for baritone saxophone 460, tenor saxophone 462, and snare drum 442, along with a lower level of each of the remaining component audio signals.
  • User 484 is located near audio output devices 426 and 434.
  • user 484 primarily hears the component audio signal for French horn 454 and lead vocals 470, along with a lower level of each of the remaining component audio signals.
  • audio upmix application 152 assigns component audio signals to audio output devices based on matching the frequency spectrum of the component audio signals with the frequency operating ranges of the various audio output devices.
  • some audio output devices have a frequency response such that these audio output devices cannot play the lowest notes at the same output level as middle and high frequency notes. As a result, these audio output devices are less suitable for reproducing signals from instruments that have a high amplitude of these low frequency notes.
  • baritone saxophone 460, bass drum 444, and tuba 456, which generate audio with a relatively high amplitude in the low frequency spectrum, are assigned to audio output devices 410, 412, and 414, respectively.
  • Audio output devices 410, 412, and 414 are capable of operating with a high amplitude in this low frequency spectral range.
  • Snare drum 440, tenor saxophone 462, snare drum 442, and French horn 454, which generate output primarily in the midrange frequency spectrum with less output in the low frequency range, are assigned to audio output devices 420, 422, 424, and 426, respectively.
  • Audio output devices 420, 422, 424, and 426 have a frequency operating range in this midrange frequency spectral range and are unable to output sound of high amplitude in the low frequency range.
  • Trumpet 450, trombone 452, vocal/microphone 470, and hi-hat 446, which generate audio with a relatively high frequency spectrum, are assigned to audio output devices 430, 432, 434, and 436, respectively. Audio output devices 430, 432, 434, and 436 have a frequency operating range in this high frequency spectral range, and lack the ability to play sound at high amplitude in the low frequency range.
  • FIG. 5 illustrates yet another example of a listening environment 500 for a coordinated audio system, according to one or more aspects of the various embodiments.
  • the listening environment 500 includes four audio output devices 510, 512, 520, and 530.
  • the four audio output devices 510, 512, 520, and 530 play component audio signals corresponding to various instruments and voices.
  • Audio output device 510 primarily plays component audio signals corresponding to drums 540.
  • Audio output device 512 primarily plays component audio signals corresponding to a mix of bass guitar 552 and lead guitar 550.
  • Audio output device 520 primarily plays component audio signals corresponding to a mix of keyboard 554 and two backup vocals 572 and 574.
  • Audio output device 530 primarily plays component audio signals corresponding to lead vocal/microphone 570.
  • Listening environment 500 generated by audio output devices 510, 512, 520, and 530 provides a different audio experience depending on where a user is located within listening environment 500.
  • user 580 is located centrally among audio output devices 510, 512, 520, and 530. Therefore, user 580 hears a balanced mix of component audio signals for drums 540, bass guitar 552, lead guitar 550, keyboard 554, lead vocal/microphone 570, and backup vocals/microphones 572 and 574.
  • User 582 is located near audio output device 512. Therefore, user 582 primarily hears the component audio signals for bass guitar 552 and lead guitar 550, along with a lower level of each of the remaining component audio signals.
  • User 584 is located near audio output device 530.
  • user 584 primarily hears the component audio signal for lead vocals 570, along with a lower level of each of the remaining component audio signals.
  • User 586 is located near audio output device 510. Therefore, user 586 primarily hears the component audio signal for drums 540, along with a lower level of each of the remaining component audio signals.
  • User 588 is located near audio output device 520. Therefore, user 588 primarily hears the component audio signal for the keyboard 554 and two backup vocals 572 and 574, along with a lower level of each of the remaining component audio signals.
  • the component audio signal for keyboard 554 is played from audio output devices 520 and 530, and user 588 hears a phantom image of the keyboard at a location between audio output devices 520 and 530, where no audio output device is physically located.
  • audio upmix application 152 assigns component audio signals to audio output devices based on matching the frequency spectrum of the component audio signals with the frequency operating ranges of the various audio output devices.
  • some audio output devices have a frequency response such that these audio output devices cannot play the lowest notes at the same output level as middle and high frequency notes. As a result, these audio output devices are less suitable for reproducing signals from instruments that have a high amplitude of these low frequency notes.
  • drums 540, which generates audio with a relatively high amplitude in the low frequency spectrum, is assigned to audio output device 510. Audio output device 510 is capable of operating with a high amplitude in this low frequency spectral range.
  • bass guitar 552 and lead guitar 550, which also generate audio with a relatively high amplitude in the low frequency spectrum, are assigned to audio output device 512.
  • Audio output device 512 is capable of operating with a high amplitude in this low frequency spectral range.
  • Keyboard 554 and the two backup vocals 572 and 574, which generate output primarily in the midrange frequency spectrum with less output in the low frequency range, are assigned to audio output device 520.
  • Audio output device 520 has a frequency operating range in this midrange frequency spectral range, and is unable to output sound of high amplitude in the low frequency range.
  • Lead vocal/microphone 570, which generates audio with a relatively high frequency spectrum, is assigned to audio output device 530.
  • Audio output device 530 has a frequency operating range in this high frequency spectral range, and lacks the ability to play sound at high amplitude in the low frequency range.
  • FIG. 6 is a flow chart of method steps for generating a set of audio streams for a coordinated audio system, according to one or more aspects of the various embodiments. Although the method steps are described with respect to the systems and examples of FIGs. 1-5, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.
  • a method 600 begins at step 602, where an audio upmix application 152 executing on a computing device 180 locates audio output devices 126 in a listening environment.
  • Audio upmix application 152 locates audio output devices 126 that are currently reachable by computing device 180 over one or more wired and/or wireless networks.
  • audio upmix application 152 includes audio output devices 126 that have recently entered a network that is accessible by computing device 180.
  • audio upmix application 152 excludes audio output devices 126 that have recently exited a network that is accessible by computing device 180.
  • audio upmix application 152 adapts to a current set of audio output devices 126 as these audio output devices 126 enter, exit, and move within the listening environment.
  • audio upmix application 152 receives one or more audio input signals.
  • audio upmix application 152 receives the audio input signal from an input device 122.
  • audio upmix application 152 retrieves data representing the audio input signal from database 142 stored in memory 116, from storage 114, and/or the like.
  • the audio input signal is a stereophonic audio signal that includes a left audio channel and a right audio channel.
  • the audio input signal is a monaural audio signal with a single channel.
  • the audio input signal is a multichannel encoded audio signal that includes more than two channels, such as four channels, six channels, eight channels, and so on. These multichannel encoded audio signals include quadraphonic audio, DVD-Audio, Super Audio CD, Dolby Atmos, and/or the like.
  • audio upmix application 152 separates a plurality of component audio signals from the one or more audio input signals.
  • audio upmix application 152 performs a signal separation process on an audio input signal, such as a stereo audio signal, to separate multiple component audio signals from the audio input signal.
  • the audio input signal includes a song performed by a rock band that has a vocalist, a lead guitar, a bass guitar, and drums. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into four component audio signals, one audio signal for each of the vocalist, lead guitar, bass guitar, and drums.
  • the audio input signal includes a song performed by a ten-piece jazz band that includes various horns and drums, along with a vocalist.
  • Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into eleven component audio signals, one audio signal for the vocalist and one audio signal for each of the instruments in the ten-piece band.
  • the audio input signal includes a song performed by a pop band that has a lead vocalist, two background vocalists, a lead guitar, a bass guitar, keyboards, and drums. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into seven component audio signals, one audio signal for each of the lead vocalist, the two background vocalists, lead guitar, bass guitar, keyboards, and drums.
  • audio upmix application 152 maps the component audio signals included in a subset of the component audio signals to one or more audio output devices 126 in the listening environment.
  • the subset of component audio signals includes two or more, up to and including all, of the component audio signals included in the plurality of component audio signals of step 606.
  • audio upmix application 152 maps each component audio signal included in the subset of component audio signals to a different audio output device 126 in a one-to-one mapping. As a result, each audio output device plays a different component audio signal.
  • audio upmix application 152 maps two or more component audio signals to the same audio output device 126 in a two-to-one mapping, or a many-to-one mapping.
  • audio output device 126 plays a mix of two or more component audio signals.
  • audio upmix application 152 maps a component audio signal to two or more audio output devices 126 in a one-to-two mapping, or a one-to-many mapping. As a result, multiple audio output devices 126 play a portion of the same component audio signal. Audio upmix application 152 can employ these mapping techniques in any technically feasible combination. In some examples, a component audio signal is omitted from playback, and is not mapped to any audio output device 126, such as in the karaoke use case.
  • audio upmix application 152 transmits the component audio signals included in the subset of component audio signals to corresponding audio output devices 126 based on the mapping determined in step 608.
  • Audio output devices 126 output audio corresponding to the component audio signals from the audio input signal received from audio upmix application 152.
  • the data corresponding to the component audio signals can be transmitted from audio upmix application 152 to at least one audio output device 126, and the data can be transmitted amongst audio output devices 126.
  • the method 600 then returns to step 602 to locate audio output devices 126 currently in the listening environment. In this manner, the method 600 dynamically changes the upmix as audio output devices move within the listening environment, as audio output devices leave the network, as new audio output devices enter the network, and so on.
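  • The loop structure of method 600 can be sketched as below; device discovery is simulated so the example is self-contained, and all names are illustrative stand-ins for the network- and signal-processing-specific pieces.

```python
# Simulated two passes of the method 600 loop: the device set changes between
# passes, and the mapping adapts accordingly.
def discover_devices(pass_number: int) -> list:
    base = ["speaker_A", "speaker_B"]
    return base + (["speaker_C"] if pass_number >= 1 else [])  # C joins later

stems = ["vocals", "guitar", "drums"]
for pass_number in range(2):
    devices = discover_devices(pass_number)            # step 602: locate devices
    # Steps 604-606 (receive and separate) are assumed done; map at step 608:
    mapping = {s: devices[i % len(devices)] for i, s in enumerate(stems)}
    print(f"pass {pass_number}: {mapping}")            # step 610 would transmit
# pass 0: {'vocals': 'speaker_A', 'guitar': 'speaker_B', 'drums': 'speaker_A'}
# pass 1: {'vocals': 'speaker_A', 'guitar': 'speaker_B', 'drums': 'speaker_C'}
```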
  • a computing device and multiple audio output devices are coupled together to form a coordinated audio system.
  • the computing device decomposes a received audio signal, such as a music signal, into individual audio streams based on certain criteria.
  • the disclosed techniques generate a customizable upmix consisting of multiple audio streams in real time, where each audio stream is transmitted to one or more audio output devices.
  • the computing device generates a different audio stream for each instrument and voice present in the received audio signal.
  • Each of these individual audio streams is then played back on one or more audio output devices to generate an immersive audio field between and among these audio output devices.
  • the user hears a different custom mix with a different balance of instruments and voices, depending on the relative distance of the user to each of the audio output devices.
  • the user can move to different locations to hear all instruments and voices in balance, or can move to various locations where one or more of the individual audio streams is dominant.
  • each user experiences a different mix of the individual audio streams depending on the location and orientation of the user within the audio field generated by the audio output devices.
  • users enjoy a more interactive and immersive listening experience relative to conventional techniques.
  • At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, multiple audio output devices, such as personal speakers, can be deployed in party mode without generating undesirable combing effects, as is typical with conventional techniques.
  • Another technical advantage of the disclosed techniques relative to the prior art is that each user within the environment can have a different audio experience based on the location of the user relative to the locations of the multiple audio output devices.
  • the upmix transmitted to the multiple audio output devices is changed dynamically based on which speakers are in the network, which source audio is being used, and/or the like. More particularly, the techniques dynamically change the upmix as audio output devices move within the environment, as audio output devices leave the network, as new audio output devices enter the network, and so on. In this manner, the upmix is adapted to generate a suitable soundfield as changes in the audio output devices occur over time. As a result, users enjoy a more interactive and immersive listening experience relative to conventional techniques.
  • a computer-implemented method for generating audio signals in an audio system comprises: receiving an audio input signal; separating a plurality of component audio signals from the audio input signal; for a subset of component audio signals included in the plurality of component audio signals, mapping each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
  • mapping is based on at least one of a frequency spectrum or a volume of a first component audio signal included in the plurality of component audio signals.
  • mapping is based on a virtual location of a first component audio signal included in the plurality of component audio signals, wherein the virtual location is determined when the plurality of component audio signals was mixed and mastered.
  • One or more non-transitory computer-readable storage media include instructions that, when executed by one or more processors at a first computing device, cause the one or more processors to perform steps of: receiving an audio input signal; separating a plurality of component audio signals from the audio input signal; for a subset of component audio signals included in the plurality of component audio signals, mapping each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
  • each component audio signal included in the plurality of component audio signals comprises a different instrument, voice, or group of instruments or voices included in the audio input signal.
  • mapping is based on at least one of a frequency spectrum or a volume of a first component audio signal included in the plurality of component audio signals.
  • mapping the component audio signal to one or more of a plurality of audio output devices is based on an input received from a user interface.
  • a computing device comprises: a memory storing an application; and one or more processors that, when executing the application, are configured to: receive an audio input signal, separate a plurality of component audio signals from the audio input signal, for a subset of component audio signals included in the plurality of component audio signals, map each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices, and transmit each component audio signal to the corresponding one or more audio output devices based on the mapping.
  • aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module,” a “system,” or a “computer.” In addition, any hardware and/or software technique, process, function, component, engine, module, or system described in the present disclosure may be implemented as a circuit or set of circuits. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Stereophonic System (AREA)

Abstract

The present disclosure includes computer-implemented techniques for generating audio signals in an audio system. The audio system includes a CPU executing an audio upmix application. The audio system receives an audio input signal. The audio system separates a plurality of component audio signals from the audio input signal. For each component audio signal included in the plurality of component audio signals, the audio system maps the component audio signal to one or more of a plurality of audio output devices. The audio system transmits each component audio signal to the corresponding one or more audio output devices based on the mapping.

Description

DYNAMIC AUDIO MIXING IN A MULTIPLE WIRELESS SPEAKER ENVIRONMENT
BACKGROUND
Field of the Various Embodiments
[0001] The various embodiments relate generally to audio systems, and more specifically, to techniques for dynamic audio mixing in a multiple wireless speaker environment.
Description of the Related Art
[0002] With the proliferation of mobile devices (e.g., smart phones, tablets, and/or the like), demand for portable audio output devices, also referred to herein as personal speakers, has also increased. Such audio output devices allow a listener, also referred to herein as a user, to have a speaker for enjoying audio content on the go that provides better audio quality than the speaker(s) included in a mobile device. A feature of audio output devices that is growing in popularity is a “party mode,” in which multiple audio output devices can communicatively couple together to form an ad-hoc network of speakers that synchronizes together and outputs audio content as one speaker system. The party mode can be coordinated and controlled via a mobile device. Content is output from the mobile device to the network of speakers.
[0003] Typically, audio output devices support stereo output via one or more left speakers and one or more right speakers that are separated by approximately 10-20 cm. While this amount of speaker separation may be adequate for a small number of users listening to a single audio output device, an issue arises when multiple such audio output devices are employed in party mode. In one example, ten or more audio output devices are deployed in party mode and placed at various locations in a listening environment, where each of the audio output devices receives the same stereo signal. Each audio output device plays the left channel of the stereo signal on the left speaker(s) and the right channel of the stereo signal on the right speaker(s). At certain positions in the listening environment, a user may hear multiple left channel audio signals from multiple audio output devices and may also hear multiple right channel audio signals from the same and/or different audio output devices. Due to different travel distances and orientations, the various left channel audio signals arrive at the user at different points of time. Depending on the time differences, some portions of the left channel audio signals may augment each other, causing a volume increase, while other portions of the left channel audio signals may diminish each other, causing a volume decrease. Likewise, the various right channel audio signals arrive at the user at different points of time, potentially with different arrival times than for the left channel audio signals. As a result, the user may perceive portions of the left channel audio as having a lower volume or a higher volume than portions of the right channel audio. This phenomenon, referred to as a combing effect, can lead to an undesirable listening experience.
[0004] As the foregoing illustrates, what is needed are more effective techniques for generating audio signals for output by an audio system having multiple audio output devices.
SUMMARY
[0005] Various embodiments of the present disclosure set forth a computer-implemented method for generating audio signals in an audio system. The method includes receiving an audio input signal. The method further includes separating a plurality of component audio signals from the audio input signal. The method further includes, for a subset of component audio signals included in the plurality of component audio signals, mapping each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices. The method further includes transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
[0006] Further embodiments provide, among other things, one or more non-transitory computer-readable media and systems configured to implement the method set forth above.
[0007] At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, multiple audio output devices, such as personal speakers, can be deployed in party mode without generating undesirable combing effects, as is typical with conventional techniques. Another technical advantage of the disclosed techniques relative to the prior art is that each user within the environment can have a different audio experience based on the location of the user relative to the locations of the multiple audio output devices. Further, the upmix transmitted to the multiple audio output devices is changed dynamically based on which speakers are in the network, which source audio is being used, and/or the like. More particularly, the techniques dynamically change the upmix as audio output devices move within the environment, as audio output devices leave the network, as new audio output devices enter the network, and so on. In this manner, the upmix is adapted to generate a suitable soundfield as changes in the audio output devices and source audio occur over time. As a result, users enjoy a more interactive and immersive listening experience relative to conventional techniques. These technical advantages provide one or more technological improvements over prior art approaches.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] So that the manner in which the above recited features of the various embodiments can be understood in detail, a more particular description of the inventive concepts, briefly summarized above, may be had by reference to various embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of the inventive concepts and are therefore not to be considered limiting of scope in any way, and that there are other equally effective embodiments.
[0009] FIG. 1 is a block diagram of a computer system configured to implement one or more aspects of the various embodiments;
[0010] FIG. 2 illustrates a coordinated audio system, according to one or more aspects of the various embodiments;
[0011] FIG. 3 illustrates an example of a listening environment for a coordinated audio system, according to one or more aspects of the various embodiments;
[0012] FIG. 4 illustrates another example of a listening environment for a coordinated audio system, according to one or more aspects of the various embodiments;
[0013] FIG. 5 illustrates yet another example of a listening environment for a coordinated audio system, according to one or more aspects of the various embodiments; and
[0014] FIG. 6 is a flow chart of method steps for generating a set of audio streams for a coordinated audio system, according to one or more aspects of the various embodiments.
DETAILED DESCRIPTION
[0015] In the following description, numerous specific details are set forth to provide a more thorough understanding of the various embodiments. However, it will be apparent to one skilled in the art that the inventive concepts may be practiced without one or more of these specific details.
[0016] FIG. 1 illustrates a computer system 100 configured to implement one or more aspects of the various embodiments. As shown, computer system 100 includes, without limitation, computing device(s) 180, input devices 122, output devices 124, audio output device(s) 126, network(s) 160, audio device network 162, and media content services 170. Computing device 180 includes, without limitation, one or more processing units 102, I/O device interface 104, network interface 106, interconnect 112 (e.g., a bus), storage 114, and memory 116. Memory 116 stores, without limitation, output device manager application 150 and audio upmix application 152. Processing unit(s) 102 and memory 116 can be implemented in any technically feasible fashion. For example, and without limitation, in various embodiments, any combination of processing unit(s) 102 and memory 116 can be implemented as a stand-alone chip or as part of a more comprehensive solution that is implemented as an application-specific integrated circuit (ASIC), a system-on-a-chip (SoC), and/or the like. Processing unit(s) 102, I/O device interface 104, network interface 106, storage 114, and memory 116 can be communicatively coupled to each other via interconnect 112.
[0017] The one or more processing unit(s) 102 can include any suitable processor, such as a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a tensor processing unit (TPU), any other type of processing unit, or a combination of multiple processing units, such as a CPU configured to operate in conjunction with a GPU. In general, each of the one or more processing unit(s) 102 can be any technically feasible hardware unit capable of processing data and/or executing software applications and modules.
[0018] Storage 114 can include non-volatile storage for applications, software modules, and data, and can include fixed or removable disk drives, flash memory devices, and CD-ROM, DVD-ROM, Blu-Ray, HD-DVD, or other magnetic, optical, or solid state storage devices, and/or the like. Storage 114 can be fully or partially located in a remote storage system, referred to herein as “the cloud,” and accessed through connections such as network 160.
[0019] Memory 116 can include a random access memory (RAM) module, a flash memory unit, or any other type of memory unit or combination thereof. The one or more processing unit(s) 102, I/O device interface 104, and network interface 106 are configured to read data from and write data to memory 116. Memory 116 includes various software programs and modules (e.g., an operating system, one or more applications) that can be executed by processing unit(s) 102 and application data (e.g., data loaded from storage 114) associated with said software programs.
[0020] In some embodiments, one or more databases 142 are loaded from storage 114 into memory 116. Databases 142 include application, user data, media content, etc. that are associated with one or more applications that can be executed by processing unit(s) 102.
[0021] In some embodiments, computing device 180 is communicatively coupled to one or more networks 160. Network(s) 160 can be any technically feasible type of communications network that allows data to be exchanged between computing device 180 and other systems or devices, such as a server, a cloud computing system, or other networked computing device or system (e.g., media content service(s) 170). For example, network 160 can include a wide area network (WAN), a local area network (LAN), a wireless network (e.g., a Wi-Fi network, a cellular data network, an ad-hoc network), and/or the Internet, among others. Computing device 180 can connect with network(s) 160 via network interface 106. In some embodiments, network interface 106 is hardware, software, or a combination of hardware and software, that is configured to connect to and interface with network(s) 160. In some embodiments, network interface 106 facilitates communication with other devices or systems via one or more standard and/or proprietary protocols (e.g., Bluetooth, a proprietary protocol associated with a specific manufacturer, etc.).
[0022] Media content service(s) 170 includes one or more computerized services configured to provide (e.g., distribute) media content to devices (e.g., to computing devices 180). Examples of media content service(s) 170 include Spotify, Apple Music, Pandora, YouTube Music, Tidal, and/or the like. Media or media content, as used herein, includes, without limitation, audio content (e.g., spoken and/or musical audio content, audio content files, streaming audio content, audio track of a video, and/or the like) and/or video content. Examples of media content service(s) 170 include, without limitation, media content streaming services, YouTube, digital media content sellers, media servers (local and/or remote), and/or the like. More generally, media content services 170 include one or more computer systems (e.g., a server, a cloud computing system, a networked computing system, a distributed computing system, etc.) for storing and distributing media content. Computing device 180 can communicatively couple with media content service(s) 170 via network(s) 160 and download and/or stream media content from media content services 170.
[0023] Input devices 122 include one or more devices capable of providing input. Examples of input devices 122 include, without limitation, a touch-sensitive surface (e.g., a touchpad), a microphone, a touch-sensitive screen, buttons, knobs, dials, a keyboard, a pointing device (e.g., a mouse), and/or the like. Output devices 124 include one or more devices capable of providing output. Examples of output devices 124 include, without limitation, a display device, haptic devices, and/or the like. Examples of display devices include, without limitation, LCD, LED, OLED, and AMOLED displays, touch-sensitive displays, transparent displays, projection systems, and/or the like. Additionally, input devices 122 and/or output devices 124 may include devices capable of both receiving input and providing output, such as a touch-sensitive display, and/or the like.
[0024] Audio output device(s) 126 (e.g., audio output devices 126-1, 126-2, . . . 126-N) include one or more devices capable of outputting sound. Audio output device(s) 126 include, without limitation, portable speakers, bone conduction speakers, shoulder worn and shoulder mounted headphones, around-neck speakers, and/or the like. In some embodiments, an audio output device 126 can be coupled to computing device 180 via I/O device interface 104 and/or network interface 106 by wire or wirelessly in any technically feasible manner (e.g., Universal Serial Bus (USB), Bluetooth, ad hoc Wi-Fi).
[0025] In various embodiments, audio output device(s) 126 also include computing, communications, and/or networking capability. For example, an audio output device 126 can also include one or more processing units similar to processing unit(s) 102, memory and/or storage, and a network interface similar to network interface 106. An audio output device 126 can communicatively couple with one or more other audio output devices 126 and/or with computing device 180 via the network interface, and optionally store data.
[0026] In various embodiments, multiple audio output devices 126 can communicatively couple with each other and/or the computing device 180 to form a coordinated audio system. A coordinated audio system, as used herein, is an ad hoc network of audio output devices 126 communicatively coupled with each other and computing device 180 via an audio device network 162. Audio device network 162 is typically a wireless network such as a Wi-Fi network, an ad-hoc Wi-Fi network, a Bluetooth network, and/or the like. In some embodiments, the audio output devices 126 in the coordinated audio system operate in a “party mode.”
[0027] In the coordinated audio system, audio output devices 126 output media content received from a computing device 180 via audio device network 162. The audio output devices 126 output the media content in a synchronized or near-synchronized manner. In some embodiments, the coordinated audio system is initiated from a computing device 180 via output device manager application 150. For example, while computing device 180 is communicatively coupled to audio output device 126, computing device 180 can send a media content item to audio output devices 126 to be synchronously output by audio output devices 126. The coordinated audio system is described in further detail in FIG. 2.
[0028] Memory 116 includes an output device manager application 150 and one or more audio upmix applications 152. Output device manager application 150 and audio upmix application 152 are stored in and loaded into memory 116 from storage 114. In some examples, audio upmix application 152 can also be loaded from the cloud, and/or executed in the cloud rather than executing locally on processing unit(s) 102. In operation, audio upmix application 152 outputs (e.g., decodes for playback) locally stored media content (e.g., stored in storage 114) and/or media content from media content services 170 via audio output device 126 and/or output device(s) 124. Audio upmix application 152 also can communicate with media content service(s) 170 to obtain (e.g., purchase, rent, and/or subscribe to download, stream, or otherwise retrieve) media content for output and/or storage at computing device 180.
[0029] Output device manager application 150 facilitates management of audio output devices 126. In some examples, portions or all of output device manager application 150 can execute on one or more audio output devices 126. While computing device 180 is communicatively coupled to audio output devices 126, a user can, via output device manager application 150, perform management functions for audio output devices 126, including but not limited to monitoring a status (e.g., battery level, volume level, firmware version, etc.) of audio output devices 126, configuring settings of audio output devices 126, updating a firmware of audio output devices 126, and/or the like. In some embodiments, output device manager application 150 can interface with audio upmix application 152 (e.g., via an application programming interface (API)) to cause or otherwise facilitate the sending of media content by audio upmix application 152 to audio output devices 126 and/or to obtain data associated with audio upmix application 152, including but not limited to media content library information.
[0030] Further, in some embodiments, output device manager application 150 facilitates creation and management of the coordinated audio system. A user can, via output device manager application 150, command an audio output device 126 to join the coordinated audio system, thereby creating the coordinated audio system. In some examples, audio output devices 126 that have previously been part of the coordinated audio system can automatically rejoin the coordinated audio system simply by powering on. In some examples, a nearby audio output device 126 can automatically join the coordinated audio system when the nearby audio output device 126 is brought in communicative proximity or when powered up in proximity to the coordinated audio system. Within output device manager application 150, the user can configure an individual audio output device 126 and/or configure the coordinated audio system, modify (e.g., add or remove audio output devices 126) or terminate the coordinated audio system, and perform other management functions associated with the coordinated audio system. Further, output device manager application 150 can generate a playlist of media contents to be sent by audio upmix application 152, whose output is sent to audio output devices 126.
[0031] In operation, audio upmix application 152 receives an audio input signal. In some examples, audio upmix application 152 receives the audio input signal from an input device 122. In some examples, audio upmix application 152 retrieves data representing the audio input signal from database 142 stored in memory 116, from storage 114, and/or the like. Typically, the audio input signal is a stereophonic audio signal that includes a left audio channel and a right audio channel. Alternatively, the audio input signal is a monaural audio signal with a single channel. In some examples, the audio input signal is a multichannel encoded audio signal that includes more than two channels, such as four channels, six channels, eight channels, and so on. These multichannel encoded audio signals include quadrophonic audio, DVD-audio, Super Audio CD, Dolby Atmos, and/or the like. Regardless of the format of the input audio signal, audio upmix application 152 generates a custom and dynamic upmix that includes multiple component audio signals and transmits those component audio signals to audio output devices 126, as described herein. In some examples, audio upmix application 152 executes on a cloud computing resource, and the output of audio upmix application 152 is transmitted to computing device 180 and then to audio output devices 126. Additionally or alternatively, the output of audio upmix application 152 bypasses computing device 180 and is transmitted directly to audio output devices 126. In this manner, audio upmix application 152 generates a more immersive acoustic environment, referred to as a soundfield, when streaming and transmitting audio signals to the multiple audio output devices 126. Further, audio upmix application 152 generates audio output signals that reduce or eliminate undesirable combing artifacts associated with conventional techniques. In that regard, audio upmix application 152 can generate the component audio signal in any technically feasible manner.
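To make this flow concrete, the following is a minimal Python sketch of the receive-separate-map-transmit pipeline described above. All names here (Device, separate_components, map_to_devices, upmix_pipeline) are hypothetical illustrations rather than the disclosed application's API, and the mid/side decomposition is only a placeholder for the signal separation techniques discussed below.

    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class Device:
        name: str
        received: list = field(default_factory=list)
        def send(self, signal: np.ndarray) -> None:
            self.received.append(signal)  # stand-in for wireless transmission

    def separate_components(audio: np.ndarray) -> dict:
        # Placeholder: mid/side decomposition of a (2, N) stereo array. A real
        # system would use blind source separation, as discussed in [0033].
        left, right = audio
        return {"mid": 0.5 * (left + right), "side": 0.5 * (left - right)}

    def map_to_devices(components: dict, devices: list) -> dict:
        # Placeholder: round-robin one-to-one mapping.
        return {name: [devices[i % len(devices)]]
                for i, name in enumerate(components)}

    def upmix_pipeline(audio: np.ndarray, devices: list) -> None:
        components = separate_components(audio)
        mapping = map_to_devices(components, devices)
        for name, targets in mapping.items():
            for dev in targets:
                dev.send(components[name])

In this sketch, a caller with a (2, N) stereo array and a list of Device objects would invoke upmix_pipeline(audio, devices); each Device then holds the component signal(s) it would play.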
[0032] One way of mitigating this combing effect is for audio upmix application 152 to transmit the left channel audio signal to one audio output device that is positioned on one side of the listening environment, such that the left channel audio signal is sent to both the left and right speakers of the audio output device. Similarly, audio upmix application 152 transmits the right channel audio signal to another audio output device that is positioned on the other side of the listening environment, such that the right channel audio signal is sent to both the left and right speakers of the audio output device. While this technique can reduce combing effects, this approach does not scale to systems that include more than two audio output devices.
[0033] Additionally or alternatively, to generate the custom upmix for the audio output devices 126, audio upmix application 152 performs a signal separation process, also referred to herein as a blind source separation process, on the audio input signal. Audio upmix application 152 can perform other source separation techniques known to those of ordinary skill in the art to derive component audio signals from audio input signals. Signal separation is functionally equivalent to or otherwise known as source separation, blind signal separation, or blind source separation. Approaches can include principal component analysis, independent component analysis, independent vector analysis, nonnegative matrix factorization, independent low-rank matrix analysis, or the like. Any of the various source separation techniques can be performed using classical signal processing techniques or using machine learning techniques, deep learning techniques, artificial intelligence approaches, and/or the like. In some examples, signal separation separates a desired audio signal from an audio input signal that also includes additional signals. In that regard, signal separation can be employed in a cellphone to reduce or eliminate background noise, such as from HVAC noise, traffic sounds, background voices, and/or the like, and maintain the speech audio from the user of the cellphone. Further, signal separation can be employed in a hearing aid to maintain the speech audio from one person and reduce or eliminate speech audio from other nearby persons. Audio upmix application 152 performs a signal separation process on an audio input signal, such as a stereo audio signal, to separate multiple component audio signals from the audio input signal.
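As one concrete illustration of the separation approaches named above, the following sketch applies nonnegative matrix factorization to a magnitude spectrogram and reconstructs one time-domain signal per NMF basis via soft masking. This is a simplified stand-in rather than the application's actual separator: grouping NMF bases into actual instruments generally requires a further clustering or learning step, and production separators are typically learned models.

    import numpy as np
    from scipy.signal import stft, istft
    from sklearn.decomposition import NMF

    def separate_nmf(mono, fs, n_components=4):
        # Short-time Fourier transform of the (mono) input signal.
        f, t, Z = stft(mono, fs=fs, nperseg=1024)
        mag = np.abs(Z)
        # Factor the magnitude spectrogram into spectral templates (W)
        # and time activations (H).
        model = NMF(n_components=n_components, init="random",
                    random_state=0, max_iter=400)
        W = model.fit_transform(mag)   # (freq_bins, n_components)
        H = model.components_          # (n_components, time_frames)
        full = W @ H + 1e-10
        stems = []
        for k in range(n_components):
            # Wiener-style soft mask for component k, applied to the complex
            # spectrogram, then inverted back to the time domain.
            mask = np.outer(W[:, k], H[k]) / full
            _, x_k = istft(mask * Z, fs=fs)
            stems.append(x_k)
        return stems   # one component audio signal per NMF basis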
[0034] In some examples, the audio input signal includes a song performed by a rock band that has a vocalist, a lead guitar, a bass guitar, and drums. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into four component audio signals, one audio signal for each of the vocalist, lead guitar, bass guitar, and drums. In some examples, the audio input signal includes a song performed by a ten-piece jazz band that includes various horns and drums, along with a vocalist. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into eleven component audio signals, one audio signal for the vocalist and one audio signal for each of the instruments in the ten-piece band. In some examples, the audio input signal includes a song performed by a pop band that has a lead vocalist, two background vocalists, a lead guitar, a bass guitar, keyboards, and drums. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into seven component audio signals, one audio signal for each of the lead vocalist, the two background vocalists, lead guitar, bass guitar, keyboards, and drums.
[0035] Some component audio signals are more challenging for source separators to separate fully without errors, such as when the component audio signals have very similar characteristics. In some examples, audio upmix application 152 can merge the two backup vocalists or all three vocalists together into one component audio signal. In some examples, audio upmix application 152 can merge two guitars together into one component audio signal. In some examples, for a drum kit, audio upmix application 152 can merge the various toms together into one component audio signal, or audio upmix application 152 can separate each of the toms into separate signals. Further, audio upmix application 152 can merge the cymbals together into one component audio signal, or audio upmix application 152 can combine the cymbals with other individual drums in various ways to form an arbitrary number of component audio signals for the drum kit.
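Merging such hard-to-separate stems reduces, in the simplest case, to summing their time-domain signals. A small sketch follows; the stem names are hypothetical:

    import numpy as np

    def merge_components(stems, groups):
        # stems: dict of name -> equal-length numpy arrays.
        # groups: dict of new name -> list of stem names to absorb.
        merged = dict(stems)
        for new_name, members in groups.items():
            merged[new_name] = np.sum([merged.pop(m) for m in members], axis=0)
        return merged

    # e.g., fold the lead and backup vocals into a single component:
    # merge_components(stems, {"vocals": ["lead_vox", "backup_vox_1", "backup_vox_2"]})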
[0036] For each component audio signal, audio upmix application 152 maps the component audio signal to one or more audio output devices 126 in any technically feasible combination. In some examples, audio upmix application 152 maps each component audio signal to a different audio output device in a one-to-one mapping. As a result, each audio output device plays a different component audio signal. In some examples, audio upmix application 152 maps two or more component audio signals to the same audio output device in a two-to-one mapping, or a many-to-one mapping. As a result, the audio output device 126 plays a mix of two or more component audio signals. In some examples, audio upmix application 152 maps a component audio signal to two or more audio output devices in a one-to-two mapping, or a one-to-many mapping. As a result, multiple audio output devices play all or a portion of the same component audio signal. Audio upmix application 152 can employ these mapping techniques in any technically feasible combination, within the scope of this disclosure.
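One simple data structure that expresses all of these cases uniformly is a table from component name to a list of target devices: a single-entry list is a one-to-one mapping, a device appearing in several entries yields a many-to-one mix, a multi-entry list is one-to-many, and an empty list omits the component (the karaoke case noted in [0045]). The component and device names below are illustrative only.

    mapping = {
        "drums":       ["speaker_1"],               # one-to-one
        "bass_guitar": ["speaker_2"],
        "lead_guitar": ["speaker_2"],               # many-to-one: shares speaker_2
        "keyboard":    ["speaker_3", "speaker_4"],  # one-to-many (phantom image)
        "vocals":      [],                          # omitted from playback
    }

    def signals_for_device(device, mapping):
        # Everything a given speaker should mix and play.
        return [name for name, targets in mapping.items() if device in targets]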
[0037] Audio upmix application 152, in conjunction with output device manager application 150, transmits each component audio signal to the corresponding audio output devices 126 based on the mapping described herein. In some examples, audio upmix application 152 assigns component audio signals to audio output devices 126 in a random manner.
[0038] In some examples, audio upmix application 152 matches the frequency bandwidth and relative volume level of each component audio signal to the frequency bandwidth and maximum loudness of each audio output device 126. In this regard, audio upmix application 152 can match a component audio signal that includes low frequency audio, such as bass drum, bass guitar, baritone saxophone, tuba, baritone vocals, and/or the like to an audio output device 126 that is suitable for reproducing low frequency audio. Similarly, audio upmix application 152 can match a component audio signal that includes medium frequency audio, such as lead guitar, French horn, tenor saxophone, snare drum, alto vocals, and/or the like to an audio output device 126 that is suitable for reproducing medium frequency audio. Likewise, audio upmix application 152 can match a component audio signal that includes high frequency audio, such as trumpet, hi-hat cymbal, soprano vocals, and/or the like to an audio output device 126 that is suitable for reproducing high frequency audio.
[0039] In some examples, audio upmix application 152 assigns component audio signals to audio output devices 126 based on the size of the audio output devices 126 and the volume level of the component audio signal. Audio upmix application 152 maps component audio signals that include loud (i.e., high) volume audio to audio output devices 126 that are well suited for reproducing loud volume audio. Similarly, audio upmix application 152 maps component audio signals that include soft volume audio to audio output devices 126 that are well suited for reproducing soft volume audio, such as those that cannot play at high audio output levels. The volume level that a particular audio output device 126 is well suited to accommodate can be determined a priori from product specifications of the audio output device 126, from a user interface that receives user input regarding various audio output devices 126, from metadata associated with the audio output device 126, from measurements of frequency response after generating one or more audio frequency sweeps, and/or the like.
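A hedged sketch of such capability matching follows: it estimates each component's low-frequency energy share and overall level, scores candidate devices against a simple profile, and assigns greedily. The profile fields ("f_low" for the lowest well-reproduced frequency, "max_db" for loudness capability) and the scoring weights are assumptions for illustration, not values taken from this disclosure.

    import numpy as np
    from scipy.signal import welch

    def component_features(x, fs):
        f, pxx = welch(x, fs=fs, nperseg=2048)
        low_share = pxx[f < 250.0].sum() / (pxx.sum() + 1e-12)
        rms_db = 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)
        return low_share, rms_db

    def match_score(low_share, rms_db, device):
        # Penalize bass-heavy components on devices that roll off early, and
        # loud components on devices with limited output capability.
        bass_fit = low_share * (1.0 if device["f_low"] <= 80.0 else 0.2)
        loud_fit = 1.0 if rms_db <= device["max_db"] else 0.3
        return bass_fit + loud_fit

    def assign(components, fs, devices):
        mapping, free = {}, list(devices)
        for name, x in components.items():
            low_share, rms_db = component_features(x, fs)
            best = max(free, key=lambda d: match_score(low_share, rms_db, d))
            mapping[name] = [best["name"]]
            if len(free) > 1:
                free.remove(best)   # keep at least one device available
        return mapping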
[0040] In some examples, audio upmix application 152 assigns component audio signals to audio output devices 126 based on a channel assignment received from a graphical user interface. The graphical user interface models the relative location of the audio output devices 126 in the environment. In some examples, the graphical user interface can be accessed on computing device 180, an input device 122, via I/O device interface 104, and/or the like. In some examples, the graphical user interface provides for phantom images, whereby audio upmix application 152 generates component audio signals and transmits the component audio signals to two or more audio output devices to produce the appearance that a particular component audio signal is at a particular location where no audio output device is present, even if the original image location for the component audio signal was at a different location at the time that the audio input signal was captured, or when the audio input signal was mixed and mastered. Certain types of recordings, such as classical music recordings, chorus recordings, and/or the like, are recorded on a soundstage when the full orchestra and/or chorus is present. Other types of recordings, such as pop music, rock music, and/or the like, are assembled from various studio recordings of the individual musicians and vocalists, which have previously been recorded in separate acoustic environments. The various recordings are then mixed and mastered to simulate the sound of the full group as if the group was performing on a soundstage. As a result, each of the various studio recordings is associated with a different component audio signal at a different virtual location, where the virtual location is determined when the various studio recordings are mixed and mastered into a final recording. In some examples, audio upmix application 152 assigns component audio signals to audio output devices 126 based on a mapping between the location of the captured sounds from the soundstage at the time of recording relative to the location of the audio output device 126 in the present environment. In some examples, audio upmix application 152 assigns audio components present in the audio source to audio output devices 126 based at least in part on metadata included in the audio source. This metadata can include information that identifies the separate audio components present in the audio source. Audio upmix application 152 assigns these separate audio components to audio output devices 126 based on this metadata.
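The phantom-image behavior can be illustrated with constant-power panning, in which one component signal is sent to two devices with complementary gains so that a listener between them localizes the source where no speaker physically sits. This is a standard panning sketch, offered under the assumption that the two devices are roughly equidistant from the listener; it is not the disclosure's prescribed rendering method.

    import numpy as np

    def phantom_pan(component, position):
        # position in [0, 1]: 0 places the image at device A, 1 at device B,
        # and intermediate values place a phantom image between the two.
        theta = position * np.pi / 2.0
        gain_a, gain_b = np.cos(theta), np.sin(theta)  # constant power: a^2 + b^2 = 1
        return gain_a * component, gain_b * component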
[0041] In some examples, audio upmix application 152 tracks the audio output devices 126 and adjusts the component audio signals, or the mapping of component audio signals, as audio output devices 126 move within the environment, leave the environment, enter the environment, and/or the like.
[0042] In some examples, various components of the coordinated audio system perform the upmix. In some examples, the upmix is generated locally on computing device 180. Additionally or alternatively, audio upmix application 152 transmits the audio input signal to one or more audio output devices 126. In such examples, each audio output device 126 can perform a localized audio upmix based on at least one of a stereo audio channel received by the audio output device 126, a monaural audio channel received by the audio output device 126, or all audio channels sent to each of the audio output devices. In some examples, various components of the coordinated audio system perform the upmix via cloud connection from each audio output device 126, from a designated audio output device 126, from computing device 180, and/or the like.
[0043] In some examples, audio upmix application 152 transmits component audio signals to each audio output device 126. Additionally or alternatively, audio upmix application 152 transmits component audio signals to a particular audio output device 126. This particular audio output device 126, in turn, transmits component audio signals to each of the other audio output devices 126. Additionally or alternatively, audio upmix application 152 serves as a remote control unit, where audio signals are transmitted directly to each audio output device 126, or to all audio output devices 126.
[0044] In some examples, audio upmix application 152 generates component audio signals, where each component audio signal represents a different instrument or voice. Additionally or alternatively, audio upmix application 152 generates component audio signals, where each component audio signal represents a different region of the soundstage, corresponding either to where the musicians were located during the recording or to the apparent locations of the musicians resulting from the mixing and mastering process of the recording. Additionally or alternatively, audio upmix application 152 generates component audio signals, where each component audio signal represents a different region of the soundstage, such as an angular region of the soundstage.
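For the region-based variant, one simplified illustration is to split a stereo mix by each time-frequency bin's left/right level balance, which approximates angular regions of the soundstage. This azimuth-style sketch is an assumption for illustration, not the application's required technique.

    import numpy as np
    from scipy.signal import stft, istft

    def split_by_region(left, right, fs, n_regions=3):
        _, _, L = stft(left, fs=fs, nperseg=1024)
        _, _, R = stft(right, fs=fs, nperseg=1024)
        # Per-bin pan position: 0 = hard left, 1 = hard right.
        pan = np.abs(R) / (np.abs(L) + np.abs(R) + 1e-12)
        idx = np.minimum((pan * n_regions).astype(int), n_regions - 1)
        regions = []
        for k in range(n_regions):
            mask = (idx == k)
            _, x = istft(mask * (L + R), fs=fs)  # mono downmix of this region
            regions.append(x)
        return regions   # one component signal per angular region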
[0045] In some examples, audio upmix application 152 omits a component audio signal from being transmitted to the audio output devices 126, in a “karaoke” mode. In that regard, a user can select one or more component audio signals, such as an instrument, voice, or group of instruments or voices, to omit via an input device, such as an interactive graphical user interface implemented on a touchscreen. For example, audio upmix application 152 can omit one or more component audio signals that include vocals so that users can sing along with the audio. In some examples, audio upmix application 152 can omit one or more component audio signals that include a lead guitar or a bass guitar so that users can play guitar along with the audio. In some examples, audio upmix application 152 can omit one or more component audio signals that include drums so that users can play drums along with the audio. In some examples, audio upmix application 152 can omit one or more component audio signals that represent various audio processing noises, background noises, artificial or natural room reverb, or artifacts of the source separation process. Various other combinations are possible within the scope of the present disclosure.
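In terms of the mapping table sketched earlier, karaoke mode is simply a filter that maps the user-selected components to no device; the names below are hypothetical:

    def karaoke_mapping(mapping, omit):
        # omit: set of component names chosen via the user interface,
        # e.g. {"vocals"} or {"lead_guitar", "bass_guitar"}.
        return {name: ([] if name in omit else targets)
                for name, targets in mapping.items()}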
[0046] In some examples, audio upmix application 152 can guide users, via an interactive graphical user interface, to position audio output devices 126 for accurate soundfield reproduction. In that regard, audio upmix application 152 can position a component audio signal of a tuba player near the component audio signals playing the brass instruments. In this manner, audio upmix application 152 can match the position of these instruments during the orchestral or brass band recording. In some examples, the component audio signals can be arranged in a manner totally or partially different than their position during an orchestral recording or their original location on the sound stage resulting from the original recording, mixing, or mastering.
[0047] In some examples, audio upmix application 152 generates a room reverberation effect that is inserted into each of the component audio signals in order to increase the sense of envelopment that users experience while listening to the audio. To generate this room reverberation effect, audio upmix application 152 can optionally be informed by measurements of the room that each audio output device 126 determines from the 3D position and/or 3D orientation of the audio output device 126, once the audio output device 126 is placed into position. As a result, so-called “dry” rooms can benefit from more added reverberation, while “live” rooms can benefit from less added reverberation. In some examples, audio upmix application 152 generates the room reverberation effect based on input audio received from microphones placed on or near one or more audio output devices 126.
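A minimal reverb sketch along these lines convolves each component with a synthetic, exponentially decaying noise tail and mixes it back at a wet level chosen per room. The decay time and wet/dry values here are illustrative assumptions; per the description above, "dry" rooms would get a higher wet level and "live" rooms a lower one.

    import numpy as np
    from scipy.signal import fftconvolve

    def add_room_reverb(component, fs, rt60=0.6, wet=0.25):
        # Synthetic impulse response: noise with ~60 dB decay over rt60 seconds.
        n = int(rt60 * fs)
        rng = np.random.default_rng(0)
        ir = rng.standard_normal(n) * np.exp(-6.9 * np.arange(n) / n)
        tail = fftconvolve(component, ir)[: len(component)]
        tail /= np.max(np.abs(tail)) + 1e-12
        peak = np.max(np.abs(component)) + 1e-12
        return (1.0 - wet) * component + wet * peak * tail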
[0048] FIG. 2 illustrates a coordinated audio system 200, according to one or more aspects of the various embodiments. Coordinated audio system 200 includes multiple audio output devices 126-1 through 126-N (e.g., speakers) communicatively coupled together via audio device network 162. Coordinated audio system 200 further includes computing device 180, which is communicatively coupled to the audio output devices 126 via audio device network 162. Within coordinated audio system 200, a computing device 180 is communicatively coupled to zero or more audio output devices 126.
[0049] Communications in coordinated audio system 200 can use standard and/or proprietary protocols. For example, computing device 180 can communicate with audio output devices 126 and with each other using standard protocols (e.g., Bluetooth, Wi-Fi), and audio output devices 126 can communicate with each other using standard or proprietary protocols (e.g., Bluetooth, a proprietary protocol associated with a specific manufacturer). In some embodiments, audio output devices 126 that communicate with each other using a proprietary protocol are audio output devices 126 from the same manufacturer (e.g., speakers of the same brand).
[0050] In coordinated audio system 200, computing device 180 receives an audio input signal, and separates multiple component audio signals from the audio input signal. For each component audio signal included in the plurality of component audio signals, computing device 180 maps the component audio signal to one or more audio output devices 126. Computing device 180 transmits each component audio signal to corresponding audio output devices 126 based on the mapping. Audio output devices 126 output audio corresponding to the component audio signals from the audio input signal received from computing device 180. In some embodiments, the data corresponding to the component audio signals can be transmitted from the computing device 180 to at least one audio output device 126, and the data can be transmitted amongst audio output devices 126.
[0051] FIG. 3 illustrates an example of a listening environment 300 for a coordinated audio system, according to one or more aspects of the various embodiments. As shown, the listening environment 300 includes four audio output devices 310, 312, 320, and 330. The four audio output devices 310, 312, 320, and 330 play component audio signals corresponding to drums 340, bass guitar 352, lead guitar 350, and vocal/microphone 370, respectively. Therefore, audio output devices 310, 312, 320, and 330 are mapped in a one-to-one relationship with the component audio signals. Listening environment 300 generated by audio output devices 310, 312, 320, and 330 provides a different audio experience depending on where a user is located within listening environment 300. As shown, user 380 is located centrally among audio output devices 310, 312, 320, and 330. Therefore, user 380 hears a balanced mix of component audio signals for drums 340, bass guitar 352, lead guitar 350, and vocal/microphone 370. User 382 is located near audio output device 312. Therefore, user 382 hears an increased level of the component audio signal for bass guitar 352. User 384 is located near audio output device 330. Therefore, user 384 hears an increased level of the component audio signal for vocals 370. User 386 is located near audio output device 310. Therefore, user 386 hears an increased level of the component audio signal for drums 340. The more closely a listener is located to a particular audio output device, the more a particular component audio signal dominates the total sound that the listener hears. The farther a listener is from a particular audio output device, the less the output of that particular device contributes to the total sound that the listener hears.
[0052] In some examples, audio upmix application 152 assigns component audio signals to audio output devices based on matching the frequency spectrum of the component audio signals with the frequency operating ranges of the various audio output devices. In that regard, some audio output devices have a frequency response such that these audio output devices cannot play the lowest notes at the same output level as middle and high frequency notes. As a result, these audio output devices are less suitable for reproducing signals from instruments that have a high amplitude of these low frequency notes. As shown, drums 340 and bass guitar 352, which generate audio with a relatively high amplitude in the low frequency spectrum, are assigned to audio output devices 310 and 312, respectively. Audio output devices 310 and 312 are capable of operating with a high amplitude in this low frequency spectral range. Lead guitar 350, which generates output primarily in the midrange frequency spectrum with less output in the low frequency range, is assigned to audio output device 320. Audio output device 320 has a frequency operating range in this midrange frequency spectral range and is unable to output sound of high amplitude in the low frequency range. Vocal/microphone 370, which generates audio with a relatively high frequency spectrum, is assigned to audio output device 330. Audio output device 330 has a frequency operating range in this high frequency spectral range and may not play loudly in the midrange and low frequency range.
[0053] FIG. 4 illustrates another example of a listening environment 400 for a coordinated audio system, according to one or more aspects of the various embodiments. As shown, the listening environment 400 includes eleven audio output devices 410, 412, 414, 420, 422, 424, 426, 430, 432, 434, and 436. The eleven audio output devices 410, 412, 414, 420, 422, 424, 426, 430, 432, 434, and 436 play component audio signals corresponding to various instruments and voices. Audio output device 410 primarily plays component audio signals corresponding to baritone saxophone 460. Audio output device 412 primarily plays component audio signals corresponding to bass drum 444. Audio output device 414 primarily plays component audio signals corresponding to tuba 456.
[0054] Audio output device 420 primarily plays component audio signals corresponding to snare drum 440. Audio output device 422 primarily plays component audio signals corresponding to tenor saxophone 462. Audio output device 424 primarily plays component audio signals corresponding to snare drum 442. Audio output device 426 primarily plays component audio signals corresponding to French horn 454. Audio output device 430 primarily plays component audio signals corresponding to trumpet 450. Audio output device 432 primarily plays component audio signals corresponding to trombone 452. Audio output device 434 primarily plays component audio signals corresponding to vocal/microphone 470. Audio output device 436 primarily plays component audio signals corresponding to hi-hat 446.
[0055] Listening environment 400 generated by audio output devices 410, 412, 414, 420, 422, 424, 426, 430, 432, 434, and 436 provides a different audio experience depending on where a user is located within listening environment 400. As shown, user 480 is located centrally among the eleven audio output devices 410, 412, 414, 420, 422, 424, 426, 430, 432, 434, and 436. Therefore, user 480 hears a balanced mix of component audio signals for all instruments and vocals. User 482 is located near audio output devices 410, 422, and 424. Therefore, user 482 primarily hears the component audio signals for baritone saxophone 460, tenor saxophone 462, and snare drum 442, along with a lower level of each of the remaining component audio signals. User 484 is located near audio output devices 426 and 434. Therefore, user 484 primarily hears the component audio signals for French horn 454 and lead vocals 470, along with a lower level of each of the remaining component audio signals.
[0056] In some examples, audio upmix application 152 assigns component audio signals to audio output devices based on matching the frequency spectrum of the component audio signals with the frequency operating ranges of the various audio output devices. In that regard, some audio output devices have a frequency response such that these audio output devices cannot play the lowest notes at the same output level as middle and high frequency notes. As a result, these audio output devices are less suitable for reproducing signals from instruments that have a high amplitude of these low frequency notes. As shown, baritone saxophone 460, bass drum 444, and tuba 456, which generate audio with a relatively high amplitude in the low frequency spectrum, are assigned to audio output devices 410, 412, and 414, respectively. Audio output devices 410, 412, and 414 are capable of operating with a high amplitude in this low frequency spectral range. Snare drum 440, tenor saxophone 462, snare drum 442, and French horn 454, which generate output primarily in the midrange frequency spectrum with less output in the low frequency range, are assigned to audio output devices 420, 422, 424, and 426, respectively. Audio output devices 420, 422, 424, and 426 have a frequency operating range in this midrange frequency spectral range and are unable to output sound of high amplitude in the low frequency range. Trumpet 450, trombone 452, vocal/microphone 470, and hi-hat 446, which generate audio with a relatively high frequency spectrum, are assigned to audio output devices 430, 432, 434, and 436, respectively. Audio output devices 430, 432, 434, and 436 have a frequency operating range in this high frequency spectral range, and lack the ability to play sound at high amplitude in the low frequency range.
[0057] FIG. 5 illustrates yet another example of a listening environment 500 for a coordinated audio system, according to one or more aspects of the various embodiments. As shown, the listening environment 500 includes four audio output devices 510, 512, 520, and 530. The four audio output devices 510, 512, 520, and 530 play component audio signals corresponding to various instruments and voices. Audio output device 510 primarily plays component audio signals corresponding to drums 540. Audio output device 512 primarily plays component audio signals corresponding to a mix of bass guitar 552 and lead guitar 550. Audio output device 520 primarily plays component audio signals corresponding to a mix of keyboard 554 and two backup vocals 572 and 574. Audio output device 530 primarily plays component audio signals corresponding to lead vocal/microphone 570.
[0058] Listening environment 500 generated by audio output devices 510, 512, 520, and 530 provides a different audio experience depending on where a user is located within listening environment 500. As shown, user 580 is located centrally among audio output devices 510, 512, 520, and 530. Therefore, user 580 hears a balanced mix of component audio signals for drums 540, bass guitar 552, lead guitar 550, keyboard 554, lead vocal/microphone 570, and backup vocals/microphones 572 and 574. User 582 is located near audio output device 512. Therefore, user 582 primarily hears the component audio signals for bass guitar 552 and lead guitar 550, along with a lower level of each of the remaining component audio signals. User 584 is located near audio output device 530. Therefore, user 584 primarily hears the component audio signal for lead vocals 570, along with a lower level of each of the remaining component audio signals. User 586 is located near audio output device 510. Therefore, user 586 primarily hears the component audio signal for drums 540, along with a lower level of each of the remaining component audio signals. User 588 is located near audio output device 520. Therefore, user 588 primarily hears the component audio signal for the keyboard 554 and two backup vocals 572 and 574, along with a lower level of each of the remaining component audio signals. In an example, the component audio signal for keyboard 554 is played from audio output devices 520 and 530, and user 588 hears a phantom image of the keyboard at a location between audio output devices 520 and 530, where no audio output device is physically located.
[0059] In some examples, audio upmix application 152 assigns component audio signals to audio output devices based on matching the frequency spectrum of the component audio signals with the frequency operating ranges of the various audio output devices. In that regard, some audio output devices have a frequency response such that these audio output devices cannot play the lowest notes at the same output level as middle and high frequency notes. As a result, these audio output devices are less suitable for reproducing signals from instruments that have a high amplitude at these low frequency notes. As shown, drums 540, which generates audio with a relatively high amplitude in the low frequency spectrum, is assigned to audio output device 510. Audio output device 510 is capable of operating with a high amplitude in this low frequency spectral range. As shown, bass guitar 552 and lead guitar 550, which also generate audio with a relatively high amplitude in the low frequency spectrum, are assigned to audio output device 512. Audio output device 512 is capable of operating with a high amplitude in this low frequency spectral range. Keyboard 554 and two backup vocals 572 and 574, which generate output primarily in the midrange frequency spectrum with less output in the low frequency range, are assigned to audio output device 520. Audio output device 520 has a frequency operating range in this midrange frequency spectral range, and is unable to output sound of high amplitude in the low frequency range. Lead vocal/microphone 570, which generates audio with a relatively high frequency spectrum, is assigned to audio output device 530. Audio output device 530 has a frequency operating range in this high frequency spectral range, and lacks the ability to play sound at high amplitude in the low frequency range.

[0060] FIG. 6 is a flow chart of method steps for generating a set of audio streams for a coordinated audio system, according to one or more aspects of the various embodiments. Although the method steps are described with respect to the systems and examples of FIGs. 1-5, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.
[0061] As shown, a method 600 begins at step 602, where an audio upmix application 152 executing on a computing device 180 locates audio output devices 126 in a listening environment. Audio upmix application 152 locates audio output devices 126 that are currently reachable by computing device 180 over one or more wired and/or wireless networks. In that regard, audio upmix application 152 includes audio output devices 126 that have recently entered a network that is accessible by computing device 180. Similarly, audio upmix application 152 excludes audio output devices 126 that have recently exited a network that is accessible by computing device 180. As a result, audio upmix application 152 adapts to a current set of audio output devices 126 as these audio output devices 126 enter, exit, and move within the listening environment.
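A minimal sketch of this discovery step follows, assuming a hypothetical discover_devices() callback that stands in for whatever network scan the system actually performs; simple set arithmetic then yields the devices that have recently entered or exited the listening environment.

```python
from typing import Callable, Iterable

def refresh_device_set(current: set[str],
                       discover_devices: Callable[[], Iterable[str]]
                       ) -> tuple[set[str], set[str], set[str]]:
    """Return (devices now reachable, devices that joined, devices that left)."""
    visible = set(discover_devices())   # devices currently answering on the network
    joined = visible - current          # recently entered the listening environment
    departed = current - visible        # recently exited the listening environment
    return visible, joined, departed
```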
[0062] At step 604, audio upmix application 152 receives one or more audio input signals. In some examples, audio upmix application 152 receives the audio input signal from an input device 122. In some examples, audio upmix application 152 retrieves data representing the audio input signal from database 142 stored in memory 116, from storage 114, and/or the like. Typically, the audio input signal is a stereophonic audio signal that includes a left audio channel and a right audio channel. Alternatively, the audio input signal is a monaural audio signal with a single channel. In some examples, the audio input signal is a multichannel encoded audio signal that includes more than two channels, such as four channels, six channels, eight channels, and so on. These multichannel encoded audio signals include quadraphonic audio, DVD-Audio, Super Audio CD, Dolby Atmos, and/or the like.
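As a small illustration of handling these layouts, the snippet below normalizes monaural, stereophonic, and multichannel inputs to a common (channels, samples) array; this representation is an assumption made for the sketch, not a format required by the disclosure.

```python
import numpy as np

def to_channel_array(signal: np.ndarray) -> np.ndarray:
    """Return audio as a (channels, samples) array for any input layout."""
    # Mono input arrives as a 1-D array; stereo and multichannel encodings
    # (4, 6, 8 channels, and so on) are assumed to already be (channels, samples).
    return signal[np.newaxis, :] if signal.ndim == 1 else signal
```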
[0063] At step 606, audio upmix application 152 separates a plurality of component audio signals from the one or more audio input signals. In some embodiments, audio upmix application 152 performs a signal separation process on an audio input signal, such as a stereo audio signal, to separate multiple component audio signals from the audio input signal. In some examples, the audio input signal includes a song performed by a rock band that has a vocalist, a lead guitar, a bass guitar, and drums. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into four component audio signals, one audio signal for each of the vocalist, lead guitar, bass guitar, and drums. In some examples, the audio input signal includes a song performed by a ten-piece jazz band that includes various horns and drums, along with a vocalist. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into eleven component audio signals, one audio signal for the vocalist and one audio signal for each of the instruments in the ten-piece band. In some examples, the audio input signal includes a song performed by a pop band that has a lead vocalist, two background vocalists, a lead guitar, a bass guitar, keyboards, and drums. Audio upmix application 152 performs a signal separation process on the audio input signal to separate the audio input signal into seven component audio signals, one audio signal for each of the lead vocalist, the two background vocalists, lead guitar, bass guitar, keyboards, and drums.
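The disclosure does not tie step 606 to a particular algorithm, and production systems would typically use a trained music source separation model to obtain per-instrument components. Purely as a stand-in to make the data flow concrete, the sketch below splits a mixture into crude low, mid, and high band "components" with Butterworth filters; the band edges and labels are illustrative assumptions, not the per-instrument separation described above.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def crude_band_split(mixture: np.ndarray, sample_rate: int) -> dict[str, np.ndarray]:
    """Split a mono mixture into three frequency-band 'components' (stand-in only)."""
    filters = {
        "low":  butter(4, 250, btype="lowpass", output="sos", fs=sample_rate),
        "mid":  butter(4, [250, 4000], btype="bandpass", output="sos", fs=sample_rate),
        "high": butter(4, 4000, btype="highpass", output="sos", fs=sample_rate),
    }
    # Each filtered band plays the role of one component audio signal.
    return {label: sosfilt(sos, mixture) for label, sos in filters.items()}
```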
[0064] At step 608, audio upmix application 152 maps the component audio signals included in a subset of the component audio signals to one or more audio output devices 126 in the listening environment. The subset of component audio signals includes two or more, up to and including all, of the component audio signals included in the plurality of component audio signals of step 606. In some examples, audio upmix application 152 maps each component audio signal included in the subset of component audio signals to a different audio output device 126 in a one-to-one mapping. As a result, each audio output device plays a different component audio signal. In some examples, audio upmix application 152 maps two or more component audio signals to the same audio output device 126 in a two-to-one mapping, or a many-to-one mapping. As a result, the audio output device 126 plays a mix of two or more component audio signals. In some examples, audio upmix application 152 maps a component audio signal to two or more audio output devices 126 in a one-to-two mapping, or a one-to-many mapping. As a result, multiple audio output devices 126 play a portion of the same component audio signal. Audio upmix application 152 can employ these mapping techniques in any technically feasible combination. In some examples, a component audio signal is omitted from playback, and is not mapped to any audio output device 126, such as in the karaoke use case.
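These mapping cases can be made concrete with a simple data structure: a dictionary from component label to a list of target device identifiers. A repeated device identifier across entries expresses a many-to-one mapping, a multi-element list expresses a one-to-many mapping, and an empty list omits a component from playback, as in the karaoke use case. The labels and device identifiers below are illustrative, loosely echoing FIG. 5.

```python
def example_mapping() -> dict[str, list[str]]:
    """One mapping covering all the cases described in step 608."""
    return {
        "drums":       ["dev_510"],             # one-to-one
        "bass_guitar": ["dev_512"],             # two-to-one: shares dev_512 ...
        "lead_guitar": ["dev_512"],             # ... with the bass guitar
        "keyboard":    ["dev_520", "dev_530"],  # one-to-many (phantom image)
        "lead_vocal":  [],                      # omitted from playback (karaoke)
    }
```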
[0065] At step 610, audio upmix application 152 transmits the component audio signals included in the subset of component audio signals to corresponding audio output devices 126 based on the mapping determined in step 608. Audio output devices 126 output audio corresponding to the component audio signals received from audio upmix application 152. In some embodiments, the data corresponding to the component audio signals can be transmitted from audio upmix application 152 to at least one audio output device 126, and the data can be transmitted amongst audio output devices 126.
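Step 610 then reduces to a loop over that mapping, sketched below; send() is a hypothetical stand-in for whatever wired or wireless transport actually carries the component audio signals to the devices.

```python
def transmit_components(mapping: dict[str, list[str]], components: dict, send) -> None:
    """Send each mapped component signal to every device it is mapped to."""
    for label, device_ids in mapping.items():
        # An unmapped component (empty device list) is simply never sent.
        for device_id in device_ids:
            send(device_id, components[label])
```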
[0066] The method 600 then returns to step 602 to locate audio output devices 126 currently in the listening environment. In this manner, the method 600 dynamically changes the upmix as audio output devices move within the listening environment, as audio output devices leave the network, as new audio output devices enter the network, and so on.
[0067] In sum, a computing device and multiple audio output devices are coupled together to form a coordinated audio system. The computing device decomposes a received audio signal, such as a music signal, into individual audio streams based on certain criteria. The disclosed techniques generate a customizable upmix consisting of multiple audio streams in real time, where each audio stream is transmitted to one or more audio output devices. In some examples, the computing device generates a different audio stream for each instrument and voice present in the received audio signal. Each of these individual audio streams is then played back on one or more audio output devices to generate an immersive audio field between and among these audio output devices. As a user moves within the listening environment, the user hears a different custom mix with a different balance of instruments and voices, depending on the relative distance of the user to each of the audio output devices. In some examples, the user can move to different locations to hear all instruments and voices in balance, or can move to various locations where one or more of the individual audio streams is dominant. As users move within this audio field, each user experiences a different mix of the individual audio streams depending on the location and orientation of the user within the audio field generated by the audio output devices. As a result, users enjoy a more interactive and immersive listening experience relative to conventional techniques.
[0068] At least one technical advantage of the disclosed techniques relative to the prior art is that, with the disclosed techniques, multiple audio output devices, such as personal speakers, can be deployed in party mode without generating undesirable comb filtering effects, as is typical with conventional techniques. Another technical advantage of the disclosed techniques relative to the prior art is that each user within the environment can have a different audio experience based on the location of the user relative to the locations of the multiple audio output devices. Further, the upmix transmitted to the multiple audio output devices is changed dynamically based on which speakers are in the network, which source audio is being used, and/or the like. More particularly, the techniques dynamically change the upmix as audio output devices move within the environment, as audio output devices leave the network, as new audio output devices enter the network, and so on. In this manner, the upmix is adapted to generate a suitable soundfield as changes in the audio output devices occur over time. As a result, users enjoy a more interactive and immersive listening experience relative to conventional techniques. These technical advantages provide one or more technological improvements over prior art approaches.
[0069] 1. In some embodiments, a computer-implemented method for generating audio signals in an audio system comprises: receiving an audio input signal; separating a plurality of component audio signals from the audio input signal; for a subset of component audio signals included in the plurality of component audio signals, mapping each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
[0070] 2. The computer-implemented method according to clause 1, wherein each component audio signal included in the plurality of component audio signals is mapped to a different audio output device included in the plurality of audio output devices.
[0071] 3. The computer-implemented method according to clause 1 or clause 2, wherein a first component audio signal included in the plurality of component audio signals is mapped to a first audio output device included in the plurality of audio output devices, and a second component audio signal included in the plurality of component audio signals is mapped to the first audio output device.
[0072] 4. The computer-implemented method according to any of clauses 1-3, wherein a first component audio signal included in the plurality of component audio signals is mapped to a first audio output device included in the plurality of audio output devices, and the first component audio signal is also mapped to a second audio output device included in the plurality of audio output devices.
[0073] 5. The computer-implemented method according to any of clauses 1-4, further comprising: receiving a user input that identifies a first component audio signal included in the plurality of component audio signals; and removing the first component audio signal from the plurality of component audio signals to create the subset of component audio signals.

[0074] 6. The computer-implemented method according to any of clauses 1-5, wherein each component audio signal included in the plurality of component audio signals comprises a different instrument, voice, or group of instruments or voices included in the audio input signal.
[0075] 7. The computer-implemented method according to any of clauses 1-6, wherein the plurality of audio output devices is connected via a network, and further comprising: determining that a first audio output device not included in the plurality of audio output devices is connected to the network; adding the first audio output device to the plurality of audio output devices to create an updated plurality of audio output devices; for each component audio signal included in the plurality of component audio signals, mapping the component audio signal to one or more audio output devices included in the updated plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
[0076] 8. The computer-implemented method according to any of clauses 1-7, wherein the plurality of audio output devices is connected via a network, and further comprising: determining that a first audio output device included in the plurality of audio output devices is no longer connected to the network; removing the first audio output device from the plurality of audio output devices to create an updated plurality of audio output devices; for each component audio signal included in the plurality of component audio signals, mapping the component audio signal to one or more of the updated plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
[0077] 9. The computer-implemented method according to any of clauses 1-8, wherein the mapping is based on at least one of a frequency spectrum or a volume of a first component audio signal included in the plurality of component audio signals.
[0078] 10. The computer-implemented method according to any of clauses 1-9, wherein the mapping is based on a virtual location of a first component audio signal included in the plurality of component audio signals, wherein the virtual location is determined when the plurality of component audio signals was mixed and mastered.
[0079] 11. In some embodiments, one or more non-transitory computer-readable storage media include instructions that, when executed by one or more processors at a first computing device, cause the one or more processors to perform steps of: receiving an audio input signal; separating a plurality of component audio signals from the audio input signal; for a subset of component audio signals included in the plurality of component audio signals, mapping each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
[0080] 12. The one or more non-transitory computer-readable storage media according to clause 11, wherein the steps further comprise: receiving a user input that identifies a first component audio signal included in the plurality of component audio signals; and removing the first component audio signal from the plurality of component audio signals to create the subset of component audio signals.
[0081] 13. The one or more non-transitory computer-readable storage media according to clause 11 or clause 12, wherein each component audio signal included in the plurality of component audio signals comprises a different instrument, voice, or group of instruments or voices included in the audio input signal.
[0082] 14. The one or more non-transitory computer-readable storage media according to any of clauses 11-13, wherein the plurality of audio output devices is connected via a network, and wherein the steps further comprise: determining that a first audio output device not included in the plurality of audio output devices is connected to the network; adding the first audio output device to the plurality of audio output devices to create an updated plurality of audio output devices; for each component audio signal included in the plurality of component audio signals, mapping the component audio signal to one or more audio output devices included in the updated plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
[0083] 15. The one or more non-transitory computer-readable storage media according to any of clauses 11-14, wherein the plurality of audio output devices is connected via a network, and wherein the steps further comprise: determining that a first audio output device included in the plurality of audio output devices is no longer connected to the network; removing the first audio output device from the plurality of audio output devices to create an updated plurality of audio output devices; for each component audio signal included in the plurality of component audio signals, mapping the component audio signal to one or more of the updated plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
[0084] 16. The one or more non-transitory computer-readable storage media according to any of clauses 11-15, wherein the mapping is based on at least one of a frequency spectrum or a volume of a first component audio signal included in the plurality of component audio signals.
[0085] 17. The one or more non-transitory computer-readable storage media according to any of clauses 11-16, wherein mapping the component audio signal to one or more of a plurality of audio output devices is based on an input received from a user interface.
[0086] 18. The one or more non-transitory computer-readable storage media according to any of clauses 11-17, wherein the steps further comprise: receiving a second audio input signal; separating a second plurality of component audio signals from the second audio input signal; for each second component audio signal included in the second plurality of component audio signals, mapping the component audio signal to one or more of the plurality of audio output devices; and transmitting each second component audio signal to the corresponding one or more audio output devices based on the mapping.
[0087] 19. In some embodiments, a computing device comprises: a memory storing an application; and one or more processors that, when executing the application, are configured to: receive an audio input signal, separate a plurality of component audio signals from the audio input signal, for a subset of component audio signals included in the plurality of component audio signals, map each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices, and transmit each component audio signal to the corresponding one or more audio output devices based on the mapping.
[0088] 20. The computing device according to clause 19, wherein the computing device is coupled to the plurality of audio output devices via a wireless network.
[0089] Any and all combinations of any of the claim elements recited in any of the claims and/or any elements described in this application, in any fashion, fall within the contemplated scope of the present disclosure and protection.
[0090] The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
[0091] Aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module,” a “system,” or a “computer.” In addition, any hardware and/or software technique, process, function, component, engine, module, or system described in the present disclosure may be implemented as a circuit or set of circuits. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
[0092] Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
[0093] Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine. The instructions, when executed via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such processors may be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable gate arrays.
[0094] The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
[0095] While the preceding is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims

WHAT IS CLAIMED IS:
1. A computer-implemented method for generating audio signals in an audio system, the method comprising: receiving an audio input signal; separating a plurality of component audio signals from the audio input signal; for a subset of component audio signals included in the plurality of component audio signals, mapping each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
2. The computer-implemented method of claim 1, wherein each component audio signal included in the plurality of component audio signals is mapped to a different audio output device included in the plurality of audio output devices.
3. The computer-implemented method of claim 1, wherein a first component audio signal included in the plurality of component audio signals is mapped to a first audio output device included in the plurality of audio output devices, and a second component audio signal included in the plurality of component audio signals is mapped to the first audio output device.
4. The computer-implemented method of claim 1, wherein a first component audio signal included in the plurality of component audio signals is mapped to a first audio output device included in the plurality of audio output devices, and the first component audio signal is also mapped to a second audio output device included in the plurality of audio output devices.
5. The computer-implemented method of claim 1, further comprising: receiving a user input that identifies a first component audio signal included in the plurality of component audio signals; and removing the first component audio signal from the plurality of component audio signals to create the subset of component audio signals.
6. The computer-implemented method of claim 1, wherein each component audio signal included in the plurality of component audio signals comprises a different instrument, voice, or group of instruments or voices included in the audio input signal.
7. The computer-implemented method of claim 1, wherein the plurality of audio output devices is connected via a network, and further comprising: determining that a first audio output device not included in the plurality of audio output devices is connected to the network; adding the first audio output device to the plurality of audio output devices to create an updated plurality of audio output devices; for each component audio signal included in the plurality of component audio signals, mapping the component audio signal to one or more audio output devices included in the updated plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
8. The computer-implemented method of claim 1, wherein the plurality of audio output devices is connected via a network, and further comprising: determining that a first audio output device included in the plurality of audio output devices is no longer connected to the network; removing the first audio output device from the plurality of audio output devices to create an updated plurality of audio output devices; for each component audio signal included in the plurality of component audio signals, mapping the component audio signal to one or more of the updated plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
9. The computer-implemented method of claim 1, wherein the mapping is based on at least one of a frequency spectrum or a volume of a first component audio signal included in the plurality of component audio signals.
10. The computer-implemented method of claim 1, wherein the mapping is based on a virtual location of a first component audio signal included in the plurality of component audio signals, wherein the virtual location is determined when the plurality of component audio signals was mixed and mastered.
11. One or more non-transitory computer-readable storage media including instructions that, when executed by one or more processors at a first computing device, cause the one or more processors to perform steps of: receiving an audio input signal; separating a plurality of component audio signals from the audio input signal; for a subset of component audio signals included in the plurality of component audio signals, mapping each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
12. The one or more non-transitory computer-readable storage media of claim 11, wherein the steps further comprise: receiving a user input that identifies a first component audio signal included in the plurality of component audio signals; and removing the first component audio signal from the plurality of component audio signals to create the subset of component audio signals.
13. The one or more non-transitory computer-readable storage media of claim 11, wherein each component audio signal included in the plurality of component audio signals comprises a different instrument, voice, or group of instruments or voices included in the audio input signal.
14. The one or more non-transitory computer-readable storage media of claim 11, wherein the plurality of audio output devices is connected via a network, and wherein the steps further comprise: determining that a first audio output device not included in the plurality of audio output devices is connected to the network; adding the first audio output device to the plurality of audio output devices to create an updated plurality of audio output devices; for each component audio signal included in the plurality of component audio signals, mapping the component audio signal to one or more audio output devices included in the updated plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
15. The one or more non-transitory computer-readable storage media of claim 11, wherein the plurality of audio output devices is connected via a network, and wherein the steps further comprise: determining that a first audio output device included in the plurality of audio output devices is no longer connected to the network; removing the first audio output device from the plurality of audio output devices to create an updated plurality of audio output devices; for each component audio signal included in the plurality of component audio signals, mapping the component audio signal to one or more of the updated plurality of audio output devices; and transmitting each component audio signal to the corresponding one or more audio output devices based on the mapping.
16. The one or more non-transitory computer-readable storage media of claim 11, wherein the mapping is based on at least one of a frequency spectrum or a volume of a first component audio signal included in the plurality of component audio signals.
17. The one or more non-transitory computer-readable storage media of claim 11, wherein mapping the component audio signal to one or more of a plurality of audio output devices is based on an input received from a user interface.
18. The one or more non-transitory computer-readable storage media of claim 11, wherein the steps further comprise: receiving a second audio input signal; separating a second plurality of component audio signals from the second audio input signal; for each second component audio signal included in the second plurality of component audio signals, mapping the component audio signal to one or more of the plurality of audio output devices; and transmitting each second component audio signal to the corresponding one or more audio output devices based on the mapping.
19. A computing device comprising: a memory storing an application; and one or more processors that, when executing the application, are configured to: receive an audio input signal, separate a plurality of component audio signals from the audio input signal, for a subset of component audio signals included in the plurality of component audio signals, map each component audio signal included in the subset of component audio signals to one or more of a plurality of audio output devices, and transmit each component audio signal to the corresponding one or more audio output devices based on the mapping.
20. The computing device of claim 19, wherein the computing device is coupled to the plurality of audio output devices via a wireless network.
PCT/US2023/013639 2023-02-22 2023-02-22 Dynamic audio mixing in a multiple wireless speaker environment Pending WO2024177629A1 (en)

Priority Applications (2)

PCT/US2023/013639 (priority and filing date 2023-02-22): Dynamic audio mixing in a multiple wireless speaker environment (WO2024177629A1)
CN202380094237.8: Dynamic audio mixing in a multi-wireless speaker environment (CN120642347A)

Family ID: 85726676

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007141677A2 (en) * 2006-06-09 2007-12-13 Koninklijke Philips Electronics N.V. A device for and a method of generating audio data for transmission to a plurality of audio reproduction units
US20140301574A1 (en) * 2009-04-24 2014-10-09 Shindig, Inc. Networks of portable electronic devices that collectively generate sound
US20190327559A1 (en) * 2018-04-19 2019-10-24 Robert E. Smith Multi-listener bluetooth (bt) audio system

Also Published As

Publication number Publication date
CN120642347A (en) 2025-09-12

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (ref document number 23712978, country EP, kind code A1)
WWE Wipo information: entry into national phase (ref document number 202380094237.8, country CN)
WWP Wipo information: published in national office (ref document number 202380094237.8, country CN)