
CN113905321A - Object-based audio channel metadata and generation method, device and storage medium - Google Patents


Info

Publication number
CN113905321A
CN113905321A
Authority
CN
China
Prior art keywords
audio
audio channel
information
sub
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111020417.0A
Other languages
Chinese (zh)
Inventor
吴健 (Wu Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Saiyinxin Micro Beijing Electronic Technology Co ltd
Original Assignee
Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Saiyinxin Micro Beijing Electronic Technology Co ltd filed Critical Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority to CN202111020417.0A priority Critical patent/CN113905321A/en
Publication of CN113905321A publication Critical patent/CN113905321A/en
Pending legal-status Critical Current

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04S — STEREOPHONIC SYSTEMS
    • H04S 7/00 — Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 — Control circuits for electronic adaptation of the sound field
    • H04S 2400/00 — Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/15 — Aspects of sound capture and related signal processing for recording or reproduction


Abstract

The present disclosure relates to object-based audio channel metadata, a generation method, an electronic device, and a storage medium. The object-based audio channel metadata comprises: an attribute zone containing an audio channel name, an audio channel identifier, and audio channel type description information; and a sub-element zone containing at least one audio block format and audio cut-off frequency information. The audio block format indicates the time-domain division of an audio channel and comprises an audio block identifier and object sub-elements; the object sub-elements comprise coordinate information describing the position and extent of an object, diffusion information describing the sound type of the object, and preset rendering information instructing a renderer how to render the sound. When the audio data are rendered, three-dimensional sound can be reproduced in space, thereby improving the quality of the sound scene.

Description

Object-based audio channel metadata and generation method, device and storage medium
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular, to an object-based audio channel metadata and generation method, an electronic device, and a storage medium.
Background
With the development of technology, audio has become more and more complex. Early single-channel (mono) audio gave way to stereo, where the focus was on handling the left and right channels correctly. Processing became more involved with the arrival of surround sound: the surround 5.1 speaker system imposes an ordering constraint on multiple channels, and surround 6.1, surround 7.1, and similar systems diversified audio processing further, requiring that the correct signal be delivered to the appropriate speaker so that the speakers work in concert. As sound becomes more immersive and interactive, the complexity of audio processing grows accordingly.
An audio channel (or sound channel) is one of a set of mutually independent audio signals captured or played back at different spatial positions when sound is recorded or reproduced. The number of channels equals the number of sound sources during recording, or the number of corresponding speakers during playback. For example, a surround 5.1 speaker system comprises audio signals at 6 different spatial positions, each independent signal driving the speaker at its corresponding position; a surround 7.1 speaker system comprises audio signals at 8 different spatial positions, each independent signal driving the speaker at its corresponding position.
The effect achievable by current speaker systems therefore depends on the number and spatial positions of the speakers. For example, a two-speaker (stereo) system cannot reproduce the effect of a surround 5.1 speaker system.
To address the above technical problem, the present disclosure provides audio channel metadata and a method for constructing it.
Disclosure of Invention
The present disclosure aims to provide object-based audio channel metadata, a generation method, an electronic device, and a storage medium, so as to solve at least one of the above technical problems.
To achieve the above object, a first aspect of the present disclosure provides an object-based audio channel metadata, including:
an attribute zone, comprising an audio channel name, an audio channel identifier, and audio channel type description information; and
a sub-element zone, comprising at least one audio block format and audio cut-off frequency information, wherein the audio block format indicates the time-domain division of an audio channel and comprises an audio block identifier and object sub-elements, the object sub-elements comprising coordinate information describing the position and extent of the object, diffusion information describing the sound type of the object, and preset rendering information instructing a renderer to render sound.
To achieve the above object, a second aspect of the present disclosure provides a method for generating audio channel metadata, comprising:
generating the object-based audio channel metadata described in the first aspect.
To achieve the above object, a third aspect of the present disclosure provides an electronic device, including: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to generate audio data including the object-based audio channel metadata described in the first aspect.
To achieve the above object, a fourth aspect of the present disclosure provides a storage medium containing computer-executable instructions which, when executed by a computer processor, generate the object-based audio channel metadata described in the first aspect.
In summary, the object-based audio channel metadata of the present disclosure comprises: an attribute zone containing an audio channel name, an audio channel identifier, and audio channel type description information; and a sub-element zone containing at least one audio block format and audio cut-off frequency information, wherein the audio block format indicates the time-domain division of an audio channel and comprises an audio block identifier and object sub-elements, the object sub-elements comprising coordinate information describing the position and extent of the object, diffusion information describing the sound type of the object, and preset rendering information instructing a renderer to render sound. The metadata describes object-based audio channels in which the position of audio objects may change dynamically, enabling the reproduction of three-dimensional sound in space and thereby improving the quality of the sound scene.
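As a concrete illustration, the two-zone structure summarized above can be sketched as plain data classes. This is a hypothetical Python sketch: all class names, field names, and default values are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectSubElements:
    """Object sub-elements of an audio block format (names are illustrative)."""
    coordinates: dict            # coordinate information: position/extent of the object
    diffuse: float               # diffusion information: 0.0 = coherent, 1.0 = fully diffuse
    channel_lock: bool = False   # preset rendering info: send audio to nearest speaker
    jump_position: bool = False  # preset rendering info: no temporal interpolation


@dataclass
class AudioBlockFormat:
    block_id: str                            # e.g. "AB_00031001_00000001"
    objects: ObjectSubElements
    rtime: Optional[str] = None              # block start time, "hh:mm:ss.zzzz"
    duration: Optional[str] = None           # block duration; None = whole channel


@dataclass
class AudioChannelMetadata:
    # attribute zone
    name: str
    channel_id: str                          # e.g. "AC_00031001"
    type_label: str = "0003"                 # numeric code for the object type
    type_definition: str = "Objects"
    # sub-element zone
    blocks: List[AudioBlockFormat] = field(default_factory=list)
    freq_low: Optional[float] = None         # audio low cut-off frequency (Hz)
    freq_high: Optional[float] = None        # audio high cut-off frequency (Hz)
```

A channel with a single audio block would then model a "static" object, while multiple blocks would model a "dynamic" one, as described later in the embodiment.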
Drawings
Fig. 1 is a schematic diagram of a three-dimensional acoustic audio production model provided in embodiment 1 of the present disclosure;
fig. 2 is a flowchart of a method for generating audio channel metadata according to embodiment 2 of the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device provided in embodiment 3 of the present disclosure.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the structures relevant to the present invention, not all structures.
As shown in fig. 1, a three-dimensional audio production model is composed of a set of production elements each describing one stage of audio production, and includes a content production section and a format production section.
The content production section includes: audio program, audio content, audio object, and audio track unique identifier elements.
An audio program includes narration, sound effects, and background music; it references one or more audio contents that are combined to construct the complete audio program.
The audio content describes the content of a component of an audio program, such as background music, and relates the content to its format by reference to one or more audio objects.
The audio object establishes the relationship between content, format, and assets through audio track unique identifier elements, and determines the unique identifier of the actual audio track.
The format making part comprises: audio packet format, audio channel format, audio stream format, audio track format.
The audio packet format is the format used when audio objects and raw audio data are packed into packets by channel.
The audio channel format represents a single sequence of audio samples on which certain operations may be performed, such as movement of rendering objects in a scene.
An audio stream is a combination of audio tracks needed to render a channel, object, higher-order ambisonics component, or packet. The audio stream format establishes the relationship between a set of audio track formats and a set of audio channel formats or audio packet formats.
The audio track format corresponds to a set of samples or data in a single audio track of the storage medium; it describes the format of the raw audio data and the signal decoded by the renderer. The audio track format is derived from the audio stream format and identifies the combination of audio tracks needed to decode the track data successfully.
After raw audio data passes through the three-dimensional sound audio production model, synthesized audio data containing metadata is generated.
Metadata is information describing the characteristics of data; the functions it supports include indicating storage location, historical data, resource lookup, and file records.
After the synthesized audio data is transmitted to a far end over a communication link, the far end renders it based on the metadata to restore the original sound scene.
Example 1
The present disclosure provides and describes in detail audio channel metadata in a three-dimensional acoustic audio model.
The channel-based audio type used in the prior art delivers each channel's audio signal directly to its corresponding speaker without any signal modification. For example, mono, stereo, surround 5.1, surround 7.1, and surround 22.2 are all channel-based audio formats, with each channel feeding one speaker. Although channel-based audio is well established, adding corresponding audio channel metadata to it facilitates audio processing: tagging each channel with an appropriate identifier ensures the audio is routed to the correct speaker.
The audio channel format represents a single sequence of audio samples on which certain operations may be performed, such as movement of a rendered object in a scene. The disclosed embodiments describe audio channel formats with audio channel metadata; this embodiment explains the audio channel format of the object type.
The audio channel metadata includes a property region and a sub-element region.
The attribute zone comprises an audio channel name, an audio channel identifier, and audio channel type description information.
The sub-element zone comprises at least one audio block format and audio cut-off frequency information, wherein the audio block format indicates the time-domain division of an audio channel and comprises an audio block identifier and object sub-elements, the object sub-elements comprising coordinate information describing the position and extent of the object, diffusion information describing the sound type of the object, and preset rendering information instructing a renderer to render sound.
Wherein an audio channel format comprises a set of one or more audio block formats that subdivide the audio channel format in the time domain.
The attribute zone includes the general definitions of the audio channel metadata. The audio channel name is a name set for the audio channel; a user can identify the audio channel by its name. The audio channel identifier is the identifier symbol of the audio channel. The audio channel type description information may be a descriptor of the audio channel type and/or descriptive information about it; the channel type may be specified by a type definition and/or a type tag. The type definition of an audio channel format specifies the audio type it describes and determines which parameters are used in the audio block format sub-elements. In the disclosed embodiments, the audio types may include: channel type, matrix type, object type, scene type, and binaural channel type. A type tag is a numeric code, and each channel type has a corresponding code; for example, the object-type channel is denoted by 0003.
The audio channel identifier may include: an audio type identifier indicating the type of audio contained in the audio channel, and an audio stream identifier indicating the format of the audio stream contained in the audio channel. Optionally, the audio channel identifier may comprise an 8-digit hexadecimal number, the first four digits representing the type of audio contained in the channel and the last four digits matching the audio stream format. For example, in an audio channel identifier of the form AC_yyyyxxxx, yyyy represents the type of audio contained in the channel and xxxx matches the audio stream format, as shown in Table 1 below.
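The AC_yyyyxxxx scheme just described can be sketched as a pair of helper functions. This is a hypothetical Python sketch: the function names are illustrative, and the assumption that both four-digit fields are plain hexadecimal integers is inferred from the text, not confirmed by the (image-only) table.

```python
OBJECT_TYPE = 0x0003  # numeric type tag for object-type channels, per the text


def make_channel_id(type_code: int, stream_code: int) -> str:
    """Build an identifier of the form AC_yyyyxxxx: the first four hex
    digits encode the audio type, the last four match the stream format."""
    return f"AC_{type_code:04X}{stream_code:04X}"


def parse_channel_id(channel_id: str) -> tuple:
    """Split AC_yyyyxxxx back into (type_code, stream_code)."""
    if not (channel_id.startswith("AC_") and len(channel_id) == 11):
        raise ValueError(f"malformed channel id: {channel_id}")
    return int(channel_id[3:7], 16), int(channel_id[7:11], 16)
```

For example, an object-type channel matched to stream format 1001 would get the identifier `AC_00031001`.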
TABLE 1
[Table 1 is rendered as an image in the original document and is not reproduced here.]
In Table 1, the requirement column indicates whether an attribute must be set when generating the audio channel metadata: "yes" marks a mandatory attribute and "optional" an optional one, and at least one of the type definition and the type tag must be set.
The sub-element zone includes at least one audio block format; the audio blocks carry the time-domain partition of the channel's dynamic metadata. The sub-element zone may also include audio cut-off frequency information, which can indicate a high cut-off and/or a low cut-off audio frequency, as shown in Table 2 below.
TABLE 2
[Table 2 is rendered as an image in the original document and is not reproduced here.]
The number column in Table 2 indicates how many of each sub-element may be set. An audio channel may include at least one audio block, so the number of audio block sub-elements in an audio channel format is an integer greater than 0. The audio cut-off frequency information is optional: its count is 0 when it is not set, 1 when either the low or the high cut-off frequency is set, and 2 when both are set.
Each audio block format carries an audio block identifier, which may include an index indicating the audio block's position within the audio channel. The identifier may include an 8-digit hexadecimal index of the block in the channel; for example, in the audio block identifier AB_00010001_00000001, the last 8 hexadecimal digits are the block's index within the channel, and the index of the first audio block in a channel may start from 00000001. The audio block format may also include the block's start time and duration. If the start time is not set, the block is taken to start at 00:00:00.0000; times use the "hh:mm:ss.zzzz" format, where "hh" is hours, "mm" minutes, "ss" the integer part of seconds, and "zzzz" the fractional part of the seconds (for example, milliseconds). If the duration is not set, the block lasts for the whole duration of the audio channel. If an audio channel format contains only one audio block format, the object is assumed to be "static" and lasts for the duration of the channel, so the block's start time and duration should be ignored. If it contains multiple audio block formats, the objects are assumed to be "dynamic", so both the start time and the duration of each block should be used. The audio block format attributes are set as in Table 3.
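The block indexing and the "hh:mm:ss.zzzz" time format can be sketched as follows. This is a hypothetical Python sketch: the function names are illustrative, and treating the fractional field as a variable-length decimal fraction is an assumption based on the description above.

```python
import re


def make_block_id(channel_suffix: str, index: int) -> str:
    """Build an audio block identifier AB_<channel>_<index>, where the
    last 8 hex digits index the block within the channel (first block = 1)."""
    return f"AB_{channel_suffix}_{index:08x}"


def parse_timecode(tc: str) -> float:
    """Convert an 'hh:mm:ss.zzzz' timecode into seconds."""
    m = re.fullmatch(r"(\d{2}):(\d{2}):(\d{2})\.(\d+)", tc)
    if m is None:
        raise ValueError(f"bad timecode: {tc}")
    hh, mm, ss, frac = m.groups()
    return int(hh) * 3600 + int(mm) * 60 + int(ss) + int(frac) / 10 ** len(frac)
```

With this sketch, the default start time 00:00:00.0000 parses to 0 seconds, and a block whose duration is unset would simply be treated as lasting for the whole channel.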
TABLE 3
[Table 3 is rendered as an image in the original document and is not reproduced here.]
The types of audio channel formats may include: bed, matrix, object, scene, and binaural; the embodiments of the present disclosure explain the audio channel format metadata for the object type.
The audio channel type description information in the attribute zone is set to the object type; the type definition may be set to "object". The information in the sub-element zone is likewise set for the object type definition. Besides the information described above, an audio block format whose type is defined as "object" defines object sub-elements as its sub-elements. The object sub-elements comprise coordinate information describing the position and extent of the object, diffusion information describing the sound type of the object, and preset rendering information instructing a renderer to render the sound. Object-type audio channel format metadata applies to object-based audio; in practice, an audio object's attributes are not necessarily fixed and may change dynamically, for example its position. The metadata describes the coordinate information of the object, which may include: coordinate position, object range, object divergence, and area exclusion. The coordinate position is the object's coordinates in a preset coordinate system; the object range is a parameter indicating the size of the object; the object divergence indicates the divergence of the object's audio; and area exclusion indicates speakers and/or room areas in which the renderer should not render the object. The coordinate information may be described in either a polar or a Cartesian coordinate system.
The object sub-elements further include coordinate system specification information, which specifies the coordinate system of the coordinate information; setting it determines whether the coordinates are described in a polar or a Cartesian coordinate system. The diffusion information in the object sub-elements may indicate whether the object is diffuse or coherent sound. The preset rendering information consists of parameters instructing the renderer how to render the sound and may include: audio channel lock information and audio jump position information. The channel lock parameter tells the renderer to send the object's audio to the nearest speaker or channel instead of applying the usual panning and interpolation. The jump position parameter ensures that the renderer performs no temporal interpolation on position values, so the object jumps to its next target position within the specified time rather than moving there smoothly. In addition, the object sub-elements may include: gain information, screen-related information, and object importance information. The gain information indicates the gain applied to the object's audio; the screen-related information indicates whether the object is related to the screen; and the object importance information indicates the importance of the object, which may be set to one of several levels, e.g., 0-10.
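How a renderer might honour the channel-lock hint — sending the object's audio to the nearest speaker instead of panning — can be sketched as follows. This is a hypothetical Python sketch: the speaker-layout representation, the Euclidean distance metric, and the function names are assumptions for illustration, not taken from the patent.

```python
import math


def _dist(a, b):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def route_object(position, speakers, channel_lock=False):
    """If the channel-lock hint is set, return the name of the nearest
    speaker so the object's audio bypasses panning/interpolation.
    Otherwise return None, meaning: fall back to normal panning.

    `speakers` is a list of {"name": str, "pos": (x, y, z)} dicts.
    """
    if channel_lock:
        nearest = min(speakers, key=lambda s: _dist(position, s["pos"]))
        return nearest["name"]
    return None  # normal panning/interpolation path
```

The jump-position hint would be handled analogously on the time axis: a renderer would skip interpolating between successive block positions and apply the new position directly.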
The coordinate information uses a coordinate attribute to specify which coordinate axes are used. If the coordinate system is polar, it uses an azimuth axis, an elevation axis, and a distance axis; if it is Cartesian, it uses X, Y, and Z axes. Part of the information in the object sub-elements is described differently in the two coordinate systems (different axes and units), while part is common to both. The information in the object sub-elements under the polar coordinate system is shown in Tables 4 and 5.
TABLE 4
[Table 4 is rendered as an image in the original document and is not reproduced here.]
TABLE 5
[Table 5 is rendered as an image in the original document and is not reproduced here.]
In Table 4, elements whose number is "0 or 1" are optional and elements whose number is "1" are mandatory; items whose content is "empty" need not be set. In Table 5, the area (spherical) is a sub-element of the area-exclusion sub-element and is set once per excluded area.
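The relation between the two coordinate systems can be illustrated by converting a polar object position (azimuth, elevation, distance) to Cartesian X/Y/Z. This is a hypothetical Python sketch: the axis convention used here (Y to the front, X to the right, Z up, azimuth measured from the front and positive to the left) is one common three-dimensional audio convention and is an assumption, not taken from the patent's tables.

```python
import math


def polar_to_cartesian(azimuth_deg: float, elevation_deg: float, distance: float):
    """Convert a polar position to Cartesian coordinates.

    Assumed convention: azimuth 0° = straight ahead, positive to the left;
    elevation 0° = horizontal plane; Y points front, X right, Z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = -distance * math.cos(el) * math.sin(az)  # right is +X, so left azimuth -> -X... negated
    y = distance * math.cos(el) * math.cos(az)   # front component
    z = distance * math.sin(el)                  # height component
    return x, y, z
```

Under this convention, a source straight ahead at distance 1 maps to (0, 1, 0), and a source at azimuth 90° (hard left) maps to (-1, 0, 0).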
The information in the object sub-elements under the Cartesian coordinate system is shown in Tables 6 and 7.
TABLE 6
[Table 6 is rendered as an image in the original document and is not reproduced here.]
TABLE 7
[Table 7 is rendered as an image in the original document and is not reproduced here.]
In Table 6, elements whose number is "0 or 1" are optional and elements whose number is "1" are mandatory; items whose content is "empty" need not be set. In Table 7, the area (Cartesian) is a sub-element of the area-exclusion sub-element and is set once per excluded area.
The sub-elements common to the polar and Cartesian coordinate systems are shown in Table 8.
TABLE 8
[Table 8 is rendered as an image in the original document and is not reproduced here.]
In Table 8, elements whose number is "0 or 1" are optional, and items whose content is "empty" need not be set.
The disclosed embodiments describe object-based audio channels through audio channel metadata, where the position of audio objects may be dynamically changed to enable reproduction of three-dimensional sound in space, thereby improving the quality of sound scenes.
Example 2
The present disclosure also provides an embodiment adapted to the above embodiment, namely a method for generating audio channel metadata. Terms with the same names and meanings are explained as in the above embodiment and yield the same technical effects; the details are not repeated here.
A method for generating audio channel metadata, as shown in fig. 2, comprises the following steps:
Step S110: in response to a user's setting operation on the audio channel metadata, generate the audio channel metadata, where the audio channel metadata includes:
an attribute zone, comprising an audio channel name, an audio channel identifier, and audio channel type description information; and
a sub-element zone, comprising at least one audio block format and audio cut-off frequency information, wherein the audio block format indicates the time-domain division of an audio channel and comprises an audio block identifier and object sub-elements, the object sub-elements comprising coordinate information describing the position and extent of the object, diffusion information describing the sound type of the object, and preset rendering information instructing a renderer to render sound.
The user's setting operation on the audio channel metadata may take several forms: the user sets the relevant attributes of the metadata directly, e.g., the attributes are received from user input item by item; or the metadata is generated automatically when the user runs a preset metadata generation program configured to set all attributes from the system defaults; or the metadata is generated automatically by a preset generation program that sets some attributes from the system defaults and then receives the remaining attributes from the user.
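The "system defaults plus user overrides" generation flow described above can be sketched as follows. This is a hypothetical Python sketch: the XML element and attribute names, the default values, and the function name are illustrative choices, not taken from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical system defaults for an object-type channel (0003 = object type)
DEFAULTS = {"typeLabel": "0003", "typeDefinition": "Objects"}


def generate_channel_metadata(name: str, channel_id: str,
                              user_settings: dict = None) -> ET.Element:
    """Generate an audio channel format element: start from the system
    defaults, then let any user-supplied attributes override them."""
    attrs = {**DEFAULTS, **(user_settings or {})}
    return ET.Element(
        "audioChannelFormat",
        audioChannelFormatName=name,
        audioChannelFormatID=channel_id,
        **attrs,
    )
```

Called with no `user_settings`, this corresponds to the fully automatic case; passing a partial dict corresponds to the mixed case where the program fills in defaults and the user supplies the rest.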
Optionally, the audio channel identifier includes: an audio type identifier for indicating a type of audio contained in the audio channel and an audio stream identifier for indicating a format of an audio stream contained in the audio channel.
Optionally, the audio channel type description information includes a type tag and/or a type definition.
Optionally, the audio block identifier includes an index indicating an audio block within an audio channel.
Optionally, the audio cut-off frequency information comprises audio frequencies indicating a high cut-off and/or a low cut-off.
Optionally, the coordinate information includes: coordinate position, object range, object divergence, and area exclusion.
Optionally, the coordinate information is based on a polar coordinate system or a cartesian coordinate system; the object sub-element further includes coordinate system specifying information for specifying a coordinate system of the coordinate information.
Optionally, the preset rendering information includes: audio channel lock information and audio jump position information.
Optionally, the object sub-element further includes: gain information, screen related information, and object importance information.
The audio channel metadata generated by this method describes object-based audio channels and enables the reproduction of three-dimensional sound in space, thereby improving the quality of sound scenes.
Example 3
Fig. 3 is a schematic structural diagram of an electronic device provided in embodiment 3 of the present disclosure. As shown in fig. 3, the electronic device includes: a processor 30, a memory 31, an input device 32, and an output device 33. The electronic device may have one or more processors 30 and one or more memories 31; fig. 3 takes one of each as an example. The processor 30, memory 31, input device 32, and output device 33 may be connected by a bus or by other means; fig. 3 illustrates a bus connection. The electronic device may be a computer, a server, or the like. This embodiment takes a server as the example electronic device; the server may be a standalone server or a server cluster.
Memory 31 is provided as a computer-readable storage medium that may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules for generating audio channel metadata as described in any embodiment of the present disclosure. The memory 31 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 31 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 31 may further include memory located remotely from the processor 30, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 32 may be used to receive input numeric or character information and to generate key-signal input related to user settings and function control of the electronic device; it may also include a camera for capturing images and a sound pickup device for capturing audio data. The output device 33 may include audio equipment such as speakers. The specific composition of the input device 32 and the output device 33 can be set according to the actual situation.
The processor 30 executes various functional applications of the device and data processing, i.e. generating audio channel metadata, by running software programs, instructions and modules stored in the memory 31.
Example 4
Embodiment 4 of the present disclosure also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, generate the audio channel metadata described in embodiment 1.
Of course, the computer-executable instructions contained in the storage medium provided by the embodiments of the present disclosure are not limited to the operations described above; they may also perform related operations in the electronic method provided by any embodiment of the present disclosure, with the corresponding functions and advantages.
From the above description of the embodiments, those skilled in the art will clearly understand that the present disclosure can be implemented by software plus the necessary general-purpose hardware, or by hardware alone, though in many cases the former is preferable. Based on this understanding, the technical solution of the present disclosure may be embodied as a software product stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes several instructions to make a computing device (which may be a robot, a personal computer, a server, or a network device) execute the electronic method of any embodiment of the present disclosure.
It should be noted that the units and modules included in the electronic device are merely divided according to functional logic, and the division is not limited thereto as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only used for distinguishing them from one another, and are not used to limit the protection scope of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having appropriate combinational logic gates, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to "in an embodiment," "in yet another embodiment," "exemplary," or "in a particular embodiment," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the present disclosure has been described in detail above by way of general description, specific embodiments and experiments, it will be apparent to those skilled in the art that certain modifications or improvements may be made on the basis of the present disclosure. Accordingly, such modifications and improvements are intended to fall within the scope of protection claimed by this disclosure.
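By way of a non-limiting illustration of the generation method described above, the object-based audio channel metadata — an attribute zone (audio channel name, identifier, and type description) together with a sub-element zone of time-divided audio block formats carrying object sub-elements — can be sketched in code. All class, field, and identifier names below (e.g. `AudioChannelMetadata`, `AB_00000001`) are hypothetical and chosen for illustration only; they are not mandated by this disclosure:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ObjectSubElement:
    # Coordinate information describing the position and range of the object
    azimuth: float = 0.0      # degrees; a polar coordinate system is assumed here
    elevation: float = 0.0
    distance: float = 1.0
    # Diffusion information describing the sound type (0 = point source, 1 = fully diffuse)
    diffuse: float = 0.0
    # Preset rendering information instructing the renderer (e.g. channel lock, jump position)
    channel_lock: bool = False
    jump_position: bool = False
    gain: float = 1.0

@dataclass
class AudioBlockFormat:
    # Audio block identifier: indexes this block within the channel's time-domain division
    block_id: str
    start: float              # seconds from the start of the audio channel
    duration: float
    obj: ObjectSubElement = field(default_factory=ObjectSubElement)

@dataclass
class AudioChannelMetadata:
    # Attribute zone
    channel_name: str
    channel_id: str           # e.g. encodes the audio type and audio stream format
    type_label: str = "Objects"
    # Sub-element zone
    blocks: List[AudioBlockFormat] = field(default_factory=list)
    high_cutoff_hz: Optional[float] = None
    low_cutoff_hz: Optional[float] = None

def generate_metadata(name: str, channel_id: str,
                      positions: List[Tuple[float, float, float]]) -> AudioChannelMetadata:
    """Generate channel metadata with one audio block per (start, duration, azimuth)."""
    meta = AudioChannelMetadata(channel_name=name, channel_id=channel_id)
    for i, (start, dur, az) in enumerate(positions, 1):
        meta.blocks.append(AudioBlockFormat(block_id=f"AB_{i:08d}",
                                            start=start, duration=dur,
                                            obj=ObjectSubElement(azimuth=az)))
    return meta

meta = generate_metadata("Dialogue", "AC_00031001",
                         [(0.0, 1.0, -30.0), (1.0, 1.0, 30.0)])
print(len(meta.blocks), meta.blocks[0].block_id)  # 2 AB_00000001
```

In this sketch each audio block format captures the object's state over one time interval, so a moving object is represented by a sequence of blocks rather than by mutating a single record.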

Claims (12)

1. Object-based audio channel metadata, comprising:
an attribute zone, comprising an audio channel name, an audio channel identifier, and audio channel type description information; and
a sub-element zone, comprising at least one audio block format and audio cut-off frequency information, wherein the audio block format is used for indicating a time-domain division of an audio channel and comprises an audio block identifier and object sub-elements, the object sub-elements comprising coordinate information for describing the position and range of an object, diffusion information for describing the sound type of the object, and preset rendering information for instructing a renderer how to render the sound.
2. The object-based audio channel metadata according to claim 1, wherein the audio channel identifier comprises: an audio type identifier for indicating the type of audio contained in the audio channel, and an audio stream identifier for indicating the format of the audio stream contained in the audio channel.
3. The object-based audio channel metadata according to claim 1, wherein the audio channel type description information comprises a type label and/or a type definition.
4. The object-based audio channel metadata according to claim 1, wherein the audio block identification comprises an index indicating an audio block within an audio channel.
5. The object-based audio channel metadata according to claim 1, wherein the audio cut-off frequency information comprises frequencies indicating a high cut-off and/or a low cut-off of the audio.
6. The object-based audio channel metadata according to claim 1, wherein the coordinate information comprises: coordinate position, object range, object divergence, and area exclusion.
7. The object-based audio channel metadata according to claim 6, wherein the coordinate information is based on a polar coordinate system or a Cartesian coordinate system; the object sub-element further includes coordinate system specifying information for specifying a coordinate system of the coordinate information.
8. The object-based audio channel metadata according to claim 1, wherein the preset rendering information includes: audio channel lock information and audio jump position information.
9. The object-based audio channel metadata according to claim 1, wherein the object sub-elements further comprise: gain information, screen related information, and object importance information.
10. A method for generating audio channel metadata, configured to generate metadata comprising the object-based audio channel metadata according to any one of claims 1-9.
11. An electronic device, comprising: a memory and one or more processors;
the memory being configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to generate audio data comprising the object-based audio channel metadata according to any one of claims 1-9.
12. A storage medium containing computer-executable instructions which, when executed by a computer processor, generate the object-based audio channel metadata according to any one of claims 1-9.
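As a worked illustration of the two coordinate systems permitted for the coordinate information (claim 7), the following sketch converts an object position between polar coordinates (azimuth, elevation, distance) and Cartesian coordinates. The axis convention assumed here (azimuth 0° straight ahead and positive to the left, elevation positive upward, X right / Y front / Z up) is an assumption for illustration, not a requirement of the claims:

```python
import math

def polar_to_cartesian(azimuth_deg, elevation_deg, distance):
    """Convert an object position from polar to Cartesian coordinates.

    Assumed convention (illustrative only): azimuth 0 deg = straight ahead,
    positive to the left; elevation positive upward; X right, Y front, Z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = -math.sin(az) * math.cos(el) * distance
    y = math.cos(az) * math.cos(el) * distance
    z = math.sin(el) * distance
    return x, y, z

def cartesian_to_polar(x, y, z):
    """Inverse conversion back to (azimuth_deg, elevation_deg, distance)."""
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = -math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.asin(z / distance)) if distance else 0.0
    return azimuth, elevation, distance

# An object straight ahead at unit distance lies on the front (+Y) axis
print(polar_to_cartesian(0.0, 0.0, 1.0))
```

A coordinate-system flag in the object sub-element (claim 7's coordinate system specifying information) would tell the renderer which of the two interpretations to apply to the stored values.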
CN202111020417.0A 2021-09-01 2021-09-01 Object-based audio channel metadata and generation method, device and storage medium Pending CN113905321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111020417.0A CN113905321A (en) 2021-09-01 2021-09-01 Object-based audio channel metadata and generation method, device and storage medium


Publications (1)

Publication Number Publication Date
CN113905321A (en) 2022-01-07

Family

ID=79188277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111020417.0A Pending CN113905321A (en) 2021-09-01 2021-09-01 Object-based audio channel metadata and generation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113905321A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499760A (en) * 2022-08-31 2022-12-20 赛因芯微(北京)电子科技有限公司 Object-based audio metadata space identification conversion method and device
WO2023212880A1 (en) * 2022-05-05 2023-11-09 北京小米移动软件有限公司 Audio processing method and apparatus, and storage medium

Citations (8)

Publication number Priority date Publication date Assignee Title
CN101379464A (zh) * 2005-12-21 2009-03-04 Digimarc Corporation Rules driven pan ID metadata routing system and network
CN103650539A (zh) * 2011-07-01 2014-03-19 Dolby Laboratories Licensing Corporation Systems and methods for adaptive audio signal generation, encoding and presentation
CN105431900A (zh) * 2013-07-31 2016-03-23 Dolby Laboratories Licensing Corporation Handling of spatially diffuse or large audio objects
US9774976B1 (en) * 2014-05-16 2017-09-26 Apple Inc. Encoding and rendering a piece of sound program content with beamforming data
CN107925391A (zh) * 2015-09-30 2018-04-17 Apple Inc. Loudness equalization based on encoded audio metadata and dynamic equalization during DRC
JP2019003185A (ja) * 2017-06-09 2019-01-10 Japan Broadcasting Corporation (NHK) Acoustic signal auxiliary information conversion transmission apparatus and program
US20210005211A1 (en) * 2019-07-02 2021-01-07 Dolby International Ab Using metadata to aggregate signal processing operations
US20210050028A1 (en) * 2018-01-26 2021-02-18 Lg Electronics Inc. Method for transmitting and receiving audio data and apparatus therefor


Non-Patent Citations (2)

Title
INTERNATIONAL TELECOMMUNICATION UNION: "Audio Definition Model", Recommendation ITU-R BS.2076-1 *
ZHANG JINGQI: "Introduction to the Audio Definition Model", Audio Engineering (电声技术) *


Similar Documents

Publication Publication Date Title
CN114339297B (en) Audio processing method, device, electronic equipment and computer readable storage medium
CN113905321A (en) Object-based audio channel metadata and generation method, device and storage medium
JP2024534274A (en) Vibration motor control method, vibration motor control device, storage medium, and electronic device
WO2014160717A1 (en) Using single bitstream to produce tailored audio device mixes
CN114023339A (en) Audio-bed-based audio packet format metadata and generation method, device and medium
US20240388866A1 (en) Audio processing method and terminal
CN114143695A (en) Audio stream metadata and generation method, electronic equipment and storage medium
CN114203189A (en) Method, apparatus and medium for generating metadata based on binaural audio packet format
CN114023340A (en) Object-based audio packet format metadata and generation method, apparatus, and medium
CN114979935A (en) Object output rendering item determination method, device, equipment and storage medium
CN113923264A (en) Scene-based audio channel metadata and generation method, device and storage medium
CN113905322A (en) Method, device and storage medium for generating metadata based on binaural audio channel
CN114203190A (en) Matrix-based audio packet format metadata and generation method, device and storage medium
CN113938811A (en) Audio channel metadata based on sound bed, generation method, equipment and storage medium
CN114121036A (en) Audio track unique identification metadata and generation method, electronic device and storage medium
CN114051194A (en) Audio track metadata and generation method, electronic equipment and storage medium
CN113923584A (en) Matrix-based audio channel metadata and generation method, equipment and storage medium
CN114203188A (en) Scene-based audio packet format metadata and generation method, device and storage medium
CN115038029A (en) Rendering item processing method, device and equipment of audio renderer and storage medium
CN115134737A (en) Sound bed output rendering item determination method, device, equipment and storage medium
CN115529548A (en) Speaker channel generation method and device, electronic device and medium
CN114360556A (en) Serial audio metadata frame generation method, device, equipment and storage medium
CN115190412A (en) Method, device and equipment for generating internal data structure of renderer and storage medium
CN114530157A (en) Audio metadata channel allocation block generation method, apparatus, device and medium
KR20190081163A (en) Method for selective providing advertisement using stereoscopic content authoring tool and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220107