WO2025091293A1 - Grouping method, encoder, decoder, and storage medium - Google Patents
- Publication number
- WO2025091293A1 (PCT/CN2023/128800)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- channel
- channels
- grouping
- groups
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Definitions
- the present disclosure relates to the field of multimedia technology, and in particular to a grouping method, an encoder, a decoder, and a storage medium.
- the audio can be applied in various fields.
- the audio can be 3D encoded and 3D decoded to ensure that the audio heard by the user is indistinguishable from the audio heard in the actual environment.
- the present disclosure solves the problem that a single channel cannot be grouped during channel grouping by providing a way to form a channel group that includes three channels. This ensures that every channel has a corresponding group, which in turn ensures accurate channel grouping, reduced redundancy between channels, and accurate channel encoding and audio encoding.
- the embodiments of the present disclosure provide a grouping method, a device, and a storage medium.
- a grouping method is proposed.
- the method is performed by an encoder, and the method includes:
- a processing module is used to group multiple channels to obtain at least one channel group, wherein there are N first channel groups in the at least one channel group, the first channel group includes three channels, and there are M second channel groups in the at least one channel group, the second channel group includes two channels, wherein N is 1 and M is a non-negative integer.
- FIG4B is a schematic diagram of a flow chart of a grouping method according to an embodiment of the present disclosure.
- FIG5 is a flow chart of a grouping method according to an embodiment of the present disclosure.
- FIG7B is a schematic diagram of the structure of a coding and decoding device proposed in an embodiment of the present disclosure.
- FIG8A is a schematic diagram of the structure of a communication device provided in an embodiment of the present disclosure.
- FIG8B is a schematic diagram of the structure of a chip proposed in an embodiment of the present disclosure.
- the present disclosure provides a grouping method, device and storage medium.
- a grouping method is proposed.
- the method is performed by an encoder, and the method includes:
- a plurality of channels are grouped to obtain at least one channel group, wherein the at least one channel group includes N first channel groups, the first channel group includes three channels, and the at least one channel group includes M second channel groups, the second channel group includes two channels, wherein N is 1 and M is a non-negative integer.
- the problem that a single channel cannot be grouped during channel grouping is solved by providing a way to form a channel group that includes three channels. This ensures that every channel has a corresponding group, which in turn ensures accurate channel grouping, reduced redundancy between channels, and accurate channel encoding and audio encoding.
- the step of grouping the plurality of channels to obtain at least one channel group includes:
- the plurality of channels are grouped based on the similarity between any two channels to obtain the at least one channel group.
- the channels are grouped by the similarity between every two channels, so as to ensure that the similarity of the channels included in each channel group meets the requirements, ensure the accuracy of the channel grouping, and further ensure the reduction of redundancy between channels, ensure the accuracy of channel encoding, and further ensure the accuracy of audio encoding.
- grouping the multiple channels based on the similarity between any two channels to obtain the at least one channel group includes:
- the two channels with the greatest similarity are determined as a candidate channel group until one independent channel remains or no channel remains;
- the at least one channel grouping is determined based on the obtained candidate channel groupings and/or the independent channels.
- the channels are divided into the same channel group in the order of similarity from large to small, ensuring that the similarities of the channels included in each channel group are channels with large correlation, ensuring the accuracy of channel grouping, and further ensuring the reduction of redundancy between channels, ensuring the accuracy of channel encoding, and further ensuring the accuracy of audio encoding.
- the determining the at least one channel grouping based on the obtained candidate channel grouping and/or the independent channel includes:
- Each of the remaining candidate channel groups is determined as a second channel group.
- the remaining single channels are also divided into a channel group to ensure that there is no single channel that is not divided into a channel group, thereby ensuring that the redundancy between channels is reduced, ensuring the accuracy of channel encoding, and thus ensuring the accuracy of audio encoding.
- grouping the multiple channels to obtain at least one channel group includes:
- the multiple channels are grouped to directly obtain the N first channel groups and the M second channel groups.
- grouping the multiple channels to directly obtain the N first channel groups and the M second channel groups includes:
- the multiple channels are grouped according to the similarity between any two channels to obtain N first channel groups and the M second channel groups.
- grouping the multiple channels to obtain at least one channel group includes:
- An independent channel among the at least one independent channel is divided into one candidate channel group among the plurality of candidate channel groups to obtain the N first channel groups and the M second channel groups.
- the downmixed audio signal is encoded to obtain an audio stream.
- the method further includes:
- First information is sent, where the first information is used to indicate channel information of the plurality of channel groups.
- channel identifiers of channels included in a channel group including two channels
- An energy parameter of a channel grouping comprising two channels
- channel identifiers of channels included in a channel group including three channels
- An energy parameter of a channel grouping comprising three channels
- the energy parameter is used to adjust the energy of the channels in the channel group.
- the channel information includes information of channel grouping of two channels or three channels, thereby ensuring the comprehensiveness of the channel information.
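The channel information listed above can be pictured as one small record per channel group. The following is a minimal sketch only: the class name `ChannelGroupInfo`, its field names, and the example values are hypothetical, and the source does not define a bitstream syntax for the first information.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChannelGroupInfo:
    """Channel information for one channel group (hypothetical layout)."""
    channel_ids: Tuple[int, ...]  # identifiers of the two or three channels
    energy_parameter: float       # used to adjust the energy of the channels

    def __post_init__(self):
        # A channel group in this scheme holds either two or three channels.
        if len(self.channel_ids) not in (2, 3):
            raise ValueError("a channel group holds two or three channels")


# Illustrative first information: one three-channel group, one two-channel group.
first_information = [
    ChannelGroupInfo(channel_ids=(1, 2, 5), energy_parameter=0.8),
    ChannelGroupInfo(channel_ids=(3, 4), energy_parameter=1.0),
]
```

Keeping the identifiers and the energy parameter together per group mirrors the four items in the list, with the group size distinguishing the two-channel and three-channel cases.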
- an embodiment of the present disclosure provides a grouping method, which is performed by a decoder, and includes:
- the channel information includes at least one of the following:
- channel identifiers of channels included in a channel group including two channels
- An energy parameter of a channel grouping comprising two channels
- channel identifiers of channels included in a channel group including three channels
- An energy parameter of a channel grouping comprising three channels
- the energy parameter is used to adjust the energy of the channels in the channel group.
- the method further includes:
- the channel groups do not include a channel grouping for three channels
- two-channel upmixing is performed on the audio stream.
- the method further includes:
- the channel groups include a channel grouping with three channels
- three-channel upmixing is performed on the audio stream.
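The decoder-side choice between two-channel and three-channel upmixing can be sketched as a simple check on the decoded channel groups. This is an illustrative sketch: the function name `select_upmix` and the string labels are hypothetical, and the actual upmixing operation is not specified here.

```python
def select_upmix(groups):
    """Return the upmix mode implied by the decoded channel groups.

    groups: list of tuples of channel identifiers, one tuple per channel group.
    """
    # Three-channel upmixing applies when any group holds three channels;
    # otherwise two-channel upmixing applies.
    has_three_channel_group = any(len(g) == 3 for g in groups)
    return "three-channel" if has_three_channel_group else "two-channel"
```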
- an embodiment of the present disclosure provides a grouping method, the method comprising:
- the encoder groups the multiple channels to obtain at least one channel group, wherein the at least one channel group includes N first channel groups, the first channel group includes three channels, and the at least one channel group includes M second channel groups, the second channel group includes two channels, wherein N is 1 and M is a non-negative integer;
- the decoder decodes the first information to obtain channel information of the at least one channel grouping, wherein at least one channel grouping among the multiple channel groups includes three channels.
- an embodiment of the present disclosure provides a coding and decoding device, which includes at least one of a transceiver module and a processing module; wherein the encoder is used to execute the optional implementation methods of the first and third aspects.
- an embodiment of the present disclosure provides a coding and decoding device, which includes at least one of a transceiver module and a processing module; wherein the access network device is used to execute the optional implementation methods of the second and third aspects.
- an embodiment of the present disclosure provides a coding and decoding device, including:
- one or more processors
- the encoding and decoding device is used to execute the method described in any one of the first aspect and the third aspect.
- an embodiment of the present disclosure provides a coding and decoding device, including:
- one or more processors
- the encoding and decoding device is used to execute the method described in any one of the second aspect and the third aspect.
- an embodiment of the present disclosure provides a storage medium, wherein the storage medium stores first information, and when the first information is run on a communication device, the communication device executes a method as described in any one of the first aspect, the second aspect, and the third aspect.
- an embodiment of the present disclosure proposes a program product.
- the communication device executes any one of the methods described in the first aspect, the second aspect and the third aspect.
- an embodiment of the present disclosure proposes a computer program, which, when executed on a communication device, enables the communication device to execute any one of the methods described in the first aspect, the second aspect, and the third aspect.
- an embodiment of the present disclosure provides a chip or a chip system, wherein the chip or the chip system comprises a processing circuit configured to execute any one of the methods described in the first aspect, the second aspect, and the third aspect.
- the disclosed embodiments propose a grouping method, device, and storage medium.
- the terms grouping method, information grouping method, etc. can be replaced with each other, the terms encoding and decoding device, information processing device, indicating device, etc. can be replaced with each other, and the terms information processing system, encoding and decoding system, etc. can be replaced with each other.
- each step in a certain embodiment can be implemented as an independent embodiment, and the steps can be arbitrarily combined.
- a solution after removing some steps in a certain embodiment can also be implemented as an independent embodiment, and the order of the steps in a certain embodiment can be arbitrarily exchanged.
- the optional implementation methods in a certain embodiment can be arbitrarily combined; in addition, the embodiments can be arbitrarily combined, for example, some or all of the steps of different embodiments can be arbitrarily combined, and a certain embodiment can be arbitrarily combined with the optional implementation methods of other embodiments.
- elements expressed in the singular form such as “a”, “an”, “the”, “above”, “said”, “aforementioned”, “this”, etc., may mean “one and only one", or “one or more”, “at least one”, etc.
- the noun after the article may be understood as a singular expression or a plural expression.
- plurality refers to two or more.
- the terms "at least one of”, “one or more”, “a plurality of”, “multiple”, etc. can be used interchangeably.
- "at least one of A and B", “A and/or B", “A in one case, B in another case”, “in response to one case A, in response to another case B”, etc. may include the following technical solutions according to the situation: in some embodiments, A (A is executed independently of B); in some embodiments, B (B is executed independently of A); in some embodiments, execution is selected from A and B (A and B are selectively executed); in some embodiments, A and B (both A and B are executed). When there are more branches such as A, B, C, etc., the above is also similar.
- the recording method of "A or B” may include the following technical solutions according to the situation: in some embodiments, A (A is executed independently of B); in some embodiments, B (B is executed independently of A); in some embodiments, execution is selected from A and B (A and B are selectively executed).
- prefixes such as “first” and “second” in the embodiments of the present disclosure are only used to distinguish different description objects, and do not constitute any restrictions on the position, order, priority, quantity or content of the description objects.
- for the description object, refer to the description in the context of the claims or embodiments; no unnecessary restrictions should be imposed due to the use of prefixes.
- for example, if the description object is a "field", the ordinal number before the "field" in the "first field" and the "second field" does not limit the position or order between the "fields".
- “First” and “second” do not limit whether the "fields” they modify are in the same message, nor do they limit the order of the "first field” and the "second field”.
- for example, if the description object is a "level", the ordinal number before the "level" in the "first level" and the "second level" does not limit the priority between the "levels".
- the number of description objects is not limited by ordinal numbers and can be one or more. Taking “first device” as an example, the number of "devices” can be one or more.
- the objects modified by different prefixes may be the same or different.
- for example, if the description object is a "device", the "first device" and the "second device" may be the same device or different devices, and their types may be the same or different.
- similarly, if the description object is "information", the "first information" and the "second information" may be the same information or different information, and their contents may be the same or different.
- “including A”, “comprising A”, “used to indicate A”, and “carrying A” can be interpreted as directly carrying A or indirectly indicating A.
- time/frequency refers to the time domain and/or the frequency domain.
- terms such as “greater than”, “greater than or equal to”, “not less than”, “more than”, “more than or equal to”, “not less than”, “higher than”, “higher than or equal to”, “not lower than”, and “above” can be replaced with each other, and terms such as “less than”, “less than or equal to”, “not greater than”, “less than”, “less than or equal to”, “no more than”, “lower than”, “lower than or equal to”, “not higher than”, and “below” can be replaced with each other.
- devices and equipment may be interpreted as physical or virtual, and their names are not limited to the names recorded in the embodiments. In some cases, they may also be understood as “equipment”, “device”, “circuit”, “network element”, “node”, “function”, “unit”, “section”, “system”, “network”, “chip”, “chip system”, “entity”, “subject”, etc.
- network can be interpreted as devices included in the network, such as access network equipment, core network equipment, etc.
- access network device may also be referred to as “radio access network device (RAN device)", “base station (BS)”, “radio base station (radio base station)”, “fixed station” and in some embodiments may also be understood as “node”, “access point (access point)”, “transmission point (TP)”, “reception point (RP)”, “transmission and/or reception point (transmission/reception point, TRP)", “panel”, “antenna panel”, “antenna array”, “cell”, “macro cell”, “small cell”, “femto cell”, “pico cell”, “sector”, “cell group”, “serving cell”, “carrier”, “component carrier”, “bandwidth part (bandwidth part, BWP)", etc.
- encoder (terminal) or “encoder device (terminal device)” can be referred to as "user equipment (encoder)", “user encoder (user terminal)", “mobile station (MS)”, “mobile encoder (mobile terminal, MT)", subscriber station (subscriber station), mobile unit (mobile unit), subscriber unit (subscriber unit), wireless unit (wireless unit), remote unit (remote unit), mobile device (mobile device), wireless device (wireless device), wireless communication device (wireless communication device), remote device (remote device), mobile subscriber station (mobile subscriber station), access terminal, mobile encoder (mobile terminal), wireless encoder (wireless terminal), remote encoder (remote terminal), handheld device (handset), user agent (user agent), mobile client (mobile client), client (client), etc.
- acquisition of data, information, etc. may comply with the laws and regulations of the country where the data is obtained.
- data, information, etc. may be obtained with the user's consent.
- each element, each row, or each column in the table of the embodiment of the present disclosure can be implemented as an independent embodiment. Any combination of elements, any rows, and any columns can also be implemented as independent embodiments.
- FIG1 is a schematic diagram of the architecture of a codec system according to an embodiment of the present disclosure. As shown in FIG1 , the method provided by the embodiment of the present disclosure can be applied to a codec system 100, and the codec system can include an encoder 101 and a decoder 102. It should be noted that the codec system 100 can also include other devices, and the present disclosure does not limit the devices included in the codec system 100.
- the encoder 101 and the decoder 102 are both arranged in the terminal.
- the terminal can be various devices.
- it includes at least one of a mobile phone, a wearable device, an Internet of Things device, a car with communication function, a smart car, a tablet computer (Pad), a computer with wireless transceiver function, a virtual reality (VR) encoder device, an augmented reality (AR) encoder device, a wireless encoder device in industrial control, a wireless encoder device in self-driving, a wireless encoder device in remote medical surgery, a wireless encoder device in smart grid, a wireless encoder device in transportation safety, a wireless encoder device in smart city, and a wireless encoder device in smart home, but is not limited thereto.
- the encoding and decoding system described in the embodiment of the present disclosure is for the purpose of more clearly illustrating the technical solution of the embodiment of the present disclosure, and does not constitute a limitation on the technical solution proposed in the embodiment of the present disclosure.
- a person of ordinary skill in the art can know that with the evolution of the system architecture and the emergence of new business scenarios, the technical solution proposed in the embodiment of the present disclosure is also applicable to similar technical problems.
- the following embodiments of the present disclosure may be applied to the codec system 100 shown in FIG1 , or part of the subject, but are not limited thereto.
- the subjects shown in FIG1 are examples, and the codec system may include all or part of the subjects in FIG1 , or may include other subjects other than FIG1 , and the number and form of the subjects are arbitrary, and the subjects may be physical or virtual, and the connection relationship between the subjects is an example, and the subjects may be connected or disconnected, and the connection may be in any manner, and may be a direct connection or an indirect connection, and may be a wired connection or a wireless connection.
- the embodiments of the present disclosure may be applied to Long Term Evolution (LTE), LTE-Advanced (LTE-A), LTE-Beyond (LTE-B), SUPER 3G, IMT-Advanced, the fourth generation mobile communication system (4G), the fifth generation mobile communication system (5G), 5G new radio (NR), Future Radio Access (FRA), New-Radio Access Technology (RAT), New Radio (NR), New radio access (NX), Future generation radio access ...-Radio
- the present invention relates to wireless communication networks such as LTE, LTE-A, LTE-X, Global System for Mobile communications (GSM (registered trademark)), CDMA2000, Ultra Mobile Broadband (UMB), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, Ultra Wide Band (UWB), Bluetooth (registered trademark), Public Land Mobile Network (PLMN) network, Device to Device (D2D) system, Machine to Machine (M2M) system,
- the present disclosure is used for three-dimensional sound audio coding.
- the three-dimensional sound audio coding is one of the key technologies of immersive audio technology. Three-dimensional sound increases the sense of space and orientation compared to traditional sound, allowing listeners to reproduce the sound heard in the real world, thereby meeting people's needs for highly restored sound and highly immersive experience, while providing personalized selection and interactive experience.
- in order to reproduce the spatial sense and orientation sense of sound, the technology can rely on a channel-based approach, an object-based approach, a sound field-based approach, or a combination of these three forms. Channel-based audio is a group of interrelated channels; common formats are 5.1, 7.1, 5.1.4, and 7.1.4 channels.
- Each format corresponds to a speaker layout, and the best playback effect can be obtained under the corresponding speaker layout.
- object-based audio is a collection of a series of mono audio elements and corresponding metadata.
- the metadata represents information such as the location, intensity, size, etc. of the object.
- the object is mapped to one or more speakers or binaurally rendered to headphones based on the metadata information to achieve the desired spatial audio effect.
- sound field-based audio is a 3D sound field modeling format defined on the surface of a sphere.
- the principle is that sound is transmitted as a pressure wave.
- each point needs to be reflected by several pressure functions. If the pressure value of each point in the space is known, the sound in the space can be reconstructed. There is a certain relationship between the pressure of each point in the space and its neighboring points.
- the collected sound field signal is called Higher Order Ambisonics (HOA).
- FIG2 is an interactive schematic diagram of a grouping method according to an embodiment of the present disclosure. As shown in FIG2 , an embodiment of the present disclosure relates to a grouping method, and the method includes:
- Step S2101 the encoder groups multiple channels to obtain at least one channel group.
- a channel refers to independent audio signals collected or played back at different spatial locations when recording or playing sound.
- a channel refers to an element of audio, and different types of audio have different numbers of channels. For example, an HOA3 signal has 16 channels, and an MC22.2 signal has 24 channels.
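The two channel counts quoted above can be reproduced with a short calculation. This sketch relies on two conventions that the text does not state and that are assumed here: an ambisonic signal of order n carries (n + 1)^2 channels, and a multichannel layout name such as 22.2 or 5.1.4 lists speaker counts per category, so the total is the sum of its dot-separated fields. The function names are hypothetical.

```python
def hoa_channel_count(order):
    """Channels in a Higher Order Ambisonics signal of the given order."""
    return (order + 1) ** 2


def multichannel_count(layout):
    """Sum the dot-separated fields of a layout name such as '22.2' or '5.1.4'."""
    return sum(int(part) for part in layout.split("."))


hoa3_channels = hoa_channel_count(3)        # third-order HOA
mc222_channels = multichannel_count("22.2")  # 22 full-band + 2 LFE channels
```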
- the audio signal comprises audio data and side information.
- a channel group refers to a group including at least two channels.
- the encoder encodes the channels in units of channel groups to obtain an audio stream.
- the name of the channel group is not limited, and may be, for example, a channel group, a channel group category, etc.
- there are N first channel groups and M second channel groups in the at least one channel group, where each first channel group includes three channels, each second channel group includes two channels, N is 1, and M is a non-negative integer.
- N may also be 0, that is, when a plurality of channels are grouped, no channel group including three channels is obtained, that is, 0 first channel groups.
- M should not be greater than half of the number of channels.
- M is greater than or equal to 0 and less than or equal to P/2, where P is a positive integer.
- before the channels are grouped, the signals included in the channels need to be pre-processed; the channels are grouped after the pre-processing is completed.
- the method of preprocessing the signal included in the channel includes at least one of the following: transient detection, window type judgment, time-frequency transform, frequency domain noise shaping, time domain noise shaping, and band extension coding.
- grouping the multiple channels to obtain at least one channel group includes: obtaining similarity between any two channels in the multiple channels, and grouping the multiple channels based on the similarity between any two channels to obtain at least one channel group.
- the encoder may obtain the similarity between every two channels, and then group the multiple channels according to the magnitude relationship of the similarity between every two channels to obtain at least one channel group.
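The text does not say how the similarity between two channels is measured. As a minimal sketch, and purely as an assumption, the example below uses the absolute normalized correlation at lag zero between the sample sequences of two channels; the function name `pairwise_similarity` is hypothetical.

```python
import math
from itertools import combinations


def pairwise_similarity(channels):
    """Return {(i, j): similarity} for every pair of channel indices i < j.

    Assumption: similarity is the absolute normalized correlation at lag zero.
    The source does not specify the measure.
    """
    sims = {}
    for i, j in combinations(range(len(channels)), 2):
        x, y = channels[i], channels[j]
        dot = sum(a * b for a, b in zip(x, y))
        norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        # Silent channels (zero energy) are treated as having no similarity.
        sims[(i, j)] = abs(dot) / norm if norm > 0 else 0.0
    return sims
```

Any other symmetric measure with larger values meaning "more alike" could be substituted without changing the grouping logic that consumes these values.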
- a plurality of channels are grouped based on similarity between any two channels to obtain at least one channel grouping, including: for any two channels among the plurality of channels, two channels with the greatest similarity are determined as a candidate channel grouping, and among channels other than the two channels with the greatest similarity, the two channels with the greatest similarity are determined as a candidate channel grouping until one independent channel remains or no channels remain, and at least one channel grouping is determined based on the obtained candidate channel grouping and/or independent channels.
- for example, if the multiple channels are channel 1, channel 2, channel 3, channel 4 and channel 5, the similarity between every two channels is obtained.
- if the similarity between channel 1 and channel 2 is the highest, channel 1 and channel 2 are determined as a candidate channel group; then, with channel 1 and channel 2 excluded, the two channels with the greatest similarity among channel 3, channel 4 and channel 5 are determined as a candidate channel group.
- if those two channels are channel 3 and channel 4, channel 5 remains, and channel 5 is an independent channel.
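The pairing procedure in the five-channel example can be sketched as a greedy loop. This is an illustrative sketch under stated assumptions: the function name `greedy_pairs`, the `frozenset`-keyed similarity table, and the similarity values are hypothetical.

```python
from itertools import combinations


def greedy_pairs(channel_ids, sim):
    """Repeatedly pair the two most similar remaining channels.

    sim maps frozenset({a, b}) to a similarity value.
    Returns (candidate_groups, independent_channel_or_None).
    """
    remaining = list(channel_ids)
    groups = []
    while len(remaining) >= 2:
        # Pick the most similar pair among the channels not yet grouped.
        best = max(combinations(remaining, 2), key=lambda p: sim[frozenset(p)])
        groups.append(best)
        remaining = [c for c in remaining if c not in best]
    independent = remaining[0] if remaining else None
    return groups, independent


# Hypothetical similarities for channels 1 to 5.
ids = [1, 2, 3, 4, 5]
sim = {frozenset(p): 0.2 for p in combinations(ids, 2)}
sim[frozenset((1, 2))] = 0.95
sim[frozenset((3, 4))] = 0.90
groups, independent = greedy_pairs(ids, sim)
```

With an odd number of channels the loop leaves exactly one independent channel; with an even number it leaves none.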
- At least one channel grouping is determined based on the obtained candidate channel groups and/or independent channels, including: obtaining the similarity between the independent channel and each channel in each candidate channel group, and when the similarity between the independent channel and each channel in the first candidate channel group is greater than a similarity threshold, determining the independent channel and each channel in the first candidate channel group as a first channel group, and determining each remaining candidate channel group as a second channel group.
- when the similarity between an independent channel and each channel in the first candidate channel group is greater than the similarity threshold, the independent channel is similar to both channels in the first candidate channel group.
- the independent channel can be divided into the first candidate channel group.
- the first candidate channel group includes three channels, that is, the first channel group, and the remaining channel group still includes two channels, that is, the second channel group.
- the similarity between the independent channel and each channel in each candidate channel group is obtained, and when the similarity between the independent channel and each channel in the first candidate channel group is greater than a similarity threshold, the independent channel and each channel in the first candidate channel group are determined as a first channel group, and each of the remaining candidate channel groups are respectively determined as a second channel group.
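The threshold test for merging the independent channel into a candidate channel group can be sketched as follows. The function name `assign_independent`, the similarity table layout, and the numeric values are hypothetical. Two further assumptions are made: the first qualifying candidate group is taken, and when no group qualifies the sketch returns no first channel group and leaves the independent channel ungrouped.

```python
def assign_independent(groups, independent, sim, threshold):
    """Merge the independent channel into a qualifying candidate group.

    groups: list of two-channel candidate groups (tuples).
    sim maps frozenset({a, b}) to a similarity value.
    Returns (first_channel_groups, second_channel_groups).
    """
    for idx, (a, b) in enumerate(groups):
        # The independent channel must be similar to BOTH channels of the group.
        if (sim[frozenset((independent, a))] > threshold
                and sim[frozenset((independent, b))] > threshold):
            first = [(a, b, independent)]
            second = groups[:idx] + groups[idx + 1:]
            return first, second
    # Assumption: no qualifying group means N = 0 and only two-channel groups.
    return [], list(groups)
```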
- the embodiment of the present disclosure is described by taking the example of obtaining independent channels.
- the two channels with the greatest similarity are determined as a candidate channel group, and among the channels other than the two channels with the greatest similarity, the two channels with the greatest similarity are determined as a candidate channel group until no channels remain, and at least one channel group is determined based on the obtained candidate channel group.
- for any two channels among the multiple channels, if the similarity between the two channels is not greater than the similarity threshold, the two channels are not grouped.
- the similarity between an independent channel and two channels in multiple candidate channel groups may be greater than the similarity threshold. In this case, it is necessary to additionally determine into which candidate channel group the independent channel is to be divided.
- the sum of similarities between the independent channel and two channels in each candidate channel group is obtained, the largest sum is selected from the sums corresponding to each candidate channel group, and the independent channel is divided into the channel group corresponding to the largest sum.
- the independent channel is channel 5
- the first candidate channel group includes channel 1 and channel 2
- the second candidate channel group includes channel 3 and channel 4. If the sum of the similarities between channel 5 and channel 1 and channel 2 is 1.8, and the sum of the similarities between channel 5 and channel 3 and channel 4 is 1.9, channel 5 is divided into the second candidate channel group.
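The largest-sum rule can be sketched in a few lines. The function name `pick_group_by_sum` is hypothetical, and because the example gives only the sums 1.8 and 1.9, the individual similarity values below are an assumed split chosen to produce those sums.

```python
def pick_group_by_sum(groups, independent, sim):
    """Choose the candidate group with the largest summed similarity
    between the independent channel and the group's two channels."""
    return max(
        groups,
        key=lambda g: sum(sim[frozenset((independent, c))] for c in g),
    )


# Assumed per-channel similarities: 0.9 + 0.9 = 1.8 and 0.95 + 0.95 = 1.9.
sim = {
    frozenset((5, 1)): 0.9, frozenset((5, 2)): 0.9,
    frozenset((5, 3)): 0.95, frozenset((5, 4)): 0.95,
}
chosen = pick_group_by_sum([(1, 2), (3, 4)], 5, sim)
```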
- the candidate channel group corresponding to the greatest similarity is determined, and the independent channel is divided into the channel group corresponding to the greatest similarity.
- the independent channel is channel 5
- the first candidate channel group includes channel 1 and channel 2
- the second candidate channel group includes channel 3 and channel 4. If the similarity between channel 5 and channel 1 is 0.7, the similarity between channel 5 and channel 2 is 0.9, the similarity between channel 5 and channel 3 is 0.8, and the similarity between channel 5 and channel 4 is 0.6, then channel 5 is divided into the first candidate channel group.
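The greatest-single-similarity rule differs from the sum rule only in the scoring function. The sketch below uses the four similarity values from the example; the function name `pick_group_by_max` and the similarity table layout are hypothetical.

```python
def pick_group_by_max(groups, independent, sim):
    """Choose the candidate group containing the channel that is most
    similar to the independent channel."""
    return max(
        groups,
        key=lambda g: max(sim[frozenset((independent, c))] for c in g),
    )


# Similarities between channel 5 and channels 1 to 4, as in the example.
sim = {
    frozenset((5, 1)): 0.7, frozenset((5, 2)): 0.9,
    frozenset((5, 3)): 0.8, frozenset((5, 4)): 0.6,
}
chosen = pick_group_by_max([(1, 2), (3, 4)], 5, sim)
```

Here the greatest similarity, 0.9, is between channel 5 and channel 2, so the group containing channel 2 is selected.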
- one of the candidate channel groups is randomly selected and the independent channel is divided into the selected channel group.
- grouping in the embodiment of the present disclosure includes two grouping schemes. Each grouping scheme is described below.
- multiple channels are grouped to directly obtain N first channel groups and M second channel groups.
- the N first channel groups and M second channel groups obtained by grouping are directly output, and no other grouping results are output during the process.
- grouping the multiple channels to directly obtain N first channel groups and M second channel groups includes: performing a global search on the multiple channels and grouping the multiple channels according to the similarity between any two channels to obtain N first channel groups and M second channel groups.
- the similarity between any two of every three channels in the multiple channels is obtained, and if the similarity between any two of every three channels is greater than a similarity threshold, the three channels are divided into a candidate channel group, and if a candidate channel group is finally obtained, the candidate channel group is determined as the first channel group. If multiple candidate channel groups are finally obtained, the first channel group is determined from the multiple candidate channel groups according to the similarity relationship of each candidate channel group.
- determining a first channel group from the multiple candidate channel groups according to the similarity relationship of each candidate channel group includes: obtaining the sum of the similarities between every two channels in each candidate channel group, selecting the largest sum from the sums corresponding to each candidate channel group, and determining the channel group corresponding to the largest sum as the first channel group.
- the first candidate channel grouping includes channel 1, channel 2, and channel 3, and the second candidate channel grouping includes channel 4, channel 5, and channel 6. If the sum of the similarities between each two channels in channel 1, channel 2, and channel 3 is 2.7, and the sum of the similarities between channel 4, channel 5, and channel 6 is 2.6, then the first candidate channel grouping is determined to be the first channel grouping.
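A minimal sketch of the global search for the three-channel group, assuming an exhaustive scan over all triples. The function name `best_triple`, the threshold of 0.5, and the individual similarity values are assumptions; only the sums 2.7 and 2.6 come from the example.

```python
from itertools import combinations

def best_triple(channels, sim, threshold):
    """Pick the triple whose pairwise similarities all exceed the threshold
    and whose summed pairwise similarity is largest."""
    best, best_sum = None, float("-inf")
    for triple in combinations(channels, 3):
        pair_sims = [sim.get(frozenset(p), 0.0) for p in combinations(triple, 2)]
        if all(s > threshold for s in pair_sims) and sum(pair_sims) > best_sum:
            best, best_sum = triple, sum(pair_sims)
    return best, best_sum

# Hypothetical values: channels 1-3 sum to 2.7, channels 4-6 sum to 2.6.
sim = {
    frozenset((1, 2)): 0.9, frozenset((1, 3)): 0.9, frozenset((2, 3)): 0.9,
    frozenset((4, 5)): 0.9, frozenset((4, 6)): 0.9, frozenset((5, 6)): 0.8,
}
first_group, total = best_triple(range(1, 7), sim, threshold=0.5)
```

Both triples qualify as candidates, and the one with the larger sum (channels 1, 2, and 3) becomes the first channel group.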
- a first channel group is determined from the multiple candidate channel groups, including: determining a candidate channel group corresponding to the greatest similarity between the independent channel and every two channels in each candidate channel group, and determining the candidate channel group corresponding to the greatest similarity as the first channel group.
- the process of grouping multiple channels is similar to the process of grouping based on similarity in the above embodiments, and will not be described in detail here.
- multiple channels are grouped to obtain multiple candidate channel groups including two channels and at least one independent channel, and an independent channel of the at least one independent channel is divided into one candidate channel group in the multiple candidate channel groups to obtain N first channel groups and M second channel groups.
- the number of channel groups including two channels may also be referred to as the number of second channel groups, or the number of channel group pairs, or the number of groups, etc., which is not limited in the embodiments of the present disclosure.
- the channel identifiers of the channels included in the channel group including two channels are used to indicate the index of the channel pair, and the index values of the two channels in the current channel pair can be obtained by parsing.
- the number of channel groups including three channels may also be referred to as the number of first channel groups, or the number of channel group pairs, or the number of groups, etc., which is not limited in the embodiments of the present disclosure.
- the number of channel groups including three channels is used to represent the number of first channel group pairs of the current frame.
- the channel identifiers of the channels included in the channel group including three channels are used to indicate the index of the channel pair, and the index values of the three channels in the current channel pair can be obtained by parsing.
- the energy parameter is used to adjust the energy of the channels in the channel grouping.
- the energy parameter is used to quantize an index of an inter-channel amplitude difference ILD parameter between a first channel and a second channel in a current channel pair for inter-channel energy/amplitude adjustment.
- a plurality of channels are grouped to directly obtain N first channel groups and M second channel groups, and channel information of the first channel groups and the second channel groups is generated.
- a plurality of channels are grouped to obtain a plurality of candidate channel groups including two channels and at least one independent channel, channel information of the plurality of candidate channel groups and the at least one independent channel is generated, an independent channel of the at least one independent channel is divided into one candidate channel group of the plurality of candidate channel groups to obtain N first channel groups and M second channel groups, and the channel information of the plurality of candidate channel groups and the at least one independent channel is rewritten to obtain channel information of the first channel group and the second channel group.
- multiple channels are grouped to obtain M+1 candidate channel groups including two channels and at least one independent channel, channel information of the multiple candidate channel groups and the at least one independent channel is generated, and an independent channel of the at least one independent channel is divided into one candidate channel group in the multiple candidate channel groups to obtain N first channel groups and M second channel groups.
- the method of grouping multiple channels to obtain the first channel grouping can be called 3-channel sum and difference coding.
- the method of grouping multiple channels to obtain the second channel grouping can be called 2-channel sum and difference coding.
- Step S2102 the encoder downmixes the audio signal in each channel group to obtain a downmixed audio signal.
- downmixing refers to mixing the grouped channels using an orthogonal normalization matrix to obtain a mixed channel for each channel.
- the orthogonal normalized matrix is preset, and the embodiment of the present disclosure does not limit the orthogonal normalized matrix.
- for a channel group including two channels, the orthogonal normalization matrix used is $M_{2ch}^{ms}$, in which the first row is the sum vector and the second row is the difference vector.
- for a channel group including three channels, the orthogonal normalization matrix used is $M_{3ch}^{ms}$, in which the first row is the sum vector, and the second and third rows are the difference vectors.
- $O_{3ch}^{ms} = M_{3ch}^{ms} \cdot I_{3ch}^{ms}$
- $O_{2ch}^{ms} = M_{2ch}^{ms} \cdot I_{2ch}^{ms}$
- $I_{3ch}^{ms}$ is a 1x3 column vector that holds the audio data in the channel group including three channels
- $I_{2ch}^{ms}$ is a 1x2 column vector that holds the audio data in the channel group including two channels.
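The downmix can be illustrated with one possible pair of orthonormal matrices. The entries below are assumptions, since the matrices used in the disclosure are not reproduced in this text; they are chosen only to satisfy the stated properties (orthonormal rows, a sum vector in the first row, difference vectors in the remaining rows).

```python
import math

R2, R3, R6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

# Hypothetical 2x2 matrix: sum row, then difference row.
M_2CH = [[1 / R2, 1 / R2],
         [1 / R2, -1 / R2]]

# Hypothetical 3x3 matrix: sum row, then two difference rows.
M_3CH = [[1 / R3, 1 / R3, 1 / R3],
         [1 / R2, -1 / R2, 0.0],
         [1 / R6, 1 / R6, -2 / R6]]

def matvec(matrix, vec):
    """Multiply a matrix by a vector holding one sample per channel."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def downmix(matrix, samples):
    return matvec(matrix, samples)

def upmix(matrix, mixed):
    # For an orthonormal matrix the inverse is the transpose.
    transposed = [list(col) for col in zip(*matrix)]
    return matvec(transposed, mixed)

samples = [0.5, 0.25, -0.125]      # one sample from each of three channels
mixed = downmix(M_3CH, samples)    # sum channel followed by two difference channels
restored = upmix(M_3CH, mixed)
```

Because the rows are orthonormal, the transpose undoes the downmix, so `restored` matches `samples` up to floating-point rounding. Two identical input channels yield a zero difference channel, which is where the redundancy reduction comes from.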
- Step S2103 the encoder encodes the downmixed audio signal to obtain an audio stream.
- encoding includes bit allocation, quantization, entropy coding, and bit stream multiplexing.
- the decision module (step S2101) is executed to determine which sum and difference coding method or a combination of the two is used.
- the channels include an L channel, an R channel, a C channel, an LS channel and an RS channel, wherein the similarity between any two channels is shown in Table 1.
- the downmix threshold is 0.5.
- the decision module obtains the sum-difference coding method to be used through a certain algorithm.
- 3CH M/S refers to a module in which three channels are divided into the first channel grouping
- 2CH M/S refers to a module in which two channels are divided into the second channel grouping.
- the two-channel downmix matrix $M_{2ch}^{ms}$ is a 2x2 orthogonal normalized matrix, in which the first row is a sum vector and the second row is a difference vector.
- the three-channel downmix matrix $M_{3ch}^{ms}$ is a 3x3 orthogonal normalized matrix, in which the first row is a sum vector and the second and third rows are difference vectors.
- $I_{2ch}^{ms}$ is a 1x2 column vector
- $I_{3ch}^{ms}$ is a 1x3 column vector.
- the data contained in the vector is the pre-processed audio data, in units of sampling points or frequency points.
- two-channel pairing decision is performed to obtain two-channel pairing information, including the number of pairs and the channel pair index.
- the channels include an L channel, an R channel, a C channel, an LS channel and an RS channel, wherein the similarity between any two channels is as shown in Table 1, and the downmix threshold is 0.5.
- the number of the second channel groups finally obtained is 2, including the first channel group L channel and R channel, and the second channel group LS channel and RS channel.
- a three-channel pairing judgment is executed. If a three-channel pairing is generated, the above channel grouping result will be changed.
- the similarity between each channel that is not downmixed (here, the C channel) and the two channels of the first channel pair, and between that channel and the two channels of the second channel pair, is calculated.
- the judgment result of 3CH M/S outputs L channel, R channel, and C channel.
- 1 channel group is output, that is, LS channel and RS channel are output.
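The two-stage decision in this example can be sketched end to end. Table 1 is not reproduced in this text, so the similarity values below are hypothetical and chosen only to reproduce the stated outcome; the function and variable names are also assumptions. The sum-based rule is used for the three-channel judgment.

```python
from itertools import combinations

THRESHOLD = 0.5  # downmix threshold from the example

# Hypothetical stand-in for Table 1; unlisted pairs default to 0.1.
SIM = {frozenset(k): v for k, v in {
    ("L", "R"): 0.9, ("LS", "RS"): 0.8, ("L", "C"): 0.7,
    ("R", "C"): 0.6, ("C", "LS"): 0.3, ("C", "RS"): 0.2,
}.items()}

def sim(a, b):
    return SIM.get(frozenset((a, b)), 0.1)

def decide(channels):
    # Stage 1: two-channel pairing decision (greedy, most similar pair first).
    remaining, groups = list(channels), []
    while len(remaining) >= 2:
        a, b = max(combinations(remaining, 2), key=lambda p: sim(*p))
        if sim(a, b) <= THRESHOLD:
            break
        groups.append([a, b])
        remaining = [c for c in remaining if c not in (a, b)]
    # Stage 2: three-channel judgment for the channels left ungrouped.
    for ch in list(remaining):
        candidates = [g for g in groups
                      if all(sim(ch, m) > THRESHOLD for m in g)]
        if candidates:
            target = max(candidates, key=lambda g: sum(sim(ch, m) for m in g))
            target.append(ch)
            remaining.remove(ch)
            break  # the embodiments describe a single three-channel group (N is 1)
    first = [tuple(g) for g in groups if len(g) == 3]
    second = [tuple(g) for g in groups if len(g) == 2]
    return first, second, remaining

first, second, ungrouped = decide(["L", "R", "C", "LS", "RS"])
```

With these values the pairing stage yields (L, R) and (LS, RS), and the three-channel judgment then moves C into the (L, R) group, leaving one two-channel group.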
- the above channel information is rewritten to obtain new channel information.
- the encoder obtains an audio stream by executing the above steps.
- the channel signal is preprocessed, the preprocessed channel signal is then downmixed between channels, then bit allocation, quantization, and entropy coding are performed, and finally bit stream multiplexing is performed to obtain an audio stream.
- the preprocessing includes transient detection, window type decision, time-frequency transformation, frequency domain noise shaping, time domain noise shaping, band extension coding, and other processes.
- Inter-channel group downmixing includes three-channel group downmixing of L channel, R channel and C channel to obtain M1 channel, S11 channel and S12 channel; and two-channel group downmixing of LS channel and RS channel to obtain M2 channel and S2 channel, and LFE channel is not processed.
- Step S2104 the encoder sends the first information and the audio stream.
- a decoder receives first information and an audio stream.
- the encoder may send the first information and the audio stream separately. For example, the encoder sends the first information first and then sends the audio stream. Alternatively, the encoder sends the audio stream first and then sends the first information. In some embodiments, the encoder may send the first information and the audio stream simultaneously.
- the channel information includes at least one of the following:
- channel identifiers of channels included in a channel group including two channels
- An energy parameter of a channel grouping comprising two channels
- channel identifiers of channels included in a channel group including three channels
- An energy parameter of a channel grouping comprising three channels
- the energy parameter is used to adjust the energy of the channels in the channel grouping.
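One way to hold the channel information listed above is a small record per channel group. This is a sketch only: the class names, the field names, and the representation of the energy parameter as an integer index are assumptions and are not specified by the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GroupInfo:
    channel_ids: Tuple[int, ...]  # two or three channel identifiers
    energy_param: int = 0         # hypothetical quantized energy index

@dataclass
class ChannelInfo:
    groups: List[GroupInfo] = field(default_factory=list)

    def count(self, size: int) -> int:
        """Number of channel groups containing `size` channels."""
        return sum(1 for g in self.groups if len(g.channel_ids) == size)

    def has_three_channel_group(self) -> bool:
        return self.count(3) > 0

info = ChannelInfo([
    GroupInfo(channel_ids=(0, 1, 2), energy_param=3),  # e.g. L, R, C
    GroupInfo(channel_ids=(3, 4), energy_param=1),     # e.g. LS, RS
])
```

A decoder holding such a record can read off the number of two-channel and three-channel groups directly from it.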
- Step S2105 The decoder decodes the first information to obtain channel information of at least one channel group.
- Step S2106 When the decoder determines that the channel grouping does not include a channel grouping with three channels, the decoder performs two-channel upmixing on the audio stream; when the decoder determines that the channel grouping includes a channel grouping with three channels, the decoder performs three-channel upmixing on the audio stream.
- the decoder determines whether the encoder has performed a two-channel downmix and a three-channel downmix. If the encoder has performed a two-channel downmix, the decoder performs a two-channel upmix. If the encoder has performed a three-channel downmix, the decoder performs a three-channel upmix.
- audio post-processing includes but is not limited to general decoding processes, such as time-frequency inverse transform, time-domain noise shaping inverse transform, frequency-domain noise shaping inverse transform, frequency band extension inverse transform and other modules, and also includes decoding processing for certain types of signal characteristics, such as multi-channel decoding processing, HOA channel decoding processing, object metadata decoding processing, etc.
- the decoder obtains each channel signal by performing the above steps. Referring to FIG. 2E, after obtaining the audio stream, the decoder demultiplexes it and then performs bit allocation, inverse quantization and entropy decoding, inter-channel upmixing, and post-processing to obtain the decoded channel signals.
- post-processing includes frequency band extension decoding, inverse time domain noise shaping, inverse frequency domain noise shaping, inverse time-frequency transformation, etc.
- the inter-channel group upmixing includes three-channel group upmixing of the M1 channel, S11 channel, and S12 channel to obtain the L channel, R channel, and C channel; and two-channel group upmixing of the M2 channel and S2 channel to obtain the LS channel and RS channel; the LFE channel is not processed.
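The decoder-side dispatch can be sketched as follows: each channel group is upmixed with the inverse of the matrix matching its size, and channels outside any group pass through unchanged. The matrix entries, the function names, and the sample values are assumptions; the disclosure's own matrices are not reproduced in this text.

```python
import math

R2, R3, R6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

# Hypothetical orthonormal downmix matrices, keyed by group size.
MATRICES = {
    2: [[1 / R2, 1 / R2], [1 / R2, -1 / R2]],
    3: [[1 / R3, 1 / R3, 1 / R3], [1 / R2, -1 / R2, 0.0], [1 / R6, 1 / R6, -2 / R6]],
}

def apply(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def upmix_frame(groups, downmixed):
    """groups: list of (output_names, input_names); downmixed: name -> sample."""
    out, used = {}, set()
    for outputs, inputs in groups:
        matrix = MATRICES[len(inputs)]              # two- or three-channel upmix
        inverse = [list(col) for col in zip(*matrix)]  # transpose of an orthonormal matrix
        out.update(zip(outputs, apply(inverse, [downmixed[n] for n in inputs])))
        used.update(inputs)
    # Channels that were not downmixed (e.g. LFE) are copied through.
    out.update({n: v for n, v in downmixed.items() if n not in used})
    return out

# Encoder side, for illustration only (hypothetical sample values).
frame = {"L": 0.3, "R": 0.2, "C": 0.1, "LS": -0.4, "RS": -0.3, "LFE": 0.05}
m1, s11, s12 = apply(MATRICES[3], [frame["L"], frame["R"], frame["C"]])
m2, s2 = apply(MATRICES[2], [frame["LS"], frame["RS"]])
downmixed = {"M1": m1, "S11": s11, "S12": s12, "M2": m2, "S2": s2, "LFE": frame["LFE"]}

groups = [(("L", "R", "C"), ("M1", "S11", "S12")), (("LS", "RS"), ("M2", "S2"))]
decoded = upmix_frame(groups, downmixed)
```

In this sketch quantization is omitted, so `decoded` reproduces `frame` up to floating-point rounding; in a real codec the upmix operates on dequantized values.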
- the names of information, etc. are not limited to the names recorded in the embodiments, and terms such as “information”, “message”, “signal”, “signaling”, “report”, “configuration”, “indication”, “instruction”, “command”, “channel”, “parameter”, “domain”, “field”, “symbol”, “code element”, “codebook”, “codeword”, “codepoint”, “bit”, “data”, “program”, and “chip” can be used interchangeably.
- terms such as “uplink” and “physical uplink” can be interchangeable, terms such as “downlink” and “physical downlink” can be interchangeable, and terms such as “side”, “sidelink”, “side communication”, “sidelink communication”, “direct connection”, “direct link”, “direct communication”, and “direct link communication” can be interchangeable.
- terms such as “obtain” can be interchangeable, and can be interpreted as receiving from other entities, obtaining from protocols, obtaining from higher layers, obtaining by self-processing, autonomous implementation, etc.
- terms such as “moment”, “time point”, “time”, and “time position” can be interchangeable, and terms such as “duration”, “period”, “time window”, “window”, and “time” can be interchangeable.
- terms such as “certain”, “preset”, “set”, “indicated”, “some”, “any”, and “first” can be interchangeable, and “specific A”, “preset A”, “set A”, “indicated A”, “some A”, “any A”, and “first A” can be interpreted as A pre-defined in a protocol, etc., or as A obtained through setting, configuration, or indication, etc., and can also be interpreted as specific A, some A, any A, or first A, etc., but is not limited to this.
- step S2101 may be implemented as an independent embodiment
- step S2102 may be implemented as an independent embodiment
- step S2103 may be implemented as an independent embodiment
- step S2104 may be implemented as an independent embodiment
- step S2105 may be implemented as an independent embodiment
- step S2106 may be implemented as an independent embodiment
- step S2101 and step S2102 may be implemented as independent embodiments
- step S2103 and step S2104 can be implemented as independent embodiments
- step S2105 and step S2106 can be implemented as independent embodiments
- step S2101, step S2102, step S2103, and step S2104 can be implemented as independent embodiments
- step S2101 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
- step S2102 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
- step S2103 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
- step S2104 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
- step S2105 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
- step S2106 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
- step S2101 and step S2102 are optional, and one or more of these steps may be omitted or replaced in different embodiments.
- step S2103 and step S2104 are optional, and one or more of these steps may be omitted or replaced in different embodiments.
- step S2105 and step S2106 are optional, and one or more of these steps may be omitted or replaced in different embodiments.
- FIG. 3A is a flow chart of a grouping method according to an embodiment of the present disclosure, which is applied to an encoder. As shown in FIG. 3A, an embodiment of the present disclosure relates to a grouping method, which includes:
- Step S3101 The encoder groups multiple channels to obtain at least one channel group.
- step S3101 can refer to the optional implementation of step S2101 in FIG. 2 and other related parts in the embodiment involved in FIG. 2, which will not be described in detail here.
- Step S3102 the encoder downmixes the audio signal in each channel group to obtain a downmixed audio signal.
- Step S3103 the encoder encodes the downmixed audio signal to obtain an audio stream.
- step S3103 can refer to the optional implementation of step S2103 in FIG. 2 and other related parts in the embodiment involved in FIG. 2, which will not be described in detail here.
- Step S3104 the encoder sends the first information and the audio stream.
- step S3104 can refer to the optional implementation of step S2104 in FIG. 2 and other related parts in the embodiment involved in FIG. 2, which will not be described in detail here.
- the grouping method involved in the embodiment of the present disclosure may include at least one of steps S3101 to S3104.
- step S3101 may be implemented as an independent embodiment
- step S3102 may be implemented as an independent embodiment
- step S3103 may be implemented as an independent embodiment
- step S3104 may be implemented as an independent embodiment, or at least two steps may be combined, but are not limited thereto.
- step S3101 is optional
- step S3102 is optional
- step S3103 is optional
- step S3104 is optional.
- one or more of these steps may be omitted or replaced, but the present invention is not limited thereto.
- FIG. 3B is a flow chart of a grouping method according to an embodiment of the present disclosure, which is applied to an encoder. As shown in FIG. 3B, an embodiment of the present disclosure relates to a grouping method, which includes:
- step S3201 can refer to step S2101 of FIG. 2, step S3101 of FIG. 3A, and other related parts of the embodiments involved in FIG. 2 and FIG. 3A, which will not be described in detail here.
- FIG. 4A is a flow chart of a grouping method according to an embodiment of the present disclosure, which is applied to a decoder. As shown in FIG. 4A, an embodiment of the present disclosure relates to a grouping method, and the method includes:
- Step S4101 The decoder decodes the first information to obtain channel information of at least one channel group.
- step S4101 can refer to step S2105 of FIG. 2 and other related parts of the embodiment involved in FIG. 2, which will not be described in detail here.
- Step S4102 When the decoder determines that the channel grouping does not include a channel grouping with three channels, it performs two-channel upmixing on the audio stream; when the decoder determines that the channel grouping includes a channel grouping with three channels, it performs three-channel upmixing on the audio stream.
- step S4102 can refer to step S2106 of FIG. 2 and other related parts of the embodiment involved in FIG. 2, which will not be described in detail here.
- step S4101 may be implemented as an independent embodiment
- step S4102 may be implemented as an independent embodiment
- at least two steps may be combined, but not limited thereto.
- step S4101 is optional
- step S4102 is optional
- one or more of these steps may be omitted or replaced in different embodiments, but the present invention is not limited thereto.
- FIG. 4B is a flow chart of a grouping method according to an embodiment of the present disclosure, which is applied to a decoder. As shown in FIG. 4B, an embodiment of the present disclosure relates to a grouping method, and the method includes:
- Step S4201 The decoder decodes the first information to obtain channel information of at least one channel group.
- step S4201 can refer to step S2105 of FIG. 2 and other related parts of the embodiment involved in FIG. 2, which will not be described in detail here.
- there are N first channel groups in the at least one channel group, each first channel group including three channels, and there are M second channel groups in the at least one channel group, each second channel group including two channels, wherein N is 1 and M is a non-negative integer.
- the channel information includes at least one of the following:
- channel identifiers of channels included in a channel group including two channels
- An energy parameter of a channel grouping comprising two channels
- channel identifiers of channels included in a channel group including three channels
- An energy parameter of a channel grouping comprising three channels
- the energy parameter is used to adjust the energy of the channels in the channel group.
- the method further comprises: when the channel groups do not include a channel group with three channels, performing two-channel upmixing on the audio stream.
- the method further comprises: when the channel groups include a channel group with three channels, performing three-channel upmixing on the audio stream.
- FIG. 5 is a flow chart of a grouping method according to an embodiment of the present disclosure. As shown in FIG. 5, the embodiment of the present disclosure relates to a grouping method, and the method includes:
- Step S5101 the encoder groups multiple channels to obtain at least one channel group.
- there are N first channel groups in the at least one channel group, each first channel group including three channels, and there are M second channel groups in the at least one channel group, each second channel group including two channels, wherein N is 1 and M is a non-negative integer.
- Step S5102 The decoder decodes the first information to obtain channel information of at least one channel group.
- step S5101 can refer to step S2101 in FIG. 2, step S3101 in FIG. 3A, and other related parts in the embodiments involved in FIG. 2 and FIG. 3A, which will not be described in detail here.
- step S5102 can refer to step S2105 of FIG. 2, step S4101 of FIG. 4A, and other related parts of the embodiments involved in FIG. 2 and FIG. 4A, which will not be described in detail here.
- the above method may include the method of the above-mentioned embodiments of the coding and decoding system side, encoder side, decoder side, etc., which will not be repeated here.
- FIG. 6 is a flow chart of a grouping method according to an embodiment of the present disclosure. As shown in FIG. 6, the embodiment of the present disclosure relates to a grouping method, and the method includes:
- Step S6101 the encoding end encodes multiple channels by combining two-channel sum and difference encoding and three-channel sum and difference encoding.
- audio preprocessing includes but is not limited to general encoding processes, such as transient analysis, time-frequency transformation, time-domain noise shaping, frequency-domain noise shaping, and frequency band extension. It also includes processing for certain types of signal characteristics, such as multi-channel encoding, HOA channel encoding, object metadata encoding, etc.
- the inter-channel group downmix module introduces a three-channel sum-difference coding method and combines it with a two-channel sum-difference coding framework.
- the module includes a decision module and a downmix module.
- the decision module is used to determine which sum and difference coding method or combination to use.
- the decision criterion is the correlation between channels, which is compared with the group pair threshold.
- the decision result is that the L, R, and C channels are 3-channel sum and difference coded, the LS and RS channels are 2-channel sum and difference coded, and the LFE channel is not processed.
- the downmix module performs 3-channel sum and difference encoding on the L, R, and C channels, and 2-channel sum and difference encoding on the LS and RS channels.
- after the inter-channel group downmixing module, the channels downmixed by three-channel sum and difference coding (M1 channel, S11 channel, S12 channel), the channels downmixed by two-channel sum and difference coding (M2 channel, S2 channel), and the channel that is not downmixed (LFE channel) all pass through the bit allocation, quantization, and entropy coding module, and the resulting bit stream is multiplexed to form the coded bit stream E.
- part or all of the steps and their optional implementations may be arbitrarily combined with part or all of the steps in other embodiments, or may be arbitrarily combined with optional implementations of other embodiments.
- the embodiments of the present disclosure also propose a device for implementing any of the above methods; for example, a device is proposed that includes a unit or module for implementing each step performed by the encoder in any of the above methods.
- a device is also proposed, including a unit or module for implementing each step performed by the decoder in any of the above methods.
- the division of the units or modules in the above device is only a division of logical functions, which can be fully or partially integrated into one physical entity or physically separated in actual implementation.
- the units or modules in the device can be implemented in the form of a processor calling software: for example, the device includes a processor, the processor is connected to a memory, and instructions are stored in the memory.
- the processor calls the instructions stored in the memory to implement any of the above methods or implement the functions of the units or modules of the above device, wherein the processor is, for example, a general-purpose processor, such as a central processing unit (CPU) or a microprocessor, and the memory is a memory inside the device or a memory outside the device.
- the units or modules in the device may be implemented in the form of hardware circuits, and the functions of some or all of the units or modules may be implemented by designing the hardware circuits.
- the hardware circuits may be understood as one or more processors. For example, in one implementation, the hardware circuits are application-specific integrated circuits (ASICs), and the functions of some or all of the above units or modules may be implemented by designing the logical relationship of the components in the circuits. For another example, in another implementation, the hardware circuits may be implemented by programmable logic devices (PLDs); taking field programmable gate arrays (FPGAs) as an example, they may include a large number of logic gate circuits, and the connection relationship between the logic gate circuits may be configured through configuration files, thereby implementing the functions of some or all of the above units or modules. All units or modules of the above devices may be implemented in the form of software called by the processor, or in the form of hardware circuits, or part of them may be implemented in the form of software called by the processor and the remaining part may be implemented in the form of hardware circuits.
- the processor is a circuit with signal processing capability.
- the processor may be a circuit with instruction reading and execution capability, such as a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU) (which may be understood as a microprocessor), or a digital signal processor (DSP).
- the processor may implement certain functions through the logical relationship of hardware circuits, and the logical relationship of the above hardware circuits may be fixed or reconfigurable, such as a processor being an application-specific integrated circuit (ASIC).
- the process of the processor loading a configuration document to implement the hardware circuit configuration can be understood as the process of the processor loading instructions to implement the functions of some or all of the above units or modules.
- it can also be a hardware circuit designed for artificial intelligence, which can be understood as an ASIC, such as a neural network processing unit (NPU), a tensor processing unit (TPU), a deep learning processing unit (DPU), etc.
- FIG. 7A is a schematic diagram of the structure of the encoding and decoding device proposed in an embodiment of the present disclosure.
- the encoding and decoding device 7100 may include: at least one of a transceiver module 7101, a processing module 7102, etc.
- the processing module 7102 is used to group multiple channels to obtain at least one channel group, wherein there are N first channel groups in the at least one channel group, and the first channel group includes three channels, and there are M second channel groups in the at least one channel group, and the second channel group includes two channels, wherein N is 1 and M is a non-negative integer.
- the above-mentioned transceiver module 7101 is used to perform at least one of the communication steps such as sending and/or receiving performed by the encoder in any of the above methods (for example, step S2104 but not limited thereto), which will not be repeated here.
- the above-mentioned processing module is used to perform at least one of the other steps performed by the encoder in any of the above methods, which will not be repeated here.
- the processing module 7102 is used to execute at least one of the processing steps performed by the encoder in any of the above methods, which will not be repeated here.
- FIG. 7B is a schematic diagram of the structure of the encoding and decoding device proposed in the embodiment of the present disclosure.
- the coding and decoding device 7200 may include: at least one of a transceiver module 7201 and a processing module 7202.
- the processing module 7202 is used to decode the first information to obtain channel information of at least one channel grouping, wherein there are N first channel groups in the at least one channel grouping, and the first channel grouping includes three channels, and there are M second channel groups in the at least one channel grouping, and the second channel grouping includes two channels, wherein N is 1 and M is a non-negative integer.
- the above-mentioned transceiver module is used to execute at least one of the communication steps such as sending and/or receiving (such as step S2102 but not limited thereto) executed by the decoder in any of the above methods, which will not be repeated here.
- the processing module 7202 is used to execute at least one of the processing steps performed by the decoder in any of the above methods, which will not be repeated here.
- the transceiver module may include a sending module and/or a receiving module, and the sending module and the receiving module may be separate or integrated.
- the transceiver module may be interchangeable with the transceiver.
- the processing module can be a module or include multiple submodules.
- the multiple submodules respectively execute all or part of the steps required to be executed by the processing module.
- the processing module can be replaced with the processor.
- FIG. 8A is a schematic diagram of the structure of a communication device 8100 proposed in an embodiment of the present disclosure.
- the communication device 8100 may be a decoder (e.g., an access network device, a core network device, etc.), or an encoder (e.g., a user device, etc.), or a chip, a chip system, or a processor that supports a decoder to implement any of the above methods, or a chip, a chip system, or a processor that supports an encoder to implement any of the above methods.
- the communication device 8100 may be used to implement the method described in the above method embodiment, and the details may refer to the description in the above method embodiment.
- the communication device 8100 includes one or more processors 8101.
- the processor 8101 may be a general-purpose processor or a dedicated processor, such as a baseband processor or a central processing unit.
- the baseband processor may be used to process the communication protocol and the communication data.
- the central processing unit may be used to control the coding and decoding device (such as a base station, a baseband chip, an encoder device, an encoder device chip, a DU or a CU, etc.), execute a program, and process the data of the program.
- the communication device 8100 is used to execute any of the above methods.
- the communication device 8100 further includes one or more memories 8102 for storing instructions.
- the memory 8102 may also be outside the communication device 8100.
- the communication device 8100 further includes one or more transceivers 8103.
- the transceiver 8103 performs at least one of the communication steps such as sending and/or receiving in the above method (for example, step S2101, step S2102, step S2103, step S2104, but not limited thereto).
- the transceiver may include a receiver and/or a transmitter, and the receiver and the transmitter may be separate or integrated.
- terms such as transceiver, transceiver unit, and transceiver circuit may be used interchangeably; terms such as transmitter, transmission unit, and transmission circuit may be used interchangeably; and terms such as receiver, receiving unit, and receiving circuit may be used interchangeably.
- the communication device 8100 may include one or more interface circuits 8104.
- the interface circuit 8104 is connected to the memory 8102, and the interface circuit 8104 may be used to receive signals from the memory 8102 or other devices, and may be used to send signals to the memory 8102 or other devices.
- the interface circuit 8104 may read instructions stored in the memory 8102 and send the instructions to the processor 8101.
- the communication device 8100 described in the above embodiments may be a decoder or an encoder, but the scope of the communication device 8100 described in the present disclosure is not limited thereto, and the structure of the communication device 8100 may not be limited by FIG. 8A.
- the communication device may be an independent device or may be part of a larger device.
- the communication device may be: (1) an independent integrated circuit (IC), a chip, or a chip system or subsystem; (2) a collection of one or more ICs, where optionally the IC collection may also include a storage component for storing data and programs; (3) an ASIC, such as a modem; (4) a module that can be embedded in other devices; (5) a receiver, an encoder device, an intelligent encoder device, a cellular phone, a wireless device, a handheld device, a mobile unit, a vehicle-mounted device, a decoder, a cloud device, an artificial intelligence device, etc.; (6) others.
- FIG. 8B is a schematic diagram of the structure of a chip 8200 provided in an embodiment of the present disclosure.
- the communication device 8100 may be a chip or a chip system.
- the chip 8200 includes one or more processors 8201, and the chip 8200 is used to execute any of the above methods.
- the chip 8200 further includes one or more interface circuits 8202.
- the interface circuit 8202 is connected to the memory 8203.
- the interface circuit 8202 can be used to receive signals from the memory 8203 or other devices, and the interface circuit 8202 can be used to send signals to the memory 8203 or other devices.
- the interface circuit 8202 can read instructions stored in the memory 8203 and send the instructions to the processor 8201.
- the interface circuit 8202 performs at least one of the communication steps such as sending and/or receiving in the above method, and the processor 8201 performs at least one of the other steps.
- the chip 8200 further includes one or more memories 8203 for storing instructions.
- the memory 8203 may be outside the chip 8200.
- the present disclosure also proposes a storage medium, on which instructions are stored, and when the instructions are executed on the communication device 8100, the communication device 8100 executes any of the above methods.
- the storage medium is an electronic storage medium.
- the storage medium is a computer-readable storage medium, but is not limited to this, and it can also be a storage medium readable by other devices.
- the storage medium can be a non-transitory storage medium, but is not limited to this, and it can also be a temporary storage medium.
- the present disclosure also proposes a program product, which, when executed by the communication device 8100, enables the communication device 8100 to execute any of the above methods.
- the program product is a computer program product.
- the present disclosure also proposes a computer program, which, when executed on a computer, causes the computer to execute any one of the above methods.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
The present disclosure relates to the field of multimedia technology, and in particular to a grouping method, an encoder, a decoder, and a storage medium.
With the rapid development of multimedia technology, audio can be applied in various fields. To improve the sense of space and direction of audio, the audio can be three-dimensionally encoded and decoded, so that the audio heard by the user is indistinguishable from the audio heard in the actual environment.
Summary of the invention
The present disclosure solves the problem that a single channel may be left ungrouped when channels are grouped. It provides a way of forming a channel group that includes three channels, so that every channel has a corresponding group. This ensures the accuracy of channel grouping, which in turn reduces redundancy between channels, ensures the accuracy of channel encoding, and thus ensures the accuracy of audio encoding.
The embodiments of the present disclosure provide a grouping method, a device, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, a grouping method is proposed. The method is performed by an encoder and includes:
grouping multiple channels to obtain at least one channel group, wherein the at least one channel group includes N first channel groups, each first channel group including three channels, and M second channel groups, each second channel group including two channels, where N is 1 and M is a non-negative integer.
According to a second aspect of the embodiments of the present disclosure, a grouping method is proposed. The method is performed by a decoder and includes:
decoding first information to obtain channel information of at least one channel group, wherein the at least one channel group includes N first channel groups, each first channel group including three channels, and M second channel groups, each second channel group including two channels, where N is 1 and M is a non-negative integer.
According to a third aspect of the embodiments of the present disclosure, a grouping method is proposed. The method includes:
an encoder grouping multiple channels to obtain at least one channel group, wherein the at least one channel group includes N first channel groups, each first channel group including three channels, and M second channel groups, each second channel group including two channels, where N is 1 and M is a non-negative integer;
a decoder decoding first information to obtain channel information of the at least one channel group.
According to a fourth aspect of the embodiments of the present disclosure, an encoding and decoding device is proposed, including:
a processing module, configured to group multiple channels to obtain at least one channel group, wherein the at least one channel group includes N first channel groups, each first channel group including three channels, and M second channel groups, each second channel group including two channels, where N is 1 and M is a non-negative integer.
According to a fifth aspect of the embodiments of the present disclosure, an encoding and decoding device is proposed, including:
a processing module, configured to decode first information to obtain channel information of at least one channel group, wherein the at least one channel group includes N first channel groups, each first channel group including three channels, and M second channel groups, each second channel group including two channels, where N is 1 and M is a non-negative integer.
According to a sixth aspect of the embodiments of the present disclosure, an encoding and decoding device is proposed, including:
one or more processors;
wherein the encoding and decoding device is configured to execute any method described in the first aspect.
According to a seventh aspect of the embodiments of the present disclosure, an encoding and decoding device is proposed, including:
one or more processors;
wherein the encoding and decoding device is configured to execute any method described in the second aspect.
According to an eighth aspect of the embodiments of the present disclosure, an encoding and decoding system is proposed, including:
an encoder and a decoder, wherein the encoder is configured to implement the grouping method described in the first aspect, and the decoder is configured to implement the grouping method described in the second aspect.
According to a ninth aspect of the embodiments of the present disclosure, a storage medium is proposed. The storage medium stores instructions which, when executed on a communication device, cause the communication device to execute the method described in any one of the first aspect or the second aspect.
The drawings described herein provide a further understanding of the embodiments of the present disclosure and constitute a part of the present disclosure. The illustrative embodiments and their descriptions are used to explain the embodiments of the present disclosure and do not constitute an improper limitation on them. In the drawings:
FIG. 1 is a schematic diagram of the architecture of an encoding and decoding system according to an embodiment of the present disclosure;
FIG. 2A is an interaction diagram of a grouping method according to an embodiment of the present disclosure;
FIG. 2B is an interaction diagram of a grouping method according to an embodiment of the present disclosure;
FIG. 2C is an interaction diagram of a grouping method according to an embodiment of the present disclosure;
FIG. 2D is an interaction diagram of a grouping method according to an embodiment of the present disclosure;
FIG. 2E is an interaction diagram of a grouping method according to an embodiment of the present disclosure;
FIG. 3A is a flow chart of a grouping method according to an embodiment of the present disclosure;
FIG. 3B is a flow chart of a grouping method according to an embodiment of the present disclosure;
FIG. 4A is a flow chart of a grouping method according to an embodiment of the present disclosure;
FIG. 4B is a flow chart of a grouping method according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of a grouping method according to an embodiment of the present disclosure;
FIG. 6 is a flow chart of a grouping method according to an embodiment of the present disclosure;
FIG. 7A is a schematic diagram of the structure of an encoding and decoding device proposed in an embodiment of the present disclosure;
FIG. 7B is a schematic diagram of the structure of an encoding and decoding device proposed in an embodiment of the present disclosure;
FIG. 8A is a schematic diagram of the structure of a communication device proposed in an embodiment of the present disclosure;
FIG. 8B is a schematic diagram of the structure of a chip proposed in an embodiment of the present disclosure.
The present disclosure provides a grouping method, a device, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, a grouping method is proposed. The method is performed by an encoder and includes:
grouping multiple channels to obtain at least one channel group, wherein the at least one channel group includes N first channel groups, each first channel group including three channels, and M second channel groups, each second channel group including two channels, where N is 1 and M is a non-negative integer.
The above embodiment solves the problem that a single channel may be left ungrouped when channels are grouped. By forming a channel group that includes three channels, every channel has a corresponding group. This ensures the accuracy of channel grouping, which in turn reduces redundancy between channels, ensures the accuracy of channel encoding, and thus ensures the accuracy of audio encoding.
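As a worked check of the counts above: if every channel belongs to exactly one group (an assumption consistent with the stated goal, not a statement quoted from the claims), then with N = 1 the total number of channels is 3 + 2M, which is odd and at least 3. The helper name `group_counts` is hypothetical.

```python
def group_counts(num_channels):
    """Return (N, M) for a channel count, assuming each channel is in exactly one group.

    N is the number of three-channel groups (fixed at 1 in the disclosure) and
    M is the number of two-channel groups, so num_channels = 3 * N + 2 * M.
    """
    if num_channels < 3 or num_channels % 2 == 0:
        raise ValueError("expected an odd channel count of at least 3")
    return 1, (num_channels - 3) // 2
```

For example, five channels give one three-channel group and one two-channel group.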
In conjunction with some embodiments of the first aspect, in some embodiments,
grouping the multiple channels to obtain at least one channel group includes:
obtaining the similarity between any two channels of the multiple channels;
grouping the multiple channels based on the similarity between any two channels to obtain the at least one channel group.
In the above embodiment, channels are grouped according to the similarity between every two channels, so that the similarity of the channels in each channel group meets the requirements. This ensures the accuracy of channel grouping, which in turn reduces redundancy between channels, ensures the accuracy of channel encoding, and thus ensures the accuracy of audio encoding.
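The pairwise similarity step can be sketched as follows. This passage does not fix a similarity measure, so the sketch assumes the absolute normalized cross-correlation of two equal-length sample lists; the function names `similarity` and `pairwise_similarity` are hypothetical.

```python
import math
from itertools import combinations


def similarity(x, y):
    """Absolute normalized correlation of two equal-length signals (assumed metric)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return abs(dot) / norm if norm > 0 else 0.0


def pairwise_similarity(channels):
    """Map each unordered index pair (i, j), i < j, to the similarity of those channels."""
    return {
        (i, j): similarity(channels[i], channels[j])
        for i, j in combinations(range(len(channels)), 2)
    }
```

The result is a dictionary keyed by channel-index pairs, a convenient input for a grouping routine.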
In conjunction with some embodiments of the first aspect, grouping the multiple channels based on the similarity between any two channels to obtain the at least one channel group includes:
among all pairs of the multiple channels, determining the two channels with the greatest similarity as a candidate channel group;
among the channels other than those two, determining the two channels with the greatest similarity as a further candidate channel group, and repeating until one independent channel remains or no channel remains;
determining the at least one channel group based on the obtained candidate channel groups and/or the independent channel.
In the above embodiment, channels are assigned to channel groups in descending order of similarity, so that the channels in each channel group are highly correlated. This ensures the accuracy of channel grouping, which in turn reduces redundancy between channels, ensures the accuracy of channel encoding, and thus ensures the accuracy of audio encoding.
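The greedy pairing described above can be sketched as follows. The input format (a dictionary mapping index pairs `(i, j)` with `i < j` to a similarity value) and the name `greedy_pairs` are assumptions made for illustration; tie-breaking is not specified in the text and here simply follows ascending index order.

```python
def greedy_pairs(num_channels, sim):
    """Repeatedly pair the two most similar remaining channels.

    Returns (pairs, independent), where independent is the single leftover
    channel index, or None when the channel count is even.
    """
    remaining = set(range(num_channels))
    pairs = []
    while len(remaining) >= 2:
        ordered = sorted(remaining)
        candidates = [(i, j) for i in ordered for j in ordered if i < j]
        best = max(candidates, key=lambda pair: sim[pair])
        pairs.append(best)
        remaining -= set(best)
    independent = remaining.pop() if remaining else None
    return pairs, independent
```

With an odd number of channels, exactly one independent channel is left over after the loop.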
In conjunction with some embodiments of the first aspect, determining the at least one channel group based on the obtained candidate channel groups and/or the independent channel includes:
obtaining the similarity between the independent channel and each channel in each candidate channel group;
when the similarity between the independent channel and each channel in a first candidate channel group is greater than a similarity threshold, determining the independent channel together with the channels in the first candidate channel group as one first channel group;
determining each of the remaining candidate channel groups as one second channel group.
In the above embodiment, the remaining single channel is also assigned to a channel group, so that no channel is left ungrouped. This reduces redundancy between channels, ensures the accuracy of channel encoding, and thus ensures the accuracy of audio encoding.
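The threshold test for the independent channel can be sketched as follows. Two points are assumptions, since this passage does not settle them: when several candidate groups qualify, the first one in list order is taken, and when none qualifies the function returns `None` for the three-channel group. The names `attach_independent` and `pair_key` are hypothetical.

```python
def pair_key(i, j):
    """Order an index pair so it matches the (low, high) keys of the similarity map."""
    return (i, j) if i < j else (j, i)


def attach_independent(pairs, independent, sim, threshold):
    """Merge the independent channel into the first pair that passes the threshold.

    Returns (first_group, second_groups): the three-channel group (or None)
    and the list of remaining two-channel groups.
    """
    for idx, (a, b) in enumerate(pairs):
        passes_a = sim[pair_key(independent, a)] > threshold
        passes_b = sim[pair_key(independent, b)] > threshold
        if passes_a and passes_b:
            first_group = (a, b, independent)
            second_groups = pairs[:idx] + pairs[idx + 1:]
            return first_group, second_groups
    return None, list(pairs)
```

The similarity to both members of the pair must exceed the threshold, matching the condition stated in the text.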
In conjunction with some embodiments of the first aspect, in some embodiments, grouping the multiple channels to obtain at least one channel group includes:
grouping the multiple channels to directly obtain the N first channel groups and the M second channel groups.
In conjunction with some embodiments of the first aspect, in some embodiments, grouping the multiple channels to directly obtain the N first channel groups and the M second channel groups includes:
performing a global search over the multiple channels and grouping them according to the similarity between any two channels, to obtain the N first channel groups and the M second channel groups.
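One way to read "global search" is an exhaustive enumeration of every partition into one three-channel group plus two-channel groups. The sketch below takes that reading and scores a partition by the sum of each group's mean pairwise similarity; both the exhaustive strategy and the scoring rule are assumptions for illustration, not details given in this passage, and all function names are hypothetical.

```python
from itertools import combinations


def pairings(items):
    """Yield every way to split an even-sized list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        for tail in pairings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + tail


def group_score(group, sim):
    """Mean pairwise similarity inside one group (assumed objective)."""
    pairs = list(combinations(sorted(group), 2))
    return sum(sim[p] for p in pairs) / len(pairs)


def global_search(num_channels, sim):
    """Return (triple, pairs) with the highest summed group score."""
    if num_channels < 3 or num_channels % 2 == 0:
        raise ValueError("expected an odd channel count of at least 3")
    best, best_score = None, float("-inf")
    for triple in combinations(range(num_channels), 3):
        rest = [c for c in range(num_channels) if c not in triple]
        for pairs in pairings(rest):
            score = group_score(triple, sim) + sum(group_score(p, sim) for p in pairs)
            if score > best_score:
                best, best_score = (triple, pairs), score
    return best
```

The enumeration grows quickly with the channel count, so this form is only practical for small layouts.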
In conjunction with some embodiments of the first aspect, in some embodiments, grouping the multiple channels to obtain at least one channel group includes:
grouping the multiple channels to obtain multiple candidate channel groups, each including two channels, and at least one independent channel;
assigning one independent channel of the at least one independent channel to one candidate channel group of the multiple candidate channel groups, to obtain the N first channel groups and the M second channel groups.
In conjunction with some embodiments of the first aspect, in some embodiments, the method further includes:
downmixing the audio signals in each channel group to obtain a downmixed audio signal;
encoding the downmixed audio signal to obtain an audio stream.
In the above embodiment, the audio signals of each channel group are downmixed and encoded to obtain an audio stream, which ensures that the encoded audio stream can be transmitted normally and that the audio stream is accurate.
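The downmix step can be illustrated with a minimal sketch. The text here does not give the downmix rule, so the sketch assumes a plain sample-wise average of the channels in one group; the subsequent encoding of the downmixed signal is codec-specific and is not shown. The name `downmix` is hypothetical.

```python
def downmix(group_signals):
    """Average equal-length channel signals sample by sample (assumed downmix rule)."""
    count = len(group_signals)
    return [sum(samples) / count for samples in zip(*group_signals)]
```

The same function serves a two-channel or a three-channel group, since it averages however many signals it receives.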
In conjunction with some embodiments of the first aspect, in some embodiments, the method further includes:
sending first information, where the first information indicates the channel information of the multiple channel groups.
In the above embodiment, the channel information of each channel group is conveyed through the first information, which ensures that the channel groups are transmitted reliably.
In conjunction with some embodiments of the first aspect, in some embodiments, the channel information includes at least one of the following:
the number of channel groups that include two channels;
the channel identifiers of the channels in a channel group that includes two channels;
the energy parameter of a channel group that includes two channels;
the number of channel groups that include three channels;
the channel identifiers of the channels in a channel group that includes three channels;
the energy parameter of a channel group that includes three channels;
where the energy parameter is used to adjust the energy of the channels in the channel group.
In the above embodiment, the channel information covers both two-channel and three-channel groups, which ensures that the channel information is comprehensive.
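The listed fields could be carried in a structure like the one sketched below. The class names, field names, and the choice to derive the group counts from the stored groups are all assumptions for illustration; the disclosure does not define a concrete layout or bitstream syntax in this passage.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GroupInfo:
    """One channel group: its channel identifiers and its energy parameters."""
    channel_ids: Tuple[int, ...]
    energy_params: Tuple[float, ...] = ()


@dataclass
class ChannelInfo:
    """Channel information for all groups (hypothetical container)."""
    groups: List[GroupInfo] = field(default_factory=list)

    def count(self, size: int) -> int:
        """Number of groups that contain exactly `size` channels."""
        return sum(1 for g in self.groups if len(g.channel_ids) == size)
```

For instance, `ChannelInfo([GroupInfo((0, 1)), GroupInfo((2, 3, 4))])` describes one two-channel group and one three-channel group.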
In a second aspect, an embodiment of the present disclosure provides a grouping method. The method is performed by a decoder and includes:
decoding first information to obtain channel information of at least one channel group, wherein the at least one channel group includes N first channel groups, each first channel group including three channels, and M second channel groups, each second channel group including two channels, where N is 1 and M is a non-negative integer.
In conjunction with some embodiments of the second aspect, in some embodiments, the channel information includes at least one of the following:
the number of channel groups that include two channels;
the channel identifiers of the channels in a channel group that includes two channels;
the energy parameter of a channel group that includes two channels;
the number of channel groups that include three channels;
the channel identifiers of the channels in a channel group that includes three channels;
the energy parameter of a channel group that includes three channels;
where the energy parameter is used to adjust the energy of the channels in the channel group.
In conjunction with some embodiments of the second aspect, in some embodiments, the method further includes:
when it is determined that the channel groups do not include a channel group with three channels, performing two-channel upmixing on the audio stream.
In conjunction with some embodiments of the second aspect, in some embodiments, the method further includes:
when it is determined that the channel groups include a channel group with three channels, performing three-channel upmixing on the audio stream.
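The decoder-side choice between the two upmix modes can be sketched as a simple check on the decoded group sizes. The function name `upmix_mode`, its input (a list of group sizes), and the returned labels are hypothetical; the upmix operations themselves are not specified in this passage and are not shown.

```python
def upmix_mode(group_sizes):
    """Pick the upmix mode from the sizes of the decoded channel groups."""
    if any(size == 3 for size in group_sizes):
        return "three-channel"
    return "two-channel"
```

A decoder would call this after parsing the channel information and then dispatch to the matching upmix routine.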
In a third aspect, an embodiment of the present disclosure provides a grouping method. The method includes:
an encoder grouping multiple channels to obtain at least one channel group, wherein the at least one channel group includes N first channel groups, each first channel group including three channels, and M second channel groups, each second channel group including two channels, where N is 1 and M is a non-negative integer;
a decoder decoding first information to obtain channel information of the at least one channel group, wherein at least one of the channel groups includes three channels.
In a fourth aspect, an embodiment of the present disclosure provides an encoding and decoding device that includes at least one of a transceiver module and a processing module, wherein the encoder is configured to execute the optional implementations of the first aspect and the third aspect.
In a fifth aspect, an embodiment of the present disclosure provides an encoding and decoding device that includes at least one of a transceiver module and a processing module, wherein the access network device is configured to execute the optional implementations of the second aspect and the third aspect.
In a sixth aspect, an embodiment of the present disclosure provides an encoding and decoding device, including:
one or more processors;
wherein the encoding and decoding device is configured to execute the method described in any one of the first aspect and the third aspect.
In a seventh aspect, an embodiment of the present disclosure provides an encoding and decoding device, including:
one or more processors;
wherein the encoding and decoding device is configured to execute the method described in any one of the second aspect and the third aspect.
In an eighth aspect, an embodiment of the present disclosure provides a storage medium. The storage medium stores first information which, when run on a communication device, causes the communication device to execute the method described in any one of the first aspect, the second aspect, and the third aspect.
In a ninth aspect, an embodiment of the present disclosure proposes a program product which, when executed by a communication device, causes the communication device to execute the method described in any one of the first aspect, the second aspect, and the third aspect.
In a tenth aspect, an embodiment of the present disclosure proposes a computer program which, when run on a communication device, causes the communication device to execute the method described in any one of the first aspect, the second aspect, and the third aspect.
In an eleventh aspect, an embodiment of the present disclosure provides a chip or a chip system. The chip or chip system includes a processing circuit configured to execute the method described in any one of the first aspect, the second aspect, and the third aspect.
It can be understood that the above encoder, storage medium, program product, computer program, and chip or chip system are all used to execute the methods proposed in the embodiments of the present disclosure. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods, which are not repeated here.
The embodiments of the present disclosure propose a grouping method, a device, and a storage medium. In some embodiments, terms such as grouping method and information grouping method are interchangeable; terms such as encoding and decoding device, information processing device, and indicating device are interchangeable; and terms such as information processing system and encoding and decoding system are interchangeable.
The embodiments of the present disclosure are not exhaustive; they merely illustrate some embodiments and do not specifically limit the scope of protection of the present disclosure. Where no contradiction arises, each step in an embodiment may be implemented as an independent embodiment, and the steps may be combined arbitrarily. For example, a solution obtained by removing some steps from an embodiment may also be implemented as an independent embodiment, and the order of the steps in an embodiment may be exchanged arbitrarily. In addition, the optional implementations within an embodiment may be combined arbitrarily, and the embodiments themselves may be combined arbitrarily: for example, some or all of the steps of different embodiments may be combined, and an embodiment may be combined with the optional implementations of other embodiments.
In the embodiments of the present disclosure, unless otherwise specified or logically conflicting, the terms and/or descriptions of the embodiments are consistent and may be cross-referenced, and technical features of different embodiments may be combined according to their inherent logical relationships to form new embodiments.
The terms used in the embodiments of the present disclosure are only for the purpose of describing specific embodiments and are not intended to limit the present disclosure.
In the embodiments of the present disclosure, unless otherwise specified, an element expressed in the singular, such as with "a", "an", "the", "the above", "said", "the aforementioned", or "this", may mean "one and only one" or may mean "one or more", "at least one", and so on. For example, where articles such as the English "a", "an", and "the" are used in translation, the noun following the article may be understood as singular or as plural.
In the embodiments of the present disclosure, "a plurality of" means two or more.
In some embodiments, terms such as "at least one of", "one or more", "a plurality of", and "multiple" are interchangeable.
In some embodiments, expressions such as "at least one of A and B", "A and/or B", "A in one case, B in another case", and "in response to one case A, in response to another case B" may, depending on the situation, cover the following technical solutions: in some embodiments, A (A is executed independently of B); in some embodiments, B (B is executed independently of A); in some embodiments, a selection between A and B (A and B are selectively executed); in some embodiments, A and B (both A and B are executed). The same applies when there are more branches such as A, B, and C.
In some embodiments, expressions such as "A or B" may, depending on the situation, cover the following technical solutions: in some embodiments, A (A is executed independently of B); in some embodiments, B (B is executed independently of A); in some embodiments, a selection between A and B (A and B are selectively executed). The same applies when there are more branches such as A, B, and C.
Prefixes such as "first" and "second" in the embodiments of the present disclosure are used only to distinguish different described objects and do not limit the position, order, priority, quantity, or content of those objects. For what is stated about a described object, refer to the context in the claims or embodiments; the use of a prefix should not impose any additional limitation. For example, if the described object is a "field", the ordinals in "first field" and "second field" do not limit the position or order of the fields; "first" and "second" limit neither whether the fields they modify are in the same message nor the order of the "first field" and the "second field". As another example, if the described object is a "level", the ordinals in "first level" and "second level" do not limit the priority between the levels. As a further example, the number of described objects is not limited by the ordinal and may be one or more; taking "first device" as an example, the number of "devices" may be one or more. In addition, objects modified by different prefixes may be the same or different. For example, if the described object is a "device", the "first device" and the "second device" may be the same device or different devices, and their types may be the same or different. Likewise, if the described object is "information", the "first information" and the "second information" may be the same information or different information, and their contents may be the same or different.
In some embodiments, "including A", "comprising A", "used to indicate A", and "carrying A" may be interpreted as directly carrying A or as indirectly indicating A.
In some embodiments, terms such as "time/frequency" and "time/frequency domain" refer to the time domain and/or the frequency domain.
In some embodiments, terms such as "in response to ...", "in response to determining ...", "in the case of ...", "at the time of ...", "when ...", and "if ..." are interchangeable.
In some embodiments, terms such as "greater than", "greater than or equal to", "not less than", "more than", "more than or equal to", "no fewer than", "higher than", "higher than or equal to", "not lower than", and "above" are interchangeable, and terms such as "less than", "less than or equal to", "not greater than", "fewer than", "fewer than or equal to", "no more than", "lower than", "lower than or equal to", "not higher than", and "below" are interchangeable.
In some embodiments, apparatuses and devices may be interpreted as physical or virtual, and their names are not limited to those recorded in the embodiments. In some cases they may also be understood as "equipment", "device", "circuit", "network element", "node", "function", "unit", "section", "system", "network", "chip", "chip system", "entity", "subject", and so on.
In some embodiments, "network" may be interpreted as the devices included in the network, for example access network devices and core network devices.
In some embodiments, an "access network device (AN device)" may also be referred to as a "radio access network device (RAN device)", "base station (BS)", "radio base station", or "fixed station", and in some embodiments may also be understood as a "node", "access point", "transmission point (TP)", "reception point (RP)", "transmission/reception point (TRP)", "panel", "antenna panel", "antenna array", "cell", "macro cell", "small cell", "femto cell", "pico cell", "sector", "cell group", "serving cell", "carrier", "component carrier", "bandwidth part (BWP)", and so on.
In some embodiments, an "encoder (terminal)" or "encoder device (terminal device)" may be referred to as "user equipment (UE)", "user encoder (user terminal)", "mobile station (MS)", "mobile encoder (mobile terminal, MT)", subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access encoder (access terminal), mobile encoder (mobile terminal), wireless encoder (wireless terminal), remote encoder (remote terminal), handset, user agent, mobile client, client, and so on.
In some embodiments, data, information, and the like may be acquired in compliance with the laws and regulations of the country in which they are acquired.
In some embodiments, data, information, and the like may be acquired after the user's consent has been obtained.
In addition, each element, each row, or each column of a table in the embodiments of the present disclosure may be implemented as an independent embodiment, and any combination of elements, rows, and columns may also be implemented as an independent embodiment.
FIG. 1 is a schematic architecture diagram of a codec system according to an embodiment of the present disclosure. As shown in FIG. 1, the method provided by the embodiments of the present disclosure may be applied to a codec system 100, which may include an encoder 101 and a decoder 102. It should be noted that the codec system 100 may also include other devices; the present disclosure does not limit the devices included in the codec system 100.
In some embodiments, the encoder 101 and the decoder 102 are both provided in a terminal. In some embodiments, the terminal may be any of various devices, for example including, but not limited to, at least one of: a mobile phone, a wearable device, an Internet of Things device, a car with communication functions, a smart car, a tablet computer (Pad), a computer with wireless transceiver functions, a virtual reality (VR) encoder device, an augmented reality (AR) encoder device, a wireless encoder device in industrial control, a wireless encoder device in self-driving, a wireless encoder device in remote medical surgery, a wireless encoder device in a smart grid, a wireless encoder device in transportation safety, a wireless encoder device in a smart city, and a wireless encoder device in a smart home.
It can be understood that the codec system described in the embodiments of the present disclosure is intended to explain the technical solutions of the embodiments more clearly and does not limit those technical solutions. A person of ordinary skill in the art will appreciate that, as system architectures evolve and new service scenarios emerge, the technical solutions proposed in the embodiments of the present disclosure are equally applicable to similar technical problems.
The following embodiments of the present disclosure may be applied to the codec system 100 shown in FIG. 1, or to some of its entities, but are not limited thereto. The entities shown in FIG. 1 are examples: the codec system may include all or some of the entities in FIG. 1 and may also include entities not shown in FIG. 1. The number and form of the entities are arbitrary, and each entity may be physical or virtual. The connection relationships between the entities are likewise examples: the entities may or may not be connected, and a connection may take any form, direct or indirect, wired or wireless.
The embodiments of the present disclosure may be applied to Long Term Evolution (LTE), LTE-Advanced (LTE-A), LTE-Beyond (LTE-B), SUPER 3G, IMT-Advanced, the fourth generation mobile communication system (4G), the fifth generation mobile communication system (5G), 5G new radio (NR), Future Radio Access (FRA), New-Radio Access Technology (RAT), New Radio (NR), New radio access (NX), Future generation radio access (FX), Global System for Mobile communications (GSM (registered trademark)), CDMA2000, Ultra Mobile Broadband (UMB), IEEE 802.11 (Wi-Fi (registered trademark)), IEEE 802.16 (WiMAX (registered trademark)), IEEE 802.20, Ultra-WideBand (UWB), Bluetooth (registered trademark), Public Land Mobile Network (PLMN) networks, Device-to-Device (D2D) systems, Machine to Machine (M2M) systems, Internet of Things (IoT) systems, Vehicle-to-Everything (V2X), systems using other grouping methods, next-generation systems extended based on them, and so on. In addition, a combination of multiple systems (for example, a combination of LTE or LTE-A with 5G) may also be applied.
In some embodiments, the present disclosure is used for three-dimensional (3D) audio coding, which is one of the key technologies of immersive audio. Compared with conventional sound, 3D sound adds a sense of space and direction, so that what the listener hears reproduces what would be heard in the real world. This meets the demand for highly faithful, highly immersive sound and can also support personalized selection and interactive experiences.
In some embodiments, to reproduce the sense of space and direction of sound, the technology may rely on a channel-based approach, an object-based approach, a sound-field-based approach, or a combination of these three. Channel-based audio is a set of interrelated channels; common formats include 5.1, 7.1, 5.1.4, and 7.1.4. Each format corresponds to a speaker layout, under which the best playback effect is obtained.
In some embodiments, object-based audio is a collection of mono audio elements and their corresponding metadata. The metadata represents information such as the position, intensity, and size of each object. During playback, each object is mapped to one or more speakers, or binaurally rendered for headphone playback, according to the metadata, to achieve the desired spatial audio effect.
In some embodiments, sound-field-based audio is a 3D sound field modeling format defined on the surface of a sphere. The principle is that sound propagates as a pressure wave, and for a sound scene at a given time, each point is represented by several pressure functions. If the pressure value at every point in the space is known, the sound in the space can be reconstructed. The pressure at each point is related to that at its neighboring points. To fully exploit the advantages of scene-based audio production, the coefficients must be obtained accurately and the coding quality of the sound field spatial coefficients improved. The captured sound field signal is called Higher Order Ambisonics (HOA). The performance of an HOA system increases with the HOA order, but so does the number of HOA signals.
FIG. 2 is an interaction diagram of a grouping method according to an embodiment of the present disclosure. As shown in FIG. 2, an embodiment of the present disclosure relates to a grouping method, which includes:
Step S2101: the encoder groups a plurality of channels to obtain at least one channel group.
In some embodiments, channels are mutually independent audio signals that are captured or played back at different spatial positions when sound is recorded or played. A channel is an element of the audio, and different types of audio have different numbers of channels. For example, an HOA3 signal has 16 channels, and an MC22.2 signal has 24 channels.
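The two channel counts in this example follow from common conventions rather than from anything specific to this disclosure: a full 3D ambisonics signal of order n carries (n + 1)^2 channels, and a 22.2 layout has 22 main channels plus 2 low-frequency channels. A minimal sketch, with hypothetical helper names, is:

```python
def hoa_channel_count(order: int) -> int:
    """Channels of a full 3D ambisonics signal of the given order: (order + 1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def multichannel_count(main: int, lfe: int) -> int:
    """Channels of an 'X.Y' speaker layout: X main channels plus Y LFE channels."""
    return main + lfe


hoa3 = hoa_channel_count(3)          # third-order HOA
mc22_2 = multichannel_count(22, 2)   # 22.2 multichannel layout
print(hoa3, mc22_2)
```

This also makes concrete why the number of HOA signals grows with the order: each increase in order adds 2n + 1 channels.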
In some embodiments, the audio signal includes audio data and side information.
In some embodiments, a channel group is a group that includes at least two channels. The encoder encodes the channels in units of channel groups to obtain an audio stream.
In some embodiments, the name of the channel group is not limited; it may be called, for example, a channel set or a channel group category.
In some embodiments, the at least one channel group contains N first channel groups, each first channel group including three channels, where N is 1.
In some embodiments, the at least one channel group contains M second channel groups, each second channel group including two channels, where M is a non-negative integer.
In some embodiments, N may also be 0; that is, grouping the plurality of channels yields no channel group that includes three channels, i.e. zero first channel groups.
Optionally, M should also be no greater than half the number of channels. That is, if the number of channels to be grouped is P, then M is greater than or equal to 0 and less than or equal to P/2, where P is a positive integer.
In some embodiments, before the channels are grouped, the signals included in the channels are first preprocessed, and the channels are grouped after the preprocessing is complete.
Optionally, the signals included in the channels are preprocessed in at least one of the following ways: transient detection, window type determination, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, and band extension coding.
In some embodiments, grouping the plurality of channels to obtain at least one channel group includes: obtaining the similarity between any two channels of the plurality of channels, and grouping the plurality of channels based on the similarity between any two channels to obtain the at least one channel group.
In the embodiments of the present disclosure, the encoder may obtain the similarity between every two channels and then group the plurality of channels according to the relative magnitudes of these similarities to obtain at least one channel group.
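The text does not say how the similarity between two channels is measured. Purely as an illustration, the sketch below assumes the absolute normalized cross-correlation at zero lag, which lies in [0, 1]; the metric and the function names are assumptions, not part of the disclosure.

```python
import math
from itertools import combinations


def normalized_correlation(x, y):
    """Absolute normalized cross-correlation at zero lag (assumed metric)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return abs(num) / den if den > 0 else 0.0


def pairwise_similarity(channels):
    """Map every index pair (i, j) with i < j to the similarity of channels i and j."""
    return {
        (i, j): normalized_correlation(channels[i], channels[j])
        for i, j in combinations(range(len(channels)), 2)
    }


channels = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # scaled copy of channel 0, so highly similar
    [1.0, -1.0, 1.0, -1.0],  # nearly uncorrelated with channel 0
]
sim = pairwise_similarity(channels)
```

Any symmetric measure could be substituted; the grouping steps only rely on being able to compare the similarities of different channel pairs.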
In some embodiments, grouping the plurality of channels based on the similarity between any two channels to obtain at least one channel group includes: among all pairs of the plurality of channels, determining the two channels with the greatest similarity as a candidate channel group; then, among the channels remaining after those two are excluded, again determining the two channels with the greatest similarity as a candidate channel group; repeating this until one independent channel remains or no channel remains; and determining the at least one channel group based on the obtained candidate channel groups and/or the independent channel.
For example, suppose the channels are channel 1, channel 2, channel 3, channel 4, and channel 5. The similarity between every two channels is obtained. If channel 1 and channel 2 have the highest similarity, they are taken as one candidate channel group. Channel 1 and channel 2 are then excluded, and the two most similar channels among channel 3, channel 4, and channel 5 are taken as another candidate channel group, for example channel 3 and channel 4. One channel, channel 5, then remains, and channel 5 is the independent channel.
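The pairing procedure just described can be sketched as a greedy loop. The function name and the similarity values are hypothetical, and the sketch leaves out the similarity threshold that the text applies separately.

```python
def greedy_pairing(channel_ids, sim):
    """Repeatedly pair the two most similar remaining channels.

    sim maps frozenset({a, b}) to the similarity of channels a and b.
    Returns (candidate_groups, independent_channel_or_None).
    """
    remaining = list(channel_ids)
    groups = []
    while len(remaining) >= 2:
        # Most similar pair among the channels not yet grouped.
        best = max(
            ((a, b) for i, a in enumerate(remaining) for b in remaining[i + 1:]),
            key=lambda pair: sim[frozenset(pair)],
        )
        groups.append(best)
        remaining = [c for c in remaining if c not in best]
    independent = remaining[0] if remaining else None
    return groups, independent


# Hypothetical similarities for five channels.
values = {
    (1, 2): 0.95, (1, 3): 0.40, (1, 4): 0.35, (1, 5): 0.70,
    (2, 3): 0.30, (2, 4): 0.45, (2, 5): 0.90,
    (3, 4): 0.85, (3, 5): 0.80, (4, 5): 0.60,
}
sim = {frozenset(k): v for k, v in values.items()}
groups, independent = greedy_pairing([1, 2, 3, 4, 5], sim)
print(groups, independent)
```

With these values the loop first pairs channels 1 and 2, then channels 3 and 4, and leaves channel 5 as the independent channel, matching the five-channel example.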
In some embodiments, determining the at least one channel group based on the obtained candidate channel groups and/or the independent channel includes: obtaining the similarity between the independent channel and each channel in each candidate channel group; when the similarity between the independent channel and each channel in a first candidate channel group is greater than a similarity threshold, determining the independent channel together with the channels of the first candidate channel group as one first channel group; and determining each remaining candidate channel group as one second channel group.
In the embodiments of the present disclosure, when the similarity between the independent channel and each channel in the first candidate channel group is greater than the similarity threshold, the independent channel is similar to both channels of that group and can therefore be assigned to it. The first candidate channel group then includes three channels and is thus a first channel group, while each remaining candidate channel group still includes two channels and is thus a second channel group.
In some embodiments, when the plurality of channels are grouped based on the similarity between any two channels and it is determined that an odd number of channels satisfy the grouping condition, so that at least one channel group is obtained, the similarity between the independent channel and each channel in each candidate channel group is obtained; when the similarity between the independent channel and each channel in the first candidate channel group is greater than the similarity threshold, the independent channel together with the channels of the first candidate channel group is determined as one first channel group, and each remaining candidate channel group is determined as one second channel group.
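A minimal sketch of this threshold rule follows. The names, similarity values, and threshold are hypothetical. The sketch attaches the independent channel to the first qualifying pair it meets, and returns it as a leftover if no pair qualifies, a case the text does not address.

```python
def attach_independent(groups, independent, sim, threshold):
    """Split candidate pairs into first (3-channel) and second (2-channel) groups.

    The independent channel joins a pair only if its similarity to BOTH
    channels of that pair exceeds the threshold.
    Returns (first_groups, second_groups, leftover_channel_or_None).
    """
    first_groups, second_groups = [], []
    placed = independent is None
    for a, b in groups:
        if (not placed
                and sim[frozenset((independent, a))] > threshold
                and sim[frozenset((independent, b))] > threshold):
            first_groups.append((a, b, independent))
            placed = True
        else:
            second_groups.append((a, b))
    leftover = None if placed else independent
    return first_groups, second_groups, leftover


# Hypothetical similarities between independent channel 5 and the paired channels.
sim = {
    frozenset((5, 1)): 0.7, frozenset((5, 2)): 0.9,
    frozenset((5, 3)): 0.8, frozenset((5, 4)): 0.6,
}
first, second, leftover = attach_independent([(1, 2), (3, 4)], 5, sim, threshold=0.65)
print(first, second, leftover)
```

Here channel 5 exceeds the threshold against both channel 1 and channel 2 but not against channel 4, so the result is one first channel group (1, 2, 5) and one second channel group (3, 4), i.e. N = 1 and M = 1.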
It should be noted that the embodiments of the present disclosure are described using the case in which an independent channel is obtained. In another embodiment, among all pairs of the plurality of channels, the two channels with the greatest similarity are determined as a candidate channel group; then, among the channels remaining after those two are excluded, the two channels with the greatest similarity are again determined as a candidate channel group; this is repeated until no channel remains, and the at least one channel group is determined based on the obtained candidate channel groups.
It should be noted that if the similarity between every two channels of the plurality of channels is not greater than the similarity threshold, the channels are not grouped.
It should be noted that, in the embodiments of the present disclosure, the independent channel may have a similarity greater than the similarity threshold with both channels of more than one candidate channel group. In that case, an additional decision is needed as to which candidate channel group the independent channel is assigned.
In some embodiments, the sum of the similarities between the independent channel and the two channels of each candidate channel group is obtained, the largest of these sums is selected, and the independent channel is assigned to the candidate channel group corresponding to the largest sum.
For example, the independent channel is channel 5, the first candidate channel group includes channel 1 and channel 2, and the second candidate channel group includes channel 3 and channel 4. If the sum of the similarities of channel 5 to channel 1 and channel 2 is 1.8, and the sum of the similarities of channel 5 to channel 3 and channel 4 is 1.9, channel 5 is assigned to the second candidate channel group.
In some embodiments, among the similarities between the independent channel and the two channels of each candidate channel group, the candidate channel group corresponding to the greatest single similarity is determined, and the independent channel is assigned to that candidate channel group.
For example, the independent channel is channel 5, the first candidate channel group includes channel 1 and channel 2, and the second candidate channel group includes channel 3 and channel 4. If the similarity of channel 5 to channel 1 is 0.7, to channel 2 is 0.9, to channel 3 is 0.8, and to channel 4 is 0.6, channel 5 is assigned to the first candidate channel group.
It should be noted that, in the above embodiments, if the similarity sum or the maximum similarity computed for the independent channel is the same for at least two candidate channel groups, one of these candidate channel groups is selected at random, and the independent channel is assigned to the selected group.
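The two selection criteria and the random tie-break can be sketched together. The function name and the similarity values are illustrative; they are chosen so that the two criteria disagree and are not taken from the examples in the text. The candidate pairs are assumed to have already passed the threshold check.

```python
import random


def choose_group(independent, candidates, sim, criterion="sum", rng=random):
    """Pick the candidate pair that the independent channel should join.

    criterion="sum": largest sum of similarities to the pair's two channels.
    criterion="max": largest single similarity to either channel of the pair.
    Ties are broken by a random choice among the tied pairs.
    """
    if criterion not in ("sum", "max"):
        raise ValueError("criterion must be 'sum' or 'max'")
    combine = sum if criterion == "sum" else max
    scores = {
        pair: combine(sim[frozenset((independent, c))] for c in pair)
        for pair in candidates
    }
    best = max(scores.values())
    tied = [pair for pair, score in scores.items() if score == best]
    return rng.choice(tied)


# Hypothetical similarities between independent channel 5 and the paired channels.
sim = {
    frozenset((5, 1)): 0.70, frozenset((5, 2)): 0.90,
    frozenset((5, 3)): 0.85, frozenset((5, 4)): 0.85,
}
pairs = [(1, 2), (3, 4)]
by_sum = choose_group(5, pairs, sim, "sum")
by_max = choose_group(5, pairs, sim, "max")
print(by_sum, by_max)
```

With these values the sum criterion favours the pair (3, 4), whose similarities to channel 5 are consistently high, while the maximum criterion favours (1, 2) because of the single strong match with channel 2.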
It should be noted that the grouping in the embodiments of the present disclosure includes two grouping schemes. Each grouping scheme is described below.
In some embodiments, multiple channels are grouped to directly obtain N first channel groups and M second channel groups. Optionally, in the embodiments of the present disclosure, this means that when the multiple channels are grouped, the N first channel groups and M second channel groups obtained by the grouping are output directly, and no other grouping result is output in the meantime.
Optionally, grouping the multiple channels to directly obtain N first channel groups and M second channel groups includes: performing a global search on the multiple channels and grouping the multiple channels according to the similarity between any two channels, to obtain the N first channel groups and the M second channel groups.
In a possible implementation, for every combination of three channels among the multiple channels, the similarity between any two of the three channels is obtained. If every pairwise similarity among the three channels is greater than a similarity threshold, the three channels are grouped into a candidate channel group. If only one candidate channel group is finally obtained, that candidate channel group is determined as the first channel group. If multiple candidate channel groups are finally obtained, the first channel group is determined from the multiple candidate channel groups according to the similarity relationship of each candidate channel group.
Optionally, if multiple candidate channel groups are finally obtained, determining the first channel group from the multiple candidate channel groups according to the similarity relationship of each candidate channel group includes: obtaining, for each candidate channel group, the sum of the similarities between every two channels in that group, selecting the largest of these sums, and determining the candidate channel group corresponding to the largest sum as the first channel group.
For example, the first candidate channel group includes channel 1, channel 2, and channel 3, and the second candidate channel group includes channel 4, channel 5, and channel 6. If the sum of the similarities between every two channels among channel 1, channel 2, and channel 3 is 2.7, and the sum of the similarities between every two channels among channel 4, channel 5, and channel 6 is 2.6, the first candidate channel group is determined as the first channel group.
Optionally, if multiple candidate channel groups are finally obtained, determining the first channel group from the multiple candidate channel groups according to the similarity relationship of each candidate channel group includes: determining, among the similarities between the independent channel and every two channels in each candidate channel group, the candidate channel group corresponding to the greatest similarity, and determining that candidate channel group as the first channel group.
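The global search over three-channel combinations with the largest-sum rule can be sketched as follows. The function name `pick_three_channel_group`, the dictionary keyed by `frozenset`, and the similarity values other than the two sums (2.7 and 2.6) quoted in the example are assumptions made for illustration. Ties are resolved here by keeping the first combination found, which is a simplification.

```python
from itertools import combinations


def pick_three_channel_group(channels, sim, threshold):
    """Return the three-channel candidate with the largest pairwise-similarity sum.

    channels: iterable of channel identifiers.
    sim: dict mapping frozenset({a, b}) to the similarity of channels a and b.
    threshold: every pairwise similarity in a candidate must exceed this value.
    Returns a tuple of three channels, or None if no combination qualifies.
    """
    best_group, best_sum = None, None
    for triple in combinations(channels, 3):
        pair_sims = [sim[frozenset(pair)] for pair in combinations(triple, 2)]
        if all(s > threshold for s in pair_sims):
            total = sum(pair_sims)
            if best_sum is None or total > best_sum:
                best_group, best_sum = triple, total
    return best_group


# Hypothetical similarities: default 0.1, with two strongly related triples.
channels = [1, 2, 3, 4, 5, 6]
sim = {frozenset(p): 0.1 for p in combinations(channels, 2)}
sim.update({frozenset((1, 2)): 0.9, frozenset((1, 3)): 0.9, frozenset((2, 3)): 0.9})
sim.update({frozenset((4, 5)): 0.9, frozenset((4, 6)): 0.9, frozenset((5, 6)): 0.8})
first_group = pick_three_channel_group(channels, sim, threshold=0.5)
```

Here channels 1, 2, and 3 sum to about 2.7 and channels 4, 5, and 6 to about 2.6, so the first triple is selected, matching the example in the text.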
In some embodiments, the process of grouping the multiple channels is similar to the similarity-based grouping process in the above embodiments and is not repeated here.
In some embodiments, multiple channels are grouped to obtain multiple candidate channel groups, each including two channels, and at least one independent channel; one independent channel of the at least one independent channel is then assigned to one of the multiple candidate channel groups, to obtain N first channel groups and M second channel groups.
It should be noted that the embodiments of the present disclosure are described by taking the grouping of multiple channels as an example. In another embodiment, after the multiple channels are grouped, channel information of each channel group is also generated.
In some embodiments, the channel information includes at least one of the following:
(1) the number of channel groups including two channels;
In some embodiments, the number of channel groups including two channels may also be referred to as the second channel group quantity, the second channel group count, the number of channel group pairs, the number of groups, or the like, which is not limited in the embodiments of the present disclosure.
In some embodiments, the number of channel groups including two channels indicates the number of second channel group pairs in the current frame.
(2) the channel identifiers of the channels included in a channel group including two channels;
In some embodiments, the channel identifiers of the channels included in a channel group including two channels indicate the index of the channel pair, which can be parsed to obtain the index values of the two channels in the current channel pair.
(3) the energy parameter of a channel group including two channels;
(4) the number of channel groups including three channels;
In some embodiments, the number of channel groups including three channels may also be referred to as the first channel group quantity, the first channel group count, the number of channel group pairs, the number of groups, or the like, which is not limited in the embodiments of the present disclosure.
In some embodiments, the number of channel groups including three channels indicates the number of first channel group pairs in the current frame.
(5) the channel identifiers of the channels included in a channel group including three channels;
In some embodiments, the channel identifiers of the channels included in a channel group including three channels indicate the index of the channel group, which can be parsed to obtain the index values of the three channels in the current channel group.
(6) the energy parameter of a channel group including three channels;
The energy parameter is used to adjust the energy of the channels in the channel group.
In some embodiments, the energy parameter is the quantization index of the inter-channel amplitude difference (ILD) parameter between the first channel and the second channel in the current channel pair, and is used for inter-channel energy/amplitude adjustment.
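The six items of channel information listed above could be held in a structure such as the one below. This is only a sketch: the class name `ChannelInfo`, the field names, and the choice of one integer energy parameter (an ILD quantization index) per group are assumptions for illustration and do not reflect an actual bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChannelInfo:
    """Hypothetical per-frame container for the channel information items (1)-(6)."""
    pair_count: int = 0                                   # (1) number of two-channel groups
    pair_channel_ids: List[Tuple[int, int]] = field(default_factory=list)         # (2)
    pair_energy_params: List[int] = field(default_factory=list)                   # (3)
    triple_count: int = 0                                 # (4) number of three-channel groups
    triple_channel_ids: List[Tuple[int, int, int]] = field(default_factory=list)  # (5)
    triple_energy_params: List[int] = field(default_factory=list)                 # (6)

    def is_consistent(self) -> bool:
        """Check that the counts agree with the lists and no channel is used twice."""
        used = [c for group in self.pair_channel_ids + self.triple_channel_ids for c in group]
        return (
            self.pair_count == len(self.pair_channel_ids)
            and self.triple_count == len(self.triple_channel_ids)
            and len(used) == len(set(used))
        )


# Example: channels 0, 1, 2 form a three-channel group and channels 3, 4 form a pair.
info = ChannelInfo(
    pair_count=1, pair_channel_ids=[(3, 4)], pair_energy_params=[7],
    triple_count=1, triple_channel_ids=[(0, 1, 2)], triple_energy_params=[5],
)
```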
In some embodiments, multiple channels are grouped to directly obtain N first channel groups and M second channel groups, and channel information of the first channel groups and the second channel groups is generated.
In some embodiments, multiple channels are grouped to obtain multiple candidate channel groups, each including two channels, and at least one independent channel; channel information of the multiple candidate channel groups and the at least one independent channel is generated; one independent channel of the at least one independent channel is assigned to one of the multiple candidate channel groups, to obtain N first channel groups and M second channel groups; and the channel information of the multiple candidate channel groups and the at least one independent channel is rewritten to obtain the channel information of the first channel groups and the second channel groups.
Optionally, multiple channels are grouped to obtain M+1 candidate channel groups, each including two channels, and at least one independent channel; channel information of the multiple candidate channel groups and the at least one independent channel is generated; and one independent channel of the at least one independent channel is assigned to one of the multiple candidate channel groups, to obtain N first channel groups and M second channel groups.
It should be noted that the method of grouping multiple channels to obtain a first channel group may be referred to as 3-channel sum-difference coding, and the method of grouping multiple channels to obtain a second channel group may be referred to as 2-channel sum-difference coding.
It should be noted that the operations described in the embodiments of the present disclosure are performed by a decision module or by another module, which is not limited in the embodiments of the present disclosure.
Step S2102: the encoder downmixes the audio signals in each channel group to obtain downmixed audio signals.
In some embodiments, downmixing refers to mixing the grouped channels using an orthonormal matrix to obtain the mixed channels.
Optionally, the orthonormal matrix is preset, and the embodiments of the present disclosure do not limit the orthonormal matrix.
For example, for a channel group including two channels, the orthonormal matrix used is $M_{2ch}^{ms}$, in which the first row is the sum vector and the second row is the difference vector.
For a channel group including three channels, the orthonormal matrix used is $M_{3ch}^{ms}$, in which the first row is the sum vector and the second and third rows are difference vectors.
In some embodiments, $M_{3ch}^{ms} I_{3ch}^{ms} = O_{3ch}^{ms}$ and $M_{2ch}^{ms} I_{2ch}^{ms} = O_{2ch}^{ms}$, where $I_{3ch}^{ms}$ is a three-element column vector holding the audio data of a channel group including three channels, and $I_{2ch}^{ms}$ is a two-element column vector holding the audio data of a channel group including two channels.
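The downmix relations above can be illustrated with concrete matrices. The entries of the matrices are not reproduced in this text, so the values below are assumptions: they are one common choice of orthonormal matrices whose first row is a sum vector and whose remaining rows are difference vectors, and they may differ from the matrices actually used. The names `M2`, `M3`, `matvec`, and `is_orthonormal` are likewise hypothetical.

```python
import math

# Assumed two-channel matrix: first row is the sum vector, second the difference vector.
M2 = [
    [1 / math.sqrt(2), 1 / math.sqrt(2)],
    [1 / math.sqrt(2), -1 / math.sqrt(2)],
]

# Assumed three-channel matrix: first row sum vector, rows two and three difference vectors.
M3 = [
    [1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)],
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],
]


def matvec(matrix, vector):
    """Compute O = M I for one sample (or one frequency bin) per channel."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def is_orthonormal(matrix, tol=1e-12):
    """Check that the rows are mutually orthogonal and have unit norm."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(matrix[i], matrix[j]))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


# Two identical input channels: all energy lands in the sum output.
o2 = matvec(M2, [1.0, 1.0])
# Three identical input channels: both difference outputs vanish.
o3 = matvec(M3, [1.0, 1.0, 1.0])
```

When the channels in a group are highly similar, the difference outputs carry little energy, which is the redundancy reduction that the grouping is meant to enable.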
It should be noted that the steps described in the embodiments of the present disclosure are performed by a downmixing module or in another manner, which is not limited in the embodiments of the present disclosure.
Step S2103: the encoder encodes the downmixed audio signals to obtain an audio stream.
In some embodiments, the encoding includes bit allocation, quantization, entropy coding, and bitstream multiplexing.
The execution of the above steps is illustrated below by way of example.
For example, referring to FIG. 2B, the decision module is executed (step S2101) to determine which sum-difference coding method, or a combination of the two, is used. The channels include an L channel, an R channel, a C channel, an LS channel, and an RS channel, and the similarity between any two channels is shown in Table 1.
Table 1
The downmix threshold is 0.5. The decision module determines the sum-difference coding method to be used by means of some algorithm.
Here, 3CH M/S refers to the module in which three channels are grouped into a first channel group, and 2CH M/S refers to the module in which two channels are grouped into a second channel group.
1. Iterative maximum-similarity selection is used to pick the first channel pair, the L channel and the R channel (maximum similarity cox_L_R=0.85), and the second channel pair, the LS channel and the RS channel (maximum similarity among the remaining channels not yet downmixed, cox_LS_RS=0.66).
2. The similarity between every channel not yet downmixed (here, the C channel) and the two channels of the first channel pair and of the second channel pair is calculated.
The similarity between the C channel and the L channel, cox_C_L=0.74, is greater than the downmix threshold, and the similarity between the C channel and the R channel, cox_C_R=0.78, is greater than the downmix threshold.
3. The decision results of 3CH M/S and 2CH M/S are output. In this embodiment, the 3CH M/S decision result outputs the L channel, the R channel, and the C channel, and the 2CH M/S decision result outputs the LS channel and the RS channel.
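The three steps of this example can be sketched as follows. Only four similarity values are quoted in the text (cox_L_R, cox_LS_RS, cox_C_L, cox_C_R); every other entry in the table below is a placeholder invented so that the sketch runs, not a value from Table 1. The function names are hypothetical, and the rule that a leftover channel joins a pair only when both of its similarities exceed the threshold is one reading of the example rather than a stated requirement.

```python
from itertools import combinations


def greedy_pairs(channels, sim, threshold):
    """Repeatedly pick the most similar remaining pair while it exceeds the threshold."""
    remaining, pairs = list(channels), []
    while len(remaining) >= 2:
        a, b = max(combinations(remaining, 2), key=lambda p: sim[frozenset(p)])
        if sim[frozenset((a, b))] <= threshold:
            break
        pairs.append((a, b))
        remaining.remove(a)
        remaining.remove(b)
    return pairs, remaining


def decide(channels, sim, threshold):
    """Return (three_channel_groups, two_channel_groups)."""
    pairs, independent = greedy_pairs(channels, sim, threshold)
    triples = []
    for ch in independent:
        # A pair is eligible if the leftover channel is similar enough to both members.
        eligible = [p for p in pairs
                    if all(sim[frozenset((ch, x))] > threshold for x in p)]
        if eligible:
            best = max(eligible, key=lambda p: sum(sim[frozenset((ch, x))] for x in p))
            pairs.remove(best)
            triples.append(best + (ch,))
    return triples, pairs


sim = {frozenset(k): v for k, v in {
    ("L", "R"): 0.85, ("LS", "RS"): 0.66, ("C", "L"): 0.74, ("C", "R"): 0.78,  # from the text
    ("C", "LS"): 0.40, ("C", "RS"): 0.35, ("L", "LS"): 0.55,                   # placeholders
    ("L", "RS"): 0.30, ("R", "LS"): 0.32, ("R", "RS"): 0.52,                   # placeholders
}.items()}
three_ch, two_ch = decide(["L", "R", "C", "LS", "RS"], sim, threshold=0.5)
```

With these values the sketch reproduces the decision in the text: L, R, and C form the three-channel group, and LS and RS remain a two-channel group.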
Subsequently, the downmixing module is executed.
$M_{2ch}^{ms} I_{2ch}^{ms} = O_{2ch}^{ms}$
$M_{3ch}^{ms} I_{3ch}^{ms} = O_{3ch}^{ms}$
Here, the two-channel downmix matrix $M_{2ch}^{ms}$ is a 2x2 orthonormal matrix in which the first row is the sum vector and the second row is the difference vector. The three-channel downmix matrix $M_{3ch}^{ms}$ is a 3x3 orthonormal matrix in which the first row is the sum vector and the second and third rows are difference vectors.
$I_{2ch}^{ms}$ is a two-element column vector, and $I_{3ch}^{ms}$ is a three-element column vector. The data contained in these vectors is preprocessed audio data, expressed per sample point or per frequency bin.
Channel information of each channel group is generated.
For another example, referring to FIG. 2C, a two-channel pairing decision is performed to obtain two-channel pairing side information, including the number of pairs and the channel pair indices.
The channels include an L channel, an R channel, a C channel, an LS channel, and an RS channel, the similarity between any two channels is shown in Table 1, and the downmix threshold is 0.5.
The two-channel pairing decision uses iterative maximum-similarity selection to pick the first channel pair, the L channel and the R channel (maximum similarity cox_L_R=0.85), and the second channel pair, the LS channel and the RS channel (maximum similarity among the remaining channels not yet downmixed, cox_LS_RS=0.66).
The number of second channel groups (two-channel groups) finally obtained is 2: the first pair, consisting of the L channel and the R channel, and the second pair, consisting of the LS channel and the RS channel.
Next, a three-channel pairing decision is performed. If a three-channel group is produced, the above channel grouping result is revised.
First, the similarity between every channel not yet downmixed (here, the C channel) and the two channels of the first channel pair and of the second channel pair is calculated.
The similarity between the C channel and the L channel, cox_C_L=0.74, is greater than the downmix threshold, and the similarity between the C channel and the R channel, cox_C_R=0.78, is greater than the downmix threshold.
The decision results of 3CH M/S and 2CH M/S are output. In this embodiment, the 3CH M/S decision result outputs the L channel, the R channel, and the C channel. After revision, the 2CH M/S decision result outputs one channel group, namely the LS channel and the RS channel.
Channel information for the two-channel and three-channel groups is generated.
The two-channel and three-channel downmixing modules are executed.
$M_{2ch}^{ms} I_{2ch}^{ms} = O_{2ch}^{ms}$
$M_{3ch}^{ms} I_{3ch}^{ms} = O_{3ch}^{ms}$
Here, the two-channel downmix matrix $M_{2ch}^{ms}$ is a 2x2 orthonormal matrix in which the first row is the sum vector and the second row is the difference vector. The three-channel downmix matrix $M_{3ch}^{ms}$ is a 3x3 orthonormal matrix in which the first row is the sum vector and the second and third rows are difference vectors.
The above channel information is rewritten to obtain new channel information.
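The rewriting of the channel information in this second scheme can be sketched as follows. The dictionary keys and the function name `rewrite_channel_info` are hypothetical, and the energy parameters are omitted for brevity. The sketch only shows how the pair count and pair indices produced by the two-channel decision might be updated once one pair is promoted to a three-channel group.

```python
def rewrite_channel_info(info, pair_index, independent_channel):
    """Return new channel information after one pair absorbs an independent channel.

    info: dict with "pair_channel_ids" (list of 2-tuples) and, optionally,
          "triple_channel_ids" (list of 3-tuples); the input is left unmodified.
    pair_index: position of the pair that is promoted to a three-channel group.
    independent_channel: identifier of the channel that joins the pair.
    """
    pairs = list(info["pair_channel_ids"])
    promoted = pairs.pop(pair_index)
    triples = list(info.get("triple_channel_ids", []))
    triples.append(promoted + (independent_channel,))
    return {
        "pair_count": len(pairs),
        "pair_channel_ids": pairs,
        "triple_count": len(triples),
        "triple_channel_ids": triples,
    }


# Side information after the two-channel decision in the example above.
before = {
    "pair_count": 2,
    "pair_channel_ids": [("L", "R"), ("LS", "RS")],
    "triple_count": 0,
    "triple_channel_ids": [],
}
after = rewrite_channel_info(before, pair_index=0, independent_channel="C")
```

In the example, the two-channel group count drops from 2 to 1 and one three-channel group (L, R, C) is recorded.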
In some embodiments, the encoder obtains the audio stream by performing the above steps. Referring to FIG. 2D, after the channel signals are obtained, they are preprocessed; inter-channel pairing and downmixing is then performed on the preprocessed channel signals, followed by bit allocation, quantization, and entropy coding, and finally bitstream multiplexing, to obtain the audio stream.
The preprocessing includes transient detection and window-type decision, time-frequency transform, frequency-domain noise shaping, time-domain noise shaping, band extension coding, and other processes. The inter-channel pairing and downmixing includes performing three-channel group downmixing on the L channel, the R channel, and the C channel to obtain the M1 channel, the S11 channel, and the S12 channel, and performing two-channel group downmixing on the LS channel and the RS channel to obtain the M2 channel and the S2 channel. The LFE channel is not processed.
Step S2104: the encoder sends the first information and the audio stream.
In some embodiments, the decoder receives the first information and the audio stream.
In some embodiments, the encoder may send the first information and the audio stream separately. For example, the encoder sends the first information first and then sends the audio stream, or the encoder sends the audio stream first and then sends the first information. In some embodiments, the encoder may send the first information and the audio stream simultaneously.
In some embodiments, the channel information includes at least one of the following:
the number of channel groups including two channels;
the channel identifiers of the channels included in a channel group including two channels;
the energy parameter of a channel group including two channels;
the number of channel groups including three channels;
the channel identifiers of the channels included in a channel group including three channels;
the energy parameter of a channel group including three channels;
where the energy parameter is used to adjust the energy of the channels in the channel group.
Step S2105: the decoder decodes the first information to obtain channel information of at least one channel group.
Step S2106: when the decoder determines that the channel groups do not include a channel group of three channels, it performs two-channel upmixing on the audio stream; when the decoder determines that the channel groups include a channel group of three channels, it performs three-channel upmixing on the audio stream.
In some embodiments, the decoder determines whether the encoder has performed two-channel downmixing and three-channel downmixing. If the encoder has performed two-channel downmixing, the decoder performs two-channel upmixing. If the encoder has performed three-channel downmixing, the decoder performs three-channel upmixing.
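The decoder-side upmix can be sketched under the same kind of assumption as on the encoder side. Because an orthonormal matrix has its transpose as its inverse, applying the transposed downmix matrix to the downmixed vector recovers the original channels. The matrix entries and the names `M2`, `M3`, and `upmix_group` are assumptions for illustration, since the matrices used in the disclosure are not reproduced in this text.

```python
import math

# Assumed orthonormal downmix matrices (first row: sum vector, other rows: differences).
M2 = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
      [1 / math.sqrt(2), -1 / math.sqrt(2)]]
M3 = [[1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3)],
      [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
      [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)]]


def matvec(matrix, vector):
    """Multiply a matrix by a column vector given as a list."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose, which is the inverse of an orthonormal matrix."""
    return [list(col) for col in zip(*matrix)]


def upmix_group(downmixed):
    """Choose two-channel or three-channel upmixing from the group size."""
    if len(downmixed) == 2:
        return matvec(transpose(M2), downmixed)
    if len(downmixed) == 3:
        return matvec(transpose(M3), downmixed)
    raise ValueError("only two-channel and three-channel groups are supported")


# Round trip for one sample of a three-channel group (L, R, C) and a pair (LS, RS).
lrc = [0.3, -0.2, 0.5]
ls_rs = [0.1, 0.4]
lrc_restored = upmix_group(matvec(M3, lrc))        # M1, S11, S12 -> L, R, C
ls_rs_restored = upmix_group(matvec(M2, ls_rs))    # M2, S2 -> LS, RS
```

In a real decoder the group sizes would come from the decoded channel information rather than from the vector length, and quantization would make the reconstruction approximate rather than exact.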
Optionally, audio post-processing includes, but is not limited to, general decoding processes, such as inverse time-frequency transform, inverse time-domain noise shaping, inverse frequency-domain noise shaping, and inverse band extension modules, and also includes decoding processing for particular signal characteristics, such as multichannel decoding, HOA channel decoding, and object metadata decoding.
In some embodiments, the decoder decodes each channel signal by performing the above steps. Referring to FIG. 2E, after the audio stream is obtained, it is demultiplexed, and then entropy decoding with bit allocation and inverse quantization, inter-channel upmixing, and post-processing are performed to obtain the decoded channel signals.
The post-processing includes band extension decoding, inverse time-domain noise shaping, inverse frequency-domain noise shaping, inverse time-frequency transform, and other processes. The inter-channel pairing and upmixing includes performing three-channel group upmixing on the M1 channel, the S11 channel, and the S12 channel to obtain the L channel, the R channel, and the C channel, and performing two-channel group upmixing on the M2 channel and the S2 channel to obtain the LS channel and the RS channel. The LFE channel is not processed.
In some embodiments, the names of information and the like are not limited to the names recorded in the embodiments, and terms such as "information", "message", "signal", "signaling", "report", "configuration", "indication", "instruction", "command", "channel", "parameter", "domain", "field", "symbol", "code element (symbol)", "codebook", "codeword", "codepoint", "bit", "data", "program", and "chip" may be used interchangeably.
In some embodiments, terms such as "uplink", "uplink link", and "physical uplink" may be used interchangeably; terms such as "downlink", "downlink link", and "physical downlink" may be used interchangeably; and terms such as "side", "sidelink", "side communication", "sidelink communication", "direct connection", "direct link", "direct communication", and "direct link communication" may be used interchangeably.
In some embodiments, "obtain", "acquire", "get", "receive", "transmit", "bidirectional transmission", and "send and/or receive" may be used interchangeably, and may be interpreted as receiving from another entity, obtaining from a protocol, obtaining from a higher layer, obtaining through own processing, autonomous implementation, and the like.
In some embodiments, terms such as "send", "emit", "report", "deliver", "transmit", "bidirectional transmission", and "send and/or receive" may be used interchangeably.
In some embodiments, terms such as "moment", "time point", "time", and "time position" may be used interchangeably, and terms such as "duration", "period", "time window", "window", and "time" may be used interchangeably.
In some embodiments, terms such as "specific (certain)", "predetermined (preset)", "preset", "set", "indicated", "a certain", "any", and "first" may be used interchangeably. "Specific A", "predetermined A", "preset A", "set A", "indicated A", "a certain A", "any A", and "first A" may be interpreted as A specified in advance in a protocol or the like, as A obtained through setting, configuration, or indication, or as a specific A, a certain A, any A, or a first A, but are not limited thereto.
The grouping method involved in the embodiments of the present disclosure may include at least one of steps S2101 to S2106. For example, step S2101 may be implemented as an independent embodiment; step S2102 may be implemented as an independent embodiment; step S2103 may be implemented as an independent embodiment; step S2104 may be implemented as an independent embodiment; step S2105 may be implemented as an independent embodiment; step S2106 may be implemented as an independent embodiment; steps S2101 and S2102 may be implemented as an independent embodiment; steps S2103 and S2104 may be implemented as an independent embodiment; steps S2105 and S2106 may be implemented as an independent embodiment; steps S2101, S2102, S2103, and S2104 may be implemented as an independent embodiment; steps S2101, S2102, S2105, and S2106 may be implemented as an independent embodiment; and steps S2103, S2104, S2105, and S2106 may be implemented as an independent embodiment, but the method is not limited thereto.
In some embodiments, step S2101 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, step S2102 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, step S2103 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, step S2104 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, step S2105 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, step S2106 is optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, steps S2101 and S2102 are optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, steps S2103 and S2104 are optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, steps S2105 and S2106 are optional, and one or more of these steps may be omitted or replaced in different embodiments.
In some embodiments, reference may be made to other optional implementations described before or after the description corresponding to FIG. 2.
图3A是根据本公开实施例示出的分组方法的流程示意图,应用于编码器。如图3A所示,本公开实施例涉及分组方法,上述方法包括:FIG3A is a flow chart of a grouping method according to an embodiment of the present disclosure, which is applied to an encoder. As shown in FIG3A , an embodiment of the present disclosure relates to a grouping method, which includes:
步骤S3101,编码器对多个声道进行分组,得到至少一个声道分组。Step S3101: The encoder groups multiple channels to obtain at least one channel group.
步骤S3101的可选实现方式可以参见图2的步骤S2101的可选实现方式、及图2所涉及的实施例中其他关联部分,此处不再赘述。The optional implementation of step S3101 can refer to the optional implementation of step S2101 in FIG. 2 and other related parts in the embodiment involved in FIG. 2 , which will not be described in detail here.
步骤S3102,编码器对每个声道分组中的音频信号进行下混,得到下混后的音频信号。Step S3102: the encoder downmixes the audio signal in each channel group to obtain a downmixed audio signal.
步骤S3102的可选实现方式可以参见图2的步骤S2102的可选实现方式、及图2所涉及的实施例中其他关联部分,此处不再赘述。The optional implementation of step S3102 can refer to the optional implementation of step S2102 in FIG. 2 and other related parts in the embodiment involved in FIG. 2 , which will not be described in detail here.
步骤S3103,编码器对下混后的音频信号进行编码,得到音频流。Step S3103: the encoder encodes the downmixed audio signal to obtain an audio stream.
步骤S3103的可选实现方式可以参见图2的步骤S2103的可选实现方式、及图2所涉及的实施例中其他关联部分,此处不再赘述。 The optional implementation of step S3103 can refer to the optional implementation of step S2103 in FIG. 2 and other related parts in the embodiment involved in FIG. 2 , which will not be described in detail here.
步骤S3104,编码器发送第一信息和音频流。Step S3104: the encoder sends the first information and the audio stream.
步骤S3103的可选实现方式可以参见图2的步骤S2104的可选实现方式、及图2所涉及的实施例中其他关联部分,此处不再赘述。The optional implementation of step S3103 can refer to the optional implementation of step S2104 in FIG. 2 and other related parts in the embodiment involved in FIG. 2 , which will not be described in detail here.
本公开实施例所涉及的分组方法可以包括步骤S3101~步骤S3104中的至少一者。例如,步骤S3101可以作为独立实施例来实施,步骤S3102可以作为独立实施例来实施,步骤S3103可以作为独立实施例来实施,步骤S3104可以作为独立实施例来实施,或者也可以至少两个步骤结合,但不限于此。The grouping method involved in the embodiment of the present disclosure may include at least one of steps S3101 to S3104. For example, step S3101 may be implemented as an independent embodiment, step S3102 may be implemented as an independent embodiment, step S3103 may be implemented as an independent embodiment, step S3104 may be implemented as an independent embodiment, or at least two steps may be combined, but are not limited thereto.
In some embodiments, step S3101 is optional, step S3102 is optional, step S3103 is optional, and step S3104 is optional. In different embodiments, one or more of these steps may be omitted or replaced, but the present disclosure is not limited thereto.
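The encoder-side flow of steps S3101 to S3104 can be sketched as a simple pipeline. This is only an illustrative sketch: the function name `run_encoder`, the four callables, and the layout of the first information as a dictionary holding the groups are assumptions made for the example, not details given in the disclosure.

```python
def run_encoder(channels, group_fn, downmix_fn, encode_fn, send_fn):
    """Hypothetical encoder pipeline mirroring steps S3101 to S3104.

    channels: dict mapping a channel name to a list of samples.
    The four callables stand in for the real modules (assumed interfaces).
    """
    # S3101: group the channels into at least one channel group.
    groups = group_fn(list(channels))
    # S3102: downmix the audio signals within each channel group.
    downmixed = [downmix_fn([channels[name] for name in group]) for group in groups]
    # S3103: encode the downmixed signals into an audio stream.
    stream = encode_fn(downmixed)
    # S3104: send the first information (assumed here to carry the grouping)
    # together with the audio stream.
    first_info = {"groups": groups}
    send_fn(first_info, stream)
    return first_info, stream
```

Because each step is passed in as a callable, any single step can be swapped or omitted in a test harness, which matches the statement that the steps may be implemented independently or in combination.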
图3B是根据本公开实施例示出的分组方法的流程示意图,应用于编码器。如图3B所示,本公开实施例涉及分组方法,上述方法包括:FIG3B is a flow chart of a grouping method according to an embodiment of the present disclosure, which is applied to an encoder. As shown in FIG3B , an embodiment of the present disclosure relates to a grouping method, which includes:
步骤S3201,编码器对多个声道进行分组,得到至少一个声道分组。Step S3201: The encoder groups multiple channels to obtain at least one channel group.
步骤S3201的可选实现方式可以参见图2的步骤S2101、图3A的步骤S3101及图2、图3A所涉及的实施例中其他关联部分,此处不再赘述。The optional implementation of step S3201 can refer to step S2101 of FIG. 2 , step S3101 of FIG. 3A , and other related parts of the embodiments involved in FIG. 2 and FIG. 3A , which will not be described in detail here.
图4A是根据本公开实施例示出的分组方法的流程示意图,应用于解码器,如图4A所示,本公开实施例涉及分组方法,上述方法包括:FIG4A is a flow chart of a grouping method according to an embodiment of the present disclosure, which is applied to a decoder. As shown in FIG4A , an embodiment of the present disclosure relates to a grouping method, and the method includes:
步骤S4101,解码器对第一信息进行解码,得到至少一个声道分组的声道信息。Step S4101: The decoder decodes the first information to obtain channel information of at least one channel group.
步骤S4101的可选实现方式可以参见图2的步骤S2105及图2所涉及的实施例中其他关联部分,此处不再赘述。The optional implementation of step S4101 can refer to step S2105 of FIG. 2 and other related parts of the embodiment involved in FIG. 2 , which will not be described in detail here.
步骤S4102,解码器在确定声道分组中不包括三个声道的声道分组时,对音频流进行两声道上混,在确定声道分组包括三个声道的声道分组时,对音频流进行三声道上混。Step S4102: When the decoder determines that the channel grouping does not include a channel grouping with three channels, it performs two-channel upmixing on the audio stream; when the decoder determines that the channel grouping includes a channel grouping with three channels, it performs three-channel upmixing on the audio stream.
步骤S4102的可选实现方式可以参见图2的步骤S2106及图2所涉及的实施例中其他关联部分,此处不再赘述。The optional implementation of step S4102 can refer to step S2106 of FIG. 2 and other related parts of the embodiment involved in FIG. 2 , which will not be described in detail here.
本公开实施例所涉及的分组方法可以包括步骤S4101~步骤S4102中的至少一者。例如,步骤S4101可以作为独立实施例来实施,步骤S4102可以作为独立实施例来实施,或者也可以至少两个步骤结合,但不限于此。The grouping method involved in the embodiment of the present disclosure may include at least one of step S4101 to step S4102. For example, step S4101 may be implemented as an independent embodiment, step S4102 may be implemented as an independent embodiment, or at least two steps may be combined, but not limited thereto.
在一些实施例中,步骤S4101是可选的,步骤S4102是可选的,在不同实施例中可以对这些步骤中的一个或多个步骤进行省略或替代。但不限于此。In some embodiments, step S4101 is optional, step S4102 is optional, and one or more of these steps may be omitted or replaced in different embodiments, but the present invention is not limited thereto.
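The upmix selection in step S4102 can be sketched as follows. This is a minimal sketch assuming the decoded channel information has already been reduced to a list of group sizes; the function name `select_upmix` and the returned labels are hypothetical.

```python
def select_upmix(group_sizes):
    """Pick the upmix mode from decoded group sizes (hypothetical helper).

    Returns "three-channel" if any channel group contains three channels,
    otherwise "two-channel", following the rule described for step S4102.
    """
    if any(size == 3 for size in group_sizes):
        return "three-channel"
    return "two-channel"
```

For example, `select_upmix([3, 2])` selects three-channel upmixing, while `select_upmix([2, 2])` selects two-channel upmixing.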
图4B是根据本公开实施例示出的分组方法的流程示意图,应用于解码器,如图4B所示,本公开实施例涉及分组方法,上述方法包括:FIG4B is a flow chart of a grouping method according to an embodiment of the present disclosure, which is applied to a decoder. As shown in FIG4B , an embodiment of the present disclosure relates to a grouping method, and the method includes:
步骤S4201,解码器对第一信息进行解码,得到至少一个声道分组的声道信息。Step S4201: The decoder decodes the first information to obtain channel information of at least one channel group.
步骤S4201的可选实现方式可以参见图2的步骤S2105及图2所涉及的实施例中其他关联部分,此处不再赘述。 The optional implementation of step S4201 can refer to step S2105 of FIG. 2 and other related parts of the embodiment involved in FIG. 2 , which will not be described in detail here.
在一些实施例中,所述至少一个声道分组中存在N个第一声道分组,所述第一声道分组包括三个声道,所述至少一个声道分组中存在M个第二声道分组,所述第二声道分组包括两个声道,其中,N为1,M为非负整数。In some embodiments, there are N first channel groups in the at least one channel group, the first channel group includes three channels, and there are M second channel groups in the at least one channel group, the second channel group includes two channels, wherein N is 1 and M is a non-negative integer.
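The constraint N = 1 with M a non-negative integer means that the grouped channels number 3 + 2M. A minimal sketch of such a split is shown below. It assumes, purely for illustration, that the channels are grouped in list order; the disclosure's own grouping relies on inter-channel correlation, and the function name `group_channels` is hypothetical.

```python
def group_channels(names):
    """Split channel names into one 3-channel group and M 2-channel groups.

    Order-based grouping is an assumption for illustration only.
    Raises ValueError when the count cannot be written as 3 + 2*M.
    """
    count = len(names)
    if count < 3 or (count - 3) % 2 != 0:
        raise ValueError("channel count must equal 3 + 2*M with M >= 0")
    first_group = tuple(names[:3])  # the single first channel group (N = 1)
    second_groups = [tuple(names[i:i + 2]) for i in range(3, count, 2)]
    return first_group, second_groups
```

With five channels the result is one three-channel group and one two-channel group (M = 1); with exactly three channels, M = 0.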
在一些实施例中,所述声道信息包括以下至少之一:In some embodiments, the channel information includes at least one of the following:
包括两个声道的声道分组的个数;The number of channel groups including two channels;
包括两个声道的声道分组中包括的声道的声道标识;channel identifiers of channels included in a channel group including two channels;
包括两个声道的声道分组的能量参数;An energy parameter of a channel grouping comprising two channels;
包括三个声道的声道分组的个数;The number of channel groups including three channels;
包括三个声道的声道分组中包括的声道的声道标识;channel identifiers of channels included in a channel group including three channels;
包括三个声道的声道分组的能量参数;An energy parameter of a channel grouping comprising three channels;
其中,所述能量参数用于对声道分组中声道的能量调整。The energy parameter is used to adjust the energy of the channels in the channel group.
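The channel information listed above can be pictured as a small record. The sketch below is an assumption about one possible in-memory layout: the class name, the field names, and the use of one energy value per group are hypothetical, and every field is optional because the channel information includes "at least one of" the items.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ChannelInfo:
    """Hypothetical container for the decoded channel information."""
    pair_count: Optional[int] = None  # number of two-channel groups
    pair_ids: List[Tuple[int, int]] = field(default_factory=list)  # channel identifiers
    pair_energy: List[float] = field(default_factory=list)  # energy parameters
    triple_count: Optional[int] = None  # number of three-channel groups
    triple_ids: List[Tuple[int, int, int]] = field(default_factory=list)
    triple_energy: List[float] = field(default_factory=list)

    def has_three_channel_group(self) -> bool:
        # True when either the count or the identifiers indicate a 3-channel group.
        return bool(self.triple_count) or bool(self.triple_ids)
```

A decoder could use `has_three_channel_group()` to choose between two-channel and three-channel upmixing.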
在一些实施例中,所述方法还包括:In some embodiments, the method further comprises:
在确定所述声道分组中不包括三个声道的声道分组时,对音频流进行两声道上混。When it is determined that the channel groups do not include a channel grouping for three channels, two-channel upmixing is performed on the audio stream.
在一些实施例中,所述方法还包括:In some embodiments, the method further comprises:
在确定所述声道分组中包括三个声道的声道分组时,对音频流进行三声道上混。When it is determined that the channel groups include a channel grouping with three channels, three-channel upmixing is performed on the audio stream.
图5是根据本公开实施例示出的分组方法的流程示意图,如图5所示,本公开实施例涉及分组方法,上述方法包括:FIG5 is a flow chart of a grouping method according to an embodiment of the present disclosure. As shown in FIG5 , the embodiment of the present disclosure relates to a grouping method, and the method includes:
步骤S5101:编码器对多个声道进行分组,得到至少一个声道分组。Step S5101: the encoder groups multiple channels to obtain at least one channel group.
在一些实施例中,所述至少一个声道分组中存在N个第一声道分组,所述第一声道分组包括三个声道,所述至少一个声道分组中存在M个第二声道分组,所述第二声道分组包括两个声道,其中,N为1,M为非负整数。In some embodiments, there are N first channel groups in the at least one channel group, the first channel group includes three channels, and there are M second channel groups in the at least one channel group, the second channel group includes two channels, wherein N is 1 and M is a non-negative integer.
步骤S5102:解码器对第一信息进行解码,得到至少一个声道分组的声道信息。Step S5102: The decoder decodes the first information to obtain channel information of at least one channel group.
The optional implementation of step S5101 can refer to step S2101 of FIG. 2, step S3101 of FIG. 3A, and other related parts of the embodiments involved in FIG. 2 and FIG. 3A, which will not be described in detail here.
The optional implementation of step S5102 can refer to step S2105 of FIG. 2, step S4101 of FIG. 4A, and other related parts of the embodiments involved in FIG. 2 and FIG. 4A, which will not be described in detail here.
在一些实施例中,上述方法可以包括上述编解码系统侧、编码器侧、解码器侧等的实施例的方法,此处不再赘述。In some embodiments, the above method may include the method of the above-mentioned embodiments of the coding and decoding system side, encoder side, decoder side, etc., which will not be repeated here.
图6是根据本公开实施例示出的分组方法的流程示意图,如图6所示,本公开实施例涉及分组方法,上述方法包括:FIG6 is a flow chart of a grouping method according to an embodiment of the present disclosure. As shown in FIG6 , the embodiment of the present disclosure relates to a grouping method, and the method includes:
步骤S6101,编码端采用两声道和差编码和三声道和差编码组合的方法对多个声道进行编码。Step S6101: the encoding end encodes multiple channels by combining two-channel sum and difference encoding and three-channel sum and difference encoding.
After completing audio preprocessing, the encoding end enters the inter-channel pairing downmix module. Audio preprocessing includes, but is not limited to, general encoding modules such as transient analysis, time-frequency transformation, time-domain noise shaping, frequency-domain noise shaping, and frequency band extension; it also includes processing targeted at particular signal characteristics, such as multi-channel encoding, HOA channel encoding, and object metadata encoding.
The inter-channel pairing downmix module introduces a three-channel sum-difference coding method and uses it in combination with the two-channel sum-difference coding framework. The module includes a decision module and a downmix module.
The decision module determines which sum-difference coding method, or combination of methods, to use. The decision criterion is the inter-channel correlation, which is compared against a pairing threshold. In this example, the decision result is that the L, R, and C channels undergo three-channel sum-difference coding, the LS and RS channels undergo two-channel sum-difference coding, and the LFE channel is left unprocessed.
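The comparison of inter-channel correlation against a pairing threshold can be sketched as follows. This is an illustrative sketch only: the use of a normalized cross-correlation at zero lag, the function names, and the default threshold of 0.5 are assumptions, since the disclosure does not specify the correlation measure or the threshold value here.

```python
import math


def normalized_correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # A silent channel has zero energy; treat it as uncorrelated.
    return numerator / denominator if denominator else 0.0


def should_pair(x, y, threshold=0.5):
    """Hypothetical decision: pair channels whose |correlation| reaches the threshold."""
    return abs(normalized_correlation(x, y)) >= threshold
```

Channels that fail the test against every other channel, such as LFE in the example above, would be left ungrouped.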
The downmix module performs three-channel sum-difference coding on the L, R, and C channels and two-channel sum-difference coding on the LS and RS channels.
After the inter-channel pairing downmix module, the channels produced by the three-channel sum-difference downmix (M1, S11, and S12), the channels produced by the two-channel sum-difference downmix (M2 and S2), and the channel that was not downmixed (LFE) all pass through the bit allocation, quantization, and entropy coding module, and are then multiplexed into the coded bitstream E.
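To make the idea of sum-difference downmixing concrete, the sketch below shows one invertible two-channel transform and one invertible three-channel transform producing a mid channel and side channels. The specific formulas are assumptions chosen for illustration, not the transforms defined by the disclosure; the function names are likewise hypothetical.

```python
def downmix2(a, b):
    """Two-channel sum-difference: M = (a + b) / 2, S = (a - b) / 2."""
    m = [(x + y) / 2 for x, y in zip(a, b)]
    s = [(x - y) / 2 for x, y in zip(a, b)]
    return m, s


def upmix2(m, s):
    """Inverse of downmix2: a = M + S, b = M - S."""
    return [x + y for x, y in zip(m, s)], [x - y for x, y in zip(m, s)]


def downmix3(a, b, c):
    """Assumed three-channel transform: one mid channel and two side channels."""
    m = [(x + y + z) / 3 for x, y, z in zip(a, b, c)]
    s1 = [(x - y) / 2 for x, y in zip(a, b)]          # difference of a and b
    s2 = [(x + y) / 2 - z for x, y, z in zip(a, b, c)]  # a/b average minus c
    return m, s1, s2


def upmix3(m, s1, s2):
    """Inverse of downmix3, derived by solving the three equations above."""
    a = [mm + t / 3 + s for mm, s, t in zip(m, s1, s2)]
    b = [mm + t / 3 - s for mm, s, t in zip(m, s1, s2)]
    c = [mm - 2 * t / 3 for mm, t in zip(m, s2)]
    return a, b, c
```

In this sketch the outputs of `downmix3` play the role of M1, S11, and S12, and the outputs of `downmix2` play the role of M2 and S2; both transforms are lossless before quantization.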
在本公开实施例中,部分或全部步骤、其可选实现方式可以与其他实施例中的部分或全部步骤任意组合,也可以与其他实施例的可选实现方式任意组合。In the embodiments of the present disclosure, part or all of the steps and their optional implementations may be arbitrarily combined with part or all of the steps in other embodiments, or may be arbitrarily combined with optional implementations of other embodiments.
本公开实施例还提出用于实现以上任一方法的装置,例如,提出一装置,上述装置包括用以实现以上任一方法中编码器所执行的各步骤的单元或模块。再如,还提出另一装置,包括用以实现以上任一方法中解码器所执行的各步骤的单元或模块。The embodiments of the present disclosure also propose a device for implementing any of the above methods, for example, a device is proposed, the above device includes a unit or module for implementing each step performed by the encoder in any of the above methods. For another example, another device is also proposed, including a unit or module for implementing each step performed by the decoder in any of the above methods.
It should be understood that the division of the units or modules in the above device is only a division of logical functions; in actual implementation they may be fully or partially integrated into one physical entity, or may be physically separate. In addition, the units or modules in the device may be implemented in the form of a processor calling software: for example, the device includes a processor connected to a memory in which instructions are stored, and the processor calls the instructions stored in the memory to implement any of the above methods or the functions of the units or modules of the above device, where the processor is, for example, a general-purpose processor such as a central processing unit (CPU) or a microprocessor, and the memory is a memory inside the device or a memory outside the device. Alternatively, the units or modules in the device may be implemented in the form of hardware circuits, and the functions of some or all of the units or modules may be implemented through the design of the hardware circuits, which may be understood as one or more processors. For example, in one implementation, the hardware circuit is an application-specific integrated circuit (ASIC), and the functions of some or all of the above units or modules are implemented through the design of the logical relationships among the components in the circuit. As another example, in another implementation, the hardware circuit may be implemented by a programmable logic device (PLD); taking a field programmable gate array (FPGA) as an example, it may include a large number of logic gate circuits, and the connection relationships among the logic gate circuits are configured through a configuration file, thereby implementing the functions of some or all of the above units or modules. All units or modules of the above device may be implemented entirely in the form of a processor calling software, or entirely in the form of hardware circuits, or partly in the form of a processor calling software with the remainder in the form of hardware circuits.
In the embodiments of the present disclosure, a processor is a circuit with signal processing capability. In one implementation, the processor may be a circuit capable of reading and executing instructions, such as a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU) (which may be understood as a kind of microprocessor), or a digital signal processor (DSP). In another implementation, the processor may implement certain functions through the logical relationships of a hardware circuit, where those logical relationships are fixed or reconfigurable; for example, the processor is a hardware circuit implemented as an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), such as an FPGA. In a reconfigurable hardware circuit, the process in which the processor loads a configuration document to configure the hardware circuit may be understood as a process in which the processor loads instructions to implement the functions of some or all of the above units or modules. In addition, the processor may be a hardware circuit designed for artificial intelligence, which may be understood as an ASIC, such as a neural network processing unit (NPU), a tensor processing unit (TPU), or a deep learning processing unit (DPU).
图7A是本公开实施例提出的编解码装置的结构示意图。如图7A所示,编解码装置7100可以包括:收发模块7101、处理模块7102等中的至少一者。在一些实施例中,处理模块7102用于对多个声道进行分组,得到至少一个声道分组,其中,所述至少一个声道分组中存在N个第一声道分组,所述第一声道分组包括三个声道,所述至少一个声道分组中存在M个第二声道分组,所述第二声道分组包括两个声道,其中,N为1,M为非负整数。可选地,上述收发模块7101用于执行以上任一方法中编码器执行的发送和/或接收等通信步骤中的至少一者(例如步骤S2101但不限于此),此处不再赘述。可选地,上述处理模块用于执行以上任一方法中编码器执行的其他步骤中的至少一者,此处不再赘述。FIG7A is a schematic diagram of the structure of the encoding and decoding device proposed in an embodiment of the present disclosure. As shown in FIG7A, the encoding and decoding device 7100 may include: at least one of a transceiver module 7101, a processing module 7102, etc. In some embodiments, the processing module 7102 is used to group multiple channels to obtain at least one channel group, wherein there are N first channel groups in the at least one channel group, and the first channel group includes three channels, and there are M second channel groups in the at least one channel group, and the second channel group includes two channels, wherein N is 1 and M is a non-negative integer. Optionally, the above-mentioned transceiver module 7101 is used to perform at least one of the communication steps such as sending and/or receiving performed by the encoder in any of the above methods (for example, step S2101 but not limited thereto), which will not be repeated here. Optionally, the above-mentioned processing module is used to perform at least one of the other steps performed by the encoder in any of the above methods, which will not be repeated here.
可选地,处理模块7102用于执行以上任一方法中编码器执行的处理等通信步骤中的至少一者,此处不再赘述。Optionally, the processing module 7102 is used to execute at least one of the communication steps such as processing performed by the encoder in any of the above methods, which will not be repeated here.
图7B是本公开实施例提出的编解码装置的结构示意图。如图7B所示,编解码装置7200可以包括:收发模块7201、处理模块7202等中的至少一者。在一些实施例中,处理模块7202用于对第一信息进行解码,得到至少一个声道分组的声道信息,所述至少一个声道分组中存在N个第一声道分组,所述第一声道分组包括三个声道,所述至少一个声道分组中存在M个第二声道分组,所述第二声道分组包括两个声道,其中,N为1,M为非负整数。可选地,上述收发模块用于执行以上任一方法中解码器执行的发送和/或接收等通信步骤(例如步骤S2102但不限于此)中的至少一者,此处不再赘述。FIG7B is a schematic diagram of the structure of the coding and decoding device proposed in the embodiment of the present disclosure. As shown in FIG7B , the coding and decoding device 7200 may include: at least one of a transceiver module 7201 and a processing module 7202. In some embodiments, the processing module 7202 is used to decode the first information to obtain channel information of at least one channel grouping, wherein there are N first channel groups in the at least one channel grouping, and the first channel grouping includes three channels, and there are M second channel groups in the at least one channel grouping, and the second channel grouping includes two channels, wherein N is 1 and M is a non-negative integer. Optionally, the above-mentioned transceiver module is used to execute at least one of the communication steps such as sending and/or receiving (such as step S2102 but not limited thereto) executed by the decoder in any of the above methods, which will not be repeated here.
可选地,处理模块7202用于执行以上任一方法中解码器执行的处理等通信步骤中的至少一者,此处不再赘述。Optionally, the processing module 7202 is used to execute at least one of the communication steps such as processing performed by the decoder in any of the above methods, which will not be repeated here.
在一些实施例中,收发模块可以包括发送模块和/或接收模块,发送模块和接收模块可以是分离的,也可以集成在一起。可选地,收发模块可以与收发器相互替换。In some embodiments, the transceiver module may include a sending module and/or a receiving module, and the sending module and the receiving module may be separate or integrated. Optionally, the transceiver module may be interchangeable with the transceiver.
在一些实施例中,处理模块可以是一个模块,也可以包括多个子模块。可选地,上述多个子模块分别执行处理模块所需执行的全部或部分步骤。可选地,处理模块可以与处理器相互替换。In some embodiments, the processing module can be a module or include multiple submodules. Optionally, the multiple submodules respectively execute all or part of the steps required to be executed by the processing module. Optionally, the processing module can be replaced with the processor.
图8A是本公开实施例提出的通信设备8100的结构示意图。通信设备8100可以是解码器(例如接入网设备、核心网设备等),也可以是编码器(例如用户设备等),也可以是支持解码器实现以上任一方法的芯片、芯片系统、或处理器等,还可以是支持编码器实现以上任一方法的芯片、芯片系统、或处理器等。通信设备8100可用于实现上述方法实施例中描述的方法,具体可以参见上述方法实施例中的说明。FIG8A is a schematic diagram of the structure of a communication device 8100 proposed in an embodiment of the present disclosure. The communication device 8100 may be a decoder (e.g., an access network device, a core network device, etc.), or an encoder (e.g., a user device, etc.), or a chip, a chip system, or a processor that supports a decoder to implement any of the above methods, or a chip, a chip system, or a processor that supports an encoder to implement any of the above methods. The communication device 8100 may be used to implement the method described in the above method embodiment, and the details may refer to the description in the above method embodiment.
As shown in FIG. 8A, the communication device 8100 includes one or more processors 8101. The processor 8101 may be a general-purpose processor or a dedicated processor, for example a baseband processor or a central processing unit. The baseband processor may be used to process communication protocols and communication data, and the central processing unit may be used to control the encoding and decoding device (e.g., a base station, a baseband chip, an encoder device, an encoder device chip, a DU, or a CU), execute programs, and process program data. The communication device 8100 is used to execute any of the above methods.
在一些实施例中,通信设备8100还包括用于存储指令的一个或多个存储器8102。可选地,全部或部分存储器8102也可以处于通信设备8100之外。In some embodiments, the communication device 8100 further includes one or more memories 8102 for storing instructions. Optionally, all or part of the memory 8102 may also be outside the communication device 8100.
在一些实施例中,通信设备8100还包括一个或多个收发器8103。在通信设备8100包括一个或多个收发器8103时,收发器8103执行上述方法中的发送和/或接收等通信步骤(例如步骤S2101、步骤S2102、步骤S2103、步骤S2104,但不限于此)中的至少一者。In some embodiments, the communication device 8100 further includes one or more transceivers 8103. When the communication device 8100 includes one or more transceivers 8103, the transceiver 8103 performs at least one of the communication steps such as sending and/or receiving in the above method (for example, step S2101, step S2102, step S2103, step S2104, but not limited thereto).
在一些实施例中,收发器可以包括接收器和/或发送器,接收器和发送器可以是分离的,也可以集成在一起。可选地,收发器、收发单元、收发机、收发电路等术语可以相互替换,发送器、发送单元、发送机、发送电路等术语可以相互替换,接收器、接收单元、接收机、接收电路等术语可以相互替换。In some embodiments, the transceiver may include a receiver and/or a transmitter, and the receiver and the transmitter may be separate or integrated. Optionally, the terms such as transceiver, transceiver unit, transceiver, transceiver circuit, etc. may be replaced with each other, the terms such as transmitter, transmission unit, transmitter, transmission circuit, etc. may be replaced with each other, and the terms such as receiver, receiving unit, receiver, receiving circuit, etc. may be replaced with each other.
在一些实施例中,通信设备8100可以包括一个或多个接口电路8104。可选地,接口电路8104与存储器8102连接,接口电路8104可用于从存储器8102或其他装置接收信号,可用于向存储器8102或其他装置发送信号。例如,接口电路8104可读取存储器8102中存储的指令,并将该指令发送给处理器8101。In some embodiments, the communication device 8100 may include one or more interface circuits 8104. Optionally, the interface circuit 8104 is connected to the memory 8102, and the interface circuit 8104 may be used to receive signals from the memory 8102 or other devices, and may be used to send signals to the memory 8102 or other devices. For example, the interface circuit 8104 may read instructions stored in the memory 8102 and send the instructions to the processor 8101.
The communication device 8100 described in the above embodiments may be a decoder or an encoder, but the scope of the communication device 8100 described in the present disclosure is not limited thereto, and the structure of the communication device 8100 is not limited by FIG. 8A. The communication device may be an independent device or may be part of a larger device. For example, the communication device may be: (1) an independent integrated circuit (IC), a chip, or a chip system or subsystem; (2) a collection of one or more ICs, where the IC collection may optionally also include a storage component for storing data and programs; (3) an ASIC, such as a modem; (4) a module that can be embedded in other devices; (5) a receiver, an encoder device, an intelligent encoder device, a cellular phone, a wireless device, a handheld device, a mobile unit, a vehicle-mounted device, a decoder, a cloud device, an artificial intelligence device, etc.; (6) others.
FIG. 8B is a schematic diagram of the structure of a chip 8200 proposed in an embodiment of the present disclosure. Where the communication device 8100 is a chip or a chip system, reference may be made to the schematic diagram of the structure of the chip 8200 shown in FIG. 8B, but the present disclosure is not limited thereto.
芯片8200包括一个或多个处理器8201,芯片8200用于执行以上任一方法。The chip 8200 includes one or more processors 8201, and the chip 8200 is used to execute any of the above methods.
在一些实施例中,芯片8200还包括一个或多个接口电路8202。可选地,接口电路8202与存储器8203连接,接口电路8202可以用于从存储器8203或其他装置接收信号,接口电路8202可用于向存储器8203或其他装置发送信号。例如,接口电路8202可读取存储器8203中存储的指令,并将该指令发送给处理器8201。In some embodiments, the chip 8200 further includes one or more interface circuits 8202. Optionally, the interface circuit 8202 is connected to the memory 8203. The interface circuit 8202 can be used to receive signals from the memory 8203 or other devices, and the interface circuit 8202 can be used to send signals to the memory 8203 or other devices. For example, the interface circuit 8202 can read instructions stored in the memory 8203 and send the instructions to the processor 8201.
在一些实施例中,接口电路8202执行上述方法中的发送和/或接收等通信步骤中的至少一者,处理器8201执行其他步骤中的至少一者。In some embodiments, the interface circuit 8202 performs at least one of the communication steps such as sending and/or receiving in the above method, and the processor 8201 performs at least one of the other steps.
在一些实施例中,接口电路、接口、收发管脚、收发器等术语可以相互替换。 In some embodiments, terms such as interface circuit, interface, transceiver pin, and transceiver may be used interchangeably.
在一些实施例中,芯片8200还包括用于存储指令的一个或多个存储器8203。可选地,全部或部分存储器8203可以处于芯片8200之外。In some embodiments, the chip 8200 further includes one or more memories 8203 for storing instructions. Optionally, all or part of the memory 8203 may be outside the chip 8200.
The present disclosure also proposes a storage medium on which instructions are stored; when the instructions are run on the communication device 8100, the communication device 8100 is caused to execute any of the above methods. Optionally, the storage medium is an electronic storage medium. Optionally, the storage medium is a computer-readable storage medium, but it is not limited thereto and may also be a storage medium readable by other devices. Optionally, the storage medium may be a non-transitory storage medium, but it is not limited thereto and may also be a transitory storage medium.
本公开还提出程序产品,上述程序产品被通信设备8100执行时,使得通信设备8100执行以上任一方法。可选地,上述程序产品是计算机程序产品。The present disclosure also proposes a program product, which, when executed by the communication device 8100, enables the communication device 8100 to execute any of the above methods. Optionally, the program product is a computer program product.
本公开还提出计算机程序,当其在计算机上运行时,使得计算机执行以上任一方法。 The present disclosure also proposes a computer program, which, when executed on a computer, causes the computer to execute any one of the above methods.
Claims (21)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/128800 WO2025091293A1 (en) | 2023-10-31 | 2023-10-31 | Grouping method, encoder, decoder, and storage medium |
| CN202380012056.6A CN117730367A (en) | 2023-10-31 | 2023-10-31 | Grouping methods, encoders, decoders, and storage media |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/128800 WO2025091293A1 (en) | 2023-10-31 | 2023-10-31 | Grouping method, encoder, decoder, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025091293A1 (en) | 2025-05-08 |
Family
ID=90203912
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/128800 Pending WO2025091293A1 (en) | 2023-10-31 | 2023-10-31 | Grouping method, encoder, decoder, and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN117730367A (en) |
| WO (1) | WO2025091293A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104364842A (en) * | 2012-04-18 | 2015-02-18 | 诺基亚公司 | Stereo audio signal encoder |
| CN107112020A (en) * | 2014-10-31 | 2017-08-29 | 杜比国际公司 | The parametrization mixing of audio signal |
| US20170339505A1 (en) * | 2014-10-31 | 2017-11-23 | Dolby International Ab | Parametric encoding and decoding of multichannel audio signals |
| CN107592937A (en) * | 2015-03-09 | 2018-01-16 | 弗劳恩霍夫应用研究促进协会 | Apparatus and method for encoding or decoding multi-channel signal |
| CN113948095A (en) * | 2020-07-17 | 2022-01-18 | 华为技术有限公司 | Codec method and device for multi-channel audio signal |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| SE0400998D0 (en) * | 2004-04-16 | 2004-04-16 | Cooding Technologies Sweden Ab | Method for representing multi-channel audio signals |
| SE0402650D0 (en) * | 2004-11-02 | 2004-11-02 | Coding Tech Ab | Improved parametric stereo compatible coding or spatial audio |
| DE102005010057A1 (en) * | 2005-03-04 | 2006-09-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for generating a coded stereo signal of an audio piece or audio data stream |
| US7831434B2 (en) * | 2006-01-20 | 2010-11-09 | Microsoft Corporation | Complex-transform channel coding with extended-band frequency coding |
| CN102737647A (en) * | 2012-07-23 | 2012-10-17 | 武汉大学 | Encoding and decoding method and encoding and decoding device for enhancing dual-track voice frequency and tone quality |
| EP2880653B1 (en) * | 2012-08-03 | 2017-11-01 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Decoder and method for multi-instance spatial-audio-object-coding employing a parametric concept for multichannel downmix/upmix cases |
| CN104782145B (en) * | 2012-09-12 | 2017-10-13 | 弗劳恩霍夫应用研究促进协会 | The device and method of enhanced guiding downmix performance is provided for 3D audios |
| CN103400582B (en) * | 2013-08-13 | 2015-09-16 | 武汉大学 | Towards decoding method and the system of multisound path three dimensional audio frequency |
| CN104240712B (en) * | 2014-09-30 | 2018-02-02 | 武汉大学深圳研究院 | A kind of three-dimensional audio multichannel grouping and clustering coding method and system |
| CN105828271B (en) * | 2015-01-09 | 2019-07-05 | 南京青衿信息科技有限公司 | A method of two channel sound signals are converted into three sound channel signals |
| CN106710600B (en) * | 2016-12-16 | 2020-02-04 | 广州广晟数码技术有限公司 | Decorrelation coding method and apparatus for a multi-channel audio signal |
| CN120526779A (en) * | 2020-03-09 | 2025-08-22 | 日本电信电话株式会社 | Audio signal down-mixing method, encoding method, down-mixing device, and program |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117730367A (en) | 2024-03-19 |
Similar Documents
| Publication | Title |
|---|---|
| EP4246510A1 (en) | Audio encoding and decoding method and apparatus |
| EP4246509B1 (en) | Audio encoding/decoding method and device |
| US20240312469A1 (en) | Apparatus, Methods and Computer Programs for Encoding Spatial Metadata |
| CN114731483A (en) | Sound field adaptation for virtual reality audio |
| TWI853232B (en) | Audio encoding/decoding method and apparatus |
| WO2022002216A1 (en) | Audio encoding method and encoding/decoding device |
| JP2023523081A (en) | Bit allocation method and apparatus for audio signal |
| RU2769789C2 (en) | Method and device for encoding an inter-channel phase difference parameter |
| WO2025160833A1 (en) | Methods and apparatuses for encoding and decoding, and storage medium |
| CN118160034A (en) | Encoding and decoding method, device and storage medium |
| WO2025091293A1 (en) | Grouping method, encoder, decoder, and storage medium |
| WO2025160826A1 (en) | Encoding, decoding and encoding-decoding methods, encoding apparatus, decoding apparatus and storage medium |
| KR20250102055A (en) | Parameter space audio encoding |
| CN113948095A (en) | Codec method and device for multi-channel audio signal |
| WO2025091294A1 (en) | Encoding and decoding method, terminal, network device, and storage medium |
| KR20230069173A (en) | Quantizing Spatial Audio Parameters |
| US20250210049A1 (en) | Parametric spatial audio encoding |
| CN120112994A (en) | Signal processing method and device |
| WO2025081393A1 (en) | Audio signal processing method and apparatus, and audio device and storage medium |
| WO2025145384A1 (en) | Coding method and device, decoding method and device, and storage medium |
| CN117769740A (en) | Audio signal encoding and decoding method and device, communication system, communication equipment and storage medium |
| CN118591840A (en) | Audio data encoding method, device, system and storage medium |
| KR20250088634A (en) | Parameter space audio encoding |
| WO2025160827A1 (en) | Processing method and apparatus, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23957170; Country of ref document: EP; Kind code of ref document: A1 |