WO2006003813A1

WO2006003813A1 - Audio encoding and decoding apparatus

Info

Publication number: WO2006003813A1
Application number: PCT/JP2005/011340
Authority: WO
Inventors: Naoya Tanaka; Kok Seng Chong; Mineo Tsushima
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2004-07-02
Filing date: 2005-06-21
Publication date: 2006-01-12
Anticipated expiration: 2007-01-02

Abstract

An audio decoding apparatus that improves sound quality by improving the signal separation process based on spatial acoustic information. Signal separating means (104) performs a signal separation process on an input monaural FFT coefficient (111) based on both an IPD parameter (112), which carries inter-channel phase difference information, and a TF parameter (113), which indicates the degree of signal transientness. The FFT coefficients (116) separated into a plurality of channels are adjusted in inter-channel correlation by correlation control means (105) and then adjusted in gain by gain control means (106). The signal separation process shifts the phase of the monaural FFT coefficient by a phase shift amount calculated from the IPD parameter. The shift amount for each channel is adjusted in accordance with the TF parameter, which improves the sound quality for transient signals, in particular attack sounds.

Description

Specification

Audio encoding and decoding apparatus

Technical field

[0001] The present invention relates to an apparatus that efficiently encodes an audio signal with a small amount of information, and to an apparatus that decodes the encoded information.

Background art

[0002] The purpose of audio coding is to compress and transmit a digitized audio signal as efficiently as possible and, through decoding at the decoder, to reproduce an audio signal of the highest possible quality.

[0003] Many techniques are known that convert an input audio signal into a frequency spectrum signal and exploit the characteristics of human hearing to compress and encode it efficiently. One of them uses information called spatial acoustic information (Spatial Information) or binaural cues (Binaural Cue). One example of such a technique is the Parametric Stereo scheme defined in MPEG-4 Audio (ISO/IEC 14496-3), an ISO international standard (see Non-Patent Document 1). Another example is the scheme disclosed in Patent Document 1.

[0004] As a representative example, FIG. 6 shows the configuration of the latter. In this configuration, the inputs are a monaural time signal 504 and BCC (Binaural Cue Coding) parameters 506. The monaural time signal 504 is usually a downmix of the original sound signal, for example (L+R)/2. TF conversion means 501 converts the monaural time signal 504 into frequency parameters 505. Any known transform, such as a Fourier transform or a cosine transform, may be used. The following description assumes a fast Fourier transform (FFT).

[0005] Auditory sound field generation means 502 processes the input monaural FFT coefficients 505 based on the spatial acoustic information given by the BCC parameters, and generates stereo FFT coefficients 507 that represent a predetermined acoustic space. Inverse TF conversion means 503 inverse-transforms the FFT coefficients 507 and outputs a stereo time signal 508. The spatial acoustic information given by the BCC parameters is expressed as ITD (Inter-aural or Inter-channel Time Difference), ILD (Inter-aural or Inter-channel Level Difference), and ICC (Inter-aural or Inter-channel Coherence). In general, this spatial acoustic information is transmitted for each of the subbands obtained by dividing the FFT coefficients 507 in the frequency direction, and processing is performed so that a signal with the desired characteristics is generated in each subband.

[0006] Auditory sound field generation means 502 generates, from the monaural FFT coefficients 505, a stereo signal having the time difference Ts indicated by the ITD. In general, this is achieved by applying to the monaural FFT coefficients 505 a delay or phase-shift operation that produces time differences of +Ts/2 and −Ts/2, respectively. The gain of each channel is then adjusted to reflect the inter-channel level difference indicated by the ILD.
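The delay-then-gain processing of paragraph [0006] can be illustrated with a short Python sketch. It is not taken from the cited prior art. The function name `apply_itd_ild`, the choice to delay the left channel and advance the right, and the symmetric split of the level difference in dB are all assumptions made for illustration. The input is assumed to hold only the non-negative-frequency bins (k = 0 to N/2) of an N-point FFT.

```python
import cmath
import math


def apply_itd_ild(mono_bins, ts_samples, ild_db, n_fft):
    """Hypothetical helper: split mono FFT bins into a left/right pair.

    mono_bins  -- complex bins for k = 0 .. N/2 of an N-point FFT
    ts_samples -- inter-channel time difference Ts, in samples
    ild_db     -- level of left relative to right, in dB (assumed convention)
    n_fft      -- FFT size N
    """
    # Split the level difference evenly: left/right amplitude ratio is 10^(ild/20).
    g_left = 10.0 ** (ild_db / 40.0)
    g_right = 10.0 ** (-ild_db / 40.0)

    left, right = [], []
    for k, s in enumerate(mono_bins):
        # A delay of d samples multiplies bin k by exp(-j*2*pi*k*d/N).
        phi = 2.0 * math.pi * k * (ts_samples / 2.0) / n_fft
        left.append(g_left * s * cmath.exp(-1j * phi))    # delayed by Ts/2
        right.append(g_right * s * cmath.exp(+1j * phi))  # advanced by Ts/2
    return left, right
```

Because the time shift is applied as a per-bin phase rotation, the magnitude spectrum of each channel is changed only by the ILD gain.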

[0007] Furthermore, as shown in Patent Document 2, ICC is a parameter indicating the spread of the acoustic space. For example, Patent Document 2 describes a method of controlling the spread of the acoustic space by varying the gain difference between channels pseudo-randomly and adjusting that variation according to the transmitted ICC. It also describes a method of controlling the spread of the acoustic space by adjusting the time difference indicated by the ITD according to the ICC.

[0008] The stereo signal 508, generated in this way with the inter-channel time difference, level difference, and sense of spaciousness adjusted according to the BCC parameters, has spatial acoustic characteristics equivalent to those of the reference original stereo signal, and perceptually achieves a sound quality close to that of the original stereo signal.

Non-Patent Document 1: ISO/IEC 14496-3:2001 AMD2, "Parametric Coding for High Quality Audio"

Patent Document 1: US Patent Application Publication US2003/0035553, "Backwards-compatible Perceptual Coding of Spatial Cues"

Patent Document 2: US Patent Application Publication US2003/0219130, "Coherence-based Audio Coding and Synthesis"

Disclosure of the invention

Problem to be solved by the invention

[0009] However, it is difficult to express all the spatial acoustic characteristics of the original sound signal with only the three parameters ITD, ILD, and ICC. For example, when the original sound signal contains a transient component, in particular an attack sound, it is necessary to express whether that attack sound is a tight sound image confined to a narrow region of the acoustic space or a sound image with a sense of spaciousness.

[0010] Conventionally, a measure called coherence, typified by ICC, has been used as an index of the spaciousness of a sound image. However, this value is a measure of the correlation between channels, and for signals whose inter-channel correlation is inherently low, such as attack sounds, its reliability as a measure is degraded. To address this problem, for example, the parametric stereo scheme disclosed in ISO/IEC 14496-3:2001 AMD2 detects signal transients in the decoder and, for transient signals, controls the spaciousness of the sound image by adjusting the amount of reverberation component that is added.

[0011] However, because this processing detects transients in the downmix signal, it cannot express the per-channel transients needed to represent the original spatial acoustic characteristics. In addition, because detection processing is required on the decoder side, the amount of computation required in the playback device increases.

[0012] The present invention solves the above problems, and its object is to provide audio encoding and decoding techniques that can express the spatial acoustic characteristics of attack sounds more accurately.

Means for solving the problem

[0013] The audio encoding apparatus of the present invention generates, from an m-channel original sound signal (m is a natural number of 2 or more), an n-channel downmix signal (n is a natural number smaller than m) and a spatial acoustic information signal representing the phase difference between the channels of the original sound signal. The apparatus comprises: downmix means for generating the downmix signal by downmixing the original sound signal; spatial acoustic information analysis means for analyzing the original sound signal to generate, together with the spatial acoustic information signal, a transient information signal that represents, for each channel, a transient degree indicating the transientness of the original sound signal; and bitstream multiplexing means for multiplexing the downmix signal, the spatial acoustic information signal, and the transient information signal into a single bitstream and outputting it.

[0014] The audio decoding apparatus of the present invention generates an m-channel decoded signal, based on a phase difference defined between channels, from an n-channel downmix signal (n is a natural number smaller than m) obtained by downmixing an m-channel original sound signal (m is a natural number of 2 or more). The apparatus comprises: signal acquisition means for acquiring a transient information signal that represents, for each channel, a transient degree indicating the transientness of the original sound signal; and signal generation means for generating the decoded signal for each channel from the downmix signal based on the phase difference and the transient degree, and outputting the generated decoded signal.

[0015] Furthermore, the present invention can be realized not only as an audio encoding apparatus and an audio decoding apparatus, but also as an audio transmission system comprising both, and as an audio encoding method and an audio decoding method whose steps are the processes executed by the characteristic means of these apparatuses.

Effect of the invention

[0016] According to the present invention, the characteristic use of a per-channel transient degree representing the transientness of the original sound signal allows spatial acoustic information such as inter-channel phase difference, correlation, and level difference to be applied selectively for each channel, according to the transientness of the sound, during playback. Therefore, when the transient degree indicates that the sound is a transient sound (for example, an attack sound) for which it is difficult to obtain spatial acoustic information accurately, the degree to which the spatial acoustic information is applied during playback is reduced. This eliminates the problem of the sound being reproduced according to inaccurate spatial acoustic information, with the sound image of the reproduced sound spreading unintentionally as a result.

Brief description of the drawings

[0017] [FIG. 1] FIG. 1 is a diagram showing an example configuration of the audio transmission system according to Embodiment 1 of the present invention.

[FIG. 2] FIG. 2 is a diagram showing an example configuration of the audio decoding apparatus according to Embodiment 1 of the present invention.

[FIG. 3] FIG. 3 is a diagram illustrating the smoothing of the phase shift amount.

[FIG. 4] FIG. 4 is a diagram showing an example configuration of the audio decoding apparatus according to Embodiment 2 of the present invention.

[FIG. 5] FIG. 5 is a diagram showing an example configuration of the audio decoding apparatus according to Embodiment 3 of the present invention.

[FIG. 6] FIG. 6 is a diagram showing an example configuration of a conventional audio decoding apparatus.

Explanation of reference signs

10 Encoding apparatus

20 Decoding apparatus

100 Signal generation means

101 Bitstream separation means

102 Core decoding means

103 FFT means

104 Signal separation means

105 Correlation control means

106 Gain control means

107 IFFT means

108 Input bitstream

109 Core bitstream

110 Monaural PCM signal

111 Monaural FFT coefficients

112 IPD parameter

113 TF parameter

114 ICC parameter

115 Gain parameter

116 Separated FFT coefficients

117 Separated FFT coefficients after inter-channel correlation adjustment

118 Output stereo FFT coefficients

119 Output stereo PCM signal

201 Frequency subband 0

202 Frequency subband 1

203 Frequency subband 2

301 Transient detection means

302 Transient degree

303 TC parameter

401 Second signal separation means

501 TF conversion means

502 Auditory sound field generation means

503 Inverse TF conversion means

504 Input monaural signal

505 Monaural FFT coefficients

506 Input BCC parameters

507 Stereo FFT coefficients

508 Output stereo signal

601 Downmix means

602 Core encoding means

603 Spatial acoustic information analysis means

604 Transient detection means

605 Bitstream multiplexing means

Best mode for carrying out the invention

[0019] (Embodiment 1)

FIG. 1 is a functional block diagram showing an example configuration of the audio transmission system according to the first embodiment of the present invention. This audio transmission system transmits stereo original sound signals L and R as a monaural downmix signal M and a spatial acoustic information signal representing at least the phase difference between the channels of the original sound signals. It comprises an audio encoding apparatus 10 and an audio decoding apparatus 20.

[0020] The audio transmission system of the present invention is characterized in particular in that it transmits, together with the downmix signal and the spatial acoustic information signal, a transient information signal that represents, for each channel, a transient degree indicating the transientness of the original sound signal. The technical significance of this transient degree is explained in detail later.

[0021] Here, the stereo original sound signal is an example of the m-channel original sound signal with m = 2, and the monaural downmix signal is an example of the n-channel downmix signal with n = 1.

[0022] The audio encoding apparatus 10 multiplexes the downmix signal M obtained from the original sound signals L and R, the spatial acoustic information signal, and the transient information signal into a single bitstream 108 and outputs it. It comprises downmix means 601, core encoding means 602, spatial acoustic information analysis means 603, transient detection means 604, and bitstream multiplexing means 605.

[0023] Downmix means 601 downmixes the input original sound signals L and R to obtain the downmix signal M, and outputs it to core encoding means 602. Specifically, this downmix processing may compute the average of the original sound signals L and R and use it as the downmix signal M.
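As a minimal illustration of the averaging option mentioned in paragraph [0023], the following Python sketch forms M = (L + R) / 2 sample by sample. The function name `downmix` and the use of plain lists of float samples are assumptions for illustration. The specification does not prescribe a particular data representation.

```python
def downmix(left, right):
    """Hypothetical helper: average two equal-length channels into one."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    # M(t) = (L(t) + R(t)) / 2 for every sample index t.
    return [(l + r) / 2.0 for l, r in zip(left, right)]
```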

[0024] Core encoding means 602 encodes the input downmix signal M to obtain a core bitstream, and outputs it to bitstream multiplexing means 605. Specifically, this encoding can be performed using a well-known encoding technique such as MPEG-4 AAC.

[0025] Spatial acoustic information analysis means 603 analyzes the original sound signals L and R, detects the phase difference between the channels, and outputs a spatial acoustic information signal representing the detected phase difference to bitstream multiplexing means 605. In addition to this phase difference, spatial acoustic information analysis means 603 may detect the correlation and level difference between the channels of the original sound signal and represent them in the spatial acoustic information signal together with the phase difference. Techniques for obtaining the phase difference (mathematically equivalent to a time difference), correlation, and level difference from an original sound signal and handling them as spatial acoustic information are well known, as described in the Background art section, so a detailed description is omitted here.

[0026] Transient detection means 604 detects, for each of the original sound signals L and R, a transient degree indicating its transientness. Specifically, for each channel of the original sound signal, transient detection means 604 detects, for example, the change in signal energy or the change in signal amplitude within a predetermined time, and obtains the transient degree either as binary information indicating whether the detected change exceeds a predetermined threshold or as multi-level information obtained by quantizing the detected change. Spatial acoustic information analysis means 603 then outputs a transient information signal representing the per-channel transient degree detected by transient detection means 604 to bitstream multiplexing means 605.
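One way to picture the detection described in paragraph [0026] is the Python sketch below, which compares the energy of two consecutive frames of a single channel. This is a sketch under stated assumptions, not the detector of the specification: the names `frame_energy` and `transient_degree`, the dB measure, the 6 dB threshold, the 24 dB full-scale value, and the uniform quantizer are all hypothetical choices.

```python
import math


def frame_energy(samples):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in samples)


def transient_degree(prev_frame, cur_frame, levels=2,
                     threshold_db=6.0, max_db=24.0):
    """Hypothetical detector for one channel.

    levels == 2 -> binary flag: 1 if the energy rise exceeds threshold_db.
    levels  > 2 -> energy rise quantized uniformly to 0 .. levels - 1.
    """
    eps = 1e-12  # avoids log of zero for silent frames
    change_db = 10.0 * math.log10(
        (frame_energy(cur_frame) + eps) / (frame_energy(prev_frame) + eps)
    )
    if levels == 2:
        return 1 if change_db > threshold_db else 0
    # Only energy rises count as attacks; clamp to [0, max_db] before quantizing.
    clamped = min(max(change_db, 0.0), max_db)
    return int(round(clamped / max_db * (levels - 1)))
```

Running the function separately on L and R yields the per-channel values that the transient information signal carries.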

[0027] Besides being represented directly in this way, the per-channel transient degree may also be represented as each channel's deviation from a reference transient degree (for example, the average of the transient degrees over the channels).
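The deviation form of paragraph [0027] can be sketched as follows, assuming the channel average is used as the reference. The function names are hypothetical, and how the reference and deviations would be quantized and packed into the bitstream is left open, as in the specification.

```python
def to_deviation_form(degrees):
    """Return (reference, deviations), with the channel average as reference."""
    reference = sum(degrees) / len(degrees)
    return reference, [d - reference for d in degrees]


def from_deviation_form(reference, deviations):
    """Recover the per-channel transient degrees from the deviation form."""
    return [reference + d for d in deviations]
```

For a stereo pair with transient degrees 3 and 1, the reference is 2 and the deviations are +1 and −1.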

[0028] Bitstream multiplexing means 605 multiplexes the input core bitstream, spatial acoustic information signal, and transient information signal into a single bitstream 108, and outputs the bitstream 108 to a communication line, broadcast line, recording medium, or the like (not shown).

[0029] The audio decoding apparatus 20 acquires the bitstream 108 from that communication line, broadcast line, recording medium, or the like (not shown) and obtains decoded signals L' and R' that approximate the original sound signals L and R. It comprises bitstream separation means 101, core decoding means 102, and signal generation means 100.

[0030] Bitstream separation means 101 is an example of the signal acquisition means.

[0031] Bitstream separation means 101 acquires the bitstream 108 and demultiplexes it into the core bitstream, the spatial acoustic information signal, and the transient information signal. As described above, the spatial acoustic information signal represents at least the phase difference between the channels of the original sound signal, and may additionally represent the correlation and level difference between the channels.

[0032] Core decoding means 102 decodes the separated core bitstream to obtain the downmix signal M, and outputs it to signal generation means 100. Needless to say, core decoding means 102 obtains the downmix signal M by the inverse of the encoding performed by core encoding means 602 described above.

[0033] Signal generation means 100 generates and outputs the decoded signals L' and R' for each channel from the input downmix signal M, based on the inter-channel phase difference represented by the separated spatial acoustic information signal and the transient degree obtained from the separated transient information signal.

[0034] The description of the audio decoding apparatus 20 continues below, focusing on the detailed functional configuration of signal generation means 100 and the processing performed in it.

[0035] FIG. 2 is a diagram showing an example of the functional configuration of the audio decoding apparatus 20 according to the first embodiment of the present invention. FIG. 2 shows the internal functional configuration of signal generation means 100 in detail. Signal generation means 100 comprises FFT means 103, signal separation means 104, correlation control means 105, gain control means 106, and IFFT means 107.

[0036] Bitstream separation means 101 separates the input bitstream 108 into a core bitstream 109, an IPD parameter 112, a TF parameter 113, an ICC parameter 114, and a Gain parameter 115. Core decoding means 102 decodes the core bitstream 109 to generate a monaural PCM signal 110. This monaural PCM signal 110 corresponds to the downmix signal M described above. Any existing coding scheme, such as MPEG-4 AAC, may be used for the core bitstream 109 and core decoding means 102. FFT means 103 converts the decoded monaural PCM signal 110 into monaural FFT coefficients 111.

[0037] The IPD parameter 112 represents inter-channel phase difference information (Inter-channel Phase Difference) and is mathematically equivalent to the ITD described above. The IPD parameter 112, the ICC parameter 114, and the Gain parameter 115 are carried by the spatial acoustic information signal. The TF parameter 113 represents the transient degree described above and is carried by the transient information signal.

[0038] Signal separation means 104 applies separation processing based on the IPD parameter 112 and the TF parameter 113 to the monaural FFT coefficients 111 to generate separated FFT coefficients 116. Correlation control means 105 adjusts the inter-channel correlation of the separated FFT coefficients 116 according to the ICC parameter 114. The operation of signal separation means 104 and correlation control means 105 is described in detail later. Gain control means 106 adjusts the gain of the correlation-adjusted FFT coefficients 117 to generate stereo FFT coefficients 118. Finally, IFFT means 107 applies an inverse FFT to the stereo FFT coefficients 118 and outputs a stereo PCM signal 119.

[0039] Next, the operation of signal separation means 104 is described in detail. The IPD is inter-channel phase difference information of the original stereo signal, analyzed and encoded by an encoder (the audio encoding apparatus 10 is an example of such an encoder). The FFT coefficients are generally divided into a plurality of subbands, and the IPD is encoded as a representative value for the FFT coefficients contained in each subband. Signal separation is therefore performed per subband. In the following description, the inter-channel phase difference in the subband under consideration is denoted θ. Signal separation means 104 separates the monaural FFT coefficients 111 into two signals having the inter-channel phase difference θ. As an example of the separation processing, two sets of FFT coefficients with phases of −θ/2 and +θ/2 relative to the monaural FFT coefficients 111 are calculated as follows.

[0040] [Equation 1]

[0041] Here, s is the monaural FFT coefficient 111, k is the index of the FFT coefficient, and N is the number of FFT points used. The larger the inter-channel phase difference θ, the wider the perceived sound image of the generated stereo signal. The separated stereo FFT coefficients d (116) and the original monaural FFT coefficients s are input to correlation control means 105, which adjusts the inter-channel correlation according to the ICC parameter 114 and generates stereo FFT coefficients 117 with the desired spaciousness. The inter-channel correlation can be adjusted by adding and subtracting the separated stereo FFT coefficients and the original monaural FFT coefficients, for example according to the following equations.

[0042] [Equation 2]

$$L'(k) = \cos(a)\,s(k) + \sin(a)\,d_L(k)$$

$$R'(k) = \cos(a)\,s(k) + \sin(a)\,d_R(k)$$

$$a = A \cdot \arccos(ICC)$$

[0043] Here, the ICC parameter is a value indicating the inter-channel correlation, for example the normalized cross-correlation of the two channel signals, and takes a value in the range -1 <= ICC <= 1 or 0 <= ICC <= 1. A is a predetermined constant, for example A = 0.5. With this processing, when the correlation between the two channels indicated by the ICC parameter is at its maximum, that is, when ICC = 1, the correlation of the signals after inter-channel correlation adjustment is also at its maximum, and as the ICC value decreases, the correlation of the adjusted signals decreases as well.
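The correlation adjustment of Equation (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `adjust_correlation`, the use of Python complex numbers for single FFT coefficients, and the clamping of ICC to [-1, 1] are not taken from the specification.

```python
import math


def adjust_correlation(s, d_l, d_r, icc, A=0.5):
    """Mix the mono coefficient s with the separated coefficients d_l and d_r.

    Follows the form L' = cos(a) s + sin(a) d_L, R' = cos(a) s + sin(a) d_R
    with a = A * arccos(ICC). The name and the clamping are assumptions.
    """
    icc = max(-1.0, min(1.0, icc))  # assumption: guard arccos against rounding
    a = A * math.acos(icc)
    left = math.cos(a) * s + math.sin(a) * d_l
    right = math.cos(a) * s + math.sin(a) * d_r
    return left, right


# With ICC = 1 the angle a is 0, so both outputs equal the mono coefficient.
same_l, same_r = adjust_correlation(1 + 0j, 1j, -1j, icc=1.0)
# With ICC = 0 and A = 0.5 the angle a is pi/4, so s and d are mixed equally.
mix_l, mix_r = adjust_correlation(1 + 0j, 1j, -1j, icc=0.0)
```

As ICC falls from 1 toward 0, more of the separated coefficient is mixed in and the two outputs diverge, which is the behaviour the paragraph above describes.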

[0044] However, as noted earlier among the problems to be solved, when the input original stereo signal is transient, the reliability of the ICC parameter calculated and transmitted by the encoder decreases. To solve this problem, the embodiment of the present invention uses a TF parameter 113 that represents the degree of transience of the input signal. The TF parameter 113 is calculated in the encoder for each channel of the input original stereo signal and transmitted. As the measure, the change in signal energy or the change in signal amplitude within a predetermined time frame can be used, and it is transmitted either as binary information indicating the presence or absence of a transient or as multi-level information expressing the degree of transience in multiple steps. For each channel, the TF parameter is either calculated and transmitted as a single value representing the entire frequency band, or the frequency band is divided into a plurality of subbands and a value is calculated and transmitted for each subband.
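The text leaves the exact transience measure open. As one hypothetical illustration of a measure based on the change in signal energy within a time frame, the sketch below splits a frame into blocks and normalizes the largest block-to-block rise in energy by the total frame energy. The function names, the block count, the normalization, and the threshold are all assumptions made for this example and are not specified in the text.

```python
def transient_degree(frame, num_blocks=4):
    """Hypothetical TF value in [0.0, 1.0] for one channel and one frame."""
    if num_blocks < 2:
        raise ValueError("num_blocks must be at least 2")
    block_len = len(frame) // num_blocks
    if block_len == 0:
        raise ValueError("frame is shorter than num_blocks")
    # Energy of each consecutive block of samples.
    energies = [
        sum(x * x for x in frame[i * block_len:(i + 1) * block_len])
        for i in range(num_blocks)
    ]
    total = sum(energies)
    if total == 0:
        return 0.0  # silence: no transient
    # Largest rise in energy between consecutive blocks (falls are ignored).
    rise = max(
        max(energies[i + 1] - energies[i], 0.0)
        for i in range(num_blocks - 1)
    )
    return min(1.0, rise / total)


def to_binary(tf, threshold=0.5):
    """Two-level variant: 1.0 if a transient is judged present, else 0.0."""
    return 1.0 if tf >= threshold else 0.0


steady = transient_degree([1.0] * 16)               # constant energy
attack = transient_degree([0.0] * 12 + [1.0] * 4)   # silence, then a burst
```

A per-subband variant would apply the same measure to band-limited versions of the frame, at the cost of transmitting one value per subband.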

[0045] In the decoder, the TF parameter 113 separated by the bitstream separation means 101 is input to the signal separation means 104. The signal separation processing in the signal separation means 104, expressed by Equation (1), is modified using the TF parameter 113, for example, as follows.

[0046] [Equation 3]

$$\theta_L = \theta \cdot (1.0 - TF_L)$$

$$\theta_R = \theta \cdot (1.0 - TF_R)$$

[0047] Here, the subscripts L and R denote parameters of the L channel and the R channel, respectively. The TF parameter is coded so that TF = 1.0 when the transience of the signal is at its maximum and TF = 0.0 when it is at its minimum. This processing makes it possible to adjust the phase shift amount independently for each channel according to the degree of transience of the input signal. The closer the phase shift amount in a transient channel is to 0, the more the sense of spaciousness of the separated signal is suppressed, yielding the tight sound image required for attack sounds and the like.
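The per-channel scaling described above, in which each channel's phase shift amount is the phase difference θ multiplied by (1.0 − TF) for that channel, can be sketched as follows. The function name and the range check are illustrative assumptions.

```python
def corrected_shifts(theta, tf_l, tf_r):
    """Scale the phase shift amount of each channel by (1.0 - TF).

    TF = 1.0 (maximum transience) yields no shift; TF = 0.0 yields theta.
    """
    for tf in (tf_l, tf_r):
        if not 0.0 <= tf <= 1.0:
            raise ValueError("TF must lie in [0.0, 1.0]")
    return theta * (1.0 - tf_l), theta * (1.0 - tf_r)


# L channel fully transient, R channel stationary:
theta_l, theta_r = corrected_shifts(0.8, tf_l=1.0, tf_r=0.0)
```

In this example the transient L channel receives no phase shift at all, while the R channel keeps the full shift, so only the stationary channel contributes to the perceived width.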

[0048] Note that the inter-channel phase difference θ need not be encoded and transmitted for signals in high-frequency subbands. This is because, owing to the characteristics of human hearing, sensitivity to inter-channel phase differences decreases for sounds above about 2 kHz, for example. However, although an accurate inter-channel phase difference becomes difficult to perceive, the sense of spaciousness produced by the existence of a phase difference between channels can still be perceived. For this reason, a random inter-channel phase difference θ, or one predetermined for each subband, is applied to signals in high-frequency subbands so that the spaciousness of the sound image can be reproduced. Furthermore, at very low bit rates the amount of information that can be transmitted is so small that transmission of inter-channel phase difference information by the IPD parameter may be omitted entirely. In such a configuration, the inter-channel phase difference is not controlled on the basis of analysis results from the encoder, so controlling the phase shift amount by introducing the TF parameter yields an even greater improvement in sound quality.

[0049] In the above description, the inter-channel phase difference θ is constant within each frequency subband, but processing that smooths the phase shift amount between adjacent frequency subbands may be introduced. FIG. 3 shows an example of phase shift smoothing. Let the phase shift amount of frequency subband 202 be θ1, and let those of the adjacent subbands 201 and 203 on either side be θ0 and θ2, respectively. The phase shift amount applied to each FFT coefficient in a subband is then controlled, according to the coefficient's position in frequency, so that it changes gradually from θ0 to θ1 and on to θ2. Smoothing the phase shift amount in this way prevents the degradation in sound quality caused by abrupt phase changes at subband boundaries. In the example of FIG. 3 the smoothing is performed by linear interpolation, but any interpolation method, such as polynomial curve interpolation, may be used.
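A minimal sketch of such smoothing by linear interpolation is given below. It assumes, for illustration only, that each subband's transmitted shift applies exactly at the subband's centre bin and that values outside the first and last centres are held constant; the function name and the centre-bin convention are not taken from the text or from FIG. 3.

```python
def smooth_phase_shifts(centers, shifts, num_bins):
    """Linearly interpolate per-subband phase shifts to every FFT bin.

    centers: strictly increasing centre-bin index of each subband (assumed).
    shifts:  phase shift amount of each subband, same length as centers.
    """
    if not centers or len(centers) != len(shifts):
        raise ValueError("centers and shifts must be non-empty and equal length")
    out = []
    for k in range(num_bins):
        if k <= centers[0]:
            out.append(shifts[0])       # below the first centre: hold
        elif k >= centers[-1]:
            out.append(shifts[-1])      # above the last centre: hold
        else:
            for i in range(len(centers) - 1):
                c0, c1 = centers[i], centers[i + 1]
                if c0 <= k <= c1:
                    w = (k - c0) / (c1 - c0)  # position between the centres
                    out.append((1.0 - w) * shifts[i] + w * shifts[i + 1])
                    break
    return out


# Three subbands with shifts theta0, theta1, theta2 centred at bins 2, 6, 10.
per_bin = smooth_phase_shifts([2, 6, 10], [0.0, 0.4, 0.2], num_bins=12)
```

Swapping the weight computation for a polynomial or spline evaluation would give the curve interpolation mentioned above without changing the surrounding structure.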

[0050] Also, in the above description, the inter-channel phase difference information carried by the IPD parameter is transmitted as the phase difference between the L and R channels, but it may instead be transmitted, relative to the downmixed monaural signal (denoted M), as phase difference information θ_LM between the L and M channels and phase difference information θ_RM between the R and M channels. The signal separation processing in this case can be realized, for example, by the following equations.

[0051] [Equation 4]

$$\theta_L = \theta_{LM} \cdot (1.0 - TF_L)$$

$$\theta_R = \theta_{RM} \cdot (1.0 - TF_R)$$

[0052] Note that the phase difference between any two channels chosen from the three channels L, R, and M can be reproduced as long as the phase differences for any two channel pairs (in the above example, the L-M pair and the R-M pair) are transmitted. The inter-channel phase difference information may therefore be transmitted for any two channel pairs.
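The reason two pairs suffice is simple arithmetic on the channel phases. As a worked illustration, assume (this sign convention is not stated in the text) that each transmitted difference is the phase of the first channel minus the phase of the second, with φ_L, φ_R, and φ_M as hypothetical symbols for the individual channel phases:

```latex
\begin{align*}
\theta_{LM} &= \varphi_L - \varphi_M, \qquad \theta_{RM} = \varphi_R - \varphi_M \\
\theta_{LR} &= \varphi_L - \varphi_R
             = (\varphi_L - \varphi_M) - (\varphi_R - \varphi_M)
             = \theta_{LM} - \theta_{RM}
\end{align*}
```

Under the same convention, if the L-R and L-M pairs were transmitted instead, the remaining difference would follow as θ_RM = θ_LM − θ_LR.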

[0053] (Embodiment 2)

FIG. 4 shows an example of the functional configuration of an audio decoding device 20a according to the second embodiment of the present invention. The audio decoding device 20a differs from the audio decoding device 20 of the first embodiment described above in that a transient detection means 301 is newly added to the signal generation means 100a, and in that a TC (Transient Channel) parameter 303, which is transient channel information, is used in place of the TF parameter 113 indicating the degree of transience. The transient detection means 301 analyzes the monaural PCM signal 110 output from the core decoding means 102 and calculates a TF parameter 302 indicating the degree of transience of the signal. The TC parameter 303 is a value, calculated in the encoder and transmitted, that indicates the relative degree of transience between channels. For a stereo signal, for example, it takes the value TC = -1.0 when the transience of the L channel is at its maximum, TC = 0.0 when the transience of the L and R channels is equal, and TC = 1.0 when the transience of the R channel is at its maximum. Using the TC parameter, the degree of transience for each channel can be calculated as follows.

[0054] [Equation 5]

[0055] Once the degree of transience for each channel has been calculated, signal separation processing may be performed according to Equation (3) described above. This configuration requires a transient detection means to be newly provided in the decoder, but the amount of information needed to encode and transmit the TC parameter is generally smaller than that needed to encode and transmit a TF parameter for each channel. High-quality coded transmission can therefore be achieved with a smaller amount of information.

[0056] In the above description, the transient detection means 301 calculates the degree of transience using the monaural PCM signal 110 output from the core decoding means, but a configuration that instead uses the change in the monaural FFT coefficients 111 output in time sequence from the FFT means 103 may also be adopted.

[0057] (Embodiment 3)

FIG. 5 shows the functional configuration of an audio decoding device 20b according to the third embodiment of the present invention. The audio decoding device 20b differs from the audio decoding device 20 of the first embodiment described above in that, in the signal generation means 100b, the signal separation means 104 and the correlation control means 105 are removed, a second signal separation means 401 is newly provided, and the ICC parameter 114 is input to the second signal separation means 401. In this configuration, the signal separation processing and the inter-channel correlation control processing are integrated and executed simultaneously according to the TF parameter 113 and the ICC parameter 114. The processing in the second signal separation means 401 differs depending on whether inter-channel phase difference information is transmitted by the IPD parameter 112. This is because the inter-channel phase difference and the inter-channel correlation are both measures of the degree of separation between the channel signals, and in general the inter-channel correlation between signals processed to have the correct inter-channel phase difference matches the desired inter-channel correlation. Therefore, for subbands in which the IPD parameter 112 is transmitted, the phase shift amount is controlled according to the phase difference information. The processing in frequency subbands for which the IPD parameter is transmitted is expressed by the following equations.

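[0058] [Equation 6]

$$L'(k) = s(k)\,\exp(\,\cdots), \qquad R'(k) = s(k)\,\exp(\,\cdots)$$

(The exponents are phase terms in $2\pi k\,\theta_L / N$ and $2\pi k\,\theta_R / N$; their sign and scaling are not legible in the source.)

$$\theta_L = \theta \cdot (1.0 - TF_L)$$

$$\theta_R = \theta \cdot (1.0 - TF_R)$$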

[0059] In frequency subbands for which the IPD parameter is not transmitted, on the other hand, the inter-channel phase difference must be estimated from the inter-channel correlation expressed by the ICC parameter. The processing in frequency subbands for which the IPD parameter is not transmitted is expressed by the following equations.

[0060] [Equation 7]

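$$L'(k) = s(k)\,\exp(\,\cdots), \qquad R'(k) = s(k)\,\exp(\,\cdots)$$

(The exponents are phase terms in $2\pi k\,\theta_L / N$ and $2\pi k\,\theta_R / N$; their sign and scaling are not legible in the source.)

$$\theta_L = a \cdot (1.0 - TF_L)$$

$$\theta_R = a \cdot (1.0 - TF_R)$$

$$a = A \cdot \arccos(ICC)$$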

[0061] Here, a is the inter-channel phase difference estimated from the inter-channel correlation ICC. A is a predetermined constant, for example A = 1.0.
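The estimation step for subbands without an IPD parameter can be sketched as follows: the phase difference a is derived from ICC as a = A · arccos(ICC), and each channel's shift is then scaled by (1.0 − TF). The function name and the clamping of ICC are assumptions made for this illustration; how the resulting shifts enter the exponential is not shown here.

```python
import math


def estimated_shifts(icc, tf_l, tf_r, A=1.0):
    """Estimate per-channel phase shift amounts from ICC and TF.

    a = A * arccos(ICC) is the estimated inter-channel phase difference;
    each channel's shift is a * (1.0 - TF) for that channel.
    """
    icc = max(-1.0, min(1.0, icc))  # assumption: guard arccos against rounding
    a = A * math.acos(icc)
    return a * (1.0 - tf_l), a * (1.0 - tf_r)


# Fully correlated channels: no phase difference is estimated.
coherent = estimated_shifts(1.0, tf_l=0.0, tf_r=0.0)
# Uncorrelated channels with a fully transient R channel.
wide_l, wide_r = estimated_shifts(0.0, tf_l=0.0, tf_r=1.0)
```

With A = 1.0, uncorrelated channels (ICC = 0) map to an estimated phase difference of π/2, and a fully transient channel still receives no shift.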

[0062] With this configuration, appropriate signal separation processing can be performed in each frequency subband according to the type of parameter transmitted, so the sound quality of the coded signal can be improved.

[0063] In the example described above, the ICC parameter is not needed for frequency subbands in which the IPD parameter is transmitted, so its transmission can be omitted, and high-quality coded transmission can be achieved with a smaller amount of information.

[0064] Also, since the inter-channel phase difference information transmitted by the IPD parameter is a value representative of a frequency subband, the correct inter-channel correlation may not be obtained. A correction term based on the ICC parameter may therefore also be introduced into the processing for frequency subbands in which the IPD parameter is transmitted. The correction is such that the larger the inter-channel correlation indicated by the ICC parameter, the smaller the phase shift amount, and the smaller the inter-channel correlation, the larger the phase shift amount. For example, the processing of Equation (6) can be corrected as follows.

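[0065] [Equation 8]

$$L'(k) = s(k)\,\exp(\,\cdots)$$

(The exponent is a phase term in $2\pi k\,\theta_L / N$; the ICC-based correction term and the remainder of the equation are not legible in the source.)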

[0066] In the description of Embodiments 1, 2, and 3 of the present invention, examples were used in which decoding is performed with FFT coefficients, but the same configuration can readily be realized with any known time-frequency transform, such as a DCT or a filter bank, in place of the FFT.

[0067] In the description of Embodiments 1, 2, and 3 of the present invention, a configuration was described in which, for a stereo original signal, a downmixed monaural signal and spatial acoustic parameters expressing the stereo acoustic space are encoded and transmitted, and the stereo signal is decoded from the transmitted information. This configuration is, however, applicable to multichannel original signals with any number of channels. That is, for an n-channel input original signal, an m-channel downmix signal with m < n is generated, encoded, and transmitted as a core bitstream. The remaining n - m channels are expressed by spatial acoustic parameters, which are encoded and transmitted. In the decoder, the core decoding means decodes the m-channel signal, and decoding according to the configuration of the present invention then generates the n - m channel signals, so that a signal with the same n channels as the input original signal can be produced.

[0068] (Summary)

As described above, the audio encoding device, the audio decoding device, and the audio transmission system comprising both, according to the present invention, have the characteristic configuration of using the degree of transience of each channel of the original signal. This makes it possible, selectively for transient sounds (for example, attack sounds) for which the conventionally used spatial acoustic information such as inter-channel phase difference, correlation, and level difference is difficult to obtain accurately, to suppress the degree to which the spatial acoustic information is applied during reproduction.

[0069] This eliminates the problem that a transient sound, for which a narrow sound image is expected, is reproduced according to inaccurate spatial acoustic information, with the result that the sound image of the reproduced sound spreads.

[0070] (Modifications)

Although the present invention has been described on the basis of the embodiments, it is of course not limited to those embodiments. The following cases are also included in the present invention.

[0071] The present invention can be realized not only as the audio encoding device and audio decoding device described above, but also as an audio encoding method and an audio decoding method whose steps are the processing performed by the characteristic means of these devices.

[0072] For example, an audio encoding method comprising the following three steps performed in the audio encoding device 10 is included in the present invention:

(1) a downmix step of generating a downmix signal by downmixing the original sound signal;

(2) a spatial acoustic information analysis step of generating, by analyzing the original sound signal, a spatial acoustic information signal representing the phase difference between channels of the original sound signal, together with a transient information signal representing, for each channel, the degree of transience of the original sound signal; and

(3) a signal output step of outputting the downmix signal, the spatial acoustic information signal, and the transient information signal.

[0073] Likewise, an audio decoding method comprising the following two steps performed in the audio decoding device 20 is included in the present invention:

(1) a signal acquisition step of acquiring the transient information signal; and

(2) a signal generation step of generating the decoded signal for each channel from the downmix signal on the basis of the phase difference and the degree of transience, and outputting the generated decoded signal.

[0074] A computer-executable program for carrying out these methods on a computer, and a program recording medium storing that program, are also included in the present invention.

Industrial applicability

The present invention is a technology that enables stereo or multichannel audio signals to be transmitted and reproduced with high sound quality using a small amount of information. It enables high-quality services with less bandwidth in broadcasting, communications, and music distribution including over the Internet, and enables longer high-quality audio signals to be recorded and stored on media such as CDs, DVDs, and hard disks.

Claims


[1] An audio decoding device that generates an m-channel decoded signal, based on the phase difference between channels, from an n-channel downmix signal (n is a natural number smaller than m) obtained by downmixing an m-channel original sound signal (m is a natural number of 2 or more), comprising:

signal acquisition means for acquiring a transient information signal representing, for each channel, a transient degree indicating the magnitude of a transient component included in the original sound signal; and

signal generation means for generating the decoded signal for each channel from the downmix signal based on the phase difference and the transient degree, and outputting the generated decoded signal.

[2] The audio decoding device according to claim 1, wherein

the transient information signal is multiplexed into one bitstream together with the downmix signal and a spatial acoustic information signal representing the phase difference,

the signal acquisition means acquires the bitstream and demultiplexes the downmix signal, the spatial acoustic information signal, and the transient information signal from the acquired bitstream, and

the signal generation means calculates a phase shift amount for each channel from the phase difference represented by the spatial acoustic information signal, corrects the calculated phase shift amount according to the transient degree represented by the transient information signal, and generates the decoded signal by applying a phase shift of the corrected amount to the downmix signal.

[3] The audio decoding device according to claim 2, wherein the transient information signal represents the transient degree of each channel of the original sound signal.

[4] The audio decoding device according to claim 2, wherein the transient information signal represents, for each channel of the original sound signal, the deviation from a reference transient degree.

[5] The audio decoding device according to claim 4, wherein

the transient information signal represents the deviation of each transient degree from their average, and

the signal generation means calculates the average of the transient degrees from the downmix signal, and obtains the transient degree for each channel from the calculated average and the deviation represented by the transient information signal.

[6] The audio decoding device according to claim 1, wherein

the phase difference is determined for each of a plurality of representative frequencies, and

the signal generation means calculates a representative phase shift amount for each representative frequency using the phase difference and the transient degree, calculates a phase shift amount for each of a plurality of frequencies by smoothing the calculated representative phase shift amounts, and generates the decoded signal by applying the calculated phase shift amount to the downmix signal for each of the frequencies.

[7] The audio decoding device according to claim 1, wherein the phase difference is either a phase difference between channels of the original sound signal, or a phase difference determined between the channels of the decoded signal according to an intended sound image or at random.

[8] The audio decoding device according to claim 1, wherein the signal generation means further adjusts the correlation between the channels of the generated decoded signal based on a correlation determined between the channels, and outputs the correlation-adjusted decoded signal.

[9] The audio decoding device according to claim 8, wherein the signal generation means calculates a phase shift amount for each channel using the phase difference and the correlation, corrects the calculated phase shift amount according to the transient degree represented by the transient information signal, and applies a phase shift of the corrected amount to the downmix signal, thereby performing the correlation adjustment together with the phase shift, and generates and outputs the correlation-adjusted decoded signal.

[10] The audio decoding device according to claim 8, wherein the signal generation means further adjusts the gain of each channel of the correlation-adjusted decoded signal, and outputs the gain-adjusted decoded signal.

[11] The audio decoding device according to claim 1, wherein m is 2 and n is 1.

[12] An audio encoding device that generates, from an m-channel original sound signal (m is a natural number of 2 or more), an n-channel downmix signal (n is a natural number smaller than m) and a spatial acoustic information signal representing the phase difference between channels of the original sound signal, comprising:

downmix means for generating the downmix signal by downmixing the original sound signal;

spatial acoustic information analysis means for generating, by analyzing the original sound signal, the spatial acoustic information signal together with a transient information signal representing, for each channel, a transient degree indicating the magnitude of a transient component included in the original sound signal; and

bitstream multiplexing means for multiplexing the downmix signal, the spatial acoustic information signal, and the transient information signal into one bitstream and outputting the bitstream.

[13] An audio transmission system that transmits an m-channel original sound signal (m is a natural number of 2 or more) expressed as an n-channel downmix signal (n is a natural number smaller than m) and a spatial acoustic information signal representing the phase difference between channels, comprising an audio encoding device and an audio decoding device, wherein

the audio encoding device comprises [...], and

the audio decoding device comprises:

signal acquisition means for acquiring the bitstream and demultiplexing the downmix signal, the spatial acoustic information signal, and the transient information signal from the acquired bitstream; and

signal generation means for generating an m-channel decoded signal for each channel based on the phase difference represented by the spatial acoustic information signal and the transient degree determined from the transient information signal, and outputting the generated decoded signal.

[14] An audio decoding method for generating an m-channel decoded signal, based on the phase difference between channels, from an n-channel downmix signal (n is a natural number smaller than m) obtained by downmixing an m-channel original sound signal (m is a natural number of 2 or more), comprising:

a signal acquisition step of acquiring a transient information signal representing, for each channel, a transient degree indicating the magnitude of a transient component included in the original sound signal; and

a signal generation step of generating the decoded signal for each channel from the downmix signal based on the phase difference and the transient degree, and outputting the generated decoded signal.

[15] An audio encoding method for generating, from an m-channel original sound signal (m is a natural number of 2 or more), an n-channel downmix signal (n is a natural number smaller than m) and a spatial acoustic information signal representing the phase difference between channels of the original sound signal, comprising:

a downmix step of generating the downmix signal by downmixing the original sound signal;

a spatial acoustic information analysis step of generating, by analyzing the original sound signal, a transient information signal representing, for each channel, a transient degree indicating the magnitude of a transient component included in the original sound signal; and

a signal output step of outputting the downmix signal, the spatial acoustic information signal, and the transient information signal.

[16] A computer-executable program for generating an m-channel decoded signal, based on the phase difference between channels, from an n-channel downmix signal (n is a natural number smaller than m) obtained by downmixing an m-channel original sound signal (m is a natural number of 2 or more), the program causing a computer to execute:

a signal acquisition step of acquiring a transient information signal representing, for each channel, a transient degree indicating the magnitude of a transient component included in the original sound signal; and

a signal generation step of generating the decoded signal for each channel from the downmix signal based on the phase difference and the transient degree, and outputting the generated decoded signal.

[17] A computer-executable program for generating, from an m-channel original sound signal (m is a natural number of 2 or more), an n-channel downmix signal (n is a natural number smaller than m) and a spatial acoustic information signal representing the phase difference between channels of the original sound signal, the program causing a computer to execute a step of generating the downmix signal by downmixing the original sound signal.

[18] A computer readable recording medium storing the program according to at least one of claims 15 and 16.