JP3606457B2 - Audio signal transmission method and audio decoding method

Info

Publication number: JP3606457B2
Application number: JP2001131801A
Authority: JP
Inventors: 美昭田中; 昭治植野; 徳彦渕上
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1998-11-16
Filing date: 2001-04-27
Publication date: 2005-01-05
Anticipated expiration: 2019-11-16
Also published as: JP2002006897A

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an audio signal transmission method for transmitting an audio signal encoded by an audio encoding method that compresses the audio signal by predictive coding, and to an audio decoding method for decoding that audio signal.
[0002]
[Prior art]
As a method for predictive coding of an audio signal, the present inventors proposed in an earlier application (Japanese Patent Application No. 9-289159) a method in which, for a one-channel original digital audio signal, a plurality of predictors having different characteristics calculate a plurality of linear prediction values of the current signal from past signals in the time domain, a prediction residual is calculated for each predictor from the original digital audio signal and these linear prediction values, and the minimum prediction residual is selected.
[0003]
[Problems to be solved by the invention]
However, with the above method a certain degree of compression can be obtained when the original digital audio signal has a sampling frequency of about 96 kHz and about 20 quantization bits, but recent DVD audio discs use twice this sampling frequency (= 192 kHz) and tend to use 24 quantization bits, so the compression rate needs to be improved.
[0004]
SUMMARY OF THE INVENTION
Accordingly, an object of the present invention is to provide a transmission method and a decoding method for data encoded by an audio encoding method that can improve the compression rate when an audio signal is predictively encoded.
[0005]
[Means for Solving the Problems]
In order to achieve the above object, the present invention comprises the following means 1) and 2).
That is,
[0006]
1) An audio signal transmission method for transmitting an audio signal encoded by an audio encoding method comprising the steps of:
downmixing an original multi-channel audio signal to convert it into a stereo two-channel audio signal;
converting the audio signals of the plurality of original channels that are not downmixed into correlated audio signals by a predetermined matrix operation;
obtaining a head sample value in response to the audio signal input for each channel of the stereo two channels and the correlated audio signals, predicting linear prediction values of the current signal from past signals in the time domain by a plurality of linear prediction methods having different characteristics, and selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal; and
forming a data structure including header information and user data that includes a compressed PCM private header and an audio compressed PCM data portion, recording predictive encoded data, which includes the head sample value, the prediction residual, and the linear prediction method of each channel selected in the preceding step, in the audio compressed PCM data portion, and placing the UPC/EAN-ISRC number and UPC/EAN-ISRC data of the audio signal in the compressed PCM private header,
wherein the predictive encoded data including the selected head sample value, prediction residual, and linear prediction method, together with the UPC/EAN-ISRC number and UPC/EAN-ISRC data of the audio signal, is packetized and transmitted.
2) An audio decoding method for decoding the original audio signal from data encoded by an audio encoding method comprising the steps of:
downmixing an original multi-channel audio signal to convert it into a stereo two-channel audio signal;
converting the audio signals of the plurality of original channels that are not downmixed into correlated audio signals by a predetermined matrix operation;
obtaining a head sample value in response to the audio signal input for each channel of the stereo two channels and the correlated audio signals, predicting linear prediction values of the current signal from past signals in the time domain by a plurality of linear prediction methods having different characteristics, and selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal; and
forming a data structure including header information and user data that includes a compressed PCM private header and an audio compressed PCM data portion, recording predictive encoded data, which includes the head sample value, the prediction residual, and the linear prediction method of each channel selected in the preceding step, in the audio compressed PCM data portion, and placing the UPC/EAN-ISRC number and UPC/EAN-ISRC data of the audio signal in the compressed PCM private header,
the audio decoding method comprising the steps of:
calculating a prediction value from the predictive encoded data including the selected head sample value, prediction residual, and linear prediction method; and
restoring the first plurality of channels of digital audio signals from the calculated prediction value.
[0007]
DETAILED DESCRIPTION OF THE INVENTION
The present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of an audio encoding apparatus to which the present invention is applied and the corresponding audio decoding apparatus; FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail; FIG. 3 is an explanatory diagram showing the bit stream encoded by the encoding unit of FIGS. 1 and 2; FIG. 4 is a block diagram showing the decoding unit of FIG. 1 in detail; FIG. 5 is an explanatory diagram showing the format of a DVD pack; FIG. 6 is an explanatory diagram showing the format of a DVD audio pack; and FIGS. 7 and 8 are flowcharts showing the audio transmission method.
[0008]
Here, as the multi-channel method, for example, the following four methods are known.
(1) 4-channel system: as in the Dolby Surround system, 3 front channels (L, C, R) + 1 rear channel (S), 4 channels in total
(2) 5-channel system: as in the Dolby AC-3 system without the SW channel, 3 front channels (L, C, R) + 2 rear channels (SL, SR), 5 channels in total
(3) 6-channel system: as in the DTS (Digital Theater System) system or the Dolby AC-3 system, 6 channels (L, C, R, SW (Lfe), SL, SR)
(4) 8-channel system: as in the SDDS (Sony Dynamic Digital Sound) system, 6 front channels (L, LC, C, RC, R, SW) + 2 rear channels (SL, SR), 8 channels in total
[0009]
The 6-channel (ch) mix & matrix circuit 1′ on the encoding side shown in FIG. 1 downmixes 6-ch PCM data, namely front left (Lf), center (C), front right (Rf), surround left (Ls), surround right (Rs), and Lfe (Low Frequency Effect), given as an example of a multi-channel signal, to stereo two channels (L, R) according to the following equation (1) using coefficients mij (i = 1, 2; j = 1, 2 to 6).
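Equation (1) itself is not reproduced in this text. As a sketch only, and assuming that the index j follows the order in which the six channels are listed above (this ordering is an assumption, not something the text states), a downmix with coefficients mij would take the following form:

```latex
\begin{aligned}
L &= m_{11} L_f + m_{12} C + m_{13} R_f + m_{14} L_s + m_{15} R_s + m_{16} L_{fe} \\
R &= m_{21} L_f + m_{22} C + m_{23} R_f + m_{24} L_s + m_{25} R_s + m_{26} L_{fe} \qquad (1)
\end{aligned}
```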

[0010]
The mix & matrix circuit 1′ also classifies the original 6 ch (Lf, C, Rf, Ls, Rs, Lfe) into 2 ch relating to the front group and 4 ch relating to the other group, converts the 4 ch into correlated signals “3” to “6” according to the following equation (2), and outputs the 2 ch (L, R) to the first encoding unit 2′-1 and the 4 ch “3” to “6” to the second encoding unit 2′-2.
“1” = L
“2” = R
“3” = C− (Ls + Rs) / 2
“4” = Ls + Rs
“5” = Ls−Rs
“6” = Lfe−C (2)
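The part of equation (2) that produces signals “3” to “6” can be sketched as follows, together with one possible inverse. The text does not say how the division by 2 is rounded for integer PCM samples; floor division is assumed here, and the function names are hypothetical.

```python
def to_correlated(C, Ls, Rs, Lfe):
    """Forward matrix of equation (2) for signals "3" to "6" (integer samples)."""
    s4 = Ls + Rs
    s5 = Ls - Rs
    s3 = C - s4 // 2  # assumption: floor division for the "/2" term
    s6 = Lfe - C
    return s3, s4, s5, s6


def from_correlated(s3, s4, s5, s6):
    """Inverse of to_correlated; exact as long as the same rounding is used."""
    C = s3 + s4 // 2
    Ls = (s4 + s5) // 2  # s4 + s5 == 2 * Ls, so this division is exact
    Rs = (s4 - s5) // 2  # s4 - s5 == 2 * Rs, so this division is exact
    Lfe = s6 + C
    return C, Ls, Rs, Lfe
```

Because “4” = Ls + Rs is itself transmitted, the decoder can recompute the same rounded half and recover C without loss under this assumption.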
[0011]
As shown in detail in FIG. 2, the first and second encoding units 2′-1 and 2′-2 constituting the encoding unit 2′ predictively encode the PCM data of the 2 ch “1”, “2” and the 4 ch “3” to “6”, respectively, for each channel, and transmit the predictive encoded data to the decoding side as a bit stream such as that shown in FIG. 3 via a recording medium 5 or a communication medium 6 such as a satellite link or a telephone line. On the decoding side, as shown in detail in FIG. 4, the first and second decoding units 3′-1 and 3′-2 constituting the decoding unit 3′ decode the predictive encoded data of the 2 ch “1”, “2” relating to the front group and the 4 ch “3” to “6” relating to the other group, respectively, into PCM data for each channel. The mix & matrix circuit 4′ then restores the original 6 ch (Lf, C, Rf, Ls, Rs, Lfe) based on equations (1) and (2) and outputs the stereo 2-ch data (L, R) as is.
[0012]
The encoding units 2′-1 and 2′-2 will be described in detail with reference to FIG. 2. The PCM data of each ch “1” to “6” is stored frame by frame in a one-frame buffer 10. The sample data of each ch “1” to “6” for one frame is applied to the prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and the head sample data of each frame of each ch “1” to “6” (stored in a restart header described later) is applied to the unpacking circuit 8 and the formatting circuit 19. The sampling frequency (fs) and the number of quantization bits (Qb) used when the PCM data was A/D converted are applied to the packing circuit 18 and the formatting circuit 19. For the PCM data of each ch “1” to “6”, each of the prediction circuits 13D1, 13D2, and 15D1 to 15D4 calculates a plurality of linear prediction values of the current signal from past signals in the time domain using a plurality of predictors (not shown) having different characteristics, and then calculates a prediction residual for each predictor from the original PCM data and these linear prediction values. The subsequent buffer/selectors 14D1, 14D2, and 16D1 to 16D4 temporarily store the prediction residuals calculated by the prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively, and select the minimum prediction residual for each subframe designated by the selection signal/DTS (decoding time stamp) generator 17.
[0013]
The selection signal generator 17 applies a bit number flag for the prediction residual to the packing circuit 18 and the formatting circuit 19, and applies to the formatting circuit 19 a predictor selection flag indicating the predictor with the minimum prediction residual, together with the correlation coefficients described later. The packing circuit 18 packs the prediction residuals for 6 ch selected by the buffer/selectors 14D1, 14D2, and 16D1 to 16D4 with the designated number of bits, based on the bit number flag designated by the selection signal generator 17.
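The per-subframe selection described above can be illustrated with a minimal sketch. The predictors themselves are not shown in the text, so the fixed polynomial predictors below are an assumption, as is the use of the residual bit width as the measure of the "minimum" residual; all names are hypothetical.

```python
# Hypothetical predictor set (an assumption): fixed polynomial predictors of
# order 0 to 3, each predicting the current sample from the previous samples.
PREDICTORS = [
    lambda h: 0,
    lambda h: h[-1],
    lambda h: 2 * h[-1] - h[-2],
    lambda h: 3 * h[-1] - 3 * h[-2] + h[-3],
]


def bits_needed(residuals):
    """Signed bit width able to hold every residual (stands in for the bit number flag)."""
    peak = max((r if r >= 0 else -r - 1) for r in residuals)
    return peak.bit_length() + 1


def encode_subframe(history, samples):
    """Return (predictor_flag, bit_width, residuals) for the narrowest residuals.

    history must hold at least the three samples preceding the subframe.
    """
    best = None
    for flag, predict in enumerate(PREDICTORS):
        h = list(history)
        residuals = []
        for x in samples:
            residuals.append(x - predict(h))
            h.append(x)
        width = bits_needed(residuals)
        if best is None or width < best[1]:
            best = (flag, width, residuals)
    return best
```

For a linear ramp, the second-order predictor already yields all-zero residuals, so it is the one selected and a single bit per residual suffices.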
[0014]
The subsequent formatting circuit 19 formats the data into user data as shown in FIG. 3. This user data consists of a variable-rate bit stream BS0 containing the predictive encoded data of the 2 ch (1), (2) relating to the front group, a variable-rate bit stream BS1 containing the predictive encoded data of the 4 ch (3) to (6) relating to the other group, and a bit stream header provided before the streams BS0 and BS1. In the streams BS0 and BS1 for one frame, the following are multiplexed:
- a frame header,
- the head sample data of one frame of each ch (1) to (6),
- a predictor selection flag for each subframe of each ch (1) to (6),
- a bit number flag for each subframe of each ch (1) to (6),
- a prediction residual data string (variable number of bits) of each ch (1) to (6), and
- the correlation coefficients described later.
With such predictive coding, when the original signal has, for example, a sampling frequency of 96 kHz, 24 quantization bits, and 6 channels, a compression rate of 71% can be achieved.
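As a rough worked example of what the 71% figure means in bit rate terms, the following sketch computes the uncompressed PCM rate for the stated format. It assumes that "compression rate" denotes compressed size relative to original size; the text does not define the term, so the second figure is illustrative only.

```python
def pcm_bitrate(fs_hz, bits, channels):
    """Uncompressed PCM bit rate in bit/s."""
    return fs_hz * bits * channels


raw = pcm_bitrate(96_000, 24, 6)  # 13,824,000 bit/s
# Assumption: 71% is the ratio of compressed size to original size.
compressed = raw * 71 // 100      # 9,815,040 bit/s
```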
[0015]
Next, the decoding units 3′-1 and 3′-2 will be described with reference to FIG. The variable rate bit stream data BS0 and BS1 in the above format are separated by the deformatting circuit 21 based on the stream data and the frame header. The first sample data of one frame of each channel “1” to “6” and the predictor selection flag are respectively applied to the prediction circuits 24D1, 24D2, 23D1 to 23D4, and the number of bits of each channel “1” to “6”. The flag and the prediction residual data string are applied to the unpacking circuit 22. Here, a plurality of predictors (not shown) in the prediction circuits 24D1, 24D2, 23D1 to 23D4 have the same characteristics as the plurality of predictors in the encoding-side prediction circuits 13D1, 13D2, and 15D1 to 15D4, respectively. Those having the same characteristics are selected by the predictor selection flag.
[0016]
The unpacking circuit 22 separates the prediction residual data strings of each ch “1” to “6” based on the respective bit number flags and outputs them to the prediction circuits 24D1, 24D2, and 23D1 to 23D4, respectively. In each of the prediction circuits 24D1, 24D2, and 23D1 to 23D4, the current prediction residual data of each ch “1” to “6” from the unpacking circuit 22 is added to the previous prediction value predicted by the one of the internal predictors selected by the predictor selection flag, thereby calculating the current prediction value, and the PCM data of each sample is then calculated with the head sample data of one frame as the reference.
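The decoding step can be sketched in the same spirit: the decoder applies the predictor named by the selection flag and adds each residual to its prediction. The predictor set below is a hypothetical stand-in (the text only requires that it match the encoder's), and the history argument plays the role of the head sample data.

```python
# Hypothetical predictor set; it must be identical to the one used by the encoder.
PREDICTORS = [
    lambda h: 0,
    lambda h: h[-1],
    lambda h: 2 * h[-1] - h[-2],
    lambda h: 3 * h[-1] - 3 * h[-2] + h[-3],
]


def decode_subframe(history, predictor_flag, residuals):
    """Rebuild PCM samples: each sample is the selected prediction plus its residual."""
    predict = PREDICTORS[predictor_flag]
    h = list(history)  # known starting samples (at least three for this predictor set)
    samples = []
    for r in residuals:
        x = predict(h) + r
        samples.append(x)
        h.append(x)
    return samples
```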
[0017]
When the variable-rate bit stream data predictively encoded by the encoding units 2′-1 and 2′-2 shown in FIG. 2 is recorded on a DVD audio disc as an example of a recording medium, it is packed into the audio (A) pack shown in FIG. 5. This pack consists of 2034 bytes of user data (A packet, V packet) to which a 14-byte pack header is added, made up of 4 bytes of pack start information, 6 bytes of SCR (System Clock Reference) information, 3 bytes of Mux rate information, and 1 byte of stuffing (1 pack = 2048 bytes in total). In this case, the SCR information, which is a time stamp, is set to “1” in the first pack of the ACB unit and made continuous within the same title, so that the timing of the A packs within the same title can be managed.
[0018]
As shown in detail in FIG. 6, the compressed PCM A packet consists of a 9- to 22-byte packet header, a compressed PCM private header, and 1 to 2015 bytes of audio data (compressed PCM) in the format shown in FIG. 3. The compressed PCM private header consists of:
- a 1-byte substream ID,
- a 2-byte UPC/EAN-ISRC (Universal Product Code/European Article Number-International Standard Recording Code) number and UPC/EAN-ISRC data,
- a 1-byte private header length,
- a 2-byte first access unit pointer,
- 4 bytes of audio data information (ADI), and
- 0 to 7 stuffing bytes.
[0019]
In the ADI, a forward access unit search pointer for searching for the access unit one second ahead and a backward access unit search pointer for searching for the access unit one second back are each set in 1 byte. Specifically, the forward access unit search pointer is set in the first byte of the ADI, and the backward access unit search pointer is set in the eighth byte.
Because the ADI is thus reduced to 4 bytes for compressed PCM, up to 2015 bytes of audio data can be stored.
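The byte counts given for the pack and the private header can be checked with a little arithmetic. The sketch below uses only the sizes stated in the text; the dictionary keys are hypothetical labels, and taking the minimum header sizes (9-byte packet header, no stuffing) to obtain the maximum payload is an assumption about how the 2015-byte figure arises.

```python
PACK_HEADER = {"pack_start": 4, "scr": 6, "mux_rate": 3, "stuffing": 1}
USER_DATA_BYTES = 2034

# Smallest private header: no stuffing bytes (field names are illustrative labels).
PRIVATE_HEADER_MIN = {
    "substream_id": 1,
    "upc_ean_isrc": 2,
    "private_header_length": 1,
    "first_access_unit_pointer": 2,
    "adi": 4,
    "stuffing": 0,
}
PACKET_HEADER_MIN = 9

pack_size = sum(PACK_HEADER.values()) + USER_DATA_BYTES
max_audio_bytes = USER_DATA_BYTES - PACKET_HEADER_MIN - sum(PRIVATE_HEADER_MIN.values())
```

Under these assumptions the totals come out to 2048 bytes per pack and 2015 bytes of audio data, matching the figures in the description.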
[0020]
The audio data area in the compressed PCM (PPCM) audio packet shown in FIG. 6 consists of a plurality of PPCM access units as shown in FIG. 7, and each PPCM access unit consists of PPCM sync information and a subpacket. The subpacket in the first PPCM access unit consists of a directory, substream “BS0”, a CRC (1 byte or 2 bytes), substream “BS1”, a CRC, and extra information, and the substreams “BS0” and “BS1” consist only of PPCM blocks. The subpackets in the second and subsequent PPCM access units likewise consist of a directory, substream “BS0”, a CRC, substream “BS1”, a CRC, and extra information, and the substreams “BS0” and “BS1” consist of a restart header and PPCM blocks.
[0021]
When the variable-rate bit stream data predictively encoded by the encoding units 2′-1 and 2′-2 shown in FIG. 2 is transmitted via a network, the encoding side, as shown in FIG. 8, packetizes the data for transmission (step S41), adds a packet header (step S42), and sends the packet out onto the network (step S43). The decoding side, as shown in FIG. 9, removes the header (step S51), restores the data (step S52), and stores the data in memory to await decoding (step S53).
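Steps S41 to S43 and S51 to S53 can be sketched as a simple packetize and depacketize pair. The header layout used here (a sequence number and a payload length) and the payload size are purely hypothetical; the text does not specify any packet format.

```python
import struct

# Hypothetical packet header: 4-byte sequence number, 2-byte payload length.
HEADER = struct.Struct(">IH")


def packetize(stream: bytes, payload_size: int = 1024):
    """S41-S42: split the bit stream into payloads and prepend a header to each."""
    packets = []
    for seq, start in enumerate(range(0, len(stream), payload_size)):
        payload = stream[start:start + payload_size]
        packets.append(HEADER.pack(seq, len(payload)) + payload)
    return packets  # S43 would send these out onto the network


def depacketize(packets):
    """S51-S53: strip the headers, restore the data in order, and buffer it for decoding."""
    parts = {}
    for pkt in packets:
        seq, length = HEADER.unpack_from(pkt)
        parts[seq] = pkt[HEADER.size:HEADER.size + length]
    return b"".join(parts[seq] for seq in sorted(parts))
```

The sequence number lets the receiving side restore the stream even if packets arrive out of order, which is one reason a header is worth adding in step S42.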
[0022]
In the above embodiment, the stereo 2-ch data (L, R) is transmitted as is, but it may instead be converted by
“1” = L + R
“2” = L − R
“3” to “5” are the same
“6” = Lfe − a × C
where 0 ≦ a ≦ 1 … (2)′
so that all six channels “1” to “6” are converted into correlated signals and then predictively encoded (second embodiment). In this case, the mix & matrix circuit 4′ on the decoding side can generate channel L by adding channels “1” and “2” and channel R by subtracting them.
In the above embodiment, both the multi-channel (6 ch) and stereo (2 ch) signals are restored, but it goes without saying that only one of them may be restored.
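The sum and difference conversion of the second embodiment can be sketched as below. Adding “1” and “2” gives 2L and subtracting gives 2R, so the sketch halves the results; that halving is spelled out here as an inference and is not stated in the text. Function names are hypothetical.

```python
def to_sum_diff(L, R):
    """Signals "1" and "2" of equation (2)': sum and difference of L and R."""
    return L + R, L - R


def from_sum_diff(s1, s2):
    """Recover L and R; s1 + s2 == 2 * L and s1 - s2 == 2 * R, so both halvings are exact."""
    return (s1 + s2) // 2, (s1 - s2) // 2
```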
[0023]
FIG. 10 shows a third embodiment. In this case, without downmixing, the 2 ch “1”, “2” relating to the front group are transmitted as
“1” = Lf + Rf
“2” = Lf − Rf
On the playback side, either the non-downmixed stereo two-channel signals Lf, Rf output from the subsequent mix & matrix circuit 4′ or the stereo two-channel signals L, R obtained by downmixing within this circuit 4′ can be used as desired.
[0024]
Next, a fourth embodiment will be described with reference to FIGS. 11, 12, and 13. In the above embodiments, one group of correlated signals “1” to “6” is predictively encoded, whereas in the fourth embodiment a plurality of groups of correlated signals are generated and predictively encoded, and the predictive encoded data of the group with the highest compression rate is selected. In addition, in this embodiment the encoding within one group is not divided into 2 ch relating to the front group and 4 ch relating to the other group as in the preceding embodiments; instead, the channels are encoded together in a single encoding process, and FIG. 11 is drawn as the counterpart of FIG. 1 described above. For this purpose, the encoding unit shown in FIG. 12 is provided with first to n-th correlation circuits 1-1 to 1-n, and these n correlation circuits 1-1 to 1-n convert, for example, 6-ch (Lf, C, Rf, Ls, Rs, Lfe) PCM data into n kinds of 6-ch signals “1” to “6” with different correlations.
[0025]
For example, the first correlation circuit 1-1 converts as follows:
“1” = Lf
“2” = C− (Ls + Rs) / 2
“3” = Rf−Lf
“4” = Ls−a × Lfe
“5” = Rs−b × Rf
“6” = Lfe
Further, the nth correlation circuit 1-n converts as follows,
“1” = Lf + Rf
“2” = C−Lf
“3” = Rf−Lf
“4” = Ls−Lf
“5” = Rs−Lf
“6” = Lfe-C
In addition, other correlation circuits perform conversion as in the first embodiment.
[0026]
A prediction circuit 15 and a buffer/selector 16 are provided for each of the correlation circuits 1-1 to 1-n, and the correlation selection signal generator 17b selects the group with the highest compression rate based on the data amount of the minimum prediction residuals of each group. At this time, the formatting circuit 19 additionally multiplexes the selection flag (the correlation circuit selection flag and the correlation coefficients a and b of that correlation circuit).
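The group selection of the fourth embodiment can be illustrated with a small sketch that applies the two example correlation circuits to a frame and keeps the one that looks cheapest to encode. The cost function (sum of absolute first-order differences) is only a stand-in for the real data amount of the prediction residuals, the coefficients a and b are fixed to 1, and the rounding of the division by 2 is assumed to be floor division; all names are hypothetical.

```python
def first_circuit(Lf, C, Rf, Ls, Rs, Lfe, a=1, b=1):
    """First correlation circuit 1-1 (a = b = 1 assumed for illustration)."""
    return Lf, C - (Ls + Rs) // 2, Rf - Lf, Ls - a * Lfe, Rs - b * Rf, Lfe


def nth_circuit(Lf, C, Rf, Ls, Rs, Lfe):
    """n-th correlation circuit 1-n."""
    return Lf + Rf, C - Lf, Rf - Lf, Ls - Lf, Rs - Lf, Lfe - C


CIRCUITS = [first_circuit, nth_circuit]


def cost(frame):
    """Stand-in for the data amount: sum of absolute first-order differences per channel."""
    total = 0
    for channel in zip(*frame):
        total += sum(abs(cur - prev) for prev, cur in zip(channel, channel[1:]))
    return total


def select_circuit(frame):
    """Return the selection flag (index) of the circuit with the lowest estimated cost."""
    return min(range(len(CIRCUITS)),
               key=lambda k: cost([CIRCUITS[k](*sample) for sample in frame]))
```

The returned index corresponds to the correlation circuit selection flag that the formatting circuit would multiplex so that the decoder can apply the matching inverse.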
[0027]
FIG. 13 shows a data area corresponding to FIG. 6 described above. In this embodiment, the sub-stream “BS1” is not used, and only the sub-stream “BS0” is used.
[0028]
On the decoding side shown in FIG. 14, n correlation circuits 4-1 to 4-n (or a single correlation circuit 4 whose coefficients a and b can be changed) are provided to correspond to the correlation circuits 1-1 to 1-n on the encoding side. When the n groups of prediction circuits shown in FIG. 12 have the same configuration, the decoding apparatus does not need prediction circuits for n groups; as shown in FIG. 14, prediction circuits for one group suffice. Based on the selection flag transmitted from the encoding apparatus, one of the correlation circuits 4-1 to 4-n is selected, or the coefficients a and b are set, to restore the original 6 ch (Lf, C, Rf, Ls, Rs, Lfe), and the multi-channel signal is downmixed according to equation (1) to generate the stereo 2-ch data (L, R).
The 6-channel system with channels “1” to “6” is only an example, and other systems such as a 5-channel system may be used.
[0029]
In the first embodiment described above, one kind of correlated signals “1” to “6” is predictively encoded, but the group of signals “1” to “6” and the group of original signals (Lf, C, Rf, Ls, Rs, Lfe) may both be predictively encoded and the group with the higher compression rate selected.
[0030]
[Effect of the invention]
As described above, according to the present invention, an audio signal with a compression rate improved beyond what was previously possible can be transmitted, and this audio signal can be decoded without problems.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a first embodiment of an audio encoding apparatus to which the present invention is applied and the corresponding audio decoding apparatus.
FIG. 2 is a block diagram showing the encoding unit of FIG. 1 in detail.
FIG. 3 is an explanatory diagram showing the bit stream encoded by the encoding unit of FIGS. 1 and 2.
FIG. 4 is a block diagram showing the decoding unit of FIG. 1 in detail.
FIG. 5 is an explanatory diagram showing the format of a DVD pack.
FIG. 6 is an explanatory diagram showing the format of a DVD audio pack.
FIG. 7 is an explanatory diagram showing the format of the audio data area of FIG. 6 in detail.
FIG. 8 is a flowchart showing the audio transmission method.
FIG. 9 is a flowchart showing the audio transmission method.
FIG. 10 is a block diagram showing an audio encoding apparatus of a third embodiment and the corresponding audio decoding apparatus.
FIG. 11 is a block diagram showing a fourth embodiment of an audio encoding apparatus to which the present invention is applied and the corresponding audio decoding apparatus.
FIG. 12 is a block diagram showing the audio encoding apparatus of the fourth embodiment.
FIG. 13 is an explanatory diagram of another embodiment corresponding to FIG. 7.
FIG. 14 is a block diagram showing the audio decoding apparatus of the fourth embodiment.
[Explanation of symbols]
1′ 6ch mix & matrix circuit (correlation means, downmix means)
13D1, 13D2, 15D1 to 15D4 Prediction circuits (which together with the buffer/selectors 14D1, 14D2, 16D1 to 16D4 constitute predictive coding means)
14D1, 14D2, 16D1 to 16D4 Buffer/selectors
19 Formatting circuit (formatting means)

Claims

An audio signal transmission method for transmitting an audio signal encoded by an audio encoding method comprising the steps of:
downmixing an original multi-channel audio signal to convert it into a stereo two-channel audio signal;
converting the audio signals of the plurality of original channels that are not downmixed into correlated audio signals by a predetermined matrix operation;
obtaining a head sample value in response to the audio signal input for each channel of the stereo two channels and the correlated audio signals, predicting linear prediction values of the current signal from past signals in the time domain by a plurality of linear prediction methods having different characteristics, and selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal; and
forming a data structure including header information and user data that includes a compressed PCM private header and an audio compressed PCM data portion, recording predictive encoded data, which includes the head sample value, the prediction residual, and the linear prediction method of each channel selected in the preceding step, in the audio compressed PCM data portion, and placing the UPC/EAN-ISRC number and UPC/EAN-ISRC data of the audio signal in the compressed PCM private header,
wherein the predictive encoded data including the selected head sample value, prediction residual, and linear prediction method, together with the UPC/EAN-ISRC number and UPC/EAN-ISRC data of the audio signal, is packetized and transmitted.

An audio decoding method for decoding the original audio signal from data encoded by an audio encoding method comprising the steps of:
downmixing an original multi-channel audio signal to convert it into a stereo two-channel audio signal;
converting the audio signals of the plurality of original channels that are not downmixed into correlated audio signals by a predetermined matrix operation;
obtaining a head sample value in response to the audio signal input for each channel of the stereo two channels and the correlated audio signals, predicting linear prediction values of the current signal from past signals in the time domain by a plurality of linear prediction methods having different characteristics, and selecting the linear prediction method that minimizes the prediction residual obtained from the predicted linear prediction value and the audio signal; and
forming a data structure including header information and user data that includes a compressed PCM private header and an audio compressed PCM data portion, recording predictive encoded data, which includes the head sample value, the prediction residual, and the linear prediction method of each channel selected in the preceding step, in the audio compressed PCM data portion, and placing the UPC/EAN-ISRC number and UPC/EAN-ISRC data of the audio signal in the compressed PCM private header,
the audio decoding method comprising the steps of:
calculating a prediction value from the predictive encoded data including the selected head sample value, prediction residual, and linear prediction method; and
restoring the first plurality of channels of digital audio signals from the calculated prediction value.