JP4103155B2

JP4103155B2 - Digital audio signal processing apparatus and processing method

Info

Publication number: JP4103155B2
Application number: JP25331897A
Authority: JP
Inventors: 眞吾中田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1997-09-18
Filing date: 1997-09-18
Publication date: 2008-06-18
Anticipated expiration: 2017-09-18
Also published as: JPH1196693A

Description

【０００１】
【発明の属する技術分野】
この発明は、例えばディジタルビデオ再生装置におけるディジタルオーディオ信号の処理装置および処理方法に関する。
【０００２】
【従来の技術】
近年、ディジタル化されたビデオ信号を記録／再生するようなディジタルビデオテープレコーダが出現している。このようなディジタルビデオテープレコーダでは、記録時にはディジタル化されたビデオ信号に誤り訂正符号が付された信号が磁気テープに記録される。この磁気テープに対する記録は、従来のアナログ方式と同様、ヘリカルスキャンで、回転ヘッドによる傾斜アジマス記録という方式が用いられる。この傾斜アジマス記録方式とは、ギャップの延長方向が互いに異なる２つのヘッドによってテープ上に信号を記録していく方法である。
【０００３】
再生時には、このテープ上に形成された斜めのトラックを記録時と同じように回転ヘッドでスキャンすることによって再生ＲＦ信号を得ている。この再生ＲＦ信号は、アンプ、イコライザなどを介してディジタルのデータ列である再生データとされる。また、この再生データは、再生信号と同期したクロックを生成するＰＬＬにも供給され、再生クロックが生成される。
【０００４】
実際には、ビデオ信号の１フレーム分のデータが複数、例えば１０トラックにわたって記録される。したがって、再生時にも、回転ヘッドがこの１０トラックをスキャンすることでビデオ信号１フレーム分のデータが得られる。
【０００５】
当然のことながら、ディジタルビデオテープレコーダにおいては、ビデオデータと共にオーディオデータも記録される。このとき、ＮＴＳＣによる（５２５／６０）方式のビデオ信号におけるフィールド周波数は、５９．９４Ｈｚとされている。それに対して、オーディオデータのサンプリング周波数は、４８ｋＨｚ，４４．１ｋＨｚあるいは、３２ｋＨｚなどとされている。したがって、１ビデオフレームの周期とオーディオデータのサンプリング周波数とが整数比の関係にない。
【０００６】
例えば、サンプリング周波数が４８ｋＨｚの場合、１ビデオフレーム中のオーディオサンプル数は、
４８０００Ｈｚ×（２〔フィールド〕／５９．９４Ｈｚ）≒１６０１．６サンプル
とされ、１フレーム当たり０．６サンプル分の端数が生じる。同様に、サンプリング周波数が３２ｋＨｚの場合、
３２０００Ｈｚ×（２〔フィールド〕／５９．９４Ｈｚ）≒１０６７．７３４サンプル
とされ、１フレーム当たり略０．７３４サンプル分の端数が生じる。
【０００７】
そのため、オーディオデータにおいては、整数個のサンプル数が入るフレームの組み合わせによって、平均的ビデオフレームとオーディオデータのサンプリング周波数との関係が規定のものとなるようにされている。上述の４８ｋＨｚの例では、５フレーム周期でサンプル数を調整することによって、端数を吸収して整数サンプル数での処理が行なえるようになる。例えば、オーディオデータのサンプリング周波数が４８ｋＨｚの場合においては、１ビデオフレームに対してオーディオデータ１６００あるいは１６０１サンプル入るフレームと、１６０２サンプルが入るフレームとの組み合わせが用いられる。
【０００８】
オーディオデータは、上述した１ビデオフレーム分のデータが記録される１０トラックのうち、例えば前半の５トラックには左チャンネル（L-ch）のオーディオデータが、後半の５トラックには右チャンネル（R-ch）のオーディオデータが記録される。後述するように、オーディオデータは、１ビデオフレーム毎に異なるサンプル数を有する場合があり、そのため、１ビデオフレーム中のオーディオデータのサンプル数の情報が記録される。これは、ＡＦＳＩＺＥと称され、L-chおよびR-chそれぞれについて、対応するトラックにトラック毎の所定の領域に記録される。
【０００９】
そして、再生時にこのＡＦＳＩＺＥが読み出され、例えばオーディオ再生処理回路におけるＰＬＬの手掛かりとして用いられる。すなわち、オーディオデータは、ビデオフレームに対して位相が合っていなければならないため、このＡＦＳＩＺＥによって再生クロックの周波数を変えるのである。
【００１０】
ところで、ディジタルビデオのフォーマットにおいて、ディジタルオーディオ信号の扱いとして、ビデオフレームと非同期で処理を行なうＵｎｌｏｃｋｅｄモードと、ビデオフレームに対して同期を取るＬｏｃｋｅｄモードと称されるモードとの２モードが規定されている。
【００１１】
Ｕｎｌｏｃｋモードでは、１フレームに入るサンプル数が、例えば１５８０サンプル〜１６２０サンプルといった所定の範囲内で可変とされる。標準的には、１フレームには、１６０１サンプルあるいは１６０２サンプルが入る。１６０１サンプルの２フレームと１６０２サンプルの３フレームとが組み合わされ、５フレームを周期として、１６０１サンプルのフレームと１６０２サンプルのフレームとが繰り返されることが予想される。Ｕｎｌｏｃｋモードでは、上述のＡＦＳＩＺＥに基づき、フレーム毎のオーディオデータのサンプル数を求める。
【００１２】
一方、Ｌｏｃｋｅｄモードでは、ビデオフレームとの同期を取るために、１フレーム当たりのディジタルオーディオデータのサンプル数が固定的に決められている。すなわち、Ｌｏｃｋｅｄモードでは、１フレーム当たりのオーディオデータのサンプル数は、１６００サンプルまたは１６０２サンプルである。Ｌｏｃｋｅｄモードでは、１６００サンプルの１フレームと、１６０２サンプルの４フレームとが組み合わされる。また、これらの１６００および１６０２サンプルのフレームの並び方も、固定的とされる。
【００１３】
このように、オーディオデータは、フレーム毎にサンプル数が異なる。そのため、特にアナログ処理において、ビデオフレームとオーディオデータとを対応させるために、フレーム毎にオーディオデータのサンプリング周波数を変えてやる必要がある。そのためには、上述したＡＦＳＩＺＥデータが用いられる。ＡＦＳＩＺＥデータに基づき、オーディオデータのサンプリング周波数を生成するＰＬＬ(Phase Locked Loop) の動作を制御することで、フレーム毎に異なるサンプリング周波数を得る。図１７は、従来の技術による、ＡＦＳＩＺＥデータを用いたＰＬＬ回路１００の構成の一例を示す。
【００１４】
図１７に示されるＰＬＬ回路１００において、基準フレーム信号が端子１１０を介して位相比較器１１２の一方の入力端に供給される。ＡＦＳＩＺＥデータが端子１１１を介して、フレームカウンタ１１６の一方の入力端に供給される。フレームカウンタ１１６では、ＶＣＯ（電圧制御発振器）１１４から出力された信号がカウントされ、ＡＦＳＩＺＥデータ分カウントしたところでフレーム信号が出力される。このようにして帰還フレームが再生される。このフレーム信号が位相比較器１１２の他方の入力端に供給される。
【００１５】
位相比較器１１２において、基準フレーム信号とカウンタ１１６から供給されるフレーム信号とが比較され、位相誤差データが出力される。位相誤差データがローパスフィルタ１１３を介してＶＣＯ１１４に供給される。ＶＣＯ１１４では、位相誤差を打ち消すような周波数の信号が出力される。この信号がフレームカウンタ１１６に供給される。また、ＶＣＯ１１４の出力は、１／ｎ分周器１１５にも供給され、分周比ｎで分周されることにより、所定のサンプリング周波数を有するオーディオ信号のサンプリングクロックとされ、出力端１１７に導出される。
【００１６】
このような構成とすることによって、位相比較器１１２では、基準フレーム信号とフレームカウンタ１１６においてＡＦＳＩＺＥデータに応じて出力されたフレーム信号とが比較されるため、ＡＦＳＩＺＥデータに応じたオーディオサンプリングクロックを、フレーム毎に得ることができる。
【００１７】
【発明が解決しようとする課題】
Ｌｏｃｋｅｄモードは、オーディオサンプリングクロックの安定した供給がその目的の一つとされる。しかしながら、Ｌｏｃｋｅｄモードにおいて、上述した従来技術による方法によってサンプリングクロックを得ようとした場合、所定フレーム周期で位相エラーが発生してしまうという問題点があった。例えば、サンプリング周波数が４８ｋＨｚのモードでは、５フレーム周期で、また、３２ｋＨｚのモードでは、１５フレーム周期で、ＡＦＳＩＺＥデータについて２サンプル分の位相エラーが発生してしまう。
【００１８】
また、Ｕｎｌｏｃｋモードでは、標準的な設定では、５フレーム周期で１６０２サンプル，１６０１サンプル，１６０２サンプル，１６０１サンプル，１６０２サンプルの繰り返しが予想される。これに対して、Ｌｏｃｋｅｄモードでは、５フレーム周期での、１６００サンプル，１６０２サンプル×４の繰り返しである。これで明らかなように、ＰＬＬの位相エラーについて考えた場合、ＬｏｃｋｅｄモードがＵｎｌｏｃｋモードに対して有利であるとは言い難いという問題点があった。
【００１９】
さらに、Ｌｏｃｋｅｄモードにおいて、サンプリングクロックの安定供給を行なうために、ＡＦＳＩＺＥサイズデータを用いないような、Ｕｎｌｏｃｋモードとは異なった方法でＰＬＬを行なうことが考えられる。しかしながら、この場合には、２種類のＰＬＬ回路を用意したり、また、１種類のＰＬＬ回路でも一部の動作設定を切り替えるような構成が必要となり、コストや回路規模などの点で不利になるという問題点があった。
【００２０】
さらにまた、この従来技術による方法では、フレーム毎に、あるいは数フレームに１フレームだけ、オーディオサンプリングクロックが異なってしまうことにより、例えばＰＬＬの追従能力の限界などにより、得られるサンプリングクロックにジッタが含まれるといった問題点があった。
【００２１】
したがって、この発明の目的は、同一の回路構成で以て、ＬｏｃｋｅｄモードおよびＵｎｌｏｃｋモードの双方で、オーディオサンプリングクロックを安定に供給できるようなディジタルオーディオ信号の処理装置および処理方法を提供することにある。
【００２２】
【課題を解決するための手段】
この発明は、上述した課題を解決するために、ビデオフレームと関連してディジタルオーディオ信号を扱うようにされたディジタルオーディオ信号の処理装置において、各ビデオフレームのディジタルオーディオ信号のサンプル数を示す制御情報を抽出する抽出手段と、制御情報により示される、連続する複数のビデオフレームのサンプル数の平均値を求める平均化手段と、平均化手段によって求められた平均値に基づき生成されるフレーム信号と基準フレーム信号とを比較して位相誤差データを形成し、位相誤差データに基づきディジタルオーディオ信号を処理するためのクロックを生成するクロック生成手段とを有することを特徴とするディジタルオーディオ信号の処理装置である。
【００２３】
また、この発明は、上述した課題を解決するために、ビデオフレームと関連してディジタルオーディオ信号を扱うようにされたディジタルオーディオ信号の処理方法において、各ビデオフレームのディジタルオーディオ信号のサンプル数を示す制御情報を抽出する抽出のステップと、制御情報により示される、連続する複数のビデオフレームのサンプル数の平均値を求める平均化のステップと、平均化のステップによって求められた平均値に基づき生成されるフレーム信号と基準フレーム信号とを比較して位相誤差データを形成し、位相誤差データに基づきディジタルオーディオ信号を処理するためのクロックを生成するクロック生成のステップとを有することを特徴とするディジタルオーディオ信号の処理方法である。
【００２４】
上述したように、この発明は、複数のビデオフレームにおけるディジタルオーディオ信号のサンプル数の平均値を求め、この平均値と基準フレーム信号とを比較することによってディジタルオーディオ信号を処理するためのクロックを生成するようにされているため、ディジタルオーディオ信号のサンプル数がビデオフレーム間で標準値近傍の変動を有していても、安定したクロックを生成することができる。
【００２５】
【発明の実施の形態】
以下、この発明の実施の一形態を、図面を参照しながら説明する。この発明では、連続する５フレームについて、順次オーディオデータのサンプル数の平均化を行い、平均化の結果に基づきオーディオサンプリングクロックを生成する。最初に、この発明を適用できる磁気再生装置の具体的な一例として、回転ヘッド型のディジタルＶＴＲ（ビデオテープレコーダ）について説明する。図１に示すように、テープ上に斜めにトラックが形成される。Ｔ０、Ｔ１は、トラックナンバーを示し、隣接するトラック間のアジマスが相違する傾斜アジマス記録がなされる。図２は、１本のトラックを示す。トラック入口側には、ＩＴＩ（Insert and Track Information）なるアフレコを確実に行うためのタイミングブロックが設けられる。これは、それ以降のエリアに書かれたデータをアフレコして書き直す場合に、そのエリアの位置決めを正確にするために設けられるものである。
【００２６】
この例では、コンポジットディジタルカラービデオ信号が輝度信号Ｙ、色差信号Ｃ_RおよびＣ_Bからなるコンポーネント信号に変換され、コンポーネント信号がＤＣＴ変換と可変長符号により圧縮され、回転ヘッドにより磁気テープに記録される。記録方式としては、ＳＤ方式（５２５ライン／６０Ｈｚ、６２５ライン／５０Ｈｚ）とＨＤ方式（１１２５ライン／６０Ｈｚ、１２５０ライン／５０Ｈｚ）とが設定できる。
【００２７】
１フレーム当たりのトラック数は、ＳＤ方式の場合には、５２５ライン／６０Ｈｚでは図３に示されるように１０トラックとされ、６２５ライン／５０Ｈｚでは図４に示されるように１２トラックとされる。図示しないが、ＨＤ方式の場合には、１フレーム当たりのトラック数がＳＤ方式の倍、つまり、２０トラック（１１２５ライン／６０Ｈｚの場合）、または２４トラック（１２５０ライン／５０Ｈｚの場合）である。オーディオサンプリング周波数が４４．１ｋHzおよび４８ｋHzの場合では、前半の５トラック（６トラック）にＬチャンネルのオーディオデータが記録され、その後半の５トラック（６トラック）にＲチャンネルのオーディオデータが記録される。
【００２８】
図２のトラックフォーマットに示すように、ＩＴＩエリアの後に、ヘッドの走査順に、オーディオデータ、ビデオデータおよびサブコードデータが記録される。ビデオデータおよびオーディオデータを記録するエリアには、それぞれに付加情報を記録するための補助的データ（ＡＵＸ）を書込むエリアが設けられる。ＡＵＸには、記録日時や記録時間などオーディオ、ビデオデータ以外のデータを書込むことができる。サブコードデータ、ＡＵＸ、カセットに内蔵した半導体メモリに記録するデータは、形式を共通とされている。この形式は、パック構造と称される。パックとは、データグループの最小単位のことである。
【００２９】
図５Ａに示すように、一つのパックは、５バイト（ＰＣ０〜ＰＣ４）から構成される。先頭の１バイト（ＰＣ０）がヘッダであり、残りの４バイトがデータである。ヘッダの１バイトは、上位４ビットと下位の４ビットに分かれ、上位４ビットの上位ヘッダと下位４ビットの下位ヘッダとからなる階層構造を形成する。図５Ｂは、ヘッダバイトＰＣ０が（０１０１００００）とされるオーディオＡＵＸソースパックを示す。このパック内のデータ、例えばバイトＰＣ１内のデータは、次に記すように規定される。
【００３０】
ＬＦ（１ビット）：ビデオサンプリング周波数とオーディオサンプリング周波数とがロックしているかどうかの指示
ＡＦＳＩＺＥ（６ビット）：１ビデオフレーム内のオーディオフレームの大きさ（オーディオサンプル数）の指示
この発明では、このＡＦＳＩＺＥが関連している。
【００３１】
ビデオフレーム周波数は、ＮＴＳＣによる（５２５／６０）方式の場合では、２９．９７Ｈｚである。一方、オーディオのサンプリング周波数が例えば４８ｋＨｚの場合では、ビデオフレーム内のオーディオサンプル数が整数とならず、略１６０１．６となる。そこで、従来技術で上述したように、各ビデオフレームに対して、この数に近い整数のオーディオサンプル数を配分し、平均的なオーディオサンプル数が上述の数に一致するようになされる。
【００３２】
Ｕｎｌｏｃｋモードの場合のＡＦＳＩＺＥ（例えば５２５／６０方式の場合）は、図５Ｃに示すように規定されている。この図５Ｃから分かるように、例えばサンプリング周波数が４８ｋHzの場合では、１ビデオフレーム当りのオーディオサンプル数として１５８０〜１６２０の範囲内の数をとりうる。そのトラック（フレーム）に記録されているオーディオサンプル数がＡＦＳＩＺＥによって指示される。
【００３３】
なお、従来技術で上述したように、Ｕｎｌｏｃｋモードでは、１６０１サンプル×２フレームおよび１６０２サンプル×３フレームの組み合わせが標準的な設定とされる。また、Ｌｏｃｋｅｄモードでは１６００サンプル×１フレームおよび１６０２サンプル×４フレームの組み合わせが規定される。
【００３４】
オーディオデータ、ビデオデータ、サブコードがそれぞれ記録されるエリアは、それぞれオーディオセクタ、ビデオセクタ、サブコードセクタと呼ばれる。これらのセクタ間には、データを記録していないギャップＧ１、Ｇ２、Ｇ３が配される。オーディオセクタは、プリアンブル（プリシンクブロック）ＰＲ１、データ部（１４シンクブロック）およびポストアンブルＰＯ１（ポストシンクブロック）からなる。
【００３５】
オーディオシンクブロックは、図６のように、９０バイトで構成される。前半の５バイトは、シンクおよびＩＤデータである。オーディオデータ（７２バイト）およびオーディオＡＵＸ（ＡＡＵＸ）（５バイト）が１シンクブロックに含まれる。このデータが積符号によってエラー訂正符号化される。すなわち、水平方向に整列する７７バイトに対して内符号（Ｃ１符号と称される）の符号化がなされる。具体的には、（８５，７７）リード・ソロモン符号がＣ１符号として使用され、８バイトのＣ１（内符号）パリティが付加される。Ｃ１符号の系列の方向がデータの記録／再生方向である。また、垂直方向に並ぶ９バイトのデータに対して、外符号（Ｃ２符号と称される）のエラー訂正符号化がなされる。具体的には、（１４，９）リード・ソロモン符号がＣ２符号として使用され、５バイトのＣ２（外符号）パリティが付加される。
【００３６】
ビデオセクタは、プリアンブル（プリシンクブロック）ＰＲ２、データ部（１４９シンクブロック）およびポストアンブルＰＯ２（ポストシンクブロック）からなる。図７は、ビデオセクタの構成を示す。プリアンブルおよびポストアンブルの構成は、図６に示されるオーディオセクタと同様である。ビデオセクタ内に１４９個含まれるビデオシンクブロックは、オーディオシンクブロックと同様に９０バイトで１シンクブロックが構成される。
【００３７】
シンクブロックの先頭の５バイトは、シンクおよびＩＤである。データ部は７７バイトで、オーディオデータと同様の積符号のエラー訂正符号化がなされる。具体的には、（８５，７７）リード・ソロモン符号がＣ１符号として使用され、また、（１４９，１３８）リード・ソロモン符号がＣ２符号として使用される。そして、Ｃ１（内符号）パリティ（８バイト）とＣ２（外符号）パリティ（１１バイト）がそれぞれ付加されている。シンクブロック番号１９および２０の２シンクブロックと、Ｃ２パリティの直前の１シンクブロックはビデオＡＵＸ（ＶＡＵＸ）専用のシンクで、７７バイトのデータはＶＡＵＸデータとして用いられる。ＶＡＵＸおよびＣ２パリティ以外の中央部の１３５シンクブロックは、圧縮されたビデオ信号のビデオデータが格納されるエリアである。
【００３８】
さらに、図８は、サブコードセクタの構成を示す。サブコードセクタのプリアンブル、ポストアンブルには、オーディオセクタやビデオセクタと異なりプリシンクおよびポストシンクが存在しない。サブコードシンクブロックは、１２バイトの長さであり、その前半の５バイトは、シンクおよびＩＤである。続く５バイトはデータ部で、データ部に対しては、Ｃ１符号の符号化のみがなされる。そして、Ｃ１パリティ（２バイト）が付加される。このように、積符号構成は、サブコードでは、採用されていない。これは、サブコードが主として高速サーチ用のものであり、Ｃ２パリティを再生できることが少ないからである。また、２００倍程度まで高速サーチするために、シンク長も１２バイトと短くしてある。サブコードシンクブロックは、１トラック当り１２シンクブロックある。
【００３９】
図９は、上述したディジタルＶＴＲにこの発明を適用した場合の再生系の構成を示す。図示しないが、このディジタルＶＴＲは、マイクロプロセッサなどによるＣＰＵで制御されるものである。磁気テープ（カセットテープ）１から磁気ヘッド（回転ヘッド）２により再生された信号が再生信号処理回路３に供給される。再生信号処理回路３は、再生アンプ、再生等化器等で構成されている。再生信号処理回路３からの再生データがＣ１デコーダ４に供給される。Ｃ１デコーダ４は、Ｃ１符号のエラー訂正を行う。上述したＣ１符号の場合、例えばシンクブロック内の３シンボルまでのエラーを訂正する。
【００４０】
Ｃ１デコーダ４の出力がＴＢＣ（時間軸補償器）５に供給される。ＴＢＣ５は、メモリを有し、再生信号中に含まれる時間軸変動を除去する。ＴＢＣ５の出力データがフレームメモリ６に供給される。フレームメモリ６によって、データの順序がＣ２符号の順序へ変換され、次段のＣ２デコーダ７において、Ｃ２復号がなされる。一例として、Ｃ２復号では、Ｃ１符号でエラー訂正できなかった所定数までのエラーシンボルをイレージャ訂正によって訂正する。
【００４１】
Ｃ２デコーダ７の出力データがデシャフリングおよび補間処理回路８に供給される。デシャフリングは、記録処理においてなされているシャフリング（データの配列、順序の並び替え）を元の配列、順序に戻す処理である。補間処理は、Ｃ１符号およびＣ２符号によって訂正できなかったエラーを修整する処理である。ビデオデータの場合では、例えば１フレーム前の正しいデータによってエラーデータが修整される。また、デシャフリングおよび補間処理回路８は、メモリ９ａ、９ｂと入力切り換えスイッチ１０と出力切り換えスイッチ１１とからなる２バンク構成とされ、連続的に再生されたデータを処理して、連続的に出力することが可能とされている。デシャフリングおよび補間処理回路８から出力されるビデオ信号は、後段のビデオ信号処理系へ供給される。
【００４２】
また、再生オーディオ信号がスイッチング回路１２に供給され、チャンネル毎に分離されたオーディオデータが形成される。Ｌチャンネルのオーディオデータがオーディオ信号処理回路１３ａに供給され、Ｒチャンネルのオーディオデータがオーディオ信号処理回路１３ｂに供給される。これらオーディオ信号処理回路１３ａ、１３ｂは、デシャフリング、時間軸伸長、ＡＡＵＸ（オーディオＡＵＸ）の分離等の処理を行う。分離されたＡＡＵＸから上述したＡＦＳＩＺＥが抽出される。これらの処理のために、各信号処理回路には、１フレーム分の再生オーディオデータを記憶できるメモリが設けられており、このメモリの読出しアドレスがＡＦＳＩＺＥに基づいて生成される。
【００４３】
オーディオ信号処理回路１３ａからのＬチャンネルのデータがＤ／Ａ変換器１４ａに供給され、Ｄ／Ａ変換器１４ａからアナログのＬチャンネルのオーディオ信号が出力される。同様に、オーディオ信号処理回路１３ｂからのＲチャンネルのデータがＤ／Ａ変換器１４ｂに供給され、Ｄ／Ａ変換器１４ｂからアナログのＲチャンネルのオーディオ信号が出力される。
【００４４】
オーディオ信号処理回路１３ａおよび１３ｂ，Ｄ／Ａ変換器１４ａおよび１４ｂで用いられる、オーディオ処理のためのオーディオサンプリングクロックは、ＰＬＬ回路１５によって生成される。オーディオ信号処理回路１３ａでＡＡＵＸから抽出されたＡＦＳＩＺＥがＰＬＬ回路１５の一方の入力端に供給される。タイミング信号発生回路１６において、ビデオ信号処理系で用いられる基準フレーム信号が発生される。この基準フレーム信号がＰＬＬ回路１５の他方の入力端に供給される。ＰＬＬ回路１５では、これらＡＦＳＩＺＥおよび基準フレーム信号とに基づき、上述のオーディオサンプリングクロックを生成する。
【００４５】
図１０は、この実施の一形態におけるＰＬＬ回路１５の構成の一例を示す。ＡＦＳＩＺＥが端子２０に対して供給される。基準フレーム信号が端子２１に対して供給される。基準フレーム信号は、帰還フレームカウンタ２２に対してリセット信号として供給されると共に、演算処理回路２３および位相比較器２４のそれぞれの一方の入力端に供給される。ＡＦＳＩＺＥは、帰還フレームカウンタ２２および演算処理回路２３のそれぞれの一方の入力端に対して供給される。
【００４６】
演算処理回路２３には、後述する１／ｍ分周器２７からオーディオサンプリング周波数より高い周波数のクロックが動作クロックとして供給される。この動作クロックは、例えばオーディオサンプリング周波数の１０倍の周波数を有する。すなわち、オーディオサンプリング周波数が４８ｋＨｚであれば、４８０ｋＨｚの周波数を有する。勿論、オーディオサンプリング周波数の２５６倍といった、さらに高い周波数のクロックとしてもよい。これにより、演算処理回路２３において、より高い分解能力が実現される。
【００４７】
図１１は、演算処理回路２３の構成の一例をさらに詳細に示す。演算処理回路２３では、５フレーム分のＡＦＳＩＺＥの平均値が求められる。すなわち、基準フレーム信号のタイミングで遅延される遅延素子を４個用い、ＡＦＳＩＺＥを順次遅延させ、入力ＡＦＳＩＺＥおよび遅延させたそれぞれのＡＦＳＩＺＥを加算ならびに除算して、平均値を求める。
【００４８】
この例では、遅延素子として、端子Ｄに供給された信号を、端子Ｅｎに供給される信号のタイミングで端子Ｑに出力するようなレジスタ４２ａ〜４２ｄが用いられる。端子４１から供給されたＡＦＳＩＺＥがレジスタ４２ａに供給され、端子４０から各レジスタ４２ａ〜４２ｄの端子Ｅｎに対して供給された基準フレーム信号により、順次レジスタ４２ｂ，４２ｃ，４２ｄへと送られる。このようにして得られた５フレーム分のＡＦＳＩＺＥが加算器４３で加算され、除算器４４で除算されることにより、５フレーム分のＡＦＳＩＺＥの平均値が算出される。このＡＦＳＩＺＥの平均値は、スイッチ回路４５を介して帰還フレームカウンタ４６に供給される。
【００４９】
こうして求められるＡＦＳＩＺＥの平均値は、例えば次のようになる。Ｌｏｃｋｅｄモードの場合、図１２Ａに示されるように５フレーム周期で１６００サンプルのフレームが到来し、平均化処理を行なうと、
（１６００＋１６０２×４）／５＝１６０１．６サンプル
となる。すなわち、０．６サンプルの端数が生じる。
【００５０】
一方、Ｕｎｌｏｃｋモードの場合、図１２Ｂに示されるように、５フレーム周期で、１６０２サンプルの３フレームと１６０１サンプルの２フレームとが混在して到来することが予想される。平均化処理を行なうと、
（１６０１×２＋１６０２×３）／５＝１６０１．６サンプル
となり、０．６サンプルの端数が生じる。
【００５１】
上述したように、この例に示される演算処理回路２３は、オーディオサンプリング周波数の１０倍の周波数を有する高速なクロックで動作している。すなわち、１／ｍ分周器２７から端子４９を介して、この高速なクロックが供給される。このクロックに基づき、帰還フレームカウンタ４６は、この例では、ＡＦＳＩＺＥの小数点第１位以下の分解能でカウントを行なうことが可能である。したがって、帰還フレームカウンタ４６は、ＡＦＳＩＺＥの平均化の際に端数として生じた０．６サンプルをカウントすることができる。
【００５２】
このようにして、帰還フレームカウンタ４６では、高速なクロックに基づきカウントを行ない、ＡＦＳＩＺＥの平均値に達すると、フレーム信号を出力する。このフレーム信号は、位相比較器２４の他方の入力端に供給される。演算処理回路２３では、このようにして、５フレーム周期でオーディオサンプル数の端数の正規化を行なっている。
【００５３】
位相比較器２４において、一方の入力端に供給された基準フレーム信号と、他方の入力端に供給されたフレーム信号とが比較され、位相誤差データが出力される。この位相誤差データは、ローパスフィルタ２５を介してＶＣＯ２６に供給される。ＶＣＯ２６では、この位相誤差データを打ち消すような周波数の信号を出力する。この信号は、上述の１／ｍ分周器２７および１／ｎ分周器２８に共に供給される。
【００５４】
１／ｍ分周器２７では、上述したように、オーディオサンプリング周波数の１０倍以上の周波数を有するクロックが得られるように、分周比ｍが選ばれる。１／ｍ分周器２７の出力が演算処理回路２３に動作クロックとして供給される。また、１／ｎ分周器２８では、分周出力としてオーディオサンプリング周波数を有するクロックが得られるように分周比ｎが選ばれる。１／ｎ分周器２８の分周出力が端子２９に対して供給され、オーディオサンプリングクロックとして出力される。また、それと共に、１／ｎ分周器２８の出力が帰還フレームカウンタ２２の他方の入力端に対して供給される。
【００５５】
帰還フレームカウンタ２２では、１／ｎ分周器２８の分周出力に基づき、ビデオフレーム毎のＡＦＳＩＺＥまでのカウントがなされる。カウントは、基準フレーム信号によってリセットされる。このカウントによって、帰還フレームカウンタ２２から、オーディオ処理のための動作フレーム信号が出力される。このオーディオ動作フレーム信号は、端子３１を介して外部に出力され、図９では省略されているが、オーディオ信号処理回路１３ａおよび１３ｂ，Ｄ／Ａ変換器１４ａおよび１４ｂに供給される。
【００５６】
なお、上述の構成では、急激な変位、例えば動作モードがオーディオサンプル周波数が４８ｋＨｚのモードから３２ｋＨｚのモードへと変化したような場合、クロックの追従が非常に遅くなってしまう。そこで、このような場合には、演算処理回路２３のスイッチ回路４５において端子４５ｂを選択する。こうすることによって、高速な動作への対応が可能とされる。
【００５７】
図１３は、５フレーム周期の処理を行なった場合の、１ビデオフレーム当たりのオーディオサンプル数の標準偏差を示す。ＡＦＳＩＺＥ入力パターンＡは、Ｌｏｃｋｅｄモードに対応し、パターンＢは、Ｕｎｌｏｃｋモードに対応する。この図で明らかなように、この発明による、５フレームでの平均化処理を行なった場合には、標準偏差の値が０となり、フレーム毎に処理を行なうよりも安定したクロックの供給がなされることがわかる。また、図１４は、フレーム毎に処理を行なった場合の、標準近傍のＡＦＳＩＺＥの、標準値（１６０１．１６）に対するオフセットを示す。フレーム毎の処理では、常にこの図１４に示されるようなオフセットが含まれることになる。
【００５８】
上述では、この発明がビデオがＮＴＳＣ方式であって、オーディオサンプリング周波数が４８ｋＨｚである場合に適用されるように説明したが、これはこの例に限定されるものではない。ビデオがＮＴＳＣ方式で、オーディオサンプリング周波数が３２ｋＨｚの例にも適用可能なものである。
【００５９】
従来技術で既に説明したように、３２ｋＨｚモードでは、１フレーム当たりのオーディオサンプリング数は、１０６７．７３４サンプルとされる。そこで、３２ｋＨｚモードでは、Ｌｏｃｋｅｄモードで以て１５フレーム周期で、端数の正規化を行なっている。図１５に示されるように、１０６６サンプルの３フレームと、１０６８サンプルの１２フレームとが組み合わされる。すなわち、
（１０６６〔サンプル〕×３＋１０６８〔サンプル〕×１２）×２＝３２０２８〔サンプル〕
となる。５フレームの周期内で、１０６６サンプルのフレームが６フレームおきに入れられる。
【００６０】
１５フレーム周期で端数の正規化を行なおうとする場合、本来であれば、基準フレーム信号のタイミングで遅延される遅延素子を１４個用意し、１５フレームでの演算を行なう必要がある。図１６は、３２ｋＨｚモードで、５，７，８，および１５の各フレーム周期で処理を行なった場合について、１ビデオフレーム当たりのオーディオサンプル数の標準偏差を示す。この図１６に示されるように、遅延素子の数を減らし、７および８フレーム周期を設定した場合でも、フレーム毎に処理を行なう例に比べて格段に安定したクロックを供給することができる。
【００６１】
さらに遅延素子を減らし、５フレーム周期に設定した場合でも、フレーム毎の処理に比べ、非常に好適な結果が得られる。この場合には、上述の４８ｋＨｚの場合と、回路を共通化できることは言うまでもない。
【００６２】
また、遅延素子の構成をこれらの何れに設定する場合でも、ＬｏｃｋｅｄモードとＵｎｌｏｃｋモードとで同一の処理を行なうことができる。
【００６３】
なお、上述では、この発明がディジタルＶＴＲに適用された例について説明したが、これはこの例に限定されるものではない。例えば、ＭＤ(Mini Disc) やＤＶＤ(Digital Versatile Disc)、ハードディスクといったディスク記録媒体からビデオ信号を再生するような場合にも、この発明を適用することができる。
【００６４】
【発明の効果】
以上説明したように、この発明によれば、複数フレームのＡＦＳＩＺＥを平均化し、平均化されたＡＦＳＩＺＥを用いてＰＬＬ処理を行なっているため、帰還フレームの安定化が図られ、オーディオサンプリングクロックを安定的に供給することができるという効果がある。
【００６５】
また、この発明では、ＡＦＳＩＺＥの複数フレームでの平均化により、ＬｏｃｋｅｄモードおよびＵｎｌｏｃｋモードとを考慮しなくても、同一構成で再生オーディオデータの処理を行なうことができるという効果がある。
【図面の簡単な説明】
【図１】この発明を適用することができるディジタルＶＴＲの一例のトラックパターンを示す略線図である。
【図２】１トラックのデータ配列を説明するための略線図である。
【図３】ディジタルＶＴＲの一例のトラックパターンを示す略線図である。
【図４】ディジタルＶＴＲの一例のトラックパターンを示す略線図である。
【図５】データのパック構造の説明に用いる略線図である。
【図６】オーディオセクタのデータ構造の一例を示す略線図である。
【図７】ビデオセクタのデータ構造の一例を示す略線図である。
【図８】サブコードセクタのデータ構造の一例を示す略線図である。
【図９】この発明の一実施例のブロック図である。
【図１０】この実施の一形態におけるＰＬＬ回路の構成の一例を示すブロック図である。
【図１１】演算処理回路の構成の一例を示すブロック図である。
【図１２】４８ｋＨｚモードのフレーム構成を説明するための図である。
【図１３】５フレーム周期の処理を行なった場合の、１ビデオフレーム当たりのオーディオサンプル数の標準偏差を示す略線図である。
【図１４】ＡＦＳＩＺＥの、標準値（１６０１．１６）に対するオフセットを示す略線図である。
【図１５】３２ｋＨｚモードのフレーム構成を説明するための図である。
【図１６】３２ｋＨｚモードで、５，７，８，および１５の各フレーム周期で処理を行なった場合の１ビデオフレーム当たりのオーディオサンプル数の標準偏差を示す略線図である。
【図１７】従来技術によるＰＬＬ回路の構成の一例を示すブロック図である。
【符号の説明】
１３ａ，１３ｂ・・・オーディオ信号処理回路、１４ａ，１４ｂ・・・Ｄ／Ａ変換器、１５・・・ＰＬＬ回路、２２・・・帰還フレームカウンタ、２３・・・演算処理回路、２４・・・位相比較器、２５・・・ローパスフィルタ、２６・・・ＶＣＯ、２７・・・１／ｍ分周器、２８・・・１／ｎ分周器、４２ａ〜４２ｄ・・・遅延素子として用いられるレジスタ、４３・・・加算器、４４・・・除算器、４６・・・帰還フレームカウンタ
[0001]
[Technical field of the invention]
The present invention relates to a digital audio signal processing apparatus and processing method in, for example, a digital video playback apparatus.
[0002]
[Prior art]
In recent years, digital video tape recorders that record and reproduce digitized video signals have appeared. In such a recorder, the digitized video signal is recorded on magnetic tape with an error correction code added. As in the conventional analog method, recording on the magnetic tape uses helical scanning, with a rotary head performing what is called inclined azimuth recording. Inclined azimuth recording is a method of recording signals on the tape with two heads whose gaps extend in different directions.
[0003]
At the time of reproduction, a reproduction RF signal is obtained by scanning an oblique track formed on the tape with a rotary head in the same manner as at the time of recording. This reproduction RF signal is converted into reproduction data which is a digital data string via an amplifier, an equalizer, and the like. The reproduction data is also supplied to a PLL that generates a clock synchronized with the reproduction signal, and a reproduction clock is generated.
[0004]
In practice, the data for one frame of the video signal is recorded across a plurality of tracks, for example 10 tracks. During reproduction, therefore, the rotary head scans these 10 tracks to obtain the data for one frame of the video signal.
[0005]
As a matter of course, in the digital video tape recorder, audio data is recorded together with video data. At this time, the field frequency in the video signal of the NTSC (525/60) system is 59.94 Hz. On the other hand, the sampling frequency of audio data is 48 kHz, 44.1 kHz, 32 kHz, or the like. Therefore, the period of one video frame and the sampling frequency of audio data are not in an integer ratio relationship.
[0006]
For example, when the sampling frequency is 48 kHz, the number of audio samples in one video frame is
48000 Hz × (2 [fields] / 59.94 Hz) ≈ 1601.6 samples,
which leaves a fraction of 0.6 samples per frame. Similarly, when the sampling frequency is 32 kHz, the number is
32000 Hz × (2 [fields] / 59.94 Hz) ≈ 1067.734 samples,
which leaves a fraction of approximately 0.734 samples per frame.
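The arithmetic above can be checked with a short sketch. It assumes the exact NTSC field rate of 60000/1001 Hz, of which 59.94 Hz is the usual rounding; the function name `samples_per_frame` is illustrative and does not come from the patent.

```python
from fractions import Fraction

# Assumption: exact NTSC field rate. The text rounds this to 59.94 Hz.
FIELD_RATE_HZ = Fraction(60000, 1001)
FIELDS_PER_FRAME = 2


def samples_per_frame(sampling_hz: int) -> Fraction:
    """Audio samples that fall within one video frame (generally not an integer)."""
    return sampling_hz * FIELDS_PER_FRAME / FIELD_RATE_HZ


for fs in (48000, 32000):
    n = samples_per_frame(fs)
    whole = n.numerator // n.denominator
    print(f"{fs} Hz: {float(n):.3f} samples/frame, fraction {float(n - whole):.3f}")
```

With the exact field rate, the 32 kHz result is 16016/15, or about 1067.733; the 1067.734 quoted in the text follows from using the rounded 59.94 Hz. The denominators 5 and 15 also suggest why 5-frame and 15-frame periods come up for 48 kHz and 32 kHz respectively.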
[0007]
Therefore, for audio data, frames each holding an integer number of samples are combined so that, on average, the relationship between the video frame and the audio sampling frequency matches the specified one. In the 48 kHz example above, adjusting the number of samples over a 5-frame cycle absorbs the fraction and allows processing with integer sample counts. For example, when the audio sampling frequency is 48 kHz, frames holding 1600 or 1601 audio samples are combined with frames holding 1602 samples per video frame.
[0008]
Of the 10 tracks on which the data for one video frame is recorded, the first 5 tracks, for example, carry the left channel (L-ch) audio data, and the second 5 tracks carry the right channel (R-ch) audio data. As will be described later, the audio data may have a different number of samples in each video frame, so information on the number of audio samples in one video frame is recorded. This information is called AFSIZE, and for each of L-ch and R-ch it is recorded in a predetermined area of each corresponding track.
[0009]
During reproduction, this AFSIZE is read out and used, for example, as a reference for the PLL in the audio reproduction processing circuit. That is, because the audio data must be in phase with the video frame, the frequency of the reproduction clock is changed according to this AFSIZE.
[0010]
In the digital video format, two modes are defined for handling digital audio signals: an Unlocked mode, in which processing is performed asynchronously with the video frame, and a Locked mode, in which the audio is synchronized with the video frame.
[0011]
In the Unlock mode, the number of samples entering one frame is variable within a predetermined range such as 1580 samples to 1620 samples. Typically, one frame contains 1601 samples or 1602 samples. It is expected that 2 frames of 1601 samples and 3 frames of 1602 samples are combined, and a frame of 1601 samples and a frame of 1602 samples are repeated with a period of 5 frames. In the Unlock mode, the number of samples of audio data for each frame is obtained based on the above AFSIZE.
[0012]
On the other hand, in the Locked mode, the number of samples of digital audio data per frame is fixedly determined in order to synchronize with the video frame. That is, in the Locked mode, the number of audio data samples per frame is 1600 samples or 1602 samples. In the Locked mode, 1 frame of 1600 samples and 4 frames of 1602 samples are combined. The arrangement of the frames of these 1600 and 1602 samples is also fixed.
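As a quick check of the two frame combinations described above, the following sketch sums one 5-frame period for each mode. The Locked ordering within the period is illustrative only, since the text says the arrangement is fixed but does not list it here; the names are hypothetical.

```python
# Per-frame sample counts over one 5-frame period at 48 kHz, taken from the text.
UNLOCK_STANDARD = [1602, 1601, 1602, 1601, 1602]   # 2 x 1601 and 3 x 1602
LOCKED = [1600, 1602, 1602, 1602, 1602]            # 1 x 1600 and 4 x 1602 (order assumed)


def period_summary(pattern):
    """Return (total samples in the period, average samples per frame)."""
    total = sum(pattern)
    return total, total / len(pattern)


for name, pattern in (("Unlock (standard)", UNLOCK_STANDARD), ("Locked", LOCKED)):
    total, mean = period_summary(pattern)
    print(f"{name}: {total} samples in {len(pattern)} frames, mean {mean}")
```

Both combinations carry the same total over five frames, so they share the same average of 1601.6 samples per frame and differ only in how the individual frames deviate from it.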
[0013]
Thus, the audio data has a different number of samples for each frame. Therefore, in particular, in analog processing, in order to associate video frames with audio data, it is necessary to change the sampling frequency of audio data for each frame. For that purpose, the above-described AFSIZE data is used. A different sampling frequency is obtained for each frame by controlling the operation of a PLL (Phase Locked Loop) that generates a sampling frequency of audio data based on the AFSIZE data. FIG. 17 shows an example of the configuration of a PLL circuit 100 using AFSIZE data according to the prior art.
[0014]
In the PLL circuit 100 shown in FIG. 17, the reference frame signal is supplied to one input terminal of the phase comparator 112 via the terminal 110. The AFSIZE data is supplied to one input terminal of the frame counter 116 via the terminal 111. The frame counter 116 counts the signal output from the VCO (voltage controlled oscillator) 114 and outputs a frame signal once it has counted the number given by the AFSIZE data. The feedback frame is regenerated in this way. This frame signal is supplied to the other input terminal of the phase comparator 112.
[0015]
The phase comparator 112 compares the reference frame signal with the frame signal supplied from the counter 116, and outputs phase error data. The phase error data is supplied to the VCO 114 via the low-pass filter 113. The VCO 114 outputs a signal having a frequency that cancels the phase error. This signal is supplied to the frame counter 116. The output of the VCO 114 is also supplied to the 1/n frequency divider 115, where it is divided by the division ratio n to become the sampling clock for the audio signal at the predetermined sampling frequency, and is delivered to the output terminal 117.
[0016]
With this configuration, the phase comparator 112 compares the reference frame signal with the frame signal that the frame counter 116 outputs according to the AFSIZE data, so an audio sampling clock corresponding to the AFSIZE data can be obtained for each frame.
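To make the per-frame behavior concrete, the sketch below computes the sampling frequency at which a given AFSIZE count spans exactly one video frame. This is an idealized steady-state target, not a model of the loop dynamics (the low-pass filter and VCO response are ignored), and the exact frame rate of 30000/1001 Hz is an assumption; the function name is hypothetical.

```python
from fractions import Fraction

# Assumption: exact NTSC frame rate (about 29.97 Hz).
FRAME_RATE_HZ = Fraction(30000, 1001)


def locked_frequency_hz(afsize_samples: int) -> float:
    """Sampling frequency at which `afsize_samples` samples fill exactly one frame."""
    return float(afsize_samples * FRAME_RATE_HZ)


for count in (1600, 1601, 1602):
    print(count, round(locked_frequency_hz(count), 2))
```

Under these assumptions a 1600-sample frame asks the loop for roughly 47952 Hz and a 1602-sample frame for roughly 48012 Hz, so a loop driven by the raw per-frame count is repeatedly pulled between targets about 60 Hz apart.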
[0017]
[Problems to be solved by the invention]
One of the purposes of the Locked mode is to stably supply an audio sampling clock. However, in the Locked mode, there is a problem in that a phase error occurs at a predetermined frame period when an attempt is made to obtain a sampling clock by the above-described conventional method. For example, a phase error of 2 samples occurs in the AFSIZE data in a period of 5 frames in a mode with a sampling frequency of 48 kHz and in a period of 15 frames in a mode with 32 kHz.
[0018]
In the Unlock mode, it is expected that 1602 samples, 1601 samples, 1602 samples, 1601 samples, and 1602 samples will be repeated in a period of 5 frames under standard settings. On the other hand, in the Locked mode, 1600 samples and 1602 samples × 4 are repeated in a cycle of 5 frames. As is clear from this, when the phase error of the PLL is considered, there is a problem that it is difficult to say that the Locked mode is advantageous over the Unlock mode.
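The comparison between the two patterns can be quantified with a small sketch that measures how far each frame sits from the nominal 1601.6 samples. The Locked ordering is assumed, and the use of a population standard deviation is a choice made here for illustration, not something the text specifies.

```python
from statistics import pstdev

NOMINAL = 1601.6  # average samples per frame at 48 kHz
UNLOCK_STANDARD = [1602, 1601, 1602, 1601, 1602]
LOCKED = [1600, 1602, 1602, 1602, 1602]  # order within the period assumed


def spread(pattern):
    """Return (largest absolute offset from nominal, population standard deviation)."""
    worst = max(abs(n - NOMINAL) for n in pattern)
    return worst, pstdev(pattern)


for name, pattern in (("Unlock (standard)", UNLOCK_STANDARD), ("Locked", LOCKED)):
    worst, sd = spread(pattern)
    print(f"{name}: worst offset {worst:.1f}, std dev {sd:.3f}")
```

On this measure the Locked pattern actually strays further from the nominal value (1.6 samples against 0.6), which is consistent with the observation that Locked mode is not obviously easier on a PLL driven by per-frame counts.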
[0019]
Further, to supply the sampling clock stably in the Locked mode, it is conceivable to run the PLL by a method that does not use the AFSIZE data, unlike in the Unlock mode. In that case, however, two types of PLL circuit must be prepared, or a single PLL circuit must be configured so that some of its operation settings can be switched, which is disadvantageous in terms of cost and circuit scale.
[0020]
Furthermore, in this prior art method the audio sampling clock differs from frame to frame, or in only one frame out of every several frames, so the resulting sampling clock contains jitter owing to, for example, the limited tracking capability of the PLL.
[0021]
Accordingly, an object of the present invention is to provide a digital audio signal processing apparatus and processing method capable of stably supplying an audio sampling clock in both the Locked mode and the Unlock mode with the same circuit configuration.
[0022]
[Means for Solving the Problems]
In order to solve the above-described problem, the present invention is a digital audio signal processing apparatus that handles a digital audio signal in association with video frames, comprising: extracting means for extracting control information indicating the number of samples of the digital audio signal in each video frame; averaging means for obtaining an average value of the numbers of samples, indicated by the control information, of a plurality of consecutive video frames; and clock generation means for comparing a frame signal generated based on the average value obtained by the averaging means with a reference frame signal to form phase error data, and for generating, based on the phase error data, a clock for processing the digital audio signal.
[0023]
In order to solve the above-described problem, the present invention is also a digital audio signal processing method for handling a digital audio signal in association with video frames, comprising: an extraction step of extracting control information indicating the number of samples of the digital audio signal in each video frame; an averaging step of obtaining an average value of the numbers of samples, indicated by the control information, of a plurality of consecutive video frames; and a clock generation step of comparing a frame signal generated based on the average value obtained in the averaging step with a reference frame signal to form phase error data, and of generating, based on the phase error data, a clock for processing the digital audio signal.
[0024]
As described above, the present invention obtains an average value of the number of samples of a digital audio signal in a plurality of video frames, and generates a clock for processing the digital audio signal by comparing the average value with a reference frame signal. Thus, a stable clock can be generated even if the number of samples of the digital audio signal has a fluctuation near the standard value between video frames.
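The averaging idea can be sketched in software as a running mean over the most recent frames. This is only an illustration under stated assumptions: the window of five frames follows the embodiment, the class and function names are hypothetical, and the patent describes hardware rather than this code.

```python
from collections import deque
from fractions import Fraction


class AfsizeAverager:
    """Running mean of the most recent `window` per-frame sample counts."""

    def __init__(self, window: int = 5):
        # A bounded deque discards the oldest count once the window is full.
        self._history = deque(maxlen=window)

    def push(self, samples_in_frame: int) -> Fraction:
        self._history.append(samples_in_frame)
        return Fraction(sum(self._history), len(self._history))


def settled_averages(pattern, periods: int = 3, window: int = 5):
    """Feed `periods` repetitions of `pattern`; return averages once the window is full."""
    averager = AfsizeAverager(window)
    outputs = [averager.push(n) for _ in range(periods) for n in pattern]
    return outputs[window - 1:]


print(set(settled_averages([1600, 1602, 1602, 1602, 1602])))  # Locked pattern
print(set(settled_averages([1602, 1601, 1602, 1601, 1602])))  # standard Unlock pattern
```

Once the window has filled, both patterns yield the same constant average of 8008/5 (1601.6) on every frame, which is the property that lets one circuit serve both modes.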
[0025]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to the drawings. In the present invention, the number of audio samples is averaged successively over five consecutive frames, and an audio sampling clock is generated based on the averaged result. First, a rotary head type digital VTR (video tape recorder) will be described as a specific example of a magnetic reproducing apparatus to which the present invention can be applied. As shown in FIG. 1, tracks are formed obliquely on the tape. T0 and T1 indicate track numbers, and inclined azimuth recording is performed in which adjacent tracks differ in azimuth. FIG. 2 shows one track. On the track entrance side, a timing block called ITI (Insert and Track Information) is provided so that after-recording can be performed reliably. It serves to position an area accurately when the data written in the areas that follow is rewritten by after-recording.
[0026]
In this example, the composite digital color video signal is converted into component signals consisting of a luminance signal Y and color difference signals C_R and C_B. The component signals are compressed by DCT and variable-length coding, and recorded on the magnetic tape by the rotary head. As the recording method, an SD method (525 lines / 60 Hz, 625 lines / 50 Hz) or an HD method (1125 lines / 60 Hz, 1250 lines / 50 Hz) can be set.
[0027]
In the SD system, the number of tracks per frame is 10 at 525 lines / 60 Hz, as shown in FIG. 3, and 12 at 625 lines / 50 Hz, as shown in FIG. 4. Although not shown, in the HD system the number of tracks per frame is twice that of the SD system, that is, 20 tracks (1125 lines / 60 Hz) or 24 tracks (1250 lines / 50 Hz). When the audio sampling frequency is 44.1 kHz or 48 kHz, L channel audio data is recorded on the first 5 tracks (6 tracks) and R channel audio data is recorded on the second 5 tracks (6 tracks).
[0028]
As shown in the track format of FIG. 2, audio data, video data, and subcode data are recorded after the ITI area in the order in which the head scans them. The areas for recording video data and audio data each include an area for writing auxiliary data (AUX) that holds additional information. Data other than audio and video data, such as the recording date and time or the recording duration, can be written in the AUX. The subcode data, the AUX, and the data recorded in the semiconductor memory built into the cassette share a common format. This format is referred to as a pack structure. A pack is the minimum unit of a data group.
[0029]
As shown in FIG. 5A, one pack is composed of 5 bytes (PC0 to PC4). The first 1 byte (PC0) is a header, and the remaining 4 bytes are data. One byte of the header is divided into upper 4 bits and lower 4 bits, and forms a hierarchical structure including an upper header of upper 4 bits and a lower header of lower 4 bits. FIG. 5B shows an audio AUX source pack in which the header byte PC0 is (01010000). Data in this pack, for example, data in the byte PC1, is defined as follows.
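The pack layout described above (a header byte split into two 4-bit halves, followed by four data bytes) can be sketched as follows. The function name `split_pack` is hypothetical, and the data bytes in the example are placeholders rather than real AAUX contents.

```python
def split_pack(pack: bytes):
    """Split a 5-byte pack into (upper header, lower header, 4 data bytes)."""
    if len(pack) != 5:
        raise ValueError("a pack is exactly 5 bytes (PC0 to PC4)")
    header = pack[0]                      # PC0
    upper_header = header >> 4            # upper 4 bits
    lower_header = header & 0x0F          # lower 4 bits
    return upper_header, lower_header, pack[1:]


# Header 0b01010000 is the audio AUX source pack of FIG. 5B; data bytes are placeholders.
upper, lower, data = split_pack(bytes([0b01010000, 0x00, 0x00, 0x00, 0x00]))
print(f"upper={upper:04b} lower={lower:04b} data={data.hex()}")
```

For the audio AUX source pack header the upper header is 0101 and the lower header is 0000.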
[0030]
LF (1 bit): Indication of whether the video sampling frequency and the audio sampling frequency are locked
AFSIZE (6 bits): Indication of the audio frame size (number of audio samples) in one video frame
The present invention concerns this AFSIZE.
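A decoding sketch for these two fields follows. The text gives only the field widths, so the bit positions (LF in bit 7, AFSIZE in bits 5 to 0) and the interpretation of AFSIZE as an offset from the smallest permitted count are assumptions made here for illustration; the 1580 to 1620 range is the one the description gives for 48 kHz in the 525/60 system. The meaning of each LF value is deliberately left uninterpreted.

```python
# Assumptions (not stated in the text): LF occupies bit 7 of PC1, AFSIZE occupies
# bits 5..0, and AFSIZE is an offset from the smallest allowed sample count.
MIN_SAMPLES_48K_525_60 = 1580
MAX_SAMPLES_48K_525_60 = 1620


def decode_pc1(pc1: int):
    """Return (lf_bit, samples_in_frame) under the assumptions above."""
    lf_bit = (pc1 >> 7) & 0x1
    afsize = pc1 & 0x3F                   # 6-bit field
    samples = MIN_SAMPLES_48K_525_60 + afsize
    if samples > MAX_SAMPLES_48K_525_60:
        raise ValueError(f"AFSIZE {afsize} is outside the 1580-1620 range")
    return lf_bit, samples


print(decode_pc1(22))   # AFSIZE 22 maps to 1602 samples under this offset assumption
```

A 6-bit field covers 64 values, which is enough for the 41 counts between 1580 and 1620, so the offset reading is at least self-consistent.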
[0031]
The video frame frequency is 29.97 Hz in the case of the NTSC (525/60) system. On the other hand, when the audio sampling frequency is 48 kHz, for example, the number of audio samples in the video frame is not an integer, but is approximately 1601.6. Therefore, as described above in the related art, an integer number of audio samples close to this number is allocated to each video frame so that the average number of audio samples matches the above number.
[0032]
The AFSIZE in the case of the Unlock mode (for example, in the case of the 525/60 system) is defined as shown in FIG. 5C. As can be seen from FIG. 5C, for example, when the sampling frequency is 48 kHz, the number of audio samples per video frame can be in the range of 1580 to 1620. The number of audio samples recorded in the track (frame) is designated by AFSIZE.
[0033]
As described above in the prior art, in the Unlock mode, a combination of 1601 samples × 2 frames and 1602 samples × 3 frames is a standard setting. In the Locked mode, a combination of 1600 samples × 1 frame and 1602 samples × 4 frames is defined.
[0034]
The areas in which the audio data, video data, and subcode are recorded are called an audio sector, a video sector, and a subcode sector, respectively. Between these sectors, gaps G1, G2, and G3 in which no data is recorded are arranged. The audio sector includes a preamble (presync block) PR1, a data portion (14 sync blocks), and a postamble PO1 (postsync block).
[0035]
The audio sync block is composed of 90 bytes as shown in FIG. The first 5 bytes are sync and ID data. Audio data (72 bytes) and audio AUX (AAUX) (5 bytes) are included in one sync block. This data is error correction encoded by a product code. That is, inner codes (referred to as C1 codes) are encoded for 77 bytes aligned in the horizontal direction. Specifically, the (85, 77) Reed-Solomon code is used as the C1 code, and an 8-byte C1 (inner code) parity is added. The direction of the C1 code sequence is the data recording / reproducing direction. Further, error correction coding of an outer code (referred to as C2 code) is performed on 9-byte data arranged in the vertical direction. Specifically, a (14, 9) Reed-Solomon code is used as the C2 code, and a 5-byte C2 (outer code) parity is added.
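The byte budget of the audio sync block and the shape of its product code can be tallied with a short sketch. The split of the 14 sync blocks of the data portion into 9 data blocks and 5 parity blocks is inferred here from the (14, 9) code parameters; all names are illustrative.

```python
SYNC_ID_BYTES = 5
AUDIO_BYTES = 72
AAUX_BYTES = 5
C1_N, C1_K = 85, 77   # inner Reed-Solomon code, along the recording direction
C2_N, C2_K = 14, 9    # outer Reed-Solomon code, across sync blocks


def audio_sector_layout():
    """Derive the sync block size and parity counts from the figures in the text."""
    return {
        "payload_bytes": AUDIO_BYTES + AAUX_BYTES,    # bytes protected by C1
        "c1_parity_bytes": C1_N - C1_K,
        "sync_block_bytes": SYNC_ID_BYTES + C1_N,     # sync/ID + payload + C1 parity
        "data_sync_blocks": C2_K,
        "c2_parity_sync_blocks": C2_N - C2_K,
    }


print(audio_sector_layout())
```

The totals agree with the description: 72 + 5 = 77 payload bytes, 8 bytes of C1 parity, and 5 + 85 = 90 bytes per sync block.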
[0036]
The video sector includes a preamble (presync block) PR2, a data portion (149 sync blocks), and a postamble PO2 (postsync block). FIG. 7 shows the configuration of the video sector. The configuration of the preamble and the postamble is the same as that of the audio sector shown in FIG. 6. Each of the 149 video sync blocks in the video sector is composed of 90 bytes, like the audio sync block.
[0037]
The first 5 bytes of the sync block are a sync and an ID. The data portion is 77 bytes, and product-code error correction encoding is performed in the same way as for audio data. Specifically, the (85, 77) Reed-Solomon code is used as the C1 code, and the (149, 138) Reed-Solomon code is used as the C2 code. C1 (inner code) parity (8 bytes) and C2 (outer code) parity (11 bytes) are added accordingly. Two sync blocks with sync block numbers 19 and 20 and one sync block immediately before the C2 parity are dedicated to video AUX (VAUX), and their 77-byte data is used as VAUX data. The central 135 sync blocks, excluding VAUX and C2 parity, form the area in which the video data of the compressed video signal is stored.
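The count of 135 video sync blocks follows from the other figures in this paragraph. A minimal sketch of that tally is shown below; the variable names are illustrative.

```python
# Sync block budget of one video sector, using the figures quoted in the text.
VIDEO_SYNC_BLOCKS = 149
C2_N, C2_K = 149, 138      # outer (C2) Reed-Solomon code for video
VAUX_BLOCKS = 3            # sync blocks 19 and 20, plus one before the C2 parity
DATA_BYTES_PER_BLOCK = 77  # data portion of each sync block

c2_parity_blocks = C2_N - C2_K
video_blocks = VIDEO_SYNC_BLOCKS - c2_parity_blocks - VAUX_BLOCKS
video_bytes_per_track = video_blocks * DATA_BYTES_PER_BLOCK

print(c2_parity_blocks, video_blocks, video_bytes_per_track)
```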
[0038]
Further, FIG. 8 shows the structure of the subcode sector. Unlike the audio sector and the video sector, the subcode sector preamble and postamble have no presync and postsync. The subcode sync block has a length of 12 bytes, and the first 5 bytes are a sync and an ID. The subsequent 5 bytes are the data part, and only C1 encoding is applied to the data part. Then, C1 parity (2 bytes) is added. Thus, the product code configuration is not adopted in the subcode. This is because the subcode is mainly used for high-speed search, during which C2 parity can hardly be reproduced. The sync block length is also shortened to 12 bytes to support high-speed search at up to about 200 times normal speed. There are 12 subcode sync blocks per track.
[0039]
FIG. 9 shows the configuration of a reproduction system when the present invention is applied to the digital VTR described above. Although not shown, this digital VTR is controlled by a CPU such as a microprocessor. A signal reproduced from a magnetic tape (cassette tape) 1 by a magnetic head (rotating head) 2 is supplied to a reproduction signal processing circuit 3. The reproduction signal processing circuit 3 includes a reproduction amplifier, a reproduction equalizer, and the like. The reproduction data from the reproduction signal processing circuit 3 is supplied to the C1 decoder 4. The C1 decoder 4 performs error correction of the C1 code. In the case of the above-described C1 code, for example, errors up to three symbols in the sync block are corrected.
[0040]
The output of the C1 decoder 4 is supplied to a TBC (time axis compensator) 5. The TBC 5 has a memory and removes time axis fluctuations included in the reproduction signal. Output data of the TBC 5 is supplied to the frame memory 6. The frame memory 6 converts the data order into the C2 code order, and the C2 decoder 7 in the next stage performs C2 decoding. As an example, in C2 decoding, up to a predetermined number of error symbols that could not be corrected by the C1 code are corrected by erasure correction.
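The correction limits of the C1 and C2 codes follow from a general property of Reed-Solomon codes: an (n, k) code has n - k parity symbols and can correct e errors and s erasures as long as 2e + s ≤ n - k. The sketch below applies that bound to the codes named in the text; the helper names are illustrative, and an actual decoder may be configured to correct fewer symbols than the bound allows, which would be consistent with the three-symbol example given for the C1 decoder.

```python
def rs_capability(n: int, k: int) -> dict:
    """Theoretical correction limits of an (n, k) Reed-Solomon code."""
    parity = n - k
    return {
        "parity": parity,
        "max_errors": parity // 2,  # errors at unknown positions
        "max_erasures": parity,     # errors at known (flagged) positions
    }


def decodable(n: int, k: int, errors: int, erasures: int) -> bool:
    """True if the mix of errors and erasures is within the 2e + s <= n - k bound."""
    return 2 * errors + erasures <= n - k


for label, (n, k) in {"C1": (85, 77), "audio C2": (14, 9), "video C2": (149, 138)}.items():
    print(label, rs_capability(n, k))
```

Erasure correction doubles the number of correctable symbols because C1 decoding has already flagged which sync blocks are unreliable.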
[0041]
Output data of the C2 decoder 7 is supplied to the deshuffling and interpolation processing circuit 8. Deshuffling is a process of returning the shuffling (rearrangement of the data layout and order) performed in the recording process to the original arrangement and order. The interpolation process conceals errors that could not be corrected by the C1 code and the C2 code. In the case of video data, for example, erroneous data is replaced with the correct data from one frame before. Further, the deshuffling and interpolation processing circuit 8 has a two-bank configuration including memories 9a and 9b, an input changeover switch 10, and an output changeover switch 11, so that it can process continuously reproduced data and output it continuously. The video signal output from the deshuffling and interpolation processing circuit 8 is supplied to the video signal processing system at the subsequent stage.
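The interpolation described for video data can be sketched as a simple substitution. The function below is a hypothetical illustration, not the circuit's actual implementation: it assumes each data item carries an error flag left over from C1/C2 decoding and that the previous frame is available at the same positions.

```python
from typing import List, Sequence


def conceal_with_previous_frame(
    current: Sequence[int],
    error_flags: Sequence[bool],
    previous: Sequence[int],
) -> List[int]:
    """Replace flagged items of the current frame with the previous frame's data."""
    if not (len(current) == len(error_flags) == len(previous)):
        raise ValueError("all inputs must have the same length")
    return [prev if bad else cur
            for cur, bad, prev in zip(current, error_flags, previous)]


# The second item is flagged as uncorrectable and is taken from the previous frame.
print(conceal_with_previous_frame([10, 99, 30], [False, True, False], [11, 21, 31]))
```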
[0042]
In addition, the reproduced audio signal is supplied to the switching circuit 12, and audio data separated for each channel is formed. The L channel audio data is supplied to the audio signal processing circuit 13a, and the R channel audio data is supplied to the audio signal processing circuit 13b. These audio signal processing circuits 13a and 13b perform processing such as deshuffling, time axis expansion, and AAUX (audio AUX) separation. The above-described AFSIZE is extracted from the separated AAUX. For these processes, each signal processing circuit is provided with a memory capable of storing reproduced audio data for one frame, and a read address of this memory is generated based on AFSIZE.
[0043]
The L channel data from the audio signal processing circuit 13a is supplied to the D / A converter 14a, and an analog L channel audio signal is output from the D / A converter 14a. Similarly, R channel data from the audio signal processing circuit 13b is supplied to the D / A converter 14b, and an analog R channel audio signal is output from the D / A converter 14b.
[0044]
An audio sampling clock for audio processing used in the audio signal processing circuits 13a and 13b and the D / A converters 14a and 14b is generated by the PLL circuit 15. AFSIZE extracted from AAUX by the audio signal processing circuit 13a is supplied to one input terminal of the PLL circuit 15. In the timing signal generation circuit 16, a reference frame signal used in the video signal processing system is generated. This reference frame signal is supplied to the other input terminal of the PLL circuit 15. The PLL circuit 15 generates the above-described audio sampling clock based on the AFSIZE and the reference frame signal.
[0045]
FIG. 10 shows an example of the configuration of the PLL circuit 15 in this embodiment. AFSIZE is supplied to the terminal 20. A reference frame signal is supplied to the terminal 21. The reference frame signal is supplied as a reset signal to the feedback frame counter 22 and is supplied to one input terminal of each of the arithmetic processing circuit 23 and the phase comparator 24. AFSIZE is supplied to one input terminal of each of the feedback frame counter 22 and the arithmetic processing circuit 23.
[0046]
A clock having a frequency higher than the audio sampling frequency is supplied to the arithmetic processing circuit 23 as an operation clock from a 1 / m frequency divider 27 described later. This operation clock has, for example, a frequency 10 times the audio sampling frequency. That is, if the audio sampling frequency is 48 kHz, the frequency is 480 kHz. Of course, a higher frequency clock such as 256 times the audio sampling frequency may be used. As a result, higher resolution is realized in the arithmetic processing circuit 23.
[0047]
FIG. 11 shows an example of the configuration of the arithmetic processing circuit 23 in more detail. In the arithmetic processing circuit 23, an average value of AFSIZE for five frames is obtained. That is, four delay elements delayed by the timing of the reference frame signal are used, AFSIZE is sequentially delayed, and the input AFSIZE and each delayed AFSIZE are added and divided to obtain an average value.
[0048]
In this example, registers 42a to 42d that output the signal supplied to the terminal D to the terminal Q at the timing of the signal supplied to the terminal En are used as the delay elements. The AFSIZE supplied from the terminal 41 is supplied to the register 42a, and is sequentially sent to the registers 42b, 42c and 42d by the reference frame signal supplied from the terminal 40 to the terminal En of each of the registers 42a to 42d. The AFSIZE for 5 frames obtained in this way is added by the adder 43 and divided by the divider 44, whereby the average value of AFSIZE for 5 frames is calculated. The average value of the AFSIZE is supplied to the feedback frame counter 46 via the switch circuit 45.
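The register chain, adder, and divider amount to a 5-frame moving average of AFSIZE. A minimal software sketch is given below; the class name `AfsizeAverager` is illustrative, and averaging over fewer values until the window fills is a simplification that the text does not describe.

```python
from collections import deque


class AfsizeAverager:
    """Moving average of AFSIZE over the current frame and four delayed frames."""

    def __init__(self, frames: int = 5):
        # The deque plays the role of the input plus registers 42a to 42d.
        self._window = deque(maxlen=frames)

    def on_reference_frame(self, afsize: int) -> float:
        """Shift in a new AFSIZE at the reference frame timing; return the average."""
        self._window.append(afsize)
        # Adder 43 and divider 44 (simplified: divides by the current fill level).
        return sum(self._window) / len(self._window)


averager = AfsizeAverager()
for afsize in [1600, 1602, 1602, 1602, 1602] * 2:
    average = averager.on_reference_frame(afsize)
print(average)
```

Once the window holds a full 5-frame cycle, the output stays constant even though the individual AFSIZE values alternate.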
[0049]
The average value of AFSIZE obtained in this way is, for example, as follows. In the Locked mode, as shown in FIG. 12A, a frame of 1600 samples arrives once in every 5 frames, and the averaging process gives
(1600 + 1602 × 4) / 5 = 1601.6 samples.
That is, a fraction of 0.6 samples occurs.
[0050]
On the other hand, in the case of the Unlock mode, as shown in FIG. 12B, it is expected that 3 frames of 1602 samples and 2 frames of 1601 samples arrive in a 5-frame cycle. The averaging process then gives
(1601 × 2 + 1602 × 3) / 5 = 1601.6 samples,
so a fraction of 0.6 samples results in this case as well.
[0051]
As described above, the arithmetic processing circuit 23 shown in this example operates with a high-speed clock having a frequency 10 times the audio sampling frequency. That is, this high-speed clock is supplied from the 1 / m frequency divider 27 via the terminal 49. Based on this clock, the feedback frame counter 46 can count with a resolution of the first decimal place of AFSIZE in this example. Therefore, the feedback frame counter 46 can count 0.6 samples generated as a fraction during the averaging of AFSIZE.
[0052]
In this way, the feedback frame counter 46 performs counting based on the high-speed clock, and outputs a frame signal when the average value of AFSIZE is reached. This frame signal is supplied to the other input terminal of the phase comparator 24. In this way, the arithmetic processing circuit 23 normalizes the fraction of the number of audio samples in a 5-frame cycle.
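The role of the 10-times clock can be illustrated with a small counter model. This is a sketch under assumptions: the class name and the rounding of the terminal count are illustrative, and the real counter 46 is a hardware circuit whose details are not given here.

```python
class FeedbackFrameCounter:
    """Counts fast-clock ticks and emits a frame pulse at the averaged AFSIZE."""

    def __init__(self, average_afsize: float, clock_multiple: int = 10):
        # Terminal count in fast-clock ticks, e.g. 1601.6 samples -> 16016 ticks.
        self.terminal = round(average_afsize * clock_multiple)
        self.count = 0

    def tick(self) -> bool:
        """Advance by one fast-clock tick; return True when a frame pulse fires."""
        self.count += 1
        if self.count >= self.terminal:
            self.count = 0
            return True
        return False


counter = FeedbackFrameCounter(1601.6)
pulses = sum(counter.tick() for _ in range(2 * 16016))
print(counter.terminal, pulses)
```

With a clock at the sampling frequency itself, the counter could only stop at 1601 or 1602; the 10-times clock gives a resolution of 0.1 sample, which is enough to represent 1601.6 exactly.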
[0053]
In the phase comparator 24, the reference frame signal supplied to one input terminal is compared with the frame signal supplied to the other input terminal, and phase error data is output. This phase error data is supplied to the VCO 26 via the low pass filter 25. The VCO 26 outputs a signal having a frequency that cancels the phase error data. This signal is supplied to both the 1 / m frequency divider 27 and the 1 / n frequency divider 28 described above.
[0054]
In the 1 / m frequency divider 27, as described above, the frequency division ratio m is selected so that a clock having a frequency 10 times or more the audio sampling frequency can be obtained. The output of the 1 / m frequency divider 27 is supplied to the arithmetic processing circuit 23 as an operation clock. Further, in the 1 / n frequency divider 28, the frequency division ratio n is selected so that a clock having an audio sampling frequency is obtained as a frequency division output. The frequency-divided output of the 1 / n frequency divider 28 is supplied to the terminal 29 and output as an audio sampling clock. At the same time, the output of the 1 / n frequency divider 28 is supplied to the other input terminal of the feedback frame counter 22.
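The relation between the two division ratios can be made concrete with a small helper. The VCO frequency used below (4.8 MHz) is purely hypothetical, since the text does not state one, and `divider_ratios` is an illustrative name.

```python
from typing import Tuple


def divider_ratios(vco_hz: int, fs_hz: int, clock_multiple: int = 10) -> Tuple[int, int]:
    """Return (m, n) so that vco/m = clock_multiple * fs and vco/n = fs."""
    operation_clock_hz = fs_hz * clock_multiple
    if vco_hz % operation_clock_hz:
        raise ValueError("VCO frequency must be an integer multiple of the operation clock")
    m = vco_hz // operation_clock_hz
    n = vco_hz // fs_hz
    return m, n


# Hypothetical 4.8 MHz VCO with a 48 kHz sampling clock and a 10-times operation clock.
print(divider_ratios(4_800_000, 48_000))
```

Whatever VCO frequency is chosen, n is always the operation-clock multiple times m, so the audio sampling clock and the operation clock stay locked to each other.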
[0055]
The feedback frame counter 22 counts up to AFSIZE for each video frame based on the divided output of the 1 / n divider 28. The count is reset by the reference frame signal. With this count, the feedback frame counter 22 outputs an operation frame signal for audio processing. The audio operation frame signal is output to the outside via the terminal 31 and is supplied to the audio signal processing circuits 13a and 13b and the D / A converters 14a and 14b, which are omitted in FIG.
[0056]
In the above-described configuration, clock tracking becomes very slow when an abrupt change occurs, for example, when the operation mode is switched from the mode with an audio sampling frequency of 48 kHz to the 32 kHz mode. In such a case, the terminal 45b is selected in the switch circuit 45 of the arithmetic processing circuit 23, which makes it possible to respond at high speed.
[0057]
FIG. 13 shows the standard deviation of the number of audio samples per video frame when processing is performed with a period of 5 frames. AFSIZE input pattern A corresponds to the Locked mode, and pattern B corresponds to the Unlock mode. As is apparent from this figure, when the averaging process is performed over 5 frames according to the present invention, the standard deviation is 0, and a more stable clock is supplied than when processing is performed for each frame. FIG. 14 shows the offset from the standard value (1601.6) for AFSIZE values near the standard value when processing is performed for each frame. In the processing for each frame, an offset as shown in FIG. 14 is always included.
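The effect of averaging on the spread of the per-frame sample count can be checked numerically. The sketch below computes the population standard deviation of the raw AFSIZE sequence and of its 5-frame moving average for the two 48 kHz patterns; the frame order within each pattern is assumed, and the per-frame figures are computed here rather than read from FIG. 13.

```python
from statistics import pstdev
from typing import List, Sequence


def moving_average(pattern: Sequence[int], window: int) -> List[float]:
    """Average of the last `window` values at each frame of a repeating pattern."""
    n = len(pattern)
    return [sum(pattern[(i - j) % n] for j in range(window)) / window
            for i in range(n)]


PATTERN_A = [1600, 1602, 1602, 1602, 1602]  # Locked mode (frame order assumed)
PATTERN_B = [1601, 1601, 1602, 1602, 1602]  # Unlock mode (frame order assumed)

for name, pattern in (("A", PATTERN_A), ("B", PATTERN_B)):
    per_frame_std = pstdev(pattern)
    averaged_std = pstdev(moving_average(pattern, 5))
    print(name, round(per_frame_std, 4), averaged_std)
```

Because every 5-frame window contains one full cycle, the averaged value never changes and its standard deviation is zero for both patterns.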
[0058]
In the above description, the present invention has been described as being applied when the video is NTSC and the audio sampling frequency is 48 kHz. However, the present invention is not limited to this example. The present invention can also be applied to an example in which the video is NTSC and the audio sampling frequency is 32 kHz.
[0059]
As already described in the prior art, in the 32 kHz mode, the number of audio samples per frame is approximately 1067.734. Therefore, in the 32 kHz mode, fractional normalization is performed with a period of 15 frames in the Locked mode. As shown in FIG. 15, 3 frames of 1066 samples and 12 frames of 1068 samples are combined. That is, the total is
(1066 [samples] × 3 + 1068 [samples] × 12) × 2 = 32028 [samples].
Within a period of 5 frames, 1066 sample frames are inserted every 6 frames.
[0060]
To normalize the fraction with a period of 15 frames, it would originally be necessary to prepare 14 delay elements that are delayed at the timing of the reference frame signal and to perform the calculation over 15 frames. FIG. 16 shows the standard deviation of the number of audio samples per video frame when processing is performed with frame periods of 5, 7, 8, and 15 in the 32 kHz mode. As shown in FIG. 16, even when the number of delay elements is reduced and a period of 7 or 8 frames is used, a much more stable clock can be supplied than in the example in which processing is performed for each frame.
[0061]
Furthermore, even when the number of delay elements is reduced and the period is set to 5 frames, a very favorable result can be obtained as compared with the processing for each frame.
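The trade-off between the number of delay elements and clock stability can be explored with the following sketch. It assumes a hypothetical placement of the 1066-sample frames (one every 5 frames within the 15-frame cycle); the actual placement in FIG. 15 may differ, and the results depend on it, so the numbers are not meant to reproduce FIG. 16.

```python
from statistics import pstdev
from typing import Sequence


def windowed_std(pattern: Sequence[int], window: int) -> float:
    """Standard deviation of the moving average over a repeating AFSIZE pattern."""
    n = len(pattern)
    averages = [sum(pattern[(i - j) % n] for j in range(window)) / window
                for i in range(n)]
    return pstdev(averages)


# Hypothetical 15-frame cycle: 3 frames of 1066 samples and 12 frames of 1068 samples,
# with a 1066-sample frame assumed at every fifth position.
PATTERN_32K = ([1066] + [1068] * 4) * 3

for window in (1, 5, 7, 8, 15):
    print(window, round(windowed_std(PATTERN_32K, window), 4))
```

A window of 1 corresponds to processing each frame individually. Any window shorter than the full 15-frame cycle needs fewer delay elements, at the cost of a residual variation that depends on how the short frames are distributed.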
[0062]
In addition, when the delay element configuration is set to any one of these, the same processing can be performed in the Locked mode and the Unlock mode.
[0063]
In the above description, an example in which the present invention is applied to a digital VTR has been described. However, the present invention is not limited to this example. For example, the present invention can be applied to a case where a video signal is reproduced from a disk recording medium such as an MD (Mini Disc), a DVD (Digital Versatile Disc), or a hard disk.
[0064]
[Effect of the invention]
As described above, according to the present invention, the AFSIZE values of a plurality of frames are averaged and the PLL processing is performed using the averaged AFSIZE, so that the feedback frame signal is stabilized and a stable audio sampling clock can be supplied.
[0065]
In addition, according to the present invention, it is possible to process the reproduced audio data with the same configuration without considering the Locked mode and the Unlock mode by averaging the AFSIZE in a plurality of frames.
[Brief description of the drawings]
FIG. 1 is a schematic diagram showing a track pattern of an example of a digital VTR to which the present invention can be applied.
FIG. 2 is a schematic diagram for explaining the data arrangement of one track.
FIG. 3 is a schematic diagram showing a track pattern of an example of a digital VTR.
FIG. 4 is a schematic diagram showing a track pattern of an example of a digital VTR.
FIG. 5 is a schematic diagram used for explaining a data pack structure.
FIG. 6 is a schematic diagram illustrating an example of a data structure of an audio sector.
FIG. 7 is a schematic diagram illustrating an example of a data structure of a video sector.
FIG. 8 is a schematic diagram illustrating an example of a data structure of a subcode sector.
FIG. 9 is a block diagram of one embodiment of the present invention.
FIG. 10 is a block diagram illustrating an example of a configuration of a PLL circuit according to the embodiment.
FIG. 11 is a block diagram illustrating an example of a configuration of an arithmetic processing circuit.
FIG. 12 is a diagram for explaining a frame configuration in a 48 kHz mode.
FIG. 13 is a schematic diagram illustrating a standard deviation of the number of audio samples per video frame when processing is performed in a 5-frame cycle.
FIG. 14 is a schematic diagram illustrating an offset of an AFSIZE with respect to a standard value (1601.6).
FIG. 15 is a diagram for explaining a frame configuration in a 32 kHz mode.
FIG. 16 is a schematic diagram illustrating the standard deviation of the number of audio samples per video frame when processing is performed in each frame period of 5, 7, 8, and 15 in the 32 kHz mode.
FIG. 17 is a block diagram showing an example of a configuration of a PLL circuit according to a conventional technique.
[Explanation of symbols]
13a, 13b ... audio signal processing circuit, 14a, 14b ... D / A converter, 15 ... PLL circuit, 22 ... feedback frame counter, 23 ... arithmetic processing circuit, 24 ... phase comparator, 25 ... low-pass filter, 26 ... VCO, 27 ... 1 / m frequency divider, 28 ... 1 / n frequency divider, 42a to 42d ... register, 43 ... adder, 44 ... divider, 46 ... feedback frame counter

Claims

A digital audio signal processing apparatus adapted to handle a digital audio signal in association with a video frame, comprising:
extraction means for extracting control information indicating the number of samples of the digital audio signal in each video frame;
averaging means for obtaining an average value of the numbers of samples of a plurality of consecutive video frames indicated by the control information; and
clock generation means for comparing a frame signal generated based on the average value obtained by the averaging means with a reference frame signal to form phase error data, and for generating, based on the phase error data, a clock for processing the digital audio signal.

The digital audio signal processing apparatus according to claim 1, wherein
the clock generation means is a PLL comprising:
frequency dividing means for generating a clock having a frequency higher than the sampling frequency of the digital audio signal;
counting means that is reset by the reference frame signal, counts the clock, and outputs the frame signal when the count value reaches the average value;
phase comparison means for comparing the frame signal with the reference frame signal and outputting the phase error data; and
variable frequency oscillating means supplied with the output of the phase comparison means as a control signal.

A method of processing a digital audio signal adapted to handle a digital audio signal in association with a video frame, comprising:
an extraction step of extracting control information indicating the number of samples of the digital audio signal in each video frame;
an averaging step of obtaining an average value of the numbers of samples of a plurality of consecutive video frames indicated by the control information; and
a clock generation step of comparing a frame signal generated based on the average value obtained in the averaging step with a reference frame signal to form phase error data, and generating, based on the phase error data, a clock for processing the digital audio signal.