JPH10507042A

JPH10507042A - Measurement and regulation of synchronization of merged video and audio data

Info

Publication number: JPH10507042A
Application number: JP8508678A
Authority: JP
Inventors: ヘイ、ステファン、ジー
Original assignee: FutureTel, Inc.
Priority date: 1994-08-29
Filing date: 1994-08-29
Publication date: 1998-07-07
Also published as: WO1996007274A1; EP0783823A4; AU8009894A; EP0783823A1

Abstract

(57) Abstract: The invention relates to the real-time assembly of a compressed audio/visual system data stream (22) such that the audio and video data can subsequently be presented in synchronism. Assembly of the system data stream (22) according to the invention interleaves packets of data selected from a compressed audio bit stream (16) with packets of data selected from a compressed video bit stream (18). If the frames of video data incorporated into the system data stream (22) get too far ahead of the audio data incorporated into the system data stream (22), all the data for a single frame of the video signal is omitted from the system data stream (22). Conversely, if the frames of video data incorporated into the system data stream (22) fall too far behind the audio data incorporated into the system data stream (22), a second copy of all the data for a single frame of the video signal is incorporated into the system data stream (22).

Description

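DETAILED DESCRIPTION OF THE INVENTION
Measurement and regulation of synchronization of merged video and audio data

TECHNICAL FIELD OF THE INVENTION
The present invention relates generally to the technical field of recorded and/or transmitted compressed digital data. More particularly, it relates to enabling video and audio data to be presented in synchronism after they are combined into a single compressed digital data stream.

BACKGROUND ART
A recorded and/or transmitted multimedia program consists of compressed digitized video data with accompanying compressed digitized audio data. Reproducing such a program properly requires combining the digital data bit streams into a single, synchronized, sequential system data stream that contains both the video data and the audio data. If the video and audio data are not properly synchronized when the data is assembled into the system data stream, or when the assembled system data stream is presented, the visible images often fail to appear in synchronism with the accompanying sound. For example, an image showing the lip movements of a person speaking may not be synchronized with the audible sound of the words.

To address this problem, Part 1 of ISO/IEC 11172 defines a framework that permits digitized video and audio data to be combined into a single, synchronized, sequential system data stream. ISO/IEC 11172 is the Moving Picture Experts Group ("MPEG") standard of the International Organization for Standardization ("ISO") and the International Electrotechnical Commission ("IEC"). Once combined into a single digital data stream, the data is in a form well suited for storage on a digital storage device, such as a hard disk or CD-ROM in a digital computer. It is also well suited for transmission over a cable television ("CATV") system or a high-bit-rate digital telephone system such as T1, ISDN primary rate, or ATM telecommunications access. A system data stream assembled in accordance with ISO/IEC 11172 can be decoded by an MPEG decoder to obtain decoded pictures and/or decoded audio samples.

The ISO/IEC 11172 standard, which defines MPEG compression, specifies that packets of data extracted from the compressed video bit stream and the compressed audio bit stream are to be interleaved when the system data stream is assembled. In addition to the compressed video and compressed audio bit streams, the system data stream may include private streams and a padding stream. The characteristics of the system data stream defined by MPEG impose functional and performance requirements on MPEG encoders and decoders. However, the MPEG standard does not specify an architecture or implementation for an MPEG encoder or decoder. In fact, there is considerable freedom in the possible designs and implementations of encoders and decoders that operate in accordance with ISO/IEC 11172.

A system data stream in accordance with Part 1 of ISO/IEC 11172 contains two layers of data: a compression layer, and a system layer that encapsulates the digital data of the compression layer. The ISO/IEC 11172 system layer is itself divided into two sublayers. The "pack layer" handles multiplex-wide operations, and the "packet layer" handles stream-specific operations. Each pack in the pack layer includes a header that specifies a system clock reference ("SCR"). The SCR fixes, in units of a 90 kilohertz ("kHz") clock, the intended time for beginning decoding of the digitized video and audio data contained in the compression layer.

To provide synchronized presentation of digitized video and audio data, the packet layer defined by ISO/IEC 11172 provides a "presentation time stamp" ("PTS") and an optional "decoding time stamp" ("DTS"). The PTS and DTS specify the synchronization of the video and audio data relative to the SCR specified in the pack layer. The packet layer, which optionally includes both a PTS and a DTS, is independent of the data contained in the compression layer. For example, a video packet may begin at any byte in the video stream. However, a PTS and optional DTS encoded in the header of a packet apply to the first "access unit" ("AU") that begins in that packet.

ISO/IEC 11172 defines an AU as the coded representation of a "presentation unit" ("PU"). It further defines a PU as a decoded audio AU or a decoded picture. The standard also defines three different methods for compressing and decompressing an audio signal, which it calls "layers". For two of these methods, an audio AU is the smallest part of the encoded audio bit stream that can be decoded by itself. For the third method, an audio AU is the smallest part of the encoded audio bit stream that can be decoded using previously acquired side and main information.

Part 1 of ISO/IEC 11172 suggests how video images and audio sounds should be synchronized during presentation of compressed video and audio data. Rather than adjusting one stream (e.g. the video data stream) to match the playback of another (e.g. the audio data stream), the playback of both streams is adjusted to a master time base called the "system time clock" ("STC"). According to ISO/IEC 11172, the STC of an MPEG decoder may be one of the decoder's clocks (e.g. the SCR, the video PTS, or the audio PTS), the digital storage medium ("DSM") or channel clock, or some external clock. End-to-end synchronization of a multimedia program encoded into an MPEG system data stream occurs when:
a. the encoder embeds time stamps during assembly of the system data stream;
b. the video and audio decoders receive the embedded time stamps together with the compressed data; and
c. the decoders use the time stamps to schedule presentation of the multimedia program.

A "system header" ("SH") occurs at the beginning of the system data stream and is repeated within it. The SH tells an MPEG decoder whether the encoded bit streams have an exact relationship to the SCR, using a "system audio lock flag" and a "system video lock flag". Setting the system audio lock flag to 1 indicates a specified, constant relationship between the audio sampling rate and the SCR. Setting the system video lock flag to 1 indicates a specified, constant relationship between the video picture rate and the SCR. Setting either flag to 0 indicates that the corresponding relationship does not exist.

As noted above, ISO/IEC 11172 expressly provides that the system data stream may include a padding stream. Packets from the padding stream may be assembled into the system data stream to maintain a constant total data rate, to achieve sector alignment, or to prevent decoder buffer underflow. In addition, up to 16 bytes of "stuffing" are permitted within each packet. Stuffing serves purposes similar to those of the padding stream. It is particularly useful for providing word (16-bit) or long-word (32-bit) alignment in applications where byte (8-bit) alignment is insufficient. Stuffing is the only way to fill a packet when fewer bytes are needed than the minimum size of a padding-stream packet.

A bit stream of video data compressed in accordance with Part 2 of ISO/IEC 11172 consists of a sequence of frames of compressed video data. These frames include intra ("I") frames, predicted ("P") frames, and bidirectional ("B") frames:
- An I frame does not refer to any other data. Decoding it reproduces an entire uncompressed frame of video data.
- A P frame can be decoded into an entire uncompressed frame only by reference to a previously decoded I frame or P frame.
- A B frame can be decoded into an entire uncompressed frame only by reference to both a preceding and a following decoded I or P frame.

The ISO/IEC 11172 specification defines a group of pictures ("GOP") as one or more I frames together with all of the P frames and B frames for which the I frame serves as a reference.

When assembling a system data stream, a real-time MPEG encoder must include a system header at the beginning of each system data stream. That system header must set the system audio lock flag and the system video lock flag each to either 0 or 1. If the encoder sets either or both flags, it must guarantee the corresponding constant relationship throughout the entire system data stream: between the audio sampling rate and the SCR, and between the video picture rate and the SCR. If the compressed audio bit stream encoder operates independently of the rate at which video frames occur, there is no guarantee that these constant relationships exist in the encoded data to be interleaved into the system data stream.

DISCLOSURE OF THE INVENTION
An object of the present invention is to provide a method for assembling a system data stream that permits synchronized presentation of visible images and accompanying sound. Another object is to provide a system data stream that maintains a constant relationship between the audio sampling rate and the SCR. Another object is to provide a system data stream that maintains a constant relationship between the video picture rate and the SCR.

Briefly, the present invention is a method for the real-time assembly of an encoded system data stream that a decoder can decode into decoded video images and a decoded audio signal. A system data stream assembled in accordance with the invention permits the decoder to present the decoded audio signal substantially in synchronism with the decoded video images. The system data stream is assembled by interleaving packets of data selected from a compressed audio bit stream with packets of data selected from a compressed video bit stream. The compressed audio bit stream is generated by compressing an audio signal sampled at a prespecified audio sampling rate. The compressed video bit stream is generated by compressing a sequence of frames of a video signal having a prespecified video frame rate.

Before assembly of the system data stream begins, an expected encoded audio/video ratio is computed: the prespecified audio sampling rate divided by the prespecified video frame rate. A system header ("SH") is then embedded in the system data stream. The SH includes both a system audio lock flag and a system video lock flag. These are set to indicate, respectively, a specified constant relationship between the audio sampling rate and the system clock reference ("SCR"), and a specified constant relationship between the video picture rate and the SCR. Packets of data, each selected from either the compressed audio bit stream or the compressed video bit stream, are then incorporated into the system data stream. To effect synchronization, a presentation time stamp ("PTS") and an optional decoding time stamp ("DTS") are embedded in the system data stream with each packet.

In addition, an actual encoded audio/video ratio is computed: a number representing the count of all samples of the audio signal received for compression, divided by the total number of frames of the video signal received for compression. An encoded frame error value is then computed in two steps. First, the expected ratio is subtracted from the actual ratio to obtain a difference of ratios. Second, that difference is multiplied by the total number of frames of the video signal received for compression. If the encoded frame error value is less than a prespecified negative error value, all the data in the compressed video bit stream for an entire frame of the video signal is omitted from the system data stream. Conversely, if the encoded frame error value exceeds a prespecified positive error value, all the data for a second copy of an entire frame of the video signal is incorporated from the compressed video bit stream into the system data stream. In a preferred embodiment, the prespecified positive and negative error values both represent a time interval approximately equal to the time required to present 1.5 frames of the decoded video images.

An advantage of the invention is that it produces a system data stream that can be decoded more easily. Another advantage is that the system data stream can be decoded by a variety of different decoders. A further advantage is that the system data stream can be decoded by a relatively simple decoder. These and other features, objects and advantages will be readily understood by those skilled in the art from the following detailed description of the preferred embodiment illustrated in the various drawing figures.

BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 diagrammatically depicts the process of assembling a system data stream by interleaving packets selected from a compressed audio bit stream with packets selected from a compressed video bit stream.
FIG. 2 is a block diagram illustrating three components: a video encoder that compresses a sequence of frames of a video signal into a compressed video bit stream; an audio encoder that compresses an audio signal into a compressed audio bit stream; and a multiplexer that interleaves packets selected from the two compressed bit streams to assemble a system data stream.
FIG. 3 illustrates a system data stream assembled by interleaving packets selected from a compressed audio bit stream with packets selected from a compressed video bit stream.
FIG. 4 is a computer program, written in the C programming language, that implements the decision process. It determines whether all the data for an entire frame of the video signal should be omitted from the system data stream, or whether all the data for a second copy of an entire frame should be incorporated into it.

BEST MODE FOR CARRYING OUT THE INVENTION
Arrows 12a and 12b in FIG. 1 depict the process of assembling a serial system data stream 22 made up of concatenated packs 24. Packets selected from a compressed audio bit stream 16 are interleaved with packets selected from a compressed video bit stream 18.

An audio encoder 32, shown in the block diagram of FIG. 2, generates the compressed audio bit stream 16 by processing an audio signal 34 (arrow 34 in FIG. 2). The audio encoder 32 first digitizes the audio signal 34 at a prespecified audio sampling rate ("PSASR") and then compresses the digitized representation. A video encoder 36 generates the compressed video bit stream 18 by compressing a video signal 38 (arrow 38 in FIG. 2) into MPEG GOPs. The video signal 38 is a sequence of frames with a prespecified video frame rate ("PSVFR").

The audio encoder 32 is preferably an audio compression engine, model No. 96-0003-0002, marketed by FutureTel, Inc., 1096 E. Arques Avenue, Sunnyvale, California 94086. The video encoder 36 is preferably a video compression engine, model No. 96-0002-002, also sold by FutureTel, Inc. These preferred encoders can compress the audio signal 34 into the compressed audio bit stream 16, and the video signal 38 into the compressed video bit stream 18, in real time.

In real time, a system data stream multiplexer 44 repeatedly selects packets from the compressed audio bit stream 16 or the compressed video bit stream 18. It interleaves them into the packs of the system data stream 22 illustrated in FIG. 1. The multiplexer 44 is preferably a computer program executed by a host microprocessor in the personal computer (not illustrated) in which the audio encoder 32 and video encoder 36 are installed.

In preparing the encoders, the program transfers commands and data that cause the audio encoder 32 and video encoder 36 to produce the compressed audio bit stream 16 and the compressed video bit stream 18 at prespecified bit rates. The system data stream must also carry embedded control data. To accommodate this overhead, the sum of the audio and video bit rates specified by the program is slightly less than the bit rate specified for the system data stream 22. The host microprocessor also transfers control data instructing the audio encoder 32 to digitize the audio signal 34 at the PSASR.

Besides preparing the encoders, the program also prepares certain data used in assembling the system data stream 22. In particular, with respect to the present invention, it computes the expected encoded audio/video ratio ("EEAVR") for the system data stream 22 by dividing the PSASR by the PSVFR. Once these preparations are complete, the multiplexer 44 repeatedly selects packets of data from the compressed audio bit stream 16 or the compressed video bit stream 18 and incorporates them into the packs 24 of the system data stream 22.

As illustrated in FIG. 3, each pack 24 of a system data stream 22 assembled in accordance with ISO/IEC 11172 has a prespecified length L, which may be as much as about 65,538 bytes. Each pack 24 begins with a pack header 52 (PH in FIG. 3) containing the system clock reference ("SCR") value for that pack. In the first pack 24 of the system data stream 22, a system header 54 (SH in FIG. 3) immediately follows the pack header 52. In accordance with ISO/IEC 11172, the system header 54 is also repeated in each pack 24 within the system data stream 22. The system header 54 contains both the system audio lock flag and the system video lock flag. The program sets both flags to 1. This indicates, respectively, a specified constant relationship between the audio sampling rate and the SCR, and a specified constant relationship between the video picture rate and the SCR.

The pack header 52 comes first, followed by the system header 54 if the pack 24 contains one. The remainder of each pack 24 in FIG. 3 holds packets 56 of data that the multiplexer 44 has selected from either the compressed audio bit stream 16 or the compressed video bit stream 18. Each packet 56 includes a packet header (not illustrated). In accordance with ISO/IEC 11172, the packet header may contain a PTS and may also contain an optional DTS. Although not illustrated, the system data stream 22 may also include packets of the padding stream. As permitted under ISO/IEC 11172, the multiplexer 44 may add padding-stream packets to maintain a constant total data rate, to achieve sector alignment, or to prevent decoder buffer underflow.

The preferred audio encoder 32 digitizes the audio signal 34 at a prespecified sampling rate and then compresses it to produce the compressed audio bit stream 16 at a prespecified bit rate. Because of this, the compressed audio bit stream 16 inherently provides a stable timing reference for assigning the SCR, PTS and DTS to the packs 24 of the system data stream 22. The video signal 38 does not provide such a reference when it comes from a videocassette played on a videocassette recorder ("VTR") or a laser disc played on a laser disc player. In those cases the frame rate of the video signal varies.

During assembly of the system data stream 22, the program fetches data from the audio encoder 32 and the video encoder 36 in addition to the packets 56:
- From a location 62 in the audio encoder 32, the multiplexer 44 fetches a running count of all samples ("NOS") of the audio signal 34 received for compression.
- From a location 64 in the video encoder 36, it fetches a running count of the total number of frames ("NOF") of the video signal 38 received for compression.
The program fetches these two values as close together in time as possible.

The multiplexer 44 then computes the following quantities:
1. The actual encoded audio/video ratio ("AEAVR"), by dividing NOS by NOF.
2. A difference of ratios ("DOR"), by subtracting the previously computed EEAVR from the AEAVR.
3. An encoded frame error value ("EFEV"), by multiplying the DOR by NOF.
Based on the prespecified audio sampling rate, the EFEV represents a time difference. It is the difference between the actual time for the NOF frames assembled into the system data stream 22 and the expected time for those frames.

Suppose the EFEV is less than a prespecified negative error value ("PSNEV"), because the actual time for the NOF frames assembled into the compressed video bit stream 18 exceeds their expected time by more than the PSNEV. The multiplexer 44 then omits from the system data stream 22 all the data for an entire B frame in the compressed video bit stream 18. Suppose instead the EFEV is greater than a prespecified positive error value ("PSPEV"), because the actual time for those frames falls short of their expected time by more than the PSPEV. The multiplexer 44 then incorporates into the system data stream 22 a second copy of all the data for an entire B frame. The preferred values of PSNEV and PSPEV represent the time interval required to present 1.5 frames of the decoded video images. Thus a B frame is omitted, or a second copy added, only when the magnitude of the EFEV represents a time interval longer than 1.5 frames.

Each frame in the system data stream 22 is numbered in accordance with Part 2 of ISO/IEC 11172. When the multiplexer 44 omits all the data for an entire B frame, it must renumber all subsequent frames in the current GOP before assembling them into the system data stream 22. Likewise, when it incorporates a second copy of a B frame, it must number that frame accordingly and renumber all subsequent frames in the current GOP.

FIG. 4 is a computer program, written in C, that implements the process for deciding whether to omit a frame or add a second copy. Lines 1-8 of FIG. 4 fetch the counts from location 62 in the audio encoder 32 and location 64 in the video encoder 36 and set the values of NOS and NOF. Lines 13-16 compute the EFEV. Lines 21-22 apply a low-pass filter to the EFEV. Lines 26-36 decide whether to omit the frame or add the copy.

INDUSTRIAL APPLICABILITY
To set the bit rate for the compressed video bit stream 18, the program starts from the desired nominal bit rate for the system data stream 22. It subtracts the prespecified bit rate for the compressed audio bit stream 16, then sets the video bit rate approximately 1% below the result. This margin is sufficient to keep the total within the maximum bit rate for the system data stream 22, even if a second copy of a B frame happens to be incorporated. The total here is the audio and video bit rates plus the system data stream overhead.

The multiplexer 44 begins omitting or adding B frames only after it has been assembling the system data stream 22 for several minutes. Inhibiting corrections during this short initial interval avoids erroneous operation. Early on, NOS and NOF are both relatively small numbers, and dividing one by the other could trigger spurious omissions or additions. The counts are small at first because the program's commands to start the two encoders cause their microcode to reset the counts at locations 62 and 64 to zero. After several minutes, NOS and NOF become large enough that successive DORs no longer change significantly from one GOP to the next.

As a further safeguard, a low-pass filter is applied to the EFEV before it is tested. This ensures that B frames are omitted or added only in response to a long-term trend in the difference between the EEAVR and the AEAVR. Momentary fluctuations in NOS and NOF are ignored. Such fluctuations might arise, for example, if one count is read during one GOP and the other during the immediately preceding or following GOP. The preferred low-pass filter has an asymmetric response: its output returns to zero in response to a zero EFEV more quickly than it departs from zero in response to a nonzero EFEV. The actual response times are determined semi-empirically. In addition, whenever the multiplexer 44 omits or adds a frame, the filter output is set to zero. This further inhibits another omission or addition while the immediately following MPEG GOPs are processed.

The combination of the preferred audio encoder 32, the preferred video encoder 36 and the multiplexer 44 permits virtually any desired system data stream 22 to be assembled directly, without intervening processing operations. For example, Philips Consumer Electronics, Coordination Office & Magnetic Media Systems, Building SA-1, P.O. Box 80002, 5600 JB Eindhoven, The Netherlands, has established a specification for Video CD colloquially called the "White Book". It specifies:
- a maximum bit rate of 1,151,929.1 bits per second for the compressed video bit stream 18;
- an audio sampling rate of 44.1 kHz and an audio bit rate of 224 kilobits per second;
- audio packets 2279 bytes long and video packets 2296 bytes long;
- a pack rate of 75 packs per second for the system data stream 22.
Working with the preferred encoders, the multiplexer 44 can assemble a White Book-compliant system data stream 22 directly from the compressed audio bit stream 16 and the compressed video bit stream 18, without any intervening operations.

Although the present invention has been described in terms of the presently preferred embodiment, this disclosure is purely illustrative and is not to be interpreted as limiting. Various alterations, modifications and/or alternative applications of the invention will no doubt suggest themselves to those skilled in the art after reading the preceding disclosure. Accordingly, the following claims are intended to be interpreted as encompassing all alterations, modifications or alternative applications that fall within the true spirit and scope of the invention.

DETAILED DESCRIPTION OF THE INVENTION
Measurement and regulation of synchronization of merged video and audio data

TECHNICAL FIELD OF THE INVENTION
The present invention relates generally to the technical field of recorded and/or transmitted compressed digital data. More particularly, it relates to enabling video and audio data to be presented in synchronism after they are combined into a single compressed digital data stream.

BACKGROUND ART
A recorded and/or transmitted multimedia program consists of compressed digitized video data with accompanying compressed digitized audio data. Reproducing such a program properly requires combining the digital data bit streams into a single, synchronized, sequential system data stream that contains both the video data and the audio data. If the video and audio data are not properly synchronized when the data is assembled into the system data stream, or when the assembled stream is presented, the visible images often fail to appear in synchronism with the accompanying sound. For example, an image showing the lip movements of a person speaking may not be synchronized with the audible sound of the words.

To address this problem, Part 1 of ISO/IEC 11172 defines a framework that permits digitized video and audio data to be combined into a single, synchronized, sequential system data stream. ISO/IEC 11172 is the Moving Picture Experts Group ("MPEG") standard of the International Organization for Standardization ("ISO") and the International Electrotechnical Commission ("IEC"). Once combined into a single digital data stream, the data is in a form well suited for storage on a digital storage device, such as a hard disk or CD-ROM in a digital computer. It is also well suited for transmission over a cable television ("CATV") system or a high-bit-rate digital telephone system such as T1, ISDN primary rate, or ATM telecommunications access. A system data stream assembled in accordance with ISO/IEC 11172 can be decoded by an MPEG decoder to obtain decoded pictures and/or decoded audio samples. The ISO/IEC 11172 standard, which defines MPEG compression, specifies that packets of data extracted from the compressed video bit stream and the compressed audio bit stream are to be interleaved when the system data stream is assembled.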
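The standard requires interleaving but leaves the multiplexer's scheduling policy open. The following is a minimal sketch of one possible policy, not the patent's method. It assumes each waiting packet is tagged only with the 90 kHz time by which the decoder needs it. This is a simplification: a real multiplexer must also model decoder buffer occupancy. Under that assumption, the two queues are merged earliest-deadline-first. The function name and the 'A'/'V' output convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Merge two queues of packets, each described only by the 90 kHz time at
   which the decoder needs it, into one interleaved order (hypothetical
   policy: earliest deadline first, ties go to audio).  Both arrays must be
   sorted in nondecreasing order.  out[] receives 'A' for an audio packet
   and 'V' for a video packet and needs room for na + nv + 1 characters. */
size_t interleave_by_deadline(const uint64_t *audio_due, size_t na,
                              const uint64_t *video_due, size_t nv,
                              char *out)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL);
    while (i < na || j < nv) {
        /* Take audio when video is exhausted or audio is due no later. */
        if (j >= nv || (i < na && audio_due[i] <= video_due[j])) {
            out[k++] = 'A';
            i++;
        } else {
            out[k++] = 'V';
            j++;
        }
    }
    out[k] = '\0';
    return k;
}
```

For example, audio deadlines {0, 2351, 4702} and video deadlines {0, 3003, 6006} produce the order "AVAVAV".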
In addition to the compressed video and compressed audio bit streams, the system data stream may include private streams and a padding stream. The characteristics of the system data stream defined by MPEG impose functional and performance requirements on MPEG encoders and decoders. However, the MPEG standard does not specify an architecture or implementation for an MPEG encoder or decoder. In fact, there is considerable freedom in the possible designs and implementations of encoders and decoders that operate in accordance with ISO/IEC 11172.

A system data stream in accordance with Part 1 of ISO/IEC 11172 contains two layers of data: a compression layer, and a system layer that encapsulates the digital data of the compression layer. The ISO/IEC 11172 system layer is itself divided into two sublayers. The "pack layer" handles multiplex-wide operations, and the "packet layer" handles stream-specific operations. Each pack in the pack layer includes a header that specifies a system clock reference ("SCR"). The SCR fixes, in units of a 90 kilohertz ("kHz") clock, the intended time for beginning decoding of the digitized video and audio data contained in the compression layer. To provide synchronized presentation of digitized video and audio data, the packet layer defined by ISO/IEC 11172 provides a "presentation time stamp" ("PTS") and an optional "decoding time stamp" ("DTS"). The PTS and DTS specify the synchronization of the video and audio data relative to the SCR specified in the pack layer. The packet layer, which optionally includes both a PTS and a DTS, is independent of the data contained in the compression layer.
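The SCR is a 33-bit count of 90 kHz clock periods. In ISO/IEC 11172-1, an MPEG-1 pack header begins with the pack start code 0x000001BA. Next come the bits '0010' and the SCR, split 3/15/15 with marker bits between the pieces. The header ends with a 22-bit mux_rate in units of 50 bytes per second. The sketch below writes a 12-byte pack header and reads the SCR back. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SCR_MASK 0x1FFFFFFFFULL   /* the SCR is a 33-bit field          */
#define STC_HZ   90000ULL         /* system clock frequency: 90 kHz     */

/* Convert a time of num/den seconds to 90 kHz ticks (truncating). */
uint64_t seconds_to_ticks(uint64_t num, uint64_t den)
{
    assert(den != 0);
    return (num * STC_HZ / den) & SCR_MASK;
}

/* Write a 12-byte MPEG-1 pack header; mux_rate is in units of 50 bytes/s. */
void write_pack_header(uint8_t h[12], uint64_t scr, uint32_t mux_rate)
{
    assert(mux_rate < (1u << 22));
    scr &= SCR_MASK;
    h[0] = 0x00; h[1] = 0x00; h[2] = 0x01; h[3] = 0xBA;     /* pack_start_code       */
    h[4]  = (uint8_t)(0x20 | ((scr >> 29) & 0x0E) | 0x01); /* '0010' SCR[32..30] 1  */
    h[5]  = (uint8_t)(scr >> 22);                          /* SCR[29..22]           */
    h[6]  = (uint8_t)(((scr >> 14) & 0xFE) | 0x01);        /* SCR[21..15] marker    */
    h[7]  = (uint8_t)(scr >> 7);                           /* SCR[14..7]            */
    h[8]  = (uint8_t)(((scr << 1) & 0xFE) | 0x01);         /* SCR[6..0] marker      */
    h[9]  = (uint8_t)(0x80 | ((mux_rate >> 15) & 0x7F));   /* marker mux_rate[21..15] */
    h[10] = (uint8_t)(mux_rate >> 7);                      /* mux_rate[14..7]       */
    h[11] = (uint8_t)(((mux_rate << 1) & 0xFE) | 0x01);    /* mux_rate[6..0] marker */
}

/* Recover the 33-bit SCR from a pack header written as above. */
uint64_t read_pack_scr(const uint8_t h[12])
{
    return ((uint64_t)((h[4] >> 1) & 0x07) << 30)
         | ((uint64_t)h[5] << 22)
         | ((uint64_t)(h[6] >> 1) << 15)
         | ((uint64_t)h[7] << 7)
         | (uint64_t)(h[8] >> 1);
}
```

The marker bits stop the start-code prefix 0x000001 from appearing by accident inside the time field. A real multiplexer would derive each pack's SCR from its byte position and the mux rate.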
For example, a video packet may begin at any byte in the video stream. However, a PTS and optional DTS encoded in the header of a packet apply to the first "access unit" ("AU") that begins in that packet. ISO/IEC 11172 defines an AU as the coded representation of a "presentation unit" ("PU"). It further defines a PU as a decoded audio AU or a decoded picture. The standard also defines three different methods for compressing and decompressing an audio signal, which it calls "layers". For two of these methods, an audio AU is the smallest part of the encoded audio bit stream that can be decoded by itself. For the third method, an audio AU is the smallest part of the encoded audio bit stream that can be decoded using previously acquired side and main information.

Part 1 of ISO/IEC 11172 suggests how video images and audio sounds should be synchronized during presentation of compressed video and audio data. Rather than adjusting one stream (e.g. the video data stream) to match the playback of another (e.g. the audio data stream), the playback of both streams is adjusted to a master time base called the "system time clock" ("STC"). According to ISO/IEC 11172, the STC of an MPEG decoder may be one of the decoder's clocks (e.g. the SCR, the video PTS, or the audio PTS), the digital storage medium ("DSM") or channel clock, or some external clock. End-to-end synchronization of a multimedia program encoded into an MPEG system data stream occurs when:
a. the encoder embeds time stamps during assembly of the system data stream;
b. the video and audio decoders receive the embedded time stamps together with the compressed data; and
c. the decoders use the time stamps to schedule presentation of the multimedia program.
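Under the STC model, both streams are stamped against the same 90 kHz clock. The sketch below computes the PTS of the n-th video frame and the n-th audio access unit using exact integer arithmetic. It assumes MPEG-1 Layer II audio (1152 samples per audio access unit) and a video rate given as a ratio such as 30000/1001. It then packs the stamp into the 5-byte field of an MPEG-1 packet header. The 4-bit prefix is '0010' for a PTS alone, '0011' for a PTS followed by a DTS, and '0001' for the DTS. The function names and the start offset parameter are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define STC_HZ 90000ULL

/* PTS of video frame n at rate_num/rate_den frames per second. */
uint64_t video_pts(uint64_t n, uint64_t rate_num, uint64_t rate_den, uint64_t start)
{
    assert(rate_num != 0);
    return start + n * STC_HZ * rate_den / rate_num;
}

/* PTS of audio access unit n holding samples_per_au samples at
   sample_rate Hz, rounded down to a whole 90 kHz tick. */
uint64_t audio_pts(uint64_t n, uint64_t samples_per_au, uint64_t sample_rate,
                   uint64_t start)
{
    assert(sample_rate != 0);
    return start + n * samples_per_au * STC_HZ / sample_rate;
}

/* Pack a 33-bit time stamp into the 5-byte MPEG-1 form.  prefix is the
   4-bit code: 0x2 = PTS only, 0x3 = PTS followed by DTS, 0x1 = DTS. */
void pack_timestamp(uint8_t out[5], unsigned prefix, uint64_t ts)
{
    ts &= 0x1FFFFFFFFULL;
    out[0] = (uint8_t)(((prefix & 0x0F) << 4) | ((ts >> 29) & 0x0E) | 0x01);
    out[1] = (uint8_t)(ts >> 22);                        /* ts[29..22]        */
    out[2] = (uint8_t)(((ts >> 14) & 0xFE) | 0x01);      /* ts[21..15] marker */
    out[3] = (uint8_t)(ts >> 7);                         /* ts[14..7]         */
    out[4] = (uint8_t)(((ts << 1) & 0xFE) | 0x01);       /* ts[6..0] marker   */
}
```

At 30000/1001 frames per second, each video frame advances the PTS by exactly 3003 ticks. A 1152-sample audio unit at 44.1 kHz lasts a non-integral 2351.02 ticks, so rounding must be applied to the running total rather than accumulated per unit.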
A "system header" ("SH") occurs at the beginning of the system data stream and is repeated within it. The SH tells an MPEG decoder whether the encoded bit streams have an exact relationship to the SCR, using a "system audio lock flag" and a "system video lock flag". Setting the system audio lock flag to 1 indicates a specified, constant relationship between the audio sampling rate and the SCR. Setting the system video lock flag to 1 indicates a specified, constant relationship between the video picture rate and the SCR. Setting either flag to 0 indicates that the corresponding relationship does not exist.

As noted above, ISO/IEC 11172 expressly provides that the system data stream may include a padding stream. Packets from the padding stream may be assembled into the system data stream to maintain a constant total data rate, to achieve sector alignment, or to prevent decoder buffer underflow. In addition, up to 16 bytes of "stuffing" are permitted within each packet. Stuffing serves purposes similar to those of the padding stream. It is particularly useful for providing word (16-bit) or long-word (32-bit) alignment in applications where byte (8-bit) alignment is insufficient. Stuffing is the only way to fill a packet when fewer bytes are needed than the minimum size of a padding-stream packet.

A bit stream of video data compressed in accordance with Part 2 of ISO/IEC 11172 consists of a sequence of frames of compressed video data. These frames include intra ("I") frames, predicted ("P") frames, and bidirectional ("B") frames. An I frame does not refer to any other data; decoding it reproduces an entire uncompressed frame of video data.
A P frame can be decoded into an entire uncompressed frame only by reference to a previously decoded I frame or P frame. A B frame can be decoded into an entire uncompressed frame only by reference to both a preceding and a following decoded I or P frame. The ISO/IEC 11172 specification defines a group of pictures ("GOP") as one or more I frames together with all of the P frames and B frames for which the I frame serves as a reference.

When assembling a system data stream, a real-time MPEG encoder must include a system header at the beginning of each system data stream. That system header must set the system audio lock flag and the system video lock flag each to either 0 or 1. If the encoder sets either or both flags, it must guarantee the corresponding constant relationship throughout the entire system data stream: between the audio sampling rate and the SCR, and between the video picture rate and the SCR. If the compressed audio bit stream encoder operates independently of the rate at which video frames occur, there is no guarantee that these constant relationships exist in the encoded data to be interleaved into the system data stream.

DISCLOSURE OF THE INVENTION
An object of the present invention is to provide a method for assembling a system data stream that permits synchronized presentation of visible images and accompanying sound. Another object is to provide a system data stream that maintains a constant relationship between the audio sampling rate and the SCR. Another object is to provide a system data stream that maintains a constant relationship between the video picture rate and the SCR.
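A B frame can be decoded only after both of its reference frames. So an encoder transmits each I or P frame ahead of the B frames that precede it in display order. The sketch below shows that reordering; it is not taken from the patent, and the function name is hypothetical. It maps a GOP written in display order to coded order, expressed as display indices. As a simplification, B frames at the end of the string, which would reference the next GOP's I frame, are emitted last.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a GOP given in display order, e.g. "IBBPBBP", to coded
   (transmission) order.  coded[] receives display indices.  Returns the
   number of entries written, or -1 for an unknown frame type or if cap is
   too small.  Trailing B frames are emitted last (simplification). */
int display_to_coded(const char *display, int *coded, size_t cap)
{
    size_t n, i, b, k = 0;
    size_t pending_start = 0, pending_count = 0;

    assert(display != NULL && coded != NULL);
    n = strlen(display);
    if (n > cap)
        return -1;

    for (i = 0; i < n; i++) {
        if (display[i] == 'B') {
            /* Hold B frames until the later reference frame has been sent. */
            if (pending_count == 0)
                pending_start = i;
            pending_count++;
        } else if (display[i] == 'I' || display[i] == 'P') {
            coded[k++] = (int)i;                        /* reference frame  */
            for (b = 0; b < pending_count; b++)         /* then the held Bs */
                coded[k++] = (int)(pending_start + b);
            pending_count = 0;
        } else {
            return -1;
        }
    }
    for (b = 0; b < pending_count; b++)
        coded[k++] = (int)(pending_start + b);
    return (int)k;
}
```

For "IBBPBBP" the coded order is 0, 3, 1, 2, 6, 4, 5. In MPEG-1 video no other frame is predicted from a B frame. That is one plausible reason a B frame is the natural unit to omit or duplicate: the change leaves the decoding of the neighbouring frames intact, although the following frames still need renumbering.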
Briefly, the present invention is a method for the real-time assembly of an encoded system data stream that a decoder can decode into decoded video images and a decoded audio signal. A system data stream assembled in accordance with the invention permits the decoder to present the decoded audio signal substantially in synchronism with the decoded video images. The system data stream is assembled by interleaving packets of data selected from a compressed audio bit stream with packets of data selected from a compressed video bit stream. The compressed audio bit stream is generated by compressing an audio signal sampled at a prespecified audio sampling rate. The compressed video bit stream is generated by compressing a sequence of frames of a video signal having a prespecified video frame rate.

Before assembly of the system data stream begins, an expected encoded audio/video ratio is computed: the prespecified audio sampling rate divided by the prespecified video frame rate. A system header ("SH") is then embedded in the system data stream. The SH includes both a system audio lock flag and a system video lock flag. These are set to indicate, respectively, a specified constant relationship between the audio sampling rate and the system clock reference ("SCR"), and a specified constant relationship between the video picture rate and the SCR. Packets of data, each selected from either the compressed audio bit stream or the compressed video bit stream, are then incorporated into the system data stream. To effect synchronization, a presentation time stamp ("PTS") and an optional decoding time stamp ("DTS") are embedded in the system data stream with each packet.

In addition, an actual encoded audio/video ratio is computed: a number representing the count of all samples of the audio signal received for compression, divided by the total number of frames of the video signal received for compression. An encoded frame error value is then computed in two steps. First, the expected ratio is subtracted from the actual ratio to obtain a difference of ratios. Second, that difference is multiplied by the total number of frames of the video signal received for compression. If the encoded frame error value is less than a prespecified negative error value, all the data in the compressed video bit stream for an entire frame of the video signal is omitted from the system data stream. Conversely, if the encoded frame error value exceeds a prespecified positive error value, all the data for a second copy of an entire frame of the video signal is incorporated from the compressed video bit stream into the system data stream. In a preferred embodiment, the prespecified positive and negative error values both represent a time interval approximately equal to the time required to present 1.5 frames of the decoded video images.

An advantage of the invention is that it produces a system data stream that can be decoded more easily. Another advantage is that the system data stream can be decoded by a variety of different decoders. A further advantage is that the system data stream can be decoded by a relatively simple decoder. These and other features, objects and advantages will be readily understood by those skilled in the art from the following detailed description of the preferred embodiment illustrated in the various drawing figures.

BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 diagrammatically depicts the process of assembling a system data stream by interleaving packets selected from a compressed audio bit stream with packets selected from a compressed video bit stream.
FIG. 2 is a block diagram illustrating three components: a video encoder that compresses a sequence of frames of a video signal into a compressed video bit stream; an audio encoder that compresses an audio signal into a compressed audio bit stream; and a multiplexer that interleaves packets selected from the two compressed bit streams to assemble a system data stream.
FIG. 3 illustrates a system data stream assembled by interleaving packets selected from a compressed audio bit stream with packets selected from a compressed video bit stream.
FIG. 4 is a computer program, written in the C programming language, that implements the decision process. It determines whether all the data for an entire frame of the video signal should be omitted from the system data stream, or whether all the data for a second copy of an entire frame should be incorporated into it.

BEST MODE FOR CARRYING OUT THE INVENTION
Arrows 12a and 12b in FIG. 1 depict the process of assembling a serial system data stream 22 made up of concatenated packs 24. Packets selected from a compressed audio bit stream 16 are interleaved with packets selected from a compressed video bit stream 18.

An audio encoder 32, shown in the block diagram of FIG. 2, generates the compressed audio bit stream 16 by processing an audio signal 34 (arrow 34 in FIG. 2). The audio encoder 32 first digitizes the audio signal 34 at a prespecified audio sampling rate ("PSASR") and then compresses the digitized representation. A video encoder 36 generates the compressed video bit stream 18 by compressing a video signal 38 (arrow 38 in FIG. 2) into MPEG GOPs. The video signal 38 is a sequence of frames with a prespecified video frame rate ("PSVFR").

The audio encoder 32 is preferably an audio compression engine, model No. 96-0003-0002, marketed by FutureTel, Inc., 1096 E. Arques Avenue, Sunnyvale, California 94086. The video encoder 36 is preferably a video compression engine, model No. 96-0002-002, also sold by FutureTel, Inc. These preferred encoders can compress the audio signal 34 into the compressed audio bit stream 16, and the video signal 38 into the compressed video bit stream 18, in real time.

In real time, a system data stream multiplexer 44 repeatedly selects packets from the compressed audio bit stream 16 or the compressed video bit stream 18. It interleaves them into the packs of the system data stream 22 illustrated in FIG. 1. The multiplexer 44 is preferably a computer program executed by a host microprocessor in the personal computer (not illustrated) in which the audio encoder 32 and video encoder 36 are installed.

In preparing the encoders, the program transfers commands and data that cause the audio encoder 32 and video encoder 36 to produce the compressed audio bit stream 16 and the compressed video bit stream 18 at prespecified bit rates. The system data stream must also carry embedded control data. To accommodate this overhead, the sum of the audio and video bit rates specified by the program is slightly less than the bit rate specified for the system data stream 22. The host microprocessor also transfers control data instructing the audio encoder 32 to digitize the audio signal 34 at the PSASR.

Besides preparing the encoders, the program also prepares certain data used in assembling the system data stream 22. In particular, with respect to the present invention, it computes the expected encoded audio/video ratio ("EEAVR") for the system data stream 22 by dividing the PSASR by the PSVFR. Once these preparations are complete, the multiplexer 44 repeatedly selects packets of data from the compressed audio bit stream 16 or the compressed video bit stream 18 and incorporates them into the packs 24 of the system data stream 22.

As illustrated in FIG. 3, each pack 24 of a system data stream 22 assembled in accordance with ISO/IEC 11172 has a prespecified length L, which may be as much as about 65,538 bytes. Each pack 24 begins with a pack header 52 (PH in FIG. 3) containing the system clock reference ("SCR") value for that pack. In the first pack 24 of the system data stream 22, a system header 54 (SH in FIG. 3) immediately follows the pack header 52. In accordance with ISO/IEC 11172, the system header 54 is also repeated in each pack 24 within the system data stream 22. The system header 54 contains both the system audio lock flag and the system video lock flag. The program sets both flags to 1. This indicates, respectively, a specified constant relationship between the audio sampling rate and the SCR, and a specified constant relationship between the video picture rate and the SCR.

The pack header 52 comes first, followed by the system header 54 if the pack 24 contains one. The remainder of each pack 24 in FIG. 3 holds packets 56 of data that the multiplexer 44 has selected from either the compressed audio bit stream 16 or the compressed video bit stream 18. Each packet 56 includes a packet header (not illustrated). In accordance with ISO/IEC 11172, the packet header may contain a presentation time stamp ("PTS") and may also contain an optional decoding time stamp ("DTS"). Although not illustrated, the system data stream 22 may also include packets of the padding stream. As permitted under ISO/IEC 11172, the multiplexer 44 may add padding-stream packets to maintain a constant total data rate, to achieve sector alignment, or to prevent decoder buffer underflow.

The preferred audio encoder 32 digitizes the audio signal 34 at a prespecified sampling rate and then compresses it to produce the compressed audio bit stream 16 at a prespecified bit rate. Because of this, the compressed audio bit stream 16 inherently provides a stable timing reference for assigning the SCR, PTS and DTS to the packs 24 of the system data stream 22. The video signal 38 does not provide such a reference when it comes from a videocassette played on a videocassette recorder ("VTR") or a laser disc played on a laser disc player. In those cases the frame rate of the video signal varies.
During assembly of system data stream 22, in addition to the packets 56 selected from compressed audio bit stream 16 or compressed video bit stream 18, the computer program executed by the host microprocessor also fetches data from audio encoder 32 and video encoder 36. In particular, system data stream multiplexer 44 reads, from location 62 in audio encoder 32, a number representing a running count of all samples ("NOS") of audio signal 34 that audio encoder 32 has received for compression. Similarly, system data stream multiplexer 44 reads, from location 64 in video encoder 36, a running count of the number of frames ("NOF") of video signal 38 that video encoder 36 has received for compression. The computer program executed by the host microprocessor reads these two values as close together in time as possible. System data stream multiplexer 44 then divides NOS by NOF to obtain the actual encoded audio/video ratio ("AEAVR"), and subtracts the previously calculated EEAVR from AEAVR to obtain a difference of ratios ("DOR"). Multiplying DOR by NOF yields an encoded frame error value ("EFEV"); algebraically, EFEV = NOS − NOF × EEAVR, a count of audio samples. Measured against the pre-specified audio sampling rate, EFEV represents the difference between the actual time corresponding to the NOF assembled into system data stream 22 and the expected time for that same NOF. If the EFEV thus calculated is less than a pre-specified negative error value ("PSNEV"), i.e. the actual time for the NOF assembled into compressed video bit stream 18 falls short of the expected time for that NOF by more than the magnitude of PSNEV, system data stream multiplexer 44 omits (excludes) all data for a B frame of compressed video bit stream 18 from system data stream 22.
If EFEV is greater than a pre-specified positive error value ("PSPEV"), i.e. the actual time for the NOF assembled into compressed video bit stream 18 exceeds the expected time for that NOF by more than PSPEV, system data stream multiplexer 44 incorporates into system data stream 22 a second copy of all data for a B frame of compressed video bit stream 18. The preferred value for both PSNEV and PSPEV represents the time interval required to present 1.5 frames of the decoded video image. Thus, only when the magnitude of EFEV represents a time interval exceeding that required to present 1.5 frames of the decoded video image is all data for a B frame of compressed video bit stream 18 omitted from system data stream 22, or a second copy of it assembled into system data stream 22. Because each frame in system data stream 22 is numbered in accordance with ISO/IEC 11172 Part 2, if system data stream multiplexer 44 omits all data for a B frame of compressed video bit stream 18 from system data stream 22, it must renumber all subsequent frames of the current GOP before assembling them into system data stream 22. Correspondingly, if system data stream multiplexer 44 incorporates into system data stream 22 a second copy of all data for a B frame of compressed video bit stream 18, it must number that copy and renumber all subsequent frames of the current GOP. FIG. 4 is a computer program, written in the C programming language, that embodies the process for determining whether all data for a frame of video signal 38 is to be omitted from system data stream 22, or whether a second copy of all data for a frame of video signal 38 is to be incorporated into system data stream 22.
Lines 1-8 of FIG. 4 fetch the counts from location 62 in audio encoder 32 and location 64 in video encoder 36 and set the values of NOS and NOF. Lines 13-16 of FIG. 4 embody the calculation of EFEV. Lines 21-22 of FIG. 4 apply a low-pass filter to EFEV. Lines 26-36 of FIG. 4 decide whether all data for a frame of video signal 38 should be omitted from system data stream 22, or whether a second copy of all data for a frame of video signal 38 should be incorporated into system data stream 22.
Industrial Applicability
The computer program executed by the host microprocessor sets the bit rate for compressed video bit stream 18 approximately 1% below the desired reference bit rate for system data stream 22 minus the pre-specified bit rate for compressed audio bit stream 16. Setting the bit rate for compressed video bit stream 18 1% below this reference provides a sufficient safety margin that the sum of the bit rates for compressed audio bit stream 16 and compressed video bit stream 18, plus the overhead of system data stream 22, never exceeds the maximum bit rate, even if a second copy of all data for a B frame of compressed video bit stream 18 happens to be incorporated into system data stream 22. System data stream multiplexer 44 begins omitting B frames from, or adding B frames to, system data stream 22 only after it has been assembling system data stream 22 for several minutes; during this short initial interval, system data stream multiplexer 44 inhibits the omission or addition of B frames. Otherwise, erroneous omission or addition of B frames during the first few minutes of system data stream 22 would result from dividing one relatively small number, NOS, by another relatively small number, NOF. The instructions that the computer program executed by the host microprocessor sends to activate audio encoder 32 and video encoder 36 cause microcode executed by both encoders to reset the count at location 62 and the count at location 64 to zero, so NOS and NOF are small numbers during the first few minutes of operation. After an interval of several minutes, the counts NOS and NOF are large enough that subsequent values of DOR do not change appreciably from one GOP to the next. In addition to completely inhibiting the omission or addition of B frames during the first few minutes of system data stream 22, erroneous omission or addition of B frames is further inhibited by applying a low-pass filter to EFEV before testing EFEV to determine whether a B frame should be omitted from or added to system data stream 22. Applying the low-pass filter to EFEV ensures that B frames are omitted from or added to system data stream 22 only in response to a long-term trend in the difference between EEAVR and AEAVR, irrespective of variations in the values of NOS and NOF within a single GOP, and irrespective of reading one of NOS or NOF during one GOP and the corresponding value of NOF or NOS during the immediately preceding or following GOP. The preferred low-pass filter applied to EFEV has an asymmetric response; that is, owing to the filter's characteristics, its output responds to non-zero values of EFEV differently from the way it returns to zero in response to zero values. The actual response time employed for the low-pass filter is determined semi-empirically. In addition, whenever system data stream multiplexer 44 omits a B frame of compressed video bit stream 18 from, or adds one to, system data stream 22, the output value of the low-pass filter is arbitrarily set to zero.
Setting the output value of the low-pass filter to zero further inhibits the omission or addition of frames of compressed video bit stream 18 during processing of, for example, successive MPEG GOPs. In accordance with the present invention, the combination of the preferred audio encoder 32, the preferred video encoder 36 and system data stream multiplexer 44 permits virtually any desired system data stream 22 to be assembled directly, without any intervening processing operations. For example, Philips Consumer Electronics, Coordination Office & Magnetic Media Systems, Building SA-1, P.O. Box 80002, 5600 JB Eindhoven, the Netherlands, has established a specification for video CDs colloquially referred to as the "White Book" specification. The Philips White Book specifies a maximum bit rate of 1,151,929.1 bits per second for compressed video bit stream 18, an audio sampling rate of 44.1 kHz, and an audio bit rate of 224 kilobits per second. The Philips White Book also specifies that audio packets are 2279 bytes long, that video packets are 2296 bytes long, and that system data stream 22 has a pack rate of 75 packs per second. Operating in conjunction with the preferred audio encoder 32 and the preferred video encoder 36, system data stream multiplexer 44 according to the present invention can assemble a system data stream 22 conforming to the Philips White Book directly, without any intervening operation, from compressed audio bit stream 16 and compressed video bit stream 18 having stably specified characteristics. Although the present invention has been described in terms of the presently preferred embodiment, it is to be understood that such disclosure is purely illustrative and is not to be interpreted as limiting.
Consequently, without departing from the spirit and scope of the invention, various alterations, modifications and/or alternative applications will no doubt be suggested to those skilled in the art after having read the preceding disclosure. Accordingly, it is intended that the following claims be interpreted as encompassing all alterations, modifications or alternative applications as fall within the true spirit and scope of the invention.

Continued from front page: (51) Int.Cl.⁶ / Identification code / FI: H04N 7/04; H04N 7/04 101; 7/045

Claims

1. A method for real-time assembly of an encoded system data stream that may be decoded by a decoder into a decoded video image and a decoded audio signal, the system data stream being assembled so that the decoder presents the decoded audio signal substantially in synchronism with the decoded video image, the system data stream being assembled by interleaving packets of data selected from a compressed audio bit stream with packets of data selected from a compressed video bit stream, the compressed audio bit stream being generated by compressing an audio signal sampled at a pre-specified audio sampling rate ("PSASR"), and the compressed video bit stream being generated by compressing a series of frames of a video signal having a pre-specified video frame rate ("PSVFR"), the method comprising the steps of:
a. before commencing assembly of the system data stream, calculating an expected encoded audio/video ratio ("EEAVR") equal to the PSASR divided by the PSVFR;
b. embedding in the system data stream a system header ("SH") in which a system audio lock flag and a system video lock flag are set to indicate, respectively, that a specified constant relationship exists between the audio sampling rate and a system clock reference ("SCR") and that a specified constant relationship exists between the video picture rate and the SCR;
c. repeatedly selecting packets of data from the compressed audio bit stream or the compressed video bit stream and incorporating them into the system data stream;
d. repeatedly including in the system data stream a presentation time stamp ("PTS") with each packet selected from the compressed audio bit stream or the compressed video bit stream;
e. calculating an actual encoded audio/video ratio ("AEAVR") equal to a number representing a running count of all samples ("NOS") of the audio signal received for compression divided by the total number of frames ("NOF") of the video signal received for compression;
f. calculating an encoded frame error value ("EFEV") by first subtracting the EEAVR from the AEAVR to obtain a difference of ratios ("DOR"), and then multiplying the DOR thus calculated by the NOF; and
g. if the EFEV is less than a pre-specified negative error value ("PSNEV"), omitting all data for a frame of the video signal from the system data stream.
2. The method of claim 1 further comprising the step of:
h. if the EFEV is greater than a pre-specified positive error value ("PSPEV"), incorporating into the system data stream a second copy of all data for a frame of the video signal.
3. The method of claim 2 wherein the PSPEV represents a time interval greater than the time interval required for presenting one-half of a single frame of the decoded video image.
4. The method of claim 2 further comprising the step of inhibiting, during a few minutes immediately following commencement of assembly of the system data stream, both the omission of frames of the video signal from the system data stream and the incorporation into the system data stream of a second copy of all data for frames of the video signal.
5. The method of claim 2 further comprising the step of:
i. applying a low-pass filter to the EFEV before deciding whether to omit a frame of the video signal from the system data stream and before deciding whether to add a frame of the video signal to the system data stream.
6. The method of claim 1 wherein the PSNEV represents a time interval greater than the time interval required for presenting one-half of a single frame of the decoded video image.
7. The method of claim 1 further comprising the step of inhibiting the omission of frames of the video signal during a few minutes immediately following commencement of assembly of the system data stream.
8. The method of claim 1 further comprising the step of:
h. applying a low-pass filter to the EFEV before deciding whether to omit a frame of the video signal from the system data stream.
9. A method for real-time assembly of an encoded system data stream that may be decoded by a decoder into a decoded video image and a decoded audio signal, the system data stream being assembled so that the decoder presents the decoded audio signal substantially in synchronism with the decoded video image, the system data stream being assembled by interleaving packets of data selected from a compressed audio bit stream with packets of data selected from a compressed video bit stream, the compressed audio bit stream being generated by compressing an audio signal sampled at a pre-specified audio sampling rate ("PSASR"), and the compressed video bit stream being generated by compressing a series of frames of a video signal having a pre-specified video frame rate ("PSVFR"), the method comprising the steps of:
a. before commencing assembly of the system data stream, calculating an expected encoded audio/video ratio ("EEAVR") equal to the PSASR divided by the PSVFR;
b. embedding in the system data stream a system header ("SH") in which a system audio lock flag and a system video lock flag are set to indicate, respectively, that a specified constant relationship exists between the audio sampling rate and a system clock reference ("SCR") and that a specified constant relationship exists between the video picture rate and the SCR;
c. repeatedly selecting packets of data from the compressed audio bit stream or the compressed video bit stream and incorporating them into the system data stream;
d. repeatedly including in the system data stream a presentation time stamp ("PTS") with each packet selected from the compressed audio bit stream or the compressed video bit stream;
e. calculating an actual encoded audio/video ratio ("AEAVR") equal to a number representing a running count of all samples ("NOS") of the audio signal received for compression divided by the total number of frames ("NOF") of the video signal received for compression;
f. calculating an encoded frame error value ("EFEV") by first subtracting the EEAVR from the AEAVR to obtain a difference of ratios ("DOR"), and then multiplying the DOR thus calculated by the NOF; and
g. if the EFEV is greater than a pre-specified positive error value ("PSPEV"), incorporating into the system data stream a second copy of all data for a frame of the video signal.
10. The method of claim 9 wherein the PSPEV represents a time interval greater than the time interval required for presenting one-half of a single frame of the decoded video image.
11. The method of claim 9 further comprising the step of inhibiting, during a few minutes immediately following commencement of assembly of the system data stream, the incorporation into the system data stream of a second copy of all data for frames of the video signal.
12. The method of claim 9 further comprising the step of:
h. applying a low-pass filter to the EFEV before deciding whether to incorporate into the system data stream a second copy of all data for a frame of the video signal.