JP2004511113A

JP2004511113A - How to encode multiple audio streams

Info

Publication number: JP2004511113A
Application number: JP2000587531A
Authority: JP
Inventors: ウ，ボ; ル，ビン; フェン，イェン
Original assignee: エンリーチ・テクノロジー・インコーポレイテッド
Priority date: 1998-12-11
Filing date: 1999-12-07
Publication date: 2004-04-08
Also published as: WO2000035194A1; WO2000035194B1; WO2000035194A9

Abstract

１つまたは複数の静止ピクチャまたはビデオ・ストリームを、複数のサウンド・ストリームと、オーバーレイ・グラフィックスおよび／またはテキスト情報と共に符号化できるように、かつ符号化したものを再生できるようにし、それにより、媒体（例えばコンパクト・ディスク）が多量のオーディオ情報（例えば歌）で符号化できるようにする方法が提供される。具体的には、サウンド・ストリームの再生時に、ある静止ピクチャがすべてのオーディオ情報（例えば歌）に対して表示される方式で、静止ピクチャおよび複数のサウンド・ストリームが符号化される。ＯＧＴ情報を、任意選択で符号化することができ、あるいは符号化して任意選択で表示して、シングアロングの状況でサブタイトルまたはガイダンスを提供することができる。コンパクト・ディスクなどの媒体に記憶されるビデオ・データの量を最小限に抑えることにより、膨大な量のオーディオ情報（したがって歌）を単一のコンパクト・ディスクに記憶することができる。別法として、少なくとも１つのビデオ・ストリームを、複数のオーディオ・ストリームおよびオプションのＯＧＴ情報と共に符号化することもできる。このようにして、再生時に、単一のビデオ・ストリームが、様々なオーディオ・ストリームおよび対応するＯＧＴ情報と共に表示される。この方法は、記憶媒体に記憶することの必要なビデオ情報の量を最小限に抑え、したがって、コンパクト・ディスクなどの媒体に多大な量のオーディオ情報を記憶できるようにする。

A method is provided that enables one or more still pictures or video streams to be encoded with a plurality of sound streams and overlay graphics and/or text (OGT) information, and enables the encoded result to be played back, so that a medium (e.g., a compact disc) can be encoded with a large amount of audio information (e.g., songs). Specifically, a still picture and a plurality of sound streams are encoded in such a manner that, when a sound stream is played back, the still picture is displayed for all of the audio information (e.g., songs). The OGT information can optionally be encoded, or can be encoded and optionally displayed, to provide subtitles or guidance in a sing-along situation. By minimizing the amount of video data stored on a medium such as a compact disc, a vast amount of audio information (and thus songs) can be stored on a single compact disc. Alternatively, at least one video stream may be encoded with multiple audio streams and optional OGT information. In this way, upon playback, a single video stream is displayed with the various audio streams and corresponding OGT information. This method minimizes the amount of video information that needs to be stored on the storage medium, and thus allows a large amount of audio information to be stored on media such as compact discs.

Description

【０００１】
（相互参照）
本発明は、カリフォルニアの企業であるＥｎｒｅａｃｈ　Ｔｅｃｈｎｏｌｏｇｙから提供された、１９９８年９月２２日付「Ｓｕｐｅｒ　ＶＣＤ　Ｓｙｓｔｅｍ　Ｓｅｃｉｆｉｃａｔｉｏｎ」、および「ＶＣＤ３．０　Ｓｐｅｃｉｆｉｃａｔｉｏｎ」を参照により組み込む。
【０００２】
（発明の分野）
本発明は一般に、オーディオおよびビデオの情報を符号化および再生する方法に関し、より詳細には、オーディオ・ストリーム、テキスト・ストリーム、およびビデオ・ストリームを組み合わせたマルチメディア情報を符号化および再生する方法に関する。
【０００３】
（発明の背景）
オーディオ・ストリームとビデオ・ストリームを物理媒体（コンパクト・ディスクなど）に符号化およびフォーマットする従来の方法では、ある固有のビデオ・ストリームが、１つまたは複数のオーディオ・ストリームと結合される。これは一般に、あらゆるビデオ・オーディオ結合について言える。例えば、ビデオ・コンパクト・ディスク（「ＶＣＤ」）上またはディジタル・バーサタイル・ディスク（「ＤＶＤ」）上に符号化される音楽ビデオでは、ある特定の音楽ビデオ・ストリームが、ある特定のオーディオ・ストリームと共に符号化される。再生時、ビデオ・ストリームは、オーディオ・ストリームと同期して再生される。
【０００４】
特殊化された用途、例えばシングアロング（またはカラオケ）用途では、ある歌に対するビデオおよびオーディオの情報を符号化する際の従来の方法は、ビデオ・ストリームを１つまたは２つのオーディオ・ストリームと、オーバーレイ・グラフィックスおよび／またはテキスト（「ＯＧＴ」）ストリームと結合して、ビデオ・ストリームをすべて１つのトラックにオーバーレイするものである。したがって一般に、トラックごとに１つの歌がある。オーディオ・ストリームが２つある場合は、通常、一方のオーディオ・ストリームが、歌手の音声と基調をなす音楽の両方を含み、第２のオーディオ・ストリームが、基調をなす音楽を有するだけとなる。ＯＧＴ情報は通常、選択された言語の語を含むことになる。ただしこれは、ビデオ・ストリームおよびオーディオ・ストリームと同期されなければならない。
【０００５】
この従来方法は、大部分の用途に対して優れているが、コンパクト・ディスク上に搭載できる歌（それぞれ、ある固有のビデオ・ストリームと１つまたは複数の固有のオーディオ・ストリームとを有する）の数が極端に制限される。このフォーマットにより、工業規格のＶＣＤ上では、使用される特定のフォーマットに応じて曲数が１２〜１８曲に制限される。ＶＣＤでは通常、ビデオ・ストリームおよびオーディオ・ストリームを符号化する際にＭＰＥＧ１規格が用いられることに留意されたい。ＭＰＥＧ２規格で符号化すると、工業規格のＤＶＤ上では、曲数は２５〜３０曲に制限される。通常、これ以上の歌をこれらのディスクに搭載するのは不可能である。したがって、大きなソング・コレクションでは多量のディスクが必要とされ、それら多量のディスクを保管し、取り扱わなければならず、装置および保管場所の追加コストが生じる。このことは、商業用途（例えばシンギング・ラウンジ）において重大な問題となる。
【０００６】
したがって、多量の歌をビデオ情報と共にコンパクト・ディスクに記憶する方法を有することが望ましい。この方法は、可能な限り、既存の技術との互換性を最大限にできるようにすべきである。
【０００７】
（発明の概要）
したがって本発明の目的は、１つまたは複数の静止ピクチャを、複数のオーディオ・ストリームと共に符号化および再生する方法を提供することである。
【０００８】
本発明の別の目的は、１つまたは複数の静止ピクチャを、複数のオーディオ・ストリームと、それぞれのオーディオ・ストリームに同期させたＯＧＴ情報と共に、符号化および再生する方法を提供することである。
【０００９】
本発明の別の目的は、１つのビデオ・ストリームを、複数のオーディオ・ストリームと、それぞれのオーディオ・ストリームに同期させたＯＧＴ情報と共に、符号化および再生する方法を提供することである。
【００１０】
本発明の別の目的は、シングアロングの目的で、静止ピクチャまたはビデオ・ストリームと、複数のオーディオ・ストリームとを、符号化および再生する方法を提供することである。
【００１１】
簡潔に言えば、現時点で好まれる本発明の一実施態様では、１つまたは複数の静止ピクチャまたはビデオ・ストリームを、複数のサウンド・ストリームと、オーバーレイ・グラフィックスおよび／またはテキスト情報と共に符号化できるように、かつ符号化したものを再生できるようにし、それにより、媒体（例えばコンパクト・ディスク）に多量のオーディオ情報（例えば歌）を符号化できるようにする方法が提供される。具体的には、サウンド・ストリームの再生時に、ある静止ピクチャがすべてのオーディオ情報（例えば歌）に対して表示される方式で静止ピクチャおよび複数のサウンド・ストリームが符号化される。ＯＧＴ情報を任意選択で符号化することができ、あるいは符号化して任意選択で表示して、シングアロングの状況でサブタイトルまたはガイダンスを提供することができる。コンパクト・ディスクなどの媒体に記憶されるビデオ・データの量を最小限に抑えることにより、膨大な量のオーディオ情報（したがって歌）を単一のコンパクト・ディスクに記憶することができる。別法として、少なくとも１つのビデオ・ストリームを、複数のオーディオ・ストリームおよびオプションのＯＧＴ情報と共に符号化することもできる。このようにして、再生時に、単一のビデオ・ストリームが、様々なオーディオ・ストリームおよび対応するＯＧＴ情報と共に表示される。この方法は、記憶媒体に記憶することの必要なビデオ情報の量を最小限に抑え、したがって、コンパクト・ディスクなどの媒体に多大な量のオーディオ情報を記憶できるようにする。本発明は、特定の記憶媒体に限定したものではないことに留意されたい。記憶媒体は、ＶＣＤフォーマットされた物理ディスク、ＤＶＤフォーマットされた物理ディスク、または他のどんな物理媒体とすることもできる。本発明の方法を用いれば、ＶＣＤフォーマットされたディスク上に５０曲を超える歌を符号化することができる。
【００１２】
本発明の利点は、１つまたは複数の静止ピクチャを、複数のオーディオ・ストリームと共に符号化および再生する方法が提供されることである。
【００１３】
本発明の別の利点は、１つまたは複数の静止ピクチャを、複数のオーディオ・ストリームと、それぞれのオーディオ・ストリームに同期させたグラフィックスおよびテキストと共に、符号化および再生する方法が提供されることである。
【００１４】
本発明の別の利点は、１つのビデオ・ストリームを、複数のオーディオ・ストリームと、それぞれのオーディオ・ストリームに同期させたグラフィックスおよびテキストと共に、符号化および再生する方法が提供されることである。
【００１５】
本発明の別の利点は、シングアロングの目的で、静止ピクチャまたはビデオ・ストリームと、複数のオーディオ・ストリームとを符号化および再生する方法が提供されることである。
【００１６】
これらおよび他の、本発明の特徴および利点は、図を考察し、後続の本発明の詳細な説明を読めば、よく理解されるであろう。
【００１７】
（好ましい実施形態の詳細な説明）
本発明の現行で好ましい方法では、静止ピクチャを複数のオーディオ・チャネルで符号化するための方法、静止ピクチャを複数のオーディオ・チャネルと複数のＯＧＴ情報で符号化する方法、およびビデオ・ストリームを複数のオーディオ・チャネルおよび複数のＯＧＴサブストリームで符号化する方法を開示する。ビデオ・ストリームおよび静止ピクチャの符号化は、ＩＳＯ１８３１８（ＭＰＥＧ−２）に従って実行され、またオーディオ・ストリームの符号化は、ＭＰＥＧ−１またはＭＰＥＧ−２の層ＩＩに従って実行される。
【００１８】
複数のオーディオ・チャネルで１つまたは複数の静止ピクチャの符号化では、システム・ストリームが、ＭＰＥＧビデオ・ストリームとして符号化された１つまたは複数の通常解像度または高解像度の静止ピクチャ、および１つまたは複数のオーディオ・チャネル・サブストリームを含む。静止ピクチャを圧縮する上での特定の方法を下記に説明する。オーディオ・ビットレートは、２２４ｋ／秒である。２×ＣＤローダを使用して、１２個のオーディオ・ストリームを１つのトラックに符号化することができ、そこでは、オーディオ・パケット・ヘッダ・フィールド内で、ｓｔｒｅａｍ＿ｉｄフィールドが、＄Ｃ０から＄ＣＢまでの範囲内の値を有することができ、これによって、１２個のオーディオ・ストリームを識別する。合計ビットレートは、オーディオ・ビットレート（２２４ビット／秒）に１２個のオーディオ・ストリームを掛けたものとして計算されて、毎秒２６８８０００ビットを与え、これは、２速式ＣＤプレーヤの場合のビットレート内にある。すべてのサウンド・サブストリームは、ディスクおよびプレーヤの再生性能を向上させるため、インターレースして一定のＣＤ回転速度を維持する。
【００１９】
複数のオーディオ・チャネルおよび複数のＯＧＴストリームで１つまたは複数の静止ピクチャの符号化では、システム・ストリームは、ＭＰＥＧビデオ・ストリームとして符号化された１つまたは複数の通常解像度または高解像度の静止ピクチャ、１つまたは複数のオーディオ・チャネル、および１つまたは複数のＯＧＴサブストリームを含む。静止ピクチャを圧縮する上での特定の方法を下記に説明する。前記形式の場合と同様に、オーディオ・ビットレートは、２２４ビット／秒である。１２個のオーディオ・ストリームをトラック上に符号化することができ、またオーディオ・パケット・ヘッダ・フィールド内で、ｓｔｒｅａｍ＿ｉｄフィールドが、＄Ｃ０から＄ＣＢであり得る。合計ビットレートは、オーディオ・ビットレート（２２４ｋ／秒）に１２個のオーディオ・ストリームを掛けたものとして計算されて、毎秒２，６８８，０００ビットを与え、これは、２速式ＣＤプレーヤの場合の性能範囲内にある。すべてのサウンド・サブストリームは、ディスクおよびプレーヤの再生性能を向上させるため、インターレースして一定のＣＤ回転速度を維持する。ＯＧＴ情報に関しては、これをオーディオ・ストリームに対応する複数のサブストリームを有する単一ストリームとして符号化する。ｓｕｂ＿ｓｔｒｅａｍ＿ｉｄは、０から＄ＦＣ（またはそれより大きい）ことが可能であり、２４（またはそれより多くの）サブストリームの範囲を提供する。すべてのＯＧＴパケットは、そのデータ（オフセット＄２１）の第１バイト上にそのｓｕｂ＿ｓｔｒｅａｍ＿ｉｄを有し、すべてのＯＧＴページは、すべてのデータの終りにＳＹＮＣワードを有していなければならず、このＳＹＮＣワードは、＄０４０８０Ｃ１０として指定される。ＯＧＴに対して利用可能な合計ビットレートは、下記の通り計算される。合計ＣＤビットレートから合計オーディオ・サブストリーム・ビットレートを引くと、これは２２９６ビット／秒であり、これに７５と２と８を掛けて（２７５５２００ビット／秒）、２６８８０００ビット／秒を引いた結果、６７２００ビット／秒となる。
【００２０】
複数のオーディオとＯＧＴのストリームを使用する、１つまたは複数のビデオ・ストリームの符号化では、ビデオ・ストリーム符号化ビットレートをオーディオ・チャネル数に応じたものにする。オーディオ・ストリーム（またはチャネル）が多いほど、そのビデオ・ストリームに対するより低いビットレートが必要とされる。本発明のシステムの通常ビデオ・ビットレートは、１，２００，０００ビット／秒であることが可能であり、これは、単一ビデオ・ストリームを有する単一トラック（２２４，０００×６ビット／秒）上に６個のオーディオ・ストリームを許容する。この場合、合計ビットレートは、１，２００，０００に２２４，０００を足し、それに６を掛けて、２，５４４，０００ビット／秒に等しくなる。残りの帯域幅は、ＯＧＴストリームまたは埋込みのために使用することができる。また、ビットレート分配の異なる設定も存在し得る。ビデオ・ビットレートがより高いとき、単一トラック内に圧縮できるオーディオ・チャネルは、より少なくなる。
【００２１】
ディスクは、一般的に、いくつかのトラックに分割され、これは、そのディスクのプログラマによって指定される。任意のトラック内に、それぞれが固有のヘッダまたはデータ・ストリームを有するいくつかのパックが存在する。各パック内に、いくつかのパケットが存在することが可能であり、やはり、それぞれが、そのデータ型および構成を識別するヘッダ・セクションを有している。パックとパケットは同一ではないことに留意されたい。パックは、１つまたは複数のパケットを含み得る。一般的に言って、いくつかの型のパケットが存在する。これは、オーディオ、ビデオ、ＯＧＴ、および埋込みである。ほとんどの場合、パックは、単一のパケットを含む。ストリームの終りでなど、いくつかの場合では、パックは、オーディオ・パケットおよび埋込みパケットを含み得る。
【００２２】
パックは、再生順序にしたがってトラック上に配置され、そこで、そのパックは、そのパケット型に関わらず、インターリーブされる。各パケットは、識別番号によって識別され、またすべてのパックは、プレーヤによって順次に読み取られるので、プレーヤは、パックのすべてをそのパケット識別番号およびそれぞれのシーケンス（またはタイミング）番号に従って容易に再アセンブルすることができる。
【００２３】
前述の本発明の好ましい実施形態は、範囲を変更して、オーディオ・パケットおよびＯＧＴパケットに対するストリーム識別番号、より多くのチャネル（またはストリーム）を識別すること、したがって再生することを可能にする。従来技術では、オーディオ・パケット識別番号は、４つの値に制限され、したがって、オーディオ・チャネルの数が４チャネルに制限されている。前述のとおり、オーディオ・パケットに対する識別番号は、変更されており、メディア・プレーヤの速度をオーディオ・ビットレートで割ったものまでを可能にしている。ＣＤプレーヤの場合、コンパクト・ディスクからプレーヤに読み取ることのできるデータの最大量は、プレーヤの速度によって制限されている。２速式（２×）ＣＤプレーヤの場合、ビットレートは、毎秒２，７２４，０００ビットである。毎秒２２４，０００ビットのオーディオ・ビットレートで割ると、２速式ＣＤプレーヤは、理論上は、１トラック内で２個のオーディオ・ストリームまで再生することができる。より高速のＣＤプレーヤの場合、より多くのオーディオ・ストリームをトラック内に配置して、再生することができる。
【００２４】
理論上では、前記計算は、２速式プレーヤの任意の速度に対して再生可能なオーディオ・ストリームの最大数に対して正しい。これは、ユーザが、１２個のオーディオ・ストリームのうちのどの１つでも選択して、そのオーディオ・ストリームを全く問題なく再生させることができることを意味する。実証的には、本発明の態様によって提供される追加の技法なしでは、いくつかのオーディオ・ストリームに関するデータが、再生に間に合うようにプレーヤによって読み取られ得ないアンダーフローの問題を被ることなしに、２速式ＣＤプレーヤに対する任意のトラック内で、６個または７個を越えるオーディオ・ストリームを再生することは不可能である。このアンダーフロー問題の理由は、読取られたデータと再生されるデータの間のタイミングに関する。
【００２５】
この問題を解決するため、本発明の好ましい実施形態の好ましい実施形態は、パケットのＰＴＳ値をさらに変更する。任意のパケットのＰＴＳ値は、特定パケットのプレイタイムの開始を示す。例えば（概念上でのみ）、１分間長のサン（ｓｏｎ）を含んだオーディオ・ストリームは、６つのパケットに分けて順序付けすることができる。第１パケットは、第０番秒のＰＴＳ時間を有し、第２パケットは、第１０番秒のＰＴＳ時間を有し、第３パケットは、第２０番秒のＰＴＳ時間を有することになり、以下同様である。最後のパケットは、第５０番秒のＰＴＳ時間を有する。
【００２６】
従来技術方法の下では、１２個のオーディオ・ストリーム・トラックを再生する上で、１２個すべてのオーディオ・ストリームが、時間０で再生されなければならない。先に説明したとおり、実際には、オーディオ・ストリームのすべてが時間０で開始する従来技術方法を使用することは、アンダーフロー問題を引き起こすことになる。
【００２７】
本発明の好ましい実施形態では、図１を参照すると、各オーディオ・ストリームの第１パケットに対するＰＴＳ時間を所定量の時間だけスタッガにする方法が考案されている。図示するとおり、各歌は、ある単位の時間だけスタッガにされている。この状況では、第１２番目の歌が選択されて再生された場合、その歌の再生の前に、沈黙時間（１１時間単位に等しい）が存在することになる。好ましい実施形態のこの時間単位は、パックを読み取るのに必要な時間によって決定される。２速式ＣＤプレーヤの場合は、毎秒１５０パックを読み取ることができ、これは１／１５０秒に等しい。すべてのパックは、同一サイズであり、またパックは、トラックから読み取られるべきデータの最小論理単位であることに留意されたい。
【００２８】
好ましい実施形態では、図２ａが、データをトラック上に配置するとき、静止ピクチャおよび１２個のオーディオ・ストリームを有するトラック上へのパックの配置を図示している。ビデオ・パック（記号「Ｖ」によって示される）のすべては、事前ロードすることができ、オーディオ・ストリーム１から１２（Ａ_１からＡ_１２）がそれに続き、その後、それが繰り返すことに留意されたい。また、最大量のオーディオ・ストリームを符号化するとき、そのオーディオ・ストリームは、図示するとおり、順次、符号化しなければならないことにも留意されたい。１２個すべてのオーディオ・ストリームを符号化しない場合、オーディオ・パケットの順序に、より大きな許容範囲があり得る。図２ｂは、静止ピクチャと、１２個のオーディオ・ストリームと、ＯＧＴストリームとがインターリーブされているトラック上へのパックの配置を図示している。この場合も、静止ピクチャが、オーディオ・ストリームのすべてに関して使用されるので、ビデオ情報は、どのオーディオ情報のロードにも先立って事前ロードすることができる。図２ｃは、ビデオ・クリップを符号化しているビデオ・ストリームと、６個のオーディオ・ストリームと、ＯＧＴストリームとがインターリーブされているトラック上へのパックの配置を図示している。ビデオ・クリップが存在するので、トラック上に符号化するよりも、オーディオ情報の量は縮減される。この例では、６個のオーディオ・ストリームを図示している。
【００２９】
さらに多くの歌をトラック上に与えるため、本発明は、追加の歌をトラック上に符号化するための方法を提供する。前述の好ましいい実施形態は、２速式ＣＤプレーヤのための１２チャネル（またはストリーム）符号化方法を図示しており、これにより、少なくとも１２の歌が、２速式ＣＤプレーヤ上での再生のために符号化できるようにしている。追加の歌を単一トラック上に組み込むために、１つの歌がもう１つの歌に同一オーディオ・ストリーム内で続くような方式で、２つまたはそれより多くの歌を配列することができる。この方式では、複数の歌がオーディオ・ストリーム内に存在することが可能であり、媒体への歌の数に対する制限は、媒体の論理構造によって（従来技術方法の場合のように）ではなく、その媒体の容量によってのみ制限されている。同一オーディオ・ストリーム内の他の歌にアクセスするため、読取りヘッドをトラック内の目標とする歌の開始位置に誘導するコマンドが提供される。好ましい実施形態では、再生コマンドが提供される。これは、特定のオーディオ・ストリーム、特定のＯＧＴストリーム、および特定のオーディオ・ストリーム内の開始と終了の時間を識別する引き数を有する。付録ＡおよびＢが、それに関するさらなる詳細を提供する。
【００３０】
本発明の一環として、スクリプト言語もまた提供され、より高く、より動的な対話機能性を可能にし、これは、特に、前述のファイル形式を有するファイルとともに使用するとき、そうである。これは、単純な数学計算の実行、単純な文字オペレーション、グラフィックスの描画、イメージ・ファイルおよび前記ファイルの表示、またはサウンド・クリップおよびビデオ・クリップの再生を可能にする。より詳細には、この言語は、ｅｑｕａｌ　ｔｏ（に等しい）、ｌｅｓｓ　ｔｈａｎ（より小さい）、ｇｒｅａｔｅｒ　ｔｈａｎ（より大きい）、ｌｅｓｓ　ｔｈａ　ｏｒ　ｅｑｕａｌ　ｔｏ（より小さい又はに等しい）、ｇｒｅａｔｅｒ　ｔｈａｎ　ｏｒ　ｅｑｕａｌ　ｔｏ（より大きい又はに等しい）、及びｎｏｔ　ｅｑｕａｌ　ｔｏ（に等しくない）などの関係演算子を提供する。これは、符号のラインごとの複数のステートメント、条件ステートメント（ｉｆ、ｔｈｅｎ、ｅｌｓｅ）、ｇｏｔｏステートメント、ｇｏｓｕｂ　ａｎｄ　ｒｅｔｕｒｎステートメント、ｆｏｒ　ｌｏｏｐステートメント、およびｅｎｄステートメントを可能にする。これは、また、スクリーン・クリア、スクリーン上の特定位置でのカーソルのドロー、線のドロー、長方形（フレームまた埋められた）の作図、またはスクリーン上のイメージなど、いくつかのグラフィック機能を提供する。また、マルチメディア機能も提供され、これには、サウンド・クリップ、ビデオ・クリップ、特定アイテム、特定ロケーション、またはリストの再生が含まれる。いくつかのシングアロング機能も提供され、これには、特定選択の再生リストへの設定、再生リスト内のある位置への選択の挿入、再生リストからの選択の削除、数のランダム方式の生成、クリップの再生が含まれる。また、システム・クロック時間および受信コール／受信ストア赤外線キーを得るコール・タイマ機能も存在する。本明細書に添付した付録ＡおよびＢが、本発明のスクリプト言語に関するさらなる詳細を提供する。
【００３１】
本発明の別の態様では、オーバーレイ・グラフィックスおよびテキスト（ＯＧＴ）を表示するための方法が開示される。オーバーレイ・グラフィックスおよびテキストは、プログラム・タイトル・グラフィックスおよび言語テキストの表示のために特別に設計されている。本好ましい方法は、フルスクリーン、マルチカラー、およびオーバーレイ表示をサポートし、また、音声テキストおよび他の静止グラフィカル・イメージの挿入を可能にする。その利点は、グラフィックスおよびテキストが、ビデオ符号化の前に必要とされず、代わりに、それらが、復号化時にマージされることである。この柔軟性は、言語選択を可能にして、ビデオ品質を保つ。オーバーレイ・グラフィックスおよびテキストのデータは、ＯＧＴ特別データ・ストリームに圧縮される。ＯＧＴデータ・ストリームは、ストリーム識別フィールド（＄ＢＤ）内で識別される「専用ストリーム」として符号化される。そのパック構造は、ビデオ・パック（ＰＳ）およびパケット（ＰＥＳ）と同じであり、追加のＯＧＴヘッダが、パケットのデータの始めで、サブストリームの識別（４を越える、例えば、０．．２３）を示している。それに応じて、サブストリームのうちの１つを選択することができる。
【００３２】
〔付録Ａ〕
〔ＣＨＭデータ形式仕様〕
すべての２バイトおよび４バイトの整数は、大きなエンディアン形式（ｅｎｄｉａｎｆｏｒｍａｔ）で表わされる。
すべてのテキスト・ストリングは、空文字（＼０）で終了し、４バイトの境界になるように埋め込まれる。
この仕様は、ｗｗｗ．ｗ３．ｏｒｇで入手可能な、［ＲＦＣ１８０８］「Ｒｅｌａｔｉｖｅ　Ｕｎｉｆｏｒｍ　Ｒｅｓｏｕｃｅ　Ｌｏｃａｔｏｒｓ」（Ｒ．Ｆｉｅｌｄｉｎｇ，１９９５年６月）で定義されている、用語ＵＲＬ（Ｕｎｉｆｏｒｍ　Ｒｅｓｏｕｒｃｅ　Ｌｏｃａｔｏｒｓ）を使用する。
この仕様は、Ｅｒｒｏｒ！　Ｂｏｏｋｍａｒｋ　ｎｏｔ　ｄｅｆｉｎｅｄで入手可能な、［ＲＥＣ−ｈｔｍｌ４０］で定義されている、用語ＨＴＭＬを使用する。
この仕様は、ｈｔｔｐ：／／ｗｗｗ．ｍｉｃｒｏｓｏｆｔ．ｃｏｍ／で入手可能な、「Ｍｉｃｒｏｓｏｆｔ　Ｗｉｎｄｏｗｓ　Ｍｕｌｔｉｍｅｄｉａ　Ｐｒｏｇｒａｍｍｅｒ’ｓ　Ｇｕｉｄｅ」で定義されている、用語ＷＡＶを使用する。
〔Ａ．１　ＣＨＭファイル形式（フォーマット）〕
ＣＨＭファイルは、次の一般的な形式を有する。識別テキスト・ストリングの次に、ヘッダ、複数のデータ・パケットが続き、プライベート・データ・パケットと終了テキスト・ストリングで終了する。
＜ＣＯＭＰＨＴＭＬ＞
ＣＨＭヘッダ
ＣＨＭデータ・パケット
ＣＨＭデータ・パケット
．．．．．．．．
ＣＨＭデータ・パケット
未知のデータ・パッケージ
＜／ＣＯＭＰＨＴＭＬ＞
【表１】

＜ＣＯＭＰＨＴＭＬ＞
．．．．ＣＨＭファイル識別ストリング
ＣＨＭヘッダ
．．．．大域情報を含むＣＨＭヘッダ
ＣＨＭデータ・パケット
．．．．ＣＨＭ構造を構成するデータのパケット
プライベート・データ
．．．．オーサリング・ツールの使用のために予約されたデータ・エリア
＜／ＣＯＭＰＨＴＭＬ＞
．．．．ＣＨＭファイルの終了の信号を送るテキスト・ストリング
〔Ａ．１．１　ＣＨＭヘッダ〕
ヘッダは、オーサリング・ツール、その作成者、およびソース・マテリアルに関する情報を含む。ブラウザは、パレット情報を使用して、ＣＨＭファイルで使用される新しいパレットを構築する。
【表２】

オーサリング・ツール情報
．．．．ＣＨＭファイルを作成するために使用されるツールを記述するテキスト・ストリング
バージョン情報
．．．．ＣＨＭファイルのバージョン
タイトル名
．．．．ＣＨＭページのタイトル
ユーザ名
．．．．ＣＨＭファイルの作成者
スクリーン幅
．．．．ソース・マテリアルの幅のピクセル数
スクリーンの高さ
．．．．ソース・マテリアルの高さのピクセル数
バックグラウンド色
．．．．このＣＨＭページにバックグラウンドとして使用されるパレットへのインデックス
パレット・サイズ
．．．．ＣＨＭファイルで使用されるパレットのサイズ
プライベート・データ・バイト
．．．．ページの作成者／内容所有者によって使用されるべきデータ
パレット・データ
．．．．パレットの＜ｐａｌｅｔｔｅ　ｓｉｚｅ＞ＹＵＶ項目
〔Ａ．１．２　ＣＨＭデータ・パケット〕
データ・パケットは、ブラウザがディスク上で情報を移動するために使用するデータを含む。ＣＨＭデータ・パケットは、次の形式を有する。ヘッダの次にデータが続く。ヘッダは、ヘッダに続くパケットのタイプとサイズを含む。
【表３】

データ・タイプ
．．．．ＣＨＭデータ・パケットのタイプ。タイプは、ＣＨＭデータ・タイプ・テーブルに列挙される。
サイズ
．．．．ヘッダを除くデータ・パケットのサイズのバイト数
チェックサム
．．．．指定されたアルゴリズムによるチェックサム
データ・パケット
．．．．ＣＨＭパケットのデータ
【表４】

〔Ａ．１．２．１　本明細書で使用するデータ・タイプ〕
【表５】

【表６】

【表７】

【表８】

【表９】

〔Ａ．１．２．２　データ・タイプ・パケット・タイプ〕
〔Ａ．１．２．２．１　ＣＨＭページのタイトル（タイプ０）〕
ＣＨＭページのタイトルを与える。これは、ブックマーク機能によって使用される。現在、英語テキストだけが認められている。
【表１０】

長さ
．．．．空文字を除くタイトル長さのバイト数
テキスト
．．．．タイトルのテキスト・ストリング
〔Ａ．１．２．２．２英語テキスト・オブジェクト（タイプ１）〕
【表１１】

座標
．．．．テキスト・ストリングの左下座標
アンカーＩＤ
．．．．アンカーのＩＤ
フォント・タイプ
．．．．テキストに使用されるフォント・スタイル
テキスト色
．．．．テキストの色
テキスト長さ
．．．．空文字を除くテキスト文字ストリングの長さ
テキスト
．．．．（ｘ，ｙ）座標で描写されるテキスト・ストリング
サポートされる文字：（スペースから〜まで）
ｓｐａｃｅ　！“＃＄％＆‘（）＊＋−．／０１２３４５６７８９：；，＜＝＞？＠ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ＼］＾＿‘ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ｛｜｝〜
〔Ａ．１．２．２．３　イメージ・オブジェクト（タイプ３）〕
【表１２】

座標
．．．．イメージの左上座標
幅および高さ
．．．．イメージの幅および高さ
アンカーＩＤ
．．．．アンカーのＩＤ
マップ名
．．．．クライアント側のイメージ・マップ名
色
．．．．テキストの色
イメージ・ファイル名長さ
．．．．空文字を除くイメージ・ファイル名の文字数
イメージ・ファイル名
．．．．イメージ・ファイルの名
〔Ａ．１．２．２．４長方形アウトライン・オブジェクト（タイプ４）〕
【表１３】

長方形座標
．．．．長方形アウトラインの（左上）および（右下）座標
色
．．．．アウトラインの色
〔Ａ．１．２．２．５　充填された長方形オブジェクト（タイプ５）〕
【表１４】

長方形座標
．．．．長方形アウトラインの（左上）および（右下）座標
色
．．．．アウトラインの色
〔Ａ．１．２．２．６　ライン・オブジェクト（タイプ８）〕
【表１５】

座標１
．．．．ラインの開始点
座標２
．．．．ラインの終了点
スタイル
．．．．ライン（未使用）の作図スタイル
色
．．．．ラインの色
〔Ａ．１．２．２．７　アンカー・オブジェクト（タイプ９）〕
【表１６】

アンカーＩＤ
．．．．アンカーのＩＤ
オーバーレイ
．．．．オーバーレイ値が「１」の場合、バックグラウンド・イメージは保存され、現在のイメージの上に新しいイメージが描画される。
イベント・ハンドラ
．．．．イベント・ハンドラ・データ構造を参照のこと。
オペレーション数
．．．．オペレーション・フィールドに存在するＣＨＭオペレーションの数
オペレーション
．．．．アンカー選択後に実行されるオペレーション
ＵＲＬ長さ
．．．．空文字を除いたＵＲＬストリングの長さ
ＵＲＬ
．．．．ＵＲＬストリング
〔Ａ．１．２．２．８　イメージ・マップ・エリア・オブジェクト（タイプ１１）〕
【表１７】

イメージ・マップ名
．．．．イメージ・マップの名前
オーバーレイ
．．．．オーバーレイ値が「１」の場合、バックグラウンド・イメージは保存され、バックグラウンド・イメージの上に新しいイメージが描画される。
イベント・ハンドラ
．．．．イベント・ハンドラ・データ構造を参照のこと。
オペレーション数
．．．．オペレーション・フィールドに存在するＣＨＭオペレーションの数
オペレーション
．．．．マップ選択後に実行されるオペレーション
ＵＲＬ長さ
．．．．空文字を除いたＵＲＬストリングの長さ
ＵＲＬ
．．．．ＵＲＬストリング
〔Ａ．１．２．２．９　リフレッシュ・オブジェクト（タイプ１３）〕
【表１８】

遅延
．．．．オペレーション実行までの秒数
オペレーション数
．．．．オペレーション・フィールドに存在するＣＨＭオペレーションの数
オペレーション
．．．．マップ選択後に実行されるオペレーション
ＵＲＬ長さ１
．．．．空文字を除く、オリジナルＵＲＬ名を参照する第１のＵＲＬストリングの長さ
ＵＲＬ長さ２
．．．．空文字を除く、オーバーレイＵＲＬ名を参照する第２のＵＲＬストリングの長さ
ＵＲＬ
．．．．空文字で終了するＵＲＬストリング
〔Ａ．１．２．２．１０　変数オブジェクト（タイプ１４）〕
【表１９】

座標
．．．．変数値を印刷するテキスト・ストリングの（左下）座標
変数値
．．．．値が表示される変数の名前
〔Ａ．１．２．２．１１　バックグラウンド・サウンド・オブジェクト（タイプ１５）〕
【表２０】

ループ数
．．．．クリップが再生されるべきループの数（−１は継続を示す）
サウンド・ファイル名長さ
．．．．サウンド・ファイル名の長さ
サウンド・ファイル名
．．．．再生されるべきＷＡＶファイルの長さ
〔Ａ．１．２．２．１２　スクリプト・オブジェクト（タイプ１６）〕
【表２１】

言語ＩＤ
．．．．スクリプト言語ＩＤ。有効ＩＤは、「ＶＣＤＳＣＲＩＰＴ」である。
スクリプト長さ
．．．．スクリプトの長さのバイト数
スクリプト
．．．．実行、または構文解析されるべきスクリプト
〔Ａ．１．２．２．１３　バックグラウンド・イメージ・オブジェクト（タイプ１７）〕
【表２２】

イメージ・ファイル名長さ
．．．．イメージ・ファイル名の長さ
イメージ・ファイル名
．．．．ロードされるべきイメージ・ファイルの名前。バックグラウンド・イメージは、様々なテキスト・スクリーンを描画するときに共有されるべきファイルである。
【００３３】
【表２３】

【００３４】
本発明を具体的な実施形態から述べたが、当業者には本発明の改変および修正が疑いなく明らかとなることが予期される。したがって、後続の特許請求の範囲は、本発明の真の主旨および範囲の内に含まれる、そうしたあらゆる改変および修正をカバーすると解釈されるものとする。
【図面の簡単な説明】
【図１】
本発明の好ましい実施形態で、オーディオ・ストリームを符号化する上での時間遅延を図示する図である。
【図２ａ】
好ましい実施形態の様々な形式に対する符号化を図示する図である。
【図２ｂ】
好ましい実施形態の様々な形式に対する符号化を図示する図である。
【図２ｃ】
好ましい実施形態の様々な形式に対する符号化を図示する図である。

[0001]
(Cross-reference)
The present invention incorporates, by reference, "Super VCD System Specification" and "VCD 3.0 Specification", dated September 22, 1998, provided by Enreach Technology, a California company.
[0002]
(Field of the Invention)
The present invention relates generally to a method for encoding and playing audio and video information, and more particularly to a method for encoding and playing multimedia information that combines audio, text, and video streams.
[0003]
(Background of the Invention)
In conventional methods of encoding and formatting audio and video streams onto a physical medium (such as a compact disc), one unique video stream is combined with one or more audio streams. This is generally true for any video-audio combination. For example, in a music video encoded on a video compact disc ("VCD") or a digital versatile disc ("DVD"), a particular music video stream is encoded together with a particular audio stream. During playback, the video stream is played back in synchronization with the audio stream.
[0004]
In specialized applications, such as sing-along (or karaoke) applications, the conventional method of encoding video and audio information for a song is to combine the video stream with one or two audio streams and an overlay graphics and/or text ("OGT") stream that overlays the video stream, all on one track. Thus, in general, there is one song per track. If there are two audio streams, typically one audio stream contains both the singer's voice and the underlying music, and the second audio stream contains only the underlying music. The OGT information typically contains the words in the selected language, and it must be synchronized with the video and audio streams.
[0005]
This conventional method is excellent for most applications, but it severely limits the number of songs (each having one unique video stream and one or more unique audio streams) that can be placed on a compact disc. This format limits the number of songs on an industry standard VCD to 12 to 18 songs, depending on the particular format used. Note that VCD typically uses the MPEG1 standard when encoding video and audio streams. When encoded according to the MPEG2 standard, the number of songs on an industry standard DVD is limited to 25 to 30 songs. Normally, it is not possible to place more songs on these discs. Thus, large song collections require a large number of discs, which must be stored and handled, adding to the cost of equipment and storage. This is a significant problem in commercial applications (e.g., singing lounges).
[0006]
Therefore, it is desirable to have a way to store a large number of songs together with video information on a compact disc. This method should maximize compatibility with existing technologies whenever possible.
[0007]
(Summary of the Invention)
Accordingly, it is an object of the present invention to provide a method for encoding and playing one or more still pictures with multiple audio streams.
[0008]
It is another object of the present invention to provide a method for encoding and playing one or more still pictures, along with a plurality of audio streams and OGT information synchronized to each audio stream.
[0009]
It is another object of the present invention to provide a method for encoding and playing one video stream, with multiple audio streams and OGT information synchronized to each audio stream.
[0010]
Another object of the present invention is to provide a method for encoding and playing a still picture or video stream and a plurality of audio streams for sing-along purposes.
[0011]
Briefly, in one presently preferred embodiment of the present invention, a method is provided that allows one or more still pictures or video streams to be encoded with multiple sound streams and overlay graphics and/or text information, and allows the encoded result to be played back, so that a large amount of audio information (e.g., songs) can be encoded on a medium (e.g., a compact disc). Specifically, a still picture and a plurality of sound streams are encoded in such a manner that, when a sound stream is played back, the still picture is displayed for all of the audio information (e.g., songs). The OGT information can optionally be encoded, or can be encoded and optionally displayed, to provide subtitles or guidance in a sing-along situation. By minimizing the amount of video data stored on a medium such as a compact disc, a vast amount of audio information (and thus songs) can be stored on a single compact disc. Alternatively, at least one video stream may be encoded with multiple audio streams and optional OGT information. In this way, upon playback, a single video stream is displayed with the various audio streams and corresponding OGT information. This method minimizes the amount of video information that needs to be stored on the storage medium, and thus allows a large amount of audio information to be stored on media such as compact discs. It should be noted that the present invention is not limited to a specific storage medium. The storage medium may be a VCD-formatted physical disc, a DVD-formatted physical disc, or any other physical medium. Using the method of the present invention, more than 50 songs can be encoded on a VCD-formatted disc.
[0012]
An advantage of the present invention is that a method is provided for encoding and playing one or more still pictures with multiple audio streams.
[0013]
Another advantage of the present invention is that a method is provided for encoding and playing one or more still pictures, along with a plurality of audio streams and graphics and text synchronized to each audio stream.
[0014]
Another advantage of the present invention is that a method is provided for encoding and playing one video stream, with multiple audio streams and graphics and text synchronized to each audio stream.
[0015]
Another advantage of the present invention is that a method is provided for encoding and playing a still picture or video stream and multiple audio streams for sing-along purposes.
[0016]
These and other features and advantages of the present invention will be better understood upon consideration of the figures and reading of the following detailed description of the invention.
[0017]
(Detailed description of preferred embodiments)
The presently preferred method of the present invention discloses a method for encoding a still picture with multiple audio channels, a method for encoding a still picture with multiple audio channels and multiple OGT information, and a method for encoding a video stream with multiple audio channels and multiple OGT substreams. The encoding of video streams and still pictures is performed according to ISO/IEC 13818 (MPEG-2), and the encoding of audio streams is performed according to MPEG-1 or MPEG-2 Layer II.
[0018]
When one or more still pictures are encoded with multiple audio channels, the system stream contains one or more normal-resolution or high-resolution still pictures encoded as an MPEG video stream, and one or more audio channel substreams. A specific method for compressing the still picture is described below. The audio bit rate is 224 kbit/s. Using a 2× CD loader, 12 audio streams can be encoded into one track; the stream_id field in the audio packet header can take values in the range $C0 to $CB, thereby identifying the 12 audio streams. The total bit rate is the audio bit rate (224,000 bits/s) multiplied by the 12 audio streams, giving 2,688,000 bits per second, which is within the bit rate of a double-speed CD player. All sound substreams are interleaved to maintain a constant CD rotation speed, which improves disc and player playback performance.
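The stream numbering and the bit-rate arithmetic in this paragraph can be checked with a short sketch. The following Python fragment is illustrative only; the function and constant names are hypothetical and do not come from the specification, while the values ($C0 to $CB, 224,000 bits/s) are those stated above.

```python
# Illustrative sketch; names are hypothetical, values are taken from the text.
AUDIO_BITRATE = 224_000        # bits/s per audio stream
FIRST_AUDIO_STREAM_ID = 0xC0   # $C0
LAST_AUDIO_STREAM_ID = 0xCB    # $CB


def audio_stream_ids():
    """Return every stream_id value in the range $C0..$CB, inclusive."""
    return list(range(FIRST_AUDIO_STREAM_ID, LAST_AUDIO_STREAM_ID + 1))


def total_audio_bitrate(num_streams, per_stream=AUDIO_BITRATE):
    """Aggregate bit rate of num_streams interleaved audio streams."""
    return num_streams * per_stream


ids = audio_stream_ids()
print(len(ids))                        # number of distinct audio streams
print(total_audio_bitrate(len(ids)))   # aggregate audio bit rate in bits/s
```

The range $C0 to $CB contains exactly 12 values, which is why 12 streams can be distinguished by stream_id alone.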
[0019]
When one or more still pictures are encoded with multiple audio channels and multiple OGT streams, the system stream contains one or more normal-resolution or high-resolution still pictures encoded as an MPEG video stream, one or more audio channels, and one or more OGT substreams. A specific method for compressing the still picture is described below. As in the previous format, the audio bit rate is 224 kbit/s. Twelve audio streams can be encoded on the track, and the stream_id field in the audio packet header can range from $C0 to $CB. The total audio bit rate is the audio bit rate (224 kbit/s) multiplied by the 12 audio streams, giving 2,688,000 bits per second, which is within the capability of a double-speed CD player. All sound substreams are interleaved to maintain a constant CD rotation speed, which improves disc and player playback performance. The OGT information is encoded as a single stream having multiple substreams corresponding to the audio streams. The sub_stream_id can range from 0 to $FC (or larger), providing a range of 24 (or more) substreams. Every OGT packet carries its sub_stream_id in the first byte of its data (offset $21), and every OGT page must have a SYNC word at the end of all its data; this SYNC word is specified as $04080C10. The total bit rate available for OGT is the total CD bit rate minus the total audio substream bit rate: 2296 × 75 × 2 × 8 = 2,755,200 bits/s, less 2,688,000 bits/s, which leaves 67,200 bits/s.
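The OGT bandwidth calculation and the SYNC-word rule can be sketched as follows. This is a minimal illustration under stated assumptions: the figure 2296 is read here as bytes per sector and 75 as sectors per second at single speed, and the SYNC word is assumed to be stored in the byte order in which it is written. All names are hypothetical.

```python
# Illustrative sketch; names and the unit interpretation are assumptions.
BYTES_PER_SECTOR = 2296      # assumption: the "2296" of the text, as bytes/sector
SECTORS_PER_SECOND = 75      # single-speed CD sector rate
BITS_PER_BYTE = 8
OGT_SYNC_WORD = bytes.fromhex("04080C10")  # assumption: stored as written


def cd_bitrate(speed=2):
    """Total channel bit rate for a CD player of the given speed factor."""
    return BYTES_PER_SECTOR * SECTORS_PER_SECOND * speed * BITS_PER_BYTE


def ogt_bitrate(num_audio=12, audio_bitrate=224_000, speed=2):
    """Bit rate left over for OGT after the audio substreams are carried."""
    return cd_bitrate(speed) - num_audio * audio_bitrate


def page_ends_with_sync(page_data: bytes) -> bool:
    """Check that an OGT page terminates with the SYNC word."""
    return page_data.endswith(OGT_SYNC_WORD)


print(cd_bitrate())    # total bits/s at double speed
print(ogt_bitrate())   # bits/s remaining for OGT
```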
[0020]
When one or more video streams are encoded with multiple audio and OGT streams, the video stream encoding bit rate depends on the number of audio channels. The more audio streams (or channels) there are, the lower the bit rate required for the video stream. The typical video bit rate of the system of the present invention can be 1,200,000 bits/s, which allows six audio streams (224,000 × 6 bits/s) on a single track with a single video stream. In this case, the total bit rate is 1,200,000 + 224,000 × 6 = 2,544,000 bits/s. The remaining bandwidth can be used for OGT streams or padding. Other bit rate allocations are also possible. At higher video bit rates, fewer audio channels can be packed into a single track.
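The trade-off between video bit rate and the number of audio streams can be illustrated with a small budget calculation. This sketch assumes the double-speed channel rate of 2,755,200 bits/s computed in the preceding paragraph; the function names are hypothetical.

```python
# Illustrative sketch; names are hypothetical, the channel rate is an assumption
# carried over from the preceding paragraph's calculation.
CHANNEL_BITRATE = 2_755_200   # bits/s, double-speed CD
AUDIO_BITRATE = 224_000       # bits/s per audio stream


def max_audio_streams(video_bitrate, channel=CHANNEL_BITRATE, audio=AUDIO_BITRATE):
    """Largest whole number of audio streams that fits beside the video stream."""
    return max(0, (channel - video_bitrate) // audio)


def total_bitrate(video_bitrate, num_audio, audio=AUDIO_BITRATE):
    """Combined video plus audio bit rate."""
    return video_bitrate + num_audio * audio


n = max_audio_streams(1_200_000)
print(n, total_bitrate(1_200_000, n))
print(CHANNEL_BITRATE - total_bitrate(1_200_000, n))  # left for OGT or padding
```

With a still picture instead of a video clip (video bit rate close to zero during playback), the same function yields 12 audio streams.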
[0021]
A disc is generally divided into several tracks, as specified by the programmer of the disc. Within any track there are a number of packs, each with its own header or data stream. Within each pack there can be several packets, each again having a header section that identifies its data type and structure. Note that packs and packets are not the same: a pack may contain one or more packets. Generally speaking, there are several types of packets: audio, video, OGT, and padding. In most cases, a pack contains a single packet. In some cases, such as at the end of a stream, a pack may contain an audio packet and a padding packet.
[0022]
The packs are arranged on tracks according to the playback order, where the packs are interleaved regardless of their packet type. Since each packet is identified by an identification number and all packs are read sequentially by the player, the player can easily reassemble all of the packs according to their packet identification numbers and their respective sequence (or timing) numbers. be able to.
[0023]
The preferred embodiment of the invention described above changes the range of stream identification numbers for audio packets and OGT packets, allowing more channels (or streams) to be identified and thus played. In the prior art, the audio packet identification number is limited to four values, which limits the number of audio channels to four. As mentioned above, the identification numbers for audio packets have been changed to allow up to as many streams as the media player speed divided by the audio bit rate. In the case of a CD player, the maximum amount of data that can be read from a compact disc by the player is limited by the speed of the player. For a double-speed (2×) CD player, the bit rate is 2,724,000 bits per second. Dividing by an audio bit rate of 224,000 bits per second, a double-speed CD player can theoretically play up to 12 audio streams within a track. For faster CD players, more audio streams can be placed in the track and played.
[0024]
In theory, the calculation above gives the correct maximum number of audio streams that can be played at a given player speed. This means that the user can select any one of the 12 audio streams and have that audio stream play without any problems. Empirically, however, without the additional techniques provided by aspects of the present invention, it is not possible to play more than six or seven audio streams in a track on a double-speed CD player without suffering an underflow problem, in which data for some audio streams cannot be read by the player in time for playback. The reason for this underflow problem relates to the timing between the data being read and the data being played.
[0025]
To solve this problem, the preferred embodiment of the present invention further modifies the PTS values of the packets. The PTS value of a packet indicates the start of that packet's play time. For example (conceptually only), an audio stream containing a one-minute song can be divided into six sequential packets. The first packet has a PTS time of second 0, the second packet has a PTS time of second 10, the third packet has a PTS time of second 20, and so on. The last packet has a PTS time of second 50.
[0026]
Under the prior art method, to play a track containing 12 audio streams, all 12 audio streams must start playing at time zero. As explained earlier, in practice, using the prior art method in which all of the audio streams start at time zero causes underflow problems.
[0027]
In a preferred embodiment of the present invention, referring to FIG. 1, a method is devised for staggering the PTS time of the first packet of each audio stream by a predetermined amount of time. As shown, each song is staggered by one unit of time. In this situation, if the twelfth song is selected and played, there is a period of silence (equal to 11 time units) before the song is played. The time unit of the preferred embodiment is determined by the time required to read one pack. A double-speed CD player can read 150 packs per second, so the time unit equals 1/150 second. Note that all packs are the same size, and a pack is the smallest logical unit of data to be read from a track.
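The staggering of the first PTS of each audio stream can be sketched as below. This is an illustration only: streams are indexed from 0, the function names are hypothetical, and the conversion to 90 kHz ticks assumes the usual MPEG system-layer PTS resolution, which the text itself does not state.

```python
# Illustrative sketch; names are hypothetical.
from fractions import Fraction

PACKS_PER_SECOND = 150   # double-speed CD player, from the text
PTS_CLOCK_HZ = 90_000    # assumption: standard MPEG PTS clock resolution


def first_pts_seconds(stream_index):
    """Start offset of stream number stream_index (0-based), in seconds."""
    return Fraction(stream_index, PACKS_PER_SECOND)


def first_pts_ticks(stream_index):
    """Same offset expressed in PTS clock ticks."""
    return int(first_pts_seconds(stream_index) * PTS_CLOCK_HZ)


for i in range(12):
    print(i + 1, float(first_pts_seconds(i)), first_pts_ticks(i))
```

Under these assumptions the twelfth stream (index 11) starts 11/150 of a second after the first, which corresponds to the silence period described above.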
[0028]
In a preferred embodiment, FIG. 2a illustrates the placement of packs on a track having a still picture and 12 audio streams. Note that all of the video packs (indicated by the symbol "V") can be preloaded, followed by audio streams 1 through 12 (A_1 through A_12), after which the sequence repeats. It should also be noted that when encoding the maximum number of audio streams, the audio streams must be encoded sequentially, as shown. If fewer than all 12 audio streams are encoded, there may be greater latitude in the order of the audio packets. FIG. 2b illustrates the placement of packs on a track on which a still picture, 12 audio streams, and OGT streams are interleaved. Again, since the still picture is used for all of the audio streams, the video information can be preloaded prior to loading any audio information. FIG. 2c illustrates the placement of packs on a track on which a video stream encoding a video clip, six audio streams, and OGT streams are interleaved. Because a video clip is present, the amount of audio information that can be encoded on the track is reduced. In this example, six audio streams are illustrated.
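The pack ordering described for FIG. 2a (preloaded video packs followed by repeating rounds of A_1 through A_12) might be generated as in the following sketch. The number of video packs and the number of rounds are hypothetical parameters chosen for illustration, and the function name is not from the specification.

```python
# Illustrative sketch of the FIG. 2a ordering; names and counts are hypothetical.
def pack_layout(num_video_packs, num_audio_streams, rounds):
    """Return pack labels: all video packs first, then repeated audio rounds."""
    layout = ["V"] * num_video_packs
    for _ in range(rounds):
        # Within each round the audio streams appear strictly in sequence.
        layout.extend(f"A_{n}" for n in range(1, num_audio_streams + 1))
    return layout


print(pack_layout(num_video_packs=2, num_audio_streams=12, rounds=2))
```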
[0029]
To provide more songs on a track, the present invention provides a method for encoding additional songs on a track. The preferred embodiment described above illustrates a 12-channel (or 12-stream) encoding method for a two-speed CD player, so that at least 12 songs can be played on a two-speed CD player. To incorporate additional songs on a single track, two or more songs can be arranged so that one song follows another in the same audio stream. In this manner, multiple songs can be present in an audio stream, and the number of songs on the medium is limited not by the logical structure of the medium (as in the prior art method) but only by the capacity of the medium. To access other songs in the same audio stream, a command is provided to direct the read head to the start position of the target song in the track. In a preferred embodiment, a play command is provided whose arguments identify the particular audio stream, the particular OGT stream, and the start and end times within the particular audio stream. Appendices A and B provide further details thereon.
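The arguments of such a play command can be pictured with the following sketch. The actual command syntax is defined in the appendices and is not reproduced here, so the `PlayCommand` type, the `SONG_TABLE` contents, and the song identifiers are purely hypothetical illustrations of several songs sharing one audio stream.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PlayCommand:
    audio_stream: int     # which audio stream to decode
    ogt_stream: int       # which OGT stream to overlay
    start_seconds: float  # start time within the audio stream
    end_seconds: float    # end time within the audio stream


# Hypothetical song table: song id -> (audio stream, start time, end time).
SONG_TABLE: Dict[str, Tuple[int, float, float]] = {
    "song-1": (1, 0.0, 180.0),
    "song-13": (1, 180.0, 400.0),  # a second song in the same audio stream
}


def play_command_for(song_id: str, ogt_stream: int) -> PlayCommand:
    """Look up a song and build the command that would play only that song."""
    audio_stream, start, end = SONG_TABLE[song_id]
    if end <= start:
        raise ValueError("end time must follow start time")
    return PlayCommand(audio_stream, ogt_stream, start, end)
```

In this sketch, selecting `song-13` yields a command for audio stream 1 that starts where the first song ends, which is how more songs than streams could be addressed.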
[0030]
As part of the present invention, a scripting language is also provided, enabling higher and more dynamic interactivity, especially when used with files having the aforementioned file formats. It allows for performing simple mathematical calculations, simple character string operations, drawing graphics, displaying image files and said files, and playing sound and video clips. More specifically, the language includes relational operators such as equal to, less than, greater than, less than or equal to, greater than or equal to, and not equal to. It allows multiple statements per line of code, conditional statements (if, then, else), goto statements, gosub and return statements, for loop statements, and end statements. It also provides some graphical features, such as clearing the screen, drawing a cursor at a specific location on the screen, drawing a line, drawing a rectangle (framed or filled), or drawing an image on the screen. Also provided are multimedia features, including the playback of sound clips, video clips, specific items, specific locations, or lists. Several sing-along features are also provided, including setting a particular selection into a playlist, inserting a selection at a position in a playlist, deleting a selection from a playlist, generating a random number scheme, and clip playback. There are also function calls for a timer, for obtaining the system clock time, and for receiving and storing infrared key input. Appendices A and B attached hereto provide further details regarding the scripting language of the present invention.
[0031]
In another aspect of the present invention, a method for displaying overlay graphics and text (OGT) is disclosed. Overlay graphics and text are specially designed for displaying program title graphics and language text. The preferred method supports full screen, multi-color, and overlay displays, and also allows the insertion of spoken text and other static graphical images. The advantage is that graphics and text are not required prior to video encoding; instead, they are merged upon decoding. This flexibility allows language selection and preserves video quality. Overlay graphics and text data are compressed into a special OGT data stream. The OGT data stream is encoded as a "dedicated stream" identified in the stream identification field ($BD). The pack structure is the same as for video, pack (PS) and packet (PES), with an additional OGT header at the beginning of the packet's data that identifies the substream (beyond 4, eg 0.23). Accordingly, one of the sub-streams can be selected.
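Selecting one OGT substream on the decoder side might look like the sketch below. The start-code prefix and the stream identification value $BD come from the MPEG system layer and the text respectively; everything else is an assumption made for illustration, in particular that the OGT header is a single byte at the start of the packet data holding the substream number, and that each packet is supplied as a (packet start bytes, packet data) pair.

```python
from typing import Iterable, List, Tuple

PACKET_START_CODE_PREFIX = b"\x00\x00\x01"
PRIVATE_STREAM_ID = 0xBD  # the "$BD" stream identification value from the text


def is_private_stream_packet(packet_start: bytes) -> bool:
    """Check the start-code prefix and the stream identification byte."""
    return (
        len(packet_start) >= 4
        and packet_start[:3] == PACKET_START_CODE_PREFIX
        and packet_start[3] == PRIVATE_STREAM_ID
    )


def ogt_substream(packet_data: bytes) -> int:
    """Assumption: the first byte of the packet data is the substream number."""
    if not packet_data:
        raise ValueError("empty packet data")
    return packet_data[0]


def select_substream(packets: Iterable[Tuple[bytes, bytes]], wanted: int) -> List[bytes]:
    """Keep the payloads (after the assumed 1-byte OGT header) of one substream."""
    selected = []
    for packet_start, packet_data in packets:
        if is_private_stream_packet(packet_start) and ogt_substream(packet_data) == wanted:
            selected.append(packet_data[1:])
    return selected
```

Because the overlay is merged only at decode time, switching the displayed language in this model amounts to changing the `wanted` substream number.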
[0032]
[Appendix A]
[CHM data format specification]
All 2-byte and 4-byte integers are represented in big-endian format.
All text strings end with a null character ($0) and are padded to 4-byte boundaries.
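These two conventions can be sketched with Python's `struct` module, reading integers in big-endian (most significant byte first) order. The helper names are hypothetical, and the sketch assumes that padding is measured from the start of each string field and that strings are ASCII; the text does not say whether padding is relative to the field or to the file offset.

```python
import struct
from typing import Tuple


def read_u16(buf: bytes, offset: int) -> Tuple[int, int]:
    """Read a 2-byte big-endian unsigned integer; return (value, next offset)."""
    (value,) = struct.unpack_from(">H", buf, offset)
    return value, offset + 2


def read_u32(buf: bytes, offset: int) -> Tuple[int, int]:
    """Read a 4-byte big-endian unsigned integer; return (value, next offset)."""
    (value,) = struct.unpack_from(">I", buf, offset)
    return value, offset + 4


def read_padded_string(buf: bytes, offset: int) -> Tuple[str, int]:
    """Read a null-terminated string whose field is padded to a multiple of 4 bytes."""
    end = buf.index(b"\x00", offset)
    text = buf[offset:end].decode("ascii")
    field_len = end - offset + 1           # include the terminator
    padded_len = (field_len + 3) // 4 * 4  # round up to a 4-byte boundary
    return text, offset + padded_len
```

Each helper returns the offset of the next field, so calls can be chained while walking through a buffer.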
This specification uses the term URL (Uniform Resource Locator) as defined in [RFC1808] "Relative Uniform Resource Locators" (R. Fielding, June 1995), available at www.w3.org.
This specification uses the term HTML as defined in [REC-html40].
This specification uses the term WAV as defined in the "Microsoft Windows Multimedia Programmer's Guide", available at http://www.microsoft.com/
[A. 1 CHM file format]
CHM files have the following general format: The identification text string is followed by a header, a plurality of data packets, and ends with a private data packet and an ending text string.
<COMPHTML>
CHM header
CHM data packet
CHM data packet
. . . . . . . .
CHM data packet
Private data packet
</COMPHTML>
[Table 1]

<COMPHTML>
. . . . CHM file identification string
CHM header
. . . . CHM header containing global information
CHM data packet
. . . . Packets of data that make up the CHM structure
Private data
. . . . Data area reserved for authoring tool use
</COMPHTML>
. . . . Text string signaling end of CHM file
[A. 1.1 CHM header]
The header contains information about the authoring tool, its author, and source material. The browser uses the palette information to build a new palette to be used in the CHM file.
[Table 2]

Authoring tool information
. . . . Text string describing the tool used to create the CHM file
version information
. . . . CHM file version
Title name
. . . . CHM page title
User name
. . . . Creator of CHM file
Screen width
. . . . Number of pixels for the width of the source material
Screen height
. . . . Number of pixels for the height of the source material
Background color
. . . . Index to palette used as background for this CHM page
Palette size
. . . . Palette size used in CHM file
Private data bytes
. . . . Data to be used by the creator / content owner of the page
Palette data
. . . . Palette of <palette size> YUV items
[A. 1.2 CHM data packet]
Data packets contain the data that a browser uses to navigate the information on the disc. A CHM data packet consists of a header followed by data. The header contains the type and size of the data that follows it.
[Table 3]

data type
. . . . Type of CHM data packet. The types are listed in the CHM data type table.
size
. . . . Size of the data packet in bytes, excluding the header
Checksum
. . . . Checksum with specified algorithm
Data packet
. . . . CHM packet data
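Walking a sequence of such packets can be sketched as follows. The field widths are not given in the surviving text, so the layout used here (a 2-byte type, a 4-byte size, and a 4-byte checksum, all big-endian and all counted as header) is an assumption, as are the names `HEADER`, `ChmPacket`, and `iter_packets`. The checksum is carried along but not verified, because its algorithm is not described in this section.

```python
import struct
from typing import Iterator, NamedTuple

# Assumed header layout: type (2 bytes), size (4 bytes), checksum (4 bytes).
HEADER = struct.Struct(">HII")


class ChmPacket(NamedTuple):
    data_type: int
    checksum: int
    data: bytes


def iter_packets(buf: bytes) -> Iterator[ChmPacket]:
    """Yield packets from a buffer that holds only back-to-back data packets."""
    offset = 0
    while offset < len(buf):
        if offset + HEADER.size > len(buf):
            raise ValueError("truncated packet header")
        data_type, size, checksum = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if offset + size > len(buf):
            raise ValueError("truncated packet data")
        # The size field excludes the header, so it covers exactly the data.
        yield ChmPacket(data_type, checksum, buf[offset:offset + size])
        offset += size
```

A reader built this way can skip packet types it does not understand, since the size field alone is enough to find the next header.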
[Table 4]

[A. 1.2.1 Data types used in this specification]
[Table 5]

[Table 6]

[Table 7]

[Table 8]

[Table 9]

[A. 1.2.2 Data type packet type]
[A. 1.2.2.1 CHM page title (type 0)]
Gives the title of the CHM page. This is used by the bookmark function. Currently, only English text is allowed.
[Table 10]

length
. . . . Title length in bytes, excluding the null character
text
. . . . Title text string
[A. 1.2.2.2 English Text Object (Type 1)]
[Table 11]

Coordinate
. . . . Lower left coordinate of text string
Anchor ID
. . . . Anchor ID
Font type
. . . . Font style used for text
Text color
. . . . Text color
Text length
. . . . Length of the text string, excluding the null character
text
. . . . Text string rendered at (x, y) coordinates
Supported characters: (space to ~)
space !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
[A. 1.2.2.3 Image object (type 3)]
[Table 12]

Coordinate
. . . . Upper left coordinate of the image
Width and height
. . . . Image width and height
Anchor ID
. . . . Anchor ID
Map name
. . . . Client-side image map name
color
. . . . Text color
Image file name length
. . . . Number of characters in the image file name, excluding the null character
Image file name
. . . . Image file name
[A. 1.2.2.4 Rectangular outline object (type 4)]
[Table 13]

Rectangular coordinates
. . . . (Upper left) and (lower right) coordinates of the rectangular outline
color
. . . . Outline color
[A. 1.2.2.5 Filled rectangular object (type 5)]
[Table 14]

Rectangular coordinates
. . . . (Upper left) and (lower right) coordinates of the rectangular outline
color
. . . . Outline color
[A. 1.2.2.6 Line object (type 8)]
[Table 15]

Coordinate 1
. . . . Starting point of line
Coordinate 2
. . . . End point of line
style
. . . . Line drawing style (unused)
color
. . . . Line color
[A. 1.2.2.7 Anchor object (type 9)]
[Table 16]

Anchor ID
. . . . Anchor ID
overlay
. . . . If the overlay value is "1", the background image is saved and a new image is drawn over the current image.
Event handler
. . . . See event handler data structure.
Number of operations
. . . . Number of CHM operations present in the operation field
operation
. . . . Operations performed after anchor selection
URL length
. . . . Length of URL string, excluding the null character
URL
. . . . URL string
[A. 1.2.2.8 Image map area object (type 11)]
[Table 17]

Image map name
. . . . The name of the image map
overlay
. . . . If the overlay value is "1", the background image is saved and a new image is drawn on top of the background image.
Event handler
. . . . See event handler data structure.
Number of operations
. . . . Number of CHM operations present in the operation field
operation
. . . . Operations performed after map selection
URL length
. . . . Length of URL string, excluding the null character
URL
. . . . URL string
[A. 1.2.2.9 Refresh object (type 13)]
[Table 18]

delay
. . . . Number of seconds until operation execution
Number of operations
. . . . Number of CHM operations present in the operation field
operation
. . . . Operations performed after map selection
URL length 1
. . . . The length of the first URL string that refers to the original URL name, excluding the null character
URL length 2
. . . . The length of the second URL string that refers to the overlay URL name, excluding the null character
URL
. . . . A null-terminated URL string
[A. 1.2.2.10 Variable Object (Type 14)]
[Table 19]

Coordinate
. . . . (Lower left) coordinate of the text string to print the variable value
Variable value
. . . . The name of the variable whose value is displayed
[A. 1.2.2.11 Background sound object (type 15)]
[Table 20]

Number of loops
. . . . Number of loops in which the clip should be played (-1 indicates continuation)
Sound file name length
. . . . Length of sound file name
Sound file name
. . . . Name of WAV file to be played
[A. 1.2.2.12 Script Object (Type 16)]
[Table 21]

Language ID
. . . . Script language ID. The valid ID is “VCDSCRIPT”.
Script length
. . . . Script length in bytes
script
. . . . Script to be executed or parsed
[A. 1.2.2.13 Background Image Object (Type 17)]
[Table 22]

Image file name length
. . . . Image file name length
Image file name
. . . . The name of the image file to be loaded. Background images are files that should be shared when drawing various text screens.
[0033]
[Table 23]

[0034]
Although the invention has been described with reference to specific embodiments, it is anticipated that alterations and modifications of the invention will become apparent to those skilled in the art. It is therefore intended that the following claims be interpreted as covering all such alterations and modifications as fall within the true spirit and scope of the invention.
[Brief description of the drawings]
FIG. 1
A diagram illustrating a time delay in encoding an audio stream in a preferred embodiment of the present invention.
FIG. 2a
A diagram illustrating encoding for various forms of the preferred embodiment.
FIG. 2b
A diagram illustrating encoding for various forms of the preferred embodiment.
FIG. 2c
A diagram illustrating encoding for various forms of the preferred embodiment.

Claims

A method of encoding information on a physical medium, comprising:
Providing a medium for recording and playback;
Recording encoded information including an encoded video stream and a plurality of encoded audio streams on a track formed on the medium, wherein the encoded information is partitioned into a plurality of packs, each pack having an associated packet of the encoded video stream and an associated packet of each of the audio streams.

The method of claim 1, wherein each of the packs is recorded on a portion of the track, and the packets of an associated pack are sequentially arranged on the track.

The method of claim 1, wherein the audio streams include a first audio stream and a second audio stream, and the second audio stream is delayed by a predetermined time period relative to the first audio stream.

The method of claim 3, wherein the predetermined time period is approximately equal to the amount of time required to read one of the packets.

The method of claim 1, wherein the predetermined time period is greater than an amount of time required to read the packet.

The method of claim 1, wherein the medium is a compact disc.

The method of claim 6, wherein the compact disc is encoded using a VCD standard.

The method of claim 1, wherein the video stream represents a still picture.

The method of claim 1, wherein the video stream represents a video clip.

The method of claim 1, wherein each of the audio streams encodes one or more songs.

A disc formatted by recording encoded information comprising an encoded video stream and a plurality of encoded audio streams on tracks formed on the disc, wherein the encoded information is partitioned into a plurality of packs, each pack having associated packets of the encoded video stream and associated packets of each audio stream.

A method for decoding information encoded on a physical medium, comprising:
Reading from a selected track of the physical medium;
Reading the packs of the track;
Identifying the contents of each pack having one or more packets;
Identifying a video packet for a video stream from the packet, and identifying an audio packet for a plurality of audio streams from the packet;
Selecting the video stream and packets of the at least one audio stream for playback.

The method of claim 12, wherein each packet of each of the audio streams includes an associated header field that identifies the associated audio stream.

The method of claim 13, wherein each of the header fields distinguishes between the plurality of audio streams.

The method, wherein the audio streams comprise a first audio stream and a second audio stream, and the second audio stream is delayed relative to the first audio stream.

The method of claim 15, wherein the audio stream comprises a third audio stream, and the third audio stream is delayed relative to the second audio stream.

The method of claim 12, wherein the medium is a compact disc.

The method of claim 17, wherein the compact disc is encoded using a VCD standard.

A system for decoding information, including at least one video stream and multiple audio streams, recorded on a physical medium, comprising:
A media reading unit operable to read encoded information stored on a physical medium, the encoded information comprising an encoded video stream and a plurality of encoded audio streams recorded on tracks formed on the physical medium, wherein the encoded information is partitioned into a plurality of packs, each pack having an associated packet of the encoded video stream and an associated packet of each audio stream, the media reading unit being operable to generate an encoded information signal indicative of the encoded information;
A decoder responsive to the encoded information signal and operable to identify, for playback, packets associated with a selected video stream and packets associated with a selected audio stream, the decoder being operative to generate an audio signal based on the selected audio stream and a video signal based on the selected video stream;
A display unit responsive to the video signal and operative to provide a video display;
A speaker unit responsive to the audio signal and operative to generate sound.

The system of claim 19, wherein the packets include a header operable to distinguish between six or more different audio streams.