JP2008048447A

JP2008048447A - Time and resolution layer structure to apply encryption and watermark processing thereto in next generation television

Info

Publication number: JP2008048447A
Application number: JP2007248991A
Authority: JP
Inventors: Gary A Demos; ガリーエーデモス
Original assignee: Dolby Laboratories Licensing Corp
Current assignee: Dolby Laboratories Licensing Corp
Priority date: 2007-09-26
Filing date: 2007-09-26
Publication date: 2008-02-28

Abstract

PROBLEM TO BE SOLVED: To provide a method and apparatus for encryption and watermarking that use the temporal and resolution layering of compressed image frames.

SOLUTION: A method for encrypting and watermarking a data stream of video information that has been encoded and compressed into a base layer and at least one enhancement layer. The method includes the steps of: selecting an encryption algorithm; selecting a watermarking technique; selecting at least one unit of the base layer or the enhancement layer to encrypt; selecting at least one unit of the base layer or the enhancement layer to watermark; watermarking each selected unit by applying the selected watermarking technique; and encrypting each selected unit by applying the selected encryption algorithm.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to electronic communication systems, and more particularly to an advanced electronic television system that has temporal and resolution layering of compressed image frames and provides encryption and watermarking capabilities.

The NTSC standard is currently used for television transmission in the United States. However, proposals have been made to replace the NTSC standard with an advanced television standard. For example, it has been proposed that the United States adopt digital standard-definition and advanced television formats at rates of 24 Hz, 30 Hz, 60 Hz, and 60 Hz interlaced. These rates are clearly intended to continue (and thus be compatible with) the existing NTSC television display rate of 60 Hz (or 59.94 Hz). It is equally clear that "3-2 pulldown" is intended for display on 60 Hz displays when presenting movies, which have a temporal rate of 24 frames per second (fps). However, while the above proposal offers several formats to choose from, each format encodes and decodes only a single resolution and frame rate. Because the display or motion rates of these formats are not integrally related to each other, conversion from one format to another is difficult.

Furthermore, this proposal does not provide the crucial capability of compatibility with computer displays. The proposed image motion rates are based on historical rates that date back to the early part of this century. If a "clean slate" were possible, these rates would be unlikely to be chosen. In the computer industry, where displays have been free to use any rate over the last decade, rates in the 70 to 80 Hz range have proven optimal, with 72 and 75 Hz being the most common. Unfortunately, the proposed rates of 30 and 60 Hz lack useful interoperability with 72 or 75 Hz, and their temporal performance is inferior.

In addition, some skilled in the art have argued that frame interlacing is required because of a claimed need for about 1000 lines of resolution at high frame rates, on the notion that such images cannot be compressed within the 18-19 Mbits/second available in a conventional 6 MHz broadcast television channel.

It would be highly desirable to adopt a single signal format that contains all of the desired standard and high-definition resolutions. However, doing so within the bandwidth constraints of a conventional 6 MHz broadcast television channel requires compression (i.e., "scalability") of both frame rate (temporal) and resolution (spatial). One method specifically intended to provide such scalability is the MPEG-2 standard. Unfortunately, the temporal and spatial scalability features specified in the MPEG-2 standard are not efficient enough to meet the needs of advanced television for the United States. The proposals for advanced television for the United States are therefore based on the premise that temporal (frame rate) and spatial (resolution) layering are inefficient, and that separate formats are consequently necessary.

In addition to the above issues, the inventor has identified a need to protect and manage the use of valuable copyrighted audio and video media, such as digital movies. The viability of the entire technology of movie data delivery can be said to depend on the ability to protect and manage use. As the quality of digital compressed movie masters approaches that of the original work, the need for protection and management techniques becomes a crucial requirement.

In approaching a system architecture for digital content protection and management, it would be beneficial to have a variety of tools and techniques that can be applied in a modular and flexible manner. Most commercial encryption systems have eventually been compromised. Any protection system therefore needs to be built flexibly enough that it can adapt and strengthen itself if it is compromised. It is also valuable to place informational clues in each copy, by watermarking symbols and/or serial number information, in order to pinpoint the source and the manner in which security was compromised.

Distribution of movies to theaters in digital form is becoming feasible. High-value new films have long been a target for theft or copying of today's film prints. Digital media such as DVD have attempted weak encryption and authentication schemes (such as DIVX). Analog cable scramblers, which are band-conversion devices that mix and disturb a television signal to prevent unauthorized viewing, have been used from the beginning for billing of premium cable channels and pay-per-view programs and movies. However, these weak scramblers have been widely compromised.

One reason that digital and analog video systems have tolerated such poor security systems is that the value of secondary video releases, and the share of the market lost to piracy, are relatively small. However, for first-run movies in digital form, valuable live events, and delivery of high-resolution images (in HDTV formats) to homes and businesses, a robust security system becomes a necessity.

The present invention overcomes these and other problems of current digital content protection systems.

SUMMARY
The present invention provides a method and apparatus for image compression that demonstrably achieves better than 1000-line resolution at high frame rates with high quality. The present invention also achieves both temporal and resolution scalability at this resolution and at high frame rates within the bandwidth available on conventional television broadcast channels. The technique of the present invention efficiently achieves more than twice the compression ratio proposed for advanced television, while providing flexible encryption and watermarking techniques.

Image material is preferably captured at an initial or primary framing rate of 72 fps. An MPEG-2 data stream is then generated that contains the following:
(1) A base layer, preferably encoded using only MPEG-2 P frames, containing a low-resolution (e.g., 1024x512 pixels), low-frame-rate (24 or 36 Hz) bitstream.
(2) An optional base-resolution temporal enhancement layer, encoded using only MPEG-2 B frames, containing a low-resolution (e.g., 1024x512 pixels), high-frame-rate (72 Hz) bitstream.
(3) An optional base-temporal high-resolution enhancement layer, preferably encoded using only MPEG-2 P frames, containing a high-resolution (e.g., 2k x 1k pixels), low-frame-rate (24 or 36 Hz) bitstream.
(4) An optional high-resolution temporal enhancement layer, encoded using only MPEG-2 B frames, containing a high-resolution (e.g., 2k x 1k pixels), high-frame-rate (72 Hz) bitstream.
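The four layers listed above can be pictured as a small table keyed by resolution and frame rate. The following Python sketch is only an illustration: the `Layer` class and the `layers_for` helper are hypothetical names, "2k x 1k" is assumed to mean 2048x1024, and the base layer is assumed to run at 36 Hz (the text also allows 24 Hz). It selects the layers a decoder of a given capability would use, without modeling any decoding dependencies between layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    frame_type: str      # MPEG-2 frame type used to encode the layer
    width: int           # example pixel width taken from the text
    height: int          # example pixel height taken from the text
    frame_rate_hz: int
    optional: bool


# Assumption: "2k x 1k" is read as 2048x1024, and the low rate is taken as 36 Hz.
LAYERS = [
    Layer("base", "P", 1024, 512, 36, False),
    Layer("base-resolution temporal enhancement", "B", 1024, 512, 72, True),
    Layer("base-temporal high-resolution enhancement", "P", 2048, 1024, 36, True),
    Layer("high-resolution temporal enhancement", "B", 2048, 1024, 72, True),
]


def layers_for(max_width: int, max_rate_hz: int) -> list[str]:
    """Return the names of the layers a decoder with these limits would use."""
    return [
        layer.name
        for layer in LAYERS
        if layer.width <= max_width and layer.frame_rate_hz <= max_rate_hz
    ]


if __name__ == "__main__":
    print(layers_for(1024, 36))   # a low-resolution, low-rate decoder
    print(layers_for(2048, 72))   # a full-capability decoder
```

A decoder limited to 1024 pixels of width and 36 Hz would use only the base layer, while a full-capability decoder would use all four.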

The present invention provides several key technical attributes that allow substantial improvement over current proposals, including: replacement of numerous resolutions and frame rates with a single layered resolution and frame rate; no need for interlace to achieve better than 1000 lines of resolution for 2-megapixel images at a high frame rate (72 Hz) within a 6 MHz television channel; compatibility with computer displays through use of a primary framing rate of 72 fps; and greater robustness than the current unlayered format proposals for advanced television, because all available bits can be allocated to the low-resolution base layer when "stressful" image material is encountered.

The disclosed layered compression technique allows a form of modularized decomposition of an image. This modularity has benefits beyond scalable decoding and better stress resilience. The modularity can be further exploited as a structure that supports flexible encryption and watermarking techniques. The function of encryption is to restrict viewing, exhibition, copying, or other use of an audio/video show unless one or more proper keys are applied to an authorized decryption system. The function of watermarking is to track lost or stolen copies back to a source, so that the nature of the method of theft can be determined in order to improve the security of the system, and so that those involved in the theft can be identified.

With layered compression, the base layer and various internal components of the base layer (such as I frames and their DC coefficients, or the motion vectors of P frames) can be used to encrypt a compressed, layered movie stream. By using such a layered subset of the bits, the entire picture stream can be made unrecognizable (unless decrypted) by encrypting only a small fraction of its bits. Furthermore, a variety of encryption algorithms and strengths can be applied to various portions of the layered stream, including the enhancement layers (which can be viewed as a premium-quality service and encrypted specially). The encryption algorithm or key can also be changed at each slice boundary, so that the encryption and the image stream are more tightly intertwined.
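The idea of encrypting only selected units of the layered stream, with a key that changes per unit, can be sketched as follows. This is a toy illustration under stated assumptions, not the patent's implementation: the unit labels, the `encrypt_selected` helper, and the SHA-256-based XOR keystream are all hypothetical choices made for the example, and the keystream construction is not meant as a secure cipher.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus a counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def encrypt_selected(units, selected_kinds, master_key: bytes):
    """Encrypt only units whose kind is selected, using a per-unit key.

    `units` is a list of (kind, payload) pairs; the kinds are hypothetical
    labels such as "base/I-frame-DC" or "enhancement/B-frame".
    """
    result = []
    for index, (kind, payload) in enumerate(units):
        if kind in selected_kinds:
            # Derive a different key for each unit (e.g. for each slice).
            unit_key = hashlib.sha256(master_key + index.to_bytes(4, "big")).digest()
            payload = xor_cipher(payload, unit_key)
        result.append((kind, payload))
    return result


if __name__ == "__main__":
    stream = [
        ("base/I-frame-DC", b"dc coefficients"),
        ("base/P-frame-MV", b"motion vectors"),
        ("enhancement/B-frame", b"a much larger body of enhancement data"),
    ]
    protected = encrypt_selected(stream, {"base/I-frame-DC"}, b"example key")
    restored = encrypt_selected(protected, {"base/I-frame-DC"}, b"example key")
    print(restored == stream)
```

Because the XOR operation is symmetric, running `encrypt_selected` again with the same key and selection restores the original units; units that were not selected pass through unchanged.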

The layered compression structure of the present invention can also be used for watermarking. The goal of watermarking is a mark that can be identified reliably by detection yet is essentially invisible to the eye. For example, the low-order bits of the DC coefficients in I frames would be invisible to the eye, yet could still be used to uniquely identify a particular watermarked picture stream. Enhancement layers can also have their own unique identifying watermark structure.
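A minimal sketch of the low-order-bit idea follows, assuming the DC coefficients of an I frame are available as a list of integers. The function names `embed_watermark` and `extract_watermark` are hypothetical, and a practical scheme would also have to consider re-quantization and robustness, which this example ignores.

```python
def embed_watermark(dc_coeffs: list[int], bits: list[int]) -> list[int]:
    """Overwrite the lowest bit of the first len(bits) DC coefficients."""
    if len(bits) > len(dc_coeffs):
        raise ValueError("not enough coefficients to carry the watermark")
    marked = list(dc_coeffs)
    for i, bit in enumerate(bits):
        # Clear the low-order bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | (bit & 1)
    return marked


def extract_watermark(dc_coeffs: list[int], n_bits: int) -> list[int]:
    """Read the watermark back from the low-order bits."""
    return [coeff & 1 for coeff in dc_coeffs[:n_bits]]


if __name__ == "__main__":
    original = [128, 131, 64, 77, 90]
    serial_bits = [1, 0, 1, 1]          # e.g. part of a copy serial number
    marked = embed_watermark(original, serial_bits)
    print(marked)
    print(extract_watermark(marked, len(serial_bits)))
```

Each coefficient changes by at most one quantization step, which is why such a mark is expected to be hard to see while remaining machine-readable.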

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

Throughout this description, the preferred embodiments and examples shown should be considered as exemplars, rather than as limitations on the present invention.

Temporal and Resolution Layering
Goals of a Temporal Rate Family
After considering the problems of the prior art, and in pursuing the present invention, the following goals were defined to specify the temporal characteristics of a future digital television system:
• Optimal presentation of the high-resolution legacy of 24 frame-per-second film.
• Smooth motion capture for rapidly moving image types, such as sports.
• Smooth motion presentation of sports and similar images on existing analog NTSC displays, as well as on computer-compatible displays operating at 72 or 75 Hz.
• Reasonable but more efficient motion capture of less rapidly moving images, such as news and live drama.
• Reasonable presentation of all new digital types of images through a converter box onto existing NTSC displays.
• High-quality presentation of all new digital types of images on computer-compatible displays.
• Reasonable or high-quality presentation on 60 Hz digital standard or high-resolution displays, should such displays come to market.

Since 60 Hz and 72/75 Hz displays are fundamentally incompatible at any rate other than the movie rate of 24 Hz, the best situation would be to eliminate either 72/75 or 60 as a display rate. Since 72 or 75 Hz is a required rate for N.I.I. (National Information Infrastructure) and computer applications, eliminating the 60 Hz rate as fundamentally obsolete would be the most forward-looking choice. However, there are many competing interests within the broadcasting and television equipment industries, and there is a strong demand that any new digital television infrastructure be based on 60 Hz (and 30 Hz). This has led to heated debate among the television, broadcast, and computer industries.

Furthermore, some interests in the broadcast and television industries insist on an interlaced 60 Hz format, which further widens the gap with computer display requirements. Since non-interlaced display is required for computer-like applications of digital television systems, a de-interlacer is needed to display interlaced signals. Because a de-interlacer would be needed in every such receiving device, there is substantial debate about the cost and quality of de-interlacers. In addition to de-interlacing, frame rate conversion further affects cost and quality. For example, NTSC-to-PAL converters remain very costly, and yet their conversion performance is still unreliable for many common types of scenes. Because the issue of interlace is complex and problematic, and in order to address the problems and issues of temporal rate, the present invention is described in the context of a digital television standard without interlace.

Selecting an Optimal Temporal Rate
Beat Problems
Optimal presentation on a 72 or 75 Hz display occurs when a camera or simulated image is created with a motion rate equal to the display rate (72 or 75 Hz, respectively), and vice versa. Similarly, optimal motion fidelity on a 60 Hz display results from a 60 Hz camera or simulated image. Using a 72 Hz or 75 Hz generation rate with a 60 Hz display results in a beat frequency of 12 Hz or 15 Hz, respectively. This beat can be removed through motion analysis, but motion analysis is expensive and inexact, and often produces visible artifacts and temporal aliasing. Without motion analysis, the beat frequency dominates the perceived display rate, and the 12 or 15 Hz beat produces motion that appears even less accurate than 24 Hz. Thus, 24 Hz forms a natural temporal common denominator between 60 and 72 Hz. Although 75 Hz has a slightly higher 15 Hz beat against 60 Hz, its motion is still not as smooth as 24 Hz, and there is no integral relationship between 75 Hz and 24 Hz unless the 24 Hz rate is increased to 25 Hz. (In the 50 Hz countries of Europe, movies are often played 4% fast at 25 Hz; this makes film presentable on 75 Hz displays.)
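The two beat figures quoted in this paragraph follow from a simple difference of rates. The sketch below is only a back-of-the-envelope model, assuming the beat between two nearby rates equals the absolute difference between them; the function name `beat_hz` is hypothetical, and the model is intended only for the 72-versus-60 and 75-versus-60 cases discussed here.

```python
def beat_hz(source_hz: float, display_hz: float) -> float:
    """Beat frequency between two nearby rates, modeled as their difference."""
    return abs(source_hz - display_hz)


if __name__ == "__main__":
    for source in (72, 75):
        print(f"{source} Hz material on a 60 Hz display: "
              f"{beat_hz(source, 60):g} Hz beat")
```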

In the absence of motion analysis at each receiving device, 60 Hz motion on 72 or 75 Hz displays, and 75 or 72 Hz motion on 60 Hz displays, will be less smooth than 24 Hz images. Thus, neither 72/75 Hz nor 60 Hz motion is suitable for reaching a heterogeneous display population that contains both 72 or 75 Hz displays and 60 Hz displays.

3-2 Pulldown
A further complication in selecting an optimal frame rate arises from the use of "3-2 pulldown" combined with video effects during the telecine (film-to-video) conversion process. During such conversion, the 3-2 pulldown pattern repeats the first frame (or field) three times, the next frame twice, the next frame three times, the next frame twice, and so on. This is how 24 fps film is presented on television at 60 Hz (actually 59.94 Hz for NTSC color). That is, each of the 12 pairs of frames in one second of film is displayed five times (three repeats of one frame plus two of the other), giving 60 images per second. The 3-2 pulldown pattern is shown in FIG. 1.
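The repetition pattern described above can be reproduced in a few lines. This is a small sketch, with `pulldown` as a hypothetical helper name; it repeats each input frame according to a cycling pattern and confirms that one second of 24 fps film yields 60 displayed images.

```python
from itertools import cycle


def pulldown(frames, pattern):
    """Repeat each frame according to a cycling repetition pattern."""
    output = []
    for frame, repeats in zip(frames, cycle(pattern)):
        output.extend([frame] * repeats)
    return output


if __name__ == "__main__":
    one_second_of_film = list(range(1, 25))       # 24 film frames
    displayed = pulldown(one_second_of_film, (3, 2))
    print(displayed[:10])                          # first ten displayed images
    print(len(displayed))                          # images shown per second
```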

By some estimates, more than half of all film on video has substantial portions in which adjustments were made to the 24 fps film at the 59.94 Hz video field rate. Such adjustments include "pan-and-scan", color correction, and title scrolling. In addition, many films are time-adjusted by dropping frames or clipping the starts and ends of scenes to fit within a given broadcast schedule. Because both 59.94 Hz and 24 Hz motion are then present, these operations can make the 3-2 pulldown process impossible to reverse. This makes such film very difficult to compress using the MPEG-2 standard. Fortunately, the problem is limited to existing NTSC-resolution material, since there is no significant library of higher-resolution digital film that uses 3-2 pulldown.

Motion Blur
To further explore the issue of finding a common temporal rate higher than 24 Hz, it is useful to discuss motion blur in the capture of moving images. Camera sensors and motion picture film are open to sense a moving image for a portion of the duration of each frame. On motion picture cameras and many video cameras, this exposure duration is adjustable. Film cameras require a period of time to advance the film, and are usually limited to being open only about 210 out of 360 degrees, or a 58% duty cycle. Video cameras with CCD sensors often require some portion of the frame time to "read" the image from the sensor. This can vary from 10% to 50% of the frame time. In some sensors, an electronic shutter must be used to block the light during this readout time. Thus, the "duty cycle" of CCD sensors usually varies from 50 to 90%, and is adjustable in some cameras. The light shutter can sometimes be adjusted to reduce the duty cycle further, if desired. However, for both film and video, the most common sensor duty cycle is 50%.

Preferred Rate
With this issue in mind, one can consider using only some of the frames from an image sequence captured at 60, 72, or 75 Hz. Using one frame out of every 2, 3, 4, and so on yields the sub-rates shown in Table 1.
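The sub-rates follow directly from dividing each capture rate by the number of frames in a group. The sketch below shows only that arithmetic and is not a reproduction of the original Table 1; the helper names `subrates` and `common_subrates` and the choice of divisors 2 through 6 are assumptions made for the example.

```python
from fractions import Fraction


def subrates(rate_hz: int, divisors=range(2, 7)) -> dict:
    """Rates obtained by keeping one frame out of every d captured frames."""
    return {d: Fraction(rate_hz, d) for d in divisors}


def common_subrates(rate_a: int, rate_b: int, divisors=range(2, 7)) -> list:
    """Sub-rates that two capture rates share over the given divisors."""
    shared = set(subrates(rate_a, divisors).values())
    shared &= set(subrates(rate_b, divisors).values())
    return sorted(shared)


if __name__ == "__main__":
    for rate in (75, 72, 60):
        row = ", ".join(f"1/{d}: {float(r):g}" for d, r in subrates(rate).items())
        print(f"{rate} Hz -> {row}")
    print("shared by 60 and 72 Hz:", [float(r) for r in common_subrates(60, 72)])
    print("shared by 60 and 75 Hz:", [float(r) for r in common_subrates(60, 75)])
```

Exact fractions are used so that values such as 37.5 Hz and 14.4 Hz compare reliably when looking for rates two capture rates have in common.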

The rate of 15 Hz is a unifying rate between 60 and 75 Hz. The rate of 12 Hz is a unifying rate between 60 and 72 Hz. However, the desire for a rate above 24 Hz eliminates these rates. 24 Hz is not a common sub-rate, but the use of 3-2 pulldown has come to be accepted by the industry for presentation on 60 Hz displays. The only candidate rates are therefore 30, 36, and 37.5 Hz. Since 30 Hz has a 7.5 Hz beat with 75 Hz and a 6 Hz beat with 72 Hz, it is not useful as a candidate.

The motion rates of 36 and 37.5 Hz become the prime candidates for smoother motion than 24 Hz material when presented on 60 and 72/75 Hz displays. Both of these rates are about 50% faster and smoother than 24 Hz. The rate of 37.5 Hz is not suitable for use with either 60 or 72 Hz, so it must be eliminated, leaving only 36 Hz as having the desired temporal rate characteristics. (The motion rate of 37.5 Hz could be used if the 60 Hz display rate for television could be moved 4% to 62.5 Hz. Given the interests behind 60 Hz, 62.5 Hz appears unlikely; there are even those who propose the very obsolete 59.94 Hz rate for new television systems. However, if such a change were made, the other aspects of the present invention could be applied to the 37.5 Hz rate.)

The rates of 24, 36, 60, and 72 Hz are left as candidates for a temporal rate family. The rates of 72 and 60 Hz cannot be used as a distribution rate because, as described above, motion is less smooth when converting between these two rates than when 24 Hz is used as the distribution rate. By hypothesis, we are looking for a rate faster than 24 Hz. Therefore, 36 Hz is the prime candidate for a master, unifying motion capture and image distribution rate for use with 60 and 72/75 Hz displays.

As noted above, the 3-2 pulldown pattern for 24 Hz material repeats the first frame (or field) three times, the next frame twice, the next frame three times, the next frame twice, and so on. When 36 Hz is used, each frame would optimally be repeated in a 2-1-2 pattern. This can be seen in Table 2 and graphically in FIG. 1.
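The two repetition patterns can be laid out side by side for ten consecutive 60 Hz display intervals. The sketch below is illustrative only and is not a reproduction of the original Table 2; `repeat_frames` is a hypothetical helper that numbers source frames from 1 and repeats them according to the stated pattern.

```python
from itertools import count, cycle, islice


def repeat_frames(pattern, n_display_frames):
    """Source frame number shown in each of the first n display intervals."""
    def generate():
        for frame, repeats in zip(count(1), cycle(pattern)):
            for _ in range(repeats):
                yield frame
    return list(islice(generate(), n_display_frames))


if __name__ == "__main__":
    print("60 Hz display interval:", list(range(1, 11)))
    print("24 Hz source (3-2):    ", repeat_frames((3, 2), 10))
    print("36 Hz source (2-1-2):  ", repeat_frames((2, 1, 2), 10))
```

With the 2-1-2 pattern, every three source frames fill five display intervals, which is the 36-to-60 ratio; with 3-2, every two source frames fill five intervals, which is the 24-to-60 ratio.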

This relationship between 36 Hz and 60 Hz holds only for true 36 Hz material. 60 Hz material can be "stored" in 36 Hz if interlaced, but 36 Hz cannot reasonably be created from 60 Hz without motion analysis and reconstruction. However, in looking for a new rate for motion capture, 36 Hz provides slightly smoother motion on 60 Hz displays than 24 Hz does, and provides substantially better image motion smoothness on a 72 Hz display. Therefore, 36 Hz is the optimal rate for a master, unifying motion capture and image distribution rate for use with 60 and 72/75 Hz displays, yielding smoother motion than 24 Hz material presented on such displays.

Although 36 Hz meets the goals set forth above, it is not the only suitable capture rate. Since 36 Hz cannot be simply extracted from 60 Hz, 60 Hz does not provide a suitable rate for capture. However, 72 Hz can be used for capture, with every other frame then used as the basis for 36 Hz distribution. The motion blur from using every other frame of 72 Hz material will be half of the motion blur of a 36 Hz capture. Tests of the motion blur appearance of every third frame from 72 Hz show that the staccato strobing at 24 Hz is objectionable. However, using every other frame from 72 Hz for 36 Hz display is not objectionable to the eye compared with native 36 Hz capture.

Thus, 36 Hz affords the opportunity to provide very smooth motion on 72 Hz displays by capturing at 72 Hz, while providing better motion on 60 Hz displays than 24 Hz material by using alternate frames of the material natively captured at 72 Hz to achieve a 36 Hz distribution rate, and then using 2-1-2 pulldown to derive a 60 Hz image.

Table 3 summarizes the preferred optimal temporal rates for capture and distribution according to the present invention.

It is also worth noting that this technique of using alternate frames from a 72 Hz camera to achieve a 36 Hz distribution rate can benefit from an increased motion blur duty cycle. The normal 50% duty cycle at 72 Hz, which yields a 25% duty cycle at 36 Hz, has been demonstrated to be acceptable, and it represents a significant improvement over 24 Hz on 60 Hz and 72 Hz displays. However, if the duty cycle is increased to the 75 to 90% range, the 36 Hz samples begin to approach the more common 50% duty cycle. The duty cycle may be increased, for example, by using "backing store" CCD designs, which have a short blanking time and therefore yield a high duty cycle. Other methods may be used, including dual CCD multiplexed designs.
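The duty-cycle arithmetic in the preceding paragraph can be checked with a small sketch. It assumes the exposure time per captured frame is fixed by the 72 Hz capture and that only alternate frames are delivered; the function name `effective_duty_cycle` is a hypothetical helper.

```python
def effective_duty_cycle(capture_duty, capture_hz=72.0, delivery_hz=36.0):
    """Fraction of each delivered frame period during which the shutter was open.

    The exposure per frame is capture_duty / capture_hz seconds; the delivered
    frame period is 1 / delivery_hz seconds.
    """
    return capture_duty * delivery_hz / capture_hz


for duty in (0.50, 0.75, 0.90):
    print(f"{duty:.0%} at 72 Hz -> {effective_duty_cycle(duty):.1%} at 36 Hz")
```

Under these assumptions, a 75 to 90% capture duty cycle gives roughly 37.5 to 45% at 36 Hz, which is what the text means by approaching the more common 50% figure.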

Modified MPEG-2 Compression
For efficient storage and distribution, digital source material at the preferred temporal rate of 36 Hz should be compressed. The preferred form of compression for the present invention is achieved using a novel variation of the MPEG-2 standard.

MPEG-2 Basics
MPEG-2 is an international video compression standard defining a video syntax that provides an efficient way to represent image sequences as more compact coded data. The language of the coded bits is the "syntax". For example, a few tokens can represent an entire block of 64 samples. MPEG also describes a decoding (reconstruction) process in which the coded bits are mapped from the compact representation back into the original "raw" format of the image sequence. For example, a flag in the coded bitstream signals whether the following bits are to be decoded with a discrete cosine transform (DCT) algorithm or with a prediction algorithm. The algorithms that make up the decoding process are governed by the semantics defined by MPEG. This syntax can be applied to exploit common video characteristics such as spatial redundancy, temporal redundancy, uniform motion, and spatial masking. In effect, MPEG-2 defines a programming language as well as a data format. An MPEG-2 decoder must be able to parse and decode an incoming data stream, but as long as the data stream complies with the MPEG-2 syntax, a wide variety of data structures and compression techniques can be used. The present invention takes advantage of this flexibility by devising novel means and methods for temporal and resolution scaling using the MPEG-2 standard.

MPEG-2 uses both intraframe and interframe compression methods. In most video scenes, the background remains relatively stable while action takes place in the foreground. The background may move, but a great deal of the scene is redundant. MPEG-2 starts its compression by creating a reference frame called an I (for Intra) frame. I frames are compressed without reference to other frames and thus contain an entire frame of video information. I frames provide entry points into the data bitstream for random access, but can be compressed only modestly. Typically, the data representing I frames is placed in the bitstream every 10 to 15 frames. Thereafter, since only a small portion of each frame that falls between the reference I frames differs from the bracketing I frames, only the differences are captured, compressed, and stored. Two types of frames are used for such differences: P (for Predicted) frames and B (for Bi-directional Interpolated) frames.

P frames are generally encoded with reference to a past frame (either an I frame or a previous P frame) and are, in general, used as a reference for future P frames. P frames receive a fairly high amount of compression. B frames provide the highest amount of compression but generally require both a past and a future reference in order to be encoded. Bi-directional frames are never used as reference frames.

Macroblocks within P frames may also be individually encoded using intraframe coding. Macroblocks within B frames may likewise be individually encoded using intraframe coding, forward predicted coding, backward predicted coding, or both forward and backward (that is, bi-directionally interpolated) predicted coding. A macroblock is a 16x16 pixel grouping of four 8x8 DCT blocks, together with one motion vector for P frames and one or two motion vectors for B frames.

After coding, an MPEG data bitstream comprises a sequence of I, P, and B frames. A sequence may consist of almost any pattern of I, P, and B frames (there are a few minor semantic restrictions on their placement). However, it is common in industrial practice to use a fixed pattern (e.g., IBBPBBPBBPBBPBB).

As an important part of the present invention, an MPEG-2 data stream is created that comprises a base layer, at least one optional temporal enhancement layer, and an optional resolution enhancement layer. Each of these layers is described in detail below.

Temporal Scalability
Base Layer
The base layer is used to carry 36 Hz source material. In the preferred embodiment, one of two MPEG-2 frame sequences can be used for the base layer: IBPBPBP or IPPPPPP. The latter pattern is most preferred, because the decoder then needs to decode only P frames, which reduces the required memory bandwidth if 24 Hz movies are also decoded without B frames.

72 Hz Temporal Enhancement Layer
When using MPEG-2 compression, it is possible to embed a 36 Hz temporal enhancement layer as B frames within the MPEG-2 sequence for the 36 Hz base layer, provided the spacing between P frames is regular. This allows a single data stream to support both 36 Hz displays and 72 Hz displays. For example, both layers can be decoded to generate a 72 Hz signal for computer monitors, while only the base layer may be decoded and converted to generate a 60 Hz signal for television.

In the preferred embodiment, the MPEG-2 coding patterns IPBBBPBBBPBBBP and IPBPBPBPB both take 36 Hz to 72 Hz by placing alternate frames in a separate stream that contains only temporal enhancement B frames. These coding patterns are shown in FIGS. 2 and 3, respectively. The 2-frame P spacing coding pattern of FIG. 3 has the added advantage that the 36 Hz decoder needs to decode only P frames, which reduces the required memory bandwidth if 24 Hz movies are also decoded without B frames.

Experiments with high resolution images have suggested that the 2-frame P spacing of FIG. 3 is optimal for most types of images. That is, the construction in FIG. 3 appears to offer the optimal temporal structure for supporting both 60 and 72 Hz, while providing excellent results on modern 72 Hz computer-compatible displays. This construction achieves 72 Hz with two digital streams: one at 36 Hz for the base layer, and one at 36 Hz for the enhancement layer B frames. This is illustrated in FIG. 4, a block diagram showing that a 36 Hz base layer MPEG-2 decoder 50 simply decodes the P frames to generate a 36 Hz output, which can then be readily converted to either 60 Hz or 72 Hz display. An optional second decoder 52 simply decodes the B frames to generate a second 36 Hz output which, when combined with the 36 Hz output of the base layer decoder 50, results in a 72 Hz output (a method for combining is discussed below). In an alternative embodiment, one fast MPEG-2 decoder 50 can decode both the P frames of the base layer and the B frames of the enhancement layer.
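The two-stream arrangement of FIG. 4 can be sketched as follows. This is only an illustration of the bookkeeping, not of MPEG-2 decoding: it assumes a display-order sequence in which reference (I/P) frames and B frames alternate, represents each frame as a `(type, index)` tuple, and uses the hypothetical helper names `split_layers` and `merge_layers`.

```python
def split_layers(frames):
    """Separate a 72 Hz sequence into a 36 Hz base layer (I/P frames)
    and a 36 Hz temporal enhancement layer (B frames)."""
    base = [f for f in frames if f[0] in ("I", "P")]
    enhancement = [f for f in frames if f[0] == "B"]
    return base, enhancement


def merge_layers(base, enhancement):
    """Interleave decoded base and enhancement frames back into 72 Hz order."""
    merged = []
    for i, frame in enumerate(base):
        merged.append(frame)
        if i < len(enhancement):
            merged.append(enhancement[i])
    return merged


# Illustrative display-order labels; index is the 72 Hz frame number.
frames_72 = [(t, i) for i, t in enumerate("IBPBPBPB")]

base_36, enhancement_36 = split_layers(frames_72)
print(base_36)         # what a base-layer-only (36 Hz) decoder would handle
print(enhancement_36)  # what the optional second decoder would handle
print(merge_layers(base_36, enhancement_36) == frames_72)
```

A 36 Hz receiver would consume only `base_36`; a 72 Hz receiver would decode both lists and interleave them.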

Optimal Master Format
A number of companies are building MPEG-2 decoding chips that operate at around 11 Mpixels/second. The MPEG-2 standard defines several "profiles" for resolutions and frame rates. Although these profiles are strongly biased toward computer-incompatible format parameters such as 60 Hz, non-square pixels, and interlace, many chip manufacturers appear to be developing decoder chips that operate at "main profile, main level". This profile is defined as any horizontal resolution up to 720 pixels, any vertical resolution up to 576 lines at up to 25 Hz, and any vertical resolution up to 480 lines at up to 30 Hz. A wide range of data rates, from approximately 1.5 Mbits/second to about 10 Mbits/second, is also specified. From a chip point of view, however, the main issue is the rate at which pixels are decoded. The main-level, main-profile pixel rate is about 10.5 Mpixels/second.

Although there is variation among chip manufacturers, most MPEG-2 decoder chips will in fact operate at up to 13 Mpixels/second when given fast support memory. Some decoder chips will run at 20 Mpixels/second or faster. Given that CPU chips tend to gain 50% or more in performance each year at a given cost, some near-term flexibility in the pixel rate of MPEG-2 decoder chips can be expected.

Table 4 shows some desirable resolutions and frame rates, together with their corresponding pixel rates.

All of these formats can be used with MPEG-2 decoder chips that can sustain at least 12.6 Mpixels/second. The very desirable 640x480 at 36 Hz format can be achieved by nearly all current chips, since its rate is 11.1 Mpixels/second. A widescreen 1024x512 image can be squeezed horizontally into 680x512 using a 1.5:1 squeeze, and can be supported at 36 Hz if 12.5 Mpixels/second can be handled. The highly desirable square-pixel widescreen template of 1024x512 can achieve 36 Hz when MPEG-2 decoder chips can process about 18.9 Mpixels/second. This becomes more feasible if 24 Hz and 36 Hz material is coded only with P frames, so that B frames are required only in the 72 Hz temporal enhancement layer decoders. Decoders that use only P frames require less memory and memory bandwidth, which makes the goal of 19 Mpixels/second easier to reach.
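The pixel rates quoted in this paragraph follow directly from width x height x frame rate. The sketch below recomputes them for the formats named in the text; the `pixel_rate` helper is a hypothetical name, and the list covers only the formats mentioned here, not the full contents of Table 4.

```python
def pixel_rate(width, height, fps):
    """Decoded pixels per second for a progressive format."""
    return width * height * fps


formats = [
    (640, 480, 36),    # 640x480 at 36 Hz
    (680, 512, 36),    # 1024x512 squeezed 1.5:1 horizontally, at 36 Hz
    (1024, 512, 36),   # square-pixel widescreen template at 36 Hz
    (1024, 512, 24),   # the same template at 24 Hz
]

for width, height, fps in formats:
    rate = pixel_rate(width, height, fps) / 1e6
    print(f"{width}x{height} @ {fps} Hz: {rate:.1f} Mpixels/s")
```

The full 1024x512 template at 24 Hz works out to about 12.6 Mpixels/second, which matches the minimum decoder rate stated at the start of the paragraph.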

The 1024x512 resolution template would most often be used with 2.35:1 and 1.85:1 aspect ratio films at 24 fps. This material requires only 11.8 Mpixels/second, which should fit within the limits of most existing main-level, main-profile decoders.

All of these formats are shown in FIG. 6 within a "master template" for a base layer at 24 or 36 Hz. Accordingly, the present invention provides a unique way of accommodating a wide variety of aspect ratios and temporal resolutions compared with the prior art. (The master template is discussed further below.)

The temporal enhancement layer of B frames that generates 72 Hz can be decoded either by using a chip with double the pixel rates specified above, or by using a second chip in parallel with additional access to the decoder memory. Under the present invention, there are at least two ways to merge the enhancement and base layer data streams so as to insert the alternate B frames. First, merging can be done invisibly to the decoder chip using the MPEG-2 transport layer. The MPEG-2 transport packets for two PIDs (Program IDs) can be recognized as containing the base layer and the enhancement layer, and their stream contents can both be simply passed on to a double-rate capable decoder chip, or to an appropriately configured pair of normal-rate decoders. Second, it is also possible to use the "data partitioning" feature of the MPEG-2 data stream instead of the transport layer of the MPEG-2 system. The data partitioning feature allows the B frames to be marked as belonging to a different class within the MPEG-2 compressed data stream, so that they can be flagged to be ignored by 36 Hz decoders that support only the temporal base layer rate.
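The first merging method (two PIDs in the transport layer) can be sketched as a simple packet router. The sketch relies on the standard MPEG-2 transport packet layout (188-byte packets, sync byte 0x47, 13-bit PID in bytes 1 and 2). The PID values, the `route` function, and the `make_packet` test helper are assumptions made for illustration; the text does not specify which PIDs are used.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
BASE_PID = 0x100         # hypothetical PID carrying the 36 Hz base layer
ENHANCEMENT_PID = 0x101  # hypothetical PID carrying the B-frame enhancement layer


def packet_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from a transport packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def route(packets, want_enhancement):
    """Send base-layer packets to one decoder input and, optionally,
    enhancement-layer packets to a second one. Other PIDs are dropped."""
    base, enhancement = [], []
    for packet in packets:
        pid = packet_pid(packet)
        if pid == BASE_PID:
            base.append(packet)
        elif pid == ENHANCEMENT_PID and want_enhancement:
            enhancement.append(packet)
    return base, enhancement


def make_packet(pid: int) -> bytes:
    """Build a dummy payload-only packet for the demonstration."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))


stream = [make_packet(BASE_PID), make_packet(ENHANCEMENT_PID), make_packet(BASE_PID)]
base_only = route(stream, want_enhancement=False)   # 36 Hz receiver
both = route(stream, want_enhancement=True)         # 72 Hz receiver
print(len(base_only[0]), len(base_only[1]), len(both[0]), len(both[1]))
```

A 36 Hz receiver simply never subscribes to the enhancement PID, which is why the merge can remain invisible to the decoder chip itself.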

Temporal scalability as defined by MPEG-2 video compression is not as well suited as the simple B frame partitioning of the present invention. MPEG-2 temporal scalability is only forward referenced from a previous P or B frame, and therefore lacks the efficiency of the B frame encoding proposed here, which is both forward and backward referenced. Accordingly, the simple use of B frames as a temporal enhancement layer provides simpler and more efficient temporal scalability than the temporal scalability defined within MPEG-2. Notwithstanding, this use of B frames as the mechanism for temporal scalability is fully compliant with MPEG-2. The two methods of identifying these B frames as an enhancement layer, via data partitioning or via alternate PIDs for the B frames, are also fully compliant.

50/60 Hz Temporal Enhancement Layer
In addition to, or as an alternative to, the 72 Hz temporal enhancement layer described above (which encodes a 36 Hz signal), a 60 Hz temporal enhancement layer (which encodes a 24 Hz signal) can be added in a similar fashion to the 36 Hz base layer. A 60 Hz temporal enhancement layer is particularly useful for encoding existing 60 Hz interlaced video material.

Most existing 60 Hz interlaced material is NTSC video tape in analog, D1, or D2 format. A small amount of Japanese HDTV (SMPTE 240/260M) also exists, and there are cameras that operate in this format. Any such 60 Hz interlaced format can be processed in known fashion so that the signal is deinterlaced and frame rate converted. This processing involves very complex image understanding technology, similar to robot vision. Even with very sophisticated technology, temporal aliasing generally causes "misunderstandings" by the algorithm and occasionally yields artifacts. The typical 50% duty cycle of image capture means that the camera is "not looking" half the time. The "backward wagon wheels" seen in movies are an example of temporal aliasing caused by this normal practice of temporal undersampling. Such artifacts generally cannot be removed without human-assisted reconstruction. Thus, there will always be cases that cannot be corrected automatically. However, the motion conversion results available with current technology should be reasonable on most material.

The price of a single high definition camera or tape machine would be similar to the cost of such a converter. Thus, in a studio having several cameras and tape machines, the cost of such conversion is modest. However, performing such processing adequately is presently beyond the budget of home and office products. Accordingly, the complex processing needed to remove interlace and convert the frame rate of existing material is preferably accomplished at the origination studio. This is shown in FIG. 5, a block diagram showing 60 Hz interlaced input from cameras 60 or other sources (such as non-film video tape) 62 to a converter 64 that includes a deinterlacer function and a frame rate conversion function and that can output a 36 Hz signal (36 Hz base layer only) and a 72 Hz signal (36 Hz base layer plus 36 Hz temporal enhancement layer).

As an alternative to outputting a 72 Hz signal (36 Hz base layer plus 36 Hz temporal enhancement layer), this conversion process can be adapted to produce a second MPEG-2 24 Hz temporal enhancement layer on top of the 36 Hz base layer, which would reproduce the original 60 Hz signal, although deinterlaced. If similar quantization is used for the B frames of the 60 Hz temporal enhancement layer, the data rate should be slightly lower than that of the 72 Hz temporal enhancement layer, since there are fewer B frames.
> 60I → 36 + 36 = 72
> 60I → 36 + 24 = 60
> 72 → 36, 72, 60
> 50I → 36, 50, 72
> 60 → 24, 36, 72

The vast majority of material of interest to the United States is low resolution NTSC. At present, most NTSC signals are viewed with substantial impairment on most home televisions. Further, viewers have come to accept the temporal impairments inherent in the use of 3-2 pulldown to present film on television. Nearly all prime-time television is made on film at 24 frames per second. Thus, only sports, news, and other video-original shows need to be processed in this fashion. The artifacts and losses associated with converting these shows to a 36/72 Hz format are likely to be offset by the improvements associated with high-quality deinterlacing of the signal.

The motion blur inherent in 60 Hz (or 59.94 Hz) fields should be very similar to the motion blur in 72 Hz frames. Thus, this technique of providing a base and an enhancement layer should appear similar to 72 Hz origination in terms of motion blur. Accordingly, few viewers will notice the difference, except possibly as a slight improvement, when interlaced 60 Hz NTSC material is processed into a 36 Hz base layer plus 24 Hz from the temporal enhancement layer and displayed at 60 Hz. However, those who buy new 72 Hz digital non-interlaced televisions will notice a small improvement when viewing NTSC, and a major improvement when viewing new material captured or originated at 72 Hz. Even the decoded 36 Hz base layer presented on a 72 Hz display will look as good as high quality digital NTSC, with interlace artifacts replaced by a slower frame rate.

The same process can also be applied to convert existing 50 Hz PAL material into a second MPEG-2 enhancement layer. PAL video tapes are best slowed to 48 Hz prior to such conversion. Live PAL requires conversion using the relatively unrelated rates of 50, 36, and 72 Hz. Such converter units are presently affordable only at the source of broadcast signals, and are not presently practical at each receiving device in the home and office.

Resolution Scalability
The base resolution template can be enhanced using hierarchical resolution scalability based on MPEG-2 to achieve higher resolutions built upon the base layer. Enhancement can achieve resolutions of 1.5 times and 2 times the base layer. Double resolution can be built in two steps, using 3/2 and then 4/3, or in a single factor-of-two step. This is shown in FIG. 7.
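The two-step and one-step paths to double resolution can be compared with exact fractions. The sketch assumes the 1024x512 base template discussed earlier as the starting size; the `scale` helper is a hypothetical name.

```python
from fractions import Fraction


def scale(size, factor):
    """Scale a (width, height) pair by an exact rational factor."""
    width, height = size
    return (int(width * factor), int(height * factor))


base = (1024, 512)
step1 = scale(base, Fraction(3, 2))   # first enhancement: 1.5x the base
step2 = scale(step1, Fraction(4, 3))  # second enhancement: 2x the base
direct = scale(base, 2)               # single factor-of-two step

print(step1, step2, direct)
```

Because 3/2 x 4/3 = 2, the two-step route ends at the same size as the single step while also exposing an intermediate 1.5x layer.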

Resolution enhancement can be achieved by generating a resolution enhancement layer as an independent MPEG-2 stream and applying MPEG-2 compression to that enhancement layer. This technique differs from the "spatial scalability" defined within MPEG-2, which has proven to be highly inefficient. Nevertheless, MPEG-2 contains all of the tools needed to construct an effective layered resolution that provides spatial scalability. The preferred layered resolution encoding process of the present invention is shown in FIG. 8. The preferred decoding process of the present invention is shown in FIG. 9.

Resolution Layer Coding
In FIG. 8, an original 2kx1k image 80 is filtered in conventional fashion to 1/2 resolution in each dimension to create a 1024x512 base layer 81. The base layer 81 is then compressed according to conventional MPEG-2 algorithms, generating an MPEG-2 base layer 82 suitable for transmission. Importantly, full MPEG-2 motion compensation can be used during this compression step. That same signal is then decompressed using conventional MPEG-2 algorithms back to a 1024x512 image 83. The 1024x512 image 83 is expanded (for example, by pixel replication, or preferably by a better filter such as spline interpolation) to a first 2kx1k enlarged image 84.

Meanwhile, as an optional step, the filtered 1024x512 base layer 81 is expanded to a second 2kx1k enlarged image 85. This second 2kx1k enlarged image 85 is subtracted from the original 2kx1k image 80 to generate an image that represents the top octave of resolution between the original high resolution image 80 and the original base layer image 81. The resulting image is optionally multiplied by a sharpness factor or weight, and then added to the difference between the original 2kx1k image 80 and the second 2kx1k enlarged image 85 to generate a center-weighted 2kx1k enhancement layer source image 86. This enhancement layer source image 86 is then compressed according to conventional MPEG-2 algorithms, generating a separate MPEG-2 resolution enhancement layer 87 suitable for transmission. Importantly, full MPEG-2 motion compensation can be used during this compression step.

Resolution Layer Decoding
In FIG. 9, the base layer 82 is decompressed using conventional MPEG-2 algorithms back to a 1024x512 image 90. The 1024x512 image 90 is expanded to a first 2kx1k image 91. Meanwhile, the resolution enhancement layer 87 is decompressed using conventional MPEG-2 algorithms back to a second 2kx1k image 92. The first 2kx1k image 91 and the second 2kx1k image 92 are then added to generate a high resolution 2kx1k image 93.
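The encode and decode paths of FIGS. 8 and 9 can be sketched on a tiny image. Several simplifying assumptions apply and none of them come from the disclosure: the MPEG-2 compress and decompress steps are replaced by a lossless placeholder (`codec`), the reduction filter is a 2x2 box average, the expansion is pixel replication, center weighting is omitted, and the main difference is taken against the decoded-and-expanded base (image 84), which is one reading of the figures. All function names are hypothetical.

```python
def downsample(img):
    """Halve each dimension with a 2x2 box filter (stand-in reduction filter)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample(img):
    """Double each dimension by pixel replication."""
    out = []
    for row in img:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def combine(a, b, weight=1.0):
    """Return a + weight * b, element by element."""
    return [[p + weight * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def codec(img):
    """Placeholder for MPEG-2 compression followed by decompression (lossless here)."""
    return [list(row) for row in img]


def encode(original, sharpness=0.0):
    base_source = downsample(original)                    # image 81
    first_enlarged = upsample(codec(base_source))         # images 82, 83, 84
    second_enlarged = upsample(base_source)               # image 85
    top_octave = combine(original, second_enlarged, -1.0)
    residual = combine(original, first_enlarged, -1.0)
    enhancement_source = combine(residual, top_octave, sharpness)  # image 86
    return base_source, enhancement_source                # stand-ins for layers 82 and 87


def decode(base_layer, enhancement_layer=None):
    expanded = upsample(codec(base_layer))                # images 90, 91
    if enhancement_layer is None:
        return expanded                                   # base-only receiver
    return combine(expanded, codec(enhancement_layer))    # images 92, 93


original = [[10, 12, 20, 22],
            [14, 16, 24, 26],
            [30, 32, 40, 42],
            [34, 36, 44, 46]]
base, enhancement = encode(original)
print(decode(base))               # reduced-detail picture
print(decode(base, enhancement))  # full-detail picture
```

With the lossless placeholder and a sharpness weight of zero, adding the enhancement layer restores the original exactly; with a real MPEG-2 codec both layers would carry quantization error, which is the loss the optional sharpness weight is meant to offset.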

ＭＰＥＧ−２からの改良点本質的に、拡張レイヤは復号化された基本レイヤを拡張し、そのオリジナル画像とその復号化された基本レイヤとの間の差分を取り、圧縮することによって形成される。しかしながら、オプションとして、圧縮された解像度拡張レイヤは、任意ではあるが復号化後に基本レイヤに加算されて、デコーダ内により高解像度の画像が作成されてもよい。本発明によるレイヤ化された解像度符号化処理は、ＭＰＥＧ−２の空間スケーラビリティと幾つかの点で異なる。すなわち：
・拡張レイヤの差分ピクチャは、Ｉ、Ｂ及びＰフレームと共に、それ自身のＭＰＥＧ−２データストリームとして圧縮される。この違いは、ＭＰＥＧ−２の空間スケーラビリティが効果的でない場合でも、本明細書で提案されている解像度スケーラビリティが効果的であるという主たる理由を表している。ＭＰＥＧ−２に定義されている空間スケーラビリティは、上位レイヤのピクチャと拡張された基本レイヤとの間の差分として、又は実際のピクチャの動き補償されたＭＰＥＧ−２データストリームとして、又は両者が結合したものとして、上位レイヤをコード化することを可能にしている。しかしながら、これらの符号化はいずれも効率的ではない。基本レイヤからの差分を、差分のＩフレームとして考えることも可能だが、それは本発明のような動き補償された差分ピクチャと比較して非効率的である。ＭＰＥＧ−２に定義されている上位レイヤの符号化も、上位レイヤを完全に符号化することに等しいため、非効率的である。そのため、本発明のように、差分ピクチャの動き補償符号化の方が大幅に効率的である。
・拡張レイヤは独立したＭＰＥＧ−２データストリームであるので、基本レイヤおよび拡張レイヤを多重化するためにＭＰＥＧ−２システムのトランスポート層（または他の同様のメカニズム）を用いなければならない。
・拡張および解像度減少フィルタ処理はガウスまたはスプライン関数でよく、ＭＰＥＧ−２の空間スケーラビリティに規定されているバイリニア補間よりも好適である。
・好ましい実施の形態では、画像のアスペクト比が下位および上位レイヤ間で一致していなければならない。ＭＰＥＧ−２の空間スケーラビリティでは、幅および／または高さに対する伸長が許容されている。かかる伸長は、効率の要求に従い、好ましい実施の形態では許容されない。
・効率の要求により、および、拡張レイヤで用いられる極めて大きな圧縮量により、拡張レイヤの全エリアはコード化されない。通常、拡張から除外されるエリアは境界エリアであろう。従って、好ましい実施の形態における２ｋｘ１ｋの拡張レイヤソース画像８６は中心加重されている。好ましい実施の形態では、フェーディング関数（線形加重等）を使用して拡張レイヤを画像の中心に向かって、境界縁部から離れるにつれて「ぼかす」ことにより、画像内の急激な変化を回避する。その上、目で追うことになるディテールを持つ領域を決定する手動または自動の方法を利用して、ディテールを必要とする領域を選択し、および過剰なディテールが要求されない領域を除外することができる。画像全体が基本レイヤレベルのディテールを持っていて、画像の総てが存在している。特別な関心の対象となるエリアのみが拡張レイヤの恩恵を受ける。その他の基準がない場合は、上記の中心加重された実施の形態のように、フレームの縁部または境界が拡張から除外され得る。ＭＰＥＧ−２パラメータであって負号付き整数として使用される「下位＿レイヤ＿予測＿水平＆垂直オフセット」パラメータを、「水平＆垂直＿サブサンプリング＿係数ｍ＆ｎ」の値と組み合わせて使用して、拡張レイヤの矩形の全体的なサイズおよび拡張された基本レイヤ内での配置を指定することができる。
・シャープネス係数を拡張レイヤに加算して、量子化中に発生するシャープネスの損失を相殺する。オリジナルピクチャの鮮明度およびシャープネスを復元するためにのみ、このパラメータを利用し、画像を強調するために利用しないように注意しなければならない。図８に関連して述べたように、シャープネス係数は、オリジナル高解像度画像８０とオリジナル基本レイヤ画像８１（拡張後）との間の解像度の「ハイオクターブ」である。このハイオクターブ画像は、ハイオクターブの解像度のシャープネスおよびディテールを含むことに加え、ノイズがかなり多くなる。この画像を加算しすぎると、拡張レイヤの動き補償符号化が不安定になり得る。加算すべき量はオリジナル画像中のノイズレベルによる。典型的な加重値は０．２５である。ノイズが多い画像の場合は、シャープネスを加算すべきではなく、むしろディテールを維持する従来のノイズ抑制手法を用いて、圧縮前に、拡張レイヤに対するオリジナル画像中のノイズを抑制するのが賢明かもしれない。
・時間スケーラビリティおよび解像度スケーラビリティは、基本レイヤおよび解像度拡張レイヤの両方において、３６から７２Ｈｚへの時間的拡張のためのＢフレームを利用することによって混合される。このようにして、時間スケーラビリティの２レベルで利用可能なオプションがあることから、解像度スケーラビリティの２つのレイヤで４レベルの復号化能力を得ることができる。 Improvements from MPEG-2 In essence, the enhancement layer is formed by extending the decoded base layer, taking the difference between the original image and the decoded base layer, and compressing it. . However, optionally, the compressed resolution enhancement layer may optionally be added to the base layer after decoding to create a higher resolution image in the decoder. The layered resolution encoding process according to the present invention differs from MPEG-2 spatial scalability in several respects. Ie:
- The enhancement layer difference picture is compressed as its own MPEG-2 data stream, with I, B, and P frames. This difference is the main reason that the resolution scalability proposed here is effective where MPEG-2 spatial scalability is not. Spatial scalability as defined in MPEG-2 allows the upper layer to be coded as the difference between the upper-layer picture and the expanded base layer, as a motion-compensated MPEG-2 data stream of the actual picture, or as a combination of both. However, none of these encodings is efficient. The difference from the base layer can be regarded as a difference I frame, which is inefficient compared with a motion-compensated difference picture as in the present invention. The upper-layer encoding defined in MPEG-2 is also inefficient because it amounts to encoding the upper layer in full. Motion-compensated coding of difference pictures, as in the present invention, is therefore substantially more efficient.
- Because the enhancement layer is an independent MPEG-2 data stream, the MPEG-2 systems transport layer (or another similar mechanism) must be used to multiplex the base layer and the enhancement layer.
- The expansion and resolution-reduction filtering can use a Gaussian or spline function, which is preferable to the bilinear interpolation specified in MPEG-2 spatial scalability.
- In the preferred embodiment, the image aspect ratio must match between the lower and upper layers. MPEG-2 spatial scalability allows stretching in width and/or height. For efficiency reasons, such stretching is not allowed in the preferred embodiment.
- For efficiency reasons, and because of the very large amount of compression used in the enhancement layer, not every region of the enhancement layer is coded. Usually the region excluded from enhancement is the border area. Accordingly, the 2k x 1k enhancement layer source image 86 in the preferred embodiment is center-weighted. In the preferred embodiment, a fading function (such as linear weighting) is used to "feather" the enhancement layer toward the center of the image and away from the border edge, so that abrupt transitions in the image are avoided. In addition, manual or automatic methods can be used to determine which regions contain detail that the eye will follow, so that regions needing detail are selected and regions where extra detail is unnecessary are excluded. The entire image has detail at the base layer level, so all of the image is present. Only the regions of special interest benefit from the enhancement layer. In the absence of other criteria, the frame edges or borders can be excluded from enhancement, as in the center-weighted embodiment described above. The MPEG-2 parameters "Lower_Layer_Prediction_Horizontal & Vertical Offset", used as negative integers, combined with the "Horizontal & Vertical_Subsampling_Coefficient m & n" values, can be used to specify the overall size of the enhancement layer rectangle and its placement within the expanded base layer.
- A sharpness factor is added to the enhancement layer to offset the loss of sharpness that occurs during quantization. Care must be taken to use this parameter only to restore the clarity and sharpness of the original picture, not to enhance the image. As described in connection with FIG. 8, the sharpness factor is the "high octave" of resolution between the original high-resolution image 80 and the original base layer image 81 (after expansion). In addition to containing the sharpness and detail of the high octave of resolution, this high-octave image is quite noisy. Adding too much of this image can make the motion-compensated coding of the enhancement layer unstable. The amount to add depends on the noise level in the original image. A typical weighting value is 0.25. For noisy images, no sharpness should be added; it may instead be advisable to suppress the noise in the original image for the enhancement layer before compression, using a conventional noise suppression technique that preserves detail.
- Temporal scalability and resolution scalability are combined by using B frames for temporal enhancement from 36 to 72 Hz in both the base layer and the resolution enhancement layer. Because two levels of temporal scalability are available, two layers of resolution scalability yield four possible levels of decoding performance.
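The formation of the enhancement layer described above can be illustrated with a minimal Python sketch on a one-dimensional signal. It is not the encoder of the invention: pair averaging and sample repetition stand in for the Gaussian or spline filters, coarse quantization stands in for MPEG-2 compression, and all function names are assumptions made for this illustration. Only the 0.25 weighting value is taken from the text.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs (stand-in for a Gaussian or spline filter)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def expand(signal):
    """Double the resolution by repeating samples (stand-in for a Gaussian or spline expansion)."""
    return [value for value in signal for _ in range(2)]


def lossy_codec(signal, step=4):
    """Imitate compression loss with coarse quantization (hypothetical, not MPEG-2)."""
    return [round(value / step) * step for value in signal]


def enhancement_source(original, sharpness_weight=0.25):
    """Return the decoded base layer and the enhancement layer source signal."""
    base = downsample(original)
    decoded_base = lossy_codec(base)
    # Difference between the original and the expanded, decoded base layer.
    difference = [o - e for o, e in zip(original, expand(decoded_base))]
    # "High octave": original minus the expanded, uncompressed base layer.
    high_octave = [o - e for o, e in zip(original, expand(base))]
    source = [d + sharpness_weight * h for d, h in zip(difference, high_octave)]
    return decoded_base, source


def reconstruct(decoded_base, decoded_enhancement):
    """Decoder side: expand the base layer and add the enhancement layer."""
    return [b + e for b, e in zip(expand(decoded_base), decoded_enhancement)]


original = [10, 14, 20, 40, 41, 39, 8, 6]
decoded_base, source = enhancement_source(original, sharpness_weight=0.0)
# With no sharpness term and a lossless enhancement layer, the original is recovered.
print(reconstruct(decoded_base, source) == original)
```

With a nonzero `sharpness_weight` the reconstruction deliberately overshoots the plain difference, which is why the text cautions against adding too much of the high-octave image.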

These differences represent substantial improvements over MPEG-2 spatial and temporal scalability. They nevertheless remain compatible with MPEG-2 decoder chips, although additional logic may be required in the decoder to perform the expansion and addition in the resolution enhancement decoding process shown in FIG. 9. Such additional logic is nearly identical to that required by the less effective MPEG-2 spatial scalability.

Optional Non-MPEG-2 Coding of the Resolution Enhancement Layer

A compression technique other than MPEG-2 can be used for the resolution enhancement layer. Further, it is not necessary to use the same compression technique for the resolution enhancement layer as for the base layer. For example, when the difference layer is coded, motion-compensated block wavelets can be used to match and track detail very efficiently. Even if the most efficient position for each wavelet jumps around on the screen because the amount of difference changes, this would not be noticed in the low-amplitude enhancement layer. Further, it is not necessary to cover the entire image; it is only necessary to place wavelets on detail. Wavelet placement can be guided by the regions of detail in the image. The placement can also be biased away from the edges.

Multiple Resolution Enhancement Layers

At the bit rates described here, where 2 Mpixels (2048 x 1024) at 72 frames per second are coded in 18.5 Mbit/s, only a base layer (1024 x 512 at 72 fps) and a single resolution enhancement layer have been successfully demonstrated. However, the efficiency gains expected from further refinement of resolution enhancement layer coding should allow multiple resolution enhancement layers. For example, it is conceivable that a 512 x 256 base layer could be resolution-enhanced using four layers to 1024 x 512, 1536 x 768, and 2048 x 1024. This is possible with existing MPEG-2 coding at the movie frame rate of 24 frames per second. At high frame rates such as 72 frames per second, MPEG-2 does not code each resolution enhancement layer efficiently enough, so this many layers cannot be realized at present.
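As a small worked illustration, the step ratios between the example resolutions above can be computed with exact fractions. The sketch below is written in Python for convenience; the function name and the aspect-ratio check are assumptions made for this illustration.

```python
from fractions import Fraction

# Example layer resolutions from the text, lowest first.
LAYERS = [(512, 256), (1024, 512), (1536, 768), (2048, 1024)]


def step_ratios(layers):
    """Return the expansion ratio between each pair of successive layers."""
    ratios = []
    for (w0, h0), (w1, h1) in zip(layers, layers[1:]):
        horizontal, vertical = Fraction(w1, w0), Fraction(h1, h0)
        if horizontal != vertical:
            # The sketch assumes the aspect ratio is the same in every layer.
            raise ValueError("aspect ratio differs between layers")
        ratios.append(horizontal)
    return ratios


print([str(ratio) for ratio in step_ratios(LAYERS)])  # ['2', '3/2', '4/3']
```

Each step in this example is a factor of 2 or a simple fraction, so every layer keeps the 2:1 aspect ratio of the base layer.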

Mastering Format

Using a template at or near 2048 x 1024 pixels, it is possible to create a single digital moving image master format source that serves a variety of release formats. As shown in FIG. 6, a 2k x 1k template can efficiently support the common widescreen aspect ratios of 1.85:1 and 2.35:1. A 2k x 1k template can also accommodate 1.33:1 and other aspect ratios.

Although integers (especially the factor of 2) and simple fractions (3/2 and 4/3) are the most efficient step sizes in resolution layering, any ratio can be used to achieve any required resolution layer structure. However, using a 2048 x 1024 template, or something near it, provides not only a high-quality digital master format but also many other convenient resolutions from a factor-of-two base layer (1k x 512), including NTSC, the U.S. television standard.

It is also possible to scan film at higher resolutions, such as 4k x 2k, 4k x 3k, or 4k x 4k. Using optional resolution enhancement, these higher resolutions can be created from a central master format resolution near 2k x 1k. The enhancement layers for such film will consist of image detail, grain, and other sources of noise (such as scanner noise). Because of this noisiness, using compression in the enhancement layer at these very high resolutions will require alternatives to MPEG-2 type compression. Fortunately, other compression techniques exist that can compress such noisy signals while preserving the desired detail in the image. One example of such a technique is motion-compensated wavelets or motion-compensated fractals.

Preferably, digital mastering formats made from existing movies should be created at the frame rate of the film (that is, 24 frames per second). The combined use of 3-2 pulldown and interlace would be inappropriate for digital film masters. For new digital electronic material, it is hoped that the use of 60 Hz interlace will cease in the near future and be replaced by frame rates that are more compatible with computers, such as 72 Hz, as proposed here. Digital image masters should be made at whatever frame rate the images are captured, whether 72 Hz, 60 Hz, 36 Hz, 37.5 Hz, 75 Hz, 50 Hz, or another rate.

The concept of a mastering format as a single digital source picture format for all electronic release formats differs from existing practice, in which PAL, NTSC, letterbox, pan-and-scan, HDTV, and other masters are generally all made independently from the film original. With a mastering format, both film and digital/electronic shows can be mastered once and then released at a variety of resolutions and in a variety of formats.

Combined Resolution and Temporal Enhancement Layers

As noted above, temporal and resolution enhancement layering can be combined. Temporal enhancement is provided by decoding B frames. The resolution enhancement layer also has two temporal layers and therefore also contains B frames.

For 24 fps film, the most efficient and lowest-cost decoders might use only P frames, which minimizes both memory and memory bandwidth and simplifies the decoder by eliminating B frame decoding. Thus, according to the present invention, decoding movies at 24 fps and decoding advanced television at 36 fps can use a decoder without B frame capability. B frames can then be used between the P frames, as shown in FIG. 3, to yield the higher temporal layer at 72 Hz, which can be decoded by a second decoder. This second decoder can also be simplified, because it only has to decode B frames.

Such layering also applies to the enhanced resolution layer, which can similarly use only P and I frames for the 24 and 36 fps rates. By adding B frame decoding within the resolution enhancement layer, the resolution enhancement layer can additionally reach the full temporal rate of 72 Hz at high resolution.

The combined resolution and temporal scalability options for a decoder are shown in FIG. 10. This example also shows how the proportions of a data stream of approximately 18 Mbit/s are allocated to realize the spatio-temporally layered advanced television of the present invention.

In FIG. 10, a base layer MPEG-2 data stream of 1024 x 512 pixels (containing only P frames in the preferred embodiment) is applied to a base resolution decoder 100. The P frames require a bandwidth of approximately 5 Mbit/s. The base resolution decoder 100 can decode at 24 or 36 fps. The output of the base resolution decoder 100 comprises low-resolution, low-frame-rate images (1024 x 512 pixels at 24 or 36 Hz).

The B frames from the same data stream are parsed out and applied to a base resolution temporal enhancement layer decoder 102. These B frames require a bandwidth of approximately 3 Mbit/s. The output of the base resolution decoder 100 is also coupled to the temporal enhancement layer decoder 102. The temporal enhancement layer decoder 102 can decode at 36 fps. The combined output of the temporal enhancement layer decoder 102 comprises low-resolution, high-frame-rate images (1024 x 512 pixels at 72 Hz).

Also in FIG. 10, a resolution enhancement layer MPEG-2 data stream of 2k x 1k pixels (containing only P frames in the preferred embodiment) is applied to a base temporal high-resolution enhancement layer decoder 104. These P frames require a bandwidth of approximately 6 Mbit/s. The output of the base resolution decoder 100 is also coupled to the high-resolution enhancement layer decoder 104. The high-resolution enhancement layer decoder 104 can decode at 24 or 36 fps. The output of the high-resolution enhancement layer decoder 104 comprises high-resolution, low-frame-rate images (2k x 1k pixels at 24 or 36 Hz).

The B frames from the same data stream are parsed out and applied to a high-resolution temporal enhancement layer decoder 106. These B frames require a bandwidth of approximately 4 Mbit/s. The output of the high-resolution enhancement layer decoder 104 is coupled to the high-resolution temporal enhancement layer decoder 106. The output of the temporal enhancement layer decoder 102 is also coupled to the high-resolution temporal enhancement layer decoder 106. The high-resolution temporal enhancement layer decoder 106 can decode at 36 fps. The combined output of the high-resolution temporal enhancement layer decoder 106 comprises high-resolution, high-frame-rate images (2k x 1k pixels at 72 Hz).
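The four decoding options described for FIG. 10 can be summarized with a short Python sketch that adds up the approximate bandwidth each option consumes. The stream labels and dictionary layout are assumptions made for this illustration; the rates are the approximate figures given above.

```python
# Approximate bandwidth of each sub-stream in Mbit/s, as described for FIG. 10.
STREAM_MBIT = {
    "base_P": 5,  # 1024 x 512 P frames, decoder 100
    "base_B": 3,  # 1024 x 512 B frames, decoder 102
    "enh_P": 6,   # 2k x 1k P frames, decoder 104
    "enh_B": 4,   # 2k x 1k B frames, decoder 106
}

# Sub-streams each decoding option depends on, directly or through coupled decoders.
DECODER_OPTIONS = {
    "1024 x 512 at 24 or 36 Hz": ["base_P"],
    "1024 x 512 at 72 Hz": ["base_P", "base_B"],
    "2k x 1k at 24 or 36 Hz": ["base_P", "enh_P"],
    "2k x 1k at 72 Hz": ["base_P", "base_B", "enh_P", "enh_B"],
}


def required_mbit(option):
    """Sum the bandwidth of every sub-stream that a decoding option needs."""
    return sum(STREAM_MBIT[name] for name in DECODER_OPTIONS[option])


for option in DECODER_OPTIONS:
    print(f"{option}: about {required_mbit(option)} Mbit/s")
```

The full-quality option uses all four sub-streams, which together account for the data stream of approximately 18 Mbit/s.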

The compression ratios achieved through this scalable coding mechanism are very high, indicating excellent compression efficiency. Table 5 shows these ratios for each of the temporal and resolution scalability options in the example of FIG. 10. The ratios are based on source RGB pixels at 24 bits/pixel. (Compared with conventional 4:2:2 encoding at 16 bits/pixel or conventional 4:2:0 encoding at 12 bits/pixel, the compression ratios would be 2/3 and 1/2, respectively, of the values shown.)
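As a hedged illustration of how such ratios arise, the Python sketch below divides the uncompressed source bit rate by the coded bit rate. It assumes the approximate stream rates described for FIG. 10 (5, 8, 11, and 18 Mbit/s cumulative), takes 1 Mbit as 10^6 bits, and uses 36 fps for the lower temporal rate. The function name is hypothetical, and the printed figures are simply what this arithmetic yields, not a reproduction of Table 5.

```python
def compression_ratio(width, height, fps, coded_mbit_per_s, source_bits_per_pixel=24):
    """Ratio of the uncompressed source bit rate to the coded bit rate."""
    source_bits_per_s = width * height * fps * source_bits_per_pixel
    return source_bits_per_s / (coded_mbit_per_s * 1_000_000)


# (width, height, frames per second, cumulative coded rate in Mbit/s)
OPTIONS = [
    (1024, 512, 36, 5),
    (1024, 512, 72, 8),
    (2048, 1024, 36, 11),
    (2048, 1024, 72, 18),
]

for width, height, fps, mbit in OPTIONS:
    ratio = compression_ratio(width, height, fps, mbit)
    print(f"{width} x {height} at {fps} fps, {mbit} Mbit/s: about {ratio:.0f}:1")
```

Passing a different `source_bits_per_pixel` (for example 16 or 12) rescales the ratio in proportion to the source bit depth.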

These high compression ratios are made possible by three factors:
1) The high temporal coherence of high-frame-rate 72 Hz images
2) The high spatial coherence of high-resolution 2k x 1k images
3) The application of resolution detail enhancement to the important parts of the image (for example, the central region) and not to the less important parts (for example, the frame borders)

These factors are exploited in the layered compression technique of the present invention by taking advantage of the strengths of the MPEG-2 encoding syntax. These strengths include bidirectionally interpolated B frames for temporal scalability. The MPEG-2 syntax also provides efficient motion representation through the use of motion vectors in both the base and enhancement layers. Up to a certain threshold of high noise and rapid image change, MPEG-2 also efficiently codes detail rather than noise within the enhancement layer, through motion compensation in conjunction with DCT quantization. Above this threshold, the data bandwidth is best allocated to the base layer. These MPEG-2 mechanisms work together, when used in accordance with the present invention, to yield highly efficient and effective coding that is both temporally and spatially scalable.

Compared with 5 Mbit/s coding of CCIR 601 digital video, the compression ratios in Table 5 are much higher. One reason for this is the loss of some coherence due to interlace. Interlace adversely affects both the ability to predict subsequent frames and fields and the correlation between vertically adjacent pixels. A major portion of the gain in compression efficiency described here is therefore due to the absence of interlace.

The large compression ratios achieved by the present invention can be considered in terms of the number of bits available to code each MPEG-2 macroblock. As noted above, a macroblock is a 16 x 16 pixel grouping of four 8 x 8 DCT blocks, together with one motion vector for P frames and one or two motion vectors for B frames. Table 6 shows the bits available per macroblock for each layer.

The number of bits available to code each macroblock is smaller in the enhancement layer than in the base layer. This is appropriate, since it is desirable for the base layer to have as much quality as possible. The motion vector requires about 8 bits, leaving 10 to 25 bits for the macroblock type codes and for the DC and AC coefficients of all four 8 x 8 DCT blocks. This leaves room for only a few "strategic" AC coefficients. Thus, statistically, most of the information available for each block must come from the previous frame of the enhancement layer.
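The per-macroblock budget can be estimated with a short Python sketch. It assumes the approximate stream rates described for FIG. 10, 36 coded frames per second in each sub-stream, 1 Mbit taken as 10^6 bits, and coding of the whole frame area; the function names are hypothetical. The results are only what this arithmetic yields and are not a reproduction of Table 6. Excluding border regions from the enhancement layer, as the preferred embodiment does, would raise the budget for the macroblocks that are coded.

```python
def macroblocks_per_frame(width, height, size=16):
    """Number of 16 x 16 macroblocks needed to cover a frame."""
    return (width // size) * (height // size)


def bits_per_macroblock(width, height, frames_per_s, mbit_per_s):
    """Average number of bits available for each macroblock of each frame."""
    bits_per_frame = mbit_per_s * 1_000_000 / frames_per_s
    return bits_per_frame / macroblocks_per_frame(width, height)


# (label, width, height, frames per second, Mbit/s)
SUB_STREAMS = [
    ("base layer P frames", 1024, 512, 36, 5),
    ("base layer B frames", 1024, 512, 36, 3),
    ("enhancement layer P frames", 2048, 1024, 36, 6),
    ("enhancement layer B frames", 2048, 1024, 36, 4),
]

for label, width, height, fps, mbit in SUB_STREAMS:
    budget = bits_per_macroblock(width, height, fps, mbit)
    print(f"{label}: about {budget:.1f} bits per macroblock")
```

Under these assumptions the enhancement layer receives only a few tens of bits per macroblock, which is consistent with the point that little remains once a motion vector has been coded.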

It is easy to see why MPEG-2 spatial scalability is ineffective at these compression ratios: there is not enough data space available to code enough DC and AC coefficients to represent the high octave of detail carried by the enhancement difference image. The high octave is represented primarily in the fifth through eighth horizontal and vertical AC coefficients. These coefficients cannot be reached if only 2 to 3 bits are available per DCT block.

The system described here gains its efficiency by using motion-compensated prediction from the previous enhancement difference frame. This is demonstrably effective in producing excellent results in the coding of temporal and resolution (spatial) layer structures.

Graceful Degradation

The temporal scaling and resolution scaling techniques described here work well for normal-running material at 72 frames per second using a 2k x 1k original source. These techniques also work well on film-based material running at 24 fps. At high frame rates, however, when a very noise-like image is coded, or when there are numerous shot cuts within an image stream, the enhancement layers may lose the coherence between frames that is necessary for effective coding. Such loss is easily detected, because the buffer-fullness/rate-control mechanism of a typical MPEG-2 encoder/decoder will attempt to set the quantizer to very coarse settings. When this condition is encountered, all of the bits normally used to encode the resolution enhancement layers can be allocated to the base layer, since the base layer needs as many bits as possible to code the stressful material. For example, with a base layer of between about 0.33 and 0.5 Mpixels per frame at 72 frames per second, the resulting pixel rate is 24 to 36 Mpixels per second. Applying all of the available bits to the base layer provides about 0.5 to 0.67 million additional bits per frame at 18.5 Mbit/s, which should be sufficient to code very well, even on stressful material.

Even under more extreme cases, where every frame is very noise-like and/or cuts occur every few frames, it is possible to degrade gracefully still further without loss of resolution in the base layer. This can be done by removing the B frames that code the temporal enhancement layer, which allows all of the available bandwidth (bits) to be used for the I and P frames of the base layer at 36 fps. This increases the amount of data available for each base layer frame to between about 1.0 and 1.5 Mbit/frame (depending on the resolution of the base layer). Even under extremely stressful coding conditions, this still yields the fairly good motion rendition rate of 36 fps at the fairly high-quality resolution of the base layer. However, if the base layer quantizer is still operating at a coarse level at about 18.5 Mbit/s and 36 fps, the base layer frame rate can be reduced sharply to 24, 18, or even 12 frames per second (which would make between 1.5 and 4 Mbit available for every frame), which should be able to handle even the most pathological moving image types. Methods for changing the frame rate in such circumstances are known.
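The order in which capabilities are given up can be expressed as a simple fallback ladder. The Python sketch below is only an illustration of that ordering: the stage dictionaries, the function name, and the `quantizer_too_coarse` probe are assumptions made here, and a real encoder would derive the probe from its own rate-control state.

```python
# Stages in the order described: full service, then no resolution enhancement,
# then no temporal enhancement (B frames), then progressively lower frame rates.
DEGRADATION_LADDER = [
    {"resolution_enhancement": True, "b_frames": True, "fps": 72},
    {"resolution_enhancement": False, "b_frames": True, "fps": 72},
    {"resolution_enhancement": False, "b_frames": False, "fps": 36},
    {"resolution_enhancement": False, "b_frames": False, "fps": 24},
    {"resolution_enhancement": False, "b_frames": False, "fps": 18},
    {"resolution_enhancement": False, "b_frames": False, "fps": 12},
]


def choose_stage(quantizer_too_coarse):
    """Return the first stage at which the (hypothetical) probe reports acceptable quantization.

    quantizer_too_coarse is a callable that takes a stage and returns True when
    the rate control would still be forced to a very coarse quantizer setting.
    """
    for stage in DEGRADATION_LADDER:
        if not quantizer_too_coarse(stage):
            return stage
    return DEGRADATION_LADDER[-1]  # nothing left to give up


# Example: material that only codes acceptably once the resolution enhancement is dropped.
print(choose_stage(lambda stage: stage["resolution_enhancement"]))
```

Keeping the ladder as data makes the ordering explicit and leaves the detection criterion to the encoder.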

The current proposals for advanced television in the United States do not allow for these methods of graceful degradation, and therefore cannot perform as well on stressful material as the system of the present invention.

In most MPEG-2 encoders, the adaptive quantization level is controlled by the output buffer fullness. At the high compression ratios involved in the resolution enhancement layer of the present invention, this mechanism may not function optimally. Various techniques can be used to optimize the allocation of data to the most appropriate image regions. The conceptually simplest technique is to perform a pre-pass of encoding over the resolution enhancement layer to gather statistics and to search out the detail that should be preserved. The results of the pre-pass can be used to set the quantization so as to optimize the preservation of detail in the resolution enhancement layer. The quantization settings can also be artificially biased to be non-uniform over the image, so that image detail is allocated to the main screen regions and biased away from the macroblocks at the extreme edges of the frame.

Apart from leaving a border around the enhancement layer at high frame rates, none of these adjustments is required, because existing decoders function well without such refinements. These further refinements are, however, available with a small amount of extra effort in the enhancement layer encoder.

Conclusion

The choice of 36 Hz as a new common-ground temporal rate appears to be optimal. Demonstrations of this frame rate show a significant improvement over 24 Hz on both 60 Hz and 72 Hz displays. Images at 36 Hz can be created by using every other frame from 72 Hz image capture. This allows a 36 Hz base layer (preferably using P frames) and a 36 Hz temporal enhancement layer (using B frames) to be combined to achieve a 72 Hz display.
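The relationship between the 72 Hz capture and the two 36 Hz temporal layers can be shown with a minimal Python sketch. Frames are represented here by plain integers and the function names are assumptions made for this illustration; which frames are coded as P or B frames is decided by the encoder and is not modeled.

```python
def split_temporal_layers(frames_72hz):
    """Split a 72 Hz sequence into a 36 Hz base layer and a 36 Hz temporal enhancement layer."""
    base = frames_72hz[0::2]         # every other frame, starting with the first
    enhancement = frames_72hz[1::2]  # the frames in between
    return base, enhancement


def merge_temporal_layers(base, enhancement):
    """Interleave the two 36 Hz layers back into the 72 Hz display order."""
    merged = []
    for index, frame in enumerate(base):
        merged.append(frame)
        if index < len(enhancement):
            merged.append(enhancement[index])
    return merged


captured = list(range(8))  # eight consecutive frames captured at 72 Hz
base, enhancement = split_temporal_layers(captured)
print(base)         # [0, 2, 4, 6]
print(enhancement)  # [1, 3, 5, 7]
print(merge_temporal_layers(base, enhancement) == captured)  # True
```

A decoder that handles only the base list presents 36 Hz material, while one that also handles the enhancement list restores the full 72 Hz sequence.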

The "future-looking" rate of 72 Hz is not compromised by the approach of the present invention. A transition path is provided for 60 Hz analog NTSC displays. The invention also allows a transition to other 60 Hz displays, should the other passive-entertainment-only (computer-incompatible) 60 Hz formats under consideration be accepted.

Resolution scalability can be achieved by using a separate MPEG-2 image data stream for the resolution enhancement layer. Resolution scalability can take advantage of the B frame approach to provide temporal scalability in both the base resolution and enhancement resolution layers.

The invention described here achieves many highly desirable features. Some of those involved in the U.S. advanced television process have claimed that neither resolution scalability nor temporal scalability can be achieved at high-definition resolutions within the approximately 18.5 Mbit/s available for terrestrial broadcast. The present invention, however, achieves both temporal scalability and spatial-resolution scalability within this available data rate.

It has also been claimed that 2 Mpixels at high frame rates cannot be achieved without interlace within the available 18.5 Mbit/s data rate. The present invention, however, not only achieves resolution (spatial) scalability and temporal scalability, it can also provide 2 Mpixels at 72 frames per second.

In addition to providing these capabilities, the present invention is also very robust, particularly compared with the current proposals for advanced television. This is made possible by allocating most or all of the bits to the base layer when very stressful image material is encountered. Such stressful material is by nature both noise-like and very rapidly changing. Under these circumstances, the details associated with the resolution enhancement layer cannot be seen. Because the bits are applied to the base layer, the reproduced frames are substantially more accurate than those of the currently proposed advanced television systems, which use a single constant higher resolution.

In this way, the system of the present invention optimizes both perceptual efficiency and coding efficiency while providing maximum visual impact. The system provides a very clean image at a resolution and frame rate performance that many had considered impossible. It is believed that the system of the present invention is likely to outperform the advanced television formats currently being proposed. In addition to this anticipated superior performance, the present invention also provides the highly valuable features of temporal and resolution layer structure.

Encryption and Watermarking

Overview

Layered compression allows a form of modularized decomposition of an image that supports flexible encryption and watermarking techniques. Using layered compression, the base layer and various internal components of the base layer can be used to encrypt and/or watermark a compressed, layered movie data stream. Encrypting and watermarking the compressed data stream reduces the amount of processing required compared with a high-resolution data stream, which must be processed at the rate of the original data. The computation time required for encryption and watermarking depends on the amount of data that must be processed. For a given level of computational resources, reducing the amount of data through layered compression can yield improved encryption strength, reduced encryption/decryption cost, or both.

Encryption protects the compressed image (and audio) data so that only users holding keys can easily access the information. Layered compression divides images into components: a temporal and spatial base layer, and temporal and spatial enhancement layers. The base layer is the key to decoding a viewable picture. Thus, only the temporal and spatial base layer need be encrypted, which reduces the required amount of computation. The temporal and spatial enhancement layers are of no value without the decrypted and decompressed base layer. Accordingly, by using such a layered subset of bits, the entire picture stream can be made unrecognizable by encrypting only a small fraction of the bits of the entire stream. A variety of encryption algorithms and strengths can be applied to various portions of the layered stream, including the enhancement layers. The encryption algorithm or key can also be changed at each slice boundary (a data stream structure intended for signal error recovery), to intertwine the encryption and the picture stream more closely.

Watermarking invisibly (or nearly invisibly) marks copies of a work. The concept originates in the practice of placing an identifiable symbol within paper to ensure that a document (e.g., money) is genuine. Watermarking allows the tracking of copies that may be removed from the possession of an authorized owner or licensee. Thus, watermarking can help track lost or stolen copies back to their source, so that the nature of the method of theft can be determined and those involved in the theft can be identified.

The concept of watermarking has been applied to images by attempting to place a faint image symbol or signature on top of the real image being presented. The most widely held concept of electronic watermarking is a low-amplitude visible image impressed on a high-amplitude visible image. However, this approach slightly alters the quality of the original image, much like the process of impressing a network logo in the corner of the screen on television. Such alteration is undesirable because it reduces picture quality.

In the compression domain, it is possible to alter signals and impress watermark symbols or codes upon them without having these watermark alterations applied directly in the visual domain. For example, the DCT transform operates in a frequency transform space. Any alterations in this space, especially if corrected from frame to frame, may be much less visible (or completely invisible). Watermarking preferably uses the low-order bits of certain coefficients in certain frames of a layered compression movie stream to provide reliable identification while remaining invisible or nearly invisible to the eye. Watermarking can be applied to the base layer of the compressed data stream. However, the enhancement layers can be protected to a much greater degree than the base layer, since an enhancement layer is very subtle in detail to begin with. Each enhancement layer can have its own unique identifying watermark structure.
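The low-order-bit idea above can be sketched as follows. This is a minimal illustration only: the coefficient values, the choice of positions, and the function names `embed_lsb` and `extract_lsb` are assumptions made for the example, not details taken from this description.

```python
def embed_lsb(coeffs, positions, bits):
    """Return a copy of coeffs with each bit written into the LSB at its position."""
    marked = list(coeffs)
    for pos, bit in zip(positions, bits):
        marked[pos] = (marked[pos] & ~1) | bit   # clear the LSB, then set it to the bit
    return marked


def extract_lsb(coeffs, positions):
    """Read the watermark bits back from the LSBs at the given positions."""
    return [coeffs[pos] & 1 for pos in positions]


dc = [512, 498, 505, 520, 511, 497]   # hypothetical quantized DC coefficients
positions = [0, 2, 3, 5]              # hypothetical selection of marked coefficients
mark = [1, 0, 1, 1]                   # watermark bits to carry
marked = embed_lsb(dc, positions, mark)
recovered = extract_lsb(marked, positions)
```

Each marked coefficient changes by at most one quantization level, which is why such a mark can stay below the threshold of visibility.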

In general, care must be taken to ensure that encryption and watermarking are co-mingled such that the watermark cannot easily be stripped out of the stream. For this reason, it is valuable to apply watermarks at a variety of useful locations within a layered data stream. However, since watermarking is most useful in detecting pirates and the path of piracy, one must assume that the encryption may have been completely or partially compromised. Watermarking should therefore be robustly ingrained in the data stream, in such a way that no simple procedure can remove the various watermarks. The preferred approach is to keep a securely stored master representation of a work and to provide random variations from that master to create each watermark uniquely. Such random variations cannot be removed, since there is no way to determine from the final stream what the variations were. However, to guard against additional random variations being added to a pirated stream in order to confuse the watermark (perhaps by adding visible levels of noise to the image), it is useful to have a variety of other techniques for defining watermarks, such as the second-best motion vector technique described below.

Encryption preferably operates so as to scramble, or at least visibly impair, as many frames as possible using the smallest possible units of encryption. Compression systems such as the various types of MPEG and motion-compensated wavelets use a hierarchy of units of information that must be processed in cascade in order to decode a range of frames (a "Group of Pictures," or GOP). This characteristic provides an opportunity, early in the range of concatenated decoding units, to encrypt in such a way that a large range of frames is scrambled from a small number of parameters. Further, to protect a work commercially, not every unit need be encrypted or confounded by the encryption of a higher-level unit. For example, a film may be rendered worthless to piracy if every other minute of film, or particularly important plot or action scenes, are encrypted or confounded.

In contrast, the goal of watermarking is to place on the image stream a symbol and/or serial-number-type identification marks that are detectable by analysis but invisible or nearly invisible in the image (i.e., causing no significant visual impairment). Thus, watermarking preferably is applied to portions of the decoding unit chain near the end of the hierarchy of units, so as to minimize the impact on each frame within a group of frames.

For example, FIG. 11 is a diagram showing the scope of encryption and watermarking as a function of unit dependency with respect to I, P, and B frames. Encryption of any frame confounds all subsequent dependent frames. Thus, encrypting the first I frame confounds all P and B frames derived from that I frame. In contrast, a watermark on that I frame generally would not carry over to subsequent frames, and thus it is better to watermark the more numerous B frames so that the watermark is spread more widely throughout the data stream.
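The dependency effect described for FIG. 11 can be sketched with a simplified prediction model. The model is an assumption for illustration only (a P frame refers to the previous I or P frame, a B frame to the nearest I or P frame on each side), and the helper names are hypothetical.

```python
def dependencies(gop):
    """Map each frame index to the frame indices it is predicted from (simplified model)."""
    anchors = [i for i, kind in enumerate(gop) if kind in "IP"]
    deps = {}
    for i, kind in enumerate(gop):
        before = [a for a in anchors if a < i]
        after = [a for a in anchors if a > i]
        if kind == "I":
            deps[i] = []                        # intra-coded: no references
        elif kind == "P":
            deps[i] = before[-1:]               # previous anchor only
        else:
            deps[i] = before[-1:] + after[:1]   # B frame: anchor on each side
    return deps


def disrupted_by(gop, encrypted):
    """Return every frame that cannot be decoded if frame `encrypted` stays encrypted."""
    deps = dependencies(gop)
    broken = {encrypted}
    changed = True
    while changed:                              # propagate until nothing new breaks
        changed = False
        for i, refs in deps.items():
            if i not in broken and any(r in broken for r in refs):
                broken.add(i)
                changed = True
    return sorted(broken)


gop = "IBBPBBPBB"                               # hypothetical 9-frame GOP in display order
```

Under this model, encrypting the I frame takes out the whole GOP, while a change confined to a B frame affects only that frame. That asymmetry is the reason encryption is aimed at the head of the dependency chain and watermarking at its tail.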

Units of Video Information
A compressed MPEG-type or motion-compensated wavelet bitstream is normally parsed by extracting and processing various fundamental units of compressed information in the video. This is true of the most efficient compression systems, such as MPEG-2, MPEG-4, and motion-compensated wavelets (considering wavelets to have equivalents of I, P, and B frames). Such units may consist of multi-frame units (such as a GOP), single-frame units (e.g., the I, P, and B frame types and their motion-compensated wavelet equivalents), sub-frame units (AC and DC coefficients, macroblocks, and motion vectors), and "distributed units" (described below).

When a GOP is used as the unit of encryption, each GOP can be encrypted with an independent method and/or key. In this way, each GOP has the benefits of unique treatment and modularity, and can be decoded and/or decrypted in parallel with, or out of order relative to, other GOPs in non-real-time or near-real-time applications (slightly delayed, by a few seconds), such as electronic cinema and broadcast. The final frames need only be arranged in order for final presentation.
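Per-GOP keying with out-of-order, parallel decryption might look like the sketch below. Everything here is an assumption for illustration: `toy_cipher` is a self-inverse XOR keystream built on SHAKE-256 and stands in for a real cipher (it is not a vetted encryption scheme), and deriving each GOP key from a master key is only one possible way to obtain independent keys.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream; applying it twice restores the input."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def gop_key(master: bytes, index: int) -> bytes:
    """Derive an independent key for GOP number `index` (hypothetical scheme)."""
    return hashlib.sha256(master + index.to_bytes(4, "big")).digest()


def encrypt_gops(master, gops):
    return [toy_cipher(gop_key(master, i), g) for i, g in enumerate(gops)]


def decrypt_gops_parallel(master, encrypted):
    """Decrypt GOPs concurrently; map() hands results back in presentation order."""
    def work(item):
        index, payload = item
        return toy_cipher(gop_key(master, index), payload)

    with ThreadPoolExecutor() as pool:
        return list(pool.map(work, enumerate(encrypted)))


gops = [b"gop-0 payload", b"gop-1 payload", b"gop-2 payload"]   # placeholder data
encrypted = encrypt_gops(b"master secret", gops)
restored = decrypt_gops_parallel(b"master secret", encrypted)
```

Because no GOP's key depends on another GOP's data, any single GOP can also be decrypted on its own, which is what allows out-of-order processing.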

As noted above, encryption of certain units may confound proper decoding of other units that depend on information derived from the encrypted unit. That is, some information within a frame may be required for decoding the video information of subsequent frames, so that encrypting only the earlier frame may confound decoding of later frames that are not otherwise encrypted. Thus, in selecting units to encrypt, it is useful to note how encryption of particular units confounds the usability of other related units. For example, the frames spanning a GOP are affected at various levels as set forth in Table 7.

Further, a whole frame need not be encrypted in order to confound some or all of a GOP. Sub-units of frames may be encrypted and still have a confounding effect, while reducing encryption and decryption processing time. For example, encryption of certain intra-frame units affects subsequent frames at various levels as set forth in Table 8.

In many applications (such as broadcast and digital cinema), a delay can be applied so that an aggregation of items from units of like type is encrypted before transmission. This allows for a "distributed unit," in which the bits comprising an encryption/decryption unit are physically allocated throughout the data stream within conventional units of the types described above, making decryption without knowledge of the key far more difficult. For decryption, a sufficient number of conventional units are aggregated (e.g., in a buffer) and decrypted as a group. For example, DC coefficients can be collected into groups for an entire frame or GOP. Similarly, motion vectors are coded differentially and predicted from one motion vector to the next, from one macroblock to the next, throughout the frame, and thus can be encrypted and decrypted in aggregations. Variable-length coding tables can also be aggregated into groups, forming modular units between "start codes." Further examples of units or sub-units that can be aggregated and encrypted, with the encrypted bits then separated or spread within the data stream, include motion vectors, DC coefficients, AC coefficients, and quantizer scale factors.
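A distributed unit can be sketched as a gather, encrypt, scatter cycle over DC coefficients. The block layout (a dictionary with `dc` and `ac` entries), the 16-bit packing, and the XOR keystream are all assumptions chosen to keep the example self-contained; the keystream is a stand-in for a real cipher.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy self-inverse cipher: XOR with a SHAKE-256 keystream (illustration only)."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def transform_dc(blocks, key):
    """Gather every DC value, process the aggregate as one unit, scatter it back."""
    gathered = b"".join(b["dc"].to_bytes(2, "big") for b in blocks)
    mixed = keystream_xor(key, gathered)
    return [
        {**b, "dc": int.from_bytes(mixed[2 * i:2 * i + 2], "big")}
        for i, b in enumerate(blocks)
    ]


blocks = [                                  # hypothetical macroblocks
    {"dc": 512, "ac": [3, -1]},
    {"dc": 498, "ac": [0, 2]},
    {"dc": 505, "ac": [1, 1]},
]
protected = transform_dc(blocks, b"unit key")
restored = transform_dc(protected, b"unit key")   # the same call decrypts
```

The protected bits stay inside their original macroblocks, so the encrypted unit is physically spread across the stream while the AC data is left untouched.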

Application of Encryption
In the preferred embodiment, one or more of the units described above (or other data stream units with similar properties) may be selected for encryption, and each unit can be encrypted independently rather than as a combined stream (as with MPEG-1, MPEG-2, and MPEG-4). Encryption of each unit may use different keys of different strengths (e.g., number of bits per key) and may use different encryption algorithms.

Encryption can be applied uniquely to each distinct copy of a work (when physical media such as DVD-RAM are used), so that each copy has its own key or keys. Alternatively, an encryption algorithm can be applied to the assembled stream with critical portions of the stream removed from the data stream or altered before encryption (e.g., by setting all motion vectors for the left-hand macroblocks to zero), thus defining a bulk distribution copy. The removed or altered portions can then be encrypted separately and uniquely for each display site, thereby defining a custom distribution copy that is sent separately to individual sites in a convenient manner (e.g., satellite transmission, modem, Internet, etc.). This technique is useful, for example, where the bulk of a work is distributed on a medium such as DVD-ROM, while unique copies of the smaller critical compression units are sent separately, each with its own unique keys, to independent recipients (e.g., by satellite, Internet, modem, express delivery, etc.). Only when the custom portion is decrypted and recombined with the decrypted bulk distribution copy can the entire work be decoded as a video signal. The larger the bandwidth (size capacity) of such custom information, the larger the portion of the image that can be custom encrypted. This technique can also be used with watermarking.

A variant of this approach is to encrypt a subset of the units of the data stream as a custom distribution copy, and not to encrypt the remaining units at all. The remaining units may be distributed in bulk form, separately from the custom distribution copy. Only when the custom portion is decrypted and recombined with the unencrypted bulk distribution copy can the entire work be decoded as a video signal.

One or more overall encryptions may be concatenated or combined with special customized encryptions for various crucial units of video decoding information. For example, the entire video data stream may be "lightly" encrypted (e.g., using a short key or a simple algorithm), while certain key units of the data stream are more "heavily" encrypted (e.g., using a longer key or a more complex algorithm). For example, in one embodiment, the highest resolution and/or temporal layers may be more heavily encrypted to define a premium signal that provides the best-looking image when properly decrypted. Lower layers of the image would be unaffected by such encryption. This approach would allow different grades of signal service for end users.

If units are encrypted independently of each other, then one or more parallel-processing decryption methods may be used on separate units within the compressed image stream, so that decryption is performed in parallel.

Application of Watermarking
With respect to the units discussed above, and other units having similar properties, various points within a compressed video data stream are suitable for applying watermarks in various ways, including:
• In transform space or real space, or a combination of the two.
• In the least significant bits (LSBs) of the DC coefficients. For example, a DC coefficient can have extra bits (10 and 11 bits are allowed in MPEG-2, and up to 14 bits in MPEG-4). The low-order bits can encode a particular watermark identifier without visibly degrading the image. Further, these low-order bits might be present only in I frames, since a clear watermark need not be present in every frame.
• In noise patterns within the LSBs of the AC coefficients.
• In low-amplitude, picture-wide low frequencies, coded from frame to frame, forming an imaging pattern that cannot be detected visually. For example, this could be a small number of low-signal-amplitude letters or digits on each frame, where each character is very large and soft. For example, where a pixel should have a value of 84, the watermarking process could instead set the value to 83, so that the watermark has a value of 1 at this position. The difference is essentially invisible to the eye, but forms a code within the compressed data stream. Such an imaging pattern would be detected by subtracting the decoded image from the unperturbed (unwatermarked) decompressed original (and from the uncompressed original source work), and then greatly increasing the amplitude of the difference. A series of very large, blurry letters or digits would then appear.
• In frames that do not propagate (such as I frames, the last P frame before an I frame, and B frames), using marks of very low visibility. These frames are also displayed only briefly.
• At slice boundaries (usually at the left-edge start of a macroblock line).
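The subtract-and-amplify detection described for the low-amplitude pattern can be sketched on a tiny frame. The frame size, the position of the mark, and the gain value are assumptions for illustration; the 84-to-83 change follows the example in the text.

```python
def difference_image(reference, suspect, gain=64):
    """Subtract the suspect frame from the reference and amplify what remains."""
    return [
        [min(255, abs(r - s) * gain) for r, s in zip(ref_row, sus_row)]
        for ref_row, sus_row in zip(reference, suspect)
    ]


reference = [[84] * 6 for _ in range(4)]          # flat hypothetical frame
suspect = [row[:] for row in reference]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2)]:     # a small soft mark
    suspect[y][x] -= 1                            # 84 becomes 83
revealed = difference_image(reference, suspect)
```

A one-level change is invisible in the frame itself, but after subtraction and amplification the marked region stands out against a background of zeros.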

Watermarks at these locations generally take the form of a pattern of minor variations imposed on the pixel data. These variations may form images or symbols that are invisible or nearly invisible to the eye, because the bit variations in pixel brightness and color have very low amplitude and/or because the display time is brief. For example, FIGS. 12A and 12B are diagrams of image frames 1200 with different types of watermarks. FIG. 12A shows a frame 1200 with a single symbol ("X") 1202 in one corner. FIG. 12B shows a frame 1200 with a set of marks (dots, in this example) 1204 scattered across the frame 1200. Such watermarks are detectable only by data comparison, which yields the watermark signal. For example, a precise decoder can detect the LSB variations between an original work and a watermarked work, variations that are invisible to the eye but that uniquely watermark a customized copy of the original work.

Other forms of watermarking may be used that do not add specific images or symbols, but that do form unique patterns in the data stream. For example, certain coding decisions are nearly invisible and may be used to watermark the data stream. For example, minor variations in rate control are invisible to the eye, but can be used to mark each copy such that each copy has a slightly different number of AC coefficients in some locations. Other examples of such decisions include:
• Rate control variations within an I frame.
• Rate control variations within P and B frames.
• Specific AC coefficient allocations, affecting LSBs.

Similarly, a second-best motion vector that is nearly as good as the optimum motion vector may be selected in order to create a watermark code. A second-best choice may also be made when and where exactly the same SAD (sum of absolute differences, a common motion vector match criterion) occurs. If needed, other non-optimum (e.g., third and higher ranked) motion vector matches can also be used, with very little visual impairment. Such second-choice (and higher) motion vectors need only be used occasionally (e.g., a few per frame), in a coherent pattern, to form a watermark code.
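The second-best motion vector technique can be sketched as a choice between two near-equal candidates per macroblock. The candidate lists, the SAD tolerance, and the rule "bit 0 selects the best vector, bit 1 the second best" are assumptions for illustration; extraction assumes access to the same candidate lists, which would come from the secured master.

```python
def rank_candidates(candidates):
    """Sort (vector, sad) pairs by SAD, best match first."""
    return sorted(candidates, key=lambda c: c[1])


def is_usable(ranked, tolerance):
    """A block can carry a bit only if its two best matches are nearly equal."""
    return len(ranked) > 1 and ranked[1][1] - ranked[0][1] <= tolerance


def embed_bits(blocks, bits, tolerance=2):
    chosen, remaining = [], list(bits)
    for candidates in blocks:
        ranked = rank_candidates(candidates)
        if is_usable(ranked, tolerance) and remaining:
            chosen.append(ranked[remaining.pop(0)][0])   # 0 = best, 1 = second best
        else:
            chosen.append(ranked[0][0])                  # unmarked block: best vector
    return chosen


def extract_bits(blocks, chosen, count, tolerance=2):
    bits = []
    for candidates, vector in zip(blocks, chosen):
        ranked = rank_candidates(candidates)
        if is_usable(ranked, tolerance):
            bits.append(0 if vector == ranked[0][0] else 1)
    return bits[:count]


blocks = [                                               # hypothetical search results
    [((1, 0), 100), ((1, 1), 101), ((0, 0), 140)],       # near tie: usable
    [((2, 0), 90), ((3, 0), 130)],                       # clear winner: not usable
    [((0, 1), 75), ((0, 2), 75)],                        # exact SAD tie: usable
]
vectors = embed_bits(blocks, [1, 0])
```

Only blocks whose alternatives cost almost nothing in match quality carry a bit, which keeps the visual impact small.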

Image variations are harder to see near the periphery of the frame (i.e., near the top, bottom, right edge, and left edge). Thus, if a selected image-type or symbol-type watermark might be even slightly visible, it is better to apply it to the edge regions of the image. Watermarking methods of very low visibility (such as second-best motion vectors or rate control variations) can be used anywhere on the image.

Watermarking can also be coded as a unique serial-number-type code for each watermarked copy. Thus, 1,000 copies of an original work would each be watermarked slightly differently, using one or more of the techniques described above. By tracking where each watermarked copy is shipped, it is possible to determine which copy was the source of unauthorized duplication when a watermark is found in an unauthorized copy.
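Serial-number marking and tracing might be sketched as follows. The sample values, the 10-bit width (enough for 1,000 copies), the shipping records, and the rule of lowering a sample by one level per set bit are assumptions for illustration.

```python
def mark_copy(master, serial, width=10):
    """Lower sample i by one level wherever bit i of the serial number is set."""
    bits = [(serial >> i) & 1 for i in range(width)]
    return [v - b for v, b in zip(master, bits)] + master[width:]


def read_serial(master, copy, width=10):
    """Recover the serial number by comparing a copy against the master."""
    return sum((master[i] - copy[i]) << i for i in range(width))


master = [84] * 16                                       # hypothetical master samples
shipping_log = {417: "Theater A", 902: "Theater B"}      # hypothetical shipping records
leaked = mark_copy(master, 902)
source = shipping_log[read_serial(master, leaked)]
```

Reading the serial number requires the master, which is why the comparison reference has to be kept secure.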

Watermark Detection
Most of these watermarking methods require the decompressed original image as a reference against which each watermarked copy is compared in order to reveal (decode) the watermark. The difference between the two images discloses the watermark. It is therefore necessary to keep the master decompressed source in a secure place. Security is required because possession of a copy of the master decompressed source provides sufficient information to defeat many of the watermarking methods. However, theft of the watermark comparison master is itself detectable, since the comparison master is automatically "watermarked" as a perfect match to itself. If the comparison master has been used to confound a copy (i.e., to find and remove the watermarks), that implies possession of the master.

Using low-amplitude, large, blurry symbols or images as a watermark has the advantage that such symbols or images are detectable not only by comparison against the decompressed master source, but also by comparison against the uncompressed original work. Thus, by storing the uncompressed original work in an independent secure environment, low-amplitude watermarks can be used within the original (otherwise unaltered) compressed master source. In this way, a reference for watermark comparison would remain even if either the original work or the compressed/decompressed master source were stolen. However, possession of both would allow both watermarks to be defeated.

Watermark Vulnerability
In using watermarking, it is important to understand the methods that could be used to defeat or confound the detection of such marks. Some watermarking methods are confounded by adding small amounts of noise to the image. This degrades image quality somewhat, although the degradation may be visually small, yet it may be sufficient to confound reading of the watermark. Watermarking techniques that are vulnerable to confounding by added noise include those that use LSBs in DC or AC coefficients.

Other watermarking methods are much harder to confound using noise. Watermarking techniques that resist confounding by noise, yet can still be readily detected, include low-amplitude, picture-wide low-frequency image variations (such as a very blurry, large, low-amplitude word superimposed on the image), second-best motion vectors, and minor rate control variations.
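The difference in noise resistance can be illustrated with a deliberately simple model. The sample values and region sizes are assumptions, and the noise is a contrived alternating +1/-1 pattern (zero mean) rather than real random noise, so that the outcome is reproducible.

```python
def lsb_bits(values, count):
    """Read an LSB watermark from the first `count` samples."""
    return [v & 1 for v in values[:count]]


def region_mean_offset(reference, suspect, region):
    """Average (reference - suspect) over a large region of samples."""
    return sum(reference[i] - suspect[i] for i in region) / len(region)


def add_noise(frame, noise):
    return [v + n for v, n in zip(frame, noise)]


reference = [100] * 32                                   # hypothetical samples
mark = [1, 0, 1, 1]
lsb_marked = [(v & ~1) | b for v, b in zip(reference, mark)] + reference[4:]
lowfreq_marked = reference[:16] + [v - 1 for v in reference[16:]]   # broad soft mark

noise = [1 if i % 2 == 0 else -1 for i in range(32)]     # contrived zero-mean noise
region = range(16, 32)

lsb_after_attack = lsb_bits(add_noise(lsb_marked, noise), 4)
offset_after_attack = region_mean_offset(reference, add_noise(lowfreq_marked, noise), region)
```

A single level of noise flips every LSB, so the LSB mark no longer reads correctly, while the broad low-amplitude offset survives because zero-mean noise averages out over a large region.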

Thus, it is valuable to use multiple watermarking methods in order to defeat simple methods that attempt to confound detection of the watermark. Further, use of encryption ensures that watermarks cannot be altered unless the encryption is compromised. Accordingly, watermarking preferably is used in conjunction with encryption of a strength suitable for the application.

Toolkit Approach
The various concepts of encryption and watermarking comprising this aspect of the invention are preferably embodied as a set of tools that can be applied to the task of protecting valuable audio/video media. The tools can be combined by a content developer or distributor in various ways, as desired, to create a protection system for a layered compressed data stream.

For example, FIG. 13 is a flowchart showing one method of applying the encryption techniques of the invention. A unit to be encrypted is selected (step 1300). This may be any of the units described above (e.g., a distributed unit, a multi-frame unit, a single-frame unit, or a sub-frame unit), or other units with similar properties. An encryption algorithm is selected (step 1302). As noted above, this may be a single algorithm applied throughout an encryption session, or may be a selection made per unit. Suitable algorithms are well known, and include both secret-key and public-key algorithms, such as DES, Triple DES, RSA, Blowfish, and others. Next, one or more keys are generated (step 1304). This involves selection of both key length and key value. Again, as noted above, this may be a single selection applied throughout an encryption session, or may truly be a selection per unit. Lastly, the unit is encrypted using the selected algorithm and key (step 1306). The process then repeats for the next unit. Of course, some of the steps, particularly steps 1300, 1302, and 1304, may be performed in a different order.
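The flow of FIG. 13 can be sketched as a loop over stream units. The unit representation, the `POLICY` table, and the `toy-xor` algorithm are assumptions for illustration; `toy-xor` is a SHAKE-256 keystream XOR standing in for a real cipher such as those named above, and is not itself a vetted encryption scheme.

```python
import hashlib
import secrets


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy self-inverse cipher used as a stand-in for a real algorithm."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


ALGORITHMS = {"toy-xor": xor_stream}
POLICY = {"I": ("toy-xor", 32), "P": ("toy-xor", 16)}    # unit kind -> (algorithm, key bytes)


def encrypt_stream(units):
    """units: list of (kind, payload). Returns (protected units, per-unit key table)."""
    protected, keys = [], {}
    for index, (kind, payload) in enumerate(units):
        if kind not in POLICY:                           # step 1300: unit not selected
            protected.append((kind, payload))
            continue
        name, key_len = POLICY[kind]                     # step 1302: select algorithm
        key = secrets.token_bytes(key_len)               # step 1304: key length and value
        keys[index] = (name, key)
        protected.append((kind, ALGORITHMS[name](key, payload)))   # step 1306: encrypt
    return protected, keys


def decrypt_stream(units, keys):
    return [
        (kind, ALGORITHMS[keys[i][0]](keys[i][1], data) if i in keys else data)
        for i, (kind, data) in enumerate(units)
    ]


units = [("I", b"intra data"), ("B", b"bidirectional data"), ("P", b"predicted data")]
protected, keys = encrypt_stream(units)
```

Keeping the algorithm and key length in a per-kind table is one way to express the idea that different units may receive different algorithms and key strengths.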

For decompression, the relevant key or keys would be applied to decrypt the data stream. Thereafter, the data stream would be decompressed and decoded, as described above, to generate a displayable image.

FIG. 14 is a flowchart showing one method of applying the watermarking techniques of the present invention. A unit to be watermarked is selected (step 1400). Again, this may be any of the units described above (e.g., a distributed unit, a multi-frame unit, a single-frame unit, or a sub-frame unit), or another unit with similar characteristics. One or more watermarking techniques are then selected (step 1402), such as a noise-tolerant method and a noise-intolerant method. This may be a single choice applied throughout the watermarking session, or a truly per-unit choice (or a per-class choice, where two or more watermarking techniques are applied to different types of units). Finally, the selected unit is watermarked using the selected technique (step 1404), and the process repeats for the next unit. Of course, some of the steps, in particular steps 1400 and 1402, may be performed in a different order.
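As an illustration of the FIG. 14 loop, the sketch below uses one of the techniques this document names elsewhere, low-order bit variation in coefficients, to carry a serial number. It assumes each unit is simply a list of integer coefficients; the function names `embed_lsb`, `extract_lsb`, and `watermark_units` are hypothetical, and a real encoder would operate on quantized DC/AC coefficients inside the compressed bitstream.

```python
def embed_lsb(coeffs, bits):
    """Return a copy of coeffs whose first len(bits) low-order bits equal bits."""
    if len(bits) > len(coeffs):
        raise ValueError("not enough coefficients to carry the watermark")
    marked = list(coeffs)
    for i, bit in enumerate(bits):
        # Clear the low-order bit, then set it to the watermark bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(coeffs, nbits):
    """Read back the low-order bits of the first nbits coefficients."""
    return [c & 1 for c in coeffs[:nbits]]


def watermark_units(units, serial, nbits=16):
    """Watermark every selected unit with the same serial number."""
    # Step 1402: the technique (LSB variation) is chosen once for the session.
    bits = [(serial >> i) & 1 for i in range(nbits)]  # least significant bit first
    # Steps 1400 and 1404: visit each selected unit and watermark it.
    return [embed_lsb(unit, bits) for unit in units]
```

Each coefficient changes by at most one, which is why this kind of mark is hard to see but also easy to destroy with added noise.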

Key Management
The encryption/decryption keys can be tied to various items of information in order to construct more secure or synchronized keys. For example, public or private encryption and decryption keys may be generated to include, or be derived from, any of the following components:
・Previous keys.
・The serial number of the destination device (e.g., a theater projector with a secure serial number).
・A date or time range (using a secure clock), so that the key works only during certain times (e.g., only on certain days of the week, or only for a relative period such as one week). For example, an encryption system may plan for the use of a secure GPS (global positioning satellite) in the decoder as a time source. The decryption processor then needs only access to that secure time source to decrypt the image file or stream.
・The location of the decryption processor. GPS capability allows fairly accurate real-time location information to be incorporated into a key. The static Internet Protocol (IP) address of a known destination can also be used.
・Accounting records of the number of previous showings of a work, as reported (manually or automatically) by each theater.
・The "PIN" (personal identification number) of a specific authorizing person (e.g., a theater manager).
・A physical, custom-encrypted movie (e.g., a movie on DVD, where each movie is uniquely keyed to a specific theater). Possession of the encrypted movie itself by the key holder at the intended site can then serve as a form of key authorization for subsequent movies. For example, playing a portion of the movie and transmitting that portion to a remote key generation site can be part of the key authorization protocol. In addition, when the distribution copy is stored on erasable media such as a hard disk or DVD-RAM, the use of encrypted movie data as a key element can be tied to a secure media erasure key. In this way, the previous movie is erased as part of the key process for obtaining a new movie.
・Keys can also be valid for a specific number of showings or other natural units of use, after which a new key is required.
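A minimal sketch of deriving a key from several of the components listed above follows. It is an assumption-laden illustration: the document does not specify a derivation function, so HMAC-SHA-256 is swapped in here, and `master_secret`, `derive_key`, and `key_is_active` are hypothetical names. The date check stands in for the secure clock, which ordinary software cannot provide by itself.

```python
import hashlib
import hmac
from datetime import date


def _field(value: bytes) -> bytes:
    # Length-prefix each field so that different field boundaries cannot collide.
    return len(value).to_bytes(4, "big") + value


def derive_key(master_secret: bytes, previous_key: bytes, device_serial: str,
               valid_from: date, valid_to: date, pin: str) -> bytes:
    """Bind a key to a previous key, a device serial number, a date range, and a PIN."""
    material = b"".join(_field(part) for part in (
        previous_key,
        device_serial.encode(),
        valid_from.isoformat().encode(),
        valid_to.isoformat().encode(),
        pin.encode(),
    ))
    # master_secret is a hypothetical secret held by the key generation site.
    return hmac.new(master_secret, material, hashlib.sha256).digest()


def key_is_active(today: date, valid_from: date, valid_to: date) -> bool:
    """Return True only while today falls inside the authorized date range."""
    return valid_from <= today <= valid_to
```

Changing any one component, such as the projector serial number, yields an unrelated key, so a key issued for one device or one week is of no use for another.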

Various methods of managing the distribution of decryption keys may be applied. Different key management strategies can be applied to each type of use and each type of data distribution (whether network data transfer, satellite, or physical disk or tape media). Examples of key distribution and management procedures follow:
・Keys can be stored on media (e.g., floppy disk (Floppy is a registered trademark), CD-ROM) and physically shipped to the destination by overnight delivery, or transmitted electronically or in document form (e.g., by facsimile, e-mail, direct-connect data transmission, Internet transmission, etc.).
・Public key methods can be used together with local unique keys, in addition to key verification by an authorized third party.
・Keys may themselves be encrypted and transmitted electronically (e.g., by direct-connect data transmission, Internet transmission, e-mail, etc.), with key decryption and application rules predefined for each destination (e.g., each theater).
・Possession of the current key may be required as a condition of obtaining or using a new key. The current key value may be transmitted to the key management site by any suitable means described above, and the new key can be returned by one of those means.
・Use of a decryption key may require a "key handshake" with a key management site that verifies or authorizes the application of the key for every instance of decryption. For example, a decryption key may need to be combined with an additional symbol maintained by the key management site, where that symbol changes with each use. Key handshakes can be used for each showing, for each length of time of use, or for other natural units of use. Since such units of use may also be natural units of billing, key management can be integrated with a billing system that logs the number of uses or the time of use and bills the key holder accordingly (e.g., a per-showing rental fee charged to a theater). For example, both key management and usage logging can be tied to a key authorization server system that simultaneously handles billing for each authorized showing or period of use.
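The key handshake and its link to billing can be sketched as below. All names (`KeyManagementSite`, `handshake`, `amount_due`, `session_key`) are hypothetical, the per-use symbol is modeled as a random 16-byte value, and combining it with the decryption key through HMAC-SHA-256 is an assumption made for illustration; the document does not prescribe how the two are combined.

```python
import hashlib
import hmac
import secrets


class KeyManagementSite:
    """Toy key management site: issues a fresh symbol per use and logs the use."""

    def __init__(self, fee_per_showing: int):
        self.fee_per_showing = fee_per_showing
        self.usage_log = []  # one entry per authorized showing

    def handshake(self, holder: str) -> bytes:
        """Authorize one use: log it for billing and return a new symbol."""
        symbol = secrets.token_bytes(16)  # changes with each use
        self.usage_log.append(holder)
        return symbol

    def amount_due(self, holder: str) -> int:
        """Bill the key holder per logged showing."""
        return self.fee_per_showing * self.usage_log.count(holder)


def session_key(decryption_key: bytes, symbol: bytes) -> bytes:
    """Combine the holder's decryption key with the site's per-use symbol."""
    return hmac.new(decryption_key, symbol, hashlib.sha256).digest()
```

Because a usable key exists only after a handshake, every showing leaves a log entry, and the same log drives the per-showing charge.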

Some keys may be pre-authorized keys, as opposed to keys that are authorized on-site. Pre-authorized keys would generally be issued one at a time by the key management site. With on-site key authorization, the key management site may issue a set of keys to a theater, allowing the local manager to authorize additional decryptions (i.e., showings) of a movie that proves more popular than originally projected, in order to meet audience demand. When such keys are used, the system is preferably designed to signal the key management site about the additional showings (e.g., by e-mail or data record sent over the Internet, or by modem) for billing purposes.
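On-site authorization from a pre-issued key set might look like the following sketch. The class `OnSiteKeySet` and the `report` callback are hypothetical; the callback stands in for whatever channel (e-mail, Internet data record, modem) signals the key management site for billing.

```python
import secrets


class OnSiteKeySet:
    """A set of keys issued in advance for the local manager to release."""

    def __init__(self, count: int, report):
        self._unused = [secrets.token_hex(16) for _ in range(count)]
        self._report = report  # hypothetical callback to the key management site

    @property
    def remaining(self) -> int:
        return len(self._unused)

    def authorize_extra_showing(self, title: str) -> str:
        """Release one key for an additional showing and report it for billing."""
        if not self._unused:
            raise RuntimeError("no on-site keys left; request more from the key site")
        key = self._unused.pop()
        # Report only a short key identifier, not the key itself.
        self._report({"title": title, "key_id": key[:8]})
        return key
```

In use, the theater constructs the set once from what the key management site issued, and each call to `authorize_extra_showing` both unlocks a showing and produces a billing record.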

Conclusion
Various aspects of the invention believed to be novel include, without limitation, the following concepts:
・Encryption applied to layered compression
・Watermarking applied to layered compression
・Unique encryption applied to each layer of a layered system, requiring a different key, authorization, or algorithm to unlock each independent layer
・Unique watermarking applied to each layer, identifying the particular layer (using a method such as a serial number)
・Use of sub-frame units of a compressed image stream for encryption or watermarking
・Simultaneous use of multiple watermarking methods, to protect against methods that attempt to defeat detection of a particular type of watermark
・Simultaneous use of multiple encryption methods and strengths, thereby requiring multiple independent decryption systems to decode the various units within a single-layer or layered compressed image stream
・Parallel decryption of the various units within a compressed image stream, using one or more decryption methods simultaneously
・Tying keys to a billing system
・Tying encryption to a specific medium and/or a specific target location or serial number
・Tying encryption to a secure clock and a range of dates of use
・Tying encryption to a specific number of uses by means of a secure usage counter
・Using the movie itself as a key to obtain a new movie or key
・Erasing the movie data from the physical media when it is used as a key to obtain a new movie, or when the authorized period of use expires
・Using a flexible key toolkit approach, so that key usage methods can be continually refined to improve flexibility, convenience of use, and security
・Using second-best (or third-best, etc.) motion vectors as a watermarking technique
・Using minor variations in rate control as a watermarking technique (applicable to any combination of I, B, and/or P type frames, and to their motion-compensated wavelet equivalents)
・Using low-order bit variations in the DC and/or AC coefficients as a watermarking technique (applicable to I, B, and/or P type frames and their equivalents)
・Using low-amplitude, blurry letters or numbers, uniquely added to each copy of the image during compression, to uniquely watermark each copy
・Applying encryption to portions of the bitstream that affect large portions of the image stream (high-influence encryption)
・Applying general encryption to most of a work, and customized encryption to selected units
・Encrypting small portions of the data stream and sending them point-to-point to each specific location (including tying them to serial numbers, keys, personnel codes, IP addresses, and other unique identifiers at that specific location)
・Applying watermarking to portions of the bitstream that have low influence on other frames, to minimize visibility
・Using potentially visible watermarks (such as low-amplitude letters and numbers, or LSBs in DC or AC coefficients) in image edge regions (near the top, bottom, left edge, and right edge), to minimize visual impact
・Extracting the influential points of sub-frame units, such as left-column (slice-start) motion vectors, DC and AC coefficients in I frames, prediction mode bits, and control codes, for independent encryption

Computer Implementation
The invention may be implemented in hardware (e.g., an integrated circuit) or software, or a combination of both. Preferably, however, the invention is implemented in computer programs executing on one or more programmable computers, each comprising at least a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), an input device, and an output device. Program code is applied to input data to perform the functions described herein and to generate output information. The output information is applied to one or more output devices in a known manner.

Each such program may be implemented in any desired computer language (including machine language, assembly language, or a high-level imperative, logic, or object-oriented programming language) to communicate with a computer system. In any case, the language may be a compiled or an interpreted language.

Each such computer program is preferably stored on a storage medium or device (e.g., ROM, CD-ROM, or magnetic or optical media) readable by a general-purpose or special-purpose programmable computer system, so that when the storage medium or device is read by the computer system, it configures and operates the computer to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.

A number of embodiments of the present invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, while the preferred embodiment uses MPEG-2 coding and decoding, the invention will work with any comparable standard that provides equivalents of I, B, and P frames and layers. Accordingly, it is to be understood that the invention is not limited to the specific embodiments described, but only by the scope of the appended claims.

A timing diagram showing the pull-down rates for displaying 24 fps and 36 fps material at 60 Hz.
A first preferred MPEG-2 coding pattern.
A second preferred MPEG-2 coding pattern.
A block diagram showing temporal layer decoding according to a preferred embodiment of the present invention.
A block diagram showing a 60 Hz interlaced input to a converter that can output both 36 Hz and 72 Hz frames.
A diagram showing a "master template" for a base MPEG-2 layer at 24 or 36 Hz.
A diagram showing an extension of the base resolution template using hierarchical resolution scalability with MPEG-2.
A diagram showing the preferred layered resolution encoding process.
A diagram showing the preferred layered resolution decoding process.
A block diagram showing a combination of resolution and temporal scalability options for a decoder according to the present invention.
A diagram showing the scope of encryption and watermarking as a function of unit dependency.
A diagram of an image frame with one type of watermark.
A diagram of an image frame with a different type of watermark.
A flowchart showing one method of applying the encryption techniques of the present invention.
A flowchart showing one method of applying the watermarking techniques of the present invention.

Explanation of symbols

50 ... MPEG-2 decoder, 52 ... second decoder, 60 ... camera, 62 ... other source, 64 ... converter, 1200 ... frame, 1202 ... symbol, 1204 ... mark.

Claims

A method for encrypting and watermarking a data stream of video information encoded and compressed into a base layer and at least one enhancement layer, comprising:
(A) selecting at least one encryption algorithm;
(B) selecting at least one watermarking technique;
(C) selecting at least one unit to be encrypted from the base layer or the at least one enhancement layer;
(D) selecting at least one unit to be watermarked from the base layer or the at least one enhancement layer;
(E) applying the at least one selected watermarking technique to watermark each of the units selected to be watermarked; and
(F) applying the at least one selected encryption algorithm to encrypt each of the units selected to be encrypted into an encrypted unit.

A system for encrypting and watermarking a data stream of video information encoded and compressed into a base layer and at least one enhancement layer, comprising:
(A) means for selecting at least one encryption algorithm;
(B) means for selecting at least one watermarking technique;
(C) means for selecting at least one unit to be encrypted from the base layer or the at least one enhancement layer;
(D) means for selecting at least one unit to be watermarked from the base layer or the at least one enhancement layer;
(E) means for applying the at least one selected watermarking technique to watermark each of the units selected to be watermarked; and
(F) means for applying the at least one selected encryption algorithm to encrypt each of the units selected to be encrypted into an encrypted unit.

A computer program, stored on a computer-readable medium, for encrypting and watermarking a data stream of video information encoded and compressed into a base layer and at least one enhancement layer, the computer program comprising instructions for causing a computer to:
(A) select at least one encryption algorithm;
(B) select at least one watermarking technique;
(C) select at least one unit to be encrypted from the base layer or the at least one enhancement layer;
(D) select at least one unit to be watermarked from the base layer or the at least one enhancement layer;
(E) apply the at least one selected watermarking technique to watermark each of the units selected to be watermarked; and
(F) apply the at least one selected encryption algorithm to encrypt each of the units selected to be encrypted into an encrypted unit.