JP4352105B2 - Advanced television with enhanced temporal and resolution stratification

Info

Publication number: JP4352105B2
Application number: JP2001574651A
Authority: JP
Inventors: Gary E. Demos
Original assignee: Dolby Laboratories Licensing Corp
Current assignee: Dolby Laboratories Licensing Corp
Priority date: 2000-04-07
Filing date: 2001-04-06
Publication date: 2009-10-28
Anticipated expiration: 2021-04-06
Also published as: AU2001251386A1; EP1279111A1; WO2001077871A1; EP1279111A4; JP2003531514A; CA2406459C; CA2406459A1

Description

【０００１】
関連出願の相互参照
本願は、１９９６年１月３０日付けで出願された米国特許願第０８／５９４，８１５号（現在は１９９８年１２月２２日付けで発行された米国特許第５，８５２，５６５号である）の継続出願であった１９９８年１２月２１日付け出願の米国特許願第０９／２１７，１５１号の継続出願であった１９９９年１１月１７日付け出願の米国特許願第０９/４４２，５９５号の一部継続出願でありその優先権を主張するものである。
【０００２】
技術分野
本発明は、電子通信システムに関し、さらに詳しく述べると、圧縮特性、フィルタリング特性及び表示特性を強化された圧縮画像フレームの時相と解像度を階層化（temporal and resolution layering）したアドバンスド電子テレビジョンシステムに関する。
【０００３】
背景
米国は、現在、テレビジョンを伝送するのに、ＮＴＳＣ標準を利用している。しかし、このＮＴＳＣ標準をアドバンスドテレビジョン標準と取替えるという提案がなされている。例えば、米国がディジタル標準精細度フォーマットとアドバンスドテレビジョンフォーマットを、２４Ｈｚ、３０Ｈｚ、６０Ｈｚ及びインタレース化された６０Ｈｚの速度（rate）で採用することが提案されている。これらの速度は、既存のＮＴＳＣテレビジョンの表示速度６０Ｈｚ（又は５９．９４Ｈｚ）を引続き利用すること（したがってこの表示速度と両立すること）を意図していることは明らかである。また、「３−２プルダウン（3-2 pulldown）」が、２４フレーム／秒（ｆｐｓ）の時相速度を有する映画を提供するときに、６０Ｈｚの表示速度で表示することを目的としていることも明らかである。しかし、上記提案は、選択すべき可能なフォーマットのメニューを提供するが、各フォーマットは、単一の解像度とフレーム速度しか符号化し復号しない。これらフォーマットの表示速度又は動き速度（motion rate）は、互いに不可分には関連していないので、一方から他方への変換は困難である。
【０００４】
さらに、この提案は、コンピュータ表示器と両立できる決定的な性能を提供していない。これらの提案された画像の動き速度は、今世紀初期にさかのぼる歴史的な速度に基づいている。いきがかりを捨てるならば、これらの速度は選択されることはないであろう。コンピュータ産業界では、表示器は、過去１０年間にわたってあらゆる速度を利用できたが、７０〜８０Ｈｚの範囲の速度が最適であると証明され、７２Ｈｚと７５Ｈｚが最も普通の速度であった。あいにく、提案された速度の３０Ｈｚと６０Ｈｚは、７２Ｈｚ又は７５Ｈｚとの有用な相互運用性を欠いており、その結果、時相性能が低下する。
【０００５】
その上に、高いフレーム速度で約１０００ラインの解像度を有する必要があると要求されているため、インタレースが必要であるが、このような画像が従来の６ＭＨｚ放送テレビジョンチャネルの利用可能な１８〜１９メガビット／秒内で圧縮できないという認識に基づいていることが一部の人によって示唆されている。
【０００６】
単一の信号フォーマットを採用しなければならないならば、そのフォーマットの中に所望の標準の高精細度の解像度をすべて含んでいることが一層要望されるであろう。しかし、従来の６ＭＨｚ放送テレビジョンチャネルの帯域幅の制約内で上記のことを行うには、フレーム速度（時相）と解像度（空間）の両者の圧縮と「スケーラビリティ（scalability）」が必要である。このようなスケーラビリティを提供することを特に意図する一つの方法はＭＰＥＧ−２標準である。ＭＰＥＧ−２標準（及びより新しい標準、例えばＭＰＥＧ−４）中に詳記されている時相及び空間のスケーラビリティの機能は、米国のアドバンスドテレビジョンの要求を満たすのに充分有効でない。したがって米国のアドバンスドテレビジョンに対する前記提案は、時相（フレーム速度）と空間（解像度）の階層化が無効果であるという前提に基づいているので、個々のフォーマットが必要である。
【０００７】
さらに、解像度、画像の明瞭度、符号化効率及び画像生成効率を高めることが望ましい。本発明はこのような性能強化を行う。
【０００８】
要約
本発明は、高フレーム速度にて高画質で、１０００ライン解像度より優れた画像圧縮を明白に達成する、画像圧縮を行う方法と装置を提供するものである。また本発明は、従来のテレビジョン放送チャネルの利用可能な帯域幅内で、上記解像度にて高フレーム速度で、時相と解像度のスケーラビリティの両者も達成するものである。本発明の方法は、アドバンスドテレビジョンに対して提案されている圧縮比の２倍を超える圧縮比を有効に達成する。さらに階層化圧縮によって、各種の画像強化方法を意のままに利用できるようにする一形態の画像のモジュール化分解が可能になる。
【０００９】
画像マテリアル（image material）は好ましくは、７２ｆｐｓという初期又は一次のフレーム指示速度で捕獲される。次に、ＭＰＥＧ式（例えばＭＰＥＧ−２、ＭＰＥＧ−４など）のデータストリームが生成し、そのデータ流は次の層を含んでいる。
（１）好ましくはＭＰＥＧ型Ｐフレームだけを使用して、符号化されるベース層であって、低解像度（例えば１０２４×５１２画素）で低フレーム速度（２４又は３６Ｈｚ）のビットストリームを含む層；
（２）ＭＰＥＧ型Ｂフレームだけを使用して符号化される任意のベース解像度の時相強化層であって、低解像度（例えば１０２４×５１２画素）で高フレーム速度（７２Ｈｚ）のビットストリームを含む層；
（３）好ましくはＭＰＥＧ型Ｐフレームだけを使用して符号化される任意のベース時相の高解像度強化層であって、高解像度（例えば２ｋ×１ｋ画素）で低フレーム速度（２４又は３６Ｈｚ）のビットストリームを含む層；
（４）ＭＰＥＧ型Ｂフレームだけを使用して符号化される任意の高解像度の時相強化層であって、高解像度（例えば２ｋ×１ｋ画素）で高フレーム速度（７２Ｈｚ）のビットストリームを含む層。
【００１０】
本発明は、現在の提案を超える大きな改良が可能になる多数の重要な技術特性、例えば、多種の解像度とフレーム速度の、単一の階層化された解像度とフレーム速度による置換；６ＭＨｚのテレビジョンチャネル内で、高フレーム速度（７２Ｈｚ）で２メガ画素の画像について１０００ラインより優れた解像度を達成するのにインタレースが不要であること；一次フレーム指示速度７２ｆｐｓを使用することによるコンピュータ表示器との互換性；及びアドバンスドテレビジョンに対する現行の階層化されていないフォーマットの提案よりはるかに高い堅牢性を提供する。なぜならば、「ストレスの多い（stressful）」画像マテリアルに出会うと、利用可能なビットがすべて、低解像度のベース層に割り当てることができるからである。
【００１１】
さらに、本発明は、ビデオ品質と圧縮の各種問題点を処理する多くの強化法を提供する。このような強化法を多数、以下に説明するが、これら強化法は大部分、好ましくは、画像の強化及びその画像の圧縮を行うタスクに適用できる一組のツールとして実施される。これらのツールは、所望どおりに、各種の方式でコンテント・デベロッパ（content developer）によって結合して、圧縮されたデータストリーム特に階層化された圧縮データストリームの視覚質と圧縮効率を最適化することができる。
【００１２】
このようなツールとしては、改良された画像フィルタリング法、動きベクトルの表現と決定、デ−インタレーシングと雑音低下の強化法、動き解析、画像形成装置の特性決定と修正、強化された３−２プルダウンシステム、生産のためのフレーム速度法、モジュラビット速度法、多層ＤＣＴ構造、各種長さの符号化の最適化、ＭＰＥＧ−２とＭＰＥＧ−４用の拡張システム、及び空間強化層用のガイドベクトルがある。
【００１３】
一般に、この技術は、以下に特徴がある。
（特徴１）画像符号化システムのベース層の強化層の製造方法であって、該ベース層をアップフィルタし拡張して拡張ベース領域にし、その拡張ベース領域を囲む追加面積領域を、該拡張ベース領域を均一な中間グレイ画素値でパッドすることでつくり、次に追加の写真情報を提供する強化層をつくる、ことを含んでなり、その強化層が、該拡張ベース領域と一致する面積に対する小範囲の可能な画素値及び該追加の面積領域と一致する面積に対する大範囲の画素値を有する差分写真を含んでいる方法。
【００１４】
特徴１は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴２）強化層を、ベース層を含む写真ストリームの一部として符号化することをさらに含む特徴１に記載の方法。
（特徴３）強化層を復号することをさらに含む特徴２に記載の方法。
（特徴４）差分写真が動きベクトルを含み、そしてさらに、その動きベクトルに、追加の面積領域を指さないように強制することを含む特徴１に記載の方法。
（特徴５）マクロブロックに基づいて動きベクトルを決定することを含み、そのマクロブロックが、該拡張ベース領域とその拡張ベース領域を囲む追加面積領域との間の境界を走査しないようにアラインされている特徴４に記載の方法。
（特徴６）ベース層と強化層が、３／２、４／３及び完全ファクター２のうち一つから選択される解像度比を有する特徴１に記載の方法。
（特徴７）差分写真が強化層の中心に配置されている特徴１に記載の方法。
（特徴８）差分写真を、強化層に対して画像から画像へ連続的に再配置することをさらに含む特徴１に記載の方法。
【００１５】
一般に、この技術は、以下に特徴がある。
（特徴９）画像符号化システム内でより高い解像度の画像からより低い解像度の画像をつくる方法であって、ダウンサイジングフィルタを、該ダウンサイジングフィルタより高い解像度の原画像に適用することを含み、そのダウンサイジングフィルタが、正の中央ローブ、その正の中央ローブの両側に各々隣接する二つの負のローブ、及び各負のローブに対応して隣接している小さい正のローブを含み、その小さい正のローブが各々、対応する負のローブによって該正の中央ローブから隔てられている方法。
【００１６】
特徴９は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴１０）ダウンサイジングフィルタの大きさが該小さい正のローブに制限されている特徴９に記載の方法。
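(Feature 11) The method of Feature 9, wherein the relative amplitudes of the positive central lobe, the negative lobes, and the small positive lobes are approximated by a truncated sinc function.
(Feature 12) The method of Feature 9, wherein the relative amplitude of the positive central lobe is approximated by a truncated sinc function, and the relative amplitudes of the small positive lobes and the negative lobes are approximated by 1/2 to 2/3 of the truncated sinc function.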
【００１７】
一般に、この技術は、以下に特徴がある。
（特徴１３）画像符号化システム内で、復元されたベース画像層又は強化画像層から拡大画像をつくる方法であって、一対のアップサイジングフィルタを、復元されたベース画像層又は強化画像層に適用することを含み、各アップサイジングフィルタが正の中央ローブ及びその中央ローブの両側に各々隣接する二つの負のローブを含み、各アップサイジングフィルタの正の中央ローブのピークが互いに非対称に隔てられている方法。
【００１８】
特徴１３は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴１４）アップサイジングフィルタの大きさが負のローブに制限される特徴１３に記載の方法。
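(Feature 15) The method of Feature 13, wherein the relative amplitude of the positive central lobe is approximated by a truncated sinc function, and the relative amplitude of the negative lobes is smaller than the value approximated by the truncated sinc function.
(Feature 16) The method of Feature 13, wherein the relative amplitude of the positive central lobe is approximated by a truncated sinc function, and the relative amplitude of the negative lobes is approximated by 1/2 to 2/3 of the truncated sinc function.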
【００１９】
一般に、この技術は、以下に特徴がある。
（特徴１７）画像符号化システム内で元の高解像度画像からつくった元の圧縮されていないベース層入力画像から強化ディテール画像をつくる方法であって、ガウスのアップサイジングフィルタを、元の圧縮されていないベース層画像に適用して拡張画像をつくり；該拡張画像を該元の高解像度画像から差し引くことによって差分画像をつくり、次いでその差分画像に重みファクターを掛ける、ことを含む方法。
【００２０】
特徴１７は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴１８）重みファクターが約４％〜約３５％の範囲内にある特徴１７に記載の方法。
（特徴１９）符号化システムがＭＰＥＧ−４の標準に適合し、そして重みファクターが約４％〜約８％の範囲内にある特徴１７に記載の方法。
（特徴２０）符号化システムが、ＭＰＥＧ−２の標準に適合し、そして重みファクターが約１０％〜約３５％の範囲内にある特徴１７に記載の方法。
【００２１】
一般に、この技術は、以下に特徴がある。
（特徴２１）画像符号化システム内で画質を高める方法であって、デ−グレイニングフィルタ又はノイズ減少フィルタの少なくとも一方を元のディジタル画像に適用して第一処理済画像をつくり、次いで該第一処理済画像を、該画像符号化システム内で符号化して圧縮画像にする、ことを含む方法。
【００２２】
特徴２１は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴２２）元の画像が非相関ノイズ特性を有する別のカラーチャネル画像を含み、そしてさらに、別個のノイズ減少フィルタを、このような別個のカラーチャネル画像の少なくとも一つに適用することを含む特徴２１に記載の方法。
（特徴２３）圧縮された画像を復号して復元された画像にし、次いでその復元された画像に、リ−グレイニングフィルタ又はリ−ノイジングフィルタの少なくとも一方を適用する、ことをさらに含む特徴２１に記載の方法。
【００２３】
一般に、この技術は、以下に特徴がある。
（特徴２４）画像符号化システム内で画質を高める方法であって、フィールドデ−インタレーサを、一連の画像フィールドの各々に適用して、対応する一連のフィールドフレームをつくり、フィールドフレームデ−インタレーサを、一連の少なくとも三つの逐次フィールドフレームに適用して、対応する一連のデ−インタレース化画像フレームをつくり、次いでその一連のデ−インタレース化画像フレームを画像符号化システム内で符号化して一連の圧縮された画像にする、ことを含む方法。
【００２４】
特徴２４は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴２５）各画像フィールドがラインを含み、そして該フィールドデ−インタレーサを適用することが、画像フィールドの各ラインを複製し、次いで該画像フィールドの各隣接するペアのラインに対し、かようなペアのラインを平均することによって、かようなペアのラインの間に一つのラインを合成する、ことを含む特徴２４に記載の方法。
（特徴２６）該フィールドフレームデ−インタレーサを適用することが、前のフィールドフレーム、現行のフィールドフレーム及び次のフィールドフレームの各々に対し、これらフィールドフレームの重み付け平均として、デ−インタレース化画像フレームを合成することを含む、特徴２４に記載の方法。
（特徴２７）該前のフィールドフレーム、現行のフィールドフレーム及び次のフィールドフレームに対する重みがそれぞれ約２５％、５０％及び２５％である特徴２６に記載の方法。
（特徴２８）各デ−インタレース化画像フレームと各フィールドフレームが画素値を含み、そしてさらに、各デ−インタレース化画像フレーム及び各対応する現行フィールドフレームの各対応する画素値の間の差をしきい値と比較して差の値をつくりだし、次いで該デ−インタレース化画像フレームに対する各最終画素値として、該差の値が第一しきい値比較範囲内にある場合は現行フィールドフレームから対応する画素値を選び、そして該差の値が第二しきい値比較範囲内にある場合はデ−インタレース化画像フレームから対応する画素値を選ぶ、ことをさらに含む特徴２４に記載の方法。
（特徴２９）該しきい値が、約０．１〜０．３の範囲内から選択される特徴２４に記載の方法。
（特徴３０）比較する前に、各デ−インタレース化画像フレームと現行フィールドフレームを平滑フィルタリングすることをさらに含む特徴２８に記載の方法。
（特徴３１）平滑フィルタリングが、ダウンフィルタリングとこれに続くアップフィルタリングを含む特徴３０に記載の方法。
（特徴３２）各デ−インタレース化画像フレームと各フィールドフレームが画素値を含み、そしてさらに、各現行フィールドフレームの重み付け量を、各デ−インタレース化画像フレームの重み付け量に加えることを含む特徴２４に記載の方法。
（特徴３３）各現行フィールドフレームの重み付け量が１／３であり、そして各デ−インタレース化画像フレームの重み付け量が２／３である特徴３２に記載の方法。
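The field de-interlacer of Feature 25 and the three-field-frame blend of Features 26 and 27 can be sketched as follows. This is a minimal illustration, not the patented implementation: images are plain lists of rows, the function names are hypothetical, the last line of each field is simply repeated to reach full height (an assumption the text does not specify), and the vertical offset between odd and even fields is ignored.

```python
def field_to_frame(field):
    """Feature 25 sketch: keep every field line and synthesize one line
    between each adjacent pair of lines by averaging the pair."""
    frame = []
    for i, line in enumerate(field):
        frame.append(list(line))
        if i + 1 < len(field):
            below = field[i + 1]
            frame.append([(a + b) / 2 for a, b in zip(line, below)])
        else:
            # Assumption: repeat the last line so the frame has 2 * len(field) lines.
            frame.append(list(line))
    return frame


def blend_three(prev_frame, cur_frame, next_frame, weights=(0.25, 0.5, 0.25)):
    """Features 26-27 sketch: weighted average of previous, current, and
    next field-frames (about 25%, 50%, 25%)."""
    wp, wc, wn = weights
    return [
        [wp * p + wc * c + wn * n for p, c, n in zip(rp, rc, rn)]
        for rp, rc, rn in zip(prev_frame, cur_frame, next_frame)
    ]


if __name__ == "__main__":
    fields = [[[0, 0], [2, 4]], [[4, 4], [6, 8]], [[8, 8], [10, 12]]]
    frames = [field_to_frame(f) for f in fields]
    print(blend_three(*frames))
```

The thresholding of Feature 28 and the 1/3 : 2/3 mix of Feature 33 would be applied to the output of `blend_three`; they are omitted here to keep the sketch short.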
【００２５】
一般に、この技術は、以下に特徴がある。
（特徴３４）画像符号化システム内で、非線形信号を表すディジタル画素値を含むビデオ画像の画質を強化する方法であって、該非線形信号を表す各ビデオ画像のディジタル画素値を、線形表現に変換して、線形化画像をつくり、変換関数を、少なくとも一つの線形化画像に適用して、変換された画像をつくり、次いで各変換された画像を、非線形信号を表すディジタル画素値を含むビデオ画像に変換してもどす、ことを含む方法。
【００２６】
一般に、この技術は、以下に特徴がある。
（特徴３５）ビデオ画像を符号化する方法であって、原画像の水平と垂直の寸法を、それぞれ第一と第二の選択された単分数ファクターによってダウンサイズして、第一中間画像をつくり、その第一ワーキング画像を圧縮ベース層として符号化し、そのベース層を復元し次にその結果を、該選択された単分数ファクターの逆数によってアップサイズして第二中間画像をつくり、該第一中間画像を、該選択された単分数ファクターの逆数によってアップサイズし、次にその結果を原画像から差引き次にその結果に重み付けをして第一中間結果をつくり、該第二中間画像を原画像から差引いて第二中間結果をつくり、該第一中間結果と該第二中間結果を加算して第三中間画像をつくり、次いで該第三中間画像を符号化して強化層をつくることを含む方法。
【００２７】
特徴３５は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴３６）第三中間画像を、符号化する前にクロッピングとエッジフェザリングを行うことをさらに含む特徴３５に記載の方法。
（特徴３７）該第一と第二の単分数ファクターが各々、１／３、１／２、２／３及び３／４のうちの一つから選択される特徴３５に記載の方法。
【００２８】
一般に、この技術は、以下に特徴がある。
（特徴３８）画像符号化システム内で画質を強化する方法であって、中央値フィルタを、ディジタルビデオ画像の水平画素値に適用し、中央値フィルタを、ディジタルビデオ画像の垂直画素値に適用し、次いで該水平画素値と垂直画素値のフィルタリングの結果を平均して、ノイズを減らしたディジタルビデオ画像をつくる、ことを含む方法。
【００２９】
特徴３８は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴３９）中央値フィルタを、該ディジタルビデオ画像の対角画素値に適用し、次いで該ノイズを減らしたディジタルビデオ画像の対角画素値をフィルタした結果を平均する、ことをさらに含む特徴３８に記載の方法。
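A small sketch of the horizontal and vertical median averaging in Feature 38. The text does not state the filter length or the edge handling, so the three-tap window and the edge clamping below are assumptions, and `median_hv` is a hypothetical name.

```python
from statistics import median


def median_hv(image):
    """Average of a 3-tap horizontal median and a 3-tap vertical median
    at every pixel. `image` is a list of equal-length rows."""
    height, width = len(image), len(image[0])

    def px(y, x):
        # Assumption: clamp coordinates at the image border.
        return image[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    out = []
    for y in range(height):
        row = []
        for x in range(width):
            horiz = median([px(y, x - 1), px(y, x), px(y, x + 1)])
            vert = median([px(y - 1, x), px(y, x), px(y + 1, x)])
            row.append((horiz + vert) / 2)
        out.append(row)
    return out


if __name__ == "__main__":
    noisy = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
    print(median_hv(noisy))  # the isolated spike at the center is suppressed
```

Feature 39 extends the same idea with medians along the diagonals, averaged into the result in the same way.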
【００３０】
一般に、この技術は、以下に特徴がある。
（特徴４０）画像符号化システム内で画質を強化する方法であって、時相中央値フィルタを、前のディジタルビデオ画像、現行のディジタルビデオ画像及び次のディジタルビデオ画像の対応する画素値に適用して、ノイズを減らしたディジタルビデオ画像をつくる、ことを含む方法。
【００３１】
特徴４０は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴４１）各ノイズ減少ディジタルビデオ画像及び各対応する現行ディジタルビデオ画像の各対応する画素値の間の差を、しきい値と比較して、差の値をつくりだし、次いでノイズ減少ディジタルビデオ画像の各最終画素値として、該差の値が第一しきい値比較範囲内にある場合は現行ディジタルビデオ画像から対応する画素値を選び、そして該差の値が第二しきい値比較範囲内にある場合はノイズ減少ディジタルビデオ画像から対応する画素値を選ぶ、ことをさらに含む特徴４０に記載の方法。
（特徴４２）該しきい値が約０．１〜約０．３の範囲から選択される特徴４１に記載の方法。
【００３２】
一般に、この技術は、以下に特徴がある。
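(Feature 43) A method for enhancing image quality within an image coding system, comprising: applying a horizontal median filter to horizontal pixel values of a current digital video image; applying a vertical median filter to vertical pixel values of the current digital video image; applying a temporal median filter to corresponding pixel values of a previous digital video image, the current digital video image, and a next digital video image; and then applying a median filter to the corresponding pixel values produced by each of the horizontal, vertical, and temporal filters to create a noise-reduced digital video image.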
【００３３】
一般に、この技術は、以下に特徴がある。
（特徴４４）画像符号化システム内で画質を強化する方法であって、下記５項目：（１）現行ディジタルビデオ画像、（２）現行ディジタルビデオ画像の水平中央値と垂直中央値の平均値、（３）しきい値処理済時相中央値、（４）該しきい値処理済時相中央値の水平中央値と垂直中央値の平均値、並びに（５）該しきい値処理済時相中央値及び現行ディジタルビデオ画像の水平中央値と垂直中央値の中央値、の線形重み付け合計を含むノイズ減少ディジタルビデオ画像をつくることを含む方法。
【００３４】
特徴４４は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴４５）該５項目の重みがそれぞれ約５０％、１５％、１０％、１０％及び１５％である特徴４４に記載の方法。
（特徴４６）該５項目の重みがそれぞれ約３５％、２０％、２２．５％、１０％及び１２．５％である特徴４４に記載の方法。
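(Feature 47) The method of Feature 44, further comprising: determining a motion vector for each n x n pixel region of the current digital video image with respect to at least one previous digital video image and at least one next digital video image; applying a center-weighted temporal filter to each n x n pixel region of the current digital video image and to the corresponding motion-vector-offset n x n pixel regions of the at least one previous and at least one next digital video image to create a motion-compensated image; and then adding the motion-compensated image to the noise-reduced digital video image.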
【００３５】
一般に、この技術は、以下に特徴がある。
（特徴４８）画像符号化システム内で画質を強化する方法であって、少なくとも一つの前のディジタルビデオ画像と少なくとも一つの次のディジタルビデオ画像について現行ディジタルビデオ画像の各ｎｘｎ画素領域に対する動きベクトルを確認し、次いで現行ディジタルビデオ画像の各ｎｘｎ画素領域、並びに少なくとも一つの前のディジタルビデオ画像及び少なくとも一つの次のディジタルビデオ画像の対応する動きベクトルオフセットｎｘｎ画素領域に、中央重み付け時相フィルタを適用して動き補償画像をつくる、ことを含む方法。
【００３６】
特徴４８は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴４９）各ディジタルビデオ画像がデ−インタレース化フィールドフレームである特徴４８に記載の方法。
（特徴５０）各ディジタルビデオ画像が３フィールドフレームのデ−インタレース化画像である特徴４８に記載の方法。
（特徴５１）各ディジタルビデオ画像が、しきい値処理済の３フィールドフレームデ−インタレース化画像である特徴４８に記載の方法。
（特徴５２）該中央重み付け時相フィルタが、該画像の各々に対してそれぞれ約２５％、５０％及び２５％の重みを有する３画像時相フィルタである特徴４８に記載の方法。
（特徴５３）該中央重み付け時相フィルタが、該画像の各々に対してそれぞれ約１０％、２０％、４０％、２０％及び１０％の重みを有する５画像時相フィルタである特徴４８に記載の方法。
【００３７】
一般に、この技術は、以下に特徴がある。
（特徴５４）画像符号化システム内で画質を強化する方法であって、ノーマルダウンフィルタを画像に適用して、第一中間画像をつくり、ガウスアップフィルタを、該第一中間画像に適用して、第二中間画像をつくり、次に、該第二中間画像の重み付けフラクションを、選択された画像に加えて、高周波数のノイズが減少した画像をつくる、ことを含む方法。
【００３８】
特徴５４は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴５５）該重み付けフラクションが、該第二中間画像の約５％と１０％の間である特徴５４に記載の方法。
【００３９】
一般に、この技術は、以下に特徴がある。
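(Feature 56) A method for enhancing image quality within an image coding system, comprising: applying a down filter to a noise-filtered original-resolution image to create a first intermediate image at base-layer resolution; applying a normal down filter to the first intermediate image to create a second intermediate image; applying a Gaussian up filter to the second intermediate image to create a third intermediate image; and creating a noise-reduced digital video image comprising a linear weighted sum of the following three terms: (1) the first intermediate image, (2) the average of the horizontal median and the vertical median of the first intermediate image, and (3) the third intermediate image.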
【００４０】
特徴５６は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴５７）該３項目の重みがそれぞれ、約７０％、２２．５％及び７．５％である特徴５６に記載の方法。
【００４１】
一般に、この技術は、以下に特徴がある。
（特徴５８）１／４画素動き補償を利用して、画像符号化システム内で画質を強化する方法であって、負のローブを有するフィルタを、隣接する第一画素と第二画素の間の中ほどのサブ画素ポイントに適用して、１／２フィルタされた画素値をつくりだし、負のローブを有するフィルタを、該第一画素と第二画素の間の１／４ほどのサブ画素ポイントに適用し、次に、負のローブを有するフィルタを、該第一画素と第二画素の間の３／４ほどのサブ画素ポイントに適用すること、を含む方法。
【００４２】
一般に、この技術は、以下に特徴がある。
（特徴５９）負のローブを有するフィルタを、隣接する第一画素と第二画素の間の中ほどのサブ画素ポイントに適用して１／２フィルタされた画素値をつくりだすことを含む、画像符号化システム内で１／２画素動き補償を使用して画質を強化する方法。
【００４３】
一般に、この技術は、以下に特徴がある。
（特徴６０）各クロミナンスチャネルを、１／４画素解像度を利用してフィルタすることを含む、画像符号化システム内で、ルミナンスチャネルに対し１／２画素動き補償を利用して画質を高める方法。
【００４４】
特徴６０は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴６１）負のローブを有するフィルタを、隣接する第一と第二のクロミナンス画素の間の各１／４サブ画素ポイントに適用することをさらに含む特徴６０に記載の方法。
【００４５】
一般に、この技術は、以下に特徴がある。
（特徴６２）各クロミナンスチャネルを、１／８画素解像度を利用しフィルタすることを含む、画像符号化システム内で、ルミナンスチャネルに対し１／４画素動き補償を利用して画質を強化する方法。
【００４６】
特徴６２は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴６３）負のローブを有するフィルタを、隣接する第一と第二のクロミナンス画素の間の各１／８サブ画素ポイントに適用することをさらに含む特徴６２に記載の方法。
(Feature 64) The method of any one of Features 58, 59, 61, and 63, wherein the filter having negative lobes is a truncated sinc filter.
【００４７】
一般に、この技術は、以下に特徴がある。
（特徴６５）ビデオ圧縮システムに対する入力画像を生成する電子画像形成システムの出力の特性を決定しその出力を修正する方法であって、該画像形成システムのカラー画素センサタイプの対をつくるため水平と垂直のカラーミスアラインメントを測定し、該画像形成システムのカラー画素センサタイプによって生成したノイズを測定し、該画像形成システムが生成した画像を、ビデオ圧縮システム内で圧縮する前に、該画像内のカラー画素を、該測定された水平と垂直のカラーミスアラインメントによって確認された量によって変換し、次いで測定されたどんなノイズの量に対しても補償する重みを有する重み付けノイズ減少フィルタを該画像に適用する、ことによって修正する、ことを含む方法。
【００４８】
一般に、この技術は、以下に特徴がある。
（特徴６６）ビデオ圧縮システムに対し入力される画像をつくるフィルムベース画像形成システムの出力の特性を決定しその出力を修正する方法であって、画像のシーケンスを記録するために使用されるフィルムタイプを決定し、このようなフィルムタイプの試験条片を、各種の照明条件下に露出し、該露出された試験条片を、既知のノイズ特性を有する電子画像形成システムによって走査し、このような走査中に電子画像形成システムが生成したノイズを測定し、次いで同じフィルムタイプ上にフィルムベース画像形成システムが生成し次に該試験条片の場合と同じ電子画像システムが走査した画像を、ビデオ圧縮システム内で圧縮する前に、測定されたどのノイズの量に対しても調節された重みを有するノイズ減少フィルタを該画像に適用することによって、修正する、ことを含む方法。
【００４９】
一般に、この技術は、以下に特徴がある。
（特徴６７）２４ｆｐｓのフィルム画像のビデオへの変換を３−２プルダウンを利用して最適化する方法であって、２４ｆｐｓのフィルム画像をディジタル画像に、このようなディジタル画像の２４ｆｐｓの記憶、処理または通信を直接行える処理装置だけを使って変換し、このようなディジタル画像すべてを、２４ｆｐｓフォーマットに、ディジタル画像ソースとして記憶し、３−２プルダウンによるビデオ変換を、決定性フレームカダンスを使用して該ディジタル画像ソースから直接フライ上に実施して、３−２ビデオ画像シーケンスをつくり、その決定性フレームカダンスを、３−２ビデオ画像シーケンスのすべての使用に対して維持し、次に３−２ビデオ画像シーケンスを使用した後、該決定性フレームカダンスを取り消し、次にその３−２ビデオ画像シーケンスを２４ｆｐｓディジタル画像に変換してもどし記憶することを含む方法。
【００５０】
一般に、この技術は、以下に特徴がある。
（特徴６８）２４ｆｐｓの移動画像を７２ｆｐｓの画像ソースから合成する方法であって、該２４ｆｐｓの移動画像の各画像フレームを、７２ｆｐｓ画像ソース由来の三つの連続するフレームから、それらフレームの重み付け平均として合成することを含み、該三つのフレームに対する重みがそれぞれ、[０．１、０．８、０．１]〜[０．２５、０．５０、０．２５]の範囲内にある方法。
【００５１】
特徴６８は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴６９）該重みが約[０．１６６７、０．６６６６、０．１６６７]である特徴６８に記載の方法。
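The frame synthesis of Features 68 and 69 can be illustrated with a short sketch. It assumes each frame is a flat list of pixel values and that consecutive output frames use non-overlapping triples of source frames (72 / 24 = 3); the function name is hypothetical.

```python
def synthesize_24_from_72(frames, weights=(1 / 6, 2 / 3, 1 / 6)):
    """Build each 24 fps frame as a weighted average of three consecutive
    72 fps frames. Default weights approximate [0.1667, 0.6666, 0.1667]."""
    out = []
    for start in range(0, len(frames) - 2, 3):
        trio = frames[start:start + 3]
        pixel_count = len(trio[0])
        out.append([
            sum(w * frame[k] for w, frame in zip(weights, trio))
            for k in range(pixel_count)
        ])
    return out


if __name__ == "__main__":
    # One second of 72 fps source, one pixel per frame, brightness = frame index.
    source = [[float(i)] for i in range(72)]
    film = synthesize_24_from_72(source)
    print(len(film), film[:3])
```

Any weight triple between [0.1, 0.8, 0.1] and [0.25, 0.50, 0.25] can be passed as `weights` to explore the range stated in Feature 68.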
【００５２】
一般に、この技術は、以下に特徴がある。
（特徴７０）２４ｆｐｓの移動画像を１２０ｆｐｓの画像ソースから合成する方法であって、該２４ｆｐｓの移動画像の各画像フレームを、１２０ｆｐｓ画像ソース由来の五つの連続するフレームから、それらフレームの重み付け平均として合成することを含み、該五つのフレームに対する重みが約[０．１、０．２、０．４、０．２、０．１]である方法。
【００５３】
一般に、この技術は、以下に特徴がある。
（特徴７１）６０ｆｐｓの移動画像を１２０ｆｐｓの画像ソースから合成する方法であって、該６０ｆｐｓの移動画像の各画像フレームを、１２０ｆｐｓ画像ソース由来の三つの連続するフレームから、それらフレームの重み付け平均として合成し、該三つのフレームの重みがそれぞれ[０．１、０．８、０．１]〜[０．２５、０．５０、０．２５]の範囲内にあり、そしてこのような画像フレーム各々を合成するために使用される該三つの連続するフレームに、次の画像フレームを合成するために使用される次の三つの連続するフレームを、一フレームだけオーバーラップさせることを含む方法。
【００５４】
一般に、この技術は、以下に特徴がある。
（特徴７２）ディジタルビデオ圧縮システム内で符号化ビットを割り当てる方法であって、第一一定数の符号化ビットを正常に割り当てられたビデオ画像の選択されたフレームベースのユニット内で生じる高圧縮ストレスを検出し、その検出されたユニットは高ストレスのユニットであり、該第一一定数の符号化ビットより大きい第二一定数の符号化ビットを割り当てて、該高ストレスユニットの圧縮を改善し、次いで該高ストレスユニットの少なくとも残っている部分を、第二一定数の符号化ビットを使用して圧縮する、ことを含む方法。
【００５５】
特徴７２は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴７３）ビデオ画像の該フレームベースのユニットが、Ｐフレーム又は写真のグループの範囲のフレームのうち一つを含んでいる特徴７２に記載の方法。
（特徴７４）該第二一定数の符号化ビットが、第一一定数の符号化ビットの単純倍数である特徴７２に記載の方法。
（特徴７５）高圧縮ストレスの検出が、ビデオ画像の選択されたフレームベースのユニットに対する速度制御量子化スケールファクターパラメータに基づいている特徴７２に記載の方法。
（特徴７６）高ストレスユニットをすべて、第二一定数の符号化ビットを使用して圧縮することを含む特徴７２に記載の方法。
【００５６】
一般に、この技術は、以下に特徴がある。
（特徴７７）圧縮されたディジタルビデオ情報の、復号ビット速度及びバッファシステムを有する復号器による、復号を改良する方法であって、その圧縮されたディジタルビデオ情報が、該復号ビット速度より高いソースビット速度で、ソースから提供され、介在する圧縮されたディジタルビデオ情報を、該ソースから、該バッファシステムの第一部分中に、ソースビット速度でプレロードし、プログラムコンテントで圧縮されたディジタルビデオ情報を、該ソースから、該バッファシステムの第二部分中に、ソースビット速度で同時にプレロードし、該プログラムコンテントで圧縮されたディジタルビデオ情報から、介在する圧縮されたディジタルビデオ情報に、選択的に変更し、次いで該介在する圧縮されたディジタルビデオ情報を復号して、該プログラムコンテントのほぼ瞬間的な変化を支持する、ことを含む方法。
【００５７】
一般に、この技術は、以下に特徴がある。
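(Feature 78) A method for improving decoding of compressed digital video information by a decoder having a buffer system, an average decoding bit rate, and at least one decoding bit rate higher than the average decoding bit rate, the compressed digital video information being provided from a source at a source bit rate higher than the average decoding bit rate, comprising: preloading compressed digital video information containing an increased-bit-rate module into a first portion of the buffer system at the source bit rate; concurrently preloading compressed digital video information containing a non-increased-bit-rate module into a second portion of the buffer system at the source bit rate; then decoding the content of the second portion of the buffer system into video images at the average decoding bit rate; and then decoding the content of the first portion of the buffer system into video images at the decoding bit rate higher than the average decoding bit rate.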
【００５８】
一般に、この技術は、以下に特徴がある。
（特徴７９）圧縮されたディジタルビデオ情報の、バッファシステム、平均復号ビット速度及びその平均復号ビット速度より高い少なくとも一つの復号ビット速度を有する復号器による復号を改良する方法であって、その圧縮されたディジタルビデオ情報が、該平均復号ビット速度より高いソースビット速度で、ソースから提供され、圧縮された強化層を含む圧縮されたディジタルビデオ情報を、該ソースビット速度で、該バッファシステムの第一部分中にプレロードし、ベース層を含む圧縮されたディジタルビデオ情報を、該ソースビット速度で、該バッファシステムの第二部分中に同時にプレロードし、次いで、該バッファシステムの第二部分のコンテントを、ビデオ画像中に、平均復号ビット速度で復号し、次に、該バッファシステムの第一部分のコンテントを、該平均復号ビット速度より高い復号ビット速度で、ビデオ画像中に復号する、ことを含む方法。
【００５９】
一般に、この技術は、以下に特徴がある。
（特徴８０）ビデオ画像のベース層及び少なくとも一つの解像度強化層を符号化するため離散的コサイン変換（ＤＣＴ）を利用して、ビデオ符号化システムの符号化効率を改良する方法であって、各々第一ブロックサイズを有するＤＣＴブロックを使用してベース層を符号化し、次いで、第一ブロックの大きさと大きさが比例するブロックサイズを各々有するＤＣＴブロックを使用して、各解像度強化層を、このような強化層の解像度が該ベース層の解像度に比例するように、符号化する、ことを含む方法。
【００６０】
特徴８０は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴８１）ＤＣＴブロックのサブセットを強化層に対して利用することをさらに含み、このようなサブセットが低レベルの強化層又はベース層に対するＤＣＴブロックに対応して、該低レベルの強化層又はベース層に対するかようなＤＣＴブロックの信号／ノイズ比の精度を高める特徴８０に記載の方法。
【００６１】
一般に、この技術は、以下に特徴がある。
（特徴８２）ビデオ画像符号化システム内で、ベース層及び少なくとも一つの解像度強化層に対する動き補償ベクトルを決定する方法であって、ベース層及び各解像度強化層を、このような層内の対応する画素の領域をカバーする大きさのマクロブロックを使用して符号化し、各ベース層及び解像度強化層の各マクロブロックに対し、符号化予測性能及び関連するセットの動きベクトルを指定するのに必要なビットの数の間のバランスを最適化するこのようなマクロブロックに対する動きベクトルサブブロックの数を独立して決定し、次いで関連する独立の動きベクトルのセットを、前記決定された数の動きベクトルサブブロックの各々に対して一つ決定する、ことを含む方法。
【００６２】
一般に、この技術は、以下に特徴がある。
（特徴８３）ビデオ画像符号化ユニットを圧縮する方法であって、複数の可変長符号化テーブルを、各符号化ユニットに適用し、このような符号化ユニットに対して最適の圧縮を行う可変長符号化テーブルを選択し、その選択された可変長符号化テーブルを適用してかような符号化ユニットを圧縮し、次いでこのような符号化ユニットの各々に対して選択された可変長符号化テーブルを、このような符号化ユニットを復元するため、復号器に対し識別する、ことを含む方法。
【００６３】
特徴８３は、以下の一つ、或いは２以上の特徴を含んでも良い。
(Feature 84) The method of Feature 83, wherein the coding unit is one of a subframe, a frame, or a group of frames.
【００６４】
一般に、この技術は、以下に特徴がある。
（特徴８５）ビデオ画像を符号化し復号する方法であって、ビデオ画像を、基本ビデオ圧縮プロセスと強化ビデオ圧縮プロセスに適合する第一データストリーム中に、及び強化ビデオ圧縮プロセスにのみ適合する構造を有する第二データストリーム中に符号化し、基本ビデオ圧縮プロセスにだけ適合する復号システム上に、第一データストリームだけを復号し、次いで第一データストリームと第二データストリームを、強化ビデオ圧縮プロセスに適合する復号システム上で組み合わせて復号する、ことを含む方法。
【００６５】
特徴８５は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴８６）該基本ビデオ圧縮プロセスと強化ビデオ圧縮プロセスが共通の動き補償離散的コサイン変換構造体を共用している特徴８５に記載の方法。
（特徴８７）該基本ビデオ圧縮プロセスがＭＰＥＧ−２である特徴８５に記載の方法。
（特徴８８）該強化ビデオ圧縮プロセスがＭＰＥＧ−４である特徴８７に記載の方法。
【００６６】
一般に、この技術は、以下に特徴がある。
（特徴８９）階層化ビデオ圧縮システム内でビデオ画像の動き補償符号化を行う方法であって、符号化ビデオ画像のベース層に対する少なくとも一つのベース層動きベクトルを決定し、各ベース層動きベクトルを、ビデオ情報の少なくとも一つの関連する解像度強化層の解像度までスケールアップし、次いで関連する解像度強化層各々に対し、ベース層動きベクトルのうちの一つに対応する各解像度強化層の動きベクトルの少なくとも一つを決定し、このような一つの対応するベース層動きベクトルを案内ベクトルとして使用して、かような関連する解像度強化層の制限サーチ範囲の中心点を示し、かような解像度強化層動きベクトルを決定する、ことを含む方法。
【００６７】
特徴８９は、以下の一つ、或いは２以上の特徴を含んでも良い。
（特徴９０）各強化層に対し、対応する解像度強化層動きベクトルのみを符号化することをさらに含む特徴８９に記載の方法。
（特徴９１）各解像度強化層動きベクトルと対応するベース層動きベクトルのベクトル和を利用して、かような解像度強化層動きベクトルと関連する強化層に対して動き補償を行うことをさらに含む特徴８９に記載の方法。
【００６８】
一般に、この技術は、以下に特徴がある。
（特徴９２）ビデオ画像を圧縮する方法であって、初期高解像度画像をダウンフィルタして第一処理済画像をつくり、その初期高解像度画像から第一動きベクトルをつくり出し、該第一処理済画像を圧縮して出力ベース層をつくり、その出力ベース層を復元して第二処理済画像をつくり、その第二処理済画像を拡大して第三処理済画像をつくり、該第一処理済画像を拡大して第四処理済画像をつくり、該第三処理済画像を、該初期高解像度画像から差し引いて第五処理済画像をつくり、該第四処理済画像を、該初期高解像度画像から差し引いて第六処理済画像をつくり、該第六処理済画像の振幅を小さくして第七処理済画像をつくり、該第七処理済画像と該第五処理済画像を加算して第八処理済画像をつくり、該第八処理済画像を、第一動きベクトルを利用して符号化して出力解像度強化層をつくり、該出力強化層を復号して第九処理済画像をつくり、該第九処理済画像と該第三処理済画像を加算して第十処理済画像をつくり、該初期高解像度画像を、該第十処理済画像から差し引いて第十一処理済画像をつくり、該十一処理済画像の振幅を大きくして第十二処理済画像をつくり、別のカラーチャネルを、該十二処理済画像から抽出して一組の第十三処理済画像をつくり、該一組の第十三処理済画像を第一動きベクトルを利用して符号化し、対応する一組の出力カラー解像度強化層をつくり、該一組の出力カラー強化層を復号して一組の第十四処理済画像をつくり、該一組の第十四処理済画像を結合して、第十五処理済画像をつくり、該第十五処理済画像の振幅を小さくして第十六処理済画像をつくり、該第十六処理済画像と該第十処理済画像を加算して第十七処理済画像をつくり、該第十七処理済画像を、該初期高解像度画像から差し引いて第十八処理済画像をつくり、次いで該第十八処理済画像を圧縮して出力最終差分残余画像にすることを含む方法。
【００６９】
一般に、この技術は、以下に特徴がある。
（特徴９３）ビデオ画像を圧縮する方法であって、ベース層を初期高解像度画像からつくり出し、第一組の動きベクトルを、該初期高解像度画像に基づいて選択された画像からつくり出し、第一差分画像を、該初期高解像度画像と該ベース層からつくり出し、第二差分画像を、初期高解像度画像及び該初期高解像度画像の処理済コピーからつくり出し、次いで解像度強化層を、該第一と第二の差分画像及び該第一組の動きベクトルからつくり出す、ことを含む方法。
【００７０】
特徴９３は、以下の一つ、或いは２以上の特徴を含んでも良い。
(Feature 94) The method of Feature 93, further comprising creating at least one color resolution enhancement layer for at least one selected color.
（特徴９５）最終差分残余画像をつくり出すことをさらに含む特徴９３に記載の方法。
（特徴９６）該最終差分残余画像を符号化することをさらに含む特徴９５に記載の方法。
【００７１】
本発明の１又は２以上の実施態様の詳細は、添付図面と以下の説明に記載されている。本発明の他の特徴、目的及び利点は、これらの説明と図面及び特許請求の範囲から明らかになるであろう。なお各種図面の同じ参照記号は同じ要素を示す。
【００７２】
詳細な説明
この説明全体を通じて示されている好ましい実施態様と実施例は、本発明を限定するものではなく、例示しているとみなすべきである。
【００７３】
時相解像度の階層化
時相速度ファミリーの目標
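Having considered the problems of the prior art, and in pursuing the present invention, the following goals were defined for specifying the temporal characteristics of a future digital television system:
・Optimal presentation of the high resolution legacy of 24 frame-per-second films.
・Smooth motion capture for rapidly moving image types, such as sports.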
・７２Ｈｚ又は７５Ｈｚで作動するコンピュータコンパチブル表示器のみならず既存のアナログＮＴＳＣ表示器へのスポーツ画像及び類似の画像の円滑な動きプレゼンテーション。
・余り速くない移動画像、例えばニュースやライブ劇の画像の合理的であるがより効率的な動きの捕獲。
・すべての新しいディジタルタイプの画像の、コンバーターボックスを通じて既存のＮＴＳＣ表示器への合理的なプレゼンテーション。
・すべての新しいディジタルタイプの画像の、コンピュータコンパチブル表示器への高品質のプレゼンテーション。
・６０Ｈｚディジタル標準表示器又は高解像度表示器が市販されたときの、これら表示器に対する同様の合理的な又は高品質のプレゼンテーション。
【００７４】
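Because 60 Hz displays and 72/75 Hz displays are fundamentally incompatible at any rate other than the 24 Hz movie rate, the best situation would be one in which either 72/75 Hz or 60 Hz were eliminated as a display rate. Since 72 or 75 Hz is a required rate for N.I.I. (National Information Infrastructure) and computer applications, eliminating the 60 Hz rate as fundamentally obsolete would be the most forward-looking choice. However, there are many competing interests within the broadcasting and television equipment industries, and there is a strong demand that any new digital television infrastructure be based on 60 Hz (and 30 Hz). This has led to a very heated controversy among the television, broadcasting, and computer industries.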
【００７５】
その上に、インタレースされた６０Ｈｚフォーマットに対する、放送及びテレビジョンの産業界のいくつかの利害が強調されて、コンピュータ表示器の要件とのギャップがさらに広がっている。非インタレース化表示は、ディジタルテレビジョンシステムのコンピュータ式アプリケーションに必要であるから、インタレース化信号が表示される場合、デ−インタレーサが必要である。デ−インタレーサはあらゆるこのような受信装置に必要であるから、デ−インタレーサのコストと品質についてのかなりの論争がある。デ−インタレース化に加えてフレーム速度の変換は、さらにコストと品質に強く影響する。例えば、そのＮＴＳＣ−ＰＡＬ間のコンバータは引続き非常に費用がかかり、しかも変換性能は、多くの一般タイプのシーンに対しては信頼できない。インタレースの争点は複雑で問題の多い課題なので、時相速度の問題点と争点に取りくむため、本発明を、インタレースなしのディジタルテレビジョンの標準に関連して説明する。
【００７６】
最適時相速度の選択
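The beat problem. Optimal presentation on a 72 or 75 Hz display occurs when a camera or simulated image is created with a motion rate equal to the display rate (72 or 75 Hz, respectively), and vice versa. Similarly, optimal motion fidelity on a 60 Hz display results from a 60 Hz camera or simulated image. Using a 72 Hz or 75 Hz generation rate with a 60 Hz display produces a 12 Hz or 15 Hz beat frequency, respectively. This beat can be removed through motion analysis, but motion analysis is expensive and inexact, and it often produces visible artifacts and temporal aliasing. Without motion analysis, the beat frequency dominates the perceived display rate, so that the 12 or 15 Hz beat provides less accurate motion than even 24 Hz. Thus, 24 Hz forms a natural temporal common denominator between 60 and 72 Hz. Although 75 Hz has a slightly higher beat (15 Hz) against 60 Hz, its motion is still not as smooth as 24 Hz, and there is no integral relationship between 75 Hz and 24 Hz unless the 24 Hz rate is increased to 25 Hz. (In European 50 Hz countries, movies are often played 4% fast at 25 Hz; this can be done to make film presentable on 75 Hz displays.)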
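The beat figures quoted for a 60 Hz display follow from a simple difference of rates. The sketch below reproduces only that arithmetic; the function name is hypothetical, and the simple-difference model is an assumption that covers the 72/60 and 75/60 cases discussed here, not a general model of perceived judder.

```python
def beat_frequency(generation_hz: float, display_hz: float) -> float:
    """Beat between an image generation rate and a display rate,
    modeled as the absolute difference of the two rates (in Hz)."""
    return abs(generation_hz - display_hz)


if __name__ == "__main__":
    for generation in (72.0, 75.0):
        beat = beat_frequency(generation, 60.0)
        print(f"{generation:g} Hz material on a 60 Hz display: {beat:g} Hz beat")
```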
【００７７】
各受信装置に動き解析がないと、７２Ｈｚ又は７５Ｈｚの表示器での６０Ｈｚの動き及び６０Ｈｚ表示器での７５Ｈｚ又は７２Ｈｚの動きは、２４Ｈｚの画像より平滑さが低い。したがって、７２／７５Ｈｚの動きも６０Ｈｚの動きも、７２Ｈｚ又は７５Ｈｚの表示器及び６０Ｈｚの表示器の両方を含む異種表示器集団に到達させるのに適していない。
【００７８】
３−２プルダウン。テレシネ変換（フィルムからビデオへの変換）プロセス中、「３−２プルダウン」をビデオ効果と組み合わせて使用するため、最適フレーム速度を選択する場合、さらに複雑になる。このような変換中、３−２プルダウンのパターンは、第一フレーム（又はフィールド）を３回繰返し、次いで次のフレームを２回、次いで次のフレームを３回、次いで次のフレームを２回など繰り返す。これは、６０Ｈｚで（実際には、ＮＴＳＣカラーの場合５９．９４Ｈｚ）テレビジョンに２４ｆｐｓフィルムがどのように提供されるかを示している。すなわち、１秒のフィルム中の１２対の２フレームが各々、５回表示され、１秒当り６０個の画像を提供する。その３−２プルダウンのパターンを図１に示す。
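The 3-2 cadence described above can be checked with a few lines of code. This is an illustrative sketch in which frames are represented by their index only; `three_two_pulldown` is a hypothetical helper name.

```python
from itertools import cycle


def three_two_pulldown(film_frames):
    """Repeat film frames alternately 3 times and 2 times, as in a
    24 fps to 60 Hz telecine transfer."""
    video = []
    for frame, repeats in zip(film_frames, cycle((3, 2))):
        video.extend([frame] * repeats)
    return video


if __name__ == "__main__":
    one_second_of_film = list(range(24))
    video = three_two_pulldown(one_second_of_film)
    print(len(video))   # number of 60 Hz images produced from 24 film frames
    print(video[:10])   # start of the cadence
```

Each pair of film frames occupies five display periods, so the 12 pairs in one second of film fill 60 display periods.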
【００７９】
いくつかの推定によって、ビデオ上のすべてのフィルムの１／２以上は、かなりの部分が、５９．９４Ｈｚのビデオフィールド速度において２４ｆｐｓフィルムに対して調節がなされた。これらの調節には「パン−アンド−スキャン（pan-and-scan）」、カラー修正及びタイトルスクローリングが含まれている。さらに多くのフィルムは、フレームをドロップさせるか又はシーンの開始と終了をクリップすることによって時間調節されて、計画された所定の放送内にはめこまれる。これらの操作は、５９．９４Ｈｚと２４Ｈｚの動きの両方があるので、３−２プルダウンプロセスを逆転させることができない。このことによって、そのフィルムは、ＭＰＥＧ−２の標準を使用して圧縮することが非常に困難になる。幸いなことに、３−２プルダウンを使用する高解像度ディジタルフィルムの有意なライブラリーがないので、上記問題は、既存のＮＴＳＣ解像度のマテリアルに限定される。
【００８０】
動きブラー。２４Ｈｚより高い共通の時相速度を見つける問題点をさらに探究するため、移動画像を捕獲する際の動きブラーについて述べることは有用である。カメラセンサ及び映画フィルムは、各フレームの時間の一部分で移動画像を感知するため開かれている。映画カメラと多くのビデオカメラのこの露出時間は調節可能である。フィルムカメラは、フィルムを前進させる時間が必要であり、通常、３６０°のうち約２１０°だけ開くように又は５８％デューティサイクルに制限される。ＣＣＤセンサを有するビデオカメラは、そのフレーム時間の一部がそのセンサから画像を「読み取る」ために必要なことが多い。これは、フレーム時間を１０％から５０％まで変えることができる。いくつかのセンサでは、この読み出し時間中、光を遮断するため電子シャッターを使用しなければならない。したがって、ＣＣＤセンサの「デューティサイクル」は通常５０％から９０％まで変化し、いくつかのカメラでは調節することができる。前記光シャッターは、所望により、該デューティサイクルをさらに減らすため、時々、調節することができる。しかし、フィルムとビデオの両方の場合、最も普通のセンサのデューティサイクルの期間は５０％である。
【００８１】
好ましい速度。この問題を念頭に置いて、７２Ｈｚ又は７５Ｈｚで捕獲された画像シーケンスからいくつかのフレームだけを使用することを考えることができる。二つ、三つ、四つなどのフレームのうち一つのフレームを利用して表１に示す副速度（subrate）を誘導することができる。
【００８２】
【表１】
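The subrates obtained by keeping one frame out of every two, three, four, and so on follow directly from dividing the capture rate. The sketch below computes them for 72 Hz and 75 Hz; it is derived from that arithmetic only, and the range of divisors shown (2 through 6) is an assumption.

```python
from fractions import Fraction


def subrates(capture_hz, divisors=(2, 3, 4, 5, 6)):
    """Rates obtained by using one frame out of every `d` captured frames."""
    return {d: Fraction(capture_hz) / d for d in divisors}


if __name__ == "__main__":
    for capture in (75, 72):
        cells = ", ".join(
            f"1/{d}: {float(rate):g} Hz" for d, rate in subrates(capture).items()
        )
        print(f"{capture} Hz -> {cells}")
```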

【００８３】
１５Ｈｚという速度は６０Ｈｚと７５Ｈｚの間を単一化する（unify）速度である。１２Ｈｚの速度は６０Ｈｚと７２Ｈｚの間を単一化する速度である。しかし、２４Ｈｚを超える速度が要求されるとこれらの速度がなくなる。２４Ｈｚは一般的でないが、６０Ｈｚ表示器に提示するため３−２プルダウンを使用することは産業界に受け入れられるようになってきた。したがって最良の候補速度は３０Ｈｚ、３６Ｈｚ及び３７．５Ｈｚである。３０Ｈｚは、７５Ｈｚの７．５Ｈｚビート及び７２Ｈｚの６Ｈｚビートを有しているので、候補として有用ではない。
【００８４】
３６Ｈｚと３７．５Ｈｚの動き速度は、６０Ｈｚ及び７２／７５Ｈｚの表示器に提供されるとき、２４Ｈｚマテリアルより平滑な動きの最上の候補になる。これら速度の両者は、２４Ｈｚより約５０％速くかつ平滑である。３７．５Ｈｚの速度は、６０Ｈｚ又は７２Ｈｚとともに使用するのに適切でないので、除いて、所望の時相速度特性を有しているとして３６Ｈｚだけを残さねばならない（３７．５Ｈｚという動き速度は、テレビジョンに対する６０Ｈｚ表示速度が６２．５Ｈｚまで４％移動できれば使用できる。６０Ｈｚ未満に利益があると、６２．５Ｈｚは好ましくなくなる。新しいテレビジョンシステムに非常に時代遅れの５９．９４Ｈｚの速度を提案する人さえある。しかし、このような変更をなすべきであれば、本発明の他の側面は３７．５Ｈｚの速度に適用できる）。
【００８５】
２４、３６、６０及び７２Ｈｚの速度は、時相速度のファミリーの候補として残される。７２Ｈｚ及び６０Ｈｚの速度は分布速度としては使用できない。というのは、これら二つの速度間の変換を行うとき、２４Ｈｚを上記のように分布速度として使用する場合より、動きはなめらかさが低いからである。仮説によって、発明者らは２４Ｈｚより速い速度を探している。したがって３６Ｈｚが、マスターとして最良の候補であり、動き捕獲と画像分布速度を単一化して６０Ｈｚと７２／７５Ｈｚ表示器に使用される。
【００８６】
上記のように、２４Ｈｚマテリアル用の３−２プルダウンのパターンは、第一フレーム（又はフィールド）を３回、次いで次のフレームを２回、次いで次のフレームを３回、次いで次のフレームを２回など繰り返す。３６Ｈｚを利用するとき、各パターンは、２−１−２パターンで最適に繰り返されなければならない。これは表２に示し、図２に図式的に示してある。
【００８７】
【表２】
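The 2-1-2 cadence for presenting 36 Hz material on a 60 Hz display can be sketched in the same spirit. Frames are represented by index only, and `repeat_with_cadence` is a hypothetical helper.

```python
from itertools import cycle


def repeat_with_cadence(frames, cadence):
    """Repeat each source frame according to a cyclic repeat pattern."""
    shown = []
    for frame, count in zip(frames, cycle(cadence)):
        shown.extend([frame] * count)
    return shown


if __name__ == "__main__":
    one_second_at_36hz = list(range(36))
    shown = repeat_with_cadence(one_second_at_36hz, (2, 1, 2))
    print(len(shown))   # display periods filled in one second
    print(shown[:10])   # every three source frames fill five display periods
```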

【００８８】
３６Ｈｚと６０Ｈｚの間のこの関係は、真の３６Ｈｚマテリアルにのみ成立する。６０Ｈｚマテリアルは、インタレース化されると３０Ｈｚ中に「蓄積する（store）」ことができるが、３６Ｈｚは、動き解析と再構成なしで６０Ｈｚから合理的につくることができない。しかし、動きを捕獲するため新しい速度を探している場合、３６Ｈｚは、６０Ｈｚの上に、２４Ｈｚよりわずかに平滑な動きを提供して、実質的により優れた画像動きの平滑さを、７２Ｈｚ表示器上に提供する。したがって３６Ｈｚは、マスターとして最適の速度であり、動きの捕獲と画像分布速度を単一化して６０Ｈｚと７２Ｈｚの表示器に使用され、かような表示器に提供される２４Ｈｚマテリアルより平滑な動きを生じる。
【００８９】
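36 Hz meets the goals above, but it is not the only suitable capture rate. Since 36 Hz cannot be simply extracted from 60 Hz, 60 Hz does not provide a suitable rate for capture. However, 72 Hz can be used for capture, with every other frame then used as the basis for 36 Hz distribution. The motion blur from using every other frame of 72 Hz material is half that of 36 Hz capture. Tests of the motion blur appearance of every third frame from 72 Hz show that staccato strobing at 24 Hz is objectionable. However, using every other frame from 72 Hz for 36 Hz display is not objectionable to the eye compared with 36 Hz native capture.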
【００９０】
したがって、３６Ｈｚは、７２Ｈｚにおけるキャプチャリングによって、７２Ｈｚ表示器に、非常に円滑な動きを提供する機会を与えるが、７２Ｈｚのネイティブキャプチャーマテリアルの代わりのフレームを用いて３６Ｈｚ分布速度を達成し次に２−１−２プルダウンを利用して６０Ｈｚ画像をもたらすことによって、２４Ｈｚマテリアルより優れた動きを６０Ｈｚ表示器に提供する。
【００９１】
要するに、表３は、本発明による捕獲と分布のために好ましい最適の時相速度を示す。
【００９２】
【表３】

【００９３】
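It is worth noting that this technique of using alternate frames from a 72 Hz camera to achieve a 36 Hz distribution rate can benefit from an increased motion blur duty cycle. The normal 50% duty cycle at 72 Hz, which yields a 25% duty cycle at 36 Hz, has been shown to be acceptable and represents a significant improvement over 24 Hz on 60 Hz and 72 Hz displays. However, if the duty cycle is increased to the 75-90% range, the 36 Hz samples begin to approach the more common 50% duty cycle. Increasing the duty cycle can be accomplished, for example, by using "backing store" CCD designs that have a short blanking time and yield a high duty cycle. Other methods can be used, including dual CCD multiplexed designs.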
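The duty-cycle relationship in this paragraph is a one-line calculation. The sketch below assumes the effective duty cycle scales inversely with the number of captured frames per retained frame; the function name is hypothetical.

```python
def effective_duty_cycle(capture_duty: float, keep_every: int) -> float:
    """Duty cycle seen at the reduced rate when only one of every
    `keep_every` captured frames is retained."""
    return capture_duty / keep_every


if __name__ == "__main__":
    for duty in (0.50, 0.75, 0.90):
        at_36 = effective_duty_cycle(duty, keep_every=2)
        print(f"{duty:.0%} at 72 Hz -> {at_36:.1%} at 36 Hz")
```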
【００９４】
変形ＭＰＥＧ−２圧縮
有効な記憶と分布を行うため、３６Ｈｚという好ましい時相速度を有するディジタルソースマテリアル（digital source material）を圧縮しなければならない。本発明の好ましい形態の圧縮は、ＭＰＥＧ−２標準の新規の変形を利用して達成されるが、類似の特性を有する他の圧縮システム（例えばＭＰＥＧ−４）で利用できる。
【００９５】
ＭＰＥＧ−２の基本原理。ＭＰＥＧ−２は、よりコンパクトな符号化データの形態で画像シーケンスを表すのに有効な方法を提供するビデオシンタックスを定義する国際ビデオ圧縮標準である。符号化ビットの言語は「シンタックス」である。例えば、いくつかのトークンが、６４個の試料の全ブロックを表すことができる。また、ＭＰＥＧは復号（再構成）プロセスを説明し、そのプロセスにおいて、符号化ビットが、画像シーケンスの元の「生」フォーマット中にコンパクトに表すことによってマッピングされる。例えば、前記符号化ビット流中のフラグが、以下のビットを離散的コサイン変換（ＤＣＴ）アルゴリズム又は予報アルゴリズムで復号すべきかどうかの信号を送る。復号化のプロセスを含むこれらのアルゴリズムは、ＭＰＥＧで定義される意味規則によって調整される。このシンタックスは、空間冗長性、時相冗長性、均一な動き、空間マスキングなどの共通のビデオ特性を活用するのに適用することができる。実際に、ＭＰＥＧ−２はデータフォーマットのみならずプログラム用言語を定義する。ＭＰＥＧ−２復号器は、受信データ流を解析し復号するが、そのデータ流がＭＰＥＧ−２のシンタックスに従う限り、広範囲の可能なデータ構造や圧縮技法を使用できる。本発明は、ＭＰＥＧ−２標準を利用して時相と解像度のスケーリングを行う新規の手段と方法を工夫することによって上記適応性を利用する。
【００９６】
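MPEG-2 uses intraframe and interframe methods of compression. In most video scenes, the background remains relatively stable while action takes place in the foreground. The background may move, but a great deal of the scene is redundant. MPEG-2 starts its compression by creating a reference frame called an I (for Intra) frame. I frames are compressed without reference to other frames and thus contain an entire frame of video information. I frames provide entry points into a data bitstream for random access, but can only be moderately compressed. Typically, the data representing I frames is placed in the bitstream every 10 to 15 frames. Thereafter, since only a small portion of the frames that fall between the reference I frames differs from the bracketing I frames, only the differences are captured, compressed, and stored. Two types of frames are used for such differences: P (for predicted) frames and B (for bi-directional interpolated) frames.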
【００９７】
Ｐフレームは一般に、過去のフレーム（Ｉフレーム又は先行Ｐフレーム）を参照して符号化され、そして、一般に、将来のＰフレームの基準として使用される。Ｐフレームはかなり大きい圧縮を受ける。Ｂフレームの画像は最大の圧縮を提供するが、符号化するために、過去と将来の両方の基準が一般に必要である。２方向フレームは、基準フレームとしては決して使用されない。
【００９８】
またＰフレーム内のマクロブロックは、フレーム内符号化法を利用して、個々に符号化することができる。また、Ｂフレーム内のマクロブロックは、フレーム内符号化法、順方向予報符号化法（forward predicted coding）、逆方向予報符号化法もしくはその順方向と逆方向の両方法、又は二方向に補間された予報符号化法を使用して個々に符号化することができる。マクロブロックは、Ｐフレームの場合は一つの動きベクトルとともにそしてＢフレームの場合は１又は２以上の動きベクトルとともに、四つの８×８ＤＣＴブロックからなる１６×１６画素のグルーピングである。
【００９９】
符号化を行った後、ＭＰＥＧのデータビットストリームは、Ｉ、Ｐ及びＢのフレームのシーケンスを含んでいる。一つのシーケンスは、Ｉ、Ｐ及びＢのフレームのほとんどどんなパターンで構成されていてもよい（その配置については、少数の小さい意味の制限がある）。しかし、固定したパターン（例えばＩＢＢＰＢＢＰＢＢＰＢＢＰＢＢ）を有することは産業界のプラクチスでは普通のことである。
【０１００】
本発明の重要部分として、ベース層、少なくとも一つの任意の時相強化層、及び任意の解像度強化層を含むＭＰＥＧ−２データ流がつくられる。これら層各々については詳細に説明する。
【０１０１】
時相スケーラビリティ
ベース層。このベース層は３６Ｈｚソースマテリアルを運ぶために使用される。好ましい実施態様では、二つのＭＰＥＧ−２フレームシーケンスすなわちＩＢＰＢＰＢＰ又はＩＰＰＰＰＰＰのうち一方を、ベース層に使用できる。後者のパターンは、復号器がＰフレームを復号するためにのみ必要なので、最も好ましく、２４Ｈｚの映画がＢフレームなしで復号されたならば、必要なメモリの帯域幅が小さくなる。
【０１０２】
７２Ｈｚの時相強化層。ＭＰＥＧ−２圧縮を利用するとき、Ｐフレーム距離が一定であれば、３６Ｈｚベース層に対するＭＰＥＧ−２シーケンス中に、Ｂフレームとして３６Ｈｚ時相強化層を埋め込むことが可能である。これによって、単一データ流が３６Ｈｚ表示と７２Ｈｚ表示を支持することができる。例えば、これら両方の層は、復号されて、コンピュータモニタに対して７２Ｈｚ信号を生成することができるが、該ベース層だけが復号され、変換されてテレビジョンに対し６０Ｈｚの信号を生成することができる。
【０１０３】
好ましい実施態様では、ＩＰＢＢＢＰＢＢＢＰＢＢＢＰ又はＩＰＢＰＢＰＢＰＢというＭＰＥＧ−２符号化パターンはともに、時相強化Ｂフレームだけを含む別の流れの中に代替フレームを配置して、３６Ｈｚを７２Ｈｚにすることができる。これらの符号化パターンはそれぞれ図２と３に示してある。図３に示す２フレームＰ間隔（2-Frame P spacing）の符号化パターンには、２４Ｈｚの映画がＢフレームなしで復号されたならば、３６Ｈｚの復号器はＰフレームしか復号する必要がないので、必要なメモリの帯域幅が小さくなるという追加の利点がある。
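The idea of moving the B frames into a separate temporal enhancement stream can be sketched by splitting a frame-type pattern. This is an illustration only: frames are represented by their type letter and position in the pattern as written, `split_layers` is a hypothetical helper, and no real MPEG-2 bitstream parsing is involved.

```python
def split_layers(pattern: str):
    """Split a frame-type pattern into a base layer (I and P frames) and a
    temporal enhancement layer (B frames only)."""
    base = [(i, t) for i, t in enumerate(pattern) if t in "IP"]
    enhancement = [(i, t) for i, t in enumerate(pattern) if t == "B"]
    return base, enhancement


if __name__ == "__main__":
    for pattern in ("IPBBBPBBBPBBBP", "IPBPBPBPB"):
        base, enhancement = split_layers(pattern)
        print(pattern)
        print("  base layer       :", "".join(t for _, t in base))
        print("  enhancement layer:", "".join(t for _, t in enhancement))
```

Because B frames are never used as references, a decoder that receives only the base list can still reconstruct every I and P frame.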
【０１０４】
高解像度画像の実験が、図３に示す２フレームＰ間隔がほとんどのタイプの画像にとって最適であることを示唆した。すなわち、図３に示す構造は、６０Ｈｚと７２Ｈｚの両者を支持する最適の時相構造を提供するようであり、一方、最新の７２Ｈｚコンピュータコンパチブル表示器に優れた結果を提供する。この構造は、二つのディジタル流すなわちベース層の３６Ｈｚのディジタル流と、強化層Ｂフレームの３６Ｈｚのディジタル流に７２Ｈｚを達成させる。このことは図４に示し、図４は、３６Ｈｚベース層ＭＰＥＧ−２復号器５０が単純にＰフレームを復号して３６Ｈｚの出力を生成し、次いでその出力は、６０Ｈｚ又は７２Ｈｚの表示に直ちに変換できることを示すブロック図である。任意の第二復号器５２が単純にＢフレームを復号して第二３６Ｈｚ出力を生成し、次いでその出力は前記ベース層復号器５０の前記３６Ｈｚ出力と結合されると、７２Ｈｚ出力が生成する（結合方法については以下で考察する）。別の実施態様では、一つの高速ＭＰＥＧ−２復号器５０が、ベース層のＰフレームと強化層のＢフレームの両者を復号することができる。
【０１０５】
最適のマスターフォーマット。多くの会社が、約１１メガ画素／秒で作動するＭＰＥＧ−２復号チップを製造している。ＭＰＥＧ−２標準は、解像度とフレーム速度についていくつかの「プロファイル」を定義している。これらのプロファイルは、６０Ｈｚのようなコンピュータインコンパチブルフォーマットパラメータ、非正方形画素及びインタレースに向かって強くバイアスされているが、多くのチップ製造業者が、「メインプロファイル、メインレベル」で作動する復号器チップを開発中のようである。このプロファイルは、水平解像度が７２０画素まで、垂直解像度が２５Ｈｚまでで５７６ラインまで及びフレーム速度が３０Ｈｚまでで４８０ラインまでと定義されている。約１．５メガビット／秒〜約１０メガビット／秒の広範囲のデータ速度も指定されている。しかし、チップの観点から、重要な問題は画素が復号される速度である。メインレベル・メインプロファイルの画素速度は約１０．５メガ画素／秒である。
【０１０６】
チップ製造業者によって異なるが、大部分のＭＰＥＧ−２復号器のチップは、実際に、高速支援メモリ（fast support memory）を与えられると、１３メガ画素／秒までで作動する。いくつかの復号器チップは２０メガ画素／秒以上の高速で作動する。ＣＰＵチップが毎年、所定のコストで５０％以上の改良がなされるとすると、ＭＰＥＧ−２復号器チップの画素速度に、近い将来、なんらかのフレキシビリティを期待することができる。
【０１０７】
表４は、いくつかの望ましい解像度とフレーム速度、及びそれらの対応する画素速度を示す。
【０１０８】
【表４】

【０１０９】
これらのフォーマットはすべて、少なくとも１２．６メガ画素／秒を生成できるＭＰＥＧ−２復号器チップで利用できる。３６Ｈｚフォーマットにおいて非常に望ましい６４０×４８０はほぼすべての現行チップによって達成できる。というのはこれらチップの速度が１１．１メガ画素／秒であるからである。ワイドスクリーン１０２４×５１２画像は、１．５：１のスクイーズを使用して６８０×５１２にスクイーズすることができるので、１２．５メガ画素／秒を操作できると、３６Ｈｚで支持できる。１０２４×５１２の非常に望ましい正方形画素のワイドスクリーンテンプレートは、ＭＰＥＧ−２復号器チップが１秒当り約１８．９メガ画素を処理できると、３６Ｈｚを達成できる。このことは、２４Ｈｚと３６ＨｚのマテリアルがＰフレームでのみ符号化される場合、一層実現可能になり、その結果、Ｂフレームは、７２Ｈｚ時相強化層復号器にのみ必要になる。Ｐフレームのみを利用する復号器は小さいメモリと小さいメモリ帯域幅しか必要としないので、１９メガ画素／秒という目標に一層到達可能になる。
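The pixel rates discussed here are the product of width, height, and frame rate. The sketch below recomputes them for formats named in the surrounding text; the list of formats is taken from the prose, not from the missing table, and the function name is hypothetical.

```python
def pixel_rate_mpix(width: int, height: int, fps: float) -> float:
    """Decoded pixel rate in megapixels per second."""
    return width * height * fps / 1e6


if __name__ == "__main__":
    formats = [
        (640, 480, 36),    # square-pixel 640x480 at 36 Hz
        (680, 512, 36),    # 1024x512 squeezed 1.5:1
        (1024, 512, 36),   # square-pixel widescreen template
        (1024, 512, 24),   # widescreen template at the film rate
        (720, 576, 25),    # Main Profile, Main Level limits
        (720, 480, 30),
    ]
    for width, height, fps in formats:
        rate = pixel_rate_mpix(width, height, fps)
        print(f"{width}x{height} @ {fps} Hz: {rate:.2f} Mpixel/s")
```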
【０１１０】
１０２４×５１２解像度のテンプレートは、２４ｆｐｓにおいて、２．３５：１及び１．８５：１のアスペクト比のフィルムで使用されることが最も多い。このマテリアルは１１．８メガ画素／秒のみ必要であり、大部分の既存のメインレベル−メインプロファイル復号器の限度内で適合しなければならない。
【０１１１】
２４Ｈｚ又は３６Ｈｚにおけるベース層用の「マスターテンプレート」中のこれらフォーマットのすべてを図６に示す。したがって、本発明は、従来技術と比較して広範囲のアスペクト比と時相解像度を適合させる独特の方法を提供するものである（マスターテンプレートに関するさらなる考察は以下に述べる）。
【０１１２】
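The B frame temporal enhancement layer that produces 72 Hz can be decoded using a chip with double the pixel rate given above, or by using a second chip in parallel with additional access to the decoder memory. Under the present invention, there are at least two ways to merge the enhancement and base layer data streams to insert the alternate B frames. In the first, merging can be done invisibly to the decoder chip using the MPEG-2 transport layer. The MPEG-2 transport packets for two PIDs (Program IDs) can be recognized as containing the base layer and the enhancement layer, and their stream contents can both simply be passed on to a double-rate-capable decoder chip or to an appropriately configured pair of normal-rate decoders. In the second, the "data partitioning" feature of the MPEG-2 data stream can be used instead of the transport layer from MPEG-2 systems. The data partitioning feature allows the B frames to be marked as belonging to a different class within the MPEG-2 compressed data stream, so that they can be flagged to be ignored by 36 Hz decoders that support only the temporal base layer rate.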
【０１１３】
ＭＰＥＧ−２ビデオ圧縮によって定義される時相スケーラビリティは、本発明の単純なＢフレーム区分ほど最適ではない。ＭＰＥＧ−２時相スケーラビリティは、前のＰフレーム又はＢフレームから順方向にのみ参照されるので、順方向と逆方向の両方に参照される、本願で提案されているＢフレーム符号化で得られる効力を欠いている。したがって、時相強化層としてＢフレームを単純に使用すると、ＭＰＥＧ−２内で定義されている時相スケーラビリティより、一層単純でかつ有効な時相スケーラビリティが提供される。それにもかかわらず、Ｂフレームを、時相スケーラビリティの機構として上記のように使用することは、ＭＰＥＧ−２に充分適合している。また、これらＢフレームを強化層として、Ｂフレームに対するデータ区分又は別のＰＩＤによって識別する二つの方法も充分適合している。
【０１１４】
５０／６０Ｈｚ時相強化層。上記７２Ｈｚ時相強化層（３６Ｈｚの信号を符号化する）に加えて又はこの層の代わりに、６０Ｈｚ時相強化層（２４Ｈｚの信号を符号化する）を、類似の方式で、３６Ｈｚベース層に加えることができる。６０Ｈｚ時相強化層は、既存の６０Ｈｚでインタレース化されたビデオマテリアルを符号化するのに特に有用である。
【０１１５】
大部分の既存６０Ｈｚインタレース化マテリアルは、アナログのＮＴＳＣ、Ｄ１又はＤ２のフォーマット用のビデオテープである。また、少数の日本のＨＤＴＶ（ＳＭＰＴＥ２４０／２６０Ｍ）もある。このフォーマットで作動するカメラもある。このような６０Ｈｚインタレース化フォーマットは既知の方法で処理され、その結果、その信号がデ−インタレース化され、フレーム速度を変換することができる。この処理は、ロボットビジョンと類似の非常に複雑な画像理解法を必要とする。非常に精巧な技法の場合でさえ、時相エイリアシングが一般に、アルゴリズムによる「誤解」をもたらし、時おりアーチファクトを生じる。画像捕獲の一般的な５０％デューティサイクルとは、カメラが１／２の時間「見ていない」ことを意味することに留意すべきである。映画における「逆方向ワゴンホイール」は、時相誤解のこの通常のプラクチスが原因の時相エイリアシングの一例である。このようなアーチファクトは、一般にヒトが支援する再構成なしでは除くことができない。したがって、自動的に修正できない場合が常にある。しかし、現在の技法で利用できる動き変換の結果は、ほとんどのマテリアルに対して妥当なものでなければならない。
【０１１６】
単一の高精細度のカメラ又はテープ機械の価格はかようなコンバータのコストと類似しているであろう。したがって、いくつものカメラやテープ機械を備えたスタジオにおけるこのような変換のコストは適度なものになる。しかし、このような処理を適切に行うことは、現在、ホームとオフィスのプロダクト（home and office products）の予算額（budget）を超えている。したがって、インタレースを除いてそのフレーム速度を、既存マテリアルに対して変換する複雑な処理は、オリジネーションスタジオで達成することが好ましい。これは図５に示してあり、図５は、カメラ６０又は他のソース（例えばノンフィルムビデオテープ）６２から、３６Ｈｚ信号（３６Ｈｚベース層のみ）及び７２Ｈｚ信号（３６Ｈｚベース層プラス時相強化層からの３６Ｈｚ）を出力できるデ−インタレーサ機能とフレーム速度変換機能を含むコンバータ６４への６０Ｈｚインタレース化入力を示すブロック図である。
【０１１７】
７２Ｈｚ信号（３６Ｈｚベース層プラス時相強化層からの３６Ｈｚ）を出力する別法として、この変換法は、３６Ｈｚベース層上に、デ−インタレースされているが元の６０Ｈｚ信号を再生する第二のＭＰＥＧ−２の２４Ｈｚ時相強化層を生成するように適合させることができる。類似の量子化法を、６０Ｈｚ時相強化層のＢフレームに利用すると、Ｂフレームの数は少ないので、データ速度は、７２Ｈｚ時相強化層よりわずかに低いはずである。
【０１１８】
60I → 36 + 36 = 72
60I → 36 + 24 = 60
72 → 36, 72, 60
50I → 36, 50, 72
60 → 24, 36, 72
【０１１９】
米国で関心をもたれている大多数のマテリアルは低解像度のＮＴＳＣである。現在、大部分のホームテレビジョンの大部分のＮＴＳＣ信号には、かなりの損傷が見られる。さらに視聴者は、テレビジョンにフィルムを提供するために３−２プルダウンを使用する際に固有の時相損傷を受容するようになっている。ほぼすべてのプライムタイムのテレビジョンは、２４フレーム／秒のフィルムでつくられる。したがって、スポーツ、ニュース及びその外のビデオオリジナルのショーだけはこの方式で処理する必要がある。これらのショーの３６／７２Ｈｚフォーマットへの変換に関連するアーチファクトと損失は、信号の高品質デ−インタレース化に関連する改良によっておぎないやすい。
【０１２０】
６０Ｈｚ（又は５９．９４Ｈｚ）のフィールドに固有の動きブラーは、７２Ｈｚフレームの動きブラーに極めて類似しているはずであることに留意すべきである。したがってベース層と強化層を提供するこの方法は、動きブラーについて、７２Ｈｚオリジネーションに類似しているはずである。それで、ほとんどの視聴者は、インタレース化された６０ＨｚＮＴＳＣマテリアルが、３６Ｈｚベース層に時相強化層からの２４Ｈｚをプラスして加工されて６０Ｈｚで表示されるとき、わずかな改良として気づくことが可能な場合を除いて、前記差に気付かない。しかし新しい７２Ｈｚディジタル非インタレース化テレビジョンを買う人は、ＮＴＳＣを見るときに小さな改良に気付きそして７２Ｈｚで捕獲されるか又は生じる新しいマテリアルを見るときに大きな改良に気付く。７２Ｈｚ表示器に提供される復号化３６Ｈｚベース層でさえ、高品質のディジタルＮＴＳＣと同じほど良好に見え、インタレースのアーチファクトを、より低いフレーム速度で置換する。
【０１２１】
上記同じ方法は、既存のＰＡＬ５０Ｈｚマテリアルを、第二のＭＰＥＧ−２強化層に変換するのに適用することもできる。ＰＡＬビデオテープは、このような変化を行う前に、最も適切に低速にされる。ライブのＰＡＬは、比較的関連のない速度の５０Ｈｚ、３６Ｈｚ及び７２Ｈｚを利用して変換を行う必要がある。このようなコンバータユニットは、現在、放送信号のソースにおいて入手できるだけであるから、家庭や事務所における各受信装置では現在実用的でない。
【０１２２】
解像度のスケーラビリティ
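Hierarchical resolution scalability utilizing MPEG-2 can be built on top of the base layer to achieve higher resolution, thereby enhancing the base resolution template. Use of enhancement can achieve resolutions of 1.5x and 2x the base layer resolution. Doubling of resolution can be done in two steps, by using 3/2 and then 4/3, or it can be a single factor-of-two step. This is shown in FIG. 7.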
【０１２３】
この解像度増強の方法は、独立のＭＰＥＧ−２ストリームとして解像度強化層をつくり、次いでＭＰＥＧ−２の圧縮を該強化層に適用することによって達成できる。この方法は、ＭＰＥＧ−２によって定義されて高度に有効ではないことが確かめられている「空間スケーラビリティ」とは異なる。しかし、ＭＰＥＧ−２は、有効な階層化解像度を構築して空間スケーラビリティを提供するすべてのツールをもっている。本発明の好ましい階層化解像度の符号化法を図８に示す。本発明の好ましい復号法を図９に示す。
【０１２４】
解像度層の符号化。図８では、２ｋ×１ｋの原画像８０が、好ましくは負のローブを有する最適化フィルタ（下記図１２の考察参照）を使用して、各次元の解像度が１／２にダウンフィルタされて、１０２４×５１２のベース層８１が生成する。このベース層８１は次に通常のＭＰＥＧ−２アルゴリズムによって圧縮され、伝送に適したＭＰＥＧ−２ベース層８２が生成する。重要なことであるが、ＭＰＥＧ−２の完全な動き補償はこの圧縮ステップ中に利用することができる。次にその同じ信号は、通常のＭＰＥＧ−２アルゴリズムを使って、１０２４×５１２の画像８３に復元される。その１０２４×５１２画像８３は第一の２ｋ×１ｋの拡大像８４に拡張される（例えば、画素の複製によって、又は好ましくはスプライン・インターポレーションなどの優れたアップフィルタ類又は負のローブを有するフィルタによって、以下の図１３Ａと１３Ｂの考察参照）。
【０１２５】
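Meanwhile, as an optional step, the filtered 1024×512 base layer 81 is expanded to a second 2k×1k enlarged image 85. This second 2k×1k enlarged image 85 is subtracted from the original 2k×1k image 80 to generate an image that represents the top octave of resolution between the original high resolution image 80 and the original base layer image 81. The resulting image is optionally multiplied by a sharpness factor or weight, and then added to the difference between the original 2k×1k image 80 and the second 2k×1k enlarged image 85 to generate a center-weighted 2k×1k enhancement layer source image 86. This enhancement layer source image 86 is then compressed according to the conventional MPEG-2 algorithm, generating a separate MPEG-2 resolution enhancement layer 87 suitable for transmission. Importantly, full MPEG-2 motion compensation can be used during this compression step.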
【０１２６】
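Resolution decoding. In FIG. 9, the base layer 82 is decompressed, using the conventional MPEG-2 algorithm, into a 1024×512 image 90. The 1024×512 image 90 is expanded to a first 2k×1k image 91. Meanwhile, the resolution enhancement layer 87 is decompressed, using the conventional MPEG-2 algorithm, into a second 2k×1k image 92. The first 2k×1k image 91 and the second 2k×1k image 92 are then added to generate a high resolution 2k×1k image 93.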
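By way of illustration only, the encode and decode data flow of FIGS. 8 and 9 can be sketched on a single row of pixels. This sketch makes several assumptions that are not part of the described embodiment: the MPEG-2 compression and decompression steps are replaced by a lossless pass-through, the down filter is a plain two-pixel average, expansion is by pixel replication, and no sharpness weighting or center weighting is applied. The function names are hypothetical.

```python
def downsize_by_two(row):
    """Average neighbouring pairs (a plain box filter, assumed for brevity)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]


def expand_by_two(row):
    """Expand by pixel replication, the simplest expansion mentioned in the text."""
    return [value for value in row for _ in range(2)]


def encode_layers(original):
    """Return (base layer, enhancement layer) for one row of pixels."""
    base = downsize_by_two(original)      # stands in for the compressed base layer
    expanded = expand_by_two(base)        # expanded (decoded) base layer
    enhancement = [o - e for o, e in zip(original, expanded)]
    return base, enhancement


def decode_layers(base, enhancement):
    """Expand the base layer and add the enhancement layer to it."""
    expanded = expand_by_two(base)
    return [e + d for e, d in zip(expanded, enhancement)]


original = [10, 14, 20, 28, 40, 36, 30, 30]
base, enhancement = encode_layers(original)
reconstructed = decode_layers(base, enhancement)
```

Because nothing is quantized in this sketch, the reconstruction is exact; with real compression of both layers the sum would only approximate the original.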
【０１２７】
ＭＰＥＧ−２を超える改良。本質において、強化層は、復号されたベース層を拡張し、元の画像と復号されたベース層の差をテイクし、次に圧縮することによってつくられる。しかし、圧縮された解像度強化層を、復号後、ベース層に任意に加えて、復号器内に高解像度画像をつくることができる。本発明の階層化解像度符号化法は、下記の種々の点で、ＭＰＥＧ−２空間スケーラビリティと異なっている。
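・The enhancement layer difference picture is compressed as its own MPEG-2 data stream, with I, B, and P frames. This difference represents the major reason that resolution scalability, as proposed here, is effective where MPEG-2 spatial scalability is ineffective. The spatial scalability defined within MPEG-2 allows an upper layer to be coded as the difference between the upper layer picture and the expanded base layer, or as a motion-compensated MPEG-2 data stream of the actual picture, or as a combination of both. However, neither of these encodings is efficient. The difference from the base layer could be considered as an I frame of the difference, which is inefficient compared to a motion-compensated difference picture, as in the present invention. The upper layer encoding defined within MPEG-2 is also inefficient, since it is identical to a complete encoding of the upper layer. The motion-compensated encoding of the difference picture, as in the present invention, is therefore substantially more efficient.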
・強化層は独立したＭＰＥＧ−２データ流であるから、ＭＰＥＧ−２システムのトランスポート層（又は他の類似の機構）を使用して、ベース層と強化層を多重化しなければならない。
・拡張及び解像度低下（ダウン）のフィルタリングは、ガウス関数もしくはスプライン関数、又は負のローブを有するフィルタでもよく（図１２参照）、これらはＭＰＥＧ−２の空間スケーラビリティに規定されている双線形インタポレーションより最適である。
・画像のアスペクト比は、好ましい実施態様の低い層と高い層の間で適合しなければならない。ＭＰＥＧ−２空間スケーラビリティでは、幅及び／又は高さの延長を行うことができる。このような延長は、効率の要件のため、好ましい実施態様では行えない。
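・Because of efficiency requirements and the extreme amounts of compression used in the enhancement layer, the entire area of the enhancement layer is not coded. The area excluded from enhancement is usually the border area. Thus, the 2k×1k enhancement layer source image 86 in the preferred embodiment is center-weighted. In the preferred embodiment, a fading function (such as a linear weighting function) is used to "feather" the enhancement layer from the border edge toward the center of the image, to avoid abrupt transitions in the image. Moreover, any manual or automatic method of determining regions having detail which the eye will follow can be utilized to select regions which need detail and to exclude regions where extra detail is not required. The entire image has detail to the level of the base layer, so the entire image is present. Only the areas of special interest benefit from the enhancement layer. In the absence of other criteria, the edges or borders of the frame can be excluded from enhancement, as in the center-weighted embodiment described above. The MPEG-2 parameters "lower_layer_prediction_horizontal&vertical_offset", used as signed negative integers and combined with the "horizontal&vertical_subsampling_factor_m&n" values, can be used to specify the overall size of the enhancement layer rectangle and its placement within the expanded base layer.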
・鮮鋭度ファクターを強化層に加えて、量子化中に起こる鮮鋭度の損失をオフセットする。このパラメータは、元の写真の明瞭性と鮮鋭性を復元するためにのみ利用し、画像を強化するために利用しないように注意しなければならない。図８に関連して先に述べたように、鮮鋭度ファクターは、元の高解像度画像８０と元のベース層画像８１（拡張後）の間の解像度の「高いオクターブ」である。この高オクターブの画像は、高オクターブの解像度の鮮鋭度とディテールを含んでいることに加えて、全くノイズが多い。この画像を加えすぎると、強化層の動きを補償された符号化が不安定になることがある。加えなければならない量は、元の画像のノイズのレベルによって決まる。一般的な重み付け値は０．２５である。ノイズの多い画像には、鮮鋭度を決して加えてはならない。そして、ディテールを保持する従来のノイズ抑制法を使用して、圧縮前の強化層に対し元のノイズを抑制することが得策である。
・時相と解像度のスケーラビリティは、ベース層と解像度強化層の両者において３６Ｈｚから７２Ｈｚへ時相強化を行うためＢフレームを利用することによって混合される。このように、復号性能の四つの可能なレベルは、時相スケーラビリティの二つのレベルで利用可能なオプションがあるので、二つの層の解像度スケーラビリティで可能である。
【０１２８】
これらの差は、ＭＰＥＧ−２の空間スケーラビリティと時相スケーラビリティを超える実質的な改良を示す。しかし、これらの差は、ＭＰＥＧ−２復号器チップと一致しているが、図９に示す解像度強化復号法で拡張と付加を行うには、その復号器に追加の論理が必要である。このような追加の論理は、余り有効でないＭＰＥＧ−２の空間スケーラビリティによって要求される論理とほぼ同一である。
【０１２９】
解像度強化層の任意の非ＭＰＥＧ−２符号化。解像度強化層に対して、ＭＰＥＧ−２とは異なる圧縮法を利用することが可能である。さらに、解像度強化層に対して、ベース層に対する圧縮法と同じ圧縮法を使用する必要はない。例えば、差分層（difference layer）が符号化されるとき、動きを補償されたブロックウェーブレット（motion-compensated block wavelet）を利用して、高い効率で、ディテールに合わせて追跡することができる。ウェーブレットを配置するのに最も有効な位置が、差の大きさが変化するため、スクリーンのまわりでジャンプしても、低振幅の強化層内では気付かれないであろう。さらに、全画像をカバーすることは不要である。すなわち、該ウェーブレットを、ディテールの上に配置することだけが必要である。これらウェーブレットは、画像内のディテール領域による案内で配置することができる。また、その配置は、端縁からバイアスさせることもできる。
【０１３０】
多重解像度強化層。ここで述べるビット速度では、７２フレーム／秒の２メガ画素（２０４８×１０２４）が１８．５メガビット／秒で符号化される場合、ベース層（７２ｆｐｓにおける１０２４×５１２）と単一の解像度強化層だけが成功裡に立証された。しかし、解像度強化層符号化法をさらに改善することから効率がさらに改良されると予想され、多重解像度強化層が可能になるであろう。例えば、５１２×２５６のベース層が、四つの層によって、解像度を、１０２４×５１２、１５３６×７６８及び２０４８×１０２４に強化できると考えられる。このことは、２４フレーム／秒という映画のフレーム速度での、既存のＭＰＥＧ−２符号化法で可能である。７２フレーム／秒などの高いフレーム速度では、ＭＰＥＧ−２は、解像度強化層を符号化する際、この多数の層を現在、許容するのに充分な効率を提供しない。
【０１３１】
マスタリングフォーマット
２０４８×１０２４の画素又はこれに近い画素のテンプレートを使用して、各種のリリースフォーマットに対する単一のディジタル移動画素マスターフォーマットのソースをつくることができる。図６に示すように、２ｋ×１ｋのテンプレートは、通常のワイドスクリーンのアスペクト比：１．８５：１と２．３５：１を有効に支持することができる。また、２ｋ×１ｋのテンプレートは、１．３３：１及びその外のアスペクト比も受け入れることができる。
【０１３２】
整数（特に２のファクター）及び単分数（３／２及び４／３）が解像度階層化の際の最も有効なステップサイズであるが、任意の比率を利用して、必要な解像度階層化を達成することも可能である。しかし、２０４８×１０２４のテンプレート又はそれに近いものを使用すると、高品質のディジタルマスターフォーマットが提供されるだけでなく、ＮＴＳＣすなわち米国のテレビジョン標準を含む、二つのベース層（１ｋ×５１２）のファクターから多くの他の便利な解像度を提供することができる。
【０１３３】
４ｋ×２ｋ、４ｋ×３ｋ又は４ｋ×４ｋなどのより高い解像度でフィルムを走査することもできる。任意の解像度強化を利用して、これらのより高い解像度を、２ｋ×１ｋに近い中央マスターフォーマット解像度からつくることができる。フィルムのためのこのような強化層は、画像のディテール、粒子及び他のノイズ源（例えばスキャナノイズ）で構成されている。このようにノイズがあるので、このような非常に高い解像度を得るため強化層に圧縮法を使用するには、ＭＰＥＧ−２タイプの圧縮に代わる圧縮法が必要である。幸いにも、このようなノイズの多い信号を圧縮するのに利用できるが、依然として所望のディテールを画像に維持する他の圧縮法がある。このような圧縮法の一例は、動き補償ウェーブレット又は動き補償フラクタルである。
【０１３４】
ディジタルマスタリングフォーマットは、既存の映画からつくられる場合そのフィルムのフレーム速度（すなわち２４フレーム／秒）でつくらねばならない。３−２プルダウンとインタレースの両者を共用することは、ディジタルフィルムマスターに対しては不適当である。新しいディジタル電子マテリアルとして、６０Ｈｚインタレースを使うことは近い将来なくなり、ここで提案される７２Ｈｚなどの、コンピュータとよりコンパチブルなフレーム速度によって代替されると考えられる。ディジタル画像マスターは、７２Ｈｚ、６０Ｈｚ、３６Ｈｚ、３７．５Ｈｚ、７５Ｈｚ、５０Ｈｚ又は他のフレーム速度にかかわらず、画像が捕獲されるどんなフレーム速度においてもつくらねばならない。
【０１３５】
すべての電子リリースフォーマット用の単一ディジタルソース写真フォーマットとしてのマスタリングフォーマットの概念は、ＰＡＬ、ＮＴＳＣ、レターボックス、パン−アンド−スキャン、ＨＤＴＶなどのマスターがすべて、フィルムオリジナルから、一般に独立してつくられる既存のプラクチスと異なっている。マスタリングフォーマットを使用すると、フィルムショーとディジタル／電子ショーの両者を、各種の解像度とフォーマットでリリースするために、一度にマスターリングすることが可能になる。
【０１３６】
結合された解像度強化層と時相強化層
上記のように、時相強化と解像度強化の階層化は結合することができる。時相強化は、Ｂフレームを復号することによって行われる。また解像度強化層は二つの時相層を有しているのでＢフレームを含んでいる。
【０１３７】
２４ｆｐｓのフィルムの場合、最も有効でかつ低コストの復号器はＰフレームだけを使用できるので、Ｂフレームの復号操作を省くことによって復号器を単純化するのみならずメモリおよびメモリの帯域幅の両者を最小限にすることができる。したがって、本発明によれば、２４ｆｐｓの映画の復号及び３６ｆｐｓのアドバンスドテレビジョンの復号を行うのに、Ｂフレームの性能なしの復号器を利用できる。次に、図３に示すように、ＢフレームがＰフレーム間に利用されて、７２Ｈｚのより高い時相層を得ることができ、そのＢフレームは第二復号器によって復号できる。この第二復号器はＢフレームだけを復号すればよいので単純化することができる。
【０１３８】
この階層化は、２４と３６のｆｐｓ速度に対して同様にＰフレームとＩフレームだけを利用できる。解像度が強化された層にも当てはまる。その解像度強化層は、その層内でのＢフレームの復号を加えることによって、７２Ｈｚの完全時相速度を高い解像度で加えることができる。
【０１３９】
復号器に対する組み合わされた解像度と時相の拡大縮小可能なオプションを図１０に示す。またこの実施例は、本発明の空間時相階層化アドバンスドテレビジョンを達成するための、約１８メガビット／秒のデータ流の比率の配分を示す。
【０１４０】
図１０において、ベース層ＭＰＥＧ−２の１０２４×５１２画素データ流（好ましい実施態様ではＰフレームだけを含んでいる）が基準解像度復号器１００に加えられる。Ｐフレームに対しては、約５メガビット／秒の帯域幅が必要である。基準解像度復号器１００は２４ｆｐｓ又は３６ｆｐｓで復号することができる。基準解像度復号器１００の出力は、低解像度、低フレーム速度の画像（２４Ｈｚ又は３６Ｈｚの１０２４×５１２画素）を含んでいる。
【０１４１】
同じデータ流からのＢフレームが解析され、基準解像度時相強化層復号器１０２に加えられる。このようなＢフレームに対しては約３メガビット／秒の帯域幅が必要である。また、基準解像度復号器１００の出力は時相強化層復号器１０２にも連結される。その時相強化層復号器１０２は３６ｆｐｓで復号することができる。時相強化層復号器１０２の結合された出力は、低解像度でかつ高フレーム速度の画像（７２Ｈｚの１０２４×５１２の画素）を含んでいる。
【０１４２】
また図１０では、解像度強化層ＭＰＥＧ−２の２ｋ×１ｋ画素データ流（好ましい実施態様ではＰフレームだけを含有している）が、基準時相高解像度強化層復号器１０４に適用される。そのＰフレームに対しては約６メガビット／秒の帯域幅が必要である。また基準解像度復号器１００の出力は高解像度強化層復号器１０４にも連結される。その高解像度強化層復号器１０４は２４ｆｐｓ又は３６ｆｐｓで復号することができる。高解像度強化層復号器１０４の出力は、高解像度でかつ低フレーム速度の画像（２４Ｈｚ又は３６Ｈｚの２ｋ×１ｋ画素）を含んでいる。
【０１４３】
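B frames from the same data stream are parsed out and applied to a high resolution temporal enhancement layer decoder 106. Approximately 4 megabits/second of bandwidth is required for such B frames. The output of the optional resolution enhancement layer decoder 104 is coupled to the high resolution temporal enhancement layer decoder 106. The output of the temporal enhancement layer decoder 102 is also coupled to the high resolution temporal enhancement layer decoder 106. The high resolution temporal enhancement layer decoder 106 can decode at 36 fps. The combined output of the high resolution temporal enhancement layer decoder 106 comprises high resolution, high frame rate images (2k×1k pixels at 72 Hz).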
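As a rough check on the allocation described for FIG. 10, the approximate bandwidth figures quoted above can be summed for each of the four decoder options. The dictionary keys and the function name below are hypothetical labels chosen for this sketch; only the megabit figures and the decoder reference numerals come from the description.

```python
# Approximate stream bandwidths quoted for FIG. 10, in megabits/second.
STREAMS = {
    "base_P": 5,  # 1024x512 base layer, P frames
    "base_B": 3,  # base layer temporal enhancement, B frames
    "enh_P": 6,   # 2k x 1k resolution enhancement layer, P frames
    "enh_B": 4,   # resolution enhancement layer, B frames
}

# Streams each decoder option needs, following the couplings in FIG. 10.
DECODER_OPTIONS = {
    "100: low resolution, 24/36 Hz": ["base_P"],
    "102: low resolution, 72 Hz": ["base_P", "base_B"],
    "104: high resolution, 24/36 Hz": ["base_P", "enh_P"],
    "106: high resolution, 72 Hz": ["base_P", "base_B", "enh_P", "enh_B"],
}


def required_rate(option):
    """Total bandwidth (megabits/second) a decoder option must receive."""
    return sum(STREAMS[name] for name in DECODER_OPTIONS[option])


rates = {option: required_rate(option) for option in DECODER_OPTIONS}
```

The full high resolution, 72 Hz option adds up to about 18 megabits/second, which matches the overall data stream allocation stated for this example.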
【０１４４】
この拡大縮小可能な符号化機構によって達成される圧縮比は、非常に高くて優れた圧縮効率を示すことに注目すべきである。図１０に示す実施例由来の時相オプションとスケーラビリティオプション各々に対する圧縮比を表５に示す。これらの比率は、２４ビット／画素における原始ＲＧＢ画素に基づいている（通常の４：２：２符号化の１６ビット／画素又は通常の４：２：０符号化の１２ビット／画素を要因として入れると、圧縮比はそれぞれ、表５に示した値の３／４及び１／２になる）。
【０１４５】
【表５】

【０１４６】
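These high compression ratios are enabled by three factors:
(1) the high temporal coherence of high frame rate 72 Hz images;
(2) the high spatial coherence of high resolution 2k×1k images;
(3) application of resolution detail enhancement to the important parts of the image (e.g., the central portion), and not to the less important parts (e.g., the borders of the frame).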
【０１４７】
これらの要因は、ＭＰＥＧ−２符号化シンタックスの強さ（strength）を利用することによって、本発明の階層化圧縮法で活用される。これらの強さは、時相スケーラビリティに対して２方向に内挿されたＢフレームを含む。また、このＭＰＥＧ−２シンタックスは、ベース層と強化層の両者に動きベクトルを使用することによって有効な動きを表現する。高いノイズと迅速な画像の変化のいくらかのしきい値まで、ＭＰＥＧ−２は、ＤＣＴ量子化とともに動き補償によって、強化層内のノイズの代わりに、符号化のディテールにおいて有効である。このしきい値を超えると、データ帯域幅は、ベース層に最もよく配分される。これらのＭＰＥＧ−２の機構は、本発明にしたがって使用されると、協力して働き、時相と空間の両方を拡大縮小可能である高度に効率的でかつ有効な符号化を行う。
【０１４８】
ＣＣＩＲ６０１ディジタルビデオの５メガビット／秒符号化と比較して、表５に示す圧縮比ははるかに高い。その原因の一つは、インタレースが原因でいくらかのコヒーレンスが損失することである。インタレースは、次のフレームとフィールドを予測する性能及び垂直方向に隣接している画素の相関関係に負の影響をする。したがって、ここで述べる圧縮効率の利得の主な部分は、インタレースがないことが原因である。
【０１４９】
本発明で達成される大きな圧縮比は、各ＭＰＥＧ−２マクロブロックを符号化するのに利用可能なビットの数の釣合いから考えることができる。上記のように、マクロブロックは、Ｐフレームに対する一つの動きベクトル及びＢフレームに対する１又は２以上の動きベクトルを有する、四つの８×８ＤＣＴブロックからなる１６×１６画素のグルーピングである。各層の一マクロブロック当り、利用可能なビットを表６に示す。
【０１５０】
【表６】

【０１５１】
各マクロブロックを符号化するためのビットの利用可能な数は、ベース層より強化層の方が少ない。ベース層はできるだけ性質が優れていることが望ましいので、上記のことが適切である。動きベクトルは８ビット程度が必要であり、マクロブロックタイプの符号、及び四つの８×８ＤＣＴブロックのすべてに対するＤＣ係数とＡＣ係数に対して１０〜２５ビットを残す。これはごく少数の「戦略的」ＡＣ係数にしか空間を残さない。したがって、各マクロブロックに利用可能な情報の大部分は、統計的に、強化層の前のフレームから出なければならない。
【０１５２】
強化差分画像（enhancement difference image）で表されるディテールの高オクターブを示すため充分なＤＣ係数とＡＣ係数を符号化するために利用可能なデータ空間が充分にないので、ＭＰＥＧ-２空間スケーラビリティがこれらの圧縮比では役に立たない理由は容易に分かる。この高いオクターブは、第五〜第八の水平ＡＣ係数と垂直ＡＣ係数で表される。１ＤＣＴブロック当り利用可能なビットがごく少数しかない場合、これらの係数には到達できない。
【０１５３】
ここに述べるシステムは、前の強化差分フレームからの動きを補償された予測を利用することによって、その効力を得る。このことは、時相と解像度（空間）の階層化された符号化に優れた結果をもたらすのに明白に有効である。
【０１５４】
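Graceful degradation. The temporal and resolution scaling techniques described here work well for normal-running material at 72 frames per second using a 2k×1k original source. These techniques also work well on film-based material which runs at 24 fps. At high frame rates, however, when a very noisy image is coded, or when there are numerous shot cuts within an image stream, the enhancement layers may lose the coherence between frames that is necessary for effective coding. Such loss is easily detected, since the buffer-fullness/rate-control mechanism of a typical MPEG-2 encoder/decoder will attempt to set the quantizer to very coarse settings. When this condition is encountered, all of the bits normally used to encode the resolution enhancement layers can be allocated to the base layer, since the base layer will need as many bits as possible in order to code the stressful material. For example, at between about 0.5 and 0.33 megapixels per frame for the base layer, at 72 frames per second, the resultant pixel rate will be 24 to 36 megapixels/second. Applying all of the available bits to the base layer provides about 0.5 to 0.67 million additional bits per frame at 18.5 megabits/second, which should be sufficient to code very well, even on stressful material.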
【０１５５】
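Under more extreme cases, where every frame is very noisy and/or there are cuts happening every few frames, it is possible to degrade gracefully even further without loss of resolution in the base layer. This can be done by removing the B frames that code the temporal enhancement layer, thus allowing use of all of the available bandwidth (bits) for the I and P frames of the base layer at 36 fps. This increases the amount of data available for each base layer frame to between about 1.0 and 1.5 megabits/frame (depending on the resolution of the base layer). This still yields the fairly good motion rendition rate of 36 fps at the fairly high quality resolution of the base layer, even under extremely stressful coding conditions. However, if the base layer quantizer is still operating at a coarse level at about 18.5 megabits/second and 36 fps, then the base layer frame rate can be dynamically reduced to 24, 18, or even 12 frames per second (which would make available between 1.5 and 4 megabits for every frame), which should be able to handle even the most pathological moving image types. Methods for changing frame rate in such circumstances are known in the art.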
【０１５６】
米国のアドバンスドテレビジョンに対する現在の提案は、これらの適切な縮退法を許容できないので、本発明のシステムと同じようには、ストレスの多いマテリアル対して良好に機能できない。
【０１５７】
大部分のＭＰＥＧ−２符号器では、適応できる量子化レベルは、出力されるバッファフルネスによって制限される。本発明の解像度強化層に関連する高い圧縮比においては、この機構は最適には機能できない。各種の方法を利用して、最も適切な画像領域にデータを最適に割り当てることができる。概念的に最も簡単な方法は、解像度強化層の符号化のプレパス（pre-pass）を実施して、統計データを集め、かつ保存しなければならないディテールをさがし出すことである。前記プレパスから得た結果を利用して、適応性のある量子化を設定し、解像度強化層のディテールの保存を最適化することができる。これらの設定は、画像に対して不均一に、人工的にバイアスすることも可能であり、その結果、画像のディテールは、バイアスされて、主要スクリーン領域に、そしてフレームの端縁のマクロブロックからはなして割り当てられる。
【０１５８】
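Except for leaving an enhancement layer border at the higher frame rates, none of these adjustments are required, since existing decoders function well without such improvements. However, these further improvements are available with a small extra effort in the enhancement layer encoder.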
【０１５９】
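Conclusion
The choice of 36 Hz as a new common ground temporal rate appears to be optimal. Demonstrations of the use of this frame rate indicate that it provides significant improvement over 24 Hz for both 60 Hz and 72 Hz displays. Images at 36 Hz can be created by utilizing every other frame from 72 Hz image capture. This allows combining a base layer at 36 Hz (preferably using P frames) and a temporal enhancement layer at 36 Hz (using B frames) to achieve a 72 Hz display.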
【０１６０】
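The "future-looking" rate of 72 Hz is not compromised by the inventive approach, while a transition path is provided for 60 Hz analog NTSC display. The invention also allows a transition for other 60 Hz displays, if other passive-entertainment-only (computer incompatible) 60 Hz formats under consideration are accepted.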
【０１６１】
解像度のスケーラビリティは、解像度強化層に対し別のＭＰＥＧ−２画像データ流を使用することによって達成することができる。解像度スケーラビリティはＢフレーム法を利用して、ベース解像度層と強化解像度層の両者に時相スケーラビリティを提供することができる。
【０１６２】
ここに述べる発明は、多数の非常に望ましい特徴を達成する。解像度スケーラビリティ又は時相スケーラビリティは、地上放送で利用できる約１８．５メガビット／秒にて、高精細度の解像度で達成できないと、米国アドバンスドテレビジョンプロセスの関係者が主張している。しかし本発明は、この利用可能なデータ速度内で、時相スケーラビリティと空間解像度スケーラビリティの両者を達成する。
【０１６３】
また、高フレーム速度の２メガ画素は、利用可能な１８．５メガビット／秒のデータ速度ではインタレースを使用することなしで達成することができないと主張されている。しかし、本発明は、空間解像度と時相のスケーラビリティを達成するのみならず、７２フレーム／秒で２メガ画素を提供することができる。
【０１６４】
これらの性能を提供するのに加えて、本発明は、特に、アドバンスドテレビジョンプロセスに対する現在の提案に比べて非常に堅牢でもある。これは、非常にストレスの多い画像マテリアルに遭遇したとき、大部分のビット又はすべてのビットをベース層に割り当てることによって可能になる。このようなストレスの多いマテリアルは、本来、ノイズが多くしかも非常に速く変化する。このような環境で、眼は、解像度の強化層に関連するディテールを見ることができない。前記ビットはベース層に適用されるので、その複製されるフレームは、単一で一定の高解像度を利用する、現在提案されているアドバンスドテレビジョンシステムより実質的に正確である。
【０１６５】
このように、本発明のシステムのこの側面は、知覚と符号化の効率を最適化し、最高の視感インパクトを与える。このシステムは、多くの人が不可能であると考えてきた解像度とフレーム速度の性能の非常に清浄な画像を提供する。本発明のシステムのこの側面は、現時点までに提案されているアドバンスドテレビジョンフォーマットより性能が優れていると考えられる。この予想される一層優れた性能に加えて、本発明は、時相と解像度の階層化という価値の高い特徴も提供する。
【０１６６】
上記考察では、その実施例に、ＭＰＥＧ−２を利用したが、本発明のこれらの及び他の側面は、他の圧縮システムを利用して実施することができる。例えば本発明は、ＭＰＥＧ−１、ＭＰＥＧ−２、ＭＰＥＧ−４、Ｈ．２６３などの圧縮システム（ウェイブレットなどの非ＤＣＴシステムを含む）のようなＩ、Ｂ、及びＰのフレーム又はその均等物を提供する類似の標準によって作動する。
【０１６７】
階層化圧縮の強化
概要
上記実施態様のいくつかの強化を行って、ビデオ画質と圧縮の各種問題点を処理することができる。以下に、このような強化のいくつかを説明するが、これらの強化は大部分、好ましくは、画像を強化し次いでその画像を圧縮するタスクに適用できる一組のツールとして実施される。これらのツールは、コンテント・デベロッパ（content developer）によって各種の方式で所望どおりに結合されて、圧縮されたデータ流、特に階層化された圧縮データ流の視感画質と圧縮効率を最適化することができる。
【０１６８】
強化層の動きベクトルとグレイバイアス（gray bias）
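A common way to encode a resolution enhancement layer using MPEG-type compression (e.g., MPEG-2, MPEG-4, or comparable systems) is to bias the difference picture with a gray bias. In the common 8-bit pixel value range of 0 = black to 255 = white, the half-way point of 128 is commonly used as the gray bias value. Values below 128 represent negative differences between images, and values above 128 represent positive differences between images. (For 10-bit systems, gray would be 512, with 0 = black and 1023 = white, and so forth for other bit ranges.)
【０１６９】
The difference picture is found by subtracting an expanded and decompressed base layer from the original high resolution image. Sequences of these difference pictures are then encoded as an MPEG-type difference picture stream of frames, which operates as a normal MPEG-type picture stream. The gray bias value is removed when each difference picture is added to another image (for example, the expanded decoded base layer) in order to improve resolution.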
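The gray bias arithmetic for the 8-bit case can be sketched as follows. This is a minimal illustration, not the described encoder: the function names are hypothetical, the difference picture is passed through without compression, and the clamping to the 0 to 255 range is an assumption of the sketch (large differences would be clipped).

```python
GRAY_BIAS = 128  # mid-point of the 8-bit range, as stated in the description


def clamp8(value):
    """Clamp to the 8-bit pixel range (an assumption of this sketch)."""
    return max(0, min(255, value))


def to_biased_difference(original, expanded_base):
    """Difference picture with gray bias: 128 means 'no difference'."""
    return [clamp8(o - e + GRAY_BIAS) for o, e in zip(original, expanded_base)]


def from_biased_difference(biased, expanded_base):
    """Remove the gray bias while adding the difference back to the base."""
    return [clamp8(e + (b - GRAY_BIAS)) for b, e in zip(biased, expanded_base)]


original = [100, 130, 200, 90]
expanded_base = [104, 126, 190, 95]
biased = to_biased_difference(original, expanded_base)
restored = from_biased_difference(biased, expanded_base)
```

Values below 128 in `biased` mark pixels where the original is darker than the expanded base layer, and values above 128 mark pixels where it is brighter.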
【０１７０】
ノイズを減らすのに有益なもう一つのソースは、前と次のフレームからの情報（すなわち時間的中央値、時相中央値）である。以下に述べるように、動き解析は、動く領域に対しては最良の整合を提供する。しかし、動き解析は計算集中的（compute intensive）である。画像の一領域が動いていないか又はゆっくり動いている場合、現行画素からのレッド値（及びグリーン値とブルー値）は、前のフレームと次のフレーム中の同じ画素位置のレッド値でフィルタされた中央値でよい。しかし、有意な動きがあってしかもかような時相フィルタが使用されると、異常なアーチファクトが起こることがある。したがって、しきい値を第一に選んで、このような中央値が、現行画素の値から、選択された大きさを超えて異なっているかどうかを確認することが好ましい。そのしきい値は上記デ−インタレース化のしきい値の場合とほぼ同様にして、下記のようにして計算することができる。
Ｒｄｉｆｆ＝Ｒ＿現行＿画素マイナスＲ＿時相＿中央値
Ｇｄｉｆｆ＝Ｇ＿現行＿画素マイナスＧ＿時相＿中央値
Ｂｄｉｆｆ＝Ｂ＿現行＿画素マイナスＢ＿時相＿中央値
しきい値処理値＝ａｂｓ（Ｒｄｉｆｆ＋Ｇｄｉｆｆ＋Ｂｄｉｆｆ）＋ａｂｓ（Ｒｄｉｆｆ）＋ａｂｓ（Ｇｄｉｆｆ）＋ａｂｓ（Ｂｄｉｆｆ）
【０１７１】
上記しきい値処理値を次にしきい値設定値と比較する。典型的なしきい値設定値は０．１〜０．３の範囲内であり、０．２が一般的である。そのしきい値より高ければ、現行値が保持される。そのしきい値より低ければ時間的中央値が使用される。
【０１７２】
追加の中央値のタイプは、Ｘ、Ｙ及び時間的中央値から選択される中央値である。もう一つの中央値のタイプは、時間的中央値を選び、次にそれからのＸとＹの中央値の等平均値を選択する。
【０１７３】
各タイプの中央値は問題を起こすことがある。ＸとＹの中央値は画像を不鮮明にし（smear）かつブラー（blur）させるので、画像は「グリーシー（greasy）」に見える。時間的中央値は、時間が経過するにつれて動きを不鮮明にする。各中央値は、問題をもたらししかも各中央値の特性が異なっているので（ある意味では「直交している（orthogonal）」）、各種の中央値を組み合わせることによって最良の結果が得られることが実験によって確認された。
【０１７４】
特に、中央値の好ましい組合せは、現行画像の各画素に対する値を決定する下記５項目の線形重み付け合計（線形ビデオプロセッシングに関する上記考察参照）である。
現画像の５０％（したがって最大のノイズ低下は３ｄｂ又は１／２である）；
ＸとＹの中央値の平均値の１５％；
しきい値処理された時間的中央値の１０％；
しきい値処理された時間的中央値のＸとＹの中央値の平均値の１０％；
及び３ウェイのＸ、Ｙ及び時間的中央値の１５％。
【０１７５】
このように、ベース層は、高解像度で強化された画像より狭いか又は短い（又は両方）画像の大きさを表すことができる。その結果、該強化層は、実際の写真を含んでいるのみならず、拡張された復元ベース層（すなわち拡張されたベース領域１１０２）の大きさに対応するグレイでバイアスされた画像の差分写真を含んでいる。圧縮された強化層は標準のＭＰＥＧタイプの写真流として符号化されるので、端縁領域の実際の写真でありそして内部領域が差分写真であることは識別されず、両者符号化されて、フレームの同じ写真流でともにはこばれる。
【０１７６】
好ましい実施態様では、拡張された復元ベース層の大きさの外側の端縁領域は、通常の高解像度ＭＰＥＧタイプ符号化流である。上記端縁領域は、高解像度写真の通常のＭＰＥＧタイプ符号化に対応する効率を有している。しかし、それは端縁領域であるから、差分写真領域内の動きベクトルは、境界領域（実際の写真情報を含む）を指向しないように拘束しなければならない。また、境界の実写真領域の動きベクトルは、内部差分写真領域を指向しないように拘束しなければならない。このように、境界の実写真領域の符号化と差分写真領域の符号化は当然分離される。
【０１７７】
上記のことは、原画像のすべての動きベクトルを見つけることによって達成できるが、動きベクトルを、内側の差分画像領域と外側境界の実写真領域の間の境界を横切らないように強制する。マクロブロックの境界が、該内側差分写真領域と外側境界の実写真領域との間の境界に入る場合、上記のことは最高に実施される。その外に、実写真の境界を有する差分写真端縁がマクロブロックの中央領域内にある場合、差分写真領域と実写真境界の間の遷移を達成するため符号化する際に追加のビットを使用する必要がある。したがって、マクロブロックの境界が、内側差分写真領域と外側境界の実写真領域の間の端縁と同じ端縁に存在している場合、最大の効率が得られる。
【０１７８】
これらのハイブリッド差−プラス−実写真画像−拡張強化層写真を、符号化中の量子化器や速度バッファ制御装置は、境界実写真領域の信号の大きさが、内側差分写真領域のそれより大きいことを識別するため特別に調節する必要があることに留意すべきである。
【０１７９】
境界実写真領域の大きさについてこの方法を使用する際、トレードオフがある。境界の伸長が小さい場合、全流れに比例するビットの数は小さいが、動きベクトルの数が整合しないのでその小面積の相対効率は低下する。というのは、この整合が該境界領域の端縁から外れているからである。これを調べるもう一つの方法は、辺／面積比（proportion of edge to area）が非常に小さい通常の画像長方形と異なり、境界領域が高い辺／面積比を有していることである。ＭＰＥＧ−２又はＭＰＥＧ−４などの圧縮によって通常、符号化される、通常のディジタルビデオの典型的な内部長方形写真領域は、フレーム内の領域の大部分が、フレーム端縁の領域を除いて、通常、前のフレーム中に存在しているので、動きベクトルを見つけるとき、高い整合度を有している。例えば、パン上で、写真が現れるスクリーンの方向は、画像が各フレームに対してオフスクリーンから現れるので、一つの端縁に無から写真をつくらせねばならない。しかし、大部分の通常の写真長方形は前のフレームにおいてオンスクリーンであるので、動きベクトルを整合させることが最も多い。
【０１８０】
しかし、この本発明の境界伸長法を利用すると、該境界領域は、動きを補償する場合、前のフレーム中のオフスクリーンミスマッチ（off-screen mismatch）の比率が非常に高い。というのは、そのスクリーンの外側端縁と差分写真の内側端縁がともに、動きベクトルについて「制御対象外（out-of-bounds）」だからである。このように、効率がいくらか損失することは、ビット／画像面積（又は均等なビット／面積の尺度であるビット／画素もしくはビット／マクロブロック）として考察すると、この方法に固有のものである。したがって、境界領域が比較的小さいとき、この相対的非効率は、許容可能な全ビット速度の充分小さい部分である。境界が比較的大きい場合、同様に、効率が高くなり、その部分はやはり許容可能である。中位の大きさの境界はパン中にいくらか非効率になるが、この非効率は許容可能である。
【０１８１】
効率をこの技法を使用して回復させることができる一方法は、ベース層解像度／強化層解像度のより単純な比率、例えば３／２、４／３及び特に２という完全ファクター（exact factor）をより狭いベース層に使える方法である。２というファクターを使用すると、特に、ベース層と解像度強化層を使用して全体を符号化する際に有意な効率を得るのに役立つ。
【０１８２】
また、低解像度の画像はより狭いスクリーンに最も自然に使用できるが、高解像度の画像は、より大きく、幅が広く及び／又は高さが高いスクリーンでより自然に見ることができる。
【０１８３】
ベース層解像度画像に対して「パン（pan）と走査」を実施するのに対応して、内側差分写真領域を連続的に動かすか又は再配置することも可能である。そのとき、上部境界は動的なリポジション（dynamic re-position）と大きさと形態を有しているであろう。マクロブロックのアラインメントは、連続的パニングで通常失われるが、注意深くより大きな領域内にカット（cut）をアラインさせれば維持できる。しかし、最も単純で最も有効な構造は、完全なマクロブロックの境界上のベース層に対し内側差分写真の固定位置で心合せしたアラインメントである。
【０１８４】
画像のフィルタリング
ダウンサイジングフィルタとアップサイジングフィルタ
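Experimentation has shown that the down-sizing filter used in creating a base layer from a high resolution original picture is most optimal if it includes modest negative lobes and an extent which stops after the very small first positive lobe that follows the negative lobes. FIG. 12 is a diagram of the relative shape, amplitudes, and lobe polarity of a preferred down-sizing filter. The down filter essentially is a center-weighted function which has been truncated to a central positive lobe 1200, a symmetric pair of adjacent (bracketing) small negative lobes 1202, and a symmetric pair of adjacent (bracketing) very small outer positive lobes 1204. The absolute amplitudes of the lobes 1200, 1202, 1204 may be adjusted as desired, so long as the relative polarities and amplitude inequality relationships shown in FIG. 12 are maintained. However, a good first approximation for the relative amplitudes is defined by a truncated sinc function [sinc(x) = sin(x)/x]. Such a filter can be used separably, which means that the horizontal data dimension is independently filtered and resized, and then the vertical data dimension is processed in the same way, or vice versa, and the result is the same.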
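As one possible reading of the truncated sinc first approximation, the sketch below samples sinc at integer pixel offsets for a factor-of-two down-sizing and stops after the first outer positive lobe. The tap spacing, the cut-off at offset 5, and the normalization to unity gain are assumptions of this sketch; the description itself fixes only the lobe polarities and their relative ordering.

```python
import math


def sinc(x):
    """sinc(x) = sin(x)/x, with the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def truncated_sinc_taps(factor=2, last_tap=5):
    """Sample sinc(pi*k/factor) for k = -last_tap..last_tap and normalize.

    For factor 2 the central lobe spans |k| < 2, the first negative lobes
    span 2 < |k| < 4, and the first outer positive lobes span 4 < |k| < 6,
    so last_tap = 5 truncates just after the first outer positive lobe.
    """
    raw = [sinc(math.pi * k / factor) for k in range(-last_tap, last_tap + 1)]
    total = sum(raw)
    return [tap / total for tap in raw]


taps = truncated_sinc_taps()
center, negative_lobe, outer_positive_lobe = taps[5], taps[8], taps[10]
```

The milder variant discussed for noisy or lower resolution input could be approximated by scaling the negative and outer positive taps down before normalizing.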
【０１８５】
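When creating a base layer original (as input to the base layer compression) from a low-noise high resolution original input, the preferred down-sizing filter has first negative lobes with the amplitude of a normal sinc function. For clean and high resolution input images, this normal truncated sinc function works well. For lower resolutions (e.g., 1280×720, 1024×768, or 1536×768), and for noisier input pictures, a reduced first negative lobe amplitude in the filter is more optimal. A suitable amplitude in such cases is about half the negative lobe amplitude of the truncated sinc function. The small first positive lobes outside of the first negative lobes are also of lower amplitude, typically 1/2 to 2/3 of the normal sinc function amplitude. The effect of reducing the first negative lobes is the main issue, since the small outside positive lobes do not contribute to picture noise. Further samples outside the first positive lobes preferably are truncated to minimize ringing and other potential artifacts.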
【０１８６】
ダウンフィルタのよりマイルドな負のローブ（milder negative lobe）又は完全なsinc関数振幅の負のローブを用いるかどうかの選択は、原画像の解像度とノイズのレベルによって決まる。いくつものタイプのシーンが他のシーンより符号化を行いやすい（主として動きの大きさと特定のショットの変化に関連している）ので、前記選択はいくぶん画像コンテントの関数である。負のローブを減らした「よりマイルドな」ダウンフィルタを使用することによって、ベース層のノイズが減少しかつベース層のより清浄でかつより静かな圧縮が達成され、その結果アーチファクトが少なくなる。
【０１８７】
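Experimentation has also shown that the optimal up-sizing filter has a central positive lobe with adjacent small negative lobes, but no further positive lobes. FIGS. 13A and 13B are diagrams of the relative shape, amplitudes, and lobe polarity of a pair of preferred up-sizing filters for up-sizing by a factor of 2. A central positive lobe 1300, 1300' is bracketed by a pair of small negative lobes 1302, 1302'. An asymmetrically placed positive lobe 1304, 1304' is also required. These paired up-filters can also be regarded as truncated sinc filters centered on the newly created samples. For example, for a factor-of-two up-filter, two new samples are created for each original sample. The adjacent small negative lobes 1302, 1302' have less negative amplitude than is used in the corresponding down-sizing filter (FIG. 12), or than would be used in an optimal (sinc-based) up-sizing filter for normal images. This is because the image being up-sized has been decompressed, and the compression process changes the spectral distribution. Thus, more modest negative lobes, with no additional positive lobes beyond the central positive lobe 1300, 1300', work better for up-sizing a decompressed base layer.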
【０１８８】
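Experimentation has shown that slight negative lobes 1302, 1302' provide a better layered result than positive-only gaussian or spline up-filters (note that spline up-filters can have negative lobes, but are most often used in the positive-only form). Thus, this up-sizing filter is used for the base layer in both the encoder and the decoder.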
【０１８９】
写真ディテールの高オクターブの重み付け
好ましい実施態様では、原非圧縮ベース層入力画像を拡張する信号経路が、上記アップフィルタではなくてガウスアップフィルタを使用する。特に、ガウスアップフィルタは、写真ディテールの「高オクターブ」に使用され、その「高オクターブ」は、拡張された原ベース解像度入力画像（圧縮を利用せず）を、原写真から減算することによって求められる。したがって、この特別のアップフィルタされた拡張に対して負のローブは全く使用されない。
【０１９０】
上記のように、ＭＰＥＧ−２の場合、この高オクターブ差の信号経路は一般に０．２５（すなわち２５％）重み付けされ、拡張された復元ベース層に（上記の他のアップフィルタを使用して）、強化層圧縮プロセスに入力として加えられる。しかし、実験は、１０％、１５％、２０％、３０％及び３５％の重みが、ＭＰＥＧ−２を使う場合、特定の画像に有用であることを示した。他の重みも有用であることが立証できる。ＭＰＥＧ−４の場合、４〜８％のフィルタ重みは、下記の他の改良とともに利用されると最適であることが見出されている。したがって、この重み付けは、符号化システム、符号化／圧縮されるシーン、使用される特定のカメラ（又はフィルム）及び画像の解像度に応じて、調節可能なパラメータとみなすべきである。
【０１９１】
デ−インタレーシング（de-interlacing）及びノイズ低下の強化
概要
実験は、多くのデ−インタレーシングのアルゴリズムと装置は、ヒトの眼に対応して、フィールドを結合し受容可能な結果をつくることを示した。しかし、圧縮のアルゴリズムはヒトの眼ではないので、デ−インタレースされたフィールドの結合は、このようなアルゴリズムの特性を考慮しなければならない。このような注意深いデ−インタレースされた結合がないと、圧縮プロセスは、高レベルのノイズアーチファクトを生じ、画像の外観をアーチファクトでノイジーにかつビジィ（noisy and busy）にするのみならずビットを浪費する（圧縮を妨害する）。視聴する場合[例えばライン−ダブラー（line-doubler）及びライン−クワドラプラー（quadrupler）で]のデ−インタレーシングと、圧縮に対する入力としてのデ−インタレーシングの差異が下記の技法をもたらした。特に、下記のデ−インタレーシング法は、上記階層化ＭＰＥＧ式、圧縮のみならず単一層非インタレース化ＭＰＥＧ式圧縮に対する入力として有用である。
【０１９２】
さらにノイズの減少は、ノイズ出現を減らすこと以外に、圧縮アルゴリズムへの入力であるという要求に、同様に整合しなければならない。その目標は、一般に、復元時に、原カメラ又はフィルム粒子のノイズを越えるノイズを一般に再現しないことである。等しいノイズは一般に、圧縮／復元の後、受け入れ可能と考えられる。ノイズが減られて、オリジナルと等しい鮮鋭さと清浄度を有することはボーナス（bonus）である。下記のノイズ減少はこれらの目標を達成する。
【０１９３】
さらに、通常、光が少ない、例えば高感度フィルムから又は高いカメラ感度の設定で非常にノイズが多いショットの場合、ノイズ減少は、優れた外観の圧縮／復元された画像と見るに耐えないほどノイズの多い画像との差である。圧縮プロセスは、圧縮器に対する許容性（acceptability）のなんらかのしきい値を超えるノイズを大きく増大する。したがって、ノイズをこのしきい値より低く保つために、ノイズ減少のプリプロセッシングを利用することが、許容可能な良質の結果を得るために必要である。
【０１９４】
De-graining and noise-reducing filters
階層化された符号化又は階層化されていない符号化を行う前にデ−グレイニングフィルタリング及び／又はノイズ減少フィルタリングを適用すると、圧縮システムが実行する性能が改良されるということが実験によって見出された。グレイン又はノイズの多い画像に対し、圧縮を行う前にデ−グレイニング又はノイズ減少を行うと最も効果的であるが、両方の方法は、比較的ノイズが低いが又はグレインが少ない写真に対してさえ適度に利用すると有用である。幾種類もの既知のデ−グレイニングアルゴリズム又はノイズ減少アルゴリズムを適用できる。その例は「コアリング（coring）」、単純隣接中央値フィルタ類及びソフトニングフィルタ類である。
【０１９５】
ノイズ減少が必要であるかどうかは、原画像がどれほどノイズが多いかによって決定される。インタレース化された原画像の場合、インタレース自体はノイズの一形態であり、そしてその原画像は、下記の複雑なデ−インタレーシングプロセスに加えて追加のノイズ減少フィルタリングが通常必要である。プログレッシングスキャン（インタレースなし）のカメラ又はフィルム画像の場合、ノイズプロセッシングは、ノイズが特定のレベルを超えて存在しているとき、階層化圧縮及び非階層化圧縮を行うのに有効である。
【０１９６】
異なるタイプのノイズがある。例えば、フィルムからのビデオトランスファーはフィルムグレインノイズを含んでいる。フィルムグレインノイズは、イエロー、シアン及びマゼンタのフィルム色素に結合している銀粒子によって生じる。イエローはレッドとグリーンの両者に影響し、シアンはブルーとグリーンの両者に影響し、そしてマゼンタはレッドとブルーの両者に影響する。レッドはイエロー色素とマゼンタ色素の結晶がオーバーラップした場所に生成する。同様に、グリーンはイエローとシアンのオーバーラップしたものでありそしてブルーはマゼンタとシアンのオーバーラップしたものである。したがって、カラー間のノイズは、カラーのペア間の色素と粒子によって、部分的に相関関係がある。さらに、多数の粒子が三色全体でオーバーラップすると、これら粒子は、画像のプリントのダーク領域で又は画像のライト領域のネガ上で（ネガ上のダーク）オーバーラップするので、追加の色混合が生じる。カラー間のこの相関関係は、フィルムのグレインノイズを減らすのに利用できるが複雑なプロセスである。さらに、多数の異なるフィルムのタイプが使用され、そして各タイプは、粒子の大きさ、形態及び統計的分布状態が異なっている。
【０１９７】
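For video images created by CCD-sensor and other (e.g., tube) sensor cameras, the red, green, and blue noise is uncorrelated. In this case, it is best to process the red, green, and blue records independently. Thus, red noise is reduced with self-red processing, separately from green noise and blue noise; the same approach applies to green noise and blue noise.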
【０１９８】
したがって、ノイズの処理は、ノイズ源自体の特性に最良に整合される。コンポジット画像（複数のソースからの）の場合、そのノイズが、画像の異なる部分では特性が異なることがある。この場合、ノイズ処理が必要なとき、汎用ノイズ処理（generic noise processing）が唯一の選択肢である。
【０１９９】
場合によっては、圧縮された階層化データ流を復号した後、有意義な作用として、「リ−グレイニング（re-graining）」又は「リ−ノイジング（re-noising）」を実行することが有用であることも見出された。というのは、一部のデーグレイン化又はデーノイズ化された画像が、外観が「清浄すぎる」か又は「迫力がなさすぎる（too sterile）」ことがあるからである。リ−グレイニング及び／又はリ−ノイジングは、幾種類もの既知のアルゴリズムのどれでも使用して、復号器で加える比較的容易な作用である。例えば、これは、適切な振幅の低域フィルタされたランダムノイズを加えることによって達成することができる。
【０２００】
圧縮する前のデ−インタレーシング
上記のように、非インタレース化表示を最終的に意図している、インタレース化されたソースを圧縮する好ましい方法は、インタレース化されたソースを、圧縮ステップの前にデ−インタレース化するステップを含んでいる。信号を、受信器内で復号した後、デ−インタレースすることは（受信器内で該信号はインタレース化モードで圧縮されている）、圧縮前にデ−インタレース化され次いでインタレース化されていない圧縮信号を送るよりコストがかかりかつ効率が悪い。そのインタレース化されていない圧縮信号は、階層化されているか又は階層化されていなくても（すなわち通常の単一層圧縮でも）よい。
【０２０１】
インタレース化されたソースの単一フィールドをフィルタし次にそのフィールドを、あたかもインタレース化されていない完全フレームであるように使用すると、劣ったノイズの多い圧縮結果がもたらされることを、実験が示した。したがって、圧縮する前に、単一フィールドのデ−インタレーサを使うのは良い方法ではない。代わりに、実験は、前の、現行の及び次のフィールドフレームそれぞれに対し、[０．２５、０．５、０．２５]の重みをつけて、フィールド合成フレーム(「フィールド−フレーム」)を使用する３フィールドフレームデ−インタレーサ法が、圧縮に対して優れた入力を提供することを示した。３フィールドフレームの結合を、他の重みを利用して実施して（これらの重みは最適の重みであるが）、圧縮プロセスに対するデ−インタレース化された入力をつくることができる。
【０２０２】
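In the preferred de-interlacing system, a field de-interlacer is used as the first step in the overall process to create field-frames. In particular, each field is de-interlaced, creating a synthesized frame in which the total number of lines in the frame is derived from the half number of lines in a field. Thus, for example, an interlaced 1080 line image has 540 lines per even and odd field, each field representing 1/60th of a second. Normally, the even and odd fields of 540 lines are interlaced to create 1080 lines for each frame, which represents 1/30th of a second. In the preferred embodiment, however, the de-interlacer copies each scan line, without modification, from a specified field (e.g., the odd field) into a buffer that holds part of the de-interlaced result. The remaining intermediate scan lines of the frame (in this example, the even scan lines) are synthesized by adding half of the field line above and half of the field line below each newly stored line. For example, the pixel values of line 2 of a frame each comprise 1/2 of the summed corresponding pixel values from line 1 and line 3. The generation of the intermediate synthesized scan lines may be done on the fly, or may be computed after all of the scan lines from a field are stored in the buffer. The same process is repeated for the next field, although the field types (i.e., even, odd) are reversed.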
【０２０３】
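FIG. 14A is a block diagram of an odd-field de-interlacer, showing that the odd lines from an odd field 1400 are simply copied to a de-interlaced odd field 1402, while the even lines are created by averaging adjacent odd lines from the original odd field to form the even lines of the de-interlaced odd field 1402. Similarly, FIG. 14B is a block diagram of an even-field de-interlacer, showing that the even lines from an even field 1404 are simply copied to a de-interlaced even field 1406, while the odd lines are created by averaging adjacent even lines from the original even field to form the odd lines of the de-interlaced even field 1406. Note that this case corresponds to "top field first"; "bottom field first" could be considered the "even" field.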
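The field de-interlacer of FIGS. 14A and 14B can be sketched as follows. The function name and the list-of-rows image layout are hypothetical, lines are indexed from zero rather than from one, and the handling of the single line at the frame edge that has only one neighbour (it is copied from that neighbour) is an assumption of this sketch, since the description does not address it.

```python
def deinterlace_field(field_lines, field_is_top=True):
    """Build a full-height field-frame from the lines of one field.

    field_lines: list of rows, each row a list of pixel values.
    Lines of the field are copied unchanged; each missing line is the
    average of the copied lines directly above and below it.
    """
    height = 2 * len(field_lines)
    frame = [None] * height
    offset = 0 if field_is_top else 1
    for i, line in enumerate(field_lines):
        frame[2 * i + offset] = list(line)  # copy without modification
    for y in range(height):
        if frame[y] is not None:
            continue
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is None:        # top edge: only a line below exists
            frame[y] = list(below)
        elif below is None:      # bottom edge: only a line above exists
            frame[y] = list(above)
        else:                    # half of the line above plus half of the line below
            frame[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return frame


top_field = [[10, 10], [30, 30], [50, 50]]
field_frame = deinterlace_field(top_field, field_is_top=True)
```

Synthesized lines only ever read from copied lines, so the missing lines can be filled on the fly or after the whole field is buffered with the same result.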
【０２０４】
次のステップとして、一連のこれらデ−インタレース化フィールドを、３フィールドフレームデ−インタレーサへの入力として使用して最終のデ−インタレース化フレームがつくられる。図１５は、各出力フレームの画素が、どのようにして、前のデ−インタレース化フィールド（フィールドフレーム）１５０２からの対応する画素の２５％、現行のフィールドフレーム１５０４からの対応する画素の５０％及び次のフィールドフレーム１５０６からの対応する画素の２５％で構成されているかを示すブロック図である。
【０２０５】
そのとき、前記新しいデ−インタレース化フレームは、フレーム間のインタレース差のアーチファクトが、該フレームが構成されている３フィールドフレームよりはるかに少ない。しかし、前のフィールドフレームと次のフィールドフレームを、現行のフィールドフレームに加えることによる時相スミアリング（temporal smearing）がある。この時相スミアリングは、特にもたらされるデ−インタレース化の改良の見地から、通常、差し支えない。
【０２０６】
このデ−インタレース化法は、単一層（階層化されていない）又は階層化された単一層であろうとも、圧縮への入力として非常に有益である。またこのデ−インタレース化法は、提示、視聴又は静止フレームの製作のためのインタレース化ビデオの処理として、圧縮の利用とは独立して有益である。該デ−インタレース化法由来の写真は、インタレースを直接示すか又はデ−インタレース化フィールドを示すより「清浄」に見える。
【０２０７】
デ−インタレースのしきい値処理
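The de-interlacing three-field sum weighting [0.25, 0.5, 0.25] discussed above provides a stable image, but moving parts of a scene can sometimes become soft or exhibit aliasing artifacts. To counteract this, a threshold test may be applied which compares the result of the [0.25, 0.5, 0.25] temporal filter against the corresponding pixel values of the middle field-frame alone. If a middle field-frame pixel value differs from the value of the corresponding pixel from the three-field-frame temporal filter by more than a specified threshold amount, then only the middle field-frame pixel value is used. In this way, a pixel from the three-field-frame temporal filter is selected where it differs from the corresponding pixel of the single de-interlaced middle field-frame by less than the threshold amount, and the middle field-frame pixel value is used where the difference exceeds the threshold. This allows fast motion to be tracked at the field rate, while the smoother parts of the image are filtered and smoothed by the three-field-frame temporal filter. This combination has proven to be an effective, if not optimal, input to compression. De-interlacing image material in this way (sometimes called line doubling in conjunction with display) is also very effective as processing for direct viewing.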
【０２０８】
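In a preferred embodiment of such a threshold determination, the following equations are applied to corresponding RGB color values from the middle (single) de-interlaced field-frame image and from the three-field-frame de-interlaced pixels:
Rdiff = R_single_field_deinterlaced - R_three_field_deinterlaced
Gdiff = G_single_field_deinterlaced - G_three_field_deinterlaced
Bdiff = B_single_field_deinterlaced - B_three_field_deinterlaced
ThresholdingValue = abs(Rdiff + Gdiff + Bdiff) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)
【０２０９】
The thresholding value is then compared to a threshold setting. Typical threshold settings are in the range 0.1 to 0.3, with 0.2 being most common.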
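The three-field-frame weighting and the threshold switch can be sketched per pixel as follows. The function names are hypothetical, and pixel values are assumed to be RGB triples normalized to the range 0 to 1, which is suggested by the quoted threshold settings but not stated explicitly. The optional smoothing of both pictures before comparison is omitted.

```python
WEIGHTS = (0.25, 0.5, 0.25)  # previous, current, next field-frame


def thresholding_value(single_px, three_field_px):
    """abs(Rdiff + Gdiff + Bdiff) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)."""
    rdiff, gdiff, bdiff = (s - t for s, t in zip(single_px, three_field_px))
    return abs(rdiff + gdiff + bdiff) + abs(rdiff) + abs(gdiff) + abs(bdiff)


def deinterlace_pixel(prev_px, cur_px, next_px, threshold=0.2):
    """Pick the three-field blend unless it differs too much from the middle field-frame."""
    blended = tuple(
        WEIGHTS[0] * p + WEIGHTS[1] * c + WEIGHTS[2] * n
        for p, c, n in zip(prev_px, cur_px, next_px)
    )
    if thresholding_value(cur_px, blended) > threshold:
        return tuple(cur_px)  # fast motion: keep the single field-frame pixel
    return blended            # quiet area: use the temporally smoothed pixel


static = deinterlace_pixel((0.5, 0.5, 0.5), (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
moving = deinterlace_pixel((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0))
```

In the `moving` case the blend would be mid-gray, far from the current pixel, so the switch keeps the current field-frame value and the motion is not smeared.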
【０２１０】
このしきい値からノイズを除くため、３フィールドフレームと単一フィールドフレームのデ−インタレース化写真のスムースフィルタリングを使用した後、それら写真を比較してしきい値処理することができる。このスムースフィルタリングは、ダウンフィルタリング（例えば、好ましくは上記ダウンフィルタを使用して２回ダウンフィルタする）し次にアップフィルタリングする（例えば、ガウスアップフィルタを２回使用する）ことによって達成することができる。この「ダウン−アップ」スムース化フィルタは、単一フィールドフレームデ−インタレース化写真と３フィールドフレームデ−インタレース化写真の両者に適用できる。次に、上記のスムース化された、単一フィールドフレーム写真と３フィールドフレーム写真を比較してしきい値処理値を計算し、次いでしきい値処理を行ってどちらの写真が各最終出力画素のソースであるかを確認することができる。
【０２１１】
特に、上記しきい値試験は、単一フィールドフレームデ−インタレース化写真か、単一フィールドフレームデ−インタレース化写真の３フィールドフレーム時相フィルタによる結合体を選択するスイッチとして使用される。その結果、この選択によって下記画像がもたらされる。すなわち画素が、その画像が単一フィールドフレーム画像との差が小さい（すなわちしきい値より小さい）領域における３フィールドフレームデ−インタレーサ由来の画素である画像、及び画素が、３フィールドフレームが単一フィールドフレームデ−インタレース化画素（スムース化後）との差が大きかった（すなわちしきい値より大きい）領域における単一フィールドフレーム画像由来の画素である画像がもたらされる。
【０２１２】
この方法は、単一フィールドファーストモーションディテールを維持し（単一フィールドフレームデ−インタレース化画素にスイッチすることによって）しかもその画像の大きな部分をスムース化する（３フィールドフレームデ−インタレース化時相フィルタ結合にスイッチすることによって）のに有効であることを証明した。
【０２１３】
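In addition to selecting between the single field-frame and three-field-frame de-interlaced images, it is often also beneficial to add a bit of the single field-frame image to the three-field-frame de-interlaced picture, to preserve some of the immediacy of the single field picture over the entire image. This immediacy is balanced against the temporal smoothness of the three-field-frame filter. A typical blending is to create new frames by adding 33.33% (1/3) of the single middle field-frame to 66.67% (2/3) of the corresponding three-field-frame smoothed image. This can be done before or after the threshold switching, since the result is the same either way, and affects only the smoothed three-field-frame picture. Note that this is effectively equivalent to using a different proportion of the three field-frames, rather than the original three-field-frame weights of [0.25, 0.5, 0.25]. Computing 2/3 of [0.25, 0.5, 0.25] plus 1/3 of [0, 1, 0] yields [0.1667, 0.6666, 0.1667] as the temporal filter for the three field-frames. The more heavily weighted middle (current) field-frame brings additional immediacy to the result, even in the smoothed areas which fell below the threshold value. This combination has proven effective in balancing temporal smoothness with immediacy in the de-interlacing process for moving parts of a scene.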
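The equivalence between the 1/3 and 2/3 blend and a single re-weighted three-field-frame filter can be verified with exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction as F

three_field = [F(1, 4), F(1, 2), F(1, 4)]  # [0.25, 0.5, 0.25]
single_field = [F(0), F(1), F(0)]          # middle field-frame only

# 2/3 of the three-field-frame filter plus 1/3 of the single field-frame.
combined = [F(2, 3) * t + F(1, 3) * s for t, s in zip(three_field, single_field)]
as_decimals = [round(float(w), 4) for w in combined]
```

The exact weights are 1/6, 2/3, and 1/6, which the description rounds to [0.1667, 0.6666, 0.1667]; they still sum to one, so overall brightness is preserved.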
【０２１４】
線形フィルタの使用
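Any sum, filter, or matrix involving video pictures should take into account the fact that pixel values in video are non-linear signals. For example, the video curve for HDTV can have several variations of coefficients and factors, but a typical formula is the international CCIR XA-11 (now called Rec. 709):
V = 1.0993 * L^0.45 - 0.0993    for L > 0.018051
V = 4.5 * L                     for L <= 0.018051
where V is the video value and L is the linear light luminance.
【０２１５】
The variations adjust the threshold (0.018051) a little, the factor (4.5) a little (e.g., 4.0), and the exponent (0.45) a little (e.g., 0.4). The fundamental formula, however, remains the same.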
【０２１６】
ＲＧＢとＹＵＶの間の変換などのマトリックスオペレーションは線形値を示唆している。ＭＰＥＧが一般に、ビデオの非線形値を、それらの値があたかも線形であるように使用することから、ルミナンス（luminance）（Ｙ）とカラー値（ＵとＶ）の間の漏洩が起こる。この漏洩は圧縮の効率を阻害する。対数表現を、例えばフィルム密度の単位で使用するように使うと、この問題が大きく修正される。各種タイプのＭＰＥＧ符号化は、信号の非線形アスペクトに対してニュートラルであるが、その効率は、ＲＧＢとＹＵＶ間のマトリックス変換を利用することによって達成される。ＹＵＶ（Ｕ＝Ｒ−Ｙ、Ｖ＝Ｂ−Ｙ）は、０．５９Ｇプラス０．２９Ｒプラス０．１２Ｂの線形化合計（又はこれら係数のわずかの変化）として計算されたＹを含んでいなければならない。しかし、Ｕ（＝Ｒ−Ｙ）は、ルミナンスに直交している対数空間のＲ／Ｙに等しくなる。したがってシェードされたオレンジボール（orange ball）は、対数表現のＵ（＝Ｒ−Ｙ）パラメータを変えない。ブライトネスの変化は、完全なディテールが提供される場合、ルミナンスパラメータに完全に表される。
【０２１７】
線形対対数対ビデオの問題点はフィルタリングに強い影響を与える。注目すべきキーポイントは、小さい信号の変動（例えば１０％以下）は、非線形ビデオ信号が、あたかも線形信号であるように処理されるとき、ほぼ修正されることである。これは、スムースビデオ−ツー−フロム−線形変換カーブ（smooth video-to-from-linear conversion curve）に対する区分的線形近似が妥当であるからである。しかし、変動が大きい場合、線形フィルタの方がはるかに有効であり、はるかに良好な画質が得られる。したがって、大きな変動が最適に符号化され、変換され又は他の方法で処理されることになっている場合、線形フィルタを利用できるように、第一に非線形信号を線形信号に変換することが望ましい。
【０２１８】
それ故、デ−インタレース化は、各フィルタと加算ステップが、フィルタリング又は加算を行う前に、線形値への変換を利用するとき非常に優れている。これは、大きな信号変動が画像の小さなディテールにおけるインタレース化信号に固有なものだからである。その画像信号は、フィルタリングの後、非線形ビデオディジタル表現に変換して戻される。したがって、３フィールドフレーム重み付け（例えば[０．２５、０．５、０．２５]又は[０．１６６７、０．６６６６、０．１６６７]）を、線形化ビデオ信号に実施しなければならない。ノイズとデ−インタレースフィルタリングにおけるパーシャルターム（partial term）の他のフィルタリングと重み付けの和も、計算を行うため線形に変換しなければならない。どのオペレーションが線形処理を保証するかは、信号の変動とフィルタリングのタイプによって決定される。画像のシャープニングは、セルフ−プロポーショナル（self-proportional）であるから、ビデオ又は対数非線形の表現で適切に計算することができる。しかし、マトリックスプロセッシング、空間フィルタリング、重み付け合計及びデ−インタレースプロセッシングは、線形化されたディジタル値を使用して計算しなければならない。
【０２１９】
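As a simple example, the single field-frame de-interlacer described above computes missing alternate lines by averaging the line above and the line below each actual line. This average is much more correct numerically and visually if it is done in linear light. Thus, instead of summing 0.5 times the line above plus 0.5 times the line below, the digital values are first linearized, then averaged, and then reconverted back into the non-linear video representation.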
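The linearize, average, and reconvert sequence can be sketched with the Rec. 709 style curve quoted earlier in this section. The function names are hypothetical, the inverse curve is derived here by solving the quoted formula for L, and video values are assumed to be normalized to the range 0 to 1.

```python
L_THRESHOLD = 0.018051


def to_video(linear):
    """Linear light L to video value V, using the quoted curve."""
    if linear > L_THRESHOLD:
        return 1.0993 * linear ** 0.45 - 0.0993
    return 4.5 * linear


def to_linear(video):
    """Inverse of to_video, obtained by solving each branch for L."""
    if video > 4.5 * L_THRESHOLD:
        return ((video + 0.0993) / 1.0993) ** (1 / 0.45)
    return video / 4.5


def average_in_linear_light(video_above, video_below):
    """Average two video values in linear light, then reconvert to video."""
    linear_mean = 0.5 * to_linear(video_above) + 0.5 * to_linear(video_below)
    return to_video(linear_mean)


naive = 0.5 * 0.0 + 0.5 * 1.0                  # averaging the video values directly
correct = average_in_linear_light(0.0, 1.0)    # averaging in linear light
```

For a black line next to a white line, the direct video average gives 0.5, whereas the linear-light average reconverts to a noticeably brighter video value of roughly 0.7, which illustrates why large signal excursions call for linear processing.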
【０２２０】
２／３ベース層に基づいた階層化モード
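A 1280×720 enhancement layer can utilize an 864×480 base layer (i.e., a 2/3 relationship between the enhancement layer and the base layer). FIG. 16 is a block diagram of such a mode. An original 1280×720 image 1600 is padded to 1296×720 (to be a multiple of 16) and then down-sized by 2/3 to an 864×480 image 1602 (also a multiple of 16). The down-sizing preferably uses a normal filter or a filter having mild negative lobes. As described above, this down-sized image 1602 may be input to a first encoder 1604 (e.g., an MPEG-2 or MPEG-4 encoder) for direct encoding as a base layer.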
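The padding arithmetic in this mode can be checked with a short calculation. The rule used below, padding each dimension up to a multiple of 48 so that both the padded size and 2/3 of it fall on 16-pixel macroblock boundaries, is an inference made for this sketch; the description states only the resulting sizes. The function names are hypothetical.

```python
def pad_up(value, multiple):
    """Round value up to the next multiple (ceiling division)."""
    return -(-value // multiple) * multiple


def two_thirds_base_size(width, height, block=16):
    """Return ((padded_w, padded_h), (base_w, base_h)) for a 2/3 base layer.

    Padding to a multiple of 3 * block keeps the padded size and its 2/3
    down-sized version both divisible by the block size.
    """
    padded_w = pad_up(width, 3 * block)
    padded_h = pad_up(height, 3 * block)
    return (padded_w, padded_h), (padded_w * 2 // 3, padded_h * 2 // 3)


padded, base = two_thirds_base_size(1280, 720)
```

For a 1280×720 original this reproduces the 1296×720 padded size and the 864×480 base layer given above.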
【０２２１】
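To encode the enhancement layer, the base layer is up-sized by 3/2 (expanded and then up-filtered) to a 1296×720 intermediate frame 1606. The up-filter preferably has mild negative lobes. This intermediate frame 1606 is subtracted from the original image 1600. At the same time, the 864×480 image 1602 is up-filtered by 3/2 (preferably using a gaussian filter) to 1280×720 and subtracted from the original image 1600. The result is weighted (e.g., by 25% for MPEG-2) and then added to the result of subtracting the intermediate frame 1606 from the original image 1600. The resulting sum is cropped to a reduced size (e.g., 1152×688) and the edges are feathered, yielding a pre-compression enhancement layer frame 1608. This pre-compression enhancement layer frame 1608 is input to a second encoder 1610 (e.g., an MPEG-2 or MPEG-4 encoder) for encoding as an enhancement layer.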
【０２２２】
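The efficiency and quality at 18.5 megabits/second are approximately equivalent between a "single" layered (i.e., non-layered) system and a layered system using this configuration. The efficiency of a 2/3 relationship between the enhancement and base layers is not as good as with a factor of two, since the DCT coefficients are less orthogonal between the base and enhancement layers. However, this construction is workable, and has the advantage of providing a high quality base layer (which is cheaper to decode). This is an improvement over the single layered configuration, where the entire high resolution picture must be decoded (at higher cost) even when the lower resolution is all that can be provided by a particular display.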
【０２２３】
また、上記階層化配置構成は、強化サブ領域が調節可能であるという利点もある。したがって、効率は、強化層の大きさ、及びベース層と強化層に割り当てられた全ビット速度のベース層ビット速度／強化層ビット速度の比率を調節することによって制御することができる。上記強化層の大きさとビット速度比率を調節して、特に高いストレス（速い動き又は多数のシーンの変化）下での圧縮性能を最適化することができる。例えば、上記のように、極端のストレス下では、すべてのビットをベース層に割り当てることができる。
【０２２４】
強化層とベース層の間の好都合な解像度の関係は、１／２、２／３というファクター及び他の単分数（例えば１／３、３／４）の関係である。強化層とベース層の間の関係に対して、スキーズ（squeeze）を適用することも有用である。例えば、２０４８×１０２４のソース写真は１５３６×５１２のベース層を有していてもよく、そのベース層は、ソース画像に対して３／４の水平関係と１／２の垂直関係を有している。これは最適でないが（２のファクターが水平と垂直の関係の両者に対して最適である）、原理を示している。水平関係と垂直関係の両方に２／３を使用すると、垂直方向に２のファクター及び水平方向に２／３のファクターを利用することによって、いくつかの解像度を改善することができる。あるいは、いくつかの解像度は、垂直方向に２／３のファクターを用い水平方向に１／２のファクターを利用することがより最適である。したがって、１／２、２／３、３／４、１／３などの単分数は、水平と垂直の解像度の関係に独立して適用することができ、関係の多数の可能な組合せを行うことができる。したがって、強化層とベース層及びその入力解像度との関係のみならず、完全入力解像度とベース層の解像度の関係によって、このような分数の関係を使用する場合に完全な融通性が可能になる。このような解像度の関係の特に有用な組合せは、どの標準の一部として採用されても、圧縮「強化モード」番号を割り当てることができる。
【０２２５】
Median Filters
ノイズを処理するのに最も有用なフィルタは中央値フィルタである。３要素中央値フィルタが、三つのエントリーの順位付けを、単純なソート（simple sort）によって行い、次に中央のエントリーをピック（pick）する。例えば、Ｘ（水平）中央値フィルタが、三つの隣接する水平画素のレッド値（又はグリーン値又はブルー値）を調べて、真ん中の値を有する画素をピックする。二つが同じであればその値を選ぶ。同様に、Ｙフィルタが現行画素の上と下の走査ラインで調べてやはり中央値をピックする。
【０２２６】
ＸとＹの中央値フィルタの両者を適用することから得た結果を平均して、新しいノイズ減少成分写真をつくることが有用であることが実験で確認された[すなわち、新しい画素は各々、原画像からの対応する画素のＸとＹの中央値の５０％等平均値（５０％ equal average）である]。
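A minimal sketch of the X/Y median average for one colour channel follows. The clamping of neighbours at the image border is an assumption, since the text does not say how edges are treated, and the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three values via a simple sort, as described in the text."""
    return sorted((a, b, c))[1]


def xy_median_average(plane, x, y):
    """plane: list of rows holding one colour channel (R, G or B).
    Returns the 50/50 average of the horizontal and vertical 3-element medians.
    Assumption: neighbours outside the image are clamped to the nearest edge."""
    h, w = len(plane), len(plane[0])
    centre = plane[y][x]
    left = plane[y][max(x - 1, 0)]
    right = plane[y][min(x + 1, w - 1)]
    up = plane[max(y - 1, 0)][x]
    down = plane[min(y + 1, h - 1)][x]
    x_med = median3(left, centre, right)
    y_med = median3(up, centre, down)
    return 0.5 * x_med + 0.5 * y_med
```

An isolated bright pixel is pulled toward its neighbours because neither median can select the outlier itself.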
【０２２７】
ＸとＹ（水平と垂直）の中央値に加えて、斜め中央値などの他の中央値を採用することも可能である。しかし、垂直及び水平の画素値は、物理的に、どの特定の画素に対しても最も近い値なので、斜め中央値より、誤差又はひずみを起こす可能性が低い。しかし、このような他の中央値は、垂直と水平の中央値だけを使用することによってノイズを減らすことが困難な場合にはやはり利用することができる。
【０２２８】
ノイズを減らすのに有益なもう一つのソースは、前と次のフレームからの情報（すなわち時相中央値）である。以下に述べるように、動き解析は、動く領域に対しては最良の整合を提供する。しかし、動き解析は計算集中的（compute intensive）である。画像の一領域が動いていないか又はゆっくり動いている場合、現行画素からのレッド値（及びグリーン値とブルー値）は、前のフレームと次のフレーム中の同じ画素位置のレッド値でフィルタされた中央値でよい。しかし、有意な動きがあってしかもかような時相フィルタが使用されると、異常なアーチファクトが起こることがある。したがって、しきい値を第一に選んで、このような中央値が、現行画素の値から、選択された大きさを超えて異なっているかどうかを確認することが好ましい。そのしきい値は上記デ−インタレース化のしきい値の場合とほぼ同様にして、下記のようにして計算することができる。
Rdiff = R_current_pixel - R_temporal_median
Gdiff = G_current_pixel - G_temporal_median
Bdiff = B_current_pixel - B_temporal_median
ThresholdingValue = abs(Rdiff + Gdiff + Bdiff) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)
【０２２９】
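The thresholding value is then compared with a threshold setting. Typical threshold settings lie in the range 0.1 to 0.3, with 0.2 being typical. Above the threshold, the current value is kept; below it, the temporal median is used.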
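The thresholded temporal median can be sketched per pixel as below. Two assumptions are made that the text leaves open: pixel values are normalized to the range 0 to 1 (suggested by the 0.1 to 0.3 threshold range), and a thresholding value exactly equal to the setting selects the median. The function names are hypothetical.

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]


def thresholded_temporal_median(prev, cur, nxt, threshold=0.2):
    """prev, cur, nxt: (R, G, B) tuples for the same pixel position in the
    previous, current and next frames. Assumption: values lie in [0, 1]."""
    med = tuple(median3(p, c, n) for p, c, n in zip(prev, cur, nxt))
    r_diff, g_diff, b_diff = (c - m for c, m in zip(cur, med))
    value = (abs(r_diff + g_diff + b_diff)
             + abs(r_diff) + abs(g_diff) + abs(b_diff))
    # Above the threshold the pixel is treated as moving: keep the current value.
    return cur if value > threshold else med
```

A slightly noisy static pixel is replaced by the temporal median, while a pixel that changes strongly between frames keeps its current value.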
【０２３０】
追加の中央値のタイプは、Ｘ、Ｙ及び時相の中央値から選択される中央値である。もう一つの中央値のタイプは、時相中央値を選び、次にそれからのＸとＹの中央値の等平均値を選択する。
【０２３１】
各タイプの中央値は問題を起こすことがある。ＸとＹの中央値は画像を不鮮明にし（smear）かつブラー（blur）させるので、画像は「グリーシー（greasy）」に見える。時相中央値は、時間が経過するにつれて動きを不鮮明にする。各中央値は、問題をもたらししかも各中央値の特性が異なっているので（ある意味では「直交している（orthogonal）」）、各種の中央値を組み合わせることによって最良の結果が得られることが実験によって確認された。
【０２３２】
特に、中央値の好ましい組合せは、現行画像の各画素に対する値を決定する下記５項目の線形重み付け合計（線形ビデオプロセッシングに関する上記考察参照）である。
現画像の５０％（したがって最大のノイズ低下は３ｄｂ又は１／２である）；
ＸとＹの中央値の平均値の１５％；
しきい値処理された時相中央値の１０％；
しきい値処理された時相中央値のＸとＹの中央値の平均値の１０％；
及び３ウェイのＸ、Ｙ及び時相中央値の１５％。
【０２３３】
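This combination of medians does a reasonable job of reducing image noise without making the image look "greasy" or blurred and without causing temporal smearing of moving objects or loss of detail. Another useful weighting of these five terms is 35%, 20%, 22.5%, 10%, and 12.5%, respectively.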
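The five-term weighted sum can be written as a small helper. This sketch assumes the five inputs (current pixel, X/Y median average, thresholded temporal median, X/Y median average of the thresholded temporal median, and three-way X/Y/temporal median) have already been computed in linear light; the names are hypothetical.

```python
# Weights in the order listed in the text; both sets sum to 1.
PREFERRED = (0.50, 0.15, 0.10, 0.10, 0.15)
ALTERNATIVE = (0.35, 0.20, 0.225, 0.10, 0.125)


def combine_terms(terms, weights=PREFERRED):
    """terms: (current, xy_avg, temporal, xy_of_temporal, three_way),
    all linear-light values for one pixel and one colour channel."""
    if len(terms) != len(weights):
        raise ValueError("expected one weight per term")
    return sum(w * t for w, t in zip(weights, terms))
```

Because the current pixel keeps a weight of 0.5 under the preferred set, its noise amplitude can be reduced by at most a factor of two, which matches the 3 dB limit quoted in the list.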
【０２３４】
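In addition, it is useful to apply motion compensation by applying a center-weighted temporal filter to a motion-compensated n×n region, as described below. This can be added to the median-filtered image result (the five terms just described) to smooth the image further, giving better smoothing and detail in moving image regions.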
【０２３５】
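Motion Analysis
In addition to "in-place" temporal filtering, which does a good job of smoothing slow-moving detail, de-interlacing and noise reduction can also be improved by motion analysis. Adding the pixels at the same position in three fields or three frames is valid for stationary objects. For moving objects, however, if temporal averaging or smoothing is desired, it is often more optimal to attempt to analyze the prevailing motion over a small group of pixels. For example, an n×n block of pixels (for example, 2×2, 3×3, 4×4, 6×6, or 8×8) can be used to search the previous and next fields or frames for a match (in the same way that MPEG-2 motion vectors are found by matching 16×16 macroblocks). Once a best match has been found in one or more previous and next frames, a "trajectory" and a "moving mini-picture" can be determined. For interlaced fields, it is best to analyze the comparisons, as well as to compute the inferred moving mini-pictures, using the results of the thresholded de-interlacing process described above. Because that process has already separated fast-moving detail from slow-moving detail and has already smoothed the slow-moving detail, the picture comparisons and reconstructions are more applicable than with individual de-interlaced fields.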
【０２３６】
動き解析は、好ましくは、現行のしきい値処理されたデ−インタレース化画像のｎｘｎブロックを、前と次の１又は２以上のフレーム中の隣接するすべてのブロックと比較することによって実施される。その比較は、ｎｘｎブロックのルミナンス又はＲＧＢの差の絶対値でもよい。一つのフレームは、その動きベクトルがほぼ等しくて逆方向であれば、充分に順方向と逆方向を向いている。しかし、動きベクトルがほぼ等しくて逆方向でない場合は、追加の順方向と逆方向の１又は２以上のフレームが実際の軌道を決定するのに役立てることができる。さらに、異なるインタレース化処理が、順方向と逆方向の「最良推測（best guess）」の動きベクトルの決定に役立てるのに有用である。一つのデ−インタレース化処理は個々のデ−インタレース化フィールドだけを使用する処理であるが、これは小さな動くディテールにエイリアシングとアーチファクトをひどく起こしやすい。もう一つのデ−インタレース化法は、フィールドフレームスムースデ−インタレース化だけを、しきい値処理を行わずに、上記の重み付け[０．２５、０．５、０．２５]をして使用する方法である。ディテールは平滑化されて時には失われるが、軌道はより正確になることが多い。
【０２３７】
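Once a trajectory has been found, a "smoothed n×n block" can be created by temporally filtering with the motion-vector-offset pixels from one (or more) previous and next frames. A typical filter is again [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667] for three frames, and possibly [0.1, 0.2, 0.4, 0.2, 0.1] for two frames back and two frames forward. Other filters with less center weight are also useful, especially with smaller block sizes (for example, 2×2, 3×3, and 4×4). The reliability of the match between frames is indicated by the absolute difference value. A large minimum absolute difference can be used to select a larger center weight for the filter. A small absolute difference suggests a good match, and can be used to select a smaller center weight so that the average is distributed more evenly over a span of several frames of motion-compensated blocks.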
【０２３８】
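These filter weights can be applied to the individual de-interlaced motion-compensated field-frames described above; to the thresholded three-field-frame de-interlaced pictures; and to the non-thresholded three-field-frame de-interlaced images with the [0.25, 0.5, 0.25] weighting described above. However, the best filter weights usually come from applying motion-compensated block linear filtering to the thresholded three-field-frame result. This is because the thresholded three-field-frame pixels are both the smoothest (aliasing in smooth areas has been removed) and the most motion-responsive (they default to the single de-interlaced field-frame above the threshold). The motion vectors obtained from motion analysis can therefore be used as input to multi-frame filters, multi-field-frame de-interlaced filters, single-field-frame de-interlaced filters, or combinations of these. In most cases, however, the thresholded multi-field-frame de-interlaced images form the best filter input.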
【０２３９】
動き解析を利用する場合、速い動きが見出されると（例えば±３２画素）、サーチ領域が大きいため、計算費用が高価になる。したがって、専用ハードウェア又はディジタル信号プロセッサ利用コンピュータを用いることによって速度を増大することが最良である。
【０２４０】
一旦、動きベクトルがそれらの絶対差の測定精度とともに見つけられると、その動きベクトルは、フレーム速度の変換を試みる複雑な方法に利用できる。しかし、遮蔽（occlusion）の問題（他のものをおおいかくすか又は暴露する物体）は、整合を混乱させて、正確にかつ自動的には推測できない。また遮蔽は、通常の画像時相アンダーサンプリング及び画像の固有周波数を有するそのビート（例えば映画の「逆転ワゴンホイール」効果）のような時相エイリアシングも伴う。これらの問題は、既知の演算法によって解明できないことが多いので、今までヒトの手助けを必要としている。したがって、ヒトによる精査や調節は、リアルタイムの自動処理が必要でない場合、オフラインと非リアルタイムのフレーム速度変換及び他の類似の時相のプロセスに利用できる。
【０２４１】
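De-interlacing is a simpler form of the same problem. As with frame-rate conversion, the task of de-interlacing is theoretically impossible to perform perfectly. This is due in particular to temporal undersampling (a closed shutter) and an inappropriate temporal sample filter (a box filter). Even with correct samples, however, issues such as occlusion and interlace aliasing further ensure that a correct result is theoretically impossible. The cases in which this is visible are mitigated by the depth of the tools described here that are applied to the problem. Pathological cases will always exist in real image sequences. The goal can only be to reduce the frequency and level of impairment when such sequences are encountered. In many cases, however, the de-interlacing process can be acceptably fully automated and can run unassisted in real time. Even so, there are many parameters that often benefit from manual adjustment.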
【０２４２】
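Smoothing by High-Frequency Filtering
In addition to median filtering, reducing high-frequency detail also reduces high-frequency noise. This smoothing, however, comes at the cost of lost sharpness and detail, so only a small amount of it is generally useful. A smoothing filter can easily be built, much as for the de-interlacing threshold, by down-filtering with a normal filter (for example, a truncated sinc filter) and then up-filtering with a Gaussian filter. The result is smoothed because it lacks high-frequency picture detail. When such a term is added, it must be very small, for example 5% to 10%, to provide a small amount of noise reduction. In larger amounts the blurring generally becomes quite visible.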
【０２４３】
Base-Layer Noise Filtering
原画像に対する上記中央値フィルタリングのフィルタパラメータは、画像を捕獲するフィルム粒子又は画像センサのノイズ特性に整合されねばならない。この中央値をフィルタされた画像は、ダウンフィルタされて、ベース層圧縮プロセスへの入力を生成した後、その画像はまだ少量のノイズを含んでいる。このノイズは、別のＸ−Ｙ中央値フィルタ（ＸとＹの中央値を等しく平均する）プラスごく少量の高周波数平滑化フィルタを組み合わすことによって、さらに減らすことができる。ベース層の各画素に加えられる、これら３項目の好ましいフィルタ重み付けは次の通りである。
原ベース層の７０％（中央値をフィルタされた上記原画像からダウンフィルタされた）；
ＸとＹの中央値の平均値の２２．５％；及び
ダウンアップ平滑化フィルタの７．５％。
【０２４４】
ベース層のこの少量の追加のフィルタリングは、ノイズを少量減らしかつ安定性を改善して、より優れたＭＰＥＧ符号化をもたらしかつこのような符号化によって加えられるノイズの量を制限する。
【０２４５】
Filters with Negative Lobes for Motion Compensation in MPEG-2 and MPEG-4
ＭＰＥＧ−４には、最良の動きベクトルの整合を見つけたとき、マクロブロックをシフトし次にその整合された領域を使って動き補償するための基準フィルタが設けられている。ＭＰＥＧ−４ビデオ符号化は、ＭＰＥＧ−２と同様に、マクロブロックに対し、動きベクトルの１／２画素の解像度を保持する。またＭＰＥＧ−４は、ＭＰＥＧ−２と異なり、１／４画素の精度を保持する。しかし、ＭＰＥＧ−４の基準装備において使用されるフィルタは最善の水準に次ぐフィルタである。ＭＰＥＧ−２において、画素間の途中点（half-way point）はこれら二つの隣り同士の画素の平均値であり、最善の水準に次ぐボックスフィルタである。ＭＰＥＧ−４において、このフィルタは、１／２画素解像度に用いられる。１／４画素解像度がＭＰＥＧ−４バージョン２に呼び出されると、負のローブを有するフィルタが途中点に対して使用されるが、この結果を有する次善のボックスフィルタと隣り同士の画素が１／４と３／４の点に使用される。
【０２４６】
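Further, the chrominance channels (U = R−Y and V = B−Y) do not use any sub-pixel resolution in the motion compensation step under MPEG-4. Because the luminance channel (Y) has a resolution of 1/2 or 1/4 pixel, the half-resolution chrominance U and V channels should be sampled with filters of 1/4 pixel resolution, corresponding to 1/2 pixel in luminance. When 1/4 pixel resolution is selected for luminance, 1/8 pixel resolution should be used for the U and V chrominance.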
【０２４７】
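Experiments have shown that filtering is significantly more effective when a negative-lobed truncated sinc function (as described above) is used to filter the 1/4, 1/2, and 3/4 pixel points when performing 1/4 pixel resolution in luminance, and when similar negative lobes are used in the filter that creates the 1/2 pixel position when performing 1/2 pixel resolution.
【０２４８】
Similarly, filtering is significantly more effective when a negative-lobed truncated sinc function is used to filter the 1/8 pixel points for U and V chrominance when 1/4 pixel luminance resolution is used, and when 1/4 pixel resolution filters with similar negative lobes are used when 1/2 pixel luminance resolution is used.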
【０２４９】
It has been found that combining 1/4 pixel motion vectors with truncated sinc motion-compensated displacement filtering greatly improves picture quality. In particular, clarity improves, noise and artifacts decrease, and chroma detail increases.
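A negative-lobed truncated sinc displacement filter can be sketched as below. The tap count (six taps, `half_width=3`), the absence of any window other than plain truncation, the normalization of the taps to unit sum, and the edge clamping are all assumptions made for illustration; the text does not give the filter length or coefficients.

```python
import math


def truncated_sinc_taps(frac, half_width=3):
    """Taps for sampling `frac` of a pixel (0 <= frac < 1) to the right of the
    centre pixel. Tap k applies to the pixel at offset k - half_width + 1."""
    raw = []
    for p in range(-half_width + 1, half_width + 1):
        d = p - frac
        raw.append(1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d))
    total = sum(raw)
    return [t / total for t in raw]  # normalize so flat areas stay flat


def interpolate(samples, index, frac, half_width=3):
    """Sub-pixel sample at position index + frac along one row or column.
    Assumption: positions outside the row are clamped to the nearest edge."""
    taps = truncated_sinc_taps(frac, half_width)
    n = len(samples)
    acc = 0.0
    for k, t in enumerate(taps):
        j = min(max(index + k - half_width + 1, 0), n - 1)
        acc += t * samples[j]
    return acc
```

Calling `truncated_sinc_taps(0.5)` gives the half-pixel filter, and 0.25 or 0.75 give the quarter-pixel points; unlike the two-tap box average, the taps next to the two central ones are negative.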
【０２５０】
これらのフィルタは、ＭＰＥＧ−１、ＭＰＥＧ−２、ＭＰＥＧ−４、又は他の適切な動き補償ブロックベースの画像符号化システムによって、ビデオ画像に適用できる。
【０２５１】
Characterization and Correction of Imaging Devices
特定のプログレッシブスキャン（非インタレース化）カメラを扱う際に、特定のカメラに特異的なプレプロセッシングを、圧縮（階層化又は非階層化）の前に適用することが非常に望ましいことが実験で確認された。例えば、一つのカメラのタイプに、レッドとグリーンに対するセンサ間の一画素の１／３及びグリーンとブルーのセンサ間の別の１／３画素（レッドとブルーの間の２／３画素）の機械的水平方向の調整不良（mechanical horizontal misalignment）がある。これによって、小さい垂直ディテールのまわりにカラーフリンジが起こる。これらのカラーフリンジは、原画像では眼に見えないが、圧縮／復元プロセスで、非常によく眼に見えるようになり望ましくないカラーノイズを生成する。この一つのカメラタイプに特異的なプレプロセス（pre-process）がこのカラー変位を修正して、カラーアーチファクトがない圧縮に対する入力をもたらす。したがって、眼に見えないが、カメラやそのセンサの特性のこのような小さいニュアンスは、最終の圧縮され／復元された結果の許容性と品質に対して重要になる。
【０２５２】
したがって、「眼が見るもの」と、「コンプレッサが見るもの」を識別することが有用である。この識別を有利に利用して、圧縮され／復元された画像の画質を大きく改善するプレプロセッシングステップが発見された。
【０２５３】
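Therefore, each individual electronic camera, each camera type, each film type, and each individual film scanner and scanner type used to create input to a compression/decompression system must be individually characterized in terms of color alignment and noise (electronic noise for video cameras and scanners, grain for film). Information about where the image was created, a table of the specific properties, and the specific settings of each piece of equipment must be carried with the original image and then used in pre-processing before compression.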
【０２５４】
例えば、特定のカメラはカラーリアラインメント（color realignment）を必要とすることがある。また特定のカメラは中位のノイズ設定で設定されることもある（必要なノイズプロセッシングの大きさに実質的に影響する）。これらカメラの設定と固有のカメラ特性は、そのカメラからの各ショットにそって補助情報としてはこばれねばならない。次に、この情報を利用して、プレプロセッシングのタイプ及びプレプロセスのためのパラメータの設定を制御することができる。
【０２５５】
多数のカメラから編集されるかまたは多数のカメラ及び／又はフィルム源から復号される画像の場合、そのプレプロセッシングは、恐らく、このような編集や組合せを行う前に実施すべきである。このようなプレプロセッシングは、画像の質を低下させてはならず、眼に見えないが圧縮の質には大きな影響を与える。
【０２５６】
特定の圧縮システムに入力すべき画像をつくるために使用される非フィルム画像形成システム（例えば電子カメラとフィルムスキャナ）に対しかような特性決定を実施し使用する一般的な方法は次のとおりである。
（１）解像試験チャートの画像をつくり、次いでカラーペア（例えばＲＧ、ＲＢ、ＧＢ）によって、好ましくは画素単位で表現して、画素センサの水平と垂直のカラーアラインメント（フィルムの場合は粒子）を測定する。
（２）１又は２以上のモノクロム試験チャートの画像をつくり、次いでセンサが個々に、好ましくはレッド、グリーン及びブルーの画素値として表現されたセットとして（例えばホワイトカード、黒カード、５０％と１８％のグレイカード並びにレッド、グリーン及びブルーの各基準カードの画像をつくることによる）発生したノイズを測定する。そのノイズが、他のカラーチャネルからの出力の変化及び隣接する画素を比較することによって、相互に関連しているかどうかを決定する。
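（３）Carry the accurate information produced by a properly calibrated device along with the image (for example, by electronic transmission, by storage on machine-readable media, or as human-readable data accompanying the image).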
（４）画像形成システムからの画像を圧縮プロセスで使用する前に、画素を、カラーによって、等しいオフセット量によって翻訳して、測定されたミスアラインメントを修正する。例えば、レッドセンサがブルーセンサより０．２５画素低くミスアラインされていれば、画像中のすべてのレッド画素は、０．２５画素だけ上方へシフトさせねばならない。同様に、ノイズの測定量に基づいて、ノイズ減少フィルタの重みを、測定ノイズの量を補償する量だけ調節する（これは、経験で確認し、そして手作業によるか又は計算された参照表に定義する必要がある）。
【０２５７】
特定の圧縮システム中の入力すべき画像をつくるのに使用されるフィルム画像形成システムに対し、このような特性決定を実施し使用する一般的方法は次の通りである。
（１）フィルムのタイプを決定する（粒子はフィルムのタイプによって変化する）。
（２）そのフィルムを、各種の照明条件下で、１又は２以上のモノクロム試験チャートに露出する（ノイズは一部分、露出の関数である）。
（３）フィルムを通常の速度でフィルムスキャナによって走査し（このフィルムスキャナの特性は上記のようにして測定する）次に発生したノイズをセンサによって、個々にセットとして測定する。そのノイズが相互に関連しているかどうかを決定する。
（４）同タイプのフィルムが露出されて正確に調整されたスキャナで走査されるといつでも、その確認され測定された情報（すなわち、フィルムのタイプ、露出条件、走査特性）を、走査されるフィルム画像とともに運ぶ。
（５）このような画像を圧縮プロセスで使用する前に、ノイズ減少フィルタの重みを、測定されたノイズの量を補償する量だけ調節する（これは、経験で確認し次に手動の又はコンピュータ化された参照表で定義する必要がある；その調節は、少なくとも三つの要因すなわちフィルムのタイプ、露出条件及び走査特性の関数であるから、コンピュータが好ましい）。
【０２５８】
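Enhanced 3-2 Pulldown System
Transferring film to 60 Hz video with the 3-2 pulldown method described above is generally a much-disliked practice. 3-2 pulldown is used on existing NTSC (and some proposed HDTV) systems because 24 frames/second does not divide evenly into 59.94 or 60 fields/second. The odd frames (or the even frames) are placed on two interlaced fields, and the even frames (or the odd frames) are placed on three interlaced fields. Thus one field in every five is a duplicate, and two film frames map to five video fields. As noted above, this process causes a great many objectionable problems.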
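The cadence described above, and its removal when the cadence is known in advance, can be sketched as follows. Frames are represented by opaque labels and each field by a `(frame, parity)` pair; this representation, the fixed 3-then-2 starting order, and the function names are assumptions for illustration only.

```python
def pulldown_32(frames):
    """Map 24 fps frames to interlaced fields with a fixed 3-then-2 cadence.
    Every pair of frames yields five fields, so 24 frames give 60 fields."""
    fields = []
    for i, frame in enumerate(frames):
        repeat = 3 if i % 2 == 0 else 2
        for _ in range(repeat):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields


def undo_pulldown_32(fields):
    """Recover the original frames, relying on the known 3-then-2 cadence
    rather than on any detection of repeated fields."""
    frames, pos, i = [], 0, 0
    while pos < len(fields):
        frames.append(fields[pos][0])
        pos += 3 if i % 2 == 0 else 2
        i += 1
    return frames
```

The inverse works only because the cadence and its starting phase are known a priori; a tape with an unknown or disturbed cadence would need detection first.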
【０２５９】
大部分のビデオプロセッシング装置は、そのプロセスを中間信号に適用するだけである。この場合、時変効果（time-changing effect）が、たとえいくつかの入力フィールドが重複していても、一つのフィールドに対して、次のフィールドとは異なる作用をする。このようなプロセスの後、これらのフィールドはもはや重複せず、またフィールドペアも再結合して原フィルムフレームを回復することができない。そのフィールド速度で起こるこのようなプロセスの例としては、パン−アンド−スキャン（狭い４：３ビデオスクリーンを、ワイドスクリーン画像を水平に横切って移動させて、重要なアクションを示す）、フェードアップ又はフェードダウン、逐次カラー調節、ビデオタイトルオーバーレイスクロールなどがある。さらに、このような信号がフィルムに捕獲され、次にビデオに編集・処理されると、そのフィルムのフレーム処理とそのビデオのフィールド処理が、こみいった方式で強く混ぜ合わされる。このようなビデオ信号（広く存在している）が次に、画像圧縮システムに送られると、そのシステムは一般に、次善的に作動する。
【０２６０】
今までのところ、フィルム源からの最良の画像圧縮は、そのフィルムの２４ｆｐｓ画像が、そのビデオ信号から完全に再抽出できるときだけ（又はより良好なのは、２４ｆｐｓ領域を決して残さないときだけ）に起こることを、実験が示した。次に、その圧縮システムは、原フィルムの元の２４ｆｐｓの速度で、映画（又はフィルムベースのＴＶショー又はＴＶコマーシャル）を符号化することができる。これは最も有効な圧縮法である。いくつかの映画オンデマンドシステム（movie-on-demand system）とＤＶＤマスタリングシステムは注意深く３−２プルダウンを利用し次いで非常に制限された方法で編集して、２４ｆｐｓの原フレームを最終的に抽出し２４ｆｐｓで圧縮できることを保証する。
【０２６１】
しかし、このような注意は「開ループ」なので、通常のヒトの誤りによって破られることが多い。編集及びポストプロダクション効果のプロダクションへの適用の複雑なことが、フィールド速度プロセッシングが行われるときに「過誤」をもたらすことが多い。したがって、このようなことが起こる可能性を避けて、かような誤りを避けるためあらゆることを追跡する試みの複雑さを除く好ましい方法は次のとおりである。
（１）可能なときはいつでも、直接の２４ｆｐｓの記憶、処理又は通信を支持するフィルム処理装置を利用する。
（２）局所記憶のために電子媒体又は高速光学的媒体（例えばハード・ドライブ及び／又はＲＡＭ）を使用して、すべてのフィルム画像をそれら固有の２４ｆｐｓの速度で記憶する。
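（３）Whenever a device accepts 3-2 pulldown video as input, create the 3-2 pulldown on the fly, converting in real time from local storage (which is kept at 24 fps).
（４）When storing the output of a device that generates and delivers 3-2 pulldown images, undo the 3-2 pulldown on the fly and store the result at 24 fps again.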
（５）フィールドでのみ作動しなければならず、そのためフレームが通常のプロセッシング（一つのフレームとして、２及び３のフィールドに対する）で保存できないすべての装置を該システムから除く。
（６）記憶された画像シーケンスに作用するかまたは該シーケンスを編集するすべてのソフトウェアを、その記憶媒体に使用される２４ｆｐｓモードに整合するように設定する；２４ｆｐｓの固有モードで作動できないソフトウェアは使用しない。
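（７）If the telecine does not provide direct 24 fps output, telecine all original images (that is, convert them from film to video) with a deterministic cadence (always 3 then 2, or always 2 then 3). Undo the interlaced 3-2 pulldown cadence immediately after the interface from the telecine.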
（８）未知の３−２プルダウンカダンスを有するテープを受け取ったならば、そのカダンスはなんらかの方法で見付け出して、記憶される前に除かねばならない。これは、ハードウェア検出システム、ソフトウェア検出システム又は手動／視覚で実施できる。あいにく、ハードウェア検出システムは完全ではないので、手動・視覚による検査が常に必要である（現在のシステムは、フィールドのミスアラインメントを検出しようとしている。黒又は白のフレーム上、又は画像の明るさが一定値のフィールド上のこのようなミスアラインメントは、現在検出することができない。検出可能なミスアラインメントでさえ、いくつかの検出器は、ノイズ又はアルゴリズムの弱さのために失敗する）。
（９）３−２プルダウンを必要とする施設から出力されるテープ記憶はどれも、純粋に、維持されている既知のカダンスに記憶されそしてそのプログラムの全作動時間中妨害されない。
【０２６２】
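A particular processing device that requires 3-2 pulldown as input and output obtains its input or inputs, by the method above, generated on the fly in real time from the 24 fps source. The cadence always starts in a standard way for each input. The cadence of the device's output is then known, and must be identical to the cadence generated on the fly as the device's input. The cadence is then undone using this a priori knowledge, and the frames are saved in the 24 fps format of the storage medium.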
【０２６３】
この方法は、リアルタイムの３−２プルダウンの取り消しと３−２プルダウンの合成が必要である。そのカダンスが未知のフォーマットのテープ由来のものでないならば、それらのフレームの２４ｆｐｓ性は、かようなフィルムベースのテレシネポストプロダクションシステムによって自動的に保存される。そのシステムは次に、圧縮システム（上記の階層化圧縮プロセスを含む）への最適入力を形成する。
【０２６４】
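This process will be widely useful in video and HDTV telecine facilities. Some day, when all devices accept 24 fps (and other progressive-scan rate) native signal input, output, processing, and storage modes, such a method will no longer be needed. In the meantime, however, many devices require 3-2 pulldown for their interfaces in and out, even when their targeted function is to operate on film input. During this period, the method above removes the problems of 3-2 pulldown and can become an essential element of effective film post-production and telecine.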
【０２６５】
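Frame Rate Methods for Production
Although 24 fps forms the worldwide standard for motion-picture film, its use often results in jerky motion (also called "stutter", because each frame is flashed repeatedly several times before moving on to the next). Higher frame rates are desirable to provide smoother motion, that is, clearer pictures of moving objects, as well as to provide slow motion (by capturing images at a high frame rate and playing them back at a lower rate). As noted above, the US video rate of 60 fps (and 59.94 fps for broadcast video) is relatively incompatible with 24 fps. This causes problems when a movie is to be released worldwide, because 50 Hz PAL and SECAM video systems are relatively incompatible with 60 fps NTSC video and 60 Hz-centered US HDTV.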
【０２６６】
米国特許願第０９／４３５，２７７号（発明の名称が「System And Method For Motion Compensation and Frame Rate Conversion」で１９９９年１１月５日付けで出願され、本願発明の譲受人に譲渡されている）が、例えば６０Ｈｚと５０Ｈｚ間及び６０Ｈｚと７２Ｈｚ間などの困難なフレーム速度の変換を実施できる技法を教示している。これらの技法も、フレーム速度変換に加えてデ−インタレース化を行う。
【０２６７】
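The frame-rate conversion method taught in the above application has been very successful for conversions between nearby high frame rates, such as between 60 Hz and 50 Hz or between 60 Hz and 72 Hz (the results look quite good), although it is computationally expensive. Conversion between 24 Hz and 60 Hz using motion analysis, however, has proven quite difficult. At 24 fps the frames differ considerably, particularly in the amount of motion blur in each frame (as in the cockpit scenes from the movie "Top Gun"). This makes motion analysis, as well as subsequent frame-rate conversion, difficult from a 24 fps source. Further, motion blur cannot be removed, so even where motion analysis is possible on a high-motion 24 fps scene, the images will remain blurred (although they will move more smoothly and stutter less). Because motion analysis requires matching portions of the image, frames whose motion blur differs greatly from that of adjacent frames become nearly impossible to match. Thus 24 fps source material from film (or electronic cameras) is a poor starting point for frame-rate conversion to 50 Hz or 60 Hz video.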
【０２６８】
これによって、高フレーム速度の電子カメラは、２４ｆｐｓの電子カメラよりはるかに優れた画像ソースであるという結論になる。しかし、６０ｆｐｓのビデオから２４ｆｐｓのフィルムへ変換して戻すことが困難であることを考えれば、７２ｆｐｓは、終局の２４ｆｐｓの互換性についてははるかに優れたカメラフレーム速度である。
【０２６９】
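Experiments have shown that excellent-quality 24 fps moving images can be derived from 72 fps frames by using a very simple weighted frame filter. The best weighting for the three consecutive frames (previous, current, and next) from a 72 fps source that yield one 24 fps frame centers on [0.1667, 0.6666, 0.1667]. However, any set of three-frame weightings in the range [0.1, 0.8, 0.1] to [0.25, 0.5, 0.25] appears to work well. The emphasis is on the center frame, which, because its motion blur is short, helps strike a balance between the clarity of a single frame and the added blur needed from the adjacent frames to smooth 24 fps motion stutter (by simulating 24 fps motion blur).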
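The weighted three-frame filter can be sketched as follows. Frames are flat lists of linear-light pixel values, and each output frame is built from a non-overlapping group of three source frames; that grouping, the data layout, and the function names are assumptions for illustration.

```python
WEIGHTS_72_TO_24 = (0.1667, 0.6666, 0.1667)


def blend(frames, weights):
    """Per-pixel weighted sum of equally sized frames (flat lists of values)."""
    return [sum(w * f[i] for w, f in zip(weights, frames))
            for i in range(len(frames[0]))]


def convert_72_to_24(frames, weights=WEIGHTS_72_TO_24):
    """One output frame per group of three consecutive 72 fps frames
    (previous, current, next), weighted toward the centre frame."""
    out = []
    for start in range(0, len(frames) - 2, 3):
        out.append(blend(frames[start:start + 3], weights))
    return out
```

Any weight triple between [0.1, 0.8, 0.1] and [0.25, 0.5, 0.25] can be passed as `weights` to trade single-frame clarity against simulated motion blur.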
【０２７０】
この重み付けの技法は、すべての場合の約９５％でうまく働いて、この単純な重み付け関数に大部分の２４ｆｐｓ変換を行わせることができる。これらの場合の残り５％ほどに対しては、米国特許願第０９／４３５，２７７号に教示されているように動き補償を利用できる。この単純な重み付け法によって、該変換プロセスに対する作業負荷を１／２０に減らしたことによって、残留動き補償変換は、必要時に一層実用的になる。
【０２７１】
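It should also be noted that a 120 fps source can be used with five weightings to achieve similar results at 24 fps; for example, weightings of [0.1, 0.2, 0.4, 0.2, 0.1] can be used. Also, 60 fps can be derived from 120 fps by taking every other frame, although the shorter open-shutter duration will be noticeable on fast motion. To mitigate this problem, an overlapping filter can also be used (preferably about [0.1667, 0.6666, 0.1667], though anywhere in the range [0.1, 0.8, 0.1] to [0.25, 0.5, 0.25] will do), in which the low-amplitude weighted frames are re-used. Of course, higher frame rates allow the temporal samples to be shaped even more carefully when deriving 24 fps and other frame rates. As frame rates become very high, the techniques of U.S. Patent Nos. 5,465,119 and 5,737,027, assigned to the assignee of the present invention, begin to apply, because a method of reducing the data rate within each frame is needed to keep data transfer rates manageable. However, on-chip parallel processing within the sensor (for example, active pixel or CCD) can provide another means of reducing the required off-chip I/O rate.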
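The 120 fps conversions can be sketched in the same style. For 120 to 60 fps the three-frame window advances by two source frames, so each low-weight outer frame is shared by neighbouring outputs; dropping outputs whose window would run past the end of the sequence is an assumption, as are the function names.

```python
def blend(frames, weights):
    """Per-pixel weighted sum of equally sized frames (flat lists of values)."""
    return [sum(w * f[i] for w, f in zip(weights, frames))
            for i in range(len(frames[0]))]


def convert_120_to_60(frames, weights=(0.1667, 0.6666, 0.1667)):
    """Overlapping three-frame windows centred on every other source frame."""
    out = []
    for centre in range(1, len(frames) - 1, 2):
        out.append(blend(frames[centre - 1:centre + 2], weights))
    return out


def convert_120_to_24(frames, weights=(0.1, 0.2, 0.4, 0.2, 0.1)):
    """Non-overlapping five-frame windows, one output per five source frames."""
    out = []
    for start in range(0, len(frames) - 4, 5):
        out.append(blend(frames[start:start + 5], weights))
    return out
```

With ten source frames numbered 0 to 9, the 60 fps outputs are centred on frames 1, 3, 5, and 7, and frame 2 contributes to both of the first two outputs.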
【０２７２】
２４ｆｐｓが、新しい７２ｆｐｓ（などの)フレーム速度フォーマットの経済的実用性のために要望されていると仮定すると、ここで述べられている時相フィルタ重み付け関数（例えば、０．１６６７、０．６６６６、０．１６６７）を使用して、２４ｆｐｓの画像を監視できることも大切である。これを行うことによって、シーン中のショットの「ブロッキング」（セッティングアップ）をチェックして、２４ｆｐｓの結果が（７２ｆｐｓなどのより高い速度のフルレートバージョンに加えて)良好に見えることを保証できる。このように、高フレーム速度捕獲の利点が、２４ｆｐｓで国際的なフィルムとビデオのリリースを行う性能と完全に統合されている。
【０２７３】
したがって、特定の選択された高フレーム速度は、既存の２４ｆｐｓのフィルム及びワールドワイドビデオをリリースする基本施設と上位互換性があるのみならず、将来の高フレーム速度電子画像ソースを創製する最も適切な基礎を形成している。
【０２７４】
Modular Bit Rate
ビット速度を「モジュール化する」ことは、多くのビデオの圧縮アプリケーションに有用である。各種のビット速度システムが、連続的に変化するビット速度を利用して、より多くのビットをより速く変化するショットに適用するこころみをしている。これは、各有用なユニットに異なるビット速度を与えることによって粗い方式で行うことができる。適切なユニットの例としては、ある範囲のフレーム（「写真のグループ」すなわちＧＯＰ）又は各Ｐフレームがある。したがって、例えば、ビット速度はＧＯＰ内で一定であってもよい。しかし、（例えば、動き又はシーンの変化が大きいため）高い圧縮ストレスが検出されるＧＯＰの場合、より高い一定のビット速度を利用できる。これは、強化層中のビットすべてを、高ストレスの期間中、ベース層に適用する（一般に次のＩフレームでリセットする）上記階層化法と類似している。したがって、より多くのビットをベース層に適用するという概念に加えて、高ストレスの期間中、高品質を得るため、より多くのビットを、単一層圧縮、又はベース層と強化層に（階層化圧縮の場合）適用できる。
【０２７５】
一般に、低ビット速度は、映画又はライブイベントの時間の９０％を扱うことができる。時間の残りの１０％に対して、５０％又は１００％多いビットを使用すると、完全に近い符号化がなされるが、全ビットカウントは５％〜１０％しか増加しない。これは、一般に一定のビット速度に符号化しながら（したがって一定のビット速度のモジュール性とプロセッシングの利点を大部分保持して）、特に眼で見える完全な符号化を行うのに非常に有効な方法であることを証明している。
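The bit-count arithmetic above can be checked with a short sketch. It assumes equal-length GOPs that each receive the same base number of bits, with stressed GOPs receiving a fixed fractional increase; `total_bits` is a hypothetical helper.

```python
def total_bits(stress_flags, base_bits_per_gop, extra=0.5):
    """Total bits when each GOP flagged as high-stress gets `extra`
    (0.5 = 50 % more, 1.0 = 100 % more) on top of the base allocation."""
    total = 0.0
    for stressed in stress_flags:
        factor = 1.0 + extra if stressed else 1.0
        total += base_bits_per_gop * factor
    return total
```

With 10 of 100 GOPs flagged, a 50% boost raises the total by 5% and a 100% boost raises it by 10%, matching the range quoted in the text.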
【０２７６】
このようなより高いビット速度の期間の使用は手動で又は自動的に制御することができる。自動制御は、速度制御量子化スケールファクターを使用して行うことができ、このパラメータは、高ストレスの期間では、（ビット速度が大きく増大しないようにするため）大きくなる。したがって、このような高ストレスが検出され、そして残りのＧＯＰはより高いビット速度で符号化されるべきであるか、あるいはまたＧＯＰは、出発Ｉフレームで始めてより高いビット速度を利用し再符号化すべきであるという信号を送ることができる。視覚検査を利用して、手動選択を利用し、ＧＯＰがより高いビット速度を必要としているというフラグを立てることもできる。
【０２７７】
ＧＯＰが一般に特定の大きさを有していることを利用するためリアルタイム復号を行うことが有益である。またＧＯＰの単純倍数（Simple multiple）（例えば、高いストレスを有するＧＯＰに対するビット数の５０％又は１００％の増加）を使用することも、前記利点を多く保持する。図１７は、より高いビット速度を、圧縮されたデータ流のモジュラ部分に適用する一例の線図である。正常シーン１８００、１８０２を含む写真のグループが、一定速度のビットを割り当てられている。高レベルのストレス（すなわち圧縮プロセスが「正常」シーンと同等に圧縮することが難しい変化）を示すシーンを含むＧＯＰ１８０４が起こると、一層多数のビット（例えば５０〜１００％の追加）がそのＧＯＰに割り当てられて、そのシーンのより正確な符号化を行うことができる。
【０２７８】
多くのＭＰＥＧ−２の装置が一定のビット速度を使用することは注目すべきである。一定のビット速度は、一定ビット速度のトランスポートと記憶の媒体と良好に整合する。放送チャネル、衛星チャネル、ケーブル及びファイバなどのトランスポートシステムはすべて、固定された一定のトータルキャパシティ（total capacity）を有している。また、ディジタル圧縮ビデオテープ記憶システムは一定のテーププレイバック速度を有しているので、一定の記録又はプレイバックのビット速度を生成する。
【０２７９】
ＤｉｒｅｃＴＶ／ＤＳＳ及びＤＶＤなどの他のＭＰＥＧ−２装置は、ある形態の可変ビット速度割り付け（variable bit rate allocation）を利用する。ＤｉｒｅｃＴＶ／ＤＳＳの場合、その変動性は、現行プログラムのシーンストレス（scene stress）対共通マルチプレックスを共用する隣接ＴＶプログラムのシーンストレスの組み合わせである。そのマルチプレックスは同調された衛星チャネルとトランスポンダに相当し、それは固定トータルビット速度を有している。消費者ビデオＤＶＤの場合、そのディジタル光ディスクの容量は２．５ギガバイトであり、ＭＰＥＧ−２のビット速度が２ｈｒの映画に対して平均４．５メガビット／秒であることが必要である。しかし、その光ディスクは、９メガビット／秒で１００％高いピーク読み出し速度の性能を有している。より短い映画の場合、平均速度は充分な９メガビット／秒まで高くしてもよい。２ｈｒの映画の場合、そのビット速度が平均４．５メガビット／秒を達成する方法は、これを超える速度を高いシーンストレスを有するシーン（シーンの動きが高いため変化が大きい）に対して使用するが、一方この平均値より低い速度をシーンストレスが低い（動きが小さいため変化が小さい）間に使用する方法である。
【０２８０】
ＭＰＥＧ−２とＭＰＥＧ−４のビット速度は、事実上の復号器バッファの容量のモデリングを組み合わせそして量子化パラメータを変えて、符号器が発するビット速度を減速することによって一定に保持される。あるいは、一定の量子化パラメータは、シーンの「エントロピー」としても知られているシーンの変化とディテールに比例して、数が変化するビットを生成する。一定の量子化パラメータは比較的一定の品質であるが可変のビット速度を生じる。変化する量子化パラメータは、サイズ限定復号器バッファ（size bounded decoder buffer）とともに使用して、どんな変動性も平滑化して一定のビット速度を提供することができる。
【０２８１】
マルチプレックスの多くのチャネルを共用することは、ＤｉｒｅｃＴＶの場合、又はＡＣＡＴＳ／ＡＴＳＣ１９．３メガビット／秒６メガヘルツマルチプレックスの標準精細度の信号の場合と同様に、可変ビット速度を支持できる一方法である。低エントロピーのショウ（トークショウのような）とペアになって高エントロピーのショウ（ホッケーのような速いスポーツ）の統計データは、より大きなエントロピーを有するショウにビットを適用する際に瞬間トレードオフ（instantaneous tradeoff）することができる。一つのショウにおけるゆっくりした期間は、より少数のビットを使用し、同じマルチプレックス内の速く動く同時の別のショウに対しより多くのビットを提供する。
【０２８２】
これらの可変ビット速度システムは、通常、平均値をほぼ１００％超えるところにピークのビット速度を有している。したがって、これらのシステムは、最も高いビット速度で一定ビット速度システムになり、高いシーンストレスが続いている期間利用可能なピークビット速度を限定する。また、いくつものＭＰＥＧ−２復号器システムの入力ビット速度にも限度があり、このような可変ビット速度システムのピークビット速度にも限度がある。しかしピーク入力ビット速度に対する限度は、復号器が改善されるとこれらの他の限度を充分超えて、徐々に上昇する。
【０２８３】
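The general concept behind each of these conventional bit-rate control systems is that there is a small memory buffer in the decoder, holding somewhere between a fraction of a frame and a few frames of moving image. When this decoder bit-rate buffer was conceived, around 1990, there was concern that the cost of this buffer memory would significantly affect the price of the decoder. It has since become clear that the cost of this buffer is insignificant. In fact, buffers holding many seconds of video are now of insignificant cost. It can be expected that in the near future the bit-receiving memory buffer will be able to hold many minutes of video information at small cost. Furthermore, the cost of storage media such as disk is also falling rapidly while capacity is rapidly increasing. It is therefore also reasonable to spool the compressed bitstream to disk or another storage memory system, yielding many hours or many days of storage capacity. This is currently done by commercially available hard-drive-based home video recorders.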
【０２８４】
しかし、ビットが圧縮されたビットバッファで待っている間、時間遅延があるという一つの基本的な問題点が残っている。放送テレビジョンや映画配給の場合、数秒間又は数十秒間の遅延は、進行中のプログラム「tune-in」又は「movie selection」を案内する補助選択ストリーム（auxiliary selection stream）を利用できる限り、又は（例えば映画の）初期スタートが小さい初期バッファによって短くした遅延を利用する場合、視聴するのにほとんど影響がない。しかし、遠隔会議又はライブの対話イベントの場合、遅延を最小限にするため小さい高速度の実行バッファ（running buffer）が必要である。ライブの対話と遠隔会議の用途を除いては、安価な大きいバッファを利用して品質を改善することができる。
【０２８５】
これらの傾向に照らして、可変及び一定のビット速度の圧縮法の構造は有意に改良することができる。これらの改良点としては以下のものがある。
・復号器バッファモデルにおけるバッファサイズを大きく増大して、可変ビット速度と一定ビット速度の多くの利点を同時に提供すること。
・Pre-loading of "interstitial" show titles, to support instant changes to a title while the decoder buffer begins to fill.
・新しく出発したプログラム又は映画の開始時に部分充填ＦＩＦＯ（先入れ先出し）復号器ビット速度バッファを利用し、次いで該プログラムが開始した後、進行するにつれてバッファフルネス（buffer fullness）（したがって遅延）を徐々に増やすこと。
・平均ビット速度を、高いシーンストレスの期間に増大するため、（上記モジュラビット速度の概念を使用して）増大させたビット速度「モジュール」を、（例えば第二ＦＩＦＯ、メインメモリ又はディスクへのスプーリングを利用して）復号器ビットメモリに事前ロードすること。このような事前ローディングは、一定ビット速度のチャネルで平均ビット速度を超えるのみならず、可変ビット速度のシステム内で最大ビットを超えるビット速度の期間を可能にする。
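・In the layered structure of the present invention, all bits in the average (or constant) bit-rate stream can be shunted to the base layer during scenes with high scene stress. The enhancement-layer bits for a scene, however, can be pre-loaded for that scene and played out using timing markers for synchronization. Note again that the maximum (or constant) bit-rate limit of the transport and/or playback can be exceeded for periods of time using this method (limited only by the amount of available buffer space).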
【０２８６】
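Multi-Layer DCT Structure
Variable DCT Block Size
Harmonic alignment of transform wavelengths is fundamental to a layered DCT structure. For example, FIG. 18 shows graphically the relationship of DCT harmonics between two resolution layers. In the present optimal two-layer configuration of the invention, the base layer uses DCT coefficients following an arithmetic harmonic series with frequencies of 1, 2, 3, 4, 5, 6, and 7 times the 8×8-pixel DCT block size 1900. In a factor-of-two resolution enhancement layer, these base-layer harmonics map to frequencies of 1/2, 1, 3/2, 2, 5/2, 3, and 7/2 of the corresponding enhancement-layer DCT block 1902. There is no penalty for the 1/2 term, since that frequency is held entirely in the base layer, but the remaining terms harmonize only partially with the enhancement layer. For example, the frequencies of 2, 4, and 6 times the macroblock size from the base layer align with the frequencies of 1, 2, and 3 times the macroblock size from the enhancement layer. These terms form a natural signal-to-noise ratio (SNR) layering, as though additional precision were applied to these base-layer coefficients. The 3, 5, and 7 terms from the base layer are non-harmonic with the enhancement layer and therefore show orthogonality only with respect to the base layer, providing no synergy with the enhancement layer. The remaining enhancement-layer terms 4, 5, 6, and 7 represent the additional detail that the enhancement layer can contribute to the image without overlapping the base layer. FIG. 19 shows graphically a similar relationship of DCT harmonics among three resolution layers, showing the highest enhancement layer 1904.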
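The harmonic relationship between the two layers can be tabulated with exact fractions. This sketch assumes both layers use the same DCT block size and that the enhancement layer has an integer resolution factor; `map_base_harmonics` is a hypothetical helper.

```python
from fractions import Fraction


def map_base_harmonics(scale=2, count=7):
    """Map base-layer harmonics 1..count onto an enhancement layer whose
    resolution is `scale` times higher and whose DCT block size is the same."""
    mapping = {k: Fraction(k, scale) for k in range(1, count + 1)}
    # Base terms that land on a whole enhancement-layer harmonic (SNR layering).
    aligned = {k: int(f) for k, f in mapping.items() if f.denominator == 1}
    # Base terms that fall between enhancement-layer harmonics.
    fractional = [k for k, f in mapping.items() if f.denominator != 1]
    # Enhancement-layer harmonics with no aligned base-layer counterpart.
    new_detail = [m for m in range(1, count + 1) if m not in aligned.values()]
    return mapping, aligned, fractional, new_detail
```

For a factor-of-two layer the aligned pairs are 2→1, 4→2, and 6→3, the fractional base terms are 1, 3, 5, and 7 (the first of which the text treats as carried entirely by the base layer), and enhancement terms 4 to 7 carry new detail.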
【０２８７】
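It will be seen that this structure has only partial orthogonality and partial alignment. While this alignment and orthogonality are generally beneficial, the phase alignment of the DCT coding series was never optimized for two (or more) spatial resolution layers. Rather, the DCT was designed as a set of orthogonal basis functions that exploit phase properties, with the phase-carrying imaginary terms removed from the Fourier transform series. The DCT is clearly adequate for coding in a two-layer spatial coding structure, but these issues of layer orthogonality and phase relationship become central when the layered structure is extended to three or four spatial resolution layers.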
【０２８８】
交差層の直交性を提供するための解決策は、各解像度層に対し異なるＤＣＴブロックサイズを利用する方法である。例えば与えた層の解像度が２倍になれば、そのＤＣＴブロックの大きさは２倍になる。これによって、解像度階層化構造が調波的にアラインされ、層間係数の直交性（inter-layer coefficient orthogonality）が最適であるため、最適の符号化効率が提供される。
【０２８９】
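FIG. 20 is a diagram showing various DCT block sizes for different resolution layers. For example, a 4×4-pixel DCT block 2000 can be used for the base layer, an 8×8-pixel DCT block 2002 for the next layer up, a 16×16-pixel DCT block 2004 for the third layer, and a 32×32-pixel DCT block 2006 for the fourth layer. Each layer thus adds fully orthogonal additional harmonic terms to the layer or layers below. Optionally, additional precision (in the SNR sense) can be added to the previously covered coefficient terms. For example, the 16×16-pixel subset 2008 within the 32×32-pixel block 2006 can be used to increase (in the SNR-improvement sense) the precision of the 16×16-pixel DCT block 2004.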
【０２９０】
Motion Vectors
ＭＰＥＧ−２において、動きベクトルに対応するマクロブロックは、１６×１６画素からなり、四つの８×８ＤＣＴブロックとして編成されている。ＭＰＥＧ−４において、各マクロブロックは、任意に、それ自身の動きベクトルを各々がもっているＤＣＴブロックに対応する８×８領域中にさらに細分化することができる。
【０２９１】
たとえＤＣＴブロックが、各層のサイズが異なっている方が好ましくても、その動き補償マクロブロックはこの構造によって拘束される必要がない。最も単純な構造は、動きがベース層の動きベクトルによって、すべての層に対して指定されるので、各ベース層の動き補償マクロブロックの単一の動きベクトルが、すべてのより高い層にも当てはまり、すべての強化層から動きベクトルをすっかり除く構造である。しかしより効率的な構造は、各層に、独立して、（１）動きベクトルなし（すなわちベース層の動きベクトルを使用する）、（２）ベース層の動きベクトルに対する追加のサブ画素の精度、又は（３）各動き補償マクロブロックを、独立した動きベクトルを各々が有する２個、４個などの数のブロックに分割することを選択させる構造である。ＭＰＥＧ−４内のオーバーラップされたブロックの動きを補償する（ＯＢＭＣ）方法を利用して、動かされている独立したブロックの動き補償間の遷移を平滑化することができる。この説明の他の部分で詳記されているように、サブ画素を配置するため負のローブのフィルタを使うことも、このＤＣＴ層構造の動きを補償するのに有益である。
【０２９２】
したがって、各層における各ＤＣＴブロックは、その層にとって最適であるように、動きを補償するために多数の動きベクトルブロックに分割できる。図２１は、独立した動きベクトルを確認するため動き補償マクロブロックを分割する例を示す線図である。例えば、ベース層は、４×４画素ＤＣＴブロック２１００を使用して構築されると、１個（図示してある）から１６個もの多数の動きベクトル（各画素に対して一つずつ）を使用できるか又はサブ画素の動きベクトルを利用することさえできる。これに応じて、より高いレベルが各々、そのより大きい対応するＤＣＴブロック２１０２、２１０４、２１０６を適切な場合に分割して、符号化予測品質（したがってセービング（saving）ＤＣＴ係数ビット）対動きベクトルを指定するのに必要なビット間の最適のバランスが得られる。動きを補償するためのブロックの分割は、動きベクトルを符号化に使用されるビットと、写真予測の改良との間のトレードオフである。
【０２９３】
本願の他の部分に記載されているように、低い方の層の動きベクトル由来の案内ベクトルを、高い方の層の各動きベクトルを予測するのに使用すると、やはり、符号化の効率と効力が改善される。
【０２９４】
Variable-Length Coding Optimization
ＭＰＥＧ−１、ＭＰＥＧ−２、ＭＰＥＧ−４、Ｈ．２６３などの圧縮システム（ウェーブレットなどのＤＣＴシステムと非ＤＣＴシステムを含む）が利用する可変長さの符号（例えばハフマン符号又は算術符号）は、小グループの試験シーケンスについて立証された効率に基づいて選択される。これらの試験シーケンスは、画像のタイプに限定されて、比較的狭い範囲のビット速度、解像度及びフレーム速度だけを表す。さらに、該可変長の符号は、各試験シーケンス及びグループとしての試験シーケンスに関する平均の性能に基づいて選択される。
【０２９５】
実質的に一層最適の可変長さ符号化システムは、（１）特定の可変長さ符号化テーブルを各フレームに適用し、次に（２）その特定のフレームに対して最も最適の符号を選ぶことによって得ることができることを実験が示した。最適の可変長さ符号のこのような選択は、フレーム（フレームの一部又は領域）より小さいユニット又はいくつものフレームのグループに適用できる。動きベクトル、ＤＣＴ係数、マクロブロックのタイプなどに使われる可変長さ符号は、各々、与えられたユニット（すなわち、フレーム、サブフレーム又はフレームのグループ）に対し、そのユニットの現行の解像度とビット速度にて、独立して最適化することができる。また、この方法は、本願の別の部分で述べられている空間解像度強化層にも適用できる。
【０２９６】
可変長さ符号のどのグループを使うべきかという選択は、少数のビットを使って、各フレーム（又はサブパート又はグループ）で運ぶことができる。さらに、カスタム符号化表は、信頼性が高いデータの伝送とプレイバックを利用できるところへ（例えばデータ光ディスク又は光ファイバーネットワークで）ダウンロードすることができる。
【０２９７】
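It should be noted that the existing coding tables used by compression systems such as MPEG-1, MPEG-2, MPEG-4, H.263, and DVC-Pro/DV are pre-defined and static. Applying this aspect of the invention would therefore not be backward compatible with those systems, but could be forward compatible with future coding systems.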
【０２９８】
Augmentation System for MPEG-2 and MPEG-4
現在、ＭＰＥＧ−２を実現できる復号器（MPEG-2 capable decoder）の大きな設置ベースがある。例えば、ＤＶＤプレーヤー及びＤｉｒｅｃＴＶ衛星受信機はともに、現在、数百万の家庭にある。ＭＰＥＧ−４はＭＰＥＧ−２と互換性がないので、ＭＰＥＧ−４ビデオ圧縮復号化がＭＰＥＧ−２を超えて提供できる改良点はまだ利用できない。しかし、ＭＰＥＧ−４とＭＰＥＧ−２はともに、動きを補償されたＤＣＴ圧縮システムであり、共通の基本構造を共用している。ＭＰＥＧ−４のビデオ符号化システムの合成システム（composition system）はＭＰＥＧ−２とは基本的に異なり、いくつかの他の拡張された特徴がある。この考察では、ＭＰＥＧ−４のフルフレームビデオ符号化の側面だけを考察している。
【０２９９】
ＭＰＥＧ−４とＭＰＥＧ−２の間には多数の差があるが、主な差は次のとおりである。
（１）ＭＰＥＧ−４は、１６×１６マクロブロックを四つの８×８ブロックに、各ＤＣＴに対して一つずつ任意に分割することができ、その８×８ブロックは各々独立の動きベクトルを有している。
（２）ＭＰＥＧ−４−Ｂフレームは、予測の一タイプである「直接」モードをもっている。
（３）ＭＰＥＧ−４−Ｂフレームは、Ｂフレームの「Ｉ」マクロブロックを支持するＭＰＥＧ−２と異なり「Ｉ」マクロブロックを支持しない。
（４）ＭＰＥＧ−４のＤＣＴ係数は、ＭＰＥＧ−２の場合より一層精緻なパターンで符号化することができるが、周知のジグザグパターンがＭＰＥＧ−２とＭＰＥＧ−４の両者に共通している。
（５）ＭＰＥＧ−４は１０ビットと１２ビットの画素深度（pixel depth）を支持するが、ＭＰＥＧ−２は８ビットに限定されている。
（６）ＭＰＥＧ−４は１／４画素の動きベクトルの精度を保持しているが、ＭＰＥＧ−２は１／２画素の精度に限定されている。
【０３００】
いくつかのこれらの差、例えばＢ−フレーム「直接」モードと「Ｉ」マクロブロックの差は基本的に互換性がないことを意味している。しかし、これら符号化モード両者は自由に選択され、そして符号器はこれらをどちらも使用しないことを（小さな効率損失で）選択しその結果、この非互換性を除くことができる。同様に、符号器は、ＤＣＴ係数のためのＭＰＥＧ−４の符号化パターンを限定して、より優れたＭＰＥＧ−２の共通の性質を提供できる（やはり小さい効率損失で）。
【０３０１】
残りの三つの主要項、すなわち８×８四方向（four-way）ブロックスプリット、１／４画素動きベクトル精度及び１０ビットと１２ビットの画素深度は、ＭＰＥＧ−２がすでに提供している基本構造に対する「増加物（augmentation）」とみなすことができる。
【０３０２】
本発明のこの側面は、これらの「増加物」を別の構造物として提供できることを利用する。したがって、この増加物は、別々に符号化され、別の増加物ストリームとして、標準のＭＰＥＧ−２又はＭＰＥＧ−４のストリームとともに運ぶことができる。また、この技法は、ＭＰＥＧ−１、Ｈ．２６３などの、共通の動き補償ＤＣＴ構造体を共用するビデオ符号化システムでも使用することができる。図２２は、ＭＰＥＧ−２タイプシステムに対する増加システムを示すブロック図である。主圧縮データストリーム２２００（図２２には動きベクトル、ＤＣＴ係数、マクロブロックモードビット並びにＩ、Ｂ及びＰのフレームを含めて示してある）が、従来のＭＰＥＧ−２タイプ復号器２２０２及び並列の強化復号器２２０４に運ばれる。強化データストリーム２２０６（１／４画素動きベクトル精度、８×８四方向ブロックスプリット動きベクトル並びに１０ビット及び１２ビットの画素深度を含めて図示してある）が、同時に、強化復号器２２０４に運ばれる。強化復号器２２０４は、二つのデータストリーム２２００と２２０６を組み合わせて、それらを、復号して、強化ビデオ出力を提供する。この構造を使用して、符号化の強化を、どの動き補償ＤＣＴ圧縮システムにも加えることができる。
【０３０３】
この構造の使用は、より最適のＭＰＥＧ−２復号又はより最適の強化信号を行うための復号器によってバイアスすることができる。ＭＰＥＧ−４ビデオ符号化の改良点を加えることによって強化されたこのような復号が、ＭＰＥＧ−２が復号した写真の画質に少し妥協して、最適に強化された写真画質を達成するのに好都合であろうと期待される。
【０３０４】
例えばＭＰＥＧ−２ビデオ符号化をＭＰＥＧ−４で強化する場合、ＭＰＥＧ−２の動きベクトルは、前記四方向スプリット動きベクトルに対する「予測子（predictor）」として使用でき（ＭＰＥＧ−４が四方向スプリットを選択する場合に）、又は非スプリット１６×１６マクロブロックに対して直接使用できる。１／４画素動きベクトル解像度は、強化データストリーム２２０６内の精度の追加の１ビットとして符号化する（垂直方向と水平方向）ことができる。余剰画素深度（extra pixel depth）は、逆ＤＣＴ関数を適用する前に、余剰精度として、ＤＣＴ係数に符号化することができる。
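One way to read "one additional bit of precision" per vector component is sketched below. The representation is an assumption: the main stream is taken to carry each component in half-pixel integer units, and the augmentation stream is taken to carry one extra least-significant bit. The actual bitstream syntax is not given in the text, and the function names are hypothetical.

```python
def refine_to_quarter_pel(half_pel, extra_bit):
    """half_pel: vector component in half-pixel units (main stream).
    extra_bit: 0 or 1, the extra bit of precision from the augmentation stream.
    Returns the component in quarter-pixel units."""
    if extra_bit not in (0, 1):
        raise ValueError("extra_bit must be 0 or 1")
    return 2 * half_pel + extra_bit


def split_quarter_pel(quarter_pel):
    """Encoder-side inverse: floor to half-pixel units plus the leftover bit."""
    return quarter_pel >> 1, quarter_pel & 1
```

A standard decoder would use only the half-pixel part, while an enhanced decoder would combine both parts, once for the horizontal and once for the vertical component.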
【０３０５】
本発明の重要な課題である空間解像度の階層化は、ベース層ができるだけ完全に符号化されると、最も最適に機能する。ＭＰＥＧ−２は不完全な符号化を行い、解像度強化層に劣った性能を生じる。上記増加システムを使用することによって、ベース層は、例えば、上記のＭＰＥＧ−４の改良点（及び本願に記載の他の改良点）を用いて、ベースを符号化するＭＰＥＧ−２データストリームを増大することによって改良することができる。得られるベース層は、付随する強化データストリームとともに、より優れた符号化（例えばＭＰＥＧ−４及び本発明の他の改良方法による）からもたらされた改良ベース層を利用して得られる品質と効率の大部分を有している。得られた改良ベース層には、本発明の他の側面を使用して、又は２以上の解像度強化層を適用できる。
【０３０６】
本発明の他の改良品、例えば動きを補償するため負のローブを有するより優れたフィルタは、増大された強化復号器によって呼び出すこともでき、ＭＰＥＧ−４などの動き補償圧縮システムが提供する改良点を超える改良点がさらに生じる。
【０３０７】
Guide Vectors for Spatial Enhancement Layers
動きベクトルは、本発明によってつくられた各解像度強化層内に割り当てられたビットの大きな部分を含んでいる。ベース層の同じ位置に、対応する動きベクトルを、「案内ベクトル」として使用することによって、強化層の動きベクトルに必要なビットの数を実質的に減らすことが可能であることが確認された。したがってその強化層の動きベクトルは、ベース層からの対応する案内ベクトル中心について小さいサーチ範囲のサーチだけで符号化される。このことは、ＭＰＥＧ−４強化層にとっては特に重要である。なぜならば、各マクロブロックは任意に四つの動きベクトルをもつことができ、かつ動きベクトルの１／４画素解像度を利用できるからである。
【０３０８】
図２３は、ベース層２３００からの動きベクトルを案内ベクトルとして解像度強化層２３０２に使用することを示す線図である。ベース層２３００からの動きベクトル２３０４は、解像度強化層２３０２のスケールまで拡張した後、強化層２３０２の動きベクトルを改善するための案内ベクトル２３０４’として役立つ。したがって、対応する強化層２３０２の動きベクトル２３０６を見つけるのに、小さい範囲しかサーチする必要はない。そのプロセスは、ベース層由来のすべての動きベクトルに対して同じである。例えば、ＭＰＥＧ−４では、１６×１６画素ベース層マクロブロックは、四つの８×８画素動きベクトルブロックに任意に分割できる。次に、対応するファクター２（factor-of-two）の強化層が、案内ベクトルとして、ベース層からの同時に配置されている動きベクトルを利用する。この実施例では、ベース層中の８×８動きベクトルブロックのうちの一つからの動きベクトルが、強化層中の対応する１６×１６画素マクロブロック内の動きベクトルのサーチを案内する。この１６×１６ブロックは、すべて同じ対応するベース層動きベクトルを案内ベクトルとして利用して、任意に、四つの８×８動きベクトルブロックにさらに分割することができる。
【０３０９】
強化層中のこれら小さいサーチ範囲の動きベクトルは、次に、はるかに高い効率で符号化される（すなわち、より小さい強化層動きベクトル２３０６をコードするのに必要なビットは少ない）。この案内ベクトル法は、ＭＰＥＧ−２、ＭＰＥＧ−４又は他の適切な単一又は複数の動き補償空間解像度強化層に適用できる。
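The guide-vector search can be sketched as follows. The sketch works at integer-pixel precision only, uses the sum of absolute differences as the match measure, and searches ±1 pixel around the scaled base-layer vector; the search radius, the match measure, and the function names are assumptions, not values from the text.

```python
def sad(ref, cur, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def guided_search(ref, cur, bx, by, n, base_mv, scale=2, radius=1):
    """Search a small window centred on the base-layer vector scaled up to
    the enhancement layer. Returns (full_vector, residual_to_encode)."""
    gx, gy = base_mv[0] * scale, base_mv[1] * scale  # guide vector
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(gy - radius, gy + radius + 1):
        for dx in range(gx - radius, gx + radius + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w
                      and 0 <= by + dy and by + dy + n <= h)
            if not inside:
                continue  # candidate block would fall outside the reference
            cost = sad(ref, cur, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        raise ValueError("no candidate block lies inside the reference frame")
    _, dx, dy = best
    return (dx, dy), (dx - gx, dy - gy)
```

Only the small residual relative to the guide vector needs to be coded for the enhancement layer, which is where the bit saving described in the text comes from.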
【０３１０】
Enhancement Modes
図２４Ａ−２４Ｅは、代表的な専門レベルの強化モードに現れるデータ流の線図である。これらの図は、左欄に写真データ（中間段階を含む）を示し、中央欄にプロセッシングステップを示し、そして右欄に出力を示す。これはここで述べるいくつものプロセッシングステップを結合する方法のほんの一例であることに注目すべきである。より簡単な及びより複雑な異なる結合を配置構成して、異なるレベルの圧縮、アスペクト比及び画像の画質を達成することができる。
【０３１１】
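FIG. 24A shows an initial picture 2400 of 2k×1k pixels. This image is down-filtered (2402) to 1k×512 pixels 2404. Motion vectors 2406 are created from the initial picture and output as a file 2407. The 1k×512-pixel image 2404 is compressed/decompressed (2408) into a 1k×512 decompressed image 2410, and the compressed version is output as the base layer 2412 together with the associated motion vector file 2416. The 1k×512 decompressed image 2410 is expanded (2418) into a 2k×1k image 2420. The 1k×512 image 2404 is expanded (2422) into a 2k×1k image 2424. The 2k×1k image 2420 is subtracted (2426) from the original image 2400 to create a 2k×1k difference picture 2428.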
【０３１２】
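The 2k×1k image 2424 is subtracted (2430) from the original image 2400 to create a 2k×1k difference picture 2432. The amplitude of the difference picture 2432 is reduced (2434) by a selected amount (for example, to 0.25 times) to create a 2k×1k scaled difference picture 2436. The scaled difference picture 2436 is added (2438) to the difference picture 2428 to create a 2k×1k combined difference picture 2440. The combined difference picture 2440 is encoded/decoded (2442) using the original motion vectors, outputting an encoded enhancement layer 2444 (MPEG-2 in this example) and a 2k×1k decoded enhancement layer 2446. The decoded enhancement layer 2446 is added (2448) to the 2k×1k image 2420 to create a 2k×1k reconstructed full base-plus-enhancement image 2450. The original image 2400 is subtracted (2452) from the reconstructed image 2450 to create a 2k×1k second-layer difference picture 2454. The amplitude of the second-layer difference picture 2454 is increased (2456) to create a 2k×1k difference picture 2458. Red channel information 2458, green channel information 2460, and blue channel information 2462 are then extracted to create a red difference image 2464, a green difference image 2466, and a blue difference image 2468, respectively. Using the motion vector file 2407, the second red layer from the red difference picture 2464 is encoded/decoded (2470) into a red second enhancement layer 2472 and a decoded red difference image 2474; the second green layer from the green difference picture 2466 is encoded/decoded (2476) into a green second enhancement layer 2478 and a decoded green difference image 2480; and the second blue layer from the blue difference picture 2468 is encoded/decoded (2482) into a blue second enhancement layer 2484 and a decoded blue difference image 2486. The decoded red, green, and blue difference images 2474, 2480, and 2486 are combined (2488) into a decoded RGB difference image 2490. The amplitude of the decoded RGB difference image 2490 is reduced (2492) to create a second decoded RGB difference image 2494. The second decoded RGB difference image 2494 is added (2496) to the reconstructed full base-plus-enhancement image 2450 to create a 2k×1k reconstructed second-enhancement-layer image 2498. The reconstructed second-enhancement-layer image 2498 is subtracted (2500) from the original image 2400 to create a 2k×1k final residual image 2502. This final residual image 2502 is then losslessly compressed (2504) to create separate red, green, and blue final residual difference images 2506.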
【０３１３】
Computer Implementation
本発明はハードウェア又はソフトウェア又は両者の組み合わせで実施することができる。しかし、好ましくは、本発明は、１又は２以上のプログラマブルコンピュータで実行するコンピュータプログラムで実施され、そのプログラマブルコンピュータは各々、少なくとも一つのプロセッサ、データ記憶システム（揮発性及び不揮発性のメモリ及び／又は記憶素子を含む）、入力装置及び出力装置を含んでいる。プログラムコードが入力データに適用されて、ここに記載されている機能を実行して出力情報を生成する。その出力情報は、既知の方式で、１又は２以上の出力装置に加えられる。
【０３１４】
このようなプログラムは各々、所望のコンピュータ言語（機械言語、アセンブリ言語又は高レベルの手続き型言語、論理言語又はオブジェクト指向プログラミング言語がある）で実行して、コンピュータシステムと通信することができる。いずれにしろ、その言語は翻訳された言語又は解釈された言語でもよい。
【０３１５】
このようなコンピュータプログラムは、好ましくは、汎用又は専用のプログラマブルコンピュータシステムが読出し可能な記憶媒体又は記憶装置（例えばＲＯＭ、ＣＤＲＯＭ又は磁気もしくは光の媒体）に記憶され、その記憶媒体又は記憶装置が該コンピュータによって読み取られると、該コンピュータを設定し（configure）作動させて、ここに記載の手続を実行する。また本発明のシステムは、コンピュータプログラムで構成された、コンピュータが読み取り可能な記憶媒体として提供されると考えることもでき、このように配置構成された記憶媒体は、コンピュータシステムを、特定の予め定義された方式で作動させて、ここに記載の機能を実行する。
【０３１６】
Conclusion
新規であるとみなされる本発明の異なる側面としては、限定されないが下記の思想を含んでいる。
・世界中で広く使われている既存の２４ｆｐｓのフィルムやビデオのインフラストラクチャとの互換性を提供するため、高フレーム速度の利益を新しい電子ビデオシステムに与えながら、７２ｆｐｓをソースフレーム速度として電子カメラに使用すること。
・米国特許願第０９／４３５，２７７号（発明の名称「System And Method For Motion Compensation and Frame Rate Conversion」、１９９９年１１月５日付け出願）由来の動き補償とフレーム速度変換を行う方法を利用して７２ｆｐｓ及び／又は１２０ｆｐｓから６０ｆｐｓに変換すること。
・［０．１、０．８、０．１］〜［０．２５、０．５、０．２５］の範囲の重み付けをしたフィルタを使用して行う７２ｆｐｓから２４ｆｐｓへの変換及びほぼ［０．１、０．２、０．４、０．２、０．１］の重み付けを利用して行う１２０ｆｐｓから２４ｆｐｓへの変換。
・Conversion from 120 fps to 60 fps using overlapping sets of three frames (advancing by 2/120 second for each 1/60-second frame) with weightings in the range [0.1, 0.8, 0.1] to [0.25, 0.5, 0.25].
・米国特許願第０９／４３５，２７７号（発明の名称「System And Method For Motion Compensation and Frame Rate Conversion」、１９９９年１１月５日付け出願）由来の動き補償とフレーム速度変換を行う方法を利用して、一般に好ましい単純な重み付けが所望の品質より少ない小比率のシーンについて、動きブラーを増大しフレーム速度を７２ｆｐｓ（又は他のより高い速度）ソースから２４ｆｐｓに変換すること。
・より高いフレーム速度（７２ｆｐｓ、１２０ｆｐｓなど）を利用してシューティング（shooting）を行いながら、上記重み付け関数によって２４ｆｐｓの監視を利用すること。
・誘導された２４ｆｐｓの結果をオリジナルの高フレーム速度とともに同時にリリースすること
・階層化符号化を行う前にデ−グレイニング（de-graining）及び／又はノイズ減少のフィルタリングを行うこと。
・復号を行った後、創造効果としてリ−グレイニング（re-graining）又はリ−ノイジング（re-noising）を行うこと。
・階層化圧縮を行う前にデ−インタレーシングを行うこと。
・単一層及び多重層の圧縮を行う前に３フィールドフレームデ−インタレーサを適用すること
・単一層及び多重層の圧縮を行う前に写真をアップフィルターして写真の解像度を改善すること。
・強化層内のサブ領域の大きさ及びベース層と強化層に割り当てられたビットの相対的比率を調節すること。
・フラクショナル・リレーションシップ（fractional・relationship）が独立して異なるように、垂直と水平の関係を独立して処理すること。
・高圧縮ストレスの期間中、圧縮ユニットに（例えばＧＯＰ）に対し高ビット速度を（自動的に、速度制御量子化パラメータの高い値を検出することによって又は手動で制御することによって）与えること。
・圧縮システム及び階層化圧縮システムの自然ユニット（natural unit）がモジュラユニットの増大されたビット速度を利用できる「モジュラ化」ビット速度を使用すること。
・単一又は複数の復元バッファに、増大されたビット速度のモジュラユニットをプレロードして、圧縮システム又は階層化圧縮システムで使用すること。
・一定のビット速度のシステムを、本発明の階層化圧縮システムの１又は２以上の層で使用すること。
・可変ビット速度のシステムを、本発明の階層化圧縮システムの１又は２以上の層で使用すること。
・使用される固定ビット速度のシステムと可変ビットのシステムを組み合わせて、本発明の階層化圧縮システムの各種の層で使用すること。
・解像度を階層化（「空間スケーラビリティ」とも呼称される）の際に使用するため、対応してより大きいＤＣＴブロックサイズと追加のＤＣＴ係数を使用すること。例えば与えられた層の解像度が２倍になると、ＤＣＴブロックサイズは２倍の大きさになる。これによって、解像度階層化構造が高調波的にアラインされ、層間係数の直交性が最適であるため最適の符号化効率が提供される。
・単位ＤＣＴブロック当り多数の動きベクトルを使用して、大きいＤＣＴブロックと小さいＤＣＴブロックが動きベクトルビットと改善された動き補償予測との間のトレードオフを最適化できるようにすること。
・負のローブを有するアップサイジングフィルタとダウンサイジングフィルタ特に接頭sincフィルタを使用すること。
・負のローブを有する動き補償変位フィルタを使用すること。
・比較的に瞬間的なペイシス、例えば各フレーム、フレームの各領域（例えばいくつもの走査ライン又はマクロブロックライン又は各象限）又はあらゆるいくつものフレームで、最適の可変長さコードを選択すること。
・増大ストリームを利用して改良された符号化機能を既存の圧縮システムに加え、新しい強化復号器を使用して画質を改善するのみならず上位互換性を提供すること。
・強化された復号写真を利用して、より高い品質のベース層を提供し解像度階層化を行うこと。
・類似の移動画像符号化システム間で符号化エレメントを共用して改良への道筋のみならず上位互換性を提供すること。
・２タイプの復号器に部分的に共通で該復号器の一方又は他方を選ぶ規定を含んでいる圧縮ビットストリームの生成を、符号化プロセスに考慮すること。
・ベース層動きベクトルを案内ベクトルとして使用して、使用される動きベクトルの範囲を強化層の中心に置くこと。
・上記方法の組み合わせを、強化層に適用すること、又はＭＰＥＧ−１、ＭＰＥＧ−２、ＭＰＥＧ−４、Ｈ．２６３、ＤＶＣ−ｐｒｏ／ＤＶ、及びウェーブレットベースのシステムを含む他の圧縮システムを改善するために適用すること。
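[0317]
  A number of embodiments of the present invention have been described. Nevertheless, various modifications may be made without departing from the spirit and scope of the invention. For example, while the preferred embodiment uses MPEG-2 or MPEG-4 coding and decoding, the invention will work with any comparable standard that provides equivalents of I, P, and/or B frames and layers. Accordingly, the invention is not limited by the specifically illustrated embodiments, but only by the scope of the appended claims.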
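[Brief description of the drawings]
[FIG. 1] A timing diagram showing the pulldown rates for 24 fps and 36 fps material displayed at 60 Hz.
[FIG. 2] A first preferred MPEG-2 coding pattern.
[FIG. 3] A second preferred MPEG-2 coding pattern.
[FIG. 4] A block diagram showing temporal layer decoding in accordance with the preferred embodiment of the present invention.
[FIG. 5] A block diagram showing 60 Hz interlaced input to a converter that can output both 36 Hz and 72 Hz frames.
[FIG. 6] A diagram showing a "master template" for a base MPEG-2 layer at 24 Hz or 36 Hz.
[FIG. 7] A diagram showing enhancement of a base resolution template using hierarchical resolution scalability with MPEG-2.
[FIG. 8] A diagram showing the preferred layered resolution encoding process.
[FIG. 9] A diagram showing the preferred layered resolution decoding process.
[FIG. 10] A block diagram showing a combination of resolution and temporal scalable options for a decoder in accordance with the present invention.
[FIG. 11] A diagram of a base layer expanded by using a gray area and enhancement to provide picture detail.
[FIG. 12] A diagram of the relative shape, amplitudes, and lobe polarity of a preferred downsizing filter.
[FIG. 13A] A diagram of the relative shape, amplitudes, and lobe polarity of a pair of preferred upsizing filters for upsizing by a factor of 2.
[FIG. 13B] A diagram of the relative shape, amplitudes, and lobe polarity of a pair of preferred upsizing filters for upsizing by a factor of 2.
[FIG. 14A] A block diagram of an odd-field de-interlacer.
[FIG. 14B] A block diagram of an even-field de-interlacer.
[FIG. 15] A block diagram of a frame de-interlacer using three de-interlaced fields.
[FIG. 16] A block diagram of an additional layering mode based on a 2/3 base layer.
[FIG. 17] A diagram of one example of applying higher bit rates to modular portions of a compressed data stream.
[FIG. 18] A graph showing the relationship of DCT harmonics between two resolution layers.
[FIG. 19] A graph showing the similar relationship of DCT harmonics among three resolution layers.
[FIG. 20] A diagram showing a set of matched DCT block sizes for multiple resolution layers.
[FIG. 21] A diagram showing one example of splitting motion-compensation macroblocks to determine independent motion vectors.
[FIG. 22] A block diagram showing an augmentation system for an MPEG-2 type system.
[FIG. 23] A diagram showing the use of motion vectors from a base layer as guide vectors for a resolution enhancement layer.
[FIG. 24A] A data flow diagram showing one example of a professional-level enhancement mode.
[FIG. 24B] A data flow diagram showing one example of a professional-level enhancement mode.
[FIG. 24C] A data flow diagram showing one example of a professional-level enhancement mode.
[FIG. 24D] A data flow diagram showing one example of a professional-level enhancement mode.
[FIG. 24E] A data flow diagram showing one example of a professional-level enhancement mode.
[0001]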
Cross-reference to related applications
  This application is a continuation-in-part of, and claims priority to, U.S. Patent Application No. 09/442,595, filed November 17, 1999, which was a continuation of U.S. Patent Application No. 09/217,151, filed December 21, 1998, which was a continuation of U.S. Patent Application No. 08/594,815, filed January 30, 1996 (now U.S. Patent No. 5,852,565, issued December 22, 1998).
[0002]
Technical field
  The present invention relates to electronic communication systems, and more particularly to an advanced electronic television system with temporal and resolution layering of compressed image frames having enhanced compression, filtering, and display characteristics.
[0003]
Background
  The United States presently uses the NTSC standard for television transmission. However, proposals have been made to replace the NTSC standard with an advanced television standard. For example, it has been proposed that the U.S. adopt digital standard-definition and advanced television formats at rates of 24 Hz, 30 Hz, 60 Hz, and 60 Hz interlaced. These rates are clearly intended to continue (and thus be compatible with) the existing NTSC television display rate of 60 Hz (or 59.94 Hz). It is also clear that “3-2 pulldown” is intended for display at a 60 Hz display rate when presenting movies, which have a temporal rate of 24 frames per second (fps). However, while the above proposal provides a menu of possible formats to choose from, each format encodes and decodes only a single resolution and frame rate. Because the display or motion rates of these formats are not integrally related to one another, conversion from one to another is difficult.
[0004]
  Furthermore, this proposal does not provide the crucial capability of compatibility with computer displays. The proposed image motion rates are based on historical rates dating back to the early part of this century. Starting from a clean slate, these rates would be unlikely to be chosen. In the computer industry, where displays have been free to use any rate over the past decade, rates in the range of 70 to 80 Hz have proven optimal, with 72 Hz and 75 Hz being the most common. Unfortunately, the proposed rates of 30 Hz and 60 Hz lack useful interoperability with 72 Hz or 75 Hz, resulting in degraded temporal performance.
[0005]
  Moreover, some have suggested that interlace is required because of a claimed need for about 1000 lines of resolution at high frame rates, based on the perception that such images cannot be compressed within the 18-19 megabits per second available in a conventional 6 MHz broadcast television channel.
[0006]
  It would be much more desirable if a single signal format were adopted that contained within it all of the desired standard-definition and high-definition resolutions. However, doing so within the bandwidth constraints of a conventional 6 MHz broadcast television channel requires compression and “scalability” of both frame rate (temporal) and resolution (spatial). One method specifically intended to provide such scalability is the MPEG-2 standard. However, the temporal and spatial scalability features specified in the MPEG-2 standard (and in newer standards such as MPEG-4) are not sufficiently efficient to meet the needs of advanced television for the U.S. Thus, the above proposal for U.S. advanced television is based on the premise that temporal (frame rate) and spatial (resolution) layering are inefficient, and therefore that discrete formats are necessary.
[0007]
  Furthermore, it is desirable to increase resolution, image clarity, encoding efficiency, and image generation efficiency. The present invention provides such performance enhancement.
[0008]
Summary
  The present invention provides a method and apparatus for image compression that demonstrably achieves better than 1000-line resolution at high frame rates with high image quality. The invention also achieves both temporal and resolution scalability at this resolution and high frame rate within the available bandwidth of a conventional television broadcast channel. The inventive technique efficiently achieves more than twice the compression ratio proposed for advanced television. Further, layered compression allows a form of modular decomposition of an image that supports flexible use of a variety of image enhancement techniques.
[0009]
  Image material is preferably captured at an initial or primary framing rate of 72 fps. An MPEG data stream (e.g., MPEG-2, MPEG-4, etc.) is then generated, comprising the following layers:
(1) a base layer, preferably encoded using only MPEG-type P frames, comprising a low-resolution (e.g., 1024 × 512 pixels), low-frame-rate (24 or 36 Hz) bitstream;
(2) an optional base-resolution temporal enhancement layer, encoded using only MPEG-type B frames, comprising a low-resolution (e.g., 1024 × 512 pixels), high-frame-rate (72 Hz) bitstream;
(3) an optional base-temporal high-resolution enhancement layer, preferably encoded using only MPEG-type P frames, comprising a high-resolution (e.g., 2k × 1k pixels), low-frame-rate (24 or 36 Hz) bitstream;
(4) an optional high-resolution temporal enhancement layer, encoded using only MPEG-type B frames, comprising a high-resolution (e.g., 2k × 1k pixels), high-frame-rate (72 Hz) bitstream.
[0010]
  The present invention provides a number of key technical attributes that allow substantial improvement over current proposals, including: replacement of numerous resolutions and frame rates with a single layered resolution and frame rate; no need for interlace to achieve better than 1000-line resolution for 2-megapixel images at a high frame rate (72 Hz) within a 6 MHz television channel; compatibility with computer displays through use of a primary framing rate of 72 fps; and much greater robustness than the current unlayered format proposals for advanced television, because all available bits can be allocated to the lower-resolution base layer when “stressful” image material is encountered.
[0011]
  In addition, the present invention provides a number of enhancements that address various video quality and compression problems. A number of such enhancements are described below, most of which are preferably embodied as a set of tools that can be applied to the tasks of enhancing and compressing images. Content developers can combine these tools in various ways, as desired, to optimize the visual quality and compression efficiency of a compressed data stream, particularly a layered compressed data stream.
[0012]
  Such tools include improved image filtering techniques, motion vector representation and determination, de-interlacing and noise reduction enhancements, motion analysis, imaging device characterization and correction, an enhanced 3-2 pulldown system, frame rate methods for production, a modular bit rate technique, a multi-layer DCT structure, variable-length coding optimization, an augmentation system for MPEG-2 and MPEG-4, and guide vectors for the spatial enhancement layer.
[0013]
In general, this technique has the following characteristics.
(Feature 1) A method for generating an enhancement layer for a base layer in an image coding system, comprising: up-filtering and expanding the base layer to an expanded base area; padding an additional area surrounding the expanded base area with a uniform mid-gray pixel value; and creating an enhancement layer that provides additional picture information, the enhancement layer comprising a difference picture having a small range of possible pixel values for the area coinciding with the expanded base area and a large range of pixel values for the area coinciding with the additional area.
[0014]
Feature 1 may include one or more of the following features.
(Feature 2) The method of Feature 1, further comprising encoding the enhancement layer as part of a picture stream that includes the base layer.
(Feature 3) The method of Feature 2, further comprising decoding the enhancement layer.
(Feature 4) The method of Feature 1, wherein the difference picture includes motion vectors, further comprising constraining the motion vectors so that they do not point into the additional area.
(Feature 5) The method of Feature 4, further comprising determining the motion vectors on a macroblock basis, wherein the macroblocks are aligned so that they do not span the boundary between the expanded base area and the additional area surrounding it.
(Feature 6) The method of Feature 1, wherein the base layer and the enhancement layer have a resolution ratio selected from one of 3/2, 4/3, and a full factor of 2.
(Feature 7) The method of Feature 1, wherein the difference picture is positioned at the center of the enhancement layer.
(Feature 8) The method of Feature 1, further comprising repositioning the difference picture with respect to the enhancement layer from image to image.
[0015]
In general, this technique has the following characteristics.
(Feature 9) A method for creating a lower-resolution image from a higher-resolution image in an image coding system, comprising applying a downsizing filter to a higher-resolution original image, wherein the downsizing filter comprises a positive central lobe, a negative lobe adjacent to each side of the positive central lobe, and a small positive lobe adjacent to each negative lobe, each small positive lobe being separated from the positive central lobe by the corresponding negative lobe.
[0016]
Feature 9 may include one or more of the following features.
(Feature 10) The method of Feature 9, wherein the extent of the downsizing filter is limited to the small positive lobes.
(Feature 11) The method of Feature 9, wherein the relative amplitudes of the positive central lobe, the negative lobes, and the small positive lobes are approximated by a truncated sinc function.
(Feature 12) The method of Feature 9, wherein the relative amplitude of the positive central lobe is approximated by a truncated sinc function, and the relative amplitudes of the small positive lobes and the negative lobes are approximated by 1/2 to 2/3 of the truncated sinc function.
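The lobe structure in Features 9 to 12 can be sketched in one dimension. The following is a minimal illustration, not the patent's filter: it assumes integer downsizing by `factor`, a sinc-shaped kernel cut off after the small positive lobes, and a hypothetical `lobe_scale` parameter that scales every tap outside the central lobe (values of 1/2 to 2/3 correspond to Feature 12). The function names and the edge clamping are assumptions made for the example.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def downsizing_taps(factor=2, lobe_scale=1.0):
    """Taps covering the central lobe, one negative lobe and one small
    positive lobe on each side; everything beyond that is truncated."""
    half_width = 3 * factor  # three lobes per side, each `factor` samples wide
    taps = []
    for k in range(-half_width + 1, half_width):
        value = sinc(k / factor)
        if abs(k) >= factor:      # outside the positive central lobe
            value *= lobe_scale   # assumption: one scale for all side lobes
        taps.append(value)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain

def downsize_row(row, factor=2, lobe_scale=1.0):
    """Filter and decimate one row of pixel values (edges are clamped)."""
    taps = downsizing_taps(factor, lobe_scale)
    half = len(taps) // 2
    out = []
    for center in range(0, len(row), factor):
        acc = 0.0
        for i, t in enumerate(taps):
            idx = min(max(center + i - half, 0), len(row) - 1)
            acc += t * row[idx]
        out.append(acc)
    return out
```

For `factor=2` this yields 11 taps; a flat row stays flat after downsizing because the taps are normalized to sum to one.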
[0017]
In general, this technique has the following characteristics.
(Feature 13) A method for creating an enlarged image from a decompressed base image layer or enhancement image layer in an image coding system, comprising applying a pair of upsizing filters to the decompressed base image layer or enhancement image layer, wherein each upsizing filter comprises a positive central lobe and a negative lobe adjacent to each side of the central lobe, and the peaks of the positive central lobes of the two upsizing filters are spaced asymmetrically apart.
[0018]
Feature 13 may include one or more of the following features.
(Feature 14) The method of Feature 13, wherein the extent of each upsizing filter is limited to the negative lobes.
(Feature 15) The method of Feature 13, wherein the relative amplitude of the positive central lobe is approximated by a truncated sinc function, and the relative amplitude of the negative lobes is less than the value approximated by the truncated sinc function.
(Feature 16) The method of Feature 13, wherein the relative amplitude of the positive central lobe is approximated by a truncated sinc function, and the relative amplitude of the negative lobes is approximated by 1/2 to 2/3 of the truncated sinc function.
[0019]
In general, this technique has the following characteristics.
(Feature 17) A method for creating a detail-enhancement image from an original uncompressed base-layer input image derived from an original high-resolution image in an image coding system, comprising: applying a Gaussian upsizing filter to the original uncompressed base-layer image to create an enlarged image; creating a difference image by subtracting the enlarged image from the original high-resolution image; and then multiplying the difference image by a weight factor.
[0020]
Feature 17 may include one or more of the following features.
(Feature 18) The method of Feature 17, wherein the weight factor is in the range of about 4% to about 35%.
(Feature 19) The method of Feature 17, wherein the coding system conforms to the MPEG-4 standard and the weight factor is in the range of about 4% to about 8%.
(Feature 20) The method of Feature 17, wherein the coding system conforms to the MPEG-2 standard and the weight factor is in the range of about 10% to about 35%.
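The three steps of Feature 17 (Gaussian upsize, subtract, weight) can be sketched on a single row of pixels. This is only an illustration under stated assumptions: the text does not give the Gaussian kernel, so `gaussian_upsize`, its `sigma` value, and the 1-D layout are hypothetical choices. The default weight of 0.25 lies inside the 10% to 35% range that Feature 20 gives for MPEG-2.

```python
import math

def gaussian_upsize(base, factor=2, sigma=0.6):
    """Enlarge a row by `factor`, computing each output sample as a
    Gaussian-weighted average of the base samples (assumed kernel)."""
    out = []
    for j in range(len(base) * factor):
        pos = j / factor  # output position in base-sample coordinates
        weights = [math.exp(-((i - pos) ** 2) / (2 * sigma ** 2))
                   for i in range(len(base))]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, base)) / total)
    return out

def detail_enhancement(original, base, weight=0.25, factor=2):
    """Weighted difference between the original high-resolution row and
    the Gaussian-enlarged, uncompressed base-layer row."""
    enlarged = gaussian_upsize(base, factor)
    return [weight * (o - e) for o, e in zip(original, enlarged)]
```

A flat original over a flat base of the same level produces an all-zero detail row; only content that the soft upsized base cannot represent survives into the weighted difference.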
[0021]
In general, this technique has the following characteristics.
(Feature 21) A method for enhancing image quality in an image coding system, comprising applying at least one of a de-graining filter and a noise reduction filter to an original digital image to create a first processed image, and encoding the first processed image into a compressed image in the image coding system.
[0022]
Feature 21 may include one or more of the following features.
(Feature 22) The method of Feature 21, wherein the original image comprises separate color channel images having uncorrelated noise characteristics, further comprising applying a separate noise reduction filter to at least one of such separate color channel images.
(Feature 23) The method of Feature 21, further comprising decoding the compressed image into a decompressed image, and then applying at least one of a re-graining filter or a re-noising filter to the decompressed image.
[0023]
In general, this technique has the following characteristics.
(Feature 24) A method for enhancing image quality in an image coding system, comprising: applying a field de-interlacer to each of a series of image fields to create a corresponding series of field frames; applying a field-frame de-interlacer to each set of at least three sequential field frames to create a corresponding series of de-interlaced image frames; and encoding the series of de-interlaced image frames in the image coding system into a series of compressed images.
[0024]
Feature 24 may include one or more of the following features.
(Feature 25) The method of Feature 24, wherein each image field comprises lines, and applying the field de-interlacer comprises replicating each line of the image field and, for each adjacent pair of lines of the image field, synthesizing a line between the pair by averaging the lines of the pair.
(Feature 26) The method of Feature 24, wherein applying the field-frame de-interlacer comprises synthesizing each de-interlaced image frame as a weighted average of a previous field frame, a current field frame, and a next field frame.
(Feature 27) The method of Feature 26, wherein the weights for the previous field frame, the current field frame, and the next field frame are about 25%, 50%, and 25%, respectively.
(Feature 28) The method of Feature 24, wherein each de-interlaced image frame and each field frame comprises pixel values, further comprising: comparing the difference between corresponding pixel values of each de-interlaced image frame and the corresponding current field frame to a threshold value to produce a difference value; and selecting, as each final pixel value for the de-interlaced image frame, the corresponding pixel value from the current field frame if the difference value is within a first threshold comparison range, and the corresponding pixel value from the de-interlaced image frame if the difference value is within a second threshold comparison range.
(Feature 29) The method of Feature 24, wherein the threshold value is selected from the range of about 0.1 to about 0.3.
(Feature 30) The method of Feature 28, further comprising smooth-filtering each de-interlaced image frame and each current field frame before the comparison.
(Feature 31) The method of Feature 30, wherein the smooth-filtering comprises down-filtering followed by up-filtering.
(Feature 32) The method of Feature 24, wherein each de-interlaced image frame and each field frame comprises pixel values, further comprising adding a weighted amount of each current field frame to a weighted amount of each de-interlaced image frame.
(Feature 33) The method of Feature 32, wherein the weight is 1/3 for each current field frame and 2/3 for each de-interlaced image frame.
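A minimal sketch of the two de-interlacing stages in Features 24 to 29 follows, with images held as lists of rows of values in the range 0 to 1. Several points are assumptions rather than statements of the text: the one-line vertical offset between odd and even fields is ignored, the last line of a field is simply repeated, and the threshold rule (fall back to the current field frame when the blended value differs from it by more than the threshold) is one plausible reading of the two "threshold comparison ranges". All function names are hypothetical.

```python
def field_to_frame(field):
    """Field de-interlacer: keep each field line and synthesize the line
    below it as the average of that line and the next field line."""
    frame = []
    for i, line in enumerate(field):
        nxt = field[min(i + 1, len(field) - 1)]  # repeat last line at the edge
        frame.append(list(line))
        frame.append([(a + b) / 2 for a, b in zip(line, nxt)])
    return frame

def three_field_deinterlace(prev, cur, nxt,
                            weights=(0.25, 0.5, 0.25), threshold=None):
    """Field-frame de-interlacer: weighted average of the previous,
    current and next field frames, with an optional threshold test."""
    wp, wc, wn = weights
    out = []
    for rp, rc, rn in zip(prev, cur, nxt):
        row = []
        for p, c, n in zip(rp, rc, rn):
            blended = wp * p + wc * c + wn * n
            # Assumed rule: a large difference suggests motion, so keep
            # the current field frame's pixel instead of the blend.
            if threshold is not None and abs(blended - c) > threshold:
                blended = c
            row.append(blended)
        out.append(row)
    return out
```

With a threshold in the 0.1 to 0.3 range of Feature 29, static areas receive the three-frame blend while pixels that change sharply between fields keep the single-field value.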
[0025]
In general, this technique has the following characteristics.
(Feature 34) A method for enhancing the image quality of video images comprising digital pixel values representing a non-linear signal in an image coding system, comprising: converting the digital pixel values of each video image representing the non-linear signal to a linear representation to create a linearized image; applying a transformation function to at least one linearized image to create a transformed image; and then converting each transformed image back into a video image comprising digital pixel values representing a non-linear signal.
[0026]
In general, this technique has the following characteristics.
(Feature 35) A method for encoding video images, comprising: downsizing the horizontal and vertical dimensions of an original image by first and second selected fractional factors, respectively, to create a first intermediate image; encoding the first intermediate image as a compressed base layer; decompressing the base layer and upsizing the result by the inverse of the selected fractional factors to create a second intermediate image; upsizing the first intermediate image by the inverse of the selected fractional factors, subtracting the result from the original image, and weighting the result to create a first intermediate result; subtracting the second intermediate image from the original image to create a second intermediate result; adding the first intermediate result and the second intermediate result to create a third intermediate image; and then encoding the third intermediate image to create an enhancement layer.
[0027]
Feature 35 may include one or more of the following features.
(Feature 36) The method of Feature 35, further comprising cropping and edge-feathering the third intermediate image before encoding.
(Feature 37) The method of Feature 35, wherein the first and second fractional factors are each selected from one of 1/3, 1/2, 2/3, and 3/4.
[0028]
In general, this technique has the following characteristics.
(Feature 38) A method for enhancing image quality in an image coding system, comprising applying a median filter to horizontal pixel values of a digital video image, applying a median filter to vertical pixel values of the digital video image, and then averaging the results of filtering the horizontal and vertical pixel values to create a noise-reduced digital video image.
[0029]
Feature 38 may include one or more of the following features.
(Feature 39) The method of Feature 38, further comprising applying a median filter to diagonal pixel values of the digital video image, and averaging the results of filtering the diagonal pixel values into the noise-reduced digital video image.
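The averaging of horizontal and vertical medians in Feature 38 can be sketched as follows. The text does not state the filter length, so the three-tap medians and the edge clamping here are assumptions, and the function name `hv_median` is hypothetical.

```python
from statistics import median

def hv_median(image):
    """Average of a 3-tap horizontal median and a 3-tap vertical median,
    computed per pixel; coordinates are clamped at the image edges."""
    h, w = len(image), len(image[0])

    def px(y, x):
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            horiz = median([px(y, x - 1), px(y, x), px(y, x + 1)])
            vert = median([px(y - 1, x), px(y, x), px(y + 1, x)])
            row.append((horiz + vert) / 2)
        out.append(row)
    return out
```

An isolated one-pixel spike is removed entirely, since both medians ignore it, while a flat region passes through unchanged.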
[0030]
In general, this technique has the following characteristics.
(Feature 40) A method for enhancing image quality in an image coding system, comprising applying a temporal median filter to corresponding pixel values of a previous digital video image, a current digital video image, and a next digital video image to create a noise-reduced digital video image.
[0031]
Feature 40 may include one or more of the following features.
(Feature 41) The method of Feature 40, further comprising: comparing the difference between corresponding pixel values of each noise-reduced digital video image and the corresponding current digital video image to a threshold value to produce a difference value; and selecting, as each final pixel value for the noise-reduced digital video image, the corresponding pixel value from the current digital video image if the difference value is within a first threshold comparison range, and the corresponding pixel value from the noise-reduced digital video image if the difference value is within a second threshold comparison range.
(Feature 42) The method of Feature 41, wherein the threshold value is selected from the range of about 0.1 to about 0.3.
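A per-pixel sketch of the temporal median of Feature 40 with the threshold test of Feature 41 is shown below. Pixel values are assumed to lie in the range 0 to 1, and the selection rule (keep the current pixel when the median differs from it by more than the threshold) is an assumed reading of the two comparison ranges, not something the text spells out. The function name is hypothetical.

```python
def temporal_median(prev, cur, nxt, threshold=None):
    """Median of each pixel across the previous, current and next images.
    If `threshold` is given, large departures from the current image are
    treated as motion and the current pixel is kept (assumed rule)."""
    out = []
    for rp, rc, rn in zip(prev, cur, nxt):
        row = []
        for p, c, n in zip(rp, rc, rn):
            m = sorted((p, c, n))[1]  # middle of three values
            if threshold is not None and abs(m - c) > threshold:
                m = c
            row.append(m)
        out.append(row)
    return out
```

In the usage below, the first pixel changes by 0.4 and is preserved when a threshold of 0.2 is set, while the second changes by only 0.1 and is replaced by the median.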
[0032]
In general, this technique has the following characteristics.
(Feature 43) A method for enhancing image quality in an image coding system, comprising: applying a horizontal median filter to horizontal pixel values of a current digital video image; applying a vertical median filter to vertical pixel values of the current digital video image; applying a temporal median filter to corresponding pixel values of a previous digital video image, the current digital video image, and a next digital video image; and then applying a median filter to the corresponding pixel values produced by each of the horizontal, vertical, and temporal median filters to create a noise-reduced digital video image.
[0033]
In general, this technique has the following characteristics.
(Feature 44) A method for enhancing image quality in an image coding system, comprising creating a noise-reduced digital video image that comprises a linear weighted sum of the following five terms: (1) a current digital video image; (2) an average of the horizontal and vertical medians of the current digital video image; (3) a thresholded temporal median; (4) an average of the horizontal and vertical medians of the thresholded temporal median; and (5) a median of the thresholded temporal median and the horizontal and vertical medians of the current digital video image.
[0034]
Feature 44 may include one or more of the following features.
(Feature 45) The method of Feature 44, wherein the weights of the five terms are about 50%, 15%, 10%, 10%, and 15%, respectively.
(Feature 46) The method of Feature 44, wherein the weights of the five terms are about 35%, 20%, 22.5%, 10%, and 12.5%, respectively.
(Feature 47) The method of Feature 44, further comprising: determining motion vectors for each n×n pixel region of the current digital video image with respect to at least one previous digital video image and at least one next digital video image; applying a center-weighted temporal filter to each n×n pixel region of the current digital video image and the corresponding motion-vector-offset n×n pixel regions of the at least one previous digital video image and the at least one next digital video image to create a motion-compensated image; and then adding the motion-compensated image to the noise-reduced digital video image.
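The linear weighted sum of Feature 44 reduces to a per-pixel dot product once the five terms have been computed. The sketch below assumes the five term values for one pixel are already available (how each is produced is outside this example) and uses the two weight sets given in Features 45 and 46. The names `WEIGHTS_A`, `WEIGHTS_B`, and `combine_terms` are hypothetical.

```python
# Weight sets from Features 45 and 46; each sums to 100%.
WEIGHTS_A = (0.50, 0.15, 0.10, 0.10, 0.15)
WEIGHTS_B = (0.35, 0.20, 0.225, 0.10, 0.125)

def combine_terms(terms, weights=WEIGHTS_A):
    """Weighted sum of the five noise-reduction terms for one pixel, in
    the order listed in Feature 44 (current pixel first)."""
    if len(terms) != 5 or len(weights) != 5:
        raise ValueError("expected exactly five terms and five weights")
    return sum(w * t for w, t in zip(weights, terms))
```

Because both weight sets sum to one, a pixel whose five terms agree is left unchanged, and the first set keeps half of the unfiltered current pixel.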
[0035]
In general, this technique has the following characteristics.
(Feature 48) A method for enhancing image quality in an image coding system, comprising: determining motion vectors for each n×n pixel region of a current digital video image with respect to at least one previous digital video image and at least one next digital video image; and then applying a center-weighted temporal filter to each n×n pixel region of the current digital video image and the corresponding motion-vector-offset n×n pixel regions of the at least one previous digital video image and the at least one next digital video image to create a motion-compensated image.
[0036]
Feature 48 may include one or more of the following features.
(Feature 49) The method of Feature 48, wherein each digital video image is a de-interlaced field frame.
(Feature 50) The method of Feature 48, wherein each digital video image is a three-field-frame de-interlaced image.
(Feature 51) The method of Feature 48, wherein each digital video image is a thresholded three-field-frame de-interlaced image.
(Feature 52) The method of Feature 48, wherein the center-weighted temporal filter is a three-image temporal filter having weights of about 25%, 50%, and 25%, respectively, for the images.
(Feature 53) The method of Feature 48, wherein the center-weighted temporal filter is a five-image temporal filter having weights of about 10%, 20%, 40%, 20%, and 10%, respectively, for the images.
[0037]
In general, this technique has the following characteristics.
(Feature 54) A method for enhancing image quality in an image coding system, comprising applying a normal down-filter to an image to create a first intermediate image, applying a Gaussian up-filter to the first intermediate image to create a second intermediate image, and then adding a weighted fraction of the second intermediate image to a selected image to create an image with reduced high-frequency noise.
[0038]
Feature 54 may include one or more of the following features.
(Feature 55) The method of Feature 54, wherein the weighted fraction is between about 5% and about 10% of the second intermediate image.
[0039]
In general, this technique has the following characteristics.
(Feature 56) A method for enhancing image quality in an image coding system, comprising: applying a down-filter to a filtered original-resolution image to create a first intermediate image at base-layer resolution; applying a normal down-filter to the first intermediate image to create a second intermediate image; applying a Gaussian up-filter to the second intermediate image to create a third intermediate image; and creating a noise-reduced digital video image that comprises a linear weighted sum of the following three terms: (1) the first intermediate image, (2) an average of the horizontal and vertical medians of the first intermediate image, and (3) the third intermediate image.
[0040]
Feature 56 may include one or more of the following features.
(Feature 57) The method of Feature 56, wherein the weights of the three terms are about 70%, 22.5%, and 7.5%, respectively.
[0041]
In general, this technique has the following characteristics.
(Feature 58) A method for enhancing image quality in an image coding system using quarter-pixel motion compensation, comprising: applying a filter having negative lobes to the sub-pixel point midway between adjacent first and second pixels to create a half-pixel filtered value; applying a filter having negative lobes to the sub-pixel point about one quarter of the way between the first and second pixels; and applying a filter having negative lobes to the sub-pixel point about three quarters of the way between the first and second pixels.
[0042]
In general, this technique has the following characteristics.
(Feature 59) A method for enhancing image quality in an image coding system using half-pixel motion compensation, comprising applying a filter having negative lobes to the sub-pixel point midway between adjacent first and second pixels to create a half-pixel filtered value.
[0043]
In general, this technique has the following characteristics.
(Feature 60) A method for enhancing image quality in an image coding system using half-pixel motion compensation for the luminance channel, comprising filtering each chrominance channel using quarter-pixel resolution.
[0044]
Feature 60 may include one or more of the following features.
(Feature 61) The method of Feature 60, further comprising applying a filter having negative lobes to each quarter sub-pixel point between adjacent first and second chrominance pixels.
[0045]
In general, this technique has the following characteristics.
(Feature 62) A method for enhancing image quality in an image coding system using quarter-pixel motion compensation for the luminance channel, comprising filtering each chrominance channel using 1/8-pixel resolution.
[0046]
Feature 62 may include one or more of the following features.
(Feature 63) The method of Feature 62, further comprising applying a filter having negative lobes to each 1/8 sub-pixel point between adjacent first and second chrominance pixels.
(Feature 64) The method of any one of Features 58, 59, 61, and 63, wherein the filter having negative lobes is a truncated sinc filter.
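Sub-pixel interpolation with a negative-lobe filter, as in Features 58 to 64, can be sketched with a sinc kernel truncated to four taps. The tap count, the normalization, and the function names are assumptions made for illustration; the text specifies only that the filter has negative lobes and may be a truncated sinc.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def subpixel_taps(frac):
    """Four taps for the point `frac` (e.g. 0.25, 0.5, 0.75) of the way
    from pixel 0 to pixel 1, using pixels at offsets -1, 0, 1 and 2.
    The outer taps fall in the negative lobes of the sinc."""
    raw = [sinc(offset - frac) for offset in (-1, 0, 1, 2)]
    total = sum(raw)
    return [r / total for r in raw]  # normalize to unity gain

def interpolate(p_m1, p0, p1, p2, frac):
    """Filtered value at the sub-pixel point between p0 and p1."""
    taps = subpixel_taps(frac)
    return sum(t * p for t, p in zip(taps, (p_m1, p0, p1, p2)))
```

At the half-pixel point the normalized taps work out to approximately [-0.25, 0.75, 0.75, -0.25], in contrast to the [0.5, 0.5] of plain bilinear averaging; the negative outer taps are what preserve sharpness.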
[0047]
In general, this technique has the following characteristics.
(Feature 65) A method for characterizing and correcting the output of an electronic imaging system that generates input images for a video compression system, comprising: measuring the horizontal and vertical color misalignment between pairs of color pixel sensor types of the imaging system; measuring the noise generated by the color pixel sensor types of the imaging system; and, before compressing images generated by the imaging system in the video compression system, correcting the images by translating color pixels by amounts determined from the measured horizontal and vertical color misalignment and by applying to the images a noise reduction filter with weights adjusted to compensate for the measured amount of noise.
[0048]
In general, this technique has the following characteristics.
66. A method for characterizing and correcting the output of a film-based imaging system that produces input images for a video compression system, comprising: determining the film type used to record a sequence of images; exposing test strips of that film type under various lighting conditions; scanning the exposed test strips with an electronic imaging system having known noise characteristics; measuring the noise generated by the electronic imaging system during such scanning; and, for images produced by the film-based imaging system on the same film type and scanned by the same electronic imaging system as the test strips, correcting the images before compression in the video compression system by applying a noise reduction filter whose weights are adjusted to compensate for the measured amount of noise.
[0049]
In general, this technique has the following characteristics.
(Feature 67) A method of optimizing the conversion of 24 fps film images to video using 3-2 pulldown, comprising: converting the 24 fps film images to digital images; storing, processing, and conveying such digital images using only devices that can handle them directly at 24 fps; retaining all such digital images in 24 fps format as a digital image source; performing the 3-2 pulldown video conversion on the fly, directly from the digital image source, using a deterministic frame cadence to create a 3-2 video image sequence; maintaining that deterministic frame cadence for all uses of the 3-2 video image sequence; and, after the 3-2 video image sequence has been used, undoing the deterministic frame cadence so that the 3-2 video image sequence is converted back to 24 fps digital images for storage.
[0050]
In general, this technique has the following characteristics.
(Feature 68) A method of synthesizing a 24 fps moving image from a 72 fps image source, comprising combining three consecutive frames from the 72 fps image source, as a weighted average of those frames, to form each image frame of the 24 fps moving image, wherein the weights for the three frames are in the range of [0.1, 0.8, 0.1] to [0.25, 0.50, 0.25], respectively.
[0051]
The feature 68 may include one or more of the following features.
Feature 69. The method of feature 68, wherein the weight is about [0.1667, 0.6666, 0.1667].
[0052]
In general, this technique has the following characteristics.
(Feature 70) A method of synthesizing a 24 fps moving image from a 120 fps image source, comprising combining five consecutive frames from the 120 fps image source, as a weighted average of those frames, to form each image frame of the 24 fps moving image, wherein the weights for the five frames are about [0.1, 0.2, 0.4, 0.2, 0.1].
[0053]
In general, this technique has the following characteristics.
(Feature 71) A method of synthesizing a 60 fps moving image from a 120 fps image source, comprising: combining three consecutive frames from the 120 fps image source, as a weighted average of those frames, to form each image frame of the 60 fps moving image, wherein the weights for the three frames are in the range of [0.1, 0.8, 0.1] to [0.25, 0.50, 0.25], respectively; and overlapping, by one frame, the three consecutive frames used to form each such image frame with the three consecutive frames used to form the next image frame.
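The weighted frame combinations of features 68 to 71 can be illustrated with a short sketch. This is not the patented implementation: each frame is modelled as a single number (one pixel value), and the function name `synthesize` and its `stride` parameter are assumptions introduced here. A stride equal to the window length gives non-overlapping groups (72 to 24 fps, 120 to 24 fps); a stride of 2 with a window of 3 gives the one-frame overlap of feature 71 (120 to 60 fps).

```python
def synthesize(frames, weights, stride):
    """Form output frames as weighted averages of consecutive source frames.

    frames  -- source frames, modelled here as plain numbers (assumption)
    weights -- one weight per source frame in each group
    stride  -- how many source frames to advance between output frames
    """
    window = len(weights)
    out = []
    for start in range(0, len(frames) - window + 1, stride):
        group = frames[start:start + window]
        out.append(sum(w * f for w, f in zip(weights, group)))
    return out


source = [float(i) for i in range(12)]  # 12 dummy source frames

# 72 fps -> 24 fps: groups of three, weights of feature 69
fps24_from_72 = synthesize(source, [0.1667, 0.6666, 0.1667], stride=3)

# 120 fps -> 24 fps: groups of five, weights of feature 70
fps24_from_120 = synthesize(source, [0.1, 0.2, 0.4, 0.2, 0.1], stride=5)

# 120 fps -> 60 fps: groups of three that overlap by one frame (feature 71)
fps60_from_120 = synthesize(source, [0.25, 0.50, 0.25], stride=2)
```

With real images the same arithmetic would be applied per pixel, for example on arrays instead of single numbers.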
[0054]
In general, this technique has the following characteristics.
72. A method of allocating coded bits in a digital video compression system, comprising: detecting high compression stress within a selected frame-based unit of a video image to which a first fixed number of coded bits has been allocated, the detected unit being a high stress unit; allocating a second fixed number of coded bits, greater than the first fixed number of coded bits, to improve compression of the high stress unit; and then compressing at least the remaining portion of the high stress unit using the second fixed number of coded bits.
[0055]
The feature 72 may include one or more of the following features.
73. The method of feature 72, wherein the frame-based unit of the video image includes one of a P frame or a frame in a group of pictures.
74. The method of feature 72, wherein the second fixed number of encoded bits is a simple multiple of the first fixed number of encoded bits.
75. The method of feature 72, wherein the detection of high compressive stress is based on a rate control quantization scale factor parameter for a selected frame-based unit of the video image.
76. The method of feature 72, comprising compressing all high stress units using a second fixed number of encoded bits.
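A minimal sketch of the bit allocation idea in features 72 to 76 follows. It assumes that stress is detected by comparing a rate-control quantization scale factor against a threshold (feature 75) and that the larger budget is a simple multiple of the normal one (feature 74). The function name, the threshold value and the multiple are hypothetical and are not taken from the source.

```python
def allocate_bits(quant_scales, base_bits, stress_threshold, multiple=2):
    """Return a bit budget for each frame-based unit.

    quant_scales     -- rate-control quantization scale factor per unit;
                        a high value is treated as high compression stress
    base_bits        -- the first (normal) fixed number of coded bits
    stress_threshold -- hypothetical cut-off above which a unit is "high stress"
    multiple         -- hypothetical simple multiple giving the second budget
    """
    budgets = []
    for scale in quant_scales:
        if scale > stress_threshold:
            budgets.append(base_bits * multiple)  # high stress unit
        else:
            budgets.append(base_bits)             # normal unit
    return budgets


budgets = allocate_bits([4, 30, 6], base_bits=1000, stress_threshold=20)
```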
[0056]
In general, this technique has the following characteristics.
77. A method for improving the decoding of compressed digital video information by a decoder having a decoding bit rate and a buffer system, wherein the compressed digital video information is provided from a source at a source bit rate higher than the decoding bit rate, comprising: preloading intervening compressed digital video information from the source into a first portion of the buffer system at the source bit rate; concurrently preloading compressed digital video information of program content from the source into a second portion of the buffer system at the source bit rate; and selectively switching from the compressed digital video information of the program content to the intervening compressed digital video information and decoding the intervening compressed digital video information, thereby supporting substantially instantaneous changes in program content.
[0057]
In general, this technique has the following characteristics.
78. A method for improving the decoding of compressed digital video information by a decoder having a buffer system, an average decoding bit rate and at least one decoding bit rate higher than the average decoding bit rate, wherein the compressed digital video information is provided from a source at a source bit rate higher than the average decoding bit rate, comprising: preloading compressed digital video information that includes increased bit rate modules into a first portion of the buffer system at the source bit rate; concurrently preloading compressed digital video information that includes non-increased bit rate modules into a second portion of the buffer system at the source bit rate; then decoding the content of the second portion of the buffer system into video images at the average decoding bit rate; and decoding the content of the first portion of the buffer system into video images at a decoding bit rate higher than the average decoding bit rate.
[0058]
In general, this technique has the following characteristics.
79. A method for improving the decoding of compressed digital video information by a decoder having a buffer system, an average decoding bit rate and at least one decoding bit rate higher than the average decoding bit rate, wherein the compressed digital video information is provided from a source at a source bit rate higher than the average decoding bit rate, comprising: preloading compressed digital video information that includes a compressed enhancement layer into a first portion of the buffer system at the source bit rate; concurrently preloading compressed digital video information that includes a base layer into a second portion of the buffer system at the source bit rate; then decoding the content of the second portion of the buffer system into video images at the average decoding bit rate; and then decoding the content of the first portion of the buffer system into video images at a decoding bit rate higher than the average decoding bit rate.
[0059]
In general, this technique has the following characteristics.
80. A method for improving the coding efficiency of a video coding system that uses a discrete cosine transform (DCT) to encode a base layer and at least one resolution enhancement layer of a video image, comprising: encoding the base layer using DCT blocks having a first block size; and then encoding each resolution enhancement layer using DCT blocks whose block size is proportional to the first block size in the same way that the resolution of that enhancement layer is proportional to the resolution of the base layer.
[0060]
The feature 80 may include one or more of the following features.
81. The method of feature 80, further comprising utilizing a subset of the DCT blocks for an enhancement layer, such a subset corresponding to the DCT blocks of a lower-level enhancement layer or the base layer, to increase the signal-to-noise accuracy of such DCT blocks relative to the lower-level enhancement layer or base layer.
[0061]
In general, this technique has the following characteristics.
82. A method for determining motion compensation vectors for a base layer and at least one resolution enhancement layer within a video image coding system, comprising: encoding the base layer and each resolution enhancement layer using macroblocks sized to cover corresponding pixel areas in each such layer; for each macroblock in the base layer and in each resolution enhancement layer, independently determining the number of motion vector sub-blocks that optimizes the balance between coding prediction performance and the number of bits needed to specify the associated set of motion vectors; and then determining an associated independent set of motion vectors, one for each of the determined number of motion vector sub-blocks.
[0062]
In general, this technique has the following characteristics.
(Feature 83) A method of compressing video image coding units, comprising: applying a plurality of variable length coding tables to each coding unit; selecting the variable length coding table that gives the best compression for that coding unit; compressing the coding unit with the selected variable length coding table; and then identifying the selected variable length coding table for each such coding unit to a decoder, so that the decoder can recover the coding unit.
[0063]
The feature 83 may include one or more of the following features.
Feature 84. The method of feature 83, wherein the coding unit is one of a subframe, a frame, or a group of frames.
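The per-unit table selection of features 83 and 84 can be sketched as follows. The three prefix-code tables, the four-symbol alphabet and the function names are made up for illustration and are not MPEG tables. The encoder tries every table, keeps the shortest result, and passes the table index to the decoder alongside the bits.

```python
# Hypothetical variable length coding tables (each is a valid prefix code).
TABLES = [
    {"a": "0", "b": "10", "c": "110", "d": "111"},  # short codes for 'a'
    {"a": "111", "b": "110", "c": "10", "d": "0"},  # short codes for 'd'
    {"a": "00", "b": "01", "c": "10", "d": "11"},   # flat, 2 bits each
]


def encode_unit(symbols, tables=TABLES):
    """Encode one coding unit with whichever table gives the fewest bits.

    Returns (table_index, bit_string); the index is what the decoder
    needs in order to pick the matching table.
    """
    best_index, best_bits = None, None
    for index, table in enumerate(tables):
        bits = "".join(table[s] for s in symbols)
        if best_bits is None or len(bits) < len(best_bits):
            best_index, best_bits = index, bits
    return best_index, best_bits


def decode_unit(table_index, bits, tables=TABLES):
    """Recover the symbols of one coding unit using the identified table."""
    inverse = {code: sym for sym, code in tables[table_index].items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix codes: first match is a full codeword
            symbols.append(inverse[current])
            current = ""
    return symbols


index, bits = encode_unit(list("aaab"))
recovered = decode_unit(index, bits)
```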
[0064]
In general, this technique has the following characteristics.
85. A method for encoding and decoding video images, comprising: encoding the video images into a first data stream whose structure is compatible with both a basic video compression process and an enhanced video compression process, and into a second data stream whose structure is compatible only with the enhanced video compression process; decoding only the first data stream on a decoding system that conforms only to the basic video compression process; and decoding the first data stream and the second data stream in combination on a decoding system that conforms to the enhanced video compression process.
[0065]
The feature 85 may include one or more of the following features.
86. The method of feature 85, wherein the basic video compression process and the enhanced video compression process share a common motion compensated discrete cosine transform structure.
Feature 87. The method of feature 85, wherein the basic video compression process is MPEG-2.
88. The method of feature 87, wherein the enhanced video compression process is MPEG-4.
[0066]
In general, this technique has the following characteristics.
(Feature 89) A method for motion-compensated encoding of a video image in a layered video compression system, comprising: determining at least one base layer motion vector for a base layer of the encoded video image; scaling each base layer motion vector up to the resolution of at least one associated resolution enhancement layer of the video image; and then, for each associated resolution enhancement layer, determining at least one resolution enhancement layer motion vector corresponding to one of the base layer motion vectors, using the corresponding scaled base layer motion vector as a guide vector that indicates the center point of a limited search range in that resolution enhancement layer.
[0067]
The feature 89 may include one or more of the following features.
90. The method of feature 89, further comprising encoding only the corresponding resolution enhancement layer motion vector for each enhancement layer.
(Feature 91) The method of feature 89, further comprising performing motion compensation on the enhancement layer associated with each resolution enhancement layer motion vector using the vector sum of that resolution enhancement layer motion vector and the corresponding base layer motion vector.
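The guide-vector search of features 89 to 91 can be sketched as below. The sketch assumes a 2x resolution ratio between layers, a sum-of-absolute-differences (SAD) matching cost, images held as lists of rows, and that the vector summed in feature 91 is the scaled base layer vector. The function names, the `scale` and `radius` parameters and the cost measure are illustrative choices, not details given in the source.

```python
def sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(size)
        for x in range(size)
    )


def refine_vector(ref, cur, bx, by, size, base_mv, scale=2, radius=1):
    """Search a small window centred on the scaled base layer vector.

    Returns (delta, full_vector): `delta` is the small enhancement layer
    vector that would be coded (feature 90), and `full_vector` is the sum
    of the scaled guide vector and `delta` used for motion compensation
    (feature 91).
    """
    height, width = len(ref), len(ref[0])
    gx, gy = base_mv[0] * scale, base_mv[1] * scale  # guide vector
    best = None
    for ddy in range(-radius, radius + 1):
        for ddx in range(-radius, radius + 1):
            dx, dy = gx + ddx, gy + ddy
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # candidate block falls outside the reference image
            cost = sad(ref, cur, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (ddx, ddy))
    if best is None:
        raise ValueError("search window lies entirely outside the reference image")
    delta = best[1]
    return delta, (gx + delta[0], gy + delta[1])
```

Because the search covers only `(2 * radius + 1) ** 2` candidates around the guide vector, it is far cheaper than a full-range search at the enhancement layer resolution.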
[0068]
In general, this technique has the following characteristics.
(Feature 92) A method for compressing a video image, comprising: down-filtering an initial high resolution image to create a first processed image; creating a first motion vector from the initial high resolution image; encoding the first processed image to create an output base layer; decoding the output base layer to create a second processed image; enlarging the second processed image to create a third processed image; producing a fourth processed image; subtracting the third processed image from the initial high resolution image to create a fifth processed image; subtracting the fourth processed image from the initial high resolution image to create a sixth processed image; reducing the amplitude of the sixth processed image to create a seventh processed image; adding the seventh processed image and the fifth processed image to create an eighth processed image; encoding the eighth processed image using the first motion vector to create an output resolution enhancement layer; decoding the output resolution enhancement layer to create a ninth processed image; adding the ninth processed image and the third processed image to create a tenth processed image; subtracting the initial high resolution image from the tenth processed image to create an eleventh processed image; increasing the amplitude of the eleventh processed image to create a twelfth processed image; extracting separate color channels from the twelfth processed image to form a set of thirteenth processed images; encoding the set of thirteenth processed images using the first motion vector to create a corresponding set of output color resolution enhancement layers; decoding the set of output color resolution enhancement layers to create a set of fourteenth processed images; combining the set of fourteenth processed images into a fifteenth processed image; reducing the amplitude of the fifteenth processed image to create a sixteenth processed image; adding the sixteenth processed image and the tenth processed image to create a seventeenth processed image; subtracting the seventeenth processed image from the initial high resolution image to create an eighteenth processed image; and then compressing the eighteenth processed image into an output final difference residual image.
[0069]
In general, this technique has the following characteristics.
(Feature 93) A method for compressing a video image, comprising: creating a base layer from an initial high resolution image; creating a first set of motion vectors from an image selected based on the initial high resolution image; creating a first difference image from the initial high resolution image and the base layer; creating a second difference image from the initial high resolution image and a processed copy of the initial high resolution image; and then creating a resolution enhancement layer from the first and second difference images and the first set of motion vectors.
[0070]
The feature 93 may include one or more of the following features.
94. The method of feature 93, further comprising creating at least one color resolution enhancement layer for at least one selected color.
95. The method of feature 93, further comprising creating a final difference residual image.
96. The method of feature 95, further comprising encoding the final difference residual image.
[0071]
  The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims. The same reference symbols in the various drawings indicate the same elements.
[0072]
Detailed description
  The preferred embodiments and examples presented throughout this description should not be construed as limiting the invention, but should be considered exemplary.
[0073]
Hierarchical resolution
Goals of a temporal rate family
  Having considered the problems of the prior art, and in pursuing the present invention, the following goals were defined for specifying the temporal characteristics of a future digital television system.
• Optimal presentation of the high-resolution legacy of 24 frame-per-second film.
• Capture of rapidly moving image types such as sports.
• Smooth motion presentation of sports and similar images on existing analog NTSC displays as well as on computer-compatible displays operating at 72 Hz or 75 Hz.
• Reasonable but more efficient motion capture of less rapidly moving images, such as news and live drama.
• Reasonable presentation of all new digital image types on existing NTSC displays through a converter box.
• High-quality presentation of all new digital image types on computer-compatible displays.
• Similarly reasonable or high-quality presentation on 60 Hz digital standard or high-resolution displays, should these come to market.
[0074]
  Since 60 Hz and 72/75 Hz displays are fundamentally incompatible at any rate other than the 24 Hz movie rate, the best situation would be for either 72/75 Hz or 60 Hz to be eliminated as a display rate. Because 72 Hz or 75 Hz is a required rate for N.I.I. (National Information Infrastructure) and computer applications, eliminating the 60 Hz rate as fundamentally obsolete would be the most forward-looking choice. However, there are many competing interests in the broadcast and television equipment industry, and there is a strong demand for any new digital television infrastructure to be based on 60 Hz (and 30 Hz). This has led to a very heated debate between the television, broadcast and computer industries.
[0075]
  In addition, the insistence of some interests in the broadcast and television industry on interlaced 60 Hz formats further widens the gap with computer display requirements. Since non-interlaced display is required for computer-like applications of digital television systems, a de-interlacer is needed whenever an interlaced signal is displayed. Because a de-interlacer would be needed in every such receiver, there is considerable debate about its cost and quality. In addition to de-interlacing, frame rate conversion also strongly affects cost and quality. For example, NTSC-to-PAL converters remain very expensive, and their conversion performance is unreliable for many common types of scene. Since interlace is a complex and contentious subject, and in order to address the problems and issues of temporal rate, the present invention is described in the context of a digital television standard without interlace.
[0076]
Selection of optimal temporal rates
  Beat problem. Optimal presentation on a 72 Hz or 75 Hz display occurs when the camera or simulated image is generated with a motion rate equal to the display rate (72 Hz or 75 Hz, respectively), and vice versa. Similarly, optimal motion fidelity on a 60 Hz display is obtained from a 60 Hz camera or simulated image. Using a 72 Hz or 75 Hz generation rate with a 60 Hz display results in a beat frequency of 12 Hz or 15 Hz, respectively. This beat can be removed through motion analysis, but motion analysis is expensive and inexact, often resulting in visible artifacts and temporal aliasing. Without motion analysis, the beat frequency dominates the perceived display rate, and a 12 Hz or 15 Hz beat gives motion that appears less accurate than 24 Hz. Thus, 24 Hz forms a natural temporal common denominator between 60 Hz and 72 Hz. Although 75 Hz has a slightly higher 15 Hz beat with 60 Hz, its motion is still not as smooth as 24 Hz, and there is no integral relationship between 75 Hz and 24 Hz unless the 24 Hz rate is increased to 25 Hz. (In European 50 Hz countries, movies are often shown 4% fast at 25 Hz; the same can be done to make film presentable on a 75 Hz display.)
[0077]
  Without motion analysis on each receiver, 60 Hz motion on a 72 Hz or 75 Hz display and 75 Hz or 72 Hz motion on a 60 Hz display are less smooth than a 24 Hz image. Thus, neither 72/75 Hz movement nor 60 Hz movement is suitable for reaching a heterogeneous display population that includes both 72 Hz or 75 Hz displays and 60 Hz displays.
[0078]
  3-2 pulldown. A further complication in selecting an optimal frame rate arises from the use of “3-2 pulldown” combined with video effects during the telecine (film-to-video) conversion process. During such conversion, the 3-2 pulldown pattern repeats the first frame (or field) three times, the next frame twice, the next frame three times, the next frame twice, and so on. This is how 24 fps film is presented on television at 60 Hz (actually 59.94 Hz for NTSC color). That is, each of the 12 pairs of frames in one second of film is displayed 5 times, giving 60 images per second. The 3-2 pull-down pattern is shown in FIG.
[0079]
  By some estimates, more than half of all film on video has had substantial portions adjusted at the 59.94 Hz video field rate relative to the 24 fps film. These adjustments include “pan-and-scan”, color correction and title scrolling. In addition, many films are time-adjusted by dropping frames or clipping the starts and ends of scenes to fit a given broadcast schedule. Because such material contains both 59.94 Hz and 24 Hz motion, the 3-2 pulldown process cannot be reversed. This makes such films very difficult to compress using the MPEG-2 standard. Fortunately, the problem is limited to existing NTSC-resolution material, since there is no significant library of higher-resolution digital film using 3-2 pulldown.
[0080]
  Motion blur. To explore further the problem of finding a common temporal rate higher than 24 Hz, it is useful to describe motion blur in the capture of moving images. Camera sensors and motion picture film are open to sense a moving image for a portion of the duration of each frame. On motion picture cameras and many video cameras, this exposure duration is adjustable. Film cameras need time to advance the film and are usually limited to being open about 210° out of 360°, a 58% duty cycle. Video cameras with CCD sensors often need a portion of the frame time to “read” the image from the sensor; this can vary from 10% to 50% of the frame time. For some sensors, an electronic shutter must be used to block the light during this readout time. Thus, the “duty cycle” of a CCD sensor typically varies from 50% to 90% and is adjustable on some cameras. The light shutter can sometimes be adjusted to reduce the duty cycle further, if desired. However, for both film and video, the most common sensor duty cycle is 50%.
[0081]
  Preferred rate. With this problem in mind, one can consider using only some of the frames from an image sequence captured at 72 Hz or 75 Hz. The sub-rates shown in Table 1 can be derived by using one frame out of every two, three, four, and so on.
[0082]
[Table 1]

[0083]
  The 15 Hz rate unifies 60 Hz and 75 Hz. The 12 Hz rate unifies 60 Hz and 72 Hz. However, the requirement for a rate above 24 Hz eliminates these rates. 24 Hz is not a common sub-rate, but the use of 3-2 pulldown to present it on 60 Hz displays has become accepted by industry. The candidate rates are therefore 30 Hz, 36 Hz and 37.5 Hz. Because 30 Hz has a 7.5 Hz beat with 75 Hz and a 6 Hz beat with 72 Hz, it is not useful as a candidate.
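The sub-rate arithmetic behind this discussion can be checked with a few lines of code. The sketch below assumes divisors from 2 to 6 (the range of the original table is not recoverable from this text), and the helper names `subrates` and `common_subrates` are introduced here for illustration only.

```python
from fractions import Fraction


def subrates(rate_hz, max_divisor=6):
    """Rates obtained by keeping one frame out of every 2, 3, ... frames."""
    return {n: Fraction(rate_hz) / n for n in range(2, max_divisor + 1)}


def common_subrates(rate_a, rate_b, max_divisor=6):
    """Sub-rates shared by two capture or display rates."""
    a = set(subrates(rate_a, max_divisor).values())
    b = set(subrates(rate_b, max_divisor).values())
    return a & b


for rate in (75, 72, 60):
    row = ", ".join(f"1/{n}: {float(v):g}" for n, v in subrates(rate).items())
    print(f"{rate} Hz -> {row}")

print(common_subrates(60, 75))  # the rate unifying 60 Hz and 75 Hz
print(common_subrates(60, 72))  # the rate unifying 60 Hz and 72 Hz
```

Exact fractions are used so that values such as 37.5 Hz and 18.75 Hz compare reliably.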
[0084]
  The motion rates of 36 Hz and 37.5 Hz are therefore the prime candidates for motion smoother than 24 Hz material when presented on 60 Hz and 72/75 Hz displays. Both of these rates are about 50% faster and smoother than 24 Hz. The rate of 37.5 Hz is not suitable for use with either 60 Hz or 72 Hz, so it must be excluded, leaving only 36 Hz with the desired temporal rate characteristics. (A motion rate of 37.5 Hz could be used if the 60 Hz display rate for television could be moved up by 4% to 62.5 Hz. Given the interests behind 60 Hz, a move to 62.5 Hz appears unlikely; some have even proposed the very outdated 59.94 Hz rate for new television systems. However, if such a change were made, the other aspects of the invention could be applied to a 37.5 Hz rate.)
[0085]
  The rates of 24, 36, 60 and 72 Hz remain as candidates for the temporal rate family. The rates of 72 Hz and 60 Hz cannot be used as the distribution rate because, as described above, motion converted between these two rates is less smooth than when 24 Hz is used as the distribution rate. By hypothesis, the inventors are looking for a rate faster than 24 Hz. Thus, 36 Hz is the prime candidate for a master that unifies the motion capture and image distribution rate for use with 60 Hz and 72/75 Hz displays.
[0086]
  As mentioned above, the 3-2 pulldown pattern for 24 Hz material repeats the first frame (or field) three times, the next frame twice, the next frame three times, the next frame twice, and so on. When 36 Hz is used, each frame should optimally be repeated in a 2-1-2 pattern. This is shown in Table 2 and shown schematically in FIG.
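Both repeat patterns can be illustrated with one small helper. In this sketch, frames are stand-in labels and the function name `pulldown` is hypothetical. It shows that 24 source frames with a 3-2 pattern and 36 source frames with a 2-1-2 pattern both fill 60 display slots per second.

```python
from itertools import cycle


def pulldown(frames, pattern):
    """Repeat each source frame according to a cyclic repeat pattern."""
    out = []
    for frame, repeat in zip(frames, cycle(pattern)):
        out.extend([frame] * repeat)
    return out


# One second of 24 Hz material shown with 3-2 pulldown.
display_from_24 = pulldown(list(range(24)), (3, 2))

# One second of 36 Hz material shown with 2-1-2 pulldown.
display_from_36 = pulldown(list(range(36)), (2, 1, 2))

print(len(display_from_24), len(display_from_36))
print(pulldown("ABC", (2, 1, 2)))
```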
[0087]
[Table 2]

[0088]
  This relationship between 36 Hz and 60 Hz holds only for true 36 Hz material. While 60 Hz material can be “stored” in 30 Hz when interlaced, 36 Hz cannot reasonably be created from 60 Hz without motion analysis and reconstruction. However, as a new rate for motion capture, 36 Hz gives slightly smoother motion on a 60 Hz display than 24 Hz does, and substantially smoother image motion on a 72 Hz display. Thus, 36 Hz is the optimal rate for a master that unifies motion capture and image distribution for 60 Hz and 72 Hz displays, giving smoother motion than 24 Hz material presented on such displays.
[0089]
  36 Hz meets the goals above, but it is not the only suitable capture rate. Since 36 Hz cannot easily be extracted from 60 Hz, 60 Hz is not a suitable capture rate. However, 72 Hz can be used for capture, with every other frame then used as the basis for 36 Hz distribution. The motion blur from using every other frame of 72 Hz material is half the motion blur of 36 Hz capture. Tests of the motion blur appearance of every third frame from 72 Hz show that the staccato strobing at 24 Hz is objectionable. However, using every other frame from 72 Hz for 36 Hz display is not objectionable to the eye compared with 36 Hz native capture.
[0090]
  Thus, 36 Hz offers the opportunity to provide very smooth motion on 72 Hz displays by capturing at 72 Hz, while using alternate frames of the 72 Hz native capture material to achieve a 36 Hz distribution rate, and then applying 2-1-2 pulldown to produce a 60 Hz image, giving 60 Hz displays better motion than 24 Hz material.
[0091]
  In summary, Table 3 shows the preferred optimal temporal rates for capture and distribution according to the present invention.
[0092]
[Table 3]

[0093]
  It is noteworthy that this technique of using alternate frames from a 72 Hz camera to achieve a 36 Hz distribution rate can benefit from an increased motion blur duty cycle. The normal 50% duty cycle at 72 Hz, which yields a 25% duty cycle at 36 Hz, has been shown to be acceptable and represents a significant improvement over 24 Hz on 60 Hz and 72 Hz displays. However, if the duty cycle is increased to the range of 75-90%, the 36 Hz samples begin to approach the more common 50% duty cycle. The duty cycle can be increased, for example, by using an “auxiliary storage” CCD structure, which has a short blanking time and so yields a high duty cycle. Other methods, including dual CCD multiplexed structures, can also be used.
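The duty cycle figures above follow from simple proportion. When only every other 72 Hz frame is kept, the shutter is open for the same absolute time but the frame period doubles. A minimal sketch, using the hypothetical helper name `effective_duty`:

```python
def effective_duty(capture_duty, capture_hz=72, distribution_hz=36):
    """Duty cycle seen at the distribution rate when frames are dropped.

    The exposure time per kept frame is unchanged, while the frame period
    grows by capture_hz / distribution_hz, so the duty cycle shrinks by
    the same factor.
    """
    return capture_duty * distribution_hz / capture_hz


for duty in (0.50, 0.75, 0.90):
    print(f"{duty:.0%} at 72 Hz -> {effective_duty(duty):.1%} at 36 Hz")
```

A 50% capture duty cycle gives 25% at 36 Hz, while 75% to 90% gives 37.5% to 45%, which is why the text describes the higher range as approaching the common 50% figure.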
[0094]
Modified MPEG-2 compression
  For efficient storage and distribution, digital source material at the preferred temporal rate of 36 Hz must be compressed. The preferred form of compression for the present invention is a novel variation of the MPEG-2 standard, but the invention can also be used with other compression systems having similar characteristics (e.g., MPEG-4).
[0095]
  The basic principle of MPEG-2. MPEG-2 is an international video compression standard that defines a video syntax providing an efficient way to represent image sequences as more compact coded data. The language of the coded bits is the “syntax”. For example, a few tokens can represent an entire block of 64 samples. MPEG also describes a decoding (reconstruction) process in which the coded bits are mapped from the compact representation back to the original “raw” format of the image sequence. For example, a flag in the coded bit stream signals whether the following bits are to be decoded with a discrete cosine transform (DCT) algorithm or with a prediction algorithm. The algorithms that make up the decoding process are governed by the semantics defined by MPEG. This syntax can be applied to exploit common video characteristics such as spatial redundancy, temporal redundancy, uniform motion, and spatial masking. In effect, MPEG-2 defines a programming language as well as a data format. An MPEG-2 decoder must be able to parse and decode an incoming data stream, but as long as the data stream complies with the MPEG-2 syntax, a wide variety of data structures and compression techniques can be used. The present invention takes advantage of this flexibility by devising novel means and methods for temporal and resolution scaling using the MPEG-2 standard.
[0096]
  MPEG-2 uses both intraframe and interframe compression. In most video scenes the background remains relatively stable while action takes place in the foreground. The background may move, but much of the scene is redundant. MPEG-2 begins its compression by creating a reference frame called an I (intra) frame. I frames are compressed without reference to other frames and thus contain an entire frame of video information. I frames provide entry points into the data bit stream for random access, but can only be moderately compressed. Typically, the data representing I frames is placed in the bit stream every 10 to 15 frames. Thereafter, since only a small portion of the frames that fall between the reference I frames differs from the bracketing I frames, only the differences are captured, compressed, and stored. Two types of frames are used for such differences: P (predicted) frames and B (bidirectionally interpolated) frames.
[0097]
  P frames are generally encoded with reference to a past frame (either an I frame or a previous P frame) and, in general, are used as a reference for future P frames. P frames receive a fairly high amount of compression. B frames provide the highest amount of compression, but generally require both a past and a future reference in order to be encoded. Bidirectional frames are never used as reference frames.
[0098]
  In addition, macroblocks within P frames can be individually encoded using intraframe coding. Macroblocks within B frames can be individually encoded using intraframe coding, forward predicted coding, backward predicted coding, or both forward and backward (bidirectionally interpolated) predicted coding. A macroblock is a 16 × 16 pixel grouping of four 8 × 8 DCT blocks, together with one motion vector for P frames and one or two motion vectors for B frames.
[0099]
  After encoding, an MPEG data bitstream comprises a sequence of I, P, and B frames. A sequence may consist of almost any pattern of I, P, and B frames (there are a few minor restrictions on their placement). However, it is common in industry practice to use a fixed pattern (e.g., IBBPBBPBBPBBPBB).
[0100]
  As an important part of the present invention, an MPEG-2 data stream is created that includes a base layer, at least one optional temporal enhancement layer, and an optional resolution enhancement layer. Each of these layers will be described in detail.
[0101]
Temporal scalability
  Base layer. The base layer is used to carry 36 Hz source material. In the preferred embodiment, one of two MPEG-2 frame sequences can be used for the base layer: IBPBPBP or IPPPPPP. The latter pattern is most preferred, since the decoder would only need to decode P frames, which reduces the required memory bandwidth if 24 Hz movies are also decoded without B frames.
[0102]
  72 Hz temporal enhancement layer. When using MPEG-2 compression, it is possible to embed a 36 Hz temporal enhancement layer as B frames within the MPEG-2 sequence for the 36 Hz base layer, provided that the P frame distance is even. This allows a single data stream to support both 36 Hz display and 72 Hz display. For example, both layers can be decoded to generate a 72 Hz signal for computer monitors, while only the base layer is decoded and converted to generate a 60 Hz signal for television.
[0103]
  In the preferred embodiment, either of two MPEG-2 coding patterns can be used: IPBBBPBBBPBBBP or IPBPBPBPB. Both allow alternate frames to be placed in a separate stream containing only temporal enhancement B frames, taking 36 Hz to 72 Hz. These coding patterns are shown in FIGS. 2 and 3, respectively. The 2-frame P spacing coding pattern of FIG. 3 has the added advantage that the 36 Hz decoder only needs to decode P frames, which reduces the required memory bandwidth if 24 Hz movies are also decoded without B frames.
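  The alternate-frame split described above can be illustrated with a minimal sketch. It assumes the frame pattern is written in display order (an assumption; the patterns in the text may be in bitstream order), and the function name `split_alternate` is hypothetical. It is not the patented encoder, only a check that every second frame is a B frame and can therefore be moved to a separate enhancement stream without breaking any reference.

```python
def split_alternate(display_order):
    """Split a display-order frame pattern into base and enhancement layers.

    Even positions form the 36 Hz base layer; odd positions form the 36 Hz
    temporal enhancement layer. Because B frames are never used as references,
    the split is only valid when every odd-position frame is a B frame.
    """
    base = display_order[0::2]
    enhancement = display_order[1::2]
    if set(enhancement) != {"B"}:
        raise ValueError("alternate frames must all be B frames")
    return base, enhancement


# 2-frame P spacing (assumed display order): the base layer holds only I/P frames.
print(split_alternate("IBPBPBPB"))   # ('IPPP', 'BBBB')

# 4-frame P spacing (assumed display order): the base layer keeps every other B frame.
print(split_alternate("IBBBPBBBP"))  # ('IBPBP', 'BBBB')
```

  With an odd P spacing such as IBBPBBP, a P frame lands on an alternate position and the split is rejected, which is why an even P frame distance is needed.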
[0104]
  Experiments with high-resolution images have suggested that the 2-frame P spacing of FIG. 3 is optimal for most types of images. That is, the structure of FIG. 3 appears to offer the optimal temporal structure for supporting both 60 Hz and 72 Hz, while providing excellent results on modern 72 Hz computer-compatible displays. This structure achieves 72 Hz with two digital streams: a 36 Hz base layer stream and a 36 Hz enhancement layer stream of B frames. This is illustrated in FIG. 4, a block diagram showing that a 36 Hz base layer MPEG-2 decoder 50 simply decodes the P frames to generate a 36 Hz output, which can then be readily converted for either a 60 Hz or a 72 Hz display. An optional second decoder 52 simply decodes the B frames to generate a second 36 Hz output, which is combined with the 36 Hz output of the base layer decoder 50 to yield a 72 Hz output (the method of combining is discussed below). In an alternative embodiment, a single high-speed MPEG-2 decoder 50 can decode both the base layer P frames and the enhancement layer B frames.
[0105]
  Optimal master format. A number of companies are building MPEG-2 decoding chips that operate at around 11 megapixels/second. The MPEG-2 standard defines several "profiles" for resolutions and frame rates. Although these profiles are strongly biased toward computer-incompatible format parameters such as 60 Hz, non-square pixels, and interlace, many chip manufacturers appear to be developing decoder chips that operate at "main profile, main level". This profile is defined as any horizontal resolution up to 720 pixels, any vertical resolution up to 576 lines at up to 25 Hz, and any frame rate up to 30 Hz at up to 480 lines. A wide range of data rates, from about 1.5 megabits/second to about 10 megabits/second, is also specified. From a chip point of view, however, the main issue is the rate at which pixels are decoded. The main-level, main-profile pixel rate is about 10.5 megapixels/second.
[0106]
  Although there is variation among chip manufacturers, most MPEG-2 decoder chips will in fact operate at up to 13 megapixels/second when given fast support memory. Some decoder chips operate as fast as 20 megapixels/second or more. Given that CPU chips tend to improve by 50% or more each year at a given cost, some near-term flexibility can be expected in the pixel rate of MPEG-2 decoder chips.
[0107]
  Table 4 shows some desirable resolutions and frame rates and their corresponding pixel rates.
[0108]
[Table 4]

[0109]
  All of these formats can be used with MPEG-2 decoder chips that can handle at least 12.6 megapixels/second. The very desirable 640 × 480 at 36 Hz format can be achieved by nearly all current chips, since its rate is 11.1 megapixels/second. A widescreen 1024 × 512 image can be squeezed into 680 × 512 using a 1.5:1 squeeze, and can then be supported at 36 Hz if 12.5 megapixels/second can be handled. The highly desirable square-pixel widescreen template of 1024 × 512 can achieve 36 Hz when MPEG-2 decoder chips can process about 18.9 megapixels/second. This becomes more feasible if 24 Hz and 36 Hz material is coded only with P frames, so that B frames are required only in the 72 Hz temporal enhancement layer decoders. Decoders that use only P frames require less memory and less memory bandwidth, which makes the goal of 19 megapixels/second more reachable.
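  The pixel rates quoted above follow directly from width × height × frame rate. The short sketch below recomputes them and lists which formats fit a given decoder budget; the helper names (`pixel_rate_mpix`, `supported`) and the format labels are illustrative assumptions, and only formats named in the paragraph above are included.

```python
def pixel_rate_mpix(width, height, fps):
    """Decoded pixel rate in megapixels/second (1 megapixel = 1e6 pixels)."""
    return width * height * fps / 1e6


# Formats named in the text above: (label, width, height, frame rate in Hz).
FORMATS = [
    ("640 x 480 @ 36 Hz", 640, 480, 36),
    ("680 x 512 @ 36 Hz", 680, 512, 36),    # 1024 x 512 with a 1.5:1 squeeze
    ("1024 x 512 @ 36 Hz", 1024, 512, 36),  # square-pixel widescreen template
]


def supported(chip_mpix_per_s):
    """Return the labels of the formats a decoder of the given speed can handle."""
    return [name for name, w, h, f in FORMATS
            if pixel_rate_mpix(w, h, f) <= chip_mpix_per_s]


for name, w, h, f in FORMATS:
    print(f"{name}: {pixel_rate_mpix(w, h, f):.1f} megapixels/second")

print(supported(13.0))  # a 13 megapixels/second chip handles the first two formats
```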
[0110]
  The 1024 × 512 resolution template would most often be used with 2.35:1 and 1.85:1 aspect ratio films at 24 fps. This material requires only 11.8 megapixels/second, which should fit within the limits of most existing main-level, main-profile decoders.
[0111]
  All of these formats in the “master template” for the base layer at 24 Hz or 36 Hz are shown in FIG. Thus, the present invention provides a unique way of adapting a wide range of aspect ratios and temporal resolutions compared to the prior art (further discussion on master templates is given below).
[0112]
  The temporal enhancement layer of B frames that generates 72 Hz can be decoded using a chip with double the pixel rates specified above, or by using a second chip in parallel with additional access to the decoder memory. In accordance with the present invention, there are at least two ways to merge the enhancement layer and base layer data streams so as to insert the alternate B frames. In the first method, merging can be done invisibly to the decoder chip using the MPEG-2 transport layer. The MPEG-2 transport packets for two PIDs (program IDs) can be recognized as containing the base layer and the enhancement layer, and their stream contents can both simply be passed on to a double-rate decoder chip, or to an appropriately configured pair of normal-rate decoders. In the second method, the "data partitioning" feature of the MPEG-2 data stream can be used instead of the transport layer from MPEG-2 systems. The data partitioning feature allows the B frames to be marked as belonging to a different class within the MPEG-2 compressed data stream, so that they can be flagged to be ignored by 36 Hz decoders that support only the temporal base layer rate.
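  The first (transport-layer) method can be sketched as a PID filter over 188-byte MPEG-2 transport packets, whose 13-bit PID sits in the low 5 bits of byte 1 and all of byte 2. The PID values `0x100` and `0x101`, and the helper names, are hypothetical choices for illustration; a real stream would announce its PIDs in its program tables, and this sketch does no further parsing.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

BASE_PID = 0x100         # hypothetical PID carrying the 36 Hz base layer
ENHANCEMENT_PID = 0x101  # hypothetical PID carrying the B-frame enhancement layer


def packet_pid(packet):
    """Extract the 13-bit PID from an MPEG-2 transport packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def route(packets, decode_enhancement):
    """Keep the packets a decoder needs: base only (36 Hz) or both layers (72 Hz)."""
    wanted = {BASE_PID}
    if decode_enhancement:
        wanted.add(ENHANCEMENT_PID)
    return [p for p in packets if packet_pid(p) in wanted]


def make_packet(pid, fill=0):
    """Build a dummy payload-only packet for demonstration purposes."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes([fill]) * (TS_PACKET_SIZE - 4)


stream = [make_packet(BASE_PID), make_packet(ENHANCEMENT_PID), make_packet(BASE_PID)]
print(len(route(stream, decode_enhancement=False)))  # 36 Hz decoder: base packets only
print(len(route(stream, decode_enhancement=True)))   # 72 Hz decoder: both layers
```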
[0113]
  Temporal scalability as defined by MPEG-2 video compression is not as optimal as the simple B frame partitioning of the present invention. MPEG-2 temporal scalability is only forward referenced from a previous P or B frame, and thus lacks the efficiency of the B frame encoding proposed here, which is both forward and backward referenced. Accordingly, the simple use of B frames as a temporal enhancement layer provides a simpler and more efficient temporal scalability than the temporal scalability defined within MPEG-2. Nevertheless, this use of B frames as the mechanism for temporal scalability is fully compliant with MPEG-2. The two methods of identifying these B frames as an enhancement layer, by data partitioning or by an alternate PID for the B frames, are also fully compliant.
[0114]
  50/60 Hz temporal enhancement layer. In addition to, or as an alternative to, the 72 Hz temporal enhancement layer described above (which encodes a 36 Hz signal), a 60 Hz temporal enhancement layer (which encodes a 24 Hz signal) can be added to the 36 Hz base layer in a similar fashion. A 60 Hz temporal enhancement layer is particularly useful for encoding existing 60 Hz interlaced video material.
[0115]
  Most existing 60 Hz interlaced material is videotape for NTSC in analog, D1, or D2 format. There is also a small amount of Japanese HDTV material (SMPTE 240/260M), and there are cameras that operate in this format. Any such 60 Hz interlaced format can be processed in a known manner so that the signal is de-interlaced and frame-rate converted. This process involves very complex image understanding technology, similar to robot vision. Even with very sophisticated technology, temporal aliasing will generally result in "misunderstandings" by the algorithm and will occasionally yield artifacts. Note that the typical 50% duty cycle of image capture means that the camera is "not looking" half of the time. The "backwards wagon wheel" in movies is an example of temporal aliasing caused by this normal practice of temporal undersampling. Such artifacts generally cannot be removed without human-assisted reconstruction, so there will always be cases that cannot be corrected automatically. However, the motion conversion results available with current technology should be reasonable for most material.
[0116]
  The price of a single high-definition camera or tape machine would be similar to the cost of such a converter. Thus, in a studio with several cameras and tape machines, the cost of such conversion becomes modest. However, performing such processing adequately is presently beyond the budget of home and office products. The complex processing needed to remove interlace and convert the frame rate of existing material is therefore preferably performed at the origination studio. This is illustrated in FIG. 5, a block diagram showing 60 Hz interlaced input from cameras 60 or other sources (e.g., non-film videotape) 62 to a converter 64 that includes a de-interlacer function and a frame rate conversion function, and that can output a 36 Hz signal (36 Hz base layer only) and a 72 Hz signal (36 Hz base layer plus 36 Hz from the temporal enhancement layer).
[0117]
  As an alternative to outputting a 72 Hz signal (36 Hz from the base layer plus 36 Hz from the temporal enhancement layer), this conversion process can be adapted to produce a second MPEG-2 24 Hz temporal enhancement layer on the 36 Hz base layer, which reproduces the original 60 Hz signal, although de-interlaced. If similar quantization is used for the B frames of the 60 Hz temporal enhancement layer, its data rate should be slightly lower than that of the 72 Hz temporal enhancement layer, because there are fewer B frames.
[0118]
  > 60I → 36 + 36 = 72
  > 60I → 36 + 24 = 60
  > 72 → 36,72,60
  > 50I → 36, 50, 72
  > 60 → 24, 36, 72
[0119]
  The vast majority of material of interest in the United States is low-resolution NTSC. At present, most NTSC signals are viewed with substantial impairment on most home televisions. In addition, viewers have come to accept the temporal impairments inherent in using 3-2 pulldown to present film on television. Nearly all prime-time television is made on film at 24 frames per second. Thus, only sports, news, and other video-original shows need to be processed in this manner. The artifacts and losses associated with converting these shows to a 36/72 Hz format are likely to be offset by the improvements associated with high-quality de-interlacing of the signal.
[0120]
  Note that the motion blur inherent in 60 Hz (or 59.94 Hz) fields should be very similar to the motion blur in 72 Hz frames. Thus, this technique of providing a base layer and an enhancement layer should appear similar to 72 Hz origination in terms of motion blur. Accordingly, when interlaced 60 Hz NTSC material is processed into a 36 Hz base layer plus 24 Hz from the temporal enhancement layer and displayed at 60 Hz, few viewers will notice the difference, except possibly as a slight improvement. However, those who buy a new 72 Hz digital non-interlaced television will notice a small improvement when viewing NTSC, and a major improvement when viewing new material captured or originated at 72 Hz. Even the decoded 36 Hz base layer presented on a 72 Hz display will look as good as high-quality digital NTSC, trading interlace artifacts for a lower frame rate.
[0121]
  The same process can also be applied to convert existing PAL 50 Hz material to a second MPEG-2 enhancement layer. PAL videotapes are best slowed before such conversion. Live PAL requires conversion among the relatively unrelated rates of 50 Hz, 36 Hz, and 72 Hz. Such converter units are presently available only at the source of the broadcast signal, and are not yet practical at each receiver in the home or office.
[0122]
Resolution scalability
  The base resolution template can be enhanced using hierarchical resolution scalability, utilizing MPEG-2 to achieve higher resolutions built upon the base layer. With enhancement, resolutions of 1.5x and 2x the base layer can be achieved. Double resolution can be built in two steps, using 3/2 and then 4/3, or in a single factor-of-two step. This is shown in FIG.
[0123]
  This method of resolution enhancement can be accomplished by creating a resolution enhancement layer as an independent MPEG-2 stream and then applying MPEG-2 compression to the enhancement layer. This method is different from “spatial scalability” defined by MPEG-2 and proven to be not highly effective. However, MPEG-2 has all the tools to build an effective layered resolution and provide spatial scalability. A preferred hierarchical resolution encoding method of the present invention is shown in FIG. A preferred decoding method of the present invention is shown in FIG.
[0124]
  Resolution layer encoding. In FIG. 8, an original 2k × 1k image 80 is down-filtered to half resolution in each dimension, preferably using an optimized filter with negative lobes (see the discussion of FIG. 12 below), to create a 1024 × 512 base layer 81. This base layer 81 is then compressed by a conventional MPEG-2 algorithm to generate an MPEG-2 base layer 82 suitable for transmission. Importantly, full MPEG-2 motion compensation can be used during this compression step. The same signal is then decompressed back to a 1024 × 512 image 83 using a conventional MPEG-2 algorithm. The 1024 × 512 image 83 is expanded to a first 2k × 1k enlargement 84 (for example, by pixel replication, or preferably by a better up-filter such as spline interpolation or a filter with negative lobes; see the discussion of FIGS. 13A and 13B below).
[0125]
  Meanwhile, as an optional step, the filtered 1024 × 512 base layer 81 is expanded to a second 2k × 1k enlargement 85. This second 2k × 1k enlargement 85 is subtracted from the original 2k × 1k image 80 to generate an image that represents the top octave of resolution between the original high-resolution image 80 and the original base layer image 81. The resulting image is optionally multiplied by a sharpness factor or weight, and then added to the difference between the original 2k × 1k image 80 and the second 2k × 1k enlargement 85 to generate a center-weighted 2k × 1k enhancement layer source image 86. This enhancement layer source image 86 is then compressed by a conventional MPEG-2 algorithm to generate a separate MPEG-2 resolution enhancement layer 87 suitable for transmission. Importantly, full MPEG-2 motion compensation can be used during this compression step.
[0126]
  Resolution decoding. In FIG. 9, the base layer 82 is restored to a 1024 × 512 image 90 using a normal MPEG-2 algorithm. The 1024 × 512 image 90 is expanded to the first 2k × 1k image 91. On the other hand, the resolution enhancement layer 87 is restored to the second 2k × 1k image 92 using a normal MPEG-2 algorithm. Next, the first 2k × 1k image 91 and the second 2k × 1k image 92 are added to generate a high-resolution 2k × 1k image 93.
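  The encode and decode paths of FIGS. 8 and 9 can be sketched on a tiny image. This is a simplified illustration under stated assumptions, not the patented implementation: the MPEG-2 compress/decompress steps are replaced by a pluggable `codec` that defaults to the identity, the down-filter is a plain 2 × 2 average, the up-filter is pixel replication, and the optional sharpness weighting and center weighting are omitted. All function names are hypothetical.

```python
def downsample(img):
    """Halve resolution in each dimension by averaging 2x2 blocks (stand-in down-filter)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample(img):
    """Double resolution in each dimension by pixel replication (stand-in up-filter)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def subtract(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode(original, codec=lambda image: image):
    """Return (decoded base layer, enhancement layer difference picture)."""
    base = codec(downsample(original))                # compress, then decompress
    enhancement = subtract(original, upsample(base))  # original minus expanded base
    return base, codec(enhancement)


def decode(base, enhancement=None):
    """Base-only decoders show the expanded base; full decoders add the enhancement."""
    expanded = upsample(base)
    return expanded if enhancement is None else add(expanded, enhancement)


original = [[10, 12, 20, 22],
            [14, 16, 24, 26],
            [30, 32, 40, 42],
            [34, 36, 44, 46]]
base, enhancement = encode(original)
print(base)                                   # [[13.0, 23.0], [33.0, 43.0]]
print(decode(base, enhancement) == original)  # True when the codec is lossless
```

  With a real lossy codec the reconstruction is no longer exact; the point of the sketch is only that the enhancement layer carries the difference between the original and the expanded decoded base layer.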
[0127]
  Improvements over MPEG-2. In essence, the enhancement layer is created by expanding the decoded base layer, taking the difference between the original image and the expanded decoded base layer, and compressing the result. The compressed resolution enhancement layer can then optionally be added to the base layer after decoding to create a higher-resolution image in the decoder. The hierarchical resolution encoding method of the present invention differs from MPEG-2 spatial scalability in the following ways.
・ The enhancement layer difference picture is compressed as its own MPEG-2 data stream, with I, B, and P frames. This difference is the main reason why the resolution scalability proposed here is effective where MPEG-2 spatial scalability is ineffective. The spatial scalability defined within MPEG-2 allows the upper layer to be coded as the difference between the upper layer picture and the expanded base layer, as a motion-compensated MPEG-2 data stream of the actual picture, or as a combination of both. However, neither of these encodings is efficient. The difference from the base layer can be regarded as an I frame of the difference, which is inefficient compared with a motion-compensated difference picture as in the present invention. The upper layer encoding defined in MPEG-2 is also inefficient, because it is identical to a complete encoding of the upper layer. The motion-compensated encoding of the difference picture, as in the present invention, is therefore substantially more efficient.
・ Since the enhancement layer is an independent MPEG-2 data stream, the transport layer of MPEG-2 systems (or another similar mechanism) must be used to multiplex the base layer and the enhancement layer.
・ The expansion and resolution-reduction (down) filtering can use a Gaussian or spline function, or a filter with negative lobes (see FIG. 12), all of which are more optimal than the bilinear interpolation specified in MPEG-2 spatial scalability.
・ In the preferred embodiment, the image aspect ratio must match between the lower and higher layers. MPEG-2 spatial scalability allows extensions to the width and/or height. Such extensions are not allowed in the preferred embodiment because of efficiency requirements.
・ Because of efficiency requirements and the extreme amount of compression used in the enhancement layer, the entire area of the enhancement layer is not coded. The area excluded from enhancement is usually the border area. Accordingly, the 2k × 1k enhancement layer source image 86 of the preferred embodiment is center-weighted. In the preferred embodiment, a fading function (e.g., linear weighting) is used to "feather" the enhancement layer toward the center of the image and away from the border edge, to avoid abrupt transitions in the image. In addition, any manual or automatic method of determining regions with detail that the eye will follow can be used to select regions that need detail and to exclude regions that do not need extra detail. The entire image has detail to the level of the base layer, so the entire image is present; only the areas of special interest benefit from the enhancement layer. In the absence of other criteria, the edges or borders of the frame can be excluded from enhancement, as in the center-weighted embodiment described above. The MPEG-2 "lower layer prediction horizontal and vertical offset" parameters, used as signed negative integers, combined with the "horizontal and vertical subsampling factor m and n" values, can be used to specify the overall size of the enhancement layer rectangle and its placement within the expanded base layer.
・ A sharpness factor is added to the enhancement layer to offset the loss of sharpness that occurs during quantization. Care must be taken to use this parameter only to restore the clarity and sharpness of the original picture, and not to enhance the image. As described above in connection with FIG. 8, the sharpness factor is applied to the "high octave" of resolution between the original high-resolution image 80 and the original base layer image 81 (after expansion). This high octave image will be quite noisy, in addition to containing the sharpness and detail of the high octave of resolution. Adding too much of this image can cause instability in the motion-compensated encoding of the enhancement layer. The amount that should be added depends on the noise level of the original image. A typical weighting value is 0.25. For noisy images, no sharpness should be added, and it may even be advisable to suppress the noise in the original for the enhancement layer before compression, using a conventional noise suppression technique that preserves detail.
・ Temporal and resolution scalability are intermixed by using B frames for temporal enhancement from 36 Hz to 72 Hz in both the base layer and the resolution enhancement layer. In this way, four levels of decoding performance are possible with two layers of resolution scalability, because of the options available with two levels of temporal scalability.
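  The center weighting and "feathering" of the enhancement layer described in the list above can be sketched with a linear fading function. The border width, the image size, and the function names are assumptions chosen for illustration; the text specifies only that some fading function, such as linear weighting, is used.

```python
def feather_weight(x, y, width, height, border):
    """Linear fade: 0.0 at the frame edge, rising to 1.0 at `border` pixels inside.

    `border` must be a positive number of pixels (an assumed parameter).
    """
    distance_to_edge = min(x, y, width - 1 - x, height - 1 - y)
    return min(1.0, distance_to_edge / border)


def feather(enhancement, border):
    """Scale each enhancement-layer sample by its feathering weight."""
    height, width = len(enhancement), len(enhancement[0])
    return [[enhancement[y][x] * feather_weight(x, y, width, height, border)
             for x in range(width)]
            for y in range(height)]


# An 8x8 difference picture with a uniform value of 4, feathered over a 2-pixel border.
layer = feather([[4] * 8 for _ in range(8)], border=2)
print(layer[0][3], layer[1][1], layer[4][4])  # 0.0 2.0 4.0
```

  Samples on the frame edge are removed entirely, samples near the edge are attenuated, and the central region keeps its full enhancement detail.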
[0128]
  These differences represent substantial improvements over MPEG-2 spatial and temporal scalability. They remain consistent with MPEG-2 decoder chips, although additional logic may be required in the decoder to perform the expansion and addition in the resolution enhancement decoding method shown in FIG. 9. Such additional logic is nearly identical to that required by the less efficient MPEG-2 spatial scalability.
[0129]
  Optional non-MPEG-2 encoding of the resolution enhancement layer. A compression technique other than MPEG-2 can be used for the resolution enhancement layer. Furthermore, it is not necessary to use the same compression technique for the resolution enhancement layer as for the base layer. For example, when the difference layer is coded, motion-compensated block wavelets can be used to match and track details with great efficiency. Even if the most efficient position for placing the wavelets jumps around the screen as the amount of difference changes, this will not be noticed in the low-amplitude enhancement layer. Furthermore, it is not necessary to cover the entire image; it is only necessary to place the wavelets on details. The placement of the wavelets can be guided by the detail regions in the image, and can also be biased away from the edges.
[0130]
  Multiple resolution enhancement layers. At the bit rates described here, where 2 megapixels (2048 × 1024) at 72 frames/second are coded in 18.5 megabits/second, only a base layer (1024 × 512 at 72 fps) and a single resolution enhancement layer have been successfully demonstrated. However, the improved efficiency expected from further refinement of resolution enhancement layer coding should make multiple resolution enhancement layers possible. For example, a 512 × 256 base layer could be resolution-enhanced by four layers to 1024 × 512, 1536 × 768, and 2048 × 1024. This is possible with existing MPEG-2 coding at the movie frame rate of 24 frames per second. At high frame rates such as 72 frames per second, MPEG-2 does not at present provide sufficient efficiency in coding resolution enhancement layers to allow this many layers.
[0131]
Mastering format
  Using a template at or near 2048 × 1024 pixels, it is possible to create a single digital moving image master format source for a variety of release formats. As shown in FIG. 6, a 2k × 1k template can efficiently support the common widescreen aspect ratios of 1.85:1 and 2.35:1. A 2k × 1k template can also accommodate 1.33:1 and other aspect ratios.
[0132]
  Although integers (especially a factor of 2) and simple fractions (3/2 and 4/3) are the most efficient step sizes for resolution layering, arbitrary ratios can also be used to achieve any required resolution layering. However, using a 2048 × 1024 template, or something close to it, provides not only a high-quality digital master format, but also many other convenient resolutions from a factor-of-two base layer (1k × 512), including NTSC, the U.S. television standard.
[0133]
  It is also possible to scan film at higher resolutions, such as 4k × 2k, 4k × 3k, or 4k × 4k. Using optional resolution enhancement, these higher resolutions can be created from a central master format resolution near 2k × 1k. Such enhancement layers for film will consist of image detail, grain, and other sources of noise (e.g., scanner noise). Because of this noisiness, the enhancement layers for these very high resolutions will require alternatives to MPEG-2 types of compression. Fortunately, other compression techniques exist that can compress such noisy signals while still maintaining the desired detail in the image. Examples of such compression techniques are motion-compensated wavelets and motion-compensated fractals.
[0134]
  A digital mastering format should preferably be created at the frame rate of the film (i.e., 24 frames/second) when made from existing movies. The common use of both 3-2 pulldown and interlace would be inappropriate for digital film masters. For new digital electronic material, it is hoped that the use of 60 Hz interlace will cease in the near future and be replaced by frame rates that are more compatible with computers, such as the 72 Hz proposed here. Digital image masters should be made at whatever frame rate the images are captured, whether 72 Hz, 60 Hz, 36 Hz, 37.5 Hz, 75 Hz, 50 Hz, or another rate.
[0135]
  The concept of a mastering format as a single digital source picture format for all electronic release formats differs from existing practice, in which PAL, NTSC, letterbox, pan-and-scan, HDTV, and other masters are all generally made independently from a film original. Using a mastering format allows both film shows and digital/electronic shows to be mastered once for release in a variety of resolutions and formats.
[0136]
Combined resolution enhancement layer and temporal enhancement layer
  As described above, temporal enhancement layering and resolution enhancement layering can be combined. Temporal enhancement is provided by decoding B frames. The resolution enhancement layer also has two temporal layers, and thus also contains B frames.
[0137]
  For 24 fps film, the most efficient and lowest-cost decoders might use only P frames, which minimizes both memory and memory bandwidth and also simplifies the decoder by eliminating B frame decoding. Thus, in accordance with the present invention, a decoder without B frame capability can be used to decode movies at 24 fps and advanced television at 36 fps. B frames can then be used between the P frames, as shown in FIG. 3, to yield the higher temporal layer at 72 Hz, which can be decoded by a second decoder. This second decoder can also be simplified, because it only has to decode B frames.
[0138]
  Such layering also applies to the resolution enhancement layer, which can similarly use only P and I frames for the 24 fps and 36 fps rates. The resolution enhancement layer can add the full temporal rate of 72 Hz at high resolution by adding B frame decoding within that layer.
[0139]
  The combined resolution and temporal scaling options for a decoder are shown in FIG. 10. This example also shows how a data stream of about 18 megabits/second is apportioned to achieve the spatio-temporally layered advanced television of the present invention.
[0140]
  In FIG. 10, a base layer MPEG-2 1024 × 512 pixel data stream (which in the preferred embodiment contains only P frames) is applied to a base resolution decoder 100. About 5 megabits/second of bandwidth is required for the P frames. The base resolution decoder 100 can decode at 24 fps or 36 fps. The output of the base resolution decoder 100 comprises low-resolution, low-frame-rate images (1024 × 512 pixels at 24 Hz or 36 Hz).
[0141]
  The B frames from the same data stream are parsed out and applied to a base resolution temporal enhancement layer decoder 102. About 3 megabits/second of bandwidth is required for these B frames. The output of the base resolution decoder 100 is also coupled to the temporal enhancement layer decoder 102. The temporal enhancement layer decoder 102 can decode at 36 fps. The combined output of the temporal enhancement layer decoder 102 comprises low-resolution, high-frame-rate images (1024 × 512 pixels at 72 Hz).
[0142]
  Also in FIG. 10, a resolution enhancement layer MPEG-2 2k × 1k pixel data stream (which in the preferred embodiment contains only P frames) is applied to a base temporal high resolution enhancement layer decoder 104. About 6 megabits/second of bandwidth is required for the P frames. The output of the base resolution decoder 100 is also coupled to the high resolution enhancement layer decoder 104. The high resolution enhancement layer decoder 104 can decode at 24 fps or 36 fps. The output of the high resolution enhancement layer decoder 104 comprises high-resolution, low-frame-rate images (2k × 1k pixels at 24 Hz or 36 Hz).
[0143]
  The B frames from the same data stream are parsed out and applied to a high resolution temporal enhancement layer decoder 106. About 4 megabits/second of bandwidth is required for these B frames. The output of the high resolution enhancement layer decoder 104 is coupled to the high resolution temporal enhancement layer decoder 106. The output of the temporal enhancement layer decoder 102 is also coupled to the high resolution temporal enhancement layer decoder 106. The high resolution temporal enhancement layer decoder 106 can decode at 36 fps. The combined output of the high resolution temporal enhancement layer decoder 106 comprises high-resolution, high-frame-rate images (2k × 1k pixels at 72 Hz).
[0144]
  It should be noted that the compression ratios achieved through this scalable encoding mechanism are very high, indicating excellent compression efficiency. Table 5 shows these ratios for each of the temporal and scalability options of the example shown in FIG. 10. The ratios are based on source RGB pixels at 24 bits/pixel. (If the 16 bits/pixel of conventional 4:2:2 encoding or the 12 bits/pixel of conventional 4:2:0 encoding is factored in, the compression ratios are 3/4 and 1/2, respectively, of the values shown in Table 5.)
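  As a rough cross-check, compression ratios can be recomputed from the approximate layer bandwidths quoted for FIG. 10 (5, 3, 6, and 4 megabits/second) and 24-bit RGB source pixels. The sketch below is a derivation from those figures under the assumption that 2k × 1k means 2048 × 1024; it is not a reproduction of Table 5, and the layer labels are hypothetical.

```python
# Approximate bandwidths from the FIG. 10 example, in megabits/second.
LAYER_MBITS = {"base P": 5, "base B": 3, "high P": 6, "high B": 4}

# (label, width, height, frame rate in Hz, layers a decoder must receive)
OPTIONS = [
    ("base", 1024, 512, 36, ["base P"]),
    ("base + temporal", 1024, 512, 72, ["base P", "base B"]),
    ("high", 2048, 1024, 36, ["base P", "high P"]),
    ("high + temporal", 2048, 1024, 72, ["base P", "base B", "high P", "high B"]),
]


def data_rate_mbits(layers):
    """Total coded data rate for the layers a decoding option uses."""
    return sum(LAYER_MBITS[name] for name in layers)


def compression_ratio(width, height, fps, layers, bits_per_pixel=24):
    """Raw source bit rate divided by the coded bit rate."""
    raw_bits_per_s = width * height * fps * bits_per_pixel
    return raw_bits_per_s / (data_rate_mbits(layers) * 1e6)


for label, w, h, fps, layers in OPTIONS:
    ratio = compression_ratio(w, h, fps, layers)
    print(f"{label}: {data_rate_mbits(layers)} megabits/second, about {ratio:.0f}:1")
```

  The four layers together sum to the roughly 18 megabits/second stream described above.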
[0145]
[Table 5]

[0146]
  These high compression ratios are made possible by three factors.
(1) The high temporal coherence of images at the high frame rate of 72 Hz;
(2) The high spatial coherence of high-resolution 2k × 1k images;
(3) The application of resolution detail enhancement to the important parts of the image (e.g., the central part) and not to the less important parts (e.g., the frame borders).
[0147]
  These factors are exploited in the hierarchical compression method of the present invention by taking advantage of the strengths of the MPEG-2 encoding syntax. These strengths include bi-directionally interpolated B frames for temporal scalability. The MPEG-2 syntax also provides efficient motion representation through the use of motion vectors in both the base layer and the enhancement layer. Up to some threshold of high noise and rapid image change, MPEG-2 is also efficient at coding detail rather than noise within the enhancement layer, through motion compensation in conjunction with DCT quantization. Above this threshold, the data bandwidth is best allocated to the base layer. When used in accordance with the present invention, these MPEG-2 mechanisms work together to yield highly efficient and effective coding that is both temporally and spatially scalable.
[0148]
  Compared with 5 megabits/second encoding of CCIR 601 digital video, the compression ratios shown in Table 5 are much higher. One reason is the loss of some coherence due to interlace. Interlace negatively affects both the ability to predict subsequent frames and fields and the correlation between vertically adjacent pixels. Thus, a major part of the gain in compression efficiency described here is due to the absence of interlace.
[0149]
  The large compression ratios achieved with the present invention can be considered from the perspective of the number of bits available to encode each MPEG-2 macroblock. As described above, a macroblock is a 16 × 16 pixel grouping of four 8 × 8 DCT blocks with one motion vector for a P frame and one or more motion vectors for a B frame. Table 6 shows the available bits per macroblock for each layer.
[0150]
[Table 6]

[0151]
  The number of bits available for encoding each macroblock is smaller in the enhancement layer than in the base layer. This is appropriate, because it is desirable for the base layer to have as high a quality as possible. The motion vector requires about 8 bits, leaving 10 to 25 bits for the macroblock type code and for the DC and AC coefficients of all four 8 × 8 DCT blocks. This leaves room for only a few "strategic" AC coefficients. Therefore, statistically, most of the information available for each macroblock must come from the previous frame of the enhancement layer.
[0152]
  It is easy to see why MPEG-2 spatial scalability is not useful at these compression ratios: there is not enough data space available to encode enough DC and AC coefficients to represent the high octave of detail carried by the enhancement difference image. This high octave is represented by the fifth through eighth horizontal and vertical AC coefficients. If only a few bits are available per DCT block, these coefficients cannot be reached.
[0153]
  The system described here gains its effectiveness by using motion compensated prediction from the previous enhancement difference frame. This has been shown to be effective in producing excellent results for layered coding in both time and resolution (space).
[0154]
  Graceful degradation. The temporal and resolution scaling methods described here work well for normal-running material at 72 frames/second using a 2k x 1k original source. These methods also work well for film-based material that runs at 24 fps. However, at high frame rates, when a very noise-like image is encoded, or when there are numerous shot cuts in the image stream, the enhancement layer may lose the coherence between frames that is necessary for effective encoding. Such a loss is easily detected, because the buffer fullness/rate control mechanism of a typical MPEG-2 encoder/decoder attempts to set the quantizer to a very coarse setting. When this situation is encountered, all the bits normally used to encode the resolution enhancement layer can be assigned to the base layer, because the base layer needs as many bits as possible to encode stressful material. For example, with a base layer of between about 0.5 and 0.33 megapixels per frame at 72 frames/second, the resulting pixel rate is 24 to 36 megapixels/second. Applying all the available bits to the base layer provides approximately 0.5 to 0.67 million additional bits per frame at 18.5 megabits/second, which should be sufficient to encode very well, even for stressful material.
[0155]
  In more extreme cases, where every frame is very noise-like and/or cuts occur every few frames, further graceful degradation is possible without any loss of base layer resolution. This can be done by removing the B frames that encode the temporal enhancement layer, so that all of the available bandwidth (bits) can be used for the base layer I and P frames at 36 fps. This increases the amount of data available for each base layer frame to between about 1.0 and about 1.5 megabits/frame (depending on the resolution of the base layer). This still yields the fairly good motion rendition rate of 36 fps at the fairly high quality resolution of the base layer, even under extremely stressful coding conditions. However, if the base layer quantizer is still operating at a coarse level at 36 fps under about 18.5 megabits/second, the base layer frame rate can be dynamically reduced to 24 frames/second, 18 frames/second, or even 12 frames/second (making between 1.5 and 4 megabits available for every frame), which should handle even the most pathological moving image types. Methods for changing the frame rate under such circumstances are known in the art.
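  The fallback sequence described in the two preceding paragraphs can be pictured as a ladder that the encoder descends while its quantizer remains coarse. The following is only a minimal sketch under stated assumptions: the names `CodingMode`, `LADDER`, `next_mode` and `base_pixel_rate`, the boolean `quantizer_is_coarse` signal, and the one-rung-at-a-time policy are all hypothetical illustrations and are not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingMode:
    name: str
    base_fps: int                 # frame rate carried by base-layer I and P frames
    temporal_enhancement: bool    # True if B frames raise the display rate to 72 fps
    resolution_enhancement: bool  # True if the resolution enhancement layer is sent


# Ordered from normal operation to the most degraded fallback (hypothetical ladder).
LADDER = [
    CodingMode("full layering", 36, True, True),
    CodingMode("base + temporal enhancement", 36, True, False),
    CodingMode("base only, 36 fps", 36, False, False),
    CodingMode("base only, 24 fps", 24, False, False),
    CodingMode("base only, 18 fps", 18, False, False),
    CodingMode("base only, 12 fps", 12, False, False),
]


def next_mode(index: int, quantizer_is_coarse: bool) -> int:
    """Step one rung down the ladder while the quantizer stays coarse."""
    if quantizer_is_coarse and index < len(LADDER) - 1:
        return index + 1
    return index


def base_pixel_rate(megapixels_per_frame: float, fps: int = 72) -> float:
    """Pixel rate in megapixels/second for a given frame size and frame rate."""
    return megapixels_per_frame * fps
```

  For instance, `base_pixel_rate(0.5)` gives the 36 megapixels/second figure quoted above. The sketch does not model how an encoder would climb back up the ladder once the material becomes easier to code.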
[0156]
  Current proposals for advanced television in the United States do not allow for these graceful degradation methods and therefore cannot perform as well as the system of the present invention on stressful material.
[0157]
  In most MPEG-2 encoders, the adaptive quantization level is controlled by the output buffer fullness. At the high compression ratios associated with the resolution enhancement layer of the present invention, this mechanism may not function optimally. Various methods can be used to optimize the allocation of data to the most appropriate image areas. The conceptually simplest method is to perform a pre-pass of encoding over the resolution enhancement layer to gather statistics and to find the details that must be preserved. The results obtained from the pre-pass can be used to set the adaptive quantization so as to optimize the preservation of detail in the resolution enhancement layer. These settings can also be artificially biased to be non-uniform over the image, so that image detail is preferentially allocated to the main screen area and away from the macroblocks at the extreme edges of the frame.
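  One way to picture the pre-pass idea is a per-macroblock quantizer map that combines a detail score from the pre-pass with a position bias favoring the center of the frame. This is a hedged sketch only: the detail scores in the range 0 to 1, the linear bias, the `edge_penalty` parameter and the quantizer range of 4 to 31 are assumptions chosen for illustration, not values given in the text.

```python
def center_bias(col, row, cols, rows, edge_penalty=0.5):
    """Weight near 1.0 at the frame centre, falling linearly toward the edges."""
    # Normalised distance of the macroblock centre from the frame centre, per axis.
    dx = abs((col + 0.5) / cols - 0.5) * 2.0
    dy = abs((row + 0.5) / rows - 0.5) * 2.0
    return 1.0 - edge_penalty * max(dx, dy)


def quantizer_scales(detail, q_fine=4, q_coarse=31, edge_penalty=0.5):
    """Map pre-pass detail scores (2D list, values in 0..1) to quantizer scales.

    High detail near the centre gets a fine quantizer; low detail or edge
    macroblocks get a coarser one.
    """
    rows, cols = len(detail), len(detail[0])
    scales = []
    for r in range(rows):
        row_scales = []
        for c in range(cols):
            importance = detail[r][c] * center_bias(c, r, cols, rows, edge_penalty)
            q = q_coarse - importance * (q_coarse - q_fine)
            row_scales.append(round(q))
        scales.append(row_scales)
    return scales
```

  With uniform detail, the center macroblock receives the finest quantizer and the corner macroblocks a coarser one, which is the non-uniform bias the paragraph describes.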
[0158]
  Except for leaving an enhancement layer border at high frame rates, none of these adjustments is required, since existing decoders work well without such improvements. However, these further improvements can be obtained with a slight additional effort in the enhancement layer encoder.
[0159]
Conclusion
  Choosing 36 Hz as the new common base temporal rate appears to be optimal. Demonstrations using this frame rate show that it provides a significant improvement over 24 Hz for both 60 Hz and 72 Hz displays. A 36 Hz image can be created by using every other frame from a 72 Hz image capture. This allows a 36 Hz base layer (preferably using P frames) and a 36 Hz temporal enhancement layer (using B frames) to be combined to achieve a 72 Hz display.
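  The every-other-frame relationship between the 72 Hz capture and the two 36 Hz layers can be sketched as follows. Frames are represented by arbitrary Python objects, and the function names are hypothetical; the actual P frame and B frame coding is outside the scope of this illustration.

```python
def split_temporal_layers(frames_72hz):
    """Split a 72 Hz sequence into a 36 Hz base and a 36 Hz enhancement layer."""
    base = frames_72hz[0::2]         # every other frame: base layer
    enhancement = frames_72hz[1::2]  # remaining frames: temporal enhancement layer
    return base, enhancement


def merge_temporal_layers(base, enhancement):
    """Re-interleave the two 36 Hz layers into a 72 Hz display sequence."""
    merged = []
    for i, frame in enumerate(base):
        merged.append(frame)
        if i < len(enhancement):
            merged.append(enhancement[i])
    return merged
```

  A decoder that only handles the base layer simply displays `base` at 36 Hz, while a decoder that handles both layers displays the merged sequence at 72 Hz.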
[0160]
  The 72 Hz “future-looking” rate is not compromised by the method of the present invention, which also provides a transition path for 60 Hz analog NTSC displays. The present invention also permits a transition for other 60 Hz displays, should the other passive-entertainment-only (computer-incompatible) 60 Hz formats under consideration be accepted.
[0161]
  Resolution scalability can be achieved by using a separate MPEG-2 image data stream for the resolution enhancement layer. Resolution scalability can use the B frame method to provide temporal scalability in both the base resolution layer and the enhanced resolution layer.
[0162]
  The invention described herein achieves a number of highly desirable features. Participants in the US advanced television process claim that resolution scalability or temporal scalability cannot be achieved with high definition resolution at about 18.5 megabits per second available in terrestrial broadcasts. However, the present invention achieves both temporal scalability and spatial resolution scalability within this available data rate.
[0163]
  It is also claimed that 2 megapixels at a high frame rate cannot be achieved without the use of interlace within the available data rate of 18.5 megabits/second. However, the present invention not only achieves spatial resolution and temporal scalability, but can also provide 2 megapixels at 72 frames/second.
[0164]
  In addition to providing these capabilities, the present invention is also very robust, especially compared with current proposals for advanced television. This is made possible by assigning most or all of the bits to the base layer when very stressful image material is encountered. Such stressful material is inherently noise-like and/or changes very quickly. Under such conditions, the eye cannot see the detail associated with the resolution enhancement layer. Since the bits are applied to the base layer, the reproduced frames are substantially more accurate than those of currently proposed advanced television systems, which use a single, constant high resolution.
[0165]
  Thus, this aspect of the system of the present invention optimizes both perceptual and coding efficiency and gives the best visual impact. The system provides very clean images at a resolution and frame rate that many have considered impossible. This aspect of the system of the present invention is believed to outperform the advanced television formats proposed to date. In addition to this expected better performance, the present invention also provides the valuable feature of temporal and resolution layering.
[0166]
  In the above discussion, MPEG-2 was used in the examples, but these and other aspects of the invention can be implemented using other compression systems. For example, the present invention works with any comparable standard that provides I, B, and P frames or their equivalents, such as MPEG-1, MPEG-2, MPEG-4, H.263, and other compression systems (including non-DCT systems such as wavelets).
[0167]
Enhanced hierarchical compression
Overview
  Several enhancements of the above embodiment can be made to handle various video image quality and compression issues. Some of these enhancements are described below. Most of them are preferably implemented as a set of tools that can be applied to the tasks of enhancing an image and then compressing it. Content developers can combine these tools in various ways, as desired, to optimize the visual quality and compression efficiency of compressed data streams, especially layered compressed data streams.
[0168]
Enhancement layer motion vectors and gray bias
  The usual method of encoding the resolution enhancement layer using MPEG-type compression (e.g., MPEG-2, MPEG-4, or a comparable system) is to bias the difference picture with a gray bias. Within the normal 8-bit pixel value range from 0 = black to 255 = white, the midpoint of 128 is typically used as the gray bias value. Values lower than 128 represent negative differences between images, and values greater than 128 represent positive differences between images (for a 10-bit system, gray is 512, 0 = black, and 1023 = white, and similarly for other bit ranges).
[0169]
  The difference picture is found by subtracting the expanded decompressed base layer from the original high resolution image. The sequence of these difference pictures is then encoded as an MPEG-type difference picture stream of frames, which operates as a normal MPEG-type picture stream. The gray bias value is removed when each difference picture is added to another image (e.g., the expanded decoded base layer) to produce the improved resolution.
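  The gray bias arithmetic for an 8-bit system can be sketched per pixel as follows. Pictures are represented here as flat lists of integer sample values, and the function names are hypothetical. Clamping to the range 0 to 255 is an assumption of this sketch; it means that differences outside -128 to +127 are clipped.

```python
GRAY_BIAS = 128  # midpoint of the 8-bit range 0 = black .. 255 = white


def clamp8(value):
    """Clamp a sample to the 8-bit range."""
    return max(0, min(255, value))


def encode_difference(original, expanded_base):
    """Gray-biased difference picture: original minus expanded base, plus bias."""
    return [clamp8(o - b + GRAY_BIAS) for o, b in zip(original, expanded_base)]


def apply_difference(expanded_base, difference):
    """Remove the gray bias and add the difference back onto the expanded base."""
    return [clamp8(b + d - GRAY_BIAS) for b, d in zip(expanded_base, difference)]
```

  A difference sample of exactly 128 therefore means "no change", values below 128 darken the expanded base, and values above 128 brighten it.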
[0170]
  Another useful source for noise reduction is information from the previous and next frames (i.e., a temporal median). As described below, motion analysis provides the best match for moving regions. However, motion analysis is compute intensive. If a region of the image is not moving, or is moving slowly, the red value of the current pixel (and likewise the green and blue values) can be median filtered with the red values at the same pixel location in the previous and next frames. However, if there is significant motion and such a temporal filter is used, odd artifacts may occur. Therefore, it is preferable to first apply a threshold to determine whether such a median differs from the current pixel value by more than a selected amount. The thresholding value can be calculated as follows, in substantially the same manner as the de-interlacing threshold.
Rdiff = R_current_pixel - R_temporal_median
Gdiff = G_current_pixel - G_temporal_median
Bdiff = B_current_pixel - B_temporal_median
ThresholdingValue = abs(Rdiff + Gdiff + Bdiff) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)
[0171]
  The thresholding value is then compared with a threshold setting. Typical threshold settings are in the range of 0.1 to 0.3, with 0.2 being common. Above the threshold, the current value is retained. Below the threshold, the temporal median is used.
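  The thresholded temporal median for a single pixel can be sketched as below. The sketch assumes that color components are normalized to the range 0.0 to 1.0, which is an assumption made so that the quoted threshold settings of 0.1 to 0.3 are meaningful; the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def thresholded_temporal_median(prev_px, cur_px, next_px, threshold=0.2):
    """Return the temporal median unless it differs too much from the current pixel.

    Each *_px argument is an (R, G, B) tuple taken at the same pixel location
    in the previous, current and next frames.
    """
    med = tuple(median3(p, c, n) for p, c, n in zip(prev_px, cur_px, next_px))
    r_diff, g_diff, b_diff = (c - m for c, m in zip(cur_px, med))
    value = abs(r_diff + g_diff + b_diff) + abs(r_diff) + abs(g_diff) + abs(b_diff)
    # Above the threshold the current pixel is kept; otherwise the median is used.
    return cur_px if value > threshold else med
```

  In a static region with slight noise the median replaces the current pixel, whereas a large change between frames (likely motion) leaves the current pixel untouched.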
[0172]
  An additional median type is the median taken among the X, Y, and temporal medians. Another median type takes the temporal median and then takes the average of the X and Y medians of that result.
[0173]
  Each type of median can cause problems. The X and Y medians smear and blur the image, so that the image looks “greasy”. The temporal median smears motion over time. Since each median poses a problem and the characteristics of each median are different (in a sense, “orthogonal”), it has been confirmed by experiment that the best results are obtained by combining a variety of medians.
[0174]
  In particular, the preferred combination of medians is a linear weighted sum of the following five terms, which determines the value of each pixel of the current image (see the discussion above regarding linear video processing).
50% of the current image (thus the maximum noise reduction is 3 dB, or 1/2);
15% of the average of the X and Y medians;
10% of the thresholded temporal median;
10% of the average of the X and Y medians of the thresholded temporal median;
and 15% of the three-way median of the X, Y, and temporal medians.
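  The five-term weighted sum listed above can be sketched for one color component of one pixel as follows. The five input terms are assumed to have been computed beforehand; the dictionary keys and the function name are hypothetical labels introduced only for this illustration.

```python
# Weights from the list above; they sum to 1.0.
WEIGHTS = {
    "current": 0.50,
    "avg_xy_median": 0.15,
    "thresholded_temporal_median": 0.10,
    "avg_xy_median_of_thresholded_temporal": 0.10,
    "three_way_median": 0.15,
}


def combine_medians(terms):
    """Linear weighted sum of the five terms for one pixel component.

    `terms` maps each key in WEIGHTS to the corresponding precomputed value.
    """
    return sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)
```

  Because the current image always contributes half of the result, the filtered pixel can never move more than halfway toward the median terms, which corresponds to the stated 3 dB limit on noise reduction.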
[0175]
  In this way, the base layer can represent an image extent that is narrower or shorter (or both) than the high resolution enhanced image. The enhancement layer then contains both gray-biased difference pictures corresponding to the extent of the expanded decompressed base layer (i.e., the expanded base region 1102) and actual picture. Since the compressed enhancement layer is encoded as a standard MPEG-type picture stream, the fact that the edge region is actual picture and the inner region is a difference picture is not distinguished, and both are encoded and carried together in the same picture stream of frames.
[0176]
  In the preferred embodiment, the edge region outside the extent of the expanded decompressed base layer is a normal high resolution MPEG-type encoded stream. The edge region has an efficiency corresponding to normal MPEG-type encoding of a high resolution picture. However, since it is an edge region, the motion vectors in the difference picture region must be constrained not to point into the border region (which contains actual picture information). Likewise, the motion vectors of the border actual picture region must be constrained not to point into the inner difference picture region. In this way, the encoding of the border actual picture region and the encoding of the difference picture region are naturally separated.
[0177]
  The above can be accomplished by finding all the motion vectors in the original image while constraining the motion vectors not to cross the boundary between the inner difference picture region and the outer border actual picture region. This is best done if macroblock boundaries fall on the boundary between the inner difference picture region and the outer border actual picture region. Otherwise, if the edge between the difference picture and the actual picture border lies in the middle of a macroblock, additional bits must be used during encoding to achieve the transition between the difference picture region and the actual picture border. Therefore, maximum efficiency is obtained when the macroblock boundary coincides with the edge between the inner difference picture region and the outer border actual picture region.
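  The motion vector constraint can be sketched as a simple geometric test on the reference block. The sketch assumes that the inner difference picture region is an axis-aligned rectangle `(left, top, right, bottom)` in pixels, aligned to macroblock boundaries, and it ignores the outer frame limits; all function names are hypothetical.

```python
MB = 16  # macroblock size in pixels


def block_inside(x, y, rect):
    """True if the 16x16 block at (x, y) lies wholly inside rect."""
    left, top, right, bottom = rect
    return left <= x and top <= y and x + MB <= right and y + MB <= bottom


def block_outside(x, y, rect):
    """True if the 16x16 block at (x, y) lies wholly outside rect."""
    left, top, right, bottom = rect
    return x + MB <= left or x >= right or y + MB <= top or y >= bottom


def vector_allowed(mb_x, mb_y, dx, dy, inner):
    """Check that motion vector (dx, dy) keeps the reference block in the same region."""
    ref_x, ref_y = mb_x + dx, mb_y + dy
    if block_inside(mb_x, mb_y, inner):
        return block_inside(ref_x, ref_y, inner)
    if block_outside(mb_x, mb_y, inner):
        return block_outside(ref_x, ref_y, inner)
    # The macroblock straddles the boundary; this sketch assumes aligned regions.
    return False
```

  An encoder using such a test would discard candidate vectors for which `vector_allowed` returns `False` during its motion search, so that difference picture macroblocks never predict from actual picture content and vice versa.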
[0178]
  It should be noted that the quantizer and the rate buffer control used when encoding these hybrid difference-plus-actual-picture, image-expanding enhancement layer pictures may require special adjustment to account for the larger signal extent in the border actual picture region compared with the inner difference picture region.
[0179]
  There is a trade-off in using this method with respect to the extent of the border actual picture region. When the border extension is small, the number of bits is small in proportion to the total stream, but the relative efficiency for the small area decreases because of the number of motion vector matches that cannot be found, since the matching region lies beyond the edge of the border region. Another way to look at this is that the border region has a high ratio of edge to area, unlike a normal image rectangle, which has a much lower ratio of edge to area. The inner rectangular picture region typical of normal digital video, as usually encoded by compression such as MPEG-2 or MPEG-4, has a high degree of matching when motion vectors are found, because most of the area in the frame, except for a small area at the edge of the frame, is usually present in the previous frame. For example, during a pan, the edge where the picture enters the screen must create picture from nothing, since that picture comes from off-screen in each frame. However, since most of a normal picture rectangle is on-screen in the previous frame, the motion vectors most often find matches.
[0180]
  However, with the border extension method of the present invention, the border region has a much higher proportion of off-screen mismatch in the previous frame during motion compensation. This is because both the outer edge of the screen and the inner edge of the difference picture are “out of bounds” for motion vectors. Thus, some loss of efficiency is inherent in this method when considered as bits per image area (or bits per pixel or bits per macroblock, which are equivalent bits-per-area measures). When the border region is relatively small, this relative inefficiency is a sufficiently small part of the total bit rate to be acceptable. Similarly, when the border is relatively large, the efficiency becomes higher, and that part is again acceptable. A medium-sized border suffers some inefficiency during pans, but this inefficiency may be acceptable.
[0181]
  One way in which efficiency can be regained with this technique is to use a simpler ratio of base layer resolution to enhancement layer resolution for the narrower base layer, e.g., 3/2, 4/3, and especially an exact factor of 2. Using a factor of 2 helps to obtain significant efficiency, especially in the overall encoding using the base layer and the resolution enhancement layer.
[0182]
  Also, low resolution images can be most naturally used on narrower screens, while high resolution images can be viewed more naturally on larger, wider and / or higher screens.
[0183]
  It is also possible to continuously move or reposition the inner difference picture region, corresponding to performing a “pan and scan” on the base layer resolution image. The border then has a dynamically changing position, size, and shape. Macroblock alignment is usually lost with continuous panning, but can be maintained by carefully aligning cuts within the larger area. However, the simplest and most efficient structure is a fixed-position, centered alignment of the inner difference picture with respect to the base layer on exact macroblock boundaries.
[0184]
Image filtering
Downsizing filter and upsizing filter
  Experiments have shown that the downsizing filter used in creating the base layer from the high-resolution original picture is optimal when it includes moderate negative lobes and has an extent that stops after the very small first positive lobes that follow those negative lobes. FIG. 12 is a diagram of the relative shape, amplitude, and lobe polarity of the preferred downsizing filter. This down filter is essentially a center-weighted function truncated to a central positive lobe 1200, a symmetric pair of adjacent (bracketing) small negative lobes 1202, and a symmetric pair of adjacent (bracketing) very small outer positive lobes 1204. The absolute amplitudes of these lobes 1200, 1202, 1204 can be adjusted as desired, as long as the relative polarities and the amplitude inequality relationships shown in FIG. 12 are maintained. However, a good first approximation of the relative amplitudes is defined by a truncated sinc function [sinc(x) = sin(x)/x]. Such filters can be used separably, which means that the horizontal data dimension is independently filtered and resized and then the vertical data dimension is processed similarly, or vice versa, with the same result.
[0185]
  When creating a base layer original (as input to base layer compression) from a low noise, high resolution original input, the preferred downsizing filter has first negative lobes with the normal sinc function amplitude. For clean, high resolution input images, this normal truncated sinc function works well. For lower resolutions (e.g., 1280 × 720, 1024 × 768, or 1536 × 768) and for noisier input pictures, it is more optimal to reduce the amplitude of the first negative lobes of the filter. A suitable amplitude in such cases is about 1/2 of the negative lobe amplitude of the truncated sinc function. The small first positive lobes outside the first negative lobes are also reduced, generally to 1/2 to 2/3 of the normal sinc function amplitude. The effect of reducing the first negative lobes is the main issue, because the small outer positive lobes do not contribute to picture noise. Additional samples outside the first positive lobes are preferably truncated to minimize ringing and other potential artifacts.
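  A truncated sinc kernel with adjustable lobe amplitudes can be sketched as follows. This is an illustration under stated assumptions: the sinc is sampled at integer tap positions for a given integer downsizing factor, the kernel is cut at the third zero crossing (the end of the first outer positive lobe), the taps are normalized to unit gain, and image edges are handled by clamping. The names `truncated_sinc_taps`, `neg_scale`, `outer_scale` and `downsize_row` are hypothetical.

```python
import math


def truncated_sinc_taps(factor=2, neg_scale=1.0, outer_scale=1.0):
    """Taps of a sinc kernel truncated after the first outer positive lobe.

    neg_scale scales the first negative lobes (about 0.5 for noisy input);
    outer_scale scales the small outer positive lobes (about 0.5 to 0.67).
    """
    limit = 3 * factor  # third zero crossing of sinc(pi * n / factor)
    taps = []
    for n in range(-limit + 1, limit):
        x = math.pi * n / factor
        value = 1.0 if n == 0 else math.sin(x) / x
        lobe = abs(n) // factor  # 0 = central, 1 = first negative, 2 = outer positive
        if lobe == 1:
            value *= neg_scale
        elif lobe == 2:
            value *= outer_scale
        taps.append(value)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unit DC gain


def downsize_row(row, taps, factor=2):
    """Filter and decimate one row; apply to rows, then columns (separable use)."""
    half = len(taps) // 2
    out = []
    for center in range(0, len(row), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            i = min(max(center + k - half, 0), len(row) - 1)  # clamp at the edges
            acc += t * row[i]
        out.append(acc)
    return out
```

  Calling `truncated_sinc_taps(neg_scale=0.5, outer_scale=0.5)` gives the "milder" variant suggested for noisier or lower resolution input, while the defaults give the normal truncated sinc.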
[0186]
  The choice of whether to use a milder negative lobe of the downfilter or a negative lobe of full sinc function amplitude depends on the resolution of the original image and the level of noise. The selection is somewhat a function of the image content, since several types of scenes are easier to encode than other scenes (mainly related to the magnitude of motion and changes in specific shots). By using a “milder” downfilter with reduced negative lobes, base layer noise is reduced and cleaner and quieter compression of the base layer is achieved, resulting in fewer artifacts.
[0187]
  Experiments have also shown that the optimal upsizing filter has a central positive lobe with small adjacent negative lobes but no further positive lobes. FIGS. 13A and 13B are diagrams of the relative shape, amplitude, and lobe polarity of a preferred pair of upsizing filters for upsizing by a factor of 2. The central positive lobe 1300, 1300' is bracketed by a pair of small negative lobes 1302, 1302'. An asymmetrically placed positive lobe 1304, 1304' is also required. These paired up-filters can also be regarded as truncated sinc filters centered on the newly created samples. For example, for a factor of 2 up-filter, two new samples are created for each original sample. The small adjacent negative lobes 1302, 1302' have a smaller negative amplitude than is used in the corresponding downsizing filter (FIG. 12) or than would be used in an optimal (sinc-based) upsizing filter for normal images. This is because the images being upsized have been decompressed, and the compression process changes the spectral distribution. Therefore, more moderate negative lobes, with no additional positive lobes beyond the central positive lobe 1300, 1300', work better for upsizing the decompressed base layer.
[0188]
  Experiments have shown that slight negative lobes 1302, 1302' provide better layering results than positive-only Gaussian or spline up-filters (it should be noted that spline up-filters can have negative lobes but are most often used in a positive-only form). This upsizing filter is therefore preferably used for the base layer in both the encoder and the decoder.
[0189]
High octave weighting of picture detail
  In a preferred embodiment, the signal path that expands the original uncompressed base layer input image uses a Gaussian up-filter instead of the up-filter described above. In particular, the Gaussian up-filter is used for the “high octave” of picture detail, which is determined by subtracting the expanded original base resolution input image (without using compression) from the original picture. Therefore, no negative lobes are used for this particular up-filtered expansion.
[0190]
  As mentioned above, in the case of MPEG-2 this high octave difference signal path is generally weighted by 0.25 (i.e., 25%) and added to the expanded decompressed base layer (which uses the other up-filter described above) as input to the enhancement layer compression process. However, experiments have shown that weights of 10%, 15%, 20%, 30%, and 35% are useful for certain images when using MPEG-2. Other weights may also prove useful. For MPEG-4, a filter weight of 4 to 8% has been found to be optimal when used with the other improvements described below. This weighting should therefore be regarded as an adjustable parameter that depends on the encoding system, the scene to be encoded/compressed, the particular camera (or film) used, and the resolution of the image.
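  The weighted high octave term can be sketched per pixel as below. Pictures are represented as flat lists of sample values, and the Gaussian-expanded base image is assumed to be supplied by the caller. The table of default weights is a hypothetical convenience: 0.25 is the typical MPEG-2 value from the text, while 0.06 is merely an assumed midpoint of the 4 to 8% range quoted for MPEG-4. How the weighted term is combined with the rest of the enhancement layer input is not modeled here.

```python
# Hypothetical default weights; treat these as starting points for tuning.
DEFAULT_HIGH_OCTAVE_WEIGHT = {"MPEG-2": 0.25, "MPEG-4": 0.06}


def weighted_high_octave(original, gaussian_expanded_base, weight):
    """Weighted high octave of detail, per pixel.

    The high octave is the original picture minus the Gaussian-expanded
    original (uncompressed) base-resolution image.
    """
    return [weight * (o - e) for o, e in zip(original, gaussian_expanded_base)]
```

  Since the text describes the weight as an adjustable parameter, a practical encoder would likely expose it per scene or per source rather than fixing it per codec.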
[0191]
Enhanced de-interlacing and noise reduction
Overview
  Experiments have shown that many de-interlacing algorithms and devices depend on the human eye to integrate fields in order to produce an acceptable result. However, since a compression algorithm is not a human eye, any integration of de-interlaced fields must take into account the characteristics of such an algorithm. Without such careful de-interlaced integration, the compression process produces high levels of noise artifacts, which both wastes bits (hindering compression) and makes the image look noisy and busy. The distinction between de-interlacing for viewing (e.g., with line-doublers and line-quadruplers) and de-interlacing as input to compression has led to the techniques described below. In particular, the de-interlacing methods described below are useful as input to the layered MPEG-type compression described above as well as to single layer non-interlaced MPEG-type compression.
[0192]
  Furthermore, noise reduction must likewise meet the needs of being an input to the compression algorithm, rather than merely reducing the appearance of noise. The goal is generally to reproduce, upon decompression, no more noise than that of the original camera or film grain. Equal noise is generally considered acceptable after compression/decompression. Reduced noise, with sharpness and clarity equivalent to the original, is a bonus. The noise reduction described below achieves these goals.
[0193]
  In addition, for very noisy shots, such as those from high-speed film or with high camera sensitivity settings, usually in low light, noise reduction can be the difference between a good-looking compressed/decompressed image and one that is unwatchably noisy. The compression process greatly amplifies noise that exceeds some threshold of acceptability for the compressor. Therefore, noise reduction preprocessing to keep the noise below this threshold may be required to obtain results of acceptably good quality.
[0194]
De-graining filters and noise reduction filters
  Experiments have found that applying de-graining filtering and/or noise reduction filtering prior to layered or non-layered coding improves the performance of the compression system. De-graining or noise reduction before compression is most effective for grainy or noisy images, but both can be helpful, when used in moderation, even for relatively low noise or low grain pictures. Any of a number of known de-graining or noise reduction algorithms can be applied. Examples are “coring”, simple adjacent-pixel median filters, and softening filters.
[0195]
  Whether noise reduction is necessary depends on how noisy the original image is. In the case of an interlaced original image, the interlace itself is a form of noise, and the original image usually requires additional noise reduction filtering in addition to the complex de-interlacing process described below. In the case of a progressive scan (non-interlaced) camera or film image, noise processing is effective for performing layered and non-layered compression when noise is present above a certain level.
[0196]
  There are different types of noise. For example, a video transfer from film contains film grain noise. Film grain noise is caused by silver grains that couple to the yellow, cyan, and magenta film dyes. Yellow affects both red and green, cyan affects both blue and green, and magenta affects both red and blue. Red is formed where yellow and magenta dye crystals overlap. Similarly, green is the overlap of yellow and cyan, and blue is the overlap of magenta and cyan. Thus, the noise between colors is partially correlated through the dyes and grains shared between color pairs. In addition, when multiple grains overlap in all three colors, as they do in the dark areas of a print or in the light areas of the image on a negative (dark on the negative), additional color combinations occur. This correlation between colors can be used to reduce film grain noise, but doing so is a complex process. In addition, many different film types are in use, and each type differs in grain size, shape, and statistical distribution.
[0197]
  For video images produced by CCD sensor cameras and other (e.g., tube) sensor cameras, the red, green, and blue noise is uncorrelated. In this case, it is best to process the red, green, and blue records independently. Thus, red noise is reduced by processing red by itself, independently of the green and blue noise, and the same method applies to the green and blue noise.
[0198]
  Thus, the noise handling is best matched to the characteristics of the noise source itself. In the case of composite images (from multiple sources), the noise may have different characteristics in different parts of the image. In this case, generic noise processing is the only option when noise processing is required.
[0199]
  It has also been found useful in some cases to perform “re-graining” or “re-noising” as a creative effect after decoding the compressed layered data stream. This is because some de-grained or de-noised images may be “too clean” or “too sterile” in appearance. Re-graining and/or re-noising is a relatively easy effect to apply at the decoder using any of a number of known algorithms. For example, it can be accomplished by adding low-pass filtered random noise of appropriate amplitude.
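  The closing example of adding low-pass filtered random noise can be sketched for one row of samples as follows. The choice of uniform white noise, the 3-tap [0.25, 0.5, 0.25] low-pass filter, the `amplitude` parameter and the function name `renoise` are all assumptions of this sketch; the text leaves the algorithm open.

```python
import random


def renoise(row, amplitude=2.0, seed=None):
    """Add low-pass filtered random noise of a chosen amplitude to one row.

    The noise added to each sample stays within +/- amplitude.
    """
    rng = random.Random(seed)
    # Two extra white-noise samples so the 3-tap filter covers every pixel.
    white = [rng.uniform(-1.0, 1.0) for _ in range(len(row) + 2)]
    smooth = [
        0.25 * white[i] + 0.5 * white[i + 1] + 0.25 * white[i + 2]
        for i in range(len(row))
    ]
    return [p + amplitude * s for p, s in zip(row, smooth)]
```

  A decoder would apply such a step after reconstruction, typically to each row of each color record, and would clamp the result to the valid sample range.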
[0200]
De-interlacing before compression
  As noted above, the preferred method of compressing an interlaced source that is ultimately intended for non-interlaced display includes a step of de-interlacing the interlaced source prior to the compression step. De-interlacing the signal after decoding in the receiver (where the signal has been compressed in interlaced mode) is both more costly and less efficient than de-interlacing before compression and then sending a non-interlaced compressed signal. The non-interlaced compressed signal may be layered or non-layered (i.e., normal single layer compression).
[0201]
  Experiments have shown that filtering a single field of an interlaced source and then using that field as if it were a non-interlaced full frame gives poor, noisy compression results. Therefore, using a single-field de-interlacer before compression is not a good approach. Instead, experiments have shown that a three-field-frame de-interlacer method, which applies weights of [0.25, 0.5, 0.25] to the previous, current, and next field-synthesized frames (“field frames”), respectively, provides excellent input for compression. The combination of the three field frames can be performed using other weights (although these weights are optimal) to create a de-interlaced input to the compression process.
[0202]
  In the preferred de-interlacing system, a field de-interlacer is used as the first step in the overall process to create field frames. In particular, each field is de-interlaced to create a synthesized frame in which the total number of lines is derived from the half number of lines in the field. Thus, for example, an interlaced 1080-line image has 540 lines in each even and odd field, each field representing 1/60 second. Normally, the even and odd fields of 540 lines are interlaced to produce 1080 lines for each frame, and the frame represents 1/30 second. However, in the preferred embodiment, the de-interlacer copies each scan line without modification from a specified field (e.g., an odd field) into a buffer that holds part of the de-interlaced result. The remaining intermediate scan lines for the frame (the even scan lines in this example) are synthesized by adding half of the field line above and half of the field line below each newly stored line. For example, each pixel value of line 2 of a frame is 1/2 of the sum of the corresponding pixel values from line 1 and line 3. The intermediate synthesized scan lines may be created on the fly or calculated after all the scan lines from one field have been stored in the buffer. The same process is repeated for the next field, but with the field type (i.e., even, odd) reversed.
[0203]
  FIG. 14A is a block diagram of an odd-field de-interlacer. It shows that the odd lines from the odd field 1400 are simply copied to the de-interlaced odd field 1402, while the even lines of the de-interlaced odd field 1402 are created by averaging adjacent odd lines from the original odd field. Similarly, FIG. 14B is a block diagram of an even-field de-interlacer. It shows that the even lines from the even field 1404 are simply copied to the de-interlaced even field 1406, while the odd lines of the de-interlaced even field 1406 are created by averaging adjacent even lines from the original even field. Note that this case corresponds to "top field first"; with "bottom field first", the bottom field can be regarded as the "even" field.
[0204]
  As a next step, a sequence of these de-interlaced fields is used as input to the three-field-frame de-interlacer to create the final de-interlaced frame. FIG. 15 is a block diagram showing how each output frame pixel is made up of 25% of the corresponding pixel from the previous de-interlaced field (field-frame) 1502, 50% of the corresponding pixel from the current field-frame 1504, and 25% of the corresponding pixel from the next field-frame 1506.
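The two steps just described can be sketched as follows. This is a minimal illustration, not the implementation of the preferred embodiment: the function names `deinterlace_field` and `three_field_frame` are hypothetical, rows are indexed from 0 (so the "odd" lines of the text are the even indices of a top field), the edge line that has only one neighbor is filled by replicating that neighbor (an assumption; the text does not address edges), and pixel values are assumed to be already linearized.

```python
def deinterlace_field(field_lines, top_field=True):
    """Expand one field (a list of rows) into a full-height field-frame."""
    height = 2 * len(field_lines)
    frame = [None] * height
    offset = 0 if top_field else 1
    # Copy the real scan lines of the field without modification.
    for i, row in enumerate(field_lines):
        frame[2 * i + offset] = list(row)
    # Synthesize each missing line as 1/2 the line above plus 1/2 the line below.
    for y in range(height):
        if frame[y] is not None:
            continue
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        # Assumption: at the picture edge, reuse the single available neighbor.
        if above is None:
            above = below
        if below is None:
            below = above
        frame[y] = [0.5 * a + 0.5 * b for a, b in zip(above, below)]
    return frame


def three_field_frame(prev_ff, cur_ff, next_ff, weights=(0.25, 0.5, 0.25)):
    """Weighted sum of the previous, current and next field-frames."""
    wp, wc, wn = weights
    return [
        [wp * p + wc * c + wn * n for p, c, n in zip(pr, cr, nr)]
        for pr, cr, nr in zip(prev_ff, cur_ff, next_ff)
    ]
```

In use, consecutive fields would be passed to `deinterlace_field` with alternating `top_field` values, and each run of three resulting field-frames would be passed to `three_field_frame`.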
[0205]
  The new de-interlaced frame then has far fewer interlace difference artifacts between frames than the three field-frames from which it is constructed. However, adding the previous and next field-frames to the current field-frame introduces temporal smearing. This temporal smearing is usually acceptable, especially in view of the de-interlacing improvement that results.
[0206]
  This de-interlacing method is very useful as an input to compression, whether single-layer (non-layered) or layered. It is also useful, independently of compression, as a treatment of interlaced video for presentation, viewing or the production of still frames. Pictures from this de-interlacing method appear "cleaner" than either the interlaced material shown directly or the individual de-interlaced fields.
[0207]
Deinterlace thresholding
  The three-field de-interlace sum weighting of [0.25, 0.5, 0.25] discussed above provides a stable image, but moving parts of a scene can sometimes become soft or exhibit aliasing artifacts. To counter this, a threshold test can be applied that compares the result of the [0.25, 0.5, 0.25] temporal filter against the corresponding pixel values of the central field-frame alone. If a central field-frame pixel value differs from the corresponding pixel value of the three-field-frame temporal filter by more than a specified threshold amount, only the central field-frame pixel value is used. Thus, a pixel from the three-field-frame temporal filter is selected where it differs from the corresponding pixel of the single de-interlaced central field-frame by less than the threshold amount, and the central field-frame pixel value is used where the difference exceeds the threshold. This allows fast motion to be tracked at the field rate, while the smoother parts of the image are filtered and smoothed by the three-field-frame temporal filter. This combination has proved to be an effective, if not optimal, input for compression. It is also very effective for de-interlacing image material for direct viewing (also called line doubling in conjunction with display).
[0208]
  In a preferred embodiment of this threshold determination, the following equations are applied to the corresponding RGB color values of the central (single) de-interlaced field-frame image and of the three-field-frame de-interlaced image:
Rdiff = R_single_field_de-interlaced minus R_3_field_de-interlaced
Gdiff = G_single_field_de-interlaced minus G_3_field_de-interlaced
Bdiff = B_single_field_de-interlaced minus B_3_field_de-interlaced
Threshold processing value = abs (Rdiff + Gdiff + Bdiff) + abs (Rdiff) + abs (Gdiff) + abs (Bdiff)
[0209]
  Next, the threshold processing value is compared with a threshold setting value. Typical threshold settings are in the range of 0.1 to 0.3, with 0.2 being the most common.
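The threshold test above can be sketched as a per-pixel switch. This is a minimal illustration under stated assumptions: the function names are hypothetical, pixels are (R, G, B) tuples, and the pixel values are assumed to be normalized to the range 0 to 1 (the text gives threshold settings of 0.1 to 0.3 but does not state the pixel scale).

```python
def thresholding_value(single_rgb, three_rgb):
    """Threshold processing value for one pixel, per the equations above."""
    rdiff = single_rgb[0] - three_rgb[0]
    gdiff = single_rgb[1] - three_rgb[1]
    bdiff = single_rgb[2] - three_rgb[2]
    return abs(rdiff + gdiff + bdiff) + abs(rdiff) + abs(gdiff) + abs(bdiff)


def select_pixel(single_rgb, three_rgb, threshold=0.2):
    """Use the single field-frame pixel where the difference is large,
    otherwise the three-field-frame temporally filtered pixel."""
    if thresholding_value(single_rgb, three_rgb) > threshold:
        return single_rgb
    return three_rgb
```

For example, a pixel that differs only slightly between the two pictures keeps the temporally filtered value, whereas a pixel on a fast-moving edge falls back to the single field-frame value.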
[0210]
  To remove noise from this threshold test, the three-field-frame and single-field-frame de-interlaced pictures can be smooth-filtered before they are compared and thresholded. This smooth filtering can be achieved simply by down-filtering (e.g., by a factor of two, preferably using the normal down-filter) and then up-filtering (e.g., by a factor of two using a Gaussian up-filter). This "down-up" smoothing filter can be applied to both the single-field-frame de-interlaced picture and the three-field-frame de-interlaced picture. The smoothed single-field-frame and three-field-frame pictures are then compared to compute the threshold processing value, which is thresholded to determine which picture is the source of each final output pixel.
[0211]
  In particular, the threshold test is used as a switch that selects between the single-field-frame de-interlaced picture and the three-field-frame temporal filter combination of single-field-frame de-interlaced pictures. This selection yields an image whose pixels come from the three-field-frame de-interlacer in regions where that image differs only slightly (i.e., by less than the threshold) from the single-field-frame image, and from the single-field-frame image in regions where the three-field-frame pixels differ from the single-field-frame de-interlaced pixels (after smoothing) by more than the threshold.
[0212]
  This method has proved effective at preserving single-field fast-motion detail (by switching to the single-field-frame de-interlaced pixels) while smoothing large portions of the image (by switching to the three-field-frame de-interlaced temporal filter combination).
[0213]
  In addition to selecting between the single-field-frame de-interlaced image and the three-field-frame de-interlaced image, it is often beneficial to add a little of the single-field-frame image to the three-field-frame smoothed picture, so that some of the immediacy of the single-field picture is maintained across the whole image. This immediacy is balanced against the temporal smoothness of the three-field-frame filter. A typical blend creates a new frame by adding 33.33% (1/3) of the single central field-frame to 66.67% (2/3) of the corresponding three-field-frame smoothed image. This can be done before or after threshold switching, because the result is the same either way and only the smoothed three-field-frame picture is affected. Note that this is effectively equivalent to using a different set of three-field-frame weights in place of the original [0.25, 0.5, 0.25]. Computing 2/3 of [0.25, 0.5, 0.25] plus 1/3 of [0, 1, 0] gives [0.1667, 0.6666, 0.1667] as the three-field-frame temporal filter. The more heavily weighted central (current) field-frame adds immediacy even in smoothed regions that fall below the threshold. This combination has proved effective in balancing temporal smoothness with immediacy in the de-interlacing of moving parts of a scene.
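The equivalence between the 1/3 and 2/3 blend and a single re-weighted three-field-frame filter can be checked with exact fractions. The function name `blended_weights` is hypothetical and the snippet is only a worked check of the arithmetic.

```python
from fractions import Fraction


def blended_weights(three_field_weights, single_share):
    """Effective three-field-frame weights after mixing in a share of the
    central field-frame, whose own weights are (0, 1, 0)."""
    center_only = (0, 1, 0)
    return [
        (1 - single_share) * w + single_share * c
        for w, c in zip(three_field_weights, center_only)
    ]


base = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
effective = blended_weights(base, Fraction(1, 3))
# effective is [1/6, 2/3, 1/6], i.e. about [0.1667, 0.6667, 0.1667]
```

Because the blend is itself a linear weighted sum, the resulting weights still sum to one, so overall brightness is unchanged.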
[0214]
Using linear filters
  Sums, filters and matrix operations applied to video pictures must take into account that the pixel values in video are non-linear signals. For example, the HDTV video curve may vary somewhat in its coefficients and factors, but a typical formula is the international CCIR XA-11 curve (now called Rec. 709):
V = 1.0993 * L^0.45 - 0.0993 when L > 0.018051
V = 4.5 * L when L ≦ 0.018051
where V is the video value and L is the linear light luminance.
[0215]
  Variations of this curve slightly adjust the threshold (0.018051), the factor (4.5) (e.g., to 4.0) and the exponent (0.45) (e.g., to 0.4). The basic formula, however, remains the same.
[0216]
  Matrix operations such as conversion between RGB and YUV also imply linear values. Because MPEG generally uses non-linear video values as if they were linear, leakage occurs between the luminance (Y) and the color values (U and V). This leakage reduces compression efficiency. Using a logarithmic representation, such as film density units, corrects much of this problem. The various types of MPEG encoding are neutral to the non-linear aspects of the signal, but their efficiency is affected by the use of the matrix conversion between RGB and YUV. For YUV (U = R−Y, V = B−Y), Y must be computed as a linearized sum of 0.59 G plus 0.29 R plus 0.12 B (or slight variations of these coefficients). In logarithmic space, however, U (= R−Y) is equivalent to R/Y, which is orthogonal to luminance. Therefore, a shaded orange ball does not vary the U (= R−Y) parameter in a logarithmic representation. The change in brightness is represented entirely in the luminance parameter, where full detail is provided.
[0217]
  The linear versus logarithmic versus video issue has a strong impact on filtering. The key point is that small signal variations (e.g., 10% or less) come out approximately correct when a non-linear video signal is processed as if it were a linear signal. This is because a piecewise linear approximation to the smooth video-to/from-linear conversion curve is reasonable. For large variations, however, a linear filter is much more effective and gives much better image quality. Therefore, when large variations are to be optimally encoded, transformed or otherwise processed, it is desirable first to convert the non-linear signal to a linear signal so that a linear filter can be applied.
[0218]
  De-interlacing is therefore much better when each filtering and summing step converts to linear values before filtering or summing. This is because large signal variations are inherent in interlaced signals at small details of the image. After filtering, the image signal is converted back to the non-linear video digital representation. Therefore, the three-field-frame weighting (e.g., [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667]) must be performed on the linearized video signal. Other filtering and weighted sums of partial terms in noise filtering and de-interlace filtering must also be converted to linear form for computation. Which operations warrant linear processing is determined by the signal variation and the type of filtering. Because image sharpening is self-proportional, it can be computed appropriately in a video or logarithmic non-linear representation. However, matrix processing, spatial filtering, weighted sums and de-interlace processing must be computed using linearized digital values.
[0219]
  As a simple example, the single-field-frame de-interlacer described above computes each missing alternate line by averaging the lines above and below it. This averaging is much more correct, both numerically and visually, when it is performed linearly. Thus, instead of summing 0.5 times the line above and 0.5 times the line below directly, the digital values are first linearized, then averaged, and then converted back to the non-linear video representation.
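The linearize, average, re-encode sequence of this example can be sketched as follows. The sketch assumes the Rec. 709 style curve with coefficients 1.0993, 0.0993 and 4.5, exponent 0.45 and breakpoint 0.018051, and video values normalized to the range 0 to 1; the function names are hypothetical.

```python
def video_from_linear(l):
    """Non-linear video value V for linear light luminance L."""
    if l > 0.018051:
        return 1.0993 * l ** 0.45 - 0.0993
    return 4.5 * l


def linear_from_video(v):
    """Inverse of video_from_linear."""
    if v > 4.5 * 0.018051:
        return ((v + 0.0993) / 1.0993) ** (1.0 / 0.45)
    return v / 4.5


def average_lines_linear(above, below):
    """Synthesize a missing scan line by averaging in linear light."""
    return [
        video_from_linear(0.5 * linear_from_video(a) + 0.5 * linear_from_video(b))
        for a, b in zip(above, below)
    ]
```

For a large excursion, such as a black line (0.0) next to a white line (1.0), the linear-light average re-encodes to a video value of roughly 0.71 rather than the 0.5 obtained by averaging the video values directly; for identical or nearly identical lines the two approaches agree closely, consistent with the small-signal remark above.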
[0220]
Layered mode based on a 2/3 base layer
  A 1280 × 720 enhancement layer can utilize an 864 × 480 base layer (i.e., a 2/3 relationship between the enhancement layer and the base layer). FIG. 16 is a block diagram of such a mode. The 1280 × 720 original image 1600 is padded to 1296 × 720 (so that it is an integer multiple of 16) and then downsized by a factor of 2/3 to an 864 × 480 image 1602 (again an integer multiple of 16). The downsizing preferably uses a normal filter or a filter with mild negative lobes. As described above, this downsized image 1602 can be input to a first encoder 1604 (e.g., an MPEG-2 or MPEG-4 encoder) and encoded directly as the base layer.
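The padding arithmetic can be illustrated with a short calculation. The sketch below reflects one reading of the text, namely that the padded dimension and the 2/3-scaled dimension must both be whole multiples of 16; the function name `padded_size` and its parameters are hypothetical.

```python
def padded_size(size, num=2, den=3, block=16):
    """Smallest padded dimension >= size such that both it and its
    num/den scaled counterpart are whole multiples of `block`.

    Returns (padded enhancement-layer size, base-layer size)."""
    padded = -(-size // block) * block  # round up to a multiple of block
    while (padded * num) % den or (padded * num // den) % block:
        padded += block
    return padded, padded * num // den


width = padded_size(1280)   # (1296, 864)
height = padded_size(720)   # (720, 480)
```

Under this reading, 1280 must grow to 1296 because 1280 × 2/3 is not an integer, whereas 720 already scales to 480 and needs no padding.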
[0221]
  To encode the enhancement layer, the base layer is upsized (expanded and up-filtered) by a factor of 3/2 to a 1296 × 720 intermediate frame 1606. The up-filter preferably has mild negative lobes. This intermediate frame 1606 is subtracted from the original image 1600. At the same time, the 864 × 480 image 1602 is up-filtered by a factor of 3/2 (preferably using a Gaussian filter) to 1280 × 720 and subtracted from the original image 1600. The result is weighted (e.g., by 25% for MPEG-2) and then added to the result of subtracting the intermediate frame 1606 from the original image 1600. The resulting sum is cropped to a reduced size (e.g., 1152 × 688) and its edges are feathered to give a pre-compression enhancement layer frame 1608. The pre-compression enhancement layer frame 1608 is input to a second encoder 1610 (e.g., an MPEG-2 or MPEG-4 encoder) and encoded as the enhancement layer.
[0222]
  At 18.5 megabits/second, efficiency and quality are about the same for a "single" layer (i.e., non-layered) system and for a layered system using this arrangement. The efficiency of the 2/3 relationship between the enhancement layer and the base layer is not as good as that of a factor of two, because the DCT coefficients of the base layer and the enhancement layer are less orthogonal. However, this structure is practical and has the advantage of providing a high-quality base layer that is cheaper to decode. This is an improvement over a single-layer arrangement, in which the entire high-resolution picture must be decoded (at higher cost) even when a particular display can provide only low resolution.
[0223]
  In addition, the layered arrangement has the advantage that the enhancement sub-region is adjustable. Efficiency can thus be controlled by adjusting the size of the enhancement layer and the proportion of the total bit rate that is allocated to the base layer versus the enhancement layer. The enhancement layer size and bit-rate proportion can be adjusted to optimize compression performance, especially under high stress (fast motion or many scene changes). For example, as described above, all bits can be allocated to the base layer under extreme stress.
[0224]
  Convenient resolution relationships between the enhancement layer and the base layer are factors of 1/2 and 2/3 and other simple fractions (e.g., 1/3, 3/4). It is also useful to apply a squeeze to the relationship between the enhancement layer and the base layer. For example, a 2048 × 1024 source picture may have a 1536 × 512 base layer, which has a 3/4 horizontal relationship and a 1/2 vertical relationship to the source image. While this is not optimal (a factor of 2 is optimal both horizontally and vertically), it illustrates the principle. For some resolutions, the use of 2/3 both horizontally and vertically can be improved upon by using a factor of 2 vertically and a factor of 2/3 horizontally. Alternatively, for some resolutions it is more optimal to use a factor of 2/3 vertically and a factor of 1/2 horizontally. Thus, simple fractions such as 1/2, 2/3, 3/4 and 1/3 can be applied independently to the horizontal and vertical resolution relationships, allowing many possible combinations. Full flexibility in the use of such fractional relationships is available both for the relationship of the full input resolution to the base layer resolution and for the relationship of the enhancement layer to the base layer and to the input resolution. Particularly useful combinations of such resolution relationships can be assigned a compression "enhancement mode" number if adopted as part of any standard.
[0225]
Median filter
  The most useful filter for noise processing is the median filter. A three-element median filter ranks its three entries by a simple sort and picks the middle one. For example, an X (horizontal) median filter examines the red value (or green or blue value) of three adjacent horizontal pixels and picks the one with the middle value. If two of the values are the same, that value is chosen. Similarly, a Y filter examines the scan lines above and below the current pixel and again picks the middle value.
[0226]
  Experiments have confirmed that it is useful to average the results of applying both the X and the Y median filters to produce a new noise-reduced component picture (i.e., each new pixel is the equal, 50% average of the X and Y medians of the corresponding pixel of the original image).
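The averaged X and Y median can be sketched for one color component as follows. The function names are hypothetical, the image is a list of rows of numeric values, and pixels on the picture border reuse themselves in place of the missing neighbor (an assumption; the text does not specify border handling).

```python
def median3(a, b, c):
    """Middle value of three entries, found by a simple sort."""
    return sorted((a, b, c))[1]


def xy_median_average(img):
    """Equal average of the horizontal and vertical three-element medians."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            cur = img[y][x]
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            up = img[max(y - 1, 0)][x]
            down = img[min(y + 1, h - 1)][x]
            mx = median3(left, cur, right)   # X (horizontal) median
            my = median3(up, cur, down)      # Y (vertical) median
            out[y][x] = 0.5 * mx + 0.5 * my
    return out
```

Applied to a flat region containing one isolated outlier pixel, the filter replaces the outlier with the surrounding value and leaves the neighbors unchanged.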
[0227]
  In addition to the median values of X and Y (horizontal and vertical), other median values such as diagonal median values can be employed. However, since the vertical and horizontal pixel values are physically closest to any particular pixel, they are less likely to cause errors or distortion than the diagonal median value. However, such other medians can still be used when it is difficult to reduce noise by using only the vertical and horizontal medians.
[0228]
  Another useful source of noise reduction is information from the previous and next frames (i.e., a temporal median). As described below, motion analysis provides the best match for moving regions. However, motion analysis is compute intensive. If a region of the image is not moving, or is moving slowly, the red value of the current pixel (and likewise the green and blue values) can be median filtered with the red values at the same pixel location in the previous and next frames. However, if there is significant motion and such a temporal filter is used, odd artifacts may occur. It is therefore preferable first to apply a threshold test to determine whether such a median differs from the current pixel value by more than a selected amount. The threshold processing value can be calculated in substantially the same manner as for the de-interlacing threshold:
Rdiff = R_current_pixel minus R_temporal_median
Gdiff = G_current_pixel minus G_temporal_median
Bdiff = B_current_pixel minus B_temporal_median
Threshold processing value = abs (Rdiff + Gdiff + Bdiff) + abs (Rdiff) + abs (Gdiff) + abs (Bdiff)
[0229]
  The threshold processing value is then compared with a threshold setting value. Typical threshold setting values are in the range of 0.1 to 0.3, with 0.2 being common. If the value is above the threshold, the current pixel value is retained. If it is below the threshold, the temporal median is used.
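The thresholded temporal median can be sketched per pixel as follows. The function name is hypothetical, pixels are (R, G, B) tuples taken from the same location in the previous, current and next frames, and values are assumed to be normalized to the range 0 to 1.

```python
def thresholded_temporal_median(prev_rgb, cur_rgb, next_rgb, threshold=0.2):
    """Return the per-channel temporal median unless it differs too much
    from the current pixel, in which case the current pixel is kept."""
    median = tuple(sorted(channel)[1]
                   for channel in zip(prev_rgb, cur_rgb, next_rgb))
    diffs = [c - m for c, m in zip(cur_rgb, median)]
    value = abs(sum(diffs)) + sum(abs(d) for d in diffs)
    return cur_rgb if value > threshold else median
```

A slowly varying pixel is replaced by its temporal median, while a pixel that changes sharply in the current frame, as happens under significant motion, is left untouched.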
[0230]
  An additional median type is the median of the X, Y and temporal medians. Another median type takes the temporal median and then takes the equal average of the X and Y medians of that result.
[0231]
  Each type of median can cause problems. The X and Y medians smear and blur the image, so that it looks "greasy". The temporal median smears motion over time. Because each median poses a problem, yet the characteristics of each are different (in a sense, "orthogonal"), experiments have confirmed that the best results are obtained by combining the various medians.
[0232]
  In particular, a preferred combination of medians is a linear weighted sum of the following five terms, which determines the value of each pixel of the current image (see the discussion above regarding linear video processing):
50% of the current image (thus the maximum noise reduction is 3 dB, or 1/2);
15% of the average of the X and Y medians;
10% of the thresholded temporal median;
10% of the average of the X and Y medians of the thresholded temporal median; and
15% of the three-way median of the X, Y and temporal medians.
[0233]
  This combination of medians does a reasonable job of reducing image noise without making the image appear "greasy" or blurry, and without causing temporal smearing or loss of detail on moving objects. Another useful weighting of these five terms is 35%, 20%, 22.5%, 10% and 12.5%, respectively.
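The five-term weighted sum can be sketched for a single pixel value as follows. The five terms are assumed to have been computed already (and, per the discussion of linear processing above, to be linearized values); the names `PREFERRED_WEIGHTS`, `ALTERNATE_WEIGHTS` and `combine_terms` are hypothetical.

```python
# Term order: current image, average of X and Y medians, thresholded
# temporal median, average of X and Y medians of the thresholded temporal
# median, three-way median of the X, Y and temporal medians.
PREFERRED_WEIGHTS = (0.50, 0.15, 0.10, 0.10, 0.15)
ALTERNATE_WEIGHTS = (0.35, 0.20, 0.225, 0.10, 0.125)


def combine_terms(terms, weights=PREFERRED_WEIGHTS):
    """Linear weighted sum of the five noise-reduction terms for one pixel."""
    if len(terms) != len(weights):
        raise ValueError("expected one term per weight")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 to preserve brightness")
    return sum(w * t for w, t in zip(weights, terms))
```

Both weightings quoted in the text sum to 100%, so a region in which all five terms agree is passed through unchanged.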
[0234]
  In addition, it is useful to apply motion compensation by applying a center-weighted temporal filter to a motion-compensated n×n region, as described below. The result can be added to the median-filtered image result (of the above five terms) to smooth the image further, providing better smoothing and detail in moving image regions.
[0235]
Motion analysis
  In addition to "in-place" temporal filtering (which works well for smoothing slowly moving details), de-interlacing and noise reduction can also be improved by using motion analysis. Adding pixels at the same position in three fields or three frames is valid for stationary objects. For moving objects, however, if temporal averaging or smoothing is desired, it is often more optimal to try to analyze the prevailing motion over a small group of pixels. For example, an n×n block of pixels (e.g., 2 × 2, 3 × 3, 4 × 4, 6 × 6 or 8 × 8) can be used to search the previous and next fields or frames for a match (in the same way that MPEG-2 motion vectors are found by matching 16 × 16 macroblocks). Once the best match is found in one or more previous and next frames, a "trajectory" and a "moving mini-picture" can be determined. In the case of interlaced fields, it is best both to analyze the comparisons and to compute the inferred moving mini-pictures using the result of the thresholded de-interlacing process described above. Because that process has already separated fast-moving details from slow-moving details and has already smoothed the slow-moving details, picture comparison and reconstruction apply better to its output than to individual de-interlaced fields.
[0236]
  Motion analysis is preferably performed by comparing an n×n block of the current thresholded de-interlaced image with all nearby blocks in one or more previous and next frames. The comparison may be the absolute value of the luminance or RGB differences over the n×n block. One frame in each of the forward and backward directions is sufficient if the two motion vectors are nearly equal and opposite. However, if the motion vectors are not nearly equal and opposite, one or two additional forward and backward frames can help determine the actual trajectory. Furthermore, different de-interlacing treatments are useful in helping to determine the "best guess" motion vectors in the forward and backward directions. One de-interlacing treatment uses only individual de-interlaced fields, but this is very prone to aliasing and artifacts on small moving details. Another uses only the three-field-frame smooth de-interlacing, with the weighting [0.25, 0.5, 0.25] described above and without thresholding. Details are smoothed and sometimes lost, but the trajectory is often more accurate.
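The block comparison can be sketched as an exhaustive search that minimizes the sum of absolute differences. This is a minimal illustration on a single component (e.g., luminance) held as a list of rows; the function names, the default block size and the small search range are hypothetical choices, and candidate positions that fall outside the reference frame are simply skipped.

```python
def block_sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def best_match(cur, ref, cx, cy, n=4, search=2):
    """Search ref around (cx, cy); return (min_sad, dx, dy) or None."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block lies outside the frame
            cost = block_sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def equal_and_opposite(backward, forward, tol=1):
    """True if the backward and forward vectors nearly cancel."""
    return (abs(backward[0] + forward[0]) <= tol
            and abs(backward[1] + forward[1]) <= tol)
```

Running `best_match` once against the previous frame and once against the next frame yields the two vectors that `equal_and_opposite` compares; when the test fails, further frames can be searched in the same way.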
[0237]
  Once a trajectory is found, a "smoothed n×n block" can be created by temporal filtering, using the motion-vector-offset pixels from one (or more) previous and next frames. A typical filter is again [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667] for three frames, or possibly [0.1, 0.2, 0.4, 0.2, 0.1] for two frames backward and two frames forward. Other filters with less center weight are also useful, especially with smaller block sizes (e.g., 2 × 2, 3 × 3 and 4 × 4). The reliability of the match between frames is indicated by the absolute difference value. A large minimum absolute difference can be used to select more center weight for the filter. A small absolute difference suggests a good match, and can be used to select less center weight so that the average is distributed more evenly over a span of several frames of motion-compensated blocks.
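The choice of center weight from match reliability can be sketched as follows. The text does not specify how the absolute difference maps to a filter, so the single cutoff `sad_limit` and the function name `smooth_block` are purely hypothetical; the two weight sets are the three-frame filters quoted above, and the previous and next blocks are assumed to have been fetched already at their motion-vector offsets.

```python
def smooth_block(prev_blk, cur_blk, next_blk, min_sad, sad_limit=8.0):
    """Temporally filter three motion-aligned n x n blocks.

    A poor match (large minimum absolute difference) selects more center
    weight; a good match selects the more even weighting."""
    if min_sad > sad_limit:
        weights = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)
    else:
        weights = (0.25, 0.5, 0.25)
    wp, wc, wn = weights
    return [
        [wp * p + wc * c + wn * n for p, c, n in zip(pr, cr, nr)]
        for pr, cr, nr in zip(prev_blk, cur_blk, next_blk)
    ]
```

A practical system would more likely vary the center weight gradually with the difference value and with block size; the hard switch here is only meant to show the direction of the adjustment.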
[0238]
  These filter weights can be applied to: the individual de-interlaced motion-compensated field-frames; the thresholded three-field-frame de-interlaced pictures described above; and the non-thresholded three-field-frame de-interlaced images with the weighting [0.25, 0.5, 0.25], also described above. However, the best filter weights usually come from applying motion-compensated block linear filtering to the thresholded three-field-frame result. This is because the thresholded three-field-frame image is both the smoothest (in that aliasing is removed in smooth regions) and the most motion responsive (in that it defaults to the single de-interlaced field-frame above the threshold). Thus, the motion vectors obtained from motion analysis can be used as input to a multi-frame filter, a multi-field-frame de-interlaced filter, a single-field-frame de-interlaced filter, or a combination of these. In most cases, however, the thresholded multi-field-frame de-interlaced image forms the best filter input.
[0239]
  Motion analysis is computationally expensive when fast motion may be present (e.g., ±32 pixels), because the search region is then large. It is therefore best to increase speed by using dedicated hardware or a computer assisted by a digital signal processor.
[0240]
  Once motion vectors have been found, together with their absolute difference measure of accuracy, they can be used in the complex process of attempting frame rate conversion. However, occlusion problems (objects that obscure or reveal others) confound the matching and cannot be inferred accurately and automatically. Occlusion can also involve temporal aliasing, as can normal temporal undersampling of the image and its beat with natural image frequencies (e.g., the "backward wagon wheel" effect in movies). These problems often cannot be resolved by any known computational technique and, to date, require human assistance. Thus, where real-time automated processing is not required, human scrutiny and adjustment can be used for off-line, non-real-time frame rate conversion and other similar temporal processes.
[0241]
  De-interlacing is a simple form of the same problem. As with frame rate conversion, the de-interlacing task is theoretically impossible to perform perfectly. This is due in particular to temporal undersampling (closed shutter) and an inappropriate temporal sample filter (i.e., a box filter). Even with correct samples, however, problems such as occlusion and interlace aliasing further ensure that correct results are theoretically impossible. The cases in which this is visible are mitigated by the depth of the tools described here that are applied to the problem. Pathological cases will always exist in real image sequences. The goal can only be to reduce the frequency and severity of impairment when such sequences are encountered. In many cases, however, acceptable de-interlacing can be fully automated and can run unassisted in real time. Nevertheless, there are many parameters that often benefit from manual adjustment.
[0242]
Filter smoothing of high frequencies
  In addition to median filtering, reducing high-frequency detail also reduces high-frequency noise. However, this smoothing comes at the cost of a loss of sharpness and detail. Therefore, only a very small amount of such smoothing is generally useful. A smoothing filter can easily be created, as for the de-interlacing threshold, by down-filtering with a normal filter (e.g., a truncated sinc filter) and then up-filtering with a Gaussian filter. The result is smoothed because it lacks high-frequency picture detail. If such a term is added, it must typically be in a very small amount, e.g., 5-10%, to give a small amount of noise reduction. In larger amounts, the blurring effect generally becomes quite visible.
[0243]
Base layer noise filtering
  The filter parameters of the median filtering described above for the original image must be matched to the noise characteristics of the film grain or image sensor that captured the image. After this median-filtered image has been down-filtered to produce the input to the base layer compression process, it still contains a small amount of noise. This noise can be reduced further by a combination of another X-Y median filter (an equal average of the X and Y medians) and a very small amount of the high-frequency smoothing filter. The preferred filter weights for these three terms, applied to each pixel of the base layer, are as follows:
70% of the original base layer (down-filtered from the median-filtered original image);
22.5% of the average of the X and Y medians; and
7.5% of the down-up smoothing filter.
[0244]
  This small amount of additional filtering in the base layer reduces noise by a small amount and improves stability, resulting in better MPEG encoding and limiting the amount of noise added by such encoding.
[0245]
Filter with negative lobes for motion compensation in MPEG-2 and MPEG-4
  In MPEG-4, reference filters are provided for shifting macroblocks when the best motion vector match is sought, and the matched region is then used for motion compensation. MPEG-4 video encoding, like MPEG-2, supports 1/2-pixel resolution of motion vectors for macroblocks. Unlike MPEG-2, MPEG-4 also supports 1/4-pixel accuracy. However, the filters used in the MPEG-4 reference implementation are sub-optimal. In MPEG-2, the half-way point between pixels is simply the average of the two adjacent pixels, which is a sub-optimal box filter. In MPEG-4, this filter is used for 1/2-pixel resolution. When 1/4-pixel resolution is invoked in MPEG-4 version 2, a filter with negative lobes is used for the half-way point, but a sub-optimal box filter applied to this result and the adjacent pixels is used for the 1/4 and 3/4 points.
[0246]
  In addition, the chrominance channels (U = R−Y and V = B−Y) do not use any sub-pixel resolution in the motion compensation step under MPEG-4. Because the luminance channel (Y) has a resolution of 1/2 or 1/4 pixel, the half-resolution U and V chrominance channels must be sampled using a filter with 1/4-pixel resolution, which corresponds to 1/2 pixel in luminance. When 1/4-pixel resolution is selected for luminance, 1/8-pixel resolution must be used for the U and V chrominance.
[0247]
  Experiments have shown that the effect of filtering is significantly improved by using a negative-lobe truncated sinc function to filter the 1/4, 1/2 and 3/4 pixel points when 1/4-pixel resolution is used for luminance (as above), and by using similar negative lobes in the filter that creates the 1/2-pixel position when 1/2-pixel resolution is used.
[0248]
  Likewise, the effect of filtering is significantly improved by using a negative-lobe truncated sinc function to filter the 1/8-pixel points for the U and V chrominance when 1/4-pixel luminance resolution is used, and by using 1/4-pixel resolution filters with similar negative lobes when 1/2-pixel luminance resolution is used.
[0249]
  It has been discovered that combining 1/4-pixel motion vectors with truncated-sinc motion-compensated displacement filtering greatly improves picture quality. In particular, cleanliness is improved, noise and artifacts are reduced, and chroma detail is increased.
[0250]
  These filters can be applied to video images by MPEG-1, MPEG-2, MPEG-4, or other suitable motion compensation block-based image coding system.
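A negative-lobe truncated sinc displacement filter of the kind discussed in this section can be sketched in one dimension as follows. The tap count (four taps, `half_width=2`), the normalization to unit sum and the function names are assumptions made for illustration; they are not the filter of any MPEG reference implementation, and the text does not give specific coefficients.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def displacement_taps(frac, half_width=2):
    """Taps for sampling at a fractional offset `frac` (0 <= frac < 1)
    to the right of a pixel, from a sinc truncated to 2*half_width taps."""
    offsets = range(-half_width + 1, half_width + 1)
    raw = [sinc(k - frac) for k in offsets]
    total = sum(raw)
    return [t / total for t in raw]  # normalize so flat areas are preserved


def sample(row, index, frac, half_width=2):
    """Sub-pixel sample of a 1-D row at position index + frac."""
    taps = displacement_taps(frac, half_width)
    acc = 0.0
    for tap, k in zip(taps, range(-half_width + 1, half_width + 1)):
        pos = min(max(index + k, 0), len(row) - 1)  # clamp at the edges
        acc += tap * row[pos]
    return acc
```

At the half-pixel position the four normalized taps are [-0.25, 0.75, 0.75, -0.25], in contrast to the [0.5, 0.5] box filter; the same function yields taps for the 1/4, 3/4 and 1/8 positions by changing `frac`.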
[0251]
Imaging system characterization and correction
  Experiments have confirmed that, when working with a specific progressive-scan (non-interlaced) camera, it is highly desirable to apply pre-processing specific to that camera before compression (layered or non-layered). For example, in one camera type there is a mechanical horizontal misalignment of 1/3 pixel between the red and green sensors and another 1/3 pixel between the green and blue sensors (2/3 pixel between red and blue). This causes color fringes around small vertical details. These color fringes are invisible to the eye in the original image, but they produce undesirable color noise that becomes very visible in the compression/decompression process. A pre-process specific to this one camera type corrects the color displacement and provides an input for compression that is free of the color artifact. Thus, although invisible to the eye, such small nuances of the camera and its sensor characteristics are crucial to the acceptability and quality of the final compressed/decompressed result.
[0252]
  It is therefore useful to distinguish between "what the eye sees" and "what the compressor sees". This distinction has been used to advantage in discovering pre-processing steps that greatly improve the quality of the compressed/decompressed image.
[0253]
  Thus, each individual electronic camera, each camera type, each film type, and each individual film scanner and scanner type used to create the input to the compression/decompression system must be individually characterized in terms of color alignment and noise (electronic noise for video cameras and scanners, grain for film). The information about where the image was created, a table of the specific properties, and the specific settings of each piece of equipment must be carried with the original image and then used in the pre-processing performed before compression.
[0254]
  For example, a particular camera may require color realignment. A particular camera may also be set to a medium noise setting (which substantially affects the amount of noise processing required). These camera settings and the camera's inherent characteristics must be carried as auxiliary information along with each shot from that camera. This information can then be used to control the type of pre-processing and the settings of its parameters.
[0255]
  For images that are edited from multiple cameras, or combined from multiple cameras and/or film sources, the pre-processing should probably be performed before such editing or combination. Such pre-processing should not degrade image quality; its effect is invisible to the eye but has a significant impact on the quality of the compression.
[0256]
  The general method for performing and using such characterization for non-film imaging systems (e.g., electronic cameras and film scanners) used to produce images for input to a particular compression system is as follows.
(1) Create an image of a resolution test chart, then measure the horizontal and vertical color alignment of the pixel sensors (the grain, in the case of film) by color pair (e.g., RG, RB, GB), preferably expressed in units of pixels.
(2) Create images of one or more monochrome test charts, then measure the noise generated by the sensors, individually and as a set, preferably expressed as red, green, and blue pixel values (e.g., by creating images of a white card, a black card, 50% and 18% gray cards, and red, green, and blue reference cards). Determine whether the noise is correlated by comparing it with output variations in the other color channels and in adjacent pixels.
(3) Carry the measured information produced by the precisely calibrated device along with the image (e.g., by electronic transmission, by storage on a machine-readable medium, or as human-readable data associated with the image).
(4) Before using an image from the imaging system in the compression process, translate the pixels of each color by an equal and opposite offset to correct the measured misalignment. For example, if the red sensor is misaligned 0.25 pixel below the blue sensor, all red pixels in the image must be shifted upward by 0.25 pixel. Similarly, based on the measured amount of noise, adjust the weights of the noise reduction filter by an amount that compensates for the measured noise (this amount must be determined empirically and defined in a lookup table that is computed or built by hand).
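The color realignment of step (4) can be sketched as follows. This is a minimal example that assumes the red plane is a list of rows of floating-point values and that linear interpolation between neighboring rows is an acceptable resampling filter; the function name `shift_plane_up` is hypothetical, and a negative-lobe filter could be substituted for the linear one.

```python
def shift_plane_up(plane, offset):
    """Shift a color plane upward by `offset` pixels, with 0 <= offset < 1.

    Each output row y samples the input at row y + offset, so image content
    moves up. Linear interpolation between the two neighboring rows is an
    assumed filter choice; the last row is repeated at the bottom edge.
    """
    height = len(plane)
    shifted = []
    for y in range(height):
        below = plane[min(y + 1, height - 1)]
        shifted.append([(1.0 - offset) * a + offset * b
                        for a, b in zip(plane[y], below)])
    return shifted


# Red sensor measured 0.25 pixel below the blue sensor: move red up by 0.25.
red = [[0.0, 0.0], [4.0, 4.0], [8.0, 8.0]]
red_aligned = shift_plane_up(red, 0.25)
```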
[0257]
  The general method for performing and using such characterization for film imaging systems used to produce images for input to a particular compression system is as follows.
(1) Determine the film type (grain varies with film type).
(2) Expose the film to one or more monochrome test charts under various lighting conditions (noise is partly a function of exposure).
(3) Scan the film with a film scanner at normal speed (the film scanner's own characteristics having been measured as described above), and measure the noise generated by the sensors, individually and as a set. Determine whether the noise is correlated.
(4) Whenever film of the same type is exposed and scanned with the precisely calibrated scanner, carry the determined and measured information (i.e., film type, exposure conditions, scanning characteristics) along with the scanned film image.
(5) Before using such an image in the compression process, adjust the weights of the noise reduction filter by an amount that compensates for the measured noise (this amount must be determined empirically and defined in a lookup table that is built by hand or, preferably, computed; the adjustment is a function of at least three factors: film type, exposure conditions, and scanning characteristics).
[0258]
Enhanced 3-2 pulldown system
  Transferring film to 60 Hz video using the 3-2 pulldown method described above is a practice that is generally strongly disliked. The 3-2 pulldown method is used for existing NTSC (and some proposed HDTV) systems because 24 frames/second does not divide evenly into 59.94 or 60 fields/second. Odd frames (or even frames) are placed on two interlaced fields, and even frames (or odd frames) are placed on three interlaced fields. Thus, one field in every five is a duplicate, and every two frames of film map to five fields of video. As mentioned above, this process causes numerous unpleasant problems.
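The field layout produced by 3-2 pulldown can be illustrated with a short Python sketch. Frames are treated as opaque labels, and the representation of a field as a `(frame, parity)` pair and the name `pulldown_3_2` are assumptions made for this example.

```python
def pulldown_3_2(frames, start_with_three=True):
    """Expand film frames into video fields with a fixed 3-2 cadence.

    Returns a list of (frame, parity) pairs, where parity 0 is a top field
    and parity 1 is a bottom field. Parity alternates on every field, so
    the third field of a three-field frame duplicates its first field.
    """
    fields = []
    parity = 0
    for i, frame in enumerate(frames):
        repeats = 3 if (i % 2 == 0) == start_with_three else 2
        for _ in range(repeats):
            fields.append((frame, parity))
            parity ^= 1
    return fields


# Four film frames become ten fields: A A A B B C C C D D.
fields = pulldown_3_2(["A", "B", "C", "D"])
```

At this ratio, 24 film frames expand to 60 fields, which is why the method suits 60 Hz interlaced video.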
[0259]
  Most video processing devices apply their processing only to the intermediate signal. In that case, a time-varying effect acts on one field differently from the next, even though some of the input fields are duplicates. After such processing, those fields are no longer duplicates, and the field pairs cannot be recombined to recover the original film frames. Examples of such processing performed at the field rate include pan-and-scan (moving a narrow 4:3 video window horizontally across a widescreen image to show the important action), fade-up or fade-down, gradual color adjustment, and scrolling video title overlays. Furthermore, when such a signal is captured on film and then edited and processed on video, the frame processing of the film and the field processing of the video become strongly intermixed. When such a video signal (which is very common) is then sent to an image compression system, the system generally operates suboptimally.
[0260]
  Experiments to date have shown that the best image compression from a film source occurs only when the 24 fps images of the film can be completely re-extracted from the video signal (or better, when the material never leaves the 24 fps domain). The compression system can then encode the movie (or film-based TV show or TV commercial) at the original 24 fps rate of the film. This is the most efficient way to compress it. Some movie-on-demand systems and DVD mastering systems apply 3-2 pulldown carefully and edit in a very restricted way, to guarantee that the original 24 fps frames can finally be extracted and compressed at 24 fps.
[0261]
  However, such care is “open loop” and is often defeated by ordinary human error. The complexity of applying editing and post-production effects during production often results in “mistakes” in which field-rate processing occurs. A preferred method that avoids this possibility, and eliminates the complexity of trying to track everything to prevent such errors, is therefore as follows.
(1) Use film processing equipment that supports direct 24 fps storage, processing, or communication whenever possible.
(2) Store all film images at their native 24 fps rate, using electronic media or high-speed optical media (e.g., hard drives and/or RAM) for local storage.
(3) Whenever a device accepts 3-2 pulldown video as input, create the 3-2 pulldown on the fly (in real time) from local storage (which is maintained at 24 fps).
(4) When storing the output of a device that generates or transmits 3-2 pulldown images, undo the 3-2 pulldown on the fly and store the result at 24 fps again.
(5) Remove from the system every device that can operate only on fields and therefore cannot handle frames as they are normally stored (as a single frame rather than as two or three fields).
(6) Set all software that operates on or edits the stored image sequences to the 24 fps mode used by the storage medium; do not use software that cannot operate in native 24 fps mode.
(7) If the telecine does not provide direct 24 fps output, perform all original telecine transfers (i.e., from film to video) with a deterministic cadence (i.e., always 3 then 2, or always 2 then 3). Undo the interlaced 3-2 pulldown immediately after the interface from the telecine.
(8) If a tape with an unknown 3-2 pulldown cadence is received, the cadence must be found in some way and removed before the material is stored. This can be done with a hardware detection system, a software detection system, or manually/visually. Unfortunately, hardware detection systems are not perfect, so manual, visual inspection is always necessary. (Current systems attempt to detect field misalignment. Such misalignment cannot currently be detected on black or white frames or on any field of constant brightness. Even where misalignment is detectable, some detectors fail because of noise or weak algorithms.)
(9) Store any tape output from the facility that requires 3-2 pulldown purely in a known cadence that is maintained, undisturbed, for the entire running time of the program.
[0262]
  A particular processing device that requires 3-2 pulldown on its inputs and outputs obtains its single or multiple inputs, created on the fly in real time from a 24 fps source, by the method described above. The cadence always starts in a standard way for each input. The device's output cadence is then known, and must be the same as the cadence generated on the fly for the device's input. The cadence is then undone using this a priori knowledge, and the frames are stored in the 24 fps format of the storage medium.
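Because the cadence is known in advance, undoing it requires no detection step. The sketch below assumes fields arrive as `(payload, parity)` pairs with parity 0 for a top field and 1 for a bottom field, in a cadence that starts in a known way; the name `cancel_pulldown` and this data layout are hypothetical.

```python
def cancel_pulldown(fields, start_with_three=True):
    """Recover film frames from fields laid out in a known 3-2 cadence.

    Each frame occupies three or two consecutive fields, alternating. The
    first two fields of each group form the frame; a third field, when
    present, is a duplicate and is dropped. Returns (top, bottom) pairs.
    """
    frames = []
    pos = 0
    i = 0
    while pos < len(fields):
        repeats = 3 if (i % 2 == 0) == start_with_three else 2
        group = fields[pos:pos + repeats]
        if len(group) < 2:
            raise ValueError("truncated cadence: a frame needs two fields")
        # Order the pair as (top field, bottom field) by parity.
        frames.append(tuple(sorted(group[:2], key=lambda f: f[1])))
        pos += repeats
        i += 1
    return frames


fields = [("A", 0), ("A", 1), ("A", 0), ("B", 1), ("B", 0),
          ("C", 1), ("C", 0), ("C", 1), ("D", 0), ("D", 1)]
film_frames = cancel_pulldown(fields)
```

This only works while the cadence is undisturbed, which is why the steps above insist on a deterministic cadence throughout the facility.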
[0263]
  This method requires real-time 3-2 pulldown cancellation and 3-2 pulldown synthesis. Unless the cadence comes from a tape of unknown format, the 24 fps nature of the frames is automatically preserved by such a film-based telecine post-production system. The system then forms an optimal input to a compression system (including the layered compression process described above).
[0264]
  This process will be widely useful in video and HDTV telecine facilities. Someday, when all devices accept native 24 fps (and other progressive-scan rate) modes for signal input, output, processing, and storage, such a method will no longer be necessary. In the meantime, however, many devices require 3-2 pulldown on their internal and external interfaces (inputs and outputs), even when their intended function is to operate on film input. During this period, the above method eliminates the 3-2 pulldown problem and can therefore become an essential element of efficient film post-production and telecine work.
[0265]
Frame rate methods for production
  Although 24 fps is the worldwide standard for motion picture film, its use often results in jerky motion (“stutter”, caused by the frame being flashed repeatedly before the next movement). Higher frame rates are desirable both to provide smoother motion, that is, a clearer picture of moving objects, and to allow slow motion (by capturing images at a high frame rate and playing them back at a lower rate). As noted above, the US video rate of 60 fps (and the 59.94 fps of broadcast video) is relatively incompatible with 24 fps. This causes problems when trying to release a single movie worldwide, because the 50 Hz PAL and SECAM video systems are in turn relatively incompatible with 60 fps NTSC video and with 60 Hz-centered US HDTV.
[0266]
  U.S. Patent Application No. 09/435,277 (filed on November 5, 1999, entitled “System And Method For Motion Compensation and Frame Rate Conversion” and assigned to the assignee of the present invention) teaches techniques that can perform difficult frame rate conversions, such as between 60 Hz and 50 Hz and between 60 Hz and 72 Hz. These techniques also de-interlace in addition to performing frame rate conversion.
[0267]
  The frame rate conversion method taught in the above application is very successful at converting between nearby high frame rates, such as between 60 Hz and 50 Hz or between 60 Hz and 72 Hz (the results look quite good), although its computational cost is high. However, conversion between 24 Hz and 60 Hz using motion analysis has proven to be quite difficult. At 24 fps, frames differ considerably from one another (as in the cockpit scenes from the movie “Top Gun”), especially in the amount of motion blur in each frame. This makes motion analysis, and the subsequent frame rate conversion, difficult from a 24 fps source. Furthermore, motion blur cannot be removed, so even where motion analysis is possible for high-motion 24 fps scenes, the images remain blurred (although they move more smoothly and stutter less). Since motion analysis requires matching parts of the image, it is almost impossible to match frames whose amount of motion blur differs significantly from that of adjacent frames. Thus, 24 fps source material from film (or an electronic camera) is a poor starting point for frame rate conversion to 50 Hz or 60 Hz video.
[0268]
  This leads to the conclusion that a high frame rate electronic camera is a much better image source than a 24 fps electronic camera. However, given the difficulty of converting from 60 fps video back to 24 fps film, 72 fps is a much better camera frame rate for ultimate 24 fps compatibility.
[0269]
  Experiments show that excellent-quality 24 fps moving images can be derived from 72 fps frames by using a very simple weighted frame filter. The best weighting for the three consecutive frames (previous, current, and next) of a 72 fps source that yield one 24 fps frame is centered on [0.1667, 0.6666, 0.1667]. However, any three-frame weighting in the range [0.1, 0.8, 0.1] to [0.25, 0.5, 0.25] appears to work well. The emphasis is on the center frame, whose short motion blur preserves the clarity of a single frame, while the adjacent frames contribute the blur needed to smooth out the stutter of 24 fps motion (by simulating the motion blur of 24 fps).
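A minimal sketch of this weighted frame filter follows, assuming each frame is a flat list of pixel values and that every output frame is built from one non-overlapping group of three source frames. The function name `downconvert_72_to_24` is hypothetical.

```python
def downconvert_72_to_24(frames, weights=(1 / 6, 2 / 3, 1 / 6)):
    """Derive 24 fps frames from 72 fps frames with a three-frame weighting.

    Each group of three consecutive source frames (previous, current, next)
    yields one output frame. Trailing frames that do not fill a group of
    three are ignored.
    """
    if abs(sum(weights) - 1.0) > 1e-3:
        raise ValueError("weights should sum to 1 to preserve brightness")
    out = []
    for start in range(0, len(frames) - 2, 3):
        prev, cur, nxt = frames[start:start + 3]
        out.append([weights[0] * p + weights[1] * c + weights[2] * n
                    for p, c, n in zip(prev, cur, nxt)])
    return out


# Six single-pixel frames at 72 fps become two frames at 24 fps.
source = [[0.0], [6.0], [12.0], [18.0], [24.0], [30.0]]
result = downconvert_72_to_24(source)
```

Any weighting in the stated range, for example `(0.25, 0.5, 0.25)`, can be passed as `weights` to trade single-frame clarity against added blur.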
[0270]
  This weighting technique works well in about 95% of all cases, so this simple weighting function can perform most 24 fps conversions. For the remaining 5% or so of cases, motion compensation can be used as taught in U.S. Patent Application No. 09/435,277. Because the simple weighting method reduces the workload of the conversion process to about 1/20, motion-compensated conversion of the remainder becomes more practical when it is needed.
[0271]
  It should also be noted that a 120 fps source can be used with five weights to achieve a similar result at 24 fps. For example, a weighting of [0.1, 0.2, 0.4, 0.2, 0.1] can be used. Also, 60 fps can be derived from 120 fps by taking every other frame, but the shorter open-shutter period is then noticeable on fast motion. To alleviate this problem, an overlapping filter can be used (e.g., preferably [0.1667, 0.6666, 0.1667], but anywhere in the range [0.1, 0.8, 0.1] to [0.25, 0.5, 0.25]), in which the low-amplitude weighted frames are reused. Of course, still higher frame rates allow the temporal samples to be shaped more carefully when deriving frame rates such as 24 fps. At very high frame rates, the techniques of U.S. Patent Nos. 5,465,119 and 5,737,027, assigned to the assignee of the present invention, become applicable, because a method of reducing the data rate within each frame is needed to keep the data transfer rate manageable. However, on-chip parallel processing within the sensor (e.g., active pixel or CCD) may provide another means of reducing the required off-chip I/O rate.
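Both 120 fps cases can be expressed with one sliding-window filter whose step sets the output rate. The sketch below is an assumption-laden illustration: frames are flat lists of pixel values, windows are centered, frames at the ends without a full window are skipped, and the name `temporal_filter` is hypothetical.

```python
def temporal_filter(frames, weights, step):
    """Apply a centered temporal weighting every `step` source frames.

    With five weights and step 5 the windows do not overlap (120 -> 24 fps).
    With three weights and step 2 adjacent windows share one low-weight
    frame (120 -> 60 fps with an overlapping filter).
    """
    half = len(weights) // 2
    width = len(frames[0])
    out = []
    for center in range(half, len(frames) - half, step):
        window = frames[center - half:center + half + 1]
        out.append([sum(w * f[i] for w, f in zip(weights, window))
                    for i in range(width)])
    return out


source_120 = [[float(i)] for i in range(10)]
at_24 = temporal_filter(source_120, [0.1, 0.2, 0.4, 0.2, 0.1], step=5)
at_60 = temporal_filter(source_120, [1 / 6, 2 / 3, 1 / 6], step=2)
```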
[0272]
  Given that 24 fps is desired for the economic viability of a new frame rate format such as 72 fps, it is also important to be able to monitor the images at 24 fps using the temporal filter weighting functions described here (e.g., 0.1667, 0.6666, 0.1667). Doing so makes it possible to check the “blocking” (setup) of the shots in a scene and to ensure that the 24 fps result looks good (in addition to the higher-rate full-rate version, such as 72 fps). In this way, the advantages of high frame rate capture are fully integrated with the ability to make international film and video releases at 24 fps.
[0273]
  Thus, appropriately chosen high frame rates are not only compatible with the existing infrastructure for releasing 24 fps film and worldwide video, but also form the most appropriate foundation for creating future high frame rate electronic image sources.
[0274]
Modular bit rate
  “Modularizing” the bit rate is useful for many video compression applications. Variable bit rate systems take advantage of a continuously changing bit rate to apply more bits to faster-changing shots. This can also be done in a coarser manner by giving each convenient unit a different bit rate. Examples of suitable units include a range of frames (a “group of pictures”, or GOP) or each P frame. Thus, for example, the bit rate may be constant within a GOP. However, for a GOP in which high compression stress is detected (e.g., due to large motion or a scene change), a higher constant bit rate can be used. This is similar to the layering method described above, in which all bits of the enhancement layer are applied to the base layer during periods of high stress (generally resetting at the next I frame). Thus, in addition to the concept of applying more bits to the base layer, more bits can be applied to single-layer compression, or to base and enhancement layers (in the case of layered compression), to obtain high quality during periods of high stress.
[0275]
  In general, a low bit rate can handle 90% of the running time of a movie or live event. Using 50% or 100% more bits for the remaining 10% of the time yields nearly perfect coding while increasing the total bit count by only 5% to 10%. This proves to be a very effective way of producing coding that is visibly nearly perfect while mostly encoding at a constant bit rate (and thus retaining most of the modularity and processing advantages of constant bit rate).
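The 5% to 10% figure follows directly from the stated proportions, as this small Python check shows. The function name `relative_total_bits` is hypothetical, and the calculation assumes the base bit rate is constant over the whole program.

```python
def relative_total_bits(stress_fraction, boost):
    """Total bits relative to a pure constant-rate encode.

    `stress_fraction` is the share of running time coded at the higher
    rate, and `boost` is the fractional increase in rate during that time.
    """
    normal = (1.0 - stress_fraction) * 1.0
    stressed = stress_fraction * (1.0 + boost)
    return normal + stressed


# 10% of the time at +50% or +100% bits.
plus_50 = relative_total_bits(0.10, 0.5)
plus_100 = relative_total_bits(0.10, 1.0)
```

With 10% of the running time boosted by 50%, the total grows by a factor of about 1.05; with a 100% boost it grows by about 1.10.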
[0276]
  The use of such higher bit rate periods can be controlled manually or automatically. Automatic control can use the quantization scale factor of the rate control, since this parameter rises during periods of high stress (to avoid a significant increase in bit rate). When such high stress is detected, it can be signaled that the remainder of the GOP should be encoded at a higher bit rate or, alternatively, that the GOP should be re-encoded at a higher bit rate beginning with its starting I frame. Manual selection, based on visual inspection, can also be used to flag a GOP as requiring a higher bit rate.
[0277]
  For real-time decoding, it is beneficial to take advantage of the fact that GOPs are generally kept at a particular size. Using a simple multiple of the GOP size (e.g., a 50% or 100% increase in the number of bits for a highly stressed GOP) also retains many of those advantages. FIG. 17 is a diagram of an example in which a higher bit rate is applied to a modular portion of a compressed data stream. Groups of pictures 1800, 1802 containing normal scenes are allocated bits at a constant rate. When a GOP 1804 occurs that contains a scene exhibiting a high level of stress (i.e., changes that make it difficult for the compression process to compress as much as a “normal” scene), a greater number of bits (e.g., 50% to 100% more) is allocated to that GOP to allow more accurate encoding of the scene.
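The modular allocation can be sketched as a per-GOP decision. In this illustration the stress measure is the mean rate-control quantization scale of each GOP, as suggested for automatic control; the threshold value, the input format, and the name `allocate_gop_bits` are assumptions made for the example.

```python
def allocate_gop_bits(mean_quant_scales, base_bits, stress_threshold, boost=0.5):
    """Give each GOP the base budget, or a simple multiple of it when stressed.

    A GOP is treated as stressed when its mean quantization scale exceeds
    `stress_threshold` (an assumed, empirically chosen value).
    """
    budgets = []
    for q in mean_quant_scales:
        if q > stress_threshold:
            budgets.append(int(base_bits * (1.0 + boost)))
        else:
            budgets.append(base_bits)
    return budgets


# Two normal GOPs followed by one high-stress GOP, as in the example above.
budgets = allocate_gop_bits([8, 9, 24], base_bits=1_000_000, stress_threshold=16)
```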
[0278]
  It should be noted that many MPEG-2 implementations use a constant bit rate. A constant bit rate is a good match for constant bit rate transport and storage media. Transport systems such as broadcast channels, satellite channels, cable, and fiber all have a fixed, constant total capacity. Likewise, digital compressed video tape storage systems have a constant tape playback speed and therefore produce a constant recording or playback bit rate.
[0279]
  Other MPEG-2 implementations, such as DirecTV/DSS and DVD, use some form of variable bit rate allocation. In the case of DirecTV/DSS, the variability reflects the scene stress of the current program relative to the scene stress of adjacent TV programs sharing a common multiplex. The multiplex corresponds to a tuned satellite channel and transponder, which has a fixed total bit rate. In the case of consumer video DVD, the capacity of the digital optical disc is 2.5 gigabytes, which requires the average MPEG-2 bit rate to be 4.5 megabits/second for a two-hour movie. However, the optical disc has a peak read rate that is 100% higher, at 9 megabits/second. For shorter movies, the average rate can be as high as 9 megabits/second. For a two-hour movie, an average bit rate of 4.5 megabits/second is achieved by using a rate above this average for scenes with high scene stress (large changes due to fast scene motion) and a rate below the average while scene stress is low (small changes due to little motion).
[0280]
  In MPEG-2 and MPEG-4, the bit rate is held constant by combining a model of the decoder buffer capacity with variation of the quantization parameter to throttle the bit rate emitted by the encoder. Alternatively, a constant quantization parameter produces a number of bits that varies in proportion to the change and detail in the scene, also known as the scene “entropy”. A constant quantization parameter yields relatively constant quality but a variable bit rate. A varying quantization parameter can be used with a size-bounded decoder buffer to smooth out any variability and provide a constant bit rate.
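The interaction between a bounded buffer and a varying quantization parameter can be shown with a toy model. Everything here is an assumption made for illustration, not the rate control of any MPEG encoder: the bits produced by a frame are modeled as `complexity / q`, the quantizer is mapped linearly from buffer fullness, and the name `rate_control` is hypothetical.

```python
def rate_control(complexities, bits_per_frame, buffer_size, q_min=1.0, q_max=31.0):
    """Toy constant-bit-rate control driven by a bounded buffer model.

    The buffer fills with the bits each frame produces and drains at the
    constant channel rate. A fuller buffer selects a coarser quantizer,
    which throttles the bits produced by the next frame.
    Returns a list of (quantizer, bits) pairs, one per frame.
    """
    fullness = buffer_size / 2.0
    log = []
    for c in complexities:
        q = q_min + (q_max - q_min) * (fullness / buffer_size)
        bits = c / q  # assumed model: bits inversely proportional to q
        # Clamping stands in for overflow/underflow handling (a simplification).
        fullness = min(max(fullness + bits - bits_per_frame, 0.0), buffer_size)
        log.append((q, bits))
    return log


# Five easy frames followed by five frames that are 50% more complex.
log = rate_control([16000] * 5 + [24000] * 5, bits_per_frame=1000, buffer_size=8000)
```

In this run the quantizer holds steady during the easy frames and then rises step by step during the harder ones, pulling the per-frame bit count back toward the channel rate at the cost of quality.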
[0281]
  Sharing a multiplex among many channels is one way to support variable bit rates, as in DirecTV or in standard-definition signals within the 19.3 megabit/second ACATS/ATSC 6 MHz multiplex. The statistical pairing of low-entropy shows (such as talk shows) with high-entropy shows (fast sports such as hockey) allows an instantaneous tradeoff in which more bits are applied to the shows with higher entropy. A slow period in one show uses fewer bits, leaving more bits for another, fast-moving show running simultaneously in the same multiplex.
[0282]
  These variable bit rate systems typically have a peak bit rate that is approximately 100% above the average. Thus, at the highest bit rates these systems become constant bit rate systems, which limits the peak bit rate available during periods of high scene stress. Some MPEG-2 decoder systems also have a limit on input bit rate, which likewise limits the peak bit rate of such variable bit rate systems. However, this limit on peak input bit rate will gradually rise well beyond these other limits as decoders improve.
[0283]
  The general concept behind each of these conventional bit rate control systems is that the decoder contains a small memory buffer holding somewhere between a fraction of a frame and a few frames of moving image. Around 1990, when the decoder bit rate buffer was conceived, there was concern that the cost of this buffer memory would significantly affect the price of the decoder. At present, however, the cost of this buffer has proven to be negligible. In fact, a buffer of many seconds now has negligible cost. In the near future, it can be assumed that the bit-receiving memory buffer will be able to hold large amounts of video information at small cost. Furthermore, the cost of storage media such as disks is falling rapidly while their capacity is increasing rapidly. It is therefore also reasonable to spool the compressed bit stream to a storage system such as a disk, to obtain a storage capacity of many hours or many days. This is currently done by commercially available hard drive-based home video recorders.
[0284]
  However, one basic problem remains: there is a time delay while waiting for bits to accumulate in the compressed bit buffer. For broadcast television and movie distribution, a delay of a few seconds or tens of seconds has little effect on viewing, as long as an auxiliary selection stream is available to guide “tune-in” to an ongoing program or “movie selection”, or as long as a short delay with a small initial buffer is used at the initial start (e.g., of a movie). For teleconferencing or live interactive events, however, a small, fast-running buffer is required to minimize delay. Except for live interactive and teleconferencing applications, quality can be improved by using inexpensive large buffers.
[0285]
  In light of these trends, the structure of variable and constant bit rate compression methods can be significantly improved. These improvements include the following.
• Greatly increasing the buffer size in the decoder buffer model, to provide many of the advantages of variable bit rate and constant bit rate simultaneously.
• Preloading “interstitial” show titles to support instant changes of title while the decoder buffer begins to fill.
• Using a partially filled FIFO (first in, first out) decoder bit rate buffer at the start of a newly started program or movie, and then gradually increasing the buffer fullness (and hence the delay) as the program progresses after the start.
• Preloading increased bit rate “modules” (using the modular bit rate concept described above) into the decoder bit memory (e.g., using a second FIFO, main memory, or spooling to disk) in order to raise the average bit rate during periods of high scene stress. Such preloading allows periods whose bit rate not only exceeds the average bit rate on a constant bit rate channel but also exceeds the maximum bit rate of a variable bit rate system.
• In the layered structure of the present invention, all bits of the average (or constant) bit rate stream can be shunted to the base layer during scenes with high scene stress, while the enhancement layer bits for such a scene are preloaded ahead of the scene and played out using timing markers for synchronization. It should again be noted that, with this method, the maximum (or constant) bit rate limit of the transport and/or playback can be exceeded for some period (limited only by the amount of available buffer space).
[0286]
Multi-layer DCT structure
Variable DCT block size
  Harmonic alignment of the transform wavelengths is fundamental to the layered DCT structure. For example, FIG. 18 graphically illustrates the relationship of DCT harmonics between two resolution layers. In the currently optimal two-layer arrangement of the present invention, the base layer uses DCT coefficients with an arithmetic harmonic series having frequencies of 1, 2, 3, 4, 5, 6, and 7 times the 8 × 8 pixel DCT block size 1900. In a resolution enhancement layer with a factor of 2, these base layer harmonics map to frequencies of 1/2, 1, 3/2, 2, 5/2, 3, and 7/2 of the corresponding enhancement layer DCT block 1902. Since the overall frequency is carried by the base layer, there is no penalty for the 1/2 term, but the remaining terms harmonize only partially with the enhancement layer. For example, the frequencies of 2, 4, and 6 times the macroblock size from the base layer align with the frequencies of 1, 2, and 3 times the macroblock size from the enhancement layer. These terms form a natural signal-to-noise ratio (SNR) hierarchy, as if additional precision were applied to these coefficients in the base layer. Terms 3, 5, and 7 from the base layer are non-harmonic with the enhancement layer; they therefore provide orthogonality only within the base layer and no synergy with the enhancement layer. The remaining terms 4, 5, 6, and 7 of the enhancement layer represent additional detail that the enhancement layer can provide to the image without overlapping the base layer. FIG. 19 graphically illustrates a similar relationship of DCT harmonics among three resolution layers, showing the highest enhancement layer 1904.
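The mapping of base layer harmonics onto the enhancement layer can be reproduced with exact fractions. This sketch assumes the same 8 × 8 block size in both layers and a resolution factor of 2, as in the arrangement described; the function names are hypothetical.

```python
from fractions import Fraction


def map_base_harmonics(base_terms=range(1, 8), scale=2):
    """Classify base layer harmonics by how they land in the enhancement layer.

    A base term k maps to frequency k / scale of the enhancement block.
    Returns (aligned, half_term, non_harmonic): aligned maps a base term
    to the integer enhancement term it coincides with.
    """
    aligned, half_term, non_harmonic = {}, [], []
    for k in base_terms:
        f = Fraction(k, scale)
        if f.denominator == 1:
            aligned[k] = int(f)
        elif f < 1:
            half_term.append(k)
        else:
            non_harmonic.append(k)
    return aligned, half_term, non_harmonic


def new_enhancement_terms(aligned, enhancement_terms=range(1, 8)):
    """Enhancement terms that no base layer harmonic coincides with."""
    covered = set(aligned.values())
    return [t for t in enhancement_terms if t not in covered]


aligned, half_term, non_harmonic = map_base_harmonics()
extra_detail = new_enhancement_terms(aligned)
```

The result matches the text: base terms 2, 4, and 6 align with enhancement terms 1, 2, and 3; base term 1 is the 1/2 term; base terms 3, 5, and 7 are non-harmonic; and enhancement terms 4 through 7 carry the new detail.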
[0287]
  It will be appreciated that this structure has only partial orthogonality and partial alignment. Although this alignment and orthogonality are generally beneficial, the phase alignment of the DCT coding series has never been optimized for two (or more) spatial resolution layers. Rather, the DCT was designed as a set of orthogonal basis functions that exploit phase characteristics, with the phase-carrying imaginary terms of the Fourier transform series removed. Although the DCT clearly works well for coding in a two-layer spatial coding structure, these issues of layer orthogonality and phase relationship come into focus when the layered structure is extended to three or four spatial resolution layers.
[0288]
  One solution for providing cross-layer orthogonality is to use a different DCT block size for each resolution layer. For example, if the resolution of a given layer is doubled, the size of its DCT block is doubled. As a result, the resolution-layered structure is harmonically aligned and the orthogonality of coefficients between layers is optimal, which provides optimal coding efficiency.
[0289]
  FIG. 20 is a diagram illustrating various DCT block sizes for different resolution layers. For example, a 4 × 4 pixel DCT block 2000 can be used for the base layer, an 8 × 8 pixel DCT block 2002 for the next layer up, a 16 × 16 pixel DCT block 2004 for the third layer, and a 32 × 32 pixel DCT block 2006 for the fourth layer. Thus, each layer adds additional harmonic terms that are fully orthogonal to the layer or layers below it. Optionally, additional precision (in the SNR sense) can be added to the coefficient terms already covered. For example, the 16 × 16 pixel subset 2008 of the 32 × 32 pixel block 2006 can be used to improve the precision (in the SNR improvement sense) of the 16 × 16 pixel DCT block 2004.
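When block size doubles along with resolution, a block in every layer covers the same image area, so a harmonic index means the same spatial frequency in each layer. The sketch below lists, per layer and along one axis, the harmonic indices that layer adds beyond the layer beneath it. It assumes exactly that doubling relationship; the function names are hypothetical.

```python
def layer_block_sizes(base_size=4, layers=4):
    """DCT block edge length per layer, doubling with each resolution layer."""
    return [base_size << i for i in range(layers)]


def new_terms_per_layer(sizes):
    """One-dimensional harmonic indices each layer adds beyond the layer below.

    A block of edge N has harmonic indices 0 .. N-1. Because every layer's
    block spans the same image area, indices below the previous layer's
    size repeat terms that are already covered.
    """
    added = []
    previous = 0
    for size in sizes:
        added.append(list(range(previous, size)))
        previous = size
    return added


sizes = layer_block_sizes()          # 4, 8, 16, 32
added = new_terms_per_layer(sizes)   # base: 0-3, then 4-7, 8-15, 16-31
```

The indices below each layer's starting point are the already-covered terms to which additional SNR precision can optionally be added.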
[0290]
Motion vector
  In MPEG-2, a macroblock corresponding to a motion vector consists of 16 × 16 pixels and is organized as four 8 × 8 DCT blocks. In MPEG-4, each macroblock can optionally be further subdivided into 8 × 8 regions corresponding to DCT blocks each having its own motion vector.
[0291]
  Although the DCT blocks preferably differ in size from layer to layer, the motion compensation macroblocks need not be constrained by this structure. The simplest structure specifies motion for all layers through the base layer motion vectors: the single motion vector of each base layer motion compensation macroblock applies to all higher layers as well, which removes motion vectors from all enhancement layers entirely. A more efficient structure, however, lets each layer independently select (1) no motion vector (i.e., use the base layer motion vector), (2) additional sub-pixel precision relative to the base layer motion vector, or (3) division of each motion compensation macroblock into two, four, or more blocks, each with an independent motion vector. The overlapped block motion compensation (OBMC) method of MPEG-4 can be used to smooth the transitions between independently moved motion compensation blocks. As detailed elsewhere in this description, the use of negative-lobe filters for sub-pixel placement is also beneficial for motion compensation in this layered DCT structure.
[0292]
  Thus, each DCT block in each layer can be divided into as many motion vector blocks for motion compensation as is optimal for that layer. FIG. 21 is a diagram showing an example of dividing motion compensation macroblocks in order to determine independent motion vectors. For example, when the base layer is built using 4 × 4 pixel DCT blocks 2100, it can use from one motion vector (as shown) to as many as 16 motion vectors (one for each pixel), or can even use sub-pixel motion vectors. Correspondingly, each higher layer can divide its larger corresponding DCT blocks 2102, 2104, 2106 as appropriate to obtain the best balance between the quality of the coded prediction (and the resulting savings in DCT coefficient bits) and the bits needed to specify the motion vectors. Block division for motion compensation is thus a tradeoff between the bits used to code motion vectors and the improvement in picture prediction.
[0293]
  As described elsewhere in this application, coding efficiency and effectiveness can be improved further by using guide vectors derived from the motion vectors of the lower layers to predict each motion vector in the higher layers.
[0294]
Optimizing variable length coding
  The variable length codes (e.g., Huffman or arithmetic codes) used by MPEG-1, MPEG-2, MPEG-4, H.263 and other compression systems (including DCT systems and non-DCT systems such as wavelets) are selected on the basis of demonstrated efficiency on a small group of test sequences. These test sequences are limited in the types of image they contain and represent only a relatively narrow range of bit rates, resolutions and frame rates. Further, the variable length codes are selected based on average performance over each test sequence and over the test sequences as a group.
[0295]
  Experiments have shown that a substantially more optimal variable length coding system can be obtained by (1) applying specific variable length coding tables to each frame and then (2) selecting the most optimal codes for that particular frame. Such a selection of optimal variable length codes can also be applied to units smaller than a frame (a part or region of a frame) or to groups of several frames. The variable length codes used for motion vectors, DCT coefficients, macroblock types, and so on can each be optimized independently for each given unit (i.e., frame, sub-frame or group of frames) at that unit's current resolution and bit rate. This method can also be applied to the spatial resolution enhancement layers described elsewhere in this application.
[0296]
  The choice of which group of variable length codes to use can be conveyed with each frame (or sub-part or group) using a small number of bits. In addition, custom coding tables can be downloaded where reliable data transmission and playback are available (e.g., with optical data disks or optical fiber networks).
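  By way of a hedged example, per-frame selection among candidate code tables can be sketched as follows in Python. The table representation (a mapping from symbol to code length in bits), the function names and the example symbols are assumptions for illustration only and do not reproduce the tables of any standard.

```python
from collections import Counter


def bits_for(symbols, code_lengths):
    """Total bits needed to code the symbols with one table of code lengths."""
    counts = Counter(symbols)
    return sum(code_lengths[s] * n for s, n in counts.items())


def select_table(symbols, tables):
    """Return (table_index, payload_bits, selector_bits) for one coding unit.

    The unit may be a frame, a region of a frame, or a group of frames.
    selector_bits is the small number of bits needed to signal the choice.
    """
    costs = [bits_for(symbols, t) for t in tables]
    index = min(range(len(tables)), key=costs.__getitem__)
    selector_bits = (len(tables) - 1).bit_length()
    return index, costs[index], selector_bits


# Two hypothetical prefix-code tables over the symbols 0, 1 and 2.
tables = [
    {0: 1, 1: 2, 2: 2},  # favours frames dominated by symbol 0
    {0: 2, 1: 1, 2: 2},  # favours frames dominated by symbol 1
]
frame_a = [0, 0, 0, 1, 2]
frame_b = [1, 1, 1, 0, 2]
choice_a = select_table(frame_a, tables)
choice_b = select_table(frame_b, tables)
```

  Each frame here picks a different table, and one selector bit per frame suffices to tell the decoder which table was used.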
[0297]
  It should be noted that the existing coding tables used by MPEG-1, MPEG-2, MPEG-4, H.263, DVC-Pro/DV and other compression systems are predefined and static. Thus, application of this aspect of the invention is not compatible with existing decoders, but may be compatible with future coding systems.
[0298]
Enhancement system for MPEG-2 and MPEG-4
  Currently, there is a large installed base of MPEG-2 capable decoders. For example, both DVD players and DirecTV satellite receivers are currently in millions of homes. Since MPEG-4 is not compatible with MPEG-2, the improvements that MPEG-4 video compression coding can provide over MPEG-2 are not yet available. However, both MPEG-4 and MPEG-2 are motion compensated DCT compression systems and share a common basic structure. The composition system of the MPEG-4 video coding system is fundamentally different from that of MPEG-2, as are several other extended features. In this discussion, only the full frame video coding aspect of MPEG-4 is considered.
[0299]
  There are a number of differences between MPEG-4 and MPEG-2; the main differences are as follows.
(1) MPEG-4 can optionally split a 16 × 16 macroblock into four 8 × 8 blocks, one for each DCT, with each 8 × 8 block having an independent motion vector.
(2) MPEG-4 B frames have a “direct” mode, which is a type of prediction.
(3) MPEG-4 B frames do not support “I” macroblocks, unlike MPEG-2, which does support “I” macroblocks in B frames.
(4) The DCT coefficients of MPEG-4 can be coded with more elaborate patterns than those of MPEG-2, although the well-known zigzag pattern is common to both MPEG-2 and MPEG-4.
(5) MPEG-4 supports 10-bit and 12-bit pixel depths, whereas MPEG-2 is limited to 8 bits.
(6) MPEG-4 supports ¼ pixel motion vector precision, whereas MPEG-2 is limited to ½ pixel precision.
[0300]
  Some of these differences, such as the B-frame “direct” mode and “I” macroblock differences, represent fundamental incompatibilities. However, both of these coding modes are optional, and an encoder can choose to use neither (at a small loss of efficiency), thereby eliminating this incompatibility. Similarly, an encoder can restrict the MPEG-4 coding patterns for DCT coefficients to provide greater commonality with MPEG-2 (again at a small loss of efficiency).
[0301]
  The remaining three main items, namely the 8 × 8 four-way block split, 1/4 pixel motion vector precision and 10-bit and 12-bit pixel depths, can be regarded as “augmentations” of the basic structure already provided by MPEG-2.
[0302]
  This aspect of the invention takes advantage of the ability to provide these “augmentations” as separate constructs. The augmentations can thus be coded separately and carried, as a separate augmentation stream, together with a standard MPEG-2 or MPEG-4 stream. This technique can also be used with MPEG-1, H.263 or any other video coding system that shares the common motion compensated DCT structure. FIG. 22 is a block diagram illustrating an augmentation system for an MPEG-2 type system. A main compressed data stream 2200 (shown as including motion vectors, DCT coefficients, macroblock mode bits and I, B and P frames) is conveyed to a conventional MPEG-2 type decoder 2202 and, in parallel, to an enhanced decoder 2204. An enhanced data stream 2206 (shown as including 1/4 pixel motion vector precision, 8 × 8 four-way block split motion vectors and 10-bit and 12-bit pixel depths) is conveyed at the same time to the enhanced decoder 2204. The enhanced decoder 2204 combines the two data streams 2200 and 2206 and decodes them to provide an enhanced video output. Using this structure, coding enhancements can be added to any motion compensated DCT compression system.
[0303]
  Use of this structure can be biased toward more optimal MPEG-2 decoding or toward more optimal enhanced decoding. It is anticipated that enhanced decoding, obtained by adding the MPEG-4 video coding improvements, will be favored, achieving optimal enhanced picture quality at a small compromise in the quality of the MPEG-2 decoded picture.
[0304]
  For example, when MPEG-2 video coding is augmented with MPEG-4 improvements, the MPEG-2 motion vectors can be used as “predictors” for the four-way split motion vectors (in those cases where MPEG-4 chooses the four-way split), or can be used directly for non-split 16 × 16 macroblocks. The 1/4 pixel motion vector resolution can be coded as one additional bit of precision (both vertically and horizontally) in the enhanced data stream 2206. The extra pixel depth can be coded as extra precision in the DCT coefficients before applying the inverse DCT function.
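  One way to picture the additional bit of motion vector precision is the following Python sketch. It assumes, purely for illustration, that the main stream carries each vector component in half-pixel units rounded toward negative infinity and that the enhanced data stream carries one extra bit per component; the function names are hypothetical and the actual bitstream syntax is not specified here.

```python
def split_quarter_pel(quarter_pel_mv):
    """Split a quarter-pixel vector into a half-pixel vector plus extra bits.

    The half-pixel part would travel in the main stream and the extra
    bits (one per component) in the enhanced data stream.
    """
    qx, qy = quarter_pel_mv
    return (qx >> 1, qy >> 1), (qx & 1, qy & 1)


def augment_to_quarter_pel(half_pel_mv, extra_bits):
    """Rebuild the quarter-pixel vector in the enhanced decoder."""
    (hx, hy), (bx, by) = half_pel_mv, extra_bits
    if bx not in (0, 1) or by not in (0, 1):
        raise ValueError("each extra precision value must be a single bit")
    return (2 * hx + bx, 2 * hy + by)


# A vector of (-3, 5) quarter pixels, i.e. (-0.75, 1.25) pixels.
half_pel, extra = split_quarter_pel((-3, 5))
restored = augment_to_quarter_pel(half_pel, extra)
```

  A conventional decoder in this sketch would use only `half_pel`, while the enhanced decoder recovers the full quarter-pixel vector.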
[0305]
  Spatial resolution layering, which is a principal subject of the present invention, works best when the base layer is coded as perfectly as possible. MPEG-2 is an imperfect coding, and therefore yields degraded performance for resolution enhancement layers. By using the augmentation system described above, the base layer can be improved by augmenting the MPEG-2 data stream that encodes the base layer with, for example, the MPEG-4 improvements described above (and other improvements described herein). The resulting base layer, together with the accompanying enhanced data stream, then has most of the quality and efficiency that would have been obtained from an improved base layer produced by better coding (e.g., by MPEG-4 and the other improvements of the present invention). One or more resolution enhancement layers can then be applied to the resulting improved base layer using other aspects of the invention.
[0306]
  Other improvements of the present invention, such as better filters with negative lobes for motion compensation, can also be invoked by the enhanced decoder, yielding further improvements beyond those provided by other motion compensated compression systems such as MPEG-4.
[0307]
Guide vectors for spatial enhancement layers
  Motion vectors account for a large portion of the bits allocated within each resolution enhancement layer created in accordance with the present invention. It has been determined that the number of bits required for enhancement layer motion vectors can be substantially reduced by using the corresponding motion vector at the same position in the base layer as a “guide vector”. The enhancement layer motion vectors are therefore coded by searching only a small range centered on the corresponding guide vector from the base layer. This is particularly important for MPEG-4 enhancement layers, because each macroblock can optionally have four motion vectors and the motion vectors can have 1/4 pixel resolution.
[0308]
  FIG. 23 is a diagram showing the use of motion vectors from the base layer 2300 as guide vectors for the resolution enhancement layer 2302. A motion vector 2304 from the base layer 2300, after being scaled up to the resolution of the enhancement layer 2302, serves as a guide vector 2304′ for refining the motion vector of the enhancement layer 2302. Thus, only a small range needs to be searched to find the corresponding motion vector 2306 of the enhancement layer 2302. The process is the same for all motion vectors from the base layer. For example, in MPEG-4, a 16 × 16 pixel base layer macroblock can optionally be divided into four 8 × 8 pixel motion vector blocks. A corresponding factor-of-two enhancement layer then uses the co-located motion vectors from the base layer as guide vectors. In this example, a motion vector from one of the 8 × 8 motion vector blocks in the base layer guides the search for the motion vector of the corresponding 16 × 16 pixel macroblock in the enhancement layer. This 16 × 16 block can optionally be further subdivided into four 8 × 8 motion vector blocks, all using the same corresponding base layer motion vector as a guide vector.
[0309]
  These small search range motion vectors in the enhancement layer are then encoded with much higher efficiency (ie, fewer bits are required to code the smaller enhancement layer motion vector 2306). This guide vector method can be applied to MPEG-2, MPEG-4 or other suitable single or multiple motion compensated spatial resolution enhancement layers.
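  The guide vector search can be sketched in Python as follows. This is a simplified illustration under stated assumptions: whole-pixel vectors given as (dy, dx), a sum-of-absolute-differences match criterion, a factor-of-two enhancement layer, and hypothetical function names. The description above does not mandate any of these choices.

```python
import random


def block_sad(cur, ref, top, left, y, x, size):
    """Sum of absolute differences between a block of cur and a block of ref."""
    return sum(abs(cur[top + r][left + c] - ref[y + r][x + c])
               for r in range(size) for c in range(size))


def guided_search(cur, ref, top, left, size, base_mv, scale=2, radius=1):
    """Search a small window centred on the scaled base layer vector.

    Returns (enhancement_mv, delta); only the small delta would be coded.
    """
    guide = (base_mv[0] * scale, base_mv[1] * scale)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + guide[0] + dy, left + guide[1] + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the reference picture
            cost = block_sad(cur, ref, top, left, y, x, size)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    if best is None:
        raise ValueError("no candidate block lies inside the reference picture")
    delta = best[1]
    return (guide[0] + delta[0], guide[1] + delta[1]), delta


# Synthetic test pictures: the current picture is the reference displaced
# by (5, -3) pixels, and the base layer vector is (2, -1).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
cur = [[ref[(y + 5) % 64][(x - 3) % 64] for x in range(64)] for y in range(64)]
mv, delta = guided_search(cur, ref, top=16, left=16, size=16, base_mv=(2, -1))
```

  Only nine candidate positions are examined here, and the coded quantity is the small `delta` rather than the full enhancement layer vector `mv`.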
[0310]
Enhancement mode
  FIGS. 24A-24E are data flow diagrams of a typical professional level enhancement mode. These figures show picture data (including intermediate stages) in the left column, processing steps in the middle column, and output in the right column. It should be noted that this is just one example of how several of the processing steps described herein can be combined. Different combinations, both simpler and more complex, can be arranged to achieve different levels of compression, aspect ratio and image quality.
[0311]
  FIG. 24A shows an initial picture 2400 of 2k × 1k pixels. This image is down-filtered (2402) to a 1k × 512 pixel image 2404. Motion vectors 2406 are created from the initial picture and output as a file 2407. The 1k × 512 pixel image 2404 is compressed/decompressed (2408) into a 1k × 512 decompressed image 2410, and the compressed version is output as a base layer 2412 together with an associated motion vector file 2416. The 1k × 512 decompressed image 2410 is expanded (2418) to become a 2k × 1k image 2420. The 1k × 512 image 2404 is expanded (2422) to become a 2k × 1k image 2424. The 2k × 1k image 2420 is subtracted from the original image 2400 (2428) to create a 2k × 1k difference picture 2428.
[0312]
  The 2k × 1k image 2424 is subtracted from the original image 2400 (2430) to create a 2k × 1k difference picture 2432. The amplitude of the 2k × 1k difference picture 2432 is reduced by a selected amount (for example, to 0.25 times) (2434) to create a 2k × 1k scaled difference picture 2436. The 2k × 1k scaled difference picture 2436 is added to the 2k × 1k difference picture 2428 (2438) to create a 2k × 1k combined difference picture 2440. The combined difference picture 2440 is encoded/decoded using the original motion vectors (2442), and an encoded enhancement layer 2444 (MPEG-2 in this example) and a 2k × 1k decoded enhancement layer 2446 are output. The 2k × 1k decoded enhancement layer 2446 is added to the 2k × 1k image 2420 (2448) to create a 2k × 1k reconstructed full base plus enhancement image 2450. The original image 2400 is subtracted from the 2k × 1k reconstructed full base plus enhancement image 2450 (2452) to create a 2k × 1k second layer difference picture 2454. The amplitude of the 2k × 1k second layer difference picture 2454 is increased (2456) to create a 2k × 1k difference picture 2458. Next, red channel information 2458, green channel information 2460 and blue channel information 2462 are extracted to create a red difference image 2464, a green difference image 2466 and a blue difference image 2468, respectively. Using the motion vector file 2407, a second red layer is encoded/decoded (2470) from the red difference image 2464 into a red second enhancement layer 2472 and a decoded red difference image 2474; a second green layer is encoded/decoded (2476) from the green difference image 2466 into a green second enhancement layer 2478 and a decoded green difference image 2480; and a second blue layer is encoded/decoded (2482) from the blue difference image 2468 into a blue second enhancement layer 2484 and a decoded blue difference image 2486.
The decoded red difference image 2474, the decoded green difference image 2480 and the decoded blue difference image 2486 are combined (2488) into a decoded RGB difference image 2490. The amplitude of the decoded RGB difference image 2490 is reduced (2492) to create a second decoded RGB difference image 2494. The second decoded RGB difference image 2494 is added to the 2k × 1k reconstructed full base plus enhancement image 2450 (2496) to create a 2k × 1k reconstructed second enhancement layer image 2498. The 2k × 1k reconstructed second enhancement layer image 2498 is subtracted (2500) from the original image 2400 to create a 2k × 1k final residual image 2502. This 2k × 1k final residual image 2502 is then losslessly compressed (2504) to produce separate red, green and blue final residual difference images 2506.
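  The first stage of this data flow, up to the combined difference picture 2440, can be sketched in Python as follows. The sketch is illustrative only: 2 × 2 averaging, pixel replication and uniform quantization are simple stand-ins, assumed here in place of the down-filter, the expansion filter and the compression/decompression step, and pictures are plain lists of rows.

```python
def down2(img):
    """Halve each dimension by 2x2 averaging (stand-in for down-filter 2402)."""
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(len(img[0]) // 2)]
            for r in range(len(img) // 2)]


def up2(img):
    """Double each dimension by pixel replication (stand-in for 2418 and 2422)."""
    return [[img[r // 2][c // 2] for c in range(2 * len(img[0]))]
            for r in range(2 * len(img))]


def lossy(img, step):
    """Uniform quantization as a stand-in for compress/decompress 2408."""
    return [[round(v / step) * step for v in row] for row in img]


def subtract(a, b):
    """Pixel-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def combined_difference(original, step=8, weight=0.25):
    """Difference against the decoded base plus a weighted clean difference."""
    base = down2(original)                              # image 2404
    decoded_base = lossy(base, step)                    # image 2410
    diff_decoded = subtract(original, up2(decoded_base))  # picture 2428
    diff_clean = subtract(original, up2(base))            # picture 2432
    return [[d1 + weight * d2 for d1, d2 in zip(r1, r2)]  # picture 2440
            for r1, r2 in zip(diff_decoded, diff_clean)]


# With step=1 and an integer-valued base, the stand-in codec is lossless,
# so both differences coincide and the sum is 1.25 times either of them.
tiny = [[12, 8], [8, 12]]
combined = combined_difference(tiny, step=1)
```

  The weighting of 0.25 matches the example amplitude reduction given above; other weights may be selected.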
[0313]
Computer implementation
  The present invention can be implemented in hardware or software, or a combination of both. Preferably, however, the present invention is implemented in computer programs executing on one or more programmable computers, each comprising at least one processor, a data storage system (including volatile and non-volatile memory and/or storage elements), an input device and an output device. Program code is applied to input data to perform the functions described herein and to generate output information. The output information is applied to one or more output devices in a known manner.
[0314]
  Each such program can be implemented in any desired computer language (including machine, assembly, high-level procedural, logical or object-oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.
[0315]
Each such computer program is preferably stored on a storage medium or storage device (e.g., ROM, CD-ROM, or magnetic or optical media) readable by a general purpose or special purpose programmable computer, so that when the storage medium or device is read by the computer, the computer is configured and operated to perform the procedures described herein. The system of the present invention can also be considered to be implemented as a computer-readable storage medium configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
[0316]
Conclusion
  Different aspects of the invention that are considered novel include, but are not limited to, the following ideas.
• Use of 72 fps as a source frame rate for electronic cameras, to provide compatibility with the existing 24 fps film and video infrastructure in wide use around the world while providing the benefits of a high frame rate to new electronic video systems.
• Conversion from 72 fps and/or 120 fps to 60 fps using motion compensation and frame rate conversion methods derived from US patent application Ser. No. 09/435,277 (entitled “System And Method For Motion Compensation and Frame Rate Conversion”, filed November 5, 1999).
• Conversion from 72 fps to 24 fps using weighted filters in the range of [0.1, 0.8, 0.1] to [0.25, 0.5, 0.25], and conversion from 120 fps to 24 fps using weightings of approximately [0.1, 0.2, 0.4, 0.2, 0.1].
• Conversion from 120 fps to 60 fps using overlapping sets of three frames (advancing by 2/120 for each 1/60 frame) with weightings in the range of [0.1, 0.8, 0.1] to [0.25, 0.5, 0.25].
• Use of motion compensation and frame rate conversion methods derived from US patent application Ser. No. 09/435,277 (entitled “System And Method For Motion Compensation and Frame Rate Conversion”, filed November 5, 1999) to increase motion blur and convert frame rates from 72 fps (or other higher rate) sources to 24 fps, for the small proportion of scenes in which the generally preferred simple weightings do not yield the desired quality.
• Use of 24 fps monitoring with the weighting functions while shooting at a higher frame rate (72 fps, 120 fps, etc.).
• Simultaneous release of the derived 24 fps result together with the original high frame rate.
• Use of de-graining and/or noise reduction filtering before layered compression.
• Use of re-graining or re-noising after decompression as a creative effect.
• Use of de-interlacing before layered compression.
• Application of a three-field frame de-interlacer before both single-layer and multi-layer compression.
• Up-filtering of pictures before both single-layer and multi-layer compression to improve picture resolution.
• Adjustment of the size of the sub-regions within the enhancement layers and of the relative proportion of bits allocated to the base layer and the enhancement layers.
• Independent treatment of the vertical and horizontal relationships, such that the fractional relationships are independently different.
• Allowing a higher bit rate for compression units (e.g., GOPs) during periods of high compression stress (either automatically, by detecting high values of the rate control quantization parameter, or by manual control).
• Use of “modularized” bit rates, in which natural units of the compression system or layered compression system can use an increased bit rate for a modular unit.
• Pre-loading of single or multiple decompression buffers with increased bit rate modular units for use with compression or layered compression systems.
• Use of a constant bit rate system in one or more layers of the layered compression system of the present invention.
• Use of a variable bit rate system in one or more layers of the layered compression system of the present invention.
• Use of a combination of constant bit rate and variable bit rate systems in the various layers of the layered compression system of the present invention.
• Use of correspondingly larger DCT block sizes and additional DCT coefficients in resolution layering (also referred to as “spatial scalability”). For example, if a given layer doubles the resolution, the DCT block size is doubled. As a result, the resolution layering structure is harmonically aligned and the orthogonality of the inter-layer coefficients is optimal, so that optimal coding efficiency is provided.
• Use of multiple motion vectors per DCT block, so that both large and small DCT blocks can optimize the trade-off between motion vector bits and improved motion compensated prediction.
• Use of upsizing and downsizing filters with negative lobes, especially truncated sinc filters.
• Use of motion compensation displacement filters with negative lobes.
• Selection of optimal variable length codes on a relatively instantaneous basis, e.g., for each frame, for each region of a frame (e.g., several scan lines or macroblock lines, or each quadrant), or for every several frames.
• Addition of improved coding features to existing compression systems by means of an augmentation stream, and use of new enhanced decoders to improve picture quality while remaining compatible with existing decoders.
• Use of the enhanced decoded picture to provide a higher quality base layer for resolution layering.
• Sharing of coding elements between similar motion picture coding systems to provide upward compatibility as well as a path to improvement.
• Consideration, in the encoding process, of the generation of compressed bitstreams that are partly common to two types of decoder, including provisions for favoring one or the other of the decoders.
• Use of base layer motion vectors as guide vectors to center the range of the motion vectors used in the enhancement layers.
• Application of combinations of the above methods to the enhancement layers, or to improve other compression systems, including MPEG-1, MPEG-2, MPEG-4, H.263, DVC-Pro/DV and wavelet-based systems.
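  The weighted frame rate conversions listed above can be illustrated with a short Python sketch. It assumes, for illustration only, that each frame is a flat list of pixel values, that 72 fps to 24 fps conversion uses non-overlapping sets of three frames, and that 120 fps to 60 fps conversion uses overlapping sets of three frames advancing by two; the function name `weighted_convert` is hypothetical.

```python
def weighted_convert(frames, weights, step):
    """Blend consecutive source frames into each output frame.

    frames  : list of frames, each a flat list of pixel values
    weights : one weight per blended source frame, summing to 1
    step    : number of source frames advanced per output frame
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    taps = len(weights)
    out = []
    for start in range(0, len(frames) - taps + 1, step):
        group = frames[start:start + taps]
        out.append([sum(w * f[p] for w, f in zip(weights, group))
                    for p in range(len(group[0]))])
    return out


# Six single-pixel source frames.
source = [[0.0], [4.0], [8.0], [8.0], [8.0], [8.0]]

# 72 fps -> 24 fps: sets of three frames, advancing by three.
at_24 = weighted_convert(source, (0.25, 0.5, 0.25), step=3)

# 120 fps -> 60 fps: overlapping sets of three frames, advancing by two.
at_60 = weighted_convert(source, (0.25, 0.5, 0.25), step=2)
```

  Other weightings within the stated ranges, such as (0.1, 0.8, 0.1), can be passed in the same way.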
[0317]
  A number of embodiments of the invention have been described. However, various modifications can be made without departing from the spirit and scope of the invention. For example, although the preferred embodiment utilizes MPEG-2 or MPEG-4 encoding and decoding methods, the present invention works with any comparable standard that provides equivalents of I, P and/or B frames and layers. Accordingly, the invention is not limited by the specifically illustrated embodiments, but only by the scope of the claims of the present application.
[Brief description of the drawings]
FIG. 1 is a timing diagram showing pull-down rates for 24 fps and 36 fps materials displayed at 60 Hz.
FIG. 2 is a first preferred MPEG-2 coding pattern.
FIG. 3 is a second preferred MPEG-2 coding pattern.
FIG. 4 is a block diagram illustrating temporal layer decoding according to a preferred embodiment of the present invention.
FIG. 5 is a block diagram showing a 60 Hz interlaced input for a converter capable of outputting both 36 Hz and 72 Hz frames.
FIG. 6 is a diagram showing a “master template” for a base MPEG-2 layer of 24 Hz or 36 Hz.
FIG. 7 is a diagram illustrating base resolution template enhancement performed using hierarchical resolution scalability utilizing MPEG-2.
FIG. 8 is a diagram illustrating a preferred layered resolution encoding process.
FIG. 9 is a diagram illustrating a preferred layered resolution decoding process.
FIG. 10 is a block diagram illustrating a combination of resolution and temporal scalable options for a decoder according to the present invention.
FIG. 11 is a diagram of a base layer expanded by using a gray area and enhancement to provide picture detail.
FIG. 12 is a diagram of the relative shape, amplitudes and lobe polarity of a preferred downsizing filter.
FIG. 13A is a diagram of the relative shape, amplitudes and lobe polarity of one of a pair of preferred upsizing filters for upsizing by a factor of two.
FIG. 13B is a diagram of the relative shape, amplitudes and lobe polarity of the other of the pair of preferred upsizing filters for upsizing by a factor of two.
FIG. 14A is a block diagram of an odd field deinterlacer.
FIG. 14B is a block diagram of an even field de-interlacer.
FIG. 15 is a block diagram of a frame deinterlacer that uses three de-interlaced fields.
FIG. 16 is a block diagram of an additional layering mode based on the 2/3 base layer.
FIG. 17 is a diagram of one embodiment in which a higher bit rate is applied to the modular portion of the compressed data stream.
FIG. 18 is a diagram showing the relationship of DCT harmonics between two resolution layers.
FIG. 19 is a diagram showing a similar relationship of DCT harmonics between three resolution layers.
FIG. 20 is a diagram illustrating a set of DCT block sizes matched to a multi-resolution layer.
FIG. 21 is a diagram illustrating an example of dividing motion compensation macroblocks to determine independent motion vectors.
FIG. 22 is a block diagram showing an augmentation system for an MPEG-2 type system.
FIG. 23 is a diagram showing the use of motion vectors from the base layer as guide vectors for the resolution enhancement layer.
FIG. 24A is a data flow diagram illustrating one embodiment of a professional level enhancement mode.
FIG. 24B is a data flow diagram illustrating one embodiment of a professional level enhancement mode.
FIG. 24C is a data flow diagram illustrating one embodiment of a professional level enhancement mode.
FIG. 24D is a data flow diagram illustrating one embodiment of a professional level enhancement mode.
FIG. 24E is a data flow diagram illustrating one embodiment of a professional level enhancement mode.

Claims

A method for enhancing image quality within an image coding system, comprising:
applying a median filter to horizontal pixel values of a digital video image;
applying a median filter to vertical pixel values of the digital video image;
averaging the results of filtering the horizontal pixel values and the vertical pixel values; and
creating a weighted linear sum of the following five terms to produce a noise-reduced digital video image:
(1) the current digital video image;
(2) the average of the horizontal median and the vertical median of the current digital video image;
(3) a thresholded temporal median, obtained by comparing the difference between a pixel value of the current digital video image and the temporal median of that pixel value with a threshold, the thresholded temporal median being the current pixel value when the difference is larger than the threshold and the temporal median when the difference is smaller than the threshold;
(4) the average of the horizontal median and the vertical median of the thresholded temporal median; and
(5) the median of the thresholded temporal median and the horizontal and vertical medians of the current digital video image.

The method of claim 1, further comprising applying a median filter to diagonal pixel values of the digital video image, and averaging the results of filtering the diagonal pixel values with the noise-reduced digital video image.