JP4552263B2

JP4552263B2 - Digital signal processing apparatus and method, and digital image signal processing apparatus and method

Info

Publication number: JP4552263B2
Application number: JP2000101544A
Authority: JP
Inventors: 哲二郎近藤; 泰弘藤森
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2000-04-03
Filing date: 2000-04-03
Publication date: 2010-09-29
Anticipated expiration: 2020-04-03
Also published as: JP2001285870A

Description

【０００１】
【発明の属する技術分野】
この発明は、符号化されたディジタル画像信号に対してクラス分類適応処理を施すときに、予測精度を向上することができるディジタル画像信号処理装置および方法に関する。
【０００２】
【従来の技術】
画像信号の圧縮符号化方式のひとつとしてＭＰＥＧ２（Moving Picture Expert Group phase ２) による符号化方式が用いられている。ＭＰＥＧ２による送受信または記録再生システムでは、画像信号に対してＭＰＥＧ２による圧縮符号化処理を施して送信または記録し、また、受信または再生した画像信号に対して、ＭＰＥＧ２による圧縮符号化処理に対応する伸長復号化を施すことにより、元の画像信号を復元する。
【０００３】
ＭＰＥＧ２による符号化処理では、符号化処理に汎用性を持たせ、また、符号化による圧縮の効率を向上させるために、符号化された画像データと共に、復号化処理用の付加情報を伝送している。付加情報は、ＭＰＥＧ２のストリーム中のヘッダ中に挿入され、復号化装置に対して伝送される。
【０００４】
ＭＰＥＧに限らず、復号化によって得られる画像信号の特性は、適用される符号化復号化方式によって大きく異なる。例えば輝度信号、色差信号、三原色信号などの信号種類に応じてその物理的な特性（周波数特性等）が大きく相違する。
この相違が符号化復号化処理を経た復号信号にも残ることになる。また、一般的に画像の符号化復号化処理では、時空間の間引き処理を導入することによって、符号化の対象となる画素数を低減することが多い。間引き方法によって、画像の時空間解像度の特性が大きく相違する。さらに、時空間解像度特性の相違が小さい場合においても、符号化における圧縮率（伝送レート）の条件によってＳ／Ｎ、符号化歪み量などの画質特性が大きく異なる。
【０００５】
本願出願人は、先に、クラス分類適応処理を提案している。これは、予め（オフラインで）学習処理において、実際の画像信号（教師信号および生徒信号）を使用して予測係数をクラス毎に求め、蓄積しておき、実際の画像変換処理では、入力画像信号からクラスを求め、クラスに対応する予測係数と入力画像信号の複数の画素値との予測演算によって、出力画素値を求めるものである。クラスは、作成する画素の空間的、時間的近傍の画素値の分布、波形に対応して決定される。実際の画像信号を使用して予測係数を演算し、また、クラス毎に予測係数を演算することによって、種々の信号処理が可能なものである。例えば時空間の解像度を入力信号以上とする解像度創造の処理、サブサンプリングによって間引かれた画素の補間、ノイズの低減、エラーの修整等の処理が可能である。
【０００６】
【発明が解決しようとする課題】
クラス分類適応処理における予測精度を向上するには、クラスを決定するのに使用する複数の画素の時間および／または空間における相関が高いことが必要である。また、予測演算に使用する複数の画素の時間および／または空間における相関が高いことも、予測精度の向上に効果的である。
【０００７】
例えば、クラス分類適応処理において、対象画像信号の動き情報をクラスに導入することによって予測性能を向上することができる。その動き情報は、動きベクトルのような詳細な動き情報の表現形式が効果的である。しかしながら、符号化復号化処理を経た画像信号から動きベクトルを検出する場合には、復号画像信号の歪みのために動きベクトルの検出精度が低下し、また、動きベクトル検出のために、多量の演算処理が必要となるという問題があった。
【０００８】
従って、この発明の目的は、符号化復号化の処理を経たディジタル画像信号に対して付加情報を基づいて、クラス分類または予測演算に使用する複数のデータの抽出範囲または位置を変更することよって、予測精度を向上することが可能なディジタル画像信号処理装置および方法を提供することにある。
【０００９】
【課題を解決するための手段】
上述した課題を解決するために、請求項１の発明は、符号化されたディジタル画像信号を復号化することによって生成される入力画像信号に対して画素単位の予測信号処理を施すようにしたディジタル画像信号処理装置において、
復号化処理用の付加情報を抽出する付加情報抽出手段と、
入力画像信号から、所定の注目画素周辺の複数の画素からなるクラスタップ領域を抽出する第１の領域切り出し手段と、
第１の領域切出し手段によって切り出されるクラスタップ領域のレベル分布の特徴量を抽出する特徴量抽出手段と、
付加情報および特徴量からクラス情報を生成するクラス情報生成手段と、
入力ディジタル画像信号から、所定の注目画素周辺の複数の画素からなる予測タップ領域を抽出する第２の領域切り出し手段と、
クラス情報生成手段で生成されたクラス情報に対応して予め決定され、処理後の出力画像信号を推定するための予測係数が記憶手段に記憶されており、
クラス情報生成ステップで生成されたクラス情報に従って、記憶手段から選択される予測係数と、第２の領域切り出し手段で抽出された予測タップ領域の複数の画素との積和演算によって、注目画素に対する画素値を予測生成するための演算処理を行う演算処理手段とを有し、
予測係数と第２の領域切り出し手段によって抽出される予測タップ領域の複数の画像データとの積和演算の計算値と、出力画像信号に対応する所定の画像信号中の真の画素値との差を最小とするように、予測係数が予め定められ、
付加情報には、処理対象画像信号の種類を表す情報、処理対象画像信号の時間および／または空間解像度情報、および符号化の圧縮率の少なくとも一つが含まれることを特徴とするディジタル画像信号処理装置である。
【００１０】
請求項３の発明は、符号化されたディジタル画像信号を復号化することによって生成される入力画像信号に対して画素単位の予測信号処理を施すようにしたディジタル画像信号処理方法において、
復号化処理用の付加情報を抽出する付加情報抽出ステップと、
入力画像信号から、所定の注目画素周辺の複数の画素からなるクラスタップ領域を抽出する第１の領域切り出しステップと、
第１の領域切出しステップによって切り出されるクラスタップ領域のレベル分布の特徴量を抽出する特徴量抽出ステップと、
付加情報および特徴量からクラス情報を生成するクラス情報生成ステップと、
入力ディジタル画像信号から、所定の注目画素周辺の複数の画素からなる予測タップ領域を抽出する第２の領域切り出しステップと、
クラス情報生成ステップで生成されたクラス情報に対応して予め決定され、処理後の出力画像信号を推定するための予測係数が記憶手段に記憶されており、
クラス情報生成ステップで生成されたクラス情報に従って、記憶手段から選択される予測係数と、第２の領域切り出しステップで抽出された予測タップ領域の複数の画素との積和演算によって、注目画素に対する画素値を予測生成するための演算処理を行う演算処理ステップとを有し、
予測係数と第２の領域切り出しステップによって抽出される予測タップ領域の複数の画像データとの積和演算の計算値と、出力画像信号に対応する所定の画像信号中の真の画素値との差を最小とするように、予測係数が予め定められ、
付加情報には、処理対象画像信号の種類を表す情報、処理対象画像信号の時間および／または空間解像度情報、および符号化の圧縮率の少なくとも一つが含まれることを特徴とするディジタル画像信号処理方法である。
【００１３】
この発明によれば、クラスタツプ領域および予測タツプ領域の少なくとも一方を復号化処理用の付加情報に基づいて変更することによって、クラス分類適応処理における予測精度を向上することができる。
【００１４】
【発明の実施の形態】
以下、この発明の一実施形態について説明する。一実施形態は、サブサンプリングで間引かれた画素をクラス分類適応処理によって補間するようにしたディジタル画像信号処理の例である。まず、図１を参照して、予測画像信号（すなわち、補間された画像信号）の生成に係る構成について説明する。入力ビットストリームが復号器１に供給される。ここでは、入力ビットストリームは、送受信システム（または記録再生システム、以下、同様である。）において、サブサンプリングされ、ＭＰＥＧ２で圧縮符号化された画像データと、付加情報等のその他のデータとである。復号器１からは、復号化された画像信号と、復号化用の付加情報とが出力される。復号器１においては、従来の処理による間引き画素の補間がなされるが、真値と補間画素値との誤差が十分には小さくならない。一実施形態では、クラス分類適応処理によって従来よりも改善された補間処理がなされ、復号器１の出力信号内の補間画素が一実施形態により生成された補間画素と置き換えられる。
【００１５】
付加情報は、復号化処理に必要な付随情報であり、入力ビットストリーム中のシーケンス層、ＧＯＰ層、ピクチャー層のそれぞれのヘッダ中に挿入されており、復号器１は、付加情報を使用して復号化処理を行い、また、付加情報を分離して出力する。付加情報中には、サブサンプリングの間引き構造を示す識別情報が含まれている。識別情報からサブサンプリングパターン生成部１０は、復号データ内の画素の位相に合わせて、補間対象画素か否かを表示するサブサンプリングパターンデータを生成する。復号データとクラス分類適応処理で生成された補間画素とがセレクタ１１に供給され、パターンデータにしたがって、セレクタ１１が制御され、その出力に間引き画素が補間された出力画像信号が得られる。なお、サブサンプリングパターンは、例えば複数の種類のものが用意されており、補間対象画素の位置によって切り替えられる。
【００１６】
付加情報は、付加情報抽出部２に供給され、クラス分類適応処理に使用される付加情報が付加情報抽出部２から選択的に出力される。この抽出された付加情報が付加情報クラス生成部３に供給される。例えばクラス分類適応処理に使用される付加情報として、以下に挙げるものがある。
【００１７】
(1) 信号種類情報：コンポーネント信号の各成分（Ｙ，Ｕ，Ｖのコンポーネント、Ｙ，Ｐｒ，Ｐｂのコンポーネント、Ｒ，Ｇ，Ｂのコンポーネント等）
(2) 画像フォーマット情報：インターレース／プログレッシブの識別情報、フィールドまたはフレーム周波数（時間解像度情報）、水平画素数や垂直ライン数の画像サイズ情報（空間解像度情報）、４：３，１６：９等のアスペクトレシオ情報
(3) 画質情報：伝送ビットレート（圧縮率）情報
(4) 動きベクトル：水平と垂直の動き量情報
画像符号化の対象信号は、種々のものがあり、上述の付加情報を含む各種制御信号を伝送することによって受信側での復号を実現している。上述の付加情報で示される種々の仕様や属性によって、復号画像信号の信号特性が大きく異なる。そこで、この特性情報をクラス分類適応処理に導入することによって、予測性能の向上が図られる。
【００１８】
復号器１からの復号化画像信号とサブサンプリングパターン生成部１０からのパターンデータとが領域切出し部４および予測タップデータ生成部５に供給される。領域切出し部４は、入力画像信号から複数の画素からなる領域を抽出し、抽出した領域に係る画素データ（サブサンプリングパターンで指示される伝送画素）を特徴量抽出部６に供給する。特徴量抽出部６は、供給される画素データに１ビットＡＤＲＣ等の処理を施すことによってＡＤＲＣコードを生成し、生成したＡＤＲＣコードをクラスコード生成部７に供給する。領域切出し部４において抽出される複数の画素領域をクラスタップと称する。クラスタップは、注目（目標）画素の空間的および／または時間的近傍に存在する複数の画素からなる領域である。後述するように、クラスは、注目（目標）画素ごとに決定される。
【００１９】
ＡＤＲＣは、クラスタップ内の画素値の最大値および最小値を求め、最大値および最小値の差であるダイナミックレンジを求め、ダイナミックレンジに適応して各画素値を再量子化するものである。１ビットＡＤＲＣの場合では、タップ内の複数の画素値の平均値より大きいか、小さいかでその画素値が１ビットに変換される。ＡＤＲＣの処理は、画素値のレベル分布を表すクラスの数を比較的小さなものにするための処理である。したがって、ＡＤＲＣに限らず、ベクトル量子化等の画素値のビット数を圧縮する符号化を使用するようにしても良い。
【００２０】
また、特徴量抽出部６からクラスコード生成部７に対して、パターンデータに基づくクラス情報が供給される。すなわち、補間の対象画素（注目画素）のサブサンプリングパターンがサブサンプリングパターンクラスとしてクラスコード生成部７に供給される。なお、サブサンプリングパターンクラスは、付加情報から付加情報クラス生成部３において生成することも可能である。
【００２１】
クラスコード生成部７には、付加情報クラス生成部３において、付加情報に基づいて生成された付加情報クラスも供給される。クラスコード生成部７は、付加情報クラスとＡＤＲＣコードとサブサンプリングパターンクラスとに基づいて、クラス分類の結果を表すクラスコードを発生し、クラスコードを予測係数ＲＯＭ８に対してアドレスとして供給する。ＲＯＭ８は、供給されるクラスコードに対応する予測係数セットを予測演算部９に出力する。予測係数セットは、後述する学習処理によって予め決定され、クラス毎に、より具体的にはクラスコードをアドレスとする形態で予測係数ＲＯＭ８に記憶されている。予測係数は、外部から予測係数のダウンロードが可能なＲＡＭの構成のメモリに蓄積しても良い。
【００２２】
一方、予測タップデータ生成部５は、入力画像信号から複数の画素からなる所定の領域（予測タップ）を抽出し、抽出した予測タップの画素データを予測演算部９に供給する。予測タップは、クラスタップと同様に、注目（目標）画素の空間的および／または時間的近傍に存在する複数の画素からなる領域である。予測タップデータ生成部５に対して、パターンデータが供給されており、パターンデータによって間引き画素と指示されるものは、予測タツプの画素として使用されない。予測演算部９は、予測タップデータ生成部５から供給される画素データと、ＲＯＭ８から供給される予測係数セットとに基づいて以下の式（１）に従う積和演算を行うことによって、予測画素値を生成し、予測画素値を出力する。予測タップと上述したクラスタップは、同一、または別々の何れでも良い。
【００２３】
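$$y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \tag{1}$$
Here, $x_1, \ldots, x_n$ are the pixel data of the prediction tap, and $w_1, \ldots, w_n$ are the prediction coefficient set. The prediction calculation is not limited to the first-order expression shown in equation (1); it may be a second- or higher-order expression, or it may be nonlinear.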
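The product-sum operation of equation (1) can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the patent's implementation: the table `COEFF_TABLE`, its class codes, and its coefficient values are hypothetical, and a dictionary stands in for the prediction coefficient ROM 8 addressed by class code.

```python
# Hypothetical coefficient table: class code -> prediction coefficient set (w_1..w_n).
# In the patent these values come from the learning process; the numbers here are made up.
COEFF_TABLE = {
    0: [0.25, 0.25, 0.25, 0.25],
    1: [0.5, 0.0, 0.0, 0.5],
}


def predict_pixel(taps, coeffs):
    """Equation (1): y = w_1*x_1 + ... + w_n*x_n for one target pixel."""
    if len(taps) != len(coeffs):
        raise ValueError("prediction tap and coefficient set must have the same length")
    return sum(w * x for w, x in zip(coeffs, taps))


def predict_for_class(class_code, taps, table=COEFF_TABLE):
    """Select the coefficient set by class code, then apply equation (1)."""
    return predict_pixel(taps, table[class_code])
```

A call such as `predict_for_class(0, [100, 104, 96, 100])` averages the four tap pixels, because the hypothetical class 0 uses equal weights.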
【００２４】
予測画像信号は、復号器１の出力画像信号中の間引き画素が補間修整されたものである。クラス分類適応処理は、固定係数のフィルタによって間引き画素を補間するのと異なり、予め実際の画像信号を使用して求めた予測係数を使用するので、より真値に近い画素値を求めるように、間引き画素を補間することができる。
【００２５】
図２は、領域切出し部４によって抽出されるクラスタップの配置の一例を示す。復号化画像信号の内で注目画素とその周辺の複数画素との合計７個の画素によってクラスタップが設定される。図３は、予測タップデータ生成部５から出力される予測タップの配置の一例を示す。復号化画像信号の内で、注目画素と注目画素を中心とした周辺の複数の画素との合計１３個の画素によって予測タップが設定される。なお、図２および図３において、実線は、第１フィールドを示し、破線が第２フィールドを示す。また、図示のタップの配置は、一例であって、種々の配置を使用することができる。
【００２６】
次に、図４を参照して、クラスコード生成部７において形成されるクラスコード（予測係数ＲＯＭのアドレス）と、予測係数ＲＯＭ８に記憶されている予測係数との一例について説明する。図４に示すクラス情報の内で、信号種類クラス、フォーマットクラス、圧縮率（伝送レート）クラス、動きベクトルクラスは、付加情報クラス生成部３で生成されるクラスである。信号特徴量クラスは、特徴量抽出部６で抽出された特徴量に基づくクラス、例えばＡＤＲＣクラスである。サブサンプリングパターンクラスは、パターンデータに基づいて特徴量抽出部６で生成されるクラスである。図４の表において、最も左側の信号種類クラスがアドレスの最上位側となり、最も右側の信号特徴量クラスが最も下位側となる。
【００２７】
信号種類クラスは、例えばＹ，Ｕ，ＶとＹ，Ｐｒ，Ｐｂとの２種類とされ、各信号種類に対応して予測係数が別々に求められ、各信号種類がクラスＫ０，Ｋ１で区別される。フォーマットクラスは、処理対象の画像の時空間解像度特性に対応したもので、例えば２種類とされ、各フォーマットクラスに対応してＦ０，Ｆ１のクラスが規定される。例えばインターレースの画像であれば、Ｆ０、プログレッシブの画像であれば、Ｆ１のクラスが割り当てられる。画像フォーマットのクラスの他の例は、フィールドまたはフレーム周波数、水平画素数または垂直ライン数である。一例として、Ｆ０，Ｆ１，Ｆ２，・・・と番号が大きくなるほど、時空間解像度が高くなる。
【００２８】
圧縮率（伝送レート）クラスは、画質情報に基づいたクラスであり、ｉ種類のクラスＲ０〜Ｒi-1 が用意されている。圧縮率が高いほど符号化歪み量が多くなる。動きベクトルクラスは、注目画素が含まれるフレーム（現フレーム）と時間的に前のフレームとの間の動きベクトルに応じたクラスであり、ｊ種類用意されている。圧縮率クラスおよび動きベクトルクラスは、個々の値でも良いが、その場合には、クラス数が多くなるので、代表的な複数の値にまとめられている。例えば適当なしきい値によって形成された複数の範囲毎に一つの代表値を設定し、その代表値に対応したクラスを設定すればよい。具体的には、水平方向および垂直方向の動きを表現した動きベクトルから静止、小さな動き、大きな動きとの３段階のクラスを形成しても良い。
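The grouping of individual values into a small number of representative classes, as described above, might be sketched as follows. This is an illustrative Python sketch only: the function names, the threshold values, and the use of the larger absolute vector component as the motion magnitude are assumptions. The text states only that ranges formed by suitable thresholds each map to one class, and that motion may be grouped into still, small motion, and large motion.

```python
from bisect import bisect_right


def range_class(value, thresholds):
    """Return the index of the threshold range that contains `value`.

    `thresholds` must be sorted in ascending order; values below the first
    threshold fall in class 0, and each threshold crossed adds 1.
    """
    return bisect_right(thresholds, value)


def motion_class(mv_x, mv_y, thresholds=(1, 5)):
    """Map a motion vector to 0 (still), 1 (small motion) or 2 (large motion).

    The thresholds (in pixels) are hypothetical.
    """
    magnitude = max(abs(mv_x), abs(mv_y))
    return range_class(magnitude, thresholds)


def rate_class(bitrate_mbps, thresholds=(4.0, 8.0, 15.0)):
    """Map a transmission bit rate to one of len(thresholds) + 1 classes (hypothetical ranges)."""
    return range_class(bitrate_mbps, thresholds)
```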
【００２９】
以上の４種類のクラスが付加情報クラス生成部３において生成されるクラスである。但し、上述したクラスは、一例であり、一部のクラスのみを使用しても良い。例えば付加情報クラスのみをクラスとして使用しても良い。そして、上述した４種類のクラスの下位側に、特徴量抽出部６において生成されたサブサンプリングパターンクラスが付加される。サブサンプリングパターンクラスとしては、ｍ種類用意されている。さらに、サブサンプリングパターンクラスの下位側に特徴量抽出部６において生成された信号特徴量クラス（例えばＡＤＲＣコードに基づくクラス）が付加される。信号特徴量クラスとしては、ｋ種類用意されている。
【００３０】
このように、４種類の付加情報クラスとサブサンプリングパターンクラスと信号特徴量クラスで定まるクラス毎に予測係数セットがＲＯＭ８に記憶されている。上述した式（１）で示される予測演算を行う時には、ｗ₁，ｗ₂，‥‥，ｗ_nのｎ個の予測係数セットが各クラス毎に存在する。
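The way the individual classes combine into a single ROM address, with the signal type class on the most significant side and the signal feature class on the least significant side, can be sketched as a mixed-radix number. The Python sketch below is illustrative: the class counts in `SIZES` are hypothetical stand-ins for the 2, 2, i, j, m, and k classes named in the text, and the patent does not state that the address is formed by exactly this arithmetic.

```python
# Number of classes per field, most significant first (hypothetical values):
# signal type, format, compression rate (i), motion vector (j),
# subsampling pattern (m), signal feature (k; 128 for a 7-pixel 1-bit ADRC tap).
SIZES = (2, 2, 4, 3, 2, 128)


def make_class_code(fields, sizes=SIZES):
    """Pack per-field class indices into one address, first field most significant."""
    if len(fields) != len(sizes):
        raise ValueError("one class index is required per field")
    code = 0
    for value, size in zip(fields, sizes):
        if not 0 <= value < size:
            raise ValueError("class index out of range")
        code = code * size + value
    return code
```

With these sizes the table holds 2 x 2 x 4 x 3 x 2 x 128 = 12288 coefficient sets, which shows why the text groups compression rate and motion vector values into a few representative classes.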
【００３１】
さらに、この発明の一実施形態では、復号器１からの復号画像信号の特性に基づいて、クラス分類のためのデータ抽出方法と、予測タップの構造を変更することによって、クラス分類適応処理の予測性能を向上する。付加情報抽出部２によって抽出される付加情報によって、復号画像信号の特徴量を抽出するクラスタップ構造を変更するために、付加情報によって領域切出し部４で抽出されるクラスタップのパターンが切り替えられる。特徴量抽出部６がＡＤＲＣによって特徴量としての波形、レベル分布を抽出する場合、対象画像の画像フォーマット情報例えば時間および／または空間解像度に応じてＡＤＲＣの対象とするクラスタツプ領域の広さが変更される。また、信号の種類によって信号特性が異なるので、クラスタップ構造を変更しても良い。さらに、画像のアスペクト比に応じてクラスタップ構造を変更することも可能である。
【００３２】
また、付加情報には、符号化復号化による画像の歪みを示す圧縮率（伝送レート情報）も含まれ、圧縮率の情報を付加情報から抽出することができる。一旦復号化された画像信号中の符号化歪み量を検出することは、難しい。異なる符号化歪み量の信号に対してクラス分類適応処理を適用した場合、予測性能の向上が困難である。そこで、この圧縮率（伝送レート情報）に対応してクラスタップの構成が変更される。さらに、動きベクトル情報に基づいてクラスタップの構成を変更することによって、時空間相関特性が高いクラスタップ構造を実現することができる。例えば静止の場合では、フレーム内でクラスタップを構成し、動きがあるときには、現在フレームに加えて前後のフレームにわたってクラスタップを構成するようになされる。
【００３３】
さらに、クラスコード生成部７で形成されたクラスコードが予測タップデータ生成部５に対して制御信号として供給される。それによって、図４に示すような付加情報を加味したクラス毎に、最適な予測タップのパターンが設定されるようになされる。上述したクラスタップの構造を付加情報によって変更するのと同様に、クラス中の付加情報に応じて予測タップの構造が変更され、クラスタップの場合と同様に、予測タップを変更することによって、予測性能を向上することができる。
【００３４】
図５は、タップ（クラスタップまたは予測タップ）の領域を付加情報に応じて変更する一例を模式的に示すものである。図５は、現フレームとその前のフレームにそれぞれ属する空間的なタップによって時空間タップを設定する例を示し、破線の枠は、タップ領域を表している。また、○が付された画素は、伝送画素を示し、×が付された画素は、非伝送画素を示す。現フレーム内の四角が付された画素は、補間の対象である注目画素を示す。
【００３５】
図５は、前フレームと現フレームとの間の動きベクトルによって、前フレームに設定される空間タップ（図５の例では、３×３画素の領域）の位置が変更される。この動き補正によって、相関が強い複数画素を使用してタップを構成することが可能となる。また、画像フォーマット情報例えば空間解像度情報Ｆ０，Ｆ１，Ｆ２に応じて、現フレームに設定される空間タップの領域が変更される。空間解像度情報Ｆ０，Ｆ１，Ｆ２は、注目された付加情報または付加情報クラスとしてクラスコード生成部７が生成するクラス情報中に含まれている。前述の図４の例では、Ｆ０，Ｆ１の２種類のクラスが存在している。
【００３６】
一例として、Ｆ０が空間解像度が最も低く、Ｆ１が空間解像度が中間で、Ｆ２が最も空間解像度が高い。空間解像度が高くなるにしたがってタップが含まれる領域が徐々に拡大される。空間解像度が低い場合には、相関の強い画素が存在する範囲が狭くなるために、タップの領域も狭いものとされる。それによって、クラス分類適応処理による補間処理の性能の向上を図ることができる。
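The two tap adjustments described above (widening the current-frame region as the spatial resolution class rises, and shifting the 3 x 3 previous-frame region by the motion vector) might look like the following Python sketch. The radii per format class, the square region shape, and the sign convention of the motion vector (assumed here to point from the previous frame to the current frame) are all assumptions made for illustration; the figures of the patent define the actual regions.

```python
# Hypothetical half-widths of the current-frame tap region for classes F0, F1, F2.
RADIUS_BY_FORMAT_CLASS = {0: 1, 1: 2, 2: 3}


def current_frame_tap(cx, cy, format_class):
    """Pixel coordinates of a square region around the target pixel (cx, cy)."""
    r = RADIUS_BY_FORMAT_CLASS[format_class]
    return [(cx + dx, cy + dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def previous_frame_tap(cx, cy, mv_x, mv_y):
    """3 x 3 region in the previous frame, shifted to follow the motion vector."""
    bx, by = cx - mv_x, cy - mv_y  # assumed convention: vector points previous -> current
    return [(bx + dx, by + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
```

In an actual tap, positions marked as non-transmitted by the subsampling pattern would still have to be dropped from these regions.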
【００３７】
次に、学習すなわちクラス毎の予測係数を求める処理について説明する。一般的には、クラス分類適応処理によって予測されるべき画像信号と同一の信号形式の画像信号（以下、教師信号と称する）と、教師信号にクラス分類適応処理の目的とされる処理（すなわち、補間処理）と関連する処理を行うことによって得られる画像信号（生徒信号）とに基づく所定の演算処理によって予測係数が決定される。ＭＰＥＧ２規格等に従う画像信号の符号化／復号化を経た画像信号を対象としてなされるクラス分類適応処理においては、学習は、例えば図６に示すような構成によって行われる。
【００３８】
学習のために、教師信号と入力画像信号が使用される。教師信号は、サブサンプリングされていない信号であり、生徒信号は、サブサンプリングされた信号である。教師信号をサブサンプリングすることによって入力画像信号を形成しても良い。入力画像信号が符号化器２１で例えばＭＰＥＧ２によって符号化される。符号化器２１の出力信号が図１における入力信号に相当する。符号化器２１の出力信号が復号器２２に供給される。復号器２２からの復号画像信号が生徒信号として使用される。また、復号器２２で分離された復号用の付加情報が付加情報抽出部２３に供給され、付加情報が抽出される。さらに、サブサンプリングパターン生成部３２においてパターンデータが生成され、伝送画素および非伝送画素の位置を指示するパターンデータがサブサンプリングパターン生成部３２から出力される。
【００３９】
抽出された付加情報は、付加情報クラス生成部２４および領域切出し部２５に供給される。付加情報は、上述したのと同様に、信号種類情報、画像フォーマット情報、画質情報、動きベクトル等である。また、サブサンプリングパターン生成部３２からのパターンデータが領域切出し部２５および予測タップデータ生成部２６に供給される。
【００４０】
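The decoded image signal from the decoder 22, that is, the student signal, is supplied to the region cutout unit 25 and the prediction tap data generation unit 26. As in the configuration of FIG. 1, the region cutout unit 25 is controlled by the additional information extracted by the additional information extraction unit 23, and the prediction tap data generation unit 26 is controlled by the additional information class among the classes generated by the class code generation unit 28. This makes it possible to set taps composed of a plurality of pixels having high temporal and/or spatial correlation. The class tap data extracted by the region cutout unit 25 is supplied to the feature quantity extraction unit 27, which extracts a feature quantity by processing such as ADRC. This feature quantity is supplied to the class code generation unit 28. Based on the additional information class, the ADRC code, and the subsampling pattern, the class code generation unit 28 generates a class code representing the result of class classification. The class code is supplied to the normal equation addition unit 29.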
【００４１】
一方、予測タップデータ生成部２６により抽出された予測タップの画素データであって、伝送画素データが正規方程式加算部２９に供給される。正規方程式加算部２９は、予測タップデータ生成部２６の出力と、教師信号とに基づく所定の演算処理によって、クラスコード生成部２８から供給されるクラスコードに対応する予測係数セットを解とする正規方程式のデータを生成する。正規方程式加算部２９の出力は、予測係数算出部３０に供給される。
【００４２】
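The prediction coefficient calculation unit 30 performs arithmetic processing for solving the normal equations based on the supplied data. The prediction coefficient sets calculated by this processing are supplied to the memory 31 and stored there. Before the image conversion processing for prediction estimation is performed, the contents of the memory 31 are loaded into the prediction coefficient ROM 8 in FIG. 1.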
【００４３】
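The normal equations are described below. In equation (1) above, the prediction coefficient set $w_1, \ldots, w_n$ consists of undetermined coefficients before learning. Learning is performed by inputting a plurality of teacher signals for each class. When the number of kinds of teacher signals is denoted by $m$, the following equation (2) is set up from equation (1).
【００４４】
$$y_k = w_1 x_{k1} + w_2 x_{k2} + \cdots + w_n x_{kn} \quad (k = 1, 2, \ldots, m) \tag{2}$$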
【００４５】
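When $m > n$, the system has more equations than unknowns, so the prediction coefficient set $w_1, \ldots, w_n$ is not uniquely determined (in general no coefficient set satisfies all $m$ equations exactly). Therefore, the elements $e_k$ of an error vector $e$ are defined by equation (3) below, and the prediction coefficient set is chosen so as to minimize the squared magnitude of the error vector $e$ defined by equation (4). That is, the prediction coefficient set is uniquely determined by the so-called least squares method.
【００４６】
$$e_k = y_k - \{ w_1 x_{k1} + w_2 x_{k2} + \cdots + w_n x_{kn} \} \quad (k = 1, 2, \ldots, m) \tag{3}$$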
【００４７】
【数１】
$$e^2 = \sum_{k=1}^{m} e_k^2 \tag{4}$$
【００４８】
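As a practical way of finding the prediction coefficient set that minimizes $e^2$ in equation (4), $e^2$ is partially differentiated with respect to each prediction coefficient $w_i$ ($i = 1, 2, \ldots, n$) as in equation (5), and each $w_i$ is determined so that the partial derivative becomes 0 for every value of $i$.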
【００４９】
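【数２】
$$\frac{\partial e^2}{\partial w_i} = \sum_{k=1}^{m} 2 \left( \frac{\partial e_k}{\partial w_i} \right) e_k = -\sum_{k=1}^{m} 2\, x_{ki}\, e_k \quad (i = 1, 2, \ldots, n) \tag{5}$$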
【００５０】
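A concrete procedure for determining each prediction coefficient $w_i$ from equation (5) is as follows. When $X_{ji}$ and $Y_i$ are defined as in equations (6) and (7), equation (5) can be written in the matrix form of equation (8).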
【００５１】
【数３】
$$X_{ji} = \sum_{p=1}^{m} x_{pi}\, x_{pj} \tag{6}$$
【００５２】
【数４】
$$Y_i = \sum_{k=1}^{m} x_{ki}\, y_k \tag{7}$$
【００５３】
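【数５】
$$\begin{bmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nn} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \tag{8}$$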
【００５４】
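Equation (8) is what is generally called the normal equation. The prediction coefficient calculation unit 30 calculates the prediction coefficients $w_i$ by solving the normal equation (8) with a general matrix solution method such as the sweep-out method (Gauss-Jordan elimination).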
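The accumulation of the normal equation and its solution by the sweep-out method can be sketched in plain Python for a single class. This is a minimal illustration under stated assumptions: the function names are hypothetical, partial pivoting is added for numerical safety although the text does not mention it, and a real learning run would keep one matrix and one vector per class code.

```python
def accumulate(X, Y, taps, teacher):
    """Add one training sample: X[i][j] += x_i * x_j and Y[i] += x_i * y."""
    n = len(taps)
    for i in range(n):
        for j in range(n):
            X[i][j] += taps[i] * taps[j]
        Y[i] += taps[i] * teacher


def solve_sweep_out(X, Y):
    """Solve X w = Y by Gauss-Jordan elimination with partial pivoting."""
    n = len(Y)
    a = [row[:] + [y] for row, y in zip(X, Y)]  # augmented matrix [X | Y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("normal equation is singular; more training data is needed")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]
```

Feeding the function samples generated from known coefficients recovers those coefficients, which is a convenient sanity check before running it on real teacher and student signals.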
【００５５】
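The prediction coefficients can also be generated by software processing as shown in the flowchart of FIG. 7. Processing starts at step S1. In step S2, student signals are generated, thereby producing the learning data necessary and sufficient for generating the prediction coefficients. In step S3, it is determined whether the necessary and sufficient learning data has been obtained; if it is determined that it has not yet been obtained, the process proceeds to step S4.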
【００５６】
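In step S4, the class is determined from the feature quantity extracted from the student signal, the additional information, and the pattern data. In step S5, a normal equation is generated for each class, and the process returns to step S2. By repeating the same procedure, the normal equations necessary and sufficient for generating the prediction coefficient sets are produced.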
【００５７】
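When it is determined in step S3 that the necessary and sufficient learning data has been obtained, the process proceeds to step S6. In step S6, the normal equations are solved by the sweep-out method to generate the prediction coefficient set $w_1, w_2, \ldots, w_n$ for each class. In step S7, the generated prediction coefficient set $w_1$ to $w_n$ for each class is stored in the memory, and the learning process ends in step S8.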
【００５８】
次に、時間および／または空間解像度を創造するようにしたこの発明の他の実施形態について説明する。図８は、予測画像信号の生成に係る構成を示す。上述した一実施形態の構成（図１）と対応する部分には、同一の参照符号を付してその説明は省略する。例えばＭＰＥＧの復号器１からは、復号化された画像信号と、復号化用の付加情報とが出力される。付加情報は、付加情報抽出部２に供給され、クラス分類適応処理に使用される付加情報が付加情報抽出部２から選択的に出力される。この抽出された付加情報が付加情報クラス生成部３に供給される。
【００５９】
復号器１からの復号化画像信号が領域切出し部４および予測タップデータ生成部５に供給される。領域切出し部４は、入力画像信号から複数の画素からなる領域を抽出し、抽出した領域に係る画素データを特徴量抽出部６に供給する。特徴量抽出部６は、ＡＤＲＣコードを生成し、生成したＡＤＲＣコードをクラスコード生成部７に供給する。クラスコード生成部７には、付加情報クラス生成部３において、付加情報に基づいて生成された付加情報クラスも供給される。クラスコード生成部７は、付加情報クラスとＡＤＲＣコードに基づいて、クラス分類の結果を表すクラスコードを発生し、クラスコードを予測係数ＲＯＭ８に対してアドレスとして供給する。ＲＯＭ８は、供給されるクラスコードに対応する予測係数セットを予測演算部９に出力する。予測係数セットは、上述した学習処理によって予め決定され、クラス毎に、より具体的にはクラスコードをアドレスとする形態で予測係数ＲＯＭ８に記憶されている。
【００６０】
一方、予測タップデータ生成部５は、入力画像信号から複数の画素からなる所定の領域（予測タップ）を抽出し、抽出した予測タップの画素データを予測演算部９に供給する。予測タップは、クラスタップと同様に、注目（目標）画素の空間的および／または時間的近傍に存在する複数の画素からなる領域である。予測演算部９は、予測タップデータ生成部５から供給される画素データと、ＲＯＭ８から供給される予測係数セットとに基づいて積和演算を行うことによって、予測画素値を生成し、予測画素値を出力する。
【００６１】
予測画像信号は、復号器１の出力画像信号に対して、空間解像度がより高いものとされたものである。例えば、水平方向および垂直方向のそれぞれに関して画素数が元の画像の２倍とされた画像信号が出力される。クラス分類適応処理は、平均値等で画素を補間するものとは異なり、予め実際の画像信号を使用して求めた予測係数を使用するので、解像度を創造することができる処理である。また、この発明は、空間解像度に限らず、時間解像度を高くする処理に対しても適用できる。例えばフィールド周波数を６０Hzから１２０Hzとする処理に対しても適用することができる。さらに、時空間（空間および時間）の解像度を高くする処理を行うようにしても良い。
【００６２】
図９は、クラスコード生成部７において形成されるクラスコード（予測係数ＲＯＭのアドレス）と、予測係数ＲＯＭ８に記憶されている予測係数との一例を示す。一実施形態におけるクラスコード（図４参照）と比較すると、サブサンプリングクラスが含まれていない点を除いて同様のクラス情報が使用される。
【００６３】
このように、４種類の付加情報クラスと１種類の信号特徴量クラスとで定まるクラス毎に予測係数セットがＲＯＭ８に記憶されている。上述した式（１）で示される予測演算を行う時には、ｗ₁，ｗ₂，‥‥，ｗ_nのｎ個の予測係数セットが各クラス毎に存在する。
【００６４】
さらに、復号器１からの復号画像信号の特性に基づいて、クラス分類のためのデータ抽出方法と、予測タップの構造を変更することによって、クラス分類適応処理の予測性能を向上するようにしている。すなわち、付加情報抽出部２によって抽出される付加情報によって、復号画像信号の特徴量を抽出するクラスタップ構造を変更するために、付加情報例えば対象画像の時間および／または空間解像度によって領域切出し部４で抽出されるクラスタップの大きさまたは位置が切り替えられる。また、信号の種類によって信号特性が異なるので、クラスタップ構造が変更されるようにしても良く、アスペクト比に応じてクラスタップ構造を変更しても良い。また、圧縮率（伝送レート情報）に対応してクラスタップの構成を変更しても良い。さらに、動きベクトル情報に基づいてクラスタップの構成を変更することによって、時空間相関特性が高いクラスタップ構造を実現することができる。
【００６５】
さらに、クラスコード生成部７で形成されたクラスコードが予測タップデータ生成部５に対して制御信号として供給される。それによって、付加情報を加味したクラス毎に、最適な予測タップのパターンが設定されるようになされる。上述したクラスタップの構造を付加情報によって変更するのと同様に、クラス中の付加情報クラスに応じて予測タップの構造が変更され、クラスタップの場合と同様に、予測タップを変更することによって、予測性能を向上することができる。
【００６６】
図１０は、タップ（クラスタップまたは予測タップ）の領域を付加情報に応じて変更する一例を示すものである。図１０は、空間解像度と時間解像度の両者を創造する例を示している。すなわち、時間的に連続するフレーム（またはフィールド）Ｔ０，Ｔ１の中間に新たなフレームＴ’を作成し、また、元の画素数の４倍の画素数を作成する。
【００６７】
タップは、復号画像信号中に存在するフレームＴ０およびＴ１に属する画像中に構成される空間タップを合わせた時空間タップとされる。画像フォーマット情報例えば空間解像度情報Ｆ０，Ｆ１，Ｆ２に応じて、タップが含まれる範囲の領域が変更される。具体的なタップ構造は、これらの何れかの領域内に構成される。空間解像度情報Ｆ０，Ｆ１，Ｆ２は、付加情報クラスとしてクラスコード生成部７が生成するクラス情報中に含まれている。
【００６８】
一例として、Ｆ０が空間解像度が最も低く、Ｆ１が空間解像度が中間で、Ｆ２が最も空間解像度が高い。空間解像度が高くなるにしたがってタップが含まれる領域が徐々に拡大される。空間解像度が低い場合には、相関の強い画素が存在する範囲が狭くなるために、タップの領域も狭いものとされる。それによって、クラス分類適応処理による解像度創造の性能の向上を図ることができる。
【００６９】
また、フレームＴ０およびＴ１間の動き量に応じてタップが含まれる領域の位置が変更される。クラスタップの場合では、付加情報中の動きベクトルに応じてタップが含まれる領域の位置が変更される。予測タップの場合では、クラス情報中の動きベクトルクラスに応じてタップが含まれる領域の位置が変更される。このように領域の位置を変更することによって、より高い相関を持つ領域からクラスタップの画素を切出し、または予測タップの画素を抽出することができる。それによって、クラス分類適応処理の予測精度を向上することができる。
【００７０】
解像度のより高い出力画像信号を生成する他の実施形態において、教師信号として解像度の高い信号を使用し、復号画像信号に対応する生徒信号を使用することで、予測係数を予め求め、予測係数ＲＯＭ８に蓄積する。また、ソフトウェア処理によって予測係数を求めるようにしても良い。
【００７１】
この発明は、上述したこの発明の一実施形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。例えばＭＰＥＧ２に限らず、ＭＰＥＧ４等の他の符号化方法を使用する場合に対して、この発明を適用することができる。
【００７２】
【発明の効果】
上述したようにこの発明では、復号用付加情報を用いることによって、対象信号の属性や、特性を反映した適切なクラスタツプおよび／または予測タップを設定することが可能となり、クラス分類適応処理の予測精度を向上することができる。すなわち、クラスを決定するのに使用する複数の画素の時空間の相関が高いものとなり、また、予測演算に使用する複数の画素の時空間の相関が高いものとなり、クラス分類適応処理における予測精度を向上することができる。
【００７３】
また、この発明では、対象とする復号信号の動きベクトル情報によってクラスタツプ、予測タツプの位置を切り替えている。この動きベクトル情報を復号信号から検出するのではなく、付加情報として伝送される動きベクトル情報を使用するので、動きベクトル検出に必要とされる膨大な演算を回避でき、しかも、動きベクトルの精度を高くできる。
【図面の簡単な説明】
【図１】この発明の一実施形態の構成を示すブロック図である。
【図２】クラスタップの画素配置の一例の略線図である。
【図３】予測タップの画素配置の一例の略線図である。
【図４】一実施形態における付加情報および特徴量に基づくクラスの一例を示す略線図である。
【図５】この発明の一実施形態を説明するための略線図である。
【図６】クラス分類適応処理を行う場合の予測係数の学習処理に係る構成の一例を示すブロック図である。
【図７】学習処理をソフトウェアで行う時の処理を示すフローチャートである。
【図８】この発明の他の実施形態の構成を示すブロック図である。
【図９】他の実施形態における付加情報および特徴量に基づくクラスの一例を示す略線図である。
【図１０】この発明の他の実施形態を説明するための略線図である。
【符号の説明】
１，２２・・・復号器、２，２３・・・付加情報抽出部、３，２４・・・付加情報クラス生成部、４，２５・・・領域切出し部、５，２６・・・予測タップデータ生成部、６，２７・・・特徴量抽出部，７，２８・・・クラスコード生成部、８・・・予測係数ＲＯＭ、９・・・予測演算部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a digital image signal processing apparatus and method capable of improving prediction accuracy when class classification adaptive processing is applied to an encoded digital image signal.
[0002]
[Prior art]
As one of the compression encoding methods for image signals, the MPEG2 (Moving Picture Experts Group phase 2) encoding method is used. In a transmission/reception or recording/reproduction system based on MPEG2, an image signal is compression-encoded according to MPEG2 and then transmitted or recorded, and the received or reproduced image signal is restored to the original image signal by decompression decoding corresponding to the MPEG2 compression encoding.
[0003]
In MPEG2 encoding, additional information for the decoding process is transmitted together with the encoded image data in order to make the encoding process versatile and to improve the compression efficiency of the encoding. The additional information is inserted into headers in the MPEG2 stream and transmitted to the decoding apparatus.
[0004]
The characteristics of an image signal obtained by decoding, not limited to MPEG, vary greatly depending on the encoding / decoding method applied. For example, the physical characteristics (frequency characteristics, etc.) differ greatly depending on the signal type such as a luminance signal, color difference signal, and three primary color signal.
This difference also remains in the decoded signal that has undergone the encoding / decoding process. Also, in general, in image coding / decoding processing, the number of pixels to be encoded is often reduced by introducing spatiotemporal thinning processing. The spatio-temporal resolution characteristics of images differ greatly depending on the thinning method. Furthermore, even when the difference in spatio-temporal resolution characteristics is small, image quality characteristics such as S / N and coding distortion amount greatly differ depending on the compression rate (transmission rate) conditions in encoding.
[0005]
The applicant of the present application has previously proposed class classification adaptive processing. In this processing, prediction coefficients are obtained for each class in advance (offline) in a learning process that uses actual image signals (a teacher signal and a student signal), and are stored. In the actual image conversion process, a class is determined from the input image signal, and an output pixel value is obtained by a prediction calculation between the prediction coefficients corresponding to that class and a plurality of pixel values of the input image signal. The class is determined according to the distribution and waveform of pixel values in the spatial and temporal vicinity of the pixel to be created. Because the prediction coefficients are calculated from actual image signals, and are calculated separately for each class, various kinds of signal processing are possible. Examples include resolution creation, in which the spatio-temporal resolution is made higher than that of the input signal, interpolation of pixels thinned out by subsampling, noise reduction, and error correction.
[0006]
[Problems to be solved by the invention]
In order to improve the prediction accuracy in the class classification adaptive process, it is necessary that the correlation in time and / or space of a plurality of pixels used to determine a class is high. In addition, a high correlation in time and / or space of a plurality of pixels used for prediction calculation is also effective in improving prediction accuracy.
[0007]
For example, in class classification adaptive processing, prediction performance can be improved by introducing motion information of the target image signal into the class. A detailed representation such as a motion vector is effective as this motion information. However, when a motion vector is detected from an image signal that has undergone encoding and decoding, the detection accuracy decreases because of distortion in the decoded image signal, and the detection itself requires a large amount of computation.
[0008]
Therefore, an object of the present invention is to provide a digital image signal processing apparatus and method capable of improving prediction accuracy by changing, based on additional information, the extraction range or position of the plurality of data used for class classification or prediction calculation for a digital image signal that has undergone encoding and decoding.
[0009]
[Means for Solving the Problems]
In order to solve the above-described problems, the invention of claim 1 is a digital image signal processing apparatus that performs pixel-by-pixel prediction signal processing on an input image signal generated by decoding an encoded digital image signal, the apparatus comprising:
additional information extraction means for extracting additional information for the decoding process;
first region cutout means for extracting, from the input image signal, a class tap region composed of a plurality of pixels around a predetermined target pixel;
feature quantity extraction means for extracting a feature quantity of the level distribution of the class tap region cut out by the first region cutout means;
class information generation means for generating class information from the additional information and the feature quantity;
second region cutout means for extracting, from the input digital image signal, a prediction tap region composed of a plurality of pixels around the predetermined target pixel;
storage means in which prediction coefficients, determined in advance in correspondence with the class information generated by the class information generation means and used for estimating the processed output image signal, are stored; and
arithmetic processing means for performing arithmetic processing that predictively generates a pixel value for the target pixel by a product-sum operation between the prediction coefficients selected from the storage means according to the class information generated in the class information generation step and the plurality of pixels of the prediction tap region extracted by the second region cutout means,
wherein the prediction coefficients are determined in advance so as to minimize the difference between the calculated value of the product-sum operation of the prediction coefficients and the plurality of image data of the prediction tap region extracted by the second region cutout means, and the true pixel value in a predetermined image signal corresponding to the output image signal, and
the additional information includes at least one of information indicating the type of the image signal to be processed, temporal and/or spatial resolution information of the image signal to be processed, and the compression rate of the encoding.
[0010]
The invention of claim 3 is a digital image signal processing method that performs pixel-by-pixel prediction signal processing on an input image signal generated by decoding an encoded digital image signal, the method comprising:
an additional information extraction step of extracting additional information for the decoding process;
a first region cutout step of extracting, from the input image signal, a class tap region composed of a plurality of pixels around a predetermined target pixel;
a feature quantity extraction step of extracting a feature quantity of the level distribution of the class tap region cut out in the first region cutout step;
a class information generation step of generating class information from the additional information and the feature quantity;
a second region cutout step of extracting, from the input digital image signal, a prediction tap region composed of a plurality of pixels around the predetermined target pixel; and
an arithmetic processing step of performing arithmetic processing that predictively generates a pixel value for the target pixel by a product-sum operation between prediction coefficients selected from storage means according to the class information generated in the class information generation step and the plurality of pixels of the prediction tap region extracted in the second region cutout step,
wherein the prediction coefficients, determined in advance in correspondence with the class information generated in the class information generation step and used for estimating the processed output image signal, are stored in the storage means,
the prediction coefficients are determined in advance so as to minimize the difference between the calculated value of the product-sum operation of the prediction coefficients and the plurality of image data of the prediction tap region extracted in the second region cutout step, and the true pixel value in a predetermined image signal corresponding to the output image signal, and
the additional information includes at least one of information indicating the type of the image signal to be processed, temporal and/or spatial resolution information of the image signal to be processed, and the compression rate of the encoding.
[0013]
According to the present invention, the prediction accuracy of class classification adaptive processing can be improved by changing at least one of the class tap region and the prediction tap region based on the additional information for the decoding process.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described. This embodiment is an example of digital image signal processing in which pixels thinned out by subsampling are interpolated by class classification adaptive processing. First, the configuration for generating a predicted image signal (that is, an interpolated image signal) will be described with reference to FIG. 1. An input bit stream is supplied to the decoder 1. Here, the input bit stream consists of image data that has been subsampled and compression-encoded by MPEG2 in the transmission/reception system (or recording/reproduction system; the same applies hereinafter), together with other data such as additional information. The decoder 1 outputs a decoded image signal and the additional information for decoding. In the decoder 1, the thinned pixels are interpolated by conventional processing, but the error between the true value and the interpolated pixel value does not become sufficiently small. In this embodiment, an interpolation process improved over the conventional one is performed by class classification adaptive processing, and the interpolated pixels in the output signal of the decoder 1 are replaced with the interpolated pixels generated by this embodiment.
[0015]
The additional information is accompanying information necessary for the decoding process, and is inserted into the headers of the sequence layer, the GOP layer, and the picture layer in the input bit stream. The decoder 1 performs the decoding process using the additional information, and also separates and outputs the additional information. The additional information includes identification information indicating the thinning structure of the subsampling. From this identification information, the subsampling pattern generation unit 10 generates subsampling pattern data that indicates, in step with the phase of the pixels in the decoded data, whether each pixel is an interpolation target pixel. The decoded data and the interpolated pixels generated by the class classification adaptive processing are supplied to the selector 11, and the selector 11 is controlled according to the pattern data, so that an output image signal in which the thinned pixels have been interpolated is obtained at its output. Note that, for example, a plurality of types of subsampling patterns are prepared, and they are switched depending on the position of the interpolation target pixel.
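The role of the pattern data and the selector 11 can be sketched as follows. This is an illustrative Python sketch, not the patent's circuit: the checkerboard (quincunx) thinning structure produced by `quincunx_pattern` is a hypothetical example of a subsampling pattern, and images are represented as lists of rows.

```python
def quincunx_pattern(width, height, phase=0):
    """Hypothetical pattern data: True marks a thinned pixel that must be interpolated."""
    return [[(x + y + phase) % 2 == 1 for x in range(width)] for y in range(height)]


def select_output(decoded, predicted, pattern):
    """Keep transmitted pixels from the decoder and take thinned pixels from the predictor."""
    return [
        [p if thinned else d for d, p, thinned in zip(d_row, p_row, t_row)]
        for d_row, p_row, t_row in zip(decoded, predicted, pattern)
    ]
```

Only positions flagged by the pattern take the predicted value, so transmitted pixels pass through unchanged.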
[0016]
The additional information is supplied to the additional information extraction unit 2, and the additional information used for the class classification adaptation process is selectively output from the additional information extraction unit 2. The extracted additional information is supplied to the additional information class generation unit 3. For example, the following information is used as additional information used in the classification adaptation process.
[0017]
(1) Signal type information: each component of the component signal (Y, U, V component, Y, Pr, Pb component, R, G, B component, etc.)
(2) Image format information: interlace / progressive identification information, field or frame frequency (time resolution information), image size information (spatial resolution information) such as the number of horizontal pixels and the number of vertical lines, 4: 3, 16: 9, etc. Aspect ratio information
(3) Image quality information: Transmission bit rate (compression rate) information
(4) Motion vector: horizontal and vertical motion amount information
There are various kinds of signals subject to image encoding, and decoding on the receiving side is realized by transmitting various control signals including the above-mentioned additional information. The signal characteristics of the decoded image signal vary greatly depending on the various specifications and attributes indicated by the additional information. Therefore, prediction performance can be improved by introducing this characteristic information into the class classification adaptive processing.
[0018]
The decoded image signal from the decoder 1 and the pattern data from the subsampling pattern generation unit 10 are supplied to the region cutout unit 4 and the prediction tap data generation unit 5. The region cutout unit 4 extracts a region composed of a plurality of pixels from the input image signal, and supplies the pixel data of the extracted region (the transmitted pixels indicated by the subsampling pattern) to the feature quantity extraction unit 6. The feature quantity extraction unit 6 generates an ADRC code by applying processing such as 1-bit ADRC to the supplied pixel data, and supplies the generated ADRC code to the class code generation unit 7. The region of a plurality of pixels extracted by the region cutout unit 4 is referred to as a class tap. The class tap is a region composed of a plurality of pixels existing in the spatial and/or temporal vicinity of the pixel of interest (target pixel). As will be described later, the class is determined for each pixel of interest.
[0019]
In ADRC, the maximum value and the minimum value of pixel values in a class tap are obtained, a dynamic range that is a difference between the maximum value and the minimum value is obtained, and each pixel value is requantized in accordance with the dynamic range. In the case of 1-bit ADRC, the pixel value is converted to 1 bit depending on whether it is larger or smaller than the average value of a plurality of pixel values in the tap. The ADRC process is a process for making the number of classes representing the level distribution of pixel values relatively small. Therefore, not only ADRC but also encoding that compresses the number of bits of a pixel value such as vector quantization may be used.
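The two forms of ADRC described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, ties with the average are coded as 1, and the `+ 1` in the dynamic range is a common convention that the text does not specify.

```python
def adrc_1bit(tap):
    """1-bit ADRC as described in the text: compare each pixel with the tap average."""
    mean = sum(tap) / len(tap)
    code = 0
    for p in tap:
        code = (code << 1) | (1 if p >= mean else 0)  # tie handling is an assumption
    return code


def adrc_code(tap, bits=1):
    """General ADRC: requantize each pixel to `bits` bits within the tap's dynamic range."""
    lo, hi = min(tap), max(tap)
    dynamic_range = hi - lo + 1  # the +1 avoids division by zero on flat taps (assumed)
    levels = 1 << bits
    code = 0
    for p in tap:
        q = (p - lo) * levels // dynamic_range  # quantized level, 0 .. levels - 1
        code = (code << bits) | q
    return code
```

For a 7-pixel class tap, 1-bit ADRC yields at most 2^7 = 128 signal feature classes, compared with 2^56 if the raw 8-bit pixel values were used directly.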
[0020]
In addition, class information based on the pattern data is supplied from the feature amount extraction unit 6 to the class code generation unit 7. That is, the sub-sampling pattern of the interpolation target pixel (target pixel) is supplied to the class code generation unit 7 as a sub-sampling pattern class. The sub-sampling pattern class can also be generated from the additional information by the additional information class generation unit 3.
[0021]
The additional information class generated based on the additional information in the additional information class generating unit 3 is also supplied to the class code generating unit 7. Based on the additional information class, ADRC code, and sub-sampling pattern class, the class code generation unit 7 generates a class code representing the result of class classification, and supplies the class code to the prediction coefficient ROM 8 as an address. The ROM 8 outputs a prediction coefficient set corresponding to the supplied class code to the prediction calculation unit 9. The prediction coefficient set is determined in advance by a learning process, which will be described later, and is stored in the prediction coefficient ROM 8 for each class, more specifically, in a form using the class code as an address. The prediction coefficient may be stored in a memory having a RAM configuration in which the prediction coefficient can be downloaded from the outside.
[0022]
On the other hand, the prediction tap data generation unit 5 extracts a predetermined region (prediction tap) composed of a plurality of pixels from the input image signal, and supplies the pixel data of the extracted prediction tap to the prediction calculation unit 9. Like the class tap, the prediction tap is a region composed of a plurality of pixels existing in the spatial and/or temporal vicinity of the target pixel. The pattern data is supplied to the prediction tap data generation unit 5, and pixels designated as thinned pixels by the pattern data are not used as prediction tap pixels. The prediction calculation unit 9 predicts the pixel value by performing a product-sum operation according to the following equation (1) on the pixel data supplied from the prediction tap data generation unit 5 and the prediction coefficient set supplied from the ROM 8, and outputs the predicted pixel value. The prediction tap and the class tap described above may be the same or different.
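The exclusion of thinned pixels from the prediction tap can be sketched as follows. This is a minimal Python illustration, not the implementation of the embodiment: the function names, the row-major image layout, and the checkerboard sub-sampling pattern are all assumptions introduced here.

```python
def checkerboard_is_transmitted(x, y):
    """Hypothetical sub-sampling pattern: pixels with even x + y are transmitted."""
    return (x + y) % 2 == 0


def prediction_tap_values(image, tap_positions, is_transmitted):
    """Collect the values of the tap pixels that the pattern marks as transmitted.

    image is a list of rows (image[y][x]); tap_positions is a list of (x, y).
    Pixels marked as thinned by the pattern are skipped.
    """
    values = []
    for x, y in tap_positions:
        if is_transmitted(x, y):
            values.append(image[y][x])
    return values


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
positions = [(x, y) for y in range(3) for x in range(3)]
tap = prediction_tap_values(image, positions, checkerboard_is_transmitted)
```

Because the number of usable tap pixels depends on the pattern, a coefficient set of matching length would be needed for each pattern, which is consistent with treating the sub-sampling pattern as part of the class.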
[0023]
$$ y = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \qquad (1) $$
Here, $x_1, \ldots, x_n$ are the pixel data of the prediction tap, and $w_1, \ldots, w_n$ are the prediction coefficient set. The prediction calculation is not limited to the linear expression shown in equation (1); it may be an expression of second or higher order, or may be nonlinear.
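The product-sum operation of equation (1), with the coefficient set selected by class code, might look like the following Python sketch. The dictionary standing in for the prediction coefficient ROM, the function name, and the coefficient values are hypothetical and are used only for illustration.

```python
def predict_pixel(coefficient_table, class_code, tap_pixels):
    """Evaluate equation (1): y = w_1*x_1 + ... + w_n*x_n for one target pixel.

    coefficient_table maps a class code to its coefficient set (a stand-in
    for the prediction coefficient ROM addressed by class code).
    """
    coefficients = coefficient_table[class_code]
    if len(coefficients) != len(tap_pixels):
        raise ValueError("coefficient set and prediction tap differ in length")
    return sum(w * x for w, x in zip(coefficients, tap_pixels))


# Hypothetical coefficient sets for two classes with a three-pixel tap.
coefficient_table = {
    0: [0.25, 0.5, 0.25],
    1: [0.0, 1.0, 0.0],
}
predicted = predict_pixel(coefficient_table, 0, [100, 120, 140])
```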
[0024]
The predicted image signal is obtained by interpolating the thinned pixels in the output image signal of the decoder 1. Unlike interpolation of thinned pixels with a fixed-coefficient filter, the class classification adaptive process uses prediction coefficients obtained in advance from actual image signals, so the thinned pixels can be interpolated with pixel values closer to the true values.
[0025]
FIG. 2 shows an example of the arrangement of class taps extracted by the region extraction unit 4. In the decoded image signal, a class tap is set by a total of seven pixels consisting of the target pixel and a plurality of pixels around it. FIG. 3 shows an example of the arrangement of prediction taps output from the prediction tap data generation unit 5. In the decoded image signal, a prediction tap is set by a total of 13 pixels consisting of the target pixel and a plurality of pixels around it. In FIGS. 2 and 3, the solid lines indicate the first field and the broken lines indicate the second field. The tap arrangements shown are examples, and various other arrangements can be used.
[0026]
Next, an example of the class code (the address of the prediction coefficient ROM) formed in the class code generation unit 7 and of the prediction coefficients stored in the prediction coefficient ROM 8 will be described with reference to FIG. 4. Among the class information shown in FIG. 4, the signal type class, the format class, the compression rate (transmission rate) class, and the motion vector class are classes generated by the additional information class generation unit 3. The signal feature amount class is a class based on the feature amount extracted by the feature amount extraction unit 6, for example, an ADRC class. The sub-sampling pattern class is a class generated by the feature amount extraction unit 6 based on the pattern data. In the table of FIG. 4, the leftmost signal type class is on the most significant side of the address, and the rightmost signal feature amount class is on the least significant side.
[0027]
The signal type classes are, for example, the two types Y, U, V and Y, Pr, Pb. Prediction coefficients are obtained separately for each signal type, and the signal types are distinguished by classes K0 and K1. The format class corresponds to the spatio-temporal resolution characteristics of the image to be processed. For example, when there are two types, classes F0 and F1 are defined, one for each format. For example, the F0 class is assigned to an interlaced image and the F1 class to a progressive image. Other examples of image format classes are the field or frame frequency, the number of horizontal pixels, and the number of vertical lines. As an example, the larger the number in F0, F1, F2, ..., the higher the spatio-temporal resolution.
[0028]
The compression rate (transmission rate) class is a class based on image quality information, and i types of classes R0 to Ri-1 are prepared. The higher the compression ratio, the larger the coding distortion amount. The motion vector class is a class corresponding to a motion vector between a frame including the pixel of interest (current frame) and a temporally previous frame, and j types are prepared. The compression rate class and the motion vector class may be individual values, but in that case, the number of classes increases, and therefore, the compression rate class and the motion vector class are grouped into a plurality of representative values. For example, one representative value may be set for each of a plurality of ranges formed by appropriate threshold values, and a class corresponding to the representative value may be set. Specifically, a three-stage class of stationary, small motion, and large motion may be formed from motion vectors representing horizontal and vertical motion.
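Grouping motion vectors into the three-stage class mentioned above (stationary, small motion, large motion) could be done by simple thresholding, as in the Python sketch below. The threshold values, the use of the larger of the horizontal and vertical components as the motion amount, and the function name are assumptions made for illustration; the text only states that appropriate thresholds are used.

```python
STATIONARY, SMALL_MOTION, LARGE_MOTION = 0, 1, 2


def motion_vector_class(mv_x, mv_y, small_threshold=1, large_threshold=8):
    """Map a motion vector (in pixels) to one of three representative classes.

    Assumptions: the motion amount is the larger absolute component, and the
    two thresholds are hypothetical example values.
    """
    amount = max(abs(mv_x), abs(mv_y))
    if amount < small_threshold:
        return STATIONARY
    if amount < large_threshold:
        return SMALL_MOTION
    return LARGE_MOTION


classes = [motion_vector_class(0, 0),
           motion_vector_class(2, -3),
           motion_vector_class(0, 12)]
```

The compression rate class could be grouped into representative values in the same way, with thresholds on the transmission rate instead of the motion amount.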
[0029]
The above four types of classes are generated by the additional information class generation unit 3. However, these classes are examples, and only some of them may be used. For example, only the additional information class may be used as the class. The sub-sampling pattern class generated by the feature amount extraction unit 6 is added on the lower side of the four types of classes described above. There are m types of sub-sampling pattern classes. Further, the signal feature amount class generated by the feature amount extraction unit 6 (for example, a class based on the ADRC code) is added on the lower side of the sub-sampling pattern class. There are k types of signal feature amount classes.
[0030]
As described above, a prediction coefficient set is stored in the ROM 8 for each class determined by the four types of additional information classes, the sub-sampling pattern class, and the signal feature amount class. When the prediction calculation of equation (1) described above is performed, a prediction coefficient set consisting of n coefficients $w_1, w_2, \ldots, w_n$ exists for each class.
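One way to combine the individual classes into a single ROM address, with the signal type class on the most significant side and the signal feature amount class on the least significant side, is a mixed-radix encoding, sketched below in Python. The encoding itself, the function name, and the class counts used in the example are assumptions; the text specifies only the ordering of the classes within the address.

```python
def make_class_code(indices, sizes):
    """Pack per-class indices into one address, most significant class first.

    indices and sizes are given in the order: signal type, format,
    compression rate, motion vector, sub-sampling pattern, signal feature.
    """
    if len(indices) != len(sizes):
        raise ValueError("indices and sizes must have the same length")
    code = 0
    for index, size in zip(indices, sizes):
        if not 0 <= index < size:
            raise ValueError("class index out of range")
        code = code * size + index
    return code


# Hypothetical class counts: 2 signal types, 2 formats, i = 4 rate classes,
# j = 3 motion classes, m = 2 sub-sampling patterns, k = 128 ADRC codes.
sizes = (2, 2, 4, 3, 2, 128)
address = make_class_code((1, 0, 2, 1, 0, 5), sizes)
largest_address = make_class_code((1, 1, 3, 2, 1, 127), sizes)
```

Under these example counts the ROM would need 2 × 2 × 4 × 3 × 2 × 128 = 12288 coefficient sets, which illustrates why the compression rate and motion vector classes are grouped into a few representative values.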
[0031]
Furthermore, in an embodiment of the present invention, the prediction performance of the class classification adaptive process is improved by changing the data extraction method for class classification and the structure of the prediction tap based on the characteristics of the decoded image signal from the decoder 1. To change the class tap structure used to extract the feature amount of the decoded image signal, the class tap pattern extracted by the region extraction unit 4 is switched according to the additional information extracted by the additional information extraction unit 2. When the feature amount extraction unit 6 extracts the waveform and level distribution as a feature amount by ADRC, the size of the class tap region subjected to ADRC is changed according to the image format information of the target image, for example, its temporal and/or spatial resolution. Further, since the signal characteristics differ depending on the type of signal, the class tap structure may be changed according to the signal type. Furthermore, the class tap structure can be changed according to the aspect ratio of the image.
[0032]
Further, the additional information includes a compression rate (transmission rate information) indicating image distortion caused by coding and decoding, and the compression rate information can be extracted from the additional information. It is difficult to detect the amount of encoding distortion in a once decoded image signal. When class classification adaptive processing is applied to signals with different coding distortion amounts, it is difficult to improve prediction performance. Therefore, the configuration of the class tap is changed corresponding to this compression rate (transmission rate information). Furthermore, a class tap structure with high spatiotemporal correlation characteristics can be realized by changing the configuration of the class tap based on the motion vector information. For example, in the case of stillness, a class tap is configured within a frame, and when there is movement, a class tap is configured over the previous and subsequent frames in addition to the current frame.
[0033]
Further, the class code formed by the class code generation unit 7 is supplied as a control signal to the prediction tap data generation unit 5. As a result, an optimum prediction tap pattern is set for each class in consideration of additional information as shown in FIG. Just as the class tap structure is changed according to the additional information, the structure of the prediction tap is changed according to the additional information class in the class, and, as in the case of the class tap, changing the prediction tap improves the prediction performance.
[0034]
FIG. 5 schematically shows an example of changing a tap (class tap or prediction tap) region according to additional information. FIG. 5 shows an example in which spatiotemporal taps are set by spatial taps respectively belonging to the current frame and the previous frame, and a broken-line frame represents a tap area. Also, a pixel with a circle indicates a transmission pixel, and a pixel with a cross indicates a non-transmission pixel. A pixel with a square in the current frame indicates a target pixel that is an object of interpolation.
[0035]
In FIG. 5, the position of the spatial tap set in the previous frame (a 3 × 3 pixel region in the example of FIG. 5) is changed according to the motion vector between the previous frame and the current frame. This motion correction makes it possible to form a tap from a plurality of pixels having a strong correlation. Further, the area of the spatial tap set in the current frame is changed according to the image format information, for example, the spatial resolution information F0, F1, and F2. The spatial resolution information F0, F1, and F2 is contained in the additional information of interest, or in the class information generated by the class code generation unit 7 as an additional information class. In the example of FIG. 4 described above, there are two types of classes, F0 and F1.
[0036]
As an example, F0 has the lowest spatial resolution, F1 has the middle spatial resolution, and F2 has the highest spatial resolution. As the spatial resolution increases, the area including the tap is gradually enlarged. When the spatial resolution is low, the area where pixels with strong correlation exist becomes narrow, and the tap area is also narrow. Accordingly, it is possible to improve the performance of the interpolation process by the class classification adaptive process.
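The tap switching described for FIG. 5 can be sketched in Python as follows. This is only an illustration under assumptions: the region sizes assigned to F0, F1, and F2, the convention that the motion vector is the offset from the target pixel to the matching position in the previous frame, and all names are hypothetical.

```python
# Hypothetical half-widths of the current-frame tap region per format class:
# the region grows as the spatial resolution class increases.
HALF_WIDTH_BY_FORMAT = {"F0": 1, "F1": 2, "F2": 3}


def square_region(center_x, center_y, half_width):
    """List the (x, y) positions of a square region, row by row."""
    return [(x, y)
            for y in range(center_y - half_width, center_y + half_width + 1)
            for x in range(center_x - half_width, center_x + half_width + 1)]


def spatiotemporal_tap(target_x, target_y, format_class, mv_x, mv_y):
    """Return tap positions in the current and previous frames.

    The current-frame region size follows the format class; the 3 x 3
    previous-frame region is shifted by the motion vector (assumed to point
    from the target pixel to its matching position in the previous frame).
    """
    current = square_region(target_x, target_y,
                            HALF_WIDTH_BY_FORMAT[format_class])
    previous = square_region(target_x + mv_x, target_y + mv_y, 1)
    return {"current": current, "previous": previous}


tap = spatiotemporal_tap(10, 10, "F0", 2, -1)
wide_tap = spatiotemporal_tap(10, 10, "F2", 0, 0)
```

In practice the positions would still have to be clipped to the image bounds and filtered by the sub-sampling pattern so that only transmission pixels are used.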
[0037]
Next, learning, that is, the processing for obtaining a prediction coefficient for each class, will be described. In general, the prediction coefficients are determined by a predetermined calculation process based on an image signal having the same signal format as the image signal to be predicted by the class classification adaptive process (hereinafter referred to as a teacher signal) and an image signal (student signal) obtained by applying to the teacher signal the processing related to the processing targeted by the class classification adaptive process (here, the interpolation process). For the class classification adaptive process applied to an image signal that has undergone encoding and decoding in accordance with the MPEG2 standard or the like, learning is performed with a configuration such as that shown in FIG. 6.
[0038]
Teacher signals and input image signals are used for learning. The teacher signal is an unsubsampled signal, and the student signal is a subsampled signal. The input image signal may be formed by sub-sampling the teacher signal. The input image signal is encoded by the encoder 21 using, for example, MPEG2. The output signal of the encoder 21 corresponds to the input signal in FIG. The output signal of the encoder 21 is supplied to the decoder 22. The decoded image signal from the decoder 22 is used as a student signal. Further, the additional information for decoding separated by the decoder 22 is supplied to the additional information extraction unit 23, and the additional information is extracted. Further, pattern data is generated in the sub-sampling pattern generation unit 32, and pattern data indicating the positions of the transmission pixel and the non-transmission pixel is output from the sub-sampling pattern generation unit 32.
[0039]
The extracted additional information is supplied to the additional information class generation unit 24 and the region extraction unit 25. As described above, the additional information is signal type information, image format information, image quality information, motion vectors, and the like. Further, the pattern data from the sub-sampling pattern generation unit 32 is supplied to the region extraction unit 25 and the prediction tap data generation unit 26.
[0040]
The decoded image signal from the decoder 22, that is, the student signal, is supplied to the region extraction unit 25 and the prediction tap data generation unit 26. As in the configuration of FIG. 1, the region extraction unit 25 is controlled by the additional information extracted by the additional information extraction unit 23, and the prediction tap data generation unit 26 is controlled by the additional information class in the class generated by the class code generation unit 28. Thereby, a tap can be set by a plurality of pixels having high temporal and/or spatial correlation. The class tap data extracted by the region extraction unit 25 is supplied to the feature amount extraction unit 27, which extracts the feature amount by processing such as ADRC. This feature amount is supplied to the class code generation unit 28. Based on the additional information class, the ADRC code, and the sub-sampling pattern, the class code generation unit 28 generates a class code that represents the result of class classification. The class code is supplied to the normal equation adding unit 29.
[0041]
On the other hand, the pixel data of the prediction tap, composed of transmission pixel data and extracted by the prediction tap data generation unit 26, is supplied to the normal equation adding unit 29. Based on the output of the prediction tap data generation unit 26 and the teacher signal, the normal equation adding unit 29 generates, by a predetermined calculation process, the data of a normal equation whose solution is the prediction coefficient set corresponding to the class code supplied from the class code generation unit 28. The output of the normal equation adding unit 29 is supplied to the prediction coefficient calculation unit 30.
[0042]
The prediction coefficient calculation unit 30 performs a calculation process for solving a normal equation based on the supplied data. The prediction coefficient set calculated by this calculation process is supplied to the memory 31 and stored therein. Prior to performing image conversion processing related to prediction estimation, the storage contents of the memory 31 are loaded into the prediction coefficient ROM 8 in FIG.
[0043]
The normal equation will be described below. In equation (1) above, the prediction coefficients $w_1, \ldots, w_n$ are undetermined coefficients before learning. Learning is performed by inputting a plurality of teacher signals for each class. When the number of types of teacher signals is denoted by $m$, the following equation (2) is set up from equation (1).
[0044]
$$ y_k = w_1 x_{k1} + w_2 x_{k2} + \cdots + w_n x_{kn} \qquad (2) $$
($k = 1, 2, \ldots, m$)
[0045]
When $m > n$, the prediction coefficient set $w_1, \ldots, w_n$ is not uniquely determined. Therefore, the element $e_k$ of the error vector $e$ is defined by the following equation (3), and the prediction coefficient set is determined so as to minimize the error vector $e$ as defined by equation (4). That is, the prediction coefficient set is uniquely determined by the so-called least squares method.
[0046]
$$ e_k = y_k - \{ w_1 x_{k1} + w_2 x_{k2} + \cdots + w_n x_{kn} \} \qquad (3) $$
($k = 1, 2, \ldots, m$)
[0047]
[Expression 1]

[0048]
As a practical calculation method for obtaining the prediction coefficient set that minimizes $e^2$ of equation (4), $e^2$ is partially differentiated with respect to each prediction coefficient $w_i$ ($i = 1, 2, \ldots, n$) (equation (5)), and each prediction coefficient $w_i$ is determined so that the partial derivative becomes 0 for each value of $i$.
[0049]
[Expression 2]

[0050]
A specific procedure for determining each prediction coefficient $w_i$ from equation (5) will be described. If $X_{ji}$ and $Y_i$ are defined as in equations (6) and (7), equation (5) can be written in the matrix form of equation (8).
[0051]
[Expression 3]

[0052]
[Expression 4]

[0053]
[Expression 5]

[0054]
Equation (8) is generally called a normal equation. The prediction coefficient calculation unit 30 calculates the prediction coefficients $w_i$ by solving the normal equation (8) using a general matrix solution method such as the sweep-out method (Gauss-Jordan elimination).
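For reference, equations (4) to (8) can be written out as follows. This is a reconstruction derived from equations (2) and (3) and the surrounding description by the standard least squares argument; it is not a copy of the original expressions, and the summation limits and index placement are assumptions.

```latex
\begin{align}
e^2 &= \sum_{k=1}^{m} e_k^2 \tag{4} \\
\frac{\partial e^2}{\partial w_i}
  &= \sum_{k=1}^{m} 2 \, \frac{\partial e_k}{\partial w_i} \, e_k
   = -2 \sum_{k=1}^{m} x_{ki} \, e_k \tag{5} \\
X_{ji} &= \sum_{k=1}^{m} x_{ki} \, x_{kj} \tag{6} \\
Y_i &= \sum_{k=1}^{m} x_{ki} \, y_k \tag{7} \\
\begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1n} \\
X_{21} & X_{22} & \cdots & X_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{nn}
\end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix}
&=
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \tag{8}
\end{align}
```

Setting (5) to zero and substituting (3) gives $\sum_{j} X_{ji} w_j = Y_i$ for each $i$, which is row $i$ of (8) because $X_{ji} = X_{ij}$.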
[0055]
The prediction coefficients can also be generated by software processing, as shown in the flowchart shown in FIG. The processing starts at step S1. In step S2, a student signal is generated, thereby producing the learning data needed to generate the prediction coefficients. In step S3, it is determined whether learning data necessary and sufficient for generating the prediction coefficients has been obtained. If it is determined that necessary and sufficient learning data has not yet been obtained, the process proceeds to step S4.
[0056]
In step S4, a class is determined from the feature amount extracted from the student signal, the additional information, and the pattern data. In step S5, a normal equation is generated for each class, and by returning to step S2 and repeating the same processing procedure, a normal equation necessary and sufficient for generating a prediction coefficient set is generated.
[0057]
If it is determined in step S3 that necessary and sufficient learning data has been obtained, the process proceeds to step S6. In step S6, the normal equation is solved by the sweep-out method to generate the prediction coefficient set $w_1, w_2, \ldots, w_n$ for each class. In step S7, the generated prediction coefficient set $w_1$ to $w_n$ for each class is stored in the memory, and the learning process ends in step S8.
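The software learning procedure of steps S4 to S7 could be sketched roughly as follows in Python. This is a minimal sketch under assumptions, not the procedure of the embodiment: the class code of each sample is assumed to have been determined already (step S4), the data layout and all function names are hypothetical, and the sweep-out method is implemented here as Gauss-Jordan elimination with partial pivoting.

```python
def new_accumulator(n):
    """Create zeroed normal-equation data (matrix X and vector Y) for n taps."""
    return {"X": [[0.0] * n for _ in range(n)], "Y": [0.0] * n}


def add_sample(acc, tap, teacher):
    """Step S5: add one (prediction tap, teacher pixel) pair to the normal equation."""
    n = len(tap)
    for i in range(n):
        acc["Y"][i] += tap[i] * teacher
        for j in range(n):
            acc["X"][i][j] += tap[i] * tap[j]


def solve_sweep_out(X, Y):
    """Step S6: solve X w = Y by Gauss-Jordan elimination with partial pivoting."""
    n = len(Y)
    a = [list(row) + [y] for row, y in zip(X, Y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("not enough learning data for this class")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


def learn(samples, n):
    """samples: iterable of (class_code, tap, teacher). Returns {class_code: [w_1..w_n]}."""
    accumulators = {}
    for class_code, tap, teacher in samples:
        if class_code not in accumulators:
            accumulators[class_code] = new_accumulator(n)
        add_sample(accumulators[class_code], tap, teacher)
    # Step S7: the returned dictionary plays the role of the coefficient memory.
    return {code: solve_sweep_out(acc["X"], acc["Y"])
            for code, acc in accumulators.items()}


# Toy data for one class, generated from the coefficients (0.5, 0.25).
samples = [(3, (1, 0), 0.5), (3, (0, 1), 0.25),
           (3, (2, 2), 1.5), (3, (4, 1), 2.25)]
coefficients = learn(samples, 2)
```

Accumulating only the sums X and Y per class means that the individual learning samples do not need to be kept in memory.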
[0058]
Next, another embodiment of the present invention in which time and / or spatial resolution is created will be described. FIG. 8 shows a configuration related to generation of a predicted image signal. Portions corresponding to the configuration of the above-described embodiment (FIG. 1) are assigned the same reference numerals, and descriptions thereof are omitted. For example, the MPEG decoder 1 outputs a decoded image signal and additional information for decoding. The additional information is supplied to the additional information extraction unit 2, and the additional information used for the class classification adaptation process is selectively output from the additional information extraction unit 2. The extracted additional information is supplied to the additional information class generation unit 3.
[0059]
The decoded image signal from the decoder 1 is supplied to the region extraction unit 4 and the prediction tap data generation unit 5. The region extraction unit 4 extracts a region composed of a plurality of pixels from the input image signal, and supplies the pixel data of the extracted region to the feature amount extraction unit 6. The feature amount extraction unit 6 generates an ADRC code and supplies the generated ADRC code to the class code generation unit 7. The additional information class generated from the additional information by the additional information class generation unit 3 is also supplied to the class code generation unit 7. The class code generation unit 7 generates a class code representing the result of class classification based on the additional information class and the ADRC code, and supplies the class code as an address to the prediction coefficient ROM 8. The ROM 8 outputs the prediction coefficient set corresponding to the supplied class code to the prediction calculation unit 9. The prediction coefficient set is determined in advance by the learning process described above, and is stored in the prediction coefficient ROM 8 for each class, more specifically, in a form that uses the class code as an address.
[0060]
On the other hand, the prediction tap data generation unit 5 extracts a predetermined region (prediction tap) composed of a plurality of pixels from the input image signal, and supplies the pixel data of the extracted prediction tap to the prediction calculation unit 9. Like the class tap, the prediction tap is a region composed of a plurality of pixels existing in the spatial and/or temporal vicinity of the target pixel. The prediction calculation unit 9 generates a predicted pixel value by performing a product-sum operation on the pixel data supplied from the prediction tap data generation unit 5 and the prediction coefficient set supplied from the ROM 8, and outputs the predicted pixel value.
[0061]
The predicted image signal has a higher spatial resolution than the output image signal of the decoder 1. For example, an image signal in which the number of pixels is twice that of the original image in each of the horizontal direction and the vertical direction is output. The class classification adaptive process is a process that can create a resolution because it uses a prediction coefficient obtained in advance using an actual image signal, unlike the case of interpolating pixels with an average value or the like. Further, the present invention is not limited to the spatial resolution but can be applied to processing for increasing the temporal resolution. For example, the present invention can be applied to processing for changing the field frequency from 60 Hz to 120 Hz. Furthermore, processing for increasing the resolution of space-time (space and time) may be performed.
[0062]
FIG. 9 shows an example of the class codes (addresses of the prediction coefficient ROM) formed in the class code generation unit 7 and of the prediction coefficients stored in the prediction coefficient ROM 8. Compared with the class code of the embodiment described above (see FIG. 4), the same class information is used except that the sub-sampling pattern class is not included.
[0063]
As described above, a prediction coefficient set is stored in the ROM 8 for each class determined by the four types of additional information classes and the one type of signal feature amount class. When the prediction calculation of equation (1) described above is performed, a prediction coefficient set consisting of n coefficients $w_1, w_2, \ldots, w_n$ exists for each class.
[0064]
Further, the prediction performance of the class classification adaptive process is improved by changing the data extraction method for class classification and the structure of the prediction tap based on the characteristics of the decoded image signal from the decoder 1. That is, to change the class tap structure used to extract the feature amount of the decoded image signal according to the additional information extracted by the additional information extraction unit 2, the size or position of the class tap extracted by the region extraction unit 4 is switched according to the additional information, for example, the temporal and/or spatial resolution of the target image. Further, since the signal characteristics differ depending on the type of signal, the class tap structure may be changed according to the signal type, or according to the aspect ratio. The class tap configuration may also be changed in accordance with the compression rate (transmission rate information). Furthermore, a class tap structure with high spatio-temporal correlation can be realized by changing the configuration of the class tap based on the motion vector information.
[0065]
Further, the class code formed by the class code generation unit 7 is supplied as a control signal to the prediction tap data generation unit 5. As a result, an optimum prediction tap pattern is set for each class that takes the additional information into account. Just as the class tap structure described above is changed according to the additional information, the structure of the prediction tap is changed according to the additional information class in the class, and, as in the case of the class tap, changing the prediction tap improves the prediction performance.
[0066]
FIG. 10 shows an example of changing a tap (class tap or prediction tap) area according to additional information. FIG. 10 shows an example of creating both spatial resolution and temporal resolution. That is, a new frame T ′ is created in the middle of temporally continuous frames (or fields) T0 and T1, and the number of pixels is four times the original number of pixels.
[0067]
The tap is a space-time tap that is a combination of space taps configured in images belonging to frames T0 and T1 existing in the decoded image signal. The area of the range including the tap is changed according to the image format information, for example, the spatial resolution information F0, F1, and F2. A specific tap structure is configured in any one of these regions. The spatial resolution information F0, F1, F2 is included in the class information generated by the class code generation unit 7 as an additional information class.
[0068]
As an example, F0 has the lowest spatial resolution, F1 has the middle spatial resolution, and F2 has the highest spatial resolution. As the spatial resolution increases, the area including the tap is gradually enlarged. When the spatial resolution is low, the area where pixels with strong correlation exist becomes narrow, and the tap area is also narrow. Thereby, it is possible to improve the performance of resolution creation by the class classification adaptive processing.
[0069]
Further, the position of the region including the tap is changed according to the amount of movement between the frames T0 and T1. In the case of the class tap, the position of the region including the tap is changed according to the motion vector in the additional information. In the case of the prediction tap, the position of the region including the tap is changed according to the motion vector class in the class information. By changing the position of the region in this way, it is possible to cut out the class tap pixels from the region having higher correlation or extract the prediction tap pixels. Thereby, the prediction accuracy of the class classification adaptive processing can be improved.
[0070]
In the other embodiment, which generates an output image signal having a higher resolution, prediction coefficients are obtained in advance by learning that uses a high-resolution signal as the teacher signal and a signal corresponding to the decoded image signal as the student signal, and are stored in the prediction coefficient ROM 8. The prediction coefficients may also be obtained by software processing.
[0071]
The present invention is not limited to the above-described embodiment of the present invention, and various modifications and applications can be made without departing from the gist of the present invention. For example, the present invention can be applied not only to MPEG2 but also to other encoding methods such as MPEG4.
[0072]
[Effect of the invention]
As described above, according to the present invention, by using the additional information for decoding, appropriate class taps and/or prediction taps that reflect the attributes and characteristics of the target signal can be set, and the prediction accuracy of the class classification adaptive process can be improved. In other words, because the spatio-temporal correlation of the plurality of pixels used to determine a class is high, and the spatio-temporal correlation of the plurality of pixels used for the prediction calculation is high, the prediction accuracy of the class classification adaptive process can be improved.
[0073]
In the present invention, the positions of the class tap and the prediction tap are switched according to the motion vector information of the target decoded signal. Because this motion vector information is not detected from the decoded signal but is the motion vector information transmitted as additional information, the enormous amount of computation required for motion vector detection can be avoided and the accuracy of the motion vector can be kept high.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an embodiment of the present invention.
FIG. 2 is a schematic diagram illustrating an example of pixel arrangement of class taps.
FIG. 3 is a schematic diagram illustrating an example of a pixel arrangement of prediction taps.
FIG. 4 is a schematic diagram illustrating an example of a class based on additional information and feature amounts according to an embodiment.
FIG. 5 is a schematic diagram for explaining an embodiment of the present invention.
FIG. 6 is a block diagram illustrating an example of a configuration related to a prediction coefficient learning process when performing a class classification adaptation process.
FIG. 7 is a flowchart showing processing when learning processing is performed by software.
FIG. 8 is a block diagram showing a configuration of another embodiment of the present invention.
FIG. 9 is a schematic diagram illustrating an example of a class based on additional information and feature amounts according to another embodiment.
FIG. 10 is a schematic diagram for explaining another embodiment of the present invention.
[Explanation of symbols]
1, 22 ... decoder; 2, 23 ... additional information extraction unit; 3, 24 ... additional information class generation unit; 4, 25 ... region extraction unit; 5, 26 ... prediction tap data generation unit; 6, 27 ... feature amount extraction unit; 7, 28 ... class code generation unit; 8 ... prediction coefficient ROM; 9 ... prediction calculation unit

Claims

In a digital image signal processing apparatus configured to perform prediction signal processing in units of pixels on an input image signal generated by decoding an encoded digital image signal,
Additional information extraction means for extracting additional information for decoding processing;
First region cutout means for extracting, from the input image signal, a class tap region composed of a plurality of pixels around a predetermined pixel of interest;
Feature amount extraction means for extracting a feature amount of the level distribution of the class tap region extracted by the first region cutout means;
Class information generating means for generating class information from the additional information and the feature amount;
Second region cutout means for extracting, from the input image signal, a prediction tap region composed of a plurality of pixels around the predetermined pixel of interest;
Prediction coefficients for estimating the output image signal after processing, determined in advance for each item of class information generated by the class information generating means, are stored in storage means,
Arithmetic processing means for predictively generating a pixel value for the pixel of interest by a product-sum operation of the prediction coefficient selected from the storage means according to the class information generated by the class information generating means and the plurality of pixels in the prediction tap region extracted by the second region cutout means,
The prediction coefficient is determined in advance so as to minimize the difference between the calculated value of the product-sum operation of the prediction coefficient and the plurality of image data of the prediction tap region extracted by the second region cutout means, and the true pixel value in a predetermined image signal corresponding to the output image signal,
A digital image signal processing apparatus characterized in that the additional information includes at least one of information indicating the type of the image signal to be processed, temporal and/or spatial resolution information of the image signal to be processed, and an encoding compression rate.

In claim 1,
The prediction coefficient is generated in advance by a learning device using a teacher signal corresponding to the output image signal and a student signal corresponding to the input image signal,
The learning device comprises:
First region cutout means for learning for extracting, from the student signal, a class tap region composed of a plurality of pixels around a predetermined pixel of interest;
Feature amount extraction means for learning for extracting a feature amount of the level distribution of the class tap region extracted by the first region cutout means for learning;
Class information generating means for learning for generating class information from the additional information for decoding processing accompanying the student signal and the feature amount;
Second region cutout means for learning for extracting, from the student signal, a prediction tap region composed of a plurality of pixels around the predetermined pixel of interest; and
Calculation means for calculating the prediction coefficient for each item of class information so as to minimize the difference between the calculated value of the product-sum operation of the prediction coefficient and the image data extracted by the second region cutout means for learning, and the true pixel value in the predetermined image signal corresponding to the output image signal,
A digital image signal processing apparatus characterized in that the additional information includes at least one of information indicating the type of the image signal to be processed, temporal and/or spatial resolution information of the image signal to be processed, and an encoding compression rate.

In a digital image signal processing method in which prediction signal processing in units of pixels is performed on an input image signal generated by decoding an encoded digital image signal,
An additional information extraction step of extracting additional information for decoding processing;
A first region cutout step of extracting, from the input image signal, a class tap region composed of a plurality of pixels around a predetermined pixel of interest;
A feature amount extraction step of extracting a feature amount of the level distribution of the class tap region extracted in the first region cutout step;
A class information generation step of generating class information from the additional information and the feature amount;
A second region cutout step of extracting, from the input image signal, a prediction tap region composed of a plurality of pixels around the predetermined pixel of interest;
Prediction coefficients for estimating the output image signal after processing, determined in advance for each item of class information generated in the class information generation step, are stored in storage means,
An arithmetic processing step of predictively generating a pixel value for the pixel of interest by a product-sum operation of the prediction coefficient selected from the storage means according to the class information generated in the class information generation step and the plurality of pixels in the prediction tap region extracted in the second region cutout step,
The prediction coefficient is determined in advance so as to minimize the difference between the calculated value of the product-sum operation of the prediction coefficient and the plurality of image data of the prediction tap region extracted in the second region cutout step, and the true pixel value in a predetermined image signal corresponding to the output image signal,
A digital image signal processing method characterized in that the additional information includes at least one of information indicating the type of the image signal to be processed, temporal and/or spatial resolution information of the image signal to be processed, and an encoding compression rate.

In claim 3,
The prediction coefficient is generated in advance by learning using a teacher signal corresponding to the output image signal and a student signal corresponding to the input image signal,
The learning includes:
A first region cutout step for learning of extracting, from the student signal, a class tap region composed of a plurality of pixels around a predetermined pixel of interest;
A feature amount extraction step for learning of extracting a feature amount of the level distribution of the class tap region extracted in the first region cutout step for learning;
A class information generation step for learning of generating class information from the additional information for decoding processing accompanying the student signal and the feature amount;
A second region cutout step for learning of extracting, from the student signal, a prediction tap region composed of a plurality of pixels around the predetermined pixel of interest; and
A calculation step of calculating the prediction coefficient for each item of class information so as to minimize the difference between the calculated value of the product-sum operation of the prediction coefficient and the image data extracted in the second region cutout step for learning, and the true pixel value in the predetermined image signal corresponding to the output image signal,
A digital image signal processing method characterized in that the additional information includes at least one of information indicating the type of the image signal to be processed, temporal and/or spatial resolution information of the image signal to be processed, and an encoding compression rate.
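
As an informal illustration outside the claim language, the learning of prediction coefficients for each class, which minimizes the difference between the product-sum result and the true pixel value, can be sketched as an ordinary least-squares fit per class. The use of normal equations solved by Gauss-Jordan elimination, the sample format, and the names `solve_linear` and `learn_coefficients` are assumptions made for this sketch. The text above states only that the difference is minimized.

```python
# Illustrative sketch only: solver choice, data layout and names are assumptions.

def solve_linear(a, b):
    """Solve a * w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: class has too few samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def learn_coefficients(samples):
    """Fit one coefficient set per class.

    samples: iterable of (class_code, student_taps, teacher_pixel), where
    student_taps are prediction-tap pixels from the student signal and
    teacher_pixel is the true pixel value from the teacher signal.
    """
    acc = {}  # class_code -> (sum of x x^T, sum of x * t)
    for cls, taps, target in samples:
        n = len(taps)
        if cls not in acc:
            acc[cls] = ([[0.0] * n for _ in range(n)], [0.0] * n)
        a, b = acc[cls]
        for i in range(n):
            b[i] += taps[i] * target
            for j in range(n):
                a[i][j] += taps[i] * taps[j]
    # Solving the normal equations minimizes the squared prediction error.
    return {cls: solve_linear(a, b) for cls, (a, b) in acc.items()}
```

Each class accumulates its own normal equations, so a class that receives too few or linearly dependent samples raises an error here instead of yielding unreliable coefficients.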