JP4081745B2 - Decoding device and decoding method, learning device and learning method, program, and recording medium

Info

Publication number: JP4081745B2
Application number: JP2002061419A
Authority: JP
Inventors: 哲二郎近藤; 俊彦浜松; 丈晴西片; 秀樹大塚; 威國弘; 孝文森藤
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-03-07
Filing date: 2002-03-07
Publication date: 2008-04-30
Anticipated expiration: 2022-03-07
Also published as: JP2003264837A

Description

【０００１】
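【Technical Field of the Invention】
The present invention relates to a decoding device and decoding method, a learning device and learning method, a program, and a recording medium, and in particular to ones that make it possible to decode encoded data, obtained for example by encoding image data, into a high-quality (high-picture-quality) image.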
【０００２】
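【Prior Art】
A well-known high-efficiency coding scheme for image (moving picture) data is, for example, the MPEG (Moving Picture Experts Group) scheme. In MPEG, image data is subjected to a DCT (Discrete Cosine Transform) in both the horizontal and vertical directions in units of blocks of 8×8 pixels (width × height), and is then quantized.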
【０００３】
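Image data is thus DCT-transformed in MPEG. In MPEG2, for example, the DCT type of the blocks to be transformed can be switched, macroblock by macroblock, between frame DCT mode and field DCT mode. In frame DCT mode, a block consists of pixels of the same frame, and the pixel values of such a block are DCT-transformed. In field DCT mode, a block consists of pixels of the same field, and the pixel values of such a block are DCT-transformed.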
【０００４】
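Whether the DCT type is set to frame DCT mode or field DCT mode is basically decided on the basis of image characteristics, such as motion in the image and continuity with neighboring macroblocks, so as to reduce block distortion, mosquito noise, and the like in the decoded image. That is, for example, field DCT mode is selected for an image with large motion, and frame DCT mode is selected for an image with almost no motion (a still image).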
【０００５】
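【Problems to Be Solved by the Invention】
In MPEG2, the data rate of the encoded data is limited so that neither overflow nor underflow occurs on the decoder side. To keep within this data rate, a DCT type that should properly have been set to frame DCT mode or field DCT mode is sometimes set, inappropriately so to speak, to field DCT mode or frame DCT mode, respectively.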
【０００６】
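That is, field DCT mode is generally set as the DCT type if the correlation between the pixels making up a field (for example, the reciprocal of the sum of squared differences between adjacent pixels of the field; hereinafter called the field pixel correlation where appropriate) is greater than the correlation between the pixels making up a frame (for example, the reciprocal of the sum of squared differences between adjacent pixels of the frame; hereinafter called the frame pixel correlation where appropriate), and frame DCT mode is set if the frame pixel correlation is greater than the field pixel correlation.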
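The selection rule above can be sketched as follows. This is only an illustration under stated assumptions: "adjacent pixels" is taken to mean vertically adjacent lines, the two fields' squared differences are simply summed, ties go to frame DCT mode, and the names `ssd`, `pixel_correlation`, and `choose_dct_type` are hypothetical.

```python
def ssd(rows):
    """Sum of squared differences between vertically adjacent lines."""
    return sum((a - b) ** 2
               for upper, lower in zip(rows, rows[1:])
               for a, b in zip(upper, lower))


def pixel_correlation(sum_sq_diff, eps=1e-9):
    """Reciprocal of the sum of squared differences.

    eps is an assumption added here to avoid division by zero.
    """
    return 1.0 / (sum_sq_diff + eps)


def choose_dct_type(macroblock):
    """Return "field" or "frame" for a macroblock given as a list of lines."""
    top_field = macroblock[0::2]      # odd lines (1st, 3rd, ...) = top field
    bottom_field = macroblock[1::2]   # even lines = bottom field
    frame_corr = pixel_correlation(ssd(macroblock))
    field_corr = pixel_correlation(ssd(top_field) + ssd(bottom_field))
    return "field" if field_corr > frame_corr else "frame"


# A macroblock whose two fields differ strongly (as with fast motion).
moving = [[0] * 16 if r % 2 == 0 else [100] * 16 for r in range(16)]
# A static vertical gradient: neighbouring frame lines are the closest.
static = [[10 * r] * 16 for r in range(16)]
print(choose_dct_type(moving), choose_dct_type(static))
```

An encoder under a data-rate limit may override this choice, which is the situation the invention addresses.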
【０００７】
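However, when the encoded data is subject to a data-rate limit, the DCT type is set on the basis of that limited data rate, regardless of which of the field pixel correlation and the frame pixel correlation is greater. As a result, an inappropriate DCT type may be set, for example frame DCT mode rather than field DCT mode for an image with large motion.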
【０００８】
Even when such an inappropriate DCT type has been set, the decoder side must decode the encoded data in accordance with that DCT type, so the picture quality of the decoded image deteriorates.
【０００９】
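Furthermore, when a moving image is MPEG2-encoded at a high compression ratio, the data-rate limit may cause different DCT types to be set for a macroblock of one frame and the corresponding macroblock of the next frame even though the same moving object is displayed in both, and as a result a decoded image with unnatural motion may be obtained.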
【００１０】
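On the other hand, it is difficult for the decoding side to determine from the decoded image which of frame DCT mode and field DCT mode would have been the appropriate setting.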
【００１１】
The present invention has been made in view of this situation, and makes it possible to decode encoded data into a high-quality (high-picture-quality) image.
【００１２】
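【Means for Solving the Problems】
The decoding device of the present invention comprises: determination means for determining the correctness of the DCT type contained in encoded data according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that encoded data, and outputting mismatch information representing the determination result; and decoding means having prediction tap extraction means and prediction computation means. The prediction tap extraction means takes as data of interest the pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts as a prediction tap some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The prediction computation means obtains the data of interest by a product-sum operation between the prediction tap and tap coefficients obtained by learning that, using student data corresponding to the low-quality data and serving as the student of the learning and teacher data corresponding to the high-quality data and serving as the teacher of the learning, statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. Based on the mismatch information, the prediction tap extraction means extracts the prediction tap from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.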
【００１３】
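The decoding method of the present invention comprises: a determination step of determining the correctness of the DCT type contained in encoded data according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that encoded data, and outputting mismatch information representing the determination result; and a decoding step including a prediction tap extraction step and a prediction computation step. The prediction tap extraction step takes as data of interest the pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts as a prediction tap some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The prediction computation step obtains the data of interest by a product-sum operation between the prediction tap and tap coefficients obtained by learning that, using student data corresponding to the low-quality data and serving as the student of the learning and teacher data corresponding to the high-quality data and serving as the teacher of the learning, statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.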
【００１４】
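The first program of the present invention comprises: a determination step of determining the correctness of the DCT type contained in encoded data according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that encoded data, and outputting mismatch information representing the determination result; and a decoding step including a prediction tap extraction step and a prediction computation step. The prediction tap extraction step takes as data of interest the pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts as a prediction tap some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The prediction computation step obtains the data of interest by a product-sum operation between the prediction tap and tap coefficients obtained by learning that, using student data corresponding to the low-quality data and serving as the student of the learning and teacher data corresponding to the high-quality data and serving as the teacher of the learning, statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.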
【００１５】
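The first recording medium of the present invention has recorded on it a program comprising: a determination step of determining the correctness of the DCT type contained in encoded data according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that encoded data, and outputting mismatch information representing the determination result; and a decoding step including a prediction tap extraction step and a prediction computation step. The prediction tap extraction step takes as data of interest the pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts as a prediction tap some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The prediction computation step obtains the data of interest by a product-sum operation between the prediction tap and tap coefficients obtained by learning that, using student data corresponding to the low-quality data and serving as the student of the learning and teacher data corresponding to the high-quality data and serving as the teacher of the learning, statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.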
【００１６】
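The learning device of the present invention comprises: teacher data generation means for generating and outputting, from learning image data, teacher data serving as the teacher for learning tap coefficients; student data generation means for generating and outputting, from the learning image data, student data serving as the student for learning the tap coefficients; encoding means for encoding the learning image data and outputting learning encoded data containing a DCT type and motion vectors of the image data; determination means for determining the correctness of the DCT type contained in the learning encoded data according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that learning encoded data, and outputting mismatch information representing the determination result; learning means having prediction tap extraction means and tap coefficient computation means; and decoding means having prediction computation means for obtaining the data of interest by a product-sum operation of the tap coefficients and the prediction tap. The prediction tap extraction means takes as data of interest the pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts as a prediction tap some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The tap coefficient computation means uses student data corresponding to the low-quality data and teacher data corresponding to the high-quality data to obtain the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. Based on the mismatch information, the prediction tap extraction means extracts the prediction tap from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.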
【００１７】
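The learning method of the present invention comprises: a teacher data generation step of generating and outputting, from learning image data, teacher data serving as the teacher for learning tap coefficients; a student data generation step of generating and outputting, from the learning image data, student data serving as the student for learning the tap coefficients; an encoding step of encoding the learning image data and outputting learning encoded data containing a DCT type and motion vectors of the image data; a determination step of determining the correctness of the DCT type contained in the learning encoded data according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that learning encoded data, and outputting mismatch information representing the determination result; a learning step having a prediction tap extraction step and a tap coefficient computation step; and a decoding step having a prediction computation step of obtaining the data of interest by a product-sum operation of the tap coefficients and the prediction tap. The prediction tap extraction step takes as data of interest the pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts as a prediction tap some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The tap coefficient computation step uses student data corresponding to the low-quality data and teacher data corresponding to the high-quality data to obtain the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.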
【００１８】
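The second program of the present invention comprises: a teacher data generation step of generating and outputting, from learning image data, teacher data serving as the teacher for learning tap coefficients; a student data generation step of generating and outputting, from the learning image data, student data serving as the student for learning the tap coefficients; an encoding step of encoding the learning image data and outputting learning encoded data containing a DCT type and motion vectors of the image data; a determination step of determining the correctness of the DCT type contained in the learning encoded data according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that learning encoded data, and outputting mismatch information representing the determination result; a learning step having a prediction tap extraction step and a tap coefficient computation step; and a decoding step having a prediction computation step of obtaining the data of interest by a product-sum operation of the tap coefficients and the prediction tap. The prediction tap extraction step takes as data of interest the pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts as a prediction tap some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The tap coefficient computation step uses student data corresponding to the low-quality data and teacher data corresponding to the high-quality data to obtain the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.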
【００１９】
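The second recording medium of the present invention has recorded on it a program comprising: a teacher data generation step of generating and outputting, from learning image data, teacher data serving as the teacher for learning tap coefficients; a student data generation step of generating and outputting, from the learning image data, student data serving as the student for learning the tap coefficients; an encoding step of encoding the learning image data and outputting learning encoded data containing a DCT type and motion vectors of the image data; a determination step of determining the correctness of the DCT type contained in the learning encoded data according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that learning encoded data, and outputting mismatch information representing the determination result; a learning step having a prediction tap extraction step and a tap coefficient computation step; and a decoding step having a prediction computation step of obtaining the data of interest by a product-sum operation of the tap coefficients and the prediction tap. The prediction tap extraction step takes as data of interest the pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts as a prediction tap some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The tap coefficient computation step uses student data corresponding to the low-quality data and teacher data corresponding to the high-quality data to obtain the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.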
【００２０】
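In the decoding device and decoding method, and the first program and first recording medium, of the present invention, the correctness of the DCT type contained in encoded data is determined according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that encoded data, and mismatch information representing the determination result is output. The pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, is taken as data of interest, and some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest is extracted as a prediction tap. The data of interest is obtained by a product-sum operation between the prediction tap and tap coefficients obtained by learning that, using student data corresponding to the low-quality data and serving as the student of the learning and teacher data corresponding to the high-quality data and serving as the teacher of the learning, statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. Here, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.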
【００２１】
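In the learning device and learning method, and the second program and second recording medium, of the present invention, teacher data serving as the teacher for learning tap coefficients and student data serving as the student are generated from learning image data. The learning image data is also encoded, and learning encoded data containing a DCT type and motion vectors of the image data is output. The correctness of the DCT type contained in the learning encoded data is determined according to the presence or absence of motion in the image data in units of blocks, based on the motion vectors of the image data contained in that learning encoded data, and mismatch information representing the determination result is output. The pixel-unit high-quality data to be obtained, from among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, is taken as data of interest, and some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest is extracted as a prediction tap. Using student data corresponding to the low-quality data and teacher data corresponding to the high-quality data, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients are obtained, and the data of interest is obtained by a product-sum operation of the tap coefficients and the prediction tap. Here, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is field DCT mode; from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is frame DCT mode; and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.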
【００２２】
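【Embodiments of the Invention】
FIG. 1 shows a configuration example of one embodiment of a decoding device to which the present invention is applied.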
【００２３】
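Encoded data reproduced from a recording medium (not shown; for example an optical disc, magneto-optical disc, phase-change disc, magnetic tape, or semiconductor memory), or encoded data transmitted over a transmission medium (for example the Internet, a CATV network, a satellite link, or terrestrial broadcast), is input to the decoding device as the data to be decoded. The encoded data is obtained by encoding image (moving picture) data with a predetermined coding scheme, and contains at least decoding control information for controlling its decoding.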
【００２４】
As the encoded data, for example, image data encoded with the MPEG2 scheme can be used.
【００２５】
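In MPEG2, the encoding side DCT-transforms the image data (original image) block by block and then quantizes it. The encoding side also detects motion vectors for the image data being encoded, locally decodes the encoded data, and, using the locally decoded image data as a reference image, applies motion compensation to that reference image with the detected motion vectors to generate a predicted image. The difference between the image being encoded and the predicted image is then computed to obtain a residual image, and that residual image is DCT-transformed and quantized as described above. In addition, for the block-unit DCT, the encoding side sets the DCT type (frame DCT mode or field DCT mode) for each macroblock.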
【００２６】
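Let the DCT coefficients obtained by DCT-transforming and then quantizing image data (an original image or a residual image) be called quantized DCT coefficients. On the decoding side, the quantized DCT coefficients are dequantized into DCT coefficients. Those DCT coefficients are then inverse-DCT-transformed, and the resulting pixels are rearranged into a frame structure in accordance with the DCT type, whereby the image data is decoded or residual image data is obtained. For residual image data, already-decoded image data is used as a reference image, and motion compensation using the motion vectors is applied to that reference image to generate predicted image data. The residual image data and the predicted image data are then added to decode the image data.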
【００２７】
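Accordingly, encoded data obtained by encoding image data with MPEG2 contains, besides the DCT coefficients obtained by DCT-transforming and quantizing the image data (original or residual image), that is, the direct encoding result of the image data, the information the decoding side needs in order to decode those DCT coefficients into an image, namely information that controls decoding such as motion vectors and the DCT type (hereinafter called decoding control information where appropriate). Besides motion vectors and the DCT type, the encoded data also contains the picture type, the temporal reference, and other decoding control information.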
【００２８】
The encoded data input to the decoding device is supplied to a mismatch detection unit 1 and a decoding processing unit 2.
【００２９】
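The mismatch detection unit 1 detects mismatch information from the encoded data. That is, it determines the correctness of the decoding control information contained in the encoded data and outputs mismatch information representing the determination result to the decoding processing unit 2. The decoding processing unit 2 decodes the encoded data based on the mismatch information supplied from the mismatch detection unit 1 and outputs the resulting decoded data.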
【００３０】
Next, the processing (decoding process) of the decoding device of FIG. 1 is described with reference to the flowchart of FIG. 2.
【００３１】
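Encoded data is supplied to the mismatch detection unit 1 and the decoding processing unit 2. First, in step S1, the mismatch detection unit 1 detects mismatch information from the encoded data and supplies it to the decoding processing unit 2, and the process proceeds to step S2. In step S2, the decoding processing unit 2 decodes the encoded data from which the mismatch information was detected, based on that mismatch information, and outputs decoded image data, and the process proceeds to step S3. In step S3, the mismatch detection unit 1 or the decoding processing unit 2 determines whether encoded data to be decoded still remains. If it does, the process returns to step S1 and the same processing is repeated.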
【００３２】
If it is determined in step S3 that no encoded data to be decoded remains, the process ends.
【００３３】
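Next, FIG. 3 shows a configuration example of another embodiment of a decoding device to which the present invention is applied. Parts corresponding to those in FIG. 1 are given the same reference numerals, and their description is omitted below where appropriate. The decoding device of FIG. 3 is configured basically in the same way as that of FIG. 1 except that a parameter storage unit 3 is newly provided.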
【００３４】
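The parameter storage unit 3 stores parameters obtained through learning by a learning device described later, and the decoding processing unit 2 decodes the encoded data supplied to it using the parameters stored in the parameter storage unit 3.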
【００３５】
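Accordingly, the decoding device of FIG. 3 performs the same processing as that of FIG. 1 except that the decoding processing unit 2 decodes the encoded data using the parameters stored in the parameter storage unit 3, so a description of that processing is omitted.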
【００３６】
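Next, FIG. 4 shows a configuration example of one embodiment of a learning device that learns the parameters to be stored in the parameter storage unit 3 of FIG. 3.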
【００３７】
A learning data storage unit 11 stores learning data, which is image (moving picture) data used for learning the parameters.
【００３８】
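An encoding unit 12 reads the learning data stored in the learning data storage unit 11 and encodes it with the same coding scheme as the encoded data to be decoded by the decoding device of FIG. 3. The encoded data obtained by encoding the learning data (hereinafter called learning encoded data where appropriate) is supplied from the encoding unit 12 to a mismatch detection unit 13.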
【００３９】
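The mismatch detection unit 13 is configured in the same way as the mismatch detection unit 1 of FIG. 3; it detects mismatch information from the encoded data supplied from the encoding unit 12 and supplies it to a learning processing unit 14.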
【００４０】
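The learning processing unit 14 reads the learning data stored in the learning data storage unit 11 and generates from it teacher data, which serves as the teacher for learning the parameters, and student data, which serves as the student. Then, based on the mismatch information supplied from the mismatch detection unit 13, the learning processing unit 14 uses the generated teacher data and student data to learn parameters that convert the student data into the teacher data.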
【００４１】
Next, the processing (learning process) of the learning device of FIG. 4 is described with reference to the flowchart of FIG. 5.
【００４２】
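First, in step S11, the encoding unit 12 reads and encodes the learning data stored in the learning data storage unit 11 and supplies the resulting learning encoded data to the mismatch detection unit 13, and the process proceeds to step S12. In step S12, the mismatch detection unit 13 detects mismatch information from the encoded data supplied from the encoding unit 12 and supplies it to the learning processing unit 14, and the process proceeds to step S13.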
【００４３】
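In step S13, the learning processing unit 14 reads the learning data from the learning data storage unit 11 and generates teacher data and student data from it. Then, based on the mismatch information supplied from the mismatch detection unit 13, it learns the parameters using the generated teacher data and student data.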
【００４４】
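That is, based on the mismatch information, the learning processing unit 14 performs the processing (learning) that makes it possible to compute the parameters best suited to obtaining the corresponding teacher data from the student data.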
【００４５】
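The process then proceeds to step S14, where the encoding unit 12 or the learning processing unit 14 determines whether learning data that has not yet been processed is stored in the learning data storage unit 11. If so, the process returns to step S11 and the same processing is repeated for that unprocessed learning data.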
【００４６】
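If it is determined in step S14 that no unprocessed learning data is stored in the learning data storage unit 11, that is, if learning has been performed with all the learning data stored there, the process proceeds to step S15, where the learning processing unit 14 computes the parameters based on the learning results of step S13, and the process ends.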
【００４７】
Next, FIG. 6 shows a detailed configuration example of the decoding device of FIG. 3.
【００４８】
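Encoded data obtained by encoding image data with, for example, MPEG2 is supplied to a decoding control information extraction unit 21 as the data to be decoded. The decoding control information extraction unit 21 extracts from the encoded data a plurality of (a plurality of kinds of) decoding control information contained in it, in this embodiment for example the DCT type, picture type, and motion vectors, and supplies them to a determination unit 22.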
【００４９】
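The determination unit 22 determines the correctness of one (one kind) of the plurality of decoding control information supplied from the decoding control information extraction unit 21, based on the other (other kinds of) decoding control information. It then outputs mismatch information, as the result of determining the correctness of that one piece of decoding control information, to the decoding processing unit 2.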
【００５０】
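The decoding control information extraction unit 21 and the determination unit 22 described above constitute the mismatch detection unit 1 of FIG. 3.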
【００５１】
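The encoded data to be decoded is supplied to a preprocessing unit 31, which applies predetermined preprocessing to the encoded data and supplies the resulting preprocessed data to a class classification adaptive processing unit 32.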
【００５２】
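The class classification adaptive processing unit 32 constructs prediction taps and class taps, described later, from the preprocessed data supplied from the preprocessing unit 31, and performs class classification adaptive processing, described later, using the parameters stored in a coefficient memory 41. It then outputs the data obtained by the class classification adaptive processing (hereinafter called adaptive-processed data where appropriate) to a postprocessing unit 33.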
【００５３】
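The mismatch information output by the determination unit 22 of the mismatch detection unit 1 is also supplied to the class classification adaptive processing unit 32, which performs the class classification adaptive processing based on this mismatch information.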
【００５４】
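The postprocessing unit 33 applies predetermined postprocessing to the data output by the class classification adaptive processing unit 32, thereby decoding the encoded data into high-picture-quality image data, which it outputs.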
【００５５】
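The preprocessing unit 31, class classification adaptive processing unit 32, and postprocessing unit 33 described above constitute the decoding processing unit 2 of FIG. 3.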
【００５６】
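The coefficient memory 41 stores the tap coefficients for each class, described later, that the class classification adaptive processing unit 32 uses in performing the class classification adaptive processing.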
【００５７】
This coefficient memory 41 constitutes the parameter storage unit 3 of FIG. 3.
【００５８】
Next, the processing of the mismatch detection unit 1 of FIG. 6 is described with reference to FIGS. 7 and 8.
【００５９】
FIG. 7 shows a block that is DCT-transformed in frame DCT mode (FIG. 7A) and a block that is DCT-transformed in field DCT mode (FIG. 7B) in MPEG2.
【００６０】
FIG. 7 shows blocks of the luminance signal. In FIG. 7 (and likewise in FIG. 8, described later), shaded lines represent odd lines (the top field) and unshaded lines represent even lines (the bottom field).
【００６１】
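In frame DCT mode, a macroblock of 16×16 pixels (width × height) is divided, as shown in FIG. 7A, into four blocks of 8×8 pixels (upper left, lower left, upper right, and lower right), and each block is DCT-transformed.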
【００６２】
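In field DCT mode, by contrast, the pixel positions of the macroblock are rearranged, as shown in FIG. 7B, so that the upper 8 lines consist of the odd lines (top field) and the lower 8 lines consist of the even lines (bottom field). The rearranged macroblock is then divided into four blocks of 8×8 pixels (upper left, lower left, upper right, and lower right), and each block is DCT-transformed.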
【００６３】
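As described above, in frame DCT mode the DCT is performed in units of 8×8-pixel blocks made up of pixels of the same frame, and in field DCT mode it is performed in units of 8×8-pixel blocks made up of pixels of the same field.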
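The two block layouts can be illustrated with a short sketch. The function name `split_macroblock` and the string labels "frame" and "field" are hypothetical; the macroblock is assumed to be a list of 16 lines of 16 luminance samples, with the first line belonging to the top field.

```python
def split_macroblock(macroblock, dct_type):
    """Split a 16x16 macroblock into four 8x8 blocks.

    Returns the blocks in the order upper left, upper right,
    lower left, lower right.
    """
    if dct_type == "field":
        # Top-field (odd) lines first, then bottom-field (even) lines.
        rows = macroblock[0::2] + macroblock[1::2]
    else:
        rows = macroblock
    blocks = []
    for r0 in (0, 8):
        for c0 in (0, 8):
            blocks.append([row[c0:c0 + 8] for row in rows[r0:r0 + 8]])
    return blocks


# Each sample holds its own line index so the line ordering is visible.
mb = [[r] * 16 for r in range(16)]
frame_blocks = split_macroblock(mb, "frame")
field_blocks = split_macroblock(mb, "field")
print([row[0] for row in frame_blocks[0]])  # lines 0..7 of the frame
print([row[0] for row in field_blocks[0]])  # top-field lines 0, 2, ..., 14
print([row[0] for row in field_blocks[2]])  # bottom-field lines 1, 3, ..., 15
```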
【００６４】
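Now consider, for example, an image in which a circular moving object moves horizontally. In the top field and bottom field making up a frame, the circular moving object is displayed at slightly shifted positions corresponding to its motion, as shown for example in FIG. 8A. For an image showing such a moving object, the field pixel correlation is therefore greater than the frame pixel correlation, and performing the DCT in field DCT mode yields a decoded image with smooth motion.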
【００６５】
However, in MPEG, as described earlier, image data showing a moving object may be DCT-transformed in frame DCT mode rather than field DCT mode in order to reduce the amount of encoded data because of the data-rate limit.
【００６６】
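Suppose that frame DCT mode is set for some of the macroblocks in the area where the circular moving object is displayed, field DCT mode is set for the other macroblocks, and the DCT is performed. Then, for the macroblocks for which frame DCT mode was set, a decoded image in which the edge of the circular moving object is blurred is obtained, as shown for example in FIG. 8B.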
【００６７】
Here, FIG. 8B shows the decoded image when, of 2×2 macroblocks, the DCT type of the upper-right macroblock is frame DCT mode and the DCT type of the other three macroblocks is field DCT mode.
【００６８】
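Because the choice between frame DCT mode and field DCT mode is made for each macroblock, the DCT type may differ even between corresponding macroblocks (macroblocks at the same position) of different frames. If the DCT type of a macroblock at a given position showing a moving object changes from frame to frame, the motion of the moving object in the decoded image becomes unnatural.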
【００６９】
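Such blurring of edges and unnatural motion in the decoded image arise because a macroblock that should have been DCT-transformed in field DCT mode was DCT-transformed in frame DCT mode owing to the data-rate limit, that is, because a moving part, which should be transformed in field DCT mode, was transformed in frame DCT mode. From the standpoint of improving the picture quality of the decoded image, DCT-transforming in frame DCT mode a macroblock that should have been transformed in field DCT mode can therefore be called incorrect (inappropriate), and the DCT type representing that frame DCT mode, which is one piece of the decoding control information contained in the encoded data, can likewise be called incorrect.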
【００７０】
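The mismatch detection unit 1 therefore determines, for example, the correctness of the DCT type contained in the encoded data and outputs mismatch information representing the determination result.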
【００７１】
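That is, when the DCT type of a macroblock displaying a moving image is frame DCT mode, the mismatch detection unit 1 determines, for example, that the DCT type of that macroblock is incorrect. When the DCT type of a macroblock displaying a moving image is field DCT mode, or when the macroblock displays an image without motion, it determines that the DCT type of that macroblock is correct.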
【００７２】
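The mismatch detection unit 1 determines whether a macroblock (the image displayed in it) has motion based on another piece of the decoding control information contained in the encoded data, for example the motion vector of that macroblock.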
【００７３】
Next, FIG. 9 shows a configuration example of the class classification adaptive processing unit 32 of FIG. 6.
【００７４】
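Class classification adaptive processing consists of class classification processing and adaptive processing: the class classification processing divides data into classes according to its nature, and adaptive processing is applied to each class.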
【００７５】
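Adaptive processing is explained here taking as an example the conversion of a low-picture-quality image (hereinafter called a low-quality image where appropriate) into a high-picture-quality image (hereinafter called a high-quality image where appropriate).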
【００７６】
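In this case, adaptive processing obtains predicted values of the pixels of a high-quality image, in which the picture quality of the low-quality image is improved, by a linear combination of the pixels making up the low-quality image (hereinafter called low-quality pixels where appropriate) and predetermined tap coefficients, thereby yielding an image with higher picture quality than the low-quality image.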
【００７７】
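Specifically, take some high-quality image data as teacher data and low-quality image data obtained by degrading the picture quality of that high-quality image as student data, and consider obtaining the predicted value E[y] of a pixel y making up the high-quality image (hereinafter called a high-quality pixel where appropriate) with a linear first-order combination model defined by a linear combination of a set of several low-quality pixels (pixel values of pixels making up the low-quality image) x_1, x_2, ... and predetermined tap coefficients w_1, w_2, .... In this case, the predicted value E[y] can be expressed by the following equation.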
【００７８】
\[ E[y] = w_1 x_1 + w_2 x_2 + \cdots \tag{1} \]
【００７９】
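To generalize equation (1), define a matrix W consisting of the set of tap coefficients w_j, a matrix X consisting of the set of student data x_ij, and a matrix Y' consisting of the set of predicted values E[y_i] as
【Equation 1】
\[
X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1J} \\ x_{21} & x_{22} & \cdots & x_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ x_{I1} & x_{I2} & \cdots & x_{IJ} \end{pmatrix},\quad
W = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_J \end{pmatrix},\quad
Y' = \begin{pmatrix} E[y_1] \\ E[y_2] \\ \vdots \\ E[y_I] \end{pmatrix}
\]
Then the following observation equation holds.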
【００８０】
\[ XW = Y' \tag{2} \]
【００８１】
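Here, the component x_ij of the matrix X denotes the j-th student datum in the i-th set of student data (the set of student data used to predict the i-th teacher datum y_i), and the component w_j of the matrix W denotes the tap coefficient that is multiplied by the j-th student datum in a set of student data. y_i denotes the i-th teacher datum, so E[y_i] denotes the predicted value of the i-th teacher datum. The y on the left side of equation (1) is the component y_i of the matrix Y with the suffix i omitted, and x_1, x_2, ... on the right side of equation (1) are likewise the components x_ij of the matrix X with the suffix i omitted.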
【００８２】
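Consider applying the method of least squares to the observation equation (2) to obtain a predicted value E[y] close to the high-quality pixel (pixel value) y. Define a matrix Y consisting of the set of true values y of the high-quality pixels serving as teacher data, and a matrix E consisting of the set of residuals e of the predicted values E[y] of the high-quality pixels y (errors relative to the true values y), as
【Equation 2】
\[
E = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_I \end{pmatrix},\quad
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_I \end{pmatrix}
\]
Then the following residual equation follows from equation (2).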
【００８３】
\[ XW = Y + E \tag{3} \]
【００８４】
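In this case, the tap coefficients w_j for obtaining a predicted value E[y] close to the high-quality pixel y can be found by minimizing the squared error
【Equation 3】
\[ \sum_{i=1}^{I} e_i^2 \]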
【００８５】
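Accordingly, the tap coefficients w_j for which the derivative of the above squared error with respect to w_j is 0, that is, the tap coefficients w_j satisfying the following equation, are the optimal values for obtaining a predicted value E[y] close to the high-quality pixel y.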
【００８６】
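【Equation 4】
\[ e_1 \frac{\partial e_1}{\partial w_j} + e_2 \frac{\partial e_2}{\partial w_j} + \cdots + e_I \frac{\partial e_I}{\partial w_j} = 0 \quad (j = 1, 2, \ldots, J) \tag{4} \]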
【００８７】
First, differentiating equation (3) with respect to the tap coefficients w_j gives the following.
【００８８】
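【Equation 5】
\[ \frac{\partial e_i}{\partial w_1} = x_{i1},\quad \frac{\partial e_i}{\partial w_2} = x_{i2},\quad \ldots,\quad \frac{\partial e_i}{\partial w_J} = x_{iJ} \quad (i = 1, 2, \ldots, I) \tag{5} \]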
【００８９】
Equations (4) and (5) give equation (6).
【００９０】
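【Equation 6】
\[ \sum_{i=1}^{I} e_i x_{i1} = 0,\quad \sum_{i=1}^{I} e_i x_{i2} = 0,\quad \ldots,\quad \sum_{i=1}^{I} e_i x_{iJ} = 0 \tag{6} \]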
【００９１】
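Furthermore, taking into account the relationship among the student data x_ij, tap coefficients w_j, teacher data y_i, and residuals e_i in the residual equation (3), the following normal equations are obtained from equation (6).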
【００９２】
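【Equation 7】
\[
\begin{cases}
\left(\sum_{i=1}^{I} x_{i1} x_{i1}\right) w_1 + \left(\sum_{i=1}^{I} x_{i1} x_{i2}\right) w_2 + \cdots + \left(\sum_{i=1}^{I} x_{i1} x_{iJ}\right) w_J = \sum_{i=1}^{I} x_{i1} y_i \\
\left(\sum_{i=1}^{I} x_{i2} x_{i1}\right) w_1 + \left(\sum_{i=1}^{I} x_{i2} x_{i2}\right) w_2 + \cdots + \left(\sum_{i=1}^{I} x_{i2} x_{iJ}\right) w_J = \sum_{i=1}^{I} x_{i2} y_i \\
\qquad\vdots \\
\left(\sum_{i=1}^{I} x_{iJ} x_{i1}\right) w_1 + \left(\sum_{i=1}^{I} x_{iJ} x_{i2}\right) w_2 + \cdots + \left(\sum_{i=1}^{I} x_{iJ} x_{iJ}\right) w_J = \sum_{i=1}^{I} x_{iJ} y_i
\end{cases}
\tag{7}
\]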
【００９３】
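The normal equations (7) can be written compactly by defining the matrix (covariance matrix) A and the vector v as
【Equation 8】
\[
A = \begin{pmatrix}
\sum_{i} x_{i1} x_{i1} & \sum_{i} x_{i1} x_{i2} & \cdots & \sum_{i} x_{i1} x_{iJ} \\
\sum_{i} x_{i2} x_{i1} & \sum_{i} x_{i2} x_{i2} & \cdots & \sum_{i} x_{i2} x_{iJ} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i} x_{iJ} x_{i1} & \sum_{i} x_{iJ} x_{i2} & \cdots & \sum_{i} x_{iJ} x_{iJ}
\end{pmatrix},\quad
v = \begin{pmatrix} \sum_{i} x_{i1} y_i \\ \sum_{i} x_{i2} y_i \\ \vdots \\ \sum_{i} x_{iJ} y_i \end{pmatrix}
\]
and defining the vector W as in Equation 1, which gives
\[ AW = v \tag{8} \]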
【００９４】
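As many normal equations (7) as the number J of tap coefficients w_j to be found can be set up by preparing a sufficient number of sets of student data x_ij and teacher data y_i. Solving equation (8) for the vector W (which requires the matrix A in equation (8) to be nonsingular) therefore yields the optimal tap coefficients w_j. Equation (8) can be solved with, for example, the sweep-out method (Gauss-Jordan elimination).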
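As a minimal sketch of this step, the following pure-Python code builds A and v of equation (8) from (student, teacher) pairs and solves AW = v by Gauss-Jordan elimination. The function names, the partial pivoting, and the singularity tolerance are assumptions made for illustration, not details given in the text.

```python
def build_normal_equations(samples):
    """samples: list of (x, y), x = list of J student values, y = teacher value."""
    num_taps = len(samples[0][0])
    A = [[0.0] * num_taps for _ in range(num_taps)]
    v = [0.0] * num_taps
    for x, y in samples:
        for n in range(num_taps):
            v[n] += x[n] * y                  # sum_i x_in * y_i
            for m in range(num_taps):
                A[n][m] += x[n] * x[m]        # sum_i x_in * x_im
    return A, v


def gauss_jordan(A, v, tol=1e-12):
    """Solve A W = v; raises ValueError if A is (numerically) singular."""
    n = len(v)
    M = [row[:] + [b] for row, b in zip(A, v)]    # augmented matrix [A | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:
            raise ValueError("matrix A is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [e / p for e in M[col]]          # normalise the pivot row
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Teacher values generated from known coefficients, so the fit is exact.
true_w = [0.5, 0.25, 0.25]
taps = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 2, 3]]
samples = [(x, sum(w * xi for w, xi in zip(true_w, x))) for x in taps]
A, v = build_normal_equations(samples)
w = gauss_jordan(A, v)
```

When A is singular, for instance because too few samples were collected, the solver raises an error instead of returning coefficients.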
【００９５】
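Adaptive processing, as described above, consists of learning in advance, from student data and teacher data, the optimal tap coefficients w_j (here, the tap coefficients that minimize the total squared error of the predicted values when predicted values of the teacher data are computed from the student data), and then using those tap coefficients w_j to obtain, by equation (1), a predicted value E[y] close to the teacher data y.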
【００９６】
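Adaptive processing differs from mere interpolation in that components not contained in the low-quality image but contained in the high-quality image are reproduced. Looking only at equation (1), adaptive processing appears identical to simple interpolation with a so-called interpolation filter; but because the tap coefficients w, which correspond to the tap coefficients of that interpolation filter, are obtained by learning from teacher data and student data, components contained in the high-quality image serving as teacher data can be reproduced. In this sense adaptive processing can be said to have, so to speak, an image-creating effect.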
【００９７】
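As student data, one can use, for example, decoded image data obtained by MPEG-encoding the high-quality image data serving as teacher data and then MPEG-decoding it. In this case, tap coefficients are obtained that yield a high-quality image with reduced block distortion and other artifacts caused by the quantization in MPEG encoding.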
【００９８】
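It is also possible, for example, to use high-quality image data as teacher data and, as student data, the DCT coefficients obtained by DCT-transforming the teacher image data and then quantizing and dequantizing it. In this case, tap coefficients are obtained that convert DCT coefficients into (predicted values of) a high-quality image.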
【００９９】
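In the above, the predicted values of the high-quality image are obtained by linear first-order prediction, but they can also be predicted with an expression of second or higher order.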
【０１００】
The class classification adaptive processing unit 32 of FIG. 9 performs class classification adaptive processing as described above.
【０１０１】
That is, the preprocessed data output by the preprocessing unit 31 (FIG. 6) is supplied to tap extraction units 51 and 52.
【０１０２】
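The tap extraction unit 51 takes the adaptive-processed data to be obtained as the data of interest and extracts, as a prediction tap, some of the preprocessed data used to predict that data of interest. The tap extraction unit 52 extracts, as a class tap, some of the preprocessed data used to classify the data of interest.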
【０１０３】
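The mismatch information output by the determination unit 22 (FIG. 6) is also supplied to the tap extraction units 51 and 52, which change the structures of the prediction tap and the class tap, respectively, based on the mismatch information.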
【０１０４】
To keep the explanation simple, the prediction tap and the class tap are assumed here to have the same tap structure. They may, however, have different tap structures.
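One way to picture a mismatch-dependent tap structure is sketched below, following the rule stated in the summary of the invention: taps from the field when the DCT type is judged correct and is field DCT mode, from the frame when it is correct and is frame DCT mode, and from both when it is judged incorrect. The concrete tap shape (three columns wide, line offsets of ±1 for the frame and ±2 for the field) and all names are hypothetical; the text does not fix a particular shape here.

```python
FRAME_LINE_OFFSETS = (-1, 0, 1)   # adjacent lines of the frame
FIELD_LINE_OFFSETS = (-2, 0, 2)   # adjacent lines of the same field


def tap_line_offsets(dct_type, mismatch):
    """Line offsets of the tap, chosen from DCT type and mismatch information."""
    if mismatch:                  # DCT type judged incorrect: use both
        return tuple(sorted(set(FRAME_LINE_OFFSETS) | set(FIELD_LINE_OFFSETS)))
    return FIELD_LINE_OFFSETS if dct_type == "field" else FRAME_LINE_OFFSETS


def extract_tap(image, y, x, dct_type, mismatch):
    """Collect low-quality pixels around (y, x); coordinates clamp at borders."""
    height, width = len(image), len(image[0])
    tap = []
    for dy in tap_line_offsets(dct_type, mismatch):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), height - 1)
            xx = min(max(x + dx, 0), width - 1)
            tap.append(image[yy][xx])
    return tap


image = [[10 * r + c for c in range(6)] for r in range(6)]
frame_tap = extract_tap(image, 3, 3, "frame", mismatch=False)
field_tap = extract_tap(image, 3, 3, "field", mismatch=False)
both_tap = extract_tap(image, 3, 3, "frame", mismatch=True)
```

Because the tap length changes with the mismatch information, tap coefficients have to be learned separately for each structure, which fits the text's use of the mismatch information in class classification as well.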
【０１０５】
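The prediction tap obtained by the tap extraction unit 51 is supplied to a prediction unit 54, and the class tap obtained by the tap extraction unit 52 is supplied to a class classification unit 53.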
【０１０６】
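In addition to the class tap, the mismatch information is also supplied to the class classification unit 53, which classifies the data of interest based on the class tap from the tap extraction unit 52 and the mismatch information and supplies the class code corresponding to the resulting class to the coefficient memory 41.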
【０１０７】
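The coefficient memory 41 stores, at the address corresponding to each class code, the tap coefficients of the class corresponding to that class code, and supplies the tap coefficients stored at the address corresponding to the class code supplied from the class classification unit 53 to the prediction unit 54.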
【０１０８】
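The prediction unit 54 acquires the prediction tap output by the tap extraction unit 51 and the tap coefficients output by the coefficient memory 41, and performs the linear prediction computation of equation (1) using them. The prediction unit 54 thereby obtains and outputs (a predicted value of) the adaptive-processed data.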
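The lookup-and-predict step can be sketched as follows, with the coefficient memory modeled as a dictionary keyed by class code. The dictionary layout, the function name `predict`, and the coefficient values are illustrative assumptions; only the product-sum of equation (1) comes from the text.

```python
def predict(prediction_tap, class_code, coefficient_memory):
    """Equation (1): product-sum of the tap and the class's tap coefficients."""
    coefficients = coefficient_memory[class_code]
    if len(coefficients) != len(prediction_tap):
        raise ValueError("tap and coefficient lengths differ")
    return sum(w * x for w, x in zip(coefficients, prediction_tap))


# Hypothetical coefficient memory with two classes of three taps each.
coefficient_memory = {
    0: [0.25, 0.5, 0.25],   # smoothing-like coefficients
    1: [0.0, 1.0, 0.0],     # pass-through of the centre tap
}
tap = [10, 20, 40]
print(predict(tap, 0, coefficient_memory))  # 22.5
print(predict(tap, 1, coefficient_memory))  # 20.0
```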
【０１０９】
Next, the processing (decoding process) of the decoding device of FIG. 6 is described with reference to the flowchart of FIG. 10.
【０１１０】
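In the tap extraction unit 51 of the class classification adaptive processing unit 32 (FIG. 9), the adaptive-processed data to be obtained is taken as the data of interest, and in step S21 the mismatch detection unit 1 generates mismatch information from the encoded data corresponding to that data of interest (hereinafter called the encoded data of interest where appropriate).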
【０１１１】
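That is, in the mismatch detection unit 1, the decoding control information extraction unit 21 extracts, for example, the motion vectors and the DCT type from the encoded data of interest as a plurality of decoding control information and supplies them to the determination unit 22. The determination unit 22 then determines the correctness of the DCT type supplied from the decoding control information extraction unit 21 based on, for example, the motion vectors supplied from the same unit, and supplies mismatch information, as the determination result, to the class classification adaptive processing unit 32.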
【０１１２】
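The process then proceeds to step S22, where the preprocessing unit 31 applies preprocessing to the encoded data needed to obtain the preprocessed data required for constructing the prediction tap and class tap for the data of interest, and supplies the resulting preprocessed data to the class classification adaptive processing unit 32.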
【０１１３】
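In the class classification adaptive processing unit 32 (FIG. 9), in step S23, the tap extraction units 51 and 52 use the preprocessed data supplied from the preprocessing unit 31 to construct a prediction tap and a class tap, respectively, with a tap structure based on, for example, the mismatch information from the mismatch detection unit 1. The prediction tap is supplied from the tap extraction unit 51 to the prediction unit 54, and the class tap is supplied from the tap extraction unit 52 to the class classification unit 53.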
【０１１４】
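The class classification unit 53 receives the class tap for the data of interest from the tap extraction unit 52 and, in step S24, classifies the data of interest based on that class tap and the mismatch information supplied from the mismatch detection unit 1, and outputs a class code representing the class of the data of interest to the coefficient memory 41.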
【０１１５】
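The coefficient memory 41 reads and outputs the tap coefficients stored at the address corresponding to the class code supplied from the class classification unit 53. In step S25, the prediction unit 54 acquires the tap coefficients output by the coefficient memory 41, and the process proceeds to step S26.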
【０１１６】
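In step S26, the prediction unit 54 performs the linear prediction computation of equation (1) using the prediction tap output by the tap extraction unit 51 and the tap coefficients acquired from the coefficient memory 41. The prediction unit 54 thereby obtains (a predicted value of) the adaptive-processed data as the data of interest and supplies it to the postprocessing unit 33.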
【０１１７】
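In step S27, the postprocessing unit 33 (FIG. 6) applies predetermined postprocessing to the data of interest from (the prediction unit 54 of) the class classification adaptive processing unit 32, thereby obtaining and outputting decoded image data.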
【０１１８】
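The process then proceeds to step S28, where it is determined whether there is adaptive-processed data that has not yet been taken as the data of interest. If so, one piece of that data is newly taken as the data of interest, the process returns to step S21, and the same processing is repeated.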
【０１１９】
If it is determined in step S28 that there is no adaptive-processed data that has not yet been taken as the data of interest, the process ends.
【０１２０】
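Next, FIG. 11 shows a detailed configuration example of the learning device of FIG. 4 for the case of learning the tap coefficients to be stored in the coefficient memory 41 of FIG. 6.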
【０１２１】
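In the embodiment of FIG. 11, the mismatch detection unit 13 consists of a decoding control information extraction unit 71 and a determination unit 72, and the encoded data output by the encoding unit 12 is supplied to the decoding control information extraction unit 71. The decoding control information extraction unit 71 and the determination unit 72 are configured in the same way as the decoding control information extraction unit 21 and the determination unit 22 of FIG. 6, respectively; as described for FIG. 6, they obtain mismatch information from the encoded data corresponding to the teacher data of interest, described later, and supply it to the learning processing unit 14.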
【０１２２】
The learning processing unit 14 consists of an adaptive learning unit 60, a teacher data generation unit 61, and a student data generation unit 63.
【０１２３】
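The adaptive learning unit 60 consists of a teacher data storage unit 62, a student data storage unit 64, tap extraction units 65 and 66, a class classification unit 67, an accumulation unit 68, and a tap coefficient calculation unit 69. The teacher data generation unit 61 consists of an inverse postprocessing unit 61A, and the student data generation unit 63 consists of an encoding unit 63A and a preprocessing unit 63B.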
【０１２４】
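The inverse postprocessing unit 61A reads the learning data from the learning data storage unit 11 and performs processing complementary to the processing performed by the postprocessing unit 33 of FIG. 6 (hereinafter called inverse postprocessing where appropriate). That is, letting y be the learning data and f(x) the postprocessing that the postprocessing unit 33 of FIG. 6 applies to adaptive-processed data x, the inverse postprocessing unit 61A applies to the learning data y the processing represented by the function f^-1(y) (where f^-1() denotes the inverse of the function f()) as the inverse postprocessing, and outputs the resulting data to the adaptive learning unit 60 as teacher data. The teacher data output by the inverse postprocessing unit 61A corresponds to the adaptive-processed data supplied from the class classification adaptive processing unit 32 to the postprocessing unit 33 in FIG. 6.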
【０１２５】
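The teacher data storage unit 62 temporarily stores the teacher data output by (the inverse postprocessing unit 61A of) the teacher data generation unit 61.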
【０１２６】
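The encoding unit 63A reads the learning data from the learning data storage unit 11, encodes it with the same coding scheme as the encoding unit 12, in this embodiment for example MPEG2, and outputs the result. The encoding unit 63A therefore outputs the same encoded data as the encoding unit 12. The encoding units 12 and 63A can be implemented as a single shared encoding unit.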
【０１２７】
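The preprocessing unit 63B applies to the encoded data output by the encoding unit 63A the same preprocessing as the preprocessing unit 31 of FIG. 6, and outputs the resulting preprocessed data to the adaptive learning unit 60 as student data. The student data output by the preprocessing unit 63B corresponds to the preprocessed data supplied from the preprocessing unit 31 to the class classification adaptive processing unit 32 in FIG. 6.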
【０１２８】
The student data storage unit 64 temporarily stores the student data output by (the preprocessing unit 63B of) the student data generation unit 63.
【０１２９】
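The tap extraction unit 65 takes the teacher data stored in the teacher data storage unit 62 as the teacher data of interest one piece at a time and, for that teacher data of interest, extracts student data stored in the student data storage unit 64 to construct and output a prediction tap with the same tap structure as that constructed by the tap extraction unit 51 of FIG. 9. The mismatch information output by (the determination unit 72 of) the mismatch detection unit 13 is supplied to the tap extraction unit 65, which, like the tap extraction unit 51 of FIG. 9, changes the tap structure of the prediction tap based on the mismatch information for the teacher data of interest.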
【０１３０】
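For the teacher data of interest, the tap extraction unit 66 extracts student data stored in the student data storage unit 64 to construct and output a class tap with the same tap structure as that constructed by the tap extraction unit 52 of FIG. 9. The mismatch information output by the mismatch detection unit 13 is supplied to the tap extraction unit 66, which, like the tap extraction unit 52 of FIG. 9, changes the tap structure of the class tap based on the mismatch information for the teacher data of interest.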
【０１３１】
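The class tap output by the tap extraction unit 66 and the mismatch information output by the mismatch detection unit 13 are supplied to the class classification unit 67. Based on the class tap and mismatch information for the teacher data of interest, the class classification unit 67 performs the same class classification as the class classification unit 53 of FIG. 9 on the teacher data of interest and outputs the class code corresponding to the resulting class to the accumulation unit 68.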
【０１３２】
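The accumulation unit 68 reads the teacher data of interest from the teacher data storage unit 62 and, for each class code supplied from the class classification unit 67, performs accumulation over that teacher data of interest and the student data making up the prediction tap constructed for it and supplied from the tap extraction unit 65.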
【０１３３】
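That is, for each class corresponding to the class code supplied from the class classification unit 67, the accumulation unit 68 uses the prediction tap (student data) to perform computations corresponding to the multiplications of student data with each other (x_in x_im) and the summation (Σ) that form the components of the matrix A in equation (8).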
【０１３４】
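Likewise, for each class corresponding to the class code supplied from the class classification unit 67, the accumulation unit 68 uses the prediction tap (student data) and the teacher data to perform computations corresponding to the multiplications of student data and teacher data (x_in y_i) and the summation (Σ) that form the components of the vector v in equation (8).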
【０１３５】
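That is, the accumulation unit 68 stores in its built-in memory (not shown) the components of the matrix A and of the vector v in equation (8) obtained for the teacher data previously taken as the teacher data of interest, and adds to each of those components the corresponding component x_in x_im or x_in y_i computed from the teacher data y_i newly taken as the teacher data of interest and the student data x_in (x_im); in other words, it performs the additions expressed by the summations in the matrix A and the vector v.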
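The running accumulation per class might look like the following sketch. The class name `Accumulator` and the dictionary-based storage are illustrative assumptions; the arithmetic is the per-sample update of the components of A and v described above.

```python
class Accumulator:
    """Keeps one matrix A and one vector v of equation (8) per class code."""

    def __init__(self, num_taps):
        self.num_taps = num_taps
        self.A = {}   # class code -> J x J matrix of sums of x_in * x_im
        self.v = {}   # class code -> length-J vector of sums of x_in * y_i

    def add(self, class_code, prediction_tap, teacher):
        """Add one (prediction tap, teacher datum) pair to its class."""
        if class_code not in self.A:
            self.A[class_code] = [[0.0] * self.num_taps
                                  for _ in range(self.num_taps)]
            self.v[class_code] = [0.0] * self.num_taps
        A, v = self.A[class_code], self.v[class_code]
        for n, x_n in enumerate(prediction_tap):
            v[n] += x_n * teacher
            for m, x_m in enumerate(prediction_tap):
                A[n][m] += x_n * x_m


acc = Accumulator(num_taps=2)
acc.add(3, [1, 2], 5)   # class 3, first sample
acc.add(3, [3, 4], 6)   # class 3, second sample
acc.add(7, [1, 1], 2)   # a different class accumulates independently
print(acc.A[3], acc.v[3])
```

Only the sums are kept, so memory use depends on the number of classes and taps rather than on the amount of learning data.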
【０１３６】
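When the accumulation unit 68 has performed the above accumulation with all the teacher data stored in the teacher data storage unit 62 as the teacher data of interest and has thereby set up the normal equations of equation (8) for each class, it supplies those normal equations to the tap coefficient calculation unit 69.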
【０１３７】
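The tap coefficient calculation unit 69 solves the normal equations for each class supplied from the accumulation unit 68 to obtain and output the tap coefficients for each class.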
【０１３８】
Next, the processing (learning process) of the learning device of FIG. 11 is described with reference to the flowchart of FIG. 12.
【０１３９】
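First, in step S31, the teacher data generation unit 61 and the student data generation unit 63 generate teacher data and student data, respectively, from the learning data stored in the learning data storage unit 11. The teacher data is supplied from the teacher data generation unit 61 to the teacher data storage unit 62 and stored there, and the student data is supplied from the student data generation unit 63 to the student data storage unit 64 and stored there.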
【０１４０】
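The tap extraction unit 65 then takes, as the teacher data of interest, a piece of the teacher data stored in the teacher data storage unit 62 that has not yet been taken as the teacher data of interest. In step S32, the encoding unit 12 encodes the learning data stored in the learning data storage unit 11, thereby obtaining the encoded data corresponding to the teacher data of interest (the encoded form of the learning data corresponding to the teacher data of interest), and supplies it to the mismatch detection unit 13.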
【０１４１】
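The mismatch detection unit 13 generates mismatch information for the teacher data of interest from the encoded data supplied from the encoding unit 12 and supplies it to the tap extraction units 65 and 66 and the class classification unit 67 of the learning processing unit 14.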
【０１４２】
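The process then proceeds to step S34, where the tap extraction unit 65, based on the mismatch information, reads student data stored in the student data storage unit 64 for the teacher data of interest to construct a prediction tap and supplies it to the accumulation unit 68, while the tap extraction unit 66, likewise based on the mismatch information, reads student data stored in the student data storage unit 64 for the teacher data of interest to construct a class tap and supplies it to the class classification unit 67.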
【０１４３】
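In step S35, the class classification unit 67 classifies the teacher data of interest based on the class tap and mismatch information for it and outputs the class code corresponding to the resulting class to the accumulation unit 68.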
【０１４４】
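In step S36, the accumulation unit 68 reads the teacher data of interest from the teacher data storage unit 62 and computes the components of the matrix A and vector v of equation (8) using that teacher data of interest and the prediction tap from the tap extraction unit 65. It then adds the components of the matrix A and vector v obtained from the teacher data of interest and the prediction tap to those already-obtained components of the matrix A and vector v that correspond to the class code from the class classification unit 67, and the process proceeds to step S37.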
【０１４５】
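In step S37, the tap extraction unit 65 determines whether teacher data not yet taken as the teacher data of interest is still stored in the teacher data storage unit 62. If so, the tap extraction unit 65 newly takes such teacher data as the teacher data of interest, the process returns to step S32, and the same processing is repeated.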
【０１４６】
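If it is determined in step S37 that no teacher data not yet taken as the teacher data of interest is stored in the teacher data storage unit 62, the accumulation unit 68 supplies to the tap coefficient calculation unit 69 the normal equations of equation (8), made up of the components of the matrix A and vector v for each class obtained by the processing so far, and the process proceeds to step S38.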
【０１４７】
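In step S38, the tap coefficient calculation unit 69 solves the normal equations for each class supplied from the accumulation unit 68 to obtain and output the tap coefficients for each class, and the process ends.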
【０１４８】
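Because, for example, the amount of learning data stored in the learning data storage unit 11 is insufficient, there may be classes for which the number of normal equations needed to obtain the tap coefficients cannot be obtained. For such classes, the tap coefficient calculation unit 69 outputs, for example, default tap coefficients.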
【０１４９】
Next, FIG. 13 shows a first detailed configuration example of the decoding device of FIG. 6 for the case where the encoded data is image data encoded with MPEG2.
【０１５０】
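In the embodiment of FIG. 13, the decoding control information extraction unit 21 consists of an inverse VLC unit 111. The inverse VLC unit 111 is configured in the same way as, for example, the inverse VLC unit 121 (FIG. 14) of the MPEG decoder 116 described later; it extracts from the encoded data, as a plurality of decoding control information, for example the DCT type, picture type, macroblock (MB) type, and motion vectors, and supplies them to the determination unit 22.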
【０１５１】
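The determination unit 22 consists of a field/frame determination unit 112, an intra/non-intra determination unit 113, a still/motion determination unit 114, and a mismatch information generation unit 115.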
【０１５２】
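Based on the DCT type output by the inverse VLC unit 111, the field/frame determination unit 112 determines whether the block containing the pixel corresponding to the data of interest (hereinafter called the block of interest where appropriate) was DCT-transformed in frame DCT mode or in field DCT mode, and supplies the determination result to the mismatch information generation unit 115.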
【０１５３】
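Based on the picture type and macroblock type output by the inverse VLC unit 111, the intra/non-intra determination unit 113 determines whether (the macroblock containing) the block of interest is intra-coded or non-intra-coded, and supplies the determination result to the mismatch information generation unit 115.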
【０１５４】
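Based on the motion vectors output by the inverse VLC unit 111, the still/motion determination unit 114 determines whether the block of interest has motion (whether the image displayed in the block of interest has motion) and supplies the determination result to the mismatch information generation unit 115.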
【０１５５】
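Based on the outputs of the field/frame determination unit 112, the intra/non-intra determination unit 113, and the still/motion determination unit 114, the mismatch information generation unit 115 determines the correctness of the DCT type of (the macroblock containing) the block of interest output by the inverse VLC unit 111, generates mismatch information as the determination result, and supplies it to the class classification adaptive processing unit 32.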
【０１５６】
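In the embodiment of FIG. 13, the preprocessing unit 31 consists of an MPEG decoder 116, which decodes the encoded data with the MPEG2 scheme and supplies the resulting decoded image data to the class classification adaptive processing unit 32 as preprocessed data.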
【０１５７】
Next, FIG. 14 shows a configuration example of the MPEG decoder 116 of FIG. 13.
【０１５８】
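The encoded data is supplied to an inverse VLC unit 121. The inverse VLC unit 121 separates from the encoded data the VLC codes of the quantized DCT coefficients (the variable-length-coded quantized DCT coefficients), and also separates the quantization step, motion vectors, picture type, temporal reference, and other decoding control information.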
【０１５９】
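The inverse VLC unit 121 then decodes the VLC codes into quantized DCT coefficients by inverse VLC processing and supplies them to an inverse quantization unit 122. It also supplies the quantization step to the inverse quantization unit 122, the motion vectors to a motion compensation unit 125, the picture type to a memory 126, and the temporal reference to a picture selection unit 127.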
【０１６０】
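The inverse quantization unit 122 dequantizes the quantized DCT coefficients supplied from the inverse VLC unit 121 with the quantization step supplied from the same unit and supplies the resulting DCT coefficients to an inverse DCT unit 123. The inverse DCT unit 123 inverse-DCT-transforms the DCT coefficients supplied from the inverse quantization unit 122 and supplies the result to an arithmetic unit 124.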
【０１６１】
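In addition to the output of the inverse DCT unit 123, the output of the motion compensation unit 125 is supplied to the arithmetic unit 124, which obtains and outputs decoded image data by adding the output of the motion compensation unit 125 to the output of the inverse DCT unit 123 as necessary.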
【０１６２】
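That is, MPEG encoding defines three picture types, I, P, and B, and each picture is DCT-transformed in units of blocks of 8×8 pixels (width × height). Blocks of an I picture are intra-coded without reference to other frames or fields (without computing a difference from a predicted image); blocks of a P picture are intra-coded or forward-predictive-coded; and blocks of a B picture are intra-coded, forward-predictive-coded, backward-predictive-coded, or bidirectionally predictive-coded.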
【０１６３】
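In forward predictive coding, an image of a frame (or field) temporally preceding the frame (or field) of the block being encoded is used as a reference image, the difference between the block being encoded and its predicted image, obtained by motion-compensating that reference image, is computed, and that difference, that is, the residual image, is DCT-transformed.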
【０１６４】
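In backward predictive coding, an image of a frame temporally following the frame of the block being encoded is used as a reference image, the difference between the block being encoded and its predicted image, obtained by motion-compensating that reference image, is computed, and that difference (the residual image) is DCT-transformed.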
【０１６５】
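In bidirectionally predictive coding, the images of two frames (or fields), one temporally preceding and one temporally following the frame of the block being encoded, are used as reference images, the difference between the block being encoded and its predicted image, obtained by motion-compensating those reference images, is computed, and that difference (the residual image) is DCT-transformed.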
【０１６６】
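Accordingly, when a block is non-intra-coded (forward, backward, or bidirectionally predictive-coded), the output of the inverse DCT unit 123 is the decoded residual image (the difference between the original image and its predicted image). The arithmetic unit 124 decodes the non-intra-coded block by adding this decoding result of the residual image (hereinafter called the decoded residual image where appropriate) and the predicted image supplied from the motion compensation unit 125, and outputs the resulting decoded image data.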
【０１６７】
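When the block output by the inverse DCT unit 123 is intra-coded, on the other hand, the output of the inverse DCT unit 123 is the decoded original image, and the arithmetic unit 124 outputs it as the decoded image data as is.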
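The behavior of the arithmetic unit 124 can be sketched as follows. The function name and the list-of-lists block representation are hypothetical, and clipping of the sum to the valid pixel range, which a practical decoder would apply, is left out because the text does not describe it.

```python
def reconstruct_block(idct_output, is_intra, predicted=None):
    """Return the decoded block.

    Intra-coded: the inverse DCT output is already the decoded image.
    Non-intra-coded: add the motion-compensated prediction to the
    decoded residual.
    """
    if is_intra:
        return [row[:] for row in idct_output]
    if predicted is None:
        raise ValueError("a predicted block is required for non-intra blocks")
    return [[r + p for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(idct_output, predicted)]


residual = [[1, -2], [3, 4]]          # tiny 2x2 stand-in for an 8x8 block
prediction = [[10, 10], [10, 10]]
print(reconstruct_block(residual, is_intra=False, predicted=prediction))
print(reconstruct_block(residual, is_intra=True))
```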
【０１６８】
The decoded image data output by the arithmetic unit 124 is supplied to the memory 126 and the picture selection unit 127.
【０１６９】
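When the decoded image data supplied from the arithmetic unit 124 is image data of an I picture or a P picture, the memory 126 temporarily stores it as a reference image for encoded data decoded later. Since B pictures are not used as reference images in MPEG2, the memory 126 does not store the decoded image when it is a B picture. The memory 126 determines whether the decoded image supplied from the arithmetic unit 124 is an I, P, or B picture by referring to the picture type supplied from the inverse VLC unit 121.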
【０１７０】
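The picture selection unit 127 selects and outputs, in display order, the frames (or fields) of the decoded images output by the arithmetic unit 124 or stored in the memory 126. That is, because in MPEG2 the display order of the frames (or fields) of an image does not match the decoding order (encoding order), the picture selection unit 127 rearranges the frames (or fields) of the decoded images, obtained in decoding order, into display order and outputs them. It determines the display order by referring to the temporal reference supplied from the inverse VLC unit 121.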
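The reordering can be illustrated with a deliberately simplified sketch. It assumes that all pictures passed in belong to a single group of pictures, so that their temporal reference values are unique, and it reorders them in one batch rather than streaming them through a reference memory as the picture selection unit 127 would. The dictionary keys and the function name are hypothetical.

```python
def to_display_order(decoded_pictures):
    """Sort pictures (given in decoding order) by temporal reference."""
    return sorted(decoded_pictures, key=lambda p: p["temporal_reference"])


# Typical decoding order: each reference picture precedes the B pictures
# that are displayed before it.
decoding_order = [
    {"type": "I", "temporal_reference": 2},
    {"type": "B", "temporal_reference": 0},
    {"type": "B", "temporal_reference": 1},
    {"type": "P", "temporal_reference": 5},
    {"type": "B", "temporal_reference": 3},
    {"type": "B", "temporal_reference": 4},
]
display_order = to_display_order(decoding_order)
print("".join(p["type"] for p in display_order))  # BBIBBP
```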
【０１７１】
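Meanwhile, the motion compensation unit 125 receives the motion vectors output by the inverse VLC unit 121, reads the frame (or field) serving as the reference image from the memory 126, applies motion compensation to that reference image in accordance with the motion vectors from the inverse VLC unit 121, and supplies the resulting predicted image to the arithmetic unit 124. As described above, the arithmetic unit 124 adds the predicted image from the motion compensation unit 125 and the residual image output by the inverse DCT unit 123, thereby decoding the non-intra-coded (predictive-coded) block.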
【０１７２】
Next, the processing of the mismatch information generation unit 115 of FIG. 13 is described with reference to the flowchart of FIG. 15.
【０１７３】
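First, in step S41, the mismatch information generation unit 115 determines, based on the output of the intra/non-intra determination unit 113, whether (the macroblock containing) the block of interest is intra-coded or non-intra-coded.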
【０１７４】
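Here, the intra/non-intra determination unit 113 determines that the block of interest is intra-coded when the picture type of the frame of the block of interest indicates an I picture. When the picture type of the frame of the block of interest indicates a P or B picture, it determines whether the block of interest is intra-coded or non-intra-coded based on the macroblock type of the macroblock containing the block of interest (hereinafter called the macroblock of interest where appropriate).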
【０１７５】
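If the block of interest is determined in step S41 to be non-intra-coded, the process proceeds to step S42, where the mismatch information generation unit 115 determines, based on the output of the still/motion determination unit 114, whether the block of interest is a block displaying a moving image (hereinafter called a moving block where appropriate) or a block displaying a still image (hereinafter called a still block where appropriate).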
【０１７６】
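Here, for a non-intra-coded block, the still/motion determination unit 114 determines that the block is a moving block when the magnitude of the motion vector of the macroblock containing it is greater than (or not less than) a predetermined threshold ε, and that it is a still block when that magnitude is not greater than (or less than) the threshold ε.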
【０１７７】
ステップＳ４２において、注目ブロックが動きブロックであると判定された場合、ステップＳ４５に進み、後述する処理が行われる。
【０１７８】
また、ステップＳ４２において、注目ブロックが静止ブロックであると判定された場合、ステップＳ４３に進み、ミスマッチ情報生成部１１５は、注目データのミスマッチ情報として、注目データのＤＣＴタイプ（注目マクロブロックのＤＣＴタイプ）が正しいことを表す、例えば、１ビットの０を生成して出力し、処理を終了する。
【０１７９】
一方、ステップＳ４１において、注目ブロックがイントラ符号化されていると判定された場合、ステップＳ４４に進み、ミスマッチ情報生成部１１５は、注目ブロックが、動きブロックまたは静止ブロックのうちのいずれであるかを、静動判定部１１４の出力に基づいて判定する。
【０１８０】
ここで、静動判定部１１４は、イントラ符号化されているブロックについては、例えば、そのブロックの、１フレーム前のフレームにおける対応するブロック（以下、適宜、対応前ブロックという）と、１フレーム後のフレームにおける対応するブロック（以下、適宜、対応後ブロックという）のうちのいずれか一方、または両方の動きベクトルと、所定の閾値εとの大小関係によって、ノンイントラ符号化されているブロックにおける場合と同様に、動きブロックまたは静止ブロックの別を判定する。あるいは、静動判定部１１４は、例えば、イントラ符号化されているブロックについての対応前ブロックと対応後ブロックのうちのいずれか一方、または両方が動きブロックである場合、そのイントラ符号化されているブロックも動きブロックであると判定し、対応前ブロックと対応後ブロックのうちの両方またはいずれか一方が静止ブロックである場合、そのイントラ符号化されているブロックも静止ブロックであると判定する。
【０１８１】
ステップＳ４４において、注目ブロックが静止ブロックであると判定された場合、ステップＳ４３に進み、上述したように、ミスマッチ情報生成部１１５は、注目データのミスマッチ情報として、注目データのＤＣＴタイプが正しいことを表す１ビットの０を生成して出力し、処理を終了する。
【０１８２】
また、ステップＳ４４において、注目ブロックが動きブロックであると判定された場合、ステップＳ４５に進み、ミスマッチ情報生成部１１５は、注目ブロックのＤＣＴタイプが、フレームＤＣＴモードまたはフィールドＤＣＴモードのうちのいずれであるかを、フィールド／フレーム判定部１１２の出力に基づいて判定する。
【０１８３】
ステップＳ４５において、注目ブロックのＤＣＴタイプが、フィールドＤＣＴモードであると判定された場合、ステップＳ４３に進み、上述したように、ミスマッチ情報生成部１１５は、注目データのミスマッチ情報として、注目データのＤＣＴタイプが正しいことを表す１ビットの０を生成して出力し、処理を終了する。
【０１８４】
また、ステップＳ４５において、注目ブロックのＤＣＴタイプが、フレームＤＣＴモードであると判定された場合、ステップＳ４６に進み、ミスマッチ情報生成部１１５は、注目データのミスマッチ情報として、注目データのＤＣＴタイプ（注目マクロブロックのＤＣＴタイプ）が正しくないことを表す、例えば、１ビットの１を生成して出力し、処理を終了する。
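Steps S41 to S46 can be condensed into a small function. In this sketch, the intra/non-intra decision of step S41 is assumed to have already been used to obtain the still/motion result, since it only selects how that result is determined; the names `mismatch_bit`, `FRAME_DCT` and `FIELD_DCT` are hypothetical.

```python
FRAME_DCT = "frame"
FIELD_DCT = "field"


def mismatch_bit(is_motion_block, dct_type):
    """1-bit mismatch information: 0 = DCT type correct, 1 = incorrect."""
    if not is_motion_block:
        return 0  # S42 or S44 -> S43: still block, DCT type treated as correct
    if dct_type == FIELD_DCT:
        return 0  # S45 -> S43: motion block coded in field DCT mode
    return 1      # S45 -> S46: motion block coded in frame DCT mode
```

With this flow, the only combination reported as incorrect is a motion block whose DCT type is the frame DCT mode.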
【０１８５】
図１５の実施の形態によれば、例えば、図１６に示すように、隣接する２×２個のマクロブロックＭＢ＃１，＃２，＃３，＃４において、水平方向に移動している円形の物体が表示されている場合において、右上のマクロブロックＭＢ＃２のＤＣＴタイプがフレームＤＣＴモードであり、他の３つのマクロブロックＭＢ＃１，＃３、および＃４のＤＣＴタイプがフィールドＤＣＴモードであるときには、ミスマッチ情報生成部１１５において、以下のようなミスマッチ情報が生成される。
【０１８６】
即ち、マクロブロックＭＢ＃１，＃２，＃３，＃４それぞれを構成するブロックは、いずれも、動きブロックであり、フィールドＤＣＴモードでＤＣＴ変換すべきである。従って、ＤＣＴタイプがフィールドＤＣＴモードになっているマクロブロックＭＢ＃１，＃３，＃４それぞれを構成するブロックのデータが注目データとされた場合には、ミスマッチ情報として、ＤＣＴタイプが正しいことを表す１ビットの０が生成される。また、ＤＣＴタイプがフレームＤＣＴモードになっているマクロブロックＭＢ＃２を構成するブロックのデータが注目データとされた場合には、ミスマッチ情報として、ＤＣＴタイプが正しくないことを表す１ビットの１が生成される。
【０１８７】
なお、図１５の実施の形態では、注目ブロックが動きブロックであり、かつそのＤＣＴタイプがフレームＤＣＴモードになっている場合にのみ、ＤＣＴタイプが正しくないことを表すミスマッチ情報を生成し、他の場合には、ＤＣＴタイプが正しいことを表すミスマッチ情報を生成するようにしたが、その他、例えば、注目ブロックが動きブロックであり、かつそのＤＣＴタイプがフレームＤＣＴモードになっている場合と、注目ブロックが静止ブロックであり、かつそのＤＣＴタイプがフィールドＤＣＴモードになっている場合に、ＤＣＴタイプが正しくないことを表すミスマッチ情報を生成し、注目ブロックが動きブロックであり、かつそのＤＣＴタイプがフィールドＤＣＴモードになっている場合と、注目ブロックが静止ブロックであり、かつそのＤＣＴタイプがフレームＤＣＴモードになっている場合に、ＤＣＴタイプが正しいことを表すミスマッチ情報を生成するようにすることなども可能である。
【０１８８】
また、図１５の実施の形態では、説明を簡単にするために、ＤＣＴタイプが正しいか、正しくないかを表す１ビットのミスマッチ情報を生成するようにしたが、ミスマッチ情報としては、その他、例えば、注目データのＤＣＴタイプと、その注目データを含むブロック（注目ブロック）が、本来、フレームＤＣＴモードまたはフィールドＤＣＴモードのうちのいずれでＤＣＴ変換すべきものであるかを表す情報（以下、適宜、ブロックタイプという）とのセットを生成するようにすることも可能である。
【０１８９】
ここで、ブロックタイプは、例えば、注目ブロックが動きブロックである場合には、フィールドＤＣＴモードを表すものとし、注目ブロックが静止ブロックである場合には、フレームＤＣＴモードを表すものとするようにすることが可能である。
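The block type, the set of DCT type and block type, and the symmetric variant of the 1-bit mismatch information mentioned above might be sketched as follows. All function names are hypothetical.

```python
FRAME_DCT = "frame"
FIELD_DCT = "field"


def block_type(is_motion_block):
    """DCT mode the block should have used: field if moving, frame if still."""
    return FIELD_DCT if is_motion_block else FRAME_DCT


def mismatch_set(dct_type, is_motion_block):
    """Mismatch information as the set (DCT type, block type)."""
    return (dct_type, block_type(is_motion_block))


def symmetric_mismatch_bit(dct_type, is_motion_block):
    """Variant that also flags a still block coded in field DCT mode."""
    return 0 if dct_type == block_type(is_motion_block) else 1
```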
【０１９０】
次に、図１３の実施の形態におけるクラス分類適応処理部３２（図９）の処理について説明する。
【０１９１】
クラス分類適応処理部３２では、前処理部３１を構成する図１４で説明したＭＰＥＧデコーダ１１６が出力する復号画像データを対象に、クラス分類適応処理が行われ、その結果得られる適応処理データが、後処理部３３に出力される。後処理部３３は、クラス分類適応処理部３２からの適応処理データを、そのまま、高画質の画像データ（高画質画像データ）として出力する。
【０１９２】
従って、図１３の実施の形態では、クラス分類適応処理部３２においてクラス分類適応処理が行われることにより、前処理部３１のＭＰＥＧデコーダ１１６が出力する、符号化データをＭＰＥＧ方式で復号した復号画像データが、高画質画像データに変換されて出力される。
【０１９３】
即ち、クラス分類適応処理部３２（図９）では、前処理部３１のＭＰＥＧデコーダ１１６が出力する復号画像データが、タップ抽出部５１と５２に供給される。
【０１９４】
タップ抽出部５１は、まだ、注目データとしていない高画質画像データの画素を注目データとして、その注目データ（の画素値）を予測するのに用いる復号画像データの幾つか（の画素）を、予測タップとして抽出する。タップ抽出部５２も、注目データをクラス分類するのに用いる復号画像データの幾つかを、クラスタップとして抽出する。
【０１９５】
ここで、上述したように、タップ抽出部５１および５２には、判定部２２からミスマッチ情報も供給されるようになっており、タップ抽出部５１と５２は、ミスマッチ情報に基づき、予測タップとクラスタップの構造を、それぞれ変更するようになっている。
【０１９６】
即ち、例えば、いま、上述したような、注目ブロックのＤＣＴタイプとブロックタイプとのセットが、注目データについてのミスマッチ情報として、判定部２２（のミスマッチ情報生成部１１５（図１３））からクラス分類適応処理部３２に供給されるものとすると、タップ抽出部５１は、ミスマッチ情報としての、注目ブロックのＤＣＴタイプとブロックタイプとのセットを受信し、ＭＰＥＧデコーダ１１６から供給される復号画像データから、例えば、図１７に示すようなタップ構造設定テーブルにしたがったタップ構造の予測タップを抽出する。
【０１９７】
即ち、タップ抽出部５１は、ミスマッチ情報としてのＤＣＴタイプとブロックタイプが、いずれもフィールドＤＣＴモードである場合、後述するフィールドタップのみからなるパターンＡのタップ構造の予測タップを構成する。また、タップ抽出部５１は、ミスマッチ情報としてのＤＣＴタイプとブロックタイプが、それぞれフィールドＤＣＴモードとフレームＤＣＴモードである場合、フィールドタップの数が、後述するフレームタップの数より多いパターンＢのタップ構造の予測タップを構成する。さらに、タップ抽出部５１は、ミスマッチ情報としてのＤＣＴタイプとブロックタイプが、それぞれフレームＤＣＴモードとフィールドＤＣＴモードである場合、フレームタップの数が、フィールドタップの数より多いパターンＣのタップ構造の予測タップを構成する。また、タップ抽出部５１は、ミスマッチ情報としてのＤＣＴタイプとブロックタイプが、いずれもフレームＤＣＴモードである場合、フレームタップのみからなるパターンＤのタップ構造の予測タップを構成する。
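The selection rule just described amounts to a four-entry lookup table. The sketch below mirrors that rule; the table layout and the names `TAP_STRUCTURE_TABLE` and `select_tap_pattern` are assumptions for illustration and are not taken from FIG. 17 itself.

```python
# Key: (DCT type, block type) -> tap structure pattern.
TAP_STRUCTURE_TABLE = {
    ("field", "field"): "A",  # field taps only
    ("field", "frame"): "B",  # more field taps than frame taps
    ("frame", "field"): "C",  # more frame taps than field taps
    ("frame", "frame"): "D",  # frame taps only
}


def select_tap_pattern(dct_type, block_type):
    """Return the pattern name for the given mismatch information."""
    return TAP_STRUCTURE_TABLE[(dct_type, block_type)]
```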
【０１９８】
ここで、図１８は、パターンＡ乃至Ｄのタップ構造を示している。なお、図１８において、○印が、復号画像データの画素を表している。また、斜線を付してある○印は、フィールドタップとなっている画素を表し、●印は、フレームタップとなっている画素を表している。
【０１９９】
図１８Ａは、パターンＡのタップ構造を示している。パターンＡのタップ構造は、例えば、注目データに対応する復号画像データの画素（以下、適宜、注目画素という）、注目画素の左右それぞれに隣接する２画素、注目画素の上方向に１画素おいて隣接する画素、その画素の左右それぞれに隣接する２画素、注目画素の上方向に３画素おいて隣接する画素、その画素の左右それぞれに隣接する２画素、注目画素の下方向に１画素おいて隣接する画素、その画素の左右それぞれに隣接する２画素、注目画素の下方向に３画素おいて隣接する画素、その画素の左右それぞれに隣接する２画素の合計２５画素で構成される。
【０２００】
ここで、フィールドタップとは、例えば、その上下に隣接する２画素が、いずれもタップ（ここでは、予測タップまたはクラスタップ）となっていない画素を意味する。図１８ＡのパターンＡのタップ構造では、いずれのタップも、その上下に隣接する画素がタップになっていないので、すべてフィールドタップである。
【０２０１】
図１８Ｂは、パターンＢのタップ構造を示している。パターンＢのタップ構造は、例えば、注目画素、注目画素の左右それぞれに隣接する２画素、注目画素の上方向に１画素おいて隣接する画素の左右それぞれに隣接する２画素、注目画素の上方向に３画素おいて隣接する画素の左右それぞれに隣接する１画素、注目画素の下方向に１画素おいて隣接する画素の左右それぞれに隣接する２画素、注目画素の下方向に３画素おいて隣接する画素の左右それぞれに隣接する１画素、注目画素の上に隣接する４画素、注目画素の下に隣接する４画素の合計２５画素で構成される。
【０２０２】
ここで、フレームタップとは、その上または下に隣接する画素のうちの少なくとも一方がタップとなっている画素を意味する。図１８ＢのパターンＢのタップ構造では、注目画素と、注目画素の上下それぞれに隣接する４画素の合計９画素がフレームタップとなっており、残りの１６画素がフィールドタップとなっている。
【０２０３】
図１８Ｃは、パターンＣのタップ構造を示している。パターンＣのタップ構造は、例えば、注目画素、注目画素の左右それぞれに隣接する２画素、注目画素の上方向に１画素おいて隣接する画素の左右それぞれに隣接する２画素、注目画素の下方向に１画素おいて隣接する画素の左右それぞれに隣接する２画素、注目画素の上下それぞれに隣接する４画素、注目画素の上に隣接する画素の左右それぞれに隣接する１画素、注目画素の下に隣接する画素の左右それぞれに隣接する１画素の合計２５画素で構成される。
【０２０４】
パターンＣのタップ構造では、注目画素、注目画素の上下それぞれに隣接する４画素、注目画素の左に隣接する画素、その画素の上下それぞれに隣接する２画素、注目画素の右に隣接する画素、その画素の上下それぞれに隣接する２画素の合計１９画素がフレームタップとなっており、残りの６画素がフィールドタップになっている。
【０２０５】
図１８Ｄは、パターンＤのタップ構造を示している。パターンＤのタップ構造は、例えば、注目画素を中心として隣接する、横×縦が５×５画素の合計２５画素で構成される。
【０２０６】
パターンＤのタップ構造では、いずれのタップも、その上または下の少なくとも一方の画素がタップとなっているので、すべてフレームタップである。
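The four patterns and the field tap / frame tap definitions can be checked with a short sketch. Taps are written as offsets (dx, dy) from the pixel of interest, with negative dy meaning "above"; this coordinate convention and all names are assumptions. A frame tap is a tap with at least one vertically adjacent tap, and every other tap is a field tap.

```python
def pattern_a():
    # Five rows two lines apart, five pixels wide.
    return {(dx, dy) for dy in (-4, -2, 0, 2, 4) for dx in range(-2, 3)}


def pattern_b():
    taps = {(dx, 0) for dx in range(-2, 3)}
    taps |= {(dx, dy) for dy in (-2, 2) for dx in (-2, -1, 1, 2)}
    taps |= {(dx, dy) for dy in (-4, 4) for dx in (-1, 1)}
    taps |= {(0, dy) for dy in range(-4, 5)}  # 4 pixels above and below
    return taps


def pattern_c():
    taps = {(dx, dy) for dy in (-2, 0, 2) for dx in (-2, -1, 1, 2)}
    taps |= {(0, dy) for dy in range(-4, 5)}
    taps |= {(dx, dy) for dy in (-1, 1) for dx in (-1, 1)}
    return taps


def pattern_d():
    return {(dx, dy) for dy in range(-2, 3) for dx in range(-2, 3)}


def split_taps(taps):
    """Return (field taps, frame taps) for a set of tap offsets."""
    frame = {(dx, dy) for (dx, dy) in taps
             if (dx, dy - 1) in taps or (dx, dy + 1) in taps}
    return taps - frame, frame


PATTERNS = {"A": pattern_a(), "B": pattern_b(),
            "C": pattern_c(), "D": pattern_d()}
# (number of field taps, number of frame taps) per pattern
counts = {name: tuple(len(s) for s in split_taps(taps))
          for name, taps in PATTERNS.items()}
```

Under these assumptions every pattern has 25 taps, and the field/frame split comes out as 25/0, 16/9, 6/19 and 0/25 for patterns A to D, in agreement with the counts given in the text.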
【０２０７】
タップ抽出部５１（図９）は、ミスマッチ情報に基づき、注目データについて、図１８に示したパターンＡ乃至Ｄのうちのいずれかのタップ構造の予測タップを構成する。
【０２０８】
タップ抽出部５２も、タップ抽出部５１と同様に、ミスマッチ情報に基づくタップ構造のクラスタップを構成する。
【０２０９】
なお、ここでは、ミスマッチ情報に基づいて、予測タップとして抽出する復号画像データの画素の位置を変更するだけで、予測タップを構成する画素数は、２５画素のまま変更しないようにしたが、タップ抽出部５１では、ミスマッチ情報に基づいて、予測タップを構成する復号画像データの画素の数を変更するようにすることも可能である。
【０２１０】
また、前処理部３１のＭＰＥＧデコーダ１１６では、符号化データにおける量子化ＤＣＴ係数が、その符号化データに含まれる動きベクトルや、ＤＣＴタイプ、量子化ステップ、ピクチャタイプ、その他の復号制御情報を用いて、画像に復号されるが、タップ抽出部５１では、このような復号制御情報も、予測タップに含めることが可能である。さらに、この場合、ミスマッチ情報に基づいて、予測タップとする復号制御情報を変更することも可能である。さらに、タップ抽出部５１では、符号化データに含まれる量子化ＤＣＴ係数や、その量子化ＤＣＴ係数を逆量子化して得られるＤＣＴ係数も、予測タップに含めるようにすることが可能である。
【０２１１】
タップ抽出部５２でも、タップ抽出部５１における場合と同様にして、クラスタップを構成することができる。
【０２１２】
タップ抽出部５１で得られた予測タップは、予測部５４に供給され、タップ抽出部５２で得られたクラスタップは、クラス分類部５３に供給される。
【０２１３】
クラス分類部５３には、クラスタップの他、注目データについてのミスマッチ情報も供給され、クラス分類部５３では、上述したように、クラスタップとミスマッチ情報に基づき、注目データがクラス分類される。
【０２１４】
即ち、クラス分類部５３は、例えば、注目データについてのクラスタップに対して、例えば、ADRC(Adaptive Dynamic Range Coding)処理等の圧縮処理を施すことによりクラス分類を行い、クラスコードを求める。
【０２１５】
ここで、ADRC処理を用いたクラス分類では、クラスタップを構成するデータ（ここでは、画素値）が、ADRC処理され、例えば、その結果得られるADRCコードが、クラスコードとされる。
【０２１６】
なお、KビットADRCにおいては、例えば、クラスタップを構成するデータの最大値MAXと最小値MINが検出され、DR=MAX-MINを、集合の局所的なダイナミックレンジとし、このダイナミックレンジDRに基づいて、クラスタップを構成するデータがKビットに再量子化される。即ち、クラスタップを構成する各データから、最小値MINが減算され、その減算値がDR/2^Kで除算（量子化）される。そして、以上のようにして得られる、クラスタップを構成するKビットの各データを、所定の順番で並べたビット列が、ADRCコードとして出力される。従って、クラスタップが、例えば、１ビットADRC処理された場合には、そのクラスタップを構成する各データは、最小値MINが減算された後に、最大値MAXと最小値MINとの差分の１／２（DR/2）で除算され（小数点以下切り捨て）、これにより、各データが１ビットとされる（２値化される）。そして、その１ビットのデータを所定の順番で並べたビット列が、ADRCコードとして出力される。
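A minimal sketch of K-bit ADRC over a class tap of integer pixel values follows. The function name `adrc_code` is hypothetical. Two details are assumptions that the text does not spell out: the value equal to MAX is clipped to the top level 2^K - 1, and a flat tap (DR = 0) is mapped to all zeros.

```python
def adrc_code(class_tap, k=1):
    """Requantize each tap value to k bits and pack the bits into one code."""
    mn, mx = min(class_tap), max(class_tap)
    dr = mx - mn                 # local dynamic range DR = MAX - MIN
    levels = 1 << k              # 2^K quantization levels
    code = 0
    for value in class_tap:
        if dr == 0:
            q = 0                # flat tap (assumption)
        else:
            # (value - MIN) / (DR / 2^K), truncated, clipped to 2^K - 1
            q = min((value - mn) * levels // dr, levels - 1)
        code = (code << k) | q   # concatenate in tap order
    return code
```

For example, the tap `[10, 20, 30, 40]` gives the bit string `0011` with 1-bit ADRC and `00 01 10 11` with 2-bit ADRC.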
【０２１７】
なお、クラス分類部５３には、例えば、クラスタップを構成するデータのレベル分布のパターンを、そのままクラスコードとして出力させることも可能である。しかしながら、この場合、クラスタップが、Ｎ個のデータで構成され、各データに、Ｋビットが割り当てられているとすると、クラス分類部５３が出力するクラスコードの場合の数は、（２^N）^K通りとなり、データのビット数Ｋに指数的に比例した膨大な数となる。
【０２１８】
従って、クラス分類部５３においては、クラスタップの情報量を、上述のADRC処理や、あるいはベクトル量子化等によって圧縮することにより、クラス分類を行うのが好ましい。
【０２１９】
ここで、クラスタップを用いてクラス分類を行うことにより得られるクラスコードを、以下、適宜、クラスタップコードという。
【０２２０】
クラス分類部５３は、上述のようにしてクラスタップコードを求める他、注目データについてのミスマッチ情報としての、例えば、ＤＣＴタイプとブロックタイプのセットを用いてクラス分類を行うことにより、２ビットのクラスコードを求める。
【０２２１】
即ち、いま、ミスマッチ情報を用いたクラス分類によって得られるクラスコードを、ミスマッチコードというものとすると、クラス分類部５３は、ミスマッチ情報としてのＤＣＴタイプとブロックタイプが、いずれもフィールドＤＣＴモードを表している場合には、２ビットのミスマッチコードを、例えば「００」とする。また、クラス分類部５３は、ＤＣＴタイプとブロックタイプが、それぞれフィールドＤＣＴモードとフレームＤＣＴモードを表している場合には、２ビットのミスマッチコードを、例えば「０１」とする。さらに、クラス分類部５３は、ＤＣＴタイプとブロックタイプが、それぞれフレームＤＣＴモードとフィールドＤＣＴモードを表している場合には、２ビットのミスマッチコードを、例えば「１０」とする。また、クラス分類部５３は、ＤＣＴタイプとブロックタイプが、いずれもフレームＤＣＴモードを表している場合には、２ビットのミスマッチコードを、例えば「１１」とする。
【０２２２】
その後、クラス分類部５３は、例えば、注目データについて得られたクラスタップコードの上位ビットとして、注目データについて得られたミスマッチコードを付加し、このクラスタップコードとミスマッチコードとで構成されるコードを、注目データについての最終的なクラスコードとして出力する。
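The composition of the final class code can be sketched as a simple bit concatenation. The 2-bit values follow the example assignment given above; the names `MISMATCH_CODE` and `final_class_code`, and passing the bit width of the class tap code explicitly, are assumptions.

```python
# (DCT type, block type) -> 2-bit mismatch code, per the example assignment.
MISMATCH_CODE = {
    ("field", "field"): 0b00,
    ("field", "frame"): 0b01,
    ("frame", "field"): 0b10,
    ("frame", "frame"): 0b11,
}


def final_class_code(class_tap_code, class_tap_bits, dct_type, block_type):
    """Prepend the mismatch code as the upper bits of the class tap code."""
    mismatch = MISMATCH_CODE[(dct_type, block_type)]
    return (mismatch << class_tap_bits) | class_tap_code
```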
【０２２３】
なお、クラス分類部５３では、その他、例えば、ＤＣＴタイプ以外の復号制御情報にも基づいて、クラス分類を行うようにすることが可能である。
【０２２４】
クラス分類部５３が出力するクラスコードは、係数メモリ４１に供給される。係数メモリ４１では、そのクラスコードに対応するタップ係数が読み出され、予測部５４に供給される。
【０２２５】
予測部５４は、タップ抽出部５１が出力する予測タップと、係数メモリ４１から取得したタップ係数とを用いて、式（１）に示した線形予測演算を行う。これにより、予測部５４は、注目データ（の予測値）、即ち、高画質画像データを求め、後処理部３３に供給する。
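The prediction step can be illustrated with a short sketch. It assumes that the linear prediction of equation (1) is the product-sum of the tap coefficients and the prediction tap values; the coefficient values and the class code 35 below are made up for the example, and `predict` and `coefficient_memory` are hypothetical names.

```python
def predict(prediction_tap, tap_coefficients):
    """Product-sum of tap coefficients and prediction tap values."""
    if len(prediction_tap) != len(tap_coefficients):
        raise ValueError("tap and coefficient counts differ")
    return sum(w * x for w, x in zip(tap_coefficients, prediction_tap))


# Hypothetical coefficient memory: class code -> tap coefficients.
coefficient_memory = {35: [0.25, 0.5, 0.25]}
value = predict([100, 120, 140], coefficient_memory[35])
```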
【０２２６】
後処理部３３では、上述したように、クラス分類適応処理部３２（の予測部５４）の出力、即ち、高画質画像データが、そのまま出力される。
【０２２７】
なお、上述の場合には、注目ブロックのＤＣＴタイプが正しいか、正しくないかを表す１ビットの情報や、注目ブロックのＤＣＴタイプとブロックタイプのセットを、ミスマッチ情報とするようにしたが、ミスマッチ情報としては、その他、例えば、注目ブロックのＤＣＴタイプが、どの程度正しいかを表す評価値などを採用することが可能である。
【０２２８】
注目ブロックのＤＣＴタイプの正しさの程度を表す評価値としては、例えば、注目ブロックのＤＣＴタイプがフィールドＤＣＴモードである場合には、注目ブロック（注目マクロブロック）の動きベクトルの大きさを採用し、注目ブロックのＤＣＴタイプがフレームＤＣＴモードである場合には、動きベクトルの最大の大きさから、注目ブロックの動きベクトルの大きさを減算して得られる減算値を採用することが可能である。この場合、注目ブロックのＤＣＴタイプがフィールドＤＣＴモードであるときには、注目ブロックの動きベクトルの大きさが大きいほど、また、注目ブロックのＤＣＴタイプがフレームＤＣＴモードであるときには、注目ブロックの動きベクトルの大きさが小さいほど、評価値が大きくなる。
【０２２９】
そして、この場合、タップ抽出部５１や５２では、例えば、ミスマッチ情報としての評価値を、１つ以上の閾値と比較し、その比較結果に基づいて、予測タップやクラスタップのタップ構造を変更するようにすることが可能である。また、クラス分類部５３では、例えば、ミスマッチ情報としての評価値を量子化し、その量子化値を、ミスマッチコードとして用いることが可能である。
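The evaluation value and its quantization into a mismatch code might look as follows. This is a sketch under assumptions: the function names are hypothetical, and uniform quantization into 2^bits levels is only one possible choice, since the text does not fix the quantizer.

```python
def evaluation_value(dct_type, mv_magnitude, max_mv_magnitude):
    """Degree of correctness of the DCT type (larger = more plausible)."""
    if dct_type == "field":
        return mv_magnitude
    # Frame DCT mode: maximum magnitude minus the block's magnitude.
    return max_mv_magnitude - mv_magnitude


def quantize_evaluation(value, max_mv_magnitude, bits=2):
    """Uniformly quantize the evaluation value into a mismatch code."""
    levels = 1 << bits
    q = int(value * levels / max_mv_magnitude)
    return min(max(q, 0), levels - 1)  # clip to the valid code range
```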
【０２３０】
さらに、上述の場合には、注目ブロックが動きブロックか、または静止ブロックであるかを、注目ブロックの動きベクトル、あるいは対応前ブロックまたは対応後ブロックの動きベクトルや、静止ブロックもしくは動きブロックの別に基づいて判定するようにしたが、注目ブロックが動きブロックまたは静止ブロックのうちのいずれであるかの判定は、その他、例えば、注目ブロックや、対応前ブロックまたは対応後ブロックの周辺のブロックの動きベクトル等にも基づいて判定するようにすることが可能である。
【０２３１】
次に、図１９は、図１３の係数メモリ４１に記憶させるタップ係数を学習する場合の、図１１の学習装置の詳細構成例を示している。
【０２３２】
図１９の実施の形態では、学習用データ記憶部１１に、学習用データとして、高画質の画像データ（学習用画像データ）が記憶されている。
【０２３３】
図１９の実施の形態において、符号化部１２は、ＭＰＥＧエンコーダ１３１で構成されており、ＭＰＥＧエンコーダ１３１は、学習用データ記憶部１１から学習用画像データを読み出して、ＭＰＥＧ２方式で符号化し、その結果得られる符号化データを出力する。
【０２３４】
即ち、図２０は、図１９のＭＰＥＧエンコーダ１３１の構成例を示している。
【０２３５】
学習用画像データは、動きベクトル検出部１４１と演算部１４３に供給される。動きベクトル検出部１４１は、学習用画像データを対象に、例えば、ブロックマッチングを行うことにより、学習用画像データの動きベクトルを検出し、動き補償部１４２に供給する。
【０２３６】
また、演算部１４３は、必要に応じて、学習用画像データ（原画像）から、動き補償部１４２から供給される予測画像を減算し、その結果得られる残差画像を、ＤＣＴ変換部１４４に供給する。ＤＣＴ変換部１４４は、演算部１４３からの残差画像をＤＣＴ変換し、その結果得られるＤＣＴ係数を、量子化部１４５に供給する。量子化部１４５は、ＤＣＴ変換部１４４から供給されるＤＣＴ係数を、所定の量子化ステップで量子化することにより、量子化ＤＣＴ係数を得て、ＶＬＣ部１４６および逆量子化部１４７に供給する。
【０２３７】
ＶＬＣ部１４６は、量子化部１４５から供給される量子化ＤＣＴ係数をＶＬＣコードに可変長符号化し、さらに、必要な復号制御情報（例えば、動きベクトル検出部１４１で検出された動きベクトルや、量子化部１４５で用いられた量子化ステップなど）を多重化することで、符号化データを得て出力する。
【０２３８】
一方、逆量子化部１４７では、量子化部１４５が出力する量子化ＤＣＴ係数が逆量子化され、ＤＣＴ係数が求められて、逆ＤＣＴ変換部１４８に供給される。逆ＤＣＴ変換部１４８は、逆量子化部１４７からのＤＣＴ係数を、逆ＤＣＴ変換することにより、残差画像に復号し、演算部１４９に供給する。
【０２３９】
演算部１４９には、逆ＤＣＴ変換部１４８から、残差画像が供給される他、動き補償部１４２から、その残差画像を求めるのに演算部１４３で用いられたのと同一の予測画像が供給されるようになっており、演算部１４９は、残差画像と予測画像とを加算することで、元の画像を復号（ローカルデコード）する。この復号画像は、メモリ１５０に供給され、参照画像として記憶される。
【０２４０】
そして、動き補償部１４２では、メモリ１５０に記憶された参照画像が読み出され、動きベクトル検出部１４１から供給される動きベクトルにしたがって動き補償が施されることにより、予測画像が生成される。この予測画像は、動き補償部１４２から演算部１４３および１４９に供給される。
【０２４１】
上述したように、演算部１４３では、動き補償部１４２からの予測画像を用いて、残差画像が求められ、また、演算部１４９では、動き補償部１４２からの予測画像を用いて、元の画像が復号される。
【０２４２】
図１９に戻り、ＭＰＥＧエンコーダ１３１が出力する符号化データは、復号制御情報抽出部７１に供給される。
【０２４３】
復号制御情報抽出部７１は、逆ＶＬＣ部１３２で構成されている。逆ＶＬＣ部１３２は、図１３の逆ＶＬＣ部１１１と同様の処理を行い、これにより、符号化データから、複数の復号制御情報としてのＤＣＴタイプ、ピクチャタイプ、マクロブロックタイプ、動きベクトルを抽出し、判定部７２に供給する。
【０２４４】
判定部７２は、フィールド／フレーム判定部１３３、イントラ／ノンイントラ判定部１３４、静動判定部１３５、およびミスマッチ情報生成部１３６で構成されている。そして、フィールド／フレーム判定部１３３、イントラ／ノンイントラ判定部１３４、静動判定部１３５、またはミスマッチ情報生成部１３６では、復号制御情報抽出部７１から供給される複数の復号制御情報としてのＤＣＴタイプ、ピクチャタイプ、マクロブロックタイプ、および動きベクトルを用いて、図１３のフィールド／フレーム判定部１１２、イントラ／ノンイントラ判定部１１３、静動判定部１１４、またはミスマッチ情報生成部１１５における場合とそれぞれ同様の処理が行われ、これにより、適応学習部６０において注目教師データとされている教師データについてのミスマッチ情報が生成される。このミスマッチ情報は、ミスマッチ情報生成部１３６から適応学習部６０に供給される。
【０２４５】
図１９の実施の形態では、逆後処理部６１Ａは、学習用データ記憶部１１から学習用画像データを読み出し、そのまま、教師データとして、適応学習部６０に出力する。適応学習部６０（図１１）では、教師データ記憶部６２において、逆後処理部６１Ａからの教師データが記憶される。
【０２４６】
符号化部６３Ａは、ＭＰＥＧエンコーダ１３７で構成され、ＭＰＥＧエンコーダ１３７は、ＭＰＥＧエンコーダ１３１と同様に、学習用データ記憶部１１から学習用画像データを読み出して、ＭＰＥＧ２方式で符号化し、その結果得られる符号化データを、前処理部６３Ｂに出力する。
【０２４７】
前処理部６３Ｂは、図１４のＭＰＥＧデコーダ１１６と同様に構成されるＭＰＥＧデコーダ１３８で構成され、ＭＰＥＧデコーダ１３８は、ＭＰＥＧエンコーダ１３７からの符号化データを、ＭＰＥＧ２方式で復号し、その結果得られる復号画像データを、生徒データとして、適応学習部６０に出力する。適応学習部６０（図１１）では、生徒データ記憶部６４において、ＭＰＥＧデコーダ１３８からの生徒データが記憶される。
【０２４８】
そして、適応学習部６０では、教師データおよび生徒データを用い、生徒データから抽出される予測タップから、式（１）の線形予測演算を行うことにより得られる教師データの予測値の予測誤差を統計的に最小にするタップ係数を求める学習が行われる。
【０２４９】
即ち、適応学習部６０（図１１）では、タップ抽出部６５が、教師データ記憶部６２に記憶された教師データのうち、まだ、注目教師データとしていないものを、注目教師データとし、注目教師データについて、生徒データ記憶部６４に記憶された生徒データから予測タップを構成して、足し込み部６８に供給する。さらに、タップ抽出部６６が、注目教師データについて、生徒データ記憶部６４に記憶された生徒データからクラスタップを構成し、クラス分類部６７に供給する。
【０２５０】
ここで、タップ抽出部６５および６６には、ミスマッチ情報が供給されるようになっており、タップ抽出部６５または６６それぞれは、ミスマッチ情報に基づき、注目教師データについて、図１３で説明したクラス分類適応処理部３２のタップ抽出部５１または５２（図９）が構成するのと同一のタップ構造の予測タップまたはクラスタップを構成する。
【０２５１】
従って、例えば、タップ抽出部５１または５２において、図１３で説明したように、符号化データに含まれる復号制御情報をも用いて、予測タップまたはクラスタップがそれぞれ構成される場合には、図１９の学習装置でも、タップ抽出部６５または６６（図１１）において、復号制御情報をも用いて、予測タップまたはクラスタップがそれぞれ構成される。
【０２５２】
その後、クラス分類部６７（図１１）では、注目教師データについてのクラスタップとミスマッチ情報に基づき、注目教師データについて、図１３で説明したクラス分類部５３における場合と同様のクラス分類を行い、その結果得られるクラスに対応するクラスコードを、足し込み部６８に出力する。
【０２５３】
足し込み部６８は、教師データ記憶部６２から注目教師データを読み出し、その注目教師データと、タップ抽出部６５からの予測タップを用い、式（８）の行列Ａとベクトルｖのコンポーネントを計算する。さらに、足し込み部６８は、既に得られている行列Ａとベクトルｖのコンポーネントのうち、クラス分類部６７からのクラスコードに対応するものに対して、注目教師データと予測タップから求められた行列Ａとベクトルｖのコンポーネントを足し込む。
【０２５４】
以上の処理が、教師データ記憶部６２に記憶された教師データすべてを、注目教師データとして行われると、足し込み部６８は、いままでの処理によって得られたクラスごとの行列Ａおよびベクトルｖのコンポーネントで構成される式（８）の正規方程式を、タップ係数算出部６９に供給し、タップ係数算出部６９は、その各クラスごとの正規方程式を解くことにより、各クラスごとに、タップ係数を求めて出力する。
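The accumulation and solution steps can be sketched in plain Python. The sketch assumes that the normal equation of equation (8) has the usual least-squares form A w = v, with A the sum of x xᵀ and v the sum of x·y over all samples of a class (x: prediction tap, y: teacher data); `Accumulator` and `solve` are hypothetical names.

```python
from collections import defaultdict


def solve(a, v):
    """Solve a w = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    m = [row[:] + [v[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


class Accumulator:
    """Per-class accumulation of matrix A and vector v."""

    def __init__(self, n_taps):
        self.a = defaultdict(lambda: [[0.0] * n_taps for _ in range(n_taps)])
        self.v = defaultdict(lambda: [0.0] * n_taps)

    def add(self, class_code, prediction_tap, teacher):
        a, v = self.a[class_code], self.v[class_code]
        for i, xi in enumerate(prediction_tap):
            v[i] += xi * teacher
            for j, xj in enumerate(prediction_tap):
                a[i][j] += xi * xj

    def tap_coefficients(self):
        """Solve the normal equation of every class seen so far."""
        return {c: solve(self.a[c], self.v[c]) for c in self.a}
```

A class with too few or linearly dependent samples yields a singular matrix, in which case `solve` fails with a division by zero; a practical implementation would have to handle that case, for example by falling back to default coefficients.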
【０２５５】
なお、図１９の学習装置では、例えば、符号化部６３ＡのＭＰＥＧエンコーダ１３７において学習用画像データをＭＰＥＧ符号化する前に、その学習用画像データの画素数を１／Ｎに間引くようにすることで、適応学習部６０において、ＭＰＥＧ復号された画像データを、高画質で、かつ画素数をＮ倍にする（解像度を高くする）タップ係数を得ることができる。
【０２５６】
次に、図２１は、符号化データが画像データをＭＰＥＧ２方式で符号化したものである場合の、図６の復号装置の第２の詳細構成例を示している。なお、図中、図１３における場合と対応する部分については、同一の符号を付してあり、以下では、その説明は、適宜省略する。
【０２５７】
図２１の実施の形態では、前処理部３１が、逆ＶＬＣ部１６１、逆量子化部１６２、演算部１６３、ＭＰＥＧデコーダ１６４、メモリ１６５、動き補償部１６６、およびＤＣＴ変換部１６７で構成されている。
【０２５８】
前処理部３１において、符号化データは、逆ＶＬＣ部１６１とＭＰＥＧデコーダ１６４に供給される。
【０２５９】
逆ＶＬＣ部１６１は、符号化データから、量子化ＤＣＴ係数のＶＬＣコードを分離するとともに、量子化ステップ、動きベクトル、その他の復号制御情報を分離する。そして、逆ＶＬＣ部１６１は、量子化ＤＣＴ係数のＶＬＣコードを逆ＶＬＣ処理することで、量子化ＤＣＴ係数に復号し、逆量子化部１６２に供給する。さらに、逆ＶＬＣ部１６１は、量子化ステップを、逆量子化部１６２に、動きベクトルを、動き補償部１６６に、それぞれ供給する。
【０２６０】
逆量子化部１６２は、逆ＶＬＣ部１６１から供給される量子化ＤＣＴ係数を、同じく逆ＶＬＣ部１６１から供給される量子化ステップで逆量子化し、その結果得られる８×８画素のブロックのＤＣＴ係数を、演算部１６３に供給する。
【０２６１】
一方、ＭＰＥＧデコーダ１６４では、符号化データが、ＭＰＥＧ方式で復号され、復号画像データが出力される。ＭＰＥＧデコーダ１６４が出力する復号画像のうち、参照画像とされ得るＩピクチャとＰピクチャは、メモリ１６５に供給されて記憶される。
【０２６２】
そして、動き補償部１６６は、メモリ１６５に記憶された復号画像を参照画像として読み出し、その参照画像に対して、逆ＶＬＣ部１６１から供給される動きベクトルにしたがい、動き補償を施すことで、逆量子化部１６２から演算部１６３に供給されたブロックの予測画像を生成し、ＤＣＴ変換部１６７に供給する。ＤＣＴ変換部１６７は、動き補償部１６６から供給される予測画像をＤＣＴ変換し、その結果得られるＤＣＴ係数を、演算部１６３に供給する。
【０２６３】
演算部１６３は、逆量子化部１６２から供給されるブロックの各ＤＣＴ係数と、ＤＣＴ変換部１６７から供給される、対応するＤＣＴ係数とを、必要に応じて加算することで、そのブロックの画素値をＤＣＴ変換したＤＣＴ係数を求める。
【０２６４】
即ち、逆量子化部１６２から供給されるブロックがイントラ符号化されているものである場合、逆量子化部１６２から供給されるブロックのＤＣＴ係数は、元の画素値をＤＣＴ変換したものとなっているから、演算部１６３は、逆量子化部１６２から供給されるブロックのＤＣＴ係数を、そのまま出力する。
【０２６５】
また、逆量子化部１６２から供給されるブロックがノンイントラ符号化されているものである場合、逆量子化部１６２から供給されるブロックのＤＣＴ係数は、元の画素値と予測画像との差分値（残差画像）をＤＣＴ変換したものとなっているから、演算部１６３は、逆量子化部１６２から供給されるブロックの各ＤＣＴ係数と、ＤＣＴ変換部１６７から供給される、予測画像をＤＣＴ変換して得られるＤＣＴ係数の対応するものとを加算することにより、元の画素値をＤＣＴ変換して得られるＤＣＴ係数を求めて出力する。
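The addition in the DCT domain works because the DCT is linear: the DCT of the residual plus the DCT of the prediction equals the DCT of their sum. The sketch below checks this numerically with an unnormalized one-dimensional DCT-II on made-up data; using 1-D data instead of 8×8 blocks is a simplification for illustration.

```python
import math


def dct_1d(x):
    """Unnormalized 1-D DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
            for k in range(n)]


residual = [1.0, -2.0, 0.5, 3.0]      # made-up residual samples
prediction = [10.0, 12.0, 11.0, 9.0]  # made-up prediction samples
original = [r + p for r, p in zip(residual, prediction)]

# Sum of the two coefficient sets versus the DCT of the summed pixels.
summed = [a + b for a, b in zip(dct_1d(residual), dct_1d(prediction))]
direct = dct_1d(original)
max_error = max(abs(a - b) for a, b in zip(summed, direct))
```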
【０２６６】
演算部１６３が出力するブロックのＤＣＴ係数は、前処理データとして、クラス分類適応処理部３２に供給される。
【０２６７】
図２１の実施の形態では、クラス分類適応処理部３２において、前処理部３１が出力するＤＣＴ係数を対象に、クラス分類適応処理が行われ、これにより、高画質画像データ（の予測値）が、適応処理データとして求められる。
【０２６８】
即ち、クラス分類適応処理部３２（図９）では、前処理部３１が出力するＤＣＴ係数が、タップ抽出部５１と５２に供給される。
【０２６９】
タップ抽出部５１は、まだ、注目データとしていない高画質画像データの画素を注目データとして、その注目データを予測するのに用いる前処理データとしてのＤＣＴ係数の幾つかを、予測タップとして抽出する。タップ抽出部５２も、注目データをクラス分類するのに用いる前処理データとしてのＤＣＴ係数の幾つかを、クラスタップとして抽出する。
【０２７０】
なお、タップ抽出部５１または５２は、注目データについてのミスマッチ情報に基づいて、予測タップまたはクラスタップのタップ構造を、それぞれ変更する。
【０２７１】
即ち、タップ抽出部５１は、例えば、注目データのブロック（注目ブロック）のＤＣＴ係数すべての他、注目ブロックの上下左右それぞれに隣接するブロックにおける必要なＤＣＴ係数を、ミスマッチ情報に応じて抽出して、予測タップを構成する。タップ抽出部５２も、タップ抽出部５１と同様にして、クラスタップを構成する。
【０２７２】
そして、タップ抽出部５１で得られた予測タップは、予測部５４に供給され、タップ抽出部５２で得られたクラスタップは、クラス分類部５３に供給される。
【０２７３】
クラス分類部５３では、クラスタップと、注目データについてのミスマッチ情報に基づき、図１３で説明した場合と同様にして、注目データがクラス分類され、注目データについてのクラスコードが、係数メモリ４１に供給される。係数メモリ４１では、注目データについてのクラスコードに対応するタップ係数が読み出され、予測部５４に供給される。
【０２７４】
予測部５４は、係数メモリ４１から供給されるタップ係数を取得し、そのタップ係数と、タップ抽出部５１が出力する予測タップとを用いて、式（１）に示した線形予測演算を行う。これにより、予測部５４は、注目データ（の予測値）、即ち、高画質画像データを求め、後処理部３３に供給する。
【０２７５】
後処理部３３では、クラス分類適応処理部３２からの高画質画像データが、そのまま出力される。
【０２７６】
従って、図２１の実施の形態では、クラス分類適応処理部３２において、ＤＣＴ係数が高画質画像データに変換される。
【０２７７】
次に、図２２は、図２１の復号装置の係数メモリ４１に記憶させるタップ係数を学習する場合の、図１１の学習装置の詳細構成例を示している。なお、図中、図１９における場合と対応する部分については、同一の符号を付してあり、以下では、その説明は、適宜省略する。
【０２７８】
図２２の実施の形態では、前処理部６３Ｂが、逆ＶＬＣ部１７１、逆量子化部１７２、演算部１７３、ＭＰＥＧデコーダ１７４、メモリ１７５、動き補償部１７６、およびＤＣＴ変換部１７７で構成されており、これらの逆ＶＬＣ部１７１乃至ＤＣＴ変換部１７７は、図２１の逆ＶＬＣ部１６１乃至ＤＣＴ変換部１６７とそれぞれ同様に構成されている。
【０２７９】
従って、前処理部６３Ｂでは、符号化部６３ＡのＭＰＥＧエンコーダ１３７が出力する符号化データに対して、図２１の前処理部３１における場合と同様の処理が施され、これにより得られるＤＣＴ係数が、生徒データとして、適応学習部６０に供給される。
【０２８０】
適応学習部６０（図１１）では、生徒データ記憶部６４において、前処理部６３Ｂから供給されるＤＣＴ係数が、生徒データとして記憶され、図１９で説明した場合と同様に、教師データおよび生徒データを用い、生徒データから抽出される予測タップから、式（１）の線形予測演算を行うことにより得られる教師データの予測値の予測誤差を統計的に最小にするタップ係数を求める学習が行われ、これにより、生徒データとしてのＤＣＴ係数を、高画質画像データに変換するクラスごとのタップ係数が求められる。
【０２８１】
但し、図２２の実施の形態において、適応学習部６０（図１１）では、そのタップ抽出部６５または６６それぞれにおいて、図２１のクラス分類適応処理部３２（図９）におけるタップ抽出部５１または５２が構成するのと同一のタップ構造の予測タップまたはクラスタップが、ミスマッチ情報に基づいて構成される。さらに、図２２の適応学習部６０（図１１）におけるクラス分類部６７でも、図２１のクラス分類適応処理部３２（図９）におけるクラス分類部５３と同様のクラス分類が行われる。
【０２８２】
次に、図２３は、符号化データが画像データをＭＰＥＧ２方式で符号化したものである場合の、図６の復号装置の第３の詳細構成例を示している。なお、図中、図２１における場合と対応する部分については、同一の符号を付してあり、以下では、その説明は、適宜省略する。
【０２８３】
図２３の復号装置は、後処理部３３が、逆ＤＣＴ変換部１８１で構成されていることを除いて、図２１における場合と同様に構成されている。
【０２８４】
図２３の実施の形態では、クラス分類適応処理部３２において、前処理部３１が出力するＤＣＴ係数を対象に、クラス分類適応処理が行われ、これにより、逆ＤＣＴ変換を行った場合に、高画質画像データを得ることのできるＤＣＴ係数（以下、適宜、高画質ＤＣＴ係数という）（の予測値）が、適応処理データとして求められる。
【０２８５】
即ち、クラス分類適応処理部３２（図９）では、前処理部３１が出力する前処理データとしてのＤＣＴ係数が、タップ抽出部５１と５２に供給される。
【０２８６】
タップ抽出部５１は、まだ、注目データとしていない高画質ＤＣＴ係数を注目データとして、その注目データを予測するのに用いる前処理データとしてのＤＣＴ係数の幾つかを、予測タップとして抽出する。即ち、タップ抽出部５１は、ミスマッチ情報に基づき、注目データについて、例えば、図２１における場合と同様のタップ構造の予測タップを構成する。タップ抽出部５２も、ミスマッチ情報に基づき、注目データについて、例えば、図２１における場合と同様のタップ構造のクラスタップを構成する。
【０２８７】
そして、タップ抽出部５１で得られた予測タップは、予測部５４に供給され、タップ抽出部５２で得られたクラスタップは、クラス分類部５３に供給される。
【０２８８】
クラス分類部５３では、クラスタップと、注目データについてのミスマッチ情報に基づき、図２１における場合と同様にして、注目データがクラス分類され、注目データについてのクラスコードが、係数メモリ４１に供給される。係数メモリ４１では、注目データについてのクラスコードに対応するタップ係数が読み出され、予測部５４に供給される。
【０２８９】
予測部５４は、係数メモリ４１が出力するタップ係数を取得し、そのタップ係数と、タップ抽出部５１が出力する予測タップとを用いて、式（１）に示した線形予測演算を行う。これにより、予測部５４は、注目データ（の予測値）、即ち、高画質ＤＣＴ係数を求め、後処理部３３に供給する。
【０２９０】
後処理部３３では、逆ＤＣＴ変換部１８１において、クラス分類適応処理部３２が出力する高画質ＤＣＴ係数が、ブロック単位で逆ＤＣＴ変換され、これにより、高画質画像データが求められて出力される。
【０２９１】
次に、図２４は、図２３の復号装置の係数メモリ４１に記憶させるタップ係数を学習する場合の、図１１の学習装置の詳細構成例を示している。なお、図中、図２２における場合と対応する部分については、同一の符号を付してあり、以下では、その説明は、適宜省略する。
【０２９２】
図２４の学習装置は、逆後処理部６１Ａが、ＤＣＴ変換部１９１で構成されていることを除いて、図２２における場合と同様に構成されている。
【０２９３】
従って、逆後処理部６１Ａでは、ＤＣＴ変換部１９１において、学習用データ記憶部１１から読み出された学習用画像データとしての高画質画像データが、ブロック単位でＤＣＴ変換され、その結果得られるＤＣＴ係数である高画質ＤＣＴ係数が、教師データとして、適応学習部６０に供給される。
【０２９４】
適応学習部６０（図１１）では、教師データ記憶部６２において、逆後処理部６１Ａから供給される高画質ＤＣＴ係数が、教師データとして記憶され、その教師データと、生徒データ記憶部６４に記憶された生徒データとしてのＤＣＴ係数（このＤＣＴ係数は、画像データをＭＰＥＧ符号化した符号化データから得たもの）とを用い、生徒データから抽出される予測タップから、式（１）の線形予測演算を行うことにより得られる教師データの予測値の予測誤差を統計的に最小にするタップ係数を求める学習が行われ、これにより、生徒データとしてのＤＣＴ係数を、高画質ＤＣＴ係数に変換するクラスごとのタップ係数が求められる。
【０２９５】
即ち、いまの場合、生徒データとされているＤＣＴ係数は、前処理部６３Ｂにおいて、符号化データから求められたものであり、量子化誤差を含んでいるため、そのＤＣＴ係数を逆ＤＣＴ変換して得られる画像は、いわゆるブロック歪み等を有する低画質のものとなる。
【０２９６】
そこで、適応学習部６０では、上述のように、式（１）の線形予測演算を行うことにより得られる教師データ（学習用画像データをＤＣＴ変換して得られる高画質ＤＣＴ係数）の予測値の予測誤差を統計的に最小にするタップ係数を求める学習が行われることにより、生徒データとされているＤＣＴ係数を、高画質ＤＣＴ係数に変換するクラスごとのタップ係数が求められる。
【０２９７】
なお、図２４の実施の形態において、適応学習部６０（図１１）では、そのタップ抽出部６５または６６それぞれにおいて、図２３のクラス分類適応処理部３２（図９）におけるタップ抽出部５１または５２が構成するのと同一のタップ構造の予測タップまたはクラスタップが、ミスマッチ情報に基づいて構成される。さらに、図２４の適応学習部６０（図１１）におけるクラス分類部６７でも、図２３のクラス分類適応処理部３２（図９）におけるクラス分類部５３と同様のクラス分類が行われる。
【０２９８】
以上のように、符号化データに含まれる復号制御情報の正しさを判定し、その判定結果を表すミスマッチ情報に基づいて、符号化データの復号、およびその復号に用いるタップ係数の学習を行うようにしたので、学習においては、復号制御情報の正しさを考慮して、原画像に近い予測値を求めるためのタップ係数を求めることができ、その結果、そのようなタップ係数を用いて、符号化データの復号を行うことで、高画質の画像を得ることが可能となる。
【０２９９】
即ち、本実施の形態では、ＤＣＴタイプの正しさを判定し、その判定結果を考慮して、タップ係数の学習を行うようにしたので、ＭＰＥＧ２方式で復号すれば、自然な動きになる部分については、その部分を、原画像に近い予測値に復号するためのタップ係数を得ることができる他、ＭＰＥＧ２方式で復号すれば、不自然な動きになる部分についても、その部分を、原画像に近い予測値に復号するためのタップ係数を得ることができる。そして、そのようなタップ係数を用い、やはり、ＤＣＴタイプの正しさを考慮して、符号化データの復号を行うことにより、高画質の画像を得ることができる。
【０３００】
次に、上述した一連の処理は、ハードウェアにより行うこともできるし、ソフトウェアにより行うこともできる。一連の処理をソフトウェアによって行う場合には、そのソフトウェアを構成するプログラムが、汎用のコンピュータ等にインストールされる。
【０３０１】
そこで、図２５は、上述した一連の処理を実行するプログラムがインストールされるコンピュータの一実施の形態の構成例を示している。
【０３０２】
プログラムは、コンピュータに内蔵されている記録媒体としてのハードディスク４０５やＲＯＭ４０３に予め記録しておくことができる。
【０３０３】
あるいはまた、プログラムは、フレキシブルディスク、CD-ROM(Compact Disc Read Only Memory)，MO(Magneto Optical)ディスク，DVD(Digital Versatile Disc)、磁気ディスク、半導体メモリなどのリムーバブル記録媒体４１１に、一時的あるいは永続的に格納（記録）しておくことができる。このようなリムーバブル記録媒体４１１は、いわゆるパッケージソフトウエアとして提供することができる。
【０３０４】
なお、プログラムは、上述したようなリムーバブル記録媒体４１１からコンピュータにインストールする他、ダウンロードサイトから、ディジタル衛星放送用の人工衛星を介して、コンピュータに無線で転送したり、LAN(Local Area Network)、インターネットといったネットワークを介して、コンピュータに有線で転送し、コンピュータでは、そのようにして転送されてくるプログラムを、通信部４０８で受信し、内蔵するハードディスク４０５にインストールすることができる。
【０３０５】
コンピュータは、CPU(Central Processing Unit)４０２を内蔵している。CPU４０２には、バス４０１を介して、入出力インタフェース４１０が接続されており、CPU４０２は、入出力インタフェース４１０を介して、ユーザによって、キーボードや、マウス、マイク等で構成される入力部４０７が操作等されることにより指令が入力されると、それにしたがって、ROM(Read Only Memory)４０３に格納されているプログラムを実行する。あるいは、また、CPU４０２は、ハードディスク４０５に格納されているプログラム、衛星若しくはネットワークから転送され、通信部４０８で受信されてハードディスク４０５にインストールされたプログラム、またはドライブ４０９に装着されたリムーバブル記録媒体４１１から読み出されてハードディスク４０５にインストールされたプログラムを、RAM(Random Access Memory)４０４にロードして実行する。これにより、CPU４０２は、上述したフローチャートにしたがった処理、あるいは上述したブロック図の構成により行われる処理を行う。そして、CPU４０２は、その処理結果を、必要に応じて、例えば、入出力インタフェース４１０を介して、LCD(Liquid Crystal Display)やスピーカ等で構成される出力部４０６から出力、あるいは、通信部４０８から送信、さらには、ハードディスク４０５に記録等させる。
【０３０６】
ここで、本明細書において、コンピュータに各種の処理を行わせるためのプログラムを記述する処理ステップは、必ずしもフローチャートとして記載された順序に沿って時系列に処理する必要はなく、並列的あるいは個別に実行される処理（例えば、並列処理あるいはオブジェクトによる処理）も含むものである。
【０３０７】
また、プログラムは、１のコンピュータにより処理されるものであっても良いし、複数のコンピュータによって分散処理されるものであっても良い。さらに、プログラムは、遠方のコンピュータに転送されて実行されるものであっても良い。
【０３０８】
なお、本実施の形態では、画像データをＭＰＥＧ２方式で符号化した場合について説明したが、本発明は、ＭＰＥＧ２方式に限定されるものではなく、その他の非可逆圧縮方式で符号化された画像を復号する場合に適用可能である。
【０３０９】
また、本実施の形態では、符号化データに含まれる複数の復号制御情報のうちの１つであるＤＣＴタイプの正しさ（適切さ）を、その複数の復号制御情報のうちの他の１つである動きベクトルに基づいて判定し、その判定結果を表すミスマッチ情報に基づいて、符号化データの復号およびタップ係数の学習を行うようにしたが、その他、符号化データに含まれる複数の復号制御情報のうちのＤＣＴタイプ以外の正しさ（適切さ）を、その複数の復号制御情報のうちの他の１以上に基づいて判定し、その判定結果を表すミスマッチ情報に基づいて、符号化データの復号およびタップ係数の学習を行うようにすることが可能である。
【０３１０】
【発明の効果】
本発明の復号装置および復号方法、並びに第１のプログラムおよび第１の記録媒体によれば、符号化データに含まれるＤＣＴタイプの正しさが、その符号化データに含まれる画像データの動きベクトルに基づいて、ブロック単位の画像データの動きの有無によって判定され、その判定結果を表すミスマッチ情報が出力される。そして、符号化データを復号して得られる低品質な画像よりも高品質な画像の高品質データのうちの、得ようとしている画素単位の高品質データが注目データとされ、注目データを求めるための所定のタップ係数との積和演算に用いる低品質な画像の画素単位の低品質データの幾つかが、予測タップとして抽出され、低品質データに対応する、学習の生徒となる生徒データと、高品質データに対応する、学習の教師となる教師データとを用い、生徒データとタップ係数との積和演算により求められる教師データの予測値の予測誤差を統計的に最小にする学習を行うことにより得られるタップ係数と、予測タップとの積和演算を行うことにより、注目データが求められる。ここで、ミスマッチ情報に基づき、ミスマッチ情報が、ＤＣＴタイプが正しいことを表している場合において、ＤＣＴタイプがフィールドＤＣＴモードであるとき、注目データのフィールドの低品質データから、予測タップが抽出され、ミスマッチ情報が、ＤＣＴタイプが正しいことを表している場合において、ＤＣＴタイプがフレームＤＣＴモードであるとき、注目データのフレームの低品質データから、予測タップが抽出され、ミスマッチ情報が、ＤＣＴタイプが正しくないことを表している場合、注目データのフィールドとフレームの両方の低品質データから、予測タップが抽出される。従って、符号化データを、高画質の画像データに復号することが可能となる。
【０３１１】
本発明の学習装置および学習方法、並びに第２のプログラムおよび第２の記録媒体によれば、学習用の画像データから、タップ係数の学習の教師となる教師データが生成されるとともに、生徒となる生徒データが生成される。また、学習用の画像データが符号化され、ＤＣＴタイプおよび画像データの動きベクトルを含む学習用の符号化データが出力される。そして、学習用の符号化データに含まれるＤＣＴタイプの正しさが、その学習用の符号化データに含まれる画像データの動きベクトルに基づいて、ブロック単位の画像データの動きの有無によって判定され、その判定結果を表すミスマッチ情報が出力される。さらに、符号化データを復号して得られる低品質な画像よりも高品質な画像の高品質データのうちの、得ようとしている画素単位の高品質データが注目データとされ、注目データを求めるための所定のタップ係数との積和演算に用いる低品質な画像の画素単位の低品質データの幾つかが、予測タップとして抽出され、低品質データに対応する生徒データと、高品質データに対応する教師データとを用い、生徒データとタップ係数との積和演算により求められる教師データの予測値の予測誤差が統計的に最小になるタップ係数が求められ、タップ係数と、予測タップとの積和演算を行うことにより、注目データが求められる。ここで、ミスマッチ情報に基づき、ミスマッチ情報が、ＤＣＴタイプが正しいことを表している場合において、ＤＣＴタイプがフィールドＤＣＴモードであるとき、注目データのフィールドの低品質データから、予測タップが抽出され、ミスマッチ情報が、ＤＣＴタイプが正しいことを表している場合において、ＤＣＴタイプがフレームＤＣＴモードであるとき、注目データのフレームの低品質データから、予測タップが抽出され、ミスマッチ情報が、ＤＣＴタイプが正しくないことを表している場合、注目データのフィールドとフレームの両方の低品質データから、予測タップが抽出される。従って、そのタップ係数により、符号化データを、高画質の画像データに復号することが可能となる。
【図面の簡単な説明】
【図１】本発明を適用した復号装置の一実施の形態の構成例を示すブロック図である。
【図２】復号装置の処理を説明するフローチャートである。
【図３】本発明を適用した復号装置の他の一実施の形態の構成例を示すブロック図である。
【図４】本発明を適用した学習装置の一実施の形態の構成例を示すブロック図である。
【図５】学習装置の処理を説明するフローチャートである。
【図６】本発明を適用した復号装置のより詳細な構成例を示すブロック図である。
【図７】フレームＤＣＴモードとフィールドＤＣＴモードを説明する図である。
【図８】動き物体が表示されたマクロブロックを、フレームＤＣＴモードとフィールドＤＣＴモードで符号化した場合の復号画像を模式的に示す図である。
【図９】クラス分類適応処理部３２の構成例を示すブロック図である。
【図１０】復号装置の処理を説明するフローチャートである。
【図１１】本発明を適用した学習装置のより詳細な構成例を示すブロック図である。
【図１２】学習装置の処理を説明するフローチャートである。
【図１３】ＭＰＥＧ方式で符号化された符号化データを復号する復号装置の第１の構成例を示すブロック図である。
【図１４】ＭＰＥＧデコーダ１１６の構成例を示すブロック図である。
【図１５】ミスマッチ情報生成部１１５の処理を説明するフローチャートである。
【図１６】動き物体が表示されたマクロブロックを、フレームＤＣＴモードとフィールドＤＣＴモードで符号化した場合の復号画像を模式的に示す図である。
【図１７】タップ構造設定テーブルを示す図である。
【図１８】パターンＡ乃至Ｄのタップ構造を示す図である。
【図１９】ＭＰＥＧ方式で符号化された符号化データを復号するのに用いられるタップ係数を学習する学習装置の第１の構成例を示すブロック図である。
【図２０】ＭＰＥＧエンコーダ１３１の構成例を示すブロック図である。
【図２１】ＭＰＥＧ方式で符号化された符号化データを復号する復号装置の第２の構成例を示すブロック図である。
【図２２】ＭＰＥＧ方式で符号化された符号化データを復号するのに用いられるタップ係数を学習する学習装置の第２の構成例を示すブロック図である。
【図２３】ＭＰＥＧ方式で符号化された符号化データを復号する復号装置の第３の構成例を示すブロック図である。
【図２４】ＭＰＥＧ方式で符号化された符号化データを復号するのに用いられるタップ係数を学習する学習装置の第３の構成例を示すブロック図である。
【図２５】本発明を適用したコンピュータの一実施の形態の構成例を示すブロック図である。
【符号の説明】
１ミスマッチ検出部，２復号処理部，３パラメータ記憶部，１１学習用データ記憶部，１２符号化部，１３ミスマッチ検出部，１４学習処理部，２１復号制御情報抽出部，２２判定部，３１前処理部，３２クラス分類適応処理部，３３後処理部，４１係数メモリ，５１，５２タップ抽出部，５３クラス分類部，５４予測部，６０適応学習部，６１教師データ生成部，６１Ａ逆後処理部，６２教師データ記憶部，６３生徒データ生成部，６３Ａ符号化部，６３Ｂ前処理部，６４生徒データ記憶部，６５，６６タップ抽出部，６７クラス分類部，６８足し込み部，６９タップ係数算出部，７１復号制御情報抽出部，７２判定部，１１１逆ＶＬＣ部，１１２フィールド／フレーム判定部，１１３イントラ／ノンイントラ判定部，１１４静動判定部，１１５ミスマッチ情報生成部，１１６ＭＰＥＧデコーダ，１２１逆ＶＬＣ部，１２２逆量子化部，１２３逆ＤＣＴ変換部，１２４演算部，１２５動き補償部，１２６メモリ，１２７ピクチャ選択部，１３１ＭＰＥＧエンコーダ，１３２逆ＶＬＣ部，１３３フィールド／フレーム判定部，１３４イントラ／ノンイントラ判定部，１３５静動判定部，１３６ミスマッチ情報生成部，１３７ＭＰＥＧエンコーダ，１３８ＭＰＥＧデコーダ，１４１動きベクトル検出部，１４２動き補償部，１４３演算部，１４４ＤＣＴ変換部，１４５量子化部，１４６ＶＬＣ部，１４７逆量子化部，１４８逆ＤＣＴ変換部，１４９演算部，１５０メモリ，１６１逆ＶＬＣ部，１６２逆量子化部，１６３演算部，１６４ＭＰＥＧデコーダ，１６５メモリ，１６６動き補償部，１６７ＤＣＴ変換部，１７１逆ＶＬＣ部，１７２逆量子化部，１７３演算部，１７４ＭＰＥＧデコーダ，１７５メモリ，１７６動き補償部，１７７ＤＣＴ変換部，１８１逆ＤＣＴ変換部，１９１ＤＣＴ変換部，４０１バス，４０２ CPU，４０３ ROM，４０４ RAM，４０５ハードディスク，４０６出力部，４０７入力部，４０８通信部，４０９ドライブ，４１０入出力インタフェース，４１１リムーバブル記録媒体
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a decoding device and a decoding method, a learning device and a learning method, a program, and a recording medium, and in particular to a decoding device and decoding method, a learning device and learning method, a program, and a recording medium that make it possible to decode encoded data, obtained for example by encoding image data, into a high-quality (high picture quality) image.
[0002]
[Prior art]
For example, the MPEG (Moving Picture Experts Group) system is known as a high-efficiency encoding system for image (moving image) data. In the MPEG system, image data is subjected to a DCT (Discrete Cosine Transform) in two directions, horizontal and vertical, in units of blocks of 8 × 8 pixels (horizontal × vertical), and is then quantized.
[0003]
As described above, in the MPEG system, the image data is DCT transformed. In the MPEG2 system, for example, the DCT type of the blocks to be DCT transformed can be switched between the frame DCT mode and the field DCT mode in units of macroblocks. In the frame DCT mode, a block is composed of pixels of the same frame, and the pixel values of such a block are DCT transformed. In the field DCT mode, a block is composed of pixels of the same field, and the pixel values of such a block are DCT transformed.
[0004]
Whether the DCT type is set to the frame DCT mode or the field DCT mode is basically determined, based on image characteristics such as the motion of the image and continuity with the surrounding macroblocks, so as to reduce block distortion, mosquito noise, and the like in the decoded image. That is, for example, the field DCT mode is selected for an image with large motion, and the frame DCT mode is selected for an image with almost no motion (a still image).
[0005]
[Problems to be solved by the invention]
In the MPEG2 system, the data rate of the encoded data is limited so that overflow and underflow do not occur on the decoder side. To limit this data rate, a DCT type that should properly be set to the frame DCT mode or the field DCT mode may instead be set, inappropriately so to speak, to the field DCT mode or the frame DCT mode, respectively.
[0006]
That is, in general, the field DCT mode is set as the DCT type if the correlation between the pixels constituting a field (for example, the reciprocal of the sum of squared differences between adjacent pixels constituting the field) (hereinafter referred to as field pixel correlation as appropriate) is greater than the correlation between the pixels constituting a frame (for example, the reciprocal of the sum of squared differences between adjacent pixels constituting the frame) (hereinafter referred to as frame pixel correlation as appropriate), and the frame DCT mode is set if the frame pixel correlation is greater than the field pixel correlation.
[0007]
However, if the encoded data is subject to data rate limitations, the DCT type is set based on the limited data rate regardless of the magnitude of the field pixel correlation and frame pixel correlation, and thus, for example, An improper DCT type may be set such that the frame DCT mode is set instead of the field DCT mode for an image with large motion.
[0008]
Even when such an inappropriate DCT type is set, the decoder side must decode the encoded data in accordance with that inappropriate DCT type, and there has been a problem in that the image quality of the decoded image deteriorates.
[0009]
Also, when a moving image is MPEG2 encoded at a high compression rate, the data rate limitation may cause different DCT types to be set for a macroblock in one frame and the corresponding macroblock in the next frame even though the same moving object is displayed in both, and as a result a decoded image with unnatural motion may be obtained.
[0010]
On the other hand, on the decoding side, it is difficult to determine which one of the frame DCT mode and the field DCT mode is appropriate from the decoded image.
[0011]
The present invention has been made in view of such a situation, and makes it possible to decode encoded data into a high-quality (high picture quality) image.
[0012]
[Means for Solving the Problems]
The decoding device of the present invention comprises: determination means for determining the correctness of the DCT type included in the encoded data according to the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the encoded data, and outputting mismatch information representing the determination result; and decoding means having prediction tap extraction means and prediction calculation means. The prediction tap extraction means takes, as data of interest, the pixel-unit high-quality data to be obtained out of the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts, as a prediction tap, some of the pixel-unit low-quality data of the low-quality image used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The prediction calculation means obtains the data of interest by performing a product-sum operation of the prediction tap and tap coefficients obtained by learning that uses student data corresponding to the low-quality data and serving as the student of the learning, and teacher data corresponding to the high-quality data and serving as the teacher of the learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. Based on the mismatch information, the prediction tap extraction means extracts the prediction tap from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, extracts the prediction tap from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and extracts the prediction tap from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is incorrect.
[0013]
The decoding method of the present invention comprises: a determination step of determining the correctness of the DCT type included in the encoded data according to the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the encoded data, and outputting mismatch information representing the determination result; and a decoding step having a prediction tap extraction step and a prediction calculation step. The prediction tap extraction step takes, as data of interest, the pixel-unit high-quality data to be obtained out of the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracts, as a prediction tap, some of the pixel-unit low-quality data of the low-quality image used in a product-sum operation with predetermined tap coefficients for obtaining the data of interest. The prediction calculation step obtains the data of interest by performing a product-sum operation of the prediction tap and tap coefficients obtained by learning that uses student data corresponding to the low-quality data and serving as the student of the learning, and teacher data corresponding to the high-quality data and serving as the teacher of the learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction tap is extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is incorrect.
[0014]
The first program of the present invention comprises: a determination step of determining the correctness of the DCT type included in the encoded data from the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the encoded data, and outputting mismatch information representing the determination result; a prediction tap extraction step of taking, as data of interest, the high-quality data of each pixel to be obtained, among the high-quality data of a high-quality image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as prediction taps, some of the per-pixel low-quality data of the low-quality image for use in a product-sum operation with predetermined tap coefficients to obtain the data of interest; and a decoding step having a prediction calculation step of obtaining the data of interest by performing a product-sum operation of the prediction taps and tap coefficients, the tap coefficients having been obtained by learning that uses student data, which corresponds to the low-quality data and serves as the student in learning, and teacher data, which corresponds to the high-quality data and serves as the teacher in learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction taps are extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.
[0015]
The first recording medium of the present invention has recorded thereon a program comprising: a determination step of determining the correctness of the DCT type included in the encoded data from the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the encoded data, and outputting mismatch information representing the determination result; a prediction tap extraction step of taking, as data of interest, the high-quality data of each pixel to be obtained, among the high-quality data of a high-quality image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as prediction taps, some of the per-pixel low-quality data of the low-quality image for use in a product-sum operation with predetermined tap coefficients to obtain the data of interest; and a decoding step having a prediction calculation step of obtaining the data of interest by performing a product-sum operation of the prediction taps and tap coefficients, the tap coefficients having been obtained by learning that uses student data, which corresponds to the low-quality data and serves as the student in learning, and teacher data, which corresponds to the high-quality data and serves as the teacher in learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. In the prediction tap extraction step, based on the mismatch information, the prediction taps are extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.
[0016]
The learning device of the present invention comprises: teacher data generation means for generating, from learning image data, and outputting teacher data that serves as the teacher in learning of tap coefficients; student data generation means for generating, from the learning image data, and outputting student data that serves as the student in learning of the tap coefficients; encoding means for encoding the learning image data and outputting learning encoded data including a DCT type and a motion vector of the image data; determination means for determining the correctness of the DCT type included in the learning encoded data from the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the learning encoded data, and for outputting mismatch information representing the determination result; prediction tap extraction means for taking, as data of interest, the high-quality data of each pixel to be obtained, among the high-quality data of a high-quality image of higher quality than the low-quality image obtained by decoding the encoded data, and for extracting, as prediction taps, some of the per-pixel low-quality data of the low-quality image for use in a product-sum operation with predetermined tap coefficients to obtain the data of interest; learning means having tap coefficient calculation means for obtaining, using the student data corresponding to the low-quality data and the teacher data corresponding to the high-quality data, tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients; and decoding means having prediction calculation means for obtaining the data of interest by performing a product-sum operation of the tap coefficients and the prediction taps. Based on the mismatch information, the prediction tap extraction means extracts the prediction taps from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.
[0017]
The learning method of the present invention comprises: a teacher data generation step of generating, from learning image data, and outputting teacher data that serves as the teacher in learning of tap coefficients; a student data generation step of generating, from the learning image data, and outputting student data that serves as the student in learning of the tap coefficients; an encoding step of encoding the learning image data and outputting learning encoded data including a DCT type and a motion vector of the image data; a determination step of determining the correctness of the DCT type included in the learning encoded data from the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the learning encoded data, and outputting mismatch information representing the determination result; a prediction tap extraction step of taking, as data of interest, the high-quality data of each pixel to be obtained, among the high-quality data of a high-quality image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as prediction taps, some of the per-pixel low-quality data of the low-quality image for use in a product-sum operation with predetermined tap coefficients to obtain the data of interest; a learning step having a tap coefficient calculation step of obtaining, using the student data corresponding to the low-quality data and the teacher data corresponding to the high-quality data, tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients; and a decoding step having a prediction calculation step of obtaining the data of interest by performing a product-sum operation of the tap coefficients and the prediction taps. In the prediction tap extraction step, based on the mismatch information, the prediction taps are extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.
[0018]
The second program of the present invention comprises: a teacher data generation step of generating, from learning image data, and outputting teacher data that serves as the teacher in learning of tap coefficients; a student data generation step of generating, from the learning image data, and outputting student data that serves as the student in learning of the tap coefficients; an encoding step of encoding the learning image data and outputting learning encoded data including a DCT type and a motion vector of the image data; a determination step of determining the correctness of the DCT type included in the learning encoded data from the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the learning encoded data, and outputting mismatch information representing the determination result; a prediction tap extraction step of taking, as data of interest, the high-quality data of each pixel to be obtained, among the high-quality data of a high-quality image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as prediction taps, some of the per-pixel low-quality data of the low-quality image for use in a product-sum operation with predetermined tap coefficients to obtain the data of interest; a learning step having a tap coefficient calculation step of obtaining, using the student data corresponding to the low-quality data and the teacher data corresponding to the high-quality data, tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients; and a decoding step having a prediction calculation step of obtaining the data of interest by performing a product-sum operation of the tap coefficients and the prediction taps. In the prediction tap extraction step, based on the mismatch information, the prediction taps are extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.
[0019]
The second recording medium of the present invention has recorded thereon a program comprising: a teacher data generation step of generating, from learning image data, and outputting teacher data that serves as the teacher in learning of tap coefficients; a student data generation step of generating, from the learning image data, and outputting student data that serves as the student in learning of the tap coefficients; an encoding step of encoding the learning image data and outputting learning encoded data including a DCT type and a motion vector of the image data; a determination step of determining the correctness of the DCT type included in the learning encoded data from the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the learning encoded data, and outputting mismatch information representing the determination result; a prediction tap extraction step of taking, as data of interest, the high-quality data of each pixel to be obtained, among the high-quality data of a high-quality image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as prediction taps, some of the per-pixel low-quality data of the low-quality image for use in a product-sum operation with predetermined tap coefficients to obtain the data of interest; a learning step having a tap coefficient calculation step of obtaining, using the student data corresponding to the low-quality data and the teacher data corresponding to the high-quality data, tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients; and a decoding step having a prediction calculation step of obtaining the data of interest by performing a product-sum operation of the tap coefficients and the prediction taps. In the prediction tap extraction step, based on the mismatch information, the prediction taps are extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.
[0020]
In the decoding device and decoding method, and the first program and first recording medium, of the present invention, the correctness of the DCT type included in the encoded data is determined from the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the encoded data, and mismatch information representing the determination result is output. Among the high-quality data of a high-quality image of higher quality than the low-quality image obtained by decoding the encoded data, the high-quality data of each pixel to be obtained is taken as data of interest, and some of the per-pixel low-quality data of the low-quality image, for use in a product-sum operation with predetermined tap coefficients to obtain the data of interest, is extracted as prediction taps. The data of interest is then obtained by performing a product-sum operation of the prediction taps and tap coefficients, the tap coefficients having been obtained by learning that uses student data, which corresponds to the low-quality data and serves as the student in learning, and teacher data, which corresponds to the high-quality data and serves as the teacher in learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients. Here, based on the mismatch information, the prediction taps are extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.
[0021]
In the learning device and learning method, and the second program and second recording medium, of the present invention, teacher data that serves as the teacher in learning of tap coefficients is generated from learning image data, and student data that serves as the student is also generated. Further, the learning image data is encoded, and learning encoded data including a DCT type and a motion vector of the image data is output. Then, the correctness of the DCT type included in the learning encoded data is determined from the presence or absence of motion of the image data in block units, based on the motion vector of the image data included in the learning encoded data, and mismatch information representing the determination result is output. Among the high-quality data of a high-quality image of higher quality than the low-quality image obtained by decoding the encoded data, the high-quality data of each pixel to be obtained is taken as data of interest, and some of the per-pixel low-quality data of the low-quality image, for use in a product-sum operation with predetermined tap coefficients to obtain the data of interest, is extracted as prediction taps. Using the student data corresponding to the low-quality data and the teacher data corresponding to the high-quality data, tap coefficients are obtained that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients, and the data of interest is obtained by performing a product-sum operation of the tap coefficients and the prediction taps.
Here, based on the mismatch information, the prediction taps are extracted from the low-quality data of the field of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, from the low-quality data of the frame of the data of interest when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, and from the low-quality data of both the field and the frame of the data of interest when the mismatch information indicates that the DCT type is not correct.
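The three-way choice of prediction taps and the product-sum operation described above can be illustrated with a minimal sketch. The tap shapes used here (three vertically adjacent lines for the frame, three same-parity lines for the field), the string values "field" and "frame", and all function names are assumptions made for illustration only; the text does not specify the actual tap structure.

```python
from typing import List, Sequence

Image = List[List[float]]  # rows of pixel values of the low-quality (decoded) image


def select_tap_rows(y: int, height: int, dct_type: str, mismatch: bool) -> List[int]:
    """Pick the rows that contribute prediction taps for a pixel on row y.

    Hypothetical tap shapes: frame taps use adjacent lines of the frame,
    field taps use lines of the same field (same line parity).
    """
    frame_rows = [y - 1, y, y + 1]
    field_rows = [y - 2, y, y + 2]
    if mismatch:
        # DCT type judged not correct: take taps from both field and frame.
        rows = sorted(set(frame_rows + field_rows))
    elif dct_type == "field":
        rows = field_rows
    elif dct_type == "frame":
        rows = frame_rows
    else:
        raise ValueError("dct_type must be 'field' or 'frame'")
    # Rows outside the image are simply dropped in this sketch.
    return [r for r in rows if 0 <= r < height]


def extract_prediction_taps(img: Image, x: int, y: int,
                            dct_type: str, mismatch: bool) -> List[float]:
    """Extract low-quality pixel values as prediction taps for pixel (x, y)."""
    return [img[r][x] for r in select_tap_rows(y, len(img), dct_type, mismatch)]


def predict(taps: Sequence[float], coeffs: Sequence[float]) -> float:
    """Product-sum operation of prediction taps and tap coefficients."""
    if len(taps) != len(coeffs):
        raise ValueError("number of taps and tap coefficients must match")
    return sum(w * t for w, t in zip(coeffs, taps))
```

With a single-column image whose rows hold 0, 10, 20, 30, 40, 50, the pixel on row 2 gets taps from rows 0, 2, 4 in the field case, rows 1, 2, 3 in the frame case, and rows 0 to 4 when a mismatch is reported.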
[0022]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows a configuration example of an embodiment of a decoding device to which the present invention is applied.
[0023]
The decoding device receives, as the decoding target, encoded data reproduced from a recording medium (not shown) (for example, an optical disk, a magneto-optical disk, a phase change disk, a magnetic tape, or a semiconductor memory) or encoded data transmitted via a transmission medium (for example, the Internet, a CATV network, a satellite line, or terrestrial waves). Here, the encoded data is obtained by encoding image (moving image) data by a predetermined encoding method, and includes at least decoding control information for controlling the decoding.
[0024]
As the encoded data, for example, image data encoded by the MPEG2 method can be employed.
[0025]
Here, in the MPEG2 system, on the encoding side, image data (original image) is DCT transformed in units of blocks and further quantized. Also on the encoding side, a motion vector is detected for the image data to be encoded, the encoded data is locally decoded, and a predicted image is generated by performing motion compensation on the locally decoded image data, used as a reference image, with the detected motion vector. Then, a residual image is obtained by calculating the difference between the image to be encoded and the predicted image, and the residual image is DCT transformed and quantized as described above. Further, on the encoding side, when DCT conversion is performed in units of blocks, a DCT type (frame DCT mode or field DCT mode) is set in units of macroblocks.
[0026]
On the other hand, if the DCT coefficients obtained by DCT transforming and then quantizing the image data (original image or residual image) are called quantized DCT coefficients, then on the decoding side the quantized DCT coefficients are dequantized into DCT coefficients. Further, on the decoding side, the DCT coefficients are subjected to inverse DCT transform, and the resulting pixels are rearranged into a frame structure according to the DCT type, whereby the image data is decoded or residual image data is obtained. For residual image data, predicted image data is generated by performing motion compensation, using the motion vector, on already decoded image data used as a reference image. Then, the image data is decoded by adding the residual image data and the predicted image data.
[0027]
Therefore, the encoded data obtained by encoding image data by the MPEG2 system includes, in addition to the quantized DCT coefficients obtained by DCT transforming and quantizing the image data (original image or residual image), that is, the direct encoding result of the image data, the information needed on the decoding side to decode the DCT coefficients into an image, that is, information for controlling decoding such as the motion vector and the DCT type (hereinafter referred to as decoding control information as appropriate). Besides the motion vector and the DCT type, the encoded data includes the picture type, the temporal reference, and other decoding control information.
[0028]
The encoded data input to the decoding device is supplied to the mismatch detection unit 1 and the decoding processing unit 2.
[0029]
The mismatch detection unit 1 detects mismatch information from the encoded data. That is, the mismatch detection unit 1 determines the correctness of the decoding control information included in the encoded data, and outputs mismatch information indicating the determination result to the decoding processing unit 2. The decoding processing unit 2 decodes the encoded data based on the mismatch information supplied from the mismatch detection unit 1, and outputs decoded data obtained as a result.
[0030]
Next, processing (decoding processing) of the decoding device in FIG. 1 will be described with reference to the flowchart in FIG.
[0031]
Encoded data is supplied to the mismatch detection unit 1 and the decoding processing unit 2. First, in step S1, the mismatch detection unit 1 detects mismatch information from the encoded data and supplies it to the decoding processing unit 2, and the process proceeds to step S2. In step S2, the decoding processing unit 2 decodes the encoded data from which the mismatch information was detected, based on the mismatch information supplied from the mismatch detection unit 1, outputs the decoded image data, and the process proceeds to step S3. In step S3, the mismatch detection unit 1 or the decoding processing unit 2 determines whether there is still encoded data to be decoded. If it is determined in step S3 that encoded data to be decoded still exists, the process returns to step S1, and the same processing is repeated thereafter.
[0032]
If it is determined in step S3 that there is no encoded data to be decoded, the process ends.
[0033]
Next, FIG. 3 shows a configuration example of another embodiment of a decoding device to which the present invention is applied. In the figure, portions corresponding to those in FIG. 1 are denoted by the same reference numerals, and description thereof will be omitted below as appropriate. That is, the decoding apparatus in FIG. 3 is basically configured in the same manner as the decoding apparatus in FIG. 1 except that the parameter storage unit 3 is newly provided.
[0034]
The parameter storage unit 3 stores parameters obtained through learning by a learning device, which will be described later, and the decoding processing unit 2 decodes the encoded data supplied to it using the parameters stored in the parameter storage unit 3.
[0035]
Therefore, the decoding device of FIG. 3 performs the same processing as the decoding device of FIG. 1, except that the decoding processing unit 2 decodes the encoded data using the parameters stored in the parameter storage unit 3, and the description of that processing is omitted.
[0036]
Next, FIG. 4 shows a configuration example of an embodiment of a learning device that learns parameters to be stored in the parameter storage unit 3 of FIG.
[0037]
The learning data storage unit 11 stores learning data which is image (moving image) data used for parameter learning.
[0038]
The encoding unit 12 reads the learning data stored in the learning data storage unit 11 and encodes it with the same encoding method as that used for the encoded data to be decoded by the decoding device in FIG. Encoded data obtained by encoding the learning data (hereinafter referred to as learning encoded data as appropriate) is supplied from the encoding unit 12 to the mismatch detection unit 13.
[0039]
The mismatch detection unit 13 is configured in the same manner as the mismatch detection unit 1 in FIG. 3, detects mismatch information from the encoded data supplied from the encoding unit 12, and supplies the mismatch information to the learning processing unit 14.
[0040]
The learning processing unit 14 reads out the learning data stored in the learning data storage unit 11 and generates, from the learning data, teacher data serving as the teacher and student data serving as the student in learning the parameters. Further, the learning processing unit 14 learns parameters for converting student data into teacher data using the generated teacher data and student data, based on the mismatch information supplied from the mismatch detection unit 13.
[0041]
Next, processing (learning processing) of the learning device in FIG. 4 will be described with reference to the flowchart in FIG.
[0042]
First, in step S11, the encoding unit 12 reads and encodes the learning data stored in the learning data storage unit 11, and sends the learning encoded data obtained as a result to the mismatch detection unit 13. Then, the process proceeds to step S12. In step S12, the mismatch detection unit 13 detects mismatch information from the encoded data supplied from the encoding unit 12, supplies the mismatch information to the learning processing unit 14, and proceeds to step S13.
[0043]
In step S13, the learning processing unit 14 reads the learning data from the learning data storage unit 11, and generates teacher data and student data from the learning data. Further, the learning processing unit 14 learns parameters using the generated teacher data and student data based on the mismatch information supplied from the mismatch detection unit 13.
[0044]
That is, based on the mismatch information, the learning processing unit 14 performs processing (learning) for calculating the optimum parameters that allow the corresponding teacher data to be obtained from the student data.
[0045]
In step S14, the encoding unit 12 or the learning processing unit 14 determines whether learning data that has not yet been processed is stored in the learning data storage unit 11. If it is determined in step S14 that learning data that has not yet been processed is stored in the learning data storage unit 11, the process returns to step S11, and the learning data that has not yet been processed is targeted. Thereafter, the same processing is repeated.
[0046]
If it is determined in step S14 that no unprocessed learning data remains in the learning data storage unit 11, that is, when learning has been performed using all the learning data stored in the learning data storage unit 11, the process proceeds to step S15, where the learning processing unit 14 calculates the parameters based on the learning result of step S13, and the process ends.
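The split between accumulating a learning result per piece of learning data (step S13) and calculating the parameters once at the end (step S15) can be sketched as follows. This sketch assumes that "statistically minimizing the prediction error" means ordinary least squares solved through normal equations, which this passage does not state; the class name and its methods are hypothetical, and class classification and the mismatch-dependent tap extraction are left out.

```python
from typing import List, Sequence


class TapCoefficientLearner:
    """Accumulates least-squares normal equations, then solves for tap coefficients."""

    def __init__(self, n_taps: int) -> None:
        self.n = n_taps
        self.a = [[0.0] * n_taps for _ in range(n_taps)]  # sum of x * x^T
        self.b = [0.0] * n_taps                           # sum of x * y

    def add_sample(self, student_taps: Sequence[float], teacher_value: float) -> None:
        """Accumulate one (student taps, teacher pixel) pair, as in step S13."""
        if len(student_taps) != self.n:
            raise ValueError("unexpected number of taps")
        for i in range(self.n):
            self.b[i] += student_taps[i] * teacher_value
            for j in range(self.n):
                self.a[i][j] += student_taps[i] * student_taps[j]

    def solve(self) -> List[float]:
        """Solve the accumulated equations by Gauss-Jordan elimination, as in step S15."""
        n = self.n
        m = [row[:] + [rhs] for row, rhs in zip(self.a, self.b)]  # augmented matrix
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
            if abs(m[pivot][col]) < 1e-12:
                raise ValueError("not enough independent learning samples")
            m[col], m[pivot] = m[pivot], m[col]
            for r in range(n):
                if r != col:
                    f = m[r][col] / m[col][col]
                    for c in range(col, n + 1):
                        m[r][c] -= f * m[col][c]
        return [m[i][n] / m[i][i] for i in range(n)]
```

Keeping only the running sums means the learning data can be processed one item at a time, matching the loop over steps S11 to S14, with the actual coefficients computed once after the loop.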
[0047]
Next, FIG. 6 shows a detailed configuration example of the decoding device of FIG.
[0048]
The decoding control information extraction unit 21 is supplied with encoded data obtained by encoding image data, for example, by the MPEG2 system, as the decoding target. The decoding control information extraction unit 21 extracts a plurality of (plural types of) decoding control information included in the encoded data, that is, in the present embodiment, for example, the DCT type, the picture type, and the motion vector, and supplies them to the determination unit 22.
[0049]
The determination unit 22 determines the correctness of one (one type) of the plurality of decoding control information supplied from the decoding control information extraction unit 21, based on the other (other types of) decoding control information. Then, the determination unit 22 outputs mismatch information, as the determination result of the correctness of that one decoding control information, to the decoding processing unit 2.
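As a minimal sketch of such a determination, the DCT type can be checked against the motion vector, following the rule stated earlier that images with motion call for the field DCT mode and still images for the frame DCT mode. The threshold, the use of the sum of absolute vector components as the motion measure, and the function names are assumptions for illustration; how the picture type enters the determination is not modeled here.

```python
from typing import Tuple


def has_motion(motion_vector: Tuple[int, int], threshold: int = 0) -> bool:
    """Judge presence of motion for a block from its motion vector.

    The magnitude measure and the threshold are hypothetical choices.
    """
    mv_x, mv_y = motion_vector
    return abs(mv_x) + abs(mv_y) > threshold


def detect_mismatch(dct_type: str, motion_vector: Tuple[int, int],
                    threshold: int = 0) -> bool:
    """Return True when the DCT type is judged not correct for the block."""
    if dct_type not in ("field", "frame"):
        raise ValueError("dct_type must be 'field' or 'frame'")
    expected = "field" if has_motion(motion_vector, threshold) else "frame"
    return dct_type != expected
```

For example, a block with a non-zero motion vector that was coded in the frame DCT mode would be flagged as a mismatch, while the same block coded in the field DCT mode would not.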
[0050]
The decoding control information extraction unit 21 and the determination unit 22 described above constitute the mismatch detection unit 1 in FIG.
[0051]
The preprocessing unit 31 is supplied with the encoded data to be decoded. The preprocessing unit 31 performs predetermined preprocessing on the encoded data and supplies the resulting preprocessed data to the class classification adaptive processing unit 32.
[0052]
The class classification adaptive processing unit 32 configures prediction taps and class taps, which will be described later, from the preprocessed data supplied from the preprocessing unit 31, and performs class classification adaptive processing, described later, using the parameters stored in the coefficient memory 41. Then, the class classification adaptive processing unit 32 outputs data obtained by performing the class classification adaptive processing (hereinafter referred to as adaptive processing data as appropriate) to the post-processing unit 33.
[0053]
The class classification adaptive processing unit 32 is also supplied with the mismatch information output from the determination unit 22 of the mismatch detection unit 1, and performs the class classification adaptive processing based on that mismatch information.
[0054]
The post-processing unit 33 performs predetermined post-processing on the data output from the class classification adaptation processing unit 32, thereby decoding the encoded data into high-quality image data and outputting it.
[0055]
The above preprocessing unit 31, class classification adaptation processing unit 32, and postprocessing unit 33 constitute the decoding processing unit 2 in FIG.
[0056]
The coefficient memory 41 stores a tap coefficient for each class, which will be described later, used by the class classification adaptation processing unit 32 to perform the class classification adaptation process.
[0057]
The coefficient memory 41 constitutes the parameter storage unit 3 shown in FIG.
[0058]
Next, processing of the mismatch detection unit 1 in FIG. 6 will be described with reference to FIGS. 7 and 8.
[0059]
FIG. 7 shows a block (FIG. 7A) subjected to DCT conversion in the frame DCT mode and a block (FIG. 7B) subjected to DCT conversion in the field DCT mode in the MPEG2 system.
[0060]
In the embodiment of FIG. 7, a block of luminance signals is shown. In FIG. 7 (the same applies to FIG. 8 described later), a shaded line represents an odd line (top field), and an unshaded line represents an even line (bottom field).
[0061]
In the frame DCT mode, a macroblock composed of 16 × 16 pixels in the horizontal and vertical directions is divided into four 8 × 8 pixel blocks, at the upper left, lower left, upper right, and lower right, as shown in FIG. Each block is DCT transformed.
[0062]
On the other hand, in the field DCT mode, as shown in FIG. 7B, the pixel positions in the macroblock are rearranged so that the upper 8 lines consist of the odd lines (top field) and the lower 8 lines consist of the even lines (bottom field). Then, the rearranged macroblock is divided into four 8 × 8 pixel blocks, at the upper left, lower left, upper right, and lower right, and each block is DCT transformed.
[0063]
As described above, in the frame DCT mode, DCT conversion is performed in units of blocks of 8 × 8 pixels constituting the same frame, and in the field DCT mode, DCT conversion is performed in units of blocks of 8 × 8 pixels constituting the same field.
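As an illustration of the two arrangements described above, the following Python sketch splits a 16 × 16 macroblock into four 8 × 8 blocks in either mode. The function name `split_macroblock`, the list-of-rows representation, and the convention that row index 0 is the first (odd, top-field) line are assumptions made for this sketch and are not taken from the embodiment.

```python
def split_macroblock(mb, field_mode):
    """Split a 16x16 macroblock (a list of 16 rows) into four 8x8 blocks.

    field_mode=False: frame DCT mode, rows are used in display order.
    field_mode=True:  field DCT mode, rows are first rearranged so that the
                      top-field lines fill the upper 8 rows and the
                      bottom-field lines fill the lower 8 rows.
    Returned order: upper left, upper right, lower left, lower right.
    """
    if len(mb) != 16 or any(len(row) != 16 for row in mb):
        raise ValueError("expected a 16x16 macroblock")
    rows = mb
    if field_mode:
        # Assumption: indices 0, 2, 4, ... are the odd lines (top field)
        # and indices 1, 3, 5, ... are the even lines (bottom field).
        rows = mb[0::2] + mb[1::2]
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            blocks.append([r[left:left + 8] for r in rows[top:top + 8]])
    return blocks


# Each pixel holds its own line index, so the line order is easy to inspect.
mb = [[y] * 16 for y in range(16)]
frame_blocks = split_macroblock(mb, field_mode=False)
field_blocks = split_macroblock(mb, field_mode=True)
```

In the frame arrangement the upper left block holds lines 0 to 7, whereas in the field arrangement it holds only top-field lines (0, 2, ..., 14), which is why the field arrangement keeps pixels of the same field together.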
[0064]
Consider, for example, an image in which a circular moving object moves in the horizontal direction. As shown in FIG. 8, the moving object is displayed at slightly shifted positions on the odd lines and the even lines in accordance with its motion. For an image in which such a moving object is displayed, the field pixel correlation is therefore larger than the frame pixel correlation, and a decoded image with smooth motion can be obtained by performing DCT conversion in the field DCT mode.
[0065]
However, in the MPEG system, as described above, even for an image in which a moving object is displayed, the image data may be DCT transformed in the frame DCT mode instead of the field DCT mode in order to reduce the amount of encoded data because of the data rate limitation.
[0066]
Suppose that DCT conversion is performed with the frame DCT mode set for some of the macroblocks in which the circular moving object is displayed and the field DCT mode set for the other macroblocks. In that case, for a macroblock in which the frame DCT mode is set, a decoded image in which the edge portion of the circular moving object is blurred is obtained, as illustrated in FIG. 8B, for example.
[0067]
Here, FIG. 8B shows the decoded image obtained when the DCT type of the upper right macroblock among the 2 × 2 macroblocks is set to the frame DCT mode and the DCT types of the other three macroblocks are set to the field DCT mode.
[0068]
Whether the DCT type is the frame DCT mode or the field DCT mode is set in units of macroblocks. Therefore, even corresponding macroblocks (macroblocks at the same position) in different frames may have different DCT types. When the DCT type of a macroblock at a position where a moving object is displayed changes from frame to frame, the motion of the moving object in the decoded image becomes unnatural.
[0069]
Such blurring and unnatural motion in the decoded image arise because a macroblock that should be DCT-converted in the field DCT mode, that is, a portion with motion, was DCT-converted in the frame DCT mode owing to the data rate limitation. From the viewpoint of improving the image quality of the decoded image, it can therefore be said that DCT-converting such a macroblock in the frame DCT mode is incorrect, and that the DCT type representing the frame DCT mode, which is one piece of the decoding control information included in the encoded data, is also incorrect.
[0070]
Therefore, for example, the mismatch detection unit 1 determines the correctness of the DCT type included in the encoded data, and outputs mismatch information indicating the determination result.
[0071]
That is, the mismatch detection unit 1 determines that the DCT type of a macroblock is not correct when, for example, the DCT type of a macroblock in which a moving image is displayed is the frame DCT mode. On the other hand, the mismatch detection unit 1 determines that the DCT type of a macroblock is correct when, for example, the DCT type of a macroblock in which a moving image is displayed is the field DCT mode, or when a motionless image is displayed in the macroblock.
[0072]
Note that the mismatch detection unit 1 determines whether or not there is motion in a macroblock (in the image displayed in it) based on, for example, the motion vector of the macroblock, which is another piece of the decoding control information included in the encoded data.
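The determination rule described above can be sketched as follows. This is a minimal illustration, assuming that a macroblock is regarded as moving when any component of its motion vector exceeds a threshold in magnitude. The names `has_motion` and `dct_type_is_correct`, the string constants, and the threshold are hypothetical and do not appear in the embodiment.

```python
FRAME_DCT = "frame"  # hypothetical label for the frame DCT mode
FIELD_DCT = "field"  # hypothetical label for the field DCT mode


def has_motion(motion_vector, threshold=0):
    """Assumed motion test: any vector component larger than the threshold."""
    mv_x, mv_y = motion_vector
    return abs(mv_x) > threshold or abs(mv_y) > threshold


def dct_type_is_correct(dct_type, motion_vector, threshold=0):
    """Return False only for a moving macroblock coded in the frame DCT mode.

    A moving macroblock in the field DCT mode, or a macroblock without
    motion, is judged correct, following the rule in the text.
    """
    if dct_type not in (FRAME_DCT, FIELD_DCT):
        raise ValueError("unknown DCT type: %r" % (dct_type,))
    return not (has_motion(motion_vector, threshold) and dct_type == FRAME_DCT)


# Mismatch information is modelled here as the negation of correctness.
mismatch = not dct_type_is_correct(FRAME_DCT, (3, 0))
```

The threshold parameter is included only because a real detector would probably tolerate very small vectors; the text itself does not specify how motion is derived from the motion vector.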
[0073]
Next, FIG. 9 shows a configuration example of the class classification adaptation processing unit 32 of FIG.
[0074]
The class classification adaptation process includes a class classification process and an adaptation process. By the class classification process, data is classified based on its property, and the adaptation process is performed for each class.
[0075]
Here, the adaptive processing will be described by taking as an example the case of converting a low-quality image (hereinafter, appropriately referred to as a low-quality image) into a high-quality image (hereinafter, appropriately referred to as a high-quality image).
[0076]
In this case, the adaptive processing obtains predicted values of the pixels of a high-quality image, in which the image quality of the low-quality image is improved, by a linear combination of the pixels constituting the low-quality image (hereinafter referred to as low-quality pixels as appropriate) and predetermined tap coefficients, thereby yielding an image in which the image quality of the low-quality image is improved.
[0077]
Specifically, for example, suppose that certain high-quality image data is used as teacher data and that low-quality image data obtained by degrading the image quality of that high-quality image is used as student data. Consider obtaining the predicted value $E[y]$ of the pixel value $y$ of a pixel constituting the high-quality image (hereinafter referred to as a high-quality pixel as appropriate) by a linear first-order combination model defined by a linear combination of a set of pixel values $x_1, x_2, \ldots$ of several low-quality pixels and predetermined tap coefficients $w_1, w_2, \ldots$. In this case, the predicted value $E[y]$ can be expressed by the following equation.
[0078]
$$E[y] = w_1 x_1 + w_2 x_2 + \cdots \qquad (1)$$
[0079]
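To generalize Equation (1), define the matrix $W$ consisting of the set of tap coefficients $w_j$, the matrix $X$ consisting of the set of student data $x_{ij}$, and the matrix $Y'$ consisting of the set of predicted values $E[y_i]$ as
[Expression 1]
$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1J} \\
x_{21} & x_{22} & \cdots & x_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
x_{I1} & x_{I2} & \cdots & x_{IJ}
\end{pmatrix}, \quad
W = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_J \end{pmatrix}, \quad
Y' = \begin{pmatrix} E[y_1] \\ E[y_2] \\ \vdots \\ E[y_I] \end{pmatrix}
$$
Then, the following observation equation holds.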
[0080]
$$XW = Y' \qquad (2)$$
[0081]
Here, the component $x_{ij}$ of the matrix $X$ denotes the $j$-th student data in the $i$-th set of student data (the set of student data used for predicting the $i$-th teacher data $y_i$), and the component $w_j$ of the matrix $W$ denotes the tap coefficient that is multiplied by the $j$-th student data in a set of student data. Further, $y_i$ denotes the $i$-th teacher data, and thus $E[y_i]$ denotes the predicted value of the $i$-th teacher data. Note that $y$ on the left side of Equation (1) is the component $y_i$ of the matrix $Y$ with the suffix $i$ omitted, and $x_1, x_2, \ldots$ on the right side of Equation (1) are likewise the components $x_{ij}$ of the matrix $X$ with the suffix $i$ omitted.
[0082]
Consider applying the least squares method to the observation equation of Equation (2) to obtain a predicted value $E[y]$ close to the pixel value $y$ of a high-quality pixel. In this case, define the matrix $Y$ consisting of the set of true pixel values $y$ of the high-quality pixels serving as teacher data, and the matrix $E$ consisting of the set of residuals $e$ (errors relative to the true values $y$) of the predicted values $E[y]$, as
[Expression 2]
$$
E = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_I \end{pmatrix}, \quad
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_I \end{pmatrix}
$$
Then, from Equation (2), the following residual equation holds.
[0083]
$$XW = Y + E \qquad (3)$$
[0084]
In this case, the tap coefficients $w_j$ for obtaining the predicted value $E[y]$ close to the pixel value $y$ of the high-quality pixel can be obtained by minimizing the square error
[Equation 3]
$$\sum_{i=1}^{I} e_i^2$$
[0085]
Therefore, when the derivative of the above square error with respect to the tap coefficient $w_j$ is 0, that is, when the tap coefficient $w_j$ satisfies the following equation, it is the optimum value for obtaining the predicted value $E[y]$ close to the pixel value $y$ of the high-quality pixel.
[0086]
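[Expression 4]
$$e_1 \frac{\partial e_1}{\partial w_j} + e_2 \frac{\partial e_2}{\partial w_j} + \cdots + e_I \frac{\partial e_I}{\partial w_j} = 0 \quad (j = 1, 2, \ldots, J) \qquad (4)$$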
[0087]
Therefore, first, differentiating Equation (3) with respect to the tap coefficient $w_j$ yields the following equation.
[0088]
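[Equation 5]
$$\frac{\partial e_i}{\partial w_1} = x_{i1}, \quad \frac{\partial e_i}{\partial w_2} = x_{i2}, \quad \ldots, \quad \frac{\partial e_i}{\partial w_J} = x_{iJ} \quad (i = 1, 2, \ldots, I) \qquad (5)$$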
[0089]
From equations (4) and (5), equation (6) is obtained.
[0090]
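[Formula 6]
$$\sum_{i=1}^{I} e_i x_{i1} = 0, \quad \sum_{i=1}^{I} e_i x_{i2} = 0, \quad \ldots, \quad \sum_{i=1}^{I} e_i x_{iJ} = 0 \qquad (6)$$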
[0091]
Furthermore, considering the relationship among the student data $x_{ij}$, the tap coefficients $w_j$, the teacher data $y_i$, and the residuals $e_i$ in the residual equation of Equation (3), the following normal equations can be obtained from Equation (6).
[0092]
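[Expression 7]
$$
\begin{aligned}
\Bigl(\sum_{i=1}^{I} x_{i1} x_{i1}\Bigr) w_1 + \Bigl(\sum_{i=1}^{I} x_{i1} x_{i2}\Bigr) w_2 + \cdots + \Bigl(\sum_{i=1}^{I} x_{i1} x_{iJ}\Bigr) w_J &= \sum_{i=1}^{I} x_{i1} y_i \\
\Bigl(\sum_{i=1}^{I} x_{i2} x_{i1}\Bigr) w_1 + \Bigl(\sum_{i=1}^{I} x_{i2} x_{i2}\Bigr) w_2 + \cdots + \Bigl(\sum_{i=1}^{I} x_{i2} x_{iJ}\Bigr) w_J &= \sum_{i=1}^{I} x_{i2} y_i \\
&\;\;\vdots \\
\Bigl(\sum_{i=1}^{I} x_{iJ} x_{i1}\Bigr) w_1 + \Bigl(\sum_{i=1}^{I} x_{iJ} x_{i2}\Bigr) w_2 + \cdots + \Bigl(\sum_{i=1}^{I} x_{iJ} x_{iJ}\Bigr) w_J &= \sum_{i=1}^{I} x_{iJ} y_i
\end{aligned}
\qquad (7)
$$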
[0093]
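By defining the matrix (covariance matrix) $A$ and the vector $v$ as
[Equation 8]
$$
A = \begin{pmatrix}
\sum_{i} x_{i1} x_{i1} & \sum_{i} x_{i1} x_{i2} & \cdots & \sum_{i} x_{i1} x_{iJ} \\
\sum_{i} x_{i2} x_{i1} & \sum_{i} x_{i2} x_{i2} & \cdots & \sum_{i} x_{i2} x_{iJ} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{i} x_{iJ} x_{i1} & \sum_{i} x_{iJ} x_{i2} & \cdots & \sum_{i} x_{iJ} x_{iJ}
\end{pmatrix}, \quad
v = \begin{pmatrix}
\sum_{i} x_{i1} y_i \\ \sum_{i} x_{i2} y_i \\ \vdots \\ \sum_{i} x_{iJ} y_i
\end{pmatrix}
$$
(each sum running over $i = 1, \ldots, I$) and defining the vector $W$ as shown in Expression 1, the normal equations shown in Equation (7) can be expressed as
$$AW = v \qquad (8)$$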
[0094]
By preparing a certain number of sets of the student data $x_{ij}$ and the teacher data $y_i$, as many normal equations of Equation (7) can be set up as the number $J$ of tap coefficients $w_j$ to be obtained. Therefore, by solving Equation (8) for the vector $W$ (note that, to solve Equation (8), the matrix $A$ in Equation (8) must be regular, that is, nonsingular), the optimum tap coefficients $w_j$ can be obtained. Equation (8) can be solved using, for example, the sweeping method (Gauss-Jordan elimination).
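The derivation above can be illustrated with a small Python sketch that builds the matrix A and the vector v of Equation (8) from student and teacher data and solves AW = v by Gauss-Jordan elimination. The function names `normal_equation` and `gauss_jordan_solve`, the partial pivoting, and the tolerance `eps` are choices made for this sketch and are not specified in the text.

```python
def normal_equation(X, y):
    """Build A (J x J) and v (J) of Equation (8) from I rows of student data."""
    J = len(X[0])
    A = [[sum(row[n] * row[m] for row in X) for m in range(J)]
         for n in range(J)]
    v = [sum(row[n] * y_i for row, y_i in zip(X, y)) for n in range(J)]
    return A, v


def gauss_jordan_solve(A, v, eps=1e-12):
    """Solve A W = v by the sweeping method (Gauss-Jordan elimination)."""
    n = len(A)
    # Augmented matrix [A | v], copied so the inputs are left untouched.
    M = [[float(a) for a in A[r]] + [float(v[r])] for r in range(n)]
    for col in range(n):
        # Partial pivoting (an added safeguard, not required by the text).
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < eps:
            raise ValueError("matrix A is not regular (singular)")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [a / p for a in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


# Toy data: teacher values generated exactly from w = (0.5, 0.25).
X = [[1, 2], [3, 4], [5, 6], [2, 1]]
y = [0.5 * a + 0.25 * b for a, b in X]
A, v = normal_equation(X, y)
w = gauss_jordan_solve(A, v)
```

Because the toy teacher data is an exact linear combination of the student data, the recovered coefficients match the generating ones up to rounding error. With real image data the residuals are nonzero and the solution is the least squares optimum.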
[0095]
As described above, the adaptive processing consists of learning that obtains the optimum tap coefficients $w_j$ using the student data and the teacher data (here, the tap coefficients that minimize the sum of the square errors of the predicted values when the predicted values of the teacher data are obtained from the student data), and of obtaining the predicted value $E[y]$ close to the teacher data $y$ by Equation (1) using those tap coefficients $w_j$.
[0096]
The adaptive processing differs from simple interpolation in that components that are not included in the low-quality image but are included in the high-quality image are reproduced. That is, as far as Equation (1) alone is concerned, the adaptive processing looks the same as simple interpolation using a so-called interpolation filter. However, the tap coefficients $w$, which correspond to the tap coefficients of the interpolation filter, are obtained by learning using the teacher data and the student data, so the components included in the high-quality image serving as teacher data can be reproduced. In this sense, the adaptive processing can be said to have an image creating action.
[0097]
Here, as the student data, for example, decoded image data obtained by MPEG-encoding high-quality image data as teacher data and further MPEG decoding can be used. In this case, it is possible to obtain a tap coefficient that can obtain a high-quality image with reduced block distortion or the like caused by quantization in MPEG encoding.
[0098]
Further, for example, it is also possible to use high-quality image data as teacher data, and to use as student data the DCT coefficients obtained by DCT transforming the image data serving as teacher data and then quantizing and dequantizing the result. In this case, tap coefficients for converting the DCT coefficients into (the predicted values of) a high-quality image can be obtained.
[0099]
In the above case, the predicted value of the high-quality image is obtained by linear first-order prediction, but it can also be obtained by a second-order or higher-order expression.
[0100]
The class classification adaptation processing unit 32 in FIG. 9 performs the class classification adaptation process as described above.
[0101]
That is, the preprocessing data output from the preprocessing unit 31 (FIG. 6) is supplied to the tap extraction units 51 and 52.
[0102]
The tap extraction unit 51 sets the adaptive processing data to be obtained as the attention data, and extracts some of the preprocessing data used for predicting the attention data as a prediction tap. The tap extraction unit 52 extracts some of the preprocessing data used for classifying the attention data as a class tap.
[0103]
Here, the mismatch information output from the determination unit 22 (FIG. 6) is also supplied to the tap extraction units 51 and 52. The tap extraction units 51 and 52 change the structures of the prediction tap and the class tap, respectively, based on the mismatch information.
[0104]
Here, in order to simplify the description, it is assumed that the prediction tap and the class tap have the same tap structure. However, the prediction tap and the class tap can have different tap structures.
[0105]
The prediction tap obtained by the tap extraction unit 51 is supplied to the prediction unit 54, and the class tap obtained by the tap extraction unit 52 is supplied to the class classification unit 53.
[0106]
In addition to the class tap from the tap extraction unit 52, the mismatch information is also supplied to the class classification unit 53. The class classification unit 53 classifies the data of interest based on the class tap from the tap extraction unit 52 and the mismatch information, and supplies the class code corresponding to the resulting class to the coefficient memory 41.
[0107]
The coefficient memory 41 stores, at the address corresponding to each class code, the tap coefficients of the class corresponding to that class code, and supplies the tap coefficients stored at the address corresponding to the class code supplied from the class classification unit 53 to the prediction unit 54.
[0108]
The prediction unit 54 acquires the prediction tap output from the tap extraction unit 51 and the tap coefficients output from the coefficient memory 41, and performs the linear prediction calculation shown in Expression (1) using the prediction tap and the tap coefficients. The prediction unit 54 thereby obtains and outputs (the predicted value of) the adaptive processing data.
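The flow from class tap to predicted value can be sketched in Python as follows. The text does not specify here how the class code is computed, so this sketch assumes a simple rule: each class tap value is thresholded against the midpoint of the tap's range to give one bit, and the mismatch information is appended as a final bit. The names `class_code` and `predict`, and the dictionary used as the coefficient memory, are hypothetical.

```python
def class_code(class_tap, mismatch):
    """Assumed classifier: 1 bit per tap value plus 1 bit of mismatch info."""
    lo, hi = min(class_tap), max(class_tap)
    mid = (lo + hi) / 2
    code = 0
    for value in class_tap:
        code = (code << 1) | (1 if value > mid else 0)
    return (code << 1) | (1 if mismatch else 0)


def predict(prediction_tap, class_tap, mismatch, coefficient_memory):
    """Linear prediction of Expression (1) with the coefficients of the class."""
    w = coefficient_memory[class_code(class_tap, mismatch)]
    if len(w) != len(prediction_tap):
        raise ValueError("tap coefficients and prediction tap differ in length")
    return sum(w_j * x_j for w_j, x_j in zip(w, prediction_tap))


# Toy coefficient memory: class code -> tap coefficients (hypothetical values).
coefficient_memory = {5: [0.25, 0.5, 0.25], 4: [0.0, 1.0, 0.0]}
tap = [10, 200, 30]  # used as both prediction tap and class tap
value_with_mismatch = predict(tap, tap, True, coefficient_memory)
value_without_mismatch = predict(tap, tap, False, coefficient_memory)
```

Because the mismatch bit is part of the class code, the same tap values select different tap coefficients depending on whether the DCT type was judged correct, which is the point of supplying the mismatch information to the class classification unit 53.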
[0109]
Next, processing (decoding processing) of the decoding device in FIG. 6 will be described with reference to the flowchart in FIG.
[0110]
The tap extraction unit 51 of the class classification adaptive processing unit 32 (FIG. 9) sets the adaptive processing data to be obtained as the attention data. In step S21, the mismatch detection unit 1 generates mismatch information from the encoded data corresponding to the attention data (hereinafter referred to as encoded data of interest as appropriate).
[0111]
That is, in the mismatch detection unit 1, the decoding control information extraction unit 21 extracts, for example, a motion vector and a DCT type as a plurality of pieces of decoding control information from the encoded data of interest, and supplies them to the determination unit 22. The determination unit 22 then determines the correctness of the DCT type supplied from the decoding control information extraction unit 21 based on, for example, the motion vector likewise supplied from the decoding control information extraction unit 21, and supplies mismatch information as the determination result to the class classification adaptation processing unit 32.
[0112]
The process then proceeds to step S22, where the preprocessing unit 31 preprocesses the encoded data to obtain the preprocessing data needed to configure the prediction tap and the class tap for the attention data, and supplies the resulting preprocessing data to the class classification adaptation processing unit 32.
[0113]
In the class classification adaptive processing unit 32 (FIG. 9), in step S23, the tap extraction units 51 and 52 use the preprocessing data supplied from the preprocessing unit 31 to configure a prediction tap and a class tap, respectively, each having a tap structure based on, for example, the mismatch information from the mismatch detection unit 1. The prediction tap is supplied from the tap extraction unit 51 to the prediction unit 54, and the class tap is supplied from the tap extraction unit 52 to the class classification unit 53.
[0114]
The class classification unit 53 receives the class tap for the data of interest from the tap extraction unit 52, and classifies the data of interest based on the class tap and the mismatch information supplied from the mismatch detection unit 1 in step S24. The class code representing the class of the data of interest is output to the coefficient memory 41.
[0115]
The coefficient memory 41 reads out and outputs the tap coefficient stored at the address corresponding to the class code supplied from the class classification unit 53. In step S25, the prediction unit 54 acquires the tap coefficient output from the coefficient memory 41, and proceeds to step S26.
[0116]
In step S26, the prediction unit 54 performs the linear prediction calculation shown in Expression (1) using the prediction tap output from the tap extraction unit 51 and the tap coefficients acquired from the coefficient memory 41. The prediction unit 54 thereby obtains (the predicted value of) the adaptive processing data serving as the attention data, and supplies it to the post-processing unit 33.
[0117]
In step S27, the post-processing unit 33 (FIG. 6) performs predetermined post-processing on the attention data from the class classification adaptive processing unit 32 (the prediction unit 54 thereof), thereby obtaining and outputting decoded image data.
[0118]
Thereafter, the process proceeds to step S28, where it is determined whether there is any adaptive processing data that has not yet been set as the data of interest. If it is determined in step S28 that there is adaptation processing data that has not yet been set as attention data, one of the adaptation processing data that has not yet been set as attention data is newly set as attention data, and the process returns to step S21. Thereafter, the same processing is repeated.
[0119]
If it is determined in step S28 that there is no adaptive process data that has not yet been set as attention data, the process ends.
[0120]
Next, FIG. 11 shows a detailed configuration example of the learning device in FIG. 4 when learning tap coefficients to be stored in the coefficient memory 41 in FIG.
[0121]
In the embodiment of FIG. 11, the mismatch detection unit 13 includes a decoding control information extraction unit 71 and a determination unit 72, and the encoded data output from the encoding unit 12 is supplied to the decoding control information extraction unit 71. The decoding control information extraction unit 71 and the determination unit 72 are configured in the same manner as the decoding control information extraction unit 21 and the determination unit 22 in FIG. 6, respectively. Accordingly, in the same manner as in the mismatch detection unit 1 of FIG. 6, mismatch information is obtained from the encoded data supplied to the mismatch detection unit 13 and is supplied to the learning processing unit 14.
[0122]
The learning processing unit 14 includes an adaptive learning unit 60, a teacher data generation unit 61, and a student data generation unit 63.
[0123]
The adaptive learning unit 60 includes a teacher data storage unit 62, a student data storage unit 64, tap extraction units 65 and 66, a class classification unit 67, an adding unit 68, and a tap coefficient calculation unit 69. The teacher data generation unit 61 is composed of a reverse post-processing unit 61A, and the student data generation unit 63 is composed of an encoding unit 63A and a preprocessing unit 63B.
[0124]
The reverse post-processing unit 61A reads the learning data from the learning data storage unit 11 and performs processing complementary to the processing performed by the post-processing unit 33 in FIG. 6 (hereinafter referred to as reverse post-processing as appropriate). That is, for example, if the learning data is $y$ and the post-processing that the post-processing unit 33 of FIG. 6 applies to the adaptive processing data $x$ is represented by a function $f(x)$, the reverse post-processing unit 61A applies to the learning data $y$ the processing represented by the function $f^{-1}(y)$ (where $f^{-1}()$ denotes the inverse function of the function $f()$), and outputs the resulting data to the adaptive learning unit 60 as teacher data. The teacher data output from the reverse post-processing unit 61A thus corresponds to the adaptive processing data supplied from the class classification adaptive processing unit 32 to the post-processing unit 33 in FIG. 6.
[0125]
The teacher data storage unit 62 temporarily stores the teacher data output from the teacher data generation unit 61 (the reverse post-processing unit 61A).
[0126]
The encoding unit 63A reads the learning data from the learning data storage unit 11, encodes it by the same encoding method as that of the encoding unit 12, that is, the MPEG2 method in the present embodiment, and outputs the encoded data. Therefore, the encoding unit 63A outputs the same encoded data as the encoding unit 12 outputs. Note that the encoding units 12 and 63A can be implemented as a single shared encoding unit.
[0127]
The preprocessing unit 63B performs the same preprocessing as that performed by the preprocessing unit 31 in FIG. 6 on the encoded data output from the encoding unit 63A, and outputs the resulting preprocessing data to the adaptive learning unit 60 as student data. Note that the student data output by the preprocessing unit 63B corresponds to the preprocessing data supplied from the preprocessing unit 31 of FIG. 6 to the class classification adaptive processing unit 32.
[0128]
The student data storage unit 64 temporarily stores the student data output from the student data generation unit 63 (preprocessing unit 63B).
[0129]
The tap extraction unit 65 sequentially sets the teacher data stored in the teacher data storage unit 62 as attention teacher data and, for the attention teacher data, extracts student data stored in the student data storage unit 64, thereby configuring and outputting a prediction tap having the same tap structure as that configured by the tap extraction unit 51 of FIG. 9. The tap extraction unit 65 is also supplied with the mismatch information output from the mismatch detection unit 13 (the determination unit 72), and, like the tap extraction unit 51 of FIG. 9, changes the tap structure of the prediction tap based on the mismatch information about the attention teacher data.
[0130]
For the attention teacher data, the tap extraction unit 66 extracts student data stored in the student data storage unit 64, thereby configuring and outputting a class tap having the same tap structure as that configured by the tap extraction unit 52 of FIG. 9. The tap extraction unit 66 is also supplied with the mismatch information output from the mismatch detection unit 13, and, like the tap extraction unit 52 of FIG. 9, changes the tap structure of the class tap based on the mismatch information about the attention teacher data.
[0131]
The class classification unit 67 is supplied with the class tap output from the tap extraction unit 66 and the mismatch information output from the mismatch detection unit 13. Based on the class tap and the mismatch information for the attention teacher data, the class classification unit 67 performs the same class classification as the class classification unit 53 of FIG. 9 on the attention teacher data, and outputs the class code corresponding to the resulting class to the adding unit 68.
[0132]
The adding unit 68 reads the attention teacher data from the teacher data storage unit 62 and, for each class code supplied from the class classification unit 67, performs addition on the attention teacher data and the student data constituting the prediction tap that is configured for the attention teacher data and supplied from the tap extraction unit 65.
[0133]
That is, for each class corresponding to the class code supplied from the class classification unit 67, the adding unit 68 uses the prediction tap (student data) to perform calculations corresponding to the multiplication of student data by student data ($x_{in} x_{im}$) and the summation ($\Sigma$) that form each component of the matrix $A$ in Equation (8).
[0134]
Further, for each class corresponding to the class code supplied from the class classification unit 67, the adding unit 68 uses the prediction tap (student data) and the teacher data to perform calculations corresponding to the multiplication of student data by teacher data ($x_{in} y_i$) and the summation ($\Sigma$) that form each component of the vector $v$ in Equation (8).
[0135]
In other words, the adding unit 68 stores, in its built-in memory (not shown), the components of the matrix $A$ and the components of the vector $v$ in Equation (8) obtained for the teacher data previously set as the attention teacher data. To each of those components, it adds the corresponding component $x_{in} x_{im}$ or $x_{in} y_i$ calculated using the teacher data $y_i$ newly set as the attention teacher data and the student data $x_{in}$ ($x_{im}$) (that is, it performs the addition represented by the summation in the matrix $A$ and the vector $v$).
[0136]
Then, the addition unit 68 performs the above addition using all the teacher data stored in the teacher data storage unit 62 as the attention teacher data, thereby obtaining the normal equation shown in Expression (8) for each class. Then, the normal equation is supplied to the tap coefficient calculation unit 69.
[0137]
The tap coefficient calculation unit 69 obtains and outputs the tap coefficient for each class by solving the normal equation for each class supplied from the adding unit 68.
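The per-class accumulation performed by the adding unit 68 can be sketched as follows. The class name `AddingUnit`, its method `add`, and the use of dictionaries keyed by class code as the built-in memory are assumptions made for this illustration. Solving the accumulated equations, which is the role of the tap coefficient calculation unit 69, is deliberately left out.

```python
class AddingUnit:
    """Accumulates matrix A and vector v of Equation (8) for each class."""

    def __init__(self, num_taps):
        self.num_taps = num_taps
        self.A = {}  # class code -> J x J running sums of x_in * x_im
        self.v = {}  # class code -> J running sums of x_in * y_i

    def add(self, code, prediction_tap, teacher):
        """Add one (prediction tap, teacher data) pair to the class sums."""
        J = self.num_taps
        if len(prediction_tap) != J:
            raise ValueError("prediction tap has the wrong number of taps")
        A = self.A.setdefault(code, [[0.0] * J for _ in range(J)])
        v = self.v.setdefault(code, [0.0] * J)
        for n in range(J):
            for m in range(J):
                A[n][m] += prediction_tap[n] * prediction_tap[m]
            v[n] += prediction_tap[n] * teacher


# Toy run: two samples fall into class 0 and one sample into class 1.
unit = AddingUnit(num_taps=2)
unit.add(0, [1, 2], 3)
unit.add(0, [3, 4], 5)
unit.add(1, [1, 1], 2)
```

Keeping only the running sums means the learning device never has to hold all student data of a class at once. Note that class 1 in the toy run has a singular matrix A after a single sample, which is the kind of class for which no tap coefficients can be solved until more data arrives.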
[0138]
Next, processing (learning processing) of the learning device in FIG. 11 will be described with reference to the flowchart in FIG.
[0139]
First, in step S31, the teacher data generation unit 61 and the student data generation unit 63 generate teacher data and student data from the learning data stored in the learning data storage unit 11, respectively. The teacher data is supplied from the teacher data generation unit 61 to the teacher data storage unit 62 and stored therein, and the student data is supplied from the student data generation unit 63 to the student data storage unit 64 and stored therein.
[0140]
Thereafter, the tap extraction unit 65 sets, as the attention teacher data, teacher data stored in the teacher data storage unit 62 that has not yet been set as attention teacher data. In step S32, the encoding unit 12 encodes the learning data stored in the learning data storage unit 11, thereby obtaining the encoded data corresponding to the attention teacher data (the encoded data of the learning data corresponding to the attention teacher data), and supplies it to the mismatch detection unit 13.
[0141]
The mismatch detection unit 13 generates mismatch information about the attention teacher data from the encoded data supplied from the encoding unit 12, and supplies the mismatch information to the tap extraction units 65 and 66 and the class classification unit 67 of the learning processing unit 14.
[0142]
In step S34, the tap extraction unit 65 reads out student data stored in the student data storage unit 64 for the attention teacher data based on the mismatch information, configures a prediction tap, and supplies the prediction tap to the adding unit 68. At the same time, the tap extraction unit 66 also reads out student data stored in the student data storage unit 64 for the attention teacher data based on the mismatch information, configures a class tap, and supplies the class tap to the class classification unit 67.
[0143]
In step S35, the class classification unit 67 classifies the attention teacher data based on the class tap and the mismatch information for the attention teacher data, and outputs the class code corresponding to the resulting class to the adding unit 68.
[0144]
In step S36, the adding unit 68 reads the attention teacher data from the teacher data storage unit 62, and calculates the components of the matrix A and the vector v of Expression (8) using the attention teacher data and the prediction tap from the tap extraction unit 65. Further, the adding unit 68 adds the components of the matrix A and the vector v obtained from the attention teacher data and the prediction tap to those components, among the components of the matrix A and the vector v already obtained, that correspond to the class code from the class classification unit 67, and the process proceeds to step S37.
[0145]
In step S37, the tap extraction unit 65 determines whether teacher data that has not yet been set as the attention teacher data is still stored in the teacher data storage unit 62. If it is determined in step S37 that such teacher data is still stored in the teacher data storage unit 62, the tap extraction unit 65 newly sets that teacher data as the attention teacher data, the process returns to step S32, and the same processing is repeated thereafter.
[0146]
If it is determined in step S37 that no teacher data that has not yet been set as the attention teacher data is stored in the teacher data storage unit 62, the adding unit 68 supplies the normal equations of Equation (8), composed of the components of the matrix A and the vector v for each class obtained by the processing so far, to the tap coefficient calculation unit 69, and the process proceeds to step S38.
[0147]
In step S38, the tap coefficient calculation unit 69 finds and outputs a tap coefficient for each class by solving the normal equation for each class supplied from the adding unit 68, and ends the process.
[0148]
Note that, because the number of learning data stored in the learning data storage unit 11 may be insufficient, there may be classes for which the number of normal equations necessary for obtaining the tap coefficients cannot be obtained. For such classes, the tap coefficient calculation unit 69 outputs, for example, default tap coefficients.
[0149]
Next, FIG. 13 shows a first detailed configuration example of the decoding device of FIG. 6 when the encoded data is obtained by encoding image data by the MPEG2 system.
[0150]
In the embodiment of FIG. 13, the decoding control information extraction unit 21 includes an inverse VLC unit 111. The inverse VLC unit 111 is configured, for example, in the same manner as the inverse VLC unit 121 (FIG. 14) constituting an MPEG decoder 116 described later. It extracts from the encoded data, as a plurality of pieces of decoding control information, for example, the DCT type, the picture type, the macroblock (MB) type, and the motion vector, and supplies them to the determination unit 22.
[0151]
The determination unit 22 includes a field / frame determination unit 112, an intra / non-intra determination unit 113, a static motion determination unit 114, and a mismatch information generation unit 115.
[0152]
Based on the DCT type output from the inverse VLC unit 111, the field / frame determination unit 112 determines whether the block having pixels corresponding to the data of interest (hereinafter referred to as the block of interest as appropriate) was DCT transformed in the frame DCT mode or in the field DCT mode, and supplies the determination result to the mismatch information generation unit 115.
[0153]
Based on the picture type and macroblock type output from the inverse VLC unit 111, the intra / non-intra determination unit 113 determines whether the block of interest (the macroblock including it) was encoded by intra coding or non-intra coding, and supplies the determination result to the mismatch information generation unit 115.
[0154]
The static motion determination unit 114 determines the presence or absence of motion in the block of interest (the presence or absence of motion of the image displayed in the block of interest) based on the motion vector output from the inverse VLC unit 111, and supplies the determination result to the mismatch information generation unit 115.
[0155]
Based on the outputs of the field / frame determination unit 112, the intra / non-intra determination unit 113, and the static motion determination unit 114, the mismatch information generation unit 115 determines the correctness of the DCT type of the block of interest (the macroblock including it) output by the inverse VLC unit 111, generates mismatch information as the determination result, and supplies it to the class classification adaptive processing unit 32.
[0156]
Here, in the embodiment of FIG. 13, the preprocessing unit 31 is configured by an MPEG decoder 116. The MPEG decoder 116 decodes the encoded data by the MPEG2 system and supplies the resulting decoded image data to the class classification adaptive processing unit 32 as the preprocessing data.
[0157]
Next, FIG. 14 shows a configuration example of the MPEG decoder 116 of FIG.
[0158]
The encoded data is supplied to the inverse VLC unit 121. The inverse VLC unit 121 separates, from the encoded data, the VLC code of the quantized DCT coefficients (the variable-length-coded quantized DCT coefficients), the quantization step, the motion vector, the picture type, the temporal reference, and other decoding control information.
[0159]
Then, the inverse VLC unit 121 performs inverse VLC processing on the VLC code of the quantized DCT coefficient, thereby decoding the quantized DCT coefficient, and supplies the decoded quantized DCT coefficient to the inverse quantization unit 122. Further, the inverse VLC unit 121 supplies the quantization step to the inverse quantization unit 122, the motion vector to the motion compensation unit 125, the picture type to the memory 126, and the temporal reference to the picture selection unit 127.
[0160]
The inverse quantization unit 122 inversely quantizes the quantized DCT coefficient supplied from the inverse VLC unit 121 with the quantization step likewise supplied from the inverse VLC unit 121, and supplies the resulting DCT coefficient to the inverse DCT conversion unit 123. The inverse DCT conversion unit 123 performs inverse DCT conversion on the DCT coefficient supplied from the inverse quantization unit 122 and supplies the result to the calculation unit 124.
[0161]
The calculation unit 124 is supplied with the output of the motion compensation unit 125 in addition to the output of the inverse DCT conversion unit 123. The calculation unit 124 adds the output of the motion compensation unit 125 to the output of the inverse DCT conversion unit 123 as necessary, and thereby obtains and outputs decoded image data.
[0162]
That is, in MPEG encoding, three picture types, I, P, and B, are defined, and each picture is DCT-transformed in units of blocks of 8 × 8 pixels (horizontal × vertical). A block of an I picture is intra-coded without referring to other frames or fields (no difference from a predicted image is calculated). A block of a P picture is subjected to intra coding or forward predictive coding. A block of a B picture is subjected to intra coding, forward predictive coding, backward predictive coding, or bidirectional predictive coding.
[0163]
Here, in forward predictive coding, the image of a frame (or field) temporally preceding the frame (or field) of the block to be encoded is used as a reference image. The difference between the block to be encoded and the predicted image of that block, obtained by performing motion compensation on the reference image, is calculated, and the difference value, that is, the residual image, is DCT-transformed.
[0164]
Further, in backward predictive coding, the image of a frame temporally following the frame of the block to be encoded is used as a reference image. The difference between the block to be encoded and the predicted image of that block, obtained by performing motion compensation on the reference image, is calculated, and the difference value (residual image) is DCT-transformed.
[0165]
Furthermore, in bidirectional predictive coding, the images of two frames (or fields), one temporally preceding and one temporally following the frame of the block to be encoded, are used as reference images. The difference between the block to be encoded and the predicted image of that block, obtained by performing motion compensation on the reference images, is calculated, and the difference value (residual image) is DCT-transformed.
[0166]
Therefore, when the block is non-intra coded (forward predictive coding, backward predictive coding, or bidirectional predictive coding), the output of the inverse DCT transform unit 123 is a decoded version of the residual image (the difference between the original image and the predicted image). The calculation unit 124 adds this decoded residual image (hereinafter referred to as the decoded residual image, as appropriate) and the predicted image supplied from the motion compensation unit 125, thereby decoding the non-intra-coded block, and outputs the resulting decoded image data.
[0167]
On the other hand, when the block output from the inverse DCT transform unit 123 is an intra-coded block, the output of the inverse DCT transform unit 123 is a decoded version of the original image, and the calculation unit 124 outputs the output of the inverse DCT transform unit 123 as it is as decoded image data.
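
The two cases handled by the calculation unit 124 can be sketched as follows. This is only an illustrative sketch and not the decoder's implementation: the function name `reconstruct_block`, the list-of-lists block representation, and the clipping of the sum to the 8-bit range 0 to 255 are assumptions made for the example.

```python
def reconstruct_block(idct_output, predicted=None):
    """Sketch of calculation unit 124 (names and clipping are assumptions).

    idct_output: 2-D list output by the inverse DCT (pixels or residuals).
    predicted:   2-D list from motion compensation, or None for intra blocks.
    """
    if predicted is None:
        # Intra-coded block: the inverse DCT output is already the decoded image.
        return [row[:] for row in idct_output]
    # Non-intra block: add the predicted image to the decoded residual image,
    # clipping each sum to the assumed 8-bit pixel range.
    return [
        [max(0, min(255, r + p)) for r, p in zip(res_row, pred_row)]
        for res_row, pred_row in zip(idct_output, predicted)
    ]
```

For an intra-coded block the function is called without a predicted image, so the inverse DCT output passes through unchanged.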
[0168]
The decoded image data output from the calculation unit 124 is supplied to the memory 126 and the picture selection unit 127.
[0169]
When the decoded image data supplied from the calculation unit 124 is image data of an I picture or a P picture, the memory 126 temporarily stores the decoded image data as a reference image for encoded data to be decoded thereafter. Here, in MPEG2, since a B picture is not used as a reference image, when the decoded image supplied from the calculation unit 124 is an image of a B picture, the memory 126 does not store the decoded image. Note that the memory 126 determines whether the decoded image supplied from the calculation unit 124 is an I, P, or B picture by referring to the picture type supplied from the inverse VLC unit 121.
[0170]
The picture selection unit 127 selects and outputs, in display order, the frames (or fields) of the decoded image output from the calculation unit 124 or of the decoded image stored in the memory 126. That is, in the MPEG2 system, since the display order of the frames (or fields) of an image does not match the decoding order (encoding order), the picture selection unit 127 rearranges the frames (or fields) of the decoded image, which are obtained in decoding order, into display order and outputs them. The picture selection unit 127 determines the display order by referring to the temporal reference supplied from the inverse VLC unit 121.
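
The reordering performed by the picture selection unit 127 can be illustrated with a small sketch. It assumes, for illustration only, that all pictures of one group are available at once as `(temporal_reference, picture)` pairs and that temporal references are unique within the group; an actual decoder would output pictures incrementally. The function name `display_order` is hypothetical.

```python
def display_order(decoded_pictures):
    """Reorder (temporal_reference, picture) pairs from decoding order
    into display order (sketch; assumes one group with unique references)."""
    return [pic for _, pic in sorted(decoded_pictures, key=lambda item: item[0])]


# Decoding order I0, P3, B1, B2 corresponds to display order I0, B1, B2, P3.
decoded = [(0, "I0"), (3, "P3"), (1, "B1"), (2, "B2")]
print(display_order(decoded))
```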
[0171]
On the other hand, the motion compensation unit 125 receives the motion vector output from the inverse VLC unit 121, reads out a frame (or field) serving as a reference image from the memory 126, performs motion compensation on the reference image according to the motion vector from the inverse VLC unit 121, and supplies the resulting predicted image to the calculation unit 124. In the calculation unit 124, as described above, the predicted image from the motion compensation unit 125 and the residual image output from the inverse DCT conversion unit 123 are added, whereby the non-intra-coded (predictively coded) block is decoded.
[0172]
Next, processing of the mismatch information generation unit 115 in FIG. 13 will be described with reference to the flowchart in FIG.
[0173]
First, in step S41, the mismatch information generation unit 115 determines whether the target block (the macroblock including it) is intra-coded or non-intra-coded, based on the output of the intra / non-intra determination unit 113.
[0174]
Here, the intra / non-intra determination unit 113 determines that the block of interest is intra-coded if the picture type of the frame of the block of interest represents an I picture. When the picture type of the frame of the block of interest represents a P or B picture, the intra / non-intra determination unit 113 determines whether the block of interest is intra-coded or non-intra-coded based on the macroblock type of the macroblock including the block of interest (hereinafter referred to as the macroblock of interest, as appropriate).
[0175]
When it is determined in step S41 that the target block is non-intra coded, the process proceeds to step S42, and the mismatch information generation unit 115 determines, based on the output of the static motion determination unit 114, whether the target block is a block displaying a moving image (hereinafter referred to as a motion block, as appropriate) or a block displaying a still image (hereinafter referred to as a static block, as appropriate).
[0176]
Here, for a non-intra-coded block, the static motion determination unit 114 determines that the block is a motion block if the magnitude of the motion vector of the macroblock including the block is greater than (or not less than) a predetermined threshold ε, and determines that the block is a static block if that magnitude is not greater than (or less than) the predetermined threshold ε.
[0177]
If it is determined in step S42 that the block of interest is a motion block, the process proceeds to step S45, and processing described later is performed.
[0178]
If it is determined in step S42 that the target block is a static block, the process proceeds to step S43, and the mismatch information generation unit 115 generates and outputs, as the mismatch information of the target data, for example, a 1-bit 0 indicating that the DCT type of the target data (the DCT type of the target macroblock) is correct, and the process ends.
[0179]
On the other hand, when it is determined in step S41 that the block of interest is intra-encoded, the process proceeds to step S44, and the mismatch information generation unit 115 determines whether the block of interest is a motion block or a stationary block. The determination is made based on the output of the static motion determination unit 114.
[0180]
Here, for an intra-coded block, the static motion determination unit 114 determines whether the block is a motion block or a static block, for example, in the same manner as for a non-intra-coded block, based on the magnitude relationship between a predetermined threshold ε and the magnitude of the motion vector of one or both of the block at the corresponding position in the preceding frame (hereinafter referred to as the pre-corresponding block, as appropriate) and the block at the corresponding position in the following frame (hereinafter referred to as the post-corresponding block, as appropriate). Alternatively, for example, when either or both of the pre-corresponding block and the post-corresponding block of the intra-coded block are motion blocks, the static motion determination unit 114 determines that the intra-coded block is also a motion block, and when both or either of the pre-corresponding block and the post-corresponding block are static blocks, it determines that the intra-coded block is also a static block.
[0181]
When it is determined in step S44 that the target block is a static block, the process proceeds to step S43, and as described above, the mismatch information generation unit 115 generates and outputs, as the mismatch information of the target data, a 1-bit 0 indicating that the DCT type of the target data is correct, and the process ends.
[0182]
If it is determined in step S44 that the target block is a motion block, the process proceeds to step S45, and the mismatch information generation unit 115 determines, based on the output of the field / frame determination unit 112, whether the DCT type of the target block is the frame DCT mode or the field DCT mode.
[0183]
If it is determined in step S45 that the DCT type of the target block is the field DCT mode, the process proceeds to step S43, and as described above, the mismatch information generation unit 115 generates and outputs, as the mismatch information of the target data, a 1-bit 0 indicating that the DCT type of the target data is correct, and the process ends.
[0184]
If it is determined in step S45 that the DCT type of the block of interest is the frame DCT mode, the process proceeds to step S46, and the mismatch information generation unit 115 generates and outputs, as the mismatch information of the data of interest, for example, a 1-bit 1 indicating that the DCT type of the data of interest (the DCT type of the macroblock of interest) is not correct, and the process ends.
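
The flow of steps S41 to S46 can be summarized in the following sketch. It is an illustration under stated assumptions, not the described apparatus itself: the names `is_motion_block` and `mismatch_bit`, the use of the Euclidean length as the magnitude of a motion vector, the default threshold value, and the choice of the "either or both corresponding blocks are moving" variant for intra-coded blocks are all assumptions made for the example.

```python
import math

FRAME_DCT = "frame"
FIELD_DCT = "field"


def is_motion_block(motion_vector, epsilon=1.0):
    """Static/motion determination: a block is a motion block when the
    magnitude of its motion vector exceeds the threshold epsilon.
    (Euclidean magnitude and the default epsilon are assumptions.)"""
    mvx, mvy = motion_vector
    return math.hypot(mvx, mvy) > epsilon


def mismatch_bit(dct_type, is_intra, motion_vector=None,
                 corresponding_vectors=(), epsilon=1.0):
    """Return 0 when the DCT type is judged correct, 1 when judged incorrect."""
    if is_intra:
        # S41 -> S44: an intra-coded block has no motion vector of its own, so
        # the pre-/post-corresponding blocks are examined (assumed variant:
        # motion if either or both of them are motion blocks).
        moving = any(is_motion_block(mv, epsilon) for mv in corresponding_vectors)
    else:
        # S41 -> S42: use the motion vector of the macroblock itself.
        moving = is_motion_block(motion_vector, epsilon)
    if not moving:
        return 0  # S43: static block, DCT type treated as correct
    # S45: a motion block should have been transformed in the field DCT mode.
    return 0 if dct_type == FIELD_DCT else 1  # S43 or S46
```

For example, `mismatch_bit(FRAME_DCT, False, (8, 0))` yields 1, because a moving block was transformed in the frame DCT mode, whereas the same call with `FIELD_DCT` yields 0.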
[0185]
According to the embodiment of FIG. 15, consider, for example, the case shown in FIG. 16, in which a circular object moving in the horizontal direction is displayed across the adjacent 2 × 2 macroblocks MB#1, MB#2, MB#3, and MB#4, the DCT type of the upper right macroblock MB#2 is the frame DCT mode, and the DCT types of the other three macroblocks MB#1, MB#3, and MB#4 are the field DCT mode. In this case, the mismatch information generation unit 115 generates the following mismatch information.
[0186]
That is, all the blocks constituting the macroblocks MB#1, MB#2, MB#3, and MB#4 are motion blocks and should be DCT-transformed in the field DCT mode. Therefore, when the data of a block constituting any of the macroblocks MB#1, MB#3, and MB#4, whose DCT type is the field DCT mode, is the data of interest, a 1-bit 0 indicating that the DCT type is correct is generated as mismatch information. When the data of a block constituting the macroblock MB#2, whose DCT type is the frame DCT mode, is the data of interest, a 1-bit 1 indicating that the DCT type is not correct is generated as mismatch information.
[0187]
In the embodiment of FIG. 15, mismatch information indicating that the DCT type is incorrect is generated only when the target block is a motion block and the DCT type is the frame DCT mode; in all other cases, mismatch information indicating that the DCT type is correct is generated. However, it is also possible, for example, to generate mismatch information indicating that the DCT type is incorrect both when the target block is a motion block and the DCT type is the frame DCT mode and when the target block is a static block and the DCT type is the field DCT mode, and to generate mismatch information indicating that the DCT type is correct both when the target block is a motion block and the DCT type is the field DCT mode and when the target block is a static block and the DCT type is the frame DCT mode.
[0188]
Further, in the embodiment of FIG. 15, to simplify the description, 1-bit mismatch information indicating whether the DCT type is correct or incorrect is generated. However, as the mismatch information, it is also possible to generate, for example, a set of the DCT type of the data of interest and information indicating whether the block including the data of interest (the target block) should be DCT-transformed in the frame DCT mode or in the field DCT mode (hereinafter referred to as the block type, as appropriate).
[0189]
Here, the block type can be set, for example, to represent the field DCT mode when the target block is a motion block, and to represent the frame DCT mode when the target block is a static block.
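
The set of DCT type and block type can be sketched as follows. The function names `block_type`, `mismatch_pair`, and `is_mismatch`, and the string values used for the two modes, are assumptions for illustration. Note that comparing the two members of the set corresponds to the symmetric variant in which a static block transformed in the field DCT mode is also treated as incorrect.

```python
def block_type(is_motion):
    """Mode in which the block should have been DCT-transformed:
    field DCT mode for a motion block, frame DCT mode for a static block."""
    return "field" if is_motion else "frame"


def mismatch_pair(dct_type, is_motion):
    """Mismatch information expressed as a (DCT type, block type) set."""
    return (dct_type, block_type(is_motion))


def is_mismatch(pair):
    """True when the actual DCT type differs from the block type
    (symmetric variant: static block + field DCT also counts as a mismatch)."""
    dct_type, blk_type = pair
    return dct_type != blk_type
```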
[0190]
Next, processing of the class classification adaptation processing unit 32 (FIG. 9) in the embodiment of FIG. 13 will be described.
[0191]
In the class classification adaptation processing unit 32, class classification adaptation processing is performed on the decoded image data output from the MPEG decoder 116 described in FIG. 14 constituting the preprocessing unit 31, and the adaptive processing data obtained as a result is It is output to the post-processing unit 33. The post-processing unit 33 outputs the adaptive processing data from the class classification adaptive processing unit 32 as it is as high-quality image data (high-quality image data).
[0192]
Therefore, in the embodiment of FIG. 13, the class classification adaptation processing unit 32 performs the class classification adaptation processing, so that the decoded image output from the MPEG decoder 116 of the preprocessing unit 31 and decoded from the encoded data in the MPEG system. The data is converted into high-quality image data and output.
[0193]
That is, in the class classification adaptive processing unit 32 (FIG. 9), the decoded image data output from the MPEG decoder 116 of the preprocessing unit 31 is supplied to the tap extraction units 51 and 52.
[0194]
The tap extraction unit 51 takes, as the attention data, a pixel of the high-quality image data that has not yet been the attention data, and extracts, as a prediction tap, some of the decoded image data (pixels thereof) used for predicting the attention data (the pixel value thereof). The tap extraction unit 52 likewise extracts, as a class tap, some of the decoded image data used for classifying the attention data.
[0195]
Here, as described above, mismatch information is also supplied from the determination unit 22 to the tap extraction units 51 and 52, and the tap extraction units 51 and 52 change the tap structures of the prediction tap and the class tap, respectively, based on the mismatch information.
[0196]
That is, assume, for example, that the set of the DCT type and the block type of the block of interest is supplied, as mismatch information about the data of interest, from the determination unit 22 (the mismatch information generation unit 115 (FIG. 13)) to the class classification adaptive processing unit 32, as described above. In this case, the tap extraction unit 51 receives the set of the DCT type and the block type of the block of interest as mismatch information, and extracts, from the decoded image data supplied from the MPEG decoder 116, a prediction tap having a tap structure according to, for example, a tap structure setting table as shown in FIG. 17.
[0197]
That is, when both the DCT type and the block type as mismatch information are the field DCT mode, the tap extraction unit 51 configures a prediction tap having the tap structure of pattern A, which includes only field taps, described later. When the DCT type and the block type as mismatch information are the field DCT mode and the frame DCT mode, respectively, the tap extraction unit 51 configures a prediction tap having the tap structure of pattern B, in which the number of field taps is larger than the number of frame taps, described later. When the DCT type and the block type as mismatch information are the frame DCT mode and the field DCT mode, respectively, the tap extraction unit 51 configures a prediction tap having the tap structure of pattern C, in which the number of frame taps is larger than the number of field taps. When both the DCT type and the block type as mismatch information are the frame DCT mode, the tap extraction unit 51 configures a prediction tap having the tap structure of pattern D, which includes only frame taps.
[0198]
Here, FIG. 18 shows a tap structure of patterns A to D. In FIG. 18, the ◯ marks represent the pixels of the decoded image data. In addition, a circle mark with a hatched line represents a pixel that is a field tap, and a mark ● represents a pixel that is a frame tap.
[0199]
FIG. 18A shows the tap structure of pattern A. The tap structure of pattern A consists, for example, of a total of 25 pixels: the pixel of the decoded image data corresponding to the target data (hereinafter referred to as the target pixel, as appropriate) and the two pixels adjacent on each of its left and right; the pixel located above the target pixel with one pixel in between and the two pixels adjacent on each of its left and right; the pixel located above the target pixel with three pixels in between and the two pixels adjacent on each of its left and right; the pixel located below the target pixel with one pixel in between and the two pixels adjacent on each of its left and right; and the pixel located below the target pixel with three pixels in between and the two pixels adjacent on each of its left and right.
[0200]
Here, the field tap means, for example, a pixel in which none of the two adjacent pixels above and below is a tap (in this case, a prediction tap or a class tap). In the tap structure of the pattern A in FIG. 18A, all the taps are field taps because the adjacent pixels above and below the taps are not taps.
[0201]
FIG. 18B shows the tap structure of pattern B. The tap structure of pattern B consists, for example, of a total of 25 pixels: the target pixel; the two pixels adjacent on each of the left and right of the target pixel; the two pixels on each of the left and right of the pixel located above the target pixel with one pixel in between; the one pixel on each of the left and right of the pixel located above the target pixel with three pixels in between; the two pixels on each of the left and right of the pixel located below the target pixel with one pixel in between; the one pixel on each of the left and right of the pixel located below the target pixel with three pixels in between; the four pixels adjacent above the target pixel; and the four pixels adjacent below the target pixel.
[0202]
Here, a frame tap means a tap for which at least one of the pixels adjacent above or below it is also a tap. In the tap structure of pattern B in FIG. 18B, a total of 9 pixels, namely the target pixel and the four pixels adjacent above and the four pixels adjacent below it, are frame taps, and the remaining 16 pixels are field taps.
[0203]
FIG. 18C shows the tap structure of pattern C. The tap structure of pattern C consists, for example, of a total of 25 pixels: the target pixel; the two pixels adjacent on each of the left and right of the target pixel; the two pixels on each of the left and right of the pixel located above the target pixel with one pixel in between; the two pixels on each of the left and right of the pixel located below the target pixel with one pixel in between; the four pixels adjacent above and the four pixels adjacent below the target pixel; the one pixel on each of the left and right of the pixel adjacent above the target pixel; and the one pixel on each of the left and right of the pixel adjacent below the target pixel.
[0204]
In the tap structure of pattern C, a total of 19 pixels are frame taps: the pixel of interest; the four pixels adjacent above and the four pixels adjacent below the pixel of interest; the pixel adjacent to the left of the pixel of interest and the two pixels above and the two pixels below that pixel; and the pixel adjacent to the right of the pixel of interest and the two pixels above and the two pixels below that pixel. The remaining 6 pixels are field taps.
[0205]
FIG. 18D shows a tap structure of the pattern D. The tap structure of the pattern D includes, for example, a total of 25 pixels, which are adjacent to each other with the pixel of interest at the center and are 5 × 5 pixels in horizontal × vertical.
[0206]
In the tap structure of the pattern D, all the taps are frame taps because at least one pixel above or below is a tap.
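
The four tap structures, and the distinction between frame taps and field taps, can be checked with the sketch below. The coordinates are one reading of the textual descriptions above, not a reproduction of FIG. 18 or FIG. 17, and are consistent with the stated counts (25 taps per pattern; 0, 9, 19, and 25 frame taps for patterns A to D). The names `PATTERNS`, `TAP_STRUCTURE_TABLE`, `count_frame_taps`, and `extract_taps`, the `(dy, dx)` offset convention, and the clamping of coordinates at the image border are assumptions for illustration.

```python
def rows(dys, dxs):
    """Set of (dy, dx) offsets relative to the target pixel."""
    return {(dy, dx) for dy in dys for dx in dxs}


# Target pixel plus the four pixels above and the four pixels below it.
CENTER_COLUMN = rows(range(-4, 5), [0])

PATTERNS = {
    # A: every other line, five pixels wide (field taps only).
    "A": rows([-4, -2, 0, 2, 4], range(-2, 3)),
    # B: center column plus side pixels on every other line.
    "B": CENTER_COLUMN | rows([-2, 0, 2], [-2, -1, 1, 2]) | rows([-4, 4], [-1, 1]),
    # C: center column, two contiguous neighboring columns, sparse outer columns.
    "C": CENTER_COLUMN | rows(range(-2, 3), [-1, 1]) | rows([-2, 0, 2], [-2, 2]),
    # D: contiguous 5 x 5 block (frame taps only).
    "D": rows(range(-2, 3), range(-2, 3)),
}

# (DCT type, block type) -> pattern name, following the description of the table.
TAP_STRUCTURE_TABLE = {
    ("field", "field"): "A",
    ("field", "frame"): "B",
    ("frame", "field"): "C",
    ("frame", "frame"): "D",
}


def count_frame_taps(pattern):
    """A tap is a frame tap when the pixel directly above or below is also a tap."""
    return sum(1 for dy, dx in pattern
               if (dy - 1, dx) in pattern or (dy + 1, dx) in pattern)


def extract_taps(image, y, x, pattern):
    """Pixel values at the tap positions around (y, x), in sorted offset order.
    Coordinates outside the image are clamped to the border (an assumption)."""
    h, w = len(image), len(image[0])
    return [image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy, dx in sorted(pattern)]
```

With these definitions, a prediction tap for given mismatch information would be obtained as `extract_taps(image, y, x, PATTERNS[TAP_STRUCTURE_TABLE[(dct_type, block_type)]])`.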
[0207]
Based on the mismatch information, the tap extraction unit 51 (FIG. 9) configures a prediction tap having a tap structure of any one of the patterns A to D shown in FIG.
[0208]
Similarly to the tap extraction unit 51, the tap extraction unit 52 also configures a class tap having a tap structure based on mismatch information.
[0209]
Here, in the case described above, only the positions of the pixels of the decoded image data extracted as the prediction tap are changed based on the mismatch information, and the number of pixels constituting the prediction tap remains 25. However, the tap extraction unit 51 can also change the number of pixels of the decoded image data constituting the prediction tap based on the mismatch information.
[0210]
In addition, in the MPEG decoder 116 of the preprocessing unit 31, the quantized DCT coefficients in the encoded data are decoded using the motion vector, DCT type, quantization step, picture type, and other decoding control information included in the encoded data, and the tap extraction unit 51 can also include such decoding control information in the prediction tap. In this case, it is also possible to change, based on the mismatch information, the decoding control information used as the prediction tap. Further, the tap extraction unit 51 can include in the prediction tap the quantized DCT coefficients included in the encoded data and the DCT coefficients obtained by inverse quantization of the quantized DCT coefficients.
[0211]
In the tap extraction unit 52, class taps can be configured in the same manner as in the tap extraction unit 51.
[0212]
The prediction tap obtained by the tap extraction unit 51 is supplied to the prediction unit 54, and the class tap obtained by the tap extraction unit 52 is supplied to the class classification unit 53.
[0213]
In addition to the class tap, mismatch information about the attention data is also supplied to the class classification unit 53. As described above, the class classification unit 53 classifies the attention data based on the class tap and the mismatch information.
[0214]
That is, for example, the class classification unit 53 performs class classification by performing compression processing such as ADRC (Adaptive Dynamic Range Coding) processing on the class tap for the data of interest, and obtains a class code.
[0215]
Here, in class classification using ADRC processing, data (here, pixel values) constituting a class tap is subjected to ADRC processing, and for example, an ADRC code obtained as a result is used as a class code.
[0216]
In K-bit ADRC, for example, the maximum value MAX and the minimum value MIN of the data constituting the class tap are detected, DR = MAX - MIN is taken as the local dynamic range of the set, and the data constituting the class tap is requantized to K bits based on this dynamic range DR. That is, the minimum value MIN is subtracted from each piece of data constituting the class tap, and the subtracted value is divided (quantized) by DR / 2^K. A bit string obtained by arranging the K-bit data constituting the class tap in a predetermined order is output as the ADRC code. Therefore, for example, when a class tap is subjected to 1-bit ADRC processing, each piece of data constituting the class tap is divided by the average value of the maximum value MAX and the minimum value MIN after the minimum value MIN is subtracted, whereby each piece of data is made 1 bit (binarized). Then, a bit string in which the 1-bit data is arranged in a predetermined order is output as the ADRC code.
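
The requantization by DR / 2^K can be sketched as follows. This is a minimal illustration that assumes integer pixel values; the function name `adrc_code`, the clamping of the maximum value to the top level 2^K - 1, and the handling of a flat class tap (DR = 0) are assumptions that the text does not specify.

```python
def adrc_code(class_tap, k=1):
    """K-bit ADRC code of a class tap (sketch; clamping and DR == 0
    handling are assumptions)."""
    mx, mn = max(class_tap), min(class_tap)
    dr = mx - mn                      # local dynamic range DR = MAX - MIN
    levels = 1 << k                   # 2^K quantization levels
    code = 0
    for value in class_tap:
        if dr == 0:
            q = 0                     # flat tap: every value maps to level 0
        else:
            # (value - MIN) / (DR / 2^K), truncated and clamped to 2^K - 1
            q = min(int((value - mn) * levels // dr), levels - 1)
        code = (code << k) | q        # arrange the K-bit values in tap order
    return code
```

For the class tap `[10, 20, 30, 40]` with K = 1, the requantized values are 0, 0, 1, 1, so the ADRC code is the bit string 0011.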
[0217]
Note that the class classification unit 53 can also output, for example, the level distribution pattern of the data constituting the class tap as it is as a class code. However, in this case, if the class tap is composed of N pieces of data and K bits are assigned to each piece of data, the number of class codes output by the class classification unit 53 is (2^N)^K, an enormous number that grows exponentially with the number of bits K of the data.
[0218]
Therefore, the class classification unit 53 preferably performs class classification by compressing the information amount of the class tap by the above-described ADRC processing or vector quantization.
[0219]
Here, a class code obtained by classifying using a class tap is hereinafter referred to as a class tap code as appropriate.
[0220]
In addition to obtaining the class tap code as described above, the class classification unit 53 performs class classification using, for example, the set of the DCT type and the block type as mismatch information for the data of interest, thereby obtaining a 2-bit class code.
[0221]
That is, if a class code obtained by class classification using mismatch information is referred to as a mismatch code, the class classification unit 53 sets the 2-bit mismatch code to, for example, “00” when both the DCT type and the block type as mismatch information represent the field DCT mode. The class classification unit 53 sets the 2-bit mismatch code to, for example, “01” when the DCT type and the block type represent the field DCT mode and the frame DCT mode, respectively. The class classification unit 53 sets the 2-bit mismatch code to, for example, “10” when the DCT type and the block type represent the frame DCT mode and the field DCT mode, respectively. Further, the class classification unit 53 sets the 2-bit mismatch code to, for example, “11” when the DCT type and the block type both represent the frame DCT mode.
[0222]
Thereafter, the class classification unit 53 adds, for example, the mismatch code obtained for the data of interest as the upper bits of the class tap code obtained for the data of interest, and outputs the resulting code, composed of the class tap code and the mismatch code, as the final class code for the data of interest.
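
The composition of the final class code can be sketched as follows. The names `MISMATCH_CODE` and `final_class_code` and the explicit `class_tap_bits` parameter (the bit width of the class tap code) are assumptions for illustration; the 2-bit values follow the example assignment given above.

```python
# 2-bit mismatch code for each (DCT type, block type) set.
MISMATCH_CODE = {
    ("field", "field"): 0b00,
    ("field", "frame"): 0b01,
    ("frame", "field"): 0b10,
    ("frame", "frame"): 0b11,
}


def final_class_code(class_tap_code, class_tap_bits, dct_type, block_type):
    """Place the mismatch code in the upper bits of the class tap code."""
    return (MISMATCH_CODE[(dct_type, block_type)] << class_tap_bits) | class_tap_code
```

For a 4-bit class tap code 0011 and mismatch code 10, the final class code is the 6-bit string 100011.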
[0223]
The class classification unit 53 can also perform class classification based on, for example, decoding control information other than the DCT type.
[0224]
The class code output from the class classification unit 53 is supplied to the coefficient memory 41. In the coefficient memory 41, the tap coefficient corresponding to the class code is read and supplied to the prediction unit 54.
[0225]
The prediction unit 54 performs the linear prediction calculation shown in Expression (1) using the prediction tap output from the tap extraction unit 51 and the tap coefficient acquired from the coefficient memory 41. Accordingly, the prediction unit 54 obtains attention data (predicted value thereof), that is, high-quality image data, and supplies it to the post-processing unit 33.
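
The lookup in the coefficient memory 41 and the calculation in the prediction unit 54 can be sketched as follows. The sketch assumes that Expression (1), which is defined elsewhere in this description, is a first-order linear combination of the prediction tap values and the tap coefficients of the selected class; the function name `predict` and the dictionary used to model the coefficient memory are hypothetical.

```python
def predict(prediction_tap, coefficient_memory, class_code):
    """Predicted value as the sum of tap coefficient times tap value
    (assumed form of the linear prediction calculation)."""
    coefficients = coefficient_memory[class_code]  # tap coefficients of the class
    if len(coefficients) != len(prediction_tap):
        raise ValueError("one tap coefficient is required per prediction tap")
    return sum(w * x for w, x in zip(coefficients, prediction_tap))
```

Because the class code includes the mismatch code, blocks whose DCT type was set inappropriately are predicted with a different set of tap coefficients from blocks whose DCT type is appropriate.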
[0226]
As described above, the post-processing unit 33 outputs the output of the class classification adaptive processing unit 32 (prediction unit 54 thereof), that is, the high-quality image data as it is.
[0227]
In the above case, 1-bit information indicating whether the DCT type of the target block is correct or incorrect, or a set of the DCT type and the block type of the target block, is used as the mismatch information. However, as the mismatch information, it is also possible to adopt, for example, an evaluation value indicating the degree to which the DCT type of the block of interest is correct.
[0228]
As an evaluation value representing the correctness of the DCT type of the target block, it is possible, for example, to adopt the magnitude of the motion vector of the target block (target macroblock) when the DCT type of the target block is the field DCT mode, and to adopt the value obtained by subtracting the magnitude of the motion vector of the block of interest from the maximum magnitude that a motion vector can take when the DCT type of the block of interest is the frame DCT mode. In this case, the evaluation value becomes larger as the motion vector of the block of interest becomes larger when the DCT type is the field DCT mode, and becomes larger as the motion vector of the block of interest becomes smaller when the DCT type is the frame DCT mode.
[0229]
In this case, for example, the tap extraction unit 51 or 52 can compare the evaluation value as mismatch information with one or more threshold values and change the tap structure of the prediction tap or the class tap based on the comparison result. In the class classification unit 53, for example, the evaluation value as mismatch information can be quantized, and the quantized value can be used as the mismatch code.
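
The evaluation value and its quantization into a mismatch code can be sketched as follows. The function names `evaluation_value` and `quantize_evaluation`, the uniform quantization, and the default of 2 bits are assumptions for illustration; the text does not specify how the quantization is performed.

```python
def evaluation_value(dct_type, mv_magnitude, mv_max):
    """Degree to which the DCT type is correct: large motion supports the
    field DCT mode, small motion supports the frame DCT mode."""
    if dct_type == "field":
        return mv_magnitude
    return mv_max - mv_magnitude


def quantize_evaluation(value, mv_max, bits=2):
    """Uniformly quantize an evaluation value in [0, mv_max] to `bits` bits
    (assumed quantization rule; the top value is clamped to the last level)."""
    levels = 1 << bits
    return min(int(value * levels / mv_max), levels - 1)
```

For example, with a maximum magnitude of 16, a block in the frame DCT mode whose motion vector has magnitude 12 has an evaluation value of 4, which this rule quantizes to the 2-bit value 1.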
[0230]
Further, in the above-described case, whether the target block is a motion block or a static block is determined based on the motion vector of the target block, or on the motion vector of the pre-corresponding block or the post-corresponding block or on whether that block is a static block or a motion block. However, this determination can also be made based on other information, for example, the motion vectors of blocks around the target block, the pre-corresponding block, or the post-corresponding block.
[0231]
Next, FIG. 19 shows a detailed configuration example of the learning device of FIG. 11 when learning the tap coefficients to be stored in the coefficient memory 41 of FIG.
[0232]
In the embodiment of FIG. 19, high-quality image data (learning image data) is stored in the learning data storage unit 11 as learning data.
[0233]
In the embodiment of FIG. 19, the encoding unit 12 includes an MPEG encoder 131. The MPEG encoder 131 reads the learning image data from the learning data storage unit 11, encodes it using the MPEG2 method, and outputs the resulting encoded data.
[0234]
That is, FIG. 20 shows a configuration example of the MPEG encoder 131 of FIG.
[0235]
The image data for learning is supplied to the motion vector detection unit 141 and the calculation unit 143. The motion vector detection unit 141 detects the motion vector of the learning image data, for example, by performing block matching on the learning image data, and supplies the motion vector to the motion compensation unit 142.
[0236]
In addition, the calculation unit 143 subtracts, as necessary, the predicted image supplied from the motion compensation unit 142 from the learning image data (original image), and supplies the resulting residual image to the DCT transform unit 144. The DCT transform unit 144 DCT-transforms the residual image from the calculation unit 143 and supplies the resulting DCT coefficients to the quantization unit 145. The quantization unit 145 quantizes the DCT coefficients supplied from the DCT transform unit 144 with a predetermined quantization step to obtain quantized DCT coefficients, and supplies them to the VLC unit 146 and the inverse quantization unit 147.
[0237]
The VLC unit 146 variable-length-encodes the quantized DCT coefficients supplied from the quantization unit 145 into VLC codes, and further multiplexes the necessary decoding control information (for example, the motion vector detected by the motion vector detection unit 141 and the quantization step used in the quantization unit 145) to obtain and output encoded data.
[0238]
On the other hand, the inverse quantization unit 147 inversely quantizes the quantized DCT coefficients output from the quantization unit 145 to obtain DCT coefficients, and supplies them to the inverse DCT transform unit 148. The inverse DCT transform unit 148 decodes the DCT coefficients from the inverse quantization unit 147 into a residual image by performing an inverse DCT transform, and supplies the residual image to the calculation unit 149.
[0239]
The calculation unit 149 is supplied with the residual image from the inverse DCT transform unit 148 and also receives, from the motion compensation unit 142, the same predicted image that the calculation unit 143 used to obtain the residual image. The calculation unit 149 decodes the original image (local decoding) by adding the residual image and the predicted image. This decoded image is supplied to the memory 150 and stored as a reference image.
[0240]
Then, the motion compensation unit 142 reads the reference image stored in the memory 150 and performs motion compensation according to the motion vector supplied from the motion vector detection unit 141, thereby generating a predicted image. This predicted image is supplied from the motion compensation unit 142 to the calculation units 143 and 149.
[0241]
As described above, the calculation unit 143 obtains the residual image using the predicted image from the motion compensation unit 142, and the calculation unit 149 decodes the original image using the predicted image from the motion compensation unit 142.
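The local decoding loop of the MPEG encoder 131 can be illustrated with the following simplified sketch. It makes several assumptions not found in the specification: blocks are one-dimensional with 8 samples rather than 8 × 8 pixels, the predicted block is given directly (motion vector detection and motion compensation are omitted), quantization is uniform with a single step, and VLC coding is omitted. All function names are hypothetical.

```python
import math

N = 8  # samples per (one-dimensional) block in this sketch


def dct(x):
    """Orthonormal 1-D DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        total = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        out.append(scale * total)
    return out


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    out = []
    for n in range(N):
        total = c[0] * math.sqrt(1.0 / N)
        total += sum(
            math.sqrt(2.0 / N) * c[k] * math.cos(math.pi * (n + 0.5) * k / N)
            for k in range(1, N)
        )
        out.append(total)
    return out


def encode_block(original, predicted, q_step):
    """Return (quantized coefficients, locally decoded reference block)."""
    residual = [o - p for o, p in zip(original, predicted)]   # calculation unit 143
    coeffs = dct(residual)                                    # DCT transform unit 144
    quantized = [round(c / q_step) for c in coeffs]           # quantization unit 145
    dequantized = [q * q_step for q in quantized]             # inverse quantization unit 147
    decoded_residual = idct(dequantized)                      # inverse DCT transform unit 148
    reference = [r + p for r, p in zip(decoded_residual, predicted)]  # calculation unit 149
    return quantized, reference


original = [10, 12, 15, 20, 26, 30, 31, 29]
predicted = [9, 11, 16, 19, 25, 31, 30, 28]
quantized, reference = encode_block(original, predicted, q_step=1.0)
print(quantized)
print([round(v, 2) for v in reference])
```

The point of the sketch is that the reference block is rebuilt from the quantized coefficients, not from the original, so the encoder predicts from the same image a decoder would reconstruct.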
[0242]
Returning to FIG. 19, the encoded data output from the MPEG encoder 131 is supplied to the decoding control information extraction unit 71.
[0243]
The decoding control information extraction unit 71 includes an inverse VLC unit 132. The inverse VLC unit 132 performs the same processing as the inverse VLC unit 111 in FIG. 13, thereby extracting a DCT type, a picture type, a macroblock type, and a motion vector from the encoded data as a plurality of decoding control information, and supplies them to the determination unit 72.
[0244]
The determination unit 72 includes a field / frame determination unit 133, an intra / non-intra determination unit 134, a static motion determination unit 135, and a mismatch information generation unit 136. Using the DCT type, picture type, macroblock type, and motion vector supplied as a plurality of decoding control information from the decoding control information extraction unit 71, the field / frame determination unit 133, the intra / non-intra determination unit 134, the static motion determination unit 135, and the mismatch information generation unit 136 perform the same processing as the field / frame determination unit 112, the intra / non-intra determination unit 113, the static motion determination unit 114, and the mismatch information generation unit 115 of FIG., respectively. As a result, mismatch information is generated for the teacher data that is the attention teacher data in the adaptive learning unit 60. This mismatch information is supplied from the mismatch information generation unit 136 to the adaptive learning unit 60.
[0245]
In the embodiment of FIG. 19, the reverse post-processing unit 61A reads the learning image data from the learning data storage unit 11 and outputs it as it is to the adaptive learning unit 60 as teacher data. In the adaptive learning unit 60 (FIG. 11), the teacher data from the inverse post-processing unit 61A is stored in the teacher data storage unit 62.
[0246]
The encoding unit 63A is composed of an MPEG encoder 137. As in the case of the MPEG encoder 131, the MPEG encoder 137 reads learning image data from the learning data storage unit 11, encodes it by the MPEG2 method, and outputs the resulting encoded data to the preprocessing unit 63B.
[0247]
The preprocessing unit 63B includes an MPEG decoder 138 configured in the same manner as the MPEG decoder 116 in FIG. 14. The MPEG decoder 138 decodes the encoded data from the MPEG encoder 137 using the MPEG2 method and outputs the resulting decoded image data to the adaptive learning unit 60 as student data. In the adaptive learning unit 60 (FIG. 11), the student data from the MPEG decoder 138 is stored in the student data storage unit 64.
[0248]
Then, using the teacher data and the student data, the adaptive learning unit 60 performs learning to obtain tap coefficients that statistically minimize the prediction error of the predicted values of the teacher data obtained by performing the linear prediction calculation of Expression (1) on the prediction taps extracted from the student data.
[0249]
That is, in the adaptive learning unit 60 (FIG. 11), the tap extraction unit 65 takes, as the attention teacher data, teacher data stored in the teacher data storage unit 62 that has not yet been the attention teacher data, configures a prediction tap for the attention teacher data from the student data stored in the student data storage unit 64, and supplies the prediction tap to the adding unit 68. Further, the tap extraction unit 66 configures a class tap for the attention teacher data from the student data stored in the student data storage unit 64, and supplies the class tap to the class classification unit 67.
[0250]
Here, mismatch information is supplied to the tap extraction units 65 and 66. Based on the mismatch information, each of the tap extraction units 65 and 66 configures, for the attention teacher data, a prediction tap or a class tap having the same tap structure as that configured by the tap extraction unit 51 or 52 (FIG. 9) of the class classification adaptive processing unit 32 described with reference to FIG. 13.
[0251]
Therefore, for example, in the case where the tap extraction unit 51 or 52 configures the prediction tap or the class tap also using the decoding control information included in the encoded data, as described with reference to FIG., the tap extraction unit 65 or 66 (FIG. 11) in the learning device likewise also uses the decoding control information to configure the prediction tap or the class tap, respectively.
[0252]
Thereafter, based on the class tap and the mismatch information for the attention teacher data, the class classification unit 67 (FIG. 11) performs the same class classification as that performed by the class classification unit 53 described with reference to FIG., and outputs the class code corresponding to the resulting class to the adding unit 68.
[0253]
The adding unit 68 reads the attention teacher data from the teacher data storage unit 62, and calculates the components of the matrix A and the vector v in Expression (8) using the attention teacher data and the prediction tap from the tap extraction unit 65. Furthermore, among the components of the matrix A and the vector v that have already been obtained, the adding unit 68 adds the components obtained from the attention teacher data and the prediction tap to those corresponding to the class code from the class classification unit 67.
[0254]
When the above processing has been performed with all the teacher data stored in the teacher data storage unit 62 as attention teacher data, the adding unit 68 supplies the normal equation of Expression (8), composed of the components of the matrix A and the vector v obtained for each class, to the tap coefficient calculation unit 69. The tap coefficient calculation unit 69 solves the normal equation for each class, thereby obtaining and outputting the tap coefficients for each class.
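The per-class accumulation in the adding unit 68 and the solution in the tap coefficient calculation unit 69 can be sketched as below. The sketch assumes that Expression (8) has the usual least-squares form A w = v, where A sums the products of prediction-tap elements and v sums the products of prediction-tap elements with the teacher value; the function names and the sample layout are hypothetical.

```python
from collections import defaultdict


def accumulate(samples, n_taps):
    """Add up matrix A and vector v per class.

    samples: iterable of (class_code, prediction_tap, teacher_value).
    """
    A = defaultdict(lambda: [[0.0] * n_taps for _ in range(n_taps)])
    v = defaultdict(lambda: [0.0] * n_taps)
    for code, tap, y in samples:
        for i in range(n_taps):
            v[code][i] += tap[i] * y
            for j in range(n_taps):
                A[code][i][j] += tap[i] * tap[j]
    return A, v


def solve(A, v):
    """Solve A w = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular normal equation: too few samples in this class")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def learn_tap_coefficients(samples, n_taps):
    """Return {class_code: tap coefficients} learned from the samples."""
    A, v = accumulate(samples, n_taps)
    return {code: solve(A[code], v[code]) for code in A}


samples = [
    (0, [1.0, 0.0], 2.0), (0, [0.0, 1.0], -1.0), (0, [1.0, 1.0], 1.0),
    (1, [2.0, 0.0], 1.0), (1, [0.0, 2.0], 1.0), (1, [1.0, 3.0], 2.0),
]
print(learn_tap_coefficients(samples, n_taps=2))
```

A class that receives too few samples yields a singular matrix; the sketch raises an error in that case, whereas a practical learning device would need some fallback such as default coefficients.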
[0255]
In the learning device of FIG. 19, for example, the MPEG encoder 137 of the encoding unit 63A can encode the learning image data after its number of pixels has been thinned out to 1 / N. In that case, the adaptive learning unit 60 can obtain tap coefficients that convert MPEG decoded image data into image data with high image quality and N times the number of pixels (higher resolution).
[0256]
Next, FIG. 21 shows a second detailed configuration example of the decoding device of FIG. 6 when the encoded data is obtained by encoding image data by the MPEG2 system. In the figure, portions corresponding to those in FIG. 13 are denoted by the same reference numerals, and description thereof will be omitted below as appropriate.
[0257]
In the embodiment of FIG. 21, the preprocessing unit 31 includes an inverse VLC unit 161, an inverse quantization unit 162, a calculation unit 163, an MPEG decoder 164, a memory 165, a motion compensation unit 166, and a DCT conversion unit 167.
[0258]
In the preprocessing unit 31, the encoded data is supplied to the inverse VLC unit 161 and the MPEG decoder 164.
[0259]
The inverse VLC unit 161 separates the VLC code of the quantized DCT coefficient from the encoded data, and also separates the quantization step, the motion vector, and other decoding control information. Then, the inverse VLC unit 161 performs inverse VLC processing on the VLC code of the quantized DCT coefficient to decode the quantized DCT coefficient, and supplies the quantized DCT coefficient to the inverse quantization unit 162. Further, the inverse VLC unit 161 supplies the quantization step to the inverse quantization unit 162 and the motion vector to the motion compensation unit 166, respectively.
[0260]
The inverse quantization unit 162 inversely quantizes the quantized DCT coefficients supplied from the inverse VLC unit 161 using the quantization step likewise supplied from the inverse VLC unit 161, and supplies the resulting DCT coefficients of the 8 × 8 pixel block to the calculation unit 163.
[0261]
On the other hand, in the MPEG decoder 164, the encoded data is decoded by the MPEG method, and decoded image data is output. Of the decoded images output from the MPEG decoder 164, I and P pictures that can be used as reference images are supplied to the memory 165 and stored therein.
[0262]
Then, the motion compensation unit 166 reads the decoded image stored in the memory 165 as a reference image and performs motion compensation on the reference image according to the motion vector supplied from the inverse VLC unit 161, thereby generating a predicted image for the block supplied from the inverse quantization unit 162 to the calculation unit 163, and supplies the predicted image to the DCT conversion unit 167. The DCT conversion unit 167 performs DCT conversion on the predicted image supplied from the motion compensation unit 166 and supplies the resulting DCT coefficients to the calculation unit 163.
[0263]
The calculation unit 163 adds, as necessary, each DCT coefficient of the block supplied from the inverse quantization unit 162 and the corresponding DCT coefficient supplied from the DCT conversion unit 167, thereby obtaining the DCT coefficients that result from DCT-transforming the pixel values of the block.
[0264]
That is, when the block supplied from the inverse quantization unit 162 is intra-coded, the DCT coefficient of the block supplied from the inverse quantization unit 162 is obtained by DCT transforming the original pixel value. Therefore, the calculation unit 163 outputs the DCT coefficient of the block supplied from the inverse quantization unit 162 as it is.
[0265]
Further, when the block supplied from the inverse quantization unit 162 is non-intra coded, the DCT coefficients of that block are obtained by DCT-transforming the difference values (residual image) between the original pixel values and the predicted image. The calculation unit 163 therefore adds each DCT coefficient of the block supplied from the inverse quantization unit 162 and the corresponding DCT coefficient, supplied from the DCT conversion unit 167, obtained by DCT-transforming the predicted image, thereby obtaining and outputting the DCT coefficients that result from DCT-transforming the original pixel values.
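The addition in the calculation unit 163 relies on the linearity of the DCT: the DCT of the residual plus the DCT of the predicted image equals the DCT of the original pixel values. The following sketch checks this on a short one-dimensional block; the unnormalized one-dimensional DCT-II and the function names are assumptions chosen for brevity, and quantization error is ignored.

```python
import math


def dct(x):
    """Unnormalized 1-D DCT-II (any linear transform would behave the same)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def restore_coefficients(block_coeffs, predicted_pixels, intra):
    """Return DCT coefficients of the pixel values of a block.

    Intra-coded blocks are passed through unchanged; for non-intra blocks
    the DCT of the predicted image is added coefficient by coefficient.
    """
    if intra:
        return list(block_coeffs)
    predicted_coeffs = dct(predicted_pixels)                          # DCT conversion unit 167
    return [r + p for r, p in zip(block_coeffs, predicted_coeffs)]    # calculation unit 163


original = [10.0, 12.0, 15.0, 20.0]
predicted = [9.0, 11.0, 16.0, 19.0]
residual = [o - p for o, p in zip(original, predicted)]

restored = restore_coefficients(dct(residual), predicted, intra=False)
print(all(abs(a - b) < 1e-9 for a, b in zip(restored, dct(original))))
```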
[0266]
The DCT coefficient of the block output from the calculation unit 163 is supplied to the class classification adaptive processing unit 32 as preprocessing data.
[0267]
In the embodiment of FIG. 21, the class classification adaptation processing unit 32 performs the class classification adaptation processing on the DCT coefficients output from the preprocessing unit 31, whereby high-quality image data (predicted values thereof) is obtained as adaptive processing data.
[0268]
That is, in the class classification adaptive processing unit 32 (FIG. 9), the DCT coefficients output from the preprocessing unit 31 are supplied to the tap extraction units 51 and 52.
[0269]
The tap extraction unit 51 takes, as the attention data, a pixel of the high-quality image data that has not yet been the attention data, and extracts, as a prediction tap, some of the DCT coefficients serving as preprocessing data that are used to predict the attention data. The tap extraction unit 52 likewise extracts, as a class tap, some of the DCT coefficients serving as preprocessing data that are used to classify the attention data.
[0270]
The tap extraction unit 51 or 52 changes the tap structure of the prediction tap or the class tap based on the mismatch information about the data of interest.
[0271]
That is, for example, the tap extraction unit 51 configures the prediction tap by extracting, according to the mismatch information, all the DCT coefficients of the block containing the attention data (target block) and the necessary DCT coefficients in the blocks adjacent to the target block above, below, to the left, and to the right. The tap extraction unit 52 also forms a class tap in the same manner as the tap extraction unit 51.
[0272]
The prediction tap obtained by the tap extraction unit 51 is supplied to the prediction unit 54, and the class tap obtained by the tap extraction unit 52 is supplied to the class classification unit 53.
[0273]
Based on the class tap and mismatch information about the data of interest, the class classification unit 53 classifies the data of interest in the same manner as described with reference to FIG. 13 and supplies the class code of the data of interest to the coefficient memory 41. In the coefficient memory 41, the tap coefficient corresponding to the class code for the data of interest is read and supplied to the prediction unit 54.
[0274]
The prediction unit 54 acquires the tap coefficient supplied from the coefficient memory 41, and performs the linear prediction calculation shown in Expression (1) using the tap coefficient and the prediction tap output from the tap extraction unit 51. Accordingly, the prediction unit 54 obtains attention data (predicted value thereof), that is, high-quality image data, and supplies it to the post-processing unit 33.
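The coefficient lookup and linear prediction performed by the coefficient memory 41 and the prediction unit 54 can be sketched as follows, assuming that Expression (1) is the product-sum y = w1·x1 + w2·x2 + ... + wN·xN of the tap coefficients and the prediction-tap elements. The dictionary used as a coefficient memory, the coefficient values, and the function name are hypothetical.

```python
def predict(prediction_tap, class_code, coefficient_memory):
    """Linear prediction: product-sum of the tap coefficients of the class
    selected by `class_code` and the elements of the prediction tap."""
    coeffs = coefficient_memory[class_code]
    if len(coeffs) != len(prediction_tap):
        raise ValueError("tap coefficients and prediction tap differ in length")
    return sum(w * x for w, x in zip(coeffs, prediction_tap))


# Hypothetical coefficient memory holding one coefficient set per class code.
coefficient_memory = {
    0: [0.5, 0.25, 0.25],
    1: [0.0, 1.0, 0.0],
}
print(predict([1.0, 2.0, 3.0], 0, coefficient_memory))
```

Because the class code selects the coefficient set, the same prediction tap can yield different predicted values for different classes, which is how the mismatch information influences the decoded result.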
[0275]
In the post-processing unit 33, the high-quality image data from the class classification adaptation processing unit 32 is output as it is.
[0276]
Therefore, in the embodiment of FIG. 21, the class classification adaptation processing unit 32 converts the DCT coefficients into high-quality image data.
[0277]
Next, FIG. 22 shows a detailed configuration example of the learning device in FIG. 11 when learning tap coefficients to be stored in the coefficient memory 41 of the decoding device in FIG. In the figure, portions corresponding to those in FIG. 19 are denoted by the same reference numerals, and description thereof will be omitted below as appropriate.
[0278]
In the embodiment of FIG. 22, the preprocessing unit 63B includes an inverse VLC unit 171, an inverse quantization unit 172, a calculation unit 173, an MPEG decoder 174, a memory 175, a motion compensation unit 176, and a DCT conversion unit 177. The inverse VLC unit 171 through DCT conversion unit 177 are configured in the same manner as the inverse VLC unit 161 through DCT conversion unit 167 in FIG.
[0279]
Therefore, in the preprocessing unit 63B, the same processing as in the preprocessing unit 31 in FIG. 21 is performed on the encoded data output from the MPEG encoder 137 of the encoding unit 63A, and the DCT coefficients obtained thereby are supplied to the adaptive learning unit 60 as student data.
[0280]
In the adaptive learning unit 60 (FIG. 11), the DCT coefficients supplied from the preprocessing unit 63B are stored as student data in the student data storage unit 64. Then, in the same manner as described with reference to FIG., the teacher data and the student data are used to perform learning for obtaining tap coefficients that statistically minimize the prediction error of the predicted values of the teacher data obtained by performing the linear prediction calculation of Expression (1) on the prediction taps extracted from the student data. As a result, the tap coefficients for each class for converting the DCT coefficients serving as the student data into high-quality image data are obtained.
[0281]
However, in the embodiment of FIG. 22, the adaptive learning unit 60 (FIG. 11) configures, based on the mismatch information, a prediction tap or a class tap having the same tap structure as that configured by the tap extraction unit 51 or 52 in the class classification adaptation processing unit 32 (FIG. 9) in FIG. Furthermore, the class classification unit 67 in the adaptive learning unit 60 (FIG. 11) in FIG. 22 performs the same class classification as the class classification unit 53 in the class classification adaptation processing unit 32 (FIG. 9) in FIG. 21.
[0282]
Next, FIG. 23 shows a third detailed configuration example of the decoding device of FIG. 6 when the encoded data is obtained by encoding image data by the MPEG2 system. In the figure, portions corresponding to those in FIG. 21 are denoted by the same reference numerals, and description thereof will be omitted below as appropriate.
[0283]
The decoding apparatus in FIG. 23 is configured in the same manner as in FIG. 21 except that the post-processing unit 33 is configured by an inverse DCT transform unit 181.
[0284]
In the embodiment of FIG. 23, the class classification adaptation processing unit 32 performs class classification adaptation processing on the DCT coefficients output from the preprocessing unit 31. As a result, DCT coefficients (predicted values thereof) from which high-quality image data can be obtained by inverse DCT transformation (hereinafter referred to as high-quality DCT coefficients as appropriate) are obtained as adaptive processing data.
[0285]
That is, in the class classification adaptive processing unit 32 (FIG. 9), the DCT coefficients output as preprocessing data from the preprocessing unit 31 are supplied to the tap extraction units 51 and 52.
[0286]
The tap extraction unit 51 takes, as the attention data, a high-quality DCT coefficient that has not yet been the attention data, and extracts, as a prediction tap, some of the DCT coefficients serving as preprocessing data that are used to predict the attention data. That is, the tap extraction unit 51 configures, for example, a prediction tap having the same tap structure as that in FIG. 21 for the attention data based on the mismatch information. Based on the mismatch information, the tap extraction unit 52 also configures a class tap having the same tap structure as that in FIG.
[0287]
The prediction tap obtained by the tap extraction unit 51 is supplied to the prediction unit 54, and the class tap obtained by the tap extraction unit 52 is supplied to the class classification unit 53.
[0288]
Based on the class tap and mismatch information about the data of interest, the class classification unit 53 classifies the data of interest in the same manner as in FIG. 21, and supplies the class code for the data of interest to the coefficient memory 41. In the coefficient memory 41, the tap coefficient corresponding to the class code for the data of interest is read and supplied to the prediction unit 54.
[0289]
The prediction unit 54 acquires the tap coefficient output from the coefficient memory 41, and performs the linear prediction calculation shown in Expression (1) using the tap coefficient and the prediction tap output from the tap extraction unit 51. As a result, the prediction unit 54 obtains attention data (predicted value thereof), that is, a high-quality DCT coefficient, and supplies it to the post-processing unit 33.
[0290]
In the post-processing unit 33, the inverse DCT conversion unit 181 performs inverse DCT conversion, in units of blocks, on the high-quality DCT coefficients output from the class classification adaptation processing unit 32, whereby high-quality image data is obtained and output.
[0291]
Next, FIG. 24 illustrates a detailed configuration example of the learning device in FIG. 11 when learning tap coefficients to be stored in the coefficient memory 41 of the decoding device in FIG. In the figure, portions corresponding to those in FIG. 22 are denoted by the same reference numerals, and description thereof will be omitted below as appropriate.
[0292]
The learning device in FIG. 24 is configured in the same manner as in FIG. 22 except that the reverse post-processing unit 61A is configured by a DCT conversion unit 191.
[0293]
Accordingly, in the reverse post-processing unit 61A, the DCT conversion unit 191 DCT-converts, in units of blocks, the high-quality image data read as the learning image data from the learning data storage unit 11, and supplies the resulting DCT coefficients, that is, high-quality DCT coefficients, to the adaptive learning unit 60 as teacher data.
[0294]
In the adaptive learning unit 60 (FIG. 11), the high-quality DCT coefficients supplied from the reverse post-processing unit 61A are stored as teacher data in the teacher data storage unit 62. Using this teacher data and the DCT coefficients stored as student data in the student data storage unit 64 (DCT coefficients obtained from encoded data produced by MPEG-encoding the image data), learning is performed to obtain tap coefficients that statistically minimize the prediction error of the predicted values of the teacher data obtained by performing the linear prediction calculation of Expression (1) on the prediction taps extracted from the student data. As a result, the tap coefficients for each class for converting the DCT coefficients serving as student data into high-quality DCT coefficients are obtained.
[0295]
That is, in this case, the DCT coefficients serving as student data are obtained from the encoded data in the preprocessing unit 63B and therefore include quantization error. Consequently, the image obtained by inverse DCT conversion of these DCT coefficients has low image quality, with so-called block distortion and the like.
[0296]
Therefore, as described above, the adaptive learning unit 60 calculates the prediction value of the teacher data (high-quality DCT coefficient obtained by DCT conversion of learning image data) obtained by performing the linear prediction calculation of Expression (1). By performing learning for obtaining a tap coefficient that statistically minimizes the prediction error, a tap coefficient for each class that converts the DCT coefficient that is student data into a high-quality DCT coefficient is obtained.
[0297]
In the embodiment of FIG. 24 as well, the adaptive learning unit 60 (FIG. 11) configures, based on the mismatch information, a prediction tap or a class tap having the same tap structure as that configured by the tap extraction unit 51 or 52 in the class classification adaptive processing unit 32 (FIG. 9) in FIG. Furthermore, the class classification unit 67 in the adaptive learning unit 60 (FIG. 11) in FIG. 24 performs the same class classification as the class classification unit 53 in the class classification adaptation processing unit 32 (FIG. 9) in FIG.
[0298]
As described above, the correctness of the decoding control information included in the encoded data is determined, and decoding of the encoded data and learning of the tap coefficients used for that decoding are performed based on the mismatch information representing the determination result. Therefore, learning can yield tap coefficients that produce predicted values close to the original image while taking the correctness of the decoding control information into account, and decoding the encoded data with such tap coefficients yields a high-quality image.
[0299]
In other words, in the present embodiment, the correctness of the DCT type is determined, and the tap coefficients are learned in consideration of the determination result. Consequently, in addition to tap coefficients for decoding a given portion into predicted values close to the original image, tap coefficients can be obtained for decoding, into predicted values close to the original image, portions that would exhibit unnatural motion if decoded by the MPEG2 method. A high-quality image can be obtained by using such tap coefficients and decoding the encoded data in consideration of the correctness of the DCT type.
[0300]
Next, the series of processes described above can be performed by hardware or software. When a series of processing is performed by software, a program constituting the software is installed in a general-purpose computer or the like.
[0301]
Therefore, FIG. 25 illustrates a configuration example of an embodiment of a computer in which a program for executing the above-described series of processes is installed.
[0302]
The program can be recorded in advance on a hard disk 405 or a ROM 403 as a recording medium built in the computer.
[0303]
Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium 411 such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, or a semiconductor memory. Such a removable recording medium 411 can be provided as so-called package software.
[0304]
Besides being installed in the computer from the removable recording medium 411 as described above, the program can be transferred from a download site to the computer wirelessly via an artificial satellite for digital satellite broadcasting, or transferred to the computer via a network such as a LAN (Local Area Network) or the Internet. The computer can receive the program transferred in this way with the communication unit 408 and install it on the built-in hard disk 405.
[0305]
The computer includes a CPU (Central Processing Unit) 402. An input / output interface 410 is connected to the CPU 402 via a bus 401. When the user inputs a command via the input / output interface 410 by, for example, operating the input unit 407, which includes a keyboard, a mouse, a microphone, and the like, the CPU 402 executes a program stored in the ROM (Read Only Memory) 403 accordingly. Alternatively, the CPU 402 loads into the RAM (Random Access Memory) 404 and executes a program stored on the hard disk 405, a program transferred from a satellite or a network, received by the communication unit 408, and installed on the hard disk 405, or a program read from the removable recording medium 411 mounted in the drive 409 and installed on the hard disk 405. The CPU 402 thereby performs the processing according to the flowcharts described above or the processing performed by the configurations of the block diagrams described above. Then, as necessary, the CPU 402, for example, outputs the processing result from the output unit 406, which includes an LCD (Liquid Crystal Display), a speaker, and the like, via the input / output interface 410, transmits it from the communication unit 408, or records it on the hard disk 405.
[0306]
Here, in this specification, the processing steps describing a program for causing a computer to perform various types of processing do not necessarily have to be processed in time series in the order described in the flowcharts; they also include processing executed in parallel or individually (for example, parallel processing or object-based processing).
[0307]
Further, the program may be processed by one computer or may be distributedly processed by a plurality of computers. Furthermore, the program may be transferred to a remote computer and executed.
[0308]
In the present embodiment, the case where the image data is encoded by the MPEG2 method has been described. However, the present invention is not limited to the MPEG2 method and is also applicable to decoding images encoded by other lossy compression methods.
[0309]
Further, in the present embodiment, the correctness (appropriateness) of the DCT type, which is one of the plurality of decoding control information included in the encoded data, is determined based on the motion vector, which is another of the plurality of decoding control information, and decoding of the encoded data and learning of the tap coefficients are performed based on the mismatch information representing the determination result. It is also possible, however, to determine the correctness (appropriateness) of decoding control information other than the DCT type based on one or more of the plurality of decoding control information included in the encoded data, and to perform decoding of the encoded data and learning of the tap coefficients based on the mismatch information representing that determination result.
[0310]
【The invention's effect】
According to the decoding apparatus and the decoding method, the first program, and the first recording medium of the present invention, the correctness of the DCT type included in the encoded data is converted into the motion vector of the image data included in the encoded data. Based on the presence / absence of motion of image data in units of blocks, mismatch information representing the determination result is output. Of the high quality data of the high quality image than the low quality image obtained by decoding the encoded data, The high-quality data for each pixel you are trying to obtain Featured data And , For seeking attention data Low quality image used for product-sum operation with a given tap coefficient Pixel by pixel Some of the low-quality data is extracted as prediction taps, and the student data is the student data that corresponds to the low-quality data and becomes the learning student, and the teacher data that corresponds to the high-quality data and becomes the learning teacher. By performing the product-sum operation on the tap coefficient obtained by performing learning that statistically minimizes the prediction error of the predicted value of the teacher data obtained by the product-sum operation between the tap coefficient and the tap coefficient, Data is required. Here, based on the mismatch information, when the mismatch information indicates that the DCT type is correct, when the DCT type is the field DCT mode, a prediction tap is extracted from the low quality data in the field of the target data, In the case where the mismatch information indicates that the DCT type is correct, when the DCT type is the frame DCT mode, the prediction tap is extracted from the low quality data of the frame of the target data, and the mismatch information is correct for the DCT type. If not, a prediction tap is extracted from the low quality data of both the field of interest data and the frame. 
Therefore, the encoded data can be decoded into high-quality image data.
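The tap selection and product-sum operation summarized above can be sketched as follows. This is only an illustration under stated assumptions: the image is a list of rows in which consecutive rows belong to alternating fields, the row offsets chosen for each case and the three-pixel horizontal extent are invented for the sketch (the actual tap structures are those of the tap structure setting table and patterns A to D in the drawings), and the function names are hypothetical.

```python
def tap_row_offsets(dct_type, dct_type_correct):
    """Choose vertical offsets of the tap rows relative to the attention pixel.

    Assumed layout: rows y-2, y, y+2 share a field with row y;
    rows y-1, y, y+1 are adjacent lines of the frame.
    """
    if not dct_type_correct:
        return [-2, -1, 0, 1, 2]      # both field and frame lines
    if dct_type == "field":
        return [-2, 0, 2]             # lines of the same field only
    return [-1, 0, 1]                 # adjacent lines of the frame


def extract_prediction_tap(image, y, x, dct_type, dct_type_correct):
    """Collect low-quality pixel values around (y, x) as a prediction tap.

    Coordinates are clamped at the image border, a simplification that
    ignores field parity at the edges.
    """
    height, width = len(image), len(image[0])
    tap = []
    for dy in tap_row_offsets(dct_type, dct_type_correct):
        yy = min(max(y + dy, 0), height - 1)
        for dx in (-1, 0, 1):
            xx = min(max(x + dx, 0), width - 1)
            tap.append(image[yy][xx])
    return tap


def predict(tap, coefficients):
    """Product-sum operation of the prediction tap and the tap coefficients."""
    if len(tap) != len(coefficients):
        raise ValueError("tap and coefficient lengths differ")
    return sum(w * x for w, x in zip(coefficients, tap))


if __name__ == "__main__":
    low_quality = [[10 * r + c for c in range(5)] for r in range(6)]
    tap = extract_prediction_tap(low_quality, 2, 2, "field", True)
    print(tap, predict(tap, [1.0 / len(tap)] * len(tap)))
```

In practice a separate coefficient set would be needed for each tap structure, since the three cases yield taps of different lengths.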
[0311]
According to the learning device, the learning method, the second program, and the second recording medium of the present invention, teacher data serving as the teacher in learning of the tap coefficients and student data serving as the student are generated from learning image data. Further, the learning image data is encoded, and learning encoded data including a DCT type and a motion vector of the image data is output. Then, the correctness of the DCT type included in the learning encoded data is determined from the presence or absence of motion of the image data in block units, which is based on the motion vector of the image data included in the learning encoded data, and mismatch information representing the determination result is output. Furthermore, among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, the pixel-unit high-quality data to be obtained is taken as attention data, and some of the pixel-unit low-quality data of the low-quality image, to be used in a product-sum operation with predetermined tap coefficients for obtaining the attention data, are extracted as a prediction tap. Using the student data corresponding to the low-quality data and the teacher data corresponding to the high-quality data, tap coefficients are obtained that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients, and the attention data is obtained by a product-sum operation of the tap coefficients and the prediction tap.
Here, based on the mismatch information, when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, the prediction tap is extracted from the low-quality data of the field of the attention data; when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, the prediction tap is extracted from the low-quality data of the frame of the attention data; and when the mismatch information indicates that the DCT type is not correct, the prediction tap is extracted from the low-quality data of both the field and the frame of the attention data. Therefore, the encoded data can be decoded into high-quality image data by means of the tap coefficients.
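The learning that statistically minimizes the prediction error can be sketched as an ordinary least-squares fit. The sketch below assumes a squared-error criterion and solves the resulting normal equations by Gauss-Jordan elimination; the function names are hypothetical, and the per-class handling of the embodiment is omitted (one would keep a separate accumulator for each class and tap structure).

```python
def accumulate(normal, rhs, tap, teacher):
    """Add one (prediction tap, teacher value) pair to the normal equations."""
    n = len(tap)
    for i in range(n):
        rhs[i] += tap[i] * teacher
        for j in range(n):
            normal[i][j] += tap[i] * tap[j]


def solve(normal, rhs):
    """Solve normal * w = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(normal, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("not enough independent samples to solve")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def learn_tap_coefficients(samples):
    """Return tap coefficients minimizing the sum of squared prediction errors.

    samples: list of (student tap, teacher value) pairs.
    """
    n = len(samples[0][0])
    normal = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for tap, teacher in samples:
        accumulate(normal, rhs, tap, teacher)
    return solve(normal, rhs)


if __name__ == "__main__":
    true_w = [0.5, 0.25, 0.25]
    taps = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 2, 3]]
    data = [(t, sum(w * x for w, x in zip(true_w, t))) for t in taps]
    print(learn_tap_coefficients(data))
```

Accumulating sums sample by sample and solving once at the end means the training pairs never need to be held in memory together.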
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration example of an embodiment of a decoding device to which the present invention has been applied.
FIG. 2 is a flowchart illustrating processing of a decoding device.
FIG. 3 is a block diagram illustrating a configuration example of another embodiment of a decoding device to which the present invention has been applied.
FIG. 4 is a block diagram illustrating a configuration example of an embodiment of a learning device to which the present invention has been applied.
FIG. 5 is a flowchart illustrating processing of the learning device.
FIG. 6 is a block diagram illustrating a more detailed configuration example of a decoding device to which the present invention has been applied.
FIG. 7 is a diagram for explaining a frame DCT mode and a field DCT mode.
FIG. 8 is a diagram schematically illustrating a decoded image when a macroblock on which a moving object is displayed is encoded in a frame DCT mode and a field DCT mode.
FIG. 9 is a block diagram illustrating a configuration example of a class classification adaptive processing unit 32.
FIG. 10 is a flowchart illustrating processing of a decoding device.
FIG. 11 is a block diagram showing a more detailed configuration example of a learning apparatus to which the present invention is applied.
FIG. 12 is a flowchart illustrating processing of the learning device.
FIG. 13 is a block diagram illustrating a first configuration example of a decoding device that decodes encoded data encoded by the MPEG method.
FIG. 14 is a block diagram illustrating a configuration example of an MPEG decoder 116.
FIG. 15 is a flowchart for explaining processing of the mismatch information generation unit 115.
FIG. 16 is a diagram schematically illustrating a decoded image when a macroblock on which a moving object is displayed is encoded in a frame DCT mode and a field DCT mode.
FIG. 17 is a diagram illustrating a tap structure setting table.
FIG. 18 is a diagram illustrating the tap structures of patterns A to D.
FIG. 19 is a block diagram illustrating a first configuration example of a learning device that learns tap coefficients used to decode encoded data encoded by the MPEG method.
FIG. 20 is a block diagram illustrating a configuration example of an MPEG encoder 131.
FIG. 21 is a block diagram illustrating a second configuration example of a decoding device that decodes encoded data encoded by the MPEG method.
FIG. 22 is a block diagram illustrating a second configuration example of a learning device that learns tap coefficients used to decode encoded data encoded by the MPEG method.
FIG. 23 is a block diagram illustrating a third configuration example of a decoding device that decodes encoded data encoded by the MPEG method.
FIG. 24 is a block diagram illustrating a third configuration example of a learning device that learns tap coefficients used to decode encoded data encoded by the MPEG method.
FIG. 25 is a block diagram illustrating a configuration example of an embodiment of a computer to which the present invention has been applied.
[Explanation of symbols]
1 mismatch detection unit, 2 decoding processing unit, 3 parameter storage unit, 11 learning data storage unit, 12 encoding unit, 13 mismatch detection unit, 14 learning processing unit, 21 decoding control information extraction unit, 22 determination unit, 31 preprocessing unit, 32 class classification adaptive processing unit, 33 post-processing unit, 41 coefficient memory, 51, 52 tap extraction unit, 53 class classification unit, 54 prediction unit, 60 adaptive learning unit, 61 teacher data generation unit, 61A inverse post-processing unit, 62 teacher data storage unit, 63 student data generation unit, 63A encoding unit, 63B preprocessing unit, 64 student data storage unit, 65, 66 tap extraction unit, 67 class classification unit, 68 addition unit, 69 tap coefficient calculation unit, 71 decoding control information extraction unit, 72 determination unit, 111 inverse VLC unit, 112 field/frame determination unit, 113 intra/non-intra determination unit, 114 still/motion determination unit, 115 mismatch information generation unit, 116 MPEG decoder, 121 inverse VLC unit, 122 inverse quantization unit, 123 inverse DCT conversion unit, 124 calculation unit, 125 motion compensation unit, 126 memory, 127 picture selection unit, 131 MPEG encoder, 132 inverse VLC unit, 133 field/frame determination unit, 134 intra/non-intra determination unit, 135 still/motion determination unit, 136 mismatch information generation unit, 137 MPEG encoder, 138 MPEG decoder, 141 motion vector detection unit, 142 motion compensation unit, 143 calculation unit, 144 DCT conversion unit, 145 quantization unit, 146 VLC unit, 147 inverse quantization unit, 148 inverse DCT conversion unit, 149 calculation unit, 150 memory, 161 inverse VLC unit, 162 inverse quantization unit, 163 calculation unit, 164 MPEG decoder, 165 memory, 166 motion compensation unit, 167 DCT conversion unit, 171 inverse VLC unit, 172 inverse quantization unit, 173 calculation unit, 174 MPEG decoder, 175 memory, 176 motion compensation unit, 177 DCT conversion unit, 181 inverse DCT conversion unit, 191 DCT conversion unit, 401 bus, 402 CPU, 403 ROM, 404 RAM, 405 hard disk, 406 output unit, 407 input unit, 408 communication unit, 409 drive, 410 input/output interface, 411 removable recording medium

Claims

A decoding device for decoding encoded data obtained by encoding image data, the encoded data being obtained at least by detecting a motion vector from the image data, generating a predicted image by motion compensation using the motion vector, and applying a DCT (Discrete Cosine Transform), in a field DCT mode or a frame DCT mode and in units of predetermined blocks, to the difference between the image data and the predicted image, and the encoded data including a DCT type representing the field DCT mode or the frame DCT mode and the motion vector of the image data, the decoding device comprising:
Determination means for determining the correctness of the DCT type included in the encoded data from the presence or absence of motion of the image data in block units, which is based on the motion vector of the image data included in the encoded data, and outputting mismatch information representing the determination result; and
Decoding means having prediction tap extraction means for taking as attention data the pixel-unit high-quality data to be obtained, among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as a prediction tap, some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the attention data, and
Prediction calculation means for obtaining the attention data by performing the product-sum operation of the prediction tap and tap coefficients, the tap coefficients being obtained by learning that uses student data, which corresponds to the low-quality data and serves as the student in learning, and teacher data, which corresponds to the high-quality data and serves as the teacher in learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients,
Wherein the prediction tap extraction means, based on the mismatch information,
Extracts the prediction tap from the low-quality data of the field of the attention data when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode;
Extracts the prediction tap from the low-quality data of the frame of the attention data when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode; and
Extracts the prediction tap from the low-quality data of both the field and the frame of the attention data when the mismatch information indicates that the DCT type is not correct.

The decoding device according to claim 1, wherein the determination means determines that the DCT type is not correct when there is motion in the image data of the block unit and the DCT type for the block is the frame DCT mode.

The decoding device according to claim 2, wherein the encoded data is obtained by subjecting the image data in block units to non-intra coding or to intra coding, which is coding that does not use the predicted image, and
The determination means
Determines, for an intra-coded block, the presence or absence of motion of the image data based on the motion vector of a block in a frame before or after the frame of that block, and
Determines, for a non-intra-coded block, the presence or absence of motion of the image data based on the motion vector of that block.

The decoding device according to claim 1, wherein the decoding means further includes
Class tap extraction means for extracting some of the pixel-unit low-quality data to be used as a class tap for classifying the attention data into one of a plurality of classes;
Class classification means for classifying the attention data into a class corresponding to the values of the low-quality data constituting the class tap, and outputting a class code representing the class of the attention data; and
Tap coefficient acquisition means for acquiring, from the tap coefficients for each class, the tap coefficients corresponding to the class code, and
The prediction calculation means obtains the attention data by performing a product-sum operation of the prediction tap output by the prediction tap extraction means and the tap coefficients acquired by the tap coefficient acquisition means.

A decoding device in which the class tap extraction means, based on the mismatch information,
Extracts the class tap from the low-quality data of the field of the attention data when the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode;
Extracts the class tap from the low-quality data of the frame of the attention data when the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode; and
Extracts the class tap from the low-quality data of both the field and the frame of the attention data when the mismatch information indicates that the DCT type is not correct.

The decoding device according to claim 4, wherein the class classification means classifies the attention data, based on the mismatch information, into a class corresponding to the values of the low-quality data constituting the class tap and the value of the mismatch information.

The decoding apparatus according to claim 1, wherein the encoded data is obtained by encoding image data according to an MPEG (Moving Picture Experts Group) method.

The low-quality data is image data obtained by decoding the encoded data according to the MPEG (Moving Picture Experts Group) system,
The decoding apparatus according to claim 1, wherein the high quality data is image data with higher image quality than image data that is the low quality data.

The low quality data is a DCT coefficient of image data obtained by decoding the encoded data in accordance with an MPEG (Moving Picture Experts Group) system,
The decoding apparatus according to claim 1, wherein the high-quality data is image data with higher image quality than image data obtained by decoding the encoded data according to an MPEG system.

The low quality data is a DCT coefficient of image data obtained by decoding the encoded data in accordance with an MPEG (Moving Picture Experts Group) system,
The decoding apparatus according to claim 1, wherein the high-quality data is a DCT coefficient of image data having higher image quality than image data obtained by decoding the encoded data according to an MPEG system.

Encoded data obtained by encoding image data, and at least a motion vector is detected from the image data and a motion compensation is performed using the motion vector to generate a predicted image, and the image data and the predicted image Including a DCT type representing a field DCT mode or a frame DCT mode when DCT is converted in a predetermined block unit by a field DCT (Discrete Cosine Transform) mode or a frame DCT mode, and a motion vector of the image data In a decoding method for decoding encoded data,
The correctness of the DCT type included in the encoded data is determined based on the presence or absence of motion of the image data in block units based on the motion vector of the image data included in the encoded data, and represents the determination result A determination step of outputting mismatch information;
A decoding step having a prediction tap extraction step of taking as attention data the pixel-unit high-quality data to be obtained, among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as a prediction tap, some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the attention data, and
A prediction calculation step of obtaining the attention data by performing the product-sum operation of the prediction tap and tap coefficients, the tap coefficients being obtained by learning that uses student data, which corresponds to the low-quality data and serves as the student in learning, and teacher data, which corresponds to the high-quality data and serves as the teacher in learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients,
In the prediction tap extraction step, based on the mismatch information,
When the mismatch information indicates that the DCT type is correct, when the DCT type is a field DCT mode, the prediction tap is extracted from the low quality data in the field of the attention data;
In the case where the mismatch information indicates that the DCT type is correct, when the DCT type is a frame DCT mode, the prediction tap is extracted from the low quality data of the frame of the target data;
When the mismatch information indicates that the DCT type is not correct, the prediction tap is extracted from the low quality data of both the field and the frame of the attention data.

Encoded data obtained by encoding image data, and at least a motion vector is detected from the image data and a motion compensation is performed using the motion vector to generate a predicted image, and the image data and the predicted image Including a DCT type representing a field DCT mode or a frame DCT mode when DCT is converted in a predetermined block unit by a field DCT (Discrete Cosine Transform) mode or a frame DCT mode, and a motion vector of the image data In a program for causing a computer to perform a decoding process for decoding encoded data,
The correctness of the DCT type included in the encoded data is determined based on the presence or absence of motion of the image data in block units based on the motion vector of the image data included in the encoded data, and represents the determination result A determination step of outputting mismatch information;
A decoding step having a prediction tap extraction step of taking as attention data the pixel-unit high-quality data to be obtained, among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as a prediction tap, some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the attention data, and
A prediction calculation step of obtaining the attention data by performing the product-sum operation of the prediction tap and tap coefficients, the tap coefficients being obtained by learning that uses student data, which corresponds to the low-quality data and serves as the student in learning, and teacher data, which corresponds to the high-quality data and serves as the teacher in learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients,
In the prediction tap extraction step, based on the mismatch information,
When the mismatch information indicates that the DCT type is correct, when the DCT type is a field DCT mode, the prediction tap is extracted from the low quality data in the field of the attention data;
In the case where the mismatch information indicates that the DCT type is correct, when the DCT type is a frame DCT mode, the prediction tap is extracted from the low quality data of the frame of the target data;
When the mismatch information indicates that the DCT type is not correct, the prediction tap is extracted from the low quality data of both the field and the frame of the attention data.

Encoded data obtained by encoding image data, and at least a motion vector is detected from the image data and a motion compensation is performed using the motion vector to generate a predicted image, and the image data and the predicted image Including a DCT type representing a field DCT mode or a frame DCT mode when DCT conversion is performed in a predetermined block unit in a field DCT (Discrete Cosine Transform) mode or a frame DCT mode, and a motion vector of the image data In a recording medium on which a program for causing a computer to perform a decoding process for decoding encoded data is recorded,
The correctness of the DCT type included in the encoded data is determined based on the presence or absence of motion of the image data in block units based on the motion vector of the image data included in the encoded data, and represents the determination result A determination step of outputting mismatch information;
A decoding step having a prediction tap extraction step of taking as attention data the pixel-unit high-quality data to be obtained, among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as a prediction tap, some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the attention data, and
A prediction calculation step of obtaining the attention data by performing the product-sum operation of the prediction tap and tap coefficients, the tap coefficients being obtained by learning that uses student data, which corresponds to the low-quality data and serves as the student in learning, and teacher data, which corresponds to the high-quality data and serves as the teacher in learning, and that statistically minimizes the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and the tap coefficients,
In the prediction tap extraction step, based on the mismatch information,
When the mismatch information indicates that the DCT type is correct, when the DCT type is a field DCT mode, the prediction tap is extracted from the low quality data in the field of the attention data;
In the case where the mismatch information indicates that the DCT type is correct, when the DCT type is a frame DCT mode, the prediction tap is extracted from the low quality data of the frame of the target data;
When the mismatch information indicates that the DCT type is not correct, the prediction tap is extracted from the low quality data of both the field and the frame of the attention data; the recording medium having recorded thereon a program that performs the above steps.

Encoded data obtained by encoding image data, and at least a motion vector is detected from the image data and a motion compensation is performed using the motion vector to generate a predicted image, and the image data and the predicted image Including a DCT type representing a field DCT mode or a frame DCT mode when DCT is converted in a predetermined block unit by a field DCT (Discrete Cosine Transform) mode or a frame DCT mode, and a motion vector of the image data In a learning device that learns tap coefficients used to decode encoded data,
Teacher data generation means for generating and outputting teacher data serving as a teacher for learning the tap coefficient from image data for learning;
Student data generation means for generating and outputting student data to be students of learning of the tap coefficient from the learning image data;
Encoding means for encoding the learning image data and outputting encoded data for learning including the DCT type and a motion vector of the image data;
The correctness of the DCT type included in the learning encoded data is determined based on the presence or absence of motion of the block-based image data based on the motion vector of the image data included in the learning encoded data. Determining means for outputting mismatch information representing the determination result;
In order to obtain the attention data by using the high-quality data in units of pixels to be obtained as the attention data among the high-quality data of the higher-quality image than the low-quality image obtained by decoding the encoded data. Prediction tap extraction means for extracting some of the low-quality data in pixel units of the low-quality image used as a prediction tap for a product-sum operation with a predetermined tap coefficient;
Using the student data corresponding to the low quality data and the teacher data corresponding to the high quality data, a prediction error of the predicted value of the teacher data obtained by a product-sum operation of the student data and a tap coefficient is A learning means having a tap coefficient calculating means for obtaining a statistically minimum tap coefficient;
Decoding means having prediction calculation means for obtaining the data of interest by performing the product-sum calculation of the tap coefficient and the prediction tap;
The prediction tap extraction means is based on the mismatch information,
When the mismatch information indicates that the DCT type is correct, when the DCT type is a field DCT mode, the prediction tap is extracted from the low quality data in the field of the attention data;
In the case where the mismatch information indicates that the DCT type is correct, when the DCT type is a frame DCT mode, the prediction tap is extracted from the low quality data of the frame of the target data;
When the mismatch information indicates that the DCT type is not correct, the prediction tap is extracted from the low quality data of both the field and the frame of the attention data.

The determination means includes
The learning apparatus according to claim 14, wherein when there is a motion in the block-unit image data and the DCT type for the block is a frame DCT mode, the DCT type is determined to be incorrect.

The learning device according to claim 15, wherein the encoded data is obtained by subjecting the image data in block units to non-intra coding or to intra coding, which is coding that does not use the predicted image, and
The determination means
Determines, for an intra-coded block, the presence or absence of motion of the image data based on the motion vector of a block in a frame before or after the frame of that block, and
Determines, for a non-intra-coded block, the presence or absence of motion of the image data based on the motion vector of that block.

The learning device according to claim 14, wherein the learning means further includes
Class tap extraction means for extracting some of the pixel-unit low-quality data to be used as a class tap for classifying the attention data into one of a plurality of classes; and
Class classification means for classifying the attention data into a class corresponding to the values of the low-quality data constituting the class tap, and outputting a class code representing the class of the attention data, and
The tap coefficient calculation means obtains, for each class, the tap coefficients that statistically minimize the prediction error of the predicted value of the teacher data obtained by a product-sum operation of the prediction tap and the tap coefficients.

The learning apparatus according to claim 14, wherein the encoding means encodes learning image data by an MPEG (Moving Picture Experts Group) method and outputs the learning encoded data.

The low-quality data is image data obtained by decoding the encoded data according to the MPEG (Moving Picture Experts Group) system,
The learning apparatus according to claim 14, wherein the high quality data is image data with higher image quality than image data that is the low quality data.

The low quality data is a DCT coefficient of image data obtained by decoding the encoded data in accordance with an MPEG (Moving Picture Experts Group) system,
The learning device according to claim 14, wherein the high-quality data is image data with higher image quality than image data obtained by decoding the encoded data according to an MPEG system.

The low quality data is a DCT coefficient of image data obtained by decoding the encoded data in accordance with an MPEG (Moving Picture Experts Group) system,
The learning apparatus according to claim 14, wherein the high-quality data is a DCT coefficient of image data having higher image quality than image data obtained by decoding the encoded data according to an MPEG system.

Encoded data obtained by encoding image data, and at least a motion vector is detected from the image data and a motion compensation is performed using the motion vector to generate a predicted image, and the image data and the predicted image Including a DCT type representing a field DCT mode or a frame DCT mode when DCT is converted in a predetermined block unit by a field DCT (Discrete Cosine Transform) mode or a frame DCT mode, and a motion vector of the image data In a learning method for learning tap coefficients used to decode encoded data,
A teacher data generation step for generating and outputting teacher data to be a teacher for learning the tap coefficient from the image data for learning; and
A student data generation step of generating and outputting student data to be students of learning of the tap coefficient from the learning image data;
An encoding step of encoding the learning image data and outputting encoded data for learning including the DCT type and a motion vector of the image data;
The correctness of the DCT type included in the learning encoded data is determined based on the presence or absence of motion of the block-based image data based on the motion vector of the image data included in the learning encoded data. And a determination step for outputting mismatch information representing the determination result;
A prediction tap extraction step of taking as attention data the pixel-unit high-quality data to be obtained, among the high-quality data of an image of higher quality than the low-quality image obtained by decoding the encoded data, and extracting, as a prediction tap, some of the pixel-unit low-quality data of the low-quality image to be used in a product-sum operation with predetermined tap coefficients for obtaining the attention data;
Using the student data corresponding to the low quality data and the teacher data corresponding to the high quality data, a prediction error of a predicted value of the teacher data obtained by a product-sum operation of the student data and a tap coefficient is A learning step having a tap coefficient calculation step for obtaining a tap coefficient that is statistically minimized;
A decoding step including a prediction calculation step for obtaining the data of interest by performing the product-sum operation on the tap coefficient and the prediction tap,
In the prediction tap extraction step, based on the mismatch information,
When the mismatch information indicates that the DCT type is correct, when the DCT type is a field DCT mode, the prediction tap is extracted from the low quality data in the field of the attention data;
In the case where the mismatch information indicates that the DCT type is correct, when the DCT type is a frame DCT mode, the prediction tap is extracted from the low quality data of the frame of the target data;
When the mismatch information indicates that the DCT type is not correct, the prediction tap is extracted from the low quality data of both the field and the frame of the attention data.

Encoded data obtained by encoding image data, and at least a motion vector is detected from the image data and a motion compensation is performed using the motion vector to generate a predicted image, and the image data and the predicted image Including a DCT type representing a field DCT mode or a frame DCT mode when DCT is converted in a predetermined block unit by a field DCT (Discrete Cosine Transform) mode or a frame DCT mode, and a motion vector of the image data In a program for causing a computer to perform a learning process for learning a tap coefficient used to decode encoded data,
A teacher data generation step for generating and outputting teacher data to be a teacher for learning the tap coefficient from the image data for learning; and
A student data generation step of generating and outputting student data to be students of learning of the tap coefficient from the learning image data;
An encoding step of encoding the learning image data and outputting encoded data for learning including the DCT type and a motion vector of the image data;
A determination step of determining whether the DCT type included in the learning encoded data is correct, based on the presence or absence of motion in the block-unit image data as indicated by the motion vector included in the learning encoded data, and outputting mismatch information representing the determination result;
A prediction tap extraction step of taking, as data of interest, the high-quality data in pixel units that is to be obtained from among the high-quality data of a high-quality image obtained by decoding the encoded data, and extracting, as a prediction tap, some of the low-quality data in pixel units of a low-quality image, the prediction tap being used in a product-sum operation with predetermined tap coefficients to obtain the data of interest;
A learning step having a tap coefficient calculation step of obtaining, using the student data corresponding to the low-quality data and the teacher data corresponding to the high-quality data, tap coefficients that statistically minimize the prediction error of the predicted values of the teacher data obtained by a product-sum operation of the student data and the tap coefficients; and
A decoding step including a prediction calculation step of obtaining the data of interest by performing the product-sum operation on the tap coefficients and the prediction tap,
In the prediction tap extraction step, based on the mismatch information,
When the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, the prediction tap is extracted from the low-quality data in the field of the data of interest;
When the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, the prediction tap is extracted from the low-quality data in the frame of the data of interest; and
When the mismatch information indicates that the DCT type is not correct, the prediction tap is extracted from the low-quality data of both the field and the frame of the data of interest.
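
The determination step of this claim can be sketched as follows. The sketch assumes that "presence of motion" means a motion vector with at least one component whose magnitude exceeds a threshold, and that a block with motion is expected to use the field DCT mode while a block without motion is expected to use the frame DCT mode, as the description of the conventional art suggests. The function names and the `threshold` parameter are hypothetical and are not taken from the claim.

```python
from typing import Tuple

FIELD = "field"  # hypothetical label for the field DCT mode
FRAME = "frame"  # hypothetical label for the frame DCT mode


def has_motion(motion_vector: Tuple[int, int], threshold: int = 0) -> bool:
    """Treat a block as moving if either vector component exceeds the threshold."""
    dx, dy = motion_vector
    return abs(dx) > threshold or abs(dy) > threshold


def dct_type_is_correct(dct_type: str, motion_vector: Tuple[int, int],
                        threshold: int = 0) -> bool:
    """Return True when the signalled DCT type matches the one expected from motion.

    Assumption: moving blocks should use the field DCT mode and still
    blocks should use the frame DCT mode.
    """
    expected = FIELD if has_motion(motion_vector, threshold) else FRAME
    return dct_type == expected


def mismatch_information(dct_type: str, motion_vector: Tuple[int, int]) -> bool:
    """Mismatch information as a boolean: True means the DCT type is NOT correct."""
    return not dct_type_is_correct(dct_type, motion_vector)
```

Under these assumptions, a block signalled as frame DCT with a nonzero motion vector is flagged as a mismatch, which in turn selects the tap extraction from both the field and the frame.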

A recording medium on which is recorded a program for causing a computer to perform learning processing for learning tap coefficients used to decode encoded data obtained by encoding image data, the encoded data being obtained by at least detecting a motion vector from the image data, performing motion compensation using the motion vector to generate a predicted image, and applying a DCT (Discrete Cosine Transform) to the difference between the image data and the predicted image in predetermined block units in a field DCT mode or a frame DCT mode, the encoded data including a DCT type representing the field DCT mode or the frame DCT mode used for the DCT and the motion vector of the image data, the program comprising:
A teacher data generation step of generating, from learning image data, teacher data that serves as the teacher for learning the tap coefficients, and outputting the teacher data;
A student data generation step of generating, from the learning image data, student data that serves as the student for learning the tap coefficients, and outputting the student data;
An encoding step of encoding the learning image data and outputting encoded data for learning including the DCT type and a motion vector of the image data;
A determination step of determining whether the DCT type included in the learning encoded data is correct, based on the presence or absence of motion in the block-unit image data as indicated by the motion vector included in the learning encoded data, and outputting mismatch information representing the determination result;
A prediction tap extraction step of taking, as data of interest, the high-quality data in pixel units that is to be obtained from among the high-quality data of a high-quality image obtained by decoding the encoded data, and extracting, as a prediction tap, some of the low-quality data in pixel units of a low-quality image, the prediction tap being used in a product-sum operation with predetermined tap coefficients to obtain the data of interest;
A learning step having a tap coefficient calculation step of obtaining, using the student data corresponding to the low-quality data and the teacher data corresponding to the high-quality data, tap coefficients that statistically minimize the prediction error of the predicted values of the teacher data obtained by a product-sum operation of the student data and the tap coefficients; and
A decoding step including a prediction calculation step of obtaining the data of interest by performing the product-sum operation on the tap coefficients and the prediction tap,
In the prediction tap extraction step, based on the mismatch information,
When the mismatch information indicates that the DCT type is correct and the DCT type is the field DCT mode, the prediction tap is extracted from the low-quality data in the field of the data of interest;
When the mismatch information indicates that the DCT type is correct and the DCT type is the frame DCT mode, the prediction tap is extracted from the low-quality data in the frame of the data of interest; and
When the mismatch information indicates that the DCT type is not correct, the prediction tap is extracted from the low-quality data of both the field and the frame of the data of interest.
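
The tap coefficient calculation step and the prediction calculation step can be sketched as follows, assuming that "statistically minimized" means minimizing the sum of squared prediction errors, which leads to solving normal equations. This is one plausible reading and not necessarily the exact procedure of the claim. The function names `learn_tap_coefficients` and `predict` are hypothetical, and a single set of coefficients is learned here without any grouping by DCT type or mismatch information.

```python
from typing import List, Sequence


def learn_tap_coefficients(student_taps: Sequence[Sequence[float]],
                           teacher: Sequence[float]) -> List[float]:
    """Least-squares tap coefficients from (prediction tap, teacher pixel) pairs."""
    n = len(student_taps[0])
    # Accumulate the normal equations A w = b,
    # where A = sum(x x^T) and b = sum(x y) over all training samples.
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for x, y in zip(student_taps, teacher):
        for i in range(n):
            b[i] += x[i] * y
            for j in range(n):
                a[i][j] += x[i] * x[j]

    # Solve by Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular; more samples are needed")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n):
                    a[r][c] -= factor * a[col][c]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(n)]


def predict(coefficients: Sequence[float], taps: Sequence[float]) -> float:
    """Product-sum operation of tap coefficients and a prediction tap."""
    return sum(w * x for w, x in zip(coefficients, taps))


# Toy data: teacher pixels generated as 0.5 * tap0 + 0.25 * tap1.
taps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
teacher = [0.5, 0.25, 0.75, 1.25]
coefficients = learn_tap_coefficients(taps, teacher)
```

With the toy data above, the learned coefficients recover the generating weights up to floating-point error, and `predict(coefficients, tap)` then yields the estimated high-quality pixel for a new prediction tap.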