JP2004221744A

JP2004221744A - Dynamic image encoding instrument, its method and program, dynamic image decoding instrument, and its method and program

Info

Publication number: JP2004221744A
Application number: JP2003004487A
Authority: JP
Inventors: Shinichi Sakaida (境田 慎一); Hiroyuki Imaizumi (今泉 浩幸); Kazuhisa Iguchi (井口 和久); Makoto Ikeda (池田 誠)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2003-01-10
Filing date: 2003-01-10
Publication date: 2004-08-05

Abstract

PROBLEM TO BE SOLVED: To provide a dynamic image encoding instrument, method, and program that reduce the amount of calculation needed to encode dynamic images and improve encoding efficiency when the dynamic images are encoded by motion-compensated prediction.

SOLUTION: The dynamic image encoding instrument 1 is equipped with a reference image searching means 16 that, on the basis of the camera information associated with each image when the dynamic images were shot, searches the encoded images stored in an encoded image storage 15a for an image highly correlated with the target image to be encoded. The image found serves as the reference image, and the dynamic images are encoded by motion-compensated prediction on the basis of that reference image.

COPYRIGHT: (C)2004, JPO & NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、動画像符号化・復号技術に関し、より詳細には、動き補償予測によって動画像を符号化・復号する動画像符号化装置、その方法及びそのプログラム、並びに、動画像復号装置、その方法及びそのプログラムに関する。
【０００２】
【従来の技術】
現在、動画像を圧縮符号化する方式として、ＩＳＯ／ＩＥＣＪＴＣ１ＳＣ２９のＷＧ１１（ＷｏｒｋｉｎｇＧｒｏｕｐ１１）で標準化されたＭＰＥＧ−１、ＭＰＥＧ−２、ＭＰＥＧ−４等の符号化方式（以下、ＭＰＥＧ符号化方式と呼ぶ）が存在する。これらのＭＰＥＧ符号化方式では、動画像の動きを予測、補償することで、動画像の時間的な冗長性を取り除くようにして符号化を行っている。
【０００３】
このＭＰＥＧ符号化方式では、入力された画像を、マクロブロックと呼ばれる水平１６画素×垂直１６ラインの大きさの領域を単位として動き補償予測を行っている。そして、ＭＰＥＧ−２は、この動き補償予測を行った画像と入力画像との差分を、ブロックと呼ばれる水平８画素×垂直８ラインの大きさの領域毎に離散コサイン変換（ＤＣＴ：ＤｉｓｃｒｅｔｅＣｏｓｉｎｅＴｒａｎｓｆｏｒｍ）し、視覚感度の低い高周波成分を大きく削減するように量子化を行い、可変長符号化を行うことで動画像の符号化を行っている。
【０００４】
なお、ＭＰＥＧ符号化方式は、動きの予測を行う場合、過去の画像から未来の画像を予測する順方向予測と、未来の画像から過去の画像を予測する逆方向予測の機能を有している。ここで、未来の画像から過去の画像を予測するとは、符号化をスキップした画像を現在の画像から予測することである。このＭＰＥＧ符号化方式における順方向予測及び逆方向予測では、処理対象となる画像と時間的に近傍な画像とが画像間の相関性が高いことを利用して、処理対象となる画像の直前又は直後の画像を、動き予測を行う際に参照する参照画像として用いることが多い。
【０００５】
しかし、ＭＰＥＧ符号化方式では、動画像撮影時におけるカメラのパン、チルト等のカメラの動きが速い場合や、カットチェンジ直後の画像のように画像間の変化が大きい場合は、時間的に近傍な画像であっても画像間の相関性が低くなり、動き補償予測の利点が活用できないという問題がある。
【０００６】
最近では、前記した問題を解決した新しい動画像符号化方式として、ＪＶＴと呼ばれる符号化方式が提案されている。このＪＶＴ符号化方式は、ＭＰＥＧグループとＩＴＵ−Ｔ（ＩｎｔｅｒｎａｔｉｏｎａｌＴｅｌｅｃｏｍｍｕｎｉｃａｔｉｏｎＵｎｉｏｎ−Ｔｅｌｅｃｏｍｍｕｎｉｃａｔｉｏｎｓｅｃｔｏｒ）とが合同で設立したＪｏｉｎｔＶｉｄｅｏＴｅａｍが規格化を進めている符号化方式であって、ＭＰＥＧ−４のｐａｒｔ１０及びＩＴＵ−ＴのＨ．２６４という番号が付され、２００２年中に規格化が完了予定の符号化方式である（非特許文献１参照）。
【０００７】
このＪＶＴ符号化方式は、基本的な枠組みはＭＰＥＧ−２と同様であるが、動き補償予測の単位がマクロブロックだけではなく、水平８画素×垂直８ラインのブロック、水平１６画素×垂直８ラインの長方形ブロック等、数種類が用意されている。また、離散コサイン変換（ＤＣＴ）が、水平４画素×垂直４ライン単位で整数演算のみで行われること、動き予測を行う際に参照する画像（参照画像）を、現在の画像の近傍だけでなく、過去に符号化された画像（符号化済み画像）の中から選択することができる等の特徴を有している。
【０００８】
このＪＶＴ符号化方式によれば、符号化された画像の中から任意の画像を参照画像として選択することが可能になるため、入力された画像と、既に符号化された画像との誤差が最小となる画像を参照画像として選択することで、動き補償予測を活用することが可能になる。
【０００９】
【非特許文献１】
“ＪｏｉｎｔＦｉｎａｌＣｏｍｍｉｔｔｅｅＤｒａｆｔｏｆＪｏｉｎｔＶｉｄｅｏＳｐｅｃｉｆｉｃａｔｉｏｎ（ＩＴＵ−ＴＲｅｃ．Ｈ．２６４｜ＩＳＯ／ＩＥＣ１４４９６−１０）”、ＩＴＵ−Ｔ｜ＩＳＯ／ＩＥＣ、ＪＶＴ−Ｄ１５７、２００２−０８−１０
【００１０】
【発明が解決しようとする課題】
前記従来の技術におけるＪＶＴ符号化方式は、動画像を圧縮符号化するときに、動画像を撮影したカメラの動きが速い場合や、カットチェンジが発生した場合においても、入力された画像と、既に符号化された画像との誤差が最小となる画像を参照画像として選択することで、動き補償予測を行うことが可能である。
しかし、既に符号化された膨大な符号化済みの画像の中から、入力画像との誤差が最小となる画像を選択するには、誤差を計算するための計算量が膨大となってしまうため、符号化に時間がかかるという問題がある。
【００１１】
本発明は、このような問題点に鑑みてなされたものであり、動画像を撮影したカメラ情報に基づいて、既に符号化された画像の中から、入力画像と最も相関の高い画像を参照画像とすることで、動き補償予測を有効に活用して符号化効率を高めた符号化データを生成する動画像符号化装置、その方法及びそのプログラムを提供することを目的とする。さらに、その符号化データを復号する動画像復号装置、その方法及びそのプログラムを提供することを目的とする。
【００１２】
【課題を解決するための手段】
本発明は、前記目的を達成するために創案されたものであり、まず、請求項１に記載の動画像符号化装置は、時系列に連続した画像で構成される動画像を、その動画像を撮影したときの前記画像に対応付けられたカメラ情報に基づいて、動き補償予測を行うことにより符号化する動画像符号化装置であって、前記画像と動き補償予測を行った予測画像との差分により、差分画像を生成する差分画像生成手段と、この差分画像生成手段で生成された差分画像を、特定の大きさのブロック単位で圧縮符号化して、差分画像符号化データを生成する差分画像符号化手段と、この差分画像符号化手段で生成された差分画像符号化データを復号して、復号差分画像を生成する復号差分画像生成手段と、この復号差分画像生成手段で生成された復号差分画像と前記予測画像とを加算することで、復号画像を生成する復号画像生成手段と、この復号画像生成手段で生成された復号画像を蓄積する復号画像蓄積手段と、前記カメラ情報に基づいて、前記復号画像蓄積手段に蓄積されている復号画像の中から、符号化対象となる処理対象画像と相関の高い復号画像を探索して参照画像とする参照画像探索手段と、前記処理対象画像と前記参照画像とに基づいて、前記処理対象画像の動き予測となる動きベクトルを生成する動き予測手段と、この動き予測手段で生成された動きベクトルと前記参照画像とに基づいて、前記処理対象画像の動きを予測した前記予測画像を生成する動き補償手段と、前記参照画像を識別するための識別情報と前記差分画像符号化データと前記動きベクトルとを多重化して、符号化データを生成する多重化手段と、を備える構成とした。
【００１３】
かかる構成によれば、動画像符号化装置は、差分画像生成手段によって、動画像を構成する個々の画像と、既に符号化を行った符号化済み画像に対して動き補償予測を行うことで予測された予測画像との差分をとることにより差分画像を生成し、復号差分画像生成手段によって、差分画像を特定の大きさのブロック単位で圧縮符号化して、差分画像符号化データを生成する。例えば、ＭＰＥＧ符号化方式のブロック単位で、離散コサイン変換（ＤＣＴ）を行い、ＤＣＴされた結果に対して量子化を行うことで差分画像符号化データを生成する。これによって、情報量を削減することが可能になる。
【００１４】
そして、動画像符号化装置は、復号差分画像生成手段によって、差分画像符号化データを復号し、復号画像生成手段によって、その復号された復号差分画像と予測画像とを加算することで、復号側で復号される画像（復号画像）を再生し、復号画像蓄積手段に蓄積する。そして、参照画像探索手段によって、この復号画像蓄積手段に蓄積されている復号画像の中から、符号化対象となる画像（処理対象画像）と相関の高い画像を動き補償予測を行うための参照画像として探索する。
【００１５】
なお、参照画像探索手段は、この参照画像を探索する際に、復号画像を撮影したカメラ情報に基づいて、その相関を判定する。ここでカメラ情報とは、カメラの絶対位置、パン、チルト、ロール、ズーム、フォーカス等のカメラパラメータのことをいう。なお、ここでカメラパラメータを用いるのは、カメラパラメータが等しい画像は、画角が同一で、輝度変化を除けば画像が類似しており、動き補償による予測が当たる可能性を高くすることができるためである。これによって、複数存在する復号画像の中から、カメラ情報によって画像の相関を判定することができるため、画像そのものの類似性を判定する処理動作を軽減することができる。
【００１６】
また、動画像符号化装置は、動き予測手段によって、処理対象画像と参照画像との間の動きベクトルを算出し、動き補償手段によって、その動きベクトルと参照画像とにより、差分画像生成手段で用いられる処理対象画像の動きを予測した予測画像を生成する。これによって、動画像符号化装置は、時系列に連続した画像を動き補償予測によって順次符号化することが可能になる。
【００１７】
そして、動画像符号化装置は、多重化手段によって、参照画像の識別情報と差分画像符号化データと動きベクトルとを多重化して、符号化データを生成する。このように、参照画像の識別情報（参照画像番号）を符号化データに付加することで、符号化データを復号側で復号する際に、参照画像を探索する処理を省くことができる。
【００１８】
また、請求項２に記載の動画像符号化装置は、請求項１に記載の動画像符号化装置において、前記動画像は複数のカメラで撮影され、前記カメラ情報は、前記カメラを識別するためのカメラ識別情報を含むものであって、前記参照画像探索手段が、同一の前記カメラ識別情報に対応付けられている前記復号画像の中から、前記参照画像を探索する構成とした。
【００１９】
かかる構成によれば、動画像符号化装置は、複数のカメラを識別するためのカメラ識別情報（ＩＤ番号）をカメラ情報に付加することで、参照画像探索手段が、複数のカメラを切り替えて撮影した動画像であっても、カメラとそのカメラパラメータを特定することができる。
【００２０】
さらに、請求項３に記載の動画像符号化方法は、時系列に連続した画像で構成される動画像を動き補償予測により符号化するときに、前記動画像を撮影したときのカメラ情報に基づいて、既に符号化され、前記カメラ情報に対応付けられた符号化済み画像の中から動き補償予測に用いる参照画像を選択して符号化を行う動画像符号化方法であって、前記カメラ情報に基づいて、前記符号化済み画像の中から、前記時系列に連続した画像の符号化対象となる処理対象画像と相関の高い画像を参照画像として探索する参照画像探索ステップと、この参照画像探索ステップで探索された参照画像に基づいて、前記処理対象画像に対して動き補償予測を行うことで、前記処理対象画像に対する予測誤差となる差分画像を生成する動き補償予測ステップと、この動き補償予測ステップで生成された差分画像を符号化する画像符号化ステップと、この画像符号化ステップによる符号化結果を前記符号化済み画像として蓄積する符号化済み画像蓄積ステップと、を含むことを特徴とする。
【００２１】
この方法によれば、動画像符号化方法は、参照画像探索ステップで、既に符号化された符号化済み画像の中から、符号化対象となる画像（処理対象画像）と相関の高い画像を、動き補償予測を行うための参照画像として探索する。このとき、参照画像探索ステップは、復号画像を撮影したカメラ情報に基づいて、処理対象画像と相関の高い画像を探索する。ここでカメラ情報には、カメラの絶対位置、パン、チルト、ロール、ズーム、フォーカス等のカメラパラメータや、動画像を複数のカメラで撮影したときのカメラを識別するためのＩＤ番号を含ませることができる。これによって、動き補償による予測が当たる可能性を高くすることができる。
【００２２】
そして、動画像符号化方法は、動き補償予測ステップで、参照画像探索ステップで探索した参照画像に基づいて、処理対象画像に対して動き補償予測を行うことで、処理対象画像に対する予測誤差となる差分画像を生成する。この差分画像は、カメラ情報に基づいて、相関の高い参照画像から生成されたものであるため、予測誤差が小さいものとなる。そして、画像符号化ステップで動き補償予測ステップで生成された差分画像を符号化し、符号化済み画像蓄積ステップで、画像符号化ステップによる符号化結果を符号化済み画像として蓄積する。このように、前記した参照画像探索ステップで探索対象となる符号化済み画像（復号画像）を複数蓄積しておくことで、時系列に連続した画像を効率よく符号化することが可能になる。
【００２３】
また、請求項４に記載の動画像符号化プログラムは、時系列に連続した画像で構成される動画像を、その動画像を撮影したときの前記画像に対応付けられたカメラ情報に基づいて、動き補償予測を行うことにより符号化するために、コンピュータを、以下の手段によって機能させる構成とした。
【００２４】
すなわち、前記画像と動き補償予測を行った予測画像との差分により、差分画像を生成する差分画像生成手段、この差分画像生成手段で生成された差分画像を、特定の大きさのブロック単位で圧縮符号化して、差分画像符号化データを生成する差分画像符号化手段、この差分画像符号化手段で生成された差分画像符号化データを復号して、復号差分画像を生成する復号差分画像生成手段、この復号差分画像生成手段で生成された復号差分画像と前記予測画像とを加算することで復号画像を生成し、復号画像蓄積手段に蓄積する復号画像生成手段、前記カメラ情報に基づいて、前記復号画像蓄積手段に蓄積されている復号画像の中から、符号化対象となる処理対象画像と相関の高い復号画像を探索して参照画像とする参照画像探索手段、前記処理対象画像と前記参照画像とに基づいて、前記処理対象画像の動き予測となる動きベクトルを生成する動き予測手段、この動き予測手段で生成された動きベクトルと前記参照画像とに基づいて、前記処理対象画像の動きを予測した前記予測画像を生成する動き補償手段、前記参照画像を識別するための識別情報と前記差分画像符号化データと前記動きベクトルとを多重化して、符号化データを生成する多重化手段、とした。
【００２５】
かかる構成によれば、動画像符号化プログラムは、差分画像生成手段によって、動画像を構成する個々の画像と、既に符号化を行った符号化済み画像に対して動き補償予測を行うことで予測された予測画像との差分をとることにより差分画像を生成し、復号差分画像生成手段によって、差分画像を特定の大きさのブロック単位で圧縮符号化して、差分画像符号化データを生成する。例えば、ＭＰＥＧ符号化方式のブロック単位で、離散コサイン変換（ＤＣＴ）を行い、ＤＣＴされた結果に対して量子化を行うことで差分画像符号化データを生成する。
【００２６】
そして、動画像符号化プログラムは、復号差分画像生成手段によって、差分画像符号化データを復号し、復号画像生成手段によって、その復号された復号差分画像と予測画像とを加算することで、復号側で復号される画像（復号画像）を再生し、復号画像蓄積手段に蓄積する。そして、参照画像探索手段によって、復号画像を撮影したカメラ情報を参照して、復号画像蓄積手段に蓄積されている復号画像の中から、符号化対象となる画像（処理対象画像）と相関の高い画像を、動き補償予測を行うための参照画像として探索する。ここでカメラ情報には、カメラの絶対位置、パン、チルト、ロール、ズーム、フォーカス等のカメラパラメータや、動画像を複数のカメラで撮影したときのカメラを識別するためのＩＤ番号を含ませることができる。これによって、動き補償による予測が当たる可能性を高くすることができる。
【００２７】
そして、動画像符号化プログラムは、動き予測手段によって、処理対象画像と参照画像との間の動きベクトルを算出し、動き補償手段によって、その動きベクトルと参照画像とにより、差分画像生成手段で用いられる処理対象画像の動きを予測した予測画像を生成し、多重化手段によって、参照画像の識別情報と差分画像符号化データと動きベクトルとを多重化して、符号化データを生成する。
【００２８】
さらに、請求項５に記載の動画像復号装置は、動き補償予測により画像間の差分を符号化した差分画像符号化データと、前記画像間の動きベクトルと、既に復号された復号画像の中で復号を行う画像と相関の高い復号画像を指定した識別情報とを多重化した動画像の符号化データを復号する動画像復号装置であって、
既に復号された復号画像を蓄積する復号画像蓄積手段と、前記符号化データを前記差分画像符号化データと前記動きベクトルと前記識別情報とに分離する分離手段と、前記差分画像符号化データを復号して復号差分画像とする差分画像復号手段と、前記識別情報に基づいて、前記復号画像蓄積手段に蓄積されている復号画像の中から、動きの予測に用いる参照画像を選択する参照画像選択手段と、この参照画像選択手段で選択された参照画像と前記動きベクトルとに基づいて、復号される画像の動きを予測した予測画像を生成する動き補償手段と、この動き補償手段で生成された予測画像と前記復号差分画像とに基づいて、前記復号画像を生成し、前記復号画像蓄積手段に蓄積する復号画像生成手段と、を備える構成とした。
【００２９】
かかる構成によれば、動画像復号装置は、分離手段によって、符号化データに含まれる差分画像符号化データと動きベクトルと識別情報とを分離する。そして、差分画像復号手段によって、差分画像符号化データを復号し復号差分画像とする。この復号は、例えば、差分画像符号化データに対して、逆量子化、逆ＤＣＴを順番に行う。
【００３０】
また、動画像復号装置は、参照画像選択手段によって、復号画像蓄積手段に蓄積されている復号画像の中から、識別情報で指定された動きの予測に用いる参照画像を選択する。これによって、動画像復号装置は、復号画像蓄積手段に蓄積されている復号画像の中から、参照画像を選択するための演算を行わなくても、相関の高い参照画像を選択することができる。
【００３１】
そして、動画像復号装置は、動き補償手段によって、参照画像選択手段で選択された参照画像と動きベクトルとに基づいて、復号される画像の動きを予測した予測画像を生成する。この予測画像は、参照画像選択手段によって、復号される画像と相関の高い参照画像から生成されたものであるため、予測誤差の少ない画像となる。
【００３２】
そして、動画像復号装置は、復号画像生成手段によって、予測画像と復号差分画像とから復号画像を生成するとともに、その復号画像を復号画像蓄積手段に蓄積する。ここで蓄積された復号画像は、参照画像選択手段によって、参照画像を選択するための対象となる。
【００３３】
また、請求項６に記載の動画像復号方法は、動き補償予測により画像間の差分を符号化した差分画像符号化データと、前記画像間の動きベクトルと、既に復号された復号画像の中で復号を行う画像と相関の高い復号画像を指定した識別情報とを含んだ動画像の符号化データを復号する動画像復号方法であって、前記識別情報に基づいて、既に復号され復号画像蓄積手段に蓄積された復号画像の中から、動きの予測に用いる参照画像を選択する参照画像選択ステップと、この参照画像選択ステップで選択された参照画像と前記動きベクトルとに基づいて、復号される画像の動きを予測した予測画像を生成する動き補償ステップと、この動き補償ステップで生成された予測画像に基づいて、前記差分画像符号化データを復号した復号画像を生成する画像復号ステップと、この画像復号ステップで生成された復号画像を前記復号画像蓄積手段に蓄積する復号画像蓄積ステップと、を含むことを特徴とする。
【００３４】
この方法によれば、動画像復号方法は、参照画像選択ステップで、符号化データに含まれる識別情報に基づいて、既に復号され復号画像蓄積手段に蓄積された復号画像の中から、動きの予測に用いる参照画像を選択する。これによって、復号画像蓄積手段に蓄積されている復号画像の中から、参照画像を選択するための演算を行わなくても、相関の高い参照画像を選択することができる。
【００３５】
そして、動画像復号方法は、動き補償ステップで、参照画像と動きベクトルとに基づいて、復号される画像の動きを予測した予測画像を生成する。この予測画像は、参照画像選択ステップで選択された、復号される画像と相関の高い参照画像から生成されたものであるため、予測誤差の少ない画像となる。
【００３６】
そして、動画像復号方法は、画像復号ステップで、予測画像に基づいて差分画像符号化データを復号した復号画像を生成するとともに、復号画像蓄積ステップで、その復号された復号画像を復号画像蓄積手段に蓄積する。ここで蓄積された復号画像は、参照画像選択ステップで参照画像を選択するための対象となる。
【００３７】
さらに、請求項７に記載の動画像復号プログラムは、動き補償予測により画像間の差分を符号化した差分画像符号化データと、前記画像間の動きベクトルと、既に復号された復号画像の中で復号を行う画像と相関の高い復号画像を指定した識別情報とを多重化した動画像の符号化データを復号するために、コンピュータを、以下の手段によって機能させる構成とした。
【００３８】
すなわち、前記符号化データを前記差分画像符号化データと、前記動きベクトルと、前記識別情報とに分離する分離手段、前記差分画像符号化データを復号して復号差分画像とする差分画像復号手段、前記識別情報に基づいて、復号画像蓄積手段に蓄積されている既に復号された復号画像の中から、動きの予測に用いる参照画像を選択する参照画像選択手段、この参照画像選択手段で選択された参照画像と前記動きベクトルとに基づいて、復号される画像の動きを予測した予測画像を生成する動き補償手段、この動き補償手段で生成された予測画像と前記復号差分画像とに基づいて、前記復号画像を生成し、前記復号画像蓄積手段に蓄積する復号画像生成手段、とした。
【００３９】
かかる構成によれば、動画像復号プログラムは、分離手段によって、符号化データに含まれる差分画像符号化データと動きベクトルと識別情報とを分離し、差分画像復号手段によって、差分画像符号化データを復号し復号差分画像とする。
【００４０】
また、動画像復号プログラムは、参照画像選択手段によって、復号画像蓄積手段に蓄積されている復号画像の中から、識別情報で指定された動きの予測に用いる参照画像を選択する。これによって、動画像復号装置は、復号画像蓄積手段に蓄積されている復号画像の中から、参照画像を選択するための演算を行わなくても、最も相関の高い参照画像を選択することができる。
【００４１】
そして、動画像復号プログラムは、動き補償手段によって、参照画像選択手段で選択された参照画像と動きベクトルとに基づいて、復号される画像の動きを予測した予測画像を生成する。この予測画像は、参照画像選択手段によって、相関の高い参照画像から生成されたものであるため、予測誤差の少ない画像となる。
【００４２】
そして、動画像復号プログラムは、復号画像生成手段によって、予測画像と復号差分画像とから復号画像を生成するとともに、その復号画像を復号画像蓄積手段に蓄積する。ここで蓄積された復号画像は、参照画像選択手段によって、参照画像を選択するための対象となる。
【００４３】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して説明する。
［動画像符号化装置の構成］
図１は、本発明における動画像符号化装置１の構成を示したブロック図である。図１に示すように動画像符号化装置１は、時系列に連続した画像である動画像（動画像データ）に対して動き補償予測を行うことによって、動画像を符号化した符号化データを生成するものである。なお、この動画像符号化装置１は、動画像の個々の画像を撮影したときのカメラ情報（パン、チルト等のカメラパラメータや、カメラ識別情報（カメラＩＤ））に基づいて、動画像の動き補償予測を行う際に使用する画像である参照画像を、既に符号化を行った画像（符号化済み画像）の中から選択することを特徴とする。
【００４４】
ここでは、動画像符号化装置１を、動画像記憶手段１０と、差分画像生成手段１１と、差分画像符号化手段１２と、復号差分画像生成手段１３と、復号画像生成手段１４と、画像蓄積切替手段１５と、参照画像探索手段１６と、動き予測手段１７と、動き補償手段１８と、多重化手段１９とを備えて構成した。
【００４５】
動画像記憶手段１０は、符号化対象となる動画像（動画像データ１０ａ）と、動画像を撮影したときの種々のカメラ情報１０ｂとを蓄積しておくものであって、一般的なハードディスク等で構成されているものである。この動画像データ１０ａの各画像とカメラ情報１０ｂとは、例えば、時刻情報（タイムコード）によって対応付けておく。なお、この動画像記憶手段１０は、動画像蓄積サーバとして、ネットワーク上に存在する構成としてもよい。
【００４６】
ここでカメラ情報１０ｂとは、動画像を撮影したときのカメラの絶対位置や、パン、チルト及びロールといったカメラの向きや、ズーム、フォーカス等のレンズ情報を含んだカメラパラメータのことをいう。なお、このカメラパラメータは、現在、放送局で使用されているバーチャルスタジオ用のエンコーダ等を用いて、フレーム（画像）毎に取り込むことが可能である。また、カメラが複数台存在する場合は、カメラを識別するためのカメラ識別情報（カメラＩＤ）を、カメラ情報に含ませるものとする。
【００４７】
差分画像生成手段１１は、動画像記憶手段１０に記憶されている動画像データ１０ａから、時系列に画像を読み込んで、動き補償手段１８で生成された予測画像との差分をとった差分画像を生成するものである。以下、動画像記憶手段１０から読み込まれる個々の画像を処理対象画像と呼ぶこととする。また、この差分画像生成手段１１は、例えば、ＭＰＥＧ符号化方式におけるマクロブロック単位（水平１６画素×垂直１６ライン）で、処理対象画像から予測画像を減算する一般的な減算器で構成することができる。ここで生成された差分画像は差分画像符号化手段１２へ出力される。
【００４８】
差分画像符号化手段１２は、差分画像生成手段１１で生成された差分画像を特定の大きさのブロック単位で圧縮符号化して、差分画像符号化データを生成するものである。この差分画像符号化手段１２は、例えば、ＭＰＥＧ符号化方式のように、差分画像をブロック単位（水平８画素×垂直８ライン）で離散コサイン変換（ＤＣＴ）し、視覚感度の低い高周波成分を大きく削減するように予め設定した量子化テーブルに基づいて量子化することで、差分画像の圧縮符号化を行う。ここで生成された差分画像符号化データは、符号化データを生成するためのデータとして多重化手段１９へ出力されるとともに、動き補償予測を行うために復号差分画像生成手段１３へ出力される。
【００４９】
復号差分画像生成手段１３は、差分画像符号化手段１２で生成された差分画像符号化データを復号して、復号差分画像を生成するものである。この復号差分画像生成手段１３は、例えば、差分画像符号化手段１２で離散コサイン変換（ＤＣＴ）及び量子化された差分画像符号化データに対して、逆量子化、逆ＤＣＴを順番に行うことで、復号差分画像を生成する。なお、この復号差分画像は、差分画像生成手段１１で生成された差分画像を復号した画像に相当する。ここで、生成された復号差分画像は、復号画像生成手段１４へ出力される。
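The transform and quantization round trip performed by the difference image encoding means 12 and the decoded difference image generation means 13 can be sketched as follows. This is a minimal illustration, assuming an orthonormal 8x8 DCT-II and a made-up quantization table (`QTABLE`); the patent does not specify table values, and all function names here are hypothetical.

```python
import math

N = 8  # block size: 8 pixels x 8 lines

def _scale(k):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# BASIS[k][n] = scale(k) * cos((2n + 1) * k * pi / (2N))
BASIS = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]

# Hypothetical quantization table: coarser steps at higher frequencies.
QTABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

def dct2(block):
    """Forward 2-D DCT of an 8x8 block indexed as block[y][x]."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * block[y][x]
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coef):
    """Inverse 2-D DCT, returning a block indexed as block[y][x]."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * coef[u][v]
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]

def encode_block(diff_block):
    """DCT followed by quantization (the role of means 12)."""
    coef = dct2(diff_block)
    return [[round(coef[u][v] / QTABLE[u][v]) for v in range(N)]
            for u in range(N)]

def decode_block(levels):
    """Inverse quantization followed by inverse DCT (the role of means 13)."""
    coef = [[levels[u][v] * QTABLE[u][v] for v in range(N)] for u in range(N)]
    return idct2(coef)
```

Because quantization discards information, `decode_block(encode_block(b))` generally differs slightly from `b`. That is why the encoder predicts from its own locally decoded images rather than from the original input images.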
【００５０】
復号画像生成手段１４は、復号差分画像生成手段１３で生成された復号差分画像と、動き補償手段１８で生成された予測画像とを加算して復号画像を生成するものである。この復号画像生成手段１４は、一般的な加算器で構成することができる。なお、この復号画像は、処理対象画像がマクロブロック数分復号された画像に相当する。ここで生成された復号画像は、画像蓄積切替手段１５へ出力される。
【００５１】
画像蓄積切替手段１５は、復号画像生成手段１４で生成された復号画像を蓄積し、その蓄積された復号画像の中から、参照画像探索手段１６から通知される選択情報に基づいて、動き補償予測を行うための参照画像を切り替えて出力するものである。ここでは、画像蓄積切替手段１５を、復号画像蓄積部１５ａと参照画像切替部１５ｂとで構成した。
【００５２】
復号画像蓄積部（復号画像蓄積手段）１５ａは、復号画像生成手段１４で生成された復号画像を蓄積するもので、例えば、フレームメモリで構成される。この復号画像蓄積部１５ａは、フレームメモリの容量内であればすべての復号画像が蓄積され、フレームメモリの容量を超える場合は、古い復号画像から削除され、逐次新しい復号画像が蓄積される。このように復号画像蓄積部１５ａには、過去に符号化データとして符号化された画像（符号化済み画像）が、復号画像として蓄積されることになる。
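The storage behaviour described above (keep every decoded image while capacity allows, then discard the oldest first) can be sketched as a small first-in, first-out store. The class and method names are hypothetical, and keying images by an integer reference image number is an assumption made for illustration.

```python
from collections import OrderedDict

class DecodedImageStore:
    """Fixed-capacity frame store that drops the oldest decoded image first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._frames = OrderedDict()  # reference image number -> decoded image

    def add(self, number, image):
        """Store a decoded image, evicting the oldest one if the store is full."""
        if number not in self._frames and len(self._frames) >= self.capacity:
            self._frames.popitem(last=False)  # remove the oldest entry
        self._frames[number] = image

    def get(self, number):
        """Return the decoded image for a reference image number."""
        return self._frames[number]

    def numbers(self):
        """Reference image numbers currently held, oldest first."""
        return list(self._frames)
```

The decoder side would need the same capacity and the same eviction order, so that a reference image number written by the encoder always resolves to an image the decoder still holds.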
【００５３】
参照画像切替部１５ｂは、参照画像探索手段１６から通知される選択情報に基づいて、復号画像蓄積部１５ａに蓄積されている復号画像を切り替えて、１つの復号画像を参照画像として出力するものである。なお、この参照画像は、動き予測手段１７及び動き補償手段１８へ出力される。また、参照画像切替部１５ｂは、出力した参照画像を識別するための識別情報（ここでは、参照画像番号とする）を、符号化データを生成するためのデータとして多重化手段１９へ出力する。
【００５４】
参照画像探索手段１６は、処理対象画像に対応付けられている動画像記憶手段１０のカメラ情報１０ｂに基づいて、復号画像蓄積部１５ａに蓄積されている復号画像（符号化済み画像）の中から、最も処理対象画像と相関の高い復号画像を探索するものである。また、どの復号画像を探索したかは、選択情報として画像蓄積切替手段１５へ通知される。ここで探索された復号画像が、動き補償予測を行うための参照画像として用いられる。なお、参照画像探索手段１６のカメラ情報による探索処理の詳細については、動画像符号化装置１の動作の説明で行うこととする。
【００５５】
動き予測手段１７は、入力された処理対象画像と、画像蓄積切替手段１５から出力される参照画像とに基づいて、処理対象画像が参照画像に対してどれくらい動いたかを示す動き予測の方向及び大きさである動きベクトルを生成するものである。この動き予測手段１７は、例えば、従来の画像符号化で用いられているブロックマッチング法によって、ブロック単位で動きベクトルを求める。ここで求められた動きベクトルは、符号化データを生成するためのデータとして多重化手段１９へ出力されるとともに、動き補償手段１８へ出力される。
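Block matching, which the paragraph above names as one way to obtain motion vectors, can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). The cost function, the block size, the search range, and the function names are assumptions for illustration; the patent only states that a conventional block matching method is used.

```python
def sad(target, reference, ty, tx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
               for j in range(size) for i in range(size))

def estimate_motion(target, reference, ty, tx, size=4, search=2):
    """Full search around (ty, tx); returns (dy, dx, cost) of the best match."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = ty + dy, tx + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block would leave the reference image
            cost = sad(target, reference, ty, tx, ry, rx, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best

def predict_block(reference, ty, tx, motion_vector, size=4):
    """Motion compensation: copy the block the motion vector points to."""
    dy, dx = motion_vector
    return [row[tx + dx:tx + dx + size]
            for row in reference[ty + dy:ty + dy + size]]
```

The cost of this search grows with the search range and has to be paid for every candidate reference image. That is the expense the camera-information search is meant to avoid when choosing among many stored images.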
【００５６】
動き補償手段１８は、動き予測手段１７で生成された動きベクトルに基づいて、画像蓄積切替手段１５から出力される参照画像が、その動きベクトル分動いたと予測される予測画像を生成するものである。なお、この予測画像は、現在の処理対象画像の１つ前に入力された画像から予測した画像である。ここで生成された予測画像は、差分画像生成手段１１及び復号画像生成手段１４へ出力される。
【００５７】
多重化手段１９は、差分画像符号化手段１２で生成された差分画像符号化データと、動き予測手段１７で生成された動きベクトルと、画像蓄積切替手段１５で切り替えを行った参照画像の識別情報（参照画像番号）とを、それぞれエントロピ符号化し多重化することで、動画像を符号化した符号化データを生成するものである。
【００５８】
以上の構成によって、動画像符号化装置１は、ＪＶＴ符号化方式のような動き補償予測に基づいて符号化を行うときに、既に過去に符号化データとして符号化された複数の符号化済み画像（復号画像として復号画像蓄積部１５ａに蓄積）から、カメラ情報に基づいて相関の高い画像を参照画像として選択することができるため、参照画像を選択するための計算量を抑えることができる。
なお、動画像符号化装置１は、コンピュータにおいて各手段を各機能プログラムとして実現することも可能であり、各機能プログラムを結合して動画像符号化プログラムとして動作させることも可能である。
【００５９】
［動画像符号化装置の動作］
次に、動画像符号化装置１の動作について説明する。ここでは、動画像符号化装置１をＪＶＴ符号化方式、ＭＰＥＧ符号化方式等におけるＧＯＰ（ＧｒｏｕｐｏｆＰｉｃｔｕｒｅｓ）構造を持つ符号化データを生成するものとして、その動作を説明する。
【００６０】
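Here, the GOP structure is briefly described with reference to FIG. 2. The GOP structure was adopted in MPEG-1 to allow fast-forward, rewind, playback from an intermediate point, reverse playback, and similar operations when encoded data is played back. A GOP groups several pictures of picture data (#1 to #N) into one unit, which makes random access possible in GOP units. The sequence header (SH: Sequence Header) is a header in which information for locating the cue position for random access is written.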
【００６１】
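A GOP contains multiple pictures of picture data (#1 to #N), at least one of which is encoded using only the information within that picture, without motion-compensated prediction (an intra (intra-picture coded) image). The remaining picture data is encoded by motion-compensated prediction between pictures (inter (inter-picture predictive coded) images). Encoded data with a GOP structure is therefore decoded as a moving image on the decoding side using the intra image of each GOP as the starting point.
Next, the operation of the moving image encoding device 1 is described with reference to FIG. 3 (and FIG. 1 as appropriate). FIG. 3 is a flowchart showing the operation of the moving image encoding device 1.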
【００６２】
（準備ステップ）
まず、動画像を撮影した画像そのもの（動画像データ）と、その動画像を撮影したときのカメラ情報を記録する（ステップＳ１）。ここでは、動画像データ及びカメラ情報を動画像記憶手段１０に記憶しておく。このカメラ情報は、カメラの絶対位置、パン、チルト、ロール、ズーム、フォーカス等のカメラパラメータである。また、動画像を撮影したカメラが複数台存在する場合は、カメラ情報にカメラの番号（カメラＩＤ）を含ませる。
【００６３】
そして、動画像符号化装置１の操作者が、符号化処理を行う際の事前設定を図示していない入力手段を介して、動画像符号化装置１に対して行う（ステップＳ２）。例えば、量子化のレベル（量子化値）を入力することで、差分画像符号化手段１２の量子化テーブルを設定（更新）したり、ＧＯＰに含まれる画面数（図２参照）を入力することで、ＧＯＰ間隔を設定する。なお、ここでは、ＧＯＰの先頭の画面（画像）をイントラ画像とする。
以上の準備を行った後に、動画像符号化装置１は、動画像記憶手段１０に記憶されている動画像データ１０ａから時系列に画像（処理対象画像）を読み出して、その処理対象画像を符号化する。
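The GOP setting made in step S2 determines which pictures are later encoded as intra images and which as inter images. A minimal sketch, assuming the GOP interval is given as a number of pictures and that the first picture of each GOP is the intra image, as stated above; the function name and the string labels are hypothetical.

```python
def picture_types(num_pictures, gop_size):
    """Label each picture in input order as 'intra' or 'inter'."""
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    # The first picture of every GOP is intra; the rest are inter.
    return ["intra" if i % gop_size == 0 else "inter"
            for i in range(num_pictures)]
```

For example, `picture_types(7, 3)` labels pictures 0, 3, and 6 as intra images and the others as inter images.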
【００６４】
（画像符号化ステップ；イントラ画像）
まず、動画像符号化装置１は、最初のＧＯＰの先頭画像をイントラ画像として符号化する（ステップＳ３）。このイントラ画像は、動き補償予測を行わずに符号化を行うため、差分画像符号化手段１２において、イントラ画像を符号化（例えば、ＤＣＴ変換）及び量子化され、復号差分画像生成手段１３において、逆量子化及び逆ＤＣＴ変換される。この復号差分画像生成手段１３で生成された復号差分画像が、イントラ画像全体の復号画像となる。すなわち、イントラ画像の復号画像は、差分画像生成手段１１及び復号画像生成手段１４を介さずに生成される。
【００６５】
そして、動画像符号化装置１は、この復号画像を画像蓄積切替手段１５の復号画像蓄積部１５ａに蓄積する。すなわち、過去に符号化データとして符号化された画像（符号化済み画像）が、復号画像蓄積部１５ａに蓄積される（ステップＳ４）。
そして、動画像符号化装置１は、次の画像（処理対象画像）であるインター画像を以下のステップで符号化する。
【００６６】
（参照画像探索ステップ）
動画像符号化装置１は、参照画像探索手段１６によって、処理対象画像に対応したカメラ情報と、復号画像蓄積部１５ａに蓄積されている復号画像（符号化済み画像）に対応したカメラ情報とを、動画像記憶手段１０のカメラ情報１０ｂから読み出し、その各々のカメラ情報に基づいて、処理対象画像と相関の高い復号画像を探索し、相関の最も高い復号画像を参照画像とする（ステップＳ５）。なお、この参照画像は、参照画像探索手段１６の探索結果として通知される選択情報に基づいて、参照画像切替部１５ｂが復号画像蓄積部１５ａに蓄積されている復号画像の出力を切り替えることで、動き予測手段１７及び動き補償手段１８へ出力される。また、その参照画像を識別するための情報（参照画像番号）は、参照画像切替部１５ｂによって、多重化手段１９へ出力される。
【００６７】
（動き補償予測ステップ）
そして、動画像符号化装置１は、参照画像に基づいて、処理対象画像（ここではインター画像）に対してブロック単位で動き補償予測を行う（ステップＳ６）。すなわち、動画像符号化装置１は、動き予測手段１７によって、処理対象画像及び参照画像間の動きベクトルを求め、動き補償手段１８によって、動きベクトルと参照画像とから処理対象画像が動いたと予測される予測画像を生成する。そして、差分画像生成手段１１によって、処理対象画像と予測画像との差分により、動き補償予測の予測誤差となる差分画像を生成する。
【００６８】
（画像符号化ステップ；インター画像）
そして、動画像符号化装置１は、差分画像符号化手段１２によって、差分画像をブロック単位で符号化する（ステップＳ７）。また、差分画像符号化手段１２では、インター画像の１画面の符号化が終了したかどうかを判定し（ステップＳ８）、１画面の符号化が終了していない場合（Ｎｏ）は、ステップＳ６へ戻ってブロック単位での動き補償予測及び符号化を繰り返す。
【００６９】
なお、ここで符号化された差分画像符号化データは、動き予測手段１７で生成された動きベクトルと、参照画像切替部１５ｂから出力される参照画像番号とともに、多重化手段１９によって、それぞれエントロピ符号化され、多重化されて符号化データとして出力される。
【００７０】
（符号化済み画像蓄積ステップ）
一方、１画面の符号化が終了した場合（ステップＳ８でＹｅｓ）は、各ブロックをまとめた１画面分の符号化済み画像を復号画像蓄積部１５ａに蓄積する（ステップＳ９）。すなわち、動画像符号化装置１は、差分画像符号化手段１２によってブロック単位で符号化された差分画像符号化データを、復号差分画像生成手段１３で復号し、その復号された復号差分画像と動き補償手段１８で予測された予測画像とを復号画像生成手段１４で加算する。そして、そこで生成されたブロック数分の復号画像を１画面の符号化済み画像として、復号画像蓄積部１５ａに蓄積する。
【００７１】
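It is then determined whether inter images have been encoded for the whole GOP interval set in step S2 (step S10). If not (No), the process returns to step S5, and the next inter image is encoded based on the reference image that the camera information search found to be highly correlated.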
【００７２】
一方、インター画像をＧＯＰ間隔分符号化した場合（ステップＳ１０でＹｅｓ）は、次の画像（処理対象画像）が存在するかどうかを判定し（ステップＳ１１）、処理対象画像が存在する場合（Ｙｅｓ）は、その画像をイントラ画像（ＧＯＰの先頭画像）に設定して（ステップＳ１２）、ステップＳ３へ戻って、イントラ画像の符号化を行う。また、ステップＳ１１で次の画像（処理対象画像）が存在しない場合（Ｎｏ）は、すべての画像（動画像）の符号化が終了したことになり動作を終了する。
【００７３】
以上の各ステップによって、ＪＶＴ符号化方式のような動き補償予測に基づいて符号化を行うときに、その動き補償予測に用いる参照画像を、カメラ情報に基づいて、処理対象画像と相関の高い画像を選択することができるので、計算量を抑えたままで最適な参照画像を選択することができる。
【００７４】
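[Reference image search operation examples]
Next, operation examples in which the reference image search means 16 searches for the reference image based on camera information are described with reference to FIGS. 4 to 6 (and FIG. 1 as appropriate). FIGS. 4 to 6 are flowcharts of this search operation and illustrate three specific examples. These search operations are concrete forms of step S5 in the flowchart of FIG. 3.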
【００７５】
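(First search operation example)
First, the first search operation example is described with reference to FIG. 4 (and FIG. 1 as appropriate). The reference image search means 16 first selects, from the decoded image storage unit 15a, the decoded images shot by the camera with the same ID number as the camera that shot the processing target image (step S21).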
【００７６】
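Next, the difference in absolute camera position (Pdiff) between the processing target image and each decoded image selected in step S21 is calculated (step S22). For example, when the absolute position of the camera is expressed in three-dimensional coordinates (x, y, z), Pdiff is the sum of the x-coordinate difference, the y-coordinate difference, and the z-coordinate difference.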
【００７７】
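The difference in camera orientation (Ddiff) between the processing target image and each decoded image selected in step S21 is also calculated (step S23). For example, the camera orientation is taken to be the pan, tilt, and roll at the camera's absolute position, and Ddiff is the sum of their respective differences.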
【００７８】
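In addition, the difference in lens information (Ldiff) between the processing target image and each decoded image selected in step S21 is calculated (step S24). For example, the lens information is taken to be the zoom and focus, and Ldiff is the sum of their respective differences.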
【００７９】
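The difference in time information (Tdiff) between the processing target image and each decoded image selected in step S21 is also calculated (step S25). For example, the time information is the shooting time of each image in frame units, and Tdiff is the difference between the two times. The time code recorded when the moving image was shot can be used as this shooting time.
The dissimilarity R1 between the processing target image and each decoded image selected in step S21 is then calculated from equation (1) (step S26).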
【００８０】
Ｒ１＝Ｗｐ×Ｐｄｉｆｆ＋Ｗｄ×Ｄｄｉｆｆ＋Ｗｌ×Ｌｄｉｆｆ＋Ｗｔ×Ｔｄｉｆｆ …（１）式
【００８１】
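Here, Wp, Wd, Wl, and Wt are the weighting coefficients for Pdiff, Ddiff, Ldiff, and Tdiff, respectively. The decoded image for which the dissimilarity R1 obtained from equation (1) is smallest is judged to be the image most highly correlated with the processing target image and is chosen as the reference image (step S27).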
【００８２】
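In this way, in the first search operation example the reference image search means 16 weights the various items of camera information when searching for the reference image. For example, for footage in which the camera is rarely zoomed but cameras are switched frequently, giving the absolute camera position more weight than the lens information makes it possible to obtain a highly correlated reference image.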
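Steps S21 to S27 of the first search operation example can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: each "difference" is taken as an absolute difference, stored images are keyed by a reference image number, and the `CameraInfo` record and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CameraInfo:
    camera_id: int
    position: tuple      # absolute position (x, y, z)
    orientation: tuple   # (pan, tilt, roll)
    lens: tuple          # (zoom, focus)
    time: int            # shooting time in frames (time code)

def abs_diff_sum(a, b):
    """Sum of absolute component-wise differences (an assumed metric)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def dissimilarity_r1(target, other, wp=1.0, wd=1.0, wl=1.0, wt=1.0):
    """Weighted dissimilarity R1 between two sets of camera information."""
    p_diff = abs_diff_sum(target.position, other.position)        # step S22
    d_diff = abs_diff_sum(target.orientation, other.orientation)  # step S23
    l_diff = abs_diff_sum(target.lens, other.lens)                # step S24
    t_diff = abs(target.time - other.time)                        # step S25
    return wp * p_diff + wd * d_diff + wl * l_diff + wt * t_diff  # step S26

def search_reference(target, stored, **weights):
    """Return the number of the stored image with the smallest R1, or None."""
    same_camera = {n: c for n, c in stored.items()
                   if c.camera_id == target.camera_id}            # step S21
    if not same_camera:
        return None  # assumed behaviour; the source does not cover this case
    return min(same_camera,                                       # step S27
               key=lambda n: dissimilarity_r1(target, same_camera[n], **weights))
```

Only a handful of subtractions per stored image are needed, with no pixel comparison at all. In practice the weights would also have to absorb the different units of position, angle, lens value, and time.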
【００８３】
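(Second search operation example)
Next, the second search operation example is described with reference to FIG. 5 (and FIG. 1 as appropriate). The reference image search means 16 first selects, from the decoded image storage unit 15a, the decoded images shot by the camera with the same ID number as the camera that shot the processing target image (step S31).
Then, between the processing target image and each decoded image selected in step S31, the difference in absolute camera position (Pdiff) is calculated (step S32), the difference in camera orientation (Ddiff) is calculated (step S33), and the difference in lens information (Ldiff) is calculated (step S34). Steps S31 to S34 are the same operations as steps S21 to S24 in FIG. 4.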
【００８４】
The dissimilarity R2 between the processing target image and each decoded image selected in step S31 is then calculated from equation (2) (step S35).
【００８５】
Ｒ２＝Ｗｐ×Ｐｄｉｆｆ＋Ｗｄ×Ｄｄｉｆｆ＋Ｗｌ×Ｌｄｉｆｆ …（２）式
【００８６】
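Here, Wp, Wd, and Wl are the weighting coefficients for Pdiff, Ddiff, and Ldiff, respectively. The decoded images whose dissimilarity R2 obtained from equation (2) is at or below a predetermined threshold are then selected (step S36).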
【００８７】
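From the decoded images selected in step S36, the one closest in time within the moving image to the processing target image is chosen as the reference image (step S37).
In this way, the second search operation example narrows down the reference image candidates using camera information and then takes the temporally closest of them as the reference image. Moving image encoding can therefore be realized simply by adding a component or program that narrows down the candidates by camera information to a configuration or program that, like the conventional MPEG encoding methods, uses the temporally closest image as the reference image. This allows the moving image encoding device 1 or the moving image encoding program to be realized while reusing existing resources.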
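Steps S31 to S37 of the second search operation example can be sketched as a threshold filter followed by a nearest-in-time choice. As before, absolute differences, the `CameraInfo` record, the function names, and the `None` result when no candidate passes the threshold are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CameraInfo:
    camera_id: int
    position: tuple      # absolute position (x, y, z)
    orientation: tuple   # (pan, tilt, roll)
    lens: tuple          # (zoom, focus)
    time: int            # shooting time in frames (time code)

def abs_diff_sum(a, b):
    """Sum of absolute component-wise differences (an assumed metric)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def search_reference_thresholded(target, stored, threshold,
                                 wp=1.0, wd=1.0, wl=1.0):
    """Keep images with R2 <= threshold, then pick the nearest in time."""
    candidates = []
    for number, info in stored.items():
        if info.camera_id != target.camera_id:                       # step S31
            continue
        r2 = (wp * abs_diff_sum(target.position, info.position)      # S32
              + wd * abs_diff_sum(target.orientation, info.orientation)  # S33
              + wl * abs_diff_sum(target.lens, info.lens))           # S34, S35
        if r2 <= threshold:                                          # step S36
            candidates.append(number)
    if not candidates:
        return None  # assumed fallback; the source does not cover this case
    return min(candidates,
               key=lambda n: abs(stored[n].time - target.time))      # step S37
```

A loose threshold makes this behave like the conventional choice of the temporally nearest image, while a tight one restricts the choice to images shot with nearly identical camera parameters.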
【００８８】
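(Third search operation example)
Next, the third search operation example is described with reference to FIG. 6 (and FIG. 1 as appropriate). The reference image search means 16 first selects, from the decoded image storage unit 15a, the decoded images shot by the camera with the same ID number as the camera that shot the processing target image (step S41).
Then, from the decoded images selected in step S41, those whose zoom value (zoom parameter) in the lens information differs from that of the processing target image by no more than a predetermined threshold are selected (step S42).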
【００８９】
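Next, from the decoded images selected in step S42, those whose absolute camera position differs from that of the processing target image by no more than a predetermined threshold are selected (step S43).
Further, from the decoded images selected in step S43, those whose camera orientation (pan, tilt, and roll) differs from that of the processing target image by no more than a predetermined threshold are selected (step S44).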
【００９０】
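From the decoded images selected in step S44, the one closest in time within the moving image to the processing target image is chosen as the reference image (step S45).
In this way, the third search operation example searches for the reference image by giving the items of camera information an order of priority: the zoom value of the lens information, then the absolute camera position, then the camera orientation. This allows the candidates to be narrowed down reliably, starting from the item with the highest priority.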
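Steps S41 to S45 of the third search operation example can be sketched as a chain of filters applied in priority order. The threshold parameter names, the use of summed absolute differences for position and orientation, and the `None` result when every candidate is filtered out are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CameraInfo:
    camera_id: int
    zoom: float
    position: tuple      # absolute position (x, y, z)
    orientation: tuple   # (pan, tilt, roll)
    time: int            # shooting time in frames (time code)

def abs_diff_sum(a, b):
    """Sum of absolute component-wise differences (an assumed metric)."""
    return sum(abs(p - q) for p, q in zip(a, b))

def search_reference_cascaded(target, stored, zoom_max, position_max,
                              orientation_max):
    """Filter by camera ID, zoom, position, orientation; then nearest in time."""
    candidates = [n for n, c in stored.items()
                  if c.camera_id == target.camera_id]                 # step S41
    candidates = [n for n in candidates
                  if abs(stored[n].zoom - target.zoom) <= zoom_max]   # step S42
    candidates = [n for n in candidates
                  if abs_diff_sum(stored[n].position, target.position)
                  <= position_max]                                    # step S43
    candidates = [n for n in candidates
                  if abs_diff_sum(stored[n].orientation, target.orientation)
                  <= orientation_max]                                 # step S44
    if not candidates:
        return None  # assumed fallback; the source does not cover this case
    return min(candidates,
               key=lambda n: abs(stored[n].time - target.time))       # step S45
```

Changing the priority order amounts to reordering the three middle filters. With independent fixed thresholds the surviving set is the same in any order, so the order mainly matters if a stage is allowed to stop early or relax its threshold.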
【００９１】
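The above describes operation examples in which the reference image search means 16 searches for the reference image, but various other search operations are possible. For example, the order of priority in the third search operation example can be changed. If the moving image was shot with a single camera, step S21 (FIG. 4), step S31 (FIG. 5), and step S41 (FIG. 6) can simply be omitted from the first to third search operation examples.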
【００９２】
［動画像復号装置の構成］
次に、図７を参照して、動画像復号装置２の構成について説明する。図７は、動画像復号装置２の構成を示したブロック図である。動画像復号装置２は、動画像符号化装置１（図１）で符号化されて生成された符号化データを復号した復号画像（動画像）を出力するものである。
ここでは、動画像復号装置２を、分離手段２１と、差分画像復号手段２２と、復号画像生成手段２３と、動き補償手段２４と、画像蓄積切替手段２５と、参照画像選択手段２６とを備えて構成した。
【００９３】
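The separation means 21 separates and decodes the input encoded data to extract the difference image encoded data, the motion vector, and the reference image number contained in it.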
【００９４】
差分画像復号手段２２は、分離手段２１で抽出された差分画像符号化データを復号して、復号差分画像を生成するものである。この差分画像復号手段２２は、例えば、差分画像符号化データに対して、逆量子化、逆ＤＣＴを順番に行うことで、復号差分画像を生成する。なお、この逆量子化に用いる量子化テーブルは、動画像符号化装置１（図１）と同じものを使用する。
【００９５】
復号画像生成手段２３は、差分画像復号手段２２で生成された復号差分画像と、動き補償手段２４で生成された予測画像とを加算した復号画像を生成するものである。ここで生成された復号画像は、外部に出力されるとともに、画像蓄積切替手段２５へ出力される。なお、この復号画像を時系列に出力することで、動画像符号化装置１（図１）で符号化される前の動画像が再生されることになる。
【００９６】
動き補償手段２４は、分離手段２１で抽出された動きベクトルと、画像蓄積切替手段２５から出力される参照画像とに基づいて、復号されるべき復号画像が動いたと予測される予測画像を生成するものである。この予測画像は、復号画像生成手段２３へ出力される。
【００９７】
画像蓄積切替手段２５は、復号画像生成手段２３で生成された復号画像を蓄積し、その蓄積された復号画像の中から、次に復号する画像と相関の高い画像を、動き補償予測を行うための参照画像として、切り替えて出力するものである。ここでは、画像蓄積切替手段２５を、復号画像蓄積部２５ａと参照画像切替部２５ｂとで構成した。
【００９８】
復号画像蓄積部（復号画像蓄積手段）２５ａは、復号画像生成手段２３で生成された復号画像を蓄積するもので、例えば、フレームメモリで構成される。この復号画像蓄積部２５ａは、フレームメモリの容量内であればすべての復号画像が蓄積され、フレームメモリの容量を超える場合は、古い復号画像から削除され、逐次新しい復号画像が蓄積される。なお、フレームメモリの容量は、動画像符号化装置１（図１）と同容量とする。
【００９９】
参照画像切替部２５ｂは、参照画像選択手段２６から通知される選択情報に基づいて、復号画像蓄積部２５ａに蓄積されている復号画像を切り替えて、１つの復号画像を参照画像として出力するものである。なお、この参照画像は、動き補償手段２４へ出力される。
【０１００】
参照画像選択手段２６は、復号画像蓄積部２５ａに蓄積されている復号画像の中で、分離手段２１で抽出された参照画像番号に対応する復号画像を選択し、選択情報として画像蓄積切替手段２５へ通知して、その復号画像を参照画像として出力させるものである。
【０１０１】
以上の構成によって、動画像復号装置２は、動画像符号化装置１（図１）で生成された符号化データに含まれて指示される参照画像番号に基づいて、既に復号した復号画像の中から参照画像を選択し、その参照画像に基づいて、符号化データを復号するため、参照画像を探索する必要がなく、復号のための計算量を抑えることができる。
【０１０２】
［動画像復号装置の動作］
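Next, the operation of the moving image decoding device 2 is described with reference to FIG. 8 (and FIG. 7 as appropriate). FIG. 8 is a flowchart showing the operation of the moving image decoding device 2. Here, the operation is described on the assumption that the moving image decoding device 2 decodes encoded data with a GOP structure (see FIG. 2) generated by the moving image encoding device 1 (FIG. 1).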
【０１０３】
（画像復号ステップ；イントラ画像）
まず、動画像復号装置２は、分離手段２１で分離された差分画像符号化データを、ＧＯＰの先頭画像であるイントラ画像として復号する（ステップＳ５１）。このイントラ画像は、動き補償予測を行わずに復号を行うため、差分画像復号手段２２において、逆量子化及び逆ＤＣＴ変換された画像が、イントラ画像全体の復号画像となる。そして、動画像復号装置２は、この復号画像を画像蓄積切替手段２５の復号画像蓄積部２５ａに蓄積する（ステップＳ５２）。
【０１０４】
（参照画像選択ステップ）
そして、動画像復号装置２は、分離手段２１で分離された参照画像番号に基づいて、参照画像選択手段２６で復号画像蓄積部２５ａに蓄積されている復号画像を選択する（ステップＳ５３）。なお、ここで選択された参照画像は、参照画像切替部２５ｂが復号画像蓄積部２５ａに蓄積されている復号画像の出力を切り替えることで、動き補償手段２４へ出力される。
【０１０５】
（動き補償ステップ）
そして、動画像復号装置２は、動き補償手段２４によって、画像蓄積切替手段２５から出力された参照画像と、分離手段２１で分離された動きベクトルとから、予測画像を生成する（ステップＳ５４）。
【０１０６】
（画像復号ステップ；インター画像）
この予測画像に基づいて、動画像復号装置２は、インター画像を復号する（ステップＳ５５）。すなわち、分離手段２１で分離された差分画像符号化データを差分画像復号手段２２で復号することで、復号差分画像を生成し、その復号差分画像と、動き補償手段２４で予測された予測画像とを復号画像生成手段２３で加算することで、インター画像を復号した復号画像が生成される。
【０１０７】
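(Decoded image storage step)
The moving image decoding device 2 stores this decoded image in the decoded image storage unit 25a (step S56).
The moving image decoding device 2 then determines whether decoding of all the encoded data is complete (step S57) and, if it is (Yes), ends the operation. If decoding is not complete (No), it determines whether the encoded data input next is an encoded inter image (step S58). If it is an encoded inter image (Yes), the process returns to step S53, and the inter image is decoded using the reference image selected by its reference image number. If it is not an encoded inter image (No in step S58), the process returns to step S51, and an intra image is decoded.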
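The decoding loop of steps S51 to S58 can be sketched with a toy model. The simplifications are assumptions, not part of the patent: images are one-dimensional lists, the motion vector is a single integer shift, the residual is assumed to be already inverse-quantized and inverse-transformed, and the reference image number is taken to be the picture's index in decoding order.

```python
def motion_compensate(reference, mv):
    """Shift a 1-D toy image by mv samples, repeating the edge samples."""
    n = len(reference)
    return [reference[min(max(i + mv, 0), n - 1)] for i in range(n)]

def decode_sequence(pictures):
    """Decode pictures in order; each inter picture names its reference."""
    store = {}    # reference image number -> decoded image
    output = []
    for number, pic in enumerate(pictures):
        if pic["type"] == "intra":
            decoded = list(pic["residual"])                      # step S51
        else:
            reference = store[pic["ref"]]                        # step S53
            predicted = motion_compensate(reference, pic["mv"])  # step S54
            decoded = [p + r for p, r in
                       zip(predicted, pic["residual"])]          # step S55
        store[number] = decoded                                  # steps S52, S56
        output.append(decoded)
    return output
```

The decoder never looks at camera information or compares images. A dictionary lookup by reference image number replaces the whole search, which is the saving claimed for the decoding side.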
【０１０８】
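Through the above steps, the reference image is selected from the already decoded images using the reference image number carried in the encoded data, and the encoded data is decoded based on that reference image. Since no search for the reference image is needed, the amount of calculation required for decoding is kept low.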
【０１０９】
【発明の効果】
以上説明したとおり、本発明に係る動画像符号化装置、その方法及びそのプログラム、並びに、動画像復号装置、その方法及びそのプログラムでは、以下に示す優れた効果を奏する。
【０１１０】
請求項１、請求項３又は請求項４に記載の発明によれば、動き補償予測によって動画像を符号化する際に、その動画像を撮影したときのカメラパラメータによって、既に符号化された画像（符号化済み画像）の中から、最も相関の高い画像を参照画像とするため、誤差の少ない画像を参照画像として選択することができ、符号化効率を高めることができる。また、参照画像を探索するための計算量を軽減することができ、符号化の時間を短縮することができる。
【０１１１】
請求項２に記載の発明によれば、動画像を複数のカメラで撮影したときのカメラを識別する識別情報をカメラ情報に付加し、その識別情報とカメラパラメータとに基づいて、既に符号化された画像（符号化済み画像）の中から、最も相関の高い画像を参照画像として選択するため、カットチェンジが発生する動画像であっても、誤差の少ない画像を参照画像として選択することができるため、符号化効率を高めることができる。
【０１１２】
請求項５、請求項６又は請求項７に記載の発明によれば、個々の画像を動き補償によって予測するときの参照画像を特定する参照画像番号に基づいて、符号化データを復号するため、復号側で既に復号した復号画像の中から相関の高い画像を探索する必要がなく、高速に符号化データを復号することができる。
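[Brief description of the drawings]
[FIG. 1] A block diagram showing the overall configuration of the moving image encoding device according to an embodiment of the present invention.
[FIG. 2] An explanatory diagram of the GOP structure.
[FIG. 3] A flowchart showing the operation of the moving image encoding device according to the embodiment of the present invention.
[FIG. 4] A flowchart showing the first search operation example, in which the moving image encoding device according to the embodiment searches for the reference image.
[FIG. 5] A flowchart showing the second search operation example, in which the moving image encoding device according to the embodiment searches for the reference image.
[FIG. 6] A flowchart showing the third search operation example, in which the moving image encoding device according to the embodiment searches for the reference image.
[FIG. 7] A block diagram showing the overall configuration of the moving image decoding device according to the embodiment of the present invention.
[FIG. 8] A flowchart showing the operation of the moving image decoding device according to the embodiment of the present invention.
[Explanation of reference signs]
1 ... Moving image encoding device
10 ... Moving image storage means
10a ... Moving image data
10b ... Camera information
11 ... Difference image generation means
12 ... Difference image encoding means
13 ... Decoded difference image generation means
14 ... Decoded image generation means
15 ... Image storage switching means
15a ... Decoded image storage unit (decoded image storage means)
15b ... Reference image switching unit
16 ... Reference image search means
17 ... Motion prediction means
18 ... Motion compensation means
19 ... Multiplexing means
2 ... Moving image decoding device
21 ... Separation means
22 ... Difference image decoding means
23 ... Decoded image generation means
24 ... Motion compensation means
25 ... Image storage switching means
25a ... Decoded image storage unit (decoded image storage means)
25b ... Reference image switching unit
26 ... Reference image selection means

[0001]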
[Technical field of the invention]
The present invention relates to moving image encoding and decoding technology, and more particularly to a moving image encoding device, method, and program that encode a moving image by motion-compensated prediction, and to a moving image decoding device, method, and program that decode it.
[0002]
[Prior art]
At present, the methods for compressing and encoding moving images include MPEG-1, MPEG-2, MPEG-4, and other encoding methods standardized by WG11 (Working Group 11) of ISO/IEC JTC1 SC29 (hereinafter referred to as the MPEG encoding methods). These MPEG encoding methods encode a moving image by predicting and compensating for its motion so as to remove its temporal redundancy.
[0003]
In the MPEG encoding methods, motion-compensated prediction is performed on the input image in units of a 16-pixel by 16-line region called a macroblock. In MPEG-2, the difference between the motion-compensated prediction image and the input image is then subjected to a discrete cosine transform (DCT) for each 8-pixel by 8-line region called a block, quantized so as to greatly reduce the high-frequency components to which the eye is less sensitive, and variable-length coded to encode the moving image.
[0004]
When performing motion prediction, the MPEG encoding methods provide forward prediction, which predicts a future image from a past image, and backward prediction, which predicts a past image from a future image. Here, predicting a past image from a future image means predicting, from the current image, an image whose encoding was skipped. Because an image is highly correlated with images that are temporally close to it, forward and backward prediction in the MPEG encoding methods usually use the image immediately before or immediately after the processing target image as the reference image for motion prediction.
[0005]
However, with the MPEG encoding methods, when the camera moves quickly during shooting (for example, in a fast pan or tilt) or when the change between images is large (as in the image immediately after a cut change), the correlation between images is low even for temporally adjacent images, and the advantage of motion-compensated prediction cannot be exploited.
[0006]
Recently, an encoding method called JVT has been proposed as a new moving image encoding method that solves this problem. The JVT encoding method is being standardized by the Joint Video Team, established jointly by the MPEG group and ITU-T (International Telecommunication Union - Telecommunication sector). It is designated MPEG-4 Part 10 and ITU-T H.264, and its standardization is scheduled to be completed during 2002 (see Non-Patent Document 1).
[0007]
The basic framework of the JVT encoding method is the same as that of MPEG-2, but several units of motion-compensated prediction are provided in addition to the macroblock, such as a block of 8 pixels by 8 lines and a rectangular block of 16 pixels by 8 lines. Its other features include a discrete cosine transform (DCT) performed in units of 4 pixels by 4 lines using integer arithmetic only, and the ability to select the image referred to in motion prediction (the reference image) not only from the neighbors of the current image but from among previously encoded images (encoded images).
[0008]
Because the JVT encoding method allows any of the encoded images to be selected as the reference image, motion-compensated prediction can be exploited by selecting, as the reference image, the already encoded image whose error relative to the input image is smallest.
[0009]
[Non-patent document 1]
"Joint Final Committee Draft of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10)", ITU-T | ISO/IEC, JVT-D157, 2002-08-10
[0010]
[Problems to be solved by the invention]
With the JVT encoding method of the conventional technology described above, motion-compensated prediction remains possible when a moving image is compressed and encoded, even if the camera that shot it moves quickly or a cut change occurs, by selecting as the reference image the already encoded image whose error relative to the input image is smallest.
However, selecting the image with the smallest error relative to the input image from among a very large number of already encoded images requires an enormous amount of error calculation, so encoding takes a long time.
[0011]
The present invention has been made in view of this problem. Its object is to provide a moving image encoding device, method, and program that use the camera information recorded when the moving image was shot to take, from among the already encoded images, the image most highly correlated with the input image as the reference image, and thereby generate encoded data with improved encoding efficiency by making effective use of motion-compensated prediction. A further object is to provide a moving image decoding device, method, and program that decode that encoded data.
[0012]
[Means for Solving the Problems]
The present invention was devised to achieve the above object. First, the moving image encoding device according to claim 1 encodes a moving image composed of time-series consecutive images by performing motion-compensated prediction based on camera information associated with each image when the moving image was shot, and comprises: difference image generation means for generating a difference image from the difference between the image and a predicted image obtained by motion-compensated prediction; difference image encoding means for compression-encoding the difference image generated by the difference image generation means in block units of a specific size to generate difference image encoded data; decoded difference image generation means for decoding the difference image encoded data generated by the difference image encoding means to generate a decoded difference image; decoded image generation means for generating a decoded image by adding the decoded difference image generated by the decoded difference image generation means and the predicted image; decoded image storage means for storing the decoded image generated by the decoded image generation means; reference image search means for searching, based on the camera information, the decoded images stored in the decoded image storage means for a decoded image highly correlated with the processing target image to be encoded, and taking it as the reference image; motion prediction means for generating, based on the processing target image and the reference image, a motion vector that is the motion prediction of the processing target image; motion compensation means for generating, based on the motion vector generated by the motion prediction means and the reference image, the predicted image that predicts the motion of the processing target image; and multiplexing means for multiplexing identification information for identifying the reference image, the difference image encoded data, and the motion vector to generate encoded data.
[0013]
According to this configuration, the moving image encoding apparatus uses the difference image generating means to generate a difference image by taking the difference between each image constituting the moving image and a predicted image obtained by motion-compensated prediction from an already-encoded image, and uses the difference image encoding means to compression-encode the difference image in block units of a specific size, thereby generating difference image encoded data. For example, a discrete cosine transform (DCT) is applied to each block as in the MPEG encoding methods, and the DCT result is quantized to generate the difference image encoded data. This makes it possible to reduce the amount of information.
[0014]
The moving image encoding apparatus then decodes the difference image encoded data by the decoded difference image generating means and adds the resulting decoded difference image to the predicted image by the decoded image generating means, thereby reproducing the image that will be reproduced on the decoding side (the decoded image), and stores it in the decoded image storage means. The reference image search means then searches the decoded images stored in the decoded image storage means for an image having a high correlation with the image to be encoded (the processing target image), as the reference image for motion-compensated prediction.
[0015]
When searching for the reference image, the reference image search means determines the correlation based on the camera information of the captured images. Here, the camera information refers to camera parameters such as the absolute position of the camera, pan, tilt, roll, zoom, and focus. Camera parameters are used because images captured with the same camera parameters have the same angle of view and are similar except for changes in luminance, which increases the likelihood that motion-compensated prediction succeeds. The correlation between the processing target image and each of the stored decoded images can thus be determined from the camera information, which reduces the computation otherwise needed to judge similarity from the images themselves.
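The idea of judging correlation from camera information alone can be sketched as follows. This is only an illustration under stated assumptions: the field names, the unweighted Euclidean distance, and the dictionary keyed by reference image number are all hypothetical, and the search procedure actually used by the embodiment is the one given in the description of the operation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraInfo:
    """Camera parameters recorded for one image (field names are illustrative)."""
    position: tuple  # absolute camera position (x, y, z)
    pan: float
    tilt: float
    roll: float
    zoom: float
    focus: float


def camera_distance(a, b):
    """Hypothetical dissimilarity: unweighted Euclidean distance over all parameters."""
    pa = (*a.position, a.pan, a.tilt, a.roll, a.zoom, a.focus)
    pb = (*b.position, b.pan, b.tilt, b.roll, b.zoom, b.focus)
    return math.dist(pa, pb)


def search_reference(target, stored):
    """Return the number of the stored decoded image whose camera info is closest.

    `stored` maps a reference image number to the CameraInfo of that decoded image.
    No pixel data is touched, which is where the saving in computation comes from.
    """
    return min(stored, key=lambda number: camera_distance(target, stored[number]))


target = CameraInfo((0.0, 0.0, 0.0), pan=10.0, tilt=5.0, roll=0.0, zoom=2.0, focus=1.0)
stored = {
    0: CameraInfo((0.0, 0.0, 0.0), pan=40.0, tilt=5.0, roll=0.0, zoom=2.0, focus=1.0),
    1: CameraInfo((0.0, 0.0, 0.0), pan=10.5, tilt=5.0, roll=0.0, zoom=2.0, focus=1.0),
}
best = search_reference(target, stored)
```

In practice the parameters have different units and ranges, so a real implementation would presumably weight or normalize them; the unweighted distance here is a simplification.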
[0016]
Further, the moving image encoding apparatus calculates a motion vector between the processing target image and the reference image by the motion prediction means and, by the motion compensation means, uses the motion vector and the reference image to generate the predicted image, used by the difference image generating means, in which the motion of the processing target image is predicted. The moving image encoding apparatus can thus sequentially encode time-series continuous images by motion-compensated prediction.
[0017]
Then, the moving image encoding apparatus multiplexes the identification information of the reference image, the difference image encoded data, and the motion vector by the multiplexing means to generate encoded data. Because the identification information (reference image number) of the reference image is added to the encoded data in this way, the process of searching for the reference image can be omitted when the encoded data is decoded on the decoding side.
[0018]
The moving image encoding apparatus according to claim 2 is the moving image encoding apparatus according to claim 1, wherein the moving image is captured by a plurality of cameras, the camera information includes camera identification information for identifying each camera, and the reference image search means searches for the reference image among the decoded images associated with the same camera identification information.
[0019]
According to this configuration, because camera identification information (an ID number) identifying each of the plurality of cameras is added to the camera information, the reference image search means can identify the camera and its camera parameters even for a moving image captured while switching among the plurality of cameras.
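Restricting the search to images from the same camera can be sketched as a filter applied before the parameter comparison. The tuple layout, the function names, and the squared-difference measure below are assumptions made for illustration, not details taken from the specification.

```python
def candidates_for_camera(camera_id, stored):
    """Keep only the stored decoded images captured by the given camera.

    `stored` is a list of (reference_number, camera_id, params) tuples, where
    `params` is a tuple of camera parameters such as (pan, tilt, zoom).
    """
    return [entry for entry in stored if entry[1] == camera_id]


def search_reference_same_camera(camera_id, params, stored):
    """Return the closest reference number among images from the same camera.

    Returns None when no stored image comes from that camera. The squared
    difference of the parameters is a hypothetical similarity measure.
    """
    same_camera = candidates_for_camera(camera_id, stored)
    if not same_camera:
        return None

    def squared_difference(entry):
        return sum((p - q) ** 2 for p, q in zip(entry[2], params))

    return min(same_camera, key=squared_difference)[0]


stored = [
    (0, 1, (10.0, 5.0)),  # camera 1
    (1, 2, (10.0, 5.0)),  # camera 2, identical parameters but a different camera
    (2, 1, (30.0, 5.0)),  # camera 1
]
chosen = search_reference_same_camera(1, (12.0, 5.0), stored)
```

Image 1 has the parameters nearest to the query in absolute terms, yet it is never considered because its camera ID differs, which mirrors the restriction stated in claim 2.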
[0020]
Furthermore, the moving image encoding method according to claim 3 is a moving image encoding method that, when encoding a moving image composed of time-series continuous images by motion-compensated prediction, selects, based on camera information at the time the moving image was captured, a reference image used for the motion-compensated prediction from among already-encoded images associated with the camera information, and performs encoding. The method comprises: a reference image search step of searching, based on the camera information, the encoded images for an image having a high correlation with the processing target image to be encoded among the time-series images, as the reference image; a motion-compensated prediction step of performing motion-compensated prediction on the processing target image based on the reference image found in the reference image search step, thereby generating a difference image that is the prediction error for the processing target image; an image encoding step of encoding the difference image generated in the motion-compensated prediction step; and an encoded image accumulating step of accumulating the encoding result of the image encoding step as an encoded image.
[0021]
According to this method, in the reference image search step, an image having a high correlation with the image to be encoded (the processing target image) is searched for among the already-encoded images as the reference image for motion-compensated prediction. In doing so, the reference image search step finds the image having a high correlation with the processing target image based on the camera information recorded when the encoded images were captured. Here, the camera information can include camera parameters such as the absolute position of the camera, pan, tilt, roll, zoom, and focus, as well as an ID number identifying the camera when the moving image is captured by a plurality of cameras. This increases the likelihood that motion-compensated prediction succeeds.
[0022]
Then, in the motion-compensated prediction step, motion-compensated prediction is performed on the processing target image based on the reference image found in the reference image search step, thereby generating a difference image that is the prediction error for the processing target image. Because this difference image is generated from a reference image whose high correlation was established from the camera information, its prediction error is small. The difference image generated in the motion-compensated prediction step is then encoded in the image encoding step, and the encoding result is accumulated as an encoded image in the encoded image accumulating step. By accumulating a plurality of encoded images (decoded images) to be searched in the reference image search step in this way, time-series continuous images can be encoded efficiently.
[0023]
Further, the moving image encoding program according to claim 4 causes a computer to function as the following means in order to encode a moving image composed of time-series continuous images by performing motion-compensated prediction based on camera information associated with each image at the time the moving image was captured.
[0024]
That is, the means are: difference image generating means for generating a difference image from the difference between the image and a predicted image obtained by motion-compensated prediction; difference image encoding means for generating difference image encoded data by compression-encoding the difference image generated by the difference image generating means in block units of a specific size; decoded difference image generating means for decoding the difference image encoded data generated by the difference image encoding means to generate a decoded difference image; decoded image generating means for generating a decoded image by adding the decoded difference image generated by the decoded difference image generating means and the predicted image, and storing the decoded image in decoded image storage means; reference image search means for searching, based on the camera information, the decoded images stored in the decoded image storage means for a decoded image having a high correlation with the processing target image to be encoded, and setting that decoded image as a reference image; motion prediction means for generating, based on the processing target image and the reference image, a motion vector that serves as a motion prediction of the processing target image; motion compensation means for generating, based on the motion vector generated by the motion prediction means and the reference image, the predicted image in which the motion of the processing target image is predicted; and multiplexing means for multiplexing identification information identifying the reference image, the difference image encoded data, and the motion vector to generate encoded data.
[0025]
According to this configuration, the moving image encoding program uses the difference image generating means to generate a difference image by taking the difference between each image constituting the moving image and a predicted image obtained by motion-compensated prediction from an already-encoded image, and uses the difference image encoding means to compression-encode the difference image in block units of a specific size, thereby generating difference image encoded data. For example, a discrete cosine transform (DCT) is applied to each block as in the MPEG encoding methods, and the DCT result is quantized to generate the difference image encoded data.
[0026]
The moving image encoding program then decodes the difference image encoded data by the decoded difference image generating means and adds the resulting decoded difference image to the predicted image by the decoded image generating means, thereby reproducing the image that will be reproduced on the decoding side (the decoded image), and stores it in the decoded image storage means. The reference image search means refers to the camera information of the captured images and searches the decoded images stored in the decoded image storage means for an image having a high correlation with the image to be encoded (the processing target image), as the reference image for motion-compensated prediction. Here, the camera information can include camera parameters such as the absolute position of the camera, pan, tilt, roll, zoom, and focus, as well as an ID number identifying the camera when the moving image is captured by a plurality of cameras. This increases the likelihood that motion-compensated prediction succeeds.
[0027]
Then, the moving image encoding program calculates a motion vector between the processing target image and the reference image by the motion prediction means and, by the motion compensation means, uses the motion vector and the reference image to generate the predicted image, used by the difference image generating means, in which the motion of the processing target image is predicted. The multiplexing means then multiplexes the identification information of the reference image, the difference image encoded data, and the motion vector to generate encoded data.
[0028]
Furthermore, the moving image decoding apparatus according to claim 5 is a moving image decoding apparatus that decodes encoded data of a moving image in which the following are multiplexed: difference image encoded data obtained by encoding the difference between images by motion-compensated prediction; a motion vector between the images; and identification information specifying, among the already-decoded images, a decoded image having a high correlation with the image to be decoded. The apparatus comprises: decoded image storage means for storing already-decoded images; separating means for separating the encoded data into the difference image encoded data, the motion vector, and the identification information; difference image decoding means for decoding the difference image encoded data to obtain a decoded difference image; reference image selection means for selecting, based on the identification information, a reference image used for motion prediction from among the decoded images stored in the decoded image storage means; motion compensation means for generating, based on the reference image selected by the reference image selection means and the motion vector, a predicted image in which the motion of the image to be decoded is predicted; and decoded image generating means for generating the decoded image based on the predicted image generated by the motion compensation means and the decoded difference image, and storing the decoded image in the decoded image storage means.
[0029]
According to such a configuration, the moving image decoding apparatus separates the differential image encoded data, the motion vector, and the identification information included in the encoded data by the separating unit. Then, the differential image encoded data is decoded by the differential image decoding means to obtain a decoded differential image. In this decoding, for example, inverse quantization and inverse DCT are sequentially performed on the difference image encoded data.
[0030]
In the moving image decoding apparatus, the reference image selection means selects the reference image that is specified by the identification information and used for motion prediction from among the decoded images stored in the decoded image storage means. Accordingly, the moving image decoding apparatus can select a reference image having a high correlation from the decoded images stored in the decoded image storage means without performing any computation to choose the reference image.
[0031]
Then, the moving image decoding apparatus generates, by the motion compensation means, a predicted image in which the motion of the image to be decoded is predicted, based on the reference image selected by the reference image selection means and the motion vector. Because the predicted image is generated from a reference image that has a high correlation with the image to be decoded, the predicted image has a small prediction error.
[0032]
Then, the moving image decoding apparatus generates a decoded image from the predicted image and the decoded difference image by the decoded image generation unit, and stores the decoded image in the decoded image storage unit. The decoded image stored here is a target for selecting a reference image by the reference image selection unit.
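The decoding flow described above (select the reference image by its identification information, motion-compensate, add the decoded difference, store the result) can be sketched as below. This is a simplified illustration: images are plain lists of integer rows, the motion vectors are assumed to stay inside the frame, and the rule for numbering newly stored images is a hypothetical choice not specified in the text.

```python
def decode_image(stored, reference_number, vectors, decoded_difference, size):
    """Reconstruct one image on the decoding side.

    stored             : dict mapping reference image number -> decoded image (list of rows)
    reference_number   : identification information carried in the encoded data
    vectors            : dict mapping block origin (bx, by) -> motion vector (dx, dy)
    decoded_difference : decoded difference image, same dimensions as the reference
    size               : block size in pixels
    """
    # Reference image selection: a plain lookup, no correlation search is needed.
    reference = stored[reference_number]
    height, width = len(reference), len(reference[0])
    decoded = [[0] * width for _ in range(height)]

    for (bx, by), (dx, dy) in vectors.items():
        for y in range(size):
            for x in range(size):
                # Motion compensation: fetch the displaced pixel from the reference.
                predicted = reference[by + y + dy][bx + x + dx]
                # Decoded image = predicted image + decoded difference image.
                decoded[by + y][bx + x] = predicted + decoded_difference[by + y][bx + x]

    # Store the result so that it can serve as a reference image later
    # (numbering by "largest number plus one" is an assumption).
    new_number = max(stored) + 1
    stored[new_number] = decoded
    return new_number, decoded


stored = {0: [[1, 2], [3, 4]], 1: [[10, 20], [30, 40]]}
number, image = decode_image(
    stored,
    reference_number=1,
    vectors={(0, 0): (0, 0)},
    decoded_difference=[[1, 1], [1, 1]],
    size=2,
)
```

The point of the sketch is that the decoder never inspects camera information or compares images: the reference image number in the encoded data is sufficient.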
[0033]
The moving image decoding method according to claim 6 is a moving image decoding method for decoding encoded data of a moving image that includes: difference image encoded data obtained by encoding the difference between images by motion-compensated prediction; a motion vector between the images; and identification information specifying, among the already-decoded images, a decoded image having a high correlation with the image to be decoded. The method comprises: a reference image selection step of selecting, based on the identification information, a reference image used for motion prediction from among the decoded images stored in decoded image storage means; a motion compensation step of generating, based on the reference image selected in the reference image selection step and the motion vector, a predicted image in which the motion of the image to be decoded is predicted; an image decoding step of generating, based on the predicted image generated in the motion compensation step, a decoded image in which the difference image encoded data is decoded; and a decoded image accumulating step of accumulating the decoded image generated in the image decoding step in the decoded image storage means.
[0034]
According to this method, in the reference image selection step, a reference image used for motion prediction is selected, based on the identification information included in the encoded data, from among the already-decoded images stored in the decoded image storage means. Thus, a reference image having a high correlation can be selected from among the decoded images stored in the decoded image storage means without performing any computation to choose the reference image.
[0035]
Then, in the moving image decoding method, in the motion compensation step, a predicted image in which the motion of the image to be decoded is predicted is generated based on the reference image and the motion vector. Since the predicted image is generated from the reference image selected in the reference image selection step and having a high correlation with the image to be decoded, the predicted image has a small prediction error.
[0036]
In the moving image decoding method, the image decoding step generates, based on the predicted image, a decoded image in which the difference image encoded data is decoded, and the decoded image accumulating step accumulates the decoded image in the decoded image storage means. The decoded image accumulated here becomes a candidate when a reference image is selected in the reference image selection step.
[0037]
The moving image decoding program according to claim 7 causes a computer to function as the following means in order to decode encoded data of a moving image in which the following are multiplexed: difference image encoded data obtained by encoding the difference between images by motion-compensated prediction; a motion vector between the images; and identification information specifying, among the already-decoded images, a decoded image having a high correlation with the image to be decoded.
[0038]
That is, the means are: separating means for separating the encoded data into the difference image encoded data, the motion vector, and the identification information; difference image decoding means for decoding the difference image encoded data to obtain a decoded difference image; reference image selection means for selecting, based on the identification information, a reference image used for motion prediction from among the already-decoded images stored in decoded image storage means; motion compensation means for generating, based on the reference image and the motion vector, a predicted image in which the motion of the image to be decoded is predicted; and decoded image generating means for generating a decoded image based on the predicted image generated by the motion compensation means and the decoded difference image, and storing the decoded image in the decoded image storage means.
[0039]
According to this configuration, the moving image decoding program separates the difference image encoded data, the motion vector, and the identification information included in the encoded data by the separating means, and decodes the difference image encoded data by the difference image decoding means to obtain a decoded difference image.
[0040]
In the moving image decoding program, the reference image selection means selects the reference image that is specified by the identification information and used for motion prediction from among the decoded images stored in the decoded image storage means. Thus, the moving image decoding program can select a reference image having a high correlation from the decoded images stored in the decoded image storage means without performing any computation to choose the reference image.
[0041]
Then, the moving image decoding program generates, by the motion compensation means, a predicted image in which the motion of the image to be decoded is predicted, based on the reference image selected by the reference image selection means and the motion vector. Because the predicted image is generated from the highly correlated reference image selected by the reference image selection means, the predicted image has a small prediction error.
[0042]
Then, the moving image decoding program generates a decoded image from the predicted image and the decoded difference image by the decoded image generation unit, and stores the decoded image in the decoded image storage unit. The decoded image stored here is a target for selecting a reference image by the reference image selection unit.
[0043]
[Best Mode for Carrying Out the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[Configuration of Video Encoding Apparatus]
FIG. 1 is a block diagram showing the configuration of a moving image encoding apparatus 1 according to the present invention. As shown in FIG. 1, the moving image encoding apparatus 1 performs motion-compensated prediction on a moving image (moving image data), which is a sequence of time-series continuous images, to generate encoded data in which the moving image is encoded. Based on the camera information recorded when each image of the moving image was captured (camera parameters such as pan and tilt, and camera identification information (camera ID)), the moving image encoding apparatus 1 selects the reference image, which is the image used when performing motion-compensated prediction on the moving image, from among the images that have already been encoded (encoded images).
[0044]
Here, the moving image encoding apparatus 1 comprises moving image storage means 10, difference image generating means 11, difference image encoding means 12, decoded difference image generating means 13, decoded image generating means 14, image storage switching means 15, reference image search means 16, motion prediction means 17, motion compensation means 18, and multiplexing means 19.
[0045]
The moving image storage means 10 stores the moving image to be encoded (moving image data 10a) and the various kinds of camera information 10b recorded when the moving image was captured, and is constituted by a storage device. Each image of the moving image data 10a is associated with the camera information 10b by, for example, time information (time code). Note that the moving image storage means 10 may be configured to exist on a network as a moving image storage server.
[0046]
Here, the camera information 10b refers to camera parameters including the absolute position of the camera when capturing a moving image, the camera direction such as pan, tilt, and roll, and lens information such as zoom and focus. The camera parameters can be captured for each frame (image) by using an encoder for a virtual studio currently used in a broadcasting station. If there are a plurality of cameras, camera identification information (camera ID) for identifying the cameras is included in the camera information.
[0047]
The difference image generating means 11 reads images in time series from the moving image data 10a stored in the moving image storage means 10 and generates a difference image by taking the difference from the predicted image generated by the motion compensation means 18. Hereinafter, each image read from the moving image storage means 10 is referred to as a processing target image. The difference image generating means 11 can be constituted by, for example, a general subtractor that subtracts the predicted image from the processing target image in units of macroblocks (16 horizontal pixels × 16 vertical lines) as in the MPEG encoding methods. The difference image generated here is output to the difference image encoding means 12.
[0048]
The difference image encoding means 12 compression-encodes the difference image generated by the difference image generating means 11 in block units of a specific size to generate difference image encoded data. For example, as in the MPEG encoding methods, the difference image encoding means 12 applies a discrete cosine transform (DCT) to the difference image in block units (8 horizontal pixels × 8 vertical lines) and performs quantization based on a preset quantization table so as to greatly reduce the high-frequency components, to which visual sensitivity is low, thereby compression-encoding the difference image. The difference image encoded data generated here is output to the multiplexing means 19 as data for generating the encoded data, and is also output to the decoded difference image generating means 13 for use in motion-compensated prediction.
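The transform-and-quantize step for one 8 × 8 block can be sketched as follows. This is a minimal, unoptimized illustration using an orthonormal DCT-II; the quantization table shown (step size growing with frequency) is a made-up example and not the table of any standard or of the embodiment.

```python
import math

N = 8  # block size: 8 horizontal pixels x 8 vertical lines


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = c(u) * c(v) * total
    return coeffs


def quantize(coeffs, table):
    """Divide each coefficient by its step size and round to an integer level."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)] for u in range(N)]


# Hypothetical quantization table: coarser steps for higher frequencies,
# so that high-frequency components are reduced the most.
QTABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

flat_block = [[16] * N for _ in range(N)]   # a difference block with no detail
levels = quantize(dct_2d(flat_block), QTABLE)
```

For the flat block, all the energy lands in the single DC coefficient and every other level quantizes to zero, which is the kind of sparsity the subsequent variable-length coding exploits.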
[0049]
The decoded difference image generating means 13 decodes the difference image encoded data generated by the difference image encoding means 12 to generate a decoded difference image. For example, the decoded difference image generating means 13 applies inverse quantization and then an inverse DCT to the difference image encoded data that was subjected to the discrete cosine transform (DCT) and quantization by the difference image encoding means 12, and thereby generates the decoded difference image. Note that the decoded difference image corresponds to an image obtained by decoding the difference image generated by the difference image generating means 11. The decoded difference image generated here is output to the decoded image generating means 14.
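The inverse path can be sketched in the same style: multiply each quantized level by its step size, then apply the inverse of an orthonormal DCT-II. The uniform step size used in the example is an assumption chosen to keep the arithmetic easy to follow.

```python
import math

N = 8  # block size


def dequantize(levels, table):
    """Inverse quantization: multiply each integer level by its step size."""
    return [[levels[u][v] * table[u][v] for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse of the orthonormal 2-D DCT-II for an N x N coefficient block."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    block = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            block[y][x] = total
    return block


# Example: only the DC level is non-zero, with a hypothetical uniform step of 8.
levels = [[0] * N for _ in range(N)]
levels[0][0] = 16
table = [[8] * N for _ in range(N)]
decoded_difference = idct_2d(dequantize(levels, table))
```

Because quantization rounds the coefficients, the decoded difference image generally differs slightly from the original difference image; running this same inverse path inside the encoder keeps its stored images identical to what the decoder will reconstruct.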
[0050]
The decoded image generating means 14 generates a decoded image by adding the decoded difference image generated by the decoded difference image generating means 13 and the predicted image generated by the motion compensation means 18. The decoded image generating means 14 can be constituted by a general adder. Note that this decoded image corresponds to the processing target image as decoded, one macroblock's worth at a time. The decoded image generated here is output to the image storage switching means 15.
[0051]
The image storage switching means 15 accumulates the decoded images generated by the decoded image generating means 14 and, based on the selection information notified from the reference image search means 16, switches among the accumulated decoded images and outputs the one to be used as the reference image for motion-compensated prediction. Here, the image storage switching means 15 is composed of a decoded image storage unit 15a and a reference image switching unit 15b.
[0052]
The decoded image storage unit (decoded image storage means) 15a stores the decoded images generated by the decoded image generating means 14, and is configured by, for example, a frame memory. The decoded image storage unit 15a stores every decoded image as long as the capacity of the frame memory allows; once the capacity is exceeded, it deletes the oldest decoded image and stores each new decoded image in turn. In this way, images previously encoded as encoded data (encoded images) are accumulated in the decoded image storage unit 15a as decoded images.
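The storage policy described here (keep everything until the frame memory is full, then drop the oldest image) can be sketched as a small bounded store. The class name, the capacity expressed as a count of images, and the scheme of assigning ever-increasing reference image numbers are assumptions made for illustration.

```python
from collections import OrderedDict


class DecodedImageStore:
    """Bounded store of decoded images that evicts the oldest one when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._frames = OrderedDict()   # reference image number -> decoded image
        self._next_number = 0

    def store(self, image):
        """Store a decoded image and return its reference image number."""
        if len(self._frames) >= self.capacity:
            self._frames.popitem(last=False)   # delete the oldest decoded image
        number = self._next_number
        self._frames[number] = image
        self._next_number += 1
        return number

    def get(self, number):
        """Return the decoded image with the given reference image number."""
        return self._frames[number]

    def numbers(self):
        """Reference image numbers currently available, oldest first."""
        return list(self._frames)


store = DecodedImageStore(capacity=2)
for frame in ("frame A", "frame B", "frame C"):
    store.store(frame)
```

After the third call the store holds only the two most recent images, so a reference image search would be limited to those candidates.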
[0053]
The reference image switching unit 15b switches among the decoded images stored in the decoded image storage unit 15a based on the selection information notified from the reference image search means 16, and outputs one decoded image as the reference image. The reference image is output to the motion prediction means 17 and the motion compensation means 18. Further, the reference image switching unit 15b outputs the identification information (here, the reference image number) identifying the output reference image to the multiplexing means 19 as data for generating the encoded data.
[0054]
The reference image search means 16 searches the decoded images (encoded images) stored in the decoded image storage unit 15a for the decoded image having the highest correlation with the processing target image, based on the camera information 10b in the moving image storage means 10 that is associated with the processing target image. It notifies the image storage switching means 15, as selection information, of which decoded image was found. The decoded image found here is used as the reference image for motion-compensated prediction. The details of the search processing that the reference image search means 16 performs based on the camera information will be given in the description of the operation of the moving image encoding apparatus 1.
[0055]
The motion prediction means 17 generates, based on the input processing target image and the reference image output from the image storage switching means 15, a motion vector, that is, the direction and magnitude of the motion prediction indicating how far the processing target image has moved relative to the reference image. The motion prediction means 17 obtains a motion vector for each block by, for example, the block matching method used in conventional image coding. The motion vector obtained here is output to the multiplexing means 19 as data for generating the encoded data, and is also output to the motion compensation means 18.
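Block matching can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block of the processing target image and displaced blocks of the reference image. The SAD criterion, the small search range, and the sign convention of the vector are assumptions; the text only names block matching in general.

```python
def sad(target, reference, bx, by, dx, dy, size):
    """Sum of absolute differences between a target block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(target[by + y][bx + x] - reference[by + y + dy][bx + x + dx])
    return total


def block_match(target, reference, bx, by, size=4, search=2):
    """Full-search block matching.

    Returns the displacement (dx, dy), within +/- `search` pixels, at which the
    reference image best matches the target block whose top-left corner is (bx, by).
    Candidates that would fall outside the reference image are skipped.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue
            cost = sad(target, reference, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


# Synthetic example: the target is the reference shifted by 2 pixels
# horizontally and 1 line vertically (with wrap-around at the borders).
reference = [[8 * y + x for x in range(8)] for y in range(8)]
target = [[reference[(y + 1) % 8][(x + 2) % 8] for x in range(8)] for y in range(8)]
vector = block_match(target, reference, bx=2, by=2)
```

Full search costs on the order of the block area times the number of candidate displacements per block, which is why choosing a well-correlated reference image cheaply, before this step, matters.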
[0056]
The motion compensation means 18 generates, based on the motion vector generated by the motion prediction means 17, a predicted image in which the reference image output from the image storage switching means 15 is predicted to have moved by the motion vector. Note that the predicted image is an image predicted from an image input immediately before the current processing target image. The predicted image generated here is output to the difference image generating means 11 and the decoded image generating means 14.
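Motion compensation and the subsequent subtraction can be sketched as follows. Images are lists of integer rows, vectors are given per block origin, and every displaced block is assumed to lie inside the reference image; these are simplifications for illustration.

```python
def motion_compensate(reference, vectors, size):
    """Build a predicted image by copying displaced blocks from the reference image.

    `vectors` maps each block origin (bx, by) to its motion vector (dx, dy).
    """
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for (bx, by), (dx, dy) in vectors.items():
        for y in range(size):
            for x in range(size):
                predicted[by + y][bx + x] = reference[by + y + dy][bx + x + dx]
    return predicted


def difference_image(target, predicted):
    """Prediction error: processing target image minus predicted image."""
    return [[t - p for t, p in zip(target_row, predicted_row)]
            for target_row, predicted_row in zip(target, predicted)]


reference = [[4 * y + x for x in range(4)] for y in range(4)]
vectors = {
    (0, 0): (1, 1),     # top-left block taken from one pixel right, one line down
    (2, 0): (0, 0),
    (0, 2): (0, 0),
    (2, 2): (-1, -1),   # bottom-right block taken from one pixel left, one line up
}
predicted = motion_compensate(reference, vectors, size=2)
residual = difference_image(predicted, predicted)
```

When the prediction is perfect, as in the last line, the difference image is all zeros; the better the reference image, the closer a real difference image comes to that ideal.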
[0057]
The multiplexing means 19 entropy-encodes each of the difference image encoded data generated by the difference image encoding means 12, the motion vector generated by the motion prediction means 17, and the identification information (reference image number) of the reference image selected by the image storage switching means 15, and multiplexes them to generate the encoded data in which the moving image is encoded.
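The multiplexing of the three kinds of data can be sketched with a simple fixed-layout container. Entropy coding is deliberately omitted, and the byte layout (a header with the reference image number, the vector count, and the payload length) is entirely hypothetical; it only illustrates that the reference image number travels with the motion vectors and the difference image encoded data so that the decoding side can separate them again.

```python
import struct

# Hypothetical header: reference image number, number of motion vectors,
# byte length of the difference image encoded data (big-endian, no padding).
HEADER = struct.Struct(">HHI")
VECTOR = struct.Struct(">bb")   # one motion vector as two signed bytes (dx, dy)


def multiplex(reference_number, motion_vectors, difference_data):
    """Pack the identification information, motion vectors, and difference data."""
    parts = [HEADER.pack(reference_number, len(motion_vectors), len(difference_data))]
    for dx, dy in motion_vectors:
        parts.append(VECTOR.pack(dx, dy))
    parts.append(difference_data)
    return b"".join(parts)


def demultiplex(data):
    """Inverse operation, corresponding to the separating means of the decoder."""
    reference_number, count, length = HEADER.unpack_from(data, 0)
    offset = HEADER.size
    motion_vectors = []
    for _ in range(count):
        motion_vectors.append(VECTOR.unpack_from(data, offset))
        offset += VECTOR.size
    difference_data = data[offset:offset + length]
    return reference_number, motion_vectors, difference_data


encoded = multiplex(3, [(2, 1), (-1, 0)], b"\x10\x00")
separated = demultiplex(encoded)
```

A real bitstream would entropy-encode each field rather than store fixed-width integers, so the sizes here say nothing about actual coding efficiency.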
[0058]
With the above configuration, when performing encoding based on motion-compensated prediction such as that of the JVT encoding method, the moving image encoding apparatus 1 can select, based on the camera information, an image having a high correlation as the reference image from among the plurality of encoded images already encoded as encoded data (accumulated as decoded images in the decoded image storage unit 15a), so that the amount of computation for selecting the reference image can be reduced.
Each means of the moving image encoding apparatus 1 can also be realized in a computer as a function program, and the function programs can be combined to operate as a moving image encoding program.
[0059]
[Operation of Video Encoding Device]
Next, the operation of the video encoding device 1 will be described. Here, the operation will be described assuming that the moving image encoding apparatus 1 generates encoded data having a GOP (Group of Pictures) structure in a JVT encoding method, an MPEG encoding method, or the like.
[0060]
Here, the GOP structure will be briefly described with reference to FIG. 2. The GOP structure is employed in MPEG-1 in order to allow fast forward, rewind, playback from the middle, reverse playback, and the like when encoded data is played back. A GOP treats several pieces of screen data (#1 to #N) as one unit, and enables random access in GOP units. The sequence header (SH) is a header in which information specifying a cue position for random access is written.
[0061]
The GOP includes a plurality of pieces of screen data (#1 to #N). At least one of them is data encoded using only the information within the screen, without motion compensation prediction (an intra (intra-screen coded) image). The other screen data is data coded by inter-screen motion compensation prediction (inter (inter-screen predictive coded) images). As a result, encoded data having the GOP structure is decoded into a moving image on the decoding side starting from the intra image of each GOP.
Next, the operation of the video encoding device 1 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the operation of the video encoding device 1.
[0062]
(Preparation step)
First, the images themselves (moving image data) obtained by capturing a moving image and the camera information at the time of capture are recorded (step S1). Here, the moving image data and the camera information are stored in the moving image storage unit 10. The camera information consists of camera parameters such as the absolute position of the camera, pan, tilt, roll, zoom, and focus. When the moving image was captured by a plurality of cameras, identification information for each camera (camera ID) is also included in the camera information.
[0063]
Then, the operator of the video encoding device 1 makes the settings required for the encoding process via input means (not shown) (step S2). For example, the operator inputs a quantization level (quantization value) to set (update) the quantization table of the differential image encoding unit 12, or inputs the number of screens included in a GOP (see FIG. 2) to set the GOP interval. Here, the first screen (image) of each GOP is an intra image.
After these preparations, the video encoding device 1 reads out images (processing target images) in time series from the moving image data 10a stored in the moving image storage unit 10 and encodes each processing target image.
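As an illustration of the GOP interval set in step S2, the following is a minimal sketch, assuming the simplest arrangement described above in which only the first image of each GOP is an intra image. The function name `picture_types` and the labels `"intra"` and `"inter"` are hypothetical and do not appear in the original.

```python
def picture_types(num_images, gop_size):
    """Label each image: the first image of every GOP is intra, the rest are inter.

    gop_size plays the role of the GOP interval entered in step S2 (assumed name).
    """
    if gop_size < 1:
        raise ValueError("gop_size must be at least 1")
    return ["intra" if index % gop_size == 0 else "inter"
            for index in range(num_images)]


# Seven images with a GOP interval of three images.
types = picture_types(7, 3)
```

With these inputs, images 0, 3, and 6 are labelled as intra images and the others as inter images.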
[0064]
(Image coding step; Intra image)
First, the video encoding device 1 encodes the first image of the first GOP as an intra image (step S3). Since the intra image is encoded without motion compensation prediction, it is transformed (for example, by DCT) and quantized by the difference image encoding unit 12, and then inversely quantized and inversely transformed (inverse DCT) by the decoded difference image generation unit 13. The decoded difference image generated by the decoded difference image generation unit 13 is then the decoded image of the entire intra image. That is, the decoded image of the intra image is generated without passing through the difference image generation unit 11 and the decoded image generation unit 14.
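The quantization and inverse quantization round trip mentioned above can be sketched as follows. This is only an illustration under stated assumptions: a uniform scalar quantizer with a single step size stands in for the quantization table, the DCT itself is omitted, and the names `quantize` and `dequantize` are hypothetical.

```python
import math


def quantize(coefficients, step):
    """Map each transform coefficient to an integer level (round half up)."""
    return [math.floor(c / step + 0.5) for c in coefficients]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from the integer levels."""
    return [level * step for level in levels]


coefficients = [10, -7, 3]                 # hypothetical transform coefficients
levels = quantize(coefficients, 4)         # what the encoder would transmit
reconstructed = dequantize(levels, 4)      # what encoder and decoder both recover
```

Because quantization discards information, the reconstructed values generally differ from the originals. Storing the decoded image, rather than the original, in the decoded image storage unit 15a keeps the encoder's reference images identical to those the decoder will hold.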
[0065]
Then, the video encoding device 1 stores the decoded image in the decoded image storage unit 15a of the image storage switching unit 15. That is, an image (encoded image) previously encoded as encoded data is stored in the decoded image storage unit 15a (step S4).
Then, the moving image encoding device 1 encodes an inter image as a next image (image to be processed) in the following steps.
[0066]
(Reference image search step)
In the video encoding device 1, the reference image search unit 16 compares the camera information corresponding to the processing target image with the camera information corresponding to each decoded image (encoded image) stored in the decoded image storage unit 15a, searches for decoded images having a high correlation with the processing target image, and sets the decoded image having the highest correlation as the reference image (step S5). The reference image switching unit 15b switches the output of the decoded images stored in the decoded image storage unit 15a based on the selection information notified as the search result of the reference image search unit 16, and outputs the selected reference image to the motion prediction unit 17 and the motion compensation unit 18. The reference image switching unit 15b also outputs the information identifying the reference image (reference image number) to the multiplexing unit 19.
[0067]
(Motion compensation prediction step)
Then, the video encoding device 1 performs motion compensation prediction on the processing target image (here, an inter image) in block units based on the reference image (step S6). That is, the motion prediction unit 17 obtains a motion vector between the processing target image and the reference image, and the motion compensation unit 18 generates, from the motion vector and the reference image, a predicted image that predicts the motion of the processing target image. The difference image generation unit 11 then generates a difference image, which is the prediction error of the motion compensation prediction, as the difference between the processing target image and the predicted image.
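The block-wise flow of step S6 can be sketched as follows. The source does not state how the motion prediction unit 17 finds the motion vector, so this sketch assumes a full-search block matching that minimizes the sum of absolute differences (SAD); images are plain nested lists, and all function names are hypothetical.

```python
def sad(target, reference, bx, by, dx, dy, size):
    """Sum of absolute differences between a target block and a displaced reference block."""
    return sum(
        abs(target[by + y][bx + x] - reference[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def find_motion_vector(target, reference, bx, by, size, search):
    """Full search over displacements in [-search, search]; returns (dx, dy)."""
    height, width = len(reference), len(reference[0])
    best_cost, best_vector = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue  # displaced block would leave the reference image
            cost = sad(target, reference, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    return best_vector


def predict_block(reference, bx, by, vector, size):
    """Motion compensation: copy the displaced block out of the reference image."""
    dx, dy = vector
    return [row[bx + dx:bx + dx + size] for row in reference[by + dy:by + dy + size]]


def difference_block(target, predicted, bx, by, size):
    """Prediction error: target block minus predicted block."""
    return [[target[by + y][bx + x] - predicted[y][x] for x in range(size)]
            for y in range(size)]


# Toy data: the target is the reference shifted one pixel to the right.
reference = [[y * 6 + x for x in range(6)] for y in range(6)]
target = [[reference[y][max(x - 1, 0)] for x in range(6)] for y in range(6)]
vector = find_motion_vector(target, reference, bx=2, by=2, size=2, search=1)
predicted = predict_block(reference, 2, 2, vector, 2)
residual = difference_block(target, predicted, 2, 2, 2)
```

In this toy case the search finds the displacement exactly, so the difference block is all zeros. The better the reference image matches the processing target image, the smaller the difference image that has to be encoded.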
[0068]
(Image coding step; Inter image)
Then, the video encoding device 1 encodes the difference image in block units with the difference image encoding unit 12 (step S7). The difference image encoding unit 12 further determines whether encoding of one screen of the inter image has been completed (step S8). If it has not been completed (No), the process returns to step S6, and the motion compensation prediction and encoding in block units are repeated.
[0069]
Note that the difference image encoded data is entropy-encoded and multiplexed by the multiplexing unit 19 together with the motion vector generated by the motion prediction unit 17 and the reference image number output from the reference image switching unit 15b, and is output as encoded data.
[0070]
(Encoded image storage step)
On the other hand, when the encoding of one screen has been completed (Yes in step S8), the encoded image of one screen, assembled from its blocks, is stored in the decoded image storage unit 15a (step S9). That is, the video encoding device 1 decodes the difference image encoded data, encoded in block units by the difference image encoding unit 12, with the decoded difference image generation unit 13 to obtain a decoded difference image, and the decoded image generation unit 14 adds this decoded difference image to the predicted image generated by the motion compensation unit 18. The decoded images generated in this way for all blocks are stored in the decoded image storage unit 15a as the encoded image of one screen.
[0071]
Here, it is determined whether encoding of the inter images has been completed for the GOP interval set in step S2 (step S10). If it has not been completed (No), the process returns to step S5, and the next inter image is encoded based on the reference image judged from the camera information to have a high correlation.
[0072]
On the other hand, if the inter images have been encoded for the GOP interval (Yes in step S10), it is determined whether a next image (processing target image) exists (step S11). If it exists (Yes), that image is set as an intra image (the first image of the GOP) (step S12), and the process returns to step S3 to encode the intra image. If no next image exists in step S11 (No), encoding of all images (the moving image) has been completed, and the operation ends.
[0073]
Through the above steps, when encoding is performed based on motion compensation prediction such as the JVT encoding method, the reference image used for the motion compensation prediction can be selected, based on the camera information, as an image having a high correlation with the processing target image. A suitable reference image can therefore be selected while suppressing the amount of calculation.
[0074]
[Reference image search operation example]
Next, operation examples in which the reference image search unit 16 searches for a reference image based on camera information will be described with reference to FIGS. 4 to 6. FIGS. 4 to 6 are flowcharts showing operations of searching for a reference image based on camera information, and illustrate three specific operation examples. Each search operation is a specific form of step S5 in the flowchart of FIG. 3.
[0075]
(First search operation example)
First, a first search operation example will be described with reference to FIG. 4 (see FIG. 1 as appropriate). The reference image search unit 16 first selects, from the decoded image storage unit 15a, the decoded images whose camera ID number is the same as that of the camera that captured the processing target image (step S21).
[0076]
Then, a difference (Pdiff) of the absolute position of the camera is calculated between the processing target image and the decoded image selected in step S21 (step S22). For example, when the absolute position of the camera is represented by three-dimensional coordinates (x, y, z), Pdiff is obtained by adding the x coordinate difference, the y coordinate difference, and the z coordinate difference.
[0077]
Further, a difference (Ddiff) in the direction of the camera is calculated between the processing target image and the decoded image selected in step S21 (step S23). For example, the direction of the camera is defined as pan, tilt, and roll at the absolute position of the camera, and Ddiff is obtained by adding the respective differences.
[0078]
Further, a difference (Ldiff) of lens information is calculated between the processing target image and the decoded image selected in step S21 (step S24). For example, the lens information is set to zoom and focus, and Ldiff is obtained by adding the respective differences.
[0079]
Further, a difference (Tdiff) of time information is calculated between the processing target image and the decoded image selected in step S21 (step S25). For example, the time information is set to the shooting time of each frame of the shot image, and the difference is set to Tdiff. Note that a time code at the time of capturing a moving image can be used as the capturing time.
Then, the difference R1 between the processing target image and the decoded image selected in step S21 is calculated based on the equation (1) (step S26).
[0080]
R1 = Wp × Pdiff + Wd × Ddiff + Wl × Ldiff + Wt × Tdiff Equation (1)
[0081]
Here, Wp, Wd, Wl, and Wt indicate weighting factors of Pdiff, Ddiff, Ldiff, and Tdiff, respectively. Then, the decoded image having the smallest difference value R1 obtained by the equation (1) is determined as the image having the highest correlation with the processing target image, and is determined as the reference image (step S27).
[0082]
As described above, in the first search operation example, the reference image search unit 16 weights the various types of camera information when searching for a reference image. Thus, for example, for a moving image in which the camera rarely zooms but camera switching occurs frequently, a reference image having a high correlation can be obtained by giving a larger weight to the absolute position of the camera than to the lens information.
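The first search operation example (steps S21 to S27) can be sketched as follows. This is a minimal illustration, not the apparatus itself: the `CameraInfo` fields, the function names, and the use of absolute values when adding the component differences are assumptions, since the text only says that the differences are added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraInfo:
    camera_id: int
    position: tuple    # absolute position (x, y, z)
    direction: tuple   # (pan, tilt, roll)
    lens: tuple        # (zoom, focus)
    time: float        # shooting time, e.g. derived from a time code


def sum_abs_diff(a, b):
    """Add the per-component differences (absolute values are an assumption)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def difference_r1(target, candidate, wp=1.0, wd=1.0, wl=1.0, wt=1.0):
    """Weighted difference R1 of equation (1)."""
    pdiff = sum_abs_diff(target.position, candidate.position)    # step S22
    ddiff = sum_abs_diff(target.direction, candidate.direction)  # step S23
    ldiff = sum_abs_diff(target.lens, candidate.lens)            # step S24
    tdiff = abs(target.time - candidate.time)                    # step S25
    return wp * pdiff + wd * ddiff + wl * ldiff + wt * tdiff     # step S26


def search_reference_r1(target, stored, wp=1.0, wd=1.0, wl=1.0, wt=1.0):
    """Return the reference image number with the smallest R1, or None if no candidate."""
    same_camera = {n: c for n, c in stored.items()
                   if c.camera_id == target.camera_id}           # step S21
    if not same_camera:
        return None
    return min(same_camera,
               key=lambda n: difference_r1(target, same_camera[n],
                                           wp, wd, wl, wt))      # step S27


target = CameraInfo(1, (0, 0, 0), (0, 0, 0), (1, 1), 10)
stored = {
    0: CameraInfo(1, (5, 0, 0), (0, 0, 0), (1, 1), 9),
    1: CameraInfo(1, (0, 0, 0), (1, 0, 0), (1, 1), 8),
    2: CameraInfo(2, (0, 0, 0), (0, 0, 0), (1, 1), 9.5),
}
best = search_reference_r1(target, stored)
```

With equal weights, image 1 is chosen because its camera position matches the target. Setting `wp` and `wd` to zero would make the shooting time decide and image 0 would be chosen instead, which illustrates how the weights shift the selection.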
[0083]
(Example of second search operation)
Next, a second search operation example will be described with reference to FIG. 5 (see FIG. 1 as appropriate). The reference image search unit 16 first selects, from the decoded image storage unit 15a, the decoded images whose camera ID number is the same as that of the camera that captured the processing target image (step S31).
Then, between the processing target image and each decoded image selected in step S31, the difference (Pdiff) in the absolute position of the camera is calculated (step S32), the difference (Ddiff) in the direction of the camera is calculated (step S33), and the difference (Ldiff) in the lens information is calculated (step S34). Steps S31 to S34 are the same operations as steps S21 to S24 in FIG. 4.
[0084]
Then, a difference R2 between the processing target image and the decoded image selected in step S31 is calculated based on the equation (2) (step S35).
[0085]
R2 = Wp × Pdiff + Wd × Ddiff + Wl × Ldiff Equation (2)
[0086]
Here, Wp, Wd and Wl indicate weighting factors of Pdiff, Ddiff and Ldiff, respectively. Then, a decoded image in which the difference R2 obtained by the above equation (2) is equal to or smaller than a predetermined threshold is selected (step S36).
[0087]
Then, among the decoded images selected in step S36, the one closest in time to the processing target image within the moving image is determined as the reference image (step S37).
As described above, in the second search operation example, the reference image candidates are narrowed down by the camera information, and the candidate closest in time is set as the reference image. This allows the moving image to be encoded with a simple configuration: a component or program that narrows down reference images by camera information is simply added to a conventional configuration or program, such as that of the MPEG encoding method, in which the image closest in time is used as the reference image. The video encoding device 1 or the video encoding program can therefore be realized by reusing existing resources.
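The second search operation example (steps S31 to S37) can be sketched as follows. The dictionary layout of the camera information, the function names, the use of absolute differences, and the behaviour of returning `None` when no candidate passes the threshold are all assumptions made for illustration; the source does not say what happens in that case.

```python
def sum_abs_diff(a, b):
    """Add the per-component differences (absolute values are an assumption)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def difference_r2(target, candidate, wp=1.0, wd=1.0, wl=1.0):
    """Weighted difference R2 of equation (2); time is deliberately not included."""
    pdiff = sum_abs_diff(target["position"], candidate["position"])    # step S32
    ddiff = sum_abs_diff(target["direction"], candidate["direction"])  # step S33
    ldiff = sum_abs_diff(target["lens"], candidate["lens"])            # step S34
    return wp * pdiff + wd * ddiff + wl * ldiff                        # step S35


def search_reference_r2(target, stored, threshold, wp=1.0, wd=1.0, wl=1.0):
    """Narrow down by R2 <= threshold, then pick the candidate closest in time."""
    candidates = {
        number: info for number, info in stored.items()
        if info["camera_id"] == target["camera_id"]                        # step S31
        and difference_r2(target, info, wp, wd, wl) <= threshold           # step S36
    }
    if not candidates:
        return None  # assumed fallback; not specified in the source
    return min(candidates,
               key=lambda n: abs(candidates[n]["time"] - target["time"]))  # step S37


def info(camera_id, position, direction, lens, time):
    return {"camera_id": camera_id, "position": position,
            "direction": direction, "lens": lens, "time": time}


target = info(1, (0, 0, 0), (0, 0, 0), (1, 1), 10)
stored = {
    0: info(1, (0, 0, 0), (0, 0, 0), (1, 1), 3),    # R2 = 0, far in time
    1: info(1, (10, 0, 0), (0, 0, 0), (1, 1), 9),   # R2 = 10, rejected by threshold
    2: info(1, (0, 0, 0), (1, 0, 0), (1, 1), 8),    # R2 = 1, closest in time of the rest
}
best = search_reference_r2(target, stored, threshold=2)
```

Image 1 is the nearest in time but is rejected by the camera-information threshold, so image 2 is selected. This shows how the narrowing step changes the outcome of an otherwise purely time-based choice.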
[0088]
(Third search operation example)
Next, a third search operation example will be described with reference to FIG. 6 (see FIG. 1 as appropriate). The reference image search unit 16 first selects, from the decoded image storage unit 15a, the decoded images whose camera ID number is the same as that of the camera that captured the processing target image (step S41).
Then, a decoded image in which the difference in the zoom value (zoom parameter) of the lens information between the image to be processed and the decoded image selected in step S41 is equal to or smaller than a predetermined threshold is selected (step S42).
[0089]
Then, a decoded image in which the difference in absolute position of the camera between the image to be processed and the decoded image selected in step S42 is equal to or smaller than a predetermined threshold is selected (step S43).
Further, a decoded image in which the difference in the camera direction (pan, tilt, and roll) between the processing target image and the decoded image selected in step S43 is equal to or smaller than a predetermined threshold is selected (step S44).
[0090]
Then, among the decoded images selected in step S44, the one closest in time to the processing target image within the moving image is determined as the reference image (step S45).
As described above, in the third search operation example, the reference image is searched for by giving priority to the camera information in the order of the zoom value of the lens information, the absolute position of the camera, and the direction of the camera. As a result, the reference image candidates can be reliably narrowed down starting from the item with the highest priority.
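The third search operation example (steps S41 to S45) can be sketched as a cascade of threshold filters. The dictionary keys, the `CRITERIA` table, the function names, and the `None` result when every candidate is filtered out are assumptions for illustration only.

```python
def sum_abs_diff(a, b):
    """Add the per-component differences (absolute values are an assumption)."""
    return sum(abs(p - q) for p, q in zip(a, b))


# One difference function per camera-information item (hypothetical names).
CRITERIA = {
    "zoom": lambda t, c: abs(t["zoom"] - c["zoom"]),
    "position": lambda t, c: sum_abs_diff(t["position"], c["position"]),
    "direction": lambda t, c: sum_abs_diff(t["direction"], c["direction"]),
}


def search_reference_cascade(target, stored, thresholds,
                             order=("zoom", "position", "direction")):
    """Filter candidates item by item in priority order, then pick the closest in time."""
    candidates = {n: c for n, c in stored.items()
                  if c["camera_id"] == target["camera_id"]}              # step S41
    for name in order:                                                   # steps S42 to S44
        candidates = {n: c for n, c in candidates.items()
                      if CRITERIA[name](target, c) <= thresholds[name]}
    if not candidates:
        return None  # assumed fallback; not specified in the source
    return min(candidates,
               key=lambda n: abs(candidates[n]["time"] - target["time"]))  # step S45


def info(camera_id, zoom, position, direction, time):
    return {"camera_id": camera_id, "zoom": zoom, "position": position,
            "direction": direction, "time": time}


target = info(1, 2.0, (0, 0, 0), (0, 0, 0), 10)
stored = {
    0: info(1, 2.0, (0, 0, 0), (0, 0, 0), 2),       # passes every filter, far in time
    1: info(1, 4.0, (0, 0, 0), (0, 0, 0), 9),       # rejected by the zoom filter
    2: info(1, 2.2, (0.5, 0, 0), (1, 1, 0), 7),     # passes every filter
    3: info(2, 2.0, (0, 0, 0), (0, 0, 0), 9.9),     # different camera
}
thresholds = {"zoom": 0.5, "position": 1.0, "direction": 5.0}
best = search_reference_cascade(target, stored, thresholds)
```

Because each stage in this sketch is a pure threshold test, the `order` argument changes how many candidates the later stages must examine rather than which candidates survive all stages.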
[0091]
Operation examples in which the reference image search unit 16 searches for a reference image have been described above, but various other search operations are also possible. For example, the third search operation example may be operated with the order of priorities changed. If the moving image was captured by a single camera, step S21 (FIG. 4), step S31 (FIG. 5), and step S41 (FIG. 6) of the first to third search operation examples can be omitted.
[0092]
[Configuration of Video Decoding Device]
Next, the configuration of the video decoding device 2 will be described with reference to FIG. FIG. 7 is a block diagram illustrating a configuration of the video decoding device 2. The moving image decoding device 2 outputs a decoded image (moving image) obtained by decoding encoded data generated by being encoded by the moving image encoding device 1 (FIG. 1).
Here, the video decoding device 2 comprises a separating unit 21, a difference image decoding unit 22, a decoded image generating unit 23, a motion compensating unit 24, an image storage switching unit 25, and a reference image selecting unit 26.
[0093]
The separating unit 21 separates and entropy-decodes the input encoded data to extract the difference image encoded data, the motion vector, and the identification information of the reference image (reference image number) contained in the encoded data.
[0094]
The difference image decoding unit 22 decodes the difference image encoded data extracted by the separation unit 21 to generate a decoded difference image. The difference image decoding means 22 generates, for example, a decoded difference image by sequentially performing inverse quantization and inverse DCT on the encoded difference image data. Note that the same quantization table as used in the moving picture coding apparatus 1 (FIG. 1) is used for the inverse quantization.
[0095]
The decoded image generation unit 23 generates a decoded image by adding the decoded difference image generated by the difference image decoding unit 22 and the predicted image generated by the motion compensation unit 24. The decoded image generated here is output to the outside and to the image accumulation switching means 25. By outputting the decoded image in time series, the moving image before being encoded by the moving image encoding device 1 (FIG. 1) is reproduced.
[0096]
The motion compensating unit 24 generates a predicted image that predicts the motion of the image to be decoded, based on the motion vector extracted by the separating unit 21 and the reference image output from the image accumulation switching unit 25. This predicted image is output to the decoded image generation unit 23.
[0097]
The image accumulation switching unit 25 accumulates the decoded images generated by the decoded image generation unit 23 and, from among the accumulated decoded images, switches to and outputs, as the reference image for motion compensation prediction, an image having a high correlation with the next image to be decoded. Here, the image storage switching unit 25 consists of a decoded image storage unit 25a and a reference image switching unit 25b.
[0098]
The decoded image storage unit (decoded image storage means) 25a stores the decoded images generated by the decoded image generation unit 23 and consists of, for example, a frame memory. The decoded image storage unit 25a stores all decoded images that fit within the capacity of the frame memory; when the capacity would be exceeded, it deletes the oldest decoded images and stores new decoded images in turn. The capacity of the frame memory is the same as that of the video encoding device 1 (FIG. 1).
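The storage behaviour described above, in which the oldest decoded images are discarded once the frame memory is full, can be sketched as follows. The class name `DecodedImageStore`, the capacity counted in images rather than bytes, and the keying by reference image number are assumptions for illustration.

```python
from collections import OrderedDict


class DecodedImageStore:
    """Bounded store of decoded images keyed by reference image number."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity           # number of images the frame memory holds (assumed unit)
        self._images = OrderedDict()       # insertion order = age of each decoded image

    def store(self, number, image):
        """Store a new decoded image, deleting the oldest ones when over capacity."""
        self._images[number] = image
        while len(self._images) > self.capacity:
            self._images.popitem(last=False)   # drop the oldest entry

    def get(self, number):
        """Return the decoded image for a reference image number (KeyError if deleted)."""
        return self._images[number]

    def numbers(self):
        return list(self._images)


store = DecodedImageStore(capacity=2)
for number in range(3):
    store.store(number, f"decoded image {number}")
remaining = store.numbers()
```

Using the same capacity and the same deletion rule on the encoding and decoding sides means that any reference image number chosen by the encoder still refers to an image held by the decoder.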
[0099]
The reference image switching unit 25b switches among the decoded images stored in the decoded image storage unit 25a based on the selection information notified from the reference image selection unit 26, and outputs one decoded image as the reference image. This reference image is output to the motion compensation unit 24.
[0100]
The reference image selection unit 26 selects, from among the decoded images stored in the decoded image storage unit 25a, the decoded image corresponding to the reference image number extracted by the separation unit 21, and notifies the image storage switching unit 25 of the selection as selection information so that the decoded image is output as the reference image.
[0101]
With the above configuration, the video decoding device 2 selects the reference image from among the already decoded images based on the reference image number specified in the encoded data generated by the video encoding device 1 (FIG. 1), and decodes the encoded data based on that reference image. It is therefore unnecessary to search for a reference image, and the amount of calculation for decoding can be suppressed.
[0102]
[Operation of Video Decoding Device]
Next, the operation of the video decoding device 2 will be described with reference to FIG. 8 (see FIG. 7 as appropriate). FIG. 8 is a flowchart illustrating the operation of the video decoding device 2. Here, the operation will be described assuming that the video decoding device 2 decodes encoded data having the GOP structure (see FIG. 2) generated by the video encoding device 1 (FIG. 1).
[0103]
(Image decoding step; Intra image)
First, the video decoding device 2 decodes the differential image encoded data separated by the separating unit 21 as an intra image which is the first image of the GOP (Step S51). Since the intra image is decoded without performing motion compensation prediction, the image that has been inversely quantized and inversely DCT-transformed by the differential image decoding unit 22 becomes a decoded image of the entire intra image. Then, the video decoding device 2 stores the decoded image in the decoded image storage unit 25a of the image storage switching unit 25 (Step S52).
[0104]
(Reference image selection step)
Then, the video decoding device 2 selects the decoded image stored in the decoded image storage unit 25a by the reference image selection unit 26 based on the reference image number separated by the separation unit 21 (Step S53). Note that the reference image selected here is output to the motion compensation unit 24 by the reference image switching unit 25b switching the output of the decoded image stored in the decoded image storage unit 25a.
[0105]
(Motion compensation step)
Then, the moving image decoding apparatus 2 generates a predicted image from the reference image output from the image storage switching unit 25 and the motion vector separated by the separating unit 21 by the motion compensating unit 24 (Step S54).
[0106]
(Image decoding step; Inter image)
The video decoding device 2 decodes the inter image based on the predicted image (step S55). That is, the difference image encoded data separated by the separation unit 21 is decoded by the difference image decoding unit 22 to generate a decoded difference image, and the decoded image generation unit 23 adds the decoded difference image to the predicted image generated by the motion compensation unit 24, thereby generating the decoded image of the inter image.
[0107]
(Decoded image storage step)
The video decoding device 2 stores the decoded image in the decoded image storage unit 25a (Step S56).
Then, the video decoding device 2 determines whether decoding of all the encoded data has been completed (step S57). If decoding has been completed (Yes), the operation ends. If it has not been completed (No), the device determines whether the next input encoded data is an encoded inter image (step S58). If it is an encoded inter image (Yes), the process returns to step S53, and the inter image is decoded using the reference image selected based on the reference image number. If it is not an encoded inter image (No in step S58), the process returns to step S51 to decode an intra image.
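The decoding flow of steps S51 to S58 can be sketched with a toy model. Everything concrete here is an assumption made to keep the sketch short: images are one-dimensional lists of samples, the motion vector is a single integer shift applied to the whole image, the residuals are already inverse-quantized and inverse-transformed, and reference image numbers are taken to be positions in decoding order.

```python
def motion_compensate(reference, shift):
    """Predicted image: the reference displaced by `shift`, clamped at the borders."""
    last = len(reference) - 1
    return [reference[min(max(i + shift, 0), last)] for i in range(len(reference))]


def decode_stream(units):
    """Decode a list of coded units; each unit is a dict produced by a hypothetical separator."""
    store = {}      # reference image number -> decoded image
    output = []
    for number, unit in enumerate(units):
        if unit["type"] == "intra":
            image = list(unit["residual"])                           # step S51
        else:
            reference = store[unit["ref"]]                           # step S53: lookup, no search
            predicted = motion_compensate(reference, unit["mv"])     # step S54
            image = [p + r for p, r in zip(predicted, unit["residual"])]  # step S55
        store[number] = image                                        # steps S52 and S56
        output.append(image)
    return output


units = [
    {"type": "intra", "residual": [10, 20, 30, 40]},
    {"type": "inter", "ref": 0, "mv": 1, "residual": [0, 0, 0, 1]},
    {"type": "inter", "ref": 0, "mv": 0, "residual": [1, 1, 1, 1]},
]
decoded = decode_stream(units)
```

The point of the sketch is the single dictionary lookup in step S53: the decoder never evaluates camera information or compares images, because the encoder has already recorded which stored image to use.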
[0108]
Through the above steps, a reference image is selected from the already decoded images based on the reference image number specified in the encoded data, and the encoded data is decoded based on that reference image. Since it is unnecessary to search for a reference image, the amount of calculation for decoding can be reduced.
[0109]
[Effects of the Invention]
As described above, the moving picture coding apparatus, the method and the program thereof, and the moving picture decoding apparatus, the method and the program according to the present invention have the following excellent effects.
[0110]
According to the first, third, or fourth aspect of the present invention, when a moving image is encoded by motion compensated prediction, the image having the highest correlation among the already encoded images (encoded images) is selected as the reference image based on the camera parameters at the time the moving image was shot. An image with a small prediction error can therefore be selected as the reference image, and the encoding efficiency can be improved. Further, the amount of calculation for searching for a reference image can be reduced, and the encoding time can be shortened.
[0111]
According to the second aspect of the present invention, identification information for identifying each camera when a moving image is captured by a plurality of cameras is added to the camera information, and the image having the highest correlation is selected as the reference image from among the already encoded images (encoded images) based on the identification information and the camera parameters. Therefore, even for a moving image in which cut changes occur, an image with a small prediction error can be selected as the reference image, and the encoding efficiency can be improved.
[0112]
According to the fifth, sixth, or seventh aspect of the present invention, encoded data is decoded based on a reference image number that specifies the reference image used when each image was predicted by motion compensation. The decoding side therefore does not need to search for a highly correlated image among the already decoded images, and the encoded data can be decoded at high speed.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of a video encoding device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram for explaining a GOP structure.
FIG. 3 is a flowchart showing an operation of the video encoding device according to the embodiment of the present invention.
FIG. 4 is a flowchart showing a first search operation example for searching for a reference image in the video encoding device according to the embodiment of the present invention.
FIG. 5 is a flowchart showing a second search operation example for searching for a reference image in the moving picture coding apparatus according to the embodiment of the present invention.
FIG. 6 is a flowchart illustrating a third search operation example for searching for a reference image in the moving picture coding apparatus according to the embodiment of the present invention.
FIG. 7 is a block diagram illustrating an overall configuration of a video decoding device according to an embodiment of the present invention.
FIG. 8 is a flowchart showing an operation of the video decoding device according to the embodiment of the present invention.
[Explanation of symbols]
1 ... Moving picture coding device
10 ... Moving image storage means
10a ... Moving image data
10b ... Camera information
11 ... Difference image generation means
12 ... Difference image encoding means
13 ... Decoded difference image generation means
14 ... Decoded image generation means
15 ... Image storage switching means
15a ... Decoded image storage unit (decoded image storage means)
15b ... Reference image switching unit
16 ... Reference image search means
17 ... Motion prediction means
18 ... Motion compensation means
19 ... Multiplexing means
2 ... Moving picture decoding device
21 ... Separation means
22 ... Difference image decoding means
23 ... Decoded image generation means
24 ... Motion compensation means
25 ... Image storage switching means
25a ... Decoded image storage unit (decoded image storage means)
25b ... Reference image switching unit
26 ... Reference image selection means

Claims

A moving image encoding apparatus that encodes a moving image composed of images continuous in time series by performing motion compensation prediction based on camera information associated with each image at the time the moving image was captured,
A difference image generating unit configured to generate a difference image based on a difference between the image and a predicted image obtained by performing motion compensation prediction;
Difference image encoding means for compressing and encoding the difference image generated by the difference image generation means in block units of a specific size to generate difference image encoded data;
Decoding difference image generation means for decoding the difference image encoded data generated by the difference image encoding means to generate a decoded difference image;
A decoded image generation unit that generates a decoded image by adding the decoded difference image generated by the decoded difference image generation unit and the prediction image;
Decoded image storage means for storing the decoded image generated by the decoded image generation means;
A reference image search unit configured to search, based on the camera information, for a decoded image having a high correlation with a processing target image to be coded from among the decoded images stored in the decoded image storage unit, and to use that decoded image as a reference image;
Based on the processing target image and the reference image, a motion prediction unit that generates a motion vector serving as a motion prediction of the processing target image,
Based on the motion vector generated by the motion prediction unit and the reference image, a motion compensation unit that generates the predicted image in which the motion of the processing target image is predicted,
Multiplexing means for multiplexing the identification information for identifying the reference image, the difference image encoded data, and the motion vector, and generating encoded data;
A moving picture coding apparatus comprising:

The moving image is taken by a plurality of cameras, the camera information includes camera identification information for identifying the camera,
The moving picture coding apparatus according to claim 1, wherein the reference picture search means searches the decoded picture associated with the same camera identification information for the reference picture.

A moving image encoding method that, when encoding a moving image composed of images continuous in time series by motion compensation prediction, selects a reference image used for the motion compensation prediction from among already encoded images associated with camera information, based on the camera information at the time the moving image was captured, and encodes the moving image using that reference image,
Based on the camera information, from among the coded images, a reference image search step of searching as a reference image an image having a high correlation with a processing target image to be coded for the time-series continuous image,
A motion compensation prediction step of performing a motion compensation prediction on the processing target image based on the reference image searched in the reference image search step, thereby generating a difference image serving as a prediction error with respect to the processing target image;
An image encoding step of encoding the difference image generated in the motion compensation prediction step;
An encoded image accumulation step of accumulating an encoding result by the image encoding step as the encoded image,
A moving picture coding method comprising:

In order to encode a moving image composed of images continuous in time series by performing motion compensation prediction based on camera information associated with each image at the time the moving image was captured, a computer is caused to function as the following means,
Difference image generating means for generating a difference image by a difference between the image and a predicted image subjected to motion compensation prediction,
Difference image encoding means for compressing and encoding the difference image generated by the difference image generation means in block units of a specific size to generate difference image encoded data;
Decoded difference image generation means for decoding the difference image encoded data generated by the difference image encoding means to generate a decoded difference image;
Decoded image generation means for generating a decoded image by adding the decoded difference image generated by the decoded difference image generation means and the predicted image, and storing the decoded image in decoded image storage means;
Reference image search means for searching, based on the camera information, among the decoded images stored in the decoded image storage means for a decoded image having a high correlation with a processing target image to be encoded, and using that decoded image as a reference image;
Motion prediction means for generating a motion vector serving as a motion prediction of the processing target image based on the processing target image and the reference image;
Motion compensation means for generating the predicted image, in which the motion of the processing target image is predicted, based on the motion vector generated by the motion prediction means and the reference image;
Multiplexing means for multiplexing identification information for identifying the reference image, the difference image encoded data, and the motion vector to generate encoded data;
A moving image encoding program characterized by causing the computer to function as the above means.
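The chain of means in this claim can be sketched as a toy Python encoder. Several simplifications are assumptions of this sketch and not part of the claim: images are 1-D lists of integers, a motion vector is a single integer shift, the block-unit compression is replaced by plain scalar quantization with an assumed step `Q`, and the multiplexed encoded data is a Python dict. The reference image id `ref_id` is assumed to have already been chosen by the camera-information search.

```python
Q = 4  # assumed quantizer step (stand-in for block-based compression)


def shift(img, mv):
    """Motion compensation on a 1-D toy image: shift by mv samples,
    repeating the edge sample where the shift runs off the image."""
    n = len(img)
    return [img[min(max(i - mv, 0), n - 1)] for i in range(n)]


def estimate_motion(target, ref, search=2):
    """Motion prediction: choose the shift with the smallest sum of
    absolute differences between the target and the shifted reference."""
    def sad(mv):
        return sum(abs(t - p) for t, p in zip(target, shift(ref, mv)))
    return min(range(-search, search + 1), key=sad)


def encode_frame(target, ref_id, store):
    """Encode one image against store[ref_id]; return (encoded data, decoded image)."""
    ref = store[ref_id]
    mv = estimate_motion(target, ref)                    # motion prediction
    pred = shift(ref, mv)                                # motion compensation
    diff = [t - p for t, p in zip(target, pred)]         # difference image
    coded = [round(d / Q) for d in diff]                 # difference image encoding
    decoded_diff = [c * Q for c in coded]                # local decoding
    recon = [p + d for p, d in zip(pred, decoded_diff)]  # decoded image
    packet = {"ref_id": ref_id, "mv": mv, "diff": coded}  # multiplexing
    return packet, recon


store = {0: [0, 0, 10, 20, 10, 0, 0, 0]}       # decoded image storage
target = [0, 0, 0, 10, 20, 10, 0, 0]           # frame 0 moved right by one sample
packet, recon = encode_frame(target, ref_id=0, store=store)
store[1] = recon                                # store the decoded image for later reference
```

The encoder stores the locally decoded image rather than the original input, so that the encoder and a decoder predict from identical reference images.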

A moving image decoding device for decoding encoded data of a moving image in which difference image encoded data obtained by encoding the difference between images by motion compensation prediction, a motion vector between the images, and identification information specifying a decoded image having a high correlation with an image to be decoded among already decoded images are multiplexed,
Decoded image storage means for storing a decoded image already decoded;
Separating means for separating the coded data into the difference image coded data, the motion vector, and the identification information,
Difference image decoding means for decoding the difference image encoded data to make a decoded difference image,
Reference image selection means for selecting, based on the identification information, a reference image used for motion prediction from among the decoded images stored in the decoded image storage means;
Motion compensation means for generating, based on the reference image selected by the reference image selection means and the motion vector, a predicted image in which the motion of the image to be decoded is predicted;
Decoded image generation means for generating the decoded image based on the predicted image generated by the motion compensation means and the decoded difference image, and storing the decoded image in the decoded image storage means;
A moving image decoding device characterized by comprising the above means.
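The decoding side can be sketched in the same toy setting. This is a minimal illustration under stated assumptions, not the claimed implementation: images are 1-D integer lists, the encoded data is a hypothetical dict with keys `ref_id`, `mv`, and `diff`, and difference image decoding is plain dequantization with an assumed step `Q`. Note that the decoder performs no search of its own; it selects the reference image purely from the identification information carried in the encoded data.

```python
Q = 4  # assumed quantizer step; must match the value used by the encoder


def shift(img, mv):
    """Motion compensation on a 1-D toy image: shift by mv samples,
    repeating the edge sample where the shift runs off the image."""
    n = len(img)
    return [img[min(max(i - mv, 0), n - 1)] for i in range(n)]


def decode_frame(packet, store, frame_id):
    """Decode one image from a packet and add it to the decoded image storage."""
    # Separation of the multiplexed data into its three components.
    ref_id, mv, coded = packet["ref_id"], packet["mv"], packet["diff"]
    decoded_diff = [c * Q for c in coded]   # difference image decoding
    ref = store[ref_id]                     # reference image selection by id
    pred = shift(ref, mv)                   # motion compensation
    decoded = [p + d for p, d in zip(pred, decoded_diff)]
    store[frame_id] = decoded               # decoded image storage
    return decoded


store = {0: [0, 0, 10, 20, 10, 0, 0, 0]}
packet = {"ref_id": 0, "mv": 1, "diff": [0, 0, 0, 1, 0, 0, 0, 0]}
decoded = decode_frame(packet, store, frame_id=1)
```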

A moving image decoding method for decoding encoded data of a moving image that includes difference image encoded data obtained by encoding the difference between images by motion compensation prediction, a motion vector between the images, and identification information specifying a decoded image having a high correlation with an image to be decoded among already decoded images,
A reference image selection step of selecting, based on the identification information, a reference image used for motion prediction from among the already decoded images stored in decoded image storage means;
A motion compensation step of generating a predicted image in which the motion of the image to be decoded is predicted, based on the reference image selected in the reference image selection step and the motion vector;
An image decoding step of generating a decoded image obtained by decoding the difference image encoded data based on the predicted image generated in the motion compensation step;
A decoded image storage step of storing the decoded image generated in the image decoding step in the decoded image storage means;
A moving image decoding method characterized by comprising the above steps.

In order to decode encoded data of a moving image in which difference image encoded data obtained by encoding the difference between images by motion compensation prediction, a motion vector between the images, and identification information specifying a decoded image having a high correlation with an image to be decoded among already decoded images are multiplexed, a computer is caused to function as:
Separating means for separating the coded data into the difference image coded data, the motion vector, and the identification information,
Difference image decoding means for decoding the difference image encoded data to obtain a decoded difference image;
Reference image selection means for selecting, based on the identification information, a reference image used for motion prediction from among the already decoded images stored in decoded image storage means;
Motion compensation means for generating a predicted image in which the motion of the image to be decoded is predicted, based on the reference image selected by the reference image selection means and the motion vector;
Decoded image generation means for generating the decoded image based on the predicted image generated by the motion compensation means and the decoded difference image, and storing the decoded image in the decoded image storage means;
A moving image decoding program characterized by causing the computer to function as the above means.