JP2005303738A

JP2005303738A - Image processing apparatus

Info

Publication number: JP2005303738A
Application number: JP2004118231A
Authority: JP
Inventors: Osamu Itokawa; 修糸川
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2004-04-13
Filing date: 2004-04-13
Publication date: 2005-10-27

Abstract

PROBLEM TO BE SOLVED: To provide a high-efficiency encoding technique having high versatility without requiring preprocessing for setting a background image. SOLUTION: An image processing apparatus has a moving image inputting means (2101) for inputting a moving image consisting of a plurality of frames; an image selecting means (101) for selecting a frame to be a background, irrespective of the presence or absence of an object in an image in selecting the frame to be the background image from among the moving images; a shape data generating means (103) for comparing the frame to be the background image with a frame of an input image, and generating shape data on the basis of the difference value; a background data correcting means (104) for correcting the background image on the basis of the shape data; and an arbitrary shape image encoding means (2105) for encoding the input image as an arbitrary shape image together with the shape image. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to image processing technology, and more particularly to image processing technology for encoding moving images.

In recent years, processing that uses digital technology to separate and composite images on a per-object basis has attracted attention. For moving image coding in particular, the MPEG-4 coding scheme has been standardized as an international standard. MPEG-4 can handle image data of arbitrary shape, and by encoding and decoding each object separately it is expected to enable applications that were previously difficult, such as improved coding efficiency, data allocation matched to the transmission path, and re-editing of images.

A technique known as the background difference method is commonly used to extract objects in moving image processing. It detects changed points by comparing a background image captured in advance with the actual input image. Its principle is briefly described below.

First, let Pc(x, y) be the pixel value of the input image at coordinates (x, y) on the image plane, and let Pb(x, y) be the pixel value of the background image at the same point. The difference between Pc(x, y) and Pb(x, y) is taken, and its absolute value is compared with a threshold Th.

An example of the decision rule is as follows.
if (|Pc(x, y) - Pb(x, y)| ≤ Th) S(x, y) = 0;
else S(x, y) = 1; ... (1)

When the absolute difference is less than or equal to the threshold Th, the point (x, y) is regarded as unchanged, Pc(x, y) is judged to be background, and S(x, y) = 0. When the absolute difference exceeds Th, the value is regarded as having changed, the point is treated as part of the extraction target, and S(x, y) = 1. Applying this decision to every point on the screen completes the extraction for one frame.
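The decision rule (1) can be sketched in a few lines of Python. This is only an illustration under assumptions: frames are taken to be single-component images stored as lists of rows, and the function name `background_difference` and the sample values are hypothetical, not part of the described apparatus.

```python
def background_difference(current, background, th):
    """Return binary shape data S for one frame, following decision rule (1)."""
    shape = []
    for row_c, row_b in zip(current, background):
        # S = 0 (background) when |Pc - Pb| <= Th, otherwise S = 1 (object).
        shape.append([0 if abs(pc - pb) <= th else 1
                      for pc, pb in zip(row_c, row_b)])
    return shape


# Hypothetical 2 x 3 frames: the middle column has changed noticeably.
background = [[52, 52, 52], [52, 52, 52]]
current = [[50, 120, 52], [53, 125, 51]]
shape = background_difference(current, background, th=10)
```

With a threshold of 10, small fluctuations such as 52 versus 50 stay classified as background, while the changed column is marked as object.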

FIG. 15 is a block diagram showing the configuration of a conventional system that combines the background difference method with MPEG-4 coding. In FIG. 15, the image input unit 2101 is, for example, the imaging unit of a camera and inputs a moving image. The image separation unit 2102 is a switch circuit that switches between frames to be processed as the background image and frames to be processed as arbitrary shape images. A frame set as the background image is encoded, as a single frame, by the rectangular image encoding unit 2104. The shape data generation unit 2103 generates shape data S(x, y) by comparing that background image with the currently input image. The shape data S(x, y) is generally binary data indicating whether or not each pixel belongs to an object. The arbitrary shape image encoding unit 2105 takes the image data and the shape data as input and outputs the encoded result as a bitstream. The multiplexing unit 2106 multiplexes the two illustrated bitstreams, for the rectangular image and the arbitrary shape image, together with others such as an audio bitstream, into a single data stream.

FIG. 17 illustrates the block diagram of FIG. 15 more concretely. Frames 2301 to 2318 are the sequence of frames input from the image input unit 2101. The leading frame 2301 shows only the background, and frame 2311 and later frames also show the extraction target. The image separation unit 2102 routes frame 2301 to background image processing and frame 2311 onward to arbitrary shape image processing. The simplest and most reliable way to perform this switching is to operate it manually while watching the input image.

The actual flow of data processing will be described with reference to FIG. 19. Assume that image 2500 corresponds to the background image 2301 and image 2501 corresponds to the arbitrary shape image 2311. Images 2500 and 2501 are input to the difference processing unit 2351, which applies the background difference method described above and outputs data binarized according to whether the difference between corresponding pixels is less than or equal to the threshold. Shape data 2511 is the resulting binary shape data; the black portion indicates the background and the white portion indicates the object. Similarly, if images 2502 and 2503 correspond to frames 2312 and 2313, respectively, difference threshold processing against the background image 2500 yields shape data 2512 and 2513.

In FIG. 17, an MPEG-4 Core Profile encoder 2353 is used for arbitrary shape image coding. This coding scheme is described below.

To encode an object, information on its shape and position must be encoded. To do so, a rectangular region enclosing the object is first set, and the coordinates of the upper-left corner of this rectangle and the size of the rectangular region are encoded. This rectangular region is called a bounding box. The region inside the object, as represented by the image data and the shape data, is called a VOP (Video Object Plane).
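As a rough illustration of the bounding box described above, the following Python sketch derives the upper-left corner and size of the smallest rectangle enclosing all object pixels of a binary shape mask. The function name `bounding_box` and the sample mask are hypothetical, and any alignment of the box to macroblock boundaries that an actual encoder may require is deliberately left out.

```python
def bounding_box(shape):
    """Return (left, top, width, height) enclosing all S == 1 pixels, or None."""
    xs = [x for row in shape for x, s in enumerate(row) if s]
    ys = [y for y, row in enumerate(shape) if any(row)]
    if not xs:
        return None  # no object pixel in this frame
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left + 1, max(ys) - top + 1


# Hypothetical 4 x 4 shape data with a small object near the centre.
shape = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = bounding_box(shape)
```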

FIG. 21 is a block diagram showing the detailed configuration of the encoding unit 2353 of FIG. 17. The input data are the luminance and color difference data of the image and the shape data, and they are processed in units of macroblocks.

First, in intra mode, each block is subjected to a discrete cosine transform (DCT) in the DCT unit 2701 and quantized in the quantization unit 2702. The quantized DCT coefficients and the quantization step size are variable-length coded by the variable length encoding unit 2712. To generate the reference image used in inter mode, the quantized data is also passed through the inverse quantization unit 2703 and the inverse DCT unit 2704 and restored to image data. This is also called the locally decoded image, and it is stored in the memory unit 2705.
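The intra-mode path (DCT, quantization, then inverse quantization and inverse DCT for local decoding) can be sketched as follows. This is a simplified illustration, not the MPEG-4 procedure itself: it assumes an 8 x 8 block, an orthonormal DCT-II, and a plain uniform quantizer with a single step size, and all function names are hypothetical.

```python
import math

N = 8  # block size assumed in this sketch


def _scale(k):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(n, k):
    """DCT-II basis function of frequency k sampled at position n."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an N x N block; the result is indexed [v][u]."""
    return [[_scale(u) * _scale(v)
             * sum(block[y][x] * _basis(x, u) * _basis(y, v)
                   for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]


def idct2(coef):
    """Inverse 2-D DCT; inverts dct2 up to floating-point rounding."""
    return [[sum(_scale(u) * _scale(v) * coef[v][u] * _basis(x, u) * _basis(y, v)
                 for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]


def quantize(coef, step):
    """Uniform quantization (an assumption; real codecs use more elaborate rules)."""
    return [[round(c / step) for c in row] for row in coef]


def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]


def local_decode(block, step):
    """Return the quantized levels and the locally decoded block."""
    levels = quantize(dct2(block), step)
    return levels, idct2(dequantize(levels, step))


# Hypothetical test block with a smooth gradient.
block = [[(x * 5 + y * 3) % 256 for x in range(N)] for y in range(N)]
levels, recon = local_decode(block, step=8)
max_error = max(abs(recon[y][x] - block[y][x])
                for y in range(N) for x in range(N))
```

The locally decoded block differs from the input only by quantization error, which is why the encoder keeps it, rather than the original, as the reference for inter mode: the decoder will see exactly the same data.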

In inter mode, on the other hand, the motion detection unit 2707 detects motion relative to another temporally adjacent VOP stored in the memory unit 2705, using a motion detection method such as block matching, and the motion vector prediction unit 2708 finds the predicted macroblock with the smallest error with respect to the target macroblock. The data indicating the motion to the predicted macroblock with the smallest error is the motion vector. The image referred to in order to generate the predicted macroblock is called the reference VOP.
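Block matching, named above as one motion detection method, can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). The error measure, the full-search strategy, the function names, and the test pattern are all assumptions made for illustration; the text does not specify them.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the target block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def block_match(cur, ref, bx, by, size, search):
    """Full search: return (dx, dy, error) of the displacement with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference image.
            if not (0 <= bx + dx and bx + dx + size <= w
                    and 0 <= by + dy and by + dy + size <= h):
                continue
            err = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or err < best[2]:
                best = (dx, dy, err)
    return best


# Hypothetical 8 x 8 images: a 2 x 2 patch moves from (2, 2) to (4, 3).
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
patch = [[10, 20], [30, 40]]
for j in range(2):
    for i in range(2):
        ref[2 + j][2 + i] = patch[j][i]
        cur[3 + j][4 + i] = patch[j][i]
best = block_match(cur, ref, bx=4, by=3, size=2, search=2)
```

Here the returned displacement points from the target block position in the current image to the best-matching block in the reference image.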

Based on the detected motion vector, the motion compensation unit 2706 performs motion compensation on the reference VOP to obtain the optimal predicted macroblock. Next, the difference between the target macroblock and the corresponding predicted macroblock is computed, the DCT unit 2701 applies the DCT to this difference image, and the quantization unit 2702 quantizes the DCT coefficients. In this case too, the quantized data is passed through the inverse quantization unit 2703 and the inverse DCT unit 2704 and restored to image data. Because the output of the inverse DCT unit 2704 is then a difference image, it is combined with the previous image before being stored in the memory unit 2705.

The shape data, meanwhile, is encoded by the shape encoding CAE unit 2709. CAE coding is actually performed only for boundary blocks. For blocks inside the VOP (all data in the block lies inside the object) and blocks outside the VOP (all data in the block lies outside the object), only header information is sent to the variable length encoding unit 2712. As with the image data, in inter mode the boundary blocks subject to CAE coding undergo motion detection by the motion detection unit 2707 and motion vector prediction by the motion vector prediction unit 2708. CAE coding is then applied to the difference between the motion-compensated shape data and the shape data of the previous frame.
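The distinction between inside, outside, and boundary blocks can be illustrated with a short Python sketch that labels each block of a binary shape mask. The function name `classify_blocks` and the label strings are hypothetical, and the sketch assumes the mask dimensions are exact multiples of the block size; a small block size is used only to keep the sample readable.

```python
def classify_blocks(shape, size):
    """Label each size x size block as 'outside', 'inside', or 'boundary'."""
    labels = []
    for by in range(0, len(shape), size):
        row_labels = []
        for bx in range(0, len(shape[0]), size):
            values = {shape[y][x]
                      for y in range(by, by + size)
                      for x in range(bx, bx + size)}
            if values == {0}:
                row_labels.append("outside")   # header information only
            elif values == {1}:
                row_labels.append("inside")    # header information only
            else:
                row_labels.append("boundary")  # shape data needs CAE coding
        labels.append(row_labels)
    return labels


# Hypothetical 4 x 4 shape data classified with 2 x 2 blocks.
shape = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
labels = classify_blocks(shape, size=2)
```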

In FIG. 17, an MPEG-4 Simple Profile encoder 2352 is used for rectangular image coding. This coding scheme is described below. The Simple Profile encoder 2352 is backward compatible with the Core Profile encoder 2353. Removing from FIG. 21 the processing related to arbitrary shapes, namely the shape encoding CAE unit 2709, the memory unit 2710, and the motion compensation unit 2711, yields the Simple Profile encoder. The procedure for processing image data is the same as in the Core Profile. Because only one frame of the background image needs to be encoded, a moving image coding scheme is not strictly required, and a still image coding scheme may be used instead.

The MUX processing unit 2354 corresponds to the multiplexing unit 2106 and performs the multiplexing.

Next, processing on the decoding side will be described.
FIG. 16 is a block diagram showing the overall schematic configuration. The bitstream that was combined into one on the encoding side is separated by the separation unit 2201 into the bitstreams accepted by the individual decoders. Of these, the encoded background image is decoded into one frame of image data by the rectangular image decoding unit 2202. The arbitrary shape image decoding unit 2203 decodes the shape data and the image data corresponding to it. The image composition unit 2204 generates a composite image by switching between the background image and the arbitrary shape image pixel by pixel according to the value of the shape data. The image output unit 2205 is typically an image display device such as a monitor.

The block diagram of FIG. 16 will be described more concretely with reference to FIGS. 18 and 20. The separation unit 2201, the rectangular image decoding unit 2202, the arbitrary shape image decoding unit 2203, and the image composition unit 2204 in FIG. 16 correspond, respectively, to the DEMUX processing 2451, the MPEG-4 Simple Profile decoder 2452, the MPEG-4 Core Profile decoder 2453, and the composition processing unit 2454 in FIG. 18. Frames 2411 to 2418 are the sequence of frames displayed by the image output unit 2205 and correspond to the input images 2311 to 2318 in FIG. 17.

The output of the MPEG-4 Simple Profile decoder 2452 is the background image 2600 in FIG. 20. Because only one frame of the background image is decoded, at the start, the decoder may use a still image decoding scheme. Also, because the background image is always composited with another image, it is never output as is.

The MPEG-4 Core Profile decoder 2453 first outputs shape data 2601 and image data 2611. The composition processing 2454 selects the pixel of the background image 2600 for each pixel that the shape data 2601 marks as background, and the pixel of the image data 2611 for each pixel marked as object, and thereby generates the composite image 2621. This image corresponds to image 2501 on the encoding side. The decoded image corresponding to image 2502 is composited from shape data 2602, image data 2612, and the decoded background image 2600, yielding image 2622. Similarly, the decoded image corresponding to image 2503 is image 2623.
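The per-pixel selection performed by the composition processing can be sketched as follows. The function name `compose` and the sample values are hypothetical, and single-component images stored as lists of rows are assumed.

```python
def compose(shape, object_image, background):
    """Pick the object pixel where S == 1 and the background pixel where S == 0."""
    return [[obj if s == 1 else bg
             for s, obj, bg in zip(row_s, row_o, row_b)]
            for row_s, row_o, row_b in zip(shape, object_image, background)]


# Hypothetical 2 x 2 example.
shape = [[0, 1], [1, 0]]
object_image = [[200, 201], [202, 203]]
background = [[10, 11], [12, 13]]
composite = compose(shape, object_image, background)
```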

The MPEG-4 Core Profile decoder 2453 will now be described in detail with reference to FIG. 22. Its operation is basically the reverse of FIG. 21: the luminance and color difference data of the image and the shape data are decoded in units of macroblocks.

First, in intra mode, the variable length decoding unit 2801 decodes the quantized DCT coefficients and passes them to the inverse quantization unit 2802. The output of the inverse quantization unit 2802 is the decoded DCT coefficients, which are input to the inverse DCT unit 2803. The inverse DCT unit 2803 performs the inverse DCT and outputs the decoded image. This image is stored in the memory unit 2804 so that it can serve as the reference image in inter mode.

In inter mode, on the other hand, the image decoded through the inverse quantization unit 2802 and the inverse DCT unit 2803 is an inter-frame difference image. The motion vector decoding unit 2806 decodes the motion vector. Using the decoded motion vector, the motion compensation unit 2805 generates a motion-compensated image from the previous frame image stored in the memory unit 2804. Combining this image with the difference image completes the decoding of the image in inter mode.
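The inter-mode reconstruction described above, motion-compensated prediction plus decoded difference, might be sketched like this for a single block. The function name `decode_inter_block`, the integer-pixel motion vector, and the clamping to an 8-bit range are assumptions made for illustration.

```python
def decode_inter_block(reference, residual, bx, by, dx, dy):
    """Rebuild one block: displaced reference block plus decoded difference."""
    size = len(residual)
    block = []
    for j in range(size):
        row = []
        for i in range(size):
            predicted = reference[by + dy + j][bx + dx + i]
            # Clamp to the 8-bit range assumed in this sketch.
            row.append(max(0, min(255, predicted + residual[j][i])))
        block.append(row)
    return block


# Hypothetical 4 x 4 previous frame and a 2 x 2 decoded difference block.
reference = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
residual = [[1, -1], [0, 200]]
decoded = decode_inter_block(reference, residual, bx=2, by=2, dx=-1, dy=-1)
```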

The shape data is decoded by passing from the variable length decoding unit 2801 through the shape decoding CAE unit 2807. In inter mode, the shape data of the previous frame stored in the memory unit 2808 is motion-compensated by the motion compensation unit 2809, using the motion vector decoded by the motion vector decoding unit 2806, and the shape data is then decoded by the shape decoding CAE unit 2807.

In FIG. 18, an MPEG-4 Simple Profile decoder 2452 is used for rectangular image decoding. This decoding scheme is described below. The Simple Profile decoder 2452 is backward compatible with the Core Profile decoder 2453. Removing from FIG. 22 the processing related to arbitrary shapes, namely the shape decoding CAE unit 2807, the memory unit 2808, and the motion compensation unit 2809, yields the Simple Profile decoder 2452. The procedure for processing image data is the same as in the Core Profile. Because only one frame of the background image needs to be decoded, a moving image decoding scheme is not strictly required, and a still image decoding scheme may be used instead.

However, the system described above has the drawback that an image containing only the background must be prepared in advance. There is also the problem that objects cannot be extracted correctly if the relative position of the input image and the background image shifts. When the range of camera movement is known in advance, preparing a wide-area image called a sprite, as disclosed in Patent Document 1 below, provides a partial remedy, but advance preparation is still required.

In other words, the above system is well suited to obtaining accurate extraction results, but when the goal is improved coding efficiency its range of application is limited, so a rectangular moving image coding system cannot simply be replaced with an arbitrary shape moving image coding system.

Patent Document 1: JP 2002-118843 A

The present invention has been made in view of these circumstances, and its object is to provide a highly versatile, high-efficiency coding technique that requires no preprocessing and can simply replace a rectangular moving image coding system.

The image processing apparatus of the present invention comprises: moving image input means for inputting a moving image composed of a plurality of frames; image selection means for selecting, from the moving image, a frame to serve as a background image, regardless of whether a subject is present in the image; shape data generation means for comparing the frame serving as the background image with a frame of the input image and generating shape data based on the difference value; background data correction means for correcting the background image based on the shape data; and arbitrary shape image encoding means for encoding the input image, together with the shape data, as an arbitrary shape image.
The image processing method of the present invention comprises: a moving image input step of inputting a moving image composed of a plurality of frames; an image selection step of selecting, from the moving image, a frame to serve as a background image, regardless of whether a subject is present in the image; a shape data generation step of comparing the frame serving as the background image with a frame of the input image and generating shape data based on the difference value; a background data correction step of correcting the background image based on the shape data; and an arbitrary shape image encoding step of encoding the input image, together with the shape data, as an arbitrary shape image.

The program of the present invention is a program for causing a computer to execute each step of the above image processing method.
The recording medium of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute each step of the above image processing method.

Because the frame to serve as the background image can be selected regardless of whether a subject is present in the image, no preprocessing is needed to set the background image. In addition, no rectangular image encoding means is required, and encoding can be performed with the arbitrary shape image encoding means alone. Furthermore, correcting the background data suppresses the effects of noise and small fluctuations in the moving image, so a more natural image is obtained in the composite image after decoding as well.

Preferred embodiments of the present invention will now be described in detail with reference to the drawings.
<First Embodiment>
A first embodiment of the present invention will be described. FIG. 1(a) is a block diagram showing the overall configuration of the encoding side. The main differences from FIG. 15, described in the background art, are that the background image and the arbitrary shape images are selected differently (image selection unit 101), that a background data correction unit 104 is provided to reflect the shape data once it has been generated, and that the rectangular image encoding unit 2104 is unnecessary. A key feature of this embodiment is that no background image corresponding to frame 2301 in FIG. 17 is prepared in advance. Instead, a background image is selected from the frame sequence and encoded.

The moving image input from the image input unit 2101, which consists of a sequence of frames, is sorted by the image selection unit 101 into the frame to be used as the background and the remaining frames. The background frame is selected regardless of whether a subject is present in the image. The shape data generation unit 103 receives a frame image, compares the frame of the input image with the background frame, and binarizes the data obtained from the frame difference to generate shape data for arbitrary shape coding. The background data correction unit 104 corrects the frame selected as the background based on the results of shape data generation. The arbitrary shape image encoding unit 2105 receives the previously generated shape data and the image data at the same frame timing and encodes them, switching between intra mode (intra-frame coding) and inter mode (inter-frame coding). Although not shown explicitly here, each functional block is assumed to have a temporary data storage (memory) function for aligning timing. The details of image data coding and shape data coding are as described in the background art. The multiplexing unit 2106 multiplexes the encoded bitstreams. This unit also multiplexes audio bitstreams and the like in addition to video, but because it involves no processing specific to the present invention, its description is omitted here.

The method of selecting the background image is described in detail later. Here, the subsequent processing, namely the generation of shape data and the accompanying correction of the background data, is described first, using the screen images of FIG. 3 and the flowcharts of FIGS. 5, 6, and 7.

First, in step S701 of FIG. 5, the first image is input. Here, image 501 in FIG. 3 is the first image. In the background frame determination step S702, this frame is determined to be the background, so it is temporarily stored in step S703 for use in difference processing. Next, in step S704, shape data in which the entire screen (every pixel) is the object is generated as the shape data of this frame. Shape data 511 shows this shape data. The end determination S709 checks whether this is the last frame. At this point the result is NO, so processing returns to step S701 and the next frame is input.

The second image 502 is determined as NO in the background frame determination S702, so processing proceeds to step S705 and the difference from the background frame is detected. The background image used here is frame 501, which was temporarily stored earlier. In step S706, the difference values are binarized to generate shape data. The shape data generated from the difference between frames 501 and 502 is shape data 512. The correction determination step S707 decides whether to correct the background frame. The determination method is described in detail later. Here the result is assumed to be NO, and processing returns to step S701.

The third image 503 is also determined as NO in the background frame determination S702, so steps S705 and S706 are performed as for the second image 502. The shape data generated from the difference between frames 501 and 503 is shape data 513. If the correction determination step S707 results in YES, processing proceeds to step S708 and the background data is corrected. Background data correction uses all input images, including the background frame, together with the shape data generated for each frame. The correction method is described in detail later. The end determination in step S709 checks whether all frames have been processed, and if so (YES), the series of processes ends.
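The flow of FIG. 5 described above can be outlined as a loop in Python. This is a sketch under stated assumptions: the background determination (S702), the correction determination (S707), and the correction itself (S708) are only described later in the document, so they are passed in here as hypothetical callbacks (`is_background_frame`, `needs_correction`, `correct_background`); the sketch also assumes the background frame precedes the other frames.

```python
def process_sequence(frames, is_background_frame, needs_correction,
                     correct_background, th):
    """Generate shape data for every frame, following the flow of FIG. 5."""
    background = None
    shapes = []
    for index, frame in enumerate(frames):                 # S701: input a frame
        if is_background_frame(index, frame):              # S702
            background = frame                             # S703: keep for differencing
            shape = [[1] * len(row) for row in frame]      # S704: whole screen is object
        else:
            if background is None:
                raise ValueError("no background frame has been selected yet")
            shape = [[0 if abs(pc - pb) <= th else 1       # S705 and S706
                      for pc, pb in zip(row_c, row_b)]
                     for row_c, row_b in zip(frame, background)]
            if needs_correction(index, shape):             # S707
                background = correct_background(           # S708
                    background, frames[:index + 1], shapes + [shape])
        shapes.append(shape)
    return shapes, background                              # S709: all frames processed


# Hypothetical 1 x 3 frames; the first frame is treated as the background.
frames = [[[52, 52, 52]], [[50, 120, 52]], [[52, 52, 130]]]
shapes, final_background = process_sequence(
    frames,
    is_background_frame=lambda index, frame: index == 0,
    needs_correction=lambda index, shape: False,
    correct_background=lambda bg, seen_frames, seen_shapes: bg,
    th=10,
)
```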

Next, the method of processing frames other than the background frame is described in detail with reference to FIG. 6(a). In FIG. 6(a), step S801 performs the initial setting required for background data correction. The details of the initial setting are shown in FIG. 7(b). The count value setting in step S808 sets a counter that indicates, for each position, how many data items have been judged to be background. Letting c(x, y) be the counter value at position (x, y), when only the background frame has been input the following holds over the entire screen.

c(x, y) = 1 ... (2)

Step S809 initializes the cumulative background pixel value at each position. Letting p(x, y) denote the pixel value at position (x, y), p(x, y) is simply the pixel value of the background frame when only the background frame has been input. The initial setting ends when the end determination in step S810 finds that all pixels in the screen have been processed. If the number of pixels is W horizontally and H vertically, the above processing is performed W × H times. For an RGB color image the count is W × H × 3, and for the format known as 4:2:2 it is W × H × 2. If frame 501 in FIG. 3 is the background frame, the cumulative pixel value equals the background frame's pixel value in region 551, that is, at every position on the screen.

In step S802 of FIG. 6(a), a frame other than the background frame is input, and in step S803 the preprocessing for background data correction is performed. The details are described with reference to FIG. 7(c). First, step S811 detects the difference between the previously stored background frame and the current input. Step S812 compares the difference with a threshold and binarizes it. If step S813 finds that the binarized shape data indicates background, step S814 increments the pixel counter. For the first non-background frame, if the binarized result at position (x, y) is background, c(x, y) changes from 1 to 2. Step S815 then adds the pixel value. If the pixel value of the initial background frame at (x, y) is 52 and that of the current frame is 50, the accumulated pixel value becomes the following.

p(x, y) = 52 + 50 = 102 ... (3)

If the binarized result is not background, nothing is done. This processing is repeated for every pixel in the screen. If frame 502 in FIG. 3 is the first non-background frame, the black area of shape data 512 is the background region of that frame. Consequently, region 553 holds the pixel values of the background frame, and region 552 holds the sum of the pixel values of frames 501 and 502.

The preprocessing ends once all pixels have been processed (YES in step S816).
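
The initial setting (steps S808 and S809) and the per-frame preprocessing (steps S811 to S815) amount to keeping a counter and a running sum for each pixel. A minimal Python sketch follows, assuming single-channel images stored as lists of rows; the names `init_accumulators` and `accumulate` and the threshold of 10 are hypothetical and not taken from the specification.

```python
def init_accumulators(background):
    """S808/S809: c(x, y) = 1 and p(x, y) = background pixel value everywhere."""
    count = [[1] * len(row) for row in background]
    total = [list(row) for row in background]
    return count, total


def accumulate(count, total, background, frame, threshold=10):
    """S811-S815: add pixels judged to be background into the accumulators."""
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            # S811/S812: difference from the stored background, then binarize.
            if abs(value - background[y][x]) <= threshold:
                count[y][x] += 1      # S814: count the background sample
                total[y][x] += value  # S815: add its pixel value
            # Object pixels are left untouched.


background = [[52, 52]]
count, total = init_accumulators(background)
accumulate(count, total, background, [[50, 200]])
print(count, total)  # [[2, 1]] [[102, 52]]
```

The first pixel reproduces the worked value 52 + 50 = 102, while the second pixel is judged to be object and keeps its initial count and sum.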

Step S804 determines whether to start the post-processing of the correction. The period from one background frame to the next forms one processing unit, so the loop is repeated up to the frame immediately preceding the next background frame. When background frames are inserted at a constant interval, this decision can be made before the next background frame arrives. When the interval is not constant, the results accumulated so far must be retained until the next background frame is detected. Post-processing also starts from the results so far when the last frame is detected. If frame 503 in FIG. 3 is the frame immediately before the next background frame, shape data 513 divides the screen into several regions according to how the background data overlaps. Region 553 was judged to be background only in frame 501; its counter value c(x, y) is 1, and its cumulative pixel value is the pixel value of background frame 501 itself. Region 555 was judged to be background in frames 501 and 502; its counter value c(x, y) is 2, and its cumulative pixel value is the sum of the pixel values of frames 501 and 502. Region 554 was judged to be background in frames 501, 502, and 503; its counter value c(x, y) is 3, and its cumulative pixel value is the sum of the pixel values of frames 501, 502, and 503.

Step S805 is the post-processing of the background data correction, described with reference to FIG. 7(d). In step S817, the average pixel value is calculated. From the pixel values and counter values accumulated so far, the average pixel value a(x, y) at position (x, y) is given by the following equation.

a(x, y) = p(x, y) / c(x, y) ... (4)

This processing is repeated for every pixel in the screen, and the post-processing ends once all pixels have been processed (YES in step S818). In FIG. 3, the counter value in region 553 is 1, so the average pixel value is the cumulative pixel value itself; in region 555 the counter value is 2, so the average is the cumulative pixel value / 2; and in region 554 the counter value is 3, so the average is the cumulative pixel value / 3. In other words, the regions common to the background image and the shape data are determined, an average image is generated for each common region, and a single composite background image is produced.
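
As a rough illustration of equation (4), the averaging step can be written as a per-pixel division of the running sum by the counter. The function name `average_background` and the sample values are assumptions for the example; the three pixels stand in for regions with counter values 1, 2, and 3.

```python
def average_background(total, count):
    """Equation (4): a(x, y) = p(x, y) / c(x, y) at every position.

    count is at least 1 everywhere because the background frame itself
    is always included, so no division by zero can occur.
    """
    return [
        [p / c for p, c in zip(prow, crow)]
        for prow, crow in zip(total, count)
    ]


total = [[52, 102, 153]]  # cumulative pixel values p(x, y)
count = [[1, 2, 3]]       # counter values c(x, y)
print(average_background(total, count))  # [[52.0, 51.0, 51.0]]
```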

Step S806 is the final end determination: if the last frame of the sequence has been processed (YES), processing ends.

Averaging the image over a plurality of frames in this way suppresses the influence of noise and minute fluctuations, and yields a more natural image in the composite image after decoding as well. The corrected background data and its shape data 511 are input to the arbitrary shape encoding unit 2105 in FIG. 1.

As described earlier, the MPEG-4 arbitrary shape image coding scheme has an intra mode (intra-frame coding), which processes within a frame, and an inter mode (inter-frame coding), which processes between frames. The choice of coding mode is not directly tied to the choice of the frame used as background, and can be made freely. However, coding the frame chosen as background in intra mode enables efficient encoding. In inter mode the shape data is also matched between frames, so when the shape data of successive frames is similar, the amount of code generated for the shape data can be reduced. If frame 511 is coded in intra mode and frames 512 and 513 in inter mode, little code is generated because shape data 513 resembles shape data 512.

The background frame setting methods are now described with reference to FIGS. 10 to 13. FIG. 10 shows a method that selects the background periodically, independent of the scene content. Updating the background periodically from the first frame keeps the amount of generated code down even when the image content changes. First, as initialization, step S1201 sets the frame counter i to 0. Next, step S1202 determines whether the current frame is the first frame. If the count value i = 0, it is the first frame, so it is set as a background frame in step S1204. Otherwise, step S1203 determines whether the current frame number is a multiple of the period T. If the remainder of dividing i by T is 0, it is a multiple of T, so the frame is set as a background frame in step S1204. If not, no background is set and processing proceeds to the end determination in step S1205. If the frame is not the last one, the end determination S1205 is NO, the frame counter i is incremented by 1 in step S1206, and processing returns to step S1202 for the next frame. The same processing is repeated, and when the last frame has been processed the end determination S1205 becomes YES and the series of processing ends.
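
The periodic selection of FIG. 10 reduces to a modulo test on the frame counter. The following Python sketch mirrors the flowchart steps in comments; the function name `periodic_background_frames` and the returned list of frame indices are conveniences assumed for the example.

```python
def periodic_background_frames(num_frames, period):
    """Return the indices of frames chosen as background with period T."""
    background = []
    i = 0                                # S1201: initialize the frame counter
    while i < num_frames:                # S1205: stop after the last frame
        if i == 0 or i % period == 0:    # S1202 (first frame), S1203 (multiple of T)
            background.append(i)         # S1204: set as background frame
        i += 1                           # S1206: advance to the next frame
    return background


print(periodic_background_frames(10, 4))  # [0, 4, 8]
```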

FIG. 11 shows a method, different from that of FIG. 10, that analyzes the content of the moving image and sets a background frame where a scene change occurs. The image immediately after a scene change has low correlation with the preceding image, so it would produce large difference data. Using the frame at which the scene change occurs as the background avoids difference processing across scenes and keeps the amount of generated code down. First, step S1301 determines whether the current frame is the first frame; if so, it is set as a background frame in step S1303. If not, step S1302 determines whether a scene change has occurred. The scene change detection method is not limited in this embodiment; one example is to binarize the absolute inter-frame difference by thresholding and to decide based on the area of the region exceeding the threshold. When the current frame is judged to be a scene change frame, it is set as a background frame in step S1303. If there is no scene change, no background is set and processing proceeds to the end determination in step S1304. If the frame is not the last one, the end determination S1304 is NO and processing returns to step S1301 for the next frame. The same processing is repeated, and when the last frame has been processed the end determination S1304 becomes YES and the series of processing ends.
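
The example scene change test mentioned above (thresholding the absolute inter-frame difference and judging by the area that exceeds the threshold) might look as follows. This is a sketch under stated assumptions: the name `is_scene_change`, the per-pixel threshold of 30, and the area ratio of 0.5 are illustrative choices, since the specification does not fix any particular values.

```python
def is_scene_change(prev, cur, pixel_threshold=30, area_ratio=0.5):
    """Judge a scene change from the fraction of pixels that changed strongly."""
    changed = 0
    total = 0
    for prow, crow in zip(prev, cur):
        for p, c in zip(prow, crow):
            total += 1
            if abs(c - p) > pixel_threshold:  # binarize the absolute difference
                changed += 1
    # Scene change when the changed area exceeds the assumed ratio.
    return total > 0 and changed / total > area_ratio


prev = [[10, 10], [10, 10]]
print(is_scene_change(prev, [[12, 11], [10, 9]]))      # False
print(is_scene_change(prev, [[200, 200], [200, 10]]))  # True
```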

FIG. 12 shows a background frame setting method that combines FIG. 10 and FIG. 11. A background is set every period T, and in addition any frame with a scene change is also made a background. First, step S1401 initializes the frame counter i to 0. Next, step S1402 determines whether i is 0, that is, whether the frame is the first frame. If so, processing advances to step S1405 and the frame is set as a background frame. If not, step S1403 determines whether the current frame number is a multiple of the period T. A simple test is to treat it as a multiple when the remainder of dividing the counter value i by T is 0. If it is a multiple of T, the frame is set as a background frame in step S1405; otherwise processing proceeds to the scene change determination in step S1404. If there is a scene change, the frame is set as a background frame in step S1405; otherwise processing proceeds to the end determination step S1406. The end determination checks whether the frame is the last one; if not, the counter value i is incremented by 1 in step S1407 and the processing from step S1402 is repeated. Once the last frame has been processed, the loop exits and the series of processing ends.
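
A compact way to read FIG. 12 is that a frame becomes a background frame if any one of three conditions holds. In the sketch below the scene change decisions are assumed to be given as a list of booleans, one per frame; the function name `combined_background_frames` is hypothetical.

```python
def combined_background_frames(scene_changes, period):
    """Background at the first frame, every multiple of T, and every scene change."""
    background = []
    for i, changed in enumerate(scene_changes):
        # S1402 (first frame), S1403 (multiple of T), S1404 (scene change)
        if i == 0 or i % period == 0 or changed:
            background.append(i)  # S1405
    return background


flags = [False] * 10
flags[5] = True  # assume a scene change at frame 5
print(combined_background_frames(flags, 4))  # [0, 4, 5, 8]
```

Note that the periodic positions stay fixed at multiples of T regardless of where scene changes fall.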

FIG. 13 shows another background frame setting method that combines FIG. 10 and FIG. 11. Unlike FIG. 12, a background is set periodically only while a section without scene changes continues after a scene change has been detected. First, step S1501 determines whether the frame is the first frame, using the method described earlier. If it is, the frame is set as a background frame in step S1503 and the frame counter i is set to 1 in step S1504. Here the counter i is not the number of frames from the start of the sequence, as before, but the number of frames counted from the most recent background frame, which always counts as 1. If the frame is not the first frame, step S1502 performs the scene change determination. If it is a scene change frame, it is set as a background frame in step S1503 and the counter i is set to 1 in step S1504. If it is not, the counter i is incremented by 1 in step S1505; immediately after a background frame, i = 2. Next, step S1506 determines whether the current frame falls on a multiple of the period T. A simple test is to treat it as a multiple of T when the count i from the background frame equals T. If so, the frame is set as a background frame in step S1503 and the counter i is set to 1 in step S1504. If the count i has not reached T, processing proceeds to the end determination in step S1507. The end determination checks whether the frame is the last one; if not, the processing from step S1501 is repeated. In this way a background frame is set whenever a scene change is detected, and while a section without scene changes continues, a background frame is set with period T. Once the last frame has been processed, the end determination S1507 becomes YES and the series of processing ends.
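
The difference from FIG. 12 is that the counter restarts at every background frame. The sketch below follows the text literally: the background frame counts as i = 1 and a new background is set when i equals T. The function name and the boolean list of scene change decisions are assumptions for the example.

```python
def reset_on_scene_change(scene_changes, period):
    """Background at scene changes, then periodically, counting from each background."""
    background = []
    i = 0
    for n, changed in enumerate(scene_changes):
        if n == 0 or changed:     # S1501 (first frame), S1502 (scene change)
            background.append(n)  # S1503
            i = 1                 # S1504: the background frame counts as 1
            continue
        i += 1                    # S1505
        if i == period:           # S1506: count from the background equals T
            background.append(n)  # S1503
            i = 1                 # S1504
    return background


flags = [False] * 10
flags[5] = True  # assume a scene change at frame 5
print(reset_on_scene_change(flags, 4))  # [0, 3, 5, 8]
```

Under this literal counting the counter reaches T on the (T - 1)-th frame after a background frame, which is why frame 3 rather than frame 4 is selected in the example; the periodic positions shift after the scene change at frame 5.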

Next, the decoding-side processing corresponding to the encoding-side processing of the first embodiment is described with reference to FIGS. 2, 4, and 14.

FIG. 2 is a block diagram showing the overall configuration of the decoding side. The separation unit 2201 separates the plurality of input bitstreams into a bitstream for each decoder. Bitstream types include audio as well as video, but only the video bitstream, whose processing is specific to this embodiment, is illustrated here. The arbitrary shape image decoding unit 2203 receives the separated video bitstream and outputs image data and shape data as the decoded image. The MPEG-4 Core Profile decoder described in the background art is suitable for this. The image composition unit 201 generates a composite image from the input background image and the image data and shape data of the current frame, and outputs it to the image output unit 2205. The image output unit 2205 is typically an image display device such as a display, and displays the input frames sequentially at the desired timing.

The processing procedure of the image composition unit 201 is described with reference to the flowchart of FIG. 14. First, step S1801 determines whether the input frame is a background frame. If the shape data marks every pixel as object, the image can be judged to be a background. If the input frame is a background, its data is temporarily stored for composition in step S1802. Step S1803 composes the background frame and the current frame, but at this point the current frame is itself the background frame, so step S1804 outputs the current frame as is. The end determination in step S1805 checks whether the frame is the last one; if not, processing returns to step S1801 to examine the next frame. If the frame is not a background frame, step S1803 composes the temporarily stored background image with the arbitrary shape image of the current frame on the basis of the shape data. The composite image is output in step S1804, and processing proceeds to the end determination step S1805. This processing is repeated, and once the last frame has been processed the loop exits and the series of processing ends.

For the first frame in FIG. 4, image data 611 is the arbitrary shape image data, shape data 601 is its shape data, and image data 641 is the composite image data. Because shape data 601 marks every pixel as object, image data 611 becomes the composite output image 641 unchanged. For the next frame, image data 612 is the arbitrary shape image data, shape data 602 is its shape data, and image data 641 is the background image data. The black portion of shape data 602 is not object, so pixels of background image data 641 are used there; the white portion is object, so pixels of image data 612 are used. Assigning the pixel value of image data 641 or 612 pixel by pixel yields composite image 642. Similarly, processing background image data 641 and arbitrary shape image data 613 according to shape data 603 yields composite image 643.
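
The per-pixel composition on the decoding side is a simple selection driven by the shape data. A minimal Python sketch, assuming single-channel images as lists of rows and shape values of 1 for object (white) and 0 for non-object (black); the function name `compose` is hypothetical.

```python
def compose(background, image, shape):
    """Take the decoded image where shape is 1 (object), else the stored background."""
    return [
        [img if s == 1 else bg for bg, img, s in zip(brow, irow, srow)]
        for brow, irow, srow in zip(background, image, shape)
    ]


background = [[10, 20], [30, 40]]  # stands in for background image data 641
image = [[0, 99], [0, 77]]         # stands in for arbitrary shape image data 612
shape = [[0, 1], [0, 1]]           # stands in for shape data 602
print(compose(background, image, shape))  # [[10, 99], [30, 77]]
```

When the shape data marks every pixel as object, as for a background frame, the function returns the decoded image unchanged.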

As described above, the image processing apparatus according to the first embodiment selects the background image and the arbitrary shape images from the continuously input frames, and thereby realizes a general-purpose, high-efficiency encoding system that is not restricted to particular scenes, without requiring a background image to be prepared in advance. In particular, the background data correction suppresses the influence of noise and minute fluctuations, and yields a more natural image in the composite image after decoding as well.

<Second Embodiment>
A second embodiment of the present invention is now described. The overall configuration is shown in FIG. 1(b). The difference from the first embodiment is that the data obtained by the background data correction unit 104 is also used to correct the shape data. The shape data generation unit 103 of FIG. 1(a) is replaced in FIG. 1(b) by the shape data generation/correction unit 105. The procedure by which these two functional blocks process image data and shape data is described in detail with reference to the flowcharts of FIGS. 8 and 9. FIG. 8 corresponds to FIG. 5 of the first embodiment and FIG. 9 to FIG. 6 of the first embodiment, and parts that perform the same processing carry the same reference numerals.

First, the overall processing flow is described with reference to FIG. 8. Steps S701 to S707 are as described for FIG. 5. The background data is corrected in step S708; in the first embodiment, this correction was performed only once per processing section. In step S1001, the shape data is corrected on the basis of the corrected background data from step S708. Correcting the shape data changes the region judged to be background within each frame, so the background data is corrected again in step S708 using that information. This is repeated until the correction end determination in step S1002 becomes YES. At that point the processing of the section corresponding to one background frame ends, and the processing of the section corresponding to the next background frame begins. When the last frame has been processed, the end determination in step S709 becomes YES and the series of processing ends.

Next, the method for processing frames other than the background frame is described in detail with reference to FIG. 9. First, step S801 performs the initial setting described for FIG. 6. Next, step S1101 initializes the frame counter, setting the count value k = 0. An image is input in step S802, and the frame count is incremented in step S1102. When the first non-background frame has been input, k = 1. Next, step S803 performs the preprocessing described for FIG. 6. Step S804 determines whether the current frame is the last one to be processed for a given background. If the next frame is a background frame, or if the current frame is the last frame of the sequence, the result is YES and the loop ends. If NO, steps S802 to S803 are repeated, and the number of repetitions is the count value k; let this final value be k = kmax. After this loop exits, step S805 performs the post-processing of the background correction. The processing so far yields the corrected background data, which is then used to correct the shape data. Step S1103 counts the frames to be processed, here by counting down from kmax. Step S1104 performs the preprocessing using the new background data. This is repeated until step S1105 determines that it is finished, which happens when the count value k reaches 0. Performing the post-processing of step S1106 at this point yields corrected shape data, generated using the first corrected background data, and background data corrected again using that corrected shape data. Step S1107 determines how many times this re-correction is repeated. If the end condition is not met, step S1108 resets the counter, which has reached 0, to kmax, and the processing from step S1103 is repeated. The end determination in step S1107 may simply repeat a preset number of times, or it may be dynamic, for example calculating the amount of change in the background data or shape data at each repetition and stopping when it falls to or below a threshold. When this loop exits, the processing of the section corresponding to one background frame is complete. When the last frame has been processed, the end determination in step S806 becomes YES and the series of processing ends.
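
The alternating correction of background data and shape data can be illustrated with a small fixed-point loop. This is a sketch under several assumptions that the specification leaves open: each pass recomputes the shape data against the current background estimate, the accumulators are re-initialized from the original background frame on every pass, and the loop stops after `max_iters` passes or when the largest background change is at most `eps`. The function name and all parameter values are hypothetical.

```python
def refine_background(bg_frame, frames, threshold=10, max_iters=5, eps=0.5):
    """Alternate shape correction and background averaging for one section."""
    background = [[float(v) for v in row] for row in bg_frame]
    shapes = []
    for _ in range(max_iters):                       # S1107: repeat re-correction
        count = [[1] * len(row) for row in bg_frame]
        total = [[float(v) for v in row] for row in bg_frame]
        shapes = []
        for frame in frames:                         # S1103-S1105: every frame
            shape = []
            for y, row in enumerate(frame):
                srow = []
                for x, v in enumerate(row):
                    is_object = abs(v - background[y][x]) > threshold
                    srow.append(1 if is_object else 0)
                    if not is_object:                # accumulate background samples
                        count[y][x] += 1
                        total[y][x] += v
                shape.append(srow)
            shapes.append(shape)
        # S1106: average to obtain the re-corrected background.
        new_bg = [[t / c for t, c in zip(trow, crow)]
                  for trow, crow in zip(total, count)]
        change = max(abs(n - o)
                     for nrow, orow in zip(new_bg, background)
                     for n, o in zip(nrow, orow))
        background = new_bg
        if change <= eps:                            # dynamic end condition
            break
    return background, shapes


bg, shapes = refine_background([[60, 50]], [[[48, 50]], [[50, 200]]])
print(shapes)  # [[[0, 0]], [[0, 1]]]
```

In this toy input the first pixel of the first non-background frame (value 48) is initially judged to be object against the background value 60, but is reclassified as background once the background estimate has been corrected toward 55.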

Repeating this cycle, in which the background data is corrected using a plurality of frames and the shape data is then corrected using the corrected background data, yields more accurate background data and shape data. The corrected background data and the corrected shape data are input to the arbitrary shape encoding unit 2105 in FIG. 1(b).

The background frame setting methods are the same as those described in the first embodiment. The decoding-side processing is also the same, so its description is omitted here.

As described above, the image processing apparatus according to the second embodiment selects the background image and the arbitrary shape images from the continuously input frames, and thereby realizes a general-purpose, high-efficiency encoding system that is not restricted to particular scenes, without requiring a background image to be prepared in advance. In particular, correcting both the background data and the shape data gives highly accurate extraction results and enables high-efficiency encoding.

<Third Embodiment>
FIG. 23 shows an example hardware configuration of a computer according to a third embodiment of the present invention. This embodiment shows an example in which the apparatuses of the first and second embodiments are realized by a computer.

A central processing unit (CPU) 2902, a ROM 2903, a RAM 2904, a network interface 2905, an input device 2906, an output device 2907, and an external storage device 2908 are connected to a bus 2901.

The CPU 2902 processes or computes data and controls the components connected via the bus 2901. The ROM 2903 stores in advance a control procedure (computer program) for the CPU 2902, and the system starts up when the CPU 2902 executes this program. A computer program is stored in the external storage device 2908 and is copied to the RAM 2904 for execution. The RAM 2904 is used as work memory for data input/output and transmission/reception, and as temporary storage for controlling each component. The external storage device 2908 is, for example, a hard disk drive or a CD-ROM; it stores image data and the like, and retains its contents when the power is turned off. The CPU 2902 performs the processing of the first and second embodiments by executing the computer program in the RAM 2904.

The network interface 2905 is an interface for connecting to a network. The input device 2906 is, for example, a keyboard and a mouse, and is used to make various designations and inputs. The output device 2907 is, for example, a display and a printer.

The present embodiment can be achieved by supplying a system or apparatus with a computer-readable recording medium (or storage medium) on which the program code of software realizing the functions of the above-described embodiments is recorded, and having the computer (or CPU or MPU) of that system or apparatus read out and execute the program code stored in the recording medium. In this case, the program code read from the recording medium itself realizes the functions of the above-described embodiments, and the recording medium on which the program code is recorded constitutes the present invention. Needless to say, this covers not only the case where the functions of the first and second embodiments are realized by the computer executing the read program code, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

Furthermore, needless to say, this also covers the case where the program code read from the recording medium is written into a memory provided in a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided in the function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

When the present embodiment is applied to such a recording medium, the recording medium stores program code corresponding to the flowcharts described above.

Note that the present embodiment may be applied either to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copier or a facsimile machine).

As described above, according to the first to third embodiments, the background image and the arbitrary shape image are selected from among the continuously input frames and are then separated and combined, so no background image needs to be prepared in advance. This makes it possible to realize a general-purpose, high-efficiency encoding system that is not restricted to particular scenes.

The above-described embodiments are merely specific examples of carrying out the present invention, and the technical scope of the present invention must not be construed restrictively on their basis. That is, the present invention can be implemented in various forms without departing from its technical idea or its main features.

A block diagram showing the configuration of an image processing apparatus according to the first embodiment of the present invention.
A block diagram showing the configuration of a decoding system.
A diagram for explaining the processing of shape data and image data in the first embodiment of the present invention.
A diagram for explaining the processing of shape data and image data in the decoding system.
A flowchart for explaining the overall flow of processing of shape data and image data in the first embodiment of the present invention.
A flowchart for explaining the detailed flow of processing of shape data and image data in the first embodiment of the present invention.
A flowchart for explaining the detailed flow of processing of shape data and image data in the first embodiment of the present invention.
A flowchart for explaining the overall flow of processing of shape data and image data in the second embodiment of the present invention.
A flowchart for explaining the detailed flow of processing of shape data and image data in the second embodiment of the present invention.
A flowchart for explaining a background frame setting procedure according to an embodiment of the present invention.
A flowchart for explaining another background frame setting procedure according to an embodiment of the present invention.
A flowchart for explaining another background frame setting procedure according to an embodiment of the present invention.
A flowchart for explaining another background frame setting procedure according to an embodiment of the present invention.
A flowchart for explaining the image data synthesis procedure in the decoding system.
A block diagram showing the configuration of an encoding-side image processing apparatus according to a conventional example.
A block diagram showing the configuration of a decoding-side image processing apparatus according to a conventional example.
A block diagram showing the configuration of an encoding-side image processing apparatus according to a conventional example, in particular for explaining the processing of the image separation unit.
A block diagram showing the configuration of a decoding-side image processing apparatus according to a conventional example, in particular for explaining the processing of the image display unit.
A diagram for explaining the encoding-side processing of shape data and image data according to a conventional example.
A diagram for explaining the decoding-side processing of shape data and image data according to a conventional example.
A block diagram showing the configuration of an arbitrary shape encoding unit.
A block diagram showing the configuration of an arbitrary shape decoding unit.
A block diagram showing an example of the hardware configuration of a computer.

Explanation of symbols

101 Image selection unit
103 Shape data generation unit
104 Background data correction unit
105 Shape data generation / correction unit
2101 Image input unit
2105 Arbitrary shape image encoding unit
2106 Multiplexing unit
2901 Bus
2902 CPU
2903 ROM
2904 RAM
2905 Network interface
2906 Input device
2907 Output device
2908 External storage device

Claims

1. An image processing apparatus comprising:
moving image input means for inputting a moving image composed of a plurality of frames;
image selection means for selecting, from the moving image, a frame to serve as a background image, regardless of the presence or absence of a subject in the image;
shape data generation means for comparing the frame serving as the background image with a frame of an input image and generating shape data based on the difference value;
background data correction means for correcting the background image based on the shape data; and
arbitrary shape image encoding means for encoding the input image as an arbitrary shape image together with the shape data.
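As an illustration only, the comparison performed by the shape data generation means can be sketched in Python as follows. The function name `generate_shape_data`, the representation of a grayscale frame as a list of pixel rows, and the threshold of 16 are assumptions made for this sketch; the claim itself requires only that the shape data be generated based on the difference value.

```python
from typing import List

# Hypothetical representation: a grayscale frame is a list of rows of 0-255 values.
Frame = List[List[int]]


def generate_shape_data(background: Frame, frame: Frame, threshold: int = 16) -> Frame:
    """Return a binary mask: 1 where the input frame differs from the background
    by more than `threshold` (treated as object), 0 elsewhere (background)."""
    if len(background) != len(frame) or any(
        len(bg_row) != len(in_row) for bg_row, in_row in zip(background, frame)
    ):
        raise ValueError("frames must have identical dimensions")
    return [
        [1 if abs(pixel - bg_pixel) > threshold else 0
         for bg_pixel, pixel in zip(bg_row, in_row)]
        for bg_row, in_row in zip(background, frame)
    ]


# Toy 2x3 frames: the middle column contains an object.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[10, 200, 12], [11, 190, 10]]
mask = generate_shape_data(background, frame)
print(mask)
```

A fixed per-pixel threshold is the simplest choice; a practical system would likely need a noise-tolerant criterion, which this sketch does not attempt.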

2. The image processing apparatus according to claim 1, wherein the background data correction means obtains a common area of the background image and the shape data, generates an average image for each common area, and generates a single set of composite background image data.
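One possible reading of this averaging step is sketched below in Python; it is not taken from the specification. The sketch assumes that a pixel belongs to the common area of a frame when that frame's shape data marks it as background (0), and that pixels never observed as background fall back to the first frame. The name `correct_background` and both assumptions are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # hypothetical: grayscale frame as rows of 0-255 values


def correct_background(frames: List[Frame], masks: List[Frame]) -> Frame:
    """Build one composite background by averaging, per pixel, the samples from
    every frame whose mask marks that pixel as background (mask value 0)."""
    height, width = len(frames[0]), len(frames[0][0])
    composite = []
    for y in range(height):
        row = []
        for x in range(width):
            samples = [f[y][x] for f, m in zip(frames, masks) if m[y][x] == 0]
            if samples:
                row.append(sum(samples) // len(samples))  # integer average
            else:
                # Assumed fallback: no frame shows background at this pixel.
                row.append(frames[0][y][x])
        composite.append(row)
    return composite


# Two 1x2 frames; the object covers the right pixel of the first frame only.
frames = [[[10, 200]], [[12, 20]]]
masks = [[[0, 1]], [[0, 0]]]
composite = correct_background(frames, masks)
print(composite)
```

In this toy case the left pixel is averaged over both frames, while the right pixel is taken only from the frame in which the object has moved away.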

3. The image processing apparatus according to claim 1, wherein the background data correction means corrects the shape data in accordance with the correction of the background image data.

4. The image processing apparatus according to claim 1, wherein the shape data generation means outputs shape data in which every pixel belongs to the object when the input image is the frame serving as the background image.

5. The image processing apparatus according to claim 1, wherein the image selection means selects a frame to serve as a background image at a constant cycle starting from the first frame.

6. The image processing apparatus according to claim 1, wherein the image selection means selects a frame as a background image when a scene change of the moving image is detected.

7. The image processing apparatus according to claim 6, wherein the image selection means selects a frame as a background image at a constant cycle after the scene change of the moving image is detected.
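The selection rules in the preceding claims (a constant cycle from the first frame, selection on a scene change, and a constant cycle restarted after a scene change) can be combined in a short Python sketch. The function name `select_background_frames` and its parameters are assumptions for illustration, and scene-change detection itself is taken as given: the frame indices at which a scene change was detected are simply passed in.

```python
from typing import Iterable, List


def select_background_frames(
    num_frames: int, period: int, scene_changes: Iterable[int] = ()
) -> List[int]:
    """Return the indices of frames chosen as background frames.

    A frame is chosen every `period` frames, counted from the first frame or
    from the most recent scene change; a scene-change frame is always chosen.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    changes = set(scene_changes)
    selected = []
    cycle_start = 0  # frame index from which the constant cycle is counted
    for index in range(num_frames):
        if index in changes:
            cycle_start = index  # restart the cycle at the scene change
        if (index - cycle_start) % period == 0:
            selected.append(index)
    return selected


print(select_background_frames(10, 4))
print(select_background_frames(10, 4, scene_changes=[5]))
```

With a cycle of 4 and no scene change, frames 0, 4 and 8 are chosen; a scene change at frame 5 makes frame 5 a background frame and shifts the following selection to frame 9.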

8. The image processing apparatus according to claim 1, wherein the arbitrary shape image encoding means switches between intra-frame encoding and inter-frame encoding.

9. The image processing apparatus according to claim 8, wherein the image selection means selects a frame as a background image in correspondence with the intra-frame encoding performed by the arbitrary shape image encoding means.

10. An image processing method comprising:
a moving image input step of inputting a moving image composed of a plurality of frames;
an image selection step of selecting, from the moving image, a frame to serve as a background image, regardless of the presence or absence of a subject in the image;
a shape data generation step of comparing the frame serving as the background image with a frame of an input image and generating shape data based on the difference value;
a background data correction step of correcting the background image based on the shape data; and
an arbitrary shape image encoding step of encoding the input image as an arbitrary shape image together with the shape data.

11. A program for causing a computer to execute each step of the image processing method according to claim 10.

12. A computer-readable recording medium on which is recorded a program for causing a computer to execute each step of the image processing method according to claim 10.