JP6272071B2 - Image processing apparatus, image processing method, and program

Info

Publication number: JP6272071B2
Application number: JP2014028824A
Authority: JP
Inventors: 秀樹三ツ峰; 英彦大久保; 寛史盛岡
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2014-02-18
Filing date: 2014-02-18
Publication date: 2018-01-31
Anticipated expiration: 2034-02-18
Also published as: JP2015153321A

Description

The present invention relates to an image processing apparatus, an image processing method, and a program for processing images.

There are techniques for cutting a subject region out of captured video and compositing it into other video. Representative methods are described below.
In one method (hereinafter, the first method), the camera is fixed and the background is shot first. Specifically, a scene that does not contain the predetermined subject (for example, a person or an animal) is shot in advance and used as the background video. The subject then moves in front of the camera, and that footage is used as the captured video. The first method takes the difference between the captured video and the background video and extracts the region where they differ as the non-rigid region (the region containing the predetermined subject).
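
The first method can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists and a hypothetical threshold of 30; neither detail comes from the source, which only states that the differing region is extracted.

```python
def difference_mask(captured, background, threshold=30):
    """Return a binary mask: 1 where the captured frame differs from the background.

    `threshold` is an illustrative value, not one given in the source.
    """
    return [
        [1 if abs(c - b) > threshold else 0 for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(captured, background)
    ]


# Background shot in advance, then a frame in which a subject has entered.
background = [[10, 10, 10], [10, 10, 10]]
captured = [[10, 200, 10], [12, 180, 10]]
mask = difference_mask(captured, background)
```

Pixels marked 1 form the candidate subject region. The sketch only works when the camera does not move between the two shots, which is the limitation the description raises later.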

There is also a method based on chroma key compositing (hereinafter, the second method) (see, for example, Patent Document 1). In chroma key compositing, a screen of a specific color is placed as the background, and the predetermined subject is shot together with this background. The second method examines the captured video and decides, pixel by pixel, whether the pixel is the background color or not, using criteria such as hue, saturation, and luminance. The part that is not the background color is extracted as the non-rigid region (the region containing the predetermined subject).
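
As a rough illustration of the per-pixel decision in the second method, the sketch below classifies an RGB pixel as background when its hue is close to a key hue and it is sufficiently saturated and bright. The green key hue, the tolerances, and the use of the HSV value channel in place of luminance are all assumptions made for the example.

```python
import colorsys


def is_background(rgb, key_hue=1 / 3, hue_tol=0.08, min_sat=0.4, min_val=0.2):
    """Return True if an 8-bit RGB pixel looks like the key (screen) color.

    All thresholds are illustrative; the source does not specify any values.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
    # Hue is circular in [0, 1), so measure the shorter way around.
    hue_dist = min(abs(hue - key_hue), 1.0 - abs(hue - key_hue))
    return hue_dist <= hue_tol and sat >= min_sat and val >= min_val


screen_pixel = is_background((0, 255, 0))       # pure green screen
subject_pixel = is_background((200, 150, 120))  # skin-like tone
```

Pixels for which `is_background` returns False would make up the extracted subject region.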

There is also a method using graph cuts (hereinafter, the third method). In the third method, after the subject has been shot, the user roughly cuts out the target region, which is used as a seed; predetermined constraints are then imposed and the segmentation is optimized by graph cut. Examples of the constraints are: 1) label boundaries should fall where the color difference between adjacent pixels is large; and 2) pixels whose color resembles the designated foreground or background seed should become foreground or background, respectively.

Patent Document 1: JP 2004-7770 A

The first method requires the camera to be fixed, so it is difficult to apply when free camera work is desired.
The second method requires a special background, so it is difficult to apply when scenery or the like is to be used as the background. In particular, much of the past footage stored in video libraries was not shot with chroma key, which makes the second method hard to use.
The third method requires the user to specify the target region, which takes a great deal of manual labor.

Thus, in video production, chroma key compositing (the second method) or laborious manual post-processing (the third method) has been used to cut out the subject region to be composited from captured video.

An object of the present invention is to provide an image processing apparatus, an image processing method, and a program that separate the target portion (the non-rigid portion) without manual work. The present invention thereby enables even a person without specialist knowledge of video processing to produce the desired video without great effort.

An image processing apparatus according to the present invention includes: a feature point detection unit that detects feature points in the image of each frame of content composed of a plurality of frames; a corresponding point search unit that searches for corresponding points between the images of the frames based on the feature amount of each feature point detected by the feature point detection unit; a polygon division unit that, in the image of a target frame, connects n adjacent feature points to divide the image into a plurality of polygons, and projects the plurality of polygons onto the image of the frame preceding and/or following the target frame; a region division unit that divides the image of the target frame and the image of the preceding and/or following frame each into a plurality of regions based on the similarity between pixels; an extraction unit that compares the image of the target frame divided into polygons by the polygon division unit with the image of the target frame divided into regions by the region division unit, and extracts each divided region corresponding to the position of each polygon as an overlapping region; an identification unit that, based on the corresponding points found by the corresponding point search unit, identifies in the image of the preceding and/or following frame the corresponding region for each overlapping region extracted by the extraction unit; a determination unit that compares each overlapping region extracted by the extraction unit with the corresponding region identified by the identification unit to calculate an amount of change, and determines that the overlapping region contains a non-rigid body when the calculated amount of change exceeds a predetermined threshold; and a separation unit that, based on the regions produced by the region division unit, separates the non-rigid portion from each overlapping region determined by the determination unit to contain a non-rigid body.

With this configuration, the image processing apparatus performs feature point extraction, corresponding point search, and so on for the image of each frame, separately performs region division on the image of each frame, and separates the target portion (the non-rigid portion) from the image of each frame based on these results. The target portion can therefore be separated without manual work, and even a person without specialist knowledge of video processing can produce the desired video without great effort.

The image processing apparatus may further include a calculation unit that uses the corresponding points between frame images found by the corresponding point search unit to estimate, through optimization by bundle adjustment, the three-dimensional position of each feature point and the pose of the camera that captured the content; back-projects each feature point into three-dimensional space based on those three-dimensional positions and the camera pose and reprojects it onto the image plane; calculates the distance between the feature point and its reprojected point as the reprojection error; and identifies feature points whose error is at or above a predetermined magnitude. In this configuration, the corresponding point search unit excludes the feature points identified by the calculation unit from those detected by the feature point detection unit, and searches for corresponding points between frame images based on the feature amounts of the remaining feature points.

With this configuration, the image processing apparatus back-projects each feature point into three-dimensional space based on its three-dimensional position and the camera pose, reprojects it onto the image plane, calculates the distance between the feature point and the reprojected point as the reprojection error, and excludes feature points whose error is at or above a predetermined magnitude. This reduces false correspondences, so the target portion (the non-rigid portion) can be separated from the image of each frame with high accuracy.

The image processing apparatus may further include a transparency calculation unit that, after the separation unit has separated the non-rigid portion from each overlapping region, treats the non-rigid portion as the foreground region and everything else as the background region, creates a trimap from the boundary region between the foreground and background regions, and calculates the transparency of the boundary portion of the foreground region based on the continuity between the foreground and background regions.

With this configuration, the transparency calculation unit calculates the transparency of the boundary portion of the foreground region (the non-rigid portion), so natural soft-edge processing can be applied to that boundary. For example, when such a non-rigid portion is composited onto other video, its boundary blends well into that video and a natural-looking composite can be produced.

The image processing apparatus may further include: a first similarity determination unit that determines a similarity by comparing a separated region, from which the separation unit has separated the non-rigid portion of an overlapping region, with the region corresponding to that separated region in the image of the frame preceding and/or following the target frame; a second similarity determination unit that determines a similarity by comparing with each other the regions corresponding to the separated region in the images of the preceding frame and the following frame; and a processing unit that, when the first similarity determination unit determines that the similarity is lower than a predetermined value and the second similarity determination unit determines that the similarity is higher than a predetermined value, cuts out whichever corresponding region in the preceding or following frame has the higher similarity and pastes it onto the separated region.

With this configuration, the image processing apparatus fills the place from which the non-rigid portion was separated with the content of the same region in the image of another frame, so an image containing only the rigid portion can be created.

An image processing method according to the present invention includes: a feature point detection step of detecting feature points in the image of each frame of content composed of a plurality of frames; a corresponding point search step of searching for corresponding points between the images of the frames based on the feature amount of each feature point detected in the feature point detection step; a polygon division step of, in the image of a target frame, connecting n adjacent feature points to divide the image into a plurality of polygons, and projecting the plurality of polygons onto the image of the frame preceding and/or following the target frame; a region division step of dividing the image of the target frame and the image of the preceding and/or following frame each into a plurality of regions based on the similarity between pixels; an extraction step of comparing the image of the target frame divided into polygons in the polygon division step with the image of the target frame divided into regions in the region division step, and extracting each divided region corresponding to the position of each polygon as an overlapping region; an identification step of, based on the corresponding points found in the corresponding point search step, identifying in the image of the preceding and/or following frame the corresponding region for each overlapping region extracted in the extraction step; a determination step of comparing each overlapping region extracted in the extraction step with the corresponding region identified in the identification step to calculate an amount of change, and determining that the overlapping region contains a non-rigid body when the calculated amount of change exceeds a predetermined threshold; and a separation step of, based on the regions produced in the region division step, separating the non-rigid portion from each overlapping region determined in the determination step to contain a non-rigid body.

With this configuration, the image processing method performs feature point extraction, corresponding point search, and so on for the image of each frame, separately performs region division on the image of each frame, and separates the target portion (the non-rigid portion) from the image of each frame based on these results. The target portion can therefore be separated without manual work, and even a person without specialist knowledge of video processing can produce the desired video without great effort.

A program according to the present invention causes a computer to execute: a feature point detection step of detecting feature points in the image of each frame of content composed of a plurality of frames; a corresponding point search step of searching for corresponding points between the images of the frames based on the feature amount of each feature point detected in the feature point detection step; a polygon division step of, in the image of a target frame, connecting n adjacent feature points to divide the image into a plurality of polygons, and projecting the plurality of polygons onto the image of the frame preceding and/or following the target frame; a region division step of dividing the image of the target frame and the image of the preceding and/or following frame each into a plurality of regions based on the similarity between pixels; an extraction step of comparing the image of the target frame divided into polygons in the polygon division step with the image of the target frame divided into regions in the region division step, and extracting each divided region corresponding to the position of each polygon as an overlapping region; an identification step of, based on the corresponding points found in the corresponding point search step, identifying in the image of the preceding and/or following frame the corresponding region for each overlapping region extracted in the extraction step; a determination step of comparing each overlapping region extracted in the extraction step with the corresponding region identified in the identification step to calculate an amount of change, and determining that the overlapping region contains a non-rigid body when the calculated amount of change exceeds a predetermined threshold; and a separation step of, based on the regions produced in the region division step, separating the non-rigid portion from each overlapping region determined in the determination step to contain a non-rigid body.

With this configuration, the program performs feature point extraction, corresponding point search, and so on for the image of each frame, separately performs region division on the image of each frame, and separates the target portion (the non-rigid portion) from the image of each frame based on these results. The target portion can therefore be separated without manual work, and even a person without specialist knowledge of video processing can produce the desired video without great effort.

According to the present invention, the target portion (the non-rigid portion) can be separated without manual work.

A diagram showing the configuration of the image processing apparatus.
A flowchart for explaining the operation of the image processing apparatus.
A diagram for explaining the SLIC technique.
A diagram for explaining the procedure for creating a rigid-body video and a non-rigid-body video.

In video shot with a video camera, once global motion has been cancelled by taking the camera pose into account, a non-rigid body or moving object observed at a particular position along the time axis shows changes in the video information. Background objects such as buildings, by contrast, do not change.

This embodiment focuses on that point. It proposes a configuration that uses the results of feature point extraction and corresponding point search to calculate, by optimization, the camera pose and the three-dimensional positions of the feature points, and that uses the reprojection error in this process to exclude outliers among the corresponding points. It further proposes a configuration that uses region division information, obtained by dividing the image into regions of similar color and brightness, to determine and separate non-rigid regions, including moving objects, even in video shot while the camera itself is moving. The configuration and operation of the image processing apparatus 1 according to this embodiment are described in detail below.

As shown in FIG. 1, the image processing apparatus 1 includes a feature point detection unit 11, a corresponding point search unit 12, a polygon division unit 13, a region division unit 14, an extraction unit 15, an identification unit 16, a determination unit 17, and a separation unit 18. The feature point detection unit 11 and the corresponding point search unit 12 may be configured as a camera pose estimation unit 101. The polygon division unit 13, the extraction unit 15, the identification unit 16, the determination unit 17, and the separation unit 18 may be configured as a non-rigid region extraction unit 102.

The feature point detection unit 11 detects feature points in the image of each frame of content composed of a plurality of frames. The content is data captured by a video camera or the like and is recorded, for example, in the content DB 2.

The corresponding point search unit 12 searches for corresponding points between the images of the frames based on the feature amount of each feature point detected by the feature point detection unit 11.
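
The source does not say how the feature amounts are compared. One common approach, shown here purely as an assumption, is nearest-neighbor matching of descriptor vectors with a ratio test that rejects ambiguous matches. The function name `match_features` and the ratio of 0.8 are hypothetical.

```python
import math


def match_features(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in frame A to its nearest descriptor in frame B.

    A match is kept only if the best distance is clearly smaller than the
    second-best distance (ratio test). Returns (index_a, index_b) pairs.
    """
    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches


# Toy 2-D descriptors for two frames.
frame_a = [(0.0, 1.0), (5.0, 5.0)]
frame_b = [(5.1, 4.9), (0.1, 1.0), (9.0, 9.0)]
matches = match_features(frame_a, frame_b)
```

Each returned pair links a feature point in one frame to its corresponding point in the other.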

The polygon division unit 13 connects n adjacent feature points in the image of the target frame to divide the image into a plurality of polygons, and projects those polygons onto the image of the frame preceding and/or following the target frame. For example, the polygon division unit 13 connects three adjacent feature points at a time to divide the image into a plurality of triangles.

The region division unit 14 divides the image of the target frame and the image of the preceding and/or following frame each into a plurality of regions based on the similarity between pixels. The region division unit 14 receives the image of each frame from the content DB 2.

The extraction unit 15 compares the image of the target frame divided into polygons by the polygon division unit 13 with the image of the target frame divided into regions by the region division unit 14, and extracts each divided region corresponding to the position of each polygon as an overlapping region.
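
One possible reading of this extraction is sketched below: given a label map from region division and one triangle from polygon division, collect the labels of the regions that have at least one pixel inside the triangle. The data layout (`labels[y][x]`) and the function names are assumptions for illustration, not the patented procedure.

```python
def _cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def inside_triangle(p, tri):
    """Return True if point p lies inside or on the edge of triangle tri."""
    a, b, c = tri
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def overlapping_labels(labels, tri):
    """Collect region labels that have at least one pixel inside the triangle."""
    found = set()
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if inside_triangle((x, y), tri):
                found.add(label)
    return found


# 4x4 label map with four 2x2 regions; triangle vertices are (x, y) pixels.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
overlap = overlapping_labels(labels, ((0, 0), (1, 0), (0, 1)))
```

In this toy case the triangle touches only region 0, so that region would be the overlapping region for the triangle.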

Based on the corresponding points found by the corresponding point search unit 12, the identification unit 16 identifies, in the image of the frame preceding and/or following the target frame, the corresponding region for each overlapping region extracted by the extraction unit 15.

The determination unit 17 compares each overlapping region extracted by the extraction unit 15 with the corresponding region identified by the identification unit 16 to calculate an amount of change, and determines that the overlapping region contains a non-rigid body when the calculated amount of change exceeds a predetermined threshold.
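
The threshold test can be sketched as follows. The source does not define how the amount of change is measured; the mean absolute difference of pixel values and the threshold of 20 used here are assumptions for illustration only.

```python
def change_amount(region_pixels, corresponding_pixels):
    """Mean absolute difference between two equally sized pixel lists."""
    diffs = [abs(a - b) for a, b in zip(region_pixels, corresponding_pixels)]
    return sum(diffs) / len(diffs)


def contains_non_rigid(region_pixels, corresponding_pixels, threshold=20.0):
    """Return True when the change between frames exceeds the threshold."""
    return change_amount(region_pixels, corresponding_pixels) > threshold


target = [100, 102, 98, 101]
static_neighbor = [101, 101, 99, 100]  # background: almost unchanged
moving_neighbor = [30, 200, 98, 10]    # a subject passed through the region

static_result = contains_non_rigid(target, static_neighbor)
moving_result = contains_non_rigid(target, moving_neighbor)
```

The comparison is meaningful only because the corresponding region has already been aligned through the corresponding points, which cancels the camera's own motion.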

Based on the regions produced by the region division unit 14, the separation unit 18 separates the non-rigid portion from each overlapping region that the determination unit 17 has determined to contain a non-rigid body.

With this configuration, the image processing apparatus 1 performs feature point extraction, corresponding point search, and so on for the image of each frame, separately performs region division on the image of each frame, and separates the target portion (the non-rigid portion) from the image of each frame based on these results. The target portion can therefore be separated without manual work, and even a person without specialist knowledge of video processing can produce the desired video without great effort.

As shown in FIG. 1, the image processing apparatus 1 may include a calculation unit 19. The calculation unit 19 may be configured as part of the camera pose estimation unit 101.
The calculation unit 19 uses the corresponding points between frame images found by the corresponding point search unit 12 to estimate, through optimization by bundle adjustment, the three-dimensional position of each feature point and the pose of the camera that captured the content. Based on those three-dimensional positions and the camera pose, it back-projects each feature point into three-dimensional space, reprojects it onto the image plane, calculates the distance between the feature point and its reprojected point as the reprojection error, and identifies feature points whose error is at or above a predetermined magnitude.
The corresponding point search unit 12 excludes the feature points identified by the calculation unit 19 from those detected by the feature point detection unit 11, and searches for corresponding points between frame images based on the feature amounts of the remaining feature points.
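
The reprojection-error check can be sketched with a simple pinhole camera model. The sketch assumes that the three-dimensional points and the camera pose (rotation matrix and translation vector) have already been estimated, for instance by bundle adjustment, which is not implemented here. The intrinsics (`focal`, `center`) and the limit of 2 pixels are hypothetical values.

```python
import math


def project(point, rotation, translation, focal, center):
    """Project a 3-D point onto the image plane with a pinhole model."""
    cam = [
        sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    ]
    return (focal * cam[0] / cam[2] + center[0],
            focal * cam[1] / cam[2] + center[1])


def reprojection_outliers(observed, points3d, rotation, translation,
                          focal, center, max_error=2.0):
    """Return indices of feature points whose reprojection error >= max_error."""
    outliers = []
    for idx, (obs, pt) in enumerate(zip(observed, points3d)):
        u, v = project(pt, rotation, translation, focal, center)
        if math.hypot(u - obs[0], v - obs[1]) >= max_error:
            outliers.append(idx)
    return outliers


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points3d = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.2, 2.0)]
observed = [(50.5, 50.0), (60.0, 50.3), (58.0, 66.0)]
outliers = reprojection_outliers(observed, points3d, identity,
                                 (0.0, 0.0, 0.0), 100.0, (50.0, 50.0))
```

Here the third feature point lies about 10 pixels from its reprojection and would be excluded before the corresponding point search is repeated.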

With this configuration, the image processing apparatus 1 back-projects each feature point into three-dimensional space based on its three-dimensional position and the camera pose, reprojects it onto the image plane, calculates the distance between the feature point and the reprojected point as the reprojection error, and excludes feature points whose error is at or above a predetermined magnitude. This reduces false correspondences, so the target portion (the non-rigid portion) can be separated from the image of each frame with high accuracy.

As shown in FIG. 1, the image processing apparatus 1 may include a transparency calculation unit 20.
After the separation unit 18 has separated the non-rigid portion from each overlapping region, the transparency calculation unit 20 treats the non-rigid portion as the foreground region and everything else as the background region, creates a trimap from the boundary region between the foreground and background regions, and calculates the transparency of the boundary portion of the foreground region based on the continuity between the foreground and background regions.
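
Trimap creation can be sketched as follows, assuming a binary foreground mask as input. Pixels whose neighborhood contains both foreground and background are marked unknown; the band width and the label values 0, 128, and 255 are illustrative conventions. The transparency (alpha) estimation for the unknown band is not shown.

```python
FOREGROUND, BACKGROUND, UNKNOWN = 255, 0, 128


def make_trimap(mask, band=1):
    """Build a trimap from a binary mask (1 = foreground, 0 = background).

    A pixel is UNKNOWN when the (2*band+1)-wide window around it contains
    both foreground and background pixels.
    """
    height, width = len(mask), len(mask[0])
    trimap = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                mask[j][i]
                for j in range(max(0, y - band), min(height, y + band + 1))
                for i in range(max(0, x - band), min(width, x + band + 1))
            ]
            if all(window):
                row.append(FOREGROUND)
            elif not any(window):
                row.append(BACKGROUND)
            else:
                row.append(UNKNOWN)
        trimap.append(row)
    return trimap


trimap = make_trimap([[0, 0, 0, 1, 1, 1, 1]])
```

A matting step would then assign fractional transparency only to the unknown pixels, which is what gives the composited edge its soft appearance.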

With this configuration, the transparency calculation unit 20 calculates the transparency of the boundary portion of the foreground region (the non-rigid portion), so natural soft-edge processing can be applied to that boundary. For example, when such a non-rigid portion is composited onto other video, its boundary blends well into that video and a natural-looking composite can be produced.

As shown in FIG. 1, the image processing apparatus 1 may include a first similarity determination unit 21, a second similarity determination unit 22, and a processing unit 23. These three units may be configured as a rigid-body video generation unit 103.

The first similarity determination unit 21 compares the separation region, that is, the region from which the separation unit 18 has separated the non-rigid part of each overlapping region, with the corresponding region in the image of the frame before and/or after the target frame, and determines their similarity.

The second similarity determination unit 22 compares with each other the corresponding regions, in the images of the frames before and after the target frame, that correspond to the separation region, and determines their similarity.

When the first similarity determination unit 21 determines that the similarity is lower than a predetermined value and the second similarity determination unit 22 determines that the similarity is higher than a predetermined value, the processing unit 23 takes the corresponding regions in the frames before and after the target frame, cuts out the one with the higher similarity, and pastes it onto the separation region.
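The decision made by the processing unit 23 can be sketched as below. The text does not say how the two first-stage similarities are combined or whether both stages share one threshold, so this sketch assumes that both first-stage values must be low and that a single hypothetical `threshold` is used; the function name is also hypothetical.

```python
def choose_fill_source(sim_target_prev, sim_target_next, sim_prev_next, threshold):
    """Pick the neighbouring frame, if any, whose region should fill the separation region.

    sim_target_prev, sim_target_next: first determination (separation region vs.
        the corresponding region in the previous / next frame).
    sim_prev_next: second determination (previous-frame region vs. next-frame region).
    Returns "prev", "next", or None when no filling should take place.
    """
    # Assumption: "lower than a predetermined value" means both comparisons are low.
    first_is_low = max(sim_target_prev, sim_target_next) < threshold
    second_is_high = sim_prev_next > threshold
    if first_is_low and second_is_high:
        # Assumption: "higher similarity" is measured against the target frame.
        return "prev" if sim_target_prev >= sim_target_next else "next"
    return None

print(choose_fill_source(0.3, 0.5, 0.9, 0.8))
```

The idea is that a region which differs from the target frame but agrees between the two neighbouring frames is likely to show the rigid background hidden behind the non-rigid body.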

With this configuration, the image processing apparatus 1 fills the place from which the non-rigid part was separated with content from the same region in the image of another frame, so an image containing only the rigid part can be created suitably.

As described above, video acquired from a video camera (content composed of multiple frames) is stored in the content DB 2.
The camera posture estimation unit 101 performs feature point extraction, corresponding point search, camera posture estimation, and estimation of the three-dimensional positions of the feature points on the video stored in the content DB 2. At the same time, the area dividing unit 14 performs region division on the video.

The non-rigid region extraction unit 102 uses these results to extract non-rigid regions (regions containing non-rigid bodies, including moving objects). The rigid body image generation unit 103 generates video of only the rigid regions, with the non-rigid regions removed. The transparency calculation unit 20 estimates the transparency at the boundary between the non-rigid and rigid regions.

In this way, the image processing apparatus 1 can produce one video composed only of the non-rigid regions and another composed only of the rigid regions.

The flow of processing by the image processing apparatus 1 is described next with reference to the flowchart in FIG. 2. Steps ST1 to ST3 constitute the camera posture estimation step ST101, steps ST4 to ST8 the rigid/non-rigid region determination step ST102, and steps ST9 and ST10 the post-process step ST103.

In step ST1, the camera posture estimation unit 101 extracts and describes feature points.
In step ST2, the camera posture estimation unit 101 searches for corresponding points.
In step ST3, the camera posture estimation unit 101 performs optimization (outputting three-dimensional coordinates and camera parameters).

In step ST4, the area dividing unit 14 sets initial values. In this embodiment, the division interval is S = n/i, with initial values n = 64 and i = 1.
In step ST5, the area dividing unit 14 performs region division by a predetermined method, for example spatio-temporal clustering based on the SLIC method.

In step ST6, the non-rigid region extraction unit 102 extracts overlapping regions.
In step ST7, the non-rigid region extraction unit 102 evaluates the overlapping regions.
In step ST8, the area dividing unit 14 determines whether the number of divisions has reached its maximum. If it has (Yes), the process proceeds to step ST9; if not (No), the process returns to step ST5.

In step ST9, the rigid body image generation unit 103 generates a rigid body image by hole filling.
In step ST10, the transparency calculation unit 20 creates a transparency map of the non-rigid region.

<Camera posture estimation step ST101>
In the camera posture estimation step ST101, feature points are detected and described by the SIFT method in the image at each time of the moving image. Next, for each feature point on each image, the corresponding point on the images at other times is determined using the distance between feature vectors as the criterion, which yields the correspondences. Finally, optimization by bundle adjustment estimates the three-dimensional positions of the feature points and the camera posture.

The information obtained here is:
・the horizontal and vertical position on the image of each feature point at each time, and its (x, y, z) position in three-dimensional space;
・the internal parameters of the camera: image center (c_x, c_y), focal length (f), and lens distortion (k1, k2);
・the external parameters of the camera (x, y, z, roll, pitch, yaw).

In this embodiment, however, estimating the camera parameters and the three-dimensional positions of the feature points is not essential. This part of the processing is not limited to the above as long as correspondences with few mismatches can be obtained.

For example, feature points may be extracted and correspondences obtained from the Euclidean distance between their feature vectors. Mismatch-suppression processing may also be applied, in which correspondences are computed in both directions between frames and any pair that does not yield the same correspondence in both directions is excluded as an outlier.
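The bidirectional check described above can be sketched as a mutual nearest-neighbour test. This is an illustrative sketch with toy two-dimensional descriptors (real SIFT descriptors have 128 dimensions), and the function names are hypothetical.

```python
import math

def nearest(desc, candidates):
    """Index of the candidate descriptor closest to desc in Euclidean distance."""
    return min(range(len(candidates)), key=lambda j: math.dist(desc, candidates[j]))

def cross_check_matches(desc_a, desc_b):
    """Keep only pairs (i, j) that are each other's nearest neighbour in both directions."""
    matches = []
    for i, d in enumerate(desc_a):
        j = nearest(d, desc_b)                 # forward match: frame A -> frame B
        if nearest(desc_b[j], desc_a) == i:    # backward match must return to i
            matches.append((i, j))
    return matches

# Toy descriptors: the third descriptor of frame A has no consistent partner.
frame_a = [(0.0, 0.0), (5.0, 5.0), (0.9, 0.1)]
frame_b = [(5.1, 4.9), (0.1, 0.0)]
pairs = cross_check_matches(frame_a, frame_b)
print(pairs)
```

The brute-force search is quadratic in the number of features; it is used here only to keep the example self-contained.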

Fewer mismatches make this embodiment more accurate. This embodiment therefore suppresses outliers through the optimization step that obtains the camera parameters and the three-dimensional positions of the feature points, that is, by exploiting geometric cues.

<Rigid/non-rigid region determination step ST102>
Next, the basic spatio-temporal division is described. As shown in FIG. 3, the SLIC method places seed points at a fixed interval (S) on the image A to be processed, compares the pixel information at each seed position (or, after an update, the mean pixel information of the pixels carrying the same label) with the pixel information of each pixel within a range of 2S from the seed, and repeats labeling based on that similarity to divide the image into regions. In other words, 2S is the range over which pixels are evaluated.

Because the SLIC method does not take subject motion into account, this embodiment uses the displacement of the same cluster region between images along the time direction when comparing pixel information between pixels and when updating seed positions, so that regions are connected correctly even for fast-moving subjects.

Specifically, when SLIC computes the feature distance between the pixel information of two pixels, the pixel information is conventionally expressed in a color space such as the L*a*b* color system and combined with the position in space-time (x, y, z) to form a six-dimensional feature. The distance between such features serves as the similarity criterion.

For example, if the pixel at the seed position is P and the pixel being compared is Q, their feature vectors can be expressed by the following equations.
[Equations not reproduced in this text.]

The similarity evaluation value D can be expressed by the following equation. The smaller D is, the more similar the pixels are.
[Equation not reproduced in this text.]

Here, S is the distance (in pixels) between the placed seeds, and m represents the complexity used when SLIC divides regions: the larger m is, the more weight spatial proximity carries. In this embodiment, m = S.
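The equations referred to above are not reproduced in this text. As a hedged sketch only, assuming the embodiment follows the commonly used SLIC formulation extended with the third coordinate z, the feature vectors and the evaluation value could take the following form. The component names P_L, P_a, P_b, P_z and the intermediate distances d_lab and d_xyz are assumptions; only P_x, P_y, D, S, and m appear in the text.

```latex
\begin{align*}
P &= (P_L,\ P_a,\ P_b,\ P_x,\ P_y,\ P_z), \qquad
Q = (Q_L,\ Q_a,\ Q_b,\ Q_x,\ Q_y,\ Q_z) \\
d_{lab} &= \sqrt{(P_L - Q_L)^2 + (P_a - Q_a)^2 + (P_b - Q_b)^2} \\
d_{xyz} &= \sqrt{(P_x - Q_x)^2 + (P_y - Q_y)^2 + (P_z - Q_z)^2} \\
D &= d_{lab} + \frac{m}{S}\, d_{xyz}
\end{align*}
```

Under this assumed form, a larger m increases the weight of the positional term, and the setting m = S makes the weight m/S equal to 1.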

In this embodiment, for pixels that carry the same label across the images at each time, the displacement of the two-dimensional centroid position at each time is applied to Px and Py so as to cancel it, and the evaluation and the update of the three-dimensional centroid position are then performed.

As a result, even when the image contains a fast-moving subject, the same subject is more likely to fall into the same cluster, which improves the accuracy of separating rigid and non-rigid regions, the purpose of this embodiment.

However, this embodiment does not restrict how subject motion is detected or how it is reflected in the evaluation formula. For example, instead of the centroid displacement of the same cluster in each frame, accuracy can also be improved by using the mean displacement derived from the optical flow of the pixels, or by using the displacement of each individual pixel.

<Rigid/non-rigid region determination (extraction of overlapping regions)>
Rigid and non-rigid regions are determined using the region division result, the three-dimensional position information of the feature points obtained with the camera parameters, and the camera posture information, together with the triangular patches formed between feature points at a given time.

Assume that the camera posture, the three-dimensional positions of the feature points, and the spatio-temporal division result have been obtained.
As shown in FIG. 4, consider the image at a given time (frame) t (image x_1 in FIG. 4). The three-dimensional positions of the feature points are projected onto this image using the camera posture to obtain their two-dimensional coordinates. Connecting the two-dimensional coordinates of each feature point to those of adjacent feature points forms triangular patches. This embodiment forms the triangular patches by Delaunay triangulation, but the method is not limited to this.

Next, the same projection is performed on the images at the preceding and following times (frames) (images y_1 and z_1 in FIG. 4), and the regions corresponding to each triangle created at time t are identified. Image y_1 is the image at time (frame) t-1, which precedes time t, and image z_1 is the image at time (frame) t+1, which follows it.

Next, the region division result is matched against the triangular regions, and the regions where a divided region overlaps a triangle are extracted.
At time t the overlapping regions lie in the same two-dimensional coordinate space, but in the images at the preceding and following times their locations differ. Assuming that the same triangular region covers the same subject area, each pixel inside a triangle at time t can be mapped by an affine transformation into the corresponding triangle at the preceding and following times. In other words, for the overlapping regions at times other than t as well, it becomes known which coordinates each pixel corresponds to on the images at different times.
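The affine mapping between corresponding triangles can be sketched as follows. Three vertex pairs determine a two-dimensional affine transform exactly, so the six coefficients are solved in closed form. The function names and the sample coordinates are hypothetical.

```python
def affine_from_triangles(src, dst):
    """Solve the 2D affine map that takes the three src vertices to the three dst vertices.

    Returns (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f.
    """
    (x0, y0), (x1, y1), (x2, y2) = src
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("source triangle is degenerate")
    coeffs = []
    for k in (0, 1):  # k = 0 solves the x' row, k = 1 solves the y' row
        u0, u1, u2 = dst[0][k], dst[1][k], dst[2][k]
        a = ((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0)) / det
        b = ((u2 - u0) * (x1 - x0) - (u1 - u0) * (x2 - x0)) / det
        c = u0 - a * x0 - b * y0
        coeffs.extend((a, b, c))
    return tuple(coeffs)

def apply_affine(coeffs, point):
    """Map one pixel coordinate with the affine coefficients."""
    a, b, c, d, e, f = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

tri_t = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]     # triangle at time t
tri_prev = [(1.0, 2.0), (5.0, 2.0), (1.0, 6.0)]  # same triangle at time t-1 (shifted)
m = affine_from_triangles(tri_t, tri_prev)
mapped = apply_affine(m, (1.0, 1.0))
print(mapped)
```

Applying `apply_affine` to every pixel inside the triangle at time t gives the coordinates to sample in the neighbouring frame.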

In the example of FIG. 4, the region corresponding to triangular region a_1 in image x_1 at time t is triangular region b_1 in image y_1 at time t-1 and triangular region c_1 in image z_1 at time t+1.

Evaluating how much the feature values of the image information in an overlapping region change over time makes it possible to distinguish rigid from non-rigid bodies. A rigid body changes little, whereas a non-rigid body changes more because its motion differs from the global motion of the image.

In the example of FIG. 4, triangular region a_1 in image x_1 at time t can be judged to contain a non-rigid body on the basis of the same triangular region at the preceding and following times. The non-rigid region extraction unit 102 therefore extracts the non-rigid region (P_1 in FIG. 4) from triangular region a_1.

<Rigid/non-rigid region determination (evaluation of overlapping regions)>
Next, the method of evaluating overlapping regions is described. Strictly speaking, even when the vertex positions of a triangular region match across time, the subject inside the triangle is not necessarily planar, so errors arise.

For example, if the rigid/non-rigid determination for overlapping regions at different times were based on per-pixel luminance differences, the result would be unstable.
This embodiment therefore compares overlapping regions on the basis of pattern and color features, which remains stable even when some positional error is present.

Specifically, the overlapping regions are evaluated using HoG (Histogram of Oriented Gradients) and color similarity, and the determination is made by thresholding the resulting similarity. For HoG, a histogram is built from the direction and magnitude of the luminance gradient and used as the feature.

Color similarity uses a color histogram intersection that excludes the luminance component. This embodiment uses the L*a*b* color system as the color space, but is not limited to it. To avoid the influence of illumination, however, a color system in which luminance and color components are independent, such as HSV, is preferable to the RGB color system.

Ordinarily, a color histogram intersection takes the quantized values of each color component as the histogram bins and builds a color histogram for each of the two regions. The smaller of the two counts in each bin is summed over all bins, and this sum divided by the total of one of the histograms is the similarity.

Here, the L*a*b* color system is adopted as the color space, and the value obtained by the same calculation with L (lightness) excluded is used as the similarity in this embodiment.
This yields a similarity evaluation value that is less susceptible to illumination changes. Although L is excluded in this embodiment, L, a, and b may instead be weighted according to the degree of illumination change.
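The color similarity described above can be sketched as a histogram intersection over the a and b channels only. The bin count, the value range of -128 to 128, and the function names are assumptions made for illustration; the text does not specify the quantization.

```python
def ab_histogram(pixels_lab, bins=8, lo=-128.0, hi=128.0):
    """2D histogram over the a and b channels; L (lightness) is deliberately ignored."""
    width = (hi - lo) / bins
    hist = [0] * (bins * bins)
    for _, a, b in pixels_lab:
        ia = min(bins - 1, max(0, int((a - lo) / width)))  # clamp to a valid bin index
        ib = min(bins - 1, max(0, int((b - lo) / width)))
        hist[ia * bins + ib] += 1
    return hist

def histogram_intersection(h1, h2):
    """Sum of per-bin minima divided by the total of h1 (1.0 means identical distributions)."""
    total = sum(h1)
    if total == 0:
        return 0.0
    return sum(min(x, y) for x, y in zip(h1, h2)) / total

# Toy (L, a, b) pixels: two of three pixels share a chromaticity bin despite different L.
region_a = [(50, 10, 10), (60, 10, 10), (70, -100, 50)]
region_b = [(20, 10, 10), (90, 12, 9), (40, 100, 100)]
sim = histogram_intersection(ab_histogram(region_a), ab_histogram(region_b))
print(sim)
```

Because L never enters the histogram, a uniform change in brightness between the two regions leaves the score unchanged in this sketch.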

<Rigid/non-rigid region determination (determination processing)>
Whether a region is rigid or non-rigid is determined from the evaluation result of the overlapping regions. One evaluation parameter is the color histogram intersection; the closer it is to 1.0, the more similar the regions.

The other evaluation parameter, HoG, is a gradient histogram. If a histogram intersection is computed for it in the same way as for the color histogram, it too approaches 1.0 as the regions become more similar.

The histograms can also be treated as feature vectors and evaluated by the distance between them. The computed evaluation parameters could also be learned and classified with a nonlinear SVM, or weighted individually. In this embodiment, based on experiments, the intersections of both histograms are computed and multiplied, and a region whose evaluation value is within ±0.1 of 1.0 is determined to be rigid.
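The empirical rule above reduces to a few lines. This sketch assumes both intersections have already been computed as values in the range 0 to 1; the function and parameter names are hypothetical.

```python
def is_rigid(color_intersection, hog_intersection, tolerance=0.1):
    """Multiply the two similarities; the region is rigid if the product is within tolerance of 1.0."""
    score = color_intersection * hog_intersection
    return abs(score - 1.0) <= tolerance

print(is_rigid(0.97, 0.96), is_rigid(0.9, 0.8))
```

Multiplying the scores means a region must look similar in both color and gradient structure to be judged rigid; either score falling well below 1.0 is enough to reject it.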

<Rigid/non-rigid region determination (refinement by looping)>
To refine the rigid/non-rigid region information obtained so far, the number of divisions in the SLIC-based region division is increased and the division is performed again.
Determination errors tend to occur at the boundary between rigid and non-rigid regions, and the previous seed interval may not have separated the two properly. In the boundary area, the non-rigid region is therefore dilated by one region toward the rigid region.

Regions already determined to be rigid need not be determined again, so no seeds are placed there. Regions determined to be non-rigid are given seeds placed at twice the density of the previous pass (a finer seed interval).

In addition, when seed positions are updated by the centroid computation of the SLIC method, a constraint is imposed so that a centroid does not move into a rigid region. In this embodiment, the image information of rigid regions is set to black, (0, 0, 0) in R, G, B, but the method is not limited to this. For example, when an update within SLIC would move a centroid into a rigid region, the update of the X, Y, Z centroid coordinates may be skipped for that iteration and the values from the previous centroid update retained.

This embodiment repeats this process to divide regions more finely. The repetition is set to continue until the seed interval is every second pixel (that is, one pixel lies between adjacent seeds in the horizontal and vertical directions), but this is not a requirement. Because the number of repetitions correlates with processing time, the repetition condition may be changed to suit the application.
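The refinement loop can be pictured as a schedule of seed intervals. The sketch below assumes that i in S = n/i doubles on each pass, which takes the interval from the initial 64 pixels down to the stated stopping point of 2 pixels. The doubling rule and the function name are assumptions, since the text gives only the initial values and the stopping condition.

```python
def seed_intervals(n=64, min_interval=2):
    """Yield S = n / i for i = 1, 2, 4, ... until S would drop below min_interval."""
    i = 1
    while n // i >= min_interval:
        yield n // i
        i *= 2  # assumption: each pass halves the seed interval

schedule = list(seed_intervals())
print(schedule)
```

Raising `min_interval` shortens the schedule, which is one way to trade segmentation detail for processing time.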

<Post-process step ST103>
The rigid/non-rigid region division described above leaves each boundary as a so-called hard edge. The result can be used for video processing as is. In film and television production, however, video material is generally prepared with soft edges so that compositing does not look unnatural. Moreover, a rigid region from which the non-rigid region has merely been removed contains holes, which must be filled with some video information before it can be used in video processing. These processes are described below.

<Hole filling of rigid-body video containing no non-rigid bodies>
For a non-rigid region at a particular time, the same region is compared between other times, using the overlapping-region evaluation method described above. If the content is the same, the evaluation yields a high similarity. On this basis it is determined statistically whether the region is rigid, and video information judged to be rigid is affine-transformed and used to fill the non-rigid region at the particular time.

During this filling, discontinuities in low-frequency components or in the pattern may appear near the boundaries between pieces of video information and look unnatural. This embodiment reduces the effect by replacing the one-pixel-wide strip along each boundary with the average of the mutually adjacent pixels.

The countermeasure is not particularly limited. For example, natural image compositing can be achieved by matching the color tone to that of the destination image, that is, the rigid region with holes, while taking the texture (image gradient) from the source image, that is, the rigid-region image at another time that corresponds to the non-rigid region. The method in the following paper can also be used.
P. Perez, M. Gangnet and A. Blake: "Poisson image editing", Proc. SIGGRAPH, 22(3), pp. 313-318 (2003)

In the example of FIG. 4, image x_2 is the image obtained by extracting the non-rigid region (P_1 in FIG. 4) from image x_1 at time t, and the figure shows the extracted portion being filled using other images (images y_2 and z_2 in FIG. 4). Image y_2 is the image at time t-1, and image z_2 is the image at time t+1.
The region corresponding to triangular region a_2 in image x_2 at time t is triangular region b_2 in image y_2 at time t-1 and triangular region c_2 in image z_2 at time t+1.

A portion P_2 corresponding to the non-rigid region P_1 is cut out from triangular region b_2, and a portion P_3 corresponding to the non-rigid region P_2 is cut out from triangular region c_2. The cut-out portions P_2 and P_3 are then composited into the corresponding part of triangular region a_2. In this way, a video from which the non-rigid body has been removed is created.

<Transparency map generation for non-rigid regions (α plane creation)>
When compositing video, the naturalness of the boundary region is one point to consider in order to combine the background and foreground images seamlessly. Normally, unnaturalness is reduced by blending the boundary pixels of the background and foreground into a soft edge rather than using a hard edge. Specifically, an image plane representing the distribution of transparency (the α plane) is added to the foreground image, for example an RGB image, and its values are used as the mixing ratio between foreground and background during compositing.

The image I_r of only the rigid body, with the non-rigid region removed, has already been created. It corresponds to the background image in video compositing, and at the boundary with the non-rigid region it is blended with the non-rigid video at a certain mixing ratio.

Let I_c be the captured image in which the non-rigid and rigid bodies are not separated, and let I_n be the image of only the non-rigid body. The transparency of the non-rigid region is denoted α (from 0 to 1), where 0 is transparent, 1 is opaque, and intermediate values mix the pixels. The compositing can then be assumed to follow the equation below.
I_c = I_n α + I_r (1 - α) ... (6)
I_r has been obtained as described above, and I_c is the captured video itself. To obtain the non-rigid video with transparency, the unknowns are, strictly speaking, I_n and α, which must be estimated.
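Equation (6) can be checked numerically for a single pixel. The sketch below composites one RGB pixel and then recovers α by least squares over the channels. The recovery step assumes that I_n is known, which holds only in this toy setting; in the embodiment both I_n and α are unknown and must be estimated by a matting method. The function names are hypothetical.

```python
def composite(i_n, i_r, alpha):
    """Per-channel I_c = I_n * alpha + I_r * (1 - alpha) for one pixel."""
    return tuple(n * alpha + r * (1.0 - alpha) for n, r in zip(i_n, i_r))

def alpha_from_known_colors(i_c, i_n, i_r):
    """Least-squares alpha for one pixel when I_n and I_r are both known (toy assumption)."""
    num = sum((c - r) * (n - r) for c, n, r in zip(i_c, i_n, i_r))
    den = sum((n - r) ** 2 for n, r in zip(i_n, i_r))
    if den == 0:
        raise ValueError("foreground and background colors are identical")
    return min(1.0, max(0.0, num / den))  # clamp to the valid range 0..1

i_n = (200.0, 100.0, 0.0)   # non-rigid (foreground) color
i_r = (0.0, 100.0, 200.0)   # rigid (background) color
i_c = composite(i_n, i_r, 0.25)
alpha = alpha_from_known_colors(i_c, i_n, i_r)
print(i_c, alpha)
```

The error raised for identical colors reflects a real limitation: where foreground and background look the same, equation (6) alone cannot determine α.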

With (x, y) denoting two-dimensional coordinates on the image, the α plane satisfies 0 ≤ α(x, y) ≤ 1. Conventional methods for estimating α from natural images include Poisson Matting (see the following paper).
J. Sun, J. Jia, C. K. Tang, H. Y. Shum, Poisson matting, SIGGRAPH, 2004

These methods extract a specific subject region, including its α, from a captured image. The target to be extracted is specified manually as the foreground region and the background as the background region, the boundary between them is treated as the unknown region, and α in the unknown region is estimated by optimization.

In this embodiment, a two-pixel-wide band along the boundary between the non-rigid and rigid regions is taken as the unknown region, the non-rigid region minus the unknown region as the foreground region, and the rigid region minus the unknown region as the background region, and α is estimated by Poisson Matting. This embodiment does not limit the method of estimating α. Any method that estimates α from a trimap of foreground, background, and unknown regions may be applied, for example the method in the following paper.
Xiaowu Chen, Dongqing Zou, Ping Tan, Image Matting with Local and Nonlocal Smooth Priors, CVPR 2013
By the above method, highly accurate transparency and region information for the non-rigid region is obtained.
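The trimap construction can be sketched on a small binary mask. This sketch assumes that the two-pixel-wide band means one pixel on each side of the boundary, found by checking 4-connected neighbours; the labels and the function name are hypothetical.

```python
FOREGROUND, BACKGROUND, UNKNOWN = "F", "B", "U"

def make_trimap(mask):
    """Build a trimap from mask[y][x] (True for non-rigid pixels).

    A pixel with a 4-neighbour of the other class becomes UNKNOWN, which gives
    a band one pixel wide on each side of the boundary (two pixels in total).
    """
    h, w = len(mask), len(mask[0])
    trimap = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            on_boundary = any(
                0 <= ny < h and 0 <= nx < w and mask[ny][nx] != mask[y][x]
                for ny, nx in neighbours
            )
            if on_boundary:
                row.append(UNKNOWN)
            else:
                row.append(FOREGROUND if mask[y][x] else BACKGROUND)
        trimap.append("".join(row))
    return trimap

# Toy mask: the right half of a 6x3 image is non-rigid.
mask = [[x >= 3 for x in range(6)] for _ in range(3)]
trimap = make_trimap(mask)
print(trimap)
```

A matting method would then estimate α only for the pixels marked `U`, fixing α to 1 in `F` and 0 in `B`.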

Conventionally, extracting a specific subject region required either preparing a controlled shooting environment or cutting the region out by hand in post-processing, both of which are laborious. In general, demand is especially high for cutting out people in video.

The image processing apparatus 1 according to this embodiment can automatically extract non-rigid bodies such as moving objects. This reduces laborious work for professional creators and also enables general users without video production skills to perform advanced video compositing.

Although this embodiment has mainly described the configuration and operation of an image processing apparatus that separates the target part (non-rigid part) without manual work, the invention is not limited to this. It may also be configured as a method and a program that have the respective constituent elements and separate the target part (non-rigid part) without manual work.

Furthermore, the invention may be realized by recording a program that implements the functions of the image processing apparatus on a computer-readable recording medium, and having a computer system read and execute the program recorded on that medium.

The term "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into the computer system.

Furthermore, the "computer-readable recording medium" may also include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as the volatile memory inside a computer system that serves as the server or client in that case. The program may implement only part of the functions described above, or may implement those functions in combination with a program already recorded in the computer system.

DESCRIPTION OF SYMBOLS
1 Image processing apparatus
11 Feature point detection unit
12 Corresponding point search unit
13 Polygonal shape dividing unit
14 Region dividing unit
15 Extraction unit
16 Specifying unit
17 Determination unit
18 Separation unit
19 Calculation unit
20 Transparency calculation unit
21 First similarity determination unit
22 Second similarity determination unit
23 Processing unit
101 Camera pose estimation unit
102 Non-rigid region extraction unit
103 Rigid body video generation unit

Claims

An image processing apparatus comprising:
a feature point detection unit that detects feature points from the image of each frame in content composed of a plurality of frames;
a corresponding point search unit that searches for corresponding points between the images of the frames based on the feature amount of each feature point detected by the feature point detection unit;
a polygonal shape dividing unit that, in the image of a target frame, connects n adjacent feature points to divide the image into a plurality of polygonal shapes, and projects the plurality of polygonal shapes onto the image of a frame before and/or after the target frame;
a region dividing unit that divides each of the image of the target frame and the image of the frame before and/or after the target frame into a plurality of regions based on the similarity between pixels;
an extraction unit that compares the image of the target frame divided into the plurality of polygonal shapes by the polygonal shape dividing unit with the image of the target frame divided into the plurality of regions by the region dividing unit, and extracts, as an overlapping region, each divided region corresponding to the position of each polygonal shape;
a specifying unit that, based on the corresponding points found by the corresponding point search unit, specifies, from the image of the frame before and/or after the target frame, the corresponding region for each overlapping region extracted by the extraction unit;
a determination unit that compares each overlapping region extracted by the extraction unit with the corresponding region specified by the specifying unit to calculate a change amount and, when the calculated change amount exceeds a predetermined threshold, determines that the overlapping region includes a non-rigid body; and
a separation unit that separates the non-rigid portion from each overlapping region determined by the determination unit to include a non-rigid body, based on the regions divided by the region dividing unit.

The image processing apparatus according to claim 1, further comprising a calculation unit that: uses the corresponding points between the images of the frames found by the corresponding point search unit to estimate, through optimization by bundle adjustment, the three-dimensional position of each feature point and the camera pose of the camera that captured the content; back-projects each feature point into three-dimensional space based on the estimated three-dimensional position and camera pose and reprojects it onto the image plane; calculates the distance between the feature point and its reprojected point as a reprojection error; and identifies feature points whose reprojection error is equal to or greater than a predetermined size,
wherein the corresponding point search unit excludes the feature points identified by the calculation unit from the feature points detected by the feature point detection unit, and searches for corresponding points between the images of the frames based on the feature amounts of the remaining feature points.
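
The reprojection-error test in this claim can be illustrated with a short sketch. It assumes that bundle adjustment has already produced the three-dimensional positions and the camera pose, and it uses a simple pinhole model with a single focal length. The function names, the parameter layout, and the threshold value are hypothetical and are not taken from the specification.

```python
import math


def project(point3d, rotation, translation, focal, center):
    """Project a 3D point onto the image plane with a pinhole camera.

    rotation is a 3x3 matrix (list of rows), translation a 3-vector;
    both are assumed to come from a prior bundle adjustment.
    """
    cam = [
        sum(rotation[i][j] * point3d[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    u = focal * cam[0] / cam[2] + center[0]
    v = focal * cam[1] / cam[2] + center[1]
    return (u, v)


def find_outliers(observed, points3d, rotation, translation, focal, center, max_error):
    """Return indices of feature points whose reprojection error >= max_error."""
    outliers = []
    for i, (obs, p) in enumerate(zip(observed, points3d)):
        u, v = project(p, rotation, translation, focal, center)
        error = math.hypot(obs[0] - u, obs[1] - v)  # distance in pixels
        if error >= max_error:
            outliers.append(i)
    return outliers


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
observed = [(50.0, 50.0), (110.0, 50.0)]       # detected feature points
points3d = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]  # estimated 3D positions
excluded = find_outliers(observed, points3d, identity, [0, 0, 0], 100.0, (50.0, 50.0), 5.0)
```

Feature points listed in `excluded` are the ones that would be dropped before the corresponding point search is repeated; points on a moving (non-rigid) subject tend to fall into this set because they do not fit the single rigid camera motion.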

The image processing apparatus according to claim 1, further comprising a transparency calculation unit that, after the separation unit has separated the non-rigid portion from each overlapping region, sets the non-rigid portion as the foreground region and the portion other than the foreground region as the background region, creates a trimap from the boundary region between the foreground region and the background region, and calculates the transparency of the boundary portion of the foreground region based on the continuity between the foreground region and the background region.

The image processing apparatus according to any one of claims 1 to 3, further comprising:
a first similarity determination unit that determines similarity by comparing the separation region, from which the non-rigid portion has been separated from each overlapping region by the separation unit, with the corresponding region corresponding to the separation region in the image of the frame before and/or after the target frame;
a second similarity determination unit that determines similarity by comparing with each other the corresponding regions corresponding to the separation region in the images of the frames before and after the target frame; and
a processing unit that, when the first similarity determination unit determines that the similarity is lower than a predetermined value and the second similarity determination unit determines that the similarity is higher than the predetermined value, cuts out, from the corresponding regions corresponding to the separation region in the images of the frames before and after the target frame, the corresponding region with the higher similarity and pastes it onto the separation region.
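
The decision made by the two similarity determination units and the processing unit can be sketched as below. The claim does not fix a similarity measure, so the metric used here (one minus the normalized mean absolute difference of 8-bit pixel values), the threshold of 0.9, the reading that "low" means both neighboring frames fall below the threshold, and all function names are assumptions made only for illustration. Regions are represented as flat lists of pixel values of equal length.

```python
def similarity(a, b):
    """Assumed metric: 1 minus the mean absolute difference of 8-bit pixels."""
    diff = sum(abs(p - q) for p, q in zip(a, b))
    return 1.0 - diff / (255.0 * len(a))


def fill_separated_region(target, prev, nxt, threshold=0.9):
    """Return the pixels to use for the separation region of the target frame.

    target: separation region in the target frame
    prev, nxt: corresponding regions in the previous and next frames
    """
    sim_prev = similarity(target, prev)    # first similarity determination
    sim_next = similarity(target, nxt)
    sim_neighbors = similarity(prev, nxt)  # second similarity determination
    if max(sim_prev, sim_next) < threshold and sim_neighbors > threshold:
        # Neighbors agree with each other but not with the target:
        # paste the neighbor region that is closer to the target.
        return list(prev if sim_prev >= sim_next else nxt)
    return list(target)


filled = fill_separated_region([0, 0, 0, 0], [200, 200, 200, 200], [210, 200, 200, 200])
unchanged = fill_separated_region([200, 200, 200, 200], [200, 200, 200, 200], [210, 200, 200, 200])
```

In this sketch the region is replaced only when the previous and next frames show consistent content that differs from the target frame, which corresponds to the situation where the non-rigid portion has been removed from the target frame and the underlying rigid background is visible in the neighboring frames.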

An image processing method comprising:
a feature point detection step of detecting feature points from the image of each frame in content composed of a plurality of frames;
a corresponding point search step of searching for corresponding points between the images of the frames based on the feature amount of each feature point detected in the feature point detection step;
a polygonal shape dividing step of, in the image of a target frame, connecting n adjacent feature points to divide the image into a plurality of polygonal shapes, and projecting the plurality of polygonal shapes onto the image of a frame before and/or after the target frame;
a region dividing step of dividing each of the image of the target frame and the image of the frame before and/or after the target frame into a plurality of regions based on the similarity between pixels;
an extraction step of comparing the image of the target frame divided into the plurality of polygonal shapes in the polygonal shape dividing step with the image of the target frame divided into the plurality of regions in the region dividing step, and extracting, as an overlapping region, each divided region corresponding to the position of each polygonal shape;
a specifying step of specifying, based on the corresponding points found in the corresponding point search step, the corresponding region for each overlapping region extracted in the extraction step from the image of the frame before and/or after the target frame;
a determination step of comparing each overlapping region extracted in the extraction step with the corresponding region specified in the specifying step to calculate a change amount and, when the calculated change amount exceeds a predetermined threshold, determining that the overlapping region includes a non-rigid body; and
a separation step of separating the non-rigid portion from each overlapping region determined in the determination step to include a non-rigid body, based on the regions divided in the region dividing step.

A program for causing a computer to execute:
a feature point detection step of detecting feature points from the image of each frame in content composed of a plurality of frames;
a corresponding point search step of searching for corresponding points between the images of the frames based on the feature amount of each feature point detected in the feature point detection step;
a polygonal shape dividing step of, in the image of a target frame, connecting n adjacent feature points to divide the image into a plurality of polygonal shapes, and projecting the plurality of polygonal shapes onto the image of a frame before and/or after the target frame;
a region dividing step of dividing each of the image of the target frame and the image of the frame before and/or after the target frame into a plurality of regions based on the similarity between pixels;
an extraction step of comparing the image of the target frame divided into the plurality of polygonal shapes in the polygonal shape dividing step with the image of the target frame divided into the plurality of regions in the region dividing step, and extracting, as an overlapping region, each divided region corresponding to the position of each polygonal shape;
a specifying step of specifying, based on the corresponding points found in the corresponding point search step, the corresponding region for each overlapping region extracted in the extraction step from the image of the frame before and/or after the target frame;
a determination step of comparing each overlapping region extracted in the extraction step with the corresponding region specified in the specifying step to calculate a change amount and, when the calculated change amount exceeds a predetermined threshold, determining that the overlapping region includes a non-rigid body; and
a separation step of separating the non-rigid portion from each overlapping region determined in the determination step to include a non-rigid body, based on the regions divided in the region dividing step.