JP6611255B2 - Image processing apparatus, image processing method, and image processing program

Info

Publication number: JP6611255B2
Application number: JP2016115333A
Authority: JP
Inventors: 真理子山口; 峻司細野; 広夢宮下; 秀信長田; 朗小野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-06-09
Filing date: 2016-06-09
Publication date: 2019-11-27
Anticipated expiration: 2036-06-09
Also published as: JP2017220098A

Description

The present invention relates to a technique for extracting an arbitrary region from an image.

Many fields need to extract an arbitrary region from an image. Television stations and studios have long researched and developed chroma key technology, which extracts only the subject region by placing the subject in front of a green screen, and the technology remains in wide use today. Extracting only the subject region adds value: for example, the extracted subject image can be pasted onto another image to produce a scene that could not exist in reality. The extracted images can also be accumulated as training data for teaching artificial intelligence to recognize the subject. Because of these advantages, region extraction by chroma key has become indispensable in virtual reality, CG production for movies, big data generation, and similar applications.

Chroma key technology requires a single-color screen or similar backdrop behind the subject, which limits where it can be used. It cannot be used, for example, for outdoor location footage or for sports events in a stadium. To address this problem, techniques for extracting an arbitrary region against an arbitrary background have been studied. Examples include a method that uses background subtraction (see, for example, Non-Patent Document 1), a method that takes color histograms of the extraction target region and of the background as input (see, for example, Non-Patent Document 2), and a method that uses a depth sensor (see, for example, Non-Patent Document 3).

Non-Patent Document 1: Hitoshi Habe, Toshikazu Wada, and Takashi Matsuyama, "Background Subtraction Method Robust against Illumination Changes" (照明変化に対して頑健な背景差分法), Information Processing Society of Japan, CVM115-3, 1999
Non-Patent Document 2: Carsten Rother, Vladimir Kolmogorov, and Andrew Blake, "'GrabCut' - Interactive foreground extraction using iterated graph cuts", ACM Transactions on Graphics (SIGGRAPH), 2004, Volume 23, Issue 3, p. 309-314
Non-Patent Document 3: Ryan Crabb, Colin Tracey, Akshaya Puranik, and James David, "Real-time Foreground Segmentation via Range and Color Imaging", IEEE, CVPR Workshops, 2008

However, the conventional techniques have the following problems.

In the region extraction method that uses background subtraction (hereinafter, the background subtraction method), a background image that does not contain the extraction target region is captured in advance, the image containing the extraction target region is compared with the background image pixel by pixel, and the extraction target region is obtained from the resulting differences in color, luminance, or the like. If the background pixels of the image containing the extraction target region are identical to the pixels of the background image captured in advance, and the pixels of the extraction target region differ from the background image pixels in color or luminance, differences arise only at the pixels of the extraction target region, and the region can be obtained accurately. Depending on the scene, however, changes in sunlight over time, illumination fluctuations, or shadows cast by the moving subject introduce differences between the background image captured in advance and the background portion of the image containing the extraction target region, so the region is not obtained correctly. In addition, the subject's movement or clothing can give pixels in the extraction target region colors and luminance close to those of the background image captured in advance, and in that case too the region may not be obtained accurately.

A method has also been proposed in which pixels of the extraction target region and of the background region are selected from the image containing the extraction target, color histograms of the selected pixels are computed, the likelihood that each pixel of the image belongs to the extraction target region or to the background is computed from the histogram distributions, and the extraction target region is obtained from these likelihoods. This method can find the boundary between the extraction target region and the background region, but because it relies on color information, the region may not be obtained accurately depending on which pixels are selected for the extraction target region and the background region.

Another method uses the distance to the subject in the extraction target region and obtains only the subject as the extraction target region. Because this method determines the region from distance information, it tends to obtain the region accurately even when the background color is close to the color of the extraction target region. With a depth-based method, however, a region that is not an extraction target is difficult to separate when it lies at the same distance as the extraction target. Furthermore, when an infrared depth sensor is used to measure depth, the measurable depth information is sparse within the frame, so it is difficult to delineate the extraction target region in the image precisely from the depth results alone. Infrared depth sensors also have difficulty, in principle, measuring the distance to black objects, fur, hair, and the like, so for some types of subject the extraction target region cannot be obtained accurately.

As described above, conventional techniques have difficulty obtaining the extraction target region accurately against an arbitrary background.

The present invention has been made in view of the above, and its object is to obtain an extraction target region more accurately against an arbitrary background.

An image processing apparatus according to a first aspect of the present invention comprises: image input means for inputting an image; shape input means for inputting a shape of a foreground region that follows the contour of the foreground region of the image, and for cutting out the pixels corresponding to the shape of the foreground region from the image to obtain a processing region image; initial random field generation means for generating, from the pixels on the foreground region side of the processing region image and the pixels on the background region side of the processing region image, an initial random field that holds, for each pixel of the image, the probability of being foreground or background; contour line extraction means for extracting a contour line of the foreground region from the processing region image; and separation means for separating the image into a foreground region and a background region by graph cut, using as the cost function the initial random field weighted based on the contour line. The contour line extraction means comprises: superpixel calculation means for calculating, from the processing region image, superpixels obtained by clustering pixels that are close in color and position, and for generating a boundary image representing the boundaries of the superpixels; differential filter processing means for generating a differential filter based on the shape of the foreground region and applying the differential filter to the boundary image to extract contour candidate pixels of the foreground region; and contour line interpolation means for interpolating between the contour candidate pixels along the boundaries of the superpixels to extract the contour line of the foreground region.

An image processing method according to a second aspect of the present invention is an image processing method executed by a computer, comprising the steps of: inputting an image; inputting a shape of a foreground region that follows the contour of the foreground region of the image; cutting out the pixels corresponding to the shape of the foreground region from the image to obtain a processing region image; generating, from the pixels on the foreground region side of the processing region image and the pixels on the background region side of the processing region image, an initial random field that holds, for each pixel of the image, the probability of being foreground or background; extracting a contour line of the foreground region from the processing region image; and separating the image into a foreground region and a background region by graph cut, using as the cost function the initial random field weighted based on the contour line. The step of extracting the contour line comprises the steps of: calculating, from the processing region image, superpixels obtained by clustering pixels that are close in color and position, and generating a boundary image representing the boundaries of the superpixels; generating a differential filter based on the shape of the foreground region and applying the differential filter to the boundary image to extract contour candidate pixels of the foreground region; and interpolating between the contour candidate pixels along the boundaries of the superpixels to extract the contour line of the foreground region.

An image processing program according to a third aspect of the present invention causes a computer to operate as each of the means of the image processing apparatus described above.

According to the present invention, an extraction target region can be obtained more accurately against an arbitrary background.

FIG. 1 is a functional block diagram showing the configuration of the image processing apparatus according to the first embodiment.
FIG. 2 is a flowchart showing the flow of processing for generating a processing region image and an initial random field.
FIG. 3 is a diagram showing an example of an original image.
FIG. 4 is a diagram showing the shape of the foreground region drawn on the original image.
FIG. 5 is a flowchart showing the flow of processing for extracting the contour of the foreground region.
FIG. 6 is a diagram showing an example of superpixels generated from the processing region image.
FIG. 7 is a diagram showing an example of a binary image generated from the superpixels of FIG. 6.
FIG. 8 is a diagram showing the binary image divided by a grid and the slope in each grid cell.
FIG. 9 is a diagram showing an example in which differential filters orthogonal to eight directions are generated and applied to the binary image.
FIG. 10 is an enlarged view of the area around the chest of the mannequin in the original image.
FIG. 11 is a diagram showing contour candidate pixels extracted from the processing region image of FIG. 10.
FIG. 12 is a diagram showing an example of significant lines obtained by removing noise from FIG. 11.
FIG. 13 is a diagram showing an example of contour candidate lines obtained by interpolating between the significant lines of FIG. 12.
FIG. 14 is a diagram showing an example in which a contour line is extracted from the contour candidate lines of FIG. 13.
FIG. 15 is a flowchart showing the flow of processing for interpolating between significant lines.
FIG. 16 is a diagram showing labels assigned to superpixels and adjacency information assigned to pixels.
FIG. 17 is a diagram showing the significant line C1 being extended.
FIG. 18 is a flowchart showing the flow of processing for merging significant lines.
FIG. 19 is a diagram showing significant lines being merged.
FIG. 20 is a diagram showing the merging of significant lines separated by an island-shaped superpixel.
FIG. 21 is a flowchart showing the flow of the foreground clipping processing.
FIG. 22 is a diagram showing an example of an output image in which the foreground region has been cut out.
FIG. 23 is an overall configuration diagram of an image recognition system including the image processing apparatus according to the second embodiment.
FIG. 24 is a functional block diagram showing the configuration of the image processing apparatus according to the second embodiment.
FIG. 25 is a flowchart showing the flow of processing in which the image processing apparatus according to the second embodiment extracts feature points.
FIG. 26 is a flowchart showing the flow of processing of the image recognition apparatus according to the second embodiment.
FIG. 27 is a diagram showing an example of results displayed on a client terminal.

[First Embodiment]
<Configuration of Image Processing Apparatus>
FIG. 1 is a functional block diagram showing the configuration of the image processing apparatus according to the first embodiment.

The image processing apparatus 1 shown in FIG. 1 includes an input unit 11, an initial random field generation unit 12, a superpixel calculation unit 13, a differential filter processing unit 14, a shape candidate extraction unit 15, a processing random field generation unit 16, a graph cut calculation unit 17, and a region logic operation unit 18. The units of the image processing apparatus 1 may be implemented on a computer having an arithmetic processing device, a storage device, and the like, with the processing of each unit executed by a program. The program is stored in a storage device of the image processing apparatus 1; it can also be recorded on a recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or be provided over a network.

The input unit 11 receives the original image (a color image) whose foreground and background regions are to be separated, together with a processing region that contains the boundary of the foreground region of the original image and indicates the rough shape of the foreground region, and cuts the image corresponding to the processing region out of the original image to generate a processing region image. The input unit 11 displays the input original image. On the displayed image, the user enters the processing region by tracing the contour of the foreground region to be cut out with a thick pen tool or the like. A display with a built-in touch sensor, such as a tablet, or a freehand input device, such as a pen tablet, can be used for this purpose. The input unit 11 cuts the traced part out of the original image and uses it as the processing region image. Alternatively, the shape of the target to be extracted may be detected with a depth sensor or a temperature sensor when the original image is captured, and the processing region may be set based on the detected shape. When the shape of the processing region is input by a sensor, a threshold on the sensor value divides the image into a selected region and a non-selected region, and a mask image is generated in which pixels of the selected region have the value 1 and pixels of the non-selected region have the value 0. This mask image is assumed to reproduce the rough shape of the target.

When the processing region is a closed region, its inside is treated as the foreground region and its outside as the background region. When the processing region is an open region, which side is the foreground region is determined from user input or sensor information.
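The inside/outside rule for a closed processing region can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_inside_outside` is hypothetical, the traced stroke is assumed to be given as a 2D list of 0/1 values, and a flood fill from the image border is one possible way to find the outside; the patent does not specify how the two sides are determined.

```python
from collections import deque

def split_inside_outside(stroke):
    """Split non-stroke pixels into inside and outside of a closed stroke.

    stroke: 2D list with 1 where the user traced and 0 elsewhere (assumed format).
    Pixels reachable from the image border without crossing the stroke are outside.
    """
    h, w = len(stroke), len(stroke[0])
    outside = [[0] * w for _ in range(h)]
    # Seed the flood fill with every non-stroke pixel on the image border.
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if (y in (0, h - 1) or x in (0, w - 1)) and not stroke[y][x])
    for y, x in queue:
        outside[y][x] = 1
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not stroke[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = 1
                queue.append((ny, nx))
    # Whatever is neither stroke nor outside is enclosed by the stroke.
    inside = [[int(not stroke[y][x] and not outside[y][x]) for x in range(w)]
              for y in range(h)]
    return inside, outside

stroke = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
inside, outside = split_inside_outside(stroke)
```

In this toy example only the center pixel is enclosed by the ring-shaped stroke, so it alone would be treated as the foreground side.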

The initial random field generation unit 12 treats the pixels along the foreground-side edge of the processing region image as examples of foreground and the pixels along its background-side edge as examples of background, and generates the initial random field needed to solve the cost minimization problem. The initial random field holds the "foreground-likeness" and "background-likeness" probabilities of each pixel of the original image. More precisely, it is a random field in which every pixel holds, for each of its neighboring pixels, the probability that it has the same label (foreground or background) as that neighbor. The initial random field is a known cost function used in graph cut: for example, a pixel value histogram is built from the designated regions, and the cost function is such that the total cost becomes small when each pixel is appropriately labeled foreground or background according to its pixel value. In this embodiment, the graph cut uses a processing random field obtained by weighting the initial random field with the contour line of the foreground region.
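As a rough illustration of a histogram-based cost of the kind mentioned above, the following sketch builds color histograms from foreground and background sample pixels and turns them into per-pixel labeling costs. The function names `build_histogram` and `unary_costs`, the 4-level quantization per channel, the add-one smoothing, and the negative log-likelihood form are all assumptions made for this example; the patent only states that a known graph cut cost function is used.

```python
from collections import Counter
import math

def build_histogram(pixels, bins=4):
    """Return a function giving the smoothed probability of a quantized RGB color."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels) + bins ** 3  # add-one smoothing over all color bins

    def probability(rgb):
        key = (rgb[0] // step, rgb[1] // step, rgb[2] // step)
        return (counts[key] + 1) / total

    return probability

def unary_costs(rgb, fg_hist, bg_hist):
    """Return (cost of labeling the pixel foreground, cost of labeling it background)."""
    return -math.log(fg_hist(rgb)), -math.log(bg_hist(rgb))

# Sample pixels standing in for the two edges of the processing region image.
fg_samples = [(200, 30, 30), (210, 40, 35), (190, 25, 20)]
bg_samples = [(20, 20, 200), (30, 25, 210), (25, 30, 190)]
fg_hist = build_histogram(fg_samples)
bg_hist = build_histogram(bg_samples)

# A reddish pixel is cheaper to label foreground than background.
cost_fg, cost_bg = unary_costs((205, 35, 30), fg_hist, bg_hist)
```

A lower cost means the label is more plausible for that pixel, which is the property a graph cut cost function needs.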

The superpixel calculation unit 13 calculates, from the processing region image, superpixels obtained by clustering pixels that are close in color and position, and generates a binary image showing the boundaries of the superpixels.

The differential filter processing unit 14 applies a differential filter, generated based on the shape of the processing region, to the binary image generated by the superpixel calculation unit 13, and extracts the contour candidate pixels of the foreground region.

The shape candidate extraction unit 15 interpolates between the contour candidate pixels of the foreground region to extract the contour line of the foreground region, and generates a contour candidate random field in which an infinitesimal cost is assigned between neighboring pixels on the contour line.

The processing random field generation unit 16 generates a processing random field by weighting the initial random field with the contour candidate random field.

The graph cut calculation unit 17 uses the processing random field as the cost function, finds by graph cut the combination of foreground and background labels over all pixels of the original image that minimizes the cost, separates the foreground from the background, and extracts the foreground region. Graph cut is a computation method that, given parts of the foreground and background regions in advance and assuming that pixel values differ where foreground and background are adjacent, defines a cost function that takes its minimum when every pixel is appropriately labeled foreground or background, and efficiently finds the label combination with the minimum cost.
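The label search described above can be illustrated on a toy problem. The sketch below labels a short row of pixels by computing a minimum cut with a plain breadth-first augmenting-path max-flow. It is an assumption-laden illustration, not the patent's implementation: the function name `min_cut_labels`, the cost values, and the choice of max-flow algorithm are all hypothetical.

```python
from collections import defaultdict, deque

def min_cut_labels(unary, pairwise):
    """Label pixels 'fg' or 'bg' by a minimum s-t cut.

    unary[i] = (cost of labeling i foreground, cost of labeling i background)
    pairwise[(i, j)] = penalty paid when neighbors i and j get different labels
    """
    n = len(unary)
    source, sink = n, n + 1
    cap = defaultdict(lambda: defaultdict(float))  # residual capacities
    for i, (cost_fg, cost_bg) in enumerate(unary):
        cap[source][i] += cost_bg  # this edge is cut when pixel i becomes background
        cap[i][sink] += cost_fg    # this edge is cut when pixel i becomes foreground
    for (i, j), weight in pairwise.items():
        cap[i][j] += weight
        cap[j][i] += weight

    def reachable_from_source():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
    # Pixels still connected to the source in the residual graph are foreground.
    return ["fg" if i in parent else "bg" for i in range(n)]

# Five pixels in a row: two clearly foreground, one ambiguous, two clearly background.
unary = [(1, 9), (1, 9), (4, 5), (9, 1), (9, 1)]
pairwise = {(0, 1): 3, (1, 2): 3, (2, 3): 3, (3, 4): 3}
labels = min_cut_labels(unary, pairwise)
```

In this example the cheapest labeling places the single boundary between the third and fourth pixels, for a total cost of 11.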

The region logic operation unit 18 cuts the image corresponding to the foreground region out of the original image and generates an output image.

<Operation of Image Processing Apparatus>
Next, the processing flow of the image processing apparatus according to the first embodiment is described. The processing of the image processing apparatus 1 is described below in three parts: processing region image generation, contour extraction, and foreground clipping.

<Processing Region Image Generation>
First, the generation of the processing region image is described. This processing is performed by the input unit 11 and the initial random field generation unit 12.

FIG. 2 is a flowchart showing the flow of processing for generating a processing region image and an initial random field.

The input unit 11 inputs an original image (step S11). FIG. 3 shows an example of the input original image.

The input unit 11 displays the original image on the screen and accepts the shape of the foreground region (step S12). The user enters a rough shape by tracing the contour of the foreground region in the displayed original image. FIG. 4 shows the shape of the foreground region after input. The region along the contour of the mannequin in the figure is the shape of the foreground region entered by the user.

The input unit 11 cuts the part traced by the user out of the original image as the processing region image (step S13). The input unit 11 stores the region containing the contour of the foreground region entered by the user as a mask image, and generates the processing region image by a logical operation between the original image and the mask image. The generated processing region image is sent to the initial random field generation unit 12 and the superpixel calculation unit 13.
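The logical operation between the original image and the mask image can be sketched as below. The function name `cut_processing_region`, the nested-list image representation, and the black fill color for unselected pixels are assumptions for illustration only.

```python
def cut_processing_region(image, mask, fill=(0, 0, 0)):
    """Keep pixels where the mask is 1 and replace the rest with a fill color."""
    return [[pixel if selected else fill
             for pixel, selected in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]

# A 2x3 RGB image and a mask selecting the traced pixels.
image = [
    [(10, 10, 10), (20, 20, 20), (30, 30, 30)],
    [(40, 40, 40), (50, 50, 50), (60, 60, 60)],
]
mask = [
    [0, 1, 1],
    [0, 1, 0],
]
region = cut_processing_region(image, mask)
```

With array libraries the same operation is usually a single element-wise multiplication or boolean indexing step.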

The initial random field generation unit 12 generates an initial random field from the processing region image (step S14). In the example of FIG. 4, the processing region is a closed region surrounding the mannequin. The initial random field generation unit 12 treats the inside of the processing region image as the foreground region and the outside as the background region, and generates the initial random field from the pixel values of the foreground region and the pixel values of the outside region.

<Contour Extraction>
Next, the contour extraction processing is described. This processing is performed by the superpixel calculation unit 13, the differential filter processing unit 14, and the shape candidate extraction unit 15.

FIG. 5 is a flowchart showing the flow of processing for extracting the contour of the foreground region.

The superpixel calculation unit 13 generates superpixels from the processing region image (step S21). FIG. 6 shows an example of superpixels generated from the processing region image. Each group of pixels with the same color in the figure is one superpixel.

The superpixel calculation unit 13 generates a binary image in which the outline of each superpixel is extracted (step S22). The unit generates this binary image by extracting the pixels adjacent to a superpixel boundary. FIG. 7 shows an example of a binary image generated from the superpixels of FIG. 6.
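Step S22 can be sketched as follows, assuming the superpixels are already available as a label map (a 2D list in which each pixel carries the label of its superpixel). The function name `boundary_image` and the use of 4-neighborhoods are assumptions for this example; the superpixel clustering itself is not shown.

```python
def boundary_image(labels):
    """Return a binary image with 1 at pixels adjacent to a superpixel boundary."""
    h, w = len(labels), len(labels[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                # A pixel is on a boundary if any 4-neighbor has a different label.
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    out[y][x] = 1
                    break
    return out

labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
binary = boundary_image(labels)
```

Here the two middle columns are marked, because each of their pixels touches the other superpixel.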

The differential filter processing unit 14 generates a differential filter based on the shape of the processing region image (step S23). The unit divides the processing region image with an arbitrary grid, finds the slope of the outer periphery of the processing region image in each grid cell, and generates a differential filter that is convolved in the direction perpendicular to the slope of each cell. FIG. 8 shows the binary image divided by the grid and the slope in each grid cell.
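One way to picture a filter that differentiates perpendicular to the local slope is the sketch below. It quantizes the tangent direction of a grid cell to 45-degree steps and builds a 3x3 first-derivative kernel along the normal. The function names `directional_kernel` and `apply_kernel`, the kernel size, and the kernel coefficients are assumptions; the patent does not specify the filter coefficients.

```python
import math

def directional_kernel(tangent_deg):
    """3x3 kernel approximating the derivative along the normal of a contour
    whose tangent direction is tangent_deg, quantized to 45-degree steps."""
    quantized = round(tangent_deg / 45.0) * 45.0
    t = math.radians(quantized)
    nx, ny = -math.sin(t), math.cos(t)  # unit normal to the tangent direction
    return [[round(dx * nx + dy * ny, 3) for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]

def apply_kernel(image, kernel, y, x):
    """Filter response at an interior pixel (y, x)."""
    return sum(kernel[dy + 1][dx + 1] * image[y + dy][x + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1))

# Kernel for a grid cell where the traced shape runs horizontally.
kernel = directional_kernel(0)

horizontal_line = [[1 if y == 2 else 0 for x in range(5)] for y in range(5)]
vertical_line = [[1 if x == 2 else 0 for x in range(5)] for y in range(5)]
along = apply_kernel(horizontal_line, kernel, 1, 2)   # boundary parallel to the shape
across = apply_kernel(vertical_line, kernel, 1, 2)    # boundary crossing the shape
```

The design intent is that superpixel boundaries running parallel to the user's stroke respond strongly, while boundaries crossing it cancel out.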

The differential filter processing unit 14 applies the generated differential filter to the binary image (step S24). FIG. 9 shows an example in which differential filters orthogonal to eight directions are generated and applied to the binary image. The differential filter extracts the contour candidate pixels of the foreground region.

The shape candidate extraction unit 15 removes noise from the contour candidate pixels (step S25). The unit examines the continuity of the contour candidate pixels, keeps a run of contour candidate pixels as a significant line if it continues for longer than a predetermined threshold, and removes it as noise if it is shorter than the threshold. FIG. 10 is an enlarged view of the area around the chest of the mannequin in the original image. FIG. 11 shows the contour candidate pixels extracted from the processing region image of FIG. 10 by the superpixel calculation unit 13 and the differential filter processing unit 14. FIG. 12 shows an example of significant lines obtained by removing noise from FIG. 11.
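The noise removal of step S25 can be sketched as a connected-component filter. In this illustration, continuity is approximated by the number of pixels in an 8-connected component; that approximation, the function name `remove_short_runs`, and the threshold value are assumptions, since the patent only says that runs shorter than a predetermined threshold are removed.

```python
def remove_short_runs(candidates, min_length):
    """Keep 8-connected groups of candidate pixels with at least min_length pixels."""
    h, w = len(candidates), len(candidates[0])
    seen = [[False] * w for _ in range(h)]
    kept = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not candidates[y][x] or seen[y][x]:
                continue
            # Collect one connected component with a depth-first search.
            stack, component = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and candidates[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(component) >= min_length:
                for cy, cx in component:
                    kept[cy][cx] = 1  # long enough: keep as a significant line
    return kept

candidates = [
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
significant = remove_short_runs(candidates, min_length=3)
```

The four-pixel run survives, while the isolated pixel is discarded as noise.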

The shape candidate extraction unit 15 interpolates between the significant lines along the superpixel boundaries (step S26). The interpolation between significant lines is described in detail later. FIG. 13 shows an example of contour candidate lines obtained by interpolating between the significant lines of FIG. 12. The contour candidate line at the location indicated by reference numeral 20 in FIG. 13 is short and is therefore ignored.

From the contour candidate lines obtained, the shape candidate extraction unit 15 extracts the one closest to the contour of the foreground region as the contour line (step S27). This step only needs to extract the most plausible contour line. For example, it extracts the contour candidate line whose length and slope approximate the shape of the foreground region. Any criterion that is useful for extracting the correct contour line may be used. The contour candidate lines may be presented to the user, who then selects one freely. Alternatively, the contour candidate lines may be allowed to share pixels, and the line through which the contour candidate lines pass most readily may be computed and used as the contour line. FIG. 14 shows an example in which a contour line is extracted from the contour candidate lines of FIG. 13. The white line along the mannequin in FIG. 14 is the extracted contour line.

形状候補抽出部１５は、輪郭線上で隣り合う画素同士に無限小のコストを付与した輪郭候補確率場を生成する（ステップＳ２８）。 The shape candidate extraction unit 15 generates a contour candidate random field in which an infinitesimal cost is assigned between pixels that are adjacent on the contour line (step S28).

ここで、有為線間を補間する処理について説明する。 Here, the process of interpolating between the significant lines will be described.

図１５は、有為線間を補間する処理の流れを示すフローチャートである。 FIG. 15 is a flowchart showing a flow of processing for interpolating between the significant lines.

スーパーピクセル計算部１３は、各スーパーピクセルにラベルを付与し（ステップＳ３１）、スーパーピクセルの境界に隣接する画素に隣接情報をヒモ付ける（ステップＳ３２）。図１６に、スーパーピクセルにラベルを付与し、画素に隣接情報を付与した様子を示す。図１６の点線の四角形が１画素を示し、実線がスーパーピクセルの境界を示す。図１６には、ラベルＬ１，Ｌ２，Ｌ３を付与された３つのスーパーピクセルを示している。スーパーピクセルＬ１の画素Ｐ１０〜Ｐ１９、スーパーピクセルＬ２の画素Ｐ２０〜Ｐ１６、及びスーパーピクセルＬ３の画素Ｐ３０〜Ｐ３８がスーパーピクセルの境界に隣接する画素である。また、図１６では、括弧内に自身の属するスーパーピクセルと隣接情報を示している。例えば、画素Ｐ１０〜Ｐ１４には（Ｌ１，Ｌ１：Ｌ３）という情報がヒモ付けられている。これは、画素Ｐ１０〜Ｐ１４は、スーパーピクセルＬ１に属し、スーパーピクセルＬ１，Ｌ３の境界に隣接していることを表している。さらに、図１６には、２つの有為線Ｃ１，Ｃ２を示している。画素Ｐ１０，Ｐ１１，Ｐ３０，Ｐ３１は有為線Ｃ１に属し、画素Ｐ１８，Ｐ１９，Ｐ２０，Ｐ２１は有為線Ｃ２に属する。 The superpixel calculator 13 assigns a label to each superpixel (step S31) and associates adjacency information with the pixels adjacent to a superpixel boundary (step S32). FIG. 16 shows superpixels with labels assigned and pixels with adjacency information assigned. Each dotted square in FIG. 16 represents one pixel, and the solid lines represent superpixel boundaries. FIG. 16 shows three superpixels labeled L1, L2, and L3. Pixels P10 to P19 of the superpixel L1, pixels P20 to P16 of the superpixel L2, and pixels P30 to P38 of the superpixel L3 are the pixels adjacent to a superpixel boundary. In FIG. 16, the superpixel to which each pixel belongs and its adjacency information are shown in parentheses. For example, the information (L1, L1:L3) is associated with the pixels P10 to P14. This indicates that the pixels P10 to P14 belong to the superpixel L1 and are adjacent to the boundary between the superpixels L1 and L3. FIG. 16 also shows two significant lines C1 and C2. Pixels P10, P11, P30, and P31 belong to the significant line C1, and pixels P18, P19, P20, and P21 belong to the significant line C2.
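
Steps S31 and S32 can be illustrated with a small sketch. It assumes the superpixel labels are already available as a 2D grid and that "adjacent to a boundary" means having a 4-neighbor with a different label; the function name `boundary_adjacency` and the tuple layout are hypothetical.

```python
def boundary_adjacency(labels):
    """Attach (own label, adjacency set) to every pixel on a superpixel boundary.

    labels: 2D list of superpixel labels, one per pixel.
    Returns {(y, x): (own_label, frozenset_of_labels)}, mirroring the
    "(L1, L1:L3)" notation of FIG. 16. Pixels whose 4-neighbors all share
    their label are not on a boundary and are omitted.
    """
    h, w = len(labels), len(labels[0])
    info = {}
    for y in range(h):
        for x in range(w):
            own = labels[y][x]
            touching = {own}
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    touching.add(labels[ny][nx])
            if len(touching) > 1:
                info[(y, x)] = (own, frozenset(touching))
    return info
```

Storing the adjacency as a set makes the later "same adjacency information" checks a plain equality comparison.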

形状候補抽出部１５は、有為線の数が一定以下になるまで以下の有為線を統合する処理を繰り返す。 The shape candidate extraction unit 15 repeats the following process of integrating significant lines until the number of significant lines falls to or below a certain number.

形状候補抽出部１５は、有為線の先端の画素と同じ隣接情報を持つ画素を有為線に含める（ステップＳ３３）。図１６の有為線Ｃ１の先端の画素は、画素Ｐ１１，Ｐ３１である。有為線Ｃ１の先端の画素Ｐ１１，Ｐ３１から４近傍で同じ隣接情報を持つ画素を探して有為線Ｃ１に含める。画素Ｐ１２は、画素Ｐ３１の４近傍で同じ隣接情報（Ｌ１：Ｌ３）を持つ画素である。画素Ｐ１２を有為線Ｃ１に含め、有為線Ｃ１の先端とする。続いて、画素Ｐ１２の４近傍で同じ隣接情報を持つ画素Ｐ３２を有為線Ｃ１に含める。以上の処理を繰り返して、有為線Ｃ１をスーパーピクセルＬ１，Ｌ３の境界に沿って伸ばしていく。図１７は、有為線Ｃ１を伸ばした様子を示す図である。画素Ｐ１２〜Ｐ１４，Ｐ３２〜Ｐ３４が有為線Ｃ１に含まれている。有為線Ｃ２も、同様に処理されて、画素Ｐ１６，Ｐ１７，Ｐ２２，Ｐ２３を含めて伸ばされている。 The shape candidate extraction unit 15 adds to the significant line any pixel that has the same adjacency information as the pixel at the tip of the significant line (step S33). The tip pixels of the significant line C1 in FIG. 16 are the pixels P11 and P31. Pixels with the same adjacency information are searched for in the 4-neighborhoods of the tip pixels P11 and P31 and added to the significant line C1. The pixel P12 is in the 4-neighborhood of the pixel P31 and has the same adjacency information (L1:L3). The pixel P12 is added to the significant line C1 and becomes its tip. Next, the pixel P32, which is in the 4-neighborhood of the pixel P12 and has the same adjacency information, is added to the significant line C1. This process is repeated to extend the significant line C1 along the boundary between the superpixels L1 and L3. FIG. 17 shows the significant line C1 after extension. Pixels P12 to P14 and P32 to P34 are now included in the significant line C1. The significant line C2 is processed in the same way and extended to include the pixels P16, P17, P22, and P23.
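
The growth rule of step S33 can be sketched as below. This is an illustration under assumptions: a significant line is held as a set of (y, x) pixels, `adjacency` maps each boundary pixel to its adjacency set, and the helper names `extend_once` and `extend_fully` are hypothetical. In the described process the contact and distance checks of step S34 would run between growth steps; `extend_fully` omits them for brevity.

```python
NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))

def extend_once(line, tips, adjacency):
    """One growth step: add 4-neighbors of each tip that carry the same
    adjacency information as that tip. Mutates `line`; returns the new tips."""
    new_tips = []
    for (y, x) in tips:
        for dy, dx in NEIGHBORS_4:
            n = (y + dy, x + dx)
            if n not in line and adjacency.get(n) == adjacency[(y, x)]:
                line.add(n)
                new_tips.append(n)
    return new_tips

def extend_fully(line, tips, adjacency):
    """Repeat the growth step until no tip can advance along the boundary."""
    while tips:
        tips = extend_once(line, tips, adjacency)
    return line
```

Because a pixel is only added when its adjacency set equals that of the tip, the line stays on one superpixel boundary and stops where the set of neighboring superpixels changes.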

続いて、形状候補抽出部１５は、先端同士が近接した有為線の統合処理を行う（ステップＳ３４）。 Subsequently, the shape candidate extraction unit 15 integrates significant lines whose tips are close to each other (step S34).

図１８は、有為線の統合処理の流れを示すフローチャートである。 FIG. 18 is a flowchart showing the flow of the significant line integration process.

形状候補抽出部１５は、まず、有為線が別の有為線と接触したか否か判定する（ステップＳ４１）。有為線の先端の４近傍の画素が別の有為線の画素である場合に、有為線が接触したと判定する。 The shape candidate extraction unit 15 first determines whether a significant line has come into contact with another significant line (step S41). When a pixel in the 4-neighborhood of the tip of a significant line belongs to another significant line, the lines are determined to be in contact.

有為線が接触した場合（ステップＳ４１のＹＥＳ）、形状候補抽出部１５は、有為線の先端の画素にヒモ付けられた隣接情報と別の有為線の画素にヒモ付けられた隣接情報が同じか否か判定する（ステップＳ４２）。 When the significant lines are in contact (YES in step S41), the shape candidate extraction unit 15 determines whether the adjacency information associated with the tip pixel of the significant line is the same as the adjacency information associated with the pixel of the other significant line (step S42).

隣接情報が同じ場合（ステップＳ４２のＹＥＳ）、形状候補抽出部１５は、接触した有為線を統合する（ステップＳ４３）。有為線を統合するときは、番号の小さい有為線に番号の大きい有為線を統合する。 When the adjacency information is the same (YES in step S42), the shape candidate extraction unit 15 integrates the significant lines in contact (step S43). When integrating significant lines, the line with the larger number is merged into the line with the smaller number.

隣接情報が異なる場合（ステップＳ４２のＮＯ）、形状候補抽出部１５は、接触した有為線を統合しない。 When the adjacency information differs (NO in step S42), the shape candidate extraction unit 15 does not integrate the significant lines in contact.

一方、有為線が接触していない場合（ステップＳ４１のＮＯ）、形状候補抽出部１５は、有為線の先端間の距離が一定以下であるか否か判定する（ステップＳ４４）。 On the other hand, when the significant lines are not in contact (NO in step S41), the shape candidate extraction unit 15 determines whether the distance between the tips of the significant lines is equal to or less than a certain value (step S44).

先端間の距離が一定以下である場合（ステップＳ４４のＹＥＳ）、形状候補抽出部１５は、先端間の最短距離上の画素を含めて有為線を統合する（ステップＳ４５）。図１７の有為線Ｃ１，Ｃ２は先端が接触していないが、先端間が一定以下の距離である。そこで、形状候補抽出部１５は、図１９に示すように、有為線Ｃ１，Ｃ２の先端間の最短距離上の画素Ｐ１５，Ｐ３５を含めて、有為線Ｃ２を有為線Ｃ１に統合する。 When the distance between the tips is equal to or less than the certain value (YES in step S44), the shape candidate extraction unit 15 integrates the significant lines, including the pixels on the shortest path between the tips (step S45). The tips of the significant lines C1 and C2 in FIG. 17 are not in contact, but the distance between the tips is equal to or less than the certain value. Therefore, as shown in FIG. 19, the shape candidate extraction unit 15 integrates the significant line C2 into the significant line C1, including the pixels P15 and P35 on the shortest path between the tips of the significant lines C1 and C2.

先端間の距離が一定以下でない場合（ステップＳ４４のＮＯ）、形状候補抽出部１５は、島状のスーパーピクセルで有為線が分断されたか否か判定する（ステップＳ４６）。具体的には、形状候補抽出部１５は、有為線それぞれの先端の画素が共通のスーパーピクセルに属しており、補間対象画素のスーパーピクセルが先端の画素それぞれの隣接情報の共通項であるか否か判定する。図２０では、スーパーピクセルＬ１，Ｌ３の間に島状のスーパーピクセルＬ２が形成されて、スーパーピクセルＬ１，Ｌ３の境界に沿った有為線Ｃ１，Ｃ２が分断されている。図２０において、有為線Ｃ１の先端の画素Ｐ１０はスーパーピクセルＬ１に属し、有為線Ｃ２の先端の画素Ｐ１１はスーパーピクセルＬ１に属している。つまり、有為線それぞれの先端の画素が共通のスーパーピクセルに属している。画素Ｐ１０の４近傍の補間対象画素Ｐ２０はスーパーピクセルＬ２に属している。画素Ｐ１１の４近傍の補間対象画素Ｐ２９はスーパーピクセルＬ２に属している。画素Ｐ１０の隣接情報はＬ１：Ｌ２：Ｌ３であり、画素Ｐ１１の隣接情報はＬ１：Ｌ２：Ｌ３である。つまり、補間対象画素Ｐ２０，Ｐ２９の属するスーパーピクセルＬ２は、先端の画素Ｐ１０，Ｐ１１の隣接情報の共通項である。したがって、図２０の状況は、ステップＳ４６でＹＥＳと判定される。 If the distance between the tips is not equal to or less than the certain value (NO in step S44), the shape candidate extraction unit 15 determines whether the significant lines are divided by an island-shaped superpixel (step S46). Specifically, the shape candidate extraction unit 15 determines whether the tip pixels of the respective significant lines belong to a common superpixel and whether the superpixel of the interpolation target pixels is a common element of the adjacency information of the respective tip pixels. In FIG. 20, an island-shaped superpixel L2 is formed between the superpixels L1 and L3, dividing the significant lines C1 and C2 that run along the boundary between the superpixels L1 and L3. In FIG. 20, the pixel P10 at the tip of the significant line C1 belongs to the superpixel L1, and the pixel P11 at the tip of the significant line C2 also belongs to the superpixel L1. That is, the tip pixels of the respective significant lines belong to a common superpixel. The interpolation target pixel P20 in the 4-neighborhood of the pixel P10 belongs to the superpixel L2. The interpolation target pixel P29 in the 4-neighborhood of the pixel P11 belongs to the superpixel L2. The adjacency information of the pixel P10 is L1:L2:L3, and the adjacency information of the pixel P11 is L1:L2:L3. That is, the superpixel L2, to which the interpolation target pixels P20 and P29 belong, is a common element of the adjacency information of the tip pixels P10 and P11. Therefore, the situation in FIG. 20 is determined as YES in step S46.

島状のスーパーピクセルで有為線が分断されたと判定した場合（ステップＳ４６のＹＥＳ）、形状候補抽出部１５は、補間対象画素を最短距離でつなげて有為線を統合する（ステップＳ４７）。このとき補間した画素の有意度を半値（例えば０．５）とする。有意度が半値の場合、初期確率場に対する影響を少なくする。図２０では、スーパーピクセルＬ２の境界に沿って、補間対象画素Ｐ２０，Ｐ２９の間の画素Ｐ２１〜Ｐ２４，Ｐ２５〜Ｐ２８を有為線Ｃ１に含めて、有為線Ｃ２を有為線Ｃ１に統合する。 If it is determined that the significant lines are divided by an island-shaped superpixel (YES in step S46), the shape candidate extraction unit 15 connects the interpolation target pixels by the shortest path and integrates the significant lines (step S47). At this time, the significance of the interpolated pixels is set to a half value (for example, 0.5). With a half-value significance, their influence on the initial random field is reduced. In FIG. 20, the pixels P21 to P24 and P25 to P28 between the interpolation target pixels P20 and P29, along the boundary of the superpixel L2, are included in the significant line C1, and the significant line C2 is integrated into the significant line C1.

以上の処理により有為線を統合し、有為線の数を減らしていく。 Through the above process, significant lines are integrated and their number is progressively reduced.
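
The branching of steps S41 to S47 can be condensed into one decision function. This is a sketch only: the argument names and returned action strings are hypothetical, and the outcome for NO in step S46 (leaving the lines separate) is an assumption, since the text does not state it.

```python
def merge_decision(touching, adj_a, adj_b, own_a, own_b,
                   tip_distance, max_gap, island_label):
    """Decide how two significant lines A and B should be handled.

    touching      : True if a tip of A has a 4-neighbor belonging to B (S41)
    adj_a, adj_b  : adjacency sets of the tip pixels, e.g. {"L1", "L3"}
    own_a, own_b  : superpixel labels the tip pixels belong to
    tip_distance  : distance between the two tips
    max_gap       : threshold used in S44 (hypothetical name)
    island_label  : superpixel label of the interpolation target pixels
    """
    if touching:                                          # S41: YES
        # S42: merge only if both sides lie on the same boundary (S43).
        return "merge" if adj_a == adj_b else "keep_separate"
    if tip_distance <= max_gap:                           # S44: YES
        return "merge_with_gap_pixels"                    # S45
    # S46: are the lines split by an island-shaped superpixel?
    same_superpixel = own_a == own_b
    island_shared = island_label in adj_a and island_label in adj_b
    if same_superpixel and island_shared:
        # S47: interpolated pixels would get a reduced significance (e.g. 0.5).
        return "merge_around_island"
    return "keep_separate"                                # assumed for S46: NO
```

For the situation of FIG. 20 (both tips in L1, adjacency L1:L2:L3 on both sides, island L2, tips too far apart), the function selects the island branch.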

＜前景切り抜き処理＞ <Foreground clipping process>
続いて、前景切り抜き処理について説明する。この処理は、処理確率場生成部１６、グラフカット計算部１７、及び領域論理演算部１８によって行われる。 Next, the foreground clipping process will be described. This process is performed by the processing random field generation unit 16, the graph cut calculation unit 17, and the region logic operation unit 18.

図２１は、前景切り抜き処理の流れを示すフローチャートである。 FIG. 21 is a flowchart showing the flow of the foreground clipping process.

処理確率場生成部１６は、初期確率場に対して輪郭候補確率場を重み付けて処理確率場を生成する（ステップＳ５１）。重みの比率は、例えば、背景色と前景色が近い場合は形状に重みをおくなど、有為に定めることができる。初期確率場に対して輪郭候補確率場を重み付けることで、前景領域の一部と背景領域の一部の色ヒストグラムから生成した初期確率場に、前景領域の形状の重み付けを行うことができる。 The processing random field generation unit 16 generates a processing random field by weighting the contour candidate random field against the initial random field (step S51). The weight ratio can be set as appropriate; for example, more weight is placed on shape when the background color and the foreground color are close. By weighting the contour candidate random field against the initial random field, the shape of the foreground region can be reflected in the initial random field generated from the color histograms of part of the foreground region and part of the background region.
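
One way to picture step S51 is a per-pixel blend of the two fields. The text does not give the combining formula, so the linear blend below is purely an assumption, as are the names `blend_fields` and `shape_weight`.

```python
def blend_fields(initial, contour, shape_weight):
    """Combine two per-pixel fields into a processing field.

    initial, contour : 2D lists of floats with the same shape
    shape_weight     : value in [0, 1]; larger values trust shape more,
                       e.g. when foreground and background colors are close.
    Assumption: a linear blend; the source only says the contour candidate
    field is weighted against the initial field.
    """
    return [
        [(1.0 - shape_weight) * i + shape_weight * c
         for i, c in zip(row_i, row_c)]
        for row_i, row_c in zip(initial, contour)
    ]
```

With `shape_weight = 0` the result is the color-based initial field alone, and raising it shifts the cost toward the extracted contour.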

グラフカット計算部１７は、処理確率場をコスト関数として、コスト最小となるように元画像の全画素に前景又は背景のラベルを付与し、前景領域を抽出する（ステップＳ５２）。このとき、処理の高速化のために、スーパーピクセル計算部１３が生成したスーパーピクセルを用いて、スーパーピクセル間のグラフカットを行ってもよい。 Using the processing random field as the cost function, the graph cut calculation unit 17 assigns a foreground or background label to every pixel of the original image so as to minimize the cost, and extracts the foreground region (step S52). To speed up processing, the graph cut may instead be performed between superpixels, using the superpixels generated by the superpixel calculator 13.
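
The labeling of step S52 can be illustrated on a one-dimensional row of pixels with a textbook minimum cut (Edmonds-Karp maximum flow). This is a toy sketch, not the solver used in the patent: the per-pixel costs `cost_fg` and `cost_bg` are assumed to come from the processing random field, and the constant neighbor penalty `smooth` is a hypothetical stand-in for the pairwise term.

```python
from collections import deque

def min_cut_labels(cost_fg, cost_bg, smooth):
    """Label each pixel "fg" or "bg" so that the total cost is minimal.

    cost_fg[i] / cost_bg[i]: cost of labeling pixel i foreground / background.
    smooth: cost paid when neighboring pixels get different labels.
    """
    n = len(cost_fg)
    S, T = n, n + 1                      # source = foreground, sink = background
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[S][i] = cost_bg[i]           # cut when pixel i ends up background
        cap[i][T] = cost_fg[i]           # cut when pixel i ends up foreground
        if i + 1 < n:
            cap[i][i + 1] = cap[i + 1][i] = smooth
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {S: None}
        queue = deque([S])
        while queue and T not in parent:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if T not in parent:
            break                        # no path left: the flow is maximal
        flow, v = float("inf"), T
        while parent[v] is not None:     # bottleneck capacity along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:     # push the flow, update residuals
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Pixels still reachable from the source lie on the foreground side.
    return ["fg" if i in parent else "bg" for i in range(n)]
```

Running the same routine on superpixels instead of pixels would correspond to the speed-up mentioned in the text, since the graph then has far fewer nodes.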

領域論理演算部１８は、前景領域と背景領域に二分化された１，０の二値画像を生成し、この二値画像と元画像とを論理演算することで、前景領域を切り出した出力画像を生成する（ステップＳ５３）。図２２に、前景領域を切り出した出力画像の例を示す。 The region logic operation unit 18 generates a binary image of 1s and 0s divided into the foreground region and the background region, and performs a logical operation on this binary image and the original image to generate an output image in which the foreground region is cut out (step S53). FIG. 22 shows an example of an output image in which the foreground region is cut out.
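
Step S53 can be sketched as a bitwise AND between the original image and the binary mask. The sketch assumes 8-bit RGB tuples and a mask of 1s and 0s; the function name `cut_out_foreground` is hypothetical, and the text does not specify which logical operation is used.

```python
def cut_out_foreground(image, mask):
    """Keep pixels where mask is 1 and zero out pixels where mask is 0.

    image: 2D list of (r, g, b) tuples with 8-bit channel values.
    mask : 2D list of 1 (foreground) or 0 (background), same shape.
    Each channel is ANDed with 0xFF or 0x00 depending on the mask bit.
    """
    return [
        [tuple(channel & (0xFF * m) for channel in pixel)
         for pixel, m in zip(row_img, row_mask)]
        for row_img, row_mask in zip(image, mask)
    ]
```

Background pixels come out black here; any other fill or an alpha channel would be an equally valid design choice.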

以上説明したように、本実施の形態によれば、入力部１１が、元画像の前景領域の境界を含み、前景領域の大まかな形状を示す処理領域を入力し、初期確率場生成部１２が、処理領域画像の前景領域側を前景の一部、背景領域側を背景の一部の教示として、初期確率場を生成し、形状候補抽出部１５が前景領域の輪郭線を抽出し、グラフカット計算部１７が、輪郭線に基づいて重み付けた初期確率場をコスト関数としてグラフカットを用いて前景領域と背景領域とに分離することにより、前景領域の一部と背景領域の一部の色ヒストグラムから生成した初期確率場に前景領域の形状を加味し、任意の背景下において、より正確に前景領域を分離できる。 As described above, according to the present embodiment, the input unit 11 inputs a processing region that includes the boundary of the foreground region of the original image and indicates the rough shape of the foreground region; the initial random field generation unit 12 generates an initial random field, taking the foreground region side of the processing region image as a sample of part of the foreground and the background region side as a sample of part of the background; the shape candidate extraction unit 15 extracts the contour line of the foreground region; and the graph cut calculation unit 17 separates the image into a foreground region and a background region by a graph cut that uses, as its cost function, the initial random field weighted based on the contour line. In this way, the shape of the foreground region is taken into account in the initial random field generated from the color histograms of part of the foreground region and part of the background region, and the foreground region can be separated more accurately against an arbitrary background.

本実施の形態によれば、スーパーピクセル計算部１３がスーパーピクセルを生成してスーパーピクセルの境界を示す二値画像を生成し、微分フィルタ処理部１４が処理領域画像の形状に基づいて生成した微分フィルタを二値画像に適用し、形状候補抽出部１５が微分フィルタを適用後の二値画像中の輪郭候補の画素をスーパーピクセルの境界に沿って補間して前景領域の輪郭線を推定することで、入力された前景領域の形状に基づいて前景領域の輪郭線を推定することができる。 According to the present embodiment, the superpixel calculation unit 13 generates superpixels and a binary image indicating the superpixel boundaries, the differential filter processing unit 14 applies to the binary image a differential filter generated based on the shape of the processing region image, and the shape candidate extraction unit 15 interpolates the contour candidate pixels in the filtered binary image along the superpixel boundaries to estimate the contour line of the foreground region. The contour line of the foreground region can thus be estimated based on the input shape of the foreground region.

本実施の形態を動画に適用する場合は、１枚目の画像の処理時に生成した確率場を保持して更新しながら使用する。例えば、オプティカルフローなどで画像上の特徴点の動きを追うことによって対応する画素のキーポイントを一致させて、輪郭情報を追従させる期待できる。 When this embodiment is applied to a moving image, the random field generated when processing the first image is retained and used while being updated. For example, by tracking the movement of feature points on the image with optical flow or the like, the key points of corresponding pixels can be matched, and the contour information can be expected to follow the motion.

［第２の実施の形態］ [Second Embodiment]
＜画像認識システムの構成＞ <Configuration of image recognition system>
図２３は、第２の実施の形態における画像処理装置を含む画像認識システムの全体構成図である。同図に示す画像認識システムは、画像処理装置１、画像認識装置３を備える。画像処理装置１と画像認識装置３とはネットワークを介して接続される。 FIG. 23 is an overall configuration diagram of an image recognition system including an image processing device according to the second embodiment. The image recognition system shown in the figure includes an image processing device 1 and an image recognition device 3. The image processing apparatus 1 and the image recognition apparatus 3 are connected via a network.

第２の実施の形態では、画像処理装置１は教師データの作成に利用される。画像処理装置１に、第１の実施の形態と同様に、元画像と元画像から切り出す前景領域の形状を入力し、前景領域のみが切り出された前景画像を得る。第２の実施の形態では、図２４に示すように、画像処理装置１が特徴点抽出部１９を備え、前景画像の特徴点を抽出し、特徴点と元画像を教師データとして画像認識装置３へ送信する。 In the second embodiment, the image processing apparatus 1 is used to create teacher data. As in the first embodiment, the original image and the shape of the foreground region to be cut out from the original image are input to the image processing apparatus 1, and a foreground image in which only the foreground region is cut out is obtained. In the second embodiment, as shown in FIG. 24, the image processing apparatus 1 includes a feature point extraction unit 19, which extracts feature points of the foreground image and transmits the feature points and the original image to the image recognition apparatus 3 as teacher data.

画像認識装置３は、受信部３１、特徴点抽出部３２、判別処理部３３、送信部３４、及び画像データ記憶部３５を備える。画像認識装置３が備える各部は、演算処理装置、記憶装置等を備えたコンピュータにより構成して、各部の処理がプログラムによって実行されるものとしてもよい。このプログラムは画像認識装置３が備える記憶装置に記憶されており、磁気ディスク、光ディスク、半導体メモリ等の記録媒体に記録することも、ネットワークを通して提供することも可能である。 The image recognition device 3 includes a reception unit 31, a feature point extraction unit 32, a discrimination processing unit 33, a transmission unit 34, and an image data storage unit 35. Each unit included in the image recognition device 3 may be configured by a computer including an arithmetic processing device, a storage device, and the like, and the processing of each unit may be executed by a program. This program is stored in a storage device included in the image recognition device 3, and can be recorded on a recording medium such as a magnetic disk, an optical disk, or a semiconductor memory, or provided through a network.

受信部３１は、クライアント端末５から認識対象の画像を受信する。画像認識装置３とクライアント端末５とはネットワークを介して接続される。 The receiving unit 31 receives an image to be recognized from the client terminal 5. The image recognition device 3 and the client terminal 5 are connected via a network.

特徴点抽出部３２は、受信した画像から特徴点を抽出する。 The feature point extraction unit 32 extracts feature points from the received image.

判別処理部３３は、特徴点抽出部３２が抽出した特徴点を画像データ記憶部３５に格納された特徴点と照合し、類似の特徴点をもつデータを検索する。 The discrimination processing unit 33 collates the feature points extracted by the feature point extraction unit 32 with the feature points stored in the image data storage unit 35, and searches for data having similar feature points.

送信部３４は、判別処理部３３の検索結果をクライアント端末５へ送信する。検索結果は、特徴点に関連付けられたデータであり、例えば、画像処理装置１から受信した画像である。 The transmission unit 34 transmits the search result of the discrimination processing unit 33 to the client terminal 5. The search result is data associated with the feature point, for example, an image received from the image processing apparatus 1.

画像データ記憶部３５は、画像処理装置１から受信した特徴点と画像を関連付けて格納する。 The image data storage unit 35 stores the feature points received from the image processing apparatus 1 in association with the images.

＜画像認識システムの動作＞ <Operation of image recognition system>
次に、第２の実施の形態の画像認識システムの動作について説明する。 Next, the operation of the image recognition system according to the second embodiment will be described.

図２５は、画像処理装置が特徴点を抽出する処理の流れを示すフローチャートである。 FIG. 25 is a flowchart showing a flow of processing in which the image processing apparatus extracts feature points.

画像処理装置１は、第１の実施の形態で説明したように、前景画像を切り出したとする。 Assume that the image processing apparatus 1 cuts out the foreground image as described in the first embodiment.

特徴点抽出部１９は、前景画像の特徴点を抽出する（ステップＳ６１）。特徴点は、エッジ勾配、色変化など有為なものであればよい。ただし、特徴点抽出部１９が抽出する特徴点は、画像認識装置３の特徴点抽出部３２が抽出する特徴点と同じ種類のものでなければならない。 The feature point extraction unit 19 extracts feature points of the foreground image (step S61). The feature point may be any significant one such as an edge gradient or a color change. However, the feature points extracted by the feature point extraction unit 19 must be of the same type as the feature points extracted by the feature point extraction unit 32 of the image recognition device 3.

特徴点抽出部１９は、特徴点と画像処理装置１が入力した元画像を画像認識装置３に送信する（ステップＳ６２）。特徴点抽出部１９は、特徴点の画像上の座標と位置関係及び前景画像に写った物体の情報を元画像に関連付けて画像認識装置３へ送信する。画像処理装置１が教師データを作成するときは、ユーザから前景領域の形状を入力するときに、前景の物体の情報も入力しておくとよい。 The feature point extraction unit 19 transmits the feature points and the original image input by the image processing device 1 to the image recognition device 3 (step S62). The feature point extraction unit 19 transmits the coordinates of the feature point on the image, the positional relationship, and information on the object shown in the foreground image to the image recognition apparatus 3 in association with the original image. When the image processing apparatus 1 creates the teacher data, it is preferable to input foreground object information when the user inputs the shape of the foreground area.

図２６は、画像認識装置の処理の流れを示すフローチャートである。 FIG. 26 is a flowchart showing the flow of processing of the image recognition apparatus.

受信部３１は、クライアント端末５から画像を受信する（ステップＳ７１）。 The receiving unit 31 receives an image from the client terminal 5 (step S71).

特徴点抽出部３２は、クライアント端末５から受信した画像の特徴点を抽出する（ステップＳ７２）。特徴点抽出部３２は、画像処理装置１の特徴点抽出部１９と同種の特徴点を抽出する。 The feature point extraction unit 32 extracts feature points of the image received from the client terminal 5 (step S72). The feature point extraction unit 32 extracts the same kind of feature points as the feature point extraction unit 19 of the image processing apparatus 1.

判別処理部３３は、特徴点抽出部３２が抽出した特徴点と類似の特徴点を持つデータを画像データ記憶部３５から検索する（ステップＳ７３）。 The discrimination processing unit 33 searches the image data storage unit 35 for data having feature points similar to the feature points extracted by the feature point extraction unit 32 (step S73).

送信部３４は、判別処理部３３の検索結果をクライアント端末５へ送信する（ステップＳ７４）。 The transmission unit 34 transmits the search result of the discrimination processing unit 33 to the client terminal 5 (step S74).
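
The search of step S73 can be sketched as a nearest-neighbor lookup over stored feature vectors. The record layout, the Euclidean distance, the threshold `max_distance`, and the function name `find_similar` are all assumptions made for illustration; the text only requires that stored and query features be of the same type.

```python
import math

def find_similar(query, store, max_distance):
    """Return the stored record whose features are closest to `query`,
    or None if nothing lies within max_distance.

    query: tuple of floats (feature vector of the received image)
    store: list of dicts with a "features" tuple plus associated data,
           standing in for the image data storage unit 35.
    """
    best = None
    for record in store:
        d = math.dist(query, record["features"])   # Euclidean distance
        if d <= max_distance and (best is None or d < best[0]):
            best = (d, record)
    return best[1] if best else None
```

The returned record would carry the object information and images that the transmission unit 34 sends back to the client terminal 5.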

図２７は、クライアント端末に表示される結果の例を示す図である。 FIG. 27 is a diagram illustrating an example of a result displayed on the client terminal.

クライアント端末５が、クライアント端末５の備えるカメラなどで撮った問い合わせ画像５１を画像認識装置３へ送信すると結果が得られる。図２７では、検索結果として得られた問い合わせ画像５１に写った物体の情報５２、関連する検索結果画像５３Ａ、５３Ｂを表示している。物体の情報５２及び検索結果画像５３Ａ、５３Ｂは、画像データ記憶部３５に格納されたデータである。 A result is obtained when the client terminal 5 transmits an inquiry image 51, taken with a camera or the like included in the client terminal 5, to the image recognition apparatus 3. FIG. 27 displays information 52 on the object shown in the inquiry image 51, obtained as the search result, together with related search result images 53A and 53B. The object information 52 and the search result images 53A and 53B are data stored in the image data storage unit 35.

本実施の形態によれば、画像処理装置１が前景領域を切り出し、前景領域の特徴点を抽出して教師データを作成することにより、余計な背景が切り落とし、確からしい特徴点のみを抽出できるので、画像認識処理において、誤差を抑え、マッチングの精度の向上を期待できる。 According to the present embodiment, the image processing apparatus 1 cuts out the foreground area, extracts feature points of the foreground area, and creates teacher data, thereby cutting off the extra background and extracting only probable feature points. In the image recognition processing, errors can be suppressed and an improvement in matching accuracy can be expected.

DESCRIPTION OF SYMBOLS
１…画像処理装置 1 ... Image processing apparatus
１１…入力部 11 ... Input unit
１２…初期確率場生成部 12 ... Initial random field generation unit
１３…スーパーピクセル計算部 13 ... Superpixel calculation unit
１４…微分フィルタ処理部 14 ... Differential filter processing unit
１５…形状候補抽出部 15 ... Shape candidate extraction unit
１６…処理確率場生成部 16 ... Processing random field generation unit
１７…グラフカット計算部 17 ... Graph cut calculation unit
１８…領域論理演算部 18 ... Region logic operation unit
１９…特徴点抽出部 19 ... Feature point extraction unit
３…画像認識装置 3 ... Image recognition apparatus
３１…受信部 31 ... Reception unit
３２…特徴点抽出部 32 ... Feature point extraction unit
３３…判別処理部 33 ... Discrimination processing unit
３４…送信部 34 ... Transmission unit
３５…画像データ記憶部 35 ... Image data storage unit
５…クライアント端末 5 ... Client terminal

Claims

An image input means for inputting an image;
A shape input means for inputting a shape of a foreground area along an outline of a foreground area of the image, and cutting out pixels corresponding to the shape of the foreground area from the image to obtain a processing area image;
An initial random field generating means for generating an initial random field that holds a foreground-like or background-like probability of each pixel of the image from pixels on the foreground area side of the processing area image and pixels on the background area side of the processing area image;
Contour extraction means for extracting the contour of the foreground region from the processing region image;
Separating means for separating the image into a foreground region and a background region using a graph cut with the initial random field weighted based on the contour as a cost function,
The contour line extracting means includes
Superpixel calculation means for calculating a superpixel obtained by clustering pixels having similar colors and positions from the processing area image and generating a boundary image representing a boundary of the superpixel;
Differential filter processing means for generating a differential filter based on the shape of the foreground region and applying the differential filter to the boundary image to extract contour candidate pixels of the foreground region;
An image processing apparatus comprising: a contour interpolation unit that interpolates between the contour candidate pixels along a boundary of the superpixel and extracts a contour line of the foreground region.

The image processing apparatus according to claim 1, wherein the separating unit separates a foreground region and a background region between the superpixels.

The image processing apparatus according to claim 1 or 2, wherein the shape input means inputs the shape of the foreground region using a depth sensor or a temperature sensor.

An image processing method executed by a computer,
Inputting an image;
Inputting the shape of the foreground region along the outline of the foreground region of the image;
Cutting out pixels corresponding to the shape of the foreground region from the image to obtain a processing region image;
Generating an initial random field that holds the foreground or background likelihood of each pixel of the image from the pixels on the foreground area side of the processing area image and the pixels on the background area side of the processing area image;
Extracting a contour line of the foreground region from the processing region image;
Separating the image into a foreground region and a background region using a graph cut with the initial random field weighted based on the contour as a cost function, and
Extracting the contour line comprises:
Calculating a superpixel obtained by clustering pixels having similar colors and positions from the processing region image, and generating a boundary image representing a boundary of the superpixel;
Generating a differential filter based on the shape of the foreground region, applying the differential filter to the boundary image, and extracting contour candidate pixels of the foreground region;
An image processing method comprising: interpolating between the contour candidate pixels along a boundary of the superpixel to extract a contour line of the foreground region.

An image processing program that causes a computer to operate as each unit of the image processing apparatus according to claim 1.