JP2015191481A - Image processing apparatus and image processing program

Info

Publication number: JP2015191481A
Application number: JP2014068809A
Authority: JP
Inventors: 英範栗林; Hidenori Kuribayashi
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2014-03-28
Filing date: 2014-03-28
Publication date: 2015-11-02

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus and an image processing program configured to extract an area of interest with high accuracy and at high speed. SOLUTION: An image processing apparatus includes: a map creating section which creates a map indicating a distribution of a degree of attention of an input image; a mask creating section which creates a mask indicating an outer shape of an area of interest on the basis of edge information of the input image; and an extraction section which extracts the area of interest from the input image on the basis of the map created by the map creating section and the mask created by the mask creating section.

Description

The present invention relates to an image processing apparatus and an image processing program.

Techniques for extracting from an image the region in which the main subject exists, that is, the region of interest, are conventionally known. For example, Non-Patent Document 1 describes a technique that creates a saliency map from the color information of an image, divides the image into a plurality of regions by clustering based on the color information, and extracts the region of interest by picking out only those regions whose saliency (degree of attention) is high.

Non-Patent Document 1: R. Achanta, S. Hemami, F. Estrada and S. Susstrunk, Frequency-tuned Salient Region Detection, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2009), pp. 1597-1604, 2009.

The prior art has the problem that clustering based on color information is computationally expensive and therefore takes a long time to process.

The image processing apparatus according to claim 1 comprises: a map creation unit that creates a map indicating the distribution of the degree of attention in an input image; a mask creation unit that creates a mask indicating the outline of a region of interest based on edge information of the input image; and an extraction unit that extracts the region of interest from the input image based on the map created by the map creation unit and the mask created by the mask creation unit.
The image processing program according to claim 6 causes a computer to execute: a map creation step of creating a map indicating the distribution of the degree of attention in an input image; a mask creation step of creating a mask indicating the outline of a region of interest based on edge information of the input image; and an extraction step of extracting the region of interest from the input image based on the map created in the map creation step and the mask created in the mask creation step.

According to the present invention, a region of interest can be extracted with high accuracy and at high speed.

FIG. 1 is a schematic diagram of the configuration of an image processing system according to a first embodiment of the present invention.
FIG. 2 is a diagram schematically showing the configuration of the image processing apparatus 3.
FIG. 3 is a diagram illustrating the main configuration of the image processing apparatus 3.
FIG. 4 is a diagram illustrating the main configuration of the region-of-interest image creation unit 12.
FIG. 5 is an explanatory diagram of the map creation unit 21.
FIG. 6 is an explanatory diagram of the mask creation unit 22.
FIG. 7 is an explanatory diagram of the mask processing unit 23 and the extraction unit 24.
FIG. 8 is a flowchart of the region-of-interest image creation processing executed by the region-of-interest image creation unit 12.
FIG. 9 is an explanatory diagram of Modification 1.

(First embodiment)
FIG. 1 is a schematic diagram of the configuration of an image processing system according to the first embodiment of the present invention. The image processing system 1 includes a camera 2, an image processing apparatus 3, and a hard disk drive (HDD) 4. The camera 2 is a so-called digital still camera that photographs a subject and stores the captured image data (hereinafter simply referred to as an image) in a predetermined storage medium.

The image processing apparatus 3 manages images captured by the camera 2. It acquires those images from the camera 2 through, for example, wired or wireless data communication with the camera 2. The image processing apparatus 3 stores the acquired images in the HDD 4, a large-capacity storage medium, and manages them there. It can display the images stored in the HDD 4 as a list (for example, as thumbnails), search them by shooting date, shooting parameters, and the like, display (play back) them on a display device (not shown), and print them using a printer (not shown).

In particular, the image processing apparatus 3 can automatically detect sets of similar images among the images captured by the camera 2 and manage each set as a single unit. For example, when a subject is photographed using a so-called continuous shooting (burst) function or an automatic exposure bracketing function, the resulting images are extremely similar in content. Even without such special functions, images become extremely similar if the user shoots repeatedly with the angle of view and other settings fixed. The image processing apparatus 3 identifies the main subject of each image by the processing described later and groups (classifies) images whose main subjects match to some extent, which makes the user's operations easier. Such operations include, for example, deleting, displaying, and printing. In other words, the image processing apparatus 3 lets the user delete, display, print, and otherwise handle images collectively, group by group.

FIG. 2 is a diagram schematically showing the configuration of the image processing apparatus 3. In the present embodiment, the image processing apparatus 3, which stores and classifies images, is provided by, for example, causing the computer apparatus shown in FIG. 1 to execute an image processing program that stores and classifies images. When the program is installed on a personal computer 100, it is loaded into the data storage device of the personal computer 100 and then executed, so that the personal computer 100 serves as the image processing apparatus 3.

The program may be loaded by setting a recording medium 104, such as a CD-ROM storing the program, in the personal computer 100, or it may be loaded into the personal computer 100 via a communication line 101 such as a network. In the latter case, the program is stored in, for example, a hard disk device 103 of a server computer 102 connected to the communication line 101. The program can thus be supplied as a computer program product in various forms, such as on the recording medium 104 or via the communication line 101.

The image processing apparatus 3 is not limited to being implemented as a function of a personal computer. It may be implemented as a function of, for example, a tablet computer, a photo viewer, or a camera, or as a function of a server connected to a network.

FIG. 3 is a diagram illustrating the main configuration of the image processing apparatus 3 implemented as a function of the personal computer 100 (FIG. 2). The image processing apparatus 3 includes an image classification unit 11 and a region-of-interest image creation unit 12. The image classification unit 11 passes an image input from the camera 2 to the region-of-interest image creation unit 12, which creates a region-of-interest image from this input image and outputs it to the image classification unit 11. A region-of-interest image is an image in which the region of interest has been extracted from the input image. The region of interest is the region of the image in which the main subject exists. In other words, a region-of-interest image is the input image with the regions that are not the main subject, such as the background, removed. The operation of the region-of-interest image creation unit 12 is described in detail later.

The image classification unit 11 stores the image input from the camera 2 in the HDD 4 in association with the region-of-interest image created by the region-of-interest image creation unit 12. When doing so, it searches the HDD 4 for region-of-interest images similar to the new one. If such a region-of-interest image is found, the image about to be stored and the image associated with the found region-of-interest image are classified into the same group. That is, in the HDD 4, images whose region-of-interest images are similar to one another are classified into one group.
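The grouping behaviour described above can be sketched as follows. This is only an illustration: the text does not say how similarity between region-of-interest images is measured, so the `similar` callable, the function name `assign_group`, and the in-memory `groups` list (standing in for the HDD 4) are all assumptions.

```python
def assign_group(groups, image_id, roi_image, similar):
    """Place a new image into the first group holding a similar
    region-of-interest image, or start a new group if none is found.

    groups    : list of groups; each group is a list of (image_id, roi_image)
    similar   : hypothetical callable (roi_a, roi_b) -> bool; the similarity
                measure itself is not specified in the text
    """
    for group in groups:
        if any(similar(roi_image, stored_roi) for _, stored_roi in group):
            group.append((image_id, roi_image))
            return group
    new_group = [(image_id, roi_image)]
    groups.append(new_group)
    return new_group
```

Passing the similarity test in as a parameter keeps the sketch independent of whichever image comparison method is actually used.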

FIG. 4 is a diagram illustrating the main configuration of the region-of-interest image creation unit 12. The region-of-interest image creation unit 12 includes a map creation unit 21, a mask creation unit 22, a mask processing unit 23, and an extraction unit 24. The function of each of these units is described below in order.

(Description of the map creation unit 21)
The map creation unit 21 creates a saliency map of the input image (hereinafter simply called the map) based on the color information of the input image. For each pixel of the input image, the degree of attention of that pixel, that is, the likelihood that the pixel belongs to the region of interest, is calculated and assigned to the position of that pixel. The map thus represents the distribution of the degree of attention over the pixel positions. The map creation unit 21 of the present embodiment first creates a filtered image by applying a predetermined smoothing filter (for example, a Gaussian filter) to the input image. It then creates the map by performing the calculation of equation (1) on the input image and the filtered image.
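Equation (1) is not reproduced in this text. Assuming it follows the frequency-tuned formulation of Non-Patent Document 1, and using the symbols defined in the next paragraph, it would presumably read:

```latex
S(x, y) = \left\| I_{\mu} - I_{\omega hc}(x, y) \right\| \tag{1}
```

Here the norm is taken to be the Euclidean norm, which is an assumption carried over from Non-Patent Document 1 rather than something stated in this text.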

Here, S(x, y) is the value (degree of attention) of the map at coordinates (x, y), I_μ is the mean feature vector of the input image, and I_ωhc(x, y) is the pixel value vector of the pixel at coordinates (x, y) in the filtered image. Equation (1) means that the absolute value of the change in the pixel value at coordinates (x, y) caused by applying the smoothing filter is used as the map value at (x, y). For example, a region of the input image with little color variation (low frequency), such as the sky, changes little when the smoothing filter is applied, so such a region has a low degree of attention in the map. The main subject, on the other hand, is expected to contain fine texture (high frequency), so such a region changes greatly from the input image when the smoothing filter is applied and therefore has a high degree of attention in the map. In other words, the map in the present embodiment has the character of a frequency map of the input image. In the following description, the degree of attention in the map is assumed to be normalized to the range 0 to 1. The larger the value, the higher the degree of attention, that is, the more likely the position lies within the region of interest.
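A minimal sketch of such a map computation is given below, under several assumptions: the image is a list of rows of (r, g, b) tuples, a box filter stands in for the Gaussian filter, the map value is the Euclidean distance between the mean color and the smoothed pixel (the form used in Non-Patent Document 1), and normalization divides by the largest value. The function names are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math

def box_blur(img, radius=1):
    """Smooth an RGB image with a box filter (stand-in for a Gaussian filter)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            n = len(ys) * len(xs)
            row.append(tuple(
                sum(img[j][i][c] for j in ys for i in xs) / n
                for c in range(3)))
        out.append(row)
    return out

def saliency_map(img, radius=1):
    """Return a map of attention values normalized to the range 0 to 1."""
    h, w = len(img), len(img[0])
    # Mean feature vector of the whole input image (I_mu in the text).
    mean = tuple(
        sum(img[y][x][c] for y in range(h) for x in range(w)) / (h * w)
        for c in range(3))
    blurred = box_blur(img, radius)  # filtered image (I_omega_hc in the text)
    raw = [[math.dist(mean, blurred[y][x]) for x in range(w)] for y in range(h)]
    peak = max(max(row) for row in raw)
    if peak == 0:  # perfectly uniform image: no pixel stands out
        return [[0.0] * w for _ in range(h)]
    return [[v / peak for v in row] for row in raw]
```

Non-Patent Document 1 works in the Lab color space; RGB is used here purely for brevity.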

The map creation method described above is the same as the saliency map creation method described in Non-Patent Document 1. However, the map creation unit 21 may create the map (saliency map) by a different method. For example, the map may be created using luminance information instead of color information. It is also possible to compute the so-called defocus amount for each part of the shooting screen by the so-called phase-difference focus detection method and use the distribution of those defocus amounts over the shooting screen as the map. In this case, the smaller the defocus amount, the higher the degree of attention. Thus the map may be created by any method as long as it represents the distribution of the degree of attention within the image.

FIG. 5(a) shows an example of an input image. The input image 40 shows a person 41 as the main subject. The sky 42, a mountain 43, and the ground 44 appear as parts other than the main subject (the background). FIG. 5(b) shows an example of a map created from this input image 40. In FIG. 5(b), the degree of attention is represented by the density of the shading (hatching): the lighter the shading, the higher the degree of attention, and the darker the shading, the lower the degree of attention. The distant sky portion 42 has the lowest degree of attention in the map 50. The distant mountain portion 43 has a higher degree of attention than the sky portion 42, but its degree of attention is low relative to the map 50 as a whole. The ground portion 44 has a higher degree of attention than the mountain portion 43, but it too is low relative to the map 50 as a whole. In contrast, the portions 41a, 41b, 41c, and 41d of the person, who is the main subject, all have a higher degree of attention than the parts other than the main subject. Specifically, the nose portion 41a has the highest degree of attention in the entire map 50. The face portion 41b and the torso portion 41c have the next highest degree of attention. The portion 41d around the neck has a somewhat lower degree of attention, but it is still higher than that of the parts other than the main subject.

(Description of the mask creation unit 22)
The mask creation unit 22 estimates the outline of the region of interest based on the luminance information of the input image 40 and creates a mask representing the estimated outline. The mask creation unit 22 converts the input image 40 to grayscale and detects edge information, and then detects local feature points of the input image 40 from that edge information. Known methods for detecting such local feature points include SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and ORB (Oriented FAST and Rotated BRIEF). The mask creation unit 22 detects the local feature points of the input image 40 by such a well-known method. FIG. 6(a) shows an example in which a large number of local feature points 60 are superimposed on the input image 40. FIG. 6(b) shows only the local feature points 60, with the input image 40 removed. In FIGS. 6(a) and 6(b), for convenience of drawing, reference numerals are attached to only some of the local feature points 60.

The mask creation unit 22 identifies a region 71 that encloses the outermost local feature points 60 detected in this way. It then creates a mask in which "1" is assigned to the region 71 and "0" to the remaining region 72. FIG. 6(c) shows an example of a mask created from FIG. 6(b). If the local feature points 60 existed only on the main subject (within the region of interest), the mask 70 created here would match the outline of the main subject exactly, and applying the mask 70 to the input image 40 would cut the main subject out of the input image 40. In practice, local feature points 60 are also detected at positions outside the main subject, like the local feature point 60a in FIG. 6(b). The mask 70 created here is therefore imperfect and includes parts other than the main subject. In other words, the mask 70 is an estimate of the outline of the main subject (region of interest).
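One possible way to build such a mask is sketched below. The text does not name an algorithm for finding the region that encloses the outermost feature points; taking their convex hull (Andrew's monotone chain) is an assumption made here for illustration, as are the function names. Points are (x, y) tuples and the mask is indexed as mask[y][x].

```python
def _cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; vertices ordered so that interior points
    have a non-negative cross product against every edge."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_mask(points, width, height):
    """Binary mask: 1 inside (or on) the hull of the points, 0 elsewhere."""
    hull = convex_hull(points)
    mask = [[0] * width for _ in range(height)]
    if len(hull) < 3:
        # Degenerate case (fewer than three non-collinear points):
        # mark only the points themselves.
        for x, y in hull:
            if 0 <= x < width and 0 <= y < height:
                mask[y][x] = 1
        return mask
    n = len(hull)
    for y in range(height):
        for x in range(width):
            if all(_cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0
                   for i in range(n)):
                mask[y][x] = 1
    return mask
```

A convex hull cannot follow concave outlines, so it will generally include more background than a tighter boundary would.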

(Description of the mask processing unit 23)
The mask processing unit 23 creates a processed map 80 based on the map 50 created by the map creation unit 21 and the mask 70 created by the mask creation unit 22. FIG. 7 schematically shows how the processed map 80 is created. As indicated by (A) in FIG. 7, the map creation unit 21 creates the map 50 from the input image 40. As indicated by (B) in FIG. 7, the mask creation unit 22 creates the mask 70 from the input image 40. As indicated by (C) in FIG. 7, the mask processing unit 23 creates the processed map 80 by multiplying the map 50 by the mask 70. As described above, the mask 70 has "1" assigned to the part estimated to be the main subject (region of interest) and "0" assigned to the rest. The processed map 80 obtained by this multiplication therefore retains the degree-of-attention values of the original map 50 in the part estimated to be the main subject and has "0" everywhere else. In other words, in the processed map 80, the degree of attention of the parts other than the main subject (region of interest) is 0.
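The multiplication of the map by the mask can be sketched as an element-wise product. The function name `apply_mask` and the list-of-rows representation are assumptions made for illustration.

```python
def apply_mask(saliency, mask):
    """Element-wise product of an attention map (values 0 to 1) and a
    binary mask (values 0 or 1), giving the processed map."""
    if len(saliency) != len(mask) or any(
            len(s_row) != len(m_row) for s_row, m_row in zip(saliency, mask)):
        raise ValueError("map and mask must have the same dimensions")
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(saliency, mask)]
```

Because the mask holds only 0 and 1, the product either keeps the attention value unchanged or sets it to 0.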

(Description of the extraction unit 24)
Based on the processed map 80, the extraction unit 24 creates a region-of-interest image in which the main subject (region of interest) has been extracted from the input image 40. As indicated by (D) in FIG. 7, the extraction unit 24 creates the region-of-interest image 90 by multiplying the input image 40 by the processed map 80. As described above, the degree of attention of the parts of the processed map 80 other than the main subject (region of interest) is 0, so the pixel values of those parts are 0 in the region-of-interest image 90. In the main subject (region of interest), on the other hand, the pixel values of the input image 40 are attenuated (darkened) according to the degree of attention, which lies between 0 and 1. Specifically, the lower the degree of attention at a pixel position, the darker the pixel becomes relative to its value in the input image 40.
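As a small illustration of this multiplication, the sketch below scales each (r, g, b) pixel by the corresponding value of the processed map. The function name `extract_region` and the rounding to integer pixel values are assumptions, not details taken from the text.

```python
def extract_region(img, processed_map):
    """Scale every pixel of img by the processed-map value at its position.

    img           : list of rows of (r, g, b) tuples
    processed_map : list of rows of weights in the range 0 to 1
    """
    return [[tuple(round(channel * weight) for channel in pixel)
             for pixel, weight in zip(img_row, map_row)]
            for img_row, map_row in zip(img, processed_map)]
```

A weight of 0 yields a black pixel, a weight of 1 leaves the pixel unchanged, and intermediate weights darken it proportionally.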

For example, in FIG. 5(b), the portions 41a to 41d of the person, who is the main subject, all have a high degree of attention. As shown in FIG. 7, the pixel values of those portions in the region-of-interest image 90 have therefore changed little from the input image 40 (they have not become much darker). In FIG. 5(b), on the other hand, a low degree of attention is assigned to the ground portion 44. As shown in FIG. 7, the pixel values of that portion in the region-of-interest image 90 are therefore much lower than in the input image 40 and are close to black.

In other words, in the region-of-interest image 90, the main subject (region of interest) remains almost as it is in the input image 40 (although its luminance may be slightly reduced), while the other parts become almost black. The parts outside the region of interest do not become completely black, but they are far less conspicuous than the region of interest. Such a region-of-interest image 90 is sufficient for the similar-image search performed by the image classification unit 11.

Level correction may be applied to the region-of-interest image 90 so that regions whose luminance is below a certain value (regions whose degree of attention is below a certain value) become completely black. In addition, applying a well-known edge detection process to the region-of-interest image 90 makes it possible to identify the outline of the region of interest. The outline identified in this way is more precise than the outline estimated by the mask creation unit 22. Furthermore, a histogram of the degree of attention in the map 50 may be created, and regions whose degree of attention is below the peak value may be excluded from the region-of-interest image 90 (their luminance set to 0).
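The first of these options can be sketched as a simple threshold on the degree of attention. The cutoff value, the parameter name `min_attention`, and the function name are hypothetical; the text only says "a certain value".

```python
def black_out_low_attention(roi_img, processed_map, min_attention=0.2):
    """Set pixels whose degree of attention is below min_attention to black.

    roi_img       : list of rows of (r, g, b) tuples
    processed_map : list of rows of attention values in the range 0 to 1
    min_attention : hypothetical cutoff, not specified in the text
    """
    return [[pixel if weight >= min_attention else (0, 0, 0)
             for pixel, weight in zip(img_row, map_row)]
            for img_row, map_row in zip(roi_img, processed_map)]
```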

FIG. 8 is a flowchart of the region-of-interest image creation processing executed by the region-of-interest image creation unit 12. First, in step S10, the region-of-interest image creation unit 12 receives the input image 40. In step S20, the map creation unit 21 creates the map 50 based on the input image 40. In step S30, the mask creation unit 22 detects the local feature points 60 from the input image 40. In step S40, the mask creation unit 22 creates the mask 70 based on the detected local feature points 60. In step S50, the mask processing unit 23 creates the processed map 80 from the map 50 and the mask 70. In step S60, the extraction unit 24 creates the region-of-interest image 90 from the input image 40 and the processed map 80.
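The flow of steps S10 to S60 can be sketched as a single function. To keep the example self-contained, the map creation, feature point detection, and mask creation stages are passed in as callables; these parameter names and the function name `create_roi_image` are assumptions made for illustration.

```python
def create_roi_image(image, make_map, detect_points, make_mask):
    """Run steps S10 to S60 on an image given as rows of (r, g, b) tuples.

    make_map      : callable(image) -> attention map with values 0 to 1
    detect_points : callable(image) -> list of (x, y) feature points
    make_mask     : callable(points, width, height) -> binary mask
    """
    # S10: the input image is received as the `image` argument.
    height, width = len(image), len(image[0])
    saliency = make_map(image)                      # S20: create the map
    points = detect_points(image)                   # S30: local feature points
    mask = make_mask(points, width, height)         # S40: create the mask
    processed = [[s * m for s, m in zip(s_row, m_row)]
                 for s_row, m_row in zip(saliency, mask)]   # S50: map x mask
    return [[tuple(round(c * w) for c in pixel)
             for pixel, w in zip(img_row, p_row)]
            for img_row, p_row in zip(image, processed)]    # S60: image x map
```

Steps S20 and S30 to S40 both depend only on the input image, so they could also run in either order or in parallel.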

The image processing system according to the first embodiment described above provides the following effects.
(1) The map creation unit 21 creates the map 50 indicating the distribution of the degree of attention in the input image 40. The mask creation unit 22 creates the mask 70 indicating the outline of the region of interest based on the edge information of the input image 40. The extraction unit 24 extracts the region of interest from the input image 40 based on the map 50 created by the map creation unit 21 and the mask 70 created by the mask creation unit 22. With this configuration, the region of interest can be extracted with high accuracy and at high speed. In particular, unlike the prior art described in Non-Patent Document 1, there is no need for clustering based on color information, which requires an enormous amount of computation, so the region of interest can be extracted at high speed.

(2) The mask processing unit 23 creates the processed map 80 (mask-processed map) by multiplying the map 50 created by the map creation unit 21 by the mask 70 created by the mask creation unit 22. The extraction unit 24 extracts the region of interest from the input image 40 by multiplying the processed map 80 created by the mask processing unit 23 by the input image 40. With this configuration, the region of interest can be extracted with high accuracy and at high speed.

(3) The map creation unit 21 creates the map 50 by calculating the absolute value of the difference between the input image 40 and a smoothed image obtained by smoothing the input image 40. With this configuration, the distribution of the degree of attention in the input image 40 can be calculated accurately.

(4) The mask creation unit 22 detects a plurality of local feature points 60 from the edge information of the input image 40 and estimates the shape enclosing the outermost of those local feature points 60 to be the outline of the region of interest. With this configuration, the outline of the region of interest can be estimated easily and accurately.

The following modifications are also within the scope of the present invention, and one or more of them may be combined with the embodiment described above.

(Modification 1)
In the embodiment described above, the mask creation unit 22 estimates the outline of the region of interest by enclosing the outermost local feature points 60, but the mask 70 may be created by other methods. For example, as shown in FIGS. 9(a) and 9(b), the mask 70 may be formed by placing, at the position of each local feature point 60, a square of a predetermined size centered on that position. In this case, as shown in FIG. 9(b), regions to which "0" is assigned (regions 71, 72, and so on) arise inside the person, but the amount of computation needed to find the outermost of the many local feature points 60 can be eliminated. Shapes other than squares (for example, rhombuses) may also be placed.
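A sketch of this square-stamping approach follows. The function name `square_mask` and the parameter `half_size` (half the side length of each square, in pixels) are assumptions; the text only speaks of a square of a predetermined size.

```python
def square_mask(points, width, height, half_size=1):
    """Binary mask built by stamping a square centered on every feature point.

    points    : list of (x, y) integer pixel positions
    half_size : hypothetical parameter; each square has side 2 * half_size + 1
    """
    mask = [[0] * width for _ in range(height)]
    for px, py in points:
        # Clip each square to the image bounds.
        for y in range(max(0, py - half_size), min(height, py + half_size + 1)):
            for x in range(max(0, px - half_size), min(width, px + half_size + 1)):
                mask[y][x] = 1
    return mask
```

The cost grows with the number of points times the square area, with no need to determine which points are outermost.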

The regions 71, 72, and so on arise because no local feature point 60 exists within them. However, the absence of local feature points 60 in a region means that the input image 40 has no texture (no edges) in that region. The color of the regions 71, 72, and so on can therefore be presumed to be substantially the same as the color of their surroundings. Accordingly, by filling these regions with the surrounding color in the finally created attention area image 90, the portions inside the region of interest that were masked out, such as the regions 71 and 72, can be interpolated.
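The interpolation described above could be sketched as below, under several assumptions that the text does not state: the image is grayscale, a hole is a 4-connected group of mask-zero pixels that does not touch the image border, and the "surrounding color" is the mean of the unmasked pixels adjacent to the hole. The name `fill_enclosed_holes` is hypothetical.

```python
from collections import deque


def fill_enclosed_holes(image, mask):
    """Return a copy of `image` in which every enclosed mask==0 region
    is filled with the mean value of the mask==1 pixels bordering it.
    A bordering pixel adjacent to several hole pixels is counted once
    per adjacency, so the mean is weighted by contact length."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in image]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] or seen[sy][sx]:
                continue
            region, border_vals, touches_edge = [], [], False
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:  # breadth-first walk over one zero region
                x, y = queue.popleft()
                region.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if not (0 <= nx < w and 0 <= ny < h):
                        touches_edge = True
                    elif mask[ny][nx]:
                        border_vals.append(image[ny][nx])
                    elif not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if not touches_edge and border_vals:
                fill = sum(border_vals) / len(border_vals)
                for x, y in region:
                    out[y][x] = fill
    return out
```

Zero regions that reach the image border are treated as background and left untouched.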

(Modification 2)
Consider, for example, a case where the outer shape of the region of interest represented by the mask 70 is very small. If some part of that region of interest has a very high attention level, the rest of the region of interest becomes very dark in the attention area image 90. In this way, the degree to which the attention level affects the attention area image 90 varies with the size of the outer shape of the region of interest estimated by the mask creation unit 22. The mask processing unit 23 may therefore weight the map 50 according to the outer shape of the region of interest when creating the processed map 80. Specifically, the weighted attention level S'(x, y) is computed from the attention level S(x, y) by the following equations (2) and (3).
S'(x, y) = S(x, y)^α ... (2)
α = (As / Ai) × C ... (3)
Here, As is the area of the outer shape of the region of interest represented by the mask 70, Ai is the area of the entire input image 40, and C is a constant. By reducing the influence of the attention level on the attention area image 90 as the area of the outer shape represented by the mask 70 becomes smaller, as in this example, the effect that the size of the region of interest has on the clarity of the attention area image 90 can be reduced.
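Equations (2) and (3) can be sketched as follows. The sketch assumes that the attention levels are normalized to the range [0, 1] and that As is measured as the number of non-zero mask pixels; neither assumption is stated in the text, and the function name `weight_map` is hypothetical.

```python
def weight_map(attention, mask, c=1.0):
    """Apply S'(x, y) = S(x, y) ** alpha with alpha = (As / Ai) * C.

    attention: list of rows of attention levels, assumed in [0, 1].
    mask:      list of rows of 0/1 values (the mask 70).
    Returns the weighted map and the alpha that was used.
    """
    h, w = len(mask), len(mask[0])
    area_s = sum(1 for row in mask for v in row if v)  # As: pixels inside the outline
    area_i = h * w                                     # Ai: whole image
    alpha = (area_s / area_i) * c
    weighted = [[s ** alpha for s in row] for row in attention]
    return weighted, alpha
```

Under the [0, 1] assumption, an exponent α below 1 moves every non-zero attention level toward 1, so a small region of interest gets a flatter map in which a single very high value no longer darkens the rest of the region.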

(Modification 3)
When creating the map 50, it is not always necessary to assign one attention level to each pixel of the input image 40. For example, the input image 40 may be divided in a grid pattern into regions of 4 × 4 pixels, and one attention level may be assigned to each region. The input image 40 may also be divided into regions by other methods. The same applies to the mask 70.
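A block-wise map in the sense of Modification 3 might be formed as in the sketch below. Taking the mean of the per-pixel attention levels as the single value for each block is an assumption, as is the function name `block_attention`; blocks at the right and bottom edges may be smaller when the image size is not a multiple of the block size.

```python
def block_attention(attention, block=4):
    """Reduce a per-pixel attention map to one value per block x block
    tile, using the mean of the tile (an illustrative choice)."""
    h, w = len(attention), len(attention[0])
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            cells = [attention[y][x]
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out
```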

(Modification 4)
The embodiment described above uses the attention area image 90 for classifying similar images, but the attention area image 90 created according to the present invention can also be used for other purposes. For example, attention area images 90 can be combined with one another to create an image in which many main subjects are gathered. An attention area image 90 extracted from one image can also be combined with a landscape image to create an image in which the main subject represented by that attention area image 90 appears to be present in the landscape.

The present invention is not limited to the embodiments described above as long as its features are not impaired, and other forms conceivable within the scope of the technical idea of the present invention are also included within the scope of the present invention.

DESCRIPTION OF SYMBOLS: 1: image processing system; 2: camera; 3: image processing apparatus; 4: hard disk drive (HDD); 11: image classification unit; 12: attention area image creation unit; 21: map creation unit; 22: mask creation unit; 23: mask processing unit; 24: extraction unit

Claims

1. An image processing apparatus comprising:
a map creation unit that creates a map indicating a distribution of attention levels in an input image;
a mask creation unit that creates, based on edge information of the input image, a mask indicating an outer shape of a region of interest; and
an extraction unit that extracts the region of interest from the input image based on the map created by the map creation unit and the mask created by the mask creation unit.

2. The image processing apparatus according to claim 1,
wherein the extraction unit creates a mask-processed map by multiplying the map created by the map creation unit by the mask created by the mask creation unit, and extracts the region of interest from the input image by multiplying the created mask-processed map by the input image.

3. The image processing apparatus according to claim 1 or 2,
wherein the map creation unit creates the map by computing an absolute value of a difference between the input image and a smoothed image obtained by smoothing the input image.

4. The image processing apparatus according to any one of claims 1 to 3,
wherein the mask creation unit detects a plurality of local feature points from the edge information of the input image, and estimates a shape enclosing the outermost periphery of the plurality of local feature points as the outer shape of the region of interest.

5. The image processing apparatus according to any one of claims 1 to 3,
wherein the mask creation unit detects a plurality of local feature points from the edge information of the input image, and estimates, as the outer shape of the region of interest, a shape formed by arranging a figure of a predetermined shape at the position of each of the plurality of local feature points.

6. An image processing program causing a computer to execute:
a map creation step of creating a map indicating a distribution of attention levels in an input image;
a mask creation step of creating, based on edge information of the input image, a mask indicating an outer shape of a region of interest; and
an extraction step of extracting the region of interest from the input image based on the map created in the map creation step and the mask created in the mask creation step.