JP6949671B2

JP6949671B2 - Information processing device, image area selection method, computer program, and storage medium

Info

Publication number: JP6949671B2
Application number: JP2017212810A
Authority: JP
Inventors: 雅人青葉
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-11-02
Filing date: 2017-11-02
Publication date: 2021-10-13
Anticipated expiration: 2037-11-02
Also published as: JP2019086899A

Description

The present invention relates to an information processing device for selecting a predetermined area from an image.

An information processing device may perform predetermined processing on a selected area of an image. Various user interfaces have been proposed for selecting the target area. The most common method is to click a point in the image with a pointing device such as a mouse and drag to select a bounding box. Another commonly used method is a slice tool, which cuts out a contour when the user clicks multiple points along the outline of the area to be cut out. In both methods the user selects the area manually. As alternatives to manual selection, methods for automatic or semi-automatic area selection have also been proposed.

Patent Document 1 discloses an image processing device that detects the top of a person's head and the eyes, and automatically adjusts the size of the face area based on the detection results to determine a trimming size. The processing of this device uses a bottom-up method specialized for face area selection. Patent Document 2 discloses an image extraction device that selects an area by region growing, a representative bottom-up method. This device first segments areas using primary features such as background difference or optical flow. Then, starting from an area selected from among the areas segmented by the primary features, it connects areas that are similar in secondary features such as color components, and extracts the object area. Patent Document 3 proposes a graph-based method. In this method, the user roughly specifies an area inside the contour of the area to be selected, and the object area is computed by repeating graph cuts according to the feature distribution inside the specified area.

Meanwhile, the problem of dividing an image into semantic areas, such as person, automobile, road, building, and sky areas, has been studied. This problem is called semantic segmentation and is expected to be applied to image correction adapted to the type of object, scene interpretation, and the like. In semantic segmentation, it is already common to identify the category label at each image position in units of small areas (superpixels) rather than in units of pixels. A small area is cut out of the image mainly as a small area having similar features. Various methods have been proposed for cutting out such small areas; Non-Patent Document 1 is a representative example. The category label of a small area is identified using the feature amounts inside it, optionally together with context feature amounts from its surroundings. Normally, area identification is performed by training such a local area classifier on a variety of learning images. The technique disclosed in Non-Patent Document 2 divides an image into small areas at multiple levels and classifies the small areas at each level with a linear SVM (Support Vector Machine). The category likelihoods at all levels for each pixel are then used as the input of a linear SVM to estimate the category label of each pixel.

Patent Document 1: Japanese Unexamined Patent Publication No. 2002-152492
Patent Document 2: Japanese Unexamined Patent Publication No. H09-185720
Patent Document 3: U.S. Pat. No. 7,660,463

Non-Patent Document 1: R. Achanta, A. Shaji, K. Smith, A. Lucchi, "SLIC Superpixels," EPFL Technical Report, 2010.
Non-Patent Document 2: X. Ren, L. Bo, and D. Fox, "RGB-(D) Scene Labeling: Features and Algorithms," CVPR 2012.

When the area that the user wants to select is large or irregular in shape, it is difficult for the user to select it accurately. For example, when specifying a bounding box around a person with both arms spread, failing to place the first point at the correct position can cut off an arm or produce a bounding box that is too large for the person. Cutting out a contour by specifying multiple points is also very laborious when the contour of the area has many concavities and convexities.

The bottom-up methods described above expand an area based on the similarity of adjacent areas. Consequently, in an image where, for example, a person running in a red running shirt appears in front of a wall whose color is close to skin color, the arm and the wall are judged to be more similar than the arm and the running shirt. The arm and the wall are then judged to be a connected area, and the person in the image is not selected as a single area.

The present invention has been made in view of the above problems, and its object is to provide an information processing device that makes it possible to select a predetermined area from an image with a simple operation.

The information processing device of the present invention comprises: image acquisition means for acquiring an image; area identification means for hierarchically classifying areas in the acquired image into a plurality of categories; display means for displaying the image; initial area setting means for setting an area at a predetermined position as the initial area of a selected area, in response to a user operation on the image displayed on the display means; and area control means for expanding and reducing the selected area according to the hierarchical category determination result, in response to a predetermined user operation, thereby updating the selected area.

According to the present invention, a user can select a predetermined area from an image with a simple operation.

FIG. 1: Explanatory diagram of the information processing device.
FIG. 2: Flowchart showing the learning process.
FIG. 3: (a) to (c) are explanatory diagrams of a learning image and area category label data.
FIG. 4: Explanatory diagram of the area category labels.
FIG. 5: (a) and (b) are flowcharts showing the image area selection process.
FIG. 6: (a) to (c) are explanatory diagrams of the image area selection process.
FIG. 7: (a) to (c) are explanatory diagrams of the image area selection process.
FIG. 8: (a) to (d) are explanatory diagrams of the image area selection process.
FIG. 9: (a) and (b) are explanatory diagrams of the image area selection process.
FIG. 10: (a) to (d) are explanatory diagrams of the image area selection process.
FIG. 11: (a) to (g) are explanatory diagrams of the image area selection process.
FIG. 12: (a) to (c) are explanatory diagrams of image processing.
FIG. 13: Flowchart showing the processing of S1600.
FIG. 14: (a) to (c) are explanatory diagrams of the enclave expansion operation and the enclave reduction operation.
FIG. 15: (a) to (e) are explanatory diagrams of the enclave expansion operation.
FIG. 16: Flowchart showing the processing of S1600.
FIG. 17: (a) to (d) are explanatory diagrams of the contour correction operation.
FIG. 18: (a) and (b) are explanatory diagrams of the area addition operation.

Hereinafter, embodiments will be described in detail with reference to the drawings.

(First Embodiment)
FIG. 1 is an explanatory diagram of an information processing device that realizes the image area selection device of the present embodiment. The image area selection device has functions for performing an image area selection process, in which the user selects a desired area from an image, and functions for performing a learning process, which generates in advance the area classifier required by the image area selection process.

The functions for performing the image area selection process are described first. They are realized by an image acquisition unit 1100, an area division unit 1200, an area identification unit 1300, a display unit 1400, an initial area setting unit 1500, an area control unit 1600, and a processing unit 1700. These functions may all be realized on the same information processing device, or each may be realized as an independent module. The information processing device may be, for example, a combination of a personal computer and a monitor, a tablet terminal, or a smartphone. Each function may be realized by a CPU (Central Processing Unit) executing a computer program installed on the information processing device. Each function may also be realized inside an imaging device such as a camera, either in hardware or by executing a computer program.

The image acquisition unit 1100 acquires an input image from an external device. The area division unit 1200 divides the input image acquired by the image acquisition unit 1100 into a plurality of small areas. The area identification unit 1300 reads the area classifier stored in the area classifier storage unit 5200 and estimates the area category of each small area produced by the area division unit 1200. The area classifier storage unit 5200 stores the area classifier generated by the learning process described later. The display unit 1400 is a display device that displays the input image acquired by the image acquisition unit 1100, so that the user can check the input image. The initial area setting unit 1500 sets an area at a predetermined position in the input image as the initial area, according to a user instruction given through a predetermined interface. The area control unit 1600 expands or reduces the initial area according to the operations performed by the user and generates the selected area. The processing unit 1700 performs predetermined processing on the selected area.

Next, the functions for performing the learning process, which generates the area classifier used in the image area selection process, are described. They are realized by a learning data acquisition unit 2100, a learning image area division unit 2200, and an area classifier generation unit 2300. These functions may all be realized on the same information processing device, or each may be realized as an independent module. Each function may be realized by a CPU (Central Processing Unit) executing a computer program installed on the information processing device.

The learning data acquisition unit 2100 acquires learning data from the learning data storage unit 5100, which stores in advance the learning data used in the learning process. The learning data consists of a plurality of learning images and area category label data in which hierarchically defined area category labels are assigned to each pixel of the learning images. The learning image area division unit 2200 divides each learning image in the learning data acquired by the learning data acquisition unit 2100 into small areas. The area classifier generation unit 2300 performs learning based on the feature amounts and the area category labels of the small areas produced by the learning image area division unit 2200, and generates an area classifier that identifies the category of a small area. The area classifier generation unit 2300 stores the generated area classifier in the area classifier storage unit 5200. The learning data storage unit 5100 and the area classifier storage unit 5200 are realized by storage inside or outside the information processing device.

The functions used for the image area selection process and the functions used for the learning process may be realized on the same information processing device or on separate information processing devices. When the learning process and the image area selection process are realized on separate devices, the area classifier storage unit 5200 may be realized by different storage on each device. In that case, the area classifier obtained by the learning process is copied or moved to the storage of the device that performs the image area selection process.

The learning process and the image area selection process performed by the image area selection device configured as above will now be described. FIG. 2 is a flowchart showing the learning process. The learning process generates, from learning images prepared in advance, the area classifier used in the image area selection process. Once generated, the area classifier is stored in the area classifier storage unit 5200 and is read from there for reuse. The learning process therefore does not need to be performed every time the image area selection process is executed.

When the learning process starts, the learning data acquisition unit 2100 acquires, from the learning data storage unit 5100, learning data that includes the learning images and the hierarchically defined area category label data (S2100). A learning image is, specifically, image data captured by a digital camera or the like. Let N be the number of learning images, and denote the n-th learning image by I_n (n = 1 ... N). The area category label data assigns hierarchical area category labels to each pixel of each learning image. Let L be the number of layers, and let l = 1 ... L be the layer index. Let M_l be the number of categories defined in the l-th layer.

FIG. 3 is an explanatory diagram of a learning image and area category label data. This example uses L = 5 layers, but the number of layers is not limited to this value. For the learning image 500 shown in FIG. 3(a), the corresponding area category label data is shown as layers 510 to 550 in FIG. 3(b). The area category labels assign subject categories from coarse to detailed. In the example of FIG. 3(b), layer 510 carries the coarsest category labels, progressively more detailed categories are given in layers 520, 530, and 540, and layer 550 carries the most detailed category definitions. These are referred to as the first layer, the second layer, and so on, starting from the coarsest layer 510.

In the first layer 510, the category labels sky 511 and non-sky 512 are assigned. In the second layer 520, the sky 511 of the first layer 510 is inherited as sky 521, and the non-sky 512 of the first layer 510 is decomposed into human body 522 and plant 523. In the third layer 530, sky 521 is inherited as sky 531, human body 522 is decomposed into face 532 and upper body 533, and plant 523 is decomposed into flower 534 and foliage 535. In the fourth layer 540, sky 531 is inherited as sky 541, upper body 533 as upper body 544, flower 534 as flower 545, and foliage 535 as foliage 546, while face 532 is decomposed into hair 542 and face 543. In the fifth layer 550, sky 541 is inherited as sky 551, hair 542 as hair 552, and upper body 544 as upper body 555. In addition, face 543 is decomposed into eye 553, facial skin 554, and mouth 556; flower 545 into petal 557 and tubular flower 558; and foliage 546 into leaf 559 and stem 560.
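The inheritance and decomposition relations of FIG. 3(b) can be held as a parent map from each label to the label in the next coarser layer. The following is a minimal Python sketch under that assumption. The dictionary layout and the helper names `layer_of` and `ancestor_at` are illustrative and are not part of the patent; the reference numerals from the description are used as keys.

```python
# Parent map for the example hierarchy: child numeral -> parent numeral
# (layer l -> layer l-1). Numerals follow the description of FIG. 3(b).
PARENT = {
    # layer 2 -> layer 1
    521: 511, 522: 512, 523: 512,
    # layer 3 -> layer 2
    531: 521, 532: 522, 533: 522, 534: 523, 535: 523,
    # layer 4 -> layer 3
    541: 531, 542: 532, 543: 532, 544: 533, 545: 534, 546: 535,
    # layer 5 -> layer 4
    551: 541, 552: 542, 553: 543, 554: 543, 556: 543, 555: 544,
    557: 545, 558: 545, 559: 546, 560: 546,
}


def layer_of(label):
    """Layer index (1 = coarsest) of a label, found by walking up to the root."""
    depth = 1
    while label in PARENT:
        label = PARENT[label]
        depth += 1
    return depth


def ancestor_at(label, layer):
    """Return the ancestor of `label` in the given layer (same or coarser)."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    chain.reverse()  # chain[0] is now the layer-1 label
    if not 1 <= layer <= len(chain):
        raise ValueError("layer out of range for this label")
    return chain[layer - 1]
```

For example, `ancestor_at(553, 3)` walks from eye 553 up to face 532, and `ancestor_at(553, 1)` reaches non-sky 512.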

The area category label data of the l-th layer corresponding to the learning image I_n is denoted GT_(n,l). Beyond the example of FIG. 3, such semantic area categories have hierarchically defined inclusion relations. FIG. 4 is an explanatory diagram of the area category labels of FIG. 3. Needless to say, many other area categories and hierarchy levels can be defined.

The learning image area division unit 2200 divides each acquired learning image into small areas (S2200). A small area is cut out of the learning image as a small area having similar features. Several methods have been proposed for dividing an image into small areas; the method of Non-Patent Document 1 is a representative example. Alternatively, the areas obtained by simply dividing the learning image into rectangles of uniform size may be used as the small areas. Each pixel of the learning image may also be regarded as a small area, in which case no division processing is needed. FIG. 3(c) illustrates the result of dividing the learning image 500 into small areas.
When dividing the learning image I_n produces R_n small areas, the total number of small areas used for learning is R = ΣR_n. The small areas of the learning images are denoted SP_r (r = 1 ... R) using serial numbers.
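The simplest option mentioned above, dividing the image into rectangles of uniform size, can be sketched as follows. This is an illustrative sketch only: the function name `grid_small_areas`, the block size, and the label-map representation are assumptions, and a superpixel method such as that of Non-Patent Document 1 would normally take its place.

```python
def grid_small_areas(height, width, block):
    """Divide a height x width image into block x block rectangles.

    Returns (label_map, count): label_map[y][x] is the index of the small
    area containing pixel (x, y); count is the number of small areas (R_n).
    Rectangles at the right and bottom edges may be smaller than `block`.
    """
    cols = (width + block - 1) // block   # rectangles per row
    rows = (height + block - 1) // block  # rectangles per column
    label_map = [[(y // block) * cols + (x // block) for x in range(width)]
                 for y in range(height)]
    return label_map, rows * cols


# Total number of small areas used for learning, R = sum of R_n.
sizes = [(4, 6), (5, 5)]  # hypothetical (height, width) of two learning images
R = sum(grid_small_areas(h, w, 2)[1] for h, w in sizes)
```

Serial numbers SP_r across images can then be obtained by offsetting each image's local indices by the number of small areas in the preceding images.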

The area classifier generation unit 2300 learns and generates an area classifier for identifying the category of a small area (S2300). The area classifier generation unit 2300 first extracts area features from the small areas used for learning. The area features extracted here are not limited to any particular type; examples include the color mean and color histogram inside the small area, the position and size of the small area, and texture features such as LBP (Local Binary Pattern). The area features may also be context features based on, for example, the distribution of line segments or colors around the small area. A CNN (Convolutional Neural Network) may also be used, with its convolutional layers regarded as the feature extractor. The area feature extracted from the small area SP_r is denoted as the small area feature x_r.
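As one concrete instance of the features listed above, the color mean and a color histogram of a small area can be computed as in the sketch below. The function name `area_feature`, the image representation (nested lists of RGB tuples), and the number of histogram bins are assumptions made for illustration; the patent does not prescribe a particular feature layout.

```python
def area_feature(image, label_map, area_id, bins=4):
    """Feature x_r of one small area: mean RGB plus per-channel histograms.

    image[y][x] is an (r, g, b) tuple with values in 0..255;
    label_map[y][x] is the small-area index of that pixel.
    """
    pixels = [image[y][x]
              for y in range(len(image))
              for x in range(len(image[0]))
              if label_map[y][x] == area_id]
    if not pixels:
        raise ValueError("small area has no pixels")
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    hist = []
    for c in range(3):
        counts = [0] * bins
        for p in pixels:
            counts[p[c] * bins // 256] += 1  # map 0..255 to a bin index
        hist.extend(k / n for k in counts)   # normalized histogram
    return mean + hist
```

The returned list has 3 + 3 × `bins` elements; position, size, or texture features could be appended in the same way.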

Let c_(r,l) be the area category label of the l-th layer that GT_(n,l) assigns to the small area SP_r. The teacher vector τ_(r,l) of the l-th layer for the small area SP_r is then expressed by the following equation.
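The equation itself is not reproduced in this text. Assuming the usual one-hot encoding, which is consistent with c_(r,l) being a category index and with the score vector having M_l dimensions, the teacher vector can be written as:

```latex
\tau_{(r,l)} = \left[\tau_1, \ldots, \tau_c, \ldots, \tau_{M_l}\right], \qquad
\tau_c =
\begin{cases}
1 & \text{if } c = c_{(r,l)} \\
0 & \text{otherwise}
\end{cases}
```

This form is an assumption based on the surrounding definitions, not a transcription of the published equation.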

Here, the area category label c_(r,l) is the index of the category assigned to the small area SP_r as its area category label in the l-th layer. Learning the area classifier means adjusting the parameters of the discriminant function so that the error between the output vector obtained when the small area feature x_r is input to the discriminant function and the teacher vector τ_(r,l) becomes small over all the learning data. The area classifier takes the small area feature x_r as input and outputs, for each layer, the score vector f_l(x_r) of the area categories. The score vector f_l(x_r) is an M_l-dimensional vector. Each element of f_l(x_r) is the score for one area category; the score for the c-th area category of the l-th layer is denoted f_c(x_r) (c = 1 ... M_l).
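The quantity being minimized can be illustrated with a short sketch. Two things here are assumptions rather than statements of the patent: the teacher vector is taken to be one-hot over the M_l categories, and the error is taken to be the sum of squared differences. The function names are hypothetical.

```python
def teacher_vector(category_index, num_categories):
    """One-hot teacher vector for a 1-based category index (assumed encoding)."""
    if not 1 <= category_index <= num_categories:
        raise ValueError("category index out of range")
    return [1.0 if c == category_index else 0.0
            for c in range(1, num_categories + 1)]


def layer_error(scores, labels, num_categories):
    """Sum of squared errors between score vectors and teacher vectors.

    scores[r] is the score vector f_l(x_r) and labels[r] is c_(r,l)
    for one layer l; the squared-error form is an assumption.
    """
    total = 0.0
    for f, c in zip(scores, labels):
        tau = teacher_vector(c, num_categories)
        total += sum((fi - ti) ** 2 for fi, ti in zip(f, tau))
    return total
```

A learning procedure would adjust the classifier parameters to reduce this value, summed over all layers, for whichever model is chosen.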

Various models of the discriminant function and various learning methods are possible. For example, an SVM, a multi-layer neural network, or logistic regression can be used. When the CNN mentioned above is used, its fully connected layers can be regarded as the model of the discriminant function, and learning can include the convolutional layers responsible for feature extraction. The present embodiment is not limited to any particular type of discriminant function model or learning method. The area classifier generation unit 2300 stores the area classifier obtained by learning in the area classifier storage unit 5200.

The learning process is performed as described above. The information processing device performs the image area selection process using the area classifier obtained by the learning process. FIG. 5 is a flowchart showing the image area selection process. FIG. 5(a) shows the overall flow of the image area selection process. FIGS. 6 to 11 are explanatory diagrams of the image area selection process.

The image acquisition unit 1100 acquires an input image (S1100). FIG. 6(a) illustrates an input image 100. Various methods of acquiring the input image 100 are possible, and the present embodiment is not limited to any particular one. For example, the image may be acquired directly from an imaging device such as a camera, or from image data stored in advance in storage such as a hard disk.

The area division unit 1200 divides the acquired input image 100 into small areas (S1200). The division performed here is preferably the same as the processing performed by the learning image area division unit 2200 in S2200. When each pixel is regarded as a small area, no division processing is needed. FIG. 6(b) illustrates an area division result 200 for the input image 100. Let K be the total number of small areas obtained by dividing the input image 100.

The area identification unit 1300 identifies the area category of each small area of the input image 100 (S1300). The area identification unit 1300 extracts the area features of each small area generated by the area division unit 1200. The area features extracted are of the same type as those that the area classifier generation unit 2300 extracts from the small areas of the learning images in S2300 of FIG. 2. Let x_k be the area feature extracted from the small area SP_k (k = 1 ... K). The area identification unit 1300 reads the area classifier obtained by the learning process from the area classifier storage unit 5200. The area identification unit 1300 inputs the area feature x_k of each small area SP_k to the read area classifiers f_l (l = 1 ... L). The area identification unit 1300 thereby generates the score vector f_l(x_k) of the area categories in the l-th layer.

The area identification result of the l-th layer for each small area SP_k is generated, for example, as the category c_(k,l) whose score in the score vector f_l(x_k) is the maximum.
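Selecting the maximum-score category in every layer can be sketched as follows. The function name `identify_categories` and the list-of-lists input are assumptions for illustration; category indices are 1-based to match c = 1 ... M_l.

```python
def identify_categories(score_vectors):
    """Pick the maximum-score category in each layer for one small area.

    score_vectors[l - 1] is the score vector f_l(x_k) for layers l = 1..L.
    Returns the list of 1-based category indices c_(k,l), one per layer.
    """
    result = []
    for scores in score_vectors:
        best = max(range(len(scores)), key=lambda c: scores[c])
        result.append(best + 1)  # convert 0-based position to 1-based index
    return result
```

Applying this to every small area SP_k yields the per-layer identification results for the whole image.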

The area identification unit 1300 applies the area classifier f_l of every layer l to every small area SP_k (k = 1 ... K), and ends the processing of S1300 once all the area identification results, that is, the categories c_(k,l), have been obtained. FIG. 6(c) illustrates area identification results 110, 120, 130, 140, and 150. The area identification result 150 shows the identification result of the fifth layer, which is the most detailed category identification result. In this example, areas such as mouth 151, hair 152, eyes 153, facial skin 154, arm 155, torso 156, hand 157, crotch 158, leg 159, foot 161, indoor wall 162, furniture 163, outer wall 164, and floor 165 are obtained. In the area identification result 140 of the fourth layer, areas such as hair 141, facial skin 142, arm 143, torso 144, crotch 145, leg 146, head 147, and indoor 148 are obtained. In the area identification result 130 of the third layer, areas such as heads 131 and 134, upper body 132, lower body 133, and building 135 are obtained. In the area identification result 120 of the second layer, the areas of artificial object 121 and human bodies 122 and 123 are obtained. In the area identification result 110 of the first layer, the area of the entire screen is identified as non-sky 111.

The display unit 1400 displays the input image 100 (S1400). The display unit 1400 is assumed to provide a graphical user interface for the displayed image, but the present embodiment is not limited to any particular form of the display unit 1400. The display unit 1400 that displays the input image 100 may be a touch panel, or a monitor connected to a personal computer with which a mouse or pen tablet can be used. In the following, the display unit 1400 is described using the example of a touch panel of the kind used in tablets and smartphones.

The initial area setting unit 1500 sets the initial area when the user designates a position on the input image 100 displayed on the display unit 1400 (S1500).
The user taps a part of the area to be selected on the input image 100 displayed on the display unit 1400, as illustrated in FIG. 7(a). Among all the small areas S_k (k = 1 ... K) obtained through division by the area dividing unit 1200, the initial area setting unit 1500 sets the small area S_i that contains the tapped position as the initial area. For example, when the position tapped as in FIG. 7(a) lies inside the small area 401, which corresponds to the right half of the facial skin of the right person shown in FIG. 8(a), this small area 401 becomes the initial area 411 as shown in FIG. 8(b).
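The tap-to-initial-area lookup of S1500 can be sketched as follows. This is a minimal illustration, assuming the division result is available as a per-pixel label map; the name `label_map` and the function `initial_area` are hypothetical and do not appear in the source.

```python
def initial_area(label_map, tap_x, tap_y):
    """Return the index of the small area that contains the tapped pixel.

    label_map[y][x] is assumed to hold the small-area index of pixel (x, y).
    """
    height = len(label_map)
    width = len(label_map[0])
    if not (0 <= tap_x < width and 0 <= tap_y < height):
        raise ValueError("tap is outside the image")
    return label_map[tap_y][tap_x]


# Toy 4x3 image divided into four small areas (indices 0 to 3).
label_map = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 3, 2, 1],
]

# The selected area starts as the single small area under the tap.
selected = {initial_area(label_map, tap_x=2, tap_y=1)}
```

Representing the selected area as a set of small-area indices keeps the later expansion and reduction steps down to adding or removing one index at a time.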

The area control unit 1600 expands or reduces the initial area 411 set by the initial area setting unit 1500 in response to predetermined operations, and thereby obtains the desired area (S1600). FIG. 5(b) shows the details of the processing by the area control unit 1600.

The area control unit 1600 displays the area selected at that time (the selected area) on the display unit 1400 (S1690). The present embodiment does not limit the display format of the selected area. The outline of the selected area may be displayed over the input image as illustrated in FIG. 9(a), or only the inside of the selected area may be displayed as illustrated in FIG. 9(b).

The area control unit 1600 acquires the operation that the user performs on the displayed selected area (S1610). The area control unit 1600 then examines the operation and decides the next process (S1615). When the operation is an area expansion operation (S1615: area expansion), the area control unit 1600 expands the current selected area (S1620). When the operation is an area reduction operation (S1615: area reduction), the area control unit 1600 reduces the current selected area (S1630). When the operation is the area selection end operation (S1615: end), the area control unit 1600 ends the area control process. This completes the image area selection process.
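The control flow of FIG. 5(b) can be pictured as a small dispatch loop. The sketch below is an illustration only: the operation names `"expand"`, `"reduce"`, and `"end"`, the function `run_area_control`, and the toy `grow` and `shrink` callbacks are all assumptions, standing in for S1620 and S1630.

```python
def run_area_control(selected, operations, expand, reduce, show=lambda area: None):
    """Display the selected area (S1690), read one operation (S1610), branch (S1615)."""
    pending = iter(operations)
    while True:
        show(selected)                      # S1690: display the current selected area
        operation = next(pending, "end")    # S1610: an exhausted input is treated as "end"
        if operation == "expand":
            selected = expand(selected)     # S1620
        elif operation == "reduce":
            selected = reduce(selected)     # S1630
        elif operation == "end":
            return selected                 # area selection end operation


# Toy stand-ins: grow adds the next index, shrink removes the largest one.
def grow(area):
    return area | {max(area) + 1}


def shrink(area):
    return area - {max(area)} if len(area) > 1 else area


result = run_area_control({0}, ["expand", "expand", "reduce", "end", "expand"], grow, shrink)
```

Operations that arrive after `"end"` are never read, which mirrors the statement that the end operation terminates the area control process.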

FIGS. 7(b) and 7(c) illustrate the area expansion operation and the area reduction operation. Here, the two operations are assigned to upward and downward slide operations on the display screen, but they are not limited to this assignment. For example, they may be performed by left and right slide operations. An increase in the pressure applied to the touch panel may be treated as an area expansion operation and a decrease as an area reduction operation. The operations may also be distinguished or switched by tapping a separate menu during a long tap. When a keyboard is also used, they may be distinguished or switched by combining a long tap with a special key such as the SHIFT key or the Ctrl key. When a mouse is used, the operations may be performed by moving the mouse vertically or horizontally, or controlled by rotating the scroll wheel. They may also be distinguished or switched by combining a mouse click with a special key such as the SHIFT key or the Ctrl key. The area selection end operation is, for example, lifting the finger from the touch panel or double-clicking with the mouse.

When expanding the selected area in S1620, the area control unit 1600 first adds to the selected area a small area in the same layer that is spatially adjacent to the selected area and belongs to the same category. When the current layer contains no adjacent small area of the same category, the area control unit 1600 moves up one layer and performs the same processing. After adding one small area to the selected area, the area control unit 1600 returns to the processing of S1690.

A specific example of how the area grows as the processing of S1620 is repeated is given below, taking the case where the area expansion operation is performed on the initial area 411 illustrated in FIG. 8(b).
The initial area 411 is contained in the area of the facial skin 154, which was determined to be the facial skin category in the area identification result 150 of the fifth layer in FIG. 6(c). Among the small areas adjacent to the initial area 411, those determined to be in the same facial skin category are the small areas 402 and 403 shown in FIG. 8(a). The area control unit 1600 first selects, from these, the small area whose features are closest to those of the initial area 411. Various features can be used for this selection, such as color histograms or texture features such as LBP, and the present embodiment does not limit the choice.

The area control unit 1600 merges the small area 402, whose features are closest to those of the initial area 411, with the initial area 411 and sets the result as the new selected area. FIG. 8(c) illustrates the updated selected area 412. When the area expansion operation continues, the area control unit 1600 merges the remaining small area 403 with the selected area 412 and updates the selected area again. FIG. 8(d) illustrates the updated selected area 410. At this point, every facial skin area in the fifth layer that can be connected to the initial area has been connected.
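One expansion step of S1620 can be sketched as follows. This is a hedged illustration, not the patented implementation: the dictionaries `adjacency`, `categories`, and `features`, the one-dimensional toy feature values, the Euclidean distance, and the use of the mean feature of the selected area as the reference are all assumptions. Layers are numbered as in the text, so moving up a layer means decreasing the layer number.

```python
import math


def mean_feature(areas, features):
    """Average the feature vectors of the given small areas (an assumed reference)."""
    dims = len(features[next(iter(areas))])
    return [sum(features[k][d] for k in areas) / len(areas) for d in range(dims)]


def expand_once(selected, layer, initial, adjacency, categories, features):
    """Add the most similar adjacent small area of the same category.

    If the current layer has no such area, move up one layer (layer - 1) and retry.
    Returns (new_selected, new_layer).
    """
    top = min(categories)
    while layer >= top:
        target = categories[layer][initial]
        candidates = {
            n for k in selected for n in adjacency[k]
            if n not in selected and categories[layer][n] == target
        }
        if candidates:
            ref = mean_feature(selected, features)
            best = min(sorted(candidates), key=lambda n: math.dist(ref, features[n]))
            return selected | {best}, layer
        layer -= 1
    return selected, top  # nothing left to add in any available layer


# Toy data loosely following FIG. 8: 401 to 403 are facial skin, 404 is an eye.
adjacency = {401: {402, 403, 404}, 402: {401, 403}, 403: {401, 402}, 404: {401}}
categories = {
    5: {401: "facial skin", 402: "facial skin", 403: "facial skin", 404: "eye"},
    4: {401: "face", 402: "face", 403: "face", 404: "face"},
}
features = {401: [0.80], 402: [0.75], 403: [0.60], 404: [0.10]}

selected, layer = {401}, 5
history = []
for _ in range(4):
    selected, layer = expand_once(selected, layer, 401, adjacency, categories, features)
    history.append((sorted(selected), layer))
```

In this toy run the eye area 404 is added only after the fifth layer runs out of adjacent facial skin areas and processing moves to the fourth layer, where 404 shares the face category.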

If the area expansion operation continues further, the area control unit 1600 moves the processing layer up by one, to the fourth layer in this example. The facial skin category area of the fifth layer belongs to the face category area in the fourth layer.
Within the face category area, the small areas 404, 405, and 406, which are the eyes and mouth shown in FIG. 10(a), are not yet included in the selected area at this point. In the fourth layer, however, they belong to the facial skin 142 of the face category area in FIG. 6(c), just as the selected area 410 does. The area control unit 1600 therefore proceeds as before and, among the small areas adjacent to the selected area 410, merges the one whose features are closest to those of the selected area 410. FIG. 10(b) illustrates the selected area 421 obtained by merging the small area 404 with the selected area 410. FIG. 10(c) illustrates the selected area 422 obtained by further merging the small area 405. FIG. 10(d) illustrates the selected area 420 obtained by further merging the small area 406.

In this way, small areas of the same category within the same layer are connected, and when no adjacent small area of the same category remains, processing moves up one layer and repeats. The area can thus be expanded according to the hierarchical categories. FIGS. 11(a) to 11(g) show the connection result in each layer when the area expansion operation is continued from the initial area 411. FIG. 11(b) illustrates the selected area 410 obtained in the fifth layer by connecting areas of the same category starting from the initial area 411 of FIG. 11(a). As the area expansion operation continues, the face category area illustrated in FIG. 11(c) is obtained as the selected area 420 in the fourth layer, and the head category area illustrated in FIG. 11(d) is obtained as the selected area 430 in the third layer.
In the second layer, expanding from the head category area yields the right person area 440 illustrated in FIG. 11(e) as an intermediate result of the person category area. With further expansion, the person category area that also includes the left person area, connected through the joined hands, is obtained as the selected area 450, as shown in FIG. 11(f). When connection continues in the first layer, the entire image is merged into a single area as the non-sky category area and set as the selected area 460, as shown in FIG. 11(g).

When reducing the selected area in S1630, the area control unit 1600 first takes the current selected area, excludes from it the category area that contains the initial area in the layer one below the current layer, and treats the remaining small areas as deletion candidates. Among the deletion candidates, the area control unit 1600 removes from the selected area the small area whose features differ most from those of the selected area, and updates the selected area. After deleting one small area from the selected area, the area control unit 1600 returns to the processing of S1690. A specific example of how the area shrinks as the processing of S1630 is repeated is given below.

For example, suppose the area reduction operation is performed on the selected area 420 illustrated in FIG. 10(d). The processing layer at this point is the fourth layer, and the category being processed is the facial skin 142 of the face category area shown in FIG. 6(c). In the layer one below, the fifth layer, the area that contains the initial area is the category area of the facial skin 154 illustrated in FIG. 6(c). This category area corresponds to the selected area 410 illustrated in FIG. 10(a), and the areas outside it are the small areas 404, 405, and 406 in FIG. 10(a). The area control unit 1600 treats these small areas 404, 405, and 406 as deletion candidates and deletes from the selected area the one whose features differ most from those of the selected area 420. As the parts outside the facial skin category area are deleted one by one, as in FIGS. 10(c) and 10(b), only the facial skin category area remains, as in the selected area 410. Once the selected area 410 consists only of the facial skin category area, the processing layer is lowered by one. Processing thus moves to the fifth layer, where the area control unit 1600 continues the same reduction, taking as deletion candidates those of the small areas 401, 402, and 403 shown in FIG. 8(a) that are not the initial area 411.
If the reduction is continued without stopping, the selection eventually returns to the state of the initial area 411 illustrated in FIG. 11(a).
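One reduction step of S1630 can be sketched in the same style. The sketch rests on assumptions not stated in the source: small areas are identified by integer indices, features are toy one-dimensional vectors compared by Euclidean distance against the mean feature of the selected area, and `finest` names the most detailed layer (the fifth layer in the text).

```python
import math


def mean_feature(areas, features):
    """Average the feature vectors of the given small areas (an assumed reference)."""
    dims = len(features[next(iter(areas))])
    return [sum(features[k][d] for k in areas) / len(areas) for d in range(dims)]


def reduce_once(selected, layer, initial, categories, features, finest):
    """Remove the deletion candidate that differs most from the selected area.

    Candidates are the selected small areas outside the category area that
    contains the initial area in the layer one below (layer + 1). When no
    candidate remains, the processing layer is lowered by one and the step
    is retried. Returns (new_selected, new_layer).
    """
    while True:
        if layer < finest:
            keep = categories[layer + 1][initial]
            candidates = {k for k in selected if categories[layer + 1][k] != keep}
        else:
            # In the finest layer only the initial area itself is protected.
            candidates = selected - {initial}
        if candidates:
            ref = mean_feature(selected, features)
            worst = max(sorted(candidates), key=lambda k: math.dist(ref, features[k]))
            return selected - {worst}, layer
        if layer == finest:
            return selected, layer  # already back at the initial area
        layer += 1


# Toy data: 401 to 403 are facial skin, 404 is an eye, 405 is the mouth.
categories = {
    5: {401: "facial skin", 402: "facial skin", 403: "facial skin",
        404: "eye", 405: "mouth"},
    4: {401: "face", 402: "face", 403: "face", 404: "face", 405: "face"},
}
features = {401: [0.80], 402: [0.75], 403: [0.60], 404: [0.10], 405: [0.30]}

selected, layer = {401, 402, 403, 404, 405}, 4
steps = []
for _ in range(5):
    selected, layer = reduce_once(selected, layer, 401, categories, features, finest=5)
    steps.append((sorted(selected), layer))
```

In this toy run the eye and mouth areas are removed first while the layer stays at four, and only then does processing drop to the fifth layer and peel away the remaining facial skin areas down to the initial area.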

The user performs area expansion and area reduction operations and, once the desired area is obtained, performs the area selection end operation to stop.
By controlling the expansion and reduction of the area according to hierarchical semantic categories in this way, the user can easily select an area that forms a meaningful unit. An area obtained in this way is useful for many kinds of image processing.

FIG. 12 is an explanatory diagram of image processing in the present embodiment. Suppose the user wants to crop the image 700 of FIG. 12 so that the person fills the frame. First, an area inside the person is set as the initial area, and area expansion and reduction operations are performed to select the person area 710 illustrated in FIG. 12(a). The circumscribed rectangle 720 of the selected area, illustrated in FIG. 12(b), is easy to calculate. Based on this rectangle, the image can be cropped so that the human body area 730 fills the frame, as illustrated in FIG. 12(c). Likewise, the zoom factor can easily be fitted to the screen size according to the circumscribed rectangle of the selected area. When the user can freely expand and reduce the area by semantic category as in the present embodiment, the cropping or zoom range can be calculated automatically for the area the user wants, which avoids cut-off subjects and wasted margins.
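The circumscribed rectangle and the zoom factor mentioned above can be computed as in the following sketch. It assumes the selected area is given as a binary mask; the function names and the choice of the largest zoom that keeps the whole rectangle on screen are illustrative assumptions.

```python
def bounding_rectangle(mask):
    """Return (left, top, right, bottom) of the nonzero pixels, inclusive."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        raise ValueError("selected area is empty")
    return min(cols), min(rows), max(cols), max(rows)


def zoom_factor(rect, screen_width, screen_height):
    """Largest uniform zoom at which the rectangle still fits on the screen."""
    left, top, right, bottom = rect
    width = right - left + 1
    height = bottom - top + 1
    return min(screen_width / width, screen_height / height)


# Toy mask of a selected area (1 = selected pixel).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
rect = bounding_rectangle(mask)
zoom = zoom_factor(rect, screen_width=30, screen_height=60)
```

Taking the minimum of the horizontal and vertical ratios keeps the aspect ratio and guarantees that no part of the selected area is cut off.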

(Second Embodiment)
In the first embodiment, the area designated by the user is expanded by gradually connecting adjacent areas. Because semantic area identification results are available, however, areas of the same category can also be selected together without being restricted to spatial adjacency in the image. The present embodiment enables area selection that includes such exclaves, that is, areas detached from the rest of the selection. The apparatus configuration of the present embodiment is the same as that of the first embodiment shown in FIG. 1, so its description is omitted. The learning process of the present embodiment is also the same as that of the first embodiment shown in FIG. 2, so its description is omitted.

The image area selection process of the present embodiment is broadly the same as that of the first embodiment shown in FIG. 5(a). The present embodiment differs from the first embodiment in the details of the area control process of S1600. In the present embodiment, the area control process of S1600 is performed through a combination of four operations: the area expansion and area reduction operations of the first embodiment, plus an exclave expansion operation and an exclave reduction operation.

FIG. 13 is a flowchart showing the area control process of S1600 in the present embodiment. The processing of S1690, S1610, S1615, S1620, and S1630 is the same as in the first embodiment shown in FIG. 5(b). In the present embodiment, in S1615, where the area control unit 1600 examines the user's operation and decides the next process, it identifies the exclave expansion and exclave reduction operations in addition to the area expansion and area reduction operations.

The area control unit 1600 performs the processing of S1620 when the user's operation is an area expansion operation, performs the processing of S1630 when it is an area reduction operation, and ends the area control process when it is the area selection end operation. The area control unit 1600 performs the exclave expansion process of S1640 when the user's operation is an exclave expansion operation, and the exclave reduction process of S1650 when it is an exclave reduction operation.

The area expansion and area reduction operations are the operations described with reference to FIGS. 7(b) and 7(c). FIG. 14 is an explanatory diagram of the exclave expansion and exclave reduction operations. FIG. 14(a) shows the exclave expansion operation. FIG. 14(b) shows the exclave reduction operation. Whereas the area expansion and area reduction operations are vertical slide operations (see FIGS. 7(b) and 7(c)), the exclave expansion and exclave reduction operations are horizontal slide operations. It is sufficient that the four operations can be distinguished by the up, down, left, and right slide directions, and the assignment is not limited to the one described above. As shown in FIG. 14(c), the exclave expansion and exclave reduction operations may also be performed by a tap or double tap with another finger on a separate, spatially distant area.
When a mouse is used, the area expansion, area reduction, exclave expansion, and exclave reduction operations may be performed by combining vertical and horizontal mouse movements, and any of them may be combined with rotation of the scroll wheel. Alternatively, these four operations may be performed in combination with a special key such as the Alt key or the Tab key.
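Distinguishing the four operations by slide direction can be sketched as below. The mapping of upward to expansion and rightward to exclave expansion, the operation names, and the `min_distance` threshold are assumptions made for illustration; the text only requires that the four directions be distinguishable. Screen coordinates are assumed to have y increasing downward.

```python
def classify_slide(dx, dy, min_distance=10.0):
    """Map a slide vector (in pixels) to one of the four operations.

    Returns None when the movement is too short to count as a slide.
    """
    if max(abs(dx), abs(dy)) < min_distance:
        return None
    if abs(dy) >= abs(dx):
        # Mostly vertical: upward (negative dy) is assumed to mean expansion.
        return "expand" if dy < 0 else "reduce"
    # Mostly horizontal: rightward is assumed to mean exclave expansion.
    return "exclave_expand" if dx > 0 else "exclave_reduce"


operations = [classify_slide(0, -30), classify_slide(0, 30),
              classify_slide(40, 5), classify_slide(-40, 5)]
```

Comparing the absolute horizontal and vertical displacement picks the dominant axis, so a slightly diagonal slide is still classified as one of the four directions.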

In the processing of S1640, the area control unit 1600 adds to the selected area a small area of the same category as the current selected area, regardless of spatial adjacency in the image. After adding one or more small areas, the area control unit 1600 returns to the processing of S1690. A specific example of the processing of S1640 is given below.

For example, when the exclave expansion operation is performed in the state of FIG. 10(c), the processing of S1640 takes place in the area identification result 140 of the fourth layer in FIG. 6(c), and the category being expanded is the face category. The only small area of the same category adjacent to the selected area 422 at that time is the small area 406 of FIG. 10(a). In the exclave expansion operation, however, every small area of the same category in the same layer is a connection candidate, regardless of whether it is adjacent to the selected area.

FIG. 15 is an explanatory diagram of the exclave expansion operation. In addition to the small area 406 of the right person, the small areas 407, 408, 409, 411, 412, and 413 that make up the face of the left person in FIG. 15(a) are also candidates for connection to the selected area 422. Among the small areas 407, 408, 409, 411, 412, and 413 that make up the face of the left person, the one whose features are closest to those of the selected area 422 is connected.

The area control unit 1600 adds the small area whose features are closest to those of the selected area 422 (here, the small area 407) to the selected area 422, and sets the result, which now includes an exclave, as the new selected area. FIG. 15(b) illustrates the updated selected area 471. When the small area whose features are closest to those of the selected area 471 is the small area 408, the next selected area is the selected area 472 illustrated in FIG. 15(c). As the remaining small areas 406, 409, 411, 412, and 413 are added in turn, the selected area in this layer finally becomes the selected area 470 illustrated in FIG. 15(d).
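The exclave expansion of S1640 differs from ordinary expansion only in how candidates are collected, as the following sketch illustrates. The data layout, the toy feature values, and the use of Euclidean distance to the mean feature of the selected area are assumptions; the small area 500 is a hypothetical hair area added to show that other categories are never picked.

```python
import math


def mean_feature(areas, features):
    """Average the feature vectors of the given small areas (an assumed reference)."""
    dims = len(features[next(iter(areas))])
    return [sum(features[k][d] for k in areas) / len(areas) for d in range(dims)]


def exclave_expand_once(selected, layer, initial, categories, features):
    """Add the most similar small area of the same category, ignoring adjacency."""
    target = categories[layer][initial]
    candidates = {
        k for k, category in categories[layer].items()
        if category == target and k not in selected
    }
    if not candidates:
        return selected
    ref = mean_feature(selected, features)
    best = min(sorted(candidates), key=lambda k: math.dist(ref, features[k]))
    return selected | {best}


# Toy fourth layer: 401, 402, 406 belong to the right face, 407, 408 to the left face.
categories = {4: {401: "face", 402: "face", 406: "face",
                  407: "face", 408: "face", 500: "hair"}}
features = {401: [0.80], 402: [0.70], 406: [0.20],
            407: [0.78], 408: [0.50], 500: [0.75]}

selected = {401, 402}
order = []
for _ in range(4):
    grown = exclave_expand_once(selected, 4, 401, categories, features)
    order.extend(sorted(grown - selected))
    selected = grown
```

No adjacency table is consulted, so a small area on the other person's face can be added before an adjacent small area when its features are closer.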

When an area expansion operation is performed in the state of FIG. 15(d), processing moves up one layer, here to the third layer, and expansion continues. In this case, the area control unit 1600 expands the head category area into the areas adjacent to the right person and to the left person respectively. As a result, the selected area 480 illustrated in FIG. 15(e) is finally obtained in this layer. When area expansion continues further, the selected area 450 illustrated in FIG. 11(f) is obtained in the second layer.

In the processing of S1650, the area control unit 1600 preferentially deletes small areas from the parts of the current selected area that are not spatially connected to the initial area in the current layer. After deleting one or more small areas from the selected area, the area control unit 1600 returns to the processing of S1690. A specific example of the processing of S1650 is given below.

The following describes the case where the exclave reduction process is performed on the selected area 470 illustrated in FIG. 15(d). When the initial area lies in the area of the right person, the area control unit 1600 removes, from among the small areas in the face area of the left person, the one whose features differ most from those of the selected area 470 as a whole. Once all the small areas of the left person's face have been removed from the selected area in this way, the selected area 420 illustrated in FIG. 11(c) remains. When the exclave reduction operation continues further, reduction of the right person's face area proceeds in the same way as the area reduction process.
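The exclave reduction of S1650 needs to know which selected small areas are still spatially connected to the initial area. The sketch below finds them with a breadth-first search over an assumed adjacency table and removes the most dissimilar detached small area; the data layout, distance measure, and function names are illustrative assumptions, and the fallback to ordinary reduction is left out.

```python
import math
from collections import deque


def mean_feature(areas, features):
    """Average the feature vectors of the given small areas (an assumed reference)."""
    dims = len(features[next(iter(areas))])
    return [sum(features[k][d] for k in areas) / len(areas) for d in range(dims)]


def connected_to(initial, selected, adjacency):
    """Selected small areas reachable from the initial area through selected neighbors."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        k = queue.popleft()
        for n in adjacency[k]:
            if n in selected and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen


def exclave_reduce_once(selected, initial, adjacency, features):
    """Remove the detached small area that differs most from the selected area."""
    detached = selected - connected_to(initial, selected, adjacency)
    if not detached:
        return selected  # no exclave left; ordinary reduction would take over here
    ref = mean_feature(selected, features)
    worst = max(sorted(detached), key=lambda k: math.dist(ref, features[k]))
    return selected - {worst}


# Toy data: 401, 402 form the right face, 407, 408 the detached left face.
adjacency = {401: {402}, 402: {401}, 407: {408}, 408: {407}}
features = {401: [0.80], 402: [0.70], 407: [0.78], 408: [0.50]}

selected = {401, 402, 407, 408}
removed = []
for _ in range(3):
    shrunk = exclave_reduce_once(selected, 401, adjacency, features)
    removed.extend(sorted(selected - shrunk))
    selected = shrunk
```

After both detached small areas are gone, a further call leaves the selection unchanged, which is the point at which the text hands over to the ordinary area reduction process.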

By combining the exclave expansion and exclave reduction processes with the area expansion and area reduction processes as described above, the user can easily select the area that suits the purpose. For example, to select the entire body of the right person in the input image 100 illustrated in FIG. 6(a), the user selects an area inside the right person as the initial area. The information processing apparatus continues the area expansion process starting from this initial area. When the right person area 440 illustrated in FIG. 11(e) has been obtained, the user performs the area selection end operation and the information processing apparatus ends the area selection process.

To select only the faces of both the left and right persons, the user taps inside the face area of either person to select it as the initial area. When that person's face area has been obtained, the user performs the exclave expansion operation, and the information processing apparatus obtains a selected area that also includes the other person's face area. The same procedure can be used to select the areas of two or more persons. It can also be used for areas other than persons, for example to select several automobile areas at once.

(Third Embodiment)
The small areas that serve as the processing unit for area control in the first and second embodiments are not always divided at the desired contour position. For example, when black hair appears against a dark night scene, the hair and the background may not be separated and may end up in a single small area. In the present embodiment, the contour is corrected in such cases to obtain appropriate small areas. The apparatus configuration of the present embodiment is the same as that of the first embodiment shown in FIG. 1, so its description is omitted. The learning process of the present embodiment is also the same as that of the first embodiment shown in FIG. 2, so its description is omitted.

The image area selection process of the present embodiment is broadly the same as that of the first embodiment shown in FIG. 5(a). The present embodiment differs from the first embodiment in the details of the area control process of S1600. In the present embodiment, the area control process of S1600 is performed through a combination of six operations: the area expansion, area reduction, exclave expansion, and exclave reduction operations of the second embodiment, plus a contour correction operation and an area addition operation.

FIG. 16 is a flowchart showing the area control process of S1600 in the present embodiment. The processing of S1690, S1610, S1615, S1620, S1630, S1640, and S1650 is the same as in the second embodiment shown in FIG. 13. In the present embodiment, in S1615, where the area control unit 1600 examines the user's operation and decides the next process, it identifies the contour correction and area addition operations in addition to the area expansion, area reduction, exclave expansion, and exclave reduction operations.

The area control unit 1600 performs the processing of S1620 when the user's operation is an area expansion operation, performs the processing of S1630 when it is an area reduction operation, and ends the area control process when it is the area selection end operation. The area control unit 1600 performs the processing of S1640 when the user's operation is an exclave expansion operation, and the processing of S1650 when it is an exclave reduction operation. The area control unit 1600 performs the processing of S1660 when the user's operation is a contour correction operation, and the processing of S1670 when it is an area addition operation.

The area expansion, area reduction, exclave expansion, and exclave reduction operations are those described with reference to FIGS. 7 and 14. FIG. 17 is an explanatory diagram of the contour correction operation. FIG. 18 is an explanatory diagram of the area addition operation.

FIG. 17(a) shows an area 850 obtained by performing the area expansion and area reduction operations on the displayed input image 800. The area the user actually wants is represented by the contour 810, while the area actually obtained is represented by the contour 820. The area between the contour 810 and the contour 820 is therefore an unnecessary area.

In FIG. 17(b), as the contour correction operation, the user flicks the unnecessary area of the displayed input image 800 with a finger other than the one used for area selection. A flick is used as the contour correction operation here, but a tap or a double tap on the unnecessary area may be used instead. The information processing device stores the position in the unnecessary area that the user specified during the contour correction operation as the unnecessary area position.

In the process of S1660, the area control unit 1600 again divides the image near the unnecessary area position into small areas. FIG. 17(c) illustrates a small area 830 that contains the selected unnecessary area position. Within the small area 830, the area control unit 1600 resets the area division parameters so that the division becomes finer, and subdivides the small area 830 into small areas 831 and 832, as illustrated in FIG. 17(d). The area control unit 1600 flags the small area 832, which results from the subdivision and contains the unnecessary area position, as belonging to no category, and excludes it from the selected area so that it is no longer subject to the area expansion and area reduction processes. After updating the selected area, the area control unit 1600 returns to the process of S1690.
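The effect of S1660 on the selected area can be sketched as follows. This is an illustration under stated assumptions, not the disclosed segmentation: the description does not say which division parameters are reset, so the sketch simply splits the small area at its mean intensity as a stand-in for a finer re-segmentation. The names `subdivide` and `correct_contour` and the pixel-dictionary data layout are hypothetical.

```python
def subdivide(region_pixels):
    """Split a non-empty small area into at most two parts at its mean intensity.

    region_pixels maps (row, col) -> intensity. The mean-intensity split is an
    assumed stand-in for re-running the segmentation with finer parameters.
    """
    mean = sum(region_pixels.values()) / len(region_pixels)
    low = {p for p, v in region_pixels.items() if v < mean}
    high = set(region_pixels) - low
    return [part for part in (low, high) if part]


def correct_contour(selected, region_pixels, tap):
    """Remove from the selection the sub-area that contains the tapped position.

    Returns (new_selected, excluded). The caller can record `excluded` as
    belonging to no category so later expansion/reduction steps skip it.
    """
    parts = subdivide(region_pixels)
    excluded = next((part for part in parts if tap in part), set())
    return selected - excluded, excluded
```

With a small area whose pixels fall into a dark group and a bright group, tapping a bright pixel removes only the bright sub-area from the selection and leaves the rest untouched.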

The area addition operation will now be described. FIG. 18(a) shows an area 950 obtained by performing the area expansion and area reduction operations on the input image 900. The area the user actually wants is represented by the contour 910, so a missing area 920 is left outside the selected area. In FIG. 18(b), as the area addition operation, the user long-taps the missing area with a finger other than the one used for area selection. The information processing device stores the position the user specified during the area addition operation as the additional area position.

In the process of S1670, the area control unit 1600 replaces the category determination result of the small area at the specified additional area position so that it matches the category of the selected area, and adds that small area to the selected area. After updating the selected area, the area control unit 1600 returns to the process of S1690.
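The relabeling in S1670 amounts to overwriting one small area's category and adding that small area to the selection. A minimal sketch follows, assuming small areas are identified by integer ids and categories by strings; the function name `add_area` and this data layout are hypothetical.

```python
def add_area(selected_regions, categories, tapped_region, selected_category):
    """Relabel the tapped small area to the selection's category and add it.

    selected_regions: set of small-area ids currently selected.
    categories: dict mapping small-area id -> category determination result.
    Returns new objects; the inputs are left unmodified.
    """
    new_categories = dict(categories)
    new_categories[tapped_region] = selected_category
    return selected_regions | {tapped_region}, new_categories
```

Returning new objects rather than mutating the inputs keeps the previous selection available, which may be convenient if the user later reverses the operation.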

Through this processing, the user can easily correct the contour by subdividing a small area and can easily add a missing area. The information processing device can therefore select a more accurate area with simple operations even when the small-area division or the area recognition is inappropriate.

(Fourth Embodiment)
In the first to third embodiments, only one category is obtained as the area identification result for each small area of the image, but multiple categories may be obtained to account for category ambiguity. This embodiment describes an image area selection method that allows multiple categories to be output. Category ambiguity refers to a state in which the category of a given area in the image cannot be determined uniquely. For example, a log house built from exposed tree trunks may reasonably be judged either as a tree, which is a natural object, or as a building, which is an artificial object. When a training image contains such an area, multiple overlapping labels are assigned to it as area category labels.

In this embodiment, in the learning process, the area classifier generation unit 2300 trains the area classifier using the following formula.

Here, C_(r, l) is the set of category label indexes assigned, with overlap allowed, to the small area r in the l-th layer. Otherwise, the same learning process as in the first embodiment is performed.

In the image area selection process, the area identification unit 1300 obtains the area identification result of each small area k using the following formula.

Here, C_(k, l) is the set of category indexes that form the identification result for the small area k in the l-th layer. θ is a predetermined threshold, set for example to θ = 0.5. As a result, multiple class labels may be obtained as the identification result for some areas of the input image.
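A thresholded multi-label decision of this kind can be sketched as below. Because the formula itself is not reproduced in this text, the details are assumptions: the sketch takes one classifier score per category for a small area in a given layer and keeps every category whose score is at least θ. The function name `identify_categories` and the use of "greater than or equal" rather than a strict comparison are hypothetical choices.

```python
def identify_categories(scores, theta=0.5):
    """Return the set of category indexes whose score reaches the threshold.

    scores: per-category classifier outputs for one small area in one layer
    (assumed to be independent scores in [0, 1], not a normalized distribution).
    """
    return {index for index, score in enumerate(scores) if score >= theta}
```

With θ = 0.5, scores of 0.9 and 0.6 for two categories both pass, so the small area receives two labels, which corresponds to the log house example of an area that is both "tree" and "building".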

The area expansion, area reduction, exclave expansion, exclave reduction, contour correction, and area addition processes described in the first to third embodiments depend only on the initial area set by the user in the initial area setting process of S1500 performed by the initial area setting unit 1500. Therefore, even when multiple labels are assigned to a small area as described above, performing the image area selection process in the same way provides the effects described in the first to third embodiments.

(Fifth Embodiment)
In the first to fourth embodiments, area expansion and area reduction are performed by adding small areas to, and deleting them from, the selected area. Area expansion and area reduction may also be performed in units of pixels rather than small areas. In this case, in the process of S1620, the area control unit 1600 adds, from among the pixels of the same category adjacent to the selected area, the pixel most similar to the selected area. The similarity used here may be, for example, the likelihood of the pixel's color luminance value when the color distribution within the selected area is modeled as a Gaussian mixture. Likewise, in the process of S1630, the area control unit 1600 deletes from the selected area the outermost pixel of the selected area that is least similar to it.
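The pixel-wise variant can be sketched as follows, under simplifying assumptions: pixel values are scalar intensities, the Gaussian mixture parameters for the selected area are already fitted and passed in, and the caller supplies the candidate pixels (adjacent same-category pixels for expansion, outermost selected pixels for reduction). All function names and the data layout are hypothetical.

```python
import math


def gmm_likelihood(value, components):
    """Likelihood of a scalar value under a 1-D Gaussian mixture.

    components: list of (weight, mean, variance) tuples, assumed pre-fitted
    to the values inside the selected area.
    """
    return sum(
        w * math.exp(-((value - m) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        for w, m, var in components
    )


def expand_by_one_pixel(selected, candidates, intensity, components):
    """Add the candidate pixel with the highest likelihood (cf. S1620)."""
    if not candidates:
        return set(selected)
    best = max(candidates, key=lambda p: gmm_likelihood(intensity[p], components))
    return set(selected) | {best}


def reduce_by_one_pixel(selected, boundary, intensity, components):
    """Delete the boundary pixel with the lowest likelihood (cf. S1630)."""
    if not boundary:
        return set(selected)
    worst = min(boundary, key=lambda p: gmm_likelihood(intensity[p], components))
    return set(selected) - {worst}
```

Repeating either function once per user operation step grows or shrinks the selection one pixel at a time; in practice the mixture would likely be refitted as the selected area changes, which this sketch leaves out.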

In each of the embodiments above, the area is set based on semantic categories obtained by learning from prior knowledge, so a more accurate area can be selected than with bottom-up region growing. In addition, semantic areas at various levels can be selected according to the superordinate and subordinate concepts of the semantic categories. This lets the user select an area in an image easily, and makes it easy to carry out tasks through an area-based user interface that were cumbersome with conventional point-based specification.

The present invention can also be realized by supplying a program that implements one or more functions of the embodiments described above to a system or device via a network or a storage medium, and having one or more processors in a computer of that system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Claims

An information processing device comprising:
image acquisition means for acquiring an image;
area identification means for hierarchically identifying areas in the acquired image into a plurality of categories;
display means for displaying the image;
initial area setting means for setting, in response to a user's operation on the image displayed on the display means, an area at a predetermined position as an initial area of a selected area; and
area control means for expanding and reducing the selected area based on the hierarchical category determination result in response to a predetermined operation by the user, and updating the selected area.

The area control means expands the selected area by joining to it an area of the same category that is adjacent to the selected area in the same layer, in response to a predetermined first operation by the user, and
reduces the selected area by deleting from it an area that does not include the initial area, in response to a predetermined second operation by the user.
The information processing device according to claim 1.

The area control means is characterized in that the first operation and the second operation are distinguished by the direction of a slide operation by a touch panel or a mouse.
The information processing device according to claim 2.

The area control means is characterized in that the first operation and the second operation are distinguished by the pressure on the touch panel.
The information processing device according to claim 2.

The area control means is characterized in that the first operation and the second operation are distinguished by a combination of a click by a mouse and a special key.
The information processing device according to claim 2.

The area control means is characterized in that the first operation and the second operation are distinguished by switching a click operation with a mouse with a special key.
The information processing device according to claim 2.

The area control means is characterized in that the first operation and the second operation are distinguished or switched by a combination of a long tap on a touch panel and a special key.
The information processing device according to claim 2.

The area control means is characterized in that the first operation and the second operation are distinguished or switched by tapping the touch panel separately from the long tap and the menu.
The information processing device according to claim 2.

The area control means is characterized in that the first operation and the second operation are distinguished according to a rotation operation of a scroll wheel in a mouse.
The information processing device according to claim 2.

The area control means expands the selected area by joining to it areas in the same layer that belong to the same category as the selected area, including areas not connected to the initial area, in response to a predetermined third operation by the user, and
reduces the selected area by preferentially deleting from it areas that are not connected to the initial area, in response to a predetermined fourth operation by the user.
The information processing device according to any one of claims 1 to 9.

The area control means is characterized in that the third operation and the fourth operation are recognized by a slide operation using a touch panel or a mouse.
The information processing device according to claim 10.

The area control means recognizes the third operation and the fourth operation according to the rotation operation of the scroll wheel in the mouse.
The information processing device according to claim 10.

The area control means recognizes the third operation and the fourth operation by a tap operation on another area away from the selected area.
The information processing device according to claim 10.

The area control means is characterized in that the third operation and the fourth operation are distinguished from the first operation and the second operation by a special key.
The information processing device according to claim 10.

The area control means
corrects the contour of the selected area by designating an unnecessary area within the selected area and dividing the unnecessary area into areas, in response to a predetermined fifth operation by the user, and
adds a missing area to the selected area by designating the missing area outside the selected area and making the category of the missing area match the category of the selected area, in response to a predetermined sixth operation by the user.
The information processing device according to any one of claims 1 to 14.

The area control means recognizes the fifth operation by tapping or flicking the unnecessary area displayed on the touch panel.
The information processing device according to claim 15.

The area control means is characterized in that the fifth operation and the sixth operation are recognized by an operation of clicking the unnecessary area with a mouse.
The information processing device according to claim 15.

The area control means recognizes the sixth operation by a tap on the missing area displayed on the touch panel.
The information processing device according to any one of claims 15 to 17.

An image area selection method in which an information processing device performs:
a step of acquiring an image;
a step of hierarchically identifying areas in the acquired image into a plurality of categories;
a step of displaying the image;
a step of setting, in response to a user's operation on the displayed image, an area at a predetermined position as an initial area of a selected area; and
a step of expanding and reducing the selected area based on the hierarchical category determination result in response to a predetermined operation by the user, and updating the selected area.

A computer program for causing a computer to function as:
image acquisition means for acquiring an image;
area identification means for hierarchically identifying areas in the acquired image into a plurality of categories;
display means for displaying the image;
initial area setting means for setting, in response to a user's operation on the image displayed on the display means, an area at a predetermined position as an initial area of a selected area; and
area control means for expanding and reducing the selected area based on the hierarchical category determination result in response to a predetermined operation by the user, and updating the selected area.

A computer-readable storage medium that stores the computer program of claim 20.