JP7172066B2

JP7172066B2 - Information processing device and program

Info

Publication number: JP7172066B2
Application number: JP2018042418A
Authority: JP
Inventors: 和敏池田
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2018-03-08
Filing date: 2018-03-08
Publication date: 2022-11-16
Anticipated expiration: 2038-03-08
Also published as: JP2019159503A

Description

The present invention relates to an information processing apparatus and a program.

Techniques for recognizing the image of an object contained in a captured image by image analysis are currently being studied. For example, there is a technique in which, when a user designates part of a color image containing distance information as foreground or background, the region that was not designated is recognized by analyzing the image.

JP 2010-039999 A

However, a method in which the user designates a region for each image is difficult to apply when the number of images to be processed is large or when real-time processing is required.

The present invention makes it possible to specify a region to be processed without presupposing that the region is designated manually.

The invention according to claim 1 is an information processing apparatus comprising: extraction means for extracting a range in which the distribution of depths corresponding to the pixels constituting a captured image appears in a concentrated manner; and setting means for setting at least some of the pixels belonging to the extracted range as a starting point for specifying a region to be processed.
The invention according to claim 2 is the information processing apparatus according to claim 1, wherein the extraction means extracts, as the range, a distribution of depths satisfying a predetermined condition.
The invention according to claim 3 is the information processing apparatus according to claim 2, wherein the condition is that the shape of the distribution resembles the shape of a predetermined function.
The invention according to claim 4 is the information processing apparatus according to claim 3, wherein the function is a unimodal function.
The invention according to claim 5 is the information processing apparatus according to claim 2, wherein the condition is that a frequency exceeding a predetermined threshold appears.
The invention according to claim 6 is the information processing apparatus according to claim 1, wherein the setting means sets, as the starting point, pixels at a depth where the appearance of pixels is concentrated within the extracted range.
The invention according to claim 7 is the information processing apparatus according to claim 6, wherein the at least some pixels correspond to a depth at which the number of pixels appearing in the range reaches a local maximum.
The invention according to claim 8 is the information processing apparatus according to claim 6, wherein the at least some pixels correspond to depths included in a range of a constant multiple of the standard deviation around the median of the range.
The invention according to claim 9 is the information processing apparatus according to claim 1, wherein the image has pixels with which a depth distribution is associated.
The invention according to claim 10 is the information processing apparatus according to claim 9, wherein the depth distribution corresponding to each pixel is given as the parallax between a plurality of imaging means.
The invention according to claim 11 is the information processing apparatus according to claim 9, wherein the depth distribution corresponding to each pixel is given as a value measured when the image is captured.
The invention according to claim 12 is the information processing apparatus according to claim 1, wherein the image is divided into a plurality of regions using the values of the pixels set as the starting point.
The invention according to claim 13 is the information processing apparatus according to claim 1, wherein the region to be processed is selectively specified based on a predetermined rule.
The invention according to claim 14 is the information processing apparatus according to claim 13, wherein some of the regions specified through the starting point are selected as the processing target according to the predetermined rule.
The invention according to claim 15 is the information processing apparatus according to claim 14, wherein the predetermined rule is to give priority to a region near the center of the image.
The invention according to claim 16 is the information processing apparatus according to claim 14, wherein the predetermined rule is to give priority to a region on which the user's line of sight is concentrated.
The invention according to claim 17 is the information processing apparatus according to claim 14, wherein the selection based on the rule is performed when the difference in depth among the plurality of regions specified through the starting point is within a predetermined range.
The invention according to claim 18 is a program that causes a computer to function as: extraction means for extracting a range in which the distribution of depths corresponding to the pixels constituting a captured image appears in a concentrated manner; and setting means for setting at least some of the pixels belonging to the extracted range as a starting point for specifying a region to be processed.

According to the invention of claim 1, the region to be processed can be specified without presupposing that the region is designated manually.
According to the invention of claim 2, the accuracy of estimating candidates for the region to be processed can be improved.
According to the invention of claim 3, the accuracy of estimating candidates for the region to be processed can be improved.
According to the invention of claim 4, the accuracy of estimating candidates for the region to be processed can be improved.
According to the invention of claim 5, the accuracy of estimating candidates for the region to be processed can be improved.
According to the invention of claim 6, the accuracy of estimating candidates for the region to be processed can be improved.
According to the invention of claim 7, the accuracy of estimating candidates for the region to be processed can be improved.
According to the invention of claim 8, the accuracy of estimating candidates for the region to be processed can be improved.
According to the invention of claim 9, cases where the number of images is large and cases where real-time processing is required can both be handled.
According to the invention of claim 10, cases where the number of images is large and cases where real-time processing is required can both be handled.
According to the invention of claim 11, cases where the number of images is large and cases where real-time processing is required can both be handled.
According to the invention of claim 12, candidates for the region to be processed can be estimated by calculation.
According to the invention of claim 13, candidates for the region to be processed can be estimated efficiently even when the region cannot be designated manually.
According to the invention of claim 14, if a rule is defined in advance, the region to be processed can be selected without any special operation.
According to the invention of claim 15, if a rule is defined in advance, the region to be processed can be selected without any special operation.
According to the invention of claim 16, if a rule is defined in advance, the region to be processed can be selected without any special operation.
According to the invention of claim 17, if a rule is defined in advance, the region to be processed can be selected without any special operation.
According to the invention of claim 18, the region to be processed can be specified without presupposing that the region is designated manually.

FIG. 1 is a diagram illustrating an example of a mode of use in Embodiment 1.
FIG. 2 is a diagram illustrating an example of the hardware configuration of the information processing device in Embodiment 1.
FIG. 3 is a diagram illustrating an example of the functional configuration of the information processing device in Embodiment 1.
FIG. 4 is a diagram illustrating the relationship between a subject and depth information. (A) shows the relationship between each part of a wine glass and depth, and (B) shows a histogram for the wine glass.
FIG. 5 is a diagram illustrating an example of a histogram generated for one image.
FIG. 6 is a diagram illustrating the waveform of a probability density function and the histogram after clustering. (A) shows the waveform of the probability density function, and (B) shows the histogram after clustering.
FIG. 7 is a diagram showing an example of a method of selecting the pixels to be set as representative pixels (seeds). (A) shows an example of the histogram after clustering, and (B) shows examples of the methods used for selection.
FIG. 8 is a diagram showing another example of a method of selecting the pixels to be set as representative pixels (seeds).
FIG. 9 is a diagram illustrating the relationship between clusters and representative pixels (seeds). (A) shows the distribution of clusters, and (B) shows an example of mapping the representative pixels (seeds) onto an image.
FIG. 10 is a diagram illustrating the states before and after processing by the region dividing unit. (A) shows the state immediately after the representative pixels (seeds) are mapped, and (B) shows the state when the division into regions is complete.
FIG. 11 is a diagram illustrating an example of a specified region.
FIG. 12 is a diagram illustrating how image processing changes the color of the wine in the wine glass near the center of the screen. (A) shows an example of the image before the change, and (B) shows an example of the image after the change.
FIG. 13 is a flowchart illustrating the processing operations executed by the information processing device.
FIG. 14 is a diagram illustrating an example of the hardware configuration of the information processing device in Embodiment 2.
FIG. 15 is a diagram illustrating an example of the functional configuration of the information processing device in Embodiment 2.
FIG. 16 is a diagram illustrating the functional configuration executed by the depth information acquisition unit.
FIG. 17 is a diagram illustrating an example of a mode of use in Embodiment 3.
FIG. 18 is a diagram illustrating an example of the hardware configuration of the information processing device in Embodiment 3.
FIG. 19 is a diagram showing an example of the structure of the filter mounted on the lens aperture of a camera with a filter.
FIG. 20 is a diagram illustrating an example of the functional configuration of the information processing device in Embodiment 3.
FIG. 21 is a diagram illustrating an example of a mode of use in Embodiment 4.
FIG. 22 is a diagram illustrating the basic configuration of the information processing device.
FIG. 23 is a diagram illustrating an example of the functional configuration of the information processing device in Embodiment 4.
FIG. 24 is a diagram illustrating another example of the functional configuration of the information processing device in Embodiment 4.

Embodiments of the present invention will be described below with reference to the drawings.
<Embodiment 1>
<Mode of use>
FIG. 1 is a diagram illustrating an example of a mode of use in Embodiment 1.
FIG. 1 shows a scene in which objects that exist in real space (a table 10, wine glasses 11 and 12, a plate 13, and a trash can 14) are imaged by an information processing device 20 with a camera function. There is a wall behind the table 10, and the table 10 stands on the floor.
In FIG. 1, the information processing device 20 is a so-called smartphone. However, the information processing device 20 may also be a so-called digital camera or video camera.

In this embodiment, the information processing device 20 images objects that exist in real space in the form of still images or moving images. The pixels of a still image or the like obtained by imaging real space contain no information indicating that they belong to a specific region. In other words, the still images and the like handled by the information processing device 20 contain no object information.
Here, an object is a unit of handling defined in a virtual space constructed by a computer, and is given as a set of pixels, a set of polygons, or a set of voxels.

<Device configuration>
FIG. 2 is a diagram illustrating an example of the hardware configuration of the information processing device 20 in Embodiment 1.
The information processing device 20 includes a processing circuit unit 200 that processes data, a nonvolatile memory 211 that stores programs and data, an audio circuit 212 that inputs and outputs audio signals, a liquid crystal display (LCD) 213 that displays images, a power control device 214, a camera 215 that captures images of objects that exist in real space, a distance sensor 216 that measures the distance to an object in front of the camera 215, a touch panel 217 that detects user operations, a WiFi (Wireless Fidelity) module 218 that transmits and receives wireless signals conforming to the WiFi (registered trademark) standard, and a Bluetooth module 219 that transmits and receives wireless signals conforming to the Bluetooth (registered trademark) standard, which is one of the short-range wireless communication standards.

In this embodiment, an optical sensor or an ultrasonic sensor, for example, is used as the distance sensor 216. One example of an optical sensor is an infrared sensor composed of an infrared LED (Light Emitting Diode) that emits infrared light and a phototransistor that responds to infrared light. The distance sensor 216 may also be a TOF (Time Of Flight) sensor, which obtains the distance to an object by measuring, for each pixel, the flight time (time difference) of light from its emission by a light source until it is reflected by the object and reaches the sensor.
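The TOF principle mentioned above can be illustrated with a minimal sketch. It assumes the sensor reports a round-trip flight time in seconds; the function name `tof_distance_m` is hypothetical and is not taken from this specification.

```python
# Speed of light in vacuum in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_time_s: float) -> float:
    """Convert a round-trip flight time into a one-way distance in metres.

    The light travels to the object and back, so the one-way distance is
    half of the total path length.
    """
    if round_trip_time_s < 0:
        raise ValueError("flight time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

Applying such a conversion to the flight time measured at every pixel would yield one depth value per pixel, which is the form of depth information the later processing relies on.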

The processing circuit unit 200 in this embodiment includes a memory control unit 201 that controls the reading and writing of data to and from the memory 211, a plurality of CPU cores 202 that execute processing, a power management unit 203 that manages the supply of power, a system management unit 204 that manages the operation of the embedded system, an audio processing unit 205 that processes audio signals, a GPU (Graphics Processing Unit) 206 that processes images in real time, a display control unit 207 that displays images on the liquid crystal display 213, an external interface (I/F) 208 used for connection to external modules, and a baseband circuit 209 that processes baseband signals.

<Functional configuration>
FIG. 3 is a diagram illustrating an example of the functional configuration of the information processing device 20 in Embodiment 1.
The functional configuration shown in FIG. 3 is realized through the execution of a program by the CPU cores 202.
From a functional viewpoint, the information processing device 20 includes an image information acquisition unit 221 that acquires image data from the camera 215, a depth information acquisition unit 222 that acquires, from the distance sensor 216, depth information indicating the distance in the depth direction between the distance sensor and the subject, a depth histogram generation unit 223 that counts the number of pixels (frequency) appearing at each depth and generates a histogram, a depth clustering execution unit 224 that extracts sets of pixels with similar depths in the depth direction as clusters, a seed coordinate setting unit 225 that sets the coordinates of pixels belonging to a cluster as seed coordinates giving the representative pixels (seeds) in the screen, a region dividing unit 226 that divides the image into a plurality of regions by repeatedly merging surrounding pixels with similar pixel values, starting from the pixels at the seed coordinates, a target region specifying unit 227 that specifies which of the divided regions is to be processed, and an image processing unit 228 that applies predetermined processing to the image of the specified region.

Here, the image information acquisition unit 221 acquires, for image data corresponding to a still image or a moving image, the pixel value (for example, the RGB value) of each pixel constituting one screen. The image data may be given as either a color image or a grayscale image. Depth information is associated with the image data here.
The depth information acquisition unit 222 acquires depth information from the distance sensor 216 at the same time as the image is captured.

FIG. 4 is a diagram illustrating the relationship between a subject and depth information. (A) shows the relationship between each part of the wine glass 11 and depth, and (B) shows a histogram for the wine glass 11.
In the case of FIG. 4, the depth of the stem of the wine glass 11 is 105 centimeters, the depth of the near side of the bowl that holds the wine is 100 centimeters, and the depth of the far side of the mouth (lip or rim) is 110 centimeters. In other words, the bowl of the wine glass 11 shown in FIG. 4 has a paraboloid with a diameter of 10 centimeters. In this example, the depths are concentrated around 105 centimeters, which is the distance to the stem.
In real space, other objects exist around the wine glass 11 (see FIG. 1). The pixels surrounding the wine glass 11 are therefore assigned the depths of those other objects.

Because the surfaces that make up one object are continuous, the depths of the pixels corresponding to the object appear spread around a specific depth.
However, since the shape of an object serving as the subject is arbitrary (it is not necessarily rotationally symmetric like a wine glass, and its surface may be uneven or irregular), the depth distribution does not necessarily follow an ideal probability density distribution. Even so, the depth distribution corresponding to one object basically tends to appear as a mountain-shaped peak.
Returning to the description of FIG. 3.

The depth histogram generation unit 223 generates a histogram based on the acquired depth information. The histogram in this embodiment represents the distribution of the depth values acquired for the pixels constituting one image.
FIG. 5 is a diagram illustrating an example of a histogram generated for one image. In the figure, the horizontal axis is depth and the vertical axis is frequency. The unit of the horizontal axis is centimeters.
In FIG. 5, the mountain-shaped waveforms are numbered in the order in which they appear.
The histogram generated by the depth histogram generation unit 223 also contains noise components. In the example of FIG. 5, the sixth waveform corresponds to a noise component.
Note that the depths of pixels corresponding to the ground or a road extending from near the imaging point into the distance tend to form a flat shape rather than a mountain-shaped waveform. The depths of pixels corresponding to a wall appear farther back than the depths of pixels corresponding to the other objects that make up the foreground.
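The histogram generation described above can be sketched as follows. This is only an illustration, assuming the depth information is available as a two-dimensional list of per-pixel depths in centimeters; the function name `depth_histogram` and the parameter `bin_width_cm` are hypothetical.

```python
from collections import Counter


def depth_histogram(depth_map, bin_width_cm=1):
    """Count how many pixels fall into each depth bin.

    depth_map is a 2-D list (rows of per-pixel depths in centimeters).
    The result maps the lower edge of each bin to its pixel count,
    ordered by increasing depth.
    """
    counts = Counter()
    for row in depth_map:
        for depth in row:
            # Quantize the depth to the lower edge of its bin.
            counts[int(depth // bin_width_cm) * bin_width_cm] += 1
    return dict(sorted(counts.items()))
```

For example, `depth_histogram([[100, 100, 105], [105, 105, 110]])` returns `{100: 2, 105: 3, 110: 1}`, with the peak at 105 centimeters.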
Returning to the description of FIG. 3.

The depth clustering execution unit 224 identifies depths that satisfy a predetermined condition as clusters. For example, the depth clustering execution unit 224 identifies, as a cluster, a range of depths in which a distribution resembling the shape of a predetermined function appears. As another example, the depth clustering execution unit 224 identifies, as a cluster, a range of depths in which frequencies exceeding a predetermined threshold appear.
The function used for clustering is called a basis function. In this embodiment, a probability density function is used as the basis function. The probability density function is unimodal, and its frequency is almost 0 (zero) at 3σ (σ: standard deviation) or more.
Although a probability density function is used as the basis function in this embodiment, other functions may be used. Alternatively, a plurality of probability density functions may be prepared, and the basis function to be used may be switched according to the environment, scene, subject, and so on at the time of imaging.
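Of the two conditions described above, the threshold-based one can be sketched briefly. The following is a simplified illustration rather than the clustering of this embodiment: it does not fit a basis function, and it assumes a histogram given as a mapping from depth bin to pixel count. The names `extract_clusters`, `min_count`, and `bin_width_cm` are hypothetical.

```python
def extract_clusters(histogram, min_count, bin_width_cm=1):
    """Return (start_depth, end_depth) ranges of adjacent depth bins
    whose pixel count exceeds min_count.

    histogram maps the depth of each bin to the number of pixels in it.
    """
    clusters = []
    start = prev = None
    for depth in sorted(histogram):
        above = histogram[depth] > min_count
        adjacent = prev is not None and depth - prev == bin_width_cm
        if above and start is not None and adjacent:
            # Extend the cluster that is currently open.
            prev = depth
        else:
            # Close the open cluster, if any, then possibly open a new one.
            if start is not None:
                clusters.append((start, prev))
                start = prev = None
            if above:
                start = prev = depth
    if start is not None:
        clusters.append((start, prev))
    return clusters
```

With `min_count=2`, the histogram `{100: 5, 101: 8, 102: 6, 103: 1, 110: 7, 111: 9, 120: 1}` yields `[(100, 102), (110, 111)]`; the isolated low-count bins at 103 and 120 centimeters are discarded in the same way a noise component would be.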

FIG. 6 is a diagram illustrating the waveform of a probability density function and the histogram after clustering. (A) shows the waveform of the probability density function, and (B) shows the histogram after clustering. Functions usable as the basis function include, for example, the normal distribution as a discrete-type function.
In the case of FIG. 6, the clustering process has removed the waveform that appeared sixth in FIG. 5. As a result, six mountain-shaped waveforms remain in the histogram after clustering.
Note that an individual mountain-shaped waveform does not necessarily correspond to a single object that exists in real space. As described above, the depth distribution corresponding to an individual object tends to form a mountain-shaped waveform (see FIG. 4), but one screen may contain several objects whose depth distributions overlap. The pixels belonging to one cluster therefore do not necessarily correspond to one object. Of course, the pixels belonging to one cluster may also correspond to exactly one object.
The depth histogram generation unit 223 and the depth clustering execution unit 224 here are an example of the extraction means.
Returning to the description of FIG. 3.

The seed coordinate setting unit 225 sets the coordinates of all or some of the pixels belonging to a cluster as seed coordinates. In this embodiment, some of the pixels belonging to a cluster are used as the representative pixels (seeds) in the screen. The seed coordinate setting unit 225 here is an example of the setting means.
FIG. 7 is a diagram showing an example of a method of selecting the pixels to be set as representative pixels (seeds). (A) shows an example of the histogram after clustering, and (B) shows examples of the methods used for selection.
FIG. 7 shows three examples of methods of selecting representative pixels (seeds) from the cluster that appears fifth in the histogram. In FIG. 7, the corresponding cluster is enclosed by a broken line.

Example 1 selects all of the pixels belonging to the cluster as representative pixels (seeds). This method maximizes the number of pixels used as representative pixels (seeds). On the other hand, it also tends to include information from pixels that belong to a different object.
Example 2 selects, as representative pixels (seeds), the pixels corresponding to the depth at which the frequency reaches a local maximum. With this method, the selected pixels are highly likely to belong to a single object in real space as well. On the other hand, because the number of pixels making up the representative pixels (seeds) is small, the computational load of expanding the region defined by those pixels becomes large.
Example 3 selects, as representative pixels (seeds), the pixels whose depths fall within a range of the standard deviation σ around the depth with the highest frequency (the median of the depths corresponding to the cluster). With this method, the number of pixels making up the representative pixels (seeds) is large, and the pixels are also highly likely to belong to a single object in real space. The range selected as representative pixels (seeds) can be adjusted by changing the constant by which the standard deviation σ is multiplied.
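Examples 1 to 3 can be compared with a small sketch. It assumes a two-dimensional list of per-pixel depths and a cluster given as an inclusive depth range; it centers the σ window of Example 3 on the most frequent depth, and uses the population standard deviation of the cluster's depths. The function name `select_seed_pixels` and its `method` and `k` parameters are hypothetical.

```python
from statistics import pstdev


def select_seed_pixels(depth_map, cluster, method="sigma", k=1.0):
    """Return (x, y) coordinates of seed pixels for one depth cluster.

    cluster is an inclusive (low, high) depth range.
    method "all"   : every pixel in the cluster (Example 1)
    method "peak"  : pixels at the most frequent depth (Example 2)
    method "sigma" : pixels within k * sigma of that depth (Example 3)
    """
    low, high = cluster
    members = [
        (x, y, depth)
        for y, row in enumerate(depth_map)
        for x, depth in enumerate(row)
        if low <= depth <= high
    ]
    if method == "all" or not members:
        return [(x, y) for x, y, _ in members]

    depths = [depth for _, _, depth in members]
    # Most frequent depth; ties resolve to the smallest depth.
    peak = max(sorted(set(depths)), key=depths.count)
    if method == "peak":
        return [(x, y) for x, y, depth in members if depth == peak]
    if method == "sigma":
        sigma = pstdev(depths)
        return [(x, y) for x, y, depth in members if abs(depth - peak) <= k * sigma]
    raise ValueError(f"unknown method: {method}")
```

The returned coordinates are positions in the image, so they can be used directly as the starting points for region division. Increasing `k` widens the selection toward Example 1, and decreasing it narrows the selection toward Example 2.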

FIG. 8 is a diagram showing another example of a method of selecting the pixels to be set as representative pixels (seeds).
In FIG. 8, the depth ranges whose frequency exceeds a predetermined threshold TH are selected as representative pixels (seeds).
In Example 4 shown in FIG. 8, the third cluster, which has a low frequency, is excluded from the representative pixels (seeds). A plurality of thresholds TH may be prepared and applied selectively. The threshold TH may also be changed for each cluster; for example, the threshold TH may be selected according to the magnitude of the local maximum of each cluster.
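The threshold-based selection of Example 4 can be sketched as follows. The sketch assumes the depth map is a 2D list of integer depth bins and uses a single global threshold; the names `depths_above_threshold`, `depth_map`, and `th` are illustrative and not taken from the source.

```python
from collections import Counter


def depths_above_threshold(depth_map, th):
    """Return the depth bins whose pixel count exceeds th, plus their pixels.

    depth_map: 2D list of quantized depth values (assumed representation).
    th:        frequency threshold corresponding to TH in FIG. 8.
    """
    # Tally how many pixels fall into each depth bin.
    counts = Counter(d for row in depth_map for d in row)
    selected = {d for d, n in counts.items() if n > th}
    # Every pixel whose depth is in a selected bin becomes a seed coordinate.
    seeds = [
        (x, y)
        for y, row in enumerate(depth_map)
        for x, d in enumerate(row)
        if d in selected
    ]
    return selected, seeds
```

A per-cluster threshold, as the text allows, could be obtained by calling the function on each cluster's pixels with a different `th`.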

Note that when Example 2 (see FIG. 7) or Example 4 (see FIG. 8) is used to select the representative pixels (seeds), the target pixels can be selected without generating a histogram or executing clustering. That is, the pixels that provide the representative pixels (seeds) can be selected without using the depth histogram generation unit 223 (see FIG. 3) or the depth clustering execution unit 224 (see FIG. 3).
However, with this approach alone, unlike clustering on the histogram, pixels corresponding to a concentration of depths caused by noise components may be selected as representative pixels (seeds).

FIG. 9 is a diagram explaining the relationship between clusters and representative pixels (seeds). (A) shows the distribution of the clusters, and (B) shows an example of mapping the representative pixels (seeds) onto the image.
The representative pixels (seeds) corresponding to one cluster in many cases represent differences in depth-direction position within a single real object. For this reason, representative pixels (seeds) tend to appear as blobs in the image.
Returning to the description of FIG. 3.

The region dividing unit 226 uses a known technique to merge the region defined by each representative pixel (seed) with surrounding pixels found to be related to it, and thereby divides the image into a plurality of regions. The region dividing unit 226 uses, for example, the graph cut (Graph-Cut) method, the grow cut (Grow-Cut) method, or the cellular cut (Cellular-Cut) method.
The graph cut method expands a region by merging adjacent pixels whose colors are similar to those of the representative pixels (seeds), and sets the region boundary where the color difference between adjacent pixels is large.
The grow cut method assigns every pixel to one of the regions based on the differences in pixel value between adjacent pixels.
The cellular cut method treats all pixels of the screen as cells, relates the starting pixel to its surrounding pixels with weights that depend on the closeness of their color characteristics, and determines the pixels belonging to a region by propagating the multiplied weights to the surroundings.
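The general idea of expanding regions outward from seeds can be shown with a deliberately simplified stand-in. The sketch below is plain breadth-first seeded region growing on a grayscale image; it is not an implementation of the graph cut, grow cut, or cellular cut methods named above. The function name `grow_regions`, the tolerance `tol`, and the data layout are assumptions made for illustration.

```python
from collections import deque


def grow_regions(image, seeds, tol):
    """Simplified seeded region growing (stand-in, not Graph-Cut/Grow-Cut).

    image: 2D list of grayscale pixel values (assumed representation).
    seeds: dict mapping a region label to a list of (x, y) seed coordinates.
    tol:   maximum pixel-value difference allowed between neighbours.
    Returns a 2D list of labels; pixels no region reached stay None.
    """
    h, w = len(image), len(image[0])
    labels = [[None] * w for _ in range(h)]
    queue = deque()
    for label, coords in seeds.items():
        for x, y in coords:
            labels[y][x] = label
            queue.append((x, y))
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and labels[ny][nx] is None:
                # Merge the neighbour only if its value is close enough;
                # a large difference is left as a region boundary.
                if abs(image[ny][nx] - image[y][x]) <= tol:
                    labels[ny][nx] = labels[y][x]
                    queue.append((nx, ny))
    return labels
```

Because one label is created per seed group, the number of resulting regions follows the number of clusters, so division into three or more regions falls out naturally.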

FIG. 10 is a diagram explaining the states before and after the processing by the region dividing unit 226. (A) shows the state immediately after the representative pixels (seeds) are mapped (see FIG. 9), and (B) shows the state when the region division is complete.
In FIG. 10, the screen is divided into six regions: a region corresponding to the background (the wall and floor), a region corresponding to the table 10, a region corresponding to the wine glass 11, a region corresponding to the wine glass 12, a region corresponding to the plate 13, and a region corresponding to the trash can 14.
Returning to the description of FIG. 3.

The target region specifying unit 227 selectively specifies the region to be processed from among the divided regions, based on a predetermined rule.
The number of selectively specified regions need not be limited to one. When the rule ranks the divided regions, a predesignated number of regions may be selected starting from the highest-ranked region.
Alternatively, all regions may be specified as processing targets.
The number of regions specified may differ depending on the purpose of the processing.

The rule used to specify the region is desirably selected according to the content of the processing. One rule or a plurality of rules may be used.
Rule-based selection may also be executed only when the differences in depth among the plurality of regions are within a predetermined range (that is, when the depths of the regions are approximately equal).

One such rule is, for example, to give priority to a region near the center of the screen. This is because, when an image is captured with the information processing device 20, the region the user is paying attention to is often located at the center of the screen.
Another rule is to give priority to the region on which the user's line of sight is concentrated. Using this rule presupposes that a sensor that detects the direction of the user's line of sight is built into the information processing device 20. This is because the direction of the line of sight may be regarded as indicating the region the user is paying attention to.

Another rule is, for example, to give priority to a region with a large area within the screen. However, because the region corresponding to the background tends to have a large area, priority may instead be given to the regions with the second-largest area and below, or a limit may be placed on the area that is given priority. There may also be a rule that gives priority to regions with a small area.
Another rule may give priority to the region containing the coordinates detected by the touch panel 217 (see FIG. 2). This is because the portion designated on the touch panel 217 is a region the user is paying attention to. The touch operation here is distinct from an operation that specifies the representative pixels (seeds).
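Two of the rules described above, center priority and touch designation, can be sketched as follows. The sketch assumes the segmentation result is a 2D list of region labels and measures closeness to the center by the squared distance from each region's centroid; the source does not define the distance measure, and the names `rank_by_center`, `region_at`, and `exclude` are hypothetical.

```python
def rank_by_center(labels, top_n=1, exclude=()):
    """Rank regions by how close their centroid is to the image centre.

    labels:  2D list of region labels (assumed representation).
    top_n:   number of regions to return, highest priority first.
    exclude: labels to skip, e.g. a background region.
    """
    h, w = len(labels), len(labels[0])
    sums = {}  # label -> (sum of x, sum of y, pixel count)
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab is None or lab in exclude:
                continue
            sx, sy, n = sums.get(lab, (0, 0, 0))
            sums[lab] = (sx + x, sy + y, n + 1)
    cx, cy = (w - 1) / 2, (h - 1) / 2

    def dist2(lab):
        sx, sy, n = sums[lab]
        return (sx / n - cx) ** 2 + (sy / n - cy) ** 2

    return sorted(sums, key=dist2)[:top_n]


def region_at(labels, x, y):
    """Touch rule: return the label of the region containing (x, y)."""
    return labels[y][x]
```

The `exclude` argument reflects the caveat in the text that a large background region can otherwise dominate a simple ranking.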

FIG. 11 is a diagram explaining an example of the specified regions.
In FIG. 11, the images corresponding to objects that exist in real space are shown as a table image 10A, wine glass images 11A and 12A, a plate image 13A, and a trash can image 14A.
In FIG. 11, to indicate the specified state, the regions specified by the target region specifying unit 227 (see FIG. 3) are enclosed in rectangular broken lines. In the example of FIG. 11, the wine glass images 11A and 12A and the plate image 13A, which rest on the table image 10A, are the processing targets.
Returning to the description of FIG. 3.

The image processing unit 228 executes processing such as changing the impression or texture of the specified region. This processing is executed by combining, for example, changes to the color distribution within the region, the boundary of the region, and the surface of the region. In the present embodiment, this processing is referred to as image processing. Of course, the processing by the other functional units shown in FIG. 3 is also a form of image processing.
FIG. 12 is a diagram explaining how image processing changes the color of the wine in the wine glass near the center of the screen. (A) shows an example of the screen before the change, and (B) shows an example of the screen after the change.
The image processing unit 228 outputs the result of the image processing to the liquid crystal display 213 (see FIG. 2).

<Processing operation>
The processing operation of the information processing device 20 (see FIG. 1) is described here.
FIG. 13 is a flowchart explaining the processing operation executed by the information processing device 20.
The information processing device 20 starts the processing shown in FIG. 13 when the user instructs it to start imaging through an operation (not shown).
First, the information processing device 20 acquires depth information for each pixel at the same time as it captures the image (step 101). In the figure, S is an abbreviation for step.

Next, the information processing device 20 tallies the frequency of each depth (step 102). In the present embodiment, a histogram is generated after the tally.
The information processing device 20 then executes clustering and excludes noise components and the like (step 103). In the present embodiment, the depths whose frequencies are distributed in a manner resembling a probability density distribution are identified. This clustering excludes planes that extend in the depth direction, such as roads, from extraction. In addition, because the number of clusters determines the number of regions after division, division into three or more regions becomes possible.
The information processing device 20 sets the coordinates of the pixels corresponding to each cluster as the starting points (that is, the seed coordinates) of the processing that divides the image into a plurality of regions (step 104). Because starting points can be set for each cluster according to a predetermined rule, manual designation of starting points becomes unnecessary.
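Steps 102 to 104 can be sketched in simplified form. The clustering below is a stand-in: it groups consecutive occupied depth bins into runs and discards runs with too few pixels, which removes isolated noise but does not perform the distribution-shape test described for the embodiment and would not by itself exclude a receding plane such as a road. Integer depth bins and the names `depth_clusters`, `seed_coordinates`, and `min_count` are assumptions introduced for illustration.

```python
from collections import Counter


def depth_clusters(depth_map, min_count):
    """Step 102-103 (simplified): tally depths, group adjacent bins, drop noise.

    depth_map: 2D list of integer depth bins, None where depth is missing.
    min_count: runs with fewer pixels than this are treated as noise.
    """
    counts = Counter(d for row in depth_map for d in row if d is not None)
    clusters, run = [], []
    for d in sorted(counts):
        # A gap in the occupied bins closes the current run.
        if run and d != run[-1] + 1:
            clusters.append(run)
            run = []
        run.append(d)
    if run:
        clusters.append(run)
    return [c for c in clusters if sum(counts[d] for d in c) >= min_count]


def seed_coordinates(depth_map, clusters):
    """Step 104: map each cluster index to the (x, y) pixels it contains."""
    lookup = {d: i for i, c in enumerate(clusters) for d in c}
    seeds = {i: [] for i in range(len(clusters))}
    for y, row in enumerate(depth_map):
        for x, d in enumerate(row):
            if d in lookup:
                seeds[lookup[d]].append((x, y))
    return seeds
```

No user input appears anywhere in the two functions, which is the point of step 104: the seed coordinates are derived from the depth statistics alone.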

Next, the information processing device 20 divides the image into a plurality of regions using the pixel values of the seeds (step 105). Here, the resolution of the pixel values (that is, of the image) is higher than the resolution of the depth acquired by the distance sensor 216 (see FIG. 2). The division can therefore be made more accurate than when the regions are divided using depth information alone. Using pixel value information also makes it possible to handle regions corresponding to undefined objects. The result of region division according to the present embodiment is equivalent to that obtained when the seed coordinates are given manually (with the graph cut method, grow cut method, cellular cut method, or the like).
The information processing device 20 then specifies some of the divided regions as processing targets based on a rule (step 106).
After that, the information processing device 20 executes predetermined image processing and reflects the result in the screen display (step 107).

<Embodiment 2>
The present embodiment describes a case where depth information is acquired using the so-called stereo camera method.
The stereo camera method acquires the depth of each pixel in an image from two images (stereo images) of a real object captured simultaneously by two cameras.
FIG. 14 is a diagram explaining an example of the hardware configuration of the information processing device 20A according to Embodiment 2. In FIG. 14, parts corresponding to those in FIG. 2 are denoted by corresponding reference numerals.
The information processing device 20A shown in FIG. 14 differs from the information processing device 20 (see FIG. 1) in that it uses two cameras 215R and 215L and does not use the distance sensor 216 (see FIG. 2).
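The source does not state how parallax is converted into depth. For reference only, under the common assumption of a rectified pair of pinhole cameras (an assumption not taken from this document), the depth Z of a point follows from its disparity d between the two images as shown below, where f is the focal length in pixels and B is the distance between the two cameras; all three symbols are introduced here for illustration.

```latex
Z = \frac{f \, B}{d}
```

Because Z is inversely proportional to d, nearby objects produce large disparities and distant objects produce small ones.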

FIG. 15 is a diagram explaining an example of the functional configuration of the information processing device 20A according to Embodiment 2. In FIG. 15, parts corresponding to those in FIG. 3 are denoted by corresponding reference numerals.
In the information processing device 20A of the present embodiment, the image data output from one of the two cameras 215R and 215L is supplied to the image information acquisition unit 221. In the example of FIG. 15, the image data output from the camera 215R is supplied to the image information acquisition unit 221.
The depth information acquisition unit 222A is supplied with image data from both of the two cameras 215R and 215L.

FIG. 16 is a diagram explaining the functional configuration of the depth information acquisition unit 222A.
The depth information acquisition unit 222A has a smoothing/sharpening unit 250 and a depth estimation unit 251. The smoothing/sharpening unit 250 executes processing that averages the original image to generate an averaged image (for example, averaging the pixel values over each 3×3 block of pixels) and sharpening processing that emphasizes changes by adding the difference between the original image and the averaged image to the original image. The depth estimation unit 251 compares the processed images and estimates the parallax of each region as its depth.
In the present embodiment, the depth information estimated by the depth estimation unit 251 is supplied to the depth histogram generation unit 223.
The other configurations and operations are the same as those of the information processing device 20 (see FIG. 1) in Embodiment 1.
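The averaging and sharpening performed by the smoothing/sharpening unit 250 can be sketched as follows. The sketch assumes a grayscale image stored as a 2D list, averages only over the neighbors that exist at the image border, and does not clamp the result to a valid pixel range; the source does not specify any of these details. The names `box_average`, `sharpen`, and `amount` are illustrative.

```python
def box_average(img):
    """Average each pixel over its 3x3 neighbourhood (fewer pixels at borders)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def sharpen(img, amount=1.0):
    """Add the (original - averaged) difference back onto the original image."""
    avg = box_average(img)
    return [
        [p + amount * (p - a) for p, a in zip(row, avg_row)]
        for row, avg_row in zip(img, avg)
    ]
```

A flat image is left unchanged because the difference term is zero everywhere, while an isolated bright pixel is pushed further from its surroundings.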

<Embodiment 3>
The present embodiment describes a case where depth information is acquired using a single camera.
FIG. 17 is a diagram explaining an example of a mode of use in Embodiment 3.
In FIG. 17, parts corresponding to those in FIG. 1 are denoted by corresponding reference numerals.
The information processing device 20B used in the present embodiment is a device that allows the user to perceive information in which real space and virtual space are combined (so-called augmented reality or mixed reality).
The information processing device 20B shown in FIG. 17 is a pair of so-called smart glasses. Smart glasses are a type of wearable device worn on the head in the same manner as eyeglasses.
The smart glasses shown in FIG. 17 have a transmissive display through which objects in real space can be viewed directly. The smart glasses shown in FIG. 17 are monocular but may be binocular. A non-transmissive display that blocks the user's view of the outside may also be used.

FIG. 18 is a diagram explaining an example of the hardware configuration of the information processing device 20B according to Embodiment 3. In FIG. 18, parts corresponding to those in FIG. 2 are denoted by corresponding reference numerals.
The information processing device 20B shown in FIG. 18 differs from the information processing device 20 (see FIG. 1) in that it uses a retinal scanning display 213B and does not use the liquid crystal display 213 (see FIG. 2), and in that it uses a filter-equipped camera 215B and does not use the distance sensor 216 (see FIG. 2).
The retinal scanning display 213B is a display that projects an image by shining light directly onto a person's retina.

The filter-equipped camera 215B is an imaging means that simultaneously outputs a color image and depth-related information from a single image captured by a monocular visible-light camera. One example of this type of technology is the "color aperture imaging technology" developed by Toshiba.
FIG. 19 is a diagram showing a structural example of the filter 270 attached to the lens aperture of the filter-equipped camera 215B. One half of the filter 270 is a light blue color filter 271, and the other half is a yellow color filter 272. When an image is captured with the filter 270 attached, the relationship between blur and color shift reverses depending on whether the subject is in front of or behind the focal position. Specifically, the left-right asymmetric color shift (a blur shape that differs by color) produced when red, blue, and green light combine is reversed between the near side and the far side of the focal position. Naturally, neither blur nor color shift occurs at the focal position.

FIG. 20 is a diagram explaining an example of the functional configuration of the information processing device 20B according to Embodiment 3. In FIG. 20, parts corresponding to those in FIG. 3 are denoted by corresponding reference numerals.
In the information processing device 20B of the present embodiment, the image data output from the filter-equipped camera 215B is supplied to the image information acquisition unit 221 and the depth information acquisition unit 222B.
The depth information acquisition unit 222B acquires depth information for each pixel by analyzing the blur shape, which is asymmetric for each color, and corrects the blur and color shift described above through image processing.
The other configurations and operations are the same as those of the information processing device 20 (see FIG. 1) in Embodiment 1.

<Embodiment 4>
The present embodiment describes a case where the means for capturing images or the means for storing image data are connected through a network to the information processing device that processes the images.
FIG. 21 is a diagram explaining an example of a mode of use in Embodiment 4.
The information processing system 300 shown in FIG. 21 has, connected to a network 301, a camera 302 that captures images of real space, file servers 303 and 304 that store captured image data, and information processing devices 305 and 306 that specify the region to be subjected to image processing without manual operation.
The network 301 may be a LAN (Local Area Network), the Internet, or a cloud network. The network 301 may be wireless or wired.

The camera 302 may be an imaging element mounted in the smartphone or smart glasses described above, or may be a standalone imaging device. Standalone imaging devices include, for example, professional cameras and consumer cameras. Professional cameras include security cameras.
In the present embodiment, the file server 303 stores image data that has depth information, and the file server 304 stores image data that has no depth information.
In the present embodiment, the information processing device 305 is a stationary computer, and the information processing device 306 is a portable computer.

FIG. 22 is a diagram explaining the basic configuration of the information processing devices 305 and 306. The information processing device 305 is described here.
The information processing device 305 has a configuration in which an input device 316, an auxiliary storage device 317, and an output device 318 are connected to a device main body 311.
The device main body 311 is composed of a central processing unit 312 and a main storage device 315. The central processing unit 312 has a control unit 313 that controls the operation of each part and an arithmetic unit 314 that processes data.
The main storage device 315 is composed of ROM (Read Only Memory) and RAM (Random Access Memory) and stores data and programs. The programs include basic software and application programs.
The input device 316 is a mouse, a keyboard, a touch panel, or the like.
The auxiliary storage device 317 is a hard disk, an optical storage medium, a semiconductor storage medium, or the like.
The output device 318 is a display, a printing device, or the like.

FIG. 23 is a diagram explaining an example of the functional configuration of the information processing devices 305 and 306 according to Embodiment 4. In FIG. 23, parts corresponding to those in FIG. 3 are denoted by corresponding reference numerals.
The functional configuration shown in FIG. 23 corresponds to the case where image data and the corresponding depth information are downloaded from the file server 303, which stores image data that has depth information.
In the case of FIG. 23, the image information acquisition unit 221 acquires the image data from the data downloaded from the file server 303, and the depth information acquisition unit 222C acquires the depth information from the data downloaded from the file server 303.
When the camera 302 also outputs depth information, the output of the camera 302 may be supplied to the image information acquisition unit 221 and the depth information acquisition unit 222B.

FIG. 24 is a diagram explaining another example of the functional configuration of the information processing devices 305 and 306 according to Embodiment 4. In FIG. 24, parts corresponding to those in FIG. 3 are denoted by corresponding reference numerals.
The functional configuration shown in FIG. 24 corresponds to the case where image data is downloaded from the file server 304, which stores image data that has no depth information.
In the case of FIG. 24, both the image information acquisition unit 221 and the depth information acquisition unit 222D download the same image data from the file server 304.
The depth information acquisition unit 222D executes, for example, data analysis that makes use of artificial intelligence (AI) and acquires the depth to each region in the image captured by a visible-light camera.
When the image data was captured by the filter-equipped camera 215B (see FIG. 18), the depth information acquisition unit 222B (see FIG. 20) may be used as the depth information acquisition unit 222D.

<Other embodiments>
Embodiments of the present invention have been described above, but the technical scope of the present invention is not limited to the scope described in those embodiments. It is clear from the claims that embodiments to which various changes or improvements have been made are also included in the technical scope of the present invention.
For example, in the embodiments described above, a histogram is generated based on the distribution of frequencies tallied by depth, but generating a histogram is not essential. The distribution of frequencies tallied by depth may be processed to extract, as clusters, the depth ranges with many occurrences (that is, where occurrences are concentrated). The functional unit that executes this processing is an example of the extracting means.
In the embodiments described above, the seed coordinate setting unit 225 extracts, from the pixels belonging to a cluster, the pixels to be used as seeds. Alternatively, the depth clustering execution unit 224 may execute processing that extracts some of the pixels based on a predetermined rule when it outputs the information on the pixels making up a cluster. In this case, the depth clustering execution unit 224 is an example of the extracting means.

20, 20A, 20B ... information processing device, 221 ... image information acquisition unit, 222, 222A, 222B, 222C, 222D ... depth information acquisition unit, 223 ... depth histogram generation unit, 224 ... depth clustering execution unit, 225 ... seed coordinate setting unit, 226 ... region dividing unit, 227 ... target region specifying unit, 228 ... image processing unit, 250 ... smoothing/sharpening unit, 251 ... depth estimation unit

Claims

1. An information processing apparatus comprising:
extracting means for extracting a range in which the distribution of depths corresponding to the pixels constituting a captured image appears in a concentrated manner; and
setting means for setting at least some of the pixels belonging to the extracted range as a starting point for specifying a region to be processed.

2. The information processing apparatus according to claim 1, wherein said extracting means extracts, as said range, a depth distribution that satisfies a predetermined condition.

3. The information processing apparatus according to claim 2, wherein said condition is that the shape of the distribution resembles the shape of a predetermined function.

4. The information processing apparatus according to claim 3, wherein said function is a unimodal function.

5. The information processing apparatus according to claim 2, wherein said condition is that a frequency exceeding a predetermined threshold appears.

6. The information processing apparatus according to claim 1, wherein said setting means sets, as said starting point, pixels at a depth where the appearance of pixels is concentrated within said extracted range.

7. The information processing apparatus according to claim 6, wherein said at least some pixels correspond to a depth at which the number of pixels appearing in said range is a local maximum.

8. The information processing apparatus according to claim 6, wherein said at least some pixels correspond to depths included in a range of a constant multiple of the standard deviation from the median of said range.

9. The information processing apparatus according to claim 1, wherein said image has pixels associated with a depth distribution.

10. The information processing apparatus according to claim 9, wherein the depth distribution corresponding to each pixel is given as the parallax between a plurality of imaging means.

11. The information processing apparatus according to claim 9, wherein the depth distribution corresponding to each pixel is given as a measurement value obtained when the image is captured.

12. The information processing apparatus according to claim 1, wherein the image is divided into a plurality of regions using the pixel values of the pixels set as said starting point.

13. The information processing apparatus according to claim 1, wherein said region to be processed is selectively specified based on a predetermined rule.

14. The information processing apparatus according to claim 13, wherein some of the regions specified through said starting point are selected as processing targets according to said predetermined rule.

15. The information processing apparatus according to claim 14, wherein said predetermined rule is to give priority to a region near the center of said image.

16. The information processing apparatus according to claim 14, wherein said predetermined rule is to give priority to a region where a user's line of sight is concentrated.

17. The information processing apparatus according to claim 14, wherein the rule-based selection is executed when the difference in depth among the plurality of regions specified through said starting point is within a predetermined range.

18. A program causing a computer to function as:
extracting means for extracting a range in which the distribution of depths corresponding to the pixels constituting a captured image appears in a concentrated manner; and
setting means for setting at least some of the pixels belonging to the extracted range as a starting point for specifying a region to be processed.