JP2018180646A

JP2018180646A - Object candidate area estimation device, object candidate area estimation method and object candidate area estimation program

Info

Publication number: JP2018180646A
Application number: JP2017074752A
Authority: JP
Inventors: Shunji Hosono; Shuhei Tarashima; Takayuki Kurozumi; Tetsuya Kinebuchi
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-04-04
Filing date: 2017-04-04
Publication date: 2018-11-15
Anticipated expiration: 2037-04-04
Also published as: JP6754717B2

Abstract

PROBLEM TO BE SOLVED: To provide an object candidate area estimation device, an object candidate area estimation method, and an object candidate area estimation program that can quickly estimate, with a small number of candidates, candidate areas, that is, candidates for areas that each properly contain a single object in an image.
SOLUTION: The method includes the steps of: accepting a visible light image and a depth image as inputs; estimating, from the visible light image, a candidate area group consisting of a plurality of candidate areas that are candidates for areas in which objects appear; deleting from the candidate area group, based on the depth image, each candidate area in which what appears is determined not to be an object; estimating, for each candidate area that was not deleted, an objectness indicating the degree to which the candidate area captures a single object, based on depth changes between pixels in the depth image; and correcting the shape of each candidate area that was not deleted so as to increase its objectness.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an object candidate area estimation device, an object candidate area estimation method, and an object candidate area estimation program for estimating candidate areas, that is, candidates for areas in an image that contain only a single object.

Object candidate area estimation is a technique for estimating, as candidate areas, the areas of an image that are likely to contain a single object. It is widely used as preprocessing for object detection, which estimates the type and position of each object shown in an image.

Object detection based on object candidate area estimation (see Non-Patent Documents 1 and 2) follows a framework in which object recognition (estimation of the object type) is performed on each candidate area produced by object candidate area estimation. If object detection can find the objects in an image captured with a smartphone or similar device, applications such as intuitive information retrieval and presentation become feasible, for example by overlaying detailed or related information about each captured object on the image.

In particular, for objects whose detailed information is presented as text, such as allergy information on product packaging or directions on signs, users who are not native speakers of the written language have difficulty obtaining that information. Intuitive information acquisition that uses the image itself as the interface is therefore expected to improve the user experience (UX) considerably.

The simplest approach to object candidate area estimation is a sliding window, which cuts out areas while scanning a rectangular frame of a predetermined size across the image. However, this approach yields an enormous number of candidate areas, each of which must then undergo object recognition, so the computational cost is high.

In addition, when an estimated candidate area contains only part of an object, or contains a large amount of non-object content (background unrelated to the object, other objects, and so on), object recognition tends to produce misrecognitions and missed detections (see Non-Patent Document 3).

Furthermore, because real-time processing is important for presenting information seamlessly through object detection, object candidate area estimation must run fast.

For these reasons, object candidate area estimation is required to quickly estimate a small number of candidate areas, each containing exactly one individual object in the image, neither more nor less.

R. Girshick, et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, in CVPR, 2014.
K. He, et al., Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, in ECCV, 2014.
J. Hosang, et al., What Makes for Effective Detection Proposals?, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 38, no. 4, 2016.
J. R. R. Uijlings, et al., Selective Search for Object Recognition, Int. Journal of Computer Vision, vol. 104, no. 2, pp. 154-171, 2013.
P. Arbelaez, et al., Multiscale Combinatorial Grouping, in CVPR, 2014.
R. Achanta, et al., SLIC Superpixels Compared to State-of-the-Art Superpixel Methods, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274-2282, 2012.
C. L. Zitnick, et al., Edge Boxes: Locating Object Proposals from Edges, in ECCV, 2014.
B. Alexe, et al., Measuring the Objectness of Image Windows, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2189-2202, 2012.
A. Kanezaki, et al., 3D Selective Search for Obtaining Object Candidates, in IROS, 2015.
J. Liu, et al., Depth-aware Layered Edge for Object Proposal, in ICME, 2016.

Known object candidate area estimation methods fall broadly into two groups: methods that merge small areas (see Non-Patent Documents 4 and 5) and methods that compute the objectness of an area (see Non-Patent Documents 6 and 7).

In the small-area merging approach, as illustrated in FIG. 6, image features such as color and size are first extracted from small areas generated by, for example, superpixel segmentation (see Non-Patent Document 8). Small areas with similar image features are then merged one after another, and every area generated during this process is output as a candidate area. Estimating candidate areas accurately in this way requires feature extraction and similarity computation over a huge number of merging steps, which takes a long computation time.
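The merging procedure can be sketched as follows. This is a minimal, hypothetical illustration of the prior-art idea, not the algorithm of Non-Patent Documents 4 or 5: each small area is reduced to a bounding box, a single mean-color value, and a pixel count; similarity is assumed to be the absolute difference of mean colors (size only weights the merged mean); and the adjacency list is supplied by the caller. Every merged area becomes a candidate box, which shows why the number of similarity evaluations grows with the number of small areas.

```python
from dataclasses import dataclass


@dataclass
class Area:
    box: tuple    # (x0, y0, x1, y1) bounding box, x1/y1 exclusive
    color: float  # mean color (one channel, for simplicity)
    size: int     # number of pixels


def merge(a, b):
    """Merge two areas: size-weighted mean color, union bounding box."""
    size = a.size + b.size
    color = (a.color * a.size + b.color * b.size) / size
    box = (min(a.box[0], b.box[0]), min(a.box[1], b.box[1]),
           max(a.box[2], b.box[2]), max(a.box[3], b.box[3]))
    return Area(box, color, size)


def group_areas(areas, adjacency):
    """Repeatedly merge the most similar adjacent pair of areas.

    Returns the boxes of the initial areas followed by the box of every
    area created during merging; all of them are candidate areas.
    """
    live = dict(enumerate(areas))
    edges = {frozenset(p) for p in adjacency}
    candidates = [a.box for a in areas]
    next_id = len(areas)
    while edges:
        # Assumed similarity: smaller mean-color difference = more similar.
        pair = min(edges, key=lambda p: abs(live[min(p)].color - live[max(p)].color))
        i, j = sorted(pair)
        merged = merge(live.pop(i), live.pop(j))
        live[next_id] = merged
        candidates.append(merged.box)
        # Redirect adjacencies of i and j to the merged area.
        updated = set()
        for p in edges:
            q = frozenset(next_id if k in (i, j) else k for k in p)
            if len(q) == 2:  # the merged pair itself collapses to one id
                updated.add(q)
        edges = updated
        next_id += 1
    return candidates
```

With three side-by-side areas where only the first two have similar colors, the first two merge first, then the result merges with the third, giving five candidate boxes in total.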

In the objectness approach, as illustrated in FIG. 7, each area cut out by a sliding window is scored by how well it contains an object, neither more nor less (its objectness). The areas are ranked in descending order of objectness, and the top-ranked areas are output as candidate areas. Many objectness measures are designed so that areas that tightly circumscribe an object boundary score higher, and they compute objectness from image boundary information, such as edges and contrast, extracted near the edges of the area.

However, an area that contains several objects also circumscribes object boundaries, so such an area receives a high objectness as well. As a result, many candidate areas must be output before an area containing exactly one object is found.

More specifically, the central part of an area that contains several objects includes the boundaries between those objects. Image boundaries, however, include not only object boundaries but also boundaries extracted from the patterns printed or textured on objects. Consequently, even an area that contains exactly one object has image boundaries in its central part, extracted from that object's patterns. It is therefore difficult to decide, from the image boundaries in the central part of an area, whether the area contains several objects.

In summary, when known object candidate area estimation methods are used to find areas that each contain exactly one object, the small-area merging approach needs a large amount of computation, and the objectness approach yields many areas that do not capture a single object accurately.

To make object candidate area estimation more accurate, methods that use a depth image in addition to the target image have also been proposed. A depth image is an image in which each pixel stores the distance (depth) from the camera to the point seen at the corresponding pixel of the visible light image; it can be obtained with a depth sensor, a stereo camera, or similar equipment.

The technique disclosed in Non-Patent Document 9 uses depth images within the small-area merging approach. It improves accuracy by computing, as a feature of each small area, not only features obtained from the image but also the volume of the small area. However, as with image-only methods, accurate estimation requires computing similarities for a huge number of combinations of small areas, so the method is expected to remain computationally expensive.

The technique disclosed in Non-Patent Document 10 divides an image into several layers by depth, computes objectness in each layer, and merges the results, thereby suppressing areas that contain several objects at very different distances. However, because objectness is computed only from image boundaries, many areas containing several objects are still estimated when several objects lie in the same layer. Many candidate areas must therefore still be output to find areas containing exactly one object.

Thus, object candidate area estimation that uses depth images still faces the same problems as estimation that uses only the target image.

The present invention has been made in view of the above circumstances, and its object is to provide an object candidate area estimation device, an object candidate area estimation method, and an object candidate area estimation program that can quickly estimate, with a small number of candidates, candidate areas that are candidates for areas each containing exactly one object in an image.

To achieve the above object, an object candidate area estimation device according to the present invention includes: an initial candidate area estimation unit that receives as input a visible light image and a depth image corresponding to the visible light image, and estimates, from the visible light image, a candidate area group consisting of a plurality of candidate areas that are candidates for areas in which an object appears; a candidate area reduction unit that deletes from the candidate area group, based on the depth image, each candidate area in which what appears is determined not to be an object; an objectness estimation unit that estimates, based on depth changes between pixels in the depth image, an objectness for each candidate area not deleted by the candidate area reduction unit, the objectness indicating the degree to which the candidate area captures a single object; and a candidate area correction unit that corrects the shape of each candidate area not deleted by the candidate area reduction unit so that the objectness estimated by the objectness estimation unit increases.

The objectness estimation unit may estimate the objectness of each candidate area not deleted by the candidate area reduction unit using an index based on depth boundaries extracted by edge detection on the depth image, the index being higher as the density of depth boundaries along the edge of the candidate area is higher and the density of depth boundaries inside the candidate area is lower.

Further, letting r_1 be the edge region of the candidate area, r_2 be the interior region of the candidate area, φ(r) be the density of depth boundaries in a region r, and $\sigma(x) = 1/(1 + e^{-x})$ be the standard sigmoid function, the objectness S(r) may be expressed by the following equation.
[Equation not reproduced]

The candidate area reduction unit may determine whether what appears in a candidate area is an object based on its real-world size obtained using the depth image, and delete from the candidate area group each candidate area in which what appears is determined not to be an object.

The candidate area correction unit may, for each candidate area not deleted by the candidate area reduction unit, estimate the objectness of each shape obtained by deforming the candidate area with a plurality of deformation methods, and correct the candidate area to the shape with the highest estimated objectness.

To achieve the above object, an object candidate area estimation method according to the present invention is performed in an object candidate area estimation device having an initial candidate area estimation unit, a candidate area reduction unit, an objectness estimation unit, and a candidate area correction unit, and includes: a step in which the initial candidate area estimation unit receives as input a visible light image and a depth image corresponding to the visible light image, and estimates, from the visible light image, a candidate area group consisting of a plurality of candidate areas that are candidates for areas in which an object appears; a step in which the candidate area reduction unit deletes from the candidate area group, based on the depth image, each candidate area in which what appears is determined not to be an object; a step in which the objectness estimation unit estimates, based on depth changes between pixels in the depth image, an objectness for each candidate area not deleted by the candidate area reduction unit, the objectness indicating the degree to which the candidate area captures a single object; and a step in which the candidate area correction unit corrects the shape of each candidate area not deleted by the candidate area reduction unit so that the objectness estimated by the objectness estimation unit increases.

To achieve the above object, an object candidate area estimation program according to the present invention causes a computer to function as each unit of the object candidate area estimation device described above.

According to the present invention, candidate areas that are candidates for areas each containing exactly one object in an image can be estimated quickly and with a small number of candidates.

FIG. 1 is a block diagram showing the functional configuration of the object candidate area estimation device according to the embodiment.
FIG. 2 is a schematic diagram for explaining the objectness estimation method according to the embodiment.
FIG. 3 is a schematic diagram showing the edge region and the interior region of a candidate area according to the embodiment.
FIG. 4 is a flowchart showing the flow of the object candidate area estimation process according to the embodiment.
FIG. 5 is a flowchart showing the flow of the correction subroutine according to the embodiment.
FIG. 6 is a schematic diagram for explaining a conventional method of merging small areas.
FIG. 7 is a schematic diagram for explaining a conventional method of computing the objectness of areas.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the functional configuration of an object candidate area estimation device 10 according to the first embodiment. As shown in FIG. 1, the object candidate area estimation device 10 according to the first embodiment includes an initial candidate area setting unit 12, a candidate area reduction unit 14, an objectness estimation unit 16, a candidate area correction unit 18, and a candidate area storage unit 20.

The initial candidate area setting unit 12 receives as input a visible light image and a depth image corresponding to the visible light image, and estimates, from the visible light image, a candidate area group consisting of a plurality of candidate areas that are candidates for areas in which an object appears. In this embodiment, a plurality of areas of a predetermined size at a plurality of predetermined positions in the visible light image are used as the candidate areas. For example, the candidate areas are set by scanning a rectangular frame of a predetermined size (for example, 300 × 400 pixels) across the visible light image at a predetermined interval (for example, 50% of the rectangle's width).
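The initial candidate generation can be sketched as a single-scale sliding window. This minimal sketch assumes the 300 × 400 frame means width 300 and height 400 pixels, and it uses 50% of the frame height as the vertical interval, which the text does not specify.

```python
def sliding_windows(img_w, img_h, win_w=300, win_h=400, stride_ratio=0.5):
    """Initial candidate areas from a fixed-size sliding window.

    Returns boxes (x0, y0, x1, y1), x1/y1 exclusive. The horizontal stride is
    stride_ratio * win_w; using stride_ratio * win_h vertically is an assumption.
    """
    step_x = max(1, int(win_w * stride_ratio))
    step_y = max(1, int(win_h * stride_ratio))
    boxes = []
    for y in range(0, img_h - win_h + 1, step_y):
        for x in range(0, img_w - win_w + 1, step_x):
            boxes.append((x, y, x + win_w, y + win_h))
    return boxes
```

For a 640 × 480 image the defaults give three windows, at x = 0, 150, and 300. A frame larger than the image gives none, which a practical implementation would have to handle, for example by using several frame sizes.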

The candidate area reduction unit 14 deletes from the candidate area group, based on the depth image, each candidate area in which what appears is determined not to be an object.

In this embodiment, the candidate area reduction unit 14 determines whether what appears in each candidate area is an object based on its real-world size obtained using the depth image, and deletes from the candidate area group each candidate area in which what appears is determined not to be an object. Deleting candidate areas that do not capture an object in this way raises the proportion of object-capturing areas among the candidate areas output as the estimation result.

Specifically, the candidate area reduction unit 14 calculates, for each candidate area, the real-world size of the candidate area from the depth image, and deletes from the candidate area group each candidate area whose size falls outside a range set in advance as the size of an object. For example, when the restriction is applied to the real-world height of an area, the real-world height r_h' of an area r can be calculated by the following equation (1).

…（１）
... (1)

Here, r_d is a representative depth value within the area r; for example, the median depth within the area is used. r_h is the height of the area r in the image (in pixels), and f_y is the vertical focal length in pixels, one of the camera's intrinsic parameters, obtained by camera calibration.
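The reduction step and equation (1) can be sketched as below. This minimal sketch assumes the depth image is a list of rows in a fixed unit (for example millimeters), treats a depth of 0 as a missing measurement (common for depth sensors, but an assumption here), drops boxes that contain no valid depth, and uses hypothetical bounds `h_min` and `h_max` for the allowed object height.

```python
import statistics


def real_height(depth, box, f_y):
    """Equation (1): r_h' = r_d * r_h / f_y.

    depth: list of rows of depth values (0 = no measurement, assumed).
    box:   (x0, y0, x1, y1) in pixels, x1/y1 exclusive.
    f_y:   vertical focal length in pixels (from camera calibration).
    Returns the real-world height in the depth unit, or None if the box
    contains no valid depth.
    """
    x0, y0, x1, y1 = box
    valid = [depth[y][x] for y in range(y0, y1) for x in range(x0, x1)
             if depth[y][x] > 0]
    if not valid:
        return None
    r_d = statistics.median(valid)  # representative depth of the area
    r_h = y1 - y0                   # height of the area in the image (pixels)
    return r_d * r_h / f_y


def reduce_candidates(depth, boxes, f_y, h_min, h_max):
    """Keep only boxes whose real-world height lies in [h_min, h_max]."""
    kept = []
    for box in boxes:
        h = real_height(depth, box, f_y)
        if h is not None and h_min <= h <= h_max:
            kept.append(box)
    return kept
```

`h_min` and `h_max` must be in the same unit as the depth values; with depth in millimeters, a box 200 pixels tall at a median depth of 1000 mm and f_y = 500 corresponds to 400 mm.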

The objectness estimation unit 16 estimates, based on depth changes between pixels in the depth image, an objectness for each candidate area not deleted by the candidate area reduction unit 14, the objectness indicating the degree to which the candidate area captures a single object, neither more nor less.

In this embodiment, as shown in FIG. 2, edge detection (see, for example, Reference 1 below) is performed on the depth image 32 corresponding to the input visible light image 30 to generate a depth boundary image 34 representing the depth boundaries in the visible light image 30. In edge detection on the depth image 32, points at which the depth changes greatly between pixels (for example, points at which the change in depth between pixels is equal to or greater than a predetermined threshold) are detected as depth boundaries.
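Depth boundary extraction can be sketched with the simple thresholding rule given in parentheses above. This sketch swaps in a plain neighbor-difference threshold instead of the Canny detector of Reference 1; the threshold value and the choice of comparing only the right and lower neighbors are assumptions.

```python
def depth_boundaries(depth, threshold):
    """Return a 0/1 map marking depth boundaries.

    A pixel is marked when its depth differs from its right or lower
    neighbor by at least `threshold` (same unit as the depth image).
    """
    h, w = len(depth), len(depth[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = depth[y][x]
            right = x + 1 < w and abs(depth[y][x + 1] - d) >= threshold
            down = y + 1 < h and abs(depth[y + 1][x] - d) >= threshold
            edges[y][x] = 1 if (right or down) else 0
    return edges
```

For a depth image whose left half is at 1000 and right half at 2000, a threshold of 500 marks exactly the last column of the near half, while the flat regions inside each half produce no boundaries, regardless of any texture visible in the color image.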

Then, based on the depth boundary image 34, the objectness estimation unit 16 estimates the objectness of each candidate area using the density of depth boundaries in the edge region r_1 of the candidate area 35 and the density of depth boundaries in the interior region r_2 of the candidate area 35, as shown in FIG. 3. The resulting index rates an area as more object-like the more depth boundaries its edge region r_1 contains and the fewer its interior region r_2 contains.

Here, the objectness is expressed by an index that increases as the density of depth boundaries along the edge of the candidate area increases and as the density of depth boundaries inside the candidate area decreases. In this embodiment, the objectness estimation unit 16 estimates the objectness S(r) defined by equation (2) below.

[Equation not reproduced] (2)

In equation (2), φ(·) denotes the density of depth boundaries in a region. Because the density φ(r) is always 0 or greater, it is normalized into the range [0.5, 1) by the standard sigmoid function $\sigma(x) = 1/(1 + e^{-x})$.
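The objectness computation can be sketched as follows. Two parts are assumptions, because the text does not fix them: the edge region r_1 is taken to be a band whose width is a hypothetical fraction `band_ratio` of the box size, with r_2 the remaining interior; and, since equation (2) is not reproduced here, the two sigmoid-normalized densities are combined by a hypothetical ratio σ(φ(r_1)) / σ(φ(r_2)), chosen only because it rises with φ(r_1) and falls with φ(r_2). It should not be read as the patent's equation (2).

```python
import math


def sigmoid(x):
    """Standard sigmoid; maps densities >= 0 into [0.5, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def count(edges, box):
    """Number of depth-boundary pixels inside box (x0, y0, x1, y1), x1/y1 exclusive."""
    x0, y0, x1, y1 = box
    return sum(edges[y][x] for y in range(y0, y1) for x in range(x0, x1))


def objectness(edges, box, band_ratio=0.1):
    """Hypothetical objectness: sigmoid(phi(r_1)) / sigmoid(phi(r_2))."""
    x0, y0, x1, y1 = box
    bw = max(1, int((x1 - x0) * band_ratio))  # band width (assumed)
    bh = max(1, int((y1 - y0) * band_ratio))
    inner = (x0 + bw, y0 + bh, x1 - bw, y1 - bh)  # interior region r_2
    area_all = (x1 - x0) * (y1 - y0)
    area_in = max(0, inner[2] - inner[0]) * max(0, inner[3] - inner[1])
    n_all = count(edges, box)
    n_in = count(edges, inner) if area_in > 0 else 0
    phi_edge = (n_all - n_in) / (area_all - area_in)  # phi(r_1), edge band
    phi_in = n_in / area_in if area_in > 0 else 0.0   # phi(r_2), interior
    return sigmoid(phi_edge) / sigmoid(phi_in)
```

A box whose depth boundaries form a ring along its border scores higher than the same box with an extra boundary line through its middle, which is the behavior the text asks for when one area straddles two objects.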

The number of objectness computations grows in proportion to the number of candidate areas. However, by computing in advance an integral image (summed-area table; see, for example, Reference 2 below) of the depth boundary image, the density of depth boundaries in any rectangular area can be obtained with four table lookups, so objectness can be computed at high speed.
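A summed-area table over the depth boundary image can be sketched as follows, assuming a 0/1 boundary map stored as a list of rows. After a single pass to build the table, the number of boundary pixels in any axis-aligned rectangle, and hence its density, costs four lookups regardless of the rectangle's size.

```python
def integral_image(edges):
    """Summed-area table with an extra zero row and column:
    ii[y][x] is the sum of edges[0..y-1][0..x-1]."""
    h, w = len(edges), len(edges[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += edges[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, box):
    """Sum over box (x0, y0, x1, y1), x1/y1 exclusive, with four lookups."""
    x0, y0, x1, y1 = box
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def density(ii, box):
    """Fraction of depth-boundary pixels in the box."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    return rect_sum(ii, box) / area if area > 0 else 0.0
```

The edge-band density φ(r_1) follows from two such queries: the count over the whole box minus the count over the interior, divided by the difference of their areas.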

[Reference 1] J. Canny, A Computational Approach to Edge Detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.

[Reference 2] F. C. Crow, Summed-Area Tables for Texture Mapping, in SIGGRAPH, 1984.

The candidate area correction unit 18 corrects the shape of each candidate area not deleted by the candidate area reduction unit 14 so that the objectness estimated by the objectness estimation unit 16 increases.

In this embodiment, for each candidate area not deleted by the candidate area reduction unit 14, the candidate area correction unit 18 estimates the objectness of each shape obtained by deforming the candidate area with a plurality of deformation methods, and corrects the candidate area to the shape with the highest estimated objectness. The candidate area correction unit 18 also stores data indicating the corrected candidate areas in the candidate area storage unit 20. In this way, the position, shape, and so on of each candidate area in the candidate area group are corrected to increase objectness, so that each candidate area captures an individual object more accurately, and the candidate areas, ranked by objectness, are output as the final estimation result.

One option is to choose, from all conceivable shapes of the candidate area, the shape with the highest objectness. In that case, however, objectness would have to be computed an enormous number of times, and the computational cost would be high.

Therefore, the shape of the candidate area may, for example, be searched greedily. Specifically, the correction width is set to a predetermined value (for example, 50% of the area's width), and the candidate area is corrected to whichever shape, among those obtained by moving each side of the candidate area by the correction width (up or down, or left or right), has the highest objectness. The correction width is then reduced by a predetermined amount (for example, to 50% of the current correction width), and the candidate area is again corrected toward a shape with higher objectness. Repeating this until the correction width falls to or below a predetermined threshold corrects the candidate area while limiting the number of objectness computations.
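The greedy coarse-to-fine correction can be sketched as follows. `score_fn` stands for any objectness function (for example one built on depth boundary densities). The initial step of half the box width (and, as an assumption, half the box height for vertical moves), the halving factor, the stopping threshold `min_step`, and the rule of applying only the single best move per step are illustrative choices, not values fixed by the text.

```python
def refine_box(box, score_fn, img_w, img_h, init_ratio=0.5, decay=0.5, min_step=1.0):
    """Greedy coarse-to-fine correction of a box (x0, y0, x1, y1).

    At each step, try moving each of the four sides by +/- the current
    correction width, keep the single best shape if it improves the score,
    then shrink the correction width; stop once it falls to min_step or below.
    """
    best = tuple(box)
    best_score = score_fn(best)
    step_x = (best[2] - best[0]) * init_ratio  # horizontal correction width
    step_y = (best[3] - best[1]) * init_ratio  # vertical correction width (assumed)
    while max(step_x, step_y) > min_step:
        dx, dy = int(round(step_x)), int(round(step_y))
        trials = []
        for side, delta in ((0, dx), (1, dy), (2, dx), (3, dy)):
            for sign in (-1, 1):
                b = list(best)
                b[side] += sign * delta
                # Keep the box non-empty and inside the image.
                if 0 <= b[0] < b[2] <= img_w and 0 <= b[1] < b[3] <= img_h:
                    trials.append(tuple(b))
        for b in trials:
            s = score_fn(b)
            if s > best_score:
                best, best_score = b, s
        step_x *= decay
        step_y *= decay
    return best, best_score
```

Each step evaluates at most eight shapes, so the number of objectness evaluations grows with the logarithm of the initial correction width rather than with the number of possible shapes.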

The output candidate area may have any shape (a polygon, an ellipse, or the like). For simplicity, the present embodiment outputs it as a rectangular area.

The object candidate area estimation apparatus 10 according to the present embodiment is configured, for example, as a computer that includes:

- a CPU (Central Processing Unit);
- a RAM (Random Access Memory); and
- a ROM (Read Only Memory) that stores various programs.

The computer constituting the object candidate area estimation apparatus 10 may also include a storage unit such as a hard disk drive or a nonvolatile memory. In the present embodiment, the CPU reads and executes a program stored in a storage unit such as the ROM or the hard disk. The hardware resources and the program thereby cooperate to realize the functions described above.

The flow of the object candidate area estimation processing performed by the object candidate area estimation apparatus 10 according to the present embodiment is described with reference to the flowchart in FIG. 4. In the present embodiment, the processing starts when predetermined information instructing its execution is input to the object candidate area estimation apparatus 10. The start timing is not limited to this; for example, the processing may instead start when the visible light image and the depth image are input.

In step S101, the initial candidate area setting unit 12 receives a visible light image and a depth image corresponding to it. From the visible light image, it estimates a candidate area group consisting of a plurality of candidate areas.

In step S103, the candidate area reduction unit 14 uses the depth image to identify candidate areas in which what appears is not an object, and deletes those candidate areas from the candidate area group.

In step S105, the object likeness estimation unit 16 estimates the object likeness of each candidate area not deleted by the candidate area reduction unit 14. The estimate is based on the change in depth between pixels in the depth image.

In step S107, the candidate area correction unit 18 performs correction processing on each candidate area not deleted by the candidate area reduction unit 14. The correction changes the shape of the candidate area so that the object likeness estimated by the object likeness estimation unit 16 increases.

In step S109, the candidate area correction unit 18 outputs data indicating the shape-corrected candidate areas, and execution of the object candidate area estimation program ends. In the present embodiment, the data is output in one of the following ways:

- by displaying it on a display means such as a display;
- by transmitting it to an external device; or
- by storing it in a storage means.
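Step S103 states only that the depth image is used to reject candidate areas. Claim 4 specifies that the decision relies on the real-space size of what appears in the candidate area. The following is a minimal Python sketch of such a size check under stated assumptions, none of which come from the source:

- a pinhole camera model with focal lengths `fx` and `fy` in pixels;
- depth in metres, with 0 marking invalid pixels;
- the median valid depth taken as the object distance;
- boxes with no valid depth dropped;
- hypothetical size bounds `min_size` and `max_size`.

`real_size` and `reduce_candidates` are hypothetical names.

```python
from typing import List, Optional, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates


def real_size(box: Box, depth: np.ndarray,
              fx: float, fy: float) -> Optional[Tuple[float, float]]:
    """Approximate real-space (width, height) of a box, pinhole model.

    The median of the valid (non-zero) depths in the box is used as the
    distance to whatever appears in it.
    """
    x0, y0, x1, y1 = box
    patch = depth[y0:y1, x0:x1]
    valid = patch[patch > 0]
    if valid.size == 0:
        return None  # no usable depth, so the size cannot be estimated
    z = float(np.median(valid))
    return (x1 - x0) * z / fx, (y1 - y0) * z / fy


def reduce_candidates(boxes: List[Box], depth: np.ndarray, fx: float, fy: float,
                      min_size: float = 0.05, max_size: float = 2.0) -> List[Box]:
    """Keep boxes whose larger real-space side lies in [min_size, max_size]."""
    kept = []
    for box in boxes:
        size = real_size(box, depth, fx, fy)
        if size is None:
            continue  # assumption: boxes without depth are discarded
        if min_size <= max(size) <= max_size:
            kept.append(box)
    return kept
```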

Next, the correction processing subroutine described above, as performed by the object candidate area estimation apparatus 10 according to the present embodiment, is described with reference to the flowchart in FIG. 5. The flowchart in FIG. 5 is executed for each candidate area.

In step S201, the candidate area correction unit 18 deforms the candidate area into a plurality of shapes by a plurality of deformation methods. The present embodiment describes moving each side of the candidate area outward or inward by a predetermined correction width, but the deformation is not limited to this. Another possible method, for example, is enlarging or shrinking the candidate area by a predetermined ratio.
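The two deformation methods named in step S201 can be sketched as follows. This is a minimal Python sketch with these assumptions:

- Boxes are `(x0, y0, x1, y1)` tuples.
- The side-shift method yields the eight boxes obtained by moving each side outward or inward.
- The ratio method scales the box about its centre. The text does not specify the fixed point of the scaling.

The function names and the default ratios are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def side_shift_deformations(box: Box, step_x: float, step_y: float) -> List[Box]:
    """Move each side outward or inward by the correction width (8 boxes)."""
    x0, y0, x1, y1 = box
    return [
        (x0 - step_x, y0, x1, y1), (x0 + step_x, y0, x1, y1),  # left side
        (x0, y0, x1 - step_x, y1), (x0, y0, x1 + step_x, y1),  # right side
        (x0, y0 - step_y, x1, y1), (x0, y0 + step_y, x1, y1),  # top side
        (x0, y0, x1, y1 - step_y), (x0, y0, x1, y1 + step_y),  # bottom side
    ]


def scale_deformations(box: Box, ratios: Sequence[float] = (0.9, 1.1)) -> List[Box]:
    """Enlarge or shrink the box by each ratio, keeping its centre fixed."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) / 2, (y1 - y0) / 2
    return [(cx - r * half_w, cy - r * half_h, cx + r * half_w, cy + r * half_h)
            for r in ratios]
```

Either list (or their concatenation) can serve as the set of deformed shapes whose object likeness is estimated in step S203.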

In step S203, the candidate area correction unit 18 estimates the object likeness of the candidate area in each deformed shape.

In step S205, the candidate area correction unit 18 corrects the candidate area to the shape with the highest estimated object likeness. It then stores data indicating the corrected candidate area in the candidate area storage unit 20.

In step S207, the candidate area correction unit 18 determines whether the correction width of each side of the candidate area is equal to or less than a predetermined threshold. This threshold sets how fine the correction should be; for example, it is a value input in advance by the user.

If step S207 determines that the correction width is not equal to or less than the threshold (S207, N), the processing proceeds to step S209. If the correction width is equal to or less than the threshold, execution of the correction processing ends.

In step S209, the candidate area correction unit 18 reduces the correction width of each side of the candidate area, and the processing returns to step S201.

As described above, the present embodiment proceeds as follows:

1. A visible light image and a depth image corresponding to it are input, and a candidate area group consisting of a plurality of candidate areas is estimated from the visible light image.
2. Candidate areas in which, based on the depth image, what appears is determined not to be an object are deleted from the candidate area group.
3. For each remaining candidate area, an object likeness, indicating the degree to which the candidate area captures a single object, is estimated from the change in depth between pixels in the depth image.
4. The shape of each remaining candidate area is corrected so that its object likeness increases.

Two techniques make the object likeness fast to compute:

- The number of depth boundaries in the candidate area, which the object likeness calculation uses, is obtained by edge detection.
- The sum of pixel values within the candidate area is computed quickly using an integral image.
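The speed-up described above rests on the integral image (summed-area table), which returns the sum over any rectangle with four lookups regardless of its size. The following is a minimal Python sketch under these assumptions:

- Depth boundaries come from a hypothetical detector that thresholds the depth difference to the right and lower neighbours. The source says only "edge detection".
- The edge region r₁ is a band of `band` pixels just inside the box border, and the interior region r₂ is the rest of the box.
- The returned value φ(r₁) − φ(r₂) is a hypothetical stand-in. It has the monotonicity described in claim 2 but is not the equation of claim 3, which is not reproduced in this text.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Zero-padded summed-area table: ii[y, x] == img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii: np.ndarray, x0: int, y0: int, x1: int, y1: int) -> int:
    """Sum of img[y0:y1, x0:x1] from four lookups, independent of box size."""
    return int(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])


def depth_edges(depth: np.ndarray, jump: float = 0.1) -> np.ndarray:
    """Hypothetical edge detector: flag pixels whose depth differs from the
    right or lower neighbour by more than `jump`."""
    edges = np.zeros(depth.shape, dtype=bool)
    edges[:, :-1] |= np.abs(np.diff(depth, axis=1)) > jump
    edges[:-1, :] |= np.abs(np.diff(depth, axis=0)) > jump
    return edges


def edge_minus_interior_density(ii: np.ndarray, box, band: int = 2) -> float:
    """Depth-boundary density of the border band minus that of the interior."""
    x0, y0, x1, y1 = box
    if x1 - x0 <= 2 * band or y1 - y0 <= 2 * band:
        raise ValueError("box too small for the chosen band width")
    total = box_sum(ii, x0, y0, x1, y1)
    inner = box_sum(ii, x0 + band, y0 + band, x1 - band, y1 - band)
    inner_area = (x1 - x0 - 2 * band) * (y1 - y0 - 2 * band)
    band_area = (x1 - x0) * (y1 - y0) - inner_area
    return (total - inner) / band_area - inner / inner_area


# Toy scene: a nearer 10x10 object (depth 1.0) on a background at depth 3.0.
depth = np.full((20, 20), 3.0)
depth[5:15, 5:15] = 1.0
ii = integral_image(depth_edges(depth))
tight = edge_minus_interior_density(ii, (4, 4, 16, 16))        # hugs the object
straddling = edge_minus_interior_density(ii, (0, 0, 12, 12))   # cuts through it
```

A box that hugs the object collects its depth boundaries in the border band, while a box that cuts through the object has boundaries in its interior, so the first receives the higher value.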

As described above, the present embodiment can estimate candidate areas quickly and with a small number of candidates. Each candidate area circumscribes the boundary of an object in the visible light image without containing any object boundary inside it; that is, it contains exactly one object.

In the present embodiment, the operations of the functional components shown in FIG. 1 are implemented as a program, which is installed and executed on a computer used as the object candidate area estimation apparatus 10. The invention is not limited to this, however; the program may also be distributed via a network.

The program may also be stored on a portable storage medium such as a hard disk or a CD-ROM, and installed on a computer or distributed.

10 Object candidate area estimation apparatus
12 Initial candidate area setting unit
14 Candidate area reduction unit
16 Object likeness estimation unit
18 Candidate area correction unit
20 Candidate area storage unit

Claims

An object candidate area estimation apparatus comprising:
an initial candidate area estimation unit that receives a visible light image and a depth image corresponding to the visible light image and estimates, from the visible light image, a candidate area group consisting of a plurality of candidate areas that are candidates for an area in which an object appears;
a candidate area reduction unit that deletes from the candidate area group each candidate area in which, based on the depth image, what appears is determined not to be the object;
an object likeness estimation unit that estimates, for each candidate area not deleted by the candidate area reduction unit, an object likeness indicating the degree to which the candidate area captures a single object, based on a change in depth between pixels in the depth image; and
a candidate area correction unit that corrects, for each candidate area not deleted by the candidate area reduction unit, the shape of the candidate area so that the object likeness estimated by the object likeness estimation unit increases.

The object candidate area estimation apparatus according to claim 1, wherein the object likeness estimation unit estimates the object likeness for each candidate area not deleted by the candidate area reduction unit. It uses depth boundaries extracted by edge detection on the depth image, and an index that increases as the density of depth boundaries at the edge of the candidate area increases and as the density of depth boundaries inside the candidate area decreases.

The object candidate area estimation apparatus according to claim 2, wherein the object likeness S(r) is expressed by an equation with the following terms:
r₁ is the region at the edge of the candidate area;
r₂ is the region inside the candidate area;
φ(r) is the density of depth boundaries in a region r; and
a standard sigmoid function is applied.

The object candidate area estimation apparatus according to any one of claims 1 to 3, wherein the candidate area reduction unit obtains, using the depth image, the real-space size of what appears in the candidate area. Based on that size, it determines whether what appears is the object, and it deletes from the candidate area group each candidate area in which what appears is determined not to be the object.

The object candidate area estimation apparatus according to any one of claims 1 to 4, wherein, for each candidate area not deleted by the candidate area reduction unit, the candidate area correction unit deforms the shape of the candidate area by a plurality of deformation methods, estimates the object likeness of the candidate area in each resulting shape, and corrects the candidate area to the shape with the highest estimated object likeness.

An object candidate area estimation method in an object candidate area estimation apparatus having an initial candidate area estimation unit, a candidate area reduction unit, an object likeness estimation unit, and a candidate area correction unit, the method comprising:
a step in which the initial candidate area estimation unit receives a visible light image and a depth image corresponding to the visible light image and estimates, from the visible light image, a candidate area group consisting of a plurality of candidate areas that are candidates for an area in which an object appears;
a step in which the candidate area reduction unit deletes from the candidate area group each candidate area in which, based on the depth image, what appears is determined not to be the object;
a step in which the object likeness estimation unit estimates, for each candidate area not deleted by the candidate area reduction unit, an object likeness indicating the degree to which the candidate area captures a single object, based on a change in depth between pixels in the depth image; and
a step in which the candidate area correction unit corrects, for each candidate area not deleted by the candidate area reduction unit, the shape of the candidate area so that the object likeness estimated by the object likeness estimation unit increases.

An object candidate area estimation program for causing a computer to function as each unit of the object candidate area estimation apparatus according to any one of claims 1 to 5.