JP6717049B2

JP6717049B2 - Image analysis apparatus, image analysis method and program

Info

Publication number: JP6717049B2
Application number: JP2016100550A
Authority: JP
Inventors: 崇之原
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-05-19
Filing date: 2016-05-19
Publication date: 2020-07-01
Anticipated expiration: 2036-05-19
Also published as: JP2017207960A

Description

The present invention relates to an image analysis apparatus, an image analysis method, and a program.

Conventionally, techniques for extracting a user's region of interest from an image have been widely used for automatic cropping and thumbnail generation, and as preprocessing for annotation generation in image understanding and image retrieval. Known methods for extracting a region of interest include those based on object recognition and those based on saliency maps.

As region-of-interest extraction techniques based on object recognition, Patent Document 1 discloses a technique that detects a face region in an image and extracts the image of that face region, and Patent Document 2 discloses a technique that extracts a person region in an image by human detection. When a region of interest is extracted based on object recognition, a model must be prepared for each object.

On the other hand, region-of-interest extraction using a saliency map relies on low-order features such as color and edges, which makes more general-purpose extraction possible. In this regard, Non-Patent Document 1 discloses a method that uses a model of human vision studied in neuroscience to generate a saliency map in a bottom-up manner from local image features. Patent Document 3 discloses a technique that obtains a saliency map with high accuracy by multiplying a map of the edge amount calculated at each pixel by an attention-region weighting map. Furthermore, Patent Documents 4 and 5 disclose techniques that calculate saliency by combining depth information with image features.

More recently, approaches have been attempted that extract a region of interest by using higher-order semantic information, as opposed to low-order image features (color, edges, depth, and the like). In this regard, Non-Patent Documents 2 and 3 disclose methods that extract high-order features from an image with a neural network and estimate the region of interest.

Furthermore, in recent years, ultra-wide-angle cameras, such as fisheye cameras with an angle of view exceeding 180 degrees and omnidirectional cameras capable of capturing all 360 degrees, have come into wide use, and there is a demand for accurately estimating a region of interest from such ultra-wide-angle images.

The present invention has been made in view of the above, and an object thereof is to provide an image analysis apparatus capable of accurately estimating a region of interest (attention point) from an ultra-wide-angle image.

As a result of intensive study of an image analysis apparatus capable of accurately estimating a region of interest (attention point) from an ultra-wide-angle image, the present inventor conceived the following configuration and arrived at the present invention.

That is, according to the present invention, there is provided an image analysis apparatus for extracting an attention point from an input image, the apparatus including: an element feature extraction unit that extracts an element feature at each position of the input image; a region feature calculation unit that divides the input image into a plurality of regions and calculates a region feature by integrating the element features for each divided region; and an attention point regression unit that calculates the attention point of the input image from the calculated region features based on a predetermined regression model.

As described above, according to the present invention, there is provided an image analysis apparatus capable of accurately estimating a region of interest (attention point) from an ultra-wide-angle image.

FIG. 1 is a conceptual diagram for explaining an image in the Equirectangular format (equidistant cylindrical projection).
FIG. 2 is a functional block diagram of the image analysis apparatus of the first embodiment.
FIG. 3 is a flowchart showing the processing executed by the image analysis apparatus of the first embodiment.
FIG. 4 is a conceptual diagram for explaining the processing executed by the element feature extraction unit.
FIG. 5 is a conceptual diagram for explaining the processing executed by the element feature extraction unit.
FIG. 6 is a conceptual diagram for explaining the processing executed by the region feature calculation unit.
FIG. 7 is a conceptual diagram for explaining the processing executed by the element feature extraction unit.
FIG. 8 is a functional block diagram of the image analysis apparatus of the second embodiment.
FIG. 9 is a flowchart showing the processing executed by the image analysis apparatus of the second embodiment.
FIG. 10 is a functional block diagram of the image analysis apparatus of the third embodiment.
FIG. 11 is a flowchart showing the processing executed by the image analysis apparatus of the third embodiment.
FIG. 12 is a functional block diagram of the image analysis apparatus of the fourth embodiment.
FIG. 13 is a flowchart showing the processing executed by the image analysis apparatus of the fourth embodiment.
FIG. 14 is a hardware configuration diagram of the image analysis apparatus of the present embodiment.

Hereinafter, the present invention will be described by way of embodiments, but the present invention is not limited to the embodiments described below. In the drawings referred to below, common elements are denoted by the same reference numerals, and their description is omitted as appropriate.

An image analysis apparatus according to an embodiment of the present invention has a function of extracting a region of interest from an input image and, more specifically, a function of estimating an attention point (a point within the region of interest, or the centroid of the region of interest). Before describing the image analysis apparatus of the present embodiment, the reasons why a region of interest cannot be extracted accurately when conventional region-of-interest extraction techniques are applied to an ultra-wide-angle image (such as an image captured by a fisheye camera or an omnidirectional camera) are explained.

First, one conceivable method is to convert the ultra-wide-angle image into an image in the Equirectangular format (equidistant cylindrical projection) shown in FIG. 1 and to extract the region of interest from the converted image. The Equirectangular format is an image representation used mainly for panoramic photography. As shown in FIG. 1, it decomposes the three-dimensional direction of each pixel into latitude and longitude and arranges the corresponding pixel values on a square grid. From an image in the Equirectangular format, the pixel value in any three-dimensional direction can be obtained from the longitude and latitude coordinates, so the image can be regarded conceptually as pixel values plotted on a unit sphere.
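The correspondence between Equirectangular pixel coordinates and directions on the unit sphere can be sketched as follows. This is a minimal illustration, not the patent's own definition: the function names, the axis convention (z toward the zenith), and the pixel-centre offset of 0.5 are all assumptions.

```python
import numpy as np

def pixel_to_direction(u, v, width, height):
    """Map an Equirectangular pixel (u, v) to a unit 3D direction.

    Assumed convention: longitude theta runs from -pi to pi left to right,
    latitude phi runs from +pi/2 (top) to -pi/2 (bottom), z is the zenith.
    """
    theta = (u + 0.5) / width * 2.0 * np.pi - np.pi
    phi = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def direction_to_pixel(d, width, height):
    """Inverse mapping: a 3D direction to continuous pixel coordinates."""
    x, y, z = np.asarray(d, dtype=float) / np.linalg.norm(d)
    theta = np.arctan2(y, x)
    phi = np.arcsin(np.clip(z, -1.0, 1.0))
    u = (theta + np.pi) / (2.0 * np.pi) * width - 0.5
    v = (np.pi / 2.0 - phi) / np.pi * height - 0.5
    return u, v
```

Under this convention every row of the image covers the same latitude band, which is why regions near the zenith and nadir are stretched horizontally.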

However, when the region of interest is extracted directly from an image in the Equirectangular format, there is a problem: regions of interest located near the zenith or nadir, where distortion becomes extremely large, or at the image boundary cannot be extracted.

Second, another conceivable method is to divide the ultra-wide-angle image into a plurality of images and extract a region of interest from each divided image. In this case, however, it is not clear how the saliency maps obtained from the divided images should be integrated.

Furthermore, an ultra-wide-angle image is expected to contain a plurality of highly salient objects in a single image, but the prior art has no mechanism for determining the priority among a plurality of objects.

The problems of conventional region-of-interest extraction techniques have been described above. To address them, the image analysis apparatus of the present embodiment is characterized by a function of accurately extracting a user's region of interest from an ultra-wide-angle image that has large distortion and contains a plurality of objects. A specific configuration of the image analysis apparatus of the present embodiment is described below.

(First embodiment)
The image analysis apparatus 100A according to the first embodiment of the present invention has a function of dividing an image to be processed into a plurality of regions and estimating the attention point of the image from the features of the divided regions. The functional configuration of the image analysis apparatus 100A of the present embodiment is described below based on the functional block diagram shown in FIG. 2.

As shown in FIG. 2, the image analysis apparatus 100A includes an image input unit 101, an element feature extraction unit 102, a region feature calculation unit 103, an attention point regression unit 104, and an attention point output unit 105.

The image input unit 101 is means for inputting an image to be processed.

The element feature extraction unit 102 is means for extracting an element feature at each position of the image to be processed.

The region feature calculation unit 103 is means for dividing the image to be processed into a plurality of regions and calculating a region feature by integrating the element features for each divided region.

The attention point regression unit 104 is means for calculating the attention point of the image to be processed from the calculated region features based on a predetermined regression model.

The attention point output unit 105 is means for outputting the calculated attention point.

In the present embodiment, the image analysis apparatus 100A functions as each of the above-described means when the computer constituting the image analysis apparatus 100A executes a predetermined program.

The functional configuration of the image analysis apparatus 100A of the present embodiment has been described above. Next, the processing executed by the image analysis apparatus 100A is described based on the flowchart shown in FIG. 3.

First, in step 101, the image input unit 101 reads, from any storage means, an omnidirectional image in the Equirectangular format to be processed, and inputs it. Hereinafter, the image thus input is referred to as the "input image".

In the following step 102, the element feature extraction unit 102 extracts an element feature from each position of the input image read in step 101. The element features may be extracted for every pixel of the input image or only at specific sampling positions.

In the present embodiment, color, edges, saliency, object position/label, and the like can be used as element features.

As the color feature, a value in a specific color space (such as RGB or L*a*b*), or the Euclidean distance or Mahalanobis distance from a specific color (for example, skin color), can be used.
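As an illustration of the distance-based color feature, the following sketch computes, for every pixel, the Euclidean or Mahalanobis distance to a reference color. The function name and the way the reference color and its covariance are supplied are assumptions made for this example; the source does not specify how they are obtained.

```python
import numpy as np

def color_distances(image, ref_color, cov=None):
    """Per-pixel distance between pixel colors and a reference color.

    image: (H, W, 3) array. If cov is None the Euclidean distance is
    returned; otherwise the Mahalanobis distance under the 3x3 covariance
    matrix cov (assumed to be estimated beforehand, e.g. from skin samples).
    """
    diff = image.astype(float) - np.asarray(ref_color, dtype=float)
    if cov is None:
        return np.sqrt(np.sum(diff ** 2, axis=-1))
    cov_inv = np.linalg.inv(cov)
    # Quadratic form diff^T cov_inv diff evaluated at every pixel.
    return np.sqrt(np.einsum("...i,ij,...j->...", diff, cov_inv, diff))
```

The Mahalanobis variant scales each color axis by the spread of the reference color class, so a deviation along an axis with large variance counts for less than the same deviation along a tightly clustered axis.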

As the edge feature, the direction and strength of the pixel-value gradient extracted with a Sobel filter, a Gabor filter, or the like can be used.
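A Sobel-based edge feature of this kind might be computed as in the sketch below. It uses plain NumPy slicing rather than an image-processing library, and the edge-replicating border handling is an assumption of this example.

```python
import numpy as np

def sobel_gradients(gray):
    """Return (strength, direction) of the Sobel gradient of a 2D array.

    Borders are handled by replicating edge pixels (an assumption).
    Direction is the gradient angle in radians, measured from the +x axis.
    """
    g = np.pad(gray.astype(float), 1, mode="edge")
    # Horizontal derivative: weighted right column minus weighted left column.
    gx = ((g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:])
          - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2]))
    # Vertical derivative: weighted bottom row minus weighted top row.
    gy = ((g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:])
          - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:]))
    return np.hypot(gx, gy), np.arctan2(gy, gx)
```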

As the saliency, a saliency value extracted by an existing saliency extraction algorithm can be used. Examples of such algorithms include those disclosed in Patent Documents 3 to 5 and Non-Patent Documents 1 to 3 mentioned above.

As the object position/label feature, the position of an object detected by a known object detection algorithm (usually expressed by the coordinates of the four corners of the detection rectangle) and the object type (face, person, car, and so on) can be used. Examples of such object detection algorithms include those disclosed in Patent Documents 1 and 2 mentioned above.

The element features that can be adopted in the present embodiment are not limited to the above. Needless to say, other features conventionally used in the field of image recognition (LBP, Haar-like features, HOG, SIFT, and the like) may also be adopted.

In the present embodiment, from the viewpoint of feature extraction accuracy, element features are extracted by the following method.

As shown in FIG. 1, the pixel value in any three-dimensional direction can be obtained from an image in the Equirectangular format using the longitude and latitude coordinates, and such an image can be regarded conceptually as pixel values plotted on a unit sphere. In the present embodiment, therefore, a predetermined projection plane is defined as shown in FIG. 4, the center of the unit sphere is taken as the projection center O, and a perspective projection transformation is performed according to formula (1) below that associates the pixel at (θ, φ) of the omnidirectional image in the Equirectangular format with the pixel at (x, y) on the defined projection plane. Element features are then extracted from the perspective-projected image. In formula (1) below, P denotes the perspective projection matrix, and the equals sign denotes equality up to a nonzero scalar multiple.

Specifically, a regular polyhedron sharing its center with the unit sphere is defined as the projection surface of the omnidirectional image in the Equirectangular format, and the perspective projection transformation is performed with the normal direction of each face as the viewing direction. FIG. 5(a) shows an example in which a regular octahedron is defined as the projection surface of the omnidirectional image, and FIG. 5(b) shows an example in which a regular dodecahedron is defined.
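Formula (1) is not reproduced in this text, so the following is only a sketch of the idea: it samples a perspective view of an Equirectangular image along a given face normal by casting a ray from the projection center O through each pixel of a plane at unit distance. The function name, the square field of view, the nearest-neighbor lookup, and the longitude/latitude convention (z toward the zenith) are assumptions of this example.

```python
import numpy as np

def perspective_view(equi, normal, fov_deg=90.0, size=65):
    """Sample a size x size perspective view of `equi` looking along `normal`."""
    h, w = equi.shape[:2]
    n = np.array(normal, dtype=float)
    n /= np.linalg.norm(n)
    # Orthonormal basis (right, up, n) spanning the projection plane.
    helper = np.array([0.0, 0.0, 1.0]) if abs(n[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    right = np.cross(helper, n)
    right /= np.linalg.norm(right)
    up = np.cross(n, right)
    # Plane coordinates at unit distance from the projection center O.
    half = np.tan(np.radians(fov_deg) / 2.0)
    t = np.linspace(-half, half, size)
    px, py = np.meshgrid(t, -t)  # image rows run from top to bottom
    rays = n + px[..., None] * right + py[..., None] * up
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Convert each ray to longitude/latitude and look up the source pixel.
    theta = np.arctan2(rays[..., 1], rays[..., 0])
    phi = np.arcsin(np.clip(rays[..., 2], -1.0, 1.0))
    u = ((theta + np.pi) / (2.0 * np.pi) * w).astype(int) % w
    v = np.clip(((np.pi / 2.0 - phi) / np.pi * h).astype(int), 0, h - 1)
    return equi[v, u]
```

Calling this once per face normal of the chosen polyhedron would yield one low-distortion view per face, on which the element features described above can then be computed.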

Returning to FIG. 3, the description continues.

In the following step 103, the region feature calculation unit 103 divides the input image (omnidirectional image) into a plurality of regions by dividing its shooting directions into spatially equal parts, integrates the element features extracted from each divided region, and calculates the integrated value for each region as the region feature. For example, when the sphere of the omnidirectional image is approximated by a regular polyhedron as shown in FIG. 5, the region feature is the integrated value of the element features extracted from the perspective-projected image whose projection plane is the corresponding face of the polyhedron. When a color feature composed of RGB is used as the element feature, the R, G, and B values are each integrated within each divided region.

FIG. 6 illustrates a case in which region features are calculated using three types of element features: edge strength, saliency, and object position (face distribution). When region features are calculated from two or more types of element features in this way, the number of calculated region features equals the number of divided regions multiplied by the number of element feature types.
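The per-region integration can be sketched as below, assuming that element features have already been sampled at known unit directions and that each sample is assigned to the octahedron face whose normal is closest to it. The names and the octahedron choice are illustrative; integration is implemented here as a plain sum.

```python
import numpy as np
from itertools import product

# Face normals of a regular octahedron: (+-1, +-1, +-1) / sqrt(3).
OCTA_NORMALS = np.array(list(product([1.0, -1.0], repeat=3))) / np.sqrt(3.0)

def region_features(directions, features, normals=OCTA_NORMALS):
    """Sum element features per region.

    directions: (N, 3) unit vectors at which features were sampled.
    features:   (N, F) element features (F feature types).
    Returns an (S, F) array, where S is the number of regions.
    """
    # A sample belongs to the face whose normal has the largest dot product.
    region = np.argmax(directions @ normals.T, axis=1)
    out = np.zeros((len(normals), features.shape[1]))
    np.add.at(out, region, features)  # unbuffered accumulation per region
    return out
```

Flattening the result gives the region feature vector, whose length is the number of regions times the number of feature types (8 × 3 = 24 for an octahedron with three feature types).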

In the following step 104, the attention point regression unit 104 calculates the position of the attention point from the region features calculated in step 103, using a predetermined regression model prepared in advance. The position y of the attention point can be expressed by formula (2) below.

In formula (2) above, x denotes the region feature vector, f denotes the regression model, and α denotes the regression parameters. The regression parameters α are identified in advance by machine learning using training data (a plurality of pairs of x and y). Any known regression method, such as linear regression, logistic regression, support vector regression, random forest regression, or a neural network, can be used.

The case of using support vector regression is described below as an example.

In this case, the regression parameters α consist of the support vectors {s_i}, the support vector weights {w_i}, and the offset h (in practice, the kernel type and the kernel parameters also exist as hyperparameters). The position y of the attention point is expressed as a unit direction (e_x, e_y, e_z) in three-dimensional space, and a regression model from the region feature vector x is constructed for each of e_x, e_y, and e_z. The regression model f can then be expressed by formula (3) below, in which K denotes the kernel.
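Formula (3) is not reproduced in this text. Assuming it takes the standard support vector regression form f(x) = Σ_i w_i K(s_i, x) + h, prediction with already-trained parameters can be sketched as follows. The RBF kernel, its `gamma` parameter, and the final renormalization to a unit vector are assumptions of this example and are not stated in the source.

```python
import numpy as np

def rbf_kernel(s, x, gamma=1.0):
    """K(s_i, x) = exp(-gamma * ||s_i - x||^2), evaluated for every row of s."""
    return np.exp(-gamma * np.sum((s - x) ** 2, axis=-1))

def svr_predict(x, support_vectors, weights, offset, gamma=1.0):
    """Evaluate f(x) = sum_i w_i K(s_i, x) + h for one output component."""
    return float(weights @ rbf_kernel(support_vectors, x, gamma) + offset)

def predict_direction(x, models, gamma=1.0):
    """Predict the attention direction from three independent models.

    models: three (support_vectors, weights, offset) tuples, one each
    for e_x, e_y, and e_z.
    """
    e = np.array([svr_predict(x, s, w, h, gamma) for s, w, h in models])
    # Three independent regressors need not return a unit vector, so the
    # result is renormalized (an assumption of this sketch).
    return e / np.linalg.norm(e)
```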

Finally, in step 105, the attention point output unit 105 outputs the position of the attention point calculated in step 104, and the processing ends.

When the present embodiment is applied to cropping or thumbnail generation, a region of interest is defined by setting a specific angle of view centered on the attention point obtained by the above procedure, and the image of the defined region of interest is used directly as the cropped image or thumbnail image. In this case, the angle of view to be set is desirably that of the region of interest containing the attention point in the training data given to the regression model. When the present embodiment is applied to an image recognition or image retrieval system, the object region containing the attention point is used as the object to be recognized or retrieved.

As described above, in the present embodiment, element features are calculated after the image has been decomposed into partial images (divided regions) with little distortion, so that ultra-wide-angle images exceeding 180 degrees can be processed robustly.

In addition, the present embodiment does not simply merge the saliency maps or object distributions obtained from the partial images. Instead, it estimates the attention point with a regression model from region features aggregated for each divided region. As a result, cross-region rules, such as "when an object X is present in region A and an object Y is present in region B, take C as the attention point", are acquired in the regression model through machine learning, which makes it possible to estimate the attention point while taking into account the interaction of features between regions.

The following design modifications are possible in the first embodiment described above.

For example, the region division of the input image performed when calculating region features in step 103 is not limited to approximating the sphere of the omnidirectional image by a regular polyhedron; any division method can be adopted. The sphere of the omnidirectional image may be approximated and divided by a quasi-regular polyhedron, or it may be divided by a Voronoi partition based on generating points placed at random on the sphere. The division method used in the piecewise perspective projection for element feature extraction and the division method used in the region division for region feature calculation need not coincide, but it is preferable that they coincide from the viewpoint of reducing computational cost.
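The Voronoi alternative can be sketched as follows: generating points are drawn at random on the unit sphere, and each direction is assigned to the nearest generating point. On the unit sphere the nearest point in angle is the one with the largest dot product. The function names and the use of normalized Gaussian samples to obtain uniformly distributed points are assumptions of this example.

```python
import numpy as np

def random_sphere_points(k, seed=0):
    """Draw k points uniformly at random on the unit sphere."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=(k, 3))  # isotropic Gaussian, so directions are uniform
    return p / np.linalg.norm(p, axis=1, keepdims=True)

def voronoi_labels(directions, seeds):
    """Index of the nearest generating point for each unit direction.

    The largest dot product corresponds to the smallest angular distance.
    """
    return np.argmax(directions @ seeds.T, axis=1)
```

The labels returned here can play the same role as the polyhedron face index when element features are summed per region.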

The image from which element features are extracted in step 102 is not limited to a perspective projection of the omnidirectional image; it may be an image projected by another projection method. For example, it may be an orthographically projected image, or, as shown in FIGS. 7(a) and 7(b), an image obtained by a perspective projection transformation in which the projection center O is shifted from the center of the unit sphere. The projection method shown in FIGS. 7(a) and 7(b) mitigates projective distortion at the image edges and also allows projection with an angle of view of 180 degrees or more, so that element features can be extracted with fewer image divisions.
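The effect of shifting the projection center can be illustrated with a small sketch. It assumes a viewing direction along +z, a projection plane at z = 1, and a projection center moved to (0, 0, -d) behind the sphere center; this geometry and the parameter `d` are assumptions made for illustration and may differ from what FIG. 7 actually depicts.

```python
def project_shifted(p, d=0.0):
    """Project a unit vector p = (x, y, z) onto the plane z = 1.

    The projection center is at (0, 0, -d). d = 0 gives the ordinary
    perspective (gnomonic) projection; d = 1 gives the stereographic
    projection. Points with z <= -d cannot be projected.
    """
    x, y, z = p
    if z + d <= 0:
        raise ValueError("point lies behind the projection center")
    # The ray from (0, 0, -d) through p reaches z = 1 at parameter t.
    t = (1.0 + d) / (z + d)
    return t * x, t * y
```

With d = 0 a direction 90 degrees off-axis cannot be projected at all, whereas with d = 1 it lands at a finite distance of 2 from the image center, which is consistent with the statement that angles of view of 180 degrees or more become possible.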

When the image to be processed was captured by a camera whose angle of view does not reach 360 degrees, the image covering that angle of view is converted into the Equirectangular format, and the resulting image (an image with partially missing content) is processed by the same procedure as described above.

Furthermore, even when the image to be processed is not in the Equirectangular format, it can be handled in the same manner as described above as long as the camera that captured it has been calibrated (that is, the direction of the ray in three-dimensional space corresponding to each position on the camera's imaging surface is known). When the image to be processed was captured by an uncalibrated camera, the method of approximating and dividing the image by a regular polyhedron cannot be applied; in that case, the region may be divided by another applicable division method (for example, the Voronoi partition described above).

The first embodiment of the present invention has been described above. Next, the second embodiment of the present invention is described. In the following, the description of parts common to the first embodiment is omitted, and only the differences from the first embodiment are described.

(Second embodiment)
The image analysis apparatus 100B of the second embodiment has a function of integrating element features of different types within each region and estimating the attention point of the input image from the integrated region features.

FIG. 8 shows a functional block diagram of the image analysis apparatus 100B. As shown in FIG. 8, the functional configuration of the image analysis apparatus 100B is the same as that of the image analysis apparatus 100A of the first embodiment, except that it additionally includes a region feature integration unit 110.

The region feature integration unit 110 is means for mapping the region features to lower-dimensional features to obtain integrated region features.

The processing executed by the image analysis apparatus 100B is described below based on the flowchart shown in FIG. 9.

Steps 101 to 103 are the same as steps 101 to 103 described above with reference to FIG. 3, so their description is omitted; the description here starts from step 110.

In step 110, the region feature integration unit 110 integrates the region features calculated in the preceding step into lower-dimensional features. Specifically, as shown in formula (4) below, the region feature integration unit 110 obtains, for the region feature vector x_i of region i, a low-dimensional integrated region feature vector x_i' by a mapping g. In the present embodiment, the mapping g is either designed in advance or identified by machine learning.

In the following step 104, the attention point regression unit 104 calculates the position of the attention point from the integrated region feature vectors x_i' obtained in step 110, using a predetermined regression model prepared in advance. The position y of the attention point can be expressed by formula (5) below.

Note that {x_i'} in formula (5) above denotes, when there are S regions, the set given by formula (6) below.

The mapping g is now described.

The simplest mapping g is one that adds up all the elements of the region feature vector x_i. In this case, the region feature vector x_i is reduced to a single dimension.

As another example, a linear transformation W from R^n to R^m (m < n) can be adopted as the mapping g, as shown in formula (7) below.
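The two mappings described here can be written down directly. Formulas (4) and (7) are not reproduced in this text; the sketch assumes they read x_i' = g(x_i) and x_i' = W x_i respectively, and the function names are illustrative.

```python
import numpy as np

def g_sum(x_i):
    """Simplest mapping: add all elements, giving a 1-dimensional feature."""
    return np.array([np.sum(x_i)])

def g_linear(x_i, W):
    """Linear mapping from R^n to R^m (m < n); W has shape (m, n)."""
    return W @ x_i
```

The summation mapping is the special case of the linear mapping in which W is a single row of ones, so learning W generalizes it by letting each element feature type receive its own weight.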

The linear transformation W can be acquired by machine learning when pairs of a region feature vector x and an attention point position y are given as training data. That is, when the mapping f from the integrated region features x_i' to the attention point position y has already been determined, the {x_i'} satisfying formula (5) above is obtained for each y in the training data, and the mapping g (that is, the matrix W) can be learned from the pairs of {x_i'} and x. When the mapping f has not been determined, f and g can be obtained by repeating a process of learning g for a provisionally determined f and then learning f for the learned g. When f and g are both linear transformations and W maps R^n to R, f can be expressed by a matrix V as shown in formula (8) below.

Combining formulas (8) and (7), the whole model can be expressed by the following formulas (9) and (10).

In formula (9), V and W, that is, f and g, can be obtained by repeating a process in which V is fixed and W is found by treating the problem as a linear regression from VX^T to y, and then W is fixed and V is found by treating the problem as a linear regression from WX to y.
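The alternating procedure can be sketched as follows. Formulas (9) and (10) are not reproduced in this text, so the sketch assumes a bilinear model y ≈ V (X w), where X (shape (S, n)) stacks the region feature vectors of one image, w (length n) plays the role of W: R^n → R, and V (shape (d, S)) plays the role of f. The matrix layout, the function name `fit_bilinear`, and the synthetic data are assumptions, and ordinary least squares stands in for whichever linear regression is actually used.

```python
import numpy as np

def fit_bilinear(Xs, Ys, n_iter=20, seed=0):
    """Alternating least squares for y ≈ V @ (X @ w).

    Xs: (N, S, n) region features of N training images.
    Ys: (N, d) positions of the point of interest.
    """
    rng = np.random.default_rng(seed)
    N, S, n = Xs.shape
    d = Ys.shape[1]
    w = rng.normal(size=n)          # provisional g
    V = rng.normal(size=(d, S))     # provisional f
    losses = []
    for _ in range(n_iter):
        # Fix V: y_k = (V @ X_k) @ w is linear in w.
        A = np.einsum('ds,ksn->kdn', V, Xs).reshape(N * d, n)
        w, *_ = np.linalg.lstsq(A, Ys.reshape(N * d), rcond=None)
        # Fix w: y_k = V @ (X_k @ w) is linear in V.
        Z = Xs @ w                  # (N, S) integrated region features
        Vt, *_ = np.linalg.lstsq(Z, Ys, rcond=None)
        V = Vt.T
        losses.append(float(np.mean((Z @ V.T - Ys) ** 2)))
    return V, w, losses

# Synthetic example (hypothetical sizes): N = 200, S = 6, n = 5, d = 3.
rng = np.random.default_rng(1)
Xs = rng.normal(size=(200, 6, 5))
true_V = rng.normal(size=(3, 6))
true_w = rng.normal(size=5)
Ys = (Xs @ true_w) @ true_V.T
V, w, losses = fit_bilinear(Xs, Ys)
```

Each half-step solves a least-squares problem exactly, so the recorded loss should not increase from one iteration to the next. The scale of V and w is not identifiable separately, only their product.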

Mappings other than linear transformations can also be used as g. Ultimately, finding g amounts to solving a regression problem, so, as with f, known regression methods such as support vector regression, random forest regression, and neural networks can be used.

As described above, according to the present embodiment, the number of parameters of the regression model can be reduced by aggregating the region features into a smaller number of integrated region features. Taking linear regression as an example, the first embodiment requires a number of parameters proportional to "(number of element features) × (number of region divisions)", whereas the present embodiment reduces this to a number proportional to "(number of element features) + (number of region divisions)". A similar reduction is obtained for nonlinear regression. This suppresses the overfitting that arises when fitting the regression model, so the point of interest can be estimated accurately from a small amount of training data.
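To make the parameter counts concrete, the short sketch below compares the two linear cases for a hypothetical configuration. The function name, the example numbers, and the assumption that g maps each region feature vector to a single value are illustrative only.

```python
def linear_param_counts(n_features, n_regions, out_dim=1):
    """Return (full, reduced) parameter counts for linear regression.

    full:    regression directly on all region features,
             out_dim * n_features * n_regions weights.
    reduced: one shared map g from n_features to 1, followed by a
             regression on n_regions integrated values.
    """
    full = out_dim * n_features * n_regions
    reduced = n_features + out_dim * n_regions
    return full, reduced

# Hypothetical example: 10 element features, 20 regions, scalar output.
full, reduced = linear_param_counts(10, 20)
```

With these example numbers the product form needs 200 weights and the sum form needs 30.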

The second embodiment of the present invention has been described above. Next, the third embodiment of the present invention will be described. In the following, description of the parts in common with the first embodiment is omitted, and only the differences from the first embodiment are described.

(Third Embodiment)
The image analysis apparatus 100C of the third embodiment has a function of calculating region features for soft-segmented regions and estimating a point of interest in the input image.

FIG. 10 shows a functional block diagram of the image analysis apparatus 100C. As shown in FIG. 10, the functional configuration of the image analysis apparatus 100C is the same as that of the image analysis apparatus 100A of the first embodiment, except that a region feature calculation unit 120 is provided in place of the region feature calculation unit 103.

Here, the region feature calculation unit 120 is means for calculating a region feature for each region by weighted addition of the element features, using a weight function that depends on position.

The processing executed by the image analysis apparatus 100C is described below with reference to the flowchart shown in FIG. 11.

Steps 101 and 102 are the same as steps 101 and 102 described above with reference to FIG. 3, so their description is omitted; the description here starts from step 120.

In step 120, the region feature calculation unit 120 calculates region features by accumulating, for each region, the element features extracted in step 102. In the present embodiment, adjacent regions overlap, and a probability P(i|q) of belonging to region i is defined for each position q = (X, Y, Z)^T on the unit sphere. The center coordinates of the regions can be set, as in the first embodiment, from the face centers of a polyhedron or by random generation. With c_i (a unit vector) as the center coordinates of region i, the belonging probability P(i|q) can be set, for example, as shown in the following formula (11).

In formula (11), β is a parameter; the smaller β is, the softer the segmentation. Formula (11) is only an example, however, and the belonging probability P is not limited to this form and can be designed freely.

Under these settings, the region feature calculation unit 120 calculates the region feature x_i for each region by weighted addition of the element features, using a weight function that depends on position. Specifically, for each region, the element features at each position q are weighted by the belonging probability and accumulated to obtain the region feature x_i. More specifically, for the element feature vector a(q) at position q, the region feature x_i of region i is obtained by the following formula (12).

Formula (12) can be seen to be a generalization of the first embodiment. That is, the first embodiment can be regarded as the special case (hard segmentation) of formula (12) in which the belonging probability P(i|q) takes only the values 0 and 1.
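The soft-segmentation step can be sketched as follows. Formulas (11) and (12) are not reproduced in this text, so the sketch assumes a softmax over β times the inner product of q and c_i for the belonging probability, which is only one choice consistent with the statement that a smaller β gives a softer segmentation, and a plain sum over sampled positions for the accumulation. The function names, the use of cube face centers as region centers, and the random sample data are likewise assumptions.

```python
import numpy as np

def soft_membership(Q, C, beta):
    """Assumed form of P(i|q): softmax over beta * <c_i, q>.

    Q: (M, 3) unit vectors (positions), C: (S, 3) unit vectors (region centers).
    Returns (M, S); each row sums to 1.
    """
    logits = beta * (Q @ C.T)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def region_features(A, P):
    """x_i = sum over q of P(i|q) * a(q).

    A: (M, n) element feature vectors, P: (M, S). Returns (S, n).
    """
    return P.T @ A

# Region centers: the six face centers of a cube (hypothetical choice).
C = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
              [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
rng = np.random.default_rng(0)
Q = rng.normal(size=(100, 3))
Q /= np.linalg.norm(Q, axis=1, keepdims=True)     # project onto the unit sphere
A = rng.random(size=(100, 4))                     # 4 element features per position
P_soft = soft_membership(Q, C, beta=5.0)
x_soft = region_features(A, P_soft)
```

As β grows, each row of the membership matrix approaches a one-hot vector selecting the nearest center, which corresponds to the hard segmentation of the first embodiment.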

Generalizing further beyond probabilities, the region feature x_i can be obtained by the following formula (13) using an arbitrary function h_i(q).

In the present embodiment, spherical harmonics can be used as h_i(q) in formula (13).
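As an illustration of using spherical harmonics as the weight functions, the sketch below evaluates the real spherical harmonics of degree 0 and 1 at each position and accumulates the element features against them. Formula (13) is not reproduced in this text, so the discrete sum over sampled positions, the truncation at degree 1, and the function names are assumptions. Here the index i runs over basis functions rather than over spatial regions.

```python
import numpy as np

def real_sh_basis(Q):
    """Real spherical harmonics of degree 0 and 1 at unit vectors Q.

    Q: (M, 3) array of (x, y, z). Returns (M, 4) with columns
    Y_0^0, Y_1^-1, Y_1^0, Y_1^1.
    """
    x, y, z = Q[:, 0], Q[:, 1], Q[:, 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=1)

def sh_region_features(A, Q):
    """x_i = sum over q of h_i(q) * a(q), with h_i the basis functions above.

    A: (M, n) element feature vectors. Returns (4, n).
    """
    H = real_sh_basis(Q)
    return H.T @ A

# Hypothetical example: two positions, one element feature each.
Q = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
A = np.array([[2.0],
              [1.0]])
H = real_sh_basis(Q)
x_sh = sh_region_features(A, Q)
```

Higher degrees could be added in the same way. A library routine for spherical harmonics could replace the hand-written basis if one is available in the target environment.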

In the subsequent step 104, the attention point regression unit 104 uses a predetermined regression model prepared in advance to calculate the position of the point of interest from the region features calculated in step 120. Finally, in step 105, the attention point output unit 105 outputs the position of the point of interest calculated in step 104, and the processing ends.

As described above, according to the present embodiment, soft segmentation of the regions reduces the error caused by discretizing the regions, making it possible to estimate the point of interest with higher accuracy.

The third embodiment of the present invention has been described above. Next, the fourth embodiment of the present invention will be described. In the following, description of the parts in common with the first embodiment is omitted, and only the differences from the first embodiment are described.

(Fourth Embodiment)
The image analysis apparatus 100D of the fourth embodiment has a function of estimating a plurality of points of interest from an input image.

FIG. 12 shows a functional block diagram of the image analysis apparatus 100D. As shown in FIG. 12, the functional configuration of the image analysis apparatus 100D is the same as that of the image analysis apparatus 100A of the first embodiment, except that an element feature integration unit 130 and an attention point search unit 140 are provided in place of the region feature calculation unit 103 and the attention point regression unit 104.

Here, the element feature integration unit 130 is means for integrating the element features at each position of the input image into a single value to obtain an integrated element feature, and the attention point search unit 140 is means for calculating one or more points of interest as local solutions of an evaluation function formed as a sum of products of the integrated element feature and a predetermined window function.

The processing executed by the image analysis apparatus 100D is described below with reference to the flowchart shown in FIG. 13.

Steps 101 and 102 are the same as steps 101 and 102 described above with reference to FIG. 3, so their description is omitted; the description here starts from step 130.

In step 130, the element feature integration unit 130 integrates the element features. In the present embodiment, the element feature vector obtained at each position q is integrated into a one-dimensional value by the same method as in the second embodiment. The difference is that the second embodiment integrates the features for each region, whereas the present embodiment integrates them for each position. The integration method is determined in advance using the learning method described in the second embodiment.

In the subsequent step 140, the attention point search unit 140 searches for the position of the point of interest. Specifically, using a window function ψ and the one-dimensional value b(q) obtained in step 130 by aggregating the element feature vector at each position q, the evaluation function J(p) shown in the following formula (14) is constructed.

In the present embodiment, one or more local solutions p at which the value of the evaluation function J(p) is equal to or greater than a threshold are found and taken as the points of interest. A δ function, a Gaussian function, or the like can be used as the window function.
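The search in step 140 can be sketched as follows. Formula (14) is not reproduced in this text, so the sketch assumes J(p) is the sum over q of ψ(p, q) b(q), with a Gaussian-like window ψ(p, q) = exp(κ (p·q - 1)) on the unit sphere. Restricting candidate points p to the sampled positions, the neighborhood radius used to decide what counts as a local maximum, and all names and parameter values are further assumptions.

```python
import numpy as np

def fibonacci_sphere(m):
    """Return m roughly evenly spaced unit vectors, shape (m, 3)."""
    i = np.arange(m)
    z = 1.0 - (2.0 * i + 1.0) / m
    phi = i * np.pi * (3.0 - np.sqrt(5.0))
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def find_attention_points(Q, b, kappa=50.0, radius=0.3, threshold=0.5):
    """Return indices of local maxima of J with J >= threshold, and J itself.

    Q: (M, 3) sampled positions on the unit sphere.
    b: (M,) integrated element feature at each position.
    """
    cos = np.clip(Q @ Q.T, -1.0, 1.0)
    J = np.exp(kappa * (cos - 1.0)) @ b          # sum of products with the window
    near = cos >= np.cos(radius)                 # neighbors within `radius` radians
    neighbour_max = np.where(near, J[None, :], -np.inf).max(axis=1)
    is_peak = (J >= neighbour_max) & (J >= threshold)
    return np.flatnonzero(is_peak), J

# Hypothetical input: two isolated responses near opposite poles.
Q = fibonacci_sphere(500)
b = np.zeros(500)
b[0] = 1.0
b[499] = 1.0
peaks, J = find_attention_points(Q, b)
```

Because every local maximum above the threshold is returned, the same routine yields one point of interest or several, depending on the input.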

As described above, according to the present embodiment, a plurality of points of interest can be estimated from the input image.

Finally, the hardware configuration of the computer constituting the image analysis apparatus 100 of the present embodiment is described with reference to FIG. 14.

As shown in FIG. 14, the computer constituting the image analysis apparatus 100 of the present embodiment includes a processor 10 that controls the operation of the entire apparatus, a ROM 12 that stores a boot program, a firmware program, and the like, a RAM 14 that provides an execution space for programs, an auxiliary storage device 15 that stores an operating system (OS) and the programs that cause the image analysis apparatus 100 to function as each of the means described above, an input/output interface 16 for connecting external input/output devices, and a network interface 18 for connecting to a network.

Each function of the embodiments described above can be realized by a program written in C, C++, C#, Java (registered trademark), or the like. The program of the present embodiment can be distributed by storing it on a recording medium such as a hard disk device, CD-ROM, MO, DVD, flexible disk, EEPROM, or EPROM, and can also be transmitted over a network in a format usable by other devices.

The present invention has been described above by way of embodiments, but it is not limited to those embodiments. Any embodiment that a person skilled in the art could conceive falls within the scope of the present invention as long as it exhibits the operations and effects of the present invention.

10... Processor
12... ROM
14... RAM
15... Auxiliary storage device
16... Input/output interface
18... Network interface
100... Image analysis apparatus
101... Image input unit
102... Element feature extraction unit
103... Region feature calculation unit
104... Attention point regression unit
105... Attention point output unit
110... Region feature integration unit
120... Region feature calculation unit
130... Element feature integration unit
140... Attention point search unit

Japanese Patent No. 4538008
Japanese Patent No. 3411971
Japanese Patent No. 5158974
Japanese Patent No. 5766620
Japanese Patent No. 5865078

L. Itti, et al., "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis & Machine Intelligence 11 pp. 1254-1259, 1998.
R. Zhao, et al., "Saliency detection by multi-context deep learning," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
X. Huang, et al., "SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks," Proceedings of the IEEE International Conference on Computer Vision, 2015.

Claims

An image analysis apparatus for extracting a point of interest from an input image, comprising:
an element feature extraction unit that extracts element features at each position of the input image;
a region feature calculation unit that divides the input image into a plurality of regions and integrates the element features for each of the divided regions to calculate region features; and
an attention point regression unit that calculates the point of interest of the input image from the calculated region features based on a predetermined regression model,
wherein the region feature calculation unit divides the input image by spatially dividing the shooting direction of the input image so as to correspond to each face of a polyhedron that approximates the spherical surface of the input image.

The image analysis apparatus according to claim 1, wherein the element feature extraction unit extracts the element features based on projection-transformed images obtained by projecting the input image onto each face of the polyhedron.

The image analysis apparatus according to claim 1, wherein the polyhedron is a regular polyhedron.

The image analysis apparatus according to claim 1, further comprising a region feature integration unit that maps the region features to lower-dimensional features to obtain integrated region features,
wherein the attention point regression unit calculates the point of interest from the integrated region features based on the regression model.

The image analysis apparatus according to any one of claims 1 to 4, wherein the region feature calculation unit calculates the region feature for each region by weighted addition of the element features, using a weight function that depends on position.

The image analysis apparatus according to any one of claims 1 to 5, wherein the regression model is selected from the group consisting of linear regression, logistic regression, support vector regression, random forest regression, and neural networks.

The image analysis apparatus according to claim 1, wherein the element feature is at least one element feature selected from the group consisting of color, edge, saliency, and object position/label.

A method of extracting a point of interest from an input image, comprising:
extracting element features at each position of the input image;
dividing the input image into a plurality of regions and integrating the element features for each of the divided regions to calculate region features; and
calculating the point of interest of the input image from the calculated region features based on a predetermined regression model,
wherein the step of calculating the region features includes dividing the input image by spatially dividing the shooting direction of the input image so as to correspond to each face of a polyhedron that approximates the spherical surface of the input image.

The method according to claim 8, further comprising mapping the region features to lower-dimensional features to obtain integrated region features,
wherein the step of calculating the point of interest includes calculating the point of interest from the integrated region features based on the regression model.

The method according to claim 8, wherein the step of calculating the region features includes calculating the region feature for each region by weighted addition of the element features, using a weight function that depends on position.

The method according to any one of claims 8 to 10, wherein the regression model is selected from the group consisting of linear regression, logistic regression, support vector regression, random forest regression, and neural networks.

The method according to any one of claims 8 to 11, wherein the element feature is at least one element feature selected from the group consisting of color, edge, saliency, and object position/label.

A computer program for causing a computer to execute the steps of the method according to any one of claims 8 to 12.