JP6675584B2

JP6675584B2 - Image processing apparatus, image processing method, and program

Info

Publication number: JP6675584B2
Application number: JP2016097610A
Authority: JP
Inventors: 崇之原
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-05-16
Filing date: 2016-05-16
Publication date: 2020-04-01
Anticipated expiration: 2036-05-16
Also published as: JP2017208591A

Description

The present invention relates to an image processing apparatus, an image processing method, and a program.

Various techniques have been proposed for searching an input image for a visually superior composition based on a region of interest or feature values extracted from that image.

In this regard, Patent Document 1 discloses a technique that extracts saliency from an image and crops the image by applying predetermined composition rules. Patent Document 2 discloses a technique that extracts a region of interest and crops the image by referring to its correlation with reference compositions obtained from paintings and photographs. Patent Document 3 discloses a technique that extracts a region of interest and an unnecessary region and generates a composition that includes the former and excludes the latter. Patent Document 4 discloses a method of evaluating a trimming range based on the position of the main subject, the position of the interest area, the consistency of the region of interest, and the preservation of objects. Patent Document 5 discloses a technique that adaptively sets weights for the edges of the region of interest and of the background region and evaluates the quality of a composition by a correlation operation with a composition grid.

However, the conventional techniques merely search the input image for a visually superior composition in two dimensions.

The present invention has been made in view of the above, and its object is to provide an image processing apparatus capable of searching an input image for a composition three-dimensionally.

After intensive study of an image processing apparatus capable of searching an input image for a composition three-dimensionally, the inventor arrived at the following configuration and thereby at the present invention.

That is, the present invention provides an image processing apparatus that searches an input image for a composition, the apparatus including: a region-of-interest extraction unit that extracts a region of interest from the input image; a composition setting unit that sets, as a composition of interest, the composition of a virtual captured image obtained by capturing, with a virtual camera, an area of the input image that includes the region of interest, and that sets a plurality of compositions of interest by changing a predetermined camera parameter of the virtual camera within a predetermined search range; a composition evaluation unit that calculates an evaluation value of the composition of interest corresponding to the virtual captured image based on the similarity between the virtual captured image and a template image having a target composition; and a composition search unit that searches for the composition of interest with the maximum evaluation value.

As described above, the present invention provides an image processing apparatus capable of searching an input image for a composition three-dimensionally.

FIG. 1 is a functional block diagram of the image processing apparatus according to the embodiment.
FIG. 2 is a flowchart showing the processing executed by the image processing apparatus according to the embodiment.
FIG. 3 is a conceptual diagram for describing the processing executed by the composition setting unit.
FIG. 4 is a diagram showing composition templates.
FIG. 5 is a conceptual diagram for describing the processing executed by the edge calculation unit.
FIG. 6 is a conceptual diagram for describing an image in the Equirectangular format (equidistant cylindrical projection).
FIG. 7 is a conceptual diagram for describing the processing executed by the composition setting unit.
FIG. 8 is a conceptual diagram for describing the processing executed by the composition setting unit.
FIG. 9 is a diagram showing the little-planet composition template.
FIG. 10 is a conceptual diagram for describing the processing executed by the main straight line calculation unit.
FIG. 11 is a conceptual diagram for describing the processing executed by the composition evaluation unit.
FIG. 12 is a hardware configuration diagram of the image processing apparatus according to the embodiment.

The present invention is described below by way of an embodiment, but it is not limited to the embodiment described below. In the drawings referred to below, common elements are denoted by the same reference numerals, and their description is omitted where appropriate.

The image processing apparatus 100 according to an embodiment of the present invention has a function of searching an input image for a visually superior composition. The functional configuration of the image processing apparatus 100 is described below with reference to the functional block diagram shown in FIG. 1.

As shown in FIG. 1, the image processing apparatus 100 includes an image input unit 101, a region-of-interest extraction unit 102, a composition setting unit 103, an edge calculation unit 104, a main straight line calculation unit 105, a composition evaluation unit 106, a composition search unit 107, and an image output unit 108.

The image input unit 101 inputs the image to be processed.

The region-of-interest extraction unit 102 extracts a region of interest from the image to be processed.

The composition setting unit 103 sets, as a composition of interest, the composition of a virtual captured image obtained by capturing the image to be processed with a virtual camera.

The edge calculation unit 104 calculates edge information of the virtual captured image.

The main straight line calculation unit 105 calculates main straight lines based on the edge information calculated from the virtual captured image.

The composition evaluation unit 106 calculates an evaluation value of the composition of interest corresponding to the virtual captured image based on the similarity between the virtual captured image and a template image having a target composition.

The composition search unit 107 searches for the composition of interest with the maximum evaluation value.

The image output unit 108 crops the image to be processed with the composition of interest that has the maximum evaluation value and outputs the result.

In the present embodiment, a computer constituting the image processing apparatus 100 executes a predetermined program, whereby the image processing apparatus 100 functions as each of the units described above.

The functional configuration of the image processing apparatus 100 has been described above. Next, the processing executed by the image processing apparatus 100 is described with reference to the flowchart shown in FIG. 2.

First, in step 101, the image input unit 101 reads the image to be processed from an arbitrary storage unit and inputs it. The image thus input is hereinafter referred to as the "input image".

In the next step 102, the region-of-interest extraction unit 102 extracts a region of interest from the input image. The region of interest can be extracted either by object detection, which detects objects such as faces or people in the image, or from a saliency map. Extraction based on a saliency map can be performed, for example, by the methods disclosed in L. Itti, et al., "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis & Machine Intelligence, 11, pp. 1254-1259, 1998 (Non-Patent Document 1) and R. Zhao, et al., "Saliency detection by multi-context deep learning," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015 (Non-Patent Document 2). In the present embodiment, the term "region of interest" covers not only a region having an area but also a point having no area (a point of interest).

In the next step 103, the composition setting unit 103 sets, as the "composition of interest", the composition of a virtual captured image obtained by capturing, with a virtual camera, a rectangular area that includes the region of interest extracted in step 102. The procedure for setting the composition of interest is described below with reference to FIG. 3.

In the present embodiment, a virtual camera is first defined that shares its projection center with the projection center O of the real camera that captured the input image. Then, as shown in FIG. 3(a), the centroid of the region of interest is calculated, and the three-dimensional rotation parameters of the virtual camera are determined such that the optical axis a, which passes through the projection center O and the center of the imaging plane of the virtual camera (hereinafter, the virtual camera imaging plane), passes through the centroid of the region of interest. Next, a virtual captured image is generated by applying a perspective projection transformation to the input image using the determined three-dimensional rotation parameters. In the present embodiment, the composition of the virtual captured image obtained at this point is referred to as the "reference composition".

Determining the three-dimensional rotation parameters of the virtual camera presupposes that the internal parameters of the real camera that captured the input image (focal length, optical axis center, lens distortion coefficients, angle of view, and so on) are known (calibrated). If they are known, the three-dimensional direction of the centroid of the region of interest in the input image, as seen from the projection center O of the real camera, can be determined, and from it the three-dimensional rotation parameters of the virtual camera required for the optical axis a to pass through that centroid.
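The following is a minimal sketch of this step, assuming an ideal pinhole camera with no lens distortion. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the patent. It converts the centroid pixel into a viewing direction from O and builds, with Rodrigues' formula, one rotation that turns the optical axis onto that direction.

```python
import numpy as np

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Unit direction, in camera coordinates, of the ray from O through pixel (u, v).
    Assumes a distortion-free pinhole model (an assumption of this sketch)."""
    d = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    return d / np.linalg.norm(d)

def rotation_to_target(a, b):
    """Rotation matrix R such that R @ a is parallel to b (Rodrigues' formula)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)            # rotation axis scaled by sin(angle)
    c = float(np.dot(a, b))       # cos(angle)
    if c < -1.0 + 1e-12:
        raise ValueError("opposite vectors: rotation axis is not unique")
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + (K @ K) / (1.0 + c)

# Hypothetical example: centroid at pixel (800, 300) in a 1280x720 image.
optical_axis = np.array([0.0, 0.0, 1.0])
ray = pixel_to_ray(800.0, 300.0, 1000.0, 1000.0, 640.0, 360.0)
R_ref = rotation_to_target(optical_axis, ray)  # rotation of the reference composition
```

This rotation leaves the roll about the optical axis unconstrained; the patent text does not state how roll is chosen, so the sketch simply takes the shortest rotation.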

In the present embodiment, the internal parameters of the real camera may be obtained from the Exif tags of the input image, or by a known camera calibration method such as that disclosed in Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 11, pp. 1330-1334, 2000 (Non-Patent Document 3). If accurate internal parameters of the real camera cannot be obtained, the three-dimensional rotation parameters of the virtual camera may, for convenience, be determined using a standard focal length (for example, 50 mm in 35 mm equivalent).
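As a rough illustration of the fallback case, a 35 mm equivalent focal length can be converted to pixel units and combined with a centered principal point. This sketch assumes the equivalence is taken along the 36 mm frame width and that pixels are square; both are assumptions of the example, and the function names are hypothetical.

```python
import numpy as np

def focal_length_px(f35_mm, image_width_px, frame_width_mm=36.0):
    """Focal length in pixels from a 35 mm equivalent focal length.
    Assumes the equivalence is defined along the 36 mm frame width."""
    return f35_mm * image_width_px / frame_width_mm

def default_intrinsics(width_px, height_px, f35_mm=50.0):
    """Fallback intrinsic matrix: square pixels, principal point at the image center."""
    f = focal_length_px(f35_mm, width_px)
    return np.array([[f, 0.0, width_px / 2.0],
                     [0.0, f, height_px / 2.0],
                     [0.0, 0.0, 1.0]])

K = default_intrinsics(3600, 2400)  # 50 mm equivalent on a 3600x2400 image
```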

In the present embodiment, the range over which the three-dimensional rotation parameters of the virtual camera are searched (hereinafter, the search range) is set in advance. Specifically, as shown in FIG. 3(b), the search range is the range in which the angle θ between the straight line L, which passes through the projection center O and the centroid of the region of interest, and the optical axis a of the virtual camera is within dθ. The angle θ is then discretized within this search range to define N angle conditions in advance (N is an integer of 2 or more; the same applies hereinafter).
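One way to enumerate such angle conditions is to sample optical-axis directions inside the cone of half-angle dθ around the line L. The patent only states that θ is discretized; the ring layout below (several tilt angles, each with evenly spaced azimuths) and the names `cone_directions`, `n_tilt`, and `n_azimuth` are assumptions of this sketch.

```python
import numpy as np

def cone_directions(axis, d_theta, n_tilt=3, n_azimuth=8):
    """Unit optical-axis directions whose angle to `axis` is at most d_theta (radians).
    The first entry is `axis` itself, i.e. the reference composition."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # Build an orthonormal basis (u, v) of the plane perpendicular to the axis.
    helper = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(axis, helper)
    u /= np.linalg.norm(u)
    v = np.cross(axis, u)
    dirs = [axis]
    for theta in np.linspace(d_theta / n_tilt, d_theta, n_tilt):
        for phi in np.linspace(0.0, 2.0 * np.pi, n_azimuth, endpoint=False):
            ring = np.cos(phi) * u + np.sin(phi) * v
            dirs.append(np.cos(theta) * axis + np.sin(theta) * ring)
    return np.array(dirs)

# N = 1 + 3 * 8 = 25 angle conditions within 10 degrees of the line L.
candidates = cone_directions([0.0, 0.0, 1.0], np.deg2rad(10.0))
```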

Then, in step 103, one angle condition at a time is selected in order from the N angle conditions, and the composition of the virtual captured image of the virtual camera that satisfies the selected angle condition is set as the "composition of interest". In step 103, the parameters of the virtual camera other than the three-dimensional rotation parameters (focal length, angle of view, and so on) are kept fixed.

In the next step 104, a feature value of the virtual captured image corresponding to the composition of interest set in step 103 is calculated. In the present embodiment, edge information of the image can be used as the feature value. In that case, the edge calculation unit 104 detects edges (points where the pixel value changes) in the virtual captured image corresponding to the composition of interest set in step 103 and calculates the edge information (edge strength and direction) as the feature value of the virtual captured image. Edge detection can be performed by a known method using a Sobel filter or a differential spatial filter (such as a Gabor filter).
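As an illustration of the Sobel option, the sketch below computes a per-pixel gradient vector for a grayscale image using NumPy slicing only. The function name and the replicate-padding at the border are choices of this example, not details from the patent.

```python
import numpy as np

def sobel_edges(gray):
    """Per-pixel edge vectors (gx, gy) of a 2-D grayscale image, shape (H, W, 2)."""
    g = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1 vertically.
    gx = (g[:-2, 2:] + 2.0 * g[1:-1, 2:] + g[2:, 2:]) \
       - (g[:-2, :-2] + 2.0 * g[1:-1, :-2] + g[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1 horizontally.
    gy = (g[2:, :-2] + 2.0 * g[2:, 1:-1] + g[2:, 2:]) \
       - (g[:-2, :-2] + 2.0 * g[:-2, 1:-1] + g[:-2, 2:])
    return np.stack([gx, gy], axis=-1)

image = np.zeros((5, 6))
image[:, 3:] = 1.0                                   # vertical step edge
edges = sobel_edges(image)
strength = np.hypot(edges[..., 0], edges[..., 1])    # edge strength
direction = np.arctan2(edges[..., 1], edges[..., 0]) # gradient direction in radians
```

For an RGB image the same function could be applied per channel and the three results concatenated, which gives a six-element vector per pixel.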

In the next step 105, the composition evaluation unit 106 calculates, for the composition of interest whose edge information was calculated in step 104, an evaluation value that rates its visual superiority, and stores the calculated evaluation value in a temporary storage unit in association with that composition of interest. In the present embodiment, a template image having a target composition is prepared in advance, and an evaluation value based on the similarity between the template image and the virtual captured image is calculated using a predetermined evaluation function. Evaluation functions applicable to the present embodiment are described below.

In the present embodiment, the evaluation function J_1 given by Equation (1) can be used.

The evaluation function J_1 of Equation (1) is designed to give a high score to compositions in which the region of interest lies at the center of the composition of interest and the edges of the image are horizontal.

In the present embodiment, the evaluation function J_2 given by Equation (2) can also be used.

In Equation (2), e(i,j) denotes the edge vector at position (i,j) of the composition of interest under evaluation, and o(i,j) denotes the edge vector at position (i,j) of the template image. Here, an edge vector is a column vector whose elements are the pixel-value gradients in the horizontal and vertical directions; it is two-dimensional for a grayscale image and six-dimensional for an RGB image.

Furthermore, in the present embodiment, the sum of the inner products of the template-image edge vectors o(i,j) and the edge vectors e(i,j) of the composition of interest can be used as the evaluation function J_3, as expressed by Equation (3).
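Read literally, the description of J_3 corresponds to J_3 = Σ_{i,j} o(i,j)ᵀ e(i,j). The sketch below is a reconstruction from that sentence only, not a transcription of the patent's Equation (3); the array layout (H, W, C) and the function name are assumptions.

```python
import numpy as np

def j3(e, o):
    """Sum over all positions (i, j) of the inner product o(i,j) . e(i,j).
    e, o: arrays of shape (H, W, C) holding per-pixel edge vectors."""
    e = np.asarray(e, dtype=float)
    o = np.asarray(o, dtype=float)
    if e.shape != o.shape:
        raise ValueError("edge maps must have the same shape")
    return float(np.sum(e * o))

# Hypothetical 2x3 grayscale example: every template vector points along +y.
template = np.zeros((2, 3, 2))
template[..., 1] = 1.0
aligned = np.zeros((2, 3, 2))
aligned[..., 1] = 1.0      # same direction as the template
orthogonal = np.zeros((2, 3, 2))
orthogonal[..., 0] = 1.0   # perpendicular to the template
score_aligned = j3(aligned, template)
score_orthogonal = j3(orthogonal, template)
```

With this form the score grows with both edge strength and alignment with the template, which matches the statement that larger values mean a better composition.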

The evaluation functions J_1, J_2, and J_3 described above are all designed so that the better the composition, the larger the evaluation value.

FIG. 4 illustrates the compositions of six types of template images applicable to the present embodiment: the centered (Hinomaru) composition, the rule-of-thirds composition, the horizontal composition, the vertical composition, the radial composition, and the diagonal composition. In the present embodiment, the evaluation value may be calculated using one type of template, or M evaluation values may be calculated using any M types of templates (M is an integer of 2 or more; the same applies hereinafter). When M evaluation values are calculated for one virtual captured image (composition of interest), the minimum of the M evaluation values is taken as the evaluation value of that composition of interest.

For a grayscale image, the edge vectors o(i,j) of the template images for the horizontal, vertical, and radial compositions are given by Equations (4) to (6), respectively.

In Equation (6), cx and cy denote the x and y coordinates of the center of the composition, and N(x) is a normalization operator that scales its argument vector x so that its L2 norm is 1.

The evaluation functions applicable to the present embodiment have been described above, but they are only examples, and evaluation functions based on other distance measures may be adopted.

In the next step 106, the composition search unit 107 determines whether all compositions within the preset search range have been searched. At this point compositions remain to be searched (step 106, No), so the process returns to step 103.

In step 103, the composition setting unit 103 sets, as a new composition of interest, the virtual captured image of the virtual camera that satisfies the next angle condition selected from the remaining angle conditions. In step 104, the edge calculation unit 104 calculates edge information from the composition of interest thus set. In step 105, the composition evaluation unit 106 calculates the evaluation value of the composition of interest corresponding to the virtual captured image based on the calculated edge information of the virtual captured image and the edge information of the template image. Thereafter, steps 103 to 106 are repeated as long as compositions within the search range remain to be searched (step 106, No).

When steps 103 to 106 are repeated, the edge calculation unit 104 preferably calculates, in step 104 of the second and subsequent iterations, the edge information of the relevant composition of interest by applying a three-dimensional transformation to the edge information calculated in step 104 of the first iteration.

Specifically, as shown in FIG. 5, an edge detected on an imaging plane can be represented in three-dimensional space by a pair consisting of a position vector from the projection center O and a direction vector of the edge. In step 104 of the second and subsequent iterations, the edge information of the virtual captured image can therefore be calculated directly by applying a perspective projection transformation, using the three-dimensional rotation parameters of the virtual camera, to the edges (pairs of position and direction vectors) detected in step 104 of the first iteration. With this method, step 104 of the second and subsequent iterations requires neither generation of the virtual captured image nor filtering, which reduces the computational cost accordingly. In this case, to reduce error, the edge information in step 104 of the first iteration is preferably calculated from the virtual captured image of the reference composition.
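A possible reading of this reuse step is sketched below: each edge is kept as a 3-D position and direction in the reference camera frame, both are rotated into the new virtual camera frame, and the result is projected. Mapping the direction through the Jacobian of the pinhole projection is a choice made for this sketch; the patent does not spell out the formula. The function name, the focal length `f` in pixels, and the convention that the columns of `R` are the new camera axes expressed in the reference frame are all assumptions.

```python
import numpy as np

def reproject_edges(points, dirs, R, f):
    """Project 3-D edges into a rotated virtual camera that shares the projection center.
    points, dirs: (N, 3) position and direction vectors in the reference frame.
    R: 3x3 rotation whose columns are the new camera axes in the reference frame.
    Returns image positions (N, 2) relative to the principal point and the
    image-plane edge directions (N, 2). Points with Z <= 0 should be discarded first."""
    p = np.asarray(points, dtype=float) @ R   # row-wise R.T @ p
    d = np.asarray(dirs, dtype=float) @ R     # row-wise R.T @ d
    X, Y, Z = p[:, 0], p[:, 1], p[:, 2]
    uv = f * np.stack([X / Z, Y / Z], axis=1)
    # Derivative of the projection (f*X/Z, f*Y/Z) along the direction d.
    duv = f * np.stack([(d[:, 0] * Z - X * d[:, 2]) / Z**2,
                        (d[:, 1] * Z - Y * d[:, 2]) / Z**2], axis=1)
    return uv, duv

# Hypothetical example: rotate the camera 45 degrees about the y axis.
c = s = np.sqrt(0.5)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
uv, duv = reproject_edges([[1.0, 0.0, 1.0]], [[0.0, 1.0, 0.0]], R, f=100.0)
```

In this example the new optical axis points straight at the edge position, so the edge lands on the principal point of the rotated view.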

When steps 103 to 106 have been repeated and all compositions within the search range have been searched (step 106, Yes), the process proceeds to step 107. At this point, N compositions of interest and their evaluation values are stored in the temporary storage unit, so in step 107 the composition search unit 107 determines the composition of interest having the largest of the N evaluation values.
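The loop over steps 103 to 107 amounts to an exhaustive search over the N candidates. The sketch below shows only that control flow; `candidates` and `evaluate` are hypothetical placeholders for the angle conditions and for the chain of virtual-image generation, edge calculation, and evaluation function.

```python
def search_best_composition(candidates, evaluate):
    """Evaluate every candidate once and return (best_candidate, best_score).
    candidates: non-empty iterable of camera-parameter settings.
    evaluate: callable mapping a candidate to its evaluation value (larger is better)."""
    scored = [(evaluate(c), c) for c in candidates]   # steps 103 to 106
    best_score, best = max(scored, key=lambda t: t[0])  # step 107
    return best, best_score

def evaluate_with_templates(candidate, score, templates):
    """When M templates are used, the text takes the minimum of the M values."""
    return min(score(candidate, t) for t in templates)

# Toy example: candidates are tilt angles in degrees, best score at 1 degree.
angles = [-2, -1, 0, 1, 2]
best, best_score = search_best_composition(angles, lambda a: -(a - 1) ** 2)
```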

Finally, in step 108, the image output unit 108 crops the input image with the composition of interest determined in step 107 and outputs the result. Specifically, the image output unit 108 outputs the virtual captured image (perspective-projection-transformed image) corresponding to the composition of interest determined in step 107, and the processing ends.

The description so far has covered setting a plurality of compositions of interest by changing the three-dimensional rotation parameters of the virtual camera within a predetermined search range. In the present embodiment, however, a plurality of compositions of interest may also be set by changing other camera parameters, such as focal length, optical axis center, lens distortion coefficients, angle of view, and translation, within a predetermined search range.

As described above, according to the present embodiment, varying the camera parameters of the virtual camera in the vicinity of a reference composition that includes the region of interest makes it possible to search for a composition three-dimensionally, taking into account three-dimensional rotation and translation in the three-dimensional space in which the input image was captured. This makes it possible, for example, to change the composition of an already captured image to improve its appearance. Moreover, limiting the search range to a predetermined range centered on the reference composition allows the composition to be searched at low computational cost.

According to the present embodiment, the same procedure can also be used to cut out a partial image with a visually superior composition from a wide-angle image, of which a panoramic image is a typical example. This is described below.

The Equirectangular format (equidistant cylindrical projection) is an image representation used mainly for panoramic photography. As shown in FIG. 6, it decomposes the three-dimensional direction of each pixel into latitude and longitude and arranges the corresponding pixel values on a square grid. From an image in the Equirectangular format, the pixel value for any three-dimensional direction can be obtained from its longitude and latitude coordinates; conceptually, the image can be regarded as pixel values plotted on a unit sphere.
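The mapping between a viewing direction and an Equirectangular pixel can be sketched as follows. The axis convention (z forward, y up, longitude measured from +z toward +x) and the pixel-origin convention are assumptions of this example; real panorama files may use different ones.

```python
import numpy as np

def direction_to_equirect(d, width, height):
    """Pixel coordinates (u, v) of the 3-D direction d in an equirectangular image."""
    x, y, z = np.asarray(d, dtype=float) / np.linalg.norm(d)
    lon = np.arctan2(x, z)                     # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(y, -1.0, 1.0))     # latitude in [-pi/2, pi/2]
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v

def equirect_to_direction(u, v, width, height):
    """Unit direction on the sphere for pixel coordinates (u, v)."""
    lon = (u / width - 0.5) * 2.0 * np.pi
    lat = (0.5 - v / height) * np.pi
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])

# The forward direction maps to the image center of a 2048x1024 panorama.
u, v = direction_to_equirect([0.0, 0.0, 1.0], 2048, 1024)
```

A virtual perspective view is then obtained by computing the ray for each virtual-camera pixel, rotating it, and sampling the panorama at the resulting (u, v).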

When the input image is in the Equirectangular format, the composition setting unit 103, as shown in FIG. 7, sets as the reference composition a composition in which the optical axis of the virtual camera passes through the centroid of the region of interest, sets a search range in the vicinity of the reference composition, and discretizes the camera parameters of the virtual camera within that search range to create N candidate compositions. In addition to the focal length, optical axis center, lens distortion coefficients, and three-dimensional rotation, the parameters to be searched can include the projection center O. In that case, as shown in FIGS. 8(a) and 8(b), different candidate compositions can be created by changing the projection center O within the search range.

The composition setting unit 103 selects one composition of interest at a time in order from the N candidate compositions, the edge calculation unit 104 calculates the edge information (edge strength and direction) of the image of the selected composition of interest, and the composition evaluation unit 106 feeds the calculated edge information and the edge information of the template image into the evaluation function described above to calculate the evaluation value of that composition of interest. When the input image is in the Equirectangular format, a template image having the so-called little-planet composition, which is specific to panoramic images and shown in FIG. 9, may be used in addition to the template images illustrated in FIG. 4. The edge vector o(i,j) of the template image having the little-planet composition is given by Equation (7).

As described above, according to the present embodiment, a partial image with a visually superior composition can be cropped at low computational cost from a wide-angle image represented in the Equirectangular format or a similar format.

The description so far has covered evaluating a composition using the edge information of the virtual captured image as the feature value. In the present embodiment, the composition can also be evaluated using the main straight lines of the virtual captured image as the feature value. In that case, in step 104 (see FIG. 2), the main straight line calculation unit 105 calculates edge information from the virtual captured image corresponding to the composition of interest set in step 103 (see FIG. 2) and calculates main straight lines from that edge information by the Hough transform. As shown in FIG. 10, the main straight line calculation unit 105 then defines, in three-dimensional space, a plane s that contains the detected main straight line and the projection center O, and represents the detected main straight line by the normal vector of the plane s.
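The normal-vector representation can be illustrated with a cross product: two points on the detected image line define two rays from O, and their cross product is normal to the plane s. The function name, the pinhole intrinsics `f`, `cx`, `cy`, and the use of two endpoints as the line description are assumptions of this sketch.

```python
import numpy as np

def line_plane_normal(p1, p2, f, cx, cy):
    """Unit normal of the plane through the projection center O and an image line.
    p1, p2: two distinct pixel positions (u, v) on the line.
    f: focal length in pixels; (cx, cy): principal point."""
    r1 = np.array([p1[0] - cx, p1[1] - cy, f], dtype=float)  # ray through p1
    r2 = np.array([p2[0] - cx, p2[1] - cy, f], dtype=float)  # ray through p2
    n = np.cross(r1, r2)
    return n / np.linalg.norm(n)

# A horizontal line through the principal point lies in the plane y = 0.
n = line_plane_normal((540.0, 360.0), (740.0, 360.0), f=1000.0, cx=640.0, cy=360.0)

# For a rotated virtual camera (columns of R = new axes in the reference frame),
# the same line is represented by R.T @ n, with no need to re-run line detection.
```

The sign of the normal depends on the order of the two points, so a comparison between normals would typically use the absolute value of their inner product.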

In the subsequent step 105 (see FIG. 2), the composition evaluation unit 106 calculates the evaluation value of the composition of interest corresponding to the virtual captured image, based on the main straight lines (normal vectors) calculated from the virtual captured image and the main straight lines of the template image. In the present embodiment, a template image having a composition defined by straight lines is represented by the group of unit normal vectors of its group of straight lines, which, as shown in FIG. 11, takes a specific distribution on the unit sphere in three-dimensional space. An evaluation value based on the similarity between the distribution of the straight-line group of the template composition (a vertical composition is illustrated here) and the distribution, on the unit sphere, of the main straight lines calculated from the virtual captured image having the composition of interest is calculated using an appropriate evaluation function. For example, an evaluation function that takes the sum of products between the distributions on the unit sphere may be used, and specific straight lines may be weighted in doing so.
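One possible form of such a similarity is sketched below. The embodiment does not specify the evaluation function, so this is an assumption: each distribution is treated as a set of unit normals, and the score is a weighted sum of |n . t|^p over all pairs of a detected normal n and a template normal t, which acts as a smoothed stand-in for a product-sum between the two distributions on the unit sphere. The names `sphere_similarity`, `weights`, and `sharpness` are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _unit(v: Vec3) -> Vec3:
    """Scale v to unit length."""
    norm = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def sphere_similarity(
    detected: Sequence[Vec3],
    template: Sequence[Vec3],
    weights: Optional[Sequence[float]] = None,
    sharpness: float = 8.0,
) -> float:
    """Weighted sum of |n . t| ** sharpness over all (detected, template)
    pairs. The absolute value makes n and -n equivalent, since both
    describe the same plane through the projection center."""
    if weights is None:
        weights = [1.0] * len(template)
    if len(weights) != len(template):
        raise ValueError("one weight per template line is required")
    total = 0.0
    for n in detected:
        n = _unit(n)
        for t, w in zip(template, weights):
            t = _unit(t)
            dot = n[0] * t[0] + n[1] * t[1] + n[2] * t[2]
            total += w * abs(dot) ** sharpness
    return total
```

A larger `sharpness` rewards only normals that nearly coincide with a template normal, while the per-line `weights` correspond to the optional weighting of specific straight lines mentioned above.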

Calculating the evaluation value using the main straight lines of the virtual captured image as the feature amount in this way improves robustness against noise. As with the edge information of the virtual captured image, the computational cost of calculating the main straight lines can be reduced: in the second and subsequent iterations of step 104 (see FIG. 2), the main straight lines calculated in the first iteration of step 104 are three-dimensionally transformed using the camera parameters, so that the main straight lines of the corresponding composition of interest are obtained directly. In this case as well, from the viewpoint of reducing error, it is preferable to calculate the main straight lines from the virtual captured image having the reference composition in the first iteration of step 104.
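For the case where only the three-dimensional rotation of the virtual camera changes, the transformation of the main straight lines can be sketched as follows. The sketch assumes that R maps coordinates in the reference camera frame to coordinates in the frame of the camera for the composition of interest; under that assumption a plane through the projection center with normal n is mapped to the plane with normal R n. Other camera parameters (translation, lens distortion, and so on) are not covered, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rotation_about_y(angle_rad: float) -> Mat3:
    """Rotation matrix for a rotation about the y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def rotate(R: Mat3, v: Vec3) -> Vec3:
    """Matrix-vector product R v."""
    return (
        R[0][0] * v[0] + R[0][1] * v[1] + R[0][2] * v[2],
        R[1][0] * v[0] + R[1][1] * v[1] + R[1][2] * v[2],
        R[2][0] * v[0] + R[2][1] * v[1] + R[2][2] * v[2],
    )


def transfer_main_lines(reference_normals: Sequence[Vec3], R: Mat3) -> List[Vec3]:
    """Re-express main straight lines (plane normals) found in the reference
    composition in the frame of a rotated virtual camera, without running
    line detection again."""
    return [rotate(R, n) for n in reference_normals]
```

Because the rotation is applied to a handful of normal vectors rather than to an image, each additional candidate composition costs only a few multiplications, which is the source of the cost reduction described above.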

Finally, the hardware configuration of the computer constituting the image processing apparatus 100 of the present embodiment will be described with reference to FIG. 12.

As shown in FIG. 12, the computer constituting the image processing apparatus 100 of the present embodiment includes a processor 10 that controls the operation of the entire apparatus, a ROM 12 that stores a boot program, firmware programs, and the like, a RAM 14 that provides an execution space for programs, an auxiliary storage device 15 that stores an operating system (OS) and the programs that cause the image processing apparatus 100 to function as each of the units described above, an input/output interface 16 for connecting external input/output devices, and a network interface 18 for connecting to a network.

Each function of the embodiment described above can be realized by a program written in C, C++, C#, Java (registered trademark), or the like. The program of the present embodiment can be stored on and distributed via a recording medium such as a hard disk device, CD-ROM, MO, DVD, flexible disk, EEPROM, or EPROM, and can be transmitted over a network in a format usable by other devices.

The present invention has been described above by way of an embodiment, but it is not limited to that embodiment. Other embodiments conceivable by those skilled in the art fall within the scope of the present invention as long as they provide the functions and effects of the present invention.

10 ... Processor
12 ... ROM
14 ... RAM
15 ... Auxiliary storage device
16 ... Input/output interface
18 ... Network interface
100 ... Image processing apparatus
101 ... Image input unit
102 ... Attention area extraction unit
103 ... Composition setting unit
104 ... Edge calculation unit
105 ... Main straight line calculation unit
106 ... Composition evaluation unit
107 ... Composition search unit
108 ... Image output unit

Patent Literature 1: JP-T-2004-520735
Patent Literature 2: Japanese Patent No. 3482923
Patent Literature 3: JP 2007-157063 A
Patent Literature 4: JP 2009-212929 A
Patent Literature 5: Japanese Patent No. 5731033

Claims

An image processing device that searches for a composition from an input image, comprising:
an attention area extraction unit that extracts an attention area from the input image;
a composition setting unit that sets, as a composition of interest, the composition of a virtual captured image obtained by capturing, with a virtual camera, an area of the input image that includes the attention area, and that sets a plurality of the compositions of interest by changing a predetermined camera parameter of the virtual camera within a predetermined search range;
a composition evaluation unit that calculates an evaluation value of the composition of interest corresponding to the virtual captured image based on a similarity between the virtual captured image and a template image having a target composition;
a composition search unit that searches for the composition of interest that maximizes the evaluation value; and
an edge information calculation unit that calculates edge information of the virtual captured image,
wherein the composition setting unit
sets, as a reference composition, the composition of the virtual captured image captured by the virtual camera whose optical axis passes through the center of gravity of the attention area,
the composition evaluation unit
calculates the evaluation value based on the similarity between the edge information of the virtual captured image and the edge information of the template image, and
the edge information calculation unit
calculates the edge information of the plurality of compositions of interest by three-dimensionally transforming, using the camera parameter, the edge information calculated from the virtual captured image having the reference composition.

An image processing device that searches for a composition from an input image, comprising:
an attention area extraction unit that extracts an attention area from the input image;
a composition setting unit that sets, as a composition of interest, the composition of a virtual captured image obtained by capturing, with a virtual camera, an area of the input image that includes the attention area, and that sets a plurality of the compositions of interest by changing a predetermined camera parameter of the virtual camera within a predetermined search range;
a composition evaluation unit that calculates an evaluation value of the composition of interest corresponding to the virtual captured image based on a similarity between the virtual captured image and a template image having a target composition;
a composition search unit that searches for the composition of interest that maximizes the evaluation value; and
a main straight line calculation unit that calculates a main straight line based on edge information calculated from the virtual captured image,
wherein the composition setting unit
sets, as a reference composition, the composition of the virtual captured image captured by the virtual camera whose optical axis passes through the center of gravity of the attention area,
the composition evaluation unit
calculates the evaluation value based on the similarity between the main straight line calculated from the virtual captured image and the main straight line of the template image, and
the main straight line calculation unit
calculates the main straight line of the composition of interest by three-dimensionally transforming, using the camera parameter, the main straight line calculated from the virtual captured image having the reference composition.

The image processing device according to claim 1 or 2,
wherein the camera parameter is at least one parameter selected from the group consisting of focal length, optical axis center, three-dimensional rotation, translation, and lens distortion coefficient.

The image processing device according to any one of claims 1 to 3,
wherein the attention area extraction unit extracts the attention area based on object detection or a saliency map.

A method for searching for a composition from an input image, comprising the steps of:
extracting an attention area from the input image;
setting, as a composition of interest, the composition of a virtual captured image obtained by capturing, with a virtual camera, an area of the input image that includes the attention area, and setting a plurality of the compositions of interest by changing a predetermined camera parameter of the virtual camera within a predetermined search range;
calculating an evaluation value of the composition of interest corresponding to the virtual captured image based on a similarity between the virtual captured image and a template image having a target composition;
searching for the composition of interest that maximizes the evaluation value; and
calculating edge information of the virtual captured image,
wherein the step of setting the composition of interest includes
setting, as a reference composition, the composition of the virtual captured image captured by the virtual camera whose optical axis passes through the center of gravity of the attention area,
the step of calculating the evaluation value includes
calculating the evaluation value based on the similarity between the edge information of the virtual captured image and the edge information of the template image, and
the step of calculating the edge information includes
calculating the edge information of the plurality of compositions of interest by three-dimensionally transforming, using the camera parameter, the edge information calculated from the virtual captured image having the reference composition.

A method for searching for a composition from an input image, comprising the steps of:
extracting an attention area from the input image;
setting, as a composition of interest, the composition of a virtual captured image obtained by capturing, with a virtual camera, an area of the input image that includes the attention area, and setting a plurality of the compositions of interest by changing a predetermined camera parameter of the virtual camera within a predetermined search range;
calculating an evaluation value of the composition of interest corresponding to the virtual captured image based on a similarity between the virtual captured image and a template image having a target composition;
searching for the composition of interest that maximizes the evaluation value; and
calculating a main straight line based on edge information calculated from the virtual captured image,
wherein the step of setting the composition of interest includes
setting, as a reference composition, the composition of the virtual captured image captured by the virtual camera whose optical axis passes through the center of gravity of the attention area,
the step of calculating the evaluation value includes
calculating the evaluation value based on the similarity between the main straight line calculated from the virtual captured image and the main straight line of the template image, and
the step of calculating the main straight line includes
calculating the main straight line of the composition of interest by three-dimensionally transforming, using the camera parameter, the main straight line calculated from the virtual captured image having the reference composition.

The method according to claim 5,
wherein the camera parameter is at least one parameter selected from the group consisting of focal length, optical axis center, three-dimensional rotation, translation, and lens distortion coefficient.

The method according to any one of claims 5 to 7,
wherein the step of extracting the attention area includes extracting the attention area based on object detection or a saliency map.

A program for causing a computer to execute each step of the method according to any one of claims 5 to 8.