JP2005242592A - Image processing apparatus and image processing method

Info

Publication number: JP2005242592A
Application number: JP2004050489A
Authority: JP
Inventors: Takayasu Yamaguchi; 高康山口; Setsuyuki Hongo; 節之本郷
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2004-02-25
Filing date: 2004-02-25
Publication date: 2005-09-08
Anticipated expiration: 2024-02-25
Also published as: JP4741804B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus and an image processing method that correctly distinguish a subject included in a pickup image from the pickup image while reducing the number of dimensions of an image feature value and reducing computational complexity. SOLUTION: An image processing server 1 comprises feature value extracting means for quantizing signals of a pickup image represented in a uniform color space in every arbitrary area of the pickup image, extracting frequencies of quantization level values on axes in the uniform color space as a color histogram, combining the color histogram of every arbitrary area, and extracting a feature value of the entire pickup image, and identifying means for distinguishing an unknown subject according to the feature value of the entire pickup image. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an image processing apparatus and an image processing method that determine, from a captured image, an unknown shooting target included in that image.

Conventionally, various feature amounts have been used to determine, from a captured image, the shooting target included in that image. Examples include color feature amounts and shape (composition) feature amounts. In recent years, methods have been proposed that combine several such image feature amounts into a high-dimensional feature amount and use it to determine the shooting target (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Laying-Open No. 2003-289551 (paragraphs "0089" to "0091", FIG. 10)

However, combining multiple image feature amounts raises the dimensionality of the feature amount, and the amount of computation grows with the number of dimensions.

In view of the above problem, an object of the present invention is to provide an image processing apparatus and an image processing method that keep the number of dimensions of the image feature amount low, reduce the amount of computation, and correctly determine, from a captured image, the shooting target included in that image.

To achieve the above object, a first feature of the present invention is an image processing apparatus that determines, from a captured image, an unknown shooting target included in that image, comprising: (a) a first feature amount extraction unit that, for each arbitrary region of the captured image, quantizes the signal of the captured image expressed in a color space and extracts the frequencies of the quantization level values on each axis of the color space as a color histogram; (b) a second feature amount extraction unit that combines the color histograms of the arbitrary regions to extract a feature amount of the entire captured image; and (c) a determination unit that determines the unknown shooting target based on the feature amount of the entire captured image. Here, "color space" includes various color spaces such as the uniform color space ($L^*a^*b^*$ space), the RGB color space, the CMYK color space, the $L^*u^*v^*$ space, the YUV space, and the XYZ space.

According to the image processing apparatus of the first feature, the number of dimensions of the image feature amount is kept low, the amount of computation is reduced, and the shooting target included in a captured image can be correctly determined from that image. Here, the "number of dimensions" is the number of elements of the feature amount, which is a vector.

The image processing apparatus according to the first feature may further include a weighting unit that multiplies the color histogram of each arbitrary region by an arbitrary weight, and the second feature amount extraction unit may combine the weighted color histograms of the arbitrary regions. This makes it possible to balance the magnitudes of the feature vectors extracted from the individual regions.

The arbitrary weight may be the reciprocal of the sum of the values over all dimensions of the color histogram of each arbitrary region. This allows the vectors to be normalized before they are combined.

The image processing apparatus according to the first feature may further include a storage unit that stores a plurality of predetermined parameters calculated from the whole-image feature amounts of a plurality of captured images, and a parameter calculation unit that calculates the predetermined parameters, and the determination unit may determine the unknown shooting target by using the predetermined parameters. This allows an unknown shooting target to be determined using parameters calculated from the feature amounts.

Further, let $l$ be the index of a known shooting target, $L$ the number of known shooting targets, $N^{(l)}$ the number of captured images for the $l$-th known shooting target, $v$ the index of a feature dimension, $V$ the maximum number of feature dimensions, $x^{(l)}_{n,v}$ the feature amount in the $v$-th dimension of the $n$-th captured image belonging to the $l$-th known shooting target, $\gamma$ an arbitrary value, $\theta_{v,l}$ the predetermined parameter in the $v$-th dimension of the feature amounts belonging to the $l$-th known shooting target, $x'_v$ the feature amount in the $v$-th dimension of the captured image that includes the unknown shooting target, and $F_l$ a discriminant value indicating the degree to which the unknown shooting target belongs to the $l$-th known shooting target. The parameter calculation unit of the image processing apparatus according to the first feature may then calculate the predetermined parameters by the following equation:

[Equation (1)]

The determination unit may recognize, as the shooting target, the captured image with the largest discriminant value obtained by the following equation:

[Equation (2)]

According to this image processing apparatus, a discriminant value can be calculated using the parameters calculated from the feature amounts, and an unknown shooting target can be determined. Equations (1) and (2) constitute the so-called NB (naive Bayes) method, which can calculate, for each of a plurality of known shooting targets, the probability that the unknown shooting target corresponds to that known target, and which has the advantage of fast processing.
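Equations (1) and (2) are not reproduced in this text. For orientation only, the block below writes out one common multinomial naive Bayes formulation that uses the same symbols, with $\gamma$ acting as an additive smoothing constant. This is an assumption about the general shape of such a classifier, not a reconstruction of the patent's equations, and the exact placement of $\gamma$, $V$, and $L$ in the original may differ.

```latex
% Assumed form of a smoothed multinomial naive Bayes parameter estimate
\theta_{v,l} =
  \frac{\gamma + \sum_{n=1}^{N^{(l)}} x^{(l)}_{n,v}}
       {\gamma V + \sum_{v'=1}^{V} \sum_{n=1}^{N^{(l)}} x^{(l)}_{n,v'}}

% Assumed form of the discriminant value for an unknown feature vector x'
F_l = \sum_{v=1}^{V} x'_v \log \theta_{v,l}
```

Under this assumed form, $\sum_{v} \theta_{v,l} = 1$ for every $l$, and a larger $F_l$ means the unknown feature vector is better explained by the $l$-th known shooting target.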

A second feature of the present invention is an image processing method for determining, from a captured image, an unknown shooting target included in that image, comprising: (a) a first feature amount extraction step of, for each arbitrary region of the captured image, quantizing the signal of the captured image expressed in a color space and extracting the frequencies of the quantization level values on each axis of the color space as a color histogram; (b) a second feature amount extraction step of combining the color histograms of the arbitrary regions to extract a feature amount of the entire captured image; and (c) a step of determining the unknown shooting target based on the feature amount of the entire captured image.

According to the image processing method of the second feature, the number of dimensions of the image feature amount is kept low, the amount of computation is reduced, and the shooting target included in a captured image can be correctly determined from that image.

The present invention can thus provide an image processing apparatus and an image processing method that keep the number of dimensions of the image feature amount low, reduce the amount of computation, and correctly determine, from a captured image, the shooting target included in that image.

Next, embodiments of the present invention will be described with reference to the drawings. In the following description of the drawings, the same or similar parts are denoted by the same or similar reference numerals. Note, however, that the drawings are schematic.

(Image processing system)
As illustrated in FIG. 1, the image processing system according to the present embodiment includes terminal devices 2a, 2b, and 2c and an image processing server 1 (image processing apparatus). The terminal devices 2a, 2b, and 2c accept user input, request processing from the image processing server 1, and output the processing results of the image processing server 1. The image processing server 1 accepts processing requests from the terminal devices 2a, 2b, and 2c via a communication network 3 (such as the Internet), performs the processing, and transmits the results to the terminal devices 2a, 2b, and 2c.

The user can switch the terminal devices 2a, 2b, and 2c between two modes, a learning mode and a discrimination mode. In the "learning mode", a known shooting target is photographed, and the captured image and information identifying the shooting target are transmitted to the image processing server 1, which learns the captured image. In the "discrimination mode", an unknown shooting target is photographed and the captured image is transmitted to the image processing server 1, which determines the shooting target and transmits the determination result to the terminal devices 2a, 2b, and 2c.

In the learning mode, the image processing server 1 groups the shooting targets based on the information about the shooting targets and the captured images of those targets received from the terminal devices 2a, 2b, and 2c, and stores the information and image for each shooting target.

In the discrimination mode, the image processing server 1 retrieves the position information of the terminal devices 2a, 2b, and 2c and the information on pre-registered shooting targets located near the position where the captured image received from the terminal devices 2a, 2b, and 2c was taken, and probabilistically determines the shooting target included in the captured image. The image processing server 1 also transmits to the terminal devices 2a, 2b, and 2c the multiple candidate shooting targets obtained as the determination result, the probability that each candidate is the shooting target, and information related to those shooting targets.

As shown in FIG. 2, the image processing server 1 includes a communication unit 11, a determination unit 12, a feature amount extraction unit 13, a registration unit 14, a learning unit 15, an arithmetic device 16, a storage device 17, and a weighting unit 18.

The communication unit 11 receives captured images and shooting target information from the terminal devices 2a, 2b, and 2c via the communication network 3 (such as the Internet). In the discrimination mode, the communication unit 11 also transmits shooting target information and determination results to the terminal devices 2a, 2b, and 2c via the communication network 3.

The feature amount extraction unit 13 extracts the feature amount that serves as the index for determining the shooting target. Specifically, for each arbitrary region of the captured image, the feature amount extraction unit 13 quantizes the signal of the captured image expressed in the uniform color space and extracts the frequencies of the quantization level values on each axis of the uniform color space as a color histogram. The feature amount extraction unit 13 then combines the color histograms of the arbitrary regions to extract the feature amount of the entire captured image. That is, for each known captured image, the feature amount extraction unit 13 extracts a whole-image feature amount ($x$) with as many elements as the number of dimensions ($V$). The feature amount ($x$) is a vector with a fixed number of elements. In the following description, $x$ and $x'$ are vectors.
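The extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `axis_histograms` and `image_feature`, the representation of a region as a list of colour triples, and the axis ranges are all assumptions made for the example. Counting levels per axis gives 3 × (levels) dimensions per region, whereas a joint histogram over all three axes would need (levels)³, which is presumably where the dimension saving comes from.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # one colour sample, e.g. (L*, a*, b*)


def axis_histograms(pixels: Sequence[Pixel],
                    ranges: Sequence[Tuple[float, float]],
                    levels: int) -> List[int]:
    """Quantize each colour axis separately and count level frequencies.

    Returns len(ranges) * levels counts: one `levels`-bin histogram per axis.
    """
    hist = [0] * (len(ranges) * levels)
    for pixel in pixels:
        for axis, (value, (lo, hi)) in enumerate(zip(pixel, ranges)):
            # Map the value to a quantization level in [0, levels - 1].
            level = int((value - lo) / (hi - lo) * levels)
            level = min(max(level, 0), levels - 1)
            hist[axis * levels + level] += 1
    return hist


def image_feature(regions: Sequence[Sequence[Pixel]],
                  ranges: Sequence[Tuple[float, float]],
                  levels: int) -> List[int]:
    """Concatenate the per-region histograms into one whole-image vector."""
    feature: List[int] = []
    for pixels in regions:
        feature.extend(axis_histograms(pixels, ranges, levels))
    return feature


# Hypothetical example: two regions, 4 quantization levels per axis.
LAB_RANGES = [(0.0, 100.0), (-128.0, 128.0), (-128.0, 128.0)]  # assumed ranges
regions = [[(0, 0, 0), (99, 0, 0)], [(50, -100, 100)]]
feature = image_feature(regions, LAB_RANGES, levels=4)
```

With two regions and four levels per axis, the resulting vector has 2 × 3 × 4 = 24 elements.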

The feature amounts of the captured images belonging to the plurality of known shooting targets are denoted by $X$. With $l$ the index of a known shooting target, $n$ the index assigned to each captured image, and $v$ the index of a feature dimension, the feature amount $X$ is represented as a matrix whose elements are $x^{(l)}_{n,v}$.

The learning unit 15 (parameter calculation unit) uses equation (1) to calculate the learning parameters $\Theta$ from the feature amounts extracted by the feature amount extraction unit 13. With $V$ the maximum dimension of the feature amount and $L$ the number of known shooting targets, the learning parameter $\Theta$ is a matrix of $V \times L$ elements $\theta_{v,l}$.

In equation (1), $l$ is the index of a known shooting target, $L$ is the number of known shooting targets, $N^{(l)}$ is the number of captured images for the $l$-th known shooting target, $v$ is the index of a feature dimension, $V$ is the maximum number of feature dimensions, $x^{(l)}_{n,v}$ is the feature amount in the $v$-th dimension of the $n$-th captured image belonging to the $l$-th known shooting target, $\gamma$ is an arbitrary value, and $\theta_{v,l}$ is the learning parameter in the $v$-th dimension of the feature amounts belonging to the $l$-th known shooting target.

In the above description, $v$, $l$, and $n$ (lowercase) are variables, and $V$, $L$, and $N$ (uppercase) are fixed values. $x$ and $x'$ (lowercase) are vectors, and $X$ and $\Theta$ (uppercase) are matrices.

Specifically, the learning unit 15 (parameter calculation unit) uses equation (1) to calculate the learning parameter $\theta_{v,l}$ in the $v$-th dimension of the feature amounts belonging to the $l$-th known shooting target.
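Because equation (1) is not reproduced in this text, the sketch below assumes the usual additively smoothed multinomial naive Bayes estimate, $\theta_{v,l} = (\gamma + \sum_n x^{(l)}_{n,v}) / (\gamma V + \sum_{v'} \sum_n x^{(l)}_{n,v'})$. That formula, the function name `learn_parameters`, and the nested-list layout are assumptions for illustration and may differ from the patent's actual equation.

```python
from typing import List, Sequence


def learn_parameters(features_by_target: Sequence[Sequence[Sequence[float]]],
                     gamma: float = 1.0) -> List[List[float]]:
    """Estimate theta[v][l] for every dimension v and known target l.

    features_by_target[l] holds the N^(l) feature vectors (each of length V)
    of the l-th known shooting target. gamma is the smoothing constant.
    """
    num_targets = len(features_by_target)       # L
    num_dims = len(features_by_target[0][0])    # V
    theta = [[0.0] * num_targets for _ in range(num_dims)]
    for l, vectors in enumerate(features_by_target):
        # Sum each dimension over all images of this target.
        sums = [sum(x[v] for x in vectors) for v in range(num_dims)]
        denom = sum(sums) + gamma * num_dims
        for v in range(num_dims):
            theta[v][l] = (sums[v] + gamma) / denom
    return theta


# Hypothetical data: target 0 has two images, target 1 has one; V = 2.
theta = learn_parameters([[[2, 0], [1, 1]], [[0, 4]]], gamma=1.0)
```

With a positive `gamma`, every parameter is strictly positive, so a dimension that never occurred for a target does not make its logarithm undefined when the parameters are later used for scoring.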

For a captured image received in the discrimination mode, the determination unit 12 determines the unknown shooting target by applying equation (2) with the learning parameters calculated by the learning unit 15 (parameter calculation unit).

In equation (2), $x'_v$ is the feature amount in the $v$-th dimension of the captured image that includes the unknown shooting target, and $F_l$ is a discriminant value indicating the degree to which the unknown shooting target belongs to the $l$-th known shooting target.

Specifically, the determination unit 12 uses the learning parameters $\theta_{v,l}$ in the $v$-th dimension of the feature amounts belonging to the $l$-th known shooting target to calculate the value of $F_l$, which indicates the degree to which the feature amount $x'$ of the captured image that includes the unknown shooting target belongs to the $l$-th known shooting target. The larger the value of $F_l$, the higher the probability that the unknown shooting target is judged to be the $l$-th known shooting target.

By calculating $F_l$ with equation (2) $L$ times in this way, the $L$ known shooting targets can be ranked by the value of $F_l$ to determine which of them the unknown shooting target belongs to.
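The ranking step can be sketched as below. Equation (2) is not reproduced in this text, so the sketch assumes the standard naive Bayes log-likelihood score $F_l = \sum_v x'_v \log \theta_{v,l}$. The score formula and the names `discriminant_values` and `rank_targets` are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def discriminant_values(x_new: Sequence[float],
                        theta: Sequence[Sequence[float]]) -> List[float]:
    """Compute F_l for each known target l (one pass per target, L in total).

    theta[v][l] must be strictly positive so that the logarithm is defined.
    """
    num_targets = len(theta[0])
    return [sum(x_v * math.log(theta[v][l]) for v, x_v in enumerate(x_new))
            for l in range(num_targets)]


def rank_targets(x_new: Sequence[float],
                 theta: Sequence[Sequence[float]]) -> List[int]:
    """Return target indices ordered from largest to smallest F_l."""
    scores = discriminant_values(x_new, theta)
    return sorted(range(len(scores)), key=lambda l: scores[l], reverse=True)


# Hypothetical parameters for V = 2 dimensions and L = 2 known targets.
theta = [[0.8, 0.1],
         [0.2, 0.9]]
ranking_a = rank_targets([3, 1], theta)  # mass mostly in dimension 0
ranking_b = rank_targets([0, 5], theta)  # mass entirely in dimension 1
```

The first element of each ranking is the best-matching known shooting target, and the remaining elements give the ordered list of further candidates.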

The registration unit 14 registers the feature amount of each captured image and the learning parameters in the storage device 17 via the arithmetic device 16.

The weighting unit 18 multiplies the color histogram of each arbitrary region by an arbitrary weight. The feature amount extraction unit 13 combines the color histograms of the arbitrary regions after the weighting unit 18 has applied the weights. The arbitrary weight can be, for example, the reciprocal of the sum of the values over all dimensions of the color histogram of each arbitrary region.
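A minimal sketch of this weighting, using the reciprocal-of-sum weight given as the example. The function names `normalize_region` and `combine_weighted`, and the handling of an all-zero histogram, are assumptions made for the illustration.

```python
from typing import List, Sequence


def normalize_region(hist: Sequence[float]) -> List[float]:
    """Multiply a region histogram by the reciprocal of its total count."""
    total = sum(hist)
    if total == 0:
        # Assumption: a region with no counts contributes zeros.
        return [0.0] * len(hist)
    weight = 1.0 / total
    return [weight * count for count in hist]


def combine_weighted(region_hists: Sequence[Sequence[float]]) -> List[float]:
    """Weight every region histogram, then concatenate them."""
    feature: List[float] = []
    for hist in region_hists:
        feature.extend(normalize_region(hist))
    return feature


# Hypothetical example: the second region has four times as many pixels,
# but after weighting both regions contribute vectors that sum to 1.
combined = combine_weighted([[1, 3], [4, 12]])
```

Here both regions end up as `[0.25, 0.75]`, so a large region no longer dominates the combined vector simply because it contains more pixels.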

The arithmetic device 16 controls the operations of the communication unit 11, the determination unit 12, the feature amount extraction unit 13, the registration unit 14, the learning unit 15, the storage device 17, and the weighting unit 18.

The image processing server 1 according to the embodiment of the present invention has a processing control device (CPU), and the communication unit 11, the determination unit 12, the feature amount extraction unit 13, the registration unit 14, the learning unit 15, the weighting unit 18, and so on can be built into the CPU as modules. These modules can be realized by executing, on a general-purpose computer such as a personal computer, a dedicated program that uses a predetermined programming language. The storage device 17 is a recording medium that stores the feature amounts of a plurality of captured images, a plurality of predetermined learning parameters calculated from the respective captured images, captured image data, registration target information, registration target related information, discriminant values, and the like. Examples of the recording medium include RAM, ROM, a hard disk, a flexible disk, a compact disc, an IC chip, and a cassette tape. Such a recording medium makes it easy to store, transport, and sell captured image data, learning parameters, registration target information, and the like.

The terminal devices 2a, 2b, and 2c can switch between two modes, the learning mode and the discrimination mode.

In the learning mode, the terminal devices 2a, 2b, and 2c photograph a known shooting target with the built-in camera. The terminal devices 2a, 2b, and 2c transmit the captured image to the image processing server 1 together with the registration target information and registration target related information registered in advance by the user. Here, the "registration target information" is information identifying the shooting target that appears in the captured image. The "registration target related information" includes the position information of the shooting target and information related to the shooting target (name, URL, and the like).

In the discrimination mode, the terminal devices 2a, 2b, and 2c photograph an unknown shooting target with the built-in camera and transmit the captured image, together with position information, to the image processing server 1. The terminal devices 2a, 2b, and 2c then receive from the image processing server 1 the multiple candidate shooting targets obtained as the determination result, the probability that each candidate is the shooting target, and information related to those shooting targets. They rank the "registration target information" and "registration target related information" based on the value of $F_l$ described above and present them to the user. The user can easily pick the desired data from the ranked candidates.

As shown in FIG. 3, the terminal device 2 includes an input unit 21, a communication unit 22, an output unit 23, an imaging unit 24, a positioning unit 25, an arithmetic device 26, and a storage device 27.

The communication unit 22 transmits captured images and shooting target information to the image processing server 1 via the communication network 3 (the Internet). In the discrimination mode, the communication unit 22 also receives shooting target information and determination results from the image processing server 1 via the communication network 3.

The imaging unit 24 is, specifically, a built-in camera or the like; it photographs a target and acquires the captured image.

The positioning unit 25 measures the position of the terminal device 2 and the position of the shooting target.

The input unit 21 is a device such as a touch panel, a keyboard, or a mouse. When an input operation is performed on the input unit 21, the corresponding key information is passed to the arithmetic device 26. The output unit 23 is a screen such as a monitor; a liquid crystal display (LCD), a light emitting diode (LED) panel, an electroluminescence (EL) panel, or the like can be used.

The arithmetic device 26 controls the operations of the input unit 21, the communication unit 22, the output unit 23, the imaging unit 24, the positioning unit 25, and the storage device 27. The arithmetic device 26 also acts as a switching unit that switches between the learning mode and the discrimination mode according to, for example, key information input from the input unit 21.

The storage device 27 is a recording medium that stores captured images, registration target information, registration target related information, and the like.

(Image processing method)
Next, the image processing method according to the present embodiment will be described with reference to FIGS. 4 to 9.

First, the method for registering a captured image will be described with reference to FIG. 4.

(a) First, in step S101, the terminal device 2, in the learning mode, photographs a known registration target and acquires the image.

(b) Next, in step S102, the terminal device 2 inputs the registration target information, and in step S103, it inputs the registration target related information. Both may be input in advance, before shooting. For example, if the captured image shows a confectionery store as in FIG. 9, "cake shop" or the like is input as the "registration target information", and "AAA confectionery store", the address of the AAA confectionery store, the URL of the AAA confectionery store, and the like are input as the "registration target related information".

(c) Next, in step S104, the terminal device 2 acquires the position information of the point where the registration target was photographed, the positioning error, the shooting time, and, if possible, the distance and direction to the shooting target.

(d) Next, in step S105, the terminal device 2 transmits the registration target information, the registration target related information, the position information, and the acquired image data to the image processing server 1.

(e) Next, in step S106, the image processing server 1 receives the registration target information, the registration target related information, the position information, and the acquired image. Then, in step S107, the image processing server 1 extracts the feature amount of the registered image. The method of extracting this feature amount is described in detail later.

(f) Next, in step S108, the image processing server 1 stores the registration target information, the registration target related information, the registration target image, the feature amount, and the time of registration at the image processing server 1 in the storage device 17.

Next, the method for learning a shooting target will be described with reference to FIG. 5.

(a) First, in step S201, the image processing server 1 reads the shooting target information, the shooting target related information, the acquired image data, and the feature amounts from the storage device 17.

(b) Next, in step S202, the image processing server 1 groups the targets by position information to narrow them down. When learning is performed on demand at the time a shooting target is determined, as described later, the server receives the position information of the terminal device 2 from the terminal device 2 and learns the shooting targets within the search range. Here, the "search range" is the area centered on the discrimination target with a radius of (positioning error) + (distance to the target). Any position can be accepted as the position information used here. For example, learning may be performed in advance using positions where discrimination is likely to take place, or on demand at discrimination time using the position information of the terminal.
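The narrowing by search range can be sketched as follows. The radius rule, (positioning error) + (distance to the target), comes from the text. Everything else is an assumption for the example: the function names, the dictionary of registered targets, and the use of planar coordinates in metres rather than latitude and longitude.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # assumed: position projected to a local plane, in metres


def search_radius(positioning_error_m: float, distance_to_target_m: float) -> float:
    """Radius of the search range: positioning error plus distance to the target."""
    return positioning_error_m + distance_to_target_m


def targets_in_range(center: Point,
                     registered: Dict[str, Point],
                     positioning_error_m: float,
                     distance_to_target_m: float) -> List[str]:
    """Return the registered targets that lie inside the search range."""
    radius = search_radius(positioning_error_m, distance_to_target_m)
    cx, cy = center
    return [name for name, (x, y) in registered.items()
            if math.hypot(x - cx, y - cy) <= radius]


# Hypothetical registered targets and query position.
registered = {"cake shop": (30.0, 40.0), "bookstore": (300.0, 0.0)}
nearby = targets_in_range((0.0, 0.0), registered,
                          positioning_error_m=20.0, distance_to_target_m=35.0)
```

In this example the radius is 55 m, so only the target 50 m from the centre is kept for learning and discrimination.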

(c) Next, in step S203, the image processing server 1 learns the targets. Specifically, it calculates the learning parameters using equation (1) described above.

(d) Next, in step S204, the image processing server 1 stores the learning result (learning parameters) for the targets.

Next, the method for determining a shooting target will be described with reference to FIG. 6.

(a) First, in step S301, the terminal device 2, in the discrimination mode, photographs an unknown discrimination target and acquires the image. Next, in step S302, the terminal device 2 acquires the discrimination target related information (the position information of the discrimination target, its positioning error, the distance to the target, and so on).

(B) Next, in step S303, the terminal device 2 transmits the discrimination-target-related information and the discrimination image to the image processing server 1. Next, in step S304, the image processing server 1 receives the discrimination-target-related information and the discrimination image. Next, in step S305, the image processing server 1 extracts the feature amount of the discrimination image. The feature amount extraction method is described in detail later.

(C) Next, in step S306, the image processing server 1 narrows down the discrimination targets. Next, in step S307, the image processing server 1 determines whether learning for the search range has been completed. If it has, the process proceeds to step S308; if not, the process proceeds to step S309.

(D) In step S309, the image processing server 1 performs the learning described for step S203 of FIG. 5. Then, in step S308, the image processing server 1 probabilistically discriminates the target using the feature amount of the discrimination image and the learning parameters. Specifically, it obtains determination values using equation (2) described above and obtains the captured images that are discrimination candidates.

(E) Next, in step S310, the image processing server 1 reads the shooting target information and the shooting-target-related information for the candidate discrimination results from the storage device 17, and transmits the shooting target information, the shooting-target-related information, and the discrimination results to the terminal device 2.

(F) Next, in step S310, the terminal device 2 presents the shooting target information and the shooting-target-related information to the user, prioritized according to the probabilistic discrimination results (for example, by displaying high-probability candidates at the top of the screen). The user can then use an address such as a URL associated with the unknown registration target to retrieve further information about the target from the network.
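The server-side part of the flow above (steps S307 to S309, followed by ranking the candidates for display) might be sketched as below. This is a minimal sketch under stated assumptions: the function `discriminate`, the cache keyed by a search range identifier, and the toy `toy_learn` and `toy_score` functions are all hypothetical stand-ins and are not the learning or scoring defined by equations (1) and (2).

```python
def discriminate(features, search_range_id, learned, learn, score):
    """Learn on demand (S307/S309), score each candidate (S308), rank the results."""
    if search_range_id not in learned:                      # S307: not yet learned
        learned[search_range_id] = learn(search_range_id)   # S309: learn now
    params = learned[search_range_id]
    scores = {target: score(features, p) for target, p in params.items()}  # S308
    # Highest score first, so the terminal can list likely candidates at the top.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def toy_learn(search_range_id):
    """Hypothetical learner returning one parameter vector per known target."""
    return {"store A": [0.9, 0.1], "store B": [0.2, 0.8]}


def toy_score(features, params):
    """Hypothetical score: a plain dot product, used only for illustration."""
    return sum(f * p for f, p in zip(features, params))


cache = {}
ranking = discriminate([1.0, 0.0], "area-1", cache, toy_learn, toy_score)
print(ranking)
```

Caching the parameters per search range means a second request for the same area skips the learning branch entirely.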

According to the discrimination processing shown in FIG. 6, when, for example, the user photographs store A, an unknown shooting target, with the terminal device 2 and transmits the captured image to the image processing server 1, the image processing server 1 can discriminate store A and transmit its name, URL, and the like to the terminal device 2. The terminal device 2 can therefore easily obtain information about the shooting target from the captured image.

Next, the feature amount extraction method used in step S107 of FIG. 4 and step S305 of FIG. 6 will be described in detail with reference to FIG. 7.

(A) First, in step S401, the image processing server 1 acquires a captured image of the target taken by the camera mounted on the terminal device 2. The description here takes the image shown in FIG. 9 as an example. The acquired image may already have undergone general image correction, such as white balance, by a function of the camera or the terminal device 2.

(B) Next, in step S402, the image processing server 1 applies noise-removal image correction to the acquired image. In step S403, it divides the corrected image into arbitrary regions and extracts a color histogram for each region. The color histogram of each region is expressed as a Vc-dimensional feature amount (a vector with Vc elements). FIG. 9 shows a division into a grid, but the division may also be radial or circular, and the shape of the division is not limited to these. The regions also need not be of equal size. The method for extracting the color histogram is described in detail later.

(C) Next, in step S404, the color histograms of the arbitrary regions are combined to extract the feature amount of the entire captured image. The feature amount is a vector with a fixed number of elements. For example, when the color histogram of each region has Vc dimensions and the number of regions is S, the histograms are combined as independent dimensions so that the dimension V of the combined feature amount is V = S × Vc. The captured image shown in FIG. 9, for instance, is divided into 4 rows × 6 columns = 24 regions, so if the color histogram of each region has 24 dimensions, the feature amount of the entire image has 24 × 24 = 576 dimensions.
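The combination in step S404 amounts to concatenating the per-region vectors. The following is a minimal sketch assuming each region histogram is already available as a list of Vc numbers; the function name `combine_region_histograms` is hypothetical.

```python
def combine_region_histograms(region_histograms):
    """Concatenate S region histograms of Vc elements each into one S*Vc vector."""
    if not region_histograms:
        return []
    vc = len(region_histograms[0])
    if any(len(h) != vc for h in region_histograms):
        raise ValueError("every region histogram must have the same length")
    combined = []
    for hist in region_histograms:
        combined.extend(hist)  # each region keeps its own independent dimensions
    return combined


S, VC = 24, 24  # 4 x 6 regions, 3 axes x 8 levels, as in the FIG. 9 example
regions = [[0.0] * VC for _ in range(S)]
feature = combine_region_histograms(regions)
print(len(feature))  # 576
```

Because the regions are concatenated in a fixed order, each element of the combined vector refers to a specific region, so coarse layout information is retained alongside color.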

Next, the color histogram extraction method used in step S403 of FIG. 7 will be described in detail with reference to FIG. 8.

(A) First, in step S501, the image processing server 1 expresses the corrected image, for each arbitrary region, in the uniform color space (L*, a*, b*), which is a perceptually uniform space.

(B) Next, in step S502, the image processing server 1 quantizes each axis of the uniform color space independently at equal intervals.

(C) Next, in step S503, the frequencies of the quantization level values of L*, a*, and b* are taken as the color histogram. For example, when each of the three axes L*, a*, and b* is quantized into 8 levels, the color histogram is a feature amount of 24 (= 3 × 8) dimensions (the Vc dimensions described above). In this case, the color histogram contains information on the proportions of the three primary colors and on the luminance values of the pixels. Since the captured image shown in FIG. 9 is divided into 4 rows × 6 columns = 24 regions, a 24-dimensional feature amount is calculated for each of the 24 regions.

(D) Next, in step S504, the color histogram of each arbitrary region is multiplied by an arbitrary weight. The weight can be, for example, the reciprocal of the sum of the values of all dimensions of that region's color histogram. Because the regions are not necessarily of equal size, multiplying by the reciprocal of this sum normalizes the color histogram. The weight may otherwise be chosen to suit the application, for example by applying a larger weight to regions near the center of the captured image.
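Steps S502 to S504 can be sketched as follows for a single region whose pixels are already expressed as (L*, a*, b*) tuples. The axis ranges in `AXIS_RANGES`, the clamping of out-of-range values, and the function names are assumptions made for this sketch; the text specifies only independent, equal-interval quantization of each axis and the reciprocal-of-sum weight.

```python
LEVELS = 8  # quantization levels per axis, as in the example above

# Assumed value ranges for L*, a*, b*; the text does not state them.
AXIS_RANGES = ((0.0, 100.0), (-128.0, 128.0), (-128.0, 128.0))


def quantize(value, lo, hi, levels=LEVELS):
    """Map value to a bin index in [0, levels) using equal-width bins (S502)."""
    index = int((value - lo) / (hi - lo) * levels)
    return min(max(index, 0), levels - 1)  # clamp values at or beyond the range


def region_histogram(lab_pixels, levels=LEVELS):
    """Count level frequencies on each axis independently: 3 * levels bins (S503)."""
    hist = [0.0] * (3 * levels)
    for pixel in lab_pixels:
        for axis, (value, (lo, hi)) in enumerate(zip(pixel, AXIS_RANGES)):
            hist[axis * levels + quantize(value, lo, hi, levels)] += 1.0
    return hist


def normalize(hist):
    """Multiply every bin by the reciprocal of the sum of all bins (S504)."""
    total = sum(hist)
    return [h / total for h in hist] if total else hist


pixels = [(50.0, 0.0, 0.0), (95.0, 20.0, -40.0)]  # two illustrative pixels
hist = normalize(region_histogram(pixels))
print(len(hist), sum(hist))
```

Quantizing the three axes independently gives 3 × 8 = 24 bins, whereas a joint three-dimensional histogram at the same resolution would need 8 × 8 × 8 = 512 bins per region.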

(Action and effect)
According to the image processing apparatus and image processing method of this embodiment, a color histogram is extracted for each arbitrary region of the captured image, and the shooting target is discriminated based on the feature amount obtained by combining these histograms. This keeps the number of dimensions of the image feature amount low and reduces the amount of computation, while still allowing the shooting target included in the captured image to be correctly discriminated.

For example, assume that the captured image shown in FIG. 9 is a 100 pixel × 100 pixel image. Consider first the conventional processing that combines luminance values and a color feature amount. Taking the luminance value of each pixel as a feature amount gives 100 × 100 = 10,000 dimensions, and quantizing each of the three axes L*, a*, and b* into 8 levels gives a color feature amount of 3 × 8 = 24 dimensions. Combining these, the feature amount of the entire captured image has 10,000 + 24 = 10,024 dimensions. In this embodiment, by contrast, the captured image shown in FIG. 9 is divided into 4 rows × 6 columns = 24 regions and a color histogram is extracted for each region. With each of the three axes L*, a*, and b* quantized into 8 levels, the color histogram of each region has 24 (= 3 × 8) dimensions, and combining them gives a feature amount of 24 dimensions × 24 regions = 576 dimensions for the entire captured image. This embodiment thus greatly reduces the number of dimensions of the image feature amount and the amount of computation.
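The dimension counts in this comparison can be checked with a few lines of arithmetic. The two helper functions below are hypothetical names introduced only for this check.

```python
def conventional_dims(width, height, levels, axes=3):
    """Per-pixel luminance values plus one whole-image color histogram."""
    return width * height + axes * levels


def regional_dims(rows, cols, levels, axes=3):
    """One color histogram per region, concatenated over all regions."""
    return rows * cols * axes * levels


print(conventional_dims(100, 100, 8))  # 10024
print(regional_dims(4, 6, 8))          # 576
```

For this example the regional feature is roughly 17 times smaller than the conventional one (10,024 / 576 ≈ 17.4).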

In addition, according to the image processing apparatus and image processing method of this embodiment, the feature amount can be extracted by multiplying the color histogram of each arbitrary region by an arbitrary weight and combining the weighted color histograms. This makes it possible to balance the magnitudes of the feature vectors extracted from the individual regions.

The arbitrary weight can be the reciprocal of the sum of the values of the dimensions of the color histogram of each arbitrary region. The vectors can therefore be normalized before they are combined.

In addition, according to the image processing apparatus and image processing method of this embodiment, the unknown shooting target can be discriminated by using a plurality of predetermined parameters calculated from the feature amounts of a plurality of entire captured images. The unknown shooting target can therefore be discriminated using parameters calculated from the feature amounts.

In addition, according to the image processing apparatus and image processing method of this embodiment, the predetermined parameters are calculated using equation (1) described above, and the captured image with the largest discriminant value obtained by equation (2) can be recognized as the shooting target. Equations (1) and (2) are a so-called NB (naive Bayes) technique, which has the advantages that the probability that the unknown shooting target corresponds to a known shooting target can be calculated for each of a plurality of known shooting targets, and that processing is fast.
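Equations (1) and (2) are not reproduced in this part of the text, so the following is only a sketch of one common naive Bayes formulation and may differ from the patent's own equations. It assumes a multinomial model in which the parameter for each dimension is a smoothed relative frequency, with `gamma` as an additive smoothing constant, and in which the discriminant value is the sum of the feature values times the logarithm of the parameters. The function names and the toy data are hypothetical.

```python
import math


def learn_parameters(images_by_target, gamma=1.0):
    """Assumed smoothed multinomial estimate of theta[v] for each known target."""
    params = {}
    for target, images in images_by_target.items():
        dims = len(images[0])
        sums = [sum(img[v] for img in images) for v in range(dims)]
        total = sum(sums) + gamma * dims  # gamma > 0 keeps every theta above zero
        params[target] = [(s + gamma) / total for s in sums]
    return params


def discriminant_values(x_new, params):
    """Assumed score per target: sum over v of x_new[v] * log(theta[v])."""
    return {target: sum(x * math.log(t) for x, t in zip(x_new, theta))
            for target, theta in params.items()}


# Toy 3-dimensional feature amounts, two training images per known target.
training = {
    "store A": [[8.0, 1.0, 1.0], [7.0, 2.0, 1.0]],
    "store B": [[1.0, 1.0, 8.0], [2.0, 1.0, 7.0]],
}
params = learn_parameters(training, gamma=1.0)
scores = discriminant_values([6.0, 2.0, 2.0], params)
best = max(scores, key=scores.get)  # target with the largest discriminant value
print(best)
```

Learning in this form is a single pass of sums over the training features and scoring is one weighted sum per known target, which is consistent with the speed advantage mentioned above.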

(Other embodiments)
Although the present invention has been described by way of the above embodiment, the statements and drawings forming part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

For example, the embodiment described above uses the NB (naive Bayes) technique to obtain the learning parameters and discriminate the unknown shooting target. However, the learning parameters may be obtained and the unknown shooting target discriminated using learning algorithms other than NB, such as other Bayesian techniques, SVM (support vector machine), kNN (k-nearest neighbor), or LVQ (learning vector quantization).

In addition, the image processing server 1 according to the embodiment has been described as being able to incorporate the communication means 11, the discrimination means 12, the feature amount extraction means 13, the registration means 14, the learning means 15, and the weighting means 18 as modules in a CPU, but these may be divided among two or more CPUs. In that case, the devices are assumed to be connected by a bus or the like so that data can be exchanged between the CPUs.

As described above, the present invention naturally includes various embodiments not described here. Accordingly, the technical scope of the present invention is defined only by the invention-specifying matters of the claims that are reasonable in light of the above description.

FIG. 1 is a configuration block diagram of an image processing system according to an embodiment of the present invention.
FIG. 2 is a configuration block diagram of the image processing server according to the embodiment of the present invention.
FIG. 3 is a configuration block diagram of the terminal device according to the embodiment of the present invention.
FIG. 4 is a flowchart showing registration processing according to the embodiment of the present invention.
FIG. 5 is a flowchart showing learning processing according to the embodiment of the present invention.
FIG. 6 is a flowchart showing discrimination processing according to the embodiment of the present invention.
FIG. 7 is a flowchart showing feature amount extraction processing according to the embodiment of the present invention.
FIG. 8 is a flowchart showing the details of step S403 of FIG. 7.
FIG. 9 is an example of a captured image according to the embodiment of the present invention.

Explanation of symbols

1 Image processing server
2 Terminal device
11 Communication means
12 Target discrimination means
13 Feature amount extraction means
14 Registration means
15 Learning means
16 Arithmetic device
17 Storage device
18 Weighting means
21 Input means
22 Communication means
23 Output means
24 Imaging means
25 Positioning means
26 Arithmetic device
27 Storage device

Claims

An image processing apparatus for discriminating, from a captured image, an unknown shooting target included in the captured image, comprising:
first feature amount extraction means for quantizing, for each arbitrary region of the captured image, a signal of the captured image expressed in a color space, and extracting the frequencies of the quantization level values on each axis of the color space as a color histogram;
second feature amount extraction means for combining the color histograms of the arbitrary regions and extracting a feature amount of the entire captured image; and
discrimination means for discriminating the unknown shooting target based on the feature amount of the entire captured image.

The image processing apparatus according to claim 1, further comprising weighting means for multiplying the color histogram of each arbitrary region by an arbitrary weight,
wherein the second feature amount extraction means combines the weighted color histograms of the arbitrary regions.

The image processing apparatus according to claim 2, wherein the arbitrary weight is the reciprocal of the sum of the values of the dimensions of the color histogram of each arbitrary region.

The image processing apparatus according to claim 1, further comprising:
storage means for storing a plurality of predetermined parameters calculated based on the respective feature amounts of a plurality of the entire captured images; and
parameter calculation means for calculating the predetermined parameters,
wherein the discrimination means discriminates the unknown shooting target by using the predetermined parameters.

The image processing apparatus according to claim 1, wherein, where l is the index of a known shooting target, L is the number of known shooting targets, N^(l) is the number of captured images of the l-th known shooting target, v is the dimension index of the feature amount, V is the maximum number of dimensions of the feature amount, x^(l)_(n,v) is the v-th dimension of the feature amount of the n-th captured image belonging to the l-th known shooting target, γ is an arbitrary value, θ_(v,l) is the predetermined parameter for the v-th dimension of the feature amount belonging to the l-th known shooting target, x'_v is the v-th dimension of the feature amount of the captured image including the unknown shooting target, and F_l is a discriminant value indicating the degree to which the unknown shooting target belongs to the l-th known shooting target,
the parameter calculation means calculates the predetermined parameter according to the following equation:

and the discrimination means discriminates the unknown shooting target based on the discriminant value obtained by the following equation:

An image processing method for discriminating, from a captured image, an unknown shooting target included in the captured image, comprising:
a first feature amount extraction step of quantizing, for each arbitrary region of the captured image, a signal of the captured image expressed in a color space, and extracting the frequencies of the quantization level values on each axis of the color space as a color histogram;
a second feature amount extraction step of combining the color histograms of the arbitrary regions and extracting a feature amount of the entire captured image; and
a step of discriminating the unknown shooting target based on the feature amount of the entire captured image.