JP4664047B2

JP4664047B2 - Image processing apparatus and image processing method

Info

Publication number: JP4664047B2
Application number: JP2004326994A
Authority: JP
Inventors: 高康山口; 博青野; 節之本郷
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2004-11-10
Filing date: 2004-11-10
Publication date: 2011-04-06
Anticipated expiration: 2024-11-10
Also published as: JP2006139418A

Description

The present invention relates to an image processing apparatus and an image processing method that learn or discriminate, from a captured image, a shooting target contained in that image.

Conventionally, various feature quantities have been used to discriminate a shooting target contained in a captured image, for example color feature quantities and shape (composition) feature quantities. In recent years, methods have been proposed that combine several such image feature quantities into a high-dimensional feature quantity and use it to discriminate the shooting target (see, for example, Patent Document 1). In these methods, the feature quantity is extracted from the entire image, or from a specific recognition region within it, using a fixed number of dimensions.
Patent Document 1: Japanese Patent Laying-Open No. 2003-289551 (paragraphs [0089] to [0091], FIG. 10)

However, when a target is photographed so that it sits in the center of the image, background other than the target often appears at the edges of the image, and this background increases the noise contained in the feature quantity. As a result, the accuracy of discrimination based on the feature quantity decreases.

In view of the above problem, an object of the present invention is to provide an image processing apparatus and an image processing method that extract a feature quantity with little noise and correctly discriminate, from a captured image, the shooting target contained in it.

To achieve the above object, a first feature of the present invention is an image processing apparatus that performs at least one of learning and discrimination of a shooting target contained in a captured image, comprising: (a) a probability estimation unit that estimates, for each region of a captured image divided into an arbitrary number of regions, the probability that the shooting target appears in that region; (b) a representation changing unit that changes the image representation method in each region based on the probability estimated by the probability estimation unit; (c) a feature extraction unit that extracts a feature vector using the representation method changed by the representation changing unit; (d) a feature vector creation unit that creates a feature vector of the entire captured image by combining the feature vectors of the regions extracted by the feature extraction unit; and (e) a unit that performs at least one of learning and discrimination of the captured image based on the feature vector of the entire captured image.

With the image processing apparatus according to the first feature, changing the representation region by region makes it possible to extract a feature quantity with little noise and to correctly discriminate the shooting target contained in a captured image.

The probability estimation unit of the image processing apparatus according to the first feature may estimate the probability by computing, from prior information on the size of the shooting target, statistics of the regions in which the shooting target appears in captured images.

With this image processing apparatus, the probability can be estimated according to the size of the shooting target.

The representation changing unit of the image processing apparatus according to the first feature may change the image representation method using the frequency of the feature vector obtained from each region or the number of dimensions of the feature vector.

With this image processing apparatus, the image representation method can be changed using information about the feature vector.

In the image processing apparatus according to the first feature, the shooting targets subjected to at least one of learning and discrimination may be selected based on the position information of the terminal device that captured the image and the position information of the shooting targets.

With this image processing apparatus, the shooting targets that need to be processed can be narrowed down using the position information of the terminal device and of the shooting targets.

A second feature of the present invention is an image processing method that performs at least one of learning and discrimination of a shooting target contained in a captured image, comprising the steps of: (a) estimating, for each region of a captured image divided into an arbitrary number of regions, the probability that the shooting target appears in that region; (b) changing the image representation method in each region based on the estimated probability; (c) extracting a feature vector using the changed representation method; (d) creating a feature vector of the entire captured image by combining the extracted feature vectors of the regions; and (e) performing at least one of learning and discrimination of the shooting target based on the feature vector of the entire captured image.

With the image processing method according to the second feature, changing the representation region by region makes it possible to extract a feature quantity with little noise and to correctly discriminate the shooting target contained in a captured image.

The present invention thus provides an image processing apparatus and an image processing method that extract a feature quantity with little noise and correctly discriminate, from a captured image, the shooting target contained in it.

Embodiments of the present invention are described next with reference to the drawings. In the following description of the drawings, identical or similar parts are denoted by identical or similar reference numerals. Note, however, that the drawings are schematic.

(Image processing system)
As shown in FIG. 1, the image processing system according to the present embodiment comprises terminal devices 2a, 2b, and 2c, which accept user input, request processing from the image processing server 1 (the image processing apparatus), and output the results of that processing, and the image processing server 1, which receives processing requests from the terminal devices 2a, 2b, and 2c via a communication network 3 (mobile communication network, LAN, Internet, etc.), performs the processing, and transmits the results back to the terminal devices 2a, 2b, and 2c.

The user can switch the terminal devices 2a, 2b, and 2c between two modes, a learning mode and a discrimination mode. In the "learning mode", the user photographs a known shooting target and transmits the captured image, together with information identifying the target, to the image processing server 1, which then learns the captured image. In the "discrimination mode", the user photographs an unknown shooting target and transmits the captured image to the image processing server 1, which discriminates the target and sends the result back to the terminal devices 2a, 2b, and 2c.

In the learning mode, the image processing server 1 groups the shooting targets based on the registration target information, the registration target related information, and the captured images received from the terminal devices 2a, 2b, and 2c, and stores the information and images for each shooting target. Here, "registration target information" is information that identifies the shooting target appearing in the captured image. "Registration target related information" comprises the position information of the shooting target and other information related to it (name, URL, and so on).

In the discrimination mode, the image processing server 1 retrieves the position information of the terminal devices 2a, 2b, and 2c and the registration target information, registration target related information, and the like of pre-registered shooting targets located near the position where the received image was captured, and probabilistically discriminates the shooting target contained in the captured image. The image processing server 1 then transmits to the terminal devices 2a, 2b, and 2c the candidate shooting targets that constitute the discrimination result, the probability that each candidate is the shooting target, and information related to those targets.

As shown in FIG. 2, the image processing server 1 comprises a communication unit 11, a discrimination unit 12, a feature extraction unit 13, a registration unit 14, a learning unit 15, an arithmetic device 16, a storage device 17, a weighting unit 18, a probability estimation unit 30, a representation changing unit 31, and a feature vector creation unit 32.

The communication unit 11 receives captured images and shooting target information from the terminal devices 2a, 2b, and 2c via the communication network 3 (mobile communication network, LAN, Internet, etc.). In the discrimination mode, the communication unit 11 also transmits shooting target information and discrimination results to the terminal devices 2a, 2b, and 2c via the communication network 3.

The probability estimation unit 30 estimates, for each region of a captured image divided into an arbitrary number of regions, the probability that the shooting target appears in that region. As described in detail later, the per-region probabilities take values such as those shown in FIG. 13 and FIG. 16. To estimate these probabilities, the probability estimation unit 30 computes, for example, statistics of the regions in which the shooting target appears in captured images, based on prior information on the size of the shooting target such as that shown in FIG. 10. The prior information on the size of the shooting target may be registered in the storage device 17 in advance. Even when no such prior information has been registered, the probability estimation unit 30 can estimate the probabilities using the method shown in FIGS. 11 to 13, which is described in detail later.
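The per-region statistic described above can be illustrated with a minimal sketch. It assumes that the prior information is available as a list of bounding boxes (in normalized image coordinates) marking where the shooting target appeared in sample images, and that the image is divided into a regular grid. Both the box format and the function name `region_probabilities` are assumptions made for illustration; the figures that define the actual procedure are not reproduced in this text.

```python
def region_probabilities(boxes, rows, cols):
    """Estimate, per grid cell, the fraction of sample boxes that overlap it.

    boxes: list of (x0, y0, x1, y1) in normalized image coordinates [0, 1]
           (assumed format for the prior information on target size).
    Returns a rows x cols list of lists with values in [0, 1].
    """
    counts = [[0] * cols for _ in range(rows)]
    for x0, y0, x1, y1 in boxes:
        for r in range(rows):
            for c in range(cols):
                # Extent of this grid cell in normalized coordinates.
                cx0, cx1 = c / cols, (c + 1) / cols
                cy0, cy1 = r / rows, (r + 1) / rows
                # Count the cell when box and cell overlap with positive area.
                if x0 < cx1 and x1 > cx0 and y0 < cy1 and y1 > cy0:
                    counts[r][c] += 1
    n = len(boxes)
    return [[cnt / n if n else 0.0 for cnt in row] for row in counts]


# Two sample boxes centered in the image, on a 3 x 3 grid.
probs = region_probabilities(
    [(0.25, 0.25, 0.75, 0.75), (0.4, 0.4, 0.6, 0.6)], rows=3, cols=3
)
```

With centered sample boxes, the central cell receives the highest estimate and the edge cells lower ones, which matches the observation that background tends to appear at the image edges.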

The representation changing unit 31 changes the image representation method in each region based on the probability estimated by the probability estimation unit 30. For example, it changes the representation using the frequency of the feature vector obtained from each region, the number of dimensions of the feature vector, or both. The method of change is described in detail later.
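As a rough illustration of the two options named above, the sketch below maps a region's probability either to a number of quantization levels (changing the dimensionality) or to a scale factor on the histogram counts (changing the frequency). The linear mapping, the default level range, and the function names are assumptions chosen for the example, not values taken from this document.

```python
def levels_for_probability(p, min_levels=2, max_levels=8):
    """Pick a number of quantization levels for a region (assumed linear rule).

    Regions likely to contain the target get a finer representation (more
    dimensions); regions likely to be background get a coarser one.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return min_levels + round(p * (max_levels - min_levels))


def scale_frequency(hist, p):
    """Scale histogram counts by the region's probability (assumed rule)."""
    return [h * p for h in hist]


# An edge region (p = 0) gets 2 levels, a central region (p = 1) gets 8.
edge_levels = levels_for_probability(0.0)
center_levels = levels_for_probability(1.0)
```

Either rule reduces the contribution of likely-background regions to the final feature vector, which is the stated purpose of changing the representation per region.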

The feature extraction unit 13 extracts, from each region of a captured image divided into an arbitrary number of regions, a feature quantity (feature vector) that serves as an index for discriminating the shooting target. In doing so, the feature extraction unit 13 uses the representation method changed by the representation changing unit 31.

The feature vector creation unit 32 creates a feature vector of the entire captured image by combining the feature vectors of the regions extracted by the feature extraction unit 13.

The feature extraction unit 13 also, for example, quantizes the signal of the captured image expressed in a uniform color space for each arbitrary region of the captured image, and extracts the frequencies of the quantization-level values on each axis of the uniform color space as a color histogram. Here, "color space" covers a variety of color spaces, including the uniform color space ($L^*a^*b^*$ space), the RGB color space, the CMYK color space, the $L^*u^*v^*$ space, the YUV space, and the XYZ space. The feature extraction unit 13 then concatenates the color histograms of the regions to obtain the feature quantity of the entire captured image. That is, for each known captured image, the feature extraction unit 13 extracts a feature quantity $x$ of the entire image that has as many elements as the number of dimensions $V$. The feature quantity $x$ is a vector with a fixed number of elements. In the following description, $x$ and $x'$ are vectors. Let $l$ be the index of a known shooting target, $n$ the index assigned to each captured image, and $v$ the index of a feature dimension; then the feature quantities $X$ of the captured images belonging to the known shooting targets form a matrix whose elements are $x^{(l)}_{n,v}$.
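The per-region histogram extraction and concatenation can be sketched as follows. The sketch assumes that each region is given as a list of three-channel pixels already converted to the chosen color space and scaled to [0, 1], and that each region may use its own number of quantization levels. The function names and the input format are hypothetical.

```python
def axis_histograms(pixels, levels):
    """Per-axis color histogram of one region: 3 * levels counts.

    pixels: iterable of 3-tuples with channel values in [0, 1]
            (assumed to be already converted to the chosen color space).
    """
    hist = [0] * (3 * levels)
    for px in pixels:
        for axis, value in enumerate(px):
            # Quantize the channel value; clamp 1.0 into the top level.
            q = min(int(value * levels), levels - 1)
            hist[axis * levels + q] += 1
    return hist


def image_feature(regions, levels_per_region):
    """Concatenate the regional histograms into one whole-image feature vector."""
    feature = []
    for pixels, levels in zip(regions, levels_per_region):
        feature.extend(axis_histograms(pixels, levels))
    return feature


# Two tiny regions: the first quantized to 2 levels per axis, the second to 1.
x = image_feature(
    [[(0.1, 0.9, 0.5)], [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]],
    levels_per_region=[2, 1],
)
```

The length of the resulting vector is the sum of 3 × levels over all regions, so it stays fixed across images as long as the per-region level assignment is fixed.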

The learning unit 15 (parameter calculation unit) calculates the learning parameters $\Theta$ from the feature quantities extracted by the feature extraction unit 13, using equation (1). With $V$ the maximum dimension of the feature quantity and $L$ the number of known shooting targets, $\Theta$ is a matrix of $V \times L$ elements $\theta_{v,l}$.

In equation (1), $l$ is the index of a known shooting target, $L$ is the number of known shooting targets, $N^{(l)}$ is the number of captured images for the $l$-th known shooting target, $v$ is the index of a feature dimension, $V$ is the maximum number of feature dimensions, $x^{(l)}_{n,v}$ is the $v$-th dimension of the feature quantity of the $n$-th captured image belonging to the $l$-th known shooting target, $\gamma$ is an arbitrary value, and $\theta_{v,l}$ is the learning parameter for the $v$-th dimension of the feature quantities belonging to the $l$-th known shooting target.

In the above description, $v$, $l$, and $n$ (lowercase) are variables, and $V$, $L$, and $N$ (uppercase) are fixed values. Likewise, $x$ and $x'$ (lowercase) are vectors, and $X$ and $\Theta$ (uppercase) are matrices.

Specifically, the learning unit 15 (parameter calculation unit) uses equation (1) to calculate the learning parameter $\theta_{v,l}$ for the $v$-th dimension of the feature quantities belonging to the $l$-th known shooting target.
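Equation (1) itself is not reproduced in this text, so the following is only an assumed stand-in that is consistent with the listed symbols: an additively smoothed relative frequency, $\theta_{v,l} = (\sum_n x^{(l)}_{n,v} + \gamma) / (\sum_{v'} \sum_n x^{(l)}_{n,v'} + V\gamma)$, as used in multinomial naive Bayes classifiers. The role given to $\gamma$ here (a smoothing constant) and the function name `learn_parameters` are assumptions.

```python
def learn_parameters(features_by_target, gamma=1.0):
    """Assumed stand-in for equation (1): smoothed relative frequencies.

    features_by_target[l] is the list of N^(l) feature vectors (each of
    length V) for the l-th known target. Returns theta as a V x L nested
    list, so theta[v][l] corresponds to theta_{v,l}.
    """
    num_targets = len(features_by_target)          # L
    num_dims = len(features_by_target[0][0])       # V
    theta = [[0.0] * num_targets for _ in range(num_dims)]
    for l, vectors in enumerate(features_by_target):
        # Sum each dimension over all images of target l.
        sums = [sum(vec[v] for vec in vectors) for v in range(num_dims)]
        denom = sum(sums) + num_dims * gamma
        for v in range(num_dims):
            theta[v][l] = (sums[v] + gamma) / denom
    return theta


# Target 0 has two images, target 1 has one; V = 2.
theta = learn_parameters([[[2, 0], [1, 1]], [[0, 4]]], gamma=1.0)
```

With a positive `gamma`, every parameter is strictly positive and the parameters of each target sum to 1 over the dimensions, which keeps a logarithm-based score well defined.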

For a captured image received in the discrimination mode, the discrimination unit 12 discriminates the unknown shooting target by applying equation (2) with the learning parameters calculated by the learning unit 15 (parameter calculation unit).

In equation (2), $x'_v$ is the $v$-th dimension of the feature quantity of the captured image containing the unknown shooting target, and $F_l$ is a discrimination value indicating the degree to which the unknown shooting target belongs to the $l$-th known shooting target.

Specifically, the discrimination unit 12 uses the learning parameters $\theta_{v,l}$ for the $v$-th dimension of the feature quantities belonging to the $l$-th known shooting target to calculate the value $F_l$, which indicates the degree to which the feature quantity $x'$ of the captured image containing the unknown shooting target belongs to the $l$-th known shooting target. The larger the value of $F_l$, the higher the probability that the unknown shooting target is the $l$-th known shooting target.

Thus, by calculating $F_l$ with equation (2) $L$ times, once for each known shooting target, the $L$ candidate targets can be ranked by their $F_l$ values to determine which of them the unknown shooting target belongs to.
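The ranking step can be sketched as below. Because equation (2) is not reproduced in this text, the score used here, $F_l = \sum_v x'_v \log \theta_{v,l}$ (the multinomial log-likelihood up to a constant), is an assumption, as is the function name `rank_targets`. Only the ranking by descending $F_l$ is taken from the description.

```python
import math


def rank_targets(x_new, theta):
    """Rank known targets by an assumed discrimination value F_l.

    x_new: feature vector x' of the image with the unknown target (length V).
    theta: V x L nested list of strictly positive learning parameters.
    Returns a list of (l, F_l) pairs sorted by descending F_l.
    """
    num_dims = len(theta)
    num_targets = len(theta[0])
    scores = [
        # Assumed form of F_l: sum over v of x'_v * log(theta_{v,l}).
        sum(x_new[v] * math.log(theta[v][l]) for v in range(num_dims))
        for l in range(num_targets)
    ]
    order = sorted(range(num_targets), key=lambda l: scores[l], reverse=True)
    return [(l, scores[l]) for l in order]


# Example parameters for V = 2 dimensions and L = 2 targets.
example_theta = [[2 / 3, 1 / 6], [1 / 3, 5 / 6]]
ranking = rank_targets([0, 3], example_theta)
```

In this example the feature mass lies entirely in the second dimension, where target 1 has the larger parameter, so target 1 is ranked first.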

The registration unit 14 registers the feature quantity of each captured image and the learning parameters in the storage device 17 via the arithmetic device 16.

The weighting unit 18 multiplies the color histogram of each arbitrary region by an arbitrary weight. The feature extraction unit 13 concatenates the regional color histograms after the weighting unit 18 has applied the weights. The weight can be, for example, the reciprocal of the sum of the frequency values of the region's color histogram.
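The example weight mentioned above, the reciprocal of the sum of a region's histogram frequencies, amounts to normalizing each regional histogram to sum to 1. A minimal sketch follows; the function name and the handling of an empty histogram are assumptions.

```python
def weight_histogram(hist):
    """Multiply a regional histogram by the reciprocal of its total count."""
    total = sum(hist)
    if total == 0:
        # Assumed convention: an empty region contributes all zeros.
        return [0.0] * len(hist)
    weight = 1.0 / total
    return [h * weight for h in hist]


def combine_weighted(histograms):
    """Weight each regional histogram, then concatenate them."""
    feature = []
    for hist in histograms:
        feature.extend(weight_histogram(hist))
    return feature


feature = combine_weighted([[1, 3], [0, 0], [2, 2]])
```

With this weight, regions of different pixel counts contribute on the same scale to the combined feature vector.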

The arithmetic device 16 controls the operation of the communication unit 11, the discrimination unit 12, the feature extraction unit 13, the registration unit 14, the learning unit 15, the storage device 17, and the weighting unit 18.

The image processing server 1 according to the embodiment of the present invention has a processing control unit (CPU), and the communication unit 11, discrimination unit 12, feature extraction unit 13, registration unit 14, learning unit 15, weighting unit 18, probability estimation unit 30, representation changing unit 31, feature vector creation unit 32, and so on can be built into the CPU as modules. These modules can be realized on a general-purpose computer such as a personal computer by executing a dedicated program that uses a predetermined programming language.

The storage device 17 is a recording medium that stores the feature quantities of multiple captured images, multiple predetermined learning parameters calculated from those images, captured image data, registration target information, registration target related information, determination values, prior information on the size of shooting targets, and the like. Examples of the recording medium include RAM, ROM, hard disks, flexible disks, compact discs, IC chips, and cassette tapes. Such recording media make it easy to store, transport, and sell captured image data, learning parameters, registration target information, and the like.

The terminal devices 2a, 2b, and 2c can be switched between two modes, the learning mode and the discrimination mode.

In the learning mode, the terminal devices 2a, 2b, and 2c photograph a known shooting target with their built-in cameras. They transmit the captured image to the image processing server 1 together with the registration target information and registration target related information registered in advance by the user. The image processing server 1 selects the shooting targets to be learned based on the position information of the terminal device 2 that captured the image and the position information of the shooting targets contained in the registration target related information.

In the discrimination mode, the terminal devices 2a, 2b, and 2c photograph an unknown shooting target with their built-in cameras and transmit the captured image to the image processing server 1 together with the registration target related information. The image processing server 1 selects the shooting targets to be considered in discrimination based on the position information of the terminal device 2 that captured the image and the position information of the shooting targets contained in the registration target related information. The terminal devices 2a, 2b, and 2c then receive from the image processing server 1 the candidate shooting targets that constitute the discrimination result, the probability that each candidate is the shooting target, and information related to those targets, rank the "registration target information" and "registration target related information" according to the values of $F_l$ described above, and present them to the user. The user can easily pick the desired data from the ranked candidates.

As shown in FIG. 3, the terminal device 2 comprises an input unit 21, a communication unit 22, an output unit 23, an imaging unit 24, a positioning unit 25, an arithmetic device 26, and a storage device 27.

The communication unit 22 transmits captured images and shooting target information to the image processing server 1 via the communication network 3 (mobile communication network, LAN, Internet, etc.). In the discrimination mode, the communication unit 22 also receives shooting target information and discrimination results from the image processing server 1 via the communication network 3.

The imaging unit 24 is, specifically, a built-in camera or the like; it photographs a target and acquires the captured image.

The positioning unit 25 measures the position of the terminal device 2 and the position of the shooting target. The positioning unit 25 may perform positioning by combining, for example, GPS, coordination with base stations, ultrasound, and an electronic compass.

The input unit 21 is a device such as a touch panel, keyboard, mouse, or mobile phone keypad. When an input operation is performed on the input unit 21, the corresponding key information is passed to the arithmetic device 26. The output unit 23 is a screen such as a monitor; a liquid crystal display (LCD), a light emitting diode (LED) panel, an electroluminescence (EL) panel, or the like can be used.

The arithmetic device 26 controls the operation of the input unit 21, the communication unit 22, the output unit 23, the imaging unit 24, the positioning unit 25, and the storage device 27. The arithmetic device 26 also acts as a switching unit that switches between the learning mode and the discrimination mode according to key information or the like entered through the input unit 21.

The storage device 27 is a recording medium that stores captured images, registration target information, registration target related information, and the like.

(Image processing method)
Next, the image processing method according to the present embodiment is described with reference to FIGS. 4 to 21.

First, the method of registering a captured image is described with reference to FIG. 4.

(A) First, in step S101, the terminal device 2, in the learning mode, photographs a known registration target and acquires the captured image.

(B) Next, in step S102, registration target information is entered on the terminal device 2, and in step S103, registration target related information is entered. Both may be entered in advance, before the image is captured. For example, if the captured image shows a pastry shop as in FIG. 9, "cake shop" or the like is entered as the registration target information, and "AAA Pastry Shop", its address, its URL, and the like are entered as the registration target related information.

(C) Next, in step S104, the terminal device 2 acquires the position information of the point where the registration target was photographed, the positioning error, the shooting time, and, if possible, the distance and direction to the shooting target, and stores them as part of the registration target related information.

(D) Next, in step S105, the terminal device 2 transmits the registration target information, the registration target related information, and the captured image data to the image processing server 1.

(E) Next, in step S106, the image processing server 1 receives the registration target information, the registration target related information, and the captured image data. In step S107, the image processing server 1 extracts the feature quantity of the registered image. The extraction method is described in detail later.

(F) Next, in step S108, the image processing server 1 stores the registration target information, the registration target related information, the registration target image, the feature quantity, and the time of registration at the image processing server 1 in the storage device 17.
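The items stored in step S108 can be pictured as a single record per registered image. The sketch below is only one possible layout; the class name, field names, and the in-memory list standing in for the storage device 17 are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RegistrationRecord:
    """One registered image as stored in step S108 (hypothetical layout)."""
    target_info: str          # registration target information, e.g. "cake shop"
    related_info: dict        # name, address, URL, position, positioning error, ...
    image: bytes              # captured image data
    feature: list             # feature quantity extracted in step S107
    registered_at: datetime = field(default_factory=datetime.now)


def register(store, record):
    """Append a record to the store and return its index."""
    store.append(record)
    return len(store) - 1


store = []
index = register(
    store,
    RegistrationRecord(
        target_info="cake shop",
        related_info={"name": "AAA Pastry Shop"},
        image=b"",
        feature=[1, 0, 0, 1],
    ),
)
```

Keeping the extracted feature quantity alongside the image means the learning step can read features directly without re-extracting them.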

次に、撮影対象を学習する方法について、図５を用いて説明する。 Next, a method for learning an imaging target will be described with reference to FIG.

(A) First, in step S201, the image processing server 1 reads the registration target information, the registration target related information, the captured image data, and the feature amounts from the storage device 17.

(B) Next, in step S202, the image processing server 1 groups the targets by the position information included in the registration target related information, thereby narrowing down the shooting targets. When learning is performed on demand at discrimination time (described later), the server receives the position information of the terminal device 2 from the terminal device 2 and learns the shooting targets within the search range. Here, the "search range" is the area centered on the discrimination target whose radius is (positioning error) + (distance to the target). Any position may be used as the position information here. For example, learning may be performed in advance for positions where discrimination is likely to take place, or on demand at discrimination time using the position information of the terminal.

(C) Next, in step S203, the image processing server 1 learns the shooting targets. Specifically, it calculates the learning parameters using equation (1) described above.

(D) Next, in step S204, the image processing server 1 stores the learning result (learning parameters) for the targets.
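The search range used in step S202 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes that positions are planar (x, y) coordinates in meters, and the names `in_search_range` and `narrow_targets` are hypothetical.

```python
import math


def in_search_range(center, candidate, positioning_error, distance_to_target):
    """Return True if candidate lies within the search range around center.

    The radius is (positioning error) + (distance to the target), as in the
    description of step S202. Coordinates are assumed to be planar meters.
    """
    radius = positioning_error + distance_to_target
    return math.dist(center, candidate) <= radius


def narrow_targets(center, registered, positioning_error, distance_to_target):
    """Keep only the registered targets whose position is inside the range."""
    return [
        name
        for name, position in registered.items()
        if in_search_range(center, position, positioning_error, distance_to_target)
    ]


# Hypothetical registered targets and their positions.
registered = {"store A": (10.0, 0.0), "store B": (100.0, 0.0)}
nearby = narrow_targets((0.0, 0.0), registered, positioning_error=20.0,
                        distance_to_target=30.0)
```

With a radius of 20 + 30 = 50 m, only "store A" remains a candidate for learning or discrimination.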

Next, a method for discriminating a shooting target will be described with reference to FIG. 6.

(A) First, in step S301, the terminal device 2, in discrimination mode, photographs an unknown shooting target and acquires a captured image. Next, in step S302, the terminal device 2 acquires registration target related information on the unknown shooting target.

(B) Next, in step S303, the terminal device 2 transmits the registration target related information and the captured image data to the image processing server 1. In step S304, the image processing server 1 receives them. In step S305, the image processing server 1 extracts the feature amount of the captured image. The extraction method is described in detail later.

(C) Next, in step S306, the image processing server 1 narrows down the discrimination targets. Next, in step S307, it determines whether learning for the search range has been completed. If it has, the process proceeds to step S308; if not, to step S309.

(D) In step S309, the image processing server 1 performs the learning described for step S203 in FIG. 5. Then, in step S308, the image processing server 1 performs target discrimination probabilistically using the feature amount of the captured image and the learning parameters. Specifically, it obtains a determination value using equation (2) described above and obtains the captured images that are discrimination candidates.

(E) Next, in step S310, the image processing server 1 reads the registration target information and registration target related information for the candidate discrimination results from the storage device 17, and transmits the registration target information, the registration target related information, and the discrimination result to the terminal device 2. Then, in step S311, the terminal device 2 receives them.

(F) Next, in step S312, the terminal device 2 prioritizes the registration target information and registration target related information based on the probabilistic discrimination result (for example, by displaying high-probability candidates at the top of the screen) and presents them to the user. The user can then use an address such as a URL associated with the unknown shooting target to retrieve further information about it from the network.
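The prioritization in step S312 amounts to ordering the candidates by their estimated probability. The following is a small sketch under assumptions: the record layout (`name`, `url`, `probability`), the example values, and the function name `rank_candidates` are all hypothetical and are not taken from the patent.

```python
def rank_candidates(candidates):
    """Return the candidates ordered so that the most probable comes first."""
    return sorted(candidates, key=lambda c: c["probability"], reverse=True)


# Hypothetical discrimination result received from the server.
candidates = [
    {"name": "store B", "url": "http://example.com/b", "probability": 0.15},
    {"name": "store A", "url": "http://example.com/a", "probability": 0.80},
    {"name": "store C", "url": "http://example.com/c", "probability": 0.05},
]
ranked = rank_candidates(candidates)
ranked_names = [c["name"] for c in ranked]
```

The terminal would then display the entries of `ranked` from the top of the screen downward.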

With the discrimination process shown in FIG. 6, for example, when a user photographs store A, an unknown shooting target, with the terminal device 2 and transmits the captured image to the image processing server 1, the image processing server 1 discriminates store A and can transmit its name, URL, and so on to the terminal device 2. The terminal device 2 can thus easily obtain information about a shooting target from a captured image.

Next, the feature amount extraction method in step S107 of FIG. 4 and step S305 of FIG. 6 will be described in detail with reference to FIG. 7.

(A) First, in step S401, the image processing server 1 acquires a captured image of the shooting target taken by the camera mounted on the terminal device 2. The following description takes the image shown in FIG. 9 as an example. The acquired image may already have undergone general image correction, such as white balance, by a function of the camera or the terminal device 2.

(B) Next, in step S402, the image processing server 1 applies noise-removal image correction to the acquired image. This correction is not mandatory and may be applied as the situation requires.

(C) Next, in step S403, the image processing server 1 extracts the feature amount.

The feature amount extraction method in step S403 will be described in detail with reference to FIG. 8.

(A) First, in step S501, the image processing server 1 divides the captured image into I × J regions in a grid. FIG. 9 shows a grid division, but the division may also be radial or circular, and the shape is not limited to these. The regions need not be of equal size.

(B) Next, in step S502, the image processing server 1 estimates the probability that the target appears in each region. This probability can be estimated from the shooting conditions and data on the sizes of the targets. For example, suppose the shooting condition is "photograph the shooting target in the center of the captured image so that it fills the image area", and size data is available for three shooting targets. The frequency with which a target appears in each region of the image is obtained from the aspect ratios of the three targets. Dividing the frequency for each region by the number of size data items gives the probability that a target appears in that region.

A concrete calculation example follows. Suppose the sizes of three targets are given as shown in FIG. 10. Each target is then expected to appear in the image as shown in FIG. 11 (for simplicity, the aspect ratio of the captured image is assumed to be 1).

The frequency of each target in each region is as shown in FIG. 12. As shown in FIG. 13, the value obtained by dividing the frequency by the total number of targets can be regarded as the probability that a target appears in each region.
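The frequency-based estimate can be illustrated with a short sketch. The sizes in FIG. 10 are not reproduced in this text, so the three sizes below are hypothetical, as is the function name `region_probabilities`. The sketch also assumes that the image is a unit square, that each target is centered and scaled so its longer side fills the image, and that a region counts as containing the target whenever the two overlap with nonzero area.

```python
from fractions import Fraction


def region_probabilities(sizes, rows, cols):
    """Estimate, per grid region, the probability that a target appears.

    sizes: list of (width, height) integer pairs for known targets.
    Returns a rows x cols matrix of Fractions (frequency / number of targets).
    """
    counts = [[0] * cols for _ in range(rows)]
    for width, height in sizes:
        longest = max(width, height)
        w = Fraction(width, longest)   # target width relative to the image
        h = Fraction(height, longest)  # target height relative to the image
        x0, x1 = (1 - w) / 2, (1 + w) / 2  # centered horizontally
        y0, y1 = (1 - h) / 2, (1 + h) / 2  # centered vertically
        for i in range(rows):
            for j in range(cols):
                ry0, ry1 = Fraction(i, rows), Fraction(i + 1, rows)
                rx0, rx1 = Fraction(j, cols), Fraction(j + 1, cols)
                overlaps_x = min(x1, rx1) > max(x0, rx0)
                overlaps_y = min(y1, ry1) > max(y0, ry0)
                if overlaps_x and overlaps_y:
                    counts[i][j] += 1
    total = len(sizes)
    return [[Fraction(c, total) for c in row] for row in counts]


# Hypothetical targets: one square, one tall, one wide.
probabilities = region_probabilities([(1, 1), (1, 3), (3, 1)], rows=3, cols=3)
# Center region: 1, edge-adjacent regions: 2/3, corner regions: 1/3.
```

Exact fractions are used so that a target edge lying exactly on a region boundary is not miscounted because of floating-point rounding.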

The example above assumes that the sizes of the targets are known. Alternatively, the probability of the range in which a target appears may be obtained by the same calculation from the areas that targets occupy in images in which they were photographed.

Furthermore, when no data on the targets is available, weighting may be performed using some distribution. With reference to FIGS. 14 to 16, the following describes how, for a division into 3 × 3 regions, the number of dimensions of each region is determined and the values are adjusted using a standard two-dimensional normal distribution (other distributions, such as a general two-dimensional normal distribution, may also be used) so that the probability of the region at the center of the screen is high. FIG. 14 sets the center region to 0 and shows, for each region, its distance from the center region. FIG. 15 shows the result of converting the value of each region with a predetermined function. The cumulative distribution function value used here was calculated with r being the distance matrix shown in FIG. 14, μ = 0, and σ = 1.

The normalized cumulative distribution function values shown in FIG. 16 are then regarded as the probability that the target appears in each region of the image. FIG. 16 is an example normalized so that the sum over all regions (the entire screen) is 1, but the values may instead be normalized so that the center region of the screen is 1, as in FIG. 13. Because the probability that a shooting target is present at the center of the screen is very high, the normalization shown in FIG. 13 is also useful.
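The two normalizations can be sketched as follows. The exact conversion function and the distance values of FIG. 14 are not reproduced in this text, so the sketch makes two labeled assumptions: distances are Euclidean distances between region indices (center 0, edge-adjacent 1, corner √2), and the weight is the upper tail 1 − Φ(r) of a normal distribution with μ = 0 and σ = 1, chosen only because it decreases with distance. The name `center_weights` is hypothetical.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def center_weights(rows, cols, normalize="sum"):
    """Weights that are largest at the grid center and decay with distance.

    ASSUMPTION: weight = 1 - normal_cdf(distance); the source's own
    conversion function may differ.
    normalize="sum" scales the weights to sum to 1 (as described for FIG. 16);
    normalize="center" scales the largest (center) weight to 1 (as for FIG. 13).
    """
    ci, cj = (rows - 1) / 2, (cols - 1) / 2
    raw = [
        [1.0 - normal_cdf(math.hypot(i - ci, j - cj)) for j in range(cols)]
        for i in range(rows)
    ]
    if normalize == "sum":
        scale = sum(sum(row) for row in raw)
    elif normalize == "center":
        scale = max(max(row) for row in raw)
    else:
        raise ValueError("normalize must be 'sum' or 'center'")
    return [[value / scale for value in row] for row in raw]


weights_sum = center_weights(3, 3, normalize="sum")
weights_center = center_weights(3, 3, normalize="center")
```

Either matrix can then serve as the per-region probability; which normalization is preferable depends on whether the weights are later used as proportions of a whole or as factors relative to the center.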

(C) Next, in step S503, the weights used when changing the number of dimensions or the frequency are determined based on the probability. The probability estimated in step S502 may be used directly as the weight. In step S504, it is determined whether to change the number of dimensions. If so, the number of dimensions is changed based on the weights in step S505; if not, the number of dimensions is kept fixed in step S506.

The method for changing the number of dimensions is as follows. The number of dimensions of the feature vector extracted from each region of the image is changed by changing the number of quantization bits according to the probability that the target appears in that region. In FIG. 17, the number of quantization bits is calculated from the probabilities shown in FIG. 16, and the number of dimensions is calculated by multiplying the color space by the number of quantization bits. The number of dimensions of the color histogram is then changed according to the calculated number of dimensions, as shown in FIG. 18. For example, FIG. 19 shows a case where the number of dimensions of the color histogram differs between the center of the captured image, the periphery, and the area between them: 9 dimensions at the center, 3 at the periphery, and 6 in between. The number of dimensions is raised at the center because the probability that the shooting target is present there is high.

(D) Next, in step S507, the image processing server 1 represents the image of each region in the uniform color space (L*, a*, b*), a visually uniform space. When the captured image is divided into I × J regions in a grid, let V_{i,j} be the number of dimensions of the (i, j)-th region. The number of dimensions of each region is, for example, as shown in FIG. 18. A feature amount (which may be a color histogram, frequency, pixel values, or the like) is then extracted from the (i, j)-th region in V_{i,j} dimensions. This embodiment proceeds on the assumption that a color histogram is extracted. The color histogram of each region is expressed as a V_{i,j}-dimensional feature amount (a vector with V_{i,j} elements). In step S508, the image processing server 1 quantizes each axis of the uniform color space independently, based on the number of dimensions. Then, in step S509, the image processing server 1 calculates the frequency of each quantization level of L*, a*, and b*.
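Steps S508 and S509 can be sketched as a per-axis histogram. This is one possible reading, not a confirmed implementation: it assumes that each of the three axes contributes one histogram with as many bins as quantization levels, so that a region with q levels yields 3 × q dimensions (consistent with the 9, 6, and 3 dimensions mentioned for FIG. 19 when q is 3, 2, and 1). The axis ranges and the names `quantize` and `region_histogram` are also assumptions.

```python
# Assumed value ranges for the L*, a*, and b* axes.
AXIS_RANGES = ((0.0, 100.0), (-128.0, 127.0), (-128.0, 127.0))


def quantize(value, low, high, levels):
    """Map value in [low, high] to a level index in 0 .. levels - 1."""
    index = int((value - low) / (high - low) * levels)
    return min(max(index, 0), levels - 1)  # clamp values on or past the edges


def region_histogram(pixels, levels):
    """Build a 3 * levels dimensional histogram for one region.

    pixels: iterable of (L, a, b) tuples belonging to the region.
    Each axis is quantized independently; the three per-axis histograms
    are stored back to back in a single list.
    """
    histogram = [0] * (3 * levels)
    for pixel in pixels:
        for axis, (value, (low, high)) in enumerate(zip(pixel, AXIS_RANGES)):
            histogram[axis * levels + quantize(value, low, high, levels)] += 1
    return histogram


# Three hypothetical pixels: darkest, brightest, and mid-range.
pixels = [(0.0, -128.0, -128.0), (100.0, 127.0, 127.0), (50.0, 0.0, 0.0)]
center_histogram = region_histogram(pixels, levels=3)     # 9 dimensions
periphery_histogram = region_histogram(pixels, levels=1)  # 3 dimensions
```

Using fewer levels for a peripheral region shortens its histogram, which is how a region's contribution to the overall feature vector shrinks.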

(E) Next, in step S510, the image processing server 1 determines whether to adjust the frequency. If so, the process proceeds to step S511, where the frequency is adjusted based on the weights; if not, the process proceeds to step S512.

When the frequency is adjusted in step S511, the values of the feature vector extracted from each region of the image, with that region's number of dimensions, are multiplied by the probability that the target appears in that region. For example, as shown in FIG. 20, the frequency of the color histogram is adjusted differently for the center of the captured image, the periphery, and the area between them. In FIG. 20, all three have 6 dimensions, but the frequency is increased or decreased in each region. Because the probability that the shooting target is present at the center is high, the frequency is increased more the closer a region is to the center.
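The frequency adjustment of step S511 is an element-wise scaling of a region's histogram by that region's probability. A minimal sketch, with hypothetical values and the hypothetical name `adjust_frequency`:

```python
def adjust_frequency(histogram, probability):
    """Scale every bin of a region's histogram by the region's probability."""
    return [count * probability for count in histogram]


# Hypothetical 6-dimensional histogram shared by two regions.
histogram = [2, 4, 6, 8, 10, 12]
center_adjusted = adjust_frequency(histogram, 1.0)     # center region
periphery_adjusted = adjust_frequency(histogram, 0.5)  # peripheral region
```

The number of dimensions stays the same in both cases; only the magnitude of the bins, and therefore the region's influence on later discrimination, changes.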

Either the change in the number of dimensions or the frequency adjustment may be performed alone, or both may be performed. When both are performed, the number of dimensions varies from region to region and the frequency is also adjusted per region, as shown in FIG. 21.

(F) Next, in step S513, the image processing server 1 concatenates the histograms of the regions of the image, with the number of dimensions and the frequency adjusted as above, to extract the feature vector of the entire captured image. For example, when the captured image is divided into I × J regions in a grid in step S501, let V_{i,j} be the number of dimensions of the (i, j)-th region. The dimension V of the feature amount obtained by concatenating the feature amounts of all the regions is expressed as

$$V = \sum_{i=1}^{I} \sum_{j=1}^{J} V_{i,j}$$

For the result shown in FIG. 18, the overall histogram is a vector of 45 dimensions + 27 dimensions × 4 blocks + 15 dimensions × 4 blocks = 213 dimensions.
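The dimension count above can be checked with a short sketch that concatenates per-region vectors. The spatial arrangement of the 45-, 27-, and 15-dimensional blocks in `DIMS` is an assumption (FIG. 18 is not reproduced in this text), although the total does not depend on it. The function names are hypothetical.

```python
def total_dimension(dims):
    """Sum the per-region dimension counts over the whole grid."""
    return sum(sum(row) for row in dims)


def concatenate_regions(region_vectors):
    """Join the per-region feature vectors, row by row, into one vector."""
    combined = []
    for row in region_vectors:
        for vector in row:
            combined.extend(vector)
    return combined


# Assumed layout: 45 dimensions at the center, 27 beside it, 15 in the corners.
DIMS = [
    [15, 27, 15],
    [27, 45, 27],
    [15, 27, 15],
]
# Placeholder region vectors of the right lengths (all zeros).
region_vectors = [[[0.0] * d for d in row] for row in DIMS]
feature_vector = concatenate_regions(region_vectors)
dimension = total_dimension(DIMS)  # 45 + 27 * 4 + 15 * 4 = 213
```

As long as the regions are always visited in the same order, vectors extracted from different images stay comparable element by element.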

(G) Next, in step S514, the overall histogram (the 213-dimensional vector) is normalized. Normalization is not mandatory and may be applied as the situation requires.

(Action and effect)
According to the image processing server 1 (image processing apparatus) and the image processing method of this embodiment, the fact that a shooting target is usually located at the center of the image is exploited: the representation (number of dimensions and frequency) is strengthened at the center of the image, where the target is likely to appear, and weakened at the periphery, where the background is likely to appear. This makes it possible to extract an image feature amount that carries more information about the target and less noise. As a result, the shooting target included in a captured image can be discriminated correctly from the captured image.

The probability estimating means 30 of the image processing server 1 can estimate the probability by obtaining, from prior information on the size of the shooting target, a statistic for each region in which the shooting target appears in the captured image.

The expression changing means 31 of the image processing server 1 can change the image representation method using the frequency of the feature vector obtained from each region or the number of dimensions of the feature vector.

In the image processing server 1, the shooting targets for which at least one of learning and discrimination is performed can be selected based on the position information of the terminal device 2 that took the image and the position information of the shooting targets. The shooting targets that need processing can therefore be narrowed down using position information.

Equations (1) and (2) described above are the so-called NB (naive Bayes) method, which can calculate, for each of a plurality of known shooting targets, the probability that an unknown shooting target corresponds to it, and which has the advantage of fast processing.

(Other embodiments)
Although the present invention has been described by way of the above embodiment, the statements and drawings forming part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

For example, the above embodiment showed an example in which the NB (naive Bayes) method is used to obtain learning parameters and discriminate unknown shooting targets. However, the algorithm is not limited to NB; other Bayesian methods, or other algorithms such as SVM (support vector machine), kNN (k-nearest neighbor), and LVQ (learning vector quantization), may be used to obtain the learning parameters and discriminate unknown shooting targets.

The image processing server 1 according to the embodiment was described as having the communication means 11, the discriminating means 12, the feature amount extracting means 13, the registration means 14, the learning means 15, the weighting means 18, the probability estimating means 30, the expression changing means 31, and the feature vector creating means 32 built into a CPU as modules, but these may be distributed across two or more CPUs. In that case, the devices are assumed to be connected by a bus or the like so that data can be exchanged between the CPUs.

Thus, the present invention naturally includes various embodiments not described here. The technical scope of the present invention is therefore defined only by the invention-specifying matters of the claims that are reasonable in light of the above description.

FIG. 1 is a block diagram showing the configuration of an image processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the image processing server according to the embodiment.
FIG. 3 is a block diagram showing the configuration of the terminal device according to the embodiment.
FIG. 4 is a flowchart showing the registration process according to the embodiment.
FIG. 5 is a flowchart showing the learning process according to the embodiment.
FIG. 6 is a flowchart showing the discrimination process according to the embodiment.
FIG. 7 is a flowchart showing the feature amount extraction process according to the embodiment.
FIG. 8 is a flowchart showing the details of step S403 in FIG. 7.
FIG. 9 is an example of a captured image according to the embodiment.
FIG. 10 is an example of the sizes of shooting targets according to the embodiment.
FIG. 11 is a diagram showing the range that a shooting target occupies in the captured image according to the embodiment.
FIG. 12 is a diagram showing the frequency of the shooting targets in each region according to the embodiment.
FIG. 13 is a diagram showing the probabilities obtained by normalizing FIG. 12 by the number of data items.
FIG. 14 is a diagram showing the distance matrix of the regions of the captured image according to the embodiment.
FIG. 15 is a diagram showing the cumulative distribution function value of each region of the captured image according to the embodiment.
FIG. 16 is a diagram showing the normalized cumulative distribution function value of each region of the captured image according to the embodiment.
FIG. 17 is a diagram showing the number of dimensions of each region of the captured image according to the embodiment.
FIG. 18 is a diagram showing the number of dimensions of the feature vector in each region of the captured image according to the embodiment.
FIG. 19 is a diagram for explaining the change in the number of dimensions of the color histogram in each region of the captured image according to the embodiment.
FIG. 20 is a diagram for explaining the frequency adjustment of the color histogram in each region of the captured image according to the embodiment.
FIG. 21 is a diagram for explaining the change in the number of dimensions and the frequency adjustment of the color histogram in each region of the captured image according to the embodiment.

Explanation of symbols

1 Image processing server
2 Terminal device
3 Communication network
11 Communication means
12 Discriminating means
13 Feature amount extracting means
14 Registration means
15 Learning means
16 Arithmetic device
17 Storage device
18 Weighting means
21 Input means
22 Communication means
23 Output means
24 Imaging means
25 Positioning means
26 Arithmetic device
27 Storage device
30 Probability estimating means
31 Expression changing means
32 Feature vector creating means

Claims

1. An image processing apparatus that performs at least one of learning and discrimination of a shooting target included in a captured image from the captured image, comprising:
probability estimating means for estimating the probability that the shooting target appears in each region of the captured image divided into an arbitrary number of regions;
expression changing means for changing the representation method of the image in each region based on the probability estimated by the probability estimating means;
feature amount extracting means for extracting feature vectors using the representation method changed by the expression changing means;
feature vector creating means for creating a feature vector of the entire captured image by combining the feature vectors of the respective regions extracted by the feature amount extracting means; and
means for performing at least one of learning and discrimination of the shooting target based on the feature vector of the entire captured image,
wherein the expression changing means changes the representation method of the image using the number of dimensions of the feature vector obtained from each region, and
wherein the expression changing means changes the number of dimensions so that it increases as the probability estimated by the probability estimating means for each region becomes higher.

2. The image processing apparatus according to claim 1, wherein the probability estimating means estimates the probability by obtaining, based on prior information on the size of the shooting target, a statistic for each region in which the shooting target appears in the captured image.

3. The image processing apparatus according to claim 1, wherein the expression changing means changes the representation method of the image using the frequency of the feature vector obtained from each region.

4. The image processing apparatus according to claim 1, wherein the shooting target for which at least one of learning and discrimination is performed is selected based on position information of the terminal device that took the image and position information of the shooting target.

5. An image processing method in an image processing apparatus that performs at least one of learning and discrimination of a shooting target included in a captured image from the captured image, the method comprising:
a step in which the image processing apparatus estimates the probability that the shooting target appears in each region of the captured image divided into an arbitrary number of regions;
a step in which the image processing apparatus changes the representation method of the image in each region based on the estimated probability;
a step in which the image processing apparatus extracts feature vectors using the changed representation method;
a step in which the image processing apparatus creates a feature vector of the entire captured image by combining the extracted feature vectors of the respective regions; and
a step in which the image processing apparatus performs at least one of learning and discrimination of the shooting target based on the feature vector of the entire captured image,
wherein, in the step of changing the representation method, the representation method of the image is changed using the number of dimensions of the feature vector obtained from each region, and
wherein the number of dimensions is changed so that it increases as the probability estimated by the probability estimating means for each region becomes higher.