JP5823270B2 - Image recognition apparatus and method

Info

Publication number: JP5823270B2
Application number: JP2011262535A
Authority: JP
Inventors: 松尾　賢治
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-11-30
Filing date: 2011-11-30
Publication date: 2015-11-25
Anticipated expiration: 2031-11-30
Also published as: JP2013114596A

Description

The present invention relates to an image recognition apparatus and method for digitized images, and in particular to an image recognition apparatus and method that, for a plurality of objects contained in an image, can automatically recognize not only what each object is but also where in the image it is located.

In recent years, digital cameras have become widespread, and an enormous number of digital images are captured and stored every day. To use images as an information resource, it is desirable to convert them in advance into text information that is easy to process on a computer. To that end, it is effective to convert image content into text by image recognition, without human intervention.

One conventional approach is an image recognition method that estimates the objects contained in an image. For example, Torralba et al. have proposed an image recognition method that estimates the names of objects contained in a newly input image on the basis of similar examples in a large-scale image data set constructed in advance. This method is disclosed in Non-Patent Document 1.

This conventional image recognition method uses a large-scale data set of about 80 million entries, constructed by collecting in advance a large number of images labeled with the names of the objects they contain. Specifically, images similar to a newly input image are retrieved from the large-scale data set, and the names of the objects contained in the input image are estimated by voting over the labels attached to the retrieved similar images.

This conventional technique relies on the scale of the data set. The larger the data set, the more of the similar images found in it carry labels that accurately name the objects contained in the newly input image, which is what makes estimation by voting possible. In addition, the larger the data set, the higher the probability that similar images exist in it, whatever the name of the object contained in the input image.

The conventional technique disclosed in Non-Patent Document 1 is introduced below. FIG. 15 shows the functional blocks of an image recognition unit that performs image recognition according to this conventional technique, together with supplementary illustrations (1) to (5). The image recognition unit 12 comprises an image size normalization unit 81, an image feature amount conversion unit 82, a data set storage unit 83, a comparison unit 84, and a voting determination unit 85, which function as follows.

The image size normalization unit 81 scales input images so that images of the various sizes shown in (1) are always converted into images of a fixed size, as shown in (2). As in Non-Patent Document 1, FIG. 15 shows an example in which a normalized image 32 pixels high and 32 pixels wide is created.

The image feature amount conversion unit 82 extracts feature amounts from the normalized image and converts them into an image feature vector, as shown in (3). In Non-Patent Document 1, a color image 32 pixels high and 32 pixels wide with three RGB components is treated as a 32 x 32 x 3 = 3072-dimensional image vector, and signal normalization consisting of zero-mean centering (zero mean) and norm normalization (unit norm) is applied to obtain a 3072-dimensional feature vector.
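The conversion described above can be sketched in a few lines. This is a minimal illustration rather than the implementation of Non-Patent Document 1: the function name `to_feature_vector` and the nested-list input format are assumptions made for the example.

```python
import math


def to_feature_vector(pixels):
    """Flatten a 32x32 RGB image and apply zero-mean, unit-norm normalization.

    `pixels` is assumed to be a 32x32 nested list of (r, g, b) tuples;
    this input format is hypothetical and chosen only for illustration.
    """
    # 32 rows x 32 columns x 3 channels -> 3072 values
    flat = [float(c) for row in pixels for rgb in row for c in rgb]
    mean = sum(flat) / len(flat)
    centered = [v - mean for v in flat]           # zero mean
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        return centered                           # constant image: nothing to scale
    return [v / norm for v in centered]           # unit norm
```

Because every vector ends up with unit length, a plain dot product between two such vectors can serve as a similarity measure in later steps.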

As shown in (4), the data set storage unit 83 stores, on a large scale, pairs each consisting of a label and a feature vector. The pairs are obtained from a large number of images collected in advance and labeled with the names of the objects they contain, each image being converted into a feature vector by the image size normalization unit 81 and the image feature amount conversion unit 82. In the example of Non-Patent Document 1, a large-scale data set of about 80 million images covering about 80,000 kinds of objects (the kinds being specified by the same number of words, about 80,000) is stored.

The comparison unit 84 compares the feature vector obtained from the input image by the image size normalization unit 81 and the image feature amount conversion unit 82 with every feature vector registered in the data set storage unit 83, and calculates a similarity for each. The similarity indicates how closely the input image resembles the registered image corresponding to the feature vector. The comparison unit 84 then lists, as similar images, a specified number M of registered images in descending order of similarity. (4) shows an example with M = 24.
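A brute-force version of this comparison might look like the sketch below. The text here does not fix the similarity measure, so the dot product is an assumption (it ranks unit-norm vectors in the same order as Euclidean distance); `top_m_similar` and the `(feature_vector, label)` data layout are likewise hypothetical.

```python
import heapq


def top_m_similar(query, dataset, m):
    """Return the m most similar entries as (similarity, label) pairs.

    `dataset` is assumed to be a list of (feature_vector, label) pairs.
    The dot product is used as the similarity, which is an assumption.
    """
    def similarity(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Score every registered vector, then keep the m best.
    scored = ((similarity(query, vec), label) for vec, label in dataset)
    return heapq.nlargest(m, scored, key=lambda pair: pair[0])
```

`heapq.nlargest` avoids sorting the whole data set, which matters when M (for example 24) is tiny compared with the number of registered images.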

As shown in (5), the voting determination unit 85 performs voting using the labels attached to the top M similar images. Voting here means counting how often each label appears among the labels attached to the similar images. The voting determination unit 85 then takes the labels whose frequency of appearance exceeds a set threshold as the estimated names of the objects contained in the input image. (5) shows an example in which "car" is correctly estimated, matching the content of the input image.
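The voting step amounts to a frequency count followed by a threshold test. The sketch below assumes one label per similar image and a hypothetical function name `vote`; the label counts in the usage note are invented for illustration.

```python
from collections import Counter


def vote(labels, threshold):
    """Return the labels whose count among the top-M images exceeds `threshold`.

    `labels` holds one label per similar image (an assumption for this sketch).
    """
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n > threshold)
```

With 24 similar images labeled 14 times "car", 6 times "person", and 4 times "tree", a threshold of 10 yields only "car", a threshold of 3 yields all three labels, and a threshold of 20 yields nothing, which shows how strongly the outcome depends on the threshold.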

Antonio Torralba, Rob Fergus and William T. Freeman, "80 million tiny images: a large dataset for non-parametric object and scene recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 30, Issue 11, No. 11, November 2008, pp. 1958-1970.

However, the conventional technique described in Non-Patent Document 1 has three problems: (Problem 1) it gives no special consideration to images containing a plurality of objects and does not handle them; (Problem 2) it cannot recognize where in the image an object is located; and (Problem 3) there is room for improvement in estimation accuracy. The details are as follows.

(Problem 1) As described above, the conventional image recognition technique described in Non-Patent Document 1 takes the entire input image as the estimation target and compares it with the data set. Even if the input image contains a plurality of objects, the comparison with the data set is not made object by object; the entire input image remains the estimation target. It therefore cannot be said to handle recognition of images containing a plurality of objects adequately. For example, for an input image containing a "car" and a "person" as in the example of FIG. 15, no special consideration is given to estimating each of these two objects with high accuracy.

(Problem 2) The conventional image recognition technique described in Non-Patent Document 1 can estimate the names of objects contained in an image, but it does not recognize where in the image those objects are located. To use images as an information resource and to make video information processing more sophisticated, it is useful not only to estimate the names of the objects contained in an image but also to recognize where they are located. The conventional technique could not achieve this.

(Problem 3) In the conventional image recognition technique described in Non-Patent Document 1, the estimation accuracy depends heavily on the threshold set in the voting determination unit 85. In the example of FIG. 15, lowering the threshold causes an excessive number of candidates to be listed, such as "person" and "tree" in addition to the "car" contained in the input image; conversely, raising the threshold can make the determination so strict that the estimation result is "none applicable". There is therefore room for improving and stabilizing the estimation accuracy.

In view of the above, a first object of the present invention is to solve Problems 1 and 2 above and to provide an image recognition apparatus and method that, for a plurality of objects (or targets) contained in an image, can automatically recognize not only what each object is but also its position in the image.

A second object of the present invention is, in addition to the first object, to solve Problem 3 above as well and to provide an image recognition apparatus and method that can recognize with high accuracy what each of a plurality of objects (or targets) contained in an image is, and can also recognize their positions in the image.

To achieve the above objects, a first feature of the image recognition apparatus of the present invention is that it comprises a partial region extraction unit that cuts out a plurality of partial regions from an input image and uses each as a target image, and an image recognition unit that recognizes what the target image represents. The image recognition unit includes: an image feature amount conversion unit that extracts a first image feature amount from the target image and converts it into an image feature vector; a data set storage unit that assigns a label to each of a given plurality of images and stores each image in association with the image feature vector obtained from it by the image feature amount conversion unit; a comparison unit that compares the image feature vector of the target image with each of the image feature vectors stored in the data set storage unit to obtain the similarity between the target image and each of the stored images; and a voting determination unit that, among the labels assigned to a predetermined number of images ranked highest in similarity, obtains the labels whose frequency of appearance satisfies a predetermined criterion as the recognition result of what the target image represents. From the recognition results of the image recognition unit for each of the partial regions cut out by the partial region extraction unit, the apparatus recognizes what each of a plurality of targets contained in the input image is, and also recognizes the position of each of those targets in the input image.

A second feature of the present invention is that the partial region extraction unit includes a candidate selection unit that selects, as the partial regions, a plurality of rectangular regions whose sides are parallel to the vertical or horizontal axis of the input image, each specified by its relative position and size within the input image, and a rectangle extraction unit that cuts out each of the selected rectangular regions from the input image to form the partial regions.
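As a rough illustration of a rectangle specified by relative position and size, the sketch below stores four fractions of the image dimensions and converts them to pixels when cropping. The names `RelRect`, `to_pixels`, and `crop`, and the nested-list image format, are hypothetical; the rectangle is assumed to start inside the image (0 <= x, y < 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelRect:
    """Axis-aligned rectangle given as fractions of the image size (assumed form)."""
    x: float  # left edge, fraction of image width
    y: float  # top edge, fraction of image height
    w: float  # width, fraction of image width
    h: float  # height, fraction of image height


def to_pixels(rect, img_w, img_h):
    """Convert a relative rectangle to (left, top, width, height) in pixels."""
    left = int(rect.x * img_w)
    top = int(rect.y * img_h)
    width = min(max(1, int(rect.w * img_w)), img_w - left)    # clamp to image
    height = min(max(1, int(rect.h * img_h)), img_h - top)
    return left, top, width, height


def crop(image, rect):
    """Cut the rectangle out of `image`, a nested list indexed as image[row][col]."""
    img_h, img_w = len(image), len(image[0])
    left, top, width, height = to_pixels(rect, img_w, img_h)
    return [row[left:left + width] for row in image[top:top + height]]
```

Keeping the rectangle in relative coordinates lets the same candidate be applied to input images of any size.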

A third feature of the present invention is that the candidate selection unit either includes a random number generation unit that uses random numbers to specify a predetermined number of relative positions and sizes of the rectangular regions within the input image, or includes a statistical information measurement unit that measures in advance, on predetermined images, statistics on the relative positions and sizes at which targets exist, a probability distribution storage unit that stores a prior probability distribution over predetermined combinations of the measured positions and sizes, and a candidate extraction unit that specifies a predetermined number of relative positions and sizes in descending order of prior probability.
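Both alternatives can be sketched briefly. The uniform sampling scheme, the `min_size` parameter, and the dictionary form of the prior distribution are assumptions made for this example; the text does not specify how the random numbers are drawn or how the distribution is stored.

```python
import random


def random_candidates(n, rng, min_size=0.1):
    """Draw n rectangles (x, y, w, h) in relative coordinates using random numbers.

    Uniform sampling and `min_size` are assumptions of this sketch.
    """
    rects = []
    for _ in range(n):
        w = rng.uniform(min_size, 1.0)
        h = rng.uniform(min_size, 1.0)
        x = rng.uniform(0.0, 1.0 - w)   # keep the rectangle inside the image
        y = rng.uniform(0.0, 1.0 - h)
        rects.append((x, y, w, h))
    return rects


def top_prior_candidates(prior, n):
    """Pick the n rectangles with the highest prior probability.

    `prior` is assumed to map (x, y, w, h) tuples to probabilities
    measured beforehand on annotated images.
    """
    return sorted(prior, key=prior.get, reverse=True)[:n]
```

The random variant needs no training data and is fast; the prior-based variant spends its fixed budget of candidates on positions and sizes where targets were frequently observed.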

A fourth feature of the present invention is that the partial region extraction unit further includes a candidate screening unit that, from among the plurality of rectangular regions selected by the candidate selection unit, retains only a subset on the basis of the overlap relationships between the regions, and the rectangle extraction unit cuts out only the retained regions.

A fifth feature of the present invention is that the candidate screening unit includes: a region division unit that divides the input image into regions of arbitrary shape on the basis of similarity in color and/or texture features; an object reliability calculation unit that calculates, for each of the rectangular regions selected by the candidate selection unit, a reliability indicating whether the region captures a target, using as an index the degree to which the region covers a divided region with neither excess nor shortfall; and a final candidate determination unit that retains only the subset on the basis of how high the calculated reliability is and how little the rectangular regions selected by the candidate selection unit overlap one another.
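One possible reading of this screening is sketched below. It is an illustration under explicit assumptions, not the method defined in the patent: the reliability is taken to be the best intersection-over-union (IoU) between a rectangle and any segmented region, and the final choice is a greedy pass that rejects rectangles overlapping an already chosen one by more than `max_overlap`. All function names and the pixel-set representation are hypothetical.

```python
def rect_pixels(rect):
    """Set of (x, y) pixels covered by rect = (left, top, width, height)."""
    left, top, w, h = rect
    return {(x, y) for y in range(top, top + h) for x in range(left, left + w)}


def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def reliability(rect, segments):
    """Assumed index: how tightly the rectangle fits its best-matching segment."""
    px = rect_pixels(rect)
    return max(iou(px, seg) for seg in segments)


def select_final(rects, segments, max_overlap=0.3):
    """Greedily keep high-reliability rectangles that overlap little."""
    ranked = sorted(rects, key=lambda r: reliability(r, segments), reverse=True)
    chosen = []
    for r in ranked:
        if all(iou(rect_pixels(r), rect_pixels(c)) <= max_overlap for c in chosen):
            chosen.append(r)
    return chosen
```

IoU penalizes both a rectangle that spills beyond the segment (excess) and one that leaves part of it uncovered (shortfall), which is why it was chosen as a stand-in for the index described above.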

A sixth feature of the present invention is that the region division unit includes: a small region division unit that over-segments the input image into a plurality of small regions of arbitrary shape on the basis of similarity in color and/or texture features; a target region selection unit that, for each of the over-segmented small regions, outputs in association with each other an internal region of the input image that belongs to the small region and an external region that does not; a small region expansion unit that merges, into the internal region, those parts of the external region whose color and/or texture features are similar to the corresponding internal region, and outputs the result as a small region expansion region; a region similarity calculation unit that calculates the region similarity of each pair of small region expansion regions; and a small region integration unit that determines that the small regions of every pair whose corresponding pair of small region expansion regions has a region similarity satisfying a predetermined criterion belong to the same object and integrates them, determines that the small regions of pairs not satisfying the criterion belong to different objects, and thereby obtains the division into regions of arbitrary shape as integrated small regions, each corresponding to one object. Here, the term "object" refers not only to a region occupied by a physical object contained in the image but also to a background region.

A seventh feature of the present invention is that the partial region extraction unit includes: a saliency extraction unit that extracts from the input image a saliency corresponding to the degree of attention for each pixel; a rectangle determination unit that determines, as the partial regions, a predetermined number of rectangular regions whose sides are parallel to the vertical or horizontal axis of the input image and whose saliency satisfies a predetermined criterion, and selects them by specifying the relative position and size of each within the input image; and a rectangle extraction unit that cuts out each of the selected rectangular regions from the input image to form the partial regions.

An eighth feature of the present invention is that the apparatus further comprises a re-determination recognition unit that takes each of the labels satisfying the predetermined criterion in the voting determination unit as a candidate and recognizes what the target image represents by comparing a second image feature amount associated in advance with each candidate against a second image feature amount extracted from the target image, and that the recognition result of the re-determination recognition unit is adopted, in place of that of the image recognition unit, as the recognition result of what each of the plurality of targets contained in the input image is.

A ninth feature of the present invention is that the image feature amount conversion unit converts the target image into an image feature vector whose elements are pixel values, as the first image feature amount, and that the re-determination recognition unit includes: a local feature amount extraction unit that extracts local feature amounts from the target image as the second image feature amount; a clustering processing unit that extracts local feature amounts from each of a given plurality of images and clusters them to calculate a representative vector for each cluster; a codebook storage unit that stores the calculated representative vectors as a codebook; a vector quantization unit that quantizes the local feature amounts extracted from the target image into representative vectors by referring to the stored codebook and counts the frequency of appearance of each representative vector to obtain an image feature vector; an image feature vector storage unit that stores a given plurality of labeled images in association with the image feature vectors obtained by applying the vector quantization unit to each of them; a machine learning unit that applies machine learning using the labeled image feature vectors stored in the image feature vector storage unit as training data and outputs a classifier for each label; a classifier storage unit that stores the classifier for each label; and an identification unit that reads from the classifier storage unit the classifiers corresponding to each of the labels obtained as the recognition result of the voting determination unit, inputs to them the image feature vector obtained for the target image by the vector quantization unit to obtain the labels satisfying a predetermined criterion, and takes those labels as the recognition result of what the target image represents.
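The vector quantization step described above can be sketched as follows. The codebook is taken as given (the clustering that produces it is omitted), squared Euclidean distance is an assumed choice of metric, and the names `nearest_codeword` and `bow_histogram` are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the representative vector closest to `descriptor`.

    Squared Euclidean distance is assumed as the metric.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def bow_histogram(descriptors, codebook):
    """Count how often each representative vector is the nearest one.

    The resulting histogram is the image feature vector fed to the classifiers.
    """
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1
    return hist
```

The histogram has one bin per representative vector, so images with different numbers of local feature amounts still map to vectors of the same length.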

A tenth feature of the present invention is that the identification unit further includes: a co-occurrence relationship measurement unit that measures the co-occurrence relationships between labels from a given plurality of images, each of which contains a plurality of targets and carries a label for each target; a prior probability distribution storage unit that stores prior probabilities of label co-occurrence based on the measured co-occurrence relationships; and a total reliability calculation unit that calculates a total reliability by multiplying by the prior probability each combination of the target images with the scores output by the classifiers for the labels obtained for each target image as the recognition result of the voting determination unit. The labels of the target images in the combination that maximizes the total reliability are taken as the recognition result of what each of the plurality of targets contained in the input image is.
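A small sketch of this maximization follows. The text does not give the exact formula, so the total reliability is assumed here to be the product of the per-region classifier scores and a prior looked up for the set of co-occurring labels; `best_labeling`, `default_prior`, and the example scores in the usage note are hypothetical.

```python
from itertools import product


def best_labeling(scores_per_region, cooccurrence_prior, default_prior=1e-6):
    """Pick one label per region so that the total reliability is maximal.

    `scores_per_region` is a list of {label: classifier_score} dicts, one per
    target image. `cooccurrence_prior` maps a frozenset of labels to a prior
    probability. Both layouts, and the product form of the total reliability,
    are assumptions of this sketch.
    """
    best, best_total = None, -1.0
    for combo in product(*(s.items() for s in scores_per_region)):
        labels = tuple(label for label, _ in combo)
        total = cooccurrence_prior.get(frozenset(labels), default_prior)
        for _, score in combo:
            total *= score          # combine classifier scores with the prior
        if total > best_total:
            best, best_total = labels, total
    return best, best_total
```

For example, if one region scores "boat" slightly above "car" while a second region scores "person" highest, a prior that favors "car" with "person" over "boat" with "person" can overturn the first region's individual ranking. The exhaustive enumeration grows exponentially with the number of regions, so it is only practical for a handful of candidates.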

An eleventh feature of the present invention is that the co-occurrence relationship measurement unit measures the co-occurrence relationships between labels as co-occurrence relationships between the area occupancy ratios of the rectangular regions corresponding to the labels, from a given plurality of images, each of which contains a plurality of targets and carries, for each target, a label and information on the rectangular region that the target occupies in the image.

A twelfth feature of the present invention is that at least one of the given plurality of images in the data set storage unit and the given plurality of images in the image feature vector storage unit consists of images each containing a single target.

To achieve the above objects, a thirteenth feature of the present invention is an image recognition method comprising a partial region extraction step of cutting out a plurality of partial regions from an input image and using each as a target image, and an image recognition step of recognizing what the target image represents. The image recognition step includes: an image feature amount conversion step of extracting a first image feature amount from the target image and converting it into an image feature vector; a data set storage step of assigning a label to each of a given plurality of images and storing each image in association with the image feature vector obtained from it in the image feature amount conversion step; a comparison step of comparing the image feature vector of the target image with each of the image feature vectors stored in the data set storage step to obtain the similarity between the target image and each of the stored images; and a voting determination step of obtaining, among the labels assigned to a predetermined number of images ranked highest in similarity, the labels whose frequency of appearance satisfies a predetermined criterion as the recognition result of what the target image represents. From the recognition results of the image recognition step for each of the partial regions cut out in the partial region extraction step, the method recognizes what each of a plurality of targets contained in the input image is, and also recognizes the position of each of those targets in the input image.

A fourteenth feature of the present invention is that the method further comprises a re-determination recognition step of taking each of the labels satisfying the predetermined criterion in the voting determination step as a candidate and recognizing what the target image represents by comparing a second image feature amount associated in advance with each candidate against a second image feature amount extracted from the target image, and that the recognition result of the re-determination recognition step is adopted, in place of that of the image recognition step, as the recognition result of what each of the plurality of targets contained in the input image is.

According to the first or thirteenth feature, for a plurality of targets contained in an image, it becomes possible to recognize not only what each target is but also where it is located. The first object is thereby achieved.

According to the second feature, extracting rectangular partial regions avoids handling complex information to represent region shape: a rectangle can be uniquely defined by just four parameters, for example the x coordinate of its upper-left vertex, the y coordinate of its upper-left vertex, its width, and its height. This simplification reduces both the amount of processing and the storage capacity needed to hold region shapes.

According to the third feature, either the partial regions are determined from random numbers without complex processing, so that they can be extracted in a shorter time, or, alternatively, the partial regions are determined in order starting from the rectangles with the highest probability of containing an object as calculated from existing examples, so that they can be extracted with greater emphasis on accuracy than when randomly generated partial regions are used as they are.

According to any of the fourth to sixth features, the signal in the image is analyzed and the partial regions are screened on the basis of the likelihood that an object is present, so candidate partial regions can be extracted with higher accuracy than when no screening is applied.

According to the seventh feature, the signal in the image is analyzed and the partial regions are obtained on the basis of saliency, so partial regions in which a target may exist are obtained.

According to the eighth, ninth, or fourteenth feature, the first or tenth feature is first used to list, somewhat in excess, candidates for many locations and many kinds of object names (what the target is). Then, which of the N kinds of objects (targets) listed as candidates applies is determined in detail using more sophisticated image feature amounts. This improves the final accuracy of estimating the names of the objects contained in the image and, at the same time, makes it possible to identify with high accuracy, from among the many object candidates listed by the partial region extraction unit, the locations where objects truly exist. The second object is thereby achieved.

According to the tenth feature, the recognition result is obtained from an overall reliability based on a prior probability indicating whether the co-occurrence of targets in the same image is plausible, so a more accurate result is obtained.

According to the eleventh feature, the co-occurrence relationship between targets present in the same image is described by the area occupancy ratios of those targets, and the recognition result is obtained from an overall reliability based on a prior probability for a co-occurrence relationship that also reflects the relative sizes of the targets, so a more accurate result is obtained.

According to the twelfth feature, the feature vectors compared during recognition are made consistent with one another, which improves the accuracy of estimating the names of the objects contained in the image.

FIG. 1 shows the functional blocks of the image recognition apparatus according to an embodiment of the present invention, with their description.
FIG. 2 shows the functional blocks of the partial region extraction unit according to one example, with their description.
FIG. 3 shows the functional blocks of the candidate selection unit according to one example, with their description.
FIG. 4 shows the functional blocks of the candidate selection unit according to one example, with their description.
FIG. 5 shows the functional blocks of the partial region extraction unit according to one example, with their description.
FIG. 6 shows the functional blocks of the candidate screening unit, with their description.
FIG. 7 is a functional block diagram of the region division unit.
FIG. 8 outlines the processing by the region similarity calculation unit and the small region integration unit.
FIG. 9 illustrates the small-region expanded regions generated by the small region integration unit and related units from the small regions output by the region division unit, and the grouping of small regions based on that result.
FIG. 10 illustrates the processing of the object reliability measurement unit.
FIG. 11 shows the functional blocks of the redetermination recognition unit, with their description.
FIG. 12 illustrates the calculation of image signal values as local features.
FIG. 13 illustrates the calculation of image signal values performed by the local feature extraction unit on the various collected images.
FIG. 14 illustrates the clustering performed by the clustering processing unit, the vector quantization performed by the vector quantization unit, and histogram formation.
FIG. 15 relates to the prior art and shows the functional blocks of the image recognition unit used in the present invention, with a description of its processing.
FIG. 16 shows the functional blocks of the partial region extraction unit according to one example, with their description.
FIG. 17 is a functional block diagram of the identification unit according to one example.
FIG. 18 is a functional block diagram of the image recognition apparatus according to one example.

FIG. 1 shows the functional blocks of an image recognition apparatus according to an embodiment of the present invention, together with a description of its processing. The image recognition apparatus 1 includes a partial region extraction unit 11, an image recognition unit 12, and a redetermination recognition unit 13. For a plurality of targets contained in an input image, it identifies what each target is and also identifies the position of each target within the input image. In the first embodiment, the redetermination recognition unit 13 is omitted. In the second embodiment, the redetermination recognition unit 13 is used, which makes it possible to identify what each target is with higher accuracy than in the first embodiment.

The partial region extraction unit 11 extracts partial regions from the input image. The example input image shown in (1) contains two kinds of targets, "car" and "person". So that the partial regions that truly contain these targets are not missed, the partial region extraction unit 11 extracts a somewhat excessive number of partial regions and outputs them, as shown in (2), as candidate regions that may contain a target. In the example input image of (1), the boundaries of the candidate regions extracted in (2) are also drawn over the input image as white lines.

The individual functional blocks inside the image recognition unit 12 are the same as those described with reference to FIG. 15. The present invention is characterized by the cooperation between the image recognition unit 12, the partial region extraction unit 11 that precedes it (first and second embodiments), and the redetermination recognition unit 13 that follows it (second embodiment). The image recognition unit 12 treats each partial region extracted from the input image by the preceding partial region extraction unit 11 as a target image to be processed, and processes it in the same way as described with reference to FIG. 15.

The image recognition unit 12 receives the partial regions extracted by the partial region extraction unit 11 one at a time as target images and, as shown in (3), estimates what target each one contains. That is, as described with reference to FIG. 15, each input target image is converted into an image feature vector by the image size normalization unit 81 and the image feature conversion unit 82. The comparison unit 84 uses this vector to search the data set storage unit 83 for images similar to the target image, and the voting determination unit 85 casts votes according to the labels attached to the retrieved similar images and measures the frequency of each label.

Example labels for each target image are shown in (3). In this example, the threshold in the voting determination unit 85 is set low, so the estimates of what the target is are output as a candidate list with a somewhat excessive number of entries. That is, when a partial region that truly contains a "car" is input as the target image, the candidates include not only the correct name "car" but also several other names, including incorrect ones such as "person", "dog", "building", and "motorcycle".
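The voting step can be sketched as follows. This is a minimal illustration, not the implementation of the voting determination unit 85: the function name `candidate_list` is hypothetical, and applying the threshold to the relative frequency of each label (rather than to a raw vote count) is an assumption.

```python
from collections import Counter

def candidate_list(neighbor_labels, threshold):
    """Rank the labels of retrieved similar images by vote share.

    neighbor_labels: labels attached to the retrieved similar images.
    threshold: minimum relative frequency for a label to be kept
               (assumption: the source does not say how the threshold is applied).
    """
    votes = Counter(neighbor_labels)
    total = sum(votes.values())
    # most_common() orders labels from most to fewest votes.
    return [(label, count / total)
            for label, count in votes.most_common()
            if count / total >= threshold]

labels = ["car"] * 5 + ["person"] * 3 + ["dog"] * 2
strict = candidate_list(labels, 0.25)   # fewer candidates
lenient = candidate_list(labels, 0.10)  # a lower threshold keeps more names
```

Lowering `threshold` lengthens the candidate list, which matches the deliberately generous output described above.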

Setting the threshold low and outputting a generous candidate list, as in the above example, is needed in particular in the second embodiment. In that case, a common threshold may be set for all target images and chosen automatically: the threshold is varied, which changes the number of names in the candidate list of each target image, until the number of names satisfies a predetermined criterion. In the first embodiment, the top-ranked name in the candidate list may be used as the final recognition result.
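One way to picture the automatic threshold setting is the sweep below. It is a sketch under stated assumptions: the source does not define the "predetermined criterion", so the rule used here (every target image must keep at least `min_names` names) and the names `auto_threshold` and `frequency_tables` are hypothetical.

```python
def auto_threshold(frequency_tables, min_names, thresholds):
    """Return the highest common threshold that satisfies the criterion.

    frequency_tables: one dict per target image, mapping label -> vote share.
    min_names: assumed criterion, the minimum number of names per image.
    thresholds: candidate threshold values to try.
    """
    for t in sorted(thresholds, reverse=True):
        counts = [sum(1 for share in table.values() if share >= t)
                  for table in frequency_tables]
        if all(c >= min_names for c in counts):
            return t
    return None  # no candidate threshold satisfies the criterion

tables = [
    {"car": 0.5, "person": 0.3, "dog": 0.2},
    {"person": 0.6, "motorcycle": 0.25, "car": 0.15},
]
chosen = auto_threshold(tables, 2, [0.5, 0.4, 0.3, 0.25, 0.2, 0.1])
```

Trying the highest thresholds first keeps the candidate lists as short as the criterion allows.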

The redetermination recognition unit 13 receives, one at a time, each target image that was extracted by the partial region extraction unit 11 and for which the image recognition unit 12 produced a candidate list, together with that candidate list. Using image features more advanced than those used by the image recognition unit 12, it redetermines in detail which of the N names in the candidate list the target image represents. The value of N is determined by the settings of the image recognition unit 12 and generally differs from one target image to another.

Consider a candidate list of N names, as shown in (3), in which a name other than that of the target actually shown in the target image is ranked first and the name of the actual target is ranked lower, as can happen in the first embodiment. Even in such a case, the redetermination makes it more likely that the actual name is obtained as the final recognition result.

Moreover, because the target images are themselves extracted somewhat excessively, some of them capture no target at all. For such an image, the redetermination more reliably determines that it matches none of the N names in the candidate list. For example, as shown in (4), the target image whose candidate list is "giraffe", "church", ... does not properly capture any target. In the first embodiment, depending on the threshold setting, even such a target image may be judged to be a "giraffe" or the like, whereas in the second embodiment it is more reliably judged to be "not applicable".

The redetermination recognition unit 13 by itself recognizes with high accuracy whether a given image represents one particular target specified by a user or the like; it does not automatically recognize what an arbitrary image represents. Applying the redetermination recognition unit 13 to an arbitrarily large number of targets is impractical in terms of computational cost. In the second embodiment of the present invention in particular, the partial region extraction unit 11 and the image recognition unit 12 are placed before the redetermination recognition unit 13 so that what the image represents is first estimated with moderate accuracy. This allows the redetermination recognition unit 13 to make the final estimate with high accuracy at a practical computational cost.

In both the first and second embodiments, the final recognition result is given as the name of each target image judged to capture some target in the input image, together with the position of that target image within the input image. In the example shown in (5), "car" and its position and "person" and its position are obtained as the recognition results for the input image of (1).

Specific configurations of the three functional blocks, namely the partial region extraction unit 11, the image recognition unit 12, and the redetermination recognition unit 13, are described below. First, three examples of the partial region extraction unit 11 are described as [Example A], [Example B], and [Example C].

[Example A]
FIG. 2 shows the functional blocks of the partial region extraction unit 11 in Example A and a description (1) of its processing. In Example A, the partial region extraction unit 11 includes a candidate selection unit 21 and a rectangle extraction unit 22.

The candidate selection unit 21 selects candidates for regions that may be occupied by objects contained in an input image such as (1). Here, each extracted partial region is a rectangle that is not tilted relative to the rectangle of the whole input image; its sides are parallel to those of the image. A rectangle can then be specified uniquely with just four parameters, namely the x coordinate of its upper-left vertex, the y coordinate of its upper-left vertex, its width, and its height, without handling complex information to represent the region shape. As shown in (2), a plurality of rectangle parameter sets for the selected candidate regions are output.

All four rectangle parameters are normalized to the range 0 to 1 by the width and height of the original image. A rectangle width of 1 equals the width of the original image, and a rectangle height of 1 equals the height of the original image. When the x coordinate of the upper-left vertex is 0, the rectangle is positioned at the left edge of the original image; when the y coordinate of the upper-left vertex is 0, it is positioned at the top edge. When the x coordinate of the upper-left vertex is 1, the rectangle is positioned at the right edge of the original image; when the y coordinate is 1, it is positioned at the bottom edge.

The parameters that define a rectangle may instead use the x coordinate of the rectangle's center in place of the x coordinate of the upper-left vertex, and the y coordinate of the center in place of the y coordinate of the upper-left vertex. More generally, restricting rectangles to those whose sides are parallel to the sides of the input image means that a rectangle can be defined by only four parameters in total: its horizontal and vertical position and its horizontal and vertical size. Expressing the position and size relative to the input image allows the same rectangle parameters to be used for input images of any size.

As shown in (3), for each set of rectangle parameters output by the candidate selection unit 21, the rectangle extraction unit 22 cuts the corresponding rectangular partial region out of the input image. As described with reference to FIG. 1, each such partial region is a candidate region in which a target may exist, and each is passed to the image recognition unit 12 as a target image.
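The mapping from normalized rectangle parameters to a cropped partial region can be sketched as follows. This is an illustration only: the helper names `to_pixel_box` and `crop`, the list-of-rows image representation, and the rounding and clipping rules are assumptions that the source does not specify.

```python
def to_pixel_box(rect, image_width, image_height):
    """Convert a normalized (x, y, w, h) rectangle to pixel bounds.

    x, y are the upper-left vertex; all four values lie in 0..1.
    Returns (left, top, right, bottom) with right/bottom exclusive.
    """
    x, y, w, h = rect
    left = int(round(x * image_width))
    top = int(round(y * image_height))
    # Clip to the image so a rectangle never extends past its border (assumption).
    right = min(image_width, int(round((x + w) * image_width)))
    bottom = min(image_height, int(round((y + h) * image_height)))
    return left, top, right, bottom

def crop(image, rect):
    """Cut a rectangle out of an image stored as a list of pixel rows."""
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = to_pixel_box(rect, width, height)
    return [row[left:right] for row in image[top:bottom]]

# A 4x4 test image whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
lower_right = crop(image, (0.5, 0.5, 0.5, 0.5))
```

Because the parameters are relative to the image, the same `rect` tuple can be applied to an input image of any size.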

FIG. 3 and FIG. 4 show the functional blocks of two examples, a and b, of the candidate selection unit 21, together with their descriptions. In example a of FIG. 3, the candidate selection unit 21 includes a random number generation unit 24. As shown in (1), the random number generation unit 24 uses random numbers to determine a predetermined number of sets of the four rectangle parameters. For example, when the four rectangle parameters are the normalized x coordinate of the upper-left vertex, y coordinate of the upper-left vertex, width, and height, a random number in the range 0 to 1 is generated for each parameter to determine its value.
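Example a can be sketched in a few lines. The function name `random_rectangles` is hypothetical, and clipping the width and height so that the rectangle stays inside the image is an assumption; the source only says that each parameter is drawn from the range 0 to 1.

```python
import random

def random_rectangles(count, seed=None):
    """Draw `count` normalized (x, y, w, h) rectangle parameter sets."""
    rng = random.Random(seed)
    rects = []
    for _ in range(count):
        x, y, w, h = (rng.random() for _ in range(4))
        # Assumption (not in the source): keep the rectangle inside the image.
        w = min(w, 1.0 - x)
        h = min(h, 1.0 - y)
        rects.append((x, y, w, h))
    return rects

rects = random_rectangles(100, seed=0)
```

No image analysis is involved, which is why this variant is the fastest way to produce candidate regions.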

In example b of FIG. 4, the candidate selection unit 21 includes a statistical information measurement unit 25, a probability distribution storage unit 26, and a candidate extraction unit 27. The appearance probability of rectangle parameters is measured in advance from existing cases, and rectangle parameters are determined in descending order of appearance probability. The method of Non-Patent Document 2 below can be used for this determination.

[Non-Patent Document 2] Rahtu E., Kannala J., Blaschko M. B., "Learning a Category Independent Object Detection Cascade," Proceedings of International Conference on Computer Vision (ICCV 2011).

As shown in (1), the statistical information measurement unit 25 measures, from existing cases, the prior probability distribution of where objects exist in terms of the four rectangle parameters. To this end, a large number of images containing a wide variety of targets are collected, and each is annotated by visual inspection with the rectangle parameters of the position where the object appears. Non-Patent Document 2 uses as the four parameters the x coordinate RX of the rectangle's center point, the y coordinate RY of the center point, the rectangle width RW, and the rectangle height RH, each measured in the range 0 to 1.

The statistical information measurement unit 25 further divides the range 0 to 1 of each measured rectangle parameter into P equal class intervals and computes a four-dimensional histogram. The first class counts rectangles whose parameter value lies between 0 and 1/P, the second class between 1/P and 2/P, ..., the q-th class between (q-1)/P and q/P, ..., and the P-th class between (P-1)/P and 1. P is a fixed value set in advance. If, for example, P = 80 is chosen as a value that can cover a wide variety of rectangles, the prior probability distribution occupies a space of 80^4 (about 41 million) cells, which is difficult to handle in terms of storage capacity in an implementation.
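The binning and the resulting table size can be checked with a short sketch. The names `bin_index` and `histogram_4d` are hypothetical, class indices here are zero-based (class q in the text corresponds to index q-1), and placing a value that lies exactly on a boundary into the upper class is an assumption.

```python
from collections import Counter

def bin_index(value, num_bins):
    """Map a normalized value in [0, 1] to a zero-based class index."""
    # The value 1.0 is placed in the last class so the index stays in range.
    return min(int(value * num_bins), num_bins - 1)

def histogram_4d(rectangles, num_bins):
    """Count annotated (RX, RY, RW, RH) rectangles per four-dimensional class."""
    counts = Counter()
    for rect in rectangles:
        counts[tuple(bin_index(v, num_bins) for v in rect)] += 1
    return counts

P = 80
dense_4d_cells = P ** 4         # 40,960,000 cells for one dense 4-D table
three_2d_cells = 3 * P ** 2     # 19,200 cells for three P x P tables, for comparison

sample = histogram_4d([(0.5, 0.5, 1.0, 1.0), (0.5, 0.5, 0.99, 0.995)], 4)
```

The gap between the two cell counts is the storage difficulty that the text refers to.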

To address this difficulty, the probability distribution storage unit 26 stores the prior probability distribution of where targets exist as follows. Specifically, as shown in (2), it stores three two-dimensional histograms measured by the statistical information measurement unit 25: P(RW, RH), P(RY|RH), and P(RX|RW). In (2), these are visualized schematically as (21), (22), and (23), respectively. The shading corresponds to the magnitude of the probability. The probability is high near the origin (0, 0) in (21) and near the point (1/2, 1) in (22) and (23), which indicates that a rectangle containing a target is likely to span almost the whole input image.

The candidate extraction unit 27 selects candidate regions in descending order of the prior probability of the combination of the four rectangle parameters. To do so, it reads the three two-dimensional histograms P(RW, RH), P(RY|RH), and P(RX|RW) stored in advance in the probability distribution storage unit 26 and approximates the prior probability distribution P(RX, RY, RW, RH) of the four rectangle parameters by their product:
P(RX, RY, RW, RH) = P(RY|RH) P(RX|RW) P(RW, RH)

Then NR combinations of rectangle parameters are selected in descending order of P(RX, RY, RW, RH). NR is a fixed value set in advance. Non-Patent Document 2 uses NR = 10,000.
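The product approximation and the top-NR selection can be sketched as below. This is a brute-force illustration on tiny tables, not the procedure of Non-Patent Document 2: the function name `top_rectangles` and the table indexing conventions are assumptions, and enumerating all P^4 combinations would be slow for P = 80.

```python
import heapq
from itertools import product

def top_rectangles(p_wh, p_y_given_h, p_x_given_w, nr):
    """Return the nr best (score, (rx, ry, rw, rh)) class-index combinations.

    p_wh[rw][rh]        approximates P(RW, RH)
    p_y_given_h[ry][rh] approximates P(RY | RH)
    p_x_given_w[rx][rw] approximates P(RX | RW)
    """
    num_bins = len(p_wh)

    def score(combo):
        rx, ry, rw, rh = combo
        # P(RX, RY, RW, RH) = P(RY|RH) * P(RX|RW) * P(RW, RH)
        return p_y_given_h[ry][rh] * p_x_given_w[rx][rw] * p_wh[rw][rh]

    combos = product(range(num_bins), repeat=4)
    best = heapq.nlargest(nr, combos, key=score)
    return [(score(c), c) for c in best]

# Toy tables with P = 2 classes per parameter (values are illustrative only).
p_wh = [[0.1, 0.2], [0.3, 0.4]]
p_y = [[0.9, 0.2], [0.1, 0.8]]
p_x = [[0.7, 0.4], [0.3, 0.6]]
top = top_rectangles(p_wh, p_y, p_x, 3)
```

`heapq.nlargest` keeps only NR entries in memory at a time, so the full four-dimensional table never has to be materialized.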

[Example B]
FIG. 5 shows the functional blocks of the partial region extraction unit 11 according to Example B and their description. Unlike Example A shown in FIG. 2, the partial region extraction unit 11 of Example B additionally includes a candidate screening unit 23. As shown in (2), the candidate screening unit 23 screens the many rectangle parameter sets selected by the candidate selection unit 21, keeps those representing rectangles with a high reliability that a target exists, and passes the retained parameter sets to the rectangle extraction unit 22. For example, the candidate selection unit 21 may select about 10^5 parameter sets, which the screening by the candidate screening unit 23 narrows down to about 10^2 to 10^3.

FIG. 6 shows the functional blocks of the candidate screening unit 23 and their description. The candidate screening unit 23 includes a region division unit 28, an object reliability measurement unit 29, and a final candidate determination unit 210. For each candidate region selected as rectangle parameters by the candidate selection unit 21, shown in (3), the candidate screening unit 23 uses the signal of the original input image, shown in (1), to compute as a reliability the likelihood that the region contains a target such as an object, and retains only the candidate regions with high reliability, as shown in (4).

As shown in (2), the region division unit 28 divides the input image into small regions, each of which is similar in color, texture, and the like. The apparatus described in Patent Document 1 or Patent Document 2 (both prior applications by the present inventors, each entitled "Image Region Dividing Device, Image Region Dividing Method, and Image Region Dividing Program") can be used as the region division unit 28.

[Patent Document 1] Japanese Unexamined Patent Application Publication No. 2011-150605, "Image Region Dividing Device, Image Region Dividing Method, and Image Region Dividing Program"
[Patent Document 2] Japanese Patent Application No. 2010-232914, "Image Region Dividing Device, Image Region Dividing Method, and Image Region Dividing Program"

FIG. 7 shows the functional blocks of the region division unit 28. The region division unit 28 includes: a small region division unit 10 that over-segments the input image into a plurality of small regions based on color features; a target region selection unit 2 that, for each of the over-segmented small regions, outputs in association an internal region of the input image that belongs to the small region and an external region that does not; a small region expansion unit 30 that merges into the internal region those parts of the external region whose color features are similar to the corresponding internal region and outputs the result as a small-region expanded region; a region similarity calculation unit 6 that calculates the region similarity of each pair of small-region expanded regions; and a small region integration unit 7 that obtains each object by judging that small region pairs whose corresponding expanded-region pair has a region similarity satisfying a predetermined criterion all belong to the same object and integrating them, while judging that small region pairs that do not satisfy the criterion belong to different objects. In this way, each object contained in the input image is separated and extracted.

The region division unit 28, realized using the image region dividing device described in Patent Document 1 and the like, thus extracts the objects contained in an image by means of five main functional blocks: the small region division unit 10, the target region selection unit 2, the small region expansion unit 30, the region similarity calculation unit 6, and the small region integration unit 7. Each object is extracted not as a rectangle but in an arbitrary shape corresponding to the object, and is used as a region in the present invention. The objects include not only regions occupied by targets such as physical objects in the image but also background regions. The function of each unit is outlined below.

The small region division unit 10 divides the image, based on the similarity of image signal properties such as color and luminance, into far more small regions than the number of objects actually contained in the image. Details are disclosed in Patent Document 1 and the like.

The small region expansion unit 30 statistically models the distribution of luminance and color within the small region of interest, identifies pixels elsewhere in the image that follow the same luminance and color distribution model as the original small region, and merges them with the original small region to obtain an expanded region. In combination with the target region selection unit 2, one expanded region is obtained for each small region produced by the over-segmentation. Details are disclosed in Patent Document 1 and the like.

The region similarity calculation unit 6 quantifies the similarity between expanded regions as the proportion of pixels they share. The region similarity S(R1, R2) between small-region expanded regions is calculated by the following equation (Equation 1):
S(R1, R2) = |E1 ∩ E2| / |E1 ∪ E2|   (Equation 1)

In Equation 1, the numerator |E1 ∩ E2| is the number of pixels that belong to both of the small-region expanded regions E1 and E2 corresponding to the small region pair R1 and R2, and the denominator |E1 ∪ E2| is the number of pixels that belong to at least one of E1 and E2. The similarity satisfies 0 ≤ S(R1, R2) ≤ 1: it is 1 when the regions E1 and E2 coincide completely and 0 when they share no pixels. A larger region similarity indicates that the regions are more alike (including in shape and position).
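The ratio described here (pixels in both expanded regions divided by pixels in either) can be sketched with Python sets. The function name `region_similarity` and the representation of a region as a set of (row, column) pixel coordinates are assumptions, as is returning 0 when both regions are empty.

```python
def region_similarity(e1, e2):
    """Intersection-over-union of two expanded regions given as pixel sets."""
    e1, e2 = set(e1), set(e2)
    union = e1 | e2
    if not union:
        return 0.0  # assumption: two empty regions are treated as dissimilar
    return len(e1 & e2) / len(union)

# Two 2x2 blocks that overlap in one column of two pixels.
e1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
e2 = {(0, 1), (1, 1), (0, 2), (1, 2)}
similarity = region_similarity(e1, e2)  # 2 shared pixels out of 6 in total
```

Because the measure compares pixel membership, it is sensitive to position and shape as well as size, as the text notes.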

When the similarity between two expanded regions calculated by the region similarity calculation unit 6 exceeds a preset threshold, the small region integration unit 7 determines that the two expanded regions are similar, concludes that the original small regions of those expanded regions belong to the same object, and merges them into the same group.

The integration process described above, as performed in the image region dividing device of Patent Document 1, relies on the property that each object in an image has a characteristic color distribution, and that the color distribution does not vary greatly from part to part within a single object. Therefore, when the expanded regions output from the respective color distributions of two selected small regions overlap to a large extent (including the case where their positions and shapes agree closely), the two selected small regions (one selected small region pair) are considered to share the same color distribution and to be likely parts of the same object, and they are merged.

FIG. 8 outlines the processing by the region similarity calculation unit 6 and the small region integration unit 7. It shows an example in which, from the over-segmentation result (a) produced by the small region dividing unit 1, (b) small region R1 and (d) small region R2 selected by the target region selection unit 2 are each expanded by the small region expansion unit 30 to obtain (c) expanded region E1 and (e) expanded region E2. The region similarity calculation unit 6 calculates a similarity of 0.92 from the proportion of pixels shared by E1 and E2; because this exceeds the predetermined threshold of 0.90, the small region integration unit 7 merges the original small regions R1 and R2.

FIG. 9 shows an example in which the small region integration unit 7 obtains the region similarity for every small region pair in the input image in the manner of FIG. 8 and sorts the small regions into groups corresponding to the individual objects. Dividing the input image with the small region dividing unit 1 yields 17 small regions, identified by number as R1 to R17. For each small region, the corresponding expanded regions E1 to E17 are obtained through the processing of the target region selection unit 2 and the small region expansion unit 30. The region similarity is computed for all pairs of expanded regions and each is tested against the predetermined threshold, yielding three groups of expanded regions: (E14, E15, E16, E17), (E5, E6, E7, E8, E9, E10, E11, E12, E13), and (E1, E2, E3, E4). The grouping is such that every pair within a group has a region similarity above the threshold and every pair spanning two groups has a similarity at or below it. From these, the three corresponding groups of original small regions are obtained: group 1 (R14, R15, R16, R17), group 2 (R5, R6, R7, R8, R9, R10, R11, R12, R13), and group 3 (R1, R2, R3, R4). Each group is regarded as part of one object.
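One way to turn the pairwise threshold test into groups is a union-find pass over all pairs, sketched below. The text does not say how the grouping is computed, so the use of union-find (connected components) is an assumption; it reproduces the grouping described for FIG. 9 whenever every in-group pair exceeds the threshold and every cross-group pair does not. The function name `group_regions` and the toy similarity function are hypothetical.

```python
def group_regions(n, similarity, threshold):
    """Group region indices 0..n-1; a pair is linked when similarity > threshold."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity(i, j) > threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: regions 0-1 belong to one object, regions 2-4 to another.
labels = [0, 0, 1, 1, 1]
sim = lambda i, j: 0.95 if labels[i] == labels[j] else 0.10
groups = group_regions(5, sim, 0.90)
```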

As described above, the image region dividing device of Patent Document 1 first divides the image into an excessive number of small regions and then decides, small region by small region, whether to merge them based on the similarity between their expanded regions, thereby extracting the objects contained in the image. The region dividing unit 28 of the present invention is built using this image region dividing device.

Alternatively, the region dividing unit 28 of the present invention may employ only the small region dividing unit 10 of the configuration in FIG. 7, passing the over-segmented small regions unchanged to the object reliability measurement unit 29. This is possible because, as explained below, the object reliability measurement unit 29 can construct rectangles that properly capture a target even from over-segmented small regions.

Based on the region division result, the object reliability measurement unit 29 quantifies the object-likeness of each candidate region (the degree to which it captures an object or some other target) as a reliability score. The approach described in Non-Patent Document 3 can be used for this purpose.

[Non-Patent Document 3] B. Alexe, T. Deselaers, and V. Ferrari, "What is an object?," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), June 2010.

FIG. 10 illustrates the processing of the object reliability measurement unit 29. As shown in (2), the region dividing unit 28 divides the image into a plurality of regions s, and the region division result RD consists of these regions s. In the example of (2), the area corresponding to a car object yields separate regions s for each side of the body frame, each window, each tire, and so on.

As shown in (1), the candidate selection unit 21 has obtained several candidate rectangles w in the input image that may contain an object. For each rectangle w that is a candidate for an area containing an object, the object reliability measurement unit 29 calculates the object-likeness reliability SS(w) by the following equation (Equation 2).

The object-likeness reliability SS(w) takes a value from 0 to 1; the larger the value, the more likely it is that the rectangle w contains an object. Here, |s∩w| is the area of region s inside the rectangle w, |s\w| is the area of region s outside the rectangle w, and |w| is the area of the rectangle w. For a digital image, each can be measured as a pixel count instead of an area. The second term on the right-hand side is a sum over all regions s contained in the region division result RD.

For example, for the rectangles w1, w2, and w3 shown in (3), each region s corresponding to a part of the car object contributes to the calculation of SS(w) in Equation 2. Under Equation 2, a rectangle that covers the regions as exactly as possible, neither too large nor too small, receives a high reliability, so when w1, w2, and w3 are compared, the reliability of w2 is calculated to be higher. Panel (4) illustrates |s∩w1| and |s\w1| of Equation 2 for the region s corresponding to a tire and the rectangle w1.
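Equation 2 itself is not reproduced in this text. Assuming it follows the superpixel-straddling measure of Non-Patent Document 3, SS(w) = 1 − Σ_s min(|s\w|, |s∩w|) / |w|, which is consistent with the terms defined above, the calculation can be sketched as follows. The pixel-set representation of regions, the half-open rectangle convention, and the function name `straddling_reliability` are assumptions made for the example.

```python
def straddling_reliability(regions, rect):
    """SS(w) = 1 - sum over s of min(|s minus w|, |s and w|) / |w|, in pixels."""
    x0, y0, x1, y1 = rect  # half-open: x0 <= x < x1, y0 <= y < y1
    area = (x1 - x0) * (y1 - y0)
    penalty = 0
    for s in regions:
        inside = sum(1 for (x, y) in s if x0 <= x < x1 and y0 <= y < y1)
        outside = len(s) - inside
        penalty += min(inside, outside)  # a region cut by the rectangle edge is penalised
    return 1.0 - penalty / area


# Toy 12x4 image split into three regions: background, object, background.
left = {(x, y) for x in range(0, 4) for y in range(4)}
car = {(x, y) for x in range(4, 8) for y in range(4)}
right = {(x, y) for x in range(8, 12) for y in range(4)}
RD = [left, car, right]

ss_small = straddling_reliability(RD, (4, 0, 6, 4))   # covers half of the object
ss_tight = straddling_reliability(RD, (4, 0, 8, 4))   # covers the object exactly
ss_large = straddling_reliability(RD, (2, 0, 10, 4))  # spills into both backgrounds
```

In this toy case the exact-fit rectangle scores 1.0, the oversized one 0.5, and the undersized one 0.0, matching the behaviour described for w1, w2, and w3.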

The final candidate determination unit 210 uses the object-likeness reliability SS(w) of each candidate rectangle w measured by the object reliability measurement unit 29, together with the parameters of each rectangle, to make the final decision, by the algorithm below, on whether each initial candidate region generated by the candidate selection unit 21 can be adopted as a target region in which an object or the like exists.

[Algorithm]
Step 1. Sort the candidate rectangles in descending order of reliability SS(w); this order is the index i in which the iterative decision of step 2 onward is applied.
Step 2. If the overlap between the i-th rectangle and the rectangles adopted so far is at or below a given ratio RO, adopt it as a target region; otherwise reject it.
Step 3. If the number of adopted rectangles reaches the final candidate count NF specified as a parameter, terminate.
Step 4. If rectangles remain to be judged, increment i and return to step 2. If none remain, reduce RO and restart the decision of step 2 from the rejected candidate i with the highest reliability SS(w).

The algorithm starts with the adoption count initialized to zero and the counter i initialized to 1. The final candidate count NF used in the termination test of step 3 is set smaller than the predetermined number of candidates selected by the candidate selection unit 21. A fixed rule for reducing RO in step 4 is decided in advance, such as multiplying it by a positive constant less than 1.
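The selection loop of steps 1 to 4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the text does not define the overlap measure, so intersection-over-union is assumed, and `shrink` and `max_rounds` are hypothetical parameters. Because multiplying RO by a constant below 1 only tightens the test of step 2, the sketch caps the number of restart passes with `max_rounds` so that it always terminates.

```python
def iou(a, b):
    """Overlap of two rectangles (x0, y0, x1, y1) with positive area, as IoU (assumed measure)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(iw, 0) * max(ih, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def select_final(cands, nf, ro, shrink=0.9, max_rounds=3):
    """cands: list of (rect, ss). Returns at most nf adopted rectangles."""
    order = sorted(cands, key=lambda c: c[1], reverse=True)      # step 1
    adopted, pending = [], [rect for rect, _ in order]
    for _ in range(max_rounds):
        rejected = []
        for rect in pending:
            if all(iou(rect, a) <= ro for a in adopted):          # step 2
                adopted.append(rect)
                if len(adopted) == nf:                            # step 3
                    return adopted
            else:
                rejected.append(rect)
        if not rejected:
            break
        pending, ro = rejected, ro * shrink                       # step 4
    return adopted


cands = [((0, 0, 10, 10), 0.9), ((1, 0, 11, 10), 0.8),
         ((20, 0, 30, 10), 0.7), ((40, 0, 50, 10), 0.6)]
final = select_final(cands, nf=2, ro=0.5)
```

Here the second rectangle overlaps the first too heavily and is rejected, so the first and third rectangles are adopted.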

[Example C]
FIG. 16 is a functional block diagram of the partial region extraction unit 11 according to Example C. The partial region extraction unit 11 includes a saliency extraction unit 210, a rectangle determination unit 230, and a rectangle extraction unit 22. The rectangle extraction unit 22 is the same as in Examples A and B.

Following the method disclosed in Non-Patent Document 5, the saliency extraction unit 210 calculates, for each pixel of the input image, a saliency value representing the degree to which a person's gaze is drawn to that part of the image. For example, in an image of a red flower growing in a patch of grass, a person is expected to look at the area of the red flower, and a high saliency is calculated for that area.

[Non-Patent Document 5] Ming-Ming Cheng, Guo-Xin Zhang, Niloy J. Mitra, Xiaolei Huang, and Shi-Min Hu, "Global Contrast based Salient Region Detection," IEEE CVPR, pp. 409-416, Colorado Springs, Colorado, USA, June 21-23, 2011.

Because saliency is defined for each pixel of the input image, when the input image is IH pixels high by IW pixels wide, the saliency extraction unit 210 produces a saliency map that is an array of the same size, IH rows by IW columns. The saliency at the location of each pixel of the input image is stored in the corresponding element of the map. Saliency is a real number normalized to the range 0 to 1, and a larger value indicates a stronger pull on the viewer's attention. The saliency of the element at row Py and column Px of the saliency map is written SAL(Px, Py).

From the saliency map obtained by the saliency extraction unit 210, the rectangle determination unit 230 determines a predetermined number CN of rectangular regions as candidate regions in which a target exists, and passes the determined rectangular regions to the rectangle extraction unit 22 in the same form as the rectangle parameters of the candidate selection unit 21. The determination procedure is as follows. Before the procedure starts, the number CN of regions to determine for extraction by the rectangle extraction unit 22 and the thresholds TV, TS, and TE are set in advance, for example manually. Each threshold is a real number in the range 0 to 1.

(Procedure 0)
Obtain the saliency centroid (Ox, Oy) of the saliency map by the following equation (Equation 3), then proceed to procedure 1.
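Equation 3 is not reproduced in this text. Assuming the centroid is the usual saliency-weighted mean of the element coordinates, with 1-based row and column indices (both assumptions), it would take the following form:

```latex
O_x = \frac{\sum_{P_y=1}^{IH}\sum_{P_x=1}^{IW} P_x \,\mathrm{SAL}(P_x, P_y)}
           {\sum_{P_y=1}^{IH}\sum_{P_x=1}^{IW} \mathrm{SAL}(P_x, P_y)},
\qquad
O_y = \frac{\sum_{P_y=1}^{IH}\sum_{P_x=1}^{IW} P_y \,\mathrm{SAL}(P_x, P_y)}
           {\sum_{P_y=1}^{IH}\sum_{P_x=1}^{IW} \mathrm{SAL}(P_x, P_y)}
```

Under this form the centroid is pulled toward high-saliency elements, and it is undefined when every element of the map is 0.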

(Procedure 1)
In the saliency map, identify the rectangle of maximum area that contains the centroid and in which every element has a saliency exceeding the threshold TV. That is, every element inside the identified rectangle has a saliency above TV, and at least one point on the one-pixel-wide ring immediately surrounding the rectangle is at or below TV. The threshold TV can be set in advance to 0.8, for example, or to the maximum settable value of 1.0. If even one element exceeds the threshold TV, proceed to procedure 2; otherwise proceed to procedure 3.

(Procedure 2)
If the proportion of the input image occupied by the rectangle identified in procedure 1 exceeds the threshold TS, confirm the rectangle as one of the CN candidate regions and, at the same time, replace with 0 the saliency of every element of the saliency map inside the confirmed candidate region, then proceed to procedure 3. If it does not exceed TS, proceed to procedure 3 without confirming a candidate region. For example, setting TS to 0.05 prevents very small rectangles with an area ratio below this value from being extracted as candidate regions.

The orientation of the identified rectangle is not arbitrary: each side is parallel to the vertical or horizontal edge of the input image. Replacing the saliency with 0 makes it possible to obtain multiple rectangles as candidate regions in sequence without their overlapping one another.

(Procedure 3)
Next, update the threshold TV as TV_new = α × TV_old, where α is a real number in the range 0 to 1, so that TV decreases with each update; for example, α = 0.99. Then return to procedure 0 and repeat to determine the next candidate region. The determination process ends when the number of confirmed candidate regions reaches the target CN. It also ends if, before CN is reached, updating TV brings its value below the threshold TE. In that case fewer than CN rectangular regions are confirmed, but rectangles containing excessively low saliency are prevented from being extracted as candidate regions. For this purpose the threshold TE is set to, for example, TE = 0.20.
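Procedures 0 to 3 can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the centroid is assumed to be the saliency-weighted mean position rounded to the nearest element, the largest rectangle is found by brute force (adequate only for small maps), rectangles are returned as inclusive (x0, y0, x1, y1) tuples, and all function names are hypothetical. The default thresholds are the example values given in the text.

```python
def centroid(sal):
    """Saliency-weighted mean position (assumed form of Equation 3), rounded."""
    total = sum(map(sum, sal))
    if total == 0:
        return None  # centroid undefined on an all-zero map
    ox = sum(px * v for row in sal for px, v in enumerate(row)) / total
    oy = sum(py * v for py, row in enumerate(sal) for v in row) / total
    return round(ox), round(oy)


def largest_rect(sal, ox, oy, tv):
    """Largest axis-aligned rectangle containing (ox, oy) whose elements all exceed tv."""
    h, w = len(sal), len(sal[0])
    best, best_area = None, 0
    for y0 in range(oy + 1):
        for y1 in range(oy, h):
            for x0 in range(ox + 1):
                for x1 in range(ox, w):
                    area = (x1 - x0 + 1) * (y1 - y0 + 1)
                    if area > best_area and all(
                        sal[y][x] > tv
                        for y in range(y0, y1 + 1)
                        for x in range(x0, x1 + 1)
                    ):
                        best, best_area = (x0, y0, x1, y1), area
    return best


def determine_rectangles(sal, cn, tv=0.8, ts=0.05, te=0.20, alpha=0.99):
    sal = [row[:] for row in sal]                  # work on a copy of the map
    h, w = len(sal), len(sal[0])
    found = []
    while len(found) < cn and tv >= te:            # stop at CN regions or TV below TE
        c = centroid(sal)                          # procedure 0
        if c is None:
            break
        rect = largest_rect(sal, c[0], c[1], tv)   # procedure 1
        if rect is not None:                       # procedure 2
            x0, y0, x1, y1 = rect
            if (x1 - x0 + 1) * (y1 - y0 + 1) / (h * w) > ts:
                found.append(rect)
                for y in range(y0, y1 + 1):
                    for x in range(x0, x1 + 1):
                        sal[y][x] = 0.0            # suppress the confirmed region
        tv *= alpha                                # procedure 3
    return found


# Toy 5x7 map: a 3x3 salient block (0.9) centred in a weak background (0.1).
toy = [[0.9 if 1 <= y <= 3 and 2 <= x <= 4 else 0.1 for x in range(7)]
       for y in range(5)]
rects = determine_rectangles(toy, cn=2)
```

On the toy map only the central block is confirmed: once it is zeroed, the remaining background never exceeds TV before TV falls below TE, so fewer than CN rectangles are returned.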

Before the rectangular candidate regions are determined, the saliency extraction unit 210 may apply a Gaussian filter to the saliency map to smooth out saliency values that differ sharply from their surroundings and are likely to be noise.

The saliency extraction unit 210 and the rectangle determination unit 230 of Example C correspond roughly, in their overall role, to the candidate selection unit 21 and the candidate screening unit 23 of Example B shown in FIG. 5, and saliency has roughly the same significance as the reliability SS(w) of Example B. One difference is that Example B is preferable in that, with the configuration of FIG. 4, accuracy can be improved by learning prior probabilities concerning the presence of objects.

The three variants of the partial region extraction unit 11, [Example A], [Example B], and [Example C], have been described above. Next, the re-determination recognition unit 13 (see FIG. 1) is described.

FIG. 11 shows the functional blocks of the re-determination recognition unit 13 with explanatory notes. The re-determination recognition unit 13 includes a local feature extraction unit 31, a clustering processing unit 32, a codebook storage unit 33, a vector quantization unit 34, an image feature vector storage unit 35, a machine learning unit 36, a classifier storage unit 37, and an identification unit 38.

The local feature extraction unit 31 extracts from the image, as keypoints, a plurality of points where the signal changes strongly, such as edges and irregularities, and outputs local features (image signal values) computed from the color, shape, texture, and so on near each keypoint. FIG. 12 illustrates the calculation of image signal values as local features. In the dog image of FIG. 12 (1), the eyes, nose, outline, and other points are extracted as keypoints, as shown in (2), and the local features computed at them are output.

The local features can be calculated using known techniques. One concrete option is SIFT (Scale-Invariant Feature Transform), proposed by Lowe (D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110, 2004). In this case, as shown in FIG. 12 (3), each SIFT local feature is output as a 128-dimensional vector.

Using SIFT to calculate local features means that, even for images that differ in appearance through rotation, scale, and the like, the same keypoints and the same feature vectors are extracted as long as the images show the same subject and content. SIFT proceeds through (A) keypoint detection and (B) feature vector extraction, in the following sequence.

[SIFT processing flow]
(A) Keypoint detection
(a) Detection of keypoint candidates
(b) Keypoint localization
(B) Feature vector extraction
(c) Orientation calculation
(d) Feature extraction

In (a), detection of keypoint candidates, DoG (Difference-of-Gaussian) processing detects a plurality of points in the image where the signal changes strongly, such as edges and irregularities, as keypoint candidates. The scale of the Gaussian function is varied over several levels, multiple smoothed images are created by convolving the Gaussian function with the input image, and points that are extrema in the difference images (DoG images) of those smoothed images are detected as keypoint candidates.

In (b), keypoint localization, the keypoint candidates detected in (a) are narrowed down to keypoints that can be extracted stably. Specifically, low-contrast points are removed as being affected by noise, and points with a large principal curvature are removed as being unsuitable for stable extraction.

In (c), orientation calculation, a direction characterizing each keypoint is computed from the gradients at points in the smoothed image, so that the same feature vector can be extracted for the same keypoint even if the image is rotated. Specifically, a histogram of gradient direction and gradient magnitude is measured over a rectangular area around the keypoint. First, each gradient direction is assigned to one of 36 quantized bins. Next, the gradient magnitude is added to the assigned bin, and the direction of the bin with the highest value in the histogram is taken as the orientation.
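The 36-bin orientation histogram can be sketched as follows. This is a simplified illustration rather than a full SIFT implementation: the gradients around the keypoint are assumed to be given as (dx, dy) pairs, the Gaussian weighting and peak interpolation used in practical SIFT implementations are omitted, and returning the centre angle of the peak bin is an assumption. The function name `dominant_orientation` is hypothetical.

```python
import math


def dominant_orientation(gradients, bins=36):
    """Return the centre angle (degrees) of the histogram bin with the largest total magnitude."""
    hist = [0.0] * bins
    width = 360.0 / bins
    for dx, dy in gradients:
        angle = math.degrees(math.atan2(dy, dx)) % 360.0   # gradient direction in [0, 360)
        hist[int(angle // width) % bins] += math.hypot(dx, dy)  # add gradient magnitude
    peak = max(range(bins), key=hist.__getitem__)
    return peak * width + width / 2.0


# Two gradients near 45 degrees outweigh one near 194 degrees.
orientation = dominant_orientation([(1, 1), (1, 1), (-2, -0.5)])
```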

In (d), feature extraction, the area from which the feature vector is extracted at each keypoint is normalized according to the orientation obtained in (c), and a feature vector is computed from the normalized area cut out around the keypoint and used as the local feature.

The local features extracted by the local feature extraction unit 31 are not limited to SIFT features; other kinds of local features, such as SURF (Speeded Up Robust Features) or HOG (Histograms of Oriented Gradients), may be used. SURF is described in detail in Non-Patent Document 6 below, and HOG in Non-Patent Document 7 below.

[Non-Patent Document 6] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool, "SURF: Speeded-Up Robust Features," Ninth European Conference on Computer Vision, 2006.

[Non-Patent Document 7] Navneet Dalal and Bill Triggs, "Histograms of Oriented Gradients for Human Detection," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. II, pages 886-893, June 2005.

FIG. 13 illustrates the calculation of image signal values performed by the local feature extraction unit 31 on a variety of collected images. FIG. 14 illustrates the clustering performed by the clustering processing unit 32 and the vector quantization and histogram generation performed by the vector quantization unit 34. The following description refers to FIGS. 13 and 14 as appropriate.

The clustering processing unit 32 takes the local features extracted by the local feature extraction unit 31 from a wide variety of collected images, as shown in FIG. 13, and plots them in the local feature space, as shown in FIG. 14 (a), where each × is a local feature. It thereby measures how the set of local features is distributed in that space, clusters the space by grouping local features that lie close together, and outputs the centroid of each cluster (the representative vector, marked ○× in FIG. 14 (a)) as the codebook for the downstream vector quantization unit 34.

As a concrete means of clustering, the known technique k-means can be used. Clustering by k-means follows steps (1) to (4) below. The number of clusters k can be set freely, and the resulting image feature is k-dimensional.

[k-means procedure]
(1) Divide the data into k clusters, where k is an arbitrarily specified number.
(2) Compute the centroid of each cluster.
(3) For every data point, find the cluster whose centroid is nearest and assign the point to that cluster.
(4) If the clusters are unchanged from the previous iteration, terminate; otherwise return to (2).
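Steps (1) to (4) can be sketched directly. This is a minimal sketch, not a production clusterer: the round-robin initial split, the reseeding of an empty cluster, the `max_iter` cap, and the function names are assumptions made for the example. Points are tuples of numbers, standing in for local feature vectors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    assign = [i % k for i in range(len(points))]           # (1) arbitrary initial split
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for c in range(k):                                  # (2) centroid of each cluster
            members = [p for p, a in zip(points, assign) if a == c]
            if not members:
                members = [points[c % len(points)]]         # assumption: reseed an empty cluster
            centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        new_assign = [                                      # (3) assign to the nearest centroid
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:                            # (4) unchanged: terminate
            break
        assign = new_assign
    return centroids, assign


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook, labels = kmeans(pts, k=2)
```

The returned centroids play the role of the codebook: one representative vector per cluster.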

The codebook storage unit 33 stores the centroids (representative vectors) calculated by the clustering processing unit 32 as the codebook for vector quantization.

The vector quantization unit 34 quantizes each local feature to its nearest representative vector and, at the same time, counts how often each representative vector occurs, outputting the counts as an image feature vector.

Specifically, local features newly measured by the local feature extraction unit 31 are first quantized to the nearest centroid (representative vector) using the codebook stored in the codebook storage unit 33, as shown in FIG. 14 (b).

For example, when an image like one of those in FIG. 13 is newly input, the local feature extraction unit 31 extracts feature points and calculates local features, and the vector quantization unit 34 quantizes these local features as shown in FIG. 14 (b).

Because local features extracted from a variety of collected images are used, the distribution of local feature patterns that can occur in natural images is obtained, and representative local features are extracted from it as the centroids (representative vectors) of the clusters.

Next, the frequency distribution of the local features quantized to each centroid is measured, and a k-dimensional histogram is output as the image feature vector, as shown in FIG. 14 (c).
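Quantization to the nearest representative vector followed by counting can be sketched as follows. This is a minimal illustration: the two-dimensional toy vectors stand in for 128-dimensional SIFT features, squared Euclidean distance is assumed as the nearness measure, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature, codebook):
    """Index of the representative vector nearest to the local feature."""
    return min(range(len(codebook)), key=lambda i: sq_dist(feature, codebook[i]))


def image_feature_vector(features, codebook):
    """k-dimensional histogram of how often each representative vector is hit."""
    hist = [0] * len(codebook)
    for f in features:
        hist[quantize(f, codebook)] += 1
    return hist


codebook = [(0, 0), (10, 10), (0, 10)]                      # k = 3 representative vectors
features = [(1, 1), (9, 9), (0.5, 0), (1, 9), (11, 10)]     # local features of one image
vector = image_feature_vector(features, codebook)
```

Small perturbations of a local feature usually leave its nearest representative vector unchanged, which is why the histogram tolerates minor noise.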

In this way, the image feature vector obtained by the vector quantization unit 34 tolerates slight variations caused by noise, making it robust to noise, while still clearly distinguishing differences between types of image.

The image feature vector storage unit 35 stores image feature vectors for various kinds of objects, generated in advance by the vector quantization unit 34, each linked to the name of its object.

From the image features of both the object of interest and other objects, the machine learning unit 36 uses machine learning to extract the points on which the two differ greatly, and generates a classifier that serves as the decision criterion for identifying the object of interest.

In the present invention, an SVM (Support Vector Machine), a known technique described in Non-Patent Document 4, is assumed as the specific means of implementing machine learning. An SVM is a method for determining which of two predefined classes given data belongs to; it was proposed in 1995 by Corinna Cortes and V. Vapnik of AT&T. In an SVM, the decision criterion is extracted, in the form of a discriminator, by learning from sample data collected in advance.

[Non-Patent Document 4] Corinna Cortes, Vladimir Vapnik, "Support-Vector Networks", Machine Learning, 20, pp. 273-297 (1995)

The explanation uses "face" as an example. To classify whether or not a given image is a face image, a large number of non-face images are collected in addition to face images and used as training samples. Here the face images are positive samples and the non-face images are negative samples. Each image is given prior knowledge, namely "is a face" or "is not a face", and the decision criterion for classification is generated from this prior knowledge.

In practice, each sample image is converted into an image feature vector, and the boundary (hyperplane) that separates the positive samples (faces) from the negative samples (non-faces) is obtained from their distribution in feature space and used as the decision criterion for classification. This decision criterion is output as the discriminator. The boundary (hyperplane) is used when a newly given image is classified: the identification unit 38 described later uses the discriminator for "face" and determines whether the image is a face image according to which side of the boundary (hyperplane) the image feature vector extracted from the new image falls on in feature space.

Image feature vectors are read from the image feature vector storage unit 35, in which image feature vectors of various types of objects have been stored in advance, and one discriminator is created for each object. For example, to create the discriminator for "automobile", the image feature vectors labeled "automobile" are read as positive samples and those with any other label are read as negative samples. Because the image feature vectors eligible as negative samples are enormous in number, they are thinned out to roughly the same number as the positive samples before machine learning by SVM is applied to create the discriminator.
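The sample selection for one discriminator might look like the sketch below. It covers only the split into positive and negative samples and the thinning of the negatives; the SVM training itself is left out. The function name, the toy data, and the choice of uniform random sampling for the thinning are assumptions, since the text does not say how the negatives are thinned.

```python
import random

def build_training_set(store, target_label, rng):
    """Split stored (label, vector) pairs into positives and thinned negatives."""
    positives = [vec for label, vec in store if label == target_label]
    negatives = [vec for label, vec in store if label != target_label]
    # Thin the negatives to roughly the number of positives
    # (uniform random sampling is an assumption of this sketch).
    keep = min(len(positives), len(negatives))
    negatives = rng.sample(negatives, keep)
    return positives, negatives

# Toy store: 3 "automobile" vectors and 20 vectors with other labels.
store = (
    [("automobile", [5, 1, 0])] * 3
    + [("person", [0, 4, 2])] * 10
    + [("giraffe", [1, 0, 6])] * 10
)

positives, negatives = build_training_set(store, "automobile", random.Random(0))
print(len(positives), len(negatives))
```

Repeating this for every label in the store yields one balanced training set, and hence one discriminator, per object type.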

The discriminator storage unit 37 stores the discriminators for various types of objects, generated in advance by the machine learning unit 36, in association with the name of the object (or target) that each discriminator identifies.

For the N candidate object names (or candidates for what the target is) estimated by the image recognition unit 12, the identification unit 38 reads the corresponding discriminators from the discriminator storage unit 37 and determines which of the N types of objects the newly input image is.

Here, the discriminator read for each object (target) carries the boundary (hyperplane) that serves as the decision criterion for identifying that object. Whether the image is the object in question is determined according to which side of the boundary (hyperplane) the image feature vector extracted from the newly input image falls on in feature space. As a measure of whether the new image feature vector lies on the positive side or the negative side, the SVM outputs a reliability that depends on the distance from the boundary (hyperplane). A reliability of 0 is regarded as lying on the boundary; the reliability is positive on the positive side and negative on the negative side.

Accordingly, for a newly input image, the object (target) with the highest reliability among those output by the N discriminators is identified. If that reliability exceeds a specified threshold, the input image is judged to be that object; if it falls below the threshold, the image is judged to match none of the N types of objects.
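The decision rule above can be illustrated with a small sketch. It assumes linear discriminators of the form w·x + b, whose sign tells which side of the hyperplane x lies on; the weights, labels, and function names are made up for the example, and a score exactly equal to the threshold is treated as "not applicable" because the text leaves that case open.

```python
def decision_value(weights, bias, x):
    """Signed score of x relative to the hyperplane w.x + b = 0 (linear case assumed)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def identify(discriminators, x, threshold=0.0):
    """Return (label, score) for the best candidate, or (None, score) if not above threshold."""
    scores = {name: decision_value(w, b, x) for name, (w, b) in discriminators.items()}
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best, scores[best]
    return None, scores[best]  # none of the N candidates applies

# Hypothetical discriminators for N = 2 candidate labels.
discriminators = {
    "automobile": ([1.0, -1.0], 0.0),
    "motorcycle": ([-1.0, 1.0], -0.5),
}

result_hit = identify(discriminators, [2.0, 0.5])
result_miss = identify(discriminators, [1.0, 1.0], threshold=0.5)
print(result_hit, result_miss)
```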

FIG. 17 shows the functional blocks of another embodiment (embodiment d) of the identification unit 38, as an alternative to the embodiment described above (embodiment c). The identification unit 38 includes a co-occurrence relation measuring unit 381, a prior probability distribution storage unit 382, and an overall reliability calculation unit 383. These units perform additional processing on the reliabilities that the identification unit 38 obtained for the N objects of each target image according to embodiment c, and obtain an overall reliability. Based on the overall reliability obtained by these units, the identification unit 38 obtains the final identification result for the input image (the image input to the partial region extraction unit 11 in FIG. 1).

In embodiment d, rather than judging each rectangular candidate object region independently from its reliability as in embodiment c, co-occurrence among the rectangular candidates of multiple object regions is also taken into account, so that the objects contained in the input image are determined jointly and with higher accuracy.

Co-occurrence means, in the example of FIG. 1, that "automobile" and "person" appear at the same time. The co-occurrence relation measuring unit 381 measures in advance, from a large number of sample images, how often objects appear together, as the co-occurrence frequency. Let C be the total number of object types handled by the apparatus of the present invention. Let f_c = 1 if an object of type c (c = 1, 2, …, C) is contained in the image and f_c = 0 otherwise, and let f = (f₁, f₂, …, f_C) be the vector of these appearance states for an image. With f_i denoting the appearance state vector of sample image i and NS the total number of sample images, the co-occurrence relation measuring unit 381 measures the following (Equation 4 and Equation 5) in advance.

Here ν denotes the mean of the appearance state vectors and Σ their covariance matrix. The prior probability p(f) for the appearance state vector f of a new input image can then be calculated from the following equation (Equation 6). This calculation is based on the co-occurrence frequency measured by the co-occurrence relation measuring unit 381, and the calculated prior probability is stored in the prior probability distribution storage unit 382.
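The bodies of Equations 4 to 6 do not appear in this text, so the following is only a sketch of what the definitions of ν and Σ suggest: the sample mean and sample covariance of the appearance state vectors f_i. The 1/NS normalization, the function name, and the toy samples are assumptions; the form of p(f) in Equation 6 is not reconstructed here.

```python
def mean_and_covariance(vectors):
    """Sample mean and covariance (1/NS normalization assumed) of appearance vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[c] for v in vectors) / n for c in range(dim)]
    cov = [
        [sum((v[a] - mean[a]) * (v[b] - mean[b]) for v in vectors) / n
         for b in range(dim)]
        for a in range(dim)
    ]
    return mean, cov

# Hypothetical appearance state vectors for C = 3 types, in the order
# (automobile, person, giraffe); NS = 4 sample images.
samples = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]

nu, sigma = mean_and_covariance(samples)
print(nu)
print(sigma[0][1], sigma[0][2])
```

In this toy data the automobile/person entry of Σ is positive and the automobile/giraffe entry is negative, which is the kind of co-occurrence information the prior is meant to capture.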

Note that the total number C of object types handled by the apparatus of the present invention is the total number of label types registered in advance in the data set storage unit 83 of the image recognition unit 12 in FIG. 15. An object of type c (c = 1, 2, …, C) in an image is the object c (or, more generally, the target c) identified by the registered label c.

Alternatively, as another embodiment of the co-occurrence relation, the relation may be described by the area occupancy of the rectangular region that object c occupies in the sample image. That is, the presence ratio r_c (with Σr_c = 1) corresponding to the area of the rectangular region occupied by each object type c (c = 1, 2, …, C) in the sample image is measured, the presence ratios are arranged into a vector r = (r₁, r₂, …, r_C), and, using r in place of the appearance state vector f, the prior probability p(r) is obtained by the procedure of (Equation 7) to (Equation 9), which corresponds to (Equation 4) to (Equation 6).
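A presence ratio vector of this kind could be computed as in the sketch below. It assumes that r_c is the rectangle area of type c divided by the total rectangle area of all labeled objects in the image, which is one reading that satisfies Σr_c = 1; the function name and the box format are hypothetical.

```python
def presence_ratios(boxes, num_classes):
    """boxes: list of (class_index, width, height). Returns r with sum(r) == 1."""
    areas = [0.0] * num_classes
    for c, w, h in boxes:
        areas[c] += w * h
    total = sum(areas)
    if total == 0:
        raise ValueError("image has no labeled rectangle")
    # Normalizing by the total labeled area is an assumption of this sketch.
    return [a / total for a in areas]

# One sample image with C = 3 types: an automobile (40 x 20) and a person (10 x 20).
r = presence_ratios([(0, 40, 20), (1, 10, 20)], num_classes=3)
print(r)
```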

The overall reliability calculation unit 383 calculates the overall reliability of each combination of candidate regions (target images) and their labels as the product of the prior probability p(f) or p(r) stored in the prior probability distribution storage unit 382 and the reliabilities of the candidate object assigned to each candidate region, and takes the combination with the highest overall reliability as the final identification result.

For example, in FIG. 1, consider the labeling "automobile" (reliability 0.9, obtained independently by embodiment c), "person" (independent reliability 0.7), and "not applicable" (independent reliability 1.0) for the three candidate regions (target images). Assuming a prior probability of 0.95 because these co-occur readily in nature, the overall reliability is 0.9 × 0.7 (× 1.0) × 0.95 = 0.5985. In contrast, for the labeling "automobile" (independent reliability 0.9), "not applicable" (independent reliability 1.0), and "giraffe" (independent reliability 0.2), assuming a prior probability of 0.1 because these rarely co-occur in nature, the overall reliability is 0.9 (× 1.0) × 0.2 × 0.1 = 0.018, a very small value.

When the co-occurrence relation based on area occupancy is used, the overall reliability is calculated in the same way, and it additionally reflects whether the relative sizes of the targets are plausible.

The combinations for which the overall reliability calculation unit 383 obtains the overall reliability are all combinations of each candidate region with its labels plus "not applicable". For "not applicable", the reliability is set to the normalized maximum value 1.0, as in the calculation example above. For example, given the three candidate regions (target images) shown in (2) of FIG. 1 and their labels (3), the overall reliability is obtained for all 6 × 5 × 6 = 180 combinations of the 5 + 1 = 6 labels for the first candidate region ("automobile" to "motorcycle" plus "not applicable"), the 4 + 1 = 5 labels for the second candidate region ("person" to "utility pole" plus "not applicable"), and the 5 + 1 = 6 labels for the third candidate region ("giraffe" to "moon" plus "not applicable"), and the combination with the largest value is taken as the final recognition result.

Since the case in which all candidate regions are "not applicable" is meaningless, it may be excluded; in the above example, 180 − 1 = 179 combinations would then be evaluated. Alternatively, if the samples are set appropriately, the prior probability that all candidate regions are "not applicable" is 0, so that combination may simply be evaluated as having an overall reliability of 0.
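The exhaustive search over label combinations can be sketched as follows, reusing the reliabilities 0.9, 0.7, 0.2 and the assumed priors 0.95 and 0.1 from the worked example. The lookup-table prior, its default value of 0.05, and all function names are hypothetical stand-ins for the stored p(f); only one label per region is listed to keep the example short.

```python
from itertools import product
from math import prod

NA = "not applicable"

def best_combination(candidates, prior, skip_all_na=True):
    """candidates: one dict {label: reliability} per candidate region."""
    # "not applicable" is added to every region with reliability 1.0.
    options = [list(c.items()) + [(NA, 1.0)] for c in candidates]
    best, best_score, evaluated = None, -1.0, 0
    for combo in product(*options):
        labels = tuple(label for label, _ in combo)
        if skip_all_na and all(label == NA for label in labels):
            continue  # the all-"not applicable" combination is meaningless
        evaluated += 1
        score = prod(rel for _, rel in combo) * prior(labels)
        if score > best_score:
            best, best_score = labels, score
    return best, best_score, evaluated

def prior(labels):
    """Hypothetical prior table; 0.05 is a made-up default for unlisted combinations."""
    table = {
        ("automobile", "person", NA): 0.95,
        ("automobile", NA, "giraffe"): 0.10,
    }
    return table.get(labels, 0.05)

candidates = [{"automobile": 0.9}, {"person": 0.7}, {"giraffe": 0.2}]
best, score, evaluated = best_combination(candidates, prior)
print(best, round(score, 4), evaluated)
```

The number of combinations is the product of the per-region label counts, so the search grows quickly with the number of regions and candidates; short candidate lists keep it tractable.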

Each of the given sample images read by the co-occurrence relation measuring unit 381 is assumed to contain multiple targets and to have been given, in advance, labels identifying each of those targets. In the embodiment that describes the co-occurrence relation by the proportion of occupied area, each of the targets is additionally assumed to have been given rectangular region information of the same kind as that extracted by the partial region extraction unit 11 of the present invention.

As described above, embodiment d does not judge each rectangular candidate object region independently from its reliability as in embodiment c. Instead, it calculates an overall reliability that also takes into account co-occurrence among the rectangular candidates of multiple object regions and makes a joint decision based on it, so that the objects contained in the input image can be determined simultaneously and with higher accuracy.

[Registered images]
The main parts of the present invention have been described above. Finally, notes on improving the recognition accuracy of the present invention are given regarding the feature vectors stored in the data set storage unit 83 of the image recognition unit 12 (see FIG. 1) and the image feature vectors stored in the image feature vector storage unit 35 of the redetermination recognition unit 13. The flow at registration time for this storage is as indicated by the dotted arrows in FIG. 1 and FIG. 11.

That is, the feature vectors stored in the data set storage unit 83 (see FIG. 11) and the image feature vectors stored in the image feature vector storage unit 35 are generated not from the original images themselves but from the partial region images extracted by the partial region extraction unit 11 (see FIG. 1). When recognition is executed, it is the partial region images extracted by the partial region extraction unit 11 that the image recognition unit 12 and the redetermination recognition unit 13 process. Generating the stored vectors in the same way removes the gap between registration and recognition and keeps the compared feature vectors consistent, with the aim of improving both the estimation accuracy and the identification accuracy for the names of the objects contained in the image.

Alternatively, to obtain the same effect as passing through the partial region extraction unit 11, the original images may be restricted in advance to images containing a single target, for example only an automobile rather than both an automobile and a person. In contrast to these registered images, the sample images read by the co-occurrence relation measuring unit 381 must not be limited to a single target and need to contain multiple targets.

[Third embodiment of the image recognition apparatus 1]
FIG. 18 shows a functional block diagram of the image recognition apparatus 1 according to the third embodiment. It differs from the configuration of the first or second embodiment shown in FIG. 1 in that the image recognition unit 12 is omitted; the image recognition apparatus 1 includes only the partial region extraction unit 11 and the redetermination recognition unit 13. Also, unlike FIG. 1, the arrows between functional blocks in FIG. 18 show only the flow at recognition time; the flows at registration time and at CB (codebook) creation time are omitted.

In the third embodiment, as shown in (1), a predetermined, fixed candidate list is prepared in place of the candidate list that the image recognition unit 12 would have assigned to the target image of each candidate region, and the redetermination recognition unit 13 performs its processing with it. That is, the redetermination recognition unit 13 uses the same candidate list for all candidate regions. By preparing in advance a candidate list that contains only typical targets expected to appear in input images, so that the number of candidates stays moderate, the image recognition apparatus 1 can be provided with a given level of accuracy while the amount of computation required for the search is kept within a practical range.

[Applied uses of the present invention]
The image recognition apparatus 1 of the present invention can automatically recognize the names and locations of multiple objects contained in an image and convert them into text information. This makes it possible, for example, to tag digital camera images uploaded by users to an image cloud server and to apply the result to keyword-based image search.

DESCRIPTION OF SYMBOLS: 1 … image recognition apparatus, 11 … partial region extraction unit, 12 … image recognition unit, 13 … redetermination recognition unit, 21 … candidate selection unit, 22 … rectangle extraction unit, 23 … candidate screening unit, 82 … image feature amount conversion unit, 83 … data set storage unit, 84 … comparison unit, 85 … voting determination unit, 31 … local feature extraction unit, 32 … clustering processing unit, 33 … codebook storage unit, 34 … vector quantization unit, 35 … image feature vector storage unit, 36 … machine learning unit, 37 … discriminator storage unit, 38 … identification unit

Claims

A partial area extraction unit that cuts out a plurality of partial areas from the input image and sets each as a target image;
An image recognition unit for recognizing what the target image represents,
The image recognition unit
An image feature amount conversion unit that extracts a first image feature amount from the target image and converts it into an image feature vector;
A data set storage unit that assigns a label to each of a plurality of given images, and stores the images in association with the image feature vectors converted by the image feature amount conversion unit;
The image feature vector of the target image is compared with each of the image feature vectors stored in the data set storage unit, and the similarity between the target image and each of the plurality of stored stored images is obtained. A comparison unit;
And a voting determination unit that determines, as the recognition result of what the target image represents, a label whose frequency of occurrence satisfies a predetermined criterion among the labels assigned to a predetermined number of images ranked highest in the similarity,
Recognizing what each of the plurality of targets included in the input image is based on the recognition result by the image recognition unit in each of the partial regions cut out by the partial region extraction unit, and for each of the plurality of targets Recognizing the position in the input image ,
Based on a comparison between a second image feature amount previously associated with the candidate and a second image feature amount extracted from the target image, with each of the labels satisfying the predetermined criterion in the voting determination unit as a candidate And a re-determination recognition unit that recognizes what the target image represents,
Recognizing what each of the plurality of objects included in the input image is, the recognition result of the re-determination recognition unit is adopted instead of the recognition result of the image recognition unit,
The image feature amount conversion unit converts the pixel value from the target image into an image feature vector having each element as a first image feature amount,
The redetermination recognition unit includes a local feature amount extraction unit that extracts a local feature amount as the second image feature amount from the target image;
A clustering processing unit that extracts local feature amounts from a plurality of given images and performs clustering to calculate a representative vector of each cluster;
A codebook storage unit for storing the calculated representative vector as a codebook;
A vector quantization unit that quantizes the local feature amount extracted from the target image into a representative vector with reference to the accumulated codebook, and measures an appearance frequency of each representative vector to obtain an image feature vector; ,
An image feature vector storage unit that stores a plurality of given images each assigned a label and an image feature vector obtained by applying the vector quantization unit to each image in association with each other;
A machine learning unit that applies machine learning using the labeled image feature vectors stored in the image feature vector storage unit as learning data and outputs a discriminator for each label;
A discriminator storage unit that stores the discriminator for each of the labels;
And an identification unit that reads, from the discriminator storage unit, the discriminator corresponding to each of the labels obtained as recognition results in the voting determination unit, inputs to each discriminator the image feature vector obtained by applying the vector quantization unit to the target image, thereby obtains a label satisfying a predetermined criterion, and takes the obtained label as the recognition result of what the target image represents, or, if no label satisfying the predetermined criterion is obtained, determines that the recognition result of what the target image represents is not applicable; an image recognition apparatus characterized by the above.

The image recognition apparatus according to claim 1, wherein the partial region extraction unit includes a candidate selection unit that selects, as the partial regions, a plurality of rectangular regions having sides parallel to the vertical or horizontal axis of the input image, specifying each by the relative position and size it occupies in the input image, and a rectangle extraction unit that cuts out the plurality of selected rectangular regions from the input image as the partial regions.

The image recognition apparatus according to claim 2, wherein the candidate selection unit includes a random number generation unit that specifies, using random numbers, a predetermined number of relative positions and sizes that the rectangular region occupies in the input image, or includes a statistical information measuring unit that measures in advance statistical information on the relative position and size at which a target exists in given images, a probability distribution storage unit that stores a prior probability distribution for given combinations of the measured position and size, and a candidate extraction unit that specifies a predetermined number of relative positions and sizes.

The image recognition apparatus according to claim 2, wherein the partial region extraction unit further includes a candidate screening unit that retains only some of the plurality of rectangular regions selected by the candidate selection unit, based on the overlap between the regions, and the rectangle extraction unit cuts out only the retained regions.

The image recognition apparatus according to claim 4, wherein the candidate screening unit includes:
A region dividing unit that divides the input image into regions of arbitrary shape based on similarity of color and/or texture features;
An object reliability calculation unit that calculates, for each of the plurality of rectangular regions selected by the candidate selection unit, the reliability that the region captures a target, using as an index the degree to which it covers the divided regions without excess or deficiency;
And a final candidate determination unit that selects only said some of the regions on the basis of a high calculated reliability and a small overlap among the plurality of rectangular regions selected by the candidate selection unit.

The region dividing unit is
A small region dividing unit that excessively divides the input image into a plurality of small regions of arbitrary shapes based on similarity of color and / or texture characteristics;
A target area selecting unit that outputs an internal area belonging to the small area and an external area not belonging to the small area in the input image in association with each of the plurality of subdivided small areas;
A small area expansion unit that integrates an area similar in color and / or texture characteristics with the corresponding internal area in the external area and outputs the area as a small area expansion area;
An area similarity calculation unit for calculating the area similarity of a pair of the small area expansion areas;
Among the small region pairs, the small region pairs in which the region similarity of the corresponding small region extended region pair satisfies a predetermined criterion are determined to belong to the same object and are integrated, and the small region does not satisfy the predetermined criterion And a small region integration unit that obtains a result of division into the region of the arbitrary shape as a small region integrated as a region corresponding to each object by determining that the pairs belong to different objects. 6. The image recognition device according to claim 5.

The image recognition apparatus according to claim 1, wherein the partial region extraction unit includes:
A saliency extraction unit that extracts, from the input image, a saliency corresponding to the degree of attention for each pixel;
A rectangle determination unit that determines, as the partial regions, a predetermined number of regions that are rectangles with sides parallel to the vertical or horizontal axis of the input image and whose saliency satisfies a predetermined criterion, selecting and specifying each by the relative position and size it occupies in the input image;
And a rectangle extraction unit that cuts out the plurality of selected rectangular regions from the input image as the partial regions.

The identification unit further includes:
A co-occurrence relation measuring unit that measures a co-occurrence relation between labels from a plurality of given images each configured as an image including a plurality of objects and provided with a label corresponding to each object;
A prior probability distribution storage unit that stores prior probabilities related to the co-occurrence of labels based on the measured co-occurrence relationship;
Total confidence obtained by multiplying the combination of each of the target images and the score output by the discriminator for each of the labels obtained as a recognition result in the voting determination unit in each of the target images by the prior probability. An overall reliability calculation unit for calculating the degree,
The image recognition apparatus according to any one of claims 1 to 7, wherein the labels for the respective target images corresponding to the combination that maximizes the overall reliability are taken as the recognition result of what each of the plurality of targets included in the input image is.

The image recognition apparatus according to claim 8, wherein the co-occurrence relation measuring unit measures, from a plurality of given images each configured as an image including a plurality of targets and provided with a label corresponding to each target and with information on the rectangular region that each target occupies in the image, the co-occurrence relationship between labels as a co-occurrence relationship of the area occupancy ratios of the rectangular regions corresponding to the labels.

The image recognition apparatus according to claim 1, wherein at least one of the given plurality of images in the data set storage unit and the given plurality of images in the image feature vector storage unit consists of images each including a single target.

A partial region extraction step of cutting out a plurality of partial regions from the input image and making each a target image;
An image recognition step for recognizing what the target image represents,
The image recognition step includes:
An image feature amount converting step of extracting a first image feature amount from the target image and converting it into an image feature vector;
A data set accumulation step of assigning a label to each of a plurality of given images, and accumulating the images in association with the image feature vectors converted in the image feature amount conversion step;
A comparison step of comparing the image feature vector of the target image with each of the image feature vectors accumulated in the data set accumulation step, and determining the similarity between the target image and each of the accumulated plurality of images; and
A voting determination step of determining, as one of the recognition results of what the target image represents, a label whose frequency among the labels assigned to a predetermined number of the images ranked highest in similarity satisfies a predetermined criterion,
Wherein what each of the plurality of targets included in the input image is is recognized based on the recognition result of the image recognition step for each of the partial regions cut out in the partial region extraction step, and the position in the input image of each of the plurality of targets is recognized,
The method further comprising a redetermination recognition step of recognizing what the target image represents, taking each of the labels satisfying the predetermined criterion in the voting determination step as a candidate, based on a comparison between a second image feature amount associated in advance with the candidate and a second image feature amount extracted from the target image,
In the recognition result of what each of the plurality of objects included in the input image is, the recognition result of the redetermination recognition step is adopted instead of the recognition result of the image recognition step,
In the image feature amount conversion step, the target image is converted into an image feature vector having a pixel value as each element as the first image feature amount,
The re-determination recognition step includes a local feature amount extraction step of extracting a local feature amount as the second image feature amount from the target image;
A clustering process step of extracting local feature amounts from a plurality of given images and performing clustering to calculate a representative vector of each cluster;
A codebook storage step of storing the calculated representative vector as a codebook;
A vector quantization step of quantizing a local feature extracted from the target image into a representative vector with reference to the accumulated codebook, and determining an image feature vector by measuring an appearance frequency of each representative vector;
An image feature vector storing step of storing a plurality of given images each assigned a label and an image feature vector obtained by applying the vector quantization step to each of the images in association with each other;
A machine learning step of applying machine learning using, as learning data, the labeled image feature vectors stored in the image feature vector storing step, and outputting a discriminator for each label;
A discriminator accumulating step for accumulating a discriminator for each label;
And an identification step of reading, from the result of the discriminator accumulating step, the discriminator corresponding to each of the labels obtained as the recognition result in the voting determination step, inputting to it the image feature vector obtained for the target image by the vector quantization step so as to obtain a label satisfying a predetermined criterion, adopting the obtained label as the recognition result of what the target image represents, and, when no label satisfying the predetermined criterion is obtained, setting the recognition result of what the target image represents to "no match"; an image recognition method characterized by comprising the above steps.
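The two-stage flow of this method claim, for a single target image, can be sketched in Python as follows. This is a hedged illustration only: cosine similarity as the similarity measure, the parameters `k`, `min_votes` and `threshold`, and discriminators modeled as plain callables returning a score are all assumptions. Clustering and machine learning are omitted; the codebook and the discriminators are taken as already accumulated.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def vote_candidates(pixel_vec, dataset, k=5, min_votes=2):
    """First stage: vote among the labels of the k most similar stored images.

    dataset: list of (label, pixel-value feature vector).
    Returns every label whose vote count reaches min_votes.
    """
    ranked = sorted(dataset, key=lambda item: cosine(pixel_vec, item[1]), reverse=True)
    votes = Counter(label for label, _ in ranked[:k])
    return [label for label, n in votes.items() if n >= min_votes]


def bovw_histogram(local_features, codebook):
    """Vector quantization: map each local feature to its nearest
    representative vector and count appearance frequencies."""
    hist = [0] * len(codebook)
    for f in local_features:
        nearest = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(f, codebook[i])),
        )
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def redetermine(candidates, hist, discriminators, threshold=0.0):
    """Second stage: apply only the discriminators of the voted candidates.

    Returns the best-scoring label above threshold, else "no match".
    """
    best_label, best_score = None, threshold
    for label in candidates:
        score = discriminators[label](hist)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_label is not None else "no match"
```

Restricting the second stage to the voted candidates means only a few discriminators are evaluated per partial region, and the threshold gives regions that contain no known target a way to be rejected.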