JP2015164012A

JP2015164012A - Category discriminator generation apparatus, category discrimination device, and computer program

Info

Publication number: JP2015164012A
Application number: JP2014039474A
Authority: JP
Inventors: 崇之梅田; Takayuki Umeda; 泳青孫; Yongqing Sun; 豪入江; Takeshi Irie; 恭子数藤; Kyoko Sudo; 行信谷口; Yukinobu Taniguchi
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-02-28
Filing date: 2014-02-28
Publication date: 2015-09-10

Abstract

PROBLEM TO BE SOLVED: To facilitate category discrimination even when a similar category is learned. SOLUTION: A category discriminator generation device comprises: a first image feature amount extraction unit that extracts a first image feature amount from a learning image; a second image feature amount extraction unit that extracts a second image feature amount; a similar category determination unit that determines, on the basis of the extracted second image feature amount and a second image feature amount of a learned category, whether a learned category similar to the category of the learning image exists; a feature amount generation unit that, when such a learned category is determined to exist, generates, on the basis of the first image feature amount of the learning image and a first image feature amount of the similar learned category, a second image feature amount that is dissimilar to the second image feature amount of the similar learned category, and adds it to the second image feature amount of the learning image to generate a new second image feature amount; and a category learning unit that generates a category discriminator for discriminating the new second image feature amount.

Description

The present invention relates to a technique for identifying the category of an image.

Techniques for identifying a category, such as an object or a scene contained in an image, include the following. Feature amounts are extracted in advance from learning images with category labels, and classifiers for a plurality of categories are trained using an SVM (Support Vector Machine) or the like. The feature amount of an unknown image whose category is to be identified is then extracted, and the trained classifiers are used to determine which of the learned categories the unknown image belongs to. In this way, the category of an unknown image can be identified.
The feature amounts used for category identification fall broadly into two types. The first is a low-level feature amount based on the image signal, such as SIFT (Scale-Invariant Feature Transform) or a color histogram (hereinafter referred to as “low-level feature amount”). The second is a mid-level feature amount newly learned from low-level feature amounts (hereinafter referred to as “mid-level feature amount”). Mid-level feature amounts include feature amounts representing the components and textures that make up an object, such as the tail, legs, and fur of a dog or the wings, windows, and metal of an airplane (see, for example, Non-Patent Document 1), and feature amounts representing the objects and environments that make up a scene, such as buildings, roads, and people in an urban area or trees, grassland, and rivers in a natural landscape (see, for example, Non-Patent Document 2).

When low-level feature amounts are used for category identification, the category is learned directly from the image signal. The learning process is therefore opaque, and it is difficult to verify which feature amounts are effective for which categories. In addition, because the semantic gap between the image signal and the category is large, a large amount of learning data is required for highly accurate identification.
When mid-level feature amounts are used for category identification, classifiers for the intermediate elements that make up a category are first trained on the basis of low-level feature amounts, and the category is then learned using the output values of those intermediate-element classifiers as feature amounts. Because a category is expressed as a combination of intermediate elements that humans can perceive, the learning process is transparent. In addition, because the semantic gap between mid-level feature amounts and the category is smaller than that between low-level feature amounts and the category, highly accurate identification is possible with a small amount of learning data.
However, when mid-level feature amounts are used, it is necessary to decide in advance which intermediate elements should serve as feature amounts. In Non-Patent Documents 1 and 2, the intermediate elements are determined manually, and classifiers for the determined intermediate elements are trained. In Non-Patent Document 3, when the categories to be identified are known, the intermediate-element feature amounts best suited to discriminating those categories are acquired automatically.

Non-Patent Document 1: A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. “Describing Objects by their Attributes”. In CVPR, pp. 1778-1785, 2009.
Non-Patent Document 2: L. Li, H. Su, E.P. Xing, and L. Fei-Fei. “Object Bank: A High-level Image Representation for Scene Classification & Semantic Feature Sparsification”. In NIPS, pp. 1378-1386, 2010.
Non-Patent Document 3: F.X. Yu, L. Cao, R.S. Feris, J.R. Smith, and S.-F. Chang. “Designing Category-Level Attributes for Discriminative Visual Recognition”. In CVPR, pp. 771-778, 2013.

However, when a new category is learned, the new category cannot be discriminated if its intermediate elements coincide with those of a known category. For example, consider a case where the mid-level feature amounts learned in advance are tail, legs, and wings, the known categories are dog and airplane, and the category cat is to be newly learned. Comparing the cat category with the airplane category, the cat category has a tail and legs but no wings, whereas the airplane category has wings but no tail or legs. The intermediate elements of the two categories are thus mutually exclusive, so the categories can be discriminated. Comparing the cat category with the dog category, however, both categories have a tail and legs and neither has wings. The intermediate elements of the two categories therefore coincide (collide), and it is difficult to discriminate between them. As described above, when a similar category is learned, there is a problem in that discriminating the categories is difficult using only the mid-level feature amounts learned in advance.

In view of the above circumstances, an object of the present invention is to provide a technique that facilitates category discrimination even when a similar category is learned.

One aspect of the present invention is a category discriminator generation device comprising: a first image feature amount extraction unit that extracts a first image feature amount from a learning image for learning a category of an image; a second image feature amount extraction unit that extracts a second image feature amount from the extracted first image feature amount; a similar category determination unit that determines, on the basis of the extracted second image feature amount and a second image feature amount of a learned category, whether a learned category similar to the category of the learning image exists; a feature amount generation unit that, when it is determined that a similar learned category exists, generates, on the basis of the first image feature amount of the learning image and a first image feature amount of the similar learned category, a second image feature amount that is not similar to the second image feature amount of the similar learned category, and adds the generated second image feature amount to the second image feature amount of the learning image to generate a new second image feature amount; and a category learning unit that generates a category discriminator for identifying the new second image feature amount.

One aspect of the present invention is the above category discriminator generation device, wherein, for a learned category that is not similar to the category of the learning image, the category learning unit eliminates, in the relational expression for the output value of a linear classifier, the weight for the second image feature amount newly added to the second image feature amount of the learning image.

One aspect of the present invention is a category identification device that identifies the category of an input image using the category discriminator learned by the category learning unit of the above category discriminator generation device.

One aspect of the present invention is a computer program for causing a computer to function as the above device.

According to the present invention, category discrimination can be facilitated even when a similar category is learned.

FIG. 1 is a schematic block diagram showing the functional configuration of the category discriminator generation device 100 according to the present invention.
FIG. 2 is a conceptual diagram of collision detection for the mid-level feature amounts of a learning category.
FIG. 3 is a conceptual diagram for explaining processing when a similar category exists.
FIG. 4 is a conceptual diagram for explaining processing when a similar category exists.
FIG. 5 is a schematic block diagram showing the functional configuration of the category identification device 200 according to the present invention.
FIG. 6 is a flowchart showing the flow of processing of the category discriminator generation device 100 in the present embodiment.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a schematic block diagram showing the functional configuration of the category discriminator generation device 100 according to the present invention. The category discriminator generation device 100 is a device that generates, by learning, a classifier (category discriminator) that identifies the category into which an input image (category learning image) is classified. A category is an object, a scene, or the like contained in an image, for example a cat or a dog. Category learning images stored in the category learning image storage unit 10 are input to the category discriminator generation device 100. A category learning image is image data with a category label, indicating an image and the category of that image. Each category learning image is stored in the category learning image storage unit 10 in association with an index for identifying it. The category discriminator generation device 100 learns a category classifier on the basis of the input category learning images.
A specific configuration of the category discriminator generation device 100 is described below.

The category discriminator generation device 100 includes a CPU (Central Processing Unit), a memory, an auxiliary storage device, and the like connected by a bus, and executes a category discriminator generation program. By executing the category discriminator generation program, the category discriminator generation device 100 functions as a device including a low-level feature amount extraction unit (first image feature amount extraction unit) 101, a low-level feature amount storage unit 102, a mid-level feature amount extraction unit (second image feature amount extraction unit) 103, a mid-level feature amount storage unit 104, a similar category determination unit 105, a feature amount generation unit 106, a category learning unit 107, and a category discriminator storage unit 108. All or some of the functions of the category discriminator generation device 100 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The category discriminator generation program may be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The category discriminator generation program may also be transmitted and received via a telecommunication line.

The low-level feature amount extraction unit 101 extracts a low-level feature amount (first image feature amount) from the input category learning images of the category to be learned (hereinafter referred to as “learning category”). For the extraction of the low-level feature amount, for example, SIFT, SURF (Speeded-Up Robust Features), HOG (Histogram of Oriented Gradients), a color histogram, an edge histogram, or a wavelet feature is applied.
The low-level feature amount storage unit 102 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The low-level feature amount storage unit 102 stores the low-level feature amount extracted by the low-level feature amount extraction unit 101 together with the index of the category learning image of the learning category.
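As a rough illustration of one of the low-level feature amounts listed above, the following sketch computes a normalized color histogram and stores it under an image index. The function name `color_histogram`, the dictionary `low_level_store`, the bin count, and the pixel values are all hypothetical and are not taken from the patent.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized joint RGB histogram as a flat list.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    """
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantize each channel, clamping so that 255 falls in the last bin.
        rb = min(int(r / step), bins_per_channel - 1)
        gb = min(int(g / step), bins_per_channel - 1)
        bb = min(int(b / step), bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


# Hypothetical stand-in for storage unit 102: feature amounts keyed by image index.
low_level_store = {}
pixels = [(250, 10, 10), (240, 20, 5), (10, 10, 250), (12, 8, 245)]  # made-up image
low_level_store["3-1"] = color_histogram(pixels)
```

Keying the store by image index mirrors the description that the feature amount is stored together with the index of the category learning image.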

The mid-level feature amount extraction unit 103 extracts a mid-level feature amount (second image feature amount) from the low-level feature amount extracted by the low-level feature amount extraction unit 101. For the extraction of the mid-level feature amount, the output values of any classifiers other than the classifier of the learning category can be used. An example of the mid-level feature amount is the Attribute described in Non-Patent Document 1. More specifically, when dog, airplane, and cat are the categories, tail, legs, and wings correspond to Attributes.
The mid-level feature amount storage unit 104 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The mid-level feature amount storage unit 104 stores the mid-level feature amount extracted by the mid-level feature amount extraction unit 103 together with the index of the category learning image.
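The idea that a mid-level feature amount is the vector of output values of attribute classifiers can be sketched as follows. This is a minimal illustration, assuming each attribute classifier is linear and that its margin is squashed with a sigmoid; the sigmoid, the weights, and the input vector are assumptions made for the example, not details from the patent.

```python
import math


def attribute_scores(low_level, attribute_classifiers):
    """Map a low-level vector to one score per attribute classifier."""
    scores = []
    for weights, bias in attribute_classifiers:
        margin = sum(w * x for w, x in zip(weights, low_level)) + bias
        scores.append(1.0 / (1.0 + math.exp(-margin)))  # squash to (0, 1)
    return scores


ATTRIBUTES = ["Tail", "Legs", "Wing"]
# Hypothetical pre-trained (weights, bias) pairs, one per attribute.
classifiers = [
    ([2.0, 0.0, -1.0], -0.5),
    ([0.0, 2.0, -1.0], -0.5),
    ([-1.0, -1.0, 3.0], -0.5),
]
low_level = [1.0, 1.0, 0.0]  # made-up low-level feature amount
mid_level = attribute_scores(low_level, classifiers)
```

Each entry of `mid_level` lines up with one name in `ATTRIBUTES`, so the vector can be read as "how strongly each attribute is present".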

The similar category determination unit 105 determines, on the basis of the mid-level feature amounts stored in the mid-level feature amount storage unit 104, whether a category similar to the learning category (hereinafter referred to as “similar category”) exists. Specifically, the similar category determination unit 105 detects whether the mid-level feature amount of the learning category collides with the mid-level feature amount of a learned category. When a similar category exists, that is, when the mid-level feature amount of the learning category collides with that of a learned category, the similar category determination unit 105 outputs the index information of the similar category to the feature amount generation unit 106. When no similar category exists, that is, when the mid-level feature amount of the learning category does not collide with that of any learned category, the similar category determination unit 105 outputs the mid-level feature amount of the learning category to the category learning unit 107. The specific processing of the similar category determination unit 105 is described later.

The feature amount generation unit 106 generates a new mid-level feature amount on the basis of the index information of the similar category. A new mid-level feature amount is a mid-level feature amount that is not yet stored in the mid-level feature amount storage unit 104. Specifically, the feature amount generation unit 106 first acquires, on the basis of the index information of the similar category, the low-level feature amount of the learning category and the low-level feature amount of the similar category from the low-level feature amount storage unit 102. Next, the feature amount generation unit 106 trains a classifier using the acquired low-level feature amount of the learning category as positive examples and the low-level feature amount of the similar category as negative examples. This enables the feature amount generation unit 106 to discriminate between the learning category and the similar category. The feature amount generation unit 106 then generates the output value of the trained classifier as the new mid-level feature amount. A linear SVM, for example, may be used as the classifier. The new mid-level feature amount generated by the feature amount generation unit 106 is also used when further new categories are learned. That is, the feature amount generation unit 106 additionally stores the generated mid-level feature amount in the mid-level feature amount storage unit 104 as a mid-level feature amount of each learned category.
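The feature generation step can be sketched as follows: a linear classifier is trained with the learning category as positive examples and the similar category as negative examples, and its output is appended as a new mid-level dimension. The text suggests a linear SVM; to stay dependency-free this sketch substitutes a simple perceptron, which is a different (but also linear) training rule. All names and the two-dimensional low-level vectors are made up for illustration.

```python
def train_perceptron(positives, negatives, epochs=100):
    """Train a linear classifier w.x + b with the perceptron rule.

    Stand-in for the linear SVM mentioned in the text; not the same algorithm.
    """
    w = [0.0] * len(positives[0])
    b = 0.0
    samples = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified: move the boundary toward x
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:
            break
    return w, b


def new_feature(x, w, b):
    """Classifier output used as the new mid-level feature amount."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Made-up low-level feature amounts for the learning category (cat)
# and the similar learned category (dog).
cat_low = [[0.9, 0.2], [0.8, 0.3], [0.85, 0.1]]
dog_low = [[0.2, 0.9], [0.3, 0.8], [0.1, 0.85]]
w, b = train_perceptron(cat_low, dog_low)

# Append the new dimension to an existing (hypothetical) mid-level vector.
cat_mid = [0.12, 0.82, 0.88]
cat_mid_extended = cat_mid + [new_feature(cat_low[0], w, b)]
```

Because the new dimension is positive for the learning category and negative for the similar category, it separates exactly the pair that the existing attributes could not.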

The category learning unit 107 learns a classifier for the learning category (category discriminator). For example, when the mid-level feature amount of the learning category is input from the similar category determination unit 105, the category learning unit 107 learns the classifier for the learning category on the basis of the input mid-level feature amount. A linear classifier, for example, may be used for this learning. When a new mid-level feature amount has been obtained by the feature amount generation unit 106, the category learning unit 107 adds it to the mid-level feature amount of the learning category and then learns the classifier for the learning category. Here, a category discriminator learned before the new mid-level feature amount was generated and a category discriminator learned after it was generated (the classifier for the learning category) differ in the number of dimensions of the mid-level feature amount. Ordinarily, the former category discriminator would therefore have to be retrained. The category learning unit 107 makes retraining unnecessary by the following processing. The output value of a linear classifier is determined, as shown in Equation 1, by the inner product of an n-dimensional feature amount a and the weights w for that feature amount. For a learned category that is not similar, setting the weight w_new of the feature amount generated by the feature amount generation unit 106 to 0 in Equation 2 makes Equation 2 identical to Equation 1, so the identification result is unchanged. Retraining of the classifier is therefore unnecessary.
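Equations 1 and 2 are not reproduced in this text. Assuming Equation 1 is the inner product of w and a over n dimensions and Equation 2 is the same sum plus one extra term w_new times a_new, the zero-weight argument can be checked numerically. The weights and the new feature value below are hypothetical; the three feature values are those given for image 1-1 in FIG. 2.

```python
def linear_score(weights, features):
    """Output of a linear classifier: inner product of weights and features."""
    return sum(w * a for w, a in zip(weights, features))


# Classifier of a non-similar learned category, trained on 3 dimensions.
old_weights = [0.5, -1.2, 0.8]       # hypothetical weights
old_features = [0.90, 0.21, 0.18]    # mid-level feature amount of image 1-1

# A fourth dimension is added later by the feature amount generation step.
new_feature_value = 0.37             # hypothetical value of the new dimension
extended_weights = old_weights + [0.0]   # w_new = 0 for the old classifier
extended_features = old_features + [new_feature_value]

score_before = linear_score(old_weights, old_features)
score_after = linear_score(extended_weights, extended_features)
```

Since the added term is zero whatever value the new dimension takes, the old classifier can keep its weights and simply be padded with a zero.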

The category discriminator storage unit 108 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device. The category discriminator storage unit 108 stores the classifier (category discriminator) learned by the category learning unit 107.

FIG. 2 is a conceptual diagram of collision detection for the mid-level feature amounts of a learning category.
FIG. 2 shows, in order from the left, the category learning images, the mid-level feature amounts, and the results of collision detection between the mid-level feature amount of the learning category and the mid-level feature amounts of the learned categories. In FIG. 2, images of the categories “airplane”, “dog”, and “cat” are shown as category learning images; the categories “airplane” and “dog” are learned categories, and the category “cat” is the learning category. In the example of FIG. 2, each mid-level feature amount consists of three values, “Wing”, “Tail”, and “Furry”. The mid-level feature amounts of the category learning images 1-1 to 1-3, 2-1 to 2-3, and 3-1 to 3-3 of the categories “airplane”, “dog”, and “cat” are shown in association with the respective images. For example, the mid-level feature amount of the category learning image 1-1 of the category “airplane” is “A^1_1 = {0.90, 0.21, 0.18}”. The similar category determination unit 105 determines, on the basis of the mid-level feature amounts of the category learning images 1-1 to 1-3, 2-1 to 2-3, and 3-1 to 3-3, whether a similar category exists for the learning category (the category “cat” in FIG. 2).
The specific processing of the similar category determination unit 105 is described below.

To detect a collision, the similar category determination unit 105 performs a two-sample test on the mid-level feature amount of the learning category and the mid-level feature amount of a learned category, and tests whether the null hypothesis is rejected. A t-test, whose null hypothesis is that the two samples have equal means, may be used as the statistical test. The similar category determination unit 105 performs the two-sample test on each dimension of the mid-level feature amount (“Wing”, “Tail”, and “Furry” in FIG. 2). When the null hypothesis is rejected for a dimension, the unit determines that the dimension does not collide. When the null hypothesis is not rejected for a dimension, the unit determines that the dimension collides.
The similar category determination unit 105 then calculates a collision rate R from the determination results for the individual dimensions. Specifically, with N denoting the number of dimensions of the mid-level feature amount and N_collision denoting the number of colliding dimensions, the similar category determination unit 105 calculates the collision rate as R = N_collision / N.
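The per-dimension test and the collision rate R = N_collision / N can be sketched as follows. This is a minimal sketch assuming Student's two-sample t-test with pooled variance, decided by comparing |t| with a critical value instead of computing a p-value; 2.776 is the two-sided 5% critical value for 4 degrees of freedom (three images per category). Apart from the airplane image 1-1 vector {0.90, 0.21, 0.18}, every feature value below is made up.

```python
import math


def t_statistic(xs, ys):
    """Student's two-sample t statistic with pooled variance."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    pooled = (ssx + ssy) / (nx + ny - 2)
    se = math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    if se == 0.0:  # both samples constant: equal means collide, unequal do not
        return 0.0 if mx == my else math.inf
    return (mx - my) / se


def collision_rate(learning, learned, critical_value):
    """R = N_collision / N over the dimensions of the mid-level feature amount."""
    n_dims = len(learning[0])
    n_collision = 0
    for d in range(n_dims):
        t = t_statistic([v[d] for v in learning], [v[d] for v in learned])
        if abs(t) <= critical_value:  # null hypothesis not rejected: collision
            n_collision += 1
    return n_collision / n_dims


# Columns: Wing, Tail, Furry. Three images per category.
airplane = [[0.90, 0.21, 0.18], [0.85, 0.15, 0.10], [0.95, 0.25, 0.20]]
dog = [[0.10, 0.80, 0.85], [0.20, 0.90, 0.75], [0.15, 0.85, 0.80]]
cat = [[0.12, 0.82, 0.88], [0.18, 0.88, 0.78], [0.16, 0.84, 0.80]]

T_CRIT = 2.776  # two-sided 5% critical value, 4 degrees of freedom
rate_airplane = collision_rate(cat, airplane, T_CRIT)
rate_dog = collision_rate(cat, dog, T_CRIT)
```

With this made-up data no dimension collides between cat and airplane, while every dimension collides between cat and dog.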

When the collision rate R exceeds a threshold, the similar category determination unit 105 determines that the two categories cannot be discriminated because their mid-level feature amounts collide, that is, that a similar category exists. In this case, the similar category determination unit 105 outputs the index information of the similar category to the feature amount generation unit 106. When the collision rate R does not exceed the threshold, the similar category determination unit 105 determines that the two categories can be discriminated because their mid-level feature amounts do not collide, that is, that no similar category exists. The threshold may be stored in advance in the category discriminator generation device 100 or may be set freely by the user.
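The threshold decision and the resulting routing can be sketched as follows. The threshold value 0.5 and the routing labels are hypothetical; the collision rates of 0% and 100% are the ones reported for FIG. 2.

```python
def similar_categories(collision_rates, threshold):
    """Return the learned categories whose collision rate exceeds the threshold."""
    return [name for name, rate in collision_rates.items() if rate > threshold]


# Collision rates of the learning category "cat" against each learned category.
rates = {"airplane": 0.0, "dog": 1.0}
similar = similar_categories(rates, threshold=0.5)  # hypothetical threshold

if similar:
    # A similar category exists: hand its index to the feature generation step.
    route = ("feature_amount_generation_unit_106", similar)
else:
    # No similar category: learn directly from the existing mid-level features.
    route = ("category_learning_unit_107", [])
```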

In the example shown in FIG. 2, the collision detection result for the category “airplane” and the category “cat” is a collision rate of 0%. That is, the category “airplane” and the category “cat” are shown to be discriminable. In this case, the similar category determination unit 105 outputs the mid-level feature amount of the category learning images of the category “cat” to the category learning unit 107. Also in the example shown in FIG. 2, the collision detection result for the category “dog” and the category “cat” is a collision rate of 100%. That is, the category “dog” and the category “cat” are shown to be indiscriminable (colliding). In this case, the similar category determination unit 105 outputs the index information of the similar category (the category “dog”) to the feature amount generation unit 106.
This concludes the description of the specific processing of the similar category determination unit 105.

Next, specific examples of the operation of the similar category determination unit 105 and the feature amount generation unit 106 will be described with reference to FIGS. 3 and 4. FIGS. 3 and 4 are conceptual diagrams for explaining processing when a similar category exists. The description of FIGS. 3 and 4 takes as an example a case where classifiers for the categories “dog” and “airplane” have already been learned. To simplify the description, “1” indicates that the object represented by a category (for example, “dog” or “airplane”) has the attribute represented by a mid-level feature amount (for example, “Wing” or “Tail”), and “0” indicates that it does not.

In the example shown in FIG. 3(A), the entry for the attribute “Wing” of the learned category “dog” is “0”, and the entry for the attribute “Tail” is “1”. That is, the dog has no wings (“Wing”) but has a tail (“Tail”). The entry for the attribute “Wing” of the learned category “airplane” is “1”, and the entry for the attribute “Tail” is “0”. That is, the airplane has wings but no tail.

As described above, the learned categories “dog” and “airplane” differ in the value of every attribute. The similar category determination unit 105 therefore determines that the collision rate R between the learned categories “dog” and “airplane” is less than the threshold. That is, the Mid-level feature amount of the learned category “dog” and that of “airplane” do not collide. Because the learned categories “dog” and “airplane” are mutually exclusive in this case, only one learned category collides when a discriminator for the category “cat” is newly learned.

When a discriminator for the category “cat” is newly learned, Mid-level feature amounts (“Wing” and “Tail”) are extracted from the Low-level feature amounts extracted from the category learning images corresponding to the category “cat”. FIG. 3(B) shows an example in which the entries for each attribute of the learning category “cat” have been added.
In the example shown in FIG. 3(B), the entry for the attribute “Wing” of the learning category “cat” is “0”, and the entry for the attribute “Tail” is “1”. That is, the cat has no wings (“Wing”) but has a tail (“Tail”).

As shown in FIG. 3(B), the learned category “dog” and the learning category “cat” have the same value for every attribute. The similar category determination unit 105 therefore determines that the collision rate R between the learned category “dog” and the learning category “cat” is greater than or equal to the threshold. That is, the Mid-level feature amount of the learned category “dog” and that of the learning category “cat” collide. In this case, the feature amount generation unit 106 newly adds, as a Mid-level feature amount, the feature amount of an attribute (New) that can discriminate between the two categories (the learned category “dog” and the learning category “cat”) (FIG. 3(C)). Because a feature amount of an attribute that can discriminate between the learned category “dog” and the learning category “cat” has been added, all categories are mutually exclusive again. A discriminator for identifying the learning category “cat” is then generated.
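The attribute table of FIG. 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: exact equality of the binary attribute values stands in for the thresholded collision rate R, the helper names `find_collisions` and `add_discriminating_attribute` are hypothetical, and the value 0 assigned to “New” for the non-similar category “airplane” is an illustrative choice, not something the figures specify.

```python
# Toy attribute table from FIG. 3: 1 = has the attribute, 0 = does not.
# Exact-match comparison is a simplifying assumption for this binary
# example; it stands in for the thresholded collision rate R.

def find_collisions(table, new_category):
    """Return learned categories whose attribute values all match new_category."""
    target = table[new_category]
    return [name for name, attrs in table.items()
            if name != new_category and attrs == target]


def add_discriminating_attribute(table, attribute, new_category):
    """Add a column that is 1 for new_category and 0 for every other category.

    Giving 0 to non-similar categories is an illustrative choice only.
    """
    for name, attrs in table.items():
        attrs[attribute] = 1 if name == new_category else 0


table = {
    "dog": {"Wing": 0, "Tail": 1},
    "airplane": {"Wing": 1, "Tail": 0},
}
table["cat"] = {"Wing": 0, "Tail": 1}             # state of FIG. 3(B)

collisions = find_collisions(table, "cat")         # "dog" collides with "cat"
if collisions:
    add_discriminating_attribute(table, "New", "cat")   # state of FIG. 3(C)

remaining = find_collisions(table, "cat")          # no collisions remain
```

After the column “New” is added, every row of the table is distinct again, which corresponds to the statement that all categories become mutually exclusive.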

Likewise, when a discriminator for the category “lion” is subsequently learned, all existing categories are mutually exclusive, so only one learned category collides. When the discriminator for the category “lion” is newly learned, Mid-level feature amounts (“Wing”, “Tail”, and “New”) are extracted from the Low-level feature amounts extracted from the category learning images corresponding to the category “lion”. FIG. 4(A) shows an example in which the entries for each attribute of the learning category “lion” have been added.
In the example shown in FIG. 4(A), the entry for the attribute “Wing” of the learning category “lion” is “0”, the entry for the attribute “Tail” is “1”, and the entry for the attribute “New” is “1”. That is, the lion has no wings (“Wing”), has a tail (“Tail”), and has the attribute “New”.

As shown in FIG. 4(A), the learned category “cat” and the learning category “lion” have the same value for every attribute. The similar category determination unit 105 therefore determines that the collision rate R between the learned category “cat” and the learning category “lion” is greater than or equal to the threshold. That is, the Mid-level feature amount of the learned category “cat” and that of the learning category “lion” collide. In this case, the feature amount generation unit 106 newly adds, as a Mid-level feature amount, the feature amount of an attribute (New2) that can discriminate between the two categories (the learned category “cat” and the learning category “lion”) (FIG. 4(B)). Because a feature amount of an attribute that can discriminate between the learned category “cat” and the learning category “lion” has been added, all categories are mutually exclusive again. A discriminator for identifying the learning category “lion” is then generated. This processing is repeated each time a new category is learned.
This concludes the description of the specific operation examples of the similar category determination unit 105 and the feature amount generation unit 106.

The operation of the category discriminator generation device 100 is described below using a specific example.
The category discriminator generation device 100 learns categories one at a time (sequentially); that is, it sequentially generates a category discriminator for identifying each category. As a specific example, a case is described in which discriminators for identifying a dog, an airplane, and a cat are learned in that order. As the Mid-level feature amounts, for example, tail, foot, and wing feature amounts are used.
First, the category discriminator generation device 100 learns the discriminator of the learning category “dog”. When category learning images showing a dog are input to the category discriminator generation device 100, the Low-level feature amount extraction unit 101 extracts Low-level feature amounts from the category learning images. Next, the Mid-level feature amount extraction unit 103 extracts the Mid-level feature amounts (tail, foot, wing) from the Low-level feature amounts.

The similar category determination unit 105 determines whether a similar category exists based on the Mid-level feature amounts stored in the Mid-level feature amount storage unit 104. At this point no similar category exists, so the category learning unit 107 learns the discriminator of the category “dog” based on the Mid-level feature amounts (tail, foot, wing) of the learning category “dog”.

Next, the category discriminator generation device 100 learns the discriminator of the learning category “airplane”. When category learning images showing an airplane are input to the category discriminator generation device 100, the Low-level feature amount extraction unit 101 extracts Low-level feature amounts from the category learning images. Next, the Mid-level feature amount extraction unit 103 extracts Mid-level feature amounts from the Low-level feature amounts.

The similar category determination unit 105 determines whether a similar category exists based on the Mid-level feature amounts stored in the Mid-level feature amount storage unit 104. In this case, the similar category determination unit 105 determines whether the Mid-level feature amount of the learning category “airplane” collides with that of the learned category “dog”. The Mid-level feature amount of the learning category “airplane” and that of the learned category “dog” are mutually exclusive, so they do not collide. Because no similar category exists, the category learning unit 107 learns the discriminator of the category “airplane” based on the Mid-level feature amounts of the learning category “airplane”.

Next, the category discriminator generation device 100 learns the discriminator of the learning category “cat”. When category learning images showing a cat are input to the category discriminator generation device 100, the Low-level feature amount extraction unit 101 extracts Low-level feature amounts from the category learning images. Next, the Mid-level feature amount extraction unit 103 extracts Mid-level feature amounts from the Low-level feature amounts.

The similar category determination unit 105 determines whether a similar category exists based on the Mid-level feature amounts stored in the Mid-level feature amount storage unit 104. In this case, the similar category determination unit 105 determines whether the Mid-level feature amount of the learning category “cat” collides with that of the learned category “airplane”. It further determines whether the Mid-level feature amount of the learning category “cat” collides with that of the learned category “dog”.

The Mid-level feature amount of the learning category “cat” and that of the learned category “airplane” are mutually exclusive, so they do not collide. In contrast, the Mid-level feature amount of the learning category “cat” and that of the learned category “dog” collide, because the collision rate R is greater than or equal to the threshold. In this case, the feature amount generation unit 106 generates a new Mid-level feature amount (New) based on the Low-level feature amounts of the learning category “cat” and the Low-level feature amounts of the learned category “dog”. Specifically, a linear discriminator is learned with the Low-level feature amounts of the learning category “cat” as positive examples and those of the learned category “dog” as negative examples. The category learning unit 107 first learns the discriminator of the learning category “cat” based on the four Mid-level feature amounts “tail”, “foot”, “wing”, and “New”. The category learning unit 107 also re-learns the discriminator of the learned category “dog”, which is the similar category, based on the same four Mid-level feature amounts.
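The generation of the attribute “New” can be illustrated with a small linear discriminator trained on “cat” Low-level features as positive examples and “dog” Low-level features as negative examples. The sketch below is an assumption-laden stand-in: it uses a plain perceptron rather than whatever linear learner the device actually employs, the 2-D feature vectors are made up, binarizing the response to 0/1 is an illustrative choice, and the function names `train_perceptron` and `new_attribute` are hypothetical.

```python
def train_perceptron(positives, negatives, epochs=100):
    """Train a linear discriminator w.x + b with the perceptron rule.

    The perceptron is a stand-in for the linear discriminator in the text.
    """
    dim = len(positives[0])
    w, b = [0.0] * dim, 0.0
    samples = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:          # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:                 # all samples separated
            break
    return w, b


def new_attribute(w, b, low_level):
    """Binarized response of the discriminator, used here as attribute "New"."""
    return 1 if sum(wi * xi for wi, xi in zip(w, low_level)) + b > 0 else 0


# Made-up 2-D Low-level features; real features would be high dimensional.
cat_low = [[2.0, 1.0], [3.0, 1.5], [2.5, 0.5]]       # positive examples
dog_low = [[-2.0, -1.0], [-3.0, -0.5], [-2.5, -1.5]]  # negative examples

w, b = train_perceptron(cat_low, dog_low)
cat_new = [new_attribute(w, b, x) for x in cat_low]
dog_new = [new_attribute(w, b, x) for x in dog_low]
```

On this linearly separable toy data the learned response is 1 for every “cat” sample and 0 for every “dog” sample, which is the property the added Mid-level feature amount needs in order to separate the two colliding categories.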

Thereafter, when a new category is learned, the category discriminator generation device 100 extracts four Mid-level feature amounts, namely the original three plus the Mid-level feature amount “New” generated by the above processing, from the category learning images showing the object of the new category. The category discriminator generation device 100 then determines whether the extracted Mid-level feature amounts collide with those of each of the learned categories “dog”, “airplane”, and “cat”. If a similar category exists, a new Mid-level feature amount (New2) is added.

FIG. 5 is a schematic block diagram showing the functional configuration of the category identification device 200 according to the present invention. The category identification device 200 identifies the category of an image input to it, using the category discriminator generated by the category discriminator generation device 100. For example, a test image stored in the test image storage unit 20 is input to the category identification device 200. Unlike a category learning image, a test image contains no information other than the image itself. The category identification device 200 identifies the category of the input test image using the category discriminator generated by the category discriminator generation device 100.
The specific configuration of the category identification device 200 is described below.

The category identification device 200 includes a CPU, a memory, an auxiliary storage device, and the like connected by a bus, and executes a category identification program. By executing the category identification program, the category identification device 200 functions as a device including a Low-level feature amount extraction unit 201, a Mid-level feature amount extraction unit 202, and a category discriminator 203. All or some of the functions of the category identification device 200 may be implemented using hardware such as an ASIC, a PLD, or an FPGA. The category identification program may be recorded on a computer-readable recording medium. A computer-readable recording medium is, for example, a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The category identification program may also be transmitted and received via a telecommunication line.

The Low-level feature amount extraction unit 201 extracts Low-level feature amounts from the test image.
The Mid-level feature amount extraction unit 202 extracts Mid-level feature amounts from the Low-level feature amounts extracted by the Low-level feature amount extraction unit 201.
The category discriminator 203 is the category discriminator learned by the processing of the category learning unit 107. The category discriminator 203 identifies the category of the input test image based on the Mid-level feature amounts extracted by the Mid-level feature amount extraction unit 202.
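The final identification step of the category identification device 200 can be pictured as scoring a Mid-level feature vector with one linear discriminator per learned category. The following is a minimal sketch under assumptions: the weights and biases are invented for illustration, choosing the category with the highest score is an assumed decision rule that the description above does not spell out, and the Low-level and Mid-level extraction stages are omitted so that the input is already a Mid-level vector ordered as [Tail, Wing, New].

```python
def linear_score(weights, bias, features):
    """Output value of a linear discriminator: w.x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def identify_category(mid_level, discriminators):
    """Return the category whose discriminator scores highest (assumed rule)."""
    return max(discriminators,
               key=lambda name: linear_score(*discriminators[name], mid_level))


# Hypothetical per-category (weights, bias) over [Tail, Wing, New].
discriminators = {
    "dog": ([1.0, -1.0, -1.0], 0.0),
    "airplane": ([-1.0, 1.0, 0.0], 0.0),
    "cat": ([1.0, -1.0, 1.0], -0.5),
}

predicted_cat = identify_category([1, 0, 1], discriminators)
predicted_dog = identify_category([1, 0, 0], discriminators)
predicted_airplane = identify_category([0, 1, 0], discriminators)
```

Note that the “airplane” discriminator carries a weight of 0 for “New”, so its score does not depend on the attribute that was added only to separate “cat” from “dog”.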

FIG. 6 is a flowchart showing the flow of processing of the category discriminator generation device 100 in the present embodiment.
A category learning image is input to the category discriminator generation device 100 (step S101). When the category learning image is input, the Low-level feature amount extraction unit 101 extracts Low-level feature amounts from the input category learning image (step S102). The Low-level feature amount extraction unit 101 then stores the extracted Low-level feature amounts in the Low-level feature amount storage unit 102 together with the index of the category learning image.

Next, the Mid-level feature amount extraction unit 103 extracts Mid-level feature amounts from the Low-level feature amounts extracted by the Low-level feature amount extraction unit 101 (step S103). The Mid-level feature amount extraction unit 103 then stores the extracted Mid-level feature amounts in the Mid-level feature amount storage unit 104 together with the index of the category learning image.

The similar category determination unit 105 determines whether a similar category exists based on the Mid-level feature amounts extracted by the Mid-level feature amount extraction unit 103 and the Mid-level feature amounts stored in the Mid-level feature amount storage unit 104 (step S104). If no similar category exists (step S104: NO), the category learning unit 107 learns the discriminator of the learning category based on the Mid-level feature amounts of the learning category (step S105). The category learning unit 107 then stores the learned discriminator in the category discriminator storage unit 108 (step S106).
If a similar category exists (step S104: YES), the feature amount generation unit 106 generates a new Mid-level feature amount (step S107).
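The control flow of steps S101 to S107 can be sketched as a single function with pluggable processing units. Everything here is a hypothetical skeleton: the `units` callables are toy stand-ins for units 101, 103, 105, 106, and 107, the `state` dictionary stands in for the storage units 102, 104, and 108, and the sketch stops at step S107 because the flow described above ends there.

```python
def learn_category(name, image, state, units):
    """One pass through steps S101-S107 for a single learning category."""
    low = units["extract_low"](image)                      # S102
    state["low"][name] = low                               # store Low-level
    mid = units["extract_mid"](low)                        # S103
    similar = units["find_similar"](mid, state["mid"])     # S104
    state["mid"][name] = mid                               # store Mid-level
    if similar is None:                                    # S104: NO
        state["discriminators"][name] = units["train"](mid)  # S105, S106
        return "S105"
    state["new_features"].append(                          # S104: YES, S107
        units["generate_feature"](low, state["low"][similar]))
    return "S107"


# Toy stand-ins: the "image" is already a tuple of attribute values.
units = {
    "extract_low": lambda image: image,
    "extract_mid": lambda low: tuple(low),
    "find_similar": lambda mid, stored: next(
        (n for n, m in stored.items() if m == mid), None),
    "train": lambda mid: {"trained_on": mid},
    "generate_feature": lambda low_a, low_b: ("separates", low_a, low_b),
}
state = {"low": {}, "mid": {}, "discriminators": {}, "new_features": []}

steps = [learn_category(n, img, state, units)
         for n, img in [("dog", (1, 0)), ("airplane", (0, 1)), ("cat", (1, 0))]]
```

With these toy inputs, “dog” and “airplane” take the S104: NO branch, while “cat” matches the stored “dog” entry and takes the S104: YES branch that generates a new feature.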

The category discriminator generation device 100 configured as described above can generate a highly accurate category discriminator. Specifically, a learned category similar to the learning category of the category learning images input to the category discriminator generation device 100 is determined automatically, and the feature amount of an attribute that can discriminate between the similar category and the learning category is sequentially generated and added as a Mid-level feature amount. Consequently, at most one Mid-level feature amount is added each time a new category is learned. By learning categories sequentially in this way, the category discriminator generation device 100 can classify categories with the minimum necessary number of attributes, without greatly increasing the number of attributes needed for category identification. This makes it easier to discriminate between categories even when similar categories are learned.

In a batch method that acquires features optimal for all categories, as in Non-Patent Document 3, the features must be regenerated every time a category is added, and the amount of computation grows with the number of added categories. In contrast, the present invention generates only a feature amount that can discriminate between the learning category and the similar category, so the amount of computation is unlikely to grow as the number of categories increases. The invention is therefore efficient for learning category discriminators.

When the category discriminator generation device 100 learns the discriminator of a learning category after adding a new Mid-level feature amount, the number of Mid-level feature dimensions differs from that of the discriminators of the already learned categories, which would ordinarily require re-learning them. In the present invention, however, re-learning can be made unnecessary for learned categories that are not similar by setting the weight for the newly added Mid-level feature amount to 0 in the calculation.
The category identification device 200 configured as described above identifies an image to be identified using the category discriminator generated by the category discriminator generation device 100. This makes it possible to identify the category of an image accurately.
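The zero-weight treatment of non-similar learned categories described above follows directly from the form of a linear discriminator, as the short check below illustrates. The weights, bias, and feature values are invented for illustration, and `score` is a hypothetical helper; the only point shown is that padding the weight vector with 0 leaves the output unchanged whatever value the new feature takes.

```python
def score(weights, bias, features):
    """Output value of a linear discriminator: w.x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical "airplane" discriminator learned on [Tail, Foot, Wing].
airplane_w, airplane_b = [-1.0, -0.5, 2.0], 0.1

# After "New" is added, pad with a zero weight instead of re-learning.
airplane_w_ext = airplane_w + [0.0]

x_old = [0.0, 0.0, 1.0]                      # Mid-level vector before "New"
old_score = score(airplane_w, airplane_b, x_old)

# The extended score is the same for either value of the new feature.
ext_scores = [score(airplane_w_ext, airplane_b, x_old + [new_value])
              for new_value in (0.0, 1.0)]
unchanged = all(s == old_score for s in ext_scores)
```

Only the similar category (for example “dog” when “cat” is added) needs a non-zero weight on the new dimension, which is why only that discriminator is re-learned.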

<Modification>
The number of category learning images input to the category discriminator generation device 100 may be one or more than one. Likewise, the number of test images input to the category identification device 200 may be one or more than one.

The embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment and includes designs and the like within a scope that does not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS: 10 … category learning image storage unit, 20 … test image storage unit, 100 … category discriminator generation device, 101 … Low-level feature amount extraction unit, 102 … Low-level feature amount storage unit, 103 … Mid-level feature amount extraction unit, 104 … Mid-level feature amount storage unit, 105 … similar category determination unit, 106 … feature amount generation unit, 107 … category learning unit, 108 … category discriminator storage unit, 200 … category identification device, 201 … Low-level feature amount extraction unit, 202 … Mid-level feature amount extraction unit, 203 … category discriminator

Claims

A category discriminator generation device comprising:
a first image feature amount extraction unit that extracts a first image feature amount from a learning image for learning an image category;
a second image feature amount extraction unit that extracts a second image feature amount from the extracted first image feature amount;
a similar category determination unit that determines, based on the extracted second image feature amount and the second image feature amount of a learned category, whether a learned category similar to the category of the learning image exists;
a feature amount generation unit that, when it is determined that a similar learned category exists, generates, based on the first image feature amount of the learning image and the first image feature amount of the similar learned category, a second image feature amount that is not similar to the second image feature amount of the similar learned category, and adds the generated second image feature amount to the second image feature amount of the learning image to generate a new second image feature amount; and
a category learning unit that generates a category discriminator for identifying the new second image feature amount.

The category discriminator generation device according to claim 1, wherein the category learning unit eliminates, in the relational expression for the output value of the linear discriminator of a learned category that is not similar to the category of the learning image, the weight for the second image feature amount newly added to the second image feature amount of the learning image.

A category identification device that identifies the category of an input image using the category discriminator learned by the category learning unit of the category discriminator generation device according to claim 1.

A computer program for causing a computer to function as the device according to any one of claims 1 to 3.