JP2015230559A

JP2015230559A - Object identification device, method, and program

Info

Publication number: JP2015230559A
Application number: JP2014116188A
Authority: JP
Inventors: 史紘佐々木; Fumihiro Sasaki
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2014-06-04
Filing date: 2014-06-04
Publication date: 2015-12-21

Abstract

PROBLEM TO BE SOLVED: When a DPM-based object identification model loses identification performance because of differences between an original dataset and a target dataset, to improve that performance by taking into account deformable parts, one of the defining features of DPM. SOLUTION: Appearance feature quantities are extracted using intermediate dictionaries that move gradually from an original dictionary toward a target dictionary. This adapts the appearance feature quantities to the target dataset. At the same time, coefficients are learned for a vector that concatenates the appearance feature quantities and the deformation feature quantities. As a result, a classifier can be produced whose deformable parts are adapted to the target dataset.

Description

The present invention relates to an object identification device, method, and program, and more particularly to a technique for improving object identification accuracy.

The Deformable Part Model (hereinafter DPM) is known as the de facto standard object identification model (see, for example, FIG. 11).

The appearance feature quantity used in DPM, the Histogram of Oriented Gradients (hereinafter HOG), is currently the de facto standard appearance feature. In recent years, however, techniques have emerged that extract appearance features based on a dictionary (= basis) obtained by dictionary learning. The Histogram of Sparse Codes (hereinafter HSC) is a representative example (see, for example, FIG. 12).

A technique is also already known for suppressing the degradation of identification performance caused by differences in image characteristics (object orientation, shooting viewpoint, resolution, etc.) between a training sample data set (hereinafter, the original data set) and an evaluation sample data set (hereinafter, the target data set). It first obtains a dictionary of the original data set (hereinafter, the original dictionary) and a dictionary of the target data set (hereinafter, the target dictionary). It then extracts intermediate representations between the two (hereinafter, intermediate dictionaries). Finally, it learns on features expressed using all of the original, target, and intermediate dictionaries.

FIG. 13 shows an example of the original dictionary, the target dictionary, and the intermediate dictionaries. Taking the leftmost image as the original dictionary and the rightmost as the target dictionary, the images between them are the intermediate dictionaries. In FIG. 13, a "person" is the identification target. "Object orientation" is the difference in image characteristics between the original dictionary and the target dictionary.

Existing DPM-based object identification models also try to counter the loss of identification performance caused by differences in image characteristics (object orientation, shooting viewpoint, resolution, etc.) between the original and target data sets. However, they do not take into account the Deformable Part (hereinafter, deformable part), which is one of the defining features of DPM. As a result, the improvement in identification performance has been limited.

As an example, suppose the identification target is a person, the difference in image characteristics between the original and target data sets is the person's orientation, and the deformable part is the left arm (see FIG. 14). In this example, the position of the left arm relative to the image center differs between samples of the original data set and samples of the target data set. Suppose the DPM is optimized so that the left-arm position in the original data set samples counts as "a likely left-arm position". The left-arm position in the target data set samples then receives a low likelihood, and the person may fail to be identified.

The DPM takes a sample of the target data set as input and outputs a score for how likely the sample is to be the identification target (in this example, how person-like it is). A sample judged to look like a person receives a large output score, and a sample judged not to look like a person receives a small one. The final identification result is "detected" if the DPM output score is greater than or equal to a chosen threshold and "undetected" if it is below the threshold. The model must therefore output a high score when a person image sample is input. If a person image sample does not receive a high score, lowering the threshold can still produce "detected". However, a lower threshold also raises the probability that non-person image samples are "detected" (that is, misidentified). Better identification performance therefore requires a high score only when a person image sample is input.

DPM is an object identification model that improves identification performance by tolerating the positional changes (= deformations) of parts described above, at the cost of a penalty. A penalty is a deduction from the output score, so it makes the output score smaller. Suppose, however, that the DPM is optimized on the part positions of the original data set without considering the part positions in the target data set. If the part positions differ greatly between the two data sets, the penalty becomes large. The requirement "output a high score when a person image sample is input" may then no longer be met.
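To make the penalty concrete, here is a minimal sketch of a DPM-style part placement score in the spirit of Non-Patent Document 4. The best placement maximizes the appearance response minus a quadratic deformation cost on the displacement (dx, dy) from the part's anchor. The function name `part_score`, the toy response map, and the weights are illustrative assumptions, not the patent's implementation.

```python
import numpy as np


def part_score(response, anchor, d):
    """Best placement of one part under a DPM-style quadratic deformation cost.

    response: 2-D array of appearance responses indexed by (y, x).
    anchor:   (ay, ax) anchor position learned for the part.
    d:        deformation weights applied to (dx, dy, dx^2, dy^2).
    Returns (best_score, (y, x)) where score = response - penalty.
    """
    ay, ax = anchor
    ys, xs = np.indices(response.shape)
    dy, dx = ys - ay, xs - ax
    # The penalty grows quadratically with the displacement from the anchor.
    penalty = d[0] * dx + d[1] * dy + d[2] * dx**2 + d[3] * dy**2
    total = response - penalty
    y, x = np.unravel_index(np.argmax(total), total.shape)
    return float(total[y, x]), (int(y), int(x))


# Toy "left arm" response: in a target-data-set image the arm appears at x = 9.
response = np.zeros((10, 12))
response[5, 9] = 1.0
weights = (0.0, 0.0, 0.05, 0.05)

# Anchor learned from the original data set (arm expected at x = 3).
print(part_score(response, (5, 3), weights))  # (0.0, (5, 3))
# Anchor adapted to the target data set.
print(part_score(response, (5, 9), weights))  # (1.0, (5, 9))
```

With the source anchor, the arm's strong response is outweighed by the displacement penalty (1.0 − 0.05·6² = −0.8), so the part contributes nothing to the score of a genuine person sample.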

Patent Document 1 discloses an object identification model that evaluates image features for each part in order to improve object identification accuracy. It resembles the present invention in that it evaluates image features part by part. However, it does not solve the problem described above, namely that deformable parts are not taken into account.

Non-Patent Document 2 discloses a technique for suppressing the degradation of identification performance caused by differences in characteristics between the original and target data sets. It obtains an original dictionary and a target dictionary and then extracts intermediate dictionaries between them. It resembles the present invention in that it generates original, target, and intermediate dictionaries. However, when applied to a DPM-based object identification model, it does not take deformable parts into account.

The problem addressed here is therefore as follows. DPM-based object identification models lose identification performance because of differences in characteristics between the original and target data sets. In countering this, the aim is to improve identification performance by taking into account deformable parts, one of the defining features of DPM.

That is, an object of the present invention is to improve identification performance so that an object can be identified even when the identification target has a deformable part.

To achieve the above object, the present invention provides, as a first aspect, an object identification program that causes a computer to function as the following units:
an original dictionary generation unit that generates an original dictionary based on a first image in which a certain identification target is captured;
a target dictionary generation unit that generates a target dictionary based on a second image in which the identification target is captured and whose image characteristics differ from those of the first image;
an intermediate dictionary generation unit that generates a plurality of intermediate dictionaries based on the original dictionary and the target dictionary; and
a classifier learning unit that inputs the feature quantity of the first image and the label attached to the first image into a classifier and trains the classifier using the original dictionary.
The classifier learning unit also inputs into the classifier an image group in which first and second images are mixed, together with the label attached to each image, and trains the classifier using each of the intermediate dictionaries. The first image and the second image both show a deformable part, which is part of the identification target and whose appearance differs between the first image and the second image.

According to the present invention, identification performance can be improved so that an object can be identified even when the identification target has a deformable part.

FIG. 1 is a flowchart showing the learning flow of the embodiment.
FIG. 2 is a conceptual diagram of process L1 in FIG. 1.
FIG. 3 is a conceptual diagram of process L2 in FIG. 1.
FIG. 4 is a conceptual diagram of process L3 in FIG. 1.
FIG. 5 is a conceptual diagram (part 1) of process L4 in FIG. 1.
FIG. 6 is a conceptual diagram (part 2) of process L4 in FIG. 1.
FIG. 7 is a conceptual diagram of the learning process of the embodiment.
FIG. 8 is a flowchart showing the identification flow of the embodiment.
FIG. 9 is a diagram showing an example hardware configuration of the example.
FIG. 10 is a block diagram showing the functional configuration of the example.
FIG. 11 is a diagram used to explain the deformable part model (DPM).
FIG. 12 is a diagram used to explain the appearance feature quantity (HSC).
FIG. 13 is a diagram explaining the concept of the original, target, and intermediate dictionaries.
FIG. 14 is a diagram explaining how object identification can fail because of a deformable part.

The invention disclosed below takes deformable parts into account when countering the loss of identification performance that a DPM-based object identification model suffers from differences in characteristics between the original and target data sets. It can therefore suppress that loss.

The following first explains the algorithm of an object identification program that implements the object identification model described above (FIGS. 1 to 8). It then describes an example in which the program runs on a general-purpose computer so that the computer functions as an object identification device (FIGS. 9 and 10).

The object identification model of this embodiment first learns from objects whose identity is known in advance and then identifies objects whose identity is unknown. The learning process is described with reference to FIGS. 1 to 7, and the identification process with reference to FIG. 8.

<Learning flow>
Consider the case where all samples of the original data set and some samples of the target data set are labeled (person = 1, otherwise = 0). Processes L1 to L4, shown in FIG. 1 and FIGS. 2 to 7, are described below.

・ L1: Generation of the original dictionary and the target dictionary (FIG. 2)
In this step, the original dictionary is computed from the original data set and the target dictionary from the target data set. The original data set, original dictionary, target data set, and target dictionary are expressed as follows, where n^s is the number of samples in the original data set, n^t is the number of samples in the target data set, and m is the number of dimensions of each sample.

Various algorithms can compute the original and target dictionaries. One example is the method described in Non-Patent Document 1, with which the original and target dictionaries are computed as in the following equation. In that equation, Γ denotes the sparse codes obtained with each dictionary.
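For orientation, the K-SVD algorithm of Non-Patent Document 1 alternates two steps. Sparse coding (here orthogonal matching pursuit) computes Γ, and then each atom receives a rank-1 SVD update. Together these approximately minimize ||Y − DΓ||_F² subject to each column of Γ having at most T nonzeros. The following is a small NumPy sketch under those assumptions. The function names, the random-column initialization, and the parameter values are illustrative choices, not taken from the patent.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k atoms (unit-norm columns of D)."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef


def ksvd(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Learn D (m x n_atoms) and sparse codes G (n_atoms x n) with Y ~ D @ G."""
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    D = Y[:, rng.choice(n, n_atoms, replace=False)].astype(float)  # init from data columns
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        G = np.column_stack([omp(D, Y[:, i], sparsity) for i in range(n)])
        for j in range(n_atoms):
            users = np.nonzero(G[j])[0]  # samples whose code uses atom j
            if users.size == 0:
                continue
            # Residual of those samples with atom j's contribution added back.
            E = Y[:, users] - D @ G[:, users] + np.outer(D[:, j], G[j, users])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]  # best rank-1 fit: new unit-norm atom
            G[j, users] = S[0] * Vt[0]  # and its coefficients
    return D, G


rng = np.random.default_rng(1)
D_true = rng.standard_normal((20, 12))
D_true /= np.linalg.norm(D_true, axis=0)
codes = np.zeros((12, 300))
for i in range(300):
    codes[rng.choice(12, 3, replace=False), i] = rng.standard_normal(3)
Y = D_true @ codes  # 300 synthetic samples, each a mix of 3 atoms
D, G = ksvd(Y, n_atoms=12, sparsity=3, n_iter=15)
print(np.linalg.norm(Y - D @ G) / np.linalg.norm(Y))  # relative reconstruction error
```

The same routine would be run once on the original data set and once on the target data set to obtain the two dictionaries.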

The number of dimensions m of each sample is determined by the image size. Learning the original and target dictionaries requires every sample to have the same number of dimensions, but sample image sizes may differ. In that case, all samples are resampled, for example by one of the following methods, to fix the number of dimensions.
Scale each sample to a predetermined fixed size.
Crop part of each sample with a frame of a predetermined fixed size (see Non-Patent Document 4 for how to position the frame).
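The two resampling options can be sketched as follows, assuming grayscale images stored as 2-D NumPy arrays. Nearest-neighbour scaling and a centred crop window are simplifying assumptions; the text defers frame placement to Non-Patent Document 4. Each resampled image is flattened into one column of an m × n sample matrix, with m = height × width. The function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Scale a 2-D image to (out_h, out_w) by nearest-neighbour sampling."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]


def center_crop(img, out_h, out_w):
    """Cut a fixed-size window from the image centre (an assumed frame position)."""
    h, w = img.shape
    if h < out_h or w < out_w:
        raise ValueError("image smaller than crop window")
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]


def to_sample_matrix(images, size=(16, 8), mode="scale"):
    """Resample every image to `size` and stack them as columns: shape (m, n)."""
    resample = {"scale": resize_nearest, "crop": center_crop}[mode]
    out_h, out_w = size
    return np.stack([resample(im, out_h, out_w).reshape(-1) for im in images], axis=1)


imgs = [np.random.default_rng(0).random(s) for s in [(32, 16), (20, 10), (48, 30)]]
Y = to_sample_matrix(imgs, size=(16, 8), mode="scale")
print(Y.shape)  # (128, 3)
```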

・ L2: Generation of the intermediate dictionaries (FIG. 3)
In this step, intermediate dictionaries D(k) (k = 1 to K−1) are generated. The displacement between successive intermediate dictionaries, ΔD(k), is computed by the method described in Non-Patent Document 2.
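The following sketch shows one way to build such a path, loosely modeled on the residual-reduction idea behind Non-Patent Document 2. Each step adds ΔD(k), the regularized least-squares fit to the part of the target data that the current dictionary cannot yet reconstruct. This is an assumption, not the exact method of that paper: for brevity it uses dense ridge codes instead of sparse codes and skips atom normalization. Under these choices the objective ||Y − DG||² + μ||G||² (with G optimal for D) never increases from one step to the next. All names and parameter values are hypothetical.

```python
import numpy as np


def ridge_codes(D, Y, mu):
    """Codes G minimizing ||Y - D G||_F^2 + mu ||G||_F^2 (dense stand-in for sparse coding)."""
    p = D.shape[1]
    return np.linalg.solve(D.T @ D + mu * np.eye(p), D.T @ Y)


def objective(D, Y, mu=0.1):
    """Target reconstruction objective for dictionary D with its optimal ridge codes."""
    G = ridge_codes(D, Y, mu)
    return float(np.sum((Y - D @ G) ** 2) + mu * np.sum(G ** 2))


def intermediate_dictionaries(D_src, Y_tgt, K, mu=0.1, lam=1.0):
    """Return [D(0), ..., D(K)] with D(0) = D_src, each step reducing the target objective."""
    D = D_src.copy()
    path = [D.copy()]
    for _ in range(K):
        G = ridge_codes(D, Y_tgt, mu)
        R = Y_tgt - D @ G  # target residual not explained by the current dictionary
        A = G @ G.T + lam * np.eye(G.shape[0])
        dD = np.linalg.solve(A, G @ R.T).T  # argmin ||R - dD G||^2 + lam ||dD||^2
        D = D + dD
        path.append(D.copy())
    return path


rng = np.random.default_rng(0)
D_src = rng.standard_normal((20, 8))
Y_tgt = rng.standard_normal((20, 8)) @ rng.standard_normal((8, 100))  # data from another subspace
Ds = intermediate_dictionaries(D_src, Y_tgt, K=5)
print([round(objective(D, Y_tgt), 2) for D in Ds])  # non-increasing sequence
```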

・ L3: Learning the initial model using the original dictionary (FIG. 4)
In this step, the input is the appearance feature quantity of each sample of the original data set, computed with the original dictionary, together with its label. The following objective function is optimized with respect to the coefficient β(0), and β(0) is obtained as the learning result. The appearance feature quantities and labels can be expressed as follows.

Various algorithms can compute appearance feature quantities with the original dictionary. One example is the method of Non-Patent Document 3. A method for learning the coefficient β(0) is described in Non-Patent Document 4.
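The objective function itself is not reproduced in this text. Assuming it follows the latent SVM formulation of Non-Patent Document 4, which the text cites for learning β(0), it would take roughly the following form. Here the patent's labels 1/0 are mapped to y_i = +1/−1, and z ranges over the placements of the root and parts:

$$
\beta^{(0)} = \arg\min_{\beta}\ \frac{1}{2}\lVert\beta\rVert^{2} + C\sum_{i=1}^{n^{s}} \max\bigl(0,\ 1 - y_i f_{\beta}(x_i)\bigr),
\qquad
f_{\beta}(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z)
$$

Φ(x, z) concatenates three things: the appearance features (computed with the original dictionary) at the root and part locations given by z, the negated deformation features −(dx, dy, dx², dy²) of each part, and a constant 1 for the bias. For the L4 stages the same form would apply with D(k)-based appearance features, and the sum would run over the labeled samples of the mixed set.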

・ L4: Sequential learning of the model using the intermediate dictionaries and the target dictionary (FIGS. 5 and 6)
First, the original and target data sets are mixed, and for each sample of the mixture the appearance feature quantity and label are computed using D(k) (FIG. 5).
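A rough sketch of dictionary-based appearance features in the spirit of HSC (Non-Patent Document 3) follows. Dense patches are coded against a dictionary such as D(k), and the absolute code values are pooled into per-cell histograms. To stay short, the sketch makes two simplifications: it uses soft-thresholded correlations instead of true sparse coding (OMP), and it applies only per-cell L2 normalization. Neither is the exact method of Non-Patent Document 3. The function name and parameter values are hypothetical.

```python
import numpy as np


def hsc_like_features(img, D, patch=5, cell=8, thresh=0.1):
    """Per-cell histograms of |codes| for a 2-D image and a dictionary D.

    D has shape (patch * patch, p) with unit-norm columns.
    Returns an array of shape (n_cells_y, n_cells_x, p).
    """
    if patch % 2 == 0:
        raise ValueError("patch size must be odd")
    h, w = img.shape
    p = D.shape[1]
    padded = np.pad(img.astype(float), patch // 2, mode="edge")
    # One patch centred on every pixel: shape (h, w, patch, patch).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    X = windows.reshape(h, w, patch * patch)
    X = X - X.mean(axis=2, keepdims=True)  # remove each patch's DC component
    codes = np.maximum(np.abs(X @ D) - thresh, 0.0)  # soft-thresholded magnitudes
    ny, nx = h // cell, w // cell
    hist = codes[: ny * cell, : nx * cell].reshape(ny, cell, nx, cell, p).sum(axis=(1, 3))
    norms = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / np.maximum(norms, 1e-12)  # L2-normalize each cell's histogram


rng = np.random.default_rng(0)
D = rng.standard_normal((25, 10))
D /= np.linalg.norm(D, axis=0)
F = hsc_like_features(rng.random((32, 24)), D)
print(F.shape)  # (4, 3, 10)
```

In L4 the same function would be called once per stage, with D replaced by D(k).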

With the computed appearance feature quantities and labels as input, the following objective function is optimized with respect to the coefficient β(k), and β(k) is obtained as the learning result. The initial value of β(k) is set to β(k−1). This process is repeated for k = 1 to K, with D(K) being the target dictionary (FIG. 6).
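The warm-start loop of L4 can be sketched as follows. For brevity, each stage minimizes a plain linear SVM objective by full-batch subgradient descent on precomputed feature matrices, and the latent placement search of Non-Patent Document 4 is omitted. The toy features are assumptions: the same labeled samples are reused at every stage, and their class direction drifts from the source domain to the target domain as k increases. `train_linear_svm` and `sequential_training` are hypothetical names.

```python
import numpy as np


def train_linear_svm(Phi, y, init, lam=1e-2, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 ||beta||^2 + mean hinge loss, starting from `init`."""
    beta = np.array(init, dtype=float)  # copy, so the previous stage's beta is untouched
    for _ in range(epochs):
        active = y * (Phi @ beta) < 1  # samples inside the margin
        grad = lam * beta - (y[active, None] * Phi[active]).sum(axis=0) / len(y)
        beta -= lr * grad
    return beta


def sequential_training(stage_features, y, dim):
    """beta(0) from zeros on stage 0, then beta(k) warm-started from beta(k-1)."""
    betas, init = [], np.zeros(dim)
    for Phi in stage_features:  # feature matrices for k = 0..K
        init = train_linear_svm(Phi, y, init)
        betas.append(init)
    return betas


rng = np.random.default_rng(0)
n, K = 200, 4
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)  # labels 1/0 mapped to +1/-1
noise = 0.3 * rng.standard_normal((n, 2))
stages = []
for k in range(K + 1):
    angle = 0.5 * np.pi * k / K  # class direction drifts from source to target
    mu = np.array([np.cos(angle), np.sin(angle)])
    X = y[:, None] * mu + noise
    stages.append(np.hstack([X, np.ones((n, 1))]))  # constant column for the bias
betas = sequential_training(stages, y, dim=3)
acc = np.mean(np.sign(stages[-1] @ betas[-1]) == y)
print(len(betas), acc)
```

Starting each stage from β(k−1) means every stage only has to absorb a small domain shift rather than the full jump from source to target.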

In process L4, the appearance feature quantities are extracted using intermediate dictionaries that approach the target dictionary step by step from the original dictionary, so they are gradually adapted to the target data set. At the same time, the coefficients for the vector that concatenates the appearance feature quantities and the deformation feature quantities are learned. This makes it possible to generate a classifier whose deformable parts are adapted to the target data set.
FIG. 7 illustrates the learning process of L1 to L4.
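Following the packing used in Non-Patent Document 4, the vector whose coefficients are learned can be sketched as ψ = (root appearance, part appearances, −φ_d(dx_i, dy_i) for each part, 1). The matching coefficient vector is β = (root filter, part filters, deformation weights, bias). A single dot product β·ψ then equals the sum of filter responses minus the deformation penalties plus the bias. In L4, only the appearance blocks of ψ would be recomputed with each D(k), and the deformation weights in β are re-learned alongside them. The helper names are hypothetical.

```python
import numpy as np


def deformation_features(dx, dy):
    """phi_d(dx, dy) = (dx, dy, dx^2, dy^2) as in DPM."""
    return np.array([dx, dy, dx * dx, dy * dy], dtype=float)


def build_psi(root_app, part_apps, displacements):
    """Concatenate appearance features, negated deformation features, and a bias term."""
    pieces = [root_app, *part_apps]
    pieces += [-deformation_features(dx, dy) for dx, dy in displacements]
    pieces.append(np.ones(1))
    return np.concatenate(pieces)


def pack_beta(F_root, F_parts, d_parts, bias):
    """Stack root filter, part filters, deformation weights, and bias into one vector."""
    return np.concatenate([F_root, *F_parts, *d_parts, [bias]])


rng = np.random.default_rng(0)
root, parts = rng.random(6), [rng.random(4), rng.random(4)]
disp = [(1, -2), (0, 3)]
F0, Fs = rng.random(6), [rng.random(4), rng.random(4)]
ds, b = [rng.random(4), rng.random(4)], 0.5

score = pack_beta(F0, Fs, ds, b) @ build_psi(root, parts, disp)
by_parts = F0 @ root + sum(
    F @ p - d @ deformation_features(*dd) for F, p, d, dd in zip(Fs, parts, ds, disp)
) + b
print(np.isclose(score, by_parts))  # True
```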

The above description uses the same dictionary to compute the feature quantities of both the whole object (Root) and the deformable parts (Deformable Part). Alternatively, separate dictionaries can be prepared in L1, and each used when computing the corresponding features. For example, for the deformable-part dictionaries, the samples of the original and target data sets are resampled at a certain image size before the original, target, and intermediate dictionaries are computed. For the whole-object dictionaries, the samples are resampled at half the resolution used for the deformable parts before the original, target, and intermediate dictionaries are computed.

The above description also assumes that all samples used to learn the original and target dictionaries have the same image size. Alternatively, the data sets can be divided by a criterion such as image aspect ratio, with a dictionary computed for each resulting group (see Non-Patent Document 4 for dividing samples into groups by aspect ratio). The original data set is divided into several sample groups (hereinafter, original sub-data sets), and the target data set is divided by the same criterion into several sample groups (hereinafter, target sub-data sets).

For example, suppose the original and target data sets are each divided into two sub-data sets by aspect ratio. Process L1 then computes four dictionaries:
sub-original dictionary 1, for the first original sub-data set;
sub-original dictionary 2, for the second original sub-data set;
sub-target dictionary 1, for the first target sub-data set;
sub-target dictionary 2, for the second target sub-data set.
Process L2 computes intermediate dictionaries between sub-original dictionary 1 and sub-target dictionary 1 (intermediate dictionaries 1), and between sub-original dictionary 2 and sub-target dictionary 2 (intermediate dictionaries 2). Model learning from L3 onward becomes multi-component model learning (see Non-Patent Document 4). One component is learned using sub-original dictionary 1, intermediate dictionaries 1, and sub-target dictionary 1. The other is learned using sub-original dictionary 2, intermediate dictionaries 2, and sub-target dictionary 2.
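The aspect-ratio split can be sketched as follows. It assumes, as we understand Non-Patent Document 4 does for its mixture components, that samples are sorted by aspect ratio and split into groups of nearly equal size. Component i then pairs the i-th original sub-data set with the i-th target sub-data set. Sizes are given as (height, width), and the function names are hypothetical.

```python
import numpy as np


def split_by_aspect(sizes, n_components):
    """Group sample indices by ascending aspect ratio (width / height) into equal-size groups."""
    ratios = np.array([w / h for h, w in sizes])
    order = np.argsort(ratios, kind="stable")
    return np.array_split(order, n_components)


def pair_components(src_sizes, tgt_sizes, n_components=2):
    """Pair the i-th original sub-data set with the i-th target sub-data set."""
    src = split_by_aspect(src_sizes, n_components)
    tgt = split_by_aspect(tgt_sizes, n_components)
    return list(zip(src, tgt))


src_sizes = [(100, 50), (100, 100), (50, 100), (100, 40)]  # (height, width)
tgt_sizes = [(60, 60), (60, 20), (30, 90)]
for i, (s, t) in enumerate(pair_components(src_sizes, tgt_sizes), start=1):
    print(f"component {i}: original {s.tolist()}, target {t.tolist()}")
```

Each pair would then get its own dictionary path (sub-original dictionary i, intermediate dictionaries i, sub-target dictionary i) and its own learned component.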

<Identification flow>
The process of identifying an object with the classifier generated by the learning flow above is described with reference to FIG. 8.
・ D1: Feature extraction using the target dictionary
Appearance feature quantities are extracted from the evaluation target image using the target dictionary generated in L1. The extraction method is the same as in process L4.
・ D2: Input of the feature quantities to the classifier
Given the feature quantities, the classifier computes an identification score based on the learned coefficient β(K). See Non-Patent Document 4 for how the identification score is computed.
・ D3: Threshold decision on the identification score
The computed identification score is compared with a threshold. The result is "detected" if the score is greater than or equal to the threshold, and "undetected" if it is below the threshold.
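Once D1 has produced the feature vectors, steps D2 and D3 reduce to a dot product and a comparison. The sketch below assumes each row of `Psi` is already the concatenated vector for the best placement of one candidate window; the placement search of Non-Patent Document 4 is omitted. The second call illustrates the trade-off noted earlier: lowering the threshold turns more windows into detections. The function name and toy values are hypothetical.

```python
import numpy as np


def identify(Psi, beta_K, threshold):
    """D2 and D3 for a batch of windows: scores beta(K) . psi, then a >= threshold test."""
    scores = Psi @ beta_K
    return scores, np.where(scores >= threshold, "detected", "undetected")


Psi = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])  # toy feature vectors from D1
beta_K = np.array([1.0, 0.0])
print(identify(Psi, beta_K, 1.0)[1])  # ['detected' 'undetected' 'undetected']
print(identify(Psi, beta_K, 0.0)[1])  # ['detected' 'detected' 'undetected']
```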

In the embodiment described above, the appearance feature quantities are gradually adapted to the target data set. At the same time, the coefficients for the vector concatenating the appearance and deformation feature quantities are learned. This produces a classifier whose deformable parts are adapted to the target data set. Because objects are identified with such a classifier, identification accuracy does not deteriorate even when deformable parts are present.

<Example>
FIG. 9 shows an example hardware configuration of the example. The object identification device 1 according to the example has the same configuration as a general information processing terminal. A CPU (Central Processing Unit) 10, a RAM (Random Access Memory) 11, a ROM (Read Only Memory) 12, an HDD (Hard Disk Drive) 13, and an I/F 14 are connected via a bus. An LCD (Liquid Crystal Display) 15 and an operation unit 16 are connected to the I/F 14.

The CPU 10 is a computing unit that controls the operation of the entire device. The RAM 11 is a volatile storage medium capable of high-speed reading and writing, used as a work area when the CPU 10 processes information. The ROM 12 is a read-only nonvolatile storage medium that stores programs such as firmware. The HDD 13 is a nonvolatile storage medium capable of reading and writing, which stores an OS (Operating System), various control programs, application programs, and the like. The I/F 14 connects the bus to various hardware, networks, and so on, and controls them. The LCD 15 is a visual user interface through which the user checks the state of the device. The operation unit 16 is a user interface, such as a keyboard and mouse, through which the user enters information into the device.

Software programs run on these hardware resources, and this processing realizes the functional configuration described below in a control unit formed by the CPU 10 and other components. The functional blocks are described below.

FIG. 10 shows the functional configuration of the object identification device 1. As illustrated, the object identification device 1 includes a classifier learning function unit 110, an identification function unit 120, and various storage units.

The classifier learning function unit 110 is the functional block responsible for the classifier learning function shown in FIG. 1.
The classifier learning function unit 110 includes a dictionary learning image input unit 111, an original dictionary generation unit 112, a target dictionary generation unit 113, an intermediate dictionary generation unit 114, a learning image/label input unit 115, a feature vector generation unit 116, and a classifier learning/generation unit 117.

The dictionary learning image input unit 111 is the block that inputs the images needed for processes L1 and L2.
The original dictionary generation unit 112 is the original dictionary generation block of L1.
The target dictionary generation unit 113 is the target dictionary generation block of L1.
The intermediate dictionary generation unit 114 is the intermediate dictionary generation block of L2.
The learning image/label input unit 115 is the block that inputs the data set samples for L3 and L4.
The feature vector generation unit 116 is the block that extracts feature quantities using the dictionaries in L3 and L4.
The classifier learning/generation unit 117 is the coefficient learning block of L3 and L4.
The dictionary storage unit 118 stores the dictionaries generated in L1 and L2.
The classifier storage unit 119 stores the classifiers generated in L3 and L4.

The identification function unit 120 is the functional block responsible for the identification processing function shown in FIG. 8.
The identification function unit 120 includes an evaluation image input unit 121, a feature vector generation unit 123, an identification score calculation unit 124, and a threshold determination unit 127.

The evaluation image input unit 121 is the block that inputs the image to be identified.
The feature vector generation unit 123 is the block that performs process D1.
The identification score calculation unit 124 is the block that performs process D2.
The threshold determination unit 127 is the block that performs process D3.
The identification result storage unit 130 is the block that stores the identification results obtained by process D3.

A device configured as described above can carry out the processing of the embodiment and identify an object even when the identification target has a deformable part.

DESCRIPTION OF SYMBOLS
1 Object identification device
110 Classifier learning function unit
111 Dictionary learning image input unit
112 Original dictionary generation unit
113 Target dictionary generation unit
114 Intermediate dictionary generation unit
115 Learning image/label input unit
116 Feature vector generation unit
117 Classifier learning/generation unit
118 Dictionary storage unit
119 Classifier storage unit
120 Identification function unit
121 Evaluation image input unit
123 Feature vector generation unit
124 Identification score calculation unit
127 Threshold determination unit
130 Identification result storage unit

Patent Document 1: JP 2005-004612 A

M. Aharon, M. Elad, and A. Bruckstein. "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation." IEEE Transactions on Signal Processing, 54(11):4311-4322, 2006.
J. Ni, Q. Qiu, and R. Chellappa. "Subspace interpolation via dictionary learning for unsupervised domain adaptation." 2013 IEEE Conference on Computer Vision and Pattern Recognition.
X. Ren and D. Ramanan. "Histograms of sparse codes for object detection." 2013 IEEE Conference on Computer Vision and Pattern Recognition.
P. F. Felzenszwalb et al. "Object detection with discriminatively trained part-based models." IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627-1645, 2010.

Claims

1. An object identification program that causes a computer to function as:
an original dictionary generation unit that generates an original dictionary based on a first image in which an identification target is captured;
a target dictionary generation unit that generates a target dictionary based on a second image in which the identification target is captured and whose image characteristics differ from those of the first image;
an intermediate dictionary generation unit that generates a plurality of intermediate dictionaries based on the original dictionary and the target dictionary; and
a classifier learning unit that inputs a feature quantity of the first image and a label attached to the first image to a classifier and causes the classifier to learn using the original dictionary,
wherein the classifier learning unit further inputs, to the classifier, an image group in which the first image and the second image are mixed, together with a label attached to each group, and causes the classifier to learn using each of the intermediate dictionaries, and
wherein the first image and the second image include a deformed part that is a part of the identification target and whose appearance differs between the first image and the second image.
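The claim above states that the intermediate dictionaries are generated from the original and target dictionaries but does not say how. The sketch below substitutes the simplest possible scheme, straight linear interpolation between the two dictionaries followed by renormalization of each atom, purely to show what "a sequence of dictionaries moving gradually from original to target" looks like. It assumes both dictionaries have the same shape and that atom k of one corresponds to atom k of the other; independently learned dictionaries have no such correspondence, so atoms would have to be matched first. The cited Ni et al. paper instead learns its intermediate dictionaries incrementally from data. The function names are hypothetical.

```python
import numpy as np


def normalize_columns(d: np.ndarray) -> np.ndarray:
    """Scale every atom (column) to unit L2 norm; all-zero columns are left as-is."""
    norms = np.linalg.norm(d, axis=0, keepdims=True)
    return d / np.where(norms > 0, norms, 1.0)


def intermediate_dictionaries(original: np.ndarray, target: np.ndarray, n_steps: int) -> list[np.ndarray]:
    """Return n_steps dictionaries lying strictly between original and target.

    Assumption: both arrays are (patch_dim, n_atoms) and atom k of `original`
    corresponds to atom k of `target` (simple linear interpolation, not the
    patented or Ni et al. procedure).
    """
    if original.shape != target.shape:
        raise ValueError("original and target dictionaries must have the same shape")
    result = []
    for k in range(1, n_steps + 1):
        t = k / (n_steps + 1)  # interpolation weight, 0 < t < 1
        blended = (1.0 - t) * original + t * target
        result.append(normalize_columns(blended))
    return result
```

Renormalizing after blending keeps every atom at unit length, which dictionary-based encoders such as sparse coding usually assume.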

2. An object identification device comprising:
an original dictionary generation unit that generates an original dictionary based on a first image in which an identification target is captured;
a target dictionary generation unit that generates a target dictionary based on a second image in which the identification target is captured and whose image characteristics differ from those of the first image;
an intermediate dictionary generation unit that generates a plurality of intermediate dictionaries based on the original dictionary and the target dictionary; and
a classifier learning unit that inputs a feature quantity of the first image and a label attached to the first image to a classifier and causes the classifier to learn using the original dictionary,
wherein the classifier learning unit further inputs, to the classifier, an image group in which the first image and the second image are mixed, together with a label attached to each group, and causes the classifier to learn using each of the intermediate dictionaries, and
wherein the first image and the second image include a deformed part that is a part of the identification target and whose appearance differs between the first image and the second image.

3. The object identification device according to claim 2, wherein the original dictionary generation unit generates an original dictionary for the deformed part based on a third image showing the deformed part,
the target dictionary generation unit generates a target dictionary for the deformed part based on a fourth image showing the deformed part, and
the intermediate dictionary generation unit generates a plurality of intermediate dictionaries for the deformed part based on the original dictionary and the target dictionary for the deformed part.
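The claim above gives the deformed part its own original, target, and intermediate dictionaries, and the abstract describes learning coefficients for a vector that joins the appearance feature quantity with the deformation feature quantity. The sketch below shows one way such a joined vector could be assembled. It assumes, following the DPM formulation of Felzenszwalb et al., that a part's displacement from its anchor is encoded as (dx, dy, dx², dy²), and that each appearance vector has already been computed with the dictionary matching its region. The function names and argument layout are hypothetical.

```python
import numpy as np


def deformation_features(anchor: tuple[int, int], placement: tuple[int, int]) -> np.ndarray:
    """DPM-style deformation features (dx, dy, dx^2, dy^2) of one part.

    anchor:    ideal (x, y) position of the part relative to the root
    placement: (x, y) position where the part was actually found
    """
    dx = placement[0] - anchor[0]
    dy = placement[1] - anchor[1]
    return np.array([dx, dy, dx * dx, dy * dy], dtype=float)


def concatenated_feature(root_appearance, part_appearances, anchors, placements) -> np.ndarray:
    """Join root appearance, then per-part appearance and deformation features.

    Assumption: each appearance vector was extracted with the dictionary for its
    own region (root or deformed part).
    """
    blocks = [np.asarray(root_appearance, dtype=float)]
    for appearance, anchor, placement in zip(part_appearances, anchors, placements):
        blocks.append(np.asarray(appearance, dtype=float))
        blocks.append(deformation_features(anchor, placement))
    return np.concatenate(blocks)
```

Because the classifier learns one coefficient per entry of this vector, the deformation terms are adapted to the target data together with the appearance terms rather than being fixed in advance.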

4. The object identification device according to claim 2, wherein the dictionary group is generated for each aspect ratio of the image.
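The claim above generates the dictionary group separately for each image aspect ratio. How a group is chosen at identification time is not stated; the sketch below assumes the simplest rule, picking the group whose aspect ratio is closest to that of the input window, measured on a log scale so that 2:1 and 1:2 are equally far from 1:1. `DictionaryGroup` and `select_group` are hypothetical names.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class DictionaryGroup:
    """Dictionaries learned for one aspect ratio (width / height)."""
    original: np.ndarray
    target: np.ndarray
    intermediates: list


def select_group(groups: dict[float, DictionaryGroup], width: int, height: int) -> DictionaryGroup:
    """Return the group whose aspect ratio is closest to width / height (log scale)."""
    if not groups:
        raise ValueError("no dictionary groups available")
    ratio = width / height
    best = min(groups, key=lambda r: abs(math.log(r) - math.log(ratio)))
    return groups[best]
```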

5. An object identification method comprising:
an original dictionary generation step of generating an original dictionary based on a first image in which an identification target is captured;
a target dictionary generation step of generating a target dictionary based on a second image in which the identification target is captured and whose image characteristics differ from those of the first image;
an intermediate dictionary generation step of generating a plurality of intermediate dictionaries based on the original dictionary and the target dictionary;
a first classifier learning step of inputting a feature quantity of the first image and a label attached to the first image to a classifier and causing the classifier to learn using the original dictionary; and
a second classifier learning step of inputting, to the classifier, an image group in which the first image and the second image are mixed, together with a label attached to each image, and causing the classifier to learn using each of the intermediate dictionaries,
wherein the first image and the second image include a deformed part that is a part of the identification target and whose appearance differs between the first image and the second image.
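The two learning steps of the method above can be sketched as a staged training loop: first train on first images with the original dictionary, then retrain once per intermediate dictionary on image groups that contain a growing share of second images. This is a minimal illustration under several assumptions not stated in the claims: the classifier is logistic regression trained by plain gradient descent and warm-started from the previous stage, feature extraction is a hypothetical linear projection onto the dictionary atoms, labels are available for the second images, and the share of second images grows linearly with the stage index. All function names are hypothetical.

```python
import numpy as np


def encode(images: np.ndarray, dictionary: np.ndarray) -> np.ndarray:
    """Hypothetical feature extraction: project each image vector onto the atoms."""
    return images @ dictionary


def train_logistic(x, y, w=None, epochs=200, lr=0.1):
    """Gradient-descent logistic regression with labels y in {0, 1}.

    Passing the previous stage's weights as `w` warm-starts training.
    """
    w = np.zeros(x.shape[1] + 1) if w is None else w.copy()
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        w -= lr * xb.T @ (p - y) / len(y)
    return w


def mixed_group(first, first_y, second, second_y, ratio, rng):
    """Image group with a `ratio` share of second images; the rest are first images."""
    n = min(len(first), len(second))
    n_second = int(round(ratio * n))
    i1 = rng.choice(len(first), n - n_second, replace=False)
    i2 = rng.choice(len(second), n_second, replace=False)
    x = np.vstack([first[i1], second[i2]])
    y = np.concatenate([first_y[i1], second_y[i2]])
    return x, y


def two_step_learning(first, first_y, second, second_y, original, intermediates, seed=0):
    rng = np.random.default_rng(seed)
    # First classifier learning step: first images, original dictionary.
    w = train_logistic(encode(first, original), first_y)
    # Second classifier learning step: one stage per intermediate dictionary,
    # with the share of second images increasing from stage to stage.
    k_total = len(intermediates)
    for k, dictionary in enumerate(intermediates, start=1):
        x, y = mixed_group(first, first_y, second, second_y, k / (k_total + 1), rng)
        w = train_logistic(encode(x, dictionary), y, w=w)
    return w
```

Warm-starting each stage from the previous weights is what makes the sequence gradual: the classifier is nudged toward the second-image domain one intermediate dictionary at a time instead of being retrained from scratch.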