JP5846553B2

JP5846553B2 - Attribute learning and transfer system, recognizer generation device, recognizer generation method, and recognition device

Info

Publication number: JP5846553B2
Application number: JP2011127642A
Authority: JP
Inventors: 長谷川　修; 修長谷川; アラムカーウィーウォン; ピシャイカーンクアクーン; 大毅木村
Original assignee: Tokyo Institute of Technology NUC
Current assignee: Tokyo Institute of Technology NUC
Priority date: 2010-09-13
Filing date: 2011-06-07
Publication date: 2016-01-20
Anticipated expiration: 2031-06-07
Also published as: JP2012084117A

Description

The present invention relates to a technique that enables transfer learning by recognizing a class to be identified through the attributes that characterize it. More specifically, it relates to an attribute learning and transfer system for recognizing classes, a recognizer generation device, a recognizer generation method, and a recognition device.

Research on object recognition and object classification has advanced greatly over the last decade. For the task of detecting specific objects such as faces or vehicles, very powerful detectors and recognizers are available. Such detectors and recognizers are obtained by combining low-level features that describe the object (for example, SIFT or SURF) with modern machine learning mechanisms such as support vector machines (SVMs). However, these approaches usually require a large amount of manually labeled training data to achieve good accuracy, and learning each individual class typically requires several hundred thousand sample images.

Another problem can arise when many objects must be recognized. With conventional approaches, a new detector has to be built and then trained for each object category. Even when each new detector is trained efficiently, a large amount of manually labeled training data is still required, as described above, and learning each individual class typically requires several hundred thousand sample images.

From a purely computational standpoint, preparing large training data sets may not seem very difficult, provided that, for example, some efficient automatic labeling tool is available. Large image collections are easily accessible over the Internet, and the performance of computer hardware has improved dramatically in recent years. This does not hold, however, for intelligent agents such as robots. An intelligent robot has limited Internet access and limited hardware resources, and practical real-world tasks are so general that they are unlikely to be solved using only detectors trained in advance. For this reason, many researchers have recently become more interested in recognizing objects by considering their attributes (as exemplified in Non-Patent Document 1) or their parts.

Different objects usually share some common attributes (for example, lions, tigers, dogs, and cats all share the attribute of being four-legged animals). As noted in Non-Patent Document 1, humans can distinguish at least 30,000 relevant object classes by learning from examples and abstracting sufficiently from them (given a high-level description of its features, a human can detect even a completely unknown object class). In other words, knowledge of attributes discovered in one object class is thought to be transferred for use with other object classes that contain the same attributes. Much valuable prior work in computer vision has already shown that using transferred attributes does make it possible to detect unknown object classes (see, for example, Non-Patent Document 1).

C. Lampert et al., "Learning to detect unseen object classes by between-class attribute transfer," in CVPR, 2009.

The ability to learn common attributes and transfer them to the detection of new classes is extremely useful in current robotics. As a simple illustration, consider a mobile robot used in an office. Suppose we want to command the robot to go to another room and fetch an object B for us. It is very unlikely that we would have an image of object B ready to show the robot. The only way to give the command without an image of the object is to describe the object's attributes in words. This problem should be solved by having the robot learn attributes from one class of objects and then transfer the learned attributes to the recognition of new objects that belong to unknown classes.

However, although attribute learning and transfer has been shown to enable the detection of unknown object classes, applying the previously proposed attribute transfer and learning methods to robots involves the problems described below.

First, in the conventional methods, the learned attributes can be transferred to other object classes, but the preliminary training stage in which each attribute detector is trained is entirely a batch process. Training a detector for any single attribute requires a huge training image data set. For use on a robot, by contrast, the system should learn flexibly whenever a training image is obtained and should be able to perform recognition whenever needed. A fully incremental attribute learning and transfer method is therefore required.

The problems are explained here using the conventional method disclosed in Non-Patent Document 1 as an example. That method requires a classifier to be trained for each individual attribute. At test time, each attribute classifier predicts a probability for its attribute, and a final probability score is computed based on Bayesian theory. Each attribute classifier is trained with an SVM, and training takes several hours. Doing this for all attributes (85 attributes) makes the computational load so high that the method is practically unusable for robotics and other online applications. Moreover, because retraining the SVMs for all attributes is impractical, newly input training data cannot be learned incrementally.

The present invention has been made to solve these problems, and its object is to provide an attribute learning and transfer system and a learning and transfer method that support online, incremental learning.

An attribute learning and transfer system according to the present invention includes: a feature extraction unit that extracts features from input data and training data; a labeling unit that labels the training data with given attribute information; a classifier generation unit that generates an attribute classifier for identifying attributes contained in the input data, wherein the attribute classifier is configured using a self-propagating neural network that includes nodes and edges connecting the nodes, the self-propagating neural network is divided into a plurality of parts according to the identification content identified by the attribute, the features of the training data extracted by the feature extraction unit are input as a training pattern to the part of the self-propagating neural network specified by the attribute information labeled by the labeling unit, and the nodes and the edges are generated in the self-propagating neural network based on the training pattern; a classifier holding unit that holds the attribute classifier generated by the classifier generation unit; an attribute identification unit that, when input data is input, inputs the features extracted from the input data as an input pattern to each part of the self-propagating neural network constituting the attribute classifier held by the classifier holding unit, calculates, in each part of the self-propagating neural network, a first similarity between the input pattern and the nodes included in the self-propagating neural network, and identifies, according to the calculated first similarity, which attribute of the identification content is contained in the input data; and a class identification unit that is given, for each of a plurality of classes, the attribute information that the class contains, compares the attributes of the input data identified by the attribute identification unit with the attribute information of the classes to obtain a second similarity, and identifies, according to the calculated second similarity, which of the plurality of classes the input data belongs to.

This makes it possible to learn and transfer attributes online and incrementally.
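As a rough illustration of the class identification unit, the following Python sketch compares the attribute scores identified for an input with per-class attribute information and picks the best-matching class. The similarity measure (negative mean absolute difference), the function name `identify_class`, and the example attribute table are assumptions made for illustration; the text does not specify how the second similarity is computed.

```python
# Hypothetical attribute order: four-legged, striped, has wings.
CLASS_ATTRIBUTES = {
    "tiger": [1, 1, 0],
    "dog": [1, 0, 0],
    "bird": [0, 0, 1],
}


def identify_class(attribute_scores, class_attributes):
    """Return (class name, similarity) for the best-matching class.

    attribute_scores: one score in [0, 1] per attribute of the input.
    class_attributes: dict mapping class name -> list of 0/1 attribute flags.
    """
    best_name, best_similarity = None, float("-inf")
    for name, signature in class_attributes.items():
        diffs = [abs(s - a) for s, a in zip(attribute_scores, signature)]
        # Assumed "second similarity": negative mean absolute difference,
        # so a perfect attribute match scores 0 and worse matches score lower.
        similarity = -sum(diffs) / len(diffs)
        if similarity > best_similarity:
            best_name, best_similarity = name, similarity
    return best_name, best_similarity


name, _ = identify_class([0.9, 0.8, 0.1], CLASS_ATTRIBUTES)
print(name)
```

Because each class is described only by its attribute information, a class for which no training images exist could be supported in this sketch simply by adding an entry to the table.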

The classifier generation unit may divide the attribute classifier into a first part indicating that the attribute is present and a second part indicating that the attribute is absent. This makes it possible to identify binary attributes with a simple configuration.
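A minimal sketch of a binary attribute classifier with a "present" part and an "absent" part is given below. It assumes that the first similarity is derived from the Euclidean distance to the nearest node in each part, and that the attribute is judged present when the nearest "present" node is at least as close as the nearest "absent" node. These choices and all names are illustrative and are not taken from the specification.

```python
import math


def nearest_distance(pattern, nodes):
    """Distance from the input pattern to the closest node of one part."""
    return min(math.dist(pattern, node) for node in nodes)


def attribute_present(pattern, present_nodes, absent_nodes):
    """Assumed decision rule: compare the nearest node of each part."""
    return nearest_distance(pattern, present_nodes) <= nearest_distance(
        pattern, absent_nodes
    )


def identify_attributes(pattern, classifiers):
    """classifiers: dict attribute name -> {"present": [...], "absent": [...]}."""
    return {
        name: attribute_present(pattern, parts["present"], parts["absent"])
        for name, parts in classifiers.items()
    }


# Hypothetical two-dimensional feature vectors for one attribute.
classifiers = {
    "striped": {"present": [(1.0, 1.0), (0.9, 1.2)], "absent": [(0.0, 0.0)]},
}
print(identify_attributes((0.9, 0.8), classifiers))
```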

Furthermore, the input data and the training data may be image data, audio data, time-series data, or a combination of these, and the feature extraction unit may extract SIFT features, SURF features, rg-SIFT features, PHOG features, cq features, and Lss-histogram features from the image data.

The self-propagating neural network may be a Self-Organizing and Incremental Neural Network. This achieves fully online, incremental learning.

A recognizer generation device according to the present invention generates, by learning feature quantities of training data, a recognizer that recognizes a class to be identified through the attributes that characterize it. The device includes: a feature extraction unit that extracts a feature quantity as a weight vector from training data to which the class and the attributes are attached as label weights; a winner node extraction unit that treats the extracted weight vector as an input node, calculates the distance between the input node and each node, and extracts the node closest to the input node and the second closest node as a first winner node and a second winner node, respectively; a node insertion determination unit that determines, based on the distances between the input node and the first and second winner nodes, whether to insert the input node as a new node; an edge management unit that, when the input node is not inserted as a new node, generates an edge between the first winner node and the second winner node and sets its age to 0 if no such edge exists, sets the age of the edge to 0 if one exists, further increments the age of every edge of the first winner node, and deletes any edge that has reached a predetermined age; a node weight update unit that, when the input node is not inserted as a new node, updates the weight vector of the first winner node based on the weight vector of the input node; a label weight update unit that, when the input node is not inserted as a new node, diffuses at least part of the label weights of the input node to the first and second winner nodes; and a node deletion unit that deletes nodes at a predetermined timing according to their node density. When the node deletion unit deletes a node, the label weight update unit diffuses at least part of the label weights of the deleted node to the nodes surrounding the deleted node.

This provides a recognizer generation device capable of online, incremental learning.
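The per-sample learning flow of such a device can be sketched as follows. The fixed insertion threshold, the learning rate of 1/wins, and the equal split of label weights between the two winners are assumptions chosen for brevity; the text states only that the insertion decision depends on the distances to the winners and that at least part of the label weight is diffused. Edge handling and node deletion are omitted here.

```python
import math


class Node:
    def __init__(self, weight, labels):
        self.weight = list(weight)   # feature (weight) vector
        self.labels = dict(labels)   # label weights, e.g. {"striped": 1.0}
        self.wins = 1                # how often this node was first winner


def learn(nodes, x, labels, threshold=1.0, share=0.5):
    """Present one labeled sample; return "inserted" or "updated"."""
    if len(nodes) < 2:
        nodes.append(Node(x, labels))
        return "inserted"

    # Winner node extraction: closest and second closest nodes.
    ranked = sorted(nodes, key=lambda n: math.dist(n.weight, x))
    first, second = ranked[0], ranked[1]

    # Node insertion determination (assumed rule: one fixed threshold).
    if (math.dist(first.weight, x) > threshold
            or math.dist(second.weight, x) > threshold):
        nodes.append(Node(x, labels))
        return "inserted"

    # Node weight update: move the first winner toward the input.
    first.wins += 1
    rate = 1.0 / first.wins
    first.weight = [w + rate * (xi - w) for w, xi in zip(first.weight, x)]

    # Label weight update: diffuse the input's label weights to both winners.
    for name, value in labels.items():
        first.labels[name] = first.labels.get(name, 0.0) + share * value
        second.labels[name] = second.labels.get(name, 0.0) + (1 - share) * value
    return "updated"
```

A call such as `learn(nodes, (0.05, 0.0), {"striped": 1.0})` on an existing node list either grows the network or refines the two nearest nodes, so samples can be presented one at a time without retraining.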

A recognizer generation method according to the present invention generates, by learning feature quantities of training data, a recognizer that recognizes a class to be identified through the attributes that characterize it. The method includes: a feature extraction step of extracting a feature quantity as a weight vector from training data to which the class and the attributes are attached as label weights; a winner node extraction step of treating the extracted weight vector as an input node, calculating the distance between the input node and each node, and extracting the node closest to the input node and the second closest node as a first winner node and a second winner node, respectively; a node insertion determination step of determining, based on the distances between the input node and the first and second winner nodes, whether to insert the input node as a new node; an edge management step of, when the input node is not inserted as a new node, generating an edge between the first winner node and the second winner node and setting its age to 0 if no such edge exists, setting the age of the edge to 0 if one exists, further incrementing the age of every edge of the first winner node, and deleting any edge that has reached a predetermined age; a node weight update step of, when the input node is not inserted as a new node, updating the weight vector of the first winner node based on the weight vector of the input node; a first label weight update step of, when the input node is not inserted as a new node, diffusing at least part of the label weights of the input node to the first and second winner nodes; a node deletion step of deleting nodes at a predetermined timing according to their node density; and a first label weight update step of, when a node is deleted in the node deletion step, diffusing at least part of the label weights of the deleted node to the nodes surrounding the deleted node.

This provides a recognizer generation method capable of online, incremental learning.
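The node deletion step with label weight diffusion might look like the sketch below. It assumes that node density is approximated by a win count, that nodes with a low count are deleted, and that the deleted node's label weights are shared equally among its k nearest remaining nodes. None of these specifics appear in the text; they are placeholders for whatever density measure and neighborhood an implementation uses.

```python
import math


def delete_sparse_nodes(nodes, min_wins=2, k=2, fraction=1.0):
    """Delete low-density nodes and diffuse their label weights.

    nodes: list of dicts with keys "weight", "labels", and "wins".
    Returns the list of surviving nodes (their labels are updated in place).
    """
    keep = [n for n in nodes if n["wins"] >= min_wins]
    drop = [n for n in nodes if n["wins"] < min_wins]
    if not keep:
        return nodes  # nothing to diffuse onto, so delete nothing

    for dead in drop:
        # Assumed neighborhood: the k nearest surviving nodes.
        nearest = sorted(
            keep, key=lambda n: math.dist(n["weight"], dead["weight"])
        )[:k]
        for name, value in dead["labels"].items():
            portion = fraction * value / len(nearest)
            for n in nearest:
                n["labels"][name] = n["labels"].get(name, 0.0) + portion
    return keep
```

Diffusing the label weights before discarding a node keeps the class and attribute information it carried, even though the node itself is treated as noise.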

A program according to the present invention causes a computer to execute processing for generating, by learning feature quantities of training data, a recognizer that recognizes a class to be identified through the attributes that characterize it. The program causes the computer to execute: a feature extraction process of extracting a feature quantity as a weight vector from training data to which the class and the attributes are attached as label weights; a winner node extraction process of treating the extracted weight vector as an input node, calculating the distance between the input node and each node, and extracting the node closest to the input node and the second closest node as a first winner node and a second winner node, respectively; a node insertion determination process of determining, based on the distances between the input node and the first and second winner nodes, whether to insert the input node as a new node; an edge management process of, when the input node is not inserted as a new node, generating an edge between the first winner node and the second winner node and setting its age to 0 if no such edge exists, setting the age of the edge to 0 if one exists, further incrementing the age of every edge of the first winner node, and deleting any edge that has reached a predetermined age; a node weight update process of, when the input node is not inserted as a new node, updating the weight vector of the first winner node based on the weight vector of the input node; a first label weight update process of, when the input node is not inserted as a new node, diffusing at least part of the label weights of the input node to the first and second winner nodes; a node deletion process of deleting nodes at a predetermined timing according to their node density; and a first label weight update process of, when a node is deleted in the node deletion step, diffusing at least part of the label weights of the deleted node to the nodes surrounding the deleted node.

This provides a program capable of online, incremental learning.
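The edge management process can be illustrated with a small sketch that follows the order stated in the text: create or reset the edge between the two winners, age every edge of the first winner, then drop edges that have reached the age limit. The dictionary representation, the integer node identifiers, and the `max_age` value are assumptions.

```python
def manage_edges(edges, first, second, max_age=50):
    """Update edge ages after a sample is absorbed without node insertion.

    edges: dict mapping frozenset({i, j}) -> age, where i and j are node ids.
    first, second: ids of the first and second winner nodes.
    """
    # Create the winner-to-winner edge with age 0, or reset an existing one.
    edges[frozenset((first, second))] = 0

    # Increment the age of every edge attached to the first winner.
    # Following the stated order, the winner-to-winner edge ends at age 1.
    for pair in list(edges):
        if first in pair:
            edges[pair] += 1
            # Delete edges that have reached the predetermined age.
            if edges[pair] >= max_age:
                del edges[pair]
    return edges


edges = {frozenset((0, 1)): 3, frozenset((0, 2)): 49, frozenset((1, 2)): 7}
manage_edges(edges, first=0, second=1)
print(edges)
```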

A recognition device according to the present invention is capable of transfer learning by treating a recognition target to be recognized from input data as a class and recognizing the class through the attributes that characterize it. The recognition device includes: a feature extraction unit that extracts a feature quantity from the input data as a weight vector; a recognizer generated by inputting, to a recognizer generation device, training data to which the class and the attributes are attached as label weights and learning its feature quantities; and a result output unit that outputs a recognition result according to the distances between a plurality of learned nodes, each consisting of a weight vector held by the recognizer, and the weight vector extracted from the input data. The recognizer generation device includes: a feature extraction unit that extracts a feature quantity from the training data as a weight vector; a winner node extraction unit that treats the extracted weight vector as an input node, calculates the distance between the input node and each node, and extracts the node closest to the input node and the second closest node as a first winner node and a second winner node, respectively; a node insertion determination unit that determines, based on the distances between the input node and the first and second winner nodes, whether to insert the input node as a new node; an edge management unit that, when the input node is not inserted as a new node, generates an edge between the first winner node and the second winner node and sets its age to 0 if no such edge exists, sets the age of the edge to 0 if one exists, further increments the age of every edge of the first winner node, and deletes any edge that has reached a predetermined age; a node weight update unit that, when the input node is not inserted as a new node, updates the weight vector of the first winner node based on the weight vector of the input node; a label weight update unit that, when the input node is not inserted as a new node, diffuses at least part of the label weights of the input node to the first and second winner nodes; and a node deletion unit that deletes nodes at a predetermined timing according to their node density. When the node deletion unit deletes a node, the label weight update unit diffuses at least part of the label weights of the deleted node to the nodes surrounding the deleted node. The nodes that exist after a predetermined number of items of the training data have been input are treated as the learned nodes, and the recognizer is constituted by the learned nodes.

This provides a recognition device that uses a recognizer capable of online, incremental learning.
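One possible form of the result output unit is sketched below: the k learned nodes nearest to the input vote with their label weights, scaled by closeness. The voting rule, the value of k, and the closeness formula 1 / (1 + distance) are assumptions; the text states only that the result depends on the distances between the learned nodes and the weight vector extracted from the input data.

```python
import math


def recognize(x, learned_nodes, k=3):
    """Return (best label, vote table) for the input weight vector x.

    learned_nodes: non-empty list of (weight_vector, label_weights) pairs.
    """
    ranked = sorted(learned_nodes, key=lambda n: math.dist(n[0], x))[:k]
    votes = {}
    for weight, labels in ranked:
        # Assumed closeness measure: nearer nodes count more.
        closeness = 1.0 / (1.0 + math.dist(weight, x))
        for name, value in labels.items():
            votes[name] = votes.get(name, 0.0) + closeness * value
    return max(votes, key=votes.get), votes


# Hypothetical learned nodes carrying class and attribute label weights.
learned = [
    ((0.0, 0.0), {"dog": 1.0}),
    ((0.1, 0.0), {"dog": 0.5, "striped": 0.2}),
    ((5.0, 5.0), {"bird": 1.0}),
]
label, _ = recognize((0.05, 0.0), learned, k=2)
print(label)
```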

A recognition method according to the present invention generates, by learning feature quantities of training data, a recognizer that recognizes a class to be identified through the attributes that characterize it, and recognizes input data with the recognizer. The method includes: a first feature extraction step of extracting a feature quantity as a weight vector from training data to which the class and the attributes are attached as label weights; a winner node extraction step of treating the extracted weight vector as an input node, calculating the distance between the input node and each node, and extracting the node closest to the input node and the second closest node as a first winner node and a second winner node, respectively; a node insertion determination step of determining, based on the distances between the input node and the first and second winner nodes, whether to insert the input node as a new node; an edge management step of, when the input node is not inserted as a new node, generating an edge between the first winner node and the second winner node and setting its age to 0 if no such edge exists, setting the age of the edge to 0 if one exists, further incrementing the age of every edge of the first winner node, and deleting any edge that has reached a predetermined age; a node weight update step of, when the input node is not inserted as a new node, updating the weight vector of the first winner node based on the weight vector of the input node; a first label weight update step of, when the input node is not inserted as a new node, diffusing at least part of the label weights of the input node to the first and second winner nodes; a node deletion step of deleting nodes at a predetermined timing according to their node density; a first label weight update step of, when a node is deleted in the node deletion step, diffusing at least part of the label weights of the deleted node to the nodes surrounding the deleted node; a recognizer generation step of treating the nodes that exist after a predetermined number of items of the training data have been input as learned nodes and constituting the recognizer from the learned nodes; a second feature extraction step of extracting a feature quantity from the input data as a weight vector; and a result output step of outputting a recognition result according to the distances between the plurality of learned nodes of the recognizer generated in the recognizer generation step and the weight vector extracted from the input data.

This provides a recognition method that uses a recognizer capable of online, incremental learning.

A program according to the present invention causes a computer to execute processing for generating, by learning feature quantities of training data, a recognizer that recognizes a class to be identified through the attributes that characterize it, and for recognizing input data with the recognizer. The program causes the computer to execute: a first feature extraction process of extracting a feature quantity as a weight vector from training data to which the class and the attributes are attached as label weights; a winner node extraction process of treating the extracted weight vector as an input node, calculating the distance between the input node and each node, and extracting the node closest to the input node and the second closest node as a first winner node and a second winner node, respectively; a node insertion determination process of determining, based on the distances between the input node and the first and second winner nodes, whether to insert the input node as a new node; an edge management process of, when the input node is not inserted as a new node, generating an edge between the first winner node and the second winner node and setting its age to 0 if no such edge exists, setting the age of the edge to 0 if one exists, further incrementing the age of every edge of the first winner node, and deleting any edge that has reached a predetermined age; a node weight update process of, when the input node is not inserted as a new node, updating the weight vector of the first winner node based on the weight vector of the input node; a first label weight update process of, when the input node is not inserted as a new node, diffusing at least part of the label weights of the input node to the first and second winner nodes; a node deletion process of deleting nodes at a predetermined timing according to their node density; a first label weight update process of, when a node is deleted in the node deletion process, diffusing at least part of the label weights of the deleted node to the nodes surrounding the deleted node; a recognizer generation process of treating the nodes that exist after a predetermined number of items of the training data have been input as learned nodes and constituting the recognizer from the learned nodes; a second feature extraction process of extracting a feature quantity from the input data as a weight vector; and a result output process of outputting a recognition result according to the distances between the plurality of learned nodes of the recognizer generated in the recognizer generation process and the weight vector extracted from the input data.

This makes it possible to provide a program that uses a recognizer capable of online and incremental learning.

A robot apparatus according to the present invention includes an input data acquisition unit and a recognition device capable of transfer learning, which treats the object to be recognized from the input data as a class and recognizes the class by the attributes that characterize it. The recognition device includes: a recognizer generation device; a feature extraction unit that extracts a feature quantity as a weight vector from the input data; a recognizer generated by inputting, to the recognizer generation device, teacher data to which the class and the attributes are attached as label weights and learning its feature quantities; and a result output unit that outputs a recognition result according to the distances between the plurality of learned nodes, each consisting of a weight vector, held by the recognizer and the weight vector extracted from the input data. The recognizer generation device includes: the feature extraction unit, which extracts a feature quantity as a weight vector from the teacher data; a winner node extraction unit that treats the extracted weight vector as an input node, calculates the distance between the input node and each node, and extracts the node closest to the input node and the second-closest node as a first winner node and a second winner node, respectively; a node insertion determination unit that determines, based on the distances between the input node and the first and second winner nodes, whether to insert the input node as a new node; an edge management unit that, when the input node is not inserted as a new node, generates an edge between the first winner node and the second winner node with its age set to 0 if no such edge exists, resets its age to 0 if the edge exists, increments the ages of all edges of the first winner node, and deletes any edge that has reached a predetermined age; a node weight update unit that, when the input node is not inserted as a new node, updates the weight vector of the first winner node based on the weight vector of the input node; a label weight update unit that, when the input node is not inserted as a new node, diffuses at least part of the label weights of the input node to the first and second winner nodes; and a node deletion unit that deletes nodes according to their node density at a predetermined timing. When the node deletion unit deletes a node, the label weight update unit diffuses at least part of the label weights of the deleted node to the nodes around it. Each node remaining after a predetermined number of teacher data items have been input is treated as a learned node, and the recognizer is constructed from the learned nodes.
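The behavior of the edge management unit described above can be sketched as follows. The sketch assumes that edges are stored in a dictionary keyed by unordered node-index pairs and that `max_age` plays the role of the predetermined age. The data layout and the function name are illustrative assumptions, and the steps are applied in the order in which the text lists them.

```python
def manage_edges(edge_ages, s1, s2, max_age):
    """Update edge ages after an input was assigned to first winner `s1`.

    `edge_ages` maps frozenset({i, j}) -> age (an assumed representation).
    """
    # Create the edge between the winners, or reset its age if it exists.
    edge_ages[frozenset((s1, s2))] = 0
    for edge in list(edge_ages):
        if s1 in edge:
            # Age every edge that touches the first winner.
            edge_ages[edge] += 1
            # Delete edges that have reached the predetermined age.
            if edge_ages[edge] >= max_age:
                del edge_ages[edge]


edge_ages = {frozenset((0, 2)): 2, frozenset((1, 2)): 1}
manage_edges(edge_ages, s1=0, s2=1, max_age=3)
```

In this run the edge between nodes 0 and 2 reaches the age limit and is removed, the new edge between nodes 0 and 1 ends with age 1, and the edge between nodes 1 and 2 is untouched because it does not involve the first winner.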

This makes it possible to provide a robot apparatus capable of online and incremental learning.

According to the present invention, it is possible to provide an attribute learning and transfer system, a recognizer generation device, a recognizer generation method, and a recognition device that are capable of online and incremental learning.

A configuration diagram of the attribute learning and transfer system according to Embodiment 1.
A flowchart for explaining attribute learning and transfer according to Embodiment 1.
A flowchart for explaining unknown object detection according to Embodiment 1.
A conceptual diagram for explaining the overall configuration and processing of the attribute learning and transfer system according to Embodiment 1.
A diagram for explaining an effect of Embodiment 1.
A diagram for explaining an effect of Embodiment 1.
A diagram for explaining an effect of Embodiment 1.
A diagram for explaining an effect of Embodiment 1.
A diagram for explaining an effect of Embodiment 1.
A diagram for explaining an effect of Embodiment 1.
A diagram for explaining an effect of Embodiment 1.
A diagram showing the processing of Adjusted-SOINN, on which Embodiment 2 is premised.
A diagram showing the processing of STAR-SOINN, on which Embodiment 2 is premised.
A diagram showing the configuration of the classifier generation device of Embodiment 2.
A diagram showing the processing of the classifier generation device of Embodiment 2.
A diagram showing the configuration of the identification device of Embodiment 2.
A diagram showing the processing of the identification device of Embodiment 2.
A diagram showing the processing of the classifier generation device and the identification device of Embodiment 2.
A diagram for explaining an effect of Embodiment 2.
A diagram for explaining an effect of Embodiment 2.
A diagram for explaining an effect of Embodiment 2.
A diagram for explaining an effect of Embodiment 2.
A diagram showing the configuration of the robot apparatus of Embodiment 3.

<Embodiment 1>
<Configuration of the learning and transfer system>
Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a configuration diagram of the attribute learning and transfer system according to the present embodiment. The attribute learning and transfer system 1 includes a feature extraction unit 2, a labeling unit 3, a classifier holding unit 4, a classifier generation unit 5, an attribute identification unit 6, and a class identification unit 7.

The feature extraction unit 2 extracts features from teacher data during learning and from input data during recognition. For example, when the teacher data and the input data are image data, it extracts feature quantities such as SIFT and SURF. The teacher data and the input data are not limited to image data; audio data, time-series data such as motor values, or combinations of these may be input. Here, the motors are mounted on a robot and drive its joints by rotating, and the values of these motors serve as teacher data and input data. When the teacher data and the input data are image data, the feature extraction unit 2 may extract any one of the known feature quantities, such as SIFT, SURF, rg-SIFT, PHOG, cq, and Lss-histogram features, or may extract several types of feature quantities.

The labeling unit 3 labels input data with the given attribute information, which yields teacher data. The attribute information expresses the attributes possessed by the class of the input data and is given as a multi-valued vector. In the present embodiment, the teacher data and the input data are images, and the attributes of the image's class are attached to the input image as a binary vector (the attribute information). In the present embodiment, the labeling unit 3 corresponds to the attribute labeling module described later.

The classifier holding unit 4 holds the attribute classifier generated by the classifier generation unit 5 in a storage means (not shown) such as a memory. The classifier generation unit 5 generates the attribute classifier used to identify an attribute contained in input data. The classifier generation unit 5 is built on a self-propagating neural network that consists of nodes and edges connecting the nodes, and the self-propagating neural network is divided into a plurality of parts according to the contents to be identified for the attribute. In the present embodiment, as described later, the self-propagating neural network is divided into a first part (positive part) indicating that the attribute is present and a second part (negative part) indicating that the attribute is absent. When three or more contents are to be identified, the self-propagating neural network may be divided into three or more parts. Details of the self-propagating neural network are given later.

The classifier generation unit 5 inputs the features of the teacher data extracted by the feature extraction unit 2, as a teacher pattern, to the part of the self-propagating neural network specified by the attribute information attached by the labeling unit 3. The classifier generation unit 5 then generates nodes and edges in the self-propagating neural network based on the teacher pattern. The attribute information indicates which part of the self-propagating neural network the teacher data and the input data should be input to, so the classifier generation unit 5 can use the attribute information attached by the labeling unit 3 to determine which part should receive the teacher pattern.
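How the attribute information selects the part that receives a teacher pattern can be sketched as below. The class `AttributeClassifier` is a hypothetical stand-in: each part is a plain list of stored patterns rather than a self-propagating neural network, so only the routing is shown and no clustering takes place.

```python
class AttributeClassifier:
    """Stand-in for one attribute classifier split into two parts.

    Each part would be a self-propagating network; here a part is only a
    list of stored patterns so that the routing logic stays visible.
    """

    def __init__(self):
        self.positive = []  # patterns from images that have the attribute
        self.negative = []  # patterns from images that lack the attribute

    def learn(self, pattern, has_attribute):
        part = self.positive if has_attribute else self.negative
        part.append(list(pattern))


def train_on_sample(classifiers, pattern, attribute_vector):
    """Feed one teacher pattern to every attribute classifier, using the
    binary attribute vector to select the part that receives it."""
    for classifier, bit in zip(classifiers, attribute_vector):
        classifier.learn(pattern, has_attribute=bool(bit))


classifiers = [AttributeClassifier() for _ in range(3)]
train_on_sample(classifiers, [0.1, 0.9], attribute_vector=[1, 0, 1])
train_on_sample(classifiers, [0.8, 0.2], attribute_vector=[0, 0, 1])
```

Every teacher pattern reaches every attribute classifier; the attribute vector only decides whether it lands in the positive or the negative part.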

When input data is given, the attribute identification unit 6 uses the attribute classifier to identify the attributes contained in the input data. The attribute identification unit 6 inputs the features extracted from the input data, as an input pattern, to each part of the self-propagating neural network of the attribute classifier held by the classifier holding unit 4, calculates in each part a first similarity between the input pattern and the nodes of that part, and identifies, according to the calculated first similarities, which of the identification contents applies to the input data.

In the present embodiment, as described later, one attribute classifier representing one attribute is used to distinguish two outcomes: whether or not the input data contains that attribute. When a plurality of attribute classifiers is used, each attribute classifier outputs an identification result indicating the presence or absence of its attribute, so that it is determined, for each of the plurality of attributes, whether the input data contains it.

In the present embodiment, the distance between the input pattern and a plurality of nearest-neighbor nodes is calculated as the similarity between the input pattern and the nodes of the self-propagating neural network. Furthermore, as shown in Equation (4) described later, when the first similarity calculated for the first part (positive part) is compared with the first similarity calculated for the second part (negative part), the attribute contained in the input data is identified by taking into account the relative magnitudes of these first similarities.
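A rough sketch of this comparison is given below. Equation (4) is not available in this passage, so the sketch does not reproduce it. As stated assumptions, it uses the mean Euclidean distance to the k nearest nodes of each part as the first similarity and a normalized ratio of the two distances as the relative comparison. Both parts are assumed to contain at least one node.

```python
import math


def mean_knn_distance(pattern, nodes, k=3):
    """Mean Euclidean distance from `pattern` to its k nearest nodes."""
    distances = sorted(math.dist(pattern, node) for node in nodes)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def has_attribute(pattern, positive_nodes, negative_nodes, k=3):
    """Return (decision, score). The score lies in [0, 1] and grows as the
    pattern lies relatively closer to the positive part than to the
    negative part (an assumed scoring rule, not Equation (4))."""
    d_pos = mean_knn_distance(pattern, positive_nodes, k)
    d_neg = mean_knn_distance(pattern, negative_nodes, k)
    total = d_pos + d_neg
    score = 0.5 if total == 0 else d_neg / total
    return score > 0.5, score


positive = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
negative = [[-1.0, -1.0], [-0.8, -1.2]]
decision, score = has_attribute([0.8, 1.0], positive, negative, k=2)
other_decision, other_score = has_attribute([-1.0, -1.0], positive, negative, k=2)
```

Using a ratio rather than the raw distances means the decision depends on how the two parts compare with each other, which is the point the paragraph above makes about relative magnitudes.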

The class identification unit 7 identifies the class to which the input data belongs, based on the outputs of the attribute classifiers for the input data. More specifically, the class identification unit 7 is given, for each of a plurality of classes, the attribute information of that class. It compares the attributes of the input data identified by the attribute identification unit 6 with the attribute information of each class to obtain a second similarity, and identifies, according to the calculated second similarity, which of the plurality of classes the input data belongs to.
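The comparison performed by the class identification unit can be illustrated with a minimal sketch. The text does not define the second similarity at this point, so the fraction of matching attribute bits is used here as an assumption. The class names, the attribute order, and the function names are hypothetical.

```python
def attribute_similarity(predicted, class_attributes):
    """Fraction of attribute positions on which two binary vectors agree."""
    matches = sum(p == c for p, c in zip(predicted, class_attributes))
    return matches / len(class_attributes)


def identify_class(predicted, class_table):
    """Return the class whose attribute vector is most similar to `predicted`."""
    return max(
        class_table,
        key=lambda name: attribute_similarity(predicted, class_table[name]),
    )


# Hypothetical attribute order: (striped, four-legged, lives in water).
class_table = {
    "zebra": [1, 1, 0],
    "dolphin": [0, 0, 1],
    "horse": [0, 1, 0],
}
predicted = [1, 1, 0]  # outputs of three attribute classifiers for one image
best = identify_class(predicted, class_table)
```

Because only the attribute vectors of the classes are needed, a class that never appeared among the teacher images can still be chosen, which is what allows transfer to unknown classes.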

The hardware of the attribute learning and transfer system 1 is built around a microcomputer that includes, for example, a CPU (Central Processing Unit) that performs arithmetic processing, a ROM (Read Only Memory) that stores the programs executed by the CPU, and a RAM (Random Access Memory) that temporarily stores processing data.

<Learning method of the attribute classifier>
FIG. 2 is a flowchart for explaining attribute learning and transfer in the learning stage.
S101: The attribute learning and transfer system 1 initializes the attribute classifier.
S102: The feature extraction unit 2 extracts features from an input image that serves as teacher data.
S103: The labeling unit 3 labels the teacher image with attribute information.
S104: The classifier generation unit 5 inputs the input pattern of the teacher image to the corresponding part of the self-propagating neural network constituting the attribute classifier, and performs clustering.
S105: The attribute learning and transfer system 1 determines whether to continue learning. If learning is to continue, a new teacher image is input and the processing from S102 onward is repeated. If not, the learning process (training of the attribute classifier using teacher images) ends.

FIG. 3 is a flowchart for explaining unknown object detection in the recognition stage.
S201: The feature extraction unit 2 extracts features from an input image that serves as input data.
S202: The attribute identification unit 6 uses the attribute classifier to identify whether the input data contains the attribute represented by that attribute classifier.
S203: The class identification unit 7 compares the output of the attribute classifier for the input data with the attribute information of the classes to which the input data may belong.
S204: The class identification unit 7 outputs, based on the comparison result, the identification result for the class to which the input data belongs.

<Adjusted-SOINN>
Next, the self-propagating neural network used in the present embodiment is briefly described.
Adjusted-SOINN (Self-Organizing Incremental Neural Network), for example, has been proposed as a self-propagating neural network. Adjusted-SOINN is a self-organizing neural network capable of incremental learning and is a mechanism for online unsupervised classification learning (see Japanese Unexamined Patent Application Publication No. 2008-217242; F. Shen and O. Hasegawa, "An incremental network for on-line unsupervised classification and topology learning," Neural Networks, 19(1):90-106, 2006; and F. Shen and O. Hasegawa, "An on-line learning mechanism for unsupervised classification and topology representation," in CVPR, 2005).

Adjusted-SOINN consists of nodes corresponding to input patterns and edges connecting the nodes. Adjusted-SOINN starts from an empty set of nodes and first takes two input data items as its two initial nodes. Then, for each input pattern ξ ∈ R^n (n-dimensional vector space), it finds the nearest node s_1 and the second-nearest node s_2 by Equations (1) and (2). In these equations, A' is the set of all nodes in Adjusted-SOINN and W_c is the n-dimensional weight vector of node c.
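Equations (1) and (2) are not reproduced in this text. From the definitions just given (the nearest and second-nearest nodes to ξ among the weight vectors W_c of the node set A'), they presumably take the usual winner-selection form shown below. The use of the Euclidean norm is an assumption.

```latex
\begin{align}
s_1 &= \arg\min_{c \in A'} \lVert \xi - W_c \rVert \tag{1} \\
s_2 &= \arg\min_{c \in A' \setminus \{s_1\}} \lVert \xi - W_c \rVert \tag{2}
\end{align}
```

The second equation excludes s_1 from the search so that the two winners are distinct nodes.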

When the distances between a new input pattern and the first and second winner nodes (nodes s_1 and s_2) are smaller than a predetermined threshold, Adjusted-SOINN assigns the input pattern to the first winner node. Otherwise, Adjusted-SOINN judges that the input pattern differs too much from the existing nodes and creates a new node from the input pattern.
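The winner search and the insertion decision can be sketched as follows. As assumptions, nodes are stored as a list of weight vectors, the distance is Euclidean, and a single fixed `threshold` is shared by all nodes. The text only speaks of a predetermined threshold, and the function names are hypothetical. At least two nodes must exist, which matches the two initial nodes mentioned earlier.

```python
import math


def find_winners(pattern, nodes):
    """Return the indices of the nearest and second-nearest nodes."""
    order = sorted(range(len(nodes)), key=lambda i: math.dist(pattern, nodes[i]))
    return order[0], order[1]


def should_insert(pattern, nodes, threshold):
    """True when the pattern is not within `threshold` of both winners, in
    which case it becomes a new node; otherwise it is assigned to the
    first winner."""
    s1, s2 = find_winners(pattern, nodes)
    return (math.dist(pattern, nodes[s1]) >= threshold
            or math.dist(pattern, nodes[s2]) >= threshold)


nodes = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
near = should_insert([0.4, 0.0], nodes, threshold=1.0)  # close to two nodes
far = should_insert([3.0, 3.0], nodes, threshold=1.0)   # far from every node
```

With one shared threshold the test on the second winner alone would suffice; both tests are kept so the sketch still reads correctly if per-node thresholds are substituted.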

When Adjusted-SOINN assigns a new pattern to the nearest node s_1, it updates the weight vector W_s1 of that node using the value of the new input pattern. It also creates an edge between the first and second winner nodes if none exists. This is where Adjusted-SOINN differs markedly from other clustering methods such as k-means. In Adjusted-SOINN, a newly input pattern or data item is not added directly to the network to form clusters. Instead, clusters are formed by connecting the nodes that already exist in Adjusted-SOINN with edges. Because a new node is created only when the input pattern differs markedly from the current nodes, this cluster formation process greatly reduces memory use over long-running operation.
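The assignment step can be sketched as below. The text does not give the update rule, so the step size 1 / (number of wins) is an assumption; it makes the weight vector the running mean of the patterns assigned to the node. The function name and the set-based edge store are also illustrative.

```python
def assign_to_winner(pattern, nodes, win_counts, edges, s1, s2):
    """Move the first winner towards `pattern` and link it to the second
    winner. The step size 1 / win_count is an assumed schedule."""
    win_counts[s1] += 1
    rate = 1.0 / win_counts[s1]
    nodes[s1] = [w + rate * (x - w) for w, x in zip(nodes[s1], pattern)]
    edges.add(frozenset((s1, s2)))  # no effect if the edge already exists


nodes = [[0.0, 0.0], [1.0, 0.0]]
win_counts = [1, 1]  # each node was created from one pattern
edges = set()
assign_to_winner([0.4, 0.2], nodes, win_counts, edges, s1=0, s2=1)
```

Note that the pattern itself is never stored: only the winner's weight vector and the edge set change, which is the memory-saving property the paragraph above describes.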

Another key feature of Adjusted-SOINN is that it gives nodes several important properties, which lets each node behave like an autonomous agent. Nodes have properties such as an age and an accumulated error. At any point in time, each node therefore has its own age and its own accumulated error, obtained by accumulating the distance to the input pattern each time the node is selected as the first winner. Based on these properties, each node exhibits two behaviors: dying and splitting itself. Regarding age-based death, when a node has existed for a long time without winning for any new input pattern (that is, when it is noise or an unnecessary node), all edges connected to that node gradually die out. Regarding splitting based on accumulated error, when the accumulated noise (error) becomes too large, the node splits itself in two. This resembles the behavior of applying k-means recursively to a large cluster. Adjusted-SOINN is thus a powerful learning tool for classification problems that require online and incremental learning. The present embodiment adopts Adjusted-SOINN to inherit its main advantages and modifies it so that it can be applied to the attribute classification problem in computer vision.

The self-propagating neural network is not limited to Adjusted-SOINN; Enhanced-SOINN (Japanese Unexamined Patent Application Publication No. 2008-217246), for example, may be used. From the viewpoint of online and incremental learning, neural gas (NG) (T. M. Martinetz, S. G. Berkovich, and K. J. Schulten, "'Neural-gas' network for vector quantization and its application to time-series prediction," IEEE Trans. on Neural Networks, vol. 4, no. 4, pp. 558-569, 1996) and growing neural gas (GNG) (B. Fritzke, "A Growing Neural Gas Network Learns Topologies," in Advances in Neural Information Processing Systems, vol. 7, pp. 625-632, 1995) can also be used as the self-propagating neural network, although their performance is limited because the network structure or size must be determined in advance.

<AT-SOINN>
In the following, the attribute learning and transfer system 1 based on Adjusted-SOINN, a self-organizing neural network capable of incremental learning, is referred to as AT-SOINN, and the details of AT-SOINN for classifying unknown object classes are described. To make the explanation easier to follow, the problem that AT-SOINN solves is first briefly restated, then the method of generating the attribute classifier is described, and finally how the attribute classifier is used to recognize unknown objects is described.

First, the problem to be solved by AT-SOINN is described.
The problem addressed by AT-SOINN is almost the same as the one described in Non-Patent Document 1. The one important difference is how the system is trained. In Non-Patent Document 1, the set of teacher images must be prepared in advance and input to the system all at once. Each image of each class is labeled with its attributes as a binary feature vector. The system first trains the attribute classifiers with SVMs. Teacher samples (x_1, l_1), ..., (x_n, l_n) ⊂ X × Y are given. Here, X is an arbitrary feature space, Y = {y_1, ..., y_K} consists of K discrete classes, and Z = {z_1, ..., z_L} is a set of test classes disjoint from Y (the intersection of Z and Y is the empty set). The main task is to learn a classifier X → Z that assigns an input image to a label in the set Z = {z_1, ..., z_L}, which is completely disjoint from Y.

Even though the classes z_1, ..., z_L are not given in the learning stage, classification is still possible by transferring the attributes a ∈ A shared between Y and Z, provided that an attribute representation a is available for every class z ∈ Z and y ∈ Y. Specifically, all attributes are expressed as binary values, so that the attribute representation a^y = (a^y_1, ..., a^y_M) of any teacher class y is a fixed-length binary vector. Learning begins by training a probabilistic classifier for each attribute a_m. All images from the teacher class set Y are used as labeled teacher samples, and the label of each sample is determined by the entry of the attribute vector that corresponds to the sample's class (that is, a sample of class y is assigned the binary label a^y_m). The trained attribute classifiers give estimates of p(a_m | x), from which the model of the complete image-attribute layer is formed as p(a | x) = Π_{m=1}^{M} p(a_m | x), where M is the total number of given attributes. These estimates are used to compute the posterior distribution over classes given an image, as shown in Equation (3).
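Equation (3) is not reproduced in this text. Under a direct attribute prediction formulation consistent with the description above, where each test class z determines its attribute vector a^z, the class posterior would take the form below. This is offered as a plausible reconstruction under that assumption, not as the exact equation of the source; p(z) denotes a class prior and p(a^z) the prior of the attribute vector of class z.

```latex
\begin{equation}
p(z \mid x) \;=\; \sum_{a \in \{0,1\}^{M}} p(z \mid a)\, p(a \mid x)
\;=\; \frac{p(z)}{p(a^{z})} \prod_{m=1}^{M} p\!\left(a^{z}_{m} \mid x\right) \tag{3}
\end{equation}
```

The product on the right reuses the per-attribute estimates p(a_m | x), so no image of class z is needed to evaluate it.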

The method proposed in the present embodiment (AT-SOINN) solves the problem described in Non-Patent Document 1 as explained above, and does so while also satisfying the following two conditions.
(Condition 1: online attribute learning and transfer) The set of teacher images is not prepared in advance, and the system does not know its size. From the viewpoint of use in robots, the system acquires teacher images gradually, one at a time, each labeled with a class index and the attribute representation of that class.
(Condition 2: incremental attribute learning and transfer) The system can stop learning and perform classification whenever an input image arrives, and can likewise resume learning whenever a new teacher image becomes available.
Under these conditions, training an SVM to identify each attribute becomes impractical. Solving the problem while satisfying both conditions is therefore the main contribution of the present embodiment.

<Method of generating the attribute classifier>
Next, the method of generating the attribute classifier is described.
Clearly, an effective classifier for each attribute is still needed to achieve high performance in detecting unknown object classes. SVMs cannot meet the requirements of an online, incremental system. In the present embodiment, therefore, the attribute classifier is built with Adjusted-SOINN, a self-propagating neural network, instead of an SVM. Applying Adjusted-SOINN, however, requires several important modifications.

Adjusted-SOINN itself basically works as a clustering tool capable of online, incremental learning. Adjusted-SOINN can be used for multi-class classification, but the main task in the present embodiment is to answer whether or not an image contains an individual attribute a. This is therefore a binary classification problem (that is, deciding between an image that contains attribute a and one that does not). Furthermore, in the present embodiment, the number of classes distinguished with Adjusted-SOINN is fixed: only two classes are considered, a positive (+) class and a negative (-) class. The positive class indicates that the image contains the attribute, and the negative class indicates that it does not.

To realize this attribute classification, the Adjusted-SOINN that serves as an attribute classifier is divided into two parts, a positive part and a negative part. In each part, nodes and edges are generated from the input patterns by the same process as in the original Adjusted-SOINN, and clusters grow incrementally.

Since one Adjusted-SOINN is required per attribute classifier, M Adjusted-SOINNs are required in total to classify the M attributes of an image. As in Non-Patent Document 1, this number M covers exactly one feature space. If Q feature types are used, M × Q Adjusted-SOINNs are therefore required in total. For example, an experiment using the dataset described in Non-Patent Document 1 requires 85 × 6 = 510 Adjusted-SOINNs to classify 85 attributes in six different feature spaces. The number of Adjusted-SOINNs in this embodiment is exactly the same as the number of SVMs in Non-Patent Document 1, but Adjusted-SOINN runs faster with less memory and, most importantly, learns online and incrementally.

FIG. 4 is a conceptual diagram showing the overall configuration and processing of AT-SOINN.
First, in the learning stage, teacher images from the teacher classes (Training Classes Y) are input incrementally to the M SOINNs (SOINN-based Individual Attribute Classifiers). Each teacher image reaches each Adjusted-SOINN through the labeling process of the attribute labeling module (Labeling Attributes).

The attribute labeling module of this embodiment is intended for a system used by a robot. An administrator (here, a human) labels the attributes of each image class through this module. In this embodiment, as in the method described in Non-Patent Document 1, the attribute labels are assumed to be directly available.

According to the label (attribute information) assigned through the attribute labeling module, each teacher image is treated as a positive sample (Positive Sample) or a negative sample (Negative Sample) and is input to the corresponding part of each Adjusted-SOINN. The attribute classifiers can be trained incrementally as teacher images arrive. A new attribute classifier can also be added simply by creating a new Adjusted-SOINN.
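The routing of teacher images into positive and negative parts can be sketched as follows. This is a minimal illustration, not the patent's implementation: plain Python lists stand in for the two parts of each Adjusted-SOINN, and `route_sample`, `class_attributes`, and the toy labels are hypothetical names and data.

```python
from collections import defaultdict

def route_sample(feature_vec, class_name, class_attributes, parts):
    """Send one teacher sample to the positive or negative part of every attribute."""
    for m, has_attr in enumerate(class_attributes[class_name]):
        side = "pos" if has_attr == 1 else "neg"
        parts[m][side].append(feature_vec)

# Hypothetical toy labels: 2 classes, 3 binary attributes per class.
class_attributes = {"zebra": [1, 0, 1], "whale": [0, 1, 0]}

# One (positive, negative) pair per attribute; lists stand in for SOINN parts.
parts = defaultdict(lambda: {"pos": [], "neg": []})
route_sample([0.1, 0.9], "zebra", class_attributes, parts)
route_sample([0.8, 0.2], "whale", class_attributes, parts)
```

Because each sample is routed independently, samples can arrive one at a time, and adding an attribute only adds one more entry to `parts`.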

Whenever recognition by AT-SOINN is desired, an input image of an unknown image class is input to AT-SOINN. The input image x is represented by a feature vector and is input to every individual Adjusted-SOINN. For the Adjusted-SOINN (attribute classifier) of attribute m, a set of k nearest-neighbor nodes is obtained from each of the positive and negative parts: $S^+_m = \{s^+_{m,1}, \ldots, s^+_{m,k}\}$ from the positive part and $S^-_m = \{s^-_{m,1}, \ldots, s^-_{m,k}\}$ from the negative part. The input image x is regarded as containing attribute m only when Equation (4) holds. Here, $\xi$ is the input pattern (the feature vector of the input image x), and $W^+_{m,j}$ is the weight vector of node $s^+_{m,j}$. A suitable value for the predetermined threshold T is set by the user in advance. When Equation (4) does not hold, the input image x is treated as not containing attribute m.
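The nearest-neighbor test for a single attribute can be sketched as below. Equation (4) itself is not reproduced in this text, so the decision rule used here (mean positive distance smaller than T times mean negative distance) is only an assumed stand-in; the function names and toy node positions are likewise hypothetical.

```python
import numpy as np

def mean_knn_distance(xi, nodes, k):
    """Mean Euclidean distance from xi to its k nearest rows of `nodes`."""
    d = np.linalg.norm(nodes - xi, axis=1)
    k = min(k, len(d))
    return float(np.sort(d)[:k].mean())

def has_attribute(xi, pos_nodes, neg_nodes, k, T):
    """ASSUMED rule (not the patent's Equation (4)): d_pos < T * d_neg."""
    d_pos = mean_knn_distance(xi, pos_nodes, k)
    d_neg = mean_knn_distance(xi, neg_nodes, k)
    return d_pos < T * d_neg

# Toy node weights for the positive and negative parts of one attribute.
pos = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
neg = np.array([[1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])
print(has_attribute(np.array([0.05, 0.05]), pos, neg, k=2, T=1.0))
```

Under this assumed rule, T below 1 makes the classifier stricter about declaring an attribute present, and T above 1 makes it more permissive.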

<Recognition method>
Next, how the attribute classifiers are used for unknown object recognition is described.
Once a classifier is available for each attribute, AT-SOINN can classify unknown objects of unknown classes whenever necessary, using a very simple procedure. Basically, for both Adjusted-SOINN and SVM, the output of an attribute classifier lies in a distance space between the input image vector and the representative vectors. For an SVM, an extra set of teacher images must be prepared in order to perform Platt scaling (J. C. Platt, "Probabilities for SV machines," in Advances in Large Margin Classifiers. MIT Press, 2000.), which converts that space into a probability space. Unfortunately, under the condition assumed by AT-SOINN that a set of teacher data cannot be obtained in advance, preparing an extra set of teacher images is unrealistic. In this embodiment, therefore, unknown objects are classified by simply considering the distance between the feature vector of the input image and its nearest neighbors in the Adjusted-SOINN.

For the two-part Adjusted-SOINN (positive part and negative part) that represents the classifier of attribute m, the average distances between the current input image and the k nearest-neighbor nodes are denoted $d^+_m$ and $d^-_m$, respectively. The input image is assigned to an unknown object class c according to Equation (5). Here, $s^q_{z_l}$ is the score of class $z_l$ in the space of feature q, one of the Q features in total, and is calculated from Equation (6).
The similarity score $s^q_{z_l}$ for each feature space q is obtained from Equation (6). Here, [P] = 1 when the condition P is true, and [P] = 0 otherwise (this text writes [P] with single square brackets, but in Equation (6) it is written with double square brackets). Every unknown object class z is assumed to determine its attribute vector $a^z$ in a deterministic way. This deterministic formulation is written using Iverson's bracket notation (D. E. Knuth, "Two notes on notation," Amer. Math. Monthly, 99(5):403-422, 1992.). Specifically, in the recognition stage of this embodiment, each attribute m of the unknown object class $z_l$ is given as a binary value. When $a^{z_l}_m = 1$, the value of $[a^{z_l}_m = 1]$ is 1 and the value of $[a^{z_l}_m = 0]$ is 0; when $a^{z_l}_m = 0$, the value of $[a^{z_l}_m = 1]$ is 0 and the value of $[a^{z_l}_m = 0]$ is 1.
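The role of the Iverson bracket in selecting between the positive and negative distances can be illustrated as follows. Equations (5) and (6) are not reproduced in this text, so the score below (sum of the distance to whichever part the class's attribute label selects, with the smallest total winning) is an assumption made only for illustration. It covers a single feature space; combining the Q feature spaces is omitted. All names and numbers are hypothetical.

```python
def iverson(condition):
    """Iverson bracket: 1 if the condition is true, otherwise 0."""
    return 1 if condition else 0

def class_score(attr_vector, d_pos, d_neg):
    """ASSUMED score (not the patent's Equation (6)): for each attribute,
    add d_pos if the class has the attribute and d_neg if it does not."""
    return sum(
        iverson(a == 1) * dp + iverson(a == 0) * dn
        for a, dp, dn in zip(attr_vector, d_pos, d_neg)
    )

def classify(class_attrs, d_pos, d_neg):
    """Pick the unseen class whose attribute vector gives the smallest score."""
    return min(class_attrs, key=lambda z: class_score(class_attrs[z], d_pos, d_neg))

# Toy attribute vectors of two unseen classes and toy mean distances.
class_attrs = {"zebra": [1, 0, 1], "whale": [0, 1, 0]}
d_pos = [0.2, 0.9, 0.3]
d_neg = [0.8, 0.1, 0.7]
best = classify(class_attrs, d_pos, d_neg)
```

Exactly one of the two brackets is 1 for each attribute, so every attribute contributes exactly one distance to the score.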

<Example>
Next, results and experiments for this embodiment are described.
For comparison with the conventional method, the attribute-annotated animal dataset described in Non-Patent Document 1 is used. This dataset contains 85 attributes for 50 animal classes. To focus solely on the performance of the learning mechanism, the experiments in this embodiment use the pre-extracted feature datasets disclosed in Non-Patent Document 1 rather than the raw image dataset. Six features are available for this dataset: SIFT, SURF, PHOG, rgSIFT, local self-similarity histograms (LSS), and RGB color histograms (CQ); see Non-Patent Document 1 for details.

AT-SOINN can learn incrementally. Here, however, to allow comparison with the results of Non-Patent Document 1, which cannot learn incrementally, recognition performance is evaluated at the point where 40 image classes have already been learned. That is, of the 50 animal classes, 40 image classes are used as teacher data, and the remaining 10 image classes, disjoint from those 40, are used to test AT-SOINN (as recognition data). Because this condition is identical to that of Non-Patent Document 1, it allows a fair comparison between AT-SOINN and the method described there.

In the experiments, the Adjusted-SOINN parameters are set to age_dead = 100 and λ = 250. The other parameters use the same settings as in the original Adjusted-SOINN literature mentioned above. These parameters were chosen so that AT-SOINN achieves its best results. Various settings were tried (λ = 150, 250, 350, 450, 550; age_dead = 50, 100, 150, 250), and the results did not differ much. However, setting λ and age_dead too small triggers node removal too frequently, and existing nodes die too early, before receiving enough inputs, so Adjusted-SOINN can no longer form clusters. As long as the parameters are not set too small (for example, λ = 25, age_dead = 10), differences in their values do not degrade the classification results much.

First, the quality of the AT-SOINN attribute classifiers is described.
FIG. 5 shows the quality of the individual attribute classifiers of AT-SOINN compared with the method described in Non-Patent Document 1. Although AT-SOINN falls below that method for some attributes, the quality of its attribute classifiers is sufficiently high considering the dramatically reduced computation time shown in FIG. 9, described later.

In terms of the computation time needed to train the attribute classifiers, AT-SOINN needs about 300 seconds to train each classifier for each feature space on 40 image classes (24,294 teacher images). In contrast, the method described in Non-Patent Document 1 needs several hours to train each classifier for each feature space, and therefore more than 100 hours in total to train the classifiers of all attributes for just one feature space. Also, unlike other conventional methods, AT-SOINN is fully incremental, so new input images can be added without restarting the whole learning mechanism.

Next, as an experiment on AT-SOINN, 85 attribute predictors are trained online and incrementally on 40 animal classes and then tested (recognition) on the other 10 disjoint animal classes.
FIG. 6 shows the resulting confusion matrix among the 10 classes. The mean multi-class accuracy is 28.92%, clearly well above the chance level of 10%.
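As a hedged illustration of how a mean multi-class accuracy and a chance level relate to a confusion matrix, the sketch below averages per-class accuracy over the rows of a small made-up matrix. The 3-class matrix is hypothetical and is not the data of FIG. 6, and the assumption that the reported figure is a mean of per-class accuracies is ours.

```python
import numpy as np

def mean_class_accuracy(confusion):
    """Mean per-class accuracy; rows are true classes, columns are predictions."""
    confusion = np.asarray(confusion, dtype=float)
    per_class = np.diag(confusion) / confusion.sum(axis=1)
    return float(per_class.mean())

# Hypothetical 3-class confusion matrix (counts).
toy = [[5, 3, 2],
       [1, 8, 1],
       [4, 4, 2]]
acc = mean_class_accuracy(toy)
chance = 1.0 / len(toy)  # uniform guessing among the classes
```

With 10 test classes, the same chance computation gives 1/10, the 10% level quoted in the text.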

FIG. 7 shows a comparison between AT-SOINN and the two methods described in Non-Patent Document 1, DAP (Direct attribute prediction) and IAP (Indirect attribute prediction).

To show that the dataset selected for evaluation is a hard one for unknown object detection, other results reported as baselines in Non-Patent Document 1 are also noted here. Using the same dataset with the same settings, Non-Patent Document 1 implements two baselines in addition to DAP and IAP: (i) a simple one-shot learning approach, in which the system learns a diagonal covariance matrix of the feature variances from the 40 teacher classes and uses the resulting Mahalanobis distance for nearest-neighbor classification; and (ii) ordinary multi-class classification, in which the images are split half and half into teacher data and recognition data (input data). In the first baseline (one-shot learning), each target class is represented by up to 10 images selected at random from the recognition data.

However, the first baseline, even when given some samples (1 to 10), reaches a mean accuracy of only 14.3% with one teacher image per class and only 18.9% with 10 teacher images per class. This is not a clear improvement over the 10% chance level. The second baseline (ordinary multi-class classification) achieves a multi-class accuracy of only 65.9%, even though it assumes that many teacher samples from the same classes are available, which greatly simplifies the problem.

FIG. 8 is a table summarizing the comparison between AT-SOINN and the other baselines. Recognition was performed on 10 animal classes. The results for the ordinary method (Ordinary Method) and the one-shot learning method (One-shot Learning) are taken directly from Non-Patent Document 1. Of the five methods compared, only the last three can perform unknown object detection without any sample from the classes to be recognized, and of those three, only AT-SOINN can learn incrementally. Moreover, AT-SOINN trains in far less time than DAP and IAP require. For DAP and IAP, the time needed to train the SVM (classifier) for each attribute is not stated explicitly, but it has been confirmed that training one SVM takes several hours.

As FIG. 8 shows, in the comparison with these baselines AT-SOINN outperforms the one-shot learning approach, even though AT-SOINN itself requires no samples from the classes to be recognized. AT-SOINN also outperforms the IAP method of Non-Patent Document 1. Compared with the DAP method, AT-SOINN is less accurate. However, AT-SOINN can process images incrementally without a teacher dataset prepared in advance and, more importantly, trains in far less time than IAP and DAP, so the difference in accuracy is not considered discouragingly large. In addition, AT-SOINN is an incremental learning algorithm: its recognition rate can be improved by adding training data, and attributes can also be added to improve the recognition rate. Conventional transfer learning methods such as DAP have no function for adding training data or attributes.

The attribute classification results in FIG. 5 also contain another point of interest. In terms of the quality of the individual attribute classifiers, measured by the area under the ROC curve (AUC), FIG. 5 shows no significant difference between AT-SOINN (0.68) and DAP (0.72). One reason why the object classification accuracy of AT-SOINN (28.9%) is lower than that of DAP (40.5%) is thought to be that AT-SOINN has no extra teacher dataset with which to convert its output from raw distances into probabilities.
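For reference, the AUC can be computed directly from classifier scores as the fraction of positive/negative pairs that are ranked correctly. The sketch below is a generic pairwise implementation with made-up scores, not the evaluation code behind FIG. 5.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative
    (ties count one half); this equals the area under the ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores for images with and without an attribute.
value = auc([0.9, 0.6, 0.4], [0.5, 0.3])
```

Because the AUC depends only on the ranking of the scores, it can be computed from raw distance-based scores without any conversion to probabilities, which is consistent with the attribute-level AUC being close to that of DAP while the class-level accuracy is lower.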

FIG. 9 shows the computation time required to train each attribute classifier for each feature.
For the same amount of teacher data (40 classes, 24,294 images, 6 features), the DAP and IAP methods need several hours to train the SVM for one attribute on one feature. AT-SOINN needs only about 200 to 500 seconds, dramatically reducing the computation time for training one attribute classifier. In addition, because the learning mechanism of AT-SOINN is online and incremental, teacher images can be fed to the system gradually.

As described above, the present invention realizes attribute learning and transfer through an online, incremental approach. The success of this approach has a significant impact on the robotics community, particularly with regard to robotics operations.

From the baselines used for comparison above, it is clear that the dataset handled by the present invention is considerably more difficult than those typically encountered in robotics. The dataset has no segmentation annotations, and all features are extracted from the whole image. While some images contain the target animal in their main part, as illustrated in FIG. 10, many images contain mostly background, as illustrated in FIG. 11. This explains why the baseline (ordinary) method achieves only 65.9% accuracy in the recognition test even though the data is split half and half between training and testing.

In addition, although the experiments above show results for a fixed number of attributes, AT-SOINN also works when the number of attributes is not fixed. A new attribute classifier can be added simply by adding an Adjusted-SOINN.

Specifically, suppose that AT-SOINN currently has 85 × 6 Adjusted-SOINNs (85 attribute classifiers in each of 6 feature spaces) and stores some images of each learned class (for example, 100 images per class). When the user wants to add a new attribute, the user simply labels each class with the value (0 or 1) of the additional attribute. AT-SOINN can then generate the new attribute classifier from the same teacher images and becomes able to classify 86 attributes.

It might seem that an attribute classifier could be added just as easily with conventional methods, by adding the same attribute label to each class and reusing the teacher images to train an SVM for the new classifier. That is true if computation time is ignored. For online, incremental use in robotics, however, spending several hours to add one or more attributes is hardly reasonable. Using the same technique, AT-SOINN can learn an additional attribute in only 38 to 40 minutes. Furthermore, when the number of features used is fixed, the hardware can be configured to run AT-SOINN in parallel, which reduces the time needed for learning to as little as 10 minutes.
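The relation between the per-feature training time and the sequential and parallel totals can be checked with simple arithmetic. The per-feature times below are hypothetical values chosen inside the 200 to 500 second range reported for FIG. 9, and the model (one worker per feature space, no overhead) is an assumption.

```python
def training_time_seconds(per_feature_seconds, parallel=False):
    """Sequential cost is the sum over feature spaces; parallel cost is the max."""
    return max(per_feature_seconds) if parallel else sum(per_feature_seconds)

# Hypothetical times (seconds) to train one new attribute in each of 6 feature spaces.
times = [300, 350, 400, 420, 450, 480]
sequential_min = training_time_seconds(times) / 60
parallel_min = training_time_seconds(times, parallel=True) / 60
```

With these made-up values the sequential total is 40 minutes and the parallel total is 8 minutes, the same order of magnitude as the 38 to 40 minutes and roughly 10 minutes stated above.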

In robotics, starting from well-prepared training scenes lets a robot learn attributes more accurately. By learning attributes with an online, incremental method, the robot can gradually acquire more teacher images while serving its users. The present invention thus ultimately leads toward an imaginative capability in robotics: a robot can go and fetch an unknown object based solely on a verbal description from a human.

<Embodiment 2.>
In Embodiment 1 above, attributes were treated as binary values, negative or positive. If attributes can be treated as multi-valued, more flexible classification becomes possible. Through intensive experimental research, the inventors found a method for generating a classifier that can handle multi-valued attribute information. In the following description, the algorithm that generates classifiers from such multi-valued attribute information is called STAR (STAtistical Recognition)-SOINN.

STAR-SOINN is an improved version of the Adjusted-SOINN (Self-Organizing Incremental Neural Network) described above.

<Adjusted-SOINN>
To make STAR-SOINN easier to understand, Adjusted-SOINN is first described in more detail.

Adjusted-SOINN is a self-growing neural network that can learn online and incrementally. Its nodes, each represented by a weight vector, are created and removed autonomously. When a predetermined condition is satisfied, two nodes are connected by a virtual line called an edge. Adjusted-SOINN performs clustering by regarding nodes that can reach each other along edges as belonging to the same cluster. Each edge has a parameter called its age, and Adjusted-SOINN deletes edges that reach a predetermined age. This allows nodes that can be regarded as noise to be removed at predetermined times.

The learning algorithm of Adjusted-SOINN, that is, its operation when a new node is input, is described with reference to FIG. 12.

S301: An input node having a weight vector is newly input to Adjusted-SOINN.

S302: Adjusted-SOINN calculates the distance, typically the Euclidean distance, between the input node and each existing node. The Euclidean distance is the distance defined by Equation 7, in which d is the Euclidean distance and f and g are n-dimensional vectors.

From these results, Adjusted-SOINN determines the node nearest to the input node in Euclidean distance (the first winner) and the second-nearest node (the second winner).
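Step S302 can be sketched as follows, using the standard Euclidean distance (the square root of the sum of squared component differences). The function name `find_winners` and the toy weights are hypothetical, and the sketch assumes at least two nodes already exist.

```python
import numpy as np

def find_winners(x, weights):
    """Return the indices of the nearest and second-nearest nodes to x,
    together with the distances from x to every node."""
    d = np.linalg.norm(weights - x, axis=1)  # Euclidean distance to each node
    order = np.argsort(d)
    return int(order[0]), int(order[1]), d

# Toy network with three nodes in a 2-D feature space.
weights = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
s1, s2, d = find_winners(np.array([0.2, 0.0]), weights)
```

Here `s1` is the first winner and `s2` the second winner; the distances `d[s1]` and `d[s2]` are the values compared against the similarity thresholds in S303.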

S303: Adjusted-SOINN calculates the similarity threshold of the first winner and of the second winner. The similarity threshold of a node is the maximum distance to its adjacent nodes if it has any, and the minimum distance to all other nodes if it has none. The similarity threshold is obtained from Equation 8, in which N is the set of nodes adjacent to node i, W is the weight vector of node i, and A is the set of all nodes.
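The verbal definition of the similarity threshold translates directly into code. The sketch below follows that definition; the data layout (a weight matrix plus a dictionary of neighbor sets) and the name `similarity_threshold` are assumptions made for illustration.

```python
import numpy as np

def similarity_threshold(i, weights, neighbors):
    """Max distance to adjacent nodes if node i has any,
    otherwise min distance to all other nodes."""
    if neighbors[i]:
        idx = sorted(neighbors[i])
        return float(np.linalg.norm(weights[idx] - weights[i], axis=1).max())
    others = [j for j in range(len(weights)) if j != i]
    return float(np.linalg.norm(weights[others] - weights[i], axis=1).min())

# Toy network: node 0 is linked to nodes 1 and 2; node 3 is isolated.
weights = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
neighbors = {0: {1, 2}, 1: {0}, 2: {0}, 3: set()}
t0 = similarity_threshold(0, weights, neighbors)
t3 = similarity_threshold(3, weights, neighbors)
```

Node 0 takes the distance to its farthest neighbor, while the isolated node 3 takes the distance to its closest other node.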

Adjusted-SOINN compares these similarity thresholds with the Euclidean distances from the input node to the first and second winners. If the distance from the input node to the first winner or to the second winner is larger than that winner's similarity threshold, the input node is regarded as belonging to a different cluster from the first and second winners. In that case, Adjusted-SOINN determines that a new node should be inserted at the position of the input node. Conversely, if both distances are smaller than the respective similarity thresholds, the input node, the first winner, and the second winner are all regarded as belonging to the same cluster, and Adjusted-SOINN determines that no new node should be inserted.
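The insertion decision reduces to a single comparison per winner, as in the sketch below. The function name is hypothetical, and treating a distance exactly equal to the threshold as "no insertion" is an assumption, since the text does not address that case.

```python
def should_insert_node(d1, d2, t1, t2):
    """True when the input lies outside the similarity threshold of
    the first winner (d1 > t1) or of the second winner (d2 > t2)."""
    return d1 > t1 or d2 > t2

# Input is close to the first winner but too far from the second winner.
far_from_second = should_insert_node(d1=0.5, d2=3.0, t1=1.0, t2=2.0)
# Input is within both thresholds, so it joins the existing cluster.
within_both = should_insert_node(d1=0.5, d2=1.5, t1=1.0, t2=2.0)
```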

Ｓ３０４：Ｓ３０３において、新たなノードを挿入すべきと判定された場合、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮは、入力ノードの位置に新たなノードを挿入する。 S304: If it is determined in S303 that a new node should be inserted, Adjusted-SOINN inserts a new node at the position of the input node.

Ｓ３０５：Ｓ３０３において、新たなノードを挿入すべきでないと判定された場合、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮは、第１勝者と第２勝者の間にエッジが存在するか否かを判定する。 S305: If it is determined in S303 that a new node should not be inserted, Adjusted-SOINN determines whether an edge exists between the first winner and the second winner.

Ｓ３０６：Ｓ３０５において、第１勝者と第２勝者の間にエッジが存在しないと判定された場合、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮは、それらの間にエッジを生成する。 S306: If it is determined in S305 that no edge exists between the first winner and the second winner, Adjusted-SOINN generates an edge between them.

Ｓ３０７：Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮは、Ｓ３０５においてエッジが存在しないと判定された場合、Ｓ３０５において生成したエッジの年齢を０とする。一方、Ｓ３０５においてエッジが存在すると判定された場合、既に存在していたエッジの年齢を０とする。加えて、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮは、第１勝者に接続されている全てのエッジの年齢をインクリメントする。 S307: Adjusted-SOINN sets the age of the edge generated in S305 to 0 when it is determined in S305 that no edge exists. On the other hand, if it is determined in S305 that an edge exists, the age of the edge that has already existed is set to zero. In addition, Adjusted-SOINN increments the age of all edges connected to the first winner.

S308: Adjusted-SOINN deletes every edge whose age exceeds a predetermined threshold (age). The parameter age exists to remove edges generated erroneously by noise or similar effects. A small age makes edges easier to delete and so suppresses the influence of noise, but an extremely small age causes edges to be deleted too often and makes the learning result unstable. Conversely, if age is too large, edges generated by noise cannot be removed properly. It is therefore desirable to set age to an appropriate value determined experimentally.
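Steps S305 to S308 can be sketched as a single edge-maintenance routine. This is only an illustration under stated assumptions: the dictionary representation of edges and the names update_edges and max_age are hypothetical, and the steps are applied in the order the text lists them, so the edge between the two winners is reset to 0 and then incremented along with the other edges of the first winner.

```python
def update_edges(edges, first, second, max_age):
    """Apply S305-S308 to `edges`, a dict {frozenset({a, b}): age}."""
    key = frozenset((first, second))
    # S305-S307: create the edge if it is missing; either way its age becomes 0.
    edges[key] = 0
    # S307: increment the age of every edge connected to the first winner.
    for edge in edges:
        if first in edge:
            edges[edge] += 1
    # S308: delete edges whose age exceeds the threshold.
    for edge in [e for e, age in edges.items() if age > max_age]:
        del edges[edge]
    return edges

edges = {
    frozenset(("a", "b")): 3,
    frozenset(("a", "c")): 5,
    frozenset(("b", "c")): 2,
}
update_edges(edges, first="a", second="d", max_age=5)
# a-c aged to 6 and was removed; a-d was created; b-c was not touched.
print(sorted((sorted(e), age) for e, age in edges.items()))
```

A larger max_age keeps old edges longer, which matches the trade-off described for the age parameter.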

S309: Adjusted-SOINN updates the weight vectors of the first winner and of the adjacent nodes directly connected to the first winner by an edge, according to the following Equations 9 and 10. In Equations 9 and 10, ΔWi is the update amount of the weight vector of node i, and ΔWj is the update amount of the weight vector of node j. Here i is the first winner, j is an adjacent node, Wk is the weight vector of the input node, and Mi is the number of times node i has been the first winner so far. As a result, the input node is, so to speak, absorbed into the first winner and its adjacent nodes.
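Equations 9 and 10 are not reproduced in this text. As an illustration only, the sketch below assumes the learning rates commonly used in SOINN-type networks, namely 1/Mi for the first winner and 1/(100 Mi) for its adjacent nodes; the equations in the patent may differ. The function name update_weights and the dictionary layout are hypothetical.

```python
def update_weights(weights, neighbors, first, x, win_count):
    """Move the first winner and its neighbors toward input x (S309).

    Assumed rule (not taken from the patent text):
      dWi = (Wk - Wi) / Mi          for the first winner i
      dWj = (Wk - Wj) / (100 * Mi)  for each adjacent node j
    """
    m = win_count[first]  # Mi: times node `first` has been the first winner
    weights[first] = [w + (xk - w) / m for w, xk in zip(weights[first], x)]
    for j in neighbors.get(first, ()):
        weights[j] = [w + (xk - w) / (100 * m) for w, xk in zip(weights[j], x)]
    return weights

weights = {"i": [0.0, 0.0], "j": [10.0, 0.0]}
neighbors = {"i": ["j"]}
update_weights(weights, neighbors, first="i", x=[2.0, 4.0], win_count={"i": 2})
print(weights["i"])  # the winner moves halfway toward the input
print(weights["j"])  # the neighbor moves only slightly
```

Because the step size shrinks as Mi grows, a node that has won many times becomes progressively more stable.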

S310: Adjusted-SOINN extracts, from all nodes, the nodes that satisfy the following two conditions and marks them as deletion targets.

１つ目の条件は、入力されたノードの数が、あらかじめ定められた設定値、例えば定数λの倍数にあたるか否かを判定する。この設定値は、ノイズとみなし得るノードを定期的に削除するために設定されるパラメータである。λに小さな値を設定すれば、頻繁にノイズ処理を実施することができるが、λが極端に小さければ、実際にはノイズでないノードまで誤って削除してしまう。一方、λが大きすぎれば、ノイズの影響で生成されたノードを適切に取り除くことができない。それで、λには、実験により算出された適切な値を設定することが望ましい。 The first condition determines whether or not the number of input nodes corresponds to a predetermined setting value, for example, a multiple of a constant λ. This set value is a parameter that is set in order to periodically delete nodes that can be regarded as noise. If a small value is set for λ, noise processing can be performed frequently. However, if λ is extremely small, even nodes that are not actually noise are erroneously deleted. On the other hand, if λ is too large, a node generated due to the influence of noise cannot be removed appropriately. Therefore, it is desirable to set λ to an appropriate value calculated by experiment.

２つ目の条件は、ノードの隣接ノード数があらかじめ定められた閾値η以下であることである。閾値ηは、ノード群のうち低密度の領域、すなわちノイズとみなし得るノードを定義するためのパラメータである。 The second condition is that the number of adjacent nodes of the node is equal to or less than a predetermined threshold value η. The threshold value η is a parameter for defining a low density region of the node group, that is, a node that can be regarded as noise.

S311: Adjusted-SOINN deletes the nodes extracted as deletion targets in S310.

Ｓ３１２：入力されたノードの数が、あらかじめ定められた定数ρに達したならば、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮは学習を完了する。未だ達していない場合は、次の入力ノードの入力を受付け、上述した手順により学習を継続する。 S312: Adjusted-SOINN completes learning when the number of input nodes reaches a predetermined constant ρ. If not reached yet, the input of the next input node is accepted and the learning is continued by the above-described procedure.

<STAR-SOINN>
Next, STAR-SOINN will be described.

上述のように、ＳＴＡＲ−ＳＯＩＮＮは、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮに改良を加えたものである。ＳＴＡＲ−ＳＯＩＮＮとＡｄｊｕｓｔｅｄ−ＳＯＩＮＮとの主な相違点を以下に示す。 As described above, STAR-SOINN is an improved version of Adjusted-SOINN. The main differences between STAR-SOINN and Adjusted-SOINN are shown below.

ＳＴＡＲ−ＳＯＩＮＮは、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮに対して統計情報を取り入れる拡張を施すことにより、認識率の向上や情報量の削減を図っている。これを実現するため、ＳＴＡＲ−ＳＯＩＮＮでは、ノードの追加や削除といった重みベクトルの管理手法に幾らかの変更を加えている。また、ノードに付加される情報であるラベルの概念を導入している。ＳＴＡＲ−ＳＯＩＮＮは、ラベルとして、例えばそのノードが属するクラス、備える属性等の情報を保持させることができる。また、この属性値として、多値（０又は１の２値でない、連続値）を保持することが可能である。なお、ラベルは、ノードがＳＴＡＲ−ＳＯＩＮＮに入力される際に、何らかの方法によりそのノードに予め付加される。 STAR-SOINN aims to improve the recognition rate and reduce the amount of information by applying an extension for incorporating statistical information to Adjusted-SOINN. In order to realize this, in STAR-SOINN, some changes are made to the weight vector management method such as addition and deletion of nodes. In addition, the concept of a label, which is information added to a node, is introduced. STAR-SOINN can hold, for example, information such as a class to which the node belongs and attributes provided as a label. In addition, as this attribute value, it is possible to hold a multi-value (not a binary value of 0 or 1 but a continuous value). Note that the label is added to the node in advance by some method when the node is input to the STAR-SOINN.

Here, a class is a target to be identified by the classifier. If the classifier distinguishes animals, such as a lion from a tiger or a panda from a bear, those animals are the classes. An attribute is a characteristic of a class. When animal names are the classes, attributes are pieces of information such as whether the animal is carnivorous, whether it is herbivorous, and whether it has long hair. Likewise, when the task is to identify rooms, such as a living room, a bathroom, or a bedroom, the room names are the classes, and the attributes are pieces of information such as whether there is a table, whether there is a bed, and whether there is a bathtub.

A label is the class and attribute information given to each node. In Embodiment 1 described above, one AT-SOINN could handle only one attribute of one class as a label. That is, a given AT-SOINN learned only, for example, the attribute "carnivorous" of the class "lion". In STAR-SOINN, by contrast, all attribute information of one class "lion" (for example, 85 attributes) can be assigned as the label. For example, for each of the attributes that the class "lion" has, such as "carnivorous" and "herbivorous", a value can be assigned: "1" (positive) for "carnivorous" and "0" (negative) for "herbivorous". In the present embodiment, the description assumes that the data of all attributes of one class is assigned as the label.

Label values can be given as the binary values "0" and "1", as in the example above, but they can also be continuous values, for example values normalized to the range 0 to 1. For an animal that eats meat only occasionally, for instance, the attribute "carnivore" need not simply be set to "1" (positive); it can instead be defined as "40% related". Supporting continuous input in this way allows attributes to be defined more precisely. For example, to represent an omnivore such as a human, AT-SOINN requires binary attribute values, so both the "carnivore" and "herbivore" attributes had to be set to positive. STAR-SOINN can handle continuous attribute values, so both attributes can instead be set to, for example, 0.5. With this setting, an omnivorous human can be learned as distinct from animals that are carnivores in the original sense of the word. STAR-SOINN can also be used in the same way as AT-SOINN, for example by setting both attributes to 100% or by binarizing continuous attribute values around a predetermined threshold. Attribute values do not have to be normalized; the raw data values may be used as they are.
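The relationship between continuous and binary labels described above can be illustrated with a short sketch. The dictionary layout, the function name binarize, the 0.5 threshold, the use of ">=" at the threshold, and the 0.9 value in the second example are all assumptions made for illustration.

```python
def binarize(labels, threshold=0.5):
    """Convert continuous attribute values to 0/1 around a threshold."""
    return {name: 1 if value >= threshold else 0 for name, value in labels.items()}

# An omnivore: both attributes are set to 0.5 instead of both being "1".
human = {"carnivore": 0.5, "herbivore": 0.5}
# An animal that eats meat only occasionally ("40% related").
occasional = {"carnivore": 0.4, "herbivore": 0.9}

print(binarize(human))       # {'carnivore': 1, 'herbivore': 1}
print(binarize(occasional))  # {'carnivore': 0, 'herbivore': 1}
```

After binarization the human label is indistinguishable from a label in which both attributes are fully positive, which is exactly the information that continuous values preserve.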

Labels that support continuous values are very useful not only during the learning described here but also during recognition, which is described later. For example, because attribute values are continuous, the recognizer can output a graded result such as "eats a little meat" for an unlearned animal presented as a recognition target. This property allows STAR-SOINN to provide a recognizer that more closely simulates human recognition and perception.

また、ＳＴＡＲ−ＳＯＩＮＮでは、容易に属性を追加することが可能である。具体的には、入力ノードに付与されるラベルを増やすことで、ＳＯＩＮＮが保有している属性を増やすことができる。上述のＡＴ−ＳＯＩＮＮとは異なり、ＳＴＡＲ−ＳＯＩＮＮは、属性の増減がＳＯＩＮＮの数に影響しない。また、ＳＴＡＲ−ＳＯＩＮＮは、このような属性の追加の作業をオンラインで実行することが可能である。そのため、装置の動作中であっても、新しい属性の追加を柔軟に実行することが可能である。このように、ＳＴＡＲ−ＳＯＩＮＮは、環境や命令者の要求に合わせた学習や認識が必要な、人の生活環境のなかで働くロボットに適したオンライン学習性を備えている。 In STAR-SOINN, it is possible to easily add attributes. Specifically, by increasing the labels given to the input nodes, it is possible to increase the attributes possessed by SOINN. Unlike AT-SOINN described above, the increase or decrease in attributes does not affect the number of SOINNs in STAR-SOINN. Further, STAR-SOINN can execute such an attribute addition operation online. Therefore, it is possible to flexibly add a new attribute even during operation of the apparatus. Thus, STAR-SOINN has an online learning property suitable for a robot working in a human living environment that requires learning and recognition in accordance with the environment and the demands of the commander.

Next, the learning algorithm of STAR-SOINN, that is, the operation of STAR-SOINN when a new node is input to it, will be described with reference to FIG. 13.

Ｓ４０１：重みベクトルをもつ入力ノードと、その入力ノードのラベルが、ＳＴＡＲ−ＳＯＩＮＮに新たに入力される。 S401: An input node having a weight vector and a label of the input node are newly input to STAR-SOINN.

S402: STAR-SOINN increments the age of every existing node.

Ｓ３０２：ＳＴＡＲ−ＳＯＩＮＮは、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮと同様に、入力ノードに対する第１勝者及び第２勝者ノードを決定する。 S302: STAR-SOINN determines the first winner node and the second winner node for the input node in the same manner as Adjusted-SOINN.

S303: Like Adjusted-SOINN, STAR-SOINN calculates the similarity thresholds of the first and second winner nodes and uses them to determine whether the input node, the first winner, and the second winner belong to the same cluster.

Ｓ４０３：Ｓ３０３において同一クラスタでないと判定された場合、入力ノードと同じ位置に新たなノードを生成する。ここで、ＳＴＡＲ−ＳＯＩＮＮにおいては、新たなノードに対し、入力ノードのラベル重みも設定することが望ましい。ラベル重みとは、ノードに付与されるラベルの値をいう。すなわち、クラス名、属性値が入力ノードにラベルとして付与されていたのであれば、それらのクラス名、属性値が新たなノードにラベルとして付与される。 S403: If it is determined in S303 that the clusters are not the same, a new node is generated at the same position as the input node. Here, in STAR-SOINN, it is desirable to set the label weight of the input node for a new node. The label weight is a label value given to a node. That is, if the class name and attribute value are assigned to the input node as labels, the class name and attribute value are assigned to the new node as labels.

Ｓ３０５乃至Ｓ３０９：Ｓ３０３において同一クラスタと判定された場合、ＳＴＡＲ−ＳＯＩＮＮは、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮと同様に、第１勝者と第２勝者の間にエッジを生成し、そのエッジの年齢を０にする。既にエッジが存在する場合には、そのエッジの年齢を０とする。また、第１勝者に接続されているすべてのエッジの年齢をインクリメントする。その後、あらかじめ定められた閾値（ａｇｅ）を超えた年齢を持つエッジを削除する。ついで、ＳＴＡＲ−ＳＯＩＮＮは、第１勝者とその近傍ノードの重みベクトルを更新する。 S305 to S309: When the same cluster is determined in S303, STAR-SOINN generates an edge between the first winner and the second winner, and sets the age of the edge to 0, similarly to Adjusted-SOINN. If an edge already exists, the age of the edge is set to zero. In addition, the age of all edges connected to the first winner is incremented. Thereafter, an edge having an age exceeding a predetermined threshold (age) is deleted. Then, STAR-SOINN updates the weight vector of the first winner and its neighboring nodes.

Ｓ４０４：ＳＴＡＲ−ＳＯＩＮＮは、入力ノードのラベル情報の拡散を行う。すなわち、入力ノードのラベル重みに基づいて、第１勝者とその隣接ノード、及び第２勝者ノードとその隣接ノードのラベル重みを更新する。ラベル重みの更新は、例えば数１１及び数１２に従って行うことができる。ＳＴＡＲ−ＳＯＩＮＮでは、このように入力ノードのラベル情報を統計情報としてＳＴＡＲ−ＳＯＩＮＮ内に蓄積することにより、認識率を向上させている。
S404: STAR-SOINN spreads the label information of the input node. That is, based on the label weight of the input node, the label weights of the first winner and its adjacent node, and the second winner node and its adjacent node are updated. The label weight can be updated according to the equations 11 and 12, for example. In STAR-SOINN, the recognition rate is improved by storing the label information of the input node as statistical information in STAR-SOINN.

Ｓ４０５：上述のＳ４０３で新たなノードを生成した場合、及びＳ４０４でラベル重みを拡散させた場合は、このステップＳ４０５に進む。 S405: If a new node is generated in S403 described above, and if the label weight is diffused in S404, the process proceeds to step S405.

ＳＴＡＲ−ＳＯＩＮＮは、以下の２つの条件を満たすノードを、すべてのノードの中から抽出し、削除対象と判定する。 STAR-SOINN extracts nodes satisfying the following two conditions from all the nodes and determines that they are to be deleted.

１つ目の条件は、ノードの年齢があらかじめ定められた設定値、例えば定数λの倍数にあたることである。この設定値は、ノイズとみなし得るノードを定期的に削除するために設定されるパラメータである。λに小さな値を設定すれば、頻繁にノイズ処理を実施することができるが、λが極端に小さければ、実際にはノイズでないノードまで誤って削除してしまう。一方、λが大きすぎれば、ノイズの影響で生成されたノードを適切に取り除くことができない。そこで、λは、例えば実験等により得られた値を設定することが望ましい。なお、ＳＴＡＲ−ＳＯＩＮＮにおけるλは、Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮにおけるλとは意味合いが異なる。Ａｄｊｕｓｔｅｄ−ＳＯＩＮＮでは、過去に入力されたノードの数をλにより評価し、削除ノードの抽出を行っていたが、この方法では生成されてからの時間が比較的短いノードは削除対象となりやすい。一方、ＳＴＡＲ−ＳＯＩＮＮでは、各ノードに年齢という概念を導入し、入力ノード数ではなく、各ノードの年齢をλにより評価し、削除ノードの抽出を行うこととした。すなわち、各ノードは、一定の年齢に達すると、削除するか否かの判定がなされる。このことにより、ノードの生成タイミングに影響されることなくノイズの除去を行うことができる。 The first condition is that the age of the node corresponds to a predetermined set value, for example, a multiple of a constant λ. This set value is a parameter that is set in order to periodically delete nodes that can be regarded as noise. If a small value is set for λ, noise processing can be performed frequently. However, if λ is extremely small, even nodes that are not actually noise are erroneously deleted. On the other hand, if λ is too large, a node generated due to the influence of noise cannot be removed appropriately. Therefore, it is desirable to set λ to a value obtained by experiments, for example. Note that λ in STAR-SOINN has a different meaning from λ in Adjusted-SOINN. In Adjusted-SOINN, the number of nodes input in the past is evaluated by λ and the deletion node is extracted. However, in this method, a node having a relatively short time after generation is likely to be deleted. On the other hand, in STAR-SOINN, the concept of age is introduced to each node, and the age of each node is evaluated by λ instead of the number of input nodes, and the deleted node is extracted. That is, when each node reaches a certain age, it is determined whether or not to delete. As a result, noise can be removed without being affected by the node generation timing.

The second condition is that the number of adjacent nodes of the node (the number of nodes connected to it by edges) is equal to or less than a predetermined threshold η. The threshold η is a parameter for defining low-density regions of the node group, that is, nodes that can be regarded as noise. Here η is an integer of 0 or more. An optimal η may be set, for example by experiment, according to the classes to be identified or the amount of teacher data used for learning.

With these two conditions, when for example λ = 100 and η = 2, a node whose age has reached a multiple of 100 is deleted if it is connected by edges to two or fewer nodes (that is, if it has two or fewer adjacent nodes).
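The two deletion conditions of S405 can be sketched as a simple filter. The function name nodes_to_delete and the data layout are hypothetical, and excluding nodes of age 0 is an assumption: the text speaks of nodes whose age has "reached" a multiple of λ, which is read here as a positive multiple.

```python
def nodes_to_delete(ages, neighbors, lam=100, eta=2):
    """Return nodes whose age is a positive multiple of lam and that
    have eta or fewer adjacent nodes (the two conditions of S405)."""
    return [
        node
        for node, age in ages.items()
        if age > 0 and age % lam == 0 and len(neighbors.get(node, ())) <= eta
    ]

ages = {"a": 100, "b": 100, "c": 150, "d": 200}
neighbors = {"a": {"b", "c", "e"}, "b": {"a"}, "c": {"a"}, "d": set()}
# "a" has three neighbors and survives; "c" is not at a multiple of 100.
print(nodes_to_delete(ages, neighbors))  # ['b', 'd']
```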

Ｓ３１１：ＳＴＡＲ−ＳＯＩＮＮは、Ｓ４０５において削除対象として抽出されたノードを削除する。 S311: STAR-SOINN deletes the node extracted as the deletion target in S405.

S406: STAR-SOINN transfers at least part of the label weight of a deleted node to nodes around the deleted node (label diffusion). The transfer can be performed, for example, by updating the label weights of the node closest to the deleted node and of the second closest node according to Equations 13 and 14, respectively. In Equations 13 and 14, ΔL is the increase in label weight, K is the label information of the deleted node, c is an attribute, Kc is the label weight of attribute c of the deleted node, and Tω and Tsω are the similarity thresholds of the closest node and the second closest node, respectively. Dω and Dsω are defined by Equations 15 and 16, respectively. In Equations 15 and 16, Wω and Wsω are the weight vectors of the closest node and the second closest node, respectively, and Wd is the weight vector of the deleted node.

ＳＴＡＲ−ＳＯＩＮＮにおいては、このようにラベルの拡散を行うことにより、ラベル情報を統計情報としてＳＴＡＲ−ＳＯＩＮＮ内に蓄積し、認識率を向上させすることを可能にしている。 In the STAR-SOINN, label diffusion is performed in this manner, whereby label information is accumulated as statistical information in the STAR-SOINN, and the recognition rate can be improved.

S312: When the number of input nodes reaches a predetermined constant ρ, STAR-SOINN completes learning. If it has not yet been reached, STAR-SOINN accepts the next input node and continues learning by the procedure described above.

<Configuration of recognizer generation device>
Next, the configuration of the recognizer generation device 100 according to the present embodiment will be described with reference to FIG. 14. The recognizer generation device 100 is typically realized by a computer such as a dedicated computer or a personal computer (PC).

Components 101 to 107 of the recognizer generation device 100 are processing units that each execute various controls based on programs stored in a storage means (not shown) or the like, and are realized by a central processing unit (CPU), read-only memory (ROM), random access memory (RAM), input/output ports (I/O), and the like.

特徴抽出部１０１は、入力データ（教師データ）から特徴量を抽出する。この特徴量を、重みベクトル（入力ノード）として、後述するラベルとともに、ＳＴＡＲ−ＳＯＩＮＮに入力する処理を行う。 The feature extraction unit 101 extracts feature amounts from input data (teacher data). A process of inputting this feature quantity to the STAR-SOINN as a weight vector (input node) together with a label described later is performed.

教師データとしては、例えば画像センサをはじめとする種々のセンサ等から入力される任意の情報が利用され得る。本実施の形態では、教師データとして、特に動物の画像情報が用いられた場合を主に例示する。教師データが画像情報である場合は、例えばＳＩＦＴ、ＳＵＲＦ、ＨＯＧ、Ｈａａｒ−ｌｉｋｅ等の公知の技術を用いて、その画像情報から特徴量を抽出することができる。 As the teacher data, for example, arbitrary information input from various sensors including an image sensor can be used. In this embodiment, the case where animal image information is used as the teacher data is mainly exemplified. When the teacher data is image information, a feature amount can be extracted from the image information using a known technique such as SIFT, SURF, HOG, Haar-like, and the like.

The teacher data (input data) is not limited to images; sound and various other sensor information can also be input. For example, for an attribute such as "is a scary animal", it is very difficult to determine the attribute value (a number indicating how scary the animal is) from image features. In such a case, using audio information such as the animal's cry as input data is expected to give more accurate recognition than images alone. Furthermore, for attributes such as "soft skin" or "rough surface", information from a pressure sensor that can touch the animal directly makes it possible to handle accurately attributes that are hard to judge from images. Recognizing an object through several senses (sensor information) in this way is very close to how humans recognize things. The present embodiment is therefore considered very useful when applied to robots that act in the same way as people. By being equipped with senses (sensors) similar to those of humans, a robot can acquire concepts similar to those of humans. This allows the robot, for example when receiving commands from a person, to handle attributes that people perceive in vague and complex ways, which makes interaction between people and robots more efficient and simpler.

また、本実施の形態においては、入力される教師データには、ラベルとして、少なくともその教師データが有する属性が付与されているものとする。なお、この他に、教師データの属するクラスに関する情報等をラベルとして保持させてもよい。ラベルは、典型的には人手によりあらかじめ付与されるが、図示しないラベリング部が、所定のアルゴリズムに従って自動的に付与するものであってもよい。 Further, in the present embodiment, it is assumed that at least an attribute of the teacher data is given to the input teacher data as a label. In addition to this, information on a class to which the teacher data belongs may be held as a label. The label is typically given in advance by hand, but may be automatically given by a labeling unit (not shown) according to a predetermined algorithm.

As described above, an attribute is a value representing a property or characteristic of a class to be identified. For example, if the teacher data is image information of an animal, then according to the class of that teacher data, a plurality of attributes such as brown, large, long-haired, and carnivorous are assigned as the label, together with attribute values indicating their degree. In the present embodiment, these attribute values are multivalued. Because STAR-SOINN can handle attribute values as continuous values, labels need not only be entered by hand; for example, values obtained by normalizing data input from a sensor can be used directly.

ここで、通常、クラスが異なれば、属性の種類や、その組み合わせ、及び属性値の組み合わせ等は異なるものとなる。識別対象となるクラスにどのような属性を設定するか、また、属性をいくつ設定するか、により、識別器の性能も異なる。すなわち、属性を多数設定すれば、そのクラスをより詳細に表わすことができるが、演算量が増大する。また、属性の数が少なすぎれば、識別能力が低下することとなる。そのクラスを識別するための最適な属性を適当数設定することで、より高性能な認識器を生成することが可能となる。 Here, usually, different classes have different attribute types, combinations thereof, attribute value combinations, and the like. The performance of the discriminator varies depending on what attributes are set in the class to be identified and how many attributes are set. That is, if a large number of attributes are set, the class can be expressed in more detail, but the calculation amount increases. Also, if the number of attributes is too small, the identification capability will be reduced. By setting an appropriate number of optimal attributes for identifying the class, a higher performance recognizer can be generated.

勝者ノード抽出部１０２は、ＳＴＡＲ−ＳＯＩＮＮに重みベクトル（入力ノード）が入力されたときに、入力ノードと各既存ノードとの間の距離を算出し、当該入力ノードと最も近いノード及び２番目に近いノードをそれぞれ第１勝者ノード及び第２勝者ノードとして抽出する処理を行う。 The winner node extraction unit 102 calculates the distance between the input node and each existing node when the weight vector (input node) is input to the STAR-SOINN, and the node closest to the input node and the second A process of extracting close nodes as a first winner node and a second winner node is performed.
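The winner extraction performed by unit 102 amounts to a two-nearest-neighbor search. The following is a minimal sketch; the function name find_winners and the dictionary of node weight vectors are hypothetical, and a linear scan over all nodes is assumed rather than any indexing structure.

```python
import math

def find_winners(x, weights):
    """Return (first_winner, second_winner): the two existing nodes whose
    weight vectors are closest to input x in Euclidean distance."""
    if len(weights) < 2:
        raise ValueError("at least two existing nodes are required")
    ranked = sorted(weights, key=lambda node: math.dist(x, weights[node]))
    return ranked[0], ranked[1]

weights = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
print(find_winners([0.9, 0.8], weights))  # ('b', 'a')
```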

The node insertion determination unit 103 determines, based on the distances between the input node and the first and second winner nodes, whether to insert the input node into STAR-SOINN as a new node.

エッジ管理部１０４は、エッジの生成、削除に関する処理を行う。具体的には、入力ノードを新たなノードとして挿入しない場合に、第１勝者ノードと第２勝者ノードとの間にエッジがない場合はエッジを生成しその年齢を０とし、エッジがある場合はその年齢を０とする。また、第１勝者ノードが有する全エッジの年齢をインクリメントし、所定の年齢に達したエッジを削除する等する。 The edge management unit 104 performs processing related to edge generation and deletion. Specifically, when the input node is not inserted as a new node, if there is no edge between the first winner node and the second winner node, an edge is generated and its age is set to 0. Let the age be 0. Also, the ages of all the edges of the first winner node are incremented, and the edges that have reached a predetermined age are deleted.

ノード重み更新部１０５は、入力ノードを新たなノードとして挿入しない場合、入力ノードの重みベクトルに基づき、第１勝者ノード及び第２勝者ノードの重みベクトルを更新する処理を行う。なお、本実施の形態においては、第１及び第２勝者ノードの重みを更新するものとして説明するが、第１勝者ノードの重みベクトルのみを更新するようにしてもよい。 When the input node is not inserted as a new node, the node weight update unit 105 performs a process of updating the weight vectors of the first winner node and the second winner node based on the weight vector of the input node. In this embodiment, the weights of the first and second winner nodes are described as being updated. However, only the weight vector of the first winner node may be updated.

The label weight update unit 106 performs two processes. When the input node is not inserted as a new node, it spreads the label weight of the input node to at least the first winner and the second winner. When a node is deleted, it updates the label weights of the nodes around the deleted node using at least part of the deleted node's label weight.

The node deletion unit 107 deletes nodes at a predetermined timing. In the present embodiment, when a node reaches an age that is a multiple of λ, the unit deletes the node depending on its node density, that is, on how many nodes are connected to it by edges, and on similar criteria.

The recognizer generation device 100 may also include an attribute optimization unit (not shown). The attribute optimization unit can delete attributes that a node does not need at a predetermined timing. The predetermined timing may be, for example, when the age of a node in STAR-SOINN reaches a predetermined age, or when the attributes of the input node are spread to the first and second winner nodes. An unnecessary attribute may be, for example, a duplicate attribute or an attribute whose value is below a predetermined threshold. Alternatively, for attributes whose relevance is below a predetermined threshold, an effectiveness score may be calculated by a predetermined evaluation formula or the like, and attributes with low effectiveness may be judged unnecessary. Furthermore, when an upper limit on the number of attributes is defined and adding a new attribute causes the count to exceed that limit, the attribute with the lowest attribute value among all attributes may be selected.
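Two of the pruning criteria named above, a value threshold and an upper limit on the number of attributes, can be sketched as follows. The function name prune_attributes, its parameters, and the example attribute names are hypothetical, and the evaluation-formula criterion is omitted because the text does not specify the formula.

```python
def prune_attributes(labels, min_value=0.05, max_count=None):
    """Drop attributes below min_value, then drop the lowest-valued
    attributes until at most max_count remain (if a limit is given)."""
    kept = {attr: v for attr, v in labels.items() if v >= min_value}
    while max_count is not None and len(kept) > max_count:
        lowest = min(kept, key=kept.get)  # attribute with the lowest value
        del kept[lowest]
    return kept

labels = {"carnivore": 0.9, "striped": 0.6, "aquatic": 0.01, "brown": 0.3}
print(prune_attributes(labels, min_value=0.05, max_count=2))
# {'carnivore': 0.9, 'striped': 0.6}
```

The function returns a new dictionary and leaves the input untouched, so the recognition rate before and after pruning can be compared.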

このように属性を減らすことにより、過学習の抑制や認識時間の短縮、情報量の削減などが実現できる。なお、属性を削除した場合は、認識率が低下しない事を認識作業により判定することが望ましい。具体的には、削除する属性と似ている属性を検出し、それらを定量的に評価して削除の可否を判断することができる。 By reducing the attributes in this way, it is possible to suppress over-learning, shorten the recognition time, reduce the amount of information, and the like. In addition, when the attribute is deleted, it is desirable to determine by the recognition work that the recognition rate does not decrease. Specifically, it is possible to detect attributes similar to the attribute to be deleted and quantitatively evaluate them to determine whether deletion is possible.

Unlike AT-SOINN described above, in STAR-SOINN an increase or decrease in the number of attributes does not affect the number of SOINNs. STAR-SOINN can also perform such attribute deletion online. Attributes can therefore be reduced flexibly even while the device is operating. That is, STAR-SOINN has the online learning capability suited to robots working in human living environments, where learning and recognition must adapt to the environment and to the demands of the person giving commands.

<Recognizer generation method>
Next, the operation of the recognizer generation device 100 according to the present embodiment will be specifically described with reference to FIGS. 15 and 18. FIG. 15 is a flowchart showing the processing of the recognizer generation device 100. FIG. 18 is a conceptual diagram of the processing performed by the recognizer generation device 100 and by a recognition device 200 described later.

Ｓ５０１：認識器生成装置１００に、教師データが入力される。 S501: Teacher data is input to the recognizer generation device 100.

Ｓ５０２：特徴抽出部１０１が、入力された教師データから特徴量を抽出する。 S502: The feature extraction unit 101 extracts feature amounts from the input teacher data.

S401: The extracted feature amount is input to STAR-SOINN as a weight vector. This weight vector is called an input node. In the learning stage (the recognizer generation stage), a label is input together with the input node, and that label is assigned to the input node.

Ｓ４０２：ノード削除部１０７が、ＳＴＡＲ−ＳＯＩＮＮ内のすべての既存ノードの年齢をインクリメントする。 S402: The node deletion unit 107 increments the ages of all existing nodes in the STAR-SOINN.

S302: The winner node extraction unit 102 calculates the distance between the input node and each existing node, typically the Euclidean distance between their vectors. From this calculation, the node closest to the input node is extracted as the first winner and the second closest node as the second winner.

Ｓ３０３：ノード挿入判定部１０３が、入力ノードを、ＳＴＡＲ−ＳＯＩＮＮ内に新たなノードとして挿入するか否かを判定する。 S303: The node insertion determining unit 103 determines whether or not to insert the input node as a new node in the STAR-SOINN.

Ｓ４０３：Ｓ３０３において新たなノードを挿入すべきと判定された場合、ノード挿入判定部１０３は、入力ノードを、ＳＴＡＲ−ＳＯＩＮＮ内に新たなノードとして挿入する。すなわち、入力ノードと同じ位置に、新たなノードを作成する。このとき、ノード挿入判定部１０３は、新たなノードに対し、入力ノードが有していたラベルを付与する。 S403: When it is determined in S303 that a new node should be inserted, the node insertion determination unit 103 inserts the input node as a new node in the STAR-SOINN. That is, a new node is created at the same position as the input node. At this time, the node insertion determination unit 103 gives the label that the input node has to the new node.

Ｓ３０５：Ｓ３０３において新たなノードを挿入すべきでないと判定された場合、エッジ管理部１０４は、第１勝者と第２勝者との間にエッジがあるか否かを判定する。 S305: When it is determined in S303 that a new node should not be inserted, the edge management unit 104 determines whether there is an edge between the first winner and the second winner.

Ｓ３０６：Ｓ３０５においてエッジがないと判定された場合、エッジ管理部１０４は、第１勝者と第２勝者との間にエッジを生成する。 S306: When it is determined in S305 that there is no edge, the edge management unit 104 generates an edge between the first winner and the second winner.

Ｓ３０７：エッジ管理部１０４は、Ｓ３０６においてエッジを生成した場合、その年齢を０とする。また、エッジを生成しなかった場合、既存のエッジの年齢を０とする。さらに、エッジ管理部１０４は、第１勝者が接続されている全てのエッジの年齢をインクリメントする。 S307: The edge management unit 104 sets the age to 0 when an edge is generated in S306. If no edge is generated, the age of the existing edge is set to zero. Furthermore, the edge management unit 104 increments the ages of all edges to which the first winner is connected.

Ｓ３０８：エッジ管理部１０４は、年齢が所定の閾値（ａｇｅ）に達したエッジがあれば、そのエッジを削除する。 S308: If there is an edge whose age has reached a predetermined threshold (age), the edge management unit 104 deletes the edge.

S309: The node weight update unit 105 updates at least the weight vector of the first winner node based on the weight vector of the input node. Alternatively, the weight vectors of the first winner and its neighboring nodes may be updated. The update amount of the weight vector can be obtained by, for example, Equations 3 and 4.

S404: The label weight update unit 106 diffuses the label weight of the input node to the first winner node and the second winner node. That is, the label weight update unit 106 updates the label weights of the first and second winner nodes based on at least part of the label weight of the input node. The update amount of the label weight can be obtained by, for example, Equations A and B. The diffusion range may also be, for example, the first winner and its neighboring nodes together with the second winner and its neighboring nodes.

S405: The node deletion unit 107 extracts, from all nodes in the STAR-SOINN, each node whose age has reached a predetermined set value and whose number of neighboring nodes is at most a predetermined threshold λ. The set value can be, for example, a multiple of the constant λ.

Ｓ３１１：ノード削除部１０７は、Ｓ４０５において削除対象として抽出されたノードを削除する。 S311: The node deletion unit 107 deletes the node extracted as the deletion target in S405.

S406: The label weight update unit 106 transfers at least part of the label weight of the deleted node to nodes around it, for example the node closest to the deleted node and the second closest node. The transfer can be performed by updating the label weights of the receiving nodes according to Equations C and D, respectively. The label weight update unit 106 may be configured to diffuse the label weights of selected labels only. For example, only labels relating to attributes may be diffused, or both attributes and classes may be diffused.

Ｓ３１２：認識器生成装置１００は、入力されたノードの数があらかじめ定められた定数ρに達したならば、学習は完了したものと判断し、処理を完了する。未だ達していない場合は、次の入力ノードの入力を受付け、上述した手順により処理を継続する。学習を完了したＳＴＡＲ−ＳＯＩＮＮのノード群は、後述する認識装置２００が備える認識器として利用可能である。 S312: The recognizer generating apparatus 100 determines that learning is completed when the number of input nodes reaches a predetermined constant ρ, and completes the process. If not reached yet, the input of the next input node is accepted and the processing is continued according to the above-described procedure. The STAR-SOINN node group for which learning has been completed can be used as a recognizer included in the recognition apparatus 200 described later.
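
The winner search of S302 and the edge handling of S305 to S308 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `find_winners` and `update_edges`, the NumPy array of node weight vectors, and the dictionary of edge ages keyed by node-index pairs are all hypothetical, and the insertion decision of S303 is omitted (the sketch covers only the case where no new node is inserted).

```python
import numpy as np


def find_winners(nodes, x):
    """S302: return indices of the nearest and second-nearest nodes to x."""
    dists = np.linalg.norm(nodes - x, axis=1)  # Euclidean distance to every node
    order = np.argsort(dists)
    return int(order[0]), int(order[1])


def update_edges(edges, first, second, age_max):
    """S305 to S308, for the case where no new node is inserted.

    `edges` maps frozenset({i, j}) of node indices to the age of that edge
    (a hypothetical data layout chosen for this sketch).
    """
    # S305-S307: create the edge if it is absent; either way its age becomes 0.
    edges[frozenset((first, second))] = 0
    # S307: increment the age of every edge connected to the first winner
    # (done after the reset, following the order in which S307 is stated).
    for e in edges:
        if first in e:
            edges[e] += 1
    # S308: delete edges whose age has reached the threshold.
    for e in [e for e, age in edges.items() if age >= age_max]:
        del edges[e]
    return edges
```

Keeping edge ages in a dictionary makes the deletion in S308 a simple filter; a real implementation would likely also maintain per-node neighbor lists so that the neighbor count used in S405 is cheap to obtain.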

In the embodiment above, the recognizer generation device 100 generates one recognizer corresponding to one feature extracted from the input data. When multiple features can be extracted from the input data, however, multiple recognizer generation units, each with functions equivalent to the recognizer generation device 100, can be prepared, one per feature, so that an independent recognizer is generated for each feature. In this case, the feature extraction units 101 of these recognizer generation units can be configured to extract different features. More specifically, when teacher data is input to the recognizer generation device 100, it is passed to the feature extraction unit 101 of each recognizer generation unit. Each feature extraction unit 101 has its own winner node extraction unit 102, node insertion determination unit 103, edge management unit 104, node weight update unit 105, label weight update unit 106, and node deletion unit 107, and these components perform the same processing as steps S401 to S315 described above.

かかる構成により、認識器生成装置１００は、複数の特徴量に対応する複数の認識器を生成することができる。 With this configuration, the recognizer generation device 100 can generate a plurality of recognizers corresponding to a plurality of feature amounts.

<Configuration of recognition device>
Next, a recognition device 200 capable of transfer learning using the recognizer generated by the recognizer generation device 100 is described. The recognition device 200 uses the recognizer to recognize the attributes of the input data and performs transfer learning by recognizing the target class from those attributes.

The configuration of the recognition device 200 according to the present embodiment is described with reference to FIG. 16. The recognition device 200 is typically realized by a computer such as a dedicated computer or a personal computer (PC).

Components 201 to 203 of the recognition device 200 each have the function of executing various controls based on programs stored in a storage means (not shown), and are realized by a central processing unit (CPU), read-only memory (ROM), random access memory (RAM), input/output ports (I/O), and the like.

特徴抽出部２０１は、入力データから特徴量を重みベクトルとして抽出する処理を行う。 The feature extraction unit 201 performs a process of extracting feature amounts as weight vectors from input data.

The feature extraction unit 201 may be the same as the feature extraction unit 101 described above. That is, the feature extraction units 101 and 201 may function to extract the features of the teacher data during learning and the features of the input data during recognition.

The recognizer 202 treats the target to be recognized from the input data as a class and recognizes that class through its characteristic attributes, which makes transfer learning possible. It is generated by inputting teacher data, to which classes and attributes are attached as label weights, into the recognizer generation device 100 described above and learning the features of that data.

すなわち、この認識器２０２は、認識器生成装置１００に、所定数の前記教師データを入力した後の各ノード（学習済ノード）により構成されている。 That is, the recognizer 202 is configured by each node (learned node) after inputting a predetermined number of the teacher data to the recognizer generation device 100.

The result output unit 203 recognizes the attributes and class of the input node according to the distances between the learned nodes of the recognizer 202 and the weight vector extracted from the input data, and outputs the recognition result. An example of the recognition method follows.

まず、結果出力部２０３は、認識器２０２に入力された重みベクトル（入力ノード）と、既存のノードとの距離、典型的にはユークリッド距離をそれぞれ計算する。この計算の結果を用いて、入力ノードと最も近いｋ個（ｋは任意の自然数）のノードを抽出する。つぎに、結果出力部２０３は、これらｋ個のノードが有するラベル重みに基づいて、入力ノードの属性及びクラスを認識する。 First, the result output unit 203 calculates the distance between the weight vector (input node) input to the recognizer 202 and the existing node, typically the Euclidean distance. Using the result of this calculation, k nodes (k is an arbitrary natural number) closest to the input node are extracted. Next, the result output unit 203 recognizes the attribute and class of the input node based on the label weights of these k nodes.
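
The k-nearest-node step described above can be sketched as follows. The source defers the exact combination of label weights to Equations 19 to 22, so the rule used here, the share of positive label weight among the k nearest learned nodes, is only an illustrative assumption. The function name `attribute_scores` and the `[negative, positive]` array layout are likewise hypothetical.

```python
import numpy as np


def attribute_scores(node_vectors, label_weights, x, k):
    """Score each attribute from the k learned nodes nearest to x.

    node_vectors: (n_nodes, dim) weight vectors of the learned nodes.
    label_weights: (n_nodes, n_attributes, 2) array holding, per node and
        attribute, a [negative, positive] label weight (assumed layout).
    Returns an (n_attributes,) array with the share of positive weight.
    """
    dists = np.linalg.norm(node_vectors - x, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # k nearest learned nodes
    totals = label_weights[nearest].sum(axis=0)       # shape (n_attributes, 2)
    denom = totals.sum(axis=1)
    # Share of positive weight; 0.5 when the neighbors carry no weight at all.
    return np.divide(totals[:, 1], denom,
                     out=np.full(len(denom), 0.5), where=denom > 0)
```

A score near 1 suggests the attribute is positive and a score near 0 suggests it is negative; in the embodiment the final decision additionally uses the threshold T.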

入力ノードの属性及びクラスを認識方法の一例を以下に示す。以下の例は、すべての属性値をｎｅｇａｔｉｖｅ又はｐｏｓｉｔｉｖｅの２値で表す場合の認識方法である。 An example of a method for recognizing attributes and classes of input nodes is shown below. The following example is a recognition method in the case where all attribute values are represented by binary values of negative or positive.

First, for the classes (animals) learned before the ρ-th class, the recognizer generation device 100 executes the learning steps described above as the recognizer generation method. For the ρ-th and subsequent classes (animals), it additionally performs processing to determine the threshold T used to judge attributes (used in Equation 17, described later). In this processing, the probabilities that each attribute is positive and negative are calculated by running recognition, and their averages are taken. Specifically, once the positive and negative probabilities are obtained, the positive average is updated for an attribute that should be positive, and the negative average is updated for an attribute that should be negative. For example, suppose that the image feature of a lion is input and the positive and negative values are calculated for the attribute "is it a carnivore". Since a lion is in fact a carnivore (positive), the positive value is used to update the positive average for this attribute. Repeating this for every animal from the ρ-th onward yields the positive average and the negative average for the attribute. The midpoint of these two averages, that is, their sum divided by 2, is taken as the threshold T.
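
The threshold computation described above amounts to keeping two running averages per attribute and taking their midpoint. A minimal sketch follows. The class name `AttributeThreshold` and its interface are hypothetical, and the caller is assumed to supply the value obtained by recognition for the attribute together with its known true polarity.

```python
class AttributeThreshold:
    """Running positive and negative averages for one attribute, and their midpoint T."""

    def __init__(self):
        self.pos_sum = 0.0
        self.pos_count = 0
        self.neg_sum = 0.0
        self.neg_count = 0

    def update(self, value, is_positive):
        """Add one recognition result for a sample whose true attribute value is known."""
        if is_positive:
            self.pos_sum += value
            self.pos_count += 1
        else:
            self.neg_sum += value
            self.neg_count += 1

    def threshold(self):
        """Return T = (positive average + negative average) / 2."""
        if self.pos_count == 0 or self.neg_count == 0:
            raise ValueError("need at least one positive and one negative sample")
        pos_avg = self.pos_sum / self.pos_count
        neg_avg = self.neg_sum / self.neg_count
        return (pos_avg + neg_avg) / 2
```

For instance, recognition values of 0.8 and 0.6 for two animals that have the attribute and 0.2 and 0.4 for two that do not give averages of 0.7 and 0.3, so T is 0.5.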

After this processing, the recognizer 202 recognizes the class of the input node using the decision formula shown in Equation 17. Here, c is the recognized class, Z is the set of candidate classes, M is the number of attributes, Q is the number of features extracted from the recognition target, and a is the value (positive or negative) of attribute m for class z. T is the midpoint of the averages obtained when recognition was performed during learning, and D is the positive average minus the negative average. U is given by Equation 18. In Equation 18, P for attribute m when the input node of feature q has feature value I is given by Equations 19 and 20. Here, N is the node in the STAR-SOINN for feature q whose weight vector is the t-th closest to the feature value I of the input node, and W is the weight vector of N. The P on the right-hand side of Equations 19 and 20 is given by Equations 21 and 22, which express the probability that attribute m held by node N is positive or negative. L is the label weight of attribute m at node N.

Class recognition is performed based on dictionary data that defines the correspondence between attributes and classes. In this dictionary data, one class name is associated with each combination of attributes and their values. The result output unit 203 compares the attribute types and values recognized for the input node with the dictionary data and, if there is a match, outputs the corresponding class name as the recognition result. Even when no matching class name is stored in the dictionary data, the result output unit 203 can output the recognition result as an undefined class consisting of the recognized attribute types and their values.
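
The dictionary lookup can be illustrated with a short sketch. The function name `lookup_class`, the representation of attribute values as booleans, and the sample entries (including the "striped" attribute and the zebra class) are assumptions made for this example and do not come from the embodiment.

```python
def lookup_class(recognized, dictionary):
    """Return the class whose attribute values all match, else an undefined class.

    recognized: dict mapping attribute name -> True (positive) / False (negative).
    dictionary: dict mapping class name -> dict of attribute name -> bool.
    """
    for class_name, attrs in dictionary.items():
        if attrs == recognized:
            return class_name
    # No entry matches: report the recognized attributes themselves.
    return ("undefined", tuple(sorted(recognized.items())))


# Hypothetical dictionary data with two attributes.
dictionary = {
    "lion": {"carnivore": True, "striped": False},
    "zebra": {"carnivore": False, "striped": True},
}
```

With this data, an input recognized as carnivorous and not striped maps to "lion", while an input recognized as both carnivorous and striped matches no entry and is returned as an undefined class carrying its attribute values.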

<Recognition method>
Next, the operation of the recognition device 200 according to the present embodiment is described in detail with reference to FIG. 17.

Ｓ６０１：認識装置２００に、入力データが入力される。入力データとしては、例えば画像センサをはじめとする種々のセンサ等から入力される任意の情報を利用できる。 S601: Input data is input to the recognition apparatus 200. As input data, for example, arbitrary information input from various sensors including an image sensor can be used.

Ｓ５０２：特徴抽出部２０１は、この入力データから特徴量を抽出する。特徴量の抽出処理は、認識器生成装置１００の特徴抽出部１０１と同様に実施することができる。入力データから抽出された特徴量は、重みベクトルとして、認識器２０２に入力される。 S502: The feature extraction unit 201 extracts a feature amount from this input data. The feature amount extraction processing can be performed in the same manner as the feature extraction unit 101 of the recognizer generation device 100. The feature amount extracted from the input data is input to the recognizer 202 as a weight vector.

Ｓ６０２：結果出力部２０３は、既存のノードのうち、入力ノードと最も近いｋ個（ｋは任意の自然数）の学習済ノードを抽出する。 S602: The result output unit 203 extracts k learned nodes closest to the input node (k is an arbitrary natural number) among the existing nodes.

Ｓ６０３：結果出力部２０３は、これらｋ個の学習済ノードが有するラベル重みに基づいて、入力ノードの属性及びクラスを認識し、認識結果を出力する。 S603: The result output unit 203 recognizes the attribute and class of the input node based on the label weights of these k learned nodes, and outputs the recognition result.

In the embodiment above, the recognition device 200 performs recognition using one recognizer corresponding to one feature extracted from the input data (that is, the case of a single kind of feature Q). When multiple features can be extracted from the input data (multiple Q), however, independent recognizers corresponding to the respective features may be used. For example, the device may have multiple feature extraction units 201, each configured to extract a different feature. In this case, when data is input to the recognition device 200, it is passed to each of these feature extraction units 201. Each feature extraction unit 201 has a corresponding recognizer 202, and the result output unit 203 outputs the recognition result using the parameters obtained from these recognizers 202.

Such a recognizer generation device and recognition device can be realized by a computer such as a dedicated computer or a personal computer (PC). The computer need not be physically a single machine; multiple computers may be used when distributed processing is performed. As shown in FIG. 1, the computer 10 has a CPU 11 (Central Processing Unit), a ROM 12 (Read Only Memory), and a RAM 13 (Random Access Memory), which are interconnected via a bus 14. A description of the OS software and the like needed to run the computer is omitted, but the computer constituting this information processing apparatus is naturally assumed to include it.

An input/output interface 15 is also connected to the bus 14. Connected to the input/output interface 15 are, for example, an input unit 16 comprising a keyboard, mouse, sensors, and the like; an output unit 17 comprising a display such as a CRT or LCD and headphones, speakers, and the like; a storage unit 18 comprising a hard disk or the like; and a communication unit 19 comprising a modem, terminal adapter, or the like.

ＣＰＵ１１は、ＲＯＭ１２に記憶されている各種プログラム、又は記憶部１８からＲＡＭ１３にロードされた各種プログラムに従って各種の処理、本実施の形態においては、例えば最近傍プロトタイプ選択手段３４やプロトタイプ削除手段３５における処理を実行する。ＲＡＭ１３には又、ＣＰＵ１１が各種の処理を実行する上において必要なデータなども適宜記憶される。 The CPU 11 performs various processes according to various programs stored in the ROM 12 or various programs loaded from the storage unit 18 to the RAM 13, and in this embodiment, for example, processes in the nearest prototype selection unit 34 and the prototype deletion unit 35. Execute. The RAM 13 also appropriately stores data necessary for the CPU 11 to execute various processes.

通信部１９は、例えば図示しないインターネットを介しての通信処理を行ったり、ＣＰＵ１１から提供されたデータを送信したり、通信相手から受信したデータをＣＰＵ１１、ＲＡＭ１３、記憶部１８に出力したりする。記憶部１８はＣＰＵ１１との間でやり取りし、情報の保存・消去を行う。通信部１９は又、他の装置との間で、アナログ信号又はディジタル信号の通信処理を行う。 The communication unit 19 performs, for example, communication processing via the Internet (not shown), transmits data provided from the CPU 11, and outputs data received from the communication partner to the CPU 11, the RAM 13, and the storage unit 18. The storage unit 18 exchanges with the CPU 11 to save and erase information. The communication unit 19 also performs communication processing of analog signals or digital signals with other devices.

入出力インターフェイス１５は又、必要に応じてドライブ２０が接続され、例えば、磁気ディスク２０１、光ディスク２０２、フレキシブルディスク２０３、又は半導体メモリ２０４などが適宜装着され、それらから読み出されたコンピュータプログラムが必要に応じて記憶部１８にインストールされる。 The input / output interface 15 is also connected to the drive 20 as necessary. For example, a magnetic disk 201, an optical disk 202, a flexible disk 203, or a semiconductor memory 204 is appropriately mounted, and a computer program read from them is required. Is installed in the storage unit 18 according to the above.

<Example>
FIGS. 19 to 22 show the results of evaluating the recognizer generation device 100 and the recognition device 200 of the present embodiment (hereinafter AT-STAR-SOINN, for Attribute Transfer-STAR-SOINN) against the DAP and IAP methods of Lampert et al. described in Non-Patent Document 1 and against AT-SOINN described above. The parameters used in this evaluation were λ = 600, age = 100, η = 0, k = 13, and ρ = 25, chosen on the basis of preliminary experiments. There were 24,295 training images and 6,180 test images. All experiments were run on a personal computer with a 2.93 GHz CPU and 8 GB of memory.

図１９に、本実施の形態のＡＴ−ＳＴＡＲ−ＳＯＩＮＮ、及び上記他の手法の認識率の比較結果を示す。図１９より、ＡＴ−ＳＴＡＲ−ＳＯＩＮＮにおける認識率は、バッチで学習したＤＡＰには劣るが、ＡＴ−ＳＯＩＮＮと同等であることがわかる。 FIG. 19 shows a comparison result of the recognition rates of the AT-STAR-SOINN of this embodiment and the other methods described above. From FIG. 19, it can be seen that the recognition rate in AT-STAR-SOINN is inferior to DAP learned in batch, but is equivalent to AT-SOINN.

FIG. 20 compares the number of nodes held by the SOINNs (STAR-SOINN or Adjusted-SOINN) of AT-STAR-SOINN and AT-SOINN. According to FIG. 20, AT-STAR-SOINN reduced the node count by 99.47% relative to AT-SOINN. This is because in AT-SOINN the number of Adjusted-SOINNs is proportional to the number of attributes, whereas AT-STAR-SOINN manages attributes with labels inside STAR-SOINN and can therefore learn without the number of SOINNs changing as attributes are added or removed. In the present embodiment, the recognition rate hardly decreased, as noted above, despite this large reduction in the amount of information (SOINNs).

FIGS. 21 and 22 compare the learning time and recognition time of AT-STAR-SOINN with those of the other methods. According to FIG. 21, learning with AT-STAR-SOINN is about 13,616 times faster than with DAP and about 47 times faster than with AT-SOINN. According to FIG. 22, recognition with AT-STAR-SOINN is about 1,892 times faster than with DAP and about 156 times faster than with AT-SOINN.

Thus, according to the present embodiment, attribute learning and transfer that is fast, online, and capable of incremental learning can be realized without lowering the recognition rate.

<Third Embodiment>
Next, a robot apparatus equipped with a recognizer generated by the recognizer generation device of Embodiment 2 is described. FIG. 23 is a block diagram showing the robot apparatus according to the present embodiment.

The robot apparatus 300 includes an input data acquisition unit 301 and a recognition device 200. The recognition device 200 may be the same as the recognition device 200 described in Embodiment 2.

入力データ取得部３０１は、例えばカメラを備えた撮像部である。この場合、入力データ取得部３０１は、入力データとして画像データを取得することができる。なお、入力データ取得部３０１は、種々のセンサ等により他の種類のデータを取得できるものであってもよい。 The input data acquisition unit 301 is an imaging unit including a camera, for example. In this case, the input data acquisition unit 301 can acquire image data as input data. The input data acquisition unit 301 may be capable of acquiring other types of data using various sensors or the like.

The recognition device 200 takes the data acquired by the input data acquisition unit 301 as input data and recognizes the class, attributes, and the like of that input data. This recognition can be performed by the same procedure as in Embodiment 2.

なお、ロボット装置３００は、上述の認識器生成装置１００と同様の構成をさらに備えることにより、入力データを教師データとして追加学習を行うよう構成されてもよい。このとき、入力データ取得部３０１は、入力データに付与されるラベル情報を取得することが望ましい。ラベル情報は、例えば入力データが取得される度に人が入力することとしてもよく、あるいは図示しないラベリング部が、所定のアルゴリズムに従って自動的に付与するものであってもよい。 Note that the robot apparatus 300 may be configured to perform additional learning using input data as teacher data by further including a configuration similar to that of the recognizer generation apparatus 100 described above. At this time, it is desirable that the input data acquisition unit 301 acquires label information given to the input data. The label information may be input by a person every time input data is acquired, or may be automatically provided by a labeling unit (not shown) according to a predetermined algorithm.

また、ロボット装置３００は、例えば自走可能な車輪等の移動手段を有し、この移動手段により移動しながら、入力データ取得部３０１により入力データを取得するものであってもよい。 The robot apparatus 300 may include a moving unit such as a self-propelled wheel, and may acquire input data by the input data acquiring unit 301 while moving by the moving unit.

The robot apparatus 300 may also have a communication unit capable of communicating with other robot apparatuses, recognition devices, recognizer generation devices, or storage devices or databases in which various learned nodes are registered, and may be configured to acquire learned nodes from such devices via this communication unit.

In general, the kinds of attributes and the dictionary data needed to recognize a target can change with the environment and other conditions. For example, the attribute group and dictionary data needed to recognize kinds of animals will differ from those needed to recognize places. In such cases, the robot apparatus 300 can be configured to download, via the communication unit, a node group already trained with teacher data carrying the specific attributes required for the recognition target. By using this node group as a recognizer, the robot apparatus 300 can recognize classes and attributes appropriately. Sharing recognizers suited to the situation over a network in this way allows learning results from other devices to be reused and makes it possible to provide a robot apparatus 300 that performs appropriate recognition efficiently.

In the present embodiment, the robot apparatus 300 can learn and recognize at high speed using STAR-SOINN. Such a reduction in processing time is indispensable for robots that must operate in real time, and in this respect the present invention is well suited to robots.

<Other embodiments>
In addition to the embodiments above, the present invention can also be applied to a mobile terminal. For example, the mobile terminal includes an input unit comprising a camera and a microphone, and the recognition device 200 described above. The mobile terminal recognizes data entered through the input unit with the recognition device 200. As a result, even for something the user has never seen or heard before, the mobile terminal can be used to infer what it is.

このとき、認識装置２００は、インターネットなどのネットワーク上に配置されていてもよい。また、認識装置２００は、インターネットなどのネットワーク上に配置された、学習済みノードのデータベースを用いて、その国の環境や文化に合わせた学習、及び認識を行うこととしてもよい。さらに、このデータベースは、このような学習済みノードのデータを、位置情報と関連付けて記憶していてもよい。 At this time, the recognition apparatus 200 may be arranged on a network such as the Internet. The recognition apparatus 200 may perform learning and recognition in accordance with the environment and culture of the country using a database of learned nodes arranged on a network such as the Internet. Further, this database may store such learned node data in association with the position information.

Many mobile terminals have a position-identifying function such as GPS. By using the position information obtained with such a function as a key to retrieve suitable learned nodes from the database and using them for recognition, the recognition device 200 can perform appropriate recognition that also takes location into account. For example, when the recognition target is a towel, learned nodes specialized for "places where part of the body is washed", such as a washroom or bathhouse, would allow recognition that includes the meaning "something for drying the body", while learned nodes specialized for "places with desks and the like", such as a dining room or living room, would allow recognition that includes the meaning "something for wiping desks and furniture". Even objects whose meaning differs by place in this way can be recognized with the method of the present embodiment.

A learning method using such a network is also possible with a conventional learning method that does not use transfer, but it is not realistic to hold learned data on the network for everything found in an everyday environment. In this regard, the present invention, which uses STAR-SOINN to perform transfer learning, can recognize unlearned targets on the basis of already learned data. In this respect, the present invention is well suited to the mobile terminal described above, which requires recognition processing in an everyday environment. In addition, the present invention requires less computation than conventional learning methods and can maintain real-time performance even with low processing capability, which also makes it suitable for installation in a mobile terminal.

The present invention is not limited to the embodiments described above, and various modifications can of course be made without departing from the gist of the present invention.

In the embodiments described above, Adjusted-SOINN is used as the self-propagating neural network. However, the present invention is not limited to this, and may instead use a known clustering tool such as Enhanced-SOINN (Japanese Unexamined Patent Application Publication No. 2008-217246) or k-means.

In the embodiments described above, one class (an animal) is recognized from a single image, but it is also possible to recognize one animal from a plurality of consecutive images, such as a moving image. Time-series data can be learned with SOINN by using, for example, the method described in the non-patent document "Shogo Okada and Osamu Hasegawa, On-line Learning of Sequence Data Based on Self-organizing Incremental Neural Network, International Joint Conference on Neural Networks, 2008."

Furthermore, although animals are mainly used as examples of identification targets (classes) in the embodiments described above, the present invention is not limited to this and can be applied to the identification of any object, space, event, and so on. For example, classes of objects found in a kitchen, such as "cup," "paper cup," "bottle," and "kettle," may be identified by attributes such as "holds water," "made of paper," and "made of metal." Likewise, classes of stationery such as "pen," "ballpoint pen," "scissors," and "cutter" can be identified by attributes such as "long and thin," "used for writing," "used for cutting," and "metal blade," and classes of outdoor objects such as "bicycle," "automobile," "motorcycle," and "truck" can be identified by attributes such as "tires," "handlebars or steering wheel," "cargo bed," and "exhaust outlet."

Although the embodiments described above mainly use attributes as labels, the present invention is not limited to this and can be applied to transfer learning of any kind of label.

Any of the processing in the recognizer generation device and the recognizer described above can also be implemented by causing, for example, a CPU (Central Processing Unit) to execute a computer program. In this case, the computer program can be stored on various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)). The program may also be supplied to a computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to a computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.

DESCRIPTION OF SYMBOLS
1 Attribute learning and transfer system
2 Feature extraction unit
3 Labeling unit
4 Classifier holding unit
5 Classifier generation unit
6 Attribute identification unit
7 Class identification unit
100 Recognizer generation device
101 Feature extraction unit
102 Winner node extraction unit
103 Node insertion determination unit
104 Edge management unit
105 Node weight management unit
106 Label weight update unit
107 Node deletion unit
200 Recognition device
201 Feature extraction unit
202 Recognizer
203 Result output unit
300 Robot device
301 Input data acquisition unit

Claims

A recognizer generation device that generates, by learning feature amounts of teacher data, a recognizer that recognizes a class to be identified by means of attributes that are characteristics of the class, the recognizer generation device comprising:
A feature extraction unit that extracts a feature amount as a weight vector from teacher data to which the class and the attribute are attached as label weights;
A winner node extraction unit that takes the extracted weight vector as an input node, calculates the distance between the input node and each existing node, and extracts the node closest to the input node and the node second closest to it as a first winner node and a second winner node, respectively;
A node insertion determination unit that determines whether to insert the input node as a new node based on the distances between the input node and the first and second winner nodes;
An edge management unit that, when the input node is not inserted as a new node, generates an edge between the first winner node and the second winner node with its age set to 0 if no such edge exists, increments the ages of all edges of the first winner node, and deletes edges that have reached a predetermined age;
A node weight update unit that, when the input node is not inserted as a new node, updates the weight vector of the first winner node based on the weight vector of the input node;
A label weight update unit that, when the input node is not inserted as a new node, diffuses at least a part of the label weight of the input node to the first and second winner nodes; and
A node deletion unit that deletes nodes according to the node density at a predetermined timing,
Wherein, when the node deletion unit deletes a node, the label weight update unit diffuses at least a part of the label weight of the deleted node to nodes surrounding the deleted node.
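The processing of a single input in the device claimed above can be sketched roughly as follows. This is a minimal sketch and not the claimed implementation: the fixed `threshold`, the running-mean weight update, the `share` fraction used for label diffusion, the order of the edge sub-steps, and the names `Node`, `dist`, and `learn_step` are all assumptions chosen for illustration. Node deletion is left out.

```python
import math

class Node:
    def __init__(self, weight, labels):
        self.weight = list(weight)   # weight vector (feature amount)
        self.labels = dict(labels)   # label weights: class or attribute -> weight
        self.wins = 1                # how often this node was the first winner

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def learn_step(nodes, edges, x, x_labels, threshold, max_age=50, share=0.5):
    """Process one input. `edges` maps frozenset({i, j}) -> age.
    Returns True if the input was inserted as a new node."""
    if len(nodes) < 2:
        nodes.append(Node(x, x_labels))
        return True
    # Winner node extraction: closest and second-closest nodes.
    order = sorted(range(len(nodes)), key=lambda i: dist(nodes[i].weight, x))
    s1, s2 = order[0], order[1]
    # Node insertion determination (assumed: one fixed threshold for both winners).
    if dist(nodes[s1].weight, x) > threshold or dist(nodes[s2].weight, x) > threshold:
        nodes.append(Node(x, x_labels))
        return True
    # Edge management: age the first winner's edges, drop old ones,
    # then make sure an s1-s2 edge exists (new edges start at age 0).
    for e in list(edges):
        if s1 in e:
            edges[e] += 1
            if edges[e] > max_age:
                del edges[e]
    edges.setdefault(frozenset((s1, s2)), 0)
    # Node weight update (assumed: running mean toward the input).
    n1 = nodes[s1]
    n1.wins += 1
    n1.weight = [w + (xi - w) / n1.wins for w, xi in zip(n1.weight, x)]
    # Label weight update: diffuse part of the input's label weight to both winners.
    for idx in (s1, s2):
        for label, v in x_labels.items():
            nodes[idx].labels[label] = nodes[idx].labels.get(label, 0.0) + share * v
    return False

nodes, edges = [], {}
learn_step(nodes, edges, (0.0, 0.0), {"striped": 1.0}, threshold=1.0)
learn_step(nodes, edges, (1.0, 0.0), {"four-legged": 1.0}, threshold=1.0)
inserted = learn_step(nodes, edges, (0.2, 0.0), {"striped": 1.0}, threshold=1.0)
```

In the usage lines, the third input falls within the threshold of both winners, so it is absorbed rather than inserted, and its "striped" label weight is shared with both winner nodes.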

The recognizer generation device according to claim 1, wherein the attribute is multi-value data.

The recognizer generation device according to claim 1 or 2, wherein a plurality of attributes are set for each class, and the attributes may differ from class to class.

The recognizer generation device according to any one of claims 1 to 3, further comprising an attribute optimization unit that, each time the age of a node reaches a predetermined age, deletes attributes unnecessary for that node from among the attributes contained in the label weight of the node.

The recognizer generation device according to claim 1, wherein the label weight update unit diffuses only the attributes among the label weights of the deleted node to the surrounding nodes.

The recognizer generation device according to any one of claims 1 to 5, wherein the node deletion unit increments the ages of all nodes each time a new node is input and, each time a predetermined age is reached, deletes nodes that are connected by edges to fewer than a predetermined number of nodes.
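Node deletion combined with the diffusion of the deleted node's label weights might look like the sketch below. It is illustrative only: equal sharing among edge-connected neighbors, the parameter `min_neighbors`, and the function name `delete_sparse_nodes` are assumptions, and the age-based timing of the deletion is omitted.

```python
def delete_sparse_nodes(labels, edges, min_neighbors=2):
    """labels: node id -> {label: weight}; edges: set of frozenset({a, b}).
    Deletes nodes with fewer than `min_neighbors` edge-connected neighbors and
    shares their label weights equally among their surviving neighbors."""
    neighbors = {n: set() for n in labels}
    for e in edges:
        a, b = tuple(e)
        neighbors[a].add(b)
        neighbors[b].add(a)
    doomed = {n for n in labels if len(neighbors[n]) < min_neighbors}
    for n in doomed:
        survivors = [m for m in neighbors[n] if m not in doomed]
        for m in survivors:
            for label, v in labels[n].items():
                labels[m][label] = labels[m].get(label, 0.0) + v / len(survivors)
    for n in doomed:
        del labels[n]
    remaining_edges = {e for e in edges if not (e & doomed)}
    return labels, remaining_edges

# Nodes 0, 1, 2 form a triangle; node 3 hangs off node 0 by a single edge.
labels = {0: {"a": 1.0}, 1: {"a": 1.0}, 2: {"a": 1.0}, 3: {"b": 2.0}}
edges = {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2)), frozenset((3, 0))}
labels, edges = delete_sparse_nodes(labels, edges)
```

Here node 3 is removed as a low-density node, and its label weight for "b" is handed to node 0 instead of being lost.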

The recognizer generation device according to claim 1, comprising a plurality of recognizer generation units each including the feature extraction unit, the winner node extraction unit, the node insertion determination unit, the edge management unit, the node weight update unit, the label weight update unit, and the node deletion unit, wherein the feature extraction units extract mutually different feature amounts from the teacher data.

The recognizer generation device according to claim 1, wherein the node insertion determination unit calculates a similarity threshold, which is a threshold for determining whether to add the input node as a new node, based on the distances between each of the first and second winner nodes and its surrounding nodes, and determines whether to insert the input node as a new node based on the similarity threshold.

The recognizer generation device according to claim 1, wherein the node insertion determination unit sets, as first and second similarity thresholds for the first and second winner nodes respectively, the distance to the farthest neighboring node when the winner node has neighboring nodes connected to it by edges, or the distance to the nearest node when it has no such neighboring node, and inserts the input node as a new node when the distance between the input node and the first or second winner node is larger than the corresponding similarity threshold.
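The similarity-threshold rule can be sketched as below, under the reading that the input is inserted when its distance to either winner exceeds that winner's own threshold. The data layout (dictionaries keyed by node id) and the names `similarity_threshold` and `should_insert` are assumptions made for the example.

```python
import math

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def similarity_threshold(i, weights, neighbors):
    """Threshold for node i: distance to its farthest edge-connected neighbor,
    or, if it has no neighbor, distance to the nearest other node."""
    if neighbors.get(i):
        return max(dist(weights[i], weights[j]) for j in neighbors[i])
    return min(dist(weights[i], weights[j]) for j in weights if j != i)

def should_insert(x, s1, s2, weights, neighbors):
    """Insert the input as a new node if it lies beyond either winner's threshold."""
    t1 = similarity_threshold(s1, weights, neighbors)
    t2 = similarity_threshold(s2, weights, neighbors)
    return dist(x, weights[s1]) > t1 or dist(x, weights[s2]) > t2

# Nodes 0 and 1 are connected by an edge; node 2 is isolated.
weights = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (4.0, 0.0)}
neighbors = {0: {1}, 1: {0}}
```

With this layout, an input at (0.5, 0.0) lies within both winners' thresholds and is absorbed, whereas an input at (0.0, 3.0) exceeds the threshold of node 0 and would be inserted as a new node.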

The recognizer generation device according to claim 1, further comprising a labeling unit that labels the teacher data with the class and the attribute as label weights.

The recognizer generation device according to any one of claims 1 to 10, wherein each node present after a predetermined number of teacher data have been input is set as a learned node, and a discriminator is configured from the learned nodes.

A recognizer generation method for generating, by learning feature amounts of teacher data, a recognizer that recognizes a class to be identified by means of attributes that are characteristics of the class, the method comprising:
A feature extraction step of extracting a feature amount as a weight vector from teacher data to which the class and the attribute are attached as label weights;
A winner node extraction step of taking the extracted weight vector as an input node, calculating the distance between the input node and each existing node, and extracting the node closest to the input node and the node second closest to it as a first winner node and a second winner node, respectively;
A node insertion determination step of determining whether to insert the input node as a new node based on the distances between the input node and the first and second winner nodes;
An edge management step of, when the input node is not inserted as a new node, generating an edge between the first winner node and the second winner node with its age set to 0 if no such edge exists, incrementing the ages of all edges of the first winner node, and deleting edges that have reached a predetermined age;
A node weight update step of updating the weight vector of the first winner node based on the weight vector of the input node when the input node is not inserted as a new node;
A first label weight update step of diffusing at least a part of the label weight of the input node to the first and second winner nodes when the input node is not inserted as a new node;
A node deletion step of deleting nodes according to the node density at a predetermined timing; and
A second label weight update step of diffusing at least a part of the label weight of the deleted node to nodes surrounding the deleted node when a node is deleted in the node deletion step.

A recognition device that recognizes a class to be identified from input data and enables transfer learning by recognizing the class by means of attributes that are characteristics of the class, the recognition device comprising:
A feature extraction unit that extracts a feature amount as a weight vector from the input data;
A recognizer that is generated by inputting teacher data, to which the class and the attribute are attached as label weights, into a recognizer generation device and learning the feature amounts, and that is configured using a self-propagating neural network including learned nodes and edges connecting the learned nodes; and
A result output unit that outputs a recognition result according to the distances between a plurality of learned nodes, each composed of a weight vector of the recognizer, and the weight vector extracted from the input data,
Wherein the recognizer generation device includes:
A feature extraction unit that extracts a feature amount from the teacher data as a weight vector;
A winner node extraction unit that takes the extracted weight vector as an input node, calculates the distance between the input node and each existing node, and extracts the node closest to the input node and the node second closest to it as a first winner node and a second winner node, respectively;
A node insertion determination unit that determines whether to insert the input node as a new node based on the distances between the input node and the first and second winner nodes;
An edge management unit that, when the input node is not inserted as a new node, generates an edge between the first winner node and the second winner node with its age set to 0 if no such edge exists, increments the ages of all edges of the first winner node, and deletes edges that have reached a predetermined age;
A node weight update unit that, when the input node is not inserted as a new node, updates the weight vector of the first winner node based on the weight vector of the input node;
A label weight update unit that, when the input node is not inserted as a new node, diffuses at least a part of the label weight of the input node to the first and second winner nodes; and
A node deletion unit that deletes nodes according to the node density at a predetermined timing,
Wherein, when the node deletion unit deletes a node, the label weight update unit diffuses at least a part of the label weight of the deleted node to nodes surrounding the deleted node,
Each node present after a predetermined number of the teacher data have been input is regarded as a learned node, and
The recognition device recognizes the attribute of the input data according to the similarity between the weight vector extracted from the input data and the learned nodes.

The recognition apparatus according to claim 13, wherein the feature extraction unit extracts a feature amount of the teacher data during learning and extracts a feature amount of the input data during recognition.

The recognition device according to claim 13 or 14, wherein the result output unit extracts the N (N is a natural number) learned nodes closest to the weight vector extracted from the input data and, based on the label weights of the N learned nodes, outputs class and attribute information as a recognition result.
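The N-nearest-node output described in this claim can be sketched as follows. The aggregation rule (summing the label weights of the N nearest learned nodes and ranking labels by the total) is an assumption made for illustration, as are the function name `recognize` and the sample values.

```python
import math
from collections import defaultdict

def recognize(feature, learned_nodes, n=3):
    """learned_nodes: list of (weight_vector, {label: weight}).
    Sums the label weights of the n learned nodes nearest to `feature`
    and returns (label, total) pairs sorted by descending total."""
    def distance(node):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(node[0], feature)))
    nearest = sorted(learned_nodes, key=distance)[:n]
    totals = defaultdict(float)
    for _, labels in nearest:
        for label, weight in labels.items():
            totals[label] += weight
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Made-up learned nodes: two near the origin, one far away.
learned = [
    ((0.0, 0.0), {"striped": 1.0, "zebra": 1.0}),
    ((0.1, 0.0), {"striped": 0.5}),
    ((5.0, 5.0), {"spotted": 3.0}),
]
result = recognize((0.0, 0.1), learned, n=2)
```

With N = 2, the distant "spotted" node is ignored even though its label weight is the largest, because only the nearest learned nodes contribute to the result.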

The recognition device according to claim 15, wherein the result output unit has dictionary data indicating the correspondence between the attributes and the classes, refers to the dictionary data based on the label weights of the N learned nodes, and outputs the class of the input data as a recognition result.

An attribute learning and transfer system comprising:
A feature extraction unit that extracts features from input data and teacher data;
A labeling unit that labels the teacher data with the attribute information given to the teacher data;
A classifier generation unit that generates an attribute classifier for identifying attributes included in the input data, wherein the attribute classifier is configured using a self-propagating neural network including nodes and edges connecting the nodes, the self-propagating neural network is divided into a plurality of parts according to the identification contents identified by the attribute, and the classifier generation unit inputs the features of the teacher data extracted by the feature extraction unit, as an input pattern, to the part of the self-propagating neural network specified by the attribute information labeled by the labeling unit, and generates the nodes and the edges in the self-propagating neural network based on the input pattern;
A classifier holding unit that holds the attribute classifier generated by the classifier generation unit;
An attribute identification unit that, when the input data is input, inputs the features extracted from the input data by the feature extraction unit, as an input pattern, to each part of the self-propagating neural network constituting the attribute classifier held by the classifier holding unit, calculates in each part a first similarity between the input pattern and the nodes included in the self-propagating neural network, and identifies, according to the calculated first similarity, which identification content of the attribute the input data includes; and
A class identification unit that is given the attribute information of each of a plurality of classes, compares the attributes of the input data identified by the attribute identification unit with the attribute information of each class to obtain a second similarity, and identifies, according to the calculated second similarity, which of the plurality of classes the input data belongs to.
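The class identification step, which compares identified attributes with per-class attribute information, can be sketched as below. The binary attribute encoding, the use of the fraction of matching attributes as the second similarity, and the names `CLASS_ATTRIBUTES` and `identify_class` are assumptions for illustration; the claim does not fix a particular similarity measure.

```python
# Hypothetical attribute information given for each class (1 = has the attribute).
CLASS_ATTRIBUTES = {
    "zebra": {"striped": 1, "four-legged": 1, "flies": 0},
    "eagle": {"striped": 0, "four-legged": 0, "flies": 1},
}

def identify_class(identified, class_attributes=CLASS_ATTRIBUTES):
    """identified: attribute -> 0/1 as output by the attribute identification step.
    Returns the best-matching class and the per-class similarity scores."""
    def similarity(expected):
        # Assumed second similarity: fraction of attributes that match.
        matches = sum(identified.get(attr) == value for attr, value in expected.items())
        return matches / len(expected)
    scores = {name: similarity(attrs) for name, attrs in class_attributes.items()}
    return max(scores, key=scores.get), scores

best, scores = identify_class({"striped": 1, "four-legged": 1, "flies": 0})
```

Because the class is chosen from attribute information alone, a class with no teacher images of its own can still be identified as long as its attribute list is supplied, which is the transfer effect the description relies on.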

The attribute learning and transfer system according to claim 17, wherein the classifier generation unit divides the attribute classifier into a first part indicating that the attribute is included and a second part indicating that the attribute is not included.