JP2008217246A

JP2008217246A - Information processor, information processing method, and program

Info

Publication number: JP2008217246A
Application number: JP2007051709A
Authority: JP
Inventors: Osamu Hasegawa (長谷川 修); Furao Shen (申 富饒); Kazuki Ogura (小倉 和貴)
Original assignee: Tokyo Institute of Technology NUC
Current assignee: Tokyo Institute of Technology NUC
Priority date: 2007-03-01
Filing date: 2007-03-01
Publication date: 2008-09-18
Anticipated expiration: 2027-03-01
Also published as: JP5130523B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processor, an information processing method, and a program capable of separating classes whose distributions overlap with high density.
SOLUTION: This information processor is provided with: a node density calculation means 27 for calculating the node density of a node of interest based on the mean distance between the node of interest and its adjacent nodes; a distribution overlap region detection means 28 for dividing a cluster, which is a group of nodes connected by edges, into sub-clusters based on the node density calculated by the node density calculation means 27, and for detecting the overlap region of the distributions; an edge connection decision means 29 for deciding, when the winner nodes are located in the distribution overlap region, whether to connect an edge between the winner nodes based on the node densities of the winner nodes; an edge connection means 30 for connecting an edge between the winner nodes based on the decision result; and an edge deletion means 31 for deleting the edge between the winner nodes based on the decision result.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an information processing apparatus, an information processing method, and a program that sequentially receive input vectors belonging to arbitrary classes and learn the input distribution structure of those input vectors.

Techniques using competitive neural networks are well known for clustering, that is, for classifying input data into an arbitrary number of clusters. Competitive neural networks are a representative technique for unsupervised classification in the field of machine learning.

In a competitive neural network, when an input vector (training data) is given to the input layer, the distance between the input vector and the reference vector of each neuron in the competitive layer is calculated. Learning proceeds by updating the reference vector of the neuron whose reference vector is closest to the input vector, together with the reference vectors of the neighboring neurons located near it, so that they move closer to the input vector.
By giving input vectors sequentially and repeating this learning, a feature map reflecting the topological structure of the input vectors is formed in the competitive layer, and unsupervised clustering of the input vectors is achieved.
Here, a feature map is a network composed of a group of neurons and the edges connecting them.
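The update described above can be sketched as follows. This is a minimal illustration, not the method of any cited document: the function name competitive_step, the learning rates lr_winner and lr_neighbor, and the representation of the neighborhood as an adjacency dictionary are all assumptions made for the example.

```python
import math

def competitive_step(weights, neighbors, x, lr_winner=0.1, lr_neighbor=0.01):
    """One competitive-learning update; returns the index of the winner neuron.

    weights   : list of reference vectors (lists of floats), updated in place
    neighbors : dict mapping a neuron index to the indices of its neighbors
    x         : input vector
    The learning rates are hypothetical values chosen for illustration.
    """
    # Find the neuron whose reference vector is closest to the input.
    winner = min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
    # Move the winner toward the input vector.
    weights[winner] = [w + lr_winner * (xi - w) for w, xi in zip(weights[winner], x)]
    # Move each neighbor of the winner toward the input by a smaller amount.
    for j in neighbors.get(winner, ()):
        weights[j] = [w + lr_neighbor * (xi - w) for w, xi in zip(weights[j], x)]
    return winner
```

Calling competitive_step repeatedly with successive input vectors gradually moves the reference vectors toward the regions where inputs occur.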

Kohonen's self-organizing map, a representative competitive neural network technique, forms its feature map with a number of neurons fixed in advance, so its classification ability is limited. It is therefore difficult for it to handle additional learning, in which the classes to be learned increase during learning (see Non-Patent Document 1).
In addition, because the reference vector of each neuron is updated every time an input vector is given, the reference vectors of neurons corresponding to previously given input vectors are gradually destroyed.

In contrast, the technique disclosed in Non-Patent Document 2 addresses these problems by adding neurons as needed during learning.
Learning by the Self-Organizing Incremental Neural Network (hereinafter, "SOINN"), the technique disclosed in Non-Patent Document 2, is briefly described below.

SOINN has a two-layer network structure and performs the same learning process in the first and second layers. SOINN uses the learning result output by the first layer as the input vectors to the second layer.

FIG. 16 is a flowchart for explaining the learning process of SOINN, which is a conventional technique. The SOINN process is described below with reference to FIG. 16.
S101: An input vector is given to SOINN.
S102: The node closest to the given input vector (hereinafter, the first winner node) and the second closest node (hereinafter, the second winner node) are searched for.
S103: Based on the similarity thresholds of the first winner node and the second winner node, it is determined whether the input vector belongs to the same cluster as at least one of these winner nodes.
Here, the similarity threshold of a node is calculated based on the concept of the Voronoi region. During learning, the positions of the nodes change gradually so as to approximate the distribution of the input vectors, and the Voronoi regions change accordingly. That is, the similarity threshold also changes adaptively as the node positions change.
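The winner search of S102 can be sketched as follows. The function name find_winners and the use of Euclidean distance are assumptions made for this illustration; the text above does not fix a particular distance measure.

```python
import math

def find_winners(nodes, x):
    """Return (first_winner, second_winner): indices of the two nodes nearest to x.

    nodes : list of weight vectors (lists of floats)
    x     : input vector
    Euclidean distance is assumed here.
    """
    if len(nodes) < 2:
        raise ValueError("at least two nodes are required")
    # Sort node indices by distance to the input and take the two nearest.
    order = sorted(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))
    return order[0], order[1]
```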

S104: If the determination in S103 shows that the input vector belongs to a cluster different from those of the winner nodes, a node is inserted at the same position as the input vector, and the process returns to S101 to process the next input vector.
This insertion is called between-class insertion.
S105: If, on the other hand, the input vector belongs to the same cluster as the winner nodes, an edge is generated between the first winner node and the second winner node so that the two nodes are directly connected by the edge.
S106: The weight vectors of the first winner node and of the nodes directly connected to the first winner node by edges are updated.

S107: The edges generated in S105 have an age, and edges whose age exceeds a preset threshold are deleted.
In online learning, in which input vectors are given sequentially, the positions of the nodes keep changing gradually, so an adjacency relation established in early learning may no longer hold after subsequent learning. Edges that have not been updated for a certain period are therefore made to grow older, so that edges unnecessary for learning are deleted.
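The edge aging of S105 and S107 can be sketched as follows. The text above states only that edges have an age and that edges older than a preset threshold are deleted; the specific rule used here (incrementing the age of every edge incident to the first winner and resetting the age of the edge between the two winners to zero) is an assumption, as are the names refresh_edges and age_max.

```python
def refresh_edges(ages, s1, s2, age_max):
    """Age the edges at the first winner, refresh edge (s1, s2), prune old edges.

    ages    : dict mapping frozenset({a, b}) to the age of edge a-b, updated in place
    s1, s2  : first and second winner nodes
    age_max : preset age threshold (hypothetical parameter name)
    """
    # Assumed aging rule: every edge touching the first winner grows older.
    for edge in ages:
        if s1 in edge:
            ages[edge] += 1
    # Creating or re-confirming the edge between the winners resets its age.
    ages[frozenset((s1, s2))] = 0
    # Delete edges whose age exceeds the threshold.
    for edge in [e for e, a in ages.items() if a > age_max]:
        del ages[edge]
    return ages
```

With this rule, an edge that is repeatedly re-confirmed by new inputs stays young, while an edge that no longer reflects the current node positions ages out.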

S108: It is determined whether the total number of input vectors given so far is a multiple of a preset value λ.
If the total number of input vectors is not a multiple of λ, the process returns to S101 to process the next input vector.
If the total number of input vectors is a multiple of λ, the following processing is executed.

S109: The node with the largest local accumulated error is searched for, and a new node is inserted near that node. Whether the node insertion was successful is determined based on the error radius, which indicates the average error of a node.
This insertion is called within-class insertion.
Here, the distance between a node and an input vector is taken as the error of the node, and the local accumulated error is calculated by accumulating this error as input vectors are given. The error radius is calculated from the error of the node and the number of times the node has been the first winner.
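The bookkeeping for the local accumulated error and the error radius can be sketched as follows. The text above says only that the error radius indicates the average error and is calculated from the node's error and its first-winner count; taking it as the accumulated error divided by the win count is an assumption of this example, and the class name NodeStats is hypothetical.

```python
import math

class NodeStats:
    """Per-node statistics for within-class insertion (illustrative sketch)."""

    def __init__(self):
        self.error = 0.0  # local accumulated error
        self.wins = 0     # number of times this node was the first winner

    def record_win(self, weight, x):
        # The node's error for this input is its distance to the input vector.
        self.error += math.dist(weight, x)
        self.wins += 1

    def error_radius(self):
        # Assumed definition: mean error per win (0.0 if the node never won).
        return self.error / self.wins if self.wins else 0.0
```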

S110: If the node insertion by within-class insertion is determined to be successful, the inserted node and the node with the largest local accumulated error are directly connected by an edge.
If the node insertion by within-class insertion is determined to have failed, the inserted node is deleted and the process proceeds to S111.

S111: Noise nodes are deleted based on the number of adjacent nodes and the number of times a node has been the first winner.
Here, an adjacent node is a node directly connected to the node by an edge, and a node with one or fewer adjacent nodes is a candidate for deletion. In addition, the cumulative number of times a node has been the first winner is compared with a threshold calculated using a preset parameter c, and a node whose cumulative count is below the threshold is a candidate for deletion.

S112: It is determined whether the total number of input vectors given so far is a multiple of a preset value LT.
If the total number of input vectors is not a multiple of LT, the process returns to S101 to process the next input vector.
If the total number of input vectors is a multiple of LT, the following processing is executed.

S113: It is determined whether to end the learning of the first layer.
If the process is to proceed to the learning of the second layer, it returns to S101, and the nodes resulting from the learning of the first layer are given as input vectors to the second layer.
When additional learning is performed, however, the previous learning result remaining in the second layer is erased before the learning of the second layer is started.

When the number of inputs to the second layer reaches a multiple of the preset value LT and the learning of the second layer is to be ended, the nodes are classified into different classes, the number of classes and representative prototype vectors of each class are output, and the process stops.
Here, a prototype vector corresponds to the weight vector of a node.

Thus, SOINN, the technique disclosed in Non-Patent Document 2, can learn non-stationary inputs by autonomously managing the number of nodes, and has many advantages, such as being able to extract the appropriate number of classes and the topological structure even for classes whose distributions have complicated shapes. As an application example of SOINN in pattern recognition, after hiragana character classes have been learned, katakana character classes can be learned additionally.
Non-Patent Document 1: T. Kohonen, "Self-organized formation of topologically correct feature maps," Biol. Cybern., Vol. 43, No. 1, pp. 59-69, Jan. 1982
Non-Patent Document 2: F. Shen and O. Hasegawa, "An incremental network for on-line unsupervised classification and topology learning," Neural Networks, Vol. 19, No. 1, pp. 90-106, 2006

However, when SOINN learns training data in which several classes of input vectors lie close together and the input vector distributions overlap with high density between the classes, the different classes become linked and form a single cluster.
SOINN attempts to solve this problem by learning with a two-layer network structure, but it still cannot separate the classes properly when the overlap of the distributions is dense.

As a specific example, assume that an overlapping portion A of the input vector distributions exists between two classes, class 1 and class 2, as shown in FIG. 1. SOINN can separate the two classes properly when the overlap of the distributions has low density, but when the overlap has high density it cannot separate them, and the classes become linked into a single cluster. That is, class 1 and class 2 are erroneously connected and form one cluster.

In general, many input vectors of the training data lie in the central region of a class, and their number decreases toward the boundary region of the class. Classes can therefore be separated by treating a region where the density of the input vector distribution falls below a predetermined threshold as a class boundary.
However, when the input vector distributions of different classes overlap with high density, a considerable amount of training data exists even in the boundary region between the classes, so the density of the input vector distribution exceeds the predetermined threshold and the classes cannot easily be separated.
Simply setting the threshold to a larger value does not solve the problem either, because a region that is not actually a class boundary may then be judged to be a boundary region, which makes the learning result unstable.

On the other hand, by giving each node a density, the density of the distribution of the input vectors in the training data can be estimated. The density of a node is defined by the number of input vectors given locally. That is, if many input vectors are given near a node of interest, the density of the input vector distribution in the training data near that node is considered high; if almost no input vectors are given near the node of interest, the density near that node is considered low.

For this reason, competitive neural networks including SOINN define the density of a node as its win count, that is, the number of times the node has been the first winner node.
Although this conventional definition of node density based on the win count is a natural one, it has the following problems.
First, many nodes are generally inserted in a region where the input vector distribution is dense, so each node in such a region has fewer chances to become the first winner node. That is, a node located in a denser region does not necessarily have a larger win count.
Second, when additional learning is performed, nodes generated in earlier learning often do not become the first winner node. That is, the win counts of nodes generated in earlier learning become relatively small as additional learning proceeds, which adversely affects the results obtained in the earlier learning.

Thus, the conventional node density based on the win count is insufficient for detecting distribution overlap regions that can form class boundaries, and classes whose distributions overlap with high density cannot be separated properly.

Furthermore, SOINN requires a two-layer network structure in order to remove noise data contained in the input vectors effectively. Because of the extension to a two-layer network structure, the end of the learning process in the first layer must be determined, and fully online additional learning cannot be realized. That is, there is a problem that noise data cannot be removed efficiently with a single-layer structure.

The present invention has been made to solve these problems, and its first object is to provide an information processing apparatus, an information processing method, and a program that can separate classes whose distributions overlap with high density.
A second object is to provide an information processing apparatus, an information processing method, and a program that can remove noise data efficiently.

An information processing apparatus according to the present invention has a structure of at least one layer in which nodes described by multidimensional vectors are arranged, sequentially receives input vectors belonging to arbitrary classes, and learns the number of classes and the topological structure as the input distribution structure of the input vectors. The apparatus comprises: node density calculation means for calculating, when the node having the weight vector closest to the received input vector is taken as a first winner node, the node having the second closest weight vector is taken as a second winner node, and an edge is connected between the first winner node and the second winner node, the node density of a node of interest based on the average distance between the node of interest and the nodes directly connected to it by edges; distribution overlap region detection means for dividing a cluster, which is a set of nodes connected by edges, into sub-clusters, which are subsets of the cluster, based on the node density calculated by the node density calculation means, and for detecting a distribution overlap region that is a boundary between the sub-clusters; edge connection determination means for determining, when the first winner node and the second winner node are nodes located in the distribution overlap region, whether to connect an edge between the first winner node and the second winner node based on the node densities of the first winner node and the second winner node; edge connection means for connecting an edge between the first winner node and the second winner node based on the determination result; and edge deletion means for deleting the edge between the first winner node and the second winner node based on the determination result.

In this way, the node density calculated by the node density calculation means makes it possible to estimate, for a given node, how densely nodes are packed within a certain region containing that node.
As a result, even for a node located in a region where the distribution is dense, a node density can be calculated that approximates the input distribution density of the input vectors more closely than the conventional approach, in which the number of times the node has been the first winner is used as the node density.
Then, by detecting the distribution overlap region based on the node density calculated by the node density calculation means and determining whether to connect an edge between a first winner node and a second winner node located in the distribution overlap region, clusters generated from different classes can be prevented from being connected to each other, and even if they are erroneously connected into a single cluster, the connected clusters can be separated properly.
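A node density based on the mean distance to directly connected nodes can be sketched as follows. This section states only that the density is calculated from that mean distance; the particular decreasing function used here, 1 / (1 + mean distance)^2, is an assumption chosen for illustration, as are the function name density_point and the convention of returning 0.0 for a node with no neighbors.

```python
import math

def density_point(weights, neighbors, node):
    """Density score of a node from the mean distance to its direct neighbors.

    weights   : dict mapping node id to weight vector (list of floats)
    neighbors : dict mapping node id to the ids of directly connected nodes
    The form 1 / (1 + d)^2 is an assumed example of a function that grows
    as the neighbors get closer; it is not taken from this section.
    """
    nbrs = neighbors.get(node, ())
    if not nbrs:
        # Assumption: an isolated node has no defined mean distance; score it 0.
        return 0.0
    mean_dist = sum(math.dist(weights[node], weights[j]) for j in nbrs) / len(nbrs)
    return 1.0 / (1.0 + mean_dist) ** 2
```

Unlike a win count, this score depends on how tightly the surrounding nodes are packed, so it does not shrink merely because many nearby nodes share the incoming inputs.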

Another information processing apparatus according to the present invention has a structure of at least one layer in which nodes described by multidimensional vectors are arranged, sequentially receives input vectors belonging to arbitrary classes, and learns the input distribution structure of the input vectors. The apparatus comprises: node density calculation means for calculating, when the node having the weight vector closest to the received input vector is taken as a first winner node, the node having the second closest weight vector is taken as a second winner node, and an edge is connected between the first winner node and the second winner node, the node density of a node of interest based on the average distance between the node of interest and the nodes directly connected to it by edges; and noise node deletion means for deleting a node of interest based on the node density calculated for it by the node density calculation means and the number of nodes directly connected to it by edges.

In this way, the node density calculated by the node density calculation means makes it possible to estimate, for a given node, how densely nodes are packed within a certain region containing that node.
As a result, even for a node located in a region where the distribution is dense, a node density can be calculated that approximates the input distribution density of the input vectors more closely than the conventional approach, in which the number of times the node has been the first winner is used as the node density.
Then, by deleting a node of interest based on the node density calculated by the node density calculation means and the number of nodes directly connected to it by edges, noise nodes can be deleted efficiently.
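Noise node selection from node density and neighbor count can be sketched as follows. This section states only that both quantities are used; the concrete rule here (a node is treated as noise if it has at most max_neighbors direct neighbors and its density is below c times the mean density of all nodes) is an assumption, and find_noise_nodes, c, and max_neighbors are hypothetical names.

```python
def find_noise_nodes(density, neighbors, c=0.5, max_neighbors=1):
    """Return the ids of nodes judged to be noise (illustrative rule only).

    density   : dict mapping node id to node density
    neighbors : dict mapping node id to the ids of directly connected nodes
    Assumed rule: few neighbors AND density below c * mean density.
    """
    if not density:
        return []
    mean_density = sum(density.values()) / len(density)
    noise = []
    for node, h in density.items():
        degree = len(neighbors.get(node, ()))
        if degree <= max_neighbors and h < c * mean_density:
            noise.append(node)
    return noise
```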

The apparatus may further comprise weight vector update means for updating the weight vector of the first winner node and the weight vectors of the nodes directly connected to the first winner node by edges so that each moves closer to the input vector.

This allows the weight vector of the first winner node and the weight vectors of the nodes directly connected to the first winner node by edges to be updated so that each moves closer to the input vector.

The apparatus may further comprise noise node deletion means for deleting a node of interest based on the node density calculated for it by the node density calculation means and the number of nodes directly connected to it by edges.

By deleting nodes based on the node density calculated by the node density calculation means and the number of nodes directly connected to each node by edges, noise nodes can be deleted even more efficiently.

The node density calculation means may include a unit node density calculation unit that calculates the node density of the first winner node as a ratio per unit number of inputs, based on the average distance between the first winner node and the nodes directly connected to it by edges.

In this way, the node density can be calculated as the node density per unit number of inputs while reflecting how densely the nodes are packed.
As a result, even when additional learning is carried out for a long time, the node density of a node can be prevented from becoming relatively small, and, compared with the conventional technique, a node density that more closely approximates the input distribution density of the input vectors can be calculated and maintained without change.

Further, the node density calculation means may include: a node density point calculation unit that calculates a node density point value for the first winner node based on the average distance between the first winner node and the nodes directly connected to it by edges; and a unit node density point calculation unit that accumulates the node density points until the number of input vectors reaches a predetermined unit number of inputs and, when that number is reached, calculates the accumulated node density points as a ratio per unit number of inputs, thereby obtaining the node density of the node per unit number of inputs.

In this way, the node density can be calculated as points that reflect how densely the nodes are packed, and expressed as the node density points of the node per unit number of inputs.
As a result, even when additional learning is carried out for a long time, the node density of a node can be prevented from becoming relatively small, and, compared with the conventional technique, a node density that more closely approximates the input distribution density of the input vectors can be calculated and maintained without change.
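The accumulation of density points over a unit number of inputs can be sketched as follows. The normalization used here is an assumption: each node's points are summed over completed windows of unit_inputs inputs and divided by the number of windows in which the node actually scored points, so that a node that stops winning during later additional learning keeps its earlier density. The class name UnitDensity and its attribute names are hypothetical.

```python
class UnitDensity:
    """Accumulates density points per window of `unit_inputs` inputs (sketch)."""

    def __init__(self, unit_inputs):
        self.unit_inputs = unit_inputs
        self.inputs_seen = 0
        self.pending = {}         # points accumulated in the current window
        self.total = {}           # points summed over completed windows
        self.active_windows = {}  # completed windows in which the node scored

    def add(self, winner, point):
        """Record the density point earned by the first winner for one input."""
        self.pending[winner] = self.pending.get(winner, 0.0) + point
        self.inputs_seen += 1
        if self.inputs_seen % self.unit_inputs == 0:
            self._close_window()

    def _close_window(self):
        # Fold the finished window into the totals; idle nodes are left untouched.
        for node, pts in self.pending.items():
            if pts > 0:
                self.total[node] = self.total.get(node, 0.0) + pts
                self.active_windows[node] = self.active_windows.get(node, 0) + 1
        self.pending = {}

    def density(self, node):
        """Assumed normalization: mean accumulated points per active window."""
        n = self.active_windows.get(node, 0)
        return self.total[node] / n if n else 0.0
```

In this sketch, windows in which a node earns no points do not lower its density, which is one way to obtain the stated effect of keeping densities from shrinking during long additional learning.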

Furthermore, the apparatus may be a self-growing neural network that receives the input vectors and automatically increases the number of nodes arranged in the neural network based on the received input vectors.

By automatically increasing the number of nodes in this way, the apparatus is not limited to a stationary environment, in which input vectors are given at random from the input vector space, but can also handle a non-stationary environment in which, for example, the class to which the input vectors belong is switched at fixed intervals and input vectors are given at random from the class selected after each switch.

The apparatus may further comprise: similarity threshold calculation means for calculating the similarity threshold of a node of interest such that, when nodes directly connected to the node of interest by edges exist, the similarity threshold is the distance to the directly connected node farthest from the node of interest, and, when no node directly connected to the node of interest by an edge exists, the similarity threshold is the distance to the node nearest to the node of interest; similarity threshold determination means for determining whether the distance between the input vector and the first winner node is greater than the similarity threshold of the first winner node, and whether the distance between the input vector and the second winner node is greater than the similarity threshold of the second winner node; and node insertion means for inserting the input vector as a node at the same position as the input vector based on the result of the similarity threshold determination.

Determining whether to insert a node based on the similarity threshold in this way allows the number of nodes to be managed autonomously.
This makes it possible to handle not only a stationary environment, in which input vectors are given at random from the input vector space, but also a non-stationary environment in which, for example, the class to which the input vectors belong is switched at fixed intervals and input vectors are given at random from the class in effect after each switch. It also makes it possible to perform the additional learning required in such a non-stationary environment, in which new classes are learned incrementally.

Furthermore, the self-growing neural network may have a single-layer structure.

With a single-layer structure, additional learning can be performed without specifying when learning of a second layer should start. That is, fully online additional learning can be performed.
In addition, compared with the technique disclosed in Non-Patent Document 2, fewer parameters need to be specified in advance for learning, so learning can be performed more easily.

Furthermore, the distribution overlap region detection means may have: a node search unit that searches, based on the node density calculated by the node density calculation means, for nodes whose node density is locally maximal; a first label assigning unit that assigns to each node found by the search a label different from the labels already assigned to other nodes; a second label assigning unit that assigns, to each node that was not labeled by the first label assigning unit and that is connected by edges to a node labeled by the first label assigning unit, the same label as that labeled node; a cluster dividing unit that divides a cluster, which is a set of nodes connected by edges, into sub-clusters, each of which is a subset of the cluster consisting of nodes with the same label; and a distribution overlap region detection unit that, when a node of interest and a node directly connected to it by an edge belong to different sub-clusters, detects the region containing the node of interest and that directly connected node as a distribution overlap region, that is, a boundary between sub-clusters.

Dividing a cluster into sub-clusters based on the nodes whose node density is locally maximal in this way makes it possible to properly detect the distribution overlap region that forms the boundary between sub-clusters, even in cases where detection would be difficult with a method that simply treats low-density regions as cluster boundaries.

An information processing method according to the present invention uses a structure of at least one layer in which nodes described by multidimensional vectors are arranged, sequentially receives input vectors belonging to arbitrary classes, and learns the input distribution structure of the input vectors. The method comprises: a node density calculation step of, where the node whose weight vector is closest to the input vector is taken as a first winner node, the node whose weight vector is second closest is taken as a second winner node, and an edge is connected between the first winner node and the second winner node, calculating the node density of a node of interest based on the average distance between the node of interest and the nodes directly connected to it by edges; a distribution overlap region detection step of dividing a cluster, which is a set of nodes connected by edges, into sub-clusters, which are subsets of the cluster, based on the node density calculated in the node density calculation step, and detecting a distribution overlap region that is a boundary between the sub-clusters; an edge connection determination step of determining, when the first winner node and the second winner node are nodes located in the distribution overlap region, whether to connect an edge between the first winner node and the second winner node based on the node densities of the first winner node and the second winner node; an edge connection step of connecting an edge between the first winner node and the second winner node based on the determination result; and an edge deletion step of deleting the edge between the first winner node and the second winner node based on the determination result.

A program according to the present invention causes a computer to execute the information processing described above.

According to the present invention, it is possible to provide an information processing apparatus, an information processing method, and a program capable of separating classes whose distributions overlap at high density.
Furthermore, it is possible to provide an information processing apparatus, an information processing method, and a program capable of efficiently removing noise data.

Embodiment 1 of the Invention
FIG. 2 is a diagram showing an example of a system configuration for realizing the information processing apparatus according to Embodiment 1. The information processing apparatus can be realized by a computer such as a dedicated computer or a personal computer (PC). The computer need not be a single physical machine; when distributed processing is performed, a plurality of computers may be used. As shown in FIG. 2, the computer 10 has a CPU 11 (Central Processing Unit), a ROM 12 (Read Only Memory), and a RAM 13 (Random Access Memory), which are connected to one another via a bus 14. The OS software and other software needed to operate the computer are not described here, but the computer on which this information processing apparatus is built is assumed to have them.

An input/output interface 15 is also connected to the bus 14. Connected to the input/output interface 15 are, for example, an input unit 16 consisting of a keyboard, a mouse, sensors, and the like; an output unit 17 consisting of a display such as a CRT or LCD, headphones, speakers, and the like; a storage unit 18 consisting of a hard disk or the like; and a communication unit 19 consisting of a modem, a terminal adapter, or the like.

The CPU 11 executes various processes in accordance with programs stored in the ROM 12 or programs loaded from the storage unit 18 into the RAM 13. In this embodiment these include, for example, the processing of the node density calculation means 27 and the distribution overlap region detection means 28. The RAM 13 also stores, as appropriate, the data the CPU 11 needs in order to execute these processes.

The communication unit 19 performs communication processing, for example via the Internet (not shown), transmits data provided by the CPU 11, and outputs data received from a communication partner to the CPU 11, the RAM 13, and the storage unit 18. The storage unit 18 exchanges data with the CPU 11 to save and erase information. The communication unit 19 also performs communication processing of analog or digital signals with other devices.

A drive 20 is also connected to the input/output interface 15 as necessary. A magnetic disk 201, an optical disk 202, a flexible disk 203, a semiconductor memory 204, or the like is mounted in the drive as appropriate, and computer programs read from these media are installed in the storage unit 18 as necessary.

Next, the main processes performed in the information processing apparatus 1 according to this embodiment will be described with reference to the functional block diagram shown in FIG. 3. In terms of hardware, each process is actually realized by software working together with hardware resources such as the CPU 11.

The information processing apparatus 1 has a neural network with a structure of at least one layer in which nodes described by n-dimensional vectors are arranged, and includes input information acquisition means 21, winner node search means 22, similarity threshold calculation means 23, similarity threshold determination means 24, node insertion means 25, weight vector update means 26, node density calculation means 27, distribution overlap region detection means 28, edge connection determination means 29, edge connection means 30, edge deletion means 31, noise node deletion means 32, class determination means 33, and output information display means 34.
Compared with SOINN, the technique disclosed in Non-Patent Document 2, the information processing apparatus according to this embodiment additionally includes the node density calculation means 27, the distribution overlap region detection means 28, the edge connection determination means 29, the edge connection means 30, the edge deletion means 31, and the noise node deletion means 32.
The node density calculation means 27, the distribution overlap region detection means 28, the edge connection determination means 29, the edge connection means 30, and the edge deletion means 31 make it possible to separate classes whose distributions overlap at high density.
Furthermore, the node density calculation means 27 and the noise node deletion means 32 make it possible to delete noise nodes efficiently.
These means are described in more detail below.

In this embodiment, the neural network constituting the information processing apparatus 1 is described below as a self-growing neural network that receives input vectors and automatically increases the number of nodes arranged in it based on those input vectors, and that has a single-layer structure.

Because the self-growing neural network can increase the number of nodes automatically, the apparatus can handle not only a stationary environment, in which input vectors are given at random from the input vector space, but also a non-stationary environment in which, for example, the class to which the input vectors belong is switched at fixed intervals and input vectors are given at random from the class in effect after each switch.
Furthermore, with a single-layer structure, additional learning can be performed without specifying when learning of a second layer should start. That is, fully online additional learning can be performed.
In addition, compared with SOINN, fewer parameters need to be specified in advance for learning, so learning can be performed more easily.

The input information acquisition means 21 acquires, as the information given to the information processing apparatus 1 as input, n-dimensional input vectors belonging to arbitrary classes. It stores the acquired input vectors in a temporary storage unit (for example, the RAM 13) and inputs them sequentially to the neural network stored in the temporary storage unit.

For the input vector and the nodes stored in the temporary storage unit, the winner node search means 22 searches for the node whose weight vector is closest to the input vector as the first winner node and the node whose weight vector is second closest as the second winner node, and stores the result in the temporary storage unit.
That is, for an n-dimensional input vector ξ, it searches for the nodes satisfying the following expressions, which are stored in the temporary storage unit, as the first winner node a₁ and the second winner node a₂, respectively, and stores the result in the temporary storage unit.
Here, a is a node in the node set A stored in the temporary storage unit, and W_a denotes the weight vector of node a stored in the temporary storage unit.
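The winner search described above can be sketched as follows. In the notation of the text it amounts to a₁ = argmin over a ∈ A of ‖ξ − W_a‖ and a₂ = argmin over a ∈ A∖{a₁} of ‖ξ − W_a‖. This is only a minimal illustration: the Euclidean distance, the dictionary layout, and the names `find_winners` and `weights` are assumptions made for the example and are not taken from the source.

```python
import math

def find_winners(xi, weights):
    """Return (a1, a2): the ids of the nodes whose weight vectors are
    closest and second closest to the input vector xi.

    weights: dict mapping node id -> weight vector (hypothetical layout).
    Euclidean distance is assumed as the distance measure.
    """
    if len(weights) < 2:
        raise ValueError("at least two nodes are needed to pick two winners")
    # Rank node ids by distance to xi; ties keep the dict's insertion order.
    ranked = sorted(weights, key=lambda a: math.dist(xi, weights[a]))
    return ranked[0], ranked[1]

weights = {"n0": (0.0, 0.0), "n1": (1.0, 0.0), "n2": (5.0, 5.0)}
a1, a2 = find_winners((0.9, 0.1), weights)
```

A full sort is used for brevity; a single pass that tracks the two smallest distances would do the same job in linear time.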

For the nodes and node similarity thresholds stored in the temporary storage unit, the similarity threshold calculation means 23 calculates the similarity threshold of a node of interest as follows. When there are nodes directly connected to the node of interest by edges (hereinafter referred to as adjacent nodes), it calculates the distance from the node of interest to the farthest adjacent node as the similarity threshold and stores the result in the temporary storage unit. When there is no adjacent node, it calculates the distance from the node of interest to the nearest other node as the similarity threshold and stores the result in the temporary storage unit.
Specifically, the similarity threshold of the node of interest is calculated, for example, as follows, and the result is stored in the temporary storage unit.
S201: The similarity threshold calculation means 23 sets the similarity threshold T_i of a node i that has been newly inserted and stored in the temporary storage unit to +∞ (a sufficiently large value), and stores the result in the temporary storage unit.
S202: When node i becomes the node closest or second closest to an input vector, the means determines whether node i has adjacent nodes, and stores the result in the temporary storage unit.
S203: If the determination result stored in the temporary storage unit shows that node i has adjacent nodes, the similarity threshold T_i is set to the maximum distance to an adjacent node, and the result is stored in the temporary storage unit.
That is, for node i, the similarity threshold T_i is calculated based on the following expression stored in the temporary storage unit, and the result is stored in the temporary storage unit.
Here, c is a node in the adjacent node set N_i of node i stored in the temporary storage unit, and W_c denotes the weight vector of node c stored in the temporary storage unit.
S204: If the determination result shows that node i has no adjacent node, the distances from node i to every other node are calculated, and the smallest of these distances is set as the similarity threshold T_i.
That is, for node i, the similarity threshold T_i is calculated based on the following expression stored in the temporary storage unit, and the result is stored in the temporary storage unit.
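Steps S201 to S204 can be sketched as below. Read literally, S203 gives T_i = max over c ∈ N_i of ‖W_i − W_c‖ and S204 gives T_i = min over c ∈ A∖{i} of ‖W_i − W_c‖. The Euclidean distance, the dictionary layout, and the function name `similarity_threshold` are assumptions for illustration only.

```python
import math

def similarity_threshold(i, weights, neighbors):
    """Similarity threshold T_i of node i, following S202 to S204.

    weights:   dict node id -> weight vector (hypothetical layout)
    neighbors: dict node id -> set of ids joined to that node by an edge
    """
    adjacent = neighbors.get(i, set())
    if adjacent:
        # S203: distance to the farthest adjacent node.
        return max(math.dist(weights[i], weights[c]) for c in adjacent)
    others = [c for c in weights if c != i]
    if not others:
        # No other node exists yet: keep the initial value from S201
        # (handling this edge case this way is an assumption).
        return math.inf
    # S204: distance to the nearest of all other nodes.
    return min(math.dist(weights[i], weights[c]) for c in others)

weights = {"n0": (0.0, 0.0), "n1": (3.0, 4.0), "n2": (0.0, 1.0)}
neighbors = {"n0": {"n1"}, "n1": {"n0"}}
t_n0 = similarity_threshold("n0", weights, neighbors)  # n0 has a neighbor
t_n2 = similarity_threshold("n2", weights, neighbors)  # n2 is isolated
```

Because the threshold is recomputed only when a node becomes a winner (S202), a caller would typically invoke this for a₁ and a₂ right after the winner search.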

For the input vector, the nodes, and the node similarity thresholds stored in the temporary storage unit, the similarity threshold determination means 24 determines whether the distance between the input vector and the first winner node is greater than the similarity threshold of the first winner node, and whether the distance between the input vector and the second winner node is greater than the similarity threshold of the second winner node, and stores the results in the temporary storage unit.
That is, as shown in the following expressions stored in the temporary storage unit, it determines whether the distance between the input vector ξ and the first winner node a₁ is greater than the similarity threshold T_a1 and stores the result in the temporary storage unit, and it determines whether the distance between the input vector ξ and the second winner node a₂ is greater than the similarity threshold T_a2 and stores the result in the temporary storage unit.

Based on the determination result of the similarity threshold determination means 24 stored in the temporary storage unit, the node insertion means 25 inserts the input vector stored in the temporary storage unit as a new node at the same position as the input vector, and stores the result in the temporary storage unit.
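The threshold check and the node insertion can be sketched together as follows. The text only says that insertion is decided "based on the determination result"; inserting when either distance exceeds its threshold is an assumption made here, as are the Euclidean distance, the integer id scheme, and the names `should_insert` and `insert_node`.

```python
import math

def should_insert(xi, w_a1, t_a1, w_a2, t_a2):
    """True if xi lies beyond the similarity threshold of either winner.

    Combining the two comparisons with OR is an assumption; the text
    only states that both comparisons are evaluated.
    """
    return math.dist(xi, w_a1) > t_a1 or math.dist(xi, w_a2) > t_a2

def insert_node(xi, weights, thresholds):
    """Insert xi as a new node located exactly at xi; return its id."""
    new_id = max(weights, default=-1) + 1  # hypothetical integer id scheme
    weights[new_id] = tuple(xi)
    thresholds[new_id] = math.inf          # initial value, as in S201
    return new_id

weights = {0: (0.0, 0.0), 1: (1.0, 0.0)}
thresholds = {0: 1.0, 1: 1.0}
xi = (5.0, 5.0)  # far from both nodes; node 1 is a1 and node 0 is a2
new_id = None
if should_insert(xi, weights[1], thresholds[1], weights[0], thresholds[0]):
    new_id = insert_node(xi, weights, thresholds)
```

An input that falls inside both thresholds, such as (0.5, 0.0) here, would not create a node and would instead go on to the weight update.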

For the node weight vectors stored in the temporary storage unit, the weight vector update means 26 updates the weight vector of the first winner node and the weight vectors of the nodes adjacent to the first winner node so that each moves closer to the input vector, and stores the result in the temporary storage unit.
The update amount ΔW_a1 of the weight vector of the first winner node a₁ and the update amount ΔW_ai of the weight vector of a node i adjacent to the first winner node a₁ are calculated, for example, based on the following expressions stored in the temporary storage unit, and the results are stored in the temporary storage unit.
Here, ε₁(t) and ε₂(t) are each calculated based on the following expressions stored in the temporary storage unit, and the results are stored in the temporary storage unit.
In this embodiment, in order to support additional learning, the cumulative number of times M_a1 that the first winner node a₁ has been the first winner node, which is stored in the temporary storage unit, is used in place of the number of inputs t of input vectors.
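The update can be sketched as below, using the win count M_a1 in place of t as the text describes. The expressions themselves are not reproduced in this text, so the update form ΔW = ε(ξ − W) and the learning rates ε₁(t) = 1/t and ε₂(t) = 1/(100t) are assumptions chosen for illustration, as is incrementing the count before computing the rates.

```python
def update_weights(xi, a1, weights, neighbors, win_counts):
    """Move the first winner and its adjacent nodes toward the input xi.

    Assumed (not given in the text): delta_W = eps * (xi - W), with
    eps1 = 1 / M for the winner and eps2 = 1 / (100 * M) for neighbors,
    where M is the cumulative win count of a1 including this input.
    """
    win_counts[a1] = win_counts.get(a1, 0) + 1
    m = win_counts[a1]
    eps1 = 1.0 / m
    eps2 = 1.0 / (100.0 * m)
    weights[a1] = tuple(w + eps1 * (x - w) for w, x in zip(weights[a1], xi))
    for i in neighbors.get(a1, ()):
        weights[i] = tuple(w + eps2 * (x - w) for w, x in zip(weights[i], xi))

weights = {0: (0.0, 0.0), 1: (2.0, 0.0)}
neighbors = {0: {1}, 1: {0}}
win_counts = {0: 1}  # node 0 has already won once
update_weights((1.0, 1.0), 0, weights, neighbors, win_counts)
```

Tying the rate to each node's own win count means a node that has not won for a long time keeps its position, which is the property the text relies on for additional learning.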

For the nodes and node densities stored in the temporary storage unit, the node density calculation means 27 calculates the node density of a node of interest based on the average distance between that node and its adjacent nodes, and stores the result in the temporary storage unit.
The node density calculation means 27 also has a unit node density calculation unit. To support additional learning, the unit node density calculation unit calculates, for the first winner node and the node densities stored in the temporary storage unit, the node density of the first winner node as a rate per unit number of inputs, based on the average distance between the first winner node and its adjacent nodes, and stores the result in the temporary storage unit.
In addition, the node density calculation means 27 has a node density point calculation unit and a unit node density point calculation unit. For the nodes and node density points stored in the temporary storage unit, the node density point calculation unit calculates a node density point value for the first winner node based on the average distance between the first winner node and its adjacent nodes. The unit node density point calculation unit stores and accumulates the node density points in the temporary storage unit until the number of input vectors reaches a predetermined unit number of inputs; when that number is reached, it converts the accumulated node density points into a rate per unit number of inputs, thereby calculating the node density of the node per unit number of inputs, and stores the result in the temporary storage unit.

Specifically, the node density point calculation unit calculates the node density point value p_i given to node i based on, for example, the following expression stored in the temporary storage unit, and stores the result in the temporary storage unit. When node i is the first winner node, it is given the point value calculated by that expression; when node i is not the first winner node, it is given no points.
Here, e_i denotes the average distance from node i to its adjacent nodes; it is calculated based on the following expression stored in the temporary storage unit, and the result is stored in the temporary storage unit.
Note that m denotes the number of nodes adjacent to node i stored in the temporary storage unit, and W_i denotes the weight vector of node i stored in the temporary storage unit.
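A sketch of the point calculation follows. The average distance follows directly from the definitions above: e_i is the sum of ‖W_i − W_j‖ over the m adjacent nodes j, divided by m. The point expression itself is not reproduced in this text, so p_i = 1 / (1 + e_i)² is used here purely as an assumed example of a value that decreases as the average distance grows. The Euclidean distance and all function names are likewise assumptions.

```python
import math

def mean_neighbor_distance(i, weights, neighbors):
    """Average distance e_i from node i to its m adjacent nodes.

    Returns None when node i has no adjacent node.
    """
    adjacent = neighbors.get(i, set())
    if not adjacent:
        return None
    total = sum(math.dist(weights[i], weights[j]) for j in adjacent)
    return total / len(adjacent)

def density_point(i, first_winner, weights, neighbors):
    """Point p_i given to node i for one input vector.

    Only the first winner receives points. The formula 1 / (1 + e_i)**2
    is an assumed stand-in: larger average distance gives fewer points.
    """
    if i != first_winner:
        return 0.0
    e_i = mean_neighbor_distance(i, weights, neighbors)
    if e_i is None:
        return 0.0  # assumption: an isolated winner earns no points
    return 1.0 / (1.0 + e_i) ** 2

weights = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (0.0, 1.0)}
neighbors = {0: {1, 2}, 1: {0}, 2: {0}}
e_0 = mean_neighbor_distance(0, weights, neighbors)
p_0 = density_point(0, 0, weights, neighbors)
```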

When the average distance to the adjacent nodes is large, the region containing the node can be regarded as having few nodes; conversely, when the average distance is small, the region can be regarded as having many nodes.
The method of calculating the node density point value is therefore configured as described above so that a node receives a high point value when it becomes the first winner node in a region with many nodes and a low point value when it becomes the first winner node in a region with few nodes.
This makes it possible to estimate how densely nodes are packed in a region of some extent around a node. Consequently, even for a node located where the node distribution is dense, the node density points obtained approximate the input distribution density of the input vectors more closely than in the conventional approach, in which the number of times a node has been the first winner is taken as its density.

The unit node density point calculation unit calculates the node density density_i of node i per unit number of inputs based on, for example, the following expression stored in the temporary storage unit, and stores the result in the temporary storage unit.
Here, the sequence of successively given input vectors is divided into intervals of a fixed number of inputs λ, which is set in advance and stored in the temporary storage unit, and the sum of the points given to node i in each interval is defined as the accumulated points s_i. When the total number of input vectors, set in advance and stored in the temporary storage unit, is LT, the total number of intervals n is LT/λ, and this result is stored in the temporary storage unit. Of these n intervals, the number of intervals in which the total of the points given to the node was greater than 0 is calculated as N, and the result is stored in the temporary storage unit (note that N and n are not necessarily equal).
The accumulated points s_i are calculated based on, for example, the following expression stored in the temporary storage unit, and the result is stored in the temporary storage unit.
Here, p_i^(j,k) denotes the points given to node i by the k-th input in the j-th interval; it is calculated by the node density point calculation unit described above, and the result is stored in the temporary storage unit.
In this way, the unit node density point calculation unit calculates the density density_i of node i stored in the temporary storage unit as the average of the accumulated points s_i, and stores the result in the temporary storage unit.
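The accumulation and averaging can be sketched as follows. The sketch sums the points of one node over intervals of λ inputs and divides the grand total by N, counting only intervals whose point total is strictly positive, which is the reading under which N can differ from n. The names `interval_totals`, `node_density`, and `lam` are hypothetical.

```python
def interval_totals(points, lam):
    """Sum the points one node received in each consecutive block of lam inputs."""
    return [sum(points[k:k + lam]) for k in range(0, len(points), lam)]

def node_density(points, lam):
    """Density of one node: accumulated points divided by N.

    N counts only the intervals in which the node received a positive
    total, so intervals where the node never won do not dilute it.
    """
    totals = interval_totals(points, lam)
    s_i = sum(totals)
    big_n = sum(1 for s in totals if s > 0)
    return s_i / big_n if big_n else 0.0

# Points given to one node over six inputs, with lam = 2 (three intervals).
points = [0.5, 0.25, 0.0, 0.0, 0.25, 0.0]
density = node_density(points, lam=2)
```

In this example n is 3 but N is 2, so the density is 1.0 / 2 rather than 1.0 / 3; the middle interval, in which the node received nothing, is ignored.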

In this embodiment, N is used in place of n in order to support additional learning. In additional learning, nodes generated by earlier learning often receive no points, so calculating the density with n would cause the density of previously learned nodes to decrease gradually. Using N avoids this problem: even when additional learning continues for a long time, the density of a node is kept unchanged as long as no added data is input near that previously learned node.
As a result, even when additional learning is carried out for a long time, the node density of a node is prevented from becoming relatively small, and a node density that approximates the input distribution density of the input vectors more closely than in the conventional method can be calculated and kept unchanged.

For the nodes, the edges connecting the nodes, and the node densities stored in the temporary storage unit, the distribution overlap region detection means 28 divides a cluster, which is a set of nodes connected by edges, into sub-clusters, which are subsets of the cluster, based on the node density calculated by the node density calculation means 27, and stores the result in the temporary storage unit. It then detects the distribution overlap region that is the boundary between sub-clusters and stores the result in the temporary storage unit.

The distribution overlap region detection means 28 operates on the nodes, the edges connecting the nodes, and the node densities stored in the temporary storage unit, and has the following units: a node search unit that searches, based on the node density calculated by the node density calculation means 27, for nodes whose node density is locally maximal; a first label assigning unit that assigns to each node found by the search a label different from the labels already assigned to other nodes; a second label assigning unit that assigns, to each node that was not labeled by the first label assigning unit and that is connected by edges to a node labeled by the first label assigning unit, the same label as that labeled node; a cluster dividing unit that, when nodes given different labels are directly connected by an edge, divides the cluster, which is the set of nodes connected by those edges, into sub-clusters that are subsets of the cluster; and a distribution overlap region detection unit that, when a node of interest and one of its adjacent nodes belong to different sub-clusters, detects the region containing the node of interest and that adjacent node as a distribution overlap region, that is, a boundary between sub-clusters.
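The five units above can be sketched as two small functions. This is one possible reading, not a definitive implementation: each unlabeled node is assumed to inherit the label of its densest adjacent node, followed uphill until a labeled local maximum is reached, and density ties are broken by node id so that the climb always terminates. The tie-breaking rule, the integer node ids, and all names are assumptions.

```python
def split_into_subclusters(density, neighbors):
    """Assign a sub-cluster label to every node.

    density:   dict node id -> node density
    neighbors: dict node id -> set of adjacent node ids
    Returns a dict node id -> label, where a label is the id of the
    locally densest node at the top of that sub-cluster.
    """
    def rank(v):
        # Ties in density are broken by node id (an assumption).
        return (density[v], v)

    labels = {}
    # Node search and first labeling: local maxima get their own label.
    for v in density:
        if all(rank(v) > rank(u) for u in neighbors.get(v, ())):
            labels[v] = v
    # Second labeling: climb to the densest neighbor until a label is found.
    for start in density:
        path, v = [], start
        while v not in labels:
            path.append(v)
            v = max(neighbors[v], key=rank)
        for p in path:
            labels[p] = labels[v]
    return labels

def overlap_region(labels, neighbors):
    """Nodes joined by an edge to a node of a different sub-cluster."""
    region = set()
    for v, adjacent in neighbors.items():
        for u in adjacent:
            if labels[u] != labels[v]:
                region.update((u, v))
    return region

# A chain 0-1-2-3-4 whose density has two peaks, at nodes 1 and 3.
density = {0: 1.0, 1: 3.0, 2: 0.5, 3: 4.0, 4: 2.0}
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
labels = split_into_subclusters(density, neighbors)
overlap = overlap_region(labels, neighbors)
```

In the example, node 2 sits in the density valley between the two peaks; it joins the sub-cluster of node 3, so the edge between nodes 1 and 2 crosses the sub-cluster boundary and both of its endpoints are reported as the overlap region.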

Specifically, the distribution overlap regions, which are the boundaries between sub-clusters, are detected from the nodes, the edges connecting the nodes, and the node densities stored in the temporary storage unit, for example as follows, and the result is stored in the temporary storage unit.
S301: The node search unit searches the nodes stored in the temporary storage unit for nodes whose node density, as calculated by the node density calculation means 27, is a local maximum, and stores the result in the temporary storage unit.
S302: The first labeling unit assigns to each node found in S301 a label different from the labels already assigned to other nodes, and stores the result in the temporary storage unit.
S303: The second labeling unit assigns, to each node that was not labeled by the first labeling unit in S302 and that is connected by an edge to a node labeled by the first labeling unit, the same label as that labeled node, and stores the result in the temporary storage unit. That is, each such node receives the same label as its adjacent node whose density is a local maximum.
S304: The cluster division unit divides each cluster, which is a set of nodes connected by the edges stored in the temporary storage unit, into sub-clusters, each of which is a subset of the cluster consisting of nodes with the same label, and stores the result in the temporary storage unit.
S305: When a node of interest and its adjacent node belong to different sub-clusters, the distribution overlap region detection unit detects the region containing the node of interest and that adjacent node as a distribution overlap region, that is, a boundary between sub-clusters, and stores the result in the temporary storage unit.
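Steps S301 to S305 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary-based graph representation, and the tie-breaking rule for equal densities are assumptions introduced here, and each unlabeled node is assumed to copy the label of its densest neighbor.

```python
from typing import Dict, Hashable, Set


def label_subclusters(density: Dict[Hashable, float],
                      neighbors: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Give every node a sub-cluster label (steps S301 to S304)."""
    # Rank nodes from densest to sparsest; the rank also breaks density ties (assumption).
    order = sorted(density, key=lambda n: density[n], reverse=True)
    rank = {n: i for i, n in enumerate(order)}
    labels: Dict[Hashable, int] = {}
    for n in order:
        if all(rank[n] < rank[k] for k in neighbors[n]):
            # S301/S302: a local density maximum receives a fresh label.
            labels[n] = len(set(labels.values()))
        else:
            # S303: otherwise copy the label of the densest neighbor, which is
            # already labeled because it was processed earlier in the ranking.
            densest = min(neighbors[n], key=lambda k: rank[k])
            labels[n] = labels[densest]
    return labels


def overlap_region(labels: Dict[Hashable, int],
                   neighbors: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """S305: nodes that have at least one neighbor in a different sub-cluster."""
    return {n for n in labels
            if any(labels[k] != labels[n] for k in neighbors[n])}
```

For a chain of five nodes whose densities form two peaks with a valley between them, the two nodes on either side of the valley edge are reported as the overlap region.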

The edge connection determination means 29 operates on the nodes, node densities, and distribution overlap regions stored in the temporary storage unit. When the first winner node and the second winner node are located in a distribution overlap region, it determines, based on the node densities of the first and second winner nodes, whether to connect an edge between them, and stores the result in the temporary storage unit.

The edge connection determination means 29 further includes a sub-cluster membership determination unit that determines, from the nodes, node densities, and sub-clusters stored in the temporary storage unit, the sub-cluster to which a node belongs, and an edge connection determination unit that determines whether to connect an edge between the first winner node and the second winner node based on the density of the node and the peak density of the sub-cluster to which the node belongs.

Based on the determination result of the edge connection determination means 29 stored in the temporary storage unit, the edge connection means 30 connects an edge between the first winner node and the second winner node among the nodes and edges stored in the temporary storage unit, and stores the result in the temporary storage unit.
Based on the determination result of the edge connection determination means 29 stored in the temporary storage unit, the edge deletion means 31 deletes the edge between the first winner node and the second winner node among the nodes and edges stored in the temporary storage unit, and stores the result in the temporary storage unit.

Specifically, using the nodes, node densities, sub-clusters, and edges stored in the temporary storage unit, the edge connection determination means 29 determines whether to connect an edge, and the edge connection means 30 and the edge deletion means 31 create and delete edges, for example as follows. The results are stored in the temporary storage unit.
S401: The sub-cluster membership determination unit determines the sub-clusters to which the first winner node and the second winner node respectively belong, and stores the result in the temporary storage unit.
S402: If the determination in S401 shows that the first winner node and the second winner node do not belong to any sub-cluster, or that they belong to the same sub-cluster, the edge connection means 30 connects the two nodes by creating an edge between them, and stores the result in the temporary storage unit.
S403: If the determination in S401 shows that the first winner node and the second winner node belong to different sub-clusters, the edge connection determination unit determines whether to connect an edge between them based on the node densities and the peak densities of the sub-clusters to which the nodes belong, and stores the result in the temporary storage unit.
S404: If the edge connection determination unit determines in S403 that no edge is needed, the first winner node and the second winner node are not connected. If an edge already connects them, the edge deletion means 31 deletes that edge and stores the result in the temporary storage unit.
S405: If the edge connection determination unit determines in S403 that an edge is needed, the edge connection means 30 creates an edge between the first winner node and the second winner node to connect them.
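The branching in S401 to S405 can be sketched as below. This is an illustrative outline only: `update_edge`, the `decide` callback standing in for the S403 determination, and the use of `None` for "belongs to no sub-cluster" are hypothetical. The text does not say what happens when exactly one winner is unlabeled; this sketch assumes that case is treated like S402.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Optional, Set


def update_edge(w1: Hashable,
                w2: Hashable,
                subcluster: Dict[Hashable, Optional[int]],
                edges: Set[FrozenSet[Hashable]],
                decide: Callable[[Hashable, Hashable], bool]) -> None:
    """Create or delete the edge between the two winner nodes (S401 to S405)."""
    # S401: look up the sub-cluster of each winner (None = not in any sub-cluster).
    a, b = subcluster.get(w1), subcluster.get(w2)
    edge = frozenset((w1, w2))
    if a is None or b is None or a == b:
        # S402: unlabeled or same sub-cluster, so always connect.
        # (Assumption: one unlabeled winner is handled like two unlabeled winners.)
        edges.add(edge)
    elif decide(w1, w2):
        # S403 -> S405: different sub-clusters, edge judged necessary.
        edges.add(edge)
    else:
        # S403 -> S404: edge judged unnecessary; remove it if it already exists.
        edges.discard(edge)
```

Storing edges as `frozenset` pairs keeps the graph undirected, so `(w1, w2)` and `(w2, w1)` refer to the same edge.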

The determination performed by the edge connection determination unit is now described in detail.
First, from the nodes and node densities stored in the temporary storage unit, the edge connection determination unit calculates the smaller of the node density density_win of the first winner node and the node density density_sec-win of the second winner node, for example by the following expression stored in the temporary storage unit, and stores the result m in the temporary storage unit.
m = min(density_win, density_sec-win)
Next, for sub-cluster A and sub-cluster B, to which the first winner node and the second winner node respectively belong, it calculates the peak density A_max of sub-cluster A and the peak density B_max of sub-cluster B, and stores the result in the temporary storage unit.
The peak density of a sub-cluster is the largest node density among the nodes contained in that sub-cluster.
Then, using the peak densities A_max and B_max and the node density m stored in the temporary storage unit, it determines whether m is smaller than α_A A_max and m is smaller than α_B B_max, that is, whether both of the following inequalities stored in the temporary storage unit are satisfied, and stores the result in the temporary storage unit.
m < α_A A_max and m < α_B B_max
If m is smaller than α_A A_max and m is smaller than α_B B_max, it determines that no edge is needed between the first winner node and the second winner node, and stores the result in the temporary storage unit.
If, on the other hand, m is greater than or equal to α_A A_max, or m is greater than or equal to α_B B_max, it determines that an edge is needed between the first winner node and the second winner node, and stores the result in the temporary storage unit.
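The test described above reduces to a few lines. The sketch below is an illustration under assumptions: the function name `edge_needed` and its argument order are hypothetical, and the coefficients α_A and α_B are taken as already computed inputs.

```python
def edge_needed(density_win: float, density_sec_win: float,
                a_max: float, b_max: float,
                alpha_a: float, alpha_b: float) -> bool:
    """Return True when an edge between the two winner nodes is judged necessary."""
    # Smaller of the two winner densities: the depth of the valley between A and B.
    m = min(density_win, density_sec_win)
    # The edge is unnecessary only if the valley is deep relative to BOTH peaks.
    unnecessary = m < alpha_a * a_max and m < alpha_b * b_max
    return not unnecessary
```

With both coefficients at 0.0 the function always returns True for non-negative densities, so the two sub-clusters are always joined; larger coefficients make separation more likely.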

In this way, comparing the minimum node density m of the first and second winner nodes with the average node density of the sub-clusters that contain them makes it possible to judge how pronounced the unevenness of the node density is in the region containing the two winner nodes. That is, when the node density m in the valley of the distribution between sub-cluster A and sub-cluster B is larger than the threshold α_A A_max or α_B B_max, the density profile is judged to have only small unevenness.

Here, α_A and α_B are calculated by the following rules stored in the temporary storage unit, and the results are stored in the temporary storage unit. Because α_B is calculated in the same way as α_A, its description is omitted.
i) If A_max/mean_A − 1 ≤ 1, then α_A = 0.0.
ii) If 1 < A_max/mean_A − 1 ≤ 2, then α_A = 0.5.
iii) If 2 < A_max/mean_A − 1, then α_A = 1.0.
In case i), where the value of A_max/mean_A − 1 is 1 or less, A_max and mean_A are of similar magnitude, and the unevenness in density is judged to be due to noise. Setting α to 0.0 causes the sub-clusters to be merged.
In case iii), where the value of A_max/mean_A − 1 exceeds 2, A_max is sufficiently larger than mean_A, and a clear unevenness in density is judged to exist. Setting α to 1.0 causes the sub-clusters to be separated.
In case ii), which covers all other values of A_max/mean_A − 1, setting α to 0.5 causes the sub-clusters to be merged or separated according to the size of the density unevenness.
Here, mean_A denotes the average of the node densities density_i of the nodes i belonging to sub-cluster A. With N_A the number of nodes belonging to sub-cluster A, it is calculated by the following expression stored in the temporary storage unit, and the result is stored in the temporary storage unit.
mean_A = (1/N_A) Σ_{i∈A} density_i
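The three-way rule for α can be written directly from cases i) to iii). The following is a small sketch; the function names `alpha` and `subcluster_alpha` are hypothetical, and the input is assumed to be the list of node densities of one sub-cluster with a positive mean.

```python
from typing import Sequence


def alpha(peak_density: float, mean_density: float) -> float:
    """Coefficient alpha for one sub-cluster, following cases i) to iii)."""
    ratio = peak_density / mean_density - 1.0
    if ratio <= 1.0:
        return 0.0   # i) peak close to the mean: treat unevenness as noise
    if ratio <= 2.0:
        return 0.5   # ii) intermediate case
    return 1.0       # iii) peak well above the mean: clear unevenness


def subcluster_alpha(densities: Sequence[float]) -> float:
    """Compute alpha from the node densities of a single sub-cluster."""
    peak = max(densities)                   # A_max: largest node density
    mean = sum(densities) / len(densities)  # mean_A: average node density
    return alpha(peak, mean)
```

For example, a sub-cluster with densities 1.0, 1.0, and 4.0 has a mean of 2.0 and a peak of 4.0, so the ratio is exactly 1 and case i) applies.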

In this way, when nodes are separated into sub-clusters, the degree of unevenness of the node density within the sub-clusters is evaluated and two sub-clusters that satisfy a certain criterion are merged into one. This prevents the instability that excessive splitting into sub-clusters would otherwise cause in the detection of distribution overlap regions.
For example, for the two sub-clusters A and B shown in FIG. 4, let the peak density of sub-cluster A be A_max and the peak density of sub-cluster B be B_max.
As shown in FIG. 4, noise or a small number of training samples can produce many fine irregularities in the density distribution.
In such a case, when the first winner node and the second winner node are located in the distribution overlap region between sub-clusters A and B, merging two sub-clusters that satisfy a certain criterion when connecting the nodes smooths the density distribution as shown in FIG. 1, even when the distribution contains many fine irregularities as shown in FIG. 4.

The noise node deletion means 32 operates on the nodes, node densities, edges between nodes, and numbers of adjacent nodes stored in the temporary storage unit. For a node of interest, it deletes the node based on the node density calculated by the node density calculation means 27 and on the number of nodes adjacent to the node of interest, and stores the result in the temporary storage unit.

The noise node deletion means 32 further includes a node density comparison unit that compares the node density of the node of interest with a predetermined threshold, an adjacent node count calculation unit that calculates the number of nodes adjacent to the node of interest, and a noise node deletion unit that treats the node of interest as a noise node and deletes it.
Specifically, the node of interest is deleted based on its node density and its number of adjacent nodes, for example as follows, and the result is stored in the temporary storage unit.

For a node of interest i, the noise node deletion means 32 has the adjacent node count calculation unit calculate the number of nodes adjacent to it and stores the result in the temporary storage unit. It then performs the following processing according to the number of adjacent nodes stored in the temporary storage unit.
i) If the number of adjacent nodes is 2, the node density comparison unit compares the node density density_i of node i with a threshold calculated, for example, by an expression stored in the temporary storage unit, and stores the result in the temporary storage unit.
If the comparison shows that the node density density_i is smaller than the threshold, the noise node deletion unit deletes the node and stores the result in the temporary storage unit.
ii) If the number of adjacent nodes is 1, the node density comparison unit compares the node density density_i of node i with a threshold calculated, for example, by an expression stored in the temporary storage unit, and stores the result in the temporary storage unit.
If the comparison shows that the node density density_i is smaller than the threshold, the noise node deletion unit deletes the node and stores the result in the temporary storage unit.
iii) If the node has no adjacent nodes, the noise node deletion unit deletes the node and stores the result in the temporary storage unit.
The behavior of noise node deletion by the noise node deletion means 32 can be tuned by adjusting the predetermined parameters c₁ and c₂, which are set in advance and stored in the temporary storage unit.
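Cases i) to iii) can be sketched as a single pass over the graph. This is an illustration under assumptions: the function name and the dictionary-based graph are hypothetical, the two thresholds are passed in as plain numbers rather than derived from c₁ and c₂ (the threshold expressions are not reproduced in this text), and neighbor counts are evaluated before any node is removed.

```python
from typing import Dict, Hashable, Set


def remove_noise_nodes(density: Dict[Hashable, float],
                       neighbors: Dict[Hashable, Set[Hashable]],
                       threshold_two: float,
                       threshold_one: float) -> Set[Hashable]:
    """Delete noise nodes in place and return the set of deleted nodes."""
    doomed: Set[Hashable] = set()
    for node, nbrs in neighbors.items():
        count = len(nbrs)
        if count == 0:
            doomed.add(node)                  # iii) isolated node
        elif count == 1 and density[node] < threshold_one:
            doomed.add(node)                  # ii) one neighbor, low density
        elif count == 2 and density[node] < threshold_two:
            doomed.add(node)                  # i) two neighbors, low density
    for node in doomed:
        # Remove the node and every edge that touches it.
        for other in neighbors.pop(node):
            if other in neighbors:
                neighbors[other].discard(node)
        del density[node]
    return doomed
```

Nodes with three or more neighbors are never removed by this rule, which matches the three cases listed above.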

The class determination means 33 operates on the nodes, edges between nodes, and node classes stored in the temporary storage unit. It determines the class to which each node belongs based on the edges created between the nodes, and stores the result in the temporary storage unit.

Specifically, the class to which each node belongs is determined from the nodes, edges, and node classes stored in the temporary storage unit, for example as follows, and the result is stored in the temporary storage unit.
S501: All nodes are set to the state of belonging to no class, and the result is stored in the temporary storage unit.
S502: A node i is selected at random from the nodes that belong to no class and is given a new class label, and the result is stored in the temporary storage unit.
S503: All nodes connected to node i by a path are found and given the same label as node i, and the result is stored in the temporary storage unit.
S504: If any node still belongs to no class, the process returns to S502 and continues until every node has been given a class label.
Here, node a and node b are said to be connected by a path when the two nodes are linked through a sequence of edges.
That is, for node a, node b, and nodes x_i (i = 1, 2, ..., n) in the node set A, node a and node b are connected by a path if there exists a sequence of edges (a, x₁), (x₁, x₂), ..., (x_n, b), where (a, x₁) denotes the edge between node a and node x₁, (x₁, x₂) the edge between node x₁ and node x₂, and (x_n, b) the edge between node x_n and node b.
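Steps S501 to S504 amount to labeling the connected components of the graph. A minimal sketch follows; the function name is hypothetical, and the start node is taken in iteration order rather than at random, which changes only the label numbers and not the resulting partition.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def assign_classes(nodes: Iterable[Hashable],
                   neighbors: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Label every node so that nodes joined by a path share a class."""
    label: Dict[Hashable, int] = {}          # S501: no node has a class yet
    next_class = 0
    for start in nodes:
        if start in label:
            continue
        label[start] = next_class            # S502: new class for an unlabeled node
        queue = deque([start])
        while queue:                         # S503: breadth-first search along edges
            current = queue.popleft()
            for other in neighbors.get(current, ()):
                if other not in label:
                    label[other] = next_class
                    queue.append(other)
        next_class += 1                      # S504: repeat until all nodes are labeled
    return label
```

The number of classes reported by the output stage is then the number of distinct values in the returned dictionary.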

The output information display means 34 outputs, for the nodes and node classes stored in the temporary storage unit, the number of classes to which the nodes belong and the prototype vectors of each class.

Next, the overall processing flow of the information processing apparatus according to the present embodiment is described with reference to the flowchart of FIG. 5. FIG. 5 is a flowchart outlining the learning process performed by the information processing apparatus according to the present embodiment.
S601: The input information acquisition means 21 acquires two input vectors at random, initializes the node set A as a set containing only the two nodes corresponding to them, and stores the result in the temporary storage unit. It also initializes the edge set C ⊂ A × A as the empty set and stores the result in the temporary storage unit.
S602: The input information acquisition means 21 receives a new input vector ξ and stores it in the temporary storage unit.
S603: The winner node search means 22 searches the nodes stored in the temporary storage unit for the first winner node a₁, whose weight vector is closest to the input vector ξ, and the second winner node a₂, whose weight vector is second closest, and stores the result in the temporary storage unit.

S604: The similarity threshold determination means 24 determines whether the distance between the input vector ξ and the first winner node a₁ is greater than the similarity threshold T₁ of the first winner node a₁, and whether the distance between the input vector ξ and the second winner node a₂ is greater than the similarity threshold T₂ of the second winner node a₂, and stores the result in the temporary storage unit.
Here, the similarity threshold T₁ of the first winner node a₁ and the similarity threshold T₂ of the second winner node a₂ are calculated by the similarity threshold calculation means 23 as described in S201 to S204 above, and the results are stored in the temporary storage unit.
S605: If the determination in S604 shows that the distance between the input vector ξ and the first winner node a₁ is greater than the similarity threshold T₁ of the first winner node a₁, or that the distance between the input vector ξ and the second winner node a₂ is greater than the similarity threshold T₂ of the second winner node a₂, the node insertion means 25 inserts the input vector ξ as a new node i at the same position as the input vector ξ, and stores the result in the temporary storage unit.
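The winner search of S603 and the insertion test of S604 and S605 can be sketched as follows. This is an illustration only: the function names are hypothetical, Euclidean distance is assumed as the distance measure, and the thresholds T₁ and T₂ are taken as given inputs.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def find_winners(xi: Sequence[float],
                 weights: Dict[Hashable, Sequence[float]]) -> Tuple[Hashable, Hashable]:
    """S603: return the nodes with the closest and second-closest weight vectors."""
    # Assumption: Euclidean distance; requires at least two nodes.
    ranked = sorted(weights, key=lambda n: math.dist(xi, weights[n]))
    return ranked[0], ranked[1]


def is_new_node(xi: Sequence[float],
                w1: Sequence[float], t1: float,
                w2: Sequence[float], t2: float) -> bool:
    """S604/S605: the input becomes a new node if it is too far from either winner."""
    return math.dist(xi, w1) > t1 or math.dist(xi, w2) > t2
```

When `is_new_node` returns False, processing continues with the edge decision of S606 instead of inserting a node.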

S606: If, on the other hand, the determination in S604 shows that the distance between the input vector ξ and the first winner node a₁ is less than or equal to the similarity threshold T₁ of the first winner node a₁, and that the distance between the input vector ξ and the second winner node a₂ is less than or equal to the similarity threshold T₂ of the second winner node a₂, the edge connection determination means 29 determines, based on the node densities of the first winner node a₁ and the second winner node a₂, whether to connect an edge between them, and stores the result in the temporary storage unit.

S607: If the determination in S606 is to create an edge between the first winner node a₁ and the second winner node a₂, the edge connection means 30 connects an edge between the first winner node and the second winner node and stores the result in the temporary storage unit.
The information processing apparatus then sets to 0 the age of the newly created edge, or of the existing edge if one had already been created between the nodes, and stores the result in the temporary storage unit. It also increments by 1 the age of the edges directly connected to the first winner node a₁ and stores the result in the temporary storage unit.
If, on the other hand, the determination in S606 is not to connect an edge between the first winner node a₁ and the second winner node a₂, the process proceeds to S608. If an edge had already been created between the nodes, however, the edge deletion means 31 deletes the edge between the first winner node a₁ and the second winner node a₂ and stores the result in the temporary storage unit. The edge connection determination means 29, the edge connection means 30, and the edge deletion means 31 perform this processing as described in S401 to S405 above.
Next, the node density calculation means 27 calculates the point value of the node density of the first winner node a₁ and stores the result in the temporary storage unit. It adds this point value to the point values previously calculated and stored in the temporary storage unit, thereby accumulating it as the node density points, and stores the result in the temporary storage unit.
Next, the information processing apparatus increments by 1 the cumulative count M_a₁, stored in the temporary storage unit, of the number of times the first winner node a₁ has been the first winner node, and stores the result in the temporary storage unit.

S608: For the nodes and node weight vectors stored in the temporary storage unit, the weight vector updating means 26 updates the weight vector of the first winner node a₁ and the weight vectors of the nodes adjacent to a₁ so that each moves closer to the input vector ξ, and stores the result in the temporary storage unit.
S609: From the edges stored in the temporary storage unit, the information processing apparatus deletes every edge whose age exceeds the preset threshold age_t stored in the temporary storage unit, and stores the result in the temporary storage unit. The threshold age_t is used to delete edges that were generated erroneously under the influence of noise or the like. Setting age_t to a small value makes edges easier to delete and suppresses the effect of noise, but an extremely small value causes edges to be deleted so frequently that the learning result becomes unstable. Conversely, an extremely large value prevents edges generated by noise from being removed properly. With these considerations in mind, the parameter age_t is determined in advance by experiment and stored in the temporary storage unit.
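The edge-age bookkeeping of S607 and the pruning of S609 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the dictionary `ages` keyed by unordered node pairs and the names `edge_key`, `refresh_edge`, and `prune_old_edges` are hypothetical, and the sketch follows the order in which S607 is worded (reset first, then increment), so it assumes the refreshed edge is itself included in the increment.

```python
def edge_key(u, v):
    """Return an order-independent key for the undirected edge (u, v)."""
    return (u, v) if u <= v else (v, u)


def refresh_edge(ages, a1, a2):
    """S607 (sketch): create or refresh the edge a1-a2, then age a1's edges.

    `ages` maps edge keys to integer ages. Resetting before incrementing
    follows the wording of S607; including the refreshed edge in the
    increment is an assumption of this sketch.
    """
    ages[edge_key(a1, a2)] = 0
    for key in ages:
        if a1 in key:
            ages[key] += 1


def prune_old_edges(ages, age_t):
    """S609 (sketch): drop every edge whose age exceeds the threshold age_t."""
    for key in [k for k, age in ages.items() if age > age_t]:
        del ages[key]


# Hypothetical graph with three nodes and three edges.
ages = {edge_key(0, 1): 49, edge_key(0, 2): 50, edge_key(1, 2): 3}
refresh_edge(ages, 0, 1)          # edge 0-1 is reset, then edges at node 0 age by 1
prune_old_edges(ages, age_t=50)   # edge 0-2 now has age 51 and is removed
```

With a smaller `age_t`, more edges would be removed on each pass, which corresponds to the trade-off between noise suppression and stability described for S609.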

S610: The information processing apparatus determines whether the total number of input vectors ξ given so far, stored in the temporary storage unit, is a multiple of the preset value λ stored in the temporary storage unit, and stores the result in the temporary storage unit. If the total number of input vectors is not a multiple of λ, the process returns to S602 and the next input vector ξ is processed.
If, on the other hand, the total number of input vectors ξ is a multiple of λ, the following processing is executed.
Here, λ is the period at which nodes regarded as noise are deleted. A small λ allows noise processing to be performed frequently, but an extremely small value causes nodes that are not actually noise to be deleted by mistake. Conversely, an extremely large λ prevents nodes generated by noise from being removed properly. With these considerations in mind, the parameter λ is determined in advance by experiment and stored in the temporary storage unit.

S611: For the sub-clusters and distribution overlap regions stored in the temporary storage unit, the distribution overlap region detection means 28 detects the distribution overlap regions, which are the boundaries between sub-clusters, as described in S301 to S305 above, and stores the result in the temporary storage unit.
S612: The node density calculation means 27 converts the accumulated node density points stored in the temporary storage unit into a ratio per unit number of inputs, thereby calculating the node density of each node per unit number of inputs, and stores the result in the temporary storage unit.
S613: From the nodes stored in the temporary storage unit, the noise node deletion means 32 deletes the nodes it regards as noise nodes and stores the result in the temporary storage unit. The parameters c₁ and c₂ used by the noise node deletion means 32 in S613 determine whether a node is regarded as noise. A node with two adjacent nodes is usually not noise, so c₁ is set to a value close to 0. A node with one adjacent node is often noise, so c₂ is set to a value close to 1. These parameters are set in advance and stored in the temporary storage unit.
S614: The information processing apparatus determines whether the total number of input vectors ξ given so far, stored in the temporary storage unit, is a multiple of the preset value LT stored in the temporary storage unit, and stores the result in the temporary storage unit. If the total number of input vectors is not a multiple of LT, the process returns to S602 and the next input vector ξ is processed.
If, on the other hand, the total number of input vectors ξ is a multiple of LT, the following processing is executed.
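The noise-node decision of S613 might look like the sketch below. This section states only that deletion depends on the node density and on the number of adjacent nodes, with c₁ applied to nodes that have two neighbours and c₂ to nodes that have one. The comparison against the mean density scaled by c₁ or c₂ is therefore an assumption made for illustration, as are the function name `find_noise_nodes` and the data layout. Nodes with any other number of neighbours are simply kept in this sketch.

```python
def find_noise_nodes(density, neighbors, c1, c2):
    """Return the set of nodes regarded as noise (sketch of S613).

    density   -- dict: node -> node density per unit number of inputs
    neighbors -- dict: node -> set of adjacent nodes
    Assumed rule: a node with two neighbours is noise if its density is
    below c1 * mean density; a node with one neighbour is noise if its
    density is below c2 * mean density. All other nodes are kept.
    """
    if not density:
        return set()
    mean_density = sum(density.values()) / len(density)
    noise = set()
    for node, h in density.items():
        degree = len(neighbors.get(node, ()))
        if degree == 2 and h < c1 * mean_density:
            noise.add(node)
        elif degree == 1 and h < c2 * mean_density:
            noise.add(node)
    return noise


# Hypothetical four-node graph; node "c" is sparse and has a single neighbour.
density = {"a": 4.0, "b": 5.0, "c": 0.5, "d": 2.5}
neighbors = {"a": {"b", "d"}, "b": {"a", "d"}, "c": {"d"}, "d": {"a", "b", "c"}}
noise = find_noise_nodes(density, neighbors, c1=0.001, c2=1.0)
```

With c₁ close to 0, a two-neighbour node is removed only when its density is far below the mean, whereas with c₂ close to 1 a one-neighbour node is removed as soon as its density falls below the mean, which matches the intent described for the two parameters.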

S615: For the nodes, inter-node edges, and node classes stored in the temporary storage unit, the class determination means 33 determines the class to which each node belongs, based on the edges generated between the nodes, as described in S501 to S504 above, and stores the result in the temporary storage unit. The output information display means 34 then outputs, for the nodes and node classes stored in the temporary storage unit, the number of classes to which the nodes belong and the prototype vectors of each class. When this processing is complete, learning stops.
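One way to picture the class determination of S615 is as a connected-component labelling of the node graph. The sketch below assumes that nodes joined by a path of edges share a class, which is what "based on the edges generated between the nodes" suggests. The details of S501 to S504 are not restated in this section, so the function `label_classes` and its breadth-first traversal are illustrative assumptions rather than the procedure of the embodiment.

```python
from collections import deque


def label_classes(nodes, edges):
    """Assign a class index to each node by connected component (sketch).

    nodes -- iterable of hashable node identifiers
    edges -- iterable of (u, v) pairs, treated as undirected
    Returns (labels, class_count), where labels maps node -> class index.
    """
    nodes = list(nodes)
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    labels = {}
    next_label = 0
    for start in nodes:
        if start in labels:
            continue
        # Breadth-first search labels everything reachable from `start`.
        labels[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels, next_label


# Hypothetical graph: nodes 0-1-2 form one component, nodes 3-4 another.
labels, class_count = label_classes([0, 1, 2, 3, 4], [(0, 1), (1, 2), (3, 4)])
```

Under this reading, deleting the edges that cross a distribution overlap region is what splits one component into two classes.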

Next, learning results for input data are described as a specific example of the present embodiment.
First, a comparative experiment between the conventional SOINN and the information processing apparatus of the present embodiment is carried out using the artificial data set shown in FIG. 7.

FIG. 7 shows the artificial data set of input vectors used in the comparative experiment between the conventional SOINN and the information processing apparatus of the present embodiment.
The artificial data set consists of five classes in total: two Gaussian distributions A and B whose distributions overlap, two concentric circles C and D, and the sine curve E1, E2, E3. To simulate a real-world environment, 10% uniform noise is added to the artificial data set.
FIG. 6 is a table showing the input environment of input vectors drawn from the artificial data set of FIG. 7 in the non-stationary environment. In the stationary environment, input vectors are drawn at random from the entire artificial data set. In the non-stationary environment, the artificial data set is divided into the seven regions A, B, C, D, E1, E2, and E3 shown in FIG. 7, and input vectors are supplied while the input environment is switched at fixed intervals according to the table in FIG. 6. The experiment in the non-stationary environment is intended to model online incremental learning.
For the conventional SOINN, experiments using the same artificial data set have been carried out, and SOINN has been shown to output the five classes and the topological structure of each class appropriately in both the stationary and the non-stationary environment.

FIG. 8 shows the output of the information processing apparatus of the present embodiment for the artificial data set shown in FIG. 7.
As FIG. 8 shows, the information processing apparatus of the present embodiment outputs the five classes and the topological structure of each class appropriately in both the stationary and the non-stationary environment. In other words, the information processing apparatus of the present embodiment has a learning capability comparable to that of the conventional SOINN.
The preset parameters were λ = 100, age_t = 100, c₁ = 0.001, and c₂ = 1.0; these values were determined by experiment.

Next, FIG. 9 shows another artificial data set of input vectors used in a comparative experiment between the conventional SOINN and the information processing apparatus of the present embodiment.
The artificial data set shown in FIG. 9 consists of three Gaussian distributions whose distributions overlap, with 10% uniform noise added. Compared with the artificial data set of FIG. 7, the data set of FIG. 9 has a higher-density overlap between the class distributions.
In the stationary environment, input vectors are selected at random from the entire artificial data set shown in FIG. 9. In the non-stationary environment, learning is performed by selecting input vectors 10,000 times from each class in turn.
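The two input regimes described above can be sketched as generators. The names `stationary_inputs` and `nonstationary_inputs`, the dictionary layout of the data, and sampling with replacement inside each class are assumptions made for illustration; the text specifies only random selection from the whole data set in the stationary case and 10,000 selections per class, class by class, in the non-stationary case.

```python
import random


def stationary_inputs(samples_by_class, total, rng):
    """Yield `total` inputs drawn at random from the whole data set."""
    pool = [x for samples in samples_by_class.values() for x in samples]
    for _ in range(total):
        yield rng.choice(pool)


def nonstationary_inputs(samples_by_class, per_class, rng):
    """Yield `per_class` inputs from each class in turn, one class at a time."""
    for label in samples_by_class:
        samples = samples_by_class[label]
        for _ in range(per_class):
            yield rng.choice(samples)


rng = random.Random(0)
# Tiny made-up data set; the experiment uses per_class=10_000 instead of 3.
data = {"A": [(0.0, 0.0), (0.1, 0.2)], "B": [(1.0, 1.0)], "C": [(2.0, 2.1)]}
sequence = list(nonstationary_inputs(data, per_class=3, rng=rng))
mixed = list(stationary_inputs(data, total=5, rng=rng))
```

In the non-stationary sequence the learner never sees class B until class A is exhausted, which is what makes the setting a test of incremental learning.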

FIG. 10 shows the output of the conventional SOINN for the artificial data set shown in FIG. 9.
As FIG. 10 shows, the conventional SOINN cannot separate classes with a high-density overlap in either the stationary or the non-stationary environment.
Of the preset parameters, λ = 200, age_t = 50, and c = 1.0 were determined by experiment, while α₁ = 1/6, α₂ = 1/4, α₃ = 1/4, β = 2/3, and γ = 3/4 were set to the values disclosed in Non-Patent Document 2.

By contrast, as FIG. 11 shows, the information processing apparatus of the present embodiment outputs the three classes and the topological structure of each class appropriately in both the stationary and the non-stationary environment. In other words, the information processing apparatus of the present embodiment can separate classes that have a high-density overlap.
The preset parameters were λ = 200, age_t = 50, c₁ = 0.001, and c₂ = 1.0; these values were determined by experiment.

Next, comparative experiments between the conventional SOINN and the information processing apparatus of the present embodiment are carried out using real data sets.

First, a comparative experiment is carried out using the AT&T database (http://www.uk.research.att.com).
The data set used in the experiment consists of 10 classes selected from the AT&T_FACE database, each class containing 10 samples. The original images in the data set are 92 × 112 pixels with 256 grayscale levels (for details, see FIG. 11 of Non-Patent Document 2).
For the experiment, each original image is reduced to 23 × 28 pixels (nearest-neighbor interpolation) and then smoothed with a Gaussian (size 4, variance 2). The feature vectors obtained by this processing are used as the input vectors in the experiment (for details, see FIG. 12 of Non-Patent Document 2).
In the stationary environment, input vectors are selected at random from the entire data set. In the non-stationary environment, learning is performed by selecting input vectors 1,000 times from each class in turn.

In the experiments on this data set, the number of output classes most frequently produced by the information processing apparatus of the present embodiment is 10, in both the stationary and the non-stationary environment.
The preset parameters were λ = 25, age_t = 25, c₁ = 0.0, and c₂ = 1.0; these values were determined by experiment.

In addition, using one of the prototype vectors output by the information processing apparatus of the present embodiment, the original data set is classified in the same way as in the SOINN experiment of Non-Patent Document 2.
As a result, the information processing apparatus of the present embodiment achieves a recognition rate of 90% in the stationary environment and 86% in the non-stationary environment.
These recognition rates are comparable to those of SOINN, so the information processing apparatus of the present embodiment has a recognition capability comparable to that of SOINN.
The recognition rate is considered to be only comparable to that of SOINN because the data set contains few samples and the overlap between the class distributions is low in density.
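The recognition rates quoted above presuppose some rule for classifying a sample from prototype vectors. This section does not spell the rule out, so the sketch below assumes a nearest-prototype rule with Euclidean distance and assumes that each prototype already carries a class label. The function names and the toy data are hypothetical and are not taken from the experiments.

```python
import math


def nearest_label(x, prototypes):
    """Return the label of the prototype closest to x (Euclidean distance).

    prototypes -- list of (vector, label) pairs; an assumed representation.
    """
    best = min(prototypes, key=lambda p: math.dist(x, p[0]))
    return best[1]


def recognition_rate(samples, prototypes):
    """Fraction of (vector, label) samples whose nearest prototype matches."""
    correct = sum(1 for x, y in samples if nearest_label(x, prototypes) == y)
    return correct / len(samples)


# Made-up two-dimensional example, not data from the patent.
prototypes = [((0.0, 0.0), "s1"), ((5.0, 5.0), "s2")]
samples = [
    ((0.2, -0.1), "s1"),
    ((4.8, 5.3), "s2"),
    ((1.0, 1.0), "s2"),   # lies closer to the "s1" prototype, so it is misclassified
    ((6.0, 4.0), "s2"),
]
rate = recognition_rate(samples, prototypes)
```

Three of the four toy samples are assigned their own label, so `rate` evaluates to 0.75.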

Next, to compare the stability of the outputs of the conventional SOINN and the information processing apparatus of the present embodiment, the experiment is repeated 1,000 times for each and the frequency of each number of output classes is examined.
FIG. 12 shows the frequency of the number of output classes produced by SOINN. FIG. 13 shows the frequency of the number of output classes produced by the information processing apparatus of the present embodiment.
FIGS. 12 and 13 show that, in both the stationary and the non-stationary environment, the information processing apparatus of the present embodiment outputs approximately 10 classes more often than SOINN does. In other words, the output of the information processing apparatus of the present embodiment is more stable than that of SOINN.
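The stability comparison amounts to tallying the number of output classes over repeated runs. A minimal sketch of such a tally is given below. The helper names and the `tolerance` used to express "approximately 10 classes" are assumptions, and the run results are made-up values rather than the data behind FIGS. 12 and 13.

```python
from collections import Counter


def class_count_frequency(class_counts):
    """Map each output class count to the number of runs that produced it."""
    return Counter(class_counts)


def share_within(freq, target, tolerance):
    """Fraction of runs whose class count lies within `tolerance` of `target`."""
    total = sum(freq.values())
    hits = sum(n for k, n in freq.items() if abs(k - target) <= tolerance)
    return hits / total


runs = [10, 10, 9, 11, 10, 13, 6, 10]   # made-up class counts, not patent data
freq = class_count_frequency(runs)
near_ten = share_within(freq, target=10, tolerance=1)
```

A learner whose histogram is concentrated around the true class count yields a larger `near_ten` value, which is one way to quantify the stability seen in the figures.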

A further comparative experiment is then carried out using another real data set. The data used in this experiment is the Optical Recognition of Handwritten Digits database (Optdigits) (http://www.ics.uci.edu/~mlearn/MLRepository.html). This data set consists of handwritten digits in 10 classes and contains 3,823 samples as training data and 1,797 samples as test data. Each sample has 64 dimensions.

FIG. 14 shows an example of the prototype vectors output by the conventional SOINN for the Optdigits data set.
In the experiments on this data set, the number of output classes most frequently produced by SOINN is 10, in both the stationary and the non-stationary environment.
When the test data is classified using one of the prototype vectors output by SOINN, a recognition rate of 92.2% is obtained in the stationary environment and 90.4% in the non-stationary environment.
Furthermore, when the experiment is repeated 100 times for SOINN to examine the variation in the number of classes, SOINN outputs between 6 and 13 classes in both the stationary and the non-stationary environment.
Of the preset parameters, λ = 200, age_t = 50, and c = 1.0 were determined by experiment, while α₁ = 1/6, α₂ = 1/4, α₃ = 1/4, β = 2/3, and γ = 3/4 were set to the values disclosed in Non-Patent Document 2.

FIG. 15 shows an example of the prototype vectors output by the information processing apparatus of the present embodiment for the Optdigits data set.
In the experiments on this data set, the number of output classes most frequently produced by the information processing apparatus of the present embodiment is 12, in both the stationary and the non-stationary environment.
When the test data is classified using one of the prototype vectors output by the information processing apparatus of the present embodiment, a recognition rate of 94.3% is obtained in the stationary environment and 95.8% in the non-stationary environment, both higher than the recognition rates of SOINN.
Furthermore, when the experiment is repeated 100 times for the information processing apparatus of the present embodiment to examine the variation in the number of classes, the apparatus outputs between 10 and 13 classes in both the stationary and the non-stationary environment, so its output is more stable than that of SOINN.
The preset parameters were λ = 200, age_t = 50, c₁ = 0.001, and c₂ = 1.0; these values were determined by experiment.

Comparing the example prototype vectors in FIGS. 14 and 15, SOINN cannot extract samples such as "1'" and "9'", whereas the information processing apparatus of the present embodiment can. That is, SOINN outputs the digit "1" as a single class, whereas the information processing apparatus of the present embodiment outputs "1" as two classes (likewise, the apparatus also divides the digit "9" into two classes).
This is considered to be because, in the original data set, there is a large difference between samples such as "1" and "1'" shown in FIG. 15, so the information processing apparatus of the present embodiment separated and extracted "1'" (the same is considered to hold for "9" and "9'").
The information processing apparatus of the present embodiment can therefore separate classes whose distributions overlap, such as "1" and "1'" in FIG. 15, and preserves the information in the original data more appropriately than SOINN.
In summary, the experimental results on the real data set Optdigits show that the information processing apparatus of the present embodiment can separate classes whose distributions overlap, which SOINN cannot.
In addition, the information processing apparatus of the present embodiment achieves a high recognition rate, and its output is highly stable.

The information processing apparatus according to the present invention described above provides the following effects.
First, classes whose distributions have a high-density overlap can be separated. Because a smoothing technique is introduced into the detection of distribution overlap regions, the apparatus also operates more stably than SOINN.
Furthermore, noise nodes can be deleted efficiently even with a single-layer structure, so fully online incremental learning can be realized.
Moreover, the apparatus operates with fewer parameters than SOINN, so the processing can be carried out more easily.
As a result, if the information processing apparatus according to the present invention is mounted on a robot, for example, the robot can take the various information it acquires from its surroundings as input data and recognize it stably in real time while eliminating noise data, even for complex data that was conventionally difficult to classify.

Other Embodiments of the Invention
It goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a recording medium (or storage medium) on which the program code of software implementing the functions of the embodiment described above is recorded, and by having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the recording medium. In this case, the program code read from the recording medium itself implements the functions of the embodiment described above, and the recording medium on which the program code is recorded constitutes the present invention.

It also goes without saying that the invention covers not only the case where the functions of the embodiment described above are implemented by the computer executing the program code it has read, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the embodiment described above are implemented by that processing.

It further goes without saying that the invention covers the case where the program code read from the recording medium is written to a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like provided on that function expansion card or function expansion unit then performs part or all of the actual processing based on the instructions of the program code, and the functions of the embodiment described above are implemented by that processing.

When the present invention is applied to such a recording medium, the recording medium stores program code corresponding to the flowchart shown in FIG. 5 described above.

The present invention has been described above by way of its embodiment, but various modifications are possible within the scope of its gist.

A diagram showing an example of a distribution overlap region between classes.
A diagram showing a configuration example for carrying out the present invention.
A diagram showing functional blocks for carrying out the present invention.
A diagram showing an example of a distribution overlap region between classes that contains fine irregularities.
A flowchart outlining the learning process according to an example of the present invention.
A table showing the input environment of input vectors in a non-stationary environment.
A diagram showing the artificial data set of input vectors used in the comparative experiment.
A diagram showing the output of the example according to the present invention for the artificial data set.
A diagram showing an artificial data set that contains overlapping distributions and noise.
A diagram showing the output of SOINN for the artificial data set.
A diagram showing the output of the example according to the present invention for the artificial data set.
A diagram showing the frequency of the number of output classes produced by SOINN.
A diagram showing the frequency of the number of output classes produced by the example according to the present invention.
A diagram showing the prototype vectors output by SOINN.
A diagram showing the prototype vectors output by the example according to the present invention.
A flowchart outlining the learning process of SOINN.

Explanation of symbols

1 Information processing apparatus
10 Computer
11 CPU
12 ROM
13 RAM
14 Bus
15 Input/output interface
16 Input unit
17 Output unit
18 Storage unit
19 Communication unit
20 Drive
201 Magnetic disk
202 Optical disk
203 Flexible disk
204 Semiconductor memory
21 Input information acquisition means
22 Winner node search means
23 Similarity threshold calculation means
24 Similarity threshold determination means
25 Node insertion means
26 Weight vector updating means
27 Node density calculation means
28 Distribution overlap region detection means
29 Edge connection determination means
30 Edge connection means
31 Edge deletion means
32 Noise node deletion means
33 Class determination means
34 Output information display means

Claims

An information processing apparatus having a structure of at least one layer in which nodes described by multidimensional vectors are arranged,
the apparatus sequentially receiving input vectors belonging to arbitrary classes and learning the input distribution structure of the input vectors, wherein,
where the node having the weight vector closest to the input vector is a first winner node and the node having the second-closest weight vector is a second winner node, and when an edge is connected between the first winner node and the second winner node, the apparatus comprises:
node density calculation means for calculating, for a node of interest, the node density of the node of interest based on the average distance between the node of interest and the nodes directly connected to it by edges;
distribution overlap region detection means for dividing a cluster, which is a set of nodes connected by edges, into sub-clusters, which are subsets of the cluster, based on the node density calculated by the node density calculation means, and for detecting a distribution overlap region, which is a boundary between the sub-clusters;
edge connection determination means for determining, when the first winner node and the second winner node are nodes located in the distribution overlap region, whether to connect an edge between the first winner node and the second winner node based on the node densities of the first winner node and the second winner node;
edge connection means for connecting an edge between the first winner node and the second winner node based on the determination result; and
edge deletion means for deleting the edge between the first winner node and the second winner node based on the determination result.

An information processing apparatus that has a structure of at least one layer in which nodes described by multidimensional vectors are arranged,
and that sequentially receives input vectors belonging to arbitrary classes and learns the input distribution structure of the input vectors,
wherein the node whose weight vector is closest to the input vector is taken as a first winner node and the node whose weight vector is second closest is taken as a second winner node, and wherein, when an edge is to be connected between the first winner node and the second winner node, the information processing apparatus comprises:
node density calculation means for calculating, for a node of interest, the node density of the node of interest based on the average distance between the node of interest and the nodes directly connected to it by edges; and
noise node deletion means for deleting the node of interest based on the node density calculated by the node density calculation means and the number of nodes directly connected to the node of interest by edges.
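
As a hedged sketch of how a noise node might be selected from the two quantities named above (node density and number of directly connected nodes), consider the following. The concrete rule, the function name `find_noise_nodes`, and the parameters `max_degree` and `ratio` are assumptions for illustration; the claim states only that both quantities are used.

```python
def find_noise_nodes(density, neighbors, max_degree=1, ratio=0.5):
    """Return the ids of nodes treated as noise under an assumed rule.

    density:   dict node id -> node density
    neighbors: dict node id -> set of directly connected node ids
    Assumed rule: a node is noise if it has at most `max_degree` neighbors
    and its density is below `ratio` times the mean density of all nodes.
    """
    if not density:
        return set()
    mean_density = sum(density.values()) / len(density)
    return {
        n
        for n in density
        if len(neighbors.get(n, ())) <= max_degree
        and density[n] < ratio * mean_density
    }


density = {0: 1.0, 1: 0.9, 2: 0.1, 3: 0.05}
neighbors = {0: {1, 2}, 1: {0}, 2: {0}, 3: set()}
print(sorted(find_noise_nodes(density, neighbors)))  # prints "[2, 3]"
```

Node 1 has only one neighbor but a high density, so it is kept; nodes 2 and 3 are both sparsely connected and low in density.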

The information processing apparatus according to claim 1 or 2, further comprising weight vector updating means for updating the weight vector of the first winner node and the weight vectors of the nodes directly connected to the first winner node by edges so that each moves closer to the input vector.
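
The weight vector update can be illustrated with the minimal sketch below. The function name `update_weights` and the fixed learning rates `eps_winner` and `eps_neighbor` are hypothetical; the claim requires only that the winner and its directly connected nodes move closer to the input vector.

```python
def update_weights(weights, neighbors, winner, x,
                   eps_winner=0.1, eps_neighbor=0.01):
    """Move the winner and its direct neighbors toward input vector x.

    weights:   dict node id -> weight vector (tuple of floats), updated in place
    neighbors: dict node id -> set of directly connected node ids
    The learning rates are assumed constants chosen for illustration.
    """
    def move(w, eps):
        # Standard step toward x: w + eps * (x - w), component by component.
        return tuple(wi + eps * (xi - wi) for wi, xi in zip(w, x))

    weights[winner] = move(weights[winner], eps_winner)
    for n in neighbors.get(winner, ()):
        weights[n] = move(weights[n], eps_neighbor)


weights = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (9.0, 9.0)}
neighbors = {0: {1}, 1: {0}, 2: set()}
update_weights(weights, neighbors, winner=0, x=(1.0, 1.0))
print(weights[0], weights[2])
```

Node 2 is not connected to the winner, so its weight vector is left unchanged.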

The information processing apparatus according to claim 1, further comprising noise node deletion means for deleting, for a node of interest, the node of interest based on the node density calculated by the node density calculation means and the number of nodes directly connected to the node of interest by edges.

The information processing apparatus according to claim 1, wherein the node density calculation means includes
a unit node density calculation unit that calculates the node density of the first winner node as a ratio per unit number of inputs, based on the average distance between the first winner node and the nodes directly connected to the first winner node by edges.

The information processing apparatus according to claim 1, wherein the node density calculation means includes:
a node density point calculation unit that calculates a node density point of the first winner node based on the average distance between the first winner node and the nodes directly connected to the first winner node by edges; and
a unit node density point calculation unit that accumulates the node density points until the number of input vectors received reaches a predetermined unit number of inputs and, when the number of input vectors received reaches the predetermined unit number of inputs, calculates the accumulated node density points as a ratio per unit number of inputs, thereby calculating the node density of the node per unit number of inputs.
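
The accumulation of node density points can be sketched as below, purely as an illustration. The class name `DensityAccumulator`, the point formula `1 / (1 + d)^2`, and the choice to divide the accumulated points by the number of completed unit periods are all assumptions; the claim states only that points derive from the average distance and are expressed as a ratio per unit number of inputs.

```python
class DensityAccumulator:
    """Accumulate node density points and report density per unit period."""

    def __init__(self, unit_inputs):
        self.unit_inputs = unit_inputs  # predetermined unit number of inputs
        self.count = 0                  # inputs seen in the current period
        self.periods = 0                # completed unit periods
        self.points = {}                # node id -> accumulated points
        self.density = {}               # node id -> density per unit period

    @staticmethod
    def point(mean_distance):
        # Assumed formula: a small mean distance gives a point near 1.
        return 1.0 / (1.0 + mean_distance) ** 2

    def observe(self, winner, mean_distance):
        """Record one input whose first winner is `winner`."""
        self.points[winner] = (
            self.points.get(winner, 0.0) + self.point(mean_distance)
        )
        self.count += 1
        if self.count == self.unit_inputs:
            # A unit period is complete: refresh the per-period densities.
            self.periods += 1
            self.count = 0
            self.density = {
                n: p / self.periods for n, p in self.points.items()
            }


acc = DensityAccumulator(unit_inputs=2)
acc.observe(0, 0.0)   # point 1.0
acc.observe(0, 1.0)   # point 0.25, period complete
print(acc.density)    # prints "{0: 1.25}"
```

Densities are refreshed only at period boundaries, so a node's density does not change while a unit period is still in progress.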

The information processing apparatus according to any one of claims 1 to 6, wherein the information processing apparatus is a self-propagating neural network that receives the input vector as input to a neural network and automatically increases the number of nodes arranged in the neural network based on the received input vector.

The information processing apparatus according to claim 7, further comprising:
similarity threshold calculation means for calculating a similarity threshold for a node of interest such that, when there are nodes directly connected to the node of interest by edges, the similarity threshold is the distance between the node of interest and the directly connected node farthest from it, and, when there is no node directly connected to the node of interest by an edge, the similarity threshold is the distance between the node of interest and the node nearest to it;
similarity threshold determination means for determining whether the distance between the input vector and the first winner node is greater than the similarity threshold of the first winner node, and whether the distance between the input vector and the second winner node is greater than the similarity threshold of the second winner node; and
node insertion means for inserting, based on the similarity threshold determination result, the input vector as a node at the same position as the input vector.
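
The similarity threshold and the insertion decision described above may be sketched as follows. The function names, the Euclidean distance, and the choice to insert a node when either winner's threshold is exceeded are assumptions for this example; the claim says only that insertion is based on the determination result.

```python
import math


def similarity_threshold(node, weights, neighbors):
    """Similarity threshold of `node` (requires at least two nodes).

    With neighbors: distance to the farthest directly connected node.
    Without neighbors: distance to the nearest other node.
    """
    adjacent = neighbors.get(node, ())
    if adjacent:
        return max(math.dist(weights[node], weights[n]) for n in adjacent)
    others = [n for n in weights if n != node]
    return min(math.dist(weights[node], weights[n]) for n in others)


def should_insert(x, first, second, weights, neighbors):
    """Assumed rule: insert x as a new node if it lies outside the
    similarity threshold of either winner."""
    d1 = math.dist(x, weights[first])
    d2 = math.dist(x, weights[second])
    return (d1 > similarity_threshold(first, weights, neighbors)
            or d2 > similarity_threshold(second, weights, neighbors))


weights = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (4.0, 0.0)}
neighbors = {0: {1}, 1: {0}, 2: set()}
print(should_insert((0.5, 0.0), 0, 1, weights, neighbors))  # prints "False"
print(should_insert((3.0, 0.0), 2, 1, weights, neighbors))  # prints "True"
```

In the second call the input lies within the threshold of isolated node 2 (3.0) but outside that of node 1 (1.0), so it would be inserted as a new node.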

The information processing apparatus according to claim 7, wherein the self-propagating neural network has a one-layer structure.

The information processing apparatus according to claim 1, wherein the distribution overlap area detecting means includes:
a node search unit that searches for nodes whose node density is locally maximal, based on the node density calculated by the node density calculation means;
a first label assigning unit that assigns to each node found by the search a label different from the labels already assigned to other nodes;
a second label assigning unit that assigns, to each node that was not labeled by the first label assigning unit and that is connected by an edge to a node labeled by the first label assigning unit, the same label as that of the labeled node;
a cluster dividing unit that divides a cluster, which is a set of nodes connected by edges, into sub-clusters, each of which is a subset of the cluster consisting of nodes having the same label; and
a distribution overlap area detection unit that, when a node of interest and a node directly connected to the node of interest by an edge belong to different sub-clusters, detects an area including the node of interest and the directly connected node as a distribution overlap area, which is a boundary between sub-clusters.
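
As a non-limiting sketch, the labeling and overlap detection above could be realized as follows. The function names are hypothetical, and one detail is an assumption: an unlabeled node takes the label of its highest-density neighbor that is already labeled, which extends the second labeling step to nodes not adjacent to a local maximum.

```python
def label_subclusters(density, neighbors):
    """Assign a sub-cluster label to every node.

    density:   dict node id -> node density
    neighbors: dict node id -> set of directly connected node ids
    Nodes are visited in descending density order. A node with no
    already-labeled neighbor is a local density maximum and gets a new
    label; otherwise it inherits the label of its densest labeled neighbor.
    """
    labels = {}
    next_label = 0
    for n in sorted(density, key=density.get, reverse=True):
        labeled = [m for m in neighbors.get(n, ()) if m in labels]
        if labeled:
            labels[n] = labels[max(labeled, key=density.get)]
        else:
            labels[n] = next_label
            next_label += 1
    return labels


def overlap_nodes(labels, neighbors):
    """Nodes that have a direct neighbor in a different sub-cluster."""
    return {
        n for n in labels
        if any(labels[m] != labels[n] for m in neighbors.get(n, ()))
    }


# Chain 0-1-2-3-4 with two density peaks (nodes 1 and 3).
density = {0: 1.0, 1: 3.0, 2: 0.5, 3: 4.0, 4: 2.0}
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
labels = label_subclusters(density, neighbors)
print(sorted(overlap_nodes(labels, neighbors)))  # prints "[1, 2]"
```

Here nodes 3, 4 and 2 form one sub-cluster and nodes 1 and 0 the other, so the edge between nodes 1 and 2 marks the distribution overlap area.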

An information processing method in which, using a structure of at least one layer in which nodes described by multidimensional vectors are arranged,
input vectors belonging to arbitrary classes are sequentially input and the input distribution structure of the input vectors is learned,
wherein the node whose weight vector is closest to the input vector is taken as a first winner node and the node whose weight vector is second closest is taken as a second winner node, and wherein, when an edge is to be connected between the first winner node and the second winner node, the information processing method comprises:
a node density calculating step of calculating, for a node of interest, the node density of the node of interest based on the average distance between the node of interest and the nodes directly connected to it by edges;
a distribution overlap area detecting step of dividing a cluster, which is a set of nodes connected by edges, into sub-clusters, which are subsets of the cluster, based on the node density calculated in the node density calculating step, and of detecting a distribution overlap area, which is a boundary between the sub-clusters;
an edge connection determination step of determining, when the first winner node and the second winner node are located in the distribution overlap area, whether to connect an edge between the first winner node and the second winner node based on the node densities of the first winner node and the second winner node;
an edge connection step of connecting an edge between the first winner node and the second winner node based on the determination result; and
an edge deletion step of deleting the edge between the first winner node and the second winner node based on the determination result.
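
The edge connection determination, connection, and deletion steps can be illustrated with the sketch below. The decision rule is an assumption: winners in different sub-clusters are connected only if the lower of their two densities exceeds a fraction `alpha` of the peak density of either sub-cluster. The claim states only that the decision is based on the node densities of the two winners; `decide_edge`, `apply_decision`, `apex_density`, and `alpha` are hypothetical names.

```python
def decide_edge(first, second, labels, density, apex_density, alpha=0.5):
    """Return True to connect the winners, False to delete the edge.

    labels:       dict node id -> sub-cluster label (missing if unlabeled)
    density:      dict node id -> node density
    apex_density: dict label -> peak node density of that sub-cluster
    """
    la, lb = labels.get(first), labels.get(second)
    # Unlabeled winners or winners in the same sub-cluster: connect.
    if la is None or lb is None or la == lb:
        return True
    weaker = min(density[first], density[second])
    # Assumed rule: connect only if the boundary is dense enough.
    return (weaker > alpha * apex_density[la]
            or weaker > alpha * apex_density[lb])


def apply_decision(edges, first, second, connect):
    """Connect or delete the undirected edge between the two winners."""
    edge = frozenset((first, second))
    if connect:
        edges.add(edge)
    else:
        edges.discard(edge)


labels = {0: "A", 1: "B"}
apex_density = {"A": 1.0, "B": 2.0}
edges = {frozenset((0, 1))}
connect = decide_edge(0, 1, labels, {0: 0.2, 1: 0.3}, apex_density)
apply_decision(edges, 0, 1, connect)
print(connect, len(edges))  # prints "False 0"
```

A sparse boundary between two sub-clusters therefore loses its edge and the sub-clusters stay separate, while a dense boundary keeps the edge.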

An information processing method in which, using a structure of at least one layer in which nodes described by multidimensional vectors are arranged,
input vectors belonging to arbitrary classes are sequentially input and the input distribution structure of the input vectors is learned,
wherein the node whose weight vector is closest to the input vector is taken as a first winner node and the node whose weight vector is second closest is taken as a second winner node, and wherein, when an edge is to be connected between the first winner node and the second winner node, the information processing method comprises: a node density calculating step of calculating, for a node of interest, the node density of the node of interest based on the average distance between the node of interest and the nodes directly connected to it by edges; and
a noise node deletion step of deleting the node of interest based on the node density calculated in the node density calculating step and the number of nodes directly connected to the node of interest by edges.

The information processing method according to claim 11 or 12, further comprising a weight vector update step of updating the weight vector of the first winner node and the weight vectors of the nodes directly connected to the first winner node by edges so that each moves closer to the input vector.

The information processing method according to claim 11, further comprising a noise node deletion step of deleting, for a node of interest, the node of interest based on the node density calculated in the node density calculating step and the number of nodes directly connected to the node of interest by edges.

The information processing method according to claim 11, wherein the node density calculating step includes
a unit node density calculation step of calculating the node density of the first winner node as a ratio per unit number of inputs, based on the average distance between the first winner node and the nodes directly connected to the first winner node by edges.

The information processing method according to claim 11, wherein the node density calculating step includes:
a node density point calculating step of calculating a node density point of the first winner node based on the average distance between the first winner node and the nodes directly connected to the first winner node by edges; and
a unit node density point calculation step of accumulating the node density points until the number of input vectors received reaches a predetermined unit number of inputs and, when the number of input vectors received reaches the predetermined unit number of inputs, calculating the accumulated node density points as a ratio per unit number of inputs, thereby calculating the node density of the node per unit number of inputs.

The information processing method according to any one of claims 11 to 16, wherein the information processing method is performed using a self-propagating neural network that receives the input vector as input to a neural network and automatically increases the number of nodes arranged in the neural network based on the received input vector.

The information processing method according to claim 17, further comprising:
a similarity threshold calculation step of calculating a similarity threshold for a node of interest such that, when there are nodes directly connected to the node of interest by edges, the similarity threshold is the distance between the node of interest and the directly connected node farthest from it, and, when there is no node directly connected to the node of interest by an edge, the similarity threshold is the distance between the node of interest and the node nearest to it;
a similarity threshold determination step of determining whether the distance between the input vector and the first winner node is greater than the similarity threshold of the first winner node, and whether the distance between the input vector and the second winner node is greater than the similarity threshold of the second winner node; and
a node insertion step of inserting, based on the similarity threshold determination result, the input vector as a node at the same position as the input vector.

The information processing method according to claim 17, wherein the self-propagating neural network has a one-layer structure.

The information processing method according to claim 11, wherein the distribution overlap area detecting step includes:
a node search step of searching for nodes whose node density is locally maximal, based on the node density calculated in the node density calculating step;
a first label assigning step of assigning to each node found by the search a label different from the labels already assigned to other nodes;
a second label assigning step of assigning, to each node that was not labeled in the first label assigning step and that is connected by an edge to a node labeled in the first label assigning step, the same label as that of the labeled node;
a cluster dividing step of dividing a cluster, which is a set of nodes connected by edges, into sub-clusters, each of which is a subset of the cluster consisting of nodes having the same label; and
a distribution overlap area detection step of detecting, when a node of interest and a node directly connected to the node of interest by an edge belong to different sub-clusters, an area including the node of interest and the directly connected node as a distribution overlap area, which is a boundary between sub-clusters.

A program for causing a computer to execute the information processing according to any one of claims 11 to 20.