WO2022145087A1 - Information processing device, information processing method, and non-transitory computer-readable medium - Google Patents

Information processing device, information processing method, and non-transitory computer-readable medium Download PDF

Info

Publication number
WO2022145087A1
WO2022145087A1 (PCT application PCT/JP2021/031880)
Authority
WO
WIPO (PCT)
Prior art keywords
node
information processing
winner
sides
learning time
Prior art date
Application number
PCT/JP2021/031880
Other languages
French (fr)
Japanese (ja)
Inventor
修 長谷川
洸輔 井加田
Original Assignee
Soinn株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Soinn株式会社 filed Critical Soinn株式会社
Priority to JP2022572905A priority Critical patent/JP7489730B2/en
Publication of WO2022145087A1 publication Critical patent/WO2022145087A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 — Machine learning
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods

Definitions

  • The present invention relates to an information processing device, an information processing method, and a program, for example, an information processing device, an information processing method, and a program for learning the input distribution structure of input vectors by sequentially inputting input vectors belonging to an arbitrary class.
  • As a learning method that grows neurons as needed during learning, a method called the self-organizing incremental neural network (SOINN) has been proposed (Patent Document 1).
  • SOINN can learn non-stationary inputs by autonomously managing the number of nodes, and has many advantages, such as being able to extract an appropriate number of classes and the topological structure even for classes with complicated distribution shapes.
  • In SOINN, for example, in pattern recognition, it is possible to additionally learn a class of katakana characters after learning a class of hiragana characters.
  • E-SOINN (Enhanced SOINN) allows online additional learning, in which learning can be added at any time, and therefore has better learning efficiency than batch learning. Accordingly, in E-SOINN, additional learning is possible even when the learning environment changes to a new environment. E-SOINN also has high noise immunity to input data.
  • LB-SOINN (Load Balance Self-Organizing Incremental Neural Network) has also been proposed.
  • FIG. 32 schematically shows the generation of a new node by load balancing in LB-SOINN.
  • In LB-SOINN, a new node is generated between the node Nmax whose learning time is relatively long and biased (referred to as the detected node) and an adjacent node connected to that node.
  • Thereby, the network structure can be learned accurately, and the load of the nodes belonging to the cluster can be appropriately balanced.
  • FIG. 33 schematically shows the generation of a new node by load balancing when learning data with a large noise in LB-SOINN.
  • However, when the adjacent node Nnei belongs to the cluster CL1 and the detected node Nmax belongs to another cluster CL2, performing the load balancing of LB-SOINN generates a new node Nnew in the boundary area B between the two clusters.
  • As a result, the cluster CL1 and the cluster CL2, which should originally be distinguished, can no longer be distinguished, and the structure of the input data may not be learned correctly.
  • the present invention has been made in view of the above circumstances, and an object of the present invention is to provide an information processing device, an information processing method, and a program capable of learning the structure of input data more accurately.
  • An information processing device according to one aspect of the present invention is an information processing device into which input vectors are sequentially input and which learns the input distribution structure of the input vectors as a network structure having a plurality of nodes, each described by a multidimensional vector, and a plurality of edges, each connecting two of the nodes. The device includes a winner node detection unit that detects, from the plurality of nodes included in the network structure, the node located closest to the input vector as the first winner node and the node located second closest as the second winner node; an edge learning time update unit that increases, by a first value, the edge learning time of the edge connecting the first winner node and the second winner node; and a load balancing unit that, at a predetermined timing, selects one or more edges from the plurality of edges based on the edge learning time, creates a new node on each of the selected one or more edges, and inserts it into the network structure.
  • In the above information processing device, it is desirable that the edge learning time update unit further increases, by a second value smaller than the first value, the edge learning time of the edges connected to the first winner node and the edge learning time of the edges connected to the second winner node other than the edge connecting the first winner node and the second winner node.
  • In the above information processing device, it is desirable that the load balancing unit selects one or more edges having a relatively large edge learning time.
  • In the above information processing device, it is desirable that the load balancing unit selects one or more edges whose edge learning time is larger than a predetermined edge learning time threshold.
  • In the above information processing device, it is desirable that the load balancing unit selects the one or more edges from the plurality of edges based on the edge learning time and the length of the edges.
  • In the above information processing device, it is desirable that the load balancing unit selects one or more edges having a relatively large length.
  • In the above information processing device, it is desirable that the load balancing unit selects one or more edges whose length is larger than a predetermined value.
  • In the above information processing device, it is desirable that the load balancing unit determines the position at which to generate the new node based on the number of wins of the first node connected to one end of each of the selected one or more edges and the number of wins of the second node connected to the other end.
  • In the above information processing device, it is desirable that the load balancing unit generates the new node at the position of the center of gravity calculated from the number of wins of the first node and the number of wins of the second node of each of the selected one or more edges.
  • In the above information processing device, it is desirable that the load balancing unit deletes the selected one or more edges and, for each deleted edge, generates a first edge connecting the first node connected to one end of the deleted edge and the new node and a second edge connecting the second node connected to the other end of the deleted edge and the new node, and inserts them into the network structure.
  • In the above information processing device, it is desirable that the load balancing unit causes the first edge and the second edge to each inherit the edge learning time of the deleted edge at a predetermined ratio.
  • In the above information processing device, it is desirable that the first edge and the second edge inherit the edge learning time of the deleted edge in the ratios indicated by the values obtained by dividing the number of wins of the first node and the number of wins of the second node, respectively, by the sum of the number of wins of the first node and the number of wins of the second node.
  • In the above information processing device, it is desirable that the information processing device further includes a node insertion determination unit that determines whether or not to execute node insertion based on the distance between the input vector and the first winner node and the distance between the input vector and the second winner node, and a node insertion unit that, when node insertion is to be executed as a result of the determination by the node insertion determination unit, generates an insertion node having the same components as the components of the input vector as its weight vector and inserts the generated insertion node into the network structure.
  • An information processing method according to one aspect of the present invention is an information processing method in which input vectors are sequentially input and the input distribution structure of the input vectors is learned as a network structure having a plurality of nodes, each described by a multidimensional vector, and a plurality of edges, each connecting two of the nodes. In the method, a winner node detection unit detects, from the plurality of nodes included in the network structure, the node located closest to the input vector as the first winner node and the node located second closest as the second winner node; an edge learning time update unit increases, by a predetermined value, the edge learning time of the edge connecting the first winner node and the second winner node; and a load balancing unit selects, at a predetermined timing, one or more edges from the plurality of edges based on the edge learning time, creates a new node on each of the selected one or more edges, and inserts it into the network structure.
  • A program according to one aspect of the present invention is a program that causes a computer to execute processing for learning the input distribution structure of sequentially input vectors belonging to an arbitrary class as a network structure having a plurality of nodes, each described by a multidimensional vector, and a plurality of edges, each connecting two of the nodes. The program causes the computer to execute: processing for detecting, from the plurality of nodes included in the network structure, the node located closest to the input vector as the first winner node and the node located second closest as the second winner node; processing for increasing, by a predetermined value, the edge learning time of the edge connecting the first winner node and the second winner node; and processing for selecting, at a predetermined timing, one or more edges from the plurality of edges based on the edge learning time, creating a new node on each of the selected one or more edges, and inserting it into the network structure.
  • According to the present invention, it is possible to provide an information processing device, an information processing method, and a program capable of learning the structure of input data more accurately.
  • FIG. 1 is a diagram showing an example of a system configuration for realizing the information processing apparatus according to the first embodiment.
  • FIG. 2 is a diagram schematically showing the functional configuration of the information processing apparatus according to the first embodiment.
  • FIG. 3 is a diagram schematically showing the functional configuration of the information processing apparatus according to the first embodiment.
  • FIG. 4 is a flowchart of the learning process in the information processing apparatus according to the first embodiment.
  • FIG. 5 is a diagram showing examples of increasing the learning time.
  • Further figures show: an example in which four nodes are connected in a star shape; the deletion of an edge; the generation and insertion of an edge; the inheritance of the edge learning time; and the data generated for use in the experiment in the first embodiment.
  • Further figures show: the input data obtained by adding noise to the generated data; the learning result of that input data by the algorithm obtained by removing load balancing from the LB-SOINN of Patent Document 2; and the learning result of that input data by the LB-SOINN of Patent Document 2.
  • A further figure shows the learning result of the input data to which the load balancing according to the first embodiment is applied.
  • A further figure shows the data generated for use in the experiment in the second embodiment.
  • A further figure shows the input data obtained by adding noise to that generated data.
  • A further figure shows the learning result of that input data to which the load balancing according to the first embodiment is applied.
  • A further figure shows the distribution of the input vector.
  • A further figure shows the node distribution after applying the load balancing described in the first embodiment.
  • Further figures show: an image of the load balancing according to the second embodiment; the node distribution after applying the load balancing according to the second embodiment under the input vector distribution of FIG.; the learning result of the input data to which the load balancing according to the second embodiment is applied; and the structure of the information processing apparatus according to the third embodiment.
  • A further figure is a flowchart of the operation of the information processing apparatus according to the third embodiment.
  • FIG. 1 is a diagram showing an example of a system configuration for realizing the information processing apparatus according to the first embodiment.
  • the information processing device 100 can be realized by a computer 10 such as a dedicated computer or a personal computer (PC).
  • The computer does not have to be physically single; a plurality of computers may be used when performing distributed processing.
  • The computer 10 has a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, and a RAM (Random Access Memory) 13, which are connected to each other via a bus 14.
  • Although a description of the OS software for operating the computer is omitted, it is assumed that the computer constituting this information processing apparatus also has an OS.
  • the input / output interface 15 is also connected to the bus 14.
  • Connected to the input/output interface 15 are, for example, an input unit 16 including a keyboard, a mouse, and a sensor; an output unit 17 including a display such as a CRT or an LCD, headphones, and speakers; a storage unit 18 including a hard disk; and a communication unit 19 composed of a modem, a terminal adapter, and the like.
  • The CPU 11 executes various processes according to the programs stored in the ROM 12 or the programs loaded from the storage unit 18 into the RAM 13; in the present embodiment, it executes, for example, the processing of each part of the information processing apparatus 100 described later.
  • A GPU (Graphics Processing Unit) 21 may also be used. Like the CPU 11, the GPU 21 executes various processes according to the programs stored in the ROM 12 or the programs loaded from the storage unit 18 into the RAM 13, and in the present embodiment it may execute the processing of each part of the information processing apparatus 100 described later. The GPU is suited to performing routine processing in parallel, and by applying it to the processing in the neural network described later, the processing speed can be improved compared with the CPU 11.
  • the RAM 13 also appropriately stores data and the like necessary for the CPU 11 and the GPU 21 to execute various processes.
  • the communication unit 19 performs communication processing via the Internet (not shown), transmits data provided by the CPU 11, and outputs data received from the communication partner to the CPU 11, RAM 13, and storage unit 18.
  • The storage unit 18 exchanges data with the CPU 11, and stores and erases information.
  • The communication unit 19 also performs communication processing of analog signals or digital signals with other devices.
  • A drive 20 is also connected to the input/output interface 15 as needed; a magnetic disk 20A, an optical disk 20B, a flexible disk 20C, a semiconductor memory 20D, or the like is mounted on it as appropriate, and a computer program read from them is installed in the storage unit 18 as necessary.
  • In the present embodiment, input vectors are input to a neural network having a non-hierarchical structure in which nodes, each described by an n-dimensional vector (n is an integer of 1 or more), are arranged.
  • the neural network is stored in a storage unit such as a RAM 13.
  • The neural network in the present embodiment is a self-propagating neural network in which input vectors are input to the neural network and the number of nodes arranged in the neural network is automatically increased based on the input vectors.
  • the number of nodes can be increased automatically.
  • the neural network in this embodiment has a non-hierarchical structure.
  • additional learning can be performed without specifying the timing to start learning in other layers. That is, additional learning can be carried out online.
  • the input data is input as an n-dimensional input vector.
  • the input vector is stored in the temporary storage unit (for example, RAM 13), and is sequentially input to the neural network stored in the temporary storage unit.
  • the neural network in this embodiment is also simply referred to as a network.
  • The network on which the load balancing process is performed is a neural network composed of a plurality of nodes and a plurality of edges connecting the nodes, and is created, for example, by an algorithm obtained by removing the load balancing process from the above-mentioned E-SOINN or LB-SOINN.
  • FIGS. 2 and 3 schematically show the functional configuration of the information processing apparatus 100 according to the first embodiment.
  • FIG. 4 shows a flowchart of processing in the information processing apparatus 100 according to the first embodiment.
  • the software and the hardware resources such as one or both of the CPU 11 and the GPU 21 cooperate to realize the functional configuration shown in FIGS. 2 and 3 and each process shown in FIG.
  • the information processing device 100 has a winner node detection unit 1, an edge learning time update unit 2, and a load balancing unit 3, which are the basic components of the information processing device 100.
  • the learning processing unit 4 and the clustering unit 5 shown in FIG. 3 may be provided in the information processing device 100 or may be provided separately from the information processing device 100. Here, it is assumed that the learning processing unit 4 and the clustering unit 5 are provided in the information processing apparatus 100.
  • A network constructed by learning, which is the target of the load balancing process, and a new input vector are input to the information processing apparatus 100. Specifically, the learning processing unit 4 sequentially learns the input vectors and, at a predetermined timing, passes the constructed network and the new input vector to the information processing apparatus 100.
  • the winner node detection unit 1 detects the first winner and the second winner based on the input vector and the node of the neural network.
  • the side learning time updating unit 2 increases the learning time of the side connecting the first winner and the second winner by a predetermined value.
  • The load balancing unit 3 detects edges included in the neural network whose edge learning time is relatively long, and inserts a new node located on each detected edge into the neural network.
  • the information processing apparatus 100 determines whether all the input vectors have been input. If it is determined that all the input vectors have not been input, a new input vector is input to the learning processing unit 4, and the processing is continued. When it is determined that all the input vectors have been input, the clustering unit 5 performs clustering processing of the network after load balancing, the nodes constituting the network are classified, and a series of processing for the input data is completed.
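  • As an illustrative, non-limiting sketch of this overall flow, the processing can be summarized as follows; the method names below (learn_step, detect_winners, update_edge_learning_time, load_balance, cluster) are hypothetical stand-ins for the units described above, not names used in the embodiment.

```python
# Illustrative sketch of the overall flow of FIG. 4.  All method names are
# hypothetical stand-ins for the units described in the text.
def run(device, input_vectors, unit_count):
    for count, xi in enumerate(input_vectors, start=1):
        device.learn_step(xi)                      # step S1: learning processing unit 4
        a1, a2 = device.detect_winners(xi)         # step S2: winner node detection unit 1
        device.update_edge_learning_time(a1, a2)   # step S3: edge learning time update unit 2
        if count % unit_count == 0:                # step S41: input number determination unit 31
            device.load_balance()                  # steps S42 to S45: load balancing unit 3
    device.cluster()                               # step S6: clustering unit 5
```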
  • Step S1: The learning processing unit 4 learns the sequentially input vectors in cooperation with the winner node detection unit 1 to form a neural network that shows the input distribution structure and is composed of nodes and edges connecting the nodes.
  • the learning processing unit 4 sequentially stores the nodes and sides obtained as a result of learning in the temporary storage unit.
  • Step S2: The winner node detection unit 1 refers to the input vector and the nodes stored in the temporary storage unit, detects the node closest to the target input vector ξ as the first winner node a1 and the node second closest to the target input vector ξ as the second winner node a2, and stores the result in the temporary storage unit.
  • the winner node detection unit 1 executes, for example, the processes shown in the following equations [1] and [2] as the detection process, and stores the result in the temporary storage unit.
  • Here, D(ξ, a) is the distance between the input vector ξ and the node a calculated using a predetermined distance scale, and A is the set of nodes.
  • As the distance scale, any scale such as the Euclidean distance, cosine distance, Manhattan distance, or fractional distance can be applied. For example, the Euclidean distance is used when the input data is low-dimensional, and the cosine distance, Manhattan distance, or fractional distance is used when the input data is high-dimensional.
  • The winner node detection unit 1 may be included in the learning processing unit 4; in that case, the winner node detection result obtained in the learning process can be used as it is.
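  • The following is a minimal sketch of the winner node detection in step S2, assuming the Euclidean distance as the distance scale; since Equations [1] and [2] are not reproduced here, the nearest/second-nearest selection below is an assumption consistent with the description.

```python
import numpy as np

def detect_winners(xi, weights):
    """Sketch of step S2: return indices of the first and second winner nodes.

    weights is an array of node weight vectors (one row per node); the
    Euclidean distance is assumed as the distance scale D(xi, a).
    """
    d = np.linalg.norm(np.asarray(weights) - np.asarray(xi), axis=1)
    a1, a2 = np.argsort(d)[:2]   # closest and second-closest node
    return int(a1), int(a2)
```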
  • Step S3: The edge learning time update unit 2 increases, by a predetermined value, the learning time of the edge (also referred to as the winning edge) connecting the first winner node and the second winner node, and stores the result in the temporary storage unit. The edge learning time update unit 2 may further change the learning time of the edges connected to the first winner node and of the edges connected to the second winner node other than the winning edge, and store the result in the temporary storage unit.
  • In that case, it is desirable that the amount of increase of the learning time of the edges other than the winning edge that are connected to the first winner node or the second winner node is smaller than the amount of increase of the winning edge. This amount may also be negative, that is, in the decreasing direction.
  • FIG. 5 shows examples of increasing the learning time. In FIG. 5, the winning node is W, the node connected to the winning node W by the winning edge EW is Na, and the node connected to the winning node W by the non-winning edge E is Nb.
  • Cases 1 and 2 are simple cases in which only the learning time of the winning edge EW is increased, by 1 and by 2, respectively.
  • Case 3 is a case in which the learning times of the winning edge EW and of the edge E other than the winning edge both increase, by 3 and by 1, respectively.
  • Case 4 is a case in which the learning times of the winning edge EW and of the edge E other than the winning edge change in opposite directions (positive and negative): the learning time of the winning edge EW increases by 1, and the learning time of the edge E other than the winning edge decreases by 1. That is, it suffices that the learning time of the winning edge EW is increased relative to the learning time of the edge E other than the winning edge, and, as in Case 4, the learning time of the edge E other than the winning edge may even be decreased.
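  • The following sketch illustrates one possible implementation of the edge learning time update in step S3; the dictionary-based data structures and the default increment values are assumptions for illustration only.

```python
def update_edge_learning_time(edge_time, adjacency, a1, a2,
                              win_increase=1.0, other_increase=0.0):
    """Sketch of step S3 (illustrative data structures).

    edge_time: dict mapping frozenset({i, j}) -> edge learning time.
    adjacency: dict mapping node id -> set of neighbouring node ids.
    The winning edge (a1, a2) is increased by win_increase (the first value);
    other edges touching a1 or a2 may be changed by other_increase (the
    second, smaller value), which may also be negative as in Case 4 of FIG. 5.
    """
    winning = frozenset((a1, a2))
    if winning in edge_time:
        edge_time[winning] += win_increase
    for node in (a1, a2):
        for neighbour in adjacency.get(node, ()):
            edge = frozenset((node, neighbour))
            if edge != winning and edge in edge_time:
                edge_time[edge] += other_increase
```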
  • Step S4: The load balancing unit 3 has an input number determination unit 31, a target edge detection unit 32, an edge node insertion unit 33, a target edge deletion unit 34, and a new node edge connection unit 35, and performs load balancing through steps S41 to S45.
  • Step S41 The input number determination unit 31 determines whether or not the number of inputs of the input vector has reached a predetermined number. If the number of inputs of the input vector reaches a predetermined number, the process proceeds to step S42, and if the number of inputs of the input vector does not reach the predetermined number, the process proceeds to step S5.
  • Here, the above-mentioned input number determination is treated as a part of step S4, as step S41, but the input number determination may be performed separately from step S4.
  • In that case, the input number determination (step S22 in FIG. 29, steps S70 and S72 in FIG. 31) is a process separate from step S4, and in FIGS. 29 and 31 step S4 consists of steps S42 to S45.
  • Step S42: The target edge detection unit 32 refers to the edge learning times of the edges stored in the temporary storage unit, detects edges having a relatively large edge learning time, and stores the result in the temporary storage unit.
  • Hereinafter, an edge detected by the target edge detection unit 32 is referred to as a target edge.
  • The target edge detection unit 32 detects, for example, edges having an edge learning time larger than a predetermined threshold TH1, and stores the result in the temporary storage unit.
  • Although all edges having an edge learning time larger than the threshold TH1 can be set as target edges, only a part of them may be set as target edges. For example, among the edges having an edge learning time larger than the threshold TH1, only a predetermined number of edges may be set as target edges, in descending order of edge learning time.
  • An edge having a relatively large edge learning time may also be detected for each class determined by classifying the nodes and edges.
  • In that case, a threshold TH1 may be prepared for each class and the target edges may be detected from each class. The classification method will be described later in step S6.
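  • The target edge detection in step S42 can be sketched, for example, as follows; the data structures and parameter names (edge_time, th1, max_edges) are hypothetical.

```python
def detect_target_edges(edge_time, th1, max_edges=None):
    """Sketch of step S42: pick edges whose learning time exceeds TH1.

    edge_time maps frozenset({i, j}) -> edge learning time.  If max_edges is
    given, only that many edges are kept, in descending order of edge
    learning time; the per-class variant mentioned in the text would simply
    run this per class.
    """
    targets = [e for e, t in edge_time.items() if t > th1]
    targets.sort(key=lambda e: edge_time[e], reverse=True)
    return targets if max_edges is None else targets[:max_edges]
```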
  • Step S43: The edge node insertion unit 33 generates a new node at a predetermined position on each detected target edge, inserts it into the network, and stores the result in the temporary storage unit.
  • Hereinafter, the predetermined position on the target edge at which the new node is inserted is referred to as the new node insertion position.
  • The new node insertion position may be various positions; for example, it may be the midpoint of the target edge as shown in FIG.
  • Here, the new node insertion position is set to the midpoint of the edge, but the new node insertion position may instead be determined based on the numbers of wins of the two nodes at both ends of the target edge.
  • Let the nodes at both ends of the target edge be N1 and N2, their numbers of wins be VN1 and VN2, respectively, and the length of the target edge be L.
  • In that case, the new node insertion position may be the position separated by L × VN2 / (VN1 + VN2) from the node N1, that is, the position separated by L × VN1 / (VN1 + VN2) from the node N2, in other words the position of the center of gravity based on the numbers of wins.
  • Here, the number of wins of a node is a value indicating the number of times that node has been selected as a winner, the winner being, for example, the first winner.
  • FIG. 8 shows the relationship between the side and the number of wins of the node.
  • Node Ni and node Nj are connected to both ends of the side EA, respectively.
  • The number of wins VNi_A of the node Ni indicates the number of times that the node Ni has become the winner, held for the case where the edge EA becomes the target edge.
  • Similarly, the number of wins VNj_A of the node Nj indicates the number of times that the node Nj has become the winner for the case where the edge EA becomes the target edge. In this way, the number of wins of a node is held for each of the edges connected to that node; when a node is connected to a plurality of edges, it holds a number of wins for the case where each of those edges becomes the target edge.
  • FIG. 9 shows an example in which four nodes are connected in a star shape.
  • The node Ni is connected to the nodes Nj, Nk, and Nm by the edges EA, EB, and EC, respectively.
  • The nodes Nj, Nk, and Nm hold the numbers of wins VNj_A, VNk_B, and VNm_C for the cases where the connected edges EA, EB, and EC become the target edge, respectively.
  • Since the three edges EA, EB, and EC are connected to the central node Ni, the node Ni holds the numbers of wins VNi_A, VNi_B, and VNi_C for each of those edges.
  • For example, when the input data is biased toward the vicinity of the edge EA, the number of wins VNi_A becomes larger than the numbers of wins VNi_B and VNi_C, so such a bias of the input data can be reflected in the numbers of wins. This makes it possible to generate new nodes that are more faithful to the distribution of the input data.
  • Alternatively, a single number of wins unique to each node may be used. In this case, the same node has one number of wins regardless of the target edge, so the processing becomes simpler and the processing load can be suppressed. Needless to say, the number of wins may also take the second winner into consideration.
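  • The center-of-gravity insertion position described above can be computed, for example, as in the following sketch; the NumPy-based representation of node weight vectors is an assumption for illustration.

```python
import numpy as np

def new_node_position(w1, w2, vn1, vn2):
    """Sketch of the insertion position in step S43.

    w1 and w2 are the weight vectors of the nodes N1 and N2 at both ends of
    the target edge, and vn1 and vn2 are their numbers of wins.  The returned
    point lies L * VN2 / (VN1 + VN2) away from N1, i.e. at the center of
    gravity based on the numbers of wins; with vn1 == vn2 it is the midpoint.
    """
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    return (vn1 * w1 + vn2 * w2) / (vn1 + vn2)
```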
  • Step S44: After inserting the new node, the target edge deletion unit 34 deletes the target edge connecting the node N1 and the node N2 from the neural network, as shown in FIG., and stores the result in the temporary storage unit.
  • For example, the target edge deletion unit 34 executes the operation shown in the following Equation [3] on the data stored in the temporary storage unit, and stores the result in the temporary storage unit.
  • Here, C indicates the edge set, and (N1, N2) indicates, for example, the edge connecting the node N1 and the node N2.
  • Step S45: As shown in FIG. 11, the new node edge connection unit 35 generates two edges, each connecting one of the two nodes N1 and N2 that were connected by the deleted target edge to the inserted new node N, and stores the result in the temporary storage unit. For example, the new node edge connection unit 35 executes the operation shown in the following Equation [4] on the data stored in the temporary storage unit, and stores the result in the temporary storage unit.
  • The two edges connected to the new node may take over the learning time of the deleted target edge under a predetermined rule.
  • For example, the learning time of the target edge may be evenly distributed to the learning times of the two newly inserted edges.
  • Alternatively, the learning time of the target edge may be distributed to the learning times of the two newly inserted edges based on the numbers of wins of the two nodes at both ends of the target edge.
  • Let the nodes at both ends of the target edge be N1 and N2, their numbers of wins be VN1 and VN2, respectively, and ST be the learning time of the target edge.
  • In that case, the learning time of the edge E1 connecting the node N1 and the new node may be set to ST × VN1 / (VN1 + VN2), and the learning time of the edge E2 connecting the node N2 and the new node may be set to ST × VN2 / (VN1 + VN2).
  • For example, when ST = 100 and the ratio of the numbers of wins VN1 : VN2 is 1 : 4, the learning time ST1 of the edge E1 connecting the node N1 and the new node is 20, and the learning time ST2 of the edge E2 connecting the node N2 and the new node is 80.
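  • The following sketch combines steps S44 and S45 with the win-count-based inheritance of the edge learning time described above; since Equations [3] and [4] are not reproduced here, the set operations are expressed with an assumed dictionary representation of the edge set. With ST = 100 and VN1 : VN2 = 1 : 4, it yields the learning times 20 and 80 mentioned above.

```python
def split_target_edge(edge_time, n1, n2, new_node, vn1, vn2):
    """Sketch of steps S44 and S45 with the win-count-based inheritance rule.

    The target edge (n1, n2) is removed from the edge set (cf. Equation [3]),
    the edges (n1, new_node) and (n2, new_node) are added (cf. Equation [4]),
    and the learning time ST of the deleted edge is split in the ratio
    VN1 : VN2.  edge_time maps frozenset({i, j}) -> edge learning time.
    """
    st = edge_time.pop(frozenset((n1, n2)))                        # step S44
    edge_time[frozenset((n1, new_node))] = st * vn1 / (vn1 + vn2)  # edge E1
    edge_time[frozenset((n2, new_node))] = st * vn2 / (vn1 + vn2)  # edge E2
```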
  • Step S5 After that, for example, the clustering unit 5 determines whether the processing of the input vector is completed, that is, whether all the input vectors have been input. If the total number of given input vectors ⁇ stored in the temporary storage unit does not match the total number of all input vectors, the process returns to step S1 and processes the next input vector ⁇ . On the other hand, when the total number of input vectors ⁇ matches the total number of all input vectors, the following step S6 is executed.
  • the method of determining the end is not limited to this, and an instruction to end may be given by the user.
  • Step S6 The clustering unit 5 refers to the load-balanced nodes and edges that have been processed and stored in the temporary storage unit, and classifies them into classes.
  • Various classification methods can be applied to the classification of the nodes constituting the network, and for example, the same processing as LB-SOINN of Non-Patent Document 2 may be performed.
  • In the above description, step S6 is performed when it is determined in step S5 that the processing is completed, but this is merely an example, and it may be performed at any other timing. For example, it may be executed after the load balancing in step S4 is performed, or it may be executed when the total number of inputs of the input vector becomes an integer multiple of a predetermined number of units different from the predetermined number of units used in step S41. If the nodes and edges are classified at the time of load balancing, load balancing can also be performed for each class as described above.
  • FIG. 15 shows the learning result of the input data by the algorithm excluding the load balancing from the LB-SOINN of Patent Document 2. In this case, the two clusters cannot be distinguished because the load balancing is not performed.
  • FIG. 16 shows the learning result of the input data by LB-SOINN of Patent Document 2.
  • the load balancing of LB-SOINN creates nodes between the clusters and causes them to be excessively crowded, making it impossible to distinguish between the two clusters.
  • FIG. 17 shows the processing result of the input data to which the load balancing according to the first embodiment is applied.
  • In the first embodiment, load balancing is performed based on edges having a relatively large edge learning time, so node generation on edges with a small learning time, that is, between the clusters, is suppressed, and the distribution of the input vectors is reflected more accurately. The generation of edges connecting nodes inside a cluster with nodes between the clusters is also suppressed, and as a result the number of edges connecting the clusters is reduced. That is, it can be understood that the information processing apparatus 100 according to the first embodiment can express the distribution of the input vectors more accurately.
  • Embodiment 2: The load balancing according to the second embodiment will be described.
  • In the first embodiment, a new node is added on an edge having a relatively large edge learning time to perform load balancing.
  • In contrast, in the second embodiment, a new node is added on an edge for which both the edge learning time and the length are relatively large to perform load balancing.
  • FIG. 20 shows the learning result of the input data by the algorithm obtained by removing load balancing from the LB-SOINN of Patent Document 2.
  • In this case, since load balancing is not performed, the density of the input vectors cannot be expressed by the nodes: the nodes exist uniformly in both the central part and the surrounding annular part, and the clusters are connected by many edges. As a result, it can be understood that the distribution of the input vectors cannot be accurately expressed.
  • FIG. 21 shows the learning result of the input data by LB-SOINN of Patent Document 2.
  • In this case, although the density of the nodes in the center is expressed, it is too dense, and the clusters are connected by many edges due to the influence of noise, so it can be understood that the clusters cannot be separated.
  • FIG. 22 shows the processing result of the input data to which the load balancing according to the first embodiment is applied.
  • In this case, the density of the nodes in the central portion is suppressed.
  • This is because the load balancing according to the first embodiment is performed based on edges whose edge learning time is relatively large, so node generation on edges with a small learning time is suppressed and the sparseness of the nodes can be expressed.
  • However, nodes are still generated excessively in the central portion, and because of the large number of those nodes there is a high probability that an edge connecting the annular portion and the central portion will be generated due to noise or the like. As a result, about 10 edges connecting the clusters remain.
  • This is because, in the load balancing according to the first embodiment, the edge on which a new node is generated is deleted and edges shorter than the deleted edge are generated. Since the edge learning time tends to increase where the distribution density of the input vectors is high, as the learning of the input vectors and the load balancing process progress, the learning time of the edges generated by load balancing tends to become large again, especially where the distribution density of the input vectors is high, and further load balancing is executed there. Load balancing then creates a new node on a previously generated short edge and creates an even shorter edge. As a result, as shown in FIG. 22, the nodes become densely packed in the central part.
  • FIG. 23 shows the distribution of the input vector.
  • the horizontal axis of FIG. 23 indicates any component of the input vector.
  • In the distribution shown in FIG. 23, a subcluster SC1 represented by a large peak on the left side and a subcluster SC2 represented by a small peak on the right side exist within one cluster.
  • When the load balancing described in the first embodiment is applied to the nodes and edges generated by learning input vectors distributed in this way, load balancing is performed centered on the subcluster SC1 having the highest node density (the broken-line portion).
  • FIG. 24 shows the node distribution after applying the load balancing described in the first embodiment.
  • Next, the target edge detection (step S42) in the load balancing according to the second embodiment will be described.
  • The target edge detection unit 32 refers to the edge learning times of the edges stored in the temporary storage unit, detects edges having a relatively large edge learning time, and stores the result in the temporary storage unit. The target edge detection unit 32 then refers to the edges having a relatively large edge learning time stored in the temporary storage unit and detects, among them, edges having a relatively long length.
  • In the second embodiment, an edge detected by the target edge detection unit 32, that is, an edge having both a relatively large edge learning time and a relatively long length, is referred to as a target edge.
  • The target edge detection unit 32 detects, for example, edges having an edge learning time larger than a predetermined threshold TH1, and stores the result in the temporary storage unit.
  • The target edge detection unit 32 then detects, for example, edges longer than a threshold LTH among the edges having an edge learning time larger than the predetermined threshold TH1, and stores the result in the temporary storage unit.
  • An edge having a relatively large edge learning time and a relatively long length may also be detected for each class determined by classifying the nodes and edges.
  • In that case, a threshold TH1 and a threshold LTH may be prepared for each class, and the target edges may be detected from each class.
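  • The target edge detection of the second embodiment can be sketched, for example, as follows; as in the earlier sketch, the data structures and the use of the Euclidean distance for the edge length are assumptions for illustration.

```python
import numpy as np

def detect_target_edges_e2(edge_time, weights, th1, lth, max_edges=None):
    """Sketch of the target edge detection in the second embodiment.

    An edge is a target edge only if its learning time exceeds TH1 and its
    length exceeds LTH.  edge_time maps frozenset({i, j}) -> learning time
    and weights maps node id -> weight vector; Euclidean length is assumed.
    """
    def length(edge):
        i, j = tuple(edge)
        return float(np.linalg.norm(np.asarray(weights[i]) - np.asarray(weights[j])))

    targets = [e for e, t in edge_time.items() if t > th1 and length(e) > lth]
    targets.sort(key=lambda e: edge_time[e], reverse=True)
    return targets if max_edges is None else targets[:max_edges]
```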
  • FIG. 25 shows an image of the load balancing according to the second embodiment.
  • In the load balancing according to the first embodiment, when detecting the edges to be the target of load balancing, only the edge learning time is referred to and the length of the edge is not referred to. Therefore, as shown in FIG. 25, edges whose edge learning time is larger than the threshold but whose length is smaller than the threshold are also detected as target edges (region A1 + A2 in FIG. 25). As a result, the number of target edges increases, resulting in excessive crowding of new nodes.
  • FIG. 26 shows the node distribution after applying the load balancing according to the second embodiment under the input vector distribution of FIG. 23.
  • FIG. 27 shows the processing result of the input data to which the load balancing according to the second embodiment is applied. As shown in FIG. 26, the load balancing process in the left subcluster, which contains many short edges, is suppressed, and excessive crowding of new nodes can be prevented.
  • In this case, the density of the nodes in the central portion is significantly suppressed, and the unevenness of the node density is alleviated. Moreover, it can be understood that the number of edges between the clusters is also significantly reduced and that the distribution of the input data can be learned more accurately.
  • Embodiment 3: In the third embodiment, as a specific example of processing for performing the load balancing according to the first embodiment, an example will be described in which the load balancing of the LB-SOINN of Patent Document 2 is replaced with the load balancing according to the first embodiment.
  • In the following, the term "learning" is used for a series of processes including not only the process of constructing the neural network but also the load balancing process.
  • FIG. 28 schematically shows the configuration of the information processing apparatus 300 according to the third embodiment.
  • The learning processing unit 4 has an input information acquisition unit 41, a node density update determination unit 42, a node density calculation unit 43, a node insertion determination unit 44, a node insertion unit 45, an edge connection determination unit 46, an edge connection unit 47, a winner node learning time calculation unit 48, a weight vector update unit 49, and an old-age edge deletion unit 50.
  • the clustering unit 5 includes a subcluster determination unit 51, a noise node deletion unit 52, a learning end determination unit 53, and a class determination unit 54. Further, the information processing apparatus 300 further includes an output information display unit 6.
  • the method described in ⁇ 3: Framework for combining new distance scales> in Patent Document 2 is used.
  • the distance scale shown in the formula 14 of Patent Document 2 is used, and more specifically, the formula 17 is used.
  • a minimum distance value and a maximum distance value between nodes used for normalization of each distance scale are required.
  • the minimum and maximum distance values between nodes change when a new input vector is input to the network, so this point should also be taken into consideration. The method of consideration will be described later.
  • The method described in <5: Definition of new node density and its calculation method> of Patent Document 2 is used for the node density. Specifically, the vector di of the average distance from adjacent nodes for the node i described in Equation 23, the vector pi of the point value of the node density described in Equation 24, the vector si of the cumulative point value of the node density of the node i described in Equation 25, and the node density hi described in Equation 26 are used.
  • FIG. 29 shows a flowchart of the operation of the information processing apparatus 300 according to the third embodiment.
  • Step S11: The input information acquisition unit 41 acquires an n-dimensional input vector as information given as input to the information processing apparatus 300, stores the acquired input vector in a temporary storage unit (for example, the RAM 13), and sequentially inputs it to the neural network. Specifically, as the initialization process, the input information acquisition unit 41 initializes the node set A as an empty set and the edge set C ⊂ A × A as an empty set, and stores the result in the temporary storage unit. Further, as a quasi-initialization process, when the number of nodes included in the node set A is one or less, input vectors are randomly acquired so that the number of nodes becomes two, the corresponding nodes are added to the node set stored in the temporary storage unit (for example, the RAM 13), and the result is stored in the temporary storage unit. Then, as the input process, a new input vector ξ ∈ R^n is input, and the result is stored in the temporary storage unit.
  • The initialization process is executed once, only immediately after the processing is started, and is not executed thereafter.
  • The quasi-initialization process is executed only when the number of nodes included in the node set A is one or less, and is not executed in other cases. For example, if it is not the first input and there are already two or more nodes in the node set A, only the input process is executed.
  • Step S12: The node density update determination unit 42 checks, with respect to the nodes and the minimum and maximum distance values between nodes based on each distance scale stored in the temporary storage unit, whether or not at least one of the minimum distance value and the maximum distance value between nodes based on each distance scale has changed. If at least one value has changed, it determines that the node density is to be updated, and stores the result in the temporary storage unit. As mentioned above, this process and the process of step S13 take into consideration the fact that the minimum and maximum distance values between the nodes change when a new input vector is input to the network.
  • Step S13: When the node density is to be updated as a result of the determination stored in the temporary storage unit, the node density calculation unit 43 refers to the nodes, the vectors of the cumulative point values of the node density, and the learning times of the nodes stored in the temporary storage unit, recalculates and updates the vector si of the cumulative point value of the node density for each node i ∈ A included in the node set A, recalculates the node density hi of each node i using the updated vector si of the cumulative point value of the node density, and stores the result in the temporary storage unit.
  • For example, the node density calculation unit 43 recalculates and updates the vector si of the cumulative point value of the node density of the node i and the node density hi of the node i by executing the calculation processes shown in Equations (27) to (30) and Equation (26) described in Patent Document 2.
  • Step S2: When it is determined in step S12 that the node density is not to be updated, or after the node density has been calculated in step S13, the winner node detection unit 1 detects the first winner node and the second winner node in the same manner as in the information processing apparatus 100 according to the first embodiment, and stores the result in the temporary storage unit.
  • Step S14: The node insertion determination unit 44 determines whether or not to execute node insertion by referring to the target input vector, the nodes, and the similarity thresholds of the nodes (described later) stored in the temporary storage unit.
  • Specifically, the node insertion determination unit 44 refers to the nodes, including the first winner node and the second winner node, stored in the temporary storage unit, and calculates the similarity threshold Ti, regarding the first winner node or the second winner node as the node i of interest. First, the node insertion determination unit 44 determines whether or not an adjacent node of the node i exists, and stores the result in the temporary storage unit.
  • When an adjacent node of the node i exists as a result of the determination stored in the temporary storage unit, the node insertion determination unit 44 calculates, as the similarity threshold Ti, the distance from the node i to the adjacent node j at the maximum distance among the adjacent nodes j, as shown in Equation [5], and stores the result in the temporary storage unit.
  • Here, D(i, j) is the distance between the node i and the node j calculated using the above-mentioned distance scale.
  • When no adjacent node of the node i exists, the node insertion determination unit 44 calculates, as the similarity threshold Ti, the distance from the node i to the node j at the minimum distance among the nodes j other than the node i, as shown in Equation [6], and stores the result in the temporary storage unit.
  • In this way, the node insertion determination unit 44 calculates the similarity threshold Ta1 of the first winner node a1 and the similarity threshold Ta2 of the second winner node a2, and stores the results in the temporary storage unit.
  • When the distance D(ξ, a1) between the input vector ξ and the first winner node a1 is larger than the similarity threshold Ta1 of the first winner node a1 (D(ξ, a1) > Ta1), or the distance D(ξ, a2) between the input vector ξ and the second winner node a2 is larger than the similarity threshold Ta2 of the second winner node a2 (D(ξ, a2) > Ta2), the node insertion determination unit 44 determines that node insertion is to be executed; otherwise, it determines that node insertion is not to be executed, and stores the result in the temporary storage unit.
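  • The similarity thresholds of Equations [5] and [6] and the insertion decision of step S14 can be sketched as follows; the Euclidean distance and the dictionary-based structures are assumptions, since the equations themselves are not reproduced here.

```python
import numpy as np

def similarity_threshold(i, weights, adjacency):
    """Sketch of the similarity threshold Ti (Equations [5] and [6]):
    the distance to the farthest adjacent node if node i has neighbours,
    otherwise the distance to the nearest other node (Euclidean assumed)."""
    wi = np.asarray(weights[i], dtype=float)
    neighbours = adjacency.get(i, set())
    if neighbours:
        return max(np.linalg.norm(wi - np.asarray(weights[j])) for j in neighbours)
    return min(np.linalg.norm(wi - np.asarray(weights[j])) for j in weights if j != i)

def should_insert_node(xi, a1, a2, weights, adjacency):
    """Sketch of the decision in step S14: the input vector becomes a new node
    when it lies farther from either winner than that winner's threshold."""
    xi = np.asarray(xi, dtype=float)
    d1 = np.linalg.norm(xi - np.asarray(weights[a1]))
    d2 = np.linalg.norm(xi - np.asarray(weights[a2]))
    return (d1 > similarity_threshold(a1, weights, adjacency) or
            d2 > similarity_threshold(a2, weights, adjacency))
```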
  • Step S15: When it is determined in step S14 that node insertion is to be executed, the node insertion unit 45 refers to the determination result of the node insertion determination unit 44 stored in the temporary storage unit, regards the target input vector ξ as a node that should be newly added to the network, generates an insertion node having the same components as the components of the target input vector ξ as its weight vector, inserts the generated insertion node into the network, and stores the result in the temporary storage unit. After that, the process proceeds to step S5.
  • Step S16: When it is determined in step S14 that node insertion is not to be executed, the edge connection determination unit 46 refers to the nodes and the subcluster labels of the nodes stored in the temporary storage unit, determines the subclusters to which the first winner node and the second winner node belong based on the subcluster labels of the nodes, and stores the result in the temporary storage unit.
  • the subcluster label of a node means label information indicating the subcluster to which the node belongs.
  • a cluster is a set of nodes included in a mixed class that are connected by edges.
  • a subcluster is a subset of a cluster of nodes with the same subcluster label.
  • When the first winner node and the second winner node belong to the same subcluster, or when at least one of them does not belong to any subcluster, the edge connection determination unit 46 determines that an edge is to be connected between the first winner node and the second winner node, and stores the result in the temporary storage unit.
  • On the other hand, when the first winner node and the second winner node belong to different subclusters, the edge connection determination unit 46 determines whether or not at least one of the node density condition for the first winner node based on the average node density of the subcluster including the first winner node (Equation [7] below) and the node density condition for the second winner node based on the average node density of the subcluster including the second winner node (Equation [8] below) is satisfied, and stores the result in the temporary storage unit.
  • Here, ha1 indicates the node density of the first winner node, ha2 indicates the node density of the second winner node, and min(ha1, ha2) indicates the smaller of the node density ha1 of the first winner node and the node density ha2 of the second winner node.
  • h SC1 indicates the node density of the node having the highest node density among the nodes included in the subcluster SC1 to which the first winner node a1 belongs.
  • h SC2 indicates the node density of the node having the highest node density among the nodes included in the subcluster SC2 to which the second winner node a2 belongs.
  • hm SC1 indicates the average node density of all the nodes included in the subcluster SC1.
  • hm SC2 indicates the average node density of all the nodes included in the subcluster SC2.
  • The equations also contain a parameter for which an appropriate value is determined and set in advance by the user, within the range of [1, 2]. This parameter is a tolerance factor used to determine how large a difference between the subclusters contained in one class is tolerated; the tolerance decreases as the parameter increases.
  • The node density condition for the first winner node a1 shown in Equation [7] determines whether or not the smaller of the node density ha1 of the first winner node a1 and the node density ha2 of the second winner node a2 is larger than a threshold calculated, based on the average node density hmSC1 of the subcluster SC1 including the first winner node a1, according to the ratio of the maximum node density hSC1 to the average node density hmSC1 of the subcluster SC1.
  • Similarly, the node density condition for the second winner node a2 shown in Equation [8] determines whether or not the smaller of the node density ha1 of the first winner node a1 and the node density ha2 of the second winner node a2 is larger than a threshold calculated, based on the average node density hmSC2 of the subcluster SC2 including the second winner node a2, according to the ratio of the maximum node density hSC2 to the average node density hmSC2 of the subcluster SC2.
  • When at least one of the conditions of Equations [7] and [8] is satisfied, the edge connection determination unit 46 determines that an edge is to be connected between the first winner node and the second winner node; otherwise, it determines that an edge is not to be connected between the first winner node and the second winner node, and stores the result in the temporary storage unit. If it is determined that the edge is not to be connected, the edge connection unit 47 does not connect an edge between the first winner node and the second winner node (and deletes the edge if one exists between them), stores the result in the temporary storage unit, and the process proceeds to step S20.
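  • The edge connection decision for winners belonging to different subclusters can be sketched as follows; because Equations [7] and [8] are not reproduced here, the threshold computed from the maximum and average node densities of a subcluster is passed in as a stand-in callable.

```python
def edges_should_connect(h_a1, h_a2, h_sc1, hm_sc1, h_sc2, hm_sc2, threshold):
    """Sketch of the check in step S16 for winners in different subclusters.

    `threshold` is a stand-in callable that computes a threshold from the
    maximum node density h_SC and the average node density hm_SC of a
    subcluster.  The edge is connected when the smaller of the two winner
    densities exceeds at least one of the two thresholds.
    """
    h_min = min(h_a1, h_a2)
    condition_1 = h_min > threshold(h_sc1, hm_sc1)   # node density condition [7]
    condition_2 = h_min > threshold(h_sc2, hm_sc2)   # node density condition [8]
    return condition_1 or condition_2
```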
  • Step S17: When it is determined as a result of step S16, stored in the temporary storage unit, that the edge is to be connected, the edge connection unit 47 connects an edge between the first winner node and the second winner node, and stores the result in the temporary storage unit. If an edge already exists between the first winner node and the second winner node, that edge is maintained. Further, the edge connection unit 47 sets the age of the edge determined to be connected in the above process to 0, and stores the result in the temporary storage unit.
  • Step S18: The node density calculation unit 43 refers to the nodes, the minimum and maximum inter-node distance values based on each distance scale, the vector of the average distance of each node from its adjacent nodes, the vector of the point values of the node density, the vector of the cumulative point values of the node density, and the node density, all stored in the temporary storage unit. Taking the first winner node a1 as node i, it calculates, based on each distance scale, the vector d_i of the average distance of node i from its adjacent nodes; based on the calculated d_i, it calculates the vector p_i of the point values of the node density of the first winner node a1; based on the calculated p_i, it calculates the vector s_i of the cumulative point values of the node density; and, based on s_i, it calculates the node density h_i of the first winner node a1. The results are stored in the temporary storage unit.
  • The node density calculation unit 43 can calculate the vector s_i of the cumulative point values of the node density of node i and the node density h_i of node i by executing, for example, the calculation processing represented by equations (24) to (26) described in Patent Document 2 on the values stored in the temporary storage unit. A sketch of this bookkeeping is shown below.
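A sketch of the per-winner density bookkeeping described above. Equations (24) to (26) of Patent Document 2 are not reproduced; the point-value formula below (a decreasing function of the mean distance to adjacent nodes) and the normalization by learning time are assumptions used only to show the data flow. The text maintains these quantities as vectors with one component per distance scale; for brevity the sketch tracks a single scalar component.

```python
from dataclasses import dataclass

@dataclass
class NodeDensity:
    cumulative_points: float = 0.0   # s_i: cumulative point value of the node density
    learning_time: float = 0.0       # number of times the node has been the first winner
    density: float = 0.0             # h_i

def update_density(stats: NodeDensity, winner_w, neighbor_ws) -> None:
    """Update the density bookkeeping for the first winner (node i)."""
    # d_i: mean distance from node i to its adjacent nodes (Euclidean here, as an example)
    dists = [sum((a - b) ** 2 for a, b in zip(winner_w, w)) ** 0.5 for w in neighbor_ws]
    d_mean = sum(dists) / len(dists) if dists else 0.0
    # p_i: point value, larger when the winner sits in a crowded region (assumed form)
    p_i = 1.0 / ((1.0 + d_mean) ** 2)
    stats.cumulative_points += p_i                                  # accumulate s_i
    stats.learning_time += 1.0
    stats.density = stats.cumulative_points / stats.learning_time   # h_i (assumed normalization)

stats = NodeDensity()
update_density(stats, winner_w=[0.0, 0.0], neighbor_ws=[[1.0, 0.0], [0.0, 2.0]])
print(stats.density)
```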
  • Step S19 The winner node learning time calculation unit 48 increases the learning time Ma1 of the first winner node a1 stored in the temporary storage unit by a predetermined value, and stores the result in the temporary storage unit.
  • Step S20: The weight vector update unit 49 updates the weight vectors of the first winner node a1 and of its adjacent nodes, stored in the temporary storage unit, so that each becomes closer to the input vector ε, and stores the result in the temporary storage unit.
  • Specifically, using, for example, equations (33) and (34) described in Patent Document 2, the weight vector update unit 49 calculates, based on the learning time M_a1, the update amount ΔW_a1 for the weight vector W_a1 of the first winner node a1 and the update amount ΔW_j for the weight vector of each adjacent node j of the first winner node a1. It then adds ΔW_a1 to the weight vector W_a1 of the first winner node a1, adds ΔW_j to the weight vector of the adjacent node j, and stores the result in the temporary storage unit. A sketch of this update is shown below.
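A sketch of the step S20 update. Equations (33) and (34) of Patent Document 2 are not reproduced; the learning rates 1/M_a1 for the winner and 1/(100·M_a1) for its neighbors are a common SOINN-style choice and are assumptions here, not the patent's exact formulas.

```python
import numpy as np

def update_weights(eps, w_a1, neighbor_ws, m_a1):
    """Move the first winner and its adjacent nodes toward the input vector eps."""
    eps = np.asarray(eps, dtype=float)
    w_a1 = np.asarray(w_a1, dtype=float)
    delta_a1 = (1.0 / m_a1) * (eps - w_a1)              # assumed learning rate for the winner
    new_w_a1 = w_a1 + delta_a1
    new_neighbors = []
    for w_j in neighbor_ws:
        w_j = np.asarray(w_j, dtype=float)
        delta_j = (1.0 / (100.0 * m_a1)) * (eps - w_j)  # assumed smaller rate for neighbors
        new_neighbors.append(w_j + delta_j)
    return new_w_a1, new_neighbors

w1, nbrs = update_weights(eps=[1.0, 1.0], w_a1=[0.0, 0.0], neighbor_ws=[[2.0, 0.0]], m_a1=4)
print(w1, nbrs)
```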
  • Step S21: The old-age side deletion unit 50 refers to the nodes, the sides between the nodes, and the ages of the sides stored in the temporary storage unit, increases by a predetermined value the ages of all the sides directly connected to the first winner node, and stores the result in the temporary storage unit. Further, the old-age side deletion unit 50 deletes any side whose age exceeds a preset threshold value stored in the temporary storage unit, and stores the result in the temporary storage unit. A sketch of this aging and pruning is shown below.
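A sketch of step S21: all sides (edges) incident to the first winner are aged, and sides whose age exceeds a preset threshold are removed. The dictionary keyed by node pairs is an implementation choice, not taken from the patent, and the numeric values are illustrative.

```python
def age_and_prune_edges(edges, first_winner, age_increment=1, age_max=50):
    """edges: dict mapping frozenset({u, v}) -> age. Returns the pruned edge dict."""
    for key in edges:
        if first_winner in key:          # side directly connected to the first winner
            edges[key] += age_increment  # increase its age by a predetermined value
    # delete sides whose age exceeds the preset threshold
    return {key: age for key, age in edges.items() if age <= age_max}

edges = {frozenset({"a1", "b"}): 49, frozenset({"a1", "c"}): 50, frozenset({"b", "c"}): 10}
print(age_and_prune_edges(edges, "a1"))  # the a1-c side ages past the threshold and is removed
```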
  • Step S3 The side learning time update unit 2 updates the side learning time as in the first embodiment. Since the details of the edge learning time update are the same as those in the first embodiment, the description thereof will be omitted. However, if the winning edge does not exist, the edge learning time is not updated.
  • Step S22: As in step S41 in the information processing apparatus 100 according to the first embodiment, the input number determination unit 31 determines whether the total number of input vectors given so far, stored in the temporary storage unit, is a multiple of a predetermined number of units that is preset and stored in the temporary storage unit, and stores the result in the temporary storage unit. If the number of inputs of the input vector is a multiple of the predetermined number of units, the process proceeds to step S4; if it is not, the process proceeds to step S5.
  • Step S4 Subsequent load equilibration is the same as that of the first embodiment, and thus the description thereof will be omitted. However, since the process corresponding to step S41 is executed in step S22 above, the process to be executed here is the process of steps S42 to S45.
  • Step S51: After the load balancing, the subcluster determination unit 51 refers to the nodes, the sides between the nodes, the subcluster labels of the nodes, the node densities, and the Voronoi regions stored in the temporary storage unit. It assigns a different subcluster label to each vertex, that is, to each node whose node density is a local maximum, and assigns to every node to which no subcluster label has been assigned the same subcluster label as its adjacent node with the highest node density. A Voronoi region is then generated based on the vertices whose node density is larger than a predetermined threshold, and when, in a generated Voronoi region, a vertex other than the reference vertex is included, the subcluster label of the subcluster containing the reference vertex is assigned as the subcluster label of the subcluster containing that other vertex. The result is stored in the temporary storage unit.
  • The subcluster determination unit 51 can determine the subclusters, for example, by performing the same processing as in steps S201 to S205 and S301 to S305 in Patent Document 2. A sketch of the label assignment is shown below.
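A sketch of the label-assignment part of step S51: nodes whose density is a local maximum among their neighbors ("vertices") each seed a subcluster, and the remaining nodes take the label of their densest adjacent node. The Voronoi-based merging and the density threshold are omitted, and the propagation order used here is an assumption.

```python
def assign_subcluster_labels(density, neighbors):
    """density: {node: h}, neighbors: {node: set of adjacent nodes}. Returns {node: label}."""
    labels = {}
    # 1) every local density maximum (a 'vertex') gets its own subcluster label
    for n, h in density.items():
        if all(h >= density[m] for m in neighbors[n]):
            labels[n] = len(labels)
    # 2) unlabeled nodes follow their densest adjacent node, repeated until labels stabilize
    changed = True
    while changed:
        changed = False
        for n in density:
            if n in labels or not neighbors[n]:
                continue
            densest = max(neighbors[n], key=lambda m: density[m])
            if densest in labels:
                labels[n] = labels[densest]
                changed = True
    return labels

density = {"a": 1.0, "b": 0.7, "c": 0.4, "d": 0.9}
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(assign_subcluster_labels(density, neighbors))  # a and d seed labels; b joins a, c joins d
```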
  • Step S52: The noise node deletion unit 52 deletes, from among all the nodes a included in the node set A stored in the temporary storage unit, the nodes regarded as noise nodes, and stores the result in the temporary storage unit.
  • For example, the noise node deletion unit 52 executes the processing shown in steps S601 to S604 in Patent Document 2 with respect to the nodes, the sides between the nodes, the numbers of adjacent nodes, and the node densities stored in the temporary storage unit, deletes the node of interest based on the number of adjacent nodes and the node density of the node a of interest, and stores the result in the temporary storage unit. A sketch of such pruning is shown below.
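A sketch of this style of pruning: a node with few adjacent nodes and a density below a neighbour-count-dependent fraction of the mean density is treated as noise. Steps S601 to S604 of Patent Document 2 are not reproduced; the coefficients c1 and c2 and the specific rules below are assumptions for illustration.

```python
def remove_noise_nodes(density, neighbors, c1=0.5, c2=0.5):
    """Return the set of nodes kept after noise deletion.
    density: {node: h}, neighbors: {node: set of adjacent nodes}."""
    mean_h = sum(density.values()) / len(density)
    kept = set()
    for n, h in density.items():
        deg = len(neighbors[n])
        if deg == 0:                        # isolated node: treat as noise
            continue
        if deg == 1 and h < c1 * mean_h:    # one neighbour and low density: noise (assumed rule)
            continue
        if deg == 2 and h < c2 * mean_h:    # two neighbours and low density: noise (assumed rule)
            continue
        kept.add(n)
    return kept

density = {"a": 1.2, "b": 0.1, "c": 0.9}
neighbors = {"a": {"c"}, "b": set(), "c": {"a"}}
print(remove_noise_nodes(density, neighbors))  # the isolated node b is removed
```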
  • Step S5 The learning end determination unit 53 determines whether or not to end the learning process by the information processing apparatus 300, as in step S5 in the information processing apparatus 100 according to the first embodiment. If it is determined that the process is not completed, the process returns to step S11 and the next input vector ⁇ is processed. On the other hand, if it is determined to end, the process proceeds to step S53.
  • Step S53: The class determination unit 54 refers to the nodes, the sides between the nodes, and the classes of the nodes stored in the temporary storage unit, determines the class to which each node belongs based on the sides generated between the nodes, and stores the result in the temporary storage unit.
  • the class determination unit 54 may determine the class by performing the same processing as in steps S701 to S704 in Patent Document 2, for example.
  • Then, for the nodes and the classes of the nodes stored in the temporary storage unit, the output information display unit 6 may output the number of classes to which the nodes belong and the prototype vector of each class. After completing the above processing, learning is stopped.
  • the load balancing according to the first embodiment can be applied to accurately learn the structure of the input vector.
  • the load equilibration according to the second embodiment may be applied instead of the load equilibration according to the first embodiment.
  • Embodiment 4: In the third embodiment, as a specific example of processing for performing the load equilibration according to the first embodiment, an example in which the load equilibration of LB-SOINN in Patent Document 2 is replaced with the load equilibration according to the first embodiment was described. In contrast, in the fourth embodiment, another specific example of processing for performing the load equilibration according to the first embodiment will be described. In the fourth embodiment, the term "learning" is used for a series of processes including not only the neural network configuration processing but also the load equilibration processing.
  • FIG. 30 schematically shows the configuration of the information processing apparatus 400 according to the fourth embodiment.
  • the information processing apparatus 400 has a configuration in which the learning processing unit 4 of the information processing apparatus 300 according to the third embodiment is replaced with the learning processing unit 7, and the clustering unit 5 is replaced with the clustering unit 8.
  • The learning processing unit 7 includes some of the components of the learning processing unit 4; specifically, it has an input information acquisition unit 41, a node insertion determination unit 44, a node insertion unit 45, an edge connection unit 47, a winner node learning time calculation unit 48, a weight vector update unit 49, and an old-age side deletion unit 50.
  • the clustering unit 8 includes a part of the components of the clustering unit 5, and specifically includes a noise node deletion unit 52, a learning end determination unit 53, and a class determination unit 54.
  • FIG. 31 shows a flowchart of the operation of the information processing apparatus 400 according to the fourth embodiment.
  • Step S11 Since step S11 is the same as that of the third embodiment (FIG. 29), the description thereof will be omitted.
  • Step S2 Based on the input vector input in step S11, the winner node detection unit 1 detects the first winner node and the second winner node as in the case of the information processing apparatus 100 according to the first embodiment (FIG. 4). Then, the result is stored in the temporary storage unit.
  • Step S14 Since step S14 is the same as that of the third embodiment (FIG. 29), the description thereof will be omitted. If it is determined that the node is not inserted, the process proceeds to step S17.
  • Step S15 Similar to the third embodiment (FIG. 29), when it is determined in step S14 that the node is to be inserted, the node is inserted. The difference from the third embodiment is that the process proceeds to step S17.
  • Step S17: When it is determined in step S14 that the node is not to be inserted, or after step S15, that is, regardless of the determination result of the node insertion determination in step S14, the edge connection unit 47 performs node connection in the same manner as in the third embodiment (FIG. 29). Specifically, the edge connection unit 47 connects an edge between the first winner node and the second winner node, and stores the result in the temporary storage unit. If an edge already exists between the first winner node and the second winner node, that edge is maintained. Further, the edge connection unit 47 sets the age of the connected or maintained edge to 0, and stores the result in the temporary storage unit. A sketch of this step is shown below.
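A sketch contrasting the step S17 behaviour of this embodiment with that of the third embodiment: here the edge between the two winners is always connected (or kept) and its age is reset to 0, with no density-based connection test. The edge-dictionary representation is an implementation choice.

```python
def connect_winners(edges, a1, a2):
    """Always connect (or keep) the edge between the first and second winner and reset its age."""
    key = frozenset({a1, a2})
    edges[key] = 0   # newly connected edge, or an existing edge whose age is reset to 0
    return edges

edges = {frozenset({"a1", "a2"}): 7}
print(connect_winners(edges, "a1", "a2"))   # the age of the maintained edge becomes 0
```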
  • Steps S19 to S21, S3 Since steps S19 to S21 and S3 are the same as those in the third embodiment (FIG. 29), the description thereof will be omitted.
  • Step S70: As in step S41 (FIG. 4) in the information processing apparatus 100 according to the first embodiment, the input number determination unit 31 determines whether the total number of input vectors given so far, stored in the temporary storage unit, is a multiple of a predetermined number of units that is preset and stored in the temporary storage unit, and stores the result in the temporary storage unit.
  • In the present embodiment, two predetermined numbers of units are used.
  • The predetermined number of units used in step S70 is referred to as the first unit number (λ1).
  • In the present embodiment, the first unit number (λ1) is set to a fixed value, but the present invention is not limited to this; for example, the first unit number (λ1) may be changed appropriately according to the number of inputs of the input vector.
  • Step S70 defines how frequently the class determination in the next step S71 is executed.
  • While the number of inputs is still small, the classes are often not yet stable, so making λ1 small allows the classes to be grasped accurately.
  • Once the number of inputs becomes large, the classes often become stable, so making λ1 large reduces unnecessary processing.
  • Step S71: When the total number of inputs of the input vector is an integral multiple of the first unit number (λ1), the classes are determined in the same manner as in step S53 of the third embodiment (FIG. 29). Since the details of the processing are the same as those of step S53, the description thereof is omitted.
  • Step S72: When it is determined in step S70 that the total number of inputs of the input vector is not a multiple of the first unit number (λ1), or after step S71, that is, regardless of the determination result in step S70, the input number determination processing of step S72 is performed.
  • As in step S41 (FIG. 4) in the information processing apparatus 100 according to the first embodiment, the input number determination unit 31 determines whether the total number of input vectors given so far, stored in the temporary storage unit, is a multiple of a predetermined number of units that is preset and stored in the temporary storage unit, and stores the result in the temporary storage unit.
  • The predetermined number of units used in step S72 is referred to as the second unit number (λ2).
  • In the present embodiment, the second unit number (λ2) is set to a fixed value, but the present invention is not limited to this; for example, the frequency of the subsequent processing may be adjusted appropriately according to the number of inputs of the input vector.
  • Step S4: When the total number of inputs of the input vector is an integral multiple of the second unit number (λ2), the target edge detection unit 32 prepares a threshold TH1 for each class based on the classes classified in step S71, and detects the target edges from each class. A sketch of this periodic processing is shown below.
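A sketch of how the two unit numbers might gate the periodic processing: class determination every λ1 inputs (step S71) and, every λ2 inputs, a per-class threshold TH1 followed by target-edge detection and noise node deletion (steps S4 and S52). The way TH1 is derived for each class (here, the mean edge learning time of the class) is an assumption; the text only states that a threshold TH1 is prepared for each class.

```python
def periodic_processing(n_inputs, lambda1, lambda2, classes, edge_learning_time):
    """Decide which periodic steps run for the current input count.
    classes: {class_id: set of edges}, edge_learning_time: {edge: learning time}."""
    actions = []
    if n_inputs % lambda1 == 0:
        actions.append("determine classes (step S71)")
    if n_inputs % lambda2 == 0:
        for cls, cls_edges in classes.items():
            times = [edge_learning_time[e] for e in cls_edges]
            th1 = sum(times) / len(times) if times else 0.0      # per-class TH1 (assumed form)
            targets = [e for e in cls_edges if edge_learning_time[e] > th1]
            actions.append(f"class {cls}: load-balance {len(targets)} target edge(s) (step S4)")
        actions.append("delete noise nodes (step S52)")
    return actions

elt = {"e1": 10.0, "e2": 2.0, "e3": 4.0}
print(periodic_processing(600, lambda1=200, lambda2=300,
                          classes={"A": {"e1", "e2", "e3"}}, edge_learning_time=elt))
```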
  • Step S52: As in the third embodiment (FIG. 29), the noise node deletion unit 52 deletes, from among all the nodes a included in the node set A stored in the temporary storage unit, the nodes regarded as noise nodes, and stores the result in the temporary storage unit.
  • For example, the noise node deletion unit 52 can delete the node of interest based on the number of adjacent nodes and the number of sides of the node a of interest, and store the result in the temporary storage unit.
  • In the present embodiment, a node whose number of sides is 0 is deleted, but the present invention is not limited to this.
  • In the above description, an example in which the processing in steps S4 and S52 is performed based on the same second predetermined number of units (λ2) has been described, but this is merely an example.
  • Step S4 may perform processing based on the second predetermined number of units (λ2), while step S52 performs processing based on a third predetermined number of units (λ3).
  • In that case, the number of parameters and the degree of freedom increase, so the adjustment effort required of the user increases, but it can be expected that the input data can be expressed more accurately.
  • Step S5 The learning end determination unit 53 determines whether or not to end the learning process by the information processing apparatus 400, as in step S5 (FIG. 4) in the information processing apparatus 100 according to the first embodiment. If it is determined that the learning is not completed, the process returns to step S11.
  • Step S53 Since step S53 is the same as that of the third embodiment (FIG. 29), the description thereof will be omitted.
  • Then, for the nodes and the classes of the nodes stored in the temporary storage unit, the output information display unit 6 may output the number of classes to which the nodes belong and the prototype vector of each class. After completing the above processing, learning is stopped.
  • the load balancing according to the first embodiment can be applied to accurately learn the structure of the input vector.
  • the load equilibration according to the second embodiment may be applied instead of the load equilibration according to the first embodiment.
  • the processing of this embodiment is only an example, and the processing and order of the steps may be changed as appropriate.
  • the present invention is not limited to the above embodiments, and can be appropriately modified without departing from the spirit.
  • Regarding the distance scale: since sample data cannot be obtained in advance when performing online additional learning, it is not possible to analyze the dimensionality of the input vectors in advance to determine which distance scale is effective. Therefore, as described using equation (14) in Patent Document 2, a new distance scale that represents the distance between two nodes may be introduced by combining different distance scales. For example, as shown by equation (17), which is derived using equations (14) to (16) in Patent Document 2, a new distance scale combining the Euclidean distance and the cosine distance may be used.
  • The case where the Euclidean distance is combined with the cosine distance has been described as an example, but the present invention is not limited to this; other distance scales (for example, the cosine distance, the Manhattan distance, and the fractional distance) may be combined. Further, the combination is not limited to distance scales that are effective in a high-dimensional space; other distance scales may be combined according to the problem to be learned. A sketch of such a combined distance is shown below.
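A sketch of a combined distance scale in the spirit described above: a weighted mixture of a normalized Euclidean distance and the cosine distance. Equations (14) to (17) of Patent Document 2 are not reproduced; the normalization and the mixing weight gamma used here are assumptions.

```python
import numpy as np

def combined_distance(x, y, gamma=0.5, euclid_scale=1.0):
    """Distance mixing Euclidean and cosine distance (0 <= gamma <= 1; scale chosen per data set)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    d_euclid = np.linalg.norm(x - y) / euclid_scale                   # normalized Euclidean distance
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    d_cos = 1.0 - float(np.dot(x, y)) / denom if denom > 0 else 1.0   # cosine distance
    return gamma * d_euclid + (1.0 - gamma) * d_cos

print(combined_distance([1.0, 0.0, 1.0], [0.5, 0.5, 1.0], gamma=0.5, euclid_scale=2.0))
```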
  • In the above description, magnitude determinations between two values have been described, but these are merely examples, and in such determinations the case where the two values are equal may be handled as necessary. That is, as the determination, either determining whether the first value is greater than or equal to the second value or less than it, or determining whether the first value is greater than the second value or less than or equal to it, may be adopted as necessary. Similarly, either determining whether the first value is less than or equal to the second value or greater than it, or determining whether the first value is less than the second value or greater than or equal to it, may be adopted. In other words, when the magnitudes of two values are compared to obtain two determination results, the case where the two values are equal may be included in either of the two determination results as necessary.
  • Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROMs (Read Only Memory), CD-Rs, and the like.
  • The program may also be supplied to the computer by various types of transitory computer-readable media.
  • Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves.
  • A transitory computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire or an optical fiber, or via a wireless communication path.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

An information processing device (100) performs processing with respect to a result of learning an input distribution structure for relevant input vectors as a network structure having a plurality of nodes and a plurality of sides, the device including: a winner node detection unit (1); a side learning time update unit (2); and a load balancing unit (3). The winner node detection unit (1) detects, as a first winner node, a node positioned at a distance closest to an input vector, from among the plurality of nodes, and detects a node positioned second closest as a second winner node. The side learning time update unit (2) increases, by a prescribed value only, the side learning time for a side connecting the first winner node and the second winner node. The load balancing unit (3) selects, at a prescribed timing, one or more sides from the plurality of sides on the basis of the side learning time, generates a new node on each of the selected one or more sides, and inserts the new node(s) into the network structure.

Description

情報処理装置、情報処理方法及び非一時的なコンピュータ可読媒体Information processing equipment, information processing methods and non-temporary computer-readable media
 本発明は、情報処理装置、情報処理方法及びプログラムに関し、例えば任意のクラスに属する入力ベクトルを順次入力して当該入力ベクトルの入力分布構造を学習する情報処理装置、情報処理方法及びプログラムに関する。 The present invention relates to an information processing device, an information processing method and a program, for example, an information processing device, an information processing method and a program for learning an input distribution structure of an input vector by sequentially inputting an input vector belonging to an arbitrary class.
 学習中に必要に応じてニューロンを増殖させる学習手法として、自己組織化ニューラルネットワーク(SOINN:Self-Organizing Incremental Neural Network)と呼ばれる手法が提案されている(特許文献1)。SOINNでは、ノード数を自律的に管理することにより非定常的な入力を学習することができ、複雑な分布形状を有するクラスに対しても適切なクラス数及び位相構造を抽出できるなど多くの利点を有する。SOINNの応用例として、例えばパターン認識においては、ひらがな文字のクラスを学習させた後に、カタカナ文字のクラスなどを追加的に学習させることができる。 As a learning method for proliferating neurons as needed during learning, a method called a self-organizing neural network (SOINN) has been proposed (Patent Document 1). SOINN can learn non-stationary inputs by autonomously managing the number of nodes, and has many advantages such as being able to extract an appropriate number of classes and topological structure even for classes with complicated distribution shapes. Have. As an application example of SOINN, for example, in pattern recognition, after learning a class of hiragana characters, it is possible to additionally learn a class of katakana characters.
 このようなSOINNの一例として、E-SOINN(Enhanced SOINN)と称される手法が提案されている。E-SOINNでは、学習を随時追加するオンライン追加学習が可能であり、バッチ学習ではなく学習効率が良いという利点を有している。このため、E-SOINNでは、学習環境が新しい環境に変化した場合においても追加学習が可能である。また、E-SOINNでは、入力データに対するノイズ耐性が高いという利点をも有している。 As an example of such SOINN, a method called E-SOINN (Enhanced SOINN) has been proposed. E-SOINN allows online additional learning to add learning at any time, and has the advantage of good learning efficiency rather than batch learning. Therefore, in E-SOINN, additional learning is possible even when the learning environment changes to a new environment. Further, E-SOINN has an advantage of high noise immunity to input data.
 ところが、E-SOINNを含むSOINNにおいては、新たなノードをネットワークに挿入することが困難であることから、入力データの構造を正確に表現し難いという問題や、入力データの入力順序によって学習結果が異なってしまうという問題があった。こうした問題を解決するため、LB-SOINN(Load Balance Self-Organizing Incremental Neural Network)と称される手法が提案された(特許文献2)。LB-SOINNは、ネットワークにおけるノードの負荷をノード学習時間として扱い、ノード学習時間が大きなノードを検出し、検出したノードとこれに隣接するノードを接続する辺上に、検出したノードの重みベクトルに基づいて決定された重みベクトルを有する新たなノード生成する。これにより、検出したノードの学習時間の増大を緩和し、かつ、その付近に新たなノードを生成することで、入力データの構造をより正確に学習することができる。 However, in SOINN including E-SOINN, it is difficult to insert a new node into the network, so it is difficult to accurately express the structure of the input data, and the learning result depends on the input order of the input data. There was a problem that it would be different. In order to solve such a problem, a method called LB-SOINN (Load Balance Self-Organizing Incremental Neural Network) has been proposed (Patent Document 2). LB-SOINN treats the load of a node in the network as the node learning time, detects a node with a large node learning time, and uses the weight vector of the detected node on the side connecting the detected node and the adjacent node. Generate a new node with a weight vector determined based on it. As a result, the increase in the learning time of the detected node can be mitigated, and a new node can be generated in the vicinity thereof, so that the structure of the input data can be learned more accurately.
特開2008-217246号公報Japanese Unexamined Patent Publication No. 2008-217246 特開2014-164396号公報Japanese Unexamined Patent Publication No. 2014-164396
 しかしながら、上述のLB-SOINNでは、入力データにノイズが多い場合には、クラスタ間に新たなノードが生成されてしまうことが見出された。図32に、LB-SOINNにおける負荷平衡化による新規ノードの生成を模式的に示す。LB-SOINNでは、ニューラルネットワークに含まれるノードのうちで、ノードの学習時間が相対的に大きく、かつ、偏っているノード(被検出ノードと称する)Nmaxと、そのノードと辺で接続された隣接ノードNneiとの間に新たなノードNnewを生成する負荷平衡化を行うことで、ネットワーク構造を精度よく学習できる。図32に示すように、被検出ノードNmax及び隣接ノードNneiが同じクラスタCLに属している場合には、当該クラスタに属するノードの負荷を適切に平衡化できる。 However, in the above-mentioned LB-SOINN, it was found that a new node is generated between clusters when there is a lot of noise in the input data. FIG. 32 schematically shows the generation of a new node by load balancing in LB-SOINN. In LB-SOINN, among the nodes included in the neural network, the node Nmax whose learning time is relatively long and biased (referred to as the detected node) and the adjacent node connected to the node are adjacent to each other. By performing load balancing to generate a new node Nnew with the node Nnei, the network structure can be learned accurately. As shown in FIG. 32, when the detected node Nmax and the adjacent node Nnei belong to the same cluster CL, the load of the node belonging to the cluster can be appropriately balanced.
 しかし、LB-SOINNにおいてノイズが大きな入力データを学習してしまうと、被検出ノードNmaxと隣接ノードNneiとが辺で接続されてはいるものの、互いに異なるクラスタに属している場合が有る。図33に、LB-SOINNにおいてノイズが大きなデータを学習した場合での負荷平衡化による新規ノードの生成を模式的に示す。図33に示すように、例えば、隣接ノードNneiがクラスタCL1に属し、被検出ノードNmaxが別のクラスタCL2に属している場合、LB-SOINNでの負荷平衡化を行ってしまうと、2つのクラスタ間の境界領域Bに、新規ノードNnewが生成されてしまう。その結果、本来区別されるべきクラスタCL1とクラスタCL2とが区別できなくなり、入力データの構造を正しく学習できなくなるおそれが有る。 However, if the input data with a large noise is learned in LB-SOINN, the detected node Nmax and the adjacent node Nnei may belong to different clusters although they are connected by the sides. FIG. 33 schematically shows the generation of a new node by load balancing when learning data with a large noise in LB-SOINN. As shown in FIG. 33, for example, when the adjacent node Nnei belongs to the cluster CL1 and the detected node Nmax belongs to another cluster CL2, if the load balancing in the LB-SOINN is performed, the two clusters A new node Nnew is generated in the boundary area B between them. As a result, the cluster CL1 and the cluster CL2, which should be originally distinguished, cannot be distinguished, and there is a possibility that the structure of the input data cannot be learned correctly.
 本発明は、上記の事情に鑑みて成されたものであり、入力データの構造をより正確に学習できる情報処理装置、情報処理方法及びプログラムを提供することを目的とする。 The present invention has been made in view of the above circumstances, and an object of the present invention is to provide an information processing device, an information processing method, and a program capable of learning the structure of input data more accurately.
 本発明の一実施の形態にかかる情報処理装置は、入力ベクトルを順次入力して、当該入力ベクトルの入力分布構造を、多次元ベクトルで記述される複数のノード及び2つの前記ノード間を接続する複数の辺が配置されるネットワーク構造として学習する情報処理装置において、前記ネットワーク構造に含まれる前記複数のノードから、入力される前記入力ベクトルに最も近い距離に位置するノードを第1勝者ノードとして検出し、2番目に近い距離に位置するノードを第2勝者ノードとして検出する勝者ノード検出部と、前記第1勝者ノードと前記第2勝者ノードとの間を接続する辺の辺学習時間を第1の値だけ増加させる辺学習時間更新部と、所定のタイミングで、前記複数の辺から辺学習時間に基づいて1以上の辺を選択し、選択した前記1以上の辺のそれぞれの上に新たなノードを生成して前記ネットワーク構造に挿入する負荷平衡化部と、を有するものである。 The information processing apparatus according to the embodiment of the present invention sequentially inputs an input vector and connects the input distribution structure of the input vector between a plurality of nodes described by a multidimensional vector and two said nodes. In an information processing device that learns as a network structure in which a plurality of sides are arranged, a node located at the closest distance to the input vector input from the plurality of nodes included in the network structure is detected as the first winner node. Then, the side learning time of the side connecting between the winner node detection unit that detects the node located at the second closest distance as the second winner node and the first winner node and the second winner node is set as the first. A side learning time update unit that increases by the value of, and one or more sides are selected from the plurality of sides based on the side learning time at a predetermined timing, and a new side is newly added on each of the selected one or more sides. It has a load balancing unit that creates a node and inserts it into the network structure.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記辺学習時間更新部は、さらに、前記第1勝者ノードと前記第2勝者ノードとの間を接続する前記辺以外の、前記第1勝者ノードに接続する辺の辺学習時間及び前記第2勝者ノードに接続する辺の辺学習時間を前記第1の値よりも小さな第2の値だけ増加させることが望ましい。 The information processing apparatus according to the embodiment of the present invention is the information processing apparatus, wherein the side learning time update unit further connects the side between the first winner node and the second winner node. Other than that, it is desirable to increase the side learning time of the side connected to the first winner node and the side learning time of the side connected to the second winner node by a second value smaller than the first value.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、辺学習時間が相対的に大きい1以上の辺を選択することが望ましい。 In the information processing device according to the embodiment of the present invention, in the above information processing device, it is desirable that the load balancing unit selects one or more sides having a relatively large side learning time.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、辺学習時間が所定の辺学習時間閾値よりも大きな1以上の辺を選択することが望ましい。 In the information processing apparatus according to the embodiment of the present invention, in the above information processing apparatus, the load balancing unit may select one or more sides whose side learning time is larger than a predetermined side learning time threshold. desirable.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、前記複数の辺から、辺学習時間と辺の長さとに基づいて1以上の辺を選択することが望ましい。 The information processing apparatus according to the embodiment of the present invention is the above-mentioned information processing apparatus, in which the load balancing unit obtains one or more sides from the plurality of sides based on the side learning time and the length of the sides. It is desirable to select.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、長さが相対的に大きい1以上の辺を選択することが望ましい。 In the information processing apparatus according to the embodiment of the present invention, it is desirable that the load balancing unit selects one or more sides having a relatively large length in the above information processing apparatus.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、長さが所定の値よりも大きな1以上の辺を選択することが望ましい。 In the information processing apparatus according to the embodiment of the present invention, it is desirable that the load balancing unit selects one or more sides whose length is larger than a predetermined value in the above information processing apparatus.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、前記選択した1以上の辺において、各辺の一端と接続される第1ノードの勝利回数及び他端と接続される第2ノードの勝利回数に基づいて、前記新たなノードを生成する位置を決定することが望ましい。 The information processing apparatus according to the embodiment of the present invention is the above-mentioned information processing apparatus, in which the load balancing unit wins the first node connected to one end of each side in the selected one or more sides. It is desirable to determine the position to generate the new node based on the number of times and the number of wins of the second node connected to the other end.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、前記選択した1以上の辺において、各辺上の前記第1ノードの勝利回数及び前記第2ノードの勝利回数から算出した重心位置に、前記新たなノードを生成することが望ましい。 The information processing apparatus according to the embodiment of the present invention is the information processing apparatus, wherein the load balancing unit is the number of wins of the first node on each side and the number of wins of the first node on each side in one or more selected sides. It is desirable to generate the new node at the position of the center of gravity calculated from the number of wins of the second node.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、前記選択した1以上の辺を削除し、削除した各辺の一端と接続される第1ノードと前記新たなノードとを接続する第1の辺と、削除した各辺の他端と接続される第2ノードと前記新たなノードとを接続する第2の辺と、を生成して前記ネットワーク構造に挿入することが望ましい。 The information processing apparatus according to the embodiment of the present invention is the above-mentioned information processing apparatus, in which the load balancing unit deletes one or more selected sides and is connected to one end of each deleted side. Generate a first side connecting one node and the new node, a second side connected to the other end of each deleted side, and a second side connecting the new node. It is desirable to insert it into the network structure.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記負荷平衡化部は、前記削除した各辺の辺学習時間を、前記第1の辺及び前記第2の辺のそれぞれに所定の割合で継承させることが望ましい。 In the information processing apparatus according to the embodiment of the present invention, in the information processing apparatus, the load balancing unit sets the side learning time of each deleted side to the first side and the second side. It is desirable to let each of them inherit at a predetermined ratio.
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記第1及び第2の辺のそれぞれは、前記第1ノードの勝利回数及び前記第2ノードの勝利回数のそれぞれを、前記第1ノードの勝利回数及び前記第2ノードの勝利回数の和で除した値で示される割合だけ、前記削除した各辺の辺学習時間を継承することが望ましい。 In the information processing device according to the embodiment of the present invention, in the above information processing device, each of the first and second sides has the number of wins of the first node and the number of wins of the second node, respectively. It is desirable to inherit the side learning time of each deleted side by the ratio indicated by the value obtained by dividing the number of wins of the first node and the number of wins of the second node.
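A minimal worked sketch of the inheritance ratio described above: the two new edges split the deleted edge's learning time in proportion to the win counts of the nodes at its two ends. The function name is illustrative; only the ratio itself is taken from the description.

```python
def split_edge_learning_time(edge_time, wins_node1, wins_node2):
    """Return (time for the first edge, time for the second edge), proportional to win counts."""
    total = wins_node1 + wins_node2
    t1 = edge_time * wins_node1 / total   # inherited by the edge on the first node's side
    t2 = edge_time * wins_node2 / total   # inherited by the edge on the second node's side
    return t1, t2

print(split_edge_learning_time(edge_time=12.0, wins_node1=3, wins_node2=1))  # (9.0, 3.0)
```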
 本発明の一実施の形態にかかる情報処理装置は、上記の情報処理装置において、前記情報処理装置は、前記入力ベクトルと前記第1勝者ノードとの間の距離及び前記入力ベクトルと前記第2勝者ノード間との距離に基づいて、ノード挿入を実行するか否かを判定するノード挿入判定部と、前記ノード挿入判定部による判定の結果、前記ノード挿入を実行する場合に、前記入力ベクトルの成分と同一の成分を重みベクトルとして有する挿入ノードを生成し、当該生成した挿入ノードを前記ネットワーク構造に挿入するノード挿入部と、を有することが望ましい。 The information processing apparatus according to the embodiment of the present invention is the above-mentioned information processing apparatus, in which the information processing apparatus includes the distance between the input vector and the first winner node and the input vector and the second winner. A component of the input vector when the node insertion is executed as a result of the determination by the node insertion determination unit that determines whether or not to execute the node insertion based on the distance between the nodes and the node insertion determination unit. It is desirable to have a node insertion section that generates an insertion node having the same component as the weight vector and inserts the generated insertion node into the network structure.
 本発明の一実施の形態にかかる情報処理方法は、入力ベクトルを順次入力して、当該入力ベクトルの入力分布構造を、多次元ベクトルで記述される複数のノード及び2つの前記ノード間を接続する複数の辺が配置されるネットワーク構造として学習する情報処理方法であって、勝者ノード検出部が、前記ネットワーク構造に含まれる前記複数のノードから、入力される前記入力ベクトルに最も近い距離に位置するノードを第1勝者ノードとして検出し、2番目に近い距離に位置するノードを第2勝者ノードとして検出し、辺学習時間更新部が、前記第1勝者ノードと前記第2勝者ノードとの間を接続する辺の辺学習時間を所定値だけ増加させ、負荷平衡化部が、所定のタイミングで、前記複数の辺から辺学習時間に基づいて1以上の辺を選択し、選択した前記1以上の辺のそれぞれの上に新たなノードを生成して前記ネットワーク構造に挿入するものである。 In the information processing method according to the embodiment of the present invention, input vectors are sequentially input, and the input distribution structure of the input vector is connected to a plurality of nodes described by a multidimensional vector and between the two said nodes. It is an information processing method that learns as a network structure in which a plurality of sides are arranged, and the winner node detection unit is located at the closest distance to the input vector input from the plurality of nodes included in the network structure. The node is detected as the first winner node, the node located at the second closest distance is detected as the second winner node, and the edge learning time update unit performs an edge learning time update unit between the first winner node and the second winner node. The side learning time of the connected sides is increased by a predetermined value, and the load balancing unit selects one or more sides from the plurality of sides based on the side learning time at a predetermined timing, and the selected one or more sides. A new node is created on each side and inserted into the network structure.
 本発明の一実施の形態にかかるプログラムは、任意のクラスに属する入力ベクトルを順次入力して、当該入力ベクトルの入力分布構造を、多次元ベクトルで記述される複数のノード及び2つの前記ノード間を接続する複数の辺が配置されるネットワーク構造として学習する処理をコンピュータに実行させるプログラムであって、前記ネットワーク構造に含まれる前記複数のノードから、入力される前記入力ベクトルに最も近い距離に位置するノードを第1勝者ノードとして検出し、2番目に近い距離に位置するノードを第2勝者ノードとして検出する処理と、辺学習時間更新部が、前記第1勝者ノードと前記第2勝者ノードとの間を接続する辺の辺学習時間を所定値だけ増加させる処理と、所定のタイミングで、前記複数の辺から辺学習時間に基づいて1以上の辺を選択し、選択した前記1以上の辺のそれぞれの上に新たなノードを生成して前記ネットワーク構造に挿入する処理と、をコンピュータに実行させるものである。 In the program according to the embodiment of the present invention, input vectors belonging to an arbitrary class are sequentially input, and the input distribution structure of the input vector is described by a plurality of nodes described by a multidimensional vector and between two said nodes. It is a program that causes a computer to execute a process of learning as a network structure in which a plurality of sides are arranged, and is located at a distance closest to the input vector input from the plurality of nodes included in the network structure. The processing to detect the node to be detected as the first winner node and the node located at the second closest distance as the second winner node, and the side learning time update unit are the first winner node and the second winner node. A process of increasing the side learning time of the side connecting between the sides by a predetermined value, and selecting one or more sides from the plurality of sides based on the side learning time at a predetermined timing, and selecting the one or more sides. The process of creating a new node on each of the above and inserting it into the network structure is performed by the computer.
 本発明によれば、入力データの構造をより正確に学習できる情報処理装置、情報処理方法及びプログラムを提供することができる。 According to the present invention, it is possible to provide an information processing device, an information processing method and a program capable of learning the structure of input data more accurately.
実施の形態1にかかる情報処理装置を実現するためのシステム構成の一例を示す図である。It is a figure which shows an example of the system configuration for realizing the information processing apparatus which concerns on Embodiment 1. FIG. 実施の形態1にかかる情報処理装置の機能構成を模式的に示す図である。It is a figure which shows typically the functional structure of the information processing apparatus which concerns on Embodiment 1. FIG. 実施の形態1にかかる情報処理装置の機能構成を模式的に示す図である。It is a figure which shows typically the functional structure of the information processing apparatus which concerns on Embodiment 1. FIG. 実施の形態1にかかる情報処理装置における学習処理のフローチャートである。It is a flowchart of the learning process in the information processing apparatus which concerns on Embodiment 1. FIG. 学習時間の増加のケースを示す図である。It is a figure which shows the case of the increase of learning time. 新規ノード挿入位置を示す図である。It is a figure which shows the insertion position of a new node. 新規ノード挿入位置を示す図である。It is a figure which shows the insertion position of a new node. 辺とノードの勝利回数との関係を示す図である。It is a figure which shows the relationship between the side and the number of wins of a node. 4つのノードがスター型に接続される例を示す図である。It is a figure which shows the example which four nodes are connected in a star shape. 辺の削除を示す図である。It is a figure which shows the deletion of an edge. 辺の生成及び挿入を示す図である。It is a figure which shows the generation and insertion of an edge. 辺学習時間の継承を示す図である。It is a figure which shows the inheritance of the edge learning time. 実施の形態1において実験に用いるために生成したデータを示す図である。It is a figure which shows the data generated for use in an experiment in Embodiment 1. FIG. 生成したデータにノイズを与えた入力データを示す図である。It is a figure which shows the input data which gave noise to the generated data. 特許文献2のLB-SOINNから負荷平衡化を除いたアルゴリズムによる入力データの学習結果を示す図である。It is a figure which shows the learning result of the input data by the algorithm which removed the load equilibration from the LB-SOINN of Patent Document 2. 特許文献2のLB-SOINNによる入力データの学習結果を示す図である。It is a figure which shows the learning result of the input data by LB-SOINN of patent document 2. FIG. 実施の形態1にかかる負荷平衡化を適用した入力データの学習結果を示す図である。It is a figure which shows the learning result of the input data which applied the load equilibration which concerns on Embodiment 1. FIG. 実施の形態2において実験に用いるために生成したデータを示す図である。It is a figure which shows the data generated for use in the experiment in Embodiment 2. FIG. 生成したデータにノイズを与えた入力データを示す図である。It is a figure which shows the input data which gave noise to the generated data. 特許文献2のLB-SOINNから負荷平衡化を除いたアルゴリズムによる入力データの学習結果を示す図である。It is a figure which shows the learning result of the input data by the algorithm which removed the load equilibration from the LB-SOINN of Patent Document 2. 特許文献2のLB-SOINNによる入力データの学習結果を示す図である。It is a figure which shows the learning result of the input data by LB-SOINN of patent document 2. FIG. 実施の形態1にかかる負荷平衡化を適用した入力データの学習結果を示す図である。It is a figure which shows the learning result of the input data which applied the load equilibration which concerns on Embodiment 1. FIG. 入力されたベクトルの分布を示す図である。It is a figure which shows the distribution of the input vector. 実施の形態1で説明した負荷平衡化を適用後のノード分布を示す図である。It is a figure which shows the node distribution after applying the load equilibration described in Embodiment 1. FIG. 実施の形態2にかかる負荷平衡化のイメージを示す図である。It is a figure which shows the image of the load equilibration which concerns on Embodiment 2. 図19の入力ベクトル分布の下で、実施の形態2にかかる負荷平衡化を適用した後のノード分布を示す図である。It is a figure which shows the node distribution after applying the load equilibration which concerns on Embodiment 2 under the input vector distribution of FIG. 
実施の形態2にかかる負荷平衡化を適用した入力データの学習結果を示す図である。It is a figure which shows the learning result of the input data which applied the load equilibration which concerns on Embodiment 2. 実施の形態3にかかる情報処理装置の構成を模式的に示す図である。It is a figure which shows typically the structure of the information processing apparatus which concerns on Embodiment 3. FIG. 実施の形態3にかかる情報処理装置の動作のフローチャートである。It is a flowchart of the operation of the information processing apparatus which concerns on Embodiment 3. FIG. 実施の形態4にかかる情報処理装置の構成を模式的に示す図である。It is a figure which shows typically the structure of the information processing apparatus which concerns on Embodiment 4. 実施の形態4にかかる情報処理装置の動作のフローチャートである。It is a flowchart of the operation of the information processing apparatus which concerns on Embodiment 4. LB-SOINNにおける負荷平衡化による新規ノードの生成を模式的に示す図である。It is a figure which shows typically the generation of a new node by load equilibration in LB-SOINN. LB-SOINNにおいてノイズが大きなデータを学習した場合での負荷平衡化による新規ノードの生成を模式的に示す図である。It is a figure which shows typically the generation of the new node by load equilibration in the case of learning the data with a large noise in LB-SOINN.
 以下、図面を参照して本発明の実施の形態について説明する。各図面においては、同一要素には同一の符号が付されており、必要に応じて重複説明は省略される。 Hereinafter, embodiments of the present invention will be described with reference to the drawings. In each drawing, the same elements are designated by the same reference numerals, and duplicate explanations are omitted as necessary.
 実施の形態1
 図1は、実施の形態1にかかる情報処理装置を実現するためのシステム構成の一例を示す図である。情報処理装置100は、専用コンピュータ、パーソナルコンピュータ(PC)などのコンピュータ10により実現可能である。但し、コンピュータは、物理的に単一である必要はなく、分散処理を実行する場合には、複数であってもよい。図1に示すように、コンピュータ10は、CPU(Central Processing Unit)11、ROM(Read Only Memory)12及びRAM(Random Access Memory)13を有し、これらがバス14を介して相互に接続されている。尚、コンピュータを動作させるためのOSソフトなどは、説明を省略するが、この情報処理装置を構築するコンピュータも当然有しているものとする。
Embodiment 1
FIG. 1 is a diagram showing an example of a system configuration for realizing the information processing apparatus according to the first embodiment. The information processing device 100 can be realized by a computer 10 such as a dedicated computer or a personal computer (PC). However, the number of computers does not have to be physically single, and may be multiple when performing distributed processing. As shown in FIG. 1, the computer 10 has a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, and a RAM (Random Access Memory) 13, which are connected to each other via a bus 14. There is. Although the description of the OS software for operating the computer is omitted, it is assumed that the computer that constructs this information processing device also has it.
 バス14には、入出力インターフェイス15も接続されている。入出力インターフェイス15には、例えば、キーボード、マウス、センサなどよりなる入力部16、CRT、LCDなどよりなるディスプレイ、並びにヘッドフォンやスピーカなどよりなる出力部17、ハードディスクなどより構成される記憶部18、モデム、ターミナルアダプタなどより構成される通信部19などが接続されている。 The input / output interface 15 is also connected to the bus 14. The input / output interface 15 includes, for example, an input unit 16 including a keyboard, a mouse, and a sensor, a display including a CRT and an LCD, an output unit 17 including headphones and speakers, and a storage unit 18 including a hard disk. A communication unit 19 composed of a modem, a terminal adapter, etc. is connected.
 CPU11は、ROM12に記憶されている各種プログラム、又は記憶部18からRAM13にロードされた各種プログラムに従って各種の処理、本実施の形態においては、例えば後述する情報処理装置100の各部の処理を実行する。CPU11とは別にGPU(Graphics Processing Unitを設け、CPU11と同様に、ROM12に記憶されている各種プログラム、又は記憶部18からRAM13にロードされた各種プログラムに従って各種の処理、本実施の形態においては、例えば後述する情報処理装置100の各部の処理を実行してもよい。なお、GPUは、定型的な処理を並列的に行う用途に適しており、後述するニューラルネットワークにおける処理などに適用することで、CPU11に比べて処理速度を向上させることも可能である。RAM13には又、CPU11及びGPU21が各種の処理を実行する上において必要なデータなども適宜記憶される。 The CPU 11 executes various processes according to various programs stored in the ROM 12 or various programs loaded from the storage unit 18 into the RAM 13, and in the present embodiment, for example, processes each part of the information processing apparatus 100 described later. .. A GPU (Graphics Processing Unit) is provided separately from the CPU 11, and as with the CPU 11, various programs stored in the ROM 12 or various programs loaded from the storage unit 18 into the RAM 13 are processed in various ways. In the present embodiment, the GPU (Graphics Processing Unit) is provided. For example, the processing of each part of the information processing apparatus 100 described later may be executed. The GPU is suitable for the purpose of performing routine processing in parallel, and can be applied to the processing in the neural network described later. It is also possible to improve the processing speed as compared with the CPU 11. The RAM 13 also appropriately stores data and the like necessary for the CPU 11 and the GPU 21 to execute various processes.
 通信部19は、例えば図示しないインターネットを介しての通信処理を行ったり、CPU11から提供されたデータを送信したり、通信相手から受信したデータをCPU11、RAM13、記憶部18に出力したりする。記憶部18はCPU11との間でやり取りし、情報の保存・消去を行う。通信部19は又、他の装置との間で、アナログ信号又はディジタル信号の通信処理を行う。 The communication unit 19 performs communication processing via the Internet (not shown), transmits data provided by the CPU 11, and outputs data received from the communication partner to the CPU 11, RAM 13, and storage unit 18. The storage unit 18 communicates with the CPU 11 and stores / erases information. The communication unit 19 also performs communication processing of an analog signal or a digital signal with another device.
 入出力インターフェイス15はまた、必要に応じてドライブ20が接続され、例えば、磁気ディスク20A、光ディスク20B、フレキシブルディスク20C、又は半導体メモリ20Dなどが適宜装着され、それらから読み出されたコンピュータプログラムが必要に応じて記憶部18にインストールされる。 The input / output interface 15 also requires a computer program to which a drive 20 is connected as needed, and for example, a magnetic disk 20A, an optical disk 20B, a flexible disk 20C, a semiconductor memory 20D, or the like is appropriately mounted and read from them. It is installed in the storage unit 18 according to the above.
 続いて、本実施の形態にかかる情報処理装置100における各処理について説明する。情報処理装置100は、n(nは、1以上の整数)次元ベクトルで記述されるノードが配置される非階層構造のニューラルネットワークが入力される。ニューラルネットワークは、例えばRAM13などの記憶部に格納されている。 Subsequently, each process in the information processing apparatus 100 according to the present embodiment will be described. In the information processing apparatus 100, a neural network having a non-hierarchical structure in which a node described by an n (n is an integer of 1 or more) dimensional vector is arranged is input. The neural network is stored in a storage unit such as a RAM 13.
 本実施の形態におけるニューラルネットワークは、入力ベクトルをニューラルネットワークに入力し、入力される入力ベクトルに基づいて、ニューラルネットワークに配置されるノードを自動的に増加させる自己増殖型ニューラルネットワークであり、自己増殖型ニューラルネットワークを用いることで、ノードを自動的に増加させることができる。 The neural network in the present embodiment is a self-propagating neural network in which an input vector is input to the neural network and the number of nodes arranged in the neural network is automatically increased based on the input input vector. By using a type neural network, the number of nodes can be increased automatically.
 本実施の形態におけるニューラルネットワークは、非階層構造を有するものである。非階層構造を採用することで、他の層での学習を開始するタイミングを指定せずに追加学習を実施することができる。すなわち、オンラインでの追加学習を実施することができる。 The neural network in this embodiment has a non-hierarchical structure. By adopting a non-hierarchical structure, additional learning can be performed without specifying the timing to start learning in other layers. That is, additional learning can be carried out online.
 入力データは、n次元の入力ベクトルとして入力される。例えば、入力ベクトルは一時記憶部(例えばRAM13)に格納され、一時記憶部に格納されたニューラルネットワークに対して順次入力される。 The input data is input as an n-dimensional input vector. For example, the input vector is stored in the temporary storage unit (for example, RAM 13), and is sequentially input to the neural network stored in the temporary storage unit.
 なお、以下では、簡略化のため、本実施の形態におけるニューラルネットワークを、単にネットワークとも称する。 In the following, for the sake of simplicity, the neural network in this embodiment is also simply referred to as a network.
 以下、実施の形態1にかかる情報処理装置100が行う負荷平衡化処理について説明する。負荷平衡化処理が行われるネットワークは、複数のノード及びノード間を接続する複数の辺により構成されるニューラルネットワークであり、例えば上述のE-SOINNや、LB-SOINNから負荷平衡化処理を除いたアルゴリズムによって作成される。 Hereinafter, the load balancing process performed by the information processing apparatus 100 according to the first embodiment will be described. The network on which the load balancing process is performed is a neural network composed of a plurality of nodes and a plurality of sides connecting the nodes. For example, the load balancing process is removed from the above-mentioned E-SOINN and LB-SOINN. Created by an algorithm.
 図2及び3に、実施の形態1にかかる情報処理装置100の機能構成を模式的に示す。図4に、実施の形態1にかかる情報処理装置100における処理のフローチャートを示す。ハードウェア上では、ソフトウェアと上記CPU11及びGPU21の一方又は両方などのハードウェア資源とが協働して、図2及び3に示す機能構成及び図4に示す各処理が実現される。 FIGS. 2 and 3 schematically show the functional configuration of the information processing apparatus 100 according to the first embodiment. FIG. 4 shows a flowchart of processing in the information processing apparatus 100 according to the first embodiment. On the hardware, the software and the hardware resources such as one or both of the CPU 11 and the GPU 21 cooperate to realize the functional configuration shown in FIGS. 2 and 3 and each process shown in FIG.
 情報処理装置100は、勝者ノード検出部1、辺学習時間更新部2及び負荷平衡化部3を有し、これらが情報処理装置100の基本的構成要素となる。なお、図3に示した学習処理部4及びクラスタリング部5は、情報処理装置100に設けられてもよいし、情報処理装置100とは別個に設けられてもよい。ここでは、学習処理部4及びクラスタリング部5は、情報処理装置100に設けられるものとしている。 The information processing device 100 has a winner node detection unit 1, an edge learning time update unit 2, and a load balancing unit 3, which are the basic components of the information processing device 100. The learning processing unit 4 and the clustering unit 5 shown in FIG. 3 may be provided in the information processing device 100 or may be provided separately from the information processing device 100. Here, it is assumed that the learning processing unit 4 and the clustering unit 5 are provided in the information processing apparatus 100.
 情報処理装置100には、負荷平衡化処理の対象となる、学習により構築されたネットワークと、新たな入力ベクトルと、が入力される。具体的には、学習処理部4が入力ベクトルを順次学習し、所定のタイミングで、構築されたネットワークと、新たな入力ベクトルと、を情報処理装置100に渡す。 The information processing apparatus 100 is input with a network constructed by learning, which is a target of load balancing processing, and a new input vector. Specifically, the learning processing unit 4 sequentially learns the input vector, and at a predetermined timing, passes the constructed network and the new input vector to the information processing apparatus 100.
 情報処理装置100に学習処理部4の結果が入力されると、勝者ノード検出部1は、入力ベクトルとニューラルネットワークのノードとに基づいて第1勝者と第2勝者とを検出する。次いで、辺学習時間更新部2は、第1勝者と第2勝者とを接続する辺の学習時間を所定値だけ増加させる。 When the result of the learning processing unit 4 is input to the information processing device 100, the winner node detection unit 1 detects the first winner and the second winner based on the input vector and the node of the neural network. Next, the side learning time updating unit 2 increases the learning time of the side connecting the first winner and the second winner by a predetermined value.
 入力ベクトルの入力総数が所定の単位数(λ)の整数倍となった場合、負荷平衡化部3は、ニューラルネットワークに含まれる辺のうちで辺学習時間が相対的に大きい辺を検出し、検出した辺上に位置する新たなノードをニューラルネットワークに挿入する。 When the total number of inputs of the input vector is an integral multiple of a predetermined number of units (λ), the load balancing unit 3 detects an edge included in the neural network whose edge learning time is relatively long. Insert a new node located on the detected edge into the neural network.
 その後、情報処理装置100は、全ての入力ベクトルが入力済みとなったかを判定する。全ての入力ベクトルが入力済みでないと判定した場合には、新たな入力ベクトルが学習処理部4に入力され、処理が継続される。全ての入力ベクトルが入力済みと判定した場合には、クラスタリング部5が負荷平衡化後のネットワークのクラスタリング処理を行い、ネットワークを構成するノードが分類され、入力データに対する一連の処理が完了する。 After that, the information processing apparatus 100 determines whether all the input vectors have been input. If it is determined that all the input vectors have not been input, a new input vector is input to the learning processing unit 4, and the processing is continued. When it is determined that all the input vectors have been input, the clustering unit 5 performs clustering processing of the network after load balancing, the nodes constituting the network are classified, and a series of processing for the input data is completed.
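A sketch of the overall loop just described, assuming hypothetical helper callables for learning, winner detection, edge learning time update, load balancing, and clustering; only the control flow is taken from the description above.

```python
def run(input_vectors, lam, learn_one, find_winners, bump_edge_time, load_balance, cluster):
    """Learn each input, update the winning edge's learning time, run load balancing
    every lam inputs, and cluster once all inputs have been processed."""
    network = {"nodes": [], "edges": {}, "edge_time": {}}
    for count, eps in enumerate(input_vectors, start=1):
        learn_one(network, eps)                # learning processing unit 4
        a1, a2 = find_winners(network, eps)    # winner node detection unit 1
        bump_edge_time(network, a1, a2)        # edge learning time update unit 2
        if count % lam == 0:                   # predetermined number of units (lambda)
            load_balance(network)              # load balancing unit 3
    cluster(network)                           # clustering unit 5
    return network

# Trivial usage with no-op helpers, just to show the call shape
net = run([[0.1], [0.2], [0.3], [0.4]], lam=2,
          learn_one=lambda n, e: None,
          find_winners=lambda n, e: ("a1", "a2"),
          bump_edge_time=lambda n, a1, a2: None,
          load_balance=lambda n: None,
          cluster=lambda n: None)
print(len(net["nodes"]))
```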
 以下、図4を参照して、情報処理装置100における処理について具体的に説明する。 Hereinafter, the processing in the information processing apparatus 100 will be specifically described with reference to FIG.
ステップS1
 学習処理部4は、勝者ノード検出部1と協働して順次入力される入力ベクトルを学習して、ノードとノード間を接続する辺とによって構成される入力分布構造を示すニューラルネットワークを構成する。学習処理部4は、学習の結果得られたノードと辺とを、逐次、一時記憶部に格納する。
Step S1
The learning processing unit 4 learns input vectors that are sequentially input in cooperation with the winner node detection unit 1 to form a neural network showing an input distribution structure composed of nodes and edges connecting the nodes. .. The learning processing unit 4 sequentially stores the nodes and sides obtained as a result of learning in the temporary storage unit.
ステップS2
 勝者ノード検出部1は、一時記憶部に格納された入力ベクトル及びノードを参照し、対象入力ベクトルεに最も距離が近いノードを第1勝者ノードa1として検出し、対象入力ベクトルεに2番目に近いノードを第2勝者ノードa2として検出し、その結果を一時記憶部に格納する。勝者ノード検出部1は、検出処理として、例えば、以下の式[1]及び[2]に示す処理を実行し、その結果を一時記憶部に格納する。
Figure JPOXMLDOC01-appb-M000001
Figure JPOXMLDOC01-appb-M000002
ここで、D(ε,a)は、所定の距離尺度を用いて算出される入力ベクトルεとノードaとの間の距離であり、Aはノードの集合である。なお、ノード間の距離尺度としては、ユークリッド距離、コサイン距離、マンハッタン距離、フラクショナル距離など、任意の距離尺度を適用することができる。一般に、入力データが低次元である場合にユークリッド距離が用いられ、入力データが高次元である場合にコサイン距離、マンハッタン距離及びフラクショナル距離が用いられる。
Step S2
The winner node detection unit 1 refers to the input vector and the node stored in the temporary storage unit, detects the node closest to the target input vector ε as the first winner node a1, and second to the target input vector ε. A nearby node is detected as the second winner node a2, and the result is stored in the temporary storage unit. The winner node detection unit 1 executes, for example, the processes shown in the following equations [1] and [2] as the detection process, and stores the result in the temporary storage unit.
Figure JPOXMLDOC01-appb-M000001
Figure JPOXMLDOC01-appb-M000002
Here, D (ε, a) is the distance between the input vector ε calculated using a predetermined distance scale and the node a, and A is a set of nodes. As the distance scale between the nodes, any distance scale such as Euclidean distance, cosine distance, Manhattan distance, and fractional distance can be applied. Generally, the Euclidean distance is used when the input data is low-dimensional, and the cosine distance, Manhattan distance and fractional distance are used when the input data is high-dimensional.
 なお、ここでは、勝者ノード検出部1と学習処理部4とを分けて記載しているが、勝者ノード検出部1は学習処理部4に含まれていてもよい。例えば、特許文献1及び2にかかるE-SOINNやLB-SOINNによって入力データの学習を行う場合には、入力ベクトルの学習において勝者ノード検出が行われる。この場合には、勝者ノード検出部1は学習処理部4に含まれ、学習処理で行われた勝者ノード検出結果をそのまま利用することができる。 Although the winner node detection unit 1 and the learning processing unit 4 are described separately here, the winner node detection unit 1 may be included in the learning processing unit 4. For example, when the input data is learned by E-SOINN or LB-SOINN according to Patent Documents 1 and 2, the winner node is detected in the learning of the input vector. In this case, the winner node detection unit 1 is included in the learning processing unit 4, and the winner node detection result performed in the learning process can be used as it is.
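A sketch of the detection in equations [1] and [2]: the first winner minimizes D(ε, a) over the node set A, and the second winner minimizes it over the remaining nodes. The Euclidean distance is used here purely as one example of D; as noted above, the cosine, Manhattan, or fractional distances could be used instead.

```python
import numpy as np

def find_winners(eps, node_weights):
    """node_weights: {node_id: weight vector}. Returns (first winner a1, second winner a2)."""
    eps = np.asarray(eps, dtype=float)
    # D(eps, a): Euclidean distance as an example distance scale
    dist = {a: float(np.linalg.norm(eps - np.asarray(w, dtype=float)))
            for a, w in node_weights.items()}
    a1 = min(dist, key=dist.get)                           # equation [1]
    a2 = min((a for a in dist if a != a1), key=dist.get)   # equation [2]
    return a1, a2

print(find_winners([0.0, 0.1], {"n1": [0.0, 0.0], "n2": [1.0, 1.0], "n3": [0.0, 0.3]}))
```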
ステップS3
 辺学習時間更新部2は、第1勝者ノードと第2勝者ノードとを接続する辺(勝利辺とも称する)の学習時間を所定値だけ増加させ、その結果を一時記憶部に格納する。また、辺学習時間更新部2は、勝利辺以外の、第1勝者ノードに接続する辺及び第2勝者ノードに接続する辺の学習時間を増加させ、その結果を一時記憶部に格納してもよい。
Step S3
The edge learning time update unit 2 increases the learning time of the edge (also referred to as the winning edge) connecting the first winner node and the second winner node by a predetermined value, and stores the result in the temporary storage unit. Further, even if the side learning time updating unit 2 increases the learning time of the side connected to the first winner node and the side connected to the second winner node other than the winning side, and stores the result in the temporary storage unit. good.
To emphasize the contribution of the winning edge, the amount by which the learning time of the edges connected to the first winner node and the edges connected to the second winner node other than the winning edge is increased is preferably smaller than the increase for the winning edge. The change in learning time of the edges other than the winning edge may also be negative, that is, a decrease. FIG. 5 shows cases of increasing the learning time. In FIG. 5, W is the winner node, Na is the node connected to the winner node W by the winning edge EW, and Nb is the node connected to the winner node W by the non-winning edge E. Cases 1 and 2 are simple cases in which only the learning time of the winning edge EW increases, by 1 and 2, respectively. Case 3 is a case in which the learning times of the winning edge EW and of the non-winning edge E both increase, by 3 and 1, respectively. Case 4 is a case in which the learning times of the winning edge EW and of the non-winning edge E change in opposite directions (positive and negative): the learning time of the winning edge EW increases by 1 and the learning time of the non-winning edge E decreases by 1. That is, it is sufficient that the learning time of the winning edge EW increases relative to the learning time of the non-winning edge E; as in case 4, the learning time of the non-winning edge E may even be decreased.
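The update of step S3 can be sketched as follows; this is only an illustration of case 3 above, with edges stored in a hypothetical dictionary keyed by node-id pairs and with illustrative increment values.

```python
# Sketch of step S3: increase the learning time of the winning edge by a larger
# amount than that of the other edges connected to the winner nodes (case 3).
def update_edge_learning_time(edges, a1, a2, win_inc=3, neighbor_inc=1):
    win_edge = frozenset((a1, a2))
    for edge in edges:
        if edge == win_edge:
            edges[edge] += win_inc          # winning edge
        elif a1 in edge or a2 in edge:
            edges[edge] += neighbor_inc     # other edges of the winner nodes
    return edges

edges = {frozenset((0, 1)): 5, frozenset((1, 2)): 2, frozenset((2, 3)): 7}
update_edge_learning_time(edges, 0, 1)
# edge (0,1): 5 -> 8, edge (1,2): 2 -> 3, edge (2,3): unchanged
```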
Step S4
The load balancing unit 3 has an input number determination unit 31, a target edge detection unit 32, an on-edge node insertion unit 33, a target edge deletion unit 34, and a new node edge connection unit 35, and performs load balancing based on steps S41 to S45.
Step S41
The input number determination unit 31 determines whether the number of input vectors that have been input has reached a predetermined number. If the number of input vectors has reached the predetermined number, the process proceeds to step S42; if it has not, the process proceeds to step S5.
In the present embodiment, the above input number determination is treated as step S41, a part of step S4, but the input number determination may instead be performed as a process separate from step S4. For example, in the third and fourth embodiments described later, the input number determination (step S22 in FIG. 29, and steps S70 and S72 in FIG. 31) is a process separate from step S4, and in FIGS. 29 and 31 step S4 comprises steps S42 to S45.
Step S42
The target edge detection unit 32 refers to the edge learning times of the edges stored in the temporary storage unit, detects edges having a relatively large edge learning time, and stores the result in the temporary storage unit. Hereinafter, an edge detected by the target edge detection unit 32 is referred to as a target edge. The target edge detection unit 32 detects, for example, edges having an edge learning time larger than a predetermined threshold TH1, and stores the result in the temporary storage unit. The threshold TH1 can be any positive value; for example, a value obtained by multiplying the average value T_AVE of the edge learning times of all edges stored in the temporary storage unit by a predetermined coefficient c (TH1 = c·T_AVE) may be used. Although all edges having an edge learning time larger than the threshold TH1 may be set as target edges, only some of them may be set as target edges. For example, among the edges having an edge learning time larger than the threshold TH1, only a predetermined number of edges, taken in descending order of edge learning time, may be set as target edges.
Alternatively, edges having a relatively large edge learning time may be detected for each class determined by classifying the nodes and edges. For example, a threshold TH1 may be prepared for each class, and target edges may be detected from each class. The classification method will be described later in connection with step S6.
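A minimal sketch of the detection of step S42 is shown below, assuming the threshold TH1 = c·T_AVE described above; the dictionary layout, the function name, and the value of c are illustrative assumptions.

```python
# Sketch of step S42: detect target edges whose learning time exceeds TH1 = c * T_AVE,
# optionally keeping only a fixed number of them in descending order of learning time.
def detect_target_edges(edge_times, c=2.0, max_edges=None):
    t_ave = sum(edge_times.values()) / len(edge_times)   # average edge learning time T_AVE
    th1 = c * t_ave                                       # TH1 = c * T_AVE
    targets = [e for e, t in edge_times.items() if t > th1]
    targets.sort(key=lambda e: edge_times[e], reverse=True)
    return targets if max_edges is None else targets[:max_edges]

edge_times = {frozenset((0, 1)): 30, frozenset((1, 2)): 2, frozenset((2, 3)): 4}
print(detect_target_edges(edge_times))  # only the edge (0, 1) exceeds TH1 = 24
```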
Step S43
The on-edge node insertion unit 33 generates a new node at a predetermined position on a detected target edge, inserts it into the network, and stores the result in the temporary storage unit. Hereinafter, the predetermined position on the target edge at which the new node is inserted is referred to as the new node insertion position. The new node insertion position may be set in various ways; for example, it may be the midpoint of the target edge, as shown in FIG. 6.
In the above, the new node insertion position is the midpoint of the edge, but the new node insertion position may instead be determined based on the numbers of wins of the two nodes at both ends of the target edge. This is described concretely below. Let the nodes at both ends of the target edge be N1 and N2, their numbers of wins be VN1 and VN2, respectively, and the length of the target edge be L. In this case, the new node insertion position is the position on the target edge separated from node N1 by L·VN2/(VN1+VN2), or equivalently separated from node N2 by L·VN1/(VN1+VN2), that is, the center of gravity based on the numbers of wins. For example, as shown in FIG. 7, when the number of wins of node N1 is 20 and the number of wins of node N2 is 80, the new node insertion position is the position separated from node N1 by 4/5·L, that is, separated from node N2 by 1/5·L.
The number of wins of a node is a value indicating how many times that node has been selected as the winner, where the winner is, for example, the first winner. FIG. 8 shows the relationship between an edge and the numbers of wins of its nodes. Node Ni and node Nj are connected to the two ends of edge EA. Here, the number of wins VNi_A of node Ni indicates the number of times node Ni became the winner when edge EA is the target edge, and similarly the number of wins VNj_A of node Nj indicates the number of times node Nj became the winner when edge EA is the target edge. In this way, the number of wins of a node is held for each edge connected to that node; when a node is connected to a plurality of edges, it holds a number of wins for each of those edges becoming the target edge.
A specific example is described below. FIG. 9 shows an example in which four nodes are connected in a star shape. In this example, node Ni is connected to nodes Nj, Nk, and Nm by edges EA, EB, and EC, respectively. In this case, nodes Nj, Nk, and Nm hold the numbers of wins VNj_A, VNk_B, and VNm_C for their respective connected edges EA, EB, and EC being the target edge. In contrast, since the three edges EA, EB, and EC are connected to the central node Ni, node Ni holds a number of wins for each of these edges: VNi_A, VNi_B, and VNi_C. Consequently, when the input data near the edges is biased, for example when there is much input data near edge EA and little input data near edges EB and EC, the number of wins VNi_A becomes larger than the numbers of wins VNi_B and VNi_C, so such a bias in the input data can be reflected in the numbers of wins. This makes it possible to generate new nodes that are more faithful to the distribution of the input data.
However, the number of wins is not limited to this and may be changed as appropriate according to the usage situation. For example, a single number of wins unique to each node may be used, like the node learning time described in Patent Document 2. In this case, the same node has one number of wins regardless of the target edge, so the processing becomes simpler and the processing load can be reduced. Needless to say, the number of wins may also take into consideration the number of times a node becomes the second winner.
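A minimal sketch of the insertion position of step S43 follows, covering both the midpoint case and the win-count-weighted case described above; weight vectors are assumed to be numpy arrays and the names are illustrative.

```python
# Sketch of step S43: position of the new node on the target edge N1-N2.
# With equal win counts this is the midpoint; otherwise it is the centre of
# gravity, at distance L * VN2 / (VN1 + VN2) from N1.
import numpy as np

def new_node_position(w_n1, w_n2, v_n1=1, v_n2=1):
    ratio = v_n2 / (v_n1 + v_n2)
    return w_n1 + ratio * (w_n2 - w_n1)

w_n1, w_n2 = np.array([0.0, 0.0]), np.array([10.0, 0.0])
print(new_node_position(w_n1, w_n2))          # midpoint: [5. 0.]
print(new_node_position(w_n1, w_n2, 20, 80))  # 4/5 of the way from N1: [8. 0.]
```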
Step S44
After the new node has been inserted, the target edge deletion unit 34 deletes the target edge connecting node N1 and node N2 from the neural network, as shown in FIG. 10, and stores the result in the temporary storage unit. For example, the target edge deletion unit 34 executes the operation shown in the following equation [3] on the edge set stored in the temporary storage unit, and stores the result in the temporary storage unit. Here, C denotes the edge set, and (N1, N2), for example, denotes the edge connecting node N1 and node N2.
C ← C \ {(N1, N2)}    ... [3]
Step S45
As shown in FIG. 11, the new node edge connection unit 35 generates two edges, each connecting the inserted new node N to one of the two nodes N1 and N2 that were connected by the deleted target edge, and stores the result in the temporary storage unit. For example, the new node edge connection unit 35 executes the operation shown in the following equation [4] on the edge set stored in the temporary storage unit, and stores the result in the temporary storage unit.
C ← C ∪ {(N1, N), (N, N2)}    ... [4]
The two edges connected to the new node may take over the learning time of the deleted target edge according to a predetermined rule. For example, the learning time of the target edge may be distributed evenly between the learning times of the two newly inserted edges.
Alternatively, for example, the learning time of the target edge may be distributed between the learning times of the two newly inserted edges based on the numbers of wins of the two nodes at both ends of the target edge. As above, let the nodes at both ends of the target edge be N1 and N2 with numbers of wins VN1 and VN2, respectively, and let ST be the learning time of the target edge. In this case, as shown in FIG. 12, the learning time of the edge E1 connecting node N1 and the new node may be set to ST·VN1/(VN1+VN2), and the learning time of the edge E2 connecting node N2 and the new node may be set to ST·VN2/(VN1+VN2). For example, as shown in FIG. 10, when the number of wins of node N1 is 20, the number of wins of node N2 is 80, and the learning time of the target edge is 100, the learning time ST1 of the edge E1 connecting node N1 and the new node is 20, and the learning time ST2 of the edge E2 connecting node N2 and the new node is 80.
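Steps S44 and S45, together with the win-count-based distribution of the learning time just described, can be sketched as follows; the edge store is a hypothetical dictionary keyed by node-id pairs.

```python
# Sketch of steps S44 and S45: delete the target edge (equation [3]), connect the
# new node to N1 and N2 (equation [4]), and split the learning time ST of the
# deleted edge in proportion to the win counts VN1 and VN2.
def split_target_edge(edges, n1, n2, new_node, v_n1, v_n2):
    st = edges.pop(frozenset((n1, n2)))                              # step S44
    edges[frozenset((n1, new_node))] = st * v_n1 / (v_n1 + v_n2)     # edge E1
    edges[frozenset((new_node, n2))] = st * v_n2 / (v_n1 + v_n2)     # edge E2, step S45
    return edges

edges = {frozenset((0, 1)): 100.0}
split_target_edge(edges, 0, 1, new_node=2, v_n1=20, v_n2=80)
# learning time 100 is split into 20 for edge (0,2) and 80 for edge (2,1)
```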
Step S5
After that, the clustering unit 5, for example, determines whether processing of the input vectors is complete, that is, whether all input vectors have been input. If the total number of input vectors ε given so far, stored in the temporary storage unit, does not match the total number of input vectors, the process returns to step S1 and the next input vector ε is processed. On the other hand, if the total number of input vectors ε matches the total number of input vectors, the following step S6 is executed. The method of determining termination is not limited to this; for example, a termination instruction may be given by the user.
Step S6
The clustering unit 5 refers to the processed, load-balanced nodes and edges stored in the temporary storage unit and performs class classification. Various classification techniques can be applied to classifying the nodes constituting the network; for example, processing similar to that of LB-SOINN in Non-Patent Document 2 may be performed.
In the present embodiment, the class classification of step S6 is performed when it is determined in step S5 that processing has been completed, but this is merely an example, and the classification may be performed at any other timing. For example, it may be executed after the load balancing of step S4 has been performed, or it may be executed whenever the total number of input vectors reaches an integer multiple of another predetermined unit number (λ') different from the predetermined unit number (λ). If the nodes and edges have been classified at the time of load balancing, load balancing can also be performed for each class, as described above.
Next, the effect of the load balancing process in the information processing apparatus 100 according to the first embodiment will be described with reference to experimental results. Here, as shown in FIG. 13, data was generated by combining two regions in which two-dimensional input vectors are distributed in an arc shape, and, as shown in FIG. 14, the generated data with added noise was used as the input data.
FIG. 15 shows the result of learning the input data with an algorithm obtained by removing load balancing from the LB-SOINN of Patent Document 2. In this case, because load balancing is not performed, the two clusters cannot be distinguished.
Next, FIG. 16 shows the result of learning the input data with the LB-SOINN of Patent Document 2. In this case, because of the influence of noise described above, the load balancing of LB-SOINN generates nodes between the clusters and makes them excessively dense, so the two clusters cannot be distinguished.
Next, FIG. 17 shows the result of processing the input data with the load balancing according to the first embodiment applied. In this case, since load balancing is performed on edges having a relatively large edge learning time, node generation on edges with a small learning time, that is, between the clusters, is suppressed, and the distribution of the input vectors is reflected more accurately. Moreover, since nodes between the clusters are suppressed, the generation of edges connecting nodes inside a cluster to nodes between the clusters is also suppressed, and as a result the number of edges connecting the clusters is reduced. That is, it can be understood that the information processing apparatus 100 according to the first embodiment can express the distribution of the input vectors more accurately.
Embodiment 2
Load balancing according to the second embodiment will now be described. In the first embodiment, load balancing is performed by adding a new node on an edge having a relatively large edge learning time. In contrast, in the present embodiment, in order to reflect the distribution of the input data more accurately, load balancing is performed by adding a new node on an edge for which both the edge learning time and the length are relatively large.
The following description refers to experimental results. Here, as shown in FIG. 18, data consisting of input vectors densely packed in a central portion and input vectors distributed in a ring around the central portion was generated, and, as shown in FIG. 19, the generated data with added noise was used as the input data.
FIG. 20 shows the result of learning the input data with an algorithm obtained by removing load balancing from the LB-SOINN of Patent Document 2. In this case, because load balancing is not performed, the density of the input vectors cannot be expressed by the nodes: nodes exist uniformly in both the central portion and the surrounding annular portion, and the clusters are connected by many edges. As a result, it can be understood that the distribution of the input vectors is not accurately expressed.
FIG. 21 shows the result of learning the input data with the LB-SOINN of Patent Document 2. In this case, although the density of the nodes in the central portion is expressed, the nodes are excessively dense and, because of the influence of noise, the clusters are connected by many edges, so it can be understood that the clusters again cannot be separated.
Next, FIG. 22 shows the result of processing the input data with the load balancing according to the first embodiment applied. In this case, compared with the LB-SOINN result of FIG. 21, the crowding of nodes in the central portion is suppressed, although an excessive concentration of nodes in the central portion still remains. Since load balancing is performed on edges having a relatively large edge learning time, node generation on edges with a small learning time is suppressed, and the sparseness and density of the nodes can be expressed. However, nodes are generated excessively in the central portion, and because of this large number of nodes there is a high probability that edges connecting the annular portion and the central portion are generated by noise or the like. As a result, about ten edges connecting the clusters remain.
When load balancing is performed by adding a new node on an edge having a relatively large edge learning time, as in the first embodiment, the edge on which the new node was generated is deleted and edges shorter than the deleted edge are generated. Moreover, since the edge learning time tends to increase where the distribution density of the input vectors is large, as learning of the input vectors and the load balancing process proceed, the learning times of the edges generated by load balancing grow large, particularly where the distribution density of the input vectors is large, and further load balancing tends to be executed on them. Load balancing then generates further new nodes on the previously generated short edges and generates even shorter edges. As a result, it can be understood that the crowding of nodes in the central portion shown in FIG. 22 arises in this way.
FIG. 23 shows a distribution of input vectors. The horizontal axis of FIG. 23 represents one component of the input vectors. FIG. 23 assumes data with a large amount of noise, in which a single cluster contains a subcluster SC1 represented by the large peak on the left and a subcluster SC2 represented by the small peak on the right. When the load balancing described in the first embodiment is applied to the nodes and edges generated by learning input vectors distributed in this way, load balancing is performed centered on the subcluster SC1, which has the highest node density (broken line portion).
FIG. 24 shows the node distribution after the load balancing described in the first embodiment is applied. As shown in FIG. 24, as a result of load balancing being performed many times in the left subcluster SC1, more nodes than necessary are generated and the node density increases locally. As a result, the node distribution becomes biased, and the unnecessary node generation reduces the processing speed.
In contrast, in the present embodiment, not only the edge learning time but also the edge length is evaluated, so that new node generation on excessively short edges is prevented and the crowding of nodes caused by the balancing process is alleviated. The target edge detection (step S42) in the load balancing according to the second embodiment is described below.
The target edge detection unit 32 refers to the edge learning times of the edges stored in the temporary storage unit, detects edges having a relatively large edge learning time, and stores the result in the temporary storage unit. The target edge detection unit 32 then refers to the edges with a relatively large edge learning time stored in the temporary storage unit and detects, among them, edges having a relatively long length. Hereinafter, an edge detected by the target edge detection unit 32, that is, an edge whose edge learning time is relatively large and whose length is relatively long, is referred to as a target edge.
As in the first embodiment, the target edge detection unit 32 detects, for example, edges having an edge learning time larger than a predetermined threshold TH1, and stores the result in the temporary storage unit. The threshold TH1 can be any positive value; for example, a value obtained by multiplying the average value T_AVE of the edge learning times of all edges stored in the temporary storage unit by a predetermined coefficient c (TH1 = c·T_AVE) may be used. Although all edges having an edge learning time larger than the threshold TH1 may be used, only some of them may be used. For example, among the edges having an edge learning time larger than the threshold TH1, only a predetermined number of edges, taken in descending order of edge learning time, may be used.
Next, the target edge detection unit 32 detects, from among the edges having an edge learning time larger than the predetermined threshold TH1, edges longer than a threshold LTH, for example, and stores the result in the temporary storage unit. The threshold LTH can be any positive value; for example, a value obtained by multiplying the average value L_AVE of the lengths of all edges stored in the temporary storage unit by a predetermined coefficient d (LTH = d·L_AVE) may be used.
Alternatively, edges whose edge learning time is relatively large and whose length is relatively long may be detected for each class determined by classifying the nodes and edges. For example, a threshold TH1 and a threshold LTH may be prepared for each class, and target edges may be detected from each class.
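A minimal sketch of the target edge detection of the second embodiment follows, using both thresholds TH1 = c·T_AVE and LTH = d·L_AVE; edge lengths are assumed to be precomputed, and the coefficients and names are illustrative assumptions.

```python
# Sketch of step S42 in embodiment 2: an edge is a target edge only if its
# learning time exceeds TH1 = c * T_AVE and its length exceeds LTH = d * L_AVE.
def detect_target_edges_e2(edge_times, edge_lengths, c=1.0, d=1.0):
    t_ave = sum(edge_times.values()) / len(edge_times)
    l_ave = sum(edge_lengths.values()) / len(edge_lengths)
    th1, lth = c * t_ave, d * l_ave
    return [e for e in edge_times
            if edge_times[e] > th1 and edge_lengths[e] > lth]

edge_times = {"e1": 30, "e2": 40, "e3": 2}        # e1 is a short, heavily trained edge
edge_lengths = {"e1": 0.1, "e2": 2.0, "e3": 1.5}
print(detect_target_edges_e2(edge_times, edge_lengths))  # ['e2'] only
```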
The effect of the load balancing according to the present embodiment will now be described. FIG. 25 shows a conceptual illustration of the load balancing according to the second embodiment. In the load balancing according to the first embodiment, only the edge learning time is referred to when detecting edges to be subjected to load balancing; the edge length is not referred to. Therefore, as shown in FIG. 25, edges whose edge learning time is larger than the threshold but whose length is smaller than the threshold are also detected as target edges (region A1+A2 in FIG. 25). As a result, the number of target edges increases, causing excessive crowding of new nodes.
In contrast, in the present embodiment, not only the edge learning time but also the edge length is referred to when detecting edges to be subjected to load balancing; therefore, as shown in FIG. 25, only edges whose edge learning time is larger than the threshold and whose length is larger than the threshold are detected as target edges (region A1 in FIG. 25). As a result, compared with the first embodiment, short edges can be prevented from being detected as target edges. FIG. 26 shows the node distribution after the load balancing according to the second embodiment is applied under the input vector distribution of FIG. 23, and FIG. 27 shows the result of processing the input data with the load balancing according to the second embodiment applied. As shown in FIG. 26, load balancing in the left subcluster, which contains many short edges, is suppressed, and crowding of new nodes can be prevented.
By performing the load balancing according to the present embodiment, as shown in FIG. 27, the crowding of nodes in the central portion is greatly suppressed compared with FIG. 22, and the unevenness of the node density is alleviated. Moreover, the number of edges between the clusters is also greatly reduced, and it can be understood that the distribution of the input data can be learned more accurately.
Embodiment 3
In the third embodiment, as a specific example of processing that performs the load balancing according to the first embodiment, an example in which the load balancing of the LB-SOINN of Patent Document 2 is replaced with the load balancing according to the first embodiment will be described. In the third embodiment, the term "learning" is used for the series of processes including not only the construction of the neural network but also the load balancing process.
FIG. 28 schematically shows the configuration of the information processing apparatus 300 according to the third embodiment. The learning processing unit 4 includes an input information acquisition unit 41, a node density update determination unit 42, a node density calculation unit 43, a node insertion determination unit 44, a node insertion unit 45, an edge connection determination unit 46, an edge connection unit 47, a winner node learning time calculation unit 48, a weight vector update unit 49, and an aged edge deletion unit 50. The clustering unit 5 includes a subcluster determination unit 51, a noise node deletion unit 52, a learning end determination unit 53, and a class determination unit 54. The information processing apparatus 300 further includes an output information display unit 6.
In the present embodiment, the method described in <3: Framework for combining new distance measures> of Patent Document 2 is also used. Specifically, the distance measure shown in equation 14 of Patent Document 2 is used, and more specifically equation 17 is used. As described in Patent Document 2, using this distance measure requires the minimum and maximum inter-node distance values used for normalizing each distance measure. Since the minimum and maximum inter-node distance values change when a new input vector is input to the network, this point must also be taken into account; how it is taken into account is described later.
In the present embodiment, the method described in <5: Definition of a new node density and its calculation method> of Patent Document 2 is also used. Specifically, the vector di of the average distances from adjacent nodes for node i described in equation 23, the vector pi of node density point values described in equation 24, the vector si of cumulative node density point values of node i described in equation 25, and the node density hi described in equation 26 are used.
FIG. 29 shows a flowchart of the operation of the information processing apparatus 300 according to the third embodiment.
Step S11
The input information acquisition unit 41 acquires an n-dimensional input vector as information given as input to the information processing apparatus 300, stores the acquired input vector in the temporary storage unit (for example, the RAM 13), and sequentially inputs it to the neural network stored in the temporary storage unit. Specifically, as an initialization process, the input information acquisition unit 41 initializes the node set A and the edge set C ⊂ A × A as empty sets, and stores the result in the temporary storage unit. As a semi-initialization process, when the number of nodes included in the node set A is one or less, input vectors are acquired at random so that the number of nodes becomes two, the corresponding nodes are added to the node set A, and the result is stored in the temporary storage unit. Next, as an input process, a new input vector ε ∈ R^n is input, and the result is stored in the temporary storage unit. The initialization process is executed only once, immediately after processing starts, and is not executed thereafter. The semi-initialization process is executed only when the number of nodes included in the node set A is one or less, and is not executed otherwise. For example, when the node set A already contains two or more nodes, as in inputs after the first, only the input process is executed.
Step S12
The node density update determination unit 42 refers to the nodes and to the minimum and maximum inter-node distance values based on each distance measure stored in the temporary storage unit, checks whether at least one of the minimum and maximum inter-node distance values based on each distance measure has changed, determines that the node density is to be updated if at least one value has changed, and stores the result in the temporary storage unit. As noted above, this process and the process of step S13 take into account the fact that the minimum and maximum inter-node distance values change when a new input vector is input to the network.
Step S13
When the determination result stored in the temporary storage unit indicates that the node density is to be updated, the node density calculation unit 43 refers to the nodes, the vectors of cumulative node density point values, the node learning times, and the minimum and maximum inter-node distance values based on each distance measure stored in the temporary storage unit. Based on the vectors of cumulative node density point values, the node learning times, and the minimum and maximum inter-node distance values based on each distance measure, it recalculates and updates the vector si of cumulative node density point values of each node i ∈ A included in the node set A, recalculates the node density hi of node i using the updated vector si of cumulative node density point values of node i, and stores the result in the temporary storage unit. The node density calculation unit 43 recalculates and updates the vector si of cumulative node density point values and the node density hi of node i by executing, for example, the calculation processing shown in equations (27) to (30) and equation (26) of Patent Document 2.
Step S2
When it is determined in step S12 that the node density is not to be updated, or after the node density has been calculated in step S13, the winner node detection unit 1 detects the first winner node and the second winner node in the same manner as in the information processing apparatus 100 according to the first embodiment, and stores the result in the temporary storage unit.
Step S14
The node insertion determination unit 44 determines whether or not to execute node insertion by referring to the target input vector, the nodes, and the node similarity thresholds (described later) stored in the temporary storage unit. This is described concretely below.
The node insertion determination unit 44 refers to the nodes, including the first winner node and the second winner node, stored in the temporary storage unit, and calculates a similarity threshold Ti with the first winner node or the second winner node as the node i of interest. First, the node insertion determination unit 44 determines whether node i has any adjacent nodes, and stores the result in the temporary storage unit.
If the determination result stored in the temporary storage unit indicates that node i has adjacent nodes j, the node insertion determination unit 44 calculates, as the similarity threshold Ti, the distance from node i to the adjacent node farthest from it, as shown in equation [5], and stores the result in the temporary storage unit. Here, D(i, j) is the distance between node i and node j calculated using the above-described distance measure.
Ti = max_{j ∈ Ni} D(i, j)    ... [5]   (Ni: the set of nodes adjacent to node i)
If the determination result stored in the temporary storage unit indicates that node i has no adjacent nodes, the node insertion determination unit 44 calculates, as the similarity threshold Ti, the distance from node i to the nearest node j other than node i, as shown in equation [6], and stores the result in the temporary storage unit.
Ti = min_{j ∈ A\{i}} D(i, j)    ... [6]
In this way, the node insertion determination unit 44 calculates the similarity threshold Ta1 of the first winner node a1 and the similarity threshold Ta2 of the second winner node a2, and stores the results in the temporary storage unit.
Next, the node insertion determination unit 44 determines that node insertion is to be executed if the distance D(ε, a1) between the input vector ε and the first winner node a1 is larger than the similarity threshold Ta1 of the first winner node a1 (D(ε, a1) > Ta1), or if the distance D(ε, a2) between the input vector ε and the second winner node a2 is larger than the similarity threshold Ta2 of the second winner node a2 (D(ε, a2) > Ta2); otherwise it determines that node insertion is not to be executed. The result is stored in the temporary storage unit.
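A minimal sketch of the similarity thresholds of equations [5] and [6] and of the insertion decision of step S14 follows, again using the Euclidean distance for D; the adjacency structure and names are illustrative assumptions.

```python
# Sketch of step S14: T_i is the distance to the farthest neighbour if node i has
# neighbours (equation [5]), otherwise the distance to the nearest other node
# (equation [6]); a node is inserted when eps lies outside a winner's threshold.
import numpy as np

def similarity_threshold(i, nodes, neighbors):
    others = neighbors[i] if neighbors[i] else [j for j in nodes if j != i]
    dists = [np.linalg.norm(nodes[i] - nodes[j]) for j in others]
    return max(dists) if neighbors[i] else min(dists)

def should_insert_node(eps, a1, a2, nodes, neighbors):
    d1 = np.linalg.norm(eps - nodes[a1])
    d2 = np.linalg.norm(eps - nodes[a2])
    return (d1 > similarity_threshold(a1, nodes, neighbors) or
            d2 > similarity_threshold(a2, nodes, neighbors))

nodes = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 0.0]), 2: np.array([5.0, 5.0])}
neighbors = {0: [1], 1: [0], 2: []}
print(should_insert_node(np.array([3.0, 3.0]), 0, 1, nodes, neighbors))  # True
```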
Step S15
When it is determined in step S14 that node insertion is to be executed, the node insertion unit 45 refers to the determination result of the node insertion determination unit 44 stored in the temporary storage unit and, treating the target input vector ε as a node to be newly added to the network, generates an insertion node whose weight vector has the same components as the target input vector ε, inserts the generated insertion node into the network, and stores the result in the temporary storage unit. The process then proceeds to step S5.
Step S16
When it is determined in step S14 that node insertion is not to be executed, the edge connection determination unit 46 refers to the nodes and their subcluster labels stored in the temporary storage unit, determines the subclusters to which the first winner node and the second winner node belong based on the subcluster labels of the nodes, and stores the result in the temporary storage unit. Here, the subcluster label of a node is label information indicating the subcluster to which that node belongs. A cluster is a set of nodes, among the nodes included in a mixed class, that are connected by edges. A subcluster is a subset of a cluster consisting of nodes to which the same subcluster label has been assigned.
If, according to the determination result stored in the temporary storage unit, at least one of the first winner node and the second winner node does not belong to any subcluster, or the first winner node and the second winner node belong to the same subcluster, the edge connection determination unit 46 determines that an edge is to be connected between the first winner node and the second winner node, and stores the result in the temporary storage unit.
If, according to the determination result stored in the temporary storage unit, the first winner node and the second winner node belong to different subclusters (for example, the first winner node belongs to subcluster SC1 and the second winner node belongs to subcluster SC2), the edge connection determination unit 46 determines whether at least one of the following is satisfied: the node density condition for the first winner node based on the average node density of the subcluster including the first winner node (the following equation [7]), and the node density condition for the second winner node based on the average node density of the subcluster including the second winner node (the following equation [8]). The result is stored in the temporary storage unit.
[Equation 7: node density condition for the first winner node]

[Equation 8: node density condition for the second winner node]
In equations [7] and [8], ha1 denotes the node density of the first winner node and ha2 denotes the node density of the second winner node. min(ha1, ha2) denotes the smaller of the node density ha1 of the first winner node and the node density ha2 of the second winner node. hSC1 denotes the node density of the node having the highest node density among the nodes included in the subcluster SC1 to which the first winner node a1 belongs, and hSC2 denotes the node density of the node having the highest node density among the nodes included in the subcluster SC2 to which the second winner node a2 belongs. hmSC1 denotes the average node density of all nodes included in subcluster SC1, and hmSC2 denotes the average node density of all nodes included in subcluster SC2. θ is a parameter whose appropriate value is determined and set in advance by the user, within the range [1, 2]. θ is a tolerance parameter used to determine how much difference between subclusters contained in one class is tolerated; this tolerance becomes smaller as θ increases.
The node density condition for the first winner node a1 shown in equation [7] is a condition for judging whether the smaller of the node density ha1 of the first winner node a1 and the node density ha2 of the second winner node a2 is larger than a threshold calculated, with the average node density hmSC1 of the subcluster SC1 including the first winner node a1 as a reference, according to the ratio of the maximum node density hSC1 to the average node density hmSC1 of subcluster SC1. Similarly, the node density condition for the second winner node a2 shown in equation [8] is a condition for judging whether the smaller of the node density ha1 of the first winner node a1 and the node density ha2 of the second winner node a2 is larger than a threshold calculated, with the average node density hmSC2 of the subcluster SC2 including the second winner node a2 as a reference, according to the ratio of the maximum node density hSC2 to the average node density hmSC2 of subcluster SC2.
If at least one of equations [7] and [8] is satisfied, the edge connection determination unit 46 determines that an edge is to be connected between the first winner node and the second winner node; otherwise, it determines that no edge is to be connected between the first winner node and the second winner node, and stores the result in the temporary storage unit. When it is determined that no edge is to be connected, the edge connection unit 47 does not connect an edge between the first winner node and the second winner node (and deletes the edge if one already exists between them), stores the result in the temporary storage unit, and the process proceeds to step S20.
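The decision flow of step S16 can be sketched as follows. Because the exact thresholds of equations [7] and [8] are given only in the drawings, they are represented here by a caller-supplied function; the example threshold shown is purely illustrative and is not the formula of the embodiment.

```python
# Sketch of step S16: connect when a winner has no subcluster label or both
# winners share a label; otherwise connect only if the minimum winner density
# exceeds the (caller-supplied) threshold of subcluster SC1 or SC2.
def should_connect(label_a1, label_a2, h_a1, h_a2, stats, threshold):
    if label_a1 is None or label_a2 is None or label_a1 == label_a2:
        return True
    h_min = min(h_a1, h_a2)
    return (h_min > threshold(stats[label_a1]) or    # condition of equation [7]
            h_min > threshold(stats[label_a2]))      # condition of equation [8]

# Purely illustrative threshold: half of the subcluster's peak density.
stats = {"SC1": {"mean": 1.0, "max": 4.0}, "SC2": {"mean": 2.0, "max": 2.5}}
thr = lambda s: 0.5 * s["max"]
print(should_connect("SC1", "SC2", 1.8, 2.2, stats, thr))  # True (1.8 > 1.25 for SC2)
```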
Step S17
When the determination result of step S16 stored in the temporary storage unit indicates that an edge is to be connected, the edge connection unit 47 connects an edge between the first winner node and the second winner node, and stores the result in the temporary storage unit. If an edge already exists between the first winner node and the second winner node, that edge is maintained. The edge connection unit 47 also sets the age of the edge determined to be connected in the above process to 0, and stores the result in the temporary storage unit.
Step S18
The node density calculation unit 43 refers to the nodes, the minimum and maximum inter-node distance values based on each distance measure, the vectors of average distances from adjacent nodes, the vectors of node density point values, the vectors of cumulative node density point values, and the node densities stored in the temporary storage unit. Taking the first winner node a1 as node i, it calculates the vector di of average distances from the adjacent nodes of node i based on the distances from node i to its adjacent nodes under each distance measure and on the minimum and maximum inter-node distance values for each distance measure; calculates the vector pi of node density point values of the first winner node a1 based on the calculated vector di; calculates the vector si of cumulative node density point values based on the calculated vector pi of node density point values of the first winner node a1; calculates the node density hi of the first winner node a1 based on the calculated vector si of cumulative node density point values of the first winner node a1; and stores the results in the temporary storage unit. The node density calculation unit 43 calculates the vector si of cumulative node density point values and the node density hi of node i by executing, for example, the calculation processing shown in equations (24) to (26) of Patent Document 2, stored in the temporary storage unit.
Step S19
The winner node learning time calculation unit 48 increases the learning time Ma1 of the first winner node a1 stored in the temporary storage unit by a predetermined value, and stores the result in the temporary storage unit. For example, by executing the process Ma1(t+1) = Ma1(t) + 1, the winner node learning time calculation unit 48 increases the learning time Ma1 of the first winner node a1 by 1 and stores the result in the temporary storage unit.
Step S20
The weight vector update unit 49 updates the weight vectors of the nodes stored in the temporary storage unit so that the weight vectors of the first winner node a1 and of its adjacent nodes each move closer to the input vector ε, and stores the result in the temporary storage unit. Using, for example, equations (33) and (34) of Patent Document 2, the weight vector update unit 49 calculates the update amount ΔWa1 for the weight vector Wa1 of the first winner node a1 and the update amount ΔWj for the weight vector Ws1 of the adjacent node j of the first winner node a1 based on the learning time Ma1, adds the update amount ΔWa1 to the weight vector Wa1 of the first winner node a1, adds the update amount ΔWj to the weight vector Ws1 of the adjacent node j, and stores the result in the temporary storage unit.
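The update of step S20 can be sketched as follows. The embodiment relies on equations (33) and (34) of Patent Document 2 for the exact update amounts; the learning rates 1/M and 1/(100·M) used below are the form commonly used in SOINN-family methods and are shown only as an illustration.

```python
# Sketch of step S20: pull the first winner (strongly) and its neighbours
# (weakly) toward the input vector, with rates decreasing in the learning time M.
import numpy as np

def update_weights(eps, a1, nodes, neighbors, m_a1):
    nodes[a1] = nodes[a1] + (eps - nodes[a1]) / m_a1                 # winner update
    for j in neighbors[a1]:
        nodes[j] = nodes[j] + (eps - nodes[j]) / (100.0 * m_a1)      # neighbour update
    return nodes

nodes = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 0.0])}
neighbors = {0: [1], 1: [0]}
update_weights(np.array([2.0, 0.0]), 0, nodes, neighbors, m_a1=2)
# node 0 moves halfway toward eps ([1., 0.]); node 1 moves only slightly
```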
Step S21
The aged edge deletion unit 50 refers to the nodes, the edges between nodes, and the edge ages stored in the temporary storage unit, increases by a predetermined value the ages of all edges directly connected to the first winner node, and stores the result in the temporary storage unit. The aged edge deletion unit 50 also deletes, from among the edges stored in the temporary storage unit, any edge whose age exceeds a predetermined threshold that has been set in advance and stored in the temporary storage unit, and stores the result in the temporary storage unit.
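A minimal sketch of step S21 follows, with edge ages kept in a hypothetical dictionary; the increment and the maximum age are illustrative values.

```python
# Sketch of step S21: age every edge incident to the first winner, then delete
# edges whose age exceeds a preset maximum.
def age_and_prune_edges(edge_ages, a1, age_inc=1, age_max=50):
    for e in edge_ages:
        if a1 in e:
            edge_ages[e] += age_inc
    for e in [e for e, age in edge_ages.items() if age > age_max]:
        del edge_ages[e]
    return edge_ages

edge_ages = {frozenset((0, 1)): 50, frozenset((1, 2)): 3}
print(age_and_prune_edges(edge_ages, 1))  # the edge (0, 1) ages past 50 and is removed
```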
Step S3
The edge learning time update unit 2 updates the edge learning time in the same way as in the first embodiment. The details of the edge learning time update are the same as in the first embodiment, so their description is omitted. However, if the winning edge does not exist, the edge learning time is not updated.
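The edge learning time update itself is described in the first embodiment, outside this passage; the sketch below is therefore an assumption-based reading of claims 1 and 2: the edge between the two winners (the winning edge) gains a first value, the other edges touching either winner gain a smaller second value, and nothing is updated when the winning edge does not exist. The value 0.1 for the second value is only an example.

```python
def update_edge_learning_time(edge_time, neighbors, w1, w2,
                              first_value=1.0, second_value=0.1):
    """Increase the learning time of the edge between the first winner w1 and
    the second winner w2 by first_value, and give every other edge incident
    to w1 or w2 the smaller second_value (claim 2). Skips the update entirely
    when the winning edge does not exist.
    """
    winning = frozenset((w1, w2))
    if winning not in edge_time:
        return                                  # no winning edge: no update
    edge_time[winning] += first_value
    for node in (w1, w2):
        for j in neighbors[node]:
            e = frozenset((node, j))
            if e != winning and e in edge_time:
                edge_time[e] += second_value
```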
Step S22
As in step S41 of the information processing apparatus 100 according to the first embodiment, the input number determination unit 31 determines whether the total number of input vectors given so far, stored in the temporary storage unit, is a multiple of a predetermined unit number set in advance and stored in the temporary storage unit, and stores the result in the temporary storage unit. If the number of input vectors is a multiple of the predetermined unit number, the process proceeds to step S4; otherwise, the process proceeds to step S5.
Step S4
The subsequent load balancing is the same as in the first embodiment, so its description is omitted. However, since the processing corresponding to step S41 has already been executed in step S22 above, only the processing of steps S42 to S45 is executed here.
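Steps S42 to S45 themselves belong to the first embodiment and are not reproduced in this passage; the following sketch is an assumption-based reading of claims 3 to 12: edges whose learning time exceeds a threshold are split by inserting a new node at the win-count-weighted centroid of the edge, and the two replacement edges inherit the deleted edge's learning time in proportion to the endpoints' win counts.

```python
import numpy as np

def balance_load(weights, wins, edge_time, threshold):
    """Split every edge whose edge learning time exceeds `threshold`.

    weights: dict node_id -> np.ndarray weight vector
    wins:    dict node_id -> number of wins (learning time M) of the node
    edge_time: dict frozenset({a, b}) -> edge learning time
    Returns the ids of the newly created nodes; all names are illustrative.
    """
    new_nodes = []
    next_id = max(weights) + 1
    for e in [e for e, t in edge_time.items() if t > threshold]:
        a, b = tuple(e)
        wa, wb = wins[a], wins[b]
        total = wa + wb if (wa + wb) > 0 else 1
        # New node at the centroid weighted by the endpoints' win counts.
        weights[next_id] = (wa * weights[a] + wb * weights[b]) / total
        wins[next_id] = 0                      # design choice in this sketch
        t_old = edge_time.pop(e)               # delete the selected edge
        # The two new edges inherit the old learning time proportionally.
        edge_time[frozenset((a, next_id))] = t_old * wa / total
        edge_time[frozenset((b, next_id))] = t_old * wb / total
        new_nodes.append(next_id)
        next_id += 1
    return new_nodes
```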
Step S51
After load balancing, the subcluster determination unit 51 operates on the nodes, the edges between nodes, the subcluster labels of the nodes, the node densities, and the Voronoi regions stored in the temporary storage unit. It takes each node having a locally maximal node density as an apex and assigns a different subcluster label to each apex, and assigns to every node that received no subcluster label the same subcluster label as its adjacent node of highest node density. It then generates Voronoi regions based on the apexes whose node density exceeds a predetermined threshold and, in the generated Voronoi regions, when the subcluster containing a reference apex and a subcluster containing another apex different from the reference apex have an overlapping region and the nodes located in that overlapping region satisfy a condition on their average node density, it assigns the subcluster label of the subcluster containing the reference apex to the subcluster containing the other apex, and stores the result in the temporary storage unit. The subcluster determination unit 51 can determine the subclusters by, for example, performing the same processing as steps S201 to S205 and S301 to S305 in Patent Document 2.
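Only the first half of step S51 is illustrated below (seeding apexes at local density maxima and propagating labels toward the densest neighbor); the Voronoi-region generation and the merging of overlapping subclusters are omitted, and the exact conditions are those of Patent Document 2.

```python
def assign_subcluster_labels(neighbors, density):
    """Step S51 sketch (first half): nodes whose density is a local maximum
    among their neighbors become apexes with distinct labels; every other
    node climbs toward its highest-density neighbor until it reaches an apex
    and takes that apex's label.

    neighbors: dict node -> set of adjacent nodes; density: dict node -> h.
    """
    labels = {}
    apexes = [n for n in neighbors
              if all(density[n] >= density[j] for j in neighbors[n])]
    for label, n in enumerate(apexes):
        labels[n] = label
    for n in neighbors:
        if n in labels:
            continue
        path, cur = [n], n
        while cur not in labels:
            cur = max(neighbors[cur], key=lambda j: density[j])
            path.append(cur)
        for p in path:
            labels[p] = labels[cur]
    return labels
```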
Step S52
The noise node deletion unit 52 deletes, from all the nodes a included in the node set A stored in the temporary storage unit, those nodes regarded as noise nodes, and stores the result in the temporary storage unit. Operating on the nodes, the edges between nodes, the numbers of adjacent nodes, and the node densities stored in the temporary storage unit, the noise node deletion unit 52 can, for example, execute the processing shown in steps S601 to S604 of Patent Document 2 to delete a node of interest a based on its number of adjacent nodes and its node density, and store the result in the temporary storage unit.
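Steps S601 to S604 of Patent Document 2 are not reproduced here, so the thresholds in the following sketch are assumptions in the style of E-SOINN/LB-SOINN noise removal: a node with two neighbors is removed when its density falls below c1 times the mean density, a node with one neighbor when it falls below c2 times the mean, and an isolated node unconditionally.

```python
import numpy as np

def remove_noise_nodes(nodes, neighbors, density, c1=1.0, c2=1.0):
    """Step S52 sketch: delete nodes judged to be noise, based on the number
    of adjacent nodes and the node density.

    nodes: set of node ids; neighbors: dict node -> set of adjacent nodes;
    density: dict node -> node density h. The rule is an assumption, not the
    literal steps S601-S604 of Patent Document 2.
    """
    mean_h = float(np.mean([density[n] for n in nodes])) if nodes else 0.0
    for n in list(nodes):
        k = len(neighbors[n])
        is_noise = (k == 0
                    or (k == 1 and density[n] < c2 * mean_h)
                    or (k == 2 and density[n] < c1 * mean_h))
        if is_noise:
            for j in list(neighbors[n]):
                neighbors[j].discard(n)
            del neighbors[n]
            nodes.discard(n)
```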
Step S5
The learning end determination unit 53 determines whether to end the learning processing by the information processing apparatus 300, as in step S5 of the information processing apparatus 100 according to the first embodiment. If it determines that learning is not to end, the process returns to step S11 and the next input vector ε is processed. If it determines that learning is to end, the process proceeds to step S53.
Step S53
The class determination unit 54 determines, for the nodes, the edges between nodes, and the node classes stored in the temporary storage unit, the class to which each node belongs based on the edges generated between the nodes, and stores the result in the temporary storage unit. The class determination unit 54 may determine the classes by, for example, performing the same processing as steps S701 to S704 in Patent Document 2.
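Reading step S53 as connected-component labelling over the learned network (each group of nodes linked by edges becomes one class) gives the following sketch; the detailed conditions of steps S701 to S704 in Patent Document 2 may differ.

```python
from collections import deque

def determine_classes(nodes, neighbors):
    """Step S53 sketch: nodes connected by edges, directly or through other
    nodes, receive the same class label; each connected component is a class.

    nodes: iterable of node ids; neighbors: dict node -> set of adjacent nodes.
    """
    classes = {}
    next_class = 0
    for start in nodes:
        if start in classes:
            continue
        classes[start] = next_class
        queue = deque([start])
        while queue:
            n = queue.popleft()
            for j in neighbors[n]:
                if j not in classes:
                    classes[j] = next_class
                    queue.append(j)
        next_class += 1
    return classes
```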
After that, the output information display unit 6 may output, for the nodes and node classes stored in the temporary storage unit, the number of classes to which the nodes belong and the prototype vector of each class. After the above processing is completed, learning is stopped.
As described above, according to the present embodiment, the load balancing according to the first embodiment can be applied to learn the structure of the input vectors accurately.
Needless to say, in the present embodiment the load balancing according to the second embodiment may be applied instead of the load balancing according to the first embodiment.
The order of the processes (steps) shown in the flowchart of FIG. 29 is an example, and may be changed as appropriate.
Embodiment 4
In the third embodiment, as a specific example of performing the load balancing according to the first embodiment, the load balancing of LB-SOINN in Patent Document 2 was replaced with the load balancing according to the first embodiment. In the fourth embodiment, another specific example of performing the load balancing according to the first embodiment is described. In the fourth embodiment, the term "learning" is used for the series of processes including not only the construction of the neural network but also the load balancing process.
FIG. 30 schematically shows the configuration of the information processing apparatus 400 according to the fourth embodiment. The information processing apparatus 400 has a configuration in which the learning processing unit 4 of the information processing apparatus 300 according to the third embodiment is replaced with a learning processing unit 7 and the clustering unit 5 is replaced with a clustering unit 8.
The learning processing unit 7 includes some of the components of the learning processing unit 4; specifically, it has the input information acquisition unit 41, the node insertion determination unit 44, the node insertion unit 45, the edge connection unit 47, the winner node learning time calculation unit 48, the weight vector update unit 49, and the old edge deletion unit 50.
The clustering unit 8 includes some of the components of the clustering unit 5; specifically, it has the noise node deletion unit 52, the learning end determination unit 53, and the class determination unit 54.
The operation of the information processing apparatus 400 is described below. FIG. 31 shows a flowchart of the operation of the information processing apparatus 400 according to the fourth embodiment.
Step S11
Step S11 is the same as in the third embodiment (FIG. 29), so its description is omitted.
Step S2
Based on the input vector input in step S11, the winner node detection unit 1 detects the first winner node and the second winner node, as in the information processing apparatus 100 according to the first embodiment (FIG. 4), and stores the result in the temporary storage unit.
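A minimal sketch of the winner search follows, assuming the Euclidean distance; when a combined distance scale is used (see the other embodiments below), the distance function would be swapped accordingly.

```python
import numpy as np

def find_winners(weights, x):
    """Step S2 sketch: return the node nearest to the input vector x (first
    winner) and the second nearest node (second winner).

    weights: dict node_id -> np.ndarray; assumes at least two nodes exist.
    """
    ranked = sorted(weights, key=lambda n: np.linalg.norm(x - weights[n]))
    return ranked[0], ranked[1]
```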
Step S14
Step S14 is the same as in the third embodiment (FIG. 29), so its description is omitted. If it is determined that no node is to be inserted, the process proceeds to step S17.
Step S15
As in the third embodiment (FIG. 29), when it is determined in step S14 that a node is to be inserted, the node is inserted. The difference from the third embodiment is that the process then proceeds to step S17.
Step S17
When it is determined in step S14 that no node is to be inserted, or after step S15 (that is, regardless of the result of the node insertion determination in step S14), the edge connection unit 47 performs edge connection in the same way as in the third embodiment (FIG. 29). Specifically, the edge connection unit 47 connects an edge between the first winner node and the second winner node and stores the result in the temporary storage unit. If an edge already exists between the first winner node and the second winner node, that edge is maintained. The edge connection unit 47 also sets the age of the connected or maintained edge to 0 and stores the result in the temporary storage unit.
Steps S19 to S21, S3
Steps S19 to S21 and S3 are the same as in the third embodiment (FIG. 29), so their description is omitted.
Step S70
As in step S41 (FIG. 4) of the information processing apparatus 100 according to the first embodiment, the input number determination unit 31 determines whether the total number of input vectors given so far, stored in the temporary storage unit, is a multiple of a predetermined unit number set in advance and stored in the temporary storage unit, and stores the result in the temporary storage unit. In the present embodiment there are two predetermined unit numbers, and the one used in step S70 is referred to as the first unit number (λ1). In the present embodiment the first unit number (λ1) is a fixed value, but this is not limiting; for example, the first unit number (λ1) may be changed as appropriate according to the number of input vectors received.
By executing such processing, it may be possible to capture the classes accurately while avoiding unnecessary processing. The reason is as follows. Step S70 determines how often the class determination in the following step S71 is executed. When the number of inputs is still small, the classes are often not yet stable, so keeping λ1 small allows the classes to be captured accurately. When the number of inputs becomes large, the classes often become stable, so a large λ1 reduces unnecessary processing.
Step S71
When the total number of input vectors is an integer multiple of the first unit number (λ1), class determination is performed in the same way as in step S53 of the third embodiment (FIG. 29). The details of the processing are the same as in step S53, so their description is omitted.
Step S72
When the number of input vectors in step S70 is not a multiple of the first unit number λ1, or after step S71 (that is, regardless of the determination result in step S70), the input number determination process of step S72 is performed. As in step S41 (FIG. 4) of the information processing apparatus 100 according to the first embodiment, the input number determination unit 31 determines whether the total number of input vectors given so far, stored in the temporary storage unit, is a multiple of a predetermined unit number set in advance and stored in the temporary storage unit, and stores the result in the temporary storage unit. The predetermined unit number used in step S72 is referred to as the second unit number (λ2). In the present embodiment the second unit number (λ2) is a fixed value, but this is not limiting; for example, the frequency of the subsequent processing may be adjusted as appropriate according to the number of input vectors received.
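The scheduling implied by steps S70 to S72 can be summarized as below; the three callables stand in for the real class determination, load balancing, and noise-node removal steps and are placeholders only.

```python
def on_input_processed(total_inputs, lambda1, lambda2,
                       determine_classes, balance_load, remove_noise):
    """Sketch of steps S70-S72: run class determination whenever the running
    input count is a multiple of lambda1, and run load balancing followed by
    noise-node removal whenever it is a multiple of lambda2.
    """
    if total_inputs % lambda1 == 0:      # step S70 -> step S71
        determine_classes()
    if total_inputs % lambda2 == 0:      # step S72 -> steps S4 and S52
        balance_load()
        remove_noise()
```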
Step S4
When the total number of input vectors is an integer multiple of the second unit number (λ2), the target edge detection unit 32 prepares a threshold TH1 for each class, based on the classes obtained in step S71, and detects the target edges from each class. The threshold TH1 can be any positive value; for example, it may be the value obtained by multiplying the average edge learning time TAVE of the edges of the corresponding class stored in the temporary storage unit by a predetermined coefficient c (TH1 = c·TAVE). The rest of the processing is the same as in the third embodiment (FIG. 29), so its description is omitted.
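A sketch of the per-class threshold TH1 = c·TAVE follows; the mapping `edge_class` from each edge to the class it belongs to is an illustrative assumption about how the result of step S71 is carried over.

```python
def edges_to_split_per_class(edge_time, edge_class, coeff_c):
    """Step S4 sketch (fourth embodiment): for each class, compute the mean
    edge learning time T_AVE, set TH1 = coeff_c * T_AVE, and return the edges
    whose learning time exceeds the threshold of their own class.

    edge_time: dict edge -> learning time; edge_class: dict edge -> class id.
    """
    sums, counts = {}, {}
    for e, t in edge_time.items():
        c = edge_class[e]
        sums[c] = sums.get(c, 0.0) + t
        counts[c] = counts.get(c, 0) + 1
    th1 = {c: coeff_c * sums[c] / counts[c] for c in sums}
    return [e for e, t in edge_time.items() if t > th1[edge_class[e]]]
```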
Step S52
As in the third embodiment (FIG. 29), the noise node deletion unit 52 deletes, from all the nodes a included in the node set A stored in the temporary storage unit, those nodes regarded as noise nodes, and stores the result in the temporary storage unit. The noise node deletion unit 52 can, for example, delete a node of interest a based on its adjacent nodes and its number of edges, and store the result in the temporary storage unit. In the present embodiment, nodes whose number of edges has become 0 are deleted, but this is of course not limiting. In the present embodiment, an example was described in which steps S4 and S52 are both performed based on the same second unit number (λ2), but this is merely an example. Step S4 may be performed based on the second unit number (λ2) and step S52 based on a third unit number (λ3). When the processing is performed based on different unit numbers, the number of parameters and the degree of freedom increase; although this increases the user's tuning effort, it can be expected that in some cases the input data may be represented more accurately.
Step S5
The learning end determination unit 53 determines whether to end the learning processing by the information processing apparatus 400, as in step S5 (FIG. 4) of the information processing apparatus 100 according to the first embodiment. If it determines that learning is not to end, the process returns to step S11.
Step S53
Step S53 is the same as in the third embodiment (FIG. 29), so its description is omitted.
After that, the output information display unit 6 may output, for the nodes and node classes stored in the temporary storage unit, the number of classes to which the nodes belong and the prototype vector of each class. After the above processing is completed, learning is stopped.
As described above, according to the present embodiment, the load balancing according to the first embodiment can be applied to learn the structure of the input vectors accurately.
Needless to say, also in the present embodiment, the load balancing according to the second embodiment may be applied instead of the load balancing according to the first embodiment. The processing of this embodiment is merely an example, and the processing and order of the steps may be changed as appropriate.
Other Embodiments
The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit. Regarding the distance scale, for example, when performing online additional learning, sample data cannot be obtained in advance, so it is not possible to analyze the dimensionality of the input vectors beforehand and decide which distance scale is effective. For this reason, as described using equation (14) in Patent Document 2, a new distance scale that expresses the distance between two nodes by combining different distance scales may be introduced. For example, as shown by equation (17), derived using equations (14) to (16) in Patent Document 2, a new distance scale combining the Euclidean distance and the cosine distance may be used.
Regarding the distance scale, the case in which the cosine distance is combined with the Euclidean distance was described as an example, but this is not limiting, and other distance scales (for example, the cosine distance, the Manhattan distance, or the fractional distance) may be combined. Furthermore, the combination is not limited to distance scales that are effective in high-dimensional spaces; other distance scales suited to the problem to be learned may be combined.
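Equations (14) to (17) of Patent Document 2 are not reproduced in this passage; the sketch below only illustrates the idea of a combined scale as a convex combination of a dimension-normalized Euclidean distance and the cosine distance, with the normalization and the mixing weight gamma being assumptions.

```python
import numpy as np

def combined_distance(x, y, gamma=0.5):
    """Sketch of a combined distance scale between two weight vectors x and y.

    The 1/sqrt(n) normalization and the weight gamma are illustrative choices,
    not the literal equations (14)-(17) of Patent Document 2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    d_euclid = np.linalg.norm(x - y) / np.sqrt(n)      # normalized Euclidean
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    d_cos = (1.0 - float(np.dot(x, y)) / denom) if denom > 0 else 0.0
    return gamma * d_euclid + (1.0 - gamma) * d_cos
```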
In the above embodiments, comparisons of the magnitude of two values were described, but these are merely examples, and the case where the two values are equal may be handled as needed. That is, when comparing a first value with a second value, whether the test is "greater than or equal to" versus "less than", or "greater than" versus "less than or equal to", may be chosen as needed, and the same applies to the reverse comparisons. In other words, when two values are compared to obtain one of two determination results, the case where the two values are equal may be included in either result as needed.
In the above embodiments, the present invention has been described mainly as a hardware configuration, but the invention is not limited to this, and arbitrary processing may also be realized by causing a CPU (Central Processing Unit) to execute a computer program. In this case, the computer program can be stored and supplied to a computer using various types of non-transitory computer readable media. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memories (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)). The program may also be supplied to the computer by various types of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to the computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.
Although the invention of the present application has been described above with reference to the embodiments, the invention of the present application is not limited to the above. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the invention of the present application within the scope of the invention.
This application claims priority based on Japanese Patent Application No. 2020-218714 filed on December 28, 2020, the disclosure of which is incorporated herein in its entirety.
1 Winner node detection unit
2 Edge learning time update unit
3 Load balancing unit
4, 7 Learning processing unit
5, 8 Clustering unit
6 Output information display unit
10 Computer
11 CPU
12 ROM
13 RAM
14 Bus
15 Input/output interface
16 Input unit
17 Output unit
18 Storage unit
19 Communication unit
20 Drive
20A Magnetic disk
20B Optical disk
20C Flexible disk
20D Semiconductor memory
31 Input number determination unit
32 Target edge detection unit
33 On-edge node insertion unit
34 Target edge deletion unit
35 New node edge connection unit
41 Input information acquisition unit
42 Node density update determination unit
43 Node density calculation unit
44 Node insertion determination unit
45 Node insertion unit
46 Edge connection determination unit
47 Edge connection unit
48 Winner node learning time calculation unit
49 Weight vector update unit
50 Old edge deletion unit
51 Subcluster determination unit
52 Noise node deletion unit
53 Learning end determination unit
54 Class determination unit
100, 300, 400 Information processing device

Claims (15)

  1.  An information processing device that sequentially receives input vectors and learns the input distribution structure of the input vectors as a network structure in which a plurality of nodes described by multidimensional vectors and a plurality of edges each connecting two of the nodes are arranged, the information processing device comprising:
     a winner node detection unit that detects, from the plurality of nodes included in the network structure, the node located at the closest distance to an input vector as a first winner node and the node located at the second closest distance as a second winner node;
     an edge learning time update unit that increases the edge learning time of the edge connecting the first winner node and the second winner node by a first value; and
     a load balancing unit that, at a predetermined timing, selects one or more edges from the plurality of edges based on the edge learning time, generates a new node on each of the selected one or more edges, and inserts the new node into the network structure.
  2.  The information processing device according to claim 1, wherein the edge learning time update unit further increases, by a second value smaller than the first value, the edge learning time of edges connected to the first winner node and of edges connected to the second winner node, other than the edge connecting the first winner node and the second winner node.
  3.  The information processing device according to claim 1 or 2, wherein the load balancing unit selects one or more edges whose edge learning time is relatively large.
  4.  The information processing device according to claim 3, wherein the load balancing unit selects one or more edges whose edge learning time is larger than a predetermined edge learning time threshold.
  5.  The information processing device according to any one of claims 1 to 4, wherein the load balancing unit selects one or more edges from the plurality of edges based on the edge learning time and the length of the edges.
  6.  The information processing device according to claim 5, wherein the load balancing unit selects one or more edges whose length is relatively large.
  7.  The information processing device according to claim 6, wherein the load balancing unit selects one or more edges whose length is larger than a predetermined value.
  8.  The information processing device according to any one of claims 1 to 7, wherein, for each of the selected one or more edges, the load balancing unit determines the position at which the new node is generated based on the number of wins of a first node connected to one end of the edge and the number of wins of a second node connected to the other end.
  9.  The information processing device according to claim 8, wherein, for each of the selected one or more edges, the load balancing unit generates the new node at the centroid position calculated from the number of wins of the first node and the number of wins of the second node on the edge.
  10.  The information processing device according to any one of claims 1 to 9, wherein the load balancing unit deletes the selected one or more edges, and generates and inserts into the network structure, for each deleted edge, a first edge connecting the new node to the first node connected to one end of the deleted edge and a second edge connecting the new node to the second node connected to the other end of the deleted edge.
  11.  The information processing device according to claim 10, wherein the load balancing unit causes the first edge and the second edge to each inherit the edge learning time of the deleted edge at a predetermined ratio.
  12.  The information processing device according to claim 11, wherein the first edge and the second edge each inherit the edge learning time of the deleted edge at the ratio obtained by dividing the number of wins of the first node and the number of wins of the second node, respectively, by the sum of the number of wins of the first node and the number of wins of the second node.
  13.  The information processing device according to any one of claims 1 to 12, further comprising:
     a node insertion determination unit that determines whether to execute node insertion based on the distance between the input vector and the first winner node and the distance between the input vector and the second winner node; and
     a node insertion unit that, when the node insertion determination unit determines that node insertion is to be executed, generates an insertion node having as its weight vector the same components as the input vector and inserts the generated insertion node into the network structure.
  14.  An information processing method for sequentially receiving input vectors and learning the input distribution structure of the input vectors as a network structure in which a plurality of nodes described by multidimensional vectors and a plurality of edges each connecting two of the nodes are arranged, the method comprising:
     detecting, by a winner node detection unit, from the plurality of nodes included in the network structure, the node located at the closest distance to an input vector as a first winner node and the node located at the second closest distance as a second winner node;
     increasing, by an edge learning time update unit, the edge learning time of the edge connecting the first winner node and the second winner node by a predetermined value; and
     selecting, by a load balancing unit, at a predetermined timing, one or more edges from the plurality of edges based on the edge learning time, generating a new node on each of the selected one or more edges, and inserting the new node into the network structure.
  15.  A non-transitory computer readable medium storing a program that causes a computer to execute processing for sequentially receiving input vectors and learning the input distribution structure of the input vectors as a network structure in which a plurality of nodes described by multidimensional vectors and a plurality of edges each connecting two of the nodes are arranged, the processing comprising:
     detecting, from the plurality of nodes included in the network structure, the node located at the closest distance to an input vector as a first winner node and the node located at the second closest distance as a second winner node;
     increasing, by an edge learning time update unit, the edge learning time of the edge connecting the first winner node and the second winner node by a predetermined value; and
     selecting, at a predetermined timing, one or more edges from the plurality of edges based on the edge learning time, generating a new node on each of the selected one or more edges, and inserting the new node into the network structure.
PCT/JP2021/031880 2020-12-28 2021-08-31 Information processing device, information processing method, and non-transitory computer-readable medium WO2022145087A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022572905A JP7489730B2 (en) 2020-12-28 2021-08-31 Information processing device, information processing method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-218714 2020-12-28
JP2020218714 2020-12-28

Publications (1)

Publication Number Publication Date
WO2022145087A1 true WO2022145087A1 (en) 2022-07-07

Family

ID=82259176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/031880 WO2022145087A1 (en) 2020-12-28 2021-08-31 Information processing device, information processing method, and non-transitory computer-readable medium

Country Status (2)

Country Link
JP (1) JP7489730B2 (en)
WO (1) WO2022145087A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115587222A (en) * 2022-12-12 2023-01-10 阿里巴巴(中国)有限公司 Distributed graph calculation method, system and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014164396A (en) * 2013-02-22 2014-09-08 Tokyo Institute Of Technology Information processing apparatus, information processing method and program
JP2020042724A (en) * 2018-09-13 2020-03-19 Soinn株式会社 Information processing device, information processing method, and program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5846553B2 (en) 2010-09-13 2016-01-20 国立大学法人東京工業大学 Attribute learning and transfer system, recognizer generation device, recognizer generation method, and recognition device

Also Published As

Publication number Publication date
JPWO2022145087A1 (en) 2022-07-07
JP7489730B2 (en) 2024-05-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21914948

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022572905

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21914948

Country of ref document: EP

Kind code of ref document: A1