WO2022158027A1 - Clustering processing device, clustering processing method, non-transitory computer-readable medium, and information processing device

Info

Publication number: WO2022158027A1
Application number: PCT/JP2021/031795
Authority: WO
Inventors: 修長谷川; 洸輔井加田; 直純津田
Original assignee: Ｓｏｉｎｎ株式会社
Priority date: 2021-01-19
Filing date: 2021-08-30
Publication date: 2022-07-28
Also published as: JPWO2022158027A1

Abstract

A teaching data creation unit (100) has an initial clustering processing unit (100A) and an additional clustering processing unit (100B). The initial clustering processing unit (100A) clusters the nodes included in input data (IN1) for teaching data creation and acquires clustering intermediate data (DINT_L) in which labels are added to the nodes belonging to a cluster. The additional clustering processing unit (100B) creates teaching data (DTCH) in which each unlabeled node included in the clustering intermediate data (DINT_L) is assigned the same label as the node that, among the nodes belonging to any cluster included in the clustering intermediate data (DINT_L), is at the shortest distance from that unlabeled node.

Description

Clustering processing device, clustering processing method, non-transitory computer-readable medium, and information processing device

The present invention relates to a clustering processing device, a clustering processing method, a program, and an information processing device, and, for example, to a clustering processing device, a clustering processing method, a program, and an information processing device that sequentially receive input vectors belonging to arbitrary classes and learn the input distribution structure of those input vectors.

In recent years, methods of performing supervised learning on various input data and classifying the input data have become known. Such methods require teacher data in which appropriate labels are assigned to the data in advance. To create such teacher data, unlabeled input data for creating teacher data is prepared and clustered by unsupervised learning using a clustering method such as the k-means method, and teacher data is then created by assigning a label to each cluster.

As an efficient learning method for creating such teacher data, a method called the Self-Organizing Incremental Neural Network (SOINN), which grows neurons as needed during learning, has been proposed. SOINN has many advantages, such as being able to learn non-stationary inputs by autonomously managing the number of nodes, and being able to extract an appropriate number of classes and the topological structure even for classes with complex distribution shapes. As an application example of SOINN, in pattern recognition, after a class of hiragana characters has been learned, a class of katakana characters can be learned additionally.

As an example of such SOINN, a technique called E-SOINN (Enhanced SOINN) has been proposed (Patent Document 1). E-SOINN is capable of online additional learning in which learning is added at any time, and has the advantage of being more efficient than batch learning. Therefore, in E-SOINN, additional learning is possible even when the learning environment changes to a new environment. E-SOINN also has the advantage of high noise resistance to input data.

However, SOINN, including E-SOINN, has had the problem that it is difficult to insert a new node into the network. To solve this problem, a technique called LB-SOINN (Load Balance Self-Organizing Incremental Neural Network) has been proposed (Patent Document 2). LB-SOINN treats the load of a node in the network as the node learning time, detects a node with a large node learning time, and creates, on the edge connecting the detected node and its adjacent node, a new node whose weight vector is determined based on the weight vector of the detected node. By alleviating the increase in the learning time of the detected node and generating a new node in its vicinity, the structure of the input data can be learned more accurately.

Patent Document 1: JP 2008-217246 A
Patent Document 2: JP 2014-164396 A

However, even with these methods, it may not be possible to label all nodes after clustering. In such a case, the operator must refer to the clustering result and manually assign a label considered appropriate to each unlabeled node, so creating teacher data takes a long time. In addition, the resulting teacher data may vary depending on the operator who performs the work.

It is also possible to apply labels to unlabeled nodes by applying other methods such as active learning (Non-Patent Document 1). However, even in this case, if unlabeled nodes remain, it is still necessary to manually assign labels. Also, with this method, the teacher data must be created first, so manual labeling is required here as well.

Furthermore, when labeling is done manually, if the amount of data becomes excessively large, labeling itself becomes difficult in the first place.

In this way, it is difficult to create training data quickly and automatically with general training data creation methods. Therefore, in unsupervised learning of input data, there is a demand for a technique that can assign appropriate labels to all nodes.

The present invention has been made in view of the above circumstances, and its purpose is to provide a clustering processing apparatus, a clustering processing method, a program, and an information processing apparatus capable of assigning appropriate labels to all nodes in unsupervised learning of input data.

A clustering processing apparatus according to an embodiment of the present invention includes: an initial clustering processing unit that clusters input data composed of a plurality of unlabeled nodes described by multidimensional vectors and acquires clustering intermediate data in which labels are assigned to the nodes belonging to clusters; and an additional clustering processing unit that creates clustering result data in which each unlabeled node included in the clustering intermediate data is given the same label as the node that, among the nodes belonging to any of the clusters included in the clustering intermediate data, is at the shortest distance from that unlabeled node. As a result, clustering result data can be created in which the same label as the shortest-distance node belonging to any cluster is assigned to a node that was not assigned a label in the initial clustering process.

A clustering processing device according to an embodiment of the present invention is the clustering processing device described above, wherein the additional clustering processing unit includes: a node selection unit that selects, from the nodes included in the clustering intermediate data, one unlabeled node that does not belong to any of the clusters; a distance calculation unit that calculates the distances between the selected unlabeled node and all nodes belonging to the clusters; a belonging cluster determination unit that, based on the calculated distances, identifies among all nodes belonging to the clusters the shortest-distance node, that is, the node at the shortest distance from the selected unlabeled node; and a label assigning unit that assigns to the selected unlabeled node the same label as the label given to the shortest-distance node. As a result, the shortest distance between a node that was not labeled in the initial clustering process and the nodes belonging to the clusters can be calculated, and the same label as the shortest-distance node can be given to the node that was not labeled.

A clustering processing device according to an embodiment of the present invention is the clustering processing device described above, wherein the additional clustering processing unit further includes a progress determination unit that determines whether or not an unlabeled node exists in the clustering intermediate data, and the processing by the node selection unit, the distance calculation unit, the belonging cluster determination unit, and the label assigning unit is repeated until the progress determination unit determines that the clustering intermediate data includes no unlabeled node. As a result, labels can be assigned to all nodes that were not labeled in the initial clustering process.

A clustering processing method according to an embodiment of the present invention includes: clustering input data composed of a plurality of unlabeled nodes described by multidimensional vectors and acquiring clustering intermediate data in which labels are assigned to the nodes belonging to clusters; and creating clustering result data in which each unlabeled node included in the clustering intermediate data is given the same label as the node that, among the nodes belonging to any of the clusters included in the clustering intermediate data, is at the shortest distance from that unlabeled node. As a result, clustering result data can be created in which the same label as the shortest-distance node belonging to any cluster is assigned to a node that was not assigned a label in the initial clustering process.

A program according to an embodiment of the present invention causes a computer to execute: a process of clustering input data composed of a plurality of unlabeled nodes described by multidimensional vectors and acquiring clustering intermediate data in which labels are assigned to the nodes belonging to clusters; and a process of creating clustering result data in which each unlabeled node included in the clustering intermediate data is given the same label as the node that, among the nodes belonging to any of the clusters included in the clustering intermediate data, is at the shortest distance from that unlabeled node. As a result, clustering result data can be created in which the same label as the shortest-distance node belonging to any cluster is assigned to a node that was not assigned a label in the initial clustering process.

An information processing apparatus according to an embodiment of the present invention includes: a teacher data creation unit that creates teacher data by performing a clustering process on input data for creating teacher data, which is composed of a plurality of unlabeled nodes described by multidimensional vectors; a supervised learning unit that, based on the teacher data, assigns labels to the nodes of learning target input data composed of a plurality of unlabeled nodes described by multidimensional vectors; and a display unit that displays a result of processing by the supervised learning unit. The teacher data creation unit includes: an initial clustering processing unit that clusters the input data composed of a plurality of unlabeled nodes described by multidimensional vectors and acquires clustering intermediate data in which labels are assigned to the nodes belonging to clusters; and an additional clustering processing unit that creates clustering result data in which each unlabeled node included in the clustering intermediate data is given the same label as the node that, among the nodes belonging to any of the clusters included in the clustering intermediate data, is at the shortest distance from that unlabeled node. As a result, clustering result data can be created in which the same label as the shortest-distance node belonging to any cluster is assigned to a node that was not assigned a label in the initial clustering process.

According to the present invention, in unsupervised learning of input data, it is possible to provide a clustering processing device, a clustering processing method, a program, and an information processing device capable of assigning appropriate labels to all nodes after clustering.

FIG. 1 is a diagram illustrating an example of a system configuration for realizing the information processing apparatus according to the first embodiment.
FIG. 2 is a diagram schematically showing the basic configuration of the information processing apparatus according to the first embodiment.
FIG. 3 is a diagram showing in more detail the configuration of the information processing apparatus according to the first embodiment.
FIG. 4 is a diagram illustrating another configuration example of the information processing apparatus according to the first embodiment.
FIG. 5 is a diagram schematically showing the basic configuration of the teacher data creation unit according to the first embodiment.
FIG. 6 is a diagram showing an example of the teacher data creation input data IN1 used to create teacher data.
FIG. 7 is a diagram showing an example of the clustering intermediate data acquired by the clustering processing of the initial clustering processing unit.
FIG. 8 is a diagram showing in more detail the configuration of the teacher data creation unit according to the first embodiment.
FIG. 9 is a flowchart of the clustering processing performed by the teacher data creation unit according to the first embodiment.
FIG. 10 is a diagram showing a node selected by the node selection unit.
FIG. 11 is a diagram showing an example of the teacher data D_TCH created by automatically assigning labels to all nodes.

Embodiments of the present invention will be described below with reference to the drawings. In each drawing, the same elements are denoted by the same reference numerals, and redundant description will be omitted as necessary.

Embodiment 1
FIG. 1 is a diagram illustrating an example of a system configuration for realizing the information processing apparatus according to the first embodiment. The information processing apparatus 1000 can be implemented by a computer 10 such as a dedicated computer or a personal computer (PC). However, the computer need not be a single physical machine, and multiple computers may be used when performing distributed processing. As shown in FIG. 1, the computer 10 has a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, and a RAM (Random Access Memory) 13, which are interconnected via a bus 14. Although a description of the OS software and the like for operating the computer is omitted, the computer that constitutes this information processing apparatus is naturally assumed to have it.

An input/output interface 15 is also connected to the bus 14. Connected to the input/output interface 15 are, for example, an input unit 16 made up of a keyboard, mouse, sensors, etc., an output unit 17 made up of a display such as a CRT or LCD, headphones, speakers, etc., a storage unit 18 made up of a hard disk, etc., and a communication unit 19 made up of a modem, a terminal adapter, etc.

The CPU 11 executes various processes according to various programs stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13; in this embodiment, for example, it executes the processes of each unit of the information processing apparatus 1000 described later. A GPU (Graphics Processing Unit) 21 may be provided separately from the CPU 11 and, in the same manner as the CPU 11, may execute various processes according to various programs stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13, for example the processes of each unit of the information processing apparatus 1000 described later. The GPU is suited to executing routine processing in parallel, and applying it to the neural network processing described later can improve processing speed compared with the CPU 11. The RAM 13 also stores data necessary for the CPU 11 and the GPU 21 to execute various kinds of processing.

The communication unit 19 performs, for example, communication processing via the Internet (not shown), transmits data provided by the CPU 11, and outputs data received from the communication partner to the CPU 11, RAM 13, and storage unit 18. The storage unit 18 communicates with the CPU 11 to save/delete information. The communication unit 19 also performs communication processing of analog signals or digital signals with other devices.

A drive 20 is also connected to the input/output interface 15 as necessary. For example, a magnetic disk 20A, an optical disk 20B, a flexible disk 20C, or a semiconductor memory 20D is mounted on it as appropriate, and a computer program read from them is installed in the storage unit 18 as necessary.

Next, each process in the information processing apparatus 1000 according to this embodiment will be described. The information processing apparatus 1000 is input with a non-hierarchical neural network in which nodes described by an n-dimensional vector (n is an integer equal to or greater than 1) are arranged. The neural network is stored in a storage unit such as the RAM 13, for example.

The neural network according to the present embodiment is a self-propagating neural network: input vectors are fed into the neural network, and the number of nodes arranged in it is increased automatically based on those input vectors. Using a self-propagating neural network allows the number of nodes to be increased automatically.

The neural network in this embodiment has a non-hierarchical structure. By adopting a non-hierarchical structure, additional learning can be performed without specifying the timing of starting learning in other layers. That is, additional learning can be carried out online.

Input data is input as an n-dimensional input vector. For example, the input vectors are stored in a temporary storage unit (eg, RAM 13) and sequentially input to the neural network stored in the temporary storage unit.

A specific configuration of the information processing apparatus 1000 according to the first embodiment will be described below. The information processing apparatus 1000 is first given input data for creating teacher data and creates teacher data based on it. The information processing apparatus 1000 then uses the created teacher data to assign labels to the data (nodes) included in separately given learning target input data.

FIG. 2 schematically shows the basic configuration of the information processing device 1000 according to the first embodiment. Further, FIG. 3 shows in more detail the configuration of the information processing apparatus 1000 according to the first embodiment. The information processing apparatus 1000 has at least a teacher data creation unit 100 and may further have a supervised learning unit 110 and a display unit 120 . In this example, input data IN1 for creating teacher data and input data IN2 to be learned for supervised learning are supplied from the outside of information processing apparatus 1000 to teacher data creating unit 100 and supervised learning unit 110, respectively. The input data IN1 for creating teacher data and the learning target input data IN2 to be subjected to supervised learning are, as described above, data containing a plurality of nodes described by multidimensional vectors.

Note that the input data IN1 for creating teacher data and the input data IN2 to be learned may be stored in a storage unit provided in the information processing apparatus 1000. FIG. 4 shows another configuration example of the information processing apparatus 1000 according to the first embodiment. In this example, the information processing apparatus 1000 further has a storage unit 130. The teacher data creation input data IN1 and the learning target input data IN2 are stored in the storage unit 130 as appropriate, and the teacher data creation unit 100 and the supervised learning unit 110 may read the teacher data creation input data IN1 and the learning target input data IN2, respectively, from the storage unit 130 as necessary. The input data IN1 for creating teacher data and the input data IN2 to be learned may be stored in the storage unit 130 in advance or at arbitrary timing by the operator of the information processing apparatus 1000. Note that the storage unit 130 corresponds to, for example, one or both of the ROM 12 and the storage unit 18 shown in FIG. 1.

The teacher data creation unit 100 performs clustering on the teacher data creation input data IN1 using an unsupervised learning method such as the k-means method or the SOINN method described above to create clustering intermediate data, and creates the teacher data D_TCH by assigning labels to the unlabeled nodes included in the intermediate data.

In other words, the training data creation unit 100 is configured to perform clustering processing capable of automatically assigning labels to all nodes belonging to clusters after the first clustering processing.

The application of the clustering process performed by the teacher data creation unit 100 is not limited to the creation of teacher data, and can be applied to various clustering processes of input data. Therefore, the training data creation unit 100 is also simply referred to as a clustering processing device.

FIG. 5 schematically shows the basic configuration of the teacher data creation unit 100. The training data creation unit 100 has an initial clustering processing unit 100A and an additional clustering processing unit 100B.

The initial clustering processing unit 100A is configured to cluster the input data IN1 for creating teacher data using an unsupervised learning method such as the k-means method or the SOINN method described above, and to create clustering intermediate data D_INT_L in which labels are assigned to the nodes.

FIG. 6 shows an example of input data IN1 for creating teacher data used for creating teacher data. In this example, data in which unlabeled nodes (represented by white circles) are distributed on a two-dimensional plane is used as input data IN1 for creating teacher data.

FIG. 7 shows an example of the clustering intermediate data D_INT_L acquired by the initial clustering processing of the initial clustering processing unit 100A. The clustering intermediate data D_INT_L in FIG. 7 includes, for example, four clusters C1 to C4, and multiple nodes belong to each of the clusters C1 to C4. Note that not every node included in the clustering intermediate data D_INT_L belongs to a cluster; there are nodes that do not belong to any cluster and have not been assigned any label. In FIG. 7, such nodes are shown as unlabeled nodes by white circles.

As described above, the clustering intermediate data D_INT_L includes unlabeled nodes, and further clustering processing is performed by the subsequent additional clustering processing unit 100B. Therefore, for the sake of distinction, the main body of processing here is referred to as the initial clustering processing unit 100A, and the clustering processing performed by the initial clustering processing unit 100A is referred to as the initial clustering processing.
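One possible in-memory representation of clustering intermediate data of this kind is sketched below. The class name `ClusteringIntermediateData`, its fields, and the use of `None` to mark an unlabeled node are illustrative assumptions, not something this description specifies.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class ClusteringIntermediateData:
    """Hypothetical container for node vectors and their (possibly missing) labels."""
    vectors: np.ndarray          # shape (N, d): one multidimensional vector per node
    labels: List[Optional[str]]  # e.g. "C1".."C4"; None marks an unlabeled node

    def unlabeled_indices(self) -> List[int]:
        # Indices of nodes that do not yet belong to any cluster.
        return [i for i, lab in enumerate(self.labels) if lab is None]

    def has_unlabeled(self) -> bool:
        # True while at least one node is still unlabeled.
        return any(lab is None for lab in self.labels)
```

Keeping the labels in a list parallel to the vector array makes it straightforward to check whether unlabeled nodes remain after the initial clustering processing.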

The additional clustering processing unit 100B further performs a process of assigning labels to the unlabeled nodes included in the clustering intermediate data D_INT_L to create the teacher data D_TCH, which is the clustering result data.

Here, in order to distinguish it from the initial clustering process, the subject of processing here is called the additional clustering processing unit 100B, and the clustering process performed by the additional clustering processing unit 100B is called the additional clustering process.

Next, the configuration and operation of the teacher data creation unit 100 will be described in more detail. FIG. 8 shows in more detail the configuration of the teacher data creation unit 100 according to the first embodiment. The initial clustering processing unit 100A has a data acquisition unit 101, a clustering processing unit 102, and a first label assigning unit 103. The additional clustering processing unit 100B has a node selection unit 104, a distance calculation unit 105, a belonging cluster determination unit 106, a second label assigning unit 107, and a progress determination unit 108.

FIG. 9 shows a flowchart of the clustering processing performed by the teacher data creation unit 100. The clustering processing performed by the teacher data creation unit 100 creates the teacher data D_TCH, which is the clustering result data, through the following steps S1 to S8.

Step S1
The data acquisition unit 101 acquires input data IN1 for creating teacher data from the outside of the information processing apparatus 1000 or from the storage unit 130 .

Step S2
The clustering processing unit 102 performs the initial clustering processing on the input data IN1 for creating teacher data using an unsupervised learning method such as the k-means method or the SOINN method described above, and obtains clustering intermediate data D_INT, which is the clustering result. As described above, the clustering intermediate data D_INT at this stage includes nodes that do not belong to any cluster.

Step S3
The first label assigning unit 103 assigns a label to each node that belongs to one of the clusters C1 to C4. For example, the first label assigning unit 103 may assign the labels “C1”, “C2”, “C3” and “C4” to the nodes that belong to the clusters C1 to C4, respectively. However, these labels are merely an example, and other appropriate labels may be assigned as necessary. The first label assigning unit 103 outputs the clustering intermediate data D_INT_L to which labels have been added. As described above, the clustering intermediate data D_INT_L at this stage also includes unlabeled nodes.

Step S4
The node selection unit 104 selects one unlabeled node included in the labeled clustering intermediate data D_INT_L and outputs a selection result SEL. FIG. 10 shows a node selected by the node selection unit 104. In this example, it is assumed that the unlabeled node NS in FIG. 10 is selected.

Step S5
The distance calculation unit 105 calculates the distances between the selected node NS and each node belonging to the clusters C1 to C4 based on the selection result SEL. The distance calculation unit 105 outputs distance information DIS indicating the calculated distances. Various distance measures can be applied here, such as cosine distance, Euclidean distance, Mahalanobis distance, Manhattan distance, and fractional distance.

Step S6
The belonging cluster determination unit 106 detects the shortest distance DIS_MIN among the calculated distances based on the distance information DIS, and identifies, among the nodes belonging to the clusters C1 to C4, the node located at the shortest distance DIS_MIN from the selected node NS. Here, as an example, it is assumed that one of the nodes belonging to cluster C4 is the shortest-distance node NN. The belonging cluster determination unit 106 outputs node identification information ND indicating the identified node.

Step S7
The second label assigning unit 107 assigns the same label as the label assigned to the node identified by the belonging cluster determination unit 106 to the unlabeled node NS based on the node identification information ND. For example, when the shortest-distance node NN belonging to cluster C4 has the label "C4", the second label assigning unit 107 gives the same label "C4" to the unlabeled node NS. As a result, the node NS, now labeled, becomes a node belonging to the cluster C4.

Step S8
The progress determination unit 108 determines whether or not the clustering intermediate data D_INT_L after labeling still includes an unlabeled node. If the clustering intermediate data D_INT_L includes an unlabeled node, the process returns to step S4. As a result, the processing of steps S4 to S7 is repeated until every unlabeled node included in the clustering intermediate data D_INT_L belongs to the cluster that contains its shortest-distance node.

On the other hand, if the labeled clustering intermediate data D_INT_L does not contain an unlabeled node, the progress determination unit 108 ends the clustering process and outputs the latest clustering intermediate data D_INT_L as the teacher data D_TCH.

By the processing shown in steps S1 to S8 described above, labels can be automatically assigned to all nodes even if unlabeled nodes remain after the initial clustering processing. FIG. 11 shows an example of the teacher data D_TCH created by automatically labeling all nodes. As shown in FIG. 11, there is no unlabeled node in the teacher data D_TCH, and each node belongs to one of the clusters C1 to C4.
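The additional clustering processing of steps S4 to S8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function name `additional_clustering`, the use of `None` for an unlabeled node, the choice of Euclidean distance, and the rule of always selecting the first remaining unlabeled node are all hypothetical, since the description does not fix a selection order or a single distance measure.

```python
import numpy as np


def additional_clustering(nodes, labels):
    """Give every unlabeled node the label of its nearest labeled node.

    nodes  : (N, d) array-like of node vectors
    labels : list of length N; None marks an unlabeled node (assumption)
    """
    nodes = np.asarray(nodes, dtype=float)
    labels = list(labels)
    if all(lab is None for lab in labels):
        raise ValueError("at least one labeled node is required")

    # Step S8: repeat while an unlabeled node remains.
    while any(lab is None for lab in labels):
        # Step S4: select one unlabeled node (here simply the first one found).
        ns = labels.index(None)
        # Step S5: distances from the selected node to every labeled node.
        labeled_idx = [i for i, lab in enumerate(labels) if lab is not None]
        dists = np.linalg.norm(nodes[labeled_idx] - nodes[ns], axis=1)
        # Step S6: identify the shortest-distance node NN.
        nn = labeled_idx[int(np.argmin(dists))]
        # Step S7: copy NN's label to the selected node, which then
        # counts as a cluster member in later iterations.
        labels[ns] = labels[nn]
    return labels
```

Because a newly labeled node is treated as belonging to its cluster from the next iteration on, the result of this sketch can depend on the order in which unlabeled nodes are selected.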

The supervised learning unit 110 labels the separately provided learning target input data IN2 based on the teacher data D_TCH created as described above.
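The description does not state which supervised method the supervised learning unit 110 uses. Purely for illustration, the sketch below assumes a 1-nearest-neighbor rule with Euclidean distance, in which each node of the learning target input data receives the label of the closest teacher data node. The function name `label_by_nearest_teacher` and its parameters are hypothetical.

```python
import numpy as np


def label_by_nearest_teacher(teacher_nodes, teacher_labels, targets):
    """Label each target node with the label of its nearest teacher node.

    Assumes a 1-nearest-neighbor rule; this is one possible choice,
    not the method specified by the source.
    """
    teacher_nodes = np.asarray(teacher_nodes, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Pairwise Euclidean distances, shape (num_targets, num_teacher_nodes).
    diffs = targets[:, None, :] - teacher_nodes[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Index of the nearest teacher node for every target node.
    nearest = np.argmin(dists, axis=1)
    return [teacher_labels[i] for i in nearest]
```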

The display unit 120 can appropriately display intermediate results and final results of clustering by the teacher data creation unit 100, processing results by the supervised learning unit 110, and the like.

As described above, according to this configuration, in unsupervised learning of input data, suitable labels can be automatically assigned to all nodes after clustering.

As a result, teacher data can be created automatically and quickly without manual intervention. In addition, the series of processes from the creation of teacher data, through supervised learning using the teacher data, to the display of the learning results can be performed automatically.

Other Embodiments
The present invention is not limited to the above-described embodiments, and can be modified as appropriate without departing from the scope of the invention. For example, regarding the distance measure, sample data cannot be obtained in advance when performing online additional learning, so the dimensionality of the input vectors cannot be analyzed in advance to determine which distance measure is effective. For this reason, different distance measures may be combined to introduce a new distance measure representing the distance between two nodes, as described using equation (14) in US Pat. For example, a new distance measure combining the Euclidean distance and the cosine distance may be used, as shown in Equation (17) derived using Equations (14) to (16) in Patent Document 2.

Also, regarding the distance measure, the case of combining the cosine distance with the Euclidean distance has been described as an example, but the combination is not limited to this, and other distance measures (for example, cosine distance, Manhattan distance, fractional distance) may be combined. Furthermore, the choice is not limited to distance measures that are effective in high-dimensional space, and other distance measures may be combined according to the problem to be learned.
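For reference, the individual distance measures named in this description can be written as in the sketch below. The function names, the default exponent `p=0.5` for the fractional distance, and the caller-supplied covariance matrix for the Mahalanobis distance are assumptions made for illustration. The combined measure of Equation (17) in Patent Document 2 is not reproduced in this text, so no combination is shown.

```python
import numpy as np


def euclidean(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def manhattan(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))


def cosine_distance(a, b):
    # 1 minus the cosine similarity; not defined for zero vectors.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def fractional(a, b, p=0.5):
    # Minkowski-style distance with an exponent 0 < p < 1 (p is an assumed default).
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def mahalanobis(a, b, cov):
    # cov is the covariance matrix of the data, supplied by the caller.
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(np.asarray(cov, dtype=float)) @ diff))
```

Any of these functions could stand in for the distance calculation of step S5, depending on the dimensionality and nature of the input vectors.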

In the above-described embodiments, the present invention has been described mainly as a hardware configuration, but it is not limited to this, and arbitrary processing can also be realized by causing a CPU (Central Processing Unit) to execute a computer program. In this case, the computer program can be stored and provided to the computer using various types of non-transitory computer-readable media. Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic recording media (eg, flexible discs, magnetic tapes, hard disk drives), magneto-optical recording media (eg, magneto-optical discs), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (eg, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (random access memory)). The program may also be supplied to the computer on various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. Transitory computer-readable media can deliver the program to the computer via wired channels, such as wires and optical fibers, or wireless channels.

Although the present invention has been described with reference to the embodiments, the present invention is not limited to the above. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the invention.

This application claims priority based on Japanese Patent Application No. 2021-6621 filed on January 19, 2021, and the entire disclosure thereof is incorporated herein.

10 computer
11 CPU
12 ROM
13 RAM
14 bus
15 input/output interface
16 input unit
17 output unit
18 storage unit
19 communication unit
20 drive
20A magnetic disk
20B optical disk
20C flexible disk
20D semiconductor memory
100 teacher data creation unit
100A initial clustering processing unit
100B additional clustering processing unit
110 supervised learning unit
120 display unit
130 storage unit
101 data acquisition unit
102 clustering processing unit
103 first labeling unit
104 node selection unit
105 distance calculation unit
106 belonging cluster determination unit
107 second labeling unit
108 progress determination unit
1000 information processing device
C1-C4 clusters
DINT, DINT_L clustering intermediate data
DTCH teacher data
IN1 input data for creating teacher data
IN2 input data for learning

Claims

1. A clustering processing device comprising:
an initial clustering processing unit that clusters input data composed of a plurality of unlabeled nodes each described by a multidimensional vector and obtains clustering intermediate data in which labels are assigned to the nodes belonging to clusters; and
an additional clustering processing unit that creates clustering result data in which each unlabeled node included in the clustering intermediate data is assigned the same label as the label assigned to the node that is at the shortest distance from that unlabeled node among the nodes belonging to any of the clusters included in the clustering intermediate data.
2. The clustering processing device according to claim 1, wherein the additional clustering processing unit comprises:
a node selection unit that selects, from the nodes included in the clustering intermediate data, one unlabeled node that does not belong to any of the clusters;
a distance calculation unit that calculates the distance between the selected unlabeled node and each of the nodes belonging to the clusters;
a belonging cluster determination unit that identifies, based on the calculated distances, the node at the shortest distance from the selected unlabeled node from among all the nodes belonging to the clusters; and
a label assigning unit that assigns, to the selected unlabeled node, the same label as the label assigned to the shortest-distance node.
3. The clustering processing device according to claim 2, wherein
the additional clustering processing unit further comprises a progress determination unit that determines whether or not the clustering intermediate data contains a node to which no label is assigned, and
when such an unlabeled node exists, the additional clustering processing unit repeats the processing by the node selection unit, the distance calculation unit, the belonging cluster determination unit, and the label assigning unit.
4. A clustering processing method comprising:
clustering input data composed of a plurality of unlabeled nodes each described by a multidimensional vector, and obtaining clustering intermediate data in which labels are assigned to the nodes belonging to clusters; and
creating clustering result data in which each unlabeled node included in the clustering intermediate data is assigned the same label as the label assigned to the node that is at the shortest distance from that unlabeled node among the nodes belonging to any of the clusters included in the clustering intermediate data.
5. A non-transitory computer-readable medium storing a program that causes a computer to execute:
a process of clustering input data composed of a plurality of unlabeled nodes each described by a multidimensional vector, and obtaining clustering intermediate data in which labels are assigned to the nodes belonging to clusters; and
a process of creating clustering result data in which each unlabeled node included in the clustering intermediate data is assigned the same label as the label assigned to the node that is at the shortest distance from that unlabeled node among the nodes belonging to any of the clusters included in the clustering intermediate data.
6. An information processing device comprising:
a teacher data creation unit that creates teacher data by clustering input data for creating teacher data, the input data being composed of a plurality of unlabeled nodes each described by a multidimensional vector;
a supervised learning unit that, based on the teacher data, assigns labels to the nodes of input data to be learned, the input data to be learned being composed of a plurality of unlabeled nodes each described by a multidimensional vector; and
a display unit that displays a result of processing by the supervised learning unit,
wherein the teacher data creation unit comprises:
an initial clustering processing unit that clusters input data composed of a plurality of unlabeled nodes each described by a multidimensional vector and obtains clustering intermediate data in which labels are assigned to the nodes belonging to clusters; and
an additional clustering processing unit that creates clustering result data in which each unlabeled node included in the clustering intermediate data is assigned the same label as the label assigned to the node that is at the shortest distance from that unlabeled node among the nodes belonging to any of the clusters included in the clustering intermediate data.
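
For reference, the additional clustering processing recited above (node selection, distance calculation, belonging cluster determination, label assignment, and progress determination) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiments: the function names, the use of `None` to mark an unlabeled node, the Euclidean distance as the default measure, the order in which unlabeled nodes are selected, and the `grow_clusters` option (whether a newly labeled node is itself treated as a cluster member in later iterations) are all hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_remaining_labels(nodes, labels, distance=euclidean_distance,
                            grow_clusters=False):
    """Give every unlabeled node the label of its nearest labeled node.

    nodes  : list of vectors (the clustering intermediate data)
    labels : list of labels, with None for nodes outside every cluster
    """
    result = list(labels)  # work on a copy; the input list is not modified
    # Indices of the nodes that already belong to a cluster.
    labeled = [i for i, lab in enumerate(result) if lab is not None]
    if not labeled:
        raise ValueError("at least one labeled node is required")

    # Progress determination: repeat while an unlabeled node remains.
    while None in result:
        # Node selection: pick one unlabeled node (here, the first one).
        target = result.index(None)
        # Distance calculation and belonging cluster determination:
        # find the labeled node at the shortest distance from the target.
        nearest = min(labeled,
                      key=lambda i: distance(nodes[target], nodes[i]))
        # Label assignment: copy the label of the shortest-distance node.
        result[target] = result[nearest]
        if grow_clusters:
            # Assumed option: the newly labeled node joins its cluster.
            labeled.append(target)
    return result


# Usage with two labeled nodes and two unlabeled nodes on a line.
nodes = [[0.0, 0.0], [10.0, 0.0], [4.0, 0.0], [6.5, 0.0]]
labels = ["C1", "C2", None, None]
print(assign_remaining_labels(nodes, labels))
print(assign_remaining_labels(nodes, labels, grow_clusters=True))
```

With `grow_clusters=False`, only the nodes labeled by the initial clustering serve as references, so the result does not depend on the order in which unlabeled nodes are selected. With `grow_clusters=True`, a node labeled earlier can attract a later one, so the order matters.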