WO2020161845A1 - Clustering device and clustering method - Google Patents
Clustering device and clustering method
- Publication number
- WO2020161845A1 (application PCT/JP2019/004315)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cluster
- element data
- data
- value
- clusters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/325—Power saving in peripheral device
- G06F1/3275—Power saving in memory, e.g. RAM, cache
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23211—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The present invention relates to a clustering device and a clustering method.
- Clustering, which classifies a set of element data into a plurality of clusters (data groups), is well known and in recent years has been used frequently in image analysis, data mining, big data analysis, and similar fields. Clustering is a form of unsupervised machine learning. Similar element data are classified into the same cluster, element data in different clusters are made as dissimilar as possible, and tendencies or features of the element data are extracted from the classification result.
- Various clustering methods exist for classifying a plurality of element data into clusters.
- The k-means method is known as one typical clustering method.
- In the k-means method, the number of clusters k is set in advance and, for example, k arbitrary element data are selected from all N element data and set as the initial centroids of the k clusters (procedure 1).
- Each element data is classified into the cluster whose centroid is nearest to it (procedure 2).
- The average of the element data in each cluster is set as that cluster's new centroid (procedure 3).
- Procedures 2 and 3 are repeated until the centroid of each cluster no longer changes.
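The three procedures can be sketched in a few lines of Python. This is a minimal illustration of the general k-means method, not the circuit described later; the function names, the squared Euclidean distance, the `seed` argument, and the `max_iter` safety limit are assumptions made for the example.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance (an assumption; the background text names no metric).
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(data, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Procedure 1: pick k arbitrary element data as the initial centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = []
    for _ in range(max_iter):
        # Procedure 2: classify each element into the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda i: sq_dist(x, centroids[i])) for x in data]
        # Procedure 3: the mean of each cluster's members becomes its new centroid.
        new_centroids = []
        for i in range(k):
            members = [x for x, lab in zip(data, labels) if lab == i]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # stop when no centroid changes
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of points returns two centroids and one label per point.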
- A clustering method of this kind classifies element data into a preset number of clusters, so the number of clusters must be set in advance.
- To determine the number of clusters, an evaluation value is calculated for the clustering result at each number of clusters while the number of clusters is varied, and the number of clusters at which the evaluation value takes an extreme value (a maximum or minimum) is adopted.
- A method is known in which, when new element data to be classified is added, the optimum number of clusters is determined as described above using all element data including the new element data, and clustering is performed with that optimum number of clusters (see Non-Patent Document 1).
- A method is also known in which, when new element data is added, the cluster centroid nearest to the new element data is identified and the new element data is classified into the cluster having that centroid (see Non-Patent Document 2).
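The second known method amounts to a single nearest-centroid lookup. The sketch below illustrates it under assumptions: the function name is hypothetical and the Manhattan distance is chosen only because the embodiment described later uses it.

```python
def assign_to_nearest(new_x, centroids):
    """Return the index of the centroid nearest to new_x (Manhattan distance)."""
    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))
    # The existing centroids are left unchanged; only the new element is classified.
    return min(range(len(centroids)), key=lambda i: dist(new_x, centroids[i]))
```

Only one distance per existing cluster is computed, which is why this approach responds quickly but cannot revise the number of clusters.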
- The method of re-determining the optimum number of clusters using all element data including the new element data can be expected to give highly accurate classification results, but it is inefficient because the computational load is large. It is also unsuitable for applications that require a fast response for clustering results.
- The method of classifying the new element data into the cluster whose centroid is nearest to it is highly responsive, but the classification may be less accurate, because the number of clusters used before the addition is assumed to remain appropriate after the addition.
- The present invention has been made in view of the above circumstances, and an object thereof is to provide a clustering device and a clustering method that contribute to efficient and highly accurate clustering.
- A clustering device of the present invention classifies a plurality of element data and comprises: a data storage unit that stores the element data; an evaluation value calculation unit that calculates an evaluation value for evaluating a classification result; a batch processing unit that, while changing the set number of clusters, obtains the optimum number of clusters from the evaluation values and classifies the plurality of element data into clusters of that optimum number; an update processing unit that classifies newly added element data into the cluster closest to it among the clusters into which the batch processing unit has classified the element data; and a judgment unit that judges the validity of the classification result after the new element data has been classified, based on the evaluation value obtained after that classification.
- Another clustering device of the present invention comprises a clustering unit that classifies a plurality of element data into any of a plurality of clusters, and an evaluation value calculation unit. Using values calculated in the course of the clustering unit's classification, the evaluation value calculation unit obtains an internal coupling degree and an external separation degree. The internal coupling degree is the first sum value, over the clusters, of the values obtained by normalizing a first index value for each cluster, which indicates the degree of dispersion of the element data, by a first value based on the number of data in that cluster. The external separation degree is obtained by normalizing the second sum value of second index values for each cluster, each an index of the distance between clusters, by a second value based on the number of clusters.
- The evaluation value calculation unit calculates the evaluation value for evaluating the classification result of the clustering unit from a predetermined arithmetic expression having the internal coupling degree and the external separation degree as variables.
- In the clustering method of the present invention, the optimum number of clusters is obtained, while the number of clusters is changed, from the evaluation value that evaluates the classification result for each number of clusters when all the element data are classified.
- According to the present invention, when new element data is added after the element data have been classified with the optimum number of clusters obtained using all the element data, the new element data is classified into the closest of the existing clusters, and the validity of the classification result is judged based on the evaluation value obtained after this classification. Efficient and highly accurate clustering can therefore be performed.
- Further, according to the present invention, the evaluation value is calculated from an internal coupling degree indicating the degree of dispersion of the element data within clusters and an external separation degree indicating the degree of separation between clusters. A highly accurate evaluation value that suppresses excessive classification can therefore be obtained efficiently, and as a result efficient and highly accurate clustering becomes possible.
- FIG. 1 is a circuit diagram showing the configuration of a clustering device embodying the present invention.
- FIG. 2 is an explanatory diagram showing the configuration of the main memory and the delay circuit.
- FIG. 3 is a circuit diagram showing the configuration of a cell of the distance register unit.
- FIG. 4 is a circuit diagram showing the configuration of the maximum value detection circuit.
- FIG. 5 is a circuit diagram showing the configuration of the CID mask circuit.
- FIG. 6 is a circuit diagram showing the configuration of the centroid calculation circuit.
- FIG. 7 is a circuit diagram showing the configuration of the neighborhood search circuit.
- FIG. 8 is a block diagram showing the configuration of the evaluation value calculation circuit.
- FIG. 9 is a circuit diagram showing the enable signal circuit connected to a cell of the CID register.
- The clustering device 10 performs clustering on a plurality of element data.
- The clustering device 10 performs batch processing and update processing (online processing).
- Batch processing obtains the optimum number of clusters from the evaluation value for each number of clusters, obtained by clustering all the element data while changing the number of clusters, and classifies the element data with that optimum number of clusters to obtain the clustering result (classification result).
- In this example, the k-means method is used as the clustering method in the batch processing, and the number of clusters at which the evaluation value is at its maximum (peak) is taken as the optimum number of clusters.
- The update processing efficiently and quickly classifies new element data into an existing cluster when the new element data is added after the batch processing. In the update processing, an evaluation value is obtained after the new element data has been classified into a cluster, and the validity of the update processing is judged using that evaluation value. This evaluation value is calculated in the same way as the evaluation value obtained in the batch processing. The validity judgment decides whether the clustering result of the update processing is valid: if it is valid, the result of the update processing is taken as the final result, and if it is not valid, the batch processing is executed.
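The interplay of update processing, validity judgment, and fallback to batch processing can be sketched as a small control flow. Everything here is hypothetical scaffolding: the function name and the four callables are placeholders, and the validity criterion is passed in as `is_valid` because this part of the description does not state it.

```python
def process_new_element(new_x, state, classify_online, evaluate, is_valid, run_batch):
    """Classify new_x online, then fall back to batch processing if judged invalid."""
    classify_online(state, new_x)   # update processing: add to the closest existing cluster
    e_new = evaluate(state)         # evaluation value, computed as in batch processing
    if is_valid(e_new):             # validity judgment (criterion is an assumption)
        return "update", e_new      # the update result is the final result
    run_batch(state)                # otherwise re-cluster all element data
    return "batch", evaluate(state)
```

The expensive batch path runs only when the cheap online result is judged invalid, which is the efficiency argument made in the text.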
- The evaluation value for the number of clusters Nc, obtained in both the batch processing and the update processing, is denoted E(Nc).
- In this example, the evaluation value E(Nc) is expressed by equation (1). Here i = 1, 2, ..., Nc, and in this example i is the cluster ID.
- SWD: internal coupling degree, which is the first sum value (normalized)
- SBS/Nc: external separation degree (normalized)
- SBS: second sum value
- X: element data
- GG: data centroid, the centroid of all the element data
- C_i: the cluster with cluster ID "i"
- V_i: cluster centroid, the centroid of cluster C_i
- n_i: number of element data in cluster C_i
- d(V_i, GG): centroid distance, the distance between the cluster centroid V_i and the data centroid GG
- d(X, V_i): data-centroid distance, the distance between element data X and the cluster centroid V_i
- In the following, the centroid distance d(V_i, GG) is written as DGV_i, and the data-centroid distance d(X, V_i) as DXV_i.
- When the data-centroid distance DXV_i needs to be distinguished as the distance between an element data X in cluster C_i and the centroid V_i of that same cluster, it is sometimes called the intra-cluster distance DXV_i.
- Individual element data are referred to as element data X_1, X_2, and so on.
- The element data X is a q-dimensional vector (q is an integer of 1 or more) whose components represent feature amounts such as image color, shade, and color distribution. Each dimension of the element data X is represented by N bits (for example, 8 bits).
- The data centroid GG is obtained as the arithmetic mean of all the element data X.
- The cluster centroid V_i is obtained as the arithmetic mean of the element data X in the cluster.
- The data centroid GG and the cluster centroids V_i are q-dimensional vectors, like the element data X.
- In this example, the centroid distance DGV_i and the data-centroid distance DXV_i are obtained as Manhattan distances.
- The denominator SWD on the right side of equation (1) is the internal coupling degree, which indicates, over all clusters, the degree of dispersion of the element data X within each cluster C_i (the similarity between element data).
- The internal coupling degree in equation (1) is calculated as the first sum value: the sum over the clusters C_i of the coupling index value SWD_i, which is the first index value SD_i of cluster C_i normalized by dividing it by the number of data n_i in that cluster. The first index value SD_i is the sum of the intra-cluster distances DXV_i of the element data X in cluster C_i.
- The numerator on the right side of equation (1) is the external separation degree, which indicates the degree of separation of the clusters C_i over all the clusters.
- The external separation degree in equation (1) is a normalized value obtained by dividing the cluster index value SBS by the number of clusters Nc.
- The cluster index value SBS is obtained as the second sum value, which is the sum of the second index values SBS_i for each cluster C_i, each an index of the distance between clusters.
- The second index value SBS_i is the centroid distance DGV_i weighted by the number of data n_i of the cluster C_i.
- Using the centroid distance DGV_i for the second index value SBS_i has the advantage of requiring fewer calculations than using the distances between pairs of clusters.
- The weight is not limited to the number of data n_i itself; a value based on the number of data n_i may be used.
- The internal coupling degree SWD is expressed by equation (2) using the coupling index value SWD_i.
- The cluster index value SBS is expressed by equation (3) using the second index value SBS_i.
- The coupling index value SWD_i and the second index value SBS_i of cluster C_i are expressed by equations (4) and (5), respectively.
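Equations (1) to (5) themselves are not reproduced in this text. Based only on the verbal definitions above, they can be reconstructed as follows; this is a reconstruction under that assumption, and the typeset originals may differ in notation.

```latex
\begin{align}
E(N_c) &= \frac{SBS / N_c}{SWD} \tag{1} \\
SWD &= \sum_{i=1}^{N_c} SWD_i \tag{2} \\
SBS &= \sum_{i=1}^{N_c} SBS_i \tag{3} \\
SWD_i &= \frac{SD_i}{n_i} = \frac{1}{n_i} \sum_{X \in C_i} d(X, V_i) \tag{4} \\
SBS_i &= n_i \, d(V_i, GG) \tag{5}
\end{align}
```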
- As described above, the first sum value, in which each cluster's index value is normalized by that cluster's own number of data, is used as the internal coupling degree. Therefore, even when the element data X include a group spread over a much larger range than the other groups, or a group in which the element data X are densely distributed, the degree of dispersion of the element data X in each cluster C_i is properly reflected in the internal coupling degree. In other words, the evaluation value E(Nc) does not grow when excessive classification is performed, so excessive classification is suppressed.
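As a worked illustration, the evaluation value can be computed in software from a set of element data and their cluster IDs. This is a minimal sketch of the verbal definitions (internal coupling degree in the denominator, external separation degree in the numerator, Manhattan distances); the function names are assumptions, and raising an error for a zero internal coupling degree is a choice made for the example.

```python
def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def mean(points):
    return [sum(c) / len(points) for c in zip(*points)]

def evaluation_value(data, labels):
    gg = mean(data)                          # data centroid GG
    clusters = {}
    for x, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(x)
    nc = len(clusters)                       # number of clusters Nc
    swd = 0.0                                # internal coupling degree (first sum value)
    sbs = 0.0                                # cluster index value (second sum value)
    for members in clusters.values():
        v = mean(members)                    # cluster centroid V_i
        n = len(members)                     # number of data n_i
        sd = sum(manhattan(x, v) for x in members)  # first index value SD_i
        swd += sd / n                        # coupling index value SWD_i
        sbs += n * manhattan(v, gg)          # second index value SBS_i
    if swd == 0:
        raise ValueError("internal coupling degree is zero")
    return (sbs / nc) / swd                  # external separation degree / SWD
```

For four points at the corners of a 10 by 2 rectangle, grouping the two left corners and the two right corners scores higher than grouping the two bottom corners and the two top corners, as expected.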
- The clustering device 10 includes a system controller 11 and an arithmetic unit 12.
- The arithmetic unit 12 includes a main memory 14, a centroid memory 15, a clustering calculation unit 16, a neighborhood search circuit unit 17, and an evaluation value calculation circuit 18.
- The system controller 11 inputs the element data X to the arithmetic unit 12, instructs it to execute batch-processing clustering and update processing, sets the number of clusters Nc, determines the optimum number of clusters Nc based on the evaluation value E(Nc) acquired from the arithmetic unit 12, performs the above-mentioned validity judgment, controls power gating of the arithmetic unit 12, and so on. Power gating will be described later.
- The system controller 11 acquires from the arithmetic unit 12, as the clustering result, the cluster ID assigned to each element data X, the cluster ID of the cluster C_i into which added new element data Xnew has been classified, and the like. In this example, the system controller 11 serves as the judgment unit.
- The system controller 11 monitors the contents of the centroid memory 15, that is, each cluster centroid V_i, while the arithmetic unit 12 executes clustering, and ends the clustering when the cluster centroids V_i stop changing, that is, when they have converged.
- Instead of waiting for the cluster centroids V_i to converge, the clustering may be ended when the classification calculation described later has been performed a preset number of times.
- The system controller 11 stores restoration data for each number of clusters Nc during the batch processing.
- The restoration data is used to restore the cluster IDs, intra-cluster distances DXV_i, cluster centroids V_i, numbers of data n_i, and the like held in the arithmetic unit 12 to the state clustered with the optimum number of clusters Nc.
- In this example, the cluster ID assigned to each element data X is stored as the restoration data.
- The cluster centroids V_i or the like may be used as the restoration data instead, or both the cluster IDs and the cluster centroids V_i may be used.
- The data held in the arithmetic unit 12 could also be restored by executing the clustering again with the optimum number of clusters Nc, but by using the cluster IDs and the cluster centroids V_i the data can be restored at high speed with a small amount of calculation.
- In clustering such as the k-means method, most of the calculation time is spent on the iterative calculation that makes the cluster centroids converge.
- The number of iterations ranges from several tens to several hundreds depending on the total number of element data X, and can reach about 1000 when there are many.
- By using the cluster centroids V_i that have already converged and the cluster IDs determined by them, high-speed (short-time), highly accurate clustering can be achieved without repeating the iterative calculation.
- The arithmetic unit 12 is manufactured as an ASIC (Application Specific Integrated Circuit) that performs the above batch processing and update processing, and each part of the arithmetic unit 12 is configured to operate synchronously based on a clock from a clock generator (not shown).
- The arithmetic unit 12 starts batch-processing clustering or update processing when instructed to do so by the system controller 11.
- The clustering calculation unit 16 is a circuit that performs clustering by the k-means method; it performs the various clustering calculations in the batch processing, classifies each element data X into a cluster C_i, and so on.
- The clustering calculation unit 16 includes a delay circuit 21, a distance calculation circuit 22, a main register unit 26 consisting of a distance register unit 24 and a CID (cluster ID) register unit 25, a maximum value detection circuit 27, a CID (cluster ID) mask circuit 28, and a centroid calculation circuit 29.
- The clustering calculation unit 16 constitutes the batch processing unit together with the system controller 11.
- The main memory 14, serving as the data storage unit, stores the plurality of element data X written by the system controller 11. As shown by way of example in FIG. 2, the main memory 14 has unit blocks 14a, each with the same N-bit capacity as one dimensional component (hereinafter, vector component) of an element data X, arranged in a q × M matrix. Each column contains q unit blocks 14a, matching the number of dimensions of the element data X. The number M of unit blocks 14a arranged in the row direction is equal to or larger than the maximum number of element data X to be classified. The main memory 14 stores each element data X with one vector component per unit block 14a and with all the components of one element data X in the unit blocks 14a of the same column.
- FIG. 2 illustrates a state in which the respective vector components Xp1, Xp2, ..., Xpq of the element data Xp are written.
- When reading the element data X, the main memory 14 sequentially reads out the M unit blocks 14a of one row at a time. As a result, the vector components of all the element data X are output from the main memory 14 in parallel, one dimension at a time. Columns in which no element data X has been written are read in the same way; in that case a vector component of "0", for example, is read out.
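The column-per-element, row-per-dimension layout can be modelled with a small list-of-lists. This is only a software analogy for the memory organization described above; the function names and the example sizes are assumptions.

```python
def make_memory(q, m):
    # q rows (one per dimension) by m columns (one per element data); unwritten blocks read as 0.
    return [[0] * m for _ in range(q)]

def write_element(memory, column, x):
    # One vector component per unit block, all components of x in the same column.
    for dim, component in enumerate(x):
        memory[dim][column] = component

def read_rows(memory):
    # Each read returns one row: the same dimension of all M element data in parallel.
    for row in memory:
        yield list(row)
```

With q = 3 and M = 4, writing (1, 2, 3) to column 0 and (4, 5, 6) to column 1 and then reading yields the rows [1, 4, 0, 0], [2, 5, 0, 0], and [3, 6, 0, 0].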
- The main memory 14 may be configured so that the element data X can be written in small units, such as one unit block 14a at a time. The same applies to the centroid memory 15.
- The centroid memory 15 stores the cluster centroid V_i of each cluster and, although not shown, has the same configuration as the main memory 14. That is, the centroid memory 15 is formed by arranging N-bit unit blocks in a matrix with q unit blocks in each column, and stores one q-dimensional cluster centroid V_i in each column.
- The centroid memory 15 has a smaller capacity than the main memory 14. The cluster centroids V_i are read out from the centroid memory 15 one at a time, and for each cluster centroid V_i the vector components of its column are read out sequentially.
- Non-volatile memory is used for the main memory 14 and the centroid memory 15.
- As that non-volatile memory, a memory having storage elements such as MTJ elements is preferably used.
- The delay circuit 21 is provided between the main memory 14 and the distance calculation circuit 22.
- The delay circuit 21 synchronizes the input timing of the element data X, which is read from the main memory 14 and input to the distance calculation circuit 22, with that of the cluster centroid V_i, which the centroid calculation circuit 29 calculates from the element data X and inputs to the distance calculation circuit 22.
- The delay circuit 21 is composed of register units 31 connected in multiple stages.
- Each register unit 31 is composed of M cells 31a.
- Each cell 31a is a register having a capacity of N bits.
- The input of the element data X to the distance calculation circuit 22 is delayed by sequentially passing the vector components from each cell 31a of one register unit 31 to the corresponding cell 31a of the register unit 31 in the next stage.
- The delay time of the delay circuit 21, that is, the number of stages of register units 31, is determined in advance from the number of clocks required to calculate the data centroid GG, the cluster centroids V_i, and the like.
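The multi-stage register chain behaves like a fixed-length shift register: a row of M vector components entered on one clock reappears a fixed number of clocks later. The class below is a behavioural sketch only; the class name, the `clock` method, and the zero initial contents are assumptions.

```python
from collections import deque

class DelayLine:
    def __init__(self, stages, width):
        # One list of `width` cells per register stage, all initially zero.
        self.regs = deque([[0] * width for _ in range(stages)], maxlen=stages)

    def clock(self, row_in):
        # On each clock the last stage is output and every stage shifts by one.
        row_out = self.regs[-1]
        self.regs.appendleft(list(row_in))  # maxlen drops the oldest stage
        return row_out
```

With two stages, a row fed in on one clock is returned on the second clock after it, matching a delay equal to the number of stages.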
- During the batch processing, the distance calculation circuit 22 receives each element data X from the main memory 14 via the delay circuit 21 and the cluster centroid V_i calculated by the centroid calculation circuit 29. In the update processing, the cluster centroid V_i is supplied from the centroid memory 15 instead of from the centroid calculation circuit 29. The distance calculation circuit 22 calculates the data-centroid distance DXV_i for all the input element data X in parallel.
- Inputting vector data such as the element data X or the cluster centroid V_i to a circuit means that its vector components are input sequentially.
- The distance register unit 24 holds each data-centroid distance DXV_i calculated by the distance calculation circuit 22, and the CID register unit 25 holds a cluster ID (cluster information).
- The content of the distance register unit 24 is updated to the new data-centroid distance DXV_i when the new distance calculated by the corresponding part of the distance calculation circuit 22 is smaller than the data-centroid distance DXV_i held at that time.
- When clustering finishes, each cluster ID finally held in the CID register unit 25 indicates the cluster into which the corresponding element data is classified, and the data-centroid distance DXV_i held in the distance register unit 24 indicates the intra-cluster distance.
- The distance calculation circuit 22 has M cells 22a that calculate the data-centroid distance DXV_i.
- The distance register unit 24 has M cells 24a that hold the data-centroid distance DXV_i.
- The CID register unit 25 has M cells 25a that hold the cluster ID.
- The cells 24a and 25a are registers each having a capacity of multiple bits. As described above, when the element data X is a q-dimensional vector with N bits per dimension, the data-centroid distance DXV_i is treated as (N+q)-bit data, so the cell 24a has a capacity of at least (N+q) bits.
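That (N+q) bits are enough to hold a Manhattan distance over q dimensions of N bits each can be checked numerically. The helper names below are assumptions, and the check only confirms that (N+q) is a sufficient width, not that it is the smallest possible one.

```python
def max_manhattan(n_bits, q):
    # Largest possible distance: every dimension differs by the full N-bit range.
    return q * ((1 << n_bits) - 1)

def bits_needed(value):
    # Number of bits required to represent a non-negative integer.
    return max(1, value.bit_length())

def fits_in_n_plus_q(n_bits, q):
    return bits_needed(max_manhattan(n_bits, q)) <= n_bits + q
```

For N = 8 and q = 4, the largest distance is 1020, which needs 10 bits and therefore fits in the 12 bits allotted.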
- One each of the cells 22a, 24a, and 25a corresponds to each column of the main memory 14.
- Each element data X stored in the main memory 14 thus corresponds to one each of the cells 22a, 24a, 25a, and 28a, and the contents of those cells correspond to that element data X.
- FIG. 3 and FIGS. 4 to 7, which will be described later, show only the configuration and signals of essential parts.
- As shown by way of example in FIG. 3, the cell 22a of the distance calculation circuit 22 includes a full adder 32, an XOR (exclusive OR) circuit 33, a selector 34, a full adder 35, and a calculation register 36.
- The element data X from the delay circuit 21 is input to one input terminal of the full adder 32, and the cluster centroid V_i from the centroid calculation circuit 29 or the centroid memory 15 is input in inverted form to the other input terminal of the full adder 32.
- The carry signal (negative logic) of the full adder 32 is input to the XOR circuit 33.
- The XOR circuit 33 sequentially outputs the distance between the element data X and the cluster centroid V_i for each dimension.
- The distance for each dimension from the XOR circuit 33 is sequentially input to one input terminal of the full adder 35 via the selector 34. Each time the distance for one dimension is input to the full adder 35, the content of the calculation register 36 is read out in synchronization and input to the other input terminal of the full adder 35. The content of the calculation register 36 is then updated with the result from the full adder 35. The initial content of the calculation register 36 is "0". As a result, once the distances for all q dimensions have been input to the full adder 35, the data-centroid distance (Manhattan distance) DXV_i is held in the calculation register 36.
- After the calculation register 36 holds the data-centroid distance as described above, that is, after DXV_i has been calculated, the selector 34 inputs the inverted contents of the cell 24a to one input terminal of the full adder 35. The contents of the calculation register 36 and of the cell 24a are then read out in synchronization and input to the full adder 35. The contents of the calculation register 36 are supplied to the cell 24a as input data, and the carry signal of the full adder 35 is supplied as an update signal. As a result, when a carry occurs in the full adder 35, the contents of the cell 24a are updated to the contents of the calculation register 36.
- In this way, the distance register unit 24 holds the data-centroid distance DXV_i for each element data X.
- While the distance calculation circuit 22 calculates the data-centroid distance DXV_i, the cluster ID of the cluster C_i being processed is input to each cell 25a of the CID register unit 25 as the designated CID. As with the cell 24a, the content of the cell 25a is updated to the input cluster ID when a carry occurs in the full adder 35. As a result, the cell 25a holds the cluster ID of the cluster C_i with the minimum data-centroid distance DXV_i. That is, the cluster ID held by the cell 25a indicates the cluster into which the corresponding element data X is classified.
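Behaviourally, the cells 22a, 24a, and 25a together keep a running minimum distance and the cluster ID that produced it. The sketch below models that behaviour in software rather than at the gate level; the function name is an assumption, and `None` stands in for an initial register state that this passage does not specify.

```python
def classify_pass(data, centroids):
    m = len(data)
    dist_reg = [None] * m   # cells 24a: smallest data-centroid distance seen so far
    cid_reg = [None] * m    # cells 25a: cluster ID that gave that distance
    for cid, v in enumerate(centroids):       # designated CID, one cluster at a time
        for col, x in enumerate(data):        # in hardware the M cells work in parallel
            acc = 0                           # calculation register 36 starts at 0
            for xc, vc in zip(x, v):          # one dimension per step
                acc += abs(xc - vc)           # accumulate the Manhattan distance
            if dist_reg[col] is None or acc < dist_reg[col]:
                dist_reg[col] = acc           # update only when the new distance is smaller
                cid_reg[col] = cid
    return dist_reg, cid_reg
```

After all centroids have been presented, `cid_reg` holds the classification of each element and `dist_reg` its intra-cluster distance.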
- Each bus that carries data within the arithmetic unit 12 has a bus width (number of bits) suited to the data it carries. For example, the N-bit vector components of the element data X and of the cluster centroid V_i are input in parallel to the input terminals of the full adder 32 of the distance calculation circuit 22, so a bus with a width of N bits is connected to each of these input terminals. Between the full adder 35 and the calculation register 36, the (N+q)-bit data-centroid distance DXV_i is transmitted in parallel, so a bus with a width of (N+q) bits is used.
- A bus that selectively carries either the element data X or the cluster centroid V_i, such as that of the selector 37c of the CID mask circuit 28, uses the larger of the two bit counts; in this example its width is the (N+q) bits corresponding to the cluster centroid V_i.
- In FIGS. 3, 5, and 7, the bus widths of the main parts are indicated in the drawings.
- The distance register unit 24 and the CID register unit 25 are provided, for each pair of cells 24a and 25a, with an enable signal circuit 71 (see FIG. 9) that switches the latch operation of the cells 24a and 25a between enabled and disabled.
- By inputting the enable signal from the enable signal circuit 71 to the cells 24a and 25a, the latch operation of the cells 24a and 25a can be enabled or disabled at timings that suit the various operations.
- This makes control of the individual cells 24a and 25a by the system controller 11 unnecessary. Further, between a cell 25a and the corresponding cell 28a, the contents of one can be latched by the other.
- When initially setting the cluster center of gravity V i of the cluster C i , the maximum value detection circuit 27 compares the data center-of-gravity distances DXV i held in the distance register unit 24 and detects the maximum data center-of-gravity distance DXV i .
- the maximum value detection circuit 27 outputs M maximum flags (1 bit) corresponding to the M cells 24a.
- The maximum value detection circuit 27 includes an M-input AND circuit 27a and, for each of the M cells 24a of the distance register unit 24, an OR circuit 27b, a NAND circuit 27c, and a 1-bit register 27d.
- the OR circuit 27b, the NAND circuit 27c, and the register 27d are connected to each other corresponding to the cell 24a.
- the cell 24a sends out the held data of the inter-center-of-gravity distance DXV i to the maximum value detection circuit 27 one bit at a time from the upper bit.
- the 1-bit signal from the corresponding cell 24a is inverted and input to one input terminal of the OR circuit 27b, and the output of the register 27d is inverted and input to the other input terminal.
- the output of the OR circuit 27b is input to one input terminal of the NAND circuit 27c, and the output of the AND circuit 27a is inverted and input to the other input terminal.
- The register 27d holds the logic ("1" or "0") of the output of the NAND circuit 27c and outputs the held logic. With this configuration, after all the bits of the data center-of-gravity distance DXV i have been transmitted from the cell 24a, the output of each register 27d is sent to the corresponding cell 28a of the CID mask circuit 28 as a maximum flag indicating whether or not the data center-of-gravity distance DXV i is the maximum.
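The MSB-first, bit-by-bit search for the maximum can be sketched behaviorally as follows. This is not a gate-level model of the circuits 27a to 27d; it is the usual bit-serial elimination scheme, in which a candidate is dropped as soon as another remaining candidate has a "1" where it has a "0". The function name and the assumption that every flag starts at "1" are illustrative only.

```python
from typing import List

def max_flags(values: List[int], width: int) -> List[int]:
    # One flag per cell; 1 means "still a candidate for the maximum".
    flags = [1] * len(values)
    # Bits are examined one at a time from the most significant bit.
    for bit in range(width - 1, -1, -1):
        bits = [(v >> bit) & 1 for v in values]
        # If any remaining candidate has a 1 here, candidates with a 0 drop out.
        if any(f and b for f, b in zip(flags, bits)):
            flags = [f & b for f, b in zip(flags, bits)]
    return flags

flags = max_flags([5, 9, 3, 9], width=4)
```

In this sketch, equal maximum values all keep their flag set, so a tie produces more than one "1".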
- The CID mask circuit 28 is configured to output only those items of the element data X input from the main memory 14, or of the data center-of-gravity distances DXV i input from the distance register section 24, that are necessary for processing.
- the CID mask circuit 28 has M cells 28a as shown in FIG. As shown in FIG. 5B, the cell 28a includes selectors 37a to 37c, a MID register 37d, a comparator 37e, and the like. Like each of the cells 24a and 25a, each MID register 37d is provided with an enable signal circuit 75 (see FIG. 10), which makes it unnecessary for the system controller 11 to individually control each MID register 37d. By inputting the enable signal from the enable signal circuit 75 to the MID register 37d, the valid/invalid of the latch operation of the MID register 37d can be switched at a timing according to various operations.
- the selector 37a selects either one of the element data X and the data center-of-gravity distance DXV i as input data and outputs it to the selector 37c.
- The selector 37b selects either the cluster ID from the cell 25a of the CID register unit 25 or the external setting ID (cluster ID) from the system controller 11 and inputs it to the MID register 37d.
- the MID register 37d holds the cluster ID from the selector 37b by a latch operation and inputs the held cluster ID to the comparator 37e.
- the comparator 37e compares the designated CID with the cluster ID from the MID register 37d, outputs a 1-bit comparison flag (C-flag) indicating the comparison result to the outside, and inputs it to the selector 37c.
- the comparison flag is "1" when the designated CID and the cluster ID from the MID register 37d match, and "0" when they do not match.
- The selector 37c outputs the input data (the element data X or the data center-of-gravity distance DXV i ) when the comparison flag from the comparator 37e is "1", and outputs null data in which all bits are "0" when the comparison flag is "0".
- When the MID register 37d holds the cluster ID of the corresponding cell 25a of the CID register unit 25, the element data X classified into the cluster whose cluster ID matches the designated CID, or its data center-of-gravity distance DXV i , is output from the cell 28a, and the comparison flag from that cell 28a becomes "1".
- The center-of-gravity calculation circuit 29 calculates the data centroid GG and the cluster centroid V i from each element data X input through the CID mask circuit 28 and the comparison flag. It also supplies the number of data n i and the data addition value SS i , both obtained in the course of calculating the cluster centroid V i , to the evaluation value calculation circuit 18. The data number n i and the data addition value SS i are used in the evaluation value calculation circuit 18 for calculation of the second index value SBS i and the like. Further, the center-of-gravity calculation circuit 29 calculates the combination index value SWD i from each data center-of-gravity distance DXV i input through the CID mask circuit 28 and the comparison flag, and sends it to the evaluation value calculation circuit 18. As described above, the combination index value SWD i is obtained by dividing the first index value SD i , which is the sum of the intra-cluster distances DXV i , by the data number n i .
- the above-mentioned data addition value SS i is data obtained by adding each element data X of the cluster C i for each dimension, as shown in Expression (6), and is the same q-dimensional vector as the element data X.
- the cluster centroid V i is obtained by dividing the data addition value SS i by the number of data n i , as shown in Expression (7). Further, the data centroid GG is obtained by dividing the data addition value for all element data X by the total number of data.
- the center of gravity calculating circuit 29 sends the cluster center of gravity V i to the center of gravity memory 15 and the distance calculating circuit 22.
- The center-of-gravity calculation circuit 29 includes a selector unit 38 including M selectors 38a, an adder 39, a first register 41, a second register 42, and a divider 43, as shown in an example in FIG.
- Each selector 38a operates so as to sequentially select the output data (element data X or the data center-of-gravity distance DXV i ) from the CID mask circuit 28 and the comparison flag (C-flag) and output them to the adder 39.
- the adder 39 adds the input data.
- The first register 41 holds the calculation result of the adder 39 when the element data X or the data center-of-gravity distance DXV i is input from the selector unit 38 to the adder 39. Accordingly, when the element data X is input to the adder 39, the data addition value is held in the first register 41, and when the data center-of-gravity distance DXV i is input, the first index value SD i is held in the first register 41.
- When the comparison flags are input, the adder 39 adds these 1-bit values, and the resulting sum is held in the second register 42.
- the value held in the second register 42 is the number of the element data X or the data center-to-center-of-gravity distance DXV i output from the CID mask circuit 28. Thereby, the number of data n i or the total number of data can be obtained.
- the adder 39 functions as a data adder when the element data X is input, and functions as a number calculator when the comparison flag is input.
- the divider 43 outputs a value obtained by dividing the value held in the first register 41 by the value held in the second register 42. As the output of the divider 43, the data center of gravity GG, the cluster center of gravity V i , and the coupling index value SWD i are obtained.
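The cooperation of the CID mask circuit 28 and the center-of-gravity calculation circuit 29 can be summarized in a small sketch: masked element data are accumulated (first register 41), comparison flags are counted (second register 42), and the quotient gives the centroid (divider 43). The function name, the list representation of the MID registers, and the use of floating-point division are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

def cluster_centroid(elements: List[Sequence[int]],
                     mids: List[int],
                     designated_cid: int) -> Tuple[List[int], int, List[float]]:
    q = len(elements[0])
    total = [0] * q   # models the first register 41 (data addition value)
    count = 0         # models the second register 42 (number of flags)
    for x, mid in zip(elements, mids):
        # Comparison flag of the cell 28a: 1 when the held ID matches the designated CID.
        flag = 1 if mid == designated_cid else 0
        # The mask passes the data when the flag is 1, otherwise all-zero null data.
        masked = x if flag else (0,) * q
        count += flag
        total = [t + c for t, c in zip(total, masked)]
    if count == 0:
        raise ValueError("no element data matches the designated CID")
    # Models the divider 43: per-dimension sum divided by the count.
    centroid = [t / count for t in total]
    return total, count, centroid

ss1, n1, v1 = cluster_centroid([(2, 4), (4, 8), (100, 100)], [1, 1, 2], designated_cid=1)
```

Setting every entry of `mids` to the designated CID yields the data centroid GG, and feeding distances instead of element data yields the first index value and, after division, the combination index value.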
- the neighborhood search circuit unit 17 constitutes an update processing unit together with the system controller 11 and the center-of-gravity memory 15.
- the update processing unit and the batch processing unit described above constitute a clustering unit.
- During the update processing, this neighborhood search circuit unit 17 specifies the cluster ID that minimizes the data center-of-gravity distance DXV i to the new element data Xnew to be added, and classifies the new element data Xnew into the specified cluster C i .
- the neighborhood search circuit unit 17 includes a calculation unit 17a, a short distance register unit 17b, and a short distance CID register unit 17c.
- the calculation unit 17a calculates a data center-to-center-of-gravity distance DXV i between the new element data Xnew and each cluster center of gravity V i sequentially read from the center of gravity memory 15.
- the short-distance register unit 17b and the short-distance CID register unit 17c hold the minimum data centroid distance DXV i and the cluster ID based on the calculation result of the calculation unit 17a.
- the cluster ID finally held in the short distance CID register unit 17c becomes the cluster ID of the cluster C i into which the new element data Xnew is classified.
- the cluster ID finally held in the short distance CID register unit 17c is written in the cell 25a of the CID register unit 25 corresponding to the new element data Xnew. Further, a part of the calculation circuit forming the neighborhood search circuit unit 17 is used for obtaining the evaluation value E(Nc) at the time of the update processing.
- the calculation unit 17a includes a selector 44, a full adder 45, an XOR circuit 46, a selector 47, a full adder 48, a calculation register 49, and an adder 61.
- the calculation unit 17a sequentially calculates the data center-of-gravity distance DXV i between the new element data Xnew and each cluster center of gravity V i . Further, the minimum distance between data centers of gravity DXV i is held in the short distance register unit 17b, and the cluster ID corresponding to the distance between data centers of gravity DXV i is held in the short distance CID register unit 17c.
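The neighborhood search for one new element can be sketched as a sequential scan over the stored cluster centroids that keeps the smallest distance and its cluster ID. The Manhattan distance, the dictionary representation of the centroid memory, and the use of `None` as the initial "nothing held yet" state are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence, Tuple

def manhattan(x: Sequence[float], v: Sequence[float]) -> float:
    # Assumed distance measure, chosen only for the example.
    return sum(abs(a - b) for a, b in zip(x, v))

def nearest_cluster(x_new: Sequence[float],
                    centroids: Dict[int, Sequence[float]]) -> Tuple[Optional[int], Optional[float]]:
    best_distance = None  # models the short distance register unit 17b
    best_cid = None       # models the short distance CID register unit 17c
    # Centroids are read one after another, as from the centroid memory.
    for cid, v in centroids.items():
        d = manhattan(x_new, v)
        if best_distance is None or d < best_distance:
            best_distance = d
            best_cid = cid
    return best_cid, best_distance

cid, dist = nearest_cluster((5, 5), {1: (0, 0), 2: (6, 6), 3: (9, 9)})
```

The returned cluster ID corresponds to the value that is finally written to the cell 25a for the new element data.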
- The circuit configurations of the calculation unit 17a, the short distance register unit 17b, and the short distance CID register unit 17c of the neighborhood search circuit unit 17 are the same as those of the cell 22a of the distance calculation circuit 22, the cell 24a of the distance register unit 24, and the cell 25a of the CID register unit 25, respectively, so a detailed description thereof is omitted.
- The full adder 45, the XOR circuit 46, and the adder 61 are also used to calculate the center-of-gravity distance DGV i between the data centroid GG and the cluster centroid V i of the cluster into which the new element data Xnew is classified.
- the center-of-gravity distance DGV i is sent to the evaluation value calculation circuit 18.
- the cluster centroid V i is input from the centroid memory 15.
- the evaluation value calculation circuit 18 as an evaluation value calculation unit calculates the evaluation value E(Nc) at the end of each clustering in the batch processing and at the time of the update processing. As shown in an example in FIG. 8, the evaluation value calculation circuit 18 is roughly divided into a logic section 18a and an evaluation register section 18b.
- The logic unit 18a includes a selector 50, a multiplier 51, a subtractor 52, an integrator 53, a parallel adder 54, a multiplier 55, adders 56 and 57, a denominator register 58, a numerator register 59, and a divider 60.
- the logic unit 18a calculates the evaluation value E(Nc) according to the above equation (1) using various data held in the evaluation register unit 18b, the number of clusters Nc input from the system controller 11, and the like. Further, the logic unit 18a calculates the cluster centroid V i into which the new element data Xnew is classified during the update processing, and writes this in the centroid memory 15. The detailed operation of the evaluation value calculation circuit 18 will be described later.
- the evaluation register unit 18b includes a GG register 63, a data number register 64, an SBS register unit 66, a SWD register unit 67, and a selector 68.
- the GG register 63 holds the data gravity center GG calculated by the gravity center calculation circuit 29.
- the data number register 64 holds the data number n i of each cluster C i obtained by the centroid calculation circuit 29.
- the data center of gravity GG may be stored in the center of gravity memory 15, and the GG register 63 may be omitted. It is also preferable that each of the GG register 63, the data number register 64, the SBS register unit 66, and the SWD register unit 67 is a non-volatile register.
- The SBS register unit 66 has a first SBS register 66 1 , a second SBS register 66 2 , ...
- The i-th SBS register 66 i holds the second index value SBS i obtained from the data addition value SS i .
- The SWD register unit 67 has a first SWD register 67 1 , a second SWD register 67 2 , ...
- The i-th SWD register 67 i holds the combination index value SWD i .
- the selector 68 selects one of the SBS register unit 66 and the SWD register unit 67 and sends the data held in the selected register unit to the logic unit 18a.
- FIG. 9 shows an example of the enable signal circuit 71 connected to the cell 24a of the distance register section 24 and the cell 25a of the CID register section 25.
- The enable signal circuit 71 is provided for each pair of cells 24a and 25a as described above and is therefore also connected to the cell 25a of the CID register section 25, but illustration of the cell 25a is omitted from the figure.
- The enable signal circuit 71 is composed of AND circuits 71a, 71c, 71f, OR circuits 71b, 71d, and a NAND circuit 71e.
- the first control signal (CIDM flag) and the second control signal (Fupdate_preset_N) are input to the OR circuit 71b, and the third control signal (OF) and the fourth control signal (Fset) are input to the NAND circuit 71e.
- the first control signal and the fifth control signal (Flag_enable) are input to the AND circuit 71f.
- the output of the NAND circuit 71e and the output of the AND circuit 71f are input to the OR circuit 71d.
- The output of the OR circuit 71d and the sixth control signal (Fauto) are input to the AND circuit 71c.
- The outputs of the AND circuit 71c and the OR circuit 71b are input to the AND circuit 71a, and the output of the AND circuit 71a is input as an enable signal to the enable terminal of the cell 24a.
- To the data input of the cell 24a, any one of the data for initialization, the contents of the calculation register 36, and the data read from the cell 24a itself (the data center-of-gravity distance) is input via a selector (not shown).
- the data for initialization is data in which all bits are "1" or data in which "0" is set.
- the cell 24a can output the data one bit at a time starting from the most significant bit by shifting the held data to the upper bit side like a shift register. In this case, by returning the data read by the cell 24a to the input, the contents of the cell 24a are returned to the original state after the transmission of all the bits is completed.
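The MSB-first readout with feedback can be illustrated as a left rotation: each step emits the most significant bit and feeds it back into the least significant position, so the register content is restored after as many steps as it has bits. The function name and the integer representation of the cell are assumptions for illustration.

```python
from typing import List, Tuple

def read_msb_first(value: int, width: int) -> Tuple[List[int], int]:
    mask = (1 << width) - 1
    bits = []
    for _ in range(width):
        # The bit sent out is the current most significant bit.
        msb = (value >> (width - 1)) & 1
        bits.append(msb)
        # Shift toward the upper side and return the read bit to the input.
        value = ((value << 1) & mask) | msb
    return bits, value

bits, restored = read_msb_first(0b1011, width=4)
```

After the loop, `restored` equals the original value, which mirrors the statement that the cell returns to its original state once all bits have been sent.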
- the first to sixth control signals are generated in the arithmetic unit 12.
- The first control signal is a comparison flag, namely the comparison flag output from the cell 28a of the CID mask circuit 28.
- the third control signal is a carry signal of full adder 35.
- the second, fourth, fifth, and sixth control signals are signals from the system controller 11.
- the second control signal is a low active signal for controlling the initialization of the content of each cell 24a of the distance register section 24.
- the fourth control signal is a set signal set to "1" or "0".
- the fifth control signal is a signal for validating the first control signal (comparison flag).
- the sixth control signal is a signal that supports automatic updating of each cell 24a of the distance register unit 24 during classification calculation.
- The operation of the enable signal circuit 71 when initializing the distance register section 24 is as follows. The aim of this initialization is a state in which the maximum value (data in which all bits are "1") is written in each of the cells 24a corresponding to the element data X and the other cells 24a hold the minimum value (data in which all bits are "0").
- In this initialization, data with each bit "0" is written in advance in the cells 24a of the distance register unit 24. The CID mask circuit 28 is initialized so that "1" is held in the MID register 37d of each cell 28a corresponding to the element data X and "0" is held in the other MID registers 37d, and the designated CID of "1" is input to the comparator 37e. The operation is performed with the contents of each cell 25a of the CID register unit 25 set to "1". Further, initialization data in which all the bits are "1" is input to each cell 24a.
- The second control signal is "0", the third control signal is "0", the fourth control signal is "1", and the sixth control signal is "1". The fifth control signal is "1" (it may also be "0").
- Of the comparison flags (first control signals) from the CID mask circuit 28 set as described above, those corresponding to the element data X are "1" and the others are "0".
- Accordingly, the enable signal of each cell 24a corresponding to the element data X becomes "1", and the enable signals of the other cells 24a become "0".
- As a result, only the cells 24a corresponding to the element data X latch and hold the initialization data in which all bits are "1".
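The gate network of the enable signal circuit 71 can be written out as a small truth function to check the initialization case. The sketch wires the gates exactly as the connections are listed in the text, with every signal modeled as 0 or 1; any inversion that may appear only in FIG. 9 is not modeled, so the polarity of the individual inputs is an assumption. The function and argument names are hypothetical.

```python
def enable_71(cidm_flag: int, fupdate_preset_n: int, of: int,
              fset: int, flag_enable: int, fauto: int) -> int:
    # OR circuit 71b: first and second control signals.
    or_71b = cidm_flag | fupdate_preset_n
    # NAND circuit 71e: third and fourth control signals.
    nand_71e = 1 - (of & fset)
    # AND circuit 71f: first and fifth control signals.
    and_71f = cidm_flag & flag_enable
    # OR circuit 71d: outputs of 71e and 71f.
    or_71d = nand_71e | and_71f
    # AND circuit 71c: output of 71d and the sixth control signal.
    and_71c = or_71d & fauto
    # AND circuit 71a: outputs of 71c and 71b; this is the enable signal.
    return and_71c & or_71b

# Initialization settings: second = 0, third = 0, fourth = 1, fifth = 1, sixth = 1.
enable_for_used_cell = enable_71(1, 0, 0, 1, 1, 1)
enable_for_unused_cell = enable_71(0, 0, 0, 1, 1, 1)
```

With these settings the enable output simply follows the comparison flag, which is why only cells associated with element data latch the initialization data.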
- During classification calculation, the second control signal is "1", the fourth control signal is "1", the fifth control signal is "0", and the sixth control signal is "1".
- As a result, an enable signal with the same logic as the third control signal, which is the carry signal of the full adder 35, is input to the cell 24a.
- That is, the enable signal becomes "1" when a carry occurs in the full adder 35, which is when the data center-of-gravity distance DXV i held in the calculation register 36 is smaller than the data center-of-gravity distance DXV i held in the cell 24a.
- The cell 24a then latches and holds the data center-of-gravity distance DXV i of the calculation register 36.
- In the update processing, the cluster ID of the classification destination is written in the MID register 37d of the cell 28a of the CID mask circuit 28 corresponding to the new element data Xnew.
- This cluster ID is then set as the designated CID and is input to each cell 25a and each comparator 37e.
- The second control signal is set to "0", the fourth control signal to "1", the fifth control signal to "0", and the sixth control signal to "1".
- As a result, the enable signal becomes "1" only for the cell 25a whose first control signal (comparison flag) is "1". Since the comparison flag is "1" only from the cell 28a corresponding to the new element data Xnew, the cluster ID of the classification destination is written only in the cell 25a corresponding to the new element data Xnew.
- FIG. 10 shows an example of the enable signal circuit 75 provided in the MID register 37d of the CID mask circuit.
- the enable signal circuit 75 is provided for each MID register 37d.
- the enable signal circuit 75 includes a selector 75a, an AND circuit 75b, a NAND circuit 75c, and a NOT circuit 75d.
- the seventh control signal (MaxDetector) is input to the NOT circuit 75d.
- the output of the NOT circuit 75d and the eighth control signal (i_presetMIDreg_N) are input to the NAND circuit 75c.
- the output of the NAND circuit 75c and the ninth control signal (Disable_N) are input to the AND circuit 75b.
- the selector 75a receives the output from the AND circuit 75b and the tenth control signal (Column Decoder), and inputs one of them as an enable signal to the enable terminal of the MID register 37d.
- The seventh control signal is the maximum flag from the maximum value detection circuit 27. When a new initial value of the cluster centroid V i is set, the seventh control signal is used to control the latch operation of the MID register 37d.
- the eighth control signal is a low active signal from the system controller 11, and the eighth control signal controls the latch operation of the MID register 37d at the time of initialization.
- the ninth control signal is a low active signal from the system controller 11, and the ninth control signal switches between valid/invalid of the eighth control signal.
- the tenth control signal is a column decode signal of the main memory 14 and is a signal for controlling the cell 28a of the CID mask circuit 28 corresponding to the column of the main memory 14.
- the tenth control signal is used as an enable signal when the contents of the MID register 37d are designated by the external setting CID. Specifically, it is used, for example, when designating an unused cell 28a for new element data Xnew to be added in the future.
- the tenth control signal is a signal from the system controller 11.
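The enable signal circuit 75 can likewise be expressed as a truth function that follows the connections listed above. The text does not say what drives the selector 75a, so the `select_column` argument is a hypothetical stand-in for that control; all names are illustrative and every signal is modeled as 0 or 1.

```python
def enable_75(max_detector: int, i_preset_midreg_n: int, disable_n: int,
              column_decoder: int, select_column: int) -> int:
    # NOT circuit 75d: inverts the seventh control signal (maximum flag).
    not_75d = 1 - max_detector
    # NAND circuit 75c: output of 75d and the eighth control signal.
    nand_75c = 1 - (not_75d & i_preset_midreg_n)
    # AND circuit 75b: output of 75c and the ninth control signal.
    and_75b = nand_75c & disable_n
    # Selector 75a: passes either the AND output or the tenth control signal.
    return column_decoder if select_column else and_75b

latch_on_max_flag = enable_75(1, 1, 1, 0, 0)
latch_on_preset = enable_75(0, 0, 1, 0, 0)
```

Read this way, the MID register latches when the maximum flag is "1" or when the low-active preset is asserted, provided the ninth control signal does not block it.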
- the arithmetic unit 12 configured as above is divided into a first power domain PD1 to a sixth power domain PD6, as shown in FIG.
- Power supply from the power supply PS to the first to sixth power domains PD1 to PD6 is independently controlled by the system controller 11 via the gate circuit unit PG.
- The system controller 11 supplies power to each power domain that contains a circuit required for the current calculation, at the timing when that circuit is needed.
- The main memory 14 is included in the first power domain PD1, the center of gravity memory 15 in the second power domain PD2, the neighborhood search circuit unit 17 in the third power domain PD3, and the logic unit 18a of the evaluation value calculation circuit 18 in the fourth power domain PD4.
- the distance calculation circuit 22, the maximum value detection circuit 27, the CID mask circuit 28, and the center of gravity calculation circuit 29 of the clustering calculation unit 16 are included in the fifth power domain PD5.
- the distance register unit 24 and the CID register unit 25 of the clustering calculation unit 16 and the evaluation register unit 18b of the evaluation value calculation circuit 18 are included in the sixth power domain PD6.
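The assignment of circuit blocks to power domains can be captured as a lookup table, which makes it easy to see which domains must be powered for a given set of active blocks. The dictionary layout, the key strings, and the helper function are illustrative assumptions; only the block-to-domain assignment comes from the description.

```python
from typing import Dict, Iterable, List

POWER_DOMAINS: Dict[str, List[str]] = {
    "PD1": ["main memory 14"],
    "PD2": ["center of gravity memory 15"],
    "PD3": ["neighborhood search circuit unit 17"],
    "PD4": ["logic unit 18a"],
    "PD5": ["distance calculation circuit 22", "maximum value detection circuit 27",
            "CID mask circuit 28", "center of gravity calculation circuit 29"],
    "PD6": ["distance register unit 24", "CID register unit 25",
            "evaluation register unit 18b"],
}

def domains_needed(blocks: Iterable[str]) -> List[str]:
    # Return the power domains that contain at least one of the requested blocks.
    wanted = set(blocks)
    return sorted(pd for pd, members in POWER_DOMAINS.items()
                  if wanted.intersection(members))

needed = domains_needed(["center of gravity memory 15",
                         "neighborhood search circuit unit 17"])
```

A controller model could call such a helper at the start of each period to decide which gates of the gate circuit unit PG to open.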
- Power is supplied to the first power domain PD1, which contains the main memory 14, in order to write each element data X.
- the power supply to the first power domain PD1 is continued until the clustering is completed.
- the period T2 is a period in which the initial cluster center of gravity V i for clustering is initialized, and the power supply to the second power domain PD2, the fifth power domain PD5, and the sixth power domain PD6 is started. Since the second index value SBS i by the logic unit 18a is not calculated in the period T2, the power supply to the fourth power domain PD4 is stopped.
- Each period from the period T3 to the period T7 is a period during which the calculation unit 12 substantially performs clustering calculation.
- power is supplied to each power domain except for the third power domain PD3 of the neighborhood search circuit unit 17 used for the update process.
- the power supply to the first power domain PD1, the second power domain PD2, and the fifth power domain PD5 is stopped.
- The power supply to the fourth power domain PD4 and the sixth power domain PD6 is continued in order to calculate the evaluation value E(Nc) in the logic unit 18a of the evaluation value calculation circuit 18.
- power is supplied to the fifth power domain PD5 in order to calculate the coupling index value SWD i using the clustering calculation unit 16.
- the period T2 to the period T10 is a processing period for one cluster number Nc, and in the batch process, the same power supply control as that from the period T2 to the period T10 is repeatedly performed in order to obtain the optimum cluster number Nc.
- Because the second index value SBS i is obtained and updated as needed by the logic unit 18a for each classification calculation, power is supplied to the fourth power domain PD4 from the period T3.
- However, only the final value of the second index value SBS i needs to be acquired. Therefore, for example, as indicated by the chain double-dashed line, the second index value SBS i may instead be calculated by supplying power to the fourth power domain PD4 after convergence of the data center-of-gravity distance DXV i is detected and before calculation of the next cluster centroid V i starts. This is advantageous for power saving.
- FIG. 13 shows the state of power supply in the update process.
- the neighborhood search circuit unit 17 identifies the cluster ID that provides the minimum data centroid distance DXV i . Therefore, power is supplied to the second power domain PD2 of the center-of-gravity memory 15 and the third power domain PD3 of the neighborhood search circuit unit 17 in the first period T11.
- Power is supplied continuously to the sixth power domain PD6 from the period T11.
- The power supply to the third power domain PD3 is stopped. Instead, power supply to the fourth power domain PD4 is started so that the logic unit 18a can calculate a new cluster centroid V i of the cluster C i into which the new element data Xnew is classified.
- the power supply to the fourth power domain PD4 is continued in order to obtain the new second index value SBS i of the cluster C i into which the new element data Xnew is classified by the logic unit 18a.
- power is supplied to the third power domain PD3 during the period T13.
- the evaluation value E(Nc) is calculated by the logic unit 18a.
- power supply to each power domain is stopped except for the sixth power domain PD6.
- the stop can be controlled.
- the power supply to the sixth power domain PD6 can be stopped after the evaluation value E(Nc) is calculated.
- the power supply can be started in the period T11, and the power supply can be stopped after the update process is completed.
- the contents of the SBS register unit 66 and the SWD register unit 67 of the evaluation register unit 18b are saved in another storage device before the power supply is stopped, and the saved contents are returned after the power supply is started.
- The power supply to the sixth power domain PD6 and its stop may be controlled. Further, the power supply to the sixth power domain PD6 is continued when the update process is performed immediately after the batch process or when the batch process is performed immediately after the update process.
- Each element data X is written in the main memory 14. “0” is written in each unit block of the column in the main memory 14 in which the element data X is not written. If the batch processing has not been executed with each element data X written in the main memory 14, the batch processing is performed.
- clustering processing is performed with the number of clusters Nc using all element data X, and the evaluation value E(Nc) is calculated for each clustering process.
- The number of clusters Nc at which the evaluation value E(Nc) changes from increasing to decreasing, that is, the number of clusters Nc at which the current evaluation value E(Nc+1) becomes smaller than the previous evaluation value E(Nc), is taken as the optimum value.
- the clustered state is set to the optimum number of clusters Nc.
- the maximum number of clusters within the set number of clusters Nc may be set as the optimum value.
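The search for the optimum number of clusters can be sketched as a loop that stops at the first drop in the evaluation value. The `evaluate` callback stands in for one complete clustering run followed by the calculation of E(Nc) and is hypothetical, as are the function name and the sample scores. Falling back to the largest tried value when no drop is observed is one reading of the last statement above and should be treated as an assumption.

```python
from typing import Callable

def find_optimal_cluster_count(evaluate: Callable[[int], float],
                               nc_max: int, nc_start: int = 2) -> int:
    previous = evaluate(nc_start)
    for nc in range(nc_start + 1, nc_max + 1):
        current = evaluate(nc)
        # E(Nc+1) < E(Nc): the evaluation value turned from increasing to decreasing.
        if current < previous:
            return nc - 1
        previous = current
    # Assumed fallback: no drop was seen within the allowed range.
    return nc_max

scores = {2: 1.0, 3: 2.5, 4: 3.0, 5: 2.0, 6: 4.0}
best = find_optimal_cluster_count(scores.__getitem__, nc_max=6)
```

Stopping at the first drop means that cluster counts beyond the turning point are never evaluated, which keeps the number of full clustering runs small.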
- Prior to execution of clustering by batch processing, the distance register unit 24 is initialized by writing the maximum value in each cell 24a corresponding to written element data X and "0" in each cell 24a not corresponding to any element data X. Further, the CID register unit 25 is initialized by writing "1" in each cell 25a corresponding to written element data X and "0" in each cell 25a not corresponding to any element data X. After this, the cluster ID of the corresponding cell 25a is latched and held in the MID register 37d of each cell 28a of the CID mask circuit 28.
- the clustering device 10 starts a clustering process in which the number Nc of clusters is set to “2” according to an instruction from the system controller 11.
- initial setting is performed.
- the data center of gravity GG is obtained, and the element data X that is the initial value of the cluster center of gravity V 2 is specified.
- The element data X that serves as the initial value of the cluster centroid V 2 is specified using the data centroid GG as the provisional cluster centroid V 1 , and that element data X is classified into the cluster C 2 .
- the element data X is read from the main memory 14.
- the read element data X is sent to the CID mask circuit 28 and the delay circuit 21.
- the CID mask circuit 28 inputs “1” as the designated CID and selects the element data X from the main memory 14 as the input data. Therefore, only the cell 28a in which "1" is held in the MID register 37d outputs the element data X, and only the comparison flag from the cell 28a becomes "1". Therefore, the comparison flag of the cell 28a corresponding to the column in which the element data X of the main memory 14 is not written is not "1".
- the selector unit 38 selects each comparison flag from the CID mask circuit 28 in the initial state. Thereby, the number of flags (the number of signals) of the comparison flag of “1” is obtained by the adder 39, and the result is held in the second register 42. Next, the selector unit 38 selects the element data X and inputs it to the adder 39. As a result, the data addition value, which is a q-dimensional vector obtained by adding the input element data X for each dimension, is held in the first register 41. After this, the divider 43 divides the data addition value held in the first register 41 by the number of flags held in the second register 42.
- the cluster IDs of all cells 25a corresponding to the element data X are "1", so the data addition value and the number of flags at this time are the values obtained for all the element data X, respectively. Therefore, the data center of gravity GG is obtained as the division result of the divider 43.
- the data center of gravity GG obtained by the divider 43 as described above is held in the GG register 63 of the evaluation value calculation circuit 18. Further, the system controller 11 obtains the content of the second register 42 as the number of all element data X. Further, the data center of gravity GG obtained by the divider 43 is input to the distance calculation circuit 22 as the provisional cluster center of gravity V 1 .
- Each element data X previously read from the main memory 14 is input to the distance calculation circuit 22 in synchronization with the input of the provisional cluster centroid V 1 .
- The data center-of-gravity distance DXV 1 is calculated from the input element data X and the provisional cluster center of gravity V 1 . Then, when the calculated data center-of-gravity distance DXV 1 is smaller than the value held in the cell 24a of the distance register unit 24 at that time, the content of the cell 24a is updated.
- Since the maximum value is held in each cell 24a corresponding to the element data X, the contents of the cells 24a corresponding to all the element data X are updated to the data center-of-gravity distance DXV 1 calculated this time by the distance calculation circuit 22.
- the contents of the cells 24a are updated as described above, and the data center-to-center-of-gravity distance DXV 1 held in all the cells 24a is input to the maximum value detection circuit 27.
- As a result, among the M maximum flags output from the maximum value detection circuit 27, only the maximum flag corresponding to the maximum of the input data center-of-gravity distances DXV 1 becomes "1".
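The initial setting described above amounts to two steps: compute the data centroid GG and use it as the provisional centroid of the first cluster, then pick the element data farthest from GG as the initial centroid of the second cluster. The sketch below assumes a Manhattan distance and floating-point division, and it resolves ties by taking the first farthest element; these choices and all names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

def manhattan(x: Sequence[float], v: Sequence[float]) -> float:
    # Assumed distance measure, chosen only for the example.
    return sum(abs(a - b) for a, b in zip(x, v))

def initial_setting(elements: List[Sequence[float]]) -> Tuple[List[float], int]:
    n = len(elements)
    q = len(elements[0])
    # Data centroid GG: per-dimension sum of all element data divided by their number.
    gg = [sum(x[d] for x in elements) / n for d in range(q)]
    # Distances to GG, which acts as the provisional cluster centroid V1.
    distances = [manhattan(x, gg) for x in elements]
    # Index of the element with the maximum distance (the one whose flag would be 1).
    farthest = max(range(n), key=distances.__getitem__)
    return gg, farthest

gg, seed_index = initial_setting([(0, 0), (2, 0), (10, 0)])
```

The element at `seed_index` then plays the role of the initial cluster centroid for the second cluster in the first classification calculation.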
- the first classification calculation is performed.
- initialization for classification calculation, processing for cluster C 1 and processing for cluster C 2 are performed in order.
- Each cell 24a of the distance register unit 24 and each cell 25a of the CID register unit 25 are initialized for classification calculation. That is, the distance register unit 24 is initialized by writing the maximum value in each cell 24a corresponding to written element data X and "0" in each cell 24a not corresponding to any element data X.
- The CID register unit 25 is initialized by writing "1" in each cell 25a corresponding to written element data X and "0" in each cell 25a not corresponding to any element data X. Since the contents of each cell 25a corresponding to element data X are always updated in the subsequent processing, the initial value may be other than "1".
- processing for cluster C1 is performed.
- the element data X is read from the main memory 14 and input to the delay circuit 21 and the CID mask circuit 28.
- the selector 37a is switched so that the element data X from the main memory 14 is input to the comparator 37e, and "1" is given to the comparator 37e as the designated CID.
- At this time, among the MID registers 37d of the CID mask circuit 28, only the MID register 37d corresponding to the element data X chosen as the cluster centroid V 2 holds "2"; the MID registers 37d corresponding to the other element data X hold "1".
- The adder 39 first performs addition with the comparison flags as input, so that the number of comparison flags that are "1" is counted and the result is held in the second register 42. That is, the data number n 1 of the element data X classified into the cluster C 1 is held in the second register 42.
- the element data X is input to the adder 39, addition is performed by the adder 39, and the calculation result is held in the first register 41.
- the first register 41 holds the data addition value SS 1 which is a q-dimensional vector obtained by adding the element data X classified into the cluster C 1 .
- the divider 43 divides the data addition value SS 1 of the first register 41 by the data number n 1 of the second register 42 to calculate the cluster centroid V 1 .
- the cluster centroid V 1 is written in the centroid memory 15 and input to the distance calculation circuit 22. Further, the data addition value SS 1 of the first register 41 and the data number n 1 of the second register 42 are sent to the evaluation value calculation circuit 18, respectively.
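- The masked accumulation performed for one cluster can be sketched as follows. This is a minimal software sketch; the function and variable names are hypothetical, and the hardware performs these steps with the CID mask circuit 28, the adder 39, the registers 41 and 42, and the divider 43.

```python
from typing import List, Sequence, Tuple


def masked_centroid(
    data: List[Sequence[float]], cluster_ids: List[int], designated_cid: int
) -> Tuple[int, List[float], List[float]]:
    # Comparison flag is 1 where the held cluster ID equals the designated CID.
    flags = [1 if cid == designated_cid else 0 for cid in cluster_ids]
    n = sum(flags)                 # data number n_i (second register)
    ss = [0.0] * len(data[0])      # data addition value SS_i (first register)
    for x, flag in zip(data, flags):
        if flag:
            for j, component in enumerate(x):
                ss[j] += component
    if n == 0:
        raise ValueError("no element data carries the designated CID")
    v = [s / n for s in ss]        # cluster centroid V_i = SS_i / n_i (divider)
    return n, ss, v


n1, ss1, v1 = masked_centroid(
    [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)], [1, 1, 2], designated_cid=1
)
```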
- the data number n 1 from the centroid calculation circuit 29 is held in the data number register 64, and the data addition value SS 1 is input to the subtractor 52.
- the data number n 1 of the data number register 64 and the data center of gravity GG of the GG register 63 are read, and these are multiplied by the multiplier 51.
- The subtractor 52 calculates the difference (a q-dimensional vector) between the output value (a q-dimensional vector) of the multiplier 51 and the data addition value SS 1 from the centroid calculation circuit 29, and the integrator 53 accumulates the vector components of this difference.
- the second index value SBS 1 of the cluster C 1 at the present time is calculated.
- the second index value SBS 1 is held in the first SBS register 66 1 .
- the cluster centroid V i , the number of data n i, and the data addition value SS i have the relationship shown in the above equation (7).
- the second index value SBS i represented by the above equation (5) can be transformed into the following equation (8). Therefore, the second index value SBS i can be obtained by the calculation in the evaluation value calculation circuit 18 using the data gravity center GG, the data addition value SS i, and the data number n i .
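- The reason the evaluation value calculation circuit 18 can work from GG, SS i and n i alone may be illustrated as follows. Because V i = SS i / n i , the vector n i × GG - SS i equals n i × (GG - V i ), so the divided centroid is not needed. The sketch below ASSUMES that the integrator accumulates absolute values of the components (a Manhattan-type distance); the equations referenced in the text are not reproduced in this passage, so this form is illustrative only, and the function names are hypothetical.

```python
from typing import Sequence


def second_index_from_sums(n_i: int, ss_i: Sequence[float], gg: Sequence[float]) -> float:
    # Multiplier: n_i * GG; subtractor: minus SS_i; integrator: accumulate components.
    diff = [n_i * g - s for g, s in zip(gg, ss_i)]
    # ASSUMPTION: components are accumulated as absolute values.
    return sum(abs(d) for d in diff)


def second_index_from_centroid(n_i: int, v_i: Sequence[float], gg: Sequence[float]) -> float:
    # Same quantity written with the cluster centroid V_i = SS_i / n_i.
    return n_i * sum(abs(g - v) for g, v in zip(gg, v_i))


sbs_a = second_index_from_sums(2, [2.0, 0.0], [4.0, 3.0])
sbs_b = second_index_from_centroid(2, [1.0, 0.0], [4.0, 3.0])
```

- Under the stated assumption, both forms give the same value for the same cluster.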
- each element data X from the delay circuit 21 is input to the distance calculation circuit 22 in synchronization with the cluster center of gravity V 1 from the center of gravity calculation circuit 29.
- The distance calculation circuit 22 calculates, for each input element data X, the data-to-centroid distance DXV 1 to the cluster centroid V 1 . If the distance DXV 1 calculated in a cell 22a is shorter than the distance currently held in the corresponding cell 24a of the distance register unit 24, the content of that cell 24a is updated to the calculated distance DXV 1 , and the cluster ID of the corresponding cell 25a is also updated.
- The process for the cluster C 2 is the same as the process for the cluster C 1 except that "2" is used as the designated CID. That is, of the element data X read from the main memory 14 and input to the CID mask circuit 28, only the element data X for which the content of the corresponding cell 25a of the CID register unit 25 is "2" is input to the centroid calculation circuit 29. Further, as many comparison flags become "1" as there are element data X output from the CID mask circuit 28.
- In the CID mask circuit 28, as described above, only the MID register 37d corresponding to the element data X serving as the cluster centroid V 2 holds "2". Therefore, only that element data X is input to the centroid calculation circuit 29, and only one comparison flag becomes "1".
- The centroid calculation circuit 29 obtains the data number n 2 and the data addition value SS 2 for the cluster C 2 from the element data X output from the CID mask circuit 28 and the comparison flags, and calculates the cluster centroid V 2 from these.
- the cluster centroid V 2 obtained by the divider 43 is written to the centroid memory 15 and input to the distance calculation circuit 22. Further, the data addition value SS 2 of the first register 41 and the data number n 2 of the second register 42 are sent to the evaluation value calculation circuit 18, respectively.
- the data number n 2 from the gravity center calculation circuit 29 is held in the data number register 64 separately from the previously written data number n 1, and the data addition value SS 2 is input to the subtractor 52. After that, the number of data n 2 of the data register 65 and the data center of gravity GG of the GG register 63 are read out, and the second index value SBS 2 is calculated in the same manner as the above-mentioned second index value SBS 1 .
- the second index value SBS 2 is held in the second SBS register 66 2 .
- Each element data X from the delay circuit 21 is input to the distance calculation circuit 22 in synchronization with the cluster center of gravity V 2 from the center of gravity calculation circuit 29.
- Each cell 22a of the distance calculation circuit 22 calculates the data-to-centroid distance DXV 2 between the input element data X and the cluster centroid V 2 . If the calculated distance DXV 2 is shorter than the distance held in the corresponding cell 24a of the distance register unit 24, the content of that cell 24a is updated to the calculated distance DXV 2 , and the cluster ID of the corresponding cell 25a is also updated.
- The cluster IDs of those cells 25a are updated to "2". Therefore, any element data X that had been classified into the cluster C 1 but is closer to the cluster centroid V 2 than to the cluster centroid V 1 has the contents of its corresponding cells 24a and 25a updated at the same time, and is thereby classified into the cluster C 2 .
- each MID register 37d is updated to the content of the cell 25a of the corresponding CID register unit 25.
- the first classification calculation ends.
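- One classification calculation, as walked through above, might be sketched as follows. The names are hypothetical, the Manhattan distance is an assumption (the metric is not named in this passage), and skipping an empty cluster is a choice made only for this sketch. `prev_ids` plays the role of the MID registers 37d, while `dist` and `cid` play the roles of the distance register unit 24 and the CID register unit 25.

```python
import math
from typing import Dict, List, Sequence, Tuple


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    # ASSUMPTION: Manhattan distance; the metric is not named in this passage.
    return sum(abs(x - y) for x, y in zip(a, b))


def classification_pass(
    data: List[Sequence[float]], prev_ids: List[int], num_clusters: int
) -> Tuple[Dict[int, List[float]], List[float], List[int]]:
    dist = [math.inf] * len(data)  # distance register: initialized to the maximum
    cid = [1] * len(data)          # CID register: initialized to "1"
    centroids: Dict[int, List[float]] = {}
    for c in range(1, num_clusters + 1):
        # Mask by the previous assignment, then sum and divide.
        members = [x for x, m in zip(data, prev_ids) if m == c]
        if not members:
            continue  # sketch-only choice: skip a cluster with no members
        v = [sum(column) / len(members) for column in zip(*members)]
        centroids[c] = v
        for i, x in enumerate(data):
            d = manhattan(x, v)
            if d < dist[i]:        # shorter than the held distance: update both
                dist[i] = d
                cid[i] = c
    return centroids, dist, cid


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (8.0, 8.0)]
centroids, dist, cid = classification_pass(points, [1, 1, 2, 1], 2)
```

- In this example the fourth point starts in cluster 1 but ends in cluster 2, because it is closer to the second centroid.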
- the second classification calculation is performed.
- the second classification calculation is the same procedure as the first classification calculation, and after the initialization for classification calculation, the process for cluster C 1 and the process for cluster C 2 are performed in order.
- the maximum value is written in the cell 24a corresponding to the written element data X, and "1" is written in each cell 25a.
- The centroid calculation circuit 29 obtains a new data number n 1 , data addition value SS 1 , and cluster centroid V 1 . The cluster centroid V 1 held in the centroid memory 15 and the data number n 1 held in the data number register 64 are each updated to the newly calculated values. In the evaluation value calculation circuit 18, a new second index value SBS 1 is calculated using the new data number n 1 and data addition value SS 1 , and the content of the first SBS register 66 1 is updated.
- The distance calculation circuit 22 calculates the data-to-centroid distance DXV 1 to the new cluster centroid V 1 for each element data X. If the newly calculated distance DXV 1 is shorter than the content of the cell 24a of the distance register unit 24, the content of that cell 24a is updated to the new distance DXV 1 , and the content of the cell 25a of the CID register unit 25 corresponding to that cell 24a is updated to "1".
- the process for cluster C 2 is similarly performed.
- the contents of the centroid memory 15 and the data number register 64 are updated to the new cluster centroid V 2 and the data number n 2 calculated by the centroid calculation circuit 29.
- A new second index value SBS 2 is calculated by the evaluation value calculation circuit 18 using the new data addition value SS 2 and the new data number n 2 calculated by the centroid calculation circuit 29, and the content of the second SBS register 66 2 is updated.
- The distance calculation circuit 22 calculates the data-to-centroid distance DXV 2 to the new cluster centroid V 2 for each element data X. If the newly calculated distance DXV 2 is shorter than the content of the cell 24a of the distance register unit 24, the content of that cell 24a is updated to the new distance DXV 2 and the content of the cell 25a of the CID register unit 25 corresponding to that cell 24a is updated to "2", so that the classification of each element data X into the clusters is updated. After this, the content of each MID register 37d is updated to the content of the corresponding cell 25a of the CID register unit 25, and the second classification calculation ends.
- The third and subsequent classification calculations are performed in the same way, and the cluster centroids V 1 and V 2 , the data numbers n 1 and n 2 , and the second index values SBS 1 and SBS 2 are updated. Further, the contents of each cell 24a of the distance register unit 24 and of each cell 25a of the CID register unit 25 are updated, and each element data X is classified into a cluster.
- the system controller 11 monitors the content of the gravity center memory 15 for each classification calculation as described above. Then, the system controller 11 ends the classification calculation when the variation in the contents of the center-of-gravity memory 15 disappears.
- At this point, the cluster centroids V 1 and V 2 , the data numbers n 1 and n 2 , and the second index values SBS 1 and SBS 2 held in the centroid memory 15, the data number register 64, and the SBS register unit 66 are those calculated from cluster centroids V 1 and V 2 that no longer change, that is, that have converged.
- the evaluation value E(Nc) is calculated.
- The selector 68 selects the SBS register unit 66, and the contents of the SBS registers 66 1 , 66 2 , ... are read out and added by the parallel adder 54.
- Since the number of clusters Nc is "2" here, in effect the second index values SBS 1 and SBS 2 are read from the SBS register unit 66 and added by the parallel adder 54.
- the cluster index value SBS which is the sum of the second index values SBS i , is obtained.
- The centroid calculation circuit 29 is used to calculate the combined index values SWD 1 and SWD 2 for the clusters C 1 and C 2 .
- Each intra-cluster distance DXV i is read from the cells 24a of the distance register unit 24 and input to the centroid calculation circuit 29 via the CID mask circuit 28.
- At this time, "1" is first input as the designated CID to each cell 28a of the CID mask circuit 28, so that only the intra-cluster distances DXV 1 corresponding to the element data X classified into the cluster C 1 are output to the centroid calculation circuit 29.
- The data number n 1 of the element data X classified into the cluster C 1 is obtained by the adder 39 from the number of comparison flags set to "1", and is held in the second register 42.
- The adder 39 then adds up the intra-cluster distances DXV 1 to obtain the first index value SD 1 , which is held in the first register 41.
- The divider 43 divides the first index value SD 1 of the first register 41 by the data number n 1 of the second register 42 to obtain the combined index value SWD 1 .
- This combined index value SWD 1 is sent to the evaluation value calculation circuit 18 and held in the first SWD register 67 1 .
- The selector 68 selects the SWD register unit 67 in the evaluation value calculation circuit 18, and the contents of the SWD registers 67 1 , 67 2 , ... are read in parallel.
- the contents of the read SWD registers 67 1, 67 2, ... are added in parallel adder 54.
- the combined index values SWD 1 and SWD 2 are substantially read from the SWD register unit 67 and added by the parallel adder 54.
- the internal coupling degree SWD is obtained as the addition result of the parallel adder 54.
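- The calculation of the combined index values and the internal coupling degree can be sketched as follows. The names are hypothetical; `dist` and `cid` stand for the converged contents of the distance register unit 24 and the CID register unit 25, and every cluster is assumed to have at least one member.

```python
from typing import List, Tuple


def internal_coupling(
    dist: List[float], cid: List[int], num_clusters: int
) -> Tuple[List[float], float]:
    swd_values = []
    for c in range(1, num_clusters + 1):
        # Mask the held distances by the designated CID.
        selected = [d for d, k in zip(dist, cid) if k == c]
        sd = sum(selected)           # first index value SD_i
        n = len(selected)            # data number n_i (count of flags set to 1)
        swd_values.append(sd / n)    # combined index value SWD_i = SD_i / n_i
    return swd_values, sum(swd_values)  # SWD is the sum over all clusters


swd_values, swd = internal_coupling([1.0, 3.0, 0.0, 4.0], [1, 1, 2, 2], 2)
```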
- the multiplication result of the multiplier 55 is held in the denominator register 58.
- The internal coupling degree SWD is multiplied by the number of clusters Nc so that, when the evaluation value E(Nc) is obtained by the subsequent division, the cluster index value SBS is normalized by the number of clusters Nc (that is, SBS/Nc is used).
- the divider 60 divides the contents of the numerator register 59 by the contents of the denominator register 58 to calculate the evaluation value E(2) when the number of clusters Nc is “2”.
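- Read together, the steps above suggest that the evaluation value has the form E(Nc) = SBS / (Nc × SWD). The sketch below follows that reading; that the numerator register 59 holds the cluster index value SBS is inferred from the surrounding description rather than stated here, and the function name is hypothetical.

```python
from typing import Sequence


def evaluation_value(sbs_values: Sequence[float], swd_values: Sequence[float]) -> float:
    nc = len(sbs_values)       # number of clusters Nc
    sbs = sum(sbs_values)      # cluster index value SBS (parallel adder)
    swd = sum(swd_values)      # internal coupling degree SWD (parallel adder)
    denominator = nc * swd     # multiplier output held as the denominator
    return sbs / denominator   # equals (SBS / Nc) / SWD


e2 = evaluation_value([12.0, 8.0], [2.0, 3.0])
```

- A larger value corresponds to clusters that are far from the data centroid relative to their internal spread.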
- The evaluation value calculation circuit 18 calculates the evaluation value E(2) using values already produced while the clustering calculation unit 16 performs clustering. That is, the evaluation value E(2) is calculated using the second index values SBS 1 and SBS 2 , which are obtained from the data addition values SS 1 and SS 2 and the data numbers n 1 and n 2 produced in the course of calculating the cluster centroids V 1 and V 2 . Therefore, the evaluation value E(2) is obtained efficiently, at high speed and with low power consumption. The same applies to the evaluation values E(Nc) calculated thereafter.
- The system controller 11 acquires the evaluation value E(2) obtained by the divider 60 as described above and the contents of each cell 26a held in the CID register unit 25 at this time, that is, the cluster ID corresponding to each element data X when the number of clusters Nc is "2". The system controller 11 holds the acquired evaluation value E(2) and each cluster ID in a storage unit (not shown).
- The clustering process with the number of clusters Nc set to "3" is then performed. As in the case where the number of clusters Nc is "2", this clustering process consists of initial setting, classification calculation, and evaluation value calculation.
- In the initial setting, the cluster centroids V 1 and V 2 obtained with the number of clusters Nc set to "2", and a cluster centroid V 3 that is the element data X corresponding to the largest of the intra-cluster distances DXV 1 and DXV 2 , are set as initial values. This accelerates the convergence of the cluster centroids V i .
- the initial value of the cluster center of gravity V 3 is set by the same procedure as in the case of setting the initial value of the cluster center of gravity V 2 described above, but the intra-cluster distances DXV 1 and DXV 2 are held in the distance register unit 24. Therefore, the cluster centroids V 1 and V 2 and the intra-cluster distances DXV 1 and DXV 2 are not calculated. All the intra-cluster distances DXV 1 and DXV 2 are read from each cell 24a of the distance register unit 24, and the maximum value detection circuit 27 and the CID mask circuit 28 are used to set the maximum flag of “1”, that is, the maximum intra-cluster distance DXV i. The latch operation of only the MID register 37d of the cell 28a corresponding to is permitted.
- the first classification calculation is performed.
- The designated CID is set to "1" and the process for the cluster C 1 is performed; then the designated CID is set to "2" and the process for the cluster C 2 is performed.
- the designated CID is set to “3”, and the processing for the cluster C 3 is performed in the same manner as for the clusters C 1 and C 2 .
- The cluster centroid V 3 obtained by the process for the cluster C 3 is written in the centroid memory 15, and the data number n 3 is written in the data number register 64. Further, the second index value SBS 3 , obtained from the data number n 3 , the data centroid GG, and the data addition value SS 3 , is written in the third SBS register 66 3 .
- the data center of gravity GG may be newly calculated, but in this example, the one obtained when the number of clusters Nc is “2” and held in the GG register 63 is used as it is.
- the second classification calculation is similarly performed. Thereafter, the classification calculation is similarly repeated.
- the content of each cell 24a of the distance register unit 24 and the content of each cell 25a of the CID register unit 25 are updated for each classification calculation, and the classification of each element data X into each cluster C i is updated.
- The cluster centroids V 1 to V 3 in the centroid memory 15 are updated, and the data numbers n 1 to n 3 in the data number register 64 and the second index values SBS 1 to SBS 3 in the first to third SBS registers 66 1 to 66 3 are also updated.
- The system controller 11 ends the classification calculation. After that, the evaluation value E(3) is calculated by the evaluation value calculation circuit 18. In calculating this evaluation value E(3), the combined index values SWD 1 to SWD 3 for the clusters C 1 to C 3 are calculated using the centroid calculation circuit 29.
- The system controller 11 acquires the evaluation value E(3) obtained as described above and the contents of each cell 26a held in the CID register unit 25 at this time, that is, the cluster ID corresponding to each element data X when the number of clusters Nc is "3".
- the system controller 11 holds the acquired evaluation value E(3) and each cluster ID in the storage unit.
- the clustering process is performed while increasing the number of clusters Nc by 1, and the evaluation value E(Nc) for each number of clusters Nc and the cluster ID corresponding to each element data X are acquired and stored.
- The system controller 11 takes the previous number of clusters Nc as the optimum value when the current evaluation value E(Nc+1) becomes smaller than the previous evaluation value E(Nc). After this, the state of the arithmetic unit 12 is restored to the state clustered with the optimum number of clusters Nc.
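- The search for the optimum number of clusters can be sketched as follows. This is a minimal sketch: `evaluate` is a hypothetical callable standing for one full clustering run that returns E(Nc), and `max_clusters` is a hypothetical upper bound added only so that the loop terminates.

```python
from typing import Callable, Tuple


def find_optimal_cluster_count(
    evaluate: Callable[[int], float], max_clusters: int
) -> Tuple[int, float]:
    previous_nc = 2
    previous_e = evaluate(previous_nc)
    for nc in range(3, max_clusters + 1):
        current_e = evaluate(nc)
        if current_e < previous_e:
            # E dropped: the previous number of clusters is taken as optimum.
            return previous_nc, previous_e
        previous_nc, previous_e = nc, current_e
    return previous_nc, previous_e


# Made-up evaluation values used purely to exercise the loop.
scores = {2: 1.0, 3: 1.5, 4: 2.2, 5: 1.9}
best_nc, best_e = find_optimal_cluster_count(lambda nc: scores[nc], 5)
```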
- The system controller 11 initializes each unit of the arithmetic unit 12 except the main memory 14 and the GG register 63, and then writes the cluster IDs corresponding to the optimum number of clusters Nc, which are stored in the storage unit, back to the cells 25a of the CID register unit 25. After that, the classification calculation for the clusters C 1 , C 2 , ..., C Nc is performed once, and then the evaluation value E(Nc) is calculated.
- As a result, the second index values SBS 1 to SBS Nc , the combined index values SWD 1 to SWD Nc of the SWD register unit 67, and the data numbers n 1 to n Nc of the data number register 64 are each restored to the final values obtained by clustering with the optimum number of clusters Nc.
- the cluster ID written in the CID register unit 25 is not changed by the classification calculation.
- the restoration method is not limited to the above.
- For example, the contents held by each unit of the arithmetic unit 12 after the end of the classification calculation may be stored in the storage unit for each number of clusters Nc, and the contents for the optimum number of clusters Nc may be written back to each unit.
- Since the previous number of clusters Nc is taken as the optimum value when the current evaluation value E(Nc+1) becomes smaller than the previous evaluation value E(Nc), and it is this previous number of clusters Nc that is used for restoration, it is also possible to store in the storage unit only the cluster IDs of the CID register unit 25.
- the arithmetic unit 12 is restored to the state of clustering with the optimum number of clusters Nc, and the batch processing ends.
- the update processing is performed.
- The new element data Xnew is classified into the cluster C i having the smallest data-to-centroid distance DXV i , that is, into the nearest cluster C i .
- the cluster centroid V i of the cluster C i into which the new element data Xnew is classified is updated, and then the evaluation value E(Nc) is calculated. Then, the validity of the clustering result after the update processing is determined based on the evaluation value E(Nc).
- the update process will be specifically described below.
- the update process is performed in response to the addition of the new element data Xnew.
- The new element data Xnew to be added is first input to the neighborhood search circuit unit 17 by the system controller 11, and the cluster centroids V i are sequentially read from the centroid memory 15 and input to the neighborhood search circuit unit 17.
- The calculation unit 17a sequentially calculates the data-to-centroid distance DXV i between the new element data Xnew and each sequentially input cluster centroid V i .
- When the new data-to-centroid distance DXV i obtained by the calculation unit 17a is smaller than the content held in the short-range register unit 17b, the content of the short-range register unit 17b is updated to the new distance DXV i . As a result, the smallest data-to-centroid distance DXV i for the new element data Xnew is ultimately held in the short-range register unit 17b.
- the designated CID indicating the cluster ID corresponding to the cluster centroid V i input to the neighborhood search circuit unit 17 is input to the short distance CID register unit 17c.
- The short-distance CID register unit 17c holds the cluster ID corresponding to the minimum data-to-centroid distance DXV i . In this way, the new element data Xnew is classified into the cluster C i having the smallest data-to-centroid distance DXV i .
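- The sequential nearest-centroid search can be sketched as follows. The names are hypothetical and the Manhattan distance is an assumption, since the metric is not named in this passage. `best_distance` and `best_cid` stand for the short-range register unit 17b and the short-distance CID register unit 17c.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    # ASSUMPTION: Manhattan distance; the metric is not named in this passage.
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_cluster(
    x_new: Sequence[float], centroids: Dict[int, Sequence[float]]
) -> Tuple[Optional[int], float]:
    best_distance = math.inf   # held minimum distance
    best_cid = None            # cluster ID belonging to that minimum
    for cid, v in centroids.items():   # centroids are fed in one at a time
        d = manhattan(x_new, v)
        if d < best_distance:
            best_distance = d
            best_cid = cid
    return best_cid, best_distance


k, d_min = nearest_cluster((8.0, 9.0), {1: (0.0, 0.0), 2: (10.0, 10.0)})
```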
- k is a value of 1, 2,... Nc.
- the evaluation value calculation circuit 18 obtains the cluster centroid V k after the addition of the new element data Xnew by the equation (9).
- The values V kOLD and n kOLD are, respectively, the cluster centroid and the number of data of the cluster C k before the new element data Xnew is added (classified). The cluster centroid V kOLD is held in the centroid memory 15, and the data number n kOLD is held in the data number register 64.
- The data number n kOLD read from the data number register 64 of the evaluation value calculation circuit 18 and the fixed value "1" are input to the adder 56 to obtain the value "n kOLD +1", which is held in the denominator register 58. Further, the cluster centroid V kOLD read from the centroid memory 15 and the data number n kOLD read from the data number register 64 are input to the multiplier 51, and the value "n kOLD × V kOLD " is obtained.
- The value "n kOLD × V kOLD " from the multiplier 51 and the new element data Xnew are input to the adder 57 to obtain the value "n kOLD × V kOLD +Xnew", which is held in the numerator register 59. The divider 60 then divides the content of the numerator register 59 by the content of the denominator register 58 to calculate the cluster centroid V k after the new element data Xnew is classified. The cluster centroid V k calculated in this way is written in the centroid memory 15, updating the cluster centroid V k of the cluster C k .
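- The register-level steps above amount to the incremental mean update V k = (n kOLD × V kOLD + Xnew) / (n kOLD + 1). A minimal sketch, with a hypothetical function name, is:

```python
from typing import List, Sequence


def updated_centroid(
    v_old: Sequence[float], n_old: int, x_new: Sequence[float]
) -> List[float]:
    denominator = n_old + 1                                   # adder: n_kOLD + 1
    numerator = [n_old * v + x for v, x in zip(v_old, x_new)]  # n_kOLD * V_kOLD + Xnew
    return [component / denominator for component in numerator]  # divider


v_k = updated_centroid([1.0, 2.0], 3, [5.0, 6.0])
```

- Only the old centroid, the old data number, and the new element data are needed, so the existing element data of the cluster need not be read again.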
- the second index value SBS k and the combined index value SWD k are updated.
- the cluster centroid V k is read from the centroid memory 15 and the data centroid GG is read from the GG register 63, respectively, and these are read by the full adder 45 of the neighborhood search circuit unit 17.
- a difference vector (q-dimensional vector) between the cluster centroid V k and the data centroid GG is obtained as the output of the adder 61.
- the difference vector is input to the multiplier 51 of the evaluation value calculation circuit 18 via the selector 50.
- the data number n k read from the data number register 64 is input to the multiplier 51.
- The multiplier 51 thus outputs the difference vector multiplied by the number of data n k .
- The second index value SBS k is obtained by passing this scaled difference vector through the subtractor 52 and inputting it to the integrator 53.
- the content of the kth SBS register of the SBS register unit 66 is updated to the second index value SBS k calculated in this way.
- the combination index value SWD k is updated.
- the new combination index value SWD k is calculated by the clustering calculation unit 16.
- the new element data Xnew is written in an unused column of the main memory 14, and "k" is written as a cluster ID in the cell 25a of the CID register unit 25 corresponding to the column.
- the system controller 11 reads out the cluster ID held in the short-distance CID register unit 17c, and gives the cluster ID as a designated CID to each cell 25a of the CID register unit 25. This is performed by causing only the cell 25a corresponding to the element data Xnew to perform the latch operation.
- After the cluster ID of the new element data Xnew is written to the CID register unit 25, the content of each MID register 37d of the CID mask circuit 28 is updated to the content of the corresponding cell 25a of the CID register unit 25. Thereafter, the designated CID designating "k" as the cluster ID is given to the comparator 37e of each cell 28a of the CID mask circuit 28. As a result, only the comparison flags from the cells 28a whose corresponding cell 25a holds the cluster ID "k" become "1".
- That is, the cells 25a holding the cluster ID "k" and the cells 24a corresponding to those cells 25a are set to perform the latch operation.
- The contents of the cells 24a and 25a corresponding to the element data X classified into the cluster C k , including the new element data Xnew, are then updated.
- the intra-cluster distance DXVi is read from the distance register unit 24 and input to the centroid calculation circuit 29 via the CID mask circuit 28.
- the content of each MID register 37d is the same as that of the cell 25a of the corresponding CID register unit 25, and "k" is designated as the designated CID. Therefore, only the intra-cluster distance DXV k is input to the centroid calculation circuit 29. Further, the comparison flag input to the center of gravity calculation circuit 29 has the same number of flags as the number of each element data X classified into the cluster C k as “1”.
- As in the case of the classification calculation, the centroid calculation circuit 29 calculates the first index value SD k , obtained by adding up the intra-cluster distances DXV k , and the data number n k of the element data X classified into the cluster C k , and calculates a new combined index value SWD k from these.
- the new combination index value SWD k is sent to the evaluation value calculation circuit 18, and the content of the kth SWD register of the SWD register unit 67 is updated to this new combination index value SWD k .
- the evaluation value calculation circuit 18 calculates the evaluation value E(Nc) using the contents of the SBS register unit 66 and the contents of the SWD register unit 67.
- the calculation procedure of the evaluation value E(Nc) at this time is the same as the procedure performed after clustering in the collective processing.
- the cluster centroid V k of the cluster C k into which the new element data Xnew is classified is updated, but the data centroid GG is not updated.
- the total number of existing element data X is very large, and the amount of movement of the data center of gravity GG due to addition of one or several new element data Xnew is very small. Therefore, the variation of the second index value SBSi including the center-of-gravity distance DGV i as a parameter is very small, and the effect of not updating the data center-of-gravity GG on the evaluation value E(Nc) is considerably small.
- the movement of the cluster centroid V k due to the addition of one or several new element data Xnew is considerably larger than the movement of the data centroid GG, although it depends on the number of data n k .
- the combined index value SWD k and the second index value SBS k vary greatly, and the variation of the evaluation value E(Nc) accompanying these variations also increases.
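- The difference in sensitivity can be made explicit with the incremental mean update. In the following, N (the total number of existing element data) and GG' (the data centroid that would result if it were recomputed) are symbols introduced here only for illustration.

```latex
\begin{aligned}
V_k - V_{k\mathrm{OLD}}
  &= \frac{n_{k\mathrm{OLD}}\,V_{k\mathrm{OLD}} + X_{\mathrm{new}}}{n_{k\mathrm{OLD}} + 1} - V_{k\mathrm{OLD}}
   = \frac{X_{\mathrm{new}} - V_{k\mathrm{OLD}}}{n_{k\mathrm{OLD}} + 1}, \\
GG' - GG
  &= \frac{X_{\mathrm{new}} - GG}{N + 1}.
\end{aligned}
```

- Because N is the sum of all the cluster data numbers, the divisor N + 1 is normally much larger than n kOLD + 1, which is why leaving GG unchanged has little effect while V k must be updated.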
- the system controller 11 acquires the evaluation value E(Nc) calculated as described above, and determines the validity of the clustering result by the update process based on this evaluation value E(Nc). That is, it is determined whether or not the clustering state is properly maintained even after the new element data Xnew is classified by the update processing as described above.
- The evaluation value E(Nc) obtained for the number of clusters Nc judged appropriate in the batch processing performed immediately before the update processing is used as the reference evaluation value, and this reference evaluation value is compared with the evaluation value E(Nc) obtained by the update processing. In this comparison, for example, when the latter is equal to or greater than the former, the clustering result is judged valid and the processing ends. If the latter is smaller than the former, the result is judged invalid and batch processing is performed.
- the batch processing is performed by the same procedure as above.
- In this batch processing, it is also preferable to perform the initial setting using the cluster centroids V i obtained with the appropriate number of clusters Nc in the previous batch processing together with the cluster ID for each element data, or using the cluster ID for each element data obtained by the update processing. Doing so allows clustering by the k-means method to converge early, reducing the number of calculations and the calculation time.
- As described above, the new element data Xnew can be classified efficiently and quickly. The validity of the resulting clustering is judged by the evaluation value E(Nc), and when the clustering result is poor, all the element data X including the new element data are clustered by batch processing and classified into the optimum number of clusters Nc, so the accuracy of the clustering is kept high. When this is used, for example, in an automatic recognition device having a learning function, the update processing provides high-speed, real-time recognition (classification), and batch processing provides highly accurate learning when the situation calls for it.
- the method of determining from the evaluation value E(Nc) obtained by the update processing whether the clustering state is properly maintained is not limited to the above method, but, as described above, comparing against the evaluation value E(Nc) obtained before the update processing as the reference evaluation value is a preferable method. Further, the evaluation value obtained by evaluating the result of the clustering performed immediately before the current update processing can be used as the reference evaluation value, regardless of whether that clustering was batch processing or update processing. Further, the result may be judged valid when the decrease of the post-update evaluation value relative to the pre-update evaluation value is within a predetermined range (for example, 10% to 15% or less of the pre-update evaluation value, or a predetermined value or less).
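The validity judgment described above can be sketched as a small comparison function. This is only an illustration: the name `is_update_valid` and the parameter `max_relative_drop` are hypothetical, and the sketch assumes that a larger evaluation value means a better clustering state and that the reference value is positive.

```python
def is_update_valid(
    reference_e: float,
    updated_e: float,
    max_relative_drop: float = 0.0,
) -> bool:
    """Judge whether the post-update clustering is still acceptable.

    reference_e: evaluation value before the update (assumed > 0).
    updated_e: evaluation value after the update processing.
    max_relative_drop: tolerated decrease as a fraction of reference_e
        (0.0 reproduces the strict "equal or greater" rule).
    """
    if updated_e >= reference_e:
        return True
    # Relative decrease of the evaluation value caused by the update.
    drop = (reference_e - updated_e) / reference_e
    return drop <= max_relative_drop


# Strict rule: any decrease triggers batch processing.
needs_batch = not is_update_valid(2.0, 1.9)
# Tolerant rule: a decrease of up to 10% is accepted.
still_ok = is_update_valid(2.0, 1.9, max_relative_drop=0.10)
```

A caller would run batch processing whenever the function returns `False`.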
- the update processing is performed every time one new element data is added, but it may instead be performed when the number of added new element data reaches a fixed number of two or more. Further, when new element data is added, whether to perform the update processing or to perform the batch processing from the beginning without the update processing may be decided according to whether a predetermined condition is satisfied. For example, if the number of new element data is a preset number or more, or if the ratio of the number of new element data to the number of already clustered element data is a certain value or more, batch processing may be performed from the beginning without performing update processing.
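One way to picture the selection between update processing and batch processing is the following sketch. The function `choose_processing`, its parameter names, and the returned strings are hypothetical; the thresholds are left to the caller because the text does not fix their values.

```python
from typing import Optional


def choose_processing(
    num_new: int,
    num_clustered: int,
    update_batch_size: int = 1,
    max_new_count: Optional[int] = None,
    max_new_ratio: Optional[float] = None,
) -> str:
    """Return "batch", "update", or "wait" for pending new element data.

    num_new: number of new element data waiting to be classified.
    num_clustered: number of element data already clustered.
    update_batch_size: run update processing once this many have arrived.
    max_new_count / max_new_ratio: optional limits at or above which
        batch processing is run from the beginning instead.
    """
    if max_new_count is not None and num_new >= max_new_count:
        return "batch"
    if (
        max_new_ratio is not None
        and num_clustered > 0
        and num_new / num_clustered >= max_new_ratio
    ):
        return "batch"
    if num_new >= update_batch_size:
        return "update"
    return "wait"  # not enough new data yet for an update pass
```

With the defaults, every single new element triggers update processing, matching the basic configuration described above.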
- the order of processing for each cluster in classification calculation in batch processing is arbitrary.
- although the cluster centroid is written to the centroid memory each time it is obtained by the classification calculation, the cluster centroid may instead be written to the centroid memory after the cluster centroids converge. In this case, convergence may be determined by monitoring the cluster centroids calculated by the centroid calculation circuit.
- the second index value may be calculated and written to the SBS register unit of the second index value. The same applies to the writing of the data number to the data number register.
- the above-mentioned clustering was verified using the clustering device 10 for the following three cases (1) to (3): (1) the new element data Xnew are placed inside one of the two existing clusters (FIG. 16); (2) the new element data Xnew are placed far from both existing clusters, at different distances from each (FIG. 17); and (3) the new element data Xnew are placed near one of the two existing clusters (FIG. 18). In every case, each existing cluster has 50 element data X classified by batch processing, and 10 new element data Xnew are added and updated collectively.
- the evaluation value E(Nc) was calculated by the above equation (1).
- each new element data Xnew is classified by the update processing into the one cluster inside which it is placed.
- the evaluation value E(Nc) for each number of clusters Nc, from clustering using all the element data X before the addition of the new element data Xnew, is shown in the batch processing (before addition) column of Table 1.
- the evaluation value E(2) in the state where the new element data Xnew has been added and the update processing performed is shown in the update processing column of Table 1.
- the evaluation value E(Nc) for each number of clusters Nc, from clustering using all the element data X including the added new element data Xnew, is shown in the batch processing (after addition) column of Table 1.
- each new element data Xnew was classified by the update processing into whichever of the clusters was at the shorter distance.
- the evaluation value E(Nc) for each number of clusters Nc, from clustering using all the element data X before the addition of the new element data Xnew, is shown in the batch processing (before addition) column of Table 2, and the evaluation value E(2) in the state where the new element data Xnew has been added and the update processing performed is shown in the update processing column of Table 2. In this case, since the evaluation value E(2) is greatly reduced by the update processing, the batch processing is performed after the validity judgment. The evaluation value E(Nc) for each number of clusters Nc at this time is shown in the batch processing (after addition) column of Table 2.
- each new element data Xnew is classified by the update processing into the cluster near which it is placed.
- the evaluation value E(Nc) for each number of clusters Nc, from clustering using all the element data X before the addition of the new element data Xnew, is shown in the batch processing (before addition) column of Table 3, and the evaluation value E(2) in the state where the new element data Xnew has been added and the update processing performed is shown in the update processing column of Table 3. In this case, the decrease in the evaluation value E(2) due to the update processing was about 10%.
- the evaluation value E(Nc) in each cluster number Nc when the batch processing is performed after the update processing is shown in the column of batch processing (after addition) in Table 3.
- the formula for calculating the evaluation value is not limited to the above.
- the smaller the internal coupling degree obtained as described above, the more tightly the data in each cluster are grouped and the higher the mutual similarity of the data within the cluster.
- the larger the external separation degree obtained as described above, the more the clusters are separated from each other and the lower the similarity between the clusters. Therefore, the calculation for obtaining the evaluation value from the internal coupling degree and the external separation degree may be any calculation whose value changes in one direction, either increasing or decreasing, as the internal coupling degree decreases and the external separation degree increases, and any arithmetic expression that uses the internal coupling degree and the external separation degree as variables and yields such a result may be used.
- the cluster index value SBS (the second sum) may be the sum, over the clusters, of the minimum value of the inter-cluster distance d(V_i, V_j) between one cluster C_i and each other cluster C_j.
- the minimum value of the inter-cluster distance d(V_i, V_j) between one cluster C_i and another cluster C_j is the second index value SBS_i.
- for the distance between data and the center of gravity (intra-cluster distance), the distance between centers of gravity, and the distance between clusters, a distance other than the Manhattan distance may be used, for example the Euclidean distance, the Minkowski distance, or the point symmetry distance.
- the standardized coupling index value SWD_i is obtained by dividing the first index value SD_i by the first value, here the number of data n_i in the cluster, but the first value is not limited to this and can be any value based on the number of data n_i of the cluster C_i.
- the standardized external separation degree is obtained by dividing the cluster index value SBS by the second value, here the number of clusters Nc, but the second value is not limited to this, and the cluster index value can be standardized by any value based on the number of clusters Nc. As with the first value, a power of the number of clusters Nc, a value obtained by multiplying the number of clusters Nc by a constant, a value obtained by adding a constant to or subtracting a constant from the number of clusters Nc, or the like can be used as the second value.
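The standardization steps above can be illustrated with a small numeric sketch. Expression (1) and Expression (10) are not reproduced in this part of the text, so everything here is an assumption: SD_i is taken as the sum of Manhattan distances from each element of cluster C_i to its centroid, the internal coupling degree as the sum of SWD_i = SD_i / n_i, the external separation degree as SBS / Nc with SBS the sum of the nearest-centroid distances SBS_i, and the evaluation value as external separation divided by internal coupling (larger is better). The function names are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def evaluation_value(clusters: List[List[Point]]) -> float:
    """Assumed evaluation value E(Nc) = external separation / internal coupling.

    clusters: at least two non-empty clusters; the internal coupling
    degree must be non-zero (not all points on their centroids).
    """
    centroids = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    # Internal coupling: sum over clusters of SWD_i = SD_i / n_i.
    internal = sum(
        sum(manhattan(x, v) for x in c) / len(c)
        for c, v in zip(clusters, centroids)
    )
    # SBS: sum over clusters of the distance to the nearest other centroid.
    sbs = sum(
        min(manhattan(v, w) for j, w in enumerate(centroids) if j != i)
        for i, v in enumerate(centroids)
    )
    external = sbs / len(clusters)  # standardize by the number of clusters Nc
    return external / internal


far = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
near = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
```

Under these assumptions the two well-separated clusters in `far` score higher than the same clusters moved closer together in `near`, and neither score depends on the total number of element data.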
- an evaluation value (hereinafter, evaluation value E(Nn, Nc)), but such an evaluation value E(Nn, Nc) is useful for evaluating the state of clustering across the numbers of clusters Nc when there is no increase in element data.
- the evaluation value E(Nn, Nc) changes relative to the evaluation value before the increase as the total number of data Nn increases with the added element data, even if the clustering state becomes good.
- when the total number Nn of element data is dynamic because element data are added, and the evaluation values are compared before and after the increase of the element data as in the above update processing, it is preferable to use an evaluation value that does not include the total number of data Nn, such as the evaluation value E(Nc) shown in Expression (1) and the evaluation value E(Nc) using the cluster index value SBS shown in Expression (10).
- the method of evaluating the clustering state (classification result) by using the evaluation value E(Nc) and the method of calculating the evaluation value E(Nc) by using the calculation value of the calculation process for the clustering calculation unit to perform clustering are
- the present invention is not limited to the case where the update process is performed when new element data is added, and can be applied to, for example, a configuration where the batch process is performed when new element data is added.
- the cluster center of gravity is used as the first representative point, which is the base point of the intra-cluster distance, and as the second representative point, which is the base point in each cluster for the center-of-gravity distance and the inter-cluster distance that serve as indices of the separation between the clusters.
- the first representative point and the second representative point are not limited to these.
- the first representative point and the second representative point may be element data closest to the cluster centroid in each cluster.
- the second representative point is a base point for measuring the separation (distance) between the clusters or the distance between the reference point, which will be described later, and the cluster.
- the element data and the like may be used.
- the first representative point and the second representative point may be points or data in the cluster or arbitrary points or element data in the cluster, which are determined by a separately determined standard.
- the data center of gravity is used as the reference point, but the reference point can be set to any point or element data, and in addition to the data center of gravity as described above, the element data closest to the data center of gravity can be set.
- the evaluation value E(Nc) may be the reciprocal of Expression (1).
- the number of clusters Nc for which the evaluation value E(Nc) is the smallest (minimum) is the optimum number of clusters.
- as the calculation for obtaining the evaluation value, in addition to the ratio between the internal coupling degree and the external separation degree, it is possible to use, for example, as shown in each of the following equations, a calculation that obtains the evaluation value E(Nc) by weighting and adding the reciprocal of one of the internal coupling degree and the external separation degree and the other, or a calculation that obtains the evaluation value E(Nc) by subtracting one of the internal coupling degree and the external separation degree from the other.
- the values Wa and Wb in the equation are weighting constants, and Wa and Wb ⁇ 0.
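The equations referred to above are not reproduced in this text, so the following is only a sketch of what the two described families of expressions could look like. Both function names are hypothetical, the choice of which quantity is inverted or subtracted is an assumption, and the weights Wa and Wb are assumed to be positive constants here.

```python
def weighted_sum_evaluation(
    internal: float, external: float, wa: float, wb: float
) -> float:
    """Weighted sum of the reciprocal of one degree and the other degree.

    Assumed form: E = Wa * (1 / internal) + Wb * external, so E grows
    as the internal coupling degree falls and the external separation
    degree rises. internal must be non-zero.
    """
    return wa * (1.0 / internal) + wb * external


def difference_evaluation(
    internal: float, external: float, wa: float = 1.0, wb: float = 1.0
) -> float:
    """Difference of the two degrees.

    Assumed form: E = Wb * external - Wa * internal, which also grows
    as the clustering state improves.
    """
    return wb * external - wa * internal
```

Either form keeps the property that the evaluation value moves in a single direction as the internal coupling degree decreases and the external separation degree increases, so the number of clusters with the largest value would be selected.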
- clustering is also performed using the k-means method, but the present invention is not limited to the k-means method, and can be applied to hard clustering in which each element data is classified so as to belong to one cluster.
- examples of such hard clustering include the "k-means++" method, spectral clustering, the single linkage method, and Ward's method.
- 10 clustering device, 11 system controller, 14 main memory, 15 center of gravity memory, 16 clustering calculation unit, 17 neighborhood search circuit unit, 18 evaluation value calculation circuit, 18a logic unit, 18b evaluation register unit, 25 CID register unit, 29 center of gravity calculation circuit, PD1 to PD6 power domains
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/423,192 US11914448B2 (en) | 2019-02-06 | 2019-02-06 | Clustering device and clustering method |
| KR1020217019429A KR102799088B1 (ko) | 2019-02-06 | 2019-02-06 | 클러스터링 장치 및 클러스터링 방법 |
| JP2020570278A JP7262819B2 (ja) | 2019-02-06 | 2019-02-06 | クラスタリング装置及びクラスタリング方法 |
| PCT/JP2019/004315 WO2020161845A1 (ja) | 2019-02-06 | 2019-02-06 | クラスタリング装置及びクラスタリング方法 |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2019/004315 WO2020161845A1 (ja) | 2019-02-06 | 2019-02-06 | クラスタリング装置及びクラスタリング方法 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020161845A1 (ja) | 2020-08-13 |
Family
ID=71947736
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2019/004315 Ceased WO2020161845A1 (ja) | 2019-02-06 | 2019-02-06 | クラスタリング装置及びクラスタリング方法 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US11914448B2 |
| JP (1) | JP7262819B2 |
| KR (1) | KR102799088B1 |
| WO (1) | WO2020161845A1 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113031877A (zh) * | 2021-04-12 | 2021-06-25 | 中国移动通信集团陕西有限公司 | 数据存储方法、装置、设备及介质 |
| CN114119883A (zh) * | 2022-01-29 | 2022-03-01 | 北京中科慧云科技有限公司 | 基于自适应聚类的大粮堆储粮三维云图绘制方法及装置 |
| WO2022265044A1 (ja) * | 2021-06-18 | 2022-12-22 | 国立大学法人東北大学 | 演算処理装置 |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11664129B2 (en) * | 2019-08-13 | 2023-05-30 | International Business Machines Corporation | Mini-batch top-k-medoids for extracting specific patterns from CGM data |
| US12001462B1 (en) * | 2023-05-04 | 2024-06-04 | Vijay Madisetti | Method and system for multi-level artificial intelligence supercomputer design |
| US11526688B2 (en) * | 2020-04-16 | 2022-12-13 | International Business Machines Corporation | Discovering ranked domain relevant terms using knowledge |
| US12339874B2 (en) | 2023-03-29 | 2025-06-24 | Seoul National University R&Db Foundation | Density-based data clustering apparatus and method |
| CN117574222B (zh) * | 2023-11-07 | 2024-06-11 | 广州海洋地质调查局 | 一种底栖生境的分类方法、系统、设备及介质 |
| CN117992811B (zh) * | 2024-04-03 | 2024-06-28 | 厦门悠生活网络科技有限公司 | 一种大学宿舍物联网洗衣机的故障检测方法及系统 |
| CN118378110B (zh) * | 2024-06-25 | 2024-09-03 | 山东德源电力科技股份有限公司 | 一种具备监测数据分析功能的电能表 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH05205058A (ja) * | 1992-01-24 | 1993-08-13 | Hitachi Ltd | クラスタリング方法 |
| JPH11219374A (ja) * | 1997-10-31 | 1999-08-10 | Hitachi Ltd | データクラスタリング方法、装置およびプログラム記録媒体 |
| US20120303623A1 (en) * | 2011-05-26 | 2012-11-29 | Yahoo! Inc. | System for incrementally clustering news stories |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8437844B2 (en) | 2006-08-21 | 2013-05-07 | Holland Bloorview Kids Rehabilitation Hospital | Method, system and apparatus for real-time classification of muscle signals from self-selected intentional movements |
| CN106383695B (zh) | 2016-09-14 | 2019-01-25 | 中国科学技术大学苏州研究院 | 基于fpga的聚类算法的加速系统及其设计方法 |
| JP6132996B1 (ja) | 2017-01-24 | 2017-05-24 | 株式会社ディジタルメディアプロフェッショナル | 画像処理装置,画像処理方法,画像処理プログラム |
2019
- 2019-02-06 US US17/423,192 patent/US11914448B2/en active Active
- 2019-02-06 JP JP2020570278A patent/JP7262819B2/ja active Active
- 2019-02-06 WO PCT/JP2019/004315 patent/WO2020161845A1/ja not_active Ceased
- 2019-02-06 KR KR1020217019429A patent/KR102799088B1/ko active Active
Non-Patent Citations (2)
| Title |
|---|
| ARBELAITZ, OLATZ ET AL.: "An extensive comparative study of cluster validity indices", PATTERN RECOGNITION, vol. 46, no. 1, 2013, pages 243 - 256, XP055571755, DOI: 10.1016/j.patcog.2012.07.021 * |
| SAITTA, S. ET AL.: "A comprehensive validity index for clustering", INTELLIGENT DATA ANALYSIS, vol. 12, no. 6, 2008, pages 529 - 548, XP055571762, DOI: 10.3233/IDA-2008-12602 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113031877A (zh) * | 2021-04-12 | 2021-06-25 | 中国移动通信集团陕西有限公司 | 数据存储方法、装置、设备及介质 |
| CN113031877B (zh) * | 2021-04-12 | 2024-03-08 | 中国移动通信集团陕西有限公司 | 数据存储方法、装置、设备及介质 |
| WO2022265044A1 (ja) * | 2021-06-18 | 2022-12-22 | 国立大学法人東北大学 | 演算処理装置 |
| CN114119883A (zh) * | 2022-01-29 | 2022-03-01 | 北京中科慧云科技有限公司 | 基于自适应聚类的大粮堆储粮三维云图绘制方法及装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20210118067A (ko) | 2021-09-29 |
| KR102799088B1 (ko) | 2025-04-23 |
| US20220066533A1 (en) | 2022-03-03 |
| JPWO2020161845A1 | 2020-08-13 |
| JP7262819B2 (ja) | 2023-04-24 |
| US11914448B2 (en) | 2024-02-27 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19914198; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2020570278; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19914198; Country of ref document: EP; Kind code of ref document: A1 |