CN111275099A - Clustering method and clustering system based on grid granularity calculation - Google Patents

Clustering method and clustering system based on grid granularity calculation

Info

Publication number
CN111275099A
CN111275099A CN202010055555.1A CN202010055555A CN111275099A CN 111275099 A CN111275099 A CN 111275099A CN 202010055555 A CN202010055555 A CN 202010055555A CN 111275099 A CN111275099 A CN 111275099A
Authority
CN
China
Prior art keywords
grid
granularity
clustering
grids
density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010055555.1A
Other languages
Chinese (zh)
Inventor
徐慧
姚舜宇
李倩云
高鳗
张伟
陈宏伟
刘伟
宗欣露
苏军
严灵毓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Technology
Priority to CN202010055555.1A
Publication of CN111275099A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The invention belongs to the technical field of data processing and discloses a clustering method and a clustering system based on grid granularity calculation. The clustering method comprises: reading an original data set; initializing the relevant parameters; dividing the n-dimensional data into mutually disjoint grids, traversing all the grids and marking each as a central grid, an edge grid or a noise grid; performing granularity-based density calculation on the processed grids; obtaining the clustering centers from the density peaks; and finally outputting the clustering result. Building on the K-means algorithm, the method eliminates the influence of noise and optimizes the selection of the initial points. Gridding reduces the large computational cost of the density-peak-based fast clustering algorithm and avoids excessive manual decisions and the errors they cause. Introducing the concept of granularity prevents the edges of dense areas from being damaged during gridding and improves the accuracy of the initial cluster center points.

Description

Clustering method and clustering system based on grid granularity calculation
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a clustering method and a clustering system based on grid granularity calculation.
Background
Currently, the closest prior art is as follows. With the development of big data technology and the rapid growth in the volume of generated data, mining big data has become a common problem, and traditional data storage and processing methods can no longer meet the requirements. Cluster analysis, as an important technique for analyzing many kinds of data, has again become a research hotspot. Conventional clustering algorithms include partition-based algorithms, hierarchy-based algorithms, density-based algorithms, and the like.
The objective of cluster analysis is to find structures hidden in data and to classify data having the same properties as much as possible into the same class according to some similarity measure.
The K-means algorithm is one of the ten classic algorithms in the field of machine learning. It is a hard clustering algorithm and a typical prototype-based objective-function clustering method: a distance from each data point to its prototype serves as the objective function to be optimized, and the update rule of the iteration is obtained by solving for the extremum of that function. The K-means algorithm takes the Euclidean distance as the similarity measure and seeks the optimal partition for a given initial cluster center vector V, such that the evaluation index J is minimized. The algorithm uses the sum of squared errors as the clustering criterion function.
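As an illustration only, the sum-of-squared-errors criterion J described above can be sketched in a few lines of Python. The function names `squared_distance` and `sse` and the sample points are assumptions made for this example and do not appear in the patent.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centers):
    """Sum of squared errors J: every point contributes the squared
    distance to its nearest cluster center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical data: two tight pairs of points far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_centers = [(0.0, 1.0), (10.0, 1.0)]
poor_centers = [(5.0, 1.0)]

print(sse(points, good_centers))  # 4.0
print(sse(points, poor_centers))  # 104.0
```

A lower J means the centers describe the data more compactly, which is why the quality of the initial centers matters for the final result.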
The Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm is a density-based clustering algorithm that takes high-density areas as its basis for judgment. The CFSFDP algorithm first calculates the local density of each point using a truncation distance, and then calculates, for each data point, the minimum distance to any data point whose local density is higher than its own. It then draws a decision graph from the local density and minimum distance of each point, and the clustering centers are selected manually in the decision graph. Each remaining non-center data point is assigned to the cluster of the clustering center closest to it. Finally, each cluster obtained is divided into a cluster core and a cluster halo to give the final clustering result. Compared with conventional approaches, this nonparametric approach is suitable for data sets of any shape and does not require the number of clusters to be set in advance.
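The two quantities plotted in the decision graph can be sketched as follows. This is a minimal illustration, assuming a hard cutoff for the local density and the common convention that the densest point receives the largest distance to any other point; the function name `density_peaks` and the parameter `d_c` (the truncation distance) are names chosen for this example.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for every point.

    rho[i]   : number of other points closer than the truncation distance d_c
    delta[i] : minimum distance from point i to any point of higher density
               (assumed convention: the maximum distance for the densest points)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical data: a group of three points and a group of two points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = density_peaks(pts, d_c=1.5)
print(rho)  # [2, 2, 2, 1, 1]
```

Points with both a large rho and a large delta are the candidates a user would pick by hand in the decision graph, and the pairwise distance matrix is what makes the method expensive on large data sets.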
CLIQUE (CLustering In QUEst) is a simple grid-based clustering method for finding density-based clusters in subspaces. CLIQUE divides each dimension into non-overlapping intervals, thereby dividing the entire embedding space of the data objects into cells. Each attribute is divided into N equal parts, so that the whole data space is divided into a set of hyper-rectangles. The data points in each cell are counted, cells whose count exceeds a threshold S are called dense cells, and connected dense cells are then joined to form a class. Unlike other methods, it can automatically identify classes embedded in subspaces of the data.
Granularity is a database term. In the field of computers, granularity refers to the minimum increment by which system memory can be expanded. Granularity is one of the most important issues in designing a data warehouse, where it refers to the level of refinement or aggregation of the data held in the warehouse's data units. The higher the degree of refinement, the lower the granularity level; conversely, the lower the degree of refinement, the higher the granularity level. The main problem is to keep granularity at a suitable level, neither too high nor too low. A low granularity level provides detailed data but takes up more storage space and requires longer query times. A high granularity level allows fast, convenient queries but cannot provide fine-grained data.
In summary, as a classical algorithm for solving the clustering problem, the K-means algorithm is simple and fast. When the clusters are dense and the differences between clusters are obvious, it gives good clustering results, and when processing large amounts of data it is scalable and efficient. However, the conventional K-means algorithm still has several defects and needs further optimization: (1) Randomly selecting the initial clustering centers makes the algorithm unstable and likely to fall into a local optimum. (2) The K-means algorithm is sensitive to noise and isolated points. Because the centroid of a cluster is taken as the cluster center and carried into the next round of calculation, a small amount of such data can greatly influence the mean, making the results unstable or even wrong. (3) Clusters of arbitrary shape cannot be found; generally only spherical clusters can be found. Because the K-means algorithm mainly measures the similarity between data objects with the Euclidean distance function and uses the sum of squared errors as the criterion function, it can generally find only spherical clusters whose data objects are fairly uniformly distributed.
The CFSFDP algorithm is simple in principle, easy to implement, and clusters well, and it has therefore attracted great attention. However, it has some limitations. The truncation distance is selected by the user from experience, and an inappropriate choice gives a poor clustering result. In addition, density is measured with a single constant truncation distance, so a good clustering result cannot be obtained when several high-density points exist in the same cluster.
CLIQUE shares the high efficiency of grid-based algorithms, is insensitive to the order of data input, and does not need to assume any canonical data distribution. Its running time scales linearly with the size of the input data, it scales well as the number of dimensions increases, and it is very effective for clustering high-dimensional data in large databases. It also has several limitations: (1) Like most density-based clustering algorithms, grid-based clustering relies heavily on the choice of density threshold (too high, and clusters may be lost; too low, and clusters that should be separated may be merged). (2) If clusters and noise of different densities are present, it may not be possible to find a threshold that fits all parts of the data space. (3) Many steps of the CLIQUE algorithm use approximations, so the accuracy of the clustering results may be reduced accordingly.
The difficulty of solving the above technical problems is as follows: the difficulty, and the distinguishing feature of the invention, lies mainly in how to integrate the idea of granularity into the K-means algorithm, grid-based clustering and density-based clustering, and thereby realize a clustering method based on grid granularity calculation.
The significance of solving the above technical problems is as follows: the process of dividing a collection of physical or abstract objects into classes composed of similar objects is called clustering. A cluster produced by clustering is a set of data objects that are similar to one another and dissimilar to the objects in other clusters. Classification problems are numerous in the natural and social sciences, and cluster analysis is a statistical analysis method for studying the classification of samples or indices. Solving the above technical problems improves the quality of the final clustering.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a clustering method and a clustering system based on grid granularity calculation. The clustering method provided by the invention introduces the idea of granularity, solves the problems that the K-means clustering algorithm has high selection dependency on an initial central point and is sensitive to noise and isolated point data on the basis of the K-means clustering algorithm, and avoids the condition that the CFSFDP algorithm needs to manually participate in selecting the clustering center so as to improve the performance of the K-means clustering algorithm.
The invention is realized as follows. A clustering method based on grid granularity calculation comprises: gridding the data set and dividing the grids into central grids, edge grids and noise grids by calculating the density of each grid and of its adjacent grids. Granularity-based density calculation is introduced to avoid damaging the edges of dense areas, which ultimately improves the accuracy of the initial cluster center points.
The invention eliminates the influence of noise and optimizes the selection of the initial point on the basis of the K-means algorithm. The problem of large calculation amount of the CFSFDP algorithm is solved through gridding optimization, and excessive manual decision and errors caused by the manual decision are avoided. By introducing the concept of granularity, the edge of a dense area is prevented from being damaged during gridding, and the accuracy of the cluster initialization center point is improved.
Further, the clustering method based on grid granularity calculation further comprises the following steps:
step one, reading and initializing a data set;
step two, gridding the data set;
step three, calculating the density of each grid, classifying the grids, dividing the grids into mutually disjoint grids, traversing all the grids, marking each as a central grid, an edge grid or a noise grid, and removing the noise grids;
step four, performing granularity-based density calculation on the processed grids;
step five, obtaining the clustering centers according to the density peaks;
step six, outputting the clustering result.
Further, in step one, initializing data includes: the data set is read and initialized so that all data is projected into the data space.
Further, in the second step, gridding the data set includes: uniformly dividing each dimension of the data space into the same segment number, and recording the segment number as f; forming a plurality of grid objects with the same size, eliminating the grid objects with the data object number of 0 in the grid objects, and marking the rest grid objects as a grid object set as G; the number of grid objects in the grid object set is recorded as N;
the specific steps of gridding are as follows:
1) uniformly dividing each dimension of the data set space containing n data objects into the same number of segments, denoted f, with initial value f = 2, to form a grid;
2) removing the grids whose data object count is 0, and recording the number of remaining non-empty grid objects as N;
3) if N < n/6, setting f = f + 1 and returning to step 1); otherwise setting f = f/2 and returning to step 1);
4) determining the grid object set G, the number of segments f and the number of grid objects N.
Further, in step three, calculating the density of the grid and classifying includes: the number of data points contained in each grid is calculated and the grids are classified into the following three categories:
(1) if all the adjacent grids of the grid are grids containing data, marking as a central grid;
(2) if the adjacent grids of the grid have a central grid and an empty grid, marking as an edge grid;
(3) if the grids adjacent to the grid are all empty grids, the grid is marked as a noise grid, and the noise grid data is removed.
Further, in step four, the density calculation based on the granularity includes: taking a central grid as the center, the central grid and all of its adjacent grids, 3^d grids in total, are defined as the minimum computational granularity (where d is the dimension);
putting all the central grids into a calculation queue; calculating the density of the set of grids at the granularity according to the divided grids; sequentially carrying out density calculation based on granularity on the central grids according to the queues;
in the fifth step, the method for obtaining the clustering center comprises the following steps: and clustering the grid objects and the data points in each grid object through the initial points obtained in the fourth step.
Further, the granularity-based density calculation method includes:
step1: setting the midpoint of the area with the highest density as an initial point; if the number of current initial points has reached k, stopping;
the method for acquiring the initial point comprises: sorting the densities computed at the minimum granularity, the grid objects at the position of a cluster center having a higher density ρ; setting the midpoint of the central grid of the area with the highest density as an initial point; marking the grid set as the initial point and its adjacent grids as processed grids and setting the grid density of that area to 0; if an adjacent grid of the central grid is also a central grid, recalculating the granularity density and sorting again; and setting the midpoint of the central grid of the area with the highest density as an initial point again, until the number of initial points reaches k;
step2: otherwise, marking the grid set as the initial point and its adjacent grids as processed grids, setting the grid density of that area to 0, recalculating the granularity density, and returning to step1.
Another object of the present invention is to provide a clustering control system for implementing the grid granularity calculation-based clustering method.
It is a further object of the present invention to provide a computer program product stored on a computer readable medium, comprising a computer readable program for providing a user input interface for implementing said grid granularity calculation based clustering method when executed on an electronic device.
It is another object of the present invention to provide a computer-readable storage medium, comprising instructions which, when run on a computer, cause the computer to perform the above-mentioned clustering method based on grid-granularity computation.
Another object of the present invention is to provide a calculator for implementing the clustering method based on grid granularity calculation.
In summary, the advantages and positive effects of the invention are: the invention provides a clustering method based on grid granularity calculation, which reads an original data set; initializing relevant parameters; dividing n-dimensional data into mutually disjoint grids, traversing all the grids and marking the grids as a central grid, an edge grid and a noise grid; and then carrying out granularity calculation on the processed grid, obtaining a clustering center according to the density peak value, and finally outputting a clustering result. The method has the advantages of high accuracy, small difference of clustering effects of different data sets and small parameter dependence.
Compared with the prior art, the invention has the following advantages: the invention grids the data set and divides the grids into central grids, edge grids and noise grids by calculating the density of each grid and of its adjacent grids. Granularity-based density calculation is introduced to avoid damaging the edges of dense areas, which ultimately improves the accuracy of the initial cluster center points.
The invention eliminates the influence of noise and optimizes the selection of the initial point on the basis of the K-means algorithm. The problem of large calculation amount of the CFSFDP algorithm is solved through gridding optimization, and excessive manual decision and errors caused by the manual decision are avoided. By introducing the concept of granularity, the edge of a dense area is prevented from being damaged during gridding, and the accuracy of the cluster initialization center point is improved.
Drawings
Fig. 1 is a flowchart of a clustering method based on grid granularity calculation according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a clustering method based on grid granularity calculation according to an embodiment of the present invention.
Fig. 3 is a graph of the minimum computation granularity of 3 × 3 meshes in a divided two-dimensional mesh provided in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The K-means algorithm is one of ten classic algorithms in the field of machine learning, and is widely applied to science and industry due to simplicity and high efficiency. However, the conventional K-means clustering method has two obvious disadvantages:
(1) The algorithm selects the initial cluster centers at random. For clustering algorithms the initial cluster centers are important because they are the basis for computing the result: each new center is updated from the previous one. If the initial centers are randomly generated, converging to the correct result is difficult and time-consuming.
(2) The algorithm is sensitive to noise and isolated points. Since each center point is calculated as a mean, once a noise point is assigned to a cluster, the center point necessarily deviates from its actual position.
Aiming at the problems in the prior art, the invention provides a clustering method and a clustering system based on grid granularity calculation, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the clustering method based on grid granularity calculation provided by the embodiment of the present invention includes:
S101: The data set is read and initialized.
S102: The data set is gridded.
S103: The density of each grid is calculated, the grids are classified, and the noise grids are removed.
S104: Granularity-based density calculation is performed for the regions containing a central grid.
S105: The points obtained in the previous step are taken as initial points, and k-means clustering is carried out.
S106: The result is output.
Fig. 2 illustrates the principle of the clustering method based on grid granularity calculation according to an embodiment of the present invention.
In step S101, initializing data includes: the data set is read and initialized so that all data is projected into the data space.
In step S102, gridding the data set includes: uniformly dividing each dimension of the data space into the same number of segments, denoted f, which forms a plurality of grid objects of the same size; eliminating the grid objects whose data object count is 0; and recording the remaining grid objects as the grid object set G. The number of grid objects in the grid object set is recorded as N. Experimental results show that the algorithm achieves better clustering quality when the number N of grid objects in the grid object set is greater than or equal to 1/6 of the number n of data objects in the data set.
As a preferred embodiment of the present invention, the specific steps of gridding are as follows:
Step 1: Uniformly divide each dimension of the data set space containing n data objects into the same number of segments, denoted f (with initial value f = 2), to form a grid.
Step 2: Remove the grids whose data object count is 0, and record the number of remaining non-empty grid objects as N.
Step 3: If N < n/6, set f = f + 1 and return to Step 1; otherwise, set f = f/2 and return to Step 1.
Step 4: Determine the grid object set G, the number of segments f, and the number of grid objects N.
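The gridding steps can be sketched in Python as below. This is a hedged illustration, not the patented implementation: the names `cell_of`, `build_grid`, `choose_segments` and the safety cap `f_max` are assumptions, and the sketch simply stops once N >= n/6, because the "otherwise" branch of Step 3 as printed (f = f/2, return to Step 1) does not state a termination condition.

```python
from collections import Counter


def cell_of(point, lows, highs, f):
    """Index of the cell containing `point` when every dimension has f segments."""
    idx = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / f or 1.0                   # guard against a constant dimension
        idx.append(min(int((x - lo) / width), f - 1))  # clamp the maximum into the last cell
    return tuple(idx)


def build_grid(points, f):
    """Map each non-empty cell to its point count (empty cells are never stored)."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return Counter(cell_of(p, lows, highs, f) for p in points)


def choose_segments(points, ratio=6, f_max=64):
    """Increase f from 2 until the number of non-empty cells N reaches n / ratio.

    Assumption: stop as soon as N >= n / ratio; f_max is a hypothetical guard
    that keeps the loop finite on degenerate data.
    """
    f = 2
    grid = build_grid(points, f)
    while len(grid) < len(points) / ratio and f < f_max:
        f += 1
        grid = build_grid(points, f)
    return f, grid


# Hypothetical data: a 10 x 6 lattice of 60 points, so n / 6 = 10.
lattice = [(float(i % 10), float(i // 10)) for i in range(60)]
f, grid = choose_segments(lattice)
print(f, len(grid))  # 4 16
```

Storing only non-empty cells in a dictionary keeps memory proportional to N rather than to f^d, which matters as the dimension grows.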
In step S103, the calculating the density of the mesh and classifying includes: the number of data points contained in each grid is calculated and the grids are classified into the following 3 categories:
(1) if all the adjacent grids of the grid are grids containing data, the grid is marked as a central grid.
(2) If the adjacent grids of the grid have a center grid and an empty grid, the grid is marked as an edge grid.
(3) If the grids adjacent to the grid are all empty grids, the grid is marked as a noise grid, and the noise grid data is removed.
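A possible reading of the three rules is sketched below. Two points are assumptions of this example rather than statements of the patent: neighbours that fall outside the grid extent are ignored, and any non-empty grid that is neither central nor noise is labelled an edge grid. The names `neighbours` and `classify` are hypothetical.

```python
from itertools import product


def neighbours(cell, f):
    """Cells within one step of `cell` in every dimension, excluding the cell
    itself and (by assumption) positions outside the f x ... x f grid."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):
            nb = tuple(c + o for c, o in zip(cell, offset))
            if all(0 <= c < f for c in nb):
                yield nb


def classify(grid, f):
    """Label every non-empty cell as 'central', 'edge' or 'noise'.

    `grid` maps non-empty cell indices to point counts.
    """
    labels = {}
    for cell in grid:
        occupied = [nb in grid for nb in neighbours(cell, f)]
        if all(occupied):
            labels[cell] = "central"   # every neighbour contains data
        elif not any(occupied):
            labels[cell] = "noise"     # every neighbour is empty
        else:
            labels[cell] = "edge"      # mixture of occupied and empty neighbours
    return labels


# Hypothetical 5 x 5 grid: a dense 3 x 3 block plus one isolated cell.
grid = {(x, y): 1 for x in range(3) for y in range(3)}
grid[(4, 4)] = 1
labels = classify(grid, f=5)
print(labels[(1, 1)], labels[(2, 2)], labels[(4, 4)])  # central edge noise

# Removing the noise grids before the density step.
cleaned = {c: n for c, n in grid.items() if labels[c] != "noise"}
```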
As a preferred embodiment of the present invention, the side length can also be directly made as follows when dividing the grid:
[Equation image BDA0002372667530000081 (grid side length); the formula is not recoverable from the extracted text.]
In step S104, the density calculation based on the granularity includes: taking a central grid as the center, the central grid and all of its adjacent grids, 3^d grids in total, are defined as the minimum computational granularity (where d is the dimension).
If the minimum granularity were chosen as 2^d, the position of the central grid within the set of grids would have to be considered, so this granularity is not adopted in order to avoid errors caused by that position. If a larger minimum granularity were chosen, the probability of several density peaks falling within one granule under the current grid partitioning would become higher. Thus, 3^d is selected as the minimum computational granularity.
All the central grids are put into a calculation queue. The density of the set of meshes at the granularity is calculated from the divided meshes. And sequentially carrying out density calculation based on granularity on the central grids according to the queues. As shown in fig. 3, in a divided two-dimensional grid, 3 × 3 grids are selected as the minimum computation granularity.
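The granularity-based density of one central grid can be sketched as the total point count of the 3^d block centred on it. The function name `granule_density` and the sample counts are assumptions for illustration.

```python
from itertools import product


def granule_density(grid, cell):
    """Number of points in the 3**d block of cells centred on `cell`.

    `grid` maps non-empty cell indices to point counts; missing cells count as 0.
    """
    return sum(
        grid.get(tuple(c + o for c, o in zip(cell, offset)), 0)
        for offset in product((-1, 0, 1), repeat=len(cell))
    )


# Hypothetical two-dimensional grid, so each granule is a 3 x 3 block.
grid = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4,
        (1, 2): 5, (2, 1): 6, (3, 3): 7}
print(granule_density(grid, (1, 1)))  # 21
print(granule_density(grid, (3, 3)))  # 7
```

Because the block is symmetric around the central grid, the density does not depend on where the grid sits inside its granule, which matches the reason given for preferring 3^d over 2^d.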
The following procedure is performed for each granularity:
Step 4.1: Set the midpoint of the region with the highest density as an initial point.
Step 4.1.1: If the number of initial points has reached k, stop.
Step 4.2: Otherwise, mark the grid set as the initial point and its adjacent grids as processed, set the grid density of that region to 0, recalculate the granularity density, and return to Step 4.1.
In step S104, acquiring the initial points includes: sorting the densities computed at the minimum granularity, the grid objects at the position of a cluster center having a higher density ρ; setting the midpoint of the central grid of the region with the highest density as an initial point; marking the grid set as the initial point and its adjacent grids as processed and setting the grid density of that region to 0; if an adjacent grid of the central grid is also a central grid, processing it in the same way; recalculating the granularity density and sorting again; and setting the midpoint of the central grid of the region with the highest density as an initial point again, until the number of initial points reaches k.
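The selection loop can be sketched as a greedy procedure: take the densest granule, record its central grid, zero out that granule, and repeat. The names `block`, `pick_initial_cells` and `midpoint`, the tie-breaking by sorted cell index, and the sample data are assumptions made for this example.

```python
from itertools import product


def block(cell):
    """The 3**d cells of the granule centred on `cell` (including the cell itself)."""
    return [tuple(c + o for c, o in zip(cell, off))
            for off in product((-1, 0, 1), repeat=len(cell))]


def pick_initial_cells(grid, central, k):
    """Greedily choose up to k central cells with the highest granule density."""
    remaining = dict(grid)       # working copy of the point counts
    candidates = set(central)    # central cells not yet processed
    chosen = []
    while len(chosen) < k and candidates:
        density = {c: sum(remaining.get(b, 0) for b in block(c)) for c in candidates}
        best = max(sorted(candidates), key=density.get)  # ties: smallest index wins
        chosen.append(best)
        for b in block(best):    # mark the granule as processed
            remaining[b] = 0
            candidates.discard(b)
    return chosen


def midpoint(cell, lows, widths):
    """Coordinates of the center of `cell` given per-dimension origin and cell width."""
    return tuple(lo + (i + 0.5) * w for i, lo, w in zip(cell, lows, widths))


# Hypothetical counts with two dense regions; `central` lists the central grids.
grid = {(1, 1): 10, (1, 2): 3, (2, 0): 2, (6, 6): 8, (6, 7): 4, (5, 5): 1}
central = [(1, 1), (1, 2), (6, 6), (6, 7)]
cells = pick_initial_cells(grid, central, k=2)
print(cells)                                            # [(1, 1), (6, 6)]
print([midpoint(c, (0.0, 0.0), (2.0, 2.0)) for c in cells])
```

Zeroing the whole granule after each pick prevents two initial points from landing in the same dense region.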
In step S105, the cluster calculation includes: clustering the grid objects and the data points in each grid object starting from the initial points obtained in the preceding steps, and outputting the clustering result.
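For illustration, seeding k-means with the initial points from the previous step might look like the sketch below. It uses plain Lloyd iterations on the raw data points; the function name `kmeans`, the `max_iter` parameter and the sample data are assumptions, and the patent does not specify these details.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the supplied initial centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)   # keep an empty cluster's center in place
        if new_centers == centers:        # converged
            break
        centers = new_centers
    return labels, centers


# Hypothetical data and seeds (for example, midpoints of the selected central grids).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centers = kmeans(points, [(1.0, 1.0), (9.0, 1.0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 1.0), (10.0, 1.0)]
```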
The invention is further described below in connection with a comparative table with the prior art.
[Comparison table with the prior art: images BDA0002372667530000091 and BDA0002372667530000101, not reproduced in the extracted text.]
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented wholly or partially in software, it may take the form of a computer program product that includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A clustering method based on grid granularity calculation is characterized by comprising the following steps:
step one, reading and initializing a data set;
step two, gridding the data set;
step three, calculating the density of each grid, classifying the grids, dividing the grids into mutually disjoint grids, traversing all the grids, marking each as a central grid, an edge grid or a noise grid, and removing the noise grids;
step four, performing granularity-based density calculation on the processed grids;
step five, obtaining the clustering centers according to the density peaks;
step six, outputting the clustering result.
2. The method for clustering based on grid granularity calculation of claim 1, wherein in the first step, initializing data comprises: the data set is read and initialized so that all data is projected into the data space.
3. The method for clustering based on grid-granularity computation of claim 1, wherein in the second step, gridding the data set comprises: uniformly dividing each dimension of the data space into the same segment number, and recording the segment number as f; forming a plurality of grid objects with the same size, eliminating the grid objects with the data object number of 0 in the grid objects, and marking the rest grid objects as a grid object set as G; the number of grid objects in the grid object set is recorded as N;
the specific steps of gridding are as follows:
1) uniformly dividing each dimension of the data set space containing n data objects into the same number of segments, denoted f, with initial value f = 2, to form a grid;
2) removing the grids whose data object count is 0, and recording the number of remaining non-empty grid objects as N;
3) if N < n/6, setting f = f + 1 and returning to step 1); otherwise setting f = f/2 and returning to step 1);
4) determining the grid object set G, the number of segments f and the number of grid objects N.
4. The method for clustering based on grid granularity calculation as claimed in claim 1, wherein in step three, calculating the density of the grid and classifying comprises: the number of data points contained in each grid is calculated and the grids are classified into the following three categories:
(1) if all the adjacent grids of the grid are grids containing data, marking as a central grid;
(2) if the adjacent grids of the grid have a central grid and an empty grid, marking as an edge grid;
(3) if the grids adjacent to the grid are all empty grids, the grid is marked as a noise grid, and the noise grid data is removed.
5. The method for clustering based on grid granularity calculation as claimed in claim 1, wherein in the fourth step, the granularity-based density calculation comprises: taking a central grid as the center, the central grid and all of its adjacent grids, 3^d grids in total, are defined as the minimum computational granularity, where d is the dimension;
putting all the central grids into a calculation queue; calculating the density of the set of grids at the granularity according to the divided grids; sequentially carrying out density calculation based on granularity on the central grids according to the queues;
in the fifth step, the method for obtaining the clustering center comprises the following steps: and clustering the grid objects and the data points in each grid object through the initial points obtained in the fourth step.
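The granularity-based density of the fourth step can be sketched as below. The claim does not give a formula, so this assumes that the density of a granule is the total number of data points in the 3^d block centred on a central grid; the function name `granularity_density` is hypothetical.

```python
from itertools import product

def granularity_density(counts, central_cells):
    """Sum the point counts over the 3^d block centred on each central cell.

    counts maps cell index tuples to point counts (non-empty cells only);
    central_cells is the queue of central cells to process, in order.
    Assumption: granule density = total points in the block.
    """
    d = len(next(iter(counts)))
    block = list(product((-1, 0, 1), repeat=d))  # 3^d offsets, centre included
    density = {}
    for cell in central_cells:
        total = 0
        for o in block:
            nb = tuple(c + s for c, s in zip(cell, o))
            total += counts.get(nb, 0)
        density[cell] = total
    return density

# 3 x 3 block with one point per cell, except four points in the middle cell.
counts = {(i, j): 1 for i in range(3) for j in range(3)}
counts[(1, 1)] = 4
rho = granularity_density(counts, [(1, 1)])
```

Here the single central cell gets a granule density of 12: four points of its own plus one from each of its eight neighbours.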
6. The method for clustering based on grid granularity calculation of claim 5, wherein the method for calculating the granularity-based density comprises:
step 1: setting the midpoint of the region with the highest density as an initial point, and stopping if the number of initial points has reached k; wherein the initial point is acquired as follows: sorting the minimum granularities by density, the grid objects located at a cluster center having a higher density ρ; setting the midpoint of the central grid of the highest-density region as an initial point; marking the grid selected as the initial point and its adjacent grids as processed grids, and setting the grid density of that region to 0; if an adjacent grid of the central grid is also a central grid, recalculating the granularity density and sorting again; and again setting the midpoint of the central grid of the highest-density region as an initial point, until the number of initial points reaches k;
step 2: otherwise, marking the grid selected as the initial point and its adjacent grids as processed grids, setting the grid density of that region to 0, recalculating the granularity density, and returning to step 1.
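One reading of steps 1 and 2 is a greedy loop: pick the central grid with the highest granule density, zero out its 3^d region, recompute, and repeat until k grids are chosen. The sketch below follows that reading only. The function name `select_initial_cells` is hypothetical, granule density is again assumed to be the point total of the 3^d block, and ties are broken by cell index.

```python
from itertools import product

def select_initial_cells(counts, central_cells, k):
    """Greedily choose up to k central cells with the highest 3^d-block density."""
    counts = dict(counts)  # working copy; chosen regions are zeroed below
    d = len(next(iter(counts)))
    block = list(product((-1, 0, 1), repeat=d))

    def rho(cell):
        # Granule density, recomputed from the current (partly zeroed) counts.
        return sum(
            counts.get(tuple(c + s for c, s in zip(cell, o)), 0) for o in block
        )

    candidates = set(central_cells)
    chosen = []
    while candidates and len(chosen) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=rho)
        chosen.append(best)
        for o in block:
            nb = tuple(c + s for c, s in zip(best, o))
            if nb in counts:
                counts[nb] = 0      # set the density of the region to 0
            candidates.discard(nb)  # mark the cell as processed
    return chosen

# 1-D example: ten cells with two dense regions, around cells 7 and 1.
values = [2, 5, 3, 1, 1, 1, 4, 9, 6, 1]
counts = {(i,): v for i, v in enumerate(values)}
central = [(i,) for i in range(1, 9)]  # cells whose two neighbours are non-empty
initial = select_initial_cells(counts, central, k=2)
```

The function returns cell indices rather than coordinates. Under the bounding-box gridding assumed earlier, the midpoint of a cell along one dimension would be the lower bound plus (index + 0.5) times the segment width.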
7. A clustering control system for implementing the grid granularity calculation-based clustering method according to any one of claims 1 to 6.
8. A computer program product stored on a computer readable medium, comprising a computer readable program for providing a user input interface for implementing a grid-granularity computation-based clustering method according to any one of claims 1 to 6 when executed on an electronic device.
9. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the grid granularity calculation-based clustering method of any one of claims 1 to 6.
10. A computer for implementing the grid granularity calculation-based clustering method according to any one of claims 1 to 6.
CN202010055555.1A 2020-01-17 2020-01-17 Clustering method and clustering system based on grid granularity calculation Pending CN111275099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010055555.1A CN111275099A (en) 2020-01-17 2020-01-17 Clustering method and clustering system based on grid granularity calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010055555.1A CN111275099A (en) 2020-01-17 2020-01-17 Clustering method and clustering system based on grid granularity calculation

Publications (1)

Publication Number Publication Date
CN111275099A (en) 2020-06-12

Family

ID=71003029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010055555.1A Pending CN111275099A (en) 2020-01-17 2020-01-17 Clustering method and clustering system based on grid granularity calculation

Country Status (1)

Country Link
CN (1) CN111275099A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361411A (en) * 2021-06-07 2021-09-07 国网新疆电力有限公司哈密供电公司 Random pulse interference signal elimination method based on grid and density clustering algorithm
CN113852845A (en) * 2021-02-05 2021-12-28 天翼智慧家庭科技有限公司 Data processing method and device based on granularity clustering
CN114357099A (en) * 2021-12-28 2022-04-15 福瑞莱环保科技(深圳)股份有限公司 Clustering method, clustering system and storage medium
CN116432988A (en) * 2023-06-12 2023-07-14 青岛精锐机械制造有限公司 Intelligent management method, medium and equipment for valve production process data

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113852845A (en) * 2021-02-05 2021-12-28 天翼智慧家庭科技有限公司 Data processing method and device based on granularity clustering
CN113361411A (en) * 2021-06-07 2021-09-07 国网新疆电力有限公司哈密供电公司 Random pulse interference signal elimination method based on grid and density clustering algorithm
CN114357099A (en) * 2021-12-28 2022-04-15 福瑞莱环保科技(深圳)股份有限公司 Clustering method, clustering system and storage medium
CN114357099B (en) * 2021-12-28 2024-03-19 福瑞莱环保科技(深圳)股份有限公司 Clustering method, clustering system and storage medium
CN116432988A (en) * 2023-06-12 2023-07-14 青岛精锐机械制造有限公司 Intelligent management method, medium and equipment for valve production process data
CN116432988B (en) * 2023-06-12 2023-09-05 青岛精锐机械制造有限公司 Intelligent management method, medium and equipment for valve production process data

Similar Documents

Publication Publication Date Title
CN111275099A (en) Clustering method and clustering system based on grid granularity calculation
US6012058A (en) Scalable system for K-means clustering of large databases
Sim et al. A survey on enhanced subspace clustering
CN106096066B (en) Text Clustering Method based on random neighbor insertion
CN109886334B (en) Shared neighbor density peak clustering method for privacy protection
US10019649B2 (en) Point cloud simplification
WO2022166380A1 (en) Data processing method and apparatus based on meanshift optimization
CN107832456B (en) Parallel KNN text classification method based on critical value data division
Ashabi et al. The systematic review of K-means clustering algorithm
CN115454779A (en) Cloud monitoring stream data detection method and device based on cluster analysis and storage medium
KR20100045682A (en) Method and system of clustering for multi-dimensional data streams
CN112926635B (en) Target clustering method based on iterative self-adaptive neighbor propagation algorithm
CN108280236A (en) A kind of random forest visualization data analysing method based on LargeVis
CN111522968A (en) Knowledge graph fusion method and device
CN110781943A (en) Clustering method based on adjacent grid search
CN117454255B (en) Intelligent building energy consumption data optimized storage method
CN108764307A (en) The density peaks clustering method of natural arest neighbors optimization
CN108549696B (en) Time series data similarity query method based on memory calculation
Sun Personalized music recommendation algorithm based on spark platform
CN116500703B (en) Thunderstorm monomer identification method and device
CN114186110A (en) Data clustering method, device and equipment and readable storage medium
CN114626451A (en) Data preprocessing optimization method based on density
Li et al. An integrated fast Hough transform for multidimensional data
CN114091559A (en) Data filling method and device, equipment and storage medium
Lukač et al. Sweep-hyperplane clustering algorithm using dynamic model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200612
