CN113850281A - Data processing method and device based on MEANSHIFT optimization - Google Patents
Data processing method and device based on MEANSHIFT optimization
- Publication number
- CN113850281A
- Authority
- CN
- China
- Prior art keywords
- sample
- cluster
- centers
- distance
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention provides a data processing method and device based on mean shift. The method comprises the following steps: collecting user behavior data in real time as an original sample set; initializing a cluster center according to the number of clusters and the original sample set; for each sample in the original sample set, determining whether two or more cluster centers are closest to the sample, if so, calculating a local density gradient direction of the sample by using mean shift, calculating a similarity between the local density gradient direction of the sample and a direction of the sample towards each of the two or more cluster centers, and dividing the sample into the cluster corresponding to the maximum similarity; otherwise, dividing the sample into a cluster closest to the center of the cluster; and pushing related data to each user group in real time according to the clustering result.
Description
Technical Field
The invention relates to the field of data mining and machine learning, in particular to a data processing method and device based on MEANSHIFT optimization.
Background
With the rapid development of modern information technology, the world has entered the era of "Internet + big data". Big data is profoundly changing how people think, work, and live, and its deep integration with every industry is producing unprecedented social and commercial value. Many data processing methods based on data mining and machine learning have emerged in the course of big data development. Among them, the traditional K-means algorithm, given N samples, randomly selects K samples as initial cluster centers and assigns each original sample to the cluster whose center is closest to it under a minimum-distance rule. When the distance between a sample and the centers of one or more other clusters is close to that minimum distance, the K-means clustering result is not ideal. How to improve the clustering effect in this scenario is an urgent problem to be solved.
Chinese patent application 'a K-means clustering method based on density Canopy' (CN201911127104.8) proposes a K-means clustering method based on density Canopy, and the density Canopy is taken as a preprocessing step of a K-means algorithm, so that the clustering accuracy is improved compared with that of the traditional K-means algorithm, but the method does not consider the relation between an original sample and other clusters, only local optimization is ensured, and global optimization cannot be obtained.
The Chinese patent application 'K-means clustering method based on a neural network' (CN201810570097.8) provides a K-means clustering method based on a neural network, which solves the problems that the prior K-means iteratively optimizes clustering centers and label distribution by two independent steps, so that the inference speed is slow, new data, large-scale data and online data cannot be processed, and the prior K-means is sensitive to an initial value.
Therefore, in order to divide samples more reasonably and further improve clustering accuracy when a sample is nearest and approximately nearest to a plurality of cluster centers, it is desirable to provide an improved data processing method.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The invention provides a data processing method and device based on mean shift optimization, which consider the relationship between an original sample and other clusters, so that the edges and peripheral regions of each cluster are divided more reasonably, the cluster is compact, and the clustering precision and speed are greatly improved.
According to an aspect of the present invention, there is provided a data processing method, the method including:
collecting user behavior data in real time as an original sample set;
initializing a cluster center according to the number of clusters and the original sample set;
determining, for each sample in the original sample set, whether there are two or more cluster centers that are closest in distance to the sample,
if present, then
calculating the local density gradient direction of the sample using mean shift,
calculating a similarity between the local density gradient direction of the sample and a direction of the sample toward each of the two or more cluster centers, and
dividing the sample into the cluster corresponding to the maximum similarity;
otherwise, dividing the sample into the cluster whose center is closest to the sample; and
pushing related data to each user group in real time according to the clustering result.
According to one embodiment of the present invention, determining whether there are two or more cluster centers closest in distance to the sample further comprises:
calculating the Euclidean distances from the sample to the K cluster centers to obtain a distance set for the sample, wherein K is the number of clusters;
calculating the ratio of the distance between the sample and each other cluster center $c_q$ to the smallest distance in the distance set, to obtain a corresponding set of distance ratios;
wherein, if some of the distance ratios do not exceed a threshold ε, it is determined that the corresponding cluster centers, together with the nearest cluster center, are all closest to the sample, where ε is a threshold set empirically.
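The distance-ratio test can be illustrated with a short sketch. It assumes that each ratio is the distance to another cluster center divided by the minimum distance, and that a center counts as approximately nearest when its ratio does not exceed ε; the function name `ambiguous_centers` and the sample values are hypothetical.

```python
from typing import List


def ambiguous_centers(distances: List[float], eps: float) -> List[int]:
    """Return indices of the nearest and approximately nearest cluster centers.

    Assumption: ratio = d(x, c_q) / d_min, and center q qualifies
    when ratio <= eps (so eps is expected to be >= 1).
    """
    d_min = min(distances)
    nearest = distances.index(d_min)
    if d_min == 0.0:
        # The sample coincides with a center, so nothing is ambiguous.
        return [nearest]
    return [
        q for q, d in enumerate(distances)
        if q == nearest or d / d_min <= eps
    ]


# Distances from one sample to K = 3 cluster centers (illustrative values).
candidates = ambiguous_centers([2.0, 2.1, 5.0], eps=1.1)
print(candidates)
```

When the returned list has a single entry, the sample is assigned by minimum distance; when it has two or more, the mean shift comparison is applied to those candidates.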
According to a further embodiment of the present invention, calculating the local density gradient direction of the sample using mean shift further comprises:
calculating a mean-shift vector local to the sample, wherein the vector points from the sample in the direction of greatest increase of the estimated density.
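A minimal sketch of a local mean-shift vector follows. It assumes a flat (uniform) kernel over a sphere of radius `h`, which is one common choice and is not confirmed by the text; the function name, the radius, and the sample points are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def mean_shift_vector(
    x: Sequence[float], samples: Sequence[Sequence[float]], h: float
) -> Optional[List[float]]:
    """Average offset from x to the samples within distance h of x.

    Assumption: flat kernel. Points at distance 0 (x itself or exact
    duplicates) are skipped; they add no direction information.
    Returns None when no neighbor lies inside the sphere.
    """
    neighbors = [s for s in samples if 0.0 < math.dist(x, s) <= h]
    z = len(neighbors)
    if z == 0:
        return None
    return [sum(s[j] - x[j] for s in neighbors) / z for j in range(len(x))]


points = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
shift = mean_shift_vector([0.0, 0.0], points, h=2.0)
print(shift)
```

The returned vector points from the sample toward the denser part of its neighborhood, which is the direction later compared against the directions to the candidate cluster centers.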
According to a further embodiment of the present invention, calculating the similarity further comprises:
calculating a similarity between the local density gradient direction of the sample and a direction of the sample toward each of the two or more cluster centers using a cosine similarity algorithm, wherein the greater the cosine value, the higher the similarity.
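The cosine comparison can be sketched as follows. The helper names `cosine` and `pick_cluster`, the zero-vector convention, and the numeric values are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # assumption: treat an undefined angle as no similarity
    return dot / (nu * nv)


def pick_cluster(
    x: Sequence[float],
    shift: Sequence[float],
    centers: Sequence[Sequence[float]],
    candidates: List[int],
) -> int:
    """Choose the candidate center whose direction best matches the shift vector."""
    def score(q: int) -> float:
        direction = [c - a for c, a in zip(centers[q], x)]
        return cosine(shift, direction)
    return max(candidates, key=score)


centers = [[2.0, 0.0], [0.0, 2.0], [9.0, 9.0]]
chosen = pick_cluster([0.0, 0.0], [1.0, 0.5], centers, candidates=[0, 1])
print(chosen)
```

Here both candidate centers are equally far from the sample, and the shift vector leans toward the first one, so the first cluster is chosen.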
According to a further embodiment of the present invention, the initializing of the cluster centers is performed by a K-means++ clustering algorithm, wherein the distance between the respective cluster centers is as large as possible.
According to another aspect of the present invention, there is provided a data processing apparatus, the apparatus comprising:
a data collection module configured to collect user behavior data in real-time as an original sample set;
an initializing cluster center module configured to initialize a cluster center according to a number of clusters and the original sample set;
a data clustering module configured to:
determining, for each sample in the original sample set, whether there are two or more cluster centers that are closest in distance to the sample,
if present, then
calculating a local density gradient direction of the sample using mean shift, calculating a similarity between the local density gradient direction of the sample and a direction of the sample toward each of the two or more cluster centers, and
dividing the sample into the cluster corresponding to the maximum similarity;
otherwise, dividing the sample into the cluster whose center is closest to the sample; and
a data push module configured to push relevant data in real-time to respective user groups associated with respective class clusters based on the clustering results.
According to one embodiment of the present invention, determining whether there are two or more cluster centers closest in distance to the sample further comprises:
calculating the Euclidean distances from the sample to the K cluster centers to obtain a distance set for the sample, wherein K is the number of clusters;
calculating the ratio of the distance between the sample and each other cluster center $c_q$ to the smallest distance in the distance set, to obtain a corresponding set of distance ratios;
wherein, if some of the distance ratios do not exceed a threshold ε, it is determined that the corresponding cluster centers, together with the nearest cluster center, are all closest to the sample, where ε is a threshold set empirically.
According to a further embodiment of the present invention, calculating the local density gradient direction of the sample using mean shift further comprises:
calculating a mean-shift vector local to the sample, wherein the vector points from the sample in the direction of greatest increase of the estimated density.
According to a further embodiment of the present invention, calculating the similarity further comprises:
calculating a similarity between the local density gradient direction of the sample and a direction of the sample toward each of the two or more cluster centers using a cosine similarity algorithm, wherein the greater the cosine value, the higher the similarity.
According to a further embodiment of the present invention, the initializing of the cluster centers is performed by a K-means++ clustering algorithm, wherein the distance between the respective cluster centers is as large as possible.
Compared with the scheme in the prior art, the data processing method and device based on the meanshift optimization provided by the invention at least have the following advantages:
(1) By considering the relation between the original sample and other clusters, the edges and peripheral regions of each cluster are divided more reasonably, the clusters are compact, the clustering effect is improved, and the global optimum is achieved.
(2) Compared with the traditional K-means algorithm, the method can more accurately estimate the central positions of the K clusters, so that the K clusters are quickly converged, and the iteration times are reduced.
These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only some typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
Fig. 1 shows an exemplary architecture diagram of a data processing apparatus based on meanshift optimization according to an embodiment of the present invention.
Fig. 2 shows a flowchart of a data processing method based on meanshift optimization according to an embodiment of the present invention.
Fig. 3 shows a flowchart of a meanshift-based clustering algorithm according to an embodiment of the present invention.
FIG. 4 shows an example of a central sample two-dimensional region according to one embodiment of the invention.
Detailed Description
The present invention will be described in detail below with reference to the attached drawings, and the features of the present invention will be further apparent from the following detailed description.
Fig. 1 is an exemplary architecture diagram of a data processing apparatus 100 based on meanshift optimization according to an embodiment of the present invention. As shown in fig. 1, the apparatus 100 of the present invention comprises a data collection module 101, an initialize cluster center module 102, a data clustering module 103 and a data pushing module 104.
The data collection module 101 may collect user data in real time as an original sample set and store it in a big data platform according to data characteristics. As an example, the data collection module 101 may collect behavior data of TV programs watched by users in real time as the original sample set: each day, the TV viewing history of user i over the preceding 30 days is counted, the viewing time is accumulated for each of the T program types, and the accumulated time is normalized into a score, namely $\text{time}_t / (\text{time}_1 + \text{time}_2 + \dots + \text{time}_T)$. The scores of each user for the T program types are stored as the original sample $x_i$.
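As a small worked example of the normalization above, the following sketch turns accumulated viewing times into a score vector. The function name, the handling of a user with no viewing history, and the numbers are hypothetical.

```python
from typing import List, Sequence


def viewing_scores(seconds_by_type: Sequence[float]) -> List[float]:
    """Normalize accumulated viewing time per program type into scores.

    Each score is time_t / (time_1 + ... + time_T), so the scores sum to 1.
    Assumption: a user with no viewing time gets an all-zero vector.
    """
    total = sum(seconds_by_type)
    if total == 0:
        return [0.0] * len(seconds_by_type)
    return [t / total for t in seconds_by_type]


# Viewing time in seconds over 30 days for T = 4 program types (illustrative).
x_i = viewing_scores([3600, 1800, 0, 600])
print(x_i)
```

Because every sample lies on the same scale, Euclidean distances between users reflect differences in viewing preference rather than in total viewing time.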
The initialize cluster center module 102 may initialize the cluster centers based on the number of clusters and the original sample set. As an example, the initialize cluster center module 102 may use the K-means++ clustering algorithm to initialize K cluster centers that are as far apart from one another as possible. The K-means++ algorithm comprises the following steps: (1) randomly select a sample point $x_i$ from the original sample set X as the first initial cluster center $c_1$; (2) for each sample point $x_i$, calculate the shortest distance $D(x_i)$ to the currently existing cluster centers, then calculate the probability $P(x_i)$ that the sample point is selected as the next cluster center, and finally select the sample point with the maximum probability value as the next cluster center; and (3) repeat step (2) until K cluster centers have been selected.
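The three initialization steps can be sketched as below. The text does not give the formula for $P(x)$; this sketch assumes the usual K-means++ weighting $P(x) = D(x)^2 / \sum D(x)^2$ and, following the description, picks the sample with the maximum probability instead of sampling at random. The function name and the `seed` parameter are hypothetical.

```python
import math
import random
from typing import List, Sequence


def init_centers(
    samples: Sequence[Sequence[float]], k: int, seed: int = 0
) -> List[List[float]]:
    """Pick k initial centers that are far apart from one another."""
    rng = random.Random(seed)
    # Step (1): the first center is a randomly chosen sample.
    centers = [list(rng.choice(samples))]
    while len(centers) < k:
        # Step (2): squared shortest distance to the existing centers.
        d2 = [min(math.dist(x, c) for c in centers) ** 2 for x in samples]
        total = sum(d2)
        if total == 0:
            break  # every sample coincides with a chosen center
        probs = [v / total for v in d2]  # assumed form of P(x)
        nxt = max(range(len(samples)), key=probs.__getitem__)
        centers.append(list(samples[nxt]))
    # Step (3) is the loop itself: repeat until k centers are chosen.
    return centers


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [20.0, 0.0]]
centers = init_centers(data, k=3)
print(centers)
```

Selecting the maximum-probability sample makes every step after the first deterministic and equivalent to farthest-point selection; the standard K-means++ procedure samples the next center at random with probability $P(x)$ instead.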
The data clustering module 103 may, for each sample in the original sample set that is nearest and approximately nearest to two or more cluster centers, calculate the local density gradient direction of the sample using the mean shift algorithm; calculate the similarity between the local density gradient direction of the sample and the direction from the sample toward each of the two or more cluster centers; and assign the sample to the cluster corresponding to the maximum similarity. In particular, the data clustering module 103 may calculate the Euclidean distances from each sample $x_i$ in the original sample set X to the K cluster centers (see FIG. 4, in which each arrow points from the central sample point to a cluster center), obtain a distance set for each sample $x_i$, and calculate from the distance set the corresponding distance ratio set of sample $x_i$. It then judges whether two or more cluster centers are nearest and approximately nearest to sample $x_i$. If so, it records the corresponding cluster centers and calculates the local mean-shift vector of sample $x_i$, which points from sample $x_i$ in the direction of maximum increase of the estimated density (referred to simply as the density gradient direction). It then calculates the similarity between the local density gradient direction of sample $x_i$ and the direction from sample $x_i$ toward each recorded cluster center, and assigns sample $x_i$ to the cluster with the maximum similarity.
The data pushing module 104 may push related data to each user group in real time according to the clustering result. In one example, the tv users may be automatically divided into K groups by a clustering algorithm, then T attributes (program types) in the centers of the clusters of each group are sorted, and the background directionally pushes related programs for each group according to the respective Top-N attributes (program types).
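The Top-N selection in this example can be sketched as follows. The program type names, the cluster center values, and the function name `top_n_types` are hypothetical.

```python
from typing import Dict, List, Sequence


def top_n_types(
    center: Sequence[float], type_names: Sequence[str], n: int
) -> List[str]:
    """Return the n program types with the highest score in a cluster center."""
    ranked = sorted(range(len(center)), key=lambda t: center[t], reverse=True)
    return [type_names[t] for t in ranked[:n]]


types = ["news", "sports", "drama", "kids"]
cluster_centers = [
    [0.10, 0.60, 0.20, 0.10],  # group 0 mostly watches sports
    [0.50, 0.05, 0.40, 0.05],  # group 1 mostly watches news and drama
]
push_plan: Dict[int, List[str]] = {
    k: top_n_types(c, types, n=2) for k, c in enumerate(cluster_centers)
}
print(push_plan)
```

Each user group would then receive programs of the types listed for its cluster.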
For convenience of explanation, the embodiments of the present invention are described below by taking a K-means++ clustering algorithm based on mean shift as an example, but those skilled in the art will understand that the present invention is also applicable to other clustering algorithms.
Fig. 2 is a flow diagram of a data processing method 200 based on meanshift optimization according to an embodiment of the invention. The method begins at step 201 with the data collection module 101 collecting user behavior data in real time as a raw sample set X.
In step 202, the initialize cluster center module 102 initializes the cluster centers based on the number of clusters and the original sample set. Algorithms for initializing cluster centers include, but are not limited to, K-means++, K-means, Canopy, and the like.
In step 203, the data clustering module 103 determines, for each sample in the original sample set, whether there are two or more cluster centers that are nearest and approximately nearest to the sample; if so, it calculates the local density gradient direction of the sample using the non-parametric mean shift algorithm, calculates the similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers, and divides the sample into the cluster corresponding to the maximum similarity; otherwise, it divides the sample into the cluster whose center is closest to the sample. The specific implementation steps of the algorithm are described in further detail below with reference to fig. 3.
In step 204, the data pushing module 104 pushes relevant data to each user group in real time according to the clustering result.
Fig. 3 shows a flow diagram of a meanshift-based clustering algorithm 300 according to one embodiment of the invention. The detailed steps of the algorithm 300 are as follows:
Step 2: Initialize K cluster centers using the K-means++ algorithm.
Step 3: Calculate the Euclidean distance from each original sample $x_i$ in the original sample set X to each of the K cluster centers, denoted $d(x_i, c_k)$, where $k = 1, 2, 3, \dots, K$. The Euclidean distance between points x and y in n-dimensional space is $d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}$. Thus, for each sample $x_i$, a distance set $\{d(x_i, c_1), d(x_i, c_2), \dots, d(x_i, c_K)\}$ is obtained.
Step 4: Calculate the ratio of the distance between the original sample $x_i$ and each other cluster center $c_q$ to the distance between $x_i$ and its nearest cluster center, that is, $d(x_i, c_q) / \min_k d(x_i, c_k)$, to obtain the corresponding set of distance ratios, where the nearest cluster center is the one whose distance from $x_i$ is the smallest.
Step 5: If the distance ratios are all greater than a threshold ε, the sample is assigned by minimum distance, that is, sample $x_i$ is assigned to the cluster whose center is closest to it, where ε may be a threshold set empirically.
Step 6: If some of the distance ratios are not greater than ε, then the corresponding cluster centers and the nearest cluster center are all nearest and approximately nearest to sample $x_i$, and mean shift is used to decide which cluster sample $x_i$ belongs to. The specific steps are as follows:
a) Take sample $x_i$ as the center and h as the radius to form a p-dimensional sphere, denoted $S_h(x_i)$.
b) Find the mean-shift vector of $x_i$, denoted $M_h(x_i)$: $M_h(x_i) = \frac{1}{Z} \sum_{x_j \in S_h(x_i)} (x_j - x_i)$, where Z is the number of samples falling within $S_h(x_i)$.
Note that if Z is 0, $x_i$ is regarded as an abnormal point and is removed.
d) Calculate the cosine similarity between $M_h(x_i)$ and the direction vector $c_q - x_i$ from $x_i$ toward each of these cluster centers, and assign $x_i$ to the cluster with the maximum similarity. The cosine similarity algorithm evaluates the similarity of two vectors by calculating the cosine of the angle between them; the greater the cosine value, the higher the similarity.
Step 7: After every sample $x_i$ in the original sample set X has been assigned, update the center of each cluster to obtain $c_k^{(1)}$, and calculate the objective function of the whole clustering, denoted $E^{(1)}$, where the objective function takes the K-means sum-of-squared-errors form $E = \sum_{k=1}^{K} \sum_{x_i \in C_k} d(x_i, c_k)^2$, with $C_k$ denoting the set of samples assigned to cluster k.
Step 8: When $E^{(t+1)}$ approximates $E^{(t)}$, the algorithm has converged and the clustering result is output; otherwise, steps 3 to 7 are executed again.
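The outer loop formed by the assignment, center update and convergence check can be sketched as follows. This is a minimal sketch under stated assumptions: the objective is taken to be the sum of squared Euclidean distances to the assigned centers, the `assign` callable stands in for the per-sample rule of steps 3 to 6 (plain nearest-center assignment is used here as a placeholder), and the names `cluster`, `tol` and `max_iter` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def nearest_center(x: Point, centers: Sequence[Point]) -> int:
    """Placeholder assignment rule: index of the closest center."""
    return min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))


def objective(samples, labels, centers) -> float:
    """Assumed objective: sum of squared distances to assigned centers."""
    return sum(math.dist(x, centers[k]) ** 2 for x, k in zip(samples, labels))


def cluster(
    samples: Sequence[Point],
    centers: Sequence[Point],
    assign: Callable[[Point, Sequence[Point]], int] = nearest_center,
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    centers = [list(c) for c in centers]
    labels: List[int] = []
    prev = math.inf
    for _ in range(max_iter):
        # Assign every sample, then update each cluster center (step 7).
        labels = [assign(x, centers) for x in samples]
        for k in range(len(centers)):
            members = [x for x, lab in zip(samples, labels) if lab == k]
            if members:  # assumption: an empty cluster keeps its old center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        # Convergence check on the objective (step 8).
        e = objective(samples, labels, centers)
        if abs(prev - e) <= tol:
            break
        prev = e
    return labels, centers


samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, final_centers = cluster(samples, centers=[[0.0, 0.0], [10.0, 10.0]])
print(labels, final_centers)
```

A rule that applies the distance-ratio test and the mean shift comparison could be passed as `assign`, for example as a closure that also captures the sample set, the radius h and the threshold ε.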
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the claimed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.
Claims (10)
1. A method of data processing, the method comprising:
collecting user behavior data in real time as an original sample set;
initializing a cluster center according to the number of clusters and the original sample set;
determining, for each sample in the original sample set, whether there are two or more cluster centers that are closest in distance to the sample,
if present, then
calculating the local density gradient direction of the sample using mean shift,
calculating a similarity between the local density gradient direction of the sample and a direction of the sample toward each of the two or more cluster centers, and
dividing the sample into the cluster corresponding to the maximum similarity;
otherwise, dividing the sample into the cluster whose center is closest to the sample; and
pushing related data to each user group in real time according to the clustering result.
2. The method of claim 1, wherein determining whether there are two or more cluster centers closest in distance to the sample further comprises:
calculating the Euclidean distances from the sample to the K cluster centers to obtain a distance set for the sample, wherein K is the number of clusters;
calculating the ratio of the distance between the sample and each other cluster center $c_q$ to the smallest distance in the distance set, to obtain a corresponding set of distance ratios
3. The method of claim 1, wherein calculating the local density gradient direction of the sample using mean shift further comprises:
calculating a mean-shift vector local to the sample, wherein the vector points from the sample in the direction of greatest increase of the estimated density.
4. The method of claim 1, wherein computing a similarity further comprises:
calculating a similarity between the local density gradient direction of the sample and a direction of the sample toward each of the two or more cluster centers using a cosine similarity algorithm, wherein the greater the cosine value, the higher the similarity.
5. The method of claim 1, wherein the initializing of the cluster centers is performed by a K-means++ clustering algorithm, wherein the distance between the respective cluster centers is as large as possible.
6. A data processing apparatus, characterized in that the apparatus comprises:
a data collection module configured to collect user behavior data in real-time as an original sample set;
an initializing cluster center module configured to initialize a cluster center according to a number of clusters and the original sample set;
a data clustering module configured to:
determining, for each sample in the original sample set, whether there are two or more cluster centers that are closest in distance to the sample,
if present, then
calculating the local density gradient direction of the sample using mean shift,
calculating a similarity between the local density gradient direction of the sample and a direction of the sample toward each of the two or more cluster centers, and
dividing the sample into the cluster corresponding to the maximum similarity;
otherwise, dividing the sample into the cluster whose center is closest to the sample; and
a data push module configured to push relevant data in real-time to respective user groups associated with respective class clusters based on the clustering results.
7. The apparatus of claim 6, wherein determining whether there are two or more cluster centers closest in distance to the sample further comprises:
calculating the Euclidean distances from the sample to the K cluster centers to obtain a distance set for the sample, wherein K is the number of clusters;
calculating the ratio of the distance between the sample and each other cluster center $c_q$ to the smallest distance in the distance set, to obtain a corresponding set of distance ratios
8. The apparatus of claim 6, wherein calculating the local density gradient direction of the sample using mean shift further comprises:
calculating a mean-shift vector local to the sample, wherein the vector points from the sample in the direction of greatest increase of the estimated density.
9. The apparatus of claim 6, wherein calculating a similarity further comprises:
calculating a similarity between the local density gradient direction of the sample and a direction of the sample toward each of the two or more cluster centers using a cosine similarity algorithm, wherein the greater the cosine value, the higher the similarity.
10. The apparatus of claim 6, wherein the initializing of the cluster centers is performed by a K-means++ clustering algorithm, wherein the distance between the respective cluster centers is as large as possible.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110161944.7A CN113850281B (en) | 2021-02-05 | 2021-02-05 | MEANSHIFT optimization-based data processing method and device |
PCT/CN2021/136291 WO2022166380A1 (en) | 2021-02-05 | 2021-12-08 | Data processing method and apparatus based on meanshift optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110161944.7A CN113850281B (en) | 2021-02-05 | 2021-02-05 | MEANSHIFT optimization-based data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113850281A true CN113850281A (en) | 2021-12-28 |
CN113850281B CN113850281B (en) | 2024-03-12 |
Family
ID=78972859
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110161944.7A Active CN113850281B (en) | 2021-02-05 | 2021-02-05 | MEANSHIFT optimization-based data processing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113850281B (en) |
WO (1) | WO2022166380A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114913423A (en) * | 2022-05-25 | 2022-08-16 | 中国电建集团成都勘测设计研究院有限公司 | Model training method and extraction method for surrounding rock fracture information |
CN115563522A (en) * | 2022-12-02 | 2023-01-03 | 湖南工商大学 | Traffic data clustering method, device, equipment and medium |
CN117808549A (en) * | 2023-12-29 | 2024-04-02 | 深圳市中港星互联网科技有限公司 | Product recommendation method for providing health degree solution based on enterprise data |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116304776B (en) * | 2023-03-21 | 2023-11-21 | 宁波送变电建设有限公司运维分公司 | Power grid data value anomaly detection method and system based on k-Means algorithm |
CN116628289B (en) * | 2023-07-25 | 2023-12-01 | 泰能天然气有限公司 | Heating system operation data processing method and strategy optimization system |
CN117113118B (en) * | 2023-10-19 | 2024-01-26 | 张家港长三角生物安全研究中心 | Intelligent monitoring method and system for biological aerosol |
CN117217501B (en) * | 2023-11-09 | 2024-02-20 | 山东多科科技有限公司 | Digital production planning and scheduling method |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101777126A (en) * | 2010-02-10 | 2010-07-14 | 华中科技大学 | Clustering method for multidimensional characteristic vectors |
CN104008127A (en) * | 2014-04-21 | 2014-08-27 | 中国电子科技集团公司第二十八研究所 | Group identification method based on clustering algorithm |
CN106779073A (en) * | 2016-12-27 | 2017-05-31 | 西安石油大学 | Media information sorting technique and device based on deep neural network |
CN108985318A (en) * | 2018-05-28 | 2018-12-11 | 中国地质大学(武汉) | A kind of global optimization K mean cluster method and system based on sample rate |
CN110019563A (en) * | 2018-08-09 | 2019-07-16 | 北京首钢自动化信息技术有限公司 | A kind of portrait modeling method and device based on multidimensional data |
CN110134839A (en) * | 2019-03-27 | 2019-08-16 | 平安科技(深圳)有限公司 | Time series data characteristic processing method, apparatus and computer readable storage medium |
CN110852370A (en) * | 2019-11-06 | 2020-02-28 | 国网湖南省电力有限公司 | Clustering algorithm-based large-industry user segmentation method |
CN111967338A (en) * | 2020-07-27 | 2020-11-20 | 广东电网有限责任公司广州供电局 | Method and system for distinguishing partial discharge pulse interference signal based on mean shift clustering algorithm |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102222234A (en) * | 2011-07-14 | 2011-10-19 | 苏州两江科技有限公司 | Image object extraction method based on mean shift and K-means clustering technology |
US11977959B2 (en) * | 2019-05-15 | 2024-05-07 | EMC IP Holding Company LLC | Data compression using nearest neighbor cluster |
CN110441819B (en) * | 2019-08-06 | 2020-10-27 | 五季数据科技(北京)有限公司 | Earthquake first-motion wave automatic pickup method based on mean shift clustering analysis |
2021
- 2021-02-05 CN CN202110161944.7A patent/CN113850281B/en active Active
- 2021-12-08 WO PCT/CN2021/136291 patent/WO2022166380A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2022166380A1 (en) | 2022-08-11 |
CN113850281B (en) | 2024-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113850281B (en) | MEANSHIFT optimization-based data processing method and device | |
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
Zhang et al. | Flexmatch: Boosting semi-supervised learning with curriculum pseudo labeling | |
US10719780B2 (en) | Efficient machine learning method | |
CN113326731B (en) | Cross-domain pedestrian re-identification method based on momentum network guidance | |
CN109815801A (en) | Face identification method and device based on deep learning | |
CN108647577A (en) | A kind of pedestrian's weight identification model that adaptive difficult example is excavated, method and system | |
CN105760888B (en) | A kind of neighborhood rough set integrated learning approach based on hierarchical cluster attribute | |
CN103425996B (en) | A kind of large-scale image recognition methods of parallel distributed | |
CN111160407B (en) | Deep learning target detection method and system | |
CN112699953B (en) | Feature pyramid neural network architecture searching method based on multi-information path aggregation | |
Abdul Samadh et al. | Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization | |
CN111125469B (en) | User clustering method and device of social network and computer equipment | |
JP6897749B2 (en) | Learning methods, learning systems, and learning programs | |
CN113076970A (en) | Gaussian mixture model clustering machine learning method under deficiency condition | |
CN112784929A (en) | Small sample image classification method and device based on double-element group expansion | |
CN109299263A (en) | File classification method, electronic equipment and computer program product | |
CN107783998A (en) | The method and device of a kind of data processing | |
CN115048539B (en) | Social media data online retrieval method and system based on dynamic memory | |
CN111325276A (en) | Image classification method and device, electronic equipment and computer-readable storage medium | |
CN117633597A (en) | Resident peak-valley electricity utilization characteristic classification method and system based on self-adaptive spectral clustering | |
CN114781779A (en) | Unsupervised energy consumption abnormity detection method and device and storage medium | |
CN117495891B (en) | Point cloud edge detection method and device and electronic equipment | |
Hassan et al. | Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization | |
CN109711439A (en) | A kind of extensive tourist's representation data clustering method in density peak accelerating neighbor seaching using Group algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20220127. Address after: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200072. Applicant after: Tianyi Digital Life Technology Co.,Ltd. Address before: 201702 3rd floor, 158 Shuanglian Road, Qingpu District, Shanghai. Applicant before: Tianyi Smart Family Technology Co.,Ltd. |
GR01 | Patent grant | ||