CN113850281A - Data processing method and device based on MEANSHIFT optimization - Google Patents

Publication number
CN113850281A
Authority
CN
China
Prior art keywords
sample
cluster
centers
distance
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110161944.7A
Other languages
Chinese (zh)
Other versions
CN113850281B (en)
Inventor
吕超
张继东
沈志平
吴浩宇
吴风蛟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Smart Family Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Smart Family Technology Co Ltd
Priority to CN202110161944.7A (CN113850281B)
Priority to PCT/CN2021/136291 (WO2022166380A1)
Publication of CN113850281A
Application granted
Publication of CN113850281B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a data processing method and device based on mean shift (meanshift) optimization. The method comprises the following steps: collecting user behavior data in real time as an original sample set; initializing cluster centers according to the number of clusters and the original sample set; for each sample in the original sample set, determining whether two or more cluster centers are closest to the sample; if so, calculating the local density gradient direction of the sample using mean shift, calculating the similarity between the local density gradient direction of the sample and the direction from the sample toward each of the two or more cluster centers, and assigning the sample to the cluster corresponding to the maximum similarity; otherwise, assigning the sample to the cluster whose center is closest; and pushing related data to each user group in real time according to the clustering result.

Description

Data processing method and device based on MEANSHIFT optimization
Technical Field
The invention relates to the field of data mining and machine learning, in particular to a data processing method and device based on MEANSHIFT optimization.
Background
With the rapid development of modern information technology, the world has entered the "Internet + big data" era. Big data is profoundly changing how people think, produce and live, and its deep integration with every industry is creating unprecedented social and commercial value. Many data processing methods based on data mining and machine learning have emerged along the way. Among them, the traditional K-means algorithm, given N samples
$$X = \{x_1, x_2, \dots, x_N\},$$
randomly selects K samples as initial cluster centers and, following a minimum-distance rule, assigns each original sample to the cluster whose center is closest to it. When the distance from a sample to the centers of one or more other clusters is close to that minimum distance, the K-means clustering result is not ideal. How to improve the clustering result in this scenario is an urgent problem to be solved.
Chinese patent application "A K-means clustering method based on density Canopy" (CN201911127104.8) uses density Canopy as a preprocessing step for the K-means algorithm, which improves clustering accuracy over the traditional K-means algorithm. However, the method does not consider the relationship between an original sample and the other clusters, so it guarantees only a local optimum and cannot reach a global optimum.
Chinese patent application "K-means clustering method based on a neural network" (CN201810570097.8) addresses several problems of existing K-means: cluster centers and label assignments are optimized iteratively in two separate steps, so inference is slow; new data, large-scale data and online data cannot be handled; and the result is sensitive to the initial values.
Therefore, to assign samples more reasonably and further improve clustering accuracy when a sample is closest to several cluster centers at approximately the same distance, it is desirable to provide an improved data processing method.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The invention provides a data processing method and device based on mean shift optimization, which consider the relationship between an original sample and other clusters, so that the edges and peripheral regions of each cluster are divided more reasonably, the cluster is compact, and the clustering precision and speed are greatly improved.
According to an aspect of the present invention, there is provided a data processing method, the method including:
collecting user behavior data in real time as an original sample set;
initializing a cluster center according to the number of clusters and the original sample set;
determining, for each sample in the original sample set, whether there are two or more cluster centers that are closest in distance to the sample,
if so, then
calculating the local density gradient direction of the sample using mean shift (meanshift),
calculating a similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers, and
assigning the sample to the cluster corresponding to the maximum similarity;
otherwise, assigning the sample to the cluster whose center is closest; and
pushing related data to each user group in real time according to the clustering result.
According to one embodiment of the present invention, determining whether there are two or more cluster centers closest in distance to the sample further comprises:
calculating the Euclidean distances from the sample to the K cluster centers to obtain a distance set for the sample, wherein K is the number of clusters;
calculating the ratio of the distance between the sample and each other cluster center $c_q$ to the smallest distance in said distance set, to obtain a corresponding set of distance ratios
$$R_i = \left\{ r_{iq} = \frac{d(x_i, c_q)}{d(x_i, c_{\min})} \;\middle|\; c_q \neq c_{\min} \right\},$$
where $x_i$ is the sample and $c_{\min}$ is its closest cluster center;
wherein if the set
$$V_i = \{ c_v \mid r_{iv} \le \varepsilon \}$$
is non-empty, it is determined that
$$|V_i| + 1$$
cluster centers (the closest center and those in $V_i$) are closest to the sample, where ε is an empirically set threshold.
According to a further embodiment of the present invention, calculating the local density gradient direction of the sample using mean shift (meanshift) further comprises:
calculating the local mean-shift vector of the sample, wherein the vector points from the sample in the direction of greatest increase in the estimated density.
According to a further embodiment of the present invention, calculating the similarity further comprises:
calculating a similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers using a cosine similarity algorithm, wherein the greater the cosine value, the higher the similarity.
According to a further embodiment of the present invention, the initializing of the cluster centers is performed by the K-means++ clustering algorithm, wherein the distance between the respective cluster centers is as large as possible.
According to another aspect of the present invention, there is provided a data processing apparatus, the apparatus comprising:
a data collection module configured to collect user behavior data in real-time as an original sample set;
an initializing cluster center module configured to initialize a cluster center according to a number of clusters and the original sample set;
a data clustering module configured to:
determine, for each sample in the original sample set, whether there are two or more cluster centers that are closest in distance to the sample,
if so,
calculate the local density gradient direction of the sample using mean shift (meanshift), calculate a similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers, and
assign the sample to the cluster corresponding to the maximum similarity;
otherwise, assign the sample to the cluster whose center is closest; and
a data push module configured to push relevant data in real-time to respective user groups associated with respective class clusters based on the clustering results.
According to one embodiment of the present invention, determining whether there are two or more cluster centers closest in distance to the sample further comprises:
calculating the Euclidean distances from the sample to the K cluster centers to obtain a distance set for the sample, wherein K is the number of clusters;
calculating the ratio of the distance between the sample and each other cluster center $c_q$ to the smallest distance in said distance set, to obtain a corresponding set of distance ratios
$$R_i = \left\{ r_{iq} = \frac{d(x_i, c_q)}{d(x_i, c_{\min})} \;\middle|\; c_q \neq c_{\min} \right\},$$
where $x_i$ is the sample and $c_{\min}$ is its closest cluster center;
wherein if the set
$$V_i = \{ c_v \mid r_{iv} \le \varepsilon \}$$
is non-empty, it is determined that
$$|V_i| + 1$$
cluster centers (the closest center and those in $V_i$) are closest to the sample, where ε is an empirically set threshold.
According to a further embodiment of the present invention, calculating the local density gradient direction of the sample using mean shift (meanshift) further comprises:
calculating the local mean-shift vector of the sample, wherein the vector points from the sample in the direction of greatest increase in the estimated density.
According to a further embodiment of the present invention, calculating the similarity further comprises:
calculating a similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers using a cosine similarity algorithm, wherein the greater the cosine value, the higher the similarity.
According to a further embodiment of the present invention, the initializing of the cluster centers is performed by the K-means++ clustering algorithm, wherein the distance between the respective cluster centers is as large as possible.
Compared with the scheme in the prior art, the data processing method and device based on the meanshift optimization provided by the invention at least have the following advantages:
(1) By considering the relationship between an original sample and the other clusters, the edges and peripheral regions of each cluster are divided more reasonably and the clusters are more compact, which improves the clustering result and moves it toward the global optimum.
(2) Compared with the traditional K-means algorithm, the method estimates the center positions of the K clusters more accurately, so that the algorithm converges quickly and the number of iterations is reduced.
These and other features and advantages will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of aspects as claimed.
Drawings
So that the manner in which the above recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only some typical aspects of this invention and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
Fig. 1 shows an exemplary architecture diagram of a data processing apparatus based on meanshift optimization according to an embodiment of the present invention.
Fig. 2 shows a flowchart of a data processing method based on meanshift optimization according to an embodiment of the present invention.
Fig. 3 shows a flowchart of a meanshift-based clustering algorithm according to an embodiment of the present invention.
FIG. 4 shows an example of a central sample two-dimensional region according to one embodiment of the invention.
Detailed Description
The present invention will be described in detail below with reference to the attached drawings, and the features of the present invention will be further apparent from the following detailed description.
Fig. 1 is an exemplary architecture diagram of a data processing apparatus 100 based on meanshift optimization according to an embodiment of the present invention. As shown in fig. 1, the apparatus 100 of the present invention comprises a data collection module 101, an initialize cluster center module 102, a data clustering module 103 and a data pushing module 104.
The data collection module 101 may collect user data in real time as an original sample set and store it in a big data platform according to the data characteristics. As an example, the data collection module 101 may collect, in real time, behavior data on the TV programs watched by users as the original sample set. Each day, the viewing history of user i over the preceding 30 days is counted; for each of the T program types, the viewing time is accumulated and normalized into a score, namely $\mathrm{time}_t / (\mathrm{time}_1 + \mathrm{time}_2 + \dots + \mathrm{time}_T)$. The scores of each user for the T program types are stored as the original sample $x_i$.
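The normalization described above can be illustrated with a minimal Python sketch. The function name `viewing_scores`, the program type labels and the viewing totals are hypothetical, and returning an all-zero sample for a user with no viewing history is an assumption the text does not address.

```python
from typing import Dict, List


def viewing_scores(watch_seconds: Dict[str, float], program_types: List[str]) -> List[float]:
    """Normalize accumulated viewing time per program type into scores that sum to 1."""
    totals = [float(watch_seconds.get(t, 0.0)) for t in program_types]
    grand_total = sum(totals)
    if grand_total == 0:
        # Assumption: a user with no viewing history gets an all-zero sample.
        return [0.0] * len(program_types)
    return [t / grand_total for t in totals]


# Hypothetical 30-day viewing totals (in seconds) for one user.
program_types = ["news", "sports", "drama", "kids"]
x_i = viewing_scores({"news": 3600, "drama": 10800, "sports": 3600}, program_types)
print(x_i)
```

Each resulting vector has T entries, one per program type, which is the form of sample the clustering steps operate on.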
The initialize cluster center module 102 may initialize the cluster centers based on the number of clusters and the original sample set. As an example, the initialize cluster center module 102 may use the K-means++ algorithm to initialize K cluster centers that are as far apart from each other as possible. The K-means++ algorithm proceeds as follows: (1) randomly select a sample point $x_i$ from the original sample set X as the first initial cluster center $c_1$; (2) for each sample point $x_i$, calculate the shortest distance $D(x_i)$ to the existing cluster centers, then calculate the probability $P(x_i)$ that each sample point is selected as the next cluster center, and select the sample point with the largest probability as the next cluster center; and (3) repeat step (2) until K cluster centers have been selected.
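The initialization steps above might be sketched as follows. The text does not give the formula for $P(x)$; the sketch assumes the weighting conventionally used by K-means++, $P(x) = D(x)^2 / \sum_x D(x)^2$, and follows the text in picking the sample with the largest probability rather than sampling at random. The function name `init_centers` and the `seed` parameter are hypothetical.

```python
import numpy as np


def init_centers(X: np.ndarray, K: int, seed: int = 0) -> np.ndarray:
    """Pick K initial centers: first at random, then repeatedly the most probable point."""
    rng = np.random.default_rng(seed)
    # (1) Random first center.
    centers = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        # (2) Shortest distance D(x) from every sample to the centers chosen so far.
        diffs = X[:, None, :] - np.asarray(centers)[None, :, :]
        D = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
        # Assumed weighting: P(x) proportional to D(x)^2 (requires at least two distinct points).
        P = D ** 2 / (D ** 2).sum()
        # As described in the text, take the sample with the largest probability.
        centers.append(X[int(np.argmax(P))])
    # (3) The loop stops once K centers have been selected.
    return np.asarray(centers)


X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0],
                   [10.0, 1.0], [0.0, 10.0], [1.0, 10.0]])
centers_demo = init_centers(X_demo, K=3)
print(centers_demo)
```

Because already chosen centers have $D(x) = 0$, they are never selected again, so the K centers are distinct whenever the data contain at least K distinct points.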
The data clustering module 103 may, for each sample in the original sample set that is closest to two or more cluster centers at approximately the same distance, calculate the local density gradient direction of the sample using the mean shift (meanshift) algorithm; calculate the similarity between that direction and the direction from the sample toward each of the two or more cluster centers; and assign the sample to the cluster corresponding to the maximum similarity. In particular, the data clustering module 103 may calculate the Euclidean distances from each sample $x_i$ in the original sample set X to the K cluster centers (see FIG. 4, in which each arrow points from the central sample point to a cluster center), obtaining a distance set for each sample $x_i$. From the distance set it calculates the corresponding set of distance ratios for $x_i$ and judges whether two or more cluster centers are closest to $x_i$ at approximately the same distance. If so, it records those cluster centers, calculates the local mean-shift vector of $x_i$, which points from $x_i$ in the direction of greatest increase in estimated density (referred to simply as the density gradient direction), calculates the similarity between this density gradient direction and the direction from $x_i$ toward each recorded cluster center, and assigns $x_i$ to the cluster with the maximum similarity.
The data pushing module 104 may push related data to each user group in real time according to the clustering result. In one example, the tv users may be automatically divided into K groups by a clustering algorithm, then T attributes (program types) in the centers of the clusters of each group are sorted, and the background directionally pushes related programs for each group according to the respective Top-N attributes (program types).
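As an illustration of the Top-N selection described above, the following Python sketch sorts the T attributes of each cluster center and keeps the N largest. The function name `top_n_types`, the program type labels and the center values are hypothetical.

```python
import numpy as np


def top_n_types(centers: np.ndarray, program_types: list, n: int) -> list:
    """For each cluster center, return the n program types with the highest scores."""
    # Sort each row in descending order of score and keep the first n column indices.
    order = np.argsort(-centers, axis=1)[:, :n]
    return [[program_types[j] for j in row] for row in order]


# Hypothetical cluster centers over T = 4 program types.
types_demo = ["news", "sports", "drama", "kids"]
centers_topn = np.array([[0.60, 0.10, 0.20, 0.10],
                         [0.05, 0.70, 0.05, 0.20]])
print(top_n_types(centers_topn, types_demo, 2))  # [['news', 'drama'], ['sports', 'kids']]
```

The background system could then push programs of the listed types to the users assigned to each cluster.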
For convenience of explanation, embodiments of the present invention are described below using the K-means++ clustering algorithm based on mean shift (meanshift) as an example, but those skilled in the art will understand that the present invention is also applicable to other clustering algorithms.
Fig. 2 is a flow diagram of a data processing method 200 based on meanshift optimization according to an embodiment of the invention. The method begins at step 201 with the data collection module 101 collecting user behavior data in real time as a raw sample set X.
In step 202, the initialize cluster center module 102 initializes the cluster centers based on the number of clusters and the original sample set. Algorithms for initializing cluster centers include, but are not limited to, K-means++, K-means, Canopy, and the like.
In step 203, the data clustering module 103 determines, for each sample in the original sample set, whether there are two or more cluster centers that are closest to the sample at approximately the same distance; if so, it calculates the local density gradient direction of the sample using the non-parametric mean shift (meanshift) estimation algorithm, calculates the similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers, and assigns the sample to the cluster corresponding to the maximum similarity; otherwise, it assigns the sample to the cluster whose center is closest. The specific implementation steps of the algorithm are described in further detail below with reference to fig. 3.
In step 204, the data pushing module 104 pushes relevant data to each user group in real time according to the clustering result.
Fig. 3 shows a flow diagram of a meanshift-based clustering algorithm 300 according to one embodiment of the invention. The detailed steps of the algorithm 300 are as follows:
Step 1: input the number of clusters K and the original sample set X, i.e.
$$X = \{x_1, x_2, \dots, x_N\}.$$
Step 2: initialize K cluster centers using the K-means++ algorithm, i.e.
$$C = \{c_1, c_2, \dots, c_K\}.$$
Step 3: calculate the Euclidean distance from each original sample $x_i$ in the original sample set X to the K cluster centers, denoted $d(x_i, c_k)$, where $k = 1, 2, 3, \dots, K$. The Euclidean distance between points x and y in n-dimensional space is
$$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}.$$
Thus, for each sample $x_i$, a distance set is obtained:
$$D_i = \{d(x_i, c_1), d(x_i, c_2), \dots, d(x_i, c_K)\}.$$
Step 4: calculate the ratio of the distance from the original sample $x_i$ to each other cluster center $c_q$ to its distance from the cluster center
$$c_{\min},$$
to obtain the corresponding set of distance ratios
$$R_i = \left\{ r_{iq} = \frac{d(x_i, c_q)}{d(x_i, c_{\min})} \;\middle|\; c_q \neq c_{\min} \right\},$$
where the distance from the original sample $x_i$ to the cluster center
$$c_{\min}$$
is the smallest.
Step 5: if the ratios
$$r_{iq} \in R_i$$
are all greater than a threshold ε, the sample is assigned by the minimum-distance rule, i.e. sample $x_i$ is assigned to the cluster whose center is closest, where ε may be an empirically set threshold.
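Steps 3 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `near_tied_centers` is hypothetical, every ratio is at least 1 by construction so ε is assumed to be chosen slightly above 1, and a sample that coincides with a center is treated as having no ties.

```python
import numpy as np


def near_tied_centers(x: np.ndarray, centers: np.ndarray, eps: float):
    """Return the index of the closest center and the indices of centers almost as close."""
    # Step 3: Euclidean distances from the sample to all K centers.
    d = np.linalg.norm(centers - x, axis=1)
    nearest = int(np.argmin(d))
    if d[nearest] == 0:
        # Assumption: a sample lying exactly on a center has no competing centers.
        return nearest, []
    # Step 4: ratio of every distance to the minimum distance.
    ratios = d / d[nearest]
    # Step 5: an empty list means all ratios exceed eps, so the minimum-distance rule applies.
    tied = [q for q in range(len(centers)) if q != nearest and ratios[q] <= eps]
    return nearest, tied


centers_ratio = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
print(near_tied_centers(np.array([0.95, 0.0]), centers_ratio, eps=1.2))  # (0, [1])
print(near_tied_centers(np.array([0.10, 0.0]), centers_ratio, eps=1.2))  # (0, [])
```

In the first call the second center is only about 10% farther away than the closest one, so the sample is flagged for the mean-shift tie-break; in the second call the closest center wins outright.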
Step 6: if there is a collection
Figure BDA0002935756730000077
Then it indicates that there is
Figure BDA0002935756730000078
Cluster center and sample xiThe distance is nearest and approximate, and the sample x is judged by mean shift mean at the momentiTo which cluster class it belongs. The method comprises the following specific steps:
a) Taking sample $x_i$ as the center and h as the radius, construct a p-dimensional sphere, denoted $S_h(x_i)$.
b) Find the mean-shift vector of $x_i$, denoted $M_h(x_i)$:
$$M_h(x_i) = \frac{1}{Z} \sum_{x_j \in S_h(x_i)} (x_j - x_i),$$
where Z is the number of samples $x_j$ falling within $S_h(x_i)$.
Note that if Z is 0, $x_i$ is regarded as an outlier and removed.
c) Find the directions from sample $x_i$ to its closest cluster center
$$c_{\min}$$
and to the approximately equidistant cluster centers $\{c_v\}$, i.e.
$$\overrightarrow{x_i c_{\min}} = c_{\min} - x_i, \qquad \overrightarrow{x_i c_v} = c_v - x_i.$$
d) Calculate the cosine similarity between $M_h(x_i)$ and each of
$$\overrightarrow{x_i c_{\min}}, \quad \overrightarrow{x_i c_v},$$
and assign $x_i$ to the cluster with the maximum similarity. The cosine similarity algorithm evaluates the similarity of two vectors by the cosine of the angle between them; the greater the cosine value, the higher the similarity.
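Sub-steps a) to d) might be sketched in Python as below. The sketch assumes a flat (uniform) kernel, so the mean-shift vector is the plain average offset of the neighbours inside the sphere, and it assumes that Z counts neighbours other than the sample itself, since otherwise Z could never be 0. The names `mean_shift_vector`, `cosine` and `break_tie` are hypothetical.

```python
import numpy as np


def mean_shift_vector(x: np.ndarray, X: np.ndarray, h: float):
    """Average offset from x to the samples inside the sphere of radius h around x."""
    diffs = X - x
    dist = np.linalg.norm(diffs, axis=1)
    # Assumption: x itself (distance 0) is not counted among its neighbours.
    inside = (dist <= h) & (dist > 0)
    Z = int(inside.sum())
    if Z == 0:
        return None  # no neighbours: the text treats x as an outlier
    return diffs[inside].sum(axis=0) / Z


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def break_tie(x: np.ndarray, X: np.ndarray, candidates: dict, h: float):
    """Among candidate centers {index: vector}, pick the one best aligned with the shift."""
    m = mean_shift_vector(x, X, h)
    if m is None or not np.any(m):
        return None  # outlier, or no preferred direction (assumed left to the caller)
    sims = {k: cosine(m, c - x) for k, c in candidates.items()}
    return max(sims, key=sims.get)


# Sample x sits midway between two centers, but its neighbours lie mostly to the right.
X_ms = np.array([[1.0, 0.0], [1.5, 0.0], [1.6, 0.1], [1.4, -0.1], [0.7, 0.0]])
x_ms = np.array([1.0, 0.0])
cands = {0: np.array([0.0, 0.0]), 1: np.array([2.0, 0.0])}
print(break_tie(x_ms, X_ms, cands, h=0.7))  # 1
```

Here the local density increases toward the center at (2, 0), so the sample is assigned to cluster 1 even though both centers are equally far away.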
Step 7: after every sample $x_i$ in the original sample set X has been assigned, update the center of each cluster to obtain
$$C^{(1)} = \{c_1^{(1)}, c_2^{(1)}, \dots, c_K^{(1)}\},$$
and calculate the objective function of the whole clustering, denoted $E^{(1)}$, where the objective function is
$$E = \sum_{k=1}^{K} \sum_{x_i \in X_k} \lVert x_i - c_k \rVert^2,$$
with $X_k$ the set of samples assigned to the k-th cluster.
Step 8: when $E^{(t+1)}$ approximately equals $E^{(t)}$, the algorithm has converged and the clustering result is output; otherwise, Steps 3 to 7 are executed again.
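The outer loop of Steps 7 and 8 can be sketched as follows. The objective is assumed to be the sum of squared distances from each sample to its cluster center, which is the conventional K-means objective; the tolerance `tol`, the cap `max_iter`, keeping the old center for an empty cluster, and all function names are assumptions. The plain nearest-center rule `nearest_assign` stands in for the assignment of Steps 3 to 6, which would be supplied through the `assign` argument.

```python
import numpy as np


def update_centers(X, labels, centers):
    """Step 7: move each center to the mean of the samples assigned to it."""
    new = centers.astype(float)  # astype returns a copy
    for k in range(len(centers)):
        members = X[labels == k]
        if len(members):  # assumption: an empty cluster keeps its previous center
            new[k] = members.mean(axis=0)
    return new


def objective(X, labels, centers):
    """Assumed objective: sum of squared distances to the assigned cluster center."""
    return float(((X - centers[labels]) ** 2).sum())


def cluster(X, centers, assign, tol=1e-6, max_iter=100):
    """Repeat assignment and update until the objective stops changing (Step 8)."""
    prev = np.inf
    for _ in range(max_iter):
        labels = assign(X, centers)
        centers = update_centers(X, labels, centers)
        E = objective(X, labels, centers)
        if abs(prev - E) <= tol:
            break
        prev = E
    return labels, centers, E


def nearest_assign(X, centers):
    """Placeholder assignment: plain minimum-distance rule."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


X_loop = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels_loop, centers_loop, E_loop = cluster(
    X_loop, np.array([[0.0, 0.0], [5.0, 5.0]]), nearest_assign)
print(labels_loop, E_loop)
```

Passing the assignment rule as a function keeps the convergence loop independent of how near-tied samples are resolved.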
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the claimed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.

Claims (10)

1. A method of data processing, the method comprising:
collecting user behavior data in real time as an original sample set;
initializing a cluster center according to the number of clusters and the original sample set;
determining, for each sample in the original sample set, whether there are two or more cluster centers that are closest in distance to the sample,
if so, then
calculating the local density gradient direction of the sample using mean shift (meanshift),
calculating a similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers, and
assigning the sample to the cluster corresponding to the maximum similarity;
otherwise, assigning the sample to the cluster whose center is closest; and
pushing related data to each user group in real time according to the clustering result.
2. The method of claim 1, wherein determining whether there are two or more cluster centers closest in distance to the sample further comprises:
calculating the Euclidean distances from the sample to the K cluster centers to obtain a distance set for the sample, wherein K is the number of clusters;
calculating the ratio of the distance between the sample and each other cluster center $c_q$ to the smallest distance in said distance set, to obtain a corresponding set of distance ratios
$$R_i = \left\{ r_{iq} = \frac{d(x_i, c_q)}{d(x_i, c_{\min})} \;\middle|\; c_q \neq c_{\min} \right\},$$
where $x_i$ is the sample and $c_{\min}$ is its closest cluster center;
wherein if the set
$$V_i = \{ c_v \mid r_{iv} \le \varepsilon \}$$
is non-empty, it is determined that
$$|V_i| + 1$$
cluster centers (the closest center and those in $V_i$) are closest to the sample, where ε is an empirically set threshold.
3. The method of claim 1, wherein calculating the local density gradient direction of the sample using mean shift (meanshift) further comprises:
calculating the local mean-shift vector of the sample, wherein the vector points from the sample in the direction of greatest increase in the estimated density.
4. The method of claim 1, wherein calculating the similarity further comprises:
calculating a similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers using a cosine similarity algorithm, wherein the greater the cosine value, the higher the similarity.
5. The method of claim 1, wherein the initializing of the cluster centers is performed by the K-means++ clustering algorithm, wherein the distance between the respective cluster centers is as large as possible.
6. A data processing apparatus, characterized in that the apparatus comprises:
a data collection module configured to collect user behavior data in real-time as an original sample set;
an initializing cluster center module configured to initialize a cluster center according to a number of clusters and the original sample set;
a data clustering module configured to:
determine, for each sample in the original sample set, whether there are two or more cluster centers that are closest in distance to the sample,
if so,
calculate the local density gradient direction of the sample using mean shift (meanshift),
calculate a similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers, and
assign the sample to the cluster corresponding to the maximum similarity;
otherwise, assign the sample to the cluster whose center is closest; and
a data push module configured to push relevant data in real-time to respective user groups associated with respective class clusters based on the clustering results.
7. The apparatus of claim 6, wherein determining whether there are two or more cluster centers closest in distance to the sample further comprises:
calculating the Euclidean distances from the sample to the K cluster centers to obtain a distance set for the sample, wherein K is the number of clusters;
calculating the ratio of the distance between the sample and each other cluster center $c_q$ to the smallest distance in said distance set, to obtain a corresponding set of distance ratios
$$R_i = \left\{ r_{iq} = \frac{d(x_i, c_q)}{d(x_i, c_{\min})} \;\middle|\; c_q \neq c_{\min} \right\},$$
where $x_i$ is the sample and $c_{\min}$ is its closest cluster center;
wherein if the set
$$V_i = \{ c_v \mid r_{iv} \le \varepsilon \}$$
is non-empty, it is determined that
$$|V_i| + 1$$
cluster centers (the closest center and those in $V_i$) are closest to the sample, where ε is an empirically set threshold.
8. The apparatus of claim 6, wherein calculating the local density gradient direction of the sample using mean shift (meanshift) further comprises:
calculating the local mean-shift vector of the sample, wherein the vector points from the sample in the direction of greatest increase in the estimated density.
9. The apparatus of claim 6, wherein calculating the similarity further comprises:
calculating a similarity between the local density gradient direction of the sample and the direction of the sample toward each of the two or more cluster centers using a cosine similarity algorithm, wherein the greater the cosine value, the higher the similarity.
10. The apparatus of claim 6, wherein the initializing of the cluster centers is performed by the K-means++ clustering algorithm, wherein the distance between the respective cluster centers is as large as possible.
CN202110161944.7A 2021-02-05 2021-02-05 MEANSHIFT optimization-based data processing method and device Active CN113850281B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110161944.7A CN113850281B (en) 2021-02-05 2021-02-05 MEANSHIFT optimization-based data processing method and device
PCT/CN2021/136291 WO2022166380A1 (en) 2021-02-05 2021-12-08 Data processing method and apparatus based on meanshift optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110161944.7A CN113850281B (en) 2021-02-05 2021-02-05 MEANSHIFT optimization-based data processing method and device

Publications (2)

Publication Number Publication Date
CN113850281A (en) 2021-12-28
CN113850281B (en) 2024-03-12

Family

ID=78972859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110161944.7A Active CN113850281B (en) 2021-02-05 2021-02-05 MEANSHIFT optimization-based data processing method and device

Country Status (2)

Country Link
CN (1) CN113850281B (en)
WO (1) WO2022166380A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913423A (en) * 2022-05-25 2022-08-16 中国电建集团成都勘测设计研究院有限公司 Model training method and extraction method for surrounding rock fracture information
CN115563522A (en) * 2022-12-02 2023-01-03 湖南工商大学 Traffic data clustering method, device, equipment and medium
CN117808549A (en) * 2023-12-29 2024-04-02 深圳市中港星互联网科技有限公司 Product recommendation method for providing health degree solution based on enterprise data

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304776B (en) * 2023-03-21 2023-11-21 宁波送变电建设有限公司运维分公司 Power grid data value anomaly detection method and system based on k-Means algorithm
CN116628289B (en) * 2023-07-25 2023-12-01 泰能天然气有限公司 Heating system operation data processing method and strategy optimization system
CN117113118B (en) * 2023-10-19 2024-01-26 张家港长三角生物安全研究中心 Intelligent monitoring method and system for biological aerosol
CN117217501B (en) * 2023-11-09 2024-02-20 山东多科科技有限公司 Digital production planning and scheduling method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777126A (en) * 2010-02-10 2010-07-14 华中科技大学 Clustering method for multidimensional characteristic vectors
CN104008127A (en) * 2014-04-21 2014-08-27 中国电子科技集团公司第二十八研究所 Group identification method based on clustering algorithm
CN106779073A (en) * 2016-12-27 2017-05-31 西安石油大学 Media information classification method and device based on deep neural network
CN108985318A (en) * 2018-05-28 2018-12-11 中国地质大学(武汉) Global optimization K-means clustering method and system based on sample rate
CN110019563A (en) * 2018-08-09 2019-07-16 北京首钢自动化信息技术有限公司 Portrait modeling method and device based on multidimensional data
CN110134839A (en) * 2019-03-27 2019-08-16 平安科技(深圳)有限公司 Time series data characteristic processing method, apparatus and computer readable storage medium
CN110852370A (en) * 2019-11-06 2020-02-28 国网湖南省电力有限公司 Clustering algorithm-based large-industry user segmentation method
CN111967338A (en) * 2020-07-27 2020-11-20 广东电网有限责任公司广州供电局 Method and system for distinguishing partial discharge pulse interference signal based on mean shift clustering algorithm

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN102222234A (en) * 2011-07-14 2011-10-19 苏州两江科技有限公司 Image object extraction method based on mean shift and K-means clustering technology
US11977959B2 (en) * 2019-05-15 2024-05-07 EMC IP Holding Company LLC Data compression using nearest neighbor cluster
CN110441819B (en) * 2019-08-06 2020-10-27 五季数据科技(北京)有限公司 Earthquake first-motion wave automatic pickup method based on mean shift clustering analysis

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN114913423A (en) * 2022-05-25 2022-08-16 中国电建集团成都勘测设计研究院有限公司 Model training method and extraction method for surrounding rock fracture information
CN115563522A (en) * 2022-12-02 2023-01-03 湖南工商大学 Traffic data clustering method, device, equipment and medium
CN115563522B (en) * 2022-12-02 2023-04-07 湖南工商大学 Traffic data clustering method, device, equipment and medium
CN117808549A (en) * 2023-12-29 2024-04-02 深圳市中港星互联网科技有限公司 Product recommendation method for providing health degree solution based on enterprise data

Also Published As

Publication number Publication date
WO2022166380A1 (en) 2022-08-11
CN113850281B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN113850281B (en) MEANSHIFT optimization-based data processing method and device
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
Zhang et al. Flexmatch: Boosting semi-supervised learning with curriculum pseudo labeling
US10719780B2 (en) Efficient machine learning method
CN113326731B (en) Cross-domain pedestrian re-identification method based on momentum network guidance
CN109815801A (en) Face identification method and device based on deep learning
CN108647577A (en) Pedestrian re-identification model, method and system with adaptive hard example mining
CN105760888B (en) Neighborhood rough set ensemble learning method based on attribute clustering
CN103425996B (en) Parallel distributed large-scale image recognition method
CN111160407B (en) Deep learning target detection method and system
CN112699953B (en) Feature pyramid neural network architecture searching method based on multi-information path aggregation
Abdul Samadh et al. Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization
CN111125469B (en) User clustering method and device of social network and computer equipment
JP6897749B2 (en) Learning methods, learning systems, and learning programs
CN113076970A (en) Gaussian mixture model clustering machine learning method under deficiency condition
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN109299263A (en) File classification method, electronic equipment and computer program product
CN107783998A (en) Data processing method and device
CN115048539B (en) Social media data online retrieval method and system based on dynamic memory
CN111325276A (en) Image classification method and device, electronic equipment and computer-readable storage medium
CN117633597A (en) Resident peak-valley electricity utilization characteristic classification method and system based on self-adaptive spectral clustering
CN114781779A (en) Unsupervised energy consumption abnormity detection method and device and storage medium
CN117495891B (en) Point cloud edge detection method and device and electronic equipment
Hassan et al. Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization
CN109711439A (en) Density peak clustering method for large-scale tourist profile data using the Group algorithm to accelerate neighbor searching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 2022-01-27

Address after: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200072

Applicant after: Tianyi Digital Life Technology Co.,Ltd.

Address before: 201702 3rd floor, 158 Shuanglian Road, Qingpu District, Shanghai

Applicant before: Tianyi Smart Family Technology Co.,Ltd.

GR01 Patent grant