CN112070140A - Density clustering mark-like pattern recognition method based on dimension decomposition - Google Patents
- Publication number: CN112070140A (application CN202010904135.6A)
- Authority: CN (China)
- Prior art keywords: test data; core points
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
The invention discloses a density clustering class-label pattern recognition method based on dimension decomposition, which comprises: taking a core point matrix from the training data according to the cluster core point index set; taking the kth test data from the unmanned aerial vehicle test data matrix; finding the set of neighboring core points of the kth test data; analyzing that set to recognize the cluster class label of the kth test data; and traversing every test data in the unmanned aerial vehicle test data matrix. The method only needs the neighborhood radius as input, and is free of the burden of algorithm hyperparameter tuning. No modeling is needed and the algorithm overhead is small.
Description
Technical Field
The invention belongs to the field of pattern recognition, and particularly relates to a density clustering class-label pattern recognition method based on dimension decomposition.
Background
The density clustering algorithm DBSCAN can process data of arbitrary shape, automatically infer the number of clusters from the structure of the data, and automatically eliminate noise, among other advantages, and is therefore widely applied in many fields.
After DBSCAN performs cluster analysis on the original data (the training data), the training data are divided into several clusters and assigned different class labels, i.e., the data are grouped. In practical applications it is often necessary to determine to which group of the training data new data (the test data) belong, i.e., to recognize the cluster class label of the test data.
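The clustering stage that this recognition problem builds on can be sketched with scikit-learn (an illustrative sketch, not the patent's implementation; the synthetic data, `eps` value, and variable names are assumptions): DBSCAN's `core_sample_indices_` plays the role of the cluster core point index set C, and `labels_` supplies the cluster class labels.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs as stand-in training data.
X_train = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                     rng.normal(3.0, 0.1, (50, 2))])

eps = 0.5                            # neighborhood radius Eps
db = DBSCAN(eps=eps, min_samples=5).fit(X_train)

core_idx = db.core_sample_indices_   # cluster core point index set C
labels = db.labels_                  # cluster class labels (-1 = noise)
CX = X_train[core_idx]               # core point matrix CX taken from X

print(len(set(labels) - {-1}))       # number of clusters found
```

At recognition time, only `CX`, the core points' labels `labels[core_idx]`, and `eps` are needed; the full training set no longer has to be scanned.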
Common pattern recognition methods include those based on similarity (distance), on neural networks, and on machine learning. Similarity-based methods judge the class label from the spatial similarity between the test data and the training data, at a high computational cost. Neural-network-based methods require modeling, the model easily falls into local optima and overfitting, and the computational cost grows large as the data volume increases. Machine-learning-based methods likewise must learn a model of the relation between the training data and the class labels in order to recognize the class labels of the test data, and suffer from the following problems: 1) large computational overhead as the data volume increases; 2) a tendency to overfit or underfit; 3) the burden of algorithm hyperparameter tuning. Common machine-learning-based pattern recognition methods include decision trees, discriminant analysis, logistic regression, naive Bayes, support vector machines, nearest-neighbor classification, ensemble algorithms, and the like.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a density clustering class-label pattern recognition method based on dimension decomposition.
The method makes full use of the computational convenience brought by dimension decomposition: the algorithm first narrows the range of data to be processed by per-dimension search and combinational logic judgment based on dimension decomposition, and then precisely identifies the cluster class label. The total computational overhead is small and the accuracy is high.
The above object of the present invention is achieved by the following technical solutions:
A density clustering class-label pattern recognition method based on dimension decomposition comprises the following steps:

Step 1, input the training data X = {x1, x2, …, xm} of the clustered unmanned aerial vehicle, where m is the total number of training data and n is the dimensionality of the training data; input the cluster core point index set C and the neighborhood radius Eps; input the test data T = {t1, t2, …, tp} of the unmanned aerial vehicle, where p is the total number of test data and the dimensionality of the test data is n.

Step 2, extract the core point matrix CX from the training data X according to the cluster core point index set C.

Step 3, take the kth test data t_k = (t_k^1, t_k^2, …, t_k^n) from the unmanned aerial vehicle test data matrix and traverse all core points in the core point matrix CX; if every dimension value of the ith core point CX_i = (cx_i^1, cx_i^2, …, cx_i^n) satisfies

|cx_i^j - t_k^j| ≤ Eps, j = 1, 2, …, n,

then record the core point CX_i into the set N1 of neighboring core points of test data t_k.

Step 4, analyze the set N1 of neighboring core points of test data t_k and identify the cluster class label of test data t_k.

Step 5, repeat steps 3 to 4 to traverse every test data in the unmanned aerial vehicle test data matrix.
The step 4 comprises the following steps:
Step 4.1, if the set N1 of neighboring core points of test data t_k is empty, mark test data t_k as a noise point;

Step 4.2, if the set N1 of neighboring core points of test data t_k contains only one core point, denoted core point CX_r, further determine whether the following holds:

||CX_r - t_k||_2 ≤ Eps,

where ||·||_2 denotes the 2-norm operation; if it holds, then

L(t_k) = L(CX_r),

where L(CX_r) denotes the cluster class label of core point CX_r and L(t_k) is the cluster class label of test data t_k; if it does not hold, mark test data t_k as a noise point;

Step 4.3, if the set N1 of neighboring core points of test data t_k contains more than one core point, reject from N1 the core points whose Euclidean distance to test data t_k is greater than Eps, obtaining a new set N2 of neighboring core points, and analyze the new set N2 to identify the cluster class label of test data t_k.

Step 4.3 comprises the following steps:

Step 4.3.1, from the set N1 of neighboring core points of test data t_k, reject the core points whose Euclidean distance to test data t_k is greater than Eps, obtaining the new set N2 of neighboring core points;

Step 4.3.2, if the new set N2 of neighboring core points is empty, mark test data t_k as a noise point;

Step 4.3.3, if the new set N2 of neighboring core points contains only one core point, denoted core point CX_f, then

L(t_k) = L(CX_f),

where L(CX_f) denotes the cluster class label of core point CX_f and L(t_k) is the cluster class label of test data t_k;

Step 4.3.4, if the new set N2 of neighboring core points contains multiple core points and their cluster class labels are all the same, take that common cluster class label as the cluster class label L(t_k) of test data t_k;

Step 4.3.5, if the new set N2 of neighboring core points contains multiple core points and their cluster class labels differ, find in N2 the core point CX_z with the smallest Euclidean distance to test data t_k, and take the cluster class label L(CX_z) of core point CX_z as the cluster class label L(t_k) of test data t_k.
Compared with the prior art, the invention has the following advantages:
1. no additional parameters. The algorithm only needs to input the neighborhood radius Eps during DBSCAN clustering analysis, and does not need to input extra parameters, so that the trouble of excessive parameter adjustment of the algorithm is eliminated.
2. No modeling is required. The algorithm judges and identifies the test data class labels based on dimension decomposition and mathematical rules on the basis of the clustering principle of the deep research DBSCAN algorithm. The algorithm does not need to be modeled in advance by means of training data, and overfitting, under-fitting and falling into local optimal risks do not exist.
3. The algorithm overhead is small. The algorithm of the invention firstly decomposes multidimensional data, respectively executes search operation on each dimensionality, then reduces the range to be processed by combinational logic judgment and then runs distance operation with larger calculation cost, thus having remarkable calculation advantages compared with other pattern recognition methods based on similarity (distance), neural network and machine learning.
Drawings
FIG. 1 is unmanned aerial vehicle training data and class labels;
FIG. 2 is a diagram of unmanned aerial vehicle test data and true class labels;
fig. 3 shows the recognition results for the working conditions of the unmanned aerial vehicle test data, where fig. 3(a) shows the true class labels of the test data, fig. 3(b) the class labels from the weighted KNN algorithm, fig. 3(c) the class-label differences of the weighted KNN algorithm, fig. 3(d) the class labels from the algorithm of the invention, and fig. 3(e) the class-label differences of the algorithm of the invention.
Detailed Description
The present invention is described in further detail below with reference to an embodiment, to facilitate understanding and practice by those of ordinary skill in the art; it should be understood that the embodiment described here is illustrative and the invention is not limited thereto.
Example (b):
a density clustering mark-like pattern recognition method based on dimension decomposition comprises the following steps:
Step 1, input the training data X = {x1, x2, …, xm} of the clustered unmanned aerial vehicle and the corresponding training-data cluster class labels L(x1) ~ L(xm), where x1 ~ xm are the m training data, m is the total number of training data, and n is the dimensionality of the training data, as shown in fig. 1. Input the cluster core point index set C and the neighborhood radius Eps of the DBSCAN cluster analysis.

Input the test data T = {t1, t2, …, tp} of the unmanned aerial vehicle, where t1 ~ tp are the p test data, p is the total number of test data, and the dimensionality n of the test data is the same as that of the training data. Let L(t_k) denote the cluster class label to be identified for each test data t1 ~ tp. To evaluate the accuracy of the class labels identified by the algorithm relative to the actual class labels, the actual class labels of the test data are given, as shown in fig. 2.
Step 2, extract the core point matrix CX from the training data X according to the cluster core point index set C. Let A_C denote the total number of core points in the core point matrix CX and CX_i = (cx_i^1, cx_i^2, …, cx_i^n) the ith core point in CX, where cx_i^1 ~ cx_i^n are the n dimension values of core point CX_i. Let L(CX_i) denote the cluster class label of the ith core point, where i is the index of the core point in the core point matrix CX and ranges over [1, A_C].
Step 3, take the kth test data t_k = (t_k^1, t_k^2, …, t_k^n) from the test data matrix of the unmanned aerial vehicle, where k ranges over [1, p] and t_k^1 ~ t_k^n are the n dimension values of test data t_k. Traverse all core points in the core point matrix CX; if every dimension value of the ith core point CX_i satisfies

|cx_i^j - t_k^j| ≤ Eps, j = 1, 2, …, n,

then record the core point CX_i into the set N1 of neighboring core points of test data t_k, until all A_C core points have been traversed.
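The per-dimension test of step 3 can be sketched as a vectorized bounding-box filter (a minimal sketch; the function name and sample data are illustrative, not from the patent). It is safe as a prefilter because |cx_i^j - t_k^j| ≤ ||CX_i - t_k||_2 for every dimension j, so no true Eps-neighbor is discarded; it only over-includes candidates, which step 4.3.1 later rejects by the exact distance.

```python
import numpy as np

def neighbor_core_candidates(CX, t, eps):
    """Indices of core points passing the per-dimension filter (set N1)."""
    # Keep a core point only if every coordinate is within eps of t.
    mask = np.all(np.abs(CX - t) <= eps, axis=1)
    return np.flatnonzero(mask)

CX = np.array([[0.0, 0.0], [0.3, 0.2], [2.0, 2.0]])  # toy core point matrix
t = np.array([0.1, 0.1])                             # toy test point
print(neighbor_core_candidates(CX, t, eps=0.5))      # -> [0 1]
```

The comparisons are per-dimension and branch-free, so this pass is much cheaper than computing all Euclidean distances up front.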
Step 4.1, if the set N1 of neighboring core points of test data t_k is empty, mark test data t_k as a noise point.

Step 4.2, if the set N1 of neighboring core points of test data t_k contains only one core point, denoted core point CX_r, further determine whether the following holds:

||CX_r - t_k||_2 ≤ Eps,

where ||·||_2 denotes the 2-norm operation. If it holds, then

L(t_k) = L(CX_r),

where L(CX_r) denotes the cluster class label of core point CX_r and L(t_k) is the cluster class label of test data t_k. If it does not hold, mark test data t_k as a noise point.

Step 4.3, if the set N1 of neighboring core points of test data t_k contains more than one core point, identify the cluster class label of test data t_k by the following steps.

Step 4.3.1, from the set N1 of neighboring core points of test data t_k, reject the core points whose Euclidean distance to test data t_k is greater than Eps, obtaining the new set N2 of neighboring core points.

Step 4.3.2, if the new set N2 of neighboring core points is empty, mark test data t_k as a noise point.

Step 4.3.3, if the new set N2 of neighboring core points contains only one core point, denoted core point CX_f, then

L(t_k) = L(CX_f),

where L(CX_f) denotes the cluster class label of core point CX_f and L(t_k) is the cluster class label of test data t_k.

Step 4.3.4, if the new set N2 of neighboring core points still contains multiple (more than one) core points and their cluster class labels are all the same, take that common cluster class label as the cluster class label L(t_k) of test data t_k.

Step 4.3.5, if the new set N2 of neighboring core points still contains multiple (more than one) core points and their cluster class labels differ, find in N2 the core point CX_z with the smallest Euclidean distance to test data t_k, and take the cluster class label L(CX_z) of core point CX_z as the cluster class label L(t_k) of test data t_k.
Step 5, repeat steps 3 to 4 to traverse every test data in the unmanned aerial vehicle test data matrix.
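Steps 3 through 4.3.5 can be combined into one classification routine, sketched below (names such as `classify` and `NOISE` are illustrative, not from the patent; the single-neighbor distance check of step 4.2 and the rejection of step 4.3.1 are merged into one distance filter, which is behaviorally equivalent):

```python
import numpy as np

NOISE = -1  # label used for noise points

def classify(t, CX, cx_labels, eps):
    """Assign a cluster class label to test point t from core points CX."""
    # Step 3: per-dimension prefilter -> candidate set N1.
    n1 = np.flatnonzero(np.all(np.abs(CX - t) <= eps, axis=1))
    if n1.size == 0:                        # step 4.1: no candidates -> noise
        return NOISE
    d = np.linalg.norm(CX[n1] - t, axis=1)  # exact Euclidean distances
    keep = n1[d <= eps]                     # steps 4.2 / 4.3.1: enforce 2-norm <= Eps
    if keep.size == 0:                      # step 4.3.2: all rejected -> noise
        return NOISE
    labs = cx_labels[keep]
    if len(set(labs)) == 1:                 # steps 4.2 / 4.3.3 / 4.3.4: one label
        return labs[0]
    # Step 4.3.5: labels disagree -> take the Euclidean-nearest core point.
    nearest = keep[np.argmin(np.linalg.norm(CX[keep] - t, axis=1))]
    return cx_labels[nearest]

CX = np.array([[0.0, 0.0], [1.0, 0.0]])     # toy core points
cx_labels = np.array([0, 1])                # their cluster class labels
print(classify(np.array([0.1, 0.0]), CX, cx_labels, eps=0.5))  # -> 0
print(classify(np.array([5.0, 5.0]), CX, cx_labels, eps=0.5))  # -> -1
```

Step 5 is then just a loop (or a vectorized map) of `classify` over the rows of the test data matrix T.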
To verify the algorithm of the invention, the weighted KNN algorithm is used for comparison. The true class labels of the test data and the labels recognized by the two algorithms are plotted in fig. 3. As the figure shows, the classification accuracy of the proposed algorithm on the unmanned aerial vehicle test data classes is 100%, higher than the 97.58% recognition accuracy of the weighted KNN algorithm. In terms of running time, on the same platform (MATLAB 2019a, 64 GB memory, 3.2 GHz main frequency) the average time of the proposed algorithm over multiple runs is 0.004708 seconds, far below the 0.062163-second average of weighted KNN over multiple runs.
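A weighted-KNN baseline of the kind used above can be set up along these lines with scikit-learn (a sketch on synthetic stand-in data; the patent's experiments used MATLAB and the UAV dataset, so the accuracies and timings reported above are not reproduced here):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Two synthetic classes standing in for UAV working conditions.
X_train = np.vstack([rng.normal(0.0, 0.2, (100, 2)),
                     rng.normal(3.0, 0.2, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.vstack([rng.normal(0.0, 0.2, (20, 2)),
                    rng.normal(3.0, 0.2, (20, 2))])
y_test = np.array([0] * 20 + [1] * 20)

# weights="distance" gives the distance-weighted KNN variant.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_train, y_train)
acc = knn.score(X_test, y_test)
print(round(acc, 3))
```

Note that KNN must keep and search the full training set at prediction time, whereas the method of the invention only consults the core point matrix after the cheap per-dimension prefilter.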
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute alternatives, without departing from the spirit of the invention or the scope defined by the appended claims.
Claims (3)
1. A density clustering mark-like pattern recognition method based on dimension decomposition is characterized by comprising the following steps:
step 1, inputting the training data X = {x1, x2, …, xm} of the clustered unmanned aerial vehicle, where m is the total number of training data and n is the dimensionality of the training data; inputting the cluster core point index set C and the neighborhood radius Eps; inputting the test data T = {t1, t2, …, tp} of the unmanned aerial vehicle, where p is the total number of test data and the dimensionality of the test data is n;

step 2, extracting the core point matrix CX from the training data X according to the cluster core point index set C, where A_C is the total number of core points in the core point matrix CX, CX_i = (cx_i^1, cx_i^2, …, cx_i^n) represents the ith core point in CX with n dimension values cx_i^1 ~ cx_i^n, and i ranges over [1, A_C];

step 3, taking the kth test data t_k = (t_k^1, t_k^2, …, t_k^n) from the unmanned aerial vehicle test data matrix and traversing all core points in the core point matrix CX; if every dimension value of the ith core point CX_i satisfies

|cx_i^j - t_k^j| ≤ Eps, j = 1, 2, …, n,

then recording the core point CX_i into the set N1 of neighboring core points of test data t_k;

step 4, analyzing the set N1 of neighboring core points of test data t_k and identifying the cluster class label of test data t_k;

step 5, repeating steps 3 to 4 to traverse each test data in the unmanned aerial vehicle test data matrix.
2. The density clustering mark-like pattern recognition method based on dimension decomposition as claimed in claim 1, wherein step 4 comprises the following steps:

step 4.1, if the set N1 of neighboring core points of test data t_k is empty, marking test data t_k as a noise point;

step 4.2, if the set N1 of neighboring core points of test data t_k contains only one core point, denoted core point CX_r, further determining whether the following holds:

||CX_r - t_k||_2 ≤ Eps,

where ||·||_2 denotes the 2-norm operation; if it holds, then

L(t_k) = L(CX_r),

where L(CX_r) denotes the cluster class label of core point CX_r and L(t_k) is the cluster class label of test data t_k; if it does not hold, marking test data t_k as a noise point;

step 4.3, if the set N1 of neighboring core points of test data t_k contains more than one core point, rejecting from N1 the core points whose Euclidean distance to test data t_k is greater than Eps to obtain a new set N2 of neighboring core points, and analyzing the new set N2 to identify the cluster class label of test data t_k.
3. The density clustering mark-like pattern recognition method based on dimension decomposition as claimed in claim 1, wherein step 4.3 comprises the following steps:

step 4.3.1, from the set N1 of neighboring core points of test data t_k, rejecting the core points whose Euclidean distance to test data t_k is greater than Eps to obtain the new set N2 of neighboring core points;

step 4.3.2, if the new set N2 of neighboring core points is empty, marking test data t_k as a noise point;

step 4.3.3, if the new set N2 of neighboring core points contains only one core point, denoted core point CX_f, then

L(t_k) = L(CX_f),

where L(CX_f) denotes the cluster class label of core point CX_f and L(t_k) is the cluster class label of test data t_k;

step 4.3.4, if the new set N2 of neighboring core points contains multiple core points and their cluster class labels are all the same, taking that common cluster class label as the cluster class label L(t_k) of test data t_k;

step 4.3.5, if the new set N2 of neighboring core points contains multiple core points and their cluster class labels differ, finding in N2 the core point CX_z with the smallest Euclidean distance to test data t_k, and taking the cluster class label L(CX_z) of core point CX_z as the cluster class label L(t_k) of test data t_k.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010904135.6A CN112070140B (en) | 2020-09-01 | 2020-09-01 | Density clustering mark-like pattern recognition method based on dimension decomposition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112070140A true CN112070140A (en) | 2020-12-11 |
CN112070140B CN112070140B (en) | 2022-05-03 |
Family
ID=73666064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010904135.6A Active CN112070140B (en) | 2020-09-01 | 2020-09-01 | Density clustering mark-like pattern recognition method based on dimension decomposition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112070140B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110055212A1 (en) * | 2009-09-01 | 2011-03-03 | Cheng-Fa Tsai | Density-based data clustering method |
CN103714153A (en) * | 2013-12-26 | 2014-04-09 | 西安理工大学 | Density clustering method based on limited area data sampling |
WO2015127801A1 (en) * | 2014-02-28 | 2015-09-03 | 小米科技有限责任公司 | Clustering method, apparatus, and terminal device |
US20180150724A1 (en) * | 2016-11-30 | 2018-05-31 | Cylance Inc. | Clustering Analysis for Deduplication of Training Set Samples for Machine Learning Based Computer Threat Analysis |
CN110490088A (en) * | 2019-07-26 | 2019-11-22 | 北京工业大学 | DBSCAN Density Clustering method based on region growth method |
CN110942099A (en) * | 2019-11-29 | 2020-03-31 | 华侨大学 | Abnormal data identification and detection method of DBSCAN based on core point reservation |
Non-Patent Citations (2)
Title |
---|
GUO K, et al.: "UAV sensor fault detection u-", Sensors *
WANG Xiaolin, et al.: "A density clustering algorithm for large-scale two-dimensional point set data", Journal of Anhui University of Technology (Natural Science Edition) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966773A (en) * | 2021-03-24 | 2021-06-15 | 山西大学 | Unmanned aerial vehicle flight condition mode identification method and system |
CN112966773B (en) * | 2021-03-24 | 2022-05-31 | 山西大学 | Unmanned aerial vehicle flight condition mode identification method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112070140B (en) | 2022-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ibrahim et al. | Cluster representation of the structural description of images for effective classification | |
CN106055573B (en) | Shoe print image retrieval method and system under multi-instance learning framework | |
CN110942091A (en) | Semi-supervised few-sample image classification method for searching reliable abnormal data center | |
CN109543723B (en) | Robust image clustering method | |
CN111160750A (en) | Distribution network analysis and investment decision method based on association rule mining | |
CN106815362A (en) | One kind is based on KPCA multilist thumbnail Hash search methods | |
CN110097060A (en) | A kind of opener recognition methods towards trunk image | |
CN105930792A (en) | Human action classification method based on video local feature dictionary | |
CN112070140B (en) | Density clustering mark-like pattern recognition method based on dimension decomposition | |
Poojitha et al. | A collocation of IRIS flower using neural network clustering tool in MATLAB | |
CN111767273B (en) | Data intelligent detection method and device based on improved SOM algorithm | |
Hamza et al. | Incremental classification of invoice documents | |
CN112785015A (en) | Equipment fault diagnosis method based on case reasoning | |
CN112508363A (en) | Deep learning-based power information system state analysis method and device | |
CN106203469A (en) | A kind of figure sorting technique based on orderly pattern | |
CN115879046A (en) | Internet of things abnormal data detection method based on improved feature selection and hierarchical model | |
CN109214466A (en) | A kind of novel clustering algorithm based on density | |
CN112465253B (en) | Method and device for predicting links in urban road network | |
Cohen-Shapira et al. | TRIO: Task-agnostic dataset representation optimized for automatic algorithm selection | |
CN111402205B (en) | Mammary tumor data cleaning method based on multilayer perceptron | |
CN110533080B (en) | Fuzzy rule set-based breast cancer cell image classification method | |
Endo et al. | A clustering method using hierarchical self-organizing maps | |
Mohseni et al. | Outlier Detection in Test Samples using Standard Deviation and Unsupervised Training Set Selection | |
Liu et al. | An accurate method of determining attribute weights in distance-based classification algorithms | |
CN115269855B (en) | Paper fine-grained multi-label labeling method and device based on pre-training encoder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||