CN112070140A - Density clustering mark-like pattern recognition method based on dimension decomposition - Google Patents
- Publication number: CN112070140A (application CN202010904135.6A)
- Authority: CN (China)
- Prior art keywords: test data; core points
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
The invention discloses a density clustering class-label pattern recognition method based on dimension decomposition, which comprises: taking a core point matrix from the training data according to the cluster core point index set; taking the kth test data from the unmanned aerial vehicle test data matrix; finding the set of neighboring core points of the kth test data; analyzing that set to recognize the cluster class label of the kth test data; and traversing every test data in the unmanned aerial vehicle test data matrix. The method only needs the neighborhood radius as input, and is free of the burden of algorithm hyperparameter tuning. No modeling is needed and the algorithm overhead is small.
Description
Technical Field
The invention belongs to the field of pattern recognition, and particularly relates to a density clustering class-label pattern recognition method based on dimension decomposition.
Background
The density clustering algorithm DBSCAN can process data of arbitrary shape, automatically infer the number of clusters from the structure of the data, and automatically eliminate noise, among other advantages, and is therefore widely applied in many fields.
After DBSCAN performs cluster analysis on the original data (the training data), the training data are divided into several clusters and assigned different class labels, i.e., the data are grouped. In practical applications it is often necessary to determine to which group of the training data new data (the test data) belong, i.e., to recognize the cluster class label of the test data.
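The clustering stage that this recognition problem builds on can be sketched with scikit-learn (an illustrative sketch, not the patent's implementation; the synthetic data, `eps` value, and variable names are assumptions): DBSCAN's `core_sample_indices_` plays the role of the cluster core point index set C, and `labels_` supplies the cluster class labels.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs as stand-in training data.
X_train = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                     rng.normal(3.0, 0.1, (50, 2))])

eps = 0.5                            # neighborhood radius Eps
db = DBSCAN(eps=eps, min_samples=5).fit(X_train)

core_idx = db.core_sample_indices_   # cluster core point index set C
labels = db.labels_                  # cluster class labels (-1 = noise)
CX = X_train[core_idx]               # core point matrix CX taken from X

print(len(set(labels) - {-1}))       # number of clusters found
```

At recognition time, only `CX`, the core points' labels `labels[core_idx]`, and `eps` are needed; the full training set no longer has to be scanned.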
Common pattern recognition methods include those based on similarity (distance), on neural networks, and on machine learning. Similarity-based methods judge the class label from the spatial similarity between the test data and the training data, at a high computational cost. Neural-network-based methods require modeling, the model easily falls into local optima and overfitting, and the computational cost grows large as the data volume increases. Machine-learning-based methods likewise must learn a model of the relation between the training data and the class labels in order to recognize the class labels of the test data, and suffer from the following problems: 1) large computational overhead as the data volume increases; 2) a tendency to overfit or underfit; 3) the burden of algorithm hyperparameter tuning. Common machine-learning-based pattern recognition methods include decision trees, discriminant analysis, logistic regression, naive Bayes, support vector machines, nearest-neighbor classification, ensemble algorithms, and the like.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a density clustering class-label pattern recognition method based on dimension decomposition.
The method makes full use of the computational convenience brought by dimension decomposition: the algorithm first narrows the range of data to be processed by per-dimension search and combinational logic judgment based on dimension decomposition, and then precisely identifies the cluster class label. The total computational overhead is small and the accuracy is high.
The above object of the present invention is achieved by the following technical solutions:
A density clustering class-label pattern recognition method based on dimension decomposition comprises the following steps:

Step 1, input the training data X = {x1, x2, …, xm} of the clustered unmanned aerial vehicle, where m is the total number of training data and n is the dimensionality of the training data; input the cluster core point index set C and the neighborhood radius Eps; input the test data T = {t1, t2, …, tp} of the unmanned aerial vehicle, where p is the total number of test data and the dimensionality of the test data is n.

Step 2, extract the core point matrix CX from the training data X according to the cluster core point index set C.

Step 3, take the kth test data t_k = (t_k^1, t_k^2, …, t_k^n) from the unmanned aerial vehicle test data matrix and traverse all core points in the core point matrix CX; if every dimension value of the ith core point CX_i = (cx_i^1, cx_i^2, …, cx_i^n) satisfies

|cx_i^j - t_k^j| ≤ Eps, j = 1, 2, …, n,

then record the core point CX_i into the set N1 of neighboring core points of test data t_k.

Step 4, analyze the set N1 of neighboring core points of test data t_k and identify the cluster class label of test data t_k.

Step 5, repeat steps 3 to 4 to traverse every test data in the unmanned aerial vehicle test data matrix.
The step 4 comprises the following steps:
Step 4.1, if the set N1 of neighboring core points of test data t_k is empty, mark test data t_k as a noise point;

Step 4.2, if the set N1 of neighboring core points of test data t_k contains only one core point, denoted core point CX_r, further determine whether the following holds:

||CX_r - t_k||_2 ≤ Eps,

where ||·||_2 denotes the 2-norm operation; if it holds, then

L(t_k) = L(CX_r),

where L(CX_r) denotes the cluster class label of core point CX_r and L(t_k) is the cluster class label of test data t_k; if it does not hold, mark test data t_k as a noise point;

Step 4.3, if the set N1 of neighboring core points of test data t_k contains more than one core point, reject from N1 the core points whose Euclidean distance to test data t_k is greater than Eps, obtaining a new set N2 of neighboring core points, and analyze the new set N2 to identify the cluster class label of test data t_k.

Step 4.3 comprises the following steps:

Step 4.3.1, from the set N1 of neighboring core points of test data t_k, reject the core points whose Euclidean distance to test data t_k is greater than Eps, obtaining the new set N2 of neighboring core points;

Step 4.3.2, if the new set N2 of neighboring core points is empty, mark test data t_k as a noise point;

Step 4.3.3, if the new set N2 of neighboring core points contains only one core point, denoted core point CX_f, then

L(t_k) = L(CX_f),

where L(CX_f) denotes the cluster class label of core point CX_f and L(t_k) is the cluster class label of test data t_k;

Step 4.3.4, if the new set N2 of neighboring core points contains multiple core points and their cluster class labels are all the same, take that common cluster class label as the cluster class label L(t_k) of test data t_k;

Step 4.3.5, if the new set N2 of neighboring core points contains multiple core points and their cluster class labels differ, find in N2 the core point CX_z with the smallest Euclidean distance to test data t_k, and take the cluster class label L(CX_z) of core point CX_z as the cluster class label L(t_k) of test data t_k.
Compared with the prior art, the invention has the following advantages:
1. no additional parameters. The algorithm only needs to input the neighborhood radius Eps during DBSCAN clustering analysis, and does not need to input extra parameters, so that the trouble of excessive parameter adjustment of the algorithm is eliminated.
2. No modeling is required. The algorithm judges and identifies the test data class labels based on dimension decomposition and mathematical rules on the basis of the clustering principle of the deep research DBSCAN algorithm. The algorithm does not need to be modeled in advance by means of training data, and overfitting, under-fitting and falling into local optimal risks do not exist.
3. The algorithm overhead is small. The algorithm of the invention firstly decomposes multidimensional data, respectively executes search operation on each dimensionality, then reduces the range to be processed by combinational logic judgment and then runs distance operation with larger calculation cost, thus having remarkable calculation advantages compared with other pattern recognition methods based on similarity (distance), neural network and machine learning.
Drawings
FIG. 1 is unmanned aerial vehicle training data and class labels;
FIG. 2 is a diagram of unmanned aerial vehicle test data and true class labels;
fig. 3 shows the recognition results for the working conditions of the unmanned aerial vehicle test data, where fig. 3(a) shows the true class labels of the test data, fig. 3(b) the class labels from the weighted KNN algorithm, fig. 3(c) the class-label differences of the weighted KNN algorithm, fig. 3(d) the class labels from the algorithm of the invention, and fig. 3(e) the class-label differences of the algorithm of the invention.
Detailed Description
The present invention is described in further detail below with reference to an embodiment, to facilitate understanding and practice by those of ordinary skill in the art; it should be understood that the embodiment described here is illustrative and the invention is not limited thereto.
Example (b):
a density clustering mark-like pattern recognition method based on dimension decomposition comprises the following steps:
Step 1, input the training data X = {x1, x2, …, xm} of the clustered unmanned aerial vehicle and the corresponding training-data cluster class labels L(x1) ~ L(xm), where x1 ~ xm are the m training data, m is the total number of training data, and n is the dimensionality of the training data, as shown in fig. 1. Input the cluster core point index set C and the neighborhood radius Eps of the DBSCAN cluster analysis.

Input the test data T = {t1, t2, …, tp} of the unmanned aerial vehicle, where t1 ~ tp are the p test data, p is the total number of test data, and the dimensionality n of the test data is the same as that of the training data. Let L(t_k) denote the cluster class label to be identified for each test data t1 ~ tp. To evaluate the accuracy of the class labels identified by the algorithm relative to the actual class labels, the actual class labels of the test data are given, as shown in fig. 2.
Step 2, extract the core point matrix CX from the training data X according to the cluster core point index set C. Let A_C denote the total number of core points in the core point matrix CX and CX_i = (cx_i^1, cx_i^2, …, cx_i^n) the ith core point in CX, where cx_i^1 ~ cx_i^n are the n dimension values of core point CX_i. Let L(CX_i) denote the cluster class label of the ith core point, where i is the index of the core point in the core point matrix CX and ranges over [1, A_C].
Step 3, take the kth test data t_k = (t_k^1, t_k^2, …, t_k^n) from the test data matrix of the unmanned aerial vehicle, where k ranges over [1, p] and t_k^1 ~ t_k^n are the n dimension values of test data t_k. Traverse all core points in the core point matrix CX; if every dimension value of the ith core point CX_i satisfies

|cx_i^j - t_k^j| ≤ Eps, j = 1, 2, …, n,

then record the core point CX_i into the set N1 of neighboring core points of test data t_k, until all A_C core points have been traversed.
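The per-dimension test of step 3 can be sketched as a vectorized bounding-box filter (a minimal sketch; the function name and sample data are illustrative, not from the patent). It is safe as a prefilter because |cx_i^j - t_k^j| ≤ ||CX_i - t_k||_2 for every dimension j, so no true Eps-neighbor is discarded; it only over-includes candidates, which step 4.3.1 later rejects by the exact distance.

```python
import numpy as np

def neighbor_core_candidates(CX, t, eps):
    """Indices of core points passing the per-dimension filter (set N1)."""
    # Keep a core point only if every coordinate is within eps of t.
    mask = np.all(np.abs(CX - t) <= eps, axis=1)
    return np.flatnonzero(mask)

CX = np.array([[0.0, 0.0], [0.3, 0.2], [2.0, 2.0]])  # toy core point matrix
t = np.array([0.1, 0.1])                             # toy test point
print(neighbor_core_candidates(CX, t, eps=0.5))      # -> [0 1]
```

The comparisons are per-dimension and branch-free, so this pass is much cheaper than computing all Euclidean distances up front.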
Step 4.1, if the set N1 of neighboring core points of test data t_k is empty, mark test data t_k as a noise point.

Step 4.2, if the set N1 of neighboring core points of test data t_k contains only one core point, denoted core point CX_r, further determine whether the following holds:

||CX_r - t_k||_2 ≤ Eps,

where ||·||_2 denotes the 2-norm operation. If it holds, then

L(t_k) = L(CX_r),

where L(CX_r) denotes the cluster class label of core point CX_r and L(t_k) is the cluster class label of test data t_k. If it does not hold, mark test data t_k as a noise point.

Step 4.3, if the set N1 of neighboring core points of test data t_k contains more than one core point, identify the cluster class label of test data t_k by the following steps.

Step 4.3.1, from the set N1 of neighboring core points of test data t_k, reject the core points whose Euclidean distance to test data t_k is greater than Eps, obtaining the new set N2 of neighboring core points.

Step 4.3.2, if the new set N2 of neighboring core points is empty, mark test data t_k as a noise point.

Step 4.3.3, if the new set N2 of neighboring core points contains only one core point, denoted core point CX_f, then

L(t_k) = L(CX_f),

where L(CX_f) denotes the cluster class label of core point CX_f and L(t_k) is the cluster class label of test data t_k.

Step 4.3.4, if the new set N2 of neighboring core points still contains multiple (more than one) core points and their cluster class labels are all the same, take that common cluster class label as the cluster class label L(t_k) of test data t_k.

Step 4.3.5, if the new set N2 of neighboring core points still contains multiple (more than one) core points and their cluster class labels differ, find in N2 the core point CX_z with the smallest Euclidean distance to test data t_k, and take the cluster class label L(CX_z) of core point CX_z as the cluster class label L(t_k) of test data t_k.
Step 5, repeat steps 3 to 4 to traverse every test data in the unmanned aerial vehicle test data matrix.
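Steps 3 through 4.3.5 can be combined into one classification routine, sketched below (names such as `classify` and `NOISE` are illustrative, not from the patent; the single-neighbor distance check of step 4.2 and the rejection of step 4.3.1 are merged into one distance filter, which is behaviorally equivalent):

```python
import numpy as np

NOISE = -1  # label used for noise points

def classify(t, CX, cx_labels, eps):
    """Assign a cluster class label to test point t from core points CX."""
    # Step 3: per-dimension prefilter -> candidate set N1.
    n1 = np.flatnonzero(np.all(np.abs(CX - t) <= eps, axis=1))
    if n1.size == 0:                        # step 4.1: no candidates -> noise
        return NOISE
    d = np.linalg.norm(CX[n1] - t, axis=1)  # exact Euclidean distances
    keep = n1[d <= eps]                     # steps 4.2 / 4.3.1: enforce 2-norm <= Eps
    if keep.size == 0:                      # step 4.3.2: all rejected -> noise
        return NOISE
    labs = cx_labels[keep]
    if len(set(labs)) == 1:                 # steps 4.2 / 4.3.3 / 4.3.4: one label
        return labs[0]
    # Step 4.3.5: labels disagree -> take the Euclidean-nearest core point.
    nearest = keep[np.argmin(np.linalg.norm(CX[keep] - t, axis=1))]
    return cx_labels[nearest]

CX = np.array([[0.0, 0.0], [1.0, 0.0]])     # toy core points
cx_labels = np.array([0, 1])                # their cluster class labels
print(classify(np.array([0.1, 0.0]), CX, cx_labels, eps=0.5))  # -> 0
print(classify(np.array([5.0, 5.0]), CX, cx_labels, eps=0.5))  # -> -1
```

Step 5 is then just a loop (or a vectorized map) of `classify` over the rows of the test data matrix T.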
To verify the algorithm of the invention, the weighted KNN algorithm is used for comparison. The true class labels of the test data and the labels recognized by the two algorithms are plotted in fig. 3. As the figure shows, the classification accuracy of the proposed algorithm on the unmanned aerial vehicle test data classes is 100%, higher than the 97.58% recognition accuracy of the weighted KNN algorithm. In terms of running time, on the same platform (MATLAB 2019a, 64 GB memory, 3.2 GHz main frequency) the average time of the proposed algorithm over multiple runs is 0.004708 seconds, far below the 0.062163-second average of weighted KNN over multiple runs.
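A weighted-KNN baseline of the kind used above can be set up along these lines with scikit-learn (a sketch on synthetic stand-in data; the patent's experiments used MATLAB and the UAV dataset, so the accuracies and timings reported above are not reproduced here):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Two synthetic classes standing in for UAV working conditions.
X_train = np.vstack([rng.normal(0.0, 0.2, (100, 2)),
                     rng.normal(3.0, 0.2, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.vstack([rng.normal(0.0, 0.2, (20, 2)),
                    rng.normal(3.0, 0.2, (20, 2))])
y_test = np.array([0] * 20 + [1] * 20)

# weights="distance" gives the distance-weighted KNN variant.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_train, y_train)
acc = knn.score(X_test, y_test)
print(round(acc, 3))
```

Note that KNN must keep and search the full training set at prediction time, whereas the method of the invention only consults the core point matrix after the cheap per-dimension prefilter.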
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute alternatives, without departing from the spirit of the invention or the scope defined by the appended claims.
Claims (3)
1. A density clustering mark-like pattern recognition method based on dimension decomposition is characterized by comprising the following steps:
step 1, inputting the training data X = {x1, x2, …, xm} of the clustered unmanned aerial vehicle, where m is the total number of training data and n is the dimensionality of the training data; inputting the cluster core point index set C and the neighborhood radius Eps; inputting the test data T = {t1, t2, …, tp} of the unmanned aerial vehicle, where p is the total number of test data and the dimensionality of the test data is n;

step 2, extracting the core point matrix CX from the training data X according to the cluster core point index set C, where A_C is the total number of core points in the core point matrix CX, CX_i = (cx_i^1, cx_i^2, …, cx_i^n) represents the ith core point in CX with n dimension values cx_i^1 ~ cx_i^n, and i ranges over [1, A_C];

step 3, taking the kth test data t_k = (t_k^1, t_k^2, …, t_k^n) from the unmanned aerial vehicle test data matrix and traversing all core points in the core point matrix CX; if every dimension value of the ith core point CX_i satisfies

|cx_i^j - t_k^j| ≤ Eps, j = 1, 2, …, n,

then recording the core point CX_i into the set N1 of neighboring core points of test data t_k;

step 4, analyzing the set N1 of neighboring core points of test data t_k and identifying the cluster class label of test data t_k;

step 5, repeating steps 3 to 4 to traverse each test data in the unmanned aerial vehicle test data matrix.
2. The density clustering mark-like pattern recognition method based on dimension decomposition as claimed in claim 1, wherein step 4 comprises the following steps:

step 4.1, if the set N1 of neighboring core points of test data t_k is empty, marking test data t_k as a noise point;

step 4.2, if the set N1 of neighboring core points of test data t_k contains only one core point, denoted core point CX_r, further determining whether the following holds:

||CX_r - t_k||_2 ≤ Eps,

where ||·||_2 denotes the 2-norm operation; if it holds, then

L(t_k) = L(CX_r),

where L(CX_r) denotes the cluster class label of core point CX_r and L(t_k) is the cluster class label of test data t_k; if it does not hold, marking test data t_k as a noise point;

step 4.3, if the set N1 of neighboring core points of test data t_k contains more than one core point, rejecting from N1 the core points whose Euclidean distance to test data t_k is greater than Eps to obtain a new set N2 of neighboring core points, and analyzing the new set N2 to identify the cluster class label of test data t_k.
3. The density clustering mark-like pattern recognition method based on dimension decomposition as claimed in claim 1, wherein step 4.3 comprises the following steps:

step 4.3.1, from the set N1 of neighboring core points of test data t_k, rejecting the core points whose Euclidean distance to test data t_k is greater than Eps to obtain the new set N2 of neighboring core points;

step 4.3.2, if the new set N2 of neighboring core points is empty, marking test data t_k as a noise point;

step 4.3.3, if the new set N2 of neighboring core points contains only one core point, denoted core point CX_f, then

L(t_k) = L(CX_f),

where L(CX_f) denotes the cluster class label of core point CX_f and L(t_k) is the cluster class label of test data t_k;

step 4.3.4, if the new set N2 of neighboring core points contains multiple core points and their cluster class labels are all the same, taking that common cluster class label as the cluster class label L(t_k) of test data t_k;

step 4.3.5, if the new set N2 of neighboring core points contains multiple core points and their cluster class labels differ, finding in N2 the core point CX_z with the smallest Euclidean distance to test data t_k, and taking the cluster class label L(CX_z) of core point CX_z as the cluster class label L(t_k) of test data t_k.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010904135.6A CN112070140B (en) | 2020-09-01 | 2020-09-01 | Density clustering mark-like pattern recognition method based on dimension decomposition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112070140A true CN112070140A (en) | 2020-12-11 |
CN112070140B CN112070140B (en) | 2022-05-03 |
Family
ID=73666064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010904135.6A Active CN112070140B (en) | 2020-09-01 | 2020-09-01 | Density clustering mark-like pattern recognition method based on dimension decomposition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112070140B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110055212A1 (en) * | 2009-09-01 | 2011-03-03 | Cheng-Fa Tsai | Density-based data clustering method |
CN103714153A (en) * | 2013-12-26 | 2014-04-09 | 西安理工大学 | Density clustering method based on limited area data sampling |
WO2015127801A1 (en) * | 2014-02-28 | 2015-09-03 | 小米科技有限责任公司 | Clustering method, apparatus, and terminal device |
US20180150724A1 (en) * | 2016-11-30 | 2018-05-31 | Cylance Inc. | Clustering Analysis for Deduplication of Training Set Samples for Machine Learning Based Computer Threat Analysis |
CN110490088A (en) * | 2019-07-26 | 2019-11-22 | 北京工业大学 | DBSCAN Density Clustering method based on region growth method |
CN110942099A (en) * | 2019-11-29 | 2020-03-31 | 华侨大学 | Abnormal data identification and detection method of DBSCAN based on core point reservation |
Non-Patent Citations (2)
Title |
---|
GUO K, et al.: "UAV sensor fault detection u-", Sensors *
WANG Xiaolin, et al.: "A density clustering algorithm for large-scale two-dimensional point set data", Journal of Anhui University of Technology (Natural Science Edition) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966773A (en) * | 2021-03-24 | 2021-06-15 | 山西大学 | Unmanned aerial vehicle flight condition mode identification method and system |
CN112966773B (en) * | 2021-03-24 | 2022-05-31 | 山西大学 | Unmanned aerial vehicle flight condition mode identification method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112070140B (en) | 2022-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ibrahim et al. | Cluster representation of the structural description of images for effective classification | |
CN106055573B (en) | Shoe print image retrieval method and system under multi-instance learning framework | |
CN110942091A (en) | Semi-supervised few-sample image classification method for searching reliable abnormal data center | |
CN109543723B (en) | Robust image clustering method | |
CN111160750A (en) | Distribution network analysis and investment decision method based on association rule mining | |
CN106815362A (en) | One kind is based on KPCA multilist thumbnail Hash search methods | |
CN110097060A (en) | A kind of opener recognition methods towards trunk image | |
CN105930792A (en) | Human action classification method based on video local feature dictionary | |
CN112070140B (en) | Density clustering mark-like pattern recognition method based on dimension decomposition | |
Poojitha et al. | A collocation of IRIS flower using neural network clustering tool in MATLAB | |
CN111767273B (en) | Data intelligent detection method and device based on improved SOM algorithm | |
Hamza et al. | Incremental classification of invoice documents | |
CN112785015A (en) | Equipment fault diagnosis method based on case reasoning | |
CN112508363A (en) | Deep learning-based power information system state analysis method and device | |
CN106203469A (en) | A kind of figure sorting technique based on orderly pattern | |
CN115879046A (en) | Internet of things abnormal data detection method based on improved feature selection and hierarchical model | |
CN109214466A (en) | A kind of novel clustering algorithm based on density | |
CN112465253B (en) | Method and device for predicting links in urban road network | |
Cohen-Shapira et al. | TRIO: Task-agnostic dataset representation optimized for automatic algorithm selection | |
CN111402205B (en) | Mammary tumor data cleaning method based on multilayer perceptron | |
CN110533080B (en) | Fuzzy rule set-based breast cancer cell image classification method | |
Endo et al. | A clustering method using hierarchical self-organizing maps | |
Mohseni et al. | Outlier Detection in Test Samples using Standard Deviation and Unsupervised Training Set Selection | |
Liu et al. | An accurate method of determining attribute weights in distance-based classification algorithms | |
CN115269855B (en) | Paper fine-grained multi-label labeling method and device based on pre-training encoder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||