CN112966567B

CN112966567B - Coordinate positioning method and system based on PCA (principal component analysis), clustering and K nearest neighbor

Info

Publication number: CN112966567B
Application number: CN202110163315.8A
Authority: CN
Inventors: 周品艺
Original assignee: Shenzhen Pinzhi Information Technology Co ltd
Current assignee: Shenzhen Pinzhi Information Technology Co ltd
Priority date: 2021-02-05
Filing date: 2021-02-05
Publication date: 2021-12-10
Anticipated expiration: 2041-02-05
Also published as: CN112966567A

Abstract

The application relates to a coordinate positioning method, system, storage medium and terminal based on PCA (principal component analysis), clustering and K nearest neighbor. It addresses the problem that obtaining the position of the peak point of the energy waveform received by antennas requires a CPU (central processing unit) to perform a large amount of calculation, which is cumbersome. The method comprises the following steps: acquiring the data received by all antennas; performing principal component analysis (PCA) dimensionality reduction on the acquired data; on the premise of keeping the maximum variance in each dimension of the data, retaining the principal components with the largest information content and classifying the dimension-reduced data with the K-Means method; and calculating the physical position corresponding to each class, using K nearest neighbor in the query operation to remove noise points in the clusters. The method uses the energy data received by all antennas to calculate the position of the energy waveform peak point with a calculation error of less than 0.01 mm, and its time complexity is low, so that more than 500 calculations per second can be performed on an MCU.

Description

Coordinate positioning method and system based on PCA (principal component analysis), clustering and K nearest neighbor

Technical Field

The application relates to the technical field of coordinate positioning, in particular to a coordinate positioning method, a system, a storage medium and a terminal based on PCA (principal component analysis), clustering and K nearest neighbor.

Background

An antenna is a transducer that converts a guided wave propagating on a transmission line into an electromagnetic wave propagating in an unbounded medium (usually free space), or vice versa. It is the component of a radio device that transmits or receives electromagnetic waves. Engineering systems such as radio communication, broadcasting, television, radar, navigation, electronic countermeasures, remote sensing and radio astronomy all use electromagnetic waves to transmit information and rely on antennas to work. In addition, when energy is transferred by electromagnetic waves, the radiation of non-signal energy also requires antennas. Antennas are generally reciprocal: the same antenna can be used as both a transmitting antenna and a receiving antenna, and its basic characteristic parameters are the same for transmission and for reception. This is the reciprocity theorem for antennas.

Energy data received by existing antennas is often in different dimensions.

With respect to the above related art, the inventor considers it cumbersome that a CPU must perform a large amount of calculation whenever a worker needs to obtain the position of the peak point of the energy waveform received by the antennas.

Disclosure of Invention

In order to calculate the position of the energy waveform peak point with a calculation error of less than 0.01 mm and with low time complexity, so that more than 500 calculations per second can be performed on a microcontroller unit (MCU), the present application provides a coordinate positioning method, system, storage medium and terminal based on PCA (principal component analysis), clustering and K nearest neighbor, and adopts the following technical solutions:

in a first aspect, the present application provides a coordinate positioning method based on PCA and clustering and K nearest neighbor, which adopts the following technical scheme:

a coordinate positioning method based on PCA and clustering and K nearest neighbor comprises the following steps:

acquiring data received by all antennas;

carrying out Principal Component Analysis (PCA) dimensionality reduction on the acquired data received by all the antennas;

on the premise of keeping the maximum variance in each dimension of the data, projecting the original high-dimensional data into a low-dimensional space by finding a new vector basis, removing noise with small variance, retaining the principal components with the largest information content, and classifying the PCA dimension-reduced data with the K-Means method;

and calculating the physical position corresponding to each class, using K nearest neighbor in the query operation to remove noise points in the clusters.

Optionally, the data received by all the antennas includes training set data and test set data, where the training set is $X_{train} = \{(x_i, y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, n\}$ and the test set is $X_{test} = \{(x_i, y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, m\}$. Here i is the index of a group of antenna strength and position data, d is the total number of antennas, and $y_i$ is the real physical position.

Optionally, performing principal component analysis (PCA) dimensionality reduction on the acquired antenna data includes the following steps:

computing the eigenvalues and eigenvectors of the covariance matrix, using the PCA algorithm based on eigenvalue decomposition of the covariance matrix;

sorting the eigenvalues from large to small and taking the first z, where z is the smallest value that satisfies the selection condition with threshold t; taking $t = 0.01$ yields $z = 7$;

mapping the training set and the test set by $X' = PX$, where P is the eigenvector matrix formed from the eigenvectors of the z largest eigenvalues, to obtain a new training set $X_{train}' = \{(x_i', y_i') \in R_{[0,1]},\ i = 1, 2, 3, \dots, n\}$ and a new test set $X_{test}' = \{(x_i', y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, m\}$.

Optionally, the step of classifying the data by using the K-Means method is as follows:

defining a post-clustering error estimate $e(K)$, where t is the number of data in a cluster and the function is monotonically decreasing;

the $y_i$ in a cluster conform to the normal distribution $N(\mu, \sigma^2)$, for which the area over the horizontal-axis interval $(\mu - 2\sigma, \mu + 2\sigma)$ is 95%, so taking $e(K) = 0.005$ meets the requirement that the error is less than 0.01 mm and gives $K = 1500$. Each cluster represents one physical location, which yields the clustered set of $X_{train}'$, namely $X_{train}'' = \{(x_i'', y_i'') \in R_{[0,1]},\ i = 1, 2, 3, \dots, K\}$, where $x_i''$ is the center of the cluster and $y_i''$ is the physical location represented by the cluster.

Optionally, the step of calculating the physical position corresponding to each class and using K nearest neighbor in the query operation to remove noise points in the clusters is as follows:

taking the Q points in the set $X_{train}''$ with the smallest Euclidean distance to $(x_p', y_p')$, the physical position being the median of $y_i''$ over this set of Q points;

building a binary tree from $X_{train}''$ such that, for a node $x_e$ at level i, the ordering conditions on the left subtree and the right subtree are guaranteed;

traversing the binary tree during the query while maintaining a max-heap of size Q;

if the Euclidean distance from the traversed point to $(x_p', y_p')$ is smaller than the Euclidean distance from the heap top to $(x_p', y_p')$, deleting the heap top and inserting the traversed point;

if the point traversed at level i satisfies the condition that its distance bound is greater than the Euclidean distance between the heap top and $(x_p', y_p')$, the subtree is not traversed. The time complexity of traversing the tree is $O(\log_2 K)$ and the time complexity of maintaining the max-heap is $O(\log_2 Q)$, so the total time complexity of one query is $O(\log_2 K \cdot \log_2 Q)$.

In a second aspect, the present application provides a coordinate positioning system based on PCA and clustering and K nearest neighbor, which adopts the following technical solution:

a PCA and cluster and K-nearest neighbor based coordinate positioning system comprising a memory, a processor and a program stored on the memory and executable on the processor, the program being capable of being loaded and executed by the processor to implement a PCA and cluster and K-nearest neighbor based coordinate positioning method according to any of the preceding claims.

In a third aspect, the present application provides a computer storage medium, which adopts the following technical solutions:

a computer storage medium comprising a memory, a processor and a program stored on the memory and executable on the processor, the program being capable of being loaded and executed by the processor to implement a PCA and clustering and K-nearest neighbor based coordinate positioning method according to any of the preceding claims.

In a fourth aspect, the present application provides a terminal, which adopts the following technical solution:

a terminal comprising a memory, a processor and a program stored on the memory and executable on the processor, the program being capable of being loaded and executed by the processor to implement a PCA and clustering and K-nearest neighbor based coordinate positioning method according to any of the preceding claims.

In summary, the beneficial technical effects of the present application are as follows:

Principal component analysis is used to reduce the dimensionality of the original high-dimensional data, which reduces interference from useless information; the topological relations among the information of different antennas are then analyzed according to the clustering result; and finally K nearest neighbor is applied so that the position of the energy waveform peak point can be analyzed effectively.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.

Fig. 1 is a schematic step diagram of a PCA and clustering and K-nearest neighbor based coordinate positioning method according to an embodiment of the present application.

Fig. 2 is a detailed step diagram of step S200 in fig. 1.

Fig. 3 is a schematic diagram of the processing step of dimensionality reduction of the acquired antenna data by principal component analysis PCA mentioned in step S300 of fig. 1.

FIG. 4 is a schematic diagram of the step of classifying data by the K-Means method mentioned in step S400 of FIG. 1.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that all the directional indicators (such as up, down, left, right, front, and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.

In addition, the descriptions related to "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include a single feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.

In the present invention, unless otherwise expressly stated or limited, the terms "connected," "secured," and the like are to be construed broadly, and for example, "secured" may be a fixed connection, a removable connection, or an integral part; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

The present application is described in further detail below with reference to the attached drawings.

Referring to fig. 1, a PCA and clustering and K-nearest neighbor based coordinate positioning method disclosed in the present application includes steps S100 to S400.

In step S100, data received by all antennas is acquired.

Specifically, the acquisition of the data received by all the antennas in step S100 is mainly implemented with a high-precision stepper. The acquired data consists of a training set and a test set, where the training set is $X_{train} = \{(x_i, y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, n\}$ and the test set is $X_{test} = \{(x_i, y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, m\}$. Here i is the index of a group of antenna strength and position data, d is the total number of antennas, and $y_i$ is the real physical position.

Referring to fig. 2, in step S200, the acquired data received by all antennas is subjected to a principal component analysis PCA dimension reduction process.

Using features of too high a dimension would drown out the useful information in the data, so the expected effect would not be achieved. Therefore, principal component analysis (PCA) is used to perform a dimensionality-reduction projection of the preprocessed data. The idea of PCA dimensionality reduction is to project the original high-dimensional data into a low-dimensional space by finding a new vector basis, on the premise of keeping the maximum variance in each dimension of the data, thereby removing noise with small variance and retaining the principal components with the largest information content.
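The maximum-variance criterion can be written compactly. The block below is the standard textbook formulation of PCA, not a formula recovered from this document; the symbols C (sample covariance matrix of the centered data), $w_k$ (unit-length basis vectors) and $\lambda_k$ (eigenvalues) are introduced here only for illustration.

```latex
\begin{aligned}
w_1 &= \arg\max_{\|w\| = 1} \; w^{\top} C \, w, \\
C \, w_k &= \lambda_k \, w_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0, \\
\operatorname{Var}\!\left(w_k^{\top} x\right) &= \lambda_k .
\end{aligned}
```

Under this formulation, keeping the eigenvectors with the largest eigenvalues keeps the directions of largest variance, and the discarded directions are the low-variance ones that the text treats as noise.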

Referring to fig. 2, specifically, step S200 may be divided into steps S210 to S230.

In step S210, principal component analysis (PCA) is applied to the $x_i$ of the training set and of the test set respectively to perform dimensionality reduction: using the PCA algorithm based on eigenvalue decomposition of the covariance matrix, the eigenvalues and eigenvectors of the covariance matrix are computed.

In step S220, the eigenvalues are sorted from large to small and the first z are taken, where z is the smallest value that satisfies the selection condition with threshold t; taking $t = 0.01$ yields $z = 7$.

In step S230, the training set and the test set are mapped by $X' = PX$, where P is the eigenvector matrix formed from the eigenvectors of the z largest eigenvalues, to obtain a new training set $X_{train}' = \{(x_i', y_i') \in R_{[0,1]},\ i = 1, 2, 3, \dots, n\}$ and a new test set $X_{test}' = \{(x_i', y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, m\}$.
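The selection in step S220 and the mapping in step S230 can be sketched as follows. The exact selection condition is not reproduced in the text, so this sketch assumes one common rule: keep the fewest components for which the discarded share of the total eigenvalue sum is at most t. The function names `choose_z` and `project` and the sample eigenvalues are hypothetical, and the eigen-decomposition itself is assumed to have been done elsewhere.

```python
from typing import List, Sequence


def choose_z(eigenvalues: Sequence[float], t: float = 0.01) -> int:
    """Smallest z whose discarded eigenvalue share is at most t (assumed rule)."""
    ordered = sorted(eigenvalues, reverse=True)  # large to small, as in step S220
    total = sum(ordered)
    kept = 0.0
    for z, value in enumerate(ordered, start=1):
        kept += value
        if 1.0 - kept / total <= t:
            return z
    return len(ordered)


def project(P: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Compute x' = P x, where each row of P is one retained eigenvector."""
    return [sum(p_j * x_j for p_j, x_j in zip(row, x)) for row in P]


# Made-up eigenvalues for illustration only; they are not data from the document.
eigenvalues = [6.0, 3.0, 0.95, 0.03, 0.02]
z = choose_z(eigenvalues, t=0.01)
print(z)  # 3: the first three eigenvalues leave a share of 0.005 discarded
```

With the document's own data the same rule is reported to give z = 7; the value 3 here only reflects the made-up eigenvalues.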

In step S300, on the premise of keeping the maximum variance in each dimension of the data, the original high-dimensional data is projected into a low-dimensional space by finding a new vector basis, noise with small variance is removed, the principal components with the largest information content are retained, and the PCA dimension-reduced data is classified with the K-Means method.

The data after PCA dimensionality reduction is subjected to cluster analysis with the K-Means method. Owing to its excellent computing speed and classification performance, K-Means clustering is very widely used and is among the most common unsupervised clustering algorithms. The core of the K-Means algorithm is the iteration of the cluster centers: the cluster centers are initialized randomly according to a preset number of clusters; all samples are classified according to their distance to each center; the sum of errors from the samples inside each class to its center is calculated; the mean of the samples inside each class is taken as the new cluster center; and the iteration continues until the within-class error sum reaches its minimum range.

Specifically, the K-Means clustering algorithm used in step S300 classifies $X_{train}'$, and in each iteration the Euclidean distance is used as the weight of each group of data with respect to the cluster centers. The choice of K greatly affects the result: if K is too small, the accuracy after clustering cannot meet the requirement that the error be less than 0.01 mm; if K is too large, a large number of clusters consisting of noise points are generated.
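The iteration described above (assign every sample to its nearest center, then move each center to the mean of its members) can be sketched in plain Python. This is a minimal illustration of the standard Lloyd iteration, not the patented implementation; the names `k_means`, `initial_centers` and `seed` are hypothetical. The text initializes the centers randomly; the sketch also accepts explicit initial centers so that the example is reproducible.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance; it orders points the same way as the distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def k_means(points: Sequence[Point], k: int,
            initial_centers: Optional[Sequence[Point]] = None,
            max_iterations: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd iteration: nearest-center assignment followed by mean update."""
    if initial_centers is None:
        # Random initialization, as described in the text.
        centers = random.Random(seed).sample(list(points), k)
    else:
        centers = list(initial_centers)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center unchanged
        if new_centers == centers:
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return centers, labels


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
centers, labels = k_means(points, k=2, initial_centers=[(0.0, 0.0), (0.1, 0.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

K-Means can settle in a poor local optimum for unlucky initial centers, which is one reason the choice of K and of the initialization matters in practice.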

Referring to fig. 3, in detail, the classification of data by the K-Means method in step S300 can be divided into steps S3a0 and S3b0.

In step S3a0, a post-clustering error estimate $e(K)$ is defined, where t is the number of data in a cluster and the function is monotonically decreasing.

In step S3b0, the $y_i$ in a cluster conform to the normal distribution $N(\mu, \sigma^2)$, for which the area over the horizontal-axis interval $(\mu - 2\sigma, \mu + 2\sigma)$ is 95%, so taking $e(K) = 0.005$ meets the requirement that the error is less than 0.01 mm and gives $K = 1500$. Each cluster represents one physical location, which yields the clustered set of $X_{train}'$, namely $X_{train}'' = \{(x_i'', y_i'') \in R_{[0,1]},\ i = 1, 2, 3, \dots, K\}$, where $x_i''$ is the center of the cluster and $y_i''$ is the physical location represented by the cluster.
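The 95% figure used in this step can be checked numerically. The short sketch below evaluates the probability mass of a normal distribution within k standard deviations of the mean using the error function; the helper name `normal_mass_within` is hypothetical. One possible reading, stated here as an assumption because the defining formula of the error estimate is not reproduced in the text, is that an estimate of 0.005 plays the role of one standard deviation, so the two-sigma half-width is 2 × 0.005 = 0.01.

```python
import math


def normal_mass_within(k_sigma: float) -> float:
    """P(|Y - mu| < k_sigma * sigma) for a normally distributed Y."""
    return math.erf(k_sigma / math.sqrt(2.0))


coverage = normal_mass_within(2.0)
print(f"{coverage:.4f}")  # 0.9545, which the text rounds to 95%
```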

In step S400, the physical location corresponding to each class is calculated, and noise existing in the clusters is removed using K neighbors in the query operation.

Referring to fig. 4, specifically, step S400 may be divided into steps S4a0, S4b0, S4c0, S4d0 and S4e0, where steps S4d0 and S4e0 are parallel steps.

Specifically, the query operation calculates the physical position within the set $X_{train}''$; the test set $X_{test}$ is used here. Consider one group of data $(x_p', y_p')$ of $X_{test}'$, obtained after the principal component analysis transformation. Because $X_{train}''$ contains noise, K nearest neighbor is used to avoid the influence of the noise on the result: the Q points in $X_{train}''$ with the smallest Euclidean distance to $(x_p', y_p')$ are taken, and the physical position is the median of $y_i''$ over this set of Q points. To speed up obtaining this set of Q nearest points, a binary tree is built from $X_{train}''$ such that, for a node $x_e$ at level i, the ordering conditions on the left subtree and the right subtree are guaranteed.

The query process traverses the binary tree and maintains a max-heap of size Q. If the Euclidean distance from the traversed point to $(x_p', y_p')$ is smaller than the Euclidean distance from the heap top to $(x_p', y_p')$, the heap top is deleted and the traversed point is inserted. If the point traversed at level i satisfies the condition that its distance bound is greater than the Euclidean distance between the heap top and $(x_p', y_p')$, the subtree is not traversed. Here the time complexity of traversing the tree is $O(\log_2 K)$ and the time complexity of maintaining the max-heap is $O(\log_2 Q)$, so the total time complexity of one query is $O(\log_2 K \cdot \log_2 Q)$.
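The query described above can be sketched as a k-d tree search with a bounded max-heap. The subtree conditions are not reproduced in the text, so this sketch assumes the usual k-d tree layout: the node at depth i splits on coordinate i modulo the dimension, with smaller values on the left and larger values on the right. The names `Node`, `build`, `dist2` and `query` are hypothetical, and only the feature vector is used for distances.

```python
import heapq
import statistics
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]  # (cluster center x'', physical position y'')


class Node:
    def __init__(self, sample: Sample, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.sample, self.axis, self.left, self.right = sample, axis, left, right


def build(samples: Sequence[Sample], depth: int = 0) -> Optional[Node]:
    """Split on the median of one coordinate per level (assumed k-d tree layout)."""
    if not samples:
        return None
    axis = depth % len(samples[0][0])
    ordered = sorted(samples, key=lambda s: s[0][axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1), build(ordered[mid + 1:], depth + 1))


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; comparisons agree with the plain distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def query(root: Optional[Node], x: Sequence[float], q: int) -> float:
    """Median physical position of the q nearest cluster centers to x."""
    heap: List[Tuple[float, int, float]] = []  # (-distance, tie-breaker, y); top is farthest
    visited = 0

    def visit(node: Optional[Node]) -> None:
        nonlocal visited
        if node is None:
            return
        center, y = node.sample
        d = dist2(center, x)
        visited += 1
        if len(heap) < q:
            heapq.heappush(heap, (-d, visited, y))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, visited, y))  # drop heap top, insert this point
        diff = x[node.axis] - center[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Skip the far subtree when the splitting plane is farther than the heap top.
        if len(heap) < q or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return statistics.median(y for _, _, y in heap)


samples = [((i / 10.0, i * i / 100.0), float(i)) for i in range(11)]
tree = build(samples)
predicted = query(tree, (0.42, 0.2), q=3)
print(predicted)  # 4.0, the median of the positions 3.0, 4.0 and 5.0
```

Taking the median rather than the mean of the Q nearest positions is what suppresses isolated noisy clusters: a single outlying position among the neighbors does not move the result.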

An embodiment of the present invention provides a computer-readable storage medium, which includes a program capable of being loaded and executed by a processor to implement any one of the methods shown in fig. 1-4.

The computer-readable storage medium includes, for example: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Based on the same inventive concept, the embodiment of the present invention provides a coordinate positioning system based on PCA and clustering and K-nearest neighbor, which includes a memory and a processor, wherein the memory stores a program capable of running on the processor to implement any one of the methods shown in fig. 1 to fig. 4.

Based on the same inventive concept, embodiments of the present invention provide a terminal, which includes a program capable of being loaded and executed by a processor to implement any one of the methods shown in fig. 1 to 4.

It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a module or a unit may be divided into only one logical function, and may be implemented in other ways, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.

The embodiments of the present invention are preferred embodiments of the present invention, and the scope of the present invention is not limited by these embodiments, so: all equivalent changes made according to the structure, shape and principle of the invention are covered by the protection scope of the invention.

Claims

1. A coordinate positioning method based on PCA, clustering and K nearest neighbor is characterized by comprising the following steps:

acquiring data received by all antennas;

calculating the physical position corresponding to each class, and removing noise points existing in the clusters by using K neighbor in query operation, specifically comprising the following steps:

taking the Q points in the set $X_{train}''$ with the smallest Euclidean distance to $(x_p', y_p')$, the physical position being the median of $y_i''$ over this set of Q points;

Right subtree

if the Euclidean distance from the traversed point to $(x_p', y_p')$ is smaller than the Euclidean distance from the heap top to $(x_p', y_p')$, deleting the heap top and inserting the traversed point; if the point traversed at level i satisfies the condition that its distance bound is greater than the Euclidean distance between the heap top and $(x_p', y_p')$, the subtree is not traversed; the time complexity of traversing the tree is $O(\log_2 K)$ and the time complexity of maintaining the max-heap is $O(\log_2 Q)$, so the total time complexity of one query is $O(\log_2 K \cdot \log_2 Q)$;

acquiring the data received by all antennas, wherein the data comprises training set data and test set data; wherein the training set is $X_{train} = \{(x_i, y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, n\}$ and the test set is $X_{test} = \{(x_i, y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, m\}$; d is the total number of the antennas, and $y_i$ is the real physical position;

performing principal component analysis (PCA) dimensionality reduction on the acquired antenna data, comprising the following steps:

computing the eigenvalues and eigenvectors of the covariance matrix, using the PCA algorithm based on eigenvalue decomposition of the covariance matrix;

sorting the eigenvalues from large to small and taking the first z, where z is the smallest value that satisfies the selection condition with threshold t, wherein $t = 0.01$;

mapping the training set and the test set by $X' = PX$, where P is the eigenvector matrix formed from the eigenvectors of the z largest eigenvalues, to obtain a new training set $X_{train}' = \{(x_i', y_i') \in R_{[0,1]},\ i = 1, 2, 3, \dots, n\}$ and a new test set $X_{test}' = \{(x_i', y_i) \in R_{[0,1]},\ i = 1, 2, 3, \dots, m\}$.

2. The PCA and clustering and K nearest neighbor based coordinate locating method according to claim 1, wherein: the steps of classifying the data by the K-Means method are as follows:

defining a post-clustering error estimate $e(K)$, wherein t is the number of data in a cluster and the function is monotonically decreasing;

the $y_i$ in a cluster conform to the normal distribution $N(\mu, \sigma^2)$, for which the area over the horizontal-axis interval $(\mu - 2\sigma, \mu + 2\sigma)$ is 95%, so that taking $e(K) = 0.005$ meets the requirement that the error is less than 0.01 mm and gives $K = 1500$; each cluster represents one physical location, which yields the clustered set of $X_{train}'$, namely $X_{train}'' = \{(x_i'', y_i'') \in R_{[0,1]},\ i = 1, 2, 3, \dots, K\}$, wherein $x_i''$ is the center of the cluster and $y_i''$ is the physical location represented by the cluster.

3. A PCA and clustering and K-nearest neighbor based coordinate positioning system comprising a memory, a processor and a program stored on the memory and executable on the processor, the program being capable of being loaded and executed by the processor to implement a PCA and clustering and K-nearest neighbor based coordinate positioning method according to any of claims 1-2.

4. A computer storage medium comprising a memory, a processor and a program stored on the memory and executable on the processor, the program being capable of being loaded and executed by the processor to implement a PCA and clustering and K-nearest neighbor based coordinate positioning method according to any of the preceding claims 1-2.

5. A terminal comprising a memory, a processor and a program stored on the memory and executable on the processor, the program being capable of being loaded and executed by the processor to implement a PCA-based coordinate positioning method according to any of the claims 1-2.