CN111639712A

CN111639712A - Positioning method and system based on density peak clustering and gradient boosting algorithm

Info

Publication number: CN111639712A
Application number: CN202010482361.XA
Authority: CN
Inventors: 魏爱辉; 李卫宁; 张晖; 陈春海; 方士琦
Original assignee: Beidou Shurui Beijing Technology Co ltd
Current assignee: Beidou Shurui Beijing Technology Co ltd
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2020-09-08

Abstract

The application discloses a positioning method and system based on density peak clustering and a gradient boosting algorithm. The method comprises the following steps: setting a reference point in a preset area; collecting a first received signal strength value at the reference point, combining it with the position coordinates of the reference point to form a piece of fingerprint data, and storing the fingerprint data in a fingerprint database; acquiring a second received signal strength value of each access point, collected by the positioning terminal at the point to be positioned; comparing the second received signal strength value with the first received signal strength value to obtain a comparison result; and obtaining the position coordinates of the point to be positioned according to the comparison result. Compared with the prior art, the method has the following beneficial effects: it computes the position coordinates of the point to be positioned, works with a limited set of APs, significantly reduces indoor positioning error compared with the KNN algorithm, and improves positioning accuracy by 20%. In addition, fewer APs are required for the same positioning accuracy.

Description

Positioning method and system based on density peak clustering and gradient boosting algorithm

Technical Field

The application relates to the field of positioning in wireless local area networks, and in particular to a positioning method and a positioning system based on density peak clustering and a gradient boosting algorithm.

Background

Location fingerprinting is a positioning method based on scene analysis and matching. In an indoor environment, the signal strength received from the access points (APs) differs from one location to another, so the current location can be described by the received signal strength (RSS) values of the different APs at that point. The algorithm comprises two stages, an offline stage and an online stage.

In the offline stage, reference points are first planned in the area to be positioned; the RSS values of all APs are then collected at each reference point, combined with the position coordinates of that point to form a piece of fingerprint data, and stored in a fingerprint database.

In the online stage, the RSS values of each AP collected by the positioning terminal at the point to be positioned are compared with the data in the offline database through a matching algorithm to obtain the position coordinates of the point to be positioned.

However, in the prior art the received signal strength is easily affected by many factors, such as co-channel interference, a complex and changeable indoor environment, and moving crowds. The received signal strength therefore fluctuates severely, which seriously degrades indoor positioning accuracy and causes many problems for Wi-Fi indoor positioning based on the fingerprint positioning algorithm.

Disclosure of Invention

The main objective of the present application is to provide a positioning method based on density peak clustering and gradient boosting algorithm, which includes:

setting a reference point in a preset area;

collecting a first received signal strength value of the reference point, forming a piece of fingerprint data together with the position coordinate of the reference point, and storing the fingerprint data into a fingerprint database;

acquiring a second received signal strength value of each access point acquired by the positioning terminal at a point to be positioned;

comparing the second received signal strength value with the first received signal strength value, and obtaining a comparison result;

and obtaining the position coordinates of the to-be-positioned point according to the comparison result.

Optionally, collecting a first received signal strength value of the reference point, and forming a fingerprint data together with the position coordinates of the reference point, and storing the fingerprint data into a fingerprint database includes:

establishing a fingerprint database by adopting a density peak clustering algorithm;

calculating, through the density peak clustering algorithm, the Gaussian-kernel local density and the distance of each sample in the space to which the sample belongs;

screening on these two attributes and taking the samples for which both values are high as cluster centers;

and taking the sample value of the cluster center as the first received signal strength value of the reference point.

Optionally, screening on the two attributes and taking the samples with high values as cluster centers includes:

letting the received signal strength data from reference point k to signal receiving points l, m, n be the data obtained after preliminary filtering;

taking the data of point l as the X coordinate of a three-dimensional coordinate system, the data of point m as the Y coordinate, and the data of point n as the Z coordinate, with τ denoting the number of recordings, each sample being represented as a point distributed in the three-dimensional space S;

defining the Euclidean distance from sample i to sample j in space S as $d_{ij}$, the two attributes of sample i, its Gaussian-kernel local density and its distance, being denoted $\rho_i$ and $\delta_i$ respectively and defined as follows:

wherein $d_c$ is the truncation distance, and $\rho_i$ represents the number of samples in space S whose distance from sample i is smaller than $d_c$;

the total number of distances $d_{ij}$ in space S is $N = n(n-1)/2$, where n is the number of samples, and in ascending order they are:

$d_1 \le d_2 \le \dots \le d_N$; $d_c = d_{f(N\mu)}$

wherein $f(N\mu)$ denotes the integer obtained by rounding $N\mu$, and $\mu \in (0, 1)$ is a given parameter;

for each sample i in space S, the pair $(\rho_i, \delta_i)$, $i \in I_s$, is obtained from the above formulas; the pairs $\{(\rho_i, \delta_i)\}$ of all samples in the space are plotted on a two-dimensional plane with $\rho$ as the horizontal axis and $\delta$ as the vertical axis, and the samples satisfying $\max\{\rho_i \cdot \delta_i\}$, $i = 1, 2, \dots, n$, are selected as cluster centers.

Optionally, the positioning method based on density peak clustering and gradient boosting algorithm further includes: constructing a positioning model based on a gradient boosting algorithm, where gradient boosting is an algorithm that uses an additive model and continuously reduces the residuals produced during training in order to classify or regress data.

Optionally, constructing the localization model based on the gradient boosting algorithm comprises:

establishing a mapping between fingerprint data and physical position coordinates through the gradient boosting algorithm, taking the fingerprint database D as the input space, and initializing a classification and regression tree:

wherein $y_i$ represents the physical position coordinates of the i-th reference point; τ is the output value of a leaf node of the classification and regression tree, namely the predicted value of the position coordinates of the i-th reference point; N is the number of fingerprint samples; L is the loss function of the model;

using the negative gradient of the loss function evaluated at the current model, in place of the residual, as an approximation of the error to fit the next classification and regression tree, the negative gradient of the loss function at the classification and regression tree $F_{m-1}(x)$ being expressed as:

the input space of the m-th classification and regression tree is $\Phi = \{(x_1, \alpha_{m1}), (x_2, \alpha_{m2}), \dots, (x_N, \alpha_{mN})\}$;

calculating the output value of each child node through line search:

and fitting the next classification and regression tree by taking the negative gradient of the loss function at the current model as an approximation of the error, the final positioning model of the gradient boosting algorithm being:

wherein M represents the total number of classification and regression trees generated by iteration; during iteration each classification and regression tree is multiplied by a regularization coefficient $\lambda_m$, with value range (0, 1], to avoid over-fitting the training data; $\tau_{mj}$ is the output value of a leaf node of the classification and regression tree; and I is an indicator that takes the value 1 when $x \in \beta_{mj}$ and 0 otherwise.

Optionally, the loss function is a Huber loss function, which takes a quantile point σ as a boundary and applies two different strategies to reduce the influence of outliers on the prediction result: for outliers far from the center, an absolute value loss function is used, while for points near the center a mean square error loss function is used; the Huber loss function is as follows:

Optionally, comparing the second received signal strength value with the first received signal strength value and obtaining a comparison result includes:

comparing the second received signal strength value with the first received signal strength values in the fingerprint database in sequence;

and taking the position coordinates corresponding to the first received signal strength value closest to the second received signal strength value as the position coordinates of the point to be positioned.
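The comparison step can be illustrated with a minimal sketch. It assumes Euclidean distance in signal space as the closeness measure and a simple list of (RSS vector, coordinates) pairs as the fingerprint database; the function name `nearest_fingerprint` and the sample values are hypothetical and not taken from the application.

```python
import math

def nearest_fingerprint(database, observed_rss):
    """Return the coordinates of the stored fingerprint closest to observed_rss.

    database: list of (rss_vector, (x, y)) pairs, one per reference point.
    observed_rss: RSS values measured at the point to be positioned,
                  in the same AP order as the stored vectors.
    """
    best_coords, best_dist = None, math.inf
    for rss_vector, coords in database:
        # Euclidean distance in signal space (an assumed closeness measure).
        dist = math.dist(rss_vector, observed_rss)
        if dist < best_dist:
            best_coords, best_dist = coords, dist
    return best_coords

# Hypothetical fingerprint database: three reference points, three APs (dBm).
fingerprints = [
    ([-40.0, -70.0, -65.0], (0.0, 0.0)),
    ([-55.0, -50.0, -72.0], (5.0, 0.0)),
    ([-68.0, -62.0, -45.0], (5.0, 5.0)),
]
position = nearest_fingerprint(fingerprints, [-54.0, -52.0, -70.0])
```

Here the observed vector lies closest to the second stored fingerprint, so `position` is that reference point's coordinates.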

According to another aspect of the present application, there is also provided a positioning system based on density peak clustering and gradient boosting algorithm, including:

the reference point setting module is used for setting a reference point in a preset area;

the fingerprint database establishing module is used for collecting a first received signal strength value of the reference point, forming a piece of fingerprint data together with the position coordinate of the reference point and storing the fingerprint data into a fingerprint database;

the receiving signal strength value acquisition module is used for acquiring a second receiving signal strength value of each access point acquired by the positioning terminal at a point to be positioned;

a comparison module, configured to compare the second received signal strength value with the first received signal strength value, and obtain a comparison result;

and the coordinate acquisition module is used for acquiring the position coordinate of the to-be-positioned point according to the comparison result.

Optionally, the fingerprint database establishing module includes:

the fingerprint database establishing module is used for establishing a fingerprint database by adopting a density peak value clustering algorithm;

the Gaussian kernel local density and distance calculation module calculates the Gaussian kernel local density and distance of the sample in the space to which the sample belongs through a density peak value clustering algorithm;

the cluster center screening module is used for screening on these two attributes and taking the samples for which both values are high as cluster centers;

and the first received signal strength value determining module is used for taking the sample value of the cluster center as the first received signal strength value of the reference point.

Optionally, the cluster center screening module includes: a filtering module for letting the received signal strength data from reference point k to signal receiving points l, m, n be the data obtained after preliminary filtering;

a three-dimensional space S distribution module for taking the data of point l as the X coordinate of a three-dimensional coordinate system, the data of point m as the Y coordinate, and the data of point n as the Z coordinate, with τ denoting the number of recordings, each sample being represented as a point distributed in the three-dimensional space S;

a definition module for defining the Euclidean distance from sample i to sample j in space S as $d_{ij}$, the two attributes of sample i, its Gaussian-kernel local density and its distance, being denoted $\rho_i$ and $\delta_i$ respectively and defined as follows:

the total number of distances $d_{ij}$ in space S is $N = n(n-1)/2$, where n is the number of samples, and in ascending order they are:

$d_1 \le d_2 \le \dots \le d_N$; $d_c = d_{f(N\mu)}$

a drawing module for obtaining, for each sample i in space S, the pair $(\rho_i, \delta_i)$, $i \in I_s$, from the above formulas, plotting the pairs $\{(\rho_i, \delta_i)\}$ of all samples in the space on a two-dimensional plane with $\rho$ as the horizontal axis and $\delta$ as the vertical axis, and selecting the samples satisfying $\max\{\rho_i \cdot \delta_i\}$, $i = 1, 2, \dots, n$, as cluster centers.

Compared with the prior art, the method has the following beneficial effects:

The invention provides a Wi-Fi indoor positioning algorithm based on density peak clustering (DPC) and a gradient boosting decision tree (GBDT). The algorithm first uses DPC to extract the main positioning features from the original position fingerprints and to remove redundancy and noise; it then constructs a GBDT positioning model using a forward stagewise algorithm and an additive model to compute the position coordinates of the point to be positioned. With a limited set of APs, the algorithm significantly reduces indoor positioning error compared with the KNN (K-Nearest Neighbor) algorithm and improves positioning accuracy by 20%. In addition, fewer APs are required for the same positioning accuracy.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:

FIG. 1 is a schematic flow chart diagram of a positioning method based on density peak clustering and gradient boosting algorithm according to an embodiment of the present application;

FIG. 2 is a schematic flow chart diagram of a positioning method based on density peak clustering and gradient boosting according to an embodiment of the present application;

FIG. 3 is a sample three-dimensional space S-map according to one embodiment of the present application;

FIG. 4 is a schematic diagram of a cluster center according to one embodiment of the present application;

FIG. 5 is a comparison graph of different fingerprint data set positioning errors according to one embodiment of the present application;

FIG. 6 is a diagram illustrating maximum depths of classification regression trees, according to an embodiment of the present application;

FIG. 7 is a schematic diagram of a computer device according to one embodiment of the present application; and

FIG. 8 is a schematic diagram of a computer-readable storage medium according to one embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Referring to fig. 1-2, an embodiment of the present application provides a positioning method based on density peak clustering and gradient boosting algorithm, including:

S2: setting a reference point in a preset area;

S4: collecting a first received signal strength value of the reference point, forming a piece of fingerprint data together with the position coordinate of the reference point, and storing the fingerprint data into a fingerprint database;

S6: acquiring a second received signal strength value of each access point acquired by the positioning terminal at a point to be positioned;

S8: comparing the second received signal strength value with the first received signal strength value, and obtaining a comparison result;

S10: obtaining the position coordinates of the point to be positioned according to the comparison result.

In an embodiment of the application, a fingerprint database is established by adopting a density peak clustering algorithm; the Gaussian-kernel local density and the distance of each sample in the space to which the sample belongs are calculated through the density peak clustering algorithm; the samples for which both attributes are high are selected as cluster centers; and the sample value of the cluster center is taken as the first received signal strength value of the reference point.

In an embodiment of the present application, comparing the second received signal strength value with the first received signal strength value and obtaining the comparison result includes:

In an embodiment of the present application, screening on the two attributes and taking the samples with high values as cluster centers includes:

the Euclidean distance from sample i to sample j in space S is defined as $d_{ij}$, and the two attributes of sample i, its Gaussian-kernel local density and its distance, are denoted $\rho_i$ and $\delta_i$ respectively and are defined as follows:

the total number of distances $d_{ij}$ in space S is $N = n(n-1)/2$, where n is the number of samples, and in ascending order they are:

$d_1 \le d_2 \le \dots \le d_N$; $d_c = d_{f(N\mu)}$

1. Establishing the fingerprint database

Because multiple groups of signals interfere with one another, and RSS is affected by shielding, reflection, absorption and the like from indoor objects, part of the RSS data must first be discarded as a preliminary filtering step. To further improve the credibility of the data, the fingerprint database is built with the density peak clustering (DPC) algorithm. DPC computes, for each sample in the space to which it belongs, the Gaussian-kernel local density and the distance; the samples for which both attributes are high are selected as cluster centers, and a cluster center has the highest density in the sample space.

Referring to FIG. 3, assume that the RSS data from reference point k to signal receiving points l, m, n have been preliminarily filtered.

Taking the data of point l as the X coordinate of a three-dimensional coordinate system, the data of point m as the Y coordinate, and the data of point n as the Z coordinate, with τ denoting the number of recordings, each sample is represented as a point distributed in the three-dimensional space S, as shown in FIG. 3.

The Euclidean distance from sample i to sample j in space S is defined as $d_{ij}$, and the two attributes of sample i, its Gaussian-kernel local density and its distance, are denoted $\rho_i$ and $\delta_i$ respectively and are defined as follows.

Here $d_c$ is the truncation distance, and $\rho_i$ represents the number of samples in space S whose distance from sample i is smaller than $d_c$.

The total number of distances $d_{ij}$ in space S is $N = n(n-1)/2$, where n is the number of samples, and in ascending order they are:

$d_1 \le d_2 \le \dots \le d_N$; $d_c = d_{f(N\mu)}$

where $f(N\mu)$ denotes the integer obtained by rounding $N\mu$, and $\mu \in (0, 1)$ is a given parameter.
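For reference, the standard density peak clustering definitions that fit the symbols used in this section can be written as below. This is a sketch based on the commonly published Gaussian-kernel form of the algorithm, not a reproduction of the application's own equations; the symbol $\delta_i$ for the distance attribute is an assumed name.

```latex
\rho_i = \sum_{j \in I_s,\; j \neq i} \exp\!\left[-\left(\frac{d_{ij}}{d_c}\right)^{2}\right],
\qquad
\delta_i =
\begin{cases}
\min\limits_{j:\ \rho_j > \rho_i} d_{ij}, & \text{if some } \rho_j > \rho_i, \\[4pt]
\max\limits_{j \in I_s} d_{ij}, & \text{otherwise.}
\end{cases}
```

With the Gaussian kernel, $\rho_i$ behaves as a smooth count of the samples lying within roughly $d_c$ of sample i, which is consistent with the verbal description of $\rho_i$ given above.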

For each sample i in space S, the pair $(\rho_i, \delta_i)$, $i \in I_s$, is obtained from the above formulas. The pairs $\{(\rho_i, \delta_i)\}$ of all samples in the space are plotted on a two-dimensional plane with $\rho$ as the horizontal axis and $\delta$ as the vertical axis, and the samples satisfying $\max\{\rho_i \cdot \delta_i\}$, $i = 1, 2, \dots, n$, are selected as cluster centers. The cluster center samples selected in this way are stored in the fingerprint database and serve as the position fingerprint of the reference point.
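The selection procedure can be sketched in a few lines of Python. The sketch assumes the standard Gaussian-kernel density and nearest-higher-density distance from the density peak clustering literature; the function name `dpc_centers`, its parameters, and the sample readings are hypothetical, and at least two samples are required.

```python
import math

def dpc_centers(samples, mu=0.02, n_centers=1):
    """Pick cluster centers by density peak clustering (Gaussian kernel).

    samples: list of equal-length RSS tuples (points in space S).
    mu: fraction in (0, 1) used to pick the truncation distance d_c.
    n_centers: how many samples with the largest rho * delta to return.
    """
    n = len(samples)
    d = [[math.dist(a, b) for b in samples] for a in samples]

    # d_c is the f(N*mu)-th smallest of the N = n(n-1)/2 pairwise distances.
    pair_d = sorted(d[i][j] for i in range(n) for j in range(i + 1, n))
    k = min(max(round(len(pair_d) * mu), 1), len(pair_d))
    d_c = pair_d[k - 1]
    if d_c == 0:  # duplicate samples: fall back to the smallest positive distance
        d_c = next((x for x in pair_d if x > 0), 1.0)

    # Gaussian-kernel local density of every sample.
    rho = [sum(math.exp(-(d[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]

    # delta: distance to the nearest sample of higher density,
    # or the largest distance for the densest sample.
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))

    # Rank samples by rho * delta and keep the top ones as centers.
    order = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [samples[i] for i in order[:n_centers]]

# Hypothetical repeated RSS readings (dBm) at one reference point from
# three receiving points; the last reading is a disturbed outlier.
readings = [(-50, -60, -70), (-51, -60, -70), (-50, -61, -70),
            (-50, -60, -71), (-49, -60, -70), (-80, -90, -40)]
center = dpc_centers(readings, mu=0.3)[0]
```

The reading in the middle of the tight group has both the highest density and the largest distance attribute, so it is returned as the center, while the isolated outlier has near-zero density and is ignored.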

The fingerprint database after DPC has extracted the positioning features is $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$,

where $x_i = (rss_1, \dots, rss_p)$ is the new fingerprint data of the i-th reference point, $y_i$ is the physical position coordinate of the i-th reference point, and p is the feature dimension, which strongly affects the final prediction accuracy of the model. If p is too small, too few positioning features are introduced and the positioning accuracy is low; if p is too large, redundant information and noise in the fingerprint data are introduced and degrade the final position coordinate prediction. The feature dimension p retained after DPC feature extraction therefore needs to be tuned in the offline stage to find the optimal value.

In an embodiment of the present application, the positioning method based on density peak clustering and gradient boosting algorithm further includes: constructing a positioning model based on a gradient boosting algorithm, where gradient boosting is an algorithm that uses an additive model and continuously reduces the residuals produced during training in order to classify or regress data.

In an embodiment of the present application, constructing the positioning model based on the gradient boosting algorithm includes:

using the negative gradient of the loss function evaluated at the current model, in place of the residual, as an approximation of the error to fit the next classification and regression tree, the negative gradient of the loss function at the classification and regression tree $F_{m-1}(x)$ being expressed as:

calculating the output value of each child node through line search:

2. Building the GBDT positioning model

The positioning model is built on GBDT. GBDT uses an additive model (that is, a linear combination of basis functions) and continuously reduces the residuals produced during training in order to classify or regress data. Within the gradient boosting framework, base learners are trained iteratively and then fused by weighting, so that weak learners are combined into a strong learner. This improves the generalization ability and accuracy of the model and yields the final algorithm model.

GBDT is used to construct a mapping between fingerprint data and physical position coordinates, taking the fingerprint database D generated in step 1 as the input space and initializing a classification and regression tree:

In the formula, $y_i$ represents the physical position coordinates of the i-th reference point; τ is the output value of a leaf node of the classification and regression tree, namely the predicted value of the position coordinates of the i-th reference point; N is the number of fingerprint samples; and L is the loss function of the model.

The negative gradient of the loss function evaluated at the current model is used, in place of the residual, as an approximation of the error to fit the next classification and regression tree. The negative gradient of the loss function at the classification and regression tree $F_{m-1}(x)$ can be expressed as:

The input space of the m-th classification and regression tree is $\Phi = \{(x_1, \alpha_{m1}), (x_2, \alpha_{m2}), \dots, (x_N, \alpha_{mN})\}$.

To minimize the deviation of the predicted value output by the classification and regression tree, the invention uses line search to calculate the output value of each child node:

The next classification and regression tree is fitted using the negative gradient of the loss function at the current model as an approximation of the error, and the final GBDT positioning model is:

In the formula, M represents the total number of classification and regression trees generated by iteration; during iteration each classification and regression tree is multiplied by a regularization coefficient $\lambda_m$, with value range (0, 1], to avoid over-fitting the training data; $\tau_{mj}$ is the output value of a leaf node of the classification and regression tree; and I is an indicator that takes the value 1 when $x \in \beta_{mj}$ and 0 otherwise.
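The steps described in this section correspond to the usual gradient boosting recursion. Written in the notation of this section, a sketch would read as follows. The number of leaves $J_m$ of the m-th tree is an assumed symbol, $\beta_{mj}$ is read here as the region covered by the j-th leaf of the m-th tree, and the equations follow the commonly published form of the algorithm rather than the application's own figures.

```latex
\begin{aligned}
F_0(x) &= \arg\min_{\tau} \sum_{i=1}^{N} L(y_i, \tau), \\
\alpha_{mi} &= -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \qquad i = 1, \dots, N, \\
\tau_{mj} &= \arg\min_{\tau} \sum_{x_i \in \beta_{mj}} L\bigl(y_i, F_{m-1}(x_i) + \tau\bigr), \\
F_M(x) &= F_0(x) + \sum_{m=1}^{M} \lambda_m \sum_{j=1}^{J_m} \tau_{mj}\, I(x \in \beta_{mj}).
\end{aligned}
```

The first line is the initialization, the second the negative gradient used as the fitting target of the m-th tree, the third the line search for each leaf output, and the fourth the final additive model.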

In an embodiment of the present application, the loss function is a Huber loss function, which takes a quantile point σ as a boundary and applies two different strategies to reduce the influence of outliers on the prediction result: for outliers far from the center, an absolute value loss function is used, while for points near the center a mean square error loss function is used; the Huber loss function is as follows:

The choice of loss function strongly affects the prediction accuracy of the GBDT positioning model. The invention selects the Huber loss function, which takes a quantile point σ as a boundary and applies two different strategies to reduce the influence of outliers on the prediction result. For outliers far from the center, the absolute loss function (LAD) is used, while for points near the center the mean square error loss function (LS) is applied. Choosing the Huber loss function therefore significantly reduces the influence of outliers in the fingerprint database D on the positioning result. The Huber loss function is as follows:
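In its commonly published form, which matches the verbal description above, the Huber loss for a target $y$ and a prediction $F(x)$ can be sketched as below. Treating σ directly as the residual threshold is an assumption about how the quantile point enters the formula.

```latex
L\bigl(y, F(x)\bigr) =
\begin{cases}
\dfrac{1}{2}\bigl(y - F(x)\bigr)^{2}, & |y - F(x)| \le \sigma, \\[6pt]
\sigma\left(|y - F(x)| - \dfrac{\sigma}{2}\right), & |y - F(x)| > \sigma.
\end{cases}
```

The quadratic branch behaves like the mean square error for small residuals, and the linear branch grows like the absolute loss for large residuals, so a single outlying fingerprint contributes a bounded gradient.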

3. Experimental verification

Referring to FIG. 5, when the positioning feature dimension extracted by DPC is p = 4, the GBDT loss function is the Huber function, the learning rate is 0.02, the number of classification and regression trees is 126, and the maximum depth of a single tree is 4, the average positioning error reaches 1.51 m, which is significantly better than the KNN indoor positioning algorithm.
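To make the roles of the number of trees and the learning rate concrete, the following is a minimal sketch of gradient boosting with a Huber loss for a single coordinate. It is not the implementation used in the experiments: it uses depth-1 trees (stumps) instead of trees of depth 4, and it takes each leaf output as the mean negative gradient in the leaf instead of performing a line search. All function names and the training data are hypothetical.

```python
import statistics

def huber_negative_gradient(residual, sigma):
    """Negative gradient of the Huber loss with respect to the prediction."""
    if abs(residual) <= sigma:
        return residual
    return sigma if residual > 0 else -sigma

def fit_stump(xs, targets):
    """Fit a depth-1 regression tree by least squares; return a predictor."""
    best = None
    for f in range(len(xs[0])):
        # Candidate thresholds: every distinct feature value except the largest.
        for t in sorted({x[f] for x in xs})[:-1]:
            left = [g for x, g in zip(xs, targets) if x[f] <= t]
            right = [g for x, g in zip(xs, targets) if x[f] > t]
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = (sum((g - lm) ** 2 for g in left)
                   + sum((g - rm) ** 2 for g in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean
        mean = statistics.fmean(targets)
        return lambda x: mean
    _, f, t, lm, rm = best
    return lambda x: lm if x[f] <= t else rm

def fit_gbdt(xs, ys, n_trees=126, learning_rate=0.02, sigma=1.0):
    """Boost stumps on Huber negative gradients for one coordinate."""
    f0 = statistics.median(ys)  # robust initial constant model
    trees = []
    preds = [f0] * len(ys)
    for _ in range(n_trees):
        grads = [huber_negative_gradient(y - p, sigma)
                 for y, p in zip(ys, preds)]
        tree = fit_stump(xs, grads)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(tree(x) for tree in trees)

# Hypothetical training data: one AP's RSS (dBm) -> x coordinate (m).
rss = [[-40.0], [-50.0], [-60.0], [-70.0]]
x_coord = [0.0, 2.0, 4.0, 6.0]
model = fit_gbdt(rss, x_coord)
```

A two-dimensional position would need one such model per coordinate. With a small learning rate each tree moves the prediction only slightly, which is why a comparatively large number of trees is paired with it.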

GBDT is a boosting algorithm that iteratively generates classification and regression trees to gradually reduce prediction bias. To avoid overfitting the training samples and to improve the generalization ability of the model, the maximum depth of each tree needs to be limited during iteration. FIG. 6 shows how the accuracy of the positioning algorithm changes as the maximum depth of the classification and regression tree takes different values.

As FIG. 6 shows, the average positioning error of the algorithm first decreases as the maximum depth increases. At a maximum depth of 4 the curve reaches its turning point, with an average positioning error of 1.51 m; as the maximum depth increases further, the average positioning error gradually rises and the generalization ability of the model declines.

Referring to fig. 7, the present application further provides a computer device including a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor implements the method of any one of the above methods when executing the computer program.

Referring to FIG. 8, the present application further provides a computer-readable storage medium, which is a non-volatile readable storage medium having stored therein a computer program that, when executed by a processor, implements any of the methods described above.

The present application further provides a computer program product comprising computer readable code which, when executed by a computer device, causes the computer device to perform any of the methods described above.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A positioning method based on density peak clustering and a gradient boosting algorithm, characterized by comprising the following steps:

setting a reference point in a preset area;

acquiring a second received signal strength value of the positioning terminal at the access point;

2. The method of claim 1, wherein collecting the first received signal strength value of the reference point, forming a piece of fingerprint data together with the position coordinates of the reference point, and storing the fingerprint data into the fingerprint database comprises:

3. The method of claim 2, wherein screening on the two attributes and taking the samples with high values as cluster centers comprises:

the total number of distances $d_{ij}$ in space S is $N = n(n-1)/2$, where n is the number of samples, and in ascending order they are:

$d_1 \le d_2 \le \dots \le d_N$; $d_c = d_{f(N\mu)}$

4. The method of claim 3, wherein the method further comprises: constructing a positioning model based on a gradient boosting algorithm, where gradient boosting is an algorithm that uses an additive model and continuously reduces the residuals produced during training in order to classify or regress data.

5. The method of claim 4, wherein the constructing the localization model based on the gradient boosting algorithm comprises:

calculating the output value of each child node through line search:

wherein M represents the total number of classification and regression trees generated by iteration; during iteration each classification and regression tree is multiplied by a regularization coefficient $\lambda_m$, with value range (0, 1], to avoid over-fitting the training data; $\tau_{mj}$ is the output value of a leaf node of the classification and regression tree; and I is an indicator that takes the value 1 when $x \in \beta_{mj}$ and 0 otherwise.

6. The positioning method based on density peak clustering and gradient boosting algorithm according to claim 5, wherein the loss function is a Huber loss function, which takes a quantile point σ as a boundary and applies two different strategies to reduce the influence of outliers on the prediction result: for outliers far from the center, an absolute value loss function is used, while for points near the center a mean square error loss function is used; the Huber loss function is as follows:

7. The method of claim 1, wherein comparing the second received signal strength value with the first received signal strength value and obtaining the comparison result comprises:

8. A positioning system based on density peak clustering and gradient boosting algorithm is characterized by comprising:

9. The density peak clustering and gradient boosting algorithm-based positioning system according to claim 8, wherein the fingerprint database building module comprises:

10. The density peak clustering and gradient boosting algorithm-based localization system according to claim 9, wherein the cluster center filtering module comprises:

a filtering module for letting the received signal strength data from reference point k to signal receiving points l, m, n be the data obtained after preliminary filtering;

the total number of distances $d_{ij}$ in space S is $N = n(n-1)/2$, where n is the number of samples, and in ascending order they are:

$d_1 \le d_2 \le \dots \le d_N$; $d_c = d_{f(N\mu)}$