CN112800983B

CN112800983B - A non-line-of-sight signal recognition method based on random forest

Info

Publication number: CN112800983B
Application number: CN202110138933.7A
Authority: CN
Inventors: 杨小凤; 韦艳芳; 王强
Original assignee: Yulin Normal University
Current assignee: Yulin Normal University
Priority date: 2021-02-01
Filing date: 2021-02-01
Publication date: 2024-03-08
Anticipated expiration: 2041-02-01
Also published as: CN112800983A

Abstract

The invention discloses a random forest-based non-line-of-sight signal identification method, relates to the field of wireless positioning technology, and solves the technical problems of poor real-time performance and low accuracy of existing positioning methods. The method is: constructing a random forest model composed of several decision trees, measuring in sequence the received signal r_*(t) from the target node to each base station, inputting the characteristic parameters of r_*(t) into the random forest model, and obtaining the line-of-sight/non-line-of-sight identification results for the signals received between the target node and each base station. Signals identified as non-line-of-sight can be removed when positioning using received signals.

Description

Random forest-based non-line-of-sight signal identification method

Technical Field

The invention relates to the technical field of wireless positioning, in particular to a non-line-of-sight signal identification method based on a random forest.

Background

Wireless localization is widely applied in military, logistics, security, medical, search and rescue and other fields. Improving positioning accuracy in complex multipath, non-line-of-sight (NLOS) environments is a research hotspot of wireless positioning based on time of arrival (TOA), and one of its key problems is non-line-of-sight signal identification.

Non-line-of-sight signal identification means identifying and removing the non-line-of-sight measurements when redundant range measurements are available, and positioning using only the line-of-sight measurements. Currently, there are three main types of methods:

1) Methods based on range measurements: because the variance of the range measurements in a non-line-of-sight environment is larger than in a line-of-sight environment, the variance of the range measurements is compared with a preset threshold; the signal is judged to be non-line-of-sight if the variance exceeds the threshold and line-of-sight if it is below the threshold. This method is suitable for positioning static targets; when the target is moving, the variance of the range measurements increases and a line-of-sight signal is easily misjudged as non-line-of-sight;

2) Methods based on channel statistics: non-line-of-sight signals are identified by analyzing the cumulative distribution function of the amplitude of the channel impulse response, or by comparing the joint likelihood function value of the kurtosis, mean excess delay and root mean square delay of the channel with a threshold; however, the definition of the decision threshold is ambiguous;

3) Methods based on the geometric information of the environment: non-line-of-sight signals are identified using a ray tracing algorithm, which requires the layout of the environment to be known in advance.
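As a rough illustration of method 1), the sketch below applies a variance threshold to a short window of range measurements. The function name `is_nlos_by_variance`, the threshold value and the sample data are assumptions chosen for illustration; the source does not specify them.

```python
from statistics import pvariance

def is_nlos_by_variance(range_measurements, threshold):
    """Flag a link as NLOS when the variance of its range measurements exceeds the threshold."""
    return pvariance(range_measurements) > threshold

THRESHOLD = 0.5  # hypothetical value, in squared metres

static_los = [10.0, 10.1, 9.9, 10.0]    # static target, clear path
static_nlos = [10.0, 14.0, 9.0, 16.0]   # static target, obstructed path
moving_los = [10.0, 11.0, 12.0, 13.0]   # target moving away along a clear path

for name, window in [("static_los", static_los),
                     ("static_nlos", static_nlos),
                     ("moving_los", moving_los)]:
    print(name, is_nlos_by_variance(window, THRESHOLD))
```

The moving line-of-sight window is also flagged, because the motion itself inflates the variance. This is the misjudgment described for dynamic targets.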

The above algorithms are all statistical methods and share two disadvantages: first, the prior distribution of the samples generally needs to be known in advance and sufficient sample data must be collected, requirements that are often hard to meet in practice and that limit real-time performance; second, the joint probability distribution of the features required by the algorithms is sometimes difficult to determine and has poor stability.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a non-line-of-sight signal identification method based on a random forest, which has high real-time performance and good stability.

The technical scheme of the invention is as follows: a non-line-of-sight signal identification method based on random forests comprises the following steps:

S1, randomly selecting N training positions in a test area comprising 1 target node and A base stations, sequentially placing a training communication node at each training position, and measuring the received signal r_n(t) from the training communication node at each training position n ∈ N to the K-th base station, K = 1, 2, …, A;

Calculating 6 characteristic parameters of r_n(t), comprising:

energy parameter e = ∫ |r_n(t)|² dt,

maximum amplitude parameter r_max = max_t |r_n(t)|,

rise time parameter t_rise = min{t : |r_n(t)| ≥ 0.6 r_max} − min{t : |r_n(t)| ≥ 6σ_n},

average delay parameter τ_m,

root mean square delay parameter τ_r,

kurtosis parameter κ_s;

Combining the 6 characteristic parameters into a feature set F = {e, r_max, t_rise, τ_m, τ_r, κ_s}; the feature sets of the signals r_n(t) obtained at the N positions form a training input matrix; constructing a training output matrix whose entries y_n are the line-of-sight/non-line-of-sight labels of r_n(t), with y_n = 1 if r_n(t) is identified as line-of-sight and y_n = 0 if it is identified as non-line-of-sight, thereby obtaining the complete training set D;
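The feature extraction of step S1 can be sketched as follows for a uniformly sampled waveform. The energy, maximum amplitude and rise time follow the expressions given in the text. The formulas for τ_m, τ_r and κ_s are not reproduced in this text, so the sketch assumes commonly used definitions (delay moments of the normalised power profile |r(t)|²/e, and the sample kurtosis of |r(t)|). The function name `extract_features`, the sampling interval `dt` and the reading of σ_n as the noise standard deviation (`noise_sigma`) are likewise assumptions.

```python
import math

def extract_features(samples, dt, noise_sigma):
    """Return (e, r_max, t_rise, tau_m, tau_r, kappa_s) for one sampled waveform."""
    mag = [abs(x) for x in samples]
    times = [i * dt for i in range(len(mag))]
    r_max = max(mag)
    if r_max == 0 or r_max < 6 * noise_sigma:
        raise ValueError("waveform never reaches the 6*sigma_n threshold")

    # Energy: Riemann-sum approximation of the integral of |r(t)|^2.
    energy = sum(m * m for m in mag) * dt

    # Rise time: first crossing of 0.6*r_max minus first crossing of 6*sigma_n.
    t_high = next(t for t, m in zip(times, mag) if m >= 0.6 * r_max)
    t_low = next(t for t, m in zip(times, mag) if m >= 6 * noise_sigma)
    t_rise = t_high - t_low

    # Assumed definitions: moments of the normalised power profile psi(t).
    psi = [m * m / energy for m in mag]
    tau_m = sum(t * p for t, p in zip(times, psi)) * dt
    tau_r = math.sqrt(sum((t - tau_m) ** 2 * p for t, p in zip(times, psi)) * dt)

    # Assumed definition: sample kurtosis of the amplitude |r(t)|.
    mean = sum(mag) / len(mag)
    var = sum((m - mean) ** 2 for m in mag) / len(mag)
    kappa_s = sum((m - mean) ** 4 for m in mag) / (len(mag) * var ** 2) if var > 0 else 0.0

    return energy, r_max, t_rise, tau_m, tau_r, kappa_s

# Hypothetical waveform: a symmetric pulse sampled at unit intervals.
features = extract_features([0.0, 0.0, 1.0, 4.0, 10.0, 4.0, 1.0, 0.0], dt=1.0, noise_sigma=0.1)
print(features)
```

Stacking one such tuple per training position gives the rows of the training input matrix, and the corresponding labels y_n give the training output.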

S2, measuring the received signal r_*(t) from the target node to the base station, and calculating the 6 characteristic parameters of r_*(t);

S3, constructing a random forest model formed by a plurality of decision trees, wherein for each decision tree: sampling N times with replacement from the complete training set D to form the training set D′ of the decision tree; randomly selecting M characteristic parameters from the feature set F as the candidate features of the decision tree, where M < 6; calculating the Gini index of each candidate feature; and taking the candidate features, in ascending order of Gini index, as the splitting features of the root node, the intermediate nodes and the leaf nodes of the decision tree, wherein Gini(D′, f) is the Gini index obtained when the training set D′ is divided according to candidate feature f into the two classes D₁ (line-of-sight) and D₂ (non-line-of-sight), Gini(D_i) is the Gini index of class D_i (i = 1, 2), |D₁|, |D₂| and |D′| are the numbers of samples in the sets D₁, D₂ and D′ respectively, and p_i is the probability that class D_i is classified correctly or incorrectly;
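The Gini formula itself is not reproduced in this text. A minimal sketch, assuming the standard weighted form Gini(D′, f) = (|D₁|/|D′|)·Gini(D₁) + (|D₂|/|D′|)·Gini(D₂) with the binary impurity Gini(D_i) = 2·p_i·(1 − p_i), is given below. The threshold-based split and the names `gini`, `gini_split`, `best_gini` and `rank_features` are hypothetical.

```python
def gini(labels):
    """Binary Gini impurity 2p(1-p), where p is the share of label 1 (line-of-sight)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def gini_split(values, labels, threshold):
    """Weighted Gini index after dividing the samples at `threshold` into D1 and D2."""
    d1 = [y for v, y in zip(values, labels) if v <= threshold]
    d2 = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

def best_gini(values, labels):
    """Smallest weighted Gini index over all thresholds taken from the observed values."""
    return min(gini_split(values, labels, t) for t in set(values))

def rank_features(columns, labels):
    """Sort candidate feature names by ascending Gini index (best splitter first)."""
    return sorted(columns, key=lambda name: best_gini(columns[name], labels))

# Hypothetical training data: 1 = line-of-sight, 0 = non-line-of-sight.
labels = [1, 1, 0, 0]
columns = {
    "kappa_s": [1.0, 2.0, 1.0, 2.0],  # uninformative in this toy example
    "e": [5.0, 4.0, 1.0, 2.0],        # separates the two classes perfectly
}
print(rank_features(columns, labels))
```

Under these assumptions the feature with the smallest Gini index comes first in the ranking and would be used at the root node.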

S4, inputting the characteristic parameters of the signal r_*(t) received from the target node into the random forest model; each decision tree compares them with the split feature value interval of each of its nodes and outputs a line-of-sight/non-line-of-sight class, and the majority of the decisions of all decision trees is taken as the final line-of-sight/non-line-of-sight identification result for the base station;

S5, repeating steps S1-S4 for the remaining base stations in the test area to obtain the line-of-sight/non-line-of-sight identification results for the signals received between the target node and each base station, and removing the signals identified as non-line-of-sight when the received signals are used for positioning.

As a further improvement, A ≥ 3.

Advantageous effects

Compared with the prior art, the invention has the advantages that:

The method of the invention treats non-line-of-sight signal identification as a two-class line-of-sight/non-line-of-sight classification problem and solves it with a machine learning method, namely a random forest. A random forest is a high-accuracy classifier comprising a plurality of decision trees, whose output class is determined by the majority of the classes output by the individual decision trees.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The invention will be further described with reference to the drawing and specific embodiments.

Referring to fig. 1, a non-line-of-sight signal identification method based on random forest includes the following steps:

S1, randomly selecting N training positions in a test area comprising 1 target node and A base stations, sequentially placing a training communication node at each training position, and measuring the received signal r_n(t) from the training communication node at each training position n ∈ N to the K-th base station, A ≥ 3, K = 1, 2, …, A;

Calculating 6 characteristic parameters of r_n(t), comprising:

energy parameter e = ∫ |r_n(t)|² dt,

maximum amplitude parameter r_max = max_t |r_n(t)|,

rise time parameter t_rise = min{t : |r_n(t)| ≥ 0.6 r_max} − min{t : |r_n(t)| ≥ 6σ_n},

average delay parameter τ_m,

root mean square delay parameter τ_r,

kurtosis parameter κ_s;

Combining the 6 characteristic parameters into a feature set F = {e, r_max, t_rise, τ_m, τ_r, κ_s}; the feature sets of the signals r_n(t) obtained at the N positions form a training input matrix; constructing a training output matrix whose entries y_n are the line-of-sight/non-line-of-sight labels of r_n(t), with y_n = 1 if r_n(t) is identified as line-of-sight and y_n = 0 if it is identified as non-line-of-sight, thereby obtaining the complete training set D;

S2, measuring the received signal r_*(t) from the target node to the base station, and calculating the 6 characteristic parameters of r_*(t);

S3, constructing a random forest model formed by a plurality of decision trees, wherein for each decision tree: sampling N times with replacement from the complete training set D to form the training set D′ of the decision tree; randomly selecting M characteristic parameters from the feature set F as the candidate features of the decision tree, where M < 6; calculating the Gini index of each candidate feature; and taking the candidate features, in ascending order of Gini index, as the splitting features of the root node, the intermediate nodes and the leaf nodes of the decision tree;
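The two random draws made for each decision tree in step S3 can be sketched as follows, assuming the training set is held as a list of samples. The helper name `draw_tree_inputs` and the string names used for the six features are hypothetical.

```python
import random

FEATURES = ["e", "r_max", "t_rise", "tau_m", "tau_r", "kappa_s"]

def draw_tree_inputs(training_set, m, rng):
    """Return (bootstrap sample D', candidate features) for one decision tree."""
    if not 0 < m < len(FEATURES):
        raise ValueError("M must satisfy 0 < M < 6")
    n = len(training_set)
    # Sample N times with replacement: some samples repeat, others are left out.
    bootstrap = [training_set[rng.randrange(n)] for _ in range(n)]
    # Pick M distinct candidate features at random.
    candidates = rng.sample(FEATURES, m)
    return bootstrap, candidates

# Hypothetical training set of 10 (sample_id, label) pairs.
training_set = [(i, i % 2) for i in range(10)]
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
bootstrap, candidates = draw_tree_inputs(training_set, 3, rng)
print(len(bootstrap), candidates)
```

Because every tree sees a different bootstrap sample and a different feature subset, the trees make partly independent errors, which the majority vote can then average out.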

S4, inputting the characteristic parameters of the signal r_*(t) received from the target node into the random forest model; each decision tree compares them with the split feature value interval of each of its nodes and outputs a line-of-sight/non-line-of-sight class, and the majority of the decisions of all decision trees is taken as the final line-of-sight/non-line-of-sight identification result for the base station;
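The majority vote of step S4 can be sketched as follows. Each tree is modelled as a callable returning 1 (line-of-sight) or 0 (non-line-of-sight). The single-split trees built by `make_stump`, their thresholds, and the rule that a tie counts as non-line-of-sight are assumptions made for illustration; the source does not state how ties are resolved.

```python
def make_stump(name, threshold, label_if_below):
    """Hypothetical one-node tree: compare one feature with a threshold."""
    def stump(features):
        if features[name] <= threshold:
            return label_if_below
        return 1 - label_if_below
    return stump

def forest_predict(trees, features):
    """Return 1 (line-of-sight) only if a strict majority of trees votes 1."""
    votes = [tree(features) for tree in trees]
    return 1 if 2 * sum(votes) > len(votes) else 0

# Hypothetical forest of three stumps with made-up thresholds.
trees = [
    make_stump("kappa_s", 5.0, 0),   # low kurtosis -> non-line-of-sight
    make_stump("t_rise", 2.0, 1),    # short rise time -> line-of-sight
    make_stump("tau_r", 10.0, 1),    # small delay spread -> line-of-sight
]

sample = {"kappa_s": 8.0, "t_rise": 1.0, "tau_r": 15.0}
print(forest_predict(trees, sample))
```

Using an odd number of trees avoids ties altogether, so the tie rule above only matters for forests with an even number of trees.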

S5, repeating steps S1-S4 for the other base stations in the test area to obtain the line-of-sight/non-line-of-sight identification results for the signals received between the target node and each base station, and removing the signals identified as non-line-of-sight when the received signals are used for positioning.
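The removal of non-line-of-sight signals before positioning can be sketched as follows. The source does not specify a positioning algorithm, so the linearised least-squares solution in two dimensions, the requirement of at least 3 remaining line-of-sight base stations, and the name `locate_with_los_only` are all assumptions.

```python
def locate_with_los_only(stations, ranges, labels):
    """Estimate (x, y) from range measurements, using line-of-sight links only."""
    los = [(s, d) for s, d, y in zip(stations, ranges, labels) if y == 1]
    if len(los) < 3:
        raise ValueError("need at least 3 line-of-sight base stations")

    # Subtract the first range equation from the others to obtain linear rows.
    (x1, y1), d1 = los[0]
    rows, rhs = [], []
    for (xi, yi), di in los[1:]:
        rows.append((2 * (xi - x1), 2 * (yi - y1)))
        rhs.append(d1 ** 2 - di ** 2 + xi ** 2 + yi ** 2 - x1 ** 2 - y1 ** 2)

    # Solve the 2x2 normal equations of the least-squares problem.
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, rhs))
    b2 = sum(r[1] * v for r, v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("line-of-sight base stations are collinear")
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Hypothetical layout: target at (3, 4); the fourth link is obstructed and biased.
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
ranges = [5.0, 65 ** 0.5, 45 ** 0.5, 85 ** 0.5 + 3.0]
labels = [1, 1, 1, 0]  # output of the identification step for each base station
print(locate_with_los_only(stations, ranges, labels))
```

In this toy layout the biased fourth range is discarded, and the three remaining line-of-sight ranges recover the target position up to floating-point error.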

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and improvements without departing from the structure of the present invention, and these do not affect the implementation effect of the invention or the utility of the patent.

Claims

1. A non-line-of-sight signal identification method based on random forests is characterized by comprising the following steps:

S1, randomly selecting N training positions in a test area comprising 1 target node and A base stations, sequentially placing a training communication node at each training position, and measuring the received signal r_n(t) from the training communication node at each training position n ∈ N to the K-th base station, K = 1, 2, …, A;

Calculating 6 characteristic parameters of r_n(t), comprising:

energy parameter e = ∫ |r_n(t)|² dt,

maximum amplitude parameter r_max = max_t |r_n(t)|,

rise time parameter t_rise = min{t : |r_n(t)| ≥ 0.6 r_max} − min{t : |r_n(t)| ≥ 6σ_n},

average delay parameter τ_m,

root mean square delay parameter τ_r,

kurtosis parameter κ_s;

Combining the 6 characteristic parameters into a feature set F = {e, r_max, t_rise, τ_m, τ_r, κ_s}; the feature sets of the signals r_n(t) obtained at the N positions form a training input matrix; constructing a training output matrix whose entries y_n are the line-of-sight/non-line-of-sight labels of r_n(t), with y_n = 1 if r_n(t) is identified as line-of-sight and y_n = 0 if it is identified as non-line-of-sight, thereby obtaining the complete training set D;

S2, measuring the received signal r_*(t) from the target node to the base station, and calculating the 6 characteristic parameters of r_*(t);

S3, constructing a random forest model formed by a plurality of decision trees, wherein for each decision tree: sampling N times with replacement from the complete training set D to form the training set D′ of the decision tree; randomly selecting M characteristic parameters from the feature set F as the candidate features of the decision tree, where M < 6; calculating the Gini index of each candidate feature; and taking the candidate features, in ascending order of Gini index, as the splitting features of the root node, the intermediate nodes and the leaf nodes of the decision tree, wherein Gini(D′, f) is the Gini index obtained when the training set D′ is divided according to candidate feature f into the two classes D₁ (line-of-sight) and D₂ (non-line-of-sight), Gini(D_i) is the Gini index of class D_i (i = 1, 2), |D₁|, |D₂| and |D′| are the numbers of samples in the sets D₁, D₂ and D′ respectively, and p_i is the probability that class D_i is classified correctly or incorrectly;

S4, inputting the characteristic parameters of the signal r_*(t) received from the target node into the random forest model; each decision tree compares them with the split feature value interval of each of its nodes and outputs a line-of-sight/non-line-of-sight class, and the majority of the decisions of all decision trees is taken as the final line-of-sight/non-line-of-sight identification result for the base station;

2. The random forest-based non-line-of-sight signal identification method of claim 1, wherein A is not less than 3.