Random forest-based non-line-of-sight signal identification method
Technical Field
The invention relates to the technical field of wireless positioning, and in particular to a random-forest-based non-line-of-sight signal identification method.
Background
Wireless positioning (wireless localization) is widely used in military, logistics, security, medical, search-and-rescue and similar applications. Improving the positioning accuracy of positioning systems in complex multipath and non-line-of-sight (NLOS) environments is a research hotspot in Time-of-Arrival (TOA) based wireless positioning, and one of the key problems is non-line-of-sight signal identification.
Non-line-of-sight signal identification means that, when redundant range measurements are available, the non-line-of-sight measurements are identified and removed and positioning is performed using only the line-of-sight measurements. There are currently three main types of methods:
1) Range-based methods: because range measurements taken in a non-line-of-sight environment have a larger variance than those taken in a line-of-sight environment, the variance of the range measurements is compared with a preset threshold. A variance above the threshold indicates a non-line-of-sight signal, and a variance below it indicates a line-of-sight signal. This approach suits static targets; when the target is moving, the variance of the range measurements increases and line-of-sight signals are easily misjudged as non-line-of-sight;
2) Channel-statistics-based methods: non-line-of-sight signals are identified by analysing the cumulative distribution function of the channel impulse response amplitudes, or by comparing the joint likelihood of the channel's kurtosis, mean excess delay and root-mean-square delay spread with a threshold; however, how the decision threshold should be defined is ambiguous;
3) Geometry-based methods: the channel is identified from geographic and geometric information about the environment, and non-line-of-sight signals are identified with a ray-tracing algorithm, which requires the layout of the environment to be known in advance.
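The range-variance test of method 1) can be sketched in a few lines. This is a minimal illustration, assuming a list of repeated range measurements (in metres) to one base station and an externally chosen threshold; the function name `classify_by_range_variance` is hypothetical, and treating a variance exactly equal to the threshold as line-of-sight is an assumption.

```python
from statistics import pvariance


def classify_by_range_variance(ranges, threshold):
    """Return 'NLOS' if the variance of repeated range measurements exceeds threshold, else 'LOS'."""
    # Population variance of the repeated ranges to one base station (metres squared)
    variance = pvariance(ranges)
    # A spread larger than the preset threshold is judged non-line-of-sight
    return "NLOS" if variance > threshold else "LOS"


print(classify_by_range_variance([5.00, 5.01, 4.99], threshold=0.01))
print(classify_by_range_variance([5.0, 5.8, 4.6, 6.1], threshold=0.01))
```

The weakness noted above is visible here: a moving target spreads the ranges just as NLOS propagation does, and this test cannot tell the two apart.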
All of the above algorithms are statistical methods and share common shortcomings: first, the prior distribution of the samples usually has to be known in advance and enough sample data must be collected, requirements that are often hard to meet in practical applications, and the real-time performance of the algorithms is poor; second, the joint probability distribution of the features that the algorithms require is sometimes difficult to determine, and their stability is poor.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a random-forest-based non-line-of-sight signal identification method with high real-time performance and good stability.
The technical solution of the invention is as follows: a random-forest-based non-line-of-sight signal identification method comprising the following steps:
S1: randomly select N training positions in a test area containing 1 target node and A base stations, place a training communication node at each training position in turn, and at each training position $n \in \{1, 2, \ldots, N\}$ measure the signal $r_n(t)$ received by the training communication node from the K-th base station, K = 1, 2, …, A;
Six characteristic parameters of $r_n(t)$ are calculated:
energy parameter $e = \int |r_n(t)|^2 \, dt$;
maximum amplitude parameter $r_{\max} = \max_t |r_n(t)|$;
rise time parameter $t_{\mathrm{rise}} = \min_t \{t : |r_n(t)| \ge 0.6\, r_{\max}\} - \min_t \{t : |r_n(t)| \ge 6\sigma_n\}$;
average delay parameter $\tau_m$;
root-mean-square delay parameter $\tau_r$;
kurtosis parameter $\kappa_s$.
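The six parameters can be computed from a sampled record of $r_n(t)$ roughly as follows. This is a sketch under stated assumptions: the signal is a list of real samples with spacing `dt`, integrals become rectangle-rule sums, time is measured from the start of the record, and $\sigma_n$ is passed in as a known noise standard deviation `noise_std`. Because the text gives no formulas for $\tau_m$, $\tau_r$ and $\kappa_s$, the sketch assumes the common definitions: mean excess delay and RMS delay spread of the normalised power-delay profile, and kurtosis of $|r_n(t)|$. The name `nlos_features` is hypothetical.

```python
import math


def nlos_features(r, dt, noise_std):
    """Return [e, r_max, t_rise, tau_m, tau_r, kappa_s] for one sampled received signal.

    r: real samples of r_n(t); dt: sample spacing in seconds;
    noise_std: assumed known noise standard deviation (sigma_n in the rise-time formula).
    """
    a = [abs(x) for x in r]                       # |r_n(t)|
    t = [k * dt for k in range(len(a))]           # sample times from start of record (assumption)

    e = sum(x * x for x in a) * dt                # energy: rectangle-rule integral of |r|^2
    r_max = max(a)                                # maximum amplitude

    # Rise time: first time |r| reaches 0.6*r_max minus first time it reaches 6*sigma_n
    t_high = next(tk for tk, x in zip(t, a) if x >= 0.6 * r_max)
    t_low = next((tk for tk, x in zip(t, a) if x >= 6 * noise_std), None)
    if t_low is None:
        raise ValueError("signal never exceeds 6 * noise_std")
    t_rise = t_high - t_low

    # Normalised power-delay profile psi(t) = |r|^2 dt / e, which sums to 1 over the record
    psi = [x * x * dt / e for x in a]
    tau_m = sum(tk * p for tk, p in zip(t, psi))                            # mean delay (assumed)
    tau_r = math.sqrt(sum((tk - tau_m) ** 2 * p for tk, p in zip(t, psi)))  # RMS delay (assumed)

    # Kurtosis of |r_n(t)|: fourth central moment over squared variance (assumed definition)
    mu = sum(a) / len(a)
    var = sum((x - mu) ** 2 for x in a) / len(a)
    kappa_s = sum((x - mu) ** 4 for x in a) / len(a) / var ** 2

    return [e, r_max, t_rise, tau_m, tau_r, kappa_s]
```

Applying such a function to each training record yields one row of features per training position.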
The 6 characteristic parameters form the feature set $F = \{e, r_{\max}, t_{\mathrm{rise}}, \tau_m, \tau_r, \kappa_s\}$. The feature vectors of the signals $r_n(t)$ obtained at the N positions form the $N \times 6$ training input matrix, and the labels $y_n$ form the $N \times 1$ training output matrix, where $y_n$ is the line-of-sight/non-line-of-sight label of $r_n(t)$: $y_n = 1$ if $r_n(t)$ is identified as line-of-sight and $y_n = 0$ if it is identified as non-line-of-sight. Together these give the complete training set $D$;
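A minimal sketch of assembling the training matrices and the complete training set $D$, assuming the feature vectors are already computed and the line-of-sight ground truth for each training position is available as a boolean; `build_training_set` is a hypothetical helper name.

```python
def build_training_set(feature_rows, is_los):
    """Build the training input matrix, output matrix and complete training set D.

    feature_rows: N feature vectors [e, r_max, t_rise, tau_m, tau_r, kappa_s];
    is_los: N booleans, True where r_n(t) is known to be line-of-sight.
    """
    if len(feature_rows) != len(is_los):
        raise ValueError("need exactly one label per training position")
    if any(len(row) != 6 for row in feature_rows):
        raise ValueError("each feature vector must have 6 entries")
    X = [list(row) for row in feature_rows]        # N x 6 training input matrix
    Y = [1 if los else 0 for los in is_los]        # N x 1 output: y_n = 1 LOS, y_n = 0 NLOS
    D = [(tuple(x), y) for x, y in zip(X, Y)]      # complete training set {(x_n, y_n)}
    return X, Y, D
```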
S2: measure the signal $r^*(t)$ received between the target node and the base station, and calculate the 6 characteristic parameters of $r^*(t)$;
S3: construct a random forest model consisting of multiple decision trees. For each decision tree: draw N samples from the complete training set $D$ with replacement to form the tree's training set $D'$; randomly select M characteristic parameters from the feature set F as the tree's candidate features, where M < 6; calculate the Gini index of each candidate feature; and use the candidate features, in ascending order of Gini index, as the splitting features of the root node, the intermediate nodes and the leaf nodes of the tree,
$$\mathrm{Gini}(D', f) = \frac{|D_1|}{|D'|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D'|}\,\mathrm{Gini}(D_2), \qquad \mathrm{Gini}(D_i) = 2p_i(1 - p_i),$$
where $\mathrm{Gini}(D', f)$ is the Gini index obtained when the training set $D'$ is divided, according to the candidate feature f, into the two classes $D_1$ (line-of-sight) and $D_2$ (non-line-of-sight); $\mathrm{Gini}(D_i)$ is the Gini index of class $D_i$ (i = 1, 2); $|D_1|$, $|D_2|$ and $|D'|$ are the numbers of samples in $D_1$, $D_2$ and $D'$ respectively; and $p_i$ is the probability that class $D_i$ is identified correctly;
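The weighted Gini index $\mathrm{Gini}(D', f)$ and the ordering of candidate features can be sketched as follows, using the usual binary Gini index $2p(1-p)$ for each part. The text does not say how the split point on a feature is chosen, so this sketch assumes a CART-style search over midpoints between consecutive distinct feature values and scores each feature by its smallest $\mathrm{Gini}(D', f)$. Samples are `(feature_vector, label)` pairs with label 1 for line-of-sight; all function names are hypothetical.

```python
def gini(labels):
    """Binary Gini index 2p(1 - p), p = share of LOS labels (y = 1) in the subset."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_gini(samples, j, threshold):
    """Gini(D', f) when D' is split on feature j: |D1|/|D'| Gini(D1) + |D2|/|D'| Gini(D2)."""
    d1 = [y for x, y in samples if x[j] <= threshold]
    d2 = [y for x, y in samples if x[j] > threshold]
    n = len(samples)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


def best_split(samples, j):
    """Lowest Gini(D', f_j) over midpoints of consecutive distinct values (assumed search rule)."""
    values = sorted({x[j] for x, _ in samples})
    if len(values) < 2:
        return float("inf"), None                  # a constant feature cannot split D'
    midpoints = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    return min((split_gini(samples, j, c), c) for c in midpoints)


def rank_candidates(samples, candidate_features):
    """Order candidate feature indices from smallest to largest Gini index."""
    return sorted(candidate_features, key=lambda j: best_split(samples, j)[0])
```

A feature that separates the two classes cleanly scores a Gini index of 0 and is therefore placed first, at the root node.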
S4: input the characteristic parameters of the signal $r^*(t)$ received by the target node into the random forest model; each decision tree compares them with the value interval of the splitting feature at each node and outputs a line-of-sight/non-line-of-sight class, and the majority decision of all decision trees is taken as the final line-of-sight/non-line-of-sight identification result for that base station;
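The majority vote in S4 reduces to counting the per-tree outputs. A minimal sketch, assuming each tree returns 1 (line-of-sight) or 0 (non-line-of-sight); the text does not say how ties are broken, so this sketch assumes a tie is resolved as non-line-of-sight (an odd number of trees avoids ties altogether). `forest_vote` is a hypothetical name.

```python
def forest_vote(tree_outputs):
    """Majority vote over per-tree outputs (1 = LOS, 0 = NLOS).

    A tie is resolved as NLOS here; this tie rule is an assumption, not part of the method.
    """
    los_votes = sum(tree_outputs)
    return 1 if 2 * los_votes > len(tree_outputs) else 0
```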
S5: repeat steps S1 to S4 for the remaining base stations in the test area to obtain the line-of-sight/non-line-of-sight identification result for the signal between the target node and each base station, and discard the signals identified as non-line-of-sight when the received signals are used for positioning.
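The exclusion in S5 amounts to filtering the per-base-station measurements by their S4 decisions. A sketch, assuming ranges and decisions are held in dictionaries keyed by a base-station identifier (a hypothetical layout); `keep_los_measurements` is likewise a hypothetical name.

```python
def keep_los_measurements(ranges, decisions):
    """Drop measurements from base stations whose signal was identified as NLOS.

    ranges: {base_station_id: measured range}; decisions: {base_station_id: 1 (LOS) or 0 (NLOS)}.
    Stations without a decision are also dropped (conservative assumption).
    """
    return {k: d for k, d in ranges.items() if decisions.get(k) == 1}
```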
As a further improvement, $A \ge 3$.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
The method of the invention treats non-line-of-sight signal identification as a binary line-of-sight/non-line-of-sight classification problem and solves it with a machine learning method, the random forest. A random forest is a high-accuracy classifier made up of multiple decision trees, and its output class is the class output by the majority of the individual trees.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and a specific embodiment.
Referring to FIG. 1, a random-forest-based non-line-of-sight signal identification method includes the following steps:
S1: randomly select N training positions in a test area containing 1 target node and A base stations, where $A \ge 3$, place a training communication node at each training position in turn, and at each training position $n \in \{1, 2, \ldots, N\}$ measure the signal $r_n(t)$ received by the training communication node from the K-th base station, K = 1, 2, …, A;
Six characteristic parameters of $r_n(t)$ are calculated:
energy parameter $e = \int |r_n(t)|^2 \, dt$;
maximum amplitude parameter $r_{\max} = \max_t |r_n(t)|$;
rise time parameter $t_{\mathrm{rise}} = \min_t \{t : |r_n(t)| \ge 0.6\, r_{\max}\} - \min_t \{t : |r_n(t)| \ge 6\sigma_n\}$;
average delay parameter $\tau_m$;
root-mean-square delay parameter $\tau_r$;
kurtosis parameter $\kappa_s$.
The 6 characteristic parameters form the feature set $F = \{e, r_{\max}, t_{\mathrm{rise}}, \tau_m, \tau_r, \kappa_s\}$. The feature vectors of the signals $r_n(t)$ obtained at the N positions form the $N \times 6$ training input matrix, and the labels $y_n$ form the $N \times 1$ training output matrix, where $y_n$ is the line-of-sight/non-line-of-sight label of $r_n(t)$: $y_n = 1$ if $r_n(t)$ is identified as line-of-sight and $y_n = 0$ if it is identified as non-line-of-sight. Together these give the complete training set $D$;
S2: measure the signal $r^*(t)$ received between the target node and the base station, and calculate the 6 characteristic parameters of $r^*(t)$;
S3: construct a random forest model consisting of multiple decision trees. For each decision tree: draw N samples from the complete training set $D$ with replacement to form the tree's training set $D'$; randomly select M characteristic parameters from the feature set F as the tree's candidate features, where M < 6; calculate the Gini index of each candidate feature; and use the candidate features, in ascending order of Gini index, as the splitting features of the root node, the intermediate nodes and the leaf nodes of the tree,
$$\mathrm{Gini}(D', f) = \frac{|D_1|}{|D'|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D'|}\,\mathrm{Gini}(D_2), \qquad \mathrm{Gini}(D_i) = 2p_i(1 - p_i),$$
where $\mathrm{Gini}(D', f)$ is the Gini index obtained when the training set $D'$ is divided, according to the candidate feature f, into the two classes $D_1$ (line-of-sight) and $D_2$ (non-line-of-sight); $\mathrm{Gini}(D_i)$ is the Gini index of class $D_i$ (i = 1, 2); $|D_1|$, $|D_2|$ and $|D'|$ are the numbers of samples in $D_1$, $D_2$ and $D'$ respectively; and $p_i$ is the probability that class $D_i$ is identified correctly;
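The per-tree data and feature selection of S3 can be sketched with the standard-library `random` module, assuming $D$ is a list of `(feature_vector, label)` pairs and features are indexed in the order of F. Note that the text selects the M candidate features once per tree, whereas Breiman's original random forest draws a fresh feature subset at every node split. `tree_inputs` and `FEATURES` are hypothetical names.

```python
import random

# Feature order of F = {e, r_max, t_rise, tau_m, tau_r, kappa_s}
FEATURES = ("e", "r_max", "t_rise", "tau_m", "tau_r", "kappa_s")


def tree_inputs(D, M, rng):
    """Bootstrap training set D' and M candidate feature indices for one decision tree."""
    if not 0 < M < len(FEATURES):
        raise ValueError("M must satisfy 0 < M < 6")
    # N draws from D with replacement: some samples repeat, others are left out
    D_prime = [rng.choice(D) for _ in range(len(D))]
    # M distinct features drawn without replacement from F
    candidates = sorted(rng.sample(range(len(FEATURES)), M))
    return D_prime, candidates
```

Passing an explicit `random.Random` instance keeps each tree's draws reproducible for a given seed.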
S4: input the characteristic parameters of the signal $r^*(t)$ received by the target node into the random forest model; each decision tree compares them with the value interval of the splitting feature at each of its nodes and outputs its own line-of-sight/non-line-of-sight class, and the majority decision of all decision trees is taken as the final line-of-sight/non-line-of-sight identification result for that base station;
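Walking one trained tree in S4 can be sketched as follows. The node layout (dictionaries with `feature`, `threshold`, `left` and `right`, or a leaf `label`) is an assumption for illustration, as is the convention that values at or below a node's threshold go left; feature indices follow the order of F. The example tree and its thresholds are made up, and `tree_predict` is a hypothetical name.

```python
def tree_predict(node, x):
    """Follow one decision tree from the root to a leaf and return its label (1 LOS, 0 NLOS).

    Internal nodes: {"feature": j, "threshold": v, "left": ..., "right": ...}
    Leaves: {"label": 1 or 0}. Values <= threshold go left (assumed convention).
    """
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]


# Hypothetical tree: split on t_rise (index 2), then on tau_r (index 4); thresholds are made up
example_tree = {
    "feature": 2, "threshold": 1.5e-9,
    "left": {"label": 1},
    "right": {"feature": 4, "threshold": 3.0e-9,
              "left": {"label": 1}, "right": {"label": 0}},
}
```

Collecting `tree_predict(tree, x)` over all trees gives the list of per-tree classes that the majority vote then combines.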
S5: repeat steps S1 to S4 for the other base stations in the test area to obtain the line-of-sight/non-line-of-sight identification result for the signal between the target node and each base station, and discard the signals identified as non-line-of-sight when the received signals are used for positioning.
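The text does not specify the positioning algorithm used after non-line-of-sight signals are removed. As one common choice, shown here only as an illustrative assumption, a 2-D TOA position can be computed by linearised least squares from the remaining line-of-sight base stations, which requires at least three non-collinear stations. `locate_2d` is a hypothetical name.

```python
import math


def locate_2d(anchors, ranges):
    """Linearised least-squares 2-D position from LOS base stations.

    anchors: [(x_k, y_k), ...] of the base stations kept after NLOS removal;
    ranges: matching TOA range estimates d_k. Needs >= 3 non-collinear anchors.
    """
    if len(anchors) < 3 or len(anchors) != len(ranges):
        raise ValueError("need at least 3 anchors and one range per anchor")
    (x1, y1), d1 = anchors[0], ranges[0]
    rows, rhs = [], []
    # Subtracting the first circle equation from each other one gives a linear equation in (x, y)
    for (xk, yk), dk in zip(anchors[1:], ranges[1:]):
        rows.append((2 * (xk - x1), 2 * (yk - y1)))
        rhs.append(d1 ** 2 - dk ** 2 + xk ** 2 - x1 ** 2 + yk ** 2 - y1 ** 2)
    # Normal equations (A^T A) p = A^T b, solved in closed form for the 2 x 2 case
    a11 = sum(a * a for a, _ in rows)
    a12 = sum(a * b for a, b in rows)
    a22 = sum(b * b for _, b in rows)
    b1 = sum(a * v for (a, _), v in zip(rows, rhs))
    b2 = sum(b * v for (_, b), v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


# Usage with noise-free ranges to a made-up target at (3, 4)
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ranges = [math.dist(a, (3.0, 4.0)) for a in anchors]
print(locate_2d(anchors, ranges))
```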
The method of the invention treats non-line-of-sight signal identification as a binary line-of-sight/non-line-of-sight classification problem and solves it with a machine learning method, the random forest. A random forest is a high-accuracy classifier made up of multiple decision trees, and its output class is the class output by the majority of the individual trees.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and improvements without departing from the structure of the present invention, and these will not affect the effect of implementing the invention or the utility of the patent.