CN110267193B - Vehicle position tracking method based on Markov decision process model

Info

Publication number: CN110267193B
Application number: CN201910458141.0A
Authority: CN
Inventors: 张�杰; 李骏; 邢志超; 邵雨蒙; 梁腾
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2021-02-12
Anticipated expiration: 2039-05-29
Also published as: CN110267193A

Abstract

The invention discloses a vehicle position tracking method based on a Markov decision process model, which comprises the following steps: establishing a two-dimensional road network model; defining the state, action and reward of the sensor cluster, establishing a Markov decision process model, and obtaining the optimal action sequence of the sensor cluster by utilizing reinforcement learning to realize preliminary tracking; and accurately tracking the target vehicle by using a Gaussian weight positioning algorithm based on RSSI. The invention realizes the accurate positioning of the vehicle and provides help for the effective implementation of vehicle position tracking.

Description

Vehicle position tracking method based on Markov decision process model

Technical Field

The invention relates to the technical field of target tracking, in particular to a vehicle position tracking method based on a Markov decision process model.

Background

The Internet of Vehicles uses GPS, on-board terminals and other devices, together with wireless communication technology, to make effective use of vehicle-related information on a user information platform. It stores the vehicle position information and historical driving data provided by this equipment in the cloud, where analysis such as data fusion and data mining is carried out to provide users with services such as better positioning and road matching, so that users can better understand road traffic conditions, plan routes reasonably and relieve traffic pressure. Real-time position information can also be used to provide early warning of road traffic conditions. All of this relies on vehicle position data, which is subject to strong uncertainty and randomness.

Existing vehicle positioning methods include vehicle motion trajectory models, discrete linear error models, models based on optimization control algorithms and the like. All of them analyze the target vehicle rather than starting from the sensors, and they are subject to strong uncertainty and randomness.

Disclosure of Invention

The invention aims to provide a vehicle position tracking method based on a Markov decision process model, which is used for accurately tracking the position of a target vehicle in real time.

The technical scheme for realizing the purpose of the invention is as follows: a vehicle position tracking method based on a Markov decision process model comprises the following steps:

step 1, establishing a two-dimensional road network model;

step 2, defining the state, action and reward of the sensor cluster, establishing a Markov decision process model, and obtaining an optimal action sequence of the sensor cluster by utilizing reinforcement learning to realize preliminary tracking;

step 3, accurately tracking the target vehicle by using a Gaussian weight positioning algorithm based on the RSSI.

Compared with the prior art, the invention has the following notable advantages: the optimal action sequence of the sensor cluster is obtained by establishing a two-dimensional road network model and a Markov decision process model; the sensor cluster uses the optimal action sequence to transition to the optimal state, realizing preliminary tracking; and on this basis, accurate tracking of the position coordinates of the target vehicle is realized by a Gaussian weight positioning algorithm based on the RSSI.

Drawings

FIG. 1 is a flow chart of the vehicle position tracking method based on a Markov decision process model of the present invention.

FIG. 2 is a diagram of sensor cluster tracking.

FIG. 3 is a graph comparing the RSSI-based Gaussian weight positioning algorithm with a conventional positioning algorithm.

Detailed Description

The invention provides a vehicle positioning algorithm based on a Markov decision process (MDP). When a target vehicle is located at a certain coordinate point, the sensor clusters are taken as the object of study: a Markov decision process is established and the Q-learning algorithm in reinforcement learning is used to obtain the optimal action sequence of the sensor clusters, realizing preliminary target tracking; on this basis, the target vehicle is accurately tracked and positioned by a Gaussian weight positioning algorithm based on the RSSI.

As shown in FIG. 1, the positioning method includes the following steps:

step 1, establishing a two-dimensional road network model; the method specifically comprises the following steps:

The actual road map is projected into a Cartesian coordinate system on a two-dimensional plane. Roads are divided into three main types: a single road parallel to the X-axis or Y-axis, represented by its start and end coordinates; two mutually perpendicular roads parallel to the X-axis and Y-axis respectively, represented by their start and end coordinates and their intersection coordinates; and a single road parallel to neither axis, represented by its start and end coordinates and by the two intersection points of the lines drawn through its start and end points parallel to the X-axis and Y-axis. All other, more complex roads can be decomposed into combinations of these three types.
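The three road types can be captured in a small data structure. The following Python sketch is illustrative only and is not part of the patent: the names `Road`, `helper_points` and `crossing` are hypothetical, and it assumes that the two extension-line points of an oblique road are the corners where the axis-parallel lines through its two endpoints meet.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Road:
    """A straight road segment given by its start and end coordinates."""
    start: Point
    end: Point

    def is_axis_parallel(self) -> bool:
        # Parallel to the Y-axis (same x) or to the X-axis (same y).
        return self.start[0] == self.end[0] or self.start[1] == self.end[1]

    def helper_points(self) -> List[Point]:
        """For an oblique road, the two points where the axis-parallel
        lines through the start and end points intersect (assumed reading)."""
        if self.is_axis_parallel():
            return []
        (x0, y0), (x1, y1) = self.start, self.end
        return [(x1, y0), (x0, y1)]


def crossing(a: Road, b: Road) -> Point:
    """Intersection of one X-parallel and one Y-parallel road.
    Assumes the two segments actually cross; no range check is done."""
    horizontal, vertical = (a, b) if a.start[1] == a.end[1] else (b, a)
    if horizontal.start[1] != horizontal.end[1] or vertical.start[0] != vertical.end[0]:
        raise ValueError("expected one X-parallel and one Y-parallel road")
    return (vertical.start[0], horizontal.start[1])
```

A more complex road would then be stored as a list of such `Road` segments.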

Step 2, defining the state, action and reward of the sensor cluster, establishing a Markov decision process model, and obtaining an optimal action sequence of the sensor cluster by utilizing reinforcement learning to realize preliminary tracking; the method specifically comprises the following steps:

S21, the cluster value of each sensor cluster is 0 or 1: a cluster value of 0 means the sensor cluster is dormant, and a cluster value of 1 means it is working. The binary number formed by the cluster values of all sensor clusters is the state of the sensor clusters. The action sub-value of each sensor cluster is likewise 0 or 1: an action sub-value of 1 changes the state of the corresponding sensor cluster, and an action sub-value of 0 leaves it unchanged. The binary number formed by all action sub-values is the action of the sensor clusters. The states and the transitions between them satisfy:

$$s_{t+1} = s_t \wedge a_k, \quad k = 0, 1, \ldots, N$$

where $s_t$ is the state value of the sensor clusters at the current moment, $s_{t+1}$ is the state value of the sensor clusters at the next moment, $a_k$ is the action value taken, and N is the number of elements in the state set or action set. Consistent with the definition of the action sub-values, $\wedge$ acts bit by bit as an exclusive OR: a state bit is flipped where the action bit is 1 and kept where it is 0.
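The state transition can be illustrated with integers used as bit fields. This is a minimal sketch under the reading that an action bit of 1 toggles the corresponding cluster and an action bit of 0 keeps it, which is a bitwise exclusive OR; the function names and the default of four clusters (taken from the example embodiment) are illustrative assumptions.

```python
def apply_action(state: int, action: int, n_clusters: int = 4) -> int:
    """Return the next state: each cluster whose action bit is 1 toggles
    between dormant (0) and working (1); the others keep their value."""
    mask = (1 << n_clusters) - 1  # keep only the n_clusters low bits
    return (state ^ action) & mask


def to_bits(value: int, n_clusters: int = 4) -> str:
    """Format a state or action as a fixed-width binary string."""
    return format(value, f"0{n_clusters}b")


# Starting from all clusters dormant, wake the two middle clusters,
# then put one of them back to sleep.
s0 = 0b0000
s1 = apply_action(s0, 0b0110)
s2 = apply_action(s1, 0b0100)
print(to_bits(s0), to_bits(s1), to_bits(s2))
```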

S22, defining the direct reward when the target vehicle is at a given coordinate: when the target vehicle is within the working range of a sensor cluster whose cluster value is 1, the direct reward is positive; when the target vehicle is within the working range of a sensor cluster whose cluster value is 0, the direct reward is negative; when the target vehicle is outside the working range of a sensor cluster whose cluster value is 1, the direct reward is negative; when the target vehicle is outside the working range of a sensor cluster whose cluster value is 0, the direct reward is 0. A Markov decision process is established, and each Q value is calculated using the Q-learning algorithm in reinforcement learning, whose standard update rule is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

where $s_t$ is the state value of the sensor clusters at the current moment, $s_{t+1}$ is the state value of the sensor clusters at the next moment, $a_t$ is the action value taken at the current moment, $a'$ is any action element in the action set, $r$ is the direct reward, $\alpha$ is the learning rate, and $\gamma$ is the reward discount factor. The final Q table is obtained by iterative calculation. From the Q table, the optimal action sequence of the sensor clusters for the current coordinate of the target vehicle is obtained, and the sensor clusters use this sequence to transition to the optimal state, in which sensors close to the vehicle are working and sensors far from the vehicle are dormant, thereby realizing preliminary tracking.
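The reward definition and the Q-table iteration can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation: the reward magnitudes (+1, -1, 0), the summation over clusters, the purely random exploration, the deterministic exclusive-OR transition, the hyperparameter values and all function names are assumptions made for the example. Which clusters cover the vehicle is passed in as the bit field `in_range`.

```python
import random
from typing import List

N_CLUSTERS = 4                 # four sensor clusters, as in the example embodiment
N_STATES = 1 << N_CLUSTERS     # states and actions are 4-bit numbers


def reward(state: int, in_range: int) -> int:
    """Sum of per-cluster direct rewards. Bit i of in_range is 1 when the
    vehicle lies inside the working range of cluster i."""
    total = 0
    for i in range(N_CLUSTERS):
        working = (state >> i) & 1
        covered = (in_range >> i) & 1
        if working and covered:
            total += 1         # working cluster that covers the vehicle
        elif working != covered:
            total -= 1         # dormant but covering, or working but not covering
    return total               # dormant and not covering contributes 0


def train_q_table(in_range: int, updates: int = 200_000, alpha: float = 0.5,
                  gamma: float = 0.9, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning with uniformly random exploration."""
    rng = random.Random(seed)
    q = [[0.0] * N_STATES for _ in range(N_STATES)]
    s = 0                                   # initial state 0000
    for _ in range(updates):
        a = rng.randrange(N_STATES)         # explore: pick a random action
        s_next = s ^ a                      # deterministic state transition
        r = reward(s_next, in_range)
        q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
        s = s_next
    return q


def greedy_action(q: List[List[float]], state: int) -> int:
    """Action with the largest Q value in the given state."""
    return max(range(N_STATES), key=q[state].__getitem__)


q_table = train_q_table(in_range=0b0101)
best = greedy_action(q_table, 0b0000)
print(format(best, "04b"), format(0b0000 ^ best, "04b"))
```

Because Q-learning is off-policy, the table can be learned under random exploration and then read greedily; in this sketch the greedy action from any state moves the clusters to the state in which exactly the covering clusters are working.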

Step 3, accurately tracking the target vehicle by utilizing a Gaussian weight positioning algorithm based on RSSI (received signal strength indicator); the method specifically comprises the following steps:

S31, for each sensor in the sensor clusters in the working state, the distance $d_i$ to the target vehicle is obtained using the RSSI ranging formula, and the resulting distances are sorted from smallest to largest. The total number of sensors in the sensor clusters is 3N, and the position coordinates of the sensor corresponding to each sorted distance are $(x_i, y_i)$, $i = 1, 2, \ldots, 3N$. Letting the position coordinates of the target vehicle be $(x, y)$, we obtain:

$$(x - x_i)^2 + (y - y_i)^2 = d_i^2, \quad i = 1, 2, \ldots, 3N$$

Taking the equations three at a time in sorted order, N groups in total, each group simplifies to:

The position coordinates of the target vehicle are obtained from each group by the least squares method:

There are N coordinates in total. Each coordinate corresponds to an average distance:

The weight of each coordinate point is defined by a Gaussian function as follows:

where σ is the degree of influence of the coordinate point, with a value range of [0.1, 0.2].

The coordinates of the final target vehicle are:
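The positioning procedure of S31 can be sketched end to end in Python. Several parts are assumptions, because the source text does not reproduce the corresponding formulas: the log-distance path-loss model in `rssi_to_distance` and its default parameters, the linearization obtained by subtracting the last circle equation of each group, the Gaussian weight form exp(-d²/(2σ²)) applied to the average distance, and the normalized weighted average. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rssi_to_distance(rssi: float, rssi_at_1m: float = -40.0,
                     path_loss_exp: float = 2.0) -> float:
    """Assumed log-distance path-loss model: distance grows tenfold for
    every 10 * path_loss_exp dB of extra attenuation."""
    return 10 ** ((rssi_at_1m - rssi) / (10 * path_loss_exp))


def trilaterate(sensors: Sequence[Point], dists: Sequence[float]) -> Point:
    """Least-squares position from three or more circle equations."""
    (xn, yn), dn = sensors[-1], dists[-1]
    # Subtracting the last circle equation gives linear rows a*x + b*y = c.
    rows = [(2 * (xi - xn), 2 * (yi - yn),
             xi**2 - xn**2 + yi**2 - yn**2 + dn**2 - di**2)
            for (xi, yi), di in zip(sensors[:-1], dists[:-1])]
    # Normal equations (A^T A) p = A^T c, solved with Cramer's rule.
    saa = sum(a * a for a, _, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    sac = sum(a * c for a, _, c in rows)
    sbc = sum(b * c for _, b, c in rows)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("sensors are collinear; position is not unique")
    return ((sac * sbb - sab * sbc) / det, (saa * sbc - sab * sac) / det)


def gaussian_weighted_position(sensors: Sequence[Point], dists: Sequence[float],
                               sigma: float = 0.15) -> Point:
    """Sort by distance, solve each group of three, then combine the N
    estimates with Gaussian weights based on each group's mean distance."""
    order = sorted(range(len(dists)), key=lambda i: dists[i])
    estimates: List[Point] = []
    mean_d: List[float] = []
    for g in range(0, len(order) // 3 * 3, 3):
        idx = order[g:g + 3]
        estimates.append(trilaterate([sensors[i] for i in idx],
                                     [dists[i] for i in idx]))
        mean_d.append(sum(dists[i] for i in idx) / 3)
    # Shifting by the smallest mean distance leaves the normalized weights
    # unchanged and avoids every weight underflowing to zero.
    d0 = min(mean_d)
    weights = [math.exp(-(d * d - d0 * d0) / (2 * sigma * sigma)) for d in mean_d]
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, estimates)) / total
    y = sum(w * p[1] for w, p in zip(weights, estimates)) / total
    return (x, y)
```

With σ in [0.1, 0.2] the weights fall off very quickly, so under these assumptions the group of nearest sensors dominates the final estimate, which is the intended effect of preferring short, less noisy range measurements.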

The present invention will be described in detail below with reference to examples and the accompanying drawings.

Examples

The method is implemented in PyCharm. Let the mapping of the target vehicle's actual position in the two-dimensional road network model be (2.1, 3.2). There are four sensor clusters, and their initial state is {0000}. The state transition probability of a sensor cluster is assumed to be approximately 1.

FIG. 2 is a diagram of sensor cluster tracking. The black circle is the target vehicle; each square is a sensor cluster, which is dormant when the square is black and working when the square is white. As the target vehicle moves, the sensor clusters continuously use the optimal action sequence to transition to the optimal state, thereby realizing preliminary tracking.

FIG. 3 compares the RSSI-based Gaussian weight positioning algorithm with the conventional three-point positioning algorithm; each algorithm is simulated 100 times to obtain target vehicle coordinate points. The results of the RSSI-based Gaussian weight positioning algorithm are clearly better than those of the conventional positioning algorithm.

Claims

1. A vehicle position tracking method based on a Markov decision process model is characterized by comprising the following steps:

step 1, establishing a two-dimensional road network model;

step 2, defining the state, action and reward of the sensor cluster, establishing a Markov decision process model, and obtaining an optimal action sequence of the sensor cluster by utilizing reinforcement learning to realize preliminary tracking; the specific process is as follows:

$$s_{t+1} = s_t \wedge a_t$$

where $s_t$ is the state value of the sensor cluster at the current moment, $s_{t+1}$ is the state value of the sensor cluster at the next moment, and $a_t$ is the action value taken at the current moment;

S22, defining the direct reward when the target vehicle is at a given coordinate: when the target vehicle is within the working range of a sensor cluster whose cluster value is 1, the direct reward is positive; when the target vehicle is within the working range of a sensor cluster whose cluster value is 0, the direct reward is negative; when the target vehicle is outside the working range of a sensor cluster whose cluster value is 1, the direct reward is negative; when the target vehicle is outside the working range of a sensor cluster whose cluster value is 0, the direct reward is 0; establishing a Markov decision process model, and calculating each Q value by using the Q-learning algorithm in reinforcement learning, whose standard update rule is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

where $a'$ is any action element in the action set, $r$ is the direct reward, $\alpha$ is the learning rate, and $\gamma$ is the reward discount factor; obtaining the final Q table through iterative calculation, and obtaining from the Q table the optimal action sequence of the sensor cluster for the current coordinate of the target vehicle;

step 3, accurately tracking the target vehicle by using a Gaussian weight positioning algorithm based on RSSI (received signal strength indicator), which specifically comprises the following steps:

S31, for each sensor in the sensor cluster in the working state, obtaining the distance $d_i$ to the target vehicle by using the RSSI ranging formula, and sorting the obtained distances from smallest to largest; the total number of sensors in the sensor cluster is 3N, and the position coordinates of the sensor corresponding to each sorted distance are $(x_i, y_i)$, $i = 1, 2, \ldots, 3N$; letting the position coordinates of the target vehicle be $(x, y)$, we obtain:

$$(x - x_i)^2 + (y - y_i)^2 = d_i^2, \quad i = 1, 2, \ldots, 3N$$

there are N coordinates in total; each coordinate corresponds to an average distance:

the weight of each coordinate point is defined by a Gaussian function as follows:

where σ is the degree of influence of the coordinate point, with a value range of [0.1, 0.2];

the coordinates of the final target vehicle are:

2. The Markov decision process model-based vehicle position tracking method of claim 1, wherein the specific process of step 1 is as follows:

projecting an actual road map into a Cartesian coordinate system on a two-dimensional plane; roads are divided into three main types: a single road parallel to the X-axis or Y-axis, represented by its start and end coordinates; two mutually perpendicular roads parallel to the X-axis and Y-axis respectively, represented by their start and end coordinates and their intersection coordinates; and a single road parallel to neither axis, represented by its start and end coordinates and by the coordinates of the two intersection points of the lines drawn through its start and end points parallel to the X-axis and Y-axis; all other, more complex roads can be decomposed into combinations of the above three road types.