CN113335277A - Intelligent cruise control method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113335277A (application CN202110458260.3A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- queue
- state
- current
- control
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
- B60W30/14—Adaptive cruise control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/40—Dynamic objects, e.g. animals, windblown objects
- B60W2554/404—Characteristics
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/40—Dynamic objects, e.g. animals, windblown objects
- B60W2554/404—Characteristics
- B60W2554/4042—Longitudinal speed
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/80—Spatial relation or speed relative to objects
- B60W2554/802—Longitudinal distance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/06—Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Embodiments of the invention provide an intelligent cruise control method and apparatus, an electronic device and a storage medium. The method comprises: determining a current state signal of an automatically controlled vehicle; and inputting the current state signal into an intelligent optimization control model to realize intelligent cruise control of the vehicle. The intelligent optimization control model is obtained by training the neural-network parameters of a Markov decision process model on state samples collected in real time from a vehicle queue containing the automatically controlled vehicle. The invention addresses the unpredictability of complex traffic environments and the unreliability of the network that limit existing cruise control methods based on networked control.
Description
Technical Field
The invention relates to the technical field of automatic control, and in particular to an intelligent cruise control method and apparatus, an electronic device and a storage medium.
Background
Cruise control is an advanced driver-assistance technology that can effectively reduce the driver's burden while improving road traffic efficiency, driving safety and fuel economy. Current cruise control methods based on networked control, such as Adaptive Cruise Control (ACC), Cooperative Adaptive Cruise Control (CACC) and Connected Cruise Control (CCC), have received wide attention and application but still suffer from many limitations. For example, the ACC method combines multiple sensor technologies to perceive road traffic information; because sensor perception is of limited sensitivity and easily disturbed by the external environment, the stability and safety of ACC are insufficient. The CACC method builds on ACC by introducing vehicle-to-vehicle (V2V) communication from the Internet of Vehicles so that vehicles in the fleet actively exchange their motion-state information. However, CACC requires every vehicle in the fleet to be equipped with an ACC automatic driving device to assist cooperative control, and its communication topology is usually fixed; when the fleet contains a manually driven vehicle or road conditions change, the performance and stability of CACC inevitably degrade, which also limits its application in future traffic scenarios. To allow a more flexible vehicle-queue design, connection structure and communication topology, CCC was further proposed: it allows the controlled vehicle to receive state information broadcast by several preceding vehicles without equipping every vehicle with sensors, improving the information perception and control capability of each vehicle while avoiding the need to design the whole queue uniformly.
Although a CCC system requires neither a designated head vehicle nor a fixed communication structure, and can therefore communicate selectively, allowing a modular design and better scalability, the characteristics of its topology, its network communication delay and its expected state are dynamic and time-varying under environmental changes, controlled-vehicle motion, and the limited transmission capability and link quality of network nodes. The unpredictability of complex traffic environments and the unreliability of the network therefore pose a serious challenge to cruise control methods based on networked control.
Disclosure of Invention
Embodiments of the invention provide an intelligent cruise control method and apparatus, an electronic device and a storage medium, which solve some or all of the problems of existing cruise control methods based on networked control.
In a first aspect, an embodiment of the present invention provides a smart cruise control method, including:
determining a current status signal of the automatically controlled vehicle;
inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by training the neural-network parameters of a Markov decision process model on state samples collected in real time from a vehicle queue containing the automatically controlled vehicle.
Preferably, the markov decision process model is constructed by the following steps:
acquiring queue state information of a vehicle queue containing the automatically controlled vehicle, and establishing a dynamic equation of the queue system from the queue state information;
according to the dynamic equation of the queue system, a quadratic optimization control equation is constructed by taking the minimized state error and the input as objective functions;
and constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
Preferably, the acquiring queue state information of a vehicle queue containing the automatically controlled vehicle and establishing a dynamic equation of the queue system from the queue state information includes the following steps:
obtaining the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue;
acquiring an expected speed from the head vehicle, obtaining an expected inter-vehicle distance based on a preset range policy, and establishing a state error equation of each vehicle from the expected speed of the head vehicle, the expected inter-vehicle distance, and the current speed and inter-vehicle distance;
and combining the state error equations of all the vehicles, and obtaining the dynamic equation of the queue system after discretization processing based on the state equations of all the vehicles in the continuous time queue.
Preferably, the preset range policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, the expected vehicle speed is obtained from the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance; in the calculation formula, V(h) denotes the expected vehicle speed, h the vehicle distance, h_min the preset minimum vehicle distance, h_max the preset maximum vehicle distance, and v_max the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
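The range-policy steps above can be sketched as follows. Since the patent's exact formula is not reproduced in this text, the linear ramp between h_min and h_max is an assumption for illustration, and all numeric parameter values are hypothetical:

```python
def desired_speed(h, h_min=5.0, h_max=35.0, v_max=30.0):
    """Range policy V(h): maps inter-vehicle distance h to a desired speed.

    Assumed linear form: 0 below h_min, v_max above h_max, and a linear
    ramp in between (the patent's actual formula is not shown in the text).
    """
    if h < h_min:        # too close to the preceding vehicle: stop
        return 0.0
    if h > h_max:        # far enough away: travel at the maximum speed
        return v_max
    return v_max * (h - h_min) / (h_max - h_min)

def desired_distance(v, h_min=5.0, h_max=35.0, v_max=30.0):
    """Inverse of the assumed range policy: expected distance for speed v."""
    return h_min + (h_max - h_min) * v / v_max
```

Under these assumptions, the desired speed rises linearly with distance, and each vehicle's expected distance follows from the expected speed by inverting the same mapping.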
Preferably, the dynamic equation of the queue system obtained after the discretization process is as follows:
y_{i+1} = A_0 y_i + B_1 u_i + B_2 u_{i-1};
where y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced delay, λ_j and the associated coefficients are system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the queue other than the head vehicle, and the partial derivative of the range policy at the expected vehicle distance also enters as a system parameter.
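The discrete-time queue dynamics above can be simulated directly; a minimal sketch follows, where the matrices are illustrative placeholders rather than the patent's actual parameters:

```python
import numpy as np

def queue_step(y, u, u_prev, A0, B1, B2):
    """One sampling step of y_{i+1} = A0 y_i + B1 u_i + B2 u_{i-1}.

    The B2 u_{i-1} term carries the control applied one sampling interval
    earlier, which is how the network-induced delay enters the model.
    """
    return A0 @ y + B1 @ u + B2 @ u_prev

# illustrative placeholder matrices (not the patent's parameters)
A0 = np.eye(2)
B1 = 0.1 * np.eye(2)
B2 = 0.05 * np.eye(2)
y1 = queue_step(np.array([1.0, 2.0]), np.array([1.0, 1.0]), np.zeros(2), A0, B1, B2)
```

Iterating this update while keeping the previous control in memory reproduces the delayed closed-loop behavior of the queue.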
Preferably, the quadratic optimization control equation constructed from the dynamic equation of the queue system, with minimization of the state error and the control input as the objective function, is as follows:
where N is the number of sampling intervals, C and D are coefficient matrices, and c_1 and c_2 are preset coefficients.
Preferably, obtaining the intelligent optimization control model by training the neural-network parameters of the Markov decision process model on state samples collected in real time from the vehicle queue containing the automatically controlled vehicle includes:
establishing a deep deterministic policy gradient (DDPG) algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the parameters of the Markov decision process model;
in each time slot, given the input state s_k, the current actor network outputs the corresponding action policy μ(s_k | θ^μ); the policy is executed, the next state s_{k+1} is obtained from the state transition function, and the corresponding reward r_k is obtained from the reward function; the transition (s_k, a_k, r_k, s_{k+1}) is stored in an experience replay buffer to accumulate state samples;
The current critic network updates its parameter θ^Q by minimizing the following mean-square-error loss function:
where M is the number of samples in a mini-batch, Q(s_t, a_t | θ^Q) is the current Q value obtained by feeding s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′})
where r_t is the corresponding value of the reward function, Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1} | θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1};
The current actor network updates its parameter θ^μ through the policy gradient function:
The target critic network and the target actor network update their parameters θ^{Q′} and θ^{μ′} respectively as follows:
θ^{Q′} ← δ θ^Q + (1 − δ) θ^{Q′}
θ^{μ′} ← δ θ^μ + (1 − δ) θ^{μ′}
where δ is a fixed constant with 0 < δ < 1.
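The target-value and soft-update rules above are simple enough to state in code; this is a minimal sketch of those two formulas, with parameters represented as a plain dictionary rather than actual network weights:

```python
def td_target(r, q_next, gamma=0.99):
    """Target Q value x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return r + gamma * q_next

def soft_update(theta_target, theta_current, delta=0.005):
    """Soft update theta' <- delta*theta + (1 - delta)*theta', 0 < delta < 1.

    Applied to both the target critic (theta_Q') and the target actor
    (theta_mu') after each training step.
    """
    return {k: delta * theta_current[k] + (1.0 - delta) * theta_target[k]
            for k in theta_target}
```

The small δ keeps the target networks changing slowly, which is what stabilizes the mean-square-error regression performed by the current critic.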
In a second aspect, an embodiment of the present invention provides an intelligent cruise control apparatus, including a state signal unit and an intelligent control unit;
the state signal unit is used for determining a current state signal of the automatic control vehicle;
the intelligent control unit is used for inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by training a Markov decision process model on state samples collected in real time from a vehicle queue containing the automatically controlled vehicle.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the intelligent cruise control method according to any one of the above-mentioned first aspects when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the intelligent cruise control method according to any one of the above-mentioned first aspects.
According to the intelligent cruise control method and apparatus, electronic device and storage medium provided by the embodiments, the current state signal of the automatically controlled vehicle is input into an intelligent optimization control model to realize intelligent cruise control of the vehicle; the model is obtained by training the neural-network parameters of a Markov decision process model on state samples collected in real time from the vehicle queue containing the automatically controlled vehicle. By continuously interacting with the environment, the embodiments can keep learning and adjusting the optimization control strategy for networked cruise control, adapting to complex and changeable real network dynamics, achieving safe and stable driving of the autonomous vehicle, and overcoming the unpredictability of complex traffic environments and the unreliability of the network in existing cruise control methods based on networked control.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a smart cruise control method according to the present invention;
FIG. 2 is a schematic diagram of a smart cruise control scenario based on networked control provided by the present invention;
FIG. 3 is a diagram of a smart cruise control architecture based on networked control provided by the present invention;
fig. 4 is a schematic structural diagram of an intelligent cruise control device provided by the invention;
FIG. 5 is a block diagram of an intelligent optimization control module provided by the present invention;
FIG. 6 is a block diagram of a system modeling module provided by the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following describes a smart cruise control method, apparatus, electronic device and storage medium provided by the present invention with reference to fig. 1 to 7.
The embodiment of the invention provides an intelligent cruise control method. Fig. 1 is a schematic flow chart of a smart cruise control method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
Specifically, the vehicle queue in the embodiment of the invention comprises manually driven vehicles and CCC vehicles; each vehicle in the queue is provided with a communication device, and the CCC automatically driven vehicle can receive state information, including headway, vehicle speed and acceleration, from other vehicles through V2V communication.
The intelligent optimization control model is obtained by training the neural-network parameters of a Markov decision process model on state samples collected in real time from the vehicle queue containing the automatically controlled vehicle.
Specifically, a dynamic equation of the vehicle queue system is constructed by analyzing vehicle dynamics and wireless-network characteristics; an optimization control problem is then formulated that accounts for dynamic, time-varying network communication delay and the expected state, and an MDP model is built. Using a DRL algorithm, samples are generated by continuous interaction with the environment and the neural network is trained, finally yielding an intelligent optimization control strategy for the automatically controlled vehicle. The vehicle can thus track the ideal expected speed while always keeping a safe distance from the preceding vehicle, and the stable operation of the control system and the vehicle queue under dynamic network conditions is guaranteed.
The method provided by the embodiment of the invention can continuously and intelligently learn and adjust the optimization control strategy of networked cruise control through continuously interacting with the environment, thereby being suitable for the actual complex and changeable network dynamic scene and realizing the safe and stable driving of the CCC automatic driving vehicle.
Based on any one of the above embodiments, the construction process of the markov decision process model comprises the following steps:
acquiring queue state information of a vehicle queue containing the automatically controlled vehicle, and establishing a dynamic equation of the queue system from the queue state information;
it should be noted that, due to the flexible network topology between vehicles in the CCC system, each vehicle can communicate with nearby vehicles. Through wireless V2V communication, the CCC vehicle can acquire real-time state information such as headway, speed and acceleration of other vehicles in the fleet, so that the whole vehicle queue can be modeled. Meanwhile, the CCC can provide services for heterogeneous vehicle queues, so that the sequence and the number of manually driven vehicles and CCC automatic control vehicles in a fleet are variable, and the requirements of real traffic scenes on the flexibility of the vehicle queues are met better. Generally, the automatic control vehicle does not need to consider the vehicle state of the subsequent vehicle, and in order to describe the technical scheme more clearly, the embodiment of the invention takes the tail vehicle as the CCC automatic control vehicle and other vehicles as the manual driving vehicles as examples. In addition, the method provided by the embodiment of the invention is also suitable for controlling the automatic control vehicle in a more complex model, and when the queue model changes, the modeling method provided by the embodiment of the invention can be used for constructing a corresponding system dynamic equation according to the specific situation of the queue.
According to the dynamic equation of the queue system, a quadratic optimization control equation is constructed by taking the minimized state error and the input as objective functions;
It is noted that the goal of cruise control is to enable the vehicles in the queue to track a desired speed and maintain a desired inter-vehicle distance while achieving comfortable and smooth acceleration control. A quadratic optimization control problem can therefore be constructed with the goal of minimizing the speed and distance errors and the control inputs. On the one hand, however, such optimization control problems are difficult to solve directly because of the high-dimensional state space and complex physical properties. On the other hand, owing to actual network communication delay and the dynamic, time-varying nature of the expected state, traditional optimization decision methods that rely on fixed-parameter models and static strategies carry high robustness and stability risks. The embodiment of the invention therefore provides an intelligent optimization control method based on DRL (deep reinforcement learning) to improve the adaptability and stability of the automatically controlled vehicle under complex dynamic conditions.
And constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
It should be noted that a Reinforcement Learning (RL) problem is usually described as an MDP (Markov Decision Process), which generally comprises states, actions, a state transition function and a reward function; the MDP model of the system is established from the system model and the optimization problem. An intelligent optimization control strategy is then obtained with a Deep Reinforcement Learning (DRL) algorithm applied to the MDP model. For a continuous-action control problem such as cruise control, traditional algorithms based on discrete actions, such as Q-learning, DQN (Deep Q-Network) and Actor-Critic, often degrade owing to poor convergence and stability. The embodiment of the invention is therefore based on the Deep Deterministic Policy Gradient (DDPG) algorithm in DRL: following the defined MDP model, samples are collected and the networks are trained through continuous interaction with the environment, the neural-network parameters are continuously optimized toward maximizing the reward function, and finally an intelligent optimization control policy output signal is generated in real time from the current state input of the CCC automatically controlled vehicle, realizing its safe and stable control.
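The sample collection through environment interaction described above relies on an experience replay buffer from which mini-batches are drawn for training; a minimal sketch (capacity and interface are illustrative assumptions, not the patent's specification):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s_k, a_k, r_k, s_{k+1}) transitions for mini-batch sampling."""

    def __init__(self, capacity=100000):
        # oldest transitions are discarded automatically once full
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, m):
        # uniform mini-batch of M transitions, as used in the critic's
        # mean-square-error loss
        return random.sample(self.buf, m)

    def __len__(self):
        return len(self.buf)
```

Sampling uniformly from past transitions breaks the temporal correlation of consecutive queue states, which is one of the properties DDPG depends on for stable training.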
Based on any one of the embodiments, the acquiring queue state information of a vehicle queue built by automatically controlled vehicles and establishing a dynamic equation of the queue system according to the queue state information includes the following steps:
obtaining the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue;
acquiring an expected speed through a head vehicle, acquiring an expected distance between vehicles based on a preset range strategy, and establishing a state error equation of each vehicle according to the expected speed of the head vehicle, the expected distance between vehicles and the current speed and distance between vehicles;
and combining the state error equations of all the vehicles, and obtaining the dynamic equation of the queue system after discretization processing based on the state equations of all the vehicles in the continuous time queue.
Specifically, establishing a queue system model according to a queue includes:
collecting the distance, speed and acceleration information of each vehicle in the queue according to V2V communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information;
obtaining expected vehicle speed according to the head vehicle, and combining a range strategy to obtain an expected vehicle distance of each vehicle;
establishing a state error equation of each vehicle according to the expected vehicle speed and the expected vehicle distance as well as the current vehicle speed and the current vehicle distance of each vehicle;
and simultaneously establishing state error equations of all vehicles to obtain a queue state equation based on continuous time, and obtaining a queue system model based on discrete time after discretization.
Since wireless V2V communication is introduced to promote state-information sharing between vehicles, a vehicle dynamic equation with time delay is obtained by analyzing the influence of the delay characteristics of the wireless network on the CCC automatically controlled vehicle. The state error equations of all manually driven vehicles and CCC automatically driven vehicles in the queue are then combined to obtain a continuous-time system state error equation. Finally, the continuous-time system state equation is discretized by sampling to obtain a queue system model based on discrete time.
Based on any of the above embodiments, the preset scope policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, the expected vehicle speed is obtained from the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance, with the calculation formula V(h) = v_max·(h − h_min)/(h_max − h_min), where V(h) represents the desired vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
It should be noted that dynamic analysis is performed on the manually driven vehicles and the CCC automatically controlled vehicle: the state information of each vehicle in the queue, such as vehicle distance, vehicle speed and acceleration, is obtained through V2V communication, and the vehicle dynamic equation can then be established from the relationships between them. The speed of the head vehicle in the queue is used as the expected speed of the other vehicles, and the expected distance can be obtained from the range strategy. Once the desired vehicle speed and the desired vehicle distance are obtained, the state error equation of each vehicle can be derived. The expected vehicle distance and vehicle speed satisfy the following range strategy:
where V(h) represents the desired vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed.
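The range strategy described above can be sketched as a small function. The numeric values of h_min, h_max and v_max below are illustrative assumptions, not values from the patent, and the linear ramp between the two distance thresholds is the simplest form consistent with the listed parameters:

```python
def desired_speed(h, h_min=5.0, h_max=35.0, v_max=30.0):
    """Range strategy V(h): zero below h_min, linear ramp between
    h_min and h_max, capped at v_max above h_max.
    All parameter values here are illustrative assumptions."""
    if h < h_min:
        return 0.0
    if h > h_max:
        return v_max
    return v_max * (h - h_min) / (h_max - h_min)
```

A desired distance h* can then be read off as the distance at which V(h*) equals the head vehicle's expected speed.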
Based on any of the above embodiments, the dynamic equation of the queue system obtained after the discretization process is as follows:
y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1};
where y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced time delay, λ_j and a second gain represent system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the vehicle queue excluding the head vehicle, and V′(h*) is the partial derivative of the range strategy at the desired vehicle distance.
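The discrete-time queue dynamics y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1} can be simulated with plain matrix arithmetic. The tiny 2-state matrices below are placeholders for illustration, not the system matrices derived in the patent:

```python
def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def queue_step(y, u_i, u_prev, A0, B1, B2):
    """One step of y_{i+1} = A0*y_i + B1*u_i + B2*u_{i-1};
    B1 and B2 are input vectors for the scalar control u."""
    Ay = matvec(A0, y)
    return [a + b1 * u_i + b2 * u_prev for a, b1, b2 in zip(Ay, B1, B2)]

# Placeholder matrices for a 2-dimensional error state
A0 = [[1.0, 0.1], [0.0, 1.0]]
B1 = [0.0, 0.1]   # effect of the current control input
B2 = [0.0, 0.05]  # delayed effect of the previous control input
y1 = queue_step([1.0, 0.0], 0.5, 0.0, A0, B1, B2)
```

The delayed input term B2·u_{i-1} is what carries the network-induced delay τ into the discrete model.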
Based on any of the above embodiments, the quadratic optimization control equation, constructed from the dynamic equation of the queue system with minimization of the state error and the control input as the objective function, is as follows:
wherein N is the number of sampling intervals, C and D are coefficient matrices:
c1 and c2 are preset coefficients.
Specifically, fig. 2 is a schematic diagram of an intelligent cruise control scene based on networked control according to an embodiment of the present invention. For ease of understanding, the vehicle queue in the embodiment of the invention consists of m+1 vehicles, where the tail vehicle, i.e. vehicle #1, is the CCC autonomous vehicle, the other vehicles are human-driven vehicles, and the front vehicle, i.e. vehicle #m+1, is the head vehicle. Each vehicle in the fleet is equipped with a communication device, and the CCC autonomous vehicle can receive status information from the other vehicles, including headway (vehicle distance), vehicle speed and acceleration, via V2V communication technology. To clearly illustrate the technical scheme of the embodiment of the invention, the head vehicle serves as the tracking target of the CCC autonomous vehicle and runs at a dynamically changing vehicle speed.
As shown in FIG. 2, the equations for the dynamics of a human manually driven vehicle may be defined as follows:
where v_j(t) represents the vehicle speed of the jth vehicle, h_j(t) represents the distance between the jth vehicle and the preceding vehicle, the overdot denotes the derivative with respect to time t, λ_j and a second gain represent system parameters related to human driving behavior, and V(h) is the desired speed based on the vehicle distance.
While the dynamic equations for a CCC autonomous vehicle may be defined as follows:
wherein u (t) represents the control strategy, i.e. the acceleration of the CCC autonomous vehicle, and τ (t) represents the network-induced time delay in the networked control process.
The goal of each vehicle in the fleet is to reach the desired vehicle distance h*(t) and the desired vehicle speed v*(t) = V(h*(t)). The distance error h̃_j(t) = h_j(t) − h*(t) and the speed error ṽ_j(t) = v_j(t) − v*(t) are defined from the deviation between the actual state and the desired state. Applying a linear first-order approximation to the vehicle dynamics model, the error dynamics model of the vehicle fleet can be obtained as follows:
defining a state vector:
the dynamic equation of the system obtained by simultaneously establishing the error dynamics equation of each vehicle is as follows:
In the above formula:
By sampling and discretizing the system dynamic equation, the discrete-time system dynamic model for the ith sampling interval is obtained as follows:
y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1}
where y_i = y(iΔT) and u_i = u(iΔT) represent the state variable and the acceleration control strategy at the current moment, respectively, ΔT represents the sampling interval, and the other parameters are:
The aim of cruise control is to make the vehicle track the target distance and speed so that the whole fleet is kept in the equilibrium state y = 0. To achieve optimal control, a quadratic cost function is defined as:
in the above formula, N is the number of sampling intervals, C and D are coefficient matrices:
where c_1 and c_2 are preset coefficients; in the embodiment of the present invention they are set to 1 and 0.1, respectively.
In summary, the cruise control system optimization problem can be constructed as follows:
s.t. y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1}
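The quadratic objective can be evaluated numerically. A minimal sketch, assuming (consistently with the stated coefficients c1 = 1 and c2 = 0.1) that the cost accumulates a weighted squared state error plus a weighted squared control input over the horizon:

```python
def quadratic_cost(ys, us, c1=1.0, c2=0.1):
    """Hypothetical quadratic cost: sum over the horizon of
    c1*||y_i||^2 (state error) + c2*u_i^2 (control effort).
    The exact weighting matrices C, D of the patent are assumed
    here to be scalar multiples c1, c2 of the identity."""
    return sum(
        c1 * sum(v * v for v in y) + c2 * u * u
        for y, u in zip(ys, us)
    )

# Two-step horizon with a 2-dimensional error state
cost = quadratic_cost(ys=[[1.0, 0.0], [0.5, 0.5]], us=[1.0, 0.0])
```

Minimizing this cost subject to the discrete queue dynamics is the optimization problem the DRL agent is later trained to solve.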
Considering the influence of the dynamic time-varying characteristics of the network, and in order to improve the environmental adaptability and self-learning capability of the networked intelligent cruise control system, the embodiment of the invention provides an intelligent optimization control method based on DRL to solve the above optimization problem.
MDP is usually used to formally describe the RL problem: at each time slot k, the agent observes the current state from the environment and makes a decision; after performing an action it obtains the next state and adjusts its policy according to the reward value fed back. In the embodiment of the invention, the states, actions, state transition function and reward function of the MDP are defined according to the cruise control system model and the optimization problem constructed under the network dynamic scene.
1) State
Considering that the optimal control strategy is affected by both the current state and the delayed control signal caused by the network delay, the new state vector is defined as the current queue state augmented with the previous control input, s_k = [y_k, u_{k-1}]:
2) Action
For the networked cruise control system, the action may be defined as the acceleration control strategy:
a_k = u_k
3) State transition function
Based on the discrete-time system model of the networked cruise control system and the state vector s_k, the state transition function can be expressed as:
s_{k+1} = s_k·E + a_k·F
where:
4) Reward function
Unlike minimizing the cost function in optimization theory, the goal of the intelligent algorithm is to maximize the long-term cumulative reward value, so the reward function can be defined as:
where:
The long-term cumulative reward value, referred to as the return, is expressed as follows:
In the above formula, 0 < γ < 1 is the discount factor.
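Since the reward is defined as the opposite of minimizing a cost, and the return is the discounted sum of rewards, both can be sketched as follows. The exact weighting inside the reward is an assumption mirroring the quadratic cost function of the optimization problem:

```python
def reward(y, u, c1=1.0, c2=0.1):
    """Assumed reward: negative quadratic stage cost, so maximizing
    the reward corresponds to minimizing state error and control effort."""
    return -(c1 * sum(v * v for v in y) + c2 * u * u)

def discounted_return(rewards, gamma=0.9):
    """Long-term cumulative reward: sum_k gamma^k * r_k, with 0 < gamma < 1."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```

The discount factor γ trades off immediate tracking accuracy against long-run fleet stability.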
Because the action values of the cruise control system are continuous, the DDPG method in DRL can well overcome the system performance degradation caused by discrete action design. Therefore, the embodiment of the invention provides an intelligent optimization control method based on DDPG to obtain the intelligent control strategy, thereby improving the convergence and stability of the system.
Based on any one of the embodiments, the intelligent optimization control model is obtained by performing neural network parameter training on a markov decision process model based on a vehicle queue real-time collected state sample constructed by the automatic control vehicle, and includes:
establishing a depth certainty strategy gradient algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the Markov decision process model parameters;
It should be noted that the intelligent cruise control architecture based on networked control is shown in fig. 3. The DDPG mainly comprises four deep neural networks: the current actor network μ(s|θ^μ), the target actor network μ′(s|θ^μ′), the current critic network Q(s,a|θ^Q) and the target critic network Q′(s,a|θ^Q′), where μ(·) is a deterministic action policy, Q(·) is an action-value evaluation function, and θ represents the corresponding neural network parameters. The agent obtains the control strategy μ by training the actor network, and obtains the corresponding Q value by training the critic network to evaluate the control strategy.
In each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ). Executing the policy a_k yields the next state s_{k+1} from the state transition function and the corresponding reward r_k from the reward function; the tuple (s_k, a_k, s_{k+1}, r_k) is stored in an experience replay buffer to obtain state samples.
The current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by feeding s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′)
where r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′) is the next Q value produced by the target critic network, and μ′(s_{t+1}|θ^μ′) is the next action policy generated by the target actor network from the input state s_{t+1};
The current actor network updates its parameters θ^μ through the following policy gradient function:
The target actor network and the target critic network update their parameters θ^Q′ and θ^μ′ respectively as follows:
θ^Q′ ← δθ^Q + (1−δ)θ^Q′
θ^μ′ ← δθ^μ + (1−δ)θ^μ′
where δ is a fixed constant with 0 < δ < 1.
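The soft-update rule θ′ ← δθ + (1−δ)θ′ used for both target networks can be sketched for a flat list of parameters; the value δ = 0.005 below is an illustrative assumption:

```python
def soft_update(target_params, current_params, delta=0.005):
    """Soft update: theta' <- delta*theta + (1 - delta)*theta'.
    With 0 < delta < 1, the target parameters drift slowly toward
    the current network, which stabilizes training."""
    return [
        delta * cur + (1.0 - delta) * tgt
        for tgt, cur in zip(target_params, current_params)
    ]

updated = soft_update([0.0, 1.0], [1.0, 1.0], delta=0.1)
```

A small δ makes the target networks change slowly, so the TD targets used by the critic remain nearly stationary between updates.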
Specifically, the intelligent cruise control method based on networked control can be divided into two steps: sampling and training.
1) Sampling
First, enough samples need to be collected for training. In each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ). To ensure effective exploration of the continuous action space, random noise η is added to obtain the exploration policy a_k = μ(s_k|θ^μ) + η.
Executing the policy a_k yields the next state s_{k+1} from the state transition function and the corresponding reward r_k from the reward function; the tuple (s_k, a_k, s_{k+1}, r_k) is then stored as a sample in the experience replay buffer. These steps are repeated until enough samples have been generated.
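The sampling loop described above (noisy action, environment step, buffer storage) can be sketched as follows; the buffer capacity and noise scale are illustrative assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay buffer for (s, a, s', r) tuples."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def add(self, s, a, s_next, r):
        self.buf.append((s, a, s_next, r))

    def sample(self, m):
        """Draw a mini-batch of m samples uniformly at random."""
        return random.sample(list(self.buf), m)

def explore(mu_s, noise_scale=0.1):
    """Exploration policy a_k = mu(s_k) + eta, with Gaussian noise eta."""
    return mu_s + random.gauss(0.0, noise_scale)

buffer = ReplayBuffer()
buffer.add(s=[0.0], a=explore(0.5), s_next=[0.1], r=-0.01)
```

Uniform sampling from the buffer is what later breaks the temporal correlation between consecutive (s, a, s′, r) tuples during training.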
2) Training
In the training process of the embodiment of the invention, 200 time slots form one episode. In each episode, a mini-batch of M samples (s_t, a_t, s_{t+1}, r_t) is randomly drawn for training, which reduces the correlation of the sample data and improves the training efficiency.
The current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by feeding s_t and a_t into the current critic network, and x_t is the target Q value, which can be expressed as:
x_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′)
In the above formula, r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′) is the next Q value produced by the target critic network, and μ′(s_{t+1}|θ^μ′) is the next action policy generated by the target actor network from the input state s_{t+1}.
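The critic's TD target x_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1})) and its mean-square-error loss can be sketched with scalar Q values; the Q values in the mini-batch below are illustrative numbers, not actual network outputs:

```python
def td_target(r, q_next, gamma=0.9):
    """Target value x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return r + gamma * q_next

def critic_loss(batch, gamma=0.9):
    """Mean squared error between current Q values and TD targets.
    batch: list of (q_current, r, q_next) triples."""
    m = len(batch)
    return sum((td_target(r, qn, gamma) - q) ** 2 for q, r, qn in batch) / m

# Two illustrative transitions: the first already matches its target
loss = critic_loss([(2.0, 1.0, 2.0), (0.0, 0.0, 1.0)], gamma=0.5)
```

Because q_next comes from the slowly moving target networks, minimizing this loss does not chase a target that shifts with every gradient step.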
The current actor network updates its parameters θ^μ through the following policy gradient function:
where M is the number of samples in a mini-batch and ∇ is the gradient operator; the main objective of this formula is to increase the probability of the actions for which the current actor network obtains a larger Q value.
Then, the target actor network and the target critic network update their parameters θ^Q′ and θ^μ′ respectively by means of a "soft update":
θ^Q′ ← δθ^Q + (1−δ)θ^Q′
θ^μ′ ← δθ^μ + (1−δ)θ^μ′
Wherein 0 < δ < 1 is a fixed constant.
Finally, after training over enough episodes, the optimized current actor network parameters θ^μ* can be obtained. Thus, for each input state s, the current actor network can generate the optimization control strategy of the networked cruise control system in real time:
u*=a*=μ(s|θμ*)。
the following describes an intelligent cruise control device provided by the present invention, and the following description and the above-described intelligent cruise control method can be referred to correspondingly.
Fig. 4 is a schematic structural diagram of an intelligent cruise control device according to an embodiment of the present invention, and as shown in fig. 4, the device includes a status signal unit 410 and an intelligent control unit 420;
the state signal unit 410 is used for determining a current state signal of the automatic control vehicle;
the intelligent control unit 420 is configured to input a current state signal of the automatically controlled vehicle into an intelligent optimal control model, so as to implement intelligent cruise control on the automatically controlled vehicle;
the intelligent optimization control model is obtained by training a Markov decision process model based on vehicle queues built by the automatic control vehicles and collected state samples in real time.
The device provided by the embodiment of the invention can continuously and intelligently learn and adjust the optimization control strategy of networked cruise control through continuously interacting with the environment, thereby being suitable for the actual complex and changeable network dynamic scene and realizing the safe and stable driving of the CCC automatic driving vehicle.
Based on any one of the above embodiments, the intelligent control unit comprises an intelligent optimization control module;
as shown in fig. 5, the intelligent optimization control module includes a system modeling module 510, a problem construction module 520, an MDP construction module 530, and a calculation processing module 540;
the system modeling module 510 is configured to obtain queue state information of a vehicle queue formed by automatically controlling vehicles, and establish a dynamic equation of the queue system according to the queue state information;
the problem construction module 520 is configured to construct a quadratic optimization control equation with a minimized state error and an input as an objective function according to the dynamic equation of the queue system;
the MDP building module 530 is configured to build a markov decision process model for networked control according to the dynamic equation of the queuing system and the quadratic form optimization control equation;
the calculation processing module 540 is configured to generate samples and train based on continuous interaction between the DRL algorithm and the environment, so as to obtain an intelligent optimization control strategy.
Based on any of the above embodiments, as shown in fig. 6, the system modeling module includes a state obtaining module 610, a dynamic constructing module 620, a state error constructing module 630, and a system dynamic module 640;
the state obtaining module 610 is configured to obtain vehicle distance, vehicle speed and acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
the dynamic construction module 620 is configured to establish a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed, and the acceleration information of each vehicle in the vehicle queue;
the state error establishing module 630 is configured to obtain an expected vehicle speed through a head vehicle, obtain an expected vehicle distance of each vehicle based on a preset range strategy, and establish a state error equation of each vehicle according to the expected vehicle speed of the head vehicle, the expected vehicle distance of each vehicle, and the current vehicle speed and vehicle distance of each vehicle;
and the system dynamic module 640 is configured to combine the state error equations of the vehicles, and obtain the dynamic equation of the queue system after discretization processing based on the state equations of the vehicles in the continuous-time queue.
Based on any of the above embodiments, the preset scope policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, the expected vehicle speed is obtained from the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance, with the calculation formula V(h) = v_max·(h − h_min)/(h_max − h_min), where V(h) represents the desired vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
Based on any of the above embodiments, the dynamic equation of the queue system obtained after the discretization process is as follows:
y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1};
where y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced time delay, λ_j and a second gain represent system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the vehicle queue excluding the head vehicle, and V′(h*) is the partial derivative of the range strategy at the desired vehicle distance.
Based on any of the above embodiments, the quadratic optimization control equation, constructed from the dynamic equation of the queue system with minimization of the state error and the control input as the objective function, is as follows:
wherein N is the number of sampling intervals, C and D are coefficient matrices:
c_1 and c_2 are preset coefficients.
Based on any one of the embodiments, the intelligent optimization control model is obtained by performing neural network parameter training on a markov decision process model based on a vehicle queue real-time collected state sample constructed by the automatic control vehicle, and includes:
establishing a depth certainty strategy gradient algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the Markov decision process model parameters;
in each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ); executing the policy a_k yields the next state s_{k+1} from the state transition function and the corresponding reward r_k from the reward function; the tuple (s_k, a_k, s_{k+1}, r_k) is stored in an experience replay buffer to obtain state samples;
the current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by feeding s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′)
where r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′) is the next Q value produced by the target critic network, and μ′(s_{t+1}|θ^μ′) is the next action policy generated by the target actor network from the input state s_{t+1};
the current actor network updates its parameters θ^μ through the following policy gradient function:
the target actor network and the target critic network update their parameters θ^Q′ and θ^μ′ respectively as follows:
θ^Q′ ← δθ^Q + (1−δ)θ^Q′
θ^μ′ ← δθ^μ + (1−δ)θ^μ′
where δ is a fixed constant with 0 < δ < 1.
To sum up, the intelligent cruise control method and device provided by the embodiment of the invention construct the dynamic equation of the overall vehicle queue system by comprehensively analyzing vehicle dynamics and wireless network characteristics, consider the influence of the dynamically time-varying network communication delay and the desired state, and establish the optimization control problem, from which the MDP model is constructed. An intelligent algorithm based on DRL generates samples through continuous interaction with the environment, trains the neural network, and continuously accumulates experience, so that the intelligent optimization control strategy of the automatically controlled vehicle is obtained. The automatically controlled vehicle can thus track the ideal desired vehicle speed while always keeping a safe distance from the preceding vehicle, and can also run autonomously and stably in actual complex network dynamic scenes. That is, under a scene in which the network communication delay and the desired state of the system change dynamically, the embodiment of the invention models the vehicle queue as a whole and combines optimization control theory with artificial intelligence methods to obtain the intelligent optimization control strategy of the cruise control system based on networked control, thereby realizing stable control of the CCC automatically controlled vehicle. By applying networked control and artificial intelligence technology to the automatic cruise control system of the vehicle, considering the influence of the complex dynamic environment on the control system, and designing a DRL-based method to obtain the intelligent optimization control strategy, the environmental adaptability and self-learning capability of the cruise control system are improved.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 7, the electronic device may include: a processor (processor)710, a communication Interface (Communications Interface)720, a memory (memory)730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a smart cruise control method comprising: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the smart cruise control method provided by the above methods, where the method includes: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to execute the intelligent cruise control method provided in the foregoing aspects, the method including: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A smart cruise control method, comprising:
determining a current status signal of the automatically controlled vehicle;
inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
2. The intelligent cruise control method according to claim 1, wherein building the Markov decision process model comprises the steps of:
acquiring queue state information of the vehicle queue formed by the automatically controlled vehicles, and establishing a dynamic equation of the queue system according to the queue state information;
constructing, according to the dynamic equation of the queue system, a quadratic optimization control equation that takes minimizing the state error and the control input as the objective function;
and constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
3. The intelligent cruise control method according to claim 2, wherein acquiring the queue state information of the vehicle queue formed by the automatically controlled vehicles and establishing the dynamic equation of the queue system according to the queue state information comprises the steps of:
obtaining the inter-vehicle distance, speed and acceleration of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
establishing a dynamic equation of each vehicle in the queue from its inter-vehicle distance, speed and acceleration;
acquiring the desired speed from the head vehicle, obtaining the desired inter-vehicle distance based on a preset range policy, and establishing a state error equation of each vehicle from the desired speed of the head vehicle, the desired inter-vehicle distance, and the vehicle's current speed and inter-vehicle distance;
and combining the state error equations of all the vehicles into the continuous-time state equation of the queue, then discretizing it to obtain the dynamic equation of the queue system.
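The combination step above can be sketched as stacking per-vehicle errors into one queue-level state vector. A minimal sketch; the function name and the two-component (spacing error, speed error) layout per vehicle are assumptions, not taken from the patent:

```python
import numpy as np

def queue_state_errors(spacings, speeds, desired_spacings, desired_speed):
    """Stack per-vehicle state errors [h_j - h_des_j, v_j - v_des] into one
    queue-level state vector, mirroring the combined state error equation."""
    e_h = np.asarray(spacings, dtype=float) - np.asarray(desired_spacings, dtype=float)
    e_v = np.asarray(speeds, dtype=float) - desired_speed
    # one row per vehicle, then flatten into a single state vector
    return np.column_stack([e_h, e_v]).ravel()

# two followers: spacings 10 m and 12 m against a desired 10 m,
# speeds 5 and 6 m/s against the head vehicle's desired 5 m/s
y = queue_state_errors([10.0, 12.0], [5.0, 6.0], [10.0, 10.0], 5.0)
```

The first follower is exactly on target (errors 0, 0) while the second is 2 m too far back and 1 m/s too fast, which is what the stacked vector reports.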
4. The intelligent cruise control method according to claim 3, wherein the preset range policy comprises:
if the current inter-vehicle distance is smaller than a preset minimum distance, the desired speed is 0;
if the current inter-vehicle distance is not smaller than the preset minimum distance and not larger than a preset maximum distance, the desired speed is obtained from the preset maximum speed, the current inter-vehicle distance, the preset minimum distance and the preset maximum distance according to a calculation formula in which V(h) denotes the desired speed, h the inter-vehicle distance, h_min the preset minimum distance, h_max the preset maximum distance, and v_max the preset maximum speed;
if the current inter-vehicle distance is larger than the preset maximum distance, the desired speed is the preset maximum speed;
and obtaining the desired inter-vehicle distance of each vehicle according to the desired speed.
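As a concrete illustration of the range policy above, the piecewise rule can be sketched in a few lines. The linear middle segment is an assumption (the claim only states that V(h) depends on v_max, h, h_min and h_max), and all numeric values are illustrative:

```python
def desired_speed(h, h_min=5.0, h_max=35.0, v_max=20.0):
    """Range policy V(h): desired speed as a function of inter-vehicle distance h.

    Below h_min the desired speed is 0; above h_max it is v_max; the linear
    interpolation in between is an assumed form, not taken from the patent.
    """
    if h < h_min:
        return 0.0
    if h > h_max:
        return v_max
    return v_max * (h - h_min) / (h_max - h_min)


def desired_distance(v, h_min=5.0, h_max=35.0, v_max=20.0):
    """Desired inter-vehicle distance: the inverse of the linear segment."""
    return h_min + (h_max - h_min) * v / v_max
```

With these values a 20 m gap maps to a desired speed of 10 m/s, and inverting at 10 m/s returns a 20 m desired distance, so the two mappings are mutually consistent, as the last step of claim 4 requires.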
5. The intelligent cruise control method according to claim 3, wherein the dynamic equation of the queue system obtained after discretization is:
y_{i+1} = A_0 y_i + B_1 u_i + B_2 u_{i-1};
where y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced delay, λ_j and its companion coefficients are system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the queue excluding the head vehicle, and the partial derivative ∂V/∂h of the range policy evaluated at the desired inter-vehicle distance enters the system matrices A_0, B_1 and B_2.
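One step of the discretized dynamics above can be sketched directly. The matrices below are placeholders; their true entries, built from ΔT, τ, λ_j and the range-policy slope, are given in the patent's formulas and are not reproduced here:

```python
import numpy as np

def queue_step(y, u, u_prev, A0, B1, B2):
    """One discrete-time step y_{i+1} = A0 y_i + B1 u_i + B2 u_{i-1}.

    The B2 u_{i-1} term carries the control delayed by the network-induced
    delay tau, which is why the previous input appears alongside the current one.
    """
    return A0 @ y + B1 @ u + B2 @ u_prev

# placeholder matrices for a single follower with state [spacing error, speed error]
A0 = np.array([[1.0, 0.1],
               [0.0, 1.0]])
B1 = np.array([[0.0], [0.08]])   # current input weight (assumed, ~DeltaT - tau)
B2 = np.array([[0.0], [0.02]])   # delayed input weight (assumed, ~tau)
y1 = queue_step(np.array([1.0, 0.0]), np.array([1.0]), np.array([0.0]), A0, B1, B2)
```

Keeping the delayed input u_{i-1} as a separate term is what lets the later Markov decision process model account for communication delay explicitly instead of folding it into the state.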
6. The intelligent cruise control method according to claim 2, wherein the quadratic optimization control equation constructed from the dynamic equation of the queue system, taking minimization of the state error and the control input as the objective function, is:
min_u Σ_{i=1}^{N} (y_i^T C y_i + u_i^T D u_i);
where N is the number of sampling intervals, C and D are coefficient matrices, and c_1 and c_2 are preset coefficients from which C and D are built.
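The quadratic objective can be evaluated as follows. A minimal sketch; the diagonal structure of C and D built from c_1 and c_2, and the numeric values, are assumptions for illustration:

```python
import numpy as np

def quadratic_cost(ys, us, C, D):
    """Finite-horizon quadratic objective sum_i (y_i^T C y_i + u_i^T D u_i),
    i.e. a weighted trade-off between state error and control effort."""
    return sum(float(y @ C @ y) for y in ys) + sum(float(u @ D @ u) for u in us)

c1, c2 = 1.0, 0.1                 # preset coefficients (values assumed)
C = np.diag([c1, c1])             # assumed diagonal state-error weight
D = np.array([[c2]])              # assumed scalar input weight
J = quadratic_cost([np.array([1.0, 2.0])], [np.array([3.0])], C, D)
```

Here the single state error [1, 2] contributes 5.0 and the input 3 contributes 0.9, so J = 5.9; increasing c_2 relative to c_1 penalizes aggressive acceleration more heavily.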
7. The intelligent cruise control method according to claim 1, wherein obtaining the intelligent optimization control model by performing neural network parameter training on the Markov decision process model based on the state samples collected in real time from the vehicle queue formed by the automatically controlled vehicles comprises the steps of:
establishing a deep deterministic policy gradient algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the parameters of the Markov decision process model;
in each time slot, the current actor network outputs the action policy μ(s_k|θ^μ) corresponding to the input state s_k; executing the policy yields the next state s_{k+1} through the state transfer function and the corresponding reward r_k through the reward function; the resulting transition (s_k, a_k, r_k, s_{k+1}) is stored in an experience replay buffer to obtain the state samples;
the current critic network updates its parameters θ^Q by minimizing the mean-square-error loss
L(θ^Q) = (1/M) Σ_t (x_t − Q(s_t, a_t|θ^Q))²,
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by inputting s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′})
where r_t is the corresponding value of the reward function, γ is the discount factor, Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1}|θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1};
the current actor network updates its parameters θ^μ through the policy gradient
∇_{θ^μ}J ≈ (1/M) Σ_t ∇_a Q(s, a|θ^Q)|_{s=s_t, a=μ(s_t)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_t},
where ∇ is the gradient operator;
and the target critic network and the target actor network update their parameters θ^{Q′} and θ^{μ′} respectively as follows:
θ^{Q′} ← δθ^Q + (1−δ)θ^{Q′}
θ^{μ′} ← δθ^μ + (1−δ)θ^{μ′}
where δ is a fixed constant with 0 < δ < 1.
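The target-value computation and the soft updates in claim 7 can be sketched without the network internals. A minimal numpy sketch in which the actor and critic networks are abstracted into plain values and parameter dictionaries (all names and constants are illustrative):

```python
import numpy as np

GAMMA = 0.99   # discount factor gamma (value assumed)
DELTA = 0.005  # soft-update rate delta, 0 < delta < 1 (value assumed)

def target_q(r, q_next, gamma=GAMMA):
    """Target value x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}|theta_mu')|theta_Q')."""
    return r + gamma * q_next

def critic_loss(targets, q_values):
    """Mean-square-error loss (1/M) * sum_t (x_t - Q(s_t, a_t|theta_Q))^2."""
    targets, q_values = np.asarray(targets), np.asarray(q_values)
    return float(np.mean((targets - q_values) ** 2))

def soft_update(theta_target, theta_current, delta=DELTA):
    """Soft update theta' <- delta * theta + (1 - delta) * theta',
    applied element-wise to each named parameter array."""
    return {k: delta * theta_current[k] + (1 - delta) * theta_target[k]
            for k in theta_target}
```

Because δ is small, the target networks trail the current networks slowly, which is what keeps the target Q value x_t stable while the critic is being trained.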
8. An intelligent cruise control device, comprising a state signal unit and an intelligent control unit, wherein:
the state signal unit is configured to determine a current state signal of an automatically controlled vehicle;
the intelligent control unit is configured to input the current state signal of the automatically controlled vehicle into an intelligent optimization control model to realize intelligent cruise control of the automatically controlled vehicle;
and the intelligent optimization control model is obtained by training a Markov decision process model based on state samples collected in real time from a vehicle queue formed by the automatically controlled vehicles.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the intelligent cruise control method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the intelligent cruise control method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110458260.3A CN113335277A (en) | 2021-04-27 | 2021-04-27 | Intelligent cruise control method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113335277A true CN113335277A (en) | 2021-09-03 |
Family
ID=77468696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110458260.3A Pending CN113335277A (en) | 2021-04-27 | 2021-04-27 | Intelligent cruise control method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113335277A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109606367A (en) * | 2018-11-06 | 2019-04-12 | 北京工业大学 | The optimum linearity control method and device of cruise control system based on car networking |
CN109624986A (en) * | 2019-03-01 | 2019-04-16 | 吉林大学 | A kind of the study cruise control system and method for the driving style based on pattern switching |
CN109733415A (en) * | 2019-01-08 | 2019-05-10 | 同济大学 | A kind of automatic Pilot following-speed model that personalizes based on deeply study |
US20200033868A1 (en) * | 2018-07-27 | 2020-01-30 | GM Global Technology Operations LLC | Systems, methods and controllers for an autonomous vehicle that implement autonomous driver agents and driving policy learners for generating and improving policies based on collective driving experiences of the autonomous driver agents |
CN110989576A (en) * | 2019-11-14 | 2020-04-10 | 北京理工大学 | Target following and dynamic obstacle avoidance control method for differential slip steering vehicle |
CN111267831A (en) * | 2020-02-28 | 2020-06-12 | 南京航空航天大学 | Hybrid vehicle intelligent time-domain-variable model prediction energy management method |
CN112162555A (en) * | 2020-09-23 | 2021-01-01 | 燕山大学 | Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet |
CN112580148A (en) * | 2020-12-20 | 2021-03-30 | 东南大学 | Heavy-duty operation vehicle rollover prevention driving decision method based on deep reinforcement learning |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113734167A (en) * | 2021-09-10 | 2021-12-03 | 苏州智加科技有限公司 | Vehicle control method, device, terminal and storage medium |
CN114387787A (en) * | 2022-03-24 | 2022-04-22 | 华砺智行(武汉)科技有限公司 | Vehicle track control method and device, electronic equipment and storage medium |
CN114387787B (en) * | 2022-03-24 | 2022-08-23 | 华砺智行(武汉)科技有限公司 | Vehicle track control method and device, electronic equipment and storage medium |
CN116257069A (en) * | 2023-05-16 | 2023-06-13 | 睿羿科技(长沙)有限公司 | Unmanned vehicle formation decision and speed planning method |
CN116257069B (en) * | 2023-05-16 | 2023-08-08 | 睿羿科技(长沙)有限公司 | Unmanned vehicle formation decision and speed planning method |
CN117055586A (en) * | 2023-06-28 | 2023-11-14 | 中国科学院自动化研究所 | Underwater robot tour search and grabbing method and system based on self-adaptive control |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113335277A (en) | Intelligent cruise control method and device, electronic equipment and storage medium | |
Zhu et al. | Human-like autonomous car-following model with deep reinforcement learning | |
Liang et al. | A deep reinforcement learning network for traffic light cycle control | |
WO2021208771A1 (en) | Reinforced learning method and device | |
Zhu et al. | Multi-robot flocking control based on deep reinforcement learning | |
CN109990790B (en) | Unmanned aerial vehicle path planning method and device | |
Li et al. | A reinforcement learning-based vehicle platoon control strategy for reducing energy consumption in traffic oscillations | |
CN109726804B (en) | Intelligent vehicle driving behavior personification decision-making method based on driving prediction field and BP neural network | |
CN113412494B (en) | Method and device for determining transmission strategy | |
CN112937564A (en) | Lane change decision model generation method and unmanned vehicle lane change decision method and device | |
Naveed et al. | Trajectory planning for autonomous vehicles using hierarchical reinforcement learning | |
Han et al. | Intelligent decision-making for 3-dimensional dynamic obstacle avoidance of UAV based on deep reinforcement learning | |
CN115578876A (en) | Automatic driving method, system, equipment and storage medium of vehicle | |
US20230367934A1 (en) | Method and apparatus for constructing vehicle dynamics model and method and apparatus for predicting vehicle state information | |
CN113867354A (en) | Regional traffic flow guiding method for intelligent cooperation of automatic driving of multiple vehicles | |
CN112462602B (en) | Distributed control method for keeping safety spacing of mobile stage fleet under DoS attack | |
CN115494879B (en) | Rotor unmanned aerial vehicle obstacle avoidance method, device and equipment based on reinforcement learning SAC | |
Zhou et al. | A novel mean-field-game-type optimal control for very large-scale multiagent systems | |
CN114815882A (en) | Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning | |
Yuan et al. | Prioritized experience replay-based deep q learning: Multiple-reward architecture for highway driving decision making | |
CN117406756A (en) | Method, device, equipment and storage medium for determining motion trail parameters | |
Zhuang et al. | Robust auto-parking: Reinforcement learning based real-time planning approach with domain template | |
Wang et al. | Experience sharing based memetic transfer learning for multiagent reinforcement learning | |
Diallo et al. | Coordination in adversarial multi-agent with deep reinforcement learning under partial observability | |
CN114723058A (en) | Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||