CN113335277A - Intelligent cruise control method and device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN113335277A (application CN202110458260.3A)
Authority
CN
China
Prior art keywords: vehicle, queue, state, current, control
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number
CN202110458260.3A
Other languages
Chinese (zh)
Inventor
王朱伟
金森繁
刘力菡
方超
孙阳
李萌
杨睿哲
Current Assignee: Beijing University of Technology
Original Assignee: Beijing University of Technology
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110458260.3A
Publication of CN113335277A

Classifications

    • B60W30/14 — Adaptive cruise control
    • B60W60/001 — Planning or execution of driving tasks
    • G06F17/11 — Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F18/295 — Markov models or related models, e.g. semi-Markov models; Markov random fields; networks embedding Markov models
    • G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. neural networks
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • B60W2554/404 — Characteristics of dynamic objects
    • B60W2554/4042 — Longitudinal speed
    • B60W2554/802 — Longitudinal distance
    • G06F2111/06 — Multi-objective optimisation, e.g. Pareto optimisation
    • Y02T10/40 — Engine management systems


Abstract

An embodiment of the invention provides an intelligent cruise control method and device, an electronic device and a storage medium. The method comprises: determining a current state signal of an automatically controlled vehicle; and inputting that state signal into an intelligent optimization control model to realize intelligent cruise control of the vehicle. The intelligent optimization control model is obtained by performing neural-network parameter training on a Markov decision process model using state samples collected in real time from a vehicle queue containing the automatically controlled vehicle. The invention addresses two weaknesses of existing networked cruise control methods: the unpredictability of complex traffic environments and the unreliability of the network.

Description

Intelligent cruise control method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of automatic control, and in particular to an intelligent cruise control method and device, an electronic device and a storage medium.
Background
Cruise control is an advanced driver-assistance technique that can effectively reduce the driver's burden while improving road traffic efficiency, driving safety and fuel economy. Networked cruise control methods such as adaptive cruise control (ACC), cooperative adaptive cruise control (CACC) and connected cruise control (CCC) have received wide attention and application, but each has limitations. ACC fuses multiple on-board sensor technologies to perceive road traffic information; because sensor sensitivity is limited and easily disturbed by the external environment, its stability and safety are insufficient. CACC adds vehicle-to-vehicle (V2V) communication on top of ACC so that the vehicles in a platoon actively exchange their motion-state information; however, CACC requires every vehicle in the platoon to be equipped with an ACC autopilot device to assist cooperative control, and its communication topology is usually fixed, so when the platoon contains manually driven vehicles or road conditions change, the performance and stability of CACC inevitably degrade, limiting its application in future traffic scenarios. To allow a more flexible vehicle-queue design, connection structure and communication topology, CCC was further proposed: the controlled vehicle receives state information broadcast by several preceding vehicles without requiring every vehicle to carry sensors, improving each vehicle's information perception and control capability without a uniform design of the whole queue.
Although a CCC system requires neither a designated head vehicle nor a fixed communication structure, and can therefore communicate selectively, allowing modular design and better scalability, its topology, network communication delay and expected state become dynamic and time-varying under environmental changes, controlled-vehicle motion, and the limited transmission capability and link quality of network nodes. The unpredictability of the complex traffic environment and the unreliability of the network thus pose a serious challenge to cruise control methods based on networked control.
Disclosure of Invention
An embodiment of the invention provides an intelligent cruise control method and device, an electronic device and a storage medium, intended to solve some or all of the problems of existing networked cruise control methods.
In a first aspect, an embodiment of the present invention provides a smart cruise control method, including:
determining a current state signal of the automatically controlled vehicle;
inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
wherein the intelligent optimization control model is obtained by performing neural-network parameter training on a Markov decision process model using state samples collected in real time from a vehicle queue containing the automatically controlled vehicle.
Preferably, the Markov decision process model is constructed by the following steps:
acquiring queue state information of a vehicle queue formed by automatically controlling vehicles, and establishing a dynamic equation of a queue system according to the queue state information;
according to the dynamic equation of the queue system, a quadratic optimization control equation is constructed by taking the minimized state error and the input as objective functions;
and constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
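As a hedged sketch (not the patent's implementation), the three steps above can be wired together by wrapping the discrete queue dynamics and the quadratic cost as the MDP's transition and reward functions; the matrices, coefficients and function names below are illustrative placeholders.

```python
import numpy as np

def make_mdp(A0, B1, B2, c1=1.0, c2=0.1):
    """Wrap queue dynamics y_{i+1} = A0 y_i + B1 u_i + B2 u_{i-1} and a
    quadratic cost as MDP transition and reward functions.

    State: (y, u_prev); action: u; reward: negative quadratic cost, so that
    maximizing the reward minimizes state error and control effort."""
    def transition(y, u_prev, u):
        # next state is (y_{i+1}, u_i); the delayed input enters via B2
        return A0 @ y + B1 @ u + B2 @ u_prev, u

    def reward(y, u):
        # illustrative scalar weights c1, c2 on state error and input
        return -float(c1 * (y @ y) + c2 * (u @ u))

    return transition, reward
```

Usage: `transition` advances the queue error state one sampling interval; `reward` is what a reinforcement-learning agent would maximize in place of minimizing the quadratic cost.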
Preferably, the acquiring queue state information of a vehicle queue built by automatically controlled vehicles and building a dynamic equation of the queue system according to the queue state information includes the following steps:
obtaining the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue;
acquiring an expected speed through a head vehicle, acquiring an expected distance between vehicles based on a preset range strategy, and establishing a state error equation of each vehicle according to the expected speed of the head vehicle, the expected distance between vehicles and the current speed and distance between vehicles;
and combining the state error equations of all the vehicles into a continuous-time queue state equation, and obtaining the dynamic equation of the queue system after discretization.
Preferably, the preset range policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, the expected vehicle speed is obtained from the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance, the calculation formula being
V(h) = (v_max/2)·[1 − cos(π·(h − h_min)/(h_max − h_min))]
wherein V(h) represents the expected vehicle speed, h represents the vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
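The piecewise range policy above can be sketched as a small function; the smooth cosine ramp between h_min and h_max is an assumption (one common choice in the CCC literature, since the original formula appears only as an image), and the numeric defaults are illustrative only.

```python
import math

def range_policy(h, h_min=5.0, h_max=35.0, v_max=30.0):
    """Desired speed V(h) for headway h. Below h_min: stop; above h_max:
    cruise at v_max; in between: a smooth monotone cosine ramp (assumed)."""
    if h < h_min:
        return 0.0
    if h > h_max:
        return v_max
    return 0.5 * v_max * (1.0 - math.cos(math.pi * (h - h_min) / (h_max - h_min)))

def desired_distance(v, h_min=5.0, h_max=35.0, v_max=30.0, tol=1e-9):
    """Invert the (monotone) range policy by bisection: the expected
    vehicle distance corresponding to an expected speed v."""
    if v <= 0.0:
        return h_min
    if v >= v_max:
        return h_max
    lo, hi = h_min, h_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if range_policy(mid, h_min, h_max, v_max) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

`desired_distance` illustrates the last step of the claim: obtaining the expected vehicle distance of each vehicle from the expected vehicle speed.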
Preferably, the dynamic equation of the queue system obtained after the discretization process is as follows:
y_{i+1} = A_0·y_i + B_1·u_i + B_2·u_{i−1}
wherein y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time; A_0, B_1 and B_2 are the system matrices produced by the discretization; i is the index of the sampling interval; ΔT is the sampling interval; τ is the network-induced delay; λ_j and a second gain (written here as β_j, its original symbol being given only as an image) represent system parameters related to human driving behaviour; j is the index of a vehicle in the queue; m is the total number of vehicles in the queue excluding the head vehicle; and V′(h*) denotes the partial derivative of the range policy at the desired vehicle distance.
Preferably, according to the dynamic equation of the queue system, the quadratic optimization control equation constructed with the minimized state error and control input as the objective function is:
min_u J = Σ_{i=0}^{N} (y_iᵀ·C·y_i + u_iᵀ·D·u_i)
wherein N is the number of sampling intervals, and C and D are coefficient matrices built from preset coefficients c_1 and c_2, which weight the state error and the control input respectively.
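A hedged numerical sketch of how the discrete dynamic equation and the quadratic objective fit together; the matrices are illustrative placeholders (the patent's actual A_0, B_1, B_2, C and D are given only as images), and C = c1·I, D = c2·I is one simple weight choice assumed here.

```python
import numpy as np

def rollout_cost(A0, B1, B2, y0, controls, c1=1.0, c2=0.1):
    """Roll out y_{i+1} = A0 y_i + B1 u_i + B2 u_{i-1} and accumulate the
    quadratic objective sum_i (y_i^T C y_i + u_i^T D u_i), with the simple
    weight choice C = c1*I, D = c2*I, plus a terminal state cost."""
    C = c1 * np.eye(y0.shape[0])
    D = c2 * np.eye(controls[0].shape[0])
    y = y0.astype(float).copy()
    u_prev = np.zeros_like(controls[0], dtype=float)
    J = 0.0
    for u in controls:
        J += float(y @ C @ y + u @ D @ u)  # stage cost at step i
        y = A0 @ y + B1 @ u + B2 @ u_prev  # delayed input u_{i-1} enters via B2
        u_prev = u
    J += float(y @ C @ y)                  # terminal cost
    return J, y
```

A controller that minimizes J drives the state error y toward zero without demanding large accelerations, which is exactly the trade-off the objective function encodes.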
Preferably, obtaining the intelligent optimization control model by performing neural-network parameter training on the Markov decision process model using state samples collected in real time from the vehicle queue includes:
establishing a deep deterministic policy gradient (DDPG) algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the parameters of the Markov decision process model;
in each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ); the executed action a_k is this policy output perturbed by exploration noise; the state s_{k+1} at the next moment is obtained from the state transition function and the corresponding reward r_k from the reward function; and the tuple (s_k, a_k, r_k, s_{k+1}) is stored in an experience replay buffer to obtain the state samples;
the current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
L(θ^Q) = (1/M)·Σ_{t=1}^{M} [x_t − Q(s_t, a_t|θ^Q)]²
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by inputting s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′})
in which r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1}|θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1};
the current actor network updates its parameters θ^μ through the policy gradient function:
∇_{θ^μ} J ≈ (1/M)·Σ_{t=1}^{M} ∇_a Q(s, a|θ^Q)|_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s|θ^μ)|_{s=s_t}
where ∇ is the gradient operator;
the target actor network and the target critic network update their parameters θ^{Q′} and θ^{μ′} respectively by the soft updates:
θ^{Q′} ← δ·θ^Q + (1 − δ)·θ^{Q′}
θ^{μ′} ← δ·θ^μ + (1 − δ)·θ^{μ′}
where δ is a fixed constant with 0 < δ < 1.
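The DDPG machinery above can be sketched in framework-free Python to show the data flow: the replay buffer, the target value x_t, and the soft update. This is an illustration under assumed shapes and hyper-parameters (γ, δ, buffer size), not the patent's implementation, and the "networks" are stand-in callables.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Experience replay buffer storing (s, a, r, s_next) transitions."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def add(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, m):
        """Draw a mini-batch of m transitions uniformly at random."""
        return random.sample(list(self.buf), m)

def td_target(r, s_next, target_actor, target_critic, gamma=0.99):
    """Target Q value x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return r + gamma * target_critic(s_next, target_actor(s_next))

def soft_update(theta, theta_target, delta=0.005):
    """Soft update theta' <- delta*theta + (1 - delta)*theta' on a
    parameter vector, with 0 < delta < 1."""
    return delta * theta + (1.0 - delta) * theta_target
```

The critic would be fit to `td_target` outputs by minimizing mean-squared error over each sampled mini-batch, and `soft_update` is applied to both target networks after every gradient step.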
In a second aspect, an embodiment of the present invention provides an intelligent cruise control apparatus, including a state signal unit and an intelligent control unit;
the state signal unit is used for determining a current state signal of the automatic control vehicle;
the intelligent control unit is used for inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by training a Markov decision process model on state samples collected in real time from a vehicle queue containing the automatically controlled vehicle.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the intelligent cruise control method according to any one of the above-mentioned first aspects when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the intelligent cruise control method according to any one of the above-mentioned first aspects.
With the intelligent cruise control method and device, electronic device and storage medium described above, the current state signal of the automatically controlled vehicle is input into the intelligent optimization control model to realize intelligent cruise control of the vehicle; the model is obtained by performing neural-network parameter training on a Markov decision process model using state samples collected in real time from the vehicle queue. By continuously interacting with the environment, the embodiment of the invention can continuously and intelligently learn and adjust the optimization control strategy of networked cruise control, adapting to real, complex and changeable network dynamics, realizing safe and stable driving of the autonomous vehicle, and solving the problems of unpredictable complex traffic environments and unreliable networks in existing networked cruise control methods.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a smart cruise control method according to the present invention;
FIG. 2 is a schematic diagram of a smart cruise control scenario based on networked control provided by the present invention;
FIG. 3 is a diagram of a smart cruise control architecture based on networked control provided by the present invention;
fig. 4 is a schematic structural diagram of an intelligent cruise control device provided by the invention;
FIG. 5 is a block diagram of an intelligent optimization control module provided by the present invention;
FIG. 6 is a block diagram of a system modeling module provided by the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following describes a smart cruise control method, apparatus, electronic device and storage medium provided by the present invention with reference to fig. 1 to 7.
The embodiment of the invention provides an intelligent cruise control method. Fig. 1 is a schematic flow chart of a smart cruise control method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step 110, determining a current state signal of the automatic control vehicle;
specifically, the vehicle queue in the embodiment of the invention comprises a manually-driven vehicle and a CCC vehicle, each vehicle in the queue is provided with a communication device, and the CCC automatically-driven vehicle can receive state information including headway, vehicle speed and acceleration from other vehicles through V2V communication technology.
Step 120, inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
Specifically, a dynamic equation of the vehicle queue system is constructed by analysing vehicle dynamics and wireless-network characteristics; an optimization control problem is formulated that accounts for dynamic, time-varying network communication delay and expected state; a Markov decision process (MDP) model is then built, and a deep reinforcement learning (DRL) algorithm generates samples through continuous interaction with the environment and trains the neural network. The result is an intelligent optimization control strategy that lets the automatically controlled vehicle track the ideal expected speed while always keeping a safe distance from the preceding vehicle, and that keeps the control system and the vehicle queue running stably under dynamic network conditions.
The method provided by the embodiment of the invention can continuously and intelligently learn and adjust the optimization control strategy of networked cruise control through continuously interacting with the environment, thereby being suitable for the actual complex and changeable network dynamic scene and realizing the safe and stable driving of the CCC automatic driving vehicle.
Based on any one of the above embodiments, the construction process of the markov decision process model comprises the following steps:
acquiring queue state information of a vehicle queue formed by automatically controlling vehicles, and establishing a dynamic equation of a queue system according to the queue state information;
it should be noted that, due to the flexible network topology between vehicles in the CCC system, each vehicle can communicate with nearby vehicles. Through wireless V2V communication, the CCC vehicle can acquire real-time state information such as headway, speed and acceleration of other vehicles in the fleet, so that the whole vehicle queue can be modeled. Meanwhile, the CCC can provide services for heterogeneous vehicle queues, so that the sequence and the number of manually driven vehicles and CCC automatic control vehicles in a fleet are variable, and the requirements of real traffic scenes on the flexibility of the vehicle queues are met better. Generally, the automatic control vehicle does not need to consider the vehicle state of the subsequent vehicle, and in order to describe the technical scheme more clearly, the embodiment of the invention takes the tail vehicle as the CCC automatic control vehicle and other vehicles as the manual driving vehicles as examples. In addition, the method provided by the embodiment of the invention is also suitable for controlling the automatic control vehicle in a more complex model, and when the queue model changes, the modeling method provided by the embodiment of the invention can be used for constructing a corresponding system dynamic equation according to the specific situation of the queue.
According to the dynamic equation of the queue system, a quadratic optimization control equation is constructed by taking the minimized state error and the input as objective functions;
it is noted that the goal of cruise control is to enable the vehicles in the vehicle train to track a desired vehicle speed and maintain a desired vehicle separation while achieving comfortable and smooth acceleration control. Therefore, a quadratic optimization control problem can be constructed with the goal of minimizing vehicle speed and vehicle distance errors and control inputs. On the one hand, however, such optimization control problems are difficult to solve directly due to the high dimensional state space and complex physical properties. On the other hand, due to the influence of the actual network communication delay and the dynamic time-varying characteristic of the expected state, a traditional optimization decision method depending on a fixed parameter model and a static strategy is adopted, so that higher robustness and stability risks often exist. Therefore, the embodiment of the invention provides an intelligent optimization control method based on DRL (deep recovery learning) to improve the adaptability and stability of an automatic control vehicle under a complex dynamic condition.
And constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
It should be noted that a reinforcement learning (RL) problem is usually described as a Markov decision process (MDP), which generally comprises a state, an action, a state transition function and a reward function; an MDP model of the system is established from the system model and the optimization problem. An intelligent optimization control strategy is then obtained from the MDP model with a deep reinforcement learning (DRL) algorithm. For a continuous-action control problem such as cruise control, traditional algorithms based on discrete actions, such as Q-learning, DQN (Deep Q-Network) and Actor-Critic, often suffer degraded performance due to poor convergence and stability. The embodiment of the invention is therefore based on the deep deterministic policy gradient (DDPG) algorithm in DRL: following the defined MDP model, samples are collected and the networks trained through continuous interaction with the environment, the neural-network parameters are continuously optimized with the goal of maximizing the reward function, and finally an intelligent optimization control policy output signal is generated in real time from the current state input of the CCC automatically controlled vehicle, realizing safe and stable control of the vehicle.
Based on any one of the embodiments, the acquiring queue state information of a vehicle queue built by automatically controlled vehicles and establishing a dynamic equation of the queue system according to the queue state information includes the following steps:
obtaining the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue;
acquiring an expected speed through a head vehicle, acquiring an expected distance between vehicles based on a preset range strategy, and establishing a state error equation of each vehicle according to the expected speed of the head vehicle, the expected distance between vehicles and the current speed and distance between vehicles;
and combining the state error equations of all the vehicles, and obtaining the dynamic equation of the queue system after discretization processing based on the state equations of all the vehicles in the continuous time queue.
Specifically, establishing a queue system model according to a queue includes:
collecting the vehicle distance, speed and acceleration information of each vehicle in the queue via V2V communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information;
obtaining expected vehicle speed according to the head vehicle, and combining a range strategy to obtain an expected vehicle distance of each vehicle;
establishing a state error equation of each vehicle according to the expected vehicle speed and the expected vehicle distance as well as the current vehicle speed and the current vehicle distance of each vehicle;
and combining the state error equations of all the vehicles to obtain a continuous-time queue state equation, which after discretization yields the discrete-time queue system model.
Because wireless V2V communication is introduced to promote state-information sharing between vehicles, a vehicle dynamic equation with time delay is obtained by analysing the influence of the delay characteristics of the wireless network on the CCC automatically controlled vehicle. The state error equations of all manually driven vehicles and CCC vehicles in the queue are then combined to obtain a continuous-time system state error equation. Finally, the continuous-time state equation is discretized by sampling, yielding a discrete-time queue system model.
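The discretization step can be illustrated with the simplest scheme, forward Euler; this is an assumption for illustration, since the patent does not state which discretization it uses.

```python
import numpy as np

def euler_discretize(A, B, dT):
    """Forward-Euler discretization of continuous-time error dynamics
    y_dot = A y + B u sampled with interval dT:
        y_{i+1} ~= (I + dT*A) y_i + dT*B u_i
    A stand-in for the (unspecified) discretization used in the patent."""
    n = A.shape[0]
    Ad = np.eye(n) + dT * A
    Bd = dT * B
    return Ad, Bd
```

For small ΔT relative to the system's time constants, the Euler step approximates the exact zero-order-hold discretization; either choice produces a discrete model of the form used in the claims.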
Based on any of the above embodiments, the preset scope policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, obtaining the expected vehicle speed according to the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance, wherein the calculation formula is
V(h) = v_max · (h − h_min) / (h_max − h_min)
wherein V(h) represents the expected vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
It should be noted that dynamic analysis is performed on the manually driven vehicles and the CCC automatically controlled vehicle: the state information of each vehicle in the queue, such as vehicle distance, vehicle speed and acceleration, is obtained through V2V communication, and the vehicle dynamic equations are then established from the relationships between these quantities. The speed of the head vehicle in the queue serves as the expected speed of the other vehicles, and the expected vehicle distance is obtained from the range strategy. Once the expected vehicle speed and expected vehicle distance are obtained, the state error equation of each vehicle can be established. The expected vehicle distance and vehicle speed satisfy the following range strategy:
V(h) = 0, if h < h_min; V(h) = v_max · (h − h_min)/(h_max − h_min), if h_min ≤ h ≤ h_max; V(h) = v_max, if h > h_max
wherein V(h) represents the expected vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed.
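The piecewise range strategy above can be sketched directly. The snippet below assumes a linear transition between h_min and h_max (the exact curve is an equation image in the original filing); the constants and function names are hypothetical:

```python
# Piecewise range strategy sketch; a linear transition between h_min and
# h_max is assumed (the exact curve is an equation image in the original).
H_MIN, H_MAX, V_MAX = 5.0, 35.0, 30.0   # illustrative values (m, m, m/s)

def range_policy(h: float) -> float:
    """Expected speed V(h) as a function of the current vehicle distance h."""
    if h < H_MIN:
        return 0.0                      # too close: expected speed is zero
    if h > H_MAX:
        return V_MAX                    # far enough: cruise at v_max
    return V_MAX * (h - H_MIN) / (H_MAX - H_MIN)

def desired_distance(v: float) -> float:
    """Invert the linear branch to get the expected distance for speed v."""
    return H_MIN + (H_MAX - H_MIN) * v / V_MAX
```

Here `range_policy` and `desired_distance` are hypothetical names; the embodiment obtains the expected vehicle distance of each vehicle by evaluating the range strategy at the expected speed.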
Based on any of the above embodiments, the dynamic equation of the queue system obtained after the discretization process is as follows:
yi+1=A0yi+B1ui+B2ui-1
wherein y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time,
A_0 = e^{AΔT}
B_1 = (∫_0^{ΔT−τ} e^{Aθ} dθ)B
B_2 = (∫_{ΔT−τ}^{ΔT} e^{Aθ} dθ)B
(A and B being the state and input matrices of the continuous-time queue dynamics)
i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced time delay, λ_j and η_j represent system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the queue excluding the head vehicle, and V′(h*) is the partial derivative of the range strategy evaluated at the expected vehicle distance.
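The discrete-time recursion above can be exercised numerically. The sketch below rolls out y_{i+1} = A_0 y_i + B_1 u_i + B_2 u_{i−1} with small stand-in matrices, since the true A_0, B_1 and B_2 of the embodiment are equation images; everything numeric here is illustrative:

```python
import numpy as np

# Illustrative rollout of the discrete-time queue model
# y_{i+1} = A0 y_i + B1 u_i + B2 u_{i-1}; the true A0, B1, B2 are
# equation images in the original, so small stable stand-ins are used.
rng = np.random.default_rng(0)
n = 4                                   # state dimension (2 per follower)
A0 = 0.9 * np.eye(n)                    # stand-in stable state matrix
B1 = rng.standard_normal((n, 1)) * 0.1  # effect of the current input u_i
B2 = rng.standard_normal((n, 1)) * 0.1  # effect of the delayed input u_{i-1}

def step(y, u_curr, u_prev):
    return A0 @ y + B1 * u_curr + B2 * u_prev

y = np.ones((n, 1))                     # initial state error
u_prev = 0.0
for i in range(50):
    u_curr = 0.0                        # zero input: errors should decay
    y = step(y, u_curr, u_prev)
    u_prev = u_curr
```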
Based on any of the above embodiments, the quadratic optimization control equation is constructed by using the minimized state error and the input as the objective function according to the dynamic equation of the queue system as follows:
J = Σ_{i=1}^{N} (y_i^T C y_i + u_i^T D u_i)
wherein N is the number of sampling intervals, and C and D are coefficient matrices:
C = c_1 I, D = c_2 I
wherein c_1 and c_2 are preset coefficients.
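A minimal sketch of evaluating this quadratic objective, assuming C = c_1·I and that D reduces to the scalar c_2 (an assumption; the exact coefficient matrices are equation images in the original):

```python
import numpy as np

# Sketch of the quadratic objective J = sum_i (y_i^T C y_i + u_i^T D u_i),
# assuming C = c1 * I and scalar D = c2; c1 = 1 and c2 = 0.1 follow the
# values quoted later in the embodiment.
c1, c2 = 1.0, 0.1

def quadratic_cost(ys, us):
    """ys: sequence of state-error vectors; us: sequence of scalar inputs."""
    C = c1 * np.eye(len(ys[0]))
    return sum(float(y @ C @ y) + c2 * u * u for y, u in zip(ys, us))
```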
Specifically, fig. 2 is a schematic diagram of an intelligent cruise control scene based on networked control according to an embodiment of the present invention. For ease of understanding, the vehicle queue in the embodiment consists of m+1 vehicles, wherein the tail vehicle, i.e., vehicle #1, is the CCC automatically driven vehicle, the other vehicles are human-driven vehicles, and the foremost vehicle, i.e., vehicle #m+1, is the head vehicle. Each vehicle in the queue is equipped with a communication device, and the CCC automatically driven vehicle can receive status information from the other vehicles, including vehicle distance, vehicle speed and acceleration, via V2V communication technology. To clearly illustrate the technical scheme of the embodiment of the present invention, the head vehicle serves as the tracking target of the CCC automatically driven vehicle and runs at a dynamically changing speed.
As shown in FIG. 2, the dynamic equations of a human-driven vehicle may be defined as follows:
ḣ_j(t) = v_{j+1}(t) − v_j(t)
v̇_j(t) = λ_j(V(h_j(t)) − v_j(t)) + η_j(v_{j+1}(t) − v_j(t))
wherein v_j(t) represents the speed of the j-th vehicle, h_j(t) represents the distance between the j-th vehicle and the vehicle ahead of it, v̇(t) denotes the derivative of v(t) with respect to time t, λ_j and η_j represent system parameters related to human driving behavior, and V(h) is the desired speed determined by the vehicle distance.
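The human-driver model can be simulated directly. The sketch below assumes the optimal-velocity form v̇_j = λ_j(V(h_j) − v_j) + η_j(v_{j+1} − v_j); the symbol η_j and all numeric gains are assumptions for illustration (the original equations are images):

```python
# Human-driver car-following sketch, assuming the optimal-velocity form
# of the dynamics above; the gains lam and eta are illustrative.
H_MIN, H_MAX, V_MAX = 5.0, 35.0, 30.0

def V(h):
    """Linear range policy, clamped to [0, V_MAX] (illustrative)."""
    return max(0.0, min(V_MAX, V_MAX * (h - H_MIN) / (H_MAX - H_MIN)))

def follower_accel(h_j, v_j, v_lead, lam=0.5, eta=0.3):
    """Acceleration of a human-driven follower."""
    return lam * (V(h_j) - v_j) + eta * (v_lead - v_j)

def headway_rate(v_j, v_lead):
    """Rate of change of the headway: v_{j+1} - v_j."""
    return v_lead - v_j
```

At the equilibrium headway the desired speed equals the actual speed, so the acceleration vanishes, which is the balance state the queue is designed to reach.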
While the dynamic equations for a CCC autonomous vehicle may be defined as follows:
ḣ_1(t) = v_2(t) − v_1(t)
v̇_1(t) = u(t − τ(t))
wherein u(t) represents the control strategy, i.e., the acceleration of the CCC automatically driven vehicle, and τ(t) represents the network-induced time delay in the networked control process.
The objective of each vehicle in the queue is to reach the expected vehicle distance h*(t) and the expected vehicle speed v*(t) = V(h*(t)). From the deviation between the actual state and the expected state, the distance error can be defined as h̃_j(t) = h_j(t) − h*(t) and the speed error as ṽ_j(t) = v_j(t) − v*(t).
Applying the linear first-order approximation V(h_j) ≈ V(h*) + V′(h*)·h̃_j to the vehicle dynamics model, the error dynamics model of the vehicle queue can be found as follows:
h̃̇_j(t) = ṽ_{j+1}(t) − ṽ_j(t)
ṽ̇_j(t) = λ_j(V′(h*)h̃_j(t) − ṽ_j(t)) + η_j(ṽ_{j+1}(t) − ṽ_j(t))
defining a state vector:
y(t) = [h̃_1(t), ṽ_1(t), …, h̃_m(t), ṽ_m(t)]^T
the system dynamic equation obtained by combining the error dynamics equations of all vehicles is:
ẏ(t) = A y(t) + B u(t − τ(t))
wherein A collects the coefficients (λ_j, η_j and V′(h*)) of the error dynamics of all vehicles, and B selects the control input of the CCC automatically driven vehicle.
Sampling and discretizing the system dynamic equation over the i-th sampling interval [iΔT, (i+1)ΔT) yields the following discrete-time system dynamic model:
y_{i+1} = A_0 y_i + B_1 u_i + B_2 u_{i−1}
wherein y_i = y(iΔT) and u_i = u(iΔT) respectively represent the state variable and the acceleration control strategy at the current time, ΔT represents the sampling interval, and the remaining parameters are:
A_0 = e^{AΔT}, B_1 = (∫_0^{ΔT−τ} e^{Aθ} dθ)B, B_2 = (∫_{ΔT−τ}^{ΔT} e^{Aθ} dθ)B
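For a scalar system the sampled-data discretization with an input delay τ < ΔT can be written out and checked against a fine-grained simulation. This is a sketch under the assumption that the embodiment uses the standard zero-order-hold result; all numbers are illustrative:

```python
import math

# Scalar sketch of discretising y' = a*y + b*u(t - tau) with a zero-order-
# hold input and delay tau < dt, assuming the standard sampled-data result
# a0 = e^{a*dt}, b1 = int_0^{dt-tau} e^{a*s} ds * b,
# b2 = int_{dt-tau}^{dt} e^{a*s} ds * b.
a, b, DT_S, TAU = -1.0, 1.0, 0.1, 0.03  # illustrative scalar system

a0 = math.exp(a * DT_S)
b1 = b * (math.exp(a * (DT_S - TAU)) - 1.0) / a
b2 = b * (math.exp(a * DT_S) - math.exp(a * (DT_S - TAU))) / a

def discrete_step(y, u_i, u_prev):
    """One step of y_{i+1} = a0*y_i + b1*u_i + b2*u_{i-1}."""
    return a0 * y + b1 * u_i + b2 * u_prev

def euler_step(y, u_i, u_prev, n=20000):
    """Reference: integrate the delayed ODE over one interval in tiny steps."""
    h = DT_S / n
    for k in range(n):
        u = u_prev if k * h < TAU else u_i  # input is still delayed for t < tau
        y += h * (a * y + b * u)
    return y
```

The agreement between `discrete_step` and the brute-force `euler_step` is what makes the discrete model an exact (up to sampling) replacement for the continuous dynamics.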
The aim of cruise control is to make the vehicle track the target distance and speed so that the whole queue is kept in the equilibrium state, i.e., the state error y* = 0. To achieve optimal control, a quadratic cost function is defined as:
J = Σ_{i=1}^{N} (y_i^T C y_i + u_i^T D u_i)
In the above formula, N is the number of sampling intervals, and C and D are coefficient matrices:
C = c_1 I, D = c_2 I
wherein c_1 and c_2 are preset coefficients, set to 1 and 0.1 respectively in the embodiment of the present invention.
In summary, the cruise control system optimization problem can be constructed as follows:
min_u J
s.t. y_{i+1} = A_0 y_i + B_1 u_i + B_2 u_{i−1}
Owing to the dynamic time-varying characteristics of the network, and in order to improve the environmental adaptability and self-learning capability of the networked intelligent cruise control system, the embodiment of the present invention provides a DRL-based intelligent optimization control method to solve the above optimization problem.
An MDP is usually used to formally describe the RL problem: at each time slot k, the agent observes the current state from the environment and makes a decision; after performing an action it obtains the next state and adjusts its policy according to the reward value fed back. The embodiment of the present invention defines the states, actions, state transition function and reward function of the MDP according to the cruise control system model and the optimization problem constructed under the network dynamic scene.
1) State
Considering that the optimal control strategy is affected by both the current state and the delayed control signal caused by the network delay, the new state vector is defined as:
s_k = [y_k, u_{k−1}]
2) Action
For networked cruise control systems, actions may be defined as acceleration control strategies:
ak=uk
3) State transition function
Based on the discrete-time system model of the networked cruise control system and the state vector s_k, the state transition function can be expressed as:
s_{k+1} = s_k E + a_k F
wherein E and F are constant matrices constructed from A_0, B_1 and B_2.
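One plausible construction of E and F (whose definitions are equation images in the original, so this is an assumption) augments the state with the previous input, so that s_k = [y_k; u_{k−1}]; a column-vector convention is used here, whereas the patent writes the row-vector form s_{k+1} = s_k E + a_k F:

```python
import numpy as np

# Assumed assembly of the transition for the augmented state
# s_k = [y_k; u_{k-1}]: y_{k+1} = A0 y_k + B1 a_k + B2 u_{k-1},
# and the new "previous input" entry is a_k itself.
n = 2
A0 = np.array([[0.9, 0.1], [0.0, 0.8]])   # illustrative stand-ins
B1 = np.array([[0.05], [0.10]])
B2 = np.array([[0.02], [0.04]])

E = np.block([[A0, B2], [np.zeros((1, n)), np.zeros((1, 1))]])
F = np.vstack([B1, np.ones((1, 1))])

def transition(s, a_k):
    """s_{k+1} = E s_k + F a_k (column-vector convention)."""
    return E @ s + F * a_k
```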
4) Reward function
Unlike minimizing a cost function in optimization theory, the goal of the intelligent algorithm is to maximize the long-term cumulative reward value, so the reward function can be defined as the negative of the stage cost:
r_k = −(y_k^T C y_k + u_k^T D u_k)
The long-term cumulative reward value, referred to as the return, is expressed as follows:
R_k = Σ_{i=0}^{∞} γ^i r_{k+i}
In the above formula, 0 < γ < 1 is the discount factor.
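A minimal sketch of the reward and return, assuming the reward is the negative quadratic stage cost (consistent with minimizing the cost function defined earlier); γ and the coefficients here are illustrative:

```python
# Reward and return sketch, assuming r_k is the negative quadratic stage
# cost with c1 = 1, c2 = 0.1 and an illustrative discount factor GAMMA.
GAMMA = 0.95

def reward(y, u, c1=1.0, c2=0.1):
    """r_k = -(y^T C y + u^T D u) with C = c1*I and scalar D = c2."""
    return -(c1 * sum(x * x for x in y) + c2 * u * u)

def discounted_return(rewards):
    """Finite-horizon return: R = sum_i GAMMA**i * r_i."""
    total = 0.0
    for r in reversed(rewards):
        total = r + GAMMA * total
    return total
```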
Because the action space of the cruise control system is continuous, the DDPG method in DRL can well avoid the performance degradation caused by discretizing the action space. Therefore, the embodiment of the present invention provides a DDPG-based intelligent optimization control method to obtain the intelligent control strategy, thereby improving the convergence and stability of the system.
Based on any one of the above embodiments, the intelligent optimization control model is obtained by performing neural network parameter training on the Markov decision process model using state samples collected in real time from the vehicle queue constructed by the automatically controlled vehicle, and the training includes:
establishing a depth certainty strategy gradient algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the Markov decision process model parameters;
It should be noted that the intelligent cruise control architecture based on networked control is shown in fig. 3. The DDPG mainly comprises four deep neural networks, namely the current actor network μ(s|θ^μ), the target actor network μ′(s|θ^{μ′}), the current critic network Q(s, a|θ^Q) and the target critic network Q′(s, a|θ^{Q′}), wherein μ(·) is a deterministic action policy, Q(·) is an action-value evaluation function, and θ represents the corresponding neural network parameters. The agent obtains the control strategy μ by training the actor network, and obtains the corresponding Q value by training the critic network to evaluate the control strategy.
In each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ); the policy a_k is executed, the state s_{k+1} at the next moment is obtained according to the state transition function, the corresponding reward r_k is obtained from the reward function, and the tuple (s_k, a_k, s_{k+1}, r_k) is stored in the experience replay buffer to obtain the state samples;
The current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
L(θ^Q) = (1/M) Σ_t (x_t − Q(s_t, a_t|θ^Q))²
wherein M is the number of samples in a minibatch, Q(s_t, a_t|θ^Q) is the current Q value obtained by inputting s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′})
wherein r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1}|θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1};
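The target-value computation can be sketched with the target networks stood in by plain callables; the names and values are illustrative, not the embodiment's neural networks:

```python
# TD-target sketch: x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})), with the
# target networks stood in by plain callables (purely illustrative).
GAMMA = 0.95

def td_target(r_t, s_next, target_actor, target_critic):
    a_next = target_actor(s_next)            # mu'(s_{t+1})
    return r_t + GAMMA * target_critic(s_next, a_next)

def mse_loss(targets, q_values):
    """Mean-squared error the current critic minimises over a minibatch."""
    m = len(targets)
    return sum((x - q) ** 2 for x, q in zip(targets, q_values)) / m
```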
The current actor network updates its parameters θ^μ through the following policy gradient function:
∇_{θ^μ}J ≈ (1/M) Σ_t ∇_a Q(s, a|θ^Q)|_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s|θ^μ)|_{s=s_t}
wherein ∇ is the gradient operator;
The target actor network and the target critic network update their respective parameters θ^{Q′} and θ^{μ′} as follows:
θ^{Q′} ← δθ^Q + (1 − δ)θ^{Q′}
θ^{μ′} ← δθ^μ + (1 − δ)θ^{μ′}
wherein δ is a fixed constant with 0 < δ < 1.
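The soft update is a simple element-wise interpolation between current and target parameters; a sketch with an illustrative δ:

```python
# Element-wise "soft update" of target-network parameters;
# DELTA is an illustrative value of the fixed constant delta.
DELTA = 0.01

def soft_update(theta, theta_target):
    """theta' <- DELTA*theta + (1 - DELTA)*theta', applied per parameter."""
    return [DELTA * p + (1.0 - DELTA) * q for p, q in zip(theta, theta_target)]
```

Because δ is small, the target networks drift slowly toward the current networks, which stabilizes the bootstrapped target values used by the critic.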
Specifically, the intelligent cruise control method based on networked control can be divided into two steps: sampling and training.
1) Sampling
First, enough samples need to be collected for training. In each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ). To ensure effective exploration of the continuous action space, random noise η is added to obtain the exploration policy:
a_k = μ(s_k|θ^μ) + η
The policy a_k is executed, the state s_{k+1} at the next moment is obtained according to the state transition function, and the corresponding reward r_k is obtained from the reward function; then (s_k, a_k, s_{k+1}, r_k) is stored as a sample in the experience replay buffer. The above steps are repeated continuously to generate enough samples.
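The sampling phase above can be sketched as an exploration policy plus an experience-replay buffer; the Gaussian form of the noise η, the buffer size and all other constants are assumptions for illustration:

```python
import random
from collections import deque

# Exploration policy a_k = mu(s_k) + eta and an experience-replay buffer
# of (s, a, s', r) tuples; noise form and sizes are assumptions.
random.seed(0)
BUFFER = deque(maxlen=10_000)            # old samples are evicted when full

def explore(mu, s, sigma=0.1):
    return mu(s) + random.gauss(0.0, sigma)   # add exploration noise eta

def store(s, a, s_next, r):
    BUFFER.append((s, a, s_next, r))

def sample_minibatch(m):
    """Draw m samples uniformly at random to break temporal correlation."""
    return random.sample(list(BUFFER), m)
```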
2) Training
In the training process of the embodiment of the present invention, 200 time slots form one episode. In each episode, a minibatch of M samples (s_t, a_t, s_{t+1}, r_t) is randomly drawn for training, which reduces the correlation of the sample data and improves training efficiency.
The current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
L(θ^Q) = (1/M) Σ_t (x_t − Q(s_t, a_t|θ^Q))²
wherein M is the number of samples in a minibatch, Q(s_t, a_t|θ^Q) is the current Q value obtained by inputting s_t and a_t into the current critic network, and x_t is the target Q value, which can be expressed as:
x_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′})
In the above formula, r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1}|θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1}.
The current actor network updates its parameters θ^μ through the following policy gradient function:
∇_{θ^μ}J ≈ (1/M) Σ_t ∇_a Q(s, a|θ^Q)|_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s|θ^μ)|_{s=s_t}
wherein M is the number of samples in a minibatch and ∇ is the gradient operator; the main objective of this formula is to increase the probability of the actions for which the current actor network obtains a larger Q value.
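As a toy numeric illustration of the deterministic policy gradient, consider a linear critic Q(s, a) = w_s·s + w_a·a and a linear actor μ(s) = θ·s, for which the minibatch gradient reduces to (1/M) Σ_t w_a·s_t; everything here is illustrative and not the embodiment's neural networks:

```python
# Toy deterministic policy-gradient step with a linear critic
# Q(s, a) = w_s*s + w_a*a and a linear actor mu(s) = theta*s, so that
# dQ/da = w_a and dmu/dtheta = s; all numbers are illustrative.
w_s, w_a = 0.2, 0.7    # frozen critic weights
LR = 0.1               # actor learning rate

def actor_update(theta, states):
    """One gradient-ascent step: theta += LR * (1/M) * sum_t w_a * s_t."""
    grad = sum(w_a * s for s in states) / len(states)
    return theta + LR * grad
```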
Then, the target actor network and the target critic network update their respective parameters θ^{Q′} and θ^{μ′} by means of a "soft update":
θ^{Q′} ← δθ^Q + (1 − δ)θ^{Q′}
θ^{μ′} ← δθ^μ + (1 − δ)θ^{μ′}
wherein 0 < δ < 1 is a fixed constant.
Finally, after training over enough episodes, the optimized current actor network parameters θ^{μ*} are obtained. Thus, for each input state s, the current actor network can generate the optimized control strategy of the networked cruise control system in real time:
u* = a* = μ(s|θ^{μ*}).
the following describes an intelligent cruise control device provided by the present invention, and the following description and the above-described intelligent cruise control method can be referred to correspondingly.
Fig. 4 is a schematic structural diagram of an intelligent cruise control device according to an embodiment of the present invention, and as shown in fig. 4, the device includes a status signal unit 410 and an intelligent control unit 420;
the state signal unit 410 is used for determining a current state signal of the automatic control vehicle;
the intelligent control unit 420 is configured to input a current state signal of the automatically controlled vehicle into an intelligent optimal control model, so as to implement intelligent cruise control on the automatically controlled vehicle;
the intelligent optimization control model is obtained by training a Markov decision process model based on vehicle queues built by the automatic control vehicles and collected state samples in real time.
The device provided by the embodiment of the invention can continuously and intelligently learn and adjust the optimization control strategy of networked cruise control through continuously interacting with the environment, thereby being suitable for the actual complex and changeable network dynamic scene and realizing the safe and stable driving of the CCC automatic driving vehicle.
Based on any one of the above embodiments, the intelligent control unit comprises an intelligent optimization control module;
as shown in fig. 5, the intelligent optimization control module includes a system modeling module 510, a problem construction module 520, an MDP construction module 530, and a calculation processing module 540;
the system modeling module 510 is configured to obtain queue state information of a vehicle queue formed by automatically controlling vehicles, and establish a dynamic equation of the queue system according to the queue state information;
the problem construction module 520 is configured to construct a quadratic optimization control equation with a minimized state error and an input as an objective function according to the dynamic equation of the queue system;
the MDP building module 530 is configured to build a markov decision process model for networked control according to the dynamic equation of the queuing system and the quadratic form optimization control equation;
the calculation processing module 540 is configured to generate samples and train based on continuous interaction between the DRL algorithm and the environment, so as to obtain an intelligent optimization control strategy.
Based on any of the above embodiments, as shown in fig. 6, the system modeling module includes a state obtaining module 610, a dynamic constructing module 620, a state error constructing module 630, and a system dynamic module 640;
the state obtaining module 610 is configured to obtain vehicle distance, vehicle speed and acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
the dynamic construction module 620 is configured to establish a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed, and the acceleration information of each vehicle in the vehicle queue;
the state error establishing module 630 is configured to obtain an expected vehicle speed through a head vehicle, obtain an expected vehicle distance of each vehicle based on a preset range strategy, and establish a state error equation of each vehicle according to the expected vehicle speed of the head vehicle, the expected vehicle distance of each vehicle, and the current vehicle speed and vehicle distance of each vehicle;
and the system dynamic module 640 is configured to combine the state error equations of the vehicles, and obtain the dynamic equation of the queue system after discretization processing based on the state equations of the vehicles in the continuous-time queue.
Based on any of the above embodiments, the preset scope policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, obtaining the expected vehicle speed according to the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance, wherein the calculation formula is
V(h) = v_max · (h − h_min) / (h_max − h_min)
wherein V(h) represents the expected vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
Based on any of the above embodiments, the dynamic equation of the queue system obtained after the discretization process is as follows:
yi+1=A0yi+B1ui+B2ui-1
wherein y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time,
A_0 = e^{AΔT}
B_1 = (∫_0^{ΔT−τ} e^{Aθ} dθ)B
B_2 = (∫_{ΔT−τ}^{ΔT} e^{Aθ} dθ)B
(A and B being the state and input matrices of the continuous-time queue dynamics)
i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced time delay, λ_j and η_j represent system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the queue excluding the head vehicle, and V′(h*) is the partial derivative of the range strategy evaluated at the expected vehicle distance.
Based on any of the above embodiments, the quadratic optimization control equation is constructed by using the minimized state error and the input as the objective function according to the dynamic equation of the queue system as follows:
J = Σ_{i=1}^{N} (y_i^T C y_i + u_i^T D u_i)
wherein N is the number of sampling intervals, and C and D are coefficient matrices:
C = c_1 I, D = c_2 I
wherein c_1 and c_2 are preset coefficients.
Based on any one of the above embodiments, the intelligent optimization control model is obtained by performing neural network parameter training on the Markov decision process model using state samples collected in real time from the vehicle queue constructed by the automatically controlled vehicle, and the training includes:
establishing a depth certainty strategy gradient algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the Markov decision process model parameters;
in each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ); the policy a_k is executed, the state s_{k+1} at the next moment is obtained according to the state transition function, the corresponding reward r_k is obtained from the reward function, and the tuple (s_k, a_k, s_{k+1}, r_k) is stored in the experience replay buffer to obtain the state samples;
the current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
L(θ^Q) = (1/M) Σ_t (x_t − Q(s_t, a_t|θ^Q))²
wherein M is the number of samples in a minibatch, Q(s_t, a_t|θ^Q) is the current Q value obtained by inputting s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′})
wherein r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1}|θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1};
the current actor network updates its parameters θ^μ through the following policy gradient function:
∇_{θ^μ}J ≈ (1/M) Σ_t ∇_a Q(s, a|θ^Q)|_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s|θ^μ)|_{s=s_t}
wherein M is the number of samples in a minibatch and ∇ is the gradient operator;
the target actor network and the target critic network update their respective parameters θ^{Q′} and θ^{μ′} as follows:
θ^{Q′} ← δθ^Q + (1 − δ)θ^{Q′}
θ^{μ′} ← δθ^μ + (1 − δ)θ^{μ′}
wherein δ is a fixed constant with 0 < δ < 1.
To sum up, the intelligent cruise control method and device provided by the embodiments of the present invention construct the dynamic equation of the overall vehicle queue system by jointly analyzing vehicle dynamics and wireless network characteristics, formulate an optimization control problem that accounts for the dynamic time-varying network communication delay and the expected state, and on this basis construct an MDP model. A DRL-based intelligent algorithm then generates samples through continuous interaction with the environment, trains the neural networks and continuously accumulates experience, thereby obtaining an intelligent optimization control strategy for the automatically controlled vehicle, so that it can track the ideal expected vehicle speed, always keep a safe distance from the preceding vehicle, and run autonomously and stably in realistic, complex and dynamic network scenarios. In other words, under network communication delay and a dynamically changing expected system state, the embodiments of the present invention model the vehicle queue as a whole and combine optimization control theory with artificial intelligence methods to obtain the intelligent optimization control strategy of the networked cruise control system, thereby realizing stable control of the CCC automatically controlled vehicle. By applying networked control and artificial intelligence techniques to the automatic cruise control system of the vehicle, considering the influence of the complex dynamic environment on the control system, and further designing a DRL-based method to obtain the intelligent optimization control strategy, the environmental adaptability and self-learning capability of the cruise control system are improved.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 7, the electronic device may include: a processor (processor)710, a communication Interface (Communications Interface)720, a memory (memory)730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a smart cruise control method comprising: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the smart cruise control method provided by the above methods, where the method includes: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to execute the intelligent cruise control method provided in the foregoing aspects, the method including: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A smart cruise control method, comprising:
determining a current status signal of the automatically controlled vehicle;
inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
2. The smart cruise control method according to claim 1, wherein building the Markov decision process model comprises the steps of:
acquiring queue state information of the vehicle queue formed by the automatically controlled vehicles, and establishing a dynamic equation of the queue system according to the queue state information;
constructing, according to the dynamic equation of the queue system, a quadratic optimization control equation with minimizing the state error and the input as the objective function;
and constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
3. The smart cruise control method according to claim 2, wherein acquiring the queue state information of the vehicle queue formed by the automatically controlled vehicles and establishing the dynamic equation of the queue system according to the queue state information comprises the steps of:
obtaining the inter-vehicle distance, speed and acceleration of each vehicle in the queue through vehicle-to-vehicle communication;
establishing a dynamic equation for each vehicle in the queue according to its inter-vehicle distance, speed and acceleration;
acquiring a desired speed from the head vehicle, obtaining the desired inter-vehicle distance based on a preset range policy, and establishing a state error equation for each vehicle according to the desired speed of the head vehicle, the desired inter-vehicle distance, and the current speed and inter-vehicle distance;
and combining the state error equations of all the vehicles and, after discretizing the continuous-time state equations of all vehicles in the queue, obtaining the dynamic equation of the queue system.
4. The smart cruise control method according to claim 3, wherein the preset range policy comprises:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the desired vehicle speed is 0;
if the current vehicle distance is not smaller than the preset minimum vehicle distance and not larger than the preset maximum vehicle distance, the desired vehicle speed is obtained from the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance, according to the formula
[formula image: desired-speed function V(h), not reproduced]
wherein V(h) denotes the desired vehicle speed, h denotes the vehicle distance, h_min denotes the preset minimum vehicle distance, h_max denotes the preset maximum vehicle distance, and v_max denotes the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the desired vehicle speed is the preset maximum vehicle speed;
and the desired vehicle distance of each vehicle is obtained according to the desired vehicle speed.
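For illustration, the piecewise range policy of claim 4 can be sketched in Python. The middle-branch formula is an unreproduced image in the source, so a simple linear interpolation between h_min and h_max is assumed here; the numeric defaults are illustrative only, not values from the patent.

```python
def desired_speed(h, h_min=5.0, h_max=35.0, v_max=30.0):
    """Piecewise range policy: maps current vehicle distance h to a
    desired speed V(h).

    Assumption: the middle branch (an image in the original claim) is
    taken to be a linear interpolation between 0 at h_min and v_max at
    h_max; the actual patent formula may differ.
    """
    if h < h_min:
        # Too close to the preceding vehicle: desired speed is 0.
        return 0.0
    if h > h_max:
        # Beyond the maximum spacing: cruise at the preset maximum speed.
        return v_max
    # In between: interpolate between 0 and v_max (assumed form).
    return v_max * (h - h_min) / (h_max - h_min)
```

The three branches mirror the three conditions in the claim; only the middle branch's functional form is assumed.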
5. The smart cruise control method according to claim 3, wherein the discretized dynamic equation of the queue system is obtained as follows:
y_{i+1} = A_0 y_i + B_1 u_i + B_2 u_{i-1}
wherein y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, A_0, B_1 and B_2 are system matrices [formula images not reproduced], i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced delay, λ_j and a further coefficient [formula image not reproduced] represent system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the queue excluding the head vehicle, and [formula image not reproduced] is the partial derivative of the range policy at the desired vehicle distance.
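The discrete-time update of claim 5 can be rolled out numerically as below. The matrices A_0, B_1, B_2 encode patent-specific parameters (sampling interval, delay τ, driver-behavior coefficients) whose formula images are not reproduced, so this sketch simply takes them as inputs; the toy values in the usage note are illustrative only.

```python
import numpy as np

def simulate_queue(A0, B1, B2, y0, u_seq):
    """Roll out the discretized queue dynamics of claim 5:

        y_{i+1} = A0 @ y_i + B1 @ u_i + B2 @ u_{i-1}

    with u_{-1} assumed to be zero. Returns the state trajectory
    including the initial state.
    """
    y = np.asarray(y0, dtype=float)
    u_prev = np.zeros(B2.shape[1])  # assumed zero control before t = 0
    traj = [y.copy()]
    for u in u_seq:
        u = np.asarray(u, dtype=float)
        y = A0 @ y + B1 @ u + B2 @ u_prev
        u_prev = u
        traj.append(y.copy())
    return np.array(traj)
```

With scalar stand-ins A0 = [[0.5]], B1 = [[1.0]], B2 = [[0.1]], initial state [1.0] and controls [1.0], [0.0], the trajectory follows the two-step recursion directly; the delayed input term B2 u_{i-1} only contributes from the second step onward.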
6. The smart cruise control method according to claim 2, wherein the quadratic optimization control equation, constructed from the dynamic equation of the queue system with minimizing the state error and the input as the objective function, is as follows:
[formula image: quadratic objective function, not reproduced]
wherein N is the number of sampling intervals, and C and D are coefficient matrices:
[formula image: definitions of C and D, not reproduced]
c_1 and c_2 are preset coefficients.
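As a sketch of the quadratic objective in claim 6: the exact formula image and the way C and D are built from c_1 and c_2 are not reproduced, so a generic finite-horizon sum of quadratic state and input penalties is assumed here purely for illustration.

```python
import numpy as np

def quadratic_cost(ys, us, C, D):
    """Assumed finite-horizon quadratic cost:

        J = sum_i  y_i^T C y_i  +  sum_i  u_i^T D u_i

    where ys and us are sequences of state-error and input vectors.
    The true structure of C and D in the patent (from the preset
    coefficients c_1, c_2) is not reproduced.
    """
    state_term = sum(float(y @ C @ y) for y in ys)
    input_term = sum(float(u @ D @ u) for u in us)
    return state_term + input_term
```

Minimizing such a cost over the control sequence is the standard linear-quadratic setup that the Markov decision process of claim 2 then recasts with states, actions and rewards.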
7. The smart cruise control method according to claim 1, wherein the intelligent optimization control model is obtained by performing neural network parameter training on the Markov decision process model based on state samples collected in real time from the vehicle queue formed by the automatically controlled vehicles, comprising the steps of:
establishing a deep deterministic policy gradient (DDPG) algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the parameters of the Markov decision process model;
in each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ); the policy [formula image: action with exploration, not reproduced] is executed, the state s_{k+1} at the next moment is obtained according to the state transition function and the corresponding reward r_k from the reward function, and the transition [formula image: experience tuple, not reproduced] is stored in an experience replay buffer to obtain state samples; wherein [formula image not reproduced];
the current critic network updates its parameters θ^Q by minimizing the mean square error loss function
L(θ^Q) = (1/M) Σ_t (x_t − Q(s_t, a_t|θ^Q))²
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by inputting s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′})
where r_t is the corresponding value of the reward function, Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1}|θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1};
the current actor network updates its parameters θ^μ by the policy gradient
∇_{θ^μ}J ≈ (1/M) Σ_t ∇_a Q(s, a|θ^Q)|_{s=s_t, a=μ(s_t)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_t}
where ∇ is the gradient operator;
the target critic network and the target actor network update their parameters θ^{Q′} and θ^{μ′} respectively as follows:
θ^{Q′} ← δθ^Q + (1 − δ)θ^{Q′}
θ^{μ′} ← δθ^μ + (1 − δ)θ^{μ′}
where δ is a fixed constant with 0 < δ < 1.
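The three scalar update rules of claim 7 — the target Q value, the critic's mean-square loss, and the soft target-network update — can be sketched with plain NumPy arrays standing in for network outputs and parameter vectors. This is an illustrative simplification: in the patent, Q and μ are neural networks, not vectors.

```python
import numpy as np

def td_target(r_t, q_next, gamma=0.99):
    # Target Q value: x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}|theta^mu')|theta^Q').
    # q_next stands in for the target critic's output.
    return r_t + gamma * q_next

def critic_loss(targets, q_values):
    # Mini-batch mean square error loss minimized by the current critic:
    # L = (1/M) * sum_t (x_t - Q(s_t, a_t|theta^Q))^2
    targets = np.asarray(targets, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    return float(np.mean((targets - q_values) ** 2))

def soft_update(theta_target, theta_current, delta=0.005):
    # Soft update theta' <- delta * theta + (1 - delta) * theta', 0 < delta < 1.
    # Parameters are represented as flat vectors for illustration.
    return delta * theta_current + (1 - delta) * theta_target
```

Because 0 < δ < 1, the soft update moves the target parameters only a small step toward the current ones each iteration, which is what stabilizes the bootstrapped target x_t.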
8. A smart cruise control device, characterized by comprising a state signal unit and an intelligent control unit;
the state signal unit is configured to determine a current state signal of an automatically controlled vehicle;
the intelligent control unit is configured to input the current state signal of the automatically controlled vehicle into an intelligent optimization control model to realize intelligent cruise control of the automatically controlled vehicle;
wherein the intelligent optimization control model is obtained by training a Markov decision process model based on state samples collected in real time from a vehicle queue formed by the automatically controlled vehicles.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the smart cruise control method according to any of claims 1 to 7 are implemented by the processor when executing the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the smart cruise control method according to any one of claims 1 to 7.
CN202110458260.3A 2021-04-27 2021-04-27 Intelligent cruise control method and device, electronic equipment and storage medium Pending CN113335277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110458260.3A CN113335277A (en) 2021-04-27 2021-04-27 Intelligent cruise control method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113335277A true CN113335277A (en) 2021-09-03

Family

ID=77468696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110458260.3A Pending CN113335277A (en) 2021-04-27 2021-04-27 Intelligent cruise control method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113335277A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109606367A (en) * 2018-11-06 2019-04-12 北京工业大学 The optimum linearity control method and device of cruise control system based on car networking
CN109624986A (en) * 2019-03-01 2019-04-16 吉林大学 A kind of the study cruise control system and method for the driving style based on pattern switching
CN109733415A (en) * 2019-01-08 2019-05-10 同济大学 A kind of automatic Pilot following-speed model that personalizes based on deeply study
US20200033868A1 (en) * 2018-07-27 2020-01-30 GM Global Technology Operations LLC Systems, methods and controllers for an autonomous vehicle that implement autonomous driver agents and driving policy learners for generating and improving policies based on collective driving experiences of the autonomous driver agents
CN110989576A (en) * 2019-11-14 2020-04-10 北京理工大学 Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN111267831A (en) * 2020-02-28 2020-06-12 南京航空航天大学 Hybrid vehicle intelligent time-domain-variable model prediction energy management method
CN112162555A (en) * 2020-09-23 2021-01-01 燕山大学 Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
CN112580148A (en) * 2020-12-20 2021-03-30 东南大学 Heavy-duty operation vehicle rollover prevention driving decision method based on deep reinforcement learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113734167A (en) * 2021-09-10 2021-12-03 苏州智加科技有限公司 Vehicle control method, device, terminal and storage medium
CN114387787A (en) * 2022-03-24 2022-04-22 华砺智行(武汉)科技有限公司 Vehicle track control method and device, electronic equipment and storage medium
CN114387787B (en) * 2022-03-24 2022-08-23 华砺智行(武汉)科技有限公司 Vehicle track control method and device, electronic equipment and storage medium
CN116257069A (en) * 2023-05-16 2023-06-13 睿羿科技(长沙)有限公司 Unmanned vehicle formation decision and speed planning method
CN116257069B (en) * 2023-05-16 2023-08-08 睿羿科技(长沙)有限公司 Unmanned vehicle formation decision and speed planning method
CN117055586A (en) * 2023-06-28 2023-11-14 中国科学院自动化研究所 Underwater robot tour search and grabbing method and system based on self-adaptive control

Similar Documents

Publication Publication Date Title
CN113335277A (en) Intelligent cruise control method and device, electronic equipment and storage medium
Zhu et al. Human-like autonomous car-following model with deep reinforcement learning
Liang et al. A deep reinforcement learning network for traffic light cycle control
WO2021208771A1 (en) Reinforced learning method and device
Zhu et al. Multi-robot flocking control based on deep reinforcement learning
CN109990790B (en) Unmanned aerial vehicle path planning method and device
Li et al. A reinforcement learning-based vehicle platoon control strategy for reducing energy consumption in traffic oscillations
CN109726804B (en) Intelligent vehicle driving behavior personification decision-making method based on driving prediction field and BP neural network
CN113412494B (en) Method and device for determining transmission strategy
CN112937564A (en) Lane change decision model generation method and unmanned vehicle lane change decision method and device
Naveed et al. Trajectory planning for autonomous vehicles using hierarchical reinforcement learning
Han et al. Intelligent decision-making for 3-dimensional dynamic obstacle avoidance of UAV based on deep reinforcement learning
CN115578876A (en) Automatic driving method, system, equipment and storage medium of vehicle
US20230367934A1 (en) Method and apparatus for constructing vehicle dynamics model and method and apparatus for predicting vehicle state information
CN113867354A (en) Regional traffic flow guiding method for intelligent cooperation of automatic driving of multiple vehicles
CN112462602B (en) Distributed control method for keeping safety spacing of mobile stage fleet under DoS attack
CN115494879B (en) Rotor unmanned aerial vehicle obstacle avoidance method, device and equipment based on reinforcement learning SAC
Zhou et al. A novel mean-field-game-type optimal control for very large-scale multiagent systems
CN114815882A (en) Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning
Yuan et al. Prioritized experience replay-based deep q learning: Multiple-reward architecture for highway driving decision making
CN117406756A (en) Method, device, equipment and storage medium for determining motion trail parameters
Zhuang et al. Robust auto-parking: Reinforcement learning based real-time planning approach with domain template
Wang et al. Experience sharing based memetic transfer learning for multiagent reinforcement learning
Diallo et al. Coordination in adversarial multi-agent with deep reinforcement learning under partial observability
CN114723058A (en) Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination