CN113335277A - Intelligent cruise control method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113335277A (application CN202110458260.3A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- queue
- state
- current
- control
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
- B60W30/14—Adaptive cruise control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/40—Dynamic objects, e.g. animals, windblown objects
- B60W2554/404—Characteristics
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/40—Dynamic objects, e.g. animals, windblown objects
- B60W2554/404—Characteristics
- B60W2554/4042—Longitudinal speed
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/80—Spatial relation or speed relative to objects
- B60W2554/802—Longitudinal distance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/06—Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Embodiments of the invention provide an intelligent cruise control method and apparatus, an electronic device and a storage medium. The method comprises: determining a current state signal of an automatically controlled vehicle; and inputting the current state signal into an intelligent optimization control model to realize intelligent cruise control of the vehicle. The intelligent optimization control model is obtained by training the neural-network parameters of a Markov decision process model on state samples collected in real time from a vehicle queue containing the automatically controlled vehicle. The invention addresses the unpredictability of complex traffic environments and the unreliability of the network that limit existing cruise control methods based on networked control.
Description
Technical Field
The invention relates to the technical field of automatic control, and in particular to an intelligent cruise control method and apparatus, an electronic device and a storage medium.
Background
Cruise control is an advanced driver-assistance technology that can effectively reduce the driver's burden while improving road traffic efficiency, driving safety and fuel economy. Current cruise control methods based on networked control, such as Adaptive Cruise Control (ACC), Cooperative Adaptive Cruise Control (CACC) and Connected Cruise Control (CCC), have received wide attention and application but still suffer from many limitations. For example, the ACC method combines multiple sensor technologies to perceive road traffic information; because sensor perception is of limited sensitivity and easily disturbed by the external environment, the stability and safety of ACC are insufficient. The CACC method builds on ACC by introducing vehicle-to-vehicle (V2V) communication from the Internet of Vehicles so that vehicles in the fleet actively exchange their motion-state information. However, CACC requires every vehicle in the fleet to be equipped with an ACC automatic driving device to assist cooperative control, and its communication topology is usually fixed; when the fleet contains a manually driven vehicle or road conditions change, the performance and stability of CACC inevitably degrade, which also limits its application in future traffic scenarios. To allow a more flexible vehicle-queue design, connection structure and communication topology, CCC was further proposed: it allows the controlled vehicle to receive state information broadcast by several preceding vehicles without equipping every vehicle with sensors, improving the information perception and control capability of each vehicle while avoiding the need to design the whole queue uniformly.
Although a CCC system requires neither a designated head vehicle nor a fixed communication structure, and can therefore communicate selectively, allowing a modular design and better scalability, the characteristics of its topology, its network communication delay and its expected state are dynamic and time-varying under environmental changes, controlled-vehicle motion, and the limited transmission capability and link quality of network nodes. The unpredictability of complex traffic environments and the unreliability of the network therefore pose a serious challenge to cruise control methods based on networked control.
Disclosure of Invention
Embodiments of the invention provide an intelligent cruise control method and apparatus, an electronic device and a storage medium, which solve some or all of the problems of existing cruise control methods based on networked control.
In a first aspect, an embodiment of the present invention provides a smart cruise control method, including:
determining a current status signal of the automatically controlled vehicle;
inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by training the neural-network parameters of a Markov decision process model on state samples collected in real time from a vehicle queue containing the automatically controlled vehicle.
Preferably, the markov decision process model is constructed by the following steps:
acquiring queue state information of a vehicle queue containing the automatically controlled vehicle, and establishing a dynamic equation of the queue system from the queue state information;
according to the dynamic equation of the queue system, a quadratic optimization control equation is constructed by taking the minimized state error and the input as objective functions;
and constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
Preferably, the acquiring queue state information of a vehicle queue containing the automatically controlled vehicle and establishing a dynamic equation of the queue system from the queue state information includes the following steps:
obtaining the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue;
acquiring an expected speed from the head vehicle, obtaining an expected inter-vehicle distance based on a preset range policy, and establishing a state error equation of each vehicle from the expected speed of the head vehicle, the expected inter-vehicle distance, and the current speed and inter-vehicle distance;
and combining the state error equations of all the vehicles, and obtaining the dynamic equation of the queue system after discretization processing based on the state equations of all the vehicles in the continuous time queue.
Preferably, the preset range policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, the expected vehicle speed is obtained from the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance; in the calculation formula, V(h) denotes the expected vehicle speed, h the vehicle distance, h_min the preset minimum vehicle distance, h_max the preset maximum vehicle distance, and v_max the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
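The range-policy steps above can be sketched as follows. Since the patent's exact formula is not reproduced in this text, the linear ramp between h_min and h_max is an assumption for illustration, and all numeric parameter values are hypothetical:

```python
def desired_speed(h, h_min=5.0, h_max=35.0, v_max=30.0):
    """Range policy V(h): maps inter-vehicle distance h to a desired speed.

    Assumed linear form: 0 below h_min, v_max above h_max, and a linear
    ramp in between (the patent's actual formula is not shown in the text).
    """
    if h < h_min:        # too close to the preceding vehicle: stop
        return 0.0
    if h > h_max:        # far enough away: travel at the maximum speed
        return v_max
    return v_max * (h - h_min) / (h_max - h_min)

def desired_distance(v, h_min=5.0, h_max=35.0, v_max=30.0):
    """Inverse of the assumed range policy: expected distance for speed v."""
    return h_min + (h_max - h_min) * v / v_max
```

Under these assumptions, the desired speed rises linearly with distance, and each vehicle's expected distance follows from the expected speed by inverting the same mapping.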
Preferably, the dynamic equation of the queue system obtained after the discretization process is as follows:
y_{i+1} = A_0 y_i + B_1 u_i + B_2 u_{i-1};
where y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced delay, λ_j and the associated coefficients are system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the queue other than the head vehicle, and the partial derivative of the range policy at the expected vehicle distance also enters as a system parameter.
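The discrete-time queue dynamics above can be simulated directly; a minimal sketch follows, where the matrices are illustrative placeholders rather than the patent's actual parameters:

```python
import numpy as np

def queue_step(y, u, u_prev, A0, B1, B2):
    """One sampling step of y_{i+1} = A0 y_i + B1 u_i + B2 u_{i-1}.

    The B2 u_{i-1} term carries the control applied one sampling interval
    earlier, which is how the network-induced delay enters the model.
    """
    return A0 @ y + B1 @ u + B2 @ u_prev

# illustrative placeholder matrices (not the patent's parameters)
A0 = np.eye(2)
B1 = 0.1 * np.eye(2)
B2 = 0.05 * np.eye(2)
y1 = queue_step(np.array([1.0, 2.0]), np.array([1.0, 1.0]), np.zeros(2), A0, B1, B2)
```

Iterating this update while keeping the previous control in memory reproduces the delayed closed-loop behavior of the queue.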
Preferably, the quadratic optimization control equation constructed from the dynamic equation of the queue system, with minimization of the state error and the control input as the objective function, is as follows:
where N is the number of sampling intervals, C and D are coefficient matrices, and c_1 and c_2 are preset coefficients.
Preferably, obtaining the intelligent optimization control model by training the neural-network parameters of the Markov decision process model on state samples collected in real time from the vehicle queue containing the automatically controlled vehicle includes:
establishing a deep deterministic policy gradient (DDPG) algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the parameters of the Markov decision process model;
in each time slot, given the input state s_k, the current actor network outputs the corresponding action policy μ(s_k | θ^μ); the policy is executed, the next state s_{k+1} is obtained from the state transition function, and the corresponding reward r_k is obtained from the reward function; the transition (s_k, a_k, r_k, s_{k+1}) is stored in an experience replay buffer to accumulate state samples;
The current critic network updates its parameter θ^Q by minimizing the following mean-square-error loss function:
where M is the number of samples in a mini-batch, Q(s_t, a_t | θ^Q) is the current Q value obtained by feeding s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γ Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′})
where r_t is the corresponding value of the reward function, Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1} | θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1};
The current actor network updates its parameter θ^μ through the policy gradient function:
The target critic network and the target actor network update their parameters θ^{Q′} and θ^{μ′} respectively as follows:
θ^{Q′} ← δ θ^Q + (1 − δ) θ^{Q′}
θ^{μ′} ← δ θ^μ + (1 − δ) θ^{μ′}
where δ is a fixed constant with 0 < δ < 1.
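The target-value and soft-update rules above are simple enough to state in code; this is a minimal sketch of those two formulas, with parameters represented as a plain dictionary rather than actual network weights:

```python
def td_target(r, q_next, gamma=0.99):
    """Target Q value x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return r + gamma * q_next

def soft_update(theta_target, theta_current, delta=0.005):
    """Soft update theta' <- delta*theta + (1 - delta)*theta', 0 < delta < 1.

    Applied to both the target critic (theta_Q') and the target actor
    (theta_mu') after each training step.
    """
    return {k: delta * theta_current[k] + (1.0 - delta) * theta_target[k]
            for k in theta_target}
```

The small δ keeps the target networks changing slowly, which is what stabilizes the mean-square-error regression performed by the current critic.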
In a second aspect, an embodiment of the present invention provides an intelligent cruise control apparatus, including a state signal unit and an intelligent control unit;
the state signal unit is used for determining a current state signal of the automatic control vehicle;
the intelligent control unit is used for inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by training a Markov decision process model on state samples collected in real time from a vehicle queue containing the automatically controlled vehicle.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the intelligent cruise control method according to any one of the above-mentioned first aspects when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the intelligent cruise control method according to any one of the above-mentioned first aspects.
According to the intelligent cruise control method and apparatus, electronic device and storage medium provided by the embodiments, the current state signal of the automatically controlled vehicle is input into an intelligent optimization control model to realize intelligent cruise control of the vehicle; the model is obtained by training the neural-network parameters of a Markov decision process model on state samples collected in real time from the vehicle queue containing the automatically controlled vehicle. By continuously interacting with the environment, the embodiments can keep learning and adjusting the optimization control strategy for networked cruise control, adapting to complex and changeable real network dynamics, achieving safe and stable driving of the autonomous vehicle, and overcoming the unpredictability of complex traffic environments and the unreliability of the network in existing cruise control methods based on networked control.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a smart cruise control method according to the present invention;
FIG. 2 is a schematic diagram of a smart cruise control scenario based on networked control provided by the present invention;
FIG. 3 is a diagram of a smart cruise control architecture based on networked control provided by the present invention;
fig. 4 is a schematic structural diagram of an intelligent cruise control device provided by the invention;
FIG. 5 is a block diagram of an intelligent optimization control module provided by the present invention;
FIG. 6 is a block diagram of a system modeling module provided by the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following describes a smart cruise control method, apparatus, electronic device and storage medium provided by the present invention with reference to fig. 1 to 7.
The embodiment of the invention provides an intelligent cruise control method. Fig. 1 is a schematic flow chart of a smart cruise control method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
Specifically, the vehicle queue in the embodiment of the invention comprises manually driven vehicles and CCC vehicles; each vehicle in the queue is provided with a communication device, and the CCC automatically driven vehicle can receive state information, including headway, vehicle speed and acceleration, from other vehicles through V2V communication.
The intelligent optimization control model is obtained by training the neural-network parameters of a Markov decision process model on state samples collected in real time from the vehicle queue containing the automatically controlled vehicle.
Specifically, a dynamic equation of the vehicle queue system is constructed by analyzing vehicle dynamics and wireless-network characteristics; an optimization control problem is then formulated that accounts for dynamic, time-varying network communication delay and the expected state, and an MDP model is built. Using a DRL algorithm, samples are generated by continuous interaction with the environment and the neural network is trained, finally yielding an intelligent optimization control strategy for the automatically controlled vehicle. The vehicle can thus track the ideal expected speed while always keeping a safe distance from the preceding vehicle, and the stable operation of the control system and the vehicle queue under dynamic network conditions is guaranteed.
The method provided by the embodiment of the invention can continuously and intelligently learn and adjust the optimization control strategy of networked cruise control through continuously interacting with the environment, thereby being suitable for the actual complex and changeable network dynamic scene and realizing the safe and stable driving of the CCC automatic driving vehicle.
Based on any one of the above embodiments, the construction process of the markov decision process model comprises the following steps:
acquiring queue state information of a vehicle queue containing the automatically controlled vehicle, and establishing a dynamic equation of the queue system from the queue state information;
it should be noted that, due to the flexible network topology between vehicles in the CCC system, each vehicle can communicate with nearby vehicles. Through wireless V2V communication, the CCC vehicle can acquire real-time state information such as headway, speed and acceleration of other vehicles in the fleet, so that the whole vehicle queue can be modeled. Meanwhile, the CCC can provide services for heterogeneous vehicle queues, so that the sequence and the number of manually driven vehicles and CCC automatic control vehicles in a fleet are variable, and the requirements of real traffic scenes on the flexibility of the vehicle queues are met better. Generally, the automatic control vehicle does not need to consider the vehicle state of the subsequent vehicle, and in order to describe the technical scheme more clearly, the embodiment of the invention takes the tail vehicle as the CCC automatic control vehicle and other vehicles as the manual driving vehicles as examples. In addition, the method provided by the embodiment of the invention is also suitable for controlling the automatic control vehicle in a more complex model, and when the queue model changes, the modeling method provided by the embodiment of the invention can be used for constructing a corresponding system dynamic equation according to the specific situation of the queue.
According to the dynamic equation of the queue system, a quadratic optimization control equation is constructed by taking the minimized state error and the input as objective functions;
It is noted that the goal of cruise control is to enable the vehicles in the queue to track a desired speed and maintain a desired inter-vehicle distance while achieving comfortable and smooth acceleration control. A quadratic optimization control problem can therefore be constructed with the goal of minimizing the speed and distance errors and the control inputs. On the one hand, however, such optimization control problems are difficult to solve directly because of the high-dimensional state space and complex physical properties. On the other hand, owing to actual network communication delay and the dynamic, time-varying nature of the expected state, traditional optimization decision methods that rely on fixed-parameter models and static strategies carry high robustness and stability risks. The embodiment of the invention therefore provides an intelligent optimization control method based on DRL (deep reinforcement learning) to improve the adaptability and stability of the automatically controlled vehicle under complex dynamic conditions.
And constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
It should be noted that a Reinforcement Learning (RL) problem is usually described as an MDP (Markov Decision Process), which generally comprises states, actions, a state transition function and a reward function; the MDP model of the system is established from the system model and the optimization problem. An intelligent optimization control strategy is then obtained with a Deep Reinforcement Learning (DRL) algorithm applied to the MDP model. For a continuous-action control problem such as cruise control, traditional algorithms based on discrete actions, such as Q-learning, DQN (Deep Q-Network) and Actor-Critic, often degrade owing to poor convergence and stability. The embodiment of the invention is therefore based on the Deep Deterministic Policy Gradient (DDPG) algorithm in DRL: following the defined MDP model, samples are collected and the networks are trained through continuous interaction with the environment, the neural-network parameters are continuously optimized toward maximizing the reward function, and finally an intelligent optimization control policy output signal is generated in real time from the current state input of the CCC automatically controlled vehicle, realizing its safe and stable control.
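The sample collection through environment interaction described above relies on an experience replay buffer from which mini-batches are drawn for training; a minimal sketch (capacity and interface are illustrative assumptions, not the patent's specification):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s_k, a_k, r_k, s_{k+1}) transitions for mini-batch sampling."""

    def __init__(self, capacity=100000):
        # oldest transitions are discarded automatically once full
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, m):
        # uniform mini-batch of M transitions, as used in the critic's
        # mean-square-error loss
        return random.sample(self.buf, m)

    def __len__(self):
        return len(self.buf)
```

Sampling uniformly from past transitions breaks the temporal correlation of consecutive queue states, which is one of the properties DDPG depends on for stable training.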
Based on any one of the embodiments, the acquiring queue state information of a vehicle queue built by automatically controlled vehicles and establishing a dynamic equation of the queue system according to the queue state information includes the following steps:
obtaining the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information of each vehicle in the vehicle queue;
acquiring an expected speed through a head vehicle, acquiring an expected distance between vehicles based on a preset range strategy, and establishing a state error equation of each vehicle according to the expected speed of the head vehicle, the expected distance between vehicles and the current speed and distance between vehicles;
and combining the state error equations of all the vehicles, and obtaining the dynamic equation of the queue system after discretization processing based on the state equations of all the vehicles in the continuous time queue.
Specifically, establishing a queue system model according to a queue includes:
collecting the distance, speed and acceleration information of each vehicle in the queue according to V2V communication;
establishing a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed and the acceleration information;
obtaining expected vehicle speed according to the head vehicle, and combining a range strategy to obtain an expected vehicle distance of each vehicle;
establishing a state error equation of each vehicle according to the expected vehicle speed and the expected vehicle distance as well as the current vehicle speed and the current vehicle distance of each vehicle;
and simultaneously establishing state error equations of all vehicles to obtain a queue state equation based on continuous time, and obtaining a queue system model based on discrete time after discretization.
Since wireless V2V communication is introduced to promote state-information sharing between vehicles, a vehicle dynamic equation with time delay is obtained by analyzing the influence of the delay characteristics of the wireless network on the CCC automatically controlled vehicle. The state error equations of all manually driven vehicles and CCC automatically driven vehicles in the queue are then combined to obtain a continuous-time system state error equation. Finally, the continuous-time system state equation is discretized by sampling to obtain a queue system model based on discrete time.
Based on any of the above embodiments, the preset scope policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, the expected vehicle speed is obtained from the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance, with the calculation formula V(h) = v_max·(h − h_min)/(h_max − h_min), where V(h) represents the desired vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
It should be noted that dynamic analysis is performed on the manually driven vehicles and the CCC automatically controlled vehicle: the state information of each vehicle in the queue, such as vehicle distance, vehicle speed and acceleration, is obtained through V2V communication, and the vehicle dynamic equation can then be established from the relationships between them. The speed of the head vehicle in the queue is used as the expected speed of the other vehicles, and the expected distance can be obtained from the range strategy. Once the desired vehicle speed and the desired vehicle distance are obtained, the state error equation of each vehicle can be derived. The expected vehicle distance and vehicle speed satisfy the following range strategy:
where V(h) represents the desired vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed.
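The range strategy described above can be sketched as a small function. The numeric values of h_min, h_max and v_max below are illustrative assumptions, not values from the patent, and the linear ramp between the two distance thresholds is the simplest form consistent with the listed parameters:

```python
def desired_speed(h, h_min=5.0, h_max=35.0, v_max=30.0):
    """Range strategy V(h): zero below h_min, linear ramp between
    h_min and h_max, capped at v_max above h_max.
    All parameter values here are illustrative assumptions."""
    if h < h_min:
        return 0.0
    if h > h_max:
        return v_max
    return v_max * (h - h_min) / (h_max - h_min)
```

A desired distance h* can then be read off as the distance at which V(h*) equals the head vehicle's expected speed.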
Based on any of the above embodiments, the dynamic equation of the queue system obtained after the discretization process is as follows:
y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1};
where y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced time delay, λ_j and a second gain represent system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the vehicle queue excluding the head vehicle, and V′(h*) is the partial derivative of the range strategy at the desired vehicle distance.
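The discrete-time queue dynamics y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1} can be simulated with plain matrix arithmetic. The tiny 2-state matrices below are placeholders for illustration, not the system matrices derived in the patent:

```python
def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def queue_step(y, u_i, u_prev, A0, B1, B2):
    """One step of y_{i+1} = A0*y_i + B1*u_i + B2*u_{i-1};
    B1 and B2 are input vectors for the scalar control u."""
    Ay = matvec(A0, y)
    return [a + b1 * u_i + b2 * u_prev for a, b1, b2 in zip(Ay, B1, B2)]

# Placeholder matrices for a 2-dimensional error state
A0 = [[1.0, 0.1], [0.0, 1.0]]
B1 = [0.0, 0.1]   # effect of the current control input
B2 = [0.0, 0.05]  # delayed effect of the previous control input
y1 = queue_step([1.0, 0.0], 0.5, 0.0, A0, B1, B2)
```

The delayed input term B2·u_{i-1} is what carries the network-induced delay τ into the discrete model.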
Based on any of the above embodiments, the quadratic optimization control equation, constructed from the dynamic equation of the queue system with minimization of the state error and the control input as the objective function, is as follows:
wherein N is the number of sampling intervals, C and D are coefficient matrices:
c1 and c2 are preset coefficients.
Specifically, fig. 2 is a schematic diagram of an intelligent cruise control scene based on networked control according to an embodiment of the present invention. For ease of understanding, the vehicle queue in the embodiment of the invention consists of m+1 vehicles, where the tail vehicle, i.e. vehicle #1, is the CCC autonomous vehicle, the other vehicles are human-driven vehicles, and the front vehicle, i.e. vehicle #m+1, is the head vehicle. Each vehicle in the fleet is equipped with a communication device, and the CCC autonomous vehicle can receive status information from the other vehicles, including headway (vehicle distance), vehicle speed and acceleration, via V2V communication technology. To clearly illustrate the technical scheme of the embodiment of the invention, the head vehicle serves as the tracking target of the CCC autonomous vehicle and runs at a dynamically changing vehicle speed.
As shown in FIG. 2, the equations for the dynamics of a human manually driven vehicle may be defined as follows:
where v_j(t) represents the vehicle speed of the jth vehicle, h_j(t) represents the distance between the jth vehicle and the preceding vehicle, the overdot denotes the derivative with respect to time t, λ_j and a second gain represent system parameters related to human driving behavior, and V(h) is the desired speed based on the vehicle distance.
While the dynamic equations for a CCC autonomous vehicle may be defined as follows:
wherein u (t) represents the control strategy, i.e. the acceleration of the CCC autonomous vehicle, and τ (t) represents the network-induced time delay in the networked control process.
The goal of each vehicle in the fleet is to reach the desired vehicle distance h*(t) and the desired vehicle speed v*(t) = V(h*(t)). The distance error h̃_j(t) = h_j(t) − h*(t) and the speed error ṽ_j(t) = v_j(t) − v*(t) are defined from the deviation between the actual state and the desired state. Applying a linear first-order approximation to the vehicle dynamics model, the error dynamics model of the vehicle fleet can be obtained as follows:
defining a state vector:
the dynamic equation of the system obtained by simultaneously establishing the error dynamics equation of each vehicle is as follows:
In the above formula:
By sampling and discretizing the system dynamic equation, the discrete-time system dynamic model for the ith sampling interval is obtained as follows:
y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1}
where y_i = y(iΔT) and u_i = u(iΔT) represent the state variable and the acceleration control strategy at the current moment, respectively, ΔT represents the sampling interval, and the other parameters are:
The aim of cruise control is to make the vehicle track the target distance and speed so that the whole fleet is kept in the equilibrium state y = 0. To achieve optimal control, a quadratic cost function is defined as:
in the above formula, N is the number of sampling intervals, C and D are coefficient matrices:
where c_1 and c_2 are preset coefficients; in the embodiment of the present invention they are set to 1 and 0.1, respectively.
In summary, the cruise control system optimization problem can be constructed as follows:
s.t. y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1}
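The quadratic objective can be evaluated numerically. A minimal sketch, assuming (consistently with the stated coefficients c1 = 1 and c2 = 0.1) that the cost accumulates a weighted squared state error plus a weighted squared control input over the horizon:

```python
def quadratic_cost(ys, us, c1=1.0, c2=0.1):
    """Hypothetical quadratic cost: sum over the horizon of
    c1*||y_i||^2 (state error) + c2*u_i^2 (control effort).
    The exact weighting matrices C, D of the patent are assumed
    here to be scalar multiples c1, c2 of the identity."""
    return sum(
        c1 * sum(v * v for v in y) + c2 * u * u
        for y, u in zip(ys, us)
    )

# Two-step horizon with a 2-dimensional error state
cost = quadratic_cost(ys=[[1.0, 0.0], [0.5, 0.5]], us=[1.0, 0.0])
```

Minimizing this cost subject to the discrete queue dynamics is the optimization problem the DRL agent is later trained to solve.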
Considering the influence of the dynamic time-varying characteristics of the network, and in order to improve the environmental adaptability and self-learning capability of the networked intelligent cruise control system, the embodiment of the invention provides an intelligent optimization control method based on DRL to solve the above optimization problem.
MDP is usually used to formally describe the RL problem: at each time slot k, the agent observes the current state from the environment and makes a decision; after performing an action it obtains the next state and adjusts its policy according to the reward value fed back. In the embodiment of the invention, the states, actions, state transition function and reward function of the MDP are defined according to the cruise control system model and the optimization problem constructed under the network dynamic scene.
1) State
Considering that the optimal control strategy is affected by both the current state and the delayed control signal caused by the network delay, the new state vector is defined as the current queue state augmented with the previous control input, s_k = [y_k, u_{k-1}]:
2) Action
For the networked cruise control system, the action may be defined as the acceleration control strategy:
a_k = u_k
3) State transition function
Based on the discrete-time system model of the networked cruise control system and the state vector s_k, the state transition function can be expressed as:
s_{k+1} = s_k·E + a_k·F
where:
4) Reward function
Unlike minimizing the cost function in optimization theory, the goal of the intelligent algorithm is to maximize the long-term cumulative reward value, so the reward function can be defined as:
where:
The long-term cumulative reward value, referred to as the return, is expressed as follows:
In the above formula, 0 < γ < 1 is the discount factor.
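Since the reward is defined as the opposite of minimizing a cost, and the return is the discounted sum of rewards, both can be sketched as follows. The exact weighting inside the reward is an assumption mirroring the quadratic cost function of the optimization problem:

```python
def reward(y, u, c1=1.0, c2=0.1):
    """Assumed reward: negative quadratic stage cost, so maximizing
    the reward corresponds to minimizing state error and control effort."""
    return -(c1 * sum(v * v for v in y) + c2 * u * u)

def discounted_return(rewards, gamma=0.9):
    """Long-term cumulative reward: sum_k gamma^k * r_k, with 0 < gamma < 1."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```

The discount factor γ trades off immediate tracking accuracy against long-run fleet stability.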
Because the action values of the cruise control system are continuous, the DDPG method in DRL can well overcome the system performance degradation caused by discrete action design. Therefore, the embodiment of the invention provides an intelligent optimization control method based on DDPG to obtain the intelligent control strategy, thereby improving the convergence and stability of the system.
Based on any one of the embodiments, the intelligent optimization control model is obtained by performing neural network parameter training on a markov decision process model based on a vehicle queue real-time collected state sample constructed by the automatic control vehicle, and includes:
establishing a depth certainty strategy gradient algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the Markov decision process model parameters;
It should be noted that the intelligent cruise control architecture based on networked control is shown in fig. 3. The DDPG mainly comprises four deep neural networks: the current actor network μ(s|θ^μ), the target actor network μ′(s|θ^μ′), the current critic network Q(s,a|θ^Q) and the target critic network Q′(s,a|θ^Q′), where μ(·) is a deterministic action policy, Q(·) is an action-value evaluation function, and θ represents the corresponding neural network parameters. The agent obtains the control strategy μ by training the actor network, and obtains the corresponding Q value by training the critic network to evaluate the control strategy.
In each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ). Executing the policy a_k yields the next state s_{k+1} from the state transition function and the corresponding reward r_k from the reward function; the tuple (s_k, a_k, s_{k+1}, r_k) is stored in an experience replay buffer to obtain state samples.
The current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by feeding s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′)
where r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′) is the next Q value produced by the target critic network, and μ′(s_{t+1}|θ^μ′) is the next action policy generated by the target actor network from the input state s_{t+1};
The current actor network updates its parameters θ^μ through the following policy gradient function:
The target actor network and the target critic network update their parameters θ^Q′ and θ^μ′ respectively as follows:
θ^Q′ ← δθ^Q + (1−δ)θ^Q′
θ^μ′ ← δθ^μ + (1−δ)θ^μ′
where δ is a fixed constant with 0 < δ < 1.
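The soft-update rule θ′ ← δθ + (1−δ)θ′ used for both target networks can be sketched for a flat list of parameters; the value δ = 0.005 below is an illustrative assumption:

```python
def soft_update(target_params, current_params, delta=0.005):
    """Soft update: theta' <- delta*theta + (1 - delta)*theta'.
    With 0 < delta < 1, the target parameters drift slowly toward
    the current network, which stabilizes training."""
    return [
        delta * cur + (1.0 - delta) * tgt
        for tgt, cur in zip(target_params, current_params)
    ]

updated = soft_update([0.0, 1.0], [1.0, 1.0], delta=0.1)
```

A small δ makes the target networks change slowly, so the TD targets used by the critic remain nearly stationary between updates.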
Specifically, the intelligent cruise control method based on networked control can be divided into two steps: sampling and training.
1) Sampling
First, enough samples need to be collected for training. In each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ). To ensure effective exploration of the continuous action space, random noise η is added to obtain the exploration policy a_k = μ(s_k|θ^μ) + η.
Executing the policy a_k yields the next state s_{k+1} from the state transition function and the corresponding reward r_k from the reward function; the tuple (s_k, a_k, s_{k+1}, r_k) is then stored as a sample in the experience replay buffer. These steps are repeated until enough samples have been generated.
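The sampling loop described above (noisy action, environment step, buffer storage) can be sketched as follows; the buffer capacity and noise scale are illustrative assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay buffer for (s, a, s', r) tuples."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def add(self, s, a, s_next, r):
        self.buf.append((s, a, s_next, r))

    def sample(self, m):
        """Draw a mini-batch of m samples uniformly at random."""
        return random.sample(list(self.buf), m)

def explore(mu_s, noise_scale=0.1):
    """Exploration policy a_k = mu(s_k) + eta, with Gaussian noise eta."""
    return mu_s + random.gauss(0.0, noise_scale)

buffer = ReplayBuffer()
buffer.add(s=[0.0], a=explore(0.5), s_next=[0.1], r=-0.01)
```

Uniform sampling from the buffer is what later breaks the temporal correlation between consecutive (s, a, s′, r) tuples during training.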
2) Training
In the training process of the embodiment of the invention, 200 time slots form one episode. In each episode, a mini-batch of M samples (s_t, a_t, s_{t+1}, r_t) is randomly drawn for training, which reduces the correlation of the sample data and improves the training efficiency.
The current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by feeding s_t and a_t into the current critic network, and x_t is the target Q value, which can be expressed as:
x_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′)
In the above formula, r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′) is the next Q value produced by the target critic network, and μ′(s_{t+1}|θ^μ′) is the next action policy generated by the target actor network from the input state s_{t+1}.
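The critic's TD target x_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1})) and its mean-square-error loss can be sketched with scalar Q values; the Q values in the mini-batch below are illustrative numbers, not actual network outputs:

```python
def td_target(r, q_next, gamma=0.9):
    """Target value x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return r + gamma * q_next

def critic_loss(batch, gamma=0.9):
    """Mean squared error between current Q values and TD targets.
    batch: list of (q_current, r, q_next) triples."""
    m = len(batch)
    return sum((td_target(r, qn, gamma) - q) ** 2 for q, r, qn in batch) / m

# Two illustrative transitions: the first already matches its target
loss = critic_loss([(2.0, 1.0, 2.0), (0.0, 0.0, 1.0)], gamma=0.5)
```

Because q_next comes from the slowly moving target networks, minimizing this loss does not chase a target that shifts with every gradient step.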
The current actor network updates its parameters θ^μ through the following policy gradient function:
where M is the number of samples in a mini-batch and ∇ is the gradient operator; the main objective of this formula is to increase the probability of the actions for which the current actor network obtains a larger Q value.
Then, the target actor network and the target critic network update their parameters θ^Q′ and θ^μ′ respectively by means of a "soft update":
θ^Q′ ← δθ^Q + (1−δ)θ^Q′
θ^μ′ ← δθ^μ + (1−δ)θ^μ′
Wherein 0 < δ < 1 is a fixed constant.
Finally, after training over enough episodes, the optimized current actor network parameters θ^μ* can be obtained. Thus, for each input state s, the current actor network can generate the optimization control strategy of the networked cruise control system in real time:
u*=a*=μ(s|θμ*)。
the following describes an intelligent cruise control device provided by the present invention, and the following description and the above-described intelligent cruise control method can be referred to correspondingly.
Fig. 4 is a schematic structural diagram of an intelligent cruise control device according to an embodiment of the present invention, and as shown in fig. 4, the device includes a status signal unit 410 and an intelligent control unit 420;
the state signal unit 410 is used for determining a current state signal of the automatic control vehicle;
the intelligent control unit 420 is configured to input a current state signal of the automatically controlled vehicle into an intelligent optimal control model, so as to implement intelligent cruise control on the automatically controlled vehicle;
the intelligent optimization control model is obtained by training a Markov decision process model based on vehicle queues built by the automatic control vehicles and collected state samples in real time.
The device provided by the embodiment of the invention can continuously and intelligently learn and adjust the optimization control strategy of networked cruise control through continuously interacting with the environment, thereby being suitable for the actual complex and changeable network dynamic scene and realizing the safe and stable driving of the CCC automatic driving vehicle.
Based on any one of the above embodiments, the intelligent control unit comprises an intelligent optimization control module;
as shown in fig. 5, the intelligent optimization control module includes a system modeling module 510, a problem construction module 520, an MDP construction module 530, and a calculation processing module 540;
the system modeling module 510 is configured to obtain queue state information of a vehicle queue formed by automatically controlling vehicles, and establish a dynamic equation of the queue system according to the queue state information;
the problem construction module 520 is configured to construct a quadratic optimization control equation with a minimized state error and an input as an objective function according to the dynamic equation of the queue system;
the MDP building module 530 is configured to build a markov decision process model for networked control according to the dynamic equation of the queuing system and the quadratic form optimization control equation;
the calculation processing module 540 is configured to generate samples and train based on continuous interaction between the DRL algorithm and the environment, so as to obtain an intelligent optimization control strategy.
Based on any of the above embodiments, as shown in fig. 6, the system modeling module includes a state obtaining module 610, a dynamic constructing module 620, a state error constructing module 630, and a system dynamic module 640;
the state obtaining module 610 is configured to obtain vehicle distance, vehicle speed and acceleration information of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
the dynamic construction module 620 is configured to establish a dynamic equation of each vehicle in the queue according to the vehicle distance, the vehicle speed, and the acceleration information of each vehicle in the vehicle queue;
the state error establishing module 630 is configured to obtain an expected vehicle speed through a head vehicle, obtain an expected vehicle distance of each vehicle based on a preset range strategy, and establish a state error equation of each vehicle according to the expected vehicle speed of the head vehicle, the expected vehicle distance of each vehicle, and the current vehicle speed and vehicle distance of each vehicle;
and the system dynamic module 640 is configured to combine the state error equations of the vehicles, and obtain the dynamic equation of the queue system after discretization processing based on the state equations of the vehicles in the continuous-time queue.
Based on any of the above embodiments, the preset scope policy includes:
if the current vehicle distance is smaller than the preset minimum vehicle distance, the expected vehicle speed is 0;
if the current vehicle distance is not less than the preset minimum vehicle distance and not more than the preset maximum vehicle distance, the expected vehicle speed is obtained from the preset maximum vehicle speed, the current vehicle distance, the preset minimum vehicle distance and the preset maximum vehicle distance, with the calculation formula V(h) = v_max·(h − h_min)/(h_max − h_min), where V(h) represents the desired vehicle speed, h represents the current vehicle distance, h_min represents the preset minimum vehicle distance, h_max represents the preset maximum vehicle distance, and v_max represents the preset maximum vehicle speed;
if the current vehicle distance is larger than the preset maximum vehicle distance, the expected vehicle speed is the preset maximum vehicle speed;
and obtaining the expected vehicle distance of each vehicle according to the expected vehicle speed.
Based on any of the above embodiments, the dynamic equation of the queue system obtained after the discretization process is as follows:
y_{i+1} = A0·y_i + B1·u_i + B2·u_{i-1};
where y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced time delay, λ_j and a second gain represent system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the vehicle queue excluding the head vehicle, and V′(h*) is the partial derivative of the range strategy at the desired vehicle distance.
Based on any of the above embodiments, the quadratic optimization control equation, constructed from the dynamic equation of the queue system with minimization of the state error and the control input as the objective function, is as follows:
wherein N is the number of sampling intervals, C and D are coefficient matrices:
c_1 and c_2 are preset coefficients.
Based on any one of the embodiments, the intelligent optimization control model is obtained by performing neural network parameter training on a markov decision process model based on a vehicle queue real-time collected state sample constructed by the automatic control vehicle, and includes:
establishing a depth certainty strategy gradient algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the Markov decision process model parameters;
in each time slot, according to the input state s_k, the current actor network outputs the corresponding action policy μ(s_k|θ^μ); executing the policy a_k yields the next state s_{k+1} from the state transition function and the corresponding reward r_k from the reward function; the tuple (s_k, a_k, s_{k+1}, r_k) is stored in an experience replay buffer to obtain state samples;
the current critic network updates its parameters θ^Q by minimizing the following mean-square-error loss function:
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by feeding s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′)
where r_t is the corresponding reward value, Q′(s_{t+1}, μ′(s_{t+1}|θ^μ′)|θ^Q′) is the next Q value produced by the target critic network, and μ′(s_{t+1}|θ^μ′) is the next action policy generated by the target actor network from the input state s_{t+1};
the current actor network updates its parameters θ^μ through the following policy gradient function:
the target actor network and the target critic network update their parameters θ^Q′ and θ^μ′ respectively as follows:
θ^Q′ ← δθ^Q + (1−δ)θ^Q′
θ^μ′ ← δθ^μ + (1−δ)θ^μ′
where δ is a fixed constant with 0 < δ < 1.
To sum up, the intelligent cruise control method and device provided by the embodiment of the invention construct the dynamic equation of the overall vehicle queue system by comprehensively analyzing vehicle dynamics and wireless network characteristics, consider the influence of the dynamically time-varying network communication delay and the desired state, and establish the optimization control problem, from which the MDP model is constructed. An intelligent algorithm based on DRL generates samples through continuous interaction with the environment, trains the neural network, and continuously accumulates experience, so that the intelligent optimization control strategy of the automatically controlled vehicle is obtained. The automatically controlled vehicle can thus track the ideal desired vehicle speed while always keeping a safe distance from the preceding vehicle, and can also run autonomously and stably in actual complex network dynamic scenes. That is, under a scene in which the network communication delay and the desired state of the system change dynamically, the embodiment of the invention models the vehicle queue as a whole and combines optimization control theory with artificial intelligence methods to obtain the intelligent optimization control strategy of the cruise control system based on networked control, thereby realizing stable control of the CCC automatically controlled vehicle. By applying networked control and artificial intelligence technology to the automatic cruise control system of the vehicle, considering the influence of the complex dynamic environment on the control system, and designing a DRL-based method to obtain the intelligent optimization control strategy, the environmental adaptability and self-learning capability of the cruise control system are improved.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 7, the electronic device may include: a processor (processor)710, a communication Interface (Communications Interface)720, a memory (memory)730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a smart cruise control method comprising: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the smart cruise control method provided by the above methods, where the method includes: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to execute the intelligent cruise control method provided in the foregoing aspects, the method including: determining a current status signal of the automatically controlled vehicle; inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle; the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A smart cruise control method, comprising:
determining a current status signal of the automatically controlled vehicle;
inputting a current state signal of the automatic control vehicle into an intelligent optimization control model to realize intelligent cruise control on the automatic control vehicle;
the intelligent optimization control model is obtained by carrying out neural network parameter training on a Markov decision process model based on vehicle queue real-time acquisition state samples constructed by the automatic control vehicles.
2. The intelligent cruise control method according to claim 1, wherein building the Markov decision process model comprises the steps of:
acquiring queue state information of the vehicle queue formed by the automatically controlled vehicles, and establishing a dynamic equation of the queue system according to the queue state information;
constructing, according to the dynamic equation of the queue system, a quadratic optimization control equation that takes minimizing the state error and the control input as the objective function;
and constructing a Markov decision process model for networked control according to the dynamic equation of the queue system and the quadratic optimization control equation.
3. The intelligent cruise control method according to claim 2, wherein acquiring the queue state information of the vehicle queue formed by the automatically controlled vehicles and establishing the dynamic equation of the queue system according to the queue state information comprises the steps of:
obtaining the inter-vehicle distance, speed and acceleration of each vehicle in the vehicle queue through vehicle-to-vehicle communication;
establishing a dynamic equation of each vehicle in the queue from its inter-vehicle distance, speed and acceleration;
acquiring the desired speed from the head vehicle, obtaining the desired inter-vehicle distance based on a preset range policy, and establishing a state error equation of each vehicle from the desired speed of the head vehicle, the desired inter-vehicle distance, and the vehicle's current speed and inter-vehicle distance;
and combining the state error equations of all the vehicles into the continuous-time state equation of the queue, then discretizing it to obtain the dynamic equation of the queue system.
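The combination step above can be sketched as stacking per-vehicle errors into one queue-level state vector. A minimal sketch; the function name and the two-component (spacing error, speed error) layout per vehicle are assumptions, not taken from the patent:

```python
import numpy as np

def queue_state_errors(spacings, speeds, desired_spacings, desired_speed):
    """Stack per-vehicle state errors [h_j - h_des_j, v_j - v_des] into one
    queue-level state vector, mirroring the combined state error equation."""
    e_h = np.asarray(spacings, dtype=float) - np.asarray(desired_spacings, dtype=float)
    e_v = np.asarray(speeds, dtype=float) - desired_speed
    # one row per vehicle, then flatten into a single state vector
    return np.column_stack([e_h, e_v]).ravel()

# two followers: spacings 10 m and 12 m against a desired 10 m,
# speeds 5 and 6 m/s against the head vehicle's desired 5 m/s
y = queue_state_errors([10.0, 12.0], [5.0, 6.0], [10.0, 10.0], 5.0)
```

The first follower is exactly on target (errors 0, 0) while the second is 2 m too far back and 1 m/s too fast, which is what the stacked vector reports.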
4. The intelligent cruise control method according to claim 3, wherein the preset range policy comprises:
if the current inter-vehicle distance is smaller than a preset minimum distance, the desired speed is 0;
if the current inter-vehicle distance is not smaller than the preset minimum distance and not larger than a preset maximum distance, the desired speed is obtained from the preset maximum speed, the current inter-vehicle distance, the preset minimum distance and the preset maximum distance according to a calculation formula in which V(h) denotes the desired speed, h the inter-vehicle distance, h_min the preset minimum distance, h_max the preset maximum distance, and v_max the preset maximum speed;
if the current inter-vehicle distance is larger than the preset maximum distance, the desired speed is the preset maximum speed;
and obtaining the desired inter-vehicle distance of each vehicle according to the desired speed.
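As a concrete illustration of the range policy above, the piecewise rule can be sketched in a few lines. The linear middle segment is an assumption (the claim only states that V(h) depends on v_max, h, h_min and h_max), and all numeric values are illustrative:

```python
def desired_speed(h, h_min=5.0, h_max=35.0, v_max=20.0):
    """Range policy V(h): desired speed as a function of inter-vehicle distance h.

    Below h_min the desired speed is 0; above h_max it is v_max; the linear
    interpolation in between is an assumed form, not taken from the patent.
    """
    if h < h_min:
        return 0.0
    if h > h_max:
        return v_max
    return v_max * (h - h_min) / (h_max - h_min)


def desired_distance(v, h_min=5.0, h_max=35.0, v_max=20.0):
    """Desired inter-vehicle distance: the inverse of the linear segment."""
    return h_min + (h_max - h_min) * v / v_max
```

With these values a 20 m gap maps to a desired speed of 10 m/s, and inverting at 10 m/s returns a 20 m desired distance, so the two mappings are mutually consistent, as the last step of claim 4 requires.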
5. The intelligent cruise control method according to claim 3, wherein the dynamic equation of the queue system obtained after discretization is:
y_{i+1} = A_0 y_i + B_1 u_i + B_2 u_{i-1};
where y_i = y(iΔT) and u_i = u(iΔT) denote the state variable and the acceleration control strategy at the current time, i is the index of the sampling interval, ΔT is the sampling interval, τ is the network-induced delay, λ_j and its companion coefficients are system parameters related to human driving behavior, j is the index of a vehicle in the queue, m is the total number of vehicles in the queue excluding the head vehicle, and the partial derivative ∂V/∂h of the range policy evaluated at the desired inter-vehicle distance enters the system matrices A_0, B_1 and B_2.
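One step of the discretized dynamics above can be sketched directly. The matrices below are placeholders; their true entries, built from ΔT, τ, λ_j and the range-policy slope, are given in the patent's formulas and are not reproduced here:

```python
import numpy as np

def queue_step(y, u, u_prev, A0, B1, B2):
    """One discrete-time step y_{i+1} = A0 y_i + B1 u_i + B2 u_{i-1}.

    The B2 u_{i-1} term carries the control delayed by the network-induced
    delay tau, which is why the previous input appears alongside the current one.
    """
    return A0 @ y + B1 @ u + B2 @ u_prev

# placeholder matrices for a single follower with state [spacing error, speed error]
A0 = np.array([[1.0, 0.1],
               [0.0, 1.0]])
B1 = np.array([[0.0], [0.08]])   # current input weight (assumed, ~DeltaT - tau)
B2 = np.array([[0.0], [0.02]])   # delayed input weight (assumed, ~tau)
y1 = queue_step(np.array([1.0, 0.0]), np.array([1.0]), np.array([0.0]), A0, B1, B2)
```

Keeping the delayed input u_{i-1} as a separate term is what lets the later Markov decision process model account for communication delay explicitly instead of folding it into the state.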
6. The intelligent cruise control method according to claim 2, wherein the quadratic optimization control equation constructed from the dynamic equation of the queue system, taking minimization of the state error and the control input as the objective function, is:
min_u Σ_{i=1}^{N} (y_i^T C y_i + u_i^T D u_i);
where N is the number of sampling intervals, C and D are coefficient matrices, and c_1 and c_2 are preset coefficients from which C and D are built.
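The quadratic objective can be evaluated as follows. A minimal sketch; the diagonal structure of C and D built from c_1 and c_2, and the numeric values, are assumptions for illustration:

```python
import numpy as np

def quadratic_cost(ys, us, C, D):
    """Finite-horizon quadratic objective sum_i (y_i^T C y_i + u_i^T D u_i),
    i.e. a weighted trade-off between state error and control effort."""
    return sum(float(y @ C @ y) for y in ys) + sum(float(u @ D @ u) for u in us)

c1, c2 = 1.0, 0.1                 # preset coefficients (values assumed)
C = np.diag([c1, c1])             # assumed diagonal state-error weight
D = np.array([[c2]])              # assumed scalar input weight
J = quadratic_cost([np.array([1.0, 2.0])], [np.array([3.0])], C, D)
```

Here the single state error [1, 2] contributes 5.0 and the input 3 contributes 0.9, so J = 5.9; increasing c_2 relative to c_1 penalizes aggressive acceleration more heavily.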
7. The intelligent cruise control method according to claim 1, wherein obtaining the intelligent optimization control model by performing neural network parameter training on the Markov decision process model based on the state samples collected in real time from the vehicle queue formed by the automatically controlled vehicles comprises the steps of:
establishing a deep deterministic policy gradient algorithm comprising a current actor network, a current critic network, a target actor network and a target critic network to update the parameters of the Markov decision process model;
in each time slot, the current actor network outputs the action policy μ(s_k|θ^μ) corresponding to the input state s_k; executing the policy yields the next state s_{k+1} through the state transfer function and the corresponding reward r_k through the reward function; the resulting transition (s_k, a_k, r_k, s_{k+1}) is stored in an experience replay buffer to obtain the state samples;
the current critic network updates its parameters θ^Q by minimizing the mean-square-error loss
L(θ^Q) = (1/M) Σ_t (x_t − Q(s_t, a_t|θ^Q))²,
where M is the number of samples in a mini-batch, Q(s_t, a_t|θ^Q) is the current Q value obtained by inputting s_t and a_t into the current critic network, and x_t is the target Q value, expressed as:
x_t = r_t + γQ′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′})
where r_t is the corresponding value of the reward function, γ is the discount factor, Q′(s_{t+1}, μ′(s_{t+1}|θ^{μ′})|θ^{Q′}) is the next Q value generated by the target critic network, and μ′(s_{t+1}|θ^{μ′}) is the next action policy generated by the target actor network from the input state s_{t+1};
the current actor network updates its parameters θ^μ through the policy gradient
∇_{θ^μ}J ≈ (1/M) Σ_t ∇_a Q(s, a|θ^Q)|_{s=s_t, a=μ(s_t)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_t},
where ∇ is the gradient operator;
and the target critic network and the target actor network update their parameters θ^{Q′} and θ^{μ′} respectively as follows:
θ^{Q′} ← δθ^Q + (1−δ)θ^{Q′}
θ^{μ′} ← δθ^μ + (1−δ)θ^{μ′}
where δ is a fixed constant with 0 < δ < 1.
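The target-value computation and the soft updates in claim 7 can be sketched without the network internals. A minimal numpy sketch in which the actor and critic networks are abstracted into plain values and parameter dictionaries (all names and constants are illustrative):

```python
import numpy as np

GAMMA = 0.99   # discount factor gamma (value assumed)
DELTA = 0.005  # soft-update rate delta, 0 < delta < 1 (value assumed)

def target_q(r, q_next, gamma=GAMMA):
    """Target value x_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1}|theta_mu')|theta_Q')."""
    return r + gamma * q_next

def critic_loss(targets, q_values):
    """Mean-square-error loss (1/M) * sum_t (x_t - Q(s_t, a_t|theta_Q))^2."""
    targets, q_values = np.asarray(targets), np.asarray(q_values)
    return float(np.mean((targets - q_values) ** 2))

def soft_update(theta_target, theta_current, delta=DELTA):
    """Soft update theta' <- delta * theta + (1 - delta) * theta',
    applied element-wise to each named parameter array."""
    return {k: delta * theta_current[k] + (1 - delta) * theta_target[k]
            for k in theta_target}
```

Because δ is small, the target networks trail the current networks slowly, which is what keeps the target Q value x_t stable while the critic is being trained.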
8. An intelligent cruise control device, comprising a state signal unit and an intelligent control unit, wherein:
the state signal unit is configured to determine a current state signal of an automatically controlled vehicle;
the intelligent control unit is configured to input the current state signal of the automatically controlled vehicle into an intelligent optimization control model to realize intelligent cruise control of the automatically controlled vehicle;
and the intelligent optimization control model is obtained by training a Markov decision process model based on state samples collected in real time from a vehicle queue formed by the automatically controlled vehicles.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the intelligent cruise control method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the intelligent cruise control method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110458260.3A CN113335277A (en) | 2021-04-27 | 2021-04-27 | Intelligent cruise control method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113335277A true CN113335277A (en) | 2021-09-03 |
Family
ID=77468696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110458260.3A Pending CN113335277A (en) | 2021-04-27 | 2021-04-27 | Intelligent cruise control method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113335277A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109606367A (en) * | 2018-11-06 | 2019-04-12 | 北京工业大学 | The optimum linearity control method and device of cruise control system based on car networking |
CN109624986A (en) * | 2019-03-01 | 2019-04-16 | 吉林大学 | A kind of the study cruise control system and method for the driving style based on pattern switching |
CN109733415A (en) * | 2019-01-08 | 2019-05-10 | 同济大学 | A kind of automatic Pilot following-speed model that personalizes based on deeply study |
US20200033868A1 (en) * | 2018-07-27 | 2020-01-30 | GM Global Technology Operations LLC | Systems, methods and controllers for an autonomous vehicle that implement autonomous driver agents and driving policy learners for generating and improving policies based on collective driving experiences of the autonomous driver agents |
CN110989576A (en) * | 2019-11-14 | 2020-04-10 | 北京理工大学 | Target following and dynamic obstacle avoidance control method for differential slip steering vehicle |
CN111267831A (en) * | 2020-02-28 | 2020-06-12 | 南京航空航天大学 | Hybrid vehicle intelligent time-domain-variable model prediction energy management method |
CN112162555A (en) * | 2020-09-23 | 2021-01-01 | 燕山大学 | Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet |
CN112580148A (en) * | 2020-12-20 | 2021-03-30 | 东南大学 | Heavy-duty operation vehicle rollover prevention driving decision method based on deep reinforcement learning |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113734167A (en) * | 2021-09-10 | 2021-12-03 | 苏州智加科技有限公司 | Vehicle control method, device, terminal and storage medium |
CN114387787A (en) * | 2022-03-24 | 2022-04-22 | 华砺智行(武汉)科技有限公司 | Vehicle track control method and device, electronic equipment and storage medium |
CN114387787B (en) * | 2022-03-24 | 2022-08-23 | 华砺智行(武汉)科技有限公司 | Vehicle track control method and device, electronic equipment and storage medium |
CN116257069A (en) * | 2023-05-16 | 2023-06-13 | 睿羿科技(长沙)有限公司 | Unmanned vehicle formation decision and speed planning method |
CN116257069B (en) * | 2023-05-16 | 2023-08-08 | 睿羿科技(长沙)有限公司 | Unmanned vehicle formation decision and speed planning method |
CN117055586A (en) * | 2023-06-28 | 2023-11-14 | 中国科学院自动化研究所 | Underwater robot tour search and grabbing method and system based on self-adaptive control |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113335277A (en) | Intelligent cruise control method and device, electronic equipment and storage medium | |
Zhu et al. | Human-like autonomous car-following model with deep reinforcement learning | |
Liang et al. | A deep reinforcement learning network for traffic light cycle control | |
WO2021208771A1 (en) | Reinforced learning method and device | |
Zhu et al. | Multi-robot flocking control based on deep reinforcement learning | |
CN109990790B (en) | Unmanned aerial vehicle path planning method and device | |
Li et al. | A reinforcement learning-based vehicle platoon control strategy for reducing energy consumption in traffic oscillations | |
CN109726804B (en) | Intelligent vehicle driving behavior personification decision-making method based on driving prediction field and BP neural network | |
CN113412494B (en) | Method and device for determining transmission strategy | |
CN112937564A (en) | Lane change decision model generation method and unmanned vehicle lane change decision method and device | |
Naveed et al. | Trajectory planning for autonomous vehicles using hierarchical reinforcement learning | |
Han et al. | Intelligent decision-making for 3-dimensional dynamic obstacle avoidance of UAV based on deep reinforcement learning | |
CN115578876A (en) | Automatic driving method, system, equipment and storage medium of vehicle | |
US20230367934A1 (en) | Method and apparatus for constructing vehicle dynamics model and method and apparatus for predicting vehicle state information | |
CN113867354A (en) | Regional traffic flow guiding method for intelligent cooperation of automatic driving of multiple vehicles | |
CN112462602B (en) | Distributed control method for keeping safety spacing of mobile stage fleet under DoS attack | |
CN115494879B (en) | Rotor unmanned aerial vehicle obstacle avoidance method, device and equipment based on reinforcement learning SAC | |
Zhou et al. | A novel mean-field-game-type optimal control for very large-scale multiagent systems | |
CN114815882A (en) | Unmanned aerial vehicle autonomous formation intelligent control method based on reinforcement learning | |
Yuan et al. | Prioritized experience replay-based deep q learning: Multiple-reward architecture for highway driving decision making | |
CN117406756A (en) | Method, device, equipment and storage medium for determining motion trail parameters | |
Zhuang et al. | Robust auto-parking: Reinforcement learning based real-time planning approach with domain template | |
Wang et al. | Experience sharing based memetic transfer learning for multiagent reinforcement learning | |
Diallo et al. | Coordination in adversarial multi-agent with deep reinforcement learning under partial observability | |
CN114723058A (en) | Neural network end cloud collaborative reasoning method and device for high-sampling-rate video stream analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||