CN114148349B - Vehicle personalized car-following control method based on generative adversarial imitation learning - Google Patents

Vehicle personalized car-following control method based on generative adversarial imitation learning

Info

Publication number
CN114148349B
CN114148349B (application CN202111568497.3A)
Authority
CN
China
Prior art keywords
vehicle
following
neural network
simulation
personalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111568497.3A
Other languages
Chinese (zh)
Other versions
CN114148349A (en)
Inventor
任玥
邹博文
梁新成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University filed Critical Southwest University
Priority to CN202111568497.3A priority Critical patent/CN114148349B/en
Publication of CN114148349A publication Critical patent/CN114148349A/en
Application granted granted Critical
Publication of CN114148349B publication Critical patent/CN114148349B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0015Planning or execution of driving tasks specially adapted for safety
    • B60W60/0016Planning or execution of driving tasks specially adapted for safety of the vehicle or its occupants
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
    • B60W30/14Adaptive cruise control
    • B60W30/143Speed control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/10Longitudinal speed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/10Longitudinal speed
    • B60W2520/105Longitudinal acceleration
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00Input parameters relating to occupants
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404Characteristics
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/802Longitudinal distance
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/804Relative longitudinal speed
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application provides a vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the following steps: establishing a simulated car-following environment that comprises a road model, a host vehicle and a front vehicle; setting different speed profiles for the front vehicle in the simulation environment; carrying out simulated driving car-following tests under the different front-vehicle speed profiles, collecting driving data of the host vehicle and the front vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to build a driver car-following data set; constructing a vehicle personalized car-following control model from the driver data set using a generative adversarial imitation learning method; and performing personalized car-following control of the vehicle with the trained model. The application addresses the technical problem that, in existing following control based on deep reinforcement learning, a hand-crafted reward function cannot objectively and comprehensively reflect the driving habits of the driver.

Description

Vehicle personalized car-following control method based on generative adversarial imitation learning
Technical Field
The application relates to the technical field of automatic driving, and in particular to a vehicle personalized car-following control method based on generative adversarial imitation learning.
Background
Over the development of automatic driving technology, from early cruise control through adaptive cruise control to full automatic driving, autonomous car-following control has been one of the key technologies of both vehicle active safety and vehicle automation.
Existing autonomous following control techniques fall into two main categories: model-based control and data-driven control. Model-based approaches establish a vehicle kinematics/dynamics model, describe the collision risk of the vehicle's longitudinal motion, and control the longitudinal acceleration by constrained optimization that combines indexes such as following efficiency and passenger comfort. Thanks to the rapid development of chip computing power, simulation technology and AI, deep reinforcement learning offers a brand-new approach to automatic driving control strategies: by setting a reward function and letting an agent optimize its control strategy through continuous trial-and-error interaction with a simulation environment, the cost of system dynamics modeling and parameter tuning can be reduced; and by adding driving-habit indexes to the reward function and learning from actual driving data, the following control can be made to better match the driving habits of different drivers.
At present, however, the reward function is still generally formulated from subjective judgments about following-system performance, and it is difficult for it to objectively and comprehensively reflect the implicit relation between the system's state space and its output. Following control based on conventional deep reinforcement learning therefore has certain limitations in personalization and driving enjoyment.
Disclosure of Invention
In view of the defects of the prior art, the application provides a vehicle personalized car-following control method based on generative adversarial imitation learning, so as to solve the technical problem that a hand-crafted reward function in existing deep-reinforcement-learning-based following control cannot objectively and comprehensively reflect the driving habits of the driver.
The technical scheme adopted by the application is a vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the following steps:
establishing a simulated car-following environment, wherein the simulated car-following environment comprises a road model, a host vehicle and a front vehicle;
setting different speed profiles for the front vehicle in the simulated car-following environment;
carrying out simulated driving car-following tests in the simulated car-following environment according to the different front-vehicle speed profiles, collecting driving data of the host vehicle and the front vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to build a driver car-following data set;
constructing a vehicle personalized car-following control model from the driver car-following data set using a generative adversarial imitation learning method;
and performing personalized car-following control of the vehicle with the vehicle personalized car-following control model.
Further, establishing the simulated car-following environment comprises:
building the simulated car-following environment on an automatic driving simulation platform;
modeling the host vehicle dynamics with vehicle dynamics simulation software;
and describing the motion of surrounding vehicles with a random traffic flow model.
Further, the different speed profiles of the front vehicle comprise: constant-speed driving, decelerating driving, emergency braking and random-speed driving of the front vehicle.
Further, the following state comprises: the relative distance d between the host vehicle and the front vehicle, the relative speed v_r between the host vehicle and the front vehicle, and the host vehicle speed v_h; the action is the host vehicle longitudinal acceleration a_h.
Further, constructing the vehicle personalized car-following control model comprises:
taking the relative distance and relative speed between the host vehicle and the front vehicle and the host vehicle speed as inputs, and the host vehicle longitudinal acceleration as output, establishing a policy generation neural network;
taking the relative distance and relative speed between the host vehicle and the front vehicle, the host vehicle speed and the host vehicle longitudinal acceleration as inputs, and a true/false value as output, establishing a discrimination neural network;
obtaining a continuous following segment by uniform sampling from the driver data set: τ^E = [s_1^E, a_1^E, s_2^E, a_2^E, …, s_m^E, a_m^E], where s_m^E and a_m^E respectively represent the m-th step following state and the driver's actual action;
inputting the obtained continuous following segment into the policy generation neural network to interact with the simulation environment, obtaining a simulated following segment: τ^G = [s_1^G, a_1^G, s_2^G, a_2^G, …, s_n^G, a_n^G], where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action;
inputting the simulated following segment into the discrimination neural network, which discriminates how realistic the policy-network output is;
training the policy generation neural network with a plurality of continuous following segments;
training the discrimination neural network with a plurality of simulated following segments;
and updating the discrimination neural network parameters by gradient descent.
Further, the policy generation neural network has 3 input-layer neurons, namely the relative distance, the host vehicle speed and the relative speed; 1 output-layer neuron, namely the host vehicle longitudinal acceleration; and 2 hidden layers with 5 neurons each. The policy generation neural network is expressed as:
f = π(a|s; ω)
where a represents the action, a = [a_h]; s represents the vehicle state, s = [d, v_h, v_r]; and ω represents the policy generation neural network parameters.
Further, the discrimination neural network has 4 input-layer neurons, namely the relative distance, the host vehicle speed, the relative speed and the host vehicle longitudinal acceleration, and 1 output-layer neuron whose value lies in (0, 1); it has 2 hidden layers with 5 neurons each. The discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s represents the vehicle state, s = [d, v_h, v_r]; a represents the action, a = [a_h]; and θ represents the discrimination neural network parameters.
Further, when a continuous following segment is obtained by uniform sampling from the driver data set, the host vehicle initial state and the front vehicle trajectory in the segment are taken as the simulation scenario; the policy generation neural network, defined by the current policy generation neural network parameters, performs probability sampling of each action to control the host vehicle's interaction with the environment; the simulation stops when a stop condition is met, and the simulated following segment data are recorded; the stop conditions comprise:
the sample data have been fully read;
the two vehicles collide;
the host vehicle speed is less than or equal to 0.
Further, when the discrimination neural network is used to discriminate how realistic the policy-network output is, the cross entropy is defined as the k-th step return function r_k = log D(s_k, a_k; θ); substituting the interaction trajectory of the agent and the environment into the return function yields a trajectory containing the per-step return:
τ = [s_1^G, a_1^G, r_1, s_2^G, a_2^G, r_2, …, s_n^G, a_n^G, r_n]
where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action.
Further, when training the policy generation neural network and the discrimination neural network, the objective function of the policy generation neural network is:
J(ω′) = E[ Σ_{t=1..k} (π(a_t|s_t; ω′) / π(a_t|s_t; ω_now)) · r_t ]
where ω′ represents the network parameters to be updated and ω_now the current network parameters;
the loss function of the discrimination neural network is:
G(θ) = −(1/m) Σ_{i=1..m} log D(s_i^E, a_i^E; θ) − (1/n) Σ_{j=1..n} log(1 − D(s_j^G, a_j^G; θ))
where m is the number of sampling points of the continuous following segment and n is the number of sampling points of the simulated following segment;
the discrimination neural network parameters are updated by gradient descent with the update formula:
θ_new = θ_old − λ · ∂G/∂θ
where θ_old are the current discrimination network parameters, θ_new the updated discrimination network parameters, and λ the learning rate.
The technical scheme above provides the following beneficial technical effects:
1. With a generation network and a discrimination network, and without manually defining a reward function, the policy network better matches the driver's behavioral characteristics, so that the following control strategy better matches the driver's driving habits.
2. Driver driving data are collected in different simulation scenarios on a simulated driving device; the system structure is simple, the cost is low, and data collection is safer than driving on real roads.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of a vehicle personalized follow-up control flow according to an embodiment of the present application;
FIG. 2 is a block diagram of a vehicle personalized follow-up control strategy according to an embodiment of the present application;
FIG. 3 is a block diagram of a policy generation neural network according to an embodiment of the present application;
FIG. 4 is a diagram of a discriminating neural network according to an embodiment of the application;
FIG. 5 is a schematic diagram of the truncation function according to an embodiment of the present application.
Detailed Description
Embodiments of the technical scheme of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and thus are merely examples, and are not intended to limit the scope of the present application.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
Examples
This embodiment provides a vehicle personalized car-following control method based on generative adversarial imitation learning, as shown in Fig. 1, comprising the following steps:
step 1, establishing a simulated vehicle following simulation environment, wherein the simulated vehicle following simulation environment comprises a road model, a host vehicle and a front vehicle.
In a specific embodiment, the automatic driving simulation platform Prescan is used to build the simulated car-following environment, which comprises a road model, a host vehicle and a front vehicle. The host vehicle dynamics are modeled with the vehicle dynamics simulation software Carsim, giving a host-vehicle Carsim model. For surrounding vehicles, a random traffic flow model is used to describe their motion. In this embodiment, the host vehicle is the vehicle under autonomous personalized car-following control, and the front vehicle is the vehicle in the same lane, ahead of the host vehicle, that the host vehicle follows.
Step 2: set different speed profiles for the front vehicle in the simulated car-following environment.
Specifically, the different speed profiles of the front vehicle comprise: constant-speed driving, decelerating driving, emergency braking and random-speed driving. By setting different speed profiles for the front vehicle, different car-following conditions of the host vehicle can be simulated, for instance as in the sketch below.
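As an illustrative sketch only (not part of the patent embodiment), the four front-vehicle speed profiles could be generated as simple time series; the sampling rate, speeds, durations and function names below are all assumptions.

```python
import numpy as np

DT = 0.2   # assumed sampling period in s (5 Hz, matching the embodiment)
T = 60.0   # assumed test duration in s
t = np.arange(0.0, T, DT)

def constant_speed(v0=15.0):
    """Front vehicle cruises at a fixed speed (m/s)."""
    return np.full_like(t, v0)

def deceleration(v0=20.0, a=-1.0, v_min=5.0):
    """Front vehicle decelerates steadily down to a floor speed."""
    return np.clip(v0 + a * t, v_min, None)

def emergency_brake(v0=20.0, t_brake=30.0, a=-6.0):
    """Front vehicle brakes hard at t_brake until standstill."""
    v = np.full_like(t, v0)
    braking = t >= t_brake
    v[braking] = np.clip(v0 + a * (t[braking] - t_brake), 0.0, None)
    return v

def random_speed(v0=15.0, sigma=0.5, v_lim=(5.0, 25.0), seed=0):
    """Front vehicle speed follows a bounded random walk."""
    rng = np.random.default_rng(seed)
    dv = rng.normal(0.0, sigma, size=t.shape)
    return np.clip(v0 + np.cumsum(dv), *v_lim)
```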
Step 3: perform simulated driving car-following tests in the simulated car-following environment according to the different front-vehicle speed profiles, collect driving data of the host vehicle and the front vehicle to obtain continuous following segments, and select a plurality of continuous following segments to build the driver car-following data set.
In a specific embodiment, a Logitech G29 is used as the simulated driving device in the simulated car-following environment. The driver operates the steering wheel and the accelerator/brake pedals, and the driving simulator collects the steering-wheel angle, accelerator-pedal opening and brake-pedal opening signals and transmits them to the host-vehicle Carsim model.
According to the different speed profiles of the front vehicle, the driver simulates different car-following conditions in the simulated car-following environment and performs multiple simulated driving following tests. Driving data of the host vehicle and the front vehicle in the same lane are collected to obtain the following state, which comprises: the relative distance d between the host vehicle and the front vehicle, the relative speed v_r between the host vehicle and the front vehicle, the host vehicle speed v_h, and the host vehicle longitudinal acceleration a_h. The relative distance, host vehicle speed and relative speed are selected as the vehicle state s, i.e. s = [d, v_h, v_r]; the host vehicle longitudinal acceleration is selected as the action a, i.e. a = [a_h].
In a specific embodiment, the collected test data take the form of a number of different continuous following segments: k simulated driving tests are performed, each of a different duration (e.g. the first lasts 8 seconds and the second 10 seconds); each test corresponds to one continuous following segment, and each segment contains the four driving signals of relative distance, relative speed, host vehicle speed and host vehicle longitudinal acceleration. The sampling frequency for the test data is not limited; the preferred frequency is 5 Hz, which gives 40 sampling points for the 8-second first test and 50 sampling points for the 10-second second test.
A plurality of continuous following segments is selected as the driver data set Γ, specifically:
Γ = {τ^(1), τ^(2), …, τ^(k)}
where τ = [s_1, a_1, s_2, a_2, …, s_m, a_m], k is the number of continuous following segments, and m is the number of sampling points in each segment.
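A minimal sketch of this data-set structure and of the uniform segment sampling used later in step 4.3; the class and field names are assumptions, not part of the patent.

```python
import random
from dataclasses import dataclass
import numpy as np

@dataclass
class FollowingSegment:
    """One continuous following segment τ = [s_1, a_1, ..., s_m, a_m]."""
    states: np.ndarray   # shape (m, 3): [d, v_h, v_r] at each sampling point
    actions: np.ndarray  # shape (m, 1): [a_h] at each sampling point

def sample_segment(dataset: list[FollowingSegment]) -> FollowingSegment:
    """Uniformly sample one continuous following segment from Γ."""
    return random.choice(dataset)
```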
Step 4: use a generative adversarial imitation learning method to construct the vehicle personalized car-following control model from the driver car-following data set.
As shown in Fig. 2, constructing the vehicle personalized car-following control model specifically comprises:
step 4.1, taking the relative distance, the relative speed and the speed of the main vehicle and the front vehicle as input, taking the longitudinal acceleration of the main vehicle as output, and establishing a strategy to generate a neural network
In the present embodimentIn the method, the longitudinal acceleration a= [ a ] of the main vehicle is adopted h ]In a specific embodiment, the longitudinal acceleration range of the host vehicle is set to-3 m/s 2 ≤a h ≤3m/s 2
The strategy generation neural network structure is shown in figure 3, and the number of the input layer neurons is 3, namely the relative distance, the host vehicle speed and the relative speed; the number of the neurons of the output layer is 1, namely the longitudinal acceleration of the host vehicle, the number of the hidden layers is 2, and the number of the neurons of each hidden layer is 5. The policy generating neural network is expressed as:
f=π(a|s;ω)
wherein a represents an action, a= [ a ] h ]The method comprises the steps of carrying out a first treatment on the surface of the s represents the vehicle state, s= [ d, v ] h ,v r ]The method comprises the steps of carrying out a first treatment on the surface of the ω represents policy generating neural network parameters including the number of network layers, the number of neurons per layer.
Step 4.2: taking the relative distance and relative speed between the host vehicle and the front vehicle, the host vehicle speed and the host vehicle longitudinal acceleration as inputs, and a true/false value as output, establish the discrimination neural network.
The discrimination neural network structure is shown in Fig. 4. The input layer has 4 neurons, namely the relative distance, host vehicle speed, relative speed and host vehicle longitudinal acceleration [d, v_h, v_r, a_h]; the output layer has 1 neuron whose value lies in (0, 1): the closer the output is to 1, the more "true" the action is (i.e., it is driver behavior), and the closer to 0, the more "false" (i.e., it was generated by the policy generation neural network). There are 2 hidden layers with 5 neurons each. The discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s represents the vehicle state, s = [d, v_h, v_r]; a represents the action, a = [a_h]; and θ represents the discrimination neural network parameters, including the number of network layers and the number of neurons per layer.
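A minimal PyTorch sketch of the two networks just described (3-5-5-1 policy, 4-5-5-1 discriminator). The activation functions and the Gaussian action head are assumptions: the patent fixes only the layer sizes, the ±3 m/s² acceleration bound, and the (0, 1) discriminator output.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """π(a|s; ω): state [d, v_h, v_r] -> a distribution over a_h."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(3, 5), nn.Tanh(),
            nn.Linear(5, 5), nn.Tanh(),
            nn.Linear(5, 1),                          # mean of a_h
        )
        self.log_std = nn.Parameter(torch.zeros(1))   # assumed Gaussian head

    def dist(self, s: torch.Tensor) -> torch.distributions.Normal:
        mu = 3.0 * torch.tanh(self.body(s))           # bound the mean to ±3 m/s²
        return torch.distributions.Normal(mu, self.log_std.exp())

class Discriminator(nn.Module):
    """D(s, a; θ): [d, v_h, v_r, a_h] -> probability in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, 5), nn.Tanh(),
            nn.Linear(5, 5), nn.Tanh(),
            nn.Linear(5, 1), nn.Sigmoid(),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1))
```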
Step 4.3: obtain a continuous following segment by uniform sampling from the driver data set.
In this embodiment, the continuous segment obtained by uniform sampling from the driver data set is recorded as τ^E = [s_1^E, a_1^E, …, s_m^E, a_m^E], where s_m^E and a_m^E represent the m-th step following state and the driver's actual action, respectively.
The continuous segment obtained by uniform sampling is any one of all segments in the driver data set.
Step 4.4: input the obtained continuous following segment into the policy generation neural network to interact with the simulation environment, obtaining a simulated following segment; input the simulated following segment into the discrimination neural network, which discriminates how realistic the policy-network output is.
Taking the host vehicle initial state and the front vehicle trajectory in the continuous following segment obtained in step 4.3 as the simulation scenario, and denoting the current policy generation neural network parameters by ω_now, the policy generation neural network π(a|s; ω_now) performs probability sampling of each action to control the host vehicle's interaction with the environment. The simulation stops when the stop condition is met, giving the simulated following segment data τ^G = [s_1^G, a_1^G, …, s_n^G, a_n^G], where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action.
the condition for stopping the simulation is any one of the following:
finishing the reading of the sample data;
the two vehicles collide (the relative distance d is less than or equal to 0);
speed v of host vehicle h ≤0。
In this step, m and n represent the lengths of two consecutive heel fragments, respectively;
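A sketch of this interaction loop, assuming a hypothetical env object that wraps the Prescan/Carsim co-simulation; its reset/step interface is a placeholder, and PolicyNet is the sketch from step 4.2.

```python
import torch

def rollout(env, policy, max_steps):
    """Roll the policy in the simulation until a stop condition is met."""
    states, actions = [], []
    s = env.reset()  # host initial state; front vehicle replays its trajectory
    for _ in range(max_steps):              # stop 1: sample data fully read
        st = torch.as_tensor(s, dtype=torch.float32)
        a = policy.dist(st).sample()        # probability sampling of the action
        states.append(s)
        actions.append(a.item())
        s = env.step(a.item())              # advance the co-simulation one step
        d, v_h, _ = s                       # state is [d, v_h, v_r]
        if d <= 0.0 or v_h <= 0.0:          # stop 2: collision; stop 3: v_h <= 0
            break
    return states, actions
```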
when the discrimination neural network is adopted to discriminate the true degree of the strategy generation neural network output, in a specific embodiment, the cross entropy is defined as the kth step return function:
r k =logD(s k ,a k ;θ)
substituting the interaction track of the agent and the environment into a return function to obtain a track containing return in each step:
wherein ,and generating neural network output actions respectively representing the following state and the strategy of the nth step of the simulation process.
In this way, the discrimination neural network can discriminate the true degree of the output of the policy generation neural network, and the closer the output is to 1, the more true the action is the driver behavior, and the closer the action is to 0, the more false the action is generated by the policy generation neural network.
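A sketch of the per-step return r_k = log D(s_k, a_k; θ) computed over one simulated segment; tensor shapes follow the earlier sketches, and the epsilon term is an assumed numerical guard.

```python
import torch

def returns_from_discriminator(disc, states, actions):
    """Compute r_k = log D(s_k, a_k; θ) for every step of a simulated segment."""
    s = torch.as_tensor(states, dtype=torch.float32)                  # (n, 3)
    a = torch.as_tensor(actions, dtype=torch.float32).reshape(-1, 1)  # (n, 1)
    with torch.no_grad():                   # reward only; no discriminator grads
        p = disc(s, a).squeeze(-1)          # (n,), values in (0, 1)
    return torch.log(p + 1e-8)              # epsilon avoids log(0)
```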
Step 4.5: train the policy generation neural network with a plurality of continuous following segments.
A plurality of continuous following segments can be obtained through step 4.3 and used to train the policy generation neural network. Specifically, the proximal policy optimization method (Proximal Policy Optimization, PPO) is adopted to update the policy generation neural network during training, with the objective function defined as:
J(ω′) = E[ Σ_{t=1..k} (π(a_t|s_t; ω′) / π(a_t|s_t; ω_now)) · r_t ]
where ω′ represents the network parameters to be updated and ω_now the current network parameters; k represents the number of steps, π() is the policy generation neural network, and r_t is the return function.
A proximal policy optimization function based on clipping is designed, and the performance target is updated as:
J_clip(ω′) = E[ Σ_{t=1..k} min( ρ_t · r_t, clip(ρ_t, 1 − ε, 1 + ε) · r_t ) ], with ρ_t = π(a_t|s_t; ω′) / π(a_t|s_t; ω_now)
The policy generation neural network parameters are then updated as:
ω_new = argmax_{ω′} J_clip(ω′)
where ω_new represents the updated network parameters, clip(·) is the truncation function, and ε is the truncation parameter; the function y = clip(x, a, b) is shown in Fig. 5. In this embodiment, Adam-based stochastic gradient ascent is used to solve for ω_new.
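A sketch of this clipped update with Adam-based gradient ascent. The patent's objective weights the probability ratio directly by the per-step return r_t, so that is what is used here; the epoch count and ε are assumed hyperparameters.

```python
import torch

def ppo_update(policy, optimizer, s, a, r, epochs=10, eps=0.2):
    """One clipped PPO update of π(a|s; ω) weighted by per-step returns r_t."""
    with torch.no_grad():
        logp_old = policy.dist(s).log_prob(a).sum(-1)   # under ω_now
    for _ in range(epochs):
        logp = policy.dist(s).log_prob(a).sum(-1)       # under candidate ω'
        ratio = torch.exp(logp - logp_old)              # ρ_t
        surrogate = torch.min(ratio * r,
                              torch.clamp(ratio, 1 - eps, 1 + eps) * r)
        loss = -surrogate.mean()        # minimizing this ascends J_clip
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# usage: optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
```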
Step 4.6: train the discrimination neural network with a plurality of simulated following segments.
A plurality of simulated following segments is obtained through the preceding steps and used to train the discrimination neural network. Specifically, the loss function during training is defined as:
G(θ) = −(1/m) Σ_{i=1..m} log D(s_i^E, a_i^E; θ) − (1/n) Σ_{j=1..n} log(1 − D(s_j^G, a_j^G; θ))
where m is the number of sampling points of the continuous following segment, D() is the discrimination neural network, and n is the number of sampling points of the simulated following segment. The smaller the first term, the closer D(s^E, a^E) is to 1; the smaller the second term, the closer D(s^G, a^G) is to 0. The smaller the discriminator loss function G, the better the discriminator can distinguish whether a system input came from the driver or was generated by the policy network.
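A sketch of one discriminator update minimizing G(θ) via binary cross-entropy (BCE against label 1 reproduces −log D, BCE against label 0 reproduces −log(1 − D)); batching is simplified.

```python
import torch
import torch.nn.functional as F

def discriminator_step(disc, optimizer, s_e, a_e, s_g, a_g):
    """Minimize G(θ) = -mean log D(expert) - mean log(1 - D(generated))."""
    p_expert = disc(s_e, a_e)               # driver data: push toward 1
    p_gen = disc(s_g, a_g)                  # simulated data: push toward 0
    loss = (F.binary_cross_entropy(p_expert, torch.ones_like(p_expert))
            + F.binary_cross_entropy(p_gen, torch.zeros_like(p_gen)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```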
Step 4.7: update the discrimination neural network parameters by gradient descent.
Specifically, the parameter update formula is:
θ_new = θ_old − λ · ∂G/∂θ
where θ_old are the current discrimination network parameters, θ_new the updated discrimination network parameters, λ the learning rate, and ∂G/∂θ the partial derivative of the loss with respect to θ.
Steps 4.3 to 4.7 are executed repeatedly: a plurality of continuous following segments is drawn from the driver data set, the agent continuously interacts with the environment by trial and error, and the policy generation neural network and the discrimination neural network are trained until convergence. The policy generation neural network obtained at convergence is the vehicle personalized car-following control model, and its output is the personalized car-following control strategy.
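Putting the pieces together, a sketch of this alternating training loop built from the helper sketches above; env, dataset, the iteration count and the learning rates are all assumptions.

```python
import torch

policy, disc = PolicyNet(), Discriminator()
opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

for _ in range(2000):                        # "until convergence" in practice
    seg = sample_segment(dataset)            # step 4.3: expert segment
    states, actions = rollout(env, policy, max_steps=len(seg.states))  # step 4.4
    s_g = torch.as_tensor(states, dtype=torch.float32)
    a_g = torch.as_tensor(actions, dtype=torch.float32).reshape(-1, 1)
    r = returns_from_discriminator(disc, states, actions)  # r_k = log D
    ppo_update(policy, opt_pi, s_g, a_g, r)                # step 4.5
    s_e = torch.as_tensor(seg.states, dtype=torch.float32)
    a_e = torch.as_tensor(seg.actions, dtype=torch.float32)
    discriminator_step(disc, opt_d, s_e, a_e, s_g, a_g)    # steps 4.6-4.7
```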
Step 5: perform personalized car-following control of the vehicle with the vehicle personalized car-following control model.
With the technical scheme of this embodiment, the generation network and the discrimination network make the policy network match the driver's behavioral characteristics without manually defining a reward function, so that the following control strategy better matches the driver's driving habits.
Meanwhile, driver driving data are collected in different simulation scenarios on the simulated driving device; the system structure is simple, the cost is low, and data collection is safer than driving on real roads.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application, and are intended to be included within the scope of the appended claims and description.

Claims (8)

1. A vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the steps of:
establishing a simulated car-following environment, wherein the simulated car-following environment comprises a road model, a host vehicle and a front vehicle;
setting different speed profiles for the front vehicle in the simulated car-following environment;
carrying out simulated driving car-following tests in the simulated car-following environment according to the different front-vehicle speed profiles, collecting driving data of the host vehicle and the front vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to build a driver car-following data set;
constructing a vehicle personalized car-following control model from the driver car-following data set using a generative adversarial imitation learning method; wherein constructing the vehicle personalized car-following control model comprises: taking the relative distance and relative speed between the host vehicle and the front vehicle and the host vehicle speed as inputs, and the host vehicle longitudinal acceleration as output, establishing a policy generation neural network; taking the relative distance and relative speed between the host vehicle and the front vehicle, the host vehicle speed and the host vehicle longitudinal acceleration as inputs, and a true/false value as output, establishing a discrimination neural network; obtaining a continuous following segment by uniform sampling from the driver data set: τ^E = [s_1^E, a_1^E, …, s_m^E, a_m^E], where s_m^E and a_m^E respectively represent the m-th step following state and the driver's actual action; inputting the obtained continuous following segment into the policy generation neural network to interact with the simulation environment, obtaining a simulated following segment: τ^G = [s_1^G, a_1^G, …, s_n^G, a_n^G], where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action; inputting the simulated following segment into the discrimination neural network, which discriminates how realistic the policy-network output is;
training the policy generation neural network with a plurality of continuous following segments, wherein the objective function of the policy generation neural network during training is:
J(ω′) = E[ Σ_{t=1..k} (π(a_t|s_t; ω′) / π(a_t|s_t; ω_now)) · r_t ]
where ω′ represents the policy generation neural network parameters to be updated, ω_now represents the current policy generation neural network parameters, k represents the number of steps, π() represents the policy generation neural network, and r_t is the return function;
training the discrimination neural network with a plurality of simulated following segments, wherein the loss function of the discrimination neural network during training is:
G(θ) = −(1/m) Σ_{i=1..m} log D(s_i^E, a_i^E; θ) − (1/n) Σ_{j=1..n} log(1 − D(s_j^G, a_j^G; θ))
where m is the number of sampling points of the continuous following segment, D() is the discrimination neural network, and n is the number of sampling points of the simulated following segment;
updating the discrimination neural network parameters by gradient descent, with the parameter update formula:
θ_new = θ_old − λ · ∂G/∂θ
where θ_old are the current discrimination neural network parameters, θ_new the updated discrimination neural network parameters, λ the learning rate, and ∂G/∂θ the partial derivative with respect to θ;
and performing personalized car-following control of the vehicle with the vehicle personalized car-following control model.
2. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein establishing the simulated car-following environment comprises:
building the simulated car-following environment on an automatic driving simulation platform;
modeling the host vehicle dynamics with vehicle dynamics simulation software;
and describing the motion of surrounding vehicles with a random traffic flow model.
3. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein the different speed profiles of the front vehicle comprise: constant-speed driving, decelerating driving, emergency braking and random-speed driving of the front vehicle.
4. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein the following state comprises: the relative distance d between the host vehicle and the front vehicle, the relative speed v_r between the host vehicle and the front vehicle, and the host vehicle speed v_h; the action is the host vehicle longitudinal acceleration a_h.
5. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 4, wherein the policy generation neural network has 3 input-layer neurons, namely the relative distance, the host vehicle speed and the relative speed; 1 output-layer neuron, namely the host vehicle longitudinal acceleration; and 2 hidden layers with 5 neurons each; the policy generation neural network is expressed as:
f = π(a|s; ω)
where a represents the action, a = [a_h]; s represents the vehicle state, s = [d, v_h, v_r]; and ω represents the policy generation neural network parameters.
6. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 4, wherein the discrimination neural network has 4 input-layer neurons, namely the relative distance, the host vehicle speed, the relative speed and the host vehicle longitudinal acceleration, and 1 output-layer neuron whose value lies in (0, 1); it has 2 hidden layers with 5 neurons each; the discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s represents the vehicle state, s = [d, v_h, v_r]; a represents the action, a = [a_h]; and θ represents the discrimination neural network parameters.
7. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein, when a continuous following segment is obtained by uniform sampling from the driver data set, the host vehicle initial state and the front vehicle trajectory in the segment are taken as the simulation scenario; the policy generation neural network, defined by the current policy generation neural network parameters, performs probability sampling of each action to control the host vehicle's interaction with the environment; the simulation stops when a stop condition is met, and the simulated following segment data are recorded; the stop conditions comprise:
the sample data have been fully read;
the two vehicles collide;
the host vehicle speed is less than or equal to 0.
8. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein, when the discrimination neural network is used to discriminate how realistic the policy-network output is, the cross entropy is defined as the k-th step return function r_k = log D(s_k, a_k; θ); substituting the interaction trajectory of the agent and the environment into the return function yields a trajectory containing the per-step return:
τ = [s_1^G, a_1^G, r_1, s_2^G, a_2^G, r_2, …, s_n^G, a_n^G, r_n]
where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action.
CN202111568497.3A 2021-12-21 2021-12-21 Vehicle personalized car-following control method based on generative adversarial imitation learning Active CN114148349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111568497.3A CN114148349B (en) 2021-12-21 Vehicle personalized car-following control method based on generative adversarial imitation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111568497.3A CN114148349B (en) 2021-12-21 Vehicle personalized car-following control method based on generative adversarial imitation learning

Publications (2)

Publication Number Publication Date
CN114148349A (en) 2022-03-08
CN114148349B (en) 2023-10-03

Family

ID=80451718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111568497.3A Active CN114148349B (en) Vehicle personalized car-following control method based on generative adversarial imitation learning

Country Status (1)

Country Link
CN (1) CN114148349B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117698685B (en) * 2024-02-06 2024-04-09 北京航空航天大学 Dynamic scene-oriented hybrid electric vehicle self-adaptive energy management method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109624986A (en) * 2019-03-01 2019-04-16 吉林大学 A kind of the study cruise control system and method for the driving style based on pattern switching
CN109733415A (en) * 2019-01-08 2019-05-10 同济大学 A kind of automatic Pilot following-speed model that personalizes based on deeply study
CN111483468A (en) * 2020-04-24 2020-08-04 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning
CN111795700A (en) * 2020-06-30 2020-10-20 浙江大学 Unmanned vehicle reinforcement learning training environment construction method and training system thereof
CN111982137A (en) * 2020-06-30 2020-11-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating route planning model
CN112201069A (en) * 2020-09-25 2021-01-08 厦门大学 Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver
CN112580149A (en) * 2020-12-22 2021-03-30 浙江工业大学 Vehicle following model generation method based on generation of countermeasure network and driving duration
CN113010967A (en) * 2021-04-22 2021-06-22 吉林大学 Intelligent automobile in-loop simulation test method based on mixed traffic flow model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284280B (en) * 2018-09-06 2020-03-24 百度在线网络技术(北京)有限公司 Simulation data optimization method and device and storage medium
WO2020079069A2 (en) * 2018-10-16 2020-04-23 Five AI Limited Driving scenarios for autonomous vehicles

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109733415A (en) * 2019-01-08 2019-05-10 同济大学 A kind of automatic Pilot following-speed model that personalizes based on deeply study
CN109624986A (en) * 2019-03-01 2019-04-16 吉林大学 A kind of the study cruise control system and method for the driving style based on pattern switching
CN111483468A (en) * 2020-04-24 2020-08-04 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning
CN111795700A (en) * 2020-06-30 2020-10-20 浙江大学 Unmanned vehicle reinforcement learning training environment construction method and training system thereof
CN111982137A (en) * 2020-06-30 2020-11-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating route planning model
CN112201069A (en) * 2020-09-25 2021-01-08 厦门大学 Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver
CN112580149A (en) * 2020-12-22 2021-03-30 浙江工业大学 Vehicle following model generation method based on generation of countermeasure network and driving duration
CN113010967A (en) * 2021-04-22 2021-06-22 吉林大学 Intelligent automobile in-loop simulation test method based on mixed traffic flow model

Also Published As

Publication number Publication date
CN114148349A (en) 2022-03-08

Similar Documents

Publication Publication Date Title
Zhu et al. Typical-driving-style-oriented personalized adaptive cruise control design based on human driving data
CN106874597B (en) highway overtaking behavior decision method applied to automatic driving vehicle
CN107264534B (en) Based on the intelligent driving control system and method for driver experience's model, vehicle
CN110321954A (en) The driving style classification and recognition methods of suitable domestic people and system
CN113010967B (en) Intelligent automobile in-loop simulation test method based on mixed traffic flow model
CN111332362B (en) Intelligent steer-by-wire control method integrating individual character of driver
CN111267830B (en) Hybrid power bus energy management method, device and storage medium
CN111845701A (en) HEV energy management method based on deep reinforcement learning in car following environment
CN109709956A (en) A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding
CN110949398A (en) Method for detecting abnormal driving behavior of first-vehicle drivers in vehicle formation driving
CN104462716B (en) A kind of the brain-computer interface parameter and kinetic parameter design method of the brain control vehicle based on people's bus or train route model
CN111775949A (en) Personalized driver steering behavior assisting method of man-machine driving-sharing control system
CN108482481B (en) Four-wheel steering control method for four-wheel independent drive and steering electric automobile
CN113581182B (en) Automatic driving vehicle lane change track planning method and system based on reinforcement learning
CN111204348A (en) Method and device for adjusting vehicle running parameters, vehicle and storage medium
CN111783943B (en) LSTM neural network-based driver braking strength prediction method
CN116432448B (en) Variable speed limit optimization method based on intelligent network coupling and driver compliance
CN113901718A (en) Deep reinforcement learning-based driving collision avoidance optimization method in following state
CN110490275A (en) A kind of driving behavior prediction technique based on transfer learning
CN114148349B (en) 2022-03-08 Vehicle personalized car-following control method based on generative adversarial imitation learning
CN110320916A (en) Consider the autonomous driving vehicle method for planning track and system of occupant's impression
Selvaraj et al. An ML-aided reinforcement learning approach for challenging vehicle maneuvers
Zhu et al. Design of an integrated vehicle chassis control system with driver behavior identification
CN113033902B (en) Automatic driving lane change track planning method based on improved deep learning
Xu et al. Modeling Lateral Control Behaviors of Distracted Drivers for Haptic-Shared Steering System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant