CN114148349B - Vehicle personalized car-following control method based on generative adversarial imitation learning - Google Patents

Vehicle personalized car-following control method based on generative adversarial imitation learning

Info

Publication number
CN114148349B
CN114148349B (application CN202111568497.3A)
Authority
CN
China
Prior art keywords
vehicle
following
neural network
simulation
personalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111568497.3A
Other languages
Chinese (zh)
Other versions
CN114148349A (en)
Inventor
任玥
邹博文
梁新成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University filed Critical Southwest University
Priority to CN202111568497.3A priority Critical patent/CN114148349B/en
Publication of CN114148349A publication Critical patent/CN114148349A/en
Application granted granted Critical
Publication of CN114148349B publication Critical patent/CN114148349B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0015Planning or execution of driving tasks specially adapted for safety
    • B60W60/0016Planning or execution of driving tasks specially adapted for safety of the vehicle or its occupants
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
    • B60W30/14Adaptive cruise control
    • B60W30/143Speed control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/10Longitudinal speed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/10Longitudinal speed
    • B60W2520/105Longitudinal acceleration
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00Input parameters relating to occupants
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404Characteristics
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/802Longitudinal distance
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/80Spatial relation or speed relative to objects
    • B60W2554/804Relative longitudinal speed
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application provides a vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the following steps: establishing a simulated car-following environment that comprises a road model, a host vehicle and a front vehicle; setting different speed profiles for the front vehicle in the simulation environment; carrying out simulated driving car-following tests under the different front-vehicle speed profiles, collecting driving data of the host vehicle and the front vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to build a driver car-following data set; constructing a vehicle personalized car-following control model from the driver data set using a generative adversarial imitation learning method; and performing personalized car-following control of the vehicle with the trained model. The application addresses the technical problem that, in existing following control based on deep reinforcement learning, a hand-crafted reward function cannot objectively and comprehensively reflect the driving habits of the driver.

Description

Vehicle personalized car-following control method based on generative adversarial imitation learning
Technical Field
The application relates to the technical field of automatic driving, and in particular to a vehicle personalized car-following control method based on generative adversarial imitation learning.
Background
Over the development of automatic driving technology, from early cruise control through adaptive cruise control to full automatic driving, autonomous car-following control has been one of the key technologies of both vehicle active safety and vehicle automation.
Existing autonomous following control techniques fall into two main categories: model-based control and data-driven control. Model-based approaches establish a vehicle kinematics/dynamics model, describe the collision risk of the vehicle's longitudinal motion, and control the longitudinal acceleration by constrained optimization that combines indexes such as following efficiency and passenger comfort. Thanks to the rapid development of chip computing power, simulation technology and AI, deep reinforcement learning offers a brand-new approach to automatic driving control strategies: by setting a reward function and letting an agent optimize its control strategy through continuous trial-and-error interaction with a simulation environment, the cost of system dynamics modeling and parameter tuning can be reduced; and by adding driving-habit indexes to the reward function and learning from actual driving data, the following control can be made to better match the driving habits of different drivers.
At present, however, the reward function is still generally formulated from subjective judgments about following-system performance, and it is difficult for it to objectively and comprehensively reflect the implicit relation between the system's state space and its output. Following control based on conventional deep reinforcement learning therefore has certain limitations in personalization and driving enjoyment.
Disclosure of Invention
In view of the defects of the prior art, the application provides a vehicle personalized car-following control method based on generative adversarial imitation learning, so as to solve the technical problem that a hand-crafted reward function in existing deep-reinforcement-learning-based following control cannot objectively and comprehensively reflect the driving habits of the driver.
The technical scheme adopted by the application is a vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the following steps:
establishing a simulated car-following environment, wherein the simulated car-following environment comprises a road model, a host vehicle and a front vehicle;
setting different speed profiles for the front vehicle in the simulated car-following environment;
carrying out simulated driving car-following tests in the simulated car-following environment according to the different front-vehicle speed profiles, collecting driving data of the host vehicle and the front vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to build a driver car-following data set;
constructing a vehicle personalized car-following control model from the driver car-following data set using a generative adversarial imitation learning method;
and performing personalized car-following control of the vehicle with the vehicle personalized car-following control model.
Further, establishing the simulated car-following environment comprises:
building the simulated car-following environment on an automatic driving simulation platform;
modeling the host vehicle dynamics with vehicle dynamics simulation software;
and describing the motion of surrounding vehicles with a random traffic flow model.
Further, the different speed profiles of the front vehicle comprise: constant-speed driving, decelerating driving, emergency braking and random-speed driving of the front vehicle.
Further, the following state comprises: the relative distance d between the host vehicle and the front vehicle, the relative speed v_r between the host vehicle and the front vehicle, and the host vehicle speed v_h; the action is the host vehicle longitudinal acceleration a_h.
Further, constructing the vehicle personalized car-following control model comprises:
taking the relative distance and relative speed between the host vehicle and the front vehicle and the host vehicle speed as inputs, and the host vehicle longitudinal acceleration as output, establishing a policy generation neural network;
taking the relative distance and relative speed between the host vehicle and the front vehicle, the host vehicle speed and the host vehicle longitudinal acceleration as inputs, and a true/false value as output, establishing a discrimination neural network;
obtaining a continuous following segment by uniform sampling from the driver data set: τ^E = [s_1^E, a_1^E, s_2^E, a_2^E, …, s_m^E, a_m^E], where s_m^E and a_m^E respectively represent the m-th step following state and the driver's actual action;
inputting the obtained continuous following segment into the policy generation neural network to interact with the simulation environment, obtaining a simulated following segment: τ^G = [s_1^G, a_1^G, s_2^G, a_2^G, …, s_n^G, a_n^G], where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action;
inputting the simulated following segment into the discrimination neural network, which discriminates how realistic the policy-network output is;
training the policy generation neural network with a plurality of continuous following segments;
training the discrimination neural network with a plurality of simulated following segments;
and updating the discrimination neural network parameters by gradient descent.
Further, the policy generation neural network has 3 input-layer neurons, namely the relative distance, the host vehicle speed and the relative speed; 1 output-layer neuron, namely the host vehicle longitudinal acceleration; and 2 hidden layers with 5 neurons each. The policy generation neural network is expressed as:
f = π(a|s; ω)
where a represents the action, a = [a_h]; s represents the vehicle state, s = [d, v_h, v_r]; and ω represents the policy generation neural network parameters.
Further, the discrimination neural network has 4 input-layer neurons, namely the relative distance, the host vehicle speed, the relative speed and the host vehicle longitudinal acceleration, and 1 output-layer neuron whose value lies in (0, 1); it has 2 hidden layers with 5 neurons each. The discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s represents the vehicle state, s = [d, v_h, v_r]; a represents the action, a = [a_h]; and θ represents the discrimination neural network parameters.
Further, when a continuous following segment is obtained by uniform sampling from the driver data set, the host vehicle initial state and the front vehicle trajectory in the segment are taken as the simulation scenario; the policy generation neural network, defined by the current policy generation neural network parameters, performs probability sampling of each action to control the host vehicle's interaction with the environment; the simulation stops when a stop condition is met, and the simulated following segment data are recorded; the stop conditions comprise:
the sample data have been fully read;
the two vehicles collide;
the host vehicle speed is less than or equal to 0.
Further, when the discrimination neural network is used to discriminate how realistic the policy-network output is, the cross entropy is defined as the k-th step return function r_k = log D(s_k, a_k; θ); substituting the interaction trajectory of the agent and the environment into the return function yields a trajectory containing the per-step return:
τ = [s_1^G, a_1^G, r_1, s_2^G, a_2^G, r_2, …, s_n^G, a_n^G, r_n]
where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action.
Further, when training the policy generation neural network and the discrimination neural network, the objective function of the policy generation neural network is:
J(ω′) = E[ Σ_{t=1..k} (π(a_t|s_t; ω′) / π(a_t|s_t; ω_now)) · r_t ]
where ω′ represents the network parameters to be updated and ω_now the current network parameters;
the loss function of the discrimination neural network is:
G(θ) = −(1/m) Σ_{i=1..m} log D(s_i^E, a_i^E; θ) − (1/n) Σ_{j=1..n} log(1 − D(s_j^G, a_j^G; θ))
where m is the number of sampling points of the continuous following segment and n is the number of sampling points of the simulated following segment;
the discrimination neural network parameters are updated by gradient descent with the update formula:
θ_new = θ_old − λ · ∂G/∂θ
where θ_old are the current discrimination network parameters, θ_new the updated discrimination network parameters, and λ the learning rate.
The technical scheme above provides the following beneficial technical effects:
1. With a generation network and a discrimination network, and without manually defining a reward function, the policy network better matches the driver's behavioral characteristics, so that the following control strategy better matches the driver's driving habits.
2. Driver driving data are collected in different simulation scenarios on a simulated driving device; the system structure is simple, the cost is low, and data collection is safer than driving on real roads.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of a vehicle personalized follow-up control flow according to an embodiment of the present application;
FIG. 2 is a block diagram of a vehicle personalized follow-up control strategy according to an embodiment of the present application;
FIG. 3 is a block diagram of a policy generation neural network according to an embodiment of the present application;
FIG. 4 is a diagram of a discriminating neural network according to an embodiment of the application;
FIG. 5 is a schematic diagram of the truncation function according to an embodiment of the present application.
Detailed Description
Embodiments of the technical scheme of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and thus are merely examples, and are not intended to limit the scope of the present application.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
Examples
This embodiment provides a vehicle personalized car-following control method based on generative adversarial imitation learning, as shown in Fig. 1, comprising the following steps:
step 1, establishing a simulated vehicle following simulation environment, wherein the simulated vehicle following simulation environment comprises a road model, a host vehicle and a front vehicle.
In a specific embodiment, the automatic driving simulation platform Prescan is used to build the simulated car-following environment, which comprises a road model, a host vehicle and a front vehicle. The host vehicle dynamics are modeled with the vehicle dynamics simulation software Carsim, giving a host-vehicle Carsim model. For surrounding vehicles, a random traffic flow model is used to describe their motion. In this embodiment, the host vehicle is the vehicle under autonomous personalized car-following control, and the front vehicle is the vehicle in the same lane, ahead of the host vehicle, that the host vehicle follows.
Step 2: set different speed profiles for the front vehicle in the simulated car-following environment.
Specifically, the different speed profiles of the front vehicle comprise: constant-speed driving, decelerating driving, emergency braking and random-speed driving. By setting different speed profiles for the front vehicle, different car-following conditions of the host vehicle can be simulated, for instance as in the sketch below.
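As an illustrative sketch only (not part of the patent embodiment), the four front-vehicle speed profiles could be generated as simple time series; the sampling rate, speeds, durations and function names below are all assumptions.

```python
import numpy as np

DT = 0.2   # assumed sampling period in s (5 Hz, matching the embodiment)
T = 60.0   # assumed test duration in s
t = np.arange(0.0, T, DT)

def constant_speed(v0=15.0):
    """Front vehicle cruises at a fixed speed (m/s)."""
    return np.full_like(t, v0)

def deceleration(v0=20.0, a=-1.0, v_min=5.0):
    """Front vehicle decelerates steadily down to a floor speed."""
    return np.clip(v0 + a * t, v_min, None)

def emergency_brake(v0=20.0, t_brake=30.0, a=-6.0):
    """Front vehicle brakes hard at t_brake until standstill."""
    v = np.full_like(t, v0)
    braking = t >= t_brake
    v[braking] = np.clip(v0 + a * (t[braking] - t_brake), 0.0, None)
    return v

def random_speed(v0=15.0, sigma=0.5, v_lim=(5.0, 25.0), seed=0):
    """Front vehicle speed follows a bounded random walk."""
    rng = np.random.default_rng(seed)
    dv = rng.normal(0.0, sigma, size=t.shape)
    return np.clip(v0 + np.cumsum(dv), *v_lim)
```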
Step 3: perform simulated driving car-following tests in the simulated car-following environment according to the different front-vehicle speed profiles, collect driving data of the host vehicle and the front vehicle to obtain continuous following segments, and select a plurality of continuous following segments to build the driver car-following data set.
In a specific embodiment, a Logitech G29 is used as the simulated driving device in the simulated car-following environment. The driver operates the steering wheel and the accelerator/brake pedals, and the driving simulator collects the steering-wheel angle, accelerator-pedal opening and brake-pedal opening signals and transmits them to the host-vehicle Carsim model.
According to the different speed profiles of the front vehicle, the driver simulates different car-following conditions in the simulated car-following environment and performs multiple simulated driving following tests. Driving data of the host vehicle and the front vehicle in the same lane are collected to obtain the following state, which comprises: the relative distance d between the host vehicle and the front vehicle, the relative speed v_r between the host vehicle and the front vehicle, the host vehicle speed v_h, and the host vehicle longitudinal acceleration a_h. The relative distance, host vehicle speed and relative speed are selected as the vehicle state s, i.e. s = [d, v_h, v_r]; the host vehicle longitudinal acceleration is selected as the action a, i.e. a = [a_h].
In a specific embodiment, the collected test data take the form of a number of different continuous following segments: k simulated driving tests are performed, each of a different duration (e.g. the first lasts 8 seconds and the second 10 seconds); each test corresponds to one continuous following segment, and each segment contains the four driving signals of relative distance, relative speed, host vehicle speed and host vehicle longitudinal acceleration. The sampling frequency for the test data is not limited; the preferred frequency is 5 Hz, which gives 40 sampling points for the 8-second first test and 50 sampling points for the 10-second second test.
A plurality of continuous following segments is selected as the driver data set Γ, specifically:
Γ = {τ^(1), τ^(2), …, τ^(k)}
where τ = [s_1, a_1, s_2, a_2, …, s_m, a_m], k is the number of continuous following segments, and m is the number of sampling points in each segment.
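A minimal sketch of this data-set structure and of the uniform segment sampling used later in step 4.3; the class and field names are assumptions, not part of the patent.

```python
import random
from dataclasses import dataclass
import numpy as np

@dataclass
class FollowingSegment:
    """One continuous following segment τ = [s_1, a_1, ..., s_m, a_m]."""
    states: np.ndarray   # shape (m, 3): [d, v_h, v_r] at each sampling point
    actions: np.ndarray  # shape (m, 1): [a_h] at each sampling point

def sample_segment(dataset: list[FollowingSegment]) -> FollowingSegment:
    """Uniformly sample one continuous following segment from Γ."""
    return random.choice(dataset)
```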
Step 4: use a generative adversarial imitation learning method to construct the vehicle personalized car-following control model from the driver car-following data set.
As shown in Fig. 2, constructing the vehicle personalized car-following control model specifically comprises:
step 4.1, taking the relative distance, the relative speed and the speed of the main vehicle and the front vehicle as input, taking the longitudinal acceleration of the main vehicle as output, and establishing a strategy to generate a neural network
In the present embodimentIn the method, the longitudinal acceleration a= [ a ] of the main vehicle is adopted h ]In a specific embodiment, the longitudinal acceleration range of the host vehicle is set to-3 m/s 2 ≤a h ≤3m/s 2
The strategy generation neural network structure is shown in figure 3, and the number of the input layer neurons is 3, namely the relative distance, the host vehicle speed and the relative speed; the number of the neurons of the output layer is 1, namely the longitudinal acceleration of the host vehicle, the number of the hidden layers is 2, and the number of the neurons of each hidden layer is 5. The policy generating neural network is expressed as:
f=π(a|s;ω)
wherein a represents an action, a= [ a ] h ]The method comprises the steps of carrying out a first treatment on the surface of the s represents the vehicle state, s= [ d, v ] h ,v r ]The method comprises the steps of carrying out a first treatment on the surface of the ω represents policy generating neural network parameters including the number of network layers, the number of neurons per layer.
Step 4.2: taking the relative distance and relative speed between the host vehicle and the front vehicle, the host vehicle speed and the host vehicle longitudinal acceleration as inputs, and a true/false value as output, establish the discrimination neural network.
The discrimination neural network structure is shown in Fig. 4. The input layer has 4 neurons, namely the relative distance, host vehicle speed, relative speed and host vehicle longitudinal acceleration [d, v_h, v_r, a_h]; the output layer has 1 neuron whose value lies in (0, 1): the closer the output is to 1, the more "true" the action is (i.e., it is driver behavior), and the closer to 0, the more "false" (i.e., it was generated by the policy generation neural network). There are 2 hidden layers with 5 neurons each. The discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s represents the vehicle state, s = [d, v_h, v_r]; a represents the action, a = [a_h]; and θ represents the discrimination neural network parameters, including the number of network layers and the number of neurons per layer.
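A minimal PyTorch sketch of the two networks just described (3-5-5-1 policy, 4-5-5-1 discriminator). The activation functions and the Gaussian action head are assumptions: the patent fixes only the layer sizes, the ±3 m/s² acceleration bound, and the (0, 1) discriminator output.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """π(a|s; ω): state [d, v_h, v_r] -> a distribution over a_h."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(3, 5), nn.Tanh(),
            nn.Linear(5, 5), nn.Tanh(),
            nn.Linear(5, 1),                          # mean of a_h
        )
        self.log_std = nn.Parameter(torch.zeros(1))   # assumed Gaussian head

    def dist(self, s: torch.Tensor) -> torch.distributions.Normal:
        mu = 3.0 * torch.tanh(self.body(s))           # bound the mean to ±3 m/s²
        return torch.distributions.Normal(mu, self.log_std.exp())

class Discriminator(nn.Module):
    """D(s, a; θ): [d, v_h, v_r, a_h] -> probability in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, 5), nn.Tanh(),
            nn.Linear(5, 5), nn.Tanh(),
            nn.Linear(5, 1), nn.Sigmoid(),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1))
```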
Step 4.3: obtain a continuous following segment by uniform sampling from the driver data set.
In this embodiment, the continuous segment obtained by uniform sampling from the driver data set is recorded as τ^E = [s_1^E, a_1^E, …, s_m^E, a_m^E], where s_m^E and a_m^E represent the m-th step following state and the driver's actual action, respectively.
The continuous segment obtained by uniform sampling is any one of all segments in the driver data set.
Step 4.4: input the obtained continuous following segment into the policy generation neural network to interact with the simulation environment, obtaining a simulated following segment; input the simulated following segment into the discrimination neural network, which discriminates how realistic the policy-network output is.
Taking the host vehicle initial state and the front vehicle trajectory in the continuous following segment obtained in step 4.3 as the simulation scenario, and denoting the current policy generation neural network parameters by ω_now, the policy generation neural network π(a|s; ω_now) performs probability sampling of each action to control the host vehicle's interaction with the environment. The simulation stops when the stop condition is met, giving the simulated following segment data τ^G = [s_1^G, a_1^G, …, s_n^G, a_n^G], where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action.
the condition for stopping the simulation is any one of the following:
finishing the reading of the sample data;
the two vehicles collide (the relative distance d is less than or equal to 0);
speed v of host vehicle h ≤0。
In this step, m and n represent the lengths of two consecutive heel fragments, respectively;
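A sketch of this interaction loop, assuming a hypothetical env object that wraps the Prescan/Carsim co-simulation; its reset/step interface is a placeholder, and PolicyNet is the sketch from step 4.2.

```python
import torch

def rollout(env, policy, max_steps):
    """Roll the policy in the simulation until a stop condition is met."""
    states, actions = [], []
    s = env.reset()  # host initial state; front vehicle replays its trajectory
    for _ in range(max_steps):              # stop 1: sample data fully read
        st = torch.as_tensor(s, dtype=torch.float32)
        a = policy.dist(st).sample()        # probability sampling of the action
        states.append(s)
        actions.append(a.item())
        s = env.step(a.item())              # advance the co-simulation one step
        d, v_h, _ = s                       # state is [d, v_h, v_r]
        if d <= 0.0 or v_h <= 0.0:          # stop 2: collision; stop 3: v_h <= 0
            break
    return states, actions
```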
when the discrimination neural network is adopted to discriminate the true degree of the strategy generation neural network output, in a specific embodiment, the cross entropy is defined as the kth step return function:
r k =logD(s k ,a k ;θ)
substituting the interaction track of the agent and the environment into a return function to obtain a track containing return in each step:
wherein ,and generating neural network output actions respectively representing the following state and the strategy of the nth step of the simulation process.
In this way, the discrimination neural network can discriminate the true degree of the output of the policy generation neural network, and the closer the output is to 1, the more true the action is the driver behavior, and the closer the action is to 0, the more false the action is generated by the policy generation neural network.
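A sketch of the per-step return r_k = log D(s_k, a_k; θ) computed over one simulated segment; tensor shapes follow the earlier sketches, and the epsilon term is an assumed numerical guard.

```python
import torch

def returns_from_discriminator(disc, states, actions):
    """Compute r_k = log D(s_k, a_k; θ) for every step of a simulated segment."""
    s = torch.as_tensor(states, dtype=torch.float32)                  # (n, 3)
    a = torch.as_tensor(actions, dtype=torch.float32).reshape(-1, 1)  # (n, 1)
    with torch.no_grad():                   # reward only; no discriminator grads
        p = disc(s, a).squeeze(-1)          # (n,), values in (0, 1)
    return torch.log(p + 1e-8)              # epsilon avoids log(0)
```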
Step 4.5: train the policy generation neural network with a plurality of continuous following segments.
A plurality of continuous following segments can be obtained through step 4.3 and used to train the policy generation neural network. Specifically, the proximal policy optimization method (Proximal Policy Optimization, PPO) is adopted to update the policy generation neural network during training, with the objective function defined as:
J(ω′) = E[ Σ_{t=1..k} (π(a_t|s_t; ω′) / π(a_t|s_t; ω_now)) · r_t ]
where ω′ represents the network parameters to be updated and ω_now the current network parameters; k represents the number of steps, π() is the policy generation neural network, and r_t is the return function.
A proximal policy optimization function based on clipping is designed, and the performance target is updated as:
J_clip(ω′) = E[ Σ_{t=1..k} min( ρ_t · r_t, clip(ρ_t, 1 − ε, 1 + ε) · r_t ) ], with ρ_t = π(a_t|s_t; ω′) / π(a_t|s_t; ω_now)
The policy generation neural network parameters are then updated as:
ω_new = argmax_{ω′} J_clip(ω′)
where ω_new represents the updated network parameters, clip(·) is the truncation function, and ε is the truncation parameter; the function y = clip(x, a, b) is shown in Fig. 5. In this embodiment, Adam-based stochastic gradient ascent is used to solve for ω_new.
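A sketch of this clipped update with Adam-based gradient ascent. The patent's objective weights the probability ratio directly by the per-step return r_t, so that is what is used here; the epoch count and ε are assumed hyperparameters.

```python
import torch

def ppo_update(policy, optimizer, s, a, r, epochs=10, eps=0.2):
    """One clipped PPO update of π(a|s; ω) weighted by per-step returns r_t."""
    with torch.no_grad():
        logp_old = policy.dist(s).log_prob(a).sum(-1)   # under ω_now
    for _ in range(epochs):
        logp = policy.dist(s).log_prob(a).sum(-1)       # under candidate ω'
        ratio = torch.exp(logp - logp_old)              # ρ_t
        surrogate = torch.min(ratio * r,
                              torch.clamp(ratio, 1 - eps, 1 + eps) * r)
        loss = -surrogate.mean()        # minimizing this ascends J_clip
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# usage: optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
```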
Step 4.6: train the discrimination neural network with a plurality of simulated following segments.
A plurality of simulated following segments is obtained through the preceding steps and used to train the discrimination neural network. Specifically, the loss function during training is defined as:
G(θ) = −(1/m) Σ_{i=1..m} log D(s_i^E, a_i^E; θ) − (1/n) Σ_{j=1..n} log(1 − D(s_j^G, a_j^G; θ))
where m is the number of sampling points of the continuous following segment, D() is the discrimination neural network, and n is the number of sampling points of the simulated following segment. The smaller the first term, the closer D(s^E, a^E) is to 1; the smaller the second term, the closer D(s^G, a^G) is to 0. The smaller the discriminator loss function G, the better the discriminator can distinguish whether a system input came from the driver or was generated by the policy network.
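A sketch of one discriminator update minimizing G(θ) via binary cross-entropy (BCE against label 1 reproduces −log D, BCE against label 0 reproduces −log(1 − D)); batching is simplified.

```python
import torch
import torch.nn.functional as F

def discriminator_step(disc, optimizer, s_e, a_e, s_g, a_g):
    """Minimize G(θ) = -mean log D(expert) - mean log(1 - D(generated))."""
    p_expert = disc(s_e, a_e)               # driver data: push toward 1
    p_gen = disc(s_g, a_g)                  # simulated data: push toward 0
    loss = (F.binary_cross_entropy(p_expert, torch.ones_like(p_expert))
            + F.binary_cross_entropy(p_gen, torch.zeros_like(p_gen)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```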
Step 4.7: update the discrimination neural network parameters by gradient descent.
Specifically, the parameter update formula is:
θ_new = θ_old − λ · ∂G/∂θ
where θ_old are the current discrimination network parameters, θ_new the updated discrimination network parameters, λ the learning rate, and ∂G/∂θ the partial derivative of the loss with respect to θ.
Steps 4.3 to 4.7 are executed repeatedly: a plurality of continuous following segments is drawn from the driver data set, the agent continuously interacts with the environment by trial and error, and the policy generation neural network and the discrimination neural network are trained until convergence. The policy generation neural network obtained at convergence is the vehicle personalized car-following control model, and its output is the personalized car-following control strategy.
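Putting the pieces together, a sketch of this alternating training loop built from the helper sketches above; env, dataset, the iteration count and the learning rates are all assumptions.

```python
import torch

policy, disc = PolicyNet(), Discriminator()
opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

for _ in range(2000):                        # "until convergence" in practice
    seg = sample_segment(dataset)            # step 4.3: expert segment
    states, actions = rollout(env, policy, max_steps=len(seg.states))  # step 4.4
    s_g = torch.as_tensor(states, dtype=torch.float32)
    a_g = torch.as_tensor(actions, dtype=torch.float32).reshape(-1, 1)
    r = returns_from_discriminator(disc, states, actions)  # r_k = log D
    ppo_update(policy, opt_pi, s_g, a_g, r)                # step 4.5
    s_e = torch.as_tensor(seg.states, dtype=torch.float32)
    a_e = torch.as_tensor(seg.actions, dtype=torch.float32)
    discriminator_step(disc, opt_d, s_e, a_e, s_g, a_g)    # steps 4.6-4.7
```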
Step 5: perform personalized car-following control of the vehicle with the vehicle personalized car-following control model.
With the technical scheme of this embodiment, the generation network and the discrimination network make the policy network match the driver's behavioral characteristics without manually defining a reward function, so that the following control strategy better matches the driver's driving habits.
Meanwhile, driver driving data are collected in different simulation scenarios on the simulated driving device; the system structure is simple, the cost is low, and data collection is safer than driving on real roads.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application, and are intended to be included within the scope of the appended claims and description.

Claims (8)

1. A vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the steps of:
establishing a simulated car-following environment, wherein the simulated car-following environment comprises a road model, a host vehicle and a front vehicle;
setting different speed profiles for the front vehicle in the simulated car-following environment;
carrying out simulated driving car-following tests in the simulated car-following environment according to the different front-vehicle speed profiles, collecting driving data of the host vehicle and the front vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to build a driver car-following data set;
constructing a vehicle personalized car-following control model from the driver car-following data set using a generative adversarial imitation learning method; wherein constructing the vehicle personalized car-following control model comprises: taking the relative distance and relative speed between the host vehicle and the front vehicle and the host vehicle speed as inputs, and the host vehicle longitudinal acceleration as output, establishing a policy generation neural network; taking the relative distance and relative speed between the host vehicle and the front vehicle, the host vehicle speed and the host vehicle longitudinal acceleration as inputs, and a true/false value as output, establishing a discrimination neural network; obtaining a continuous following segment by uniform sampling from the driver data set: τ^E = [s_1^E, a_1^E, …, s_m^E, a_m^E], where s_m^E and a_m^E respectively represent the m-th step following state and the driver's actual action; inputting the obtained continuous following segment into the policy generation neural network to interact with the simulation environment, obtaining a simulated following segment: τ^G = [s_1^G, a_1^G, …, s_n^G, a_n^G], where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action; inputting the simulated following segment into the discrimination neural network, which discriminates how realistic the policy-network output is;
training the policy generation neural network with a plurality of continuous following segments, wherein the objective function of the policy generation neural network during training is:
J(ω′) = E[ Σ_{t=1..k} (π(a_t|s_t; ω′) / π(a_t|s_t; ω_now)) · r_t ]
where ω′ represents the policy generation neural network parameters to be updated, ω_now represents the current policy generation neural network parameters, k represents the number of steps, π() represents the policy generation neural network, and r_t is the return function;
training the discrimination neural network with a plurality of simulated following segments, wherein the loss function of the discrimination neural network during training is:
G(θ) = −(1/m) Σ_{i=1..m} log D(s_i^E, a_i^E; θ) − (1/n) Σ_{j=1..n} log(1 − D(s_j^G, a_j^G; θ))
where m is the number of sampling points of the continuous following segment, D() is the discrimination neural network, and n is the number of sampling points of the simulated following segment;
updating the discrimination neural network parameters by gradient descent, with the parameter update formula:
θ_new = θ_old − λ · ∂G/∂θ
where θ_old are the current discrimination neural network parameters, θ_new the updated discrimination neural network parameters, λ the learning rate, and ∂G/∂θ the partial derivative with respect to θ;
and performing personalized car-following control of the vehicle with the vehicle personalized car-following control model.
2. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein establishing the simulated car-following environment comprises:
building the simulated car-following environment on an automatic driving simulation platform;
modeling the host vehicle dynamics with vehicle dynamics simulation software;
and describing the motion of surrounding vehicles with a random traffic flow model.
3. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein the different speed profiles of the front vehicle comprise: constant-speed driving, decelerating driving, emergency braking and random-speed driving of the front vehicle.
4. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein the following state comprises: the relative distance d between the host vehicle and the front vehicle, the relative speed v_r between the host vehicle and the front vehicle, and the host vehicle speed v_h; the action is the host vehicle longitudinal acceleration a_h.
5. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 4, wherein the policy generation neural network has 3 input-layer neurons, namely the relative distance, the host vehicle speed and the relative speed; 1 output-layer neuron, namely the host vehicle longitudinal acceleration; and 2 hidden layers with 5 neurons each; the policy generation neural network is expressed as:
f = π(a|s; ω)
where a represents the action, a = [a_h]; s represents the vehicle state, s = [d, v_h, v_r]; and ω represents the policy generation neural network parameters.
6. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 4, wherein the discrimination neural network has 4 input-layer neurons, namely the relative distance, the host vehicle speed, the relative speed and the host vehicle longitudinal acceleration, and 1 output-layer neuron whose value lies in (0, 1); it has 2 hidden layers with 5 neurons each; the discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s represents the vehicle state, s = [d, v_h, v_r]; a represents the action, a = [a_h]; and θ represents the discrimination neural network parameters.
7. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein, when a continuous following segment is obtained by uniform sampling from the driver data set, the host vehicle initial state and the front vehicle trajectory in the segment are taken as the simulation scenario; the policy generation neural network, defined by the current policy generation neural network parameters, performs probability sampling of each action to control the host vehicle's interaction with the environment; the simulation stops when a stop condition is met, and the simulated following segment data are recorded; the stop conditions comprise:
the sample data have been fully read;
the two vehicles collide;
the host vehicle speed is less than or equal to 0.
8. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein, when the discrimination neural network is used to discriminate how realistic the policy-network output is, the cross entropy is defined as the k-th step return function r_k = log D(s_k, a_k; θ); substituting the interaction trajectory of the agent and the environment into the return function yields a trajectory containing the per-step return:
τ = [s_1^G, a_1^G, r_1, s_2^G, a_2^G, r_2, …, s_n^G, a_n^G, r_n]
where s_n^G and a_n^G respectively represent the n-th step following state of the simulation process and the policy-network output action.
CN202111568497.3A 2021-12-21 2021-12-21 Vehicle personalized car-following control method based on generative adversarial imitation learning Active CN114148349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111568497.3A CN114148349B (en) 2021-12-21 Vehicle personalized car-following control method based on generative adversarial imitation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111568497.3A CN114148349B (en) 2021-12-21 Vehicle personalized car-following control method based on generative adversarial imitation learning

Publications (2)

Publication Number Publication Date
CN114148349A (en) 2022-03-08
CN114148349B (en) 2023-10-03

Family

ID=80451718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111568497.3A Active CN114148349B (en) Vehicle personalized car-following control method based on generative adversarial imitation learning

Country Status (1)

Country Link
CN (1) CN114148349B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117698685B (en) * 2024-02-06 2024-04-09 北京航空航天大学 Dynamic scene-oriented hybrid electric vehicle self-adaptive energy management method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109624986A (en) * 2019-03-01 2019-04-16 吉林大学 A kind of the study cruise control system and method for the driving style based on pattern switching
CN109733415A (en) * 2019-01-08 2019-05-10 同济大学 A kind of automatic Pilot following-speed model that personalizes based on deeply study
CN111483468A (en) * 2020-04-24 2020-08-04 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning
CN111795700A (en) * 2020-06-30 2020-10-20 浙江大学 Unmanned vehicle reinforcement learning training environment construction method and training system thereof
CN111982137A (en) * 2020-06-30 2020-11-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating route planning model
CN112201069A (en) * 2020-09-25 2021-01-08 厦门大学 Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver
CN112580149A (en) * 2020-12-22 2021-03-30 浙江工业大学 Vehicle following model generation method based on generation of countermeasure network and driving duration
CN113010967A (en) * 2021-04-22 2021-06-22 吉林大学 Intelligent automobile in-loop simulation test method based on mixed traffic flow model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284280B (en) * 2018-09-06 2020-03-24 百度在线网络技术(北京)有限公司 Simulation data optimization method and device and storage medium
WO2020079069A2 (en) * 2018-10-16 2020-04-23 Five AI Limited Driving scenarios for autonomous vehicles

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109733415A (en) * 2019-01-08 2019-05-10 同济大学 A kind of automatic Pilot following-speed model that personalizes based on deeply study
CN109624986A (en) * 2019-03-01 2019-04-16 吉林大学 A kind of the study cruise control system and method for the driving style based on pattern switching
CN111483468A (en) * 2020-04-24 2020-08-04 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning
CN111795700A (en) * 2020-06-30 2020-10-20 浙江大学 Unmanned vehicle reinforcement learning training environment construction method and training system thereof
CN111982137A (en) * 2020-06-30 2020-11-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for generating route planning model
CN112201069A (en) * 2020-09-25 2021-01-08 厦门大学 Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver
CN112580149A (en) * 2020-12-22 2021-03-30 浙江工业大学 Vehicle following model generation method based on generation of countermeasure network and driving duration
CN113010967A (en) * 2021-04-22 2021-06-22 吉林大学 Intelligent automobile in-loop simulation test method based on mixed traffic flow model

Also Published As

Publication number Publication date
CN114148349A (en) 2022-03-08

Similar Documents

Publication Publication Date Title
Zhu et al. Typical-driving-style-oriented personalized adaptive cruise control design based on human driving data
CN106874597B (en) highway overtaking behavior decision method applied to automatic driving vehicle
CN107264534B (en) Based on the intelligent driving control system and method for driver experience's model, vehicle
CN110321954A (en) The driving style classification and recognition methods of suitable domestic people and system
CN113010967B (en) Intelligent automobile in-loop simulation test method based on mixed traffic flow model
CN111332362B (en) Intelligent steer-by-wire control method integrating individual character of driver
CN111267830B (en) Hybrid power bus energy management method, device and storage medium
CN111845701A (en) HEV energy management method based on deep reinforcement learning in car following environment
CN109709956A (en) A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding
CN110949398A (en) Method for detecting abnormal driving behavior of first-vehicle drivers in vehicle formation driving
CN104462716B (en) A kind of the brain-computer interface parameter and kinetic parameter design method of the brain control vehicle based on people's bus or train route model
CN111775949A (en) Personalized driver steering behavior assisting method of man-machine driving-sharing control system
CN108482481B (en) Four-wheel steering control method for four-wheel independent drive and steering electric automobile
CN113581182B (en) Automatic driving vehicle lane change track planning method and system based on reinforcement learning
CN111204348A (en) Method and device for adjusting vehicle running parameters, vehicle and storage medium
CN111783943B (en) LSTM neural network-based driver braking strength prediction method
CN116432448B (en) Variable speed limit optimization method based on intelligent network coupling and driver compliance
CN113901718A (en) Deep reinforcement learning-based driving collision avoidance optimization method in following state
CN110490275A (en) A kind of driving behavior prediction technique based on transfer learning
CN114148349B (en) 2022-03-08 Vehicle personalized car-following control method based on generative adversarial imitation learning
CN110320916A (en) Consider the autonomous driving vehicle method for planning track and system of occupant's impression
Selvaraj et al. An ML-aided reinforcement learning approach for challenging vehicle maneuvers
Zhu et al. Design of an integrated vehicle chassis control system with driver behavior identification
CN113033902B (en) Automatic driving lane change track planning method based on improved deep learning
Xu et al. Modeling Lateral Control Behaviors of Distracted Drivers for Haptic-Shared Steering System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant