CN113221469A - Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator - Google Patents
- Publication number
- CN113221469A (application CN202110625802.1A)
- Authority
- CN
- China
- Prior art keywords
- track
- traffic
- reward function
- data
- strategy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention provides an inverse reinforcement learning method and system for enhancing the authenticity of a traffic simulator, comprising the following steps: initializing a track action strategy through a generator; generating track data of a plurality of agents in combination with the current environment; mixing the track data with preset expert track data, inputting the mixed track data into a discriminator, and training the discriminator to distinguish the expert track data, the training aim being to maximize a reward function; inputting the reward function into the generator, which obtains a new track action strategy; generating track data of a plurality of agents with the new track action strategy, mixing it with the preset expert track data, and training the discriminator until convergence; finally, the traffic simulator carries out traffic simulation according to the final reward function and track action strategy. The method can infer the reward function of real-world vehicles, can optimize strategies under different traffic environments, and has good scalability.
Description
Technical Field
The invention relates to the field of computer software and traffic, and in particular to an inverse reinforcement learning method and system for enhancing the realism of a traffic simulator.
Background
In recent years, urbanization, year-by-year growth in urban traffic flow, and dense populations have made the construction of urban road traffic systems very complex, and many road construction problems cannot be solved intuitively and scientifically, such as urban traffic network planning and evaluation, traffic congestion and traffic stream evacuation, lane restriction and speed limits, and traffic signal control.
Traffic simulators have long been an important research hotspot in the traffic field. Microscopic traffic simulation plays an important role in the planning, design, and operation of traffic systems. At present, a traffic simulator serves two important functions. The first is effect evaluation for city planning and operation: a well-designed traffic simulator allows city operators and planners to test policies for urban road planning, traffic control, and congestion optimization by accurately inferring the possible impact that facility construction and passage policies have on the urban traffic environment. The second is providing learnable data on urban traffic operation to researchers developing urban intelligent algorithms. Much recent work trains and tests intelligent traffic signal control strategies with a traffic simulator, because it can generate abundant simulation data for training a signal controller, alleviating the problem that real city data cannot meet the large training-data requirements of machine learning algorithms.
Currently, most advanced microscopic traffic simulators use a car-following model (CFM) to describe the movement of an individual vehicle.
Each vehicle has certain attributes and parameters. When the simulation system creates a vehicle, it initializes the vehicle's parameter values and controls the vehicle during driving by adjusting these parameters. Commonly used parameters include each vehicle's acceleration and the driver's reaction time; different parameter settings increase the richness and diversity of the simulation and allow urban vehicle trajectories to be simulated more realistically.
However, conventional practice is to define the parameters of the car-following model using physical and empirical formulas. These parameters must be carefully calibrated with traffic data. The calibrated car-following model can then be used as a strategy providing the optimal behavior of a vehicle under given environmental conditions. Optimization of this strategy is achieved by calibrating the model parameters, which are obtained by comparing observed and simulated traffic measurements.
The whole process is as follows:
Step 1: generating a series of traffic flow state data by the traffic simulator;
Step 2: generating a traffic vehicle control strategy π_ψ according to the traffic flow state data;
Step 3: generating each vehicle's current traffic action (start, stop, accelerate, decelerate) according to its current traffic state (parameters such as vehicle speed, traffic light state, distance to the vehicle ahead, etc.);
Step 4: transmitting the generated traffic strategy to the traffic simulator's control API;
Step 5: applying the new traffic strategy and generating new traffic flow state data.
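The Step 1-5 loop above can be sketched as a toy program. The single-vehicle state fields, the threshold rule standing in for a calibrated car-following model, and all numeric parameters are illustrative assumptions, not a specific simulator's API.

```python
# Toy sketch of the Step 1-5 control loop: a CFM-style policy maps the
# current traffic state to an action, the "simulator" applies it and
# produces the next state. Parameters stand in for calibrated CFM ones.

def cfm_policy(state, accel=1.5, reaction_gap=10.0):
    """Threshold car-following rule: accelerate while the gap ahead is
    large, otherwise decelerate toward a stop (illustrative assumption)."""
    if state["gap"] > reaction_gap:
        return min(state["speed"] + accel, 15.0)   # accelerate, capped
    return max(state["speed"] - accel, 0.0)        # decelerate / stop

def simulate(n_steps=5):
    # Step 1: initial traffic flow state (one vehicle for brevity)
    state = {"speed": 0.0, "gap": 20.0}
    trace = []
    for _ in range(n_steps):
        # Steps 2-3: the policy produces the vehicle's next action/speed
        new_speed = cfm_policy(state)
        # Steps 4-5: apply the action; the environment yields a new state
        state = {"speed": new_speed, "gap": state["gap"] - new_speed * 0.5}
        trace.append(state["speed"])
    return trace
```

Running `simulate()` shows the vehicle accelerating while the gap to its leader stays above the threshold.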
An effective traffic simulator should be able to produce accurate simulations in different traffic environments without being affected by environmental dynamics. This can be broken down into two specific challenges.
The first challenge: the goal of a conventional car-following model is generally to simulate following behavior by applying physical laws and human knowledge. The movement of vehicles in the real world depends on many factors, including speed, distance to neighbors, the road network, traffic lights, and the psychological state of the driver. Models that fit additional factors continue to be added to the car-following family, for example setting thresholds for vehicle movement according to the driver's psychological tendencies based on safe driving distance and speed. However, there is currently no general model that fully captures the realism of vehicle behavior patterns across comprehensive contexts. Car-following models rely on inaccurate prior knowledge and, even after calibration, often fail to produce realistic simulations.
The second challenge: many studies consider learning from expert data. Although expert data are relatively normative and stable, the car-following model still lacks robustness to the varied environment dynamics of different traffic settings. This is a challenging problem due to the non-stationarity of real-world traffic dynamics. For example, weather and road conditions may change a vehicle's mechanical properties and its coefficient of friction with the road surface, ultimately changing its acceleration and braking performance. In a real scenario, a good driver adjusts the driving strategy to environmental changes and behaves differently under these dynamics (e.g., using different accelerations given the same observed speed). However, given a fixed strategy (i.e., a CFM), current simulators are generally unable to adapt the strategy to different dynamics. To simulate traffic environments with significantly different dynamics, the car-following model must be recalibrated with new trajectory data from that environment, i.e., relearned. Such relearning is inefficient.
Disclosure of Invention
In view of the deficiencies in the prior art, the object of the invention is to provide an inverse reinforcement learning method and system for enhancing the authenticity of a traffic simulator.
The invention provides an inverse reinforcement learning method for enhancing the authenticity of a traffic simulator, comprising the following steps:
an initialization step: initializing a track action strategy through a generator;
a track data generation step: generating track data of a plurality of agents according to the track action strategy and the current environment;
mixing: mixing the track data with preset expert track data to obtain mixed track data;
training: inputting the mixed track data into a discriminator, and distinguishing the expert track data by a training discriminator, wherein the training goal is to maximize a reward function;
an optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy;
iteration step: generating trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training a discriminator until convergence;
an output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
Preferably, the method treats traffic simulation as a multi-agent control problem, formally defined as a Markov decision process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0);
M represents a group of agents and m is one of the agents; S_m and A_m denote the state and action of each agent, respectively; ρ_0 is a distribution expressing the initial state; r(s, a) is the reward function; γ represents the long-term reward discount coefficient; the state transition function is defined as T(s′ | s, a); the microscopic traffic simulation problem is described in an inverse reinforcement learning manner.
Preferably, it is assumed that, according to a trajectory action strategy π*(a | s), M pieces of track data D = {τ_1, τ_2, …, τ_M} are generated, each trace of n points being τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}; the aim is to learn a reward function r_θ(s, a) maximizing the log likelihood of the expert trajectory data:
Preferably, the discriminator takes a state-action pair as input,
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
in each iteration, the extracted reward value is used to guide the training of the generator strategy.
Preferably, the agent comprises a vehicle interacting with the environment.
According to the invention, the inverse reinforcement learning system for enhancing the authenticity of the traffic simulator comprises:
an initialization step: initializing a track action strategy through a generator;
a track data generation step: generating track data of a plurality of agents according to the track action strategy and the current environment;
mixing: mixing the track data with preset expert track data to obtain mixed track data;
training: inputting the mixed track data into a discriminator, and distinguishing the expert track data by a training discriminator, wherein the training goal is to maximize a reward function;
an optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy;
iteration step: generating trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training a discriminator until convergence;
an output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
Preferably, the system treats traffic simulation as a multi-agent control problem, formally defined as a Markov decision process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0);
M represents a group of agents and m is one of the agents; S_m and A_m denote the state and action of each agent, respectively; ρ_0 is a distribution expressing the initial state; r(s, a) is the reward function; γ represents the long-term reward discount coefficient; the state transition function is defined as T(s′ | s, a); the microscopic traffic simulation problem is described in an inverse reinforcement learning manner.
Preferably, it is assumed that, according to a trajectory action strategy π*(a | s), M pieces of track data D = {τ_1, τ_2, …, τ_M} are generated, each trace of n points being τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}; the aim is to learn a reward function r_θ(s, a) maximizing the log likelihood of the expert trajectory data:
Preferably, the discriminator takes a state-action pair as input,
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
in each iteration, the extracted reward value is used to guide the training of the generator strategy.
Preferably, the agent comprises a vehicle interacting with the environment.
Compared with the prior art, the invention has the following beneficial effects:
the present invention is based on an Inverse Reinforcement Learning (IRL) model, which can infer the reward function of real-world vehicles. It enables us to optimize strategies in different traffic environments.
The present invention uses a parameter-sharing mechanism to extend the proposed model to a multi-agent environment, giving the model good scalability.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of the inverse reinforcement learning of the augmented reality of the traffic simulator of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but do not limit the invention in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention; all such changes and modifications fall within the scope of the present invention.
To address the two challenges of the traditional car-following model, the invention adopts a car-following model based on inverse reinforcement learning:
For the first challenge, a direct approach is to learn the behavior pattern of vehicles directly from real-world observations, rather than relying on prior knowledge that is unreliable or leaves the state space under-explored. Recently, imitation learning has shown the ability to learn from demonstrations. However, direct imitation methods, such as behavioral cloning, aim to extract the expert strategy directly from the data. This approach may still fail on the second challenge, because the learned strategy may become ineffective when the traffic environment changes dynamically, e.g., when weather or road conditions change, requiring learning to be repeated.
Inverse Reinforcement Learning (IRL) learns from demonstrations not only the expert's strategy but also the reward function (e.g., driving at the most appropriate speed without colliding), which can adapt to different traffic environments. Therefore, an inverse-reinforcement-learning-based approach is used to train vehicle simulation agents to generate accurate trajectories.
Meanwhile, parameter sharing is used to accelerate multi-agent learning. Considering complex real traffic scenarios with multi-vehicle interaction, we extend IRL to the multi-agent environment of traffic simulation. Combining a parameter-sharing mechanism with inverse reinforcement learning yields a new algorithm, parameter-sharing inverse reinforcement learning, and forms a dynamics-robust traffic simulation model. An online updating process is also provided, in which the learned reward function guides strategy learning in a new environment without requiring new trajectory data.
As shown in fig. 1, the present invention provides a reverse reinforcement learning method for enhancing the reality of a traffic simulator, comprising:
an initialization step: a track action policy is initialized by the generator.
A track data generation step: and generating track data of a plurality of agents according to the track action strategy and the current environment.
Mixing: and mixing the track data with preset expert track data to obtain mixed track data.
Training: inputting the mixed track data into a discriminator, and training the discriminator to distinguish the expert track data, the training aim being to maximize a reward function.
An optimization step: inputting the reward function into the generator, which obtains a new track action strategy according to the reward function, the current environment and the current track action strategy.
Iteration step: generating trajectory data of a plurality of agents with the new trajectory action strategy, mixing it with the preset expert trajectory data, and training the discriminator until convergence.
An output step: the traffic simulator carries out traffic simulation according to the final reward function and track action strategy.
Considering the complex interaction between vehicles in traffic, traffic simulation is treated as a multi-agent control problem. Formally, the model is defined as a Markov Decision Process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0).
Here M denotes a group of agents and m is one of the agents. S_m and A_m denote the state and action of each agent, respectively. ρ_0 is the distribution of the initial state. r(s, a) is the reward function, and γ is the long-term reward discount coefficient. It is assumed that the environment dynamics remain unchanged for a given set of expert demonstrations. The state transition function is defined as T(s′ | s, a). The present invention describes the microscopic traffic simulation problem in the form of Inverse Reinforcement Learning (IRL).
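As a data-structure sketch, the tuple above might be held in a plain container as follows; the field types, string-valued states, and the deterministic transition are illustrative assumptions, not the patent's representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Sketch of the MDP tuple (M, {S_m}, {A_m}, T, r, gamma, rho_0) defined
# above; all concrete types are assumptions for illustration.

@dataclass
class TrafficMDP:
    agents: List[str]                      # M: the group of agents
    states: Dict[str, List[str]]           # {S_m}: per-agent state space
    actions: Dict[str, List[str]]          # {A_m}: per-agent action space
    transition: Callable[[str, str], str]  # T(s' | s, a), deterministic here
    reward: Callable[[str, str], float]    # r(s, a)
    gamma: float = 0.99                    # long-term reward discount
    initial_state_dist: Dict[str, float] = field(default_factory=dict)  # rho_0
```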
Given the trajectory of the motion of an expert vehicle, the goal of the invention is to learn the reward function of the vehicle agent.
The problem of learning the traffic simulator's car-following model is defined as follows:
Suppose an expert, according to a trajectory action strategy π*(a | s), generates M pieces of expert trajectory data D = {τ_1, τ_2, …, τ_M}, each trace of n points being τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}. The aim is to learn a reward function r_θ(s, a) maximizing the log-likelihood of the expert trajectories:
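The log-likelihood objective referenced above is not rendered in this text. A standard maximum-entropy IRL form, consistent with the definitions of D and r_θ(s, a) given here, would be:

```latex
\max_{\theta} \; \sum_{\tau \in D} \log p_{\theta}(\tau),
\qquad
p_{\theta}(\tau) \propto \exp\!\Big(\sum_{t=0}^{n} r_{\theta}(s_t, a_t)\Big)
```

This is a reconstruction under the assumption that trajectories are exponentially more likely under higher cumulative reward, the usual maximum-entropy formulation; the patent's exact expression is omitted from the source.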
The present invention trains a discriminator-generator network; the discriminator takes state-action pairs as input,
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
In each iteration, the extracted reward value is used to guide the training of the generator strategy. Updating the discriminator amounts to updating the reward function, and in turn, updating the strategy can be seen as improving the sampling distribution used to estimate the discriminator.
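The discriminator form and the extracted reward can be sketched numerically. The AIRL-style parameterization D = exp(f) / (exp(f) + π) is an assumption consistent with "f and the generator strategy π are learned functions"; `f_value` and `pi_prob` stand in for outputs of the learned networks.

```python
import math

def discriminator(f_value: float, pi_prob: float) -> float:
    """Assumed AIRL-style form: D(s, a) = exp(f(s,a)) / (exp(f(s,a)) + pi(a|s))."""
    ef = math.exp(f_value)
    return ef / (ef + pi_prob)

def extracted_reward(d: float) -> float:
    """r(s, a) = log(1 - D(s, a)) - log D(s, a), as given in the text."""
    return math.log(1.0 - d) - math.log(d)
```

With f = 0 and π(a | s) = 1 the discriminator is maximally uncertain (D = 0.5) and the extracted reward is zero; as f grows relative to π, D moves toward 1.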
We describe traffic simulation as a multi-agent system problem, treating each vehicle in the traffic system as an agent interacting with the environment. A decentralized parameter-sharing training scheme is combined with IRL, yielding parameter-sharing IRL (PS-IRL), which learns a simultaneous multi-vehicle control strategy in complex traffic environments. In our algorithm, control is decentralized and learning is centralized.
The procedure of the reverse reinforcement learning system of the embodiment is as follows:
Step 1: initializing a track action strategy π(a | s);
Step 2: applying it to the environment, generating trajectory data D = {τ_1, τ_2, …, τ_M} of M agents;
Step 3: mixing the expert tracks and the generated tracks, and training the discriminator to distinguish whether a track is an expert track;
Step 4: continuing to train the reward function r_θ(s, a);
Step 5: obtaining a new track action strategy π;
Step 6: repeatedly executing Steps 2-5 until convergence;
Step 7: outputting r_θ(s, a) and π(a | s).
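The Step 1-7 loop can be sketched as a toy program. The scalar trajectory feature, the one-parameter Gaussian policy shared by all agents, and the random-search policy improvement are illustrative stand-ins for the learned networks and the RL update, not the patent's implementation; the reward formula follows the text.

```python
import math
import random

def feature(traj):
    """Scalar summary of a trajectory: mean action value (an assumption)."""
    return sum(a for _, a in traj) / len(traj)

def rollout(policy_mean, rng, n_agents=4, horizon=5):
    """Step 2: every agent shares one Gaussian policy (parameter sharing)."""
    return [[(t, policy_mean + rng.gauss(0, 0.1)) for t in range(horizon)]
            for _ in range(n_agents)]

class Discriminator:
    """Step 3: logistic discriminator on the trajectory feature."""
    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def d(self, x):
        return 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))

    def train(self, expert_xs, gen_xs, lr=0.5, epochs=20):
        for _ in range(epochs):
            gw = gb = 0.0
            for x in expert_xs:      # expert trajectories, label 1
                e = 1.0 - self.d(x); gw += e * x; gb += e
            for x in gen_xs:         # generated trajectories, label 0
                e = -self.d(x); gw += e * x; gb += e
            n = len(expert_xs) + len(gen_xs)
            self.w += lr * gw / n
            self.b += lr * gb / n

    def reward(self, x):
        """Step 4: r = log(1 - D) - log D, as written in the text."""
        d = min(max(self.d(x), 1e-6), 1.0 - 1e-6)
        return math.log(1.0 - d) - math.log(d)

def ps_irl(expert_trajs, iters=5, seed=0):
    rng = random.Random(seed)
    policy_mean = 0.0                          # Step 1: initialize pi(a|s)
    expert_xs = [feature(t) for t in expert_trajs]
    disc = Discriminator()
    for _ in range(iters):                     # Step 6: repeat to convergence
        gen = rollout(policy_mean, rng)
        disc.train(expert_xs, [feature(t) for t in gen])
        # Step 5: crude policy improvement - keep the candidate mean whose
        # (approximate) trajectory feature earns the highest extracted reward
        cands = [policy_mean] + [policy_mean + rng.gauss(0, 0.2) for _ in range(8)]
        policy_mean = max(cands, key=disc.reward)
    return disc, policy_mean                   # Step 7: output r_theta and pi
```

The loop mirrors the structure only; a real implementation would replace the random search with a policy-gradient update and the scalar feature with full state-action inputs.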
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. A reverse reinforcement learning method for enhancing the realism of a traffic simulator, comprising:
an initialization step: initializing a track action strategy through a generator;
a track data generation step: generating track data of a plurality of agents according to the track action strategy and the current environment;
mixing: mixing the track data with preset expert track data to obtain mixed track data;
training: inputting the mixed track data into a discriminator, and distinguishing the expert track data by a training discriminator, wherein the training goal is to maximize a reward function;
an optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy;
iteration step: generating trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training a discriminator until convergence;
an output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
2. The inverse reinforcement learning method for enhancing the realism of a traffic simulator as claimed in claim 1, wherein the method treats traffic simulation as a multi-agent control problem, formally defined as a Markov decision process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0);
M represents a group of agents and m is one of the agents; S_m and A_m denote the state and action of each agent, respectively; ρ_0 is a distribution expressing the initial state; r(s, a) is the reward function; γ represents the long-term reward discount coefficient; the state transition function is defined as T(s′ | s, a); the microscopic traffic simulation problem is described in an inverse reinforcement learning manner.
3. The inverse reinforcement learning method for enhancing the realism of a traffic simulator as claimed in claim 2, wherein it is assumed that, according to a trajectory action strategy π*(a | s), M pieces of track data D = {τ_1, τ_2, …, τ_M} are generated, each trace of n points being τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}; the aim is to learn a reward function r_θ(s, a) maximizing the log likelihood of the expert trajectory data:
4. The inverse reinforcement learning method for enhancing the realism of a traffic simulator as claimed in claim 3, wherein the discriminator takes a state-action pair as input,
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
in each iteration, the extracted reward value is used to guide the training of the generator strategy.
5. The inverse reinforcement learning method of enhancing the realism of a traffic simulator of claim 1, wherein the agent comprises a vehicle interacting with the environment.
6. A reverse reinforcement learning system for enhancing the realism of traffic simulators, comprising:
an initialization step: initializing a track action strategy through a generator;
a track data generation step: generating track data of a plurality of agents according to the track action strategy and the current environment;
mixing: mixing the track data with preset expert track data to obtain mixed track data;
training: inputting the mixed track data into a discriminator, and distinguishing the expert track data by a training discriminator, wherein the training goal is to maximize a reward function;
an optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy;
iteration step: generating trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training a discriminator until convergence;
an output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
7. The system of claim 6, wherein the system treats traffic simulation as a multi-agent control problem, formally defined as a Markov decision process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0);
M represents a group of agents and m is one of the agents; S_m and A_m denote the state and action of each agent, respectively; ρ_0 is a distribution expressing the initial state; r(s, a) is the reward function; γ represents the long-term reward discount coefficient; the state transition function is defined as T(s′ | s, a); the microscopic traffic simulation problem is described in an inverse reinforcement learning manner.
8. The inverse reinforcement learning system for enhancing the realism of a traffic simulator as claimed in claim 7, wherein it is assumed that, according to a trajectory action strategy π*(a | s), M pieces of track data D = {τ_1, τ_2, …, τ_M} are generated, each trace of n points being τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}; the aim is to learn a reward function r_θ(s, a) maximizing the log likelihood of the expert trajectory data:
9. The inverse reinforcement learning system for enhancing the realism of a traffic simulator as claimed in claim 8, wherein the discriminator takes state-action pairs as input,
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
in each iteration, the extracted reward value is used to guide the training of the generator strategy.
10. The inverse reinforcement learning system for enhancing the realism of traffic simulators as recited in claim 6, wherein the agents include vehicles that interact with the environment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110625802.1A CN113221469A (en) | 2021-06-04 | 2021-06-04 | Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110625802.1A CN113221469A (en) | 2021-06-04 | 2021-06-04 | Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113221469A true CN113221469A (en) | 2021-08-06 |
Family
ID=77082882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110625802.1A Pending CN113221469A (en) | 2021-06-04 | 2021-06-04 | Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113221469A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023109663A1 (en) * | 2021-12-17 | 2023-06-22 | 深圳先进技术研究院 | Serverless computing resource configuration method based on maximum entropy inverse reinforcement learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111091711A (en) * | 2019-12-18 | 2020-05-01 | 上海天壤智能科技有限公司 | Traffic control method and system based on reinforcement learning and traffic lane competition theory |
CN111401556A (en) * | 2020-04-22 | 2020-07-10 | 清华大学深圳国际研究生院 | Selection method of opponent type imitation learning winning incentive function |
CN112172813A (en) * | 2020-10-14 | 2021-01-05 | 长安大学 | Car following system and method for simulating driving style based on deep inverse reinforcement learning |
US20210049415A1 (en) * | 2018-03-06 | 2021-02-18 | Waymo UK Ltd. | Behaviour Models for Autonomous Vehicle Simulators |
CN112818599A (en) * | 2021-01-29 | 2021-05-18 | 四川大学 | Air control method based on reinforcement learning and four-dimensional track |
CN112884130A (en) * | 2021-03-16 | 2021-06-01 | 浙江工业大学 | SeqGAN-based deep reinforcement learning data enhanced defense method and device |
- 2021-06-04: application CN202110625802.1A filed; publication CN113221469A; status Pending
Non-Patent Citations (1)
Title |
---|
Guanjie Zheng et al., "Objective-aware Traffic Simulation via Inverse Reinforcement Learning", arXiv:2105.09560v1 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bhattacharyya et al. | Simulating emergent properties of human driving behavior using multi-agent reward augmented imitation learning | |
Ye et al. | Automated lane change strategy using proximal policy optimization-based deep reinforcement learning | |
Jia et al. | Advanced building control via deep reinforcement learning | |
CN109388073B (en) | Method and device for vehicle dynamic simulation | |
Camponogara et al. | Distributed learning agents in urban traffic control | |
CN111483468B (en) | Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning | |
CN109733415A (en) | A kind of automatic Pilot following-speed model that personalizes based on deeply study | |
Lu et al. | Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios | |
CN109709956A (en) | A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding | |
Li et al. | Combined trajectory planning and tracking for autonomous vehicle considering driving styles | |
CN105700526A (en) | On-line sequence limit learning machine method possessing autonomous learning capability | |
CN116134292A (en) | Tool for performance testing and/or training an autonomous vehicle planner | |
CN113221469A (en) | Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator | |
Ramyar et al. | A personalized highway driving assistance system | |
CN113657433B (en) | Multi-mode prediction method for vehicle track | |
Venkatesh et al. | Connected and automated vehicles in mixed-traffic: Learning human driver behavior for effective on-ramp merging | |
CN114973650A (en) | Vehicle ramp entrance confluence control method, vehicle, electronic device, and storage medium | |
Konstantinidis et al. | Parameter sharing reinforcement learning for modeling multi-agent driving behavior in roundabout scenarios | |
Yuan et al. | Evolutionary decision-making and planning for autonomous driving based on safe and rational exploration and exploitation | |
Zhang et al. | PlanLight: learning to optimize traffic signal control with planning and iterative policy improvement | |
Sukthankar et al. | Evolving an intelligent vehicle for tactical reasoning in traffic | |
CN116894395A (en) | Automatic driving test scene generation method, system and storage medium | |
Koeberle et al. | Exploring the trade off between human driving imitation and safety for traffic simulation | |
CN116620327A (en) | Lane changing decision method for realizing automatic driving high-speed scene based on PPO and Lattice | |
Yang et al. | Accelerating safe reinforcement learning with constraint-mismatched policies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210806 |