CN113221469A - Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator - Google Patents

Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator

Info

Publication number
CN113221469A
CN113221469A (application CN202110625802.1A; also published as CN202110625802A)
Authority
CN
China
Prior art keywords
track
traffic
reward function
data
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110625802.1A
Other languages
Chinese (zh)
Inventor
薛贵荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tianran Intelligent Technology Co ltd
Original Assignee
Shanghai Tianran Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tianran Intelligent Technology Co ltd filed Critical Shanghai Tianran Intelligent Technology Co ltd
Priority to CN202110625802.1A priority Critical patent/CN113221469A/en
Publication of CN113221469A publication Critical patent/CN113221469A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention provides an inverse reinforcement learning method and system for enhancing the authenticity of a traffic simulator, comprising the following steps: initializing a track action strategy through a generator; generating track data of a plurality of agents in combination with the current environment; mixing the track data with preset expert track data, inputting the mixed track data into a discriminator, and training the discriminator to distinguish the expert track data, the training objective being to maximize a reward function; inputting the reward function into the generator, which obtains a new track action strategy; generating track data of a plurality of agents with the new track action strategy, mixing it with the preset expert track data, and training the discriminator until convergence; finally, the traffic simulator performs traffic simulation according to the final reward function and track action strategy. The method can infer the reward function of real-world vehicles, can optimize strategies under different traffic environments, and has good scalability.

Description

Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator
Technical Field
The invention relates to the fields of computer software and traffic, in particular to an inverse reinforcement learning method and system for enhancing the authenticity of a traffic simulator.
Background
In recent years, with the advance of urbanization, urban traffic flow has increased year by year and populations have become denser, making urban road traffic systems highly complex. As a result, many road construction problems, such as urban traffic network planning and evaluation, traffic congestion and traffic flow evacuation, lane restriction and speed limits, and traffic signal control, cannot be solved intuitively and scientifically.
Traffic simulators have long been one of the important research hotspots in the traffic field. Microscopic traffic simulation plays an important role in the planning, design and operation of traffic systems. At present, a traffic simulator serves two important functions. First, effect evaluation for city planning and city operation: a well-designed traffic simulator allows city operators and planners to test policies for urban road planning, traffic control and traffic congestion optimization by accurately inferring the possible impact that construction and traffic policies for various facilities have on the urban traffic environment. Second, providing learning data on urban traffic operation for researchers developing various urban intelligence algorithms: much existing work uses traffic simulators to train and test intelligent traffic signal control strategies, because a simulator can generate large amounts of data for training a signal controller, which solves the problem that real city data cannot meet the large training-data requirements of machine learning algorithms.
Currently, most advanced microscopic traffic simulators use a car-following model (CFM) to describe the movement of a single vehicle.
Each vehicle has certain attributes and parameters. When the simulation system creates a vehicle, it initializes the vehicle's parameter values and then controls the vehicle during driving by adjusting these parameters. Commonly used parameters include the acceleration parameter and the driver reaction time parameter of each vehicle; different parameter settings increase the richness and diversity of the vehicle simulation and allow urban traffic vehicle trajectories to be simulated more realistically.
At present, however, it is conventional to define the parameters of the following model with a handful of physical and empirical formulas. These parameters must be carefully calibrated using traffic data. The calibrated following model can then be used as a strategy providing the optimal behavior of a vehicle under given environmental conditions. Optimizing this strategy amounts to calibrating the parameters of the following model, which are obtained by analyzing the discrepancy between observed and simulated traffic measurements.
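For illustration only, the sketch below shows a generic IDM-style (Intelligent Driver Model) acceleration update with the kind of calibratable parameters described above (desired speed, maximum acceleration, reaction/headway time); it is an assumed example of such a physics-and-experience formula, not the specific following model of any particular simulator.

```python
import math

def idm_acceleration(v, v_lead, gap,
                     v_desired=15.0,   # desired speed (m/s) - calibrated parameter
                     a_max=1.5,        # maximum acceleration (m/s^2)
                     b_comf=2.0,       # comfortable deceleration (m/s^2)
                     s0=2.0,           # minimum standstill gap (m)
                     T_react=1.2):     # desired time headway, reflects driver reaction (s)
    """Generic IDM-style car-following update: returns the acceleration of the
    follower given its own speed v, the leader speed v_lead and the bumper gap."""
    dv = v - v_lead                                   # closing speed
    s_star = s0 + max(0.0, v * T_react + v * dv / (2 * math.sqrt(a_max * b_comf)))
    return a_max * (1 - (v / v_desired) ** 4 - (s_star / max(gap, 0.1)) ** 2)
```

Calibrating such a model means fitting its parameters so that simulated trajectories match observed traffic measurements; the invention replaces this manual calibration with a learned reward function.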
The whole process is as follows:
step 1: generating a series of traffic flow status data by a traffic simulator;
step 2: generating a traffic vehicle control strategy π_ψ according to the traffic flow state data;
Step 3: generating current vehicle traffic actions (starting, stopping, accelerating and decelerating) of each vehicle according to current vehicle traffic states (parameters such as vehicle speed, whether to use a traffic light, the distance between vehicles ahead and the like);
step 4: transmitting the generated traffic strategy to a traffic simulator control API;
step 5: applying the new traffic strategy and generating new traffic flow state data.
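A minimal sketch of this closed loop between strategy and simulator is given below; the simulator interface (get_state, apply_actions) and the policy callable are hypothetical names used only to illustrate Steps 1-5.

```python
def run_simulation(simulator, policy, num_steps=1000):
    """Closed control loop: the simulator produces traffic flow states, the
    strategy maps each vehicle's state to an action, and the actions are fed
    back through the simulator's control API (Steps 1-5 above)."""
    states = simulator.get_state()                 # Step 1: traffic flow state data
    for _ in range(num_steps):
        # Steps 2-3: the strategy produces an action (start/stop/accelerate/decelerate)
        # from each vehicle's state (speed, distance to leader, traffic light, ...)
        actions = {vid: policy(obs) for vid, obs in states.items()}
        # Steps 4-5: transmit the strategy output to the simulator and advance it
        states = simulator.apply_actions(actions)
    return states
```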
An effective traffic simulator should be able to produce accurate simulations in different traffic environments without being affected by environmental dynamics. This can be broken down into two specific challenges.
The first challenge: the goal of a conventional following model is generally to simulate the car-following behavior of a vehicle by applying physical laws and human knowledge. The movement of vehicles in the real world depends on many factors, including speed, distance to neighbors, the road network, traffic lights and the psychological factors of the driver. Models emphasizing different factors are continuously added to the car-following family, such as setting a threshold for vehicle movement according to the psychological tendency of the driver based on safe driving distance and speed. However, there is currently no general model that fully captures the authenticity of vehicle behavior patterns in a comprehensive context. A following model relies on inaccurate a priori knowledge and, even after calibration, often fails to produce realistic simulation results.
The second challenge: many studies consider learning from expert data. Although expert data are relatively normative and stable, the resulting following model is still not robust to the various environment dynamics of different traffic environments. This is a challenging problem because real-world traffic dynamics are non-stationary. For example, weather and road conditions may change the mechanical properties of a vehicle and its coefficient of friction with the road surface, and ultimately change its acceleration and braking performance. In a real scenario, a good driver adjusts the driving strategy to environmental changes and behaves differently under these dynamics (e.g., given the same observed speed, a different acceleration is used). However, given a fixed strategy (i.e., a CFM), current simulators are generally unable to adapt the strategy to different dynamics. To simulate traffic environments with significantly different dynamics, the following model must be recalibrated, i.e., relearned, using new trajectory data associated with each environment. Such relearning is inefficient.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide an inverse reinforcement learning method and system for enhancing the authenticity of a traffic simulator.
The invention provides an inverse reinforcement learning method for enhancing the authenticity of a traffic simulator, which comprises the following steps:
an initialization step: initializing a track action strategy through a generator;
a track data generation step: generating track data of a plurality of agents according to the track action strategy and the current environment;
a mixing step: mixing the track data with preset expert track data to obtain mixed track data;
a training step: inputting the mixed track data into a discriminator, and training the discriminator to distinguish the expert track data, wherein the training goal is to maximize a reward function;
an optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy;
iteration step: generating trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training a discriminator until convergence;
an output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
Preferably, the method treats traffic simulation as a multi-agent control problem, formally defined as a Markov decision process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0);
M represents a group of agents and m is one of the agents; S_m and A_m denote the state and action of each agent, respectively; ρ_0 is the distribution of the initial state; r(s, a) is the reward function; γ represents the long-term reward discount coefficient; and the state transition function is defined as T(s' | s, a), the microscopic traffic simulation problem being described in an inverse reinforcement learning manner.
Preferably, it is assumed that M pieces of track data D = {τ_1, τ_2, …, τ_M} are generated according to a trajectory action strategy π*(a | s), where each trajectory of n points is τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}; the aim is to learn the reward function r_θ(s, a) that maximizes the log likelihood of the expert trajectory data:

max_θ Σ_{τ∈D} log p_{r_θ}(τ)

where p_{r_θ}(τ) is the distribution of trajectories under the reward function r_θ(s, a).
Preferably, the discriminator uses a state-action pair as input:

D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a | s))
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
in each iteration, the extracted reward value is used to guide the training of the generator strategy.
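For illustration, assuming the discriminator form D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a | s)) given above, the discriminator output and the extracted reward can be computed as in the following sketch; the use of PyTorch and the f_net/policy interfaces are assumptions, and the sign convention follows the formula r(s, a) = log(1 - D(s, a)) - log D(s, a).

```python
import torch

def discriminator_output(f_net, policy, state, action):
    """D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + pi(a | s)), computed in log space."""
    f_val = f_net(state, action)                 # learned scoring function f(s, a) (assumed interface)
    log_pi = policy.log_prob(state, action)      # log pi(a | s) from the generator (assumed interface)
    return torch.sigmoid(f_val - log_pi)         # equals exp(f) / (exp(f) + pi)

def extracted_reward(f_net, policy, state, action, eps=1e-8):
    """Reward used to guide generator training: r(s, a) = log(1 - D) - log D."""
    d = discriminator_output(f_net, policy, state, action)
    return torch.log(1.0 - d + eps) - torch.log(d + eps)
```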
Preferably, the agent comprises a vehicle interacting with the environment.
According to the invention, the inverse reinforcement learning system for enhancing the authenticity of the traffic simulator comprises:
an initialization step: initializing a track action strategy through a generator;
a track data generation step: generating track data of a plurality of agents according to the track action strategy and the current environment;
a mixing step: mixing the track data with preset expert track data to obtain mixed track data;
a training step: inputting the mixed track data into a discriminator, and training the discriminator to distinguish the expert track data, wherein the training goal is to maximize a reward function;
an optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy;
iteration step: generating trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training a discriminator until convergence;
an output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
Preferably, the system treats traffic simulation as a multi-agent control problem, formally defined as a Markov decision process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0);
M represents a group of agents and m is one of the agents; S_m and A_m denote the state and action of each agent, respectively; ρ_0 is the distribution of the initial state; r(s, a) is the reward function; γ represents the long-term reward discount coefficient; and the state transition function is defined as T(s' | s, a), the microscopic traffic simulation problem being described in an inverse reinforcement learning manner.
Preferably, it is assumed that M pieces of track data D = {τ_1, τ_2, …, τ_M} are generated according to a trajectory action strategy π*(a | s), where each trajectory of n points is τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}; the aim is to learn the reward function r_θ(s, a) that maximizes the log likelihood of the expert trajectory data:

max_θ Σ_{τ∈D} log p_{r_θ}(τ)

where p_{r_θ}(τ) is the distribution of trajectories under the reward function r_θ(s, a).
Preferably, the discriminator uses a state-action pair as input:

D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a | s))
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
in each iteration, the extracted reward value is used to guide the training of the generator strategy.
Preferably, the agent comprises a vehicle interacting with the environment.
Compared with the prior art, the invention has the following beneficial effects:
the present invention is based on an Inverse Reinforcement Learning (IRL) model, which can infer the reward function of real-world vehicles. It enables us to optimize strategies in different traffic environments.
The present invention uses a parameter sharing mechanism to extend the proposed model to a multi-agent environment, giving the model good scalability.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of the inverse reinforcement learning for enhancing the authenticity of the traffic simulator according to the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that various changes and modifications obvious to those skilled in the art can be made without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention adopts an inverse reinforcement learning view of the following model to address the two challenges of the traditional following model:
for the first challenge, a direct consideration is to learn the behavior pattern of the vehicle directly from real-world observations, rather than relying on a priori knowledge that is not reliable or that is not well explored by state space. Recently, mock learning has shown the ability to learn from the demonstration example. However, direct modeling of learning methods, such as behavioral cloning, aims to extract expert strategies directly from the data. This approach may still fail in addressing the second challenge. Because the learning strategy may lose effect when the traffic environment changes dynamically, such as the weather changes or the road conditions change, the learning needs to be repeated.
Inverse Reinforcement Learning (IRL) learns from demonstration examples not only the expert's strategy but also the reward function (e.g., driving at the most appropriate speed without colliding), which can adapt to different traffic environments. Therefore, an inverse-reinforcement-learning-based approach is used to train vehicle simulation agents to generate accurate trajectories.
Meanwhile, parameter sharing is used to accelerate multi-agent learning. Considering the complex real traffic scenario of multi-vehicle interaction, we extend IRL to the multi-agent environment of traffic simulation. A parameter sharing mechanism is combined with inverse reinforcement learning to obtain a new algorithm, called parameter-sharing inverse reinforcement learning, which forms a dynamically robust traffic simulation model. Meanwhile, an online updating process is provided in which the learned reward function is used to guide strategy learning in a new environment without requiring new trajectory data.
As shown in FIG. 1, the present invention provides an inverse reinforcement learning method for enhancing the reality of a traffic simulator, comprising:
an initialization step: a track action policy is initialized by the generator.
A track data generation step: and generating track data of a plurality of agents according to the track action strategy and the current environment.
A mixing step: mixing the track data with preset expert track data to obtain mixed track data.
A training step: inputting the mixed track data into a discriminator, and training the discriminator to distinguish the expert track data, wherein the training aim is to maximize a reward function.
An optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy.
Iteration step: and generating the trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training the discriminator until convergence.
An output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
Considering the complex interaction between vehicles in traffic, traffic simulation is treated as a multi-agent control problem. Formally, our model is defined as a Markov Decision Process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0).
Here M denotes a group of agents and m is one of the agents. S_m and A_m denote the state and action of each agent, respectively. ρ_0 is the distribution of the initial state. r(s, a) is the reward function, and γ represents the long-term reward discount coefficient. It is assumed that the environment dynamics remain unchanged for a given set of expert demonstrations. The state transition function is defined as T(s' | s, a). The present invention describes the microscopic traffic simulation problem in the form of Inverse Reinforcement Learning (IRL).
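Purely as an illustrative data structure (the field names are assumptions, not terms of the embodiment), the tuple can be written as:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class MultiAgentTrafficMDP:
    """Markov decision process (M, {S_m}, {A_m}, T, r, gamma, rho_0) for traffic simulation."""
    agents: List[str]                    # M: the group of agents (vehicles)
    state_spaces: Dict[str, object]      # {S_m}: per-agent state space
    action_spaces: Dict[str, object]     # {A_m}: per-agent action space
    transition: Callable                 # T(s' | s, a): environment dynamics
    reward: Callable                     # r(s, a): reward function to be learned
    gamma: float                         # long-term reward discount coefficient
    rho_0: Callable                      # distribution of the initial state
```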
Given the trajectory of the motion of an expert vehicle, the goal of the invention is to learn the reward function of the vehicle agent.
The traffic simulator following-model learning problem is defined as follows:
Suppose an expert generates M pieces of expert trajectory data D = {τ_1, τ_2, …, τ_M} according to a trajectory action strategy π*(a | s), where each trajectory of n points is τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}; the aim is to learn the reward function r_θ(s, a) that maximizes the log-likelihood of the expert trajectories:

max_θ Σ_{τ∈D} log p_{r_θ}(τ)

where p_{r_θ}(τ) is the distribution of trajectories under the reward function r_θ(s, a).
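One common way to make this trajectory distribution concrete is the maximum-entropy IRL form below; this specific expression is an added assumption for illustration and is not stated explicitly in the embodiment:

```latex
p_{r_\theta}(\tau) \propto \rho_0(s_0) \prod_{t} T(s_{t+1} \mid s_t, a_t)\,
      \exp\big(r_\theta(s_t, a_t)\big),
\qquad
\max_\theta \; \sum_{\tau \in D} \log p_{r_\theta}(\tau)
```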
The present invention trains a discriminator-generator network in which the discriminator uses state-action pairs as input:

D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a | s))
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
In each iteration, the extracted reward value is used to guide the training of the generator strategy. Updating the discriminator amounts to updating the reward function, and in turn updating the strategy can be seen as improving the sampling distribution used to estimate the discriminator.
We describe traffic simulation as a multi-agent system problem, treating each vehicle in the traffic system as an agent interacting with the environment. A decentralized parameter-sharing training scheme is combined with IRL, and parameter-sharing IRL (PS-IRL) is proposed to learn a simultaneous multi-vehicle control strategy in complex traffic environments. In our algorithm, control is decentralized and learning is centralized.
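A minimal sketch of the parameter-sharing idea follows: every vehicle agent is controlled by the same policy network applied to its own observation, so experience from all agents updates one set of weights; the network architecture and the use of PyTorch are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One policy network shared by all vehicle agents (parameter sharing):
    control is decentralized (each agent acts on its own observation) while
    learning is centralized (all agents' experience updates the same weights)."""
    def __init__(self, obs_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, obs):
        return self.net(obs)     # action scores for one agent (e.g., accelerate/keep/decelerate)

def act_all(policy, observations):
    """Apply the shared policy independently to each agent's observation."""
    return {agent_id: policy(obs) for agent_id, obs in observations.items()}
```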
The procedure of the inverse reinforcement learning system of the embodiment is as follows:
step 1: initializing a track action strategy π(a | s);
step 2: applying it to the environment, generating trajectory data D = {τ_1, τ_2, …, τ_M} of M agents;
step 3: mixing the expert trajectories and the generated trajectories, and training a discriminator to distinguish whether a trajectory is an expert trajectory;
step 4: continuing to train the reward function r_θ(s, a);
step 5: obtaining a new track action strategy π;
step 6: repeatedly executing Step 2 to Step 5 until convergence;
step 7: outputting r_θ(s, a) and π(a | s).
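Steps 1 to 7 can be read as the high-level loop sketched below; rollout_fn, discriminator.train_step, discriminator.reward and policy.improve are placeholder interfaces assumed for illustration, not parts of the embodiment.

```python
def ps_irl_training(rollout_fn, policy, discriminator, expert_trajs, num_iterations=100):
    """Sketch of the PS-IRL loop of Steps 1-7 (interfaces are assumed placeholders):
    generate trajectories with the shared policy, train the discriminator to tell
    expert from generated data, extract the reward r_theta(s, a), and use it to
    improve the policy, repeating until convergence."""
    for _ in range(num_iterations):
        # Step 2: roll out the current trajectory action strategy pi(a|s) for M agents
        generated_trajs = rollout_fn(policy)
        # Step 3: mix expert and generated trajectories, train the discriminator to
        # distinguish which trajectories are expert trajectories
        discriminator.train_step(expert_trajs, generated_trajs)
        # Step 4: the updated discriminator induces the current reward function r_theta(s, a)
        reward_fn = discriminator.reward
        # Step 5: obtain a new trajectory action strategy by optimizing against the reward
        policy.improve(generated_trajs, reward_fn)
    # Step 7: output r_theta(s, a) and pi(a|s)
    return discriminator.reward, policy
```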
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. An inverse reinforcement learning method for enhancing the realism of a traffic simulator, comprising:
an initialization step: initializing a track action strategy through a generator;
a track data generation step: generating track data of a plurality of agents according to the track action strategy and the current environment;
a mixing step: mixing the track data with preset expert track data to obtain mixed track data;
a training step: inputting the mixed track data into a discriminator, and training the discriminator to distinguish the expert track data, wherein the training goal is to maximize a reward function;
an optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy;
iteration step: generating trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training a discriminator until convergence;
an output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
2. The inverse reinforcement learning method for enhancing the realism of a traffic simulator as claimed in claim 1, wherein the method treats traffic simulation as a multi-agent control problem formally defined as a Markov decision process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0);
M represents a group of agents and m is one of the agents; S_m and A_m denote the state and action of each agent, respectively; ρ_0 is the distribution of the initial state; r(s, a) is the reward function; γ represents the long-term reward discount coefficient; and the state transition function is defined as T(s' | s, a), the microscopic traffic simulation problem being described in an inverse reinforcement learning manner.
3. The inverse reinforcement learning method for enhancing the realism of a traffic simulator as claimed in claim 2, wherein it is assumed that M pieces of track data D = {τ_1, τ_2, …, τ_M} are generated according to a trajectory action strategy π*(a | s), each trajectory of n points being τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}, the aim being to learn the reward function r_θ(s, a) maximizing the log likelihood of the expert trajectory data:

max_θ Σ_{τ∈D} log p_{r_θ}(τ)

where p_{r_θ}(τ) is the distribution of trajectories under the reward function r_θ(s, a).
4. The inverse reinforcement learning method for enhancing the realism of a traffic simulator as claimed in claim 3, wherein the discriminator uses a state-action pair as input:

D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a | s))
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
in each iteration, the extracted reward value is used to guide the training of the generator strategy.
5. The inverse reinforcement learning method of enhancing the realism of a traffic simulator of claim 1, wherein the agent comprises a vehicle interacting with the environment.
6. An inverse reinforcement learning system for enhancing the realism of traffic simulators, comprising:
an initialization step: initializing a track action strategy through a generator;
a track data generation step: generating track data of a plurality of agents according to the track action strategy and the current environment;
a mixing step: mixing the track data with preset expert track data to obtain mixed track data;
a training step: inputting the mixed track data into a discriminator, and training the discriminator to distinguish the expert track data, wherein the training goal is to maximize a reward function;
an optimization step: inputting the reward function into the generator, and obtaining a new track action strategy by the generator according to the reward function, the current environment and the track action strategy;
iteration step: generating trajectory data of a plurality of agents by using a new trajectory action strategy, mixing the trajectory data with preset expert trajectory data, and training a discriminator until convergence;
an output step: and the traffic simulator carries out traffic simulation according to the final reward function and the track action strategy.
7. The inverse reinforcement learning system for enhancing the realism of traffic simulators as claimed in claim 6, wherein the system treats traffic simulation as a multi-agent control problem formally defined as a Markov decision process represented by the tuple (M, {S_m}, {A_m}, T, r, γ, ρ_0);
M represents a group of agents and m is one of the agents; S_m and A_m denote the state and action of each agent, respectively; ρ_0 is the distribution of the initial state; r(s, a) is the reward function; γ represents the long-term reward discount coefficient; and the state transition function is defined as T(s' | s, a), the microscopic traffic simulation problem being described in an inverse reinforcement learning manner.
8. The inverse reinforcement learning system for enhancing the realism of traffic simulators as claimed in claim 7, wherein it is assumed that M pieces of track data D = {τ_1, τ_2, …, τ_M} are generated according to a trajectory action strategy π*(a | s), each trajectory of n points being τ = {s_0, a_0, s_1, a_1, …, s_n, a_n}, the aim being to learn the reward function r_θ(s, a) maximizing the log likelihood of the expert trajectory data:

max_θ Σ_{τ∈D} log p_{r_θ}(τ)

where p_{r_θ}(τ) is the distribution of trajectories under the reward function r_θ(s, a).
9. The inverse reinforcement learning system for enhancing the realism of traffic simulators as claimed in claim 8, wherein the discriminator uses state-action pairs as input:

D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + π(a | s))
where f and the generator strategy π are learned functions, and the training objective is to maximize the following reward function r(s, a):
r(s,a)=log(1-D(s,a))-logD(s,a)
in each iteration, the extracted reward value is used to guide the training of the generator strategy.
10. The inverse reinforcement learning system for enhancing the realism of traffic simulators as recited in claim 6, wherein the agents include vehicles that interact with the environment.
CN202110625802.1A 2021-06-04 2021-06-04 Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator Pending CN113221469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625802.1A CN113221469A (en) 2021-06-04 2021-06-04 Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110625802.1A CN113221469A (en) 2021-06-04 2021-06-04 Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator

Publications (1)

Publication Number Publication Date
CN113221469A true CN113221469A (en) 2021-08-06

Family

ID=77082882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625802.1A Pending CN113221469A (en) 2021-06-04 2021-06-04 Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator

Country Status (1)

Country Link
CN (1) CN113221469A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210049415A1 (en) * 2018-03-06 2021-02-18 Waymo UK Ltd. Behaviour Models for Autonomous Vehicle Simulators
CN111091711A (en) * 2019-12-18 2020-05-01 上海天壤智能科技有限公司 Traffic control method and system based on reinforcement learning and traffic lane competition theory
CN111401556A (en) * 2020-04-22 2020-07-10 清华大学深圳国际研究生院 Selection method of opponent type imitation learning winning incentive function
CN112172813A (en) * 2020-10-14 2021-01-05 长安大学 Car following system and method for simulating driving style based on deep inverse reinforcement learning
CN112818599A (en) * 2021-01-29 2021-05-18 四川大学 Air control method based on reinforcement learning and four-dimensional track
CN112884130A (en) * 2021-03-16 2021-06-01 浙江工业大学 SeqGAN-based deep reinforcement learning data enhanced defense method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUANJIE ZHENG ET AL.: "Objective-aware Traffic Simulation via Inverse Reinforcement Learning", https://arxiv.org/abs/2105.09560v1 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023109663A1 (en) * 2021-12-17 2023-06-22 深圳先进技术研究院 Serverless computing resource configuration method based on maximum entropy inverse reinforcement learning

Similar Documents

Publication Publication Date Title
Bhattacharyya et al. Simulating emergent properties of human driving behavior using multi-agent reward augmented imitation learning
Ye et al. Automated lane change strategy using proximal policy optimization-based deep reinforcement learning
Jia et al. Advanced building control via deep reinforcement learning
CN109388073B (en) Method and device for vehicle dynamic simulation
Camponogara et al. Distributed learning agents in urban traffic control
CN111483468B (en) Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning
CN109733415A (en) A kind of automatic Pilot following-speed model that personalizes based on deeply study
Lu et al. Imitation is not enough: Robustifying imitation with reinforcement learning for challenging driving scenarios
CN109709956A (en) A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding
Li et al. Combined trajectory planning and tracking for autonomous vehicle considering driving styles
CN105700526A (en) On-line sequence limit learning machine method possessing autonomous learning capability
CN116134292A (en) Tool for performance testing and/or training an autonomous vehicle planner
CN113221469A (en) Inverse reinforcement learning method and system for enhancing authenticity of traffic simulator
Ramyar et al. A personalized highway driving assistance system
CN113657433B (en) Multi-mode prediction method for vehicle track
Venkatesh et al. Connected and automated vehicles in mixed-traffic: Learning human driver behavior for effective on-ramp merging
CN114973650A (en) Vehicle ramp entrance confluence control method, vehicle, electronic device, and storage medium
Konstantinidis et al. Parameter sharing reinforcement learning for modeling multi-agent driving behavior in roundabout scenarios
Yuan et al. Evolutionary decision-making and planning for autonomous driving based on safe and rational exploration and exploitation
Zhang et al. PlanLight: learning to optimize traffic signal control with planning and iterative policy improvement
Sukthankar et al. Evolving an intelligent vehicle for tactical reasoning in traffic
CN116894395A (en) Automatic driving test scene generation method, system and storage medium
Koeberle et al. Exploring the trade off between human driving imitation and safety for traffic simulation
CN116620327A (en) Lane changing decision method for realizing automatic driving high-speed scene based on PPO and Lattice
Yang et al. Accelerating safe reinforcement learning with constraint-mismatched policies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210806)