WO2019071909A1 - Automatic driving system and method based on relative entropy deep inverse reinforcement learning - Google Patents
Automatic driving system and method based on relative entropy deep inverse reinforcement learning
- Publication number
- WO2019071909A1 (PCT/CN2018/078740)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- driving
- trajectory
- strategy
- road information
- reinforcement learning
- Prior art date
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
Definitions
- the invention relates to an automatic driving system and method based on relative entropy deep inverse reinforcement learning, and belongs to the technical field of automatic driving.
- An existing automobile automatic driving system discriminates the driving environment with a camera and an image recognition system installed in the cab; the vehicle is then controlled by a vehicle-mounted main control computer, a GPS positioning system, and path planning software that, according to pre-stored road maps and the like, plan a reasonable driving path between the vehicle's current location and its destination and guide the vehicle there.
- the object of the present invention is to provide an automatic driving system and method based on relative entropy deep inverse reinforcement learning, which uses a deep neural network structure and takes the historical driving trajectory information of user drivers as input to obtain driving strategies representing individual driving habits, making automatic driving personalized and intelligent.
- an automatic driving system based on relative entropy deep inverse reinforcement learning, comprising:
- a client that displays driving strategies;
- a driving basic data collection subsystem that collects road information;
- a storage module connected to the client and the driving basic data collection subsystem, storing the road information collected by the driving basic data collection subsystem;
- wherein the driving basic data collection subsystem collects road information and transmits it to the client and the storage module; the storage module receives the road information, stores a continuous segment of road information as a historical trajectory, and calculates and simulates driving strategies from the historical trajectory; the storage module transmits the driving strategies to the client for the user to choose from, and the client receives them and implements automatic driving according to the road information and the driving strategy selected to match the user's personality.
- the storage module includes a driving trajectory library that stores historical driving trajectories, a trajectory information processing subsystem that calculates and simulates driving strategies according to driving trajectories and driving habits, and a driving strategy library that stores driving strategies; the driving trajectory library transmits driving trajectory data to the trajectory information processing subsystem, the trajectory information processing subsystem analyzes the driving trajectory data to calculate and simulate driving strategies and transmits them to the driving strategy library, and the driving strategy library receives and stores the driving strategies.
- the trajectory information processing subsystem calculates and simulates driving strategies using a multi-objective relative entropy deep inverse reinforcement learning algorithm.
- the multi-objective inverse reinforcement learning algorithm nests relative entropy deep inverse reinforcement learning within an EM algorithm framework to calculate the parameters of multiple reward functions.
- the driving basic data collection subsystem includes a sensor for collecting road information.
- the invention also provides a method for automatic driving based on relative entropy deep inverse reinforcement learning, the method comprising the following steps:
- S1: collecting road information and transmitting the road information to the client and the storage module;
- S2: the storage module receives the road information, stores a continuous segment of road information as a historical trajectory, calculates and simulates multiple driving strategies from the historical trajectory, and transmits the driving strategies to the client;
- S3: the client receives the road information and the driving strategies, and implements automatic driving according to the personalized driving strategy selected by the user and the road information.
- the storage module includes a driving trajectory library that stores historical driving trajectories, a trajectory information processing subsystem that calculates and simulates driving strategies according to driving planning and driving habits, and a driving strategy library that stores driving strategies; the driving trajectory library transmits driving trajectory data to the trajectory information processing subsystem, the trajectory information processing subsystem analyzes the driving trajectory data to calculate and simulate driving strategies and transmits them to the driving strategy library, and the driving strategy library receives and stores the driving strategies.
- a driving trajectory library that stores historical driving trajectories
- a trajectory information processing subsystem that calculates and simulates driving strategies according to driving planning and driving habits
- a driving strategy library that stores driving strategies
- the trajectory information processing subsystem calculates and simulates driving strategies using a multi-objective relative entropy deep inverse reinforcement learning algorithm.
- the multi-objective inverse reinforcement learning algorithm nests relative entropy deep inverse reinforcement learning within an EM algorithm framework to calculate the parameters of multiple reward functions.
- the invention has the beneficial effects that the driving basic data collection subsystem collects road information in real time and transmits it to the storage module; the storage module receives the road information and stores a continuous segment of road information as a historical trajectory, and driving strategies are simulated from the historical driving trajectories, realizing personalized, intelligent automatic driving.
- FIG. 1 is a flow chart of the automatic driving system and method based on relative entropy deep inverse reinforcement learning according to the present invention.
- Figure 2 is a schematic diagram of the Markov decision process MDP.
- an automatic driving system based on relative entropy deep inverse reinforcement learning includes:
- a client 1 that displays driving strategies;
- a driving basic data collection subsystem 2 that collects road information;
- the storage module 3 is connected to the client 1 and the driving basic data collection subsystem 2 and stores the road information collected by the driving basic data collection subsystem 2;
- the driving basic data collecting subsystem 2 collects road information and transmits the road information to the client 1 and the storage module 3.
- the storage module 3 receives the road information and stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to calculate driving strategies, and transmits the driving strategies to the client 1 for the user to choose from; the client 1 receives the road information and implements automatic driving according to the personalized driving strategy selected by the user.
- the storage module 3 is a cloud platform.
- the client 1 selects a driving strategy matching the user's personality, downloads the corresponding driving strategy from the driving strategy library 33 in the cloud 3, and then makes real-time driving decisions according to the driving strategy and the basic data, realizing real-time driverless control.
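As an illustration of this client-side loop, a minimal sketch follows (all interfaces and names here are hypothetical; the patent does not specify the client's software structure):

```python
# Hypothetical client-side control loop: download the chosen strategy from
# the cloud strategy library, then repeatedly map live road information to
# a control action.
def drive(strategy_library, strategy_id, sensors, vehicle):
    policy = strategy_library.download(strategy_id)  # the pi chosen by the user
    while vehicle.is_active():
        state = sensors.read_road_information()      # basic data, collected in real time
        action = policy(state)                       # real-time driving decision
        vehicle.execute(action)
```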
- the driving basic data collection subsystem 2 collects road information through sensors (not shown).
- the collected information serves two purposes: it is passed to the client 1 to provide basic data for the current driving decision, and it is passed to the driving trajectory library 31 in the cloud 3, where it is stored as the historical driving trajectory data of the user driver.
- the cloud 3 includes a driving trajectory library 31 for historical driving trajectories, a trajectory information processing subsystem 32 that calculates and simulates driving strategies according to driving planning and driving habits, and a driving strategy library 33 that stores driving strategies; the driving trajectory library 31 transmits driving trajectory data to the trajectory information processing subsystem 32, the trajectory information processing subsystem 32 calculates and simulates driving strategies based on the driving trajectory data and transmits them to the driving strategy library 33, and the driving strategy library 33 receives and stores the driving strategies.
- the trajectory information processing subsystem 32 calculates and simulates driving strategies using a multi-objective relative entropy deep inverse reinforcement learning algorithm.
- the multi-objective inverse reinforcement learning algorithm nests relative entropy deep inverse reinforcement learning within an EM algorithm framework to calculate the parameters of multiple reward functions.
- the historical driving trajectories include expert historical driving trajectories and the user's own historical trajectories.
- inverse reinforcement learning (IRL) refers to the problem in which the reward function R is unknown in a Markov decision process (MDP) whose environment is known.
- in reinforcement learning, the value Q(s, a) of a state-action pair (also referred to as the cumulative reward value of the action) is estimated using the known environment, a given reward function R, and the Markov property; the converged values Q(s, a) of the state-action pairs are then used to find the strategy π, and the agent can use the strategy π to make decisions.
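To make this last step concrete, here is a minimal sketch (the Q-table, its shape, and the function name are illustrative, not from the patent) of extracting a greedy strategy π from converged Q(s, a) values:

```python
import numpy as np

# Toy converged Q-table: Q[s, a] estimates the cumulative reward of taking
# action a in state s (2 states x 3 actions; the values are illustrative).
Q = np.array([[0.1, 0.5, 0.3],
              [0.7, 0.2, 0.4]])

def greedy_policy(q_table: np.ndarray) -> np.ndarray:
    """For each state, return the action with the highest estimated value."""
    return np.argmax(q_table, axis=1)

pi = greedy_policy(Q)  # state 0 -> action 1, state 1 -> action 0
```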
- the reward function R is often extremely difficult to specify, but some excellent trajectories T_N are relatively easy to obtain.
- the problem of recovering the reward function R from the excellent trajectories T_N is called the inverse reinforcement learning (IRL) problem.
- relative entropy deep inverse reinforcement learning uses the known user historical driving trajectory data in the driving trajectory library 31 to recover the reward functions R of various user personalities, and thereby simulate the corresponding driving strategies π.
- the relative entropy deep inverse reinforcement learning algorithm is model-free: it does not require the state transition function T(s, a, s') of a known environment model.
- in its calculations, the relative entropy inverse reinforcement learning algorithm can use importance sampling to avoid the state transition function T(s, a, s').
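As a minimal sketch of how importance sampling sidesteps T(s, a, s'): trajectories are sampled once from any baseline policy and reweighted under the current reward parameters, giving a model-free gradient estimate. All names here are hypothetical, and the linear reward form R(τ) = θ·f(τ) is assumed for brevity:

```python
import numpy as np

def irl_gradient(theta, expert_features, sampled_features, sample_logprobs):
    """theta: (d,) reward weights; expert_features: (d,) mean feature counts of
    the expert trajectories; sampled_features: (m, d) feature counts of m
    baseline-policy trajectories; sample_logprobs: (m,) their log-probabilities
    under the baseline policy. No transition model T(s, a, s') is required."""
    # Importance weights w_i proportional to exp(theta . f(tau_i)) / q(tau_i)
    logw = sampled_features @ theta - sample_logprobs
    logw -= logw.max()          # for numerical stability
    w = np.exp(logw)
    w /= w.sum()
    # Gradient direction: expert features minus reweighted sample features
    return expert_features - w @ sampled_features
```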
- the automatic driving decision process of the automobile is a Markov decision process without a reward function (MDP/R), which can be expressed as the set {state space S, action space A, state transition probability T defined by the environment} (the requirement of knowing the environment transition probability T is dropped by the model-free algorithm).
- the value function of the car agent (the cumulative reward value) can be expressed as V^π(s) = E[∑_{t=0}^{∞} γ^t R(s_t, a_t) | s_0 = s, π], where γ is the discount factor.
- a plurality of reward functions R (targets) exist simultaneously, representing the different driving habits of user drivers.
- let the prior probability distributions of the G reward functions be ν_1, …, ν_G,
- let the reward weights be θ_1, …, θ_G,
- and let ϑ = (ν_1, …, ν_G, θ_1, …, θ_G) represent the set of parameters of the G reward functions.
- the MellowMax operator is defined as mm_ω(x) = log((1/n) ∑_{i=1}^{n} e^{ω x_i}) / ω. MellowMax is a better-behaved softened maximum: it guarantees that the estimate of the V value converges to a unique fixed point, and it provides a principled action probability distribution and an expectation-based estimation method. In this embodiment, the reinforcement learning algorithm combined with MellowMax balances exploration and exploitation of the environment more reasonably during automatic driving, which ensures that the autopilot system learns sufficiently from diverse scenarios and evaluates the current state more soundly as the reinforcement learning process converges.
- combining reinforcement learning with the soft maximization operator MellowMax yields a sounder evaluation of the expected feature values of a state.
- the probability distribution over action selection can be obtained using MellowMax.
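A minimal sketch of the operator and of an action distribution derived from it (ω and the Q values are illustrative; the original MellowMax formulation obtains the action distribution by solving a root-finding problem for the Boltzmann temperature, which is simplified here to β = ω):

```python
import numpy as np

def mellowmax(q: np.ndarray, omega: float) -> float:
    # mm_w(q) = log((1/n) * sum_i exp(w * q_i)) / w, computed stably;
    # it tends to max(q) as w -> infinity and to mean(q) as w -> 0.
    n = q.size
    m = q.max()
    return (omega * m + np.log(np.exp(omega * (q - m)).sum() / n)) / omega

def action_distribution(q: np.ndarray, omega: float) -> np.ndarray:
    # Boltzmann distribution over actions with temperature beta = omega
    # (a simplification; see the note above).
    e = np.exp(omega * (q - q.max()))
    return e / e.sum()

q = np.array([1.0, 2.0, 0.5])
print(mellowmax(q, 5.0), action_distribution(q, 5.0))
```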
- the iterative process of reinforcement learning yields the feature expectation μ attainable under the reward function composed of the current deep neural network parameters θ.
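Written out (with φ(s_t) denoting the state feature vector and γ the discount factor; this is a standard form consistent with the description, not the patent's verbatim notation), the feature expectation is:

```latex
\mu(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t)\right]
```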
- ⁇ can be understood as the cumulative expectation of the feature.
- the EM algorithm is used to solve the above-described multi-objective inverse reinforcement learning problem with hidden variables.
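Here the hidden variables are the assignments of trajectories to driving habits. Under the mixture model implied by the priors ν_g and reward weights θ_g, the log-likelihood that the EM algorithm maximizes takes the standard form below (written out for clarity; the per-component likelihood Pr(τ_i | θ_g) is the one given by relative entropy IRL):

```latex
\mathcal{L}(\vartheta) \;=\; \sum_{i=1}^{N} \log \sum_{g=1}^{G} \nu_g \,\Pr(\tau_i \mid \theta_g)
```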
- the EM algorithm is divided into an E step and an M step; through continued iteration of the E step and the M step, the likelihood approaches its approximate maximum.
- Step E: first calculate z_ij = ν_j Pr(τ_i | θ_j) / Z, where Z is the normalization term.
- z_ij represents the probability that the i-th driving trajectory belongs to driving habit (reward function) j.
- then calculate the likelihood estimate Q(ϑ, ϑ^t) = ∑_i ∑_j z_ij (log ν_j + log Pr(τ_i | θ_j)) (the Q function Q(ϑ, ϑ^t) referred to here is the update objective function of the EM algorithm; note the distinction from the state-action value function Q in reinforcement learning).
- Step M: select a multi-driving-habit parameter set ϑ (the ν_l and θ_l) that maximizes the likelihood estimate Q(ϑ, ϑ^t) from the E step. Because the ν_l and θ_l are mutually independent, they can be maximized separately: the first half gives the closed-form update ν_l = (1/N) ∑_i z_il, while the second half, involving θ_l, is maximized by a gradient update of the deep network parameters within relative entropy deep inverse reinforcement learning.
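A minimal sketch of the nested loop under the assumptions above (all names are hypothetical; Pr(τ | θ) ∝ exp(θ·f(τ)) stands in for the relative entropy trajectory model, and a single importance-sampled gradient step per M step stands in for the deep network update):

```python
import numpy as np

def re_irl_gradient(theta, expert_feat, samp_feat, samp_logp):
    # Importance-weighted gradient: expert features minus the reweighted
    # feature expectation of baseline trajectories (see the earlier sketch).
    logw = samp_feat @ theta - samp_logp
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return expert_feat - w @ samp_feat

def em_multi_reward(features, samp_feat, samp_logp, G, n_iters=50, lr=0.05):
    """features: (N, d) feature counts of N driving trajectories; samp_feat:
    (m, d) and samp_logp: (m,) describe baseline-policy trajectories."""
    N, d = features.shape
    rng = np.random.default_rng(0)
    nu = np.full(G, 1.0 / G)                     # priors nu_1..nu_G
    theta = rng.normal(scale=0.01, size=(G, d))  # reward weights theta_1..theta_G
    for _ in range(n_iters):
        # E step: z_ij proportional to nu_j * exp(theta_j . f(tau_i))
        log_z = np.log(nu) + features @ theta.T
        log_z -= log_z.max(axis=1, keepdims=True)
        z = np.exp(log_z)
        z /= z.sum(axis=1, keepdims=True)
        # M step: closed-form prior update; one IRL gradient step per theta_j
        nu = z.mean(axis=0)
        for j in range(G):
            expert_j = (z[:, j] @ features) / z[:, j].sum()
            theta[j] += lr * re_irl_gradient(theta[j], expert_j, samp_feat, samp_logp)
    return nu, theta
```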
- the completion of the gradient update marks the completion of one iterative update of relative entropy deep inverse reinforcement learning.
- the new deep network reward function, its parameter update completed, generates a new strategy π for the next iteration.
- the calculation of the E step and the M step is performed iteratively until the likelihood estimate Q(ϑ, ϑ^t) converges to its maximum.
- the parameter set ϑ = (ν_1, …, ν_G, θ_1, …, θ_G) obtained at this point comprises the prior distributions and weights of the multi-driving-habit reward functions that we set out to solve for.
- the driving strategy π for each driving-habit reward function R is then obtained through reinforcement learning (RL); the multiple driving strategies are output and saved in the cloud's driving strategy library, and users can choose a personalized, intelligent driving strategy on the client.
- the invention also provides a method for automatic driving based on relative entropy deep inverse reinforcement learning, the method comprising the following steps:
- S1: collecting road information and transmitting the road information to the client and the storage module;
- S2: the storage module receives the road information, calculates and simulates multiple driving strategies according to the road information, and transmits the driving strategies to the client;
- S3: the client receives the road information and the driving strategies, and implements automatic driving according to the personalized driving strategy selected by the user and the road information.
- the road information is collected in real time, and the road information is transmitted to the storage module 3 and the client 1.
- the storage module 3 receives the road information and simulates driving strategies from the historical driving trajectories, realizing personalized, intelligent automatic driving.
- the driving strategies are computed in the cloud 3 rather than by running the calculation process on the client 1.
- all driving strategy computation is already completed in the cloud 3.
- the user only needs to download the driving strategy he needs, and the car body can then perform real-time automatic driving according to the driving strategy selected by the user and the real-time road information.
- a large amount of road information is uploaded to the cloud 3 and stored as a historical driving trajectory.
- The stored historical driving trajectory big data is used to update the driving strategy library; using this trajectory big data, the system achieves automatic driving ever closer to the user's needs.
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Traffic Control Systems (AREA)
- Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
Abstract
Description
Claims (9)
- An automatic driving system based on relative entropy deep inverse reinforcement learning, characterized in that the system comprises: a client that displays driving strategies; a driving basic data collection subsystem that collects road information; and a storage module connected to the client and the driving basic data collection subsystem, storing the road information collected by the driving basic data collection subsystem; wherein the driving basic data collection subsystem collects road information and transmits the road information to the client and the storage module; the storage module receives the road information, stores a continuous segment of road information as a historical trajectory, and analyzes, calculates, and simulates driving strategies according to the historical trajectory; the storage module transmits the driving strategies to the client for the user to choose from; and the client receives the road information and implements automatic driving according to the road information and the driving strategy selected to match the user's personality.
- The automatic driving system based on relative entropy deep inverse reinforcement learning according to claim 1, characterized in that the storage module comprises a driving trajectory library that stores historical driving trajectories, a trajectory information processing subsystem that calculates and simulates driving strategies according to driving trajectories and driving habits, and a driving strategy library that stores driving strategies; the driving trajectory library transmits driving trajectory data to the trajectory information processing subsystem; the trajectory information processing subsystem analyzes, calculates, and simulates driving strategies according to the driving trajectory data and transmits them to the driving strategy library; and the driving strategy library receives and stores the driving strategies.
- The automatic driving system based on relative entropy deep inverse reinforcement learning according to claim 2, characterized in that the trajectory information processing subsystem calculates and simulates driving strategies using a multi-objective relative entropy deep inverse reinforcement learning algorithm.
- The automatic driving system based on relative entropy deep inverse reinforcement learning according to claim 3, characterized in that the multi-objective inverse reinforcement learning algorithm nests relative entropy deep inverse reinforcement learning within an EM algorithm framework to calculate the parameters of multiple reward functions.
- The personalized automatic driving system based on relative entropy deep inverse reinforcement learning according to claim 1, characterized in that the driving basic data collection subsystem comprises sensors for collecting road information.
- A method for automatic driving based on relative entropy deep inverse reinforcement learning, characterized in that the method comprises the following steps: S1: collecting road information and transmitting the road information to a client and a storage module; S2: the storage module receives the road information, stores a continuous segment of road information as a historical trajectory, analyzes, calculates, and simulates multiple driving strategies according to the historical trajectory, and transmits the driving strategies to the client; S3: the client receives the road information and the driving strategies, and implements automatic driving according to the personalized driving strategy selected by the user and the road information.
- The method for automatic driving based on relative entropy deep inverse reinforcement learning according to claim 6, characterized in that the storage module comprises a driving trajectory library that stores historical driving trajectories, a trajectory information processing subsystem that calculates and simulates driving strategies according to driving planning and driving habits, and a driving strategy library that stores driving strategies; the driving trajectory library transmits driving trajectory data to the trajectory information processing subsystem; the trajectory information processing subsystem analyzes, calculates, and simulates driving strategies according to the driving trajectory data and transmits them to the driving strategy library; and the driving strategy library receives and stores the driving strategies.
- The method for automatic driving based on relative entropy deep inverse reinforcement learning according to claim 7, characterized in that the trajectory information processing subsystem calculates and simulates driving strategies using a multi-objective relative entropy deep inverse reinforcement learning algorithm.
- The method for automatic driving based on relative entropy deep inverse reinforcement learning according to claim 8, characterized in that the multi-objective inverse reinforcement learning algorithm nests relative entropy deep inverse reinforcement learning within an EM algorithm framework to calculate the parameters of multiple reward functions.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710940590.XA CN107544516A (zh) | 2017-10-11 | 2017-10-11 | Automatic driving system and method based on relative entropy deep inverse reinforcement learning |
CN201710940590.X | 2017-10-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019071909A1 (zh) | 2019-04-18 |
Family
ID=60967749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/078740 WO2019071909A1 (zh) | 2017-10-11 | 2018-03-12 | Automatic driving system and method based on relative entropy deep inverse reinforcement learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107544516A (zh) |
WO (1) | WO2019071909A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110673602A (zh) * | 2019-10-24 | 2020-01-10 | 驭势科技(北京)有限公司 | Reinforcement learning model, method for vehicle automatic driving decision-making, and vehicle-mounted device |
TWI737437B (zh) * | 2020-08-07 | 2021-08-21 | 財團法人車輛研究測試中心 | Trajectory determination method |
WO2023083113A1 (en) * | 2021-11-10 | 2023-05-19 | International Business Machines Corporation | Reinforcement learning with inductive logic programming |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10678241B2 (en) * | 2017-09-06 | 2020-06-09 | GM Global Technology Operations LLC | Unsupervised learning agents for autonomous driving applications |
CN107544516A (zh) | 2017-10-11 | 2018-01-05 | 苏州大学 | Automatic driving system and method based on relative entropy deep inverse reinforcement learning |
CN108803609B (zh) * | 2018-06-11 | 2020-05-01 | 苏州大学 | Partially observable automatic driving decision method based on constrained online planning |
WO2020000192A1 (en) * | 2018-06-26 | 2020-01-02 | Psa Automobiles Sa | Method for providing vehicle trajectory prediction |
CN110654372B (zh) * | 2018-06-29 | 2021-09-03 | 比亚迪股份有限公司 | Vehicle driving control method and device, vehicle, and storage medium |
US10845815B2 (en) * | 2018-07-27 | 2020-11-24 | GM Global Technology Operations LLC | Systems, methods and controllers for an autonomous vehicle that implement autonomous driver agents and driving policy learners for generating and improving policies based on collective driving experiences of the autonomous driver agents |
CN109636432B (zh) * | 2018-09-28 | 2023-05-30 | 创新先进技术有限公司 | Computer-executed project selection method and device |
CN111159832B (zh) * | 2018-10-19 | 2024-04-02 | 百度在线网络技术(北京)有限公司 | Method and device for constructing traffic information flow |
CN110321811B (zh) * | 2019-06-17 | 2023-05-02 | 中国工程物理研究院电子工程研究所 | Target detection method for UAV aerial video based on deep inverse reinforcement learning |
CN110238855B (zh) * | 2019-06-24 | 2020-10-16 | 浙江大学 | Robot grasping method for out-of-order workpieces based on deep inverse reinforcement learning |
CN110955239B (zh) * | 2019-11-12 | 2021-03-02 | 中国地质大学(武汉) | Multi-objective trajectory planning method and system for unmanned surface vehicles based on inverse reinforcement learning |
CN110837258B (zh) * | 2019-11-29 | 2024-03-08 | 商汤集团有限公司 | Automatic driving control method, apparatus, system, electronic device, and storage medium |
CN111026127B (zh) * | 2019-12-27 | 2021-09-28 | 南京大学 | Automatic driving decision method and system based on partially observable transfer reinforcement learning |
CN114194211B (zh) * | 2021-11-30 | 2023-04-25 | 浪潮(北京)电子信息产业有限公司 | Automatic driving method and apparatus, electronic device, and storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014152554A1 (en) * | 2013-03-15 | 2014-09-25 | Caliper Corporation | Lane-level vehicle navigation for vehicle routing and traffic management |
DE112015004218B4 (de) * | 2014-09-16 | 2019-05-23 | Honda Motor Co., Ltd. | Driving assistance device |
CN106842925B (zh) * | 2017-01-20 | 2019-10-11 | 清华大学 | Intelligent locomotive operation method and system based on deep reinforcement learning |
CN107169567B (zh) * | 2017-03-30 | 2020-04-07 | 深圳先进技术研究院 | Method and device for generating a decision network model for vehicle automatic driving |
CN107084735A (zh) * | 2017-04-26 | 2017-08-22 | 电子科技大学 | Navigation path framework suitable for reducing redundant navigation |
CN107229973B (zh) * | 2017-05-12 | 2021-11-19 | 中国科学院深圳先进技术研究院 | Method and device for generating a policy network model for vehicle automatic driving |
CN107200017A (zh) * | 2017-05-22 | 2017-09-26 | 北京联合大学 | Driverless vehicle control system based on deep learning |
- 2017-10-11 CN CN201710940590.XA patent/CN107544516A/zh active Pending
- 2018-03-12 WO PCT/CN2018/078740 patent/WO2019071909A1/zh active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103699717A (zh) * | 2013-12-03 | 2014-04-02 | 重庆交通大学 | Vehicle driving trajectory prediction method for complex roads based on forward-looking cross-section point selection |
CN105718750A (zh) * | 2016-01-29 | 2016-06-29 | 长沙理工大学 | Method and system for predicting vehicle driving trajectories |
CN107544516A (zh) * | 2017-10-11 | 2018-01-05 | 苏州大学 | Automatic driving system and method based on relative entropy deep inverse reinforcement learning |
Non-Patent Citations (1)
Title |
---|
LU CHENJIE: "The Research of Apprenticeship Learning Algorithm Applied in the Unmanned Car High-Speed Driving in the Simulated Environment", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE, CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 June 2013 (2013-06-15), pages 19-21, 32-45, ISSN: 1674-0246 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110673602A (zh) * | 2019-10-24 | 2020-01-10 | 驭势科技(北京)有限公司 | Reinforcement learning model, method for vehicle automatic driving decision-making, and vehicle-mounted device |
CN110673602B (zh) * | 2019-10-24 | 2022-11-25 | 驭势科技(北京)有限公司 | Reinforcement learning model, method for vehicle automatic driving decision-making, and vehicle-mounted device |
TWI737437B (zh) * | 2020-08-07 | 2021-08-21 | 財團法人車輛研究測試中心 | Trajectory determination method |
WO2023083113A1 (en) * | 2021-11-10 | 2023-05-19 | International Business Machines Corporation | Reinforcement learning with inductive logic programming |
Also Published As
Publication number | Publication date |
---|---|
CN107544516A (zh) | 2018-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019071909A1 (zh) | Automatic driving system and method based on relative entropy deep inverse reinforcement learning | |
CN110745136B (zh) | An adaptive driving control method | |
KR102335389B1 (ko) | Deep learning-based feature extraction for LiDAR position estimation of autonomous driving vehicles | |
Ohnishi et al. | Barrier-certified adaptive reinforcement learning with applications to brushbot navigation | |
CN107169567B (zh) | Method and device for generating a decision network model for vehicle automatic driving | |
KR102292277B1 (ko) | LiDAR position estimation inferring solutions using a 3D CNN network in autonomous driving vehicles | |
US20210284184A1 | Learning point cloud augmentation policies | |
EP3035314B1 | A traffic data fusion system and the related method for providing a traffic state for a network of roads | |
WO2020119363A1 (zh) | Automatic driving method, training method, and related apparatus | |
EP3719603B1 | Action control method and apparatus | |
US20240160901A1 | Controlling agents using amortized q learning | |
US20220187088A1 | Systems and methods for providing feedback to improve fuel consumption efficiency | |
CN112148008B (zh) | Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning | |
CN110488842B (zh) | Vehicle trajectory prediction method based on bidirectional kernel ridge regression | |
US11567495B2 | Methods and systems for selecting machine learning models to predict distributed computing resources | |
CN113299085A (zh) | Traffic signal light control method, device, and storage medium | |
CN109858137B (zh) | Track estimation method for complex maneuvering aircraft based on a learnable extended Kalman filter | |
CN114261400B (zh) | Automatic driving decision method, apparatus, device, and storage medium | |
CN114199248B (zh) | AUV cooperative localization method based on ANFIS optimized by a hybrid metaheuristic algorithm | |
CN115311860B (zh) | Online federated learning method for traffic flow prediction models | |
CN112036598A (zh) | Charging pile usage information prediction method based on multi-information coupling | |
Xu et al. | Trajectory prediction for autonomous driving with topometric map | |
CN115691140B (zh) | Method for analyzing and predicting the spatiotemporal distribution of automobile charging demand | |
CN114187759B (zh) | Roadside unit driving assistance method and device based on a data-driven model | |
CN114495036A (zh) | Vehicle trajectory prediction method based on a three-stage attention mechanism | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18867035 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18867035 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02/10/2020) |
|