WO2020180014A3 - Method and system for training autonomous driving agent on basis of deep reinforcement learning - Google Patents

Method and system for training autonomous driving agent on basis of deep reinforcement learning

Info

Publication number
WO2020180014A3
Authority
WO
WIPO (PCT)
Prior art keywords
training
agent
information
basis
autonomous driving
Prior art date
Application number
PCT/KR2020/001692
Other languages
English (en)
French (fr)
Other versions
WO2020180014A2 (ko)
Inventor
최진영
박경식
김민수
석상옥
서준호
Original Assignee
네이버랩스 주식회사
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 네이버랩스 주식회사
Priority to EP20765632.3A (published as EP3936963A4, en)
Priority to JP2021552641A (published as JP7271702B2, ja)
Publication of WO2020180014A2 (ko)
Publication of WO2020180014A3 (ko)
Priority to US17/466,450 (published as US20210397961A1, en)

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0238 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
    • G05D1/024 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00 Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88 Lidar systems specially adapted for specific applications
    • G01S17/93 Lidar systems specially adapted for specific applications for anti-collision purposes
    • G01S17/931 Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0268 Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0268 Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means
    • G05D1/0274 Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means using mapping information stored in a memory device
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0287 Control of position or course in two dimensions specially adapted to land vehicles involving a plurality of land vehicles, e.g. fleet or convoy travelling
    • G05D1/0289 Control of position or course in two dimensions specially adapted to land vehicles involving a plurality of land vehicles, e.g. fleet or convoy travelling with means for avoiding collisions between vehicles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Optics & Photonics (AREA)
  • Feedback Control In General (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

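Disclosed are a method and a system for training an autonomous driving agent on the basis of deep reinforcement learning. An agent training method according to one embodiment may include a step of training an agent through an actor-critic algorithm in a simulation for deep reinforcement learning (DRL). In the training step, first information is input to an actor network, which is the evaluation network that determines the action of the agent in the actor-critic algorithm, and second information is input to a critic, which is the value network that evaluates how much the action helps to maximize a preset reward. Here, the second information may include the first information and additional information.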
PCT/KR2020/001692 2019-03-05 2020-02-06 Method and system for training autonomous driving agent on basis of deep reinforcement learning WO2020180014A2 (ko)
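
The input split described in the abstract can be illustrated with a minimal sketch. It is not the applicant's implementation: the linear function approximators, the Gaussian policy, the one-step TD update, and every name and dimension below (AsymmetricActorCritic, first_dim, extra_dim, the sample vectors) are assumptions chosen only to show an actor that receives the first information while the critic receives the second information, i.e. the first information plus additional information.

```python
import random


def dot(weights, features):
    """Inner product of two equal-length sequences."""
    return sum(w * x for w, x in zip(weights, features))


class AsymmetricActorCritic:
    """Hypothetical linear actor-critic in which the critic sees more than the actor."""

    def __init__(self, first_dim, extra_dim, lr=0.01, gamma=0.99, sigma=0.5, seed=0):
        self.rng = random.Random(seed)
        # Actor weights cover the first information only.
        self.actor_w = [0.0] * first_dim
        # Critic weights cover the second information (first + additional).
        self.critic_w = [0.0] * (first_dim + extra_dim)
        self.lr, self.gamma, self.sigma = lr, gamma, sigma

    def act(self, first):
        # Gaussian policy whose mean is linear in the first information.
        return self.rng.gauss(dot(self.actor_w, first), self.sigma)

    def value(self, first, extra):
        # The critic input is the concatenation of first and additional information.
        return dot(self.critic_w, list(first) + list(extra))

    def update(self, first, extra, action, reward, next_first, next_extra, done):
        second = list(first) + list(extra)
        bootstrap = 0.0 if done else self.gamma * self.value(next_first, next_extra)
        td_error = reward + bootstrap - dot(self.critic_w, second)
        # Critic: semi-gradient TD(0) step on the second information.
        self.critic_w = [w + self.lr * td_error * x
                         for w, x in zip(self.critic_w, second)]
        # Actor: policy-gradient step using the TD error as the advantage estimate.
        score = (action - dot(self.actor_w, first)) / (self.sigma ** 2)
        self.actor_w = [w + self.lr * td_error * score * x
                        for w, x in zip(self.actor_w, first)]
        return td_error


# Usage with made-up values: 3 features for the actor, 2 extra features for the critic.
agent = AsymmetricActorCritic(first_dim=3, extra_dim=2)
first, extra = [0.2, 0.5, 1.0], [1.0, -1.0]
action = agent.act(first)
delta = agent.update(first, extra, action, reward=1.0,
                     next_first=first, next_extra=extra, done=True)
```

In this sketch the additional information is needed only while training, because only the critic consumes it; act() depends on the first information alone. What the additional information actually contains is not stated in the abstract and is left open here.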

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20765632.3A EP3936963A4 (en) 2019-03-05 2020-02-06 AUTONOMOUS DRIVING AGENT TRAINING METHOD AND SYSTEM BASED ON DEEP REINFORCEMENT LEARNING
JP2021552641A JP7271702B2 (ja) 2019-03-05 2020-02-06 Method and system for training autonomous driving agent on basis of deep reinforcement learning
US17/466,450 US20210397961A1 (en) 2019-03-05 2021-09-03 Method and system for training autonomous driving agent on basis of deep reinforcement learning

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020190025284A KR102267316B1 (ko) 2019-03-05 Method and system for training autonomous driving agent on basis of deep reinforcement learning
KR10-2019-0025284 2019-03-05

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/466,450 Continuation US20210397961A1 (en) 2019-03-05 2021-09-03 Method and system for training autonomous driving agent on basis of deep reinforcement learning

Publications (2)

Publication Number Publication Date
WO2020180014A2 (ko) 2020-09-10
WO2020180014A3 (ko) 2020-12-03

Family

ID=72338692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/001692 WO2020180014A2 (ko) 2019-03-05 2020-02-06 Method and system for training autonomous driving agent on basis of deep reinforcement learning

Country Status (5)

Country Link
US (1) US20210397961A1 (ko)
EP (1) EP3936963A4 (ko)
JP (1) JP7271702B2 (ko)
KR (1) KR102267316B1 (ko)
WO (1) WO2020180014A2 (ko)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11645498B2 (en) * 2019-09-25 2023-05-09 International Business Machines Corporation Semi-supervised reinforcement learning
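CN112180927B (zh) * 2020-09-27 2021-11-26 安徽江淮汽车集团股份有限公司 Automatic driving time domain construction method, device, storage medium and apparatus
KR102461831B1 (ko) * 2021-01-13 2022-11-03 부경대학교 산학협력단 Apparatus and method for reinforcement-learning-based traffic improvement at unsignalized intersections for platooning of autonomous vehicles
CN113110101B (zh) * 2021-04-20 2022-06-21 济南大学 Simulation method and system for aggregated recovery and warehousing of production-line mobile robots
CN113253612B (zh) * 2021-06-01 2021-09-17 苏州浪潮智能科技有限公司 Automatic driving control method, apparatus, device and readable storage medium
CN113359771B (zh) * 2021-07-06 2022-09-30 贵州大学 Intelligent automatic driving control method based on reinforcement learning
CN114397817A (zh) * 2021-12-31 2022-04-26 上海商汤科技开发有限公司 Network training and robot control method and apparatus, device and storage medium
CN114372563A (zh) * 2022-01-10 2022-04-19 四川大学 Robot control method and system based on hybrid spiking reinforcement learning network structure
CN114104005B (zh) * 2022-01-26 2022-04-19 苏州浪潮智能科技有限公司 Decision-making method, apparatus and device for automatic driving equipment, and readable storage medium
CN114594793B (zh) * 2022-03-07 2023-04-25 四川大学 Path planning method for base-station unmanned aerial vehicle
KR102670927B1 (ko) * 2022-04-01 2024-05-30 전북대학교산학협력단 Autonomous flight platform using actor-critic deep reinforcement learning based target point estimation and collision avoidance technique for intelligent autonomous flight
CN115361301B (zh) * 2022-10-09 2023-01-10 之江实验室 DQN-based distributed computing network cooperative traffic scheduling system and method
CN116202550B (zh) * 2023-05-06 2023-07-11 华东交通大学 Automobile path planning method fusing improved potential field and dynamic window
CN117291845B (zh) * 2023-11-27 2024-03-19 成都理工大学 Point cloud ground filtering method, system, electronic device and storage medium
CN117824663B (zh) * 2024-03-05 2024-05-10 南京思伽智能科技有限公司 Robot navigation method based on hand-drawn scene graph understanding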

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180034553A (ko) * 2015-07-24 2018-04-04 딥마인드 테크놀로지스 리미티드 Continuous control with deep reinforcement learning
WO2018083532A1 (en) * 2016-11-03 2018-05-11 Deepmind Technologies Limited Training action selection neural networks
WO2018083671A1 (en) * 2016-11-04 2018-05-11 Deepmind Technologies Limited Reinforcement learning with auxiliary tasks
JP2018126797A (ja) * 2017-02-06 2018-08-16 セイコーエプソン株式会社 Control device, robot, and robot system
JP2019031268A (ja) * 2017-05-12 2019-02-28 トヨタ モーター エンジニアリング アンド マニュファクチャリング ノース アメリカ,インコーポレイティド Control policy learning and vehicle control method based on reinforcement learning without active exploration

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101539270B1 (ko) 2015-02-27 2015-07-24 군산대학교산학협력단 충돌회피 및 자율주행을 위한 센서융합 기반 하이브리드 반응 경로 계획 방법, 이를 수행하기 위한 기록 매체 및 이동로봇
CN116992917A (zh) 2016-10-10 2023-11-03 渊慧科技有限公司 用于选择动作的系统和方法

Also Published As

Publication number Publication date
US20210397961A1 (en) 2021-12-23
KR102267316B1 (ko) 2021-06-21
EP3936963A2 (en) 2022-01-12
JP7271702B2 (ja) 2023-05-11
WO2020180014A2 (ko) 2020-09-10
JP2022524494A (ja) 2022-05-06
KR20200108527A (ko) 2020-09-21
EP3936963A4 (en) 2023-01-25

Similar Documents

Publication Publication Date Title
WO2020180014A3 (ko) Method and system for training autonomous driving agent on basis of deep reinforcement learning
US11429854B2 (en) Method and device for a computerized mechanical device
WO2019199475A3 (en) Training machine learning model based on training instances with: training instance input based on autonomous vehicle sensor data, and training instance output based on additional vehicle sensor data
CN113561986B (zh) Decision-making method and device for autonomous vehicles
WO2017176356A3 (en) Partitioned machine learning architecture
CN108870090B (zh) Pipeline leak detection method based on least squares support vector machine information fusion
RU2017100526A (ru) Systems and method for speech recognition
WO2018142394A3 (en) Computer aided driving
CN111009153A (zh) Training method, apparatus and device for trajectory prediction model
US20190145860A1 (en) Method and apparatus for autonomous system performance and grading
WO2008033394A3 (en) Complexity management tool
US20220009510A1 (en) Method for training at least one algorithm for a control device of a motor vehicle, computer program product, and motor vehicle
WO2003025689A3 (en) Large scale process control by driving factor identification
EP4005498A4 (en) INFORMATION PROCESSING DEVICE, PROGRAM, LEARNED MODEL, DIAGNOSTIC SUPPORT DEVICE, LEARNING DEVICE AND METHOD FOR GENERATION OF A PREDICTIVE MODEL
WO2015127110A3 (en) Event-based inference and learning for stochastic spiking bayesian networks
RU2015155633A (ru) Systems and methods for creating and implementing an artificial intelligence agent or system
US20220204020A1 (en) Toward simulation of driver behavior in driving automation
CN106055579B (zh) Vehicle performance data cleaning system based on artificial neural network and method thereof
WO2020086176A8 (en) Artificial neural network and method of training an artificial neural network with epigenetic neurogenesis
RU2018109361A (ru) Methods and systems for locating an oral cleaning device
WO2021124110A8 (en) System and methods thereof for monitoring proper behavior of an autonomous vehicle
WO2020068701A3 (en) Automated determination of web page rendering performance
PH12018550213A1 (en) System and method for learning-based group tagging
US20200384989A1 (en) Method for the improved detection of objects by a driver assistance system
WO2022167870A3 (en) Prediction of pipeline column separations

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20765632; Country of ref document: EP; Kind code of ref document: A2)
ENP Entry into the national phase (Ref document number: 2021552641; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2020765632; Country of ref document: EP; Effective date: 20211005)