JP7438336B2 - States simulator for reinforcement learning models - Google Patents

States simulator for reinforcement learning models

Info

Publication number
JP7438336B2
Authority
JP
Japan
Prior art keywords
features
state
subsets
actions
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2022515598A
Other languages
English (en)
Japanese (ja)
Other versions
JP2022547529A (ja)
JP2022547529A5 (ja)
Inventor
マシーン、マイケル
ザドロズニ、アレクサンダー
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of JP2022547529A (ja)
Publication of JP2022547529A5 (ja)
Application granted
Publication of JP7438336B2 (ja)
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Complex Calculations (AREA)
JP2022515598A 2019-09-12 2020-08-11 States simulator for reinforcement learning models Active JP7438336B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/568,284 2019-09-12
US16/568,284 US11574244B2 (en) 2019-09-12 2019-09-12 States simulator for reinforcement learning models
PCT/EP2020/072487 WO2021047842A1 (en) 2019-09-12 2020-08-11 States simulator for reinforcement learning models

Publications (3)

Publication Number Publication Date
JP2022547529A (ja) 2022-11-14
JP2022547529A5 (ja) 2022-12-13
JP7438336B2 (ja) 2024-02-26

Family

ID=72050874

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2022515598A Active JP7438336B2 (ja) 2019-09-12 2020-08-11 States simulator for reinforcement learning models

Country Status (5)

Country Link
US (1) US11574244B2 (en)
EP (1) EP4028959A1 (en)
JP (1) JP7438336B2 (ja)
CN (1) CN114365157A (zh)
WO (1) WO2021047842A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
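KR102338304B1 (ko) * 2020-10-20 2021-12-13 주식회사 뉴로코어 Factory simulator-based scheduling system using reinforcement learning
CN115617796A (zh) * 2022-10-12 2023-01-17 中电智元数据科技有限公司 Distributed database index selection method
CN118837737B (zh) * 2024-06-27 2025-09-09 西安交通大学 Fault diagnosis method, apparatus, device, and storage medium for an underwater propulsion motor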

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013242761A (ja) 2012-05-22 2013-12-05 Internatl Business Mach Corp <Ibm> Method for updating policy parameters in a Markov decision process system environment, and controller and control program therefor

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8918866B2 (en) * 2009-06-29 2014-12-23 International Business Machines Corporation Adaptive rule loading and session control for securing network delivered services
US9128739B1 (en) * 2012-12-31 2015-09-08 Emc Corporation Determining instances to maintain on at least one cloud responsive to an evaluation of performance characteristics
US20160260024A1 (en) * 2015-03-04 2016-09-08 Qualcomm Incorporated System of distributed planning
US10540598B2 (en) 2015-09-09 2020-01-21 International Business Machines Corporation Interpolation of transition probability values in Markov decision processes
CN108701252B (zh) 2015-11-12 2024-02-02 渊慧科技有限公司 Training neural networks using a prioritized experience memory
US10839302B2 (en) 2015-11-24 2020-11-17 The Research Foundation For The State University Of New York Approximate value iteration with complex returns by bounding
CN108230057A (zh) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Intelligent recommendation method and system
US20180342004A1 (en) * 2017-05-25 2018-11-29 Microsoft Technology Licensing, Llc Cumulative success-based recommendations for repeat users
WO2020005240A1 (en) * 2018-06-27 2020-01-02 Google Llc Adapting a sequence model for use in predicting future device interactions with a computing system
US10963313B2 (en) * 2018-08-27 2021-03-30 Vmware, Inc. Automated reinforcement-learning-based application manager that learns and improves a reward function
US11468322B2 (en) * 2018-12-04 2022-10-11 Rutgers, The State University Of New Jersey Method for selecting and presenting examples to explain decisions of algorithms
EP3776347B1 (en) * 2019-06-17 2025-07-02 Google LLC Vehicle occupant engagement using three-dimensional eye gaze vectors

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013242761A (ja) 2012-05-22 2013-12-05 Internatl Business Mach Corp <Ibm> Method for updating policy parameters in a Markov decision process system environment, and controller and control program therefor

Also Published As

Publication number Publication date
JP2022547529A (ja) 2022-11-14
CN114365157A (zh) 2022-04-15
EP4028959A1 (en) 2022-07-20
WO2021047842A1 (en) 2021-03-18
US11574244B2 (en) 2023-02-07
US20210081829A1 (en) 2021-03-18

Similar Documents

Publication Publication Date Title
CN110366734B (zh) Optimizing neural network architectures
US11615302B2 (en) Effective user modeling with time-aware based binary hashing
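CN109902706B (zh) Recommendation method and apparatus
JP7438336B2 (ja) States simulator for reinforcement learning models
KR102203253B1 (ko) Method and system for rating augmentation and item recommendation based on generative adversarial networks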
US20180024989A1 (en) Automated building and sequencing of a storyline and scenes, or sections, included therein
US20160086498A1 (en) Recommending a Set of Learning Activities Based on Dynamic Learning Goal Adaptation
US12079289B2 (en) Recommending content to subscribers
KR20200046189A (ko) Method and system for collaborative filtering based on generative adversarial networks
Leike Nonparametric general reinforcement learning
JP2020087103A (ja) Learning method, computer program, classifier, and generator
CN114930317A (zh) Graph convolutional networks for video grounding
US10537801B2 (en) System and method for decision making in strategic environments
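CN114119078A (zh) Target resource determination method and apparatus, electronic device, and medium
CN114138954B (zh) User consultation question recommendation method, system, computer device, and storage medium
KR102549937B1 (ko) Apparatus and method for providing a model for analyzing a user's interior style based on SNS text
CN114118411A (zh) Training method for an image recognition network, image recognition method, and apparatus
CN116402138A (zh) Temporal knowledge graph reasoning method and system with multi-granularity history aggregation
CN112699203B (zh) Method and apparatus for processing road network data
CN116955808B (zh) Game recommendation method and apparatus, electronic device, and medium
CN120821823A (zh) Language model-based task processing method and apparatus
CN119072696A (zh) Training a neural network system to perform multiple machine learning tasks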
US12462200B1 (en) Accelerated training of a machine learning model
KR20210000181A (ko) Method for processing game data
CN110347916A (zh) Cross-scenario item recommendation method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20220518

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20221202

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20230120

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20231213

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20240130

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20240213

R150 Certificate of patent or registration of utility model

Ref document number: 7438336

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150