WO2022137520A1 - Learning device, learning method, and learning program - Google Patents
- Publication number
- WO2022137520A1 (application PCT/JP2020/048791)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- reward function
- distance
- locus
- parameters
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
Definitions
- the present invention relates to a learning device, a learning method, and a learning program for performing inverse reinforcement learning.
- inverse reinforcement learning (IRL) is known as a method for facilitating the setting of a reward function.
- in inverse reinforcement learning, a reward function that reflects the expert's intention is generated by repeating optimization using the reward function and updating the parameters of the reward function using the decision-making history data of the expert.
- Non-Patent Document 1 describes maximum entropy inverse reinforcement learning (ME-IRL: Maximum Entropy-IRL), which is one type of inverse reinforcement learning.
- Non-Patent Document 2 describes GCL (Guided Cost Learning), a method of inverse reinforcement learning that improves on maximum entropy inverse reinforcement learning.
- in GCL, the weight of the reward function is updated by using importance sampling.
- imitation learning that reproduces a given behavior history by combining inverse reinforcement learning, which learns reward functions, with behavior imitation, which directly learns policies, is also known (see, for example, Non-Patent Document 3).
- the reward function is learned so as to reduce the difference between the behavior history of the expert to be reproduced and the optimized execution result.
- the above-mentioned differences are defined by probabilistic distances such as KL (Kullback-Leibler) divergence and JS (Jensen-Shannon) divergence.
- the gradient method is generally used.
- however, because it is difficult to define a probability distribution in a combinatorial optimization problem, it is difficult to apply the above-mentioned inverse reinforcement learning to combinatorial optimization problems, to which many real-world problems belong.
- an object of the present invention is to provide a learning device, a learning method, and a learning program capable of stably performing inverse reinforcement learning in a combinatorial optimization problem.
- the learning device according to the present invention includes: a function input means that accepts input of a reward function whose feature amounts are set so as to satisfy the Lipschitz continuity condition; an estimation means that estimates the trajectory that minimizes the Wasserstein distance, which represents the distance between the probability distribution of an expert's trajectories and the probability distribution of the trajectories determined based on the parameters of the reward function; and an update means that updates the parameters of the reward function so as to maximize the Wasserstein distance based on the estimated trajectory.
- the learning method according to the present invention accepts input of a reward function whose feature amounts are set so as to satisfy the Lipschitz continuity condition, estimates the trajectory that minimizes the Wasserstein distance, which represents the distance between the probability distribution of an expert's trajectories and the probability distribution of the trajectories determined based on the parameters of the reward function, and updates the parameters of the reward function so as to maximize the Wasserstein distance based on the estimated trajectory.
- the learning program according to the present invention causes a computer to execute: a function input process that accepts input of a reward function whose feature amounts are set so as to satisfy the Lipschitz continuity condition; an estimation process that estimates the trajectory that minimizes the Wasserstein distance, which represents the distance between the probability distribution of an expert's trajectories and the probability distribution of the trajectories determined based on the parameters of the reward function; and an update process that updates the parameters of the reward function so as to maximize the Wasserstein distance based on the estimated trajectory.
- according to the present invention, inverse reinforcement learning can be stably performed in a combinatorial optimization problem.
- the trajectory τ is represented by Equation 1.
- the probability model representing the trajectory distribution p_θ(τ) is represented by Equation 2.
- c_θ(τ) in Equation 2 is a cost function.
- the reward function r_θ(τ) is obtained by reversing the sign of the cost function (that is, r_θ(τ) = −c_θ(τ)) (see Equation 3).
- Z represents the sum of the rewards for all the trajectories (see Equation 4).
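The equation images for Equations 1 to 4 are not reproduced in this text. As a point of reference only, the formulation commonly used in the maximum entropy IRL literature, which matches the symbols described here, can be written as follows. The state and action symbols s_t, a_t and the horizon T are assumed notation, and the patent's own equations may differ in detail; in this common form, Z sums the exponentiated rewards over all trajectories.

```latex
\begin{align}
\tau &= \{(s_1, a_1), (s_2, a_2), \dots, (s_T, a_T)\} \\
p_\theta(\tau) &= \frac{1}{Z} \exp\bigl(-c_\theta(\tau)\bigr) \\
r_\theta(\tau) &= -c_\theta(\tau) \\
Z &= \sum_{\tau} \exp\bigl(-c_\theta(\tau)\bigr)
\end{align}
```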
- in Equation 5, α is the step width and L_ME(θ) is the distance measure between distributions used in ME-IRL.
- the second term in Equation 6 is the sum of rewards for all trajectories.
- ME-IRL presupposes that the value of this second term can be calculated exactly. In practice, however, it is difficult to calculate the sum of rewards for all trajectories, so GCL, described in Non-Patent Document 2, approximates this value by importance sampling.
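To make the difficulty concrete, the following minimal sketch compares an exact sum over all trajectories with an importance-sampling estimate of the same sum. Everything in it is an assumption for illustration: trajectories are binary vectors of length 8, the cost is linear, the summed quantity is exp(-cost), and the proposal distribution is uniform (GCL itself uses a learned proposal, which is not modeled here).

```python
import itertools
import math
import random

# Hypothetical toy setting: a "trajectory" is a binary vector of length 8
# and the cost is linear in it. None of these names come from the source.
THETA = [0.05, 0.10, 0.15, 0.05, 0.20, 0.10, 0.25, 0.10]

def cost(theta, tau):
    """Linear cost c_theta(tau) = theta . tau."""
    return sum(w * x for w, x in zip(theta, tau))

def exact_partition(theta, n):
    """Sum exp(-c_theta(tau)) over all 2**n trajectories (feasible only for tiny n)."""
    return sum(math.exp(-cost(theta, tau))
               for tau in itertools.product((0, 1), repeat=n))

def sampled_partition(theta, n, num_samples, rng):
    """Importance-sampling estimate of the same sum with a uniform proposal q = 1 / 2**n."""
    q = 1.0 / 2 ** n
    total = 0.0
    for _ in range(num_samples):
        tau = [rng.randint(0, 1) for _ in range(n)]
        total += math.exp(-cost(theta, tau)) / q   # weight each sample by 1 / q
    return total / num_samples

rng = random.Random(0)
z_exact = exact_partition(THETA, len(THETA))
z_est = sampled_partition(THETA, len(THETA), 20000, rng)
print(z_exact, z_est)
```

The exact sum visits 2**n trajectories, so it stops being practical as n grows, while the sampled estimate keeps a fixed cost per sample but depends on having a usable proposal distribution over trajectories.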
- FIG. 1 is a block diagram showing a configuration example of an embodiment of the learning device according to the present invention.
- the learning device 100 of the present embodiment is a device that performs inverse reinforcement learning, which estimates a reward function from the behavior of a target person (expert) by machine learning; specifically, it is a device that performs information processing based on the behavior characteristics of the expert.
- the learning device 100 includes a storage unit 10, an input unit 20, a feature amount setting unit 30, a weight initial value setting unit 40, a mathematical optimization execution unit 50, a weight update unit 60, a convergence determination unit 70, and an output unit 80.
- a device including the mathematical optimization execution unit 50, the weight update unit 60, and the convergence determination unit 70 can be called an inverse reinforcement learning device.
- the storage unit 10 stores information necessary for the learning device 100 to perform various processes.
- the storage unit 10 may store the decision-making history data (trajectory) of the expert received by the input unit 20 described later. Further, the storage unit 10 may store candidates for the feature amount of the reward function used for learning by the mathematical optimization execution unit 50 and the weight update unit 60, which will be described later.
- the feature quantity candidate does not necessarily have to be the feature quantity used for the objective function.
- the storage unit 10 may store a mathematical optimization solver for realizing the mathematical optimization execution unit 50, which will be described later.
- the content of the mathematical optimization solver is arbitrary and may be determined according to the environment and the device to be executed.
- the input unit 20 receives input of information necessary for the learning device 100 to perform various processes.
- the input unit 20 may, for example, accept the input of the above-mentioned expert decision-making history data (specifically, state / action pairs). Further, the input unit 20 may accept the input of the constraint z in the initial state used when the inverse reinforcement learning device described later performs inverse reinforcement learning.
- the feature amount setting unit 30 sets the feature amounts of the reward function from the data including the states and the actions. Specifically, the feature amount setting unit 30 sets the feature amounts of the reward function so that the slope of the tangent is finite over the entire function, so that the inverse reinforcement learning device described later can use the Wasserstein distance as a distance measure between distributions. The feature amount setting unit 30 may, for example, set the feature amounts of the reward function so as to satisfy the Lipschitz continuity condition.
- the feature amount setting unit 30 may set the feature amount so that the reward function becomes a linear function.
- for example, the reward function of Equation 7 is inappropriate in the present disclosure because its gradient becomes infinite at a_0.
- the feature amount setting unit 30 may determine, for example, a reward function in which the feature amount is set according to a user's instruction, or may acquire a reward function satisfying the Lipschitz continuity condition from the storage unit 10.
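The difference between a reward with bounded slope and one with unbounded slope can be checked numerically. The sketch below is illustrative only: the weight 2.0 and the use of a square root as the "bad" function are assumptions (Equation 7 itself is not reproduced in this text), and the sampled slope is just a finite-difference proxy for the Lipschitz constant.

```python
import math

def linear_reward(theta, features):
    """Linear reward r_theta(x) = theta . f(x); its slope is bounded by the weights."""
    return sum(w * f for w, f in zip(theta, features))

def max_slope(fn, points):
    """Largest |fn(b) - fn(a)| / |b - a| over consecutive 1-D sample points."""
    return max(abs(fn(b) - fn(a)) / (b - a) for a, b in zip(points, points[1:]))

def linear(x):
    return linear_reward([2.0], [x])   # hypothetical weight 2.0 on a single feature

def root(x):
    return math.sqrt(x)                # stand-in for a reward whose slope blows up at 0

slopes = {}
for scale in (1e-2, 1e-4, 1e-6):
    grid = [i * scale for i in range(11)]   # sample points packed ever closer to 0
    slopes[scale] = (max_slope(linear, grid), max_slope(root, grid))
    print(scale, slopes[scale])
```

The linear reward keeps the same slope however fine the grid is, whereas the square-root stand-in has a slope that keeps growing as the samples approach 0, which is the behavior the Lipschitz continuity condition rules out.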
- the weight initial value setting unit 40 initializes the weight of the reward function. Specifically, the weight initial value setting unit 40 sets the weight of each feature amount included in the reward function.
- the method of initializing the weight is not particularly limited, and the weight may be initialized based on an arbitrary method predetermined according to the user or the like.
- the mathematical optimization execution unit 50 derives the trajectory τ̂ (τ with a circumflex) that minimizes the distance between the probability distribution of the expert's trajectories (behavior history) and the probability distribution of the trajectories determined based on the parameters of the (optimized) reward function.
- specifically, the mathematical optimization execution unit 50 uses the Wasserstein distance instead of KL / JS divergence as the distance measure between distributions, and estimates the expert's trajectory τ̂ by executing mathematical optimization so as to minimize the Wasserstein distance.
- the Wasserstein distance is defined by Equation 8. The Wasserstein distance requires the cost function c_θ(τ) to satisfy the Lipschitz continuity condition. In the present embodiment, the feature amount setting unit 30 sets the feature amounts of the reward function so as to satisfy the Lipschitz continuity condition, so the mathematical optimization execution unit 50 can use the Wasserstein distance.
- the Wasserstein distance defined by Equation 8 takes a value of 0 or less, and increasing this value corresponds to bringing the distributions closer to each other.
- the argument of the cost function c_θ (that is, τ̂(θ, z^(i))) represents the i-th trajectory optimized under the parameter θ.
- z is a trajectory parameter.
- the second term of Equation 8 can be calculated even in a combinatorial optimization problem. Therefore, by using the Wasserstein distance of Equation 8 as the distance measure between distributions, inverse reinforcement learning can be carried out stably in a combinatorial optimization problem.
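Equation 8 is not reproduced in this text, so the sketch below assumes one form that is consistent with the description: the average of c_θ(τ̂(θ, z^(i))) minus the average of c_θ over the expert trajectories, which is never positive because τ̂ minimizes the cost. The item features, the "choose k items" problem, and the brute-force solver are all hypothetical stand-ins.

```python
import itertools

# Hypothetical two-dimensional features for five selectable items.
ITEMS = [(1.0, 4.0), (2.0, 1.0), (3.0, 3.0), (0.5, 2.5), (4.0, 0.5)]

def features(selection):
    """Feature vector of a selection: the sum of the chosen items' features."""
    return tuple(sum(ITEMS[i][d] for i in selection) for d in range(2))

def cost(theta, selection):
    """Linear cost c_theta(tau) = theta . f(tau)."""
    return sum(w * f for w, f in zip(theta, features(selection)))

def optimize(theta, k):
    """Brute-force stand-in for a mathematical optimization solver: cheapest k-subset."""
    return min(itertools.combinations(range(len(ITEMS)), k),
               key=lambda sel: cost(theta, sel))

def wasserstein_estimate(theta, expert, instances):
    """Assumed form: mean of c(optimized trajectory) - c(expert trajectory)."""
    terms = [cost(theta, optimize(theta, k)) - cost(theta, tau)
             for tau, k in zip(expert, instances)]
    return sum(terms) / len(terms)

instances = [1, 2, 3]                      # z^(i): how many items to select
true_theta = (1.0, 0.2)                    # weights that generated the expert data
expert = [optimize(true_theta, k) for k in instances]

w_wrong = wasserstein_estimate((0.2, 1.0), expert, instances)
w_true = wasserstein_estimate(true_theta, expert, instances)
print(w_wrong, w_true)
```

Only one optimization per instance is needed to evaluate the optimized term, with no sum over all feasible selections, which is why such a term remains computable when the feasible set is combinatorial.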
- when updating the parameters of the reward function, the weight update unit 60 may use an update rule based on a non-expansive map (hereinafter referred to as the non-expansive map gradient method) in order to increase the Wasserstein distance monotonically.
- let θ_t be the parameter of the reward function at the t-th update, W(θ_t) the Wasserstein distance, and α_t the step width.
- the update rule of the parameter of the reward function can then be expressed as Equation 12.
- the weight update unit 60 searches for a gradient step width that increases the Wasserstein distance under the constraint that the update rule of the parameters of the reward function (that is, θ_t → θ_{t+1}) is a non-expansive map, and updates the reward function parameters with that step width. Specifically, the weight update unit 60 updates the parameters of the reward function with a step width α_t that satisfies the conditions shown in Equations 13 and 14.
- Equations 13 and 14 refer to the previous update t-1 and require that the Wasserstein distance after the parameter update become larger (W(θ_{t+1}) > W(θ_t)).
- the estimation result by the mathematical optimization execution unit 50 may be discontinuous with respect to the change in the reward function.
- even so, the mathematical optimization execution unit 50 can update the parameters while guaranteeing the monotonic increase of the Wasserstein distance by using the non-expansive map gradient method described above.
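The idea of searching for a step width that guarantees an increase can be illustrated with plain backtracking: start from the previous step width and halve it until the objective goes up. This is a simplified stand-in, not the conditions of Equations 13 and 14 (which are not reproduced in this text), and the objective `toy_w` is a hypothetical piecewise-linear function chosen only because its gradient jumps, as a solver-based objective may.

```python
def toy_w(theta):
    """Hypothetical concave, piecewise-linear objective with maximum 0 at (1.0, -0.5)."""
    return -abs(theta[0] - 1.0) - 2.0 * abs(theta[1] + 0.5)

def toy_grad(theta):
    """A subgradient of toy_w (zero wherever a kink is hit exactly)."""
    def sign(v):
        return (v > 0) - (v < 0)
    return (-sign(theta[0] - 1.0), -2.0 * sign(theta[1] + 0.5))

def monotone_step(w, grad, theta, alpha, shrink=0.5, max_tries=30):
    """Return (new_theta, alpha) with w(new_theta) > w(theta), or (theta, 0.0) if none is found."""
    g = grad(theta)
    base = w(theta)
    for _ in range(max_tries):
        candidate = tuple(t + alpha * gi for t, gi in zip(theta, g))
        if w(candidate) > base:
            return candidate, alpha
        alpha *= shrink                     # step too long: shrink and retry
    return theta, 0.0

theta, alpha = (0.0, 0.0), 1.0
history = [toy_w(theta)]
for _ in range(20):
    theta, alpha = monotone_step(toy_w, toy_grad, theta, alpha)
    if alpha == 0.0:                        # no improving step exists: stop
        break
    history.append(toy_w(theta))
print(history)
```

Because each accepted step starts from the previously accepted step width and only ever shrinks it, the step widths are non-increasing and the recorded objective values increase at every update.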
- the trajectory estimation process by the mathematical optimization execution unit 50 and the parameter update process by the weight update unit 60 are repeated until the convergence determination unit 70 described later determines that the Wasserstein distance has converged.
- when the convergence determination unit 70 determines that the distance has not converged, the processes by the mathematical optimization execution unit 50 and the weight update unit 60 are continued. On the other hand, when the convergence determination unit 70 determines that the distance has converged, the processes by the mathematical optimization execution unit 50 and the weight update unit 60 are terminated.
- the output unit 80 outputs the learned reward function.
- FIG. 2 is an explanatory diagram showing an example of inverse reinforcement learning using the Wasserstein distance.
- the inverse reinforcement learning using the Wasserstein distance shown in the present disclosure may be referred to as Wasserstein IRL (WIRL).
- the parameters of the reward function are updated by performing mathematical optimization so as to maximize the Wasserstein distance. This process corresponds to the process of the weight update unit 60.
- the input unit 20, the feature amount setting unit 30, the weight initial value setting unit 40, the mathematical optimization execution unit 50, the weight update unit 60, the convergence determination unit 70, and the output unit 80 are realized by a computer processor (for example, a CPU (Central Processing Unit)) that operates according to a program (learning program).
- for example, the program may be stored in the storage unit 10 included in the learning device 100, and the processor may read the program and operate as the input unit 20, the feature amount setting unit 30, the weight initial value setting unit 40, the mathematical optimization execution unit 50, the weight update unit 60, the convergence determination unit 70, and the output unit 80 according to the program. Further, the functions of the learning device 100 may be provided in SaaS (Software as a Service) format.
- the input unit 20, the feature amount setting unit 30, the weight initial value setting unit 40, the mathematical optimization execution unit 50, the weight update unit 60, the convergence determination unit 70, and the output unit 80 may each be realized by dedicated hardware. Further, a part or all of each component of each device may be realized by a general-purpose or dedicated circuit (circuitry), a processor, or a combination thereof. These may be composed of a single chip or of a plurality of chips connected via a bus. A part or all of each component of each device may be realized by a combination of the above-mentioned circuits and the like and a program.
- when a part or all of each component of the learning device 100 is realized by a plurality of information processing devices and circuits, these information processing devices and circuits may be arranged centrally or in a distributed manner.
- for example, the information processing devices, circuits, and the like may be realized in a form in which they are connected to each other via a communication network, such as a client-server system or a cloud computing system.
- FIG. 3 is a flowchart showing an operation example of the learning device 100 of the present embodiment.
- the input unit 20 accepts input of expert data (that is, expert locus / decision-making history data) (step S11).
- the feature amount setting unit 30 sets the feature amount of the reward function from the data including the state and the action so as to satisfy the Lipschitz continuity condition (step S12).
- the weight initial value setting unit 40 initializes the weight (parameter) of the reward function (step S13).
- the mathematical optimization execution unit 50 accepts the input of the reward function whose feature amounts are set so as to satisfy the Lipschitz continuity condition (step S14). Then, the mathematical optimization execution unit 50 executes mathematical optimization so as to minimize the Wasserstein distance (step S15). Specifically, the mathematical optimization execution unit 50 estimates the trajectory that minimizes the Wasserstein distance, which represents the distance between the probability distribution of the expert's trajectories and the probability distribution of the trajectories determined based on the parameters of the reward function.
- the weight update unit 60 updates the parameters of the reward function so as to maximize the Wasserstein distance based on the estimated trajectory (step S16).
- the weight update unit 60 may update the parameters of the reward function using, for example, the non-expansive map gradient method.
- the convergence determination unit 70 determines whether or not the Wasserstein distance has converged (step S17). When it is determined that the Wasserstein distance has not converged (No in step S17), the processing from step S15 onward is repeated using the updated parameters. On the other hand, when it is determined that the Wasserstein distance has converged (Yes in step S17), the output unit 80 outputs the learned reward function (step S18).
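The flow of steps S11 to S18 can be sketched end to end on a toy selection problem. The sketch rests on several assumptions that are not in the source: the item features, the "choose k items" problem, the brute-force solver, the assumed distance form (mean optimized cost minus mean expert cost), and plain step halving in place of the conditions of Equations 13 and 14.

```python
import itertools

ITEMS = [(1.0, 4.0), (2.0, 1.0), (3.0, 3.0), (0.5, 2.5), (4.0, 0.5)]  # hypothetical

def features(sel):
    return [sum(ITEMS[i][d] for i in sel) for d in range(2)]

def cost(theta, sel):
    """Linear cost, so the Lipschitz condition holds (S12)."""
    return sum(w * f for w, f in zip(theta, features(sel)))

def optimize(theta, k):
    """S15: brute-force stand-in for a solver; cheapest k-subset under theta."""
    return min(itertools.combinations(range(len(ITEMS)), k),
               key=lambda sel: cost(theta, sel))

def distance_and_gradient(theta, expert, instances):
    """Assumed distance (always <= 0) and its subgradient with respect to theta."""
    estimated = [optimize(theta, k) for k in instances]
    n = len(instances)
    w = sum(cost(theta, e) - cost(theta, x) for e, x in zip(estimated, expert)) / n
    grad = [sum(features(e)[d] - features(x)[d] for e, x in zip(estimated, expert)) / n
            for d in range(2)]
    return w, grad

def learn(expert, instances, theta, alpha=1.0, tol=1e-9, max_iter=50):
    w, grad = distance_and_gradient(theta, expert, instances)
    history = [w]
    for _ in range(max_iter):
        step, new_theta, new_w, new_grad = alpha, theta, w, grad
        while step > 1e-6:                       # S16: shrink until the distance increases
            cand = [t + step * g for t, g in zip(theta, grad)]
            cand_w, cand_grad = distance_and_gradient(cand, expert, instances)
            if cand_w > w:
                new_theta, new_w, new_grad, alpha = cand, cand_w, cand_grad, step
                break
            step *= 0.5
        if new_w - w < tol:                      # S17: converged, stop iterating
            break
        theta, w, grad = new_theta, new_w, new_grad
        history.append(w)
    return theta, history                        # S18: learned weights

instances = [1, 2, 3]                                        # S11: problem instances
expert = [optimize((1.0, 0.2), k) for k in instances]        # S11: expert data
theta, history = learn(expert, instances, [0.2, 1.0])        # S13: initial weights
print(theta, history)
```

In this toy run the learned weights need not equal the weights that generated the expert data; the loop stops once the weights make the expert's choices optimal, at which point the assumed distance reaches 0.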
- as described above, in the present embodiment, the mathematical optimization execution unit 50 accepts the input of the reward function whose feature amounts are set so as to satisfy the Lipschitz continuity condition, and estimates the trajectory that minimizes the Wasserstein distance, which represents the distance between the probability distribution of the expert's trajectories and the probability distribution of the trajectories determined based on the parameters of the reward function. Then, the weight update unit 60 updates the parameters of the reward function so as to maximize the Wasserstein distance based on the estimated trajectory. Therefore, inverse reinforcement learning can be stably performed in a combinatorial optimization problem.
- FIG. 4 is a block diagram showing an outline of the learning device according to the present invention.
- the learning device 90 (for example, the learning device 100) according to the present invention includes: a function input means 91 (for example, the mathematical optimization execution unit 50) that accepts input of a reward function whose feature amounts are set so as to satisfy the Lipschitz continuity condition; an estimation means 92 (for example, the mathematical optimization execution unit 50) that estimates the trajectory that minimizes the Wasserstein distance, which represents the distance between the probability distribution of an expert's trajectories and the probability distribution of the trajectories determined based on the parameters of the reward function; and an updating means 93 (for example, the weight update unit 60) that updates the parameters of the reward function so as to maximize the Wasserstein distance based on the estimated trajectory.
- the updating means 93 may update the parameters of the reward function by using the non-expansive map gradient method, which is an update rule based on a non-expansive map.
- the function input means 91 may accept input of a reward function whose feature amounts are set so that the reward function is a linear function.
- FIG. 5 is a schematic block diagram showing a configuration of a computer according to at least one embodiment.
- the computer 1000 includes a processor 1001, a main storage device 1002, an auxiliary storage device 1003, and an interface 1004.
- the above-mentioned learning device 90 is mounted on the computer 1000.
- the operation of each of the above-mentioned processing units is stored in the auxiliary storage device 1003 in the form of a program (learning program).
- the processor 1001 reads a program from the auxiliary storage device 1003, expands it to the main storage device 1002, and executes the above processing according to the program.
- the auxiliary storage device 1003 is an example of a non-transitory tangible medium.
- other examples of non-transitory tangible media include magnetic disks, magneto-optical disks, CD-ROMs (Compact Disc Read-Only Memory), DVD-ROMs (Read-Only Memory), and semiconductor memories connected via the interface 1004.
- the program may be for realizing a part of the above-mentioned functions. Further, the program may be a so-called difference file (difference program) that realizes the above-mentioned functions in combination with another program already stored in the auxiliary storage device 1003.
- a learning device comprising: a function input means that accepts input of a reward function whose feature amounts are set so as to satisfy the Lipschitz continuity condition; an estimation means that estimates the trajectory that minimizes the Wasserstein distance, which represents the distance between the probability distribution of an expert's trajectories and the probability distribution of the trajectories determined based on the parameters of the reward function; and an updating means that updates the parameters of the reward function so as to maximize the Wasserstein distance based on the estimated trajectory.
- Appendix 2: The learning device according to Appendix 1, wherein the updating means updates the parameters of the reward function by using the non-expansive map gradient method, which is an update rule based on a non-expansive map.
- the learning device according to Appendix 1 or Appendix 2, wherein the updating means updates the parameters of the reward function with a step width equal to or less than the product of the step width at the time of the previous update and the ratio of the Wasserstein distance gradient at the time of the current update to the Wasserstein distance gradient at the time of the previous update, so that the Wasserstein distance after the parameter update becomes larger.
- a means for determining whether or not the Wasserstein distance has converged is provided. If it is determined that the Wasserstein distance has not converged, the estimation means estimates the trajectory that minimizes the Wasserstein distance, which represents the distance between the probability distribution of the expert's trajectories and the probability distribution of the trajectories determined based on the updated parameters of the reward function, and the updating means updates the parameters of the reward function so as to maximize the Wasserstein distance based on the estimated trajectory.
- Appendix 9: The program storage medium according to Appendix 8, storing a learning program that causes the computer to update the parameters of the reward function in the update process by using the non-expansive map gradient method, which is an update rule based on a non-expansive map.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Algebra (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/268,664 US20240037452A1 (en) | 2020-12-25 | 2020-12-25 | Learning device, learning method, and learning program |
| PCT/JP2020/048791 WO2022137520A1 (ja) | 2020-12-25 | 2020-12-25 | Learning device, learning method, and learning program |
| JP2022570960A JP7537517B2 (ja) | 2020-12-25 | 2020-12-25 | Learning device, learning method, and learning program |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/048791 WO2022137520A1 (ja) | 2020-12-25 | 2020-12-25 | Learning device, learning method, and learning program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022137520A1 (ja) | 2022-06-30 |
Family
ID=82157797
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/048791 Ceased WO2022137520A1 (ja) | Learning device, learning method, and learning program | 2020-12-25 | 2020-12-25 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US20240037452A1 |
| JP (1) | JP7537517B2 |
| WO (1) | WO2022137520A1 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
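| WO2024214164A1 (ja) * | 2023-04-11 | 2024-10-17 | NEC Corporation | Information processing device, learning method, and learning program |
| WO2024214163A1 (ja) * | 2023-04-11 | 2024-10-17 | NEC Corporation | Information processing device, learning method, and learning program |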
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018131214A1 (ja) * | 2017-01-13 | 2018-07-19 | Panasonic IP Management Co., Ltd. | Prediction device and prediction method |
| WO2019155052A1 (en) * | 2018-02-09 | 2019-08-15 | Deepmind Technologies Limited | Generative neural network systems for generating instruction sequences to control an agent performing a task |
| JP2020177016A (ja) * | 2019-04-16 | 2020-10-29 | Robert Bosch GmbH | Method for reducing exhaust gas emissions of a drive system of a vehicle having an internal combustion engine |
-
2020
- 2020-12-25 US US18/268,664 patent/US20240037452A1/en active Pending
- 2020-12-25 JP JP2022570960A patent/JP7537517B2/ja active Active
- 2020-12-25 WO PCT/JP2020/048791 patent/WO2022137520A1/ja not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| US20240037452A1 (en) | 2024-02-01 |
| JP7537517B2 (ja) | 2024-08-21 |
| JPWO2022137520A1 | 2022-06-30 |
Similar Documents
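| Publication | Publication Date | Title |
|---|---|---|
| EP3446260B1 (en) | | Memory-efficient backpropagation through time |
| JP2019164793A5 | | |
| JP7529145B2 (ja) | | Learning device, learning method, and learning program |
| KR102215978B1 (ko) | | System and method for asynchronous distributed parallel ensemble model training and inference on a blockchain network |
| CN112767230A (zh) | | GPU graph neural network optimization method and device |
| CN118760100B (zh) | | Large-scale fuzzy flexible job shop scheduling method and related equipment |
| WO2014199920A1 (ja) | | Prediction function creation device, prediction function creation method, and computer-readable recording medium |
| JP6201556B2 (ja) | | Prediction model learning device, prediction model learning method, and computer program |
| WO2022137520A1 (ja) | | Learning device, learning method, and learning program |
| CN118709448B (zh) | | Method for constructing a dynamic simulation optimization model based on digital twins |
| CN111985631B (zh) | | Information processing device, information processing method, and computer-readable recording medium |
| JP7455773B2 (ja) | | Solving device and program |
| JP2023174889A (ja) | | Learning device |
| Neumann et al. | | Sliding window 3-objective Pareto optimization for problems with chance constraints |
| JP7464115B2 (ja) | | Learning device, learning method, and learning program |
| WO2022230019A1 (ja) | | Learning device, learning method, and learning program |
| JPWO2012032747A1 (ja) | | Feature point selection system, feature point selection method, and feature point selection program |
| JP7529028B2 (ja) | | Learning device, learning method, and learning program |
| JP7555429B2 (ja) | | Gradient descent with near-zero learning rate |
| JPWO2020090076A1 (ja) | | Answer integration device, answer integration method, and answer integration program |
| JP7283548B2 (ja) | | Learning device, prediction system, method, and program |
| JP7420236B2 (ja) | | Learning device, learning method, and learning program |
| Venkatesh et al. | | i-qls: Quantum-supported algorithm for least squares optimization in non-linear regression |
| Coolen et al. | | Survival signature for reliability quantification of large systems and networks |
| US20230394970A1 (en) | | Evaluation system, evaluation method, and evaluation program |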
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20967006; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022570960; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 18268664; Country of ref document: US |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20967006; Country of ref document: EP; Kind code of ref document: A1 |