JP2022064643A

JP2022064643A - Model learning device, controller, model learning method and computer program

Info

Publication number: JP2022064643A
Application number: JP2020173380A
Authority: JP
Inventors: 竜大森安; Tatsuhiro Moriyasu; 太郎池田; Taro Ikeda; 幹人竹内; Mikito Takeuchi
Original assignee: Toyota Industries Corp; Toyota Central R&D Labs Inc
Current assignee: Toyota Industries Corp; Toyota Central R&D Labs Inc
Priority date: 2020-10-14
Filing date: 2020-10-14
Publication date: 2022-04-26
Anticipated expiration: 2040-10-14
Also published as: EP3985461A1; AU2021240175A1; US20220114461A1; JP7336425B2; AU2021240175B2

Abstract

To provide a technique for learning a model that can determine an input that improves the followability of an output to a target value while stably controlling a system, in a model learning device that learns a model representing the relationship between the input and the output of the system.

SOLUTION: A model learning device comprises a model storage unit that stores a model used for learning a nonlinear state equation for predicting an output variable y using an input variable v, and a learning unit that learns the state equation using the model and an input/output data set that includes a plurality of pairs of input variable data and output variable data for the model. The model is a state equation that includes a bijective map Ψ with the input variable v as input and a bijective map Φ with the output variable y as input.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a model learning device, a control device, a model learning method, and a computer program.

Model learning devices that learn a model representing the relationship between an input for controlling a system and the output of the system in response to that input are conventionally known. For example, Patent Document 1 describes a model learning device that uses machine learning to learn a model for model predictive control, which predicts and controls the future state of a system. Non-Patent Document 1 describes a technique for maximizing the output of a system by model predictive control using a special model.

Patent Document 1: Japanese Patent Application No. 2018-179888

Non-Patent Document 1: Yize Chen, Yuanyuan Shi, Baosen Zhang, "Optimal Control Via Neural Networks: A Convex Approach", [online], [retrieved September 28, 2020], Internet (URL: https://arxiv.org/abs/1805.1183)

However, even with the prior art described above, there is still room for improvement in techniques by which a model learning device learns a model from which a control device can be built that determines an input improving the followability of the output to a target value while controlling the system stably. In model predictive control using a model, a kind of optimization problem called an optimal control problem (OCP) is solved in every control cycle of the system. The optimal control problem exploits the fact that the future state of the system and the changes in its output can be predicted from the model, and finds the optimal time series of inputs so that the state and output changes of the system show the most desirable behavior. Specifically, it is solved as an optimization (minimization) problem that finds the time series of inputs minimizing an objective function set freely by the designer.

In the technique of Patent Document 1, a model learned by machine learning is strongly nonlinear, so the optimal control problem tends to become a non-convex optimization problem. The uniqueness of the solution therefore cannot be guaranteed. In addition, depending on the initial conditions that are set, irregular variations may occur in the input, which makes it difficult to ensure reliability. In the technique of Non-Patent Document 1, building the control device with a special model makes it possible to determine an input that maximizes or minimizes a certain output or state itself. However, when a target value is given for the output and the output is made to follow it, it is difficult to uniquely determine the input that minimizes the deviation of the output. Control that makes the output follow a target value therefore tends to become unstable.
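The dependence on initial conditions described above can be illustrated with a small sketch. The two objective functions below are hypothetical examples chosen only for illustration and do not come from Patent Document 1 or Non-Patent Document 1. Plain gradient descent on the non-convex objective ends at different minima depending on the starting point, whereas on the convex objective it reaches the same minimum from both starts.

```python
def gradient_descent(grad, x0, step=0.01, iters=5000):
    """Plain gradient descent on a scalar function, given its gradient."""
    x = float(x0)
    for _ in range(iters):
        x -= step * grad(x)
    return x


def nonconvex_grad(x):
    # Gradient of the hypothetical non-convex objective f(x) = x^4 - 3x^2 + x,
    # which has two separate local minima.
    return 4 * x**3 - 6 * x + 1


def convex_grad(x):
    # Gradient of the hypothetical convex objective g(x) = (x - 1)^2,
    # which has a single minimum at x = 1.
    return 2 * (x - 1)


starts = [-2.0, 2.0]  # two different initial conditions
nonconvex_solutions = [gradient_descent(nonconvex_grad, s) for s in starts]
convex_solutions = [gradient_descent(convex_grad, s) for s in starts]

print("non-convex:", nonconvex_solutions)  # two different minimizers
print("convex:    ", convex_solutions)     # the same minimizer twice
```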

The present invention has been made to solve the problems described above. Its object is to provide a technique by which a model learning device that learns a model representing the relationship between an input and an output of a system learns a model that can determine an input improving the followability of the output to a target value while controlling the system stably.

The present invention has been made to solve the problems described above and can be realized in the following forms.

(1) According to one form of the present invention, a model learning device is provided that learns a model representing the relationship between an input variable v input to a system and an output variable y output from the system. This model learning device includes a model storage unit that stores a model used for learning a nonlinear state equation for predicting the output variable y using the input variable v, and a learning unit that learns the state equation using the model and an input/output data set containing a plurality of pairs of input variable data and output variable data for the model. The model is a state equation that includes a bijective map Ψ with the input variable v as input and a bijective map Φ with the output variable y as input.

According to this configuration, the model is a state equation that includes a bijective map Ψ with the input variable v input to the system as input and a bijective map Φ with the output variable y output from the system as input. Such a state equation can be linearized by treating each of the maps Ψ and Φ as an internal variable, so the uniqueness of the solution can be guaranteed even in a control problem that uses a model with a nonlinear structure. As a result, the optimal value of the input variable v input to the system can be determined uniquely. When this model learning device is applied to a control device that controls the system, the optimal value of the input variable v can be used to improve the followability of the output of the system to the target value while controlling the system stably. It is therefore possible to learn a model from which a control device can be built that determines an input improving the followability of the output to the target value while controlling the system stably.

(2) In the model learning device of the above form, the model may be defined by Equation (1).

In Equation (1), the left-hand side is the time derivative of the n-dimensional vector (n is an integer) representing the output variable y. On the right-hand side, the input variable v is an m-dimensional vector (m is an integer); the exogenous input d is a p-dimensional vector (p is an integer) representing an uncontrollable input that affects changes in the output variable y; the map Ψ is a function that takes the input variable v and the exogenous input d as input and returns an m-dimensional vector; the map Φ is a function that takes the output variable y and the exogenous input d as input and returns an n-dimensional vector; and the functions A', B', and c' are functions that take the exogenous input d as input and return an n×n matrix, an n×m matrix, and an n-dimensional vector, respectively. According to this configuration, the maps Ψ and Φ are bijective maps with the input variable v and the output variable y as input, respectively, so Equation (1) can be formally rewritten, for example, by using functions F and G such that F^-1 = Ψ and G^-1 = Φ. Each of the maps Ψ and Φ included in the model of Equation (1) also takes the exogenous input d, an uncontrollable input that affects changes in the output variable y. Further, in the model of Equation (1), the functions A'(d) and B'(d), which take the exogenous input d as input, serve as the coefficients applied to the maps Ψ and Φ. The model of Equation (1) also includes the function c'(d), which takes the exogenous input d as input, and a term in the time derivative of the exogenous input d. For these reasons, the model of Equation (1) is a state equation that also accounts for the influence of the uncontrollable exogenous input d on changes in the output variable y, so the future state of the system can be predicted with high accuracy by using this model. It is therefore possible to learn a model that can control the system with high accuracy.

(3) In the model learning device of the above form, when the map Ψ in Equation (1) is defined as an internal variable u and the map Φ is defined as an internal variable x, the learning unit may learn the state equation defined by Equations (2) to (4).

According to this configuration, the state equation of Equation (1) can be linearized by defining the map Ψ as the internal variable u and the map Φ as the internal variable x. The state equation shown in Equation (1) thereby guarantees that the solution of an optimal control problem using it is unique. It is therefore possible to learn a model from which a control device can be built that determines an input improving the followability of the output to the target value while controlling the system stably.
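Equations (1) to (4) are not reproduced in this text. As a reading aid only, the following is one form that is consistent with the descriptions above: a linear relation in the internal variables, the coefficient functions A'(d), B'(d), and c'(d), and a term in the time derivative of d. It assumes that Φ is differentiable and that its Jacobian with respect to y is invertible. The numbering and the exact form of the original equations may differ.

```latex
% Assumed linear form in the internal variables u and x
\begin{aligned}
u &= \Psi(v, d), \\
x &= \Phi(y, d), \\
\dot{x} &= A'(d)\,x + B'(d)\,u + c'(d).
\end{aligned}

% Chain rule applied to x = \Phi(y, d)
\dot{x} = \frac{\partial \Phi}{\partial y}\,\dot{y} + \frac{\partial \Phi}{\partial d}\,\dot{d}

% Solving for \dot{y} gives a nonlinear state equation in y, v, and d
\dot{y} = \left(\frac{\partial \Phi}{\partial y}\right)^{-1}
\left[ A'(d)\,\Phi(y, d) + B'(d)\,\Psi(v, d) + c'(d)
       - \frac{\partial \Phi}{\partial d}\,\dot{d} \right]
```

Under these assumptions the dynamics are nonlinear in y and v but linear in x and u, which is the property the text relies on for uniqueness of the optimal control solution.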

(4) In the model learning device of the above form, the map Ψ may be defined by Equations (5) to (8).

The map Φ may be defined by Equations (9) to (12).

In Equations (5) to (12), i is the layer number in a multilayer neural network, L_Ψ and L_Φ are the numbers of layers of the multilayer neural networks, W_Ψ and W_Φ are weights, b_Ψ and b_Φ are biases, and ψ_Ψ and φ_Φ are activation functions, each of which is an arbitrary bijective map that returns an output of the same dimension as its input. According to this configuration, each of the maps Ψ and Φ is defined using a multilayer neural network. By adjusting the weights W_Ψ and W_Φ and the biases b_Ψ and b_Φ in each layer of the multilayer neural networks so that the output variable y computed by the model for the input variable v approaches the output of the actual system, a model that predicts the output of the actual system with high accuracy can be learned. It is therefore possible to learn a model from which a control device can be built that determines an input further improving the followability of the output to the target value.
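Equations (5) to (12) are not reproduced in this text, so the following is only a minimal sketch of how a multilayer network can be made bijective. It assumes, beyond what the text states, that every weight matrix is square and invertible, that the activation is a leaky ReLU (one example of a bijective activation that preserves dimension), and that the exogenous input d is omitted. The names `forward` and `inverse` are hypothetical.

```python
import numpy as np


def leaky_relu(z, slope=0.1):
    """Bijective activation: identity for z >= 0, scaled by `slope` otherwise."""
    return np.where(z >= 0, z, slope * z)


def leaky_relu_inv(a, slope=0.1):
    """Exact inverse of leaky_relu (the sign is preserved because slope > 0)."""
    return np.where(a >= 0, a, a / slope)


def forward(z, weights, biases):
    """Apply z <- activation(W z + b) layer by layer."""
    for W, b in zip(weights, biases):
        z = leaky_relu(W @ z + b)
    return z


def inverse(a, weights, biases):
    """Undo the layers in reverse order; requires each W to be invertible."""
    for W, b in zip(reversed(weights), reversed(biases)):
        a = np.linalg.solve(W, leaky_relu_inv(a) - b)
    return a


rng = np.random.default_rng(0)
dim, n_layers = 3, 4
# Assumption: identity plus a small perturbation keeps each W well conditioned.
weights = [np.eye(dim) + 0.1 * rng.standard_normal((dim, dim)) for _ in range(n_layers)]
biases = [rng.standard_normal(dim) for _ in range(n_layers)]

v = np.array([0.5, -1.2, 2.0])      # input variable
u = forward(v, weights, biases)     # internal variable u = Psi(v)
v_rec = inverse(u, weights, biases) # recovers v from u
print(np.allclose(v, v_rec))
```

Because every layer is invertible, the composition is invertible, so an input v can be recovered from any value of the internal variable u.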

(5) In the model learning device of the above form, the learning unit may learn the state equation by giving the model the set of input variable data in the input/output data set to estimate an output, evaluating the degree of agreement between the estimated output and the set of output variable data in the input/output data set, and updating the learning parameters of the model according to the result of the evaluation, for example, the weights W_Ψ and W_Φ and the biases b_Ψ and b_Φ included in Equations (5) to (12). According to this configuration, the learning unit evaluates the degree of agreement between the output estimated using the input variable data set of the input/output data set and the output variable data set. According to this evaluation, the learning unit updates the learning parameters of the model and learns the state equation. That is, the learning unit can learn the nonlinear state equation by a learning method that uses an input/output data set prepared in advance as teacher data. Since a model that matches the actual system can thereby be learned, it is possible to learn a model from which a control device can be built that controls the system even more stably and further improves the followability of the output of the system to the target value.
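The estimate, evaluate, and update cycle described above can be sketched as follows. This is a toy illustration under stated assumptions, not the learning procedure of the embodiment: the maps Ψ and Φ are taken as identity maps, the model is a hypothetical scalar one-step predictor y[k+1] = a*y[k] + b*v[k] + c, the degree of agreement is measured by mean squared error, and the update rule is plain gradient descent. The data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic input/output data set (hypothetical): inputs v, outputs y, next outputs.
N = 200
v = rng.uniform(-1.0, 1.0, N)
y = rng.uniform(-1.0, 1.0, N)
true_params = np.array([0.9, 0.5, 0.1])          # a, b, c used to generate data
features = np.column_stack([y, v, np.ones(N)])
y_next = features @ true_params


def loss_and_grad(params):
    """Estimate the output, evaluate the mismatch, and return its gradient."""
    residual = features @ params - y_next
    loss = np.mean(residual**2)
    grad = 2.0 * features.T @ residual / N
    return loss, grad


# Step size 1/L, where L is the largest eigenvalue of the (constant) Hessian.
hessian = 2.0 * features.T @ features / N
step = 1.0 / np.linalg.eigvalsh(hessian).max()

params = np.zeros(3)                              # learning parameters
losses = []
for _ in range(500):
    loss, grad = loss_and_grad(params)
    losses.append(loss)
    params = params - step * grad                 # update according to the evaluation

print(params, losses[-1])
```

In the configuration described in the text, the learning parameters would also include the weights and biases of the networks that define Ψ and Φ; the loop structure stays the same.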

(6) In the model learning device of the above form, the learning unit may learn the state equations shown in Equations (13) to (15), which are obtained by discretizing Equations (2) to (4) in time steps of a discrete time k.

According to this configuration, the learning unit learns the state equations shown in Equations (13) to (15), which are obtained by discretizing the state equations shown in Equations (2) to (4) in time steps of the discrete time k. Since the number of internal variables x and u thereby becomes finite, the time required to learn the model can be shortened. It is therefore possible to learn, in a relatively short time, a model from which a control device can be built that determines an input improving the followability of the output to the target value while controlling the system stably.
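Equations (13) to (15) are not reproduced in this text, and the discretization scheme is not stated. As one possible illustration, the sketch below applies a forward Euler step to an assumed continuous-time linear model in the internal variables, dx/dt = A x + B u + c, with the exogenous input d held constant so that A, B, and c are constant. The matrices and the step size `dt` are hypothetical values.

```python
import numpy as np


def euler_discretize(A, B, c, dt):
    """Forward Euler: x[k+1] = (I + dt*A) x[k] + dt*B u[k] + dt*c."""
    n = A.shape[0]
    return np.eye(n) + dt * A, dt * B, dt * c


# Hypothetical continuous-time coefficients (constant exogenous input assumed).
A = np.array([[-1.0, 0.2], [0.0, -0.5]])
B = np.array([[1.0], [0.5]])
c = np.array([0.1, 0.0])
dt = 0.01

Ad, Bd, cd = euler_discretize(A, B, c, dt)


def simulate(x0, u_seq):
    """Roll the discrete-time model forward over a sequence of inputs."""
    x = np.asarray(x0, dtype=float)
    traj = [x]
    for u in u_seq:
        x = Ad @ x + Bd @ u + cd
        traj.append(x)
    return np.array(traj)


traj = simulate([1.0, 0.0], [np.array([0.0])] * 100)
print(traj[1])  # state after one step
```

With a finite horizon, the discrete-time model has a finite number of internal variables x[k] and u[k], which is the property the text refers to.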

(7) According to another form of the present invention, a control device that controls a system is provided. This control device includes the model learning device described in (6) above and a determination unit that uses the state equation learned by the learning unit to determine the target value of the input variable v corresponding to the target value of the output variable y. The determination unit may determine the target value of the input variable v by solving an optimal control problem using the state equations shown in Equations (13) to (15) learned by the learning unit. According to this configuration, the determination unit determines the target value of the input variable v by solving an optimal control problem using the state equations shown in Equations (13) to (15) learned by the learning unit. By exploiting the fact that Equation (15) is a linear model, the optimal control problem using Equations (13) to (15) can be made a convex optimization problem. Since the optimal value of the input variable v input to the system can thereby be determined uniquely, the control device can improve the followability of the output of the system to the target value while controlling the system stably.
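The reason a linear model yields a convex problem with a unique solution can be sketched with a toy example. Everything below is an assumption made for illustration: a scalar discrete-time model x[k+1] = a*x[k] + b*u[k] + c in the internal variables, a quadratic tracking cost written in those internal variables with a small input penalty `rho`, and the bijective maps Φ(y) = sinh(y) and Ψ(v) = sinh(v). The objective function of the embodiment is not given in this text.

```python
import numpy as np

a, b, c = 0.9, 0.5, 0.0   # hypothetical learned scalar model coefficients
H, rho = 10, 1e-3         # prediction horizon and input penalty (assumed)


def prediction_matrices(x0):
    """Return G, f such that the stacked prediction x[1..H] equals G @ u + f."""
    G = np.zeros((H, H))
    f = np.zeros(H)
    for k in range(1, H + 1):
        f[k - 1] = a**k * x0 + c * sum(a**i for i in range(k))
        for j in range(k):
            G[k - 1, j] = a ** (k - 1 - j) * b
    return G, f


def cost(u, x0, x_ref):
    """Quadratic tracking cost in the internal variables."""
    G, f = prediction_matrices(x0)
    return np.sum((G @ u + f - x_ref) ** 2) + rho * np.sum(u**2)


def solve_ocp(x0, x_ref):
    """Unique minimizer: the Hessian G^T G + rho*I is positive definite."""
    G, f = prediction_matrices(x0)
    hessian = G.T @ G + rho * np.eye(H)
    return np.linalg.solve(hessian, G.T @ (x_ref - f))


y0, y_ref = 0.0, 1.0
x0, x_ref = np.sinh(y0), np.sinh(y_ref)   # map outputs to internal variables
u_opt = solve_ocp(x0, x_ref)
G, f = prediction_matrices(x0)
y_pred = np.arcsinh(G @ u_opt + f)        # map predictions back to outputs
v_opt = np.arcsinh(u_opt)                 # recover the actual inputs v
print(y_pred[-1], v_opt[0])
```

Because the cost is a strictly convex quadratic in the sequence u, a single linear solve gives its only minimizer, and the bijectivity of Ψ then yields exactly one input sequence v.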

(8) According to yet another form of the present invention, a model learning method is provided for learning a model representing the relationship between an input variable v input to a system and an output variable y output from the system. This model learning method includes a step of acquiring a model used for learning a nonlinear state equation for predicting the output variable y using the input variable v, and a step of learning the state equation using the model and an input/output data set containing a plurality of pairs of input variable data and output variable data for the model. The model is a state equation that includes a bijective map Ψ with the input variable v as input and a bijective map Φ with the output variable y as input. According to this configuration, the model acquired in the step of acquiring the model is a state equation that includes a bijective map Ψ with the input variable v input to the system as input and a bijective map Φ with the output variable y output from the system as input. Such a state equation can be linearized by treating each of the maps Ψ and Φ as an internal variable, so the uniqueness of the solution can be guaranteed even in a control problem that uses a model with a nonlinear structure. As a result, the optimal value of the input variable v input to the system can be determined uniquely. When this model learning method is applied to a control device that controls the system, the optimal value of the input variable v can be used to improve the followability of the output of the system to the target value while controlling the system stably. It is therefore possible to learn a model from which a control device can be built that determines an input improving the followability of the output to the target value while controlling the system stably.

(9) According to yet another form of the present invention, a computer program is provided that causes an information processing device to learn a model representing the relationship between an input variable v input to a system and an output variable y output from the system. This computer program causes the information processing device to execute a function of acquiring a model used for learning a nonlinear state equation for predicting the output variable y using the input variable v, and a function of learning the state equation using the model and an input/output data set containing a plurality of pairs of input variable data and output variable data for the model. The model is a state equation that includes a bijective map Ψ with the input variable v as input and a bijective map Φ with the output variable y as input. According to this configuration, the model acquired by the model acquiring function of the computer program is a state equation that includes a bijective map Ψ with the input variable v input to the system as input and a bijective map Φ with the output variable y output from the system as input. Such a state equation can be linearized by treating each of the maps Ψ and Φ as an internal variable, so the uniqueness of the solution can be guaranteed even in a control problem that uses a model with a nonlinear structure. As a result, the optimal value of the input variable v input to the system can be determined uniquely. When this computer program is applied to the information processing device of a control device that controls the system, the optimal value of the input variable v can be used to improve the followability of the output of the system to the target value while controlling the system stably. The information processing device can therefore learn a model from which a control device can be built that determines an input improving the followability of the output to the target value while controlling the system stably.

The present invention can be realized in various aspects, for example, as a device and method for learning a model of a nonlinear system, a device and method for estimating a state using a model obtained by learning, a system including these devices, a computer program executed in these devices and systems, a server device for distributing the computer program, or a non-transitory storage medium storing the computer program.

FIG. 1 is a schematic diagram showing the configuration of the model learning device of the first embodiment.
FIG. 2 is a flowchart of the model learning method of the first embodiment.
FIG. 3 is a schematic diagram showing the configuration of the control device of the second embodiment.
FIG. 4 is a flowchart of the predictive control method of the second embodiment.
FIG. 5 is a schematic diagram explaining an example of a convex function and a non-convex function.
FIG. 6 is a first schematic diagram explaining calculation results of the model learning device.
FIG. 7 is a second schematic diagram explaining calculation results of two model learning devices.

<First Embodiment>
FIG. 1 is a schematic diagram showing the configuration of the model learning device 100 of the first embodiment. The model learning device 100 of this embodiment learns a model of a nonlinear system. Here, a "nonlinear system" means a system in which the relationship between the input and output parameters of an arbitrary controlled object (system) cannot be expressed or approximated by a linear expression. In this embodiment, a nonlinear state equation is used as an example of the "model". That is, by regarding the state of an arbitrary system as the output variable y output from the system, the model learning device 100 learns a nonlinear state equation that predicts the output variable y of the system resulting from control by the input variable v input to the system. A "state equation" means an equation, such as "y·(t) = f(y(t), ...)", in which the output variable y(t) at the current time t determines its own time derivative y·(t). Hereinafter, for convenience of notation, the time derivative of an arbitrary variable z is written as "z·".

Examples of the system include an internal combustion engine, a hybrid engine, and a powertrain. When a drive engine such as an internal combustion engine, a hybrid engine, or a powertrain is the system, the model learned by the model learning device 100 is a nonlinear state equation representing the relationship among various parameters related to driving the system, for example, the operation amount of an actuator of the controlled unit, disturbances acting on the controlled unit, the state of the controlled unit, the output of the controlled unit, and the output target value of the controlled unit.

The model learning device 100 is, for example, a personal computer (PC), and includes a CPU 110, a storage unit 120, a ROM/RAM 130, a communication unit 140, and an input/output unit 150. The units of the model learning device 100 are connected to one another by a bus.

The CPU 110 includes a control unit 111 and a learning unit 112. The control unit 111 controls each unit of the model learning device 100 by loading a computer program stored in the ROM 130 into the RAM 130 and executing it. The learning unit 112 learns a nonlinear state equation for predicting the output variable y representing the state of an arbitrary system (nonlinear system). The functions of the learning unit 112 are described in detail later.

The storage unit 120 is a storage medium composed of a hard disk, a flash memory, a memory card, or the like. The storage unit 120 has a model storage unit 121 and a data set storage unit 122. The model storage unit 121 stores in advance the model used by the learning unit 112 to learn the state equation. In this embodiment, the model stored in the model storage unit 121 is a state equation that includes a bijective map Ψ with the input variable v as input and a bijective map Φ with the output variable y as input, and is defined by Equation (1). Here, "bijective" means that, when the image of a set A under the map is a set B, the elements of A and B always correspond one to one. This is equivalent to saying, for example, that when a function f is bijective, a unique inverse function f^-1 exists.

In Equation (1), the left-hand side is the time derivative of the n-dimensional vector (n is an integer) representing the output variable y. On the right-hand side, the input variable v is an m-dimensional vector (m is an integer); the exogenous input d is a p-dimensional vector (p is an integer) representing an uncontrollable input that affects changes in the output variable y; the map Ψ is a function that takes the input variable v and the exogenous input d as input and returns an m-dimensional vector; the map Φ is a function that takes the output variable y and the exogenous input d as input and returns an n-dimensional vector; and the functions A', B', and c' are functions that take the exogenous input d as input and return an n×n matrix, an n×m matrix, and an n-dimensional vector, respectively.

データセット記憶部１２２は、式（１）で表されるモデルに対する入力変数データと出力変数データの組を複数含んでいる入出力データセットを予め記憶している。この入力変数データと出力変数データの組は、システムに対する実験や計算により予め求められている。入出力データセットは、学習部１１２による状態方程式の学習のために用いられる教師データとして用いられる。以降、入出力データセットのうち、複数の入力変数データをまとめて「入力変数データセット」とも呼び、複数の出力変数データをまとめて「出力変数データセット」とも呼ぶ。 The data set storage unit 122 stores in advance an input / output data set including a plurality of sets of input variable data and output variable data for the model represented by the equation (1). This set of input variable data and output variable data has been obtained in advance by experiments and calculations on the system. The input / output data set is used as teacher data used for learning the equation of state by the learning unit 112. Hereinafter, among the input / output data sets, a plurality of input variable data are collectively referred to as an "input variable data set", and a plurality of output variable data are collectively referred to as an "output variable data set".

通信部１４０は、モデル学習装置１００と他の装置との間における、通信インターフェースを介した通信を制御する。他の装置としては、例えば、システムを制御する制御装置や、他の情報処理装置、および、データセット記憶部１２２から入出力データセットを取得するための計測器などが挙げられる。入出力部１５０は、モデル学習装置１００と、利用者との間の情報の入出力に使用される種々のインターフェースである。入出力部１５０としては、例えば、入力部としてのタッチパネル、キーボード、マウス、操作ボタン、マイクや、出力部としてのタッチパネル、モニタ、スピーカー、ＬＥＤ（ＬｉｇｈｔＥｍｉｔｔｉｎｇＤｉｏｄｅ）インジケータなどが挙げられる。 The communication unit 140 controls communication via the communication interface between the model learning device 100 and other devices. Examples of the other device include a control device for controlling the system, another information processing device, and a measuring instrument for acquiring an input / output data set from the data set storage unit 122. The input / output unit 150 is various interfaces used for input / output of information between the model learning device 100 and the user. Examples of the input / output unit 150 include a touch panel as an input unit, a keyboard, a mouse, operation buttons, a microphone, a touch panel as an output unit, a monitor, a speaker, an LED (Light Emitting Diode) indicator, and the like.

図２は、第１実施形態のモデル学習方法のフローチャートである。モデル学習装置１００におけるモデル学習方法は、例えば、所定のアプリケーションの起動などの利用者からの要求などによって実行される。本実施形態では、式（１）に示す状態方程式において、出力変数ｙ、入力変数ｖ、システムにおける外生入力ｄ、出力変数ｙの時間微分ｙ・、および、外生入力ｄの時間微分ｄ・を含む既知の入出力データセットを用いて、式（１６）に示す関数Ｆの関数形を学習（推定）する。ここで、出力変数ｙは、ｎ次元ベクトルであり、入力変数ｖは、ｍ次元ベクトルであり、外生入力ｄは、ｐ次元ベクトルである。

FIG. 2 is a flowchart of the model learning method of the first embodiment. The model learning method in the model learning device 100 is executed, for example, in response to a request from a user, such as the start of a predetermined application. In the present embodiment, for the equation of state shown in equation (1), the functional form of the function F shown in equation (16) is learned (estimated) using a known input/output data set that includes the output variable y, the input variable v, the exogenous input d of the system, the time derivative ẏ of the output variable y, and the time derivative ḋ of the exogenous input d. Here, the output variable y is an n-dimensional vector, the input variable v is an m-dimensional vector, and the exogenous input d is a p-dimensional vector.

最初に、学習部１１２は、モデル記憶部１２１に記憶されているモデルを取得する（ステップＳ１１）。具体的には、学習部１１２は、関数Ｆを学習するためのモデルを式（１）に示す状態方程式と想定する。学習部１１２は、式（１）に示す状態方程式において、各変数の値をゼロまたはランダムな値とすることで、各変数を初期化する。

First, the learning unit 112 acquires the model stored in the model storage unit 121 (step S11). Specifically, the learning unit 112 assumes that the model for learning the function F is the equation of state shown in the equation (1). The learning unit 112 initializes each variable by setting the value of each variable to zero or a random value in the equation of state shown in the equation (1).

本実施形態では、学習部１１２は、式（１）に含まれる写像Ψを式（２）で示す内部変数ｕと定義し、式（１）に含まれる写像Φを式（３）で示す内部変数ｘと定義する。これにより、学習部１１２は、式（１）を内部変数ｕ、ｘで示した式（４）の状態方程式を学習することとなる。式（１）の状態方程式に含まれる写像Φ、Ψのそれぞれを内部変数ｘ、ｕのそれぞれで定義する効果は、後述する。

In the present embodiment, the learning unit 112 defines the map Ψ included in equation (1) as the internal variable u shown in equation (2), and defines the map Φ included in equation (1) as the internal variable x shown in equation (3). As a result, the learning unit 112 learns the equation of state of equation (4), which is equation (1) expressed in terms of the internal variables u and x. The effect of defining the maps Φ and Ψ included in the equation of state of equation (1) as the internal variables x and u, respectively, is described later.
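
The equation images are not reproduced in this text. As a sketch of the substitution just described, and assuming that equation (4) is linear in x and u with the coefficients A'(d), B'(d), and c'(d) defined for equation (1), the two forms can be related by the chain rule as follows. The exact layout of equations (1) to (4) in the published specification may differ.

```latex
\begin{align}
u &= \Psi(v, d), \qquad x = \Phi(y, d), \\
\dot{x} &= A'(d)\,x + B'(d)\,u + c'(d), \\
\dot{x} &= \frac{\partial \Phi}{\partial y}\,\dot{y}
         + \frac{\partial \Phi}{\partial d}\,\dot{d}
\;\Longrightarrow\;
\dot{y} = \left(\frac{\partial \Phi}{\partial y}\right)^{-1}
\left[ A'(d)\,\Phi(y, d) + B'(d)\,\Psi(v, d) + c'(d)
     - \frac{\partial \Phi}{\partial d}\,\dot{d} \right].
\end{align}
```

The last line depends only on y, v, d, and the time derivative of d, which matches the argument list of the input variable data set used in step S13.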

さらに、本実施形態では、学習部１１２は、多層ニューラルネットワークの考え方を用いて、写像Ψについての式（５）～式（８）を定義する。

Further, in the present embodiment, the learning unit 112 defines equations (5) to (8) for the mapping Ψ using the concept of the multi-layer neural network.

また、本実施形態では、学習部１１２は、写像Ψについての式（５）～式（８）と同様に、多層ニューラルネットワークの考え方を用いて、写像Φについての式（９）～式（１２）を定義する。

ここで、ｉは、多層ニューラルネットワークにおける層の番号であり、Ｌ_Ψ、Ｌ_Φのそれぞれは、多層ニューラルネットワークの層数であり、Ｗ_Ψ、Ｗ_Φのそれぞれは重みであり、ｂ_Ψ、ｂ_Φはバイアスであり、ψ_Ψ、φ_Φのそれぞれは、活性化関数であり、入力と同次元の出力を返す任意の全単射な写像である。重みＷ_Ψ、Ｗ_Φ、バイアスｂ_Ψ、ｂ_Φ、活性関数、ψ_Ψ、φ_Φのそれぞれは、多層ニューラルネットワークの層ごとに設定されてもよい。 Further, in the present embodiment, the learning unit 112 defines equations (9) to (12) for the map Φ using the concept of a multilayer neural network, in the same way as equations (5) to (8) for the map Ψ.

Here, i is the layer index in the multilayer neural network; L_Ψ and L_Φ are the numbers of layers of the multilayer neural networks; W_Ψ and W_Φ are weights; b_Ψ and b_Φ are biases; and ψ_Ψ and φ_Φ are activation functions, each being an arbitrary bijective map that returns an output of the same dimension as its input. The weights W_Ψ, W_Φ, the biases b_Ψ, b_Φ, and the activation functions ψ_Ψ, φ_Φ may each be set for each layer of the multilayer neural network.
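
The following is a minimal sketch of how a stack of layers with bijective activations can be inverted layer by layer. It is an illustration under stated assumptions, not the specification's equations (5) to (12): the weights are assumed to be square and invertible, the activation is a leaky ReLU chosen only because it is strictly increasing, the dependence on the exogenous input d is omitted, and all names (`forward`, `inverse`, `LAYERS`) are hypothetical.

```python
def leaky_relu(z, slope=0.1):
    """Strictly increasing on the reals, hence bijective (assumed activation)."""
    return z if z >= 0.0 else slope * z


def leaky_relu_inv(a, slope=0.1):
    """Exact inverse of leaky_relu."""
    return a if a >= 0.0 else a / slope


def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def solve2(W, r):
    """Solve W z = r for a 2x2 matrix with non-zero determinant."""
    (a, b), (c, d) = W
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def forward(layers, v):
    """Apply each layer: h <- activation(W h + b)."""
    h = list(v)
    for W, b in layers:
        z = [zi + bi for zi, bi in zip(matvec(W, h), b)]
        h = [leaky_relu(zi) for zi in z]
    return h


def inverse(layers, u):
    """Undo the layers in reverse order: h <- W^{-1}(activation^{-1}(h) - b)."""
    h = list(u)
    for W, b in reversed(layers):
        z = [leaky_relu_inv(hi) for hi in h]
        h = solve2(W, [zi - bi for zi, bi in zip(z, b)])
    return h


# Hypothetical two-layer map with invertible 2x2 weights.
LAYERS = [
    ([[1.0, 0.5], [-0.3, 2.0]], [0.1, -0.2]),
    ([[0.8, -0.4], [0.6, 1.5]], [0.0, 0.3]),
]
v = [0.7, -1.2]
u = forward(LAYERS, v)       # plays the role of u = Psi(v)
v_back = inverse(LAYERS, u)  # recovers v from u
```

Because every layer is invertible, the composed map is bijective, which is the property the text requires of Ψ and Φ.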

次に、学習部１１２は、データセット記憶部１２２から、出力変数ｙ、入力変数ｖ、外生入力ｄ、出力変数ｙの時間微分ｙ・、外生入力ｄの時間微分ｄ・についての入出力データセット［ｙ、ｖ、ｄ、ｙ・、ｄ・］を取得する（ステップＳ１２）。本実施形態では、入出力データセット［ｙ、ｖ、ｄ、ｙ・、ｄ・］の各データは、ｊ組（ｊは自然数、ｊ＝１～Ｎ）含まれている。取得した入出力データセットのうち、［ｙ_j、ｖ_j、ｄ_j、ｄ・_j］は、入力変数データセットに相当し、［ｙ・_j］は、出力変数データセットに相当する。 Next, the learning unit 112 acquires, from the data set storage unit 122, the input/output data set [y, v, d, ẏ, ḋ] for the output variable y, the input variable v, the exogenous input d, the time derivative ẏ of the output variable y, and the time derivative ḋ of the exogenous input d (step S12). In the present embodiment, the input/output data set [y, v, d, ẏ, ḋ] contains N sets of data, indexed by j (j is a natural number, j = 1 to N). Of the acquired input/output data set, [y_j, v_j, d_j, ḋ_j] corresponds to the input variable data set, and [ẏ_j] corresponds to the output variable data set.

次に、学習部１１２は、モデルに入力データセットを与えて出力を推定する（ステップＳ１３）。具体的には、学習部１１２は、ステップＳ１１で取得し初期化した式（１）の状態方程式に対して、ステップＳ１２で取得した入力変数データセット［ｙ_j、ｖ_j、ｄ_j、ｄ・_j］を与える。これにより、出力変数ｙ・ｊの推定値（式（１７）の左辺）を得ることができる。なお、（∂Φ／∂ｙ）^-1は、出力変数ｙおよび外生入力ｄの関数であるため、出力変数ｙ_jおよび外生入力ｄ_jを代入することで評価可能である。また、式（１７）の右辺の（∂Φ／∂ｄ）は、入力変数ｖおよび外生入力ｄの関数であるため、入力変数ｖ_jおよび外生入力ｄ_jを代入することで評価可能である。

Next, the learning unit 112 gives the input variable data set to the model and estimates the output (step S13). Specifically, the learning unit 112 gives the input variable data set [y_j, v_j, d_j, ḋ_j] acquired in step S12 to the equation of state of equation (1) acquired and initialized in step S11. As a result, an estimate of the output variable ẏ_j (the left side of equation (17)) is obtained. Since (∂Φ/∂y)^-1 is a function of the output variable y and the exogenous input d, it can be evaluated by substituting the output variable y_j and the exogenous input d_j. Likewise, since (∂Φ/∂d) on the right side of equation (17) is a function of the input variable v and the exogenous input d, it can be evaluated by substituting the input variable v_j and the exogenous input d_j.
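
As a rough illustration of step S13, the sketch below evaluates an estimate of ẏ for one scalar sample. Everything specific here is an assumption made for the example: the maps `phi` and `psi`, the constant coefficients `A`, `B`, `C` standing in for A'(d), B'(d), c'(d), and the chain-rule form used to combine them. The specification's equation (17) is not reproduced in this text.

```python
import math

# Hypothetical constant stand-ins for A'(d), B'(d), c'(d).
A, B, C = -1.0, 1.0, 0.0


def phi(y, d):
    """Hypothetical map Phi(y, d); strictly increasing in y, so bijective in y."""
    return math.sinh(y) + d


def dphi_dy(y, d):
    """Partial derivative of phi with respect to y; cosh is never zero."""
    return math.cosh(y)


def dphi_dd(y, d):
    """Partial derivative of phi with respect to d."""
    return 1.0


def psi(v, d):
    """Hypothetical map Psi(v, d); strictly increasing in v."""
    return v ** 3 + v


def estimate_ydot(y, v, d, d_dot):
    """Estimate the time derivative of y from one sample (y, v, d, d_dot)."""
    x_dot = A * phi(y, d) + B * psi(v, d) + C  # linear in x = Phi and u = Psi
    return (x_dot - dphi_dd(y, d) * d_dot) / dphi_dy(y, d)


sample = estimate_ydot(y=0.0, v=1.0, d=0.0, d_dot=0.0)
sample_with_drift = estimate_ydot(y=0.0, v=1.0, d=0.0, d_dot=0.5)
```

In the vector case the division becomes a linear solve with the Jacobian ∂Φ/∂y, which is why the text notes that its inverse must be evaluable from y_j and d_j.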

次に、学習部１１２は、推定された出力と出力変数データセットとの一致度を評価する（ステップＳ１４）。具体的には、学習部１１２は、ステップＳ１３で得られた出力変数ｙ・_jの推定値と、ステップＳ１２で取得した出力変数データセット［ｙ・_j］との一致度を評価する。学習部１１２は、例えば、式（１８）に示す二乗平均誤差（ＭＳＥ：ＭｅａｎＳｑｕａｒｅＥｒｒｏｒ）を一致度の指標として用いることができる。ＭＳＥの場合、等号の左辺Ｊの値が小さければ小さいほど、一致度が高い。なお、学習部１１２は、二乗平均誤差の代わりに、例えば、絶対平均誤差率や、交差エントロピーなどの指標を用いて、一致度を評価してもよい。

Next, the learning unit 112 evaluates the degree of agreement between the estimated output and the output variable data set (step S14). Specifically, the learning unit 112 evaluates the degree of agreement between the estimate of the output variable ẏ_j obtained in step S13 and the output variable data set [ẏ_j] acquired in step S12. For example, the learning unit 112 can use the mean squared error (MSE) shown in equation (18) as an index of the degree of agreement. With the MSE, the smaller the value of J on the left side, the higher the degree of agreement. Instead of the mean squared error, the learning unit 112 may evaluate the degree of agreement using another index such as the mean absolute error rate or the cross entropy.
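
For reference, a mean squared error over scalar samples can be computed as in the sketch below. The function name is hypothetical, and whether equation (18) carries an additional constant factor is not visible in this text, so the plain average of squared differences is assumed.

```python
def mean_squared_error(estimates, targets):
    """Average of squared differences between estimates and targets."""
    if len(estimates) != len(targets) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(estimates)


# Three estimated derivatives compared with three recorded derivatives.
J = mean_squared_error([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

For vector-valued outputs the squared difference would be replaced by the squared norm of the difference for each sample.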

次に、学習部１１２は、一致度が十分であるか否かを判定する（ステップＳ１５）。例えば、式（１８）のＭＳＥを用いる場合、学習部１１２は、Ｊの値が所定値以下である場合に、一致度が十分であると判定できる。なお、学習部１１２は、Ｊの値の変化率が所定値以下である場合に、一致度が十分であると判定してもよい。所定値は任意に決定できる。 Next, the learning unit 112 determines whether or not the degree of agreement is sufficient (step S15). For example, when the MSE of the equation (18) is used, the learning unit 112 can determine that the degree of agreement is sufficient when the value of J is equal to or less than a predetermined value. The learning unit 112 may determine that the degree of agreement is sufficient when the rate of change of the value of J is equal to or less than a predetermined value. The predetermined value can be arbitrarily determined.

一致度が十分でない場合（ステップＳ１５：ＮＯ）、学習部１１２は、ステップＳ１６に進み、ステップＳ１１で定義した式（１）のモデルにおける、例えば、式（１）に含まれる関数Ａ’、関数Ｂ’、関数ｃ’、式（５）～式（１２）に含まれる重みＷ_Ψ、Ｗ_Φやバイアスｂ_Ψ、ｂ_Φなどの学習パラメータを更新する。学習部１１２は、例えば、バックプロパゲーションにより各学習パラメータに対するＪの勾配を評価し、各種の勾配法に基づいて、各学習パラメータを更新してもよい。その後、学習部１１２は、ステップＳ１３に進み、出力の推定および評価を繰り返す。 When the degree of agreement is not sufficient (step S15: NO), the learning unit 112 proceeds to step S16 and updates the learning parameters of the model of equation (1) defined in step S11, for example, the functions A', B', and c' included in equation (1) and the weights W_Ψ, W_Φ and biases b_Ψ, b_Φ included in equations (5) to (12). The learning unit 112 may, for example, evaluate the gradient of J with respect to each learning parameter by backpropagation and update each learning parameter using any of various gradient methods. The learning unit 112 then returns to step S13 and repeats the estimation and evaluation of the output.
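
The loop over steps S13 to S16 can be sketched as plain gradient descent. This is a simplified stand-in, not the specification's procedure: the maps Φ and Ψ are assumed to be the identity so that only three scalar parameters `a`, `b`, `c` are learned, the gradients are written out by hand instead of using backpropagation, and the data are synthetic.

```python
def train(data, lr=0.05, tol=1e-10, max_iter=20000):
    """Fit ydot = a*y + b*v + c to (y, v, ydot) samples by gradient descent."""
    a = b = c = 0.0  # S11: initialise the learning parameters
    n = len(data)
    J = float("inf")
    for _ in range(max_iter):
        # S13: estimate the output for every sample and form the residuals.
        residuals = [a * y + b * v + c - ydot for y, v, ydot in data]
        # S14: evaluate the degree of agreement with the mean squared error.
        J = sum(r * r for r in residuals) / n
        # S15: stop when the agreement is sufficient.
        if J <= tol:
            break
        # S16: update the parameters along the negative gradient of J.
        ga = 2.0 * sum(r * y for r, (y, v, _) in zip(residuals, data)) / n
        gb = 2.0 * sum(r * v for r, (y, v, _) in zip(residuals, data)) / n
        gc = 2.0 * sum(residuals) / n
        a, b, c = a - lr * ga, b - lr * gb, c - lr * gc
    return a, b, c, J


# Synthetic data generated from known (hypothetical) parameters.
TRUE = (-0.5, 2.0, 0.3)
samples = [(y, v) for y in (-1.0, 0.0, 1.0) for v in (-1.0, 0.5, 2.0)]
DATA = [(y, v, TRUE[0] * y + TRUE[1] * v + TRUE[2]) for y, v in samples]
a, b, c, J = train(DATA)
```

The stopping test uses an absolute threshold on J; the text also allows stopping when the rate of change of J falls below a threshold.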

一致度が十分である場合（ステップＳ１５：ＹＥＳ）、学習部１１２は、処理を終了する。この際、学習部１１２は、学習した関数Ｆについて、入出力部１５０に出力してもよく、記憶部１２０に記憶してもよく、通信部１４０を介して他の装置に送信してもよい。 When the degree of agreement is sufficient (step S15: YES), the learning unit 112 ends the process. At this time, the learning unit 112 may output the learned function F to the input/output unit 150, store it in the storage unit 120, or transmit it to another device via the communication unit 140.

本実施形態のモデル学習装置１００がシステムの操作量を制御する制御装置と組み合わされている場合、モデル学習装置１００は、学習部１１２において学習した関数Ｆを制御装置に出力する。制御装置では、出力された関数Ｆを用いて、システムの現在時刻の出力から、将来の出力を制御するための入力を計算する。制御装置は、計算された入力をシステムに出力し、システムを制御する。 When the model learning device 100 of the present embodiment is combined with a control device that controls the operation amount of the system, the model learning device 100 outputs the function F learned by the learning unit 112 to the control device. The control device uses the output function F to calculate an input for controlling future outputs from the output of the system at the current time. The control device outputs the calculated input to the system and controls the system.

次に、図２で説明したモデル学習方法で学習されるモデル（状態方程式）において、解の一意性を保証できる理由について説明する。一般に、過渡的な現象を再現できる動的なモデルをニューラルネットワーク（機械学習）で構築する場合、当該モデルが安定である、言い換えれば、発散しない、保証はない。しかし、上述した式（１）に示した状態方程式を、出力変数ｙを写像Φによって変換した内部変数ｘを用いることで等価変換した式（４）は、内部変数ｘについて線形な微分方程式を含んでいる。このとき、入力変数ｖを写像Ψを用いて変換した内部変数ｕも同様に、微分方程式の線形項となる、写像Φ、Ψのそれぞれは、全単射な写像であるため、一意の逆関数が存在する。すなわち、内部変数ｘと出力変数ｙ、および、入力変数ｖと内部変数ｕのそれぞれは、相互に変換が可能であることから、線形化された式（４）を解くことで、非線形の式（１）の解を求めることができる。したがって、モデル学習装置１００を備える制御装置は、図２で説明したモデル学習方法で学習されるモデルを用いて、システムを安定的に制御しつつ、システムからの出力の目標値に対する追従性を向上することができる。 Next, the reason why the uniqueness of the solution can be guaranteed in the model (equation of state) learned by the model learning method described with reference to FIG. 2 is explained. In general, when a dynamic model capable of reproducing transient phenomena is constructed with a neural network (machine learning), there is no guarantee that the model is stable, in other words, that it does not diverge. However, equation (4), obtained by equivalently transforming the equation of state of equation (1) using the internal variable x, which is the output variable y transformed by the map Φ, is a differential equation that is linear in the internal variable x. The internal variable u, which is the input variable v transformed by the map Ψ, likewise appears as a linear term of the differential equation. Since the maps Φ and Ψ are each bijective, each has a unique inverse function. That is, the internal variable x and the output variable y, and the input variable v and the internal variable u, can be converted into each other, so the solution of the nonlinear equation (1) can be obtained by solving the linearized equation (4). Therefore, a control device including the model learning device 100 can, using the model learned by the model learning method described with reference to FIG. 2, improve the followability of the output from the system to the target value while stably controlling the system.
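
A small worked example may make the argument concrete. The sketch below, under assumed scalar maps and constant coefficients (all hypothetical), solves the linear internal equation for the input that holds a target output at steady state and then maps that input back through the inverse of Ψ. Because Ψ is bijective, exactly one value of v results.

```python
import math

# Hypothetical constant stand-ins for A'(d), B'(d), c'(d).
A, B, C = -2.0, 1.0, 0.5


def phi(y):
    """Hypothetical bijective map from the output y to the internal variable x."""
    return y + y ** 3


def psi(v):
    """Hypothetical bijective map from the input v to the internal variable u."""
    return math.sinh(v)


def psi_inv(u):
    """Unique inverse of psi."""
    return math.asinh(u)


def steady_state_input(y_target):
    """Input v that makes the time derivative of x zero at the target output."""
    x_t = phi(y_target)
    u_t = -(A * x_t + C) / B  # solve the linear equation 0 = A x + B u + C
    return psi_inv(u_t)       # convert the internal input back to v


v_star = steady_state_input(1.0)
residual = A * phi(1.0) + B * psi(v_star) + C
```

The linear solve happens entirely in the internal variables; the nonlinearity is confined to the two invertible changes of variable.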

以上説明した、本実施形態のモデル学習装置１００によれば、モデルは、システムに入力される入力変数ｖを入力とする全単射な写像Ψと、システムから出力される出力変数ｙを入力とする全単射な写像Φと、を含む状態方程式である。このような状態方程式は、写像Ψ、Φのそれぞれを内部変数とすることで、線形化することができるため、非線形な構造をしているモデルを用いた制御問題においても、解が一意であることを保証することができる。これにより、システムに入力される入力変数ｖの最適値を１つに決めることができるため、システムを制御する制御装置にこのモデル学習装置１００を適用した場合、入力変数ｖの最適値を用いて、システムを安定的に制御しつつ、システムからの出力の目標値に対する追従性を向上することができる。したがって、システムを安定的に制御しつつ出力の目標値に対する追従性を向上させる入力を決定することができるモデルを学習することができる。 According to the model learning device 100 of the present embodiment described above, the model is an equation of state including a bijective map Ψ that takes as input the input variable v supplied to the system, and a bijective map Φ that takes as input the output variable y produced by the system. Such an equation of state can be linearized by treating the maps Ψ and Φ as internal variables, so uniqueness of the solution can be guaranteed even in a control problem that uses a model with a nonlinear structure. As a result, the optimum value of the input variable v supplied to the system can be determined as a single value. Therefore, when the model learning device 100 is applied to a control device that controls the system, the followability of the output from the system to the target value can be improved, using the optimum value of the input variable v, while the system is stably controlled. Accordingly, it is possible to learn a model that can determine an input that improves the followability of the output to the target value while stably controlling the system.

また、一般的に、機械学習を用いて学習されたモデルは、比較的非線形性が強いため、このモデルを用いて予測される出力を何らかの目標に適切に追従させる最適制御問題は、非凸最適化問題になりやすい。このため、その問題を解く際の初期条件によって、得られる解が大きく変化してしまう可能性があり、入力のばたつきなどの信頼性問題に繋がるため、最適解を得ることが非常に難しい。本実施形態のモデル学習装置１００は、解が一意であることを保証することができるため、システムの出力（状態）の目標値に追従させる制御問題に対応する最適制御問題を、凸最適化問題とすることができる。これにより、解が、初期条件によらず最適な一意となることが保証されるため、システムを安定的に制御しつつ、システムからの出力の目標値に対する追従性を向上することができる。 In addition, a model learned by machine learning generally has relatively strong nonlinearity, so the optimal control problem of making the output predicted by the model follow some target appropriately tends to become a non-convex optimization problem. In that case, the solution obtained may vary greatly depending on the initial conditions used when solving the problem, which leads to reliability problems such as fluttering of the input, and it is very difficult to obtain the optimal solution. Because the model learning device 100 of the present embodiment can guarantee that the solution is unique, the optimal control problem corresponding to the control problem of making the output (state) of the system follow its target value can be formulated as a convex optimization problem. This guarantees that the solution is the unique optimum regardless of the initial conditions, so the followability of the output from the system to the target value can be improved while the system is stably controlled.

また、本実施形態のモデル学習装置１００によれば、式（１）のモデルに含まれる写像Ψ、Φのそれぞれには、出力変数ｙの変化に影響を与える制御不可能な入力である外生入力ｄが含まれている。また、式（１）のモデルでは、外生入力ｄを入力とする関数Ａ’（ｄ）と関数Ｂ’（ｄ）とのそれぞれが写像Ψ、Φのそれぞれの係数となっている。さらに、式（１）のモデルには、外生入力ｄを入力とする関数ｃ’（ｄ）と、外生入力ｄの時間微分の項と、が含まれている。これらによって、式（１）のモデルは、出力変数ｙの変化に影響を与える制御不可能な外生入力ｄによる影響も考慮した状態方程式となるため、このモデルを用いることで、システムの将来の状態を高精度に予測することができる。したがって、システムを高精度に制御することができる制御装置を構築可能なモデルを学習することができる。 Further, according to the model learning device 100 of the present embodiment, each of the maps Ψ and Φ included in the model of equation (1) takes as an input the exogenous input d, which is an uncontrollable input that affects changes in the output variable y. In the model of equation (1), the functions A'(d) and B'(d), which take the exogenous input d as an input, serve as the coefficients of the maps Ψ and Φ. The model of equation (1) further includes the function c'(d), which takes the exogenous input d as an input, and a term in the time derivative of the exogenous input d. As a result, the model of equation (1) is an equation of state that also accounts for the influence of the uncontrollable exogenous input d on changes in the output variable y, so the future state of the system can be predicted with high accuracy by using this model. Therefore, it is possible to learn a model from which a control device capable of controlling the system with high accuracy can be constructed.

また、本実施形態のモデル学習装置１００によれば、式（１）の状態方程式において、写像Ψを内部変数ｕと定義し、写像Φを内部変数ｘと定義することで、式（４）に示すように、状態方程式を線形化することができる。これにより、式（１）に示す状態方程式において、解が一意であることを保証することができる。したがって、システムを安定的に制御しつつ出力の目標値に対する追従性を向上させる入力を決定することができる制御装置を構築可能なモデルを学習することができる。 Further, according to the model learning device 100 of the present embodiment, in the equation of state of the equation (1), the map Ψ is defined as the internal variable u, and the map Φ is defined as the internal variable x, so that the equation (4) is obtained. As shown, the equation of state can be linearized. This makes it possible to guarantee that the solution is unique in the equation of state shown in the equation (1). Therefore, it is possible to learn a model capable of constructing a control device capable of determining an input that improves followability to an output target value while stably controlling the system.

また、本実施形態のモデル学習装置１００によれば、写像Ψ、Φのそれぞれは、多層ニューラルネットワークを用いて定義されている（式（５）～式（１２））。これにより、多層ニューラルネットワークの各層における重みＷ_Ψ、Ｗ_Φやバイアスｂ_Ψ、ｂ_Φを調整することで、モデルを用いて計算される入力変数ｖの入力によるシステムの出力が実際の値に近づけることができる。したがって、出力の目標値に対する追従性をさらに向上させる入力を決定することができる制御装置を構築可能なモデルを学習することができる。 Further, according to the model learning device 100 of the present embodiment, each of the maps Ψ and Φ is defined using a multilayer neural network (equations (5) to (12)). Accordingly, by adjusting the weights W_Ψ, W_Φ and the biases b_Ψ, b_Φ in each layer of the multilayer neural network, the system output produced by the input variable v calculated using the model can be brought closer to the actual value. Therefore, it is possible to learn a model from which a control device can be constructed that determines an input that further improves the followability of the output to the target value.

また、本実施形態のモデル学習装置１００によれば、学習部１１２は、入出力データセットのうちの入力変数データセットを用いて推定された出力と、出力変数データセットとの一致度を評価する。学習部１１２は、この一致度の評価に応じて、モデルについての学習パラメータを更新し、状態方程式を学習する。すなわち、学習部１１２は、予め準備された入出力データセットを教師データとした学習手法に沿って、非線形の状態方程式を学習することができる。これにより、実際のシステムに沿ったモデルを学習することができるため、システムをさらに安定的に制御しつつ、システムからの出力の目標値に対する追従性がさらに向上させる制御装置を構築可能なモデルを学習することができる。 Further, according to the model learning device 100 of the present embodiment, the learning unit 112 evaluates the degree of agreement between the output estimated using the input variable data set of the input/output data set and the output variable data set. The learning unit 112 updates the learning parameters of the model according to this evaluation and learns the equation of state. That is, the learning unit 112 can learn the nonlinear equation of state by a learning method that uses the input/output data set prepared in advance as teacher data. Since a model that reflects the actual system can thus be learned, it is possible to learn a model from which a control device can be constructed that further improves the followability of the output from the system to the target value while controlling the system more stably.

また、本実施形態のモデル学習方法によれば、モデルを取得するステップＳ１１において取得するモデルは、システムに入力される入力変数ｖを入力とする全単射な写像Ψと、システムから出力される出力変数ｙを入力とする全単射な写像Φと、を含む状態方程式である。このような状態方程式は、写像Ψ、Φのそれぞれを内部変数ｕ、ｘとすることで、線形化することができるため、非線形な構造をしているモデルを用いた制御問題においても、解が一意であることを保証することができる。これにより、システムに入力される入力変数ｖの最適値を１つに決めることができるため、システムを制御する制御装置にこのモデル学習方法を適用した場合、入力変数ｖの最適値を用いて、システムを安定的に制御しつつ、システムからの出力の目標値に対する追従性を向上することができる。したがって、システムを安定的に制御しつつ出力の目標値に対する追従性を向上させる入力を決定することができる制御装置を構築可能なモデルを学習することができる。 Further, according to the model learning method of the present embodiment, the model acquired in step S11 is an equation of state including a bijective map Ψ that takes as input the input variable v supplied to the system, and a bijective map Φ that takes as input the output variable y produced by the system. Such an equation of state can be linearized by treating the maps Ψ and Φ as the internal variables u and x, respectively, so uniqueness of the solution can be guaranteed even in a control problem that uses a model with a nonlinear structure. As a result, the optimum value of the input variable v supplied to the system can be determined as a single value. Therefore, when this model learning method is applied to a control device that controls the system, the followability of the output from the system to the target value can be improved, using the optimum value of the input variable v, while the system is stably controlled. Accordingly, it is possible to learn a model from which a control device can be constructed that determines an input that improves the followability of the output to the target value while stably controlling the system.

＜第２実施形態＞
図３は、第２実施形態の制御装置２００の構成を示す模式図である。第２実施形態の制御装置２００は、学習部２１２と決定部２１３を有するＣＰＵ２１０を備える。 <Second Embodiment>
FIG. 3 is a schematic diagram showing the configuration of the control device 200 of the second embodiment. The control device 200 of the second embodiment includes a CPU 210 having a learning unit 212 and a determination unit 213.

制御装置２００は、車載ＥＣＵ（ＥｌｅｃｔｒｏｎｉｃＣｏｎｔｒｏｌＵｎｉｔ）として実現され得る。本実施形態の制御装置２００は、制御装置２００をシステム３００の制御のために用いることができる。システム３００とは、第１実施形態と同様に、例えば、内燃機関、ハイブリッド機関、パワートレインなどである。なお、制御装置２００は、例えば、パーソナルコンピュータであって、システム３００の分析のために用いてもよい。 The control device 200 can be realized as an in-vehicle ECU (Electronic Control Unit). In the control device 200 of the present embodiment, the control device 200 can be used for controlling the system 300. The system 300 is, for example, an internal combustion engine, a hybrid engine, a power train, or the like, as in the first embodiment. The control device 200 may be, for example, a personal computer and may be used for analysis of the system 300.

制御装置２００は、ＣＰＵ２１０と、記憶部１２０と、ＲＯＭ／ＲＡＭ１３０と、通信部１４０と、入出力部１５０と、を備えている。制御装置２００の各部は、バスにより相互に接続されている。なお、制御装置２００の機能部のうちの少なくとも一部は、ＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃａｔｉｏｎＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）により実現されてもよい。 The control device 200 includes a CPU 210, a storage unit 120, a ROM/RAM 130, a communication unit 140, and an input/output unit 150. The units of the control device 200 are connected to each other by a bus. At least some of the functional units of the control device 200 may be realized by an ASIC (Application Specific Integrated Circuit).

ＣＰＵ２１０は、制御部１１１と、学習部２１２と、決定部２１３と、を備えている。制御部１１１は、第１実施形態の制御部１１１と同様に、ＲＯＭ１３０に格納されているコンピュータプログラムをＲＡＭ１３０に展開して実行することにより、モデル学習装置１００の各部を制御する。学習部２１２は、後述する予測制御方法において、システム３００の状態を表す出力変数ｙを予測するための非線形の状態方程式を学習する。決定部２１３は、学習部２１２が学習した状態方程式を用いて、出力変数ｙの目標値に対応する入力変数ｖの目標値を決定する。 The CPU 210 includes a control unit 111, a learning unit 212, and a determination unit 213. Similar to the control unit 111 of the first embodiment, the control unit 111 controls each unit of the model learning device 100 by expanding and executing the computer program stored in the ROM 130 in the RAM 130. The learning unit 212 learns a non-linear equation of state for predicting the output variable y representing the state of the system 300 in the prediction control method described later. The determination unit 213 determines the target value of the input variable v corresponding to the target value of the output variable y by using the equation of state learned by the learning unit 212.

図４は、第２実施形態の予測制御方法のフローチャートである。システム３００の予測制御方法は、例えば、所定のアプリケーションの起動などの利用者からの要求などによって実行される。 FIG. 4 is a flowchart of the predictive control method of the second embodiment. The predictive control method of the system 300 is executed, for example, by a request from a user such as starting a predetermined application.

最初に、学習部２１２は、モデル、目的関数、および、制約関数を取得する（ステップＳ２１）。具体的には、学習部２１２は、モデル記憶部１２１に記憶されている非線形の状態方程式を読み込むとともに、システム３００を最適に制御するための目的関数Ｊと、制約関数Ｇとを読み込む。本実施形態では、学習部２１２は、式（２）～式（４）を、離散時刻ｋにおいて所定の時間ステップΔｔで離散化した式（１３）～式（１５）に示す状態方程式を読み込む。

式（１５）に含まれるＡ（ｄ_k）、Ｂ（ｄ_k）、Ｃ（ｄ_k）のそれぞれは、例えば、式（２）～式（４）の関数Ａ’（ｄ）、関数Ｂ’（ｄ）、および、関数ｃ’（ｄ）を用いて、以下の式（１９）～式（２１）としてもよい。

First, the learning unit 212 acquires the model, the objective function, and the constraint function (step S21). Specifically, the learning unit 212 reads the nonlinear equation of state stored in the model storage unit 121, and also reads the objective function J and the constraint function G for optimally controlling the system 300. In the present embodiment, the learning unit 212 reads the equations of state shown in the equations (13) to (15) in which the equations (2) to (4) are discretized in the predetermined time step Δt at the discrete time k.

A(d_k), B(d_k), and C(d_k) included in equation (15) may be given, for example, by the following equations (19) to (21) using the functions A'(d), B'(d), and c'(d) of equations (2) to (4).
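
Equations (19) to (21) are not reproduced in this text. One common way to obtain discrete-time coefficients from a continuous-time linear equation with time step Δt is the forward Euler scheme, shown below as an assumption rather than as the specification's own choice; other schemes, such as an exact matrix-exponential discretization, would give different expressions.

```latex
\begin{align}
x_{k+1} &= A(d_k)\,x_k + B(d_k)\,u_k + C(d_k), \\
A(d_k) &= I + \Delta t\, A'(d_k), \qquad
B(d_k) = \Delta t\, B'(d_k), \qquad
C(d_k) = \Delta t\, c'(d_k).
\end{align}
```

Under this assumption the discrete model stays linear in x_k and u_k, which is the property the optimal control problem relies on.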

次に、学習部２１２は、現在時刻の最適制御問題のパラメータを決定する（ステップＳ２２）。具体的には、学習部２１２は、現在時刻を時刻ｋとして、システム３００の各所に事前に設けられているセンサなどから取得された出力変数ｙ_k、制御入力ｖ_k-1、外生入力ｄ_k、および、目標値ｙ_ktを読み込む。学習部２１２は、式（１３）～式（１５）を用いて、内部変数ｘ_k、内部変数ｘ_kの目標値ｘ_kt、および、内部変数ｕ_k-1を計算する。 Next, the learning unit 212 determines the parameters of the optimal control problem at the current time (step S22). Specifically, taking the current time as time k, the learning unit 212 reads the output variable y_k, the control input v_k-1, the exogenous input d_k, and the target value y_kt acquired from sensors or the like installed in advance at various points of the system 300. Using equations (13) to (15), the learning unit 212 calculates the internal variable x_k, the target value x_kt of the internal variable x_k, and the internal variable u_k-1.

次に、決定部２１３は、最適化の初期入力時系列を読み込む（ステップＳ２３）。具体的には、決定部２１３は、離散時刻ｋを起点として、時刻ｋ_f=ｋ＋Ｎ（Ｎは所定の自然数）までの入力時系列ｕ_k、・・・ｕ_kfの初期値を決定する。 Next, the determination unit 213 reads the initial input time series for the optimization (step S23). Specifically, the determination unit 213 determines the initial values of the input time series u_k, ..., u_kf from the discrete time k up to the time k_f = k + N (N is a predetermined natural number).

次に、決定部２１３は、最適制御問題を解く（ステップＳ２４）。具体的には、決定部２１３は、式（２２）、（２３）に示す最適制御問題を解く。

ｘκ（κ＝ｋ、・・・ｋ_f＋１）は、式（１５）に従う。ｇは、ｘ_k、・・・ｘ_kf+1、ｕ_k-1、・・・ｕ_kfに対して凸となる任意のスカラー関数である。制約関数Ｇは、ｘ_k、・・・ｘ_kf+1、ｕ_k-1、・・・ｕ_kfに対して凸となる任意のベクトル関数である。Ｑは、ｎ×ｎの正定値対称行列であり、目標値ｘ_ktは、離散時刻ｋにおけるｘの目標値であり、離散時刻ｋにおける出力変数ｙの目標値ｙ_ktからｘ_kt＝Φ（ｙ_kt、ｄ_k）によって変換されたものである。 Next, the determination unit 213 solves the optimal control problem (step S24). Specifically, the determination unit 213 solves the optimal control problem shown in the equations (22) and (23).

x_κ (κ = k, ..., k_f + 1) follows equation (15). g is an arbitrary scalar function that is convex with respect to x_k, ..., x_kf+1 and u_k-1, ..., u_kf. The constraint function G is an arbitrary vector function that is convex with respect to x_k, ..., x_kf+1 and u_k-1, ..., u_kf. Q is an n × n positive definite symmetric matrix. The target value x_kt is the target value of x at the discrete time k, obtained from the target value y_kt of the output variable y at the discrete time k by the transformation x_kt = Φ(y_kt, d_k).

式（２２）および式（２３）に示す最適制御問題では、目的関数Ｊを最小化するｕκ（κ＝ｋ、・・・ｋ_f）の時系列を求める。このとき、式（２２）に含まれる式（２４）を小さくするため、目標値に素早く追従するようなｕκ（κ＝ｋ、・・・ｋ_f）でなければならない。そのため、式（２４）を含む目的関数Ｊを最小化するｕκ（κ＝ｋ、・・・ｋ_f）の解は、目標値に素早く追従させる制御を実現するものとなっている。

In the optimal control problem shown in equations (22) and (23), the time series of u_κ (κ = k, ..., k_f) that minimizes the objective function J is obtained. To make the term of equation (24) contained in equation (22) small, u_κ (κ = k, ..., k_f) must make the output follow the target value quickly. Therefore, the solution u_κ (κ = k, ..., k_f) that minimizes the objective function J containing equation (24) realizes control that follows the target value quickly.

スカラー関数ｇは、副次的な機能を持たせるために自由に設定できる。例えば、次のように設定してもよい。

Ｒ、Ｓのそれぞれは、ｍ×ｍの正定値対称行列である。式（２５）に含まれる式（２６）は、内部変数ｕが０に近いほど小さくなり、式（２５）に含まれる式（２７）は、内部変数ｕの時間的な変化が小さいほど小さくなる。これにより、目的関数Ｊを最小化する解は、内部変数ｕをできるだけ０に近づけ、かつ、内部変数ｕをできるだけ変化させないものとなる。

The scalar function g can be freely set to have a secondary function. For example, the following may be set.

R and S are each an m × m positive definite symmetric matrix. The term of equation (26) contained in equation (25) becomes smaller as the internal variable u approaches 0, and the term of equation (27) contained in equation (25) becomes smaller as the temporal change of the internal variable u becomes smaller. As a result, the solution that minimizes the objective function J keeps the internal variable u as close to 0 as possible and changes the internal variable u as little as possible.
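
Equations (22) and (24) to (27) are not reproduced in this text. A quadratic objective consistent with the description (a tracking term weighted by Q, an input-magnitude term weighted by R, and an input-rate term weighted by S, over the index ranges mentioned above) could take the following form. This is an assumed reconstruction for orientation only; the summation limits and any scaling factors in the published equations may differ.

```latex
\begin{align}
J &= \sum_{\kappa=k}^{k_f}
     \left(x_{\kappa+1} - x_{kt}\right)^{\top} Q \left(x_{\kappa+1} - x_{kt}\right)
     + g, \\
g &= \sum_{\kappa=k}^{k_f}
     \left[\, u_{\kappa}^{\top} R\, u_{\kappa}
     + \left(u_{\kappa} - u_{\kappa-1}\right)^{\top} S \left(u_{\kappa} - u_{\kappa-1}\right) \right].
\end{align}
```

With Q, R, and S positive definite and the dynamics linear in x and u, every term is a convex quadratic in the decision variables, so J is convex.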

ベクトル関数である制約関数Ｇには、所望の制約条件を設定できる。例えば、次のように設定してもよい。

式（２８）は、以下の式（２９）に示す上下限制約を表す。

決定部２１３は、以上の問題を解いて、内部変数ｕ_kを求めれば、そこから式（１３）を用いて、入力変数ｖ_kの目標値を決定することができる。 A desired constraint condition can be set in the constraint function G, which is a vector function. For example, the following may be set.

Equation (28) represents the upper and lower limit constraints shown in the following equation (29).

By solving the above problem and obtaining the internal variable u_k, the determination unit 213 can determine the target value of the input variable v_k from u_k using equation (13).
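
Equations (28) and (29) are not reproduced in this text. As an assumed illustration of how upper and lower limits can be written as a vector constraint function, box limits on the internal input (the bounds u_min and u_max are hypothetical symbols) can be stacked as follows. Whether the specification bounds u, x, or both is not visible here.

```latex
\begin{align}
G(u_{\kappa}) =
\begin{bmatrix} u_{\kappa} - u_{\max} \\ u_{\min} - u_{\kappa} \end{bmatrix}
\le 0
\quad\Longleftrightarrow\quad
u_{\min} \le u_{\kappa} \le u_{\max}.
\end{align}
```

Each component of G is linear in u_κ and therefore convex, as the text requires of the constraint function.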

図５は、凸関数と非凸関数の一例を説明する模式図である。ここで、凸関数とは、任意の０＜ｔ＜１、および、任意のｘ、ｙに対して、以下の式（３０）が成り立つ関数のことを言う。

直感的には、図５（ａ）に示すような形の関数が凸関数であり、図５（ｂ）に示すような形の関数が、非凸関数である。凸関数の場合、最適値（図５（ａ）では最小値Ｌ０）を、一意に決定することができる。しかしながら、非凸関数の場合、図５（ｂ）に示すように、局所的に最小値となる値が複数（図５（ｂ）の場合、値Ｌ１、Ｌ２、Ｌ３、Ｌ４、Ｌ５、Ｌ６）存在するため、最適値が決定されるとは限らない。 FIG. 5 is a schematic diagram illustrating an example of a convex function and a non-convex function. Here, the convex function means a function in which the following equation (30) holds for any 0 <t <1 and any x, y.

Intuitively, a function shaped as in FIG. 5(a) is a convex function, and a function shaped as in FIG. 5(b) is a non-convex function. For a convex function, the optimum (the minimum value L0 in FIG. 5(a)) can be determined uniquely. For a non-convex function, however, as shown in FIG. 5(b), there are multiple local minima (the values L1, L2, L3, L4, L5, and L6 in FIG. 5(b)), so the optimum cannot always be determined.
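
Equation (30) is not reproduced in this text. The standard definition of a convex function f, which matches the quantifiers given above (any 0 < t < 1 and any x, y), is:

```latex
\begin{align}
f\bigl(t\,x + (1 - t)\,y\bigr) \;\le\; t\,f(x) + (1 - t)\,f(y).
\end{align}
```

Geometrically, the chord between any two points on the graph lies on or above the graph, which rules out the separate valleys drawn in FIG. 5(b).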

ステップＳ２４では、ステップＳ２２で決定した条件において、ステップＳ２３の初期値を用いて、式（２２）、（２３）の最適制御問題を解く。この問題は、たとえば逐次二次計画法などの数理計画法を用いて解くことができる。 In step S24, under the conditions determined in step S22, the optimum control problem of the equations (22) and (23) is solved by using the initial value of step S23. This problem can be solved using a mathematical programming method such as a sequential quadratic programming method.
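
To illustrate why convexity makes the result independent of the initial input sequence read in step S23, the sketch below solves a small scalar tracking problem from two very different starting points. It is an illustration only: plain gradient descent is swapped in for the sequential quadratic programming mentioned in the text, no constraints are imposed, and the coefficients, horizon, and function names are all hypothetical.

```python
def rollout(x0, u_seq, a, b, c):
    """Simulate x_{i+1} = a*x_i + b*u_i + c and return all states."""
    xs = [x0]
    for u in u_seq:
        xs.append(a * xs[-1] + b * u + c)
    return xs


def cost_and_grad(x0, u_seq, x_t, a, b, c, q, r):
    """Quadratic tracking cost and its gradient with respect to each input."""
    xs = rollout(x0, u_seq, a, b, c)
    cost = sum(q * (x - x_t) ** 2 for x in xs[1:]) + sum(r * u * u for u in u_seq)
    grad = [0.0] * len(u_seq)
    p = 0.0  # sensitivity of the cost to the state that follows input i
    for i in reversed(range(len(u_seq))):
        p = 2.0 * q * (xs[i + 1] - x_t) + a * p
        grad[i] = 2.0 * r * u_seq[i] + b * p
    return cost, grad


def solve(x0, u_init, x_t, a=0.9, b=0.5, c=0.0, q=1.0, r=0.1, lr=0.1, iters=2000):
    """Minimise the cost by fixed-step gradient descent from u_init."""
    u = list(u_init)
    for _ in range(iters):
        _, g = cost_and_grad(x0, u, x_t, a, b, c, q, r)
        u = [ui - lr * gi for ui, gi in zip(u, g)]
    return u


# Two different initial input sequences for the same problem.
u_from_zero = solve(0.0, [0.0] * 5, x_t=1.0)
u_from_far = solve(0.0, [5.0, -5.0, 3.0, -3.0, 1.0], x_t=1.0)
gap = max(abs(p - s) for p, s in zip(u_from_zero, u_from_far))
```

Both runs arrive at the same input sequence because the cost is a strictly convex quadratic in the inputs; with a non-convex model the two runs could settle in different local minima.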

次に、得られた解をシステム３００への入力として反映する（ステップＳ２５）。具体的には、制御部１１１は、ステップＳ２４で得られるｕ_k、・・・ｕ_kfの最適解と、式（１３）のΨを用いて、ｖ_k、・・・ｖ_kfに変換し、このうちのｖ_kを実際の制御入力ｖ_kとする。 Next, the obtained solution is reflected as an input to the system 300 (step S25). Specifically, the control unit 111 converts the optimum solution of u _k , ... u _kf obtained in step S24 into v _k , ... v _kf using Ψ of the equation (13). Of these, v _k is the actual control input v _k .

Next, the control unit 111 determines whether to end the control (step S26). Specifically, the control unit 111 makes this determination according to whether an external signal for ending the control has been received. If the control unit 111 has received the external signal, it outputs the predicted control input v_k to the outside and ends the current control process. The output may be sent to the input/output unit 150, stored in the storage unit 120, or transmitted via the communication unit 140 to another device such as the calling ECU. If the control unit 111 has not received the external signal, the process proceeds to step S27.

If the control unit 211 does not receive the external signal in step S26, the control unit 111 advances the time and returns to step S22 (step S27). Steps S22 to S25 are then repeated, and in step S26 it is again determined whether the control unit 211 has received an external signal for terminating the control.

FIG. 6 is a first schematic diagram illustrating calculation results of the model learning device 100. The following describes the results of a calculation in which the model learning device 100 of the first embodiment was used to predict inputs from the outputs of a virtual system. FIG. 6 shows how a plurality of outputs of the virtual system change over time in this calculation. The time changes of four types of outputs ("output 1", "output 2", "output 3", and "output 4") are shown by the solid lines OP1, OP2, OP3, and OP4. Outputs 1, 2, and 3 are outputs of different types, and a target value is set for each of them (see the dotted lines Do1, Do2, and Do3). For output 4, an upper limit constraint is indicated by the dotted line Do4.

FIG. 7 is a second schematic diagram illustrating calculation results of two model learning devices. FIG. 7 shows the inputs calculated so that the virtual system produces the four types of outputs shown in FIG. 6. The time changes of the three types of inputs ("input 1", "input 2", and "input 3") calculated using the model learning device of the present embodiment are shown inside the area enclosed by the dash-dot line, and those calculated using the model learning device of a comparative example are shown inside the area enclosed by the dash-double-dot line. Unlike the model learning device of the present embodiment, the model learning device of the comparative example does not use bijective maps as the maps in its model that take the input variable and the output variable as inputs.

Inputs 1 to 3 shown in FIG. 7 are results calculated under a plurality of different initial conditions for the four types of outputs shown in FIG. 6. In the model learning device of the comparative example, the values of inputs 1 to 3 each vary with the initial conditions; input 2 alone, for example, is unstable and scattered. In the prediction processing of the comparative example, it is therefore difficult to settle on a single input that realizes outputs 1 to 4. In the model learning device of the present embodiment, by contrast, the values of inputs 1 to 3 do not vary even when the initial conditions differ. That is, the input can be determined uniquely, so the input is stable.
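
The dependence on initial conditions described above can be illustrated with a toy example. The Python sketch below runs plain gradient descent from several starting points on one convex and one non-convex scalar cost. Both cost functions, the step size, and the starting points are hypothetical and are not taken from the calculation in FIG. 7.

```python
def gradient_descent(grad, start, step=0.02, iterations=2000):
    """Plain gradient descent on a scalar variable."""
    u = start
    for _ in range(iterations):
        u -= step * grad(u)
    return u


def convex_grad(u):
    # Gradient of the convex cost (u - 1)^2, whose only minimum is u = 1.
    return 2.0 * (u - 1.0)


def nonconvex_grad(u):
    # Gradient of the non-convex cost (u^2 - 1)^2, with minima at u = -1 and u = 1.
    return 4.0 * u * (u * u - 1.0)


starts = [-2.0, -0.5, 0.5, 2.0]  # different initial conditions
convex_results = [round(gradient_descent(convex_grad, s), 6) for s in starts]
nonconvex_results = [round(gradient_descent(nonconvex_grad, s), 6) for s in starts]

print(convex_results)
print(nonconvex_results)
```

For the convex cost every start reaches the same minimizer, whereas for the non-convex cost the result is -1 or 1 depending on the start. This mirrors, in a much simpler setting, the contrast between the present embodiment and the comparative example.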

According to the control device 200 of the present embodiment described above, the model acquired by the learning unit 212 is an equation of state that includes a bijective map Ψ taking as its input the input variable v input to the system 300, and a bijective map Φ taking as its input the output variable y output from the system 300. Such an equation of state can be linearized by treating each of the maps Ψ and Φ as an internal variable, so uniqueness of the solution can be guaranteed even in a control problem that uses a model with a nonlinear structure. This makes it possible to learn a model from which a control device can be built that determines an input improving the followability of the output to its target value while stably controlling the system 300.

Further, according to the control device 200 of the present embodiment, the learning unit 212 learns the equation of state shown in equations (2) to (4) in the form of equations (13) to (15), which are obtained by discretizing it in time steps of the discrete time k. This makes the number of internal variables x and u finite, so the time required for learning the model can be shortened. A model from which a control device can be built that determines an input improving the followability of the output to its target value while stably controlling the system 300 can therefore be learned in a relatively short time.

Further, according to the control device 200 of the present embodiment, the determination unit 213 determines the input variable v by solving the optimal control problem shown in equations (22) and (23), using the equation of state shown in equations (13) to (15) learned by the learning unit 212. The optimal control problem thereby becomes a control problem for a linear model, and the optimal control problem using equations (13) to (15) can be treated as a convex optimization problem. Since a single optimum value of the input variable v input to the system 300 can be determined, the control device can improve the followability of the output from the system 300 to its target value while stably controlling the system 300.
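
To illustrate why a linear model in the internal variables leads to a single optimum, the Python sketch below sets up a finite-horizon tracking problem for a linear model x[k+1] = A x[k] + B u[k] + c and solves it in closed form. This is a sketch under stated assumptions: equations (22) and (23) are not visible here, so the quadratic cost, the matrices, the horizon, and the weight `R_WEIGHT` are all hypothetical, and a direct linear solve is used in place of the sequential quadratic programming mentioned in step S24.

```python
import numpy as np

# Hypothetical linear model in the internal variables: x[k+1] = A x[k] + B u[k] + c
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])
c = np.array([0.0, 0.1])
x0 = np.array([0.0, 0.0])
x_ref = np.array([1.0, 0.5])   # assumed target for the internal variable x
HORIZON = 10
R_WEIGHT = 1e-2                # assumed penalty on the internal input u


def build_prediction(A, B, c, x0, horizon):
    """Return F, G such that the stacked states x[1..N] equal F + G @ U."""
    n, m = B.shape
    F = np.zeros(horizon * n)
    G = np.zeros((horizon * n, horizon * m))
    x_free = x0.astype(float)
    for k in range(horizon):
        x_free = A @ x_free + c                 # response with all inputs zero
        F[k * n:(k + 1) * n] = x_free
        for j in range(k + 1):                  # u[j] reaches x[k+1] through A^(k-j) B
            G[k * n:(k + 1) * n, j * m:(j + 1) * m] = (
                np.linalg.matrix_power(A, k - j) @ B)
    return F, G


F, G = build_prediction(A, B, c, x0, HORIZON)
X_ref = np.tile(x_ref, HORIZON)

# Cost J(U) = ||F + G U - X_ref||^2 + R_WEIGHT * ||U||^2 is a convex quadratic.
hessian = 2.0 * (G.T @ G + R_WEIGHT * np.eye(G.shape[1]))
U_opt = np.linalg.solve(hessian, 2.0 * G.T @ (X_ref - F))
```

The Hessian of this cost is positive definite, so the minimizer `U_opt` is unique and does not depend on any initial guess. In the setting of the text, each optimal u_k would then be converted to v_k through the bijective map Ψ.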

<Modifications of the present embodiment>
The present invention is not limited to the above embodiments and can be carried out in various forms without departing from its gist. For example, the following modifications are possible. In the above embodiments, part of a configuration described as being realized by hardware may be replaced with software, and conversely, part of a configuration described as being realized by software may be replaced with hardware.

[Modification 1]
The above embodiments show an example of the configuration of the model learning device and of the control device including the model learning device. However, the configurations of the model learning device and the control device can be modified in various ways and are not limited to these configurations. For example, at least one of the model learning device and the control device may be configured by a plurality of information processing devices (including server devices, in-vehicle ECUs, and the like) arranged on a network and operating in cooperation.

[Modification 2]
The above embodiments show an example of the procedures of the model learning method (see FIG. 2) and the predictive control method (see FIG. 4). However, these methods can be modified in various ways and are not limited to these procedures. For example, some steps may be omitted, and other steps not described here may be added. The execution order of some steps may also be changed.

[Modification 3]
In the first embodiment, the equation of state is defined as equation (1), and the maps Ψ and Φ included in equation (1) are defined as the internal variables u and x shown in equations (2) and (3), respectively. However, these definitions of the maps Ψ and Φ are merely examples, and the maps may be defined in any form. In that case, defining each map so that it takes as inputs both the internal variable and an uncontrollable exogenous input d that affects the change of the output variable y yields a model that can predict the future state of the system with high accuracy.

[Modification 4]
In the first embodiment, in step S14 of the model learning method (see FIG. 2), the learning unit 112 learns the model using the degree of agreement. In doing so, the learning unit 112 may also determine whether constraint conditions are satisfied, in addition to evaluating the degree of agreement. For example, a constraint condition may be set for each of the function A'(d), the function B'(d), and the function c'(d) included in the equation of state of equation (1).

[Modification 5]
In the first embodiment, the map Ψ, the map Φ, the function A'(d), the function B'(d), and the function c'(d) each produce an output in response to the exogenous input d. However, the outputs of the map Ψ, the map Φ, the function A'(d), the function B'(d), and the function c'(d) do not have to change depending on the exogenous input d.

[Modification 6]
In the second embodiment, the learning unit 212 solves the optimal control problem using the equation of state obtained by discretizing equations (2) to (4) into equations (13) to (15). However, the learning unit 212 may solve the optimal control problem without discretizing the equation of state. Solving the optimal control problem with the equation of state converted into equations (13) to (15) makes the number of internal variables x and u finite, so the time required for learning the model can be made relatively short.

The present aspect has been described above based on the embodiments and modifications, but the embodiments described above are intended to facilitate understanding of the present aspect and do not limit it. The present aspect may be modified and improved without departing from its spirit and the scope of the claims, and includes equivalents thereof. Any technical feature that is not described as essential in this specification may be deleted as appropriate.

100 ... Model learning device
110, 210 ... CPU
111, 211 ... Control unit
112, 212 ... Learning unit
120 ... Storage unit
121 ... Model storage unit
122 ... Data set storage unit
130 ... ROM/RAM
140 ... Communication unit
150 ... Input/output unit
200 ... Control device
213 ... Determination unit
300 ... System

Claims

A model learning device that learns a model representing the relationship between an input variable v input to a system and an output variable y output from the system, the model learning device comprising:
a model storage unit that stores a model used for learning a nonlinear equation of state for predicting the output variable y using the input variable v; and
a learning unit that learns the equation of state using the model and an input/output data set including a plurality of sets of input variable data and output variable data for the model,
wherein the model is an equation of state including a bijective map Ψ that takes the input variable v as an input and a bijective map Φ that takes the output variable y as an input.

The model learning device according to claim 1, wherein
the model is defined by equation (1).

In equation (1):
the left side of the equal sign is the time derivative of an n-dimensional vector (n is an integer) representing the output variable y; and
on the right side of the equal sign,
the input variable v is an m-dimensional vector (m is an integer),
the exogenous input d is a p-dimensional vector (p is an integer) indicating an uncontrollable input that affects the change of the output variable y,
the map Ψ is a function that takes the input variable v and the exogenous input d as inputs and returns an m-dimensional vector,
the map Φ is a function that takes the output variable y and the exogenous input d as inputs and returns an n-dimensional vector, and
the function A', the function B', and the function c' are functions that take the exogenous input d as an input and return an n × n matrix, an n × m matrix, and an n-dimensional vector, respectively.

The model learning device according to claim 2, wherein,
when the map Ψ in equation (1) is defined as an internal variable u and the map Φ is defined as an internal variable x,
the learning unit learns the equation of state defined by equations (2) to (4).

The model learning device according to claim 3, wherein
the map Ψ is defined by equations (5) to (8),

the map Φ is defined by equations (9) to (12), and

i is the layer number in a multilayer neural network, L_Ψ and L_Φ are the numbers of layers in the multilayer neural network, W_Ψ and W_Φ are weights, b_Ψ and b_Φ are biases, and each of ψ_Ψ and φ_Φ is an activation function that is an arbitrary bijective map returning an output of the same dimension as its input.

The model learning device according to any one of claims 1 to 4, wherein
the learning unit:
estimates an output by giving the model a set of the input variable data in the input/output data set;
evaluates the degree of agreement between the estimated output and the set of output variable data in the input/output data set; and
learns the equation of state by updating learning parameters of the model according to the evaluation result.

The model learning device according to claim 3, wherein
the learning unit learns the equation of state shown in equations (13) to (15), which is obtained by discretizing equations (2) to (4) in time steps of a discrete time k.

A control device that controls the system, the control device comprising:
the model learning device according to claim 6; and
a determination unit that determines a target value of the input variable v corresponding to a target value of the output variable y by using the equation of state learned by the learning unit,
wherein the determination unit determines the target value of the input variable v by solving an optimal control problem using the equation of state shown in equations (13) to (15) learned by the learning unit.

A model learning method for learning a model representing the relationship between an input variable v input to a system and an output variable y output from the system, the method comprising:
a step of acquiring a model used for learning a nonlinear equation of state for predicting the output variable y using the input variable v; and
a step of learning the equation of state using the model and an input/output data set including a plurality of sets of input variable data and output variable data for the model,
wherein the model is an equation of state including a bijective map Ψ that takes the input variable v as an input and a bijective map Φ that takes the output variable y as an input.

A computer program that causes an information processing device to execute learning of a model representing the relationship between an input variable v input to a system and an output variable y output from the system, the computer program causing the information processing device to execute:
a function of acquiring a model used for learning a nonlinear equation of state for predicting the output variable y using the input variable v; and
a function of learning the equation of state using the model and an input/output data set including a plurality of sets of input variable data and output variable data for the model,
wherein the model is an equation of state including a bijective map Ψ that takes the input variable v as an input and a bijective map Φ that takes the output variable y as an input.