JP2020198097A

JP2020198097A - Prediction device, learning device, method for prediction, and program

Info

Publication number: JP2020198097A
Application number: JP2020093397A
Authority: JP
Inventors: Takeshi Ishida (石田 武)
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2019-05-31
Filing date: 2020-05-28
Publication date: 2020-12-10
Also published as: WO2020241009A1

Abstract

To provide a prediction device, a learning device, a prediction method, and a program that can train a model by machine learning so that human knowledge is easily reflected in the model.

SOLUTION: The present invention includes: a function control unit that controls the behavior of a prediction function representing the relationship between input and output in a prediction model that outputs a predicted value for an input; a learning unit that trains the prediction model so that the output obtained by inputting learning data to the prediction function, whose behavior is controlled by the function control unit, approaches the teacher data corresponding to the learning data; and a prediction unit that predicts a predicted value for an input based on the output obtained by inputting unlearned data to the prediction model trained by the learning unit.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a prediction device, a learning device, a prediction method, and a program.

Analytical models based on techniques such as statistical analysis and machine learning are used in a wide range of industries. For example, a technique is known that uses machine learning to automatically draft sales promotion plans, a task that previously depended on the rules of thumb of the person in charge (see, for example, Patent Document 1). In Patent Document 1, past sales promotion plans and the customer data and sales data related to those plans are used as learning data, and the information needed to draft a planned promotion, such as sales forecasts for customers, is collected.
In the field of machine learning, there is also a technique that uses a regularization term to prevent overfitting during learning (see, for example, Patent Document 2). Patent Document 2 discloses a technique in which a regularization term makes the parameters of deep learning converge to binary values, enabling efficient learning.

Patent Document 1: JP-A-2018-45316
Patent Document 2: JP-A-2019-40414

However, a model generated by machine learning may produce results that deviate from human intuition.
For example, consider building an analysis model f(x1, x2) that predicts the sales (y) for tomorrow's advertising cost from today's advertising cost (x1) and today's sales (x2). Intuitively, increasing advertising cost raises sales up to a point, beyond which sales should level off even if advertising cost increases further. However, when a graph is drawn for the analysis model with advertising cost x1 on the horizontal axis and sales f on the vertical axis, sales f may keep increasing monotonically as x1 increases, failing to account for the decline in the rate of sales growth (advertising effect) beyond a certain point. Alternatively, the model may behave oddly, for example with sales f becoming locally negative as advertising cost x1 increases.

Such situations are by no means rare and are likely to occur frequently, especially when the learning data is incomplete. FIGS. 5A to 5C show examples of data incompleteness and the problems it causes. In each of FIGS. 5A to 5C, the upper and lower graphs plot advertising cost on the horizontal axis and sales on the vertical axis. The upper graph shows the data and the true curve (the curve representing the true relationship between advertising cost and sales), and the lower graph shows the data and the curve predicted by the model. FIGS. 5A to 5C illustrate the following cases (1) to (3) as examples in which analyzing incomplete data leads to an erroneous conclusion (predicted value).

(1) The data used for learning is insufficient (see FIG. 5A).
(2) The data used for learning contains a lot of noise (see FIG. 5B).
(3) Important data that should be used for learning cannot be obtained, or is not considered in the learning process (see FIG. 5C).

For example, in case (1), as shown in FIG. 5A, the data used for learning is insufficient relative to the explanatory variables used as model inputs and the parameters that determine the model's behavior. During learning, the model cannot distinguish data that deviates from the true curve from data that does not, so a trained model that behaves oddly is likely to be generated.
In case (2), as shown in FIG. 5B, much of the data deviates from the true curve. The model is influenced by the deviating data during learning, so a trained model that behaves oddly is likely to be generated.
In case (3), as shown in FIG. 5C, important information that could affect the behavior of the prediction model (for example, that the content of an advertisement was poorly received) was not used as an input variable during learning, so a trained model that behaves oddly is likely to be generated.

As a result, the trained model may output predicted values that users of the analysis results find hard to accept, and an analysis model generated at considerable development cost cannot be put to use.

The present invention has been made to solve the above problem, and its object is to provide a prediction device, a learning device, a prediction method, and a program that can train a model by machine learning so that human knowledge is easily reflected in the model.

To solve the above problem, one aspect of the present invention is a prediction device comprising: a function control unit that controls the behavior of a prediction function representing the relationship between input and output in a prediction model that outputs a predicted value for an input; a learning unit that trains the prediction model so that the output obtained by inputting learning data to the prediction function, whose behavior is controlled by the function control unit, approaches the teacher data corresponding to the learning data; and a prediction unit that predicts a predicted value for an input based on the output obtained by inputting unlearned data to the prediction model trained by the learning unit.

In one aspect of the present invention, in the prediction device described above, the function control unit may control the behavior of the prediction function by using, as the loss function in the process of training the prediction model, a predetermined loss function to which a regularization term is added. The regularization term may be generated by multiplying a predetermined regularization weight by a function whose variable is a function derived from the prediction function and the variables used in the prediction function.

In one aspect of the present invention, in the prediction device described above, the regularization term may be generated by multiplying a predetermined regularization weight by a function whose variable is the derivative obtained by differentiating the prediction function with respect to a variable used as input to the prediction function.

In one aspect of the present invention, in the prediction device described above, the regularization term may include a function whose input variable is the output of the prediction function in the neighborhood of a variable used as input to the prediction function.

In one aspect of the present invention, in the prediction device described above, the regularization term may include a function whose input variable is an output consisting of the terms up to a predetermined order of the Taylor series of the prediction function in the neighborhood of a variable used as input to the prediction function.

In one aspect of the present invention, in the prediction device described above, the regularization term may include functions that differ from one another depending on the value of a variable used as input to the prediction function.

In one aspect of the present invention, in the prediction device described above, the regularization term may be generated by multiplying a predetermined regularization weight by a function whose variable is the output of the prediction function.

In one aspect of the present invention, in the prediction device described above, the regularization term may be generated by multiplying a predetermined regularization weight by a function whose variable is the input of the prediction function.

Another aspect of the present invention is a learning device comprising: a function control unit that controls the behavior of a prediction function representing the relationship between input and output in a prediction model that outputs a predicted value for an input; and a learning unit that trains the prediction model so that the output obtained by inputting learning data to the prediction function, whose behavior is controlled by the function control unit, approaches the teacher data corresponding to the learning data.

Another aspect of the present invention is a prediction method comprising: a function control step in which a function control unit controls the behavior of a prediction function representing the relationship between input and output in a prediction model that outputs a predicted value for an input; a learning step in which a learning unit trains the prediction model so that the output obtained by inputting learning data to the prediction function, whose behavior is controlled by the function control unit, approaches the teacher data corresponding to the learning data; and a prediction step in which a prediction unit predicts a predicted value for an input based on the output obtained by inputting unlearned data to the prediction model trained by the learning unit.

Another aspect of the present invention is a program for causing a computer to function as: function control means for controlling the behavior of a prediction function representing the relationship between input and output in a prediction model that outputs a predicted value for an input; learning means for training the prediction model so that the output obtained by inputting learning data to the prediction function, whose behavior is controlled by the function control means, approaches the teacher data corresponding to the learning data; and prediction means for predicting a predicted value for an input based on the output obtained by inputting unlearned data to the prediction model trained by the learning means.

According to the present invention, a model can be trained by machine learning so that human knowledge is easily reflected in it.

FIG. 1 is a block diagram showing a configuration example of the prediction device 1 of the first embodiment.
FIG. 2 is a diagram explaining the processing performed by the function control unit 16 of the first embodiment.
FIG. 3 is a diagram explaining the processing performed by the function control unit 16 of the first embodiment.
FIG. 4 is a flowchart showing the flow of processing performed by the prediction device 1 of the first embodiment.
FIGS. 5A to 5C are diagrams explaining the problem addressed by the embodiments of the present application.
The remaining drawings are, in order: a diagram explaining the problem in the second embodiment; a diagram explaining the processing performed by the prediction device 1 in the second embodiment; and a diagram explaining the processing performed by the prediction device 1 in a modification of the second embodiment.

Embodiments of the present invention are described below with reference to the drawings.

First, the first embodiment is described. FIG. 1 is a block diagram showing a configuration example of the prediction device 1 of the first embodiment. The prediction device 1 generates a prediction model and uses the generated model to make predictions for inputs. The prediction model here outputs a predicted value for an input for any item of interest. For example, it is a model that outputs sales (the predicted value) for an advertising cost (the input).

The prediction device 1 includes, for example, a learning data acquisition unit 11, a teacher data acquisition unit 12, a preprocessing unit 13, a learning unit 14, a prediction unit 15, a function control unit 16, and a prediction model parameter storage unit 17.

The learning data acquisition unit 11 acquires learning data. Learning data is the data used as input when training the prediction model. For example, when the prediction model outputs sales (the predicted value) for an advertising cost (the input), the learning data is data showing the advertising costs actually invested in the past.

The teacher data acquisition unit 12 acquires teacher data. Teacher data is the data used as the target output when training the prediction model. For example, when the prediction model outputs sales (the predicted value) for an advertising cost (the input), the teacher data is data showing actual past sales.

The preprocessing unit 13 generates the data on which the prediction model is trained by associating the learning data with the teacher data. For example, the preprocessing unit 13 generates, as data for training the prediction model, data in which the advertising cost invested on a past date (input data) is associated with the actual sales on that date.
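The association step described above can be sketched as follows. This is a minimal illustration, assuming that both data sets are keyed by a date string; the dictionary layout, the sample values, and the name `build_training_pairs` are hypothetical and do not come from the patent.

```python
# Hypothetical records keyed by date; the layout and values are assumptions.
ad_costs = {"2019-05-01": 120.0, "2019-05-02": 150.0, "2019-05-03": 90.0}
sales = {"2019-05-01": 800.0, "2019-05-02": 950.0, "2019-05-04": 700.0}


def build_training_pairs(inputs, targets):
    """Pair each input with the teacher value recorded for the same date.

    Dates that are missing from either source are skipped.
    """
    shared_dates = sorted(set(inputs) & set(targets))
    return [(inputs[d], targets[d]) for d in shared_dates]


pairs = build_training_pairs(ad_costs, sales)
```

Skipping dates that lack either value is one possible policy; the source does not say how unmatched records are handled.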

The learning unit 14 trains the prediction model using the data generated by the preprocessing unit 13. The prediction model may be built with any machine learning technique and is, for example, a recurrent neural network (hereinafter, RNN). In general, an RNN consists of three kinds of layers: an input layer, a hidden (intermediate) layer, and an output layer. The data to be learned by the RNN (input data) is fed to the input layer. The output layer outputs data representing the result learned by the RNN (output data). The hidden layer performs the core processing of learning. For example, the hidden layer converts its input into a value given by a function called an activation function (transfer function) and outputs it. The activation function is, for example, a rectified linear function, a sigmoid function, or a step function, but is not limited to these, and any function may be used.

The configuration of the RNN is briefly described here. In the RNN, nodes connect each unit of the input layer to each of the units in the shallowest of the n hidden layers, where n is an arbitrary natural number. The shallowest layer is the hidden layer closest to the input layer, which is the first layer in this example. Likewise, nodes connect each unit of the first layer to each of the units in the next shallowest hidden layer (the second layer in this example). A weight W based on a coupling coefficient and a bias component b are applied to each node connecting two units. Thus, when data is output from a unit in one layer to a unit in a deeper layer, the output data has the weight W and the bias component b corresponding to the coupling coefficient of the connecting node applied to it.
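As a rough illustration of how the weight W and bias component b act between two layers, the sketch below computes one layer's outputs as activation(W x + b). It covers only a single feed-forward step and omits the recurrent connections of an RNN; the function names and the nested-list weight layout are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Sigmoid activation, one of the activation functions named in the text."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation=sigmoid):
    """Compute activation(W x + b) for one layer.

    weights[j][i] is the coupling coefficient from input unit i to
    output unit j, and biases[j] is the bias component of output unit j.
    """
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z))
    return outputs
```

Stacking calls to `layer_forward`, with each call consuming the previous call's outputs, gives the layer-by-layer data flow described above.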

The learning unit 14 feeds the learning data to the input layer of the prediction model. The learning unit 14 trains the prediction model so that the data output from the output layer for the learning data approaches the teacher data corresponding to that learning data. The learning unit 14 derives the relationship between the error and the parameters set in the prediction model as a loss function. The error here is the degree of divergence between the data output from the output layer of the prediction model and the teacher data. Any measure of divergence may be used, for example the squared error or cross entropy.

In general, the loss function l (lowercase L) can be expressed as a function of the teacher data y_R and the prediction function f(x), as in equation (1) below. In equation (1), l is the loss function, y_R is the teacher data, and f(x) is the function representing the relationship between the input (x) and the output (f(x)) of the prediction model.

Loss function: $l(y_R, f(x))$ ... (1)

If the output of the prediction model is y, the prediction function is expressed as y = f(x). Applying this to equation (1), the loss function l can be written as equation (2) below.

Loss function: $l(y_R, y)$ ... (2)

In the present embodiment, instead of the loss function l of equation (1) or (2), the learning unit 14 uses a new loss function l#, obtained by adding the regularization term L derived by the function control unit 16 to the loss function l.

Adding the regularization term L to the loss function l makes it possible to control the behavior of the prediction function y. When the prediction function y behaves oddly, its behavior can thus be corrected to one that does not seem odd. For example, even if, during learning, the prediction function y shows sales f(x) becoming locally negative as advertising cost x increases, it can be controlled so that sales do not become negative. The prediction model can therefore be trained by machine learning so that human knowledge is easily reflected in it.

The loss function l# is expressed by equation (3) below. In equation (3), l# is the loss function used by the learning unit 14 of the present embodiment, l is the loss function of equation (1) or (2), y_R is the teacher data, y (= f(x)) is the output of the prediction model, λ is the regularization weight coefficient, and L is the regularization term.

Loss function: $l^{\#} = l(y_R, y) + \lambda \times L$ ... (3)

The regularization weight coefficient λ may be any real number or a function of the input x. The regularization term L is described in detail later.
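Equation (3), with λ given either as a real number or as a function of the input x, can be sketched for a single sample as follows. The function name `regularized_loss` and its argument layout are assumptions for illustration. The values of the base loss l and the regularization term L are taken as already computed.

```python
def regularized_loss(base_loss, reg_term, lam, x):
    """Return l# = l + lambda * L for one sample.

    lam may be a real number, or a callable that maps the input x to a
    weight, mirroring the two forms of lambda allowed in the text.
    """
    weight = lam(x) if callable(lam) else lam
    return base_loss + weight * reg_term
```

For instance, `regularized_loss(2.0, 0.5, 4.0, x=1.0)` uses a constant weight, while `regularized_loss(2.0, 0.5, lambda x: 2.0 * x, x=3.0)` lets the weight grow with the input.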

Using error backpropagation, the learning unit 14 determines the combination of weights W and bias components b that minimizes the loss function l. The learning unit 14 associates the determined (updated) weights W and bias components b with the corresponding nodes and units, and stores the associated information in the prediction model parameter storage unit 17.

The prediction unit 15 refers to the prediction model parameter storage unit 17 and generates (reconstructs) the RNN from the weights W and bias components b of each layer determined by learning. Using the generated (reconstructed) RNN as the prediction model, the prediction unit 15 inputs unlearned input data to the prediction model and predicts a value based on the output data produced by the model. "Unlearned input data" is, for example, data that was not used as learning data when the prediction model was trained. For example, the prediction unit 15 inputs unlearned input data to the input layer of the reconstructed RNN and outputs the value produced by the output layer as the predicted value.

The prediction model parameter storage unit 17 stores the weights W and bias components b of each layer determined by training the prediction model. It may also store information describing the configuration of the RNN, such as the number of hidden layers, the number of units in each layer, and the activation function.

The function control unit 16 controls the behavior of the prediction function y in the prediction model. Each time learning is executed, the function control unit 16 acquires the prediction function y derived by the prediction unit 15 and determines whether the behavior of the prediction function y is odd.

FIG. 2 is a diagram explaining the processing performed by the function control unit 16 of the first embodiment. FIG. 2 shows the relationship, as predicted by the prediction model, between the input (advertising cost x on the horizontal axis) and the output (sales y on the vertical axis). The discussion here assumes two pieces of business knowledge: sales tend to increase with advertising cost, and the rate at which sales increase with advertising cost does not change abruptly.

As shown in FIG. 2, when the prediction result of the prediction model shows sales decreasing as advertising cost increases, as in region E1, the function control unit 16 determines that the behavior of the prediction function y is odd in region E1. Likewise, when the rate at which sales increase with advertising cost changes abruptly (the slope exceeds a predetermined threshold), as in region E2, the function control unit 16 determines that the behavior of the prediction function y is odd in region E2.
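The two checks described for regions E1 and E2 can be sketched on a sampled prediction curve. This is an illustrative approximation only: it assumes the curve is available as points (xs, ys) with strictly increasing xs and estimates the slope segment by segment. The name `find_suspect_segments` and the threshold parameter `max_slope` are hypothetical, and the source does not specify how the determination is actually implemented.

```python
def find_suspect_segments(xs, ys, max_slope):
    """Flag segments of a sampled curve that contradict the assumed knowledge.

    Returns (index, reason) pairs, where index i refers to the segment
    between xs[i] and xs[i + 1]. xs must be strictly increasing.
    """
    flagged = []
    for i in range(len(xs) - 1):
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        if slope < 0:
            # Sales fall as advertising cost rises (like region E1).
            flagged.append((i, "decreasing"))
        elif slope > max_slope:
            # Sales rise more steeply than the threshold allows (like region E2).
            flagged.append((i, "too steep"))
    return flagged
```

With `xs = [0.0, 1.0, 2.0, 3.0, 4.0]`, `ys = [0.0, 1.0, 0.5, 1.5, 6.0]`, and `max_slope = 3.0`, the second segment is flagged as decreasing and the last as too steep.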

When the function control unit 16 determines that the behavior of the prediction model is odd, it derives a new loss function l# by adding the regularization term L to the loss function l. The function control unit 16 controls the behavior of the prediction function y by having the prediction unit 15 train the prediction model with the derived loss function l#.

As shown in equation (4), the regularization term L derived by the function control unit 16 is expressed as a function whose variables are the input and output of the prediction model and derivatives of arbitrary order. In equation (4), x is the input of the prediction model, y is the output of the prediction model, dy/dx is the first derivative of the output y with respect to the input x, and d^n y/dx^n is the n-th derivative of the output y with respect to the input x, where n is an arbitrary natural number.

Regularization term: $L\left(x, y, \frac{dy}{dx}, \ldots, \frac{d^n y}{dx^n}\right)$ ... (4)

The regularization term L need not use all of the variables shown in equation (4); it only has to use at least one of them. For example, of the derivatives, only higher-order derivatives of second order or above may be used.
The regularization term L may include the so-called L1 or L2 regularization used in conventional statistics and machine learning as regularization techniques, chiefly to prevent overfitting and improve generalization, or it may be constructed without them.

The function control unit 16 derives the regularization term L as the product of a function indicating a range of the input x and a function describing the behavior of the output y, as in equation (5). In equation (5), I_A is the function indicating the range of the input x, and GradLoss is the function that controls the behavior (for example, the gradient) of the output y.

L(x, y, dy/dx, ..., d^n y/dx^n)
= I_A(x) × GradLoss(x, y, dy/dx, ..., d^n y/dx^n) ... (5)

In equation (5), the function I_A(x) is a region determination function that outputs 1 when x ∈ A and 0 when x ∉ A. Here, x is any value that the input can take. This makes it possible to control the behavior of the output y on an arbitrary subset A of the domain of the analysis model.
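The product form of equation (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interval-shaped region A, the finite-difference derivative, and the names `indicator`, `derivative`, `regularizer`, and `toy_model` are hypothetical and do not appear in the specification.

```python
def indicator(a_low, a_high):
    """Return I_A for the interval A = [a_low, a_high]: 1 inside A, 0 outside."""
    def i_a(x):
        return 1.0 if a_low <= x <= a_high else 0.0
    return i_a


def derivative(f, x, h=1e-5):
    """Central-difference estimate of dy/dx (an assumption; a real model could use autodiff)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def grad_loss_increasing(dydx):
    """GradLoss for 'y should increase with x': max((-1) * dy/dx, 0)."""
    return max(-1.0 * dydx, 0.0)


def regularizer(f, x, i_a, grad_loss):
    """Equation (5): L = I_A(x) * GradLoss(dy/dx)."""
    return i_a(x) * grad_loss(derivative(f, x))


def toy_model(x):
    """Hypothetical prediction function: rises for x < 1, falls for x > 1."""
    return -(x - 1.0) ** 2


i_a = indicator(0.0, 3.0)
inside_ok = regularizer(toy_model, 0.5, i_a, grad_loss_increasing)   # slope > 0, x in A
inside_bad = regularizer(toy_model, 2.0, i_a, grad_loss_increasing)  # slope < 0, x in A
outside = regularizer(toy_model, 5.0, i_a, grad_loss_increasing)     # slope < 0, x not in A
```

In this sketch the penalty is non-zero only where the slope violates the expected behavior and the input lies inside A, so behavior outside A is left unconstrained.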

An example of the regularization term L derived by the function control unit 16 will now be described with reference to FIG. 3. FIG. 3 is a diagram illustrating the processing performed by the function control unit 16 of the first embodiment. FIG. 3 has three columns: "business knowledge", "how f(x) should behave", and "GradLoss definition formula".
"Business knowledge" shows knowledge set by a human according to the item that the prediction model is to predict. The knowledge is not limited to business knowledge and may include, for example, knowledge based on historical background, experience, premises or assumptions, or a combination of these.
"How f(x) should behave" shows, as a mathematical expression, the behavior of the output f(x) that corresponds to the business knowledge. "GradLoss definition formula" shows the specific GradLoss expression that corresponds to the business knowledge.

For example, as shown in the first item of FIG. 3, when there is business knowledge that the output y should tend to increase with respect to the input x, the function control unit 16 determines that the desirable behavior is dy/dx > 0, that is, a positive first derivative of y. In this case, the function control unit 16 defines max((-1) × dy/dx, 0) as the GradLoss function. The max function here compares the two values given as arguments and outputs the larger one.
For example, when dy/dx is positive, ((-1) × dy/dx) is negative and the GradLoss function outputs 0. On the other hand, when dy/dx is negative, ((-1) × dy/dx) is positive and the GradLoss function outputs ((-1) × dy/dx).

As shown in equations (4) and (5), the GradLoss function constitutes the regularization term L that is added to the loss function. Therefore, when dy/dx is positive, the regularization term L is 0, and the loss function l shown in equation (1) or (2) is applied as is as the loss function used for learning by the prediction unit 15. On the other hand, when dy/dx is negative, the regularization term L takes a value corresponding to ((-1) × dy/dx), and the loss function used for learning by the prediction unit 15 is the loss function l shown in equation (1) or (2) plus the regularization term L corresponding to ((-1) × dy/dx).
When there is business knowledge that the output y should tend to decrease with respect to the input x, max(dy/dx, 0) may be defined as the GradLoss function.
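The two monotonicity penalties and the way they are added to the base loss can be sketched as follows. The function names and the `weight` argument (standing in for a regularization weight) are illustrative assumptions, and the base loss value is simply passed in rather than computed from equation (1) or (2).

```python
def grad_loss_increasing(dydx):
    """Penalty when y should increase with x: max((-1) * dy/dx, 0)."""
    return max(-1.0 * dydx, 0.0)


def grad_loss_decreasing(dydx):
    """Penalty when y should decrease with x: max(dy/dx, 0)."""
    return max(dydx, 0.0)


def penalized_loss(base_loss, dydx, grad_loss, weight=1.0):
    """Base loss l plus the weighted regularization term built from GradLoss."""
    return base_loss + weight * grad_loss(dydx)


# Slope agrees with "should increase": the base loss is used unchanged.
loss_ok = penalized_loss(0.5, 2.0, grad_loss_increasing)
# Slope violates "should increase": the loss grows by the size of the violation.
loss_bad = penalized_loss(0.5, -3.0, grad_loss_increasing)
```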

As shown in the second item of FIG. 3, when there is business knowledge that the output y increases too strongly with respect to the input x, the function control unit 16 determines that the desirable behavior is dy/dx < b, that is, a slope of y smaller than b. Here, b is an arbitrary positive real number. In this case, the function control unit 16 defines (max(dy/dx, b) - b) as the GradLoss function.

For example, when dy/dx is smaller than b, the max function outputs b and the GradLoss function outputs 0. On the other hand, when dy/dx is larger than b, the max function outputs dy/dx and the GradLoss function outputs (dy/dx - b). Therefore, when dy/dx is smaller than b, the regularization term L is 0, and the loss function l shown in equation (1) or (2) is applied as is as the loss function used for learning by the prediction unit 15. When dy/dx is larger than b, the regularization term L is (dy/dx - b), and the loss function used for learning by the prediction unit 15 is the loss function l shown in equation (1) or (2) plus the regularization term L corresponding to (dy/dx - b).
When attention is paid to the degree of decrease in the output y with respect to the input x, max(-dy/dx, -b) + b may be defined as the GradLoss function. When attention is paid to whether the degree is too strong or too weak, the value of b may be set as appropriate.
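As a small worked example of the slope-cap penalty max(dy/dx, b) - b, the sketch below evaluates it at a few inputs of a hypothetical model y = x^2, whose slope 2x is written out analytically. The model, the sample points, and the choice b = 2 are assumptions made only for illustration.

```python
def grad_loss_slope_cap(dydx, b):
    """Penalty when the slope should stay below b: max(dy/dx, b) - b."""
    return max(dydx, b) - b


def toy_slope(x):
    """Analytic slope of the hypothetical model y = x**2."""
    return 2.0 * x


b = 2.0
xs = [0.5, 1.0, 2.0]
# Slopes are 1, 2, and 4; only the last exceeds b, by 2.
penalties = [grad_loss_slope_cap(toy_slope(x), b) for x in xs]
```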

As shown in the third item of FIG. 3, when there is business knowledge that the increase in the output y with respect to the input x should be convex downward, the function control unit 16 determines that the desirable behavior is d^2y/dx^2 > 0, that is, a positive second derivative of y. In this case, the function control unit 16 defines max((-1) × d^2y/dx^2, 0) as the GradLoss function.

For example, when d^2y/dx^2 is positive, the GradLoss function outputs 0. On the other hand, when d^2y/dx^2 is negative, the GradLoss function outputs ((-1) × d^2y/dx^2). Therefore, when d^2y/dx^2 is positive, the regularization term L is 0, and the loss function l shown in equation (1) or (2) is applied as is as the loss function used for learning by the prediction unit 15. When d^2y/dx^2 is negative, the regularization term L is ((-1) × d^2y/dx^2), and the loss function used for learning by the prediction unit 15 is the loss function l shown in equation (1) or (2) plus the regularization term L corresponding to ((-1) × d^2y/dx^2).
When the increase in the output y with respect to the input x should instead be convex upward, max(d^2y/dx^2, 0) may be defined as the GradLoss function.
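The curvature penalties can be illustrated with a second-order finite difference standing in for d^2y/dx^2. This is a sketch under assumptions: the step size `h`, the helper names, and the two test functions are not taken from the specification, and an actual implementation might obtain the second derivative differently.

```python
import math


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of d^2y/dx^2 (illustrative assumption)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def grad_loss_convex_down(d2ydx2):
    """Penalty when y should be convex downward: max((-1) * d^2y/dx^2, 0)."""
    return max(-1.0 * d2ydx2, 0.0)


def grad_loss_convex_up(d2ydx2):
    """Penalty when y should be convex upward: max(d^2y/dx^2, 0)."""
    return max(d2ydx2, 0.0)


def square(x):
    """Hypothetical model that is convex downward everywhere."""
    return x * x


# x**2 has second derivative 2, so the convex-downward penalty vanishes.
penalty_square = grad_loss_convex_down(second_derivative(square, 1.0))
# sqrt(x) has second derivative -0.25 at x = 1, so it is penalized by about 0.25.
penalty_sqrt = grad_loss_convex_down(second_derivative(math.sqrt, 1.0))
```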

In this way, the function control unit 16 specifies, according to the business knowledge, a mathematical expression (for example, a derivative of the output y) that describes the desirable behavior of the output y. The function control unit 16 derives a regularization term L that is 0 when the input-output relationship of the output y behaves in accordance with the specified expression, and non-zero when it does not. The function control unit 16 can thus derive different regularization terms L depending on whether or not the prediction model (prediction function y) behaves in accordance with the business knowledge. Accordingly, when the prediction model (prediction function y) does not behave in accordance with the business knowledge, the function control unit 16 can derive a regularization term L with a larger value than when it does, and the prediction model can be trained using the resulting enlarged loss function l#. In the course of learning, the prediction model learns that the loss becomes large when it does not behave in accordance with the business knowledge, so learning can be expected to proceed toward parameters of the prediction model (the weights W and the bias component b) that produce behavior in accordance with the business knowledge.

In the example of FIG. 3, the case where the variable of the regularization term L is a derivative of the prediction function y has been described, but the regularization term is not limited to this. For example, the variable used in the regularization term L may be the output y of the prediction function y itself. In this case, when there is business knowledge that the value of the output y can never deviate from a certain range, the behavior of the prediction function y can be corrected so that the output y stays within that range.
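The specification does not give a formula for a penalty on the output y itself. One possible form, offered here purely as an assumption in the same max-based style as the items of FIG. 3, is zero inside an allowed range [low, high] and grows linearly with the distance outside it. The name `grad_loss_range` and the example bounds are hypothetical.

```python
def grad_loss_range(y, low, high):
    """Hypothetical output-range penalty: 0 inside [low, high], distance to the range outside."""
    return max(low - y, 0.0) + max(y - high, 0.0)


# Example: a predicted value assumed to lie between 0 and 100.
inside = grad_loss_range(40.0, 0.0, 100.0)
too_low = grad_loss_range(-5.0, 0.0, 100.0)
too_high = grad_loss_range(130.0, 0.0, 100.0)
```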

The variable used in the regularization term L may also be the input x of the prediction function y itself. In this case, the behavior of the predicted value can be controlled according to the value of the input x.
In general, once its parameters have been determined by learning, a prediction model can output a predicted value for any input. Therefore, even when the input x is available only within a predetermined range, that is, when learning data has been obtained only within that range, the model will still output predicted values for inputs outside that range. Because no learning data exists there, those predicted values are output without regard to actual results (the correspondence between learning data and teacher data). As a result, for certain ranges of the input x, the prediction is likely to diverge from business knowledge.
In such a case, using the input x as a variable of the regularization term L makes it possible to correct the behavior of the prediction function y within a specific range of the input x so that it conforms to the business knowledge. For example, if the output y is predicted to alternate between increasing and decreasing within a specific range of the input x, giving the model the knowledge that the output y tends to increase in that range causes learning to revise the decreasing behavior, so the prediction function y can be corrected to show an increasing trend even where no learning data exists.
In a region where learning data does not exist or is insufficient, dummy data (c, f(c)) may be created and used to correct the behavior of the prediction function y. Here, c is an arbitrary point in the domain of the prediction function y that does not appear in the learning data.
The input of the regularization term L may also be made independent of the input of the loss function l. For example, when the value of the prediction function y is not required to compute the regularization term L, the input data of the regularization term L may be (x + ε_i, y) for the learning data (x, y) used in the loss function l. Here, ε_i is any number of noise values sampled from an arbitrary distribution.
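Evaluating the regularization term at noise-perturbed inputs x + ε_i, independently of the points used by the base loss, can be sketched as below. The Gaussian noise, its scale, the number of samples per point, and all function names are assumptions chosen for illustration; the specification allows any distribution and any number of noise values.

```python
import random


def perturbed_inputs(xs, n_noise, scale, rng):
    """For each training input x, draw n_noise points x + eps_i with Gaussian eps_i."""
    return [x + rng.gauss(0.0, scale) for x in xs for _ in range(n_noise)]


def slope(f, x, h=1e-5):
    """Central-difference estimate of dy/dx at x (illustrative assumption)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def mean_monotonic_penalty(f, points):
    """Average of max((-1) * dy/dx, 0) over the regularization points."""
    penalties = [max(-1.0 * slope(f, p), 0.0) for p in points]
    return sum(penalties) / len(penalties)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = perturbed_inputs([1.0, 2.0, 3.0], n_noise=4, scale=0.1, rng=rng)
penalty_rising = mean_monotonic_penalty(lambda x: 2.0 * x, points)
penalty_falling = mean_monotonic_penalty(lambda x: -1.0 * x, points)
```

Because the penalty here depends only on the slope of the model, no teacher value is needed at the perturbed points, which is what allows them to fall between or beyond the observed data.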

FIG. 4 is a flowchart showing the flow of processing performed by the prediction device 1 of the first embodiment.
First, the learning unit 14 of the prediction device 1 trains the prediction model (step S10). The learning unit 14 trains the prediction model by determining its parameters so that the predicted value output when learning data is input to the prediction model approaches the teacher data associated with that learning data.

The function control unit 16 determines whether the behavior of the trained prediction model is inconsistent with expectations (step S11). The function control unit 16 determines that the behavior is inconsistent when it does not match a mathematical expression (for example, a derivative of arbitrary order of the output y) that reflects business knowledge set in advance according to the item the prediction model predicts.

When the behavior of the prediction model is inconsistent, the function control unit 16 adds a regularization term L corresponding to that inconsistency to the loss function l (step S12). The function control unit 16 derives this regularization term L by, for example, using a max function so that the term reflects the degree to which the behavior fails to match the expression based on the business knowledge.
The learning unit 14 retrains the prediction model using the loss function l# obtained by adding the regularization term L derived by the function control unit 16 to the loss function l (step S13).
After retraining the prediction model (step S13), the learning unit 14 determines whether the termination condition for training the prediction model is satisfied (step S14). The termination condition is a predetermined condition, for example, that the error between the predicted value and the teacher data has fallen below a predetermined threshold, or that the change in that error per training iteration has fallen below a predetermined threshold.
When the termination condition is not satisfied, the learning unit 14 returns to step S10 and continues training so that the retrained prediction model comes to satisfy the termination condition. In this way, the learning unit 14 repeats the processing flow of steps S10 to S13 until the termination condition for training the prediction model is satisfied.
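The flow of steps S10 to S14 can be sketched with a deliberately tiny model y = w × x trained by gradient descent. Everything specific here is an assumption made for illustration: the one-parameter model, the mean squared error base loss, the penalty weight, the learning rate, and the simplified stopping rule (stop once the slope agrees with the knowledge, or after a fixed number of rounds) are not taken from the specification.

```python
def fit(xs, ys, w, penalty_weight, lr=0.01, steps=500):
    """Gradient descent on mean squared error plus penalty_weight * max(-w, 0)."""
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        if w < 0.0:
            grad -= penalty_weight  # subgradient of max(-w, 0) when the slope is negative
        w -= lr * grad
    return w


def train_with_knowledge(xs, ys, penalty_weight=20.0, max_rounds=5):
    """Sketch of S10-S14 for the knowledge 'y should increase with x' (dy/dx = w >= 0)."""
    w = fit(xs, ys, 0.0, 0.0)                    # S10: ordinary training
    for _ in range(max_rounds):                  # S14: simplified end condition
        if w >= 0.0:                             # S11: behavior matches the knowledge
            break
        w = fit(xs, ys, w, penalty_weight)       # S12-S13: add the penalty and retrain
    return w


xs = [1.0, 2.0, 3.0]
ys = [-1.0, -2.0, -3.0]          # hypothetical data with a decreasing trend
w_plain = fit(xs, ys, 0.0, 0.0)  # follows the data: slope close to -1
w_guided = train_with_knowledge(xs, ys)
```

With a plain subgradient step the retrained slope hovers near zero rather than settling exactly on it; the point of the sketch is only that adding the penalty pulls the slope away from the value the data alone would give.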

In the flowchart of FIG. 4 described above, step S11 illustrates the case where the function control unit 16 checks a trained prediction model for inconsistency, but the processing is not limited to this. The function control unit 16 may check the behavior of the prediction model for inconsistency before training, during training, or after training. That is, at any of these stages, training with the loss function l# may be used to reconstruct the prediction model so that its behavior conforms to the business knowledge.

As described above, the prediction device 1 of the first embodiment includes the function control unit 16, the learning unit 14, and the prediction unit 15. The function control unit 16 controls the behavior of the prediction function y. The learning unit 14 trains the prediction model so that the output obtained by inputting learning data to the prediction function y, whose behavior is controlled by the function control unit 16, approaches the teacher data corresponding to that learning data. The prediction unit 15 predicts a predicted value for an input on the basis of the output obtained by inputting unlearned data to the prediction model trained by the learning unit 14.
As a result, in the prediction device 1 of the first embodiment, the function control unit 16 can control the behavior of the prediction function y and correct it when it differs from the business knowledge, so the prediction model can be trained by machine learning in a way that makes it easy to reflect human knowledge.

In the prediction device 1 of the first embodiment, the function control unit 16 controls the behavior of the prediction function y by using, as the loss function l# in the process of training the prediction model, a preset loss function l with the regularization term L added. The regularization term L is generated by multiplying, by a predetermined regularization weight λ, a function whose variables are functions derived from the prediction function y and from the variables used in the prediction function y (for example, the input x of the prediction function y).
As a result, in the prediction device 1 of the first embodiment, adding the regularization term L to the loss function l makes the loss appear large when the behavior differs from the business knowledge, so the prediction function y can be trained to behave in accordance with the business knowledge, which provides the same effect as described above.

In the prediction device 1 of the first embodiment, the regularization term is generated by multiplying, by a predetermined regularization weight λ, a function whose variable is a derivative (for example, dy/dx) obtained by differentiating the prediction function y with respect to its input x. As a result, the prediction device 1 of the first embodiment can derive a regularization term L that depends on the slope of the output y with respect to the input x and can train the model so that this slope conforms to the business knowledge, which provides the same effect as described above.

In the prediction device 1 of the first embodiment, the regularization term includes a function that takes different values depending on the value of the input x of the prediction function y (for example, I_A(x) in equation (5)). As a result, the prediction device 1 of the first embodiment can derive a regularization term L for a specific range of the input x and can train the model so that the behavior of the output y in that range conforms to the business knowledge, which provides the same effect as described above.

In the prediction device 1 of the first embodiment, the regularization term is generated by multiplying a function whose variable is the output y of the prediction function y by a predetermined regularization weight λ. As a result, the prediction device 1 of the first embodiment can derive a regularization term L that depends on the value of the output y. For example, when there is business knowledge that the value of the output y can never deviate from a certain range, the model can be trained so that the behavior of the output y conforms to that knowledge.

In the prediction device 1 of the first embodiment, the regularization term is generated by multiplying a function whose variable is the input x of the prediction function y by a predetermined regularization weight λ. As a result, the prediction device 1 of the embodiment can derive a regularization term L that depends on the value of the input x. For example, even when the predicted value cannot be controlled in a predetermined range of the input x because no learning data exists there, dummy data or the like can be used to train the model so that the behavior of the prediction function y conforms to the business knowledge.

The prediction device 1 of the first embodiment may also be composed of a learning device that generates a trained model and a control device that performs prediction using the trained model generated by the learning device. In this case, the learning device includes the function control unit 16 and the learning unit 14. Because the learning device includes the function control unit 16, the learning device of the embodiment can create a prediction model that reflects business knowledge, which provides the same effect as described above.

In the first embodiment described above, the case where an RNN is applied as the prediction model has been described, but the prediction model is not limited to this. For example, an LSTM (Long Short Term Memory), which is a recurrent neural network other than the RNN, may be applied as the prediction model, or a feedforward neural network may be applied. In the feedforward case, a multilayer perceptron may be applied as the prediction model. Machine learning methods other than neural networks may also be used as the prediction model.

In the first embodiment described above, the case where a max function is used in the GradLoss function has been described, but the GradLoss function is not limited to this. Any function or expression that reflects human knowledge about at least the behavior of the output f(x) may be used as the GradLoss function. For example, a min function may of course be used in the GradLoss function instead of the max function. The min function outputs the smallest of the values given as its arguments.

Next, a second embodiment will be described. This embodiment differs from the embodiment described above in that, when the regularization term L is defined so that the behavior of the prediction function y conforms to the business knowledge, the dependence on local information about the prediction function is reduced.

FIG. 6 is a diagram illustrating the problem addressed by the second embodiment. In the graph of FIG. 6, the horizontal axis shows advertising cost and the vertical axis shows sales. FIG. 6 shows the relationship between advertising cost and sales in two forms, "points" and a "solid line". The points are actual data showing the relationship between advertising cost and sales. The solid line is the curve predicted by a model trained on the actual data (a prediction function that predicts the relationship between advertising cost and sales).

FIG. 6 shows the curve predicted by the model when max((-1) × dy/dx, 0) is adopted as the GradLoss function in the regularization term L.

As shown by the points in the graph of FIG. 6, consider a case where the actual data contains scattered places in which, for some reason, the relationship between advertising cost and sales does not increase monotonically. Suppose that, when training on such actual data, GradLoss = max((-1) × dy/dx, 0), shown as expression (GL1), is adopted in the model on the basis of the business knowledge that "sales tend to increase with advertising cost".

In this case, the prediction function finally obtained may satisfy the constraint at each point and yet fail to increase monotonically as a whole. That is, in order to show an increasing trend at each point of the actual data, the prediction may dip temporarily between points. The sales forecast output by such a prediction function does not necessarily match the business knowledge that "sales tend to increase with advertising cost".

As a countermeasure, in this embodiment the regularization term L is defined so that the behavior of the prediction function y conforms to the business knowledge over the entire range to be predicted. The range to be predicted is, in the example of FIG. 6, the range of advertising cost or the range of sales. That is, in this embodiment, the regularization term L is defined so as to match the business knowledge that "sales tend to increase with advertising cost" over the entire range of advertising cost.

FIG. 7 is a diagram illustrating the processing performed by the prediction device 1 in the second embodiment. The horizontal and vertical axes of the graph in FIG. 7 and the meaning of the points and the solid line are the same as in FIG. 6, so their description is omitted. In FIG. 7, the GradLoss function shown in expression (GL2) is adopted in the model.

GradLoss = (-1) × {min(f(a) - f(a - ε), 0)
+ min(f(a + ε) - f(a), 0)} ... (GL2)

In expression (GL2), f(x) is the prediction function y of the variable x (y = f(x)). The constant a is an arbitrary real number within the range that the variable x can take. The width ε defines the neighborhood of the constant a and may be set arbitrarily. For example, the width ε is the average spacing of the actual data in the x-axis direction. Alternatively, the average spacing of the actual data in the y-axis direction may be used as the width ε. In the example of FIG. 7, the x-axis of the actual data is the advertising cost axis and the y-axis is the sales axis.

By adopting the GradLoss function shown in expression (GL2) in the model, the prediction function y can be trained so that it "does not decrease" within the range of width ±ε around a given constant a. Here, "does not decrease" means that the function either increases (positive slope) or neither increases nor decreases (zero slope).

As described above, in the second embodiment, the regularization term L includes a function (for example, the GradLoss function defined by equation (GL2)) whose input variables are the outputs of the prediction function y (f(a−ε), f(a), f(a+ε)) in the neighborhood (a±ε) of the variable (x = a) used as input to the prediction function y = f(x). This allows the model to be trained so that the behavior of the prediction function y is consistent with business knowledge over the entire range to be predicted. That is, even when behavior that contradicts business knowledge is predicted locally at a particular location, for example in the neighborhood of x = a, that behavior can be corrected by training with equation (GL2) defined in the second embodiment over that range.
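The following Python sketch shows one way such a term might be combined with a base loss during training. It assumes mean squared error as the base loss and sums the GL2 penalty over several anchor points a; the names `regularized_loss`, `anchors`, and `weight` are hypothetical, and the specification does not prescribe this particular combination.

```python
import numpy as np

def grad_loss_gl2(f, a, eps):
    """Equation (GL2) evaluated at a single anchor point a."""
    left = min(f(a) - f(a - eps), 0.0)
    right = min(f(a + eps) - f(a), 0.0)
    return -1.0 * (left + right)

def regularized_loss(f, x, y, anchors, eps, weight):
    """Assumed base loss (MSE) plus weight times the GL2 penalty over all anchors."""
    preds = np.array([f(xi) for xi in x], dtype=float)
    mse = float(np.mean((preds - np.asarray(y, dtype=float)) ** 2))
    penalty = sum(grad_loss_gl2(f, a, eps) for a in anchors)
    return mse + weight * penalty

# Hypothetical data and candidate prediction functions.
x_data, y_data = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]
anchor_points = [0.5, 1.0, 1.5]
loss_good = regularized_loss(lambda x: x, x_data, y_data, anchor_points, 0.5, 2.0)
loss_bad = regularized_loss(lambda x: -x, x_data, y_data, anchor_points, 0.5, 2.0)
```

Placing anchors across the whole prediction range is what lets the constraint act globally rather than at a single point.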

Next, a modified example of the second embodiment will be described. This modified example differs from the embodiment described above in that the Taylor series of the prediction function y is used for the GradLoss function.

FIG. 8 is a diagram illustrating the processing performed by the prediction device 1 in the modified example of the second embodiment. The horizontal and vertical axes of the graph in FIG. 8, and the meaning of the two patterns "points" and "solid line", are the same as in FIG. 6, so their description is omitted. In FIG. 8, the GradLoss function shown in equation (GL3) is adopted for the model.

$$\begin{aligned}
\mathrm{GradLoss} ={}& |f(a-\varepsilon) - (a-\varepsilon)\,f'(a-\varepsilon) - f(a) + a\,f'(a)| \\
&+ |f'(a-\varepsilon) - f'(a)| \\
&+ |f(a) - a\,f'(a) - f(a+\varepsilon) + (a+\varepsilon)\,f'(a+\varepsilon)| \\
&+ |f'(a) - f'(a+\varepsilon)| \quad \cdots \text{(GL3)}
\end{aligned}$$

In equation (GL3), f(x) is the prediction function y of the variable x (y = f(x)), and f′(x) is the first derivative of the prediction function y. The constant a is an arbitrary real number within the range that the variable x can take. The width ε is the range that defines the neighborhood of the constant a and may be set arbitrarily. For example, the width ε is the average data interval in the x-axis direction of the actual data. Alternatively, the average data interval in the y-axis direction of the actual data may be used as the width ε. In the example of FIG. 8, as in FIG. 7, the x-axis of the actual data represents advertising cost and the y-axis represents sales.

Equation (GL3) uses approximations obtained by truncating the Taylor series of the prediction function y to a polynomial of finite degree. The first-order Taylor approximations of the prediction function y = f(x) at x = (a−ε), x = a, and x = (a+ε) are given by equation (GL3-1) below.

$$\begin{aligned}
T(a-\varepsilon) &= f(a-\varepsilon) + f'(a-\varepsilon) \times (x - (a-\varepsilon)) \\
T(a) &= f(a) + f'(a) \times (x - a) \\
T(a+\varepsilon) &= f(a+\varepsilon) + f'(a+\varepsilon) \times (x - (a+\varepsilon)) \quad \cdots \text{(GL3-1)}
\end{aligned}$$

The functions f(x) and f′(x), the constant a, and the width ε in equation (GL3-1) are the same as in equation (GL3), so their description is omitted.

The invariance loss between f(a−ε) and f(a) is defined as the absolute error between the coefficients of each order of T(a−ε) and T(a). Likewise, the invariance loss between f(a) and f(a+ε) is defined as the absolute error between the coefficients of each order of T(a) and T(a+ε). Equation (GL3) is obtained by adding together the absolute errors of the coefficients of each order for these two invariance losses.
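To make "the coefficients of each order" concrete, each first-order approximation in equation (GL3-1) can be regrouped by powers of x. The symbols c_0 and c_1 are introduced here only for illustration and do not appear in the specification.

```latex
\begin{aligned}
T(a) &= \underbrace{f(a) - a\,f'(a)}_{c_0(a)} + \underbrace{f'(a)}_{c_1(a)}\,x \\
|c_0(a-\varepsilon) - c_0(a)| &= |f(a-\varepsilon) - (a-\varepsilon)\,f'(a-\varepsilon) - f(a) + a\,f'(a)| \\
|c_1(a-\varepsilon) - c_1(a)| &= |f'(a-\varepsilon) - f'(a)|
\end{aligned}
```

These two absolute errors are the first and second terms of equation (GL3); applying the same regrouping to the pair T(a) and T(a+ε) yields the third and fourth terms.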

By adopting the GradLoss function shown in equation (GL3) for the model, the prediction function changes so that the Taylor approximations of the prediction function y up to a specific order agree with one another within the range defined by a width of ±ε around a given constant a (hereinafter referred to as the neighborhood of the constant a). In other words, the prediction function y can be trained so that, in the neighborhood of the constant a, it becomes a curve that follows the tangent line at the constant a and connects smoothly to the region "outside the neighborhood of the constant a". Here, "outside the neighborhood of the constant a" means the range at or below (a−ε) or the range at or above (a+ε). Equation (GL3) is an example in which approximations up to the first order are adopted. Approximations up to any order may of course be adopted in equation (GL3).
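Equation (GL3) can be sketched in Python as follows. The sketch assumes that the derivative f′ is supplied explicitly as a callable `df`; how the derivative is obtained in practice is not specified here. The names `taylor_coeffs` and `grad_loss_gl3` are hypothetical.

```python
def taylor_coeffs(f, df, x0):
    """Return (constant term, slope) of the first-order Taylor polynomial at x0."""
    return f(x0) - x0 * df(x0), df(x0)

def grad_loss_gl3(f, df, a, eps):
    """Equation (GL3): absolute differences of Taylor coefficients around a."""
    c_lo = taylor_coeffs(f, df, a - eps)
    c_mid = taylor_coeffs(f, df, a)
    c_hi = taylor_coeffs(f, df, a + eps)
    return (abs(c_lo[0] - c_mid[0]) + abs(c_lo[1] - c_mid[1])
            + abs(c_mid[0] - c_hi[0]) + abs(c_mid[1] - c_hi[1]))

# Hypothetical prediction functions with hand-written derivatives.
line_loss = grad_loss_gl3(lambda x: 2.0 * x + 1.0, lambda x: 2.0, 1.0, 0.5)
quad_loss = grad_loss_gl3(lambda x: x ** 2, lambda x: 2.0 * x, 1.0, 0.5)
```

A straight line has identical first-order approximations at every point, so its loss vanishes, whereas a curved function such as x² is penalized because its tangent lines differ across the neighborhood.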

As described above, in the modified example of the second embodiment, the regularization term L includes a function (for example, the GradLoss function defined by equation (GL3)) whose input variables are outputs consisting of the terms up to a predetermined order of the Taylor series of the prediction function y in the neighborhood (a±ε) of the variable (x = a) used as input to the prediction function y = f(x). This provides the same effect as the second embodiment described above. That is, even when behavior that contradicts business knowledge is predicted locally at a particular location, for example in the neighborhood of x = a, that behavior can be corrected by training with equation (GL3), defined in this modified example of the second embodiment, over that range.

In particular, the defining equations described in the second embodiment and its modified example (equations (GL2) and (GL3)) are effective when the model has a large number of parameters and, as in the example of FIG. 6, produces predictions that appear to satisfy the constraint but look unnatural as a whole.

Several embodiments have been described above, but the configuration described in each embodiment is not limited to use in that embodiment alone. For example, the configuration described in the first embodiment may be applied to the second embodiment, and the configuration described in the second embodiment may be applied to the first embodiment.

All or part of the prediction device 1 in the embodiments described above may be realized by a computer. In that case, a program for realizing the functions may be recorded on a computer-readable recording medium, and the functions may be realized by causing a computer system to read and execute the program recorded on the recording medium. The "computer system" referred to here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. Furthermore, the "computer-readable recording medium" may also include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside a computer system that serves as the server or client in that case. The program may realize only part of the functions described above, may realize the functions described above in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Embodiments of the present invention have been described in detail above with reference to the drawings, but the specific configuration is not limited to these embodiments and also includes designs and the like within a scope that does not depart from the gist of the present invention.

1 Prediction device
11 Learning data acquisition unit
12 Teacher data acquisition unit
13 Preprocessing unit
14 Learning unit
15 Prediction unit
16 Function control unit
17 Prediction model parameter storage unit

Claims

A prediction device comprising:
a function control unit that controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for the input;
a learning unit that trains the prediction model so that the output obtained by inputting training data into the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the training data; and
a prediction unit that predicts a predicted value for an input based on an output obtained by inputting unlearned data into the prediction model trained by the learning unit.

The prediction device according to claim 1, wherein
the function control unit controls the behavior of the prediction function by adding a regularization term to a predetermined loss function set in advance as the loss function used in the process of training the prediction model, and
the regularization term is generated by multiplying, by a predetermined regularization weight, a function whose variables are the prediction function and a function derived based on the variables used in the prediction function.

The prediction device according to claim 2, wherein
the regularization term is generated by multiplying, by a predetermined regularization weight, a function whose variable is a derivative obtained by differentiating the prediction function with respect to a variable used as input to the prediction function.

The prediction device according to claim 2, wherein
the regularization term includes a function whose input variable is the output of the prediction function in the neighborhood of a variable used as input to the prediction function.

The prediction device according to claim 4, wherein
the regularization term includes a function whose input variable is an output consisting of the terms up to a predetermined order of the Taylor series of the prediction function in the neighborhood of a variable used as input to the prediction function.

The prediction device according to any one of claims 2 to 5, wherein
the regularization term includes functions that differ from one another depending on the value of a variable used as input to the prediction function.

The prediction device according to any one of claims 2 to 6, wherein
the regularization term is generated by multiplying, by a predetermined regularization weight, a function whose variable is the output of the prediction function.

The prediction device according to any one of claims 2 to 7, wherein
the regularization term is generated by multiplying, by a predetermined regularization weight, a function whose variable is the input of the prediction function.

A learning device comprising:
a function control unit that controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for the input; and
a learning unit that trains the prediction model so that the output obtained by inputting training data into the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the training data.

A prediction method comprising:
a function control process in which a function control unit controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for the input;
a learning process in which a learning unit trains the prediction model so that the output obtained by inputting training data into the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the training data; and
a prediction process in which a prediction unit predicts a predicted value for an input based on an output obtained by inputting unlearned data into the prediction model trained by the learning unit.

A program for causing a computer to function as:
function control means for controlling the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for the input;
learning means for training the prediction model so that the output obtained by inputting training data into the prediction function whose behavior is controlled by the function control means approaches the teacher data corresponding to the training data; and
prediction means for predicting a predicted value for an input based on an output obtained by inputting unlearned data into the prediction model trained by the learning means.