WO2020241009A1

WO2020241009A1 - Prediction device, learning device, prediction method, and program

Info

Publication number: WO2020241009A1
Application number: PCT/JP2020/012873
Authority: WO
Inventors: 石田　武
Original assignee: 株式会社エヌ・ティ・ティ・データ
Priority date: 2019-05-31
Filing date: 2020-03-24
Publication date: 2020-12-03
Also published as: JP2020198097A

Abstract

A prediction device equipped with: a function control unit for controlling the behavior of a prediction function indicating a relationship between an input and an output in a prediction model that outputs a predicted value with respect to an input; a learning unit for causing the prediction model to learn such that the output obtained by inputting learning data to the prediction function, the behavior of which is controlled by the function control unit, approaches training data corresponding to the learning data; and a prediction unit for predicting a predicted value with respect to an input, on the basis of the output obtained by inputting unlearned data to the prediction model that has completed the learning caused by the learning unit.

Description

Prediction device, learning device, prediction method, and program

The present invention relates to a prediction device, a learning device, a prediction method, and a program.
The present application claims priority based on Japanese Patent Application No. 2019-102808 filed in Japan on May 31, 2019, the contents of which are incorporated herein by reference.

Analytical models based on methods such as statistical analysis and machine learning are used in a wide range of industries. For example, a technique is known in which machine learning is used to automatically draw up sales promotion plans, a task that has relied on the rule of thumb of the person in charge (see, for example, Patent Document 1). In Patent Document 1, past sales promotion plans and the customer data and sales data related to those plans are learned as learning data, and information necessary for planning an implementation schedule, such as sales forecasts for customers, is collected.
Further, in the field of machine learning, there is a method of using a regularization term in order to prevent overfitting in the learning process (see, for example, Patent Document 2). Patent Document 2 discloses a technique in which a regularization term converges a deep learning parameter to a binary value to perform efficient learning.

Patent Document 1: Japanese Unexamined Patent Publication No. 2018-45316
Patent Document 2: Japanese Unexamined Patent Publication No. 2019-40414

However, a model generated by machine learning may produce results that deviate from human intuition.
For example, consider constructing an analysis model f (x1, x2) that predicts tomorrow's sales (y) from today's advertising cost (x1) and today's sales (x2).
Intuitively, increasing the advertising cost increases sales up to a point, but beyond that point sales should level off even if the advertising cost is increased further. However, when the analysis model is plotted with the advertising cost x1 on the horizontal axis and the sales f on the vertical axis, the sales f may keep increasing monotonically as the advertising cost x1 increases, so that the slowdown in the rate of sales growth beyond a certain point (the diminishing effect of advertising) is not captured. Alternatively, the sales f may become locally negative as the advertising cost x1 increases, which is unnatural behavior.

Such situations are not rare and are considered to occur frequently, especially when the learning data is incomplete. FIGS. 5A to 5C show examples of incomplete data and the problems it causes. In each of FIGS. 5A to 5C, the horizontal axis of the upper and lower graphs shows the advertising cost and the vertical axis shows the sales. The upper graph shows the data and the true curve (the curve representing the true relationship between advertising cost and sales), and the lower graph shows the data and the curve predicted by the model. FIGS. 5A to 5C illustrate the following cases (1) to (3), in which an erroneous conclusion (predicted value) is drawn as a result of analyzing incomplete data.

(1) Insufficient data used for learning (see Fig. 5A)
(2) The data used for learning contains a lot of noise (see Fig. 5B).
(3) Important data to be used for learning cannot be obtained or is not considered in the learning process (see Fig. 5C).

For example, in (1), as shown in FIG. 5A, the data used for learning is insufficient relative to the explanatory variables used as input to the model and the parameters that determine the behavior of the model. In this case, the model cannot distinguish, in the learning process, between data that deviates from the true curve and data that does not, and a trained model exhibiting unnatural behavior is therefore considered to be generated.
In (2), as shown in FIG. 5B, much of the data deviates from the true curve. In this case, the model is influenced by the deviating data in the learning process, and a trained model exhibiting unnatural behavior is therefore considered to be generated.
In (3), as shown in FIG. 5C, important information that can affect the behavior of the prediction model (for example, the fact that the content of the advertisement was unpopular) was not used as an input variable in the learning process, and a trained model exhibiting unnatural behavior is therefore considered to be generated.

As a result, the trained model outputs predicted values that make the analysis result difficult for the user to understand, and there is a problem that an analysis model generated at a development cost cannot be put to use.

The present invention has been made to solve the above problems, and an object of the present invention is to provide a prediction device, a learning device, a prediction method, and a program capable of performing machine learning so that human knowledge is easily reflected in a model.

In order to solve the above problem, one aspect of the present invention is a prediction device including: a function control unit that controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for an input; a learning unit that trains the prediction model so that the output obtained by inputting learning data into the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the learning data; and a prediction unit that predicts a predicted value for an input based on the output obtained by inputting unlearned data into the prediction model trained by the learning unit.

Further, in one aspect of the present invention, in the prediction device described above, the function control unit may control the behavior of the prediction function by using, as a new loss function in the process of training the prediction model, a preset loss function to which a regularization term is added, and the regularization term may be generated by multiplying a function, whose variables are the prediction function and functions derived based on the variables used in the prediction function, by a predetermined regularization weight.

Further, in one aspect of the present invention, in the prediction device described above, the regularization term may be generated by multiplying a function, whose variable is a derivative obtained by differentiating the prediction function with respect to a variable used as an input of the prediction function, by a predetermined regularization weight.

Further, in one aspect of the present invention, in the prediction device described above, the regularization term may include functions different from each other depending on the value of the variable used for inputting the prediction function.

Further, in one aspect of the present invention, in the prediction device described above, the regularization term may be generated by multiplying a function whose variable is the output of the prediction function by a predetermined regularization weight.

Further, in one aspect of the present invention, in the prediction device described above, the regularization term may be generated by multiplying a function whose variable is the input of the prediction function by a predetermined regularization weight.

Further, one aspect of the present invention is a learning device including: a function control unit that controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for an input; and a learning unit that trains the prediction model so that the output obtained by inputting learning data into the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the learning data.

Further, one aspect of the present invention is a prediction method including: a function control step in which a function control unit controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for an input; a learning step in which a learning unit trains the prediction model so that the output obtained by inputting learning data into the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the learning data; and a prediction step in which a prediction unit predicts a predicted value for an input based on the output obtained by inputting unlearned data into the prediction model trained by the learning unit.

Further, one aspect of the present invention is a program for causing a computer to function as: function control means for controlling the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for an input; learning means for training the prediction model so that the output obtained by inputting learning data into the prediction function whose behavior is controlled by the function control means approaches the teacher data corresponding to the learning data; and prediction means for predicting a predicted value for an input based on the output obtained by inputting unlearned data into the prediction model trained by the learning means.

According to the present invention, machine learning can be performed so that the model can easily reflect human knowledge.

FIG. 1 is a block diagram showing a configuration example of the prediction device of the embodiment.
FIG. 2 is a graph explaining the processing performed by the function control unit of the embodiment.
FIG. 3 is a table explaining the processing performed by the function control unit of the embodiment.
FIG. 4 is a flowchart showing the flow of the processing performed by the prediction device of the embodiment.
FIG. 5A is a graph explaining the problem addressed by the present application.
FIG. 5B is a graph explaining the problem addressed by the present application.
FIG. 5C is a graph explaining the problem addressed by the present application.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

First, the embodiment will be described. FIG. 1 is a block diagram showing a configuration example of the prediction device 1 of the first embodiment. The prediction device 1 generates a prediction model and uses the generated prediction model to make a prediction for an input. The prediction model here is a model that outputs a predicted value for an input with respect to an arbitrary item, for example, a model that outputs sales (predicted value) for an advertising cost (input).

The prediction device 1 includes, for example, a learning data acquisition unit 11, a teacher data acquisition unit 12, a preprocessing unit 13, a learning unit 14, a prediction unit 15, a function control unit 16, and a prediction model parameter storage unit 17.

The learning data acquisition unit 11 acquires the learning data. The learning data is data used as an input when training the prediction model. For example, when the prediction model is a model that outputs sales (predicted value) for an advertising cost (input), the learning data is data showing the advertising costs actually invested in the past.

The teacher data acquisition unit 12 acquires teacher data. The teacher data is data used as an output when the prediction model is trained. For example, when the prediction model is a model that outputs sales (predicted value) for an advertising cost (input), the teacher data is data showing the actual sales in the past.

The preprocessing unit 13 generates data to be trained by the prediction model by associating the training data with the teacher data. For example, the preprocessing unit 13 generates data in which the advertising cost (input data) invested on a certain past date is associated with the actual sales on that date as data to be trained by the prediction model.
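As a minimal sketch of this association step, and assuming that both data sets are keyed by a date string, the pairing could be illustrated as follows. The dictionary layout and the helper name pair_by_date are hypothetical and do not appear in this description.

```python
def pair_by_date(ad_costs, sales):
    """Associate the advertising cost and the actual sales that share a date.

    Both arguments are assumed to be dicts keyed by an ISO date string.
    Dates present in only one of the two dicts are dropped.
    """
    common_dates = sorted(set(ad_costs) & set(sales))
    return [(d, ad_costs[d], sales[d]) for d in common_dates]


# Illustrative values only; they are not taken from the description.
ad_costs = {"2019-05-01": 100.0, "2019-05-02": 120.0, "2019-05-03": 90.0}
sales = {"2019-05-01": 1000.0, "2019-05-02": 1150.0}

pairs = pair_by_date(ad_costs, sales)
for record in pairs:
    print(record)
```

Each resulting tuple holds the date, the input (advertising cost), and the corresponding teacher value (sales).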

The learning unit 14 trains the prediction model using the data generated by the preprocessing unit 13. The prediction model may be configured using an arbitrary machine learning method, and is, for example, a recurrent neural network (hereinafter referred to as RNN). Generally, an RNN is composed of three layers: an input layer, a hidden layer (intermediate layer), and an output layer. Data (input data) to be trained by the RNN is input to the input layer. Data (output data) indicating the result learned by the RNN is output from the output layer. The hidden layer performs the core processing of learning. For example, the hidden layer converts the input into a value represented by a function called an activation function (transfer function) and outputs it. For example, the activation function includes, but is not limited to, a rectified linear function, a sigmoid function, a step function, and the like, and any function may be used.

Here, the configuration of the RNN will be briefly explained. In the RNN, nodes connect the unit of the input layer to each of the plurality of units of the shallowest of the n hidden layers. Here, n is an arbitrary natural number. The shallowest layer is the hidden layer closest to the input layer, which is the first layer in this example. Nodes likewise connect each unit of the first layer to each of the plurality of units of the next-shallowest hidden layer (the second layer in this example). A weight W based on a coupling coefficient and a bias component b are applied to each of the nodes connecting the units. As a result, when data is output from a unit in a certain layer to a unit in a deeper layer, the data is output after the weight W and the bias component b corresponding to the coupling coefficient of the node connecting the units have been applied.
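The layer-to-layer propagation described above can be sketched as follows. This is a simplified illustration under stated assumptions: the recurrent connections of an RNN are omitted, the rectified linear function is chosen as the activation function, and the names layer_forward, relu, and identity as well as all numeric values are hypothetical.

```python
def relu(v):
    """Rectified linear function, one of the activation functions named in the text."""
    return max(v, 0.0)


def identity(v):
    """No activation; used here for the output layer."""
    return v


def layer_forward(inputs, weights, biases, activation):
    """Propagate values from one layer to the next, deeper layer.

    weights[j][i] is the weight W on the connection from unit i of the
    shallower layer to unit j of the deeper layer; biases[j] is the bias b.
    """
    outputs = []
    for row, b in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(pre_activation))
    return outputs


# Two input values, one hidden layer with two units, one output unit.
hidden = layer_forward([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], relu)
output = layer_forward(hidden, [[2.0, 1.0]], [0.5], identity)
print(hidden, output)
```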

The learning unit 14 inputs the learning data to the input layer of the prediction model. The learning unit 14 trains the prediction model so that the data output from the output layer in response to the learning data approaches the teacher data corresponding to the learning data. The learning unit 14 derives the relationship between the error and the parameters set in the prediction model as a loss function. The error here is the degree of divergence between the data output from the output layer of the prediction model and the teacher data. Any index may be used for the degree of divergence; for example, the squared error or the cross entropy is used.

In general, the loss function l (lowercase L) can be expressed as a function having the teacher data y_R and the prediction function f (x) as variables, as in the following equation (1). In equation (1), l is the loss function, y_R is the teacher data, and f (x) is a function showing the relationship between the input (x) and the output (f (x)) of the prediction model.

Loss function: $l(y_R, f(x))$ ... (1)

If the output of the prediction model is y, the prediction function is expressed by y = f (x). When this is applied to the equation (1), the loss function l can be expressed by the following equation (2).

Loss function: $l(y_R, y)$ ... (2)

In the present embodiment, instead of the loss function l shown in equation (1) or (2), the learning unit 14 uses, as a new loss function l #, the loss function l to which the regularization term L derived by the function control unit 16 is added.

The behavior of the prediction function y can be controlled by adding the regularization term L to the loss function l. As a result, when the prediction function y shows unnatural behavior, the behavior can be made natural. For example, even if the prediction function y shows, in the learning process, behavior in which the sales f (x) locally become negative as the advertising cost x increases, the prediction function can be controlled so that the sales do not become negative. Therefore, machine learning can be performed so that human knowledge is easily reflected in the prediction model.

The loss function l # is expressed by the following equation (3). In equation (3), l # is the loss function used by the learning unit 14 of the present embodiment, l is the loss function represented by equation (1) or (2), y_R is the teacher data, y is the output f (x) of the prediction model, λ is the regularization weighting coefficient, and L is the regularization term.

Loss function: $l^{\#} = l(y_R, y) + \lambda \times L$ ... (3)

Note that the regularization weighting coefficient λ may be an arbitrary real number or a function of the input x. Further, the regularization term L will be described in detail later.
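Equation (3) can be illustrated with the following minimal sketch. It assumes the squared error as the base loss function l (one of the indices mentioned above) and treats the value of the regularization term L as already computed. The function names and numeric values are hypothetical.

```python
def squared_error(y_true, y_pred):
    """Base loss l(y_R, y): squared error between teacher data and output."""
    return (y_true - y_pred) ** 2


def regularized_loss(x, y_true, y_pred, reg_term, lam):
    """New loss l# = l(y_R, y) + lambda * L, as in equation (3).

    lam may be a real number or a function of the input x, matching the
    note that the weighting coefficient may depend on x.
    """
    weight = lam(x) if callable(lam) else lam
    return squared_error(y_true, y_pred) + weight * reg_term


# Constant weighting coefficient.
loss_const = regularized_loss(2.0, 3.0, 2.0, reg_term=0.5, lam=2.0)
# Weighting coefficient given as a function of the input x (an assumed form).
loss_func = regularized_loss(2.0, 3.0, 2.0, reg_term=0.5, lam=lambda x: 0.1 * x)
print(loss_const, loss_func)
```

When the regularization term is 0, the result reduces to the base loss l itself.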

The learning unit 14 uses the error backpropagation method to determine the combination of the weight W and the bias component b that minimizes the loss function (in the present embodiment, the loss function l #). The learning unit 14 associates the determined (updated) weight W and bias component b with the nodes and units, and stores the associated information in the prediction model parameter storage unit 17.

The prediction unit 15 generates (reconstructs) an RNN based on the weight W and the bias component b of each layer determined by learning by referring to the prediction model parameter storage unit 17. The prediction unit 15 uses the generated (reconstructed) RNN as a prediction model, inputs unlearned input data to the prediction model, and predicts the prediction value based on the output data output from the prediction model. The “unlearned input data” is, for example, data that is not used as training data at the stage of training the prediction model. For example, the prediction unit 15 outputs a value output from the output layer as a prediction value by inputting unlearned input data to the reconstructed RNN input layer.

The prediction model parameter storage unit 17 stores the weight W and the bias component b of each layer determined by learning the prediction model. Information indicating the configuration of the RNN may be stored in the prediction model parameter storage unit 17. The information indicating the configuration of the RNN includes, for example, information indicating the number of hidden layers of the RNN, the number of units of each layer, the activation function, and the like.

The function control unit 16 controls the behavior of the prediction function y in the prediction model. The function control unit 16 acquires the prediction function y derived by the prediction unit 15 each time learning is executed. The function control unit 16 determines whether or not the behavior of the prediction function y is unnatural.

FIG. 2 is a graph illustrating the processing performed by the function control unit 16 of the embodiment. FIG. 2 shows the relationship between the input (advertising cost x shown on the horizontal axis) and the output (sales y shown on the vertical axis) predicted by the prediction model. Here, it is assumed that there is business knowledge that sales for advertising costs increase monotonically and that the rate of increase in sales for advertising costs does not change sharply. That is, this premise is based on the business knowledge that advertising does not have a negative effect and that advertising does not increase sales exponentially.

According to the above business knowledge, sales y should be on an increasing trend with respect to the advertising cost in the area E1 shown in FIG. 2. However, when the prediction result of the prediction model is such that sales decrease with respect to the advertising cost, as in the area E1 in FIG. 2, the function control unit 16 determines that the behavior of the prediction function y is unnatural in the area E1. Further, when the rate of increase in sales with respect to the advertising cost changes abruptly, as in the area E2 (the slope is larger than a predetermined threshold value and the increasing tendency is too strong), the function control unit 16 determines that the behavior of the prediction function y is unnatural in the area E2.
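The two checks described for the areas E1 and E2 can be sketched as follows, assuming that the prediction function has been sampled at a few input values and that slopes are estimated by finite differences between neighbouring samples. The function names, the sample values, and the threshold are hypothetical.

```python
def segment_slopes(xs, ys):
    """Finite-difference slope of the predicted curve on each segment."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


def flag_unnatural(xs, ys, max_slope):
    """Return the segments whose behavior contradicts the business knowledge.

    A segment is flagged as "decreasing" when sales fall as the advertising
    cost rises (as in area E1) and as "too steep" when the slope exceeds the
    predetermined threshold max_slope (as in area E2).
    """
    flags = []
    for i, slope in enumerate(segment_slopes(xs, ys)):
        if slope < 0.0:
            flags.append((xs[i], xs[i + 1], "decreasing"))
        elif slope > max_slope:
            flags.append((xs[i], xs[i + 1], "too steep"))
    return flags


# Illustrative predicted curve: advertising cost xs, predicted sales ys.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 1.0, 2.0, 9.0]
flags = flag_unnatural(xs, ys, max_slope=5.0)
print(flags)
```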

When the function control unit 16 determines that the behavior of the prediction model is unnatural, the function control unit 16 derives a new loss function l # by adding the regularization term L to the loss function l. The function control unit 16 controls the behavior of the prediction function y by having the prediction unit 15 train the prediction model using the derived loss function l #.

The regularization term L derived by the function control unit 16 is expressed as a function whose variables are the input and output of the prediction model and derivatives of arbitrary order, as shown in equation (4). In equation (4), x is the input of the prediction model, y is the output of the prediction model, dy / dx is the first derivative of the output y with respect to the input x, and d^n y / dx^n is the n-th derivative of the output y with respect to the input x. n is an arbitrary natural number.

Regularization term: $L\left(x, y, \frac{dy}{dx}, \ldots, \frac{d^{n}y}{dx^{n}}\right)$ ... (4)

The regularization term L is not limited to one that uses all the variables shown in equation (4); it suffices that it uses at least one of them. For example, only higher-order derivatives of the second order or above may be used as the derivatives.
In addition, the regularization term L may include the so-called L1 regularization or L2 regularization, which are regularization techniques used in conventional statistics and machine learning to prevent overfitting and improve generalization ability, or it may be configured without them.

The function control unit 16 derives the regularization term L as the product of a function indicating the range of the input x and a function for controlling the behavior of the output y, as in equation (5). In equation (5), I_A is the function indicating the range of the input x, and GradLoss is the function for controlling the behavior (for example, the slope) of the output y.

$L\left(x, y, \frac{dy}{dx}, \ldots, \frac{d^{n}y}{dx^{n}}\right) = I_A(x) \times \mathrm{GradLoss}\left(x, y, \frac{dy}{dx}, \ldots, \frac{d^{n}y}{dx^{n}}\right)$ ... (5)

In equation (5), the function I_A (x) is a function (domain determination function) that outputs 1 when x ∈ A and 0 otherwise. Here, x is an arbitrary value that can be taken as an input.
As a result, the behavior of the output y can be controlled in any subset A of the domain of the analysis model.
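A minimal sketch of equation (5) is given below. It assumes that the subset A is a closed interval and that GradLoss depends only on the first derivative dy / dx. The GradLoss used in the usage lines is the form max ((-1) × dy / dx, 0) that this description gives for an increasing trend. The function names and numeric values are hypothetical.

```python
def indicator(x, lower, upper):
    """Domain determination function I_A(x) for A = [lower, upper] (assumed interval)."""
    return 1.0 if lower <= x <= upper else 0.0


def regularization_term(x, dydx, lower, upper, grad_loss):
    """L = I_A(x) * GradLoss(...), restricted here to a GradLoss of dy/dx only."""
    return indicator(x, lower, upper) * grad_loss(dydx)


def grad_loss_increasing(dydx):
    """GradLoss for 'y should increase with x': max((-1) * dy/dx, 0)."""
    return max(-dydx, 0.0)


# A negative slope inside A is penalized; the same slope outside A is not.
inside = regularization_term(5.0, -2.0, 0.0, 10.0, grad_loss_increasing)
outside = regularization_term(15.0, -2.0, 0.0, 10.0, grad_loss_increasing)
print(inside, outside)
```

Restricting the penalty with I_A in this way leaves the behavior of the output y outside A unconstrained.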

Here, an example of the regularization term L derived by the function control unit 16 will be described with reference to FIG. 3. FIG. 3 is a table for explaining the processing performed by the function control unit 16 of the embodiment. FIG. 3 includes the items "business knowledge", "what kind of behavior f (x) should show", and "GradLoss definition formula".
"Business knowledge" indicates knowledge set by humans according to the item predicted by the prediction model. The knowledge is not limited to business and may include, for example, historical background, experience, assumptions, and knowledge based on a combination thereof.
"What kind of behavior f (x) should show" expresses, as a mathematical formula, the behavior of the output f (x) corresponding to the business knowledge. "GradLoss definition formula" shows the specific GradLoss formula corresponding to the business knowledge.

For example, as shown in the first item of FIG. 3, when there is business knowledge that the output y should tend to increase with respect to the input x, the function control unit 16 determines that dy / dx > 0, that is, behavior in which the first derivative of y is positive, is desirable. In this case, the function control unit 16 defines max ((-1) × dy / dx, 0) as the GradLoss function. The max function here compares the two values given as arguments and outputs the larger one.
For example, when dy / dx is positive, ((-1) × dy / dx) becomes negative, and the GradLoss function outputs 0. On the other hand, when dy / dx is negative, ((-1) × dy / dx) becomes positive, and the GradLoss function outputs ((-1) × dy / dx).

The GradLoss function constitutes the regularization term L added to the loss function, as shown in equations (4) and (5). Therefore, when dy / dx is positive, the regularization term L becomes 0, and the loss function l itself shown in equation (1) or (2) is applied as the loss function used for learning by the prediction unit 15. On the other hand, when dy / dx is negative, the regularization term L takes a value corresponding to ((-1) × dy / dx), and the loss function l shown in equation (1) or (2) plus the regularization term L corresponding to ((-1) × dy / dx) is applied as the loss function used for learning by the prediction unit 15.
If there is business knowledge that the output y should tend to decrease with respect to the input x, max (dy / dx, 0) may be defined as the GradLoss function.
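The two GradLoss definitions for increasing and decreasing trends can be sketched as follows. The derivative dy / dx is estimated here by a central finite difference, which is an assumption made for illustration; this description does not specify how the derivative is obtained. The function names and the example model are hypothetical.

```python
def derivative(f, x, h=1e-4):
    """Central finite-difference estimate of dy/dx (an assumed method)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def grad_loss_increasing(dydx):
    """max((-1) * dy/dx, 0): zero when y increases, positive when y decreases."""
    return max(-dydx, 0.0)


def grad_loss_decreasing(dydx):
    """max(dy/dx, 0): zero when y decreases, positive when y increases."""
    return max(dydx, 0.0)


def model(x):
    """Hypothetical prediction function that rises until x = 3 and then falls."""
    return -(x - 3.0) ** 2


slope_rising = derivative(model, 2.0)   # about +2
slope_falling = derivative(model, 4.0)  # about -2
print(grad_loss_increasing(slope_rising), grad_loss_increasing(slope_falling))
```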

As shown in the second item of FIG. 3, when there is business knowledge that the increase in the output y with respect to the input x should not be too strong, the function control unit 16 determines that dy / dx < b, that is, behavior in which the slope of y is smaller than b, is desirable. Here, b is an arbitrary positive real number. In this case, the function control unit 16 defines (max (dy / dx, b) - b) as the GradLoss function.

For example, when dy / dx is smaller than b, b is output from the max function and 0 is output from the GradLoss function. On the other hand, when dy / dx is larger than b, dy / dx is output from the max function, and (dy / dx - b) is output from the GradLoss function. Therefore, when dy / dx is smaller than b, the regularization term L becomes 0, and the loss function l itself shown in equation (1) or (2) is applied as the loss function used for learning by the prediction unit 15. On the other hand, when dy / dx is larger than b, the regularization term L takes a value corresponding to (dy / dx - b), and the loss function l shown in equation (1) or (2) plus the regularization term L corresponding to (dy / dx - b) is applied as the loss function used for learning by the prediction unit 15.
When attention is paid to the degree of decrease in the output y with respect to the input x, max (-dy / dx, -b) + b may be defined as the GradLoss function. The value of b may be set as appropriate depending on whether the concern is that the degree is too strong or too weak.
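The two slope-bound formulas can be sketched as follows. Algebraically, max (dy / dx, b) - b is zero whenever dy / dx ≤ b, and max (-dy / dx, -b) + b is zero whenever dy / dx ≥ b, so the first acts as an upper bound on the slope and the second as a lower bound. The function names and numeric values are hypothetical.

```python
def grad_loss_slope_upper(dydx, b):
    """max(dy/dx, b) - b: penalizes slopes that exceed the bound b."""
    return max(dydx, b) - b


def grad_loss_slope_lower(dydx, b):
    """max(-dy/dx, -b) + b: penalizes slopes that fall below the bound b."""
    return max(-dydx, -b) + b


# Upper bound b = 2: a slope of 1 is acceptable, a slope of 5 is penalized by 3.
print(grad_loss_slope_upper(1.0, 2.0), grad_loss_slope_upper(5.0, 2.0))

# Lower bound b = -1 (an assumed value): a slope of -3 decreases too strongly.
print(grad_loss_slope_lower(0.0, -1.0), grad_loss_slope_lower(-3.0, -1.0))
```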

As shown in the third item of FIG. 3, when there is business knowledge that the increasing trend of the output y with respect to the input x should be convex downward, the function control unit 16 determines that d^2y / dx^2 > 0, that is, behavior in which the second derivative of y is positive, is desirable. In this case, the function control unit 16 defines max ((-1) × d^2y / dx^2, 0) as the GradLoss function.

For example, when d^2y / dx^2 is positive, the GradLoss function outputs 0. On the other hand, when d^2y / dx^2 is negative, the GradLoss function outputs ((-1) × d^2y / dx^2). Therefore, when d^2y / dx^2 is positive, the regularization term L becomes 0, and the loss function l itself shown in equation (1) or (2) is applied as the loss function used for learning by the prediction unit 15. On the other hand, when d^2y / dx^2 is negative, the regularization term L takes a value corresponding to ((-1) × d^2y / dx^2), and the loss function l shown in equation (1) or (2) plus the regularization term L corresponding to ((-1) × d^2y / dx^2) is applied as the loss function used for learning by the prediction unit 15.
If the increasing trend of the output y with respect to the input x should be convex upward, max (d^2y / dx^2, 0) may be defined as the GradLoss function.
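The second-derivative conditions can be sketched as follows. The second derivative is estimated by a central second difference, which is an assumption made for illustration. The function names and the example functions are hypothetical.

```python
def second_derivative(f, x, h=1e-3):
    """Central second-difference estimate of d^2y/dx^2 (an assumed method)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def grad_loss_convex_downward(d2ydx2):
    """max((-1) * d^2y/dx^2, 0): zero when the second derivative is positive."""
    return max(-d2ydx2, 0.0)


def grad_loss_convex_upward(d2ydx2):
    """max(d^2y/dx^2, 0): zero when the second derivative is negative."""
    return max(d2ydx2, 0.0)


def bowl(x):
    """Hypothetical function with a positive second derivative (+2)."""
    return x ** 2


def dome(x):
    """Hypothetical function with a negative second derivative (-2)."""
    return -(x ** 2)


penalty_bowl = grad_loss_convex_downward(second_derivative(bowl, 1.0))
penalty_dome = grad_loss_convex_downward(second_derivative(dome, 1.0))
print(penalty_bowl, penalty_dome)
```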

In this way, the function control unit 16 defines, according to business knowledge, a mathematical expression (for example, a condition on a derivative of the output y) that represents the desirable behavior of the output y. The function control unit 16 derives a regularization term L that is 0 when the input/output relationship of the output y shows behavior that matches the defined expression and a non-zero value when it shows behavior that does not match.
As a result, the function control unit 16 can derive regularization terms L that differ depending on whether or not the prediction model (prediction function y) behaves according to business knowledge. Therefore, when the prediction model (prediction function y) does not behave according to business knowledge, the function control unit 16 can derive a regularization term L with a larger value than when it does, and the prediction model can be trained using a larger loss function l #.
Because the loss increases when the prediction model does not behave according to business knowledge in the learning process, learning can be expected to proceed so that the parameters of the prediction model (the weight W and the bias component b) are determined in a way that makes the model behave according to business knowledge.
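The effect described above can be illustrated with a deliberately small example. The sketch below is not the RNN of the embodiment: it assumes a one-parameter linear model y = w × x, whose slope dy / dx equals w, a mean squared error as the base loss, and plain gradient descent with a subgradient for the max function. The data values, the learning rate, and the weighting coefficient are hypothetical.

```python
def train(xs, ys, lam, lr=0.01, steps=500):
    """Fit y = w * x by gradient descent on l# = MSE + lam * max(-w, 0).

    For this model dy/dx = w, so max(-w, 0) is the GradLoss for the
    business knowledge "y should increase with x".
    """
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad_fit = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Subgradient of max(-w, 0): -1 where w < 0, otherwise 0.
        grad_penalty = -1.0 if w < 0.0 else 0.0
        w -= lr * (grad_fit + lam * grad_penalty)
    return w


# Illustrative data whose best unconstrained fit has a negative slope.
xs = [1.0, 2.0, 3.0]
ys = [-1.0, -2.0, -3.0]

w_plain = train(xs, ys, lam=0.0)        # no regularization term
w_regularized = train(xs, ys, lam=20.0)  # strong penalty on a negative slope
print(w_plain, w_regularized)
```

Without the regularization term the slope settles near -1, while with the penalty it stays close to 0, the boundary of the region that matches the business knowledge. How close depends on the assumed learning rate and weighting coefficient.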

In the example of FIG. 3, the case where the variable of the regularization term L is a derivative of the prediction function y has been described, but the present invention is not limited to this. For example, the variable used for the regularization term L may be the output y of the prediction function itself. In this case, when there is business knowledge that the value of the output y cannot fall outside a certain range, the behavior of the prediction function y can be corrected so that the output y falls within that range.
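This description does not give a formula for a regularization term on the output y itself. One possible form, offered purely as an assumption and built in the same max-based style as the GradLoss definitions of FIG. 3, is sketched below. The function name and the bounds are hypothetical.

```python
def range_penalty(y, lower, upper):
    """Assumed penalty on the output y: zero inside [lower, upper],
    and equal to the distance to the nearest bound outside it."""
    return max(y - upper, 0.0) + max(lower - y, 0.0)


# Example: sales are known not to be negative and not to exceed 10.
print(range_penalty(5.0, 0.0, 10.0))   # inside the range
print(range_penalty(12.0, 0.0, 10.0))  # above the upper bound
print(range_penalty(-3.0, 0.0, 10.0))  # below the lower bound
```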

Further, the variable used for the regularization term L may be the input x itself of the prediction function y. In this case, the behavior of the predicted value according to the value of the input x can be controlled.
Generally, once the parameters of a prediction model have been determined by learning, the model can output a predicted value for any input. Therefore, even if the input x can be obtained only in a predetermined range, that is, even if the training data can be obtained only in a predetermined range, the model can output a predicted value for an input outside that range. In this case, since no learning data exists there, the predicted value is output without regard to actual results (the correspondence between the learning data and the teacher data). Consequently, there is a high possibility that the result will differ from the business knowledge in a certain range of the input x.
In such a case, by using the input x as the variable of the regularization term L, it is possible to correct the behavior of the prediction function y in a specific range of the input x so as to be in line with the business knowledge. For example, suppose that the output y is predicted to repeatedly increase and decrease in a specific range of the input x. If the knowledge that the output y increases monotonically in that range is given, learning is performed so as to revise the behavior in which the output y decreases, and the behavior of the prediction function y can be corrected so that the output y increases monotonically even in a range where no training data exists.
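Restricting the correction to a specific range of the input x can be sketched with an indicator function that is 1 inside the range A and 0 outside it. The sketch below assumes the knowledge "the output y increases monotonically for x in A = [a, b]" and evaluates the penalty at arbitrary points, which need not be training inputs. The function names and the finite-difference slope are hypothetical, and the exact form of the formula used in the embodiment is not reproduced here.

```python
def indicator(x, a, b):
    """I_A(x): 1 if x lies in the range A = [a, b], otherwise 0."""
    return 1.0 if a <= x <= b else 0.0


def slope(f, x, h=1e-4):
    """Central finite-difference estimate of dy/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def ranged_monotonic_penalty(f, points, a, b):
    """Mean of I_A(x) * max(0, -dy/dx) over the evaluation points."""
    total = sum(indicator(x, a, b) * max(0.0, -slope(f, x)) for x in points)
    return total / len(points)


# Evaluation points may lie where no training data exists.
points = [-2.0, -1.0, 1.0, 2.0]
f = lambda x: x * x  # decreasing for x < 0, increasing for x > 0

print(ranged_monotonic_penalty(f, points, 0.5, 2.5))    # f rises inside this range
print(ranged_monotonic_penalty(f, points, -2.5, -0.5))  # f falls inside this range
```

Because the indicator masks every point outside A, the same prediction function is penalized or left alone depending only on the range to which the knowledge applies.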
In addition, dummy data (c, f (c)) may be created in a region where learning data does not exist or is insufficient, and the behavior of the prediction function y may be corrected by using the dummy data (c, f (c)). Here, c is an arbitrary point on the domain of the prediction function y that does not exist in the training data.
Further, the input of the regularization term L may be made independent of the input of the loss function l. For example, when the value of the prediction function y is not required for the calculation of the regularization term L, the input data of the regularization term L may be (x + εi, y) with respect to the training data (x, y) used in the loss function l. Here, the εi are noise values, arbitrary in number, sampled from an arbitrary distribution.
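One way to build inputs for the regularization term that are independent of the inputs of the loss function l is sketched below. It assumes Gaussian noise for εi, although the text allows any distribution, and it assumes a slope-based regularization term that ignores the second element of each pair. The function names, the seed, and the noise scale are illustrative assumptions.

```python
import random


def perturbed_data(data, n_noise, sigma, seed=0):
    """Return (x + eps_i, y) for each training pair (x, y) and n_noise noise values."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sigma), y) for x, y in data for _ in range(n_noise)]


def slope(f, x, h=1e-4):
    """Central finite-difference estimate of dy/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def grad_loss(f, reg_data):
    """Mean of max(0, -dy/dx) at the perturbed inputs; the paired y is not used."""
    return sum(max(0.0, -slope(f, x)) for x, _ in reg_data) / len(reg_data)


train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # data used by the loss function l
reg_data = perturbed_data(train, n_noise=5, sigma=0.1)

print(len(reg_data))                      # 3 training pairs x 5 noise values
print(grad_loss(lambda x: x, reg_data))   # a rising function is not penalized
```

The regularization term is then evaluated around, rather than exactly at, the training inputs, while the loss function l continues to use the original pairs.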

FIG. 4 is a flowchart showing a flow of processing performed by the prediction device 1 of the first embodiment.
First, the learning unit 14 of the prediction device 1 trains the prediction model (step S10). The learning unit 14 trains the prediction model by determining the parameters of the prediction model so that the predicted value output when the training data is input to the prediction model approaches the teacher data associated with the training data.

The function control unit 16 determines whether or not there is a sense of discomfort in the behavior of the trained prediction model (step S11). The function control unit 16 determines that there is a sense of discomfort in the behavior of the prediction model when that behavior does not match a mathematical formula (for example, one concerning a derivative of any order of the output y) set in advance according to the item predicted by the prediction model.

When there is a sense of discomfort in the behavior of the prediction model, the function control unit 16 adds the regularization term L corresponding to the sense of discomfort to the loss function l (step S12). The function control unit 16 derives this regularization term L by, for example, using the max function or the like to set the regularization term L according to the degree to which the behavior fails to match the mathematical formula that expresses the business knowledge.
The learning unit 14 retrains the prediction model using the loss function l # obtained by adding the regularization term L derived by the function control unit 16 to the loss function l (step S13).
After retraining the prediction model (step S13), the learning unit 14 determines whether or not the end condition for the learning of the prediction model is satisfied (step S14). The learning end condition is a predetermined condition, for example, that the error between the predicted value and the teacher data is less than a predetermined threshold value, or that the amount of change in that error per learning iteration is less than a predetermined threshold value.
If the learning end condition is not satisfied, the learning unit 14 returns to step S10 and continues training so that the retrained prediction model comes to satisfy the end condition. In this way, the learning unit 14 repeats the flow of processing shown in steps S10 to S13 until the learning end condition for the prediction model is satisfied.
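The flow of steps S10 to S14 can be sketched for a deliberately small model. The sketch assumes a linear prediction function y = w*x + b, the business knowledge "the slope is not negative", a regularization term max(0, -w), and an end condition based on the change in error between rounds. All names, the data, the learning rate, and the thresholds are hypothetical, and the embodiment itself uses a neural network rather than this linear model.

```python
def mse(xs, ts, w, b):
    """Error between the predicted values and the teacher data."""
    return sum((w * x + b - t) ** 2 for x, t in zip(xs, ts)) / len(xs)


def fit(xs, ts, w, b, lam, steps=2000, lr=0.05):
    """Gradient descent on MSE + lam * max(0, -w) for the model y = w*x + b."""
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - t for x, t in zip(xs, ts)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        if w < 0.0:
            grad_w -= lam  # subgradient of lam * max(0, -w) for a negative slope
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def run_flow(xs, ts, lam=1.0, tol=1e-9, max_rounds=20):
    w, b, prev_err = 0.0, 0.0, None
    for _ in range(max_rounds):
        w, b = fit(xs, ts, w, b, lam=0.0)      # S10: train with the loss l
        if w < 0.0:                            # S11: behavior conflicts with knowledge
            w, b = fit(xs, ts, w, b, lam=lam)  # S12, S13: add L, retrain with l #
        err = mse(xs, ts, w, b)
        if prev_err is not None and abs(prev_err - err) < tol:  # S14: end condition
            break
        prev_err = err
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ts = [3.0, 2.1, 0.9, 0.0]  # data that by itself suggests a falling trend
w_plain, _ = fit(xs, ts, 0.0, 0.0, lam=0.0)
w_flow, _ = run_flow(xs, ts)
print(w_plain, w_flow)  # the slope after the flow is closer to zero
```

The regularization acts as a soft constraint in this sketch: the retrained slope moves toward the assumed knowledge by an amount that depends on the regularization weight, rather than being forced to satisfy it exactly.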

Note that, although the flowchart of FIG. 4 described above illustrates the case where, in step S11, the function control unit 16 determines whether or not there is a sense of discomfort with respect to the trained prediction model, the present invention is not limited to this. The function control unit 16 may determine whether or not there is a sense of discomfort in the behavior of the prediction model before learning, during learning, or after learning. That is, whichever of these stages the prediction model is in, its behavior may be rebuilt by learning using the loss function l # so as to be in line with the business knowledge.

As described above, the prediction device 1 of the embodiment includes a function control unit 16, a learning unit 14, and a prediction unit 15. The function control unit 16 controls the behavior of the prediction function y. The learning unit 14 trains the prediction model so that the output obtained by inputting the training data to the prediction function y whose behavior is controlled by the function control unit 16 approaches the teacher data corresponding to the training data. The prediction unit 15 predicts the predicted value for an input based on the output obtained by inputting unlearned data into the prediction model trained by the learning unit 14.
As a result, in the prediction device 1 of the embodiment, the function control unit 16 can control the behavior of the prediction function y and can correct that behavior when it differs from the business knowledge. The prediction model can therefore be trained by machine learning in a way that makes it easy to reflect human knowledge.

Further, in the prediction device 1 of the first embodiment, the function control unit 16 controls the behavior of the prediction function y by using, in the process of training the prediction model, a loss function obtained by adding the regularization term L to a preset loss function l. The regularization term L is generated by multiplying a function, whose variable is a function derived based on the prediction function y and the variable used for the prediction function y (for example, the input x of the prediction function y), by a predetermined regularization weight λ.
As a result, in the prediction device 1 of the first embodiment, adding the regularization term L to the loss function l makes the loss appear larger when the behavior differs from the business knowledge, so that the behavior of the prediction function y can be learned in line with the business knowledge, and the same effect as described above is obtained.

Further, in the prediction device 1 of the first embodiment, the regularization term is generated by multiplying a function, whose variable is a derivative (for example, dy / dx) obtained by differentiating the prediction function y with respect to the input x of the prediction function y, by a predetermined regularization weight λ. As a result, in the prediction device 1 of the first embodiment, the regularization term L can be derived according to the slope of the output y with respect to the input x, so that the model can be trained such that this slope is in line with the business knowledge, and the same effect as described above is obtained.

Further, in the prediction device 1 of the first embodiment, the regularization term includes a function whose value differs depending on the value of the input x of the prediction function y (for example, I_A(x) in formula (5)). As a result, in the prediction device 1 of the first embodiment, the regularization term L corresponding to a specific range of the input x can be derived, and the behavior of the output y in that specific range of the input x can be learned so as to be in line with the business knowledge, and the same effect as described above is obtained.

Further, in the prediction device 1 of the first embodiment, the regularization term is generated by multiplying a function whose variable is the output y of the prediction function y by a predetermined regularization weight λ. As a result, in the prediction device 1 of the first embodiment, the regularization term L corresponding to the value of the output y can be derived. For example, when there is business knowledge that the value of the output y cannot deviate from a certain range, the behavior of the output y can be learned so as to be in line with that business knowledge.

Further, in the prediction device 1 of the first embodiment, the regularization term is generated by multiplying a function whose variable is the input x of the prediction function y by a predetermined regularization weight λ. As a result, in the prediction device 1 of the first embodiment, the regularization term L corresponding to the value of the input x can be derived. For example, even when the predicted value cannot otherwise be controlled because no training data exists in a predetermined range of the input x, the behavior of the prediction function y can be learned, by using dummy data or the like, so as to be in line with the business knowledge.

Further, the prediction device 1 of the first embodiment may be composed of a learning device that generates a learned model and a control device that makes a prediction using the learned model generated by the learning device. In this case, the learning device includes a function control unit 16 and a learning unit 14. When the learning device includes the function control unit 16, the learning device of the embodiment can create a prediction model that reflects business knowledge, and has the same effect as the above-mentioned effect.

In the above-described embodiment, the case where an RNN is applied as the prediction model has been illustrated and described, but the present invention is not limited to this. For example, as the prediction model, an LSTM (Long Short-Term Memory), which is another type of recurrent neural network, may be applied, or a feedforward neural network may be applied. In the feedforward case, a multi-layer perceptron may be applied as the prediction model. Further, a machine learning method other than a neural network may be used as the prediction model.

Further, in the above-described embodiment, the case where the max function is used as the GradLoss function has been described as an example, but the present invention is not limited to this. As the GradLoss function, any function or mathematical formula that reflects human knowledge about at least the behavior of the output f (x) may be used.
For example, a min function may of course be used as the GradLoss function instead of the max function. The min function is a function that outputs the smallest value among the plurality of values given as its arguments.

The whole or a part of the prediction device 1 in the above-described embodiment may be realized by a computer. In that case, a program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read and executed by a computer system. The term "computer system" as used herein includes an OS and hardware such as peripheral devices. Further, the "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. Further, the "computer-readable recording medium" may also include a medium that dynamically holds the program for a short period of time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case. Further, the above program may realize only a part of the above-mentioned functions, or may realize the above-mentioned functions in combination with a program already recorded in the computer system. The prediction device 1 may also be realized by using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Although the embodiments of the present invention have been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment, and includes designs and the like within a range that does not deviate from the gist of the present invention.

1 Prediction device
11 Learning data acquisition unit
12 Teacher data acquisition unit
13 Preprocessing unit
14 Learning unit
15 Prediction unit
16 Function control unit
17 Prediction model parameter storage unit

Claims

1. A prediction device comprising:
a function control unit that controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for the input;
a learning unit that trains the prediction model so that the output obtained by inputting training data to the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the training data; and
a prediction unit that predicts a predicted value for an input based on an output obtained by inputting unlearned data to the prediction model that has been trained by the learning unit.
2. The prediction device according to claim 1, wherein
the function control unit controls the behavior of the prediction function by adding a regularization term to a predetermined loss function set in advance as the loss function used in the process of training the prediction model, and
the regularization term is generated by multiplying a function, whose variable is a function derived based on the prediction function and the variable used in the prediction function, by a predetermined regularization weight.
3. The prediction device according to claim 2, wherein
the regularization term is generated by multiplying a function, whose variable is a derivative obtained by differentiating the prediction function with respect to a variable used as an input of the prediction function, by a predetermined regularization weight.
4. The prediction device according to claim 2 or 3, wherein
the regularization term includes a function whose value differs depending on the value of a variable used as an input of the prediction function.
5. The prediction device according to any one of claims 2 to 4, wherein
the regularization term is generated by multiplying a function whose variable is the output of the prediction function by a predetermined regularization weight.
6. The prediction device according to any one of claims 2 to 5, wherein
the regularization term is generated by multiplying a function whose variable is the input of the prediction function by a predetermined regularization weight.
7. A learning device comprising:
a function control unit that controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for the input; and
a learning unit that trains the prediction model so that the output obtained by inputting training data to the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the training data.
8. A prediction method comprising:
a function control process in which a function control unit controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for the input;
a learning process in which a learning unit trains the prediction model so that the output obtained by inputting training data to the prediction function whose behavior is controlled by the function control unit approaches the teacher data corresponding to the training data; and
a prediction process in which a prediction unit predicts a predicted value for an input based on an output obtained by inputting unlearned data to the prediction model trained by the learning unit.
9. A program for causing a computer to function as:
function control means that controls the behavior of a prediction function indicating the relationship between an input and an output in a prediction model that outputs a predicted value for the input;
learning means that trains the prediction model so that the output obtained by inputting training data to the prediction function whose behavior is controlled by the function control means approaches the teacher data corresponding to the training data; and
prediction means that predicts a predicted value for an input based on an output obtained by inputting unlearned data to the prediction model trained by the learning means.