WO2020261509A1 - Machine learning device, machine learning program, and machine learning method - Google Patents

Machine learning device, machine learning program, and machine learning method

Info

Publication number
WO2020261509A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
weight
machine learning
input
vector
Prior art date
Application number
PCT/JP2019/025711
Other languages
French (fr)
Japanese (ja)
Inventor
一紀 中田
Original Assignee
TDK株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TDK株式会社
Priority to PCT/JP2019/025711
Priority to PCT/JP2020/025150
Publication of WO2020261509A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology

Definitions

  • the present invention relates to a machine learning device, a machine learning program, and a machine learning method.
  • In Non-Patent Document 1, a method using the extended Kalman filter is known as a method of updating weights.
  • the neural network that updates the weights using the extended Kalman filter method may cause numerical instability due to quantization error in the matrix calculation for calculating the Kalman gain matrix.
  • the occurrence of such numerical instability can be suppressed by increasing the number of quantization bits.
  • One aspect of the present invention is a machine learning device that performs machine learning of input data of one or more dimensions arranged in a predetermined order by using a recursive neural network having a plurality of nodes connected to each other by weighted edges.
  • the recursive neural network comprises an input layer having one or more input nodes, an intermediate layer having one or more intermediate nodes, and an output layer having one or more output nodes.
  • the input nodes, the intermediate nodes, and the output nodes are distinct nodes among the plurality of nodes, and the weight assigned to each edge connecting the intermediate nodes has a predetermined size.
  • the machine learning device performs output data generation processing and weight update processing each time the input layer receives input data of one or more dimensions in the predetermined order.
  • the output data generation processing performs, in this order: a first process of outputting the input data received by the input layer from the input layer to the intermediate layer; a second process of outputting intermediate data of one or more dimensions, corresponding to the input data input to the intermediate layer by the first process, from the intermediate layer to the output layer; and a third process of generating output data of one or more dimensions corresponding to the intermediate data input to the output layer by the second process.
  • the weight update processing takes, as an estimated weight vector, a vector whose components are estimated values of the weights assigned to the edges connecting the intermediate nodes and the output nodes, and takes, as a predicted output vector, a vector whose components are predicted values of the output data of one or more dimensions.
  • the weight update processing calculates the Kalman gain matrix in the ensemble Kalman filter method based on two or more estimated weight vectors having mutually different components and the predicted output vector calculated for each of the two or more estimated weight vectors, and updates the weights assigned to each edge connecting the intermediate nodes and the output nodes based on the calculated Kalman gain matrix.
  • A diagram showing an example of a graph plotting the temporal change of the output data output from the machine learning device 1 during the period in which the machine learning device 1 is machine-learning the temporal change of the displacement of the second weight in the X-axis direction in the double pendulum shown in FIG. 6.
  • A diagram showing an example of a graph plotting the temporal change of the output data output from the machine learning device 1 in the period after the machine learning device 1 has machine-learned the temporal change of the displacement of the second weight in the X-axis direction in the double pendulum shown in FIG. 6.
  • FIG. 1 is a diagram showing an example of the configuration of the machine learning device 1 according to the embodiment.
  • the machine learning device 1 performs machine learning of P-dimensional input data.
  • P may be any integer as long as it is an integer of 1 or more.
  • the machine learning device 1 performs such machine learning by using a recurrent neural network having a plurality of nodes.
  • the plurality of nodes are connected to each other by weighted edges.
  • the P-dimensional input data are data that correlate with each other. Further, the P-dimensional input data are arranged in a predetermined order. In the following, as an example, a case where the predetermined order is chronological order will be described.
  • the P-dimensional input data is P-dimensional time series data.
  • the P-dimensional time series data is, for example, data acquired from P sensors in chronological order.
  • the P sensors may be P different types of sensors, or some or all of them may be sensors of the same type.
  • the predetermined order may be another order such as a spatially arranged order instead of the time series order.
  • the time indicating the time series order is indicated by the discretized time k.
  • k is an integer.
  • k may be another number such as a real number.
  • the recurrent neural network according to the embodiment has at least an input layer L1, an intermediate layer L2, and an output layer L3.
  • FIG. 2 is a diagram showing an example of the configuration of the recurrent neural network according to the embodiment.
  • the recurrent neural network according to the embodiment will be referred to as an ensemble FORCE learner.
  • each node represents an operation performed on the data flowing through the neural network. In a neural network realized by software, each node therefore corresponds to a function that performs the operation; in a neural network realized by hardware, each node corresponds to an element that performs the operation.
  • an edge connecting between a certain node N1 and another node N2 indicates a data flow from the node N1 to the node N2.
  • the data flowing from node N1 to node N2 is multiplied by the weight assigned to the edge connecting node N1 and node N2. That is, the data input to node N2 from the edge is the data after the weight multiplication performed in passing through the edge. Accordingly, in a neural network realized by software, an edge corresponds to a function that performs this weight multiplication; in a neural network realized by hardware, an edge corresponds to an element that performs this weight multiplication.
  • the input layer L1 has an input node.
  • the input layer L1 may have the same number of input nodes as the number of dimensions of the P-dimensional input data, or a number of input nodes different from the number of dimensions of the P-dimensional input data.
  • in the latter case, the number of input nodes may be less than P or more than P.
  • in that case, a weighted linear sum of the P-dimensional input data is input to these input nodes.
  • a certain input node accepts the input data associated with the input node among the input data.
  • the p-th input node among the P input nodes receives the p-th input data among the P-dimensional time series data.
  • p is any integer of 1 or more and P or less. That is, p is a number (label) that identifies each of the P input nodes with each other, and is also a number (label) that identifies each of the P input data with each other.
  • the input layer L1 outputs each of the P-dimensional input data received by the P input nodes to the intermediate layer L2.
  • the intermediate layer L2 has a plurality of intermediate nodes. Further, the intermediate layer L2 receives each of the P-dimensional input data output by the input layer L1. More specifically, the intermediate layer L2 receives each of the P-dimensional input data output by the input layer L1 by a part or all of the plurality of intermediate nodes. The intermediate layer outputs Q-dimensional intermediate data corresponding to the received P-dimensional input data to the output layer L3. Q may be any number as long as it is an integer of 1 or more. Therefore, the intermediate layer L2 has at least Q intermediate nodes that output each of the Q-dimensional intermediate data to the output layer L3. Here, the qth intermediate node among these Q intermediate nodes outputs the qth intermediate data among the Q-dimensional intermediate data to the output layer L3.
  • q is any integer of 1 or more and Q or less.
  • q is a number (label) that identifies each of the Q intermediate nodes from each other, and is also a number (label) that identifies each of the Q-dimensional intermediate data from each other.
  • when one or more input data are received, a certain intermediate node generates an output value by inputting the sum of the received input data to the first activation function.
  • the first activation function may be any function as long as it is a non-linear function.
  • the intermediate node outputs the generated output value to another node connected to the intermediate node by the edge.
  • the intermediate node is one of the Q intermediate nodes described above
  • the generated output value is output to the output layer L3 as intermediate data.
  • the individual intermediate nodes of the intermediate layer L2 generate such an output value.
  • other processes such as bias addition will not be described.
  • the intermediate layer L2 is, for example, a reservoir in reservoir computing. Accordingly, the weights in the intermediate layer L2 are determined in advance by random numbers, and the weights are not updated in the intermediate layer L2. In other words, the weight assigned to each edge connecting the intermediate nodes is fixed at a predetermined size (that is, a size determined by a random number); a minimal sketch of such a reservoir is given below. The intermediate layer L2 may instead be another kind of intermediate layer in which the weights are not updated within the layer.
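  • The following is a minimal Python/NumPy sketch of such a fixed-weight reservoir. The names (W_in, W_res), the use of tanh as the first activation function, and the spectral-radius scaling are illustrative assumptions, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

P, Q = 3, 100  # input dimension, number of intermediate (reservoir) nodes

# Input-to-reservoir and reservoir-internal weights: determined in advance
# by random numbers and never updated, as described above.
W_in = rng.uniform(-1.0, 1.0, size=(Q, P))
W_res = rng.normal(size=(Q, Q))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # common echo-state scaling

def reservoir_step(x, u):
    """One update of the Q-dimensional reservoir state x for a P-dimensional
    input u, using tanh as the (assumed) first activation function."""
    return np.tanh(W_res @ x + W_in @ u)
```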
  • the output layer L3 has R output nodes.
  • R may be any integer as long as it is an integer of 1 or more.
  • the output layer L3 receives Q-dimensional intermediate data from the intermediate layer L2 by these R output nodes.
  • the output layer L3 generates and outputs R-dimensional output data corresponding to the received Q-dimensional intermediate data. That is, the r-th output node among the R output nodes generates the r-th output data among the R-dimensional output data.
  • r is any integer of 1 or more and R or less.
  • r is a number (label) that identifies each of the R output nodes from each other, and is also a number (label) that identifies each of the R-dimensional output data from each other.
  • when one or more intermediate data are received, a certain output node generates an output value by inputting the sum of the received intermediate data to the second activation function.
  • the second activation function will be described later.
  • the output node outputs the output value as output data.
  • the output node of the output layer L3 generates such an output value.
  • other processes such as bias addition and output of the output value will not be described.
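  • As a sketch of the output node computation described above, the readout below maps Q-dimensional intermediate data to R-dimensional output data; using tanh for the second activation function h and the name W_out for the update target weights are illustrative assumptions.

```python
import numpy as np

def output_step(z, W_out, h=np.tanh):
    """Generate R-dimensional output data from Q-dimensional intermediate
    data z. W_out is the (R x Q) matrix of update target weights; each
    output node sums its weighted inputs and applies the second
    activation function h."""
    return h(W_out @ z)
```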
  • the ensemble FORCE learner has an intermediate layer L2 which is a reservoir in this example. Therefore, the ensemble FORCE learner is a kind of reservoir computing in this example.
  • the input node, the intermediate node, and the output node are different nodes among the plurality of nodes of the ensemble FORCE learner, and do not overlap with each other.
  • when some data D1 is output from the input node X11 to the intermediate node X12, the data D1 is multiplied by the weight assigned to the edge connecting the input node X11 and the intermediate node X12. Then, the data D1 after the weight multiplication is input to the intermediate node X12.
  • when some data D2 is output from a certain intermediate node X21 to another intermediate node X22, the data D2 is multiplied by the weight assigned to the edge connecting the intermediate node X21 and the intermediate node X22. Then, the data D2 after the weight multiplication is input to the intermediate node X22.
  • when some data D3 is output from the intermediate node X31 to the output node X32, the data D3 is multiplied by the weight assigned to the edge connecting the intermediate node X31 and the output node X32. Then, the data D3 after the weight multiplication is input to the output node X32.
  • since the weights in the intermediate layer L2 are not updated, in ensemble FORCE learning the weights are updated only for the weights assigned to the edges connecting the intermediate nodes and the output nodes. The weights assigned to the edges connecting the input nodes and the intermediate nodes are likewise not updated. Therefore, in the following, for convenience of explanation, the weights assigned to the edges connecting the intermediate nodes and the output nodes are collectively referred to as the update target weights.
  • the number of weights to be updated is represented by L. L may be any number as long as it is an integer of 2 or more.
  • each " ⁇ ” shown in FIG. 2 indicates a node. That is, each " ⁇ " included in the input layer L1 indicates an input node. Further, each " ⁇ ” included in the intermediate layer L2 indicates an intermediate node. Further, “ ⁇ ” included in the output layer L3 indicates an output node, respectively.
  • the arrows connecting the nodes shown in FIG. 2 are drawn to convey an image of how the nodes are connected by edges in the ensemble FORCE learner; the actual connection pattern of edges between nodes in the ensemble FORCE learner differs from the drawing.
  • the input of the input data to the input layer L1 and the output of the output data from the output layer L3 may be performed by a known method or by a method to be developed in the future; a detailed description thereof is omitted.
  • the machine learning device 1 uses such ensemble FORCE learning to perform machine learning of the above-mentioned P-dimensional input data. More specifically, the machine learning device 1 performs the weight update process and the output data generation process every time the input layer L1 receives P-dimensional input data in chronological order (that is, every time the input data is received in the predetermined order).
  • the weight update process is a process for updating the update target weight.
  • the machine learning device 1 performs a weight update process before performing the output data generation process. That is, each time the input layer L1 receives P-dimensional input data in chronological order, the machine learning device 1 updates the update target weight and then performs the output data generation process.
  • the weight update process is a process for updating the update target weight based on the ensemble Kalman filter method.
  • conventionally, when weights are updated by the ensemble Kalman filter method, it is necessary to prepare the same number of intermediate layers of the neural network as the number of samples (number of particles) in the ensemble Kalman filter method. Therefore, there has conventionally been a problem that the calculation cost increases in this case.
  • the weights in the intermediate layer L2 are not updated. Therefore, in the ensemble FORCE learning, the weight can be updated based on the ensemble Kalman filter method while keeping the number of the intermediate layers L2 at one. As a result, the machine learning device 1 can suppress an increase in calculation cost.
  • the weight update based on the ensemble Kalman filter method has a lower inverse matrix calculation cost than the weight update by another method involving matrix calculation for calculating the Kalman gain matrix.
  • the machine learning device 1 can suppress the occurrence of numerical instability due to the quantization error without increasing the number of quantization bits.
  • the weight update process is a process of calculating the Kalman gain matrix in the ensemble Kalman filter method based on M estimated weight vectors having mutually different components and the predicted output vectors calculated for each of the M estimated weight vectors, and of updating the update target weights based on the calculated Kalman gain matrix.
  • M is the number of samples in the ensemble Kalman filter method. M may be any number as long as it is an integer of 2 or more.
  • the estimated weight vector is a vector having an estimated value for each weight included in the update target weight as a component.
  • in the following, the estimated value is referred to as an estimated weight. Since the number of update target weights is L (that is, the number of estimated weights is L), the estimated weight vector is an L-dimensional vector. The initial value of each estimated weight is determined by a random number.
  • the predicted output vector is a vector having predicted values for each of the R-dimensional output data as components. That is, the predicted output vector is an R-dimensional vector.
  • the predicted output vector at a certain time k is calculated based on the estimated weight vector at the time k.
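  • A minimal sketch of preparing the M estimated weight vectors with random initial values might look as follows; the matrix layout (one L-dimensional column per sample) and the uniform [0, 1] initialization are illustrative assumptions consistent with the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

L_w, M = 100, 20  # number of update target weights, number of samples

# M estimated weight vectors, one column per sample, with components
# initialized randomly in [0, 1].
X_est = rng.uniform(0.0, 1.0, size=(L_w, M))
```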
  • the output data generation process is performed after the update target weights have been updated by the weight update process, and uses the updated update target weights.
  • the output data generation process is a process in which the first process, the second process, and the third process are performed in the order of the first process, the second process, and the third process.
  • the first process is a process of outputting the P-dimensional input data received by the input layer L1 from the input layer L1 to the intermediate layer L2.
  • the second process is a process of outputting Q-dimensional intermediate data corresponding to the P-dimensional input data input to the intermediate layer L2 by the first process from the intermediate layer L2 to the output layer L3.
  • the third process is a process of generating R-dimensional output data corresponding to the Q-dimensional intermediate data input to the output layer L3 by the second process.
  • the output data generation process is the same as the process for generating output data in Reservoir Computing. Therefore, further detailed description of the output data generation process will be omitted.
  • the machine learning device 1 includes an arithmetic unit 11, a memory 12, and a network interface 13.
  • the machine learning device 1 may be configured to include other circuits and other devices.
  • the machine learning device 1 may be configured to include an input device such as a keyboard and a mouse.
  • the machine learning device 1 may be configured to include an output device such as a display.
  • the machine learning device 1 may be configured to include an interface for connecting at least one of the input device and the output device.
  • the arithmetic unit 11 is a processor, for example, an FPGA (Field Programmable Gate Array).
  • the arithmetic unit 11 may be a CPU (Central Processing Unit) instead of the FPGA, may be a combination of the FPGA and the CPU, or may be another processor.
  • in the present embodiment, the arithmetic unit 11 is an FPGA. Therefore, the arithmetic unit 11 realizes the above-mentioned ensemble FORCE learning by the hardware possessed by the FPGA (for example, an integrated circuit or the like), and performs machine learning on the P-dimensional input data.
  • when the arithmetic unit 11 is a CPU, the arithmetic unit 11 may be configured to perform the machine learning by combining the hardware possessed by the CPU and the software executed by the CPU.
  • the arithmetic unit 11 may be configured by near memory, memory logic, or the like, as will be described later. In other words, the arithmetic unit 11 may be composed of hardware including at least one of near memory and memory logic.
  • the memory 12 stores, for example, various information used by the arithmetic unit 11.
  • the memory 12 includes, for example, SSD (Solid State Drive), HDD (Hard Disk Drive), EEPROM (Electrically Erasable Programmable Read-Only Memory), ROM (Read-Only Memory), RAM (Random Access Memory), and the like.
  • the memory 12 may be an external storage device connected by a digital input / output port such as USB, instead of the one built in the arithmetic unit 11.
  • the network interface 13 is an interface that connects to an external device such as a sensor via a network.
  • the weight update process performed by the machine learning device 1 will be described.
  • the weight update process described below is a process based on the ensemble Kalman filter method.
  • sequential calculation is performed according to the time series order indicated by the discretized time k. Therefore, the time k that appears as an argument of the function, vector, matrix, etc. described below indicates the time series order in such sequential calculation.
  • the following formulation by the ensemble Kalman filter method is only an example, and may be another formulation.
  • the ensemble FORCE learner can be represented by the nonlinear vector functions shown in the following equations (1) and (2).
  • the vector x in the above equation (1) represents a weight vector.
  • the weight vector is a vector having the update target weights as components. That is, the vector x_{k+1} indicates the weight vector at time k+1, and x_k indicates the weight vector at time k.
  • the vector ω is a weight error vector indicating a modeling error for the weight vector. That is, the vector ω_k indicates the weight error vector at time k.
  • the vector ω_k is obtained by assuming some error distribution as the distribution of the modeling error for the weight vector at time k. As the error distribution, for example, a Gaussian distribution can be adopted.
  • the first term on the right side in equation (1) may be a non-linear function in which the vector x_k and the time k are variables.
  • the vector y in the above equation (2) indicates an output vector.
  • the output vector is a vector having the R-dimensional output data as components. That is, the vector y_k indicates the output vector at time k.
  • the vector ν is an output error vector indicating a modeling error for the output vector. That is, the vector ν_k indicates the output error vector at time k.
  • the vector ν_k is obtained by assuming some error distribution as the distribution of the modeling error for the output vector at time k. As the error distribution, for example, a Gaussian distribution can be adopted.
  • the function h is the above-mentioned second activation function. More specifically, the function h is a two-variable function such as a sigmoid function, a hyperbolic tangent function, a linear function, or ReLU.
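  • For illustration, the candidate second activation functions named above can be written as follows; treating h as a one-argument callable (dropping the explicit time argument of the two-variable form) is a simplification.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def linear(a):
    return a

def relu(a):
    return np.maximum(a, 0.0)

tanh = np.tanh  # hyperbolic tangent
```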
  • M weight vectors are treated as M samples.
  • a model representing the time evolution for each of these M weight vectors should be represented by the above equation (1). Therefore, in the following, the model representing the time evolution for each of the M weight vectors is represented by the M equations shown in the following equation (3).
  • j is a subscript for identifying each of the M formulas. That is, j is an integer of 1 or more and M or less. Therefore, j is also a subscript that identifies each of the M weight vectors and is also a subscript that identifies each of the M weight error vectors.
  • the model representing the time evolution for each of the M output data should be represented by the above equation (2). Therefore, in the following, the model representing the time evolution for each of the M output data is represented by the M equations shown in the following equation (4).
  • j is a subscript for identifying each of the M equations. That is, j is also a subscript that identifies each of the M output vectors, and is also a subscript that identifies each of the M output error vectors.
  • the above equation (3) is re-expressed as the following equation (5), with the first term on the right side of equation (3) taken as the estimated weight vector and the left side of equation (3) taken as the predicted weight vector.
  • the first term on the right side in the above equation (5) indicates an estimated weight vector. Further, the left side in the equation (5) indicates a predicted weight vector.
  • to calculate equation (5), the estimated weight vector associated with time k-1 is required. Therefore, the estimated weight vector needs an initial value vector. For example, a value of 0 or more and 1 or less may be randomly assigned to each component of the vector as the initial value, or another value may be assigned by another method.
  • similarly, equation (4) is re-expressed as the following equation (6), with the first term on the right side of equation (4) taken as the estimated output vector and the left side of equation (4) taken as the predicted output vector.
  • the first term on the right side in the above equation (6) indicates an estimated output vector. That is, in the ensemble Kalman filter method, the estimated output vector is represented by the second activation function having the predicted weight vector and the time as variables. The left side in Eq. (6) indicates the predicted output vector.
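  • A minimal sketch of this prediction step (equations (5) and (6)) is shown below for the case R = 1 (one output node), assuming a random-walk weight model and Gaussian modeling errors; the noise magnitudes and the realization of h as a readout over the intermediate data z are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_step(X_est, z, h=np.tanh, q_std=1e-4, r_std=1e-3):
    """Eq. (5): predicted weight vectors = estimated weight vectors plus
    weight error vectors. Eq. (6): predicted outputs = second activation
    function applied to each predicted weight vector's readout of the
    intermediate data z, plus output error.
    X_est: (L, M) estimated weight vectors; z: (L,) intermediate data."""
    X_pred = X_est + q_std * rng.standard_normal(X_est.shape)               # Eq. (5)
    y_pred = h(X_pred.T @ z) + r_std * rng.standard_normal(X_est.shape[1])  # Eq. (6)
    return X_pred, y_pred
```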
  • the error ensemble vector for the estimated weight vector calculated based on the above equation (5) is expressed as the following equations (7) and (8).
  • the error ensemble vector will be referred to as a weight error ensemble vector.
  • the left side of the above equation (7) shows the weight error ensemble vector.
  • the right side of the equation (7) shows each component of the weight error ensemble vector.
  • the weight error ensemble vector is defined as a row vector. That is, the transpose of the weight error ensemble vector is a column vector.
  • each component of the weight error ensemble vector is calculated by the equation (8). That is, each component of the weight error ensemble vector is the difference between each estimated weight vector and the average of M estimated weight vectors.
  • the error ensemble vector for the estimated output vector calculated based on the above equation (6) is expressed as the following equations (9) and (10).
  • the error ensemble vector will be referred to as an output error ensemble vector.
  • the left side of the above equation (9) shows the output error ensemble vector.
  • the right side of the equation (9) shows each component of the output error ensemble vector.
  • the output error ensemble vector is defined as a row vector. That is, the transpose of the output error ensemble vector is a column vector.
  • each component of the output error ensemble vector is calculated by the equation (10). That is, each component of the output error ensemble vector is the difference between each estimated output vector and the average of M estimated output vectors.
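  • The two error ensembles (equations (7) through (10)) are the deviations of each sample from the ensemble mean; a sketch for R = 1, continuing the assumptions above:

```python
import numpy as np

def error_ensembles(X_pred, y_pred):
    """Eqs. (7)-(8): each column of dX is one predicted weight vector minus
    the mean of all M predicted weight vectors. Eqs. (9)-(10): dy is each
    predicted output minus the mean of all M predicted outputs (R = 1)."""
    dX = X_pred - X_pred.mean(axis=1, keepdims=True)
    dy = y_pred - y_pred.mean()
    return dX, dy
```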
  • the covariance matrix shown in the above equation (11) will be referred to as a first covariance matrix.
  • the first covariance matrix is a matrix of L rows and R columns because the number of weights to be updated is L and the dimension of the output data is R.
  • the second covariance matrix is a matrix of R rows and R columns because the dimension of the output data is R.
  • the Kalman gain matrix is expressed as the following equation (13), based on the first covariance matrix calculated by the above equation (11) and the second covariance matrix calculated by the above equation (12).
  • the left side of the above equation (13) shows the Kalman gain matrix.
  • the first covariance matrix is a matrix of L rows and R columns
  • the second covariance matrix is a matrix of R rows and R columns. Therefore, the Kalman gain matrix is a matrix of L rows and R columns.
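  • A sketch of equations (11) through (13) for R = 1 follows; with one output node the second covariance matrix is a scalar, so the matrix inverse in equation (13) reduces to a division. The 1/(M-1) normalization is a common ensemble Kalman filter convention and an assumption here.

```python
import numpy as np

def kalman_gain(dX, dy, M):
    """Eq. (11): first covariance matrix (L x 1 here). Eq. (12): second
    covariance matrix (a scalar for R = 1). Eq. (13): Kalman gain
    K = P_xy / P_yy, an L-dimensional column vector."""
    P_xy = (dX @ dy.reshape(-1, 1)) / (M - 1)  # Eq. (11)
    P_yy = float(dy @ dy) / (M - 1)            # Eq. (12)
    return P_xy / P_yy                         # Eq. (13)
```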
  • the estimated weight vector can be calculated by correcting the predicted weight vector, as shown in the following equation (14), using the Kalman gain matrix obtained by the above equation (13).
  • the first term in parentheses in the second term on the right side of the above equation (14) indicates the teacher data for the output data.
  • the ensemble Kalman filter method then calculates the updated update target weights based on the following equation (15).
  • the left side of the above equation (15) indicates the update target weight after the above update. That is, the update target weight after update is the average of the estimated weight vectors.
  • the machine learning device 1 calculates the updated update target weights by the above equation (15), and then performs the above-mentioned output data generation process using the calculated update target weights. Then, when the next input data is received by the input layer L1, the machine learning device 1 starts the next weight update process by using the M estimated weight vectors calculated by the above equation (14) as inputs to the above equation (5). In this way, the machine learning device 1 performs the weight update process and the output data generation process each time input data is received by the input layer L1; a sketch of this correction-and-averaging step is given below.
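  • The correction-and-averaging step (equations (14) and (15)) might look as follows for R = 1, continuing the sketches above; d stands for the teacher data.

```python
import numpy as np

def update_step(X_pred, y_pred, K, d):
    """Eq. (14): correct each predicted weight vector with the Kalman gain
    times the difference between teacher data d and predicted output.
    Eq. (15): the updated update target weights are the ensemble average
    of the M estimated weight vectors."""
    X_est = X_pred + K @ (d - y_pred).reshape(1, -1)  # Eq. (14)
    w = X_est.mean(axis=1)                            # Eq. (15)
    return X_est, w
```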
  • when the output layer L3 has one output node (that is, when R = 1), the second covariance matrix is a 1-by-1 matrix, that is, a scalar.
  • in this case, the Kalman gain matrix becomes an L-by-1 matrix, that is, an L-dimensional vector, and the matrix inverse in equation (13) reduces to a scalar division. From this, the machine learning device 1 that performs machine learning by ensemble FORCE learning can significantly reduce the calculation cost for calculating the Kalman gain matrix by setting the number of output nodes to 1.
  • the machine learning device 1 can suppress the occurrence of numerical instability due to the quantization error.
  • in contrast, when the weights of an ordinary neural network are updated by the ensemble Kalman filter method, it is necessary to prepare the same number of intermediate layers as the number of samples. Therefore, using the ensemble Kalman filter method for weight updates in such a neural network is not preferable from the viewpoint of reducing the calculation cost.
  • on the other hand, in a neural network having an intermediate layer in which the weights are not updated within the layer, such as a reservoir (the intermediate layer L2 in this example), as in ensemble FORCE learning, it is sufficient to prepare one intermediate layer. Therefore, ensemble FORCE learning can suppress both the increase in calculation cost and the occurrence of numerical instability due to the quantization error.
  • the ensemble FORCE learner can be said to be a neural network that can achieve both the merits of adopting the reservoir computing and the merits of updating the weights by the ensemble Kalman filter method.
  • FIG. 3 is a diagram showing an example of the flow of the weight update process performed by the machine learning device 1.
  • the machine learning device 1 performs the processing of the flowchart shown in FIG. 3 every time the input layer L1 receives P-dimensional input data in chronological order.
  • in the following, a case will be described in which the machine learning device 1 receives the first P-dimensional input data in the time series order before the processing of step S110 shown in FIG. 3 is performed.
  • the machine learning device 1 specifies the initial values of each of the M estimated weight vectors (step S110).
  • the machine learning device 1 may calculate the initial values by random numbers, may receive them from the user, or may specify them by another method.
  • next, the machine learning device 1 calculates M predicted weight vectors based on the M initial values specified in step S110, the above equation (5), and the M weight error vectors (step S120).
  • the machine learning device 1 may calculate the M weight error vectors by random numbers, may receive them from the user, or may specify them by another method.
  • next, the machine learning device 1 calculates M predicted output vectors based on the M predicted weight vectors calculated in step S120, the above equation (6), and the M output error vectors (step S130).
  • the machine learning device 1 may calculate the M output error vectors by random numbers, may receive them from the user, or may specify them by another method.
  • next, the machine learning device 1 calculates two error ensemble vectors based on the M predicted weight vectors calculated in step S120 and the M predicted output vectors calculated in step S130 (step S140). More specifically, the machine learning device 1 calculates the weight error ensemble vector based on the M predicted weight vectors calculated in step S120 and the above equations (7) and (8). Further, the machine learning device 1 calculates the output error ensemble vector based on the M predicted output vectors calculated in step S130 and the above equations (9) and (10).
  • next, the machine learning device 1 calculates two covariance matrices based on the two error ensemble vectors calculated in step S140 (step S150). More specifically, the machine learning device 1 calculates the first covariance matrix based on the weight error ensemble vector calculated in step S140, the output error ensemble vector calculated in step S140, and the above equation (11). Further, the machine learning device 1 calculates the second covariance matrix based on the output error ensemble vector calculated in step S140 and the above equation (12).
  • next, the machine learning device 1 calculates the Kalman gain matrix based on the first covariance matrix calculated in step S150, the second covariance matrix calculated in step S150, and the above equation (13) (step S160).
  • next, the machine learning device 1 calculates M estimated weight vectors based on the M predicted weight vectors calculated in step S120, the M predicted output vectors calculated in step S130, the teacher data, the Kalman gain matrix calculated in step S160, and the above equation (14) (step S170).
  • then, the machine learning device 1 calculates the updated update target weights based on the M estimated weight vectors calculated in step S170 and the above equation (15) (step S180).
  • the machine learning device 1 waits until the next input data is received by the input layer L1 (step S190).
  • when the next input data is received by the input layer L1 (step S190: YES), the machine learning device 1 transitions to step S120 and calculates M predicted weight vectors based on the M estimated weight vectors calculated in the immediately preceding step S170. A minimal end-to-end sketch of this loop is given below.
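  • Combining the helper functions sketched earlier (predict_step, error_ensembles, kalman_gain, update_step), a minimal end-to-end version of the flow of FIG. 3 (steps S110 through S190) for P = R = 1 might look as follows; all hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def enkf_force_train(inputs, targets, Q=100, M=20):
    """One reservoir (intermediate layer L2) with fixed random weights;
    only the Q update target weights are updated by the ensemble Kalman
    filter, once per received input, before output generation."""
    W_in = rng.uniform(-1.0, 1.0, size=(Q, 1))
    W_res = rng.normal(size=(Q, Q))
    W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))
    X_est = rng.uniform(0.0, 1.0, size=(Q, M))        # step S110
    z = np.zeros(Q)
    outputs = []
    for u, d in zip(inputs, targets):                 # loop via step S190
        z = np.tanh(W_res @ z + W_in @ np.atleast_1d(u))
        X_pred, y_pred = predict_step(X_est, z)       # steps S120-S130
        dX, dy = error_ensembles(X_pred, y_pred)      # step S140
        K = kalman_gain(dX, dy, M)                    # steps S150-S160
        X_est, w = update_step(X_pred, y_pred, K, d)  # steps S170-S180
        outputs.append(np.tanh(w @ z))                # output data generation
    return np.array(outputs), w
```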
  • as described above, the machine learning device 1 performs the weight update process based on the ensemble Kalman filter method. As a result, the machine learning device 1 can significantly reduce the calculation cost for calculating the Kalman gain matrix and can thereby suppress the occurrence of numerical instability due to the quantization error. Further, the weight update process enables the machine learning device 1 to perform online learning by the ensemble FORCE learning shown in FIG. 2. As a result, the machine learning device 1 can be mounted on an edge device, for example, as a device that performs machine learning by ensemble FORCE learning. When considering mounting ensemble FORCE learning on an edge device or the like, it is important to improve the efficiency of the weight update process.
  • the ensemble FORCE learner can be implemented in an edge device or the like as hardware including at least one of near memory and memory logic.
  • the memory access speed, calculation speed, and the like of the ensemble FORCE learner mounted on the edge device or the like as the hardware differ depending on the design of the data flow in the weight update process based on the ensemble Kalman filter method.
  • when the ensemble FORCE learner is implemented in an edge device or the like as hardware including at least one of near memory and memory logic, it is therefore necessary to consider an efficient data flow.
  • FIG. 4 is a diagram showing an example of the overall configuration of the data flow in the weight update process based on the ensemble Kalman filter method. As shown in FIG. 4, the data flow in the weight update process is roughly divided into three blocks, block B1, block B2, and block B3. Note that each of these three blocks indicates hardware that includes at least one of near memory and memory logic. In FIG. 4, the time series order in the data flow is shown by the time k.
  • Block B1 is a block for calculating M predicted weight vectors.
  • Block B1 includes a block associated with each of the M estimated weight vectors. More specifically, the block B1 includes the block B1-j as a block in which the j-th estimated weight vector among the M estimated weight vectors is input. That is, block B1 includes M blocks of blocks B1-1 to B1-M.
  • to the block B1-j, the j-th estimated weight vector among the M estimated weight vectors and the j-th weight error vector among the M weight error vectors are input. Then, the block B1-j calculates the j-th predicted weight vector among the M predicted weight vectors based on the estimated weight vector, the weight error vector, and the above equation (5). Block B1-j outputs the calculated predicted weight vector to block B2.
  • Block B2 is a block for calculating M estimated weight vectors.
  • Block B2 includes a block associated with each of the M predicted weight vectors. More specifically, the block B2 includes the block B2-j as a block in which the j-th predicted weight vector among the M predicted weight vectors is input. That is, block B2 includes M blocks of blocks B2-1 to B2-M.
  • to the block B2-j, the j-th predicted weight vector among the M predicted weight vectors, the j-th difference vector among the M difference vectors (described later), and the Kalman gain matrix output from block B3 are input.
  • the j-th difference vector among the M difference vectors is calculated by the following equation (16).
  • the left side of the above equation (16) indicates the j-th difference vector among the M difference vectors. That is, the j-th difference vector among the M difference vectors is a vector in which the inside of the parentheses of the second term on the right side of the equation (14) is newly redefined as one vector.
  • Block B2-j calculates the j-th estimated weight vector among the M estimated weight vectors based on the j-th predicted weight vector among the M predicted weight vectors, the j-th difference vector among the M difference vectors, the Kalman gain matrix, and the above equation (14). Block B2-j outputs the calculated estimated weight vector. As a result, the machine learning device 1 can calculate the updated update target weights in another block (not shown in FIG. 4) based on the M estimated weight vectors calculated in block B2 and the above equation (15).
  • Block B3 is a block for calculating the Kalman gain matrix. The M predicted weight vectors output from block B1 are input to block B3. Then, block B3 calculates the Kalman gain matrix based on the M predicted weight vectors, and outputs the calculated Kalman gain matrix to block B2.
  • FIG. 5 is a diagram showing the simplest concrete example of the data flow inside the block B3.
  • the data flow shown in FIG. 5 is a data flow that holds regardless of which non-linear function is adopted as the second activation function in the ensemble FORCE learning.
  • the data flow shown in FIG. 5 is roughly divided into five blocks, blocks B31 to B35. It should be noted that each of the five blocks indicates hardware including at least one of near memory and memory logic.
  • the time series order in the data flow is shown by the time k.
  • Block B31 is a block for calculating the first term on the right side of the above equation (8) and the first term on the right side of the above equation (10).
  • the first term on the right side of the above equation (8) is the average of the M predicted weight vectors.
  • the first term on the right side of the above equation (10) is the average of the M predicted output vectors. That is, the M predicted weight vectors and the M predicted output vectors are input to the block B31.
  • the block B31 calculates the average of M prediction weight vectors and the average of M prediction output vectors.
  • the block B31 outputs the calculated average of the M predicted weight vectors and the calculated average of the M predicted output vectors to the block B32. At this time, the block B31 further outputs the input M prediction weight vectors and the input M prediction output vectors to the block B32.
  • Block B32 is a block for calculating each component of the weight error ensemble vector and each component of the output error ensemble vector. That is, the average of the M predicted weight vectors output from the block B31 and the average of the M predicted output vectors output from the block B31 are input to the block B32. Further, in the block B32, M predictive weight vectors output from the block B31 and M predictive output vectors output from the block B31 are input. Then, the block B32 calculates each component of the weight error ensemble vector based on the M predicted weight vectors and the average of the M predicted weight vectors. Further, the block B32 calculates each component of the output error ensemble vector based on the average of the M predicted output vectors and the M predicted output vectors. The block B32 outputs each component of the calculated weight error ensemble vector and each component of the calculated output error ensemble vector to the block B33.
  • Block B33 is a block that generates the weight error ensemble vector and the output error ensemble vector. That is, each component of the weight error ensemble vector output from block B32 and each component of the output error ensemble vector output from block B32 are input to block B33. Then, block B33 generates the weight error ensemble vector based on each component of the weight error ensemble vector output from block B32. Further, block B33 generates the output error ensemble vector based on each component of the output error ensemble vector. Block B33 outputs the generated weight error ensemble vector and the generated output error ensemble vector to block B34.
  • Block B34 is a block for calculating the first covariance matrix and the second covariance matrix.
  • the block B34 is a block that performs the calculation of the above equations (11) and (12). That is, the weight error ensemble vector output from block B33 and the output error ensemble vector output from block B33 are input to block B34. Then, block B34 calculates the first covariance matrix based on the weight error ensemble vector and the output error ensemble vector. Further, block B34 calculates the second covariance matrix based on the output error ensemble vector. Block B34 outputs the calculated first covariance matrix and the calculated second covariance matrix to block B35.
  • Block B35 is a block for calculating the Kalman gain matrix.
  • the block B35 is a block that performs the calculation of the above equation (13). That is, the first covariance matrix output from the block B34 and the second covariance matrix output from the block B34 are input to the block B35. Then, the block B35 calculates the Kalman gain matrix based on the first covariance matrix and the second covariance matrix. Block B35 outputs the calculated Kalman gain matrix.
  • the machine learning device 1 can implement ensemble FORCE learning on an edge device or the like as hardware including at least one of near memory and memory logic. As a result, the machine learning device 1 can increase the speed of memory access, the calculation speed, and the like without using a special function as the second activation function.
  • FIG. 6 is a diagram showing an example of a double pendulum composed of a first weight having a mass m1 connected by a rod having a length l1 from the origin, and a second weight having a mass m2 connected to the first weight by a rod having a length l2.
  • the temporal changes in the displacements of the first weight and the second weight in the X-axis direction and the Y-axis direction in the double pendulum shown in FIG. 6 are deterministically described by the equation of motion.
  • the direction in which gravity acts is the direction indicated by the arrow g.
  • the equation of motion for the double pendulum shown in FIG. 6 is written down for each of the first weight and the second weight.
  • the force in the equation of motion written for each of the first weight and the second weight is expressed by a function of four parameters: the angle θ1 between the Y-axis and the rod l1 shown in FIG. 6, the angle θ2 between the Y-axis and the rod l2, the angular velocity that is the change of the angle θ1 per unit time, and the angular velocity that is the change of the angle θ2 per unit time. A simulation sketch based on these equations of motion is given below.
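  • For reference, training data of the kind used in FIGS. 7 through 12 can be generated by numerically integrating the standard closed-form double-pendulum accelerations, with angles measured from the direction of gravity. The masses, lengths, time step, and explicit Euler integrator below are illustrative assumptions (the patent does not specify them).

```python
import numpy as np

def double_pendulum_step(state, dt=1e-3, m1=1.0, m2=1.0, l1=1.0, l2=1.0, g=9.8):
    """One explicit-Euler step of the double-pendulum equations of motion.
    state = (th1, th2, w1, w2): the two angles and their angular
    velocities -- the four parameters named in the text."""
    th1, th2, w1, w2 = state
    d = th1 - th2
    den = 2.0 * m1 + m2 - m2 * np.cos(2.0 * d)
    a1 = (-g * (2.0 * m1 + m2) * np.sin(th1)
          - m2 * g * np.sin(th1 - 2.0 * th2)
          - 2.0 * np.sin(d) * m2 * (w2 ** 2 * l2 + w1 ** 2 * l1 * np.cos(d))) / (l1 * den)
    a2 = (2.0 * np.sin(d) * (w1 ** 2 * l1 * (m1 + m2)
          + g * (m1 + m2) * np.cos(th1)
          + w2 ** 2 * l2 * m2 * np.cos(d))) / (l2 * den)
    return state + dt * np.array([w1, w2, a1, a2])

def second_weight_x(state, l1=1.0, l2=1.0):
    """X-axis displacement of the second weight: the quantity whose
    temporal change the machine learning device 1 learns."""
    return l1 * np.sin(state[0]) + l2 * np.sin(state[1])
```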
  • FIG. 7 is a diagram showing an example of a graph plotting the temporal change of the output data output from the machine learning device 1 during the period in which the machine learning device 1 is machine-learning the temporal change of the displacement of the second weight in the X-axis direction in the double pendulum shown in FIG. 6.
  • the vertical axis of the graph shown in FIG. 7 shows the displacement of the second weight in the X-axis direction.
  • the horizontal axis of the graph shows the elapsed time.
  • the said period is shown as a period of elapsed time 0 to elapsed time 400.
  • the plot PLT1 in the graph shown in FIG. 7 is a plot of teacher data. Further, the plot PLT2 in the graph is a plot of output data. As shown in FIG. 7, the degree of agreement between the output data output from the machine learning device 1 during online learning and the teacher data is not so high. This is because the machine learning device 1 is learning online.
  • FIG. 8 is a diagram showing an example of a graph plotting the temporal change of the output data output from the machine learning device 1 in the period after the machine learning device 1 has machine-learned the temporal change of the displacement of the second weight in the X-axis direction in the double pendulum shown in FIG. 6.
  • the vertical axis of the graph shown in FIG. 8 shows the displacement of the second weight in the X-axis direction.
  • the horizontal axis of the graph shows the elapsed time.
  • the said period is shown as a period of elapsed time 400 to elapsed time 800.
  • the plot PLT1 in the graph shown in FIG. 8 is a plot of teacher data. Further, the plot PLT3 in the graph is a plot of output data. As shown in FIG. 8, the degree of agreement between the output data output from the machine learning device 1 after the online learning and the teacher data is higher than that before the online learning.
  • the accuracy of the results of online learning performed by the machine learning device 1 varies depending on the number of intermediate nodes and the number of samples.
  • FIG. 9 is a diagram showing another example of a graph plotting the temporal change of the output data output from the machine learning device 1 during the period in which the machine learning device 1 is machine-learning the temporal change of the displacement of the second weight in the X-axis direction in the double pendulum shown in FIG. 6.
  • the vertical axis of the graph shown in FIG. 9 shows the displacement of the second weight in the X-axis direction.
  • the horizontal axis of the graph shows the elapsed time.
  • the said period is shown as a period of elapsed time 0 to elapsed time 400.
  • the plot PLT1 in the graph shown in FIG. 9 is a plot of teacher data. Further, the plot PLT4 in the graph is a plot of output data. As shown in FIG. 9, the degree of agreement between the output data output from the machine learning device 1 during online learning and the teacher data is not so high. This is because the machine learning device 1 is learning online.
  • FIG. 10 is a diagram showing another example of a graph plotting the temporal change of the output data output from the machine learning device 1 in the period after the machine learning device 1 has machine-learned the temporal change of the displacement of the second weight in the X-axis direction in the double pendulum shown in FIG. 6.
  • the vertical axis of the graph shown in FIG. 10 shows the displacement of the second weight in the X-axis direction.
  • the horizontal axis of the graph shows the elapsed time.
  • the said period is shown as a period of elapsed time 400 to elapsed time 800.
  • the plot PLT1 in the graph shown in FIG. 10 is a plot of teacher data, and the plot PLT5 in the graph is a plot of output data. As shown in FIG. 10, the degree of agreement between the output data output from the machine learning device 1 after the online learning and the teacher data is higher than before the online learning. Further, the degree of agreement in the example shown in FIG. 10 does not change much compared with the corresponding example shown in FIG. 8. This means that the accuracy of the online learning performed by the machine learning device 1 remains high even though the number of intermediate nodes in the example shown in FIG. 10 is half the number of intermediate nodes in the example shown in FIG. 7.
  • the machine learning device 1 can improve the accuracy of online learning while reducing the number of intermediate nodes by the ensemble FORCE learning and the weight update process by the ensemble Kalman filter method. As a result, the machine learning device 1 can achieve both a reduction in manufacturing cost and an improvement in machine learning accuracy.
  • FIGS. 11 and 12 show an example of the results when graphs similar to those shown in FIGS. 7 and 8 are drawn for the machine learning device 1 with the number of intermediate nodes set to 250 and the number of samples in the ensemble Kalman filter method set to 20.
  • FIG. 11 is a diagram showing still another example of a graph plotting the temporal change of the output data output from the machine learning device 1 during the period in which the machine learning device 1 is machine-learning the temporal change of the displacement of the second weight in the X-axis direction in the double pendulum shown in FIG. 6.
  • the vertical axis of the graph shown in FIG. 11 shows the displacement of the second weight in the X-axis direction.
  • the horizontal axis of the graph shows the elapsed time.
  • the said period is shown as a period of elapsed time 0 to elapsed time 400.
  • the plot PLT1 in the graph shown in FIG. 11 is a plot of teacher data. Further, the plot PLT6 in the graph is a plot of output data. As shown in FIG. 11, the degree of agreement between the output data output from the machine learning device 1 during online learning and the teacher data is not so high. This is because the machine learning device 1 is learning online.
  • FIG. 12 is a diagram showing still another example of a graph plotting the temporal change of the output data output from the machine learning device 1 in the period after the machine learning device 1 has machine-learned the temporal change of the displacement of the second weight in the X-axis direction in the double pendulum shown in FIG. 6.
  • the vertical axis of the graph shown in FIG. 12 shows the displacement of the second weight in the X-axis direction.
  • the horizontal axis of the graph shows the elapsed time.
  • the said period is shown as a period of elapsed time 400 to elapsed time 800.
  • the plot PLT1 in the graph shown in FIG. 12 is a plot of teacher data, and the plot PLT7 in the graph is a plot of output data. As shown in FIG. 12, the degree of agreement between the output data output from the machine learning device 1 after the online learning and the teacher data is higher than before the online learning. Further, the degree of agreement in the example shown in FIG. 12 does not change much compared with the corresponding example shown in FIG. 10. This means that the accuracy of the online learning performed by the machine learning device 1 remains high even though the number of samples in the example shown in FIG. 12 is one-fifth of the number of samples in the example shown in FIG. 10.
  • the machine learning device 1 can improve the accuracy of online learning while reducing the number of samples by the ensemble FORCE learning and the weight update process by the ensemble Kalman filter method. As a result, the machine learning device 1 can achieve both a reduction in manufacturing cost and an improvement in machine learning accuracy.
  • as described above, the machine learning device performs machine learning of input data of one or more dimensions arranged in a predetermined order by using a recursive neural network having a plurality of nodes connected to each other by weighted edges.
  • the recursive neural network has an input layer having one or more input nodes, an intermediate layer having one or more intermediate nodes, and an output layer having one or more output nodes.
  • the input nodes, the intermediate nodes, and the output nodes are distinct nodes among the plurality of nodes, and the weight assigned to each edge connecting the intermediate nodes has a predetermined size.
  • the machine learning device performs the output data generation process and the weight update process each time the input layer receives input data of one or more dimensions in the predetermined order.
  • the output data generation process performs, in this order: a first process of outputting the input data received by the input layer from the input layer to the intermediate layer; a second process of outputting intermediate data of one or more dimensions corresponding to the input data input to the intermediate layer by the first process from the intermediate layer to the output layer; and a third process of generating output data of one or more dimensions corresponding to the intermediate data input to the output layer by the second process.
  • the weight update process takes, as an estimated weight vector, a vector whose components are estimated values of the weights assigned to the edges connecting the intermediate nodes and the output nodes, and, as a predicted output vector, a vector whose components are predicted values of the output data of one or more dimensions; it calculates the Kalman gain matrix based on two or more estimated weight vectors having mutually different components and the predicted output vector calculated for each of the two or more estimated weight vectors, and updates the weights assigned to each edge connecting the intermediate nodes and the output nodes based on the calculated Kalman gain matrix.
  • thereby, the machine learning device can suppress the occurrence of numerical instability due to the quantization error in a recurrent neural network accompanied by the matrix calculation for obtaining the Kalman gain matrix, without increasing the number of quantization bits.
• In the weight update process, the machine learning device may calculate two or more predicted weight vectors based on the two or more estimated weight vectors, calculate a predicted weight error ensemble vector based on the calculated two or more predicted weight vectors and a predicted output error ensemble vector based on the two or more predicted output vectors, and calculate the Kalman gain matrix based on the calculated predicted weight error ensemble vector and the calculated predicted output error ensemble vector.
• A configuration may be used in which the output layer has one output node, the predicted output vector is a vector having a predicted value for the one-dimensional output data as its component, and the Kalman gain matrix is a matrix having a plurality of rows and one column.
• A configuration may be used in which the intermediate layer is a reservoir.
• A configuration may be used in which the machine learning device performs at least the weight update process by hardware including at least one of a near-memory and a memory-logic device.
• A program for realizing the function of an arbitrary component of the device described above (for example, the machine learning device 1) may be recorded on a computer-readable recording medium, and the program may be read into a computer system and executed.
• The term "computer system" as used herein includes an OS (Operating System) and hardware such as peripheral devices.
  • the "computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD (Compact Disk) -ROM, or a storage device such as a hard disk built in a computer system. ..
  • a "computer-readable recording medium” is a volatile memory (RAM) inside a computer system that serves as a server or client when a program is transmitted via a network such as the Internet or a communication line such as a telephone line.
  • RAM volatile memory
• The above program may be transmitted from a computer system in which the program is stored in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium.
  • the "transmission medium" for transmitting a program means a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line.
  • the above program may be for realizing a part of the above-mentioned functions.
  • the above program may be a so-called difference file (difference program) that can realize the above-mentioned functions in combination with a program already recorded in the computer system.
• 1 ... Machine learning device, 11 ... Arithmetic unit, 12 ... Memory, 13 ... Network interface, L1 ... Input layer, L2 ... Intermediate layer, L3 ... Output layer

Abstract

A machine learning device wherein a recurrent neural network has an input layer, an intermediate layer, and an output layer, weights assigned to the respective edges linking intermediate nodes are fixed at a predetermined size, and the machine learning device carries out an output data generation process and a weight updating process every time the input layer receives input data of one or more dimensions in a predetermined sequence. The weight updating process is a process for calculating a Kalman gain matrix in an ensemble Kalman filter method on the basis of two or more estimation weight vectors having mutually different components and a predicted output vector calculated for each of the two or more estimation weight vectors, and updating the weights assigned to the respective edges linking the intermediate nodes and an output node, on the basis of the calculated Kalman gain matrix.

Description

Machine learning device, machine learning program, and machine learning method
The present invention relates to a machine learning device, a machine learning program, and a machine learning method.
In a neural network that performs machine learning of time series data, a method using the extended Kalman filter method is known as a method of updating weights (see Non-Patent Document 1).
On the other hand, in order to implement neural networks on devices such as edge devices, research and development are also being conducted on technologies for realizing neural networks by hardware.
However, a neural network that updates weights using the extended Kalman filter method may suffer from numerical instability caused by quantization error in the matrix calculation for calculating the Kalman gain matrix. The occurrence of such numerical instability can be suppressed by increasing the number of quantization bits. However, for a hardware implementation of a neural network, a small number of quantization bits is desirable. For these reasons, in a conventional neural network involving matrix calculation for calculating the Kalman gain matrix, it was sometimes difficult to suppress the occurrence of numerical instability caused by quantization error without increasing the number of quantization bits.
One aspect of the present invention is a machine learning device that performs machine learning of input data of one or more dimensions arranged in a predetermined order, using a recurrent neural network having a plurality of nodes connected to each other by weighted edges. The recurrent neural network has an input layer having one or more input nodes, an intermediate layer having one or more intermediate nodes, and an output layer having one or more output nodes. The input node, the intermediate node, and the output node are different nodes among the plurality of nodes, and the weight assigned to each edge connecting the intermediate nodes is fixed at a predetermined size. The machine learning device performs an output data generation process and a weight update process each time the input layer receives the input data of one or more dimensions in the predetermined order. The output data generation process performs, in order, a first process of outputting the input data received by the input layer from the input layer to the intermediate layer, a second process of outputting intermediate data of one or more dimensions corresponding to the input data input to the intermediate layer by the first process from the intermediate layer to the output layer, and a third process of generating output data of one or more dimensions corresponding to the intermediate data input to the output layer by the second process. The weight update process is a process of calculating a Kalman gain matrix in an ensemble Kalman filter method based on two or more estimated weight vectors having mutually different components and a predicted output vector calculated for each of the two or more estimated weight vectors, where an estimated weight vector is a vector whose components are estimated values of the weights assigned to the edges connecting the intermediate nodes and the output nodes, and a predicted output vector is a vector whose components are predicted values of the output data of one or more dimensions, and of updating the weights assigned to the edges connecting the intermediate nodes and the output nodes based on the calculated Kalman gain matrix.
According to the present invention, in a recurrent neural network involving matrix calculation for calculating a Kalman gain matrix, it is possible to suppress the occurrence of numerical instability caused by quantization error without increasing the number of quantization bits.
FIG. 1 is a diagram showing an example of the configuration of the machine learning device 1 according to the embodiment.
FIG. 2 is a diagram showing an example of the configuration of the recurrent neural network according to the embodiment.
FIG. 3 is a diagram showing an example of the flow of the weight update process performed by the machine learning device 1.
FIG. 4 is a diagram showing an example of the overall configuration of the data flow in the weight update process based on the ensemble Kalman filter method.
FIG. 5 is a diagram showing the simplest concrete example of the data flow inside block B3.
FIG. 6 is a diagram showing an example of a double pendulum composed of a first weight of mass m1 connected to the origin by a rod of length l1 and a second weight of mass m2 connected to the first weight by a rod of length l2.
FIG. 7 is a diagram showing an example of a graph plotting the temporal change of the output data output from the machine learning device 1 during the period in which the machine learning device 1 is machine-learning the temporal change of the displacement in the X-axis direction of the second weight of the double pendulum shown in FIG. 6.
FIG. 8 is a diagram showing an example of a graph plotting the temporal change of the output data output from the machine learning device 1 in the period after the machine learning device 1 has machine-learned the temporal change of the displacement in the X-axis direction of the second weight of the double pendulum shown in FIG. 6.
FIG. 9 is a diagram showing another example of a graph plotting the temporal change of the output data output from the machine learning device 1 during the period in which the machine learning device 1 is machine-learning the temporal change of the displacement in the X-axis direction of the second weight of the double pendulum shown in FIG. 6.
FIG. 10 is a diagram showing another example of a graph plotting the temporal change of the output data output from the machine learning device 1 in the period after the machine learning device 1 has machine-learned the temporal change of the displacement in the X-axis direction of the second weight of the double pendulum shown in FIG. 6.
FIG. 11 is a diagram showing still another example of a graph plotting the temporal change of the output data output from the machine learning device 1 during the period in which the machine learning device 1 is machine-learning the temporal change of the displacement in the X-axis direction of the second weight of the double pendulum shown in FIG. 6.
FIG. 12 is a diagram showing still another example of a graph plotting the temporal change of the output data output from the machine learning device 1 in the period after the machine learning device 1 has machine-learned the temporal change of the displacement in the X-axis direction of the second weight of the double pendulum shown in FIG. 6.
<Embodiment>
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
<Configuration of machine learning device>
First, the configuration of the machine learning device 1 according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the configuration of the machine learning device 1 according to the embodiment.
The machine learning device 1 performs machine learning of P-dimensional input data. P may be any integer equal to or greater than 1. The machine learning device 1 performs such machine learning using a recurrent neural network having a plurality of nodes. In this recurrent neural network, the plurality of nodes are connected to each other by edges to which weights are assigned.
Here, the P-dimensional input data are data that are correlated with each other and are arranged in a predetermined order. In the following, as an example, the case where the predetermined order is chronological order will be described. In this case, the P-dimensional input data are P-dimensional time series data, for example, data acquired in chronological order from P sensors. The P sensors may be P different types of sensors, or some or all of them may be of the same type. The predetermined order may also be another order, such as a spatially arranged order, instead of chronological order.
In the following, for convenience of explanation, the time indicating the chronological order is denoted by a discretized time k, where k is an integer. When the time indicating the chronological order is a continuous variable, k may be another kind of number, such as a real number.
Here, as shown in FIG. 2, the recurrent neural network according to the embodiment has at least an input layer L1, an intermediate layer L2, and an output layer L3. FIG. 2 is a diagram showing an example of the configuration of the recurrent neural network according to the embodiment. In the following, for convenience of explanation, the recurrent neural network according to the embodiment is referred to as an ensemble FORCE learner.
Note that, in a neural network, each node represents an operation performed on the data flowing through the network. Therefore, in a neural network realized by software, each node corresponds to a function that performs the operation, and in a neural network realized by hardware, each node corresponds to an element that performs the operation.
Further, in a neural network, an edge connecting a node N1 to another node N2 indicates the flow of data from the node N1 to the node N2. The data flowing from the node N1 to the node N2 is multiplied by the weight assigned to the edge connecting the node N1 and the node N2. That is, the data after being multiplied by the weight while passing through the edge is input from the edge to the node N2. Therefore, in a neural network realized by software, the edge corresponds to a function that performs such weight multiplication, and in a neural network realized by hardware, the edge corresponds to an element that performs such weight multiplication.
The input layer L1 has input nodes. Here, the input layer L1 may have the same number of input nodes as the number of dimensions of the P-dimensional input data, or a different number of input nodes. When the input layer L1 has a number of input nodes different from the number of dimensions of the P-dimensional input data, the number of input nodes may be smaller or larger than P. In that case, for example, weighted linear sums of the P-dimensional input data are input to these input nodes. In the following, as an example, the case where the input layer L1 has P input nodes will be described. In this case, each input node receives the input data associated with that input node. In other words, the p-th input node among the P input nodes receives the p-th input data of the P-dimensional time series data. Here, p is an integer between 1 and P, inclusive. That is, p is a number (label) that identifies each of the P input nodes and also identifies each of the P input data. The input layer L1 outputs each of the P-dimensional input data received by the P input nodes to the intermediate layer L2.
The intermediate layer L2 has a plurality of intermediate nodes, and receives each of the P-dimensional input data output by the input layer L1 through some or all of the plurality of intermediate nodes. The intermediate layer outputs Q-dimensional intermediate data corresponding to the received P-dimensional input data to the output layer L3. Q may be any integer equal to or greater than 1. Therefore, the intermediate layer L2 has at least Q intermediate nodes that output the Q-dimensional intermediate data to the output layer L3. The q-th intermediate node among these Q intermediate nodes outputs the q-th intermediate data of the Q-dimensional intermediate data to the output layer L3. Here, q is an integer between 1 and Q, inclusive, and is a number (label) that identifies each of the Q intermediate nodes and also identifies each of the Q-dimensional intermediate data.
Here, when an intermediate node receives one or more input data, it generates the output value obtained by inputting the sum of the received input data to a first activation function. The first activation function may be any function as long as it is a nonlinear function. The intermediate node then outputs the generated output value to the other nodes connected to it by edges. When the intermediate node is one of the Q intermediate nodes described above, the generated output value is output to the output layer L3 as intermediate data. Each intermediate node of the intermediate layer L2 generates an output value in this way. Of the processes performed by an intermediate node, descriptions of other processes such as the addition of a bias are omitted.
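As a minimal illustration of the node and edge semantics described above (the choice of tanh as the nonlinear first activation function is an assumption for this sketch, not something the document specifies):

```python
import numpy as np

def node_output(inputs, weights, activation=np.tanh):
    """Sketch of one intermediate node: each incoming edge multiplies its
    data by the edge weight, and the node applies a nonlinear activation
    to the sum of the weighted inputs (bias omitted, as in the text)."""
    weighted = [w * x for w, x in zip(weights, inputs)]  # edge multiplications
    return activation(sum(weighted))                     # node computation

# Example: a node with three incoming edges.
print(node_output([0.5, -1.0, 2.0], [0.1, 0.4, -0.3]))
```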
The intermediate layer L2 is, for example, a reservoir in reservoir computing. The weights inside the intermediate layer L2 are therefore determined in advance by random numbers, and are not updated. In other words, the weight assigned to each edge connecting the intermediate nodes is fixed at a predetermined size (that is, a size determined by a random number). Instead of a reservoir, the intermediate layer L2 may be another type of intermediate layer in which the weights within the layer are not updated.
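A minimal sketch of such a reservoir, assuming a tanh nonlinearity and a simple state update; the dimensions and the spectral-radius-style scaling are illustrative assumptions, not values from the document:

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q = 3, 100  # input dimension and number of intermediate (reservoir) nodes

# Fixed random weights: determined once by random numbers and never updated.
W_in  = rng.uniform(-1.0, 1.0, size=(Q, P))   # input -> intermediate edges
W_res = rng.uniform(-1.0, 1.0, size=(Q, Q))   # intermediate -> intermediate edges
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))  # illustrative stability scaling

def reservoir_step(r, u):
    """One update of the reservoir state r given one P-dimensional input u."""
    return np.tanh(W_res @ r + W_in @ u)

r = np.zeros(Q)
u_k = rng.standard_normal(P)   # one P-dimensional input sample
r = reservoir_step(r, u_k)     # Q-dimensional intermediate data
```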
The output layer L3 has R output nodes, where R may be any integer equal to or greater than 1. Through these R output nodes, the output layer L3 receives the Q-dimensional intermediate data from the intermediate layer L2, and generates and outputs R-dimensional output data corresponding to the received Q-dimensional intermediate data. That is, the r-th output node among the R output nodes generates the r-th output data of the R-dimensional output data. Here, r is an integer between 1 and R, inclusive, and is a number (label) that identifies each of the R output nodes and also identifies each of the R-dimensional output data.
Here, when an output node receives one or more intermediate data, it generates the output value obtained by inputting the sum of the received intermediate data to a second activation function, which will be described later. The output node then outputs this output value as output data. Each output node of the output layer L3 generates an output value in this way. Of the processes performed by an output node, descriptions of other processes such as the addition of a bias and the output of the output value are omitted.
As described above, the ensemble FORCE learner has, in this example, an intermediate layer L2 that is a reservoir. The ensemble FORCE learner is therefore, in this example, a kind of reservoir computing.
Note that the input nodes, the intermediate nodes, and the output nodes are mutually different nodes among the plurality of nodes of the ensemble FORCE learner, and do not overlap.
Here, when data D1 is output from an input node X11 to an intermediate node X12, the data D1 is multiplied by the weight assigned to the edge connecting the input node X11 and the intermediate node X12, and the data D1 after the multiplication is input to the intermediate node X12.
Similarly, when data D2 is output from an intermediate node X21 to another intermediate node X22, the data D2 is multiplied by the weight assigned to the edge connecting the intermediate node X21 and the intermediate node X22, and the data D2 after the multiplication is input to the intermediate node X22.
Likewise, when data D3 is output from an intermediate node X31 to an output node X32, the data D3 is multiplied by the weight assigned to the edge connecting the intermediate node X31 and the output node X32, and the data D3 after the multiplication is input to the output node X32.
Since the weights within the intermediate layer L2 are not updated, in ensemble FORCE learning the weight update is performed on the weights assigned to the edges connecting the intermediate nodes and the output nodes, and not on the weights assigned to the edges connecting the input nodes and the intermediate nodes. In the following, for convenience of explanation, the weights assigned to the edges connecting the intermediate nodes and the output nodes are collectively referred to as the update target weights, and the number of update target weights is denoted by L. L may be any integer equal to or greater than 2.
Each "○" shown in FIG. 2 indicates a node: the "○" in the input layer L1 indicate input nodes, those in the intermediate layer L2 indicate intermediate nodes, and those in the output layer L3 indicate output nodes.
The arrows connecting the nodes shown in FIG. 2 are drawn to give an easily understandable image of how the nodes are connected by edges in the ensemble FORCE learner, and differ from the actual edge connections between the nodes in the ensemble FORCE learner.
The input of the input data to the input layer L1 and the output of the output data from the output layer L3 may be performed by known methods or by methods to be developed in the future, and their descriptions are therefore omitted.
The machine learning device 1 performs machine learning of the aforementioned P-dimensional input data using such ensemble FORCE learning. More specifically, each time the input layer L1 receives P-dimensional input data in chronological order (that is, each time it receives the input data in the predetermined order), the machine learning device 1 performs a weight update process and an output data generation process.
The weight update process is a process of updating the update target weights. The machine learning device 1 performs the weight update process before performing the output data generation process. That is, each time the input layer L1 receives P-dimensional input data in chronological order, the machine learning device 1 updates the update target weights and then performs the output data generation process.
The weight update process updates the update target weights based on the ensemble Kalman filter method. Conventionally, when updating weights based on the ensemble Kalman filter method, it is necessary to prepare as many intermediate layers of the neural network as the number of samples (particles) in the ensemble Kalman filter method, which raises the problem of increased computational cost. In ensemble FORCE learning, on the other hand, the weights in the intermediate layer L2 are not updated, so the weights can be updated based on the ensemble Kalman filter method while keeping the number of intermediate layers L2 at one. As a result, the machine learning device 1 can suppress an increase in computational cost. Furthermore, updating the weights based on the ensemble Kalman filter method requires a lower computational cost for matrix inversion than updating the weights by other methods involving matrix calculation for calculating the Kalman gain matrix. As a result, the machine learning device 1 can suppress the occurrence of numerical instability caused by quantization error without increasing the number of quantization bits.
More specifically, the weight update process is a process of calculating the Kalman gain matrix in the ensemble Kalman filter method based on M estimated weight vectors having mutually different components and the predicted output vectors calculated for each of the M estimated weight vectors, and of updating the update target weights based on the calculated Kalman gain matrix. M is the number of samples in the ensemble Kalman filter method, and may be any integer equal to or greater than 2.
Here, an estimated weight vector is a vector whose components are estimated values of the individual weights included in the update target weights. In the following, for convenience of explanation, these estimated values are referred to as estimated weights. Since the number of update target weights is L (that is, the number of estimated weights is L), an estimated weight vector is an L-dimensional vector. The initial values of the estimated weights are determined randomly by random numbers.
A predicted output vector is a vector whose components are predicted values for each of the R-dimensional output data; that is, a predicted output vector is an R-dimensional vector. The predicted output vector at a time k is calculated based on the estimated weight vector at the time k.
The details of the weight update process will be described later.
The output data generation process is performed after the update target weights have been updated by the weight update process, using the updated update target weights. The output data generation process performs a first process, a second process, and a third process, in that order.
The first process is a process of outputting the P-dimensional input data received by the input layer L1 from the input layer L1 to the intermediate layer L2.
The second process is a process of outputting the Q-dimensional intermediate data corresponding to the P-dimensional input data input to the intermediate layer L2 by the first process, from the intermediate layer L2 to the output layer L3.
The third process is a process of generating the R-dimensional output data corresponding to the Q-dimensional intermediate data input to the output layer L3 by the second process.
The output data generation process is the same as the process of generating output data in reservoir computing, and a more detailed description is therefore omitted.
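A minimal sketch of this output data generation process, continuing the reservoir sketch above; the identity readout used as the second activation function and the fully connected readout (so that L = Q × R) are assumptions for illustration:

```python
# Sketch of the output data generation process (first/second/third process),
# reusing rng, Q, W_in, W_res, reservoir_step, and u_k from the reservoir
# sketch above.
R = 1                                  # number of output nodes (illustrative)
W_out = rng.standard_normal((R, Q))    # update target weights, assuming a
                                       # fully connected readout (L = Q * R)

def generate_output(r, u):
    """First process: the input layer passes input data u to the intermediate
    layer. Second process: the intermediate layer produces the Q-dimensional
    intermediate data. Third process: the output layer produces R-dimensional
    output data from the intermediate data (identity readout assumed)."""
    r_next = reservoir_step(r, u)      # first + second process
    y = W_out @ r_next                 # third process
    return r_next, y

r = np.zeros(Q)
r, y_k = generate_output(r, u_k)
```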
Returning to FIG. 1, the machine learning device 1 includes an arithmetic unit 11, a memory 12, and a network interface 13. In addition to these, the machine learning device 1 may include other circuits and other devices. For example, the machine learning device 1 may include input devices such as a keyboard and a mouse, may include an output device such as a display, and may include an interface for connecting at least one of such input and output devices.
The arithmetic unit 11 is a processor, for example, an FPGA (Field Programmable Gate Array). Instead of an FPGA, the arithmetic unit 11 may be a CPU (Central Processing Unit), a combination of an FPGA and a CPU, or another processor.
In this example, the arithmetic unit 11 is an FPGA. The arithmetic unit 11 therefore realizes the aforementioned ensemble FORCE learning with the hardware of the FPGA (for example, integrated circuits) and performs machine learning on the P-dimensional input data. When the arithmetic unit 11 is a CPU, the arithmetic unit 11 may perform the machine learning by a combination of the hardware of the CPU and software executed by the CPU. Further, as will be described later, the arithmetic unit 11 may be configured with near-memory, memory-logic, or the like; in other words, the arithmetic unit 11 may be configured with hardware including at least one of near-memory and memory-logic.
The memory 12 stores, for example, various information used by the arithmetic unit 11. The memory 12 includes, for example, an SSD (Solid State Drive), an HDD (Hard Disk Drive), an EEPROM (Electrically Erasable Programmable Read-Only Memory), a ROM (Read-Only Memory), a RAM (Random Access Memory), and the like. Instead of being built into the arithmetic unit 11, the memory 12 may be an external storage device connected via a digital input/output port such as USB.
The network interface 13 is an interface that connects to external devices such as sensors via a network.
<Weight update process>
Hereinafter, the weight update process performed by the machine learning device 1 will be described. The weight update process described below is based on the ensemble Kalman filter method. In the weight update process based on the ensemble Kalman filter method, sequential calculations are performed according to the chronological order indicated by the discretized time k. Therefore, the time k appearing as an argument of the functions, vectors, matrices, and so on described below indicates the chronological order in such sequential calculations. Note that the following formulation of the ensemble Kalman filter method is only one example, and other formulations may be used.
Here, the ensemble FORCE learner can be represented by the nonlinear vector functions shown in the following equations (1) and (2).
\[
x_{k+1} = x_k + \eta_k \tag{1}
\]

\[
y_k = h(x_k, k) + \zeta_k \tag{2}
\]
The vector x in the above equation (1) denotes the weight vector, which is the vector having the update target weights as its components. That is, the vector x_{k+1} denotes the weight vector at time k+1, and x_k denotes the weight vector at time k. The vector η is a weight error vector indicating the modeling error of the weight vector; that is, the vector η_k denotes the weight error vector at time k. The vector η_k is obtained by assuming some error distribution, for example a Gaussian distribution, as the distribution of the modeling error of the weight vector at time k. Note that the first term on the right-hand side of equation (1) may be a nonlinear function of the vector x_k and the time k.
The vector y in the above equation (2) denotes the output vector, which is the vector having the R-dimensional output data as its components. That is, the vector y_k denotes the output vector at time k. The vector ζ is an output error vector indicating the modeling error of the output vector; that is, the vector ζ_k denotes the output error vector at time k. The vector ζ_k is obtained by assuming some error distribution, for example a Gaussian distribution, as the distribution of the modeling error of the output vector at time k. The function h is the aforementioned second activation function. More specifically, the function h is a two-variable function such as a sigmoid function, a hyperbolic tangent function, a linear function, or ReLU.
Here, in the weight update based on the ensemble Kalman filter method, M weight vectors are treated as M samples. The model representing the time evolution of each of these M weight vectors should be expressed by the above equation (1). Therefore, in the following, the model representing the time evolution of each of the M weight vectors is expressed by the M equations shown in the following equation (3).
\[
x_{k+1}^{(j)} = x_k^{(j)} + \eta_k^{(j)}, \qquad j = 1, \ldots, M \tag{3}
\]
In the above equation (3), j is a subscript identifying each of the M equations; that is, j is an integer between 1 and M, inclusive. Therefore, j is also the subscript identifying each of the M weight vectors and each of the M weight error vectors.
Since the output data are calculated according to the M weight vectors, there should also be M of them, and the model representing the time evolution of each of the M output data should be expressed by the above equation (2). Therefore, in the following, the model representing the time evolution of each of the M output data is expressed by the M equations shown in the following equation (4).
\[
y_k^{(j)} = h\bigl(x_k^{(j)}, k\bigr) + \zeta_k^{(j)}, \qquad j = 1, \ldots, M \tag{4}
\]
In the above equation (4) as well, j is a subscript identifying each of the M equations; that is, j is also the subscript identifying each of the M output vectors and each of the M output error vectors.
In the ensemble Kalman filter method, the above equation (3) is rewritten as the following equation (5), with the first term on the right-hand side of equation (3) taken as the estimated weight vector and the left-hand side of equation (3) taken as the predicted weight vector.
\[
\hat{x}_{k|k-1}^{(j)} = \hat{x}_{k-1|k-1}^{(j)} + \eta_k^{(j)} \tag{5}
\]
The first term on the right-hand side of the above equation (5) is the estimated weight vector, and the left-hand side of equation (5) is the predicted weight vector. Here, as shown in equation (5), in order to obtain the predicted weight vector associated with time k, the estimated weight vector associated with time k-1 is required. The estimated weight vectors therefore need initial vectors. Each component of these initial vectors may, for example, be randomly assigned a value between 0 and 1 by random numbers, or may be assigned another value by another method.
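For example, a minimal sketch of such a random initialization (numpy assumed; the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng()
L, M = 100, 16  # number of update target weights and number of samples

# Initial estimated weight vectors: each of the M vectors gets components
# drawn uniformly at random from [0, 1]; one column per sample.
X_est = rng.random((L, M))
```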
Further, in the ensemble Kalman filter method, the above equation (4) is rewritten as the following equation (6), with the first term on the right-hand side of equation (4) taken as the estimated output vector and the left-hand side of equation (4) taken as the predicted output vector.
\[
\hat{y}_{k|k-1}^{(j)} = h\bigl(\hat{x}_{k|k-1}^{(j)}, k\bigr) + \zeta_k^{(j)} \tag{6}
\]
Here, the first term on the right-hand side of the above equation (6) is the estimated output vector. That is, in the ensemble Kalman filter method, the estimated output vector is expressed by the second activation function with the predicted weight vector and the time as its variables. The left-hand side of equation (6) is the predicted output vector.
Further, in the ensemble Kalman filter method, the error ensemble vector for the estimated weight vectors calculated based on the above equation (5) is expressed as the following equations (7) and (8). In the following, for convenience of explanation, this error ensemble vector is referred to as the weight error ensemble vector.
\[
E_{x,k} = \bigl[\, \tilde{x}_k^{(1)}, \; \tilde{x}_k^{(2)}, \; \ldots, \; \tilde{x}_k^{(M)} \,\bigr] \tag{7}
\]

\[
\tilde{x}_k^{(j)} = \hat{x}_{k|k-1}^{(j)} - \frac{1}{M} \sum_{i=1}^{M} \hat{x}_{k|k-1}^{(i)} \tag{8}
\]
The left-hand side of the above equation (7) is the weight error ensemble vector, and the right-hand side of equation (7) shows its components. As shown on the right-hand side of equation (7), the weight error ensemble vector is defined as a row vector, so its transpose is a column vector. Each component of the weight error ensemble vector is calculated by equation (8); that is, each component is the difference between each estimated weight vector and the mean of the M estimated weight vectors.
Similarly, in the ensemble Kalman filter method, the error ensemble vector for the estimated output vectors calculated based on the above equation (6) is expressed as the following equations (9) and (10). In the following, for convenience of explanation, this error ensemble vector is referred to as the output error ensemble vector.
\[
E_{y,k} = \bigl[\, \tilde{y}_k^{(1)}, \; \tilde{y}_k^{(2)}, \; \ldots, \; \tilde{y}_k^{(M)} \,\bigr] \tag{9}
\]

\[
\tilde{y}_k^{(j)} = \hat{y}_{k|k-1}^{(j)} - \frac{1}{M} \sum_{i=1}^{M} \hat{y}_{k|k-1}^{(i)} \tag{10}
\]
The left-hand side of the above equation (9) is the output error ensemble vector, and the right-hand side of equation (9) shows its components. As shown on the right-hand side of equation (9), the output error ensemble vector is defined as a row vector, so its transpose is a column vector. Each component of the output error ensemble vector is calculated by equation (10); that is, each component is the difference between each estimated output vector and the mean of the M estimated output vectors.
Then, in the ensemble Kalman filter method, the two covariance matrices used to calculate the Kalman gain matrix are expressed, using the weight error ensemble vector and the output error ensemble vector calculated based on the above equations (7) to (10), as the following equations (11) and (12).
\[
P_{xy,k} = \frac{1}{M-1} \, E_{x,k} \, E_{y,k}^{\top} \tag{11}
\]

\[
P_{yy,k} = \frac{1}{M-1} \, E_{y,k} \, E_{y,k}^{\top} \tag{12}
\]
In the following, for convenience of explanation, the covariance matrix shown in the above equation (11) is referred to as the first covariance matrix. Since the number of update target weights is L and the dimension of the output data is R, the first covariance matrix is an L-by-R matrix.
Similarly, in the following, the covariance matrix shown in the above equation (12) is referred to as the second covariance matrix. Since the dimension of the output data is R, the second covariance matrix is an R-by-R matrix.
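A minimal numpy sketch of equations (7) through (12), assuming the predicted ensembles are held column-wise in arrays X_pred (L × M) and Y_pred (R × M); the 1/(M-1) normalization is a common sampling convention and cancels in the gain of equation (13):

```python
import numpy as np

def error_ensembles_and_covariances(X_pred, Y_pred):
    """X_pred: (L, M) predicted weight vectors, one column per sample.
       Y_pred: (R, M) predicted output vectors, one column per sample."""
    M = X_pred.shape[1]
    # Equations (7)-(10): deviation of each sample from the ensemble mean.
    E_x = X_pred - X_pred.mean(axis=1, keepdims=True)
    E_y = Y_pred - Y_pred.mean(axis=1, keepdims=True)
    # Equations (11)-(12): first (L x R) and second (R x R) covariance matrices.
    P_xy = E_x @ E_y.T / (M - 1)
    P_yy = E_y @ E_y.T / (M - 1)
    return P_xy, P_yy
```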
In the ensemble Kalman filter method, the Kalman gain matrix is expressed, using the first covariance matrix calculated based on the above equation (11) and the second covariance matrix calculated based on equation (12), as the following equation (13).
\[
K_k = P_{xy,k} \, P_{yy,k}^{-1} \tag{13}
\]
The left-hand side of the above equation (13) is the Kalman gain matrix. As described above, the first covariance matrix is an L-by-R matrix and the second covariance matrix is an R-by-R matrix, so the Kalman gain matrix is an L-by-R matrix.
In the ensemble Kalman filter method, the estimated weight vectors can be calculated by correcting the predicted weight vectors based on the Kalman gain matrix of the above equation (13), as shown in the following equation (14).
\[
\hat{x}_{k|k}^{(j)} = \hat{x}_{k|k-1}^{(j)} + K_k \bigl( d_k - \hat{y}_{k|k-1}^{(j)} \bigr) \tag{14}
\]
The first term in the parentheses in the second term on the right-hand side of the above equation (14), d_k, denotes the teacher data for the output data.
Once the estimated weight vectors are calculated based on equation (14), in the ensemble Kalman filter method the updated update target weights are calculated based on the following equation (15).
\[
w_k = \frac{1}{M} \sum_{j=1}^{M} \hat{x}_{k|k}^{(j)} \tag{15}
\]
The left-hand side of the above equation (15) denotes the updated update target weights; that is, the updated update target weights are the mean of the estimated weight vectors.
After calculating the updated update target weights by the above equation (15), the machine learning device 1 performs the aforementioned output data generation process using the calculated updated update target weights. Then, when the next input data is received by the input layer L1, the machine learning device 1 starts the next weight update process by using the M estimated weight vectors calculated based on the above equation (14) as the inputs to the above equation (5). In this way, the machine learning device 1 performs the weight update process and the output data generation process each time input data is received by the input layer L1.
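A minimal numpy sketch of one such per-sample weight update, reusing error_ensembles_and_covariances from the sketch above; the Gaussian perturbations and the readout function passed in are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng()

def weight_update(X_est, readout, d_k, sigma_eta=1e-3, sigma_zeta=1e-3):
    """One weight update per equations (5), (6), and (13)-(15).
       X_est: (L, M) estimated weight vectors from the previous step.
       readout: function mapping one weight vector to an R-dim output
                (the second activation function h, assumed given).
       d_k: (R,) teacher data for the current output."""
    L, M = X_est.shape
    # Eq. (5): predicted weight vectors (Gaussian modeling error assumed).
    X_pred = X_est + sigma_eta * rng.standard_normal((L, M))
    # Eq. (6): predicted output vectors (Gaussian modeling error assumed).
    Y_pred = np.stack([readout(X_pred[:, j]) for j in range(M)], axis=1)
    Y_pred += sigma_zeta * rng.standard_normal(Y_pred.shape)
    # Eqs. (7)-(12), then eq. (13): Kalman gain (L x R).
    P_xy, P_yy = error_ensembles_and_covariances(X_pred, Y_pred)
    K = P_xy @ np.linalg.inv(P_yy)
    # Eq. (14): correct each predicted weight vector using the teacher data.
    X_new = X_pred + K @ (d_k[:, None] - Y_pred)
    # Eq. (15): updated update target weights = ensemble mean.
    w = X_new.mean(axis=1)
    return X_new, w
```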
Here, in ensemble FORCE learning, when the number of output nodes is 1, for example, the second covariance matrix becomes a 1-by-1 matrix, that is, a scalar. As a result, in that case the Kalman gain matrix becomes an L-by-1 matrix, that is, an L-dimensional vector. Therefore, by setting the number of output nodes to 1, the machine learning device 1, which performs machine learning by ensemble FORCE learning, can significantly reduce the computational cost of calculating the Kalman gain matrix. Even when the number of output nodes is 2 or more, the second covariance matrix is at most an R-by-R matrix, so ensemble FORCE learning can reduce the computational cost of calculating the Kalman gain matrix compared with other neural networks that involve inverse matrix calculation (for example, a neural network using the extended Kalman filter method). As a result, the machine learning device 1 can suppress the occurrence of numerical instability caused by quantization error.
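In symbols, a sketch of this single-output specialization, using the notation of the reconstructed equations (11) through (13):

```latex
% When the output layer has one output node (R = 1), the second covariance
% matrix P_{yy,k} is a scalar, so the inverse in equation (13) reduces to a
% scalar division and no matrix inversion is needed:
\[
R = 1:\qquad P_{yy,k} \in \mathbb{R},\qquad
K_k = \frac{P_{xy,k}}{P_{yy,k}} \in \mathbb{R}^{L \times 1}.
\]
```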
As a general matter, updating weights by the ensemble Kalman filter method in a neural network requires preparing as many intermediate layers as the number of samples. For this reason, using the ensemble Kalman filter method for weight updates in neural networks has not been preferable from the viewpoint of reducing computational cost. However, in a neural network that, like ensemble FORCE learning, has an intermediate layer in which the weights within the layer are not updated, such as a reservoir (in this example, the intermediate layer L2), it is sufficient to prepare one intermediate layer. Therefore, ensemble FORCE learning can suppress the occurrence of numerical instability caused by quantization error while suppressing an increase in computational cost. In other words, the ensemble FORCE learner can be said to be a neural network that combines the merits of adopting reservoir computing with the merits of updating weights by the ensemble Kalman filter method.
Here, the flow of the weight update process performed by the machine learning device 1 will be described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the flow of the weight update process performed by the machine learning device 1. The machine learning device 1 performs the processing of the flowchart shown in FIG. 3 each time the input layer L1 receives P-dimensional input data in chronological order. In the following, as an example, the case where the machine learning device 1 has received the first P-dimensional input data in the chronological order before the processing of step S110 shown in FIG. 3 is performed will be described.
 The machine learning device 1 specifies the initial value of each of the M estimated weight vectors (step S110). The machine learning device 1 may be configured to calculate these initial values from random numbers, to receive them from a user, or to specify them by some other method.
 Next, the machine learning device 1 calculates M predicted weight vectors based on the M initial values specified in step S110, the above equation (5), and M weight error vectors (step S120). The machine learning device 1 may be configured to calculate the M weight error vectors from random numbers, to receive them from a user, or to specify them by some other method.
 Next, the machine learning device 1 calculates M predicted output vectors based on the M predicted weight vectors calculated in step S120, the above equation (6), and M output error vectors (step S130). The machine learning device 1 may be configured to calculate the M output error vectors from random numbers, to receive them from a user, or to specify them by some other method.
 Next, the machine learning device 1 calculates two error ensemble vectors based on the M predicted weight vectors calculated in step S120 and the M predicted output vectors calculated in step S130 (step S140). More specifically, the machine learning device 1 calculates the weight error ensemble vector based on the M predicted weight vectors calculated in step S120 and the above equations (7) and (8), and calculates the output error ensemble vector based on the M predicted output vectors calculated in step S130 and the above equations (9) and (10).
 Next, the machine learning device 1 calculates two covariance matrices based on the two error ensemble vectors calculated in step S140 (step S150). More specifically, the machine learning device 1 calculates the first covariance matrix based on the weight error ensemble vector calculated in step S140, the output error ensemble vector calculated in step S140, and the above equation (11), and calculates the second covariance matrix based on the output error ensemble vector calculated in step S140 and the above equation (12).
 Next, the machine learning device 1 calculates the Kalman gain matrix based on the first covariance matrix calculated in step S150, the second covariance matrix calculated in step S150, and the above equation (13) (step S160).
 Next, the machine learning device 1 calculates the M estimated weight vectors based on the M predicted weight vectors calculated in step S120, the M predicted output vectors calculated in step S130, the teacher data, the Kalman gain matrix calculated in step S160, and the above equation (14) (step S170), and then calculates the updated weight to be updated based on those M estimated weight vectors and the above equation (15) (step S180).
 Next, the machine learning device 1 waits until the next input data is received by the input layer L1 (step S190).
 When the machine learning device 1 determines that the next input data has been received by the input layer L1 (step S190 - YES), it returns to step S120 and calculates M predicted weight vectors based on the M estimated weight vectors calculated in the most recently executed step S170.
 By the processing of the flowchart described above, the machine learning device 1 performs the weight update process based on the ensemble Kalman filter method. This allows the machine learning device 1 to greatly reduce the computational cost of calculating the Kalman gain matrix and, as a result, to suppress the occurrence of numerical instability caused by quantization error. The weight update process also allows the machine learning device 1 to perform online learning by the ensemble FORCE learning shown in FIG. 2. Consequently, the machine learning device 1 can be mounted on an edge device, for example, as a device that performs machine learning by this ensemble FORCE learning. When mounting ensemble FORCE learning on an edge device or the like is considered, making this weight update process efficient becomes important, so an efficient data flow is required in the weight update process. In particular, when ensemble FORCE learning is implemented on an edge device or the like as hardware including at least one of near-memory and memory logic, realizing an efficient data flow is extremely important because it leads to faster memory access, faster computation, and the like. The efficient data flow in the weight update process is therefore described below.
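 To make the flow of steps S110 to S180 concrete, the following is a sketch of one pass of the weight update process for a single output node. It assumes the standard ensemble Kalman filter forms for equations (5) to (15), which are not reproduced in this section, and it assumes a linear readout of a reservoir state r as the output model; the noise scales, function names, and array shapes are likewise illustrative.

import numpy as np

rng = np.random.default_rng(0)
L, M = 500, 100  # number of intermediate nodes and ensemble size (assumed)

def weight_update(w_est, r, d, sig_w=1e-4, sig_y=1e-3):
    """One weight update (steps S120-S180) for a single output node.

    w_est: (L, M) estimated weight vectors, one column per sample
    r:     (L,)   intermediate-layer state for the current input
    d:     float  teacher data for the current input
    """
    # S120: predicted weight vectors = estimated weights + weight error vectors
    w_pred = w_est + sig_w * rng.standard_normal((L, M))
    # S130: predicted output vectors = readout + output error vectors
    y_pred = w_pred.T @ r + sig_y * rng.standard_normal(M)        # (M,)
    # S140: weight / output error ensemble vectors (deviations from the means)
    dW = w_pred - w_pred.mean(axis=1, keepdims=True)              # (L, M)
    dY = (y_pred - y_pred.mean())[None, :]                        # (1, M)
    # S150: first and second covariance matrices
    P_wy = dW @ dY.T / M                                          # (L, 1)
    P_yy = dY @ dY.T / M                                          # (1, 1) scalar
    # S160: Kalman gain matrix (a scalar division; no matrix inversion)
    K = P_wy / P_yy                                               # (L, 1)
    # S170: estimated weight vectors for the next step
    w_next = w_pred + K @ (d - y_pred)[None, :]                   # (L, M)
    # S180: the updated weight to be updated, taken as the ensemble mean
    return w_next, w_next.mean(axis=1)

# S110: initial estimated weight vectors, here drawn from random numbers
w_est = rng.standard_normal((L, M))
w_est, w_updated = weight_update(w_est, rng.standard_normal(L), d=0.5)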
 <Data flow in the weight update process>
 As described above, the ensemble FORCE learner can be implemented on an edge device or the like as hardware including at least one of near-memory and memory logic. The memory access speed, computation speed, and the like of an ensemble FORCE learner implemented as such hardware differ depending on the design of the data flow in the weight update process based on the ensemble Kalman filter method. For this reason, when implementing the ensemble FORCE learner on an edge device or the like as hardware including at least one of near-memory and memory logic, an efficient data flow must be considered.
 Accordingly, a specific example of a data flow considered efficient for the weight update process based on the ensemble Kalman filter method is described below.
 FIG. 4 is a diagram showing an example of the overall configuration of the data flow in the weight update process based on the ensemble Kalman filter method. As shown in FIG. 4, the data flow in the weight update process is broadly composed of three blocks: block B1, block B2, and block B3. Each of these three blocks represents hardware including at least one of near-memory and memory logic. In FIG. 4, the chronological order in the data flow is indicated by the time k.
 Block B1 is a block that calculates the M predicted weight vectors. Block B1 contains one block for each of the M estimated weight vectors. More specifically, block B1 contains block B1-j as the block to which the j-th of the M estimated weight vectors is input. That is, block B1 contains M blocks, blocks B1-1 to B1-M.
 Here, as shown in FIG. 4, the j-th of the M estimated weight vectors and the j-th of the M weight error vectors are input to block B1-j. Block B1-j then calculates the j-th of the M predicted weight vectors based on that estimated weight vector, that weight error vector, and the above equation (5), and outputs the calculated predicted weight vector to block B2.
 Block B2 is a block that calculates the M estimated weight vectors. Block B2 contains one block for each of the M predicted weight vectors. More specifically, block B2 contains block B2-j as the block to which the j-th of the M predicted weight vectors is input. That is, block B2 contains M blocks, blocks B2-1 to B2-M.
 Here, as shown in FIG. 4, the j-th of the M predicted weight vectors, the j-th of M difference vectors, and the Kalman gain matrix output from block B3 described later are input to block B2-j. The j-th of the M difference vectors is calculated by the following equation (16).
 \tilde{d}_j(k) = d(k) - \hat{y}_j(k) \qquad (16)
 The left-hand side of the above equation (16) denotes the j-th of the M difference vectors. That is, the j-th difference vector is the quantity inside the parentheses of the second term on the right-hand side of equation (14), redefined as a single new vector.
 Block B2-j calculates the j-th of the M estimated weight vectors based on the j-th of the M predicted weight vectors, the j-th of the M difference vectors, the Kalman gain matrix, and the above equation (14), and outputs the calculated estimated weight vector. The machine learning device 1 can thereby calculate the updated weight to be updated in another block, not shown in FIG. 4, based on the M estimated weight vectors calculated in block B2 and the above equation (15).
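 The per-sample blocks B1-j and B2-j can be pictured as the following small functions. This is a sketch under the assumption that equation (5) is an additive perturbation of the estimated weight vector and that equation (14) has the standard ensemble Kalman filter update form; the function names are illustrative, not from the embodiment.

import numpy as np

def block_b1_j(w_est_j, q_j):
    # B1-j: j-th predicted weight vector from the j-th estimated weight
    # vector and the j-th weight error vector (assumed form of equation (5))
    return w_est_j + q_j

def difference_j(d, y_pred_j):
    # Equation (16): the j-th difference vector, assumed here to be the
    # teacher data minus the j-th predicted output vector
    return d - y_pred_j

def block_b2_j(w_pred_j, diff_j, K):
    # B2-j: j-th estimated weight vector via the assumed form of equation
    # (14), using the Kalman gain matrix K supplied by block B3
    return w_pred_j + K @ diff_j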
 Block B3 is a block that calculates the Kalman gain matrix. The M predicted weight vectors output from block B1, together with the corresponding M predicted output vectors, are input to block B3, and block B3 calculates the Kalman gain matrix based on them. Block B3 outputs the calculated Kalman gain matrix and, in doing so, also outputs the Kalman gain matrix to block B2.
 Here, FIG. 5 is a diagram showing the simplest concrete example of the data flow inside block B3. The data flow shown in FIG. 5 holds no matter what nonlinear function is adopted as the second activation function in ensemble FORCE learning. The data flow shown in FIG. 5 is broadly composed of five blocks, blocks B31 to B35. Each of these five blocks represents hardware including at least one of near-memory and memory logic. In FIG. 5, the chronological order in the data flow is indicated by the time k.
 Block B31 is a block that calculates the first term on the right-hand side of the above equation (8) and the first term on the right-hand side of the above equation (10), that is, the mean of the M predicted weight vectors and the mean of the M predicted output vectors, respectively. The M predicted weight vectors and the M predicted output vectors are input to block B31, which calculates these two means and outputs them to block B32. At this time, block B31 also passes the input M predicted weight vectors and the input M predicted output vectors on to block B32.
 Block B32 is a block that calculates the components of the weight error ensemble vector and the components of the output error ensemble vector. That is, the mean of the M predicted weight vectors and the mean of the M predicted output vectors output from block B31, together with the M predicted weight vectors and the M predicted output vectors themselves, are input to block B32. Block B32 then calculates the components of the weight error ensemble vector based on the M predicted weight vectors and their mean, calculates the components of the output error ensemble vector based on the M predicted output vectors and their mean, and outputs both sets of components to block B33.
 Block B33 is a block that generates the weight error ensemble vector and the output error ensemble vector. That is, the components of the weight error ensemble vector and the components of the output error ensemble vector output from block B32 are input to block B33. Block B33 then generates the weight error ensemble vector from the components of the weight error ensemble vector output from block B32, generates the output error ensemble vector from the components of the output error ensemble vector, and outputs the generated weight error ensemble vector and the generated output error ensemble vector to block B34.
 Block B34 is a block that calculates the first covariance matrix and the second covariance matrix; in other words, block B34 is the block that performs the computations of the above equations (11) and (12). That is, the weight error ensemble vector and the output error ensemble vector output from block B33 are input to block B34. Block B34 then calculates the first covariance matrix based on the weight error ensemble vector and the output error ensemble vector, calculates the second covariance matrix based on the output error ensemble vector, and outputs the calculated first covariance matrix and the calculated second covariance matrix to block B35.
 Block B35 is a block that calculates the Kalman gain matrix; in other words, block B35 is the block that performs the computation of the above equation (13). That is, the first covariance matrix and the second covariance matrix output from block B34 are input to block B35. Block B35 then calculates the Kalman gain matrix based on the first covariance matrix and the second covariance matrix, and outputs the calculated Kalman gain matrix.
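 One way to picture the pipeline of FIG. 5 is as five functions, one per block B31 to B35, chained as below. This is a sketch: the array layouts (an L x M weight ensemble and an s x M output ensemble) and the 1/M normalization follow the standard ensemble Kalman filter, not the exact equations (7) to (13) of the text, and the names are illustrative.

import numpy as np

def b31(W, Y):
    # B31: means of the M predicted weight / output vectors (W, Y passed on)
    return W, Y, W.mean(axis=1, keepdims=True), Y.mean(axis=1, keepdims=True)

def b32(W, Y, W_mean, Y_mean):
    # B32: components of the weight / output error ensemble vectors
    return W - W_mean, Y - Y_mean

def b33(dW, dY):
    # B33: assemble the error ensemble vectors (already column-stacked here)
    return dW, dY

def b34(dW, dY):
    # B34: first and second covariance matrices (equations (11) and (12))
    M = dW.shape[1]
    return dW @ dY.T / M, dY @ dY.T / M

def b35(P_wy, P_yy):
    # B35: Kalman gain matrix (equation (13)); inverts at most an s x s matrix
    return P_wy @ np.linalg.inv(P_yy)

W = np.random.randn(500, 100)   # M = 100 predicted weight vectors (L = 500)
Y = np.random.randn(1, 100)     # M predicted output vectors (s = 1 output node)
K = b35(*b34(*b33(*b32(*b31(W, Y)))))
print(K.shape)                  # (500, 1)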
 Based on the data flow described above, the machine learning device 1 can implement ensemble FORCE learning on an edge device or the like as hardware including at least one of near-memory and memory logic. As a result, the machine learning device 1 can achieve faster memory access, faster computation, and the like without using any special function as the second activation function.
 <Results of machine learning by the machine learning device>
 The results of machine learning by the machine learning device 1 are described below.
 In the following, the results of machine learning by the machine learning device 1 are described using, as an example, the result of having the machine learning device 1 machine-learn the temporal change in the displacement of the double pendulum shown in FIG. 6. FIG. 6 is a diagram showing an example of a double pendulum composed of a first weight of mass m1 connected to the origin by a rod of length l1 and a second weight of mass m2 connected to the first weight by a rod of length l2. The temporal changes in the X-axis and Y-axis displacements of the first weight and the second weight of the double pendulum shown in FIG. 6 are described deterministically by equations of motion. In FIG. 6, gravity acts in the direction indicated by the arrow g.
 The equations of motion for the double pendulum shown in FIG. 6 are written down for the first weight and the second weight, respectively. The forces in the equations of motion written for the first and second weights are then expressed as functions of four parameters: the angle θ1 between the Y axis shown in FIG. 6 and the rod l1, the angle θ2 between that Y axis and the rod l2, the angular velocity that is the change in the angle θ1 per unit time, and the angular velocity that is the change in the angle θ2 per unit time.
 We therefore detected these four parameters in chronological order with sensors and input the four parameters, detected in chronological order, to the machine learning device 1 as four-dimensional input data. In doing so, we stored teacher data on the temporal change in the displacement of each of the first and second weights in the machine learning device 1 in advance, and then had the machine learning device 1 perform online learning of the temporal changes in the displacements of the first and second weights for a predetermined period. The results are the graphs shown in FIGS. 7 and 8.
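 As an illustration of how such input data and teacher data line up, the following sketch assembles the four detected parameters into four-dimensional input vectors and computes the X-axis displacement of the second weight from the geometry of FIG. 6 (with the angles measured from the Y axis, x2 = l1·sin θ1 + l2·sin θ2). The time series, the number of steps, and the rod lengths are placeholders, not measured values.

import numpy as np

T, l1, l2 = 800, 1.0, 1.0                    # time steps and rod lengths (assumed)
theta1 = np.zeros(T); omega1 = np.zeros(T)   # sensor time series (placeholders)
theta2 = np.zeros(T); omega2 = np.zeros(T)

# Four-dimensional input data, one vector per time step, fed in chronological order
inputs = np.stack([theta1, theta2, omega1, omega2], axis=1)    # (T, 4)

# X-axis displacement of the second weight, usable as the teacher signal
teacher_x2 = l1 * np.sin(theta1) + l2 * np.sin(theta2)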
 FIG. 7 is a diagram showing an example of a graph plotting the temporal change in the output data output from the machine learning device 1 during the period in which the machine learning device 1 was machine-learning the temporal change in the X-axis displacement of the second weight of the double pendulum shown in FIG. 6. The vertical axis of the graph shown in FIG. 7 indicates the displacement of the second weight in the X-axis direction, and the horizontal axis indicates the elapsed time. In FIG. 7, this period is shown as the period from elapsed time 0 to elapsed time 400.
 The plot PLT1 in the graph shown in FIG. 7 is a plot of the teacher data, and the plot PLT2 is a plot of the output data. As shown in FIG. 7, the degree of agreement between the output data output from the machine learning device 1 during online learning and the teacher data is not very high. This is because the machine learning device 1 is still in the middle of online learning.
 On the other hand, FIG. 8 is a diagram showing an example of a graph plotting the temporal change in the output data output from the machine learning device 1 in the period after the machine learning device 1 had machine-learned the temporal change in the X-axis displacement of the second weight of the double pendulum shown in FIG. 6. The vertical axis of the graph shown in FIG. 8 indicates the displacement of the second weight in the X-axis direction, and the horizontal axis indicates the elapsed time. In FIG. 8, this period is shown as the period from elapsed time 400 to elapsed time 800.
 The plot PLT1 in the graph shown in FIG. 8 is a plot of the teacher data, and the plot PLT3 is a plot of the output data. As shown in FIG. 8, the degree of agreement between the output data output from the machine learning device 1 after online learning and the teacher data is higher than before online learning.
 Here, the examples shown in FIGS. 7 and 8 are results of having the machine learning device 1 perform online learning with 500 intermediate nodes and 100 samples in the ensemble Kalman filter method (that is, with M = 100). The accuracy of the online learning performed by the machine learning device 1 varies with the number of intermediate nodes and the number of samples.
 The examples shown in FIGS. 9 and 10 are results of having the machine learning device 1 draw graphs similar to those shown in FIGS. 7 and 8 with 250 intermediate nodes and 100 samples in the ensemble Kalman filter method.
 FIG. 9 is a diagram showing another example of a graph plotting the temporal change in the output data output from the machine learning device 1 during the period in which the machine learning device 1 was machine-learning the temporal change in the X-axis displacement of the second weight of the double pendulum shown in FIG. 6. The vertical axis of the graph shown in FIG. 9 indicates the displacement of the second weight in the X-axis direction, and the horizontal axis indicates the elapsed time. In FIG. 9, this period is shown as the period from elapsed time 0 to elapsed time 400.
 The plot PLT1 in the graph shown in FIG. 9 is a plot of the teacher data, and the plot PLT4 is a plot of the output data. As shown in FIG. 9, the degree of agreement between the output data output from the machine learning device 1 during online learning and the teacher data is not very high. This is because the machine learning device 1 is still in the middle of online learning.
 On the other hand, FIG. 10 is a diagram showing another example of a graph plotting the temporal change in the output data output from the machine learning device 1 in the period after the machine learning device 1 had machine-learned the temporal change in the X-axis displacement of the second weight of the double pendulum shown in FIG. 6. The vertical axis of the graph shown in FIG. 10 indicates the displacement of the second weight in the X-axis direction, and the horizontal axis indicates the elapsed time. In FIG. 10, this period is shown as the period from elapsed time 400 to elapsed time 800.
 The plot PLT1 in the graph shown in FIG. 10 is a plot of the teacher data, and the plot PLT5 is a plot of the output data. As shown in FIG. 10, the degree of agreement between the output data output from the machine learning device 1 after online learning and the teacher data is higher than before online learning. Moreover, the degree of agreement between the output data and the teacher data after online learning in the example shown in FIG. 10 hardly differs from that in the example shown in FIG. 8. This means that the online learning performed by the machine learning device 1 remains highly accurate even though the number of intermediate nodes in the example shown in FIG. 10 is half the number of intermediate nodes in the example shown in FIG. 7.
 That is, through ensemble FORCE learning and the weight update process based on the ensemble Kalman filter method, the machine learning device 1 can improve the accuracy of online learning while reducing the number of intermediate nodes. As a result, the machine learning device 1 can achieve both a reduction in manufacturing cost and an improvement in machine learning accuracy.
 The examples shown in FIGS. 11 and 12 are results of having the machine learning device 1 draw graphs similar to those shown in FIGS. 7 and 8 with 250 intermediate nodes and 20 samples in the ensemble Kalman filter method.
 FIG. 11 is a diagram showing yet another example of a graph plotting the temporal change in the output data output from the machine learning device 1 during the period in which the machine learning device 1 was machine-learning the temporal change in the X-axis displacement of the second weight of the double pendulum shown in FIG. 6. The vertical axis of the graph shown in FIG. 11 indicates the displacement of the second weight in the X-axis direction, and the horizontal axis indicates the elapsed time. In FIG. 11, this period is shown as the period from elapsed time 0 to elapsed time 400.
 The plot PLT1 in the graph shown in FIG. 11 is a plot of the teacher data, and the plot PLT6 is a plot of the output data. As shown in FIG. 11, the degree of agreement between the output data output from the machine learning device 1 during online learning and the teacher data is not very high. This is because the machine learning device 1 is still in the middle of online learning.
 On the other hand, FIG. 12 is a diagram showing yet another example of a graph plotting the temporal change in the output data output from the machine learning device 1 in the period after the machine learning device 1 had machine-learned the temporal change in the X-axis displacement of the second weight of the double pendulum shown in FIG. 6. The vertical axis of the graph shown in FIG. 12 indicates the displacement of the second weight in the X-axis direction, and the horizontal axis indicates the elapsed time. In FIG. 12, this period is shown as the period from elapsed time 400 to elapsed time 800.
 The plot PLT1 in the graph shown in FIG. 12 is a plot of the teacher data, and the plot PLT7 is a plot of the output data. As shown in FIG. 12, the degree of agreement between the output data output from the machine learning device 1 after online learning and the teacher data is higher than before online learning. Moreover, the degree of agreement between the output data and the teacher data after online learning in the example shown in FIG. 12 hardly differs from that in the example shown in FIG. 10. This means that the online learning performed by the machine learning device 1 remains highly accurate even though the number of samples in the example shown in FIG. 12 is one fifth of the number of samples in the example shown in FIG. 10.
 That is, through ensemble FORCE learning and the weight update process based on the ensemble Kalman filter method, the machine learning device 1 can improve the accuracy of online learning while reducing the number of samples. As a result, the machine learning device 1 can achieve both a reduction in manufacturing cost and an improvement in machine learning accuracy.
 As described above, the machine learning device according to the embodiment is a machine learning device that performs machine learning of one-dimensional or higher-dimensional input data arranged in a predetermined order, using a recurrent neural network having a plurality of nodes coupled to one another by edges to which weights are assigned. The recurrent neural network has an input layer having one or more input nodes, an intermediate layer having one or more intermediate nodes, and an output layer having one or more output nodes; the input nodes, the intermediate nodes, and the output nodes are mutually different nodes among the plurality of nodes; and the weight assigned to each edge coupling the intermediate nodes to one another is fixed at a predetermined magnitude. The machine learning device performs an output data generation process and a weight update process each time the input layer receives one-dimensional or higher-dimensional input data in the predetermined order. The output data generation process performs, in order, a first process of outputting the input data received by the input layer from the input layer to the intermediate layer, a second process of outputting, from the intermediate layer to the output layer, one-dimensional or higher-dimensional intermediate data corresponding to the input data input to the intermediate layer by the first process, and a third process of generating one-dimensional or higher-dimensional output data corresponding to the one-dimensional or higher-dimensional intermediate data input to the output layer by the second process. The weight update process takes, as an estimated weight vector, a vector whose components are estimated values of the weights assigned to the edges coupling the intermediate nodes and the output nodes, and, as a predicted output vector, a vector whose components are predicted values of the one-dimensional or higher-dimensional output data; it calculates the Kalman gain matrix of the ensemble Kalman filter method based on two or more estimated weight vectors whose components differ from one another and on the predicted output vector calculated for each of those estimated weight vectors, and updates, based on the calculated Kalman gain matrix, the weights assigned to the edges coupling the intermediate nodes and the output nodes. As a result, in a recurrent neural network involving the matrix computation for calculating the Kalman gain matrix, the machine learning device can suppress the occurrence of numerical instability caused by quantization error without increasing the number of quantization bits.
 In the weight update process, the machine learning device may also be configured to calculate two or more predicted weight vectors based on the two or more estimated weight vectors, to calculate a predicted weight error ensemble vector based on the calculated two or more predicted weight vectors and a predicted output error ensemble vector based on the two or more predicted output vectors, and to calculate the Kalman gain matrix based on the calculated predicted weight error ensemble vector and the calculated predicted output error ensemble vector.
 In the machine learning device, a configuration may also be used in which the output layer has a single output node, the predicted output vector is a vector whose components are predicted values of one-dimensional output data, and the Kalman gain matrix is a matrix having a plurality of rows and one column.
 In the machine learning device, a configuration may also be used in which the intermediate layer is a reservoir.
 The machine learning device may also be configured so that at least the weight update process is performed by hardware including at least one of near-memory and memory logic.
 Although an embodiment of the present invention has been described above in detail with reference to the drawings, the specific configuration is not limited to this embodiment, and changes, substitutions, deletions, and the like may be made without departing from the gist of the present invention.
 A program for realizing the functions of any component of the devices described above (for example, the machine learning device 1) may be recorded on a computer-readable recording medium, and the program may be read into a computer system and executed. The term "computer system" here includes an OS (Operating System) and hardware such as peripheral devices. A "computer-readable recording medium" means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD (Compact Disk)-ROM, or a storage device such as a hard disk built into a computer system. Furthermore, a "computer-readable recording medium" also includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
 The above program may also be transmitted from a computer system in which it is stored in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program means a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication channel) like a telephone line.
 The above program may also be one for realizing part of the functions described above. Furthermore, the above program may be a so-called difference file (difference program) that can realize the functions described above in combination with a program already recorded in the computer system.
 1 ... machine learning device, 11 ... arithmetic device, 12 ... memory, 13 ... network interface, L1 ... input layer, L2 ... intermediate layer, L3 ... output layer

Claims (8)

  1.  A machine learning device that performs machine learning of one-dimensional or higher-dimensional input data arranged in a predetermined order, using a recurrent neural network having a plurality of nodes coupled to one another by edges to which weights are assigned, wherein
     the recurrent neural network comprises an input layer having one or more input nodes, an intermediate layer having one or more intermediate nodes, and an output layer having one or more output nodes,
     the input nodes, the intermediate nodes, and the output nodes are mutually different nodes among the plurality of nodes,
     the weight assigned to each edge coupling the intermediate nodes to one another is fixed at a predetermined magnitude,
     the machine learning device performs an output data generation process and a weight update process each time the input layer receives the one-dimensional or higher-dimensional input data in the predetermined order,
     the output data generation process is a process of performing, in order, a first process of outputting the input data received by the input layer from the input layer to the intermediate layer, a second process of outputting, from the intermediate layer to the output layer, one-dimensional or higher-dimensional intermediate data corresponding to the input data input to the intermediate layer by the first process, and a third process of generating one-dimensional or higher-dimensional output data corresponding to the one-dimensional or higher-dimensional intermediate data input to the output layer by the second process, and
     the weight update process is a process of taking, as an estimated weight vector, a vector whose components are estimated values of the weights assigned to the respective edges coupling the intermediate nodes and the output nodes, and, as a predicted output vector, a vector whose components are predicted values of the one-dimensional or higher-dimensional output data, calculating a Kalman gain matrix in an ensemble Kalman filter method based on two or more of the estimated weight vectors whose components differ from one another and on the predicted output vector calculated for each of the two or more estimated weight vectors, and updating, based on the calculated Kalman gain matrix, the weights assigned to the respective edges coupling the intermediate nodes and the output nodes.
  2.  The machine learning device according to claim 1, wherein, in the weight update process, the machine learning device calculates the two or more predicted weight vectors based on the two or more estimated weight vectors, calculates a predicted weight error ensemble vector based on the calculated two or more predicted weight vectors and a predicted output error ensemble vector based on the two or more predicted output vectors, and calculates the Kalman gain matrix based on the calculated predicted weight error ensemble vector and the calculated predicted output error ensemble vector.
  3.  The machine learning device according to claim 1 or 2, wherein the output layer has a single output node, the predicted output vector is a vector whose components are predicted values of one-dimensional output data, and the Kalman gain matrix is a matrix having a plurality of rows and one column.
  4.  The machine learning device according to any one of claims 1 to 3, wherein the intermediate layer is a reservoir.
  5.  The machine learning device according to any one of claims 1 to 4, wherein at least the weight update process is performed by hardware including at least one of near-memory and memory logic.
  6.  A machine learning program that causes a computer to perform machine learning of one-dimensional or higher-dimensional input data arranged in a predetermined order, using a recurrent neural network having a plurality of nodes coupled to one another by edges to which weights are assigned, wherein
     the recurrent neural network comprises an input layer having one or more input nodes, an intermediate layer having one or more intermediate nodes, and an output layer having one or more output nodes,
     the input nodes, the intermediate nodes, and the output nodes are mutually different nodes among the plurality of nodes,
     the weight assigned to each edge coupling the intermediate nodes to one another is fixed at a predetermined magnitude,
     the machine learning program performs an output data generation process and a weight update process each time the input layer receives the one-dimensional or higher-dimensional input data in the predetermined order,
     the output data generation process is a process of performing, in order, a first process of outputting the input data received by the input layer from the input layer to the intermediate layer, a second process of outputting, from the intermediate layer to the output layer, one-dimensional or higher-dimensional intermediate data corresponding to the input data input to the intermediate layer by the first process, and a third process of generating one-dimensional or higher-dimensional output data corresponding to the one-dimensional or higher-dimensional intermediate data input to the output layer by the second process, and
     the weight update process is a process of taking, as an estimated weight vector, a vector whose components are estimated values of the weights assigned to the respective edges coupling the intermediate nodes and the output nodes, and, as a predicted output vector, a vector whose components are predicted values of the one-dimensional or higher-dimensional output data, calculating a Kalman gain matrix in an ensemble Kalman filter method based on two or more of the estimated weight vectors whose components differ from one another and on the predicted output vector calculated for each of the two or more estimated weight vectors, and updating, based on the calculated Kalman gain matrix, the weights assigned to the respective edges coupling the intermediate nodes and the output nodes.
  7.  A machine learning method for performing machine learning of one-dimensional or higher-dimensional input data arranged in a predetermined order, using a recurrent neural network having a plurality of nodes coupled to one another by edges to which weights are assigned, wherein
     the recurrent neural network comprises an input layer having one or more input nodes, an intermediate layer having one or more intermediate nodes, and an output layer having one or more output nodes,
     the input nodes, the intermediate nodes, and the output nodes are mutually different nodes among the plurality of nodes,
     the weight assigned to each edge coupling the intermediate nodes to one another is fixed at a predetermined magnitude,
     the machine learning method performs an output data generation process and a weight update process each time the input layer receives the one-dimensional or higher-dimensional input data in the predetermined order,
     the output data generation process is a process of performing, in order, a first process of outputting the input data received by the input layer from the input layer to the intermediate layer, a second process of outputting, from the intermediate layer to the output layer, one-dimensional or higher-dimensional intermediate data corresponding to the input data input to the intermediate layer by the first process, and a third process of generating one-dimensional or higher-dimensional output data corresponding to the one-dimensional or higher-dimensional intermediate data input to the output layer by the second process, and
     the weight update process is a process of taking, as an estimated weight vector, a vector whose components are estimated values of the weights assigned to the respective edges coupling the intermediate nodes and the output nodes, and, as a predicted output vector, a vector whose components are predicted values of the one-dimensional or higher-dimensional output data, calculating a Kalman gain matrix in an ensemble Kalman filter method based on two or more of the estimated weight vectors whose components differ from one another and on the predicted output vector calculated for each of the two or more estimated weight vectors, and updating, based on the calculated Kalman gain matrix, the weights assigned to the respective edges coupling the intermediate nodes and the output nodes.
  8.  A machine learning device that, in reservoir computing, updates weights by an ensemble Kalman filter method.