WO2013182176A1

WO2013182176A1 - Method for training an artificial neural network, and computer program products

Info

Publication number: WO2013182176A1
Application number: PCT/DE2013/000205
Authority: WO
Inventors: Gerhard DÖDING; László GERMÁN; Klaus Kemper
Original assignee: Kisters Ag
Priority date: 2012-06-06
Filing date: 2013-04-18
Publication date: 2013-12-12
Also published as: DE102012011194A1

Abstract

The invention relates to a method for training an artificial neural network comprising at least one layer of input neurons and an output layer of output neurons, wherein only the output neurons are adapted.

Description

Method of training an artificial neural network and computer program products

[01] The invention relates to a method for training an artificial neural network and to computer program products.

[02] In particular, the method relates to training an artificial neural network having at least one hidden layer with feeder neurons and an output layer with output neurons.

[03] Artificial neural networks are able to learn complicated nonlinear functions by means of a learning algorithm that tries to determine all parameters of the function from existing input values and desired output values, using an iterative or recursive procedure.

[04] The networks used are massively parallel structures for modeling arbitrary functional relationships. For this purpose, they are presented with training data that represent the relationships to be modeled by means of examples. During training, the internal parameters of the neural networks, such as their synaptic weights, are adjusted by training processes so that the networks produce the desired response to the input data. This training is called supervised learning.

[05] Previous training processes run in epochs, i.e., cycles in which the data is presented to the network, during which the response error at the output of the network is reduced iteratively.

[06] For this purpose, the errors of the output neurons are propagated backwards into the network (backpropagation). Using various processes (gradient descent, or heuristic methods such as particle swarm optimization or evolutionary methods), the synaptic weights of all neurons in the network are then changed so that the neural network approximates the desired functionality as precisely as possible.

[07] In artificial neural networks, topology refers to the structure of the network. Neurons can be arranged in consecutive layers. For example, a network with a single trainable neuron layer is called a single-layer network. The last layer of the network, whose neuron outputs are usually the only ones visible outside the network, is called the output layer. Layers in front of it are accordingly called hidden layers. The inventive method is suitable for feed-forward neural networks of any topology having at least one layer with feeder neurons and an output layer with output neurons.

[08] The described learning methods serve to cause a neural network to generate associated output patterns for particular input patterns. For this purpose, the network is trained or adapted. The training of artificial neural networks, that is, the estimation of the parameters contained in the model, usually leads to high-dimensional nonlinear optimization problems. The principal difficulty in solving these problems in practice is often that one cannot be sure whether one has found the global optimum or only a local one. Approaching the global solution usually requires time-consuming repetition of the optimization, each time with new starting values for the internal parameters and the given input and output values.

[09] The previous training methods are very compute-intensive and therefore require long computation times, which increase very strongly with the number of connected neurons and layers. Therefore, very complex neural networks, which are required for approximating complicated functional relationships, can be trained only very slowly to the point where an acceptable residual error is achieved.

[10] In addition, networks trained in this way run the risk of being suboptimally trained, since the training methods used mostly exploit only local information about the error propagation and therefore almost always get caught in local minima.

[11] The object of the invention is to further develop a method for training an artificial neural network in such a way that response values with minimal deviation from the desired output values are provided for predefined input values in the shortest possible time.

[12] This object is achieved by a generic method in which only the output neurons are adapted.

[13] In other words, for a functionality to be trained and a given network, input values and output values are given, and only the output neurons are adapted to minimize the output error.

[14] Different randomly generated feeder subnetworks can alternatively be connected to the same output layer.

[15] With the exception of the neurons that represent results (output neurons), the upstream neurons (feeder neurons) perform multi-stage nonlinear computations on the input values and on the intermediate values of other neurons.

[16] The task of the feeder neurons is to create a suitable internal representation of the functionality to be learned in a high-dimensional space. The task of the output neurons is to examine the offer of the feeder neurons and to determine the most suitable selection of nonlinear computation results.

[17] Therefore, these two classes of neurons can be adapted differently, and it has surprisingly been found that the time required for training an artificial neural network can be significantly reduced if only the output neurons are adapted.

[18] The method is based on a new interpretation of how feed-forward networks work and consists essentially of two process steps:
a) Create suitable internal representations of the functionality to be trained.
b) Choose an optimal selection from the offer of pre-calculated outputs of the feeder neurons.

[19] The invention presented here thus relies on a completely different paradigm for describing the function of neural feed-forward networks.

[20] A feed-forward network is interpreted as a series connection of two subnetworks.

[21] The first part contains all the neurons except the output neurons. These neurons are initialized with random synaptic weights, random transfer functions, and a random network topology, and are not altered at any stage of the adaptation. Therefore, they generate only random nonlinear computations on the input information offered.

[22] The second part contains only the output neurons. These are connected to the first part of the network via synaptic weights, according to the predetermined network topology.

[23] According to the invention, only these weights are adapted to the task.

[24] This is preferably done with a Tikhonov-regularized regression between the random computations (the offer of intermediate results from the first subnetwork) and the necessary activations of the output neurons. According to the invention, the optimal synaptic weights of the output layer are therefore selected from the random offer of the first subnetwork, preferably in only one computation step, i.e., not iteratively and not by gradient descent.
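The one-step adaptation described in [24] can be sketched as a ridge regression solved through the regularized normal equations. This is a minimal illustration under stated assumptions, not the patented implementation: the tanh feeder layer, the toy target, the function name `fit_output_weights`, and the regularization strength `lam` are all hypothetical choices that do not appear in the source.

```python
import numpy as np

def fit_output_weights(H, A, lam=1e-2):
    """Solve min_W ||H W - A||^2 + lam * ||W||^2 in a single step.

    H:   (n_samples, n_feeders) outputs of the fixed random feeder subnetwork
    A:   (n_samples, n_outputs) required activations of the output neurons
    lam: Tikhonov regularization strength (hypothetical default)
    """
    n_feeders = H.shape[1]
    # Regularized normal equations: (H^T H + lam * I) W = H^T A
    gram = H.T @ H + lam * np.eye(n_feeders)
    return np.linalg.solve(gram, H.T @ A)

# Toy data (assumption): 200 samples, 3 inputs, 50 random feeder neurons.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
W_in = rng.uniform(-1.0, 1.0, size=(3, 50))   # random weights, never adapted
H = np.tanh(X @ W_in)                          # feeder outputs
A = np.sin(X.sum(axis=1, keepdims=True))       # toy target activations

W_out = fit_output_weights(H, A)               # the only trained parameters
residual = np.sqrt(np.mean((H @ W_out - A) ** 2))
```

No epochs and no gradients are involved: the only work is one linear solve whose size is set by the number of feeder neurons directly connected to the output layer.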

[25] If the number of neurons in the first subnetwork is large enough, there will always be enough non-linear computation results so that the subsequent output layer can adapt very well to the task.

[26] The pre-computations needed to solve the approximation problem therefore arise by chance. This is called randomly induced emergence.

[27] The invention therefore offers the following advantages:

[28] Only one calculation step is necessary for the complete adaptation of the network to the given task.

[29] Therefore, the adaptation is very fast because standard regression methods can be used (e.g., Cholesky factorization, singular value decomposition, LU decomposition, etc.).
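Two of the standard methods named in [29] can be compared on the same regularized problem. The sketch below is illustrative only: the function names, the random test data, and the value of `lam` are assumptions. Both routes should yield the same output weights up to rounding.

```python
import numpy as np

def ridge_cholesky(H, A, lam):
    # Factor the symmetric positive definite matrix G = H^T H + lam * I = L L^T
    G = H.T @ H + lam * np.eye(H.shape[1])
    L = np.linalg.cholesky(G)
    # Solve L Y = H^T A, then L^T W = Y
    Y = np.linalg.solve(L, H.T @ A)
    return np.linalg.solve(L.T, Y)

def ridge_svd(H, A, lam):
    # With H = U diag(s) V^T, the ridge solution is V diag(s / (s^2 + lam)) U^T A
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    filt = s / (s ** 2 + lam)
    return Vt.T @ (filt[:, None] * (U.T @ A))

rng = np.random.default_rng(1)
H = rng.standard_normal((100, 20))   # stand-in for feeder outputs
A = rng.standard_normal((100, 2))    # stand-in for required activations

W_chol = ridge_cholesky(H, A, lam=0.1)
W_svd = ridge_svd(H, A, lam=0.1)
```

The Cholesky route is typically the cheaper one, while the singular value decomposition exposes how the regularization damps directions with small singular values.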

[30] It is not possible to get stuck in a local optimum, since neither gradient descent nor error backpropagation is performed.

[31] Because of the strong Tikhonov regularization, memorization (overtraining) is ruled out.

[32] The use of very large neural networks is easily possible. This is even advantageous, because a larger supply of random nonlinear computations increases the chance of improved approximation quality.

[33] Theoretically, a network can learn by: developing new connections, deleting existing connections, changing the weighting, adjusting the thresholds of the neurons, adding or deleting neurons. In addition, the learning behavior changes as the activation function of the neurons changes or the learning rate of the network changes.

[34] Since an artificial neural network learns mainly by modifying the weights of the neurons, it is proposed that the synaptic weights of the output neurons be determined to adapt the output neurons. A commonly performed adaptation of the feeder neurons, preferably by adaptation of their synaptic weights, is not necessary according to the invention.

[35] It is envisaged that the synaptic weights of the output neurons are determined on the basis of the values of those feeder neurons that are directly connected to the output neurons and of the predefined output values.

[36] An advantageous method provides that the output neurons are adapted with fewer than five adaptation steps, preferably only one step.

[37] In adaptation or training, it is advantageous if predefined output values are back-calculated with the inverse transfer functions.

[38] Furthermore, the invention relates to a method for controlling a system in which the future behavior of observable quantities forms the basis for the control function and an artificial neural network is trained as described above.

[39] A computer program product with computer program code means for carrying out the described method makes it possible to execute the method as a program on a computer.

[40] Such a computer program product can also be stored on a computer-readable data memory.

[41] An embodiment of the method according to the invention will be described in more detail with reference to Figures 1 and 2.

[42] The figures show:

Figure 1 a highly abstracted scheme of a multi-level feed-forward artificial neural network, and

Figure 2 a diagram of an artificial neuron.

[43] The artificial neural network (1) shown in Figure 1 consists of five neurons (2, 3, 4, 5 and 6), of which the neurons (2, 3, 4) are arranged as a hidden layer and represent feeder neurons, while the neurons (5, 6) represent output neurons as the output layer. The input values (7, 8, 9) are assigned to the feeder neurons (2, 3, 4), and the output values (10, 11) are assigned to the output neurons (5, 6). The difference between the response (12) of the output neuron (5) and the output value (10), as well as the difference between the response (13) of the output neuron (6) and the output value (11), is referred to as the output error.

[44] The scheme of an artificial neuron shown in Figure 2 shows how inputs (14, 15, 16, 17) result in a response (18). The inputs (x₁, x₂, x₃, xₙ) are evaluated via weights (19), and a corresponding transfer function (20) leads to an activation (21). An activation function (22) with a threshold value (23) leads to an output value and thus to a response (18).

[45] Since the weighting (19) has the strongest influence on the response (18) of the neurons (2 to 6), the training process is described below exclusively with regard to an adaptation of the weights of the network (1).
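The neuron of Figure 2 can be sketched in a few lines. The source does not fix the concrete functions, so this illustration assumes a weighted sum as the transfer function (20) and tanh as the activation function (22); the function name `neuron_response` and the sample numbers are hypothetical.

```python
import math

def neuron_response(inputs, weights, threshold=0.0, activation_fn=math.tanh):
    """Response (18) of a single neuron in the sense of Figure 2.

    Assumptions: the transfer function (20) is a weighted sum and the
    activation function (22) is tanh shifted by the threshold (23).
    """
    # Transfer function (20): inputs weighted by (19) give the activation (21)
    activation = sum(w * x for w, x in zip(weights, inputs))
    # Activation function (22) with threshold (23) gives the response (18)
    return activation_fn(activation - threshold)

# Four inputs as in Figure 2 (values chosen arbitrarily for illustration)
response = neuron_response([0.5, -1.0, 0.25, 2.0], [0.4, 0.3, -0.8, 0.1])
```

In the method described here, only the `weights` of the output neurons would be adapted; the weights of the feeder neurons keep their random initial values.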

[46] In the exemplary embodiment, in a first step of the training process, all weights (19) of the network (1) are initialized with random values in the interval [-1, 1]. Thereafter, the response (12, 13, 24, 25, 26, 27, 28, 29) of each neuron (2 to 6) is calculated for each training data set.

[47] The desired predefined output values (10, 11) of all output neurons (5, 6) are calculated back to their necessary activations by means of the inverse transfer function of the respective output neuron (5, 6).
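The back-calculation in [47] can be illustrated for two common output functions. The source does not say which function the output neurons use, so tanh and the logistic function are assumptions here, as are the function names and the clipping margin `eps`, which keeps the inverse finite when a desired value lies on the boundary of the function's range.

```python
import numpy as np

def inverse_tanh_transfer(y, eps=1e-6):
    # tanh maps onto (-1, 1); clip so that arctanh stays finite
    y = np.clip(y, -1.0 + eps, 1.0 - eps)
    return np.arctanh(y)

def inverse_logistic_transfer(y, eps=1e-6):
    # The logistic function maps onto (0, 1); its inverse is the logit
    y = np.clip(y, eps, 1.0 - eps)
    return np.log(y / (1.0 - y))

# Desired output values for two output neurons (arbitrary example numbers)
desired = np.array([[0.2, 0.9], [-0.5, 0.1]])
needed_activations = inverse_tanh_transfer(desired)
```

Once the targets are expressed as activations, fitting the output weights becomes a linear regression problem even though the output neurons themselves are nonlinear.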

[48] The synaptic weights of all output neurons are determined by a Tikhonov-regularized regression between the inverted predefined output values (10, 11) and the pre-calculated values of those feeder neurons (2, 3, 4) that are directly connected to the output neurons (5, 6).

[49] If the desired approximation target is reached, i.e., if the output error is smaller than a set upper limit, the method ends here.

[50] Otherwise, the procedure is repeated with another random initialization of the weights or with a larger number of feeder neurons.
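Steps [46] to [50] can be put together in one short loop. This is a sketch under several assumptions that the source leaves open: tanh is used for all neurons, each retry both redraws the random weights and doubles the number of feeder neurons (the text offers these as alternatives), and the names `train_readout_only`, `lam`, `max_error` and `max_attempts` are hypothetical.

```python
import numpy as np

def train_readout_only(X, Y, n_feeders=20, lam=1e-3, max_error=0.05,
                       max_attempts=6, seed=0):
    rng = np.random.default_rng(seed)
    # [47] back-calculate desired outputs to activations (tanh output assumed)
    A = np.arctanh(np.clip(Y, -0.999, 0.999))
    for _ in range(max_attempts):
        # [46] random weights in [-1, 1]; the feeder layer is never adapted
        W_in = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_feeders))
        b_in = rng.uniform(-1.0, 1.0, size=n_feeders)
        H = np.tanh(X @ W_in + b_in)
        # [48] Tikhonov-regularized regression for the output weights
        W_out = np.linalg.solve(H.T @ H + lam * np.eye(n_feeders), H.T @ A)
        error = np.sqrt(np.mean((np.tanh(H @ W_out) - Y) ** 2))
        # [49] stop once the output error is below the set upper limit
        if error < max_error:
            break
        # [50] otherwise retry with new random weights and more feeder neurons
        n_feeders *= 2
    return W_in, b_in, W_out, error

# Toy data (assumption): three inputs, one output in (-1, 1)
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(300, 3))
Y = 0.8 * np.sin(X[:, :1] + X[:, 1:2] * X[:, 2:3])
W_in, b_in, W_out, error = train_readout_only(X, Y)
```

The function returns the last attempt even if the error limit was not reached, so a caller should check `error` before using the network.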

[51] This makes it possible, for example, to enter historical weather data such as solar intensity, wind speed and precipitation as input values (7, 8, 9), while the output value is the power consumption at certain times of the day. By appropriately training the network (1), the response (12, 13) is optimized so that the output error becomes sufficiently small. After that, the network can be used for forecasts by entering predicted weather data and determining the expected power consumption values with the artificial neural network (1).

[52] This makes it possible to control a plant with the calculated values in order to process many input values very quickly and to convert them into control functions.

[53] Whereas a conventional training process needed many hours in practical use to train the neural network for such calculations, the method according to the invention allows training within a few seconds or minutes.

[54] The method described thus makes it possible to greatly reduce the time required to train a given artificial neural network. The network can therefore be chosen large enough to achieve the desired quality of results. The short training time opens up the use of artificial neural networks on less powerful computers, especially smartphones.

[55] Smartphones can thus be trained continuously during use so that, after a training phase, they provide the user unprompted with the information he retrieves regularly. If, for example, the user displays particular stock market data daily via an application, these data can be displayed automatically whenever the smartphone is used, without the user first having to start the application and retrieve the data.

Claims

1. Method for training an artificial neural network (1) comprising at least one hidden layer with feeder neurons (2, 3, 4) and one output layer with output neurons (5, 6), characterized in that only the output neurons (5, 6) are adapted.

2. Method according to claim 1, characterized in that input values (7, 8, 9) and output values (10, 11) are specified for a functionality to be trained and a given network (1), and only the output neurons (5, 6) are adapted so that the output error is minimized.

3. Method according to one of the preceding claims, characterized in that different randomly generated feeder subnetworks are alternatively connected to the same output layer.

4. Method according to one of the preceding claims, characterized in that for adapting the output neurons (5, 6), the synaptic weights of the output neurons (5, 6) are determined.

5. Method according to claim 4, characterized in that the synaptic weights of the output neurons (5, 6) are determined on the basis of the values of those feeder neurons (2, 3, 4) that are directly connected to the output neurons (5, 6) and of the predetermined output values (10, 11).

6. Method according to one of the preceding claims, characterized in that the output neurons (5, 6) are adapted with fewer than five adaptation steps and preferably with only one step.

7. The method according to any one of the preceding claims, characterized in that predetermined output values (10, 11) are back-calculated with the inverse transfer functions.

8. The method according to any one of the preceding claims, characterized in that the output neurons (5, 6) are adapted with Tikhonov-regularized regression.

9. A method for controlling a plant, wherein the future behavior of observable quantities forms the basis for the control function and an artificial neural network is trained according to one of the preceding claims.

10. Computer program product with program code means for carrying out a method according to one of the preceding claims, when the program is executed on a computer.

11. Computer program product with program code means according to claim 10, which are stored on a computer-readable data memory.