WO2015011688A2 - Method of training a neural network - Google Patents

Method of training a neural network

Info

Publication number
WO2015011688A2
Authority
WO
WIPO (PCT)
Prior art keywords
hidden layer
matrix
weight matrix
layer
random
Prior art date
Application number
PCT/IB2014/063430
Other languages
French (fr)
Other versions
WO2015011688A3 (en)
Inventor
Timothy LILLICRAP
Colin AKERMAN
Douglas Tweed
Daniel COWNDEN
Original Assignee
Isis Innovation Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Isis Innovation Ltd. filed Critical Isis Innovation Ltd.
Priority to US14/907,560 priority Critical patent/US20160162781A1/en
Priority to EP14755417.4A priority patent/EP3025277A2/en
Publication of WO2015011688A2 publication Critical patent/WO2015011688A2/en
Publication of WO2015011688A3 publication Critical patent/WO2015011688A3/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • $\Delta h_{BP} = W^T e$
  • $\Delta h_{FA} = B e$
  • backprop always explicitly and precisely computes the gradient
  • perturbation methods estimate a noisy approximation of the gradient, but this estimate does not improve over the course of training and degrades with larger network sizes.
  • Feedback alignment shapes the forward weights over time so that the random feedback weights deliver increasingly good updates, and does so even as the size of the networks grows.
  • feedback alignment represents a third fundamental approach to tuning parameters in a neural network, distinct from both backprop and perturbation methods.
  • This method has the advantage that the feedback pathway does not need to be constructed with knowledge of the forward connections.
  • training using this method has several other advantages. It can act as a natural regularizer (to help generalization) which is more effective than weight decay (i.e. an L2-norm penalty on the weight magnitudes). It can be combined with recently developed regularizers such as 'dropout' to give additional benefit.
  • the top trace shows performance when only the output weights are trained.
  • Backprop begins to overfit near the end of training, giving worse errors on the test set.
  • Feedback Alignment is just as quick as backprop and consistently reaches a lower error on the test set. In deeper networks with more neurons the same effect holds.
  • the best reported performance on the test set with a feedforward network using L2-norm penalty regularization is 1.6% error.
  • Performance using 'dropout' regularization without additional unsupervised training also gives 1.3% error.
  • an error rate of 1.12% is achieved.
  • neural networks with more than one hidden layer may be desirable as shown in figures 9a and 9b.
  • a neural network 30 is shown with an input layer 31, a first hidden layer 32a, a second hidden layer 32b, and an output layer 33.
  • Connection weights between the input layer 31 and the first hidden layer 32a are given by first connection matrix $W_0$, between the first hidden layer 32a and the second hidden layer 32b by $W_1$, and between the second hidden layer 32b and the output layer 33 by $W_2$.
  • Each layer 32a, 32b has an associated fixed random feedback weight matrix $B_1$, $B_2$, in the example of figure 4 generated in step 21.
  • the range [-a, a] for the elements of each fixed random feedback weight matrix may be different for each matrix.
  • abs() takes the absolute value of each element in a matrix and mean() takes the mean of all the elements in a matrix.
  • An example is shown in figure 10, in which a 784-1000-10 network with nodes having a sigmoidal response function was trained to categorise handwritten digits.
  • the top image shows the initial hidden unit features, the second image shows features learnt using backpropagation and the third image shows features learnt using the method described herein.
  • Such a system may be especially suitable for use in the design of special purpose physical microchips (Very Large Scale Integrated chips - VLSI chips).
  • special purpose physical hardware that is able to compute like a network.
  • Hardware based networks compute faster and can be installed in small devices like cameras or mobile phones. Training these "on-chip" networks has always been difficult with backpropagation or similar learning algorithms because they require precise transport of error signals, and writing circuits that obtain this precision is difficult or impossible.
  • Most approaches to this problem have proposed using reinforcement or 'perturbation' approaches, but these give much slower learning than backprop as the size of the trained network grows.
  • the method described above removes the need for the kind of precision of connectivity required by backprop, making it suitable for training such hardware versions of neural networks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Feedback Control In General (AREA)

Abstract

A method of training a neural network having at least an input layer, an output layer and a hidden layer, and a weight matrix encoding connection weights between two of the layers, the method comprising the steps of (a) providing an input to the input layer, the input having an associated expected output, (b) receiving a generated output at the output layer, (c) generating an error vector from the difference between the generated output and expected output, (d) generating a change matrix, the change matrix being the product of a random weight matrix and the error vector, and (e) modifying the weight matrix in accordance with the change matrix.

Description

Method of Training a Neural Network
[1] The present invention relates to a method of training a neural network, and a system comprising a neural network. The work leading to this invention had received funding from the European Research Council under ERC grant agreement no. 243274.
Background to the Invention
[2] Artificial neural networks are computational systems, based on biological neural networks. Artificial neural networks (hereinafter referred to as 'neural networks') have been used in a wide range of applications where extraction of information or patterns from potentially noisy input data is required. Such applications include character, speech and image recognition, document search, time series analysis, medical image diagnosis and data mining.
[3] Neural networks typically comprise a large number of interconnected nodes. In some classes of neural networks, the nodes are separated into different layers, and the connections between the nodes are characterised by associated weights. Each node has an associated function causing it to generate an output dependent on the signals received on each input connection and the weights of those connections. Neural networks are adaptive, in that the connection weights can be adjusted to change the response of the network to a particular input or class of inputs.
[4] Conventionally, artificial neural networks can be trained by using a training set comprising a set of inputs and corresponding expected outputs. The goal of training is to tune a network's parameters so that it performs well on the training set and, importantly, generalizes to untrained 'test' data. To achieve this, an error signal is generated from the difference between the expected output and the actual output of the network, and a summary of the error called the loss or cost is computed (typically, the sum of squared errors). Then, one of two basic approaches is typically taken to tune the network parameters to reduce the loss: approaches based on either backpropagation of error or perturbation methods.
[5] The first, called back-propagation of error learning (or 'backprop'), computes the precise gradient of the loss with respect to the network weights. This gradient is used as a training signal and is generated from the forward connection weights and error signal and fed back to modify the forward connection weights. Backprop thus requires that error be fed back through the network via a pathway which depends explicitly and intricately on the forward connections. This requirement of a strict match between the forward path and feedback path is problematic for a number of reasons. One issue which arises when training deep networks is the 'vanishing gradient' problem, where the backward path tends to shrink the error gradients and thus makes very small updates to neurons in deeper layers, which prevents effective learning in such deeper networks. In addition, in hardware implementations of neural network learning, this strict connectivity requirement can be extremely difficult to instantiate.
[6] The second approach, called perturbation or reinforcement methods, computes estimates of the gradient of the loss with respect to the network weights. It does this by correlating small changes in the forward connection weights with changes in the loss. Perturbation methods are simple in that they require only the scalar loss signal to be fed back to the network, with no knowledge of the forward connection weights used in the feedback process. In small networks this method can sometimes learn as quickly as backprop. However, the estimate of the gradient becomes worse as the size of the network grows, and does not improve over the course of learning.
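The perturbation approach described in this paragraph can be illustrated with a minimal sketch. It assumes a single linear layer y = Wx, a half sum-of-squares loss, and Gaussian weight perturbations; the function names, the noise scale `sigma` and the sample count are illustrative choices and are not taken from the patent.

```python
import random

def loss(W, x, y_star):
    """Half the sum of squared errors for the linear map y = W x."""
    y = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return 0.5 * sum((t - yi) ** 2 for t, yi in zip(y_star, y))

def perturbation_gradient(W, x, y_star, sigma=1e-3, samples=2000, rng=None):
    """Estimate dL/dW by correlating random weight changes with loss changes."""
    rng = rng or random.Random(0)
    base = loss(W, x, y_star)
    est = [[0.0] * len(W[0]) for _ in W]
    for _ in range(samples):
        # Draw a small random perturbation of every weight.
        noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in W]
        W_pert = [[w + sigma * n for w, n in zip(rw, rn)]
                  for rw, rn in zip(W, noise)]
        # Only the scalar change in loss is fed back.
        scale = (loss(W_pert, x, y_star) - base) / sigma
        for i, rn in enumerate(noise):
            for j, n in enumerate(rn):
                est[i][j] += scale * n / samples
    return est
```

Each sample yields a noisy estimate, and the number of samples needed for a given accuracy grows with the number of weights, which is consistent with the scaling problem noted above.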
Summary of the Invention
[7] According to a first aspect of the invention there is provided a method of training a neural network having at least an input layer, a hidden layer and an output layer, and a plurality of forward weight matrices encoding connection weights between successive pairs of layers, the method comprising the steps of:
(a) providing an input to the input layer, the input having an associated expected output,
(b) receiving a generated output at the output layer,
(c) generating an error vector from the difference between the generated output and expected output,
(d) for at least one pair of the layers, generating a change matrix, the change matrix being the product of a fixed random feedback weight matrix and the error vector, and
(e) modifying the forward weight matrix for the at least one pair of the layers in accordance with the change matrix.
[8] The change matrix may be the cross product of the fixed random feedback weight matrix and the error vector.
[9] The method may comprise an initial step of initialising the neural network with random connection weight values.
[10] The method may comprise an initial step of generating the fixed random feedback weight matrix.
[11] The fixed random feedback weight matrix elements may comprise random values from a uniform distribution over [-a, a] where a is a scalar.
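Generating such a matrix can be sketched as follows. The helper name `make_feedback_matrix` and the optional `seed` argument are hypothetical; the only detail taken from the text is that each element is drawn uniformly from [-a, a].

```python
import random

def make_feedback_matrix(rows, cols, a, seed=None):
    """Return a rows x cols matrix with elements drawn uniformly from [-a, a].

    The matrix is generated once and then held fixed during training.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-a, a) for _ in range(cols)] for _ in range(rows)]
```

For a network with 4 hidden nodes and 2 output nodes, `make_feedback_matrix(4, 2, a=0.5)` would give a matrix that maps a 2-element error vector to a 4-element hidden-layer signal.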
[12] The method may comprise iteratively performing steps (a) to (e) for a plurality of input values.
[13] Step (e) may comprise modifying the forward weight matrix encoding connection weights between the pair of layers comprising the input layer and the hidden layer.
[14] Step (e) may comprise modifying the forward weight matrix encoding connection weights between the pair of layers comprising the hidden layer and the output layer.
[15] The neural network may comprise a plurality of hidden layers, each hidden layer having an associated forward weight matrix and an associated fixed random backward weight matrix, the method comprising the steps of: generating a change matrix for each hidden layer using the associated fixed random weight matrix; and modifying each forward weight matrix in accordance with the respective change matrix.
[16] The hidden layers may comprise a first hidden layer and a second hidden layer, the second hidden layer being deeper than the first hidden layer, wherein the step of generating a change matrix for the second hidden layer comprises calculating a product of the associated random weight matrix and the error vector.
[17] The hidden layers may comprise a first hidden layer and a second hidden layer, the second hidden layer being deeper than the first hidden layer, wherein the step of generating a change matrix for the second hidden layer comprises calculating a product of the fixed random weight matrix associated with the first hidden layer, the random weight matrix associated with the second hidden layer, and the error vector.
[18] The elements of the fixed random weight matrices may comprise random values from a uniform distribution over [-a, a] where a is a scalar and where a is different for each fixed random weight matrix.
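The two multi-layer variants described above (each hidden layer receiving its own matrix times the error vector, or the error being passed through a chain of random matrices) might be sketched as below. This is an illustration under assumptions: the function names are hypothetical, the list ordering (starting from the hidden layer nearest the output) is a convention chosen here, and the matrix shapes are whatever makes each product well defined. Each function returns the per-layer feedback signal; the corresponding change matrix would be the outer product of that signal with the input to the layer.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def direct_feedback(B_list, e):
    """Each hidden layer k receives B_k e, with B_k of shape (n_k x n_out)."""
    return [matvec(B, e) for B in B_list]

def chained_feedback(B_list, e):
    """The error is passed back through successive matrices.

    B_list is ordered from the hidden layer nearest the output; each matrix
    maps the signal of the layer after it to the current layer's size.
    """
    signals, s = [], e
    for B in B_list:
        s = matvec(B, s)
        signals.append(s)
    return signals
```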
[19] According to a second aspect of the invention there is provided a system comprising a neural network where the neural network is trained by a method according to the first aspect of the invention.
Brief Description of the Drawings
[20] An embodiment of the invention is described by way of example only with reference to the accompanying drawings, wherein:
[21] Fig. 1 is a diagrammatic illustration of a neural network,
[22] Fig. 2 is an illustration of a known method of training a neural network,
[23] Fig. 3 is an illustration of a method of training a neural network embodying the present invention,
[24] Fig. 4 is a flow chart showing a method of training a neural network embodying the present invention,
[25] Fig. 5 is a graph showing error as a function of training time for the neural network of figures 2 and 3 using different training methods,
[26] Fig. 6 is a graph showing the angle between updates made by the method of Figure 3 and by backpropagation,
[27] Fig. 7 is a graph similar to Fig. 6 showing the angle between updates made by the method of Figure 3 and by backpropagation for individual neurons in the hidden layer of the network of figure 2,
[28] Fig. 8 is a graph similar to Figure 5 showing error as a function of training time for the neural network of figures 2 and 3 using different training methods trained on a standard dataset,
[29] Fig. 9a is an illustration similar to Fig. 3 showing a further method of training a neural network,
[30] Fig. 9b illustrates a method similar to that of Fig. 9a, and
[31] Fig. 10 shows the results of training a neural network for character recognition using a known method of training neural networks and a method embodying the present invention.
Detailed Description of the Preferred Embodiments
[32] With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
[33] Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of the components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
[34] Referring now to figure 1, a conventional feedforward neural network is shown at 10. The neural network 10 comprises an input layer 11 to receive data having a plurality of nodes 11a, 11b, 11c, a hidden layer 12 having a plurality of nodes 12a, 12b, 12c, 12d and an output layer 13 having a plurality of nodes 13a, 13b. Each of the nodes of input layer 11 is connected to each of the nodes of hidden layer 12, and each of the nodes of hidden layer 12 is connected to each of the nodes of output layer 13. Each of the connections between nodes in successive pairs of layers has an associated weight held in a matrix, and the number of layers and nodes is typically selected or adjusted according to the application the neural network 10 is intended to perform.
[35] A conventional method of training a neural network 10 is that of backpropagation, illustrated with reference to figure 2. Figure 2 illustrates a 3-layer neural network 10. The matrix of connection weights between input layer 11 and hidden layer 12 is given by $W_0$ and the matrix of connection weights between hidden layer 12 and output layer 13 is given by $W$. The output of neural network 10 is given by $y = Wh$, where $h$ is the hidden-unit activity vector, in turn given by $h = W_0 x$, where $x$ is the input to the network 10. In training, the goal is to reduce the squared error, or loss, $L = \frac{1}{2} e^T e$, where the error $e = y^* - y$ and $y^*$ is the expected output. For ease of presentation we develop only a linear network here. The same approach applies for the case where the network is non-linear, so that, e.g., $y = \sigma(Wh)$ and $h = \sigma(W_0 x)$, where $\sigma(\cdot)$ is a non-linear function (e.g. the standard sigmoid, $\sigma(x) = 1/(1 + e^{-x})$, or $\sigma(x) = \tanh(x)$).
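The forward pass and loss defined in this paragraph can be sketched in plain Python. The function names `forward` and `half_squared_error` are hypothetical, matrices are represented as lists of rows, and the factor of one half in the loss is an assumption about the garbled original formula.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def sigmoid(z):
    """Standard sigmoid, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(W0, W, x, act=None):
    """Return (h, y) with h = act(W0 x) and y = act(W h).

    With act=None the network is linear, as in the worked example.
    """
    h = matvec(W0, x)
    if act is not None:
        h = [act(z) for z in h]
    y = matvec(W, h)
    if act is not None:
        y = [act(z) for z in y]
    return h, y

def half_squared_error(y_star, y):
    """Return (L, e) with e = y* - y and L = 0.5 * e^T e."""
    e = [t - yi for t, yi in zip(y_star, y)]
    return 0.5 * sum(ei * ei for ei in e), e
```

Passing `act=sigmoid` or `act=math.tanh` gives the non-linear case mentioned at the end of the paragraph.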
[36] In conventional backpropagation training, the backpropagation algorithm sends the loss rapidly toward zero. It exploits the depth of the network by adjusting the hidden-unit weights according to the gradient of the loss. The output weights $W$ are adjusted using the formula
$\Delta W = \partial L / \partial W = -e h^T$.
Similarly, the upstream weights $W_0$ are adjusted using the formula
$\Delta W_0 = \partial L / \partial W_0 = -(W^T e) x^T$.
Accordingly, the method proceeds by computing a modification for the output weights, and then using the product of the transpose of the output weight matrix and the error vector to compute a modification for the upstream weight matrix. Consequently, information about downstream connection weights must be used to calculate the changes to upstream connection weights. The computed change matrices are then applied to update the parameters via $W^{t+1} = W^t - \eta \Delta W$ and $W_0^{t+1} = W_0^t - \eta \Delta W_0$, where $t$ is the time step and $\eta$ is a scalar learning rate less than 1.
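As a worked check, the backpropagation gradients for the linear network follow from the chain rule. The derivation below is a sketch that uses only the definitions of the output, hidden activity and error given earlier, and it assumes the loss carries the conventional factor of one half.

```latex
\begin{align*}
L &= \tfrac{1}{2} e^T e, \qquad e = y^* - y, \qquad y = W h, \qquad h = W_0 x \\
\frac{\partial L}{\partial y} &= -e \\
\frac{\partial L}{\partial W} &= \frac{\partial L}{\partial y}\, h^T = -e\, h^T \\
\frac{\partial L}{\partial h} &= W^T \frac{\partial L}{\partial y} = -W^T e \\
\frac{\partial L}{\partial W_0} &= \frac{\partial L}{\partial h}\, x^T = -(W^T e)\, x^T
\end{align*}
```

The factor $W^T$ in the last line is the dependence on downstream connection weights that this paragraph refers to.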
[37] A method embodying the invention is illustrated in figures 3 and 4. The output weights $W$ are adjusted as described above with reference to figure 2. However, the upstream weights $W_0$ are adjusted in accordance with the formula
$\Delta W_0 = (Be) x^T$
where $B$ is a matrix of fixed random weights. $B$ must have the same dimensions as $W^T$, but $B$ does not contain any information about the forward connection weights, and may be generated in any appropriate way. In the examples described herein, the elements of $B$ comprise random values from a uniform distribution over $[-a, a]$, although any other suitable distribution may be used as appropriate, for example a Gaussian distribution. The method is described herein as 'feedback alignment'.
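To make the difference between the two feedback pathways concrete, the sketch below computes the hidden-layer signal sent back by backpropagation (the transpose of the output weights times the error) and by feedback alignment (the fixed random matrix times the error), together with the angle between them. All function names are hypothetical, and the small matrices in the usage note are arbitrary illustrative values rather than data from the patent.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def outer(u, v):
    """Outer product u v^T, used to turn a hidden signal into a change matrix."""
    return [[ui * vj for vj in v] for ui in u]

def hidden_signal_backprop(W, e):
    """Backpropagation pathway: W^T e, which depends on the forward weights."""
    return matvec(transpose(W), e)

def hidden_signal_feedback_alignment(B, e):
    """Feedback alignment pathway: B e, with B fixed and random."""
    return matvec(B, e)

def angle_degrees(u, v):
    """Angle between two non-zero vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
```

For example, with `W = [[1, 2, 3], [4, 5, 6]]`, `B = [[1, 0], [0, 1], [1, 1]]` and `e = [1, -1]`, the two signals are `[-3, -3, -3]` and `[1, -1, 0]`, which are orthogonal; `outer(hidden_signal_feedback_alignment(B, e), x)` then gives the change matrix for an input `x`. Note that `B` has the shape of the transpose of `W`, so that the product with the error vector has one element per hidden node.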
[38] A method of implementing the invention is illustrated in flow diagram 20 in figure 4. At step 21, a neural network is initialised, for example by randomly selecting connection weights over the uniform interval [-0.01, 0.01]. A random weight matrix $B$ is generated by randomly selecting element values over a suitable distribution. At step 22, an input having a corresponding expected output is supplied to the network, and at step 23 an output is received from the network. At step 24 an error vector is calculated from the difference between the expected output and the received output, and at step 25 a change matrix is calculated from the product of the error vector and the random weight matrix. At step 26 the connection weights of a weight matrix in the network are modified, for example by adding the change matrix and the weight matrix. At step 27, the network is tested to check whether the training is complete, for example when an error value is below a suitable threshold. If not, steps 22 to 26 are repeatedly performed for a plurality of inputs and corresponding expected outputs until step 27 is passed.
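The steps of the flow diagram can be sketched as a training loop. The sketch below makes several assumptions that are not in the description: the target is a random linear map `T`, inputs are standard normal vectors, the learning rate, the scale `a` and the stopping threshold are arbitrary, and the change matrices are signed so that they are subtracted from the weights.

```python
import numpy as np

def train_feedback_alignment(T, n_hidden=20, eta=0.001, a=0.5,
                             max_steps=3000, tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n_out, n_in = T.shape
    # Step 21: initialise forward weights in [-0.01, 0.01] and generate fixed B.
    W0 = rng.uniform(-0.01, 0.01, size=(n_hidden, n_in))
    W = rng.uniform(-0.01, 0.01, size=(n_out, n_hidden))
    B = rng.uniform(-a, a, size=(n_hidden, n_out))
    losses = []
    for _ in range(max_steps):
        x = rng.standard_normal(n_in)    # step 22: input with expected output
        y_star = T @ x
        h = W0 @ x                       # step 23: output received from network
        y = W @ h
        e = y_star - y                   # step 24: error vector
        dW0 = -np.outer(B @ e, x)        # step 25: change matrix from B and e
        dW = -np.outer(e, h)
        W0 -= eta * dW0                  # step 26: modify the weight matrices
        W -= eta * dW
        losses.append(0.5 * float(e @ e))
        # Step 27: stop once the recent average loss is below the threshold.
        if len(losses) >= 100 and np.mean(losses[-100:]) < tol:
            break
    return W0, W, B, losses

rng = np.random.default_rng(1)
T = rng.uniform(-1.0, 1.0, size=(10, 30))   # hypothetical 30-input, 10-output target
W0, W, B, losses = train_feedback_alignment(T)
```

Comparing the mean of the first and last hundred entries of `losses` gives a rough indication of whether training has made progress.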
[39] In the example of a 3-layer neural network as illustrated above, at step 26 the upstream weight matrix is modified in accordance with the change matrix as described, and the output weight matrix may be modified in accordance with conventional backpropagation methods or using feedback alignment, or indeed vice versa.
[40] In an example, a 30-20-10 neural network was trained to approximate a linear function. The error is plotted against number of training examples in the graph of figure 5. In figure 5, the upper line shows the results of adjusting the output weights $W$ only. The next line illustrates a fast perturbation method (node perturbation). The lower two lines show conventional backpropagation training and training with a random matrix as described above, and it is clear that training the network with backpropagation and training it with a method embodying the invention are equally effective.
[41] It has been unexpectedly found that using this much simpler formula enables a neural network to be trained at least as quickly as using backpropagation. This is unexpected because it is clear that feedback via $B$ will not, at least at first, follow the gradient of the loss. Rather, as is shown in Figure 6, the updates delivered to the hidden layer improve over time via implicit, self-organizing network dynamics. Figure 6 compares the updates made by backprop and feedback alignment. Initially, feedback alignment takes steps which are approximately orthogonal (i.e. 90 degrees) to those prescribed by backprop, but over time feedback alignment makes changes which are more similar to backprop (the trace corresponds to the feedback alignment learning in Fig. 4). The trace plots the angle between the update sent to the hidden units by backprop, i.e. $\Delta h_{BP} = W^T e$, and that sent by feedback alignment, i.e. $\Delta h_{FA} = Be$. In contrast, backprop always explicitly and precisely computes the gradient, and perturbation methods estimate a noisy approximation of the gradient, but this estimate does not improve over the course of training and degrades with larger network sizes. Feedback alignment shapes the forward weights over time so that the random feedback weights deliver increasingly good updates, and does so even as the size of the networks grows. Thus, feedback alignment represents a third fundamental approach to tuning parameters in a neural network, distinct from both backprop and perturbation methods.
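The angle plotted in Figure 6 can be computed as sketched below. The helper names `angle_deg` and `update_angle` and the example matrices are assumptions introduced for illustration.

```python
import numpy as np

def angle_deg(u, v):
    """Angle in degrees between two non-zero vectors."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def update_angle(W, B, e):
    """Angle between the backprop signal W^T e and the feedback alignment signal B e."""
    return angle_deg(W.T @ e, B @ e)

# Hypothetical 2-output, 3-hidden-unit example.
W = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
e = np.array([0.5, -1.5])
same = update_angle(W, W.T, e)        # B equal to W^T: angle is approximately 0
opposite = update_angle(W, -W.T, e)   # B equal to -W^T: angle is approximately 180
```

Logging `update_angle(W, B, e)` at each training step would give a trace of the kind described, starting near 90 degrees for a freshly generated random `B`.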
[42] The method is believed to be effective for the following reasons. Any feedback matrix $B$ will be effective, as long as, on average, $e^T W B e > 0$. Geometrically this means that the teaching signal sent by the random matrix, $Be$, is within 90° of the signal used in backpropagation, $W^T e$, such that the random matrix is pushing the network in roughly the same direction as conventional backpropagation. Initially, updates to $W_0$ are not effective but quickly improve by an implicit feedback process which alters the relationship between $W$ and $B$ such that $e^T W B e > 0$ holds. Over the training process, the direction of changes due to the backpropagation process and the present method converge, suggesting that $B$ begins to act like $W^T$. As $B$ is fixed, the direction is driven by changes in $W$, suggesting that random feedback weights transmit back useful teaching signals to layers deep in a network.
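The geometric reading of the condition follows from writing the scalar $e^T W B e$ as an inner product of the two teaching signals. The short derivation below assumes both $W^T e$ and $Be$ are non-zero and uses $\theta$ (a symbol introduced here) for the angle between them.

```latex
\[
\begin{aligned}
e^T W B e &= (W^T e)^T (B e) \\
          &= \lVert W^T e \rVert \, \lVert B e \rVert \cos\theta ,
\end{aligned}
\qquad\text{so}\qquad
e^T W B e > 0 \iff \cos\theta > 0 \iff \theta < 90^\circ .
\]
```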
[43] This method has the advantage that the feedback pathway does not need to be constructed with knowledge of the forward connections. In addition, training using this method has several other advantages. It can act as a natural regularizer (to help generalization) which is more effective than weight decay (i.e. an L2-norm penalty on the weight magnitudes). It can be combined with recently developed regularizers such as 'dropout' to give additional benefit.
[44] The regularization effect is thought to come from the fact that the forward weights in a network trained with feedback alignment are shaped simultaneously by two requirements: they are required to reduce the loss, but are also encouraged to 'align' with the random backward matrices. This 'alignment' process is shown in Figure 7 for 20 randomly selected hidden neurons. Figure 7 demonstrates the 'alignment' process which is unexpected and key to the feedback alignment method. Each trace corresponds to a single neuron in the hidden layer of a 3-layer network and shows the angle between the forward weights vector and fixed backward weights vector for that neuron. For most of the neurons, this angle quickly drops and stays well below 90 degrees. Thus learning dynamics implicitly instruct the forward weights to 'align' with the backward weights, which are fixed. The angle between the forward weights vector and the fixed random backward weights vector for each neuron tends to decrease over time. In this way feedback alignment places a soft constraint on the forward weight parameters which keeps them from overfitting on training data. This improves generalization performance. Figure 8 shows a straightforward example of this generalization effect, for a simple 3-layer network with 1000 hidden neurons trained on the MNIST dataset. The graph demonstrates that feedback alignment provides better regularization than standard L2-norm weight decay. A network with a single hidden layer trained with feedback alignment on the MNIST handwriting dataset continues to improve on the training set, reaching an error rate of 2.1%. The same network trained with backprop using L2 weight decay does not and plateaus at an error rate of 2.4%. For comparison, the top trace shows performance when only the output weights are trained. Backprop begins to overfit near the end of training, giving worse errors on the test set. Feedback alignment is just as quick as backprop and consistently reaches a lower error on the test set. In deeper networks with more neurons the same effect holds. On the unenhanced, permutation-invariant version of the MNIST data set, the best reported performance on the test set with a feedforward network using L2-norm penalty regularization is 1.6% error. In this example using feedback alignment an error of 1.3% is consistently achieved. Performance using 'dropout' regularization without additional unsupervised training also gives 1.3% error. By combining feedback alignment with dropout, an error rate of 1.12% is achieved.
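The per-neuron alignment angle of Figure 7 might be computed as in the sketch below. It assumes that the "forward weights vector" of hidden neuron j is the vector of its outgoing weights (column j of $W$) and that its "backward weights vector" is row j of $B$; this reading, and the function name `neuron_alignment_angles`, are assumptions of the sketch.

```python
import numpy as np

def neuron_alignment_angles(W, B):
    """Angle in degrees, per hidden neuron, between forward and backward weight vectors.

    W has shape (n_out, n_hidden); B has shape (n_hidden, n_out).
    """
    fwd = W.T    # row j: weights from hidden neuron j to each output unit
    bwd = B      # row j: feedback weights from each output unit to neuron j
    cos = np.sum(fwd * bwd, axis=1) / (
        np.linalg.norm(fwd, axis=1) * np.linalg.norm(bwd, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

rng = np.random.default_rng(0)
B = rng.uniform(-0.5, 0.5, size=(6, 4))          # hypothetical 6 hidden, 4 output units
aligned = neuron_alignment_angles(B.T, B)        # fully aligned: approximately 0
anti_aligned = neuron_alignment_angles(-B.T, B)  # fully anti-aligned: approximately 180
```

Evaluating this function periodically during training would give one trace per hidden neuron.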
[45] Because the feedback path is not tied to the forward connection weights, it is simple to avoid the so-called 'vanishing gradient' problem in deeper networks, at a much lower computational load than is required with the second-order approaches (e.g. Hessian-Free methods or L-BFGS) which are sometimes used to overcome this issue. Since the feedback pathway for feedback alignment is decoupled from the forward pathway, it is possible to pick the scale of the forward and backward weights separately. Small weights, which are the preferred way to initialize a network, can be used for the forward weights, while the scale of the backward weights may be chosen to ensure that errors flow to the deepest layer without 'vanishing'. In this fashion, we have successfully trained networks with more than 10 layers with feedback alignment even when all of the forward weights are initialized very close to 0. Backprop fails completely to train deep networks with this initialization, since the feedback pathway is tied to the forward pathway and delivers updates to deeper layers which are too small to be usable (this is the 'vanishing gradient' problem). Second-order methods (i.e. those based on Newton's method, e.g. Hessian-Free methods or L-BFGS) are able to overcome the vanishing gradient issue and train networks from this initialization, but these require a great deal more computation than feedback alignment.
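The effect of decoupling the two scales can be illustrated numerically for a linear network. In the sketch below the layer width, the depth, the forward scale of 1e-3 and the backward scale `a = sqrt(3 / n)` (chosen so that each feedback matrix roughly preserves the norm of the signal) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 20, 10                     # hypothetical width and number of layers
e = rng.standard_normal(n)            # error vector at the output

# Forward weights initialised very close to zero.
Ws = [rng.uniform(-1e-3, 1e-3, size=(n, n)) for _ in range(depth - 1)]
# Backward weights with an independently chosen scale.
a = np.sqrt(3.0 / n)
Bs = [rng.uniform(-a, a, size=(n, n)) for _ in range(depth - 1)]

def propagate(mats, e):
    """Send the error back through the matrices, last layer first."""
    signal = e
    for M in reversed(mats):
        signal = M @ signal
    return signal

bp_signal = propagate([W.T for W in Ws], e)   # backprop: tied to forward weights
fa_signal = propagate(Bs, e)                  # feedback alignment: uses the B matrices
bp_ratio = np.linalg.norm(bp_signal) / np.linalg.norm(e)
fa_ratio = np.linalg.norm(fa_signal) / np.linalg.norm(e)
```

With these settings `bp_ratio` is many orders of magnitude smaller than `fa_ratio`, which stays of order one.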
[46] In some applications, neural networks with more than one hidden layer may be desirable, as shown in figures 9a and 9b. In these figures, a neural network 30 is shown with an input layer 31, a first hidden layer 32a, a second hidden layer 32b, and an output layer 33. Connection weights between the input layer 31 and the first hidden layer 32a are given by first connection matrix $W_0$, between the first hidden layer 32a and the second hidden layer 32b by $W_1$, and between the second hidden layer 32b and the output layer 33 by $W_2$. In conventional backpropagation, errors are transmitted to the deeper layers in a stepwise manner, such that $\Delta h_0 = W_1^T W_2^T e$. In the present case, it has been found that random weight matrices are effective. Each layer 32a, 32b has an associated fixed random feedback weight matrix $B_1$, $B_2$, in the example of figure 4 generated in step 21. The range $[-a, a]$ for the elements of each fixed random feedback weight matrix may be different for each matrix. As illustrated in figure 9a, the change in the hidden layer activity vector can be calculated as $\Delta h_0 = B_1 B_2 e$. In some cases, the errors can be propagated directly to deeper layers, in this example such that $\Delta h_0 = B_1 e$. That is, it is possible to indiscriminately broadcast error vectors. All that is required is for each node to receive a scalar that is a randomly weighted sum of the error vector.
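The two ways of delivering the error to the first hidden layer can be sketched as follows. The layer sizes and variable names are assumptions; note in particular that the matrix used for direct broadcast must map the error vector straight to the first hidden layer, so it has a different shape from the $B_1$ used in the chained case.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h0, n_h1, n_out = 5, 4, 3      # hypothetical sizes: first hidden, second hidden, output
e = rng.standard_normal(n_out)   # error vector at the output layer

# Chained feedback: B2 maps the error to the second hidden layer, B1 maps on to the first.
B2 = rng.uniform(-0.5, 0.5, size=(n_h1, n_out))
B1 = rng.uniform(-0.5, 0.5, size=(n_h0, n_h1))
dh0_chained = B1 @ (B2 @ e)

# Direct broadcast: a single random matrix maps the error straight to the first hidden layer.
B1_direct = rng.uniform(-0.5, 0.5, size=(n_h0, n_out))
dh0_direct = B1_direct @ e
```

In the direct case each hidden node j simply receives `B1_direct[j] @ e`, a randomly weighted sum of the error vector.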
[47] In networks with 1 or 2 hidden layers, it is simple to manually select (e.g. by trial and error) a scale for the feedback matrices which produces good learning results. In networks with many hidden layers, it becomes important to choose the scale of the feedback matrices more carefully so that error flows back to the deep layers without becoming too small (i.e. 'vanishing') or becoming too large (i.e. 'exploding'). That is, each $B$ feedback matrix should be drawn from a distribution that keeps the changes for each layer of the network within roughly the same range. One simple way to achieve this is to choose the elements for each $B$ from the same uniform distribution over $[-a, a]$, and then examine the change matrices produced and adjust the scale of each $B$ so that changes made at each layer have roughly the same size. One way to do this is to multiplicatively adjust the elements of each $B_i$. If a network has forward weight matrices $W_i$, with $i \in \{0, 1, \ldots, N\}$, and the corresponding change matrices $\Delta W_i$ have been computed by first doing a forward pass and then a backward pass with the existing feedback matrices, then we update the $B_i$ with $i \in \{1, \ldots, N\}$ in pseudocode as follows:
    for i in {0, 1, ..., N - 1}:
        if (mean(abs(ΔW_i)) > 1.0): B_{i+1} = 0.9 * B_{i+1}
        if (mean(abs(ΔW_i)) < 0.001): B_{i+1} = 1.1 * B_{i+1}
Here abs() takes the absolute value of each element in a matrix and mean() takes the mean of all the elements in a matrix. In practice, we find that this kind of update to the backward matrices only needs to be applied every few thousand learning steps, and that once good ranges for the elements of $B_i$ have been found, it is possible to discontinue this strategy to save computation.
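The rescaling rule in the pseudocode can be written as a small runnable function. In this sketch the list layout is an assumption: `Bs[i]` holds $B_{i+1}$ and `dWs[i]` holds $\Delta W_i$, and the function name `rescale_feedback` and its keyword arguments are introduced here for illustration.

```python
import numpy as np

def rescale_feedback(Bs, dWs, upper=1.0, lower=0.001, shrink=0.9, grow=1.1):
    """Return rescaled feedback matrices; Bs[i] is B_{i+1}, dWs[i] is the change for W_i.

    zip() stops at the shorter list, so the change matrix for the last
    forward matrix W_N, which has no corresponding B, is ignored.
    """
    new_Bs = []
    for B_next, dW in zip(Bs, dWs):
        size = np.mean(np.abs(dW))      # mean(abs(dW_i)) over all elements
        if size > upper:
            B_next = shrink * B_next    # changes too large: shrink B_{i+1}
        elif size < lower:
            B_next = grow * B_next      # changes too small: grow B_{i+1}
        new_Bs.append(B_next)
    return new_Bs

# Hypothetical example with N = 3: three feedback matrices, four change matrices.
Bs = [np.ones((2, 2)) for _ in range(3)]
dWs = [np.full((2, 2), 2.0), np.full((2, 2), 1e-4),
       np.full((2, 2), 0.5), np.full((2, 2), 5.0)]
new_Bs = rescale_feedback(Bs, dWs)
```

Since the two conditions are mutually exclusive, the `elif` behaves the same as the two separate `if` tests in the pseudocode. The function would be called every few thousand learning steps, as the paragraph suggests, rather than on every update.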
[48] It will be apparent that a system, such as a computer, which has a neural network trained in this manner may have many applications. An example is shown in figure 10, in which a 784-1000-10 network with nodes having a sigmoidal response function was trained to categorise handwritten digits. The top image shows the initial hidden unit features, the second image shows features learnt using backpropagation and the third image shows features learnt using the method described herein.
[49] Such a system may be especially suitable for use in the design of special purpose physical microchips (Very Large Scale Integrated chips - VLSI chips). There is a growing interest in producing special purpose physical hardware that is able to compute like a network. Hardware based networks compute faster and can be installed in small devices like cameras or mobile phones. Training these "on-chip" networks has always been difficult with backpropagation or similar learning algorithms because they require precise transport of error signals and writing circuits that obtain this precision is difficult or impossible. Most approaches to this problem have proposed using reinforcement or 'perturbation' approaches, but these give much slower learning than backprop as the size of the trained network grows. The method described above removes the need for the kind of precision of connectivity required by backprop, making it suitable for training such hardware versions of neural networks.
[50] In the above description, an embodiment is an example or implementation of the invention. The various appearances of "one embodiment", "an embodiment" or "some embodiments" do not necessarily all refer to the same embodiments.
[51] Although various features of the invention may be described in the context of a single embodiment, the features may also be provided separately or in any suitable combination. Conversely, although the invention may be described herein in the context of separate embodiments for clarity, the invention may also be implemented in a single embodiment.
[52] Furthermore, it is to be understood that the invention can be carried out or practiced in various ways and that the invention can be implemented in embodiments other than the ones outlined in the description above.
[53] Meanings of technical and scientific terms used herein are to be understood as they are commonly understood by one of ordinary skill in the art to which the invention belongs, unless otherwise defined.

Claims

1. A method of training a neural network having at least an input layer, a hidden layer and an output layer, and a plurality of forward weight matrices encoding connection weights between successive pairs of layers, the method comprising the steps of:
(a) providing an input to the input layer, the input having an associated expected output,
(b) receiving a generated output at the output layer,
(c) generating an error vector from the difference between the generated output and expected output,
(d) for at least one pair of the layers, generating a change matrix, the change matrix being the product of a fixed random feedback weight matrix and the error vector, and
(e) modifying the forward weight matrix for the at least one pair of the layers in accordance with the change matrix.
2. A method according to claim 1 wherein the change matrix is the cross product of the fixed random feedback weight matrix and the error vector.
3. A method according to claim 1 or claim 2 comprising an initial step of initialising the neural network with random connection weight values.
4. A method according to any one of the preceding claims comprising an initial step of generating the fixed random feedback weight matrix.
5. A method according to claim 4 wherein the fixed random feedback weight matrix elements comprise random values from a uniform distribution over [-a, a] where a is a scalar.
6. A method according to any one of the preceding claims comprising iteratively performing steps (a) to (e) for a plurality of input values.
7. A method according to any one of the preceding claims wherein step (e) comprises modifying the forward weight matrix encoding connection weights between the pair of layers comprising the input layer and the hidden layer.
8. A method according to any one of the preceding claims wherein step (e) comprises modifying the forward weight matrix encoding connection weights between the pair of layers comprising the hidden layer and the output layer.
9. A method according to any one of the preceding claims wherein the neural network comprises a plurality of hidden layers, each hidden layer having an associated forward weight matrix and an associated fixed random backward weight matrix, the method comprising the steps of: generating a change matrix for each hidden layer using the associated fixed random weight matrix; and modifying each forward weight matrix in accordance with the respective change matrix.
10. A method according to claim 9 wherein the hidden layers comprise a first hidden layer and a second hidden layer, the second hidden layer being deeper than the first hidden layer, wherein the step of generating a change matrix for the second hidden layer comprises calculating a product of the associated random weight matrix and the error vector.
11. A method according to claim 9 wherein the hidden layers comprise a first hidden layer and a second hidden layer, the second hidden layer being deeper than the first hidden layer, wherein the step of generating a change matrix for the second hidden layer comprises calculating a product of the fixed random weight matrix associated with the first hidden layer, the random weight matrix associated with the second hidden layer, and the error vector.
12. A method according to any one of claims 9 to 11 wherein the elements of the fixed random weight matrices comprise random values from a uniform distribution over [-a, a] where a is a scalar and where a is different for each fixed random weight matrix.
13. A system comprising a neural network where the neural network is trained by a method according to any one of the preceding claims.
PCT/IB2014/063430 2013-07-26 2014-07-25 Method of training a neural network WO2015011688A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/907,560 US20160162781A1 (en) 2013-07-26 2014-07-25 Method of training a neural network
EP14755417.4A EP3025277A2 (en) 2013-07-26 2014-07-25 Method of training a neural network

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201361858928P 2013-07-26 2013-07-26
US61/858,928 2013-07-26
GB1402736.1 2014-02-17
GBGB1402736.1A GB201402736D0 (en) 2013-07-26 2014-02-17 Method of training a neural network

Publications (2)

Publication Number Publication Date
WO2015011688A2 true WO2015011688A2 (en) 2015-01-29
WO2015011688A3 WO2015011688A3 (en) 2015-05-14

Family

ID=50440261

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2014/063430 WO2015011688A2 (en) 2013-07-26 2014-07-25 Method of training a neural network

Country Status (4)

Country Link
US (1) US20160162781A1 (en)
EP (1) EP3025277A2 (en)
GB (1) GB201402736D0 (en)
WO (1) WO2015011688A2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203625A (en) * 2016-06-29 2016-12-07 中国电子科技集团公司第二十八研究所 A kind of deep-neural-network training method based on multiple pre-training
WO2017201506A1 (en) * 2016-05-20 2017-11-23 Google Llc Training neural networks using synthetic gradients
WO2018217829A1 (en) * 2017-05-23 2018-11-29 Intel Corporation Methods and apparatus for enhancing a neural network using binary tensor and scale factor pairs
CN110175630A (en) * 2015-05-07 2019-08-27 西门子保健有限责任公司 The method and system for going deep into neural network for approximation to detect for anatomical object
CN110197256A (en) * 2019-04-30 2019-09-03 济南大学 A kind of Professional Certification weight optimization method and system based on neural network
WO2019178702A1 (en) * 2018-03-23 2019-09-26 The Governing Council Of The University Of Toronto Systems and methods for polygon object annotation and a method of training an object annotation system
CN110309918A (en) * 2019-07-05 2019-10-08 北京中科寒武纪科技有限公司 Verification method, device and the computer equipment of Neural Network Online model
US10546238B2 (en) 2017-07-05 2020-01-28 International Business Machines Corporation Pre-training of neural network by parameter decomposition
US10776698B2 (en) * 2015-07-31 2020-09-15 Canon Kabushiki Kaisha Method for training an artificial neural network
WO2021044244A1 (en) * 2019-09-03 2021-03-11 International Business Machines Corporation Machine learning hardware having reduced precision parameter components for efficient parameter update
US11004216B2 (en) 2019-04-24 2021-05-11 The Boeing Company Machine learning based object range detection

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4202782A1 (en) 2015-11-09 2023-06-28 Google LLC Training neural networks represented as computational graphs
JP6610278B2 (en) * 2016-01-18 2019-11-27 富士通株式会社 Machine learning apparatus, machine learning method, and machine learning program
US11468290B2 (en) * 2016-06-30 2022-10-11 Canon Kabushiki Kaisha Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
US20180039884A1 (en) * 2016-08-03 2018-02-08 Barnaby Dalton Systems, methods and devices for neural network communications
US10810482B2 (en) 2016-08-30 2020-10-20 Samsung Electronics Co., Ltd System and method for residual long short term memories (LSTM) network
US10685285B2 (en) * 2016-11-23 2020-06-16 Microsoft Technology Licensing, Llc Mirror deep neural networks that regularize to linear networks
US10546242B2 (en) 2017-03-03 2020-01-28 General Electric Company Image analysis neural network systems
CN107122195B (en) * 2017-05-08 2023-08-11 云南大学 Subjective and objective fusion software nonfunctional demand evaluation method
US10642835B2 (en) 2017-08-14 2020-05-05 Sisense Ltd. System and method for increasing accuracy of approximating query results using neural networks
US11256985B2 (en) 2017-08-14 2022-02-22 Sisense Ltd. System and method for generating training sets for neural networks
US11216437B2 (en) 2017-08-14 2022-01-04 Sisense Ltd. System and method for representing query elements in an artificial neural network
WO2019039758A1 (en) * 2017-08-25 2019-02-28 주식회사 수아랩 Method for generating and learning improved neural network
US10257072B1 (en) 2017-09-28 2019-04-09 Cisco Technology, Inc. Weight initialization for random neural network reinforcement learning
JP6568175B2 (en) * 2017-10-20 2019-08-28 ヤフー株式会社 Learning device, generation device, classification device, learning method, learning program, and operation program
US11461628B2 (en) * 2017-11-03 2022-10-04 Samsung Electronics Co., Ltd. Method for optimizing neural networks
US10198928B1 (en) 2017-12-29 2019-02-05 Medhab, Llc. Fall detection system
WO2019210294A1 (en) * 2018-04-27 2019-10-31 Carnegie Mellon University Perturbative neural network
CN114912604A (en) * 2018-05-15 2022-08-16 轻物质公司 Photonic computing system and method for optically performing matrix-vector multiplication
WO2019222150A1 (en) * 2018-05-15 2019-11-21 Lightmatter, Inc. Algorithms for training neural networks with photonic hardware accelerators
TW202032187A (en) 2018-06-04 2020-09-01 美商萊特美特股份有限公司 Real-number photonic encoding
EP3632840B1 (en) 2018-10-05 2023-06-28 IMEC vzw Arrangement for use in a magnonic matrix-vector-multiplier
TW202111467A (en) 2019-02-25 2021-03-16 美商萊特美特股份有限公司 Path-number-balanced universal photonic network
US10803259B2 (en) 2019-02-26 2020-10-13 Lightmatter, Inc. Hybrid analog-digital matrix processors
KR20220039775A (en) 2019-07-29 2022-03-29 라이트매터, 인크. Systems and Methods for Analog Computation Using a Linear Photonic Processor
WO2021040944A1 (en) 2019-08-26 2021-03-04 D5Ai Llc Deep learning with judgment
US11922316B2 (en) 2019-10-15 2024-03-05 Lg Electronics Inc. Training a neural network using periodic sampling over model weights
US11093215B2 (en) 2019-11-22 2021-08-17 Lightmatter, Inc. Linear photonic processors and related methods
CN111461229B (en) * 2020-04-01 2023-10-31 北京工业大学 Deep neural network optimization and image classification method based on target transfer and line search
EP4186005A1 (en) 2020-07-24 2023-05-31 Lightmatter, Inc. Systems and methods for utilizing photonic degrees of freedom in a photonic processor
US20230021835A1 (en) * 2021-07-26 2023-01-26 Qualcomm Incorporated Signaling for additional training of neural networks for multiple channel conditions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175630B (en) * 2015-05-07 2023-09-01 西门子保健有限责任公司 Method and system for approximating deep neural networks for anatomical object detection
CN110175630A (en) * 2015-05-07 2019-08-27 西门子保健有限责任公司 The method and system for going deep into neural network for approximation to detect for anatomical object
US10776698B2 (en) * 2015-07-31 2020-09-15 Canon Kabushiki Kaisha Method for training an artificial neural network
WO2017201506A1 (en) * 2016-05-20 2017-11-23 Google Llc Training neural networks using synthetic gradients
US11715009B2 (en) 2016-05-20 2023-08-01 Deepmind Technologies Limited Training neural networks using synthetic gradients
CN106203625B (en) * 2016-06-29 2019-08-02 中国电子科技集团公司第二十八研究所 A kind of deep-neural-network training method based on multiple pre-training
CN106203625A (en) * 2016-06-29 2016-12-07 中国电子科技集团公司第二十八研究所 A kind of deep-neural-network training method based on multiple pre-training
WO2018217829A1 (en) * 2017-05-23 2018-11-29 Intel Corporation Methods and apparatus for enhancing a neural network using binary tensor and scale factor pairs
US11640526B2 (en) 2017-05-23 2023-05-02 Intel Corporation Methods and apparatus for enhancing a neural network using binary tensor and scale factor pairs
US10546238B2 (en) 2017-07-05 2020-01-28 International Business Machines Corporation Pre-training of neural network by parameter decomposition
US11106974B2 (en) 2017-07-05 2021-08-31 International Business Machines Corporation Pre-training of neural network by parameter decomposition
WO2019178702A1 (en) * 2018-03-23 2019-09-26 The Governing Council Of The University Of Toronto Systems and methods for polygon object annotation and a method of training an object annotation system
CN111902825A (en) * 2018-03-23 2020-11-06 多伦多大学管理委员会 Polygonal object labeling system and method for training object labeling system
US11004216B2 (en) 2019-04-24 2021-05-11 The Boeing Company Machine learning based object range detection
CN110197256A (en) * 2019-04-30 2019-09-03 济南大学 A kind of Professional Certification weight optimization method and system based on neural network
CN110197256B (en) * 2019-04-30 2022-10-11 济南大学 Professional authentication weight optimization method and system based on neural network
CN110309918B (en) * 2019-07-05 2020-12-18 安徽寒武纪信息科技有限公司 Neural network online model verification method and device and computer equipment
CN110309918A (en) * 2019-07-05 2019-10-08 北京中科寒武纪科技有限公司 Verification method, device and the computer equipment of Neural Network Online model
GB2600871A (en) * 2019-09-03 2022-05-11 Ibm Machine learning hardware having reduced precision parameter components for efficient parameter update
WO2021044244A1 (en) * 2019-09-03 2021-03-11 International Business Machines Corporation Machine learning hardware having reduced precision parameter components for efficient parameter update

Also Published As

Publication number Publication date
US20160162781A1 (en) 2016-06-09
WO2015011688A3 (en) 2015-05-14
GB201402736D0 (en) 2014-04-02
EP3025277A2 (en) 2016-06-01

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 14907560

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2014755417

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14755417

Country of ref document: EP

Kind code of ref document: A2