WO1993024898A1 - Method of training a neural network - Google Patents

Method of training a neural network Download PDF

Info

Publication number
WO1993024898A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
pram
output
prams
pattern
Prior art date
Application number
PCT/GB1993/001180
Other languages
French (fr)
Inventor
John Gerald Taylor
Denise Gorse
Trevor Grant Clarkson
Original Assignee
University College London
King's College London
Priority date
Filing date
Publication date
Priority claimed from GB929211780A external-priority patent/GB9211780D0/en
Priority claimed from GB929211910A external-priority patent/GB9211910D0/en
Application filed by University College London and King's College London
Publication of WO1993024898A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • FIG. 7 shows patterns representing the digits 0 to 9 as the patterns to be recognised.
  • Each pattern is an image of 7 x 5 pixels, so that the hidden layer 10a of the pRAM network needs to consist of five 7-pRAMs, as shown in Figure 8.
  • the pRAMs of the hidden layer are connected to an output layer 20a which consists of four 5-pRAMs.
  • Each pixel of the input image is applied to a respective one of the 35 input lines of the hidden layer 10a.
  • the reinforcement mechanism may be tuned in terms of two aspects: (a) the training rate ρ and (b) the reward and penalty signals r and p.
  • the relevant parameters may be adapted in response to a suitable error measure.
  • Reward and penalty signals may be applied in a graded fashion, dependent on the performance of the network.
  • Figure 9 shows the average firing rate of the output nodes with desired output "1" (denoted by A) and the average firing rate of those with desired output "0" (denoted by B).
  • the output firing rate is calculated by accumulating the output spikes of the network for a constant training period (1000 runs for instance). If over 90% of the outputs meet the target for all the training patterns (the training error is taken in this example as being 0.1), the training finishes. If not, the training continues until the firing rates are within the error limit.
  • the vertical axis represents the firing rates of the pRAM network averaged over a period of time and over the ten patterns, and the horizontal axis represents the percentage noise added to the input images.
  • the generalisation ability of the network is shown by Figure 9, where substantial amounts of noise (i.e. divergence from the training patterns) still lead to a correct output. It can be seen that while maintaining the discrimination property, the generalisation ability increases as the training noise increases. Even with 20% added noise, the network maintains a 20% confidence margin when trained with 10% training noise.

Abstract

A method is described of training a pRAM network, in which a pRAM network is repeatedly presented with a pattern to which it is to be trained to respond, to which noise has been added, and the contents of at least some of the storage locations of the pRAM network are altered in dependence on whether the output of the network represents success or failure in responding to the pattern. The noise may be added by applying the pattern via noise-adding pRAMs, or by some other suitable measure, such as the use of a lookup table.

Description

Method of training a neural network
This invention relates to a method of training a neural network, and to a neural network which is trainable by this method. The invention concerns neural networks composed of probabilistic RAMs (pRAMs), such networks being referred to below as pRAM networks. pRAMs are described, for example, in Proceedings of the First IEE International Conference on Artificial Neural Networks, IEE, 1989, No. 313, pp 242-246, and in two International Patent Applications Nos. WO 92/00572 and WO 92/00573. Attention is directed to these documents, which are incorporated herein by reference, for a full description of pRAMs.
In brief, however, a pRAM is an artificial neuron comprising a memory having a plurality of storage locations at each of which a number representing a probability is stored, the memory having at least one address line to define a succession of storage location addresses, and means for causing a succession of output signals to appear at the output of the device when the storage locations are addressed, the probability of an output signal taking a given one of its two possible values (firing or not firing) being determined by the number at the addressed location. As described in the documents identified in the preceding paragraph, the desired probabilistic effect can be achieved by using a comparator connected to receive as an input the contents of each of the successively addressed locations, and a noise generator for inputting to the comparator a succession of random numbers representing noise.
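The addressing-and-comparator behaviour just described can be put in a few lines of Python. This is an illustrative software sketch, not the patent's hardware realisation; the class and method names are invented for this example:

```python
import random

class PRAM:
    """Minimal sketch of a probabilistic RAM (pRAM) neuron.

    Each of the 2**n memory locations stores a firing probability.
    The binary inputs form an address; the stored number at that
    address is compared with a fresh random number (playing the
    role of the noise generator), and the neuron emits a 1 (a
    spike) when the random number falls below the stored value.
    """

    def __init__(self, n_inputs, initial=0.5):
        self.n = n_inputs
        self.memory = [initial] * (2 ** n_inputs)

    def address(self, inputs):
        # Interpret the binary input lines as a memory address.
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | bit
        return addr

    def fire(self, inputs):
        # Comparator: spike with the probability stored at the
        # addressed location.
        return 1 if random.random() < self.memory[self.address(inputs)] else 0
```

Because `random.random()` returns a value in [0, 1), a stored probability of 1.0 fires on every access and 0.0 never fires.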
A pRAM can be provided with means to enable it to learn, and a form of reinforcement training is described in the above mentioned International Applications, and in a paper entitled "Training strategies for probabilistic RAMs" in Proceedings Parallel Processing in Neural Systems and Computers, March 19, 1990, Düsseldorf, FRG, pages 161-164, incorporated herein by reference. In reinforcement training of a pRAM network, the network is presented with the pattern or patterns to which it is desired that the network should learn to respond, and the contents of the pRAM storage locations are then altered in a sense which depends on whether the network has succeeded or failed in responding correctly to the pattern concerned and which is such that the probability of the network correctly responding to a pattern is increased. This process is carried out repeatedly.
However, noise is present in virtually all patterns which a pRAM network might be called on to respond to in real life. It has now been found that a pRAM network can be trained to respond to noisy patterns, by carrying out the training on patterns to which noise has been deliberately added.
Accordingly the present invention provides a method of training a pRAM network in which a pRAM network is presented, normally repeatedly, with a pattern to which it is to be trained to respond, to which noise has been added, and the contents of at least some of the storage locations of the pRAM network are altered in dependence on whether the output of the network represents success or failure in responding to the pattern. The invention further provides a pRAM network which is adapted to be trained by the foregoing method. It should be understood that although in the examples given below the patterns are visual patterns, they need not be, and that the term is used to cover any type of pattern. It should further be understood that although the following description is in terms of recognising patterns, the invention is also applicable where the response which the network is to produce when presented with a pattern does not amount to recognition.
It will be seen that by adding noise during training to the input vectors which represent the patterns on which the pRAM is being trained, for any vector the probability of all nearest-neighbour vectors being generated is greater than zero. This assists in the recovery of information from the network in response to a previously unknown input vector, in that it gives the network the property of generalisation.
Some examples of the invention will now be described with reference to the accompanying drawings, in which:
Figure 1 shows a pRAM network which is trainable by the method of the present invention;
Figure 2 shows two patterns on which the network of Figure 1 may be trained;
Figure 3 is a graph showing the effect of using different amounts of noise in training on the patterns of Figure 2;
Figure 4 is a graph showing the effect of different amounts of noise on the confidence limits obtainable in relation to the patterns of Figure 2;
Figure 5 shows another set, this time of four patterns, on which the network of Figure 1 may be trained;
Figure 6 is a set of four graphs showing results corresponding to those of Figure 3, but for the patterns of Figure 5;
Figure 7 shows yet another set, this time of ten patterns, on which a pRAM network may be trained;
Figure 8 shows a network for dealing with the patterns of Figure 7; and
Figure 9 is a graph showing results corresponding to those of Figure 3, but for the patterns of Figure 7.
Referring first to Figures 1 and 2, a description will now be given of the training of the network shown in Figure 1 on the patterns shown in Figure 2. In this example the patterns used are binary images of 5 x 5 pixels. The black blocks in an image have the pixel value of 1 and the white blocks have the pixel value of 0. In order to reduce the number of connections needed between neurons and to reduce the amount of memory needed, a pyramidal structure is adopted which has two layers of pRAM nodes, a hidden layer 10 and an output layer 20. The network is fully interconnected, but it is believed that the invention is also applicable where full connectivity does not apply. As is explained below, an additional layer 30 is present for the purpose of training, but this is not used when the network is used after training to recognise a pattern.
In the network shown in Figure 1, the hidden layer consists of five 5-pRAMs and the output layer consists of two 5-pRAMs. A 5-pRAM is one which has five input lines and therefore 2^5 = 32 memory locations (more generally, an n-pRAM has n input lines and 2^n memory locations, where n ≥ 1). Each pixel in the input image is applied to a respective one of the 25 input lines of the hidden layer. The pixels can be applied in any order to the input lines, but in this example the five pixels in the first row of the input image are taken to the five inputs of the first pRAM in the hidden layer, the five pixels in the second row to the inputs of the second pRAM, and so on.
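The pyramidal arrangement just described (five 5-pRAMs in the hidden layer, two in the output layer, one image row per hidden pRAM) can be sketched as follows. The helper name, data layout, and the 0.5 initialisation are illustrative assumptions for this example, not text from the patent:

```python
import random

def pram_fire(memory, inputs):
    # One pRAM output: spike with the probability stored at the
    # address formed by the binary input lines.
    addr = int("".join(str(b) for b in inputs), 2)
    return 1 if random.random() < memory[addr] else 0

# Pyramid of Figure 1 as described in the text: five 5-pRAMs in
# the hidden layer and two 5-pRAMs in the output layer, each with
# 2**5 = 32 memory locations, initialised at 0.5.
hidden = [[0.5] * 32 for _ in range(5)]
output = [[0.5] * 32 for _ in range(2)]

def forward(image):
    """image: 25 binary pixels in row-major 5 x 5 order."""
    # Row i of the image feeds the five inputs of hidden pRAM i.
    rows = [image[i * 5:(i + 1) * 5] for i in range(5)]
    h = [pram_fire(mem, row) for mem, row in zip(hidden, rows)]
    # Both output pRAMs see the five hidden spikes.
    return tuple(pram_fire(mem, h) for mem in output)  # (A, B)
```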
As already explained, the learning process takes place in the presence of noise. This is introduced by the layer 30, which constitutes a "noise layer" and consists of twenty-five 1-pRAMs, one for each of the 25 input lines of the hidden layer 10. Each of these pRAMs sits between one pixel of the input image and the corresponding input line of the pRAM network, so that the input image has a preset amount of added noise. There are two memory locations in a 1-pRAM, here called L0 and L1; illuminated pixels (with a value of 1) are defined here as addressing location L1, and pixels not illuminated (with a value of 0) as addressing location L0. Firing probabilities p_n and 1-p_n are stored in L0 and L1 respectively, so that a pixel value of 0 will have a probability p_n of firing and a pixel value of 1 a probability p_n of not firing. Statistically this is equivalent to p_n*100 percent of pixels being inverted in the training pattern. It should be noted that the added noise does not have to be the same for the inputs of 0 and 1, so that some bias can be imposed. Thus, it is not essential that the memory pair in the training layer 30 be set to p_n and 1-p_n, though this will normally be so.
While all the training noise in the previous discussion has been assumed to be of a linear distribution, any appropriate distribution for the training noise (resembling the noise distribution in the expected input) could be used, in which case the noise might be added by the use of a lookup table mechanism.
Since the value of p_n can be arbitrarily set in the range from 0 to 1, training with different degrees of noise can be realised. This imposes realistic "white" noise on the pattern, such as will be obtained from real transducers (e.g. a camera).
Training can be carried out using the reinforcement rule:
Δα_u(t) = ρ [ (a − α_u) r + λ (ā − α_u) p ] δ_u,u(t)     (1)

where ā = 1 − a is the complement of the output.
In this equation α_u is the content of the selected pRAM memory location u and a is the state of the pRAM output. The Kronecker delta, δ_u,u(t), reflects the fact that only currently accessed locations are adapted. The values of the reward rate (ρ) and penalty rate (λ) are chosen in this example to be 0.1 and 0.05 respectively. For simplicity, the values are kept fixed during training, though faster convergence could be achieved if they were tuned during training, and this is described below with reference to another example. The value of acceptable training error was set at 0.05. r and p are the reward and penalty factors. The stochastic property of these factors is an important feature in reinforcement training of pRAM networks, because it allows the possibility of "neutral" actions which are neither punished nor rewarded but which may correspond to a useful exploration of the environment.
The above rule is used to provide global reinforcement, and in every pRAM in the network the contents of the memory location which was accessed to give the output concerned is updated. However, alternative training schemes are possible, for example one in which only the memory contents in a particular layer are updated. The training rule of equation (1) may be applied using software in a conventional computer to calculate the required alteration to the memory contents of the pRAMs, the computer then sending the updated memory contents to the pRAMs. Alternatively, the operation may be done in dedicated hardware, and examples are disclosed in detail in the above mentioned International Applications.
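A single application of the reinforcement update to one pRAM might be sketched as below, under the reading that the accessed memory content α_u moves toward the output a under reward and toward its complement under penalty, with ρ and λ the rates given in the text. The clamping to [0, 1] is an added safeguard, not something stated in the source:

```python
def reinforce(memory, addr, a, r, p, rho=0.1, lam=0.05):
    """One step of the reinforcement rule for a single pRAM:

        delta = rho * ((a - alpha)*r + lam*((1 - a) - alpha)*p)

    where alpha is the stored firing probability at the accessed
    location, a is the pRAM's output (0 or 1), and r and p are
    the binary reward and penalty signals.  Only the accessed
    location is updated, which plays the role of the Kronecker
    delta in the rule.
    """
    alpha = memory[addr]
    abar = 1 - a
    delta = rho * ((a - alpha) * r + lam * (abar - alpha) * p)
    # Keep the stored value a valid probability (safeguard).
    memory[addr] = min(1.0, max(0.0, alpha + delta))
    return memory[addr]
```

Under reward (r = 1, p = 0) the stored probability moves toward the action just taken; under penalty it drifts slowly toward the opposite action, the λ factor making penalties gentler than rewards.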
It is observed that training easily converges in this simple kind of pyramidal pRAM network. This enables us to take r = 1 − p, with r, p ∈ {0, 1}.
Another important feature in pRAM network training, which differs from at least most training methods used in non-pRAM networks, lies in the initial weight values of the network. Normally where weights are used to denote the connection strength between neurons, a small real number near to zero is set as the initial value of each weight. In pRAM training, however, the memory contents are firing probabilities rather than deterministic connection weights. The firing probabilities near to 0 possess the same confidence in the pRAM's behaviour as those near to 1. Therefore the initial values of the memory contents of the pRAMs in layers 10 and 20 are set to be 0.5 or 0.5+e, where e is a small fraction which varies randomly from pRAM to pRAM.
Outputs (A B) in Figure 1 are the output pair of the pRAM network. The output (1 0) is defined to be the training target for pattern 0 of Figure 2, and output (0 1) the training target for pattern 1. A reward signal is sent when the output pair meets the target and a penalty signal is sent otherwise. This is carried out for a predetermined training period (1000 iterations for instance). The output firing rate is calculated by accumulating the output spikes of the network over a fixed window at the end of the training period (say the last 100 iterations). If over 95% (since the training error is defined as 0.05) of the outputs in the window meet the target for both the training patterns, the training finishes. If not, the training continues, moving the window of 100 iterations forwards, until the firing rates are within the error limit.
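The sliding-window stopping criterion described above can be sketched as a simple test applied after each presentation. The function and variable names are invented for this reading of the text; each history records, per iteration, whether the corresponding output met its target:

```python
def trained(history_a, history_b, window=100, error=0.05):
    """Windowed stopping test: training is deemed complete once,
    over the last `window` presentations, both outputs have met
    their targets on more than (1 - error) of the iterations,
    e.g. more than 95% for an acceptable training error of 0.05.
    """
    if len(history_a) < window or len(history_b) < window:
        return False
    rate_a = sum(list(history_a)[-window:]) / window
    rate_b = sum(list(history_b)[-window:]) / window
    return rate_a > 1 - error and rate_b > 1 - error
```

In use, the training loop would append a 1 or 0 to each history every iteration and keep presenting (noisy) patterns until `trained(...)` returns True, which matches the moving-window behaviour described.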
Figure 3 shows the results of training the network using different levels of noise and then presenting to it, for recognition, pattern 0 in the presence successively of various levels of noise. For each level of training noise, the data comes from a set of five pattern recognition trials each with a different amount of noise in the pattern to be recognised, and each trial comprises 200 pattern recognition runs. The vertical axis represents the firing rates of the pRAM network averaged over a period of time, and the horizontal axis represents the percentage noise added to the input patterns.
In the first set of pattern recognition trials, the network is trained alternately on two patterns to completion with no training noise (TN) added (TN = 0%). Thereafter it is presented, for recognition, with the pattern 0. The pattern is presented successively with increasing levels of noise (0, 5, 10, 20 and 30%) and the output recorded.
In the next set of pattern recognition trials the network is trained to completion in the presence of 5% noise, and again presented with pattern 0 for recognition with the five different levels of noise. This process is repeated with 10%, 20% and 30% training noise.
The generalisation ability of the network is shown by Figure 3, where substantial amounts of noise (i.e. divergence from the training pattern) still lead to a correct output. An output is correctly discriminated if the associated output pRAM (A or B) fires for more than 50% of the time. It can be seen that while maintaining the discrimination property, the generalisation ability increases as the training noise increases. Even with 45% added noise in the pattern presented for recognition, the network maintains a 20% confidence margin when trained with 30% training noise. The way in which the confidence of the result varies with the level of the training noise and the noise present in the input pattern is shown in Figure 4. This shows that the confidence level decreases with input noise but increases with training noise.
However, training time increases exponentially with training noise, as demonstrated by Table 1 below, which gives the results of an experiment done to determine the number of iterations required to reach an error level of 0.05 for a given level of training noise.
TABLE 1
training noise iterations
0% 200
5% 400
10% 500
20% 800
30% 1500
35% 2750
Thus, it might be appropriate to select a lower value of training noise in applications where fast training is a priority.
The sequence in which the input patterns are applied to the network is an important factor which influences the training results. The training scheme above adopts a regular sequence in which the patterns are applied in the order: pattern 0, pattern 1, pattern 0, pattern 1, and so on. This is referred to as alternate training. Other sequences may be used, such as random training, in which the input patterns are applied in a random order, and sequential training, in which the input patterns are applied one at a time without being interleaved. Since the memory contents are not clamped during this training, a different sequence will generate different results in terms of generalisation, discrimination and training times.
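The three presentation orders described above (alternate, random and sequential) can be sketched as simple index generators; this is illustrative only, and the function names are assumptions:

```python
import random

def alternate_order(n_patterns, cycles):
    """Alternate training: 0, 1, ..., n-1 repeated cycle after cycle."""
    return [i for _ in range(cycles) for i in range(n_patterns)]

def random_order(n_patterns, presentations, rng=random):
    """Random training: each presentation drawn independently at random."""
    return [rng.randrange(n_patterns) for _ in range(presentations)]

def sequential_order(n_patterns, repeats):
    """Sequential training: each pattern shown `repeats` times in a row,
    without interleaving, before moving on to the next pattern."""
    return [i for i in range(n_patterns) for _ in range(repeats)]
```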
During training, the output pair (A, B) is coded as (01) or (10), the desired output for each input pattern. This is unary coding, which reserves one output node to "fire" for each input pattern. Other codings, such as binary coding, can be used in the output representation. The output can be coded as (00), (01), (10) and (11) and trained for binary coding where four input patterns are provided. This has been shown to produce similar results to those given in Figure 3, with each output (A or B) responding positively to two of the four training patterns and negatively to the other two (again in the presence of noise). The advantage of using binary coding is that fewer nodes are needed in the output layer, since N output nodes can be coded as the responses for 2^N input patterns, instead of only N input patterns in unary coding.
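The capacity gain of binary over unary coding can be illustrated with a small helper (a sketch; the function name is an assumption) that derives the binary target code for the k-th input pattern:

```python
def target_code(pattern_index, n_outputs):
    """Binary target for the output layer, most significant bit first:
    with two outputs, pattern 2 has the target (1, 0)."""
    return tuple((pattern_index >> (n_outputs - 1 - i)) & 1
                 for i in range(n_outputs))
```

Two output nodes thus distinguish the four patterns (00), (01), (10) and (11); in general, n output nodes distinguish 2^n patterns.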
An example of the use of binary coding is given with reference to Figures 5 and 6. Here the pRAM network of Figure 1 is trained to recognise the four patterns shown in Figure 5. The output (00) is defined to be the training target for pattern 0, output (01) the training target for pattern 1, and so forth. Training is carried out in the same way as in the example of Figures 2 to 4, except that in this case only two levels of training noise were used, 0% and 30%. The recognition ability of the network was tested by presenting it successively with each of the patterns 0, 1, 2 and 3, each pattern being presented with five different levels of noise, namely 0%, 5%, 10%, 20% and 30%. The results are shown in Figure 6. It can be seen that the discrimination of noisy input patterns is improved as the training noise increases, due to the generalisation behaviour of the pRAM network. Even with 45% added noise, the network maintains a 20% confidence margin when trained with 30% training noise.
Another example of binary coding is given in Figures 7 to 9, where Figure 7 shows patterns representing the digits 0 to 9 as the patterns to be recognised. Each pattern is an image of 7x5 pixels, so that the hidden layer 10a of the pRAM network needs to consist of five 7-pRAMs, as shown in Figure 8. The pRAMs of the hidden layer are connected to an output layer 20a which consists of four 5-pRAMs. Each pixel of the input image is applied to a respective one of the 35 input lines of the hidden layer 10a.
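The pRAM model underlying this topology can be sketched minimally as follows (illustrative only; the class name and interface are assumptions, and the patent's full model includes details omitted here): each pRAM stores one firing probability per input address, and emits a 1 with that probability when addressed.

```python
import random

class PRAM:
    """Minimal probabilistic RAM: one firing probability per address."""
    def __init__(self, n_inputs, init=0.5, rng=random):
        self.mem = [init] * (2 ** n_inputs)  # one probability per address
        self.rng = rng

    def _address(self, inputs):
        addr = 0
        for bit in inputs:  # the binary inputs form the memory address
            addr = (addr << 1) | bit
        return addr

    def fire(self, inputs):
        """Emit 1 with the probability stored at the addressed location."""
        return 1 if self.rng.random() < self.mem[self._address(inputs)] else 0

# The topology of Figure 8: five 7-input pRAMs (35 input lines in all)
# feeding an output layer of four 5-input pRAMs.
hidden_layer = [PRAM(7) for _ in range(5)]
output_layer = [PRAM(5) for _ in range(4)]
```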
There are sixteen possible output codes from the output layer. Of these, ten are chosen to represent the ten digits, in accordance with Table 2 below. They are chosen in such a way that every output node has the same number of "1" (active) and "0" (not active) states. This is to take account of the influence of the output Hamming distance on the network's memory distribution. For a discussion of the concept of Hamming distance as the measure of the similarity between two patterns, attention is directed to I. Aleksander & H. Morton, "An Introduction to Neural Computing", Chapman and Hall, London, 1990. In this example, there are five 1s and five 0s for each node, and this maintains the firing rate of the pRAM network at 50% (which means no knowledge) when the input noise increases to about 50%.
TABLE 2 (reproduced as an image in the original document)
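Balanced code sets of the kind used in Table 2 can be found by brute force (an illustrative sketch, not the patent's own procedure; the function name is an assumption): enumerate subsets of the sixteen 4-bit codes and keep those in which every bit position is active in exactly half of the codes.

```python
from itertools import combinations

def balanced_code_sets(n_bits, n_codes):
    """Yield sets of n_codes distinct n_bits-bit codes in which each bit
    position is 1 in exactly n_codes // 2 of the codes."""
    codes = [tuple((c >> (n_bits - 1 - i)) & 1 for i in range(n_bits))
             for c in range(2 ** n_bits)]
    for subset in combinations(codes, n_codes):
        if all(sum(col) == n_codes // 2 for col in zip(*subset)):
            yield subset

# One set of ten 4-bit codes with five 1s and five 0s per output node:
example = next(balanced_code_sets(4, 10))
```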
To ensure the convergence and fast training of the network, the reinforcement mechanism may be tuned in terms of two aspects: (a) the training rate ρ and (b) the reward and penalty signals r and p.
In both cases the relevant parameters (ρ, r or p) may be adapted in response to a suitable error measure.
(a) During training, the scale of the parameter changes is tuned as the output firing rates change. Since, as can be seen from equation (1), rewarding is provided by the factor ρ and penalising by the factor ρλ, this can be achieved by varying ρ alone in response to the performance of the network.
(b) Reward and penalty signals may be applied in a graded fashion, dependent on the performance of the network. The Hamming distance between the real outputs and the desired outputs may be used as a measure of how far the actual output is from its target. In our example, the maximum Hamming distance in the output is 4 (since there are four output nodes) and the minimum Hamming distance is 0. If the Hamming distance is 4, the network state is said to be too far away from the desired output, so a full punishment is applied (i.e. p = 1). If the Hamming distance is 3, a smaller punishment factor is used (e.g. p = 0.75), and so on. Only if the Hamming distance is 0 is a reward factor applied. Results of simulations show that the training converges more quickly when this graded reinforcement method is used (reduced from above 2000 to about 700 iterations), compared with the reinforcement method described earlier in this application.
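The graded scheme described in (b) can be sketched as follows (illustrative; the function names are assumptions): the penalty factor grows linearly with the output Hamming distance, and a reward is applied only on an exact match.

```python
def hamming(a, b):
    """Number of positions in which two equal-length binary tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def graded_signals(actual, desired):
    """Return (reward factor, penalty factor) for one presentation.
    A distance of 4 out of 4 gives full punishment p = 1, a distance
    of 3 gives p = 0.75, and so on; reward applies only at distance 0."""
    d = hamming(actual, desired)
    if d == 0:
        return 1.0, 0.0
    return 0.0, d / len(desired)
```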
For simplification, Figure 9 shows the average firing rate of the output nodes with desired output "1" (denoted by A) and the average firing rate of those with desired output "0" (denoted by B). The output firing rate is calculated by accumulating the output spikes of the network over a constant training period (1000 runs, for instance). If over 90% of the outputs meet the target for all the training patterns (the training error is taken in this example as being 0.1), the training finishes. If not, the training continues until the firing rates are within the error limit. The vertical axis represents the firing rates of the pRAM network averaged over a period of time and over the ten patterns, and the horizontal axis represents the percentage noise added to the input images.
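The stopping criterion just described can be sketched as follows (illustrative; function names are assumptions): accumulate each output node's spikes over the test period, and stop training when every node's firing rate is within the error limit of its target.

```python
def node_meets_target(spikes, target, error=0.1):
    """spikes: accumulated 0/1 outputs of one node over the test period."""
    rate = sum(spikes) / len(spikes)
    return rate >= 1.0 - error if target == 1 else rate <= error

def training_finished(histories, targets, error=0.1):
    """True when every output node meets its target firing rate."""
    return all(node_meets_target(s, t, error)
               for s, t in zip(histories, targets))
```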
The generalisation ability of the network is shown by Figure 9, where substantial amounts of noise (i.e. divergence from the training patterns) still lead to a correct output. It can be seen that while maintaining the discrimination property, the generalisation ability increases as the training noise increases. Even with 20% added noise, the network maintains a 20% confidence margin when trained with 10% training noise.

CLAIMS:
1. A method of training a network comprising a plurality of pRAMs, each pRAM comprising a memory having a plurality of storage locations at each of which a number representing a probability is stored, the memory having at least one address line to define a succession of storage location addresses, and means for causing to appear at the output of the pRAM, when the storage locations are addressed, a succession of output signals each having a first or a second value, the probability of the output signal having a given one of the first and second values being determined by the number at the addressed location, in which the pRAM network is presented with a pattern to which it is to be trained to respond, to which noise has been added, and the contents of at least some of the storage locations of the pRAM network are altered in dependence on whether the output of the network is that desired in response to the pattern.
2. A method as claimed in claim 1, wherein the said network comprises a plurality of input pRAMs constituting a hidden layer and at least one output pRAM connected thereto to constitute an output layer.
3. A method as claimed in claim 2, wherein the said network comprises a plurality of output pRAMs.
4. A method as claimed in claim 3, wherein the outputs of the output pRAMs are treated as constituting a binary code, and the network is trained to generate a given binary code in response to the input of a given pattern.
5. A method as claimed in claim 4, wherein the number of possible binary codes exceeds the number of patterns on which the network is trained, and wherein the binary codes selected to represent the patterns are so chosen that, considered over all the patterns, each output of the output pRAMs has an equal chance of being 0 or 1.
6. A method as claimed in any one of claims 2 to 5, wherein the address lines of the hidden layer pRAMs each receive a respective signal via an additional noise-adding pRAM.
7. A method as claimed in claim 6, wherein there is a plurality of the said noise-adding pRAMs, each being a 1-pRAM having two storage locations, with each address line of the pRAMs in the hidden layer receiving a signal via a respective one of the 1-pRAMs.
8. A method as claimed in claim 7, wherein the sum of the numbers stored at the two storage locations of each noise-adding 1-pRAM is 1.
9. A method as claimed in any preceding claim, wherein training is carried out, for each pRAM being trained, according to the rule:

Δα_u(t) = ρ((a - α_u)r + λ(ā - α_u)p)·δ_u,x(t)

where α_u is the memory contents of the pRAM, a ∈ {0,1} is the state of the output of the pRAM, ā = 1 - a, ρ and λ are reward and penalty rates respectively, r and p are reward and penalty factors, and δ_u,x(t) is a delta function denoting the fact that only currently accessed locations are adapted.
10. A method as claimed in claim 9, wherein r = 1 - p.
11. A method according to claim 9 or 10, wherein the value of p is varied during training.
12. A method according to claims 9, 10 or 11, wherein the values of r and/or p are varied during training.
13. A method according to any one of claims 9 to 11, wherein, for the or each pattern, the Hamming distance is used as a measure of the extent to which the actual output(s) of the network differ from the desired value(s) , and reinforcement is applied in a way which depends on this measure.
14. A method according to any preceding claim, wherein the initial memory contents of each of the storage locations of the pRAMs to be trained is equal to 0.5.
15. A method according to any one of claims 1 to 13, wherein the initial memory contents of each of the storage locations of the pRAMs to be trained is equal to 0.5 ± e, where e is a small fraction which varies randomly from pRAM to pRAM.
16. A method according to any preceding claim, wherein the network is trained to respond to a plurality of different patterns.
17. A method according to claim 16, in which the different patterns are successively applied to the network in a fixed order, which is repeated.
18. A method according to claim 16, in which the different patterns are applied in a random order.
19. A method according to claim 16, in which the patterns are applied in a fixed order, with each pattern being applied a plurality of times before the next pattern is applied.
20. A trainable network comprising a plurality of pRAMs, each pRAM comprising a memory having a plurality of storage locations at each of which a number representing a probability is stored, the memory having at least one address line to define a succession of storage location addresses, and means for causing to appear at the output of the pRAM, when the storage locations are addressed, a succession of output signals each having a first or a second value, the probability of the output signal having a given one of the first and second values being determined by the number at the addressed location, the pRAM network comprising means for receiving a pattern to which it is to be trained to respond to which noise has been added, and means for altering the contents of at least some of the storage locations of the pRAM network in dependence on whether the output of the network is that desired in response to the pattern.
21. A network as claimed in claim 20, which comprises a plurality of input pRAMs constituting a hidden layer and at least one output pRAM connected thereto to constitute an output layer.
22. A network as claimed in claim 21, which comprises a plurality of output pRAMs.
23. A network as claimed in claim 22, wherein the outputs of the output pRAMs are treated as constituting a binary code, and the network is trained to generate a given binary code in response to the input of a given pattern.
24. A network as claimed in claim 23, wherein the number of possible binary codes exceeds the number of patterns on which the network is trained, and wherein the binary codes selected to represent the patterns are so chosen that, considered over all the patterns, each output of the output pRAMs has an equal chance of being 0 or 1.
25. A network as claimed in any one of claims 21 to 24, wherein the address lines of the hidden layer pRAMs each receive a respective signal via an additional noise-adding pRAM.
26. A network as claimed in claim 25, wherein there is a plurality of the said noise-adding pRAMs, each being a 1-pRAM, having two storage locations, with each address line of the pRAMs in the hidden layer being adapted to receive a signal via a respective one of the 1-pRAMs.
27. A network as claimed in claim 26, wherein the sum of the numbers stored at the two storage locations of each noise-adding 1-pRAM is 1.
28. A network as claimed in any one of claims 20 to 27, wherein training is carried out, for each pRAM being trained, according to the rule:

Δα_u(t) = ρ((a - α_u)r + λ(ā - α_u)p)·δ_u,x(t)

where α_u is the memory contents of the pRAM, a ∈ {0,1} is the state of the output of the pRAM, ā = 1 - a, ρ and λ are reward and penalty rates respectively, r and p are reward and penalty factors, and δ_u,x(t) is a delta function denoting the fact that only currently accessed locations are adapted.
29. A network as claimed in claim 28, wherein r = 1 - p.
30. A network according to claim 28 or 29, comprising means for varying the value of p during training.
31. A network according to claims 28, 29 or 30, wherein the values of r and/or p are varied during training.
32. A network according to any one of claims 28 to 31, wherein, for the or each pattern, the Hamming distance is used as a measure of the extent to which the actual output(s) of the network differ from the desired value(s) , and reinforcement is applied in a way which depends on this measure.
33. A network according to any one of claims 20 to 32, wherein the initial memory contents of each of the storage locations of the pRAMs to be trained is equal to 0.5.
34. A network according to any one of claims 20 to 32, wherein the initial memory contents of each of the storage locations of the pRAMs to be trained is equal to 0.5 ± e, where e is a small fraction which varies randomly from pRAM to pRAM.
35. A network according to any one of claims 20 to 34, wherein the network is trainable to respond to a plurality of different patterns.
36. A network according to claim 35, in which the different patterns are successively received by the network in a fixed order, which is repeated.
37. A network according to claim 35, in which the different patterns are received in a random order.
38. A network according to claim 35, in which the patterns are received in a fixed order, with each pattern being received a plurality of times before the next pattern is applied.
PCT/GB1993/001180 1992-06-04 1993-06-03 Method of training a neural network WO1993024898A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB929211780A GB9211780D0 (en) 1992-06-04 1992-06-04 Method of training a neural network
GB9211780.3 1992-06-04
GB9211910.6 1992-06-05
GB929211910A GB9211910D0 (en) 1992-06-05 1992-06-05 Method of training a neural network

Publications (1)

Publication Number Publication Date
WO1993024898A1 true WO1993024898A1 (en) 1993-12-09

Family

ID=26300990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1993/001180 WO1993024898A1 (en) 1992-06-04 1993-06-03 Method of training a neural network

Country Status (2)

Country Link
AU (1) AU4341093A (en)
WO (1) WO1993024898A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2806186A1 (en) * 2000-03-09 2001-09-14 Didier Henri Michel Louis Cugy Providing an imagination capability in a technical system, uses variation of components of vector representing knowledge, with feedback loop to control variation function

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
IEEE FIRST INTERNATIONAL CONFERENCE ON NEURAL NETWORKS vol. 2, 21 June 1987, SAN DIEGO , USA pages 541 - 548 KAN 'A probabilistic logic neuron network for associative learning' *
IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS vol. 3, 18 November 1991, SINGAPORE pages 1891 - 1897 NG 'A probabilistic RAM controller with local reinforcement learning' *
IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS vol. 22, no. 3, May 1992, NEW YORK US pages 436 - 440 MATSUOKA 'Noise injection into inputs in back-propagation learning' *
IJCNN INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS vol. 3, 7 June 1992, BALTIMORE , USA pages 660 - 665 GUAN 'The application of noisy reward/penalty learning to pyramidal pRAM structures' *
IJCNN-91-SEATTLE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS vol. 2, 8 July 1991, SEATTLE , USA pages 525 - 530 FULCHER 'Autoassociative memory with "inverted pyramid" logic networks' *


Also Published As

Publication number Publication date
AU4341093A (en) 1993-12-30


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BB BG BR BY CA CH CZ DE DK ES FI GB HU JP KP KR KZ LK LU MG MN MW NL NO NZ PL PT RO RU SD SE SK UA US VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA