WO1994024637A1 - Hopfield neural networks - Google Patents

Hopfield neural networks

Info

Publication number
WO1994024637A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
neurons
hopfield
values
operating
Application number
PCT/GB1994/000818
Other languages
French (fr)
Inventor
Michael Anthony Gell
Shara Jahal Amin
Michael Robert Wistow Manning
Sverrir Olafsson
Original Assignee
British Telecommunications Public Limited Company
Priority claimed from GB939308165A external-priority patent/GB9308165D0/en
Application filed by British Telecommunications Public Limited Company filed Critical British Telecommunications Public Limited Company
Priority to CA002160027A priority Critical patent/CA2160027C/en
Priority to NZ263870A priority patent/NZ263870A/en
Priority to DE69420134T priority patent/DE69420134T2/en
Priority to JP6522921A priority patent/JPH08509083A/en
Priority to EP94912645A priority patent/EP0695447B1/en
Priority to KR1019950704543A priority patent/KR960702130A/en
Priority to AU65108/94A priority patent/AU690904B2/en
Publication of WO1994024637A1 publication Critical patent/WO1994024637A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks

Definitions

  • the present invention relates to an improved Hopfield neural network.
  • the improved neural network may be used to control packet switching in a packet switching network.
  • one of the essential features of such a system is the availability of fast packet switches to route the individual packets reliably and rapidly to their addressed destinations.
  • [17] uses a rather similar approach, but based upon the analogy of a gradually decreasing annealing temperature.
  • Cheung [18] and Ghosh [19] disclose other approaches for changing the operation of individual neurons in a defined way as the calculations proceed.
  • Hopfield suggested first of all determining an energy function for the problem to be solved, and then comparing that energy function with what is now called the Hopfield Energy Function to determine the weights and biases for the network.
  • Neural networks have been applied in a variety of circumstances, and these include routing systems and crossbar switches - see for example Fujitsu [20] and Troudet et al [12] . It is an object of the present invention to provide an improved neural network, based upon the Hopfield model, and particularly for use in operating high speed packet switches (although many other applications may be envisaged) . It is a further object to improve on the work of Ali and Nguyen [6] and to provide a neural network which converges more rapidly to a guaranteed, or virtually guaranteed, valid solution.
  • a method of operating a Hopfield network to solve a problem comprising forcing at least some of the neurons for which the correct values are known to take on the respective correct values, and operating the network to complete the solution to the problem.
  • the invention also extends to apparatus for carrying out the method, and it accordingly extends to a Hopfield network adapted to solve a problem the solution to which is partially known, the network including means adapted to force at least some of the neurons for which the correct values are known to take on the respective correct values, and means for operating the network to complete the solution to the problem.
  • the individual elements of the solution matrix will converge to either an upper attractor (which may be 1) or a lower attractor (which may be 0) .
  • the relevant entries may be forced to the corresponding attractor.
  • a method of operating a Hopfield network incorporating neurons having a transfer function with a graded response comprising repeatedly updating the neuron outputs according to an updating rule, characterised in that the transfer function has at least one parameter which changes randomly or pseudorandomly between iterations.
  • the invention also extends to apparatus for carrying out the method, and accordingly also extends to a Hopfield network incorporating neurons having a transfer function with a graded response, the network including means for updating the neuron outputs according to an updating rule, characterised by means for varying a parameter of the transfer function randomly or pseudorandomly between iterations.
  • Changing the updating rule from iteration to iteration introduces noise into the system, and enables the algorithm to avoid at least some of the non-global minima in which it might otherwise become trapped. This may be achieved by randomly or pseudorandomly changing the constant (β) of a sigmoid function.
  • x_ij is an input to the neuron referenced by ij
  • y_ij is an output of the neuron referenced by ij
  • A, B and C are optimisation values, and, further, where
  • x_ij and y_ij are related by a monotonic activation (transfer) function
  • the invention also extends to an apparatus for carrying out the method, and accordingly extends to a Hopfield network adapted to operate using an updating rule
  • x_ij(t) = x_ij(t-1) + Δt[ -a x_ij(t-1) - A ∑_{k≠j} y_ik - B ∑_{k≠i} y_kj + C/2 ]
  • y_ij = f(x_ij) = 1/(1 + e^(-β x_ij))
  • a value of Δt as high as 0.1 or 0.2 or even greater may be used provided that A, B and C are appropriately chosen.
  • C lies within the range 40 to 150, and
  • A and B are at least ten times greater than C (e.g. 20 times or 30 times greater).
  • the preferred application of the present invention is in a telecommunications switch, preferably a packet switch for use in a packet switching system.
  • the practical realisation of the switch will depend very much upon the application, and the limitation of the various technologies involved. VLSI, optics, software and so on could all be exploited.
  • the neural network could be embodied either in hardware or in software. It could also be split between the two (hybridware) .
  • a packet switch will desirably be associated with a separate queue manager, which sits in front of the switch, and provides prioritisation information to the neural network. If an incoming packet is experiencing an unacceptable delay (or alternatively if an incoming packet is designated as having a high priority) , the queue manager may modify the input matrix to the neural network to take account of the desired sequencing.
  • the function of the queue manager could be achieved using conventional, neural, genetic algorithm, fuzzy algorithm or hybrid techniques.
  • where the present invention is used for problem solving other than in telecommunications switches, it may still be desirable to have an input manager, sitting in front of the neural network, and modifying the input matrix according to known constraints on the problem to be solved.
  • the input manager will provide the neural network with prioritisation information where the network is undertaking some sort of sequencing task.
  • Figure 1 is a schematic diagram of a high speed packet switch
  • Figure 2 is a graph showing the Sigmoid function for values of β of 0.08 and 0.16;
  • Figure 4 shows the values of f(x) for the simulations of Figure 3;
  • Figure 6 shows the values of f(x) for the simulations of Figure 5;
  • Figure 8 is a graph corresponding to that of Figure 7 but in which the non-requested neurons are not connected and are consequently excluded from the calculations;
  • Figure 9 shows the relationship of the neural network and the queue manager.
  • the specific embodiment relates particularly to a neural network for operating a high speed packet switch, of the type illustrated in Figure 1.
  • This figure is adapted from a similar drawing in Ali and Nguyen [6] .
  • the purpose of a switch is to ensure that the addressed packets within the system are rapidly routed to their correct destination, along their respective requested pathways.
  • a typical switch has a plurality of inputs and a plurality of outputs, the inputs receiving the packets to be routed and the outputs being connected to the various available pathways.
  • an incoming packet on input 1 may request to be routed via any one of the switch's outputs.
  • packets arriving on any of the other inputs may request to be routed to any output.
  • the switch of the present embodiment is an nxn switch; in other words there are n inputs and n outputs.
  • Each of the n inputs has n separate input queues, one for each of the outputs. Accordingly, an incoming packet on input 1 which requests to be routed to output 1 will be queued in the first queue of input 1. Other packets on that input requesting to be routed to output 2 would be stored on the second queue of input 1, and so on. It will be evident that there are a total of n² input queues.
  • the switch is operated synchronously, and its task is to transfer the packets from the input queues as rapidly as possible, to the requested outputs. Where the requests for transfers are arriving more rapidly than the capacity of the switch, the switch has to choose the packets from the various input queues in such a way as to maximise the throughput. In the present embodiment, it is a neural network which solves this optimisation problem.
  • a Hopfield neural network is used.
  • the general Hopfield energy function is compared with the calculated energy function that will maximise the throughput to give the desired differential equation which is the solution to the switching problem.
  • the differential equation includes undefined constants (optimisation parameters) which Ali and Nguyen determined by simulation.
  • the present invention provides a substantially improved method for determining what the optimisation parameters should be for stable and rapid convergence. We will start by considering the basic Hopfield model, and then go on to consider how this can be used to deal with the particular switching problem under consideration.
  • the neural network used in this present embodiment is the Hopfield model [8,9], which consists of a large number of processing elements (the neural cells) interconnected via neural weights.
  • each neuron can be described by two continuous variables, the neural activity level x_ij and the neural output y_ij. These variables are related by the non-linear processing function f, as follows:
  • f is taken to be some non-linear monotonically increasing function. This function is called the activation (or transfer) function. The exact form of f is not particularly important, and any appropriate non-linear monotonically increasing function could be used. In the preferred embodiment, however, f is taken to be the sigmoid function
  • T_ij,kl is the weight matrix which describes the connection strength between the neurons indexed by (ij) and (kl).
  • I_ij describes the external bias (sometimes referred to as the "external bias current") which is supplied to each neuron.
  • equation (4) must continually decrease, and hence the right hand side of equation (7) must be less than or equal to 0. In general, this would not be the case, because of the second term on the right hand side. But in the large gain limit (in other words as the equation (2) tends towards a step function, and β is large) the derivative df_ij/dx_ij becomes a delta function and therefore the second term on the right hand side tends to zero. This establishes a result discussed by Hopfield [8]. Accordingly, provided that the value of β in equation (2) is appropriately chosen, we can be certain that the system converges and that at equilibrium we have not introduced any inaccuracies by dropping the integral term from equation (4).
  • y_ij is unity if a particular input queue is busy, and is zero if it is idle.
  • the rows of the matrix y represent the input lines, and the columns represent the output lines. Every index pair (ij) defines a connection channel.
  • only one packet can be permitted per channel: in other words, during each time slot at most one packet may be sent to each of the outputs, and at most one packet may be chosen from each of the inputs.
  • the task of the neural network is to take the input matrix y, and operate upon it, repeatedly, to produce an output or configuration matrix which actually sets up the channel connections, that is defines the packets that are to be chosen to achieve maximum throughput within the switching constraints.
  • the output (configuration) matrix can have at most one non-vanishing element in each row and one non-vanishing element in each column. More than a single element in each row, or a single element in each column, would mean that the switching constraints have been violated in that the switch was either trying to pass two packets at once from a single input or to pass two packets at once to a single output.
  • if the input matrix contains more than one non-vanishing element in any row, or more than one non-vanishing element in any column, then there are more requests for connections than the switch can handle during that time slot. Since only one input can be connected to one output at any time, the switching mechanism will have to choose only one request and force the rest to wait.
  • the first of these satisfies the switching constraints but does not maximise the throughput because no packet is chosen from the first input even though there is at least one packet waiting.
  • the second matrix violates the switching constraints in that it attempts to select two packets at once from the first input, and to send two packets at once to the first output.
  • the input matrix y will not be full. In that case, the number of valid solutions will be fewer.
  • the input matrix is as follows:
  • equation (13) which is the desired differential equation for the switching problem.
  • the parameters A, B and C are known as "optimisation parameters", and in previous work [6] these have been determined purely by trial and error. If the optimisation parameters are not chosen carefully, the equation (13) will either not converge at all, or it will converge only slowly. A further possibility is that the equation might converge to a solution which is not "valid" in the sense described above, for example, because it does not maximise the throughput of the switch.
  • equation (13) is used in its iterated form as follows:
  • x_0,ij = -A ∑_{k≠j} f(x_0,ik) - B ∑_{k≠i} f(x_0,kj) + C/2
  • Equation (16), where x_1 denotes the first equilibrium solution. Because we are at equilibrium, we know that the associated y value must be close to zero, and from equation (2) we know that y only tends to zero as x tends to minus infinity. Accordingly, we can rewrite equation (16) as the following inequality
  • This solution may be referred to as the "negative attractor” solution, as it is the equilibrium solution obtained as x tends to minus infinity.
  • a practical point is that A is always taken to be very much larger than C. This is due to the fact that a large proportion of the neurons have to approach the negative attractor while only a small number of them will approach the positive attractor. Taking A much greater than C speeds convergence, and allows us to use a large value of ⁇ t.
  • Equation (18): if a small value of Δt were to be acceptable in equation (13a), then the term βx in equation (2) can be very large. To make the limiting values of y reasonably close to 0 and 1 we choose, approximately, βx greater than about 4. Hence, from equation (18):
  • Figure 3 shows how dx/dt varies with time, for each of the n² neurons
  • Δt has been set to 0.2, as before, but now because A is very much greater than C the system is not unstable at larger values of Δt. Taking A very much greater than C allows us to use large Δt and hence obtain more rapid convergence than was possible in the prior art.
  • the value of β is set at 0.08, and in subsequent iterations the value of β is randomly chosen to lie somewhere within the range 0.08 to 0.16.
  • β is the gain factor which controls the steepness of the sigmoid function, as illustrated in Figure 2. Where β is taken at a value of greater than 0.08 (see equation (21c)), the maximum amount of noise can be taken to be equivalent to the value of β.
  • the connection matrix (12) therefore defines 2n²(n-1) connections. This has to be compared with the maximum possible number of connections that a network of n² neurons can have: that is n⁴ - n².
  • the output matrix should also contain a corresponding null row or column, since otherwise the neural network will have introduced connections where none have been requested.
  • the energy function (11) and the connection matrix (12) do not guarantee this to be the case. This follows from the fact that the energy function (11) does not take on minima if one of the rows or columns have only vanishing entries. To avoid this occurrence, we propose that the null rows and columns should be decoupled from the Hopfield dynamics. We have found in practice that this both improves the convergence of the other neurons, and also permits considerable increases in speed to be achieved.
  • the null rows and columns may be decoupled from the Hopfield dynamics in a number of ways.
  • One simple mechanism would simply be not to include the null rows and columns at all in any of the calculations.
  • An alternative, and preferred, method is to force all of the neurons in a null row or a null column onto the negative attractor. What we do in practice is to decouple the input to the null rows and columns from all the other neurons. However, their constant outputs are still coupled. Hence the forced neurons remain fixed at all times at the negative attractor, but their outputs are still fed into the rest of the calculations, and will affect the time evolution of the unforced neurons. The external biases of all the forced neurons are set to zero.
  • the queue manager will modify the input matrix (9) to take account of the desired sequencing.
  • the queue manager When the queue manager receives a packet from the input line, it will examine the destination address to determine the appropriate destination output line. It will also determine the priority and/or delay of that particular packet. Based upon the information it has read, it will then appropriately update the input matrix (9) , so changing the initial conditions that the neural network has to operate upon. It may also impose attractors, either positive or negative, or in other ways decouple certain neurons from the network according to the requested priorities of the individual packets.
  • One particular function that might be provided by the queue manager would be to adjust the input matrix (9) to take account of the fact that there may be more than one packet which is waiting for the particular connection. If a large number of packets start to build up, all waiting for a particular connection, the queue manager should have some mechanism for effectively increasing the priority of those packets to ensure that the queue does not become unacceptably long.
  • An alternative method of achieving the same result might be to make use of an input request matrix in which each element is not merely 0 or 1, but is an integer representing the number of packets that are awaiting that particular connection. The higher the number waiting for a particular connection, the greater would be the initial value of y, according to equations (1) and (2) , and accordingly the greater likelihood there would be of that particular neuron converging to the value 1.
  • the modified input matrix is then replaced by a neuron matrix using equation (2) (with the non-requested or forced elements remaining at the negative attractor, -2A).
  • the elements of this neuron matrix are the y_ij.
  • the network of the present invention could be embodied either in hardware or in software, or in a combination of the two.
  • Potential hardware technologies would include VLSI, optics and so on.
  • the environment in which the network is to be embedded will of course play a key role in the determination of the medium. For example, if speed is of the essence, a hardware implementation would appear preferable, but of course that has to be offset against the fact that hardware implementations for large switches would be exceedingly complicated.
  • a specification for a hardware (electrical) realisation of a Hopfield neural network has already been published by Brown [5] .
  • the various parameters in the Hopfield model can be related to values of the various electrical components in the circuitry.
  • the present invention will have application in very many different fields, and in particular to any problem in which a Hopfield energy can be calculated, and there is a requirement for an input matrix having at most one non-vanishing element per row and at most one non-vanishing element per column, in other words where the problem is equivalent to the Travelling Salesman Problem.
  • Potential application areas include network and service management (including switching of lines, channels, cards, circuits, networks etc.) ; congestion control; distributed computer systems (including load balancing in microprocessor systems, cards, circuits etc.) and decentralized computer systems; work management systems; financial transaction systems
  • the present work can be extended to the continuous case, where the inputs to the network can take on any value within a given range, rather than being restricted to 0 and 1.
  • One particular way of doing this, as previously described, would be to allow the inputs to be any positive integer, for example an integer corresponding to the number of packets awaiting switching in the respective queue.
  • the inputs could be truly continuous (not just stepped) .
  • the inputs may be multiplied by a spread factor f, prior to the network calculations being started, to vary the range that the input values span. Because the network calculations are non-linear, altering the input range may have profound effects on the operation and speed of convergence of the net.
  • Imposed attractors may be used but, in practice, it has been found that in many cases they do not add very much to the speed of convergence of the net.
  • the continuous case can be considered for performing first order task assignment in a multi-service or heterogeneous environment by maximising the sum of indicators of suitability of tasks for the chosen resources.
  • the network can also be used for higher order task assignment taking into account, amongst others, intertask communication costs.
  • Application areas include, amongst others, network and service management, distributed computer systems, systems for work management, financial transactions, traffic scheduling, reservation, storage, cargo handling, database control, automated production, and general scheduling, control or resource allocation problems.

Abstract

A neural network based on the Hopfield model is used to operate a high speed packet switch, particularly for use in a broad band switching system. The optimisation parameters are established in a novel methodical way, and imposed attractors are used substantially to increase the speed of convergence. To reduce the likelihood of local minima being found, noise is introduced into the system by varying a parameter of the neuron transfer function from iteration to iteration.

Description

HOPFIELD NEURAL NETWORKS
The present invention relates to an improved Hopfield neural network. Specifically, although not exclusively, the improved neural network may be used to control packet switching in a packet switching network.
In a packet switching network, information to be transmitted is digitised and then formed into small addressable packets which are then transmitted over a synchronous network. It is expected that the broad band switching systems of the future will make use of packet switching [1] , and although it is generally agreed that such networks will rely on Asynchronous Transfer Mode (ATM) , many challenges still exist in realising the necessary high speed packet switching technologies [2,3] .
In particular, one of the essential features of such a system is the availability of fast packet switches to route the individual packets reliably and rapidly to their addressed destinations.
It has been proposed by a number of authors [4] to use neural networks for achieving ATM network control. More particularly, it has been proposed to apply the techniques of neural networks to switching and routing schemes; a general review of this field has been given by Brown [5] .
In a recent study, Ali and Nguyen explored the use of a Hopfield network for controlling a high speed packet switch [6]. Use of a Hopfield neural network for control of a switch was first proposed by Marrakchi and Troudet [7]. Although Ali and Nguyen demonstrated that the Hopfield dynamical network could be used to obtain high solution accuracy, long simulation times are still a problem. This puts severe restrictions on the size of the switching system which can be studied. Although the approach used by Ali and Nguyen was an impressive advance, it was to a large extent ad hoc and resulted in proposals for a network which was not only sub-optimal so far as speed of operation was concerned, but which also could not be guaranteed to converge to a valid solution.
Various attempts have been made to improve convergence speed, one technique being to alter the threshold function of the neurons as calculations proceed, to direct operation of the network to the desired result. Such an approach has been used by
Honeywell [11] . An alternative approach, used for example by Foo et al [15] , is to apply constant biases to certain of the neurons to enforce operation precedence relationships. Although both of these approaches may be of assistance in certain circumstances, they do not address the central problem of preventing the calculations from becoming extremely lengthy and unwieldy, and hence slow, for large networks having many active neurons.
A common aim of researchers in the neural network field is to design a robust network which does not easily get trapped in local minima. Various approaches have been used to improve network robustness, including the use of a transfer function which changes with time in a defined way. Ueda et al [16] discloses the use of a time-varying transfer function which gradually becomes sharper as the calculation proceeds. Neelakanta et al
[17] uses a rather similar approach, but based upon the analogy of a gradually decreasing annealing temperature.
Cheung [18] and Ghosh [19] disclose other approaches for changing the operation of individual neurons in a defined way as the calculations proceed.
In order to optimise the type of neural network now referred to by his name, Hopfield suggested first of all determining an energy function for the problem to be solved, and then comparing that energy function with what is now called the Hopfield Energy Function to determine the weights and biases for the network. Such a method is described in, for example, Hopfield [13] and Chu [14] . The present applicants have found, surprisingly, that using an extension of this technique considerably more information can be obtained to assist in optimising the network.
Neural networks have been applied in a variety of circumstances, and these include routing systems and crossbar switches - see for example Fujitsu [20] and Troudet et al [12] . It is an object of the present invention to provide an improved neural network, based upon the Hopfield model, and particularly for use in operating high speed packet switches (although many other applications may be envisaged) . It is a further object to improve on the work of Ali and Nguyen [6] and to provide a neural network which converges more rapidly to a guaranteed, or virtually guaranteed, valid solution.
It is a further object of the present invention to provide an improved general purpose neural network, based upon the Hopfield model, which will have applications in many technical fields.
According to a first aspect of the present invention there is provided a method of operating a Hopfield network to solve a problem, the solution to which is partially known, the method comprising forcing at least some of the neurons for which the correct values are known to take on the respective correct values, and operating the network to complete the solution to the problem.
The invention also extends to apparatus for carrying out the method, and it accordingly extends to a Hopfield network adapted to solve a problem the solution to which is partially known, the network including means adapted to force at least some of the neurons for which the correct values are known to take on the respective correct values, and means for operating the network to complete the solution to the problem. By forcing at least some of the neurons to their known correct values, it has been found that convergence of the calculations is considerably speeded up and the network generally becomes more stable.
Typically, where the network is used to solve switching or queuing problems, the individual elements of the solution matrix will converge to either an upper attractor (which may be 1) or a lower attractor (which may be 0) . Where solutions are already known for the individual elements of the solution matrix (for example because of some constraints on the problem to be solved, or from other external sources) the relevant entries may be forced to the corresponding attractor.
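Purely by way of illustration (the patent text itself contains no program code), such forcing might be sketched as follows in Python; the boolean mask, the known-value array and the attractor values used here are assumptions of this sketch:

```python
import numpy as np

def clamp_known(y, known_mask, known_values, upper=1.0, lower=0.0):
    """Force neurons whose correct values are known onto the corresponding
    attractor (upper for a known 1, lower for a known 0)."""
    y = y.copy()
    y[known_mask] = np.where(known_values[known_mask] > 0.5, upper, lower)
    return y
```

In use, such a clamp would be re-applied after each update of the network so that the forced neurons remain fixed while the remaining neurons evolve.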
According to a second aspect of the present invention there is provided a method of operating a Hopfield network incorporating neurons having a transfer function with a graded response, the method comprising repeatedly updating the neuron outputs according to an updating rule, characterised in that the transfer function has at least one parameter which changes randomly or pseudorandomly between iterations.
The invention also extends to apparatus for carrying out the method, and accordingly also extends to a Hopfield network incorporating neurons having a transfer function with a graded response, the network including means for updating the neuron outputs according to an updating rule, characterised by means for varying a parameter of the transfer function randomly or pseudorandomly between iterations. Changing the updating rule from iteration to iteration introduces noise into the system, and enables the algorithm to avoid at least some of the non-global minima in which it might otherwise become trapped. This may be achieved by randomly or pseudorandomly changing the constant (β) of a sigmoid function.
According to a third aspect of the present invention there is provided a method of operating a Hopfield network using an updating rule
x_ij(t) = x_ij(t-1) + Δt[ -a x_ij(t-1) - A ∑_{k≠j} y_ik - B ∑_{k≠i} y_kj + C/2 ]
(23) or an equation substantially mathematically equivalent thereto, where:
x_ij is an input to the neuron referenced by ij, y_ij is an output of the neuron referenced by ij, A, B and C are optimisation values, and, further, where x_ij and y_ij are related by a monotonic activation (transfer) function
y_ij = f(x_ij) = 1/(1 + e^(-β x_ij))
(1) and C is chosen to be very much smaller than the sum of A and B.
The invention also extends to an apparatus for carrying out the method, and accordingly extends to a Hopfield network adapted to operate using an updating rule
x_ij(t) = x_ij(t-1) + Δt[ -a x_ij(t-1) - A ∑_{k≠j} y_ik - B ∑_{k≠i} y_kj + C/2 ]
(23) or an equation substantially mathematically equivalent thereto, where: x_ij is an input to the neuron referenced by ij, y_ij is an output of the neuron referenced by ij, A, B and C are optimisation values, and, further, where x_ij and y_ij are related by a monotonic activation (transfer) function
y_ij = f(x_ij) = 1/(1 + e^(-β x_ij))
(1) and C is chosen to be very much smaller than the sum of A and B.
It is not impossible that the network may still converge to a correct solution if the above inequality is not satisfied, but it has been found that the performance will not be as good. A value of Δt as high as 0.1 or 0.2 or even greater may be used provided that A, B and C are appropriately chosen. Preferably, C lies within the range 40 to 150, and
A and B (which will often be identical) are at least ten times greater than C (e.g. 20 times or 30 times greater) . The preferred optimisation parameters are A=B=1250 and C=100. This allows use of Δt=0.2 or higher. Previous work has concentrated upon other ranges for
A, B and C. For example, Ali and Nguyen [6] suggest A=B=100 and C=40, which we consider to be sub-optimal. We have found it is impracticable to determine the values of A, B and C solely by numerical simulation, since for any realistic application the number of possibilities to be tried is impossibly high. To get around this problem we have developed a new methodical system for determining more closely what the parameters should be, and we have found, surprisingly, that Ali and Nguyen's values were not ideal. By setting the parameters within our preferred new ranges and selecting an appropriate value of β, we have also managed to improve the stability of the neural network, and this enables us to use a very much larger step size (Δt = 0.2 as opposed to Δt = 10⁻⁴ in Ali and Nguyen [6]).
The preferred application of the present invention is in a telecommunications switch, preferably a packet switch for use in a packet switching system. The practical realisation of the switch will depend very much upon the application, and the limitation of the various technologies involved. VLSI, optics, software and so on could all be exploited.
The neural network could be embodied either in hardware or in software. It could also be split between the two (hybridware) .
In a practical embodiment, a packet switch will desirably be associated with a separate queue manager, which sits in front of the switch, and provides prioritisation information to the neural network. If an incoming packet is experiencing an unacceptable delay (or alternatively if an incoming packet is designated as having a high priority) , the queue manager may modify the input matrix to the neural network to take account of the desired sequencing. The function of the queue manager could be achieved using conventional, neural, genetic algorithm, fuzzy algorithm or hybrid techniques.
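As a minimal sketch of the queue manager's role (the threshold, the boost and the data layout below are assumptions of this illustration, not taken from the patent), requests that have waited too long, or that carry a high priority, can be given a larger initial activity so that the corresponding neurons are more likely to converge to 1:

```python
import numpy as np

def prioritise(requests, waiting_time, max_delay, boost=1.0):
    """Return initial neuron activities in which requests that have waited
    longer than max_delay start from a boosted value, making the matching
    neurons more likely to converge to 1."""
    requests = np.asarray(requests)
    x0 = requests.astype(float)
    x0[(requests > 0) & (np.asarray(waiting_time) > max_delay)] += boost
    return x0
```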
In implementations in which the present invention is used for problem solving other than in telecommunications switches, it may still be desirable to have an input manager, sitting in front of the neural network, and modifying the input matrix according to known constraints on the problem to be solved. In particular, the input manager will provide the neural network with prioritisation information where the network is undertaking some sort of sequencing task.
The invention may be carried into practice in a number of ways and one specific embodiment will now be described, by way of example, illustrating the use of the present invention in the field of high speed packet switching. The description of the preferred embodiment will refer to the accompanying drawings, in which:
Figure 1 is a schematic diagram of a high speed packet switch;
Figure 2 is a graph showing the Sigmoid function for values of β of 0.08 and 0.16;
Figure 3 shows the simulation results for an 8 x 8 input matrix with optimisation parameters set at A=B=1250, C=700 and β=0.08;
Figure 4 shows the values of f(x) for the simulations of Figure 3;
Figure 5 shows the simulation results for an 8 x 8 input matrix with optimisation parameters set at A=B=1250 and C=100;
Figure 6 shows the values of f(x) for the simulations of Figure 5;
Figure 7 shows the simulation results for an 8 x 8 input matrix with optimisation parameters set at A=B=1250 and C=100. The figure shows the variation of y when the non-requested neurons are connected;
Figure 8 is a graph corresponding to that of Figure 7 but in which the non-requested neurons are not connected and are consequently excluded from the calculations; and
Figure 9 shows the relationship of the neural network and the queue manager.
The specific embodiment relates particularly to a neural network for operating a high speed packet switch, of the type illustrated in Figure 1. This figure is adapted from a similar drawing in Ali and Nguyen [6]. The purpose of a switch is to ensure that the addressed packets within the system are rapidly routed to their correct destination, along their respective requested pathways. A typical switch has a plurality of inputs and a plurality of outputs, the inputs receiving the packets to be routed and the outputs being connected to the various available pathways. Thus, an incoming packet on input 1 may request to be routed via any one of the switch's outputs. Similarly, packets arriving on any of the other inputs may request to be routed to any output. As will be seen from Figure 1, the switch of the present embodiment is an n x n switch; in other words there are n inputs and n outputs. Each of the n inputs has n separate input queues, one for each of the outputs. Accordingly, an incoming packet on input 1 which requests to be routed to output 1 will be queued in the first queue of input 1. Other packets on that input requesting to be routed to output 2 would be stored on the second queue of input 1, and so on. It will be evident that there are a total of n² input queues.
The switch is operated synchronously, and its task is to transfer the packets from the input queues as rapidly as possible, to the requested outputs. Where the requests for transfers are arriving more rapidly than the capacity of the switch, the switch has to choose the packets from the various input queues in such a way as to maximise the throughput. In the present embodiment, it is a neural network which solves this optimisation problem.
Following the method of Ali and Nguyen [6], a Hopfield neural network is used. The general Hopfield energy function is compared with the calculated energy function that will maximise the throughput to give the desired differential equation which is the solution to the switching problem. The differential equation includes undefined constants (optimisation parameters) which Ali and Nguyen determined by simulation. As will be described later, the present invention provides a substantially improved method for determining what the optimisation parameters should be for stable and rapid convergence. We will start by considering the basic Hopfield model, and then go on to consider how this can be used to deal with the particular switching problem under consideration.
Hopfield Neural Network - Basic Model
The neural network used in this present embodiment is the Hopfield model [8,9], which consists of a large number of processing elements (the neural cells) interconnected via neural weights. At any moment in time, each neuron can be described by two continuous variables, the neural activity level x_ij and the neural output y_ij. These variables are related by the non-linear processing function f, as follows:
y_ij = f(x_ij)
(1) where f is taken to be some non-linear monotonically increasing function. This function is called the activation (or transfer) function. The exact form of f is not particularly important, and any appropriate non-linear monotonically increasing function could be used. In the preferred embodiment, however, f is taken to be the sigmoid function
f(x) = 1 / (1 + exp(-βx))
(2) where β is the gain factor, which controls the steepness of the sigmoid function, as illustrated in Figure 2. The Hopfield equation describing the dynamics of an individual neuron is given by
dx_ij/dt = -a x_ij + ∑_{kl} T_ij,kl y_kl + I_ij
(3) where T_ij,kl is the weight matrix which describes the connection strength between the neurons indexed by (ij) and (kl). I_ij describes the external bias (sometimes referred to as the "external bias current") which is supplied to each neuron.
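For concreteness, equations (1) to (3) can be sketched in a few lines of Python. This is an illustration only; the forward-Euler discretisation and the flattening of the n x n neuron array into a vector (so that T is an (n·n) x (n·n) matrix) are assumptions of this sketch, not prescriptions of the patent:

```python
import numpy as np

def f(x, beta):
    """Sigmoid activation of equation (2)."""
    return 1.0 / (1.0 + np.exp(-beta * x))

def hopfield_step(x, T, I, a, beta, dt):
    """One forward-Euler step of the dynamics (3),
    dx_ij/dt = -a*x_ij + sum_kl T[ij,kl]*y_kl + I_ij,
    with the neurons flattened into a vector x of length n*n."""
    y = f(x, beta)                        # equation (1): y = f(x)
    return x + dt * (-a * x + T @ y + I)
```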
Hopfield has shown that for the case of symmetric connections T_ij,kl = T_kl,ij and a monotonically increasing processing function f, the dynamical system (3) possesses a Lyapunov (energy) function which continually decreases with time. The existence of such a function guarantees that the system converges towards equilibrium states. These equilibrium states are often referred to as "point attractors".
The Hopfield energy function is of the following form:
E = -½ ∑_{ij} ∑_{kl} T_ij,kl y_ij y_kl - ∑_{ij} I_ij y_ij + ∑_{ij} λ_ij ∫₀^{y_ij} f⁻¹(x′) dx′
(4) where the λ_ij are positive constants, and x′ is the dummy variable over which the integration is made. This equation is set out in Hopfield [8].
It is easily established that with λ_ij = a for all ij, the equation of motion (3) can be written as
ẋ_ij = -∂E/∂y_ij
(5) where the dot denotes differentiation with respect to time. From this relation we derive the partial differential equation
dE/dt = -∑_{ij} (df_ij/dx_ij) ẋ_ij² ≤ 0
(6)
The inequality follows from the fact that the processing function (2) is monotonically increasing. If we drop the integral term in (4) (as is usual in calculations of this sort) the time derivative of the energy function becomes
dE/dt = -∑_{ij} (df_ij/dx_ij) ẋ_ij² - a ∑_{ij} x_ij (df_ij/dx_ij) ẋ_ij
(7)
If we are to obtain convergence, the equation (4) must continually decrease, and hence the right hand side of equation (7) must be less than or equal to 0. In general, this would not be the case, because of the second term on the right hand side. But in the large gain limit (in other words as the equation (2) tends towards a step function, and β is large) the derivative df_ij/dx_ij becomes a delta function and therefore the second term on the right hand side tends to zero. This establishes a result discussed by Hopfield [8]. Accordingly, provided that the value of β in equation (2) is appropriately chosen, we can be certain that the system converges and that at equilibrium we have not introduced any inaccuracies by dropping the integral term from equation (4).
There are other approaches to making a Lyapunov function, for example making the decay a zero in the dynamical equation. While this approach has been discussed by Aiyer et al [10] , it is not the preferred approach in the present embodiment and accordingly will not be discussed further.
Formulation of the Switching Problem for Neural Network Solution:
Referring back to Figure 1, it will be recalled that we are concerned with solving an optimisation problem for a square switch of size n x n. We will use the method of Brown, and Ali and Nguyen [5,6] to define the status of the switch at any time as an n x n binary matrix which will be designated y. Let r_ij be the number of packets at input i requesting a connection to the output j. The status of each input queue is given as an initial condition to a neuron; so, for an n x n switching fabric a total of n² neurons are required. The initial conditions of the neurons (in other words the status of the input queues) are then represented by the matrix
y = ( y_ij ),  i, j = 1, ..., n
(9) where
y_ij = 0 if r_ij = 0, and y_ij = 1 if r_ij ≥ 1
(10) In other words, y_ij is unity if a particular input queue is busy, and is zero if it is idle. In this formulation, the rows of the matrix y represent the input lines, and the columns represent the output lines. Every index pair (ij) defines a connection channel. During each time slot, only one packet can be permitted per channel: in other words, during each time slot at most one packet may be sent to each of the outputs, and at most one packet may be chosen from each of the inputs. The task of the neural network is to take the input matrix y, and operate upon it, repeatedly, to produce an output or configuration matrix which actually sets up the channel connections, that is defines the packets that are to be chosen to achieve maximum throughput within the switching constraints. Given the switching constraints, it is clear that the output (configuration) matrix can have at most one non-vanishing element in each row and one non-vanishing element in each column. More than a single element in each row, or a single element in each column, would mean that the switching constraints have been violated in that the switch was either trying to pass two packets at once from a single input or to pass two packets at once to a single output.
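As a brief illustration (not part of the patent text), equation (10) maps the queue occupancy counts r_ij onto the binary request matrix y in a single step:

```python
import numpy as np

def request_matrix(r):
    """Equation (10): y_ij = 0 if r_ij = 0 and y_ij = 1 if r_ij >= 1,
    where r[i, j] counts the packets at input i requesting output j."""
    return (np.asarray(r) >= 1).astype(int)
```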
If the input matrix contains more than one non- vanishing element: in any row, or more than one non- vanishing element in any column, then there are more requests for connections than the switch can handle during that time slot. Since only one input can be connected to one output at any time, the switching mechanism will have to choose only one request and force the rest to wait.
To take an example, if there are requests at each of the inputs for packets to be transmitted to all of the outputs, the input matrix y will be
1 1 1
1 1 1
1 1 1
For that input matrix, the output (configuration) matrices that maximise the packet flow through the switch are as follows:
1 0 0   1 0 0   0 1 0   0 0 1   0 0 1   0 1 0
0 1 0   0 0 1   1 0 0   1 0 0   0 1 0   0 0 1
0 0 1   0 1 0   0 0 1   0 1 0   1 0 0   1 0 0
Each of these will be called "valid" solutions because they both satisfy the switching constraints and they also maximise the switch throughput. Other configuration matrices such as
0 0 0   1 1 0
0 1 0   1 0 0
0 0 1   0 0 1
are not valid. The first of these satisfies the switching constraints but does not maximise the throughput because no packet is chosen from the first input even though there is at least one packet waiting. The second matrix violates the switching constraints in that it attempts to select two packets at once from the first input, and to send two packets at once to the first output.
Sometimes, of course, there will not be packets waiting at each of the inputs, and the input matrix y will not be full. In that case, the number of valid solutions will be fewer. To take an example, if the input matrix is as follows:
[input matrix not legible in the source]
The valid output matrices are
[valid output matrices not legible in the source]
It should be noted that if a row or column of the input matrix contains only zero entries, then in the nomenclature we are using, the resultant output or configuration matrix is valid only if that also contains only zeros in the corresponding rows and/or columns. Otherwise, connections would be made by the switch even though there was no request for such a connection. The following matrix:
0 0 1
1 0 0
0 1 0
would not be a valid configuration matrix, even though it maximises the throughput of the switch and does not violate the switching constraints, because it sets up a connection between the first input line and the third output line which was not requested by any incoming packet. The way in which this is achieved, which is one of the novel features of the present embodiment, will be discussed in more detail below.
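The notion of a "valid" configuration matrix described above can be expressed as a short check. In the sketch below, "maximises the throughput" is interpreted as passing as many packets as a maximum bipartite matching of the requests allows, and SciPy's matching routine is used; both are implementation choices of this illustration rather than anything prescribed by the patent:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def max_throughput(requests):
    """Most packets any configuration could pass in one time slot: the size
    of a maximum bipartite matching of inputs to outputs over the requests."""
    match = maximum_bipartite_matching(csr_matrix(requests), perm_type='column')
    return int((match >= 0).sum())

def is_valid(config, requests):
    """'Valid' in the sense of the text: switching constraints respected,
    no unrequested channel set up, and throughput maximised."""
    config = np.asarray(config)
    requests = np.asarray(requests)
    if (config.sum(axis=0) > 1).any() or (config.sum(axis=1) > 1).any():
        return False   # two packets from one input, or two to one output
    if (config > requests).any():
        return False   # a connection was set up where none was requested
    return int(config.sum()) == max_throughput(requests)
```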
The Energy Function:
From an analysis of the configuration matrices, one can construct an energy (Lyapunov) function for the switching problem. This function can then be compared with the standard Hopfield energy function to find the resulting weight connection matrix and the external biases. From the description in the preceding section it is easily established [6] that an energy function for the switching problem is given by
E = (A/2) ∑_i ∑_j ∑_{k≠j} y_ij y_ik + (B/2) ∑_j ∑_i ∑_{k≠i} y_ij y_kj + (C/2)(n - ∑_i ∑_j y_ij)
(11)
This energy function takes on zero values only for configuration matrices that are solutions for a full request matrix/input matrix y (that is, for an input matrix which has at least one non-vanishing element in each row and one non-vanishing element in each column). The last term on the right hand side of equation (11) takes on positive values if the input matrix contains one or more zero rows or columns. A comparison with equation (4) (neglecting the integral term) gives the following expression for the weights and biases:
T_ij,kl = -A δ_ik (1 - δ_jl) - B δ_jl (1 - δ_ik)
I_ij = C/2
(12) where δ_ij is the Kronecker delta. Substituting this back into the dynamical equation (3) gives
dx_ij/dt = -a x_ij - A ∑_{k≠j} y_ik - B ∑_{k≠i} y_kj + C/2
(13) which is the desired differential equation for the switching problem. Under this dynamical equation the individual neurons will develop towards configurations which minimise the energy function (11). The parameters A, B and C are known as "optimisation parameters", and in previous work [6] these have been determined purely by trial and error. If the optimisation parameters are not chosen carefully, the equation (13) will either not converge at all, or it will converge only slowly. A further possibility is that the equation might converge to a solution which is not "valid" in the sense described above, for example, because it does not maximise the throughput of the switch. In practice, equation (13) is used in its iterated form as follows:
x_ij(t) = x_ij(t-1) + Δt[ -a x_ij(t-1) - A ∑_{k≠j} y_ik(t-1) - B ∑_{k≠i} y_kj(t-1) + C/2 ]
(13a)
where Δt is the step length.
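Because each neuron's update in equation (13a) depends only on its own row and column sums of y, the iteration vectorises naturally. The following Python sketch (an illustration only; the variable names and the decay constant a = 1 are assumptions of this sketch, not prescriptions of the patent) performs one iteration of (13a) for all n² neurons at once:

```python
import numpy as np

def update(x, A, B, C, dt, beta, a=1.0):
    """One iteration of equation (13a):
    x_ij(t) = x_ij(t-1) + dt*(-a*x_ij - A*sum_{k!=j} y_ik
                              - B*sum_{k!=i} y_kj + C/2)."""
    y = 1.0 / (1.0 + np.exp(-beta * x))       # equation (2)
    row = y.sum(axis=1, keepdims=True) - y    # sum over k != j (same row)
    col = y.sum(axis=0, keepdims=True) - y    # sum over k != i (same column)
    return x + dt * (-a * x - A * row - B * col + C / 2.0)
```

With the preferred settings A = B = 1250, C = 100 and Δt = 0.2, repeated application of this step is reported below to stabilise within a few tens of iterations.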
Establishing the Optimisation Parameters:
An important feature of the present invention relates to the determination of the optimisation parameters. As was indicated above, optimisation has previously been carried out by trial and error, but it is now believed that significant further progress can be made in this field by the use of a more methodical approach. To gain some initial information about the optimisation parameters A, B and C, we will consider the dynamical equation (13) under conditions of equilibrium, that is when
dx_ij/dt = 0
(14) then from equation (13), and substituting in equation (2), we find
x_0,ij = -A ∑_{k≠j} f(x_0,ik) - B ∑_{k≠i} f(x_0,kj) + C/2
(15) where x_0,ij is the value of x_ij at equilibrium. To put some further restrictions on the parameters, we should recall that in the final solution to the switching problem each neuron (that is, each entry in the configuration matrix y) will either be zero or one. Given that a valid solution has been found, this means that the matrix will have at most one active unit per row and one active unit per column. We can use this additional information to set further bounds on the optimisation parameters in the region close to equilibrium. If we assume, first of all, that (i,j) is a zero position, then it is straightforward to establish that the equilibrium condition reads
x_1 = -A - B + C/2
(16) where x_1 denotes the first equilibrium solution. Because we are at equilibrium, we know that the associated y value must be close to zero, and from equation (2) we know that y only tends to zero as x tends to minus infinity. Accordingly, we can rewrite equation (16) as the following inequality
-A - B + C/2 << 0
(17)
In general, there will be n² - n positions in the network satisfying this condition. This solution may be referred to as the "negative attractor" solution, as it is the equilibrium solution obtained as x tends to minus infinity.
On the other hand, if we assume that the position (i,j) represents a location which is tending towards the value 1, in equilibrium, then the equilibrium condition becomes
x_2 = C/2
(18) where x_2 represents the second equilibrium solution. Using equation (2), it is easily seen that y tends to 1 as x tends to infinity, and we can accordingly rewrite equation (18) as the inequality
C/2 >> 0
(19) In general, this condition will have to be satisfied at n positions in the network.
The final equilibrium conditions mean that n neurons in the network have converged to one of the two attractors and n² - n neurons have converged to the other attractor. From equations (19) and (17) we find that the parameters have to satisfy the following inequalities
0 < C < 2(A + B)
(20)
If a symmetric configuration matrix is required, this can be achieved by setting A=B, in which case the condition (20) reads
0 < C < 4A
(21)
We have found, in fact, by extensive simulation, that at the limit the magnitudes of x_1 and x_2 are equal. Putting x_1 = -C/2 into equation (16) gives revised versions of equations (20) and (21), namely
0 < C < (A + B)
(20a)
0 < C < 2A
(21a)
In the simulations of the Hopfield model for the switching problem we find that any values of A and C which satisfy equation (21a) give the correct result provided that β and Δt are correctly chosen. This has been confirmed for a large number of input matrices and different matrix sizes.
A practical point is that A is always taken to be very much larger than C. This is due to the fact that a large proportion of the neurons have to approach the negative attractor while only a small number of them will approach the positive attractor. Taking A much greater than C speeds convergence, and allows us to use a large value of Δt.
We can assist convergence further by selecting an appropriate value for β. In practice, we find that β should be greater than about 0.08, because the approximation in the basic Hopfield model which allows us to neglect the integral term in equation (4) assumes that β is reasonably large.
If a small value of Δt were to be acceptable in equation (13a), then the term βx in equation (2) can be very large. To make the limiting values of y reasonably close to 0 and 1 we choose, approximately, βx greater than about 4. Hence, from equation (18):
β ≥ 8/C
(21b)
Since β has to be greater than about 0.08, as discussed above, we can write a general equation for β as
β = 0.08 + 2K/C
(21c) where K is an arbitrary constant, greater than or equal to zero, and C lies between zero and 2A (see equation (21a)).
If we require Δt to be larger, we have to be careful not to set β too large, or the model does not converge. Empirically, we find that for Δt greater than about 0.01 we obtain convergence provided that K in equation (21c) does not exceed about 10. A large value of Δt is of course what we are interested in, as this provides more rapid convergence than small Δt.
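Assuming the reconstruction of equation (21c) given above, the choice of β for a given C and K can be written directly; for example, C = 100 with K between 0 and 4 spans the 0.08 to 0.16 range used for the noise scheme described later:

```python
def choose_beta(C, K=0.0):
    """Equation (21c) as reconstructed above: beta = 0.08 + 2*K/C, K >= 0.
    Empirically, K should not exceed about 10 once dt exceeds about 0.01."""
    if K < 0:
        raise ValueError("K must be non-negative")
    return 0.08 + 2.0 * K / C
```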
Discussion:
Having established these limitations of the values on the optimisation parameters, we have been able to investigate areas that Ali and Nguyen [6] did not consider. Ali and Nguyen used the values A=B=100 and C=40, which they considered quite satisfactory. The present work has established that the values used by them were in fact very substantially sub-optimal. To verify this, we have checked one of the cases (n=4) which these authors considered. For the case of n=4, there are 65536 possible request matrices, of which only 200 were checked at random by Ali and Nguyen. They found their parameter settings to be acceptable for the 200 particular cases which were checked. Using their parameter settings for A, B and C we have checked all 65536 of the possible cases: it was found that 65392 outputs were correct but 144 outputs were incorrect (in other words, they provided less than optimal throughput). Similar conclusions would apply to larger matrices such as n=5 and n=6. Ali and Nguyen did not specify a value of β, so we have used our value of 0.08. When we repeated the simulations with β set at 1.00 we found that their values of A, B and C do provide correct convergence for all cases, but convergence is extremely slow. Because of the very large number of possible input matrices for n=5 and n=6, it is quite out of the question to try all of them with our values, but for the 100000 that we have tried (which were randomly chosen) we have found that using parameter settings A=B=1250 and C=100 convergence was achieved in typically fewer than 30 time intervals, and in each case the solution has been valid. We cannot absolutely guarantee that with our methodical setting of the parameters the system will never converge to a non-global minimum, but it has never done so in the examples we have investigated so far. Provided that A and C satisfy equation (21a), that Δt is correctly chosen, and that β satisfies equation (21c) we expect that the network will handle all types of input matrix. For rapid convergence, we take A very much greater than C.
From simulations that we have carried out, we believe that the value of C should typically be in the range 40 to 150, and that A (which equals B) should be at least one order of magnitude greater than C. So, for example, if C is taken to be 100 then A and B should both be greater than 1000. We have found that in practice the exact values of A, B and C are not critical to convergence provided that equation (21a) is satisfied, but A has to be much greater than C for convergence to be rapid. As an illustration of the evolution of the system with n=8 (that is, an 8x8 input matrix), reference will now be made to Figures 3 to 6.
Figures 3 and 4 show the results for simulation of an 8x8 input matrix with optimisation parameters set at A=B=1250 and C=700.
Figure 3 shows how dx/dt varies with time, for each of the n² neurons, and Figure 4 similarly shows how the value of y varies with time. It will be seen in Figure 4 that some neurons are stable, but a number are not, and there is no convergence to any final solution. Here, since A is not very much greater than C, one would expect convergence only for small Δt: the value used (Δt=0.2) is too large. In contrast, Figures 5 and 6 are corresponding graphs for simulations based upon the values A=B=1250 and C=100. It will be seen that the system attains stability after only about 40 time intervals and rapidly converges to a valid solution, with some of the neurons converging to one and others to zero. Here, Δt has again been set to 0.2, but now, because A is very much greater than C, the system is not unstable at larger values of Δt. Taking A very much greater than C allows us to use a large Δt and hence obtain more rapid convergence than was possible in the prior art.
We have found by experiment that when no equilibrium condition is achieved, this is often because at some time during the evolution of the system one of the two separate attractors disappears. By choosing A very much greater than C, we ensure the existence of two attractors throughout the time evolution of the dynamical equation. In this way, we have found that oscillations and unstable behaviour of the network are avoided.
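By way of illustration, the following is a minimal Python sketch of one possible discrete-time realisation of these dynamics. The update rule is an assumption on our part: equation (13a) is taken here in the form xij(t+Δt) = xij(t) + Δt[-xij - A Σ(k≠j) yik - B Σ(k≠i) ykj + C/2], chosen because it reproduces the attractors x1 ≈ -A-B+C/2 and x2 ≈ C/2 referred to in this document; the function and parameter names are ours, not the patent's own code:

import numpy as np

def sigmoid(x, beta):
    # Equation (2): y = 1 / (1 + exp(-beta * x))
    return 1.0 / (1.0 + np.exp(-beta * x))

def run_network(request, A=1250.0, B=1250.0, C=100.0,
                beta=0.08, dt=0.2, max_steps=200, tol=1e-4):
    # request: n x n matrix; entries of 0 mean "no connection requested".
    request = np.asarray(request, dtype=float)
    x = np.where(request > 0, request, -2.0 * A)  # 0 entries start at the negative attractor
    y = sigmoid(x, beta)
    for step in range(1, max_steps + 1):
        row = y.sum(axis=1, keepdims=True) - y    # sum of y_ik over k != j
        col = y.sum(axis=0, keepdims=True) - y    # sum of y_kj over k != i
        x = x + dt * (-x - A * row - B * col + C / 2.0)
        y_new = sigmoid(x, beta)
        if np.max(np.abs(y_new - y)) < tol:       # outputs have stopped changing
            return np.round(y_new), step
        y = y_new
    return np.round(y), max_steps

Running run_network on random 8x8 request matrices with C=100 and again with C=700 (both with Δt=0.2) gives a feel for the stable and unstable regimes discussed in connection with Figures 3 to 6.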
Avoidance of Local Minima:
In any system in which the equations of motion are determined by a continually reducing energy function, there is a risk that the system may become trapped at a local minimum of the energy function. In the present embodiment, that would be equivalent to convergence upon a solution which satisfied the switching constraints but was merely a local maximum for the switch throughput, and not the global maximum. Since there is no foolproof method of avoiding convergence onto local minima, we propose that in practice some "noise" should be introduced into the system to avoid spurious convergences as far as possible. We achieve this, as the neural network is calculating its solution, by randomly varying the value of β in equation (2), as sketched below. In the first iteration the value of β is set at 0.08, and in subsequent iterations the value of β is randomly chosen to lie somewhere within the range 0.08 to 0.16. β, of course, is the gain factor which controls the steepness of the sigmoid function, as illustrated in Figure 1. Where β is taken at a value greater than 0.08 (see equation (21c)), the maximum amount of noise can be taken to be equivalent to the value of β.
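A minimal sketch of this noise scheme (the function name is our own):

import random

def choose_beta(iteration, beta_min=0.08, beta_max=0.16):
    # First iteration: beta fixed at 0.08; thereafter chosen at random
    # within the range, which perturbs the sigmoid steepness between updates.
    if iteration == 1:
        return beta_min
    return random.uniform(beta_min, beta_max)

Imposed Attractors: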
A network to be used for switching purposes will normally be sparsely connected. This is easily seen, as the first term on the right hand side of equation (12) is non-zero only if i=k and j does not equal l. This gives n²(n-1) connections. The second term has equally many non-vanishing contributions. The connection matrix (12) therefore defines 2n²(n-1) connections. This has to be compared with the maximum possible number of connections that a network of n² neurons can have: that is, n⁴ - n².
As has previously been mentioned, if the input matrix contains a null row or a null column, it is desirable that the output matrix should also contain a corresponding null row or column, since otherwise the neural network will have introduced connections where none have been requested. The energy function (11) and the connection matrix (12) do not guarantee this to be the case. This follows from the fact that the energy function (11) does not take on minima if one of the rows or columns has only vanishing entries. To avoid this occurrence, we propose that the null rows and columns should be decoupled from the Hopfield dynamics. We have found in practice that this both improves the convergence of the other neurons and permits considerable increases in speed to be achieved.
The null rows and columns may be decoupled from the Hopfield dynamics in a number of ways. One simple mechanism would be not to include the null rows and columns at all in any of the calculations. An alternative, and preferred, method is to force all of the neurons in a null row or a null column onto the negative attractor. What we do in practice is to decouple the input to the null rows and columns from all the other neurons. However, their constant outputs are still coupled. Hence the forced neurons remain fixed at all times at the negative attractor, but their outputs are still fed into the rest of the calculations, and will affect the time evolution of the unforced neurons. The external biases of all the forced neurons are set to zero.
The effect of decoupling the null rows and columns will be illustrated by reference to Figures 7 and 8. These show the results for simulations with an 8x8 input matrix, with optimisation parameters set at A=B=1250 and C=100. The input matrices in each case had non-vanishing elements in the first and last rows, while all other elements were zero. It is obvious that this type of input will require only two connections. Allowing all the neurons to change freely according to the dynamical equation produced the output shown in Figure 7. This shows that n=8 neurons are connected; in other words, the network has artificially connected calls which are not required by the initial input matrix. However, if the zero-request neurons are disconnected from the dynamics of the system by an imposed attractor, as previously described, then the network will converge to a valid solution, as shown in Figure 8.
If we call the imposed attractor x3, to distinguish it from the positive and negative attractors x1 and x2 of equations (16) and (18), we can use an equivalent of equation (16) to provide some idea of what the imposed attractor should be to ensure that the respective neuron always converges on zero. Since A=B and A is very much greater than C, we have from equation (16) the approximate equation
x3 = -2A
(22)
In fact, we find that in practice the exact size of the imposed attractor does not particularly matter, provided that it is large enough and negative enough.
As an extension, it would be possible to use imposed attractors not only for the null rows and the null columns, but also for each individual neuron in respect of which no connection request has been made. If there is no connection request, an ideal solution will never cause a connection to be set up; it is therefore possible effectively to decouple all of the zeros in the initial input request matrix, and to run the neural network only upon the entries which are initially non-zero. This further improves the speed of convergence.
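In code, this decoupling can be sketched as a mask that is re-applied after every integration step, so that the inputs of the forced neurons never move while their outputs continue to feed the rest of the network (a sketch under the same assumed update rule as before; the function names are our own):

import numpy as np

def make_forced_mask(request):
    # Every entry with no connection request is forced onto the imposed attractor.
    return np.asarray(request) == 0

def clamp_forced(x, forced, A=1250.0):
    # Inputs of forced neurons are decoupled: they are reset to x3 = -2A after
    # each update step. Their outputs y = f(x3) ~ 0 still couple into the
    # unforced neurons through the row and column sums.
    x[forced] = -2.0 * A
    return x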
Queue Manager:
In a practical embodiment of the present invention it may be necessary for particular arrangements to be made to deal with incoming packets that are experiencing unacceptable delays, that require a particularly high priority, or that have other specific sequencing requirements. In the preferred embodiment, therefore, there would be a queue manager which would sit in front of the neural network and provide prioritisation labels or arrangements for the incoming traffic. This is shown schematically in Figure 9.
If an incoming packet is experiencing an unacceptable delay (or alternatively if a critical packet has been designated as having a high priority), the queue manager will modify the input matrix (9) to take account of the desired sequencing. When the queue manager receives a packet from the input line, it will examine the destination address to determine the appropriate destination output line. It will also determine the priority and/or delay of that particular packet. Based upon the information it has read, it will then appropriately update the input matrix (9), so changing the initial conditions that the neural network has to operate upon. It may also impose attractors, either positive or negative, or in other ways decouple certain neurons from the network according to the requested priorities of the individual packets.
The function of the queue manager could be achieved using conventional, neural, genetic algorithm, fuzzy algorithm or hybrid techniques.
One particular function that might be provided by the queue manager would be to adjust the input matrix (9) to take account of the fact that there may be more than one packet waiting for a particular connection. If a large number of packets start to build up, all waiting for a particular connection, the queue manager should have some mechanism for effectively increasing the priority of those packets to ensure that the queue does not become unacceptably long. An alternative method of achieving the same result might be to make use of an input request matrix in which each element is not merely 0 or 1, but is an integer representing the number of packets that are awaiting that particular connection. The higher the number waiting for a particular connection, the greater would be the initial value of y, according to equations (1) and (2), and accordingly the greater the likelihood of that particular neuron converging to the value 1.
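One such policy can be sketched as follows (a hypothetical illustration only; the helper name and the additive boost are our own choices, not prescribed here):

import numpy as np

def build_request_matrix(queue_lengths, priority=None):
    # queue_lengths[i][j]: number of packets waiting to go from input i to
    # output j. Using the queue length itself as the matrix entry gives a
    # larger initial x, hence a larger initial y, for heavily loaded
    # connections, making those neurons more likely to converge to 1.
    r = np.asarray(queue_lengths, dtype=float)
    if priority is not None:
        r = r + np.asarray(priority, dtype=float)  # boost designated entries
    return r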
Calculation Procedure:
Finally, we will turn to the computational procedure to be used in operating the neural network. The procedure will effectively be the same, whether the network is embodied in hardware or in software. The procedure is as follows:
1. Receive the input request matrix, which contains 0's and 1's in the simplest case, indicating which connections are requested and which are not.
2. To centre the request matrix on 0, convert the 0's to -2A's (the negative attractor).
3. [Optional] Use the queue manager to make any appropriate amendments to the resultant matrix, for example to allow for higher priority on certain requests. Higher priority would be achieved by replacing a particular 1 entry with a higher number. At the same time, the queue manager will set up any imposed negative attractors that may be required by changing the corresponding entries to large negative numbers (e.g. -2A), and will arrange for those neurons to be decoupled from the others (as explained above) before calculations start.
4. The modified input matrix is then replaced by a neuron matrix using equation (2) (with the non-requested or forced elements remaining at the negative attractor, -2A). The elements of this neuron matrix are the yij. The parameter β is set randomly within the range 0.08 to 0.16. In the first iteration of step 4, it is set at β=0.08.
5. Iterate the differential equation (13a) (which contains the parameters A, B and C) to calculate new values of xij. The step length Δt may be, for example, 0.2.
6. Go to step 4, unless the system has converged, that is, unless none of the yij has changed by more than a given amount since the last iteration. If all the neurons have converged, stop.
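Putting the steps together, a minimal Python sketch of the whole procedure (again under the assumed discrete form of equation (13a) described earlier; helper names are our own) might read:

import random
import numpy as np

def switch_controller(request, A=1250.0, B=1250.0, C=100.0, dt=0.2,
                      max_steps=200, tol=1e-3):
    request = np.asarray(request, dtype=float)
    forced = request == 0                     # step 3: imposed negative attractors
    x = np.where(forced, -2.0 * A, request)   # steps 1-2: 0 entries -> -2A
    beta = 0.08                               # step 4, first iteration
    for step in range(1, max_steps + 1):
        y = 1.0 / (1.0 + np.exp(-beta * x))   # neuron outputs, equation (2)
        row = y.sum(axis=1, keepdims=True) - y
        col = y.sum(axis=0, keepdims=True) - y
        x_new = x + dt * (-x - A * row - B * col + C / 2.0)   # step 5
        x_new[forced] = -2.0 * A              # forced neurons stay decoupled
        if np.max(np.abs(x_new - x)) < tol:   # step 6 (tested on x, since beta varies)
            x = x_new
            break
        x = x_new
        beta = random.uniform(0.08, 0.16)     # noise against local minima
    return (x > 0).astype(int)                # on-neurons settle near x2 = C/2 > 0

Practical Realisation: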
The network of the present invention could be embodied either in hardware or in software, or in a combination of the two. Potential hardware technologies would include VLSI, optics and so on. The environment in which the network is to be embedded will of course play a key role in the determination of the medium. For example, if speed is of the essence, a hardware implementation would appear preferable, but of course that has to be offset against the fact that hardware implementations for large switches would be exceedingly complicated. A specification for a hardware (electrical) realisation of a Hopfield neural network has already been published by Brown [5]. The various parameters in the Hopfield model can be related to values of the various electrical components in the circuitry.
It is expected that the present invention will have application in very many different fields, and in particular to any problem in which a Hopfield energy can be calculated and the solution is required to be a matrix having at most one non-vanishing element per row and at most one non-vanishing element per column; in other words, where the problem is equivalent to the Travelling Salesman Problem. Potential application areas include network and service management (including switching of lines, channels, cards, circuits, networks etc.); congestion control; distributed computer systems (including load balancing in microprocessor systems, cards, circuits etc.) and decentralised computer systems; work management systems; financial transaction systems (in banking, stockmarkets, shops etc.); traffic scheduling (airlines, trains, underground, shipping etc.); automated production lines; reservation, storage and cargo systems (including airline, stock etc.); general scheduling and resource allocation problems; general channel assignment problems; fluid control systems (including oil, gas, chemical etc.); general control systems (including nuclear, cars, airlines, transport systems etc.); system management systems (including satellite, mobile, audio etc.); robotic systems dealing with task assignment; task assignment in general; real time data acquisition and analysis systems; and queuing delay procedures.
The continuous case
The present work can be extended to the continuous case, where the inputs to the network can take on any value within a given range, rather than being restricted to 0 and 1. One particular way of doing this, as previously described, would be to allow the inputs to be any positive integer, for example an integer corresponding to the number of packets awaiting switching in the respective queue. Alternatively, the inputs could be truly continuous (not just stepped).
The inputs may be multiplied by a spread factor f, prior to the network calculations being started, to vary the range that the input values span. Because the network calculations are non-linear, altering the input range may have profound effects on the operation and speed of convergence of the net.
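As a sketch, the spread factor is a single pre-multiplication of the inputs before the iteration starts (the helper name is our own; a suitable value of f is not determined here):

import numpy as np

def apply_spread(inputs, f):
    # Multiply the continuous inputs by the spread factor f to widen or narrow
    # the range they span before the non-linear network calculations begin.
    return f * np.asarray(inputs, dtype=float)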
In the truly continuous case the addition of noise is to be avoided. Imposed attractors may be used but, in practice, it has been found that in many cases they do not add very much to the speed of convergence of the net.
The continuous case can be considered for performing first order task assignment in a multi-service or heterogeneous environment by maximising the sum of indicators of suitability of tasks for the chosen resources. The network can also be used for higher order task assignment taking into account, amongst others, intertask communication costs. Application areas include, amongst others, network and service management, distributed computer systems, systems for work management, financial transactions, traffic scheduling, reservation, storage, cargo handling, database control, automated production, and general scheduling, control or resource allocation problems.
REFERENCES
[1] A Pattavini, 'Broadband Switching Systems: First Generation', European Transactions on Telecommunications, Vol 2, No 1, p75, 1991.
[2] M Listanti and A Roveri, 'Integrated Services Digital Networks: Broadband Networks', European Transactions on Telecommunications, Vol 2, No 1, p59, 1991.
[3] CCITT, Draft Recommendation I.150: B-ISDN ATM Aspects, Geneva, January 1990.
[4] A Hiramatsu, 'ATM Communications Network Control by Neural Networks', IEEE Transactions on Neural Networks, Vol 1, No 1, p122, March 1990.
[5] T X Brown, 'Neural Networks for Switching', in E C Posner, Ed, 'Special Issue on Neural Networks in Communications', IEEE Communications Magazine, p72, November 1989; see also A Maren, C Harston and R Pap, 'Handbook of Neural Computing Applications', Academic Press, London, 1990.
[6] M M Ali and H T Nguyen, 'A Neural Network Controller for a High-Speed Packet Switch', Proc. Int. Telecommunications Symposium 1990, pp. 493-497.
[7] A Marrakchi and T Troudet, 'A Neural Network Arbitrator for Large Crossbar Packet Switches', IEEE Transactions on Circuits and Systems, Vol 36, No 7, p1039, 1989.
[8] J J Hopfield, 'Neurons with graded response have collective computational properties like those of two-state neurons', Proc. Natl. Acad. Sci. USA, Vol 81, pp 3088-3092, Biophysics, May 1984.
[9] A good discussion of Hopfield networks, with further references, is given in J Hertz, A Krogh and R G Palmer, 'Introduction to the Theory of Neural Computation', Addison-Wesley, Redwood City, 1991.
[10] S V Aiyer, M Niranjan and F Fallside, 'A Theoretical Investigation into the Performance of the Hopfield Model', IEEE Transactions on Neural Networks, Vol 1, No 2, June 1990.
[11] Honeywell Inc, EP-A-0 340 742.
[12] Troudet et al, IEEE Transactions on Circuits and Systems, vol. 38, no. 1, January 1991, New York, US, pages 42 - 56.
[13] J J Hopfield, US-A-4 660 166
[14] P P Chu, IJCNN-91: International Joint Conference on Neural Networks, vol. 1, 8 July 1991, Seattle, USA, pages 141 - 146.
[15] Y S Foo, IEEE International Conference on Neural Networks, vol. 2, 24 July 1988, San Diego, USA, pages 275 - 282.
[16] Ueda et al, IJCNN International Joint Conference on Neural Networks, vol. 4, 7 June 1992, Baltimore, USA, pages 624-629.
[17] Neelakanta et al, Biological Cybernetics, vol. 65, no. 5, September 1991, Heidelberg, Germany, pages 331-338.
[18] Cheung et al, IECON89, 15th Annual Conference of IEEE Industrial Electronics Society, vol. 1, 6 November 1989, Philadelphia, USA, pages 759 - 763.
[19] Ghosh et al, 1993 IEEE International Conference on Neural Networks, vol. 1, 28 March 1993, San Francisco, USA, pages 359-364.
[20] Fujitsu Limited, EP-A-475 233.

Claims

1. A method of operating a Hopfield network to solve a problem, the solution to which is partially known, the method being characterised by forcing at least some of the neurons for which the correct values are known to take on the respective correct values, and operating the network to complete the solution to the problem.
2. A method as claimed in Claim 1 in which the inputs to the forced neurons from the unforced neurons are decoupled during operation of the network.
3. A method as claimed in Claim 1 or Claim 2 in which the outputs from the forced neurons remain coupled to the unforced neurons during operation of the network.
4. A method as claimed in any one of the preceding claims in which the problem is mathematically equivalent to the Travelling Salesman Problem.
5. A method as claimed in Claim 4 in which, when the values of the neurons are represented by a matrix, the rows and columns which are known to be null in the solution to the problem are forced to remain null.
6. A method as claimed in Claim 4 in which, when the values of the neurons are represented by a matrix, the matrix entries which are known to be null in the solution to the problem are forced to remain null.
7. A method as claimed in any one of the preceding claims in which the said correct values are all identical, and form one attractor of the network.
8. A method as claimed in Claim 7 in which the network has exactly two attractors, an upper attractor and a lower attractor, each of the unforced neurons being arranged by an operating algorithm of the network to converge on either the upper or the lower attractor.
9. A method as claimed in Claim 8 in which the forced neurons are forced to remain at the lower attractor value.
10. A method as claimed in Claim 8 in which the forced neurons are forced to remain at the upper attractor value.
11. A method as claimed in any one of the preceding claims in which the starting inputs to the neurons may take any value within a continuous range of values.
12. A method as claimed in Claim 11 in which the starting input values are multiplied by a spread factor f before the network is operated.
13. A method of operating a Hopfield network incorporating neurons having a transfer function with a graded response, the method comprising repeatedly updating the neuron outputs according to an updating rule, characterised in that the transfer function has at least one parameter which changes randomly or pseudorandomly between iterations.
14. A method as claimed in Claim 13 in which the parameter is a non-monotonic function of the iteration number.
15. A method as claimed in Claim 12 or Claim 14 in which the transfer function is defined by y = f(x), or a function substantially mathematically equivalent thereto, where x is the neuron input, y is the neuron output, and
f(x) = 1/(1 + exp(-βx))
(2)
β being the said parameter.
16. A method as claimed in Claim 15 when dependent on Claim 14, in which β is chosen randomly within the range between about 0.08 and 0.16.
17. A method of operating a Hopfield network using an updating rule
xij(t) = xij(t-1) + Δt[-xij(t-1) - A Σ(k≠j) yik(t-1) - B Σ(k≠i) ykj(t-1) + C/2]
(23)
or an equation substantially mathematically equivalent thereto, where: xij is an input to the neuron referenced by ij, yij is an output of the neuron referenced by ij, A, B and C are optimisation values, and, further, where xij and yij are related by a monotonic activation function
yij = f(xij)
(1)
characterised in that C is chosen to be very much smaller than the sum of A and B.
18. A method as claimed in Claim 17 in which C lies between about 40 and 150.
19. A method as claimed in Claim 17 or Claim 18 in which A is at least ten times greater than C.
20. A method as claimed in any one of Claims 17 to 19 in which B is at least ten times greater than C.
21. A method as claimed in any one of Claims 17 to 20 in which the network is operated with A set equal to B.
22. A method as claimed in any one of Claims 17 to 21 in which A, B and C are substantially as follows: A=B=1250; C=100.
23. A method as claimed in any one of Claims 17 to 22 in which the activation function f(x) is defined by
f(x) = 1/(1 + exp(-βx))
(2)
24. A method as claimed in Claim 22 in which the value of βC is approximately 4 or greater.
25. A method as claimed in Claim 23 or Claim 24 in which β satisfies the equation
β = 0.08 + 2K/C
where K is an arbitrary constant, greater than zero.
26. A method as claimed in Claim 25 in which K is less than about 10.
27. A method as claimed in any one of Claims 17 to 24 further including setting an initial condition for those neurons for which the correct solution is known, to hold the said neurons at their respective correct values.
28. A method as claimed in Claim 27 in which the initial condition is
x1 = -A - B + C/2
(16)
29. A method as claimed in Claim 27 or Claim 28 in which the said initial condition is
x2 = C/2
(18)
30. A method as claimed in any one of the preceding claims including the step of amending the initial conditions prior to operating the Hopfield network, in dependence upon constraints on the acceptable solutions.
31. A method as claimed in Claim 30 in which the said step comprises multiplying the initial values of the neurons by a spread factor f, not equal to one.
32. A method as claimed in Claim 30 in which the amending step is carried out by a further neural network.
33. A method of operating a telecommunications switch using a method of operating a Hopfield network as claimed in any one of the preceding claims.
34. A method of operating a telecommunications switch as claimed in Claim 33 in which the switch is a packet switch.
35. A neural network operated by a method as claimed in any one of the preceding claims.
36. A telecommunications switch incorporating a neural network as claimed in Claim 35.
37. A telecommunications switch as claimed in Claim 36 including a queue manager adapted to amend the initial conditions, prior to operating the neural network, in dependence upon the priorities of calls waiting to be switched.
38. A telecommunications switch as claimed in Claim 36 or Claim 37 in which the switch is a packet switch.
39. A telecommunications network incorporating a switch as claimed in any one of Claims 36 to 38.
40. A Hopfield network adapted to solve a problem the solution to which is partially known, the network being characterised by means adapted to force at least some of the neurons for which the correct values are known to take on the respective correct values, and means for operating the network to complete the solution to the problem.
41. A Hopfield network incorporating neurons having a transfer function with a graded response, the network including means for updating the neuron outputs according to an updating rule, characterised by means for varying a parameter of the transfer function randomly or pseudorandomly between iterations.
42. A Hopfield network adapted to operate using an updating rule
xij(t) = xij(t-1) + Δt[-xij(t-1) - A Σ(k≠j) yik(t-1) - B Σ(k≠i) ykj(t-1) + C/2]
(23)
or an equation substantially mathematically equivalent thereto, where: xij is an input to the neuron referenced by ij, yij is an output of the neuron referenced by ij, A, B and C are optimisation values, and, further, where xij and yij are related by a monotonic activation function
yij = f(xij)
(1)
characterised in that C is chosen to be very much smaller than the sum of A and B.
43. A method of operating a Hopfield network substantially as specifically herein described with or without reference to any one or compatible combination of Figures 1, 2, 5, 6, 7, 8, 9.
44. A packet switch substantially as specifically herein described with or without reference to any one or compatible combination of Figures 1, 2, 5, 6, 7, 8, 9.
45. A method of optimising a Hopfield neural network comprising: (a) determining an energy function for the problem to be solved;
(b) comparing the said energy function with the Hopfield energy function
E = -½ Σ(i,j,k,l=1..n) Tij,kl yij ykl - Σ(i,j=1..n) Iij yij + Σ(i,j=1..n) λij ∫0→yij f⁻¹(x) dx
(4)
to determine the weights and biases for the network; characterised by:
(c) substituting the weights and biases into the Hopfield dynamical equation
dxij/dt = -xij/τ + Σ(k,l) Tij,kl ykl + Iij
(3)
to give a differential equation describing the expected time-development of the network;
(d) deriving information about any undetermined optimization constants in the said differential equation by considering the equilibrium condition
dxij/dt = 0
(14)
(e) deriving further information about the optimization constants by separately considering (i) equilibrium for those neurons which have converged to an upper attractor; and (ii) equilibrium for the neurons which have converged to a lower attractor.
46. A method as claimed in Claim 17 in which the starting inputs to the neurons may take any value within a continuous range of values.