EP0366804A1 - Method of recognizing image structures - Google Patents
- Publication number: EP0366804A1 (also listed as EP89903795A)
- Authority: EP (European Patent Office)
- Prior art keywords: neural network, processing, elements, neural, high order
- Legal status: Granted (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Description
- The present invention relates to a method for constructing a neural network capable of solving problems, such as recognition, that are difficult to solve with a conventional computer, and to the application of such a network to pattern recognition, early visual processing, movement control, numerical analysis, etc.
- Heretofore, learning, memorization, identification, etc. have been discussed in "Parallel Distributed Processing I and II" by McClelland and Rumelhart (MIT Press, 1986). However, that work neither reflects knowledge of cerebral physiology, the most highly developed information processor in the living body, nor discusses the network structure, computation speed and other issues that arise when practical application is assumed. In addition, it describes no method for constructing a network for time-dependent objects.
- On the other hand, a method for solving a neural network as an energy minimization problem is described by Hopfield and Tank (Science, Vol. 233, pp. 625-633, 1986). However, the neural network treated there is restricted to a single layer, and solutions cannot be obtained within a practical computation time.
- Hereinbelow, a prior-art minimum and maximum searching technique for solving the neural network as an energy minimization problem is explained.
- When the minimum (maximum) of a given cost function E is sought and E has many extrema, it has generally been difficult to obtain the global minimum with the deterministic hill-climbing method of the prior art. Because the method is deterministic, if an initial value near some extremum is given, the system falls into the nearby local minimum and cannot get out of it. To solve this problem, a probabilistic hill-climbing method called simulated annealing has been proposed. Simply put, it tries to reach the final destination by allowing moves not only up the hill but also, with a certain probability, down it. In the most widely used form, taking the problem of finding the smallest value of E as an example, it proceeds as follows. Instead of considering the cost function E directly, one maximizes the Boltzmann distribution P ∝ exp(−E/T). The parameter T introduced here is called the temperature; it generates random noise so that the problem can be treated statistically. Consequently, once the calculated value reaches the minimum, T must be set to 0 so that the state stays at the minimum without error. The greatest problem of simulated annealing is determining the cooling schedule, i.e., how to decrease T.
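As an illustration of the prior-art procedure just described (not of the invention), the following minimal Python sketch minimizes a one-dimensional cost E by simulated annealing with the Metropolis acceptance rule. The function name `simulated_annealing`, the uniform proposal step and all parameter values are illustrative assumptions. The temperature schedule is passed in as a function of t so that different cooling schedules can be tried.

```python
import math
import random


def simulated_annealing(energy, x0, schedule, steps=5000, step_size=0.5, seed=0):
    """Minimize `energy` over the real line by simulated annealing.

    `schedule(t)` returns the temperature T at step t (t = 1, 2, ...).
    A candidate that lowers E is always accepted; otherwise it is accepted
    with probability exp(-dE / T) (Metropolis rule, Boltzmann distribution).
    Returns the best state visited and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for t in range(1, steps + 1):
        T = schedule(t)
        x_new = x + rng.uniform(-step_size, step_size)  # random neighbouring state
        e_new = energy(x_new)
        dE = e_new - e
        if dE < 0 or rng.random() < math.exp(-dE / T):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Example: a cost with many local minima, cooled with T(t) = 5 / log(t + 1).
cost = lambda x: x * x + 3.0 * math.sin(5.0 * x)
best_x, best_e = simulated_annealing(cost, 4.0, lambda t: 5.0 / math.log(t + 1))
```

Returning the best state visited, rather than the final one, is a common practical choice and not part of the classical formulation.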
- As discussed in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 720-741 (1985), the schedule of the Geman brothers, widely used until now, generates states according to the Boltzmann distribution with T(t) = T0/log(t + 1), T0 being a positive constant. Here t is the number of Monte Carlo steps and is regarded as time. As t increases, T(t) naturally approaches 0. Although several successful applications of this method have been reported, there are many cases in which it does not succeed. Further, as discussed recently by Szu and Hartley in Physics Letters, vol. 123, pp. 157-161 (1987), another schedule, T(t) = T0/(t + 1), has been proposed to speed up convergence to the maximum of P; it uses the Lorentz (Cauchy) distribution, which has a wider spread, instead of the Boltzmann distribution. However, a disadvantage common to these schedules is that the functional form of the cost function to be minimized is not taken into account at all. T(t) does not reflect which cost barriers (the difference in cost between a local minimum and a neighboring maximum) must be climbed, nor when the final value is reached (i.e., when T should be set to 0). It has been proved that both methods always reach the desired maximum of P when infinite time has elapsed. In practice, however, although the smallest value is sometimes reached within the finite time available for simulation, in many cases it is not, so these methods are of limited practical value.
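The two cooling schedules can be compared directly. The sketch below assumes the forms T(t) = T0/log(t + 1) and T(t) = T0/(t + 1) quoted above, defined for t ≥ 1; the helper `cauchy_step` shows one common way to draw a candidate from a Lorentz (Cauchy) distribution of width T, as used with the fast schedule. All names are illustrative.

```python
import math
import random


def geman_schedule(t, T0=1.0):
    """Logarithmic schedule T(t) = T0 / log(t + 1), for t >= 1."""
    return T0 / math.log(t + 1)


def fast_schedule(t, T0=1.0):
    """Szu-Hartley 'fast' schedule T(t) = T0 / (t + 1)."""
    return T0 / (t + 1)


def cauchy_step(x, T, rng):
    """Candidate drawn from a Lorentz (Cauchy) distribution of width T centred on x."""
    return x + T * math.tan(math.pi * (rng.random() - 0.5))


for t in (1, 10, 100, 1000):
    print(t, round(geman_schedule(t), 4), round(fast_schedule(t), 4))
```

The logarithmic schedule stays hot far longer, which is why it converges slowly in practice; the fast schedule compensates for its rapid cooling with the heavy tails of the Cauchy distribution, which still allow occasional long jumps.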
- The disadvantage common to the schedules stated above is that the functional form of the cost function to be minimized is not reflected in the temperature T. Therefore, in practice, the smallest value often cannot be obtained within the finite time available for simulation.
- An object of the present invention is to provide a high-order information processing method using a neural network that can determine the optimum network structure. The network incorporates a synapse structure based on, or inferred from, physiological knowledge of the living body, so as to realize high-level information processing functions such as feature extraction, feature unification and memorization. This enables applications to pattern recognition, movement control, etc. that exploit the network's robust recognition ability, as well as applications to optimization problems, large-scale numerical analysis, etc. that exploit its parallel processing ability.
- To achieve this object, the problem is to construct concretely a neural network that performs
- 1. feature extraction,
- 2. feature unification, and
- 3. memorization.
- Another object of the present invention is to provide a minimum and maximum searching method that overcomes the problems of the prior-art techniques described above. To achieve this object, the temperature T is made to depend not only on time but also on the cost function E, i.e., T = T(t, E). As the guiding principle for determining how T depends on E, the time from the initial state to the state giving the minimum value, which is the final target, is required to be as short as possible. Denoting this final time by t1, the dependence of T on E is determined so that t1 is minimal.
- Consider as an example the maximization of a Boltzmann distribution exp(−E/T) in a one-dimensional space. The basic procedure of simulated annealing is as follows (Fig. 11). First, the distribution is rewritten as exp(−E/T) = exp{−∫ (E(x)/T)′ dx}, where ′ denotes differentiation with respect to the spatial variable and ∫ ··· dx an integral. If T were a function of t only, this formula would be a trivial identity. Now denote the difference in cost between a state x (block 201) and the next generated state x′ (block 202) by ΔE = E(x′) − E(x) ≈ E′(x) (block 203). The probability that the state moves to x′ (block 204) is then min[1, exp{−Δ(E/T)}]. Whether the move from x to x′ is allowed is determined by comparing this value with a uniform random number η between 0 and 1 (block 205). Consequently, if ΔE < 0 the state always moves to x′, and if ΔE ≧ 0, i.e., even if the cost increases, the move is allowed with a probability determined by ΔE (blocks 206 and 207).
- The dynamic process from a given initial state to the maximum is defined by the following time-evolution equation:
- As the guideline for determining the desired H(t,x), the search is first required to be completed in the shortest possible time. That is,
- Restated, the problem is to determine the function H* that minimizes the cost expressed by Eq. (d) under the dynamic equation Eq. (a). The concrete procedure is described in the section "Best Mode for Carrying Out the Invention".
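The acceptance test of Fig. 11 with an energy-dependent temperature can be sketched as follows. With T = T(t, E), the exponent is the difference of E/T at the two states, Δ(E/T) = E(x′)/T(t, E(x′)) − E(x)/T(t, E(x)), rather than simply ΔE/T. The optimal form H* is not reproduced in this text, so `example_T` below is a purely hypothetical placeholder, and `accept_move` is an illustrative name.

```python
import math
import random


def accept_move(E, T, x, x_new, t, rng):
    """Acceptance test of Fig. 11 (blocks 203-207) with a temperature T(t, E).

    The move x -> x_new is accepted with probability min[1, exp(-delta)],
    where delta = E(x_new)/T(t, E(x_new)) - E(x)/T(t, E(x)).
    """
    e_old, e_new = E(x), E(x_new)
    delta = e_new / T(t, e_new) - e_old / T(t, e_old)
    if delta <= 0:
        return True                      # min[1, exp(-delta)] = 1
    eta = rng.random()                   # uniform random number in [0, 1)
    return eta < math.exp(-delta)


def example_T(t, e):
    """Hypothetical T(t, E): hotter for larger |E|, cooling logarithmically in t."""
    return (1.0 + 0.1 * abs(e)) / math.log(t + 1)
```

Passing the temperature as a function of both t and E keeps the acceptance step independent of whichever schedule is eventually chosen.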
- Fig. 1 is a diagram showing the concept of an embodiment of the present invention; Fig. 2 is a diagram showing a method for constructing a neural network; Fig. 3 shows a network for feature extraction; Fig. 4 shows a network for feature unification; Fig. 5 shows a network for memorization; Fig. 6 illustrates an example of application to image recognition; Fig. 7 illustrates an example of application to movement control; Fig. 8 illustrates an example of application to optimal control; Fig. 9 illustrates an example of application to a non-stationary partial differential equation; Fig. 10 shows the concept of the overall algorithm of a minimum and maximum searching device, which is another embodiment of the present invention; Fig. 11 shows a calculation method of simulated annealing; Figs. 12 and 13 illustrate examples of application of the present invention; and Fig. 14 is a diagram of an image processing system in the case where the present invention is used for image processing.
- First, the principle of the neural network according to the present invention is described.
- A neural network for feature extraction is a hierarchically constructed network, as indicated in Fig. 3(c). It is assumed that neural elements 331 are arranged two-dimensionally on each of the layers 332 and that couplings 333 between neural elements exist only between adjacent layers.
- When determining a concrete circuit construction, physiological knowledge is consulted. What is best known at present concerns the visual cortex and the MT area, which is involved in motion. For the former, which relates to feature extraction, reference is made here to the report "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex" by Hubel and Wiesel (J. Physiol., London, Vol. 160, pp. 106-154, 1962).
- Fig. 3(a) shows an experimental result indicating orientation selectivity in the visual area (the NL field) of the cerebral cortex. When an electrode 32 is inserted obliquely into the cerebrum from the surface 31, it is found that cells responding to a particular inclination (at 10° intervals in the experiment) of a light slit crossing the receptive field of the retina form a layer. In the direction perpendicular to the surface, cells linked with the left eye and cells linked with the right eye are each concentrated. The cells are therefore arranged in a modular structure (Fig. 3(b)). Apart from the alternation of information from the left and right eyes, primitive features are extracted hierarchically in the visual area by this modular structure. For example, for each side of a figure, a corresponding layer reacts and extracts it. Further, although the layers are homogeneous and consist of exactly the same kind of neural elements, they successively extract different information as the information is transmitted, as a result of self-organization.
- If this process is expressed mathematically, it can be written as follows:
- On the basis of the physiological knowledge described above, an artificial neural network that realizes the function expressed by Eq. (1) is now constructed. An example of the coupling structure between neural elements is shown in Fig. 3(d). The state of an element in a higher layer is determined as a function of the states of 4 adjacent neural elements in the layer below. A concrete method for determining the element states is described below. The state of the element at the two-dimensional position r = (x, y) on layer ℓ is denoted by fℓ(r). The coupling between elements on adjacent layers is then represented by the following state equation:
- Eq. (2) can be rearranged as follows;
- For sufficiently large ℓ, the stationary solution PℓS of Eq. (4) is given by:
- Using the probability distribution of Eq. (5), the relationship between the element states of different layers expressed by Eq. (2) is now defined as follows:
This relation defines the feature extracting circuit 21: the average value of the states of the elements in a lower layer is propagated to the upper layer.
- The procedure described above is a process called the renormalization group transformation. It is more convenient to express this relationship in the frequency domain q than in the form of Eq. (6).
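The averaging step of the feature extracting circuit can be sketched with NumPy: each upper-layer element takes the mean of a 2x2 block of four adjacent lower-layer elements. The sketch also covers the pure-averaging case (λ = 1, Tℓ = 4Tℓ−1) discussed for Eq. (10), in which the spatial variation shrinks layer by layer until only a uniform signal remains. The function names and the random ±1 input layer are illustrative assumptions; the non-linear terms of the actual state equation are omitted.

```python
import numpy as np


def block_average(f):
    """One renormalization step: each upper-layer element is the mean of a 2x2 block below."""
    h, w = f.shape
    if h % 2 or w % 2:
        raise ValueError("layer dimensions must be even")
    return f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def layer_spreads(f):
    """Spatial standard deviation of each layer when pure averaging is repeated up to 1x1."""
    spreads = [float(f.std())]
    while min(f.shape) > 1:
        f = block_average(f)
        spreads.append(float(f.std()))
    return spreads


rng = np.random.default_rng(0)
spreads = layer_spreads(rng.choice([-1.0, 1.0], size=(64, 64)))
print([round(s, 3) for s in spreads])
```

Because block means can never vary more than the data they average, the printed spreads never increase, and the last (1x1) layer carries no spatial information at all.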
- The energy H({fℓ}) represents the coupling relation among elements in the layer ℓ. In general, H can be written as follows;
- Eq. (10) shows that the temperature schedule Tℓ = 4Tℓ−1 corresponds to a critical value. At this value λℓ−1 = 1, which corresponds to a simple averaging operation: as the signal propagates through the layers, its spatial distribution becomes more and more uniform until only a uniform signal remains. This is an extreme smoothing process in which all the information is lost. Therefore, introducing a very small quantity ε ≡ 4 − (Tℓ/Tℓ−1), the non-linear terms in Eq. (a) are retained. Calculating Eq. (8) in this way yields two equations expressing the relationship among the coefficients:
- For large ℓ, the coefficients are given concretely by:
- From the probability distribution of Eq. (5) and H given by Eq. (13), it follows that the Fourier component Fℓ−1(q) at the frequency q = √|rℓ−1| gives the maximum probability. That is, in layer ℓ−1 only the component Fℓ−1(√|rℓ−1|) is extracted. When the initial value r1 of rℓ is given, rℓ approaches the value given by Eq. (13), relative to the maximum frequency Λ, in the order of:
- It has been confirmed that the physiological experimental facts described previously can be simulated with the feature extracting network constructed as above. The light slit is a visual object that is symmetric about a point. A copy 372 of the light slit 371, oriented at a given angle with respect to a reference direction (e.g., the vertical), is prepared as indicated in Fig. 3(q). The group of slits including this slit can then be defined uniquely as a periodic function in the circumferential direction. The slits prepared in this way are input to the network described above. If the frequency q in Eq. (13) is regarded as the frequency in the circumferential direction, specified frequencies can be extracted successively.
- Primitive information extracted by the feature extracting network 414, e.g., the contour of a figure, is input to a feature unifying network (Fig. 4(a)). Fig. 4(a) shows, as an example of the feature unifying network, the process of unifying information in 3 layers.
- Each neural element 417 in the first layer 413 carries one item of primitive information. Since these items are mutually exclusive, all neural elements in the first layer are coupled to each other with negative values. That is, if the element corresponding to one item of primitive information is in the ignition (firing) state, the other elements should be in the pause state. Of course, when several items of primitive information are input simultaneously, the corresponding elements are all turned to the ignition state, so no negative coupling is needed among them. In general, couplings among elements in the first layer are not always necessary.
- The neural elements 416 in the second layer 412 are made to correspond to information constructed from the primitive information of the first layer, e.g., figures. They are therefore coupled with a positive value 418 to the first-layer elements corresponding to the sides constituting each figure, and with a negative value 419 to the other elements. Since the figures are mutually exclusive, the elements in the second layer are coupled negatively to each other.
- The neural elements 415 in the third layer 411 are made to correspond to higher-order information constructed from the information of the second layer, e.g., composite figures. They are therefore coupled positively to the second-layer elements corresponding to the figures constituting each composite figure and negatively to the other elements.
- The feature unifying process described above has not been confirmed physiologically, and many alternatives are conceivable. For example, although a 3-layer neural network is used in this example, it may consist of 2 or 4 layers, depending on the object. Further, each neural element may represent one item of information, or one item of information may be distributed over several elements.
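The coupling rule between the first and second layers can be sketched as a small matrix: +1 from each side that constitutes a figure, −1 from every other side. The side and figure names below are hypothetical, and taking the maximum score stands in for the mutual negative couplings that make the second-layer figures exclusive; it is a simplification, not the network dynamics of the text.

```python
import numpy as np

# Hypothetical primitives (first layer) and figures (second layer).
SIDES = ["horizontal", "vertical", "diagonal"]
FIGURES = {
    "square": {"horizontal", "vertical"},
    "triangle": {"horizontal", "diagonal"},
}


def coupling_matrix(figures, sides):
    """Row per figure: +1 from each constituent side, -1 from all other sides."""
    return np.array(
        [[1.0 if s in parts else -1.0 for s in sides] for parts in figures.values()]
    )


def unify(x_first, figures=FIGURES, sides=SIDES):
    """Return the winning second-layer figure for first-layer states x (+1 ignition, -1 pause).

    argmax stands in for the mutual negative couplings among second-layer elements.
    """
    W = coupling_matrix(figures, sides)
    scores = W @ np.asarray(x_first, dtype=float)
    return list(figures)[int(np.argmax(scores))]
```

For example, `unify([1, 1, -1])` (horizontal and vertical sides firing) selects "square". A third layer for composite figures would repeat the same construction one level higher.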
- Fig. 4(b) is a conceptual diagram for concretely calculating the states of the neural elements in each layer. The state of a marked element i in a layer 421 is denoted by xi 422. The variable xi (i = 1, 2, ..., N) is either +1 (ignition state) or −1 (pause state). The input to the marked element is the sum of a signal from an element j 423 in the same layer and a signal from an element k 424 in another layer. Since the former generally has an inhibitory effect, it has a coupling −W¹ij (< 0) 425, while the latter has a coupling W²ik 426, which can be either positive or negative. That is, the total input can be written as follows:
- A method that gives the element states as the state minimizing the energy function of Eq. (16) is disclosed in "Computing with neural circuits" by Hopfield & Tank (Science, Vol. 233, pp. 625-633, 1986). However, that method does not deal with neural elements in multiple layers as in the present invention; it considers only elements in a single layer. With the formulation of Eq. (16), the states are not calculated successively from the lowest layer as described previously; instead, the states of the elements in all layers can be calculated in parallel. The formulation of Eq. (16) is therefore an algorithm suited to parallel computation.
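The total input to a marked element in Fig. 4(b) can be sketched as the plain sum of an inhibitory same-layer term and an inter-layer term, ui = −Σj W¹ij xj + Σk W²ik xk, followed by a threshold that yields the ignition (+1) or pause (−1) state. The exact form of the total-input equation and of Eq. (16) is not reproduced in this text, so this form and the function names are assumptions.

```python
import numpy as np


def total_input(i, x_same, x_other, W1, W2):
    """Total input to element i.

    W1 holds the non-negative magnitudes of the same-layer couplings (the coupling is -W1[i, j]);
    W2 holds the inter-layer couplings, which may have either sign.
    """
    return -W1[i] @ x_same + W2[i] @ x_other


def next_state(u):
    """+1 (ignition) if the total input is positive, otherwise -1 (pause)."""
    return 1 if u > 0 else -1
```

Applying `next_state(total_input(i, ...))` to every element at once corresponds to the parallel, all-layers-together update described above.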
- In practice it is very difficult to obtain the minimum of the energy of Eq. (16): since the state xi is two-valued (±1), many local minima appear and the true smallest value cannot readily be found. Against this background, Kirkpatrick, Gelatt and Vecchi devised the simulated annealing method, published in "Optimization by simulated annealing" (Science, Vol. 220, pp. 671-680, 1983), an iterative probabilistic method for finding the smallest value. The essential point of that method is that, by introducing a parameter called the temperature, fluctuations are given to the state so that it can escape from a local minimum. Hopfield and Tank further found that a lower energy can be obtained if the problem in the discrete variables xi = ±1 is transformed into a problem in continuous variables yi (−∞ < yi < ∞) through the transformation xi = tanh(yi/constant). A disadvantage of this method is that it takes a very long time. A minimum and maximum searching method improved in this respect is described later.
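The continuous relaxation of Hopfield and Tank can be sketched with Euler integration. The dynamics dy/dt = −y + W x with x = tanh(y/gain) (unit time constant, no external input) are an assumed standard form, since Eq. (16) is not reproduced here. The function name and step sizes are illustrative. The many small integration steps are what make this approach slow.

```python
import numpy as np


def hopfield_tank(W, y0, gain=0.1, dt=0.05, steps=2000):
    """Euler integration of dy/dt = -y + W x, with x = tanh(y / gain).

    Minimal sketch of continuous Hopfield-Tank relaxation; returns the
    continuous states x, which approach +-1 when the gain constant is small.
    """
    y = np.asarray(y0, dtype=float).copy()
    for _ in range(steps):
        x = np.tanh(y / gain)
        y += dt * (-y + W @ x)
    return np.tanh(y / gain)


# Two mutually excitatory elements settle into the same state.
x_ferro = hopfield_tank(np.array([[0.0, 1.0], [1.0, 0.0]]), [0.1, 0.2])
```

Rounding the result with `np.sign` recovers the two-valued states xi = ±1.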
- Minimization of the energy of Eq. (16) is not restricted to the method described above; the following method, for example, is an alternative. Unlike the method above, it does not require the tanh function to make the variables continuous, so it has excellent computational properties. In the simulated annealing approach, maximization of the probability exp(−E/T), T being a positive parameter, is considered instead of minimization of the energy. Introducing continuous variables zi (−∞ < zi < ∞), this probability can be rewritten as follows:
- When the kernel of the integral in Eq. (17) is regarded as a function of xi, its smallest value is clearly obtained at xi = −θ[Σj zj (W^(1/2))ji]. Further, regarded as a function of zi, the kernel is a quadratic function opening downward, so zi = Σj (W^(1/2))ij xj gives its greatest value. Here θ is a step function that is 1 if its argument is positive and −1 if it is negative. Consequently,
- Fig. 4(c) shows the relation between the original neural network, in which the neural element state xi 431 is coupled with an adjacent element xj 432 through Wij 433, and the equivalent network, in which the continuous variable zi 434 represents the element states according to Eq. (17). All coupling constants in the equivalent network are 1. The element state xi 439 is determined from the variable zi 434, calculated with the equivalent network, through a convolution operation 437 with the square root W^(1/2) of the coupling constants and a comparison operation 438.
- The advantage of the equivalent circuit thus constructed is that the computation (CPU) time is short, because there is no need to introduce the tanh function to make the variables continuous, as Hopfield and Tank did. Further, since the kernel of Eq. (17) is a quadratic function of zi, an approximate value of the state zi giving its extremum can be estimated in advance; and since there are no local extrema, convergence from the initial state to the optimal state can be estimated. In the original minimization problem, since xi is two-valued, it was extremely difficult to choose an initial value because countless local minima appeared; in many cases no suitable initial value could be determined and the state of smallest value could not be obtained.
- A concrete algorithm for realizing the method described above is shown in Fig. 4(d).
- ① Start of the calculation.
- ② The square root of the given coupling matrix W is obtained. As an example, xij = (W^(1/2))ij is determined by solving Σk xik xkj = Wij (block 441).
- ③ The initial value of the continuous variable zi (i = 1, 2, ..., N) is set (block 442).
- ④ From zi, the neural element state xi is determined by xi = −θ[Σj zj (W^(1/2))ji] (block 443). Here θ is a step function that is 1 if its argument is positive and −1 otherwise.
- ⑤ From the xi determined in ④, the zi that maximizes the kernel of Eq. (17) is calculated, e.g., by a Monte Carlo method (block 444).
- ⑥ Convergence is judged. If not converged, ④ and ⑤ are repeated; if converged, the process proceeds to the next step (block 445).
- ⑦ Termination of the calculation.
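The steps of Fig. 4(d) can be sketched in NumPy under several assumptions. First, W is taken to be symmetric positive semidefinite, so that a real symmetric square root exists (computed here by eigendecomposition). Second, step ⑤ uses the closed-form maximizer zi = Σj (W^(1/2))ij xj stated for Eq. (17) instead of a Monte Carlo search. Third, convergence is judged as "x unchanged". Because all states are updated at once, the loop can cycle between two states, so it is capped by `max_iter`; no convergence guarantee is implied. Function names are illustrative.

```python
import numpy as np


def sqrt_coupling(W):
    """Step 2: symmetric square root S with S @ S = W (W assumed symmetric PSD)."""
    lam, V = np.linalg.eigh(W)
    if np.any(lam < -1e-12):
        raise ValueError("W must be positive semidefinite for a real square root")
    return (V * np.sqrt(np.clip(lam, 0.0, None))) @ V.T


def theta(a):
    """Step function of the text: +1 if the argument is positive, otherwise -1."""
    return np.where(a > 0, 1.0, -1.0)


def equivalent_network(W, z0, max_iter=100):
    S = sqrt_coupling(W)                  # step 2 (block 441)
    z = np.asarray(z0, dtype=float)       # step 3 (block 442)
    x = -theta(S.T @ z)                   # step 4 (block 443)
    for _ in range(max_iter):
        z = S @ x                         # step 5: closed-form maximizer (block 444)
        x_new = -theta(S.T @ z)           # step 4 again
        if np.array_equal(x_new, x):      # step 6: convergence check (block 445)
            break
        x = x_new
    return x, z, S
```

A sequential (one element at a time) update is a common alternative when the simultaneous update oscillates.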
- High-order information, such as figures unified by the feature unifying network, is stored in a neural network as shown in Fig. 5(a). An input pattern 514 of the high-order information is fed to the lowest (input) layer and propagated to the upper layers, so that an output pattern 515 is produced by the highest (output) layer. The neural elements 512 in each of the layers 511 are arranged one- or two-dimensionally, corresponding to the input pattern. Each element is assumed to take only 2 values: 1 in the ignition state and −1 in the pause state. A multi-valued state can be handled by increasing the number of elements.
- The principal function of the memory network is to store the relation between input and output patterns by learning 518. For example, it may store into which class an input pattern falls, as in classification (the output patterns representing classes), or the relation between handwritten letters and the corresponding correct letters, as in handwriting recognition. Learning can also be used to control appropriately a control object whose behavior is unknown.
- Such a method for training a multi-layered neural network has already been developed in "Parallel Distributed Processing I and II" (MIT Press, 1986). However, because of the following disadvantages, its practical use is limited to a narrow field.
- In the prior-art method disclosed in that publication, synapse couplings are spread over all the elements so that memory is distributed over all the couplings. Each synapse therefore carries only a small amount of information, so that even if incomplete information is given, the complete information can be recalled as an associative memory. However, since the time needed to modify the synapse couplings during learning is proportional to their total number, an enormous computation time is required, which makes this structure undesirable in practice.
- The prior-art learning algorithm disclosed in that publication is back-propagation. In this method, an appropriate initial value is first set for the synapse couplings. Based on these initial couplings, the states of the elements are calculated layer by layer from the lowest to the highest. Since the output generally differs from a teacher pattern 516, the difference 517 between them is obtained, and the synapse couplings are modified 519 so as to reduce the difference. This procedure is repeated until the difference becomes 0. Such a method, whose essence is feedback, is intuitively understandable and easy to program, but it is not efficient in terms of computation time.
- The memory described above, which relies on the plasticity of synapse couplings, corresponds physiologically to long-term memory. Psychological experiments have shown that, apart from long-term memory, there exists a short-term memory for which no synaptic plasticity is presumed. Applying the two memory mechanisms together in technology holds the possibility of a memory method with higher performance than the prior art.
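The prior-art back-propagation loop (forward pass, difference 517 from the teacher pattern 516, modification 519 of the couplings) can be sketched for a small two-layer network. This is a generic textbook version with tanh elements and squared error, not the invention's method; the function name, layer sizes, learning rate and the XOR task with ±1 states are illustrative assumptions.

```python
import numpy as np


def train_backprop(X, D, n_hidden=4, lr=0.1, epochs=3000, seed=0):
    """Back-propagation for a 2-layer tanh network; returns parameters and loss history."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, D.shape[1]))
    b2 = np.zeros(D.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass: states computed from the lower layer to the upper layer.
        H = np.tanh(X @ W1 + b1)
        Y = np.tanh(H @ W2 + b2)
        # Difference from the teacher pattern.
        E = Y - D
        losses.append(0.5 * np.mean(np.sum(E ** 2, axis=1)))
        # Backward pass: propagate the difference downward through the tanh derivatives.
        dY = E * (1.0 - Y ** 2) / X.shape[0]
        dH = (dY @ W2.T) * (1.0 - H ** 2)
        # Modify every coupling so as to reduce the difference.
        W2 -= lr * (H.T @ dY)
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses


X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
D = np.array([[-1.0], [1.0], [1.0], [-1.0]])  # XOR with +-1 states
params, losses = train_backprop(X, D)
```

Every coupling is updated on every iteration, which illustrates why the computation time grows with the total number of synapses.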
- Against this background, the present invention provides a new memory method that is a variation of the above. Long-term memory and short-term memory are discussed separately below.
- In a neural network, data or patterns are not stored as they are; they are distributed over the network and stored as the values of the synapse couplings. That is, they are encoded in a distributed manner. Suppose N data items Ii (i = 1, 2, ..., N) are given as an input pattern. In general, the input pattern may be one- or two-dimensional and two-valued or multi-valued. Since multi-valued input data Ii can be transformed into two-valued data by increasing the number of data items, Ii is assumed below to be two-valued.
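The text does not specify how multi-valued data are expanded into two-valued data. One hypothetical choice is a thermometer code: a value v in {0, ..., levels−1} becomes levels−1 elements, the first v of which are in the ignition state (+1) and the rest in the pause state (−1).

```python
import numpy as np


def to_two_valued(values, levels):
    """Thermometer code: value v in {0, ..., levels-1} -> levels-1 elements of +-1.

    Element k is +1 if k < v, otherwise -1 (hypothetical encoding choice).
    """
    values = np.asarray(values)
    k = np.arange(levels - 1)
    return np.where(k[None, :] < values[:, None], 1, -1)
```

For example, with 4 levels the value 2 becomes [1, 1, -1]. The number of elements grows linearly with the number of levels, which matches the remark that multi-valued data are handled by increasing the number of elements.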
- The process by which the input pattern Ii 514 propagates toward the upper layers can be formulated as follows. A representative choice for the non-linear function F, written in the form f = F(Tx), is a sigmoid function with a threshold whose output f saturates at ±1. Denoting the output of the elements in layer ℓ by fi(ℓ), the following relation holds:
- Memorization by learning therefore consists in determining the synapse couplings Wij(ℓ) (ℓ = 2, 3, ..., L) so that the output fi(L) of Eq. (20) equals the teacher pattern di 516. However, Eq. (20) is a system of N equations (assuming there are also N outputs), and if all elements are coupled with each other there are N²(L−1) unknown coefficients Wij(ℓ). That is, the unknowns are highly redundant. The amount of information per synapse can be estimated as N/(N²(L−1)) = 1/(N(L−1)). If N or L is large, each synapse carries only a little information, so flexible processing such as associative memory is possible. In a real cerebrum, however, N exceeds 10 billion, so this ratio is practically 0. Further, in the cerebrum not all neural elements are coupled with each other by synapses. This suggests that the synapse couplings have some kind of structure. Moreover, if the cerebrum is regarded as a uniform neural network, it is inconceivable that an object-dependent structure, depending on the kind of input given, is formed in it.
- According to the present invention, the optimum structure of the synapse couplings is determined on the basis of cerebral physiological knowledge. In prior-art neural networks aimed at technological applications, every synapse coupling is changed by learning, but the information carried by each synapse is, on average, completely uniform. That is, Wij(ℓ) does not depend strongly on the indices i and j. For this reason all synapses must be modified, which makes such networks hard to use in practice because of the computation time. At present, cerebral physiological experiments have not yet clarified the detailed structure of synapse couplings; only a statistical, i.e., macroscopic, structure is understood. 
However, since the macroscopic structure is determined by the microscopic synapse structure, the latter can be inferred by the method described below.
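The layer-to-layer propagation and the synapse count above can be made concrete. Eqs. (19) and (20) are not reproduced in this text, so the sketch assumes fi(ℓ) = F(Σj Wij(ℓ) fj(ℓ−1)) with F = tanh as a ±1-saturating sigmoid. The counting functions follow the N²(L−1) and 1/(N(L−1)) estimates stated above; all names are illustrative.

```python
import numpy as np


def forward(I, weights):
    """f(1) = I; f(l) = tanh(W(l) f(l-1)) for l = 2..L (assumed form of Eq. (20))."""
    f = np.asarray(I, dtype=float)
    for W in weights:
        f = np.tanh(W @ f)
    return f


def n_unknowns(N, L):
    """Fully coupled network: N*N couplings between each of the L-1 pairs of adjacent layers."""
    return N * N * (L - 1)


def info_per_synapse(N, L):
    """N output conditions shared among N^2 (L-1) couplings: 1 / (N (L-1))."""
    return N / n_unknowns(N, L)


rng = np.random.default_rng(0)
out = forward(np.ones(5), [rng.normal(size=(5, 5)) for _ in range(2)])
```

For N = 100 elements per layer and L = 3 layers there are 20,000 couplings, each carrying on average 1/200 of an output condition.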
- Fig. 5(b) shows an example of the macroscopic structure obtained experimentally, according to "A quantitative Study of Synaptic Reorganization in Red Nucleus Neurons after Lesion of the Nucleus Interpositus of the Cat" by Murakami et al. (Brain Research, vol. 242, pp. 41-53, 1982). The upper figure shows how the number T 522 of synapses bearing degenerating terminals of cerebrum-to-red-nucleus synapses varies as a function of the diameter R 524 of the tree-like protrusion (dendrite). The lower figure shows how the diameter R 524 of the dendrite varies as a function of the distance x 523 from the center of the cell.
- The upper figure yields the relation T ~ R^(−α) (α = 3) and the lower figure the relation R ~ x^(−β) (β = 1), where ~ denotes proportionality. These results are statistical quantities obtained from many samples. At first glance the two figures seem independent of each other. However, as a consequence of the structure of the synapse couplings, there is a close relation between them. By showing this, the optimum structure of the synapse couplings can be inferred.
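Exponents such as α and β in T ~ R^(−α) and R ~ x^(−β) are read off as slopes on log-log axes. The sketch below fits them by least squares. The data are synthetic values generated exactly from the stated laws, not the experimental data of Murakami et al. Composing the two laws algebraically gives T ~ x^(αβ). The function name is illustrative.

```python
import numpy as np


def power_law_exponent(u, v):
    """Least-squares slope of log v against log u, i.e. p in v ~ u**p."""
    slope, _intercept = np.polyfit(np.log(u), np.log(v), 1)
    return slope


R = np.linspace(1.0, 5.0, 20)
T = 7.0 * R ** -3.0          # synthetic data obeying T ~ R^-3
x = np.linspace(1.0, 10.0, 20)
R_of_x = 2.0 * x ** -1.0     # synthetic data obeying R ~ x^-1

alpha = -power_law_exponent(R, T)
beta = -power_law_exponent(x, R_of_x)
print(alpha, beta, alpha * beta)  # exponent of T ~ x**(alpha * beta)
```

With noisy measurements the same fit gives estimates of the exponents rather than exact values.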
- As shown in Fig. 5(c), the tree-like protrusion branches by bifurcation, so that n bifurcations yield $2^n$ protrusions 531 in total. This mode of branching is observed quite generally. The signal propagating in the tree-like protrusion is a pulse train, which is the medium carrying information. Consequently, the information transmission efficiency should differ from branch to branch. For example, if the cross-section of the tree-like protrusion is approximated by a circle, the transmission efficiency depends on the diameter R.
- Now let $R_n$ 532 denote the diameter of a branch generated by n bifurcations and $R_{n+1}$ 533 the diameter of a branch generated by the next bifurcation. The information flowing through these branches is denoted by $I_n$ 534 and $I_{n+1}$ 535, respectively. Here the flow of information is replaced by a flow of electric current in an equivalent circuit, and the energy consumed in a branch is considered. The energy, corresponding to electric power, is proportional to $I_n^2/R_n^2$, since the resistance is inversely proportional to the cross-sectional area $4\pi R_n^2$ of the protrusion. The volume occupied by this branch is $4\pi R_n^2 \times$ (length of the branch), and it is conceivable that the living body keeps this volume as small as possible. Summing this energy and this volume over all the branches gives;
- Denoting by $R_0$ the diameter of the initial tree-like protrusion leaving the cell, $R_n = R_0/\sqrt{2^n}$ after n bifurcations.
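The diameter rule $R_n = R_0/\sqrt{2^n}$ can be checked numerically. The sketch below assumes, as a simplification of the summed energy-plus-volume criterion, that each branch independently minimizes $I^2/R^2 + \lambda R^2$ (dissipation plus volume per unit length, with constant factors absorbed into a hypothetical weight $\lambda$) and that the information halves at every bifurcation, $I_n = I_0/2^n$. Setting the derivative to zero gives $R^4 = I^2/\lambda$, i.e. $R^2 \propto I$, so $R^2$ halves at each bifurcation. The code confirms this with a ternary search instead of the closed form; the function name is illustrative.

```python
def optimal_diameter(info: float, lam: float = 1.0) -> float:
    """Minimise the per-branch cost info^2 / R^2 + lam * R^2 over R > 0.

    The cost is strictly convex in R > 0, so ternary search converges to its minimiser.
    """
    def cost(r: float) -> float:
        return info**2 / r**2 + lam * r**2

    lo, hi = 1e-6, 1e6
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


I0 = 1.0
R0 = optimal_diameter(I0)
for n in range(6):
    Rn = optimal_diameter(I0 / 2**n)      # information halves at each bifurcation
    print(n, round(Rn / R0, 4), round(2 ** (-n / 2), 4))
```

The two printed columns agree, which is the relation used for formulas (22) to (24).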
- Although formula (21) is a hypothesis about the microscopic structure of the synapse coupling, it will be shown below that the formulas (22) to (24) deduced from this hypothesis explain the physiological experimental facts of Fig. 5(b). Consider a sphere of diameter L, as shown in Fig. 5(c), and calculate the total number Q of tree-like protrusions that reach its surface from cells inside it. Assuming that the cells are distributed uniformly in the sphere, this number can be calculated using formula (24) as follows;
- On the other hand, another formula for Q can be deduced from a different point of view. The total number Q of tree-like protrusions clearly depends on the diameter L of the sphere and on the number T of synapse couplings at its surface. This relation is expressed generally by;
- Now, experimentally, the number T of synapses depends on the distance x from the center of the cell as $T \sim R^{-2} \sim x^{2\beta}$. Using formula (24) and rearranging the relation with $Q \sim x^{2\beta}$, another relation, $Q \sim T^{\alpha}$, is obtained. Although this has been derived for one tree-like protrusion, it is assumed to hold similarly for an assembly of many tree-like protrusions. The unknown function f in Eq. (28) then behaves as $f \sim L^{\alpha}T^{\alpha}$. Consequently, Eq. (28) is transformed into;
- Since information is identified with electric current, it is proportional to the cross-sectional area $4\pi R_n^2$ of the tree-like protrusion serving as the transmission medium (formula (22)), so that each bifurcation passes on 1/2 of the information. Fig. 5(d) shows how the ratio of the amount of information 542 that can be transmitted varies with the number of branchings 541. After 6 branchings it falls to the order of 1% of the initial amount, which means that information transmission is practically impossible; after about 3 branchings it is already reduced to about 10%. In practice, therefore, it is sufficient to take 3 to 4 branchings into account. For example, when a neural element 545 is coupled with elements in the next upper layer 543, it is sufficient to consider only its couplings with the group 544 of $2^4 = 16$ elements around the element located directly above element 545. Here it is assumed that a tree-like protrusion bifurcates from the central element toward the nearby elements one after another, in order of increasing distance.
- Since the amount of information transmitted by the tree-like protrusion corresponds to the magnitude of the synapse coupling $W_{ij}$ (Eq. (16)) in an artificial neural network, the magnitude of $W_{ij}$ should be varied according to the table shown in Fig. 5(d). For example, taking as reference the coupling between elements at the same position in two adjacent layers, $W_{nn(i),i}/W_{i,i} = 0.5$, $W_{sn(i),i}/W_{i,i} = 0.25$, etc., where nn(i) denotes the element nearest to element i and sn(i) the second nearest. Of course, if memory is taken into account, the synapse coupling should be modified by learning. However, the degree of modification is presumably small, so the ratios above can be expected not to change much. A concrete learning rule is considered below.
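The halving rule behind Fig. 5(d) and the coupling ratios above can be tabulated in a few lines. The sketch assumes that the pattern $W_{nn(i),i}/W_{i,i} = 0.5$, $W_{sn(i),i}/W_{i,i} = 0.25$ continues as $0.5^k$ for the k-th nearest element; that extrapolation and the function names are illustrative, not read off the figure.

```python
def transmitted_fraction(n_branchings: int) -> float:
    """Fraction of the initial information left after n bifurcations (halved each time)."""
    return 0.5 ** n_branchings


def coupling_ratio(rank: int) -> float:
    """Assumed W_{k(i),i} / W_{i,i} for the rank-th nearest element
    (rank 0: element directly above, 1: nn(i), 2: sn(i), ...)."""
    return 0.5 ** rank


for n in range(7):
    # branchings, elements reached (2^n), remaining information in percent
    print(n, 2 ** n, f"{100 * transmitted_fraction(n):.2f}%")
```

At n = 3 the remaining fraction is 12.5 %, and at n = 6 it is about 1.6 %, the orders of magnitude quoted above; n = 4 corresponds to the group of 16 elements.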
- The back propagation method, a prior art learning method, is discussed in detail in "Parallel Distributed Processing I and II" (MIT Press, 1986). Its basic idea is to determine the synapse couplings successively from the upper layer to the lower layer so that the following square error e, computed from the output $f_i^{(L)}$ given by Eq. (20), is minimized;
- Hereinbelow the processing procedure will be explained, referring to Fig. 5(e).
- ① Start of calculation
- ② Set initial values for the states of elements $f_i^{(\ell)}$ ($\ell = 1, 2, \ldots, L$) and the synapse couplings $W_{ij}^{(\ell)}$ ($\ell = 2, 3, \ldots, L$). (Block 551)
- ③ The states of elements $f_i^{(\ell)}$ are either calculated from the given input, layer by layer from the lower layer to the upper layer, according to Eq. (19), or determined by executing the minimization of formula (17). (Block 552)
- ④ The structure of the synapse coupling corresponding to the number of branchings is determined as shown in Fig. 5(d), and the coupling constants $W_{ij}^{(\ell)}$ of these synapses are corrected successively from the upper layer to the lower layer so as to minimize formula (32). (Block 553)
- ⑤ Convergence is judged. If it has not converged, ③ and ④ are repeated; if it has converged, the process proceeds to the following step ⑥, END. (Block 554)
- ⑥ End of calculation
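Steps ① to ⑥ can be sketched as ordinary gradient-descent back propagation in which only the couplings admitted by the branching structure exist. In this Python/NumPy sketch the elements of each layer lie on a line, an element couples only to elements of the layer below within a hypothetical `radius`, initial weights follow the assumed $0.5^{\text{distance}}$ ratios, and biases are omitted. The patent's own formulas (17), (19) and (32) are not reproduced, so the sigmoid network and the square error used here are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def distance_matrix(n):
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :])


def local_mask(n, radius):
    """1 where element i (upper layer) is coupled with element j (lower layer), else 0."""
    return (distance_matrix(n) <= radius).astype(float)


def init_weights(n, radius, w0=0.5):
    """Step 2: W_ij = w0 * 0.5**|i - j| inside the coupling window, 0 outside."""
    return w0 * 0.5 ** distance_matrix(n) * local_mask(n, radius)


def train(x, d, n_layers=3, radius=2, lr=0.2, tol=1e-4, max_iter=5000):
    """Masked back propagation; returns the weights and the last square error."""
    n = x.shape[1]
    mask = local_mask(n, radius)
    weights = [init_weights(n, radius) for _ in range(n_layers - 1)]
    for _ in range(max_iter):
        # Step 3: states from the lower layer to the upper layer.
        states = [x]
        for W in weights:
            states.append(sigmoid(states[-1] @ W.T))
        err = 0.5 * np.sum((states[-1] - d) ** 2)
        if err < tol:                                   # Step 5: convergence test
            break
        # Step 4: correct couplings from the upper layer to the lower layer,
        # only where a coupling exists (mask == 1).
        delta = (states[-1] - d) * states[-1] * (1.0 - states[-1])
        for layer in range(len(weights) - 1, -1, -1):
            grad = delta.T @ states[layer]
            if layer > 0:
                delta = (delta @ weights[layer]) * states[layer] * (1.0 - states[layer])
            weights[layer] -= lr * grad * mask
    return weights, err


rng = np.random.default_rng(0)
x = rng.random((4, 6))                  # 4 input patterns, 6 elements per layer
d = 0.1 + 0.8 * rng.random((4, 6))      # teacher patterns d_i
_, err_initial = train(x, d, max_iter=1)
weights, err_final = train(x, d, max_iter=2000)
print(err_initial, err_final)
```

Because every update is multiplied by the mask, couplings outside the window stay exactly zero, so only a band of at most 2 * radius + 1 couplings per element is ever modified.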
- Apart from the learning method considered above, there are alternative proposals, as described below.
- The prior art back propagation method and the method given by the algorithm in Fig. 5(e) are based on the assumption that all the synapse couplings $W_{ij}$ are independent of each other. With the learning method considered above, the coupling structure that is necessary and sufficient from the point of view of transmitted information was determined on the basis of physiological knowledge, and as a result the time necessary for learning could be shortened. In an artificial neural network, however, the number of synapse couplings can also be reduced from another point of view.
- Now, suppose that all the neural elements are coupled between different layers. The synapse couplings $W_{ij}^{(\ell)}$ are generated by another variable $\xi_i^{(\ell)}$; that is, they are generated by a variable of lower dimension instead of the original $N^2$ coupling variables. Denoting this dimension by M, it is assumed that;
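Generating an $N \times N$ coupling matrix from lower-dimensional variables can be sketched as a rank-M product. The form $W_{ij} = \sum_m \xi^{(\ell)}_{im}\,\xi^{(\ell-1)}_{jm}$ used below is an assumption standing in for the formula of the text; for M = 1 it reduces to the separable coupling $W_{ij}^{(\ell)} = \xi_i^{(\ell)}\xi_j^{(\ell-1)}$ that the text uses with Eq. (37). The sketch shows the reduction from $N^2$ free couplings to $2NM$ variables; the names are illustrative.

```python
import numpy as np


def couplings_from_factors(xi_upper: np.ndarray, xi_lower: np.ndarray) -> np.ndarray:
    """W_ij = sum_m xi_upper[i, m] * xi_lower[j, m]; both factors have shape (N, M)."""
    return xi_upper @ xi_lower.T


N, M = 200, 1
rng = np.random.default_rng(0)
xi_upper = rng.standard_normal((N, M))     # xi_i^(l)
xi_lower = rng.standard_normal((N, M))     # xi_j^(l-1)
W = couplings_from_factors(xi_upper, xi_lower)

print(W.shape)                                    # (200, 200)
print(N * N, xi_upper.size + xi_lower.size)       # 40000 couplings vs 400 variables
print(np.linalg.matrix_rank(W))                   # 1 (= M)
```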
- With the methods described so far, when a teacher pattern $d_i$ 516 is given, the synapse couplings are corrected successively from the upper layer to the lower layer so as to minimize the square error of formula (32). Apart from such an iterative method, there is another method by which the synapse couplings can be determined rapidly, namely analytically.
- The function F in Eq. (20) is a non-linear saturating function; a sigmoid function, for example, is used. Roughly speaking, the sigmoid function can be divided into saturated parts and an intermediate part that can be approximated linearly, as F = A + Bx. The synapse coupling is now determined assuming that all the elements operate in this linear part. In the present invention, a 3-layered neural network is considered as an example. Rewriting Eq. (20) with this approximation gives;
- In practice, since not all elements operate in their linear region, Eq. (36) does not hold for them, and they must be dealt with separately.
- Eq. (36) can also be used in another way. In general, with the back propagation method and other prior art techniques, better results are obtained when the initial values of the synapse couplings are generated from small random numbers. This is presumably because a network that starts in its most unstable state converges rapidly into a stable state. Eq. (36) can be used as such an initial value for the synapse coupling.
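The analytic route can be illustrated for a single layer. Around the origin the logistic sigmoid satisfies $F(x) \approx 1/2 + x/4$ (first-order Taylor expansion, so A = 1/2 and B = 1/4 here), and with that linearization the couplings follow from one linear least-squares solve instead of an iteration. This is a sketch under those assumptions, not a reproduction of Eq. (36), which the text derives for a 3-layered network; the function name is hypothetical, and the result could serve as the initial value mentioned above.

```python
import numpy as np

A, B = 0.5, 0.25            # sigmoid(z) ≈ A + B*z for small |z|


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def linearized_couplings(f_in: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Least-squares W with d ≈ A + B * (f_in @ W.T); rows of f_in and d are patterns."""
    w_t, *_ = np.linalg.lstsq(f_in, (d - A) / B, rcond=None)
    return w_t.T


rng = np.random.default_rng(1)
f_in = rng.random((20, 5))                        # 20 input patterns, 5 elements
W_true = 0.05 * rng.standard_normal((3, 5))       # small couplings: linear regime
d = sigmoid(f_in @ W_true.T)                      # teacher patterns from a sigmoid layer
W0 = linearized_couplings(f_in, d)
print(np.max(np.abs(W0 - W_true)))                # small, since all elements stay near z = 0
```

The further the elements are driven out of the linear part, the larger the mismatch, which is why the text treats such elements separately.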
- In the prior art methods, although the calculation algorithms differed, the calculation was divided into a part computing the states of the elements and a part modifying the synapse couplings by learning. Apart from the ease of intuitive understanding, however, there is no need to calculate them separately. An algorithm executing both simultaneously is shown here. Like the method based on Eq. (32), both the learning and the calculation of the element states are based on energy minimization in the sense of Hopfield described above. The energy taking both the element states and the learning into account is;
- A simpler formula is obtained by using the separable synapse coupling introduced previously, $W_{ij}^{(\ell)} = \xi_i^{(\ell)} \xi_j^{(\ell-1)}$, where $\xi_i^{(\ell)}$ represents a new variable 563, and substituting it for $W_{ij}^{(\ell)}$ in Eq. (37);
- In a word, the long term memory is a mapping of information onto synapse couplings, i.e. a coding. The short term memory stores information by a mechanism completely different from that of the long term memory. Fig. 5(g) shows the results of a psychological experiment on memory described in "Human Information Processing: An Introduction to Psychology" by Lindsay and Norman (1977). 19 subjects listened to lists of 30 unrelated words presented at a rate of 1 word per second. After each list, each subject was given 1.5 minutes to write down, in any order, all the words he or she could remember. 5 to 10 seconds after they had finished writing, a new list was presented. This procedure was repeated 80 times. The serial position curve 573 is obtained by plotting the recall ratio 571 against the serial position 572, i.e. the order in which the words were presented. A feature of this curve is that the recall ratio decreases up to about the 7th word counted back from the most recently presented word and remains almost constant for the words preceding it. The former part corresponds to the short term memory 575 and the latter part to the long term memory 574.
- The short term memory is a mechanism by which data given to a person are stored temporarily. Necessary data are selected from these according to some criterion, and only the selected data proceed to the long term memory. Memory in prior art technological applications was the long term memory, and no attention was paid to the short term memory. Consequently, the necessary data had to be selected from the given data before storing them; that is, the selection had to be treated separately as preprocessing outside the neural network. In contrast, a neural network having a short term memory mechanism can incorporate such a selection mechanism in the network.
- Further, the short term memory has an important aspect concerning the transition to the long term memory. As described previously, the synapse coupling in the short term memory has no plasticity but is constant. To be able to store information with unchanging synapse couplings, the synapse couplings must be determined according to some rule. Therefore, when information passes to the long term memory, if the coupling structure deviates remarkably from that rule, subsequent short term memory becomes impossible. That is, the approximate value of the synapse coupling is already determined at the stage of the short term memory. In the following, a model that can explain the psychological realization of the short term memory seen in the serial position curve will therefore be constructed, and at the same time the rule imposed on the synapse coupling will be elucidated.
- The behavior of the states of the neural elements can be described by Eq. (16) or Eq. (19). Since both equations are deduced from equivalent mechanisms, Eq. (16) is considered here. First, consider how short term memory can be expressed in Eq. (16). Since the synapse coupling $W_{ij}$ is invariant, the memory cannot be coded in $W_{ij}$ as it is for the long term memory. Therefore the short term memory is made to correspond to the minima of Eq. (16). Since the serial position curve 573 shows that about 7 states can be memorized, a correspondingly large number of minima, around that number, is necessary. This gives the condition to be imposed on the synapse coupling. Now suppose that $W_{ij}$ = const. (independent of i and j). When the number of elements is sufficiently large, $x_i = -1$ for all i or $x_i = 1$ for all i gives the smallest value of Eq. (16), depending on the sign of the threshold value θ. Considering the dynamic process of Eq. (16), there are no minima other than these, so the state in which all neural elements have the same value constitutes only a single memory. Consequently, the assumption $W_{ij}$ = const. cannot explain the serial position curve.
- The dynamic process toward the minimum of the energy is described by the following probabilistic equation.
- Here it is assumed that the synapse coupling $W_{ij}$ takes the following values (Fig. 5(h));
$$W_{ij} = \begin{cases} +1 & \text{for } p \text{ adjacent elements} \\ -1 & \text{for } r \text{ adjacent elements} \end{cases}$$
- Now consider, as in the experiment for obtaining the serial position curve, the situation in which words are presented to subjects one after another. The groups of neural elements 591 (Fig. 5(i)) corresponding to these words then switch to the firing state one after another. The groups of elements may of course overlap. In any case, the successive presentation of words corresponds to an increase in the number of elements coupled to each element (group), i.e. (p + r) increases. Therefore the behavior of the average state of the elements, $\langle x_i \rangle$, as a function of p + r will be examined.
- In general, since different values of $\langle x_i \rangle$ correspond to different stationary states, the number of stationary solutions of Eq. (41) equals the number of short term memories. The result, which is also reported in "Digital dynamics and the simulation of magnetic systems" by Choi and Huberman (Physical Review B, Vol. 28, pp. 2547-2554, 1983), is shown in Fig. 5(j). When the number of elements (p+r) 5101 is small, there exists only one stationary state 5102. As (p+r) increases, bifurcations take place, and the number of stationary states increases like a tree 5103, as 2, 4, 8, and so forth. However, when (p+r) increases further beyond the value giving 8 states, there are no stationary states, and a chaotic state 5104 arises. That is, this neural network has no more than 8 stationary states (energy minima).
- From the result described above, it was recognized that the short term memory can be realized by a neural network with positive and negative random synapse couplings. Further, it can be understood from this consideration that almost random synapse couplings may also be presumed for the long term memory.
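The contrast between constant and mixed-sign couplings can be checked by brute force on a small network. This sketch assumes that Eq. (16) has the usual Hopfield form $E = -\frac{1}{2}\sum_{i \ne j} W_{ij}x_i x_j + \theta\sum_i x_i$ with $x_i = \pm 1$, which is not reproduced in this part of the text, and counts the states from which no single-element flip lowers E. It does not reproduce the stochastic dynamics of Eq. (41) or the bifurcation diagram of Fig. 5(j); sizes and the random seed are arbitrary.

```python
import itertools

import numpy as np


def energy(x, W, theta):
    """Assumed Hopfield-type energy: -1/2 * sum_ij W_ij x_i x_j + theta * sum_i x_i (W_ii = 0)."""
    return -0.5 * x @ W @ x + theta * x.sum()


def count_local_minima(W, theta=0.0):
    """Number of states x in {-1, +1}^N from which no single-element flip lowers the energy."""
    n = W.shape[0]
    count = 0
    for bits in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(bits)
        e = energy(x, W, theta)
        stable = True
        for i in range(n):
            x[i] = -x[i]                          # trial flip of element i
            lower = energy(x, W, theta) < e - 1e-12
            x[i] = -x[i]                          # undo the flip
            if lower:
                stable = False
                break
        if stable:
            count += 1
    return count


N = 10
W_const = np.ones((N, N)) - np.eye(N)             # W_ij = const. (= 1), no self-coupling
rng = np.random.default_rng(0)
upper = np.triu(rng.choice((-1.0, 1.0), size=(N, N)), k=1)
W_mixed = upper + upper.T                         # symmetric random +1 / -1 couplings

print(count_local_minima(W_const))   # 2: all elements -1 or all +1
print(count_local_minima(W_mixed))
```

With $W_{ij}$ = const. only the two uniform states survive, as argued above; with random ±1 couplings the count can be larger, depending on the random draw.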
- Some embodiments of the present invention are explained below, grouped according to their field of application.
- 1. Recognizing problem
- 1.1 Recognition of moving images
- 1.2 Initial visual sensation
- 2. Control problem
- 2.1 Control of movement
- 2.2 Optimum control
- 3. Mathematical problem
- 3.1 Method for solving non-stationary partial differential equations
- In Fig. 2, a neural network for high order information processing was constructed starting from an orientation-selective extracting circuit of the visual cortex. Here a neural network for recognizing moving images is constructed by applying this network.
- According to psychological knowledge, when a person recognizes an object, concurrent or competitive interaction between the physical signals (images) of the body to be recognized and conceptions (images) is necessary. In many cases recognition is not possible with only one of them. That is, features are extracted from the many physical signals entering the system and unified, and a meaningful recognition is achieved by matching them with memory. Consequently, in addition to the general structure shown in Fig. 2, a feedback mechanism from the memory to the feature unification is necessary (Fig. 6(a)). Primitive features are extracted from the input by a feature extracting network 611, unified by a feature unifying network 612, and matched with the images stored in a memory network 613. Images … input image 623, the pixels where variations have taken place are marked. Further, since the direction cannot be judged from this alone, information on the approximate direction is provided separately.
- The linear direction of the movement is extracted from the image thus prepared by the feature extracting network 611 shown in Fig. 6. The feature unifying network 612 then determines the direction of the movement by treating the extracted linear directions as a group of continuous straight lines 631. In general, however, real movements are not as unnatural as the line 631 in Fig. 6(c). That is, conceptions about movement, concretely that a movement follows a smooth curve 632, etc., are incorporated in the memory network 613, and the smooth curve 632 is constructed by matching against them.
- Various processings in the initial visual sensation, such as recognition of the movement direction and recognition of depth, can be formulated as inverse problems, in which the solution is identified from the input data. That is, besides the system of equations that follows naturally from the problem under consideration, limiting conditions based on some a priori information are necessary, because the solution cannot be determined from the system of equations alone. Denoting the variable to be obtained by x and the input data by I, the problem can be formulated as the following minimization problem;
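The structure "equations from the problem plus a priori limiting conditions" can be sketched with the simplest possible case. Here the data term assumes that x is observed directly (an identity forward model), and the a priori condition is a smoothness penalty on first differences, i.e. standard Tikhonov-type regularization. Both choices, and the weight `lam`, are assumptions for illustration; the patent's own minimization formula is not reproduced. Because this energy is quadratic, its minimizer solves a linear system.

```python
import numpy as np


def regularized_estimate(data: np.ndarray, lam: float) -> np.ndarray:
    """Minimise ||x - data||^2 + lam * sum_i (x[i+1] - x[i])^2."""
    n = len(data)
    D = np.diff(np.eye(n), axis=0)            # (n-1) x n first-difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, data)


def roughness(v: np.ndarray) -> float:
    return float(np.sum(np.diff(v) ** 2))


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
noisy = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size)   # input data I
x = regularized_estimate(noisy, lam=5.0)
print(roughness(noisy), roughness(x))         # the a priori term makes x smoother
```

With `lam = 0` the data alone determine x; increasing `lam` trades fidelity to the data for agreement with the a priori condition.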
- In the following, to consider a concrete formulation, the sensation of movement direction is examined. Since the input image I (binary) is constant with respect to the movement direction and remains unchanged,
- Movement control (Fig. 7(a)) consists in determining the input u(t) to the system so that it follows the time-dependent target trajectory $x_d(t)$, as in the control of a robot manipulator.
- The explanation uses a robot manipulator as an example. Denoting the n-dimensional joint angle vector by θ, the equation of motion can be given by;
- However, in controlling a robot manipulator it is almost impossible to determine a perfectly precise model, because of the non-linearity of the dynamic characteristics, the uncertainty of the parameters, the non-linearity between the operational space and the joint space, etc. Consequently, the dynamic characteristics of the system must first be identified by means of the neural network. For this purpose, the time-dependent dynamic process of the neural network is defined as follows;
- Fig. 7(b) shows a control scheme that becomes a simple feedforward control once the neural network has learned and the learning is terminated. That is, as long as the output x of the robot manipulator differs from the target trajectory $x_d(t)$, a joint torque $u_T$ is calculated from the difference $x - x_d(t)$ by a feedback control rule. In addition, a joint torque $u_N$ is calculated from the input $x_d(t)$ by the neural network, which has not yet finished learning, and $u = u_T + u_N$ is applied to the manipulator as an external force. When the learning is completely terminated, $u = u_N$ ($u_T = 0$), so that the torque $u_N$ from the neural network is applied directly to the manipulator, and the control passes from feedback control to feedforward control.
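The scheme of Fig. 7(b) closely resembles what is known in the literature as feedback-error learning, and can be sketched on a hypothetical one-joint plant $m\ddot x + b\dot x = u$ tracking $x_d(t) = \sin t$. Assumptions: a PD rule for $u_T$, a "network" that is just a linear map of $(\ddot x_d, \dot x_d)$ so that its ideal weights are $(m, b)$, and a learning rule that uses $u_T$ as the error signal; gains, learning rate and step size are arbitrary. As learning proceeds, $u_N$ takes over and $u_T$ shrinks, i.e. control passes from feedback to feedforward.

```python
import math


def feedback_error_learning(m=1.0, b=0.5, kp=25.0, kd=10.0, eta=1.0,
                            dt=0.001, t_end=40.0):
    """Hypothetical 1-DOF plant m*x'' + b*x' = u tracking x_d(t) = sin(t).

    u = u_T + u_N: u_T is a PD feedback torque on the tracking error, u_N is the
    output of a linear 'network' u_N = w . (x_d'', x_d') trained with u_T as its
    error signal.  The ideal weights (perfect inverse model) are w = (m, b).
    """
    x, v = 0.0, 0.0                    # plant state: angle and angular velocity
    w = [0.0, 0.0]                     # untrained network
    u_t_log = []
    for k in range(int(round(t_end / dt))):
        t = k * dt
        xd, vd, ad = math.sin(t), math.cos(t), -math.sin(t)
        u_n = w[0] * ad + w[1] * vd                  # feedforward torque from the network
        u_t = kp * (xd - x) + kd * (vd - v)          # feedback torque from the error
        u = u_t + u_n
        w[0] += eta * u_t * ad * dt                  # learning driven by the feedback torque
        w[1] += eta * u_t * vd * dt
        v += (u - b * v) / m * dt                    # semi-implicit Euler step of the plant
        x += v * dt
        u_t_log.append(u_t)
    return w, u_t_log


def rms(values):
    return math.sqrt(sum(u * u for u in values) / len(values))


dt = 0.001
w, u_t_log = feedback_error_learning(dt=dt)
window = int(round(2 * math.pi / dt))     # one period of the target trajectory
print([round(v, 3) for v in w])           # approaches the ideal weights (m, b) = (1.0, 0.5)
print(rms(u_t_log[:window]), rms(u_t_log[-window:]))
```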
- As another method, the ε method proposed by Balakrishnan in optimum control theory can be used. That is, u is determined so as to minimize
- A useful application of the method described above to the optimum control problem is considered, making full use of the parallel processing power of the neural network. Since an optimum control problem can in general be formulated as the minimization (or maximization) of some evaluation function, this method can be applied to a wide range of problems. Here it is shown, as an example, that this method can be applied to the ε method proposed by Balakrishnan.
- Now it is supposed that the dynamic process of the object system follows a differential equation;
- By the ε method the problem stated above is formulated as a problem of minimizing:
- If time is divided into a time mesh and each element represents $X_i$ or $V_i$, so that the problem can be mapped onto the neural network, the scheme shown in Fig. 8 is obtained. Each layer represents $X_i$ and $V_i$ at the same point in time, and the coupling between different layers depends on the given problem and on the structure of f and g.
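A discretized version of the ε method can be sketched on a time mesh like that of Fig. 8. This assumes the usual form of Balakrishnan's ε technique, in which violations of the system equation are penalized with weight $1/(2\varepsilon)$ and the states and controls at all mesh points are optimized together. The plant $\dot x = -x + u$, the running cost $x^2 + u^2$ and all sizes are hypothetical. Since this toy functional is quadratic, it is minimized here by a direct linear least-squares solve rather than by a neural relaxation.

```python
import numpy as np


def epsilon_method(eps, n=50, t_end=2.0, x0=1.0):
    """Discretised epsilon-method sketch for the hypothetical plant dx/dt = -x + u
    with running cost x^2 + u^2 and x(0) = x0.

    Minimises  sum_k dt*(x_k^2 + u_k^2) + sum_k dt/(2*eps) * r_k^2,
    r_k = (x_{k+1} - x_k)/dt - (-x_k + u_k),
    jointly over the states x_1..x_n and the controls u_0..u_{n-1}.
    """
    dt = t_end / n
    s = np.sqrt(dt / eps)                      # weight of the dynamics-violation rows
    rows, rhs = [], []
    for k in range(n):                         # unknowns z = [x_1..x_n, u_0..u_{n-1}]
        row = np.zeros(2 * n)
        row[k] = s / dt                        # coefficient of x_{k+1}
        coeff_xk = s * (1.0 - 1.0 / dt)        # coefficient of x_k
        if k == 0:
            rhs.append(-coeff_xk * x0)         # x_0 is known: move it to the right-hand side
        else:
            row[k - 1] = coeff_xk
            rhs.append(0.0)
        row[n + k] = -s                        # coefficient of u_k
        rows.append(row)
    for j in range(2 * n):                     # cost rows: (1/2)*(sqrt(2 dt) z_j)^2 = dt z_j^2
        row = np.zeros(2 * n)
        row[j] = np.sqrt(2.0 * dt)
        rows.append(row)
        rhs.append(0.0)
    z, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    x = np.concatenate(([x0], z[:n]))
    u = z[n:]
    residual = np.max(np.abs(np.diff(x) / dt - (-x[:-1] + u)))
    return x, u, residual


for eps in (1e-1, 1e-2, 1e-4):
    x, u, res = epsilon_method(eps)
    print(eps, res, x[-1])
```

At the minimum, the stationarity condition for each $u_k$ gives $r_k = 2\varepsilon u_k$, so the violation of the system equation shrinks in proportion to ε.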
- Another application making full use of the parallel processing power of the neural network is a method for solving differential equations, and more generally partial differential equations. A characteristic of the neural network, based on concurrent and competitive interaction between neural elements, is that this interaction proceeds simultaneously and in parallel. In the Monte Carlo method, a practical calculation approach, this simultaneous parallel processing is simulated by changing the state of one element per unit time and repeating this a sufficiently large number of times. This simultaneous concurrent and competitive interaction plays an important role in solving differential equations, as shown below.
- Now it is supposed that a partial differential equation is written as follows;
- Next, the time derivative of $u_\alpha$ must be rewritten as a finite difference. Following the usual method, it is replaced by $u_\alpha^{n} - u_\alpha^{n-1}$, where the superscript n indicates that the time t has been rewritten as a discrete variable.
- The problematic point here is at which point of time (n or n-1) the function F in the right-hand side is evaluated. If n is taken, the scheme is called the implicit method; if n-1 is taken, it is called the explicit method. When solving hydrodynamic equations, if the phenomenon varies relatively slowly and a stationary state such as laminar flow is established immediately, a solution of sufficiently high precision can be obtained even with the explicit method. However, for a high-speed flow or a turbulent flow, the unit time step must be sufficiently short, and therefore a long calculation time is necessary; if the unit time is too long, the precision deteriorates or the calculation diverges. In such a case the implicit method is suitable. In general, with the implicit method the solution is stable and high precision can be obtained independently of the unit time. Despite these advantages, however, the implicit method in general requires non-linear algebraic equations to be solved iteratively at each point of time, and therefore takes a long time.
- Making full use of the parallel processing capability of the neural network, this difficulty of the implicit method can be overcome. The equation to be solved is;
- Fig. 9(a) shows the structure of the neural network. A layer is prepared for every point of time, and $u_\alpha$ is assigned to each element. This $u_\alpha^n$ is coupled with $u^n_{NN(\alpha)}$ and $u^n_{SN(\alpha)}$ at the same point of time, i.e. in the same layer, and with itself. It is also related to $u_\alpha^{n-1}$, one unit time earlier. As a concrete algorithm, since the initial value is given for the input layer (n = 0), the solution proceeds layer by layer toward the upper layers. Alternatively, the minimization of
- The procedure of the processing will be explained, referring to Fig. 9(b).
- ① Set the initial value at the input layer (block 921).
- ② Set the initial state of the neural elements in the layers other than the input layer (block 922).
- ③ Set the states of the elements given at the boundary (block 923).
- ④ Set the initial value of the temperature, a parameter of the simulated annealing method, etc. (block 924).
- ⑤ Select, at random or regularly, an element of the neural network other than those in the input layer and those located at the boundary (block 925).
- ⑥ Execute the simulated annealing method to change the state of the selected elements (block 926).
- ⑦ Judge the convergence.
- If it has not converged, repeat …
- If it has converged, the process proceeds to ⑧ End.
- ⑧ End.
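The procedure of Fig. 9 can be sketched for the one-dimensional heat equation $u_t = u_{xx}$ with fixed boundary values, standing in for the unspecified equation of the text. Each time level is one layer. The implicit step $u^n_i - u^{n-1}_i = r(u^n_{i-1} - 2u^n_i + u^n_{i+1})$, with $r = \Delta t/\Delta x^2$, is the minimizer of a quadratic energy, and that energy is lowered one element at a time. This is the zero-temperature limit of the procedure above (Gauss-Seidel relaxation), used here instead of simulated annealing so that the result is deterministic; the explicit method is run alongside for comparison. Grid size, r and the initial data are illustrative.

```python
import numpy as np


def implicit_step(u_prev, r, max_sweeps=500, tol=1e-10):
    """One implicit (backward Euler) step of u_t = u_xx, r = dt/dx^2, fixed boundary values.

    Elements are updated one at a time, each set to the value that minimises the
    quadratic energy with its neighbours held fixed (Gauss-Seidel relaxation).
    """
    u = u_prev.copy()                        # step 2: initial state of the new layer
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in range(1, len(u) - 1):       # step 3: boundary elements stay fixed
            new = (u_prev[i] + r * (u[i - 1] + u[i + 1])) / (1.0 + 2.0 * r)
            max_change = max(max_change, abs(new - u[i]))
            u[i] = new
        if max_change < tol:                 # step 7: convergence judgement
            break
    return u


def explicit_step(u_prev, r):
    """Explicit (forward Euler) step for comparison."""
    u = u_prev.copy()
    u[1:-1] = u_prev[1:-1] + r * (u_prev[:-2] - 2.0 * u_prev[1:-1] + u_prev[2:])
    return u


x = np.linspace(0.0, 1.0, 21)
u0 = np.sin(np.pi * x) + 0.1 * np.sin(19 * np.pi * x)   # smooth mode + high-frequency mode
r = 2.0                                                  # explicit scheme needs r <= 0.5
u_imp, u_exp = u0.copy(), u0.copy()
for n in range(20):                                      # one layer per time level
    u_imp = implicit_step(u_imp, r)
    u_exp = explicit_step(u_exp, r)
print(np.abs(u_imp).max(), np.abs(u_exp).max())
```

With r = 2, well above the explicit stability limit, the explicit solution blows up through its highest-frequency mode, while the implicit solution stays bounded by its initial maximum.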
- Next, regarding the problem of determining the function H* that minimizes the cost of Eq. (4) stated above, the underlying principle is explained first.
- Since it is known that, in general, the temperature is related to the additive noise by $T = O(\Gamma)$, $\partial V/\partial x = O(\Gamma^{-2})$ holds, where the notation $O(\cdot)$ denotes the order of magnitude. Rearranging Eq. (68-1) using this, the equation that V must obey is;
- Finally, the problem is to solve Eq. (70) for V. This problem can of course be solved numerically, but that takes a long CPU time and is not practical. Therefore an analytical solution is desired. Here an approximate analytical solution is obtained by a singular perturbation method, in which Γ is an extremely small parameter. The state x giving the maximum value of P is denoted by $a^*$. Although this value itself is unknown, the behavior of the solution in its neighborhood can be examined. Using $\partial V/\partial x = O(\Gamma^{-2})$, the magnitudes of the terms in the right-hand side of Eq. (70) are evaluated. If the state x lies in the neighborhood of $a^*$, at a distance of only $\sqrt{\Gamma}$ from it (inner region: $x = a^* + O(\sqrt{\Gamma})$), the first term is $O(\Gamma^{-1})$ and the second term is $O(\Gamma^{-2})$, so the second term is dominant. On the other hand, if x is farther from $a^*$, at a distance of order $\Gamma^0$ (outer region: $x = a^* + O(1)$), the first term is $O(\Gamma^{-2})$ and the second term is $O(\Gamma^{-1})$, so the first term is dominant. Consequently, separate solutions are obtained for these regions and joined smoothly. In the following, approximate solutions up to $O(\sqrt{\Gamma})$ are constructed in each region.
- Since this region is considered to be reached almost at T = 0, as can be seen from the initial condition, the value of V is small and the first term can be neglected approximately. Denoting the solution in this region by $V_i$, the approximate equation up to $O(\sqrt{\Gamma})$ is the following diffusion equation;
- Substituting it for V0(τ, x) in Eq. (73);
- To connect the solution (72) in the inner region with the solution (77) in the outer region and obtain a uniformly valid solution over the whole region, it suffices to equate the function values and the spatial derivatives of the solutions of the two regions at the boundary $x_b = a^* + O(\sqrt{\Gamma})$. That is,
- First, $A_m$ is determined. Since Eq. (72) describes the state in the neighborhood of a maximum of P, it is originally valid in the region where T is small. However, to connect it with the solution in the outer region, the region is enlarged and the asymptotic form of the solution where T is large is obtained: $V_i(\tau, x_b) \sim \frac{\Gamma}{2}\{-1 + 2\tau\}$. Likewise, the asymptotic form of the outer solution where T is small is $V_0(\tau, x_b) \sim -A_m\{-1 + \frac{1}{2}L^{-1}\Gamma^{2} C A_m \tau\} B(x_b)$. Comparing the two gives $A_m = 4L\Gamma^{-2}C^{-1}$. Consequently,
- For ease of use, it is more convenient to define the temperature $T_{opt}$ by $H' = (E/T)' \equiv E'/T_{opt}$. This is because variations in H then need not be calculated directly; it suffices to calculate variations in the cost E. Rewriting the above formula with this temperature gives the following equations for the two regions;
- In terms of order of magnitude, these equations give $T^{i}_{opt} = O(\Gamma)$ and $T^{o}_{opt} = O(\Gamma)$, noting that $E' = O(\sqrt{\Gamma})$ in the inner region; this does not contradict the definition of the temperature. In the outer region, the temperature is presumably raised, i.e. the additive noise is increased, so that the system can pass over the maximum. Here the restriction $T_{opt} \geq 0$ on the temperature is assumed. Since $E(x) \sim (x - a^*)^2$ in the neighborhood of $a^*$, the first equation in Eq. (85) is always negative. Therefore $T^{i}_{opt} = 0$ in the inner region. This requirement means that fluctuations at extreme values are suppressed, and in particular that they vanish in the state giving the minimum value. Also with respect to time dependence, T = 0 in that state. Consequently, combining the two regions, one can write;
- Hereinbelow a concrete embodiment will be explained.
- Here, taking a simple one-dimensional cost function as an example, the usefulness of the new schedule $T_{opt}$ proposed here is verified. Denoting a positive constant by k, the derivative E' of the cost function is taken as;
- Then the integration can be executed easily to obtain Topt as follows;
- As an example of the cost function stated above,
- Fig. 12 compares the prior art schedule $T(t) = T_0/\log(t+1)$, $T_0 = 3$, with the result of a simulation using $T_{opt}$. First, unlike the prior art method, almost no fluctuations occur at the smallest value. Further, convergence to the smallest value is rapid ((a) to (d)). In the extreme examples (e) and (f), the method according to the present invention reaches the smallest value even where the prior art method does not. These results are averages over 100 simulations.
- The present method is now applied to a more complicated example. Here the integral of Eq. (86) cannot be obtained directly, so the approximate optimum schedule given below is used. An important feature of the schedule in the example above is that it takes large values near the cost barrier to be overcome and is 0 near extreme values. It therefore seems that the essence of the optimum schedule can be captured if the schedule is set to take its greatest values where the second derivative ∇²E of the cost function is negative. An approximate schedule is accordingly expressed by:
- The Monte Carlo time (the number of Monte Carlo iterations) is denoted by t, and its greatest value tmax was 1000. For the comparison, the smallest value of Topt was set not to 0 but to 1/log(tmax+1), which is the value of T given by (A) at tmax. Fig. 13 shows the simulation results. With the prior art method (A), simulations were run for T0 = 1.0 and 4.0. The two runs differ somewhat at the start, but both give almost the same cost, E/N = -0.63, at tmax. With (C), by contrast, a cost of -0.90 was obtained, which is considerably lower than the costs obtained by the other methods. One might think that (C) reaches this lower cost simply by raising the temperature, but this is not the case. To check this, a simulation was run with (B) at a constant high temperature T(t) = 5.0. It gave a result worse than that of the prior art method because of the strong noise.
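The qualitative rule described above can be sketched as a state-dependent temperature. The temperature is large where ∇²E < 0 (near a cost barrier), small near extreme values, and bounded below by 1/log(tmax+1) with tmax = 1000. The actual approximate schedule and the definitions of methods (A) to (C) are not reproduced in this text. The form T = T_FLOOR + gain·max(0, −E''), the finite-difference estimate of E'', the `gain` value and the test cost function are therefore all hypothetical.

```python
import math
import random

T_MAX_STEPS = 1000
# Lower bound on the temperature: T0 / log(t + 1) with T0 = 1, evaluated at t = tmax.
T_FLOOR = 1.0 / math.log(T_MAX_STEPS + 1)


def cost(x):
    """Hypothetical double-well test cost (an assumption, not the patent's example)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def approx_schedule(x, gain=2.0):
    """Hypothetical state-dependent temperature: grows with -E''(x) where E'' < 0
    (near a cost barrier) and drops to T_FLOOR where E'' >= 0 (near extreme values)."""
    return T_FLOOR + gain * max(0.0, -second_derivative(cost, x))


def anneal_curvature_schedule(steps=T_MAX_STEPS, step_size=0.5, x0=1.0, seed=None):
    """Metropolis-style search that evaluates the temperature at the current state."""
    rng = random.Random(seed)
    x, e = x0, cost(x0)
    for _ in range(steps):
        temp = approx_schedule(x)
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = cost(x_new)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
            x, e = x_new, e_new
    return x, e


print(round(T_FLOOR, 4))  # 0.1447
x_best, e_best = anneal_curvature_schedule(seed=0)
print(x_best, e_best)
```

In this sketch the temperature is high only while the state sits on the barrier. For this cost that is the region |x| < 1/√3, where E'' = 12x² − 4 < 0. A uniformly high temperature is therefore not needed to cross the barrier.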
- Fig. 14 schematically outlines an image processing system that uses the present invention. An object 51 to be observed is imaged by an ITV camera 52, and the resulting electric signals are sent to an image processing device 60 having at least a processor and a memory. An analogue-digital converting and quantizing device 53 converts the signals into digital image data, which are stored in a file 54 as primitive image data. Besides errors caused by the non-linear characteristics of the ITV camera 52, these primitive image data contain noise from various sources. The primitive image data are then read out from the file 54 and sent, for example, to a probabilistic image processing device 55. This device performs operations such as removing noise without blurring edges, generally by minimizing the energy of the image constructed from the primitive image data. The minimization is executed by a minimum and maximum value searching device 56 according to the present invention. It is carried out iteratively according to Figs. 10 and 11, and intermediate results are stored in a file 57.
- The processed image data are stored in a file 58 and read out as needed. They are then either subjected to further image processing or sent through a D/A converting device 59 and displayed on a display device 60.
- According to the present invention, the following effects can be obtained.
- (1) Problems such as image and sound recognition, movement control and time-dependent large-scale numerical calculations were difficult to solve with prior art computers. They can be computed faster by incorporating internal structure that is based on, or inferred from, physiological knowledge of living bodies. They are also computed faster by using a neural network whose basic principle is the parallel, concurrent and competitive action of groups of neural elements.
- (2) A probabilistic hill-climbing method called simulated annealing has been proposed. To find the minimum (maximum) value of a function E that has many extreme values, it maximizes exp(-E/T) instead of working with E directly. The parameter T introduced here is called the temperature. It makes it possible to generate random noise and to treat the search probabilistically. Consequently, once E reaches its smallest value, T must be set to 0 so that the state stays at the smallest value without error. The greatest problem of simulated annealing is how to lower T to a low temperature. Here the temperature was determined so as to minimize the time needed to pass from the initial state to the state in which the greatest value of exp(-E/T), the final target, is attained. Simulation experiments verified that this approach reaches a smaller value than the prior art method, even for a complicated non-linear function with many independent variables. In this way the smallest value can be obtained reliably and quickly.
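The energy minimization that Fig. 14 assigns to the probabilistic image processing device 55 removes noise without blurring edges. It can be illustrated with the annealing described in (2). The sketch below is a hypothetical 1-D example, not the patent's image model. The truncated-quadratic energy, the weight `lam`, the threshold `tau`, the step size, the use of the logarithmic schedule T0/log(t+1) and the synthetic step signal are all assumptions.

```python
import math
import random


def energy(u, d, lam=1.0, tau=1.0):
    """Hypothetical 1-D 'image' energy: data fidelity to the observed signal d plus a
    truncated-quadratic smoothness term whose penalty per jump is capped at lam * tau."""
    data = sum((ui - di) ** 2 for ui, di in zip(u, d))
    smooth = sum(min((u[i + 1] - u[i]) ** 2, tau) for i in range(len(u) - 1))
    return data + lam * smooth


def anneal_denoise(d, t0=1.0, sweeps=200, step=0.3, seed=None):
    """Single-site Metropolis annealing of energy(u, d) with T(t) = t0 / log(t + 1)."""
    rng = random.Random(seed)
    u = list(d)                      # start from the noisy observation
    e = energy(u, d)
    for t in range(1, sweeps + 1):
        temp = t0 / math.log(t + 1)  # one temperature per sweep
        for i in range(len(u)):
            old = u[i]
            u[i] = old + rng.uniform(-step, step)
            e_new = energy(u, d)     # full recomputation: simple but O(n) per move
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / temp):
                e = e_new            # accept the move
            else:
                u[i] = old           # reject: restore the previous value
    return u, e


# Synthetic example: a step edge of height 3 corrupted by Gaussian noise.
rng = random.Random(0)
clean = [0.0] * 20 + [3.0] * 20
noisy = [v + rng.gauss(0.0, 0.3) for v in clean]
denoised, e_final = anneal_denoise(noisy, seed=1)
print([round(v, 2) for v in denoised])
```

The truncated term caps the penalty for any single jump at `lam * tau`. A genuine step of height 3 therefore costs less to keep than to smooth away, while small noise-driven differences between neighbours are still penalized quadratically.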
In the cerebrum the above processes are carried out successively. Physiologically, only a little is known about the visual information processing system in item 1 and about the plasticity of synaptic coupling in item 3. Item 2 is currently an active research field, but no unified understanding has yet been reached. The high order information processing method using the neural network according to the present invention is an information processing method that simulates the cerebrum. Concerning the
Claims (18)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP6942088A JPH01243176A (en) | 1988-03-25 | 1988-03-25 | System for searching minimum and maximum values |
JP69420/88 | 1988-03-25 | ||
JP63232377A JPH0281280A (en) | 1988-09-19 | 1988-09-19 | Higher order information processing system by neural circuit network |
JP232377/88 | 1988-09-19 | ||
PCT/JP1989/000317 WO1989009457A1 (en) | 1988-03-25 | 1989-03-24 | Processing of high-order information with neuron network and minimum and maximum value searching method therefor |
Publications (3)
Publication Number | Publication Date |
---|---|
EP0366804A1 (en) | 1990-05-09 |
EP0366804A4 (en) | 1991-01-23 |
EP0366804B1 (en) | 1997-12-10 |
Family ID: 26410625
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP89903795A Expired - Lifetime EP0366804B1 (en) | 1988-03-25 | 1989-03-24 | Method of recognizing image structures |
Country Status (4)
Country | Link |
---|---|
US (1) | US5153923A (en) |
EP (1) | EP0366804B1 (en) |
DE (1) | DE68928484T2 (en) |
WO (1) | WO1989009457A1 (en) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR930009066B1 (en) * | 1990-08-18 | 1993-09-22 | 정호선 | Multilayer neural network and method of its circuit design |
JP2760145B2 (en) * | 1990-09-26 | 1998-05-28 | 三菱電機株式会社 | Knowledge information processing device |
JPH0695192A (en) * | 1991-05-01 | 1994-04-08 | Matsushita Electric Ind Co Ltd | Device for image recognition and method for inputting image |
EP0495630A1 (en) * | 1991-01-14 | 1992-07-22 | Kabushiki Kaisha Toshiba | Distribution generation system, and optimization system that adopts distribution generation system |
US5500905A (en) * | 1991-06-12 | 1996-03-19 | Microelectronics And Computer Technology Corporation | Pattern recognition neural network with saccade-like operation |
FR2678407A1 (en) * | 1991-06-28 | 1992-12-31 | Philips Electronique Lab | METHOD AND DEVICE WITH NEURAL NETWORK FOR CLASSIFYING DATA, AND SIGNATURE AUTHENTICATION APPARATUS. |
JP3178884B2 (en) * | 1992-03-30 | 2001-06-25 | 株式会社東芝 | Neural network device |
US5351311A (en) * | 1992-07-28 | 1994-09-27 | The United States Of America As Represented By The Secretary Of The Navy | Neural network for detection and correction of local boundary misalignments between images |
US5420939A (en) * | 1992-12-31 | 1995-05-30 | Intel Corporation | Method and apparatus for a focal neuron system |
US5491650A (en) * | 1993-04-19 | 1996-02-13 | California Institute Of Technology | High precision computing with charge domain devices and a pseudo-spectral method therefor |
KR970008532B1 (en) * | 1993-08-10 | 1997-05-24 | 재단법인 한국전자통신연구소 | Neural metwork |
US5659666A (en) * | 1994-10-13 | 1997-08-19 | Thaler; Stephen L. | Device for the autonomous generation of useful information |
US5745735A (en) * | 1995-10-26 | 1998-04-28 | International Business Machines Corporation | Localized simulated annealing |
US5845271A (en) * | 1996-01-26 | 1998-12-01 | Thaler; Stephen L. | Non-algorithmically implemented artificial neural networks and components thereof |
US6601049B1 (en) * | 1996-05-02 | 2003-07-29 | David L. Cooper | Self-adjusting multi-layer neural network architectures and methods therefor |
US6009418A (en) * | 1996-05-02 | 1999-12-28 | Cooper; David L. | Method and apparatus for neural networking using semantic attractor architecture |
FR2754080B1 (en) * | 1996-10-01 | 1998-10-30 | Commissariat Energie Atomique | LEARNING METHOD FOR THE CLASSIFICATION OF DATA ACCORDING TO TWO CLASSES SEPARATED BY A SEPARATING SURFACE OF ORDER 1 OR 2 |
JP3422686B2 (en) | 1998-06-12 | 2003-06-30 | 三菱電機株式会社 | Data decoding device and data decoding method |
US6202007B1 (en) | 1999-02-19 | 2001-03-13 | John A. Spicer | Exact stability integration in network designs |
US7085688B1 (en) * | 1999-10-22 | 2006-08-01 | Shizuo Sumida | Non-linear characteristic reproducing apparatus and non-linear characteristic reproducing program storage medium |
WO2002069137A1 (en) * | 2001-02-23 | 2002-09-06 | The Trustees Of The University Of Pennsylvania | Dynamical brain model for use in data processing applications |
US6662091B2 (en) | 2001-06-29 | 2003-12-09 | Battelle Memorial Institute | Diagnostics/prognostics using wireless links |
NZ530434A (en) | 2001-07-02 | 2005-01-28 | Battelle Memorial Institute | Intelligent microsensor module |
US7577631B2 (en) * | 2001-09-10 | 2009-08-18 | Feldhake Michael J | Cognitive image filtering |
US8144919B2 (en) * | 2006-09-22 | 2012-03-27 | Fuji Xerox Co., Ltd. | Annealing algorithm for non-rectangular shaped stained glass collages |
WO2012078636A1 (en) | 2010-12-07 | 2012-06-14 | University Of Iowa Research Foundation | Optimal, user-friendly, object background separation |
US9189729B2 (en) * | 2012-07-30 | 2015-11-17 | International Business Machines Corporation | Scalable neural hardware for the noisy-OR model of Bayesian networks |
US10115194B2 (en) * | 2015-04-06 | 2018-10-30 | IDx, LLC | Systems and methods for feature detection in retinal images |
KR102494139B1 (en) | 2015-11-06 | 2023-01-31 | 삼성전자주식회사 | Apparatus and method for training neural network, apparatus and method for speech recognition |
TWI676536B (en) * | 2016-01-22 | 2019-11-11 | 大陸商鴻富錦精密工業(武漢)有限公司 | System and method for controlling robot based on brain electrical signal |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4805225A (en) * | 1986-11-06 | 1989-02-14 | The Research Foundation Of The State University Of New York | Pattern recognition method and apparatus |
US4965725B1 (en) * | 1988-04-08 | 1996-05-07 | Neuromedical Systems Inc | Neural network based automated cytological specimen classification system and method |
US5003490A (en) * | 1988-10-07 | 1991-03-26 | Hughes Aircraft Company | Neural network signal processor |
US4941122A (en) * | 1989-01-12 | 1990-07-10 | Recognition Equipment Incorp. | Neural network image processing system |
JPH05121749A (en) * | 1991-10-25 | 1993-05-18 | Mitsubishi Electric Corp | Electrically writable and erasable semiconductor storage device and manufacture thereof |
1989
- 1989-03-24 US US07/445,840 patent/US5153923A/en not_active Expired - Fee Related
- 1989-03-24 EP EP89903795A patent/EP0366804B1/en not_active Expired - Lifetime
- 1989-03-24 WO PCT/JP1989/000317 patent/WO1989009457A1/en active IP Right Grant
- 1989-03-24 DE DE68928484T patent/DE68928484T2/en not_active Expired - Fee Related
Non-Patent Citations (6)
Title |
---|
COMPUTER, vol. 21, no. 3, March 1988, pages 77-88; G.A. CARPENTER et al.: "The ART of adaptive pattern recognition by a self-organizing neural network" * |
IEEE FIRST INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, San Diego, California, 21st - 24th June 1987, pages III-173-183; T. SAMAD et al.: "Associative memory storage using a variant of the generalized delta rule" * |
IEEE FIRST INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, San Diego, California, 21st - 24th June 1987, pages IV-247-258; E. MESROBIAN et al.: "Discrimination of natural textures: A neural network architecture" * |
IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS, no. 4, July 1973, pages 425-428; J. KOPLOWITZ et al.: "Motivation system for a robot" * |
N.H.K. LABORATORIES NOTE, no. 236, September 1986, pages 1,3-13; K. FUKUSHIMA et al.: "Neocognitron: A biocybernetic approach to visual pattern recognition" * |
See also references of WO8909457A1 * |
Also Published As
Publication number | Publication date |
---|---|
DE68928484D1 (en) | 1998-01-22 |
US5153923A (en) | 1992-10-06 |
EP0366804B1 (en) | 1997-12-10 |
DE68928484T2 (en) | 1998-07-23 |
WO1989009457A1 (en) | 1989-10-05 |
EP0366804A4 (en) | 1991-01-23 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 19891128 |
AK | Designated contracting states | Kind code of ref document: A1; designated state(s): DE GB |
A4 | Supplementary search report drawn up and despatched | Effective date: 19901203 |
AK | Designated contracting states | Kind code of ref document: A4; designated state(s): DE GB |
17Q | First examination report despatched | Effective date: 19940217 |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; designated state(s): DE GB |
REF | Corresponds to | Ref document number: 68928484; country of ref document: DE; date of ref document: 19980122 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: GB; payment date: 19990312; year of fee payment: 11 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: DE; payment date: 19990527; year of fee payment: 11 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: GB; free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; effective date: 20000324 |
GBPC | GB: European patent ceased through non-payment of renewal fee | Effective date: 20000324 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: DE; free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; effective date: 20010103 |