CN109344958B - Object identification method and identification system based on feedback adjustment - Google Patents
Object identification method and identification system based on feedback adjustment
- Publication number: CN109344958B (application CN201810936167.7A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06N3/044—Recurrent networks, e.g. Hopfield networks (under G06N3/04—Architecture; G06N3/02—Neural networks; G06N3/00—Computing arrangements based on biological models)
- G06N3/08—Learning methods
Abstract
The invention discloses an object identification method based on feedback adjustment, which adopts a deep neural network to identify objects and comprises the following steps: visual information enters the deep neural network from a lower level and is propagated and integrated from bottom to top along the feedforward connections of the deep neural network. In the early stage of object identification, inter-class noise, caused by mutual interference among different classes in the deep neural network, is suppressed through forward feedback; in the later stage of object identification, intra-class noise, caused by the similarity between objects of the same class confusing identification, is suppressed through reverse feedback. The invention also discloses an object identification system adopting the method.
Description
Technical Field
The invention relates to an object recognition method, and in particular to an object recognition method that optimizes the memory retrieval effect in a neural network by combining forward feedback (push) and reverse feedback (pull); it also relates to a corresponding object recognition system, belonging to the technical field of deep neural networks.
Background
Research in cognitive neuroscience has demonstrated that human cognitive recognition of objects follows a particular hierarchy. For example, the process of identifying a Border Collie is as follows: first, it is judged to belong to the animals (highest level); then, it is identified as some kind of dog (higher level); finally, it is identified as a Border Collie (lower level). In this hierarchical structure, the commonalities of species belonging to the same class constitute their higher-level characteristics; for example, the Border Collie and the Chinese pastoral dog belong to the same class, and their commonalities constitute their higher-level characteristics. In the identification process, the greater the difference between the features of the higher categories, the easier it is to achieve accurate and rapid discrimination, which in turn helps achieve identification of the lower categories.
A Deep Neural Network is a multi-layer neural network with at least one hidden layer. It simulates the hierarchical information-processing process of the visual pathway and has achieved great success in application fields such as object identification. Structurally, a deep neural network is mainly composed of feedforward connections from lower layers to higher layers. However, experimental data from cognitive neuroscience show that the real nervous system also contains very rich feedback connections from high layers to low layers. At present, the understanding of the role of feedback connections in the nervous system is still insufficient.
In the Chinese patent application with application number 201110185644.1, Tianjin University proposes an object identification method based on a cascaded micro neural network, which includes the following steps: calculating the number of cascaded convolutional layers in the deep neural network according to the size of the input sample image; constructing a deep cascaded micro neural network; setting the training parameters of the deep cascaded micro neural network and training with a stochastic gradient descent method; classifying with a softmax classifier and calculating the classification error using a forward-propagation algorithm; and updating the weights of the parameters to be trained in the neural network using a back-propagation operation. The deep cascaded micro neural network thus obtained can improve the identification performance of the object identification system without increasing the number of parameters (operational complexity).
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an object identification method which combines forward feedback (push) and reverse feedback (pull) and optimizes the memory retrieval effect in a neural network.
Another object of the present invention is to provide an object recognition system using the above method.
In order to achieve the purpose of the invention, the invention adopts the following technical scheme:
according to a first aspect of the embodiments of the present invention, there is provided an object recognition method based on feedback adjustment, which uses a deep neural network to perform object recognition, including the following steps:
visual information enters the deep neural network from a lower level and is propagated and integrated from bottom to top along the feedforward connection of the deep neural network; wherein,
in the early stage of object identification, inter-class noise, caused by mutual interference among different classes in the deep neural network, is suppressed through forward feedback; in the later stage of object identification, intra-class noise, caused by the similarity between objects of the same class confusing identification, is suppressed through reverse feedback.
Preferably, in the deep neural network, feedforward connections and feedback connections are simultaneously arranged between adjacent levels of the hierarchy, and the recursive connections between neurons in the same layer are connections with an associative memory function.
Preferably, the object classification information is stored by using the hierarchical structure of the deep neural network, so that the information of the memory modules can be identified in a structured manner.
According to a second aspect of embodiments of the present invention, there is provided an object recognition system based on feedback adjustment, comprising a processor and a memory; the memory having stored thereon a computer program operable on the processor, the computer program when executed by the processor implementing the steps of:
visual information enters the deep neural network from a lower level and is propagated and integrated from bottom to top along the feedforward connection of the deep neural network; wherein,
in the early stage of object identification, inter-class noise, caused by mutual interference among different classes in the deep neural network, is suppressed through forward feedback; in the later stage of object identification, intra-class noise, caused by the similarity between objects of the same class confusing identification, is suppressed through reverse feedback.
Compared with the prior art, the object identification method and the object identification system provided by the invention adopt an inter-level feedback regulation mechanism that organically combines forward feedback (push) and reverse feedback (pull). When the object recognition task is performed, the image of the object to be recognized is input into the first layer of the deep neural network; information is integrated from bottom to top while memory retrieval is simultaneously attempted within each layer. The invention provides inspiration for developing better artificial-intelligence algorithms for object recognition.
Drawings
FIG. 1 is a diagram of the activity of a neuron population in the primary visual cortex (V1) while monkeys perform a contour integration task;
FIG. 2 is a model diagram of a multi-layer neural network;
FIG. 3 is a diagram illustrating memory modules stored in a class hierarchy in the multi-layer neural network shown in FIG. 2;
FIG. 4 is an exemplary graph of layer-1 neuron population activity, reproducing the "push-pull" phenomenon observed in the experiment;
FIG. 5 is a schematic diagram of object recognition based on feedback adjustment, in accordance with an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an object recognition system provided in the present invention.
Detailed Description
The technical contents of the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
In combination with studies in cognitive neuroscience on the complex visual cortical pathway, we know that in a neural network, relatively higher levels process more comprehensive and abstract information: for example, the primary visual cortex (V1) identifies local orientation, higher levels can identify contours, and still higher cortical areas identify a specific face. The same phenomenon is also found in deep neural networks for object recognition tasks such as face recognition. This means that a bottom-up, gradually abstracting processing of information naturally forms in a hierarchical network.
On the other hand, experiments in cognitive neuroscience have shown that there are a large number of feedback connections in addition to feedforward connections in the nervous system. Existing experimental data indicate that feedback modulation in the visual system exhibits a "push-pull" phenomenon. Figure 1 shows the neuronal population activity of V1 (the primary visual cortex) while a monkey performs the contour integration task; the data were recorded in V1 of awake monkeys. The visual stimulus is a virtual contour hidden in noise, which appears at time t = 0. The blue curve shows the neural response changing over time when the monkey is able to identify the contour: it rises in the early stage and falls in the later stage. The red curve indicates the case where the monkey does not recognize the contour. The green curve is the neuron population response in the absence of visual stimulation. Under visual stimulation, the neural activity of V1 increases in the early stage, showing a "push (positive feedback)" phenomenon, and decreases in the later stage, showing a "pull (negative feedback)" phenomenon. It is noteworthy that the "push-pull" phenomenon occurs only when the monkey correctly recognizes the contour. A similar multi-unit recording experiment further confirmed that the "push-pull" phenomenon is due to feedback from visual area V4 and is not a result of neural adaptation.
In order to reproduce and fully utilize the above "push-pull" phenomenon in a deep neural network, the inventors first construct a multi-layer neural network with hierarchically arranged memory modules, where the memory modules in each level correspondingly store different classes of information. Referring to FIG. 2, in the established multi-layer neural network, both feedforward and feedback connections exist between adjacent layers, and the recursive connections (recurrent connections) between neurons in the same layer are also connections with an associative memory function (this reflects the abundant recurrent connections between neurons in the visual cortex). The recursive connections not only help the neural network to identify a memory module, but can also reproduce a more complete and clear memory module from fuzzy or incomplete input information. For convenience of description, the inventors use the Hopfield model to describe the function of each layer of the neural network in the embodiments of the present invention, but the technical idea of the present invention is obviously applicable to other similar memory models. The advantage of the Hopfield model is that it allows the information in each layer to be analyzed quantitatively, thereby clearly elucidating the role of feedback modulation.
Fig. 3 is a specific embodiment of the multi-layer neural network. In this example, the inventors have built a three-layer neural network. For convenience of description, the three layers from top to bottom are respectively called the grandparent layer (layer 3), the parent layer (layer 2) and the child layer (layer 1) in the present invention, so as to reflect their descending class relationships. The network stores memory modules with a three-level hierarchical structure in one-to-one correspondence: each child module belongs to a group under a parent module, and the parent modules in turn belong to groups under grandparent modules. In order to simulate the role of each network layer in hierarchical memory retrieval, each layer has an associative memory function for the corresponding level, and each layer contacts its parent and child layers through feedforward and feedback connections, respectively. It will be appreciated that in other embodiments it is natural to generalize the above three-layer neural network to more than three layers.
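The hierarchical memory modules described above can be sketched numerically. The following is an illustrative construction (not taken from the patent; the network size N, the module counts and the similarity level b are assumed values), in which each descendant pattern is obtained by resampling part of its ancestor:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000                 # neurons per layer (illustrative)
P_a, P_b, P_g = 2, 3, 3  # grandparents; parents per grandparent; children per parent
b = 0.6                  # assumed similarity (overlap) between a module and its parent

def descend(parent, b, rng):
    # A descendant agrees with its ancestor on each entry with probability
    # (1 + b) / 2, so the expected overlap (1/N) * sum_i xi_i * xi'_i equals b.
    keep = rng.random(parent.size) < (1 + b) / 2
    return np.where(keep, parent, -parent)

grandparents = [rng.choice([-1, 1], size=N) for _ in range(P_a)]
parents = [[descend(g, b, rng) for _ in range(P_b)] for g in grandparents]
children = [[[descend(p, b, rng) for _ in range(P_g)] for p in row]
            for row in parents]

# Children of the same parent are more similar (overlap ~ b^2) than children
# of different grandparents (overlap ~ 0), reproducing the class hierarchy.
same_parent = children[0][0][0] @ children[0][0][1] / N
diff_grand  = children[0][0][0] @ children[1][0][0] / N
```

With b = 0.6, siblings of the same parent overlap at roughly b² = 0.36, while children of different grandparents are nearly orthogonal, which is exactly the graded similarity structure the three-level hierarchy requires.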
For ease of analysis, the inventors first describe each layer using a discrete Hopfield model, which facilitates a clear elucidation of the role of feedback modulation. Later, the inventors confirm the results using a simulation of the more biologically plausible continuous Hopfield model.
It is assumed that each layer in the multi-layer neural network has the same number of neurons, $N$. Herein, $x_i^l(t)$, where $i = 1, \ldots, N$, denotes the state of neuron $i$ in layer $l$ at time $t$, taking the values $\pm 1$; $W_{ij}^l$ denotes the symmetric recursive connection weight between neurons $i$ and $j$ in layer $l$; $W_{ij}^{l \to l+1}$ is the feedforward connection weight from layer $l$ to layer $l+1$; correspondingly, $W_{ij}^{l+1 \to l}$ is the feedback connection weight from layer $l+1$ to layer $l$. The dynamics of the neurons follow the Hopfield model, namely:

$$x_i^l(t+1) = \mathrm{sign}\Big(\sum_j W_{ij}^l\, x_j^l(t) + I_i^{l,\mathrm{ext}}(t)\Big)$$
the memory modules stored in the multilayer neural network are respectively as follows: grandfather's module is { ξ a }, where α ═ 1αThe parent module is { xiα,βWhere β 1.., PβAnd the child module is { xiα,β,γWhere γ 1.., Pγ。Pα,PβAnd PγRespectively representing the number of ancestor modules, the number of ancestor modules that are affiliated with the same ancestor, and the number of child modules that are affiliated with the same ancestor. Class relationships between memory modules are defined based on hierarchical properties, i.e., modules belonging to the same group have greater similarity than modules belonging to different groups. For example, child modules xi from the same ancestor, different ancestors, the same ancestor, or different ancestorsα,β,γAnd xiα′,β′,γ′Sigma of similarity betweeniξα,β,γξα′,β′,γ′N is eachOr is 0.
The recursive connections between the neurons of the same layer generated by the Hebbian learning rule are:

$$W_{ij}^l = \frac{1}{N} \sum_{\mu} \xi_i^{l,\mu}\, \xi_j^{l,\mu}$$

The feedforward connections from the lower layer to the upper layer are:

$$W_{ij}^{l \to l+1} = \frac{1}{N} \sum_{\mu} \xi_i^{l+1,\mu}\, \xi_j^{l,\mu}$$

where $\mu$ indexes the memory modules of layer $l$ and $\xi^{l+1,\mu}$ denotes the parent module of the module $\xi^{l,\mu}$.
in the multilayer neural network, if the state of the layer 1 is a memory moduleThe information passed to layer 2 by the feed-forward connection is thenThat is, through the information integration process of the feed-forward connection, the parent module is retrieved at layer 2Is easy.
In order to quantify the performance of memory retrieval, the inventor defines a macroscopic variable $m(t)$, which measures the degree of overlap between the network state $x(t)$ and a memory module. The overlap $m(t)$ on the different layers is calculated as follows:

$$m^l(t) = \frac{1}{N} \sum_i \xi_i^{l}\, x_i^l(t)$$
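The overlap measure can be sketched as follows; the corrupted-cue recall below reuses the same Hebbian construction, with the 15% noise level chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 1000, 5
patterns = rng.choice([-1, 1], size=(P, N))
W = patterns.T @ patterns / N
np.fill_diagonal(W, 0.0)

def overlap(xi, x):
    # m(t) = (1/N) sum_i xi_i x_i(t): 1 means perfect retrieval, ~0 means none
    return float(xi @ x) / len(x)

# Corrupt 15% of the entries of pattern 0, then let the dynamics clean it up
x = patterns[0].copy()
flip = rng.choice(N, size=N * 15 // 100, replace=False)
x[flip] *= -1
m_before = overlap(patterns[0], x)          # 1 - 2 * 0.15 = 0.70
for _ in range(5):                          # a few synchronous updates
    x = np.where(W @ x >= 0.0, 1, -1)
m_after = overlap(patterns[0], x)           # close to 1 after retrieval
```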
next, the inventor will further explore the role of feedback adjustment in improving the network memory retrieval effect. Without loss of generality, the inventors focused on analyzing the feedback regulation effect of layer 2 on layer 1.
First, it is necessary to understand the memory retrieval of layer 1 itself without feedback adjustment. As in conventional stability analysis, the inventors take a specific memory module as the initial state of layer 1 of the network, that is, assume that $x^1(0) = \xi^{\alpha,\beta,\gamma}$, and then analyze which factors affect the retrieval effect of this module. After one-step iteration and rearrangement, the input to each neuron of the network can be written as:

$$h_i = \xi_i^{\alpha,\beta,\gamma} + c_i + \tilde{c}_i$$
in the above expression, the input received by the neuron is decomposed into two parts, signal and noise. The latter can be further decomposed into two terms: c. CiAndthey represent intra-class noise and inter-class noise, respectively. Wherein, the noise in class refers to the noise caused by the correlation between the modules of the same parent; accordingly, inter-class noise refers to noise introduced by those sibling modules of different parents of the same ancestor.
The form of the above decomposition indicates that even if the initial input to the multi-layer neural network is noise-free, the noise generated by the correlation between memory modules can still lead to errors in the eventual evolution of the network dynamics. This correlated noise includes both intra-class noise and inter-class noise, which reflects that the similarity between objects during object identification is likely to cause confusion.
To address this problem, the inventors first introduce forward feedback (push) from layer 2 to layer 1 to suppress the inter-class noise $\tilde{c}_i$. Specifically, if the parent module $\xi^{\alpha,\beta}$ is retrieved at layer 2, then the "push" feedback gives layer 1 an input proportional to $A_+ \sum_\gamma \xi_i^{\alpha,\beta,\gamma}$, where $A_+ > 0$ is the feedback gain. Such a feedback input enhances the activity strength of all child modules belonging to this parent module, and has no net effect on the other memory modules. In this way, the "push" feedback can effectively suppress the inter-class noise of layer 1.
Next, a feedback regulation mechanism with a reverse-feedback (pull) action is introduced to suppress the intra-class noise $c_i$, as specifically described below.
Assume that the "pull" feedback connection weight is set to:

$$W_{ij}^{2 \to 1,\mathrm{pull}} = -\frac{A_-}{N} \sum_{\alpha,\beta} \xi_i^{\alpha,\beta}\, \xi_j^{\alpha,\beta}$$

where $A_- > 0$ is the feedback gain.
If the state of layer 2 happens to be the correct parent module, i.e. $x^2 = \xi^{\alpha,\beta}$, then under the "pull" feedback the input to a layer-1 neuron becomes:

$$h_i = \xi_i^{\alpha,\beta,\gamma} + c_i + \tilde{c}_i - A_-\,\xi_i^{\alpha,\beta}$$

Through analysis, it can be found that when the similarity coefficient $b_1$ between different children of the same parent lies within a certain range, the "pull" feedback can reduce the retrieval error.
essentially, the effect of the "pull" feedback is to actually subtract the values of the parent layer from the module information of the lower layer, consistent with the idea of predictive coding. In fact, the "pull" feedback can explain the endpoint phenomenon that is often used to demonstrate predictive coding theory as well.
The inventors have studied the different effects of the "push" feedback and the "pull" feedback separately: the former suppresses inter-class noise and the latter suppresses intra-class noise. That is, the cooperation of both forms of feedback is necessary to maximize the feedback gain.
In combination with theoretical analysis and simulation results, the inventors believe that the optimal feedback modulation should be dynamic: the feedback is positive in the early stage (typically 1τ to 2.5τ, about 50 ms to 150 ms, where τ is a time constant), and becomes reversed in the later stage (typically 2τ to 4τ, about 150 ms to 250 ms). In this way, feedback modulation can suppress both inter-class noise and intra-class noise, thereby helping the multi-layer neural network to perform memory-module retrieval from coarse to fine. The time constant τ is a parameter regulating the network evolution rate and can be dynamically adjusted according to actual needs.
It should be noted that the early stage and the later stage may partially coincide, i.e., there may be an overlap period (typically 2τ to 2.5τ). During this overlap period, the "push" feedback and the "pull" feedback are simultaneously active. Relevant experiments show that the retrieval effect is better when such an overlap period exists. Of course, by adjusting the time ranges of the early and later stages, the overlap period may be cancelled or extended.
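The dynamic "push-pull" schedule, including the overlap period, can be sketched as a pair of time-dependent gains (time measured in units of τ; the window boundaries below follow the typical values quoted above, and the gain magnitudes are assumptions):

```python
def feedback_gains(t_over_tau,
                   push_window=(1.0, 2.5),
                   pull_window=(2.0, 4.0),
                   A_plus=1.0, A_minus=1.0):
    """Return the (push, pull) feedback gains at time t (in units of tau).

    In the overlap of the two windows (here 2.0 to 2.5 tau), both the
    "push" and the "pull" feedback are simultaneously active.
    """
    push = A_plus if push_window[0] <= t_over_tau < push_window[1] else 0.0
    pull = A_minus if pull_window[0] <= t_over_tau < pull_window[1] else 0.0
    return push, pull
```

For example, at t = 1.5τ only the "push" gain is active, at t = 2.2τ both gains are active, and at t = 3τ only the "pull" gain remains.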
On this basis, the inventors conducted a simulation experiment combining "push-pull" (forward feedback followed by reverse feedback). Here, the inventors describe only the results of the continuous Hopfield model: although the conclusions of the experiments on the discrete Hopfield model are essentially consistent, the continuous model is more biologically plausible than the discrete model and can better illustrate the dynamics of feedback modulation.
For brevity, the inventors describe here only the neuron dynamical equations of layer 1; the equations of the other layers are similar in form, as shown below.
$$\tau \frac{du_i^1(t)}{dt} = -u_i^1(t) + \sum_j W_{ij}^1\, r_j^1(t) + I_i^{\mathrm{ext}}(t), \qquad r_i^1(t) = \arctan\!\big(u_i^1(t)\big)$$

wherein $u_i^1$ and $r_i^1$ represent the synaptic input and the firing rate, respectively, $\tau$ is a time constant, $I_i^{\mathrm{ext}}$ is the external input, and $\arctan(x)$ is the inverse function of the tangent.
The feedback connection varies with time. In the early stage, the feedback is positive: the feedback weight is scaled by $A_+$, an adjustment coefficient that is always positive. In the later stage, the feedback regulation becomes reversed, i.e., the feedback weight is scaled by $-A_-$, where $A_-$ is likewise an always-positive adjustment coefficient. According to the experimental data, the duration of the feedback adjustment set by the inventors is on the order of the membrane time constant, approximately 1τ to 2.5τ.
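A minimal Euler-integration sketch of one layer of the continuous Hopfield dynamics is given below; the transfer-function gain beta, the 2/π normalization (which keeps the rate in (-1, 1)) and the step size are assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(5)
N, P, tau, dt = 1000, 3, 1.0, 0.1
beta = 5.0                               # assumed gain, steep enough for retrieval
patterns = rng.choice([-1, 1], size=(P, N)).astype(float)
W = patterns.T @ patterns / N            # Hebbian recurrent weights
np.fill_diagonal(W, 0.0)

def simulate(u0, W, steps, I_ext=0.0):
    # Euler steps of:  tau * du/dt = -u + W r + I_ext,
    # with the rate    r = (2/pi) * arctan(beta * u)
    u = u0.copy()
    for _ in range(steps):
        r = (2.0 / np.pi) * np.arctan(beta * u)
        u += (dt / tau) * (-u + W @ r + I_ext)
    return u

# Start from a weak, noisy version of pattern 0 and let the dynamics evolve
u0 = 0.3 * patterns[0] + 0.2 * rng.standard_normal(N)
u_final = simulate(u0, W, steps=200)
m = float(patterns[0] @ np.sign(u_final)) / N   # overlap with the cued pattern
```

A time-varying "push" or "pull" term from the layer above would enter this sketch through the `I_ext` argument, positive in the early window and negative in the later window.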
FIG. 4 shows a typical example of layer-1 neuron population activity, reproducing the "push-pull" phenomenon observed in the experiment. Part B of FIG. 4 shows the retrieval accuracy of layer 1 and layer 2 in the same experiment as a function of time. It can be seen from FIG. 4 that the retrieval accuracy of the parent module at layer 2 quickly rises to a high level (m close to 1), while the retrieval accuracy of layer 1 for the child modules gradually increases under the "push-pull" feedback, finally exceeding the accuracy obtained without feedback. This indicates that the "push-pull" feedback indeed improves the retrieval of the child modules.
Based on the above feedback modulation mechanism, the multi-layer neural network provided by the invention can well reproduce the "push-pull" phenomenon: in the early stage of object identification, inter-class noise, caused by mutual interference among different classes in the deep neural network, is suppressed through forward feedback; in the later stage of object identification, intra-class noise, caused by the similarity between objects of the same class confusing identification, is suppressed through reverse feedback. In terms of time, the above process acquires object information from coarse to fine. Again, this result is consistent with the "push-pull" phenomenon observed in the object recognition experiments.
In one embodiment of the invention, the object recognition task is to identify cat or dog categories. In this example, a total of 18 cat or dog classes (9 classes each; sample pictures from the picture library ImageNet) are memorized. The pictures are processed into column vectors by a pre-trained deep neural network (VGG); as can be seen in conjunction with FIG. 5, the column vectors automatically form a hierarchical structure among themselves, being more similar within the cat class or the dog class. The column vector of a particular picture is then input into the network, and its category is identified and reproduced by the network. As shown in FIG. 5, after the picture of the object is input, the network evolution starts: the large category (the cat class) is identified at layer 2; forward feedback then suppresses the similarity to dogs (inter-class noise), and reverse feedback suppresses the similarity to other cats (intra-class noise), until the category identification of the specific cat or dog is completed.
It should be noted that the real nervous system has sufficient capacity to implement a dynamic feedback modulation mechanism. For simplicity, the current model assumes that the "push" and "pull" feedback share the same neuron connections, but in practice they are most likely implemented through different signal pathways. For example, the "push" feedback may be accomplished by direct excitatory synaptic connections from high layers to low layers; under such conditions, the cessation of the feedback may be controlled by means of short-term synaptic inhibition. On the other hand, the "pull" feedback may come from another pathway, such as inhibitory interneurons (inter-level inhibition is usually mediated by interneurons). Compared with the direct connection pathway, this regulation is delayed and is controlled by the response of the higher-layer neurons. That is, the "pull" feedback is initiated only after the higher-level object has been identified at the higher level.
On the other hand, a conventional neural network whose weights are constructed by the Hebbian learning rule cannot support too large a storage capacity, because the similarity between memory patterns leads to rapid deterioration of retrieval. To address this deficiency, the present invention uses the hierarchical features of the nervous system structure to store object classification information in a hierarchical structure, thereby reducing the correlation between memory modules (since memory modules at higher levels are more independent). During memory retrieval, the interference from intra-class and inter-class correlations between the lower-layer memory patterns is suppressed through "push-pull" feedback, so that the memory retrieval effect is improved.
Further, the invention also provides an object recognition system based on feedback adjustment. As shown in fig. 6, the system includes a processor 12 and a memory 11 storing instructions executable by the processor 12;
The memory 11 is configured to store program code and transmit the program code to a CPU or a single-chip microcomputer (MCU). The memory 11 may include volatile memory, such as random access memory (RAM); the memory 11 may also include non-volatile memory, such as read-only memory, flash memory, a hard disk or a solid-state disk, an MCU, etc.; the memory 11 may also comprise a combination of the above kinds of memories.
Specifically, the object recognition system based on feedback adjustment provided by the embodiment of the invention comprises a processor 12 and a memory 11; the memory 11 has stored thereon a computer program operable on the processor 12, which when executed by the processor 12 performs the steps of:
visual information enters the deep neural network from a lower level and is propagated and integrated from bottom to top along the feedforward connection of the deep neural network; wherein,
in the early stage of object identification, inter-class noise caused by mutual interference among different classes in the deep neural network is suppressed through forward feedback; in the later stage of object identification, intra-class noise, caused by confusion among objects of the same class in the deep neural network due to their similarity, is suppressed through reverse feedback.
The embodiment of the invention also provides a computer-readable storage medium. The computer-readable storage medium stores one or more programs. The computer-readable storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk or solid-state disk, an MCU, etc.; it may also comprise a combination of the above kinds of memory. The one or more programs in the computer-readable storage medium are executable by one or more processors to perform some or all of the steps of the above-described feedback-adjustment-based object recognition method.
Compared with the prior art, the object identification method and system provided by the invention adopt an inter-level feedback regulation mechanism that organically combines forward feedback ("push") and reverse feedback ("pull"). When the object recognition task is performed, the image of the object to be recognized is input into the first layer of the deep neural network, information is integrated from bottom to top, and at the same time recursive (attractor) retrieval takes place within each layer. The invention thus provides a heuristic for developing better artificial intelligence algorithms for object recognition.
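The within-layer recursive retrieval mentioned above (each layer described by a Hopfield model, per claim 9) can be sketched as follows. This is a generic Hopfield-recall toy example under assumed sizes, not the patent's actual network: Hebbian recurrent weights store a few patterns, and the layer's recurrent dynamics clean up a corrupted input.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 200, 3                            # neurons and stored patterns (illustrative)
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian recurrent weights within one layer (standard Hopfield model).
W = patterns.T @ patterns / N
np.fill_diagonal(W, 0.0)                 # no self-connections

# Corrupt a stored pattern (flip 10% of the bits) ...
x = patterns[0].copy()
flip_idx = rng.choice(N, size=20, replace=False)
x[flip_idx] *= -1

# ... and let the layer's recurrent dynamics retrieve the clean memory.
for _ in range(10):                      # synchronous sign updates
    x = np.sign(W @ x)
    x[x == 0] = 1                        # break ties deterministically

overlap = x @ patterns[0] / N            # approaches 1 on successful retrieval
```

In the full model, this within-layer retrieval runs concurrently with the bottom-up feedforward sweep, and the push-pull feedback biases which attractor each layer settles into.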
The object recognition method and system based on feedback adjustment provided by the present invention have been explained in detail above. Any obvious modification of the invention made by those skilled in the art without departing from its true spirit would constitute an infringement of the patent rights of the invention and would carry the corresponding legal liability.
Claims (10)
1. An object recognition method based on feedback adjustment adopts a deep neural network to perform object recognition, and is characterized by comprising the following steps:
visual information enters the deep neural network from a lower level and is propagated and integrated from bottom to top along the feedforward connection of the deep neural network; wherein,
in the early stage of object identification, inter-class noise caused by mutual interference among different classes in a deep neural network is inhibited through forward feedback;
in the later stage of object identification, intra-class noise, caused by confusion among objects of the same class in the deep neural network due to their similarity, is suppressed through reverse feedback,
the early stage is 1τ to 2.5τ and the later stage is 2τ to 4τ, where τ is a time constant,
the forward feedback connection weight from layer 2 to layer 1 between neuron i and neuron j is:
where α = 1, …, P_α indexes the ancestor modules, the parent modules are {ξ^(α,β)} with β = 1, …, P_β, and the child modules are {ξ^(α,β,γ)} with γ = 1, …, P_γ; P_α, P_β and P_γ respectively represent the numbers of ancestor, parent and child modules;
2. The object recognition method according to claim 1, characterized by comprising the steps of:
the inter-class noise refers to noise caused by the table brother and sister modules of different parents of the same ancestor.
3. The object recognition method according to claim 1, characterized by comprising the steps of:
the intra-class noise refers to noise due to correlation between sub-modules of the same parent.
4. The object recognition method according to claim 1, characterized by comprising the steps of:
the forward feedback and the forward feedback act simultaneously during a coincidence time period of the early stage and the late stage.
5. The object recognition method according to claim 1, characterized by comprising the steps of:
in the deep neural network, feedforward connection and feedback connection are simultaneously arranged between each level, and recursive connection between neurons in the same layer is connection with associative memory function.
6. The object recognition method according to claim 1, characterized by comprising the steps of:
the forward feedback and the backward feedback share the same neuron connection.
7. The object recognition method according to claim 1, characterized by comprising the steps of:
if there is no direct connection between the forward feedback and the backward feedback, the backward feedback is only activated if a higher level object is identified at a higher level.
8. The object recognition method according to claim 1, characterized by comprising the steps of:
storing object classification information using a hierarchy of the deep neural network.
9. The object identification method according to any one of claims 1 to 8, characterized by comprising the steps of:
in the deep neural network, the function of each layer of neural network is described by a Hopfield model.
10. An object recognition system based on feedback adjustment, comprising a processor and a memory; the memory having stored thereon a computer program operable on the processor, the computer program when executed by the processor implementing the steps of:
visual information enters the deep neural network from a lower level and is propagated and integrated from bottom to top along the feedforward connection of the deep neural network; wherein,
in the early stage of object identification, inter-class noise caused by mutual interference among different classes in the deep neural network is suppressed through forward feedback; in the later stage of object identification, intra-class noise, caused by confusion among objects of the same class in the deep neural network due to their similarity, is suppressed through reverse feedback,
the early stage is 1τ to 2.5τ and the later stage is 2τ to 4τ, where τ is a time constant,
the forward feedback connection weight from layer 2 to layer 1 between neuron i and neuron j is:
where α = 1, …, P_α indexes the ancestor modules, the parent modules are {ξ^(α,β)} with β = 1, …, P_β, and the child modules are {ξ^(α,β,γ)} with γ = 1, …, P_γ; P_α, P_β and P_γ respectively represent the numbers of ancestor, parent and child modules;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810936167.7A CN109344958B (en) | 2018-08-16 | 2018-08-16 | Object identification method and identification system based on feedback adjustment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109344958A CN109344958A (en) | 2019-02-15 |
CN109344958B true CN109344958B (en) | 2022-04-29 |
Family
ID=65291554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810936167.7A Active CN109344958B (en) | 2018-08-16 | 2018-08-16 | Object identification method and identification system based on feedback adjustment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344958B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105894012A (en) * | 2016-03-29 | 2016-08-24 | 天津大学 | Object identification method based on cascade micro neural network |
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Based on Three dimensional convolution deep neural network and the action identification method of deep video |
CN107977621A (en) * | 2017-11-29 | 2018-05-01 | 淮海工学院 | Shipwreck identification model construction method, device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9633306B2 (en) * | 2015-05-07 | 2017-04-25 | Siemens Healthcare Gmbh | Method and system for approximating deep neural networks for anatomical object detection |
-
2018
- 2018-08-16 CN CN201810936167.7A patent/CN109344958B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||