CN110796195B - Image classification method including online small sample excitation - Google Patents

Image classification method including online small sample excitation

Info

Publication number
CN110796195B
CN110796195B · CN201911039732.0A
Authority
CN
China
Prior art keywords
test
activin
vector
activ
hebb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911039732.0A
Other languages
Chinese (zh)
Other versions
CN110796195A (en)
Inventor
杨绍武
徐利洋
唐玉华
黄达
胡古月
吴慧超
郭晖晖
陈伯韬
杨懿
蔡成林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201911039732.0A priority Critical patent/CN110796195B/en
Publication of CN110796195A publication Critical patent/CN110796195A/en
Application granted granted Critical
Publication of CN110796195B publication Critical patent/CN110796195B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of image processing and discloses an image classification method that includes online small-sample excitation. The invention aims to provide an image-classification neural network with the ability to keep learning in a constantly changing environment, whose classification accuracy can be excited by reference pictures received online. Building on a convolutional neural network, the invention adds neuromodulation parameters to the framework of a plastic neural network, bringing the whole network closer to a biological neural network, and adopts an online, real-time principle for selecting reference pictures in the prediction stage. The method offers high flexibility and accuracy in image classification tasks and can adjust its output in real time according to the environment.

Description

Image classification method including online small sample excitation
Technical Field
The invention belongs to the field of image processing, relates to methods for classifying pictures with artificial neural networks trained on small samples, and in particular relates to an image classification method including online small-sample excitation.
Background
Image classification is the process by which a trained computer analyzes an unfamiliar test picture and determines which category its content belongs to; it has wide application in computer vision and machine learning. The deep neural network is a very popular and effective image classification method: the raw features of an image are abstracted by a multilayer convolutional neural network into feature vectors that can be linearly separated, the feature vectors are then linearly combined by a fully connected layer, and finally the probability that the image belongs to each category is obtained.
The traditional deep-neural-network classification approach has a limitation: a large amount of training data is needed to train the parameters of the network, and once the network is trained, all parameters are frozen; they do not change during subsequent prediction, and the set of classes the network recognizes is likewise fixed. The problem is that if a new category is added, or new training pictures different from the training set become available, the whole network must be retrained, which consumes a great deal of time and cannot deliver real-time behavior.
A biological neural network has no such frozen structure and parameters; instead, its flexible structure and operating mechanisms let it adapt to a new environment in a short time and learn rapidly. For example, many animals in nature can accurately memorize and navigate back to the place where food is located after visiting it only once or twice. These mechanisms have not yet been fully explained by neuroscience, but the Hebbian Theory proposed by Donald Hebb in 1949 is the basic principle of synaptic plasticity. Its Hebb learning rule (Hebb Rule) is an unsupervised learning rule whose effect is to let a network extract the statistical characteristics of the training set, dividing inputs into several classes according to their degree of similarity. This agrees well with how humans observe and understand the world, which is to a considerable extent classified according to the statistical characteristics of things. The Hebb learning rule changes weights only according to the level of activation between connected neurons, so this approach is also called associative or correlational learning.
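To make the rule concrete, here is a minimal Python sketch of the classical Hebb rule described above (an illustration, not code from the patent; all names are ours): the weight between two units grows in proportion to the product of their activations, with no supervision signal.

    import numpy as np

    def hebb_update(w, pre, post, eta=0.1):
        # Classical Hebb rule: connections between co-active units strengthen.
        return w + eta * np.outer(pre, post)

    w = np.zeros((4, 3))                   # 4 pre-synaptic units, 3 post-synaptic units
    pre = np.array([1.0, 0.0, 1.0, 0.0])  # which inputs fired
    post = np.array([0.0, 1.0, 1.0])      # which outputs fired
    w = hebb_update(w, pre, post)          # w[0,1], w[0,2], w[2,1], w[2,2] grow

Repeated over a stream of inputs, such updates accumulate the statistical regularities of the data, which is the unsupervised clustering effect described above.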
With the continuous development of artificial intelligence, the demands on the intelligence of image recognition keep rising. The conventional machine-learning mode is to train on a number of classes (for example, 100), then freeze the whole network, predict for each picture to be classified the probability of each of those 100 classes, and output the class with the highest probability as the prediction. This mode struggles, however, with the following more flexible, intelligent tasks: (1) the whole image classification task is divided into several stages (life cycles), and the image categories differ per stage; for example, in the first life cycle the image to be classified is expected to belong to one of the three categories apple, cat, and ship, in the second life cycle to one of tree, truck, and spider, and so on; (2) the categories in the whole task are fixed, but the actual distribution of the images to be classified over those categories shifts as the task proceeds; (3) the categories in the whole task are fixed, but during the task new labeled training images that never appeared in the original training set keep arriving, and learning the features in these new images would greatly improve the classification of the next images to be classified in the short term. In a network trained by the conventional image classification method, once training finishes (which usually requires millions of training images and from ten hours to several weeks of training time), the network structure, the parameters, and the classes, including the meaning of each bit of the output probability vector, are all fixed, and the network cannot adapt its classification task to environmental changes such as those above.
A method that applies the Hebbian rule to an artificial neural network and builds a network structure that can be trained end to end was proposed in 2018 under the name differentiable plasticity (the differentiable plastic neural network).
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in a long-term image classification task that can be divided into several life cycles, learn, within each life cycle, the newly received reference pictures and the features they contain, and use the result of that learning to predict the picture to be classified at the end of the life cycle. While performing its classification task, the plastic classification network can adjust its definition of, and sensitivity to, each category online according to the most recently received reference pictures and their class labels, improving the recognition rate on unseen images and achieving the effect of learning.
The specific technical scheme of the invention is as follows:
an image classification method comprising online small sample excitation comprises two stages of training and prediction,
wherein, the training data in the training phase comprises a plurality of life cycles (episodes), and each life cycle comprises the following steps:
firstly, randomly extracting 5 categories from all categories of the training set, numbering them categories 1-5, and then randomly extracting one picture from each of categories 1-5 to form a group of 5 reference pictures;
secondly, randomly extracting one class from classes 1-5 and extracting one picture from that class as the test picture for this life cycle;
thirdly, performing the following operations on the 5 reference pictures one by one in sequence (a code sketch of steps 3.1-3.9 follows this list):
3.1 obtaining, for each picture, a 1x64 feature vector activin through a combination of four convolution layers with nonlinear tanh and Maxpool layers;
3.2 multiplying the feature vector activin by the fixed weight parameter w to obtain the first part of the activation vector, activ_1 = activin x w (here the x symbol denotes matrix multiplication of the vector activin with the matrix w, and likewise below);
3.3 multiplying the feature vector activin by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part of the activation vector, activ_2 = activin x (alpha x Hebb);
3.4 determining the third part of the activation vector from the reference picture's class label: activ_3 = 1000 * label, where label is a one-hot class label; for example, if a picture's class is 3 out of {0,1,2,3,4}, then label = [0,0,0,1,0];
3.5 adding the three parts to obtain the activation vector activ = activ_1 + activ_2 + activ_3;
3.6 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout = sigmoid(activ);
3.7 calculating the neuromodulation parameter, i.e. the entropy of the prediction, where m (short for Modulation) is a trainable neuromodulation coefficient, mean() computes an average, and Entropy() computes an entropy value: mod = mean(Entropy(activout + m * activ));
3.8 calculating the correlation between the input vector activin of step 3.1 and the output vector activout of step 3.6 as their outer product, and accumulating it into the Hebbian trace according to the neuromodulation parameter: Hebb = (1 - mod) * Hebb + mod * (activin ⊗ activout);
3.9 outputting the updated Hebb;
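The following Python sketch mirrors steps 3.1-3.9 under stated assumptions: features are 1x64 rows, an episode has 5 classes, alpha is elementwise with the same shape as Hebb, and the entropy expression follows the text literally. The convolutional feature extractor is omitted (any stack producing a 1x64 activin will do), and everything beyond the names in the steps is our guess, not the patent's reference implementation.

    import torch

    N_FEAT, N_CLASS = 64, 5
    w = torch.randn(N_FEAT, N_CLASS, requires_grad=True)      # fixed weight parameter
    alpha = torch.randn(N_FEAT, N_CLASS, requires_grad=True)  # Hebbian weight parameter
    m = torch.zeros(1, requires_grad=True)                    # neuromodulation coefficient

    def entropy(p, eps=1e-8):
        # Shannon entropy of each row, clamped for numerical safety.
        p = p.clamp(eps, 1.0)
        return -(p * p.log()).sum(dim=-1)

    def reference_step(activin, label_onehot, hebb):
        # One reference picture: activin is (1, 64), label_onehot is (1, 5).
        activ_1 = activin @ w                         # 3.2 fixed-weight part
        activ_2 = activin @ (alpha * hebb)            # 3.3 plastic (Hebbian) part
        activ_3 = 1000.0 * label_onehot               # 3.4 clamp to the true class
        activ = activ_1 + activ_2 + activ_3           # 3.5 combine
        activout = torch.sigmoid(activ)               # 3.6 per-bit probability
        mod = entropy(activout + m * activ).mean()    # 3.7 neuromodulation parameter
        hebb = (1 - mod) * hebb + mod * (activin.t() @ activout)  # 3.8 outer product
        return activout, hebb                         # 3.9 updated trace

The large constant 1000 saturates the sigmoid at the true class, so a reference picture's output is effectively clamped to its label while the Hebbian trace memorizes the feature-label association.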
fourthly, performing the following operations on the 1 test picture (a sketch of this pass and the loss computation follows the fifth step):
4.1 obtaining a 1x64 feature vector activin' through the same combination of four convolution layers with nonlinear tanh and Maxpool layers;
4.2 multiplying activin' by the fixed weight parameter w to obtain the first part of the activation vector, activ_1' = activin' x w;
4.3 multiplying activin' by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_2' = activin' x (alpha x Hebb);
4.4 adding the two parts to obtain the activation vector activ' = activ_1' + activ_2';
4.5 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout' = sigmoid(activ');
4.6 outputting the prediction result activout';
and fifthly, comparing the prediction result activout' of the test picture with the true label target, computing the cross-entropy to obtain the loss function Loss = crossEntropy(activout', target), and back-propagating the loss to adjust the parameters of the whole network.
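Continuing the sketch, the fourth and fifth steps could look as follows; reading crossEntropy(activout', target) as a per-bit binary cross-entropy against the one-hot target is our assumption, and the random features stand in for outputs of the convolutional stack.

    def test_step(activin_t, hebb):
        # Steps 4.1-4.6: the same two activation parts, but no label clamp.
        activ = activin_t @ w + activin_t @ (alpha * hebb)
        return torch.sigmoid(activ)

    # One training life cycle over dummy features.
    refs = [(torch.randn(1, N_FEAT), torch.eye(N_CLASS)[i:i + 1]) for i in range(N_CLASS)]
    hebb = torch.zeros(N_FEAT, N_CLASS)               # fresh trace for the life cycle
    for activin, label_onehot in refs:
        _, hebb = reference_step(activin, label_onehot, hebb)

    target = torch.eye(N_CLASS)[2:3]                  # true class of the test picture
    activout_t = test_step(torch.randn(1, N_FEAT), hebb)
    loss = torch.nn.functional.binary_cross_entropy(activout_t, target)  # fifth step
    loss.backward()                                   # error feedback into w, alpha, m

Because the Hebb updates themselves are differentiable, the gradient also flows through the plastic part, which is what makes the plasticity trainable end to end.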
The prediction phase is performed online and can have any number of lifecycles (episodes), each lifecycle comprising the steps of:
sixthly, receiving 5 reference pictures, one per class, from a trusted source (e.g., manual annotation or another image classification method with high confidence), and numbering their classes 1-5;
step seven, receiving a picture to be classified;
and eighthly, performing the following operations one by one, in order, on the 5 reference pictures received in real time during the testing stage (the 'samples' of small-sample learning); the operations mirror the training stage, but no error feedback or network parameter adjustment is performed:
8.1 obtaining, for each picture, a 1x64 feature vector activin_test through the combination of four convolution layers with nonlinear tanh and Maxpool layers;
8.2 multiplying activin_test by the fixed weight parameter w to obtain the first part of the activation vector, activ_test_1 = activin_test x w;
8.3 multiplying activin_test by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_test_2 = activin_test x (alpha x Hebb);
8.4 determining the third part from the reference picture's class label: activ_test_3 = 1000 * label_test, where label_test is a one-hot class label encoded in the same way as in the training phase;
8.5 adding the three parts to obtain the activation vector activ_test = activ_test_1 + activ_test_2 + activ_test_3;
8.6 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout_test = sigmoid(activ_test);
8.7 calculating the neuromodulation parameter, i.e. the entropy of the prediction: mod_test = mean(Entropy(activout_test + m * activ_test));
8.8 calculating the correlation between the input vector activin_test of step 8.1 and the output vector activout_test of step 8.6 as their outer product, and accumulating it into the Hebbian trace according to the neuromodulation parameter: Hebb = (1 - mod_test) * Hebb + mod_test * (activin_test ⊗ activout_test);
8.9 outputting the updated Hebb;
and ninthly, performing the following operations on the 1 picture to be classified in the testing stage (a code sketch of the whole prediction episode follows this list):
9.1 obtaining a 1x64 feature vector activin_test' through the combination of four convolution layers with nonlinear tanh and Maxpool layers;
9.2 multiplying activin_test' by the fixed weight parameter w to obtain the first part of the activation vector, activ_test_1' = activin_test' x w;
9.3 multiplying activin_test' by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_test_2' = activin_test' x (alpha x Hebb);
9.4 adding the two parts to obtain the activation vector activ_test' = activ_test_1' + activ_test_2';
9.5 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout_test' = sigmoid(activ_test');
9.6 outputting the final prediction result activout_test'.
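A hedged sketch of one prediction-phase life cycle (the sixth through ninth steps), reusing the functions above: the Hebbian trace is emptied at the start of the episode, the five trusted reference pictures excite it without any backpropagation, and the picture to be classified is scored against the updated trace.

    @torch.no_grad()                                  # prediction phase: no error feedback
    def predict_episode(reference_feats, reference_onehots, query_feat):
        hebb = torch.zeros(N_FEAT, N_CLASS)           # clear residual information
        for activin_test, onehot in zip(reference_feats, reference_onehots):
            _, hebb = reference_step(activin_test, onehot, hebb)   # steps 8.1-8.9
        activout_test = test_step(query_feat, hebb)   # steps 9.1-9.5
        return activout_test.argmax(dim=-1)           # class with the highest probability

    # Example call with dummy features standing in for conv-stack outputs:
    feats = [torch.randn(1, N_FEAT) for _ in range(N_CLASS)]
    onehots = [torch.eye(N_CLASS)[i:i + 1] for i in range(N_CLASS)]
    pred = predict_episode(feats, onehots, torch.randn(1, N_FEAT))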
Compared with the prior art, the invention can achieve the following beneficial effects:
when an artificial intelligence method is used for image classification tasks, the method is limited in that a training model needs a large amount of training data. The meta learning or academic learning method can enable the network to quickly memorize the association between the features and the labels of the reference pictures only through a small amount of real-time training under the condition that only a small amount of samples (called as reference pictures) are possessed, and can realize the classification of new test pictures to be classified. The invention firstly trains the model through inputting a large amount of data, so that the model can obtain the learning ability through learning. In this step of learning, neural network learning is not a simple ability to classify pictures; instead, in a life cycle, a plurality of reference pictures with different categories are displayed, then a picture to be classified which is not seen is classified, and the classification result can be influenced by the information provided by the reference pictures. Before a picture needs to be predicted, a brand new lifecycle is started to empty all residual information in the hash trace. Therefore, the fact that the irrelevant information in the previous life cycle does not influence the prediction of the new life cycle can be guaranteed, the information received in the life cycle can be memorized to the maximum degree, and the influence on the prediction result is enhanced.
Drawings
FIG. 1 is a diagram of the structure of the convolutional network in the method of the invention;
FIG. 2 is the structure of the plastic part in the flow of the invention;
FIG. 3 is the overall flow of the invention.
Detailed Description
The drawings are only for illustrating the invention and are not to be construed as limiting the patent; the technical scheme of the invention is further explained below with reference to the attached drawings.
An image classification method comprising online small sample excitation comprises two stages of training and prediction,
wherein, the training data in the training phase comprises a plurality of life cycles (episodes), and each life cycle comprises the following steps:
step one, randomly extracting 5 categories from all categories of the training set, numbering them categories 1-5, and then randomly extracting one picture from each of categories 1-5 to form a group of 5 reference pictures;
secondly, randomly extracting one class from classes 1-5 and extracting one picture from that class as the test picture for this life cycle;
thirdly, performing the following operations on the 5 reference pictures one by one in sequence:
3.1 obtaining, for each picture, a 1x64 feature vector activin through a combination of four convolution layers with nonlinear tanh and Maxpool layers (shown in FIG. 1);
3.2 multiplying the feature vector activin by the fixed weight parameter w to obtain the first part of the activation vector, activ_1 = activin x w (here the x symbol denotes matrix multiplication of the vector activin with the matrix w, and likewise below);
3.3 multiplying the feature vector activin by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part of the activation vector, activ_2 = activin x (alpha x Hebb);
3.4 determining the third part of the activation vector from the reference picture's class label: activ_3 = 1000 * label, where label is a one-hot class label; for example, if a picture's class is 3 out of {0,1,2,3,4}, then label = [0,0,0,1,0];
3.5 adding the three parts to obtain the activation vector activ = activ_1 + activ_2 + activ_3;
3.6 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout = sigmoid(activ);
steps 3.1-3.6 above compute the output activout; their overall structure is shown in FIG. 2;
3.7 calculating the neuromodulation parameter, i.e. the entropy of the prediction, where m (short for Modulation) is a trainable neuromodulation coefficient, mean() computes an average, and Entropy() computes an entropy value: mod = mean(Entropy(activout + m * activ));
3.8 calculating the correlation between the input vector activin of step 3.1 and the output vector activout of step 3.6 as their outer product, and accumulating it into the Hebbian trace according to the neuromodulation parameter: Hebb = (1 - mod) * Hebb + mod * (activin ⊗ activout);
3.9 outputting the updated Hebb;
fourthly, performing the following operations on the 1 test picture:
4.1 obtaining a 1x64 feature vector activin' through the combination of four convolution layers with nonlinear tanh and Maxpool layers (shown in FIG. 1);
4.2 multiplying activin' by the fixed weight parameter w to obtain the first part of the activation vector, activ_1' = activin' x w;
4.3 multiplying activin' by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_2' = activin' x (alpha x Hebb);
4.4 adding the two parts to obtain the activation vector activ' = activ_1' + activ_2';
4.5 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout' = sigmoid(activ');
4.6 outputting the prediction result activout';
and fifthly, comparing the prediction result activout' of the test picture with the true label target, computing the cross-entropy to obtain the loss function Loss = crossEntropy(activout', target), and back-propagating the loss to adjust the parameters of the whole network.
The prediction phase is performed online and can have any number of lifecycles (episodes), each lifecycle comprising the steps of:
sixthly, receiving 5 reference pictures, one per class, from a trusted source (e.g., manual annotation or another image classification method with high confidence), and numbering their classes 1-5;
step seven, receiving a picture to be classified;
and eighthly, performing the following operations in sequence on the 5 reference pictures received in real time during the testing stage (the 'samples' of small-sample learning); the operations mirror the training stage, but no error feedback or network parameter adjustment is performed:
8.1 obtaining, for each picture, a 1x64 feature vector activin_test through the combination of four convolution layers with nonlinear tanh and Maxpool layers (shown in FIG. 1);
8.2 multiplying activin_test by the fixed weight parameter w to obtain the first part of the activation vector, activ_test_1 = activin_test x w;
8.3 multiplying activin_test by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_test_2 = activin_test x (alpha x Hebb);
8.4 determining the third part from the reference picture's class label: activ_test_3 = 1000 * label_test, where label_test is a one-hot class label encoded in the same way as in the training phase;
8.5 adding the three parts to obtain the activation vector activ_test = activ_test_1 + activ_test_2 + activ_test_3;
8.6 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout_test = sigmoid(activ_test);
8.7 calculating the neuromodulation parameter, i.e. the entropy of the prediction: mod_test = mean(Entropy(activout_test + m * activ_test));
8.8 calculating the correlation between the input vector activin_test of step 8.1 and the output vector activout_test of step 8.6 as their outer product, and accumulating it into the Hebbian trace according to the neuromodulation parameter: Hebb = (1 - mod_test) * Hebb + mod_test * (activin_test ⊗ activout_test);
8.9 outputting the updated Hebb;
and ninthly, performing the following operations on the 1 picture to be classified in the testing stage:
9.1 obtaining a 1x64 feature vector activin_test' through the combination of four convolution layers with nonlinear tanh and Maxpool layers (shown in FIG. 1);
9.2 multiplying activin_test' by the fixed weight parameter w to obtain the first part of the activation vector, activ_test_1' = activin_test' x w;
9.3 multiplying activin_test' by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_test_2' = activin_test' x (alpha x Hebb);
9.4 adding the two parts to obtain the activation vector activ_test' = activ_test_1' + activ_test_2';
9.5 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout_test' = sigmoid(activ_test');
9.6 outputting the final prediction result activout_test'.
Fig. 3 gives the general flow of the plastic classification network. The following examples illustrate specific embodiments of the present invention.
The first stage is the training stage: a data set is prepared for the classification task, or a classic image classification data set (e.g., ImageNet) is chosen as the training set, and training then runs for a given number of life cycles (e.g., 5,000,000), the goal being to train the network parameters well through error feedback so that the network learns, online, how to learn. Five classes are randomly drawn from all classes of the training set (e.g., 100 classes) and numbered 1-5, and one picture is randomly drawn from each of classes 1-5 to form a group of 5 reference pictures. Any one of classes 1-5 is then chosen and a test picture is drawn from it for this life cycle. The 5 reference pictures pass through the convolution layers one by one to yield feature vectors; each feature vector is multiplied by the fixed weights and by the plastic weights, the products are added, and a clamping (clip) operation fixes the output to the picture's label. The outer product of the output and the feature vector is accumulated into the Hebbian trace (Hebb) according to the computed neuromodulation parameter (mod). The test picture's feature vector, obtained from the convolution layers, is likewise multiplied by the fixed and plastic weights and the products added, but without clamping to a label, giving the predicted class probabilities activout. The prediction activout is then compared with the true annotation target to obtain the loss function Loss, which is back-propagated to adjust the parameters of the whole network.
The first stage yields the trained model, in which some parameters are now fixed: the fixed weight parameter w, the Hebbian weight parameter alpha, the structure and parameters of the convolution layers, and so on. The plastic network retains the ability to learn during prediction, because in the testing phase the reference pictures are exchanged online for new pictures that represent environmental change, and the Hebbian trace is emptied and accumulated afresh at the beginning of each life cycle of the testing phase.
The second stage is the prediction stage, which is performed online; the reference pictures are sampled and updated in real time according to environmental changes and, to a certain extent, represent the probability distribution of the current environment. For example, classified pictures are continuously delivered to human recipients, who manually correct the misclassified ones. These manually corrected pictures are treated as highly accurate labels and returned to the network as reference pictures. When a picture to be classified starts a new life cycle (episode), the 5 reference pictures pass through the convolution layers one by one to yield feature vectors; each is multiplied by the fixed weights and by the plastic weights, the products are added, and a clamping (clip) operation fixes the output to the picture's label. The outer product of the output and the feature vector is accumulated into the Hebbian trace (Hebb) according to the computed neuromodulation parameter (mod). Finally, the class with the highest probability in the output (activout) for the picture to be classified is its classification result.
The foregoing describes embodiments of the invention, but, as noted above, the invention is not limited to the forms disclosed herein; they are not to be construed as excluding other embodiments, and the invention may be used in various other combinations, modifications, and environments, and may be changed within the scope of the inventive concept described herein according to the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims (1)

1. The image classification method including online small-sample excitation is characterized by comprising two stages, training and prediction,
wherein the training data of the training phase comprise a plurality of life cycles, and each life cycle comprises the following steps:
firstly, randomly extracting 5 categories from all categories of the training set, numbering them categories 1-5, and then randomly extracting one picture from each of categories 1-5 to form a group of 5 reference pictures;
secondly, randomly extracting one class from classes 1-5 and extracting one picture from that class as the test picture for this life cycle;
thirdly, performing the following operations on the 5 reference pictures one by one in sequence:
3.1 obtaining, for each picture, a 1x64 feature vector activin through a combination of four convolution layers with nonlinear tanh and Maxpool layers;
3.2 multiplying the feature vector activin by the fixed weight parameter w to obtain the first part of the activation vector, activ_1 = activin x w;
3.3 multiplying the feature vector activin by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part of the activation vector, activ_2 = activin x (alpha x Hebb);
3.4 determining the third part of the activation vector from the reference picture's class label: activ_3 = 1000 * label, where label is a one-hot class label;
3.5 adding the three parts to obtain the activation vector activ = activ_1 + activ_2 + activ_3;
3.6 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout = sigmoid(activ);
3.7 calculating the neuromodulation parameter, i.e. the entropy of the prediction, where m is a trainable neuromodulation coefficient, mean() computes an average, and Entropy() computes an entropy value: mod = mean(Entropy(activout + m * activ));
3.8 calculating the correlation between the input vector activin of step 3.1 and the output vector activout of step 3.6 as their outer product, and accumulating it into the Hebbian trace according to the neuromodulation parameter: Hebb = (1 - mod) * Hebb + mod * (activin ⊗ activout);
3.9 outputting the updated Hebb;
fourthly, performing the following operations on the 1 test picture:
4.1 obtaining a 1x64 feature vector activin' through the combination of four convolution layers with nonlinear tanh and Maxpool layers;
4.2 multiplying activin' by the fixed weight parameter w to obtain the first part of the activation vector, activ_1' = activin' x w;
4.3 multiplying activin' by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_2' = activin' x (alpha x Hebb);
4.4 adding the two parts to obtain the activation vector activ' = activ_1' + activ_2';
4.5 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout_r = sigmoid(activ');
4.6 outputting the prediction result activout_r;
fifthly, comparing the prediction result activout_r of the test picture with the true label target, computing the cross-entropy to obtain the loss function Loss = crossEntropy(activout_r, target), and back-propagating the loss to adjust the parameters of the whole network;
the prediction phase is performed on-line, and there may be any number of life cycles, each life cycle comprising the steps of:
sixthly, receiving from a trusted source 5 reference pictures, one per class, and numbering their classes 1-5;
step seven, receiving a picture to be classified;
and eighthly, performing the following operations one by one, in order, on the total of 5 reference pictures received in real time during the testing stage, without error feedback or network parameter adjustment:
8.1 obtaining, for each picture, a 1x64 feature vector activin_test through the combination of four convolution layers with nonlinear tanh and Maxpool layers;
8.2 multiplying activin_test by the fixed weight parameter w to obtain the first part of the activation vector, activ_test_1 = activin_test x w;
8.3 multiplying activin_test by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_test_2 = activin_test x (alpha x Hebb);
8.4 determining the third part from the reference picture's class label: activ_test_3 = 1000 * label_test, where label_test is a one-hot class label encoded in the same way as in the training phase;
8.5 adding the three parts to obtain the activation vector activ_test = activ_test_1 + activ_test_2 + activ_test_3;
8.6 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout_test = sigmoid(activ_test);
8.7 calculating the neuromodulation parameter, i.e. the entropy of the prediction: mod_test = mean(Entropy(activout_test + m * activ_test));
8.8 calculating the correlation between the input vector activin_test of step 8.1 and the output vector activout_test of step 8.6 as their outer product, and accumulating it into the Hebbian trace according to the neuromodulation parameter: Hebb = (1 - mod_test) * Hebb + mod_test * (activin_test ⊗ activout_test);
8.9 outputting the updated Hebb;
and ninthly, performing the following operations on the 1 picture to be classified in the testing stage:
9.1 obtaining a 1x64 feature vector activin_test' through the combination of four convolution layers with nonlinear tanh and Maxpool layers;
9.2 multiplying activin_test' by the fixed weight parameter w to obtain the first part of the activation vector, activ_test_1' = activin_test' x w;
9.3 multiplying activin_test' by the Hebbian trace Hebb and the Hebbian weight parameter alpha to obtain the second part, activ_test_2' = activin_test' x (alpha x Hebb);
9.4 adding the two parts to obtain the activation vector activ_test' = activ_test_1' + activ_test_2';
9.5 using a sigmoid function to limit each bit of the activation vector to [0,1] so that it is a valid probability, giving the output activout_test' = sigmoid(activ_test');
9.6 outputting the final prediction result activout_test'.
CN201911039732.0A 2019-10-29 2019-10-29 Image classification method including online small sample excitation Active CN110796195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911039732.0A CN110796195B (en) 2019-10-29 2019-10-29 Image classification method including online small sample excitation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911039732.0A CN110796195B (en) 2019-10-29 2019-10-29 Image classification method including online small sample excitation

Publications (2)

Publication Number Publication Date
CN110796195A CN110796195A (en) 2020-02-14
CN110796195B true CN110796195B (en) 2022-07-05

Family

ID=69442019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911039732.0A Active CN110796195B (en) 2019-10-29 2019-10-29 Image classification method including online small sample excitation

Country Status (1)

Country Link
CN (1) CN110796195B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116063B (en) * 2020-08-11 2024-04-05 西安交通大学 Feature offset correction method based on meta learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10176382B1 (en) * 2015-10-01 2019-01-08 Hrl Laboratories, Llc Method and apparatus for sparse associative recognition and recall for visual media reasoning
CN110276394A (en) * 2019-06-21 2019-09-24 扬州大学 Power equipment classification method based on deep learning under a kind of small sample

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10176382B1 (en) * 2015-10-01 2019-01-08 Hrl Laboratories, Llc Method and apparatus for sparse associative recognition and recall for visual media reasoning
CN110276394A (en) * 2019-06-21 2019-09-24 扬州大学 Power equipment classification method based on deep learning under a kind of small sample

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Differentiable plasticity: training plastic neural networks with backpropagation; Thomas Miconi et al.; arXiv; 2018-07-31; pp. 1-12 *
An image compression algorithm based on Hebb theory and PCA (一种基于Hebb理论和PCA的图像压缩算法); Zhang Beimin (张倍敏); Science Mosaic (科技广场); 2016-02-29 (No. 02); full text *

Also Published As

Publication number Publication date
CN110796195A (en) 2020-02-14

Similar Documents

Publication Publication Date Title
Yoon et al. Data valuation using reinforcement learning
CN109544306B (en) Cross-domain recommendation method and device based on user behavior sequence characteristics
CN111079836B (en) Process data fault classification method based on pseudo label method and weak supervised learning
CN108664632A (en) A kind of text emotion sorting algorithm based on convolutional neural networks and attention mechanism
CN108647226B (en) Hybrid recommendation method based on variational automatic encoder
CN109670576B (en) Multi-scale visual attention image description method
CN112381581B (en) Advertisement click rate estimation method based on improved Transformer
CN111222332A (en) Commodity recommendation method combining attention network and user emotion
CN110955826B (en) Recommendation system based on improved cyclic neural network unit
CN108563624A (en) A kind of spatial term method based on deep learning
CN110659742A (en) Method and device for acquiring sequence representation vector of user behavior sequence
CN110727844B (en) Online commented commodity feature viewpoint extraction method based on generation countermeasure network
CN110110372B (en) Automatic segmentation prediction method for user time sequence behavior
CN113239897B (en) Human body action evaluation method based on space-time characteristic combination regression
CN117218498B (en) Multi-modal large language model training method and system based on multi-modal encoder
KR20200010672A (en) Smart merchandise searching method and system using deep learning
CN112488055A (en) Video question-answering method based on progressive graph attention network
Milutinovic et al. End-to-end training of differentiable pipelines across machine learning frameworks
CN114722182A (en) Knowledge graph-based online class recommendation method and system
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
CN116402352A (en) Enterprise risk prediction method and device, electronic equipment and medium
CN110796195B (en) Image classification method including online small sample excitation
CN113821724A (en) Graph neural network recommendation method based on time interval enhancement
CN116484868A (en) Cross-domain named entity recognition method and device based on diffusion model generation
CN116579347A (en) Comment text emotion analysis method, system, equipment and medium based on dynamic semantic feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant