CN113077388B - Data-augmented deep semi-supervised over-limit learning image classification method and system - Google Patents
- Publication number
- CN113077388B (application number CN202110448092.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- data
- training
- network model
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 58
- 238000012549 training Methods 0.000 claims abstract description 60
- 238000013528 artificial neural network Methods 0.000 claims abstract description 29
- 230000004927 fusion Effects 0.000 claims abstract description 21
- 230000003190 augmentative effect Effects 0.000 claims abstract description 13
- 238000005516 engineering process Methods 0.000 claims abstract description 9
- 239000011159 matrix material Substances 0.000 claims description 39
- 230000006870 function Effects 0.000 claims description 36
- 238000009826 distribution Methods 0.000 claims description 14
- 239000013598 vector Substances 0.000 claims description 14
- 238000004590 computer program Methods 0.000 claims description 10
- 238000013434 data augmentation Methods 0.000 claims description 8
- 238000013527 convolutional neural network Methods 0.000 claims description 7
- 230000003416 augmentation Effects 0.000 claims description 6
- 238000011176 pooling Methods 0.000 claims description 6
- 238000003860 storage Methods 0.000 claims description 6
- 238000005457 optimization Methods 0.000 claims description 4
- 238000004364 calculation method Methods 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 12
- 230000000694 effects Effects 0.000 description 5
- 238000002372 labelling Methods 0.000 description 5
- 238000012545 processing Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 3
- 230000003321 amplification Effects 0.000 description 2
- 238000003062 neural network model Methods 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 238000013145 classification model Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 230000000116 mitigating effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a data-augmented deep semi-supervised over-limit learning (extreme learning machine based) image classification method and system. The method extracts features from training images with a deep convolutional network model; fine-tunes the model on a small amount of manually labeled data and uses the fine-tuned model to generate pseudo labels for the unlabeled training images; fuses the high-level semantic features extracted from the training images with low-level shallow structural features to obtain fused image features; augments the fused image features and the training-image labels by random linear interpolation; and trains a single-hidden-layer feedforward neural network on the augmented fused features and labels, which replaces the fully connected layer of the deep convolutional network model to yield the final image classification and recognition network model. The method requires little manual labeling, enlarges the training data, offers robust resistance to noise interference, good classification and recognition performance, and strong task extensibility.
Description
Technical Field
The invention relates to the technical field of image classification and target recognition, and in particular to a data-augmented deep semi-supervised over-limit learning (i.e., extreme learning machine based) image classification method and system.
Background
Most current high-performing visual target recognition methods adopt deep learning, which typically requires supervised training of a deep neural network model on large-scale manually labeled data. However, manually labeling large amounts of data is expensive and may even be impractical in certain application scenarios. In recent years, research has therefore increasingly focused on deep semi-supervised learning, which improves visual target classification and recognition by combining a small amount of high-quality labeled data with a large amount of easily obtained unlabeled data. A variety of deep semi-supervised learning techniques have been proposed, including entropy minimization, pseudo-labeling, consistency regularization, pre-training, and generative-model-based approaches.
Among them, three types of techniques are most representative:
The first is pseudo-label-based deep semi-supervised learning. A model trained on a small amount of manually labeled data predicts class labels for the unlabeled data; these predictions serve as pseudo labels, which are added to the manually labeled data to retrain the image class-label prediction model; the pseudo labels are then updated, and the process iterates until an acceptable classification accuracy is reached. However, the labels predicted by the model may be noisy, and the data itself may contain a certain amount of noise, so image classification models trained on such samples often suffer from model degradation, and their accuracy is difficult to bring up to practical application requirements.
The second is deep semi-supervised learning based on consistency regularization, which penalizes inconsistent predictions on unlabeled data under different perturbations. Because it relies on multiple predictions, this technique also suffers from overfitting to noisy labels. Various refinements, such as generating higher-quality pseudo labels or more effective perturbations, have been proposed to reduce the overfitting, but the improvement is evident only at low noise intensity, and the results remain unacceptable when the noise intensity is high.
The third decouples feature representation and classification into two independent stages: before the classifier is learned, the deep neural network is pre-trained through an unsupervised auxiliary learning task to find a compressed feature representation of the input data. This approach pursues a better feature-representation model and a better classifier separately, and has advantages in analyzing and mitigating the overfitting problem. However, the unsupervised auxiliary task lacks label guidance, so the learned representation often risks being inconsistent with the final task. For example, an autoencoder reconstructs all pixels of the original image to guide feature learning, whereas the final classification task may depend on only a small portion of the pixels. In addition, with this technique it is difficult to configure the classifier effectively; sample information is used inefficiently, making efficient learning and training hard to achieve.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems in the prior art, provide a data-augmented deep semi-supervised over-limit learning image classification method and system that require little manual labeling, enlarge the training data, and offer robust resistance to noise interference, good classification and recognition performance, and strong task extensibility.
In order to solve the technical problems, the invention adopts the technical scheme that:
a data-augmented deep semi-supervised over-limit learning image classification method comprises the following steps of learning and training an image classification recognition network model:
s1, extracting features of the training image by adopting a depth convolution network model; training the deep convolutional network model aiming at part of artificial label data of the training image to realize fine tuning optimization, and predicting and generating a corresponding pseudo label for the label-free training image through the fine tuning optimized deep convolutional network model;
s2, fusing the high-level semantic features extracted from the training images with the low-level shallow structure features to obtain fused image features;
s3, augmenting the fusion image characteristics and labels of the training image by adopting a random linear interpolation technology;
s4, randomly dividing the augmented fusion image features and labels into batches, sequentially inputting the batches into the single hidden layer feedforward neural network, updating the weight of a network output layer, and repeating the steps until the training of the single hidden layer feedforward neural network is completed; and removing the full connection layer in the deep convolutional network model, and connecting the full connection layer with the trained single hidden layer feedforward neural network to form an image classification and identification network model for realizing the end-to-end classification and identification of the corresponding image target.
Optionally, the deep convolutional network model in step S1 is a pre-trained 13-CNN deep convolutional network model.
Optionally, the training loss function l_cos of the 13-CNN deep convolutional network model combines a supervised cross-entropy term over the l labeled samples with a consistency regularization term R0 and a cross-entropy regularization term R1, weighted by coefficients λ0, λ1 and λ2. Here y_i is the label of the i-th sample x_i, n is the number of samples, ŷ_i is the label the model self-estimates for unlabeled data, and p(y_i|x_i) is the predicted output of the model. Within the regularization terms, C is the number of sample classes, p_c is the mean class-marginal probability, p̃_c is the class-marginal probability predicted by the model, p(y|x) is the conditional probability output by the model, and H denotes entropy.
Optionally, the fusion in step S2 is a feature-vectorized cascade fusion, expressed as
f_c(x) = concat( GAP( ReLU( f_s(x) ) ), GAP( ReLU( f_h(x) ) ) )
where f_c(x) is the cascade-fused image feature, concat denotes vector concatenation, ReLU is the linear rectification function, GAP is the global average pooling function, and f_s(x) and f_h(x) are, respectively, the low-level shallow structural feature and the high-level semantic feature output by the deep neural network.
Optionally, in step S3 the fused image features and the labels of the training images are augmented by random linear interpolation as
X̃ = λ X_i + (1 − λ) X_j,  Ỹ = λ Y_i + (1 − λ) Y_j
where X̃ is the interpolated image feature matrix, X_i and X_j are the image feature matrices before interpolation, Ỹ is the interpolated label matrix, Y_i and Y_j are the label matrices before interpolation, and λ is a weight vector sampled from a beta distribution.
Optionally, the weight vector λ is sampled from the beta distribution with density
f(x; α, γ) = x^(α−1) (1 − x)^(γ−1) / ∫₀¹ u^(α−1) (1 − u)^(γ−1) du
where α and γ are control parameters of the beta distribution, both greater than 0, and x and u are function variables.
Optionally, the weights of the network output layer in step S4 are updated as
K_(k+1) = K_k + H̃_(k+1)ᵀ H̃_(k+1)
β_(k+1) = β_k + K_(k+1)⁻¹ H̃_(k+1)ᵀ ( Ỹ_(k+1) − H̃_(k+1) β_k )
where K_(k+1) and K_k are weight matrices, β_(k+1) and β_k are the iterative solution parameters, and H̃_(k+1) is the hidden-layer output matrix of the single-hidden-layer feedforward neural network for the (k+1)-th input batch of augmented data with labels Ỹ_(k+1), with initial values
K_0 = H̃_0ᵀ H̃_0 + I/c,  β_0 = K_0⁻¹ H̃_0ᵀ Ỹ_0
where H̃_0 is the hidden-layer output matrix for the initial batch of augmented data with labels Ỹ_0, c is a weight coefficient, and I is the identity matrix.
Optionally, step S4 is followed by: inputting an image to be recognized into the image classification and recognition network model, and obtaining and outputting the image classification and recognition result corresponding to the image.
In addition, the invention provides a data-augmented deep semi-supervised over-limit learning image classification system, comprising a microprocessor and a memory connected to each other, the microprocessor being programmed or configured to execute the steps of the data-augmented deep semi-supervised over-limit learning image classification method.
Furthermore, the invention provides a computer-readable storage medium storing a computer program programmed or configured to execute the data-augmented deep semi-supervised over-limit learning image classification method.
Compared with the prior art, the invention has the main advantages that:
1. The invention adopts a deep semi-supervised learning route, training the image classification and recognition network model on a small amount of labeled data and a large amount of unlabeled data, which greatly reduces the cost of manual data labeling while preserving classification and recognition accuracy. It also improves on existing deep semi-supervised frameworks by decoupling feature learning from classifier training and optimizing each separately, so that the network model learns to extract image features better oriented to the specific classification and recognition task. Classifier training follows the over-limit (extreme) learning machine principle, further improving the generalization of classification and recognition.
2. When the over-limit learning machine is used to train the classifier, a data augmentation mechanism is introduced and integrated into the design of the objective function of the online over-limit learning machine, so that the trained classifier effectively tolerates noise in the training data and its labels, improving the robustness of classification and recognition. Moreover, the data-augmented online over-limit learning method is not limited to the deep semi-supervised framework: it also applies to classifier training in supervised learning tasks, giving it a degree of task extensibility.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an implementation principle of the method according to the embodiment of the present invention.
FIG. 3 shows the performance of the method of the embodiment of the invention on the standard international image classification databases CIFAR-10 and CIFAR-100, compared with related representative methods.
FIG. 4 is a schematic diagram comparing the classification and recognition accuracy of the method of the embodiment on CIFAR-10 and CIFAR-100 with that of one group of representative methods.
FIG. 5 is a schematic diagram comparing the classification and recognition accuracy of the method of the embodiment on CIFAR-10 and CIFAR-100 with that of another group of representative methods.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.
As shown in FIG. 1 and FIG. 2, the data-augmented deep semi-supervised over-limit learning image classification method of this embodiment comprises the following steps for learning and training an image classification and recognition network model:
S1, extracting features from the training images with a deep convolutional network model; fine-tuning the model by training it on the manually labeled portion of the training images, and using the fine-tuned model to predict corresponding pseudo labels for the unlabeled training images;
S2, fusing the high-level semantic features extracted from the training images with low-level shallow structural features to obtain fused image features;
S3, augmenting the fused image features and the labels of the training images by random linear interpolation;
S4, randomly dividing the augmented fused image features and labels into batches, feeding the batches sequentially into a single-hidden-layer feedforward neural network and updating the weights of the network output layer, repeating until training of the single-hidden-layer feedforward network is complete; then removing the fully connected layer of the deep convolutional network model and connecting the trained single-hidden-layer feedforward network in its place, forming the image classification and recognition network model for end-to-end classification and recognition of the corresponding image targets.
Step S1 extracts deep convolutional features of the images and generates pseudo labels. A deep convolutional neural network with basic classification and recognition capability is first trained on the small amount of manually labeled image data and used to perform preliminary classification of the large amount of unlabeled images, yielding corresponding pseudo labels. The original image data is thereby converted into a data set containing both manually labeled and pseudo-labeled images, on which the deep convolutional network is retrained. The original semi-supervised problem is thus converted into a supervised learning process, and the extracted deep convolutional features acquire task relevance.
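The pseudo-label generation of step S1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `predict_proba` callable stands in for the fine-tuned deep convolutional network, and the one-hot encoding of the argmax prediction is an assumed convention.

```python
import numpy as np

def generate_pseudo_labels(predict_proba, unlabeled_x, num_classes):
    """Assign one-hot pseudo labels to unlabeled images using the
    class predictions of a fine-tuned network (step S1).
    `predict_proba` is any callable returning class probabilities."""
    probs = predict_proba(unlabeled_x)        # shape (n, num_classes)
    hard = np.argmax(probs, axis=1)           # predicted class per image
    pseudo = np.eye(num_classes)[hard]        # one-hot pseudo labels
    return pseudo

# Toy usage with a stand-in "model" that normalizes three scores.
rng = np.random.default_rng(0)
x_unlabeled = rng.normal(size=(4, 8))
fake_proba = lambda x: np.abs(x[:, :3]) / np.abs(x[:, :3]).sum(1, keepdims=True)
labels = generate_pseudo_labels(fake_proba, x_unlabeled, 3)
print(labels.shape)  # (4, 3), one one-hot row per unlabeled image
```

The pseudo-labeled set is then merged with the manually labeled set and the network is retrained, as the paragraph above describes.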
As an alternative embodiment, the deep convolutional network model in step S1 is a 13-CNN deep convolutional network model with pre-training.
In this embodiment, the training loss function l_cos of the 13-CNN deep convolutional network model combines a supervised cross-entropy term over the l labeled samples with a consistency regularization term R0 and a cross-entropy regularization term R1, weighted by coefficients λ0, λ1 and λ2. Here y_i is the label of the i-th sample x_i, n is the number of samples, ŷ_i is the label the model self-estimates for unlabeled data, and p(y_i|x_i) is the predicted output of the model. Within the regularization terms, C is the number of sample classes, p_c is the mean class-marginal probability, p̃_c is the class-marginal probability predicted by the model, p(y|x) is the conditional probability output by the model, and H denotes entropy.
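The exact composition of l_cos is shown only as an image in the original publication, so the sketch below is an assumed form, not the patent's formula: it combines the three ingredients the glossary names, namely labeled cross-entropy, a class-marginal consistency term R0 (KL divergence between a uniform prior p_c and the predicted marginal p̃_c), and an entropy term R1, weighted by λ0, λ1 and λ2. All parameter values are illustrative.

```python
import numpy as np

def semi_supervised_loss(p_labeled, y_onehot, p_unlabeled,
                         lam0=1.0, lam1=0.8, lam2=0.4, eps=1e-12):
    """Assumed composition of the 13-CNN training loss: labeled
    cross-entropy plus regularizers R0 and R1 from the glossary."""
    # Supervised cross-entropy over the l labeled samples
    ce = -np.mean(np.sum(y_onehot * np.log(p_labeled + eps), axis=1))
    # R0: KL between a uniform class prior p_c and the predicted marginal
    c = p_labeled.shape[1]
    prior = np.full(c, 1.0 / c)
    marginal = p_unlabeled.mean(axis=0)
    r0 = np.sum(prior * np.log(prior / (marginal + eps)))
    # R1: mean prediction entropy H(p(y|x)) on unlabeled data
    r1 = -np.mean(np.sum(p_unlabeled * np.log(p_unlabeled + eps), axis=1))
    return lam0 * ce + lam1 * r0 + lam2 * r1
```

The R0 term discourages the predicted class marginal from collapsing onto a few classes, while R1 pushes individual predictions toward confident (low-entropy) outputs, the usual roles of consistency and entropy regularization in pseudo-label training.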
In step S2, multi-level deep convolutional feature fusion selects, from the multi-level convolutional features extracted by the deep convolutional network, a shallow feature reflecting image structural information and a deep feature reflecting image category semantics, and performs feature-vectorized cascade fusion:
f_c(x) = concat( GAP( ReLU( f_s(x) ) ), GAP( ReLU( f_h(x) ) ) )
where f_c(x) is the cascade-fused image feature, concat denotes vector concatenation, ReLU is the linear rectification function, GAP is the global average pooling function, and f_s(x) and f_h(x) are, respectively, the low-level shallow structural feature and the high-level semantic feature output by the deep neural network. In this embodiment, taking 13-CNN as an example, the output features of the 3rd and 18th convolutional layers may be selected and cascaded after global pooling and linear rectification processing.
The global average pooling function GAP maps image data X of width W, height H and C channels to a C-dimensional vector whose k-th component is
GAP(X)_k = (1 / (W·H)) Σ_{j=1..H} Σ_{i=1..W} x_{i,j,k}
and the linear rectification function is
ReLU(x) = max(0, x)
where x_{i,j,k} is the data point in the i-th column and j-th row of the k-th channel of X, and max takes the maximum value.
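The fusion of step S2 can be sketched in NumPy as below. The ReLU-then-GAP order is an assumption (the text says only that the two feature maps are cascaded after global pooling and linear rectification processing), and the feature-map shapes are illustrative.

```python
import numpy as np

def global_average_pool(x):
    """GAP over the spatial dims of a (W, H, C) feature map -> (C,) vector."""
    return x.mean(axis=(0, 1))

def fuse_features(f_shallow, f_deep):
    """Step S2: vectorize each feature map with ReLU + GAP, then
    concatenate the shallow structural and deep semantic vectors."""
    relu = lambda t: np.maximum(t, 0.0)   # ReLU(x) = max(0, x)
    v_s = global_average_pool(relu(f_shallow))
    v_h = global_average_pool(relu(f_deep))
    return np.concatenate([v_s, v_h])

# Example: a 32x32x64 shallow map and a 4x4x256 deep map fuse to 320 dims.
fused = fuse_features(np.ones((32, 32, 64)), np.ones((4, 4, 256)))
print(fused.shape)  # (320,)
```

Because GAP removes the spatial dimensions, the two maps can be concatenated even though their spatial resolutions differ, which is what makes cascading a 3rd-layer and an 18th-layer feature practical.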
In step S3 of this embodiment, for online over-limit learning classification based on data augmentation, the image features obtained in steps S1 and S2 and their corresponding labels and pseudo labels are augmented by random linear interpolation, specifically using the existing mixup method:
X̃ = λ X_i + (1 − λ) X_j,  Ỹ = λ Y_i + (1 − λ) Y_j
where X̃ is the interpolated image feature matrix, X_i and X_j are the image feature matrices before interpolation, Ỹ is the interpolated label matrix, Y_i and Y_j are the label matrices before interpolation, and λ is a weight vector sampled from a beta distribution.
In this embodiment, the weight vector λ is sampled from the beta distribution with density
f(x; α, γ) = x^(α−1) (1 − x)^(γ−1) / ∫₀¹ u^(α−1) (1 − u)^(γ−1) du
where α and γ are control parameters of the beta distribution, both greater than 0, and x and u are function variables. In this embodiment, α = γ.
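A minimal NumPy sketch of the step-S3 augmentation, assuming the standard mixup formulation: a per-sample λ drawn from Beta(α, α) (α = γ, as in this embodiment) and partner samples X_j, Y_j obtained by shuffling the batch. Function and parameter names are illustrative.

```python
import numpy as np

def mixup_augment(X, Y, alpha=0.75, rng=None):
    """Step S3: mixup-style random linear interpolation of fused
    features X (N, d) and one-hot labels Y (N, C)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha, size=(X.shape[0], 1))  # weight vector
    perm = rng.permutation(X.shape[0])                  # shuffled partner j
    X_aug = lam * X + (1.0 - lam) * X[perm]             # X~ = lam*Xi + (1-lam)*Xj
    Y_aug = lam * Y + (1.0 - lam) * Y[perm]             # Y~ = lam*Yi + (1-lam)*Yj
    return X_aug, Y_aug
```

Because each augmented label is a convex combination of two one-hot labels, the rows of Y_aug remain valid probability vectors, which is what lets the augmented pairs feed directly into the output-layer training of step S4.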
In step S4 of this embodiment, the fully connected layer at the end of the deep convolutional network model is replaced by a single-hidden-layer feedforward neural network, and data augmentation is further combined with the over-limit learning machine by defining the objective
min_β ‖ H̃ β − Ỹ ‖_F² + (1/c) ‖ β ‖_F²
where λ is the weight vector sampled from the beta distribution, H̃ is the hidden-layer output matrix of the single-hidden-layer feedforward network on the augmented data, β is the output-layer weight matrix to be optimized, ‖·‖_F denotes the Frobenius norm, and c is a weight coefficient. Accordingly, the output weights of the single-hidden-layer feedforward network are given by
β* = ( H̃ᵀ H̃ + I/c )⁻¹ H̃ᵀ Ỹ
where β* denotes the output weights, H̃ is the data matrix of hidden-layer outputs for N samples of feature dimension d, I is the identity matrix, and Ỹ is the augmented label matrix
Ỹ = λ Y_i + (1 − λ) Y_j
where Y_i is the original label matrix and Y_j is a shuffled (reordered) label matrix.
In this embodiment, the weights of the network output layer in step S4 are updated as
K_(k+1) = K_k + H̃_(k+1)ᵀ H̃_(k+1)
β_(k+1) = β_k + K_(k+1)⁻¹ H̃_(k+1)ᵀ ( Ỹ_(k+1) − H̃_(k+1) β_k )
where K_(k+1) and K_k are weight matrices, β_(k+1) and β_k are the iterative solution parameters, and H̃_(k+1) is the hidden-layer output matrix of the single-hidden-layer feedforward neural network for the (k+1)-th input batch of augmented data with labels Ỹ_(k+1), with initial values
K_0 = H̃_0ᵀ H̃_0 + I/c,  β_0 = K_0⁻¹ H̃_0ᵀ Ỹ_0
where H̃_0 is the hidden-layer output matrix for the initial batch of augmented data with labels Ỹ_0, c is a weight coefficient, and I is the identity matrix.
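The update above matches the standard OS-ELM (online sequential extreme learning machine) recursive least-squares scheme, and the sketch below implements that recursion in NumPy. The random tanh hidden layer and all sizes are illustrative assumptions; they are not specified by the patent.

```python
import numpy as np

class OnlineELM:
    """Batch-by-batch training of a single-hidden-layer feedforward
    network's output weights (step S4), via the OS-ELM recursion."""
    def __init__(self, d_in, n_hidden, n_classes, c=100.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_in, n_hidden))  # fixed random input weights
        self.b = rng.normal(size=n_hidden)
        self.c = c                                  # regularization weight
        self.K = None
        self.beta = np.zeros((n_hidden, n_classes))

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)         # hidden-layer output H~

    def partial_fit(self, X, Y):
        H = self._hidden(X)
        if self.K is None:
            # Initial batch: K0 = H0^T H0 + I/c, beta0 = K0^-1 H0^T Y0
            self.K = H.T @ H + np.eye(H.shape[1]) / self.c
            self.beta = np.linalg.solve(self.K, H.T @ Y)
        else:
            # Recursive update: K += H^T H; beta += K^-1 H^T (Y - H beta)
            self.K += H.T @ H
            self.beta += np.linalg.solve(self.K, H.T @ (Y - H @ self.beta))
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

Because the recursion is algebraically exact, the weights after the last batch equal the one-shot ridge solution β* = (H̃ᵀH̃ + I/c)⁻¹H̃ᵀỸ computed over all batches at once; only the output weights β are learned, while the hidden layer stays fixed, which is what makes the per-batch update cheap.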
After training of the single-hidden-layer feedforward neural network is finished, it is connected to the deep convolutional network used in steps S1 and S2 (with the fully connected layer removed), completing the network for classification and recognition of input images. In this embodiment, step S4 is followed by: inputting the image to be recognized into the image classification and recognition network model, and obtaining and outputting the corresponding image classification and recognition result.
FIG. 3 shows the performance of the method of this embodiment on the standard international image classification databases CIFAR-10 and CIFAR-100, compared with related representative methods. In particular, the method retains good classification and recognition performance when the image data contain noise: FIG. 4 and FIG. 5 compare the classification and recognition accuracy of this embodiment on CIFAR-10 and CIFAR-100 with that of representative methods under different label-noise intensities (0%-70%).
In addition, this embodiment provides a data-augmented deep semi-supervised over-limit learning image classification system, comprising a microprocessor and a memory connected to each other, the microprocessor being programmed or configured to execute the steps of the data-augmented deep semi-supervised over-limit learning image classification method.
This embodiment also provides a computer-readable storage medium storing a computer program programmed or configured to execute the aforementioned data-augmented deep semi-supervised over-limit learning image classification method.
In summary, the method of this embodiment obtains training images to be classified and recognized, and performs deep convolutional feature encoding and fusion on each image; trains an initial classification and recognition network on the small labeled portion of the image data and classifies the unlabeled image data to obtain corresponding pseudo-labels; augments the acquired image features together with their labels and pseudo-labels; and removes the fully connected layer of the initial classification and recognition network, replaces it with a single-hidden-layer feedforward neural network layer, and trains the weights of that layer with a data-augmented online over-limit learning machine to obtain the final image classification and recognition network model, which realizes end-to-end classification and recognition of the corresponding image targets. The method requires little manual labeling, is robust to noise interference, achieves good classification and recognition performance, and extends readily to new tasks.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description covers only preferred embodiments of the present invention, and the protection scope of the invention is not limited to these embodiments; all technical solutions within the inventive concept belong to the protection scope of the invention. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention are also considered within its protection scope.
Claims (9)
1. A data-augmented deep semi-supervised over-limit learning image classification method, characterized by comprising the following steps for learning and training an image classification and recognition network model:
S1, extracting features of the training images using a deep convolutional network model; training the deep convolutional network model on the manually labeled portion of the training images to achieve fine-tuning optimization, and predicting corresponding pseudo-labels for the unlabeled training images with the fine-tuned deep convolutional network model;
s2, fusing the high-level semantic features extracted from the training images with the low-level shallow structure features to obtain fused image features;
S3, augmenting the fused image features and labels of the training images using a random linear interpolation technique;
S4, randomly dividing the augmented fused image features and labels into batches, sequentially inputting the batches into the single-hidden-layer feedforward neural network and updating the weights of the network output layer, repeating until training of the single-hidden-layer feedforward neural network is complete; removing the fully connected layer in the deep convolutional network model and connecting it with the trained single-hidden-layer feedforward neural network to form the image classification and recognition network model, realizing end-to-end classification and recognition of the corresponding image targets; the function expressions for updating the weights of the network output layer are:

$K_{k+1} = K_k + H_{k+1}^{\mathrm{T}} H_{k+1}$

$\beta_{k+1} = \beta_k + K_{k+1}^{-1} H_{k+1}^{\mathrm{T}} \left( T_{k+1} - H_{k+1}\beta_k \right)$

In the above formulas, $K_{k+1}$ and $K_k$ are weight matrices, $\beta_{k+1}$ and $\beta_k$ are the iteratively solved parameters, $H_{k+1}$ is the hidden-layer output matrix of the single-hidden-layer feedforward neural network for the input $(k+1)$-th batch of augmented data and labels, and $T_{k+1}$ is the label matrix, with initial values:

$K_0 = \dfrac{I}{c} + H_0^{\mathrm{T}} H_0, \qquad \beta_0 = K_0^{-1} H_0^{\mathrm{T}} T_0$
2. The data-augmented deep semi-supervised over-limit learning image classification method of claim 1, wherein the deep convolutional network model in step S1 is a pre-trained 13-CNN deep convolutional network model.
3. The data-augmented deep semi-supervised over-limit learning image classification method of claim 2, wherein the training loss function of the 13-CNN deep convolutional network model is as follows:
in the above-mentioned formula, the compound has the following structure,l cos a function representing the loss of training is represented,λ 0 ,λ 1 andλ 2 in order to be the weight coefficient,R 0 in order to be a consistent regularization term,R 1 in order to cross-entropy regularization terms,y i in order to be the label of the sample,the model is given a label for the unlabeled data self-estimation,x i is as followsiThe number of the samples is one,nas to the number of samples,p(y i |x i ) Is the predicted output of the model and is,lthe number of marked samples is as follows:
4. The data-augmented deep semi-supervised over-limit learning image classification method of claim 1, wherein the fused image features in step S2 are obtained by feature vectorization cascade fusion, with the function expression:
in the above-mentioned formula, the compound has the following structure,f c (x) For the fused image features after the cascade fusion,concatwhich represents a vector concatenation operation, is shown,ReLUin order to be a linear rectification function,GAPis a function of the global average pooling,f s (x), f h (x) Respectively a low-level shallow structure characteristic and a high-level semantic characteristic output by the deep neural network.
5. The data-augmented deep semi-supervised over-limit learning image classification method of claim 1, wherein the function expression for augmenting the fused image features and labels of the training images by random linear interpolation in step S3 is:
in the above formula, the first and second carbon atoms are,for the interpolated image feature matrix,X j andX i is the image feature matrix before the interpolation,for the label matrix after the interpolation, the label matrix is,Y j andY i is the label matrix before the interpolation,λis a weight vector sampled in the beta distribution.
6. The data-augmented deep semi-supervised over-limit learning image classification method of claim 5, wherein the calculation function for the weight $\lambda$ sampled from the beta distribution is:
in the above-mentioned formula, the compound has the following structure,f(x:α,γ) For weight vector sampled in beta distributionλThe expression of the computational function of (2),α,γcontrol parameters for beta distributions greater than 0 respectively,xanduare all function variable unknowns.
7. The data-augmented deep semi-supervised over-limit learning image classification method of claim 1, wherein step S4 is followed by: inputting the image to be recognized into the image classification and recognition network model, and obtaining and outputting the corresponding image classification and recognition result.
8. A data-augmented deep semi-supervised over-limit learning image classification system, comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor is programmed or configured to perform the steps of the data-augmented deep semi-supervised over-limit learning image classification method of any one of claims 1 to 7.
9. A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the data-augmented deep semi-supervised over-limit learning image classification method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110448092.XA CN113077388B (en) | 2021-04-25 | 2021-04-25 | Data-augmented deep semi-supervised over-limit learning image classification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113077388A CN113077388A (en) | 2021-07-06 |
CN113077388B true CN113077388B (en) | 2022-08-09 |
Family
ID=76618604
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110448092.XA Active CN113077388B (en) | 2021-04-25 | 2021-04-25 | Data-augmented deep semi-supervised over-limit learning image classification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113077388B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113673591B (en) * | 2021-08-13 | 2023-12-01 | 上海交通大学 | Self-adjusting sampling optimization image classification method, device and medium |
CN114462558A (en) * | 2022-04-13 | 2022-05-10 | 南昌工程学院 | Data-augmented supervised learning image defect classification method and system |
CN114821204B (en) * | 2022-06-30 | 2023-04-07 | 山东建筑大学 | Meta-learning-based embedded semi-supervised learning image classification method and system |
CN115272777B (en) * | 2022-09-26 | 2022-12-23 | 山东大学 | Semi-supervised image analysis method for power transmission scene |
CN116168348B (en) * | 2023-04-21 | 2024-01-30 | 成都睿瞳科技有限责任公司 | Security monitoring method, system and storage medium based on image processing |
CN117710763A (en) * | 2023-11-23 | 2024-03-15 | 广州航海学院 | Image noise recognition model training method, image noise recognition method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897737A (en) * | 2017-01-24 | 2017-06-27 | 北京理工大学 | A kind of high-spectrum remote sensing terrain classification method based on the learning machine that transfinites |
CN106960176A (en) * | 2017-02-22 | 2017-07-18 | 华侨大学 | A kind of pedestrian's gender identification method based on transfinite learning machine and color characteristic fusion |
CN107403191A (en) * | 2017-07-03 | 2017-11-28 | 杭州电子科技大学 | A kind of semi-supervised learning machine sorting technique that transfinites with depth structure |
CN109740539A (en) * | 2019-01-04 | 2019-05-10 | 上海理工大学 | 3D object identification method based on transfinite learning machine and fusion convolutional network |
CN110598728A (en) * | 2019-07-23 | 2019-12-20 | 杭州电子科技大学 | Semi-supervised ultralimit learning machine classification method based on graph balance regularization |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104680144B (en) * | 2015-03-02 | 2018-06-05 | 华为技术有限公司 | Based on the lip reading recognition methods and device for projecting very fast learning machine |
WO2018184222A1 (en) * | 2017-04-07 | 2018-10-11 | Intel Corporation | Methods and systems using improved training and learning for deep neural networks |
CN112116088A (en) * | 2020-08-24 | 2020-12-22 | 丽水学院 | Incremental semi-supervised over-limit learning machine system for adaptively determining number of hidden nodes |
Non-Patent Citations (2)
Title |
---|
Traffic Sign Recognition Using Deep Convolutional Networks and Extreme Learning Machine;Yujun Zeng,et al.;《IScIDE 2015, Part I, LNCS 9242》;20151231;272–280 * |
Traffic Sign Recognition Using Kernel Extreme Learning Machines With Deep Perceptual Features;Yujun Zeng,et al.;《IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS》;20161231;1-7 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113077388B (en) | Data-augmented deep semi-supervised over-limit learning image classification method and system | |
CN111079532B (en) | Video content description method based on text self-encoder | |
WO2020063715A1 (en) | Method and system for training binary quantized weight and activation function for deep neural networks | |
CN107526785B (en) | Text classification method and device | |
CN111552807B (en) | Short text multi-label classification method | |
CN110110080A (en) | Textual classification model training method, device, computer equipment and storage medium | |
CN108710896B (en) | Domain learning method based on generative confrontation learning network | |
CN111275046B (en) | Character image recognition method and device, electronic equipment and storage medium | |
JP2023549579A (en) | Temporal Bottleneck Attention Architecture for Video Behavior Recognition | |
CN109389166A (en) | The depth migration insertion cluster machine learning method saved based on partial structurtes | |
CN113159072B (en) | Online ultralimit learning machine target identification method and system based on consistency regularization | |
CN113283590B (en) | Defending method for back door attack | |
CN109242097B (en) | Visual representation learning system and method for unsupervised learning | |
Flenner et al. | A deep non-negative matrix factorization neural network | |
CN110892409A (en) | Method and apparatus for analyzing images | |
Cheng et al. | A survey on deep neural network pruning-taxonomy, comparison, analysis, and recommendations | |
Ulaganathan et al. | Isolated handwritten Tamil character recognition using convolutional neural networks | |
CN114882278A (en) | Tire pattern classification method and device based on attention mechanism and transfer learning | |
CN112527959B (en) | News classification method based on pooling convolution embedding and attention distribution neural network | |
CN116075820A (en) | Method, non-transitory computer readable storage medium and apparatus for searching image database | |
Passalis et al. | Deep temporal logistic bag-of-features for forecasting high frequency limit order book time series | |
Nguyen-Duc et al. | Particle-based Adversarial Local Distribution Regularization. | |
CN115797642A (en) | Self-adaptive image semantic segmentation algorithm based on consistency regularization and semi-supervision field | |
CN115862015A (en) | Training method and device of character recognition system, and character recognition method and device | |
Chu et al. | Mixed-precision quantized neural network with progressively decreasing bitwidth for image classification and object detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||