CN112381148A - Semi-supervised image classification method based on random regional interpolation - Google Patents
- Publication number
- CN112381148A (application CN202011282976.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- label
- images
- cnn
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a semi-supervised image classification method based on random regional interpolation. A small number of images with real labels is selected from the training set, and the remaining images are used as images without real labels; both types of image are fed into a random regional interpolation module. The interpolation process differs between the two: an image with a real label can directly generate a new augmented image through interpolation, while an image without a real label cannot be interpolated directly, so high-confidence label information is first obtained from a teacher network and used as the image's temporary label before the interpolation is performed. The network is then trained on the new augmented images until the preset number of training rounds is reached. By applying random regional interpolation to both types of image, the method generates new augmented images for training the classification network, thereby improving the generalization performance of the trained model.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a semi-supervised image classification method based on random regional interpolation.
Background
With the growth of social media and network services, a large number of photos is uploaded every day, and labeling them manually for model training is becoming infeasible. In more and more applications, data with real labels is scarce, while data without real labels is easy to obtain. To address learning from partially labeled data, semi-supervised learning has been studied as a way to effectively reduce the dependence on labeled data by exploiting unlabeled data.
Semi-supervised learning is a key problem in pattern recognition and machine learning research, combining supervised and unsupervised learning. It performs pattern recognition using a large amount of unlabeled data together with a small amount of labeled data, requiring as little manual annotation effort as possible while delivering higher accuracy. Semi-supervised learning is therefore receiving more and more attention.
In a semi-supervised setting, only a small portion of the training examples is labeled, while all remaining examples are unlabeled. To overcome the lack of labeled data, many data augmentation methods have been developed to obtain similar but different examples from the original ones; the class labels of these examples are unchanged before and after the transformation. Training on the augmented data makes the model robust to rotation, translation, cropping, resizing, flipping, and random erasure. Kolesnikov et al. studied a number of existing data augmentation mechanisms to gain insight into CNN design. To produce richer supervision, linear interpolation has been used to blend training examples, with the corresponding training targets proportional to the blending ratio; this regularizes the model to make smooth predictions between examples. However, deep neural networks tend to learn the most discriminative features to achieve higher training accuracy, so these models may focus on regions that are not necessarily important or desirable. We believe this may reduce generalization performance on unseen data, especially when labeled data is limited. To solve this problem, we propose a more complete mechanism for constructing complex class-ambiguous examples, which forces the model to learn more interpretable and robust features by randomly changing the size and location of the blending region.
Disclosure of Invention
The present invention aims to overcome the problem that existing deep neural networks tend to learn the most discriminative features to obtain high training accuracy and may concentrate on regions that are not necessarily important or desirable, especially when supervision is limited. To this end, a semi-supervised image classification method based on random regional interpolation is provided. The method generates new augmented images by performing random regional interpolation on images with and without real labels, and uses the augmented images to train the network, greatly improving the classification accuracy and generalization performance of the network.
To achieve this purpose, the technical solution provided by the invention is as follows: a semi-supervised image classification method based on random regional interpolation, comprising the following steps:
S1, dividing the training data into a real-label image set X_l and a non-real-label image set X_u, and reserving a test data set X_test;
S2, for each image x_u drawn from the non-real-label image set X_u, obtaining predicted class score information through the teacher network corresponding to the CNN-13 classification network and using it as the temporary label of x_u, wherein the parameters of the teacher network are obtained by an exponential moving average of the CNN-13 classification network parameters;
S3, inputting two real-label images into the random regional interpolation module to obtain a new augmented image sample pair (x̃_l, ỹ_l), where x̃_l denotes the new augmented image information and ỹ_l the label information; inputting two non-real-label images into the random regional interpolation module to obtain a new augmented image sample pair (x̃_u, ỹ_u), where x̃_u denotes the new augmented image information and ỹ_u the label information;
S4, inputting the new augmented image sample pairs (x̃_l, ỹ_l) and (x̃_u, ỹ_u) obtained in step S3 into the CNN-13 classification network for the current round of training, constrained by the loss function;
S5, repeating steps S2-S4, finishing training after the preset number of training rounds is reached, outputting the trained CNN-13 classification network, and using it to perform class prediction on the images to be classified.
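The overall S2-S5 loop can be sketched as follows. This is a hypothetical skeleton only: the function names (`pseudo_label`, `region_interpolate`, `train`) and the plain convex combination standing in for the region-based mixing are illustrative assumptions, not the patent's actual implementation.

```python
import random

def pseudo_label(teacher, x):
    # temporary label from the teacher network's prediction (step S2)
    return teacher(x)

def region_interpolate(pair_a, pair_b):
    # placeholder for the random regional interpolation module (step S3);
    # a plain convex combination stands in for the rectangle-based mixing
    (xa, ya), (xb, yb) = pair_a, pair_b
    lam = random.betavariate(0.25, 0.25)
    return lam * xa + (1 - lam) * xb, lam * ya + (1 - lam) * yb

def train(student, teacher, labeled, unlabeled, epochs=400):
    batches = []
    for _ in range(epochs):                      # S5: repeat S2-S4
        xa, ya = random.choice(labeled)
        xb, yb = random.choice(labeled)
        pair_l = region_interpolate((xa, ya), (xb, yb))
        xu1, xu2 = random.choice(unlabeled), random.choice(unlabeled)
        pair_u = region_interpolate((xu1, pseudo_label(teacher, xu1)),
                                    (xu2, pseudo_label(teacher, xu2)))
        batches.append((pair_l, pair_u))         # S4: train_step(student, ...) here
    return batches
```

The skeleton highlights the key asymmetry of the method: labeled pairs are interpolated directly, while unlabeled images must first pass through the teacher to receive temporary labels.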
In step S1, all images are scaled to achieve the desired training effect and reduce computation. All image data is first divided into training data and a test data set X_test. The training data is further divided into two categories, a real-label image set X_l and a non-real-label image set X_u, in a ratio of 1:50, i.e., the training data equals X_l ∪ X_u. A real-label image is recorded as x_l, i.e., x_l ∈ X_l; a non-real-label image is recorded as x_u, i.e., x_u ∈ X_u.
In step S2, each non-real-label image x_u in the set X_u is assigned a temporary label. The teacher network corresponding to the CNN-13 classification network operates in the same mode as in the test stage, with its parameters fixed and not updated; its parameters are the exponential moving average of the CNN-13 classification network parameters. The prediction of the teacher network is taken as the temporary label of the non-real-label image x_u.
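The exponential moving average that produces the teacher parameters from the CNN-13 student parameters can be sketched as below. The decay value 0.9 here is an illustrative assumption; the patent only states that an exponential sliding average is used, not its decay rate.

```python
def ema_update(teacher_params, student_params, decay=0.9):
    """theta_teacher <- decay * theta_teacher + (1 - decay) * theta_student,
    applied element-wise after each round of training."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

# after one update, a teacher parameter moves a small step toward the student
teacher = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.9)
```

Because the teacher changes slowly, its predictions on unlabeled images are more stable than the student's, which is why they are used as temporary labels.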
In step S3, the random regional interpolation module randomly selects a rectangular region according to a Beta distribution for two images with real labels or two images without real labels, interpolates the image information inside that rectangular region of the first image with the information in the same region of the second image, keeps the information of the first image outside the rectangle, and finally returns the new augmented image formed by the interpolation. The two cases are as follows:
a. Inputting two images containing real labels, (x_1, y_1) and (x_2, y_2), and generating a new augmented image sample pair (x̃_l, ỹ_l). The specific process is as follows:
a1, input two images containing real labels, (x_1, y_1) and (x_2, y_2), where x_1 and x_2 represent the image information and y_1 and y_2 the label information; the spatial resolution of both images is W × H, with W the image width and H the image height;
a2, randomly draw a combination ratio λ from the Beta distribution, i.e., λ ~ Beta(α, α), a continuous probability distribution defined on the interval (0, 1); α is a hyper-parameter whose value differs between data sets;
a3, compute a binary mask R with spatial resolution W × H, in which the values inside the rectangular region (r_x, r_y, r_w, r_h) are 0 and the values elsewhere are 1; here (r_x, r_y) denotes the upper-left corner, r_w the width and r_h the height of the rectangle, with r_x ~ Unif(0, W − r_w) and r_y ~ Unif(0, H − r_h), where Unif denotes the uniform distribution;
a6, generate the new augmented image sample pair (x̃_l, ỹ_l), where x̃_l denotes the new augmented image information and ỹ_l the label information;
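A minimal numpy sketch of steps a1-a6 follows. The rule for the rectangle size (r_w = W·sqrt(1 − λ), r_h = H·sqrt(1 − λ)), the blending rule inside the rectangle, and the area-weighted label interpolation are assumptions filling in the steps whose formulas did not survive in the text; only the Beta-distributed ratio, the binary mask, and the uniform corner sampling are stated there.

```python
import numpy as np

def random_region_interpolate(x1, y1, x2, y2, alpha=0.25, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    H, W = x1.shape[:2]
    lam = rng.beta(alpha, alpha)                 # combination ratio (step a2)
    rw = max(1, int(W * np.sqrt(1 - lam)))       # assumed rectangle-size rule
    rh = max(1, int(H * np.sqrt(1 - lam)))
    rx = rng.integers(0, W - rw + 1)             # r_x ~ Unif(0, W - r_w)
    ry = rng.integers(0, H - rh + 1)             # r_y ~ Unif(0, H - r_h)
    R = np.ones((H, W))                          # binary mask (step a3)
    R[ry:ry + rh, rx:rx + rw] = 0                # 0 inside the rectangle
    mask = R[..., None] if x1.ndim == 3 else R
    # outside the rectangle keep x1; inside, interpolate x1 with x2
    x_new = mask * x1 + (1 - mask) * (lam * x1 + (1 - lam) * x2)
    w = (1 - lam) * (rw * rh) / (W * H)          # assumed effective share of x2
    y_new = (1 - w) * y1 + w * y2
    return x_new, y_new
```

With one-hot labels the output label remains a valid distribution, which is what allows the same routine to be reused with teacher-predicted temporary labels in case b.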
b. Inputting two images without real labels, x_3 and x_4, and generating a new augmented image sample pair (x̃_u, ỹ_u). Because the input images carry no real labels, temporary labels must first be generated for them before the interpolation is performed. The specific process is as follows:
b1, input two images without real labels, x_3 and x_4, representing the image information; the spatial resolution of both images is W × H;
b2, input x_3 and x_4 into the teacher network corresponding to the CNN-13 classification network to obtain the corresponding temporary labels y_3 and y_4;
b3, randomly draw a combination ratio λ from the Beta distribution, i.e., λ ~ Beta(α, α), a continuous probability distribution defined on the interval (0, 1); α is a hyper-parameter whose value differs between data sets;
b4, compute a binary mask R with spatial resolution W × H, in which the values inside the rectangular region (r_x, r_y, r_w, r_h) are 0 and the values elsewhere are 1; here (r_x, r_y) denotes the upper-left corner, r_w the width and r_h the height of the rectangle, with r_x ~ Unif(0, W − r_w) and r_y ~ Unif(0, H − r_h);
b7, generate the new augmented image sample pair (x̃_u, ỹ_u), where x̃_u denotes the new augmented image information and ỹ_u the label information.
In step S4, the new augmented image sample pairs (x̃_l, ỹ_l) and (x̃_u, ỹ_u) are used to optimize the CNN-13 classification network. Since both the original images and the blended region are randomly determined, predicting the class probability distribution of a composite image is challenging when constructing the training target; the CNN-13 classification network is therefore forced to find the important regions associated with the object and to learn robust features for both classes of composite data, i.e., the generated pairs (x̃_l, ỹ_l) and (x̃_u, ỹ_u). The network is optimized using two independent loss terms:
L(θ_C) = E_{(x̃_l, ỹ_l)~D_l} [ l_CE(C(x̃_l), ỹ_l) ] + ρ · E_{(x̃_u, ỹ_u)~D_u} [ l_Div(C(x̃_u), ỹ_u) ]
where θ_C denotes the CNN-13 classification network parameters to be updated, D_l represents the distribution of synthetic data derived from images with real labels, D_u the distribution of synthetic data derived from images without real labels, C(·) denotes the output of the last hidden layer of the CNN-13 classification network after the Softmax function, l_CE is the cross-entropy loss between the true label and the predicted value, l_Div is a function measuring the divergence between the training target and the CNN-13 classification network output, and the weight ρ controls the relative importance of the synthetic instances derived from images without real labels.
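The two loss terms can be sketched in plain numpy as below. The use of the mean-squared distance for l_Div follows the description elsewhere in the text; the Softmax, the epsilon in the log, and the single-sample (non-batched) form are implementation assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def l_ce(logits, target):
    # cross-entropy between the (possibly interpolated) label and the prediction
    p = softmax(logits)
    return float(-np.sum(target * np.log(p + 1e-12)))

def l_div(logits, target):
    # mean-squared divergence between prediction and temporary-label target
    return float(np.mean((softmax(logits) - target) ** 2))

def total_loss(logits_l, y_l, logits_u, y_u, rho=1.0):
    # weight rho controls the influence of the synthetic unlabeled instances
    return l_ce(logits_l, y_l) + rho * l_div(logits_u, y_u)
```

When the prediction matches the target exactly, l_div vanishes, so the unlabeled term only penalizes disagreement with the teacher-derived targets.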
In step S5, the number of training rounds is set to 400; after all data has been trained once, training continues until the preset number of rounds is reached. The teacher network corresponding to the CNN-13 classification network is updated after every round of training. Once the trained CNN-13 classification network is obtained, its parameters are fixed and no longer updated, and no loss function is used when performing class prediction on the images to be classified: the images are fed into the CNN-13 classification network in turn, and each image yields a corresponding prediction result.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention adopts the currently popular deep-learning CNN-13 classification network as the basic model; compared with existing semi-supervised methods, it achieves better classification results and better generalization performance.
2. The invention addresses insufficient supervision by constructing effective class-ambiguous data for semi-supervised image classification, which has not been explored before. Compared with existing interpolation-based methods, the proposed random regional interpolation provides a more complete data augmentation mechanism, and the constructed data regularizes the behavior of the model near the decision boundary.
3. While CNN-13 classification networks tend to learn the most discriminative features to achieve higher training accuracy, they may focus on regions that are not necessarily important or desirable. This may reduce generalization performance on unseen data, especially when real-label data is limited. To solve this problem, the method proposes a more complete mechanism for constructing complex class-ambiguous instances. By randomly changing the size and location of the blending region, the CNN-13 classification network learns more robust and interpretable features.
4. The invention regularizes the behavior of the CNN-13 classification network near the decision boundary by random regional interpolation, which interpolates only within a random rectangular region, combining one training image with another. The augmented image is more complex than the original and can become class-ambiguous when it contains objects belonging to different classes. The training target is determined by interpolating between the labels of the original images according to the area of the blended region. Considering that in a semi-supervised environment a large number of training instances carries no real labels, establishing reliable training targets is crucial. Therefore, a teacher network corresponding to the CNN-13 classification network is constructed by taking an exponential moving average (EMA) of the CNN-13 classification network parameters during training. This teacher network generates more stable and accurate predictions for non-real-label data. On this basis, the random regional interpolation method can be applied to data both with and without real labels, and the synthetic data is used to train the model under supervision. The randomness of the blended region helps the model discover the important spatial regions associated with the target.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
FIG. 2 is a schematic diagram of the random regional interpolation operation of the method, in which Original data denotes the original images input to the random regional interpolation module, RRI module denotes the random regional interpolation module, Classification network denotes the CNN-13 classification network, l_CE denotes the cross-entropy loss function corresponding to images with real labels, l_Div denotes the mean-squared-error loss function corresponding to images without real labels, mask denotes the binary mask, and Beta denotes the Beta distribution.
FIG. 3 shows sample results of the method, in which the first and second columns show the input images and the third column shows the generated augmented images.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1, the semi-supervised image classification method based on random region interpolation provided in this embodiment includes the following steps:
S1, dividing the training data into a real-label image set X_l and a non-real-label image set X_u, and reserving a test data set X_test. The specific steps are as follows:
First, each image is horizontally flipped with probability 0.5; the image is then padded by 2 pixels in width and height and randomly cropped to 32 × 32 pixels; the mean of the image pixels is subtracted; finally, whitening is applied. The image data is classified as required: it is first divided into training data and a test data set X_test; the training data is further divided into a real-label image set X_l and a non-real-label image set X_u in a ratio of 1:50, i.e., the training data equals X_l ∪ X_u. A real-label image is recorded as x_l, i.e., x_l ∈ X_l; a non-real-label image is recorded as x_u, i.e., x_u ∈ X_u.
S2, for each non-real-label image x_u in the set X_u, obtaining predicted class information through the teacher network and using it as the temporary label of x_u, as follows:
To assign temporary labels to the non-real-label images, the CNN-13 classification network is used in the same mode as in the test stage, with fixed, non-updated parameters. The prediction of the teacher network corresponding to the CNN-13 classification network serves as the temporary label of the non-real-label image x_u; the teacher network parameters are the exponential moving average of the CNN-13 classification network parameters.
S3, inputting two images with real labels, together with their real labels, into the random regional interpolation module to obtain the corresponding augmented image sample pair (x̃_l, ỹ_l); inputting two images without real labels, together with their temporary labels, into the random regional interpolation module to obtain the corresponding augmented image sample pair (x̃_u, ỹ_u). As shown in the random regional interpolation module (RRI module) on the left of FIG. 2, two original images are input; after passing through the module, images with real labels yield a new augmented sample pair (x̃_l, ỹ_l), and images without real labels yield a new augmented sample pair (x̃_u, ỹ_u). The resulting images are shown in the third column of FIG. 3, where the first and second columns show the original image data.
The random regional interpolation module randomly selects a rectangular region according to a Beta distribution for two images with real labels or two images without real labels, interpolates the image information inside that rectangular region of the first image with the information in the same region of the second image, keeps the information of the first image outside the rectangle, and finally returns the new augmented image formed by the interpolation. The two cases are as follows:
a. Inputting two images containing real labels, (x_1, y_1) and (x_2, y_2), and generating a new augmented image sample pair (x̃_l, ỹ_l). The specific process is as follows:
a1, input two images containing real labels, (x_1, y_1) and (x_2, y_2), where x_1 and x_2 represent the image information and y_1 and y_2 the label information; the spatial resolution of both images is W × H, with W the image width and H the image height;
a2, randomly draw a combination ratio λ from the Beta distribution, i.e., λ ~ Beta(α, α), a continuous probability distribution defined on the interval (0, 1); α is a hyper-parameter whose value differs between data sets;
a3, compute a binary mask R with spatial resolution W × H, in which the values inside the rectangular region (r_x, r_y, r_w, r_h) are 0 and the values elsewhere are 1; here (r_x, r_y) denotes the upper-left corner, r_w the width and r_h the height of the rectangle, with r_x ~ Unif(0, W − r_w) and r_y ~ Unif(0, H − r_h), where Unif denotes the uniform distribution;
a6, generate the new augmented image sample pair (x̃_l, ỹ_l), where x̃_l denotes the new augmented image information and ỹ_l the label information;
b. Inputting two images without real labels, x_3 and x_4, and generating a new augmented image sample pair (x̃_u, ỹ_u). Because the input images carry no real labels, temporary labels must first be generated for them before the interpolation is performed. The specific process is as follows:
b1, input two images without real labels, x_3 and x_4, representing the image information; the spatial resolution of both images is W × H;
b2, input x_3 and x_4 into the teacher network corresponding to the CNN-13 classification network to obtain the corresponding temporary labels y_3 and y_4;
b3, randomly draw a combination ratio λ from the Beta distribution, i.e., λ ~ Beta(α, α), a continuous probability distribution defined on the interval (0, 1); α is a hyper-parameter whose value differs between data sets;
b4, compute a binary mask R with spatial resolution W × H, in which the values inside the rectangular region (r_x, r_y, r_w, r_h) are 0 and the values elsewhere are 1; here (r_x, r_y) denotes the upper-left corner, r_w the width and r_h the height of the rectangle, with r_x ~ Unif(0, W − r_w) and r_y ~ Unif(0, H − r_h);
b7, generate the new augmented image sample pair (x̃_u, ỹ_u), where x̃_u denotes the new augmented image information and ỹ_u the label information.
S4, inputting the two kinds of augmented image sample pairs into the entrance of the whole network (CNN-13): one is the pair (x̃_u, ỹ_u) corresponding to the non-real-label images of step S3, the other is the pair (x̃_l, ỹ_l) corresponding to the real-label images; the current round of training is constrained with a loss function. As shown in the CNN-13 classification network (Classification network) part on the right of FIG. 2, the augmented pair corresponding to real-label images is constrained with the cross-entropy loss function l_CE, and the augmented pair corresponding to non-real-label images is constrained with the mean-squared-error loss function l_Div.
The CNN-13 classification network comprises 9 convolutional layers (divided into 3 groups), 3 pooling layers, and 1 fully connected layer. The specific training process is as follows:
S41, input a picture I (the augmented image sample pair (x̃_l, ỹ_l) corresponding to real-label images, or the augmented image sample pair (x̃_u, ỹ_u) corresponding to non-real-label images);
S42, the picture I passes through the first group of 128-, 128-, and 128-channel convolutional layers to obtain feature map F1, and then through a max-pooling layer to obtain feature map F1';
S43, the feature map F1' passes through the second group of 256-, 256-, and 256-channel convolutional layers to obtain feature map F2, and then through a max-pooling layer to obtain feature map F2';
S44, the feature map F2' passes through the third group of 512-, 256-, and 128-channel convolutional layers to obtain feature map F3, and then through a max-pooling layer to obtain feature map F3';
S45, the feature map F3' passes through a 128 × 10 fully connected layer to obtain the classification score;
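The feature-map shapes through S42-S45 can be traced with the arithmetic below. It assumes stride-1, padding-preserving convolutions, 2 × 2 max pooling, and a global average pool collapsing the final 128-channel map before the 128 × 10 fully connected layer; the pooling-to-FC step is an assumption, since the text only gives the FC dimensions.

```python
def cnn13_shapes(h=32, w=32):
    """Trace (height, width, channels) after each conv group and pooling layer."""
    shapes = []
    for channels in ([128, 128, 128], [256, 256, 256], [512, 256, 128]):
        c = channels[-1]
        shapes.append((h, w, c))        # feature map F1 / F2 / F3 after the group
        h, w = h // 2, w // 2           # 2x2 max pooling -> F1' / F2' / F3'
        shapes.append((h, w, c))
    shapes.append((10,))                # 128 -> 10 fully connected class scores
    return shapes
```

For a 32 × 32 input this gives 32 → 16 → 8 → 4 spatially, ending in a 4 × 4 × 128 map whose 128 channels feed the 128 × 10 classifier.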
The loss function of the entire network is as follows:
L(θ_C) = E_{(x̃_l, ỹ_l)~D_l} [ l_CE(C(x̃_l), ỹ_l) ] + ρ · E_{(x̃_u, ỹ_u)~D_u} [ l_Div(C(x̃_u), ỹ_u) ]
where θ_C denotes the CNN-13 classification network parameters to be updated, D_l represents the distribution of synthetic data derived from images with real labels, D_u the distribution of synthetic data derived from images without real labels, C(·) denotes the output of the last hidden layer of the CNN-13 classification network after the Softmax function, l_CE is the cross-entropy loss between the true label and the predicted value, l_Div is a function measuring the divergence between the training target and the CNN-13 classification network output (e.g., the mean-squared distance), and the weight ρ controls the relative importance of the synthetic instances derived from images without real labels.
Minimizing the loss function drives the network toward the nearest real label for each prediction. In our setup, only a limited number of training images is used, so data augmentation using images without real labels is very important: it can be expected to effectively increase the diversity and number of training images and thereby improve the generalization ability of the model.
S5, repeating steps S2-S4 and finishing training after the preset number of training rounds is reached.
The real-label image set X_l and the non-real-label image set X_u are large. To train the CNN-13 classification network well, the number of training rounds is set to 400; after all data has been trained once, it is shuffled and trained again until the preset number of rounds is reached, so that the characteristics of the samples can be fully learned. The teacher network corresponding to the CNN-13 classification network is updated after every round of training, and its parameters are the exponential moving average of the CNN-13 classification network parameters.
S6, testing and evaluating the trained CNN-13 classification network on the test data set X_test to obtain the prediction results.
The trained CNN-13 classification network is fixed; throughout the test process the network is not updated and no loss function is used. Each image in the test data set X_test is input into the trained CNN-13 classification network in turn; each image yields a corresponding prediction result, which is compared with the real class label to obtain the test evaluation result.
In the following, we use the Cifar10 dataset as an example, divided into 50000 training images and 10000 test images. From the 50000 training images, 1000 are selected as labeled images and the rest are used as unlabeled images. Each image is horizontally flipped with probability 0.5, padded by 2 pixels in width and height, randomly cropped to 32 × 32 pixels, mean-subtracted, whitened, and finally fed into the CNN-13 classification network.
First, 100 images without real labels are input into the teacher network corresponding to the CNN-13 classification network to obtain temporary labels, which are assigned to those images. Then 100 real-label images are input into the random regional interpolation module to obtain the augmented image sample pairs corresponding to real-label images, and the 100 non-real-label images carrying temporary labels are input into the module to obtain the augmented pairs corresponding to non-real-label images. The 100 augmented pairs of each kind are then fed into the CNN-13 classification network and used to train it jointly. During training, the label information of real-label images is fully reliable, whereas the temporary labels of non-real-label images, produced by the teacher network, carry large uncertainty; to avoid the total loss being dominated by synthetic examples derived from non-real-label data, the weight ρ is gradually increased to its maximum value of 100 over the first 100 iterations. The initial learning rate is 0.1 and is decayed to 0 with cosine annealing; the momentum is 0.9; the optimizer is stochastic gradient descent (SGD); the hyper-parameter α is set to 0.25.
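The cosine-annealing schedule mentioned above can be written in one line: the learning rate starts at 0.1 and decays to 0 over the preset number of rounds. The half-cosine form below is the standard cosine-annealing formula and is an assumption about the exact variant used.

```python
import math

def cosine_lr(epoch, total_epochs=400, lr0=0.1):
    """lr(t) = 0.5 * lr0 * (1 + cos(pi * t / T)): lr0 at t=0, 0 at t=T."""
    return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / total_epochs))
```

At the midpoint (epoch 200 of 400) the rate has fallen to exactly half the initial value, and it reaches 0 at the final epoch.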
Following this method, after 400 training iterations on Cifar10 the whole CNN-13 classification network is essentially stable and the classification results show good performance: the semi-supervised image classification goal is achieved, and a small number of labeled images brings a large improvement.
The above-mentioned embodiment is merely a preferred embodiment of the present invention; the scope of the present invention is not limited thereto, and changes made according to the shape and principle of the present invention shall be covered within the protection scope of the present invention.
Claims (6)
1. A semi-supervised image classification method based on random regional interpolation is characterized by comprising the following steps:
S1, collecting image data and dividing it into a set of images with real labels and a set of images without real labels;
S2, inputting an image without a real label into the teacher network corresponding to the CNN-13 classification network to obtain predicted class score information, which serves as the temporary label of the image without a real label; wherein the parameters of the teacher network are obtained by an exponential moving average of the CNN-13 classification network parameters;
S3, inputting two images with real labels into the random area interpolation module to obtain a new augmented image sample pair (x'_l, y'_l), where x'_l denotes the generated augmented image information and y'_l denotes the label information; inputting two images without real labels into the random area interpolation module to obtain a new augmented image sample pair (x'_u, y'_u), where x'_u denotes the generated augmented image information and y'_u denotes the label information;
S4, inputting the new augmented image sample pairs (x'_l, y'_l) and (x'_u, y'_u) obtained in step S3 into the CNN-13 classification network for the current round of training, constrained by a loss function;
S5, repeating steps S2 to S4; after the preset number of training rounds is reached, finishing training, outputting the trained CNN-13 classification network, and performing class prediction on the image to be classified by using the trained CNN-13 classification network.
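Step S2 obtains the teacher network's parameters by an exponential moving average (EMA) of the CNN-13 classification network's parameters. A minimal sketch of such an update, where the decay value is an assumption (the claim does not specify it):

```python
def ema_update(teacher_params, student_params, decay=0.999):
    # theta_teacher <- decay * theta_teacher + (1 - decay) * theta_student,
    # applied element-wise to every parameter of the network; the teacher
    # is never updated by gradient descent, only by this averaging.
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

# Toy example with two scalar "parameters" and a large decay for visibility
teacher = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.9)
```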
2. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S1, all images are scaled to achieve the desired training effect and reduce the data computation; all image data are first divided into two sets, training data and test data; the training data are further divided into two categories, a set D_l of images with real labels and a set D_u of images without real labels, in a ratio of 1:50, i.e., the training data equal D_l ∪ D_u; an image with a real label is recorded as (x_l, y_l), i.e., (x_l, y_l) ∈ D_l; an image without a real label is recorded as x_u, i.e., x_u ∈ D_u.
3. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S2, temporary labels are assigned to the images without real labels in the set D_u; the teacher network corresponding to the CNN-13 classification network operates in the same mode as in the test stage, with its parameters fixed and not updated; the parameters of the teacher network are the result of an exponential moving average of the parameters of the CNN-13 classification network; the prediction result of the teacher network corresponding to the CNN-13 classification network is taken as the temporary label of the image without a real label.
4. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S3, for two images with real labels or two images without real labels, the random area interpolation module randomly selects a rectangular area according to a combination ratio drawn from the Beta distribution, interpolates the image information inside that rectangular area of the first image with the image information in the same rectangular area of the second image, keeps the information of the first image outside the rectangular area, and finally returns the new augmented image formed by the interpolation; the specific situations are as follows:
a. inputting two images containing real labels (x_l^1, y_l^1) and (x_l^2, y_l^2) and generating a new augmented image sample pair (x'_l, y'_l); the specific process is as follows:
a1, inputting two images containing real labels (x_l^1, y_l^1) and (x_l^2, y_l^2), where x_l^1 and x_l^2 represent the image information and y_l^1 and y_l^2 represent the label information; the spatial resolution of the two images is W × H, where W is the image width and H is the image height;
a2, randomly drawing a combination ratio λ from the Beta distribution, i.e., λ ~ Beta(α, α), where Beta(α, α) is a continuous probability distribution defined on the interval (0, 1) and α is a hyper-parameter whose value differs between datasets;
a3, calculating a binary mask R with spatial resolution W × H, in which the values inside the rectangular area r = (r_x, r_y, r_w, r_h) are 0 and the values elsewhere are 1; here (r_x, r_y) denotes the top-left coordinate, r_w the width of the rectangular area, and r_h its height; r_x ~ Unif(0, W − r_w) and r_y ~ Unif(0, H − r_h), where Unif denotes the uniform distribution;
a6, generating the new augmented image sample pair (x'_l, y'_l), where x'_l denotes the generated augmented image information and y'_l denotes the label information;
b. inputting two images without real labels x_u^1 and x_u^2 and generating a new augmented image sample pair (x'_u, y'_u); because the input images have no real labels, temporary labels must first be generated for them before the interpolation operation is performed; the specific process is as follows:
b1, inputting two images without real labels x_u^1 and x_u^2, where x_u^1 and x_u^2 represent the image information; the spatial resolution of the two images is W × H;
b2, inputting x_u^1 and x_u^2 into the teacher network corresponding to the CNN-13 classification network to obtain the corresponding temporary labels y_u^1 and y_u^2;
b3, randomly drawing a combination ratio λ from the Beta distribution, i.e., λ ~ Beta(α, α), where Beta(α, α) is a continuous probability distribution defined on the interval (0, 1) and α is a hyper-parameter whose value differs between datasets;
b4, calculating a binary mask R with spatial resolution W × H, in which the values inside the rectangular area r = (r_x, r_y, r_w, r_h) are 0 and the values elsewhere are 1; here (r_x, r_y) denotes the top-left coordinate, r_w the width of the rectangular area, and r_h its height; r_x ~ Unif(0, W − r_w); r_y ~ Unif(0, H − r_h).
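The region-selection and mixing steps of claim 4 can be sketched as below. The corner sampling r_x ~ Unif(0, W − r_w), r_y ~ Unif(0, H − r_h) follows the claim text; however, the rule for the rectangle size and the exact in-region mixing and label-mixing formulas are not fully legible in the text above, so the CutMix-style size rule r_w = W·√(1−λ) and the interpolation λ·x¹ + (1−λ)·x² inside the region are assumptions, not verbatim from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_region_interpolate(x1, y1, x2, y2, alpha=0.25):
    """Mix two (image, one-hot label) pairs inside a random rectangle.

    Outside the rectangle the first image is kept unchanged; inside,
    the two images are linearly interpolated with ratio lambda. The
    size rule and the label weighting are plausible reconstructions."""
    H, W = x1.shape[:2]
    lam = rng.beta(alpha, alpha)              # lambda ~ Beta(alpha, alpha)
    rw = int(W * np.sqrt(1.0 - lam))          # assumed CutMix-style size rule
    rh = int(H * np.sqrt(1.0 - lam))
    rx = rng.integers(0, W - rw + 1)          # r_x ~ Unif(0, W - r_w)
    ry = rng.integers(0, H - rh + 1)          # r_y ~ Unif(0, H - r_h)
    R = np.ones((H, W, 1), dtype=x1.dtype)    # binary mask: 0 inside rectangle
    R[ry:ry + rh, rx:rx + rw, :] = 0.0
    # Keep x1 outside the rectangle; interpolate x1 and x2 inside it
    x_new = R * x1 + (1.0 - R) * (lam * x1 + (1.0 - lam) * x2)
    area = (rw * rh) / (W * H)                # fraction of mixed pixels
    mix = area * (1.0 - lam)                  # assumed label weight of image 2
    y_new = (1.0 - mix) * y1 + mix * y2
    return x_new, y_new

x1, x2 = np.zeros((8, 8, 3)), np.ones((8, 8, 3))
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
xn, yn = random_region_interpolate(x1, y1, x2, y2)
```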
5. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S4, the CNN-13 classification network is optimized with the new augmented image sample pairs (x'_l, y'_l) and (x'_u, y'_u); since both the original images and the blended region are randomly determined, predicting the class probability distribution of a composite image is a challenging training target, which forces the CNN-13 classification network to find the important regions associated with the object and to learn robust features for both classes of composite data; the network is therefore optimized with two independent loss functions:

L(θ_C) = E_{(x'_l, y'_l) ~ P_l} [l_CE(y'_l, C(x'_l))] + ρ · E_{(x'_u, y'_u) ~ P_u} [l_Div(y'_u, C(x'_u))]

wherein θ_C denotes the parameters of the CNN-13 classification network that need updating, P_l represents the distribution of the synthetic data derived from the images with real labels, P_u represents the distribution of the synthetic data derived from the images without real labels, C(·) represents the output of the last hidden layer of the CNN-13 classification network after the Softmax function, l_CE represents the cross-entropy loss function between the true label and the predicted value, l_Div is a function measuring the divergence between the training target and the CNN-13 classification network output, and the weight ρ controls the relative importance of the synthetic instances derived from the images without real labels.
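The two-term objective of claim 5 can be sketched as follows: cross-entropy l_CE on synthetic pairs from labeled data, plus ρ times a divergence l_Div on synthetic pairs from unlabeled data. The claim names l_Div only as "a function for measuring divergence", so the mean-squared-error choice below is an assumption:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def semi_supervised_loss(logits_l, y_l, logits_u, y_u, rho):
    """l_CE on labeled synthetic pairs + rho * l_Div on unlabeled pairs.
    l_Div is taken here to be mean squared error (an assumption)."""
    p_l = softmax(logits_l)
    ce = -np.mean(np.sum(y_l * np.log(p_l + 1e-12), axis=-1))
    p_u = softmax(logits_u)
    div = np.mean((p_u - y_u) ** 2)
    return ce + rho * div

# Confident, correct predictions on both branches give a near-zero loss
loss = semi_supervised_loss(
    np.array([[10.0, 0.0]]), np.array([[1.0, 0.0]]),
    np.array([[0.0, 0.0]]), np.array([[0.5, 0.5]]), rho=100.0)
```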
6. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S5, the number of training rounds is set to 400; when all data have been trained once, training continues over the data again until the preset number of rounds is reached; the teacher network corresponding to the CNN-13 classification network is updated after each round of training; after the trained CNN-13 classification network is obtained, its parameters are fixed and no longer updated, and no loss function is used when performing class prediction on images to be classified: the images to be classified are input into the CNN-13 classification network in sequence, and a corresponding prediction result is obtained for each image.
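At inference time, each image's prediction reduces to taking the highest-scoring class from the network's Softmax output, which can be sketched as:

```python
import numpy as np

def predict(logits):
    # Class prediction: stable softmax over the network's final-layer
    # output, then the index of the highest-scoring class per image.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.argmax(axis=-1)
```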
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011282976.4A CN112381148B (en) | 2020-11-17 | 2020-11-17 | Semi-supervised image classification method based on random regional interpolation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112381148A true CN112381148A (en) | 2021-02-19 |
CN112381148B CN112381148B (en) | 2022-06-14 |
Family
ID=74584880
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011282976.4A Active CN112381148B (en) | 2020-11-17 | 2020-11-17 | Semi-supervised image classification method based on random regional interpolation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112381148B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408575A (en) * | 2021-05-12 | 2021-09-17 | 桂林电子科技大学 | Image data augmentation method based on discriminant area positioning |
CN113420786A (en) * | 2021-05-31 | 2021-09-21 | 杭州电子科技大学 | Semi-supervised classification method for feature mixed image |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108364016A (en) * | 2018-01-12 | 2018-08-03 | 华南理工大学 | Gradual semisupervised classification method based on multi-categorizer |
CN108764281A (en) * | 2018-04-18 | 2018-11-06 | 华南理工大学 | A kind of image classification method learning across task depth network based on semi-supervised step certainly |
CN109657697A (en) * | 2018-11-16 | 2019-04-19 | 中山大学 | Classified optimization method based on semi-supervised learning and fine granularity feature learning |
CN110097103A (en) * | 2019-04-22 | 2019-08-06 | 西安电子科技大学 | Based on the semi-supervision image classification method for generating confrontation network |
CN111275129A (en) * | 2020-02-17 | 2020-06-12 | 平安科技(深圳)有限公司 | Method and system for selecting image data augmentation strategy |
CN111368660A (en) * | 2020-02-25 | 2020-07-03 | 华南理工大学 | Single-stage semi-supervised image human body target detection method |
Non-Patent Citations (1)
Title |
---|
VIKAS VERMA ET AL.: "Interpolation Consistency Training for Semi-Supervised Learning", 《ARXIV》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109949317B (en) | Semi-supervised image example segmentation method based on gradual confrontation learning | |
CN111242208B (en) | Point cloud classification method, segmentation method and related equipment | |
CN110097095B (en) | Zero sample classification method based on multi-view generation countermeasure network | |
CN110837836A (en) | Semi-supervised semantic segmentation method based on maximized confidence | |
CN110852273A (en) | Behavior identification method based on reinforcement learning attention mechanism | |
CN109743642B (en) | Video abstract generation method based on hierarchical recurrent neural network | |
CN112381148B (en) | Semi-supervised image classification method based on random regional interpolation | |
CN110889450B (en) | Super-parameter tuning and model construction method and device | |
CN114332578A (en) | Image anomaly detection model training method, image anomaly detection method and device | |
CN113313123B (en) | Glance path prediction method based on semantic inference | |
CN115731441A (en) | Target detection and attitude estimation method based on data cross-modal transfer learning | |
CN112651998A (en) | Human body tracking algorithm based on attention mechanism and double-current multi-domain convolutional neural network | |
CN114360067A (en) | Dynamic gesture recognition method based on deep learning | |
CN111753207A (en) | Collaborative filtering model of neural map based on comments | |
CN113111716A (en) | Remote sensing image semi-automatic labeling method and device based on deep learning | |
Zhou et al. | Attention transfer network for nature image matting | |
CN111259938A (en) | Manifold learning and gradient lifting model-based image multi-label classification method | |
CN112529025A (en) | Data processing method and device | |
CN117152427A (en) | Remote sensing image semantic segmentation method and system based on diffusion model and knowledge distillation | |
CN114841778B (en) | Commodity recommendation method based on dynamic graph neural network | |
CN111209886A (en) | Rapid pedestrian re-identification method based on deep neural network | |
CN113658285B (en) | Method for generating face photo to artistic sketch | |
Yue et al. | A Novel Two-stream Architecture Fusing Static And Dynamic Features for Human Action Recognition | |
Chang et al. | STAU: a spatiotemporal-aware unit for video prediction and beyond | |
Wu et al. | DDFPN: Context enhanced network for object detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||