CN112381148A - Semi-supervised image classification method based on random regional interpolation

Semi-supervised image classification method based on random regional interpolation

Info

Publication number
CN112381148A
Authority
CN
China
Prior art keywords: image, label, images, cnn, real
Prior art date
Legal status
Granted
Application number
CN202011282976.4A
Other languages
Chinese (zh)
Other versions
CN112381148B (en)
Inventor
曾祥平
霍晓阳
吴斯
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202011282976.4A
Publication of CN112381148A
Application granted
Publication of CN112381148B
Current legal status: Active

Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06N3/088: Neural networks; non-supervised learning, e.g. competitive learning


Abstract

The invention discloses a semi-supervised image classification method based on random regional interpolation. A small number of images with real labels are selected from the training set, and the remaining images are treated as images without real labels; both types of images are fed into a random regional interpolation module. The interpolation process differs between the two types: an image with a real label can be interpolated directly to generate a new augmented image, whereas an image without a real label cannot be interpolated as-is, so high-confidence label information is first obtained from a teacher network and used as the temporary label of the image before the interpolation operation is performed. The network is then trained on the new augmented images until a preset number of training rounds is reached. By applying random regional interpolation to both types of images simultaneously, the method generates new augmented images for training the classification network, thereby improving the generalization performance of the trained model.

Description

Semi-supervised image classification method based on random regional interpolation
Technical Field
The invention relates to the technical field of computer vision, in particular to a semi-supervised image classification method based on random regional interpolation.
Background
With the growth of social media and network services, a large number of photos are uploaded every day, and labeling them manually for model training has become infeasible. In more and more applications, data with real labels is scarce, while data without real labels is easy to obtain. To address the problem of learning when only part of the data carries real labels, semi-supervised learning has been studied as a way to effectively reduce the dependence on data with real labels by exploiting data without real labels.
Semi-supervised learning is a key problem in pattern recognition and machine learning research, and is a learning method that combines supervised and unsupervised learning. Semi-supervised learning performs pattern recognition using a large amount of data without real labels together with a small amount of data with real labels. It requires as little manual labeling effort as possible while delivering higher accuracy, so semi-supervised learning is currently receiving more and more attention.
In a semi-supervised setting, only a small portion of the training examples are labeled, while all remaining examples are unlabeled. To overcome the lack of real-label data, many data augmentation methods have been developed to obtain similar but different examples from the original ones; the class labels of these examples are unchanged before and after the transformation. Training on the augmented data makes the model robust to rotation, translation, cropping, resizing, flipping, and random erasure. Kolesnikov et al. studied a number of existing data augmentation mechanisms to gain insight into CNN design. To produce more complex supervision, linear interpolation has been used to blend training examples, with the corresponding training targets mixed in proportion to the blend ratio; the model is thereby regularized to make smooth predictions between examples. However, deep neural networks always tend to learn the most discriminative features to achieve higher training accuracy, and such models may focus on regions that are not necessarily important or desirable. We believe this can reduce generalization performance on unseen data, especially when labeled data is limited. To solve this problem, we propose a more complete mechanism for constructing complex class-ambiguous examples, which forces the model to learn more interpretable and robust features by randomly varying the size and location of the blended region.
Disclosure of Invention
The present invention aims to overcome the problem that existing deep neural networks tend to learn only the most discriminative features to obtain high training accuracy and may concentrate on regions that are not necessarily important or needed, especially when supervision is limited. To this end, a semi-supervised image classification method based on random regional interpolation is provided. The method generates new augmented images by performing random regional interpolation on images with real labels and on images without real labels, and uses the new augmented images to train the network, thereby greatly improving the classification accuracy and generalization performance of the network.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a semi-supervised image classification method based on random regional interpolation, comprising the following steps:

S1, dividing the training set data into a real-label image set D_L and a no-real-label image set D_U;

S2, taking an image u without a real label from the no-real-label image set D_U, and obtaining predicted class score information through the teacher network corresponding to the CNN-13 classification network as the temporary label of the image u; wherein the parameters of the teacher network are obtained by an exponential moving average of the CNN-13 classification network parameters;

S3, inputting two images with real labels into the entrance of the whole random regional interpolation module to obtain a new augmented image sample pair (x̃_L, ỹ_L), where x̃_L denotes the newly generated augmented image information and ỹ_L denotes its label information; inputting two images without real labels into the entrance of the whole random regional interpolation module to obtain a new augmented image sample pair (x̃_U, ỹ_U), where x̃_U denotes the newly generated augmented image information and ỹ_U denotes its label information;

S4, inputting the new augmented image sample pairs (x̃_L, ỹ_L) and (x̃_U, ỹ_U) obtained in step S3 into the CNN-13 classification network for the current round of training, constrained by a loss function;
and S5, repeating the steps S2-S4, finishing training after reaching the preset training times, outputting the trained CNN-13 classification network, and performing class prediction on the image to be classified by using the trained CNN-13 classification network.
In step S1, all images are scaled as required to achieve the desired training effect and reduce computation; all data are then classified as required: first, all image data are divided into training data and a test data set D_T; the training data are further divided into two categories, a real-label image set D_L and a no-real-label image set D_U, in a ratio of 1:50, i.e. the training data equal D_L ∪ D_U; an image with a real label is recorded as (x, y) ∈ D_L, and an image without a real label is recorded as u ∈ D_U.
In step S2, each image u in the no-real-label image set D_U is assigned a temporary label; the teacher network corresponding to the CNN-13 classification network is run in the same mode as in the test stage, with its parameters fixed and not updated; the parameters of the teacher network are the result of an exponential moving average of the CNN-13 classification network parameters; the prediction result of the teacher network corresponding to the CNN-13 classification network is taken as the temporary label of the image u without a real label.
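The exponential moving average that produces the teacher parameters can be sketched as follows; this is a minimal illustration, and the decay value 0.99 is an assumption, since the patent does not state it:

import torch

@torch.no_grad()
def update_teacher(student: torch.nn.Module, teacher: torch.nn.Module,
                   decay: float = 0.99) -> None:
    """Teacher update by exponential moving average of student parameters:
    teacher <- decay * teacher + (1 - decay) * student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)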
In step S3, the random regional interpolation module randomly selects a rectangular region according to the Beta distribution on the two images with real labels or the two images without real labels, interpolates the image information inside the rectangular region of the first image with the image information in the same rectangular region of the second image, keeps the information of the first image outside the rectangular region, and finally returns the new augmented image formed after interpolation; the specific situation is as follows, with a Python sketch of the whole procedure given after step b7:
a. inputting two images with real labels, (x1, y1) and (x2, y2), and generating a new augmented image sample pair (x̃_L, ỹ_L); the specific process is as follows:

a1, input two images with real labels, (x1, y1) and (x2, y2), where x1 and x2 denote the image information and y1 and y2 denote the label information; the spatial resolution of both images is W × H, where W denotes the image width and H denotes the image height;

a2, randomly draw a combination ratio λ from the Beta distribution, i.e. λ ~ Beta(α, α), where Beta(α, α) is a continuous probability distribution defined on the interval (0, 1) and α is a hyper-parameter whose value differs from data set to data set;

a3, compute a binary mask R with spatial resolution W × H, whose values are 0 inside the rectangular region (r_x, r_y, r_w, r_h) and 1 in the other regions, where (r_x, r_y) denotes the top-left corner coordinate, r_w denotes the width of the rectangular region, and r_h denotes its height:

r_w = W·√(1 - λ);

r_x ~ Unif(0, W - r_w), where Unif denotes the uniform distribution;

r_h = H·√(1 - λ);

r_y ~ Unif(0, H - r_h);

a4, generate the new augmented image x̃_L = R ⊙ x1 + (1 - R) ⊙ (λ·x1 + (1 - λ)·x2), where ⊙ denotes element-wise multiplication;

a5, generate the label ỹ_L corresponding to the new augmented image x̃_L: ỹ_L = λ′·y1 + (1 - λ′)·y2, where λ′ = 1 - (1 - λ)·r_w·r_h/(W·H) is the proportion of the first image retained in x̃_L;

a6, obtain the new augmented image sample pair (x̃_L, ỹ_L), where x̃_L denotes the newly generated augmented image information and ỹ_L denotes its label information;
b. inputting two images without real labels, u1 and u2, and generating a new augmented image sample pair (x̃_U, ỹ_U); because the input images carry no real labels, temporary labels must first be generated for them before the interpolation operation is carried out; the specific process is as follows:

b1, input two images without real labels, u1 and u2, denoting the image information; the spatial resolution of both images is W × H;

b2, input u1 and u2 into the teacher network corresponding to the CNN-13 classification network to obtain the corresponding temporary labels q1 and q2;

b3, randomly draw a combination ratio λ from the Beta distribution, i.e. λ ~ Beta(α, α), where Beta(α, α) is a continuous probability distribution defined on the interval (0, 1) and α is a hyper-parameter whose value differs from data set to data set;

b4, compute a binary mask R with spatial resolution W × H, whose values are 0 inside the rectangular region (r_x, r_y, r_w, r_h) and 1 in the other regions, where (r_x, r_y) denotes the top-left corner coordinate, r_w denotes the width of the rectangular region, and r_h denotes its height:

r_w = W·√(1 - λ);

r_x ~ Unif(0, W - r_w);

r_h = H·√(1 - λ);

r_y ~ Unif(0, H - r_h);

b5, generate the new augmented image x̃_U = R ⊙ u1 + (1 - R) ⊙ (λ·u1 + (1 - λ)·u2), where ⊙ denotes element-wise multiplication;

b6, generate the label ỹ_U corresponding to the new augmented image x̃_U: ỹ_U = λ′·q1 + (1 - λ′)·q2, where λ′ = 1 - (1 - λ)·r_w·r_h/(W·H);

b7, obtain the new augmented image sample pair (x̃_U, ỹ_U), where x̃_U denotes the newly generated augmented image information and ỹ_U denotes its label information.
In step S4, the new augmented image sample pairs (x̃_L, ỹ_L) and (x̃_U, ỹ_U) are used to optimize the CNN-13 classification network. Since both the original images and the blended region are randomly determined, predicting the class probability distribution of a composite image is a challenge when constructing the training target; the CNN-13 classification network is therefore forced to find the important regions associated with the object and to learn robust features on both classes of composite data, i.e. the generated new augmented image sample pairs (x̃_L, ỹ_L) and (x̃_U, ỹ_U). The network is optimized using two independent loss functions as follows:

min_{θ_C}  E_{(x̃_L, ỹ_L)~P_L}[ ℓ_CE(C(x̃_L), ỹ_L) ] + ρ · E_{(x̃_U, ỹ_U)~P_U}[ ℓ_Div(C(x̃_U), ỹ_U) ]

where θ_C denotes the parameters of the CNN-13 classification network to be updated; P_L denotes the distribution of synthetic data derived from images with real labels; P_U denotes the distribution of synthetic data derived from images without real labels; C(·) = Softmax(h(·)), where h(·) denotes the output of the last hidden layer of the CNN-13 classification network for a given input; ℓ_CE denotes the cross-entropy loss function between the real label and the predicted value; ℓ_Div is a function measuring the divergence between the training target and the CNN-13 classification network output; and the weight ρ controls the relative importance of the synthetic instances derived from images without real labels.
In step S5, the number of training rounds is set to 400; after all the data have been trained once, training continues over the data again until the preset number of rounds is reached; the teacher network corresponding to the CNN-13 classification network is updated at the end of each round. After the trained CNN-13 classification network is obtained, its parameters are fixed and no longer updated, and class prediction is performed on the images to be classified without using a loss function: the images to be classified are input into the CNN-13 classification network in turn, and each image obtains a corresponding prediction result.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention adopts the currently popular deep-learning CNN-13 classification network as the base model; compared with existing semi-supervised methods, it achieves a better classification effect and better generalization performance.
2. The invention addresses the problem of insufficient supervision by constructing effective class-ambiguous data for semi-supervised image classification, which has not been explored before. Compared with existing interpolation-based methods, the proposed random regional interpolation provides a more complete data augmentation mechanism, and the constructed data regularizes the behavior of the model near the decision boundary.
3. CNN-13 classification networks always tend to learn the most discriminative features to achieve higher training accuracy, yet they may focus on regions that are not necessarily important or desirable. This can reduce generalization performance on unseen data, especially when real-label data is limited. To solve this problem, the method proposes a more complete mechanism for constructing complex class-ambiguous instances: by randomly varying the size and location of the blended region, the CNN-13 classification network learns more robust and interpretable features.
4. The invention regularizes the behavior of the CNN-13 classification network near the decision boundary by random regional interpolation, which interpolates only inside a random rectangular region and thereby combines one training image with another. The augmented image is more complex than the originals and can become class-ambiguous when it contains objects belonging to different classes. The training target is determined by interpolating between the labels of the original images according to the area of the blended region. Considering that in a semi-supervised environment a large number of training instances carry no real labels, establishing reliable training targets is crucial; therefore a teacher network corresponding to the CNN-13 classification network is constructed by taking an exponential moving average (EMA) of the CNN-13 classification network parameters during training. This teacher network produces more stable and accurate predictions for data without real labels. On this basis, the random regional interpolation method can be applied both to data without real labels and to data with real labels, and the synthetic data are used to supervise the training of the model. The randomness of the blended region helps the model find the important spatial regions associated with the target.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention.
FIG. 2 is a schematic diagram of the random regional interpolation operation of the method of the present invention, in which Original data denotes the original images input to the random regional interpolation module, RRI module denotes the random regional interpolation module, Classification network denotes the CNN-13 classification network, ℓ_CE denotes the cross-entropy loss function applied to images with real labels, ℓ_Div denotes the mean-squared divergence loss function applied to images without real labels, mask denotes the binary mask, and Beta denotes the Beta distribution.
FIG. 3 shows sample images of the method of the present invention, in which the first and second columns show the input images and the third column shows the generated augmented image.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in FIG. 1, the semi-supervised image classification method based on random regional interpolation provided in this embodiment comprises the following steps:

S1, dividing the training data into a real-label image set D_L, a no-real-label image set D_U, and a test data set D_T. The specific steps are as follows:
Firstly, each image is horizontally flipped with probability 0.5; the image is then padded by 2 pixels in width and height and randomly cropped to 32 × 32 pixels; the mean of the image pixels is subtracted, and finally whitening is applied. The image data are then classified as required: first, the image data are divided into training data and a test data set D_T; the training data are further divided into two categories, a real-label image set D_L and a no-real-label image set D_U, in a ratio of 1:50, i.e. the training data equal D_L ∪ D_U; an image with a real label is recorded as (x, y) ∈ D_L, and an image without a real label is recorded as u ∈ D_U.
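The preprocessing above can be sketched with torchvision as follows; the per-channel normalization constants are illustrative assumptions standing in for the mean subtraction, and the whitening step is not reproduced here:

import torchvision.transforms as T

# Step S1 preprocessing: horizontal flip with probability 0.5, pad width and
# height by 2 pixels, random 32x32 crop, then per-channel normalization.
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.Pad(2),
    T.RandomCrop(32),
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465),
                std=(0.2470, 0.2435, 0.2616)),
])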
S2, taking an image u without a real label from the no-real-label image set D_U, and obtaining predicted class information through the teacher network as the temporary label of u. The method is as follows: to assign a temporary label to the image u, the CNN-13 classification network side is run in the same mode as in the test stage, with parameters fixed and not updated; the prediction information of the teacher network corresponding to the CNN-13 classification network serves as the temporary label of u; the parameters of this teacher network are the result of an exponential moving average of the CNN-13 classification network parameters.
S3, inputting two images with real labels together with their real labels into the random regional interpolation module to obtain the corresponding augmented image sample pair (x̃_L, ỹ_L); inputting two images without real labels together with their temporary labels into the random regional interpolation module to obtain the corresponding augmented image sample pair (x̃_U, ỹ_U). As shown in the random regional interpolation module (RRI module) part on the left of FIG. 2, two original images are input; after passing through the random regional interpolation module, images with real labels yield the new augmented image sample pair (x̃_L, ỹ_L), and images without real labels yield the new augmented image sample pair (x̃_U, ỹ_U). The resulting augmented image is shown in the third column of FIG. 3; the first and second columns of FIG. 3 show the two original input images.
the random area interpolation module randomly selects a rectangular area from two images with real labels or two images without real labels according to beta distribution, interpolates image information in the same rectangular area in the second image and image information in the rectangular area in the first image, keeps the information of the first image in the images outside the rectangular area, and finally returns a new augmented image formed after interpolation; the specific situation is as follows:
a. inputting two images with real labels, (x1, y1) and (x2, y2), and generating a new augmented image sample pair (x̃_L, ỹ_L); the specific process is as follows:

a1, input two images with real labels, (x1, y1) and (x2, y2), where x1 and x2 denote the image information and y1 and y2 denote the label information; the spatial resolution of both images is W × H, where W denotes the image width and H denotes the image height;

a2, randomly draw a combination ratio λ from the Beta distribution, i.e. λ ~ Beta(α, α), where Beta(α, α) is a continuous probability distribution defined on the interval (0, 1) and α is a hyper-parameter whose value differs from data set to data set;

a3, compute a binary mask R with spatial resolution W × H, whose values are 0 inside the rectangular region (r_x, r_y, r_w, r_h) and 1 in the other regions, where (r_x, r_y) denotes the top-left corner coordinate, r_w denotes the width of the rectangular region, and r_h denotes its height:

r_w = W·√(1 - λ);

r_x ~ Unif(0, W - r_w), where Unif denotes the uniform distribution;

r_h = H·√(1 - λ);

r_y ~ Unif(0, H - r_h);

a4, generate the new augmented image x̃_L = R ⊙ x1 + (1 - R) ⊙ (λ·x1 + (1 - λ)·x2), where ⊙ denotes element-wise multiplication;

a5, generate the label ỹ_L corresponding to the new augmented image x̃_L: ỹ_L = λ′·y1 + (1 - λ′)·y2, where λ′ = 1 - (1 - λ)·r_w·r_h/(W·H) is the proportion of the first image retained in x̃_L;

a6, obtain the new augmented image sample pair (x̃_L, ỹ_L), where x̃_L denotes the newly generated augmented image information and ỹ_L denotes its label information;
b. inputting two images without real labels, u1 and u2, and generating a new augmented image sample pair (x̃_U, ỹ_U); because the input images carry no real labels, temporary labels must first be generated for them before the interpolation operation is carried out; the specific process is as follows:

b1, input two images without real labels, u1 and u2, denoting the image information; the spatial resolution of both images is W × H;

b2, input u1 and u2 into the teacher network corresponding to the CNN-13 classification network to obtain the corresponding temporary labels q1 and q2;

b3, randomly draw a combination ratio λ from the Beta distribution, i.e. λ ~ Beta(α, α), where Beta(α, α) is a continuous probability distribution defined on the interval (0, 1) and α is a hyper-parameter whose value differs from data set to data set;

b4, compute a binary mask R with spatial resolution W × H, whose values are 0 inside the rectangular region (r_x, r_y, r_w, r_h) and 1 in the other regions, where (r_x, r_y) denotes the top-left corner coordinate, r_w denotes the width of the rectangular region, and r_h denotes its height:

r_w = W·√(1 - λ);

r_x ~ Unif(0, W - r_w);

r_h = H·√(1 - λ);

r_y ~ Unif(0, H - r_h);

b5, generate the new augmented image x̃_U = R ⊙ u1 + (1 - R) ⊙ (λ·u1 + (1 - λ)·u2), where ⊙ denotes element-wise multiplication;

b6, generate the label ỹ_U corresponding to the new augmented image x̃_U: ỹ_U = λ′·q1 + (1 - λ′)·q2, where λ′ = 1 - (1 - λ)·r_w·r_h/(W·H);

b7, obtain the new augmented image sample pair (x̃_U, ỹ_U), where x̃_U denotes the newly generated augmented image information and ỹ_U denotes its label information.
S4, inputting the two pairs of the augmented image samples to the entrance of the whole network (CNN-13): an augmented image sample pair corresponding to the non-genuine label image of step S3
Figure BDA00027814128700001015
The other is an augmented image sample pair corresponding to the real label image
Figure BDA00027814128700001016
And for the training of the current round, constraint is carried out by using a loss function. As shown in the CNN-13 Classification network (Classification network) part on the right of FIG. 2, there is an augmented image sample pair corresponding to the true tag image
Figure BDA00027814128700001017
Using cross entropy loss function (l)CE) To constrain the corresponding pair of augmented image samples for true tag-free images
Figure BDA00027814128700001018
Using the mean square variance loss function (l)Div) To constrain.
The CNN-13 classification network comprises 9 convolution layers (divided into 3 groups), 3 pooling layers and 1 fully connected layer; the specific training process is as follows (see the sketch after step S45):

S41, input a picture I (an augmented image x̃_L from a sample pair corresponding to images with real labels, or an augmented image x̃_U from a sample pair corresponding to images without real labels);

S42, pass the picture I through the first group of convolution layers (128, 128 and 128 channels) to obtain feature map F1, then through a max-pooling layer to obtain feature map F1';

S43, pass feature map F1' through the second group of convolution layers (256, 256 and 256 channels) to obtain feature map F2, then through a max-pooling layer to obtain feature map F2';

S44, pass feature map F2' through the third group of convolution layers (512, 256 and 128 channels) to obtain feature map F3, then through a max-pooling layer to obtain feature map F3';

S45, pass feature map F3' through a 128 × 10 fully connected layer to obtain the classification result score;
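A PyTorch sketch of this backbone follows; kernel sizes, padding, batch normalization, and the pooling that reduces feature map F3' to a 128-dimensional vector are assumptions, since the text only specifies the channel counts, the three max-pooling layers, and the 128 × 10 fully connected layer:

import torch.nn as nn

class CNN13(nn.Module):
    """Three groups of three convolution layers (128/128/128, 256/256/256,
    512/256/128 channels), each followed by max pooling, then a 128 -> 10
    fully connected classifier (steps S41-S45)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()

        def conv(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True))

        self.group1 = nn.Sequential(conv(3, 128), conv(128, 128),
                                    conv(128, 128), nn.MaxPool2d(2))
        self.group2 = nn.Sequential(conv(128, 256), conv(256, 256),
                                    conv(256, 256), nn.MaxPool2d(2))
        self.group3 = nn.Sequential(conv(256, 512), conv(512, 256),
                                    conv(256, 128), nn.MaxPool2d(2))
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse F3' to 128 features
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.group3(self.group2(self.group1(x)))
        return self.fc(self.pool(x).flatten(1))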
The loss function in the entire network is as follows:

min_{θ_C}  E_{(x̃_L, ỹ_L)~P_L}[ ℓ_CE(C(x̃_L), ỹ_L) ] + ρ · E_{(x̃_U, ỹ_U)~P_U}[ ℓ_Div(C(x̃_U), ỹ_U) ]

where θ_C denotes the parameters of the CNN-13 classification network to be updated; P_L denotes the distribution of synthetic data derived from images with real labels; P_U denotes the distribution of synthetic data derived from images without real labels; C(·) = Softmax(h(·)), where h(·) denotes the output of the last hidden layer of the CNN-13 classification network for a given input; ℓ_CE denotes the cross-entropy loss function between the real label and the predicted value; ℓ_Div is a function measuring the divergence between the training target and the CNN-13 classification network output (e.g. the mean-squared distance); and the weight ρ controls the relative importance of the synthetic instances derived from images without real labels.
Minimizing this loss function drives each prediction of the network toward the nearest real label. In our setup only a limited number of labeled training images are used, so data augmentation with images without real labels is very important: it can be expected to effectively increase the diversity and number of training images and thereby improve the generalization ability of the classification model.
And S5, repeating the steps S2-S4, and finishing training after the preset training times are reached.
The real-label image set D_L and the no-real-label image set D_U contain a large amount of data. To train the CNN-13 classification network well, the number of training rounds is set to 400; after all the data have been trained once, they are shuffled and trained again until the preset number of rounds is reached, so that the characteristics of the samples can be fully learned. The teacher network corresponding to the CNN-13 classification network is updated at the end of each round, and its parameters are the exponential moving average of the CNN-13 classification network parameters.

S6, testing and evaluating the trained CNN-13 classification network on the test data set D_T to obtain prediction results.

The trained CNN-13 classification network is fixed; throughout the test process the network is not updated and no loss function is used. Each image in the test data set D_T is input into the trained CNN-13 classification network in turn; each image obtains a corresponding prediction result, which is compared with the real class label to compute the test evaluation result.
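Step S6 amounts to a fixed-weight evaluation pass; a minimal sketch, in which the model and data-loader names are assumptions, is:

import torch

@torch.no_grad()
def evaluate(model, test_loader, device="cpu"):
    """Run the fixed, trained network over the test set and compare
    predictions with the real class labels (step S6)."""
    model.eval()
    correct, total = 0, 0
    for images, labels in test_loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total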
In the following we use the Cifar10 dataset as an example; it is divided into 50000 training images and 10000 test images. From the 50000 training images, 1000 are selected as labeled images and the rest are treated as unlabeled images. Each image is horizontally flipped with probability 0.5, padded by 2 pixels in width and height, randomly cropped to 32 × 32 pixels, has the mean of the image pixels subtracted, and is finally whitened before being fed into the CNN-13 classification network.
Firstly, 100 images without real labels are input into the teacher network corresponding to the CNN-13 classification network to obtain temporary labels, which are assigned to those images. Then 100 images with real labels are input into the random regional interpolation module to obtain the augmented image sample pairs corresponding to the real-label images, and 100 images without real labels, now carrying temporary labels, are input into the random regional interpolation module to obtain the augmented image sample pairs corresponding to the no-real-label images. The 100 augmented image sample pairs from the real-label images and the 100 augmented image sample pairs from the no-real-label images are then put into the CNN-13 classification network to train it jointly. During training, the label information of the images with real labels is completely reliable, whereas the temporary labels of the images without real labels are produced by the teacher network and carry considerable uncertainty; to avoid the total loss being dominated by the synthetic examples derived from data without real labels, the weight ρ is gradually increased to its maximum value of 100 over the first 100 iterations. The initial learning rate is 0.1 and is decayed to 0 by cosine annealing; the momentum is 0.9; the optimizer is stochastic gradient descent (SGD); and the hyper-parameter α is set to 0.25.
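The optimization settings reported above can be sketched as follows; the linear form of the ρ ramp is an assumption (the text only says the weight is gradually increased), and the stand-in model and steps_per_epoch are illustrative:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in
steps_per_epoch = 500  # illustrative; depends on batch size and data size

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=400 * steps_per_epoch, eta_min=0.0)  # decay 0.1 -> 0

def rho_schedule(iteration, max_rho=100.0, ramp_iters=100):
    """Raise the unlabeled-loss weight rho to its maximum of 100 over the
    first 100 iterations (linear ramp assumed)."""
    return max_rho * min(1.0, iteration / ramp_iters)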
With this method, after 400 training iterations on Cifar10 the whole CNN-13 classification network is essentially stable and the classification results show a good effect; the goal of semi-supervised image classification is achieved, and a small number of labeled images brings a large improvement.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention shall be covered by the protection scope of the present invention.

Claims (6)

1. A semi-supervised image classification method based on random regional interpolation, characterized by comprising the following steps:

S1, dividing the training set data into a real-label image set D_L and a no-real-label image set D_U;

S2, taking an image u without a real label from the no-real-label image set D_U, and obtaining predicted class score information through the teacher network corresponding to the CNN-13 classification network as the temporary label of the image u, wherein the parameters of the teacher network are obtained by an exponential moving average of the CNN-13 classification network parameters;

S3, inputting two images with real labels into the entrance of the whole random regional interpolation module to obtain a new augmented image sample pair (x̃_L, ỹ_L), where x̃_L denotes the newly generated augmented image information and ỹ_L denotes its label information; inputting two images without real labels into the entrance of the whole random regional interpolation module to obtain a new augmented image sample pair (x̃_U, ỹ_U), where x̃_U denotes the newly generated augmented image information and ỹ_U denotes its label information;

S4, inputting the new augmented image sample pairs (x̃_L, ỹ_L) and (x̃_U, ỹ_U) obtained in step S3 into the CNN-13 classification network for the current round of training, constrained by a loss function;
and S5, repeating the steps S2-S4, finishing training after reaching the preset training times, outputting the trained CNN-13 classification network, and performing class prediction on the image to be classified by using the trained CNN-13 classification network.
2. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S1, all images are scaled as required to achieve the desired training effect and reduce computation; all data are classified as required: first, all image data are divided into training data and a test data set D_T; the training data are further divided into two categories, a real-label image set D_L and a no-real-label image set D_U, in a ratio of 1:50, i.e. the training data equal D_L ∪ D_U; an image with a real label is recorded as (x, y) ∈ D_L, and an image without a real label is recorded as u ∈ D_U.
3. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S2, each image u in the no-real-label image set D_U is assigned a temporary label; the teacher network corresponding to the CNN-13 classification network is run in the same mode as in the test stage, with its parameters fixed and not updated; the parameters of the teacher network are the result of an exponential moving average of the CNN-13 classification network parameters; the prediction result of the teacher network corresponding to the CNN-13 classification network is taken as the temporary label of the image u without a real label.
4. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S3, the random regional interpolation module randomly selects a rectangular region according to the Beta distribution on the two images with real labels or the two images without real labels, interpolates the image information inside the rectangular region of the first image with the image information in the same rectangular region of the second image, keeps the information of the first image outside the rectangular region, and finally returns the new augmented image formed after interpolation; the specific situation is as follows:

a. inputting two images with real labels, (x1, y1) and (x2, y2), and generating a new augmented image sample pair (x̃_L, ỹ_L); the specific process is as follows:

a1, input two images with real labels, (x1, y1) and (x2, y2), where x1 and x2 denote the image information and y1 and y2 denote the label information; the spatial resolution of both images is W × H, where W denotes the image width and H denotes the image height;

a2, randomly draw a combination ratio λ from the Beta distribution, i.e. λ ~ Beta(α, α), where Beta(α, α) is a continuous probability distribution defined on the interval (0, 1) and α is a hyper-parameter whose value differs from data set to data set;

a3, compute a binary mask R with spatial resolution W × H, whose values are 0 inside the rectangular region (r_x, r_y, r_w, r_h) and 1 in the other regions, where (r_x, r_y) denotes the top-left corner coordinate, r_w denotes the width of the rectangular region, and r_h denotes its height:

r_w = W·√(1 - λ);

r_x ~ Unif(0, W - r_w), where Unif denotes the uniform distribution;

r_h = H·√(1 - λ);

r_y ~ Unif(0, H - r_h);

a4, generate the new augmented image x̃_L = R ⊙ x1 + (1 - R) ⊙ (λ·x1 + (1 - λ)·x2), where ⊙ denotes element-wise multiplication;

a5, generate the label ỹ_L corresponding to the new augmented image x̃_L: ỹ_L = λ′·y1 + (1 - λ′)·y2, where λ′ = 1 - (1 - λ)·r_w·r_h/(W·H) is the proportion of the first image retained in x̃_L;

a6, obtain the new augmented image sample pair (x̃_L, ỹ_L), where x̃_L denotes the newly generated augmented image information and ỹ_L denotes its label information;

b. inputting two images without real labels, u1 and u2, and generating a new augmented image sample pair (x̃_U, ỹ_U); because the input images carry no real labels, temporary labels must first be generated for them before the interpolation operation is carried out; the specific process is as follows:

b1, input two images without real labels, u1 and u2, denoting the image information; the spatial resolution of both images is W × H;

b2, input u1 and u2 into the teacher network corresponding to the CNN-13 classification network to obtain the corresponding temporary labels q1 and q2;

b3, randomly draw a combination ratio λ from the Beta distribution, i.e. λ ~ Beta(α, α), where Beta(α, α) is a continuous probability distribution defined on the interval (0, 1) and α is a hyper-parameter whose value differs from data set to data set;

b4, compute a binary mask R with spatial resolution W × H, whose values are 0 inside the rectangular region (r_x, r_y, r_w, r_h) and 1 in the other regions, where (r_x, r_y) denotes the top-left corner coordinate, r_w denotes the width of the rectangular region, and r_h denotes its height:

r_w = W·√(1 - λ);

r_x ~ Unif(0, W - r_w);

r_h = H·√(1 - λ);

r_y ~ Unif(0, H - r_h);

b5, generate the new augmented image x̃_U = R ⊙ u1 + (1 - R) ⊙ (λ·u1 + (1 - λ)·u2), where ⊙ denotes element-wise multiplication;

b6, generate the label ỹ_U corresponding to the new augmented image x̃_U: ỹ_U = λ′·q1 + (1 - λ′)·q2, where λ′ = 1 - (1 - λ)·r_w·r_h/(W·H);

b7, obtain the new augmented image sample pair (x̃_U, ỹ_U), where x̃_U denotes the newly generated augmented image information and ỹ_U denotes its label information.
5. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S4, the new augmented image sample pairs (x̃_L, ỹ_L) and (x̃_U, ỹ_U) are used to optimize the CNN-13 classification network; since both the original images and the blended region are randomly determined, predicting the class probability distribution of a composite image is a challenge when constructing the training target; the CNN-13 classification network is therefore forced to find the important regions associated with the object and to learn robust features on both classes of composite data, i.e. the generated new augmented image sample pairs (x̃_L, ỹ_L) and (x̃_U, ỹ_U); the network is optimized using two independent loss functions as follows:

min_{θ_C}  E_{(x̃_L, ỹ_L)~P_L}[ ℓ_CE(C(x̃_L), ỹ_L) ] + ρ · E_{(x̃_U, ỹ_U)~P_U}[ ℓ_Div(C(x̃_U), ỹ_U) ]

where θ_C denotes the parameters of the CNN-13 classification network to be updated; P_L denotes the distribution of synthetic data derived from images with real labels; P_U denotes the distribution of synthetic data derived from images without real labels; C(·) = Softmax(h(·)), where h(·) denotes the output of the last hidden layer of the CNN-13 classification network for a given input; ℓ_CE denotes the cross-entropy loss function between the real label and the predicted value; ℓ_Div is a function measuring the divergence between the training target and the CNN-13 classification network output; and the weight ρ controls the relative importance of the synthetic instances derived from images without real labels.
6. The semi-supervised image classification method based on random regional interpolation as claimed in claim 1, wherein: in step S5, the number of training rounds is set to 400; after all the data have been trained once, training continues over the data again until the preset number of rounds is reached; the teacher network corresponding to the CNN-13 classification network is updated at the end of each round; after the trained CNN-13 classification network is obtained, its parameters are fixed and no longer updated, and class prediction is performed on the images to be classified without using a loss function: the images to be classified are input into the CNN-13 classification network in turn, and each image obtains a corresponding prediction result.
CN202011282976.4A 2020-11-17 2020-11-17 Semi-supervised image classification method based on random regional interpolation Active CN112381148B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011282976.4A CN112381148B (en) 2020-11-17 2020-11-17 Semi-supervised image classification method based on random regional interpolation


Publications (2)

Publication Number Publication Date
CN112381148A true CN112381148A (en) 2021-02-19
CN112381148B CN112381148B (en) 2022-06-14

Family

ID=74584880


Country Status (1)

Country Link
CN (1) CN112381148B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108364016A (en) * 2018-01-12 2018-08-03 华南理工大学 Gradual semisupervised classification method based on multi-categorizer
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 A kind of image classification method learning across task depth network based on semi-supervised step certainly
CN109657697A (en) * 2018-11-16 2019-04-19 中山大学 Classified optimization method based on semi-supervised learning and fine granularity feature learning
CN110097103A (en) * 2019-04-22 2019-08-06 西安电子科技大学 Based on the semi-supervision image classification method for generating confrontation network
CN111275129A (en) * 2020-02-17 2020-06-12 平安科技(深圳)有限公司 Method and system for selecting image data augmentation strategy
CN111368660A (en) * 2020-02-25 2020-07-03 华南理工大学 Single-stage semi-supervised image human body target detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Vikas Verma et al.: "Interpolation Consistency Training for Semi-Supervised Learning", arXiv *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408575A (en) * 2021-05-12 2021-09-17 桂林电子科技大学 Image data augmentation method based on discriminant area positioning
CN113408575B (en) * 2021-05-12 2022-08-19 桂林电子科技大学 Image data augmentation method based on discriminant area positioning
CN113420786A (en) * 2021-05-31 2021-09-21 杭州电子科技大学 Semi-supervised classification method for feature mixed image

Also Published As

Publication number Publication date
CN112381148B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN109949317B (en) Semi-supervised image example segmentation method based on gradual confrontation learning
CN111242208B (en) Point cloud classification method, segmentation method and related equipment
CN110097095B (en) Zero sample classification method based on multi-view generation countermeasure network
CN110837836A (en) Semi-supervised semantic segmentation method based on maximized confidence
CN110852273A (en) Behavior identification method based on reinforcement learning attention mechanism
CN109743642B (en) Video abstract generation method based on hierarchical recurrent neural network
CN112381148B (en) Semi-supervised image classification method based on random regional interpolation
CN110889450B (en) Super-parameter tuning and model construction method and device
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN113313123B (en) Glance path prediction method based on semantic inference
CN115731441A (en) Target detection and attitude estimation method based on data cross-modal transfer learning
CN112651998A (en) Human body tracking algorithm based on attention mechanism and double-current multi-domain convolutional neural network
CN114360067A (en) Dynamic gesture recognition method based on deep learning
CN111753207A (en) Collaborative filtering model of neural map based on comments
CN113111716A (en) Remote sensing image semi-automatic labeling method and device based on deep learning
Zhou et al. Attention transfer network for nature image matting
CN111259938A (en) Manifold learning and gradient lifting model-based image multi-label classification method
CN112529025A (en) Data processing method and device
CN117152427A (en) Remote sensing image semantic segmentation method and system based on diffusion model and knowledge distillation
CN114841778B (en) Commodity recommendation method based on dynamic graph neural network
CN111209886A (en) Rapid pedestrian re-identification method based on deep neural network
CN113658285B (en) Method for generating face photo to artistic sketch
Yue et al. A Novel Two-stream Architecture Fusing Static And Dynamic Features for Human Action Recognition
Chang et al. STAU: a spatiotemporal-aware unit for video prediction and beyond
Wu et al. DDFPN: Context enhanced network for object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant