CN113657455B - Semi-supervised learning method based on triplet network and labeling consistency regularization - Google Patents
Semi-supervised learning method based on triplet network and labeling consistency regularization
- Publication number
- CN113657455B CN113657455B CN202110837568.9A CN202110837568A CN113657455B CN 113657455 B CN113657455 B CN 113657455B CN 202110837568 A CN202110837568 A CN 202110837568A CN 113657455 B CN113657455 B CN 113657455B
- Authority
- CN
- China
- Prior art keywords
- data
- network
- labeling
- image
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a semi-supervised learning method based on a triplet network and label-consistency regularization, comprising the following steps: step one, input an image data set and its corresponding label set; step two, preprocess the labeled and unlabeled data sets; step three, construct and train a deep network with an adaptive vision mechanism to extract depth features of the image data set; step four, construct a twin (Siamese) network and obtain forward-propagation results and pseudo labels from the depth features of the labeled and unlabeled data; step five, construct loss functions for the labeled and unlabeled data from the forward-propagation results and pseudo labels, and train the twin network in a semi-supervised manner. The invention constructs a triplet network to train on data sets with insufficient labeled data. First, a generative adversarial network with an adaptive vision mechanism is built and trained on the image data set without supervision, which makes feature extraction more effective and eliminates the discrepancy between features extracted from labeled and unlabeled data of different classes. Then, a twin network is built and trained on the principle of label consistency, which eliminates the discrepancy in feature discrimination between labeled and unlabeled data of the same class, reduces the number of trainable network parameters, and exploits the unlabeled data more effectively for semi-supervised learning.
Description
Technical Field
The invention belongs to the technical field of deep learning, and in particular relates to a semi-supervised learning method based on a triplet network and label-consistency regularization.
Background
In recent years, deep learning has developed rapidly in the field of artificial intelligence and has brought paradigm-shifting advances to areas such as perception, neural modeling, and reasoning. Deep learning is a branch of machine learning: a family of algorithms that use artificial neural networks to learn representations of data. Deep learning methods typically rely on large amounts of labeled data to train fully supervised models and have achieved strong results in practice. However, such fully supervised learning is costly and time-consuming, because the data must be annotated manually by researchers with expert knowledge of the relevant domain. Moreover, some image data sets exhibit high intra-class diversity and high inter-class similarity, which makes accurate labeling difficult.
Consequently, for deep learning tasks in real-world scenarios, only a subset of the training set usually carries labels, owing to the diversity of data sources, while the remaining data is unlabeled. This occurs across many task types, especially image multi-classification. With insufficient labeled supervision, the model cannot fit the data adequately, so the features extracted from labeled and unlabeled data diverge, the correlations among samples cannot be fully exploited, and a model with strong generalization ability cannot be obtained.
Data annotation has long been a central research topic in computer vision and artificial intelligence. To improve the efficiency of deep learning models, a semi-supervised multi-classification technique based on label consistency is needed, one that eliminates the feature-extraction discrepancy of an under-fitted model.
In the existing prior art, the features extracted from labeled and unlabeled data differ. A label-consistent semi-supervised learning method is therefore urgently needed that learns fully from the unlabeled data in the data set and thereby facilitates subsequent deep learning multi-classification tasks in real scenarios where labeled data is scarce.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a semi-supervised learning method based on a triplet network and label-consistency regularization, which has a simple structure and a reasonable design.
Real-world scenarios are characterized by incomplete information: the collected data generally lacks annotations, so supervision is severely insufficient, the features extracted from labeled and unlabeled data differ, and the training performance and generalization ability of deep-learning classification networks are limited. To solve these technical problems, the invention adopts the following technical scheme. The semi-supervised learning method based on a triplet network and label-consistency regularization comprises the following steps:
Step one, input an image data set and its corresponding label set:
Step 101, input an image data set V = {v_1, ..., v_i, ..., v_l}, divided into labeled data X = {x_1, ..., x_f, ..., x_n} and unlabeled data U = {u_1, ..., u_j, ..., u_m}, where v_i denotes the i-th image sample, 1 ≤ i ≤ l, 1 ≤ f ≤ n, 1 ≤ j ≤ m, and l = m + n, with n, m, and l positive integers;
Step 102, input the label set corresponding to the image set V: the labels of the labeled data X = {x_1, ..., x_i, ..., x_n} are P = {p_1, ..., p_i, ..., p_n}; the unlabeled data U = {u_1, ..., u_j, ..., u_m} carry no labels.
Step two, preprocess the labeled and unlabeled data sets:
Step 201, apply data enhancement to the labeled image data X and the unlabeled data U: the labeled data undergoes a single enhancement, yielding the enhanced data X'; the unlabeled data undergoes K random enhancements, yielding the enhanced data U';
Step 202, mix the data X' and U' and arrange them randomly to obtain the data combination W, where each enhanced sample keeps the label of its original.
Step three, construct and train a deep network with an adaptive vision mechanism to extract depth features of the image data set:
Step 301, construct a generative adversarial network (GAN) G, comprising a data generator and a discriminator.
Step 302, use a self-convolution layer in the GAN, with an adaptive convolution-kernel generating function designed on the principles of spatial specificity and frequency-domain independence: from the input image features, output a convolution kernel of the same spatial size as the feature map, control the scaling ratio to adjust the parameter count, and scale the feature-map channels.
Step 303, delete the labels of the labeled data in the image set V and train the GAN G without supervision on the whole unlabeled image set V, so that the pseudo-data features produced by the generator approach real image features, while the self-convolution layer strengthens the feature-representation capability of the discriminator.
Step 304, use the trained discriminator G_d of the GAN G as a feature extractor F_d to extract the depth features of the target labeled data X = {x_1, ..., x_f, ..., x_n} and unlabeled data U = {u_1, ..., u_j, ..., u_m}: x_labeled = F_d(x_f) and x_unlabeled = F_d(u_j).
Step four, construct a twin network and obtain forward-propagation results and pseudo labels from the depth features of the labeled and unlabeled data:
Step 401, construct two shallow classification networks Net_1 and Net_2 as a twin network, and input the data combination W;
Step 402, for the labeled data, input the enhanced data X' and the corresponding labels p to obtain the depth features x_labeled = F_d(X'), and predict with the twin network; the forward-propagation result is the weighted combination p = w · p_d1 + (1 − w) · p_d2, where p_d1 and p_d2 are the predictions of Net_1 and Net_2 and w is a hyper-parameter;
Step 403, for the unlabeled data, input the enhanced data U' to obtain the depth features x_unlabeled = F_d(U'), predict with the twin network, and take the weighted average of the outputs over the K augmentations as the forward-propagation result p_j = (1/(2K)) Σ_{k=1}^{K} (f(u'_{j,k}; θ_1) + f(u'_{j,k}; θ_2)), where f(·; θ_1) and f(·; θ_2) are the predictions of the twin network on the unlabeled data and θ denotes the network training parameters.
Step 404, sharpen the predictions on the unlabeled data to obtain the pseudo label q. The sharpening operation is Sharpen(P, T)_c = P(U; θ)_c^{1/T} / Σ_{c'} P(U; θ)_{c'}^{1/T}, where T is the sharpening temperature, K is the number of enhancements, and P(U; θ) is the network's predicted probability for each class.
Step 405, fuse the pseudo labels q predicted by the twin network. Specifically, the fused pseudo label is q̃ = λ · q^(1) + (1 − λ) · q^(2), where q^(1) is the sharpened pseudo label of Net_1, q^(2) is the sharpened pseudo label of Net_2, and λ obeys a probability distribution set according to the actual data set.
Step five, construct the loss functions for the labeled and unlabeled data from the forward-propagation results and pseudo labels, and train the twin network in a semi-supervised manner:
Step 501, establish the semi-supervised label-consistency regularization loss: for each class, compute a regularization term for the discrepancy between labeled and unlabeled data, eliminating the difference between labeled and unlabeled data of the same class:
Loss_semi = (1/num) Σ_{b=1}^{num} d(x_labeled^{class-b}, x_unlabeled^{class-b}),
where num is the number of classes, x_labeled and x_unlabeled are the depth features of the labeled and unlabeled images, class-b is the b-th class, and d(·, ·) measures the per-class feature discrepancy;
Step 502, for the enhanced labeled data X', establish the loss function L_X = (1/|X'|) Σ_{(x,p)∈X'} H(p, f(x; θ));
Step 503, for the enhanced unlabeled data U', establish the loss function L_U = (1/|U'|) Σ_{(u,q)∈U'} ||q − f(u; θ)||²_2,
where |X'| equals the number of samples per batch, |U'| equals K times the number of samples per batch, H is the cross-entropy function, x and p are the enhanced labeled data and labels, and u and q are the enhanced unlabeled data and pseudo labels.
Step 504, the overall loss function L is a weighted sum of the three:
L = L_X + λ_U · L_U + β_U · Loss_semi,
where λ_U and β_U are hyper-parameters. Through continual iteration with the overall loss function L, the trained twin-network model is used for classification testing.
Compared with the prior art, the invention has the following advantages:
1. The invention has a simple structure and a reasonable design, and is convenient to implement, use, and operate.
2. The invention performs unsupervised learning on the data set with a generative adversarial network equipped with an adaptive vision mechanism, and extracts depth features of the data set with the trained model. This effectively eliminates the discrepancy between features extracted from labeled and unlabeled data of different classes, makes feature extraction and selection more robust, preserves the integrity of the image information, and improves semi-supervised multi-classification performance.
3. The invention performs semi-supervised learning with a twin network based on the idea of label consistency, which effectively eliminates the discrepancy in feature discrimination between labeled and unlabeled data of the same class, avoids classification differences caused by feature differences, uses relatively few training parameters, and achieves higher effectiveness and correctness.
In conclusion, the invention has a simple structure and a reasonable design. It constructs a triplet network to train on data sets with insufficient labeled data: first, a GAN with an adaptive vision mechanism is built and trained on the image data set without supervision, making feature extraction more effective and eliminating the discrepancy between features of labeled and unlabeled data of different classes; then, a twin network is built and trained on the principle of label consistency, eliminating the discrepancy in feature discrimination between labeled and unlabeled data of the same class, reducing the number of network training parameters, and exploiting the unlabeled data more effectively for semi-supervised learning.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be capable of being practiced otherwise than as specifically illustrated and described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Spatially relative terms, such as "above … …," "above … …," "upper surface at … …," "above," and the like, may be used herein for ease of description to describe one device or feature's spatial location relative to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" the other devices or structures. Thus, the exemplary term "above … …" may include both orientations of "above … …" and "below … …". The device may also be positioned in other different ways (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.
As shown in FIG. 1, the method of the invention comprises the following steps:
Step one, input an image data set and its corresponding label set:
Step 101, input an image data set V = {v_1, ..., v_i, ..., v_l}, divided into labeled data X = {x_1, ..., x_f, ..., x_n} and unlabeled data U = {u_1, ..., u_j, ..., u_m}, where v_i denotes the i-th image sample, 1 ≤ i ≤ l, 1 ≤ f ≤ n, 1 ≤ j ≤ m, and l = m + n, with n, m, and l positive integers;
Step 102, input the label set corresponding to the image set V: the labels of the labeled data X = {x_1, ..., x_i, ..., x_n} are P = {p_1, ..., p_i, ..., p_n}; the unlabeled data U = {u_1, ..., u_j, ..., u_m} carry no labels.
Step two, preprocess the labeled and unlabeled data sets:
Step 201, apply data enhancement to the labeled image data X and the unlabeled data U: the labeled data undergoes a single enhancement, yielding the enhanced data X'; the unlabeled data undergoes K random enhancements, yielding the enhanced data U';
Step 202, mix the data X' and U' and arrange them randomly to obtain the data combination W, where each enhanced sample keeps the label of its original.
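The preprocessing of steps 201 and 202 can be sketched as follows. The concrete augmentation (random flip plus noise) and the toy 4×4 "images" are our own placeholders, since the patent does not fix a transform set:

```python
import random
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    # Placeholder augmentation (our stand-in; the patent does not fix a
    # transform set): random horizontal flip plus small Gaussian noise.
    out = img[:, ::-1] if rng.random() < 0.5 else img
    return out + rng.normal(0.0, 0.01, size=out.shape)

def preprocess(X, U, K=2):
    """Steps 201/202: one augmentation per labelled sample (X'),
    K random augmentations per unlabelled sample (U'), then mix and
    shuffle into the combination W. Labels travel with their samples."""
    X_aug = [augment(x) for x in X]
    U_aug = [augment(u) for u in U for _ in range(K)]
    W = X_aug + U_aug
    random.shuffle(W)
    return X_aug, U_aug, W

X = [np.zeros((4, 4)) for _ in range(3)]   # toy labelled "images"
U = [np.ones((4, 4)) for _ in range(5)]    # toy unlabelled "images"
X_aug, U_aug, W = preprocess(X, U, K=2)
print(len(X_aug), len(U_aug), len(W))      # 3 10 13
```

Note that |U'| = K·|U| by construction, which matches the batch-size relation stated for the loss functions in step five.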
Step three, construct and train a deep network with an adaptive vision mechanism to extract depth features of the image data set:
Step 301, construct a generative adversarial network (GAN) G, comprising a data generator and a discriminator; the GAN follows the DCGAN design, with a ResNet-18 backbone.
Step 302, use a self-convolution layer in the GAN, with an adaptive convolution-kernel generating function designed on the principles of spatial specificity and frequency-domain independence: from the input image features, output a convolution kernel of the same spatial size as the feature map, control the scaling ratio to adjust the parameter count, and scale the feature-map channels with a 1×1 convolution, so that the number of output feature channels is (Z × Gs), where Z is the size of the subsequent self-convolution kernel and Gs is the number of groups of the self-convolution operation.
Step 303, delete the labels of the labeled data in the image set V and train the GAN G without supervision on the whole unlabeled image set V, so that the pseudo-data features produced by the generator approach real image features, while the self-convolution layer strengthens the feature-representation capability of the discriminator.
Step 304, take the trained discriminator G_d of the GAN G, remove its fully connected layer, and keep the convolutional layers as the feature extractor F_d, used to extract the depth features of the target labeled data X = {x_1, ..., x_f, ..., x_n} and unlabeled data U = {u_1, ..., u_j, ..., u_m}: x_labeled = F_d(x_f) and x_unlabeled = F_d(u_j).
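The self-convolution layer of step 302 can be illustrated with a minimal involution-style sketch. All shapes here are our assumptions: we let the 1×1 reduction emit Z·Z·Gs kernel channels, i.e. one Z×Z kernel per group per spatial position, generated from the features themselves (spatial specificity) and shared across all channels of a group, which is one way to realize an adaptive kernel with a controlled parameter count:

```python
import numpy as np

def self_convolution(feat, W_reduce, Z=3, Gs=1):
    """Involution-style self-convolution sketch (naming and shapes are
    our assumptions, not the patent's exact design).
    feat: (C, H, W) feature map; W_reduce: (Z*Z*Gs, C) 1x1-conv weights
    that generate a position-specific kernel from the features."""
    C, H, Wd = feat.shape
    assert C % Gs == 0
    pad = Z // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)))
    # Generate a (Z*Z*Gs)-channel kernel map with a 1x1 convolution.
    kernels = np.einsum('kc,chw->khw', W_reduce, feat)
    kernels = kernels.reshape(Gs, Z * Z, H, Wd)
    out = np.zeros_like(feat)
    for g in range(Gs):
        grp = slice(g * C // Gs, (g + 1) * C // Gs)
        for i in range(H):
            for j in range(Wd):
                # Each spatial position applies its own generated kernel,
                # shared by every channel of the group.
                patch = padded[grp, i:i + Z, j:j + Z].reshape(-1, Z * Z)
                out[grp, i, j] = patch @ kernels[g, :, i, j]
    return out

feat = np.random.default_rng(1).normal(size=(4, 5, 5))
W_reduce = np.random.default_rng(2).normal(size=(9, 4))
y = self_convolution(feat, W_reduce, Z=3, Gs=1)
print(y.shape)  # (4, 5, 5)
```

Because the generated kernel is shared by all C/Gs channels of a group, the parameter count scales with Z²·Gs rather than with Z²·C·C as in a standard convolution, which is the "scaling ratio to adjust the parameter count" idea of step 302.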
Step four, construct a twin network and obtain forward-propagation results and pseudo labels from the depth features of the labeled and unlabeled data:
Step 401, construct two shallow classification networks Net_1 and Net_2 as a twin network and input the data combination W; the shallow classification networks use VGG-11;
Step 402, for the labeled data, input the enhanced data X' and the corresponding labels p to obtain the depth features x_labeled = F_d(X'), and predict with the twin network; the forward-propagation result is the weighted combination p = w · p_d1 + (1 − w) · p_d2, where p_d1 and p_d2 are the predictions of Net_1 and Net_2 and w is a hyper-parameter;
Step 403, for the unlabeled data, input the enhanced data U' to obtain the depth features x_unlabeled = F_d(U'), predict with the twin network, and take the weighted average of the outputs over the K augmentations as the forward-propagation result p_j = (1/(2K)) Σ_{k=1}^{K} (f(u'_{j,k}; θ_1) + f(u'_{j,k}; θ_2)), where f(·; θ_1) and f(·; θ_2) are the predictions of the twin network on the unlabeled data and θ denotes the network training parameters.
Step 404, sharpen the predictions on the unlabeled data to obtain the pseudo label q. The sharpening operation is Sharpen(P, T)_c = P(U; θ)_c^{1/T} / Σ_{c'} P(U; θ)_{c'}^{1/T}, where T is the sharpening temperature, K is the number of enhancements, and P(U; θ) is the network's predicted probability for each class.
Step 405, fuse the pseudo labels q predicted by the twin network. Specifically, the fused pseudo label is q̃ = λ · q^(1) + (1 − λ) · q^(2), where q^(1) is the sharpened pseudo label of Net_1, q^(2) is the sharpened pseudo label of Net_2, and λ ~ Beta(α, α), with α set according to the actual data set.
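Steps 403–405 — averaging over the K augmentations, temperature sharpening, and Beta-weighted label fusion — can be sketched as follows; the toy predictions and the default T and α values are illustrative only:

```python
import numpy as np

def sharpen(p, T=0.5):
    """Temperature sharpening (step 404): q_c = p_c^(1/T) / sum_c' p_c'^(1/T)."""
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum()

def pseudo_label(preds_net1, preds_net2, T=0.5, alpha=0.75, rng=None):
    """Steps 403-405 for one unlabelled sample. preds_netX holds the
    (K, num_classes) predictions of one twin branch over the K augmented
    copies. Average over K, sharpen each branch's average, then fuse the
    two sharpened pseudo labels with a Beta(alpha, alpha) weight lambda."""
    rng = rng if rng is not None else np.random.default_rng(0)
    q1 = sharpen(np.mean(preds_net1, axis=0), T)
    q2 = sharpen(np.mean(preds_net2, axis=0), T)
    lam = rng.beta(alpha, alpha)
    return lam * q1 + (1.0 - lam) * q2

p1 = np.array([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]])  # Net_1 over K=2 augmentations
p2 = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])  # Net_2 over K=2 augmentations
q = pseudo_label(p1, p2)
print(q.shape, round(float(q.sum()), 6))  # (3,) 1.0
```

Since both sharpened distributions sum to one and λ ∈ [0, 1], the fused pseudo label is again a valid probability distribution.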
Step five, construct the loss functions for the labeled and unlabeled data from the forward-propagation results and pseudo labels, and train the twin network in a semi-supervised manner:
Step 501, establish the semi-supervised label-consistency regularization loss: for each class, compute a regularization term for the discrepancy between labeled and unlabeled data, eliminating the difference between labeled and unlabeled data of the same class:
Loss_semi = (1/num) Σ_{b=1}^{num} d(x_labeled^{class-b}, x_unlabeled^{class-b}),
where num is the number of classes, x_labeled and x_unlabeled are the depth features of the labeled and unlabeled images, class-b is the b-th class, and d(·, ·) measures the per-class feature discrepancy;
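The published text omits the closed form of the regularization term, so the following sketch is only one plausible instantiation of step 501: it assumes the per-class discrepancy is the squared distance between class-wise mean depth features, with pseudo labels assigning unlabelled samples to classes:

```python
import numpy as np

def label_consistency_reg(feat_l, y_l, feat_u, y_u, num):
    """Assumed form of step 501: for every class b, compare the mean
    depth feature of the labelled samples of class b with the mean depth
    feature of the unlabelled samples pseudo-labelled b, and average the
    squared distances over the num classes. Every class is assumed to
    have at least one sample on each side."""
    total = 0.0
    for b in range(num):
        mu_l = feat_l[y_l == b].mean(axis=0)
        mu_u = feat_u[y_u == b].mean(axis=0)
        total += float(np.sum((mu_l - mu_u) ** 2))
    return total / num

rng = np.random.default_rng(0)
feat_l = rng.normal(size=(6, 8))                   # labelled depth features
feat_u = feat_l + 0.01 * rng.normal(size=(6, 8))   # nearly aligned features
y = np.array([0, 0, 1, 1, 2, 2])
small = label_consistency_reg(feat_l, y, feat_u, y, num=3)
large = label_consistency_reg(feat_l, y, rng.normal(size=(6, 8)), y, num=3)
print(small, large)
```

Minimising this term pulls the per-class feature statistics of unlabelled data toward those of labelled data, which is the stated goal of "eliminating the difference between labeled and unlabeled data of the same class".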
Step 502, for the enhanced labeled data X', establish the loss function L_X = (1/|X'|) Σ_{(x,p)∈X'} H(p, f(x; θ)).
Step 503, for the enhanced unlabeled data U', establish the loss function L_U = (1/|U'|) Σ_{(u,q)∈U'} ||q − f(u; θ)||²_2,
where |X'| equals the number of samples per batch, |U'| equals K times the number of samples per batch, H is the cross-entropy function, x and p are the enhanced labeled data and labels, and u and q are the enhanced unlabeled data and pseudo labels.
Step 504, the overall loss function L is a weighted sum of the three:
L = L_X + λ_U · L_U + β_U · Loss_semi,
where λ_U and β_U are hyper-parameters. Through continual iteration with the overall loss function L, the trained twin-network model is used for classification testing.
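Steps 502–504 can be sketched as follows; the exact normalisation of L_U and the placeholder weights `lam_U` and `beta_U` are our assumptions (the patent only states that λ_U and β_U are hyper-parameters):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q): cross entropy between a label p and a prediction q."""
    return -float(np.sum(np.asarray(p) * np.log(np.asarray(q) + eps)))

def total_loss(labels, preds_l, pseudo, preds_u,
               lam_U=1.0, beta_U=1.0, loss_semi=0.0):
    """Steps 502-504 sketch. L_X: mean cross entropy over the labelled
    batch X'; L_U: mean squared error between pseudo labels and the twin
    network's predictions on the enhanced unlabelled batch U' (the
    normalisation follows MixMatch-style objectives and is our
    assumption); L = L_X + lam_U * L_U + beta_U * loss_semi."""
    L_X = float(np.mean([cross_entropy(p, f) for p, f in zip(labels, preds_l)]))
    diffs = np.asarray(pseudo) - np.asarray(preds_u)
    L_U = float(np.mean(np.sum(diffs ** 2, axis=1)))
    return L_X + lam_U * L_U + beta_U * loss_semi

labels  = np.array([[1.0, 0.0], [0.0, 1.0]])   # one-hot labels for X'
preds_l = np.array([[0.9, 0.1], [0.2, 0.8]])   # twin-network outputs on X'
pseudo  = np.array([[0.8, 0.2], [0.3, 0.7]])   # fused pseudo labels for U'
preds_u = np.array([[0.7, 0.3], [0.4, 0.6]])   # twin-network outputs on U'
L = total_loss(labels, preds_l, pseudo, preds_u, loss_semi=0.05)
print(L > 0)  # True
```

In training, `loss_semi` would be the per-class consistency regularizer of step 501 recomputed on each batch, and λ_U is typically ramped up so that the unlabelled term only dominates once the pseudo labels become reliable.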
The foregoing is merely an embodiment of the invention and does not limit it; any simple modification, variation, or equivalent structural change made to the above embodiment in accordance with the technical substance of the invention still falls within the scope of the technical solution of the invention.
Claims (1)
1. A semi-supervised learning method based on a triplet network and label-consistency regularization, characterized by comprising the following steps:
step one, inputting an image data set and its corresponding label set:
step 101, inputting an image data set V = {v_1, ..., v_i, ..., v_l}, divided into labeled data X = {x_1, ..., x_f, ..., x_n} and unlabeled data U = {u_1, ..., u_j, ..., u_m}, wherein v_i denotes the i-th image sample, 1 ≤ i ≤ l, 1 ≤ f ≤ n, 1 ≤ j ≤ m, and l = m + n, with n, m, and l positive integers;
step 102, inputting the label set corresponding to the image set V, wherein the labels of the labeled data X = {x_1, ..., x_i, ..., x_n} are P = {p_1, ..., p_i, ..., p_n} and the unlabeled data U = {u_1, ..., u_j, ..., u_m} carry no labels;
step two, preprocessing the labeled and unlabeled data sets:
step 201, performing data enhancement on the labeled image data X and the unlabeled data U, wherein the labeled data undergoes a single enhancement to obtain the enhanced data X', and the unlabeled data undergoes K random enhancements to obtain the enhanced data U';
step 202, mixing the data X' and U' and arranging them randomly to obtain the data combination W, wherein the label of each enhanced sample is consistent with its original label;
step three, constructing and training a deep network with an adaptive vision mechanism for extracting depth features of the image data set:
step 301, constructing a generative adversarial network G, comprising a data generator and a discriminator;
step 302, using a self-convolution layer in the generative adversarial network, with an adaptive convolution-kernel generating function set on the principles of spatial specificity and frequency-domain independence; outputting, from the input image features, a convolution kernel of the same spatial size as the feature map, controlling the scaling ratio to adjust the parameter count, and scaling the feature-map channels;
step 303, deleting the labels of the labeled data in the image set V and performing unsupervised learning of the generative adversarial network G on the whole unlabeled image set V, so that the pseudo-data features generated by the generator approach real image features, while the self-convolution layer enhances the feature-representation capability of the discriminator;
step 304, using the trained discriminator G_d of the generative adversarial network G as a feature extractor F_d for extracting the depth features of the target labeled data X = {x_1, ..., x_f, ..., x_n} and unlabeled data U = {u_1, ..., u_j, ..., u_m}: x_labeled = F_d(x_f) and x_unlabeled = F_d(u_j);
step four, constructing a twin network and acquiring forward-propagation results and pseudo labels from the depth features of the labeled and unlabeled data:
step 401, constructing two shallow classification networks Net_1 and Net_2 as a twin network, and inputting the data combination W;
step 402, for the labeled data, inputting the enhanced data X' and the corresponding labels p to obtain the depth features x_labeled = F_d(X'), and predicting with the twin network, the forward-propagation result being the weighted combination p = w · p_d1 + (1 − w) · p_d2, wherein p_d1 and p_d2 are the predictions of Net_1 and Net_2 and w is a hyper-parameter;
step 403, for the unlabeled data, inputting the enhanced data U' to obtain the depth features x_unlabeled = F_d(U'), predicting with the twin network, and taking the weighted average of the outputs over the K augmentations as the forward-propagation result p_j = (1/(2K)) Σ_{k=1}^{K} (f(u'_{j,k}; θ_1) + f(u'_{j,k}; θ_2)), wherein f(·; θ_1) and f(·; θ_2) denote the twin network's predictions on the unlabeled data and θ denotes the network training parameters;
step 404, sharpening the predictions on the unlabeled data to obtain the pseudo label q, the sharpening operation being Sharpen(P, T)_c = P(U; θ)_c^{1/T} / Σ_{c'} P(U; θ)_{c'}^{1/T}, wherein T is the sharpening temperature, K is the number of enhancements, and P(U; θ) is the network's predicted probability for each class;
step 405, performing label fusion on the pseudo labels q predicted by the twin network, the fused pseudo label being q̃ = λ · q^(1) + (1 − λ) · q^(2), wherein q^(1) is the sharpened pseudo label of Net_1, q^(2) is the sharpened pseudo label of Net_2, and λ obeys a probability distribution set according to the actual data set;
fifthly, constructing a loss function of training annotation data and non-annotation data by utilizing a forward propagation result and a pseudo tag, and performing semi-supervised training on the twin network:
step 501, a semi-supervised labeling consistency regularization loss function is established, a regularization item of difference between labeling data and non-labeling data is calculated according to each category, and the difference between labeling data of the same category and non-labeling data is eliminated, as follows:
wherein num is the number of categories, x labeled 、x unlabeled For the depth characteristics of the image marked data and the unmarked data, class-b is the b category;
step 502, for the enhanced marked data X', a loss function is established as follows:
step 503, for the enhanced non-labeling data U', a loss function is established as follows:
wherein, X 'is equal to the number of samples per batch, U' is equal to K times the number of samples per batch,is a cross entropy function, x, p are enhanced marked data and labels, u, q are enhanced unmarked data and pseudo labels;
step 504, the overall loss function L is a weighted sum of the three, as follows:
$L=L_X+\lambda_U L_U+\beta_U\,\mathrm{Loss}_{semi\text{-}supervised}$
where $\lambda_U$ and $\beta_U$ are hyperparameters; the overall loss function L is minimized by continued iteration, and the trained twin-network model is then used for classification testing.
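The three loss terms of steps 502 to 504 can be sketched as follows; the squared-error form of $L_U$ and the default weights for $\lambda_U$ and $\beta_U$ are assumptions of this sketch in the spirit of MixMatch-style training, not values fixed by the patent:

```python
import numpy as np

def cross_entropy(p, pred, eps=1e-12):
    # Step 502: H(p, P(x; theta)), averaged over the labeled batch X'
    return float(-np.mean(np.sum(p * np.log(pred + eps), axis=1)))

def l2_consistency(q, pred):
    # Step 503: squared error between pseudo-labels q and predictions on U'
    # (the squared-error form is an assumption of this sketch)
    return float(np.mean(np.sum((q - pred) ** 2, axis=1)))

def total_loss(L_X, L_U, loss_semi, lambda_U=75.0, beta_U=0.1):
    # Step 504: L = L_X + lambda_U * L_U + beta_U * Loss_semi-supervised
    # (the default weights here are illustrative hyperparameters)
    return L_X + lambda_U * L_U + beta_U * loss_semi

p = np.array([[1.0, 0.0]])        # one-hot label
pred = np.array([[0.8, 0.2]])     # prediction on labeled data
q = np.array([[0.7, 0.3]])        # fused pseudo-label
pred_u = np.array([[0.6, 0.4]])   # prediction on unlabeled data
L = total_loss(cross_entropy(p, pred), l2_consistency(q, pred_u), 0.0,
               lambda_U=1.0, beta_U=0.1)
```

Using cross entropy on the labeled batch but a bounded squared error on the pseudo-labeled batch is a common design choice: the L2 term is less sensitive to confidently wrong pseudo-labels early in training.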
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110837568.9A CN113657455B (en) | 2021-07-23 | 2021-07-23 | Semi-supervised learning method based on triple play network and labeling consistency regularization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113657455A CN113657455A (en) | 2021-11-16 |
CN113657455B true CN113657455B (en) | 2024-02-09 |
Family
ID=78477710
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110837568.9A Active CN113657455B (en) | 2021-07-23 | 2021-07-23 | Semi-supervised learning method based on triple play network and labeling consistency regularization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113657455B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114494973B (en) * | 2022-02-14 | 2024-03-29 | 中国科学技术大学 | Training method, system, equipment and storage medium of video semantic segmentation network |
CN114612685B (en) * | 2022-03-22 | 2022-12-23 | 中国科学院空天信息创新研究院 | Self-supervision information extraction method combining depth features and contrast learning |
CN114648077B (en) * | 2022-05-18 | 2022-09-06 | 合肥高斯智能科技有限公司 | Method and device for multi-point industrial data defect detection |
CN115792807B (en) * | 2023-02-13 | 2023-04-28 | 北京理工大学 | Semi-supervised learning underwater sound source positioning method based on twin network |
CN116403074A (en) * | 2023-04-03 | 2023-07-07 | 上海锡鼎智能科技有限公司 | Semi-automatic image labeling method and device based on active labeling |
CN117649528A (en) * | 2024-01-29 | 2024-03-05 | 山东建筑大学 | Semi-supervised image segmentation method, system, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112598053A (en) * | 2020-12-21 | 2021-04-02 | 西北工业大学 | Active significance target detection method based on semi-supervised learning |
CN112837338A (en) * | 2021-01-12 | 2021-05-25 | 浙江大学 | Semi-supervised medical image segmentation method based on generation countermeasure network |
KR20210071378A (en) * | 2019-12-06 | 2021-06-16 | 인하대학교 산학협력단 | Hierarchical object detection method for extended categories |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110320162B (en) * | 2019-05-20 | 2021-04-23 | 广东省智能制造研究所 | Semi-supervised hyperspectral data quantitative analysis method based on generation countermeasure network |
Non-Patent Citations (3)
Title |
---|
"A Semi-Supervised Siamese Network with Label Fusion for Remote Sensing Image Scene Classification"; Wang Miao et al.; 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS); full text *
Semantic segmentation of ground objects in remote sensing imagery based on semi-supervised generative adversarial networks; Geng Yanlei; Zou Zhengrong; He Shuaishuai; Geomatics & Spatial Information Technology (No. 4); full text *
Semi-supervised classification method with co-training generative adversarial networks; Xu Zhe et al.; Optics and Precision Engineering; Vol. 29 (No. 5); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113657455B (en) | Semi-supervised learning method based on triple play network and labeling consistency regularization | |
Bosilj et al. | Transfer learning between crop types for semantic segmentation of crops versus weeds in precision agriculture | |
Yang et al. | Development of image recognition software based on artificial intelligence algorithm for the efficient sorting of apple fruit | |
Eslami et al. | Attend, infer, repeat: Fast scene understanding with generative models | |
Deng et al. | Unsupervised image segmentation using a simple MRF model with a new implementation scheme | |
Sun et al. | Robust co-training | |
CN103942749B (en) | A kind of based on revising cluster hypothesis and the EO-1 hyperion terrain classification method of semi-supervised very fast learning machine | |
Gao et al. | Describing ultrasound video content using deep convolutional neural networks | |
Abdollahifard et al. | Stochastic simulation of patterns using Bayesian pattern modeling | |
Kamakshi et al. | Pace: Posthoc architecture-agnostic concept extractor for explaining cnns | |
Tirandaz et al. | Unsupervised texture-based SAR image segmentation using spectral regression and Gabor filter bank | |
Yu et al. | Exemplar-based recursive instance segmentation with application to plant image analysis | |
Zhong et al. | Automatic aurora image classification framework based on deep learning for occurrence distribution analysis: A case study of all‐sky image data sets from the Yellow River Station | |
Chen-McCaig et al. | Convolutional neural networks for texture recognition using transfer learning | |
Mdrafi et al. | Attention-based domain adaptation using residual network for hyperspectral image classification | |
CN117036904A (en) | Attention-guided semi-supervised corn hyperspectral image data expansion method | |
Liang et al. | Human-guided flood mapping: From experts to the crowd | |
Guo et al. | Varied channels region proposal and classification network for wildlife image classification under complex environment | |
Jose et al. | Genus and species-level classification of wrasse fishes using multidomain features and extreme learning machine classifier | |
Keçeli et al. | Classification of radiolarian fossil images with deep learning methods | |
CN113837377A (en) | Neural network pruning method based on class mask | |
Branikas et al. | Instance selection techniques for multiple instance classification | |
He et al. | Automated neural foraminal stenosis grading via task-aware structural representation learning | |
Huang et al. | Bark classification based on gabor filter features using rbpnn neural network | |
Schug et al. | Extending Explainable Boosting Machines to Scientific Image Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||