CN116894985B

CN116894985B - Semi-supervised image classification method and semi-supervised image classification system

Info

Publication number: CN116894985B
Application number: CN202311152314.9A
Authority: CN
Inventors: 刘萍萍; 陈鹏飞; 周求湛
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2023-09-08
Filing date: 2023-09-08
Publication date: 2023-12-15
Anticipated expiration: 2043-09-08
Also published as: CN116894985A

Abstract

The invention discloses a semi-supervised image classification method and a semi-supervised image classification system, belonging to the field of image classification. Dividing the acquired images into a training set and a verification set, and dividing the training set images into marked images and unmarked images; constructing a semi-supervised image classification model; for the marked image, calculating the prediction of the input image and the supervision loss of the corresponding real label by using the semi-supervision image classification modelThe method comprises the steps of carrying out a first treatment on the surface of the For an unlabeled image, carrying out weak enhancement and strong enhancement on the image to obtain a weak enhanced image and a strong enhanced image; obtaining an original image of an unlabeled image, and outputting feature vectors of the weakly enhanced image and the strongly enhanced image after passing through a feature extractor; regarding the feature vector as an anchor point; considering features obtained from weakly enhanced image disturbances as positive samples and features obtained from strongly enhanced image disturbances as negative samples; and the generalization capability of the semi-supervised image classification model is improved. The accuracy of image classification is improved.

Description

Semi-supervised image classification method and semi-supervised image classification system

Technical Field

The invention belongs to the technical field of image classification, and particularly relates to a semi-supervised image classification method and system based on a pseudo tag quality dynamic weighing coefficient.

Background

With the advent of neural networks, supervised deep learning methods achieved superior performance of deep learning often due to the number of labels in the dataset. However, the acquisition of labels is cumbersome and expensive, and thus, to alleviate, semi-supervised approaches have evolved to solve these problems.

Semi-supervised (semi-supervised learning, SSL) learning trains the network from a small amount of marked data, and extends the distribution areas of different categories in the feature space through a large amount of unlabeled data, so that the generalization capability of the network is improved. How to effectively use unlabeled images, self-training becomes an important means among them. Typical self-training methods update a model by assigning false labels to unlabeled samples through prediction of the model, and then iteratively training with the false label samples. Previous work in semi-supervised learning (SSL) proposed applying entropy minimization or consistency regularization on unlabeled images. As increasingly complex mechanisms are introduced into this field, fixMatch breaks this trend, and a remarkable effect is achieved by fusing the two methods into one network using pseudo tags and strong and weak perturbations. The FlexMatch introduces a dynamic threshold to further improve the utilization capability of the unlabeled data of the model, however, the methods do not divide the intra-class compactness and the intra-class separability aiming at strong and weak disturbance, so that the feature extraction capability and the potential full learning capability of the model are limited. A method for introducing contrast learning aiming at the problems above, intra class variance loss loss is proposed, and differentiation is carried out to enhance the characteristic generating capability of the model.

Previous semi-supervised learning approaches often employ confidence thresholds (confidence thresholding) for pseudo tag selection. For example, in FixMatch, only the data with the confidence coefficient greater than a fixed threshold (0.95) is generated into a pseudo tag, and other results which do not meet the confidence coefficient and have lower certainty are discarded in the training iteration process. The excessively high threshold greatly improves the correctness of the pseudo tag, ensures that the model is not affected by the noise of the false tag, but discards a large number of uncertain pseudo tags, causes non-uniformity of learning among categories, is further aggravated along with the self-training, finally causes the Martai effect, and wastes a large amount of potential information in samples. Dynamic thresholds introduce more false labels to participate in training in the early stage by lowering the thresholds (of different categories/different data) in the early stage, but low thresholds in the early stage inevitably introduce false labels with low quality, and noise caused by false labels eventually reduces the accuracy of algorithm classification. So, too high a threshold may result in low utilization of the pseudo tag sort, and even if the pseudo tags utilized are mostly correct (high quality), good classifiers cannot be learned. For common dynamic thresholds, even though a lower threshold is used in the early stages of training to improve utilization, too many false labels, such as Flexmatch, are introduced in the false labels.

Disclosure of Invention

The invention provides a semi-supervised image classification method based on a pseudo tag quality dynamic weighing coefficient, which improves the accuracy of image classification.

The invention provides a semi-supervised image classification system based on a pseudo tag quality dynamic weighing coefficient, which is used for realizing a semi-supervised image classification method based on the pseudo tag quality dynamic weighing coefficient.

The invention is realized by the following technical scheme:

a semi-supervised image classification method based on pseudo tag quality dynamic trade-off coefficients, the semi-supervised image classification method comprising the steps of,

step 1: dividing the acquired images into a training set and a verification set, and dividing the training set images into marked images and unmarked images;

step 2: constructing a semi-supervised image classification model;

step 3: for the marked image, calculating the prediction of the input image and the supervision loss of the corresponding real label by using the semi-supervision image classification model in the step 2；

Step 4: for an unlabeled image, carrying out weak enhancement and strong enhancement on the image to obtain a weak enhanced image and a strong enhanced image;

step 5: acquiring an original image of an unlabeled image, and outputting feature vectors after the original image, the weakly enhanced image and the strongly enhanced image in the step 3 pass through a feature extractor;

Step 6: taking the feature vector of the original image in the step 5 as an anchor point;

step 7: considering the features obtained from the weakly enhanced image disturbance of step 4 as positive samples and the features obtained from the strongly enhanced image disturbance of step 4 as negative samples;

step 8: increasing the distance between the positive sample and the negative sample improves the generalization capability of the semi-supervised image classification model by reducing the distance between the anchor point of step 6 and the positive sample of step 7.

Further, a contrast loss or triple loss measurement learning method is adopted to construct a loss function；

Then, obtaining corresponding image predictions from both enhanced images in a backbone; separately calculating a fixed threshold based consistency loss using two enhanced predictionsAnd a consistency loss based on a fixed dynamic threshold ∈ ->Finally, the overall loss without tag is obtained>Then calculate adaptive fairness penalty ∈ ->Encouraging the model to make different predictions for each class, thereby yielding meaningful dynamic thresholds, in the case of few labeled data; finally, the total loss function consists of +.>、/>And->And the components are combined together.

Further, the loss functionIt is the result that the liquid crystal display device,

wherein the method comprises the steps of， Is the feature vector obtained by the original image through the feature encoder, < > >Is the eigenvector obtained by the weak enhanced sample, +.>Is a feature vector obtained by strongly enhancing the sample, +.>Is a super parameter used to control the relative distance of the three in the metrology space.

Further, the global dynamic threshold is specifically that EMA is used for calculating the change of the dynamic threshold, and the change speed of the dynamic threshold is controlled through a momentum attenuation factor lambda; in each iteration, the confidence level of all unlabeled data is only needed once, and the EMA is utilized to update the global dynamic threshold; global dynamic thresholdIs that

，

Where c is the number of categories of the dataset,for EMA weight coefficient, < >>Is the ratio of the size of the unlabeled data to the labeled data batch,/the label data>Is the previous iteration number +.>Global dynamic threshold of time of day->Is of a batch size, +.>Is unlabeled picture b in batch,>is->Representing the likelihood that the model outputs unlabeled picture b prediction,/for example>Is the number of iterations, +.>Is when the number of iterations->Is a time of day (c).

Further, the local dynamic threshold is specifically, for each class c in the dataset, a global estimate is computed by EMAThen use +.>Measuring the learning value of the sample by comprehensively measuring the global dynamic threshold value +.>Multiplied byRaising or lowering the threshold value of the specific class c on the basis of the global dynamic threshold value;

In order for all classes to initially have the same competitiveness, the initial dynamic threshold for each class is set so that the final adaptive thresholdIs that

Wherein,is global dynamic threshold +.>For the expected maximum-norm regularization of the c-like sample prediction at time t, the calculation is as follows:

=[/>]is composed of all->List of->Is the predicted expected value of the model for each class c; the consistency penalty incurred by the dynamically self-adjusting threshold is:

wherein,is of lot size>For cross entropy function>And->Respectively indicate->Andabbreviations of (i.e., probability results of strong and weak enhanced images through model,)>Is composed of->Converted "one-hot" tag, ">Is the ratio of unlabeled vxx to the size of the marked data batch, +.>Is a confidence function, ++>Is a global dynamic threshold.

Further, the dynamic weight is that the too high threshold value causes low utilization rate of the pseudo tag, and the dynamic threshold value is provided with a dynamic weighting coefficient trade-off of the pseudo tag quality, so that the consistency loss generated by the fixed threshold value with high accuracy is balanced in the initial iteration stageLoss of consistency with dynamic threshold value ∈>The method comprises the steps of carrying out a first treatment on the surface of the The quality of the pseudo tag in the training process of the model is measured by utilizing a global self-adaptive threshold; the unlabeled dataset loss for the entire model is:

Wherein,is a super parameter，/>Is a fixed threshold value, is->Weight coefficient, which is a measure of the quality and quantity of pseudo tags, +.>Other time instances.

Further, the adaptive fairness penaltySpecifically, the mini-batch is optimizedAnd->Cross entropy of (2) as->；

，

Wherein,is the average expectation of unlabeled picture prediction in one batch, < >>Is the histogram distribution of pseudo tags in one batch,/->Is a histogram distribution function, +.>Is->Predictive value for model on strongly enhanced image, < +.>Is->The converted "one-hot" label is used,

for the followingThe EMA scheme is also used for updating:

)，

wherein,is +.>，/>Is +.about.1 times of iteration>，/>Is a "one-hot" tag that models output the prediction likelihood of the b-th picture,

adaptive fairness at t-th iteration (SAF)Is that

，

Wherein the method comprises the steps of,/>Encouraging the output probability of each small batch to be close to the marginal class distribution of the model, and normalizing the histogram distribution; it helps the model to generate diversified predictions;

the overall penalty of the algorithm is therefore:

+/>+/>,

wherein the method comprises the steps of,/>,/>Is a weight coefficient.

A semi-supervised image classification system based on pseudo tag quality dynamic trade-off coefficients comprises an image acquisition module, a semi-supervised image classification module and an optimization module.

The image acquisition module: dividing the acquired images into a training set and a verification set, and dividing the training set images into marked images and unmarked images;

the semi-supervised image classification module is used for constructing a semi-supervised image classification model;

the optimizing module is used for: for the marked image, calculating the prediction of the input image and the supervision loss of the corresponding real label by using the semi-supervision image classification model；

For an unlabeled image, carrying out weak enhancement and strong enhancement on the image to obtain a weak enhanced image and a strong enhanced image;

obtaining an original image of an unlabeled image, and outputting feature vectors of the weakly enhanced image and the strongly enhanced image after passing through a feature extractor;

regarding the feature vector as an anchor point;

considering features obtained from weakly enhanced image disturbances as positive samples and features obtained from strongly enhanced image disturbances as negative samples;

the generalization capability of the semi-supervised image classification model is improved by reducing the distance between the anchor point and the positive sample and increasing the distance between the positive sample and the negative sample.

An electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;

A memory for storing a computer program;

and the processor is used for realizing the steps of the method when executing the program stored in the memory.

A computer readable storage medium having stored therein a computer program which when executed by a processor performs the above-described method steps.

The beneficial effects of the invention are as follows:

the semi-supervised learning framework of the invention combines a measurement learning method and aims at considering intra-class differences while strengthening inter-class differences. The framework can better utilize the intra-class information and improve the performance of the model.

The invention provides a dynamic consistency loss weight scheme, solves the challenge of pseudo tag threshold selection, balances the quantity and quality of pseudo tags, and improves the accuracy and the utilization rate of pseudo tag data.

The invention performs a number of experiments on the baseline dataset, verifying the validity of the proposed framework. Experimental results show that compared with the existing semi-supervised learning method, the method has the advantage that the classification performance is remarkably improved.

Drawings

FIG. 1 is a schematic diagram of a semi-supervised image classification model of the present invention.

Fig. 2 is a schematic diagram of the adaptive threshold of the present invention.

FIG. 3 is a graph of intra-class difference distance of the present applicationLine graph of (2)

FIG. 4 is a weight coefficient of the present applicationIs a line graph of (2).

FIG. 5 is a fixed threshold of the present applicationIs a line graph of (2).

FIG. 6 is a graph of the weighting factors of the present applicationIs a line graph of (2).

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The following description of the embodiments of the present application will be made more fully with reference to the accompanying drawings, in which 1-6 are shown, it being apparent that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.

Example 1

step 2: constructing a semi-supervised image classification model;

step 3: for the marked image, calculating the prediction of the input image and the supervision loss of the corresponding real label by using the semi-supervision image classification model in the step 2 ；

Further, specifically, a loss function is constructed by adopting a metric learning method such as contrast loss or triple lossThe method comprises the steps of carrying out a first treatment on the surface of the The goal of this loss function is to minimize the distance between the anchor point and the positive sample, while maximizing the distance between the anchor point and the negative sample. By minimizing the loss function, features generated by different disturbances can be better distinguished, thereby improving feature discrimination and generalization capabilities.

Then, obtaining corresponding image predictions from both enhanced images in a backbone; separately calculating a fixed threshold based consistency loss using two enhanced predictionsAnd a consistency loss based on a fixed dynamic threshold ∈ ->Finally, the overall loss without tag is obtained>The goal of this step is to prevent that too high a threshold results in low utilization of the pseudo tag arrangement, and even if the pseudo tags utilized are mostly correct (high quality), good classifiers cannot be learned, and that a lower threshold, while improving utilization, introduces too many false tags in the pseudo tags. Then calculate adaptive fairness penalty->Encouraging the model to make different predictions for each class, resulting in meaningful dynamic thresholds, especially in the case of few labeled data, and finally, the total loss function is defined by +.>、/>And->And the components are combined together.

wherein the method comprises the steps of， Is the feature vector obtained by the original image through the feature encoder, < >>Is the eigenvector obtained by the weak enhanced sample, +.>Is a feature vector obtained by strongly enhancing the sample, +.>Is a super parameter used to control the relative distance of the three in the metrology space.

Further, the global dynamic threshold is specifically that when a fixed threshold τ is used, only high quality unlabeled data contributes to the training of the model, while other large amounts of unlabeled data tend to be ignored. Especially in the early stages of training, only a few unlabeled data have a predictive confidence above the threshold.

To solve this problem, a method of dynamic threshold variation is proposed which intuitively reflects the confidence of unlabeled data and the confidence is positively correlated with the overall learning state of the model. The concept of a global dynamic threshold is introduced by inspiring by the FreeMatch method, and is used for reflecting the overall quality of the pseudo tag, and the global threshold tau is set as the average confidence of the model on unlabeled data. Different from the FreeMatch method, the global dynamic threshold is not only used for calculating the self-adaptive local threshold of different categories, but also used for measuring the quality of the pseudo tag through the global threshold and calculating the dynamic weight trade-off so as to balance the consistency loss generated by the unmarked data set.

The global threshold can reflect the quality of the whole pseudo tag of the label-free data set, and the quality of the generated whole pseudo tag of the model is also continuously improved along with the continuous increase of training iteration times, so that the accuracy of the whole algorithm is improved.

However, calculating the confidence level of all unlabeled data during each iteration is very time consuming. To address this problem, an Exponential Moving Average (EMA) was introduced to reflect the change in global dynamic threshold τ. Specifically, EMA is used to calculate the change in dynamic threshold and control its rate of change by a momentum decay factor λ; thus, in each iteration, all non-ones are needed only once Marking the confidence of the data and updating the global dynamic threshold by using the EMA, thereby greatly reducing the calculation time; global dynamic thresholdIs that

，

Further, the local dynamic threshold values are specifically that learning difficulties of different categories are different, and the use of the same threshold value may lead to the martai effect of the model, i.e. easier categories get more training and difficult categories get less training. So different classes should use different thresholds according to different learning and distinguishing difficulties, class thresholds which are easy to learn or more in number should be correspondingly improved, and class thresholds which are difficult to learn or less in number and have great sample value should be correspondingly improved.

Specifically, for each category c in the dataset, a global estimate is computed by EMA Then use +.>Measuring the learning value of the sample by comprehensively measuring the global dynamic threshold value +.>Multiplied by->Raising or lowering the threshold value of the specific class c on the basis of the global dynamic threshold value;

Wherein,is global dynamic threshold +.>For the expected Max-Norm Regularization predicted for a sample of class c at time t, the calculation is as follows:

Further, the dynamic weight is specifically that the too high threshold value causes low utilization rate of the pseudo tag, and too many false tags are introduced into the dynamic threshold value, such as FlexMatch, at the initial low threshold value (about 16% of false tags are introduced into the FlexMatch in the early stage), so that the dynamic weighting coefficient trade-off of the pseudo tag quality is proposed, and consistency loss is generated by iterating the initial balance high-accuracy fixed threshold value Loss of consistency with dynamic threshold value ∈>The method comprises the steps of carrying out a first treatment on the surface of the The quality of the pseudo tag in the training process of the model is measured by utilizing a global self-adaptive threshold; the unlabeled dataset loss for the entire model is:

wherein,is a super parameter，/>Is a fixed threshold value, is->Weight coefficient, which is a measure of the quality and quantity of pseudo tags, +.>Other time instances; the trade-off ensures that the false label loss generated by a higher fixed threshold value at the initial stage of iteration is dominant, the influence of false labels on a model is reduced, low-quality and large-number false labels generated by a dynamic threshold value are supplemented, the capability of the model for preventing overfitting is enhanced, and the model is prevented from being overfitted at the later stage>When using the false tag quality, there has been a high and high number of dynamic thresholds resulting in a consistency penalty.

Further, the adaptive fairness penaltyIn particular, the model is encouraged to make different predictions for each class, resulting in meaningful dynamic thresholds, especially if the marker data is small. EMA +.predicted using the model of Eq.4 because pseudo tag distribution may be non-uniform>As an estimate of the expectations of the predictive distribution of unlabeled data, an adaptive fairness penalty +.>And normalizing probability expectations through pseudo tag histogram distribution, and reducing negative effects caused by unbalanced sample distribution. Optimizing +. >And->Cross entropy of (2) as->；

，

Wherein,is the average expectation of unlabeled picture prediction in one batch, < >>Is the histogram distribution of pseudo tags in one batch,/->Is a histogram distribution function, +.>Is->Predictive value for model on strongly enhanced image, < +.>Is->The converted 'single-heat' label;

for the followingThe EMA scheme is also used for updating:

)，

wherein,is +.>，/>Is +.about.1 times of iteration>，/>Is a 'single-heat' tag for predicting the possibility of outputting the b-th picture by the model;

adaptive fairness at t-th iteration (SAF)Is that

，

the overall penalty of the algorithm is therefore:

+/>+/>,

wherein the method comprises the steps of,/>,/>Is a weight coefficient.

To verify the effectiveness of the model of the present invention, three data sets commonly used in semi-supervised algorithms, CIFAR-10, CIFAR-100, SVHN, were evaluated and compared under different data labelling amounts.

TABLE 1 error Rate on CIFAR-10/100 dataset, best results are shown in bold

In Table 1, the accuracy of the different methods on CIFAR-10, CIFAR-100, SVHN is presented. From this table, it can be observed that the model consistently achieves performance advantages over different tag numbers of CIFAR-10, CIFAR-100. And also on Svhn, does not differ much from the currently best classification result.

Ablation experiments were also performed at cifar10, 250 tags to verify the effectiveness of each component. As shown in table 2. Adding to basic semi-supervision frameworkThe classification performance can be obviously improved, and the classification performance can be effectively improved, so that the model can be effectively helped to learn and judge class information. />+/>The results of (2) also demonstrate that it can be evident at the early stage in the base of higher sample utilizationThe introduction of false labels is reduced on the basis, and the generalization of the model is improved.

TABLE 2 ablation experiment results, best results are shown in bold

Experiments show that the two introduced algorithms can effectively improve semi-supervised image classification mutually.

The dataset used was the cifar10 dataset. The CIFAR-10 dataset contained 60000 32 by 32 color images, divided into 10 classes of 6000. There were 50000 training pictures and 10000 test pictures.

In the training set, 250 samples are randomly selected as marked data, and the rest are unmarked data.

The method provided by the invention is realized by using a PyTorch framework. NVIDIA RTX 3090 is used as the GPU for accelerated training. For the purpose of fair comparison, the same backbone network and super parameters as the previous correlation method are used. Specifically, the Wide ResNet28-2 was used for CIFAR-10, and the Wide ResNet 28-8 was used for CIFAR-100. SGD with momentum of 0.9 was used as an optimizer. The initial learning rate is 0.03, and the cosine learning rate attenuation plan is that Is the initial learning rate, K (K) is the current (total) training step, set K= for all data sets>. In the test phase, all algorithms were inferred using an exponential moving average with a momentum of 0.999 for the training model. The batch size of the tag data is 64. The same weight decay value, a predefined threshold τ, an unlabeled lot ratio μ, and a penalty weight introduced for the pseudo tag are used. The best results are shown in bold. For the method, the same set of hyper-parameters (++) is used on all datasets>=1。/>，τ=0.5，). As described above, the total objective function +.>By supervising the loss->Loss of consistency->And co-training loss->Composition is prepared. Wherein the basic loss function is used: cross entropy loss->Their definitions are as follows:

wherein the method comprises the steps ofIs the probability distribution of each pixel in the true segmented image,/for each pixel in the true segmented image>Is the probability distribution of each pixel in the model predicted segmented image,/for each pixel in the model predicted segmented image>Split image representing real label +.>Representing the predicted classified image.

Super-parameters sigma sum for Intra-class variance loss correlationQuantitative analysis was performed.

As shown in fig. 3 and 4.

If the value of σ is too small, the identifiability of the feature may be reduced, resulting in blurring of the feature. This means that the distance between instances of the same class becomes smaller, making it difficult to distinguish them, affecting the performance of the model. On the other hand, if the value of σ is too large, the variability between instances may increase, resulting in poor intra-class consistency of the features. This means that the distance between instances of the same class becomes large, it is difficult to cluster them together, which also affects the performance of the model.

Weighting ofThe importance of the intra-class variance loss in the total loss is controlled. Increasing the weight emphasizes the distance relationship between instances of the same category and reduces the proportion of other losses in the overall loss. If weight +.>Too small may impair the importance of distance relationships between instances of the same category, potentially resulting in poor performance of features within the learning class.

Figures 5 and 6 provide weights for unsupervised loss functionsAnd a fixed threshold τ.

Super parameter loss weightThe importance of the unsupervised loss in the overall loss function is controlled. Add->The contribution of the unsupervised loss may be emphasized, possibly enhancing the ability of the model to learn useful representations from the unlabeled data. However, will +.>Is arranged too farThe high probability of overfitting to unsupervised losses, ignoring the tagged data, results in performance degradation of the primary task.

The fixed threshold τ determines the threshold for selecting reliable pseudo tags. A higher threshold will filter out less trusted false tags, reducing noise introduced by incorrect false tags. However, setting τ too high may discard most of the available pseudo tags, resulting in information loss, limiting the learning ability of the network. On the other hand, setting τ too low may introduce more incorrect pseudo tags, negatively impacting the training process and model performance.

Finding loss weightsAnd the threshold τ is critical to take advantage of the unsupervised loss and avoid potential drawbacks. By performing deep exploration and optimization in specific tasks and data sets, optimal super-parameter settings can be determined.

After the parameter selection is completed, a specific embodiment of the method will be explained below in connection with the overall framework of the method. Fig. 1 is an overall framework of the method.

The training set is sent into a double-architecture split frame with Wide ResNet as backbone network. In the training process, marked and unmarked images are processed respectively. For the marked image, directly calculating the prediction of the CNN input image and the supervision loss of the corresponding real label. For an unlabeled image, firstly carrying out weak enhancement and strong enhancement on the image to obtain a weak enhanced image and a strong enhanced image, respectively inputting an original image and the two enhanced images into intra-class difference loss between the characteristic vector of the original image and the characteristic vector of the strong enhancement obtained in a characteristic generator ∈>. Calculating the loss of the fixed threshold and the loss of the dynamic threshold using two enhanced predictions finally yields a consistency loss +.>. Recalculating adaptive fairness penalty >。

After the training process is finished, testing is carried out by using the trained CNN network. Inputting the test set into the trained network to extract the characteristics, classifying, and evaluating the correctness of the segmentation result.

Example 2

A semi-supervised image classification system based on a pseudo tag quality dynamic trade-off coefficient, the semi-supervised image classification system utilizing the semi-supervised image classification method based on the pseudo tag quality dynamic trade-off coefficient, the semi-supervised image classification system comprising an image acquisition module, a semi-supervised image classification module and an optimization module;

The feature vector of the original image is regarded as an anchor point;

From the above, the embodiment of the invention combines a measurement learning method through a novel semi-supervised learning framework, and aims to consider intra-class differences while enhancing inter-class differences. The framework can better utilize the intra-class information and improve the performance of the model. Experimental results show that compared with the existing semi-supervised learning method, the method has the advantage that the classification performance is remarkably improved.

Example 3

The embodiment of the invention provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the memory is used for storing the software program and a module, and the processor executes various functional applications and data processing by running the software program and the module stored in the memory. The memory and the processor are connected by a bus. In particular, the processor implements any of the steps of the above-described embodiment by running the above-described computer program stored in the memory.

It should be appreciated that in embodiments of the present invention, the processor may be a central processing unit (Central Processing Unit, CPU), which may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory may include read-only memory, flash memory, and random access memory, and provides instructions and data to the processor. Some or all of the memory may also include non-volatile random access memory.

From the above, it can be seen that the electronic device provided by the embodiment of the present invention can implement the semi-supervised image classification method according to the first embodiment by running a computer program, and from the above, the embodiment of the present invention combines a metric learning method through a novel semi-supervised learning framework, so as to enhance the inter-class differences and simultaneously consider the intra-class differences. The framework can better utilize the intra-class information and improve the performance of the model. Experimental results show that compared with the existing semi-supervised learning method, the method has the advantage that the classification performance is remarkably improved.

It should be appreciated that the above-described integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment, or may be implemented by instructing related hardware by a computer program, where the computer program may be stored in a computer readable storage medium, and the computer program may implement the steps of each of the method embodiments described above when executed by a processor. The computer program comprises computer program code, and the computer program code can be in a source code form, an object code form, an executable file or some intermediate form and the like. The computer readable medium may include: any entity or device capable of carrying the computer program code described above, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier wave signal, a telecommunications signal, a software distribution medium, and so forth. The content of the computer readable storage medium can be appropriately increased or decreased according to the requirements of the legislation and the patent practice in the jurisdiction.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

It should be noted that, the method and the details thereof provided in the foregoing embodiments may be combined into the apparatus and the device provided in the embodiments, and are referred to each other and are not described in detail.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various embodiments described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/device embodiments described above are merely illustrative, e.g., the division of modules or elements described above is merely a logical functional division, and may be implemented in other ways, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A semi-supervised image classification method based on pseudo tag quality dynamic trade-off coefficients is characterized by comprising the following steps,

step 2: constructing a semi-supervised image classification model;

step 8: the generalization capability of the semi-supervised image classification model is improved by reducing the distance between the anchor point of the step 6 and the positive sample of the step 7 and increasing the distance between the positive sample and the negative sample;

construction of loss function by contrast loss or triple loss measurement learning method；

Then, obtaining corresponding image predictions from both enhanced images in a backbone; separately calculating a fixed threshold based consistency loss using two enhanced predictionsAnd a consistency penalty based on local dynamic threshold>Finally, the overall loss without tag is obtained>Then calculate adaptive fairness penalty ∈ ->Encouraging the model to make different predictions for each class, thereby yielding meaningful dynamic thresholds, in the case of few labeled data; finally, the total loss function consists of +. >、/>And->Is formed by the combination of the components;

the loss functionIt is the result that the liquid crystal display device,

wherein the method comprises the steps of， Is the feature vector obtained by the original image through the feature encoder, < >>Is the eigenvector obtained by the weak enhanced sample, +.>Is a feature vector obtained by strongly enhancing the sample, +.>Is a super parameter for controlling the relative distance of the three in the measurement space;

consistency loss by dynamically self-adjusting thresholdThe method comprises the following steps:

wherein,is of lot size>For cross entropy function>And->Respectively indicate->Andabbreviations of (i.e., probability results of strong and weak enhanced images through model,)>Is composed of->Converted "one-hot" tag, ">Is the ratio of unlabeled vxx to the size of the marked data batch, +.>Is a confidence function, ++>Is a global dynamic threshold;

the adaptive fairness lossSpecifically, the +.sub.is optimized on the mini-batch>And->Cross entropy of (2) as->；

，

Wherein,is of a batch size, +.>Is unlabeled picture b in batch,>is the average expectation of unlabeled picture prediction in one batch, < >>Is the histogram distribution of pseudo tags in one batch,/->Is a histogram distribution function, +.>Representation->Predictive value for model on strongly enhanced image, < +.>Is->The converted "one-hot" label is used,

for the following The EMA scheme is also used for updating:

)，

wherein,for EMA weight coefficient, < >>Is +.>，/>Is +.about.1 times of iteration>，/>Is a "one-hot" tag that models output the prediction likelihood of the b-th picture,

adaptive fairness at t-th iterationIs that

，

Wherein,,/>encouraging the output probability of each small batch to be close to the marginal class distribution of the model, and normalizing the histogram distribution; it helps the model to generate diversified predictions; />Is cross entropy loss;

the overall penalty of the algorithm is therefore:

+/>+/>,

wherein the method comprises the steps of,/>,/>Is a weight coefficient;

label-free dataset loss for whole modelThe method comprises the following steps:

wherein,is a superparameter->Is a fixed threshold value, is->Weight coefficient, which is a measure of the quality and quantity of pseudo tags, +.>Other time instances.

2. A semi-supervised image classification method based on pseudo label quality dynamic trade-off coefficients according to claim 1, characterized in that the global dynamic threshold is specifically that EMA is used to calculate the variation of the dynamic threshold and control its variation speed by means of a momentum decay factor λ; in each iteration, the confidence level of all unlabeled data is only needed once, and the EMA is utilized to update the global dynamic threshold; global dynamic threshold Is that

，

Where c is the number of categories of the dataset,for EMA weight coefficient, < >>Is the ratio of the size of the unlabeled data to the size of the batch of labeled data,/-, is the ratio of unlabeled data to the size of the batch of labeled data>Is the previous iteration number +.>Global dynamic threshold of time of day->Is of a batch size, +.>Is unlabeled picture b in batch,>is->Representing the likelihood that the model outputs unlabeled picture b prediction,/for example>Is when the number of iterations->Is (are) time of day->Is the number of iterations.

3. The semi-supervised image classification method based on pseudo label quality dynamic trade-off coefficients of claim 1, wherein the local dynamic threshold is specifically, for each class c in the dataset, estimating global by EMA computationThen use +.>Measuring the learning value of the sample by comprehensively measuring the global dynamic threshold value +.>Multiplied by->Raising or lowering the threshold value of the specific class c on the basis of the global dynamic threshold value;

in order to make all classes have the same competitiveness initially, an initial dynamic threshold of each class is set, and finallyAdaptive threshold of (2)Is that

Wherein,is global dynamic threshold->For the expected maximum-norm regularization of the c-like sample prediction at time t, the calculation is as follows:

wherein, =[/>]Is composed of all->List of->Is the expected value of the model's predictions for each category c.

4. A semi-supervised image classification method based on pseudo label quality dynamic trade-off coefficients as recited in claim 3, wherein the dynamic weights are specifically too highThe threshold value of (2) causes low utilization rate of the pseudo tag, and for the dynamic threshold value, the dynamic weighting coefficient trade-off of the pseudo tag quality is proposed, and the consistency loss generated by the fixed threshold value with high accuracy is balanced at the initial stage of iterationLoss of consistency with dynamic threshold value ∈>The method comprises the steps of carrying out a first treatment on the surface of the The quality of the pseudo tag during training of the model is measured using a global adaptive threshold.

5. A semi-supervised image classification system based on pseudo tag quality dynamic trade-off coefficients, characterized in that the semi-supervised image classification system utilizes the semi-supervised image classification method based on pseudo tag quality dynamic trade-off coefficients according to any one of claims 1-4, the semi-supervised image classification system comprising an image acquisition module, a semi-supervised image classification module, and an optimization module;

the feature vector of the original image is regarded as an anchor point;

6. The electronic equipment is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;

a memory for storing a computer program;

A processor for carrying out the method steps of any one of claims 1-4 when executing a program stored on a memory.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-4.