CN116452854A - Adaptive image classification method based on width learning and random sensitivity
- Publication number: CN116452854A
- Application number: CN202310274623.7A
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06V10/764—Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06N3/04—Neural networks; architecture, e.g. interconnection topology
- G06N3/08—Neural networks; learning methods
- G06V10/762—Image or video recognition or understanding using pattern recognition or machine learning, using clustering, e.g. of similar faces in social networks
- G06V10/774—Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- Y02T10/40—Engine management systems
Abstract
The invention discloses a domain-adaptive image classification method based on width learning and random sensitivity, comprising the following steps: 1) construct the input to the width model; 2) construct the width network model; 3) introduce an edge (marginal) distribution alignment loss into the model; 4) introduce a finer-grained, conventional conditional distribution alignment loss into the model; 5) iteratively select high-quality pseudo labels, construct an enhanced conditional distribution alignment loss, and introduce it into the model; 6) introduce manifold regularization into the model to explore the latent distribution information of the samples; 7) introduce random sensitivity into the model to counter overfitting on the source domain samples; 8) solve for the model connection weights and the classification result. The invention enables the model to perform well on target domains with different distributions. To a certain extent, it alleviates the heavy computational cost of deep domain adaptation methods, and further enables more flexible and accurate downstream applications.
Description
Technical Field
The invention relates to the technical field of domain adaptation, and in particular to a domain-adaptive image classification method based on width learning and random sensitivity.
Background
In machine learning it is generally assumed that the training set and the test set are independently drawn from the same distribution. In real life, however, the samples used to train and test an image classifier often come from different sources, and identical distributions cannot be guaranteed; a model that performs well on the training set therefore tends to perform poorly on the test set. To overcome this problem, the domain adaptation setting in transfer learning has received wide attention. It is defined as follows: the source domain samples and target domain samples share the same feature space, but their edge (marginal) distributions and conditional distributions differ, while their labels belong to the same label space, i.e. the same set of categories.
Furthermore, acquiring a labeled training set is expensive for image classification tasks, so it is desirable to learn enough knowledge from a richly labeled source domain to guide the classification of differently distributed, unlabeled target domain samples. Unsupervised domain adaptation has therefore become an important problem in current research, and it is the focus of this method. "Unsupervised" means that the target domain samples participate in training but their true labels are unavailable. Using source domain knowledge and the distribution information of the available target domain samples, a prediction function that minimizes the classification error on the target domain is trained, so that the image classification task is completed better.
Existing unsupervised domain-adaptive image classification methods aim to minimize the distribution discrepancy between the source and target domains, so that source domain knowledge can be applied better to the target domain. With the continued progress of computer technology, two families of methods have developed: 1. traditional domain-adaptive image classification methods, which mainly learn a shared feature space in which the distribution mismatch between source and target domain samples is reduced, e.g. by mapping features into a reproducing kernel Hilbert space; 2. deep-learning-based domain-adaptive image classification methods, which directly minimize the maximum mean discrepancy between the source and target domains, or learn domain-invariant features through adversarial learning. The deep-learning-based methods greatly improve accuracy, but because deep models have many parameters and consume substantial computational resources, there is still room for improvement in model training and deployment.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a domain-adaptive image classification method based on width learning and random sensitivity. The method uses the special network structure of the width network model to perform domain-adaptive image classification, introduces several distribution alignment losses together with random sensitivity, alleviates negative transfer during domain adaptation, and improves the generalization ability of the width network model. Compared with a deep network it has fewer parameters and requires less computation and training time, enabling more flexible and accurate downstream applications.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a domain-adaptive image classification method based on width learning and random sensitivity, comprising the following steps:
1) Completing the construction of input data by using a source domain sample, a source domain sample label and a target domain sample;
2) Completing the construction of a feature mapping layer, an enhancement layer and an output layer of the width network model;
3) Introducing an edge (marginal) distribution alignment loss into the width network model, alleviating the loss of generalization ability caused by the differing edge distributions of the source and target domain samples;
4) Obtaining target domain pseudo labels with an SVM classifier and introducing a finer-grained alignment between categories, i.e. the conventional conditional distribution alignment loss, into the width network model into which the edge distribution alignment loss has been introduced, further improving performance on the target domain;
5) Considering that the pseudo labels obtained by the SVM are not all of high quality, which weakens the effect of conditional distribution alignment, selecting high-quality pseudo labels by iteration, using them to construct an enhanced conditional distribution alignment loss, and introducing it into the width network model into which the conventional conditional distribution alignment loss has been introduced, alleviating the negative transfer caused by pseudo label quality;
6) After distribution alignment is completed, in order to explore the latent distribution information of the samples and smooth the classification boundary, introducing a manifold regularization loss into the width network model into which the enhanced conditional distribution alignment loss has been introduced, so that the model learns the sample distribution better and classification accuracy improves;
7) Steps 2) to 6) improve the performance of the width network model from the viewpoint of sample distribution; from the viewpoint of overfitting, in order to solve the problem that the width network model overfits the source domain samples because the true target domain labels are unavailable, introducing random sensitivity into the width network model into which the manifold regularization loss has been introduced, improving its generalization ability;
8) And (3) solving the connection weight of the width network model by using a ridge regression algorithm according to each loss in the steps 2) to 7), and solving a classification result.
Further, in step 1), the input data comprise the source domain samples, the target domain samples, and the source domain sample labels, wherein the true labels of the target domain samples are unavailable;
The source domain sample set $X_s$ is expressed as:

$$X_s = [x^s_1; x^s_2; \dots; x^s_{n_s}] \in \mathbb{R}^{n_s \times d}$$

where $\mathbb{R}$ denotes the real numbers, $x^s_i \in \mathbb{R}^{1 \times d}$ denotes the i-th source domain sample, each source domain sample has dimension 1×d, d is the feature dimension of a sample, and there are $n_s$ source domain samples in total;
The target domain sample set $X_t$ is expressed as:

$$X_t = [x^t_1; x^t_2; \dots; x^t_{n_t}] \in \mathbb{R}^{n_t \times d}$$

where $x^t_i$ denotes the i-th target domain sample, each target domain sample has dimension 1×d, and there are $n_t$ target domain samples in total;
The set of labels corresponding to the source domain samples is the source domain label set $Y_s$, expressed as:

$$Y_s = [y^s_1; y^s_2; \dots; y^s_{n_s}], \qquad y^s_i \in \{1, 2, \dots, C\}$$

where C is the total number of sample categories, $y^s_i$ is the class label corresponding to the i-th source domain sample, and there are $n_s$ labels in total;
The sample matrix X input into the width network model is expressed as:

$$X = [X_s; X_t] \in \mathbb{R}^{(n_s + n_t) \times d}$$
The label matrix Y input into the width network model is expressed as:

$$Y = [Y_s; \mathrm{Zeros}]$$

where $\mathrm{Zeros} \in \mathbb{R}^{n_t \times C}$ is an all-zero matrix standing in for the unavailable target domain labels (with $Y_s$ in one-hot form, $Y \in \mathbb{R}^{(n_s + n_t) \times C}$), and Y denotes all input labels.
Further, in step 2), the width network model is divided into three layers:
The first layer is the feature mapping layer, which converts the input samples into feature maps through random weights, biases, and linear activation functions, and stores them in the feature nodes;
The second layer is the enhancement layer, which maps the feature nodes again through random weights and biases, and stores the result in the enhancement nodes;
The third layer is the output layer; the feature nodes and enhancement nodes are combined into the full-feature nodes, which are directly connected to the output layer, and finally the connection weights of the width network model are solved with a ridge regression algorithm;
The construction process of the width network model is as follows:
Inputting X into the width network model, the i'-th group of feature nodes $Z_{i'}$, containing k feature nodes, is expressed as:

$$Z_{i'} = \phi_{i'}\big(X W_{e_{i'}} + \beta_{e_{i'}}\big), \qquad i' = 1, \dots, n$$

where $W_{e_{i'}}$ and $\beta_{e_{i'}}$ are the randomly generated weights and bias of the i'-th group of feature-layer nodes, n is the number of feature-node groups, and $\phi_{i'}$ is the linear activation function of the i'-th group of feature-layer nodes. The combination of all feature nodes, $Z^n$, is expressed as:
$$Z^n = [Z_1, Z_2, \dots, Z_n] \in \mathbb{R}^{(n_s + n_t) \times (n \times k)}$$

where n×k is the total number of feature nodes;
The j'-th group of enhancement nodes $H_{j'}$ is generated by mapping the feature nodes, expressed as:

$$H_{j'} = \delta_{j'}\big(Z^n W_{h_{j'}} + \beta_{h_{j'}}\big), \qquad j' = 1, \dots, m$$

where $W_{h_{j'}}$ and $\beta_{h_{j'}}$ are the randomly generated weights and bias of the j'-th group, m is the total number of enhancement nodes, and $\delta_{j'}$ is the nonlinear activation function of the j'-th group of enhancement nodes. All enhancement nodes $H^m$ are expressed as:
$$H^m = [H_1, H_2, \dots, H_m] \in \mathbb{R}^{(n_s + n_t) \times m}$$
Combining all feature nodes and enhancement nodes gives the full-feature node matrix A, expressed as:

$$A = [Z^n \mid H^m] \in \mathbb{R}^{(n_s + n_t) \times (n \times k + m)}$$

where n×k+m is the total number of feature nodes and enhancement nodes;
The width network model converts the input labels into one-hot form and solves the connection weights by minimizing the sum of prediction errors; after introducing an $L_2$ regularization term, the loss function is expressed as:

$$\min_{W} \; \|AW - Y\|_2^2 + \alpha \|W\|_2^2$$

where α is the penalty coefficient of the $L_2$ regularization term and W is the connection weight matrix of the width network model; its closed-form solution is

$$W = \big(A^{\mathsf T} A + \alpha I\big)^{-1} A^{\mathsf T} Y$$

where I is the identity matrix.
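As a concrete illustration, the three-layer construction and the ridge solution above can be sketched in NumPy; the group counts, node sizes, tanh activation, and α value here are illustrative assumptions, not the patented configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def bls_full_feature(X, n_groups=3, k=8, m=20):
    """Map input X to the full-feature node matrix A = [Z^n | H^m]."""
    d = X.shape[1]
    Z_groups = []
    for _ in range(n_groups):                          # feature-mapping layer
        We, be = rng.normal(size=(d, k)), rng.normal(size=k)
        Z_groups.append(X @ We + be)                   # linear activation phi
    Zn = np.hstack(Z_groups)                           # shape (N, n_groups*k)
    Wh, bh = rng.normal(size=(Zn.shape[1], m)), rng.normal(size=m)
    Hm = np.tanh(Zn @ Wh + bh)                         # enhancement layer, nonlinear delta
    return np.hstack([Zn, Hm])                         # shape (N, n_groups*k + m)

def ridge_weights(A, Y, alpha=0.1):
    """Closed-form connection weights W = (A^T A + alpha*I)^(-1) A^T Y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ Y)

X = rng.normal(size=(10, 5))
Y = np.eye(3)[rng.integers(0, 3, size=10)]             # one-hot labels, C = 3
A = bls_full_feature(X)
W = ridge_weights(A, Y)
print(A.shape, W.shape)
```

With three groups of eight feature nodes and twenty enhancement nodes, A has 44 columns and W maps them to the C = 3 output classes.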
Further, in step 3), in order to minimize the difference between the edge distributions of the source and target domain samples, edge distribution alignment is introduced into the constructed width network model, using the difference between the means of the predicted outputs of $X_s$ and $X_t$ as the measure of the edge distribution discrepancy;
Inputting X into the width network model and combining the feature nodes and enhancement nodes yields the full-feature node matrix A of X, expressed as:

$$A = [a(x_1), a(x_2), \dots, a(x_r)]^{\mathsf T}$$

where $^{\mathsf T}$ denotes the transpose of a matrix, r is the total number of source and target domain samples, i.e. $n_s + n_t$, and $a(x_r)$ is the full-feature node corresponding to the r-th sample;
The predicted output $\psi_r$ of the r-th sample is expressed as:

$$\psi_r = a(x_r) W$$
Introducing edge distribution alignment into the width network model, its loss term $L_{MDA}$ is expressed as:

$$L_{MDA} = \Big\| \frac{1}{n_s} \sum_{x_\mu \in \mathcal{D}_s} a(x_\mu) W - \frac{1}{n_t} \sum_{x_\gamma \in \mathcal{D}_t} a(x_\gamma) W \Big\|_2^2$$

This loss term can be rewritten as:

$$L_{MDA} = \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} M_0 A W\big)$$

where Tr denotes the trace of a matrix, and the entries of $M_0$ are computed as:

$$(M_0)_{\mu\gamma} = \begin{cases} \dfrac{1}{n_s^2}, & x_\mu, x_\gamma \in \mathcal{D}_s \\[4pt] \dfrac{1}{n_t^2}, & x_\mu, x_\gamma \in \mathcal{D}_t \\[4pt] -\dfrac{1}{n_s n_t}, & \text{otherwise} \end{cases}$$

where $\mathcal{D}_s$ denotes the source domain set, $\mathcal{D}_t$ denotes the target domain set whose true labels are unavailable, and $x_\mu$ and $x_\gamma$ denote any two samples.
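The block-constant structure of $M_0$ makes the trace rewriting easy to verify numerically; the following sketch (random data, illustrative sizes) checks that the quadratic form equals the plain mean-difference form:

```python
import numpy as np

def edge_mmd_matrix(ns, nt):
    """M0 for edge (marginal) distribution alignment."""
    M0 = np.empty((ns + nt, ns + nt))
    M0[:ns, :ns] = 1.0 / ns**2                       # both samples from the source domain
    M0[ns:, ns:] = 1.0 / nt**2                       # both samples from the target domain
    M0[:ns, ns:] = M0[ns:, :ns] = -1.0 / (ns * nt)   # one sample from each domain
    return M0

rng = np.random.default_rng(1)
ns, nt, p, C = 6, 4, 7, 3
A = rng.normal(size=(ns + nt, p))                    # full-feature nodes of all samples
W = rng.normal(size=(p, C))                          # connection weights
psi = A @ W                                          # predicted outputs
direct = np.sum((psi[:ns].mean(axis=0) - psi[ns:].mean(axis=0)) ** 2)
quad = np.trace(W.T @ A.T @ edge_mmd_matrix(ns, nt) @ A @ W)
print(np.isclose(direct, quad))
```

Because $M_0$ is the outer product of the vector holding $1/n_s$ on source positions and $-1/n_t$ on target positions, the two forms agree for any A and W.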
Further, in step 4), in order to improve the conditional distribution alignment effect, the finer-grained conventional conditional distribution alignment is introduced into the width network model into which the edge distribution alignment loss has been introduced; an SVM classifier is first used to obtain the target domain pseudo label set for computing the conventional conditional distribution alignment loss;
The set of pseudo labels corresponding to the target domain samples is the target domain pseudo label set $\hat{Y}_t$, expressed as:

$$\hat{Y}_t = [\hat{y}^t_1; \hat{y}^t_2; \dots; \hat{y}^t_{n_t}], \qquad \hat{y}^t_j \in \{1, 2, \dots, C\}$$

where $\hat{y}^t_j$ is the pseudo class label corresponding to the j-th target domain sample, and there are $n_t$ labels in total;
Using the difference between the mean predicted outputs of source and target domain samples belonging to the same class as the measure of the conditional distribution discrepancy, the conventional conditional distribution alignment loss term $L_{CDA}$ is expressed as:

$$L_{CDA} = \sum_{c=1}^{C} \Big\| \frac{1}{n_s^{(c)}} \sum_{x_\mu \in \mathcal{D}_s^{(c)}} a(x_\mu) W - \frac{1}{n_t^{(c)}} \sum_{x_\gamma \in \mathcal{D}_t^{(c)}} a(x_\gamma) W \Big\|_2^2$$

where $\mathcal{D}_s^{(c)}$ is the set of source domain samples whose true label is class c, $\mathcal{D}_t^{(c)}$ is the set of target domain samples whose pseudo label is class c, $n_s^{(c)}$ is the number of source domain samples belonging to class c, $n_t^{(c)}$ is the number of target domain samples belonging to class c, and $\hat{y}^t_j$ is the pseudo label of the j-th target domain sample $x^t_j$;
This loss term can be rewritten as:

$$L_{CDA} = \mathrm{Tr}\Big(W^{\mathsf T} A^{\mathsf T} \Big(\sum_{c=1}^{C} M_c\Big) A W\Big)$$

where the entries of $M_c$ are computed as:

$$(M_c)_{\mu\gamma} = \begin{cases} \dfrac{1}{(n_s^{(c)})^2}, & x_\mu, x_\gamma \in \mathcal{D}_s^{(c)} \\[4pt] \dfrac{1}{(n_t^{(c)})^2}, & x_\mu, x_\gamma \in \mathcal{D}_t^{(c)} \\[4pt] -\dfrac{1}{n_s^{(c)} n_t^{(c)}}, & x_\mu \in \mathcal{D}_s^{(c)}, x_\gamma \in \mathcal{D}_t^{(c)} \text{ or vice versa} \\[4pt] 0, & \text{otherwise} \end{cases}$$
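The per-class matrices $M_c$ follow the same block pattern as $M_0$, restricted to the samples of one class; a sketch with hypothetical label vectors:

```python
import numpy as np

def cond_mmd_matrix(labels_s, pseudo_t, c):
    """M_c for class c: block structure of M0, restricted to class-c samples."""
    ns, nt = len(labels_s), len(pseudo_t)
    idx_s = np.where(labels_s == c)[0]               # source samples truly in class c
    idx_t = ns + np.where(pseudo_t == c)[0]          # target samples pseudo-labelled c
    Mc = np.zeros((ns + nt, ns + nt))
    if len(idx_s) == 0 or len(idx_t) == 0:
        return Mc                                    # class absent: no alignment term
    Mc[np.ix_(idx_s, idx_s)] = 1.0 / len(idx_s)**2
    Mc[np.ix_(idx_t, idx_t)] = 1.0 / len(idx_t)**2
    Mc[np.ix_(idx_s, idx_t)] = Mc[np.ix_(idx_t, idx_s)] = -1.0 / (len(idx_s) * len(idx_t))
    return Mc

labels_s = np.array([0, 0, 1, 1, 2])                 # hypothetical source labels
pseudo_t = np.array([0, 1, 1, 2])                    # hypothetical target pseudo labels
M1 = cond_mmd_matrix(labels_s, pseudo_t, 1)
print(M1.shape)
```

As with $M_0$, the entries of each $M_c$ sum to zero, so a constant shift of the predicted outputs does not change the alignment loss.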
Further, in step 5), considering that the pseudo labels obtained by the SVM classifier are not all of high quality, which weakens the effect of conditional distribution alignment, high-quality pseudo labels are selected by iteration, used to construct the enhanced conditional distribution alignment loss, and introduced into the width network model into which the conventional conditional distribution alignment loss has been introduced, alleviating the negative transfer caused by pseudo label quality. The method comprises the following steps:
5.1) Selecting high-confidence pseudo labels through the sub-models:
Set the number of sub-models $N_{drop}$ and the dropout proportion F%; then randomly generate $N_{drop}$ 0-1 masks with F% of the entries set to 0, and multiply A element-wise by the different masks to obtain $N_{drop}$ sub-models. The mean of the sub-models' predicted outputs on $X_t$ is taken as the final predicted output, and the variance is taken as the quality measure of the corresponding pseudo label: the smaller the variance, the higher the quality;
After $X_t$ is input into the width network model, the feature nodes and enhancement nodes corresponding to $X_t$ are combined to obtain the full-feature node matrix $A_t$ of $X_t$, expressed as:

$$A_t = [a(x^t_1), a(x^t_2), \dots, a(x^t_{n_t})]^{\mathsf T}$$

where $a(x^t_j)$ is the full-feature node corresponding to the j-th target domain sample;
After each sub-model is obtained, its output vectors for the target domain samples, $\hat{Y}^{(\varepsilon)}_t$, are expressed as:

$$\hat{Y}^{(\varepsilon)}_t = (A_t \odot \mathrm{Mask}_\varepsilon) W, \qquad \varepsilon = 1, \dots, N_{drop}$$

where $\hat{Y}^{(\varepsilon)}_t$ is the predicted output of the ε-th sub-model on the target domain samples and $\mathrm{Mask}_\varepsilon$ is its 0-1 mask;
The predicted output vectors of the sub-models are summed and averaged to obtain $\bar{Y}_t$, expressed as:

$$\bar{Y}_t = \frac{1}{N_{drop}} \sum_{\varepsilon=1}^{N_{drop}} \hat{Y}^{(\varepsilon)}_t$$

where $\bar{Y}_t$ is the mean of the sub-models' predicted output vectors for each sample;
$\bar{Y}_t$ is converted into one-hot form to output the pseudo labels $\hat{Y}_t$; at the same time, the set of output vectors of each target domain sample across the sub-models, $\Psi_j$, is obtained, expressed as:

$$\Psi_j = \big\{\hat{y}^{(1)}_j, \hat{y}^{(2)}_j, \dots, \hat{y}^{(N_{drop})}_j\big\}$$

where $\Psi_j$ is the set of $N_{drop}$ predicted output vectors of the j-th target domain sample, and $\hat{y}^{(\varepsilon)}_j$ is the predicted output vector of the ε-th sub-model for the j-th target domain sample;
The quality of a single sample's pseudo label is calculated as follows:

$$\eta_j = \sum_{c=1}^{C} \mathrm{Var}\big(\{\hat{y}^{(\varepsilon)}_{j,c}\}_{\varepsilon=1}^{N_{drop}}\big)$$

where $\eta_j$ is the quality score of the pseudo label of the j-th target domain sample (the smaller, the better), Var denotes the variance, and $\{\hat{y}^{(\varepsilon)}_{j,c}\}$ is the set of predictions of all sub-models for class c of sample j;
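Step 5.1) can be sketched as follows; the number of sub-models, the drop rate F, and all sizes are illustrative assumptions, and in the full method the masks would be shared across iteration rounds as described in step 8):

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, p, C, N_drop, F = 8, 12, 3, 5, 0.2

At = rng.normal(size=(n_t, p))                         # full-feature nodes A_t of X_t
W = rng.normal(size=(p, C))                            # connection weights
masks = (rng.random((N_drop, p)) >= F).astype(float)   # ~F of the nodes zeroed per mask

outs = np.stack([(At * mask) @ W for mask in masks])   # (N_drop, n_t, C) sub-model outputs
mean_out = outs.mean(axis=0)                           # averaged prediction
pseudo = mean_out.argmax(axis=1)                       # pseudo label per target sample
eta = outs.var(axis=0).sum(axis=1)                     # quality score: lower variance is better

P = 25                                                 # keep the top-P% most stable labels
top = np.argsort(eta)[: int(np.ceil(P / 100 * n_t))]
print(pseudo.shape, eta.shape, len(top))
```

Samples whose prediction barely changes when feature nodes are dropped receive low η and are the first candidates for the high-quality set ξ.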
5.2) Iteratively integrating pseudo labels to participate in enhanced conditional distribution alignment:
Create a set ξ of high-quality pseudo labels. In each round, sort the pseudo labels obtained for the target domain samples by $\eta_j$ from high quality to low, and add the top P% into ξ; when the same sample receives pseudo labels in multiple iterations, determine its final class by majority voting, then update ξ. The target domain samples that own high-quality pseudo labels are denoted $X^*_t$, the corresponding high-quality pseudo labels are denoted $\hat{Y}^*_t$, and $n^*_t$ is the number of target domain samples with high-quality pseudo labels; the superscript * marks high quality;
The samples participating in the next round of enhanced conditional distribution alignment, $X^*_t$ with the corresponding $\hat{Y}^*_t$, are obtained from ξ, while $X_t$ with the corresponding $\hat{Y}_t$ still participates in the conventional conditional distribution alignment. The loss term of the conventional conditional distribution alignment is kept at a lower weight, because although the pseudo labels $\hat{Y}_t$ of $X_t$ are not of uniformly high quality, they still have positive significance for the width network model, especially in the early iterations when the number of labels integrated into ξ is small;
In this way, each iteration round updates $\hat{Y}_t$ for the next round's conventional conditional distribution alignment and integrates ξ, so that the next round's enhanced conditional distribution alignment works better and better. Distinguishing the two alignments by weight improves the conditional distribution alignment effect and alleviates the negative transfer caused by low-quality pseudo labels; the result is obtained after the preset number of iteration rounds;
Thus, according to the conditional distribution alignment principle, the enhanced conditional distribution alignment loss accumulates the differences between the mean predicted outputs of the same classes of the source domain samples $X_s$ and the target domain samples with high-quality pseudo labels $X^*_t$; for target domain samples without high-quality pseudo labels, the alignment loss with respect to $X_s$ is 0;
According to the rules above, the enhanced conditional distribution alignment loss term $L_{ECDA}$ is expressed as:

$$L_{ECDA} = \mathrm{Tr}\Big(W^{\mathsf T} A^{\mathsf T} \Big(\sum_{c=1}^{C} M^*_c\Big) A W\Big)$$

where the entries of $M^*_c$ are computed as:

$$(M^*_c)_{\mu\gamma} = \begin{cases} \dfrac{1}{(n_s^{(c)})^2}, & x_\mu, x_\gamma \in \mathcal{D}_s^{(c)} \\[4pt] \dfrac{1}{(n_t^{*(c)})^2}, & x_\mu, x_\gamma \in \mathcal{D}_t^{*(c)} \\[4pt] -\dfrac{1}{n_s^{(c)} n_t^{*(c)}}, & x_\mu \in \mathcal{D}_s^{(c)}, x_\gamma \in \mathcal{D}_t^{*(c)} \text{ or vice versa} \\[4pt] 0, & \text{otherwise} \end{cases}$$

where $\mathcal{D}_t^{*(c)}$ is the set of target domain samples whose high-quality pseudo label is class c, $x^{t*}_{j^*}$ is the $j^*$-th target domain sample with a high-quality pseudo label, $\hat{y}^{t*}_{j^*}$ is its high-quality pseudo label, and $n_t^{*(c)}$ is the number of samples that possess high-quality labels and belong to class c.
Further, in step 6), to explore the latent distribution information of the samples and improve the generalization ability of the width network model on the target domain, a manifold regularization loss is introduced into the width network model into which the enhanced conditional distribution alignment loss has been introduced. Its loss term $L_{MR}$ is expressed as:

$$L_{MR} = \sum_{\mu,\gamma} \omega_{\mu\gamma} \big\| a(x_\mu) W - a(x_\gamma) W \big\|_2^2$$

where $a(x_\mu)$ is the full-feature node of sample $x_\mu$, $a(x_\gamma)$ is the full-feature node of sample $x_\gamma$, and $\omega_{\mu\gamma}$ is the similarity between any two samples, computed with the cosine similarity:

$$\omega_{\mu\gamma} = \begin{cases} \dfrac{x_\mu x_\gamma^{\mathsf T}}{\|x_\mu\| \, \|x_\gamma\|}, & x_\gamma \in N_\tau(x_\mu) \\[4pt] 0, & \text{otherwise} \end{cases}$$

where $N_\tau(x)$ is the set of τ nearest neighbours of any sample x, obtained with the KNN algorithm;
The loss term is rewritten as:

$$L_{MR} = \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} L A W\big), \qquad L = \Delta - \Omega$$

where L is the Laplacian matrix, Ω is the adjacency matrix formed by the $\omega_{\mu\gamma}$ of all samples, and Δ is the diagonal degree matrix, computed as:

$$\Delta_{\mu\mu} = \sum_{\gamma} \omega_{\mu\gamma}$$

where $\omega_{\mu\gamma}$ are the entries of Ω;
The Laplacian matrix is normalized:

$$\tilde{L} = \Delta^{-1/2} L \, \Delta^{-1/2}$$

where $\tilde{L}$ is the normalized Laplacian matrix;
Thus, the loss term in its final form is expressed as:

$$L_{MR} = \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} \tilde{L} A W\big)$$
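The graph construction of step 6) can be sketched as follows; τ and the toy data are illustrative, and non-negative data is used so that all degrees stay positive and the normalization is well defined:

```python
import numpy as np

def knn_cosine_graph(X, tau=2):
    """Adjacency of cosine similarities, restricted to each sample's tau neighbours."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)               # exclude a sample from its own neighbours
    omega = np.zeros_like(sim)
    for mu in range(len(X)):
        nbrs = np.argsort(sim[mu])[-tau:]        # tau most similar samples to x_mu
        omega[mu, nbrs] = sim[mu, nbrs]
    return np.maximum(omega, omega.T)            # symmetrize the adjacency

X = np.random.default_rng(3).random((6, 4))      # toy data in [0, 1)
Omega = knn_cosine_graph(X)
Delta = np.diag(Omega.sum(axis=1))               # diagonal degree matrix
L = Delta - Omega                                # graph Laplacian
D_inv_sqrt = np.diag(Omega.sum(axis=1) ** -0.5)
L_norm = D_inv_sqrt @ L @ D_inv_sqrt             # normalized Laplacian
print(np.allclose(L.sum(axis=1), 0.0))
```

Each row of the unnormalized Laplacian sums to zero, which is what makes the quadratic form penalize only differences between neighbouring predictions.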
further, in step 7), the source domain samples in the input space are divided into two parts, i.e. each source domain sampleAnd the surrounding hidden samples-> Is indicated at->A set of samples within a specific positive value Q around, i.e. in each dimension +.>And->The distance between each is smaller than Q, for each source field sample +.>Can find a set of concealment samples meeting the following requirements +.>
Wherein Deltax is κ Representation ofAnd->Difference vector of any dimension between them, deltax is +.>And->The difference vector between the two is Q which is a self-defined value, and the Q value is not selected too much because the sample which is too far away from the current sample and possibly not already in the current category can be selected specifically according to neighborhood knowledge or repeated experiments, and each hidden sample is supposed to be->All have the same probability of generation, i.e. obey a uniform distribution, in other words can be seen as +.>Is->Peripheral disturbance points, while Δx is the degree of random disturbance;
Random sensitivity aims to shrink the mean squared error between the predicted outputs of $x^s_i$ and of $S(x^s_i)$, alleviating the overfitting of the width network model while giving it better generalization ability for target domain samples with distribution differences;
A group of perturbation points is generated for each source domain sample, with o perturbation points per group, so the total number of perturbation points is $n_s \times o$; the perturbation points of all source domain samples are denoted $X_p$. The perturbation points participate only in the random sensitivity calculation, not in the calculation of the other loss terms. After $X_p$ is input into the width network model, the corresponding feature nodes and enhancement nodes are combined to obtain the full-feature node matrix U of $X_p$, expressed as:

$$U = \big[a(x^p_{1,1}), \dots, a(x^p_{1,o}), \dots, a(x^p_{n_s,1}), \dots, a(x^p_{n_s,o})\big]^{\mathsf T}$$

where $a(x^p_{i,e})$ is the full-feature node corresponding to the e-th perturbation point of the i-th source domain sample;
After $X_s$ is input into the width network model, the feature nodes and enhancement nodes corresponding to $X_s$ are combined to obtain the full-feature node matrix S of $X_s$, expressed as:

$$S = [a(x^s_1), a(x^s_2), \dots, a(x^s_{n_s})]^{\mathsf T}$$

where $a(x^s_i)$ is the full-feature node corresponding to the i-th source domain sample;
Thus, the random sensitivity loss term $L_{SS}$ is expressed as:

$$L_{SS} = \frac{1}{n_s \, o} \big\| U W - Y_{srpt} \big\|_2^2$$

where UW is the predicted output of the perturbation samples and $Y_{srpt}$ is the result of repeatedly stacking SW o times, so that each source domain sample corresponds one-to-one with the predicted outputs of its perturbation points.
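The perturbation construction and the loss term can be sketched as follows; Q, o, the stand-in feature map, and the weights are illustrative assumptions rather than the patented model:

```python
import numpy as np

rng = np.random.default_rng(4)
n_s, d, o, Q = 5, 3, 4, 0.1

Xs = rng.normal(size=(n_s, d))                    # source domain samples
# o perturbation points per sample, each offset uniform in [-Q, Q] per dimension
Xp = (Xs[:, None, :] + rng.uniform(-Q, Q, size=(n_s, o, d))).reshape(n_s * o, d)

P = rng.normal(size=(d, 6))                       # stand-in for the full-feature map a(.)
full_feature = lambda X: np.tanh(X @ P)
S, U = full_feature(Xs), full_feature(Xp)         # full-feature nodes of X_s and X_p

W = rng.normal(size=(6, 2))                       # connection weights (illustrative)
Y_srpt = np.repeat(S @ W, o, axis=0)              # each source output stacked o times
L_ss = np.sum((U @ W - Y_srpt) ** 2) / (n_s * o)  # random sensitivity loss term
print(U.shape, Y_srpt.shape)
```

`np.repeat` with `axis=0` duplicates each row of SW o consecutive times, matching the sample-major ordering of the perturbation points, so each perturbation output is compared to its own source sample's output.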
Further, in step 8), the final loss function of the width network model obtained according to steps 2) to 7) is expressed as:

$$\min_W \; \|AW - Y\|_2^2 + \alpha \|W\|_2^2 + \lambda_1 \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} M_0 A W\big) + \lambda_2 \mathrm{Tr}\Big(W^{\mathsf T} A^{\mathsf T} \sum_{c} M_c A W\Big) + \lambda_4 \mathrm{Tr}\Big(W^{\mathsf T} A^{\mathsf T} \sum_{c} M^*_c A W\Big) + \lambda_3 \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} \tilde{L} A W\big) + \sigma \|U W - Y_{srpt}\|_2^2$$

where $\lambda_1$ is the weight of the edge distribution alignment loss, $\lambda_2$ is the weight of the conventional conditional distribution alignment loss, σ is the weight of the random sensitivity loss term, $\lambda_3$ is the weight of the manifold regularization loss, and $\lambda_4$ is the weight of the enhanced conditional distribution alignment loss, with $\lambda_4 > \lambda_2$;
Obtaining the value of W according to a ridge regression algorithm:
Since a ridge regression algorithm is used to solve the width network model, at least one of UW and Y_srpt must be known in a single solving pass; the first iteration is therefore used as an initialization round for random sensitivity: no random sensitivity loss term is added in the first iteration, and SW is repeatedly stacked o times to form Y_srpt, which is recorded. The random sensitivity loss term is added from the beginning of the second round, where the Y_srpt derived in the first iteration is used to calculate the random sensitivity; the Y_srpt calculated in the second iteration is recorded and used to calculate the random sensitivity of the third iteration, and so on. At the same time, since the width network model generates N_drop sub-models in each iteration, in order to reduce the influence of sub-model structures differing between iteration rounds on the random sensitivity, the N_drop masks used to generate the sub-models are shared across the iteration rounds, and the Y_srpt used by each sub-model in calculating the random sensitivity loss term is the one recorded by the sub-model generated with the same Mask in the previous iteration;
After the width network model has iterated the set number of rounds, the prediction of the final round is taken as the final classification result of the model, and the classification accuracy of the model is calculated.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Compared with deep domain adaptation methods, the method greatly shortens training time and requires fewer computing resources.
2. Compared with other unsupervised domain-adaptive image classification methods, the method alleviates negative migration and the overfitting of the model on source domain data by further introducing enhanced distribution alignment and random sensitivity, improving the accuracy of the domain adaptation method.
3. The method is widely applicable to computer vision classification tasks, is simple to operate, has strong adaptability and has broad application prospects.
In summary, the invention improves the generalization capability of the width network model through distribution alignment, manifold regularization and random sensitivity, so that the width network model can achieve better results on target domains with different distributions. The method solves, to a certain extent, the problems of long training time and large consumption of computing resources common to deep domain adaptation methods, and can further enable more flexible and accurate downstream applications.
Drawings
FIG. 1 is a schematic diagram of a logic flow of the present invention.
Fig. 2 is a schematic diagram of the present invention.
Fig. 3 is a block diagram of a breadth-network model.
Fig. 4 is a negative migration schematic.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
As shown in fig. 1 and 2, the present embodiment discloses a field-adaptive image classification method based on width learning and random sensitivity, which includes the steps of:
1) The Office-31 data set is adopted, with the Amazon domain as the source domain and the Webcam domain as the target domain; the AlexNet-FC_7 features, fine-tuned on the source domain, are used as the sample features. The input of the method comprises source domain samples, target domain samples and the labels of the source domain samples; the real labels of the target domain samples are not available.
2) The input data includes a source domain sample, a target domain sample, and a source domain sample tag, wherein a real tag of the target domain sample is not available.
The source domain sample set X_s is expressed as:
X_s = [x_1^s; x_2^s; ...; x_{n_s}^s]
where ℝ denotes the set of real numbers, x_i^s ∈ ℝ^{1×d} denotes the i-th source domain sample, d denotes the feature dimension of a sample, and there are n_s source domain samples in total;
The target domain sample set X_t is expressed as:
X_t = [x_1^t; x_2^t; ...; x_{n_t}^t]
where x_i^t ∈ ℝ^{1×d} denotes the i-th target domain sample, and there are n_t target domain samples in total;
The set of labels corresponding to the source domain samples is the source domain label set Y_s, expressed as:
Y_s = [y_1^s; y_2^s; ...; y_{n_s}^s]
where C denotes the total number of sample categories, y_i^s denotes the class label corresponding to the i-th source domain sample, and there are n_s labels in total;
The sample X input into the width network model is expressed as:
X = [X_s; X_t]
where X is the vertical concatenation of X_s and X_t;
The label Y input into the width network model is expressed as:
Y = [Y_s; Zeros]
where Zeros is an all-0 matrix used to represent the unavailable labels of the target domain samples, and Y denotes all the input labels;
at this time, n s =2817,n t =795,d=4096,C=31;
As shown in fig. 3, the width network model is divided into three layers:
The first layer is the feature mapping layer, which is responsible for converting the input samples into feature maps through random weights, biases and a linear activation function, and storing them in the feature nodes;
The second layer is the enhancement layer, which maps the feature nodes again through random weights and biases and stores the result in the enhancement nodes;
The third layer is the output layer; the feature nodes and the enhancement nodes are combined to form the full-feature nodes, which are directly connected to the output layer, and finally the connection weight of the width network model is solved using a ridge regression algorithm;
The construction process of the width network model specifically comprises the following steps:
X is input into the width network model; the i'-th group of feature nodes Z_i', containing k feature nodes, is expressed as:
Z_i' = φ_i'(X W_{e_i'} + β_{e_i'})
where W_{e_i'} and β_{e_i'} denote the randomly generated weights and biases of the i'-th group of feature layer nodes, n denotes the number of groups of feature nodes, and φ_i' denotes the linear activation function of the i'-th group of feature layer nodes; all the feature nodes combined, Z^n, are expressed as:
Z^n = [Z_1, Z_2, ..., Z_n]
where n×k denotes the total number of feature nodes; here k = 50 and n = 10 are set;
The j'-th group of enhancement nodes H_j' is generated by mapping the feature nodes, expressed as:
H_j' = δ_j'(Z^n W_{h_j'} + β_{h_j'})
where W_{h_j'} and β_{h_j'} denote the randomly generated weights and biases of the j'-th group, m denotes the total number of enhancement nodes, and δ_j' denotes the nonlinear activation function of the j'-th group of enhancement nodes; all the enhancement nodes H^m are expressed as:
H^m = [H_1, H_2, ..., H_m]
where m = 1500 is set here;
All the feature nodes and enhancement nodes are combined to obtain the full-feature node matrix A, expressed as:
A = [Z^n | H^m]
where n×k+m denotes the total number of feature nodes and enhancement nodes;
The width network model processes the input labels into one-hot labels and solves the connection weight by minimizing the sum of the prediction errors; after introducing an L_2 regularization term, the loss is minimized by:
W = (AᵀA + αI)⁻¹AᵀY
where α denotes the penalty coefficient of the L_2 regularization term, W denotes the connection weight of the width network model, A denotes the full-feature node matrix, and I denotes the identity matrix; here α = 1×10⁻¹⁰ is set.
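The three-layer construction and the ridge regression solution above can be sketched in a few lines of numpy; the function names, toy shapes and the Gaussian initialization of the random weights are illustrative assumptions rather than the patent's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_bls_features(X, n_groups=10, k=50, m=1500):
    """Feature nodes Z (random linear maps of X) and enhancement nodes H
    (nonlinear map of Z), combined into the full-feature matrix A = [Z | H]."""
    d = X.shape[1]
    Z = np.hstack([X @ rng.standard_normal((d, k)) + rng.standard_normal((1, k))
                   for _ in range(n_groups)])        # linear activation
    Wh = rng.standard_normal((Z.shape[1], m))
    bh = rng.standard_normal((1, m))
    H = np.tanh(Z @ Wh + bh)                         # nonlinear enhancement nodes
    return np.hstack([Z, H])

def ridge_solve(A, Y, alpha=1e-10):
    """Connection weights W = (A^T A + alpha*I)^-1 A^T Y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ Y)

X = rng.standard_normal((40, 16))
Y = np.eye(4)[rng.integers(0, 4, 40)]                # one-hot labels, C=4
A = build_bls_features(X, n_groups=3, k=5, m=20)
W = ridge_solve(A, Y)
print(A.shape, W.shape)  # (40, 35) (35, 4)
```

With n_groups=3 and k=5 the feature block has 15 columns, plus m=20 enhancement columns, giving a 35-column full-feature matrix; the closed-form solve is what makes width learning far cheaper to train than a deep network.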
3) In order to minimize the edge distribution difference between the source domain and target domain samples, edge distribution alignment is introduced into the constructed width network model, using the difference between the mean predicted outputs of X_s and X_t as the measure of the edge distribution difference;
inputting X into a width network model, and obtaining a full-feature node A of X after combining the feature node and the enhancement node, wherein the full-feature node A is expressed as:
where ᵀ denotes the transpose of a matrix, r indexes the n_s+n_t source and target domain samples, and a(x_r) denotes the full-feature node corresponding to the r-th sample;
The predicted output ψ_r of the r-th sample is expressed as:
ψ_r = a(x_r)W
Edge distribution alignment is introduced into the width network model; its loss term is expressed as:
The loss term is rewritten as:
where Tr denotes the trace of a matrix, and the matrix is calculated as:
where the two sets denote the source domain samples and the target domain samples whose real labels are not available, respectively.
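The trace rewriting of the edge distribution alignment loss relies on an MMD-style matrix built from the source/target sample counts. A minimal sketch (the matrix construction follows the standard MMD form and is an assumption, not quoted from the patent) verifies that the trace form equals the squared difference between the mean predicted outputs:

```python
import numpy as np

def marginal_mmd_matrix(n_s, n_t):
    """Matrix M such that psi^T M psi equals the squared difference between
    the mean predicted outputs of the source and target samples."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

n_s, n_t = 3, 2
M0 = marginal_mmd_matrix(n_s, n_t)
A = np.arange(10.0).reshape(5, 2)          # toy full-feature nodes
W = np.ones((2, 1))                        # toy connection weights
psi = A @ W                                # predicted outputs
direct = (psi[:n_s].mean() - psi[n_s:].mean()) ** 2
trace_form = float(psi.T @ M0 @ psi)
print(np.isclose(direct, trace_form))  # True
```

Because the matrix depends only on sample counts, it can be precomputed once and folded into the ridge regression solve.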
4) In order to improve the conditional distribution alignment effect, conditional distribution alignment of finer granularity is introduced into the width network model into which the edge distribution alignment loss has been introduced; an SVM classifier is first used to obtain the target domain sample pseudo tags for calculating the conventional conditional distribution alignment loss;
The set of pseudo tags corresponding to the target domain samples is the target domain pseudo tag set, expressed as:
where ŷ_j^t denotes the pseudo class label corresponding to the j-th target domain sample, and there are n_t such labels;
Using the difference between the mean predicted outputs of the source domain and target domain samples belonging to the same class as the measure of the conditional distribution difference, the conventional conditional distribution alignment loss term is expressed as:
where the symbols denote the set of source domain samples whose real label is class c, the set of target domain samples whose pseudo label is class c, the number of source domain samples belonging to class c, the number of target domain samples belonging to class c, and the pseudo tag corresponding to the j-th target domain sample;
The loss term is rewritten as:
where the matrix is calculated as:
where the symbol denotes any one element of the matrix.
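The per-class matrix underlying the conditional distribution alignment can be sketched in the same MMD style as the edge case, restricted to one class; the construction below is an illustrative assumption consistent with the description, not the patent's exact formula:

```python
import numpy as np

def conditional_mmd_matrix(y_s, y_t_pseudo, c):
    """Per-class matrix Mc: psi^T Mc psi equals the squared gap between the
    mean outputs of class-c source samples and class-c pseudo-labelled
    target samples (rows of psi are ordered [source; target])."""
    n_sc = max(int((y_s == c).sum()), 1)
    n_tc = max(int((y_t_pseudo == c).sum()), 1)
    e = np.concatenate([np.where(y_s == c, 1.0 / n_sc, 0.0),
                        np.where(y_t_pseudo == c, -1.0 / n_tc, 0.0)])
    return np.outer(e, e)

y_s = np.array([0, 1, 0])                  # source labels
y_t = np.array([0, 1])                     # target pseudo labels
A = np.arange(10.0).reshape(5, 2)          # toy full-feature nodes
W = np.ones((2, 1))
psi = A @ W
Mc = conditional_mmd_matrix(y_s, y_t, c=0)
src_mean = psi[:3][y_s == 0].mean()        # class-0 source mean output
tgt_mean = psi[3:][y_t == 0].mean()        # class-0 target mean output
print(np.isclose(float(psi.T @ Mc @ psi), (src_mean - tgt_mean) ** 2))  # True
```

Summing such matrices over all C classes gives the full conventional conditional alignment term in trace form.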
5) Considering that the pseudo tags obtained by the SVM classifier are not all of high quality, which reduces the effect of the conditional distribution alignment, pseudo tags of high quality are selected by iteration, used to construct an enhanced conditional distribution alignment loss, and introduced into the width network model into which the conventional conditional distribution alignment loss has been introduced, alleviating the negative migration caused by pseudo tag quality problems, as shown in fig. 4, comprising the following steps:
5.1) Selecting pseudo tags with high confidence through sub-models;
The number of generated sub-models N_drop and the zero-setting proportion F% are set; N_drop 0-1 masks with F% of the entries set to 0 are then randomly generated. Multiplying A element-wise by the different masks yields N_drop sub-models; the mean of the sub-models' predicted outputs for X_t is taken as the final predicted output, and the variance is used as the measure of the quality of the corresponding pseudo tag: the smaller the variance, the higher the quality; here N_drop = 3 and F = 5 are set;
After X_t is input into the width network model, the feature nodes and enhancement nodes corresponding to X_t are combined to obtain the full-feature nodes of X_t, expressed as:
where the symbol denotes the full-feature node corresponding to the j-th target domain sample;
After each sub-model is obtained, the output vector of the sub-model for the target domain samples is obtained, expressed as:
where the symbol denotes the predicted output vector of the ε-th sub-model for the target domain samples;
The predicted output vectors of the sub-models are summed and averaged, expressed as:
where the symbol denotes the mean of the sub-models' predicted output vectors for each sample;
The mean is converted into a one-hot label and output as the pseudo tag set; at the same time, the set of output vectors of each target domain sample in each sub-model is obtained, expressed as:
where the symbols denote the set of N_drop sub-model predicted output vectors for the j-th target domain sample, and the predicted output vector of the ε-th sub-model for the j-th target domain sample;
The quality of the pseudo tag of a single sample is calculated as follows:
where η_j denotes the quality of the pseudo tag obtained for the j-th target domain sample, Var denotes the variance, and the inner set denotes the prediction results of all sub-models for class c of the j-th sample;
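The sub-model ensemble and the variance-based quality measure of step 5.1) can be sketched as follows; the array layout and the use of the predicted class's scores are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def pseudo_label_quality(probs):
    """probs: (N_drop, n_t, C) predictions of the dropout sub-models.
    Ensemble by averaging; quality of each pseudo tag is measured by the
    variance of the sub-models' scores on the predicted class (smaller
    variance -> higher quality / less disagreement)."""
    mean = probs.mean(axis=0)                       # ensemble output, (n_t, C)
    pseudo = mean.argmax(axis=1)                    # pseudo labels
    scores = probs[:, np.arange(probs.shape[1]), pseudo]  # (N_drop, n_t)
    return pseudo, scores.var(axis=0)

probs = rng.random((3, 6, 4))                       # N_drop=3 sub-models, 6 samples
pseudo, eta = pseudo_label_quality(probs)
keep = np.argsort(eta)[: max(1, int(0.2 * len(eta)))]  # top-P% most consistent
print(pseudo.shape, eta.shape, keep.shape)  # (6,) (6,) (1,)
```

Sorting by the variance and keeping the top P% mirrors the selection of high-quality pseudo tags described in step 5.2) below.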
5.2) Iteratively integrating pseudo tags to participate in enhanced conditional distribution alignment;
A set ξ of high-quality pseudo tags is created. The pseudo tags of the target domain samples obtained in the current round are sorted by η_j from high to low, and the top P% high-quality part is added to the set ξ; when the same sample receives pseudo tags in multiple iterations, its final class is determined by majority voting, after which ξ is updated. The target domain samples possessing high-quality pseudo tags and the corresponding high-quality pseudo tags are marked with a superscript denoting the high-quality mark, and the number of target domain samples with high-quality pseudo tags is recorded; here P = 20 is set;
The conditional distribution alignment participating in the next round of enhancement is obtained through ξ: the samples in ξ and their corresponding high-quality pseudo tags take over the role of X_t and its pseudo tags. The loss term of the conventional conditional distribution alignment is retained, with a lower weight, because although the pseudo tags of X_t participating in the conventional conditional distribution alignment are not of high quality, they still have positive significance for the width network model, especially in the early stages of iteration when the number of integrated high-quality pseudo tags is small;
In this way, each round of iteration updates the pseudo tags participating in the next round of conventional conditional distribution alignment and integrates high-quality pseudo tags into ξ, so that the effect of the next round of enhanced conditional distribution alignment becomes better and better; distinguishing the two conditional distribution alignments by weight improves the conditional distribution alignment effect and alleviates the negative migration caused by low-quality pseudo tags. The iteration finally runs for a set number of rounds to obtain the result;
Thus, according to the conditional distribution alignment principle, the enhanced conditional distribution alignment loss is accumulated from the differences between the mean predicted outputs of source domain samples X_s and target domain samples with high-quality pseudo tags belonging to the same class; for target domain samples without high-quality pseudo tags, the alignment loss with the source domain samples X_s is 0;
According to the rules described above, the enhanced conditional distribution alignment loss term is expressed as:
where the matrix is calculated as:
where the symbols denote the set of target domain samples whose high-quality pseudo tag is class c, the j*-th target domain sample with a high-quality pseudo tag, the high-quality pseudo tag of that sample, any one element of the matrix, and the number of samples possessing high-quality labels and belonging to class c.
6) In order to explore the potential distribution information of the samples and improve the generalization capability of the width network model in the target domain, manifold regularization loss is introduced into the width network model into which the enhanced conditional distribution alignment loss has been introduced; its loss term is expressed as:
where a(x_μ) is the full-feature node corresponding to sample x_μ, a(x_γ) is the full-feature node corresponding to sample x_γ, and ω_μγ denotes the similarity between any two samples, calculated with cosine similarity:
where the neighbor set of any sample x contains its τ nearest neighbors, obtained with the KNN algorithm;
The loss term is rewritten as:
where Δ - Ω denotes the Laplacian matrix, Ω is the adjacency matrix formed by the ω_μγ of all samples, and Δ is a diagonal matrix calculated as follows:
where ω_μγ denotes the elements of Ω.
The Laplacian matrix is normalized:
where the symbol denotes the normalized Laplacian matrix;
Thus, the final form of the loss term is expressed as:
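The graph construction behind the manifold regularization term (cosine-similarity KNN adjacency, diagonal degree matrix, symmetric normalization of the Laplacian) can be sketched as below; the symmetrization step and toy data are illustrative assumptions:

```python
import numpy as np

def normalized_laplacian(X, tau=2):
    """Cosine-similarity KNN graph and its symmetrically normalized Laplacian."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                                   # cosine similarity
    np.fill_diagonal(S, -np.inf)                    # exclude self-similarity
    Omega = np.zeros_like(S)
    for mu in range(len(X)):                        # keep tau nearest neighbours
        nbrs = np.argsort(S[mu])[-tau:]
        Omega[mu, nbrs] = S[mu, nbrs]
    Omega = np.maximum(Omega, Omega.T)              # symmetrize the adjacency
    deg = Omega.sum(axis=1)
    Lap = np.diag(deg) - Omega                      # Laplacian: degree - adjacency
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ Lap @ d_inv_sqrt            # normalized Laplacian

rng = np.random.default_rng(2)
L = normalized_laplacian(rng.random((8, 5)), tau=3)  # non-negative toy features
print(L.shape, np.allclose(L, L.T))  # (8, 8) True
```

Since the regularizer is quadratic in the predicted outputs, the normalized Laplacian can be folded into the same ridge regression solve as the alignment terms.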
7) The source domain samples in the input space are divided into two parts, i.e. each source domain sample and the hidden samples around it. The hidden samples are the set of samples within a specific positive value Q around the source domain sample, i.e. the distance between them in each dimension is smaller than Q; for each source domain sample, a set of hidden samples meeting the following requirement can be found:
where Δx_κ denotes the difference in any single dimension between the hidden sample and the source domain sample, Δx is the difference vector between them, and Q is a custom value. Q should not be chosen too large, because samples too far from the current sample may no longer belong to the current class; it can be chosen according to domain knowledge or repeated experiments, and here Q = 0.05 is set. It is assumed that every hidden sample has the same probability of being generated, i.e. the hidden samples obey a uniform distribution; in other words, they can be seen as perturbation points around the source domain sample, while Δx is the degree of random perturbation;
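The hidden samples described above can be drawn as uniform perturbations within ±Q of each source domain sample in every dimension; a minimal sketch, with illustrative function and variable names:

```python
import numpy as np

rng = np.random.default_rng(3)

def perturbation_points(Xs, Q=0.05, o=20):
    """For every source sample, draw o hidden samples uniformly within +/-Q
    per dimension; they feed only the random-sensitivity term."""
    n_s, d = Xs.shape
    dx = rng.uniform(-Q, Q, size=(n_s, o, d))       # random perturbation degree
    return (Xs[:, None, :] + dx).reshape(n_s * o, d)

Xs = rng.standard_normal((4, 6))                    # toy source samples
Xu = perturbation_points(Xs, Q=0.05, o=5)
print(Xu.shape)  # (20, 6)
```

The reshape keeps the o perturbations of each source sample contiguous, which is what makes the "repeat-stack o times" alignment with the source predictions straightforward.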
Random sensitivity aims to shrink the mean square error between the predicted outputs of each source domain sample and its hidden samples, alleviating the overfitting of the width network model and at the same time giving the width network model better generalization capability for target domain samples with distribution differences;
A group of disturbance points is generated for each source domain sample, with o disturbance points per group, so that the total number of disturbance points is n_s×o; here o = 20 is set. The disturbance points of all source domain samples are denoted X_u; they participate only in the calculation of random sensitivity and do not participate in the calculation of the other loss terms. After X_u is input into the width network model, the feature nodes and enhancement nodes corresponding to X_u are combined to obtain the full-feature nodes U of X_u, expressed as:
where the symbol denotes the full-feature node corresponding to the e-th disturbance point of the i-th source domain sample;
After X_s is input into the width network model, the feature nodes and enhancement nodes corresponding to X_s are combined to obtain the full-feature nodes S of X_s, expressed as:
where the symbol denotes the full-feature node corresponding to the i-th source domain sample;
Thus, the random sensitivity loss term is expressed as:
where UW denotes the predicted output vectors of the disturbance samples and Y_srpt denotes the result of repeatedly stacking SW o times; after stacking, UW and Y_srpt have the same dimensions, enabling a one-to-one correspondence between each source domain sample and the predicted output vectors of its disturbance points.
8) The final loss function of the width network model obtained according to steps 2) to 7) is expressed as:
where λ_1 is the weight of the edge distribution alignment loss, λ_2 is the weight of the conventional conditional distribution alignment loss, σ is the weight of the random sensitivity loss term, λ_3 is the weight of the manifold regularization loss, and λ_4 is the weight of the enhanced conditional distribution alignment loss, where λ_4 is larger than λ_2; here λ_1 = 10, λ_2 = 10, σ = 1, λ_3 = 1 and λ_4 = 30 are set;
Obtaining the value of W according to a ridge regression algorithm:
Since a ridge regression algorithm is used to solve the width network model, at least one of UW and Y_srpt must be known in a single solving pass; the first iteration is therefore used as an initialization round for random sensitivity: no random sensitivity loss term is added in the first iteration, and SW is repeatedly stacked o times to form Y_srpt, which is recorded. The random sensitivity loss term is added from the beginning of the second round, where the Y_srpt derived in the first iteration is used to calculate the random sensitivity; the Y_srpt calculated in the second iteration is recorded and used to calculate the random sensitivity of the third iteration, and so on. At the same time, since the width network model generates N_drop sub-models in each iteration, in order to reduce the influence of sub-model structures differing between iteration rounds on the random sensitivity, the N_drop masks used to generate the sub-models are shared across the iteration rounds, and the Y_srpt used by each sub-model in calculating the random sensitivity loss term is the one recorded by the sub-model generated with the same Mask in the previous iteration;
After the width network model has iterated the set number of rounds, the prediction of the final round is taken as the final classification result of the model, and the classification accuracy of the model is calculated.
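When every loss term is quadratic in W, as in steps 3) to 7), the combined objective retains a closed-form ridge solution; the sketch below is a generic illustration in which the matrices M_i and their weights stand in for the alignment and regularization terms, not the patent's exact update:

```python
import numpy as np

def solve_weights(A, Y, M_terms, alpha=1e-10):
    """Closed-form minimizer of
    ||AW - Y||^2 + alpha*||W||^2 + sum_i lam_i * Tr(W^T A^T M_i A W):
    W = (A^T A + alpha*I + sum_i lam_i * A^T M_i A)^-1 A^T Y."""
    G = A.T @ A + alpha * np.eye(A.shape[1])
    for lam, M in M_terms:                 # alignment / regularization matrices
        G += lam * (A.T @ M @ A)
    return np.linalg.solve(G, A.T @ Y)

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 6))           # toy full-feature nodes
Y = np.eye(3)[rng.integers(0, 3, 10)]      # one-hot labels
M0 = np.eye(10) / 10                       # placeholder symmetric PSD matrix
W = solve_weights(A, Y, [(10.0, M0)])
print(W.shape)  # (6, 3)
```

Iterating this solve while refreshing the pseudo-tag-dependent matrices each round matches the alternating scheme described above.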
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.
Claims (9)
1. The field self-adaptive image classification method based on width learning and random sensitivity is characterized by comprising the following steps of:
1) Completing the construction of input data by using a source domain sample, a source domain sample label and a target domain sample;
2) Completing the construction of a feature mapping layer, an enhancement layer and an output layer of the width network model;
3) Introducing edge distribution alignment loss into the width network model to alleviate the reduction in model generalization capability caused by the different edge distributions of the source domain and target domain samples;
4) Obtaining target domain sample pseudo tags using an SVM classifier, and introducing alignment between categories with finer granularity, namely the conventional conditional distribution alignment loss, into the width network model into which the edge distribution alignment loss has been introduced, further improving the performance of the width network model on the target domain;
5) Considering that the pseudo tags obtained by the SVM are not all of high quality, which reduces the effect of the conditional distribution alignment, selecting pseudo tags of high quality through iteration, using them to construct an enhanced conditional distribution alignment loss and introducing it into the width network model into which the conventional conditional distribution alignment loss has been introduced, alleviating the negative migration caused by pseudo tag quality problems;
6) After distribution alignment is completed, in order to explore the potential distribution information of the samples and smooth the classification boundary, introducing manifold regularization loss into the width network model into which the enhanced conditional distribution alignment loss has been introduced, so that the width network model can better learn the distribution of the samples, improving its accuracy;
7) Steps 2) to 6) all improve the performance of the width network model from the perspective of the sample distribution; from the perspective of overfitting, in order to solve the problem that the width network model overfits the source domain samples because the real labels of the target domain are unavailable, introducing random sensitivity into the width network model into which the manifold regularization loss has been introduced, improving the generalization capability of the width network model;
8) According to the losses in steps 2) to 7), solving the connection weight of the width network model using a ridge regression algorithm, and obtaining the classification result.
2. The method of claim 1, wherein in step 1), the input data includes a source domain sample, a target domain sample, and a source domain sample tag, wherein a real tag of the target domain sample is not available;
The source domain sample set X_s is expressed as:
X_s = [x_1^s; x_2^s; ...; x_{n_s}^s]
where ℝ denotes the set of real numbers, x_i^s ∈ ℝ^{1×d} denotes the i-th source domain sample, d denotes the feature dimension of a sample, and there are n_s source domain samples in total;
The target domain sample set X_t is expressed as:
X_t = [x_1^t; x_2^t; ...; x_{n_t}^t]
where x_i^t ∈ ℝ^{1×d} denotes the i-th target domain sample, and there are n_t target domain samples in total;
The set of labels corresponding to the source domain samples is the source domain label set Y_s, expressed as:
Y_s = [y_1^s; y_2^s; ...; y_{n_s}^s]
where C denotes the total number of sample categories, y_i^s denotes the class label corresponding to the i-th source domain sample, and there are n_s labels in total;
The sample X input into the width network model is expressed as:
X = [X_s; X_t]
where X is the vertical concatenation of X_s and X_t;
The label Y input into the width network model is expressed as:
Y = [Y_s; Zeros]
where Zeros is an all-0 matrix used to represent the unavailable labels of the target domain samples, and Y denotes all the input labels.
3. The domain adaptive image classification method based on width learning and random sensitivity according to claim 2, wherein in step 2), the width network model is divided into three layers:
The first layer is the feature mapping layer, which is responsible for converting the input samples into feature maps through random weights, biases and a linear activation function, and storing them in the feature nodes;
The second layer is the enhancement layer, which maps the feature nodes again through random weights and biases and stores the result in the enhancement nodes;
The third layer is the output layer; the feature nodes and the enhancement nodes are combined to form the full-feature nodes, which are directly connected to the output layer, and finally the connection weight of the width network model is solved using a ridge regression algorithm;
the construction process of the width network model specifically comprises the following steps:
Inputting X into the width network model, the i'-th group of feature nodes Z_i', containing k feature nodes, is expressed as:
Z_i' = φ_i'(X W_{e_i'} + β_{e_i'})
where W_{e_i'} and β_{e_i'} denote the randomly generated weights and biases of the i'-th group of feature layer nodes, n denotes the number of groups of feature nodes, and φ_i' denotes the linear activation function of the i'-th group of feature layer nodes; all the feature nodes combined, Z^n, are expressed as:
Z^n = [Z_1, Z_2, ..., Z_n]
where n×k denotes the total number of feature nodes;
The j'-th group of enhancement nodes H_j' is generated by mapping the feature nodes, expressed as:
H_j' = δ_j'(Z^n W_{h_j'} + β_{h_j'})
where W_{h_j'} and β_{h_j'} denote the randomly generated weights and biases of the j'-th group, m denotes the total number of enhancement nodes, and δ_j' denotes the nonlinear activation function of the j'-th group of enhancement nodes; all the enhancement nodes H^m are expressed as:
H^m = [H_1, H_2, ..., H_m]
where H^m contains all the enhancement nodes;
All the feature nodes and enhancement nodes are combined to obtain the full-feature node matrix A, expressed as:
A = [Z^n | H^m]
where n×k+m denotes the total number of feature nodes and enhancement nodes;
The width network model processes the input labels into one-hot labels and solves the connection weight by minimizing the sum of the prediction errors; after introducing an L_2 regularization term, the loss is minimized by:
W = (AᵀA + αI)⁻¹AᵀY
where α denotes the penalty coefficient of the L_2 regularization term, W denotes the connection weight of the width network model, A denotes the full-feature node matrix, and I denotes the identity matrix.
4. The domain-adaptive image classification method based on width learning and random sensitivity according to claim 3, wherein in step 3), in order to minimize the edge distribution difference between the source domain and target domain samples, edge distribution alignment is introduced into the constructed width network model, using the difference between the mean predicted outputs of X_s and X_t as the measure of the edge distribution difference;
inputting X into a width network model, and obtaining a full-feature node A of X after combining the feature node and the enhancement node, wherein the full-feature node A is expressed as:
where ᵀ denotes the transpose of a matrix, r indexes the n_s+n_t source and target domain samples, and a(x_r) denotes the full-feature node corresponding to the r-th sample;
The predicted output ψ_r of the r-th sample is expressed as:
ψ_r = a(x_r)W
Edge distribution alignment is introduced into the width network model; its loss term is expressed as:
The loss term is rewritten as:
where Tr denotes the trace of a matrix, and the matrix is calculated as:
where the two sets denote the source domain samples and the target domain samples whose real labels are not available, respectively, and x_μ and x_γ represent any samples.
5. The domain-adaptive image classification method based on width learning and random sensitivity according to claim 4, wherein in step 4), in order to improve the conditional distribution alignment effect, conditional distribution alignment of finer granularity is introduced into the width network model into which the edge distribution alignment loss has been introduced; an SVM classifier is first used to obtain the target domain sample pseudo tag set for calculating the conventional conditional distribution alignment loss;
The set of pseudo tags corresponding to the target domain samples is the target domain pseudo tag set, expressed as:
where ŷ_j^t denotes the pseudo class label corresponding to the j-th target domain sample, and there are n_t such labels;
Using the difference between the mean predicted outputs of the source domain and target domain samples belonging to the same class as the measure of the conditional distribution difference, the conventional conditional distribution alignment loss term is expressed as:
in the method, in the process of the invention,sample set representing real tags in source domain as category c,/-for>Sample set indicating pseudo tag in target domain as category c, +.>Representing the number of samples of the source domain belonging to class c,/->Indicating the number of samples of the target domain belonging to class c, < >>Represents the jth target domain sample corresponding thereto +.>Is a pseudo tag of (2);
the loss term L_con is rewritten as:
L_con = Σ_{c=1}^{C} Tr(W^T A^T M_c A W)
where the matrix M_c is calculated as:
(M_c)_{μγ} =
 1/(n_s^c)^2, if x_μ, x_γ ∈ D_s^c;
 1/(n_t^c)^2, if x_μ, x_γ ∈ D_t^c;
 −1/(n_s^c n_t^c), if x_μ ∈ D_s^c, x_γ ∈ D_t^c or x_μ ∈ D_t^c, x_γ ∈ D_s^c;
 0, otherwise
where x is any element of D_s^c ∪ D_t^c.
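The class-conditional matrix M_c above can be sketched in the same way (a minimal NumPy sketch; the per-sample domain flags and the label array — true labels for the source part, pseudo labels for the target part — are illustrative stand-ins):

```python
import numpy as np

def conditional_mmd_matrix(domain, labels, c):
    """Build M_c for class c over all n_s+n_t samples.

    domain: array of 's'/'t' flags per sample; labels: true labels for
    the source part, pseudo labels for the target part."""
    n = len(labels)
    src_c = (domain == 's') & (labels == c)   # source samples with true label c
    tgt_c = (domain == 't') & (labels == c)   # target samples with pseudo label c
    e = np.zeros(n)
    if src_c.sum() > 0:
        e[src_c] = 1.0 / src_c.sum()
    if tgt_c.sum() > 0:
        e[tgt_c] = -1.0 / tgt_c.sum()
    return np.outer(e, e)                     # zero rows/cols for other classes

# toy example: 3 source + 3 target samples, two classes
domain = np.array(list("sssttt"))
labels = np.array([0, 0, 1, 0, 1, 1])  # source truths, then target pseudo labels
M0c = conditional_mmd_matrix(domain, labels, 0)
```

Samples not belonging to class c get zero weight, so their pairs contribute nothing to the class-c alignment term, matching the 0-otherwise case of M_c.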
6. The domain adaptive image classification method based on width learning and random sensitivity according to claim 5, wherein in step 5), considering that the pseudo labels obtained by the SVM classifier are not of high quality and weaken the conditional distribution alignment effect, high-quality pseudo labels are selected iteratively for constructing an enhanced conditional distribution alignment loss, which is introduced into the width network model into which the conventional conditional distribution alignment loss has been introduced, so as to alleviate the negative migration caused by pseudo-label quality problems, comprising the following steps:
5.1) selecting high-confidence pseudo labels through sub-models;
setting the number of generated sub-models N_drop and the zero-setting proportion F%, then randomly generating N_drop 0-1 masks in which F% of the entries are set to 0; multiplying A element-wise by the different masks yields N_drop sub-models; the mean of the predicted outputs of the sub-models on X_t is used as the final predicted output, and the variance is used as the measure of the quality of the corresponding pseudo label: the smaller the variance, the higher the quality;
after X_t is input into the width network model, the feature nodes and enhancement nodes corresponding to X_t are combined to obtain the full-feature nodes A_t of X_t, expressed as:
A_t = [a(x_1^t)^T, a(x_2^t)^T, …, a(x_{n_t}^t)^T]^T
where a(x_j^t) denotes the full-feature node corresponding to the j-th target-domain sample;
after each sub-model is obtained, the set of output vectors of the sub-models on the target-domain samples is obtained, expressed as:
{Ψ_t^(1), Ψ_t^(2), …, Ψ_t^(N_drop)}
where Ψ_t^(ε) denotes the predicted output vector of the ε-th sub-model on the target-domain samples;
the predicted output vectors of the sub-models are summed and averaged to obtain Ψ̄_t, expressed as:
Ψ̄_t = (1/N_drop) Σ_{ε=1}^{N_drop} Ψ_t^(ε)
where Ψ̄_t denotes the average of the sub-models' predicted output vectors for each sample;
Ψ̄_t is converted into one-hot form for output, and the pseudo labels Ŷ_t are output; at the same time, the set of output vectors of each target-domain sample in each sub-model can be obtained, expressed as:
V_j = {ψ_j^(1), ψ_j^(2), …, ψ_j^(N_drop)}
where V_j denotes the set of N_drop sub-model predicted output vectors of the j-th target-domain sample, and ψ_j^(ε) denotes the predicted output vector of the ε-th sub-model for the j-th target-domain sample;
the quality of a single sample's pseudo label is calculated as follows:
η_j = Σ_{c=1}^{C} Var({ψ_{j,c}^(ε)}_{ε=1}^{N_drop})
where η_j denotes the quality of the pseudo label obtained for the j-th target-domain sample, Var denotes the variance, and {ψ_{j,c}^(ε)} denotes the set of predictions of class c for the j-th sample by all the sub-models;
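The sub-model construction and variance-based quality score of step 5.1) can be sketched as follows (a minimal NumPy sketch; the full-feature matrix `A_t` and weights `W` are stubbed with random values, and summing the per-class variances into one score per sample is an assumption — the claim only states that smaller variance means higher quality):

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, d, C, N_drop, F = 8, 10, 3, 5, 0.2   # F: zero-setting proportion

A_t = rng.standard_normal((n_t, d))       # stubbed full-feature nodes of X_t
W = rng.standard_normal((d, C))

# N_drop shared 0-1 masks, each with a fraction F of entries set to 0
masks = np.ones((N_drop, d))
for m in masks:
    m[rng.choice(d, int(F * d), replace=False)] = 0.0

# predicted outputs of each sub-model on the target-domain samples
preds = np.stack([(A_t * m) @ W for m in masks])   # (N_drop, n_t, C)

psi_bar = preds.mean(axis=0)              # averaged prediction across sub-models
pseudo = psi_bar.argmax(axis=1)           # argmax of the one-hot output

# pseudo-label quality: per-class variance across sub-models, summed over
# classes (aggregation over classes is an assumption of this sketch)
eta = preds.var(axis=0).sum(axis=1)       # one quality score per target sample
```

Reusing the same masks across iteration rounds, as the later claims require, only means holding `masks` fixed instead of regenerating it each round.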
5.2) iteratively integrating pseudo labels to participate in the enhanced conditional distribution alignment;
a high-quality pseudo-label set Ξ is created; the pseudo labels Ŷ_t of the target-domain samples obtained in the current round are sorted according to η_j, and the high-quality part ranked in the top P% is added to the set Ξ; for the same sample selected in multiple iterations, the final class of its pseudo label is determined by majority voting, after which Ξ is updated; the target-domain samples with high-quality pseudo labels are denoted X_t^h, the corresponding high-quality pseudo labels are denoted Ŷ_t^h, n_t^h denotes the number of target-domain samples with high-quality pseudo labels, and the superscript h denotes the high-quality mark;
the samples participating in the next round of enhanced conditional distribution alignment, X_t^h and the corresponding Ŷ_t^h, can be obtained through Ξ; at the same time, X_t and the corresponding Ŷ_t still participate in the conventional conditional distribution alignment, whose loss term is kept with a lower weight: although the pseudo labels Ŷ_t of X_t are not of high quality, they still have positive significance for the width network model, especially in the early iteration stages when the number of integrated high-quality pseudo labels is small;
in this way, each iteration round updates Ŷ_t for participation in the conventional conditional distribution alignment of the next round and integrates Ŷ_t^h so that the enhanced conditional distribution alignment of the next round becomes better and better; distinguishing the two conditional distribution alignments by weight improves the alignment effect and alleviates the negative migration caused by low-quality pseudo labels, and the result is obtained after the final iteration round;
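The iterative selection and majority voting of step 5.2) can be sketched as follows (a stdlib-only sketch; the helper name `update_high_quality_set` is hypothetical, and sorting by ascending `eta` assumes smaller variance means higher quality):

```python
from collections import Counter, defaultdict

def update_high_quality_set(votes, pseudo, eta, top_p):
    """One round of step 5.2): add the top-P% best-quality pseudo labels
    to the per-sample vote history, then resolve each selected sample's
    final label by majority voting over its accumulated votes."""
    k = max(1, int(len(eta) * top_p))
    best = sorted(range(len(eta)), key=lambda j: eta[j])[:k]  # lowest variance first
    for j in best:
        votes[j].append(pseudo[j])
    # Xi: sample index -> majority-voted high-quality pseudo label
    return {j: Counter(v).most_common(1)[0][0] for j, v in votes.items()}

votes = defaultdict(list)  # vote history accumulated across iteration rounds
xi = update_high_quality_set(votes, pseudo=[0, 1, 2, 1], eta=[0.1, 0.9, 0.2, 0.5], top_p=0.5)
xi = update_high_quality_set(votes, pseudo=[0, 1, 2, 1], eta=[0.3, 0.8, 0.1, 0.6], top_p=0.5)
```

Samples 0 and 2 fall in the top 50% in both toy rounds, so they accumulate two votes each and end up in Ξ with their majority-voted labels.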
thus, according to the conditional distribution alignment principle, the enhanced conditional distribution alignment loss L_enh accumulates the mean differences of the predicted outputs between the same classes of the source-domain samples X_s and the target-domain samples X_t^h with high-quality pseudo labels; for the target-domain samples without high-quality pseudo labels, the alignment loss with the source-domain samples X_s is 0;
according to the above rule, the enhanced conditional distribution alignment loss term L_enh is expressed as:
L_enh = Σ_{c=1}^{C} Tr(W^T A^T M_c^h A W)
where the matrix M_c^h is calculated as:
(M_c^h)_{μγ} =
 1/(n_s^c)^2, if x_μ, x_γ ∈ D_s^c;
 1/(n_t^{h,c})^2, if x_μ, x_γ ∈ D_t^{h,c};
 −1/(n_s^c n_t^{h,c}), if x_μ ∈ D_s^c, x_γ ∈ D_t^{h,c} or x_μ ∈ D_t^{h,c}, x_γ ∈ D_s^c;
 0, otherwise
where D_t^{h,c} denotes the set of target-domain samples whose high-quality pseudo label is class c, x_{j*}^h denotes the j*-th target-domain sample with a high-quality pseudo label, ŷ_{j*}^h denotes its high-quality pseudo label, x is any element of D_s^c ∪ D_t^{h,c}, and n_t^{h,c} denotes the number of samples that possess high-quality labels and belong to class c.
7. The domain adaptive image classification method based on width learning and random sensitivity according to claim 6, wherein in step 6), in order to explore the potential distribution information of the samples and improve the generalization ability of the width network model in the target domain, a manifold regularization loss is introduced into the width network model into which the enhanced conditional distribution alignment loss has been introduced, whose loss term L_man is expressed as:
L_man = (1/2) Σ_{μ=1}^{r} Σ_{γ=1}^{r} ω_{μγ} ‖ψ_μ − ψ_γ‖^2
where a(x_μ) is the full-feature node of sample x_μ, a(x_γ) is the full-feature node of sample x_γ, and ω_{μγ} denotes the similarity between any two samples, calculated with cosine similarity:
ω_{μγ} = (a(x_μ)·a(x_γ)) / (‖a(x_μ)‖ ‖a(x_γ)‖), if x_γ ∈ N_τ(x_μ); 0, otherwise
where N_τ(x) denotes the τ-nearest-neighbor set of any sample x, obtained with the KNN algorithm;
the loss term L_man is rewritten as:
L_man = Tr(W^T A^T L A W)
where L = Δ − Ω denotes the Laplace matrix, Ω is the adjacency matrix formed by the ω_{μγ} generated from all samples, and Δ denotes a diagonal matrix calculated as:
Δ_{μμ} = Σ_γ ω_{μγ}
where ω_{μγ} denotes an element of Ω;
the Laplace matrix is normalized:
L̃ = Δ^{−1/2} L Δ^{−1/2}
where L̃ denotes the normalized Laplace matrix;
thus, the final form of the loss term L_man is expressed as:
L_man = Tr(W^T A^T L̃ A W)
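The graph construction and Laplacian normalization of step 6) can be sketched as follows (a minimal NumPy sketch; the full-feature matrix is stubbed with random values, and symmetrizing the kNN graph with an element-wise maximum is an assumption of this sketch):

```python
import numpy as np

def normalized_laplacian(A_feat, tau):
    """Cosine-similarity adjacency over tau nearest neighbours, then the
    symmetrically normalized Laplacian D^{-1/2} (D - Omega) D^{-1/2}."""
    n = A_feat.shape[0]
    norms = np.linalg.norm(A_feat, axis=1, keepdims=True)
    cos = (A_feat @ A_feat.T) / (norms * norms.T)
    omega = np.zeros((n, n))
    for mu in range(n):
        # tau nearest neighbours by cosine similarity (excluding self)
        nbrs = [g for g in np.argsort(-cos[mu]) if g != mu][:tau]
        omega[mu, nbrs] = cos[mu, nbrs]
    omega = np.maximum(omega, omega.T)          # symmetrize the kNN graph
    degree = omega.sum(axis=1)
    lap = np.diag(degree) - omega               # L = Delta - Omega
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.clip(degree, 1e-12, None)))
    return d_inv_sqrt @ lap @ d_inv_sqrt

rng = np.random.default_rng(1)
A_feat = rng.standard_normal((6, 4))            # stubbed full-feature nodes
L_norm = normalized_laplacian(A_feat, tau=2)
```

The `np.clip` guard keeps the normalization finite for samples whose neighbourhood weights sum to a non-positive value.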
8. The domain adaptive image classification method based on width learning and random sensitivity according to claim 7, wherein in step 7), the source-domain samples in the input space are divided into two parts, namely each source-domain sample x_i^s and the hidden samples x̃_i^s around it; the hidden samples are the set of samples within a specific positive value Q around x_i^s, i.e., the distance between x̃_i^s and x_i^s in each dimension is smaller than Q; for each source-domain sample x_i^s, a group of hidden samples satisfying the following can be found:
|Δx_κ| < Q, κ = 1, 2, …, d
where Δx_κ denotes the difference between x̃_i^s and x_i^s in any dimension κ, Δx is the difference vector between x̃_i^s and x_i^s, and Q is a user-defined value; the Q value should not be chosen too large, because samples too far from the current sample may no longer belong to the current class, and it can be chosen specifically according to neighborhood knowledge or repeated experiments; each hidden sample x̃_i^s is assumed to have the same generation probability, i.e., to obey a uniform distribution; in other words, x̃_i^s can be seen as a perturbation point around x_i^s, while Δx is the degree of random perturbation;
the aim of random sensitivity is to shrink the mean squared error between the predicted outputs of x_i^s and x̃_i^s, alleviating the over-fitting problem of the width network model while giving it better generalization ability for target-domain samples with distribution differences;
a group of perturbation points is generated for each source-domain sample, with o perturbation points per group, so the total number of perturbation points is n_s × o; the perturbation points of all source-domain samples are denoted X̃_s; the perturbation points X̃_s participate only in the calculation of the random sensitivity, not in the calculation of the other loss terms; after X̃_s is input into the width network model, the feature nodes and enhancement nodes corresponding to X̃_s are combined to obtain the full-feature nodes Ã of X̃_s, expressed as:
Ã = [a(x̃_1^(1))^T, …, a(x̃_1^(o))^T, …, a(x̃_{n_s}^(o))^T]^T
where a(x̃_i^(e)) denotes the full-feature node corresponding to the e-th perturbation point of the i-th source-domain sample;
after X_s is input into the width network model, the feature nodes and enhancement nodes corresponding to X_s are combined to obtain the full-feature nodes A_s of X_s, expressed as:
A_s = [a(x_1^s)^T, a(x_2^s)^T, …, a(x_{n_s}^s)^T]^T
where a(x_i^s) denotes the full-feature node corresponding to the i-th source-domain sample;
thus, the random sensitivity loss term L_RS is expressed as:
L_RS = (1/(n_s o)) ‖ÃW − Ψ̌_s‖^2
where ÃW denotes the predicted output vectors of the perturbation samples, and Ψ̌_s denotes the result of repeatedly stacking A_sW o times; after the stacking, ÃW and Ψ̌_s have the same dimensions, which enables a one-to-one correspondence between each source-domain sample's predicted output vector and the predicted output vectors of its perturbation points.
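The perturbation-point generation and random sensitivity loss of step 7) can be sketched as follows (a minimal NumPy sketch; the width network's feature map is stubbed with the identity for illustration, and all dimensions are toy values):

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, d, C, o, Q = 5, 4, 3, 6, 0.1   # o perturbation points per sample, radius Q

X_s = rng.standard_normal((n_s, d))

# uniform perturbations within (-Q, Q) in every dimension (the "hidden samples")
delta = rng.uniform(-Q, Q, size=(n_s, o, d))
X_pert = (X_s[:, None, :] + delta).reshape(n_s * o, d)

def full_feature(X):
    # stand-in for the width network's feature + enhancement mapping
    return X

W = rng.standard_normal((d, C))
psi_s = full_feature(X_s) @ W               # A_s W
psi_pert = full_feature(X_pert) @ W         # A~ W
psi_stacked = np.repeat(psi_s, o, axis=0)   # A_s W stacked o times (Psi-check)

# L_RS: mean squared difference between each source sample's output and
# the outputs of its o perturbation points
L_RS = np.mean(np.sum((psi_pert - psi_stacked) ** 2, axis=1))
```

`np.repeat` along axis 0 is what gives the one-to-one row correspondence between a sample's prediction and those of its perturbation points.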
9. The domain adaptive image classification method based on width learning and random sensitivity according to claim 8, wherein in step 8), the final loss function of the width network model obtained according to step 2) to step 7) is expressed as:
min_W ‖A_sW − Y_s‖^2 + λ‖W‖^2 + λ_1 L_mar + λ_2 L_con + λ_3 L_man + λ_4 L_enh + σ L_RS
where λ is the ridge regularization coefficient, λ_1 is the weight of the marginal distribution alignment loss, λ_2 is the weight of the conventional conditional distribution alignment loss, σ is the weight of the random sensitivity loss term, λ_3 is the weight of the manifold regularization loss, and λ_4 is the weight of the enhanced conditional distribution alignment loss, where λ_4 is higher than λ_2;
Obtaining the value of W according to a ridge regression algorithm:
since the ridge regression algorithm is used to solve the width network model, in one solving process at least one of ÃW and Ψ̌_s should be known; therefore, the first iteration serves as the initialization round of the random sensitivity, and no random sensitivity loss term is added in the first iteration; after A_sW is obtained, it is repeatedly stacked o times to form Ψ̌_s, which is recorded; the random sensitivity loss term is added from the beginning of the second round, where the Ψ̌_s derived in the first iteration round is used to calculate the random sensitivity; the Ψ̌_s calculated in the second iteration round is recorded and used to calculate the random sensitivity in the third round, and so on; at the same time, since the width network model generates N_drop sub-models in each iteration round, in order to reduce the influence of differing sub-model structures across iteration rounds on the random sensitivity, the N_drop masks used to generate the sub-models are shared across all iteration rounds, and the Ψ̌_s used by each sub-model for calculating the random sensitivity loss term is the one recorded by the sub-model generated with the same mask in the previous iteration round;
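The ridge-regression solve of W can be sketched as follows (a minimal NumPy sketch; for brevity only the marginal-alignment trace term is folded into the quadratic — the conditional, enhanced, and manifold terms enter the same quadratic in exactly the same way — and the helper name `solve_W` and the toy weights are illustrative assumptions):

```python
import numpy as np

def solve_W(A_s, Y_s, A_all, M0, lam=1e-2, lam1=1.0):
    """Closed-form minimizer of
    ||A_s W - Y_s||^2 + lam ||W||^2 + lam1 Tr(W^T A_all^T M0 A_all W):
    setting the gradient to zero gives a linear system in W."""
    d = A_s.shape[1]
    H = A_s.T @ A_s + lam * np.eye(d) + lam1 * (A_all.T @ M0 @ A_all)
    return np.linalg.solve(H, A_s.T @ Y_s)

rng = np.random.default_rng(3)
n_s, n_t, d, C = 6, 4, 5, 2
A_all = rng.standard_normal((n_s + n_t, d))      # stubbed full-feature nodes
A_s = A_all[:n_s]
Y_s = np.eye(C)[rng.integers(0, C, n_s)]         # one-hot source labels
e = np.concatenate([np.full(n_s, 1 / n_s), np.full(n_t, -1 / n_t)])
M0 = np.outer(e, e)                              # marginal MMD matrix
W = solve_W(A_s, Y_s, A_all, M0)
```

Because M0 is positive semidefinite and the ridge term adds lam·I, the system matrix H is positive definite, so the solve is well posed.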
after the width network model has iterated the set number of rounds, the pseudo labels Ŷ_t of the final round are used as the final classification result of the model, and the classification accuracy of the model is calculated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310274623.7A CN116452854A (en) | 2023-03-20 | 2023-03-20 | Adaptive image classification method based on width learning and random sensitivity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116452854A true CN116452854A (en) | 2023-07-18 |
Family
ID=87128003
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117253097A (en) * | 2023-11-20 | 2023-12-19 | 中国科学技术大学 | Semi-supervision domain adaptive image classification method, system, equipment and storage medium |
CN117253097B (en) * | 2023-11-20 | 2024-02-23 | 中国科学技术大学 | Semi-supervision domain adaptive image classification method, system, equipment and storage medium |
CN118470548A (en) * | 2024-07-12 | 2024-08-09 | 湖南大学 | Heterogeneous image change detection method based on width learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||