CN116452854A - Adaptive image classification method based on width learning and random sensitivity
- Publication number: CN116452854A
- Application number: CN202310274623.7A
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06V10/764—Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06N3/04—Neural networks; architecture, e.g. interconnection topology
- G06N3/08—Neural networks; learning methods
- G06V10/762—Image or video recognition or understanding using pattern recognition or machine learning, using clustering, e.g. of similar faces in social networks
- G06V10/774—Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- Y02T10/40—Engine management systems
Abstract
The invention discloses a domain-adaptive image classification method based on width learning and random sensitivity, comprising the following steps: 1) construct the input to the width model; 2) construct the width network model; 3) introduce an edge (marginal) distribution alignment loss into the model; 4) introduce a finer-grained, conventional conditional distribution alignment loss into the model; 5) iteratively select high-quality pseudo labels, construct an enhanced conditional distribution alignment loss, and introduce it into the model; 6) introduce manifold regularization into the model to explore the latent distribution information of the samples; 7) introduce random sensitivity into the model to counter overfitting on the source domain samples; 8) solve for the model connection weights and the classification result. The invention enables the model to perform well on target domains with different distributions. To a certain extent, it alleviates the heavy computational cost of deep domain adaptation methods, and further enables more flexible and accurate downstream applications.
Description
Technical Field
The invention relates to the technical field of domain adaptation, and in particular to a domain-adaptive image classification method based on width learning and random sensitivity.
Background
In machine learning it is generally assumed that the training set and the test set are independently drawn from the same distribution. In real life, however, the samples used to train and test an image classifier often come from different sources, and identical distributions cannot be guaranteed; a model that performs well on the training set therefore tends to perform poorly on the test set. To overcome this problem, the domain adaptation setting in transfer learning has received wide attention. It is defined as follows: the source domain samples and target domain samples share the same feature space, but their edge (marginal) distributions and conditional distributions differ, while their labels belong to the same label space, i.e. the same set of categories.
Furthermore, acquiring a labeled training set is expensive for image classification tasks, so it is desirable to learn enough knowledge from a richly labeled source domain to guide the classification of differently distributed, unlabeled target domain samples. Unsupervised domain adaptation has therefore become an important problem in current research, and it is the focus of this method. "Unsupervised" means that the target domain samples participate in training but their true labels are unavailable. Using source domain knowledge and the distribution information of the available target domain samples, a prediction function that minimizes the classification error on the target domain is trained, so that the image classification task is completed better.
Existing unsupervised domain-adaptive image classification methods aim to minimize the distribution discrepancy between the source and target domains, so that source domain knowledge can be applied better to the target domain. With the continued progress of computer technology, two families of methods have developed: 1. traditional domain-adaptive image classification methods, which mainly learn a shared feature space in which the distribution mismatch between source and target domain samples is reduced, e.g. by mapping features into a reproducing kernel Hilbert space; 2. deep-learning-based domain-adaptive image classification methods, which directly minimize the maximum mean discrepancy between the source and target domains, or learn domain-invariant features through adversarial learning. The deep-learning-based methods greatly improve accuracy, but because deep models have many parameters and consume substantial computational resources, there is still room for improvement in model training and deployment.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a domain-adaptive image classification method based on width learning and random sensitivity. The method uses the special network structure of the width network model to perform domain-adaptive image classification, introduces several distribution alignment losses together with random sensitivity, alleviates negative transfer during domain adaptation, and improves the generalization ability of the width network model. Compared with a deep network it has fewer parameters and requires less computation and training time, enabling more flexible and accurate downstream applications.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a domain-adaptive image classification method based on width learning and random sensitivity, comprising the following steps:
1) Completing the construction of input data by using a source domain sample, a source domain sample label and a target domain sample;
2) Completing the construction of a feature mapping layer, an enhancement layer and an output layer of the width network model;
3) Introducing an edge (marginal) distribution alignment loss into the width network model, alleviating the loss of generalization ability caused by the differing edge distributions of the source and target domain samples;
4) Obtaining target domain pseudo labels with an SVM classifier and introducing a finer-grained alignment between categories, i.e. the conventional conditional distribution alignment loss, into the width network model into which the edge distribution alignment loss has been introduced, further improving performance on the target domain;
5) Considering that the pseudo labels obtained by the SVM are not all of high quality, which weakens the effect of conditional distribution alignment, selecting high-quality pseudo labels by iteration, using them to construct an enhanced conditional distribution alignment loss, and introducing it into the width network model into which the conventional conditional distribution alignment loss has been introduced, alleviating the negative transfer caused by pseudo label quality;
6) After distribution alignment is completed, in order to explore the latent distribution information of the samples and smooth the classification boundary, introducing a manifold regularization loss into the width network model into which the enhanced conditional distribution alignment loss has been introduced, so that the model learns the sample distribution better and classification accuracy improves;
7) Steps 2) to 6) improve the performance of the width network model from the viewpoint of sample distribution; from the viewpoint of overfitting, in order to solve the problem that the width network model overfits the source domain samples because the true target domain labels are unavailable, introducing random sensitivity into the width network model into which the manifold regularization loss has been introduced, improving its generalization ability;
8) And (3) solving the connection weight of the width network model by using a ridge regression algorithm according to each loss in the steps 2) to 7), and solving a classification result.
Further, in step 1), the input data comprise the source domain samples, the target domain samples, and the source domain sample labels, wherein the true labels of the target domain samples are unavailable;
The source domain sample set $X_s$ is expressed as:

$$X_s = [x^s_1; x^s_2; \dots; x^s_{n_s}] \in \mathbb{R}^{n_s \times d}$$

where $\mathbb{R}$ denotes the real numbers, $x^s_i \in \mathbb{R}^{1 \times d}$ denotes the i-th source domain sample, each source domain sample has dimension 1×d, d is the feature dimension of a sample, and there are $n_s$ source domain samples in total;
The target domain sample set $X_t$ is expressed as:

$$X_t = [x^t_1; x^t_2; \dots; x^t_{n_t}] \in \mathbb{R}^{n_t \times d}$$

where $x^t_i$ denotes the i-th target domain sample, each target domain sample has dimension 1×d, and there are $n_t$ target domain samples in total;
The set of labels corresponding to the source domain samples is the source domain label set $Y_s$, expressed as:

$$Y_s = [y^s_1; y^s_2; \dots; y^s_{n_s}], \qquad y^s_i \in \{1, 2, \dots, C\}$$

where C is the total number of sample categories, $y^s_i$ is the class label corresponding to the i-th source domain sample, and there are $n_s$ labels in total;
The sample matrix X input into the width network model is expressed as:

$$X = [X_s; X_t] \in \mathbb{R}^{(n_s + n_t) \times d}$$
The label matrix Y input into the width network model is expressed as:

$$Y = [Y_s; \mathrm{Zeros}]$$

where $\mathrm{Zeros} \in \mathbb{R}^{n_t \times C}$ is an all-zero matrix standing in for the unavailable target domain labels (with $Y_s$ in one-hot form, $Y \in \mathbb{R}^{(n_s + n_t) \times C}$), and Y denotes all input labels.
Further, in step 2), the width network model is divided into three layers:
The first layer is the feature mapping layer, which converts the input samples into feature maps through random weights, biases, and linear activation functions, and stores them in the feature nodes;
The second layer is the enhancement layer, which maps the feature nodes again through random weights and biases, and stores the result in the enhancement nodes;
The third layer is the output layer; the feature nodes and enhancement nodes are combined into the full-feature nodes, which are directly connected to the output layer, and finally the connection weights of the width network model are solved with a ridge regression algorithm;
The construction process of the width network model is as follows:
Inputting X into the width network model, the i'-th group of feature nodes $Z_{i'}$, containing k feature nodes, is expressed as:

$$Z_{i'} = \phi_{i'}\big(X W_{e_{i'}} + \beta_{e_{i'}}\big), \qquad i' = 1, \dots, n$$

where $W_{e_{i'}}$ and $\beta_{e_{i'}}$ are the randomly generated weights and bias of the i'-th group of feature-layer nodes, n is the number of feature-node groups, and $\phi_{i'}$ is the linear activation function of the i'-th group of feature-layer nodes. The combination of all feature nodes, $Z^n$, is expressed as:
$$Z^n = [Z_1, Z_2, \dots, Z_n] \in \mathbb{R}^{(n_s + n_t) \times (n \times k)}$$

where n×k is the total number of feature nodes;
The j'-th group of enhancement nodes $H_{j'}$ is generated by mapping the feature nodes, expressed as:

$$H_{j'} = \delta_{j'}\big(Z^n W_{h_{j'}} + \beta_{h_{j'}}\big), \qquad j' = 1, \dots, m$$

where $W_{h_{j'}}$ and $\beta_{h_{j'}}$ are the randomly generated weights and bias of the j'-th group, m is the total number of enhancement nodes, and $\delta_{j'}$ is the nonlinear activation function of the j'-th group of enhancement nodes. All enhancement nodes $H^m$ are expressed as:
$$H^m = [H_1, H_2, \dots, H_m] \in \mathbb{R}^{(n_s + n_t) \times m}$$
Combining all feature nodes and enhancement nodes gives the full-feature node matrix A, expressed as:

$$A = [Z^n \mid H^m] \in \mathbb{R}^{(n_s + n_t) \times (n \times k + m)}$$

where n×k+m is the total number of feature nodes and enhancement nodes;
The width network model converts the input labels into one-hot form and solves the connection weights by minimizing the sum of prediction errors; after introducing an $L_2$ regularization term, the loss function is expressed as:

$$\min_{W} \; \|AW - Y\|_2^2 + \alpha \|W\|_2^2$$

where α is the penalty coefficient of the $L_2$ regularization term and W is the connection weight matrix of the width network model; its closed-form solution is

$$W = \big(A^{\mathsf T} A + \alpha I\big)^{-1} A^{\mathsf T} Y$$

where I is the identity matrix.
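As a concrete illustration, the three-layer construction and the ridge solution above can be sketched in NumPy; the group counts, node sizes, tanh activation, and α value here are illustrative assumptions, not the patented configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def bls_full_feature(X, n_groups=3, k=8, m=20):
    """Map input X to the full-feature node matrix A = [Z^n | H^m]."""
    d = X.shape[1]
    Z_groups = []
    for _ in range(n_groups):                          # feature-mapping layer
        We, be = rng.normal(size=(d, k)), rng.normal(size=k)
        Z_groups.append(X @ We + be)                   # linear activation phi
    Zn = np.hstack(Z_groups)                           # shape (N, n_groups*k)
    Wh, bh = rng.normal(size=(Zn.shape[1], m)), rng.normal(size=m)
    Hm = np.tanh(Zn @ Wh + bh)                         # enhancement layer, nonlinear delta
    return np.hstack([Zn, Hm])                         # shape (N, n_groups*k + m)

def ridge_weights(A, Y, alpha=0.1):
    """Closed-form connection weights W = (A^T A + alpha*I)^(-1) A^T Y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ Y)

X = rng.normal(size=(10, 5))
Y = np.eye(3)[rng.integers(0, 3, size=10)]             # one-hot labels, C = 3
A = bls_full_feature(X)
W = ridge_weights(A, Y)
print(A.shape, W.shape)
```

With three groups of eight feature nodes and twenty enhancement nodes, A has 44 columns and W maps them to the C = 3 output classes.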
Further, in step 3), in order to minimize the difference between the edge distributions of the source and target domain samples, edge distribution alignment is introduced into the constructed width network model, using the difference between the means of the predicted outputs of $X_s$ and $X_t$ as the measure of the edge distribution discrepancy;
Inputting X into the width network model and combining the feature nodes and enhancement nodes yields the full-feature node matrix A of X, expressed as:

$$A = [a(x_1), a(x_2), \dots, a(x_r)]^{\mathsf T}$$

where $^{\mathsf T}$ denotes the transpose of a matrix, r is the total number of source and target domain samples, i.e. $n_s + n_t$, and $a(x_r)$ is the full-feature node corresponding to the r-th sample;
The predicted output $\psi_r$ of the r-th sample is expressed as:

$$\psi_r = a(x_r) W$$
Introducing edge distribution alignment into the width network model, its loss term $L_{MDA}$ is expressed as:

$$L_{MDA} = \Big\| \frac{1}{n_s} \sum_{x_\mu \in \mathcal{D}_s} a(x_\mu) W - \frac{1}{n_t} \sum_{x_\gamma \in \mathcal{D}_t} a(x_\gamma) W \Big\|_2^2$$

This loss term can be rewritten as:

$$L_{MDA} = \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} M_0 A W\big)$$

where Tr denotes the trace of a matrix, and the entries of $M_0$ are computed as:

$$(M_0)_{\mu\gamma} = \begin{cases} \dfrac{1}{n_s^2}, & x_\mu, x_\gamma \in \mathcal{D}_s \\[4pt] \dfrac{1}{n_t^2}, & x_\mu, x_\gamma \in \mathcal{D}_t \\[4pt] -\dfrac{1}{n_s n_t}, & \text{otherwise} \end{cases}$$

where $\mathcal{D}_s$ denotes the source domain set, $\mathcal{D}_t$ denotes the target domain set whose true labels are unavailable, and $x_\mu$ and $x_\gamma$ denote any two samples.
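The block-constant structure of $M_0$ makes the trace rewriting easy to verify numerically; the following sketch (random data, illustrative sizes) checks that the quadratic form equals the plain mean-difference form:

```python
import numpy as np

def edge_mmd_matrix(ns, nt):
    """M0 for edge (marginal) distribution alignment."""
    M0 = np.empty((ns + nt, ns + nt))
    M0[:ns, :ns] = 1.0 / ns**2                       # both samples from the source domain
    M0[ns:, ns:] = 1.0 / nt**2                       # both samples from the target domain
    M0[:ns, ns:] = M0[ns:, :ns] = -1.0 / (ns * nt)   # one sample from each domain
    return M0

rng = np.random.default_rng(1)
ns, nt, p, C = 6, 4, 7, 3
A = rng.normal(size=(ns + nt, p))                    # full-feature nodes of all samples
W = rng.normal(size=(p, C))                          # connection weights
psi = A @ W                                          # predicted outputs
direct = np.sum((psi[:ns].mean(axis=0) - psi[ns:].mean(axis=0)) ** 2)
quad = np.trace(W.T @ A.T @ edge_mmd_matrix(ns, nt) @ A @ W)
print(np.isclose(direct, quad))
```

Because $M_0$ is the outer product of the vector holding $1/n_s$ on source positions and $-1/n_t$ on target positions, the two forms agree for any A and W.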
Further, in step 4), in order to improve the conditional distribution alignment effect, the finer-grained conventional conditional distribution alignment is introduced into the width network model into which the edge distribution alignment loss has been introduced; an SVM classifier is first used to obtain the target domain pseudo label set for computing the conventional conditional distribution alignment loss;
The set of pseudo labels corresponding to the target domain samples is the target domain pseudo label set $\hat{Y}_t$, expressed as:

$$\hat{Y}_t = [\hat{y}^t_1; \hat{y}^t_2; \dots; \hat{y}^t_{n_t}], \qquad \hat{y}^t_j \in \{1, 2, \dots, C\}$$

where $\hat{y}^t_j$ is the pseudo class label corresponding to the j-th target domain sample, and there are $n_t$ labels in total;
Using the difference between the mean predicted outputs of source and target domain samples belonging to the same class as the measure of the conditional distribution discrepancy, the conventional conditional distribution alignment loss term $L_{CDA}$ is expressed as:

$$L_{CDA} = \sum_{c=1}^{C} \Big\| \frac{1}{n_s^{(c)}} \sum_{x_\mu \in \mathcal{D}_s^{(c)}} a(x_\mu) W - \frac{1}{n_t^{(c)}} \sum_{x_\gamma \in \mathcal{D}_t^{(c)}} a(x_\gamma) W \Big\|_2^2$$

where $\mathcal{D}_s^{(c)}$ is the set of source domain samples whose true label is class c, $\mathcal{D}_t^{(c)}$ is the set of target domain samples whose pseudo label is class c, $n_s^{(c)}$ is the number of source domain samples belonging to class c, $n_t^{(c)}$ is the number of target domain samples belonging to class c, and $\hat{y}^t_j$ is the pseudo label of the j-th target domain sample $x^t_j$;
This loss term can be rewritten as:

$$L_{CDA} = \mathrm{Tr}\Big(W^{\mathsf T} A^{\mathsf T} \Big(\sum_{c=1}^{C} M_c\Big) A W\Big)$$

where the entries of $M_c$ are computed as:

$$(M_c)_{\mu\gamma} = \begin{cases} \dfrac{1}{(n_s^{(c)})^2}, & x_\mu, x_\gamma \in \mathcal{D}_s^{(c)} \\[4pt] \dfrac{1}{(n_t^{(c)})^2}, & x_\mu, x_\gamma \in \mathcal{D}_t^{(c)} \\[4pt] -\dfrac{1}{n_s^{(c)} n_t^{(c)}}, & x_\mu \in \mathcal{D}_s^{(c)}, x_\gamma \in \mathcal{D}_t^{(c)} \text{ or vice versa} \\[4pt] 0, & \text{otherwise} \end{cases}$$
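The per-class matrices $M_c$ follow the same block pattern as $M_0$, restricted to the samples of one class; a sketch with hypothetical label vectors:

```python
import numpy as np

def cond_mmd_matrix(labels_s, pseudo_t, c):
    """M_c for class c: block structure of M0, restricted to class-c samples."""
    ns, nt = len(labels_s), len(pseudo_t)
    idx_s = np.where(labels_s == c)[0]               # source samples truly in class c
    idx_t = ns + np.where(pseudo_t == c)[0]          # target samples pseudo-labelled c
    Mc = np.zeros((ns + nt, ns + nt))
    if len(idx_s) == 0 or len(idx_t) == 0:
        return Mc                                    # class absent: no alignment term
    Mc[np.ix_(idx_s, idx_s)] = 1.0 / len(idx_s)**2
    Mc[np.ix_(idx_t, idx_t)] = 1.0 / len(idx_t)**2
    Mc[np.ix_(idx_s, idx_t)] = Mc[np.ix_(idx_t, idx_s)] = -1.0 / (len(idx_s) * len(idx_t))
    return Mc

labels_s = np.array([0, 0, 1, 1, 2])                 # hypothetical source labels
pseudo_t = np.array([0, 1, 1, 2])                    # hypothetical target pseudo labels
M1 = cond_mmd_matrix(labels_s, pseudo_t, 1)
print(M1.shape)
```

As with $M_0$, the entries of each $M_c$ sum to zero, so a constant shift of the predicted outputs does not change the alignment loss.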
Further, in step 5), considering that the pseudo labels obtained by the SVM classifier are not all of high quality, which weakens the effect of conditional distribution alignment, high-quality pseudo labels are selected by iteration, used to construct the enhanced conditional distribution alignment loss, and introduced into the width network model into which the conventional conditional distribution alignment loss has been introduced, alleviating the negative transfer caused by pseudo label quality. The method comprises the following steps:
5.1) Selecting high-confidence pseudo labels through the sub-models:
Set the number of sub-models $N_{drop}$ and the dropout proportion F%; then randomly generate $N_{drop}$ 0-1 masks with F% of the entries set to 0, and multiply A element-wise by the different masks to obtain $N_{drop}$ sub-models. The mean of the sub-models' predicted outputs on $X_t$ is taken as the final predicted output, and the variance is taken as the quality measure of the corresponding pseudo label: the smaller the variance, the higher the quality;
After $X_t$ is input into the width network model, the feature nodes and enhancement nodes corresponding to $X_t$ are combined to obtain the full-feature node matrix $A_t$ of $X_t$, expressed as:

$$A_t = [a(x^t_1), a(x^t_2), \dots, a(x^t_{n_t})]^{\mathsf T}$$

where $a(x^t_j)$ is the full-feature node corresponding to the j-th target domain sample;
After each sub-model is obtained, its output vectors for the target domain samples, $\hat{Y}^{(\varepsilon)}_t$, are expressed as:

$$\hat{Y}^{(\varepsilon)}_t = (A_t \odot \mathrm{Mask}_\varepsilon) W, \qquad \varepsilon = 1, \dots, N_{drop}$$

where $\hat{Y}^{(\varepsilon)}_t$ is the predicted output of the ε-th sub-model on the target domain samples and $\mathrm{Mask}_\varepsilon$ is its 0-1 mask;
The predicted output vectors of the sub-models are summed and averaged to obtain $\bar{Y}_t$, expressed as:

$$\bar{Y}_t = \frac{1}{N_{drop}} \sum_{\varepsilon=1}^{N_{drop}} \hat{Y}^{(\varepsilon)}_t$$

where $\bar{Y}_t$ is the mean of the sub-models' predicted output vectors for each sample;
$\bar{Y}_t$ is converted into one-hot form to output the pseudo labels $\hat{Y}_t$; at the same time, the set of output vectors of each target domain sample across the sub-models, $\Psi_j$, is obtained, expressed as:

$$\Psi_j = \big\{\hat{y}^{(1)}_j, \hat{y}^{(2)}_j, \dots, \hat{y}^{(N_{drop})}_j\big\}$$

where $\Psi_j$ is the set of $N_{drop}$ predicted output vectors of the j-th target domain sample, and $\hat{y}^{(\varepsilon)}_j$ is the predicted output vector of the ε-th sub-model for the j-th target domain sample;
The quality of a single sample's pseudo label is calculated as follows:

$$\eta_j = \sum_{c=1}^{C} \mathrm{Var}\big(\{\hat{y}^{(\varepsilon)}_{j,c}\}_{\varepsilon=1}^{N_{drop}}\big)$$

where $\eta_j$ is the quality score of the pseudo label of the j-th target domain sample (the smaller, the better), Var denotes the variance, and $\{\hat{y}^{(\varepsilon)}_{j,c}\}$ is the set of predictions of all sub-models for class c of sample j;
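Step 5.1) can be sketched as follows; the number of sub-models, the drop rate F, and all sizes are illustrative assumptions, and in the full method the masks would be shared across iteration rounds as described in step 8):

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, p, C, N_drop, F = 8, 12, 3, 5, 0.2

At = rng.normal(size=(n_t, p))                         # full-feature nodes A_t of X_t
W = rng.normal(size=(p, C))                            # connection weights
masks = (rng.random((N_drop, p)) >= F).astype(float)   # ~F of the nodes zeroed per mask

outs = np.stack([(At * mask) @ W for mask in masks])   # (N_drop, n_t, C) sub-model outputs
mean_out = outs.mean(axis=0)                           # averaged prediction
pseudo = mean_out.argmax(axis=1)                       # pseudo label per target sample
eta = outs.var(axis=0).sum(axis=1)                     # quality score: lower variance is better

P = 25                                                 # keep the top-P% most stable labels
top = np.argsort(eta)[: int(np.ceil(P / 100 * n_t))]
print(pseudo.shape, eta.shape, len(top))
```

Samples whose prediction barely changes when feature nodes are dropped receive low η and are the first candidates for the high-quality set ξ.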
5.2) Iteratively integrating pseudo labels to participate in enhanced conditional distribution alignment:
Create a set ξ of high-quality pseudo labels. In each round, sort the pseudo labels obtained for the target domain samples by $\eta_j$ from high quality to low, and add the top P% into ξ; when the same sample receives pseudo labels in multiple iterations, determine its final class by majority voting, then update ξ. The target domain samples that own high-quality pseudo labels are denoted $X^*_t$, the corresponding high-quality pseudo labels are denoted $\hat{Y}^*_t$, and $n^*_t$ is the number of target domain samples with high-quality pseudo labels; the superscript * marks high quality;
The samples participating in the next round of enhanced conditional distribution alignment, $X^*_t$ with the corresponding $\hat{Y}^*_t$, are obtained from ξ, while $X_t$ with the corresponding $\hat{Y}_t$ still participates in the conventional conditional distribution alignment. The loss term of the conventional conditional distribution alignment is kept at a lower weight, because although the pseudo labels $\hat{Y}_t$ of $X_t$ are not of uniformly high quality, they still have positive significance for the width network model, especially in the early iterations when the number of labels integrated into ξ is small;
In this way, each iteration round updates $\hat{Y}_t$ for the next round's conventional conditional distribution alignment and integrates ξ, so that the next round's enhanced conditional distribution alignment works better and better. Distinguishing the two alignments by weight improves the conditional distribution alignment effect and alleviates the negative transfer caused by low-quality pseudo labels; the result is obtained after the preset number of iteration rounds;
Thus, according to the conditional distribution alignment principle, the enhanced conditional distribution alignment loss accumulates the differences between the mean predicted outputs of the same classes of the source domain samples $X_s$ and the target domain samples with high-quality pseudo labels $X^*_t$; for target domain samples without high-quality pseudo labels, the alignment loss with respect to $X_s$ is 0;
According to the rules above, the enhanced conditional distribution alignment loss term $L_{ECDA}$ is expressed as:

$$L_{ECDA} = \mathrm{Tr}\Big(W^{\mathsf T} A^{\mathsf T} \Big(\sum_{c=1}^{C} M^*_c\Big) A W\Big)$$

where the entries of $M^*_c$ are computed as:

$$(M^*_c)_{\mu\gamma} = \begin{cases} \dfrac{1}{(n_s^{(c)})^2}, & x_\mu, x_\gamma \in \mathcal{D}_s^{(c)} \\[4pt] \dfrac{1}{(n_t^{*(c)})^2}, & x_\mu, x_\gamma \in \mathcal{D}_t^{*(c)} \\[4pt] -\dfrac{1}{n_s^{(c)} n_t^{*(c)}}, & x_\mu \in \mathcal{D}_s^{(c)}, x_\gamma \in \mathcal{D}_t^{*(c)} \text{ or vice versa} \\[4pt] 0, & \text{otherwise} \end{cases}$$

where $\mathcal{D}_t^{*(c)}$ is the set of target domain samples whose high-quality pseudo label is class c, $x^{t*}_{j^*}$ is the $j^*$-th target domain sample with a high-quality pseudo label, $\hat{y}^{t*}_{j^*}$ is its high-quality pseudo label, and $n_t^{*(c)}$ is the number of samples that possess high-quality labels and belong to class c.
Further, in step 6), to explore the latent distribution information of the samples and improve the generalization ability of the width network model on the target domain, a manifold regularization loss is introduced into the width network model into which the enhanced conditional distribution alignment loss has been introduced. Its loss term $L_{MR}$ is expressed as:

$$L_{MR} = \sum_{\mu,\gamma} \omega_{\mu\gamma} \big\| a(x_\mu) W - a(x_\gamma) W \big\|_2^2$$

where $a(x_\mu)$ is the full-feature node of sample $x_\mu$, $a(x_\gamma)$ is the full-feature node of sample $x_\gamma$, and $\omega_{\mu\gamma}$ is the similarity between any two samples, computed with the cosine similarity:

$$\omega_{\mu\gamma} = \begin{cases} \dfrac{x_\mu x_\gamma^{\mathsf T}}{\|x_\mu\| \, \|x_\gamma\|}, & x_\gamma \in N_\tau(x_\mu) \\[4pt] 0, & \text{otherwise} \end{cases}$$

where $N_\tau(x)$ is the set of τ nearest neighbours of any sample x, obtained with the KNN algorithm;
The loss term is rewritten as:

$$L_{MR} = \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} L A W\big), \qquad L = \Delta - \Omega$$

where L is the Laplacian matrix, Ω is the adjacency matrix formed by the $\omega_{\mu\gamma}$ of all samples, and Δ is the diagonal degree matrix, computed as:

$$\Delta_{\mu\mu} = \sum_{\gamma} \omega_{\mu\gamma}$$

where $\omega_{\mu\gamma}$ are the entries of Ω;
The Laplacian matrix is normalized:

$$\tilde{L} = \Delta^{-1/2} L \, \Delta^{-1/2}$$

where $\tilde{L}$ is the normalized Laplacian matrix;
Thus, the loss term in its final form is expressed as:

$$L_{MR} = \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} \tilde{L} A W\big)$$
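The graph construction of step 6) can be sketched as follows; τ and the toy data are illustrative, and non-negative data is used so that all degrees stay positive and the normalization is well defined:

```python
import numpy as np

def knn_cosine_graph(X, tau=2):
    """Adjacency of cosine similarities, restricted to each sample's tau neighbours."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)               # exclude a sample from its own neighbours
    omega = np.zeros_like(sim)
    for mu in range(len(X)):
        nbrs = np.argsort(sim[mu])[-tau:]        # tau most similar samples to x_mu
        omega[mu, nbrs] = sim[mu, nbrs]
    return np.maximum(omega, omega.T)            # symmetrize the adjacency

X = np.random.default_rng(3).random((6, 4))      # toy data in [0, 1)
Omega = knn_cosine_graph(X)
Delta = np.diag(Omega.sum(axis=1))               # diagonal degree matrix
L = Delta - Omega                                # graph Laplacian
D_inv_sqrt = np.diag(Omega.sum(axis=1) ** -0.5)
L_norm = D_inv_sqrt @ L @ D_inv_sqrt             # normalized Laplacian
print(np.allclose(L.sum(axis=1), 0.0))
```

Each row of the unnormalized Laplacian sums to zero, which is what makes the quadratic form penalize only differences between neighbouring predictions.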
further, in step 7), the source domain samples in the input space are divided into two parts, i.e. each source domain sampleAnd the surrounding hidden samples-> Is indicated at->A set of samples within a specific positive value Q around, i.e. in each dimension +.>And->The distance between each is smaller than Q, for each source field sample +.>Can find a set of concealment samples meeting the following requirements +.>
Wherein Deltax is κ Representation ofAnd->Difference vector of any dimension between them, deltax is +.>And->The difference vector between the two is Q which is a self-defined value, and the Q value is not selected too much because the sample which is too far away from the current sample and possibly not already in the current category can be selected specifically according to neighborhood knowledge or repeated experiments, and each hidden sample is supposed to be->All have the same probability of generation, i.e. obey a uniform distribution, in other words can be seen as +.>Is->Peripheral disturbance points, while Δx is the degree of random disturbance;
Random sensitivity aims to shrink the mean squared error between the predicted outputs of $x^s_i$ and of $S(x^s_i)$, alleviating the overfitting of the width network model while giving it better generalization ability for target domain samples with distribution differences;
A group of perturbation points is generated for each source domain sample, with o perturbation points per group, so the total number of perturbation points is $n_s \times o$; the perturbation points of all source domain samples are denoted $X_p$. The perturbation points participate only in the random sensitivity calculation, not in the calculation of the other loss terms. After $X_p$ is input into the width network model, the corresponding feature nodes and enhancement nodes are combined to obtain the full-feature node matrix U of $X_p$, expressed as:

$$U = \big[a(x^p_{1,1}), \dots, a(x^p_{1,o}), \dots, a(x^p_{n_s,1}), \dots, a(x^p_{n_s,o})\big]^{\mathsf T}$$

where $a(x^p_{i,e})$ is the full-feature node corresponding to the e-th perturbation point of the i-th source domain sample;
After $X_s$ is input into the width network model, the feature nodes and enhancement nodes corresponding to $X_s$ are combined to obtain the full-feature node matrix S of $X_s$, expressed as:

$$S = [a(x^s_1), a(x^s_2), \dots, a(x^s_{n_s})]^{\mathsf T}$$

where $a(x^s_i)$ is the full-feature node corresponding to the i-th source domain sample;
Thus, the random sensitivity loss term $L_{SS}$ is expressed as:

$$L_{SS} = \frac{1}{n_s \, o} \big\| U W - Y_{srpt} \big\|_2^2$$

where UW is the predicted output of the perturbation samples and $Y_{srpt}$ is the result of repeatedly stacking SW o times, so that each source domain sample corresponds one-to-one with the predicted outputs of its perturbation points.
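The perturbation construction and the loss term can be sketched as follows; Q, o, the stand-in feature map, and the weights are illustrative assumptions rather than the patented model:

```python
import numpy as np

rng = np.random.default_rng(4)
n_s, d, o, Q = 5, 3, 4, 0.1

Xs = rng.normal(size=(n_s, d))                    # source domain samples
# o perturbation points per sample, each offset uniform in [-Q, Q] per dimension
Xp = (Xs[:, None, :] + rng.uniform(-Q, Q, size=(n_s, o, d))).reshape(n_s * o, d)

P = rng.normal(size=(d, 6))                       # stand-in for the full-feature map a(.)
full_feature = lambda X: np.tanh(X @ P)
S, U = full_feature(Xs), full_feature(Xp)         # full-feature nodes of X_s and X_p

W = rng.normal(size=(6, 2))                       # connection weights (illustrative)
Y_srpt = np.repeat(S @ W, o, axis=0)              # each source output stacked o times
L_ss = np.sum((U @ W - Y_srpt) ** 2) / (n_s * o)  # random sensitivity loss term
print(U.shape, Y_srpt.shape)
```

`np.repeat` with `axis=0` duplicates each row of SW o consecutive times, matching the sample-major ordering of the perturbation points, so each perturbation output is compared to its own source sample's output.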
Further, in step 8), the final loss function of the width network model obtained according to steps 2) to 7) is expressed as:

$$\min_W \; \|AW - Y\|_2^2 + \alpha \|W\|_2^2 + \lambda_1 \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} M_0 A W\big) + \lambda_2 \mathrm{Tr}\Big(W^{\mathsf T} A^{\mathsf T} \sum_{c} M_c A W\Big) + \lambda_4 \mathrm{Tr}\Big(W^{\mathsf T} A^{\mathsf T} \sum_{c} M^*_c A W\Big) + \lambda_3 \mathrm{Tr}\big(W^{\mathsf T} A^{\mathsf T} \tilde{L} A W\big) + \sigma \|U W - Y_{srpt}\|_2^2$$

where $\lambda_1$ is the weight of the edge distribution alignment loss, $\lambda_2$ is the weight of the conventional conditional distribution alignment loss, σ is the weight of the random sensitivity loss term, $\lambda_3$ is the weight of the manifold regularization loss, and $\lambda_4$ is the weight of the enhanced conditional distribution alignment loss, with $\lambda_4 > \lambda_2$;
Obtaining the value of W according to a ridge regression algorithm:
Since a ridge regression algorithm is used to solve the width network model, at least one of UW and Y_srpt must be known in a single solving pass; the first iteration is therefore used as an initialization round for random sensitivity: no random sensitivity loss term is added in the first iteration, and SW is repeatedly stacked o times to form Y_srpt, which is recorded. The random sensitivity loss term is added from the beginning of the second round, where the Y_srpt derived in the first iteration is used to calculate the random sensitivity; the Y_srpt calculated in the second iteration is recorded and used to calculate the random sensitivity of the third iteration, and so on. At the same time, since the width network model generates N_drop sub-models in each iteration, in order to reduce the influence of sub-model structures differing between iteration rounds on the random sensitivity, the N_drop masks used to generate the sub-models are shared across the iteration rounds, and the Y_srpt used by each sub-model in calculating the random sensitivity loss term is the one recorded by the sub-model generated with the same Mask in the previous iteration;
After the width network model has iterated the set number of rounds, the prediction of the final round is taken as the final classification result of the model, and the classification accuracy of the model is calculated.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Compared with deep domain adaptation methods, the method greatly shortens training time and requires fewer computing resources.
2. Compared with other unsupervised domain-adaptive image classification methods, the method alleviates negative migration and the overfitting of the model on source domain data by further introducing enhanced distribution alignment and random sensitivity, improving the accuracy of the domain adaptation method.
3. The method is widely applicable to computer vision classification tasks, is simple to operate, has strong adaptability and has broad application prospects.
In summary, the invention improves the generalization capability of the width network model through distribution alignment, manifold regularization and random sensitivity, so that the width network model can achieve better results on target domains with different distributions. The method solves, to a certain extent, the problems of long training time and large consumption of computing resources common to deep domain adaptation methods, and can further enable more flexible and accurate downstream applications.
Drawings
FIG. 1 is a schematic diagram of a logic flow of the present invention.
Fig. 2 is a schematic diagram of the present invention.
Fig. 3 is a block diagram of a breadth-network model.
Fig. 4 is a negative migration schematic.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
As shown in fig. 1 and 2, the present embodiment discloses a field-adaptive image classification method based on width learning and random sensitivity, which includes the steps of:
1) The Office-31 data set is adopted, with the Amazon domain as the source domain and the Webcam domain as the target domain; the AlexNet-FC_7 features, fine-tuned on the source domain, are used as the sample features. The input of the method comprises source domain samples, target domain samples and the labels of the source domain samples; the real labels of the target domain samples are not available.
2) The input data includes a source domain sample, a target domain sample, and a source domain sample tag, wherein a real tag of the target domain sample is not available.
The source domain sample set X_s is expressed as:
X_s = [x_1^s; x_2^s; ...; x_{n_s}^s]
where ℝ denotes the set of real numbers, x_i^s ∈ ℝ^{1×d} denotes the i-th source domain sample, d denotes the feature dimension of a sample, and there are n_s source domain samples in total;
The target domain sample set X_t is expressed as:
X_t = [x_1^t; x_2^t; ...; x_{n_t}^t]
where x_i^t ∈ ℝ^{1×d} denotes the i-th target domain sample, and there are n_t target domain samples in total;
The set of labels corresponding to the source domain samples is the source domain label set Y_s, expressed as:
Y_s = [y_1^s; y_2^s; ...; y_{n_s}^s]
where C denotes the total number of sample categories, y_i^s denotes the class label corresponding to the i-th source domain sample, and there are n_s labels in total;
The sample X input into the width network model is expressed as:
X = [X_s; X_t]
where X is the vertical concatenation of X_s and X_t;
The label Y input into the width network model is expressed as:
Y = [Y_s; Zeros]
where Zeros is an all-0 matrix used to represent the unavailable labels of the target domain samples, and Y denotes all the input labels;
at this time, n s =2817,n t =795,d=4096,C=31;
As shown in fig. 3, the width network model is divided into three layers:
The first layer is the feature mapping layer, which is responsible for converting the input samples into feature maps through random weights, biases and a linear activation function, and storing them in the feature nodes;
The second layer is the enhancement layer, which maps the feature nodes again through random weights and biases and stores the result in the enhancement nodes;
The third layer is the output layer; the feature nodes and the enhancement nodes are combined to form the full-feature nodes, which are directly connected to the output layer, and finally the connection weight of the width network model is solved using a ridge regression algorithm;
The construction process of the width network model specifically comprises the following steps:
X is input into the width network model; the i'-th group of feature nodes Z_i', containing k feature nodes, is expressed as:
Z_i' = φ_i'(X W_{e_i'} + β_{e_i'})
where W_{e_i'} and β_{e_i'} denote the randomly generated weights and biases of the i'-th group of feature layer nodes, n denotes the number of groups of feature nodes, and φ_i' denotes the linear activation function of the i'-th group of feature layer nodes; all the feature nodes combined, Z^n, are expressed as:
Z^n = [Z_1, Z_2, ..., Z_n]
where n×k denotes the total number of feature nodes; here k = 50 and n = 10 are set;
The j'-th group of enhancement nodes H_j' is generated by mapping the feature nodes, expressed as:
H_j' = δ_j'(Z^n W_{h_j'} + β_{h_j'})
where W_{h_j'} and β_{h_j'} denote the randomly generated weights and biases of the j'-th group, m denotes the total number of enhancement nodes, and δ_j' denotes the nonlinear activation function of the j'-th group of enhancement nodes; all the enhancement nodes H^m are expressed as:
H^m = [H_1, H_2, ..., H_m]
where m = 1500 is set here;
All the feature nodes and enhancement nodes are combined to obtain the full-feature node matrix A, expressed as:
A = [Z^n | H^m]
where n×k+m denotes the total number of feature nodes and enhancement nodes;
The width network model processes the input labels into one-hot labels and solves the connection weight by minimizing the sum of the prediction errors; after introducing an L_2 regularization term, the loss is minimized by:
W = (AᵀA + αI)⁻¹AᵀY
where α denotes the penalty coefficient of the L_2 regularization term, W denotes the connection weight of the width network model, A denotes the full-feature node matrix, and I denotes the identity matrix; here α = 1×10⁻¹⁰ is set.
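The three-layer construction and the ridge regression solution above can be sketched in a few lines of numpy; the function names, toy shapes and the Gaussian initialization of the random weights are illustrative assumptions rather than the patent's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def build_bls_features(X, n_groups=10, k=50, m=1500):
    """Feature nodes Z (random linear maps of X) and enhancement nodes H
    (nonlinear map of Z), combined into the full-feature matrix A = [Z | H]."""
    d = X.shape[1]
    Z = np.hstack([X @ rng.standard_normal((d, k)) + rng.standard_normal((1, k))
                   for _ in range(n_groups)])        # linear activation
    Wh = rng.standard_normal((Z.shape[1], m))
    bh = rng.standard_normal((1, m))
    H = np.tanh(Z @ Wh + bh)                         # nonlinear enhancement nodes
    return np.hstack([Z, H])

def ridge_solve(A, Y, alpha=1e-10):
    """Connection weights W = (A^T A + alpha*I)^-1 A^T Y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ Y)

X = rng.standard_normal((40, 16))
Y = np.eye(4)[rng.integers(0, 4, 40)]                # one-hot labels, C=4
A = build_bls_features(X, n_groups=3, k=5, m=20)
W = ridge_solve(A, Y)
print(A.shape, W.shape)  # (40, 35) (35, 4)
```

With n_groups=3 and k=5 the feature block has 15 columns, plus m=20 enhancement columns, giving a 35-column full-feature matrix; the closed-form solve is what makes width learning far cheaper to train than a deep network.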
3) In order to minimize the edge distribution difference between the source domain and target domain samples, edge distribution alignment is introduced into the constructed width network model, using the difference between the mean predicted outputs of X_s and X_t as the measure of the edge distribution difference;
inputting X into a width network model, and obtaining a full-feature node A of X after combining the feature node and the enhancement node, wherein the full-feature node A is expressed as:
where ᵀ denotes the transpose of a matrix, r indexes the n_s+n_t source and target domain samples, and a(x_r) denotes the full-feature node corresponding to the r-th sample;
The predicted output ψ_r of the r-th sample is expressed as:
ψ_r = a(x_r)W
Edge distribution alignment is introduced into the width network model; its loss term is expressed as:
The loss term is rewritten as:
where Tr denotes the trace of a matrix, and the matrix is calculated as:
where the two sets denote the source domain samples and the target domain samples whose real labels are not available, respectively.
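The trace rewriting of the edge distribution alignment loss relies on an MMD-style matrix built from the source/target sample counts. A minimal sketch (the matrix construction follows the standard MMD form and is an assumption, not quoted from the patent) verifies that the trace form equals the squared difference between the mean predicted outputs:

```python
import numpy as np

def marginal_mmd_matrix(n_s, n_t):
    """Matrix M such that psi^T M psi equals the squared difference between
    the mean predicted outputs of the source and target samples."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

n_s, n_t = 3, 2
M0 = marginal_mmd_matrix(n_s, n_t)
A = np.arange(10.0).reshape(5, 2)          # toy full-feature nodes
W = np.ones((2, 1))                        # toy connection weights
psi = A @ W                                # predicted outputs
direct = (psi[:n_s].mean() - psi[n_s:].mean()) ** 2
trace_form = float(psi.T @ M0 @ psi)
print(np.isclose(direct, trace_form))  # True
```

Because the matrix depends only on sample counts, it can be precomputed once and folded into the ridge regression solve.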
4) In order to improve the conditional distribution alignment effect, conditional distribution alignment of finer granularity is introduced into the width network model into which the edge distribution alignment loss has been introduced; an SVM classifier is first used to obtain the target domain sample pseudo tags for calculating the conventional conditional distribution alignment loss;
The set of pseudo tags corresponding to the target domain samples is the target domain pseudo tag set, expressed as:
where ŷ_j^t denotes the pseudo class label corresponding to the j-th target domain sample, and there are n_t such labels;
Using the difference between the mean predicted outputs of the source domain and target domain samples belonging to the same class as the measure of the conditional distribution difference, the conventional conditional distribution alignment loss term is expressed as:
where the symbols denote the set of source domain samples whose real label is class c, the set of target domain samples whose pseudo label is class c, the number of source domain samples belonging to class c, the number of target domain samples belonging to class c, and the pseudo tag corresponding to the j-th target domain sample;
The loss term is rewritten as:
where the matrix is calculated as:
where the symbol denotes any one element of the matrix.
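The per-class matrix underlying the conditional distribution alignment can be sketched in the same MMD style as the edge case, restricted to one class; the construction below is an illustrative assumption consistent with the description, not the patent's exact formula:

```python
import numpy as np

def conditional_mmd_matrix(y_s, y_t_pseudo, c):
    """Per-class matrix Mc: psi^T Mc psi equals the squared gap between the
    mean outputs of class-c source samples and class-c pseudo-labelled
    target samples (rows of psi are ordered [source; target])."""
    n_sc = max(int((y_s == c).sum()), 1)
    n_tc = max(int((y_t_pseudo == c).sum()), 1)
    e = np.concatenate([np.where(y_s == c, 1.0 / n_sc, 0.0),
                        np.where(y_t_pseudo == c, -1.0 / n_tc, 0.0)])
    return np.outer(e, e)

y_s = np.array([0, 1, 0])                  # source labels
y_t = np.array([0, 1])                     # target pseudo labels
A = np.arange(10.0).reshape(5, 2)          # toy full-feature nodes
W = np.ones((2, 1))
psi = A @ W
Mc = conditional_mmd_matrix(y_s, y_t, c=0)
src_mean = psi[:3][y_s == 0].mean()        # class-0 source mean output
tgt_mean = psi[3:][y_t == 0].mean()        # class-0 target mean output
print(np.isclose(float(psi.T @ Mc @ psi), (src_mean - tgt_mean) ** 2))  # True
```

Summing such matrices over all C classes gives the full conventional conditional alignment term in trace form.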
5) Considering that the pseudo tags obtained by the SVM classifier are not all of high quality, which reduces the effect of the conditional distribution alignment, pseudo tags of high quality are selected by iteration, used to construct an enhanced conditional distribution alignment loss, and introduced into the width network model into which the conventional conditional distribution alignment loss has been introduced, alleviating the negative migration caused by pseudo tag quality problems, as shown in fig. 4, comprising the following steps:
5.1) Selecting pseudo tags with high confidence through sub-models;
The number of generated sub-models N_drop and the zero-setting proportion F% are set; N_drop 0-1 masks with F% of the entries set to 0 are then randomly generated. Multiplying A element-wise by the different masks yields N_drop sub-models; the mean of the sub-models' predicted outputs for X_t is taken as the final predicted output, and the variance is used as the measure of the quality of the corresponding pseudo tag: the smaller the variance, the higher the quality; here N_drop = 3 and F = 5 are set;
After X_t is input into the width network model, the feature nodes and enhancement nodes corresponding to X_t are combined to obtain the full-feature nodes of X_t, expressed as:
where the symbol denotes the full-feature node corresponding to the j-th target domain sample;
After each sub-model is obtained, the output vector of the sub-model for the target domain samples is obtained, expressed as:
where the symbol denotes the predicted output vector of the ε-th sub-model for the target domain samples;
The predicted output vectors of the sub-models are summed and averaged, expressed as:
where the symbol denotes the mean of the sub-models' predicted output vectors for each sample;
The mean is converted into a one-hot label and output as the pseudo tag set; at the same time, the set of output vectors of each target domain sample in each sub-model is obtained, expressed as:
where the symbols denote the set of N_drop sub-model predicted output vectors for the j-th target domain sample, and the predicted output vector of the ε-th sub-model for the j-th target domain sample;
The quality of the pseudo tag of a single sample is calculated as follows:
where η_j denotes the quality of the pseudo tag obtained for the j-th target domain sample, Var denotes the variance, and the inner set denotes the prediction results of all sub-models for class c of the j-th sample;
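The sub-model ensemble and the variance-based quality measure of step 5.1) can be sketched as follows; the array layout and the use of the predicted class's scores are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def pseudo_label_quality(probs):
    """probs: (N_drop, n_t, C) predictions of the dropout sub-models.
    Ensemble by averaging; quality of each pseudo tag is measured by the
    variance of the sub-models' scores on the predicted class (smaller
    variance -> higher quality / less disagreement)."""
    mean = probs.mean(axis=0)                       # ensemble output, (n_t, C)
    pseudo = mean.argmax(axis=1)                    # pseudo labels
    scores = probs[:, np.arange(probs.shape[1]), pseudo]  # (N_drop, n_t)
    return pseudo, scores.var(axis=0)

probs = rng.random((3, 6, 4))                       # N_drop=3 sub-models, 6 samples
pseudo, eta = pseudo_label_quality(probs)
keep = np.argsort(eta)[: max(1, int(0.2 * len(eta)))]  # top-P% most consistent
print(pseudo.shape, eta.shape, keep.shape)  # (6,) (6,) (1,)
```

Sorting by the variance and keeping the top P% mirrors the selection of high-quality pseudo tags described in step 5.2) below.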
5.2) Iteratively integrating pseudo tags to participate in enhanced conditional distribution alignment;
A set ξ of high-quality pseudo tags is created. The pseudo tags of the target domain samples obtained in the current round are sorted by η_j from high to low, and the top P% high-quality part is added to the set ξ; when the same sample receives pseudo tags in multiple iterations, its final class is determined by majority voting, after which ξ is updated. The target domain samples possessing high-quality pseudo tags and the corresponding high-quality pseudo tags are marked with a superscript denoting the high-quality mark, and the number of target domain samples with high-quality pseudo tags is recorded; here P = 20 is set;
The conditional distribution alignment participating in the next round of enhancement is obtained through ξ: the samples in ξ and their corresponding high-quality pseudo tags take over the role of X_t and its pseudo tags. The loss term of the conventional conditional distribution alignment is retained, with a lower weight, because although the pseudo tags of X_t participating in the conventional conditional distribution alignment are not of high quality, they still have positive significance for the width network model, especially in the early stages of iteration when the number of integrated high-quality pseudo tags is small;
In this way, each round of iteration updates the pseudo tags participating in the next round of conventional conditional distribution alignment and integrates high-quality pseudo tags into ξ, so that the effect of the next round of enhanced conditional distribution alignment becomes better and better; distinguishing the two conditional distribution alignments by weight improves the conditional distribution alignment effect and alleviates the negative migration caused by low-quality pseudo tags. The iteration finally runs for a set number of rounds to obtain the result;
Thus, according to the conditional distribution alignment principle, the enhanced conditional distribution alignment loss is accumulated from the differences between the mean predicted outputs of source domain samples X_s and target domain samples with high-quality pseudo tags belonging to the same class; for target domain samples without high-quality pseudo tags, the alignment loss with the source domain samples X_s is 0;
According to the rules described above, the enhanced conditional distribution alignment loss term is expressed as:
where the matrix is calculated as:
where the symbols denote the set of target domain samples whose high-quality pseudo tag is class c, the j*-th target domain sample with a high-quality pseudo tag, the high-quality pseudo tag of that sample, any one element of the matrix, and the number of samples possessing high-quality labels and belonging to class c.
6) In order to explore the potential distribution information of the samples and improve the generalization capability of the width network model in the target domain, manifold regularization loss is introduced into the width network model into which the enhanced conditional distribution alignment loss has been introduced; its loss term is expressed as:
where a(x_μ) is the full-feature node corresponding to sample x_μ, a(x_γ) is the full-feature node corresponding to sample x_γ, and ω_μγ denotes the similarity between any two samples, calculated with cosine similarity:
where the neighbor set of any sample x contains its τ nearest neighbors, obtained with the KNN algorithm;
The loss term is rewritten as:
where Δ - Ω denotes the Laplacian matrix, Ω is the adjacency matrix formed by the ω_μγ of all samples, and Δ is a diagonal matrix calculated as follows:
where ω_μγ denotes the elements of Ω.
The Laplacian matrix is normalized:
where the symbol denotes the normalized Laplacian matrix;
Thus, the final form of the loss term is expressed as:
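The graph construction behind the manifold regularization term (cosine-similarity KNN adjacency, diagonal degree matrix, symmetric normalization of the Laplacian) can be sketched as below; the symmetrization step and toy data are illustrative assumptions:

```python
import numpy as np

def normalized_laplacian(X, tau=2):
    """Cosine-similarity KNN graph and its symmetrically normalized Laplacian."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T                                   # cosine similarity
    np.fill_diagonal(S, -np.inf)                    # exclude self-similarity
    Omega = np.zeros_like(S)
    for mu in range(len(X)):                        # keep tau nearest neighbours
        nbrs = np.argsort(S[mu])[-tau:]
        Omega[mu, nbrs] = S[mu, nbrs]
    Omega = np.maximum(Omega, Omega.T)              # symmetrize the adjacency
    deg = Omega.sum(axis=1)
    Lap = np.diag(deg) - Omega                      # Laplacian: degree - adjacency
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ Lap @ d_inv_sqrt            # normalized Laplacian

rng = np.random.default_rng(2)
L = normalized_laplacian(rng.random((8, 5)), tau=3)  # non-negative toy features
print(L.shape, np.allclose(L, L.T))  # (8, 8) True
```

Since the regularizer is quadratic in the predicted outputs, the normalized Laplacian can be folded into the same ridge regression solve as the alignment terms.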
7) The source domain samples in the input space are divided into two parts, i.e. each source domain sample and the hidden samples around it. The hidden samples are the set of samples within a specific positive value Q around the source domain sample, i.e. the distance between them in each dimension is smaller than Q; for each source domain sample, a set of hidden samples meeting the following requirement can be found:
where Δx_κ denotes the difference in any single dimension between the hidden sample and the source domain sample, Δx is the difference vector between them, and Q is a custom value. Q should not be chosen too large, because samples too far from the current sample may no longer belong to the current class; it can be chosen according to domain knowledge or repeated experiments, and here Q = 0.05 is set. It is assumed that every hidden sample has the same probability of being generated, i.e. the hidden samples obey a uniform distribution; in other words, they can be seen as perturbation points around the source domain sample, while Δx is the degree of random perturbation;
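The hidden samples described above can be drawn as uniform perturbations within ±Q of each source domain sample in every dimension; a minimal sketch, with illustrative function and variable names:

```python
import numpy as np

rng = np.random.default_rng(3)

def perturbation_points(Xs, Q=0.05, o=20):
    """For every source sample, draw o hidden samples uniformly within +/-Q
    per dimension; they feed only the random-sensitivity term."""
    n_s, d = Xs.shape
    dx = rng.uniform(-Q, Q, size=(n_s, o, d))       # random perturbation degree
    return (Xs[:, None, :] + dx).reshape(n_s * o, d)

Xs = rng.standard_normal((4, 6))                    # toy source samples
Xu = perturbation_points(Xs, Q=0.05, o=5)
print(Xu.shape)  # (20, 6)
```

The reshape keeps the o perturbations of each source sample contiguous, which is what makes the "repeat-stack o times" alignment with the source predictions straightforward.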
Random sensitivity aims to shrink the mean square error between the predicted outputs of each source domain sample and its hidden samples, alleviating the overfitting of the width network model and at the same time giving the width network model better generalization capability for target domain samples with distribution differences;
A group of disturbance points is generated for each source domain sample, with o disturbance points per group, so that the total number of disturbance points is n_s×o; here o = 20 is set. The disturbance points of all source domain samples are denoted X_u; they participate only in the calculation of random sensitivity and do not participate in the calculation of the other loss terms. After X_u is input into the width network model, the feature nodes and enhancement nodes corresponding to X_u are combined to obtain the full-feature nodes U of X_u, expressed as:
where the symbol denotes the full-feature node corresponding to the e-th disturbance point of the i-th source domain sample;
After X_s is input into the width network model, the feature nodes and enhancement nodes corresponding to X_s are combined to obtain the full-feature nodes S of X_s, expressed as:
where the symbol denotes the full-feature node corresponding to the i-th source domain sample;
Thus, the random sensitivity loss term is expressed as:
where UW denotes the predicted output vectors of the disturbance samples and Y_srpt denotes the result of repeatedly stacking SW o times; after stacking, UW and Y_srpt have the same dimensions, enabling a one-to-one correspondence between each source domain sample and the predicted output vectors of its disturbance points.
8) The final loss function of the width network model obtained according to steps 2) to 7) is expressed as:
where λ_1 is the weight of the edge distribution alignment loss, λ_2 is the weight of the conventional conditional distribution alignment loss, σ is the weight of the random sensitivity loss term, λ_3 is the weight of the manifold regularization loss, and λ_4 is the weight of the enhanced conditional distribution alignment loss, where λ_4 is larger than λ_2; here λ_1 = 10, λ_2 = 10, σ = 1, λ_3 = 1 and λ_4 = 30 are set;
Obtaining the value of W according to a ridge regression algorithm:
Since a ridge regression algorithm is used to solve the width network model, at least one of UW and Y_srpt must be known in a single solving pass; the first iteration is therefore used as an initialization round for random sensitivity: no random sensitivity loss term is added in the first iteration, and SW is repeatedly stacked o times to form Y_srpt, which is recorded. The random sensitivity loss term is added from the beginning of the second round, where the Y_srpt derived in the first iteration is used to calculate the random sensitivity; the Y_srpt calculated in the second iteration is recorded and used to calculate the random sensitivity of the third iteration, and so on. At the same time, since the width network model generates N_drop sub-models in each iteration, in order to reduce the influence of sub-model structures differing between iteration rounds on the random sensitivity, the N_drop masks used to generate the sub-models are shared across the iteration rounds, and the Y_srpt used by each sub-model in calculating the random sensitivity loss term is the one recorded by the sub-model generated with the same Mask in the previous iteration;
After the width network model has iterated the set number of rounds, the prediction of the final round is taken as the final classification result of the model, and the classification accuracy of the model is calculated.
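When every loss term is quadratic in W, as in steps 3) to 7), the combined objective retains a closed-form ridge solution; the sketch below is a generic illustration in which the matrices M_i and their weights stand in for the alignment and regularization terms, not the patent's exact update:

```python
import numpy as np

def solve_weights(A, Y, M_terms, alpha=1e-10):
    """Closed-form minimizer of
    ||AW - Y||^2 + alpha*||W||^2 + sum_i lam_i * Tr(W^T A^T M_i A W):
    W = (A^T A + alpha*I + sum_i lam_i * A^T M_i A)^-1 A^T Y."""
    G = A.T @ A + alpha * np.eye(A.shape[1])
    for lam, M in M_terms:                 # alignment / regularization matrices
        G += lam * (A.T @ M @ A)
    return np.linalg.solve(G, A.T @ Y)

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 6))           # toy full-feature nodes
Y = np.eye(3)[rng.integers(0, 3, 10)]      # one-hot labels
M0 = np.eye(10) / 10                       # placeholder symmetric PSD matrix
W = solve_weights(A, Y, [(10.0, M0)])
print(W.shape)  # (6, 3)
```

Iterating this solve while refreshing the pseudo-tag-dependent matrices each round matches the alternating scheme described above.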
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.
Claims (9)
1. The field self-adaptive image classification method based on width learning and random sensitivity is characterized by comprising the following steps of:
1) Completing the construction of input data by using a source domain sample, a source domain sample label and a target domain sample;
2) Completing the construction of a feature mapping layer, an enhancement layer and an output layer of the width network model;
3) Introducing edge distribution alignment loss into the width network model to alleviate the reduction in model generalization capability caused by the different edge distributions of the source domain and target domain samples;
4) Obtaining target domain sample pseudo tags using an SVM classifier, and introducing alignment between categories with finer granularity, namely the conventional conditional distribution alignment loss, into the width network model into which the edge distribution alignment loss has been introduced, further improving the performance of the width network model on the target domain;
5) Considering that the pseudo tags obtained by the SVM are not all of high quality, which reduces the effect of the conditional distribution alignment, selecting pseudo tags of high quality through iteration, using them to construct an enhanced conditional distribution alignment loss and introducing it into the width network model into which the conventional conditional distribution alignment loss has been introduced, alleviating the negative migration caused by pseudo tag quality problems;
6) After distribution alignment is completed, in order to explore the potential distribution information of the samples and smooth the classification boundary, introducing manifold regularization loss into the width network model into which the enhanced conditional distribution alignment loss has been introduced, so that the width network model can better learn the distribution of the samples, improving its accuracy;
7) Steps 2) to 6) all improve the performance of the width network model from the perspective of the sample distribution; from the perspective of overfitting, in order to solve the problem that the width network model overfits the source domain samples because the real labels of the target domain are unavailable, introducing random sensitivity into the width network model into which the manifold regularization loss has been introduced, improving the generalization capability of the width network model;
8) According to the losses in steps 2) to 7), solving the connection weight of the width network model using a ridge regression algorithm, and obtaining the classification result.
2. The method of claim 1, wherein in step 1), the input data includes a source domain sample, a target domain sample, and a source domain sample tag, wherein a real tag of the target domain sample is not available;
The source domain sample set X_s is expressed as:
X_s = [x_1^s; x_2^s; ...; x_{n_s}^s]
where ℝ denotes the set of real numbers, x_i^s ∈ ℝ^{1×d} denotes the i-th source domain sample, d denotes the feature dimension of a sample, and there are n_s source domain samples in total;
The target domain sample set X_t is expressed as:
X_t = [x_1^t; x_2^t; ...; x_{n_t}^t]
where x_i^t ∈ ℝ^{1×d} denotes the i-th target domain sample, and there are n_t target domain samples in total;
The set of labels corresponding to the source domain samples is the source domain label set Y_s, expressed as:
Y_s = [y_1^s; y_2^s; ...; y_{n_s}^s]
where C denotes the total number of sample categories, y_i^s denotes the class label corresponding to the i-th source domain sample, and there are n_s labels in total;
The sample X input into the width network model is expressed as:
X = [X_s; X_t]
where X is the vertical concatenation of X_s and X_t;
The label Y input into the width network model is expressed as:
Y = [Y_s; Zeros]
where Zeros is an all-0 matrix used to represent the unavailable labels of the target domain samples, and Y denotes all the input labels.
3. The domain adaptive image classification method based on width learning and random sensitivity according to claim 2, wherein in step 2), the width network model is divided into three layers:
The first layer is the feature mapping layer, which is responsible for converting the input samples into feature maps through random weights, biases and a linear activation function, and storing them in the feature nodes;
The second layer is the enhancement layer, which maps the feature nodes again through random weights and biases and stores the result in the enhancement nodes;
The third layer is the output layer; the feature nodes and the enhancement nodes are combined to form the full-feature nodes, which are directly connected to the output layer, and finally the connection weight of the width network model is solved using a ridge regression algorithm;
the construction process of the width network model specifically comprises the following steps:
Inputting X into the width network model, the i'-th group of feature nodes Z_i', containing k feature nodes, is expressed as:
Z_i' = φ_i'(X W_{e_i'} + β_{e_i'})
where W_{e_i'} and β_{e_i'} denote the randomly generated weights and biases of the i'-th group of feature layer nodes, n denotes the number of groups of feature nodes, and φ_i' denotes the linear activation function of the i'-th group of feature layer nodes; all the feature nodes combined, Z^n, are expressed as:
Z^n = [Z_1, Z_2, ..., Z_n]
where n×k denotes the total number of feature nodes;
The j'-th group of enhancement nodes H_j' is generated by mapping the feature nodes, expressed as:
H_j' = δ_j'(Z^n W_{h_j'} + β_{h_j'})
where W_{h_j'} and β_{h_j'} denote the randomly generated weights and biases of the j'-th group, m denotes the total number of enhancement nodes, and δ_j' denotes the nonlinear activation function of the j'-th group of enhancement nodes; all the enhancement nodes H^m are expressed as:
H^m = [H_1, H_2, ..., H_m]
where H^m contains all the enhancement nodes;
All the feature nodes and enhancement nodes are combined to obtain the full-feature node matrix A, expressed as:
A = [Z^n | H^m]
where n×k+m denotes the total number of feature nodes and enhancement nodes;
The width network model processes the input labels into one-hot labels and solves the connection weight by minimizing the sum of the prediction errors; after introducing an L_2 regularization term, the loss is minimized by:
W = (AᵀA + αI)⁻¹AᵀY
where α denotes the penalty coefficient of the L_2 regularization term, W denotes the connection weight of the width network model, A denotes the full-feature node matrix, and I denotes the identity matrix.
4. The domain-adaptive image classification method based on width learning and random sensitivity according to claim 3, wherein in step 3), in order to minimize the edge distribution difference between the source domain and target domain samples, edge distribution alignment is introduced into the constructed width network model, using the difference between the mean predicted outputs of X_s and X_t as the measure of the edge distribution difference;
inputting X into a width network model, and obtaining a full-feature node A of X after combining the feature node and the enhancement node, wherein the full-feature node A is expressed as:
where ᵀ denotes the transpose of a matrix, r indexes the n_s+n_t source and target domain samples, and a(x_r) denotes the full-feature node corresponding to the r-th sample;
The predicted output ψ_r of the r-th sample is expressed as:
ψ_r = a(x_r)W
Edge distribution alignment is introduced into the width network model; its loss term is expressed as:
The loss term is rewritten as:
where Tr denotes the trace of a matrix, and the matrix is calculated as:
where the two sets denote the source domain samples and the target domain samples whose real labels are not available, respectively, and x_μ and x_γ represent any samples.
5. The domain-adaptive image classification method based on width learning and random sensitivity according to claim 4, wherein in step 4), in order to improve the conditional distribution alignment effect, conditional distribution alignment of finer granularity is introduced into the width network model into which the edge distribution alignment loss has been introduced; an SVM classifier is first used to obtain the target domain sample pseudo tag set for calculating the conventional conditional distribution alignment loss;
The set of pseudo tags corresponding to the target domain samples is the target domain pseudo tag set, expressed as:
where ŷ_j^t denotes the pseudo class label corresponding to the j-th target domain sample, and there are n_t such labels;
Using the difference between the mean predicted outputs of the source domain and target domain samples belonging to the same class as the measure of the conditional distribution difference, the conventional conditional distribution alignment loss term is expressed as:
in the method, in the process of the invention,sample set representing real tags in source domain as category c,/-for>Sample set indicating pseudo tag in target domain as category c, +.>Representing the number of samples of the source domain belonging to class c,/->Indicating the number of samples of the target domain belonging to class c, < >>Represents the jth target domain sample corresponding thereto +.>Is a pseudo tag of (2);
the loss term L_con is rewritten as:
L_con = Σ_{c=1}^{C} Tr(W^T A^T M_c A W)
where the matrix M_c is calculated as:
(M_c)_{μγ} =
 1/(n_s^c)^2, if x_μ, x_γ ∈ D_s^c;
 1/(n_t^c)^2, if x_μ, x_γ ∈ D_t^c;
 −1/(n_s^c n_t^c), if x_μ ∈ D_s^c, x_γ ∈ D_t^c or x_μ ∈ D_t^c, x_γ ∈ D_s^c;
 0, otherwise
where x is any element of D_s^c ∪ D_t^c.
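The class-conditional matrix M_c above can be sketched in the same way (a minimal NumPy sketch; the per-sample domain flags and the label array — true labels for the source part, pseudo labels for the target part — are illustrative stand-ins):

```python
import numpy as np

def conditional_mmd_matrix(domain, labels, c):
    """Build M_c for class c over all n_s+n_t samples.

    domain: array of 's'/'t' flags per sample; labels: true labels for
    the source part, pseudo labels for the target part."""
    n = len(labels)
    src_c = (domain == 's') & (labels == c)   # source samples with true label c
    tgt_c = (domain == 't') & (labels == c)   # target samples with pseudo label c
    e = np.zeros(n)
    if src_c.sum() > 0:
        e[src_c] = 1.0 / src_c.sum()
    if tgt_c.sum() > 0:
        e[tgt_c] = -1.0 / tgt_c.sum()
    return np.outer(e, e)                     # zero rows/cols for other classes

# toy example: 3 source + 3 target samples, two classes
domain = np.array(list("sssttt"))
labels = np.array([0, 0, 1, 0, 1, 1])  # source truths, then target pseudo labels
M0c = conditional_mmd_matrix(domain, labels, 0)
```

Samples not belonging to class c get zero weight, so their pairs contribute nothing to the class-c alignment term, matching the 0-otherwise case of M_c.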
6. The domain adaptive image classification method based on width learning and random sensitivity according to claim 5, wherein in step 5), considering that the pseudo labels obtained by the SVM classifier are not of high quality and weaken the conditional distribution alignment effect, high-quality pseudo labels are selected iteratively for constructing an enhanced conditional distribution alignment loss, which is introduced into the width network model into which the conventional conditional distribution alignment loss has been introduced, so as to alleviate the negative migration caused by pseudo-label quality problems, comprising the following steps:
5.1) selecting high-confidence pseudo labels through sub-models;
setting the number of generated sub-models N_drop and the zero-setting proportion F%, then randomly generating N_drop 0-1 masks in which F% of the entries are set to 0; multiplying A element-wise by the different masks yields N_drop sub-models; the mean of the predicted outputs of the sub-models on X_t is used as the final predicted output, and the variance is used as the measure of the quality of the corresponding pseudo label: the smaller the variance, the higher the quality;
after X_t is input into the width network model, the feature nodes and enhancement nodes corresponding to X_t are combined to obtain the full-feature nodes A_t of X_t, expressed as:
A_t = [a(x_1^t)^T, a(x_2^t)^T, …, a(x_{n_t}^t)^T]^T
where a(x_j^t) denotes the full-feature node corresponding to the j-th target-domain sample;
after each sub-model is obtained, the set of output vectors of the sub-models on the target-domain samples is obtained, expressed as:
{Ψ_t^(1), Ψ_t^(2), …, Ψ_t^(N_drop)}
where Ψ_t^(ε) denotes the predicted output vector of the ε-th sub-model on the target-domain samples;
the predicted output vectors of the sub-models are summed and averaged to obtain Ψ̄_t, expressed as:
Ψ̄_t = (1/N_drop) Σ_{ε=1}^{N_drop} Ψ_t^(ε)
where Ψ̄_t denotes the average of the sub-models' predicted output vectors for each sample;
Ψ̄_t is converted into one-hot form for output, and the pseudo labels Ŷ_t are output; at the same time, the set of output vectors of each target-domain sample in each sub-model can be obtained, expressed as:
V_j = {ψ_j^(1), ψ_j^(2), …, ψ_j^(N_drop)}
where V_j denotes the set of N_drop sub-model predicted output vectors of the j-th target-domain sample, and ψ_j^(ε) denotes the predicted output vector of the ε-th sub-model for the j-th target-domain sample;
the quality of a single sample's pseudo label is calculated as follows:
η_j = Σ_{c=1}^{C} Var({ψ_{j,c}^(ε)}_{ε=1}^{N_drop})
where η_j denotes the quality of the pseudo label obtained for the j-th target-domain sample, Var denotes the variance, and {ψ_{j,c}^(ε)} denotes the set of predictions of class c for the j-th sample by all the sub-models;
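The sub-model construction and variance-based quality score of step 5.1) can be sketched as follows (a minimal NumPy sketch; the full-feature matrix `A_t` and weights `W` are stubbed with random values, and summing the per-class variances into one score per sample is an assumption — the claim only states that smaller variance means higher quality):

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, d, C, N_drop, F = 8, 10, 3, 5, 0.2   # F: zero-setting proportion

A_t = rng.standard_normal((n_t, d))       # stubbed full-feature nodes of X_t
W = rng.standard_normal((d, C))

# N_drop shared 0-1 masks, each with a fraction F of entries set to 0
masks = np.ones((N_drop, d))
for m in masks:
    m[rng.choice(d, int(F * d), replace=False)] = 0.0

# predicted outputs of each sub-model on the target-domain samples
preds = np.stack([(A_t * m) @ W for m in masks])   # (N_drop, n_t, C)

psi_bar = preds.mean(axis=0)              # averaged prediction across sub-models
pseudo = psi_bar.argmax(axis=1)           # argmax of the one-hot output

# pseudo-label quality: per-class variance across sub-models, summed over
# classes (aggregation over classes is an assumption of this sketch)
eta = preds.var(axis=0).sum(axis=1)       # one quality score per target sample
```

Reusing the same masks across iteration rounds, as the later claims require, only means holding `masks` fixed instead of regenerating it each round.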
5.2) iteratively integrating pseudo labels to participate in the enhanced conditional distribution alignment;
a high-quality pseudo-label set Ξ is created; the pseudo labels Ŷ_t of the target-domain samples obtained in the current round are sorted according to η_j, and the high-quality part ranked in the top P% is added to the set Ξ; for the same sample selected in multiple iterations, the final class of its pseudo label is determined by majority voting, after which Ξ is updated; the target-domain samples with high-quality pseudo labels are denoted X_t^h, the corresponding high-quality pseudo labels are denoted Ŷ_t^h, n_t^h denotes the number of target-domain samples with high-quality pseudo labels, and the superscript h denotes the high-quality mark;
the samples participating in the next round of enhanced conditional distribution alignment, X_t^h and the corresponding Ŷ_t^h, can be obtained through Ξ; at the same time, X_t and the corresponding Ŷ_t still participate in the conventional conditional distribution alignment, whose loss term is kept with a lower weight: although the pseudo labels Ŷ_t of X_t are not of high quality, they still have positive significance for the width network model, especially in the early iteration stages when the number of integrated high-quality pseudo labels is small;
in this way, each iteration round updates Ŷ_t for participation in the conventional conditional distribution alignment of the next round and integrates Ŷ_t^h so that the enhanced conditional distribution alignment of the next round becomes better and better; distinguishing the two conditional distribution alignments by weight improves the alignment effect and alleviates the negative migration caused by low-quality pseudo labels, and the result is obtained after the final iteration round;
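The iterative selection and majority voting of step 5.2) can be sketched as follows (a stdlib-only sketch; the helper name `update_high_quality_set` is hypothetical, and sorting by ascending `eta` assumes smaller variance means higher quality):

```python
from collections import Counter, defaultdict

def update_high_quality_set(votes, pseudo, eta, top_p):
    """One round of step 5.2): add the top-P% best-quality pseudo labels
    to the per-sample vote history, then resolve each selected sample's
    final label by majority voting over its accumulated votes."""
    k = max(1, int(len(eta) * top_p))
    best = sorted(range(len(eta)), key=lambda j: eta[j])[:k]  # lowest variance first
    for j in best:
        votes[j].append(pseudo[j])
    # Xi: sample index -> majority-voted high-quality pseudo label
    return {j: Counter(v).most_common(1)[0][0] for j, v in votes.items()}

votes = defaultdict(list)  # vote history accumulated across iteration rounds
xi = update_high_quality_set(votes, pseudo=[0, 1, 2, 1], eta=[0.1, 0.9, 0.2, 0.5], top_p=0.5)
xi = update_high_quality_set(votes, pseudo=[0, 1, 2, 1], eta=[0.3, 0.8, 0.1, 0.6], top_p=0.5)
```

Samples 0 and 2 fall in the top 50% in both toy rounds, so they accumulate two votes each and end up in Ξ with their majority-voted labels.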
thus, according to the conditional distribution alignment principle, the enhanced conditional distribution alignment loss L_enh accumulates the mean differences of the predicted outputs between the same classes of the source-domain samples X_s and the target-domain samples X_t^h with high-quality pseudo labels; for the target-domain samples without high-quality pseudo labels, the alignment loss with the source-domain samples X_s is 0;
according to the above rule, the enhanced conditional distribution alignment loss term L_enh is expressed as:
L_enh = Σ_{c=1}^{C} Tr(W^T A^T M_c^h A W)
where the matrix M_c^h is calculated as:
(M_c^h)_{μγ} =
 1/(n_s^c)^2, if x_μ, x_γ ∈ D_s^c;
 1/(n_t^{h,c})^2, if x_μ, x_γ ∈ D_t^{h,c};
 −1/(n_s^c n_t^{h,c}), if x_μ ∈ D_s^c, x_γ ∈ D_t^{h,c} or x_μ ∈ D_t^{h,c}, x_γ ∈ D_s^c;
 0, otherwise
where D_t^{h,c} denotes the set of target-domain samples whose high-quality pseudo label is class c, x_{j*}^h denotes the j*-th target-domain sample with a high-quality pseudo label, ŷ_{j*}^h denotes its high-quality pseudo label, x is any element of D_s^c ∪ D_t^{h,c}, and n_t^{h,c} denotes the number of samples that possess high-quality labels and belong to class c.
7. The domain adaptive image classification method based on width learning and random sensitivity according to claim 6, wherein in step 6), in order to explore the potential distribution information of the samples and improve the generalization ability of the width network model in the target domain, a manifold regularization loss is introduced into the width network model into which the enhanced conditional distribution alignment loss has been introduced, whose loss term L_man is expressed as:
L_man = (1/2) Σ_{μ=1}^{r} Σ_{γ=1}^{r} ω_{μγ} ‖ψ_μ − ψ_γ‖^2
where a(x_μ) is the full-feature node of sample x_μ, a(x_γ) is the full-feature node of sample x_γ, and ω_{μγ} denotes the similarity between any two samples, calculated with cosine similarity:
ω_{μγ} = (a(x_μ)·a(x_γ)) / (‖a(x_μ)‖ ‖a(x_γ)‖), if x_γ ∈ N_τ(x_μ); 0, otherwise
where N_τ(x) denotes the τ-nearest-neighbor set of any sample x, obtained with the KNN algorithm;
the loss term L_man is rewritten as:
L_man = Tr(W^T A^T L A W)
where L = Δ − Ω denotes the Laplace matrix, Ω is the adjacency matrix formed by the ω_{μγ} generated from all samples, and Δ denotes a diagonal matrix calculated as:
Δ_{μμ} = Σ_γ ω_{μγ}
where ω_{μγ} denotes an element of Ω;
the Laplace matrix is normalized:
L̃ = Δ^{−1/2} L Δ^{−1/2}
where L̃ denotes the normalized Laplace matrix;
thus, the final form of the loss term L_man is expressed as:
L_man = Tr(W^T A^T L̃ A W)
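The graph construction and Laplacian normalization of step 6) can be sketched as follows (a minimal NumPy sketch; the full-feature matrix is stubbed with random values, and symmetrizing the kNN graph with an element-wise maximum is an assumption of this sketch):

```python
import numpy as np

def normalized_laplacian(A_feat, tau):
    """Cosine-similarity adjacency over tau nearest neighbours, then the
    symmetrically normalized Laplacian D^{-1/2} (D - Omega) D^{-1/2}."""
    n = A_feat.shape[0]
    norms = np.linalg.norm(A_feat, axis=1, keepdims=True)
    cos = (A_feat @ A_feat.T) / (norms * norms.T)
    omega = np.zeros((n, n))
    for mu in range(n):
        # tau nearest neighbours by cosine similarity (excluding self)
        nbrs = [g for g in np.argsort(-cos[mu]) if g != mu][:tau]
        omega[mu, nbrs] = cos[mu, nbrs]
    omega = np.maximum(omega, omega.T)          # symmetrize the kNN graph
    degree = omega.sum(axis=1)
    lap = np.diag(degree) - omega               # L = Delta - Omega
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.clip(degree, 1e-12, None)))
    return d_inv_sqrt @ lap @ d_inv_sqrt

rng = np.random.default_rng(1)
A_feat = rng.standard_normal((6, 4))            # stubbed full-feature nodes
L_norm = normalized_laplacian(A_feat, tau=2)
```

The `np.clip` guard keeps the normalization finite for samples whose neighbourhood weights sum to a non-positive value.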
8. The domain adaptive image classification method based on width learning and random sensitivity according to claim 7, wherein in step 7), the source-domain samples in the input space are divided into two parts, namely each source-domain sample x_i^s and the hidden samples x̃_i^s around it; the hidden samples are the set of samples within a specific positive value Q around x_i^s, i.e., the distance between x̃_i^s and x_i^s in each dimension is smaller than Q; for each source-domain sample x_i^s, a group of hidden samples satisfying the following can be found:
|Δx_κ| < Q, κ = 1, 2, …, d
where Δx_κ denotes the difference between x̃_i^s and x_i^s in any dimension κ, Δx is the difference vector between x̃_i^s and x_i^s, and Q is a user-defined value; the Q value should not be chosen too large, because samples too far from the current sample may no longer belong to the current class, and it can be chosen specifically according to neighborhood knowledge or repeated experiments; each hidden sample x̃_i^s is assumed to have the same generation probability, i.e., to obey a uniform distribution; in other words, x̃_i^s can be seen as a perturbation point around x_i^s, while Δx is the degree of random perturbation;
the aim of random sensitivity is to shrink the mean squared error between the predicted outputs of x_i^s and x̃_i^s, alleviating the over-fitting problem of the width network model while giving it better generalization ability for target-domain samples with distribution differences;
a group of perturbation points is generated for each source-domain sample, with o perturbation points per group, so the total number of perturbation points is n_s × o; the perturbation points of all source-domain samples are denoted X̃_s; the perturbation points X̃_s participate only in the calculation of the random sensitivity, not in the calculation of the other loss terms; after X̃_s is input into the width network model, the feature nodes and enhancement nodes corresponding to X̃_s are combined to obtain the full-feature nodes Ã of X̃_s, expressed as:
Ã = [a(x̃_1^(1))^T, …, a(x̃_1^(o))^T, …, a(x̃_{n_s}^(o))^T]^T
where a(x̃_i^(e)) denotes the full-feature node corresponding to the e-th perturbation point of the i-th source-domain sample;
after X_s is input into the width network model, the feature nodes and enhancement nodes corresponding to X_s are combined to obtain the full-feature nodes A_s of X_s, expressed as:
A_s = [a(x_1^s)^T, a(x_2^s)^T, …, a(x_{n_s}^s)^T]^T
where a(x_i^s) denotes the full-feature node corresponding to the i-th source-domain sample;
thus, the random sensitivity loss term L_RS is expressed as:
L_RS = (1/(n_s o)) ‖ÃW − Ψ̌_s‖^2
where ÃW denotes the predicted output vectors of the perturbation samples, and Ψ̌_s denotes the result of repeatedly stacking A_sW o times; after the stacking, ÃW and Ψ̌_s have the same dimensions, which enables a one-to-one correspondence between each source-domain sample's predicted output vector and the predicted output vectors of its perturbation points.
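The perturbation-point generation and random sensitivity loss of step 7) can be sketched as follows (a minimal NumPy sketch; the width network's feature map is stubbed with the identity for illustration, and all dimensions are toy values):

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, d, C, o, Q = 5, 4, 3, 6, 0.1   # o perturbation points per sample, radius Q

X_s = rng.standard_normal((n_s, d))

# uniform perturbations within (-Q, Q) in every dimension (the "hidden samples")
delta = rng.uniform(-Q, Q, size=(n_s, o, d))
X_pert = (X_s[:, None, :] + delta).reshape(n_s * o, d)

def full_feature(X):
    # stand-in for the width network's feature + enhancement mapping
    return X

W = rng.standard_normal((d, C))
psi_s = full_feature(X_s) @ W               # A_s W
psi_pert = full_feature(X_pert) @ W         # A~ W
psi_stacked = np.repeat(psi_s, o, axis=0)   # A_s W stacked o times (Psi-check)

# L_RS: mean squared difference between each source sample's output and
# the outputs of its o perturbation points
L_RS = np.mean(np.sum((psi_pert - psi_stacked) ** 2, axis=1))
```

`np.repeat` along axis 0 is what gives the one-to-one row correspondence between a sample's prediction and those of its perturbation points.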
9. The domain adaptive image classification method based on width learning and random sensitivity according to claim 8, wherein in step 8), the final loss function of the width network model obtained according to step 2) to step 7) is expressed as:
min_W ‖A_sW − Y_s‖^2 + λ‖W‖^2 + λ_1 L_mar + λ_2 L_con + λ_3 L_man + λ_4 L_enh + σ L_RS
where λ is the ridge regularization coefficient, λ_1 is the weight of the marginal distribution alignment loss, λ_2 is the weight of the conventional conditional distribution alignment loss, σ is the weight of the random sensitivity loss term, λ_3 is the weight of the manifold regularization loss, and λ_4 is the weight of the enhanced conditional distribution alignment loss, where λ_4 is higher than λ_2;
Obtaining the value of W according to a ridge regression algorithm:
since the ridge regression algorithm is used to solve the width network model, in one solving process at least one of ÃW and Ψ̌_s should be known; therefore, the first iteration serves as the initialization round of the random sensitivity, and no random sensitivity loss term is added in the first iteration; after A_sW is obtained, it is repeatedly stacked o times to form Ψ̌_s, which is recorded; the random sensitivity loss term is added from the beginning of the second round, where the Ψ̌_s derived in the first iteration round is used to calculate the random sensitivity; the Ψ̌_s calculated in the second iteration round is recorded and used to calculate the random sensitivity in the third round, and so on; at the same time, since the width network model generates N_drop sub-models in each iteration round, in order to reduce the influence of differing sub-model structures across iteration rounds on the random sensitivity, the N_drop masks used to generate the sub-models are shared across all iteration rounds, and the Ψ̌_s used by each sub-model for calculating the random sensitivity loss term is the one recorded by the sub-model generated with the same mask in the previous iteration round;
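The ridge-regression solve of W can be sketched as follows (a minimal NumPy sketch; for brevity only the marginal-alignment trace term is folded into the quadratic — the conditional, enhanced, and manifold terms enter the same quadratic in exactly the same way — and the helper name `solve_W` and the toy weights are illustrative assumptions):

```python
import numpy as np

def solve_W(A_s, Y_s, A_all, M0, lam=1e-2, lam1=1.0):
    """Closed-form minimizer of
    ||A_s W - Y_s||^2 + lam ||W||^2 + lam1 Tr(W^T A_all^T M0 A_all W):
    setting the gradient to zero gives a linear system in W."""
    d = A_s.shape[1]
    H = A_s.T @ A_s + lam * np.eye(d) + lam1 * (A_all.T @ M0 @ A_all)
    return np.linalg.solve(H, A_s.T @ Y_s)

rng = np.random.default_rng(3)
n_s, n_t, d, C = 6, 4, 5, 2
A_all = rng.standard_normal((n_s + n_t, d))      # stubbed full-feature nodes
A_s = A_all[:n_s]
Y_s = np.eye(C)[rng.integers(0, C, n_s)]         # one-hot source labels
e = np.concatenate([np.full(n_s, 1 / n_s), np.full(n_t, -1 / n_t)])
M0 = np.outer(e, e)                              # marginal MMD matrix
W = solve_W(A_s, Y_s, A_all, M0)
```

Because M0 is positive semidefinite and the ridge term adds lam·I, the system matrix H is positive definite, so the solve is well posed.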
after the width network model has iterated the set number of rounds, the pseudo labels Ŷ_t of the final round are used as the final classification result of the model, and the classification accuracy of the model is calculated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310274623.7A CN116452854A (en) | 2023-03-20 | 2023-03-20 | Adaptive image classification method based on width learning and random sensitivity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116452854A true CN116452854A (en) | 2023-07-18 |
Family
ID=87128003
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117253097A (en) * | 2023-11-20 | 2023-12-19 | 中国科学技术大学 | Semi-supervision domain adaptive image classification method, system, equipment and storage medium |
CN117253097B (en) * | 2023-11-20 | 2024-02-23 | 中国科学技术大学 | Semi-supervision domain adaptive image classification method, system, equipment and storage medium |
CN118470548A (en) * | 2024-07-12 | 2024-08-09 | 湖南大学 | Heterogeneous image change detection method based on width learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||