CN118334410A

CN118334410A - Cross-domain image classification method and system based on self-adaptive optimal transmission

Info

Publication number: CN118334410A
Application number: CN202410342453.6A
Authority: CN
Inventors: 杨沛; 谭琦
Original assignee: South China University of Technology SCUT; South China Normal University
Current assignee: South China University of Technology SCUT; South China Normal University
Priority date: 2024-03-25
Filing date: 2024-03-25
Publication date: 2024-07-12

Abstract

The invention discloses a cross-domain image classification method and system based on self-adaptive optimal transmission. The method comprises the following steps: acquiring original images collected in a source domain and a target domain, performing image preprocessing and data enhancement, and extracting features with an image feature extractor to obtain feature embeddings of the enhanced images; classifying the images with an image classifier according to their feature embeddings to obtain predicted classification labels; solving for the self-adaptive optimal transmission distance between the source domain image set and the target domain image set, which serves as the degree of difference between the two sets; calculating the image classification loss of the image classifier on the source domain images; constructing an objective function and training iteratively to obtain a trained image feature extractor and image classifier; and extracting the feature embeddings of target domain images with the trained image feature extractor, classifying them with the trained image classifier, and outputting their classification labels. The method and system can effectively improve the robustness and generalization of cross-domain image classification.

Description

Cross-domain image classification method and system based on self-adaptive optimal transmission

Technical Field

The invention relates to the technical field of cross-domain image classification, in particular to a cross-domain image classification method and system based on self-adaptive optimal transmission.

Background

Cross-domain image classification addresses the problem of quickly migrating an image classification system from an existing environment to a new one. With the rapid growth of image-related information, image classification is becoming increasingly important in many application fields. Traditional image classification requires a large number of labeled image samples, and also requires the samples of the source domain and the target domain to be independent and identically distributed, in order to achieve good results. In practice, however, many fields have plenty of unlabeled image data but only a small amount of labeled image data, and labeling a large number of image samples costs so much manpower and material resources that it is often infeasible. Because image data in different domains come from different sources, there is always some difference in feature distribution or feature space between domains. For example, image acquisition equipment and acquisition conditions differ; images shot indoors and outdoors, in different scenes, under different illumination, or from different angles differ from one another; and differences in resolution, expression, action and the like also change the feature distribution. The goal of cross-domain image classification is to quickly migrate an image classification system trained on a large number of annotated images in an existing environment (also called the source domain) to a new environment (also called the target domain).

At present, cross-domain image classification based on optimal transmission (optimal transport) theory is one of the most promising research directions in this field. Optimal transmission theory studies the degree of difference between two probability distributions (for example, between a source domain image set and a target domain image set), which is one of the core problems that cross-domain image classification must solve. However, the classical optimal transmission problem imposes a probability mass conservation constraint, which severely restricts the performance of cross-domain image classification systems. In cross-domain image classification applications, small samples, outliers, noise, long-tail effects, and data that violate the independent and identically distributed assumption are all common. In these scenarios, the mass conservation constraint distorts the transmission mapping and severely compromises the robustness and generalization of the cross-domain image classification system. In deep learning in particular, the mini-batch sampling training paradigm aggravates the problem. As a result, existing cross-domain image classification techniques based on classical optimal transmission theory cannot meet the accuracy and robustness requirements of many practical image classification applications.

Disclosure of Invention

In order to overcome the defects and shortcomings in the prior art, the invention provides a cross-domain image classification method and system based on self-adaptive optimal transmission.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A cross-domain image classification method based on self-adaptive optimal transmission comprises the following steps:

acquiring original images collected in a source domain and a target domain, wherein the source domain images have classification labels and the target domain images have no classification labels;

performing image preprocessing on the original images collected in the source domain and the target domain;

performing data enhancement on the preprocessed images;

constructing an image feature extractor based on a deep convolutional network, and extracting features from the enhanced images with the image feature extractor to obtain feature embeddings of the images;

constructing an image classifier based on a fully connected network, and classifying the images with the image classifier according to their feature embeddings to obtain predicted classification labels of the images;

based on the feature embeddings and predicted classification labels of the images, solving the self-adaptive optimal transmission distance problem between the source domain image set and the target domain image set by optimization to obtain the self-adaptive optimal transmission distance between the two sets, which is used as the degree of difference between the source domain image set and the target domain image set;

calculating the image classification loss of the image classifier on the source domain images based on the predicted classification labels of the images;

constructing an objective function based on the image classification loss and the degree of difference between the source domain image set and the target domain image set, backpropagating gradients through the image feature extractor and the image classifier based on the objective function, updating the network parameters, and training iteratively to obtain a trained image feature extractor and image classifier;

acquiring a target domain image, extracting features with the trained image feature extractor to obtain the feature embedding of the image, classifying the feature embedding of the target domain image with the trained image classifier, and outputting the classification label of the image.

As a preferred technical scheme, the image feature extractor is constructed based on a deep convolutional network; specifically, the image feature extractor is constructed with the deep residual network ResNet.

As a preferred technical scheme, the image classifier is constructed based on a fully connected network; specifically, the image classifier is constructed with a single-layer or multi-layer fully connected network and a softmax layer.

As a preferred technical scheme, the self-adaptive optimal transmission distance problem between the source domain image set and the target domain image set is solved by optimization to obtain the self-adaptive optimal transmission distance between the two sets, specifically expressed as follows:

$$C_{ij} = \lVert g(x_i) - g(z_j) \rVert - \alpha\, y_i \tanh\big(f(g(z_j))\big)$$

$$\Gamma_{\le}(\mu,\nu) = \{\pi \in P(X \times Z) \mid \pi \mathbf{1}_m \le \mu,\ \pi^{T} \mathbf{1}_n \le \nu\}$$

where $X$ represents the source domain image set; $Z$ represents the target domain image set; $W(X,Z)$ represents the self-adaptive optimal transmission distance between the source domain image set and the target domain image set; $\Gamma_{\le}(\mu,\nu)$ represents the transmission mapping space; $\pi$ represents a transmission mapping; $\pi_{ij}$ represents the probability mass transmitted from source domain image $x_i$ to target domain image $z_j$; $C_{ij}$ represents the cost of transmitting a unit of probability mass from $x_i$ to $z_j$; $\mu$ represents the probability measure of the source domain image set; $\nu$ represents the probability measure of the target domain image set; $\delta_{x_i}$ represents the Dirac function of source domain image $x_i$; $\delta_{z_j}$ represents the Dirac function of target domain image $z_j$; $p_i$ represents the probability mass of measure $\mu$ at $x_i$; $q_j$ represents the probability mass of measure $\nu$ at $z_j$; $P$ represents the space of probability measures; $\mathbf{1}_m$ and $\mathbf{1}_n$ represent the m-dimensional and n-dimensional all-ones vectors; $g(x_i)$ represents the image feature extractor mapping a source domain image to the feature space; $g(z_j)$ represents the image feature extractor mapping a target domain image to the feature space; $f(\cdot)$ represents the image classifier; $\alpha$ is a non-negative coefficient; and $\tanh$ is the hyperbolic tangent function.

As a preferred technical solution, calculating an image classification loss of the image classifier on the source field image, specifically expressed as:

where $g(\cdot)$ represents the image feature extractor, $f(\cdot)$ represents the image classifier, $x_i$ represents a source domain image, and $y_i$ represents its classification label.

The invention also provides a cross-domain image classification system based on self-adaptive optimal transmission, which comprises: the system comprises an original image acquisition module, an image preprocessing module, an image enhancement module, an image feature extractor, an image classifier, a self-adaptive optimal transmission distance calculation module, an image classification loss calculation module, an objective function construction module, a training module and a classification result output module;

the original image acquisition module is used for acquiring original images acquired in a source field and a target field, wherein the source field is provided with a classification label, and the target field is not provided with a classification label;

the image preprocessing module is used for preprocessing the original images acquired in the source field and the target field;

The image enhancement module is used for carrying out data enhancement on the preprocessed image;

The image feature extractor is constructed based on a deep convolutional network and is used for extracting features from the enhanced images to obtain feature embeddings of the images;

The image classifier is constructed based on a fully connected network and is used for classifying images according to their feature embeddings to obtain predicted classification labels of the images;

The self-adaptive optimal transmission distance calculation module is used for carrying out optimization solution on the self-adaptive optimal transmission distance problem between the source field image set and the target field image set based on the characteristic embedding and prediction classification labels of the images, so as to obtain the self-adaptive optimal transmission distance between the source field image set and the target field image set, and the self-adaptive optimal transmission distance is used as the difference degree between the source field image set and the target field image set;

the image classification loss calculation module is used for calculating the image classification loss of the image classifier on the source field image based on the prediction classification label of the image;

The objective function construction module is used for constructing an objective function based on the image classification loss and the degree of difference between the source field image set and the target field image set;

The training module is used for backpropagating gradients through the image feature extractor and the image classifier based on the objective function, updating the network parameters, and training iteratively to obtain a trained image feature extractor and image classifier;

The classification result output module is used for acquiring the target field image, extracting the characteristics through the trained image characteristic extractor to obtain the characteristic embedding of the image, classifying the characteristic embedding of the target field image based on the trained image classifier, and outputting the classification label of the image.

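As a preferred technical scheme, the self-adaptive optimal transmission distance calculation module solves the self-adaptive optimal transmission distance problem between the source domain image set and the target domain image set by optimization, specifically expressed as:

$$C_{ij} = \lVert g(x_i) - g(z_j) \rVert - \alpha\, y_i \tanh\big(f(g(z_j))\big)$$

$$\Gamma_{\le}(\mu,\nu) = \{\pi \in P(X \times Z) \mid \pi \mathbf{1}_m \le \mu,\ \pi^{T} \mathbf{1}_n \le \nu\}$$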

As a preferred technical solution, the image classification loss calculation module is configured to calculate, based on a prediction classification label of an image, an image classification loss of an image classifier on an image in a source field, specifically expressed as:

Compared with the prior art, the invention has the following advantages and beneficial effects:

According to the method, the difference between the source domain image set and the target domain image set is measured by adopting the self-adaptive optimal transmission, the cross-domain image classification is realized based on the self-adaptive optimal transmission, the limitation of a classical optimal transmission theory can be overcome, the robustness and generalization of the cross-domain image classification can be effectively improved, and the requirements of the cross-domain image classification on the aspects of accuracy and robustness are met.

Drawings

Fig. 1 is a flow chart of a cross-domain image classification method based on adaptive optimal transmission.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Example 1

As shown in fig. 1, the present embodiment provides a cross-domain image classification method based on adaptive optimal transmission, which includes the following steps:

S1: image acquisition: original images are acquired from an old environment (called the source domain) and a new environment (called the target domain), wherein the source domain images have classification labels (taking a pedestrian recognition task as an example, the classification label is 'yes' if the image contains a pedestrian and 'no' otherwise), and the target domain images have no classification labels;

In this embodiment, the source domain is exemplified by a sunny-day autonomous driving scene with a large number of manually labeled images, and the target domain is exemplified by a snowy-day autonomous driving scene whose images lack labels or have not been manually labeled;

S2: image preprocessing: the original images are preprocessed with techniques such as cropping and scaling to remove noise information and to keep the dimensions of the images collected from the old environment and the new environment consistent;

In this embodiment, the original images are autonomous driving images collected in sunny and snowy scenes;

S3: data enhancement of images: to address the scarcity of image samples and improve the generalization performance of the image classification system, the preprocessed images are also subjected to data enhancement, including but not limited to random cropping, horizontal or vertical flipping, and changing illumination conditions;
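The enhancement operations named in step S3 can be sketched as follows. This is a minimal illustration on a grayscale image stored as a list of rows, not the embodiment's implementation: the function names, the 0.5 flip probability, and the 0.8 to 1.2 brightness range are all assumptions chosen for the example.

```python
import random

def horizontal_flip(img):
    """Mirror every row left to right. `img` is a list of pixel rows."""
    return [row[::-1] for row in img]

def vertical_flip(img):
    """Reverse the order of the rows (top to bottom)."""
    return [row[:] for row in img[::-1]]

def random_crop(img, out_h, out_w, rng=random):
    """Cut an out_h x out_w window at a uniformly random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - out_h)
    left = rng.randint(0, w - out_w)
    return [row[left:left + out_w] for row in img[top:top + out_h]]

def adjust_brightness(img, factor):
    """Scale pixel values to mimic an illumination change, clamped to [0, 255]."""
    return [[min(255, max(0, round(v * factor))) for v in row] for row in img]

def augment(img, out_h, out_w, rng=random):
    """Apply crop, an optional flip, and a brightness change (assumed pipeline)."""
    out = random_crop(img, out_h, out_w, rng)
    if rng.random() < 0.5:               # assumed flip probability
        out = horizontal_flip(out)
    return adjust_brightness(out, rng.uniform(0.8, 1.2))  # assumed range
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible.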

S4: feature extraction of images: a deep convolutional network is used as the image feature extractor to extract semantic features from the enhanced images and obtain the feature embeddings of the images;

In this embodiment, the image feature extractor is implemented with a deep convolutional network, including but not limited to the deep residual network ResNet. Taking ResNet50 as an example: the network consists of 5 convolution stages followed by an average pooling layer. For a 224×224 input color image, the image first passes through convolution layer conv1, which has 64 convolution kernels of size 7×7 and a stride of 2, giving an output size of 112×112 with 64 channels. It then passes through a 3×3 max pooling (downsampling) layer, giving an output size of 56×56 with 64 channels. Four groups of residual blocks are then stacked, giving an output size of 7×7 with 2048 channels. Finally, an average pooling layer produces the feature embedding of the image.
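The feature-map sizes quoted above can be checked with the usual output-size rule for convolution and pooling layers, floor((size + 2·padding − kernel) / stride) + 1. The sketch below is only a size calculation, not a network: the padding values and the intermediate channel counts are those of the standard ResNet-50 design and are assumed here, and each residual stage is summarised by a single 3×3 layer with the stage's stride.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def resnet50_feature_map_sizes(input_size=224):
    """Trace (layer name, spatial size, channels) through a ResNet-50-like stack."""
    # (name, kernel, stride, padding, output channels); padding is assumed.
    layers = [
        ("conv1", 7, 2, 3, 64),
        ("maxpool", 3, 2, 1, 64),
        ("stage1", 3, 1, 1, 256),    # residual stages reduced to their net stride
        ("stage2", 3, 2, 1, 512),
        ("stage3", 3, 2, 1, 1024),
        ("stage4", 3, 2, 1, 2048),
    ]
    size, trace = input_size, []
    for name, kernel, stride, padding, channels in layers:
        size = conv_out(size, kernel, stride, padding)
        trace.append((name, size, channels))
    return trace
```

Global average pooling over the final 7×7 map then yields one value per channel, so the feature embedding in this configuration has 2048 dimensions.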

S5: cross-domain image classification: based on the feature embedding of the image, a fully connected network is used as the image classifier to classify the image and obtain its predicted classification label (taking a pedestrian recognition task as an example, the classification label is 'yes' if the image contains a pedestrian and 'no' otherwise);

In this embodiment, the image classifier is implemented with a single-layer or multi-layer fully connected network and a softmax layer;
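A single fully connected layer followed by softmax can be sketched as below. The function names and the plain-list representation of weights are assumptions for illustration; a real implementation would use a deep learning framework and learned parameters.

```python
import math

def linear(x, weights, bias):
    """Fully connected layer: one output per weight row, plus its bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax (subtract the maximum before exponentiating)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(embedding, weights, bias):
    """Return (predicted class index, class probabilities) for one embedding."""
    probs = softmax(linear(embedding, weights, bias))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs
```

For the two-class pedestrian example, index 1 could stand for 'yes' and index 0 for 'no'; that mapping is a convention chosen here, not one fixed by the text.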

S6: calculating the degree of difference between the source domain image set and the target domain image set: based on the feature embeddings and predicted classification labels of the images, the degree of difference between the source domain image set and the target domain image set is calculated with a self-adaptive optimal transmission model, specifically comprising the following steps:

The problem of measuring the degree of difference between the source domain image set and the target domain image set is modeled as a self-adaptive optimal transmission problem between the image sets, as follows:

Let the source domain image set $X = \{x_i\}$ follow the probability measure $\mu = \sum_i p_i \delta_{x_i}$ and the target domain image set $Z = \{z_j\}$ follow the probability measure $\nu = \sum_j q_j \delta_{z_j}$, where $\delta$ is the Dirac function, measure $\mu$ has probability mass $p_i$ at image $x_i$, and measure $\nu$ has probability mass $q_j$ at image $z_j$. The masses $p_i$ and $q_j$ belong to the probability simplex, i.e. $\sum_i p_i = 1$ and $\sum_j q_j = 1$. The source domain image set has classification labels, with label set $\{y_i\}$, while the images of the target domain have no classification labels. $C$ is the transmission cost function, and $C_{ij}$ represents the cost of transmitting a unit of probability mass from source domain image $x_i$ to target domain image $z_j$. The transmission mapping is represented by a joint probability $\pi$, where $\pi_{ij}$ represents the probability mass transmitted from $x_i$ to $z_j$. $P$ represents the space of probability measures, and $\mathbf{1}_m$ and $\mathbf{1}_n$ represent the m-dimensional and n-dimensional all-ones vectors. Measuring the degree of difference between the source domain image set and the target domain image set is thereby converted into solving for the self-adaptive optimal transmission distance between the image sets, where the transmission mapping space is:

$$\Gamma_{\le}(\mu,\nu) = \{\pi \in P(X \times Z) \mid \pi \mathbf{1}_m \le \mu,\ \pi^{T} \mathbf{1}_n \le \nu\}$$

After the self-adaptive optimal transmission problem between the image sets is solved by optimization, the self-adaptive optimal transmission distance $W(X,Z)$ between the image sets is obtained and taken as the degree of difference between the source domain image set and the target domain image set.
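The text defines W(X, Z), the transmission mapping π, and the cost C_ij, but the objective that ties them together is not reproduced here. A plausible reading, assuming the standard linear (Kantorovich) objective evaluated over the relaxed transmission mapping space, is:

$$W(X,Z) = \min_{\pi \in \Gamma_{\le}(\mu,\nu)} \sum_{i}\sum_{j} \pi_{ij}\, C_{ij}$$

For comparison, the classical optimal transmission problem uses the mass-conserving space $\Gamma(\mu,\nu) = \{\pi \in P(X \times Z) \mid \pi \mathbf{1}_m = \mu,\ \pi^{T} \mathbf{1}_n = \nu\}$, which forces every unit of mass to be transmitted. Under the inequality constraints, mass may be left untransmitted, so under the assumed objective an image whose pairings all have non-negative cost (an outlier, for instance) need not take part in the mapping.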

Specifically, taking the autonomous driving sunny-scene and snowy-scene image sets as an example, this embodiment uses the self-adaptive optimal transmission model to calculate the degree of difference between the two image sets;

Let the autonomous driving sunny-scene image set $X = \{x_i\}$ follow the probability measure $\mu = \sum_i p_i \delta_{x_i}$ and the autonomous driving snowy-scene image set $Z = \{z_j\}$ follow the probability measure $\nu = \sum_j q_j \delta_{z_j}$, where $\delta$ is the Dirac function, measure $\mu$ has probability mass $p_i$ at image $x_i$, and measure $\nu$ has probability mass $q_j$ at image $z_j$, with $p_i$ and $q_j$ satisfying $\sum_i p_i = 1$ and $\sum_j q_j = 1$. The sunny-scene image set has classification labels, with label set $\{y_i\}$, while the snowy-scene image set has no classification labels. $C$ is the transmission cost function, and $C_{ij}$ represents the cost of transmitting a unit of probability mass from sunny-scene image $x_i$ to snowy-scene image $z_j$. The transmission mapping is represented by a joint probability $\pi$, where $\pi_{ij}$ represents the probability mass transmitted from $x_i$ to $z_j$. $P$ represents the space of probability measures, and $\mathbf{1}_m$ and $\mathbf{1}_n$ represent the m-dimensional and n-dimensional all-ones vectors. The deep network comprises an image feature extractor $g(\cdot)$, which maps images to the feature space, and a classifier $f(\cdot)$, which maps images from the feature space to the classification label space. The following formula is used to calculate the cost of transmitting a unit of probability mass from sunny-scene image $x_i$ to snowy-scene image $z_j$:

$$C_{ij} = \lVert g(x_i) - g(z_j) \rVert - \alpha\, y_i \tanh\big(f(g(z_j))\big).$$
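The cost formula above can be evaluated pairwise as in the following sketch. Several points are assumptions made for the example, since the text does not fix them: the norm is taken to be Euclidean, each label y_i is a one-hot vector, the product of y_i with the tanh term is an inner product over classes, and `tgt_scores[j]` holds the precomputed classifier output f(g(z_j)). The function name is hypothetical.

```python
import math

def transmission_cost(src_feats, tgt_feats, src_labels, tgt_scores, alpha=1.0):
    """Return C with C[i][j] = ||g(x_i) - g(z_j)|| - alpha * <y_i, tanh(f(g(z_j)))>."""
    cost = []
    for gx, y in zip(src_feats, src_labels):
        row = []
        for gz, scores in zip(tgt_feats, tgt_scores):
            feature_term = math.dist(gx, gz)          # distance in feature space
            label_term = sum(yk * math.tanh(sk)       # agreement in label space
                             for yk, sk in zip(y, scores))
            row.append(feature_term - alpha * label_term)
        cost.append(row)
    return cost
```

With this form, a target image that is both close to a source image in feature space and confidently predicted as that source image's class receives a low, possibly negative, cost, while a distant or disagreeing pair receives a high one.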

The cost function $C$ is constructed so that the images are aligned in the feature space and the label space simultaneously; aligning them in only the feature space or only the label space is not comprehensive. Calculating the degree of difference between the autonomous driving sunny-scene image set and snowy-scene image set is modeled as solving for the self-adaptive optimal transmission distance between the two image sets, where $\alpha$ is a non-negative coefficient and the transmission mapping space is:

$$\Gamma_{\le}(\mu,\nu) = \{\pi \in P(X \times Z) \mid \pi \mathbf{1}_m \le \mu,\ \pi^{T} \mathbf{1}_n \le \nu\}$$

After the self-adaptive optimal transmission problem is solved by optimization, the self-adaptive optimal transmission distance between the image sets is obtained and used as the degree of difference between the autonomous driving sunny-scene and snowy-scene image sets;
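The text does not name an optimizer for this problem. As one possible illustration, the sketch below assumes the objective is the linear sum of π_ij · C_ij and solves the inequality-constrained problem exactly as a minimum-cost flow using successive shortest paths; augmentation stops as soon as no remaining path lowers the objective. The function name `adaptive_ot` and the tolerance are assumptions, and the routine is meant for small examples rather than mini-batch training.

```python
def adaptive_ot(cost, p, q, tol=1e-12):
    """Minimise sum_ij pi[i][j]*cost[i][j] with pi >= 0, row sums <= p, column sums <= q."""
    n, m = len(p), len(q)
    src, snk, num_nodes = 0, n + m + 1, n + m + 2   # nodes: src, x_i, z_j, snk
    edges = []                        # [head, residual capacity, cost]; k ^ 1 reverses k
    adj = [[] for _ in range(num_nodes)]

    def add_edge(u, v, cap, c):
        adj[u].append(len(edges)); edges.append([v, cap, c])
        adj[v].append(len(edges)); edges.append([u, 0.0, -c])

    for i in range(n):
        add_edge(src, 1 + i, p[i], 0.0)             # at most p_i leaves x_i
    for j in range(m):
        add_edge(1 + n + j, snk, q[j], 0.0)         # at most q_j arrives at z_j
    pair_edge = {}
    for i in range(n):
        for j in range(m):
            pair_edge[i, j] = len(edges)
            add_edge(1 + i, 1 + n + j, float("inf"), cost[i][j])

    while True:
        # Bellman-Ford: residual edges can carry negative cost.
        dist = [float("inf")] * num_nodes
        prev = [-1] * num_nodes
        dist[src] = 0.0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == float("inf"):
                    continue
                for k in adj[u]:
                    v, cap, c = edges[k]
                    # tol guards against rounding-induced zero-cost cycles
                    if cap > tol and dist[u] + c < dist[v] - tol:
                        dist[v], prev[v], changed = dist[u] + c, k, True
            if not changed:
                break
        if dist[snk] >= -tol:         # moving more mass would not reduce the objective
            break
        flow, v = float("inf"), snk   # bottleneck capacity along the path
        while v != src:
            flow = min(flow, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:               # push the flow and update residuals
            k = prev[v]
            edges[k][1] -= flow
            edges[k ^ 1][1] += flow
            v = edges[k ^ 1][0]

    plan = [[edges[pair_edge[i, j] ^ 1][1] for j in range(m)] for i in range(n)]
    distance = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return distance, plan
```

For `cost = [[-1.0, 2.0], [3.0, 0.5]]` with uniform masses of 0.5, only the pair with negative cost carries mass; the second source image and second target image stay unmatched, which is the behaviour the relaxed constraints permit.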

S7: calculating the image classification loss: based on the predicted classification labels of the images, the classification loss of the image classifier on the source domain images is calculated; this embodiment uses the cross-entropy loss function. As described in step S6, the source domain image set $X = \{x_i\}$ has the corresponding classification label set $\{y_i\}$; $g(\cdot)$ represents the image feature extractor and $f(\cdot)$ represents the image classifier. The cross-entropy loss function is specifically expressed as:

In this embodiment, the cross-entropy loss function is used to calculate the classification loss of the image classifier on the autonomous driving sunny-scene images. The invention is not limited to the cross-entropy loss function; other classification loss functions are also applicable.
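A minimal sketch of the cross-entropy loss on labeled source images follows. It assumes the common form, the average of −log of the probability the classifier assigns to the true class, with labels given as integer class indices and `probs[i]` standing for the softmax output f(g(x_i)). The exact normalisation used in the embodiment is not reproduced in the text, so the averaging is an assumption.

```python
import math

def cross_entropy(probs, labels, eps=1e-12):
    """Average negative log-probability of the true class over source images."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(max(p[y], eps))   # eps avoids log(0)
    return total / len(labels)
```

A confident correct prediction contributes a loss near zero, while an evenly split two-class prediction contributes log 2.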

S8: calculating an objective function of the neural network: the objective function of the neural network comprises an image classification loss and the difference degree of a source field image set and a target field image set, and specifically, the objective function is the sum of the image classification loss and the difference degree of the image set, wherein the image classification loss is calculated according to the step S7, and the difference degree of the source field image set and the target field image set is calculated according to the step S6;

S9: neural network gradient backpropagation: gradients are backpropagated through the deep neural network (comprising the image feature extractor and the image classifier) according to the objective function of the neural network (obtained in step S8), and the parameters of the deep neural network are updated;

S10: training a neural network: repeating the steps S4 to S9 until the neural network converges, for example: if the updated iteration number reaches the maximum iteration number, judging that the neural network is converged;

S11: outputting an image classification result: inputting the target field image, extracting the characteristics through the trained image characteristic extractor to obtain the characteristic embedding of the image, classifying the characteristic embedding of the target field image by utilizing the trained image classifier, and outputting the classification label of the image.

In this embodiment, a classification result of the automatic driving snow scene image is output, the automatic driving snow scene image is classified by using a trained image classifier, and a classification result, such as a pedestrian recognition or object classification result, is output.
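Steps S8 and S9 above can be summarised in symbols. The notation is introduced here for illustration only: $\mathcal{L}_{\mathrm{cls}}$ is the image classification loss from step S7, $W(X,Z)$ is the degree of difference from step S6, $\theta$ collects the parameters of the image feature extractor and the image classifier, and $\eta$ is a learning rate. The plain gradient-descent update is an assumption; the text only states that gradients are backpropagated and the parameters updated.

$$\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{cls}} + W(X,Z), \qquad \theta \leftarrow \theta - \eta\, \nabla_{\theta} \mathcal{L}(\theta)$$

Step S10 repeats this update until a stopping rule is met, for example a maximum number of iterations.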

The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included in the protection scope of the present invention.

Claims

1. The cross-domain image classification method based on the self-adaptive optimal transmission is characterized by comprising the following steps of:

Carrying out data enhancement on the preprocessed image;

2. The cross-domain image classification method based on self-adaptive optimal transmission according to claim 1, wherein the image feature extractor is constructed based on a deep convolutional network, and specifically the image feature extractor is constructed with the deep residual network ResNet.

3. The cross-domain image classification method based on self-adaptive optimal transmission according to claim 1, wherein the image classifier is constructed based on a fully connected network, and specifically the image classifier is constructed with a single-layer or multi-layer fully connected network and a softmax layer.

4. The cross-domain image classification method based on self-adaptive optimal transmission according to claim 1, wherein the self-adaptive optimal transmission distance problem between the source domain image set and the target domain image set is solved in an optimized manner, so as to obtain the self-adaptive optimal transmission distance between the source domain image set and the target domain image set, which is specifically expressed as:

$$C_{ij} = \lVert g(x_i) - g(z_j) \rVert - \alpha\, y_i \tanh\big(f(g(z_j))\big)$$

$$\Gamma_{\le}(\mu,\nu) = \{\pi \in P(X \times Z) \mid \pi \mathbf{1}_m \le \mu,\ \pi^{T} \mathbf{1}_n \le \nu\}$$

where $X$ represents the source domain image set; $Z$ represents the target domain image set; $W(X,Z)$ represents the self-adaptive optimal transmission distance between the source domain image set and the target domain image set; $\Gamma_{\le}(\mu,\nu)$ represents the transmission mapping space; $\pi$ represents a transmission mapping; $\pi_{ij}$ represents the probability mass transmitted from source domain image $x_i$ to target domain image $z_j$; $C_{ij}$ represents the cost of transmitting a unit of probability mass from $x_i$ to $z_j$; $\mu$ represents the probability measure of the source domain image set; $\nu$ represents the probability measure of the target domain image set; $\delta_{x_i}$ represents the Dirac function of source domain image $x_i$; $\delta_{z_j}$ represents the Dirac function of target domain image $z_j$; $p_i$ represents the probability mass of measure $\mu$ at $x_i$; $q_j$ represents the probability mass of measure $\nu$ at $z_j$; $P$ represents the space of probability measures; $\mathbf{1}_m$ and $\mathbf{1}_n$ represent the m-dimensional and n-dimensional all-ones vectors; $g(x_i)$ represents the image feature extractor mapping a source domain image to the feature space; $g(z_j)$ represents the image feature extractor mapping a target domain image to the feature space; $f(\cdot)$ represents the image classifier; $\alpha$ is a non-negative coefficient; and $\tanh$ is the hyperbolic tangent function.

5. The cross-domain image classification method based on adaptive optimal transmission according to claim 1, wherein the image classification loss of the image classifier on the source domain image is calculated, specifically expressed as:

6. A cross-domain image classification system based on adaptive optimal transmission, comprising: the system comprises an original image acquisition module, an image preprocessing module, an image enhancement module, an image feature extractor, an image classifier, a self-adaptive optimal transmission distance calculation module, an image classification loss calculation module, an objective function construction module, a training module and a classification result output module;

7. The cross-domain image classification system based on self-adaptive optimal transmission according to claim 6, wherein the image feature extractor is constructed based on a deep convolutional network, and specifically the image feature extractor is constructed with the deep residual network ResNet.

8. The cross-domain image classification system based on self-adaptive optimal transmission according to claim 6, wherein the image classifier is constructed based on a fully connected network, and specifically the image classifier is constructed with a single-layer or multi-layer fully connected network and a softmax layer.

9. The cross-domain image classification system based on adaptive optimal transmission according to claim 6, wherein an adaptive optimal transmission distance problem between a source domain image set and a target domain image set is solved in an optimized manner, so as to obtain an adaptive optimal transmission distance between the source domain image set and the target domain image set, which is specifically expressed as:

$$C_{ij} = \lVert g(x_i) - g(z_j) \rVert - \alpha\, y_i \tanh\big(f(g(z_j))\big)$$

$$\Gamma_{\le}(\mu,\nu) = \{\pi \in P(X \times Z) \mid \pi \mathbf{1}_m \le \mu,\ \pi^{T} \mathbf{1}_n \le \nu\}$$

where $X$ represents the source domain image set; $Z$ represents the target domain image set; $W(X,Z)$ represents the self-adaptive optimal transmission distance between the source domain image set and the target domain image set; $\Gamma_{\le}(\mu,\nu)$ represents the transmission mapping space; $\pi$ represents a transmission mapping; $\pi_{ij}$ represents the probability mass transmitted from source domain image $x_i$ to target domain image $z_j$; $C_{ij}$ represents the cost of transmitting a unit of probability mass from $x_i$ to $z_j$; $\mu$ represents the probability measure of the source domain image set; $\nu$ represents the probability measure of the target domain image set; $\delta_{x_i}$ represents the Dirac function of source domain image $x_i$; $\delta_{z_j}$ represents the Dirac function of target domain image $z_j$; $p_i$ represents the probability mass of measure $\mu$ at $x_i$; $q_j$ represents the probability mass of measure $\nu$ at $z_j$; $P$ represents the space of probability measures; $\mathbf{1}_m$ and $\mathbf{1}_n$ represent the m-dimensional and n-dimensional all-ones vectors; $g(x_i)$ represents the image feature extractor mapping a source domain image to the feature space; $g(z_j)$ represents the image feature extractor mapping a target domain image to the feature space; $f(\cdot)$ represents the image classifier; $\alpha$ is a non-negative coefficient; and $\tanh$ is the hyperbolic tangent function.

10. The cross-domain image classification system based on adaptive optimal transmission according to claim 6, wherein the image classification loss calculation module is configured to calculate an image classification loss of the image classifier on the source domain image based on the prediction classification label of the image, specifically expressed as: