CN113205522B - Intelligent image cropping method and system based on adversarial domain adaptation - Google Patents

Intelligent image cropping method and system based on adversarial domain adaptation

Info

Publication number
CN113205522B
CN113205522B (application CN202110466563.XA)
Authority
CN
China
Prior art keywords
aesthetic
domain
loss
sample
classifiers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110466563.XA
Other languages
Chinese (zh)
Other versions
CN113205522A (en)
Inventor
Sang Nong (桑农)
Wang Haowen (王皓文)
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202110466563.XA
Publication of CN113205522A
Application granted
Publication of CN113205522B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Abstract

The invention provides an intelligent image cropping method and system based on adversarial domain adaptation, belonging to the field of computer vision. The method comprises the following steps: inputting a target domain image to be cropped in a target application scene into a trained feature extractor to obtain global features; resampling the global features according to preset cropping modes; inputting the resulting regional features into an aesthetic classifier for aesthetic scoring, and screening out the cropping result. The feature extractor is trained as follows: the gradient of the domain adaptation loss computed on target domain samples is inverted before being back-propagated to the feature extractor, while the gradient computed on source domain samples is passed to the feature extractor unchanged, so that the feature extractor learns to align the global features of source and target domain samples; the feature extractor also adjusts its own parameters according to the aesthetic loss, learning aesthetic analysis capability. The aesthetic classifier is trained by adjusting its parameters according to the aesthetic loss. The invention solves the problem that the performance of existing intelligent cropping methods degrades significantly under cross-domain testing.

Description

Intelligent image cropping method and system based on adversarial domain adaptation
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an intelligent image cropping method and system based on adversarial domain adaptation.
Background
The intelligent cropping task is to crop, from an original image, a region with a more reasonable composition and use it as a newly generated image, replacing the low-quality original with the high-quality result. Because it involves only cropping, the operation is simple and convenient, yet it can markedly improve the aesthetic quality of an image, so image cropping is widely used as an image editing operation. However, manual cropping has a certain technical threshold: it is time-consuming, and not a skill everyone can master. It is therefore significant to crop automatically with an intelligent algorithm. Compared with manual cropping, intelligent cropping does not require the user to have high aesthetic literacy or to know composition rules such as the rule of thirds and the diagonal rule; it can quickly and reasonably crop a large number of images and present them to the user for selection.
Current intelligent cropping methods can be divided into two main categories according to the distribution of training and test data. The first category comprises cropping algorithms with aesthetic supervision on the test domain, whose supervision information includes an aesthetic label for each cropped sub-image. The second category comprises cropping algorithms that are unsupervised on the test domain; these must learn aesthetic judgment from labels in the training domain and then generalize and reason on the test domain. For the second category, the domain of the training samples is generally called the source domain, and the domain of the test samples the target domain. In practical applications, a researcher cannot predict the domain of the samples to be cropped and cannot prepare a training set for every scene. The source domain of the training samples and the target domain of the test samples therefore often differ, and this difference degrades the performance of the trained model in the target domain scene, a phenomenon known as domain shift.
Existing intelligent cropping algorithms do not aim to solve the domain shift problem; most are trained on a single dataset, with the trained model assumed to be suitable for most situations. In fact, however, the intelligent cropping task also suffers from a significant domain shift problem, and the performance of existing cropping models degrades significantly in cross-domain and cross-dataset testing.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide an intelligent image cropping method and system based on adversarial domain adaptation, so as to solve the problem that, because existing intelligent cropping methods do not consider domain shift, the performance of the cropping model degrades remarkably in cross-domain and cross-dataset testing.
In order to achieve the above object, in one aspect, the present invention provides an intelligent image cropping method based on adversarial domain adaptation, comprising the following steps:
inputting a target domain image to be cropped in a target application scene into a trained feature extractor to obtain global features;
resampling the global features according to preset cropping modes, inputting the obtained regional features into a trained aesthetic classifier for aesthetic scoring, and screening out the cropping result;
wherein the feature extractor is trained as follows: the gradient of the domain adaptation loss computed on target domain samples is inverted before being back-propagated to the feature extractor, while the gradient of the domain adaptation loss computed on source domain samples is passed to the feature extractor unchanged, so that the feature extractor learns to align the global features of source and target domain samples; the feature extractor also adjusts its own parameters according to the aesthetic loss, learning aesthetic analysis capability; the aesthetic classifier is trained by adjusting its parameters according to the aesthetic loss;
a target domain sample is any unlabeled image in the target application scene, and a source domain sample is an image from another scene with cropped sub-images and aesthetic labels; the domain adaptation loss is computed from the sample's domain label and the domain discrimination result; the domain discrimination result is obtained by feeding the global features into a domain discriminator; the aesthetic loss is computed from the aesthetic score and the aesthetic label.
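The selective gradient handling described above can be sketched in a few lines (assumption: the function names are illustrative; in practice this is implemented as a custom autograd layer in a deep learning framework):

```python
import numpy as np

def grl_forward(features):
    # Forward pass is the identity: features flow through unchanged.
    return features

def grl_backward(grad, from_target_domain):
    # Backward pass: gradients of the domain-adaptation loss are inverted
    # for target-domain samples and passed through unchanged for
    # source-domain samples, pushing the extractor to align both domains.
    return -grad if from_target_domain else grad

g = np.array([0.5, -1.0])
print(grl_backward(g, from_target_domain=True))   # sign flipped
print(grl_backward(g, from_target_domain=False))  # unchanged
```

In a framework such as PyTorch this would be a custom `autograd` function whose backward pass conditionally negates the incoming gradient before it reaches the feature extractor.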
Preferably, the method of training the feature extractor and the aesthetic classifier comprises the following steps:
inputting target domain samples and source domain samples into the feature extractor to extract global features;
inputting the global features into the domain discriminator, and combining the output domain discrimination result with the sample's domain label to compute the domain adaptation loss; resampling the global features to obtain regional features, whose channel dimension corresponds to the cropped sub-images;
passing the regional features to the aesthetic classifier to obtain aesthetic scores, and computing the aesthetic loss in combination with the aesthetic labels;
back-propagating the domain adaptation loss to the domain discriminator to optimize the discriminator's parameters;
inverting the domain adaptation loss gradient computed on target domain samples before passing it to the feature extractor, while passing the gradient computed on source domain samples to the feature extractor unchanged, so that the feature extractor learns to align the global features of source and target domain samples;
meanwhile, back-propagating the aesthetic loss to the aesthetic classifier and the feature extractor for parameter optimization.
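The step list above can be condensed into a toy training step (assumption: the "networks" are plain callables here, and the squared-error aesthetic term is an illustrative stand-in; real components are CNNs trained with an autodiff framework):

```python
import numpy as np

def training_step(x_src, x_tgt, extract, discriminate, score, labels):
    f_src, f_tgt = extract(x_src), extract(x_tgt)   # global features
    # Domain adaptation loss: the discriminator should output ~0 on
    # source samples and ~1 on target samples (cross-entropy over both).
    L_d = -(np.log(1.0 - discriminate(f_src)) + np.log(discriminate(f_tgt)))
    # Aesthetic loss: computed on the labeled source domain only
    # (squared error used here purely as an assumed stand-in).
    L_aes = float(np.mean((score(f_src) - labels) ** 2))
    return float(L_d), L_aes

# Toy components: identity extractor, constant discriminator, identity scorer.
extract = lambda x: x
discriminate = lambda f: 0.5
score = lambda f: f
L_d, L_aes = training_step(np.array([0.2]), np.array([0.9]),
                           extract, discriminate, score, np.array([0.3]))
print(L_d, L_aes)
```

An undecided discriminator (output 0.5 on both domains) yields the maximum-confusion domain loss of 2 ln 2, which is exactly the equilibrium the adversarial training drives toward.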
Preferably, the aesthetic score is obtained as follows:
passing the regional features to a plurality of aesthetic classifiers and obtaining an aesthetic score from each;
taking the mean of these scores as the final aesthetic score;
or passing the regional features to a single aesthetic classifier to obtain the aesthetic score.
Preferably, when there are a plurality of aesthetic classifiers, a consistency loss between the classifiers is computed from the aesthetic scores they produce;
a weight loss between the classifiers is computed from their parameters;
and an aesthetic-related loss function is constructed from the consistency loss, the weight loss, and the aesthetic loss, then back-propagated to the aesthetic classifiers and the feature extractor for parameter optimization.
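A sketch of scoring with two parallel aesthetic classifiers. Note the caveat: the patent's exact consistency-loss formula is reproduced only as an image in this source, so the mean absolute difference below is an illustrative assumption, not the patented formula:

```python
import numpy as np

def ensemble_score(P1, P2):
    # Final aesthetic score: element-wise mean of the two classifiers'
    # score sequences over the candidate crops.
    return (P1 + P2) / 2.0

def consistency_loss(P1, P2):
    # Penalizes disagreement between the two classifiers' predictions
    # (assumed form: mean absolute difference).
    return float(np.mean(np.abs(P1 - P2)))

P1 = np.array([0.8, 0.3, 0.6])   # scores from classifier 1 (toy values)
P2 = np.array([0.7, 0.4, 0.6])   # scores from classifier 2
print(ensemble_score(P1, P2))
print(consistency_loss(P1, P2))
```

Minimizing the disagreement term forces differently parameterized classifiers toward consistent outputs, which is the constraint the patent uses to help extract invariant features of the cropped sub-images.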
Preferably, the domain adaptation loss is:
L_d = -(y' log y + (1 - y') log(1 - y))
where y' = 0 when the current sample I comes from the source domain, y' = 1 when it comes from the target domain, and y is the output of the domain discriminator, normalized by softmax to a value between 0 and 1.
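This is the standard binary cross-entropy, implemented directly from the formula above (the `eps` guard is an implementation detail, not part of the patent's formula):

```python
import math

def domain_adaptation_loss(y, y_prime):
    # L_d = -(y' log y + (1 - y') log(1 - y)); y' is the domain label
    # (0 = source, 1 = target), y the normalized discriminator output.
    eps = 1e-12  # numerical guard against log(0) only
    return -(y_prime * math.log(y + eps) + (1 - y_prime) * math.log(1 - y + eps))

# A discriminator that scores a target sample near 1 incurs a small loss;
# the same score on a source sample incurs a large loss.
print(domain_adaptation_loss(0.9, 1))
print(domain_adaptation_loss(0.9, 0))
```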
Preferably, the aesthetic loss is:
[The aesthetic loss equations are rendered only as images in the original publication and are not reproduced here.]
where Sc is the aesthetic label sequence of the cropped sub-images; P is the normalized aesthetic score sequence; i is the cropping-mode index; n is the total number of cropping modes; and X_dis is the error between the aesthetic scores and the aesthetic labels;
the loss of consistency among aesthetic classifiers is:
[The consistency loss equation is rendered only as an image in the original publication and is not reproduced here.]
where there are two aesthetic classifiers, and P_1 and P_2 are the cropped sub-image aesthetic score sequences predicted by the two classifiers;
the weight loss between aesthetic classifiers is:
[The weight loss equation is rendered only as an image in the original publication and is not reproduced here.]
where there are two aesthetic classifiers, W_1 and W_2 are the parameters of the two classifiers, M is the length of the flattened classifier parameters, and j is the current parameter position.
On the other hand, the invention provides an intelligent image cropping system based on adversarial domain adaptation, comprising an adversarial domain adaptation module, a feature extractor, a resampling structure, and an aesthetic scoring module;
one port of the feature extractor exchanges data bidirectionally with the adversarial domain adaptation module, and the other port exchanges data bidirectionally with the aesthetic scoring module through the resampling structure;
the feature extractor is used for extracting global features of target domain images and source domain images; the domain adaptation loss gradient computed on target domain samples is inverted before being passed to the feature extractor, while the gradient computed on source domain samples is passed to it unchanged, so that the feature extractor learns to align the global features of source and target domain samples; the feature extractor also adjusts its own parameters according to the aesthetic loss, learning aesthetic analysis capability;
the resampling structure is used for resampling the global features according to preset cropping modes to obtain regional features;
the aesthetic scoring module is used for performing aesthetic scoring based on the regional features and screening out the cropping result; the parameters of the aesthetic classifier in the aesthetic scoring module are adjusted according to the aesthetic loss;
the adversarial domain adaptation module is used for judging, from the global features, whether the current sample is a target domain sample or a source domain sample and outputting a domain discrimination result; computing the domain adaptation loss from the domain discrimination result and the sample's domain label; back-propagating the domain adaptation loss to the domain discriminator to adjust the discriminator's parameters; and inverting the domain adaptation loss gradient computed on the target domain before passing it to the feature extractor, while passing the gradient computed on source domain samples to the feature extractor unchanged;
a target domain sample is any unlabeled image in the target application scene, and a source domain sample is an image from another scene with cropped sub-images and aesthetic labels;
the resampling structure comprises two ROIAlign layers, used respectively for extracting foreground features and background features of the crop region; the foreground and background features are concatenated along the channel dimension to generate the regional features.
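The two-branch resampling can be sketched as follows (assumption: a crude crop plus average pooling stands in for ROIAlign, which performs bilinear sub-pixel sampling; names are illustrative):

```python
import numpy as np

def pool_region(feature_map, box):
    # feature_map: (C, H, W); box: (y0, y1, x0, x1) in feature coordinates.
    y0, y1, x0, x1 = box
    return feature_map[:, y0:y1, x0:x1].mean(axis=(1, 2))

def region_feature(feature_map, crop_box):
    C, H, W = feature_map.shape
    fg = pool_region(feature_map, crop_box)      # foreground: crop region
    bg = pool_region(feature_map, (0, H, 0, W))  # background: whole map
    # Concatenate foreground and background along the channel dimension.
    return np.concatenate([fg, bg])              # shape (2C,)

fmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
feat = region_feature(fmap, (1, 3, 1, 3))
print(feat.shape)  # (4,)
```

Including the whole-image branch gives each candidate crop context about what it discards, which is what distinguishes this design from pooling the crop region alone.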
Preferably, the adversarial domain adaptation module comprises an adaptive gradient reversal layer, a domain discriminator, and a discrimination calculation unit;
the domain discriminator is used for judging, from the global features, whether the current sample is a target domain sample or a source domain sample and outputting a domain discrimination result, and adjusts its own parameters according to the domain adaptation loss;
the discrimination calculation unit is used for computing the domain adaptation loss from the domain discrimination result and the sample's domain label; the domain adaptation loss is passed back to the domain discriminator and the adaptive gradient reversal layer;
the adaptive gradient reversal layer is used for inverting the domain adaptation loss gradient computed on target domain samples before passing it to the feature extractor, while passing the gradient computed on source domain samples to the feature extractor unchanged.
Preferably, the aesthetic scoring module comprises one or more aesthetic classifiers and an aesthetic calculation unit;
if there is a single aesthetic classifier, it is connected to the resampling structure; if there are several, the classifiers are connected in parallel, with one end connected to the resampling structure and the other end connected to the aesthetic calculation unit;
each aesthetic classifier is used for aesthetic scoring based on the regional features;
the aesthetic calculation unit is used for computing and outputting the mean of the aesthetic scores; computing the consistency loss between the aesthetic classifiers from the scores they produce; computing the weight loss between the classifiers from their parameters; and constructing an aesthetic-related loss function from the consistency loss, the weight loss, and the aesthetic loss, which is back-propagated to the aesthetic classifiers and the feature extractor for parameter optimization.
Preferably, the aesthetic loss is:
[The aesthetic loss equations are rendered only as images in the original publication and are not reproduced here.]
where Sc is the aesthetic label sequence of the cropped sub-images; P is the normalized aesthetic score sequence; i is the cropping-mode index; n is the total number of cropping modes; and X_dis is the error between the aesthetic scores and the aesthetic labels;
the loss of consistency among aesthetic classifiers is:
[The consistency loss equation is rendered only as an image in the original publication and is not reproduced here.]
where there are two aesthetic classifiers, and P_1 and P_2 are the cropped sub-image aesthetic score sequences predicted by the two classifiers;
the weight loss between aesthetic classifiers is:
[The weight loss equation is rendered only as an image in the original publication and is not reproduced here.]
where there are two aesthetic classifiers, W_1 and W_2 are the parameters of the two classifiers, M is the length of the flattened classifier parameters, and j is the current parameter position.
In general, compared with the prior art, the technical solution conceived by the present invention has the following beneficial effects:
The feature extractor in the proposed intelligent image cropping method is trained with the help of an adaptive gradient reversal layer and a domain discriminator. The domain adaptation loss reflects how well the discriminator distinguishes source domain samples from target domain samples, and the discriminator's parameters are continuously adjusted to reduce this loss: the smaller the domain adaptation loss, the stronger the discriminator's ability to tell source domain samples from target domain samples. The adaptive gradient reversal layer inverts the back-propagated gradient of the domain adaptation loss computed on target domain samples, so that the feature extractor aligns the global features of source and target domain samples.
The proposed adversarial domain adaptation based cropping method supports building the aesthetic scoring module from several aesthetic classifiers with identical structure but different parameters. On the one hand, the module predicts the aesthetic score of each cropped sub-image; on the other hand, a consistency loss between classifiers constrains the outputs of the classifiers to agree, which assists in extracting invariant features of the cropped sub-images. Meanwhile, a weight loss between classifiers adjusts each classifier's parameters so that classifiers with different parameters output consistent results as far as possible, enhancing the cropping model's adaptability to different input images.
Drawings
FIG. 1 is a flowchart of the intelligent image cropping method based on adversarial domain adaptation provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the intelligent image cropping system based on adversarial domain adaptation provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the CNN structure and data flow provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In one aspect, as shown in FIG. 1, the present invention provides an intelligent image cropping method based on adversarial domain adaptation, comprising the following steps:
inputting a target domain image to be cropped in a target application scene into a trained feature extractor to obtain global features;
resampling the global features according to preset cropping modes, inputting the obtained regional features into a trained aesthetic classifier for aesthetic scoring, and screening out the cropping result;
wherein the feature extractor is trained as follows: the gradient of the domain adaptation loss computed on target domain samples is inverted before being back-propagated to the feature extractor, while the gradient computed on source domain samples is passed to the feature extractor unchanged, so that the feature extractor learns to align the global features of source and target domain samples; the feature extractor also adjusts its own parameters according to the aesthetic loss, learning aesthetic analysis capability; the aesthetic classifier is trained by adjusting its parameters according to the aesthetic loss;
a target domain sample is any unlabeled image in the target application scene, and a source domain sample is an image from another scene with cropped sub-images and aesthetic labels; the domain adaptation loss is computed from the sample's domain label and the domain discrimination result; the domain discrimination result is obtained by feeding the global features into the domain discriminator; the aesthetic loss is computed from the aesthetic score and the aesthetic label.
Preferably, the method of training the feature extractor and the aesthetic classifier comprises the following steps:
inputting target domain images and source domain images into the feature extractor to extract global features;
inputting the global features into the domain discriminator, and combining the output domain discrimination result with the sample's domain label to compute the domain adaptation loss; resampling the global features to obtain regional features, whose channel dimension corresponds to the cropped sub-images;
passing the regional features to the aesthetic classifier to obtain aesthetic scores, and computing the aesthetic loss in combination with the aesthetic labels;
back-propagating the domain adaptation loss to the domain discriminator to optimize the discriminator's parameters;
inverting the domain adaptation loss gradient computed on target domain samples before passing it to the feature extractor, while passing the gradient computed on source domain samples to the feature extractor unchanged, so that the feature extractor learns to align the global features of source and target domain samples;
meanwhile, back-propagating the aesthetic loss to the aesthetic classifier and the feature extractor for parameter optimization.
Preferably, the aesthetic score is obtained as follows:
passing the regional features to a plurality of aesthetic classifiers and obtaining an aesthetic score from each;
taking the mean of these scores as the final aesthetic score;
or passing the regional features to a single aesthetic classifier to obtain the aesthetic score.
Preferably, when there are a plurality of aesthetic classifiers, a consistency loss between the classifiers is computed from the aesthetic scores they produce;
a weight loss between the classifiers is computed from their parameters;
and an aesthetic-related loss function is constructed from the consistency loss, the weight loss, and the aesthetic loss, then back-propagated to the aesthetic classifiers and the feature extractor for parameter optimization.
Preferably, the domain adaptation loss is:
L_d = -(y' log y + (1 - y') log(1 - y))
where y' = 0 when the current sample I comes from the source domain, y' = 1 when it comes from the target domain, and y is the output of the domain discriminator, normalized by softmax to a value between 0 and 1.
Preferably, the aesthetic loss is:
[The aesthetic loss equations are rendered only as images in the original publication and are not reproduced here.]
where Sc is the aesthetic label sequence of the cropped sub-images; P is the normalized aesthetic score sequence; i is the cropping-mode index; n is the total number of cropping modes; and X_dis is the error between the aesthetic scores and the aesthetic labels;
the loss of consistency among aesthetic classifiers is:
[The consistency loss equation is rendered only as an image in the original publication and is not reproduced here.]
where there are two aesthetic classifiers, and P_1 and P_2 are the cropped sub-image aesthetic score sequences predicted by the two classifiers;
the weight loss between aesthetic classifiers is:
[The weight loss equation is rendered only as an image in the original publication and is not reproduced here.]
where there are two aesthetic classifiers, W_1 and W_2 are the parameters of the two classifiers, M is the length of the flattened classifier parameters, and j is the current parameter position.
In another aspect, as shown in FIG. 2 and FIG. 3, the present invention provides an intelligent image cropping system based on adversarial domain adaptation, comprising an adversarial domain adaptation module, a feature extractor, a resampling structure, and an aesthetic scoring module;
one port of the feature extractor exchanges data bidirectionally with the adversarial domain adaptation module, and the other port exchanges data bidirectionally with the aesthetic scoring module through the resampling structure;
the feature extractor is used for extracting global features of target domain images and source domain images; the domain adaptation loss gradient computed on target domain samples is inverted before being passed to the feature extractor, while the gradient computed on source domain samples is passed to it unchanged, so that the feature extractor learns to align the global features of source and target domain samples; the feature extractor also adjusts its own parameters according to the aesthetic loss, learning aesthetic analysis capability;
the resampling structure is used for resampling the global features according to preset cropping modes to obtain regional features;
the aesthetic scoring module is used for performing aesthetic scoring based on the regional features and screening out the cropping result; the parameters of the aesthetic classifier in the aesthetic scoring module are adjusted according to the aesthetic loss;
the adversarial domain adaptation module is used for judging, from the global features, whether the current sample is a target domain sample or a source domain sample and outputting a domain discrimination result; computing the domain adaptation loss from the domain discrimination result and the sample's domain label; back-propagating the domain adaptation loss to the domain discriminator to adjust the discriminator's parameters; and inverting the domain adaptation loss gradient computed on the target domain before passing it to the feature extractor, while passing the gradient computed on source domain samples to the feature extractor unchanged;
a target domain sample is any unlabeled image in the target application scene, and a source domain sample is an image from another scene with cropped sub-images and aesthetic labels;
the resampling structure comprises two ROIAlign layers, used respectively for extracting foreground features and background features of the crop region; the foreground and background features are concatenated along the channel dimension to generate the regional features.
Preferably, the adversarial domain adaptation module comprises an adaptive gradient reversal layer, a domain discriminator, and a discrimination calculation unit;
the domain discriminator is used for judging, from the global features, whether the current sample is a target domain sample or a source domain sample and outputting a domain discrimination result, and adjusts its own parameters according to the domain adaptation loss;
the discrimination calculation unit is used for computing the domain adaptation loss from the domain discrimination result and the sample's domain label; the domain adaptation loss is passed back to the domain discriminator and the adaptive gradient reversal layer;
the adaptive gradient reversal layer is used for inverting the domain adaptation loss gradient computed on target domain samples before passing it to the feature extractor, while passing the gradient computed on source domain samples to the feature extractor unchanged.
Preferably, the aesthetic scoring module comprises one or more aesthetic classifiers and an aesthetic calculation unit;
if there is a single aesthetic classifier, it is connected to the resampling structure; if there are several, the classifiers are connected in parallel, with one end connected to the resampling structure and the other end connected to the aesthetic calculation unit;
each aesthetic classifier is used for aesthetic scoring based on the regional features;
the aesthetic calculation unit is used for computing and outputting the mean of the aesthetic scores; computing the consistency loss between the aesthetic classifiers from the scores they produce; computing the weight loss between the classifiers from their parameters; and constructing an aesthetic-related loss function from the consistency loss, the weight loss, and the aesthetic loss, which is back-propagated to the aesthetic classifiers and the feature extractor for parameter optimization.
Preferably, the aesthetic loss is:

L_A = (1/n) Σ_{i=1}^{n} X_dis(i),  with X_dis(i) = |P(i) − Sc(i)|

wherein Sc is the aesthetic label sequence of the cropped subgraphs; P represents the normalized aesthetic score sequence; i is the cropping mode number; n is the total number of cropping modes; X_dis is the error between the aesthetic score and the aesthetic label;
the loss of consistency among aesthetic classifiers is:
L_c = (1/n) Σ_{i=1}^{n} |P1(i) − P2(i)|

wherein two aesthetic classifiers are provided; P1 and P2 are the cropped-subgraph aesthetic score sequences predicted by the two aesthetic classifiers;
the weight loss between aesthetic classifiers is:
L_w = |Σ_{j=1}^{M} W1(j)·W2(j)| / (‖W1‖·‖W2‖)

wherein two aesthetic classifiers are provided; W1 and W2 are the parameters of the two aesthetic classifiers; M is the length of the flattened classifier parameters, and j is the current parameter position.

Examples
As shown in FIG. 1, the invention provides an intelligent image cropping method based on adversarial domain adaptation, which comprises the following steps:
a training stage:
(1) taking an RGB (color) image without a label collected in a target application scene as a target domain sample, and taking an RGB image with a cut subgraph and an aesthetic label in another scene as a source domain sample;
in particular, assume that there is one source domain sample, comprising one source domain image I_s and a set of aesthetic labels a_s = {a_s(0), ..., a_s(n)}; the n aesthetic labels correspond to the n cropped subgraphs; each cropped subgraph is cut from the source domain image and corresponds to one of the n preset cropping modes B = {b(0), ..., b(n)};
a target domain sample comprises only one target domain image I_t, without aesthetic labels a_t; however, the n preset cropping modes B = {b(0), ..., b(n)} are known, and the target domain samples are cropped in the same cropping modes as the source domain samples;
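The sample layout just described can be pictured with a small container. The names below (`Sample`, `Box`) are illustrative, not from the patent: an image is paired with the n preset cropping modes, and aesthetic labels are present only on the source side.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) of one preset cropping mode b(i)

@dataclass
class Sample:
    image: np.ndarray                     # H x W x 3 RGB image (I_s or I_t)
    boxes: List[Box]                      # the n cropping modes B = {b(0), ..., b(n)}
    labels: Optional[List[float]] = None  # aesthetic labels a_s; None for target domain

# A labeled source sample and an unlabeled target sample sharing the same crops.
src = Sample(np.zeros((224, 224, 3)), [(0, 0, 128, 128), (32, 32, 192, 192)], [0.7, 0.4])
tgt = Sample(np.zeros((224, 224, 3)), src.boxes)
```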
(2) inputting the paired source domain image and target domain image into a feature extractor, and extracting global features of the source domain image and the target domain image; transmitting the global features to a resampling structure to obtain regional features;
specifically, a source domain sample and a target domain sample are combined to be used as a basic unit of one iteration optimization;
inputting the source domain sample and the target domain sample into the feature extractor, and acquiring their global features F_global^s and F_global^t;
according to the positions of the cropped subgraphs, the global features F_global^s and F_global^t are resampled to obtain the region features F_area^s and F_area^t of the source domain sample and the target domain sample; the channel dimension of the region features corresponds to the cropped subgraphs;
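The resampling step can be sketched as follows. This toy NumPy stand-in replaces the two real ROIAlign layers with plain slicing and average pooling (an assumption made for brevity): each crop box yields pooled foreground features plus whole-map background features, concatenated along the channel dimension, giving one region feature per cropping mode.

```python
import numpy as np

def region_features(f_global, boxes):
    """For each crop box (x0, y0, x1, y1), pool the C x H x W feature map inside
    the box (foreground) and over the whole map (background), then concatenate
    the two along the channel dimension -- a crude stand-in for the patent's
    pair of ROIAlign layers."""
    feats = []
    bg = f_global.mean(axis=(1, 2))                       # C background channels
    for (x0, y0, x1, y1) in boxes:
        fg = f_global[:, y0:y1, x0:x1].mean(axis=(1, 2))  # C foreground channels
        feats.append(np.concatenate([fg, bg]))            # 2C channels per crop
    return np.stack(feats)                                # n_crops x 2C

f_area = region_features(np.random.rand(8, 16, 16), [(0, 0, 8, 8), (4, 4, 12, 12)])
print(f_area.shape)  # (2, 16): one 2C-dimensional region feature per cropping mode
```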
(3) sending the global characteristics of the source domain sample and the target domain sample into a domain discriminator, and calculating the domain adaptive loss according to the image label;
specifically, the domain discriminator D_domain is used for predicting which domain the current sample belongs to and is formed by stacked convolutions; its input is the global feature of the sample, and its output y = D_domain(F_global) represents the domain prediction for the current sample, where F_global is the global feature of the current sample;
the domain adaptation loss calculated by the domain arbiter is:
L_d = −(y′ log y + (1 − y′) log(1 − y))
when the input image I comes from a source domain, y' is 0; when the input image I is from the target domain, y' is 1; y is an output result of the domain discriminator, and the value is between 0 and 1 after the output result is subjected to softmax normalization;
embedding a self-adaptive gradient inversion layer between the feature extractor and the domain discriminator; the function of this layer is to invert the domain adaptation loss gradient computed for the target domain sample; when the domain discriminator discriminates the source domain sample from the target domain sample, the gradient direction of the source domain sample and the gradient direction of the target domain sample received by the feature extractor are the same, and the feature extractor learns the capability of aligning the features of the source domain sample and the target domain sample;
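The domain adaptation loss above is a standard binary cross-entropy; a direct transcription follows (the small epsilon is a numerical-safety detail added here, not part of the patent's formula).

```python
import numpy as np

def domain_adaptation_loss(y, y_prime, eps=1e-12):
    """L_d = -(y' * log(y) + (1 - y') * log(1 - y)), where y' = 0 for a
    source-domain sample, y' = 1 for a target-domain sample, and y is the
    discriminator's softmax-normalized output in (0, 1)."""
    return -(y_prime * np.log(y + eps) + (1 - y_prime) * np.log(1 - y + eps))

# A confused discriminator (y = 0.5) pays about log 2 on every sample;
# a confident, correct one pays almost nothing.
print(domain_adaptation_loss(0.5, 1))
print(domain_adaptation_loss(0.99, 1))
```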
(4) transmitting the regional characteristics of the source domain and the regional characteristics of the target domain to an aesthetic scoring module, and calculating an aesthetic correlation loss function;
the region features F_area of a sample are the input of the aesthetic scoring module, whose output is P ∈ R^{K×1}, each dimension corresponding to the aesthetic score of one cropped subgraph;
the aesthetic scoring module comprises two aesthetic classifiers with the same structure but different parameters, arranged in parallel; the two classifiers simultaneously receive F_area and independently predict the aesthetic scores of all cropped subgraphs, and the average of the two classifiers' outputs is taken as the aesthetic score of the module;
in order to endow the aesthetic classifiers with aesthetic appreciation capability, the aesthetic labels are used as supervision, and the parameters of the aesthetic classifiers and the feature extractor are continuously adjusted according to the aesthetic loss L_A so as to minimize it; the smaller the aesthetic loss, the more accurate the predictions of the aesthetic classifiers. Specifically, L_A is:
L_A = (1/n) Σ_{i=1}^{n} X_dis(i),  with X_dis(i) = |P(i) − Sc(i)|

wherein Sc is the aesthetic label sequence of the cropped subgraphs; P represents the normalized aesthetic score sequence; i is the cropping mode number; n is the total number of cropping modes; X_dis is the error between the aesthetic score and the aesthetic label; decreasing L_A aligns the sequence of predicted values with the sequence of aesthetic labels;
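The equation images for L_A are not reproduced in this text; under the surrounding definitions, one plausible reading is a mean absolute error between the normalized score sequence P and the label sequence Sc. The sketch below follows that assumed reading (the patent's exact form may differ).

```python
import numpy as np

def aesthetic_loss(p, sc):
    """Hedged sketch of L_A: per-crop error X_dis(i) = |P(i) - Sc(i)|, averaged
    over the n cropping modes, so that minimizing L_A pulls the predicted score
    sequence toward the aesthetic label sequence."""
    x_dis = np.abs(np.asarray(p, float) - np.asarray(sc, float))
    return float(x_dis.mean())

print(aesthetic_loss([0.9, 0.1], [1.0, 0.0]))  # ~0.1: nearly aligned sequences
```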
in order to constrain the two aesthetic classifiers to predict the same result under the premise of having different aesthetic classifier parameters, Lc and Lw are respectively used as consistency loss and weight loss to optimize the parameters of the aesthetic classifiers and the feature extractor;
L_c = (1/n) Σ_{i=1}^{n} |P1(i) − P2(i)|

L_w = |Σ_{j=1}^{M} W1(j)·W2(j)| / (‖W1‖·‖W2‖)

L_c expresses that, although the two aesthetic classifiers are given different parameters, their predicted aesthetic scores for the cropped subgraphs should be consistent: the smaller L_c, the more consistent the two classifiers. L_w measures the similarity of the parameter settings of the two classifiers: the smaller L_w, the greater the difference between their parameters. The optimization goal of the invention for the aesthetic classifiers is therefore to make both L_c and L_w small, i.e. to obtain consistent predictions from dissimilar parameters. P1 and P2 are the cropped-subgraph aesthetic score sequences predicted by the two aesthetic classifiers; W1 and W2 are their parameters, flattened to length M, with j indexing the current parameter position;
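A hedged sketch of the two regularizers (the patent's formula images are not reproduced here): consistency as the mean absolute gap between the two score sequences, and the weight term as the cosine similarity of the flattened parameter vectors of length M, a common choice in multi-classifier discrepancy methods. The function names and exact forms are assumptions.

```python
import numpy as np

def consistency_loss(p1, p2):
    """L_c: mean absolute gap between the two classifiers' score sequences;
    smaller means the classifiers agree on every cropped subgraph."""
    return float(np.abs(np.asarray(p1, float) - np.asarray(p2, float)).mean())

def weight_loss(w1, w2, eps=1e-12):
    """L_w: cosine similarity of the flattened parameter vectors W1, W2
    (length M); driving it down keeps the two classifiers' parameters apart."""
    w1, w2 = np.ravel(w1).astype(float), np.ravel(w2).astype(float)
    return float(abs(w1 @ w2) / (np.linalg.norm(w1) * np.linalg.norm(w2) + eps))

print(consistency_loss([0.7, 0.2], [0.7, 0.2]))  # 0.0: identical predictions
print(weight_loss([1.0, 0.0], [0.0, 1.0]))       # ~0.0: orthogonal parameters
```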
based on the above aesthetic loss, consistency loss, and weight loss, the aesthetic correlation loss function for the aesthetic classifiers is constructed as:

L1 = L_A + λ2(L_c + μ·L_w)

wherein L_A is the aesthetic loss; L_c is the consistency loss; L_w is the classifier weight loss; λ2 and μ are trade-off parameters that can be adjusted to the actual application scenario;
an application stage:
(5) inputting the target domain image to be cut into a trained image intelligent cutting system, and performing aesthetic scoring according to a preset cutting subgraph;
specifically, the intelligent cropping model trains the feature extractor, the domain discriminator, and the aesthetic scoring module with the total loss function L = L_A + λ1·L_d + λ2(L_c + μ·L_w); after training is finished, a label-free target domain image to be cropped is input into the image intelligent cropping system, which outputs, through the feature extractor and the aesthetic scorer, the aesthetic scores P ∈ R^{K×1} of all cropped subgraphs of the current target domain image; wherein L_d is the domain adaptation loss function of the domain discriminator and λ1 is a trade-off parameter;
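Assembled together, the total training objective reduces to a one-line weighted sum; the trade-off values below are placeholders to be tuned per application.

```python
def total_loss(l_a, l_d, l_c, l_w, lambda1=1.0, lambda2=1.0, mu=1.0):
    """L = L_A + lambda1 * L_d + lambda2 * (L_c + mu * L_w): aesthetic loss,
    domain adaptation loss, and the classifier consistency/weight terms."""
    return l_a + lambda1 * l_d + lambda2 * (l_c + mu * l_w)

# Example with illustrative loss values and trade-off parameters.
print(total_loss(0.2, 0.7, 0.05, 0.1, lambda1=0.5, lambda2=1.0, mu=0.1))
```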
(6) screening out the clipping results meeting the conditions according to the aesthetic scores of all the clipping subgraphs;
specifically, a suitable result is selected from the cropped subgraph set according to the user's requirements and the predicted scores, and output; the user may require that the generated image contain a certain target or region, or conform to a certain size or aspect ratio; among all crops satisfying these constraints, the result with the highest aesthetic quality is output first.
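Step (6) can be sketched as a filter-then-argmax over the scored crop set. The constraint names `aspect_ratio` and `must_contain` are illustrative, not from the patent.

```python
def select_crop(boxes, scores, aspect_ratio=None, tol=0.1, must_contain=None):
    """Keep the crops that satisfy the user's constraints -- an aspect ratio
    within tol, and optionally a point (x, y) that must lie inside the crop --
    then return the surviving (box, score) pair with the highest aesthetic
    score."""
    best = None
    for box, score in zip(boxes, scores):
        x0, y0, x1, y1 = box
        if aspect_ratio is not None and abs((x1 - x0) / (y1 - y0) - aspect_ratio) > tol:
            continue
        if must_contain is not None:
            px, py = must_contain
            if not (x0 <= px <= x1 and y0 <= py <= y1):
                continue
        if best is None or score > best[1]:
            best = (box, score)
    return best  # None if no crop satisfies the constraints

boxes = [(0, 0, 10, 10), (0, 0, 20, 10)]
scores = [0.9, 0.5]
print(select_crop(boxes, scores))                    # highest score overall
print(select_crop(boxes, scores, aspect_ratio=2.0))  # best crop near 2:1
```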
In summary, compared with the prior art, the invention has the following advantages:
The feature extractor in the intelligent image cropping method provided by the invention is trained by means of an adaptive gradient inversion layer and a domain discriminator. The domain adaptation loss measures how well the discriminator distinguishes source domain samples from target domain samples, and the discriminator's parameters are continuously adjusted to reduce it: the smaller the domain adaptation loss, the stronger the discriminator's ability to identify source and target domain samples. The adaptive gradient inversion layer inverts the gradient of the domain adaptation loss computed on target domain samples, so that the feature extractor learns to align the global features of the source and target domain samples.
The intelligent image cropping method based on adversarial domain adaptation supports constructing the aesthetic scoring module from a plurality of aesthetic classifiers with identical structure but different parameters. On the one hand, the aesthetic scoring module predicts the aesthetic score of each cropped subgraph; on the other hand, the consistency loss among the aesthetic classifiers constrains their outputs to agree, which helps extract invariant features of the cropped subgraphs. Meanwhile, the weight loss among the aesthetic classifiers adjusts each classifier's parameters so that classifiers with different parameters still output consistent results to the greatest extent, strengthening the cropping model's adaptability to different input images.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. An intelligent image cropping method based on adversarial domain adaptation, characterized by comprising the following steps:
inputting a target domain image to be cut in a target application scene into a trained feature extractor to obtain global features;
resampling the global features according to a preset cutting mode, inputting the obtained regional features into a trained aesthetic classifier for aesthetic scoring, and screening out a cutting result;
wherein, the training process of the feature extractor is as follows: inverting the domain adaptive loss gradient calculated based on the target domain sample and transmitting the inverted domain adaptive loss gradient to a feature extractor, keeping the domain adaptive loss gradient calculated based on the source domain sample unchanged and transmitting the unchanged domain adaptive loss gradient to the feature extractor, and enabling the feature extractor to learn the capability of aligning the global features of the source domain sample and the target domain sample; self parameters are adjusted according to the aesthetic loss, and the aesthetic analysis capability is learned; the training process of the aesthetic classifier is to adjust the parameters of the aesthetic classifier according to the aesthetic loss;
the target domain sample is any label-free image in a target application scene, and the source domain sample is an image with a cut subgraph and an aesthetic label in another scene; the domain adaptive loss is obtained based on a domain label and a domain discrimination result of the sample; the domain discrimination result is obtained by the global feature input domain discriminator; the aesthetic loss is obtained by combining the aesthetic score and the aesthetic label calculation.
2. The intelligent image cropping method according to claim 1, characterized in that the method for training the feature extractor and the aesthetic classifier comprises the following steps:
inputting the target domain sample and the source domain sample into a feature extractor to extract global features;
inputting the global features into a domain discriminator, combining the output domain discrimination result with the domain label of the sample, and calculating the domain adaptive loss; resampling the global features to obtain regional features; the channel dimension of the region feature corresponds to the cutting subgraph;
transmitting the regional characteristics to an aesthetic classifier to obtain an aesthetic score, and calculating an aesthetic loss by combining an aesthetic label;
the adaptive loss function is transmitted back to the domain discriminator to carry out self parameter optimization;
inverting and transmitting the adaptive loss gradient calculated based on the target domain sample to a feature extractor, and transmitting the domain adaptive loss gradient calculated based on the source domain sample to the feature extractor, wherein the feature extractor learns the capability of aligning the global features of the source domain sample and the target domain sample;
and meanwhile, the aesthetic loss is transmitted back to the aesthetic classifier and the feature extractor for self-parameter optimization.
3. The intelligent image cropping method according to claim 1, wherein the aesthetic score is obtained by:
transmitting the regional characteristics to a plurality of aesthetic classifiers and respectively acquiring aesthetic scores;
taking the mean value of the aesthetic scores as the aesthetic score;
or transmit the regional characteristics to a single aesthetic classifier, obtaining the aesthetic score.
4. The intelligent image cropping method according to claim 1 or 2, wherein when a plurality of aesthetic classifiers exist, the consistency loss among the aesthetic classifiers is calculated based on the aesthetic scores obtained by the aesthetic classifiers;
acquiring the weight loss among the aesthetic classifiers based on the parameters of the aesthetic classifiers;
and constructing an aesthetic relevant loss function by using the consistency loss, the weight loss and the aesthetic loss, and transmitting the aesthetic relevant loss function back to the aesthetic classifier and the feature extractor for self parameter optimization.
5. The intelligent image cropping method according to claim 1, wherein the domain adaptation loss is:
L_d = −(y′ log y + (1 − y′) log(1 − y))
when the current sample I comes from the source domain, y' is 0; when the current sample I is from the target domain, y' is 1; and y is the output result of the domain discriminator, and the value is between 0 and 1 after the output result is subjected to softmax normalization.
6. The intelligent image cropping method according to claim 4, wherein the aesthetic loss is:
L_A = (1/n) Σ_{i=1}^{n} X_dis(i),  with X_dis(i) = |P(i) − Sc(i)|

wherein Sc is the aesthetic label sequence of the cropped subgraphs; P represents the normalized aesthetic score sequence; i is the cropping mode number; n is the total number of cropping modes; X_dis is the error between the aesthetic score and the aesthetic label;
the loss of consistency among the aesthetic classifiers is:
L_c = (1/n) Σ_{i=1}^{n} |P1(i) − P2(i)|

wherein two aesthetic classifiers are provided; P1 and P2 are the cropped-subgraph aesthetic score sequences predicted by the two aesthetic classifiers;
the weight loss between aesthetic classifiers is:
L_w = |Σ_{j=1}^{M} W1(j)·W2(j)| / (‖W1‖·‖W2‖)

wherein two aesthetic classifiers are provided; W1 and W2 are the parameters of the two aesthetic classifiers; M is the length of the flattened aesthetic classifier parameters and j is the current classifier parameter position.
7. An intelligent image cropping system based on adversarial domain adaptation, characterized by comprising an adversarial domain adaptation module, a feature extractor, a resampling structure and an aesthetic scoring module;
one port of the feature extractor performs bidirectional data transmission with the adversarial domain adaptation module, and the other port performs bidirectional data transmission with the aesthetic scoring module through the resampling structure;
the feature extractor is used for extracting global features of the target domain image and the source domain image; inverting the domain adaptive loss gradient calculated based on the target domain sample and transmitting the inverted domain adaptive loss gradient to a feature extractor, keeping the domain adaptive loss gradient calculated based on the source domain sample unchanged and transmitting the unchanged domain adaptive loss gradient to the feature extractor, and learning the capability of aligning the global features of the source domain sample and the target domain sample by the feature extractor; self parameters are adjusted according to the aesthetic loss, and the aesthetic analysis capability is learned;
the resampling structure is used for resampling the global features according to a preset cutting mode to obtain regional features;
the aesthetic scoring module is used for performing aesthetic scoring based on the regional characteristics and screening out a cutting result; self parameters of the aesthetic classifier in the aesthetic scoring module are adjusted according to the aesthetic loss;
the adversarial domain adaptation module is used for judging whether the current sample is a target domain sample or a source domain sample according to the global features and outputting a domain discrimination result; calculating the domain adaptation loss based on the domain discrimination result and the domain label of the sample; the domain adaptation loss is transmitted back to the domain discriminator to adjust its parameters; the domain adaptation loss gradient calculated on target domain samples is inverted and transmitted to the feature extractor, while the gradient calculated on source domain samples is transmitted to the feature extractor unchanged;
the target domain sample is any label-free image in a target application scene, and the source domain sample is an image with a cropped subgraph and an aesthetic label in another scene;
the resampling structure comprises two ROIAlign layers which are respectively used for extracting foreground features and background features of the cutting area, and the foreground features and the background features are overlapped according to dimensions to generate area features.
8. The intelligent image cropping system according to claim 7, wherein the adversarial domain adaptation module comprises an adaptive gradient inversion layer, a domain discriminator and a discrimination calculation unit;
the domain discriminator is used for judging whether the current sample is a target domain sample or a source domain sample according to the global characteristics and outputting a domain discrimination result; and self parameters are adjusted according to the domain adaptive loss;
the discrimination calculation unit is used for calculating the domain adaptive loss based on the domain discrimination result and the domain label of the sample; the domain adaptive loss is transmitted back to a domain discriminator and an adaptive gradient inversion layer;
the adaptive gradient inversion layer is used for transmitting the domain adaptive loss gradient calculated based on the target domain sample to the feature extractor after inversion, and transmitting the domain adaptive loss gradient calculated based on the source domain sample to the feature extractor while keeping the domain adaptive loss gradient unchanged.
9. The intelligent image cropping system according to claim 7 or 8, wherein the aesthetic scoring module comprises an aesthetic classifier and an aesthetic computing unit;
if a single aesthetic classifier is present, it is connected to the resampling structure; if a plurality of aesthetic classifiers exist, the aesthetic classifiers are connected in parallel, with one end connected to the resampling structure and the other end connected to the aesthetic calculation unit;
the aesthetic classifier is used for performing aesthetic scoring according to the regional characteristics;
the aesthetic calculating unit is used for calculating the average value of the aesthetic scores and outputting the aesthetic scores; calculating consistency loss among the aesthetic classifiers based on the aesthetic scores obtained by the aesthetic classifiers; acquiring the weight loss among the aesthetic classifiers based on the parameters of the aesthetic classifiers; and constructing an aesthetic correlation loss function by using the consistency loss, the weight loss and the aesthetic loss, and transmitting the aesthetic correlation loss function back to the aesthetic classifier and the feature extractor for self parameter optimization.
10. The intelligent image cropping system of claim 9, wherein the aesthetic penalty is:
L_A = (1/n) Σ_{i=1}^{n} X_dis(i),  with X_dis(i) = |P(i) − Sc(i)|

wherein Sc is the aesthetic label sequence of the cropped subgraphs; P represents the normalized aesthetic score sequence; i is the cropping mode number; n is the total number of cropping modes; X_dis is the error between the aesthetic score and the aesthetic label;
the loss of consistency among the aesthetic classifiers is:
L_c = (1/n) Σ_{i=1}^{n} |P1(i) − P2(i)|

wherein two aesthetic classifiers are provided; P1 and P2 are the cropped-subgraph aesthetic score sequences predicted by the two aesthetic classifiers;
the weight loss between aesthetic classifiers is:
L_w = |Σ_{j=1}^{M} W1(j)·W2(j)| / (‖W1‖·‖W2‖)

wherein two aesthetic classifiers are provided; W1 and W2 are the parameters of the two aesthetic classifiers; M is the length of the flattened classifier parameters, and j is the current classifier parameter position.
CN202110466563.XA 2021-04-28 2021-04-28 Intelligent image clipping method and system based on antithetical domain adaptation Active CN113205522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110466563.XA CN113205522B (en) 2021-04-28 2021-04-28 Intelligent image clipping method and system based on antithetical domain adaptation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110466563.XA CN113205522B (en) 2021-04-28 2021-04-28 Intelligent image clipping method and system based on antithetical domain adaptation

Publications (2)

Publication Number Publication Date
CN113205522A CN113205522A (en) 2021-08-03
CN113205522B true CN113205522B (en) 2022-05-13

Family

ID=77029216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110466563.XA Active CN113205522B (en) 2021-04-28 2021-04-28 Intelligent image clipping method and system based on antithetical domain adaptation

Country Status (1)

Country Link
CN (1) CN113205522B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107146198A (en) * 2017-04-19 2017-09-08 中国电子科技集团公司电子科学研究院 Intelligent photo cropping method and device
CN109146892A (en) * 2018-07-23 2019-01-04 北京邮电大学 Aesthetics-based image cropping method and device
CN111476805A (en) * 2020-05-22 2020-07-31 南京大学 Cross-source unsupervised domain adaptive segmentation model based on multiple constraints
CN111696112A (en) * 2020-06-15 2020-09-22 携程计算机技术(上海)有限公司 Automatic image cutting method and system, electronic equipment and storage medium
CN112132042A (en) * 2020-09-24 2020-12-25 西安电子科技大学 SAR image target detection method based on anti-domain adaptation
CN112434754A (en) * 2020-12-14 2021-03-02 前线智能科技(南京)有限公司 Cross-modal medical image domain adaptive classification method based on graph neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Domain Adaptation and Image Classification via Deep Conditional Adaptation Network; Pengfei Ge et al.; https://arxiv.org/pdf/2006.07776v1.pdf; 2020-06-14; full text *
Selection of the source domain in transfer learning based on inter-domain similarity ordinals; Sun Qiao et al.; Science Technology and Engineering; 2020-07-18 (No. 20); full text *

Also Published As

Publication number Publication date
CN113205522A (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN108090902B (en) Non-reference image quality objective evaluation method based on multi-scale generation countermeasure network
CN107133943B Visual inspection method for Stockbridge damper defect detection
CN112884064B (en) Target detection and identification method based on neural network
CN109949317A Semi-supervised image instance segmentation method based on gradual adversarial learning
CN111881714A (en) Unsupervised cross-domain pedestrian re-identification method
Liu et al. Remote sensing image change detection based on information transmission and attention mechanism
CN112766334B (en) Cross-domain image classification method based on pseudo label domain adaptation
CN111368690A (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN110009622B (en) Display panel appearance defect detection network and defect detection method thereof
CN111402126A (en) Video super-resolution method and system based on blocks
CN111160407A (en) Deep learning target detection method and system
CN110381392A Video summary extraction method and system, device and storage medium
CN111222546B (en) Multi-scale fusion food image classification model training and image classification method
Han et al. Research on multiple jellyfish classification and detection based on deep learning
CN115131747A (en) Knowledge distillation-based power transmission channel engineering vehicle target detection method and system
CN113205522B (en) Intelligent image clipping method and system based on antithetical domain adaptation
CN111291663B (en) Method for quickly segmenting video target object by using space-time information
CN114821174A (en) Power transmission line aerial image data cleaning method based on content perception
CN115641592A (en) Contrast optimization-based electro-optical character recognition method and system
CN115311494A (en) Cultural asset image classification method combining layered training and label smoothing
CN114758135A (en) Unsupervised image semantic segmentation method based on attention mechanism
Meng et al. A Novel Steganography Algorithm Based on Instance Segmentation.
CN114445875A (en) Deep learning-based identity recognition and face comparison system and training method
CN111488806A (en) Multi-scale face recognition method based on parallel branch neural network
CN114677670B (en) Method for automatically identifying and positioning identity card tampering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant