WO2022218396A1 - Image processing method, apparatus and computer-readable storage medium - Google Patents
Image processing method, apparatus and computer-readable storage medium
- Publication number
- WO2022218396A1 (PCT/CN2022/086976)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- domain
- target
- style
- feature
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
Definitions
- the present application is based on CN application No. 202110410920.0, filed on April 16, 2021, and claims priority thereto.
- the disclosure of that CN application is hereby incorporated into the present application in its entirety.
- the present disclosure relates to the field of computer technology, and in particular, to an image processing method, an apparatus, and a computer-readable storage medium.
- autonomous driving systems can effectively avoid pedestrians and obstacles
- remote sensing systems can locate areas of interest to humans
- industrial production lines can screen and locate defective parts.
- a robust target detection algorithm known to the inventors is based on domain adaptation: feature distributions are aligned through methods such as adversarial training, so that a model trained on labeled source-domain data generalizes better to the target domain.
- Such methods tend to assume that only one degradation type (style type) exists in the target domain.
- an image processing method comprising: acquiring a source domain content feature of a source domain image and a target domain style feature of a target domain image; and generating a variety of new style features with the goals that each new style feature differs from the source domain style feature and the target domain style feature, that the various new style features differ from each other, and that the image generated by combining a new style feature with the source domain content feature is semantically consistent with the source domain image.
- updating the source domain content feature and the target domain style feature, and combining the generated multiple new style features and the updated target domain style feature, respectively, with the updated source domain content feature to generate a first image and a second image, respectively;
- training a target detection model with the first image, the second image, and the source domain image to obtain a trained target detection model.
- acquiring the source domain content feature of the source domain image and the target domain style feature of the target domain image includes: using a content encoder to extract the source domain content feature of the source domain image, and using a style encoder to extract the target domain style feature of the target domain image.
- the style encoder includes a style feature extraction network and a clustering module, and using the style encoder to extract the target domain style features of the target domain images includes: inputting each target domain image into the style feature extraction network to obtain the output basic style features of each target domain image; and inputting the basic style features of each target domain image into the clustering module for clustering to obtain the feature vectors of the cluster centers as the target domain style features.
- generating multiple new style features includes: randomly generating a preset number of new style features and inputting each generated new style feature together with the source domain content feature into a generation network to obtain a first migration image; inputting the target domain style feature together with the source domain content feature into the generation network to obtain a second migration image; determining, according to the style difference between the first migration image and the corresponding source domain image and the style difference between the first migration image and the corresponding second migration image, a first loss function that represents the difference between the generated new style features and the source domain and target domain style features; and determining, according to the style differences between the first migration images, a second loss function that represents the differences among the various new style features.
- a third loss function is determined according to the difference between the semantic features of the first migration image and the semantic features of the corresponding source domain image; it represents the semantic difference between the image generated by combining the new style feature with the source domain content feature and the source domain image.
- the generated new style features are adjusted until a preset convergence condition corresponding to the above goals is reached, yielding the variety of generated new style features.
- updating the source domain content features and the target domain style features includes: adjusting the parameters of the content encoder, the style encoder, and the generator according to the first loss function, the second loss function, and the third loss function until the preset convergence condition corresponding to the goals is reached; and, when that condition is reached, taking the source domain content features output by the content encoder as the updated source domain content features and the target domain style features output by the style encoder as the updated target domain style features.
- where the first migration image and the corresponding source domain image are taken as a first reference image and a second reference image respectively, or the first migration image and the corresponding second migration image are taken as the first and second reference images respectively, or any two first migration images are taken as the first and second reference images respectively, the style difference between the first reference image and the second reference image is determined by the following method:
- the first reference image and the second reference image are respectively input into a plurality of preset feature layers of the pre-trained feature extraction network; for each feature layer, the mean and variance of the features of the first reference image output by that layer are taken as the first mean and the first variance, and the mean and variance of the features of the second reference image output by that layer are taken as the second mean and the second variance; the style difference between the first reference image and the second reference image is then determined according to the difference between the first mean and the second mean and the difference between the first variance and the second variance at each feature layer.
- the first loss function is determined according to the following formula:
- where k is a positive integer with 1 ≤ k ≤ n s ; i is a positive integer; n s and n t represent the number of source domain images and target domain images, respectively; n j represents the number of target domain images corresponding to the jth target domain style feature; K t represents the number of target domain style features; T nov is a hyperparameter representing the threshold of distance maximization; and j is a positive integer with 1 ≤ j ≤ K t .
- the second migration image is obtained by inputting the jth target domain style feature and the source domain content feature of the kth source domain image into the generation network.
- the second loss function is determined according to the following formula:
- the third loss function is determined according to the following formula:
- in the formula, one term represents the third loss function corresponding to the kth source domain image under the ith new style feature, φ sm ( ) represents the function of the semantic feature extractor, and the first migration image is obtained by inputting the ith new style feature and the source domain content feature of the kth source domain image into the generation network.
- adjusting the generated new style features according to the first loss function, the second loss function, and the third loss function includes: computing a weighted sum of the first, second, and third loss functions to obtain a target loss function; determining a gradient according to the target loss function; and adjusting the generated new style features according to the gradient and a preset learning rate, wherein the value of each dimension of each of the randomly generated preset number of new style features is randomly sampled from the standard normal distribution.
- combining the generated multiple new style features and the updated target domain style features, respectively, with the updated source domain content features to generate the first image and the second image respectively includes: when the preset convergence condition corresponding to the goals is reached, inputting the generated new style features and the updated source domain content features into the generator to obtain the first image, and inputting the updated target domain style features and the updated source domain content features into the generator to obtain the second image.
- using the first image, the second image, and the source domain image to train the target detection model includes: inputting the first image, the second image, and the source domain image into the target detection model, respectively, to obtain the target detection results of the first image, the second image, and the source domain image;
- determining the target detection loss function according to these detection results and the labeling information of the corresponding source domain images, and adjusting the parameters of the target detection model according to the target detection loss function.
- using the first image, the second image, and the source domain image to train the target detection model further includes: inputting the first image, the second image, the source domain image, and the target domain image respectively into the basic feature extraction network of the target detection model to obtain the basic features of the first image, the second image, the source domain image, and the target domain image;
- inputting each of these basic features into the gradient reversal layer and then into the discrimination network to obtain the discrimination results of the first image, the second image, the source domain image, and the target domain image;
- determining a discriminative loss function according to the discrimination results of the first image, the second image, the source domain image, and the target domain image; adjusting the parameters of the target detection model according to the target detection loss function then includes: adjusting the parameters of the target detection model according to both the target detection loss function and the discriminative loss function.
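As an illustration of the gradient reversal layer mentioned above, here is a minimal framework-free sketch (not the patent's implementation): the layer is the identity in the forward pass and multiplies incoming gradients by -λ in the backward pass, which trains the feature extractor to confuse the domain discriminator.

```python
import numpy as np

class GradientReversal:
    """Identity mapping in the forward pass; scales gradients by -lambda
    in the backward pass (a sketch of a gradient reversal layer)."""
    def __init__(self, lambd=1.0):
        self.lambd = lambd

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_out):
        # The discriminator's gradient is reversed (and scaled) before
        # reaching the basic feature extraction network.
        return -self.lambd * grad_out
```

In an autograd framework this would be registered as a custom function so the reversal happens automatically during backpropagation.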
- the target detection result includes a localization result and a classification result, where the localization result is the position information of the detected target and the classification result is the category information of the detected target; the labeling information of the source domain image includes the position information and category information of the targets in the source domain image.
- determining the target detection loss function includes: determining a localization loss function according to the differences between the localization results of the first image, the second image, and the source domain image and the position information of the targets in the corresponding source domain images;
- determining a classification loss function according to the differences between the classification results of the first image, the second image, and the source domain image and the category information of the targets in the corresponding source domain images; and determining the target detection loss function as the weighted sum of the localization loss function and the classification loss function.
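The weighted combination described above can be sketched as follows (the weights are hyperparameters not specified in the text):

```python
def detection_loss(l_loc, l_cls, w_loc=1.0, w_cls=1.0):
    """Target detection loss as the weighted sum of the localization loss
    and the classification loss."""
    return w_loc * l_loc + w_cls * l_cls
```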
- the localization loss function is determined according to the following formula:
- where d i represents the ith style feature in the set consisting of the generated multiple new style features and the updated target domain style features; the image generated by combining the ith style feature with the updated source domain content feature of the kth source domain image is the first image or the second image; the corresponding term represents the localization loss between that image's localization result and the position information of the targets in the kth source domain image; 1 ≤ i ≤ N d , i is a positive integer; and N d represents the total number of style features in the set.
- the classification loss function is determined according to the following formula, with the same notation, where the corresponding term represents the classification loss between the image's classification result and the category information of the targets in the kth source domain image.
- the discriminative loss function is determined according to the following formula:
- where one symbol represents the maximum width, and F( ) represents the function composed of the basic feature extraction network and the gradient reversal layer.
- the method further includes: inputting the image to be detected into the trained target detection model to obtain a target detection result of the image to be detected.
- an image processing apparatus comprising: an acquisition module for acquiring the source domain content features of source domain images and the target domain style features of target domain images; a feature generation module for generating a variety of new style features and updating the source domain content features and the target domain style features, with the goals that each new style feature differs from the source domain style features and the target domain style features, that the various new style features differ from each other, and that the image generated by combining a new style feature with the source domain content features is semantically consistent with the source domain image;
- an image generation module for combining the generated multiple new style features and the updated target domain style features, respectively, with the updated source domain content features to generate the first image and the second image respectively; and a training module for training the target detection model with the first image, the second image, and the source domain image to obtain the trained target detection model.
- an image processing apparatus including: a processor; and a memory coupled to the processor for storing instructions that, when executed by the processor, cause the processor to execute the image processing method of any of the foregoing embodiments.
- a non-transitory computer-readable storage medium having a computer program stored thereon, wherein, when the program is executed by a processor, the image processing method of any of the foregoing embodiments is implemented.
- FIG. 1 shows a schematic flowchart of an image processing method according to some embodiments of the present disclosure.
- Figure 2 shows a schematic diagram of the network architecture of some embodiments of the present disclosure.
- FIG. 3 shows a schematic structural diagram of an image processing apparatus according to some embodiments of the present disclosure.
- FIG. 4 shows a schematic structural diagram of an image processing apparatus according to other embodiments of the present disclosure.
- FIG. 5 shows a schematic structural diagram of an image processing apparatus according to further embodiments of the present disclosure.
- a technical problem to be solved by the present disclosure is: how to improve the efficiency and effectiveness of target detection model training.
- the present disclosure provides an image processing method, which will be described below with reference to FIGS. 1 to 3 .
- FIG. 1 is a flowchart of some embodiments of the disclosed image processing method. As shown in FIG. 1 , the method of this embodiment includes steps S102 to S108.
- step S102 the source domain content feature and the source domain style feature of the source domain image are obtained, and the target domain style feature of the target domain image is obtained.
- the set of labeled source domain images can be represented accordingly, where n s is the number of source domain images.
- the set of unlabeled target domain images can be represented accordingly, where n t is the number of target domain images.
- Content features are used to reflect the semantic information of the image, such as the semantic category (car, person, background, etc.) to which different pixels belong.
- Style features are used to reflect the type of image degradation. For example, due to weather changes, the collected images may be affected by rain, snow, or fog and become unclear; due to changes in lighting, the collected images may suffer from problems such as overexposure or low light; and due to the acquisition equipment and acquisition process, the images may suffer from problems such as blurring and noise.
- the source domain image and the target domain image have the same or similar semantic information, but have different degradation types, i.e., style features.
- source domain content features of source domain images are extracted using a content encoder; target domain style features of target domain images are extracted using a style encoder.
- Different encoders are used to encode the style feature (Style Representation) and the content feature (Content Representation) respectively, which can decouple the content feature and style feature of the image.
- the content encoder and style encoder can employ Convolutional Neural Networks (CNN), for example, VGGNet or ResNet, etc.
- the style encoder includes a style feature extraction network and a clustering module. Input each target domain image into the style feature extraction network to obtain the output basic style features of each target domain image; input the basic style features of each target domain image into the clustering module for clustering, and obtain the feature vectors of multiple cluster centers, as multiple target domain style features.
- the source domain images can all belong to one style type, and the target domain images can belong to one or more style types. Since the target domain image has no label information, the clustering method can be used to obtain one or more cluster centers of the target domain image, which are used as one or more target domain style features to represent different style types respectively.
- the clustering algorithm may be an existing algorithm, for example, K-means, mean-shift clustering, or a density-based clustering algorithm. Through clustering, each target domain image can be given a domain pseudo-label, that is, a style type is annotated for each target domain image.
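A minimal NumPy sketch of this clustering step (plain k-means; in practice an off-the-shelf implementation such as scikit-learn's KMeans could fill the same role, and all names here are illustrative):

```python
import numpy as np

def cluster_style_features(feats, k, iters=20, seed=0):
    """Minimal k-means over per-image basic style features.
    Returns (centers, labels): the k cluster centres play the role of the
    target-domain style features, and `labels` are per-image domain
    pseudo-labels (style types)."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each image's style feature to its nearest centre.
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned features.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return centers, labels
```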
- step S104: with the goals that the generated new style features differ from the source domain style features and the target domain style features of the source domain image, that the various new style features differ from each other, and that the image generated by combining a new style feature with the source domain content features is semantically consistent with the source domain image, a variety of new style features are generated, and the source domain content features and the target domain style features are updated.
- a preset number of new style features are randomly generated, and each generated new style feature together with the source domain content features is input into a generation network to obtain a first migration image; the target domain style features together with the source domain content features are input into the generation network to obtain a second migration image; according to the style difference between the first migration image and the corresponding source domain image and the style difference between the first migration image and the corresponding second migration image, a first loss function is determined, which represents the difference between the generated new style features and the source domain and target domain style features; according to the style differences between the first migration images, a second loss function is determined, which represents the differences among the various new style features; according to the difference between the semantic features of the first migration image and the semantic features of the corresponding source domain image, a third loss function is determined, which represents the semantic difference between the image generated by combining the new style feature with the source domain content features and the source domain image; and the first, second, and third loss functions are used to adjust the generated new style features.
- the preset number may be the same as the number of target domain style features (i.e., the number of style types to which the target domain images belong). For example, the value of each dimension of each randomly generated new style feature is randomly sampled from the standard normal distribution.
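This initialisation step can be sketched as follows (the function name and dimensions are illustrative, not from the patent):

```python
import numpy as np

def init_new_style_features(num_styles, dim, seed=None):
    """Randomly initialise `num_styles` new style vectors; each dimension is
    drawn i.i.d. from the standard normal distribution, as described above."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_styles, dim))
```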
- the generation network is used to fuse the style features and the content features; existing models such as CNNs can be used, and the network is not limited to these examples.
- inputting the new style feature and the source domain content feature into the generation network yields the migration image from the source domain to the new domain, that is, the first migration image.
- the target domain style feature and the source domain content feature are input to the generation network, and the migration image from the source domain to the target domain, that is, the second migration image, can be obtained.
- Both the first loss function and the second loss function are determined based on the style difference of the two images.
- where the first migration image and the corresponding source domain image are taken as a first reference image and a second reference image respectively, or the first migration image and the corresponding second migration image are taken as the first and second reference images respectively, or any two first migration images are taken as the first and second reference images respectively, the style difference between the first reference image and the second reference image is determined by the following method.
- the source domain image corresponding to the first migration image is the source domain image used to generate the source domain content feature of the first migration image.
- the second migration image corresponding to a first migration image is the second migration image generated using the same source domain content feature as that first migration image.
- the first reference image and the second reference image are respectively input into a plurality of preset feature layers of the pre-trained feature extraction network (as shown in Figure 2); for each feature layer, the mean and variance of the features of the first reference image output by that layer are taken as the first mean and the first variance, and the mean and variance of the features of the second reference image output by that layer are taken as the second mean and the second variance; the style difference between the first reference image and the second reference image is determined according to the difference between the first mean and the second mean and the difference between the first variance and the second variance at each feature layer. The greater these differences are, the greater the style difference between the first reference image and the second reference image.
- the pre-trained feature extraction network is, for example, the pre-trained VGG19, which is not limited to the examples.
- the style difference between the first reference image and the second reference image is determined using the following formula:
- where x 1 and x 2 represent the first reference image and the second reference image respectively, 1 ≤ i ≤ L, i is a positive integer, L represents the number of preset feature layers in the pre-trained feature extraction network, φ i ( ) represents the function of the ith such layer, μ( ) represents the mean, and σ( ) represents the variance.
- the first loss function is used to represent the difference between the generated new style features and the source domain and target domain style features.
- training with the first loss function makes the new style features differ from the existing styles of the source domain and the target domain, so that the generated styles complement the existing image styles.
- the first loss function is determined according to the following formula:
- the second loss function is used to represent the differences among the various new style features. Training with the second loss function makes the generated new style features differ from each other, ensuring the diversity of the generated new domains.
- the second loss function is determined according to the following formula:
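The exact forms of the first and second loss functions appear as formulas in the original publication and are not reproduced here; based only on the described behaviour (push style distances to existing domains up to the threshold T nov, and keep new styles mutually distinct), a hedged sketch could use hinge losses, where the diversity margin `t_div` is an invented parameter:

```python
import numpy as np

def novelty_loss(style_dists, t_nov):
    """First-loss sketch: penalize style distances (to source/target styles)
    that fall below the distance-maximization threshold T_nov."""
    return float(np.mean([max(0.0, t_nov - d) for d in style_dists]))

def diversity_loss(pairwise_dists, t_div):
    """Second-loss sketch (assumed analogous hinge): push style distances
    between pairs of first migration images above a margin t_div."""
    return float(np.mean([max(0.0, t_div - d) for d in pairwise_dists]))
```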
- the semantic feature of the first migration image and the semantic feature of the source domain image are obtained by a semantic feature extractor.
- the third loss function is used to represent the semantic difference between the image generated by combining the new style feature with the source domain content feature (the first migration image) and the source domain image. Training with the third loss function makes the semantics of the first migration image consistent with those of the corresponding source domain image, so that the semantic labels of the source domain can be applied to the corresponding generated images.
- the third loss function is determined according to the following formula:
- formula (4) represents the third loss function corresponding to the kth source domain image under the ith new style feature, where φ sm ( ) represents the function of the semantic feature extractor, and the first migration image is obtained by inputting the ith new style feature and the source domain content feature of the kth source domain image into the generation network.
- the target loss function is obtained by weighted summation of the first loss function, the second loss function, and the third loss function.
- the objective loss function can be determined using the following formula.
- the gradient is determined according to the objective loss function; the generated new style features are adjusted according to the gradient and a preset learning rate. For example, subtract the product of the gradient and the preset learning rate from the vector corresponding to the new style feature to obtain the adjusted new style feature.
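The weighted summation and the gradient step described above can be sketched directly (a simplified illustration; the loss weights and the learning rate are hypothetical values, not taken from the patent):

```python
import numpy as np

def target_loss(l1, l2, l3, w1=1.0, w2=1.0, w3=1.0):
    """Weighted summation of the first, second, and third loss functions."""
    return w1 * l1 + w2 * l2 + w3 * l3

def adjust_style_feature(style_vec, grad, lr=0.01):
    """Subtract the product of the gradient and the preset learning rate
    from the vector corresponding to the new style feature."""
    return style_vec - lr * grad
```

Repeating `adjust_style_feature` with gradients of `target_loss` with respect to the style vector is the adversarial exploration loop of the first stage.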
- the parameters of the content encoder, the style encoder, and the generator are adjusted according to the first loss function, the second loss function, and the third loss function, until the preset convergence condition corresponding to the target is reached.
- when the preset convergence condition corresponding to the target is reached, the source domain content features output by the content encoder are used as the updated source domain content features, and the target domain style features output by the style encoder are used as the updated target domain style features.
- the gradient is determined according to the objective loss function; the parameters of the content encoder, style encoder, and generator are adjusted according to the gradient and preset learning rate, and the parameters of the semantic feature extractor can also be adjusted.
- the generated new style features are adjusted according to the target loss function, and the parameters of the content encoder, style encoder, generator, and semantic feature extractor are adjusted.
- the first loss function is determined according to the style difference between the first migration image and the corresponding source domain image and the style difference between the first migration image and the corresponding second migration image; the second loss function is determined according to the style differences between the first migration images; the third loss function is determined according to the difference between the semantic features of the first migration image and the semantic features of the corresponding source domain image; and the objective loss function is determined according to the first loss function, the second loss function, and the third loss function.
- the above process is repeated until a preset convergence condition corresponding to the target is reached.
- the preset convergence condition is that the value of the target loss function is
- in step S106, the generated multiple new style features and the updated target domain style features are respectively combined with the updated source domain content features to generate a first image and a second image respectively.
- the generated multiple new style features and the updated source domain content features are input into the generator to obtain the first image, and the updated target domain style features and the updated source domain content features are input into the generator to obtain the second image.
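The patent does not fix the generator's internal architecture; one common way such a generator combines a content feature with a style feature is adaptive instance normalization (AdaIN). The sketch below is purely an illustrative assumption, not the patent's stated method:

```python
import numpy as np

def adain_combine(content, style_mean, style_std, eps=1e-5):
    """Re-normalize content features of shape (C, H, W) so that each
    channel takes on the given per-channel style mean and std."""
    c = content.reshape(content.shape[0], -1)
    m = c.mean(axis=1, keepdims=True)
    s = c.std(axis=1, keepdims=True)
    out = (c - m) / (s + eps) * style_std[:, None] + style_mean[:, None]
    return out.reshape(content.shape)
```

Feeding the same content tensor with a generated new-style statistic versus an updated target-domain statistic would correspond, loosely, to producing the first image versus the second image.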
- the trained generator can be obtained by using the training process of the foregoing embodiment, and the first image and the second image are generated by using the trained generator, as shown in FIG. 2 .
- in step S108, the target detection model is trained using the first image, the second image, and the source domain image to obtain a trained target detection model.
- steps S102 to S104 are the first-stage training process, that is, adversarial exploration of novel image styles, which yields the updated source domain content features, the updated target domain style features, and the adversarially generated new style features; step S106 then generates the first image and the second image, which are used in the second-stage training (step S108), that is, training a domain-invariant target detection model.
- since the first image and the second image are generated from the corresponding source domain content features, the first image and the second image have the same content feature representation as the corresponding source domain image, and their semantic labels are consistent; the semantic labels of the source domain can therefore be taken as the semantic labels of the first image and the second image.
- the first image, the second image, and the source domain image are respectively input into the target detection model to obtain the target detection result of the first image, the target detection result of the second image, and the target detection result of the source domain image; the target detection loss function is determined according to the differences between each of these target detection results and the label information of the corresponding source domain image; and the parameters of the target detection model are adjusted according to the target detection loss function.
- the source domain image corresponding to the first image or the second image refers to the source domain image to which the source domain content feature used to generate the first image or the second image belongs.
- the object detection model includes a base feature extraction network and an object detection network.
- the first image, the second image, and the source domain image are respectively input into the basic feature extraction network to obtain the basic features of the first image, the basic features of the second image, and the basic features of the source domain image; these basic features are then input into the target detection network to obtain the target detection result of the first image, the target detection result of the second image, and the target detection result of the source domain image.
- the target detection result includes at least one of a localization result and a classification result.
- the positioning result is the location information of the detected target (for example, the coordinate information of the rectangular frame containing the target).
- the classification result is the category information of the detected target (for example, car, person, background, etc.).
- the annotation information of the source domain image includes semantic labels, such as the location information of the target in the source domain image and/or the category information of the target in the source domain image.
- the localization loss function is determined according to the differences between the positioning result of the first image, the positioning result of the second image, and the positioning result of the source domain image, respectively, and the position information of the target in the corresponding source domain image; the classification loss function is determined according to the differences between the classification result of the first image, the classification result of the second image, and the classification result of the source domain image, respectively, and the category information of the target in the corresponding source domain image; and the target detection loss function is determined by a weighted summation of the localization loss function and the classification loss function. If the target detection result includes only a localization result or only a classification result, only the localization loss function or the classification loss function is determined accordingly, which will not be repeated here.
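A minimal sketch of such a detection loss, assuming standard choices (smooth-L1 for localization and cross-entropy for classification are common in the detection literature, but the exact forms are not given in this passage):

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth-L1 localization loss over box coordinates."""
    d = np.abs(pred - target)
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum())

def cross_entropy(logits, label):
    """Classification loss for a single detection (numerically stable)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

def detection_loss(box_pred, box_gt, logits, label, w_loc=1.0, w_cls=1.0):
    """Weighted summation of the localization and classification losses."""
    return w_loc * smooth_l1(box_pred, box_gt) + w_cls * cross_entropy(logits, label)
```

The weights `w_loc` and `w_cls` play the role of the weighting coefficients in the summation described above.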
- the localization loss function is determined according to the following formula:
- d i represents the i-th style feature in the set of the generated multiple new style features and the updated target domain style features
- N d represents the total number of style features in the set of the generated multiple new style features and the updated target domain style features.
- the classification loss function is determined according to the following formula:
- d i represents the i-th style feature in the set of the generated multiple new style features and the updated target domain style features
- N d represents the total number of style features in the set of the generated multiple new style features and the updated target domain style features.
- the discriminator can be added to train the target detection model through the domain discrimination results.
- the basic features of the first image, the basic features of the second image, the basic features of the source domain image, and the basic features of the target domain image are respectively input into the gradient reversal layer and then into the discrimination network to obtain the corresponding discrimination results; the discriminant loss function is determined according to the discrimination result of the first image, the discrimination result of the second image, the discrimination result of the source domain image, and the discrimination result of the target domain image; and the parameters of the target detection model are adjusted according to the target detection loss function and the discriminant loss function.
- the features are first input into the gradient reversal layer, which reverses their gradients, so that the discriminator and the basic feature extraction network are optimized in opposite directions, forcing the basic feature extraction network to learn domain-invariant feature representations.
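A gradient reversal layer is the identity in the forward pass and negates (and optionally scales) the gradient in the backward pass. The framework-free sketch below illustrates only that contract; real implementations hook into an autograd engine (e.g., a custom `torch.autograd.Function`), which is background knowledge assumed here, not part of the patent text:

```python
import numpy as np

class GradientReversal:
    """Identity forward; multiplies the incoming gradient by -lam backward,
    so the feature extractor is optimized opposite to the discriminator."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # pass features through unchanged
        return x

    def backward(self, grad_output):
        # reverse (and scale) the gradient flowing back to the extractor
        return -self.lam * grad_output
```

With `lam > 0`, minimizing the discriminant loss through this layer pushes the basic feature extraction network to *maximize* discriminator confusion, which is the adversarial mechanism described above.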
- the discriminative loss function is determined according to the following formula:
- n s represents the number of source domain images
- the source domain discriminant loss function is determined according to the discrimination results of the source domain images
- n t represents the number of target domain images, 1≤j≤n t , j is a positive integer
- d k represents the k-th style feature in the set of the generated multiple new style features and the updated target domain style features
- N d represents the total number of style features in the set of the generated multiple new style features and the updated target domain style features.
- the discriminant loss function includes three parts: the source domain discriminant loss function, the target domain discriminant loss function, and the discriminant loss function determined according to the discrimination result of the first image and the discrimination result of the second image.
- the loss function of each part can be determined according to the following formulas.
- the target detection loss function and the discriminant loss function are weighted and summed to obtain a total loss function, and the parameters of the target detection model are adjusted.
- the total loss function can be determined using the following formula.
- λ LOC and λ CLS are the weights of the localization loss function and the classification loss function, respectively.
- the parameters of the target detection model and the discriminator are adjusted according to the total loss function.
- the specific training process may refer to the prior art, which will not be repeated here.
- the basic feature extraction network can use a CNN model, such as VGG, ResNet, etc., and is not limited to the examples.
- the training process of the present disclosure includes two stages.
- the first stage is a new style generation method based on adversarial exploration.
- this stage pursues three goals: the generated new style features are different from both the source domain style features and the target domain style features; the various new style features are different from each other; and the image generated by combining a new style feature with the source domain content features is semantically consistent with the source domain image.
- the second stage is to train a domain-invariant object detection model.
- this stage is based on domain pseudo-labels derived from style features (e.g., each target domain image obtains a domain pseudo-label through clustering), and obtains feature representations and a target detection model that are robust to multiple domains through an adversarial training mechanism.
- a variety of new style features are automatically generated from the target domain style features of the target domain images; the generated new style features are different from each other and different from both the source domain style features and the target domain style features, and the images produced by combining the new style features with the source domain content features are semantically consistent with the source domain images. Therefore, the generated new style features can be combined with the updated source domain content features to generate the first image as a training sample for domain-adaptation training of the target detection model. Further, the target domain style features are combined with the updated source domain content features to generate the second image, and the second image and the source domain image are also used as training samples for domain-adaptation training of the target detection model.
- the trained target detection model can accurately detect images of various styles and types, thereby improving the effectiveness of the target detection model.
- the trained object detection model can be used for object detection on images.
- the image to be detected is input into the trained target detection model to obtain the target detection result of the image to be detected.
- the present disclosure also provides an image processing apparatus, which will be described below with reference to FIG. 3 .
- FIG. 3 is a structural diagram of some embodiments of the disclosed image processing apparatus.
- the apparatus 30 of this embodiment includes: an acquisition module 310 , a feature generation module 320 , an image generation module 330 , and a training module 340 .
- the obtaining module 310 is configured to obtain the content features of the source domain, and obtain the style features of the target domain images of the target domain.
- the obtaining module 310 is configured to use a content encoder to extract source domain content features of source domain images; and use a style encoder to extract target domain style features of target domain images.
- the style encoder includes a style feature extraction network and a clustering module
- the acquisition module 310 is configured to input each target domain image into the style feature extraction network to obtain the output basic style features of each target domain image, and to input the basic style features of each target domain image into the clustering module for clustering to obtain the feature vectors of the cluster centers as the target domain style features.
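The clustering module can be illustrated with a plain k-means over the basic style features, the resulting cluster-center vectors serving as the target domain style features. This is a minimal sketch; the patent does not prescribe k-means specifically, so treat the algorithm choice and the iteration count as assumptions:

```python
import numpy as np

def style_cluster_centers(basic_styles, k, iters=20, seed=0):
    """Cluster basic style features of shape (n, d); return the k center vectors."""
    rng = np.random.default_rng(seed)
    centers = basic_styles[rng.choice(len(basic_styles), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every style feature to its nearest center
        dist = np.linalg.norm(basic_styles[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # move each center to the mean of its assigned features
        for j in range(k):
            if np.any(labels == j):
                centers[j] = basic_styles[labels == j].mean(axis=0)
    return centers
```

Each returned center summarizes one style cluster of the target domain; downstream, each center acts as one target domain style feature.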
- the feature generation module 320 is used to generate a variety of new style features and to update the source domain content features and the target domain style features, with the goals that the generated new style features are different from both the source domain style features of the source domain images and the target domain style features, that the various new style features are different from each other, and that the images generated by combining the new style features with the source domain content features are semantically consistent with the source domain images.
- the feature generation module 320 is configured to randomly generate a preset number of new style features, and input the generated new style features and the source domain content features into a generation network to obtain first migration images; input the target domain style features and the source domain content features into the generation network to obtain second migration images; determine a first loss function according to the style difference between each first migration image and the corresponding source domain image and the style difference between each first migration image and the corresponding second migration image, used to represent the difference between the generated new style features and the source domain style features and the target domain style features; determine a second loss function according to the style differences between the first migration images, used to represent the differences between the various new style features; determine a third loss function according to the difference between the semantic features of the first migration images and the semantic features of the corresponding source domain images, used to represent the semantic difference between the images generated by combining the new style features with the source domain content features and the source domain images; and adjust the generated new style features according to the first loss function, the second loss function, and the third loss function until a preset convergence condition corresponding to the target is reached, to obtain the generated multiple new style features.
- the feature generation module 320 is configured to adjust the parameters of the content encoder, the style encoder, and the generator according to the first loss function, the second loss function, and the third loss function, until a preset corresponding to the target is reached Convergence condition; when the preset convergence condition corresponding to the target is reached, the source domain content feature output by the content encoder is used as the updated source domain content feature, and the target domain style feature output by the style encoder is used as the updated target. Domain style features.
- the first migration image and the corresponding source domain image are used as the first reference image and the second reference image respectively, or the first migration image and the corresponding second migration image are used as the first reference image and the second reference image respectively, or any two first migration images are used as the first reference image and the second reference image respectively; the style difference between the first reference image and the second reference image is then determined as follows: the first reference image and the second reference image are respectively input into a plurality of preset feature layers in the pre-trained feature extraction network; for each feature layer, the mean and variance of the features of the first reference image output by the feature layer are taken as the first mean and the first variance, and the mean and variance of the features of the second reference image output by the feature layer are taken as the second mean and the second variance; the style difference between the first reference image and the second reference image is determined according to the difference between the first mean and the second mean and the difference between the first variance and the second variance for each feature layer.
- the first loss function, the second loss function, and the third loss function can be determined with reference to formulas (2)-(4), respectively, and will not be repeated here.
- the feature generation module 320 is configured to perform a weighted summation of the first loss function, the second loss function, and the third loss function to obtain a target loss function; determine the gradient according to the target loss function; and adjust the generated new style features according to the gradient and a preset learning rate; wherein the value of each dimension of the randomly generated preset number of new style features is randomly sampled from the standard normal distribution.
- the image generation module 330 is configured to combine the generated multiple new style features and the updated target domain style features with the updated source domain content features, respectively, to generate a first image and a second image respectively.
- the image generation module 330 is configured to input the generated multiple new style features and the updated source domain content features into the generator to obtain the first image when a preset convergence condition corresponding to the target is reached. , the updated target domain style features and the updated source domain content features are input into the generator to obtain the second image.
- the training module 340 is configured to use the first image, the second image, and the source domain image to train the target detection model to obtain the trained target detection model.
- the training module 340 is configured to input the first image, the second image, and the source domain image respectively into the target detection model to obtain the target detection result of the first image, the target detection result of the second image, and the target detection result of the source domain image; determine the target detection loss function according to the differences between these target detection results and the label information of the corresponding source domain images; and adjust the parameters of the target detection model according to the target detection loss function.
- the training module 340 is further configured to input the first image, the second image, the source domain image, and the target domain image respectively into the basic feature extraction network of the target detection model to obtain the basic features of the first image, the basic features of the second image, the basic features of the source domain image, and the basic features of the target domain image; input these basic features respectively into the gradient reversal layer and then into the discrimination network to obtain the discrimination result of the first image, the discrimination result of the second image, the discrimination result of the source domain image, and the discrimination result of the target domain image; determine the discriminant loss function according to these discrimination results; and adjust the parameters of the target detection model according to the target detection loss function and the discriminant loss function.
- the target detection result includes a positioning result and a classification result, wherein the positioning result is the position information of the detected target, the classification result is the category information of the detected target, and the labeling information of the source domain image includes the position information of the target in the source domain image and the category information of the target in the source domain image; the training module 340 is used to determine the localization loss function according to the differences between the positioning result of the first image, the positioning result of the second image, and the positioning result of the source domain image, respectively, and the position information of the target in the corresponding source domain image; determine the classification loss function according to the differences between the classification result of the first image, the classification result of the second image, and the classification result of the source domain image, respectively, and the category information of the target in the corresponding source domain image; and calculate the weighted sum of the localization loss function and the classification loss function to determine the target detection loss function.
- the image processing apparatus 30 further includes: a target detection module 350, configured to input the image to be detected into the trained target detection model to obtain the target detection result of the image to be detected.
- the image processing apparatuses in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, which will be described below with reference to FIG. 4 and FIG. 5 .
- FIG. 4 is a structural diagram of some embodiments of the disclosed image processing apparatus.
- the apparatus 40 of this embodiment includes a memory 410 and a processor 420 coupled to the memory 410; the processor 420 is configured to execute the image processing method in any of the embodiments of the present disclosure based on instructions stored in the memory 410.
- the memory 410 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
- the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
- FIG. 5 is a structural diagram of other embodiments of the disclosed image processing apparatus.
- the apparatus 50 in this embodiment includes: a memory 510 and a processor 520 , which are similar to the memory 410 and the processor 420 , respectively. It may also include an input-output interface 530, a network interface 540, a storage interface 550, and the like. These interfaces 530 , 540 , 550 and the memory 510 and the processor 520 can be connected, for example, through a bus 560 .
- the input and output interface 530 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
- the network interface 540 provides a connection interface for various networked devices, for example, it can be connected to a database server or a cloud storage server.
- the storage interface 550 provides a connection interface for external storage devices such as SD cards and USB flash drives.
- embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein .
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
- These computer program instructions can also be loaded onto a computer or other programmable data processing device so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions provide steps configured to implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Abstract
Description
Claims (22)
- An image processing method, comprising: acquiring source domain content features of source domain images and target domain style features of target domain images; generating a variety of new style features and updating the source domain content features and the target domain style features, with the goals that the generated new style features are different from both the source domain style features of the source domain images and the target domain style features, that the various new style features are different from each other, and that the images generated by combining the new style features with the source domain content features are semantically consistent with the source domain images; combining the generated multiple new style features and the updated target domain style features respectively with the updated source domain content features to generate a first image and a second image respectively; and training a target detection model using the first image, the second image, and the source domain images to obtain a trained target detection model.
- The image processing method according to claim 1, wherein acquiring the source domain content features of the source domain images and the target domain style features of the target domain images comprises: extracting the source domain content features of the source domain images using a content encoder; and extracting the target domain style features of the target domain images using a style encoder.
- The image processing method according to claim 2, wherein the style encoder includes a style feature extraction network and a clustering module, and extracting the target domain style features of the target domain images using the style encoder comprises: inputting each target domain image into the style feature extraction network to obtain the output basic style features of each target domain image; and inputting the basic style features of each target domain image into the clustering module for clustering to obtain the feature vectors of the cluster centers as the target domain style features.
- The image processing method according to claim 2, wherein generating the variety of new style features comprises: randomly generating a preset number of new style features, and inputting the generated new style features and the source domain content features into a generation network to obtain first migration images; inputting the target domain style features and the source domain content features into the generation network to obtain second migration images; determining a first loss function according to the style difference between each first migration image and the corresponding source domain image and the style difference between each first migration image and the corresponding second migration image, used to represent the difference between the generated new style features and the source domain style features and the target domain style features; determining a second loss function according to the style differences between the first migration images, used to represent the differences between the various new style features; determining a third loss function according to the difference between the semantic features of the first migration images and the semantic features of the corresponding source domain images, used to represent the semantic difference between the images generated by combining the new style features with the source domain content features and the source domain images; and adjusting the generated new style features according to the first loss function, the second loss function, and the third loss function until a preset convergence condition corresponding to the goals is reached, to obtain the generated multiple new style features.
- The image processing method according to claim 4, wherein updating the source domain content features and the target domain style features comprises: adjusting the parameters of the content encoder, the style encoder, and the generator according to the first loss function, the second loss function, and the third loss function until a preset convergence condition corresponding to the goals is reached; and when the preset convergence condition corresponding to the goals is reached, taking the source domain content features output by the content encoder as the updated source domain content features and the target domain style features output by the style encoder as the updated target domain style features.
- The image processing method according to claim 4, wherein the first migration image and the corresponding source domain image are used as the first reference image and the second reference image respectively, or the first migration image and the corresponding second migration image are used as the first reference image and the second reference image respectively, or any two first migration images are used as the first reference image and the second reference image respectively, and the style difference between the first reference image and the second reference image is determined as follows: inputting the first reference image and the second reference image respectively into a plurality of preset feature layers in a pre-trained feature extraction network; for each feature layer, taking the mean and variance of the features of the first reference image output by the feature layer as a first mean and a first variance, and taking the mean and variance of the features of the second reference image output by the feature layer as a second mean and a second variance; and determining the style difference between the first reference image and the second reference image according to the difference between the first mean and the second mean and the difference between the first variance and the second variance corresponding to each feature layer.
- The image processing method according to claim 4, wherein the first loss function is determined according to the following formula, in which: … represents the first loss function corresponding to the i-th new style feature and the k-th source domain image, k is a positive integer, 1≤k≤n s , i is a positive integer; n=n s +n t represents the total number of source domain images and target domain images, and n s and n t represent the numbers of source domain images and target domain images respectively; n j represents the number of target images corresponding to the j-th target domain style feature; K t represents the number of target domain style features; T nov is a hyperparameter representing the threshold for distance maximization, 1≤j≤K t , j is a positive integer; … represents the k-th source domain image; … represents the first migration image obtained by inputting the i-th new style feature and the source domain content features of the k-th source domain image into the generation network; … represents the second migration image obtained by inputting the j-th target domain style feature and the source domain content features of the k-th source domain image into the generation network; and d(·) represents the function for determining the style difference between two images.
- The image processing method according to claim 4, wherein adjusting the generated new style features according to the first loss function, the second loss function, and the third loss function comprises: performing a weighted summation of the first loss function, the second loss function, and the third loss function to obtain a target loss function; determining a gradient according to the target loss function; and adjusting the generated new style features according to the gradient and a preset learning rate; wherein the value of each dimension of the randomly generated preset number of new style features is randomly sampled from the standard normal distribution.
- The image processing method according to claim 5, wherein combining the generated multiple new style features and the updated target domain style features respectively with the updated source domain content features to generate the first image and the second image respectively comprises: when the preset convergence condition corresponding to the goals is reached, inputting the generated multiple new style features and the updated source domain content features into the generator to obtain the first image, and inputting the updated target domain style features and the updated source domain content features into the generator to obtain the second image.
- The image processing method according to claim 1, wherein training the target detection model using the first image, the second image, and the source domain image comprises: inputting the first image, the second image, and the source domain image respectively into the target detection model to obtain the target detection result of the first image, the target detection result of the second image, and the target detection result of the source domain image; determining a target detection loss function according to the differences between the target detection result of the first image, the target detection result of the second image, and the target detection result of the source domain image, respectively, and the annotation information of the corresponding source domain image; and adjusting the parameters of the target detection model according to the target detection loss function.
- The image processing method according to claim 12, wherein training the target detection model using the first image, the second image, and the source domain image further comprises: inputting the first image, the second image, the source domain image, and the target domain image respectively into a basic feature extraction network of the target detection model to obtain the basic features of the first image, the basic features of the second image, the basic features of the source domain image, and the basic features of the target domain image; inputting these basic features respectively into a gradient reversal layer and then into a discrimination network to obtain the discrimination result of the first image, the discrimination result of the second image, the discrimination result of the source domain image, and the discrimination result of the target domain image; and determining a discriminant loss function according to the discrimination result of the first image, the discrimination result of the second image, the discrimination result of the source domain image, and the discrimination result of the target domain image; wherein adjusting the parameters of the target detection model according to the target detection loss function comprises: adjusting the parameters of the target detection model according to the target detection loss function and the discriminant loss function.
- The image processing method according to claim 12, wherein the target detection result includes a positioning result and a classification result, the positioning result being the position information of the detected target and the classification result being the category information of the detected target, and the annotation information of the source domain image includes the position information of the target in the source domain image and the category information of the target in the source domain image; and determining the target detection loss function according to the differences between the target detection result of the first image, the target detection result of the second image, and the target detection result of the source domain image, respectively, and the annotation information of the corresponding source domain image comprises: determining a localization loss function according to the differences between the positioning result of the first image, the positioning result of the second image, and the positioning result of the source domain image, respectively, and the position information of the target in the corresponding source domain image; determining a classification loss function according to the differences between the classification result of the first image, the classification result of the second image, and the classification result of the source domain image, respectively, and the category information of the target in the corresponding source domain image; and performing a weighted summation of the localization loss function and the classification loss function to determine the target detection loss function.
- The image processing method according to claim 14, wherein the localization loss function is determined according to the following formula:
- The image processing method according to claim 14, wherein the classification loss function is determined according to the following formula:
- The image processing method according to claim 13, wherein the discriminant loss function is determined according to the following formula, in which: … represents the i-th source domain image; n s represents the number of source domain images; … represents the source domain discriminant loss function determined according to the discrimination results of the source domain images; … represents the j-th target domain image; … represents the style type to which the j-th target domain image belongs; n t represents the number of target domain images, 1≤j≤n t , j is a positive integer; … represents the target domain discriminant loss function determined according to the discrimination results of the target domain images; d k represents the k-th style feature in the set of the generated multiple new style features and the updated target domain style features; … represents the image generated by combining the k-th style feature with the updated source domain content features of the i-th source domain image, 1≤k≤N d , k is a positive integer; N d represents the total number of style features in the set of the generated multiple new style features and the updated target domain style features; and … represents the discriminant loss function determined according to the discrimination result of the first image and the discrimination result of the second image.
- The image processing method according to claim 1, further comprising: inputting an image to be detected into the trained target detection model to obtain a target detection result of the image to be detected.
- An image processing apparatus, comprising: an acquisition module for acquiring source domain content features and acquiring target domain style features of target domain images; a feature generation module for generating a variety of new style features and updating the source domain content features and the target domain style features, with the goals that the generated new style features are different from both the source domain style features of the source domain images and the target domain style features, that the various new style features are different from each other, and that the images generated by combining the new style features with the source domain content features are semantically consistent with the source domain images; an image generation module for combining the generated multiple new style features and the updated target domain style features respectively with the updated source domain content features to generate a first image and a second image respectively; and a training module for training a target detection model using the first image, the second image, and the source domain images to obtain a trained target detection model.
- An image processing apparatus, comprising: a processor; and a memory coupled to the processor and storing instructions which, when executed by the processor, cause the processor to perform the image processing method according to any one of claims 1-19.
- A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-19.
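The style-feature update described in the claims above — sampling style vectors from a standard normal distribution, forming a weighted sum of the three loss functions, and taking a gradient step at a preset learning rate — can be sketched as follows. All function names, default weights, and the loss-gradient inputs are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def sample_new_styles(num_styles, style_dim, seed=None):
    # Each dimension of each new style feature is drawn from a
    # standard normal distribution, per the claim.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_styles, style_dim))

def update_styles(styles, loss_grads, weights=(1.0, 1.0, 1.0), lr=0.01):
    """One gradient step on the weighted target loss.

    loss_grads: three arrays, each the gradient of one of the three
    losses w.r.t. the style features (placeholders for the claimed
    first/second/third loss functions).
    """
    # Target loss = weighted sum of the three losses, so its gradient
    # is the same weighted sum of the individual gradients.
    total_grad = sum(w * g for w, g in zip(weights, loss_grads))
    return styles - lr * total_grad
```

In practice the three gradients would come from backpropagating the style-difference, style-diversity, and semantic-consistency objectives described in the claims.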
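The gradient reversal layer referenced in claim 13 above is conventionally the identity in the forward pass and negates (optionally scales) the gradient in the backward pass, so the base feature extractor is trained adversarially against the discrimination network. A minimal sketch, with the scaling factor as an assumed parameter:

```python
import numpy as np

class GradientReversal:
    """Identity forward; negated, scaled gradient backward."""

    def __init__(self, lambda_=1.0):
        self.lambda_ = lambda_

    def forward(self, x):
        # Features pass through unchanged to the discrimination network.
        return x

    def backward(self, grad_output):
        # The gradient flowing back to the feature extractor is flipped,
        # pushing the extractor toward domain-indistinguishable features.
        return -self.lambda_ * grad_output
```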
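The weighted combination of localization and classification losses in claim 14 above can be illustrated with simple placeholder losses (mean absolute error for localization, negative log-likelihood for classification); the claims do not fix these exact forms, so every function and weight here is an assumption:

```python
import numpy as np

def localization_loss(pred_boxes, gt_boxes):
    # Placeholder: mean absolute error between predicted boxes and
    # the annotated object positions.
    return np.abs(pred_boxes - gt_boxes).mean()

def classification_loss(pred_probs, gt_labels):
    # Placeholder: mean negative log-likelihood of the annotated class.
    return -np.log(pred_probs[np.arange(len(gt_labels)), gt_labels]).mean()

def detection_loss(pred_boxes, gt_boxes, pred_probs, gt_labels,
                   w_loc=1.0, w_cls=1.0):
    # Weighted sum of the two losses, per the claim.
    return (w_loc * localization_loss(pred_boxes, gt_boxes)
            + w_cls * classification_loss(pred_probs, gt_labels))
```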
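A common form for the multi-domain discrimination loss of claim 13 above is the mean negative log-likelihood of each image's domain or style label under the discriminator's output; the sketch below is that common form offered purely as an illustrative assumption, not the exact patented formula:

```python
import numpy as np

def domain_discrimination_loss(probs, domain_labels):
    """Mean negative log-likelihood of each image's domain/style label.

    probs: (n, num_domains) discriminator softmax outputs.
    domain_labels: (n,) integer domain/style index per image, e.g. the
    source domain, a target style type, or a generated style type.
    """
    picked = probs[np.arange(len(domain_labels)), domain_labels]
    return -np.log(picked).mean()
```

The overall discrimination loss would then sum such terms over source images, target images, and the style-transferred first and second images.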
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020237038915A KR20230171966A (ko) | 2021-04-16 | 2022-04-15 | Image processing method and apparatus, and computer-readable storage medium |
JP2023563039A JP2024513596A (ja) | 2021-04-16 | 2022-04-15 | Image processing method and apparatus, and computer-readable storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110410920.0A CN113111947B (zh) | 2021-04-16 | 2021-04-16 | Image processing method and apparatus, and computer-readable storage medium |
CN202110410920.0 | 2021-04-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022218396A1 true WO2022218396A1 (zh) | 2022-10-20 |
Family
ID=76718007
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/086976 WO2022218396A1 (zh) | 2021-04-16 | 2022-04-15 | 图像处理方法、装置和计算机可读存储介质 |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP2024513596A (zh) |
KR (1) | KR20230171966A (zh) |
CN (1) | CN113111947B (zh) |
WO (1) | WO2022218396A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116246014A (zh) * | 2022-12-28 | 2023-06-09 | Alipay (Hangzhou) Information Technology Co., Ltd. | Avatar generation method and apparatus, storage medium, and electronic device |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111947B (zh) * | 2021-04-16 | 2024-04-09 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Image processing method and apparatus, and computer-readable storage medium |
CN114511646B (zh) * | 2022-04-19 | 2022-06-14 | Nantong Dongde Textile Technology Co., Ltd. | Cloth style recognition method and system based on image processing |
CN116758617B (zh) * | 2023-08-16 | 2023-11-10 | Sichuan Vocational College of Information Technology | Campus student check-in method and campus check-in system for low-illumination scenes |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190122120A1 (en) * | 2017-10-20 | 2019-04-25 | Dalei Wu | Self-training method and system for semi-supervised learning with generative adversarial networks |
CN110930295A (zh) * | 2019-10-25 | 2020-03-27 | The Open University of Guangdong (Guangdong Polytechnic Institute) | Image style transfer method, system, apparatus, and storage medium |
CN111292384A (zh) * | 2020-01-16 | 2020-06-16 | Xi'an Jiaotong University | Cross-domain diverse image generation method and system based on generative adversarial networks |
CN112184846A (zh) * | 2020-09-16 | 2021-01-05 | Shanghai Eye Control Technology Co., Ltd. | Image generation method and apparatus, computer device, and readable storage medium |
CN113111947A (zh) * | 2021-04-16 | 2021-07-13 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Image processing method and apparatus, and computer-readable storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11380034B2 (en) * | 2017-10-27 | 2022-07-05 | Google Llc | Semantically-consistent image style transfer |
CN108734653B (zh) * | 2018-05-07 | 2022-05-13 | SenseTime Group Limited | Image style conversion method and apparatus |
US11205096B2 (en) * | 2018-11-19 | 2021-12-21 | Google Llc | Training image-to-image translation neural networks |
CN110310221B (zh) * | 2019-06-14 | 2022-09-20 | Dalian University of Technology | Multi-domain image style transfer method based on generative adversarial networks |
CN112308862A (zh) * | 2020-06-04 | 2021-02-02 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Image semantic segmentation model training and segmentation method, apparatus, and storage medium |
- 2021
  - 2021-04-16 CN CN202110410920.0A patent/CN113111947B/zh active Active
- 2022
  - 2022-04-15 WO PCT/CN2022/086976 patent/WO2022218396A1/zh active Application Filing
  - 2022-04-15 JP JP2023563039A patent/JP2024513596A/ja active Pending
  - 2022-04-15 KR KR1020237038915A patent/KR20230171966A/ko unknown
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116246014A (zh) * | 2022-12-28 | 2023-06-09 | Alipay (Hangzhou) Information Technology Co., Ltd. | Avatar generation method and apparatus, storage medium, and electronic device |
CN116246014B (zh) * | 2022-12-28 | 2024-05-14 | Alipay (Hangzhou) Information Technology Co., Ltd. | Avatar generation method and apparatus, storage medium, and electronic device |
Also Published As
Publication number | Publication date |
---|---|
KR20230171966A (ko) | 2023-12-21 |
CN113111947A (zh) | 2021-07-13 |
CN113111947B (zh) | 2024-04-09 |
JP2024513596A (ja) | 2024-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sindagi et al. | Prior-based domain adaptive object detection for hazy and rainy conditions | |
Ribera et al. | Locating objects without bounding boxes | |
CN108470332B (zh) | Multi-target tracking method and apparatus | |
WO2022218396A1 (zh) | Image processing method and apparatus, and computer-readable storage medium | |
Xie et al. | Multilevel cloud detection in remote sensing images based on deep learning | |
CN107133569B (zh) | Multi-granularity labeling method for surveillance video based on generalized multi-label learning | |
WO2020228525A1 (zh) | Method and apparatus for place recognition and model training, and electronic device | |
CN113168567A (zh) | System and method for few-shot transfer learning | |
US9798923B2 (en) | System and method for tracking and recognizing people | |
CN111582409A (zh) | Training method for image label classification network, image label classification method, and device | |
CN108229347A (zh) | Method and apparatus for deep permutation with quasi-Gibbs structure sampling for person recognition | |
CN108021869A (zh) | Convolutional neural network tracking method incorporating a Gaussian kernel function | |
CN108038515A (zh) | Unsupervised multi-target detection and tracking method, and storage device and camera device therefor | |
CN110222572A (zh) | Tracking method and apparatus, electronic device, and storage medium | |
Li et al. | Unsupervised domain adaptation with self-attention for post-disaster building damage detection | |
Reddy et al. | AdaCrowd: Unlabeled scene adaptation for crowd counting | |
Kim et al. | A robust matching network for gradually estimating geometric transformation on remote sensing imagery | |
CN113065409A (zh) | Unsupervised person re-identification method based on camera distribution difference alignment constraints | |
CN111444816A (zh) | Multi-scale dense pedestrian detection method based on Faster RCNN | |
Yang et al. | Increaco: incrementally learned automatic check-out with photorealistic exemplar augmentation | |
Wang et al. | Robust visual tracking via a hybrid correlation filter | |
Pino et al. | Semantic segmentation of radio-astronomical images | |
Shit et al. | An encoder‐decoder based CNN architecture using end to end dehaze and detection network for proper image visualization and detection | |
TW202226054A (zh) | Object recognition device and object recognition method | |
Wang et al. | Adaptive sampling for UAV tracking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22787618 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023563039 Country of ref document: JP |
|
ENP | Entry into the national phase |
Ref document number: 20237038915 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1020237038915 Country of ref document: KR |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.02.2024) |