WO2022048182A1 - Image style transfer method and apparatus, and image style transfer model training method and apparatus - Google Patents

Image style transfer method and apparatus, and image style transfer model training method and apparatus

Info

Publication number
WO2022048182A1
Authority
WO
WIPO (PCT)
Prior art keywords
style
image
instance
content
encoder
Prior art date
Application number
PCT/CN2021/093432
Other languages
French (fr)
Chinese (zh)
Inventor
傅慧源
马华东
余艇
张宇
Original Assignee
北京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京邮电大学
Publication of WO2022048182A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G06T 2207/10048 Infrared image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20112 Image segmentation details
    • G06T 2207/20132 Image cropping

Definitions

  • the present invention relates to the technical field of image processing, in particular to a method and device for image style conversion and model training.
  • Image style transfer is an important topic in the field of image processing.
  • Image style conversion technology can be used to enhance images; for example, nighttime images can be enhanced to obtain clearer images, greatly improving image visibility, which is of great significance for video surveillance.
  • With the continuing development of digital image processing, pattern recognition, and deep learning techniques, image style transfer methods are also evolving.
  • Image style transfer methods based on traditional image processing operate directly on the image itself, do not exploit high-level image features, and adapt poorly to new scenes.
  • Image style transfer based on generative adversarial networks is more widely used, but many technical problems remain: images are still blurred after style transfer, instances in the image, such as cars and people, are transferred poorly, and paired image data of different styles are difficult to obtain.
  • In view of this, the purpose of the present invention is to propose a method and device for image style conversion and model training, so as to improve the adaptability of image style conversion to a variety of different scenarios, ameliorate image blur and poor instance effects after style conversion, and achieve high-quality image style transfer at both coarse and fine granularity.
  • the present invention provides a method for image style conversion, comprising:
  • the first-style image to be style-converted and the second-style image serving as the reference image are respectively input into the content and style encoders in the encoder network, and the content- and style-encoding feature images are respectively extracted;
  • the style- and content-encoding feature images are respectively input into the multi-layer perceptron module and the residual convolution module in the decoder network, where the perceptron operation and the residual convolution operation are respectively performed to obtain the parameters of the adaptive instance normalization module and the intermediate process feature image; the obtained parameters are shared with the adaptive instance normalization module of the decoder network;
  • the intermediate process feature image is input into the adaptive instance normalization module for instance normalization, and the instance-normalized feature image is input into the upsampling layer of the decoder network to obtain a target image converted from the first style to the second style;
  • the image style conversion model composed of the encoder and decoder networks is pre-trained by image training samples including a plurality of images of the first and second styles and instance images cropped from the image training samples.
  • the image style conversion model is specifically obtained by pre-training according to the following method:
  • the image generation model includes: a global style transfer model and a local style transfer model; the global style transfer model includes a global encoder network and a global decoder network; the local style transfer model includes a local encoder network and a local decoder network;
  • an iterative training process includes:
  • the instance images cropped from the first- and second-style images are input into the local style transfer model, content features and style features are decoupled twice through the local encoder and decoder networks, and the content and style features are decoded to obtain reconstructed instance images;
  • the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image are input into the global decoder to obtain a generated first/second-style instance content image;
  • the local encoder and decoder network and the global encoder and decoder network are jointly adjusted.
  • the method further includes:
  • the first- and second-style images in the image training samples are input into the global style transfer model, content features and style features are decoupled twice through the global encoder and decoder networks, and the content and style features are decoded to obtain reconstructed first- and second-style images, including:
  • the first-style image in the image training sample is input into the global encoder network, and the content encoder and the style encoder in the global encoder network perform multi-layer convolution operations to decouple the content-encoding and style-encoding feature images of the first-style image;
  • the second-style image in the image training sample is input into the global encoder network, and the content encoder and the style encoder in the global encoder network perform multi-layer convolution operations to decouple the content-encoding and style-encoding feature images of the second-style image;
  • the style-encoding feature image of the first-style image and the content-encoding feature image of the second-style image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network, and the decoder network obtains a first generated image that fuses the style of the first-style image and the content of the second-style image;
  • the style-encoding feature image of the second-style image and the content-encoding feature image of the first-style image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network, obtaining a second generated image that fuses the style of the second-style image and the content of the first-style image;
  • the first generated image is input into the global encoder network, and the content encoder and style encoder of the global encoder network perform multi-layer convolution operations to decouple the content-encoding and style-encoding feature images of the first generated image;
  • the second generated image is input into the global encoder network, and the content encoder and style encoder of the global encoder network perform multi-layer convolution operations to decouple the content-encoding and style-encoding feature images of the second generated image;
  • the style-encoding feature image of the first generated image and the content-encoding feature image of the second generated image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network, finally obtaining the reconstructed first-style image;
  • the style-encoding feature image of the second generated image and the content-encoding feature image of the first generated image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network, finally obtaining the reconstructed second-style image.
  • the instance images cropped from the first- and second-style images are input into the local style transfer model, content features and style features are decoupled twice through the local encoder and decoder networks, and the content and style features are decoded to obtain reconstructed instance images, including:
  • the first-style instance images are input into the local encoder network, and the content encoder and the style encoder in the local encoder network perform multi-layer convolution operations to decouple the content-encoding and style-encoding feature images of the first-style instance images;
  • the style-encoding feature image of the first generated instance image and the content-encoding feature image of the second generated instance image are respectively input into the multi-layer perceptron module and the residual convolution module in the local decoder network, finally obtaining the reconstructed first-style instance image.
  • the instance image of the first/second style is an instance image cropped from the image of the first/second style.
  • the present invention also provides a device for image style conversion, comprising: an image style conversion model trained by the above-mentioned method, which is used to convert an input first-style image to be style-converted into a second-style target image according to an input second-style image serving as the reference image.
  • the present invention also provides a training method for an image style conversion model, comprising:
  • the image generation model includes: a global style transfer model and a local style transfer model; the global style transfer model includes a global encoder network and a global decoder network; the local style transfer model includes a local encoder network and a local decoder network;
  • an iterative training process includes:
  • the instance images cropped from the first- and second-style images are input into the local style transfer model, content features and style features are decoupled twice through the local encoder and decoder networks, and the content and style features are decoded to obtain reconstructed instance images;
  • the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image are input into the global decoder to obtain a generated first/second-style instance content image;
  • the local encoder and decoder network and the global encoder and decoder network are jointly adjusted.
  • the present invention also provides a training device for an image style conversion model, comprising:
  • An image generation model building module used for building an image generation model; wherein, the image generation model includes: a global style transfer model and a local style transfer model; wherein, the global style transfer model includes: a global encoder network, a decoder network; the local style transfer model includes: a local encoder and a decoder network;
  • the image generation model training module is used to perform multiple iterations of training on the image generation model; after the number of training iterations reaches the first preset number, the global style transfer model is used as the trained image style conversion model; wherein one iteration of training includes: inputting the first- and second-style images in the image training samples into the global style transfer model, decoupling content features and style features twice through the global encoder and decoder networks, and decoding the content and style features to obtain the reconstructed first- and second-style images; inputting the instance images cropped from the first- and second-style images into the local style transfer model, decoupling content features and style features twice through the local encoder and decoder networks, and decoding the content and style features to obtain the reconstructed instance images; inputting the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image into the global decoder to obtain the generated first/second-style instance content image; inputting the generated first/second-style instance content image into the content encoder and style encoder in the local encoder network for multi-layer convolution operations that decouple the style- and content-encoding feature images of the instance content image; inputting the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image into the global decoder network to obtain the reconstructed cross-granularity second-style image; inputting the decoupled style-encoding feature image of the first-style image and the content-encoding feature image of the first-style instance content image into the global decoder to obtain the reconstructed cross-granularity instance first-style image; adjusting the parameters of the global encoder and decoder networks according to the distance between each reconstructed first/second-style image and the corresponding first/second-style image in the image training samples; adjusting the parameters of the local encoder and decoder networks according to the distance between each reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples; and jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples and the distance between the reconstructed cross-granularity instance first-style image and the corresponding first-style instance image.
  • the first-style image to be style-converted and the second-style image serving as the reference image are respectively input into the content and style encoders in the encoder network, and the content- and style-encoding feature images are respectively extracted; the style- and content-encoding feature images are respectively input into the multi-layer perceptron module and the residual convolution module in the decoder network to obtain the parameters of the adaptive instance normalization module and the intermediate process feature image; and the obtained parameters are shared with the adaptive instance normalization module of the decoder network;
  • the intermediate process feature image is input into the adaptive instance normalization module for instance normalization, and the instance-normalized feature image is input into the upsampling layer of the decoder network to obtain the target image converted from the first style to the second style;
  • the image style transfer model composed of the encoder and decoder networks is pre-trained with image training samples comprising a plurality of first- and second-style images and instance images cropped from the image training samples.
  • the technical solution of the present invention uses coarse-grained first- and second-style images together with fine-grained instance images when training the image style transfer model, thereby introducing cross-granularity learning into the style transfer model, which enhances the style transfer quality of fine-grained instances while ensuring the style transfer quality of coarse-grained global images, and ameliorates the blur and distortion of local instance images after style transfer.
  • FIG. 1 is a flowchart of a method for image style conversion provided by an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of the internal structure of an image style conversion model provided by an embodiment of the present invention;
  • FIG. 3 is a flowchart of a training method for an image style conversion model provided by an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of the internal structure of an image generation model provided by an embodiment of the present invention;
  • FIG. 5 is a flowchart of a method for iterative training of an image style transfer model provided by an embodiment of the present invention;
  • FIGS. 6a-6h are schematic diagrams of reconstructing first- and second-style images based on the global encoder and decoder networks according to an embodiment of the present invention;
  • FIGS. 7a-7h are schematic diagrams of reconstructing first- and second-style instance images based on the local encoder and decoder networks according to an embodiment of the present invention;
  • FIGS. 8a and 8b are schematic diagrams of generated first- and second-style instance content images based on the global decoder network according to an embodiment of the present invention;
  • FIG. 8c is a schematic diagram of the style and content features of an instance content image decoupled based on the local encoder network according to an embodiment of the present invention;
  • FIG. 8d is a schematic diagram of a cross-granularity second-style image reconstructed based on the global decoder network according to an embodiment of the present invention;
  • FIG. 8e is a schematic diagram of a cross-granularity instance first-style image reconstructed based on the global decoder network according to an embodiment of the present invention;
  • FIG. 8f is a schematic diagram of the internal structure of an adversarial training model provided by an embodiment of the present invention;
  • FIG. 9 is a block diagram of the internal structure of an apparatus for training an image style transfer model according to an embodiment of the present invention.
  • A method for image style conversion provided by an embodiment of the present invention is shown in FIG. 1 and includes the following steps:
  • Step S101 Input the image of the first style to be style-converted and the image of the second style as a reference image into the content and style encoders in the image style conversion model respectively, and extract the content and style encoding feature images respectively.
  • the internal structure of the image style conversion model of the present invention may include: an encoder network 201 and a decoder network 202; the encoder network 201 may include a content encoder 211 and a style encoder 212; the decoder network 202 may include a multi-layer perceptron module 221, a residual convolution module 222, an adaptive instance normalization module 223, and an upsampling layer 224.
  • the first-style image and the second-style image are images of different scenes; for example, the first-style image may be a nighttime image captured by a surveillance camera device, and the second-style image may be a non-nighttime image;
  • the image of the first style may be a grayscale image captured by an infrared camera device;
  • the image of the second style may be a color image;
  • the image of the first style to be style-converted and the image of the second style as the reference image are respectively input into the content encoder 211 and the style encoder 212 in the encoder network 201;
  • the content encoder 211 performs a first preset convolution operation on the input first-style image to extract high-level feature information of the input image; the style encoder 212 performs a second preset convolution operation on the input second-style image to extract high-level feature information of the input image;
  • the content encoder 211 and the style encoder 212 respectively output the extracted content coding feature image and style coding feature image.
  • the content encoder 211 and the style encoder 212 may be lightweight feature-extraction convolutional neural networks, such as UNet (a U-shaped convolutional neural network). It can be understood that a feature-extraction convolutional neural network continuously expands its receptive field through successive local convolution operations and extracts high-level feature information of the input image.
  • the content encoder 211 includes multiple convolutional layers; each convolutional layer continues the convolution operation on the encoded features output by the previous convolutional layer, and the encoded features output by the last convolutional layer are the content-encoding feature image of the input image; the convolution kernel size and stride of each convolutional layer in the content encoder 211 can be set according to the specific scenario, for example, a kernel size of (7×7) and a stride of (1×1).
  • the style encoder 212 includes multiple convolutional layers, a global pooling layer, and a fully connected layer; each convolutional layer continues the convolution operation on the encoded features output by the previous convolutional layer, the encoded features output by the last convolutional layer are globally pooled by the global pooling layer and then enter the fully connected layer, and the fully connected layer maps the globally pooled encoded features into the style feature space to obtain the style-encoding feature image.
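  • The patent does not disclose layer counts or channel widths. The following is a minimal PyTorch sketch of the two encoders described above: a content encoder of stacked convolutions (e.g. a (7×7) kernel with stride (1×1) in the first layer, as suggested above) and a style encoder of convolutions followed by global pooling and a fully connected mapping into the style space. All class names and dimensions are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Stacked convolutional layers; each layer continues the convolution on the
    previous layer's output, and the last layer's output is the content-encoding
    feature image."""
    def __init__(self, in_ch=3, base_ch=64, n_layers=3):
        super().__init__()
        layers = [nn.Conv2d(in_ch, base_ch, kernel_size=7, stride=1, padding=3),
                  nn.InstanceNorm2d(base_ch), nn.ReLU(inplace=True)]
        ch = base_ch
        for _ in range(n_layers - 1):  # further conv layers (downsampling assumed)
            layers += [nn.Conv2d(ch, ch * 2, kernel_size=4, stride=2, padding=1),
                       nn.InstanceNorm2d(ch * 2), nn.ReLU(inplace=True)]
            ch *= 2
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (B, 3, H, W)
        return self.net(x)         # content-encoding feature image

class StyleEncoder(nn.Module):
    """Convolutional layers, then a global pooling layer, then a fully connected
    layer that maps the pooled features into the style feature space."""
    def __init__(self, in_ch=3, base_ch=64, n_layers=4, style_dim=8):
        super().__init__()
        layers = [nn.Conv2d(in_ch, base_ch, 7, 1, 3), nn.ReLU(inplace=True)]
        ch = base_ch
        for _ in range(n_layers - 1):
            layers += [nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.ReLU(inplace=True)]
            ch *= 2
        self.net = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)   # global pooling layer
        self.fc = nn.Linear(ch, style_dim)    # fully connected layer to style space

    def forward(self, x):
        h = self.pool(self.net(x)).flatten(1)
        return self.fc(h)          # style-encoding feature (style code)
```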
  • Step S102 Input the style and content coding feature images into the multi-layer perceptron module and the residual convolution module in the decoder network, respectively, to obtain the parameters of the adaptive instance normalization module and the intermediate process feature images, respectively.
  • the style-encoding feature image and the content-encoding feature image are respectively input into the multi-layer perceptron module 221 and the residual convolution module 222 in the decoder network 202, where the perceptron operation and the residual convolution operation are respectively performed to obtain the parameters of the adaptive instance normalization module and the intermediate process feature image;
  • the style encoding feature image and the content encoding feature image are respectively input into the multi-layer perceptron module 221 and the residual convolution module 222 in the decoder network 202;
  • the multi-layer perceptron module 221 performs a first preset perceptron operation on the input style-encoding feature image to obtain the parameters of the adaptive instance normalization module;
  • the residual convolution module 222 includes a multi-layer residual convolution layer, and the residual convolution module 222 performs a first preset residual convolution operation on the input content coding feature image to obtain an intermediate process feature image.
  • Step S103 Share the obtained parameters to the adaptive instance normalization module of the decoder network.
  • the parameters of the adaptive instance normalization module obtained by the multilayer perceptron module 221 are shared with the adaptive instance normalization module 223 in the decoder network 202 .
  • Step S104 Input the intermediate process feature image into the adaptive instance normalization module for instance normalization.
  • the intermediate process feature image obtained by the residual convolution module 222 is input to the adaptive instance normalization module 223 for instance normalization.
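  • The patent does not spell out the normalization formula; in the standard formulation of adaptive instance normalization, the per-channel statistics of the intermediate process feature image are replaced by the scale and shift parameters predicted by the multi-layer perceptron module 221:

```latex
\mathrm{AdaIN}(x;\gamma,\beta) = \gamma \cdot \frac{x - \mu(x)}{\sigma(x)} + \beta
```

  • Here μ(x) and σ(x) are the mean and standard deviation of the feature image x computed per channel and per instance, and γ, β are the parameters shared from the multi-layer perceptron module 221.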
  • Step S105 Input the instance-normalized feature image into the upsampling layer of the decoder network to obtain the target image converted from the first style to the second style.
  • the feature image instance-normalized by the adaptive instance normalization module 223 is input into the upsampling layer 224 of the decoder network 202, finally generating a target image that has the same size as the input first-style image to be style-converted and is converted from the first style to the second style, that is, a style-transferred image.
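  • A minimal PyTorch sketch of the decoder path just described, under the same illustrative assumptions as the encoder sketch above: the multi-layer perceptron maps the style code to AdaIN parameters, residual blocks produce the intermediate process feature image, AdaIN injects the style, and upsampling restores the input resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adain(x, gamma, beta, eps=1e-5):
    """Adaptive instance normalization: replace per-channel, per-instance
    statistics of x with the (gamma, beta) predicted from the style code."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps
    return gamma[..., None, None] * (x - mu) / sigma + beta[..., None, None]

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, 1, 1)
        self.conv2 = nn.Conv2d(ch, ch, 3, 1, 1)

    def forward(self, x):
        return x + self.conv2(F.relu(self.conv1(x)))

class Decoder(nn.Module):
    def __init__(self, ch=256, style_dim=8, out_ch=3, n_res=2, n_up=2):
        super().__init__()
        # multi-layer perceptron module: style code -> AdaIN (gamma, beta)
        self.mlp = nn.Sequential(nn.Linear(style_dim, ch), nn.ReLU(inplace=True),
                                 nn.Linear(ch, 2 * ch))
        # residual convolution module
        self.res = nn.Sequential(*[ResBlock(ch) for _ in range(n_res)])
        ups = []
        for _ in range(n_up):              # upsampling layers
            ups += [nn.Upsample(scale_factor=2, mode='nearest'),
                    nn.Conv2d(ch, ch // 2, 5, 1, 2), nn.ReLU(inplace=True)]
            ch //= 2
        ups += [nn.Conv2d(ch, out_ch, 7, 1, 3), nn.Tanh()]
        self.up = nn.Sequential(*ups)

    def forward(self, content_feat, style_code):
        gamma, beta = self.mlp(style_code).chunk(2, dim=1)  # shared AdaIN params
        h = self.res(content_feat)     # intermediate process feature image
        h = adain(h, gamma, beta)      # adaptive instance normalization module
        return self.up(h)              # target image in the second style
```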
  • the encoder network 201 is connected with the decoder network 202 in a decoupled manner to perform feature space fusion, and to perform style transfer with high quality while preserving the content.
  • the above-mentioned image style transfer model is pre-trained with image training samples comprising a plurality of first- and second-style images and instance images cropped from the image training samples; because coarse-grained first- and second-style images and fine-grained instance images are both used, cross-granularity learning is introduced into the style transfer model, enhancing the style transfer quality of fine-grained instances while ensuring the style transfer quality of coarse-grained global images.
  • the instances mainly include vehicles, pedestrians, and traffic signs; this improves the image style conversion effect and ameliorates the blur and distortion of local instance images after style conversion.
  • the image style transfer model is an unsupervised learning model that does not require paired reference training data of the same scene to be converted, which gives the model strong generalization ability, greatly reduces the difficulty of data acquisition, and improves the style transfer adaptability to different monitoring scenarios.
  • A training method for an image style conversion model provided by an embodiment of the present invention is shown in FIG. 3 and includes the following steps:
  • Step S300 Obtain image training samples.
  • a plurality of first-style images and second-style images of arbitrary scenes can be obtained from real monitoring data as image training samples; to ensure the style transfer effect of the trained image style transfer model, a large number of images from different monitoring scenarios can be selected as image training samples, and the first-style images and second-style images in the obtained training samples can be unpaired; object detection boxes annotated in the data can be used to extract fine-grained instances from the images, and instance images can be cropped from the image training samples to obtain unpaired instance training samples for the cross-granularity learning in subsequent steps.
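  • A sketch of building the unpaired instance training set from the annotated detection boxes, assuming boxes are given as (x1, y1, x2, y2) pixel coordinates; the function name and size threshold are hypothetical.

```python
def crop_instances(image, boxes, min_size=32):
    """Crop fine-grained instance images (e.g. vehicles, pedestrians, traffic
    signs) from a training image using its annotated detection boxes.
    image: array/tensor of shape (C, H, W); boxes: iterable of (x1, y1, x2, y2)."""
    _, H, W = image.shape
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(W, int(x2)), min(H, int(y2))
        if x2 - x1 >= min_size and y2 - y1 >= min_size:  # skip tiny instances
            crops.append(image[:, y1:y2, x1:x2])
    return crops
```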
  • Step S301 Build an image generation model.
  • the constructed image generation model includes: a global style transfer model 401 and a local style transfer model 402; wherein, the global style transfer model 401 includes: a global encoder network 411 and a global decoder network 412 ; the local style transfer model 402 includes: a local encoder network 421 and a local decoder network 422 .
  • the structures of the global encoder network 411 and the local encoder network 421 are the same, and the structures of the global decoder network 412 and the local decoder network 422 are the same.
  • the structure of the global encoder network 411 can be the same as the structure of the encoder network 201 in the above image style transfer model, and the structure of the global decoder network 412 can be the same as the structure of the decoder network 202 in the above image style transfer model .
  • Step S302 Perform multiple iterative training on the image generation model, and obtain a trained image style conversion model after the number of iterative training times reaches a first preset number of times.
  • the image generation model is iteratively trained multiple times; after the number of training iterations reaches the first preset number, the trained image style conversion model is constructed from the global encoder network 411 and the global decoder network 412, that is, the global encoder network 411 and the global decoder network 412 are used as the encoder network 201 and the decoder network 202 of the trained image style transfer model.
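  • A high-level training-loop skeleton for Step S302, assuming PyTorch-style modules and a single optimizer over all four networks; `model.iteration` is a hypothetical method standing in for sub-steps S501-S509 detailed below.

```python
def train(model, loader, opt, first_preset_iters=20000):
    """model bundles the global/local encoder and decoder networks; loader
    yields unpaired (first-style, second-style) image batches together with
    their cropped instance images."""
    it = 0
    while it < first_preset_iters:
        for img_a, img_b, inst_a, inst_b in loader:
            losses = model.iteration(img_a, img_b, inst_a, inst_b)  # S501-S509
            opt.zero_grad()
            sum(losses).backward()
            opt.step()
            it += 1
            if it >= first_preset_iters:
                break
    # the global networks become the encoder 201 and decoder 202 of the
    # trained image style transfer model
    return model.global_encoder, model.global_decoder
```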
  • Sub-step S501 Input the images of the first and second styles in the image training sample into the global style transfer model to obtain the reconstructed images of the first and second styles;
  • the first- and second-style images in the image training samples are input into the global style transfer model; content features and style features are decoupled twice through the global encoder and decoder networks, and the decoded content and style features yield the reconstructed first- and second-style images;
  • the first-style image in the image training sample is input into the global encoder network 411, and the multi-layer convolution operations performed by the content encoder and the style encoder in the global encoder network 411 decouple the content-encoding feature image and the style-encoding feature image of the first-style image;
  • the second-style image in the image training sample is input into the global encoder network 411, and the multi-layer convolution operations performed by the content encoder and the style encoder in the global encoder network 411 decouple the content-encoding feature image and the style-encoding feature image of the second-style image;
  • the style-encoding feature image of the first-style image and the content-encoding feature image of the second-style image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network 412; the multi-layer perceptron module in the global decoder network 412 operates on the input style-encoding feature image of the first-style image, and the output parameters are shared as the parameters of the adaptive instance normalization module of the global decoder network 412;
  • the residual convolution module in the global decoder network 412 performs a residual convolution operation on the input content-encoding feature image of the second-style image and outputs the intermediate process feature image to the adaptive instance normalization module of the global decoder network 412, so as to obtain an instance-normalized feature image that fuses the content features of the second-style image and the style features of the first-style image; the instance-normalized feature image is input into the upsampling layer of the global decoder network 412, obtaining a first generated image that fuses the style of the first-style image and the content of the second-style image.
  • the style-encoding feature image of the second-style image and the content-encoding feature image of the first-style image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network 412, obtaining a second generated image that fuses the style of the second-style image and the content of the first-style image.
  • the first generated image is input into the global encoder network 411, and the multi-layer convolution operations performed by the content encoder and the style encoder of the global encoder network 411 decouple the content-encoding feature image and the style-encoding feature image of the first generated image;
  • the second generated image is input into the global encoder network 411, and the multi-layer convolution operations performed by the content encoder and the style encoder of the global encoder network 411 decouple the content-encoding feature image and the style-encoding feature image of the second generated image;
  • the style encoding feature image of the first generated image and the content encoding feature image of the second generated image are respectively input into the multilayer perceptron module and the residual convolution module in the global decoder network 412, The reconstructed first style image is finally obtained.
  • the style encoding feature image of the second generated image and the content encoding feature image of the first generated image are respectively input to the multilayer perceptron module and the residual convolution module in the global decoder network 412, Finally a reconstructed second style image is obtained.
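  • A condensed sketch of the two swap rounds just described, writing the content and style encoders of the global encoder network 411 as `enc_c`/`enc_s` and the global decoder network 412 as `dec(content, style)`; the handles are illustrative.

```python
def global_reconstruction(enc_c, enc_s, dec, img_a, img_b):
    """Decouple twice and decode: swap styles to obtain the first and second
    generated images, then swap back to reconstruct both input images."""
    c_a, s_a = enc_c(img_a), enc_s(img_a)   # decouple the first-style image
    c_b, s_b = enc_c(img_b), enc_s(img_b)   # decouple the second-style image
    gen_1 = dec(c_b, s_a)   # first generated image: first style + second content
    gen_2 = dec(c_a, s_b)   # second generated image: second style + first content
    c_1, s_1 = enc_c(gen_1), enc_s(gen_1)   # decouple the generated images again
    c_2, s_2 = enc_c(gen_2), enc_s(gen_2)
    rec_a = dec(c_2, s_1)   # reconstructed first-style image
    rec_b = dec(c_1, s_2)   # reconstructed second-style image
    return rec_a, rec_b
```

  • The local reconstruction in sub-step S502 follows the same pattern, with the local encoder network 421 and local decoder network 422 applied to the cropped instance images.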
  • Sub-step S502 Input the instance images cropped from the first- and second-style images into the local style transfer model to obtain the reconstructed first- and second-style instance images;
  • the instance images cropped from the first- and second-style images are input into the local style transfer model; content features and style features are decoupled twice through the local encoder and decoder networks, and the decoded content and style features yield the reconstructed instance images; for ease of description, the instance images cropped from the first-style images in the image training samples are referred to as first-style instance images, and the instance images cropped from the second-style images in the image training samples are referred to as second-style instance images;
  • the first-style instance image is input into the local encoder network 421, and the multi-layer convolution operations performed by the content encoder and the style encoder in the local encoder network 421 decouple the content-encoding feature image and the style-encoding feature image of the first-style instance image;
  • the second-style instance image is input into the local encoder network 421, and the multi-layer convolution operations performed by the content encoder and the style encoder in the local encoder network 421 decouple the content-encoding feature image and the style-encoding feature image of the second-style instance image;
  • the style-encoding feature image of the first-style instance image and the content-encoding feature image of the second-style instance image are respectively input into the multi-layer perceptron module and the residual convolution module in the local decoder network 422; the multi-layer perceptron module in the local decoder network 422 operates on the input style-encoding feature image of the first-style instance image, and the output parameters are shared as the parameters of the adaptive instance normalization module of the local decoder network 422; the residual convolution module in the local decoder network 422 performs a residual convolution operation on the input content-encoding feature image of the second-style instance image and outputs the intermediate process feature image to the adaptive instance normalization module of the local decoder network 422, so as to obtain an instance-normalized feature image that fuses the content features of the second-style instance image and the style features of the first-style instance image; the instance-normalized feature image is input into the upsampling layer of the local decoder network 422, resulting in a first generated instance image that fuses the style of the first-style instance image and the content of the second-style instance image.
  • the style-encoding feature image of the second-style instance image and the content-encoding feature image of the first-style instance image are respectively input into the multi-layer perceptron module and the residual convolution module in the local decoder network 422 to obtain a second generated instance image that fuses the style of the second-style instance image and the content of the first-style instance image.
  • the first generated instance image is input into the local encoder network 421, and the multi-layer convolution operations performed by the content encoder and the style encoder of the local encoder network 421 decouple the content-encoding feature image and the style-encoding feature image of the first generated instance image;
  • the second generated instance image is input into the local encoder network 421, and the multi-layer convolution operations performed by the content encoder and the style encoder of the local encoder network 421 decouple the content-encoding feature image and the style-encoding feature image of the second generated instance image;
  • the style-encoding feature image of the first generated instance image and the content-encoding feature image of the second generated instance image are respectively input into the multi-layer perceptron module and the residual convolution module in the local decoder network 422, finally obtaining the reconstructed first-style instance image.
  • the style-encoding feature image of the second generated instance image and the content-encoding feature image of the first generated instance image are respectively input into the multi-layer perceptron module and the residual convolution module in the local decoder network 422, finally obtaining the reconstructed second-style instance image.
  • Sub-step S503 Input the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image into the global decoder network to obtain the generated first/second-style instance content image;
  • the style-encoding feature image of the second-style image and the content-encoding feature image of the first-style instance image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network 412, obtaining an image that fuses the style of the second-style image and the content of the first-style instance image, that is, the generated second-style instance content image;
  • the style-encoding feature image of the first-style image and the content-encoding feature image of the second-style instance image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network 412, obtaining an image that fuses the style of the first-style image and the content of the second-style instance image, that is, the generated first-style instance content image.
  • Sub-step S504 Input the generated second-style instance content image into the content encoder and the style encoder in the local encoder network for multi-layer convolution operations to decouple the style- and content-encoding feature images of the instance content image;
  • the generated second-style instance content image is input into the content encoder and the style encoder in the local encoder network 421 for multi-layer convolution operations, decoupling the style-encoding feature image and the content-encoding feature image of the second-style instance content image.
  • Sub-step S505 Input the decoupled style coding feature image of the second style instance content image and the content coding feature image of the second style image into the global decoder network to obtain a reconstructed cross-granularity second style image ;
  • the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image are respectively input into the multi-layer perceptron module and the residual convolution module in the global decoder network 412 to obtain the reconstructed cross-granularity second-style image.
  • Sub-step S506 Input the decoupled style-encoding feature image of the instance image of the first style and the content-encoding feature image of the instance content image of the second style into the local decoder network to obtain the reconstructed cross-granularity instance first style image;
  • the decoupled style-encoding feature image of the first-style instance image and the content-encoding feature image of the second-style instance content image are respectively input into the multi-layer perceptron module and the residual convolution module in the local decoder network 422 to obtain the reconstructed cross-granularity instance first-style image.
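  • A condensed sketch of sub-steps S503-S506 under the same conventions (`g_enc_c`/`g_enc_s`/`g_dec` for the global networks 411/412, `l_enc_c`/`l_enc_s`/`l_dec` for the local networks 421/422); this is an interpretation of the text above, not code from the patent, and it shows only the second-style branch.

```python
def cross_granularity_reconstruction(g_enc_c, g_enc_s, g_dec,
                                     l_enc_c, l_enc_s, l_dec,
                                     img_b, inst_a):
    """S503: global second-style style + first-style instance content.
    S504: decouple the generated instance content image with the local encoders.
    S505/S506: decode back to cross-granularity images at each granularity."""
    s_b = g_enc_s(img_b)          # style of the second-style image
    c_inst_a = l_enc_c(inst_a)    # content of a first-style instance image
    gen_inst_b = g_dec(c_inst_a, s_b)  # S503: generated second-style instance content image
    s_gen = l_enc_s(gen_inst_b)   # S504: decoupled style encoding
    c_gen = l_enc_c(gen_inst_b)   # S504: decoupled content encoding
    rec_cross_b = g_dec(g_enc_c(img_b), s_gen)        # S505: cross-granularity second-style image
    rec_cross_inst_a = l_dec(c_gen, l_enc_s(inst_a))  # S506: cross-granularity instance first-style image
    return rec_cross_b, rec_cross_inst_a
```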
  • Sub-step S507 Adjust the parameters of the global encoder and decoder network according to the distance between the reconstructed first/second style image and the corresponding first/second style image in the image training sample;
  • the parameters of the global encoder and decoder networks are adjusted according to the distances between the reconstructed first- and second-style images and the corresponding first- and second-style images in the image training samples.
  • Sub-step S508 Adjust the parameters of the local encoder and decoder networks according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second style image in the image training sample ;
  • the parameters of the local encoder and decoder networks are adjusted according to the distances between the reconstructed first- and second-style instance images and the instance images cropped from the corresponding first- and second-style images in the image training samples.
  • Sub-step S509 According to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training sample, and the distance between the reconstructed cross-granularity instance first-style image and the corresponding first-style instance image, the local encoder and decoder networks and the global encoder and decoder networks are jointly adjusted.
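  • A sketch of how the three distance terms of sub-steps S507-S509 could be combined, assuming L1 pixel distances; the patent only requires some measure of image difference, so the loss choice is an assumption.

```python
import torch.nn.functional as F

def iteration_losses(rec_a, rec_b, img_a, img_b,
                     rec_inst_a, rec_inst_b, inst_a, inst_b,
                     rec_cross_b, rec_cross_inst_a):
    # S507: global reconstruction distances adjust the global encoder/decoder
    loss_global = F.l1_loss(rec_a, img_a) + F.l1_loss(rec_b, img_b)
    # S508: instance reconstruction distances adjust the local encoder/decoder
    loss_local = F.l1_loss(rec_inst_a, inst_a) + F.l1_loss(rec_inst_b, inst_b)
    # S509: cross-granularity distances jointly adjust both network pairs
    loss_cross = F.l1_loss(rec_cross_b, img_b) + F.l1_loss(rec_cross_inst_a, inst_a)
    return loss_global, loss_local, loss_cross
```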
  • the first preset number of times may be 10,000, 20,000, 50,000, etc., which is not specifically limited.
  • the adversarial learning method can also be adopted based on the discriminator to continue training:
  • Step S303 Perform multiple iterations of adversarial training on the image generation model based on the discriminators; when the number of adversarial training iterations reaches a second preset number, use the global style transfer model as the final trained image style transfer model.
  • the adversarial training model based on the image generation model and the discriminators includes: the image generation model, a first discriminator, and a second discriminator; the input of the first discriminator is connected to the output of the global decoder network in the image generation model, and the input of the second discriminator is connected to the output of the local decoder network in the image generation model;
  • the first- and second-style images in the image training samples are input into the global style transfer model of the adversarial training model, and the global decoder network of the adversarial training model outputs the reconstructed first- and second-style images to the first discriminator; the first discriminator discriminates the authenticity of the input images; according to the discrimination results of the first discriminator, the parameters of the first discriminator are adjusted to enhance its discrimination ability; according to the distance between each reconstructed first/second-style image and the corresponding first/second-style image in the image training samples, the parameters of the global encoder and decoder networks are adjusted;
  • the second discriminator discriminates the authenticity of the input image; according to the discriminant result of the second discriminator, adjust the parameters of the second discriminator to enhance the discriminant ability of the second discriminator;
  • the cross-granularity second-style image reconstructed by the global decoder network of the adversarial training model can also be output to the first discriminator; the first discriminator discriminates the authenticity of the input image; according to its discrimination result, the parameters of the first discriminator are adjusted to enhance its discrimination ability; according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training sample, the parameters of the global encoder and decoder networks are adjusted; the method for generating the reconstructed cross-granularity second-style image can be the same as that described in sub-steps S503, S504, and S505 above, and is not repeated here;
  • the cross-granularity instance first-style image reconstructed by the global decoder network of the adversarial training model can also be output to the first discriminator; the first discriminator discriminates the authenticity of the input image; according to its discrimination result, the parameters of the first discriminator are adjusted to enhance its discrimination ability; according to the distance between the reconstructed cross-granularity instance first-style image and the corresponding first-style instance image, the parameters of the global encoder and decoder networks are adjusted; the method for generating the reconstructed cross-granularity instance first-style image can be the same as that described in sub-steps S503, S504, and S506 above, and is not repeated here;
  • the distance between the above-mentioned images reflects the difference between them and is used to adjust the parameters of the image generation model; the difference can be the pixel difference between the generated image and the real image, or any other measure that can represent the difference between the two, and the specific method of determination is not limited.
  • after the adversarial training, the image generation model can generally generate style-converted images with high authenticity, and the first and second discriminators generally cannot distinguish real images from generated images.
  • the parameter adjustment of the image generation model and the first and second discriminators can be stopped to obtain the final image generation model and the first and second discriminators.
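  • A minimal sketch of one adversarial update under the usual GAN objective, assuming binary real/fake discriminators; the patent leaves the exact adversarial loss open, so the cross-entropy formulation and the L1 term weight are assumptions.

```python
import torch
import torch.nn.functional as F

def discriminator_step(disc, real, fake, opt_d):
    """Enhance the discriminator's ability to tell real images from generated
    or reconstructed ones (applies to the first and second discriminators alike)."""
    opt_d.zero_grad()
    real_logits = disc(real)
    fake_logits = disc(fake.detach())   # detach: do not update the generator here
    loss_d = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    loss_d.backward()
    opt_d.step()

def generator_step(disc, fake, target, opt_g, lam=10.0):
    """Adjust the generation model so its output fools the discriminator while
    staying close (here: L1 distance) to the corresponding training image."""
    opt_g.zero_grad()
    fake_logits = disc(fake)
    loss_g = (F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
              + lam * F.l1_loss(fake, target))
    loss_g.backward()
    opt_g.step()
```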
  • the second preset number of times may be 10,000, 20,000, 50,000, etc., which is not specifically limited here.
  • the global encoder network 411 and the global decoder network 412 in the image generation model are used as the encoder network 201 and the decoder network 202 in the final image style transfer model obtained by training.
  • An apparatus for image style conversion provided by an embodiment of the present invention includes an image style conversion model trained by the above-mentioned method, which is used to convert an input first-style image to be style-converted into a second-style target image according to an input second-style image serving as the reference image.
  • an apparatus for training an image style transfer model provided by an embodiment of the present invention has a structure as shown in Figure 9, including: an image generation model building module 901 and an image generation model training module 902.
  • the image generation model building module 901 is used to construct an image generation model; wherein, the image generation model includes: a global style transfer model and a local style transfer model; wherein, the global style transfer model includes: a global encoder network and decoder network; the local style transfer model includes: local encoder and decoder network;
  • the image generation model training module 902 is used to perform multiple iterations of training on the image generation model; after the number of training iterations reaches the first preset number, the global style transfer model is used as the trained image style conversion model; wherein one iteration of training includes: inputting the first- and second-style images in the image training samples into the global style transfer model, decoupling content features and style features twice through the global encoder and decoder networks, and decoding the content and style features to obtain the reconstructed first- and second-style images; inputting the instance images cropped from the first- and second-style images into the local style transfer model, decoupling content features and style features twice through the local encoder and decoder networks, and decoding the content and style features to obtain the reconstructed instance images; inputting the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image into the global decoder to obtain the generated first/second-style instance content image; inputting the generated first/second-style instance content image into the content encoder and style encoder in the local encoder network for multi-layer convolution operations that decouple the style- and content-encoding feature images of the instance content image; inputting the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image into the global decoder network to obtain the reconstructed cross-granularity second-style image; inputting the decoupled style-encoding feature image of the first-style image and the content-encoding feature image of the first-style instance content image into the global decoder to obtain the reconstructed cross-granularity instance first-style image; adjusting the parameters of the global encoder and decoder networks according to the distance between each reconstructed first/second-style image and the corresponding first/second-style image in the image training samples; adjusting the parameters of the local encoder and decoder networks according to the distance between each reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples; and jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples and the distance between the reconstructed cross-granularity instance first-style image and the corresponding first-style instance image.
  • the apparatus for training an image style transfer model may further include: an adversarial training module 903.
  • the adversarial training module 903 is used to perform multiple iterations of adversarial training on the image generation model based on the discriminators; after the number of adversarial training iterations reaches the second preset number, the global style transfer model in the image generation model is used as the final trained image style transfer model.
  • the first-style image to be style-converted and the second-style image serving as the reference image are respectively input into the content and style encoders in the encoder network, and the content- and style-encoding feature images are respectively extracted; the style- and content-encoding feature images are respectively input into the multi-layer perceptron module and the residual convolution module in the decoder network to obtain the parameters of the adaptive instance normalization module and the intermediate process feature image; and the obtained parameters are shared with the adaptive instance normalization module of the decoder network;
  • the intermediate process feature image is input into the adaptive instance normalization module for instance normalization, and the instance-normalized feature image is input into the upsampling layer of the decoder network to obtain the target image converted from the first style to the second style;
  • the image style transfer model composed of the encoder and decoder networks is pre-trained with image training samples comprising a plurality of first- and second-style images and instance images cropped from the image training samples.
  • the technical solution of the present invention uses coarse-grained first- and second-style images together with fine-grained instance images when training the image style transfer model, thereby introducing cross-granularity learning into the style transfer model, which enhances the style transfer quality of fine-grained instances while ensuring the style transfer quality of coarse-grained global images, and ameliorates the blur and distortion of local instance images after style transfer.
  • the image style conversion model of the present invention is an unsupervised learning model that does not require paired reference training data of the same scene to be converted, which gives the model strong generalization ability, greatly reduces the difficulty of data acquisition, and improves the style transfer adaptability to different monitoring scenarios.
  • the computer readable medium of this embodiment includes both permanent and non-permanent, removable and non-removable media and can be implemented by any method or technology for information storage.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Processing (AREA)

Abstract

Disclosed are an image style transfer method and apparatus, and an image style transfer model training method and apparatus. The image style transfer method comprises: respectively inputting first-style images and second-style images into a content encoder and a style encoder in an encoder network, and respectively extracting content encoded feature images and style encoded feature images; and respectively inputting the style encoded feature images and the content encoded feature images into a decoder network, so as to obtain target images that are transferred from a first style to a second style, wherein an image style transfer model composed of the encoder network and the decoder network is obtained by means of performing pre-training on the basis of image training samples that comprise a plurality of first-style images and second-style images, and instance images cropped from the image training samples. By using the present invention, the style transfer adaptability to a plurality of different scenarios can be improved, the problems of image blur, and poor instance effects in an image after style transfer can be ameliorated, and coarse-grained and fine-grained high-quality image style transfer can also be achieved.

Description

A method and device for image style conversion and model training

Technical Field
The present invention relates to the technical field of image processing, and in particular to a method and device for image style conversion and model training.
Background Art
Image style transfer is an important topic in the field of image processing. With social and economic development, demand for audio and video processing keeps growing, concentrated in particular on face editing/generation, image enhancement, and image style conversion; apps and cameras that apply these technologies readily attract attention and create economic and social value. Image style conversion can be used for image enhancement: nighttime images can be enhanced into images carrying clearer information, greatly improving visibility, which is of great significance for video surveillance. With the continuing development and refinement of digital image processing, pattern recognition, and deep learning, image style transfer methods keep evolving as well.
Image style methods based on traditional image processing operate directly on the image itself, make no use of high-level image features, and adapt poorly to new scenes. In the prior art, image style transfer based on generative adversarial networks is more widely applied, but many technical problems remain: images are blurred after style transfer, instances in the image such as cars and people transfer poorly, and obtaining paired image data of different styles is difficult.
Summary of the Invention
In view of this, the purpose of the present invention is to propose a method and device for image style conversion and model training, so as to improve the adaptability of image style conversion to a variety of different scenarios, ameliorate image blur and poor instance rendering after style conversion, and achieve high-quality image style transfer at both coarse and fine granularity.
Based on the above purpose, the present invention provides a method for image style conversion, comprising:
inputting a first-style image to be converted and a second-style image serving as a reference image into the content encoder and the style encoder of an encoder network, respectively, to extract a content-encoded feature image and a style-encoded feature image;
inputting the style-encoded feature image and the content-encoded feature image into a multi-layer perceptron module and a residual convolution module of a decoder network, respectively, and performing a perceptron operation and a residual convolution operation to obtain, respectively, the parameters of an adaptive instance normalization module and an intermediate-process feature image; and sharing the obtained parameters with the adaptive instance normalization module of the decoder network;
inputting the intermediate-process feature image into the adaptive instance normalization module for instance normalization, and inputting the instance-normalized feature image into the upsampling layer of the decoder network to obtain a target image converted from the first style to the second style;
wherein the image style conversion model composed of the encoder and decoder networks is pre-trained with image training samples comprising a plurality of first-style and second-style images, together with instance images cropped from the image training samples.
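To make the data flow of the claimed method concrete, the following is a minimal PyTorch sketch of one style-transfer call. The module classes, layer counts, channel widths, and style-code dimension are illustrative assumptions, not the patent's exact architecture; only the overall flow (content/style encoding, MLP-derived AdaIN parameters, residual convolution, AdaIN, upsampling) follows the text.

    import torch
    import torch.nn as nn

    class ContentEncoder(nn.Module):
        # Stacked convolutions extracting the content-encoded feature image.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 7, 1, 3), nn.ReLU(True),
                nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(True))
        def forward(self, x):
            return self.net(x)                      # (B, 128, H/2, W/2)

    class StyleEncoder(nn.Module):
        # Convolutions + global pooling + fully connected layer -> style code.
        def __init__(self, style_dim=8):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 64, 7, 1, 3), nn.ReLU(True),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.fc = nn.Linear(64, style_dim)
        def forward(self, x):
            return self.fc(self.conv(x))            # (B, style_dim)

    class Decoder(nn.Module):
        # MLP -> AdaIN parameters; residual conv -> intermediate feature;
        # AdaIN; upsampling back to image space.
        def __init__(self, style_dim=8, ch=128):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(style_dim, 256), nn.ReLU(True),
                                     nn.Linear(256, 2 * ch))
            self.res = nn.Conv2d(ch, ch, 3, 1, 1)
            self.up = nn.Sequential(nn.Upsample(scale_factor=2),
                                    nn.Conv2d(ch, 3, 5, 1, 2), nn.Tanh())
        def forward(self, content_feat, style_code):
            h = content_feat + self.res(content_feat)      # intermediate-process feature
            gamma, beta = self.mlp(style_code).chunk(2, dim=1)
            mu = h.mean(dim=(2, 3), keepdim=True)
            sigma = h.std(dim=(2, 3), keepdim=True) + 1e-5
            h = gamma[..., None, None] * (h - mu) / sigma + beta[..., None, None]
            return self.up(h)

    # One call: content from x_a (first style), style from x_b (second style).
    enc_c, enc_s, dec = ContentEncoder(), StyleEncoder(), Decoder()
    x_a, x_b = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
    y_ab = dec(enc_c(x_a), enc_s(x_b))  # target image in the second style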
Preferably, the image style conversion model is pre-trained according to the following method:
constructing an image generation model, wherein the image generation model comprises a global style transfer model and a local style transfer model; the global style transfer model comprises a global encoder network and a global decoder network, and the local style transfer model comprises a local encoder network and a local decoder network;
performing multiple training iterations on the image generation model, and after the number of training iterations reaches a first preset number, taking the global style transfer model as the trained image style conversion model;
wherein one training iteration comprises:
inputting the first-style and second-style images of the image training samples into the global style transfer model, and obtaining reconstructed first-style and second-style images through two rounds of decoupling content features and style features and decoding them with the global encoder and decoder networks;
inputting the instance images cropped from the first-style and second-style images into the local style transfer model, and obtaining reconstructed instance images through two rounds of decoupling content features and style features and decoding them with the local encoder and decoder networks;
inputting the style-encoded feature image of the first/second-style image and the content-encoded feature image of the second/first-style instance image into the global decoder, to obtain a generated first/second-style instance content image;
inputting the generated first/second-style instance content image into the content encoder and the style encoder of the local encoder network, which decouple the style-encoded and content-encoded feature images of the instance content image through multi-layer convolution operations;
inputting the decoupled style-encoded feature image of the second-style instance content image and the content-encoded feature image of the second-style image into the global decoder network, to obtain a reconstructed cross-granularity second-style image;
inputting the decoupled style-encoded feature image of the first-style image and the content-encoded feature image of the first-style instance content image into the global decoder network, to obtain a reconstructed cross-granularity first-style instance image;
adjusting the parameters of the global encoder and decoder networks according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples;
adjusting the parameters of the local encoder and decoder networks according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples;
jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples, and the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image.
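The patent specifies these three adjustment signals only as "distances" between reconstructions and their references. The sketch below shows one plausible instantiation using L1 distances; the choice of L1 and the equal weighting are assumptions, and the tensors are placeholders for images produced earlier in the iteration.

    import torch
    import torch.nn.functional as F

    x_a     = torch.rand(1, 3, 128, 128)  # first-style training image
    x_a_rec = torch.rand(1, 3, 128, 128)  # reconstructed first-style image (global path)
    p_a     = torch.rand(1, 3, 64, 64)    # instance image cropped from x_a
    p_a_rec = torch.rand(1, 3, 64, 64)    # reconstructed instance image (local path)
    x_b     = torch.rand(1, 3, 128, 128)  # second-style training image
    x_b_cg  = torch.rand(1, 3, 128, 128)  # reconstructed cross-granularity second-style image
    p_a_cg  = torch.rand(1, 3, 64, 64)    # reconstructed cross-granularity first-style instance

    loss_global = F.l1_loss(x_a_rec, x_a)  # adjusts the global encoder/decoder
    loss_local  = F.l1_loss(p_a_rec, p_a)  # adjusts the local encoder/decoder
    loss_cross  = F.l1_loss(x_b_cg, x_b) + F.l1_loss(p_a_cg, p_a)  # joint adjustment
    total_loss  = loss_global + loss_local + loss_cross  # equal weights: an assumption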
Preferably, after the number of iterations reaches the first preset number, the method further comprises:
performing multiple iterations of adversarial training on the image generation model with a discriminator, and when the number of adversarial training iterations reaches a second preset number, taking the global style transfer model within the image generation model as the final trained image style conversion model.
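The discriminator's architecture and the form of the adversarial loss are not specified at this point in the text (the adversarial model is shown in FIG. 8f). The sketch below assumes a small PatchGAN-style discriminator with a least-squares GAN loss, purely for illustration.

    import torch
    import torch.nn as nn

    class Discriminator(nn.Module):
        # Small patch discriminator: per-patch real/fake scores.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
                nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),
                nn.Conv2d(128, 1, 4, 1, 1))
        def forward(self, x):
            return self.net(x)

    D = Discriminator()
    real = torch.rand(1, 3, 128, 128)  # second-style training image
    fake = torch.rand(1, 3, 128, 128)  # image generated by the style transfer model
    d_loss = ((D(real) - 1) ** 2).mean() + (D(fake.detach()) ** 2).mean()
    g_loss = ((D(fake) - 1) ** 2).mean()  # drives the generator toward realistic output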
Wherein, inputting the first-style and second-style images of the image training samples into the global style transfer model, and obtaining reconstructed first-style and second-style images through two rounds of decoupling content features and style features and decoding them with the global encoder and decoder networks, specifically comprises:

inputting the first-style image of the image training samples into the global encoder network, where the content encoder and the style encoder of the global encoder network decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the first-style image;

inputting the second-style image of the image training samples into the global encoder network, where the content encoder and the style encoder of the global encoder network decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the second-style image;

inputting the style-encoded feature image of the first-style image and the content-encoded feature image of the second-style image into the multi-layer perceptron module and the residual convolution module of the global decoder network, respectively, the global decoder network producing a first generated image that fuses the style of the first-style image with the content of the second-style image;

inputting the style-encoded feature image of the second-style image and the content-encoded feature image of the first-style image into the multi-layer perceptron module and the residual convolution module of the global decoder network, respectively, to obtain a second generated image that fuses the style of the second-style image with the content of the first-style image;

inputting the first generated image into the global encoder network, where the content encoder and the style encoder of the global encoder network decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the first generated image;

inputting the second generated image into the global encoder network, where the content encoder and the style encoder of the global encoder network decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the second generated image;

inputting the style-encoded feature image of the first generated image and the content-encoded feature image of the second generated image into the multi-layer perceptron module and the residual convolution module of the global decoder network, respectively, finally obtaining the reconstructed first-style image;

inputting the style-encoded feature image of the second generated image and the content-encoded feature image of the first generated image into the multi-layer perceptron module and the residual convolution module of the global decoder network, respectively, finally obtaining the reconstructed second-style image.
Wherein, inputting the instance images cropped from the first-style and second-style images into the local style transfer model, and obtaining reconstructed instance images through two rounds of decoupling content features and style features and decoding them with the local encoder and decoder networks, specifically comprises:

inputting the first-style instance image into the local encoder network, where the content encoder and the style encoder of the local encoder network decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the first-style instance image;

inputting the second-style instance image into the local encoder network, where the content encoder and the style encoder of the local encoder network decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the second-style instance image;

inputting the style-encoded feature image of the first-style instance image and the content-encoded feature image of the second-style instance image into the multi-layer perceptron module and the residual convolution module of the local decoder network, respectively, to obtain a first generated instance image that fuses the style of the first-style instance image with the content of the second-style instance image;

inputting the style-encoded feature image of the second-style instance image and the content-encoded feature image of the first-style instance image into the multi-layer perceptron module and the residual convolution module of the local decoder network, respectively, to obtain a second generated instance image that fuses the style of the second-style instance image with the content of the first-style instance image;

inputting the first generated instance image into the local encoder network, where the content encoder and the style encoder of the local encoder network decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the first generated instance image;

inputting the second generated instance image into the local encoder network, where the content encoder and the style encoder of the local encoder network decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the second generated instance image;

inputting the style-encoded feature image of the first generated instance image and the content-encoded feature image of the second generated instance image into the multi-layer perceptron module and the residual convolution module of the local decoder network, respectively, finally obtaining the reconstructed first-style instance image;

inputting the style-encoded feature image of the second generated instance image and the content-encoded feature image of the first generated instance image into the multi-layer perceptron module and the residual convolution module of the local decoder network, respectively, finally obtaining the reconstructed second-style instance image;

wherein a first/second-style instance image is an instance image cropped from the first/second-style image.
The present invention further provides a device for image style conversion, comprising an image style conversion model trained by the method described above, configured to convert an input first-style image to be converted, according to an input second-style image serving as a reference image, into a second-style target image.
The present invention further provides a training method for an image style conversion model, comprising:

constructing an image generation model, wherein the image generation model comprises a global style transfer model and a local style transfer model; the global style transfer model comprises a global encoder network and a global decoder network, and the local style transfer model comprises a local encoder network and a local decoder network;

performing multiple training iterations on the image generation model, and after the number of training iterations reaches a first preset number, taking the global style transfer model as the trained image style conversion model;

wherein one training iteration comprises:

inputting the first-style and second-style images of the image training samples into the global style transfer model, and obtaining reconstructed first-style and second-style images through two rounds of decoupling content features and style features and decoding them with the global encoder and decoder networks;

inputting the instance images cropped from the first-style and second-style images into the local style transfer model, and obtaining reconstructed instance images through two rounds of decoupling content features and style features and decoding them with the local encoder and decoder networks;

inputting the style-encoded feature image of the first/second-style image and the content-encoded feature image of the second/first-style instance image into the global decoder, to obtain a generated first/second-style instance content image;

inputting the generated first/second-style instance content image into the content encoder and the style encoder of the local encoder network, which decouple the style-encoded and content-encoded feature images of the instance content image through multi-layer convolution operations;

inputting the decoupled style-encoded feature image of the second-style instance content image and the content-encoded feature image of the second-style image into the global decoder network, to obtain a reconstructed cross-granularity second-style image;

inputting the decoupled style-encoded feature image of the first-style image and the content-encoded feature image of the first-style instance content image into the global decoder network, to obtain a reconstructed cross-granularity first-style instance image;

adjusting the parameters of the global encoder and decoder networks according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples;

adjusting the parameters of the local encoder and decoder networks according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples;

jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples, and the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image.
The present invention further provides a training device for an image style conversion model, comprising:

an image generation model construction module, configured to construct an image generation model, wherein the image generation model comprises a global style transfer model and a local style transfer model; the global style transfer model comprises a global encoder network and a global decoder network, and the local style transfer model comprises a local encoder network and a local decoder network;

an image generation model training module, configured to perform multiple training iterations on the image generation model and, after the number of training iterations reaches a first preset number, to take the global style transfer model as the trained image style conversion model; wherein one training iteration comprises: inputting the first-style and second-style images of the image training samples into the global style transfer model, and obtaining reconstructed first-style and second-style images through two rounds of decoupling content features and style features and decoding them with the global encoder and decoder networks; inputting the instance images cropped from the first-style and second-style images into the local style transfer model, and obtaining reconstructed instance images through two rounds of decoupling content features and style features and decoding them with the local encoder and decoder networks; inputting the style-encoded feature image of the first/second-style image and the content-encoded feature image of the second/first-style instance image into the global decoder, to obtain a generated first/second-style instance content image; inputting the generated first/second-style instance content image into the content encoder and the style encoder of the local encoder network, which decouple the style-encoded and content-encoded feature images of the instance content image through multi-layer convolution operations; inputting the decoupled style-encoded feature image of the second-style instance content image and the content-encoded feature image of the second-style image into the global decoder network, to obtain a reconstructed cross-granularity second-style image; inputting the decoupled style-encoded feature image of the first-style image and the content-encoded feature image of the first-style instance content image into the global decoder network, to obtain a reconstructed cross-granularity first-style instance image; adjusting the parameters of the global encoder and decoder networks according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples; adjusting the parameters of the local encoder and decoder networks according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples; and jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples, and the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image.
In the technical solution of the present invention, a first-style image to be converted and a second-style image serving as a reference image are input into the content encoder and the style encoder of an encoder network, respectively, and a content-encoded feature image and a style-encoded feature image are extracted; the style-encoded and content-encoded feature images are input into the multi-layer perceptron module and the residual convolution module of a decoder network, respectively, and a perceptron operation and a residual convolution operation are performed to obtain, respectively, the parameters of the adaptive instance normalization module and an intermediate-process feature image; the obtained parameters are shared with the adaptive instance normalization module of the decoder network; the intermediate-process feature image is input into the adaptive instance normalization module for instance normalization, and the instance-normalized feature image is input into the upsampling layer of the decoder network to obtain a target image converted from the first style to the second style; wherein the image style conversion model composed of the encoder and decoder networks is pre-trained with image training samples comprising a plurality of first-style and second-style images, together with instance images cropped from the image training samples. Compared with the prior art, the technical solution of the present invention uses coarse-grained first-style and second-style images as well as fine-grained instance images when training the image style conversion model, thereby introducing cross-granularity learning into the style transfer model; this strengthens the style transfer quality of fine-grained instances while preserving the style transfer quality of the coarse-grained global image, and alleviates the blurring and distortion of local instance images after style transfer.
Brief Description of the Drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of a method for image style conversion provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of the internal structure of an image style conversion model provided by an embodiment of the present invention;

FIG. 3 is a flowchart of a training method for an image style conversion model provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the internal structure of an image generation model provided by an embodiment of the present invention;

FIG. 5 is a flowchart of one training iteration of an image style conversion model provided by an embodiment of the present invention;

FIGS. 6a-6h are schematic diagrams of reconstructing first-style and second-style images based on the global encoder and decoder networks according to an embodiment of the present invention;

FIGS. 7a-7h are schematic diagrams of reconstructing first-style and second-style instance images based on the local encoder and decoder networks according to an embodiment of the present invention;

FIGS. 8a and 8b are schematic diagrams of generating first-style and second-style instance content images based on the global decoder network according to an embodiment of the present invention;

FIG. 8c is a schematic diagram of decoupling the style and content features of an instance content image based on the local encoder network according to an embodiment of the present invention;

FIG. 8d is a schematic diagram of a reconstructed cross-granularity second-style image obtained based on the global decoder network according to an embodiment of the present invention;

FIG. 8e is a schematic diagram of a reconstructed cross-granularity first-style instance image obtained based on the global decoder network according to an embodiment of the present invention;

FIG. 8f is a schematic diagram of the internal structure of an adversarial training model provided by an embodiment of the present invention;

FIG. 9 is a block diagram of the internal structure of a training device for an image style conversion model according to an embodiment of the present invention.
Detailed Description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to specific embodiments and the accompanying drawings.

It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present invention shall have the ordinary meanings understood by persons with ordinary skill in the art to which this disclosure belongs. "First", "second", and similar words used in this disclosure do not denote any order, quantity, or importance, but are merely used to distinguish different components. "Comprises", "comprising", and similar words mean that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.

The technical solutions of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
An embodiment of the present invention provides a method for image style conversion; the specific flow, as shown in FIG. 1, comprises the following steps:

Step S101: Input the first-style image to be converted and the second-style image serving as a reference image into the content encoder and the style encoder of the image style conversion model, respectively, to extract a content-encoded feature image and a style-encoded feature image.

Specifically, the internal structure of the image style conversion model of the present invention, as shown in FIG. 2, may include an encoder network 201 and a decoder network 202; the encoder network 201 may include a content encoder 211 and a style encoder 212; the decoder network 202 may include a multi-layer perceptron module 221, a residual convolution module 222, an adaptive instance normalization module 223, and an upsampling layer 224.
The first-style image and the second-style image are images of different scenes; for example, the first-style image may be a nighttime image captured by a surveillance camera, and the second-style image may be an image without night-time darkness;

alternatively, the first-style image may be a grayscale image captured by an infrared camera, and the second-style image may be a color image.
In this step, the first-style image to be converted and the second-style reference image are input into the content encoder 211 and the style encoder 212 of the encoder network 201, respectively;

the content encoder 211 performs a first preset convolution operation on the input first-style image to extract its high-level feature information, and the style encoder 212 performs a second preset convolution operation on the input second-style image to extract its high-level feature information;

the content encoder 211 and the style encoder 212 then output the extracted content-encoded feature image and style-encoded feature image, respectively.
In a specific embodiment, the content encoder 211 and the style encoder 212 may be lightweight feature-extraction convolutional neural networks, such as UNet (a U-shaped convolutional neural network). It can be understood that a feature-extraction convolutional neural network continuously expands its receptive field through successive local convolution operations and thereby extracts high-level feature information of the input image.

The content encoder 211 comprises multiple convolution layers; each convolution layer continues the convolution operation on the encoded features output by the previous layer, and the encoded features output by the last convolution layer form the content-encoded feature image of the input image. The kernel size and stride of each convolution layer in the content encoder 211 can be set for the specific scenario; for example, the kernel size may be set to (7×7) with a stride of (1×1).

The style encoder 212 comprises multiple convolution layers, a global pooling layer, and a fully connected layer; each convolution layer continues the convolution operation on the encoded features output by the previous layer, the encoded features output by the last convolution layer pass through the global pooling layer, and the fully connected layer then maps the globally pooled encoded features into the style feature space to obtain the style-encoded feature image. A sketch of both encoders follows.
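Expanding on the pipeline sketch earlier, the following shows the two encoder structures as just described: a content encoder built from stacked convolution layers (with the example 7×7 kernel / 1×1 stride for the first layer) and a style encoder ending in global pooling plus a fully connected layer. Layer counts, channel widths, and the style-code dimension remain assumptions.

    import torch
    import torch.nn as nn

    # Content encoder: stacked convolution layers; the last layer's output is
    # the content-encoded feature image.
    content_encoder = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3), nn.ReLU(True),
        nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(True),
        nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), nn.ReLU(True))

    class StyleEncoder(nn.Module):
        # Convolution layers, then global pooling, then a fully connected
        # layer mapping into the style feature space.
        def __init__(self, style_dim=8):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 64, 7, 1, 3), nn.ReLU(True),
                nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(True))
            self.pool = nn.AdaptiveAvgPool2d(1)   # global pooling layer
            self.fc = nn.Linear(128, style_dim)   # fully connected layer
        def forward(self, x):
            return self.fc(self.pool(self.conv(x)).flatten(1))

    x = torch.rand(1, 3, 256, 256)
    c = content_encoder(x)   # (1, 256, 64, 64) content-encoded feature image
    s = StyleEncoder()(x)    # (1, 8) style-encoded feature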
Step S102: Input the style-encoded and content-encoded feature images into the multi-layer perceptron module and the residual convolution module of the decoder network, respectively, to obtain the parameters of the adaptive instance normalization module and the intermediate-process feature image, respectively.

In this step, the style-encoded feature image and the content-encoded feature image are input into the multi-layer perceptron module 221 and the residual convolution module 222 of the decoder network 202, respectively, where a perceptron operation and a residual convolution operation are performed to obtain the parameters of the adaptive instance normalization module and the intermediate-process feature image, respectively;

specifically, the multi-layer perceptron module 221 performs a first preset perceptron operation on the input style-encoded feature image to obtain the parameters of the adaptive instance normalization module;

the residual convolution module 222 comprises multiple residual convolution layers and performs a first preset residual convolution operation on the input content-encoded feature image to obtain the intermediate-process feature image, as sketched below.
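The following sketch shows the two decoder entry modules of step S102: the multi-layer perceptron that maps the style code to AdaIN parameters, and a residual convolution module built from residual blocks that produces the intermediate-process feature image. Depths and widths are assumptions.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        # One residual convolution layer: conv-ReLU-conv plus skip connection.
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, 1, 1), nn.ReLU(True),
                nn.Conv2d(ch, ch, 3, 1, 1))
        def forward(self, x):
            return x + self.body(x)

    style_dim, ch = 8, 256
    mlp = nn.Sequential(                     # multi-layer perceptron module 221
        nn.Linear(style_dim, 256), nn.ReLU(True),
        nn.Linear(256, 2 * ch))              # one (gamma, beta) pair per channel
    res_blocks = nn.Sequential(ResBlock(ch), ResBlock(ch))  # residual conv module 222

    style_code = torch.rand(1, style_dim)
    content_feat = torch.rand(1, ch, 64, 64)
    adain_params = mlp(style_code)           # shared with the AdaIN module (step S103)
    gamma, beta = adain_params.chunk(2, dim=1)
    intermediate = res_blocks(content_feat)  # intermediate-process feature image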
Step S103: Share the obtained parameters with the adaptive instance normalization module of the decoder network.

In this step, the adaptive-instance-normalization parameters obtained by the multi-layer perceptron module 221 are shared with the adaptive instance normalization module 223 of the decoder network 202.

Step S104: Input the intermediate-process feature image into the adaptive instance normalization module for instance normalization.

In this step, the intermediate-process feature image obtained by the residual convolution module 222 is input into the adaptive instance normalization module 223 for instance normalization.

Step S105: Input the instance-normalized feature image into the upsampling layer of the decoder network to obtain the target image converted from the first style to the second style.

In this step, the feature image that has been instance-normalized by the adaptive instance normalization module 223 is input into the upsampling layer 224 of the decoder network 202, finally producing a target image of the same size as the input first-style image, converted from the first style to the second style, i.e., the generated style-transferred image.
Thus, the encoder network 201 is connected to the decoder network 202 in a decoupled manner, fusing the feature spaces so that style transfer is performed with high quality while the content is preserved.

The above image style conversion model is pre-trained with image training samples comprising a plurality of first-style and second-style images, together with instance images cropped from the image training samples. Because training uses coarse-grained first-style and second-style images as well as fine-grained instance images, cross-granularity learning is introduced into the style transfer model, strengthening the style transfer quality of fine-grained instances while ensuring the style transfer quality of the coarse-grained global image; the instances mainly include vehicles, pedestrians, and traffic signs. This improves the image style conversion effect and alleviates the blurring and distortion of local instance images after style conversion.

On the other hand, the image style conversion model is an unsupervised learning model and does not require paired reference training data of the same scene; the model therefore generalizes well, data acquisition is greatly simplified, and adaptability to style transfer in different surveillance scenarios is improved.
An embodiment of the present invention provides a training method for an image style conversion model; the flow, as shown in FIG. 3, comprises the following steps:

Step S300: Obtain image training samples.

In this step, a plurality of first-style and second-style images of arbitrary scenes may be obtained from real surveillance data as image training samples. To ensure the style-conversion quality of the trained model, a large number of images from different surveillance scenarios can be selected as training samples; the first-style and second-style images in the obtained training samples may be unpaired. In addition, to learn a fine-grained, high-quality style-conversion effect, the object detection boxes annotated in the data are used to extract the fine-grained instances in each image: instance images are cropped from the image training samples, yielding unpaired instance training samples for the cross-granularity learning of the subsequent steps, as sketched below.
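A minimal sketch of the instance-cropping step, assuming (x1, y1, x2, y2) detection boxes and PIL-based image loading; the file name and box values in the usage comment are hypothetical.

    from PIL import Image

    def crop_instances(image_path, boxes):
        # Crop fine-grained instances (vehicles, pedestrians, traffic signs)
        # out of one coarse-grained training image.
        img = Image.open(image_path).convert("RGB")
        return [img.crop(box) for box in boxes]  # one instance image per box

    # Hypothetical usage; boxes come from the dataset's annotated detection boxes.
    # instances = crop_instances("night_0001.jpg", [(120, 80, 260, 300)])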
Step S301: Construct an image generation model.

As shown in FIG. 4, the constructed image generation model comprises a global style transfer model 401 and a local style transfer model 402; the global style transfer model 401 comprises a global encoder network 411 and a global decoder network 412, and the local style transfer model 402 comprises a local encoder network 421 and a local decoder network 422.

The global encoder network 411 and the local encoder network 421 have the same structure, and the global decoder network 412 and the local decoder network 422 have the same structure. The structure of the global encoder network 411 may be the same as that of the encoder network 201 in the above image style conversion model, and the structure of the global decoder network 412 may be the same as that of the decoder network 202 in the above image style conversion model.

Step S302: Perform multiple training iterations on the image generation model, and obtain the trained image style conversion model after the number of training iterations reaches the first preset number.

Specifically, the image generation model is trained over multiple iterations; after the number of training iterations reaches the first preset number, the trained image style conversion model is constructed from the global encoder network 411 and the global decoder network 412, that is, the global encoder network 411 and the global decoder network 412 serve as the encoder network 201 and the decoder network 202 of the trained image style conversion model.
One training iteration of this step, whose specific flow is shown in FIG. 5, comprises the following sub-steps:

Sub-step S501: Input the first-style and second-style images of the image training samples into the global style transfer model to obtain reconstructed first-style and second-style images.

In this sub-step, the first-style and second-style images of the image training samples are input into the global style transfer model, and the reconstructed first-style and second-style images are obtained through two rounds of decoupling content features and style features and decoding them with the global encoder and decoder networks.

Specifically, as shown in FIG. 6a, the first-style image of the image training samples is input into the global encoder network 411, where the content encoder and the style encoder of the global encoder network 411 decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the first-style image.

As shown in FIG. 6b, the second-style image of the image training samples is input into the global encoder network 411, where the content encoder and the style encoder of the global encoder network 411 decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the second-style image.

As shown in FIG. 6c, the style-encoded feature image of the first-style image and the content-encoded feature image of the second-style image are input into the multi-layer perceptron module and the residual convolution module of the global decoder network 412, respectively. The multi-layer perceptron module of the global decoder network 412 operates on the style-encoded feature image of the input first-style image, and its output parameters are shared as the parameters of the adaptive instance normalization module of the global decoder network 412; the residual convolution module of the global decoder network 412 performs a residual convolution operation on the content-encoded feature image of the input second-style image and outputs the intermediate-process feature image into the adaptive instance normalization module of the global decoder network 412, thereby producing an instance-normalized feature image that fuses the content features of the second-style image with the style features of the first-style image. The instance-normalized feature image is input into the upsampling layer of the global decoder network 412, yielding a first generated image that fuses the style of the first-style image with the content of the second-style image.

Similarly, as shown in FIG. 6d, the style-encoded feature image of the second-style image and the content-encoded feature image of the first-style image are input into the multi-layer perceptron module and the residual convolution module of the global decoder network 412, respectively, yielding a second generated image that fuses the style of the second-style image with the content of the first-style image.

As shown in FIG. 6e, the first generated image is input into the global encoder network 411, where the content encoder and the style encoder of the global encoder network 411 decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the first generated image.

As shown in FIG. 6f, the second generated image is input into the global encoder network 411, where the content encoder and the style encoder of the global encoder network 411 decouple, through multi-layer convolution operations, the content-encoded feature image and the style-encoded feature image of the second generated image.

As shown in FIG. 6g, the style-encoded feature image of the first generated image and the content-encoded feature image of the second generated image are input into the multi-layer perceptron module and the residual convolution module of the global decoder network 412, respectively, finally yielding the reconstructed first-style image.

As shown in FIG. 6h, the style-encoded feature image of the second generated image and the content-encoded feature image of the first generated image are input into the multi-layer perceptron module and the residual convolution module of the global decoder network 412, respectively, finally yielding the reconstructed second-style image. The whole 6a-6h round trip is condensed in the sketch below.
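The 6a-6h flow condenses into a single round trip: encode, swap codes to generate, re-encode, swap back to reconstruct. The sketch below reuses the enc_c / enc_s / dec stand-ins from the earlier pipeline sketch; the function name is illustrative.

    def global_round_trip(enc_c, enc_s, dec, x_a, x_b):
        # enc_c/enc_s/dec: the global content encoder, style encoder, decoder.
        c_a, s_a = enc_c(x_a), enc_s(x_a)      # Fig. 6a: decouple the first-style image
        c_b, s_b = enc_c(x_b), enc_s(x_b)      # Fig. 6b: decouple the second-style image
        gen_1 = dec(c_b, s_a)                  # Fig. 6c: style of x_a, content of x_b
        gen_2 = dec(c_a, s_b)                  # Fig. 6d: style of x_b, content of x_a
        c_1, s_1 = enc_c(gen_1), enc_s(gen_1)  # Fig. 6e: decouple first generated image
        c_2, s_2 = enc_c(gen_2), enc_s(gen_2)  # Fig. 6f: decouple second generated image
        x_a_rec = dec(c_2, s_1)                # Fig. 6g: reconstructed first-style image
        x_b_rec = dec(c_1, s_2)                # Fig. 6h: reconstructed second-style image
        return x_a_rec, x_b_rec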
子步骤S502:将从第一、二风格的图像中裁剪出的实例图像输入到所述 局部风格迁移模型,得到重建的第一、二风格的实例图像;Sub-step S502: input the instance image cut out from the images of the first and second styles into the local style transfer model, obtain the instance images of the first and second styles of reconstruction;
本子步骤中,将从第一、二风格的图像中裁剪出的实例图像输入到所述局部风格迁移模型,经过所述局部的编码器网络、局部的解码器网络的两次解耦内容特征和风格特征、解码内容特征和风格特征得到重建的实例图像;为便于描述,本文中将从图像训练样本中的第一风格的图像中裁剪出的实例图像称为第一风格的实例图像,将将从图像训练样本中的第二风格的图像中裁剪出的实例图像称为第二风格的实例图像;In this sub-step, the instance images cropped from the first and second style images are input into the local style transfer model, and the two decoupled content features and Style features, decoded content features, and style features are reconstructed instance images; for ease of description, the instance images cropped from the images of the first style in the image training samples are referred to as instance images of the first style. The instance image cropped from the image of the second style in the image training sample is called the instance image of the second style;
具体地,如图7a所示,将第一风格的实例图像输入到局部的编码器网络421,通过局部的编码器网络421中的内容编码器和风格编码器进行多层卷积操作解耦出第一风格的实例图像的内容编码特征图像和风格编码特征图像;Specifically, as shown in Fig. 7a, the instance image of the first style is input into the local encoder network 421, and the content encoder and the style encoder in the local encoder network 421 perform multi-layer convolution operations to decouple the output. a content-encoding feature image and a style-encoding feature image of the instance image of the first style;
如图7b所示,将第二风格的实例图像输入到局部的编码器网络421,通过局部的编码器网络421中的内容编码器和风格编码器进行多层卷积操作解耦出第二风格的实例图像的内容编码特征图像和风格编码特征图像;As shown in Fig. 7b, the instance image of the second style is input into the local encoder network 421, and the second style is decoupled through the multi-layer convolution operation performed by the content encoder and the style encoder in the local encoder network 421 The content-encoding feature image and the style-encoding feature image of the instance image;
如图7c所示,将第一风格的实例图像的风格编码特征图像和第二风格的实例图像的内容编码特征图像,分别输入到局部的解码器网络422中的多层感知机模块和残差卷积模块;局部的解码器网络422中的多层感知机模块对输入的第一风格图像的风格编码特征图像进行操作,输出的参数共享为局部的解码器网络422的自适应实例归一化模块的参数;局部的解码器网络422中的残差卷积模块对输入第二风格的实例图像的内容编码特征图像进行残差卷积操作,输出中间过程特征图像到局部的解码器网络422中的自适应实例归一化模块,从而得到融合第二风格的实例图像的内容特征和第一风格的实例图像的风格特征的实例归一化的特征图像,将得到的实例归一化的特征图像输入到局部的解码器网络422的上采样层中,得到融合了第一风格的实例图像的风格、第二风格的实例图像的内容的第一生成实例图像。As shown in Figure 7c, the style-encoded feature images of the first style instance images and the content-encoded feature images of the second style instance images are input to the multi-layer perceptron module and the residual in the local decoder network 422, respectively. Convolution module; the multi-layer perceptron module in the local decoder network 422 operates on the style encoding feature image of the input first style image, and the output parameters are shared as the adaptive instance normalization of the local decoder network 422 The parameters of the module; the residual convolution module in the local decoder network 422 performs a residual convolution operation on the content-encoded feature image of the input second style instance image, and outputs the intermediate process feature image to the local decoder network 422. The adaptive instance normalization module of , so as to obtain an instance-normalized feature image that fuses the content features of the second-style instance images and the style features of the first-style instance images, and the obtained instance-normalized feature images Input to the upsampling layer of the local decoder network 422, resulting in a first generated instance image that combines the style of the instance image of the first style and the content of the instance image of the second style.
同理,如图7d所示,将第二风格的实例图像的风格编码特征图像和第一风格的实例图像的内容编码特征图像,分别输入到局部的解码器网络422中的多层感知机模块和残差卷积模块,得到融合了第二风格的实例图像的风格、第一风格的实例图像的内容的第二生成实例图像。Similarly, as shown in FIG. 7d, the style encoding feature image of the instance image of the second style and the content encoding feature image of the instance image of the first style are respectively input into the multi-layer perceptron module in the local decoder network 422. and the residual convolution module to obtain a second generated instance image that combines the style of the instance image of the second style and the content of the instance image of the first style.
As shown in Fig. 7e, the first generated instance image is input into the local encoder network 421, whose content encoder and style encoder perform multi-layer convolution operations to decouple the content-encoding feature image and the style-encoding feature image of the first generated instance image;
As shown in Fig. 7f, the second generated instance image is input into the local encoder network 421, whose content encoder and style encoder perform multi-layer convolution operations to decouple the content-encoding feature image and the style-encoding feature image of the second generated instance image;
As shown in Fig. 7g, the style-encoding feature image of the first generated instance image and the content-encoding feature image of the second generated instance image are input to the multi-layer perceptron module and the residual convolution module of the local decoder network 422, respectively, finally yielding the reconstructed first-style instance image.
As shown in Fig. 7h, the style-encoding feature image of the second generated instance image and the content-encoding feature image of the first generated instance image are input to the multi-layer perceptron module and the residual convolution module of the local decoder network 422, respectively, finally yielding the reconstructed second-style instance image.
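Taken together, Figs. 7a-7h describe one swap-and-reconstruct cycle. A compact sketch of that cycle, reusing the assumed encoder and decoder interfaces above (the function and argument names are illustrative, not part of the disclosure), is:

    # A hedged sketch of the Figs. 7a-7h cycle. enc_content / enc_style stand
    # for the content and style encoders of the local encoder network 421.
    def cross_reconstruct(x1, x2, enc_content, enc_style, decoder):
        # Decouple content and style of both instance images (Figs. 7a, 7b).
        c1, s1 = enc_content(x1), enc_style(x1)
        c2, s2 = enc_content(x2), enc_style(x2)
        # Swap: style of one image with content of the other (Figs. 7c, 7d).
        gen1 = decoder(c2, s1)   # first-style appearance, second-style content
        gen2 = decoder(c1, s2)   # second-style appearance, first-style content
        # Decouple the generated images again (Figs. 7e, 7f).
        c_g1, s_g1 = enc_content(gen1), enc_style(gen1)
        c_g2, s_g2 = enc_content(gen2), enc_style(gen2)
        # Swap back to reconstruct the original-style images (Figs. 7g, 7h).
        rec1 = decoder(c_g2, s_g1)  # reconstructed first-style instance image
        rec2 = decoder(c_g1, s_g2)  # reconstructed second-style instance image
        return rec1, rec2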
Sub-step S503: the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image are input into the global decoder network, yielding a generated first/second-style instance content image;
Specifically, as shown in Fig. 8a, the style-encoding feature image of the second-style image and the content-encoding feature image of the first-style instance image are input to the multi-layer perceptron module and the residual convolution module of the global decoder network 412, respectively, yielding an image that fuses the style of the second-style image with the content of the first-style instance image, i.e., the generated second-style instance content image;
As shown in Fig. 8b, the style-encoding feature image of the first-style image and the content-encoding feature image of the second-style instance image are input to the multi-layer perceptron module and the residual convolution module of the global decoder network 412, respectively, yielding an image that fuses the style of the first-style image with the content of the second-style instance image, i.e., the generated first-style instance content image.
Sub-step S504: the generated second-style instance content image is input to the content encoder and the style encoder of the local encoder network, which perform multi-layer convolution operations to decouple the style- and content-encoding feature images of the instance content image;
Specifically, as shown in Fig. 8c, the generated second-style instance content image is input to the content encoder and the style encoder of the local encoder network 421, which perform multi-layer convolution operations to decouple the style-encoding feature image and the content-encoding feature image of the second-style instance content image.
Sub-step S505: the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image are input into the global decoder network, yielding the reconstructed cross-granularity second-style image;
Specifically, as shown in Fig. 8d, the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image are input to the multi-layer perceptron module and the residual convolution module of the global decoder network 412, respectively, yielding the reconstructed cross-granularity second-style image.
Sub-step S506: the decoupled style-encoding feature image of the first-style instance image and the content-encoding feature image of the second-style instance content image are input into the local decoder network, yielding the reconstructed cross-granularity first-style instance image;
Specifically, as shown in Fig. 8e, the decoupled style-encoding feature image of the first-style instance image and the content-encoding feature image of the second-style instance content image are input to the multi-layer perceptron module and the residual convolution module of the local decoder network 422, respectively, yielding the reconstructed cross-granularity first-style instance image.
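Sub-steps S503-S506 together form the cross-granularity path. Under the same assumed interfaces as the sketches above (all names illustrative), the flow could be sketched as:

    # A hedged sketch of sub-steps S503-S506. g_* are the global encoders
    # and decoder; l_* are the local ones.
    def cross_granularity(img2, inst1,
                          g_enc_c, g_enc_s, g_dec, l_enc_c, l_enc_s, l_dec):
        # S503: style code of the second-style image + content code of the
        # first-style instance image -> generated second-style instance
        # content image (global decoder, Fig. 8a).
        inst_content2 = g_dec(l_enc_c(inst1), g_enc_s(img2))
        # S504: decouple the generated instance content image locally (Fig. 8c).
        c_ic2, s_ic2 = l_enc_c(inst_content2), l_enc_s(inst_content2)
        # S505: its style code + the second-style image's content code ->
        # reconstructed cross-granularity second-style image (Fig. 8d).
        rec_cross_img2 = g_dec(g_enc_c(img2), s_ic2)
        # S506: first-style instance style code + the instance content
        # image's content code -> reconstructed cross-granularity first-style
        # instance image (local decoder, Fig. 8e).
        rec_cross_inst1 = l_dec(c_ic2, l_enc_s(inst1))
        return rec_cross_img2, rec_cross_inst1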
Sub-step S507: the parameters of the global encoder and decoder networks are adjusted according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples;
Specifically, the parameters of the global encoder and decoder networks are adjusted according to the distance between the reconstructed first-style image and the corresponding first-style image in the image training samples;
and the parameters of the global encoder and decoder networks are adjusted according to the distance between the reconstructed second-style image and the corresponding second-style image in the image training samples.
Sub-step S508: the parameters of the local encoder and decoder networks are adjusted according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples;
Specifically, the parameters of the local encoder and decoder networks are adjusted according to the distance between the reconstructed first-style instance image and the instance image cropped from the corresponding first-style image in the image training samples;
and the parameters of the local encoder and decoder networks are adjusted according to the distance between the reconstructed second-style instance image and the instance image cropped from the corresponding second-style image in the image training samples.
Sub-step S509: the local encoder and decoder networks and the global encoder and decoder networks are jointly adjusted according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples, and the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image.
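Sub-steps S507-S509 each adjust parameters according to a distance between a reconstruction and its target. As a hedged illustration, taking that distance to be an L1 pixel distance (the disclosure leaves the concrete metric open) and assuming the batch field names below, one optimization step might look like:

    # A hedged sketch of the S507-S509 objectives; for brevity the three
    # terms are summed into one backward pass, whereas the disclosure
    # describes the adjustments term by term.
    import torch.nn.functional as F

    def training_step(batch, opt_global, opt_local):
        loss_global = (F.l1_loss(batch["rec_style1"], batch["img_style1"]) +
                       F.l1_loss(batch["rec_style2"], batch["img_style2"]))      # S507
        loss_local = (F.l1_loss(batch["rec_inst1"], batch["inst_style1"]) +
                      F.l1_loss(batch["rec_inst2"], batch["inst_style2"]))       # S508
        loss_cross = (F.l1_loss(batch["rec_cross_style2"], batch["img_style2"]) +
                      F.l1_loss(batch["rec_cross_inst1"], batch["inst_style1"])) # S509
        loss = loss_global + loss_local + loss_cross
        opt_global.zero_grad(); opt_local.zero_grad()
        loss.backward()
        opt_global.step(); opt_local.step()
        return loss.item()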
When the number of iterations reaches the first preset number, the parameters of the encoder and decoder networks have been adjusted the first preset number of times, and the image generation model already has good feature extraction and feature recovery capabilities; adjustment of the parameters of the initial image generation model can therefore stop, yielding the final image generation model. The first preset number may be 10,000, 20,000, 50,000, or the like, and is not specifically limited.
As a further preferred implementation, training may continue in the subsequent step S303 by adversarial learning based on discriminators:
Step S303: performing multiple iterations of adversarial-learning training on the image generation model based on discriminators; when the number of adversarial-learning iterations reaches a second preset number, the global style transfer model is taken as the finally trained image style conversion model.
The adversarial training model composed of the image generation model and the discriminators, shown in Fig. 8f, includes the image generation model, a first discriminator, and a second discriminator; the input of the first discriminator is connected to the output of the global decoder network of the image generation model, and the input of the second discriminator is connected to the output of the local decoder network of the image generation model.
One iteration of adversarial-learning training with the adversarial training model may include the following:
The first- and second-style images in the image training samples are input into the global style transfer model of the adversarial training model; the global decoder network of the adversarial training model outputs the reconstructed first- and second-style images to the first discriminator; the first discriminator judges whether its input image is real or fake; according to the discrimination result of the first discriminator, the parameters of the first discriminator are adjusted to strengthen its discrimination ability; and the parameters of the global encoder and decoder networks are adjusted according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples;
The first- and second-style instance images are input into the local style transfer model of the adversarial training model; the local decoder network of the adversarial training model outputs the reconstructed first- and second-style instance images to the second discriminator; the second discriminator judges whether its input image is real or fake; according to the discrimination result of the second discriminator, the parameters of the second discriminator are adjusted to strengthen its discrimination ability; and the parameters of the local encoder and decoder networks are adjusted according to the distance between the reconstructed first- and second-style instance images and the instance images cropped from the corresponding first/second-style images in the image training samples;
In addition, the cross-granularity second-style image reconstructed by the global decoder network of the adversarial training model may also be output to the first discriminator; the first discriminator judges whether the input image is real or fake; according to the discrimination result of the first discriminator, the parameters of the first discriminator are adjusted to strengthen its discrimination ability; and the parameters of the global encoder and decoder networks are adjusted according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples. The reconstructed cross-granularity second-style image may be generated in the same way as described in sub-steps S503, S504, and S505 above, which is not repeated here;
Likewise, the cross-granularity first-style instance image reconstructed by the global decoder network of the adversarial training model may also be output to the first discriminator; the first discriminator judges whether the input image is real or fake; according to the discrimination result of the first discriminator, the parameters of the first discriminator are adjusted to strengthen its discrimination ability; and the parameters of the global encoder and decoder networks are adjusted according to the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image. The reconstructed cross-granularity first-style instance image may be generated in the same way as described in sub-steps S503, S504, and S506 above, which is not repeated here;
The distances between the above images reflect the differences between them and are used to adjust the parameters of the image generation model; the difference may be any quantity able to express it, such as the pixel difference between a generated image and a real image, and the specific way of determining it is not limited.
When the number of adversarial-learning iterations reaches the second preset number, the parameters of the image generation model and of the first and second discriminators have been adjusted a sufficient number of times: the image generation model can generally produce style-converted images of high realism, while the first and second discriminators generally can no longer distinguish real images from generated ones. Parameter adjustment of the image generation model and the first and second discriminators can then stop, yielding the final image generation model and the final first and second discriminators. The second preset number may be 10,000, 20,000, 50,000, or the like, and is not specifically limited here. The global encoder network 411 and the global decoder network 412 of the image generation model serve as the encoder network 201 and the decoder network 202 of the finally trained image style conversion model.
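One way to realize the discriminator updates described above is a standard GAN objective; the following sketch assumes a non-saturating binary cross-entropy loss, which is an assumption rather than a choice fixed by the disclosure, and illustrative function names.

    # A hedged sketch of one adversarial update for the first discriminator
    # and the global encoder/decoder ("generator").
    import torch
    import torch.nn.functional as F

    def discriminator_step(d1, real_imgs, fake_imgs, opt_d1):
        # The first discriminator judges whether its input is a real image
        # or an image reconstructed by the global decoder network.
        logits_real = d1(real_imgs)
        logits_fake = d1(fake_imgs.detach())
        loss_d = (F.binary_cross_entropy_with_logits(
                      logits_real, torch.ones_like(logits_real)) +
                  F.binary_cross_entropy_with_logits(
                      logits_fake, torch.zeros_like(logits_fake)))
        opt_d1.zero_grad(); loss_d.backward(); opt_d1.step()
        return loss_d.item()

    def generator_step(d1, fake_imgs, rec_imgs, real_imgs, opt_g):
        # The generator is pushed to fool the discriminator while keeping
        # the reconstruction distance to the training sample small.
        logits_fake = d1(fake_imgs)
        loss_adv = F.binary_cross_entropy_with_logits(
            logits_fake, torch.ones_like(logits_fake))
        loss_rec = F.l1_loss(rec_imgs, real_imgs)
        loss_g = loss_adv + loss_rec
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_g.item()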
An apparatus for image style conversion provided by an embodiment of the present invention includes an image style conversion model trained by the method described above, and is used to convert an input first-style image to be style-converted into a second-style target image according to an input second-style image serving as a reference image.
Based on the above training method for an image style conversion model, an embodiment of the present invention provides a training apparatus for an image style conversion model, whose structure, shown in Fig. 9, includes an image generation model construction module 901 and an image generation model training module 902.
The image generation model construction module 901 is configured to construct an image generation model, which includes a global style transfer model and a local style transfer model; the global style transfer model includes a global encoder network and a global decoder network, and the local style transfer model includes a local encoder network and a local decoder network.
The image generation model training module 902 is configured to perform multiple iterations of training on the image generation model and, after the number of training iterations reaches the first preset number, to take the global style transfer model as the trained image style conversion model. One training iteration includes: inputting the first- and second-style images in the image training samples into the global style transfer model, where two passes of decoupling content and style features and decoding them through the global encoder and decoder networks yield the reconstructed first- and second-style images; inputting the instance images cropped from the first- and second-style images into the local style transfer model, where two passes of decoupling content and style features and decoding them through the local encoder and decoder networks yield the reconstructed instance images; inputting the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image into the global decoder to obtain the generated first/second-style instance content image; inputting the generated first/second-style instance content image into the content encoder and the style encoder of the local encoder network, which perform multi-layer convolution operations to decouple the style- and content-encoding feature images of the instance content image; inputting the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image into the global decoder network to obtain the reconstructed cross-granularity second-style image; inputting the decoupled style-encoding feature image of the first-style image and the content-encoding feature image of the first-style instance content image into the global decoder network to obtain the reconstructed cross-granularity first-style instance image; adjusting the parameters of the global encoder and decoder networks according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples; adjusting the parameters of the local encoder and decoder networks according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples; and jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples, and the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image.
Further, the training apparatus for an image style conversion model provided by the embodiment of the present invention may also include an adversarial training module 903.
The adversarial training module 903 is configured to perform multiple iterations of adversarial-learning training on the image generation model based on discriminators and, when the number of adversarial-learning iterations reaches the second preset number, to take the global style transfer model in the image generation model as the finally trained image style conversion model. For the specific iterative training method by which the adversarial training module 903 performs adversarial learning on the image generation model based on the discriminators, reference may be made to the method in step S303 above, which is not repeated here.
For the specific implementation of the functions of each module of the training apparatus for an image style conversion model provided by the embodiment of the present invention, reference may be made to the methods of the steps shown in Fig. 3 above, which are not repeated here.
In the technical solution of the present invention, the first-style image to be style-converted and the second-style image serving as a reference image are respectively input into the content and style encoders of the encoder network, which extract the content- and style-encoding feature images; the style- and content-encoding feature images are respectively input into the multi-layer perceptron module and the residual convolution module of the decoder network, which perform a perceptron operation and a residual convolution operation to produce, respectively, the parameters of the adaptive instance normalization module and an intermediate-process feature image; the obtained parameters are shared with the adaptive instance normalization module of the decoder network; and the intermediate-process feature image is input into the adaptive instance normalization module for instance normalization, after which the instance-normalized feature image is input into the upsampling layer of the decoder network, yielding the target image converted from the first style to the second style. The image style conversion model composed of the encoder and decoder networks is pre-trained with image training samples comprising multiple first- and second-style images and with instance images cropped from those training samples. Compared with the prior art, the technical solution of the present invention trains the image style conversion model with coarse-grained first- and second-style images as well as fine-grained instance images, thereby introducing cross-granularity learning into the style conversion model; this strengthens the style conversion quality of fine-grained instances while guaranteeing that of coarse-grained global images, improving the blur and distortion of local instance images after style conversion.
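For completeness, inference with the trained global model reduces to one encode-decode pass; under the assumed interfaces of the sketches above (all names illustrative), it could be sketched as:

    # A hedged sketch of inference with the trained global model.
    import torch

    @torch.no_grad()
    def stylize(img_to_convert, reference_img, enc_content, enc_style, decoder):
        # Content code of the first-style image to be converted, style code
        # of the second-style reference image, decoded into the target image.
        content = enc_content(img_to_convert)
        style = enc_style(reference_img)
        return decoder(content, style)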
On the other hand, the image style conversion model of the present invention is an unsupervised learning model and does not require paired reference and to-be-converted training data of the same scene, which gives the model strong generalization ability, greatly reduces the difficulty of data acquisition, and thus improves adaptability to style conversion in different surveillance scenarios.
The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples; within the spirit of the present invention, technical features of the above embodiments or of different embodiments may also be combined, steps may be carried out in any order, and many other variations of the different aspects of the invention as described above exist that are not provided in detail for the sake of brevity.
In addition, to simplify the description and discussion, and so as not to obscure the present invention, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, apparatuses may be shown in block-diagram form to avoid obscuring the present invention, which also takes into account the fact that details of the implementation of such block-diagram apparatuses are highly dependent on the platform on which the invention is to be implemented (i.e., such details should be fully within the understanding of those skilled in the art). Where specific details (e.g., circuits) are set forth to describe exemplary embodiments of the invention, it will be apparent to those skilled in the art that the invention may be practiced without these specific details or with variations of them. Accordingly, these descriptions are to be regarded as illustrative rather than restrictive.
Although the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art from the foregoing description. For example, the discussed embodiments may be used with other memory architectures (e.g., dynamic RAM (DRAM)).
The embodiments of the present invention are intended to cover all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within its protection scope.

Claims (10)

  1. A method for image style conversion, characterized by comprising:
    inputting a first-style image to be style-converted and a second-style image serving as a reference image into the content encoder and the style encoder of an encoder network, respectively, to extract content- and style-encoding feature images;
    inputting the style- and content-encoding feature images into the multi-layer perceptron module and the residual convolution module of a decoder network, respectively, performing a perceptron operation and a residual convolution operation to obtain, respectively, the parameters of an adaptive instance normalization module and an intermediate-process feature image, and sharing the obtained parameters with the adaptive instance normalization module of the decoder network;
    inputting the intermediate-process feature image into the adaptive instance normalization module for instance normalization, and inputting the instance-normalized feature image into the upsampling layer of the decoder network to obtain a target image converted from the first style to the second style;
    wherein an image style conversion model composed of the encoder and decoder networks is pre-trained with image training samples comprising multiple first- and second-style images and with instance images cropped from the image training samples.
  2. The method according to claim 1, characterized in that the image style conversion model is pre-trained specifically according to the following method:
    constructing an image generation model, the image generation model comprising a global style transfer model and a local style transfer model, wherein the global style transfer model comprises a global encoder network and a global decoder network, and the local style transfer model comprises a local encoder network and a local decoder network;
    performing multiple iterations of training on the image generation model and, after the number of training iterations reaches a first preset number, taking the global style transfer model as the trained image style conversion model;
    wherein one training iteration comprises:
    inputting the first- and second-style images in the image training samples into the global style transfer model, where two passes of decoupling content and style features and decoding them through the global encoder and decoder networks yield reconstructed first- and second-style images;
    inputting the instance images cropped from the first- and second-style images into the local style transfer model, where two passes of decoupling content and style features and decoding them through the local encoder and decoder networks yield reconstructed instance images;
    inputting the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image into the global decoder to obtain a generated first/second-style instance content image;
    inputting the generated first/second-style instance content image into the content encoder and the style encoder of the local encoder network, which perform multi-layer convolution operations to decouple the style- and content-encoding feature images of the instance content image;
    inputting the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image into the global decoder network to obtain a reconstructed cross-granularity second-style image;
    inputting the decoupled style-encoding feature image of the first-style image and the content-encoding feature image of the first-style instance content image into the global decoder network to obtain a reconstructed cross-granularity first-style instance image;
    adjusting the parameters of the global encoder and decoder networks according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples;
    adjusting the parameters of the local encoder and decoder networks according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples;
    jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples, and the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image.
  3. The method according to claim 1, characterized in that, after the number of iterations reaches the first preset number, the method further comprises:
    performing multiple iterations of adversarial-learning training on the image generation model based on discriminators, and, when the number of adversarial-learning iterations reaches a second preset number, taking the global style transfer model in the image generation model as the finally trained image style conversion model.
  4. The method according to claim 2 or 3, characterized in that inputting the first- and second-style images in the image training samples into the global style transfer model and obtaining the reconstructed first- and second-style images through two passes of decoupling content and style features and decoding them in the global encoder and decoder networks specifically comprises:
    inputting the first-style image in the image training samples into the global encoder network, where the content encoder and the style encoder of the global encoder network perform multi-layer convolution operations to decouple the content- and style-encoding feature images of the first-style image;
    inputting the second-style image in the image training samples into the global encoder network, where the content encoder and the style encoder of the global encoder network perform multi-layer convolution operations to decouple the content- and style-encoding feature images of the second-style image;
    inputting the style-encoding feature image of the first-style image and the content-encoding feature image of the second-style image into the multi-layer perceptron module and the residual convolution module of the global decoder network, respectively, whereby the global decoder network obtains a first generated image that fuses the style of the first-style image with the content of the second-style image;
    inputting the style-encoding feature image of the second-style image and the content-encoding feature image of the first-style image into the multi-layer perceptron module and the residual convolution module of the global decoder network, respectively, obtaining a second generated image that fuses the style of the second-style image with the content of the first-style image;
    inputting the first generated image into the global encoder network, where the content encoder and the style encoder perform multi-layer convolution operations to decouple the content- and style-encoding feature images of the first generated image;
    inputting the second generated image into the global encoder network, where the content encoder and the style encoder perform multi-layer convolution operations to decouple the content- and style-encoding feature images of the second generated image;
    inputting the style-encoding feature image of the first generated image and the content-encoding feature image of the second generated image into the multi-layer perceptron module and the residual convolution module of the global decoder network, respectively, finally obtaining the reconstructed first-style image;
    inputting the style-encoding feature image of the second generated image and the content-encoding feature image of the first generated image into the multi-layer perceptron module and the residual convolution module of the global decoder network, respectively, finally obtaining the reconstructed second-style image.
  5. The method according to claim 2 or 3, characterized in that inputting the instance images cropped from the first- and second-style images into the local style transfer model and obtaining the reconstructed instance images through two passes of decoupling content and style features and decoding them in the local encoder and decoder networks specifically comprises:
    inputting the first-style instance image into the local encoder network, where the content encoder and the style encoder of the local encoder network perform multi-layer convolution operations to decouple the content- and style-encoding feature images of the first-style instance image;
    inputting the second-style instance image into the local encoder network, where the content encoder and the style encoder of the local encoder network perform multi-layer convolution operations to decouple the content- and style-encoding feature images of the second-style instance image;
    inputting the style-encoding feature image of the first-style instance image and the content-encoding feature image of the second-style instance image into the multi-layer perceptron module and the residual convolution module of the local decoder network, respectively, obtaining a first generated instance image that fuses the style of the first-style instance image with the content of the second-style instance image;
    inputting the style-encoding feature image of the second-style instance image and the content-encoding feature image of the first-style instance image into the multi-layer perceptron module and the residual convolution module of the local decoder network, respectively, obtaining a second generated instance image that fuses the style of the second-style instance image with the content of the first-style instance image;
    inputting the first generated instance image into the local encoder network, where the content encoder and the style encoder perform multi-layer convolution operations to decouple the content- and style-encoding feature images of the first generated instance image;
    inputting the second generated instance image into the local encoder network, where the content encoder and the style encoder perform multi-layer convolution operations to decouple the content- and style-encoding feature images of the second generated instance image;
    inputting the style-encoding feature image of the first generated instance image and the content-encoding feature image of the second generated instance image into the multi-layer perceptron module and the residual convolution module of the local decoder network, respectively, finally obtaining the reconstructed first-style instance image;
    inputting the style-encoding feature image of the second generated instance image and the content-encoding feature image of the first generated instance image into the multi-layer perceptron module and the residual convolution module of the local decoder network, respectively, finally obtaining the reconstructed second-style instance image;
    wherein the first/second-style instance images are the instance images cropped from the first/second-style images.
  6. An apparatus for image style conversion, characterized by comprising an image style conversion model trained by the method according to any one of claims 1-5, configured to convert an input first-style image to be style-converted into a second-style target image according to an input second-style image serving as a reference image.
  7. A training method for an image style conversion model, characterized by comprising:
    constructing an image generation model, the image generation model comprising a global style transfer model and a local style transfer model, wherein the global style transfer model comprises a global encoder network and a global decoder network, and the local style transfer model comprises a local encoder network and a local decoder network;
    performing multiple iterations of training on the image generation model and, after the number of training iterations reaches a first preset number, taking the global style transfer model as the trained image style conversion model;
    wherein one training iteration comprises:
    inputting the first- and second-style images in the image training samples into the global style transfer model, where two passes of decoupling content and style features and decoding them through the global encoder and decoder networks yield reconstructed first- and second-style images;
    inputting the instance images cropped from the first- and second-style images into the local style transfer model, where two passes of decoupling content and style features and decoding them through the local encoder and decoder networks yield reconstructed instance images;
    inputting the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image into the global decoder to obtain a generated first/second-style instance content image;
    inputting the generated first/second-style instance content image into the content encoder and the style encoder of the local encoder network, which perform multi-layer convolution operations to decouple the style- and content-encoding feature images of the instance content image;
    inputting the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image into the global decoder network to obtain a reconstructed cross-granularity second-style image;
    inputting the decoupled style-encoding feature image of the first-style image and the content-encoding feature image of the first-style instance content image into the global decoder network to obtain a reconstructed cross-granularity first-style instance image;
    adjusting the parameters of the global encoder and decoder networks according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples;
    adjusting the parameters of the local encoder and decoder networks according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples;
    jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples, and the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image.
  8. The method according to claim 7, characterized in that, after the number of iterations reaches the first preset number, the method further comprises:
    performing multiple iterations of adversarial-learning training based on discriminators, and, when the number of adversarial-learning iterations reaches a second preset number, taking the global style transfer model as the finally trained image style conversion model.
  9. A training apparatus for an image style conversion model, characterized by comprising:
    an image generation model construction module, configured to construct an image generation model, the image generation model comprising a global style transfer model and a local style transfer model, wherein the global style transfer model comprises a global encoder network and a global decoder network, and the local style transfer model comprises a local encoder network and a local decoder network;
    an image generation model training module, configured to perform multiple iterations of training on the image generation model and, after the number of training iterations reaches a first preset number, to take the global style transfer model as the trained image style conversion model; wherein one training iteration comprises: inputting the first- and second-style images in the image training samples into the global style transfer model, where two passes of decoupling content and style features and decoding them through the global encoder and decoder networks yield reconstructed first- and second-style images; inputting the instance images cropped from the first- and second-style images into the local style transfer model, where two passes of decoupling content and style features and decoding them through the local encoder and decoder networks yield reconstructed instance images; inputting the style-encoding feature image of the first/second-style image and the content-encoding feature image of the second/first-style instance image into the global decoder to obtain a generated first/second-style instance content image; inputting the generated first/second-style instance content image into the content encoder and the style encoder of the local encoder network, which perform multi-layer convolution operations to decouple the style- and content-encoding feature images of the instance content image; inputting the decoupled style-encoding feature image of the second-style instance content image and the content-encoding feature image of the second-style image into the global decoder network to obtain a reconstructed cross-granularity second-style image; inputting the decoupled style-encoding feature image of the first-style image and the content-encoding feature image of the first-style instance content image into the global decoder network to obtain a reconstructed cross-granularity first-style instance image; adjusting the parameters of the global encoder and decoder networks according to the distance between the reconstructed first/second-style image and the corresponding first/second-style image in the image training samples; adjusting the parameters of the local encoder and decoder networks according to the distance between the reconstructed instance image and the instance image cropped from the corresponding first/second-style image in the image training samples; and jointly adjusting the local encoder and decoder networks and the global encoder and decoder networks according to the distance between the reconstructed cross-granularity second-style image and the corresponding second-style image in the image training samples, and the distance between the reconstructed cross-granularity first-style instance image and the corresponding first-style instance image.
  10. The apparatus according to claim 9, characterized by further comprising:
    an adversarial training module, configured to perform multiple iterations of adversarial-learning training on the image generation model based on discriminators and, when the number of adversarial-learning iterations reaches a second preset number, to take the global style transfer model in the image generation model as the finally trained image style conversion model.
PCT/CN2021/093432 2020-09-02 2021-05-12 Image style transfer method and apparatus, and image style transfer model training method and apparatus WO2022048182A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010907304.1 2020-09-02
CN202010907304.1A CN111815509B (en) 2020-09-02 2020-09-02 Image style conversion and model training method and device

Publications (1)

Publication Number Publication Date
WO2022048182A1 (en)

Family

ID=72860716

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/093432 WO2022048182A1 (en) 2020-09-02 2021-05-12 Image style transfer method and apparatus, and image style transfer model training method and apparatus

Country Status (2)

Country Link
CN (1) CN111815509B (en)
WO (1) WO2022048182A1 (en)

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN111815509B (en) * 2020-09-02 2021-01-01 北京邮电大学 Image style conversion and model training method and device
CN112883806B (en) * 2021-01-21 2024-03-22 杭州广电云网络科技有限公司 Video style migration method and device based on neural network, computer equipment and storage medium
CN113160042B (en) * 2021-05-21 2023-02-17 北京邮电大学 Image style migration model training method and device and electronic equipment
CN113343878A (en) * 2021-06-18 2021-09-03 北京邮电大学 High-fidelity face privacy protection method and system based on generation countermeasure network

Citations (5)

Publication number Priority date Publication date Assignee Title
CN110263865A (en) * 2019-06-24 2019-09-20 北方民族大学 A kind of semi-supervised multi-modal multi-class image interpretation method
US10657676B1 (en) * 2018-06-28 2020-05-19 Snap Inc. Encoding and decoding a stylized custom graphic
CN111179215A (en) * 2019-11-29 2020-05-19 北京航空航天大学合肥创新研究院 Method and system for analyzing internal structure of cell based on cell bright field picture
CN111583165A (en) * 2019-02-19 2020-08-25 京东方科技集团股份有限公司 Image processing method, device, equipment and storage medium
CN111815509A (en) * 2020-09-02 2020-10-23 北京邮电大学 Image style conversion and model training method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10832387B2 (en) * 2017-07-19 2020-11-10 Petuum Inc. Real-time intelligent image manipulation system
CN109829353B (en) * 2018-11-21 2023-04-18 东南大学 Face image stylizing method based on space constraint
CN111445476B (en) * 2020-02-27 2023-05-26 上海交通大学 Monocular depth estimation method based on multi-mode unsupervised image content decoupling
CN111539896B (en) * 2020-04-30 2022-05-27 华中科技大学 Domain-adaptive-based image defogging method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657676B1 (en) * 2018-06-28 2020-05-19 Snap Inc. Encoding and decoding a stylized custom graphic
CN111583165A (en) * 2019-02-19 2020-08-25 京东方科技集团股份有限公司 Image processing method, device, equipment and storage medium
CN110263865A (en) * 2019-06-24 2019-09-20 北方民族大学 Semi-supervised multi-modal multi-class image interpretation method
CN111179215A (en) * 2019-11-29 2020-05-19 北京航空航天大学合肥创新研究院 Method and system for analyzing internal structure of cell based on cell bright field picture
CN111815509A (en) * 2020-09-02 2020-10-23 北京邮电大学 Image style conversion and model training method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2612775A (en) * 2021-11-10 2023-05-17 Sony Interactive Entertainment Inc System and method for generating assets
CN114610935A (en) * 2022-05-12 2022-06-10 之江实验室 Method and system for synthesizing semantic image of text control image style
CN116402067A (en) * 2023-04-06 2023-07-07 哈尔滨工业大学 Cross-language self-supervision generation method for multi-language character style retention
CN116402067B (en) * 2023-04-06 2024-01-30 哈尔滨工业大学 Cross-language self-supervision generation method for multi-language character style retention

Also Published As

Publication number Publication date
CN111815509A (en) 2020-10-23
CN111815509B (en) 2021-01-01

Similar Documents

Publication Publication Date Title
WO2022048182A1 (en) Image style transfer method and apparatus, and image style transfer model training method and apparatus
Sheng et al. Temporal context mining for learned video compression
US11159790B2 (en) Methods, apparatuses, and systems for transcoding a video
TW202247650A (en) Implicit image and video compression using machine learning systems
CN113837938B (en) Super-resolution method for reconstructing potential image based on dynamic vision sensor
CN111488932B (en) Self-supervision video time-space characterization learning method based on frame rate perception
KR20200114436A (en) Apparatus and method for performing scalable video decoing
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
CN112381716B (en) Image enhancement method based on generation type countermeasure network
CN116803079A (en) Scalable coding of video and related features
Liu et al. End-to-end neural video coding using a compound spatiotemporal representation
Hamdi et al. A New Image Enhancement and Super Resolution technique for license plate recognition
CN116957931A (en) Method for improving image quality of camera image based on nerve radiation field
CN112651911A (en) High dynamic range imaging generation method based on polarization image
Jia et al. Event-based semantic segmentation with posterior attention
CN112418127B (en) Video sequence coding and decoding method for video pedestrian re-identification
CN112884636B (en) Style migration method for automatically generating stylized video
KR20220043912A (en) Method and Apparatus for Coding Feature Map Based on Deep Learning in Multitasking System for Machine Vision
Kim et al. End-to-end learnable multi-scale feature compression for vcm
CN113706572B (en) End-to-end panoramic image segmentation method based on query vector
Wang et al. Automatic model-based dataset generation for high-level vision tasks of autonomous driving in haze weather
Que et al. Residual dense U-Net for abnormal exposure restoration from single images
CN116962657B (en) Color video generation method, device, electronic equipment and storage medium
Xu et al. VQ-NeRV: A Vector Quantized Neural Representation for Videos
WO2023165487A1 (en) Feature domain optical flow determination method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21863252; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21863252; Country of ref document: EP; Kind code of ref document: A1)