WO2022022001A1 - Method for compressing a style transfer network, and style transfer method, apparatus and system - Google Patents
Method for compressing a style transfer network, and style transfer method, apparatus and system
- Publication number
- WO2022022001A1 (PCT/CN2021/093265)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- style
- network
- feature map
- content
- image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
Definitions
- the present application relates to the technical field of image or video processing, for example, to a method for compressing a style transfer network and a method, apparatus and system for style transfer.
- in the field of computer vision, style transfer is an important image editing task. The purpose of style transfer is to transfer the visual elements of a style image to another content image, thereby producing a stylized image.
- one approach is a style transfer algorithm based on online image optimization. This scheme updates the pixel values of the reconstructed image by gradient descent so that the style statistics of the reconstructed image (such as the Gram matrix) approach those of the style image, while the high-level feature representation of the reconstructed image in a Visual Geometry Group (VGG) network stays close to that of the content image, yielding a reconstructed image that combines the characteristics of the style image and the content image. The disadvantage of this scheme is that the optimization of image reconstruction is slow and inefficient, which hinders industrial deployment.
- the other is a style transfer algorithm based on offline model optimization, in which the information of the style image is added while the image is reconstructed through a pre-trained feedforward network. This is the main method adopted by industry. Its disadvantage is that a single model can learn only a few styles; a new model usually needs to be retrained for each new style, so scalability is weak. In addition, both of the above style transfer methods suffer from a large number of parameters and high computational overhead, which hinders deployment on mobile devices.
- the present application provides a method for compressing a style transfer network and a method, apparatus and system for style transfer, so as to solve the problems that a single style transfer model can learn only a few styles and scales poorly, and that style transfer models have a large number of parameters and high computational overhead, which hinders deployment on mobile devices.
- a method for style transfer is provided, including: after obtaining a content feature map corresponding to a content image and a style feature map corresponding to a style image, sorting the content feature map and the style feature map channel by channel respectively; obtaining order statistics of the sorted content feature map, and rearranging the sorted style feature map according to the order statistics; and generating a style transfer image according to the content feature map before sorting and the rearranged style feature map.
- a style transfer system is provided, including a style transfer network, the style transfer network including an encoding network, a style mapping unit and a decoding network, wherein,
- the encoding network is configured to encode the input content image and style image, generate a content feature map corresponding to the content image and a style feature map corresponding to the style image, output the content feature map and the style feature map to the style mapping unit, and output the content feature map to the decoding network;
- the style mapping unit is configured to sort the content feature map and the style feature map channel by channel respectively, obtain order statistics of the sorted content feature map, rearrange the sorted style feature map according to the order statistics, and output the rearranged style feature map to the decoding network;
- the decoding network is configured to generate a style transfer image according to the content feature map and the rearranged style feature map.
- a method for compressing the above style transfer network is provided, including: using the style transfer network as a teacher network and establishing a student network with a structure similar to the teacher network; connecting the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size; and fixing the weights of the teacher network, training the student network according to a preset loss function, and using the trained network as the compressed style transfer network.
- an apparatus for style transfer comprising:
- the feature sorting module is configured to, after obtaining the content feature map corresponding to the content image and the style feature map corresponding to the style image, respectively perform channel-by-channel sorting on the content feature map and the style feature map;
- a rearrangement module configured to obtain the order statistics of the sorted content feature map, and rearrange the sorted style feature map according to the order statistics;
- the style transfer image generation module is configured to generate a style transfer image according to the content feature map before sorting and the rearranged style feature map.
- an apparatus for compressing the above style transfer network comprising:
- the student network creation module is configured to use the style transfer network as the teacher network, and establish a student network with a structure similar to the teacher network;
- a convolutional connection module configured to connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size;
- the student network training module is configured to fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
- Also provided is a computer device comprising:
- one or more processors;
- storage means arranged to store one or more programs
- the one or more programs when executed by the one or more processors, cause the one or more processors to implement the above-described method.
- FIG. 1 is a flowchart of a method for style transfer provided in Embodiment 1 of the present application;
- FIG. 2 is a structural block diagram of a style transfer system provided in Embodiment 2 of the present application.
- FIG. 3 is a schematic diagram of a network structure of a coding network provided in Embodiment 2 of the present application;
- FIG. 4 is a schematic diagram of a network structure of another decoding network provided in Embodiment 2 of the present application.
- FIG. 5 is a structural block diagram of a style transfer system provided in Embodiment 3 of the present application.
- FIG. 6 is a schematic diagram of a model compression provided by Embodiment 3 of the present application.
- FIG. 7 is a flowchart of a method for compressing a style transfer network according to Embodiment 4 of the present application.
- FIG. 8 is a structural block diagram of an apparatus for style transfer provided in Embodiment 5 of the present application;
- FIG. 9 is a structural block diagram of an apparatus for compressing a style transfer network according to Embodiment 6 of the present application.
- FIG. 10 is a schematic structural diagram of a computer device according to Embodiment 7 of the present application.
- FIG. 1 is a flowchart of a style transfer method provided in Embodiment 1 of the present application.
- This embodiment is applicable to the following scenario: a user uploads a picture or a video, selects a style image from preset artistic style images as the target of the style transfer, or uploads a style image as the transfer target; finally, the style-transferred picture or video is returned.
- the method can be applied to video or picture editing applications, live broadcast applications, short video applications and other products, can be executed by a style transfer device, and can include the following steps:
- Step 110: After obtaining the content feature map corresponding to the content image and the style feature map corresponding to the style image, sort the content feature map and the style feature map channel by channel respectively.
- the content feature map may be image features generated by passing the content image through an encoding network, the encoding network extracting feature information of the content image such as color, statistics, texture and structure; the style feature map may be the feature image output by passing the style image through an encoding network, the encoding network extracting feature information of the style image such as color, statistics, texture and structure.
- in step 110, the content feature map and the style feature map may each be sorted channel by channel.
- in one implementation, step 110 may include the following steps:
- Step 110-1: Perform channel-by-channel vectorization processing on the content feature map and the style feature map, respectively, to obtain a vectorized content feature vector and a vectorized style feature vector.
- a vectorization process may be as follows: the feature map is vectorized channel by channel into a data set containing N pixel point samples in the order from left to right and top to bottom.
- for example, suppose the content feature map x has 2 channels, each channel being a 2x2 matrix, i.e. x ∈ R^{2×2×2}, and the style feature map y is likewise a 2-channel feature map with a 2x2 matrix per channel, i.e. y ∈ R^{2×2×2}; vectorizing both channel by channel yields the content feature vector and the style feature vector.
- Step 110-2: Sort the vectorized content feature vector and the vectorized style feature vector channel by channel respectively.
- for example, the content feature vector and the style feature vector may each be sorted along the second (spatial) dimension to obtain the sorted content feature vector and the sorted style feature vector.
- Step 120: Acquire order statistics of the sorted content feature map, and rearrange the sorted style feature map according to the order statistics.
- after sorting the content feature map and the style feature map, the order statistics of the sorted content feature map can be obtained, and the sorted style feature map can be rearranged based on those order statistics, so that the rearranged style feature map carries the order statistics of the content feature map; this keeps the style statistics of the style feature map unchanged while introducing the order statistics of the content feature map, which facilitates the transfer of an arbitrary style image.
- step 120 may include the following sub-steps:
- Step 120-1: Compare the content feature vector in the content feature map before sorting with the content feature vector in the sorted content feature map, and determine the rank indices of the content feature vector in the sorted content feature map, as the order statistics; a rank index is the position that a feature element of the pre-sorting content feature vector occupies in the sorted content feature vector.
- Step 120-2: Rearrange the style feature vectors in the sorted style feature map according to the order statistics.
- for example, the style feature vectors in the sorted style feature map can be rearranged according to the order statistics to obtain the rearranged feature tensor.
- Step 120-3: Restore the spatial dimension of the rearranged style feature vector to obtain a rearranged style feature map.
- by restoring the spatial dimensions of the rearranged style feature tensor, the rearranged style feature map z ∈ R^{C×H×W} is obtained.
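- As a concrete illustration, the following is a minimal sketch of this sort-and-rearrange style projection. PyTorch is assumed purely for illustration (the application does not name a framework), and the function name is hypothetical:

```python
import torch

def style_projection(content_feat: torch.Tensor, style_feat: torch.Tensor) -> torch.Tensor:
    """Rearrange style features channel by channel so that they follow the
    order statistics (rank indices) of the content features.
    Both inputs have shape (C, H, W); the output has the same shape."""
    C, H, W = content_feat.shape
    # Step 110-1: vectorize each channel (left-to-right, top-to-bottom order).
    x = content_feat.reshape(C, H * W)
    y = style_feat.reshape(C, H * W)
    # Step 110-2: sort the style vector channel by channel.
    y_sorted, _ = torch.sort(y, dim=1)
    # Step 120-1: a double argsort gives the (0-based) rank of each content element.
    rank = torch.argsort(torch.argsort(x, dim=1), dim=1)
    # Step 120-2: place the rank-k style value where the content element of rank k
    # sits, so the result inherits the content's ordering but keeps the style's values.
    z = torch.gather(y_sorted, 1, rank)
    # Step 120-3: restore the spatial dimensions.
    return z.reshape(C, H, W)
```

- For a content channel whose vectorized values are [4, 3, 1, 2] and a style channel whose values sort to [11, 14, 19, 23], this sketch returns [23, 19, 11, 14], matching the worked example given in the description below.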
- Step 130: Generate a style transfer image according to the content feature map before sorting and the rearranged style feature map.
- the rearranged style feature map retains the second-order statistics of the style features while carrying the order statistics of the content feature map; reconstructing the content feature map together with the rearranged style feature map to generate the style transfer image therefore better presents the blending of the style image and the content image.
- step 130 may include the following steps:
- the content feature map before sorting and the rearranged style feature map are input into a pre-trained decoding network, so that the decoding network decodes and reconstructs them to generate the style transfer image.
- the content feature map before sorting and the style feature map after rearrangement are decoded and reconstructed by a decoding network, and a style transfer image is output.
- the network structure of the decoding network is symmetrical with the network structure of the encoding network.
- content feature information of different levels of the content feature map can be introduced into the decoding network in a skip connection manner, which can help the decoding network to better recover the details of the content image.
- a skip connection here means direct introduction: the content feature information of different levels of the content feature map can be introduced into the decoding network directly, without passing through any sub-network.
- in this embodiment, when performing style transfer, the content feature map and the style feature map are sorted channel by channel, the order statistics of the sorted content feature map are obtained, and the sorted style feature map is rearranged according to those order statistics. In this way, for a given content feature map, once its order statistics are determined, the style feature map of any style can be rearranged based on them, so that the rearranged style feature map retains its own style statistics while carrying the order statistics of the content feature map. When the style of the current content image needs to be converted, it is only necessary to rearrange and decode the style feature map according to the order statistics of the content image, thereby realizing arbitrary style transfer of the content image corresponding to the given content feature map, without retraining the model for new styles, which gives strong scalability.
- FIG. 2 is a structural block diagram of a style transfer system provided in Embodiment 2 of the present application, wherein the style transfer system may include a style transfer network, and the style transfer network includes an encoding network 210 , a style mapping unit 220 and a decoding network 230 .
- the encoding network 210, also called the encoder, has two inputs: a content image and a style image. The encoding network 210 encodes the content image and outputs the content feature map, and encodes the style image and outputs the style feature map.
- the encoding mentioned in this embodiment may include, but is not limited to, extracting feature information of different levels of content images or style images.
- the encoding network 210 is connected to the style mapping unit 220 and the decoding network 230 , which outputs the content feature map and the style feature map to the style mapping unit 220 , and outputs the content feature map to the decoding network 230 .
- the encoding network 210 may be a pre-trained VGG network, such as VGG-19, which may be pre-trained on the image dataset ImageNet, wherein the weights of the encoder do not participate in the training during the training process.
- as an example, the network structure of the encoding network 210 may be as shown in FIG. 3; the encoding network 210 may include 9 convolutional layers (conv) and 3 max pooling layers (maxpool), each convolutional layer being followed by a Rectified Linear Unit (ReLU) nonlinear activation layer.
- suppose the content image or style image is an image in blue-green-red (BGR) format (it may also be an image in another format, such as RGB, which is not limited in this embodiment). After the content image or style image is input into the encoding network 210, it first passes through two convolutional layers (each with 3x3 kernels) with 64 output channels, followed by a max pooling layer; then through two convolutional layers (3x3 kernels) with 128 output channels and a max pooling layer for downsampling; then through four convolutional layers (3x3 kernels) with 256 output channels and a max pooling layer; and finally through one convolutional layer (3x3 kernel) with 512 output channels, which outputs the corresponding feature map.
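- A minimal sketch of this encoder topology (9 convolutions, 3 max pooling layers, a ReLU after every convolution) is given below; PyTorch and same-padding are assumptions for illustration, and the helper name is hypothetical:

```python
import torch.nn as nn

def make_encoder(in_channels: int = 3) -> nn.Sequential:
    # 64, 64, pool, 128, 128, pool, 256 x 4, pool, 512 -> 9 convolutions, 3 pools.
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M", 512]
    layers, c = [], in_channels
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(c, v, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            c = v
    return nn.Sequential(*layers)
```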
- the style mapping unit 220 is configured to sort the content feature map and the style feature map channel by channel respectively, obtain the order statistics of the sorted content feature map, rearrange the sorted style feature map according to the order statistics, and output the rearranged style feature map to the decoding network.
- the style mapping unit 220 sorts the content feature map and the style feature map channel by channel respectively, and obtains the order statistics of the sorted content feature map (that is, the ranking index), and then according to The order statistic information rearranges the sorted style feature map, so that the rearranged style feature map retains both the second-order statistic information of the style feature and the order statistic information of the content feature map.
- after the style mapping unit 220 obtains the rearranged style feature map, it can output the rearranged style feature map to the decoding network 230.
- the style mapping unit 220 is an independent unit with no trainable weights, i.e. it is parameter-free; therefore, adding the style mapping unit 220 to the style transfer network does not increase the size of the network, and its computational cost is relatively low.
- the decoding network 230 is configured to generate a style transfer image according to the content feature map and the rearranged style feature map.
- after the decoding network 230 receives the content feature map from the encoding network 210 and the rearranged style feature map from the style mapping unit 220, it decodes and reconstructs the content feature map and the rearranged style feature map, and outputs the style transfer image.
- the network structure of the decoding network 230 is symmetric with the network structure of the encoding network 210 .
- the input of the input layer is the rearranged style feature map, and the content feature information of different levels of the content feature map can be introduced into the decoding network 230 in a skip connection manner, This can help the decoding network 230 to better recover the details of the content image.
- the encoding network 210 is further configured to output the style feature map to the decoding network 230, and an Adaptive Instance Normalization (AdaIN) sub-network may be added to the decoding network 230. The AdaIN sub-network has two inputs: the output of the layer preceding the AdaIN sub-network in the decoding network 230, and the style feature map at the scale corresponding to that output. The AdaIN sub-network then normalizes this output channel by channel according to the mean and variance of the style feature map at the corresponding scale, i.e.: AdaIN(x, y) = σ(y)·(x − μ(x))/σ(x) + μ(y),
- ⁇ (x) is the channel-wise mean of x
- ⁇ (y) is the channel-wise mean of the style feature map
- ⁇ (x) is the channel-wise standard deviation of x
- ⁇ (y) is the channel-by-channel value of the style feature map
- the variance of , x is the output of the decoding network 210 in the previous layer of the AdaIN sub-network.
- the AdaIN sub-network strengthens the statistical distribution of style features during image reconstruction by changing the channel-wise mean and variance of the feature maps.
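- The channel-wise normalization can be sketched as follows, assuming PyTorch and following the standard AdaIN definition consistent with the symbols above; `eps` is an assumed numerical-stability constant not mentioned in the text:

```python
import torch

def adain(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """x: decoder features (N, C, H, W); y: style features at the matching scale."""
    mu_x = x.mean(dim=(2, 3), keepdim=True)           # channel-wise mean of x
    sigma_x = x.std(dim=(2, 3), keepdim=True) + eps   # channel-wise deviation of x
    mu_y = y.mean(dim=(2, 3), keepdim=True)           # channel-wise mean of y
    sigma_y = y.std(dim=(2, 3), keepdim=True)         # channel-wise deviation of y
    # Shift x's channel statistics toward those of the style features.
    return sigma_y * (x - mu_x) / sigma_x + mu_y
```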
- as an example, the network structure of the decoding network 230 may be as shown in FIG. 4. The input to the input layer (Input) of the decoding network 230 is the rearranged style feature map (output of style projection) output by the style mapping unit 220. It first passes through a convolutional layer with one 3x3 kernel and 256 output channels, is upsampled by bilinear interpolation, and is cascaded with the content feature map of the corresponding scale obtained by the encoding network 210; it then passes through three convolutional layers (3x3 kernels) with 256 output channels and one convolutional layer (3x3 kernel) with 128 output channels, followed by bilinear upsampling; next it passes through the AdaIN sub-network and is cascaded with the content feature map of the corresponding scale obtained by the encoding network 210; it then passes through one convolutional layer (3x3 kernel) with 128 output channels and one convolutional layer (3x3 kernel) with 64 output channels, is upsampled by bilinear interpolation, and is cascaded with the content feature map of the corresponding scale; finally it passes through one convolutional layer (3x3 kernel) with 64 output channels and one convolutional layer (3x3 kernel) with 3 output channels, mapping the output to RGB space. Except for the last convolutional layer, every convolutional layer in the decoding network 230 is followed by a ReLU nonlinear activation layer.
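- One decoder stage of the kind described (convolution, bilinear upsampling, cascading with the encoder's content feature map of matching scale) might be sketched as follows; PyTorch is assumed, the class name is hypothetical, and the channel sizes follow the first stage in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage(nn.Module):
    def __init__(self, in_ch: int = 512, out_ch: int = 256):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, z: torch.Tensor, content_skip: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.conv(z))
        # Bilinear upsampling, then cascade (concatenate) with the content
        # feature map of the corresponding scale from the encoding network.
        h = F.interpolate(h, scale_factor=2, mode="bilinear", align_corners=False)
        return torch.cat([h, content_skip], dim=1)
```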
- the loss function designed based on the above style transfer network may include a content loss function, a style loss function and a reconstruction loss function.
- the content loss function can be measured by the mean square error between the features output by the encoding network 210 for the style transfer image and for the content image, i.e.: L_c = ‖E(c) − E(p)‖₂,
- c represents the content image
- p represents the style transfer image
- E(c) represents the content feature map obtained after the content image passes through the encoding network 210
- E(p) represents the style transfer feature map obtained after the style transfer image passes through the encoding network 210 .
- the style loss function can be measured by the distances between the channel-wise means and between the channel-wise variances, at several corresponding layers of the encoding network 210, of the features output by the encoding network 210 for the style transfer image and for the style image, i.e.: L_s = Σ_{i=1..N} (‖μ(E_i(p)) − μ(E_i(s))‖₂ + ‖σ(E_i(p)) − σ(E_i(s))‖₂),
- E i represents the i-th layer feature map obtained by the encoding network 210
- s represents the style image
- N represents the number of corresponding layers used to represent the style loss function.
- the reconstruction loss function measures how close the style transfer image reconstructed by the model is to the content image when the input style image and content image are identical (for example, the style image and the content image are the same image), i.e.: L_r = ‖E(c) − E(p′)‖₂,
- E(c) represents the feature map obtained after the content image passes through the encoding network 210
- E(p′) represents the feature map obtained after the style transfer image passes through the encoding network 210 .
- the final loss function is the weighted sum of the above three loss functions, i.e.: L = w_c·L_c + w_s·L_s + w_r·L_r.
- when training the style transfer network, the optimization goal is to minimize the loss function L. The weights of the content loss, style loss and reconstruction loss can be set as needed, which is not limited in this embodiment; for example, they may be taken as 1, 2 and 0.1 respectively.
- the style transfer network can be optimized by gradient descent, for example with an initial step size of 0.1, reducing the step size by a factor of 0.1 every 500 epochs.
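- The weighted loss and the step-size schedule described above can be sketched as follows, assuming PyTorch; `decoder` is a placeholder module, since only the decoding network is trained in this embodiment:

```python
import torch
import torch.nn as nn

w_c, w_s, w_r = 1.0, 2.0, 0.1  # example weights from the text

def total_loss(L_c: torch.Tensor, L_s: torch.Tensor, L_r: torch.Tensor) -> torch.Tensor:
    # L = w_c * L_c + w_s * L_s + w_r * L_r
    return w_c * L_c + w_s * L_s + w_r * L_r

decoder = nn.Conv2d(512, 3, kernel_size=3, padding=1)  # placeholder for the decoding network
optimizer = torch.optim.SGD(decoder.parameters(), lr=0.1)  # initial step size 0.1
# Reduce the step size by a factor of 0.1 every 500 epochs, as described.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.1)
```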
- in this embodiment of the present application, a style mapping unit without trainable parameters is introduced into the style transfer network, located between the encoding network and the decoding network. It is configured to sort the content feature map and the style feature map output by the encoding network channel by channel, obtain the order statistics of the sorted content feature map, rearrange the sorted style feature map according to those order statistics, and input the rearranged style feature map into the decoding network for decoding. In this way, for a given content image, once the order statistics of its content feature map are determined, the style feature map of a style image of any style can be rearranged based on those order statistics so that the rearranged style feature map carries the order statistics of the content feature map. This realizes arbitrary style transfer of the given content image without retraining the model for new styles, which gives strong scalability and greater freedom to users.
- FIG. 5 is a structural block diagram of a style transfer system provided in Embodiment 3 of the present application. This embodiment is implemented on the basis of the embodiment of FIG. 2. In this embodiment, the style transfer system may further include a model compression module 240, configured to perform model compression on the style transfer network of the embodiment of FIG. 2.
- the style transfer network can be compressed based on the idea of model distillation.
- the model compression module 240 may include the following sub-modules:
- the student network creation sub-module is configured to use the style transfer network as a teacher network, and establish a student network with a structure similar to the teacher network.
- the association sub-module is configured to connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size.
- the student network training sub-module is configured to fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
- as shown in the model compression diagram of FIG. 6, the style transfer network is used as the teacher model, i.e. the teacher network includes the encoding network, the style mapping unit and the decoding network; a student model with a structure similar to the teacher network is then designed, where a similar structure means a structure with the same number of network layers but fewer convolution kernels and a smaller number of parameters.
- the output of each of the different layers of the teacher network is connected to the corresponding layer of the student network through a convolutional layer of a preset size whose parameters are trainable. The convolution may be a small one; for example, the preset-size convolutional layer may be a convolutional layer with a 1x1 kernel. The weights of the teacher network are then fixed, and all parameters of the student network participate in training.
- the loss function of the student network may include a style loss function, a content loss function, a reconstruction loss function, and a distillation loss function.
- the distillation loss function measures the similarity between the features output by each of different layers of the student network and the features output by the corresponding layer of the teacher network after being mapped by the above convolution; that is, the features output by different layers of the student network should be as close as possible to the outputs of the corresponding teacher layers after the preset-size convolution mapping, which can be expressed by the following formula: L_d = Σ_{i=1..N} (‖E_i(x) − f_i(E′_i(x))‖₂ + ‖D_i(x) − g_i(D′_i(x))‖₂),
- where f_i and g_i are the 1x1 convolutions that map the output of each layer of the encoding network and the decoding network of the teacher network, respectively; E_i(x) denotes the output of the i-th layer of the student network's encoding network, D_i(x) the output of the i-th layer of the student network's decoding network, E′_i(x) the output of the i-th layer of the teacher network's encoding network, D′_i(x) the output of the i-th layer of the teacher network's decoding network, and N the number of corresponding layers used to characterize the distillation loss function.
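- A hedged sketch of this distillation term is given below, assuming PyTorch; the feature lists are placeholders, and the 1x1 mappers correspond to the trainable preset-size convolutions f_i / g_i (the channel counts shown are assumptions):

```python
import torch
import torch.nn as nn

def distillation_loss(student_feats, teacher_feats, mappers):
    """student_feats / teacher_feats: lists of N corresponding layer outputs;
    mappers: trainable 1x1 convolutions applied to the teacher outputs."""
    loss = 0.0
    for s, t, f in zip(student_feats, teacher_feats, mappers):
        loss = loss + torch.norm(s - f(t))  # e.g. ||E_i(x) - f_i(E'_i(x))||_2
    return loss

# Example connector: map a 512-channel teacher layer onto a 256-channel student layer.
mapper = nn.Conv2d(in_channels=512, out_channels=256, kernel_size=1)
```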
- the reconstruction loss function measures the similarity between the style transfer image output by the student network and the style transfer image output by the teacher network; that is, the two should be as close as possible, which can be expressed by the following formula: L_r = ‖D(x) − D′(x)‖₂.
- the content loss function is measured by the mean square error between the features output by the student network's encoding network for the style transfer image output by the student network and for the content image.
- the style loss function is measured by the distances between the channel-wise means and between the channel-wise variances, at several corresponding layers of the student network's encoding network, of the features output by that encoding network for the style transfer image output by the student network and for the style image. Since the style loss function and the content loss function are similar to the loss functions used to train the style transfer network in Embodiment 2, reference may be made to the description of Embodiment 2, which is not repeated here.
- the final loss function of the student network can be: L = w_c·L_c + w_s·L_s + w_r·L_r + w_d·L_d.
- in this embodiment, when performing model compression on the style transfer network, the style transfer network is used as the teacher network, a student network with a structure similar to the teacher network is established, and the output of each of different layers of the teacher network is connected to the corresponding layer of the student network through a preset-size convolution, so that the pre-trained teacher network supervises the student network. Using the supervision information of the teacher network, a small model (i.e. the student network) can be trained directly from scratch, without pre-training the encoding network and decoding network in separate steps. This compresses the volume of the style transfer network to a large extent while preserving the transfer quality and scalability of the network, reduces the computational overhead, and yields a lightweight model that is easy to deploy on mobile devices.
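- The supervision setup can be sketched as a training step in which the teacher is frozen and only the student receives gradients; PyTorch is assumed, and `teacher`, `student` and `loss_fn` are placeholder callables:

```python
import torch

def train_step(teacher, student, optimizer, content, style, loss_fn):
    # Fix the teacher's weights; only the student (and the 1x1 connectors,
    # if included in the optimizer) are trained.
    for p in teacher.parameters():
        p.requires_grad_(False)
    with torch.no_grad():
        t_out = teacher(content, style)   # teacher output used as supervision
    s_out = student(content, style)       # all student parameters participate
    loss = loss_fn(s_out, t_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```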
- FIG. 7 is a flowchart of a method for compressing a style transfer network provided in Embodiment 4 of the present application.
- the method can be applied to compress the style transfer network described in FIG. 2 , and may include the following steps:
- Step 710: Use the style transfer network as a teacher network, and establish a student network with a structure similar to the teacher network.
- Step 720: Connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size.
- Step 730: Fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
- This embodiment is the method embodiment corresponding to Embodiment 3; for its details, reference may be made to the description of Embodiment 3.
- FIG. 8 is a structural block diagram of an apparatus for style transfer provided in Embodiment 5 of the present application, which may include the following modules:
- the feature sorting module 810 is configured to, after obtaining the content feature map corresponding to the content image and the style feature map corresponding to the style image, sort the content feature map and the style feature map channel by channel respectively;
- the rearrangement module 820 is configured to obtain the order statistics of the sorted content feature map, and rearrange the sorted style feature map according to the order statistics;
- the style transfer image generation module 830 is configured to generate a style transfer image according to the content feature map before sorting and the rearranged style feature map.
- the style transfer apparatus provided by the embodiment of the present application can execute the style transfer method provided by any embodiment of the present application, and has the functional modules and effects corresponding to the executed method.
- FIG. 9 is a structural block diagram of an apparatus for compressing a style transfer network provided in Embodiment 6 of the present application, which may include the following modules:
- the student network creation module 910 is configured to use the style transfer network as the teacher network, and establish a student network with a structure similar to the teacher network;
- the convolutional connection module 920 is configured to connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size;
- the student network training module 930 is configured to fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
- An apparatus for compressing a style transfer network provided by an embodiment of the present application can execute the method for compressing a style transfer network provided by any embodiment of the present application, and has the functional modules and effects corresponding to the executed method.
- FIG. 10 is a schematic structural diagram of a computer device provided in Embodiment 7 of the present application.
- as shown in FIG. 10, the computer device includes a processor 100, a memory 101, an input device 102 and an output device 103. The number of processors 100 in the computer device may be one or more; one processor 100 is taken as an example in FIG. 10. The processor 100, memory 101, input device 102 and output device 103 in the computer device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 10.
- the memory 101 may be configured to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the foregoing embodiments in the embodiments of the present application.
- the processor 100 executes various functional applications and data processing of the computer device by running the software programs, instructions and modules stored in the memory 101, ie, implements the methods mentioned in the above method embodiments.
- the eighth embodiment of the present application further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are used to execute the methods in the foregoing method embodiments when executed by a computer processor.
- for the storage medium containing computer-executable instructions provided by an embodiment of the present application, the computer-executable instructions are not limited to the above method operations, and can also perform related operations in the methods provided by any embodiment of the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Analysis (AREA)
Abstract
Disclosed herein are a method for compressing a style transfer network, and a style transfer method, apparatus and system. The style transfer method includes: after obtaining a content feature map corresponding to a content image and a style feature map corresponding to a style image, sorting the content feature map and the style feature map channel by channel respectively; obtaining order statistics of the sorted content feature map, and rearranging the sorted style feature map according to the order statistics; and generating a style transfer image according to the content feature map before sorting and the rearranged style feature map.
Description
This application claims priority to Chinese patent application No. 202010733581.5, filed with the Chinese Patent Office on July 27, 2020, the entire contents of which are incorporated herein by reference.
The present application relates to the technical field of image or video processing, for example, to a method for compressing a style transfer network and a method, apparatus and system for style transfer.
In the field of computer vision, style transfer is an important class of image editing tasks. The purpose of style transfer is to transfer the visual elements of a style image to another content image, thereby generating a stylized image.
In the related art, there are two ways to realize style transfer:
One is a style transfer algorithm based on online image optimization. This scheme updates the pixel values of the reconstructed image by gradient descent so that the style statistics of the reconstructed image (such as the Gram matrix) approach those of the style image, while the high-level feature representation of the reconstructed image in a Visual Geometry Group (VGG) network stays close to that of the content image, yielding a reconstructed image that combines the characteristics of the style image and the content image. The disadvantage of this scheme is that the optimization of image reconstruction is slow and inefficient, which hinders industrial deployment.
The other is a style transfer algorithm based on offline model optimization. This scheme adds the information of the style image while reconstructing the image through a pre-trained feedforward network. It is the main method adopted by industry; its disadvantage is that a single model can learn only a few styles, a new model usually needs to be retrained for each new style, and scalability is weak.
Moreover, both of the above style transfer methods suffer from a large number of parameters and high computational overhead, which hinders deployment on mobile devices.
Summary
The present application provides a method for compressing a style transfer network and a method, apparatus and system for style transfer, so as to solve the problems that a single style transfer model can learn only a few styles and scales poorly, and that style transfer models have a large number of parameters and high computational overhead, which hinders deployment on mobile devices.
A method for style transfer is provided, including:
after obtaining a content feature map corresponding to a content image and a style feature map corresponding to a style image, sorting the content feature map and the style feature map channel by channel respectively;
obtaining order statistics of the sorted content feature map, and rearranging the sorted style feature map according to the order statistics; and
generating a style transfer image according to the content feature map before sorting and the rearranged style feature map.
A style transfer system is also provided. The style transfer system includes a style transfer network, and the style transfer network includes an encoding network, a style mapping unit and a decoding network, wherein:
the encoding network is configured to encode the input content image and style image, generate a content feature map corresponding to the content image and a style feature map corresponding to the style image, output the content feature map and the style feature map to the style mapping unit, and output the content feature map to the decoding network;
the style mapping unit is configured to sort the content feature map and the style feature map channel by channel respectively, obtain order statistics of the sorted content feature map, rearrange the sorted style feature map according to the order statistics, and output the rearranged style feature map to the decoding network; and
the decoding network is configured to generate a style transfer image according to the content feature map and the rearranged style feature map.
A method for compressing the above style transfer network is also provided, the method including:
using the style transfer network as a teacher network, and establishing a student network with a structure similar to the teacher network;
connecting the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size; and
fixing the weights of the teacher network, training the student network according to a preset loss function, and using the trained network as the compressed style transfer network.
An apparatus for style transfer is also provided, the apparatus including:
a feature sorting module configured to, after obtaining a content feature map corresponding to a content image and a style feature map corresponding to a style image, sort the content feature map and the style feature map channel by channel respectively;
a rearrangement module configured to obtain order statistics of the sorted content feature map, and rearrange the sorted style feature map according to the order statistics; and
a style transfer image generation module configured to generate a style transfer image according to the content feature map before sorting and the rearranged style feature map.
An apparatus for compressing the above style transfer network is also provided, the apparatus including:
a student network creation module configured to use the style transfer network as a teacher network, and establish a student network with a structure similar to the teacher network;
a convolutional connection module configured to connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size; and
a student network training module configured to fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
A computer device is also provided, the computer device including:
one or more processors; and
a storage means configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the above method.
A computer-readable storage medium is also provided, on which a computer program is stored, the program implementing the above method when executed by a processor.
FIG. 1 is a flowchart of a method for style transfer provided in Embodiment 1 of the present application;
FIG. 2 is a structural block diagram of a style transfer system provided in Embodiment 2 of the present application;
FIG. 3 is a schematic diagram of the network structure of an encoding network provided in Embodiment 2 of the present application;
FIG. 4 is a schematic diagram of the network structure of another decoding network provided in Embodiment 2 of the present application;
FIG. 5 is a structural block diagram of a style transfer system provided in Embodiment 3 of the present application;
FIG. 6 is a schematic diagram of model compression provided in Embodiment 3 of the present application;
FIG. 7 is a flowchart of a method for compressing a style transfer network provided in Embodiment 4 of the present application;
FIG. 8 is a structural block diagram of an apparatus for style transfer provided in Embodiment 5 of the present application;
FIG. 9 is a structural block diagram of an apparatus for compressing a style transfer network provided in Embodiment 6 of the present application;
FIG. 10 is a schematic structural diagram of a computer device provided in Embodiment 7 of the present application.
The present application is described below with reference to the drawings and embodiments.
Embodiment 1
FIG. 1 is a flowchart of a method for style transfer provided in Embodiment 1 of the present application. This embodiment is applicable to the following scenario: a user uploads a picture or a video, selects a style image from preset artistic style images as the target of the style transfer or uploads a style image as the transfer target, and the style-transferred picture or video is finally returned. The method can be applied in products such as video or picture editing applications, live streaming applications and short-video applications, can be executed by a style transfer apparatus, and may include the following steps:
Step 110: After obtaining a content feature map corresponding to a content image and a style feature map corresponding to a style image, sort the content feature map and the style feature map channel by channel respectively.
In one embodiment, the content feature map may be image features generated by passing the content image through an encoding network, which extracts feature information of the content image such as color, statistics, texture and structure; the style feature map may be the feature image output by passing the style image through an encoding network, which extracts feature information of the style image such as color, statistics, texture and structure.
In step 110, the content feature map and the style feature map may each be sorted channel by channel. In one implementation, step 110 may include the following steps:
Step 110-1: Perform channel-by-channel vectorization on the content feature map and the style feature map respectively, to obtain a vectorized content feature vector and a vectorized style feature vector.
In one implementation, the vectorization may proceed as follows: the feature map is vectorized channel by channel, in left-to-right, top-to-bottom order, into a data set containing N pixel samples. For example, suppose the content feature map x has 2 channels, each channel being a 2x2 matrix, i.e. x ∈ R^{2×2×2}, and the style feature map y is likewise a 2-channel feature map with a 2x2 matrix per channel, i.e. y ∈ R^{2×2×2}. Vectorizing both channel by channel in left-to-right, top-to-bottom order then yields the content feature vector and the style feature vector.
Step 110-2: Sort the vectorized content feature vector and the vectorized style feature vector channel by channel respectively.
Step 120: Obtain order statistics of the sorted content feature map, and rearrange the sorted style feature map according to the order statistics.
In this step, after the content feature map and the style feature map are sorted, the order statistics of the sorted content feature map can be obtained, and the sorted style feature map can be rearranged based on those order statistics, so that the rearranged style feature map carries the order statistics of the content feature map; this keeps the style statistics of the style feature map unchanged while introducing the order statistics of the content feature map, which facilitates the transfer of an arbitrary style image.
In one implementation, step 120 may include the following sub-steps:
Step 120-1: Compare the content feature vector in the content feature map before sorting with the content feature vector in the sorted content feature map, and determine the rank indices of the content feature vector in the sorted content feature map, as the order statistics.
In one implementation, the rank index is the position that each feature element of the content feature vector before sorting occupies in the sorted content feature vector. For example, continuing the example above, the content feature vector is sorted channel by channel. The first feature element "4" of the first channel occupies the 4th position after sorting, so its rank index is 4; the second feature element "3" occupies the 3rd position after sorting, so its rank index is 3; and so on, giving the rank indices of the feature elements as d_x = [[4,3,1,2],[3,2,4,1]].
Step 120-2: Rearrange the style feature vectors in the sorted style feature map according to the order statistics.
For example, continuing the example above, the rank indices d_x = [[4,3,1,2],[3,2,4,1]] can be used to rearrange the sorted style feature vector: the feature element "23" ranked 4th in the first channel of the sorted style feature vector is placed at position 1 of that channel, the element "19" ranked 3rd is placed at position 2, the element "11" ranked 1st is placed at position 3, and the element "14" ranked 2nd is placed at position 4, and so on, yielding the rearranged style feature vector. The rearranged style feature vector then carries the order statistics of the content feature vector; in other words, the relative ordering of its elements is the same as that of the content feature vector.
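A quick numeric check of this worked example can be sketched as follows (PyTorch is assumed purely for illustration):

```python
import torch

x1 = torch.tensor([4., 3., 1., 2.])             # first channel of the content feature vector
d = torch.argsort(torch.argsort(x1)) + 1        # 1-based rank indices
print(d.tolist())                               # [4, 3, 1, 2]

y1_sorted = torch.tensor([11., 14., 19., 23.])  # sorted style values (ranks 1..4)
z1 = y1_sorted[d - 1]                           # the rank-k style value goes to the slot of rank k
print(z1.tolist())                              # [23.0, 19.0, 11.0, 14.0]
```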
Step 120-3: Restore the spatial dimensions of the rearranged style feature vector to obtain the rearranged style feature map.
Step 130: Generate a style transfer image according to the content feature map before sorting and the rearranged style feature map.
In this step, the rearranged style feature map retains the second-order statistics of the style features while carrying the order statistics of the content feature map; reconstructing the content feature map together with the rearranged style feature map to generate the style transfer image therefore better presents the blending of the style image and the content image.
In one implementation, step 130 may include the following step:
inputting the content feature map before sorting and the rearranged style feature map into a pre-trained decoding network, so that the decoding network decodes and reconstructs them to generate the style transfer image.
In this embodiment, the decoding network decodes and reconstructs the content feature map before sorting and the rearranged style feature map, and outputs the style transfer image. The network structure of the decoding network is symmetric to that of the encoding network.
In one embodiment, content feature information of different levels of the content feature map can be introduced into the decoding network through skip connections, which helps the decoding network better recover the details of the content image. A skip connection here means direct introduction: the content feature information of different levels of the content feature map can be introduced into the decoding network directly, without passing through any sub-network.
In this embodiment, when performing style transfer, the content feature map and the style feature map are sorted channel by channel, the order statistics of the sorted content feature map are obtained, and the sorted style feature map is rearranged according to those order statistics. In this way, for a given content feature map, once its order statistics are determined, the style feature map of any style can be rearranged based on them, so that the rearranged style feature map retains its own style statistics while carrying the order statistics of the content feature map. When the style of the current content image needs to be converted, it is only necessary to rearrange and decode the style feature map according to the order statistics of the content image, thereby realizing arbitrary style transfer of the content image corresponding to the given content feature map, without retraining the model for new styles, which gives strong scalability.
Embodiment 2
FIG. 2 is a structural block diagram of a style transfer system provided in Embodiment 2 of the present application. The style transfer system may include a style transfer network, and the style transfer network includes an encoding network 210, a style mapping unit 220 and a decoding network 230, wherein:
the encoding network 210 is configured to encode the input content image and style image, generate a content feature map corresponding to the content image and a style feature map corresponding to the style image, output the content feature map and the style feature map to the style mapping unit, and output the content feature map to the decoding network.
In this embodiment, the encoding network 210, also called the encoder, has two inputs: a content image and a style image. The encoding network 210 encodes the content image and outputs the content feature map, and encodes the style image and outputs the style feature map. The encoding mentioned in this embodiment may include, but is not limited to, extracting feature information of different levels of the content image or the style image.
As shown in FIG. 2, the encoding network 210 is connected to both the style mapping unit 220 and the decoding network 230; it outputs the content feature map and the style feature map to the style mapping unit 220, and outputs the content feature map to the decoding network 230.
In one embodiment, the encoding network 210 may be a pre-trained VGG network, such as VGG-19, which may be pre-trained on the image dataset ImageNet; the weights of the encoder do not participate in training during the training process.
As an example, the network structure of the encoding network 210 may be as shown in FIG. 3. The encoding network 210 may include 9 convolutional layers (conv) and 3 max pooling layers (maxpool), each convolutional layer being followed by a Rectified Linear Unit (ReLU) nonlinear activation layer. Suppose the content image or style image is an image in blue-green-red (BGR) format (it may also be an image in another format, such as RGB, which is not limited in this embodiment). After the content image or style image is input into the encoding network 210, it first passes through two convolutional layers (each with 3x3 kernels) with 64 output channels and then a max pooling layer; then through two convolutional layers (3x3 kernels) with 128 output channels and a max pooling layer for downsampling; then through four convolutional layers (3x3 kernels) with 256 output channels and a max pooling layer; and finally through one convolutional layer (3x3 kernel) with 512 output channels, which outputs the corresponding feature map.
The style mapping unit 220 is configured to sort the content feature map and the style feature map channel by channel respectively, obtain order statistics of the sorted content feature map, rearrange the sorted style feature map according to the order statistics, and output the rearranged style feature map to the decoding network.
After receiving the content feature map and the style feature map, the style mapping unit 220 sorts them channel by channel respectively, obtains the order statistics (i.e. the rank indices) of the sorted content feature map, and then rearranges the sorted style feature map according to those order statistics, so that the rearranged style feature map retains the second-order statistics of the style features while carrying the order statistics of the content feature map.
For the sorting and rearrangement of the content feature map and the style feature map by the style mapping unit 220, reference may be made to the description of the embodiment of FIG. 1, which is not repeated here.
After the style mapping unit 220 obtains the rearranged style feature map, it can output the rearranged style feature map to the decoding network 230.
In this embodiment, the style mapping unit 220 is an independent unit with no trainable weights, i.e. it is parameter-free; therefore, adding the style mapping unit 220 to the style transfer network does not increase the size of the network, and its computational cost is relatively low.
The decoding network 230 is configured to generate a style transfer image according to the content feature map and the rearranged style feature map.
In this embodiment, after the decoding network 230 receives the content feature map from the encoding network 210 and the rearranged style feature map from the style mapping unit 220, it can decode and reconstruct them and output the style transfer image.
The network structure of the decoding network 230 is symmetric to that of the encoding network 210. In one embodiment, in the decoding network 230, the input of the input layer is the rearranged style feature map, and content feature information of different levels of the content feature map can be introduced into the decoding network 230 through skip connections, which helps the decoding network 230 better recover the details of the content image.
In one embodiment, the encoding network 210 is further configured to output the style feature map to the decoding network 230, and an Adaptive Instance Normalization (AdaIN) sub-network may be added to the decoding network 230. The AdaIN sub-network has two inputs: the output of the layer preceding the AdaIN sub-network in the decoding network 230, and the style feature map at the scale corresponding to that output. The AdaIN sub-network then normalizes this output channel by channel according to the mean and variance of the style feature map at the corresponding scale, i.e.: AdaIN(x, y) = σ(y)·(x − μ(x))/σ(x) + μ(y),
where μ(x) is the channel-wise mean of x, μ(y) is the channel-wise mean of the style feature map, σ(x) is the channel-wise standard deviation of x, σ(y) is the channel-wise variance of the style feature map, and x is the output of the layer preceding the AdaIN sub-network in the decoding network 230.
The AdaIN sub-network strengthens the statistical distribution of the style features during image reconstruction by changing the channel-wise mean and variance of the feature maps.
As an example, the network structure of the decoding network 230 may be as shown in FIG. 4. The input to the input layer (Input) of the decoding network 230 is the rearranged style feature map (output of style projection) output by the style mapping unit 220. It first passes through a convolutional layer with one 3x3 kernel and 256 output channels, is then upsampled by bilinear interpolation and cascaded with the content feature map of the corresponding scale obtained by the encoding network 210; it then passes through three convolutional layers (3x3 kernels) with 256 output channels and one convolutional layer (3x3 kernel) with 128 output channels, followed by bilinear upsampling; next it passes through the AdaIN sub-network and is cascaded with the content feature map of the corresponding scale obtained by the encoding network 210; it then passes through one convolutional layer (3x3 kernel) with 128 output channels and one convolutional layer (3x3 kernel) with 64 output channels, is upsampled by bilinear interpolation and cascaded with the content feature map of the corresponding scale obtained by the encoding network 210; finally it passes through one convolutional layer (3x3 kernel) with 64 output channels and one convolutional layer (3x3 kernel) with 3 output channels, mapping the output to RGB space. Except for the last convolutional layer, every convolutional layer in the decoding network 230 is followed by a ReLU nonlinear activation layer.
In one implementation, the loss function designed for the above style transfer network may include a content loss function, a style loss function and a reconstruction loss function.
The content loss function can be measured by the mean square error between the features output by the encoding network 210 for the style transfer image and for the content image, i.e.:
L_c = ‖E(c) − E(p)‖₂
where c denotes the content image, p denotes the style transfer image, E(c) denotes the content feature map obtained by passing the content image through the encoding network 210, and E(p) denotes the style transfer feature map obtained by passing the style transfer image through the encoding network 210.
The style loss function can be measured by the distances between the channel-wise means and between the channel-wise variances, at several corresponding layers of the encoding network 210, of the features output by the encoding network 210 for the style transfer image and for the style image, i.e.:
L_s = Σ_{i=1..N} (‖μ(E_i(p)) − μ(E_i(s))‖₂ + ‖σ(E_i(p)) − σ(E_i(s))‖₂)
where E_i denotes the i-th layer feature map obtained through the encoding network 210, s denotes the style image, and N denotes the number of corresponding layers used to characterize the style loss function.
The reconstruction loss function measures how close the style transfer image reconstructed by the model is to the content image when the input style image and content image are identical (for example, the style image and the content image are the same image), i.e.:
L_r = ‖E(c) − E(p′)‖₂
where E(c) denotes the feature map obtained by passing the content image through the encoding network 210, and E(p′) denotes the feature map obtained by passing the style transfer image through the encoding network 210.
The final loss function is the weighted sum of the above three loss functions, i.e.:
L = w_c·L_c + w_s·L_s + w_r·L_r
When training the style transfer network, the optimization goal is to minimize the loss function L. The weights of the content loss, style loss and reconstruction loss can be set as needed, which is not limited in this embodiment; for example, they may be taken as 1, 2 and 0.1 respectively. The style transfer network can be optimized by gradient descent, for example with an initial step size of 0.1, reducing the step size by a factor of 0.1 every 500 epochs.
In this embodiment of the present application, a style mapping unit without trainable parameters is introduced into the style transfer network, located between the encoding network and the decoding network and configured to sort the content feature map and the style feature map output by the encoding network channel by channel, obtain the order statistics of the sorted content feature map, rearrange the sorted style feature map according to those order statistics, and input the rearranged style feature map into the decoding network for decoding. In this way, for a given content image, once the order statistics of its content feature map are determined, the style feature map of a style image of any style can be rearranged based on those order statistics so that the rearranged style feature map carries the order statistics of the content feature map, thereby realizing arbitrary style transfer of the given content image without retraining the model for new styles, which gives strong scalability and greater freedom to users.
Embodiment 3
FIG. 5 is a structural block diagram of a style transfer system provided in Embodiment 3 of the present application. This embodiment is implemented on the basis of the embodiment of FIG. 2. In this embodiment, the style transfer system may further include a model compression module 240, configured to perform model compression on the style transfer network of the embodiment of FIG. 2.
In one implementation, the style transfer network can be compressed based on the idea of model distillation. The model compression module 240 may then include the following sub-modules:
a student network creation sub-module configured to use the style transfer network as a teacher network, and establish a student network with a structure similar to the teacher network;
an association sub-module configured to connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size; and
a student network training sub-module configured to fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
In one embodiment, as shown in the model compression diagram of FIG. 6, the style transfer network is used as the teacher model, i.e. the teacher network includes the encoding network, the style mapping unit and the decoding network; a student model with a structure similar to the teacher network is then designed, where a similar structure means a structure with the same number of network layers but fewer convolution kernels and a smaller number of parameters.
The output of each of the different layers of the teacher network is connected to the corresponding layer of the student network through a convolutional layer of a preset size whose parameters are trainable. The convolution may be a small one; for example, the preset-size convolutional layer may be a convolutional layer with a 1x1 kernel. The weights of the teacher network are then fixed, and all parameters of the student network participate in training.
In one embodiment, the loss function of the student network may include a style loss function, a content loss function, a reconstruction loss function and a distillation loss function.
The distillation loss function measures the similarity between the features output by each of different layers of the student network and the features output by the corresponding layer of the teacher network after being mapped by the above convolution; that is, the features output by different layers of the student network should be as close as possible to the outputs of the corresponding teacher layers after the preset-size convolution mapping, which can be expressed by the following formula:
L_d = Σ_{i=1..N} (‖E_i(x) − f_i(E′_i(x))‖₂ + ‖D_i(x) − g_i(D′_i(x))‖₂)
where f_i and g_i are the 1x1 convolutions that map the output of each layer of the encoding network and the decoding network of the teacher network, respectively; E_i(x) denotes the output of the i-th layer of the student network's encoding network, D_i(x) the output of the i-th layer of the student network's decoding network, E′_i(x) the output of the i-th layer of the teacher network's encoding network, D′_i(x) the output of the i-th layer of the teacher network's decoding network, and N the number of corresponding layers used to characterize the distillation loss function.
The reconstruction loss function measures the similarity between the style transfer image output by the student network and the style transfer image output by the teacher network; that is, the two should be as close as possible, which can be expressed by the following formula:
L_r = ‖D(x) − D′(x)‖₂
The content loss function is measured by the mean square error between the features output by the student network's encoding network for the style transfer image output by the student network and for the content image.
The style loss function is measured by the distances between the channel-wise means and between the channel-wise variances, at several corresponding layers of the student network's encoding network, of the features output by that encoding network for the style transfer image output by the student network and for the style image. Since the style loss function and the content loss function are similar to the loss functions used to train the style transfer network in Embodiment 2, reference may be made to the description of Embodiment 2, which is not repeated here.
The final loss function of the student network can be:
L = w_c·L_c + w_s·L_s + w_r·L_r + w_d·L_d
In this embodiment, when performing model compression on the style transfer network, the style transfer network is used as the teacher network, a student network with a structure similar to the teacher network is established, and the output of each of different layers of the teacher network is connected to the corresponding layer of the student network through a preset-size convolution, so that the pre-trained teacher network supervises the student network. Using the supervision information of the teacher network, a small model (i.e. the student network) can be trained directly from scratch, without pre-training the encoding network and decoding network in separate steps. This compresses the volume of the style transfer network to a large extent while preserving the transfer quality and scalability of the network, reduces the computational overhead, and yields a lightweight model that is easy to deploy on mobile devices.
Embodiment 4
FIG. 7 is a flowchart of a method for compressing a style transfer network provided in Embodiment 4 of the present application. The method can be applied to compress the style transfer network described in FIG. 2, and may include the following steps:
Step 710: Use the style transfer network as a teacher network, and establish a student network with a structure similar to the teacher network.
Step 720: Connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size.
Step 730: Fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
This embodiment is the method embodiment corresponding to Embodiment 3; for its details, reference may be made to the description of Embodiment 3, which is not repeated here.
Embodiment 5
FIG. 8 is a structural block diagram of an apparatus for style transfer provided in Embodiment 5 of the present application, which may include the following modules:
a feature sorting module 810 configured to, after obtaining a content feature map corresponding to a content image and a style feature map corresponding to a style image, sort the content feature map and the style feature map channel by channel respectively; a rearrangement module 820 configured to obtain order statistics of the sorted content feature map, and rearrange the sorted style feature map according to the order statistics; and a style transfer image generation module 830 configured to generate a style transfer image according to the content feature map before sorting and the rearranged style feature map.
The apparatus for style transfer provided in this embodiment of the present application can execute the method for style transfer provided in any embodiment of the present application, and has the functional modules and effects corresponding to the executed method.
Embodiment 6
FIG. 9 is a structural block diagram of an apparatus for compressing a style transfer network provided in Embodiment 6 of the present application, which may include the following modules:
a student network creation module 910 configured to use the style transfer network as a teacher network, and establish a student network with a structure similar to the teacher network; a convolutional connection module 920 configured to connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size; and a student network training module 930 configured to fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
The apparatus for compressing a style transfer network provided in this embodiment of the present application can execute the method for compressing a style transfer network provided in any embodiment of the present application, and has the functional modules and effects corresponding to the executed method.
Embodiment 7
FIG. 10 is a schematic structural diagram of a computer device provided in Embodiment 7 of the present application. As shown in FIG. 10, the computer device includes a processor 100, a memory 101, an input device 102 and an output device 103. The number of processors 100 in the computer device may be one or more; one processor 100 is taken as an example in FIG. 10. The processor 100, memory 101, input device 102 and output device 103 in the computer device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 10.
As a computer-readable storage medium, the memory 101 may be configured to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the above embodiments of the present application. The processor 100 executes the various functional applications and data processing of the computer device, i.e. implements the methods mentioned in the above method embodiments, by running the software programs, instructions and modules stored in the memory 101.
Embodiment 8
Embodiment 8 of the present application further provides a storage medium containing computer-executable instructions, where the computer-executable instructions, when executed by a computer processor, are used to execute the methods in the above method embodiments.
For the storage medium containing computer-executable instructions provided in this embodiment of the present application, the computer-executable instructions are not limited to the above method operations, and can also perform related operations in the methods provided by any embodiment of the present application.
Claims (15)
- A method for style transfer, comprising: after obtaining a content feature map corresponding to a content image and a style feature map corresponding to a style image, sorting the content feature map and the style feature map channel by channel respectively; obtaining order statistics of the sorted content feature map, and rearranging the sorted style feature map according to the order statistics; and generating a style transfer image according to the content feature map before sorting and the rearranged style feature map.
- The method for style transfer according to claim 1, wherein sorting the content feature map and the style feature map channel by channel respectively comprises: performing channel-by-channel vectorization on the content feature map and the style feature map respectively, to obtain a vectorized content feature vector and a vectorized style feature vector; and sorting the vectorized content feature vector and the vectorized style feature vector channel by channel respectively.
- The method for style transfer according to claim 2, wherein obtaining the order statistics of the sorted content feature map and rearranging the sorted style feature map according to the order statistics comprises: comparing the content feature vector in the content feature map before sorting with the content feature vector in the sorted content feature map, and determining rank indices of the content feature vector in the sorted content feature map as the order statistics, wherein a rank index is the position that a feature element of the content feature vector in the content feature map before sorting occupies in the content feature vector of the sorted content feature map; rearranging the style feature vectors in the sorted style feature map according to the order statistics; and restoring the spatial dimensions of the rearranged style feature vector to obtain the rearranged style feature map.
- The method for style transfer according to any one of claims 1-3, wherein generating the style transfer image according to the content feature map before sorting and the rearranged style feature map comprises: inputting the content feature map before sorting and the rearranged style feature map into a pre-trained decoding network, so that the decoding network decodes and reconstructs the content feature map before sorting and the rearranged style feature map to generate the style transfer image, wherein content feature information of different levels of the content feature map before sorting is introduced into the decoding network through skip connections.
- A style transfer system, comprising a style transfer network, the style transfer network comprising an encoding network, a style mapping unit and a decoding network, wherein: the encoding network is configured to encode an input content image and style image, generate a content feature map corresponding to the content image and a style feature map corresponding to the style image, output the content feature map and the style feature map to the style mapping unit, and output the content feature map to the decoding network; the style mapping unit is configured to sort the content feature map and the style feature map channel by channel respectively, obtain order statistics of the sorted content feature map, rearrange the sorted style feature map according to the order statistics, and output the rearranged style feature map to the decoding network; and the decoding network is configured to generate a style transfer image according to the content feature map and the rearranged style feature map.
- The style transfer system according to claim 5, wherein the input of the input layer of the decoding network is the rearranged style feature map, and content feature information of different levels of the content feature map before sorting is introduced into the decoding network through skip connections.
- The style transfer system according to claim 5 or 6, wherein the encoding network is further configured to output the style feature map to the decoding network; the decoding network comprises a feedforward network and an Adaptive Instance Normalization (AdaIN) sub-network; the inputs of the AdaIN sub-network comprise the output of the layer preceding the AdaIN sub-network in the decoding network and the style feature map at the scale corresponding to that output; and the AdaIN sub-network is configured to normalize the output channel by channel according to the mean and variance of the style feature map at the corresponding scale.
- The style transfer system according to claim 5, further comprising: a model compression module configured to perform model compression on the style transfer network.
- The style transfer system according to claim 8, wherein the model compression module comprises: a student network creation sub-module configured to use the style transfer network as a teacher network, and establish a student network with a structure similar to the teacher network; an association sub-module configured to connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size; and a student network training sub-module configured to fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
- The style transfer system according to claim 9, wherein the loss function comprises a style loss function, a content loss function, a reconstruction loss function and a distillation loss function; the distillation loss function measures the similarity between the features output by each of different layers of the student network and the features output by the corresponding layer of the teacher network after being mapped by the convolution corresponding to that layer; the reconstruction loss function measures the similarity between the style transfer image output by the student network and the style transfer image output by the teacher network; the content loss function is measured by the mean square error between the features output by the student network's encoding network for the style transfer image output by the student network and for the content image; and the style loss function is measured by the distances between the channel-wise means and between the channel-wise variances, at multiple layers of the student network's encoding network, of the features output by that encoding network for the style transfer image output by the student network and for the style image.
- A method for compressing the style transfer network of claim 5, comprising: using the style transfer network as a teacher network, and establishing a student network with a structure similar to the teacher network; connecting the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size; and fixing the weights of the teacher network, training the student network according to a preset loss function, and using the trained network as the compressed style transfer network.
- An apparatus for style transfer, comprising: a feature sorting module configured to, after obtaining a content feature map corresponding to a content image and a style feature map corresponding to a style image, sort the content feature map and the style feature map channel by channel respectively; a rearrangement module configured to obtain order statistics of the sorted content feature map, and rearrange the sorted style feature map according to the order statistics; and a style transfer image generation module configured to generate a style transfer image according to the content feature map before sorting and the rearranged style feature map.
- An apparatus for compressing the style transfer network of claim 5, comprising: a student network creation module configured to use the style transfer network as a teacher network, and establish a student network with a structure similar to the teacher network; a convolutional connection module configured to connect the output of each of different layers of the teacher network to the corresponding layer of the student network through a convolution of a preset size; and a student network training module configured to fix the weights of the teacher network, train the student network according to a preset loss function, and use the trained network as the compressed style transfer network.
- A computer device, comprising: at least one processor; and a storage means configured to store at least one program, wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement at least one of the method for style transfer according to any one of claims 1-4 and the method for compressing a style transfer network according to claim 11.
- A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements at least one of the method for style transfer according to any one of claims 1-4 and the method for compressing a style transfer network according to claim 11.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010733581.5 | 2020-07-27 | ||
CN202010733581.5A CN111932445B (zh) | 2020-07-27 | 2020-07-27 | 对风格迁移网络的压缩方法及风格迁移方法、装置和系统 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022022001A1 true WO2022022001A1 (zh) | 2022-02-03 |
Family
ID=73315354
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/093265 WO2022022001A1 (zh) | 2020-07-27 | 2021-05-12 | 对风格迁移网络进行压缩的方法及风格迁移的方法、装置和系统 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111932445B (zh) |
WO (1) | WO2022022001A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115618452A (zh) * | 2022-12-08 | 2023-01-17 | 湖南大学 | 具有设计师风格的服装图像智能生成系统 |
CN116071275A (zh) * | 2023-03-29 | 2023-05-05 | 天津大学 | 基于在线知识蒸馏和预训练先验的人脸图像修复方法 |
CN116543287A (zh) * | 2023-04-20 | 2023-08-04 | 南京市秦淮区水务设施养护所 | 基于联邦风格迁移的水务河道漂浮物检测方法 |
CN117521742A (zh) * | 2023-10-12 | 2024-02-06 | 汕头大学 | 基于深度神经网络模型的轻量化部署图像处理方法 |
WO2024138720A1 (zh) * | 2022-12-30 | 2024-07-04 | 深圳Tcl新技术有限公司 | 一种图像生成方法、装置、计算机设备和存储介质 |
CN118537433A (zh) * | 2024-07-24 | 2024-08-23 | 江西啄木蜂科技有限公司 | 基于多模态大模型的自然保护地和林业遥感图像生成方法 |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111932445B (zh) * | 2020-07-27 | 2024-07-16 | 广州市百果园信息技术有限公司 | 对风格迁移网络的压缩方法及风格迁移方法、装置和系统 |
CN112669308B (zh) * | 2021-01-06 | 2024-05-24 | 携程旅游信息技术(上海)有限公司 | 基于风格迁移的图像生成方法、系统、设备及存储介质 |
CN112884636B (zh) * | 2021-01-28 | 2023-09-26 | 南京大学 | 一种自动生成风格化视频的风格迁移方法 |
CN113012038B (zh) * | 2021-03-19 | 2023-11-28 | 深圳市兴海物联科技有限公司 | 一种图像风格迁移处理方法、移动终端和云端服务器 |
CN113327194A (zh) * | 2021-06-30 | 2021-08-31 | 北京百度网讯科技有限公司 | 图像风格迁移方法、装置、设备和存储介质 |
CN114529796A (zh) * | 2022-01-30 | 2022-05-24 | 北京百度网讯科技有限公司 | 模型训练方法、图像识别方法、装置及电子设备 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109949214A (zh) * | 2019-03-26 | 2019-06-28 | 湖北工业大学 | 一种图像风格迁移方法及系统 |
CN110874631A (zh) * | 2020-01-20 | 2020-03-10 | 浙江大学 | 一种基于特征图稀疏化的卷积神经网络剪枝方法 |
US20200126205A1 (en) * | 2018-10-18 | 2020-04-23 | Boe Technology Group Co., Ltd. | Image processing method, image processing apparatus, computing device and computer-readable storage medium |
CN111325664A (zh) * | 2020-02-27 | 2020-06-23 | Oppo广东移动通信有限公司 | 风格迁移方法、装置、存储介质及电子设备 |
CN111415299A (zh) * | 2020-03-26 | 2020-07-14 | 浙江科技学院 | 一种高分辨率图像风格迁移方法 |
CN111932445A (zh) * | 2020-07-27 | 2020-11-13 | 广州市百果园信息技术有限公司 | 对风格迁移网络的压缩方法及风格迁移方法、装置和系统 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10198839B2 (en) * | 2016-09-22 | 2019-02-05 | Apple Inc. | Style transfer-based image content correction |
CN109308679B (zh) * | 2018-08-13 | 2022-08-30 | 深圳市商汤科技有限公司 | 一种图像风格转换方法及装置、设备、存储介质 |
KR102708715B1 (ko) * | 2018-11-16 | 2024-09-24 | 삼성전자주식회사 | 영상 처리 장치 및 그 동작방법 |
US10839493B2 (en) * | 2019-01-11 | 2020-11-17 | Adobe Inc. | Transferring image style to content of a digital image |
CN110232152B (zh) * | 2019-05-27 | 2021-03-23 | 腾讯科技(深圳)有限公司 | 内容推荐方法、装置、服务器以及存储介质 |
CN110210468B (zh) * | 2019-05-29 | 2022-12-16 | 电子科技大学 | 一种基于卷积神经网络特征融合迁移的文字识别方法 |
- 2020-07-27: CN application CN202010733581.5A filed (patent CN111932445B, status: Active)
- 2021-05-12: PCT application PCT/CN2021/093265 filed (publication WO2022022001A1)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200126205A1 (en) * | 2018-10-18 | 2020-04-23 | Boe Technology Group Co., Ltd. | Image processing method, image processing apparatus, computing device and computer-readable storage medium |
CN109949214A (zh) * | 2019-03-26 | 2019-06-28 | 湖北工业大学 | 一种图像风格迁移方法及系统 |
CN110874631A (zh) * | 2020-01-20 | 2020-03-10 | 浙江大学 | 一种基于特征图稀疏化的卷积神经网络剪枝方法 |
CN111325664A (zh) * | 2020-02-27 | 2020-06-23 | Oppo广东移动通信有限公司 | 风格迁移方法、装置、存储介质及电子设备 |
CN111415299A (zh) * | 2020-03-26 | 2020-07-14 | 浙江科技学院 | 一种高分辨率图像风格迁移方法 |
CN111932445A (zh) * | 2020-07-27 | 2020-11-13 | 广州市百果园信息技术有限公司 | 对风格迁移网络的压缩方法及风格迁移方法、装置和系统 |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115618452A (zh) * | 2022-12-08 | 2023-01-17 | 湖南大学 | 具有设计师风格的服装图像智能生成系统 |
WO2024138720A1 (zh) * | 2022-12-30 | 2024-07-04 | 深圳Tcl新技术有限公司 | 一种图像生成方法、装置、计算机设备和存储介质 |
CN116071275A (zh) * | 2023-03-29 | 2023-05-05 | 天津大学 | 基于在线知识蒸馏和预训练先验的人脸图像修复方法 |
CN116543287A (zh) * | 2023-04-20 | 2023-08-04 | 南京市秦淮区水务设施养护所 | 基于联邦风格迁移的水务河道漂浮物检测方法 |
CN116543287B (zh) * | 2023-04-20 | 2023-11-14 | 南京市秦淮区水务设施养护所 | 基于联邦风格迁移的水务河道漂浮物检测方法 |
CN117521742A (zh) * | 2023-10-12 | 2024-02-06 | 汕头大学 | 基于深度神经网络模型的轻量化部署图像处理方法 |
CN117521742B (zh) * | 2023-10-12 | 2024-04-30 | 汕头大学 | 基于深度神经网络模型的轻量化部署图像处理方法 |
CN118537433A (zh) * | 2024-07-24 | 2024-08-23 | 江西啄木蜂科技有限公司 | 基于多模态大模型的自然保护地和林业遥感图像生成方法 |
Also Published As
Publication number | Publication date |
---|---|
CN111932445B (zh) | 2024-07-16 |
CN111932445A (zh) | 2020-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022022001A1 (zh) | 对风格迁移网络进行压缩的方法及风格迁移的方法、装置和系统 | |
Nash et al. | Generating images with sparse representations | |
US11537873B2 (en) | Processing method and system for convolutional neural network, and storage medium | |
Xu et al. | Data-distortion guided self-distillation for deep neural networks | |
EP3678059B1 (en) | Image processing method, image processing apparatus, and a neural network training method | |
CN111368662B (zh) | 一种人脸图像属性编辑方法、装置、存储介质及设备 | |
Zhao et al. | Invertible image decolorization | |
US10909728B1 (en) | Learned lossy image compression codec | |
US11328184B2 (en) | Image classification and conversion method and device, image processor and training method therefor, and medium | |
US11216913B2 (en) | Convolutional neural network processor, image processing method and electronic device | |
KR20210074360A (ko) | 이미지 처리 방법, 디바이스 및 장치, 그리고 저장 매체 | |
CN110689599A (zh) | 基于非局部增强的生成对抗网络的3d视觉显著性预测方法 | |
WO2023151529A1 (zh) | 人脸图像的处理方法及相关设备 | |
JP2004532577A (ja) | 非直交基本関数を用いて色画像を効率的に符号化する方法および装置 | |
CN113205449A (zh) | 表情迁移模型的训练方法及装置、表情迁移方法及装置 | |
CN108734653A (zh) | 图像风格转换方法及装置 | |
CN114973049A (zh) | 一种统一卷积与自注意力的轻量视频分类方法 | |
CN111881920B (zh) | 一种大分辨率图像的网络适配方法及神经网络训练装置 | |
CN114586056A (zh) | 图像处理方法及装置、设备、视频处理方法及存储介质 | |
CN116168197A (zh) | 一种基于Transformer分割网络和正则化训练的图像分割方法 | |
CN117576118B (zh) | 多尺度多感知的实时图像分割方法、系统、终端及介质 | |
CN111627077A (zh) | 一种医疗图像的处理方法及其压缩、还原系统 | |
CN110111252A (zh) | 基于投影矩阵的单幅图像超分辨率方法 | |
CN115937429A (zh) | 一种基于单张图像的细粒度3d人脸重建方法 | |
CN115474048A (zh) | 基于分裂四元数模型的快速彩色图像压缩方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21851141; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 21851141; Country of ref document: EP; Kind code of ref document: A1