WO2022077417A1 - Image processing method, image processing device and readable storage medium - Google Patents
Image processing method, image processing device and readable storage medium Download PDF Info
- Publication number
- WO2022077417A1 (PCT/CN2020/121405)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- image
- neural network
- trained
- layer
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present disclosure relates to the technical field of image processing, and in particular, to an image processing method, an image processing device and a readable storage medium.
- Before video is transmitted, it needs to be compressed and encoded due to bandwidth limitations.
- The compressed video contains a variety of compression noise, which degrades the viewing experience of the video on the display terminal.
- aspects of the present disclosure provide an image processing method, an image processing apparatus, and a readable storage medium.
- Embodiments of the present disclosure provide an image processing method, including:
- the trained first neural network is obtained by training the first neural network to be trained by a first training method, and the first training method includes:
- the second neural network to be trained and the discriminant network to be trained are alternately trained to obtain a trained second neural network and a trained discriminant network; wherein the trained second neural network has more parameters than the first neural network to be trained;
- the trained second neural network is configured to transform a received image having a first definition into an image having a second definition, the second definition being greater than the first definition;
- the first neural network to be trained includes: a plurality of first feature extraction sub-networks and a first output sub-network located after the plurality of first feature extraction sub-networks; the trained second neural network includes: a plurality of second feature extraction sub-networks and a second output sub-network located after the plurality of second feature extraction sub-networks; the first feature extraction sub-networks correspond one-to-one to the second feature extraction sub-networks;
- the first sample image is provided to the trained second neural network and to the first neural network to be trained, so that the first neural network to be trained outputs a first output image and the trained second neural network outputs a second output image;
- the number of channels of the output image of the first feature extraction sub-network is less than the number of channels of the output image of the corresponding second feature extraction sub-network;
- the first training method further includes: providing the output images of the plurality of second feature extraction sub-networks to a plurality of dimensionality reduction layers in a one-to-one correspondence, so that each dimensionality reduction layer generates an intermediate image; the number of channels of the intermediate image is the same as the number of channels of the output image of the first feature extraction sub-network;
- adjusting the parameters of the first neural network according to the total loss function includes: adjusting the parameters of both the first neural network and the dimensionality reduction layers; wherein the third loss is obtained by summing the differences between each intermediate image and the output image of the corresponding first feature extraction sub-network.
- the total loss further includes: a fourth loss, the fourth loss is derived based on the perceptual loss of the first output image and the second output image.
- the perceptual loss between the first output image and the second output image is calculated according to the following formula, in which:
- y1 is the first output image
- y2 is the second output image
- j is the layer number of the preset network layer in the discriminant network
- C is the channel number of the output image of the preset network layer
- H is the height of the output image of the preset network layer
- W is the width of the output image of the preset network layer.
- the first loss includes an L1 loss of the first output image and the second output image.
- the second loss includes a cross-entropy loss of the first discriminant result and the first target result.
- the third loss includes the sum of the L2 losses between the output image of each first feature extraction sub-network and the corresponding intermediate image.
- the first output image is provided to the trained discrimination network, so that the trained discrimination network generates a first discrimination result based on the first output image, comprising:
- the first output image is assigned a ground-truth label, and the first output image with the ground-truth label is provided to the trained discriminant network, so that the discriminant network outputs a first discrimination result.
- the training of the discriminant network to be trained includes:
- providing the first definition-enhanced image and the original sample image corresponding to the second sample image to the current discriminant network, and adjusting the parameters of the current discriminant network according to the loss function of the current discriminant network, so that the output of the parameter-adjusted discriminant network can characterize whether the input of the discriminant network is an output image of the second neural network or the original sample image.
- the training of the second neural network to be trained includes:
- the parameters of the current second neural network are adjusted to obtain an updated second neural network;
- the first term in the current loss function of the second neural network is based on the difference between the second definition-enhanced image and its corresponding original sample image, and the second term in the current loss function of the second neural network is based on the difference between the second discrimination result and the second target result.
- the first term in the current loss function of the second neural network is λ1·LossG1, where λ1 is a preset weight and LossG1 is the L1 loss between the second definition-enhanced image and its corresponding original sample image;
- the second term in the current loss function of the second neural network is λ2·L_D, where λ2 is a preset weight and L_D is the cross-entropy between the second discrimination result and the second target result;
- the third term in the current loss function of the second neural network is a perceptual loss term weighted by λ3, where λ3 is a preset weight;
- y is the original sample image corresponding to the second definition-enhanced image, and the perceptual loss is computed between y and the second definition-enhanced image at a preset network layer of a preset optimization network;
- j is the layer number of the preset network layer in the preset optimization network;
- C is the number of channels of the output image of the preset network layer;
- H is the height of the output image of the preset network layer.
- the first neural network to be trained includes: a plurality of first upsampling layers, a plurality of first downsampling layers, and a plurality of single-layer convolutional layers, each of the first upsampling layers and each of the first down-sampling layers is located between two of the single-layer convolutional layers;
- the input data of the i-th-from-last single-layer convolutional layer includes the superposition of the output data of the i-th-from-last first up-sampling layer and the output data of the i-th-from-first single-layer convolutional layer; wherein the number of the single-layer convolutional layers is an even number, and i is greater than 0 and less than half of the number of the single-layer convolutional layers;
- the trained second neural network includes: a plurality of second upsampling layers, a plurality of second downsampling layers and a plurality of residual blocks;
- the plurality of second upsampling layers are in one-to-one correspondence with the plurality of first upsampling layers;
- the plurality of second down-sampling layers are in one-to-one correspondence with the plurality of first down-sampling layers;
- the plurality of residual blocks are in one-to-one correspondence with the plurality of single-layer convolutional layers;
- the input data of the i-th-from-last residual block is the superposition of the output data of the i-th-from-last second upsampling layer and the output data of the i-th-from-first residual block;
- the first feature extraction sub-network includes: the first up-sampling layer, or the first down-sampling layer, or the single-layer convolutional layer; the first output sub-network includes the single-layer convolutional layer; the second feature extraction sub-network includes: the second up-sampling layer, or the second down-sampling layer, or the residual block; the second output sub-network includes the residual block.
- Embodiments of the present disclosure further provide an image processing device, including a memory and a processor, where a computer program is stored in the memory, wherein the computer program implements the above-mentioned image processing method when executed by the processor.
- Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above-mentioned image processing method when executed by a processor.
- FIG. 1 is a schematic diagram of a convolutional neural network.
- FIG. 2 is a schematic diagram of an image processing method provided in an embodiment of the present disclosure.
- FIG. 3 is a schematic diagram of a first training method provided in an embodiment of the present disclosure.
- FIG. 4 is a schematic diagram of a network architecture including a first neural network and a second neural network provided in an embodiment of the present disclosure.
- Figure 5 is an example diagram of a residual block.
- FIG. 6 is a schematic structural diagram of a trained discriminant network provided in an embodiment of the present disclosure.
- FIG. 7 is a flowchart of an optional implementation manner of step S21 provided in an embodiment of the present disclosure.
- FIG. 8 is an effect diagram before and after image processing using the image processing method according to an embodiment of the present disclosure.
- Before video is transmitted, it needs to be compressed and encoded due to bandwidth limitations.
- The compressed video contains a variety of compression noise, which degrades the viewing experience of the video on the display terminal.
- the video compression and repair technology based on deep learning can improve the repair effect of video compression noise.
- however, deep-learning algorithm models have a large number of parameters, which results in an excessive amount of computation on the display terminal.
- FIG. 1 is a schematic diagram of a convolutional neural network.
- the convolutional neural network can be used for image processing; it uses images as input and output, and replaces scalar weights with filters (i.e., convolution kernels).
- FIG. 1 only shows a convolutional neural network with a 3-layer structure, which is not limited by the embodiments of the present disclosure.
- the convolutional neural network includes an input layer 101 , a hidden layer 102 and an output layer 103 . In the input layer 101 there are 4 input images, in the middle hidden layer 102 there are 3 units to output 3 output images, and in the output layer 103 there are 2 units to output 2 output images.
- a convolutional layer has weights w_ij^k and biases b_i^k, where the weight w_ij^k represents a convolution kernel and the bias b_i^k is a scalar superimposed on the output of the convolutional layer; k is the label of the input layer 101, and i and j are the labels of the unit of the input layer 101 and the unit of the hidden layer 102, respectively.
- the first convolutional layer 201 includes a first set of convolution kernels (w_ij^1 in FIG. 1) and a first set of biases (b_i^1 in FIG. 1).
- the second convolutional layer 202 includes a second set of convolution kernels (w_ij^2 in FIG. 1) and a second set of biases (b_i^2 in FIG. 1).
- each convolutional layer includes dozens or hundreds of convolution kernels, and if the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
- the convolutional neural network further includes a first activation layer 203 and a second activation layer 204 .
- the first activation layer 203 is located after the first convolutional layer 201
- the second activation layer 204 is located after the second convolutional layer 202 .
- the activation layer includes an activation function, which is used to introduce nonlinear factors into the convolutional neural network, so that the convolutional neural network can better solve more complex problems.
- the activation function may include a rectified linear unit (ReLU) function, a sigmoid function, a hyperbolic tangent function (tanh function), and the like.
- the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can also be included in the convolutional layer.
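- As a concrete illustration of the structure just described, the following minimal PyTorch sketch shows one convolutional layer followed by an activation layer; the 3*3 kernel size and the 4-to-3 channel mapping are assumptions chosen to mirror the example of the input layer 101 and the hidden layer 102, not values prescribed by the present disclosure.

```python
import torch
import torch.nn as nn

# One convolutional layer (weights w_ij^k are the kernels, biases b_i^k the scalars)
# followed by an activation layer, as in FIG. 1. Sizes are illustrative assumptions.
conv = nn.Conv2d(in_channels=4, out_channels=3, kernel_size=3, padding=1, bias=True)
act = nn.ReLU()

x = torch.randn(1, 4, 64, 64)   # a batch with four input feature maps
y = act(conv(x))                # three output feature maps
print(y.shape)                  # torch.Size([1, 3, 64, 64])
```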
- the convolutional neural network in Figure 1 can be used to improve the clarity of the image.
- the trained convolutional neural network improves the clarity of the input low-definition image to obtain a high-definition image.
- the training process of the convolutional neural network is the optimization process of the parameters of the convolutional neural network.
- the loss of the convolutional neural network helps to optimize the parameters (weights) of the convolutional neural network, and the goal of the training process is to minimize the loss of the neural network by optimizing the parameters of the neural network.
- the loss of the neural network is used to measure the quality of the prediction of the network model, that is, to express the degree of the gap between the predicted results and the actual data.
- FIG. 2 is a schematic diagram of the image processing method provided in the embodiment of the present disclosure.
- the image processing method includes: S10, using a trained first neural network to process an input image to obtain a target output image; the definition of the target output image is higher than the definition of the input image.
- "Definition" (sharpness) refers to, for example, the clarity of each fine detail and its boundary in an image; the higher the definition, the better the perceptual effect for the human eye.
- that the definition of the target output image is higher than that of the input image means, for example, that the input image is processed by the image processing method provided by the embodiments of the present disclosure, such as denoising and/or deblurring, so that the resulting target output image is sharper than the input image.
- the trained first neural network is obtained by training the first neural network to be trained by the first training method.
- FIG. 3 is a schematic diagram of the first training method provided in the embodiment of the present disclosure. As shown in FIG. 3, the first training method includes:
- the trained second neural network has more parameters than the to-be-trained first neural network.
- FIG. 4 is a schematic diagram of a network architecture including the first neural network to be trained and the trained second neural network provided in an embodiment of the present disclosure.
- the first neural network 10 to be trained includes: a plurality of first feature extraction sub-networks ML1 and a first output sub-network OL1 located after the plurality of first feature extraction sub-networks ML1; the trained second neural network 20 includes: a plurality of second feature extraction sub-networks ML2 and a second output sub-network OL2 located after the plurality of second feature extraction sub-networks ML2; the first feature extraction sub-networks ML1 correspond one-to-one to the second feature extraction sub-networks ML2.
- the first neural network 10 to be trained includes: a plurality of first upsampling layers 13, a plurality of first downsampling layers 12, and a plurality of single-layer convolutional layers 11; each first upsampling layer 13 and each first downsampling layer 12 is located between two single-layer convolutional layers 11; the input data of the i-th-from-last single-layer convolutional layer 11 includes the superposition of the output data of the i-th-from-last first upsampling layer 13 and the output data of the i-th-from-first single-layer convolutional layer 11.
- the number of single-layer convolutional layers 11 is an even number, and i is greater than 0 and less than half of the number of single-layer convolutional layers.
- the second neural network 20 includes: a plurality of second upsampling layers 23, a plurality of second downsampling layers 22 and a plurality of residual blocks 21; the plurality of second upsampling layers 23 correspond one-to-one to the plurality of first upsampling layers 13, the plurality of second downsampling layers 22 correspond one-to-one to the plurality of first downsampling layers 12, and the plurality of residual blocks 21 correspond one-to-one to the plurality of single-layer convolutional layers 11; the input data of the i-th-from-last residual block 21 includes the superposition of the output data of the i-th-from-last second upsampling layer 23 and the output data of the i-th-from-first residual block 21.
- the single-layer convolution layer 11 , the first up-sampling layer 13 and the first down-sampling layer 12 all use 3*3 convolution kernels, and the number of convolution kernels is 128.
- the sampling magnification of the second upsampling layer 23 is the same as that of the first upsampling layer 13
- the sampling magnification of the second downsampling layer 22 is the same as that of the first downsampling layer 12 .
- both the first up-sampling layer 13 and the first down-sampling layer 12 are double-sampled.
- the first downsampling layer 12 and the second downsampling layer 22 may include an inverse Muxout layer, a Strided Convolution, a Maxpool Layer, or a standard per-channel downsampler (such as bicubic interpolation) .
- the first upsampling layer 13 and the second upsampling layer 23 may include Muxout layers, Strided Transposed Convolution, or standard per-channel upsamplers (eg, bicubic interpolation).
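- For concreteness, a minimal sketch of a first (to-be-trained) neural network of the kind described above is given below, assuming PyTorch, four single-layer convolutional layers, one 2x bicubic downsampling layer, one 2x bicubic upsampling layer, and 32 feature channels; none of these exact choices are mandated by the present disclosure. The skip connection implements the superposition of the output of the i-th-from-last first upsampling layer with the output of the i-th-from-first single-layer convolutional layer (here i = 1).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstNetworkSketch(nn.Module):
    """Illustrative lightweight (student) network: single-layer convolutions with
    bicubic down/up-sampling and a superposition (skip) connection."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.conv1 = nn.Conv2d(3, channels, 3, padding=1)         # single-layer conv 11 (1st from first)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)  # single-layer conv 11
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)  # single-layer conv 11
        self.conv4 = nn.Conv2d(channels, 3, 3, padding=1)         # 1st-from-last conv / first output sub-network
        self.act = nn.ReLU()

    def forward(self, x):
        f1 = self.act(self.conv1(x))
        d1 = F.interpolate(f1, scale_factor=0.5, mode="bicubic")  # first downsampling layer 12
        f2 = self.act(self.conv2(d1))
        f3 = self.act(self.conv3(f2))
        u1 = F.interpolate(f3, scale_factor=2.0, mode="bicubic")  # first upsampling layer 13
        # Superposition: the 1st-from-last conv receives the 1st-from-last upsampling
        # output plus the 1st-from-first single-layer conv output.
        return self.conv4(u1 + f1)

restored = FirstNetworkSketch()(torch.randn(1, 3, 64, 64))        # same spatial size as the input
```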
- FIG. 5 is an example diagram of a residual block.
- each residual block 21 includes three sub-residual blocks 21a connected in sequence.
- each sub-residual block 21a adopts two convolutional layers with 3*3 convolution kernels, with an activation layer connected between the two convolutional layers.
- for each sub-residual block 21a, its input is superimposed on the output of its last convolutional layer, and the result serves as the output of the sub-residual block 21a.
- the activation layer includes an activation function, and the activation function may include a rectified linear unit (ReLU) function, a sigmoid function, a hyperbolic tangent function (tanh function), and the like.
- the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can also be included in the convolutional layer.
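- A hedged sketch of a sub-residual block 21a, and of a residual block 21 composed of three such sub-blocks, follows; the channel count of 128 and the choice of ReLU are assumptions for illustration.

```python
import torch.nn as nn

class SubResidualBlock(nn.Module):
    """Two 3*3 convolutional layers with an activation layer in between; the block
    input is superimposed on the output of the last convolution (sub-residual block 21a)."""

    def __init__(self, channels: int = 128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.ReLU()
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class ResidualBlock(nn.Module):
    """Three sub-residual blocks connected in sequence (residual block 21)."""

    def __init__(self, channels: int = 128):
        super().__init__()
        self.blocks = nn.Sequential(*[SubResidualBlock(channels) for _ in range(3)])

    def forward(self, x):
        return self.blocks(x)
```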
- compared with the second neural network 20, each residual block 21 is replaced by a single-layer convolutional layer 11 in the first neural network 10, thereby reducing the number of parameters of the first neural network 10.
- the first feature extraction sub-network ML1 includes: a first up-sampling layer 13, or a first down-sampling layer 12, or a single-layer convolutional layer 11; the first output sub-network OL1 includes a single-layer convolutional layer 11;
- the second feature extraction sub-network ML2 includes: a second up-sampling layer 23 , or a second down-sampling layer 22 , or a residual block 21 ; the second output sub-network OL2 includes a residual block 21 .
- the number of channels of the output image of the first feature extraction sub-network ML1 is less than the number of channels of the output image of the corresponding second feature extraction sub-network ML2.
- for example, the number of channels of the output image of the first feature extraction sub-network ML1 is 32, and the number of channels of the output image of the second feature extraction sub-network ML2 is 128.
- the image input to each network layer is represented by a matrix; the image received by the first layer in the neural network can be an image matrix with three channels, R, G and B, that is, the image matrix of each channel represents the data of the red, green or blue component of the image.
- Each network layer is used to extract features from the image.
- the output data of the network layer includes multiple matrices, and each matrix represents a channel of the image.
- the second neural network to be trained and the discriminant network to be trained are alternately trained, so as to compete with each other to obtain the best model.
- the trained second neural network is configured to transform a received image with a first definition into an image with a second definition, the second definition being greater than the first definition.
- the trained discriminant network is configured to determine the matching degree between the output result of the second neural network and the preset standard image, and the matching degree is between 0 and 1. Among them, when the second neural network to be trained is trained, the parameters of the current second neural network are adjusted so that after the output result of the second neural network after parameter adjustment is input into the current discriminant network, the output of the discriminant network is as close to 1 as possible.
- when the discriminant network to be trained is trained, the parameters of the current discriminant network are adjusted so that, after the preset standard image is input into the current discriminant network, the output of the current discriminant network is as close to 1 as possible (that is, the discriminant network determines that its input is a "true" sample), and, after the current output of the second neural network enters the discriminant network, the output of the discriminant network is as close to 0 as possible (that is, the discriminant network determines that its input is a "fake" sample).
- in this way, the discriminant network is continuously optimized to distinguish the output of the second neural network from the preset standard image as well as possible, and the second neural network is continuously optimized to make its output as close as possible to the preset standard image.
- this method allows the two neural networks to compete against each other, each improving on the basis of the other's increasingly good results, so as to obtain better and better network models.
- FIG. 6 is a schematic structural diagram of a trained discriminant network provided in an embodiment of the present disclosure.
- the trained discriminant network 30 includes a plurality of convolutional layers 31 to 34 and a fully connected layer 35.
- Each convolutional layer 31-34 adopts a 2-fold downsampling convolutional layer, and an activation layer is connected behind each convolutional layer 31-34.
- the activation layer includes an activation function, and the activation function can include a rectified linear unit (ReLU) function, a sigmoid function, a hyperbolic tangent function (tanh function), etc.
- Each convolution layer 31 to 34 uses a 3*3 convolution kernel.
- the number of channels of the image output by the convolutional layer 31 is 32, the number of channels of the image output by the convolutional layer 32 is 64, the number of channels of the image output by the convolutional layer 33 is 128, and the number of channels of the image output by the convolutional layer 34 is 192.
- the fully connected layer 35 outputs a vector of 1024*1, and then passes through the activation layer (for example, the activation layer uses sigmoid as the activation function), and outputs a value between 0 and 1.
- the structures of the trained discriminant network and the discriminant network to be trained (i.e., the number of convolutional layers and the number of convolution kernels in each convolutional layer) are the same; the difference lies in the weights in the convolutional layers.
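- The following is a hedged PyTorch sketch of such a discriminant network; the input size, the use of ReLU after each convolution, and the final 1024-to-1 projection before the sigmoid are assumptions made so that the network returns a single matching degree between 0 and 1.

```python
import torch
import torch.nn as nn

class DiscriminantNetworkSketch(nn.Module):
    """Four 2x-downsampling 3*3 convolutions (output channels 32, 64, 128, 192),
    each followed by an activation layer, then a fully connected layer producing
    a 1024-dimensional vector and a sigmoid-activated scalar output."""

    def __init__(self, in_channels: int = 3, input_size: int = 64):
        super().__init__()
        channels = [32, 64, 128, 192]
        layers, prev = [], in_channels
        for c in channels:
            layers += [nn.Conv2d(prev, c, 3, stride=2, padding=1), nn.ReLU()]
            prev = c
        self.features = nn.Sequential(*layers)
        feat = input_size // 2 ** len(channels)               # spatial size after four stride-2 convs
        self.fc = nn.Linear(channels[-1] * feat * feat, 1024)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(1024, 1), nn.Sigmoid())

    def forward(self, x):
        v = self.fc(torch.flatten(self.features(x), start_dim=1))   # 1024-dim vector
        return self.head(v)                                         # matching degree in (0, 1)

score = DiscriminantNetworkSketch()(torch.randn(1, 3, 64, 64))
```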
- the numbers of network layers in the first neural network 10 and in the trained discriminant network 30 shown in FIG. 4 and FIG. 6 are only exemplary; in practical applications, the network structure can be adjusted as needed.
- the original video can be compressed at a low bit rate (for example, a compression bit rate of 1 Mbps) to obtain a compressed video, and each image frame in the compressed video can be used as a first sample image carrying noise; the noise may be, for example, Gaussian noise.
- the total loss includes the first loss, the second loss and the third loss.
- the first loss is obtained based on the difference between the first output image and the second output image; the second loss is obtained based on the difference between the first discrimination result and the first target result.
- the third loss is obtained based on the difference between the output image of at least one first feature extraction sub-network and the output image of the corresponding second feature extraction sub-network.
- the output of the trained discriminant network 30 is a matching degree between 0 and 1.
- the first target result is a matching degree close to 1 or equal to 1.
- adjusting the parameters of the first neural network according to the total loss refers to adjusting the parameters of the first neural network so that when the first training method is performed multiple times, the value of the total loss tends to decrease as a whole.
- the number of times the first training method is executed may be preset, or the first training method may stop being executed once the total loss is less than a preset value. It should also be noted that different executions of the first training method may use different first sample images.
- here the difference between the two images refers to the difference in the low-frequency information of the two images, which can be characterized by an L1 loss value, a mean square error (MSE), a structural similarity (SSIM) value, and the like.
- the first loss includes the L1 loss between the first output image and the second output image, and may specifically be x1·Loss1, where x1 is a preset weight and Loss1 is the L1 loss between the first output image y1 and the second output image y2.
- the second loss includes a cross-entropy loss between the first discrimination result and the first target result, specifically x2·Loss2, where x2 is a preset weight and Loss2 is the cross-entropy loss between the first discrimination result of the discriminant network and the first target result.
- step S23 specifically includes: assigning a ground-truth label to the first output image, and providing the first output image with the ground-truth label to the trained discriminant network, so that the discriminant network outputs the first discrimination result.
- the true value label is used to indicate that the image is a "true" sample
- the first target result is the probability corresponding to the true value label.
- the first target result is 1.
- the third loss is specifically obtained based on a difference between the transformed image of the output image of the at least one first feature extraction sub-network and the output image of the corresponding second feature extraction sub-network.
- the network architecture including the first neural network and the second neural network also includes a plurality of dimensionality reduction layers 40 .
- the dimensionality reduction layers 40 are in one-to-one correspondence with the second feature extraction sub-networks ML2, and each dimensionality reduction layer 40 is configured to perform channel dimensionality reduction on the output image of the corresponding second feature extraction sub-network to generate an intermediate image; the number of channels of the intermediate image is the same as the number of channels of the output image of the first feature extraction sub-network.
- the first training method also includes: providing the output images of the plurality of second feature extraction sub-networks to the plurality of dimensionality reduction layers in a one-to-one correspondence, so that each dimensionality reduction layer generates an intermediate image; the number of channels of the intermediate image is the same as the number of channels of the output image of the first feature extraction sub-network.
- in step S24, the parameters of both the first neural network and the dimensionality reduction layers are adjusted.
- the third loss is obtained based on the sum of the differences between each of the intermediate images and the corresponding output image of the first feature extraction sub-network.
- the difference between the intermediate image and the output image of the first feature extraction sub-network is represented by the L2 loss between the two.
- the third loss is x3·Loss3, where Loss3 is the sum of the L2 losses between the output image of each first feature extraction sub-network and the corresponding intermediate image. Specifically, Loss3 is calculated according to the following formula (an illustrative sketch follows the symbol list below), in which:
- x3 is a preset weight;
- T is the number of first feature extraction sub-networks;
- S_n(z) is the output image of the n-th first feature extraction sub-network in the first neural network;
- G_n(z) is the output image of the n-th second feature extraction sub-network in the second neural network;
- f(G_n(z)) is the intermediate image output by the dimensionality reduction layer corresponding to the n-th second feature extraction sub-network.
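- A plausible form of Loss3 consistent with the quantities defined above is the sum over the T sub-networks of the L2 loss between S_n(z) and f(G_n(z)). The sketch below is a reading of this description rather than a verbatim implementation; it assumes PyTorch, 1*1 convolutions as the dimensionality reduction layers, mean squared error as the L2 loss, and illustrative channel counts of 128 and 32.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def third_loss(student_feats, teacher_feats, reduction_layers):
    """Sum over the T sub-networks of the L2 loss between the student output S_n(z)
    and the intermediate image f(G_n(z)) from the dimensionality reduction layer."""
    loss = 0.0
    for s_n, g_n, f_n in zip(student_feats, teacher_feats, reduction_layers):
        intermediate = f_n(g_n)                  # channel dimensionality reduction, e.g. 128 -> 32
        loss = loss + F.mse_loss(s_n, intermediate)
    return loss

# One 1*1 convolution per second feature extraction sub-network (illustrative sizes).
reduction_layers = nn.ModuleList([nn.Conv2d(128, 32, kernel_size=1) for _ in range(3)])
student_feats = [torch.randn(1, 32, 64, 64) for _ in range(3)]    # outputs of the first sub-networks
teacher_feats = [torch.randn(1, 128, 64, 64) for _ in range(3)]   # outputs of the second sub-networks
loss3 = third_loss(student_feats, teacher_feats, reduction_layers)
```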
- compared with the trained second neural network, the trained first neural network has fewer parameters and a simpler network structure, so that the trained first neural network occupies fewer resources (e.g., computing resources, storage resources, etc.) during operation, and thus can be applied to lightweight terminals.
- the first loss is obtained based on the difference between the output of the first neural network and the output of the second neural network; the second loss is obtained based on the difference between the discrimination result of the trained discriminant network and the first target result; the third loss is obtained based on the difference between the output image of at least one first feature extraction sub-network and the output image of the corresponding second feature extraction sub-network. Training with these losses makes the performance of the trained first neural network as similar as possible to that of the second neural network. Therefore, the embodiments of the present disclosure can reduce the parameters of the image processing model while ensuring the image processing effect, thereby improving the image processing speed.
- the total loss further includes: a fourth loss, the fourth loss is obtained based on the perceptual loss of the first output image and the second output image.
- the perceptual loss is used to characterize the difference between the high-frequency information of two images (for example, detailed features such as texture and hair on the image).
- the fourth loss is x4 times the perceptual loss between the first output image and the second output image, where x4 is a preset weight; the perceptual loss is calculated according to the following formula (see the reconstructed form after the symbol list below), in which:
- y1 is the first output image
- y2 is the second output image
- j is the layer number of the preset network layer in the discriminant network
- C is the channel number of the output image of the preset network layer
- H is the height of the output image of the preset network layer
- W is the width of the output image of the preset network layer.
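- A plausible form of this perceptual loss, consistent with the quantities listed above, is the commonly used per-layer feature distance (given here as an assumption for readability, not as the verbatim formula of the present disclosure):

```latex
L_{\mathrm{perc}}(y_1, y_2) \;=\; \frac{1}{C\,H\,W}\,\bigl\lVert \phi_j(y_1) - \phi_j(y_2) \bigr\rVert_2^{2}
```

- where φ_j(·) denotes the output of the j-th preset network layer of the trained discriminant network.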
- the preset network layer may be, for example, a convolutional layer whose output image has 128 channels.
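- Putting the four terms together, a hedged sketch of the total loss used to update the first neural network and the dimensionality reduction layers might look as follows; PyTorch is assumed, `third_loss` is the helper sketched earlier, `d_features` stands for the preset network layer of the discriminant network, and the weights x1–x4 are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def total_loss(y1, y2, d_score, student_feats, teacher_feats, reduction_layers,
               d_features, x1=1.0, x2=1e-3, x3=1.0, x4=1e-2):
    """y1/y2: first and second output images; d_score: first discrimination result;
    d_features: callable returning the preset-layer features of the discriminant network."""
    loss1 = F.l1_loss(y1, y2)                                           # first loss
    loss2 = F.binary_cross_entropy(d_score, torch.ones_like(d_score))   # second loss, first target result = 1
    loss3 = third_loss(student_feats, teacher_feats, reduction_layers)  # third loss (see sketch above)
    loss4 = F.mse_loss(d_features(y1), d_features(y2))                  # fourth (perceptual) loss, averaged over C*H*W
    return x1 * loss1 + x2 * loss2 + x3 * loss3 + x4 * loss4
```

- Minimizing this total loss with an optimizer over the parameters of both the first neural network and the dimensionality reduction layers corresponds to step S24.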
- FIG. 7 is a flowchart of an optional implementation manner of step S21 provided in the embodiment of the present disclosure.
- step S21 specifically includes: performing step S21a and step S21b alternately until a preset training condition is reached.
- the preset training condition is, for example, that the number of times of alternation between step S21a and step S21b reaches a preset number of times.
- S21a: providing the second sample image to the current second neural network, so that the second neural network generates a first definition-enhanced image; providing the first definition-enhanced image and the original sample image corresponding to the second sample image to the current discriminant network, and adjusting the parameters of the current discriminant network according to its loss function, so that the output of the parameter-adjusted discriminant network can characterize whether the input of the discriminant network is an output image of the second neural network or the original sample image.
- S21b: providing the third sample image to the current second neural network, so that the second neural network generates a second definition-enhanced image.
- the second definition-enhanced image is input into the discriminant network after parameter adjustment, so that the discriminant network after parameter-adjustment generates a second discrimination result based on the second definition-enhanced image.
- the parameters of the second neural network are adjusted to obtain an updated second neural network.
- in the first round of training, the current second neural network is the second neural network to be trained; in each round of training after the first round, the current second neural network is the second neural network updated in step S21b of the previous round of training.
- in the first round of training, the current discriminant network is the discriminant network to be trained; in each round of training after the first round, the current discriminant network is the discriminant network whose parameters were adjusted in step S21a of the previous round of training.
- the first term in the loss function of the second neural network is based on the difference between the second definition-enhanced image and its corresponding original sample image, and the second term in the loss function of the second neural network is based on the difference between the second discrimination result and the second target result.
- the first term in the loss function LossG of the second neural network is λ1·LossG1, where λ1 is a preset weight and LossG1 is the L1 loss between the second definition-enhanced image and its corresponding original sample image; specifically, LossG1 is computed between y, the original sample image corresponding to the second definition-enhanced image, and the second definition-enhanced image itself.
- the second term in the loss function of the second neural network is λ2·L_D, where λ2 is a preset weight and L_D is the cross-entropy between the second discrimination result and the second target result.
- the second target result is used to indicate that the input of the discriminant network is the original image corresponding to the second definition-improved image, that is, the input of the discriminant network is used to indicate that the input is a "true" sample.
- the second target result is 1.
- the third term in the loss function of the second neural network is a perceptual loss term weighted by λ3, where λ3 is a preset weight;
- j is the layer number of the preset network layer in the preset optimization network;
- C is the number of channels of the output image of the preset network layer;
- H is the height of the output image of the preset network layer;
- W is the width of the output image of the preset network layer; the preset optimization network adopts the VGG-19 network.
- the image output by each network layer is not a visually visible image, but is represented by a matrix.
- the height of the image can be regarded as the number of rows of the matrix, and the width of the image can be regarded as the number of columns of the matrix.
- the training therefore makes the L1 loss between the image output by the updated second neural network and the original sample image as small as possible, makes the perceptual loss between the image output by the second neural network and the original sample image as small as possible, and at the same time makes the output of the discriminant network as close to 1 as possible after the image output by the second neural network is provided to the discriminant network.
- the second sample image and the third sample image may be the same.
- in different rounds of training, the second sample images used are different, and the third sample images used are also different.
- in each round of alternating training, either the training step of the discriminant network or the training step of the generation network (the second neural network) may be performed first.
- lossless compression may be performed on the original video to obtain a losslessly compressed video, and image frames in the losslessly compressed video may be used as the original sample images; image frames in the lossily compressed video are used as the second sample images or the third sample images.
- the training process of step S21 may employ the Adam optimizer with a learning rate of 1e-4.
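- A hedged sketch of the alternating training of step S21 (steps S21a and S21b) follows, assuming PyTorch, binary cross-entropy for the adversarial terms, mean squared error over VGG-19 features for the perceptual term, and the Adam optimizer with the stated learning rate of 1e-4. `teacher`, `discriminator`, and `vgg19_features` are placeholder names for the second neural network, the discriminant network, and the preset layer of the preset optimization network.

```python
import torch
import torch.nn.functional as F

def alternate_training(teacher, discriminator, vgg19_features, loader,
                       rounds=1000, lam1=1.0, lam2=1e-3, lam3=1e-2):
    opt_g = torch.optim.Adam(teacher.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    data = iter(loader)  # assumed to yield (compressed frame, original frame) pairs

    for _ in range(rounds):
        # S21a: train the current discriminant network.
        second_sample, original = next(data)
        enhanced = teacher(second_sample).detach()        # first definition-enhanced image
        real_score = discriminator(original)
        fake_score = discriminator(enhanced)
        d_loss = (F.binary_cross_entropy(real_score, torch.ones_like(real_score))
                  + F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # S21b: train the current second neural network.
        third_sample, original = next(data)
        enhanced = teacher(third_sample)                  # second definition-enhanced image
        score = discriminator(enhanced)                   # second discrimination result
        g_loss = (lam1 * F.l1_loss(enhanced, original)                              # lambda1 * LossG1
                  + lam2 * F.binary_cross_entropy(score, torch.ones_like(score))    # lambda2 * L_D
                  + lam3 * F.mse_loss(vgg19_features(enhanced), vgg19_features(original)))  # lambda3 * perceptual
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
```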
- the trained first neural network has fewer parameters and a simpler network structure than the second neural network, so that the first neural network occupies less resources (for example, computing resources, storage resources, etc.), so it can be applied to lightweight terminals.
- the training method of the first neural network to be trained can make the performance of the trained first neural network close to that of the trained second neural network; therefore, the image processing method of the embodiments of the present disclosure can obtain high-definition images while improving the image processing speed.
- FIG. 8 is an effect diagram before and after image processing using the image processing method according to an embodiment of the present disclosure.
- the left image in FIG. 8 is the input image before processing, and the right image is the processed target output image.
- the clarity of the image is improved.
- compared with the second neural network, the parameters of the first neural network are compressed by a factor of more than 50, and the processing speed is increased by about 15 times.
- the present disclosure also provides an image processing device, including a memory and a processor, where a computer program is stored in the memory, and when the computer program is executed by the processor, the above-mentioned image processing method is implemented.
- the present disclosure also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the above-mentioned image processing method is implemented.
- the above-mentioned memory and the computer-readable storage medium include, but are not limited to, the following readable media: random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, magnetic disk or tape, and non-transitory media such as a compact disc (CD) or a DVD (Digital Versatile Disc).
- processors include, but are not limited to, general purpose processors, central processing units (CPUs), microprocessors, digital signal processors (DSPs), controllers, microcontrollers, state machines, and the like.
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (14)
- An image processing method, comprising: processing an input image by using a trained first neural network to obtain a target output image, the definition of the target output image being greater than the definition of the input image; wherein the trained first neural network is obtained by training a first neural network to be trained with a first training method, and the first training method comprises: alternately training a second neural network to be trained and a discriminant network to be trained to obtain a trained second neural network and a trained discriminant network, wherein the trained second neural network has more parameters than the first neural network to be trained, and the trained second neural network is configured to transform a received image having a first definition into an image having a second definition, the second definition being greater than the first definition; the first neural network to be trained comprises a plurality of first feature extraction sub-networks and a first output sub-network located after the plurality of first feature extraction sub-networks, the trained second neural network comprises a plurality of second feature extraction sub-networks and a second output sub-network located after the plurality of second feature extraction sub-networks, and the first feature extraction sub-networks correspond one-to-one to the second feature extraction sub-networks; providing a first sample image to the trained second neural network and to the first neural network to be trained, so that the first neural network to be trained outputs a first output image and the trained second neural network outputs a second output image; providing the first output image to the trained discriminant network, so that the trained discriminant network generates a first discrimination result based on the first output image; and adjusting parameters of the first neural network according to a total loss to obtain an updated first neural network; wherein the total loss comprises a first loss, a second loss and a third loss, the first loss is obtained based on a difference between the first output image and the second output image, the second loss is obtained based on a difference between the first discrimination result and a first target result, and the third loss is obtained based on a difference between an output image of at least one of the first feature extraction sub-networks and an output image of the corresponding second feature extraction sub-network.
- The image processing method according to claim 1, wherein the number of channels of the output image of the first feature extraction sub-network is less than the number of channels of the output image of the corresponding second feature extraction sub-network; the first training method further comprises: providing the output images of the plurality of second feature extraction sub-networks to a plurality of dimensionality reduction layers in a one-to-one correspondence, so that each dimensionality reduction layer generates an intermediate image, the number of channels of the intermediate image being the same as the number of channels of the output image of the first feature extraction sub-network; adjusting the parameters of the first neural network according to the total loss function comprises: adjusting the parameters of both the first neural network and the dimensionality reduction layers; wherein the third loss is obtained based on the sum of the differences between each intermediate image and the output image of the corresponding first feature extraction sub-network.
- The image processing method according to claim 1, wherein the total loss further comprises a fourth loss, the fourth loss being obtained based on a perceptual loss between the first output image and the second output image.
- The image processing method according to any one of claims 1 to 4, wherein the first loss comprises an L1 loss between the first output image and the second output image.
- The image processing method according to any one of claims 1 to 4, wherein the second loss comprises a cross-entropy loss between the first discrimination result and the first target result.
- The image processing method according to any one of claims 2 to 4, wherein the third loss comprises the sum of the L2 losses between the output image of each first feature extraction sub-network and the corresponding intermediate image.
- The image processing method according to any one of claims 1 to 4, wherein providing the first output image to the trained discriminant network so that the trained discriminant network generates a first discrimination result based on the first output image comprises: assigning a ground-truth label to the first output image, and providing the first output image with the ground-truth label to the trained discriminant network, so that the discriminant network outputs the first discrimination result.
- The image processing method according to any one of claims 1 to 4, wherein in the step of alternately training the second neural network to be trained and the discriminant network to be trained, training the discriminant network to be trained comprises: providing a second sample image to the current second neural network, so that the current second neural network generates a first definition-enhanced image; providing the first definition-enhanced image and the original sample image corresponding to the second sample image to the current discriminant network, and adjusting the parameters of the current discriminant network according to the loss function of the current discriminant network, so that the output of the parameter-adjusted discriminant network can characterize whether the input of the discriminant network is an output image of the second neural network or the original sample image.
- The image processing method according to claim 9, wherein in the step of alternately training the second neural network to be trained and the discriminant network to be trained, training the second neural network to be trained comprises: providing a third sample image to the current second neural network, so that the current second neural network generates a second definition-enhanced image; inputting the second definition-enhanced image into the parameter-adjusted discriminant network, so that the parameter-adjusted discriminant network generates a second discrimination result based on the second definition-enhanced image; and adjusting the parameters of the current second neural network based on the current loss function of the second neural network to obtain an updated second neural network; wherein a first term in the current loss function of the second neural network is based on the difference between the second definition-enhanced image and its corresponding original sample image, and a second term in the current loss function of the second neural network is based on the difference between the second discrimination result and a second target result.
- The image processing method according to claim 10, wherein the first term in the current loss function of the second neural network is λ1·LossG1, λ1 being a preset weight and LossG1 being the L1 loss between the second definition-enhanced image and its corresponding original sample image; and the second term in the current loss function of the second neural network is λ2·L_D, λ2 being a preset weight and L_D being the cross-entropy between the second discrimination result and the second target result;
- The image processing method according to any one of claims 1 to 4, wherein the first neural network to be trained comprises: a plurality of first upsampling layers, a plurality of first downsampling layers and a plurality of single-layer convolutional layers, each first upsampling layer and each first downsampling layer being located between two of the single-layer convolutional layers; the input data of the i-th-from-last single-layer convolutional layer comprises the superposition of the output data of the i-th-from-last first upsampling layer and the output data of the i-th-from-first single-layer convolutional layer, wherein the number of the single-layer convolutional layers is an even number, and i is greater than 0 and less than half of the number of the single-layer convolutional layers; the trained second neural network comprises: a plurality of second upsampling layers, a plurality of second downsampling layers and a plurality of residual blocks, the plurality of second upsampling layers corresponding one-to-one to the plurality of first upsampling layers, the plurality of second downsampling layers corresponding one-to-one to the plurality of first downsampling layers, and the plurality of residual blocks corresponding one-to-one to the plurality of single-layer convolutional layers; the input data of the i-th-from-last residual block is the superposition of the output data of the i-th-from-last second upsampling layer and the output data of the i-th-from-first residual block; the first feature extraction sub-network comprises the first upsampling layer, or the first downsampling layer, or the single-layer convolutional layer, and the first output sub-network comprises the single-layer convolutional layer; the second feature extraction sub-network comprises the second upsampling layer, or the second downsampling layer, or the residual block, and the second output sub-network comprises the residual block.
- An image processing device, comprising a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, implements the image processing method according to any one of claims 1 to 12.
- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the image processing method according to any one of claims 1 to 12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202080002356.2A CN114641792A (zh) | 2020-10-16 | 2020-10-16 | Image processing method, image processing device and readable storage medium |
PCT/CN2020/121405 WO2022077417A1 (zh) | 2020-10-16 | 2020-10-16 | Image processing method, image processing device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/121405 WO2022077417A1 (zh) | 2020-10-16 | 2020-10-16 | Image processing method, image processing device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022077417A1 true WO2022077417A1 (zh) | 2022-04-21 |
Family
ID=81208702
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/121405 WO2022077417A1 (zh) | 2020-10-16 | 2020-10-16 | Image processing method, image processing device and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114641792A (zh) |
WO (1) | WO2022077417A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116128768B (zh) * | 2023-04-17 | 2023-07-11 | China University of Petroleum (East China) | Unsupervised low-illumination image enhancement method with a denoising module |
-
2020
- 2020-10-16 WO PCT/CN2020/121405 patent/WO2022077417A1/zh active Application Filing
- 2020-10-16 CN CN202080002356.2A patent/CN114641792A/zh active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018053340A1 (en) * | 2016-09-15 | 2018-03-22 | Twitter, Inc. | Super resolution using a generative adversarial network |
WO2019197712A1 (en) * | 2018-04-09 | 2019-10-17 | Nokia Technologies Oy | An apparatus, a method and a computer program for video coding and decoding |
CN111767979A (zh) * | 2019-04-02 | 2020-10-13 | BOE Technology Group Co., Ltd. | Neural network training method, image processing method, and image processing apparatus |
CN111192206A (zh) * | 2019-12-03 | 2020-05-22 | Hohai University | Method for improving image definition |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114758030A (zh) * | 2022-04-29 | 2022-07-15 | Tianjin University | Underwater polarization imaging method fusing a physical model and deep learning |
CN116958556A (zh) * | 2023-08-01 | 2023-10-27 | Dongguan University of Technology | Dual-channel complementary spine image segmentation method for vertebral body and intervertebral disc segmentation |
CN116958556B (zh) * | 2023-08-01 | 2024-03-19 | Dongguan University of Technology | Dual-channel complementary spine image segmentation method for vertebral body and intervertebral disc segmentation |
Also Published As
Publication number | Publication date |
---|---|
CN114641792A (zh) | 2022-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022077417A1 (zh) | Image processing method, image processing device and readable storage medium | |
JP7446997B2 (ja) | Training method for generative adversarial networks, image processing method, device, and storage medium | |
CN107977932B (zh) | Face image super-resolution reconstruction method based on a generative adversarial network with discriminable attribute constraints | |
WO2019120110A1 (zh) | Image reconstruction method and device | |
CN110059796B (zh) | Method and apparatus for generating a convolutional neural network | |
US11216910B2 (en) | Image processing system, image processing method and display device | |
US10325346B2 (en) | Image processing system for downscaling images using perceptual downscaling method | |
US10034005B2 (en) | Banding prediction for video encoding | |
EP4109392A1 (en) | Image processing method and image processing device | |
US9282330B1 (en) | Method and apparatus for data compression using content-based features | |
US20220335583A1 (en) | Image processing method, apparatus, and system | |
CN111489364B (zh) | Medical image segmentation method based on a lightweight fully convolutional neural network | |
US20230177641A1 (en) | Neural network training method, image processing method, and apparatus | |
TW202141358A (zh) | Image restoration method and apparatus, storage medium, and terminal | |
CN110717868B (zh) | Video high dynamic range inverse tone mapping model construction and mapping method and apparatus | |
CN111507910B (zh) | Single-image reflection removal method, apparatus and storage medium | |
US20220414838A1 (en) | Image dehazing method and system based on cyclegan | |
CN110111266B (zh) | Improved approximate message passing algorithm based on deep-learning denoising | |
CN109871790B (zh) | Video decolorization method based on a hybrid neural network model | |
CN110809126A (zh) | Video frame interpolation method and system based on adaptive deformable convolution | |
CN112927137A (zh) | Method, device and storage medium for obtaining blind super-resolution images | |
CN114830168A (zh) | Image reconstruction method, electronic device, and computer-readable storage medium | |
US20230379475A1 (en) | Codec Rate Distortion Compensating Downsampler | |
Yang et al. | Blind image quality measurement via data-driven transform-based feature enhancement | |
CN117094910A (zh) | Generative adversarial network image deblurring method based on nonlinear activation-free networks | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20957195 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20957195 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.02.2024) |
|