WO2022077417A1 - 图像处理方法、图像处理设备和可读存储介质 - Google Patents

图像处理方法、图像处理设备和可读存储介质

Info

Publication number
WO2022077417A1
WO2022077417A1 · PCT/CN2020/121405
Authority
WO
WIPO (PCT)
Prior art keywords
network
image
neural network
trained
layer
Prior art date
Application number
PCT/CN2020/121405
Other languages
English (en)
French (fr)
Inventor
陈冠男
段然
高艳
Original Assignee
京东方科技集团股份有限公司
Priority date
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 filed Critical 京东方科技集团股份有限公司
Priority to CN202080002356.2A priority Critical patent/CN114641792A/zh
Priority to PCT/CN2020/121405 priority patent/WO2022077417A1/zh
Publication of WO2022077417A1 publication Critical patent/WO2022077417A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/73 Deblurring; Sharpening
    • G06T5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image processing method, an image processing device and a readable storage medium.
  • Before a video is transmitted, it needs to be compressed and encoded due to bandwidth limitations.
  • The compressed video contains a variety of compression noise, which degrades the viewing experience of the video on the display terminal.
  • aspects of the present disclosure provide an image processing method, an image processing apparatus, and a readable storage medium.
  • Embodiments of the present disclosure provide an image processing method, including:
  • the trained first neural network is obtained by training the first neural network to be trained by a first training method, and the first training method includes:
  • the second neural network to be trained and the discriminant network to be trained are alternately trained, and the trained second neural network and the trained discriminant network are obtained; wherein, the trained second neural network has more parameters than the first neural network to be trained;
  • the trained second neural network is configured to transform a received image with a first definition into an image with a second definition, the second definition being greater than the first definition;
  • the first neural network to be trained includes: a plurality of first feature extraction sub-networks and a first output sub-network located after the plurality of first feature extraction sub-networks; the trained second neural network includes: a plurality of second feature extraction sub-networks and a second output sub-network located after the plurality of second feature extraction sub-networks; the first feature extraction sub-networks correspond one-to-one with the second feature extraction sub-networks;
  • the first sample image is provided to the trained second neural network and to the first neural network to be trained, so that the first neural network to be trained outputs a first output image and the trained second neural network outputs a second output image;
  • the number of channels of the output image of the first feature extraction sub-network is less than the number of channels of the output image of the corresponding second feature extraction sub-network;
  • the first training method further includes: providing a plurality of output images of the second feature extraction sub-networks to a plurality of dimensionality reduction layers in a one-to-one correspondence, so that each dimensionality reduction layer generates an intermediate image; the intermediate image The number of channels is the same as the number of channels of the output image of the first feature extraction sub-network;
  • Adjusting the parameters of the first neural network according to the total loss function includes: adjusting the parameters of both the first neural network and the dimensionality reduction layers; wherein, the third loss is obtained by summing the differences between each of the intermediate images and the output image of the corresponding first feature extraction sub-network.
  • the total loss further includes: a fourth loss, the fourth loss is derived based on the perceptual loss of the first output image and the second output image.
  • the perceptual loss of the first output image and the second output image is calculated according to the following formula:

    $$L_{perc}(y1, y2) = \frac{1}{C_j H_j W_j}\left\|\phi_j(y1)-\phi_j(y2)\right\|_2^2$$

  • where y1 is the first output image, y2 is the second output image, $\phi_j$ is the preset network layer in the trained discriminant network, j is the layer number of the preset network layer in the discriminant network, C is the number of channels of the output image of the preset network layer, H is the height of the output image of the preset network layer, and W is the width of the output image of the preset network layer.
  • the first loss includes an L1 loss of the first output image and the second output image.
  • the second loss includes a cross-entropy loss of the first discriminant result and the first target result.
  • the third loss term includes the sum of the L2 loss of the output image of each first feature extraction sub-network and the corresponding intermediate image.
  • the first output image is provided to the trained discrimination network, so that the trained discrimination network generates a first discrimination result based on the first output image, comprising:
  • the first output image is set to have a ground truth label, and the first output image with ground truth label is provided to the trained discriminant network, so that the discriminant network outputs a first discriminant result.
  • the training of the discriminant network to be trained includes:
  • providing the first sharpness-enhanced image and the original sample image corresponding to the second sample image to the current discriminant network, and adjusting the parameters of the current discriminant network according to the loss function of the current discriminant network, so that the output of the parameter-adjusted discriminant network can indicate whether the input of the discriminant network is an output image of the second neural network or the original sample image.
  • the training of the second neural network to be trained includes:
  • the parameters of the current second neural network are adjusted to obtain an updated second neural network;
  • the first term in the current loss function of the second neural network is based on the difference between the second sharpness-enhanced image and its corresponding original sample image, and the second term in the current loss function of the second neural network is based on the difference between the second discrimination result and the second target result.
  • the first term in the current loss function of the second neural network is $\lambda_1\,LossG1$, where $\lambda_1$ is a preset weight and LossG1 is the L1 loss between the second sharpness-enhanced image and its corresponding original sample image;
  • the second term in the current loss function of the second neural network is $\lambda_2\,L_D$, where $\lambda_2$ is a preset weight and $L_D$ is the cross entropy between the second discrimination result and the second target result;
  • the third term in the current loss function of the second neural network is $\lambda_3\,\frac{1}{C_j H_j W_j}\left\|\phi_j(y)-\phi_j(\hat y)\right\|_2^2$, where $\lambda_3$ is a preset weight, y is the original sample image corresponding to the second sharpness-enhanced image, $\hat y$ is the second sharpness-enhanced image, $\phi_j$ is the preset network layer in the preset optimization network, j is the number of layers of the preset network layer in the preset optimization network, C is the number of channels of the output image of the preset network layer, H is the height of the output image of the preset network layer, and W is the width of the output image of the preset network layer; the preset optimization network adopts the VGG-19 network.
  • the first neural network to be trained includes: a plurality of first upsampling layers, a plurality of first downsampling layers, and a plurality of single-layer convolutional layers, each of the first upsampling layers and each of the first downsampling layers being located between two of the single-layer convolutional layers;
  • the input data of the i-th single-layer convolutional layer counted from the end includes the superposition of the output data of the i-th first upsampling layer counted from the end and the output data of the i-th single-layer convolutional layer counted from the front; wherein, the number of the single-layer convolutional layers is even, and i is greater than 0 and less than half of the number of the single-layer convolutional layers;
  • the trained second neural network includes: a plurality of second upsampling layers, a plurality of second downsampling layers and a plurality of residual blocks, the plurality of second upsampling layers and the plurality of first upsampling layers;
  • the sampling layers are in one-to-one correspondence
  • the plurality of second down-sampling layers are in one-to-one correspondence with the plurality of first down-sampling layers
  • the plurality of residual blocks are in one-to-one correspondence with the plurality of single-layer convolutional layers;
  • the input data of the i-th residual block counted from the end is the superposition of the output data of the i-th second upsampling layer counted from the end and the output data of the i-th residual block counted from the front;
  • the first feature extraction sub-network includes: the first up-sampling layer, or the first down-sampling layer, or the single-layer convolutional layer; the first output sub-network includes the single-layer convolutional layer layer; the second feature extraction sub-network includes: the second up-sampling layer, or the second down-sampling layer, or the residual block; the second output sub-network includes the residual block.
  • Embodiments of the present disclosure further provide an image processing device, including a memory and a processor, where a computer program is stored in the memory, wherein the computer program implements the above-mentioned image processing method when executed by the processor.
  • Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above-mentioned image processing method when executed by a processor.
  • FIG. 1 is a schematic diagram of a convolutional neural network.
  • FIG. 2 is a schematic diagram of an image processing method provided in an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a first training method provided in an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a network architecture including a first neural network and a second neural network provided in an embodiment of the present disclosure.
  • Figure 5 is an example diagram of a residual block.
  • FIG. 6 is a schematic structural diagram of a trained discriminant network provided in an embodiment of the present disclosure.
  • FIG. 7 is a flowchart of an optional implementation manner of step S21 provided in an embodiment of the present disclosure.
  • FIG. 8 is an effect diagram before and after image processing using the image processing method according to an embodiment of the present disclosure.
  • Before a video is transmitted, it needs to be compressed and encoded due to bandwidth limitations.
  • The compressed video contains a variety of compression noise, which degrades the viewing experience of the video on the display terminal.
  • the video compression and repair technology based on deep learning can improve the repair effect of video compression noise.
  • the algorithm model of deep learning has a large amount of parameters, which results in an excessive amount of calculation of the display terminal.
  • FIG. 1 is a schematic diagram of a convolutional neural network.
  • the convolutional neural network can be used for image processing, which uses images as input and output, and replaces scalar weights with filters (ie, convolutions).
  • FIG. 1 only shows a convolutional neural network with a 3-layer structure, which is not limited by the embodiments of the present disclosure.
  • the convolutional neural network includes an input layer 101 , a hidden layer 102 and an output layer 103 . In the input layer 101 there are 4 input images, in the middle hidden layer 102 there are 3 units to output 3 output images, and in the output layer 103 there are 2 units to output 2 output images.
  • a convolutional layer has weights $w_{ij}^k$ and biases $b_i^k$, where the weight $w_{ij}^k$ represents a convolution kernel and the bias is a scalar superimposed on the output of the convolutional layer; k is the label of the layer, and i and j are the labels of the unit of the input layer 101 and the unit of the hidden layer 102, respectively.
  • the first convolutional layer 201 includes a first set of convolution kernels ($w_{ij}^1$ in FIG. 1) and a first set of biases ($b_i^1$ in FIG. 1).
  • the second convolutional layer 202 includes a second set of convolution kernels ($w_{ij}^2$ in FIG. 1) and a second set of biases ($b_i^2$ in FIG. 1).
  • each convolutional layer includes dozens or hundreds of convolution kernels, and if the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
  • the convolutional neural network further includes a first activation layer 203 and a second activation layer 204 .
  • the first activation layer 203 is located after the first convolutional layer 201
  • the second activation layer 204 is located after the second convolutional layer 202 .
  • the activation layer includes an activation function, which is used to introduce nonlinear factors into the convolutional neural network, so that the convolutional neural network can better solve more complex problems.
  • the activation function may include a rectified linear unit (ReLU) function, a sigmoid function (Sigmoid function), a hyperbolic tangent function (tanh function), and the like.
  • the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can also be included in the convolutional layer.
  • the convolutional neural network in Figure 1 can be used to improve the clarity of the image.
  • the trained convolutional neural network improves the clarity of the input low-definition image to obtain a high-definition image.
  • the training process of the convolutional neural network is the optimization process of the parameters of the convolutional neural network.
  • the loss of the convolutional neural network helps to optimize the parameters (weights) of the convolutional neural network, and the goal of the training process is to minimize the loss of the neural network by optimizing the parameters of the neural network.
  • the loss of the neural network is used to measure the quality of the prediction of the network model, that is, to express the degree of the gap between the predicted results and the actual data.
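  • For illustration only, a minimal PyTorch sketch of a small convolutional network in the style of FIG. 1 is given below: two convolutional layers, each with its own convolution kernels and biases, each followed by an activation layer. The channel counts (4, 3, 2) mirror the 4 input images, 3 hidden units and 2 output units of FIG. 1; the class name and kernel size are illustrative assumptions rather than part of the disclosure.

```python
# Minimal sketch of a FIG. 1 style network: two convolutional layers with
# biases, each followed by an activation layer.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(4, 3, kernel_size=3, padding=1, bias=True)  # w_ij^1, b_i^1
        self.act1 = nn.ReLU()
        self.conv2 = nn.Conv2d(3, 2, kernel_size=3, padding=1, bias=True)  # w_ij^2, b_i^2
        self.act2 = nn.ReLU()

    def forward(self, x):
        return self.act2(self.conv2(self.act1(self.conv1(x))))

# Usage: four single-channel input images stacked as one 4-channel tensor.
y = TinyCNN()(torch.randn(1, 4, 64, 64))
print(y.shape)  # torch.Size([1, 2, 64, 64])
```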
  • FIG. 2 is a schematic diagram of the image processing method provided in the embodiment of the present disclosure.
  • the image processing method includes: S10. Process an input image using the trained first neural network to obtain a target output image; the definition of the target output image is higher than the definition of the input image.
  • “sharpness” refers to, for example, the clarity of each detail shadow pattern and its boundary in an image. The higher the clarity, the better the perception effect of the human eye.
  • the definition of the target output image being higher than that of the input image means, for example, that the input image is processed by the image processing method provided in the embodiments of the present disclosure, such as denoising and/or deblurring processing, so that the target output image obtained after processing is sharper than the input image.
  • the trained first neural network is obtained by training the first neural network to be trained by the first training method.
  • FIG. 3 is a schematic diagram of the first training method provided in the embodiment of the present disclosure. As shown in FIG. 3, the first training method includes:
  • the trained second neural network has more parameters than the to-be-trained first neural network.
  • FIG. 4 is a schematic diagram of a network architecture including a first neural network to be trained and a trained second neural network provided in an embodiment of the present disclosure.
  • the first neural network 10 to be trained includes: a plurality of first feature extraction sub-networks ML1 and the first output sub-network OL1 located after the plurality of first feature extraction sub-networks ML1
  • the trained second neural network 20 includes: a plurality of second feature extraction sub-networks ML2 and a second output sub-network OL2 located after the plurality of second feature extraction sub-networks ML2; the first feature extraction sub-networks ML1 correspond one-to-one with the second feature extraction sub-networks ML2.
  • the first neural network 10 to be trained includes: a plurality of first upsampling layers 13, a plurality of first downsampling layers 12, and a plurality of single-layer convolutional layers 11, each first upsampling layer 13 and each first downsampling layer 12 being located between two single-layer convolutional layers 11; the input data of the i-th single-layer convolutional layer 11 counted from the end includes the superposition of the output data of the i-th first upsampling layer 13 counted from the end and the output data of the i-th single-layer convolutional layer 11 counted from the front.
  • the number of single-layer convolutional layers 11 is an even number, and i is greater than 0 and less than half of the number of single-layered convolutional layers.
  • the second neural network 20 includes: a plurality of second upsampling layers 23, a plurality of second downsampling layers 22 and a plurality of residual blocks 21; the plurality of second upsampling layers 23 correspond one-to-one with the plurality of first upsampling layers 13, the plurality of second downsampling layers 22 correspond one-to-one with the plurality of first downsampling layers 12, and the plurality of residual blocks 21 correspond one-to-one with the plurality of single-layer convolutional layers 11; the input data of the i-th residual block 21 counted from the end includes the superposition of the output data of the i-th second upsampling layer 23 counted from the end and the output data of the i-th residual block 21 counted from the front.
  • the single-layer convolution layer 11 , the first up-sampling layer 13 and the first down-sampling layer 12 all use 3*3 convolution kernels, and the number of convolution kernels is 128.
  • the sampling magnification of the second upsampling layer 23 is the same as that of the first upsampling layer 13
  • the sampling magnification of the second downsampling layer 22 is the same as that of the first downsampling layer 12 .
  • both the first upsampling layer 13 and the first downsampling layer 12 use a sampling factor of 2.
  • the first downsampling layer 12 and the second downsampling layer 22 may include an inverse Muxout layer, a Strided Convolution, a Maxpool Layer, or a standard per-channel downsampler (such as bicubic interpolation) .
  • the first upsampling layer 13 and the second upsampling layer 23 may include Muxout layers, Strided Transposed Convolution, or standard per-channel upsamplers (eg, bicubic interpolation).
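  • To make the layer arrangement concrete, the following is a PyTorch sketch of a toy instance of the first neural network 10: four single-layer convolutional layers, one downsampling layer and one upsampling layer, with the last convolutional layer receiving the superposition of the upsampling output and the first convolutional layer's output. The 3*3 kernels, the 128-channel width and the 2x sampling factor follow the text; the use of a strided convolution and a transposed convolution for resampling, the layer count and the 3-channel output layer are illustrative assumptions.

```python
# Toy instance of the first (lightweight) network:
# conv1 -> down -> conv2 -> conv3 -> up -> conv4 -> output conv,
# where conv4 receives up(conv3) + conv1 (the i-th-from-last skip addition).
import torch
import torch.nn as nn

class FirstNetworkToy(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        conv = lambda cin, cout: nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())
        self.conv1 = conv(3, channels)                                   # single-layer conv 11 (1st)
        self.down1 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # first downsampling layer 12 (2x)
        self.conv2 = conv(channels, channels)                            # single-layer conv 11 (2nd)
        self.conv3 = conv(channels, channels)                            # single-layer conv 11 (3rd)
        self.up1 = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)  # first upsampling layer 13 (2x)
        self.conv4 = conv(channels, channels)                            # single-layer conv 11 (4th, i = 1 from the end)
        self.out = nn.Conv2d(channels, 3, 3, padding=1)                  # first output sub-network OL1

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(self.down1(x1))
        x3 = self.conv3(x2)
        x4 = self.conv4(self.up1(x3) + x1)   # superposition of upsampled data and conv1 output
        return self.out(x4)

print(FirstNetworkToy()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```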
  • FIG. 5 is an example diagram of a residual block.
  • each residual block 21 includes three sub-residual blocks 21 a connected in sequence.
  • Each sub-residual block 21a adopts two convolutional layers with 3*3 convolution kernels, with an activation layer connected between the two convolutional layers.
  • In each sub-residual block 21a, the input is superimposed on the output of the last convolutional layer, and the result serves as the output of the sub-residual block 21a.
  • the activation layer includes an activation function, and the activation function may include a rectified linear unit (ReLU) function, a sigmoid function (Sigmoid function), a hyperbolic tangent function (tanh function), and the like.
  • the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can also be included in the convolutional layer.
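  • For illustration, a minimal PyTorch sketch of such a residual block might look as follows; the 32-channel width matches the example channel count of the second feature extraction sub-networks, and the class names are illustrative assumptions.

```python
# Sketch of one residual block 21: three sub-residual blocks in sequence.
# Each sub-residual block uses two 3*3 convolutions with an activation layer
# in between, and its input is added to the output of the last convolution.
import torch
import torch.nn as nn

class SubResidualBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # input superimposed on the last conv's output

class ResidualBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.blocks = nn.Sequential(*[SubResidualBlock(channels) for _ in range(3)])

    def forward(self, x):
        return self.blocks(x)
```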
  • In the first convolutional network 10, each residual block 21 of the second neural network 20 is replaced by a single-layer convolutional layer 11, thereby reducing the amount of parameters of the first convolutional network 10.
  • the first feature extraction sub-network ML1 includes: a first upsampling layer 13, or a first downsampling layer 12, or a single-layer convolutional layer 11; the first output sub-network OL1 includes a single-layer convolutional layer 11;
  • the second feature extraction sub-network ML2 includes: a second up-sampling layer 23 , or a second down-sampling layer 22 , or a residual block 21 ; the second output sub-network OL2 includes a residual block 21 .
  • the number of channels of the output image of the first feature extraction sub-network ML1 is greater than the number of channels of the output image of the second feature extraction sub-network ML2.
  • the number of channels of the output image of the first feature extraction sub-network ML1 is 128, and the number of channels of the output image of the second feature extraction sub-network ML2 is 32.
  • the image input to each network layer is represented by a matrix, and the image received by the first layer in the neural network can be an image matrix with three channels of R, G, and B, that is, , the image matrix for each channel represents the data for the red, green, or blue components of the image.
  • Each network layer is used to extract features from the image.
  • the output data of the network layer includes multiple matrices, and each matrix represents a channel of the image.
  • the second neural network to be trained and the discriminant network to be trained are alternately trained, so as to compete with each other to obtain the best model.
  • the trained second neural network is configured to transform a received image with a first definition into an image with a second definition, the second definition being greater than the first definition.
  • the trained discriminant network is configured to determine the matching degree between the output result of the second neural network and the preset standard image, and the matching degree is between 0 and 1. Among them, when the second neural network to be trained is trained, the parameters of the current second neural network are adjusted so that after the output result of the second neural network after parameter adjustment is input into the current discriminant network, the output of the discriminant network is as close to 1 as possible.
  • the parameters of the current discriminant network are adjusted so that after the preset standard image is input into the current discriminant network, the output result of the current discriminant network is as close to 1 as possible (that is, the discriminant network determines Its input is a "true” sample), and after the current output of the second neural network enters the discriminant network, the output result of the discriminant network is as close to 0 as possible (that is, the discriminant network determines that its input is a "fake” sample).
  • the discriminant network is continuously optimized to distinguish the output results of the second neural network from the preset standard images as much as possible, and the second neural network is continuously optimized so that its output results are as close as possible to the preset standard image.
  • This method allows two neural networks to compete and improve each time based on the better and better results of the other network to get better and better network models.
  • FIG. 6 is a schematic structural diagram of a trained discriminant network provided in an embodiment of the present disclosure.
  • the trained discriminant network 30 includes a plurality of convolutional layers 31 to 34 and a fully connected layer 35.
  • Each convolutional layer 31-34 adopts a 2-fold downsampling convolutional layer, and an activation layer is connected behind each convolutional layer 31-34.
  • the activation layer includes an activation function, and the activation function can include a rectified linear unit (ReLU) function, a sigmoid function (Sigmoid function) or a hyperbolic tangent function (tanh function), etc.
  • Each convolution layer 31 to 34 uses a 3*3 convolution kernel.
  • the number of channels of the image output by the convolutional layer 31 is 32, the number of channels of the image output by the convolutional layer 32 is 64, the number of channels of the image output by the convolutional layer 33 is 128, and the number of channels of the image output by the convolutional layer 34 is 192.
  • the fully connected layer 35 outputs a vector of 1024*1, and then passes through the activation layer (for example, the activation layer uses sigmoid as the activation function), and outputs a value between 0 and 1.
  • the structures of the trained discriminant network and the discriminant network to be trained (i.e., the number of convolutional layers and the number of convolution kernels in each convolutional layer) are the same; the difference lies in the weights in the convolutional layers.
  • the number of network layers in the trained first neural network 10 and the trained discriminant network 30 in FIG. 4 and FIG. 5 is only an exemplary illustration; in practical applications, the network structure can be adjusted as needed.
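  • For illustration, a PyTorch sketch of a discriminant network with this structure is given below: four 2x-downsampling 3*3 convolutions with channel counts 32/64/128/192, each followed by an activation, then a fully connected layer producing a 1024-dimensional vector and a sigmoid output between 0 and 1. The final 1024-to-1 projection before the sigmoid and the fixed input size are assumptions; the text only states that a value between 0 and 1 is output.

```python
# Sketch of the trained discriminant network 30.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, in_channels=3, input_size=64):
        super().__init__()
        chans = [in_channels, 32, 64, 128, 192]
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU()]
        self.features = nn.Sequential(*layers)                  # conv layers 31-34 with activations
        feat_size = input_size // 16                            # four 2x downsamplings
        self.fc = nn.Linear(192 * feat_size * feat_size, 1024)  # fully connected layer 35
        self.head = nn.Sequential(nn.Linear(1024, 1), nn.Sigmoid())

    def forward(self, x):
        f = self.features(x)
        return self.head(self.fc(f.flatten(1)))                 # matching degree in (0, 1)

print(Discriminator()(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 1])
```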
  • the original video can be compressed at a low bit rate (for example, the compression bit rate is 1 Mbps) to obtain a compressed video, and each frame of image in the compressed video can be used as the first image with noise Sample image, the noise may be Gaussian noise.
  • the total loss includes the first loss, the second loss and the third loss.
  • the first loss is obtained based on the difference between the first output image and the second output image; the second loss is obtained based on the difference between the first discrimination result and the first target result.
  • the third loss is obtained based on the difference between the output image of at least one first feature extraction sub-network and the output image of the corresponding second feature extraction sub-network.
  • the output of the trained discriminant network 30 is a matching degree between 0 and 1.
  • the first target result is a matching degree close to 1 or equal to 1.
  • adjusting the parameters of the first neural network according to the total loss refers to adjusting the parameters of the first neural network so that when the first training method is performed multiple times, the value of the total loss tends to decrease as a whole.
  • the execution times of the first training method may be preset, or, when the total loss is less than a preset value, the first training method is not performed. It should also be noted that, in different times of the first training method, the used first sample images may be different.
  • the difference between the two images is the difference in the low-frequency information of the two images, which can be characterized by L1 loss value, mean square error (MSE), similarity (SSIM), and the like.
  • the first loss includes the L1 loss of the first output image and the second output image, and may specifically be $x_1\,Loss1$, where $x_1$ is a preset weight and Loss1 is the L1 loss of the first output image and the second output image, i.e., $Loss1 = \left\|y1 - y2\right\|_1$, where y1 is the first output image and y2 is the second output image.
  • the second loss includes a cross-entropy loss between the first discrimination result and the first target result, specifically $x_2\,Loss2$, where $x_2$ is a preset weight and Loss2 is the cross-entropy loss between the first discrimination result of the discriminant network and the first target result; specifically, $Loss2 = -[P\log P' + (1-P)\log(1-P')]$, where P is the first target result and P' is the first discrimination result.
  • step S23 specifically includes: setting the first output image with the ground truth label, and providing the first output image with the ground truth label to the trained discriminant network, so that the discriminant network outputs the first output image Discrimination results.
  • the true value label is used to indicate that the image is a "true" sample
  • the first target result is the probability corresponding to the true value label.
  • the first target result is 1.
  • the third loss is specifically obtained based on a difference between the transformed image of the output image of the at least one first feature extraction sub-network and the output image of the corresponding second feature extraction sub-network.
  • the network architecture including the first neural network and the second neural network also includes a plurality of dimensionality reduction layers 40 .
  • the dimensionality reduction layers 40 are in one-to-one correspondence with the first feature extraction sub-networks ML1, and each dimensionality reduction layer 40 is configured to perform channel dimension reduction on the output image of the corresponding first feature extraction sub-network to generate an intermediate image; the number of channels of the intermediate image is the same as the number of channels of the output image of the second feature extraction sub-network.
  • the first training method also includes: providing a plurality of output images of the second feature extraction sub-networks to a plurality of dimensionality reduction layers in a one-to-one correspondence, so that each dimensionality reduction layer generates an intermediate image; the number of channels of the intermediate image is the same as The output images of the first feature extraction sub-network have the same number of channels.
  • step S24 the parameters of the first neural network and the dimensionality reduction layer are adjusted.
  • the third loss is obtained based on the sum of the differences between each of the intermediate images and the corresponding output image of the first feature extraction sub-network.
  • the difference between the intermediate image and the output image of the second feature extraction sub-network is represented by the L2 loss of both.
  • the third loss is $x_3\,Loss3$, where Loss3 is the sum of the L2 losses between the output image of each first feature extraction sub-network and the corresponding intermediate image. Specifically, Loss3 is calculated according to the following formula:

    $$Loss3 = \sum_{n=1}^{T}\left\|S_n(z)-f(G_n(z))\right\|_2$$

  • where $x_3$ is a preset weight, T is the number of first feature extraction sub-networks, $S_n(z)$ is the output image of the n-th feature extraction sub-network in the first neural network, $G_n(z)$ is the output image of the n-th feature extraction sub-network in the second neural network, and $f(G_n(z))$ is the intermediate image output by the dimensionality reduction layer corresponding to the n-th feature extraction sub-network.
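  • A minimal PyTorch sketch of this feature-matching term follows. It assumes the dimensionality reduction layers are implemented as 1x1 convolutions whose output channel count matches the corresponding feature map of the other network, and it takes the L2 loss as a mean squared error; the function and parameter names are illustrative.

```python
# Sketch of the third loss: each second-network feature map G_n(z) is passed
# through its dimensionality reduction layer f, then compared with the
# corresponding first-network feature map S_n(z); the per-pair losses are summed.
import torch
import torch.nn as nn

def make_reduction_layers(second_channels, first_channels):
    # one assumed 1x1 convolution per feature-extraction sub-network pair
    return nn.ModuleList(
        [nn.Conv2d(c2, c1, kernel_size=1) for c2, c1 in zip(second_channels, first_channels)]
    )

def loss3(first_feats, second_feats, reduction_layers):
    """first_feats: list of S_n(z); second_feats: list of G_n(z)."""
    total = 0.0
    for s_n, g_n, f in zip(first_feats, second_feats, reduction_layers):
        total = total + torch.mean((s_n - f(g_n)) ** 2)   # L2 loss taken as MSE here
    return total
```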
  • compared with the trained second neural network, the trained first neural network is simplified: it has fewer parameters and a simpler network structure, so that the trained first neural network occupies fewer resources (e.g., computing resources, storage resources, etc.) during its operation, and thus can be applied to lightweight terminals.
  • the first loss is obtained based on the difference between the output result of the first neural network and the output result of the second neural network
  • the second loss is obtained based on the difference between the discrimination result of the trained discriminant network and the first target result
  • the third loss is obtained based on the difference between the output image of at least one first feature extraction sub-network and the output image of the corresponding second feature extraction sub-network, so that the performance of the trained first neural network is as similar as possible to that of the second neural network. Therefore, the embodiments of the present disclosure can reduce the parameters of the image processing model on the premise of ensuring the image processing effect, thereby improving the image processing speed.
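  • The three losses just summarized can be combined into a single training objective for the first (lightweight) network. The sketch below reuses the loss3 helper and reduction layers from the sketch above; the weights x1 to x3 and the function signature are illustrative assumptions, and the optional fourth perceptual term described next can be added in the same way.

```python
# Sketch of the total loss used in step S24 to update the first neural network.
import torch
import torch.nn.functional as F

def total_loss(y1, y2, disc_out, first_feats, second_feats, reduction_layers,
               x=(1.0, 1e-3, 1.0)):
    loss1 = x[0] * F.l1_loss(y1, y2)                        # first loss: L1(y1, y2)
    loss2 = x[1] * F.binary_cross_entropy(                  # second loss: cross-entropy against label 1
        disc_out, torch.ones_like(disc_out))
    return loss1 + loss2 + x[2] * loss3(first_feats, second_feats, reduction_layers)
```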
  • the total loss further includes: a fourth loss, the fourth loss is obtained based on the perceptual loss of the first output image and the second output image.
  • the perceptual loss is used to characterize the difference between the high-frequency information of two images (for example, detailed features such as texture and hair on the image).
  • the fourth loss is $x_4\,L_{perc}(y1, y2)$, where $x_4$ is a preset weight and $L_{perc}(y1, y2)$ is the perceptual loss of the first output image and the second output image, which is calculated according to the following formula:

    $$L_{perc}(y1, y2) = \frac{1}{C_j H_j W_j}\left\|\phi_j(y1)-\phi_j(y2)\right\|_2^2$$

  • where y1 is the first output image, y2 is the second output image, $\phi_j$ is the preset network layer in the trained discriminant network, j is the layer number of the preset network layer in the discriminant network, C is the number of channels of the output image of the preset network layer, H is the height of the output image of the preset network layer, and W is the width of the output image of the preset network layer. It can be understood that $\phi_j(y1)$ is the output image of the preset network layer after the first output image is input into the trained discriminant network, and $\phi_j(y2)$ is the output image of the preset network layer after the second output image is input.
  • the preset network layer may be a convolutional layer whose output image has 128 channels.
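  • A short sketch of this perceptual term is given below, assuming the trained discriminant network exposes its convolutional stack as a `features` module (as in the discriminator sketch above) so that the first j modules play the role of the preset network layer; the normalisation is averaged over the batch.

```python
# Sketch of the fourth (perceptual) loss computed on discriminator features.
import torch

def perceptual_loss(discriminator, y1, y2, j):
    phi = discriminator.features[:j]           # assumed preset network layer(s)
    with torch.no_grad():
        f2 = phi(y2)                            # features of the second (teacher) output
    f1 = phi(y1)                                # features of the first (student) output
    c, h, w = f1.shape[1:]
    return torch.sum((f1 - f2) ** 2) / (f1.shape[0] * c * h * w)
```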
  • FIG. 7 is a flowchart of an optional implementation manner of step S21 provided in the embodiment of the present disclosure.
  • step S21 specifically includes: performing step S21a and step S21b alternately until a preset training condition is reached.
  • the preset training condition is, for example, that the number of times of alternation between step S21a and step S21b reaches a preset number of times.
  • S21a. Provide the second sample image to the current second neural network, so that the second neural network generates a first definition-enhanced image; provide the first definition-enhanced image and the original sample image corresponding to the second sample image to the current discriminant network, and adjust the parameters of the discriminant network according to the loss function of the current discriminant network, so that the output of the parameter-adjusted discriminant network can indicate whether the input of the discriminant network is an output image of the second neural network or the original sample image.
  • S21b providing the third sample image to the current second neural network, so that the second neural network generates a second definition-improved image.
  • the second definition-enhanced image is input into the discriminant network after parameter adjustment, so that the discriminant network after parameter-adjustment generates a second discrimination result based on the second definition-enhanced image.
  • based on the loss function of the second neural network, the parameters of the second neural network are adjusted to obtain an updated second neural network.
  • Taking the n-th execution of step S21a and the n-th execution of step S21b as one round of training, in the first round of training, the current second neural network is the second neural network to be trained; in each round of training after the first round, the current second neural network is the second neural network updated in step S21b of the previous round of training.
  • the current discriminant network is the discriminant network to be trained; in each round of training after the first round, the current discriminant network is the one after the parameters adjusted in step S21a of the previous round of training. discriminant network.
  • the first term in the loss function of the second neural network is based on the difference between the second sharpness-enhanced image and its corresponding original sample image
  • the second term in the loss function of the second neural network Based on the difference between the second discrimination result and the second target result.
  • the first term in the loss function LossG of the second neural network is $\lambda_1\,LossG1$, where $\lambda_1$ is a preset weight and LossG1 is the L1 loss between the second definition-enhanced image and its corresponding original sample image; specifically, $LossG1 = \left\|y - \hat y\right\|_1$, where y is the original sample image corresponding to the second definition-enhanced image and $\hat y$ is the second definition-enhanced image.
  • the second term in the loss function of the second neural network is ⁇ 2 L D , where ⁇ 2 is a preset weight, and LD is the cross-entropy between the second discrimination result and the second target result.
  • the second target result is used to indicate that the input of the discriminant network is the original image corresponding to the second definition-improved image, that is, the input of the discriminant network is used to indicate that the input is a "true" sample.
  • the second target result is 1.
  • the third term in the loss function of the second neural network is $\lambda_3\,\frac{1}{C_j H_j W_j}\left\|\phi_j(y)-\phi_j(\hat y)\right\|_2^2$, where $\lambda_3$ is a preset weight, $\phi_j$ is the preset network layer in the preset optimization network, j is the number of layers of the preset network layer in the preset optimization network, C is the number of channels of the output image of the preset network layer, H is the height of the output image of the preset network layer, and W is the width of the output image of the preset network layer; the preset optimization network adopts the VGG-19 network.
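  • A PyTorch sketch of this three-term loss for the second (teacher) network follows, assuming a torchvision VGG-19 as the preset optimization network; the choice of feature layer (index 36, i.e. up to relu5_4), the loss weights and the omission of ImageNet input normalisation are illustrative assumptions.

```python
# Sketch of the second neural network's loss: L1 term + adversarial
# cross-entropy term + VGG-19 perceptual term.
import torch
import torch.nn as nn
import torchvision

vgg_features = torchvision.models.vgg19(
    weights=torchvision.models.VGG19_Weights.IMAGENET1K_V1).features[:36].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

l1, bce = nn.L1Loss(), nn.BCELoss()

def generator_loss(enhanced, original, disc_out, lambdas=(1.0, 1e-3, 1e-2)):
    l1_term = lambdas[0] * l1(enhanced, original)                     # lambda1 * LossG1
    adv_term = lambdas[1] * bce(disc_out, torch.ones_like(disc_out))  # lambda2 * L_D (target result 1)
    f1, f2 = vgg_features(enhanced), vgg_features(original)
    c, h, w = f1.shape[1:]
    perc_term = lambdas[2] * torch.sum((f1 - f2) ** 2) / (f1.shape[0] * c * h * w)
    return l1_term + adv_term + perc_term
```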
  • it should be noted that, in a neural network, the image output by each network layer is not a visually visible image, but is represented by a matrix; the height of the image can be regarded as the number of rows of the matrix, and the width of the image can be regarded as the number of columns of the matrix.
  • that is, after multiple rounds of training, the image output by the updated second neural network is as close as possible to the original sample image in terms of L1 loss, the perceptual loss between the image output by the second neural network and the original sample image is as small as possible, and at the same time, after the image output by the second neural network is provided to the discriminant network, the output result of the discriminant network is close to 1.
  • the second sample image and the third sample image may be the same.
  • the second sample images are different, and the third sample images are also different.
  • the training step of the discriminant network may be performed first, and the training step of the generation network may also be performed first.
  • lossless compression may be performed on the original video to obtain a lossless compressed video, and image frames in the lossless compressed video may be used as the original sample images; the original video may be compressed at a first bit rate to obtain a low-loss compressed video, and image frames in the low-loss compressed video are used as second sample images or third sample images.
  • the training process of step S21 may employ the Adam optimizer with a learning rate of 1e-4.
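  • The alternating training of step S21 can be sketched as follows, using the Adam optimizer with a learning rate of 1e-4 as stated above. `second_net` is the second neural network, `disc` is the discriminant network, `generator_loss` is the combined loss sketched earlier, and the data loader yielding (compressed frame, original frame) pairs is an assumption; for simplicity the second and third sample images are taken to be the same batch, which the text allows.

```python
# Sketch of the alternating training loop (steps S21a and S21b).
import torch
import torch.nn as nn

def train_gan(second_net, disc, loader, epochs=1):
    opt_g = torch.optim.Adam(second_net.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
    bce = nn.BCELoss()
    for _ in range(epochs):
        for sample, original in loader:
            # S21a: adjust the discriminant network's parameters
            enhanced = second_net(sample).detach()            # first definition-enhanced image
            d_real, d_fake = disc(original), disc(enhanced)
            d_loss = bce(d_real, torch.ones_like(d_real)) + \
                     bce(d_fake, torch.zeros_like(d_fake))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

            # S21b: adjust the second neural network's parameters
            enhanced = second_net(sample)                     # second definition-enhanced image
            g_loss = generator_loss(enhanced, original, disc(enhanced))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```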
  • the trained first neural network has fewer parameters and a simpler network structure than the second neural network, so that the first neural network occupies less resources (for example, computing resources, storage resources, etc.), so it can be applied to lightweight terminals.
  • the training method of the first neural network to be trained can make the performance of the trained first neural network close to that of the trained second neural network; therefore, the image processing method of the embodiments of the present disclosure can obtain images with higher definition while improving the image processing speed.
  • FIG. 8 is an effect diagram before and after image processing using the image processing method according to an embodiment of the present disclosure.
  • the left image in FIG. 8 is the input image before processing, and the right image is the processed target output image.
  • the clarity of the image is improved.
  • compared with the second convolutional network, the parameter amount of the first convolutional network is compressed by a factor of more than 50, and the processing speed is increased by about 15 times.
  • the present disclosure also provides an image processing device, including a memory and a processor, where a computer program is stored in the memory, and when the computer program is executed by the processor, the above-mentioned training method for an image processing model is implemented.
  • the present disclosure also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the above-mentioned training method for an image processing model.
  • the above-mentioned memory and the computer-readable storage medium include, but are not limited to, the following readable media: such as random access memory (RAM), read only memory (ROM), non-volatile random access memory (NVRAM), programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable PROM (EEPROM), Flash Memory, Magnetic or Optical Data Storage, Registers, Disk or Tape, such as Compact Disc (CD) or DVD (Digital Universal Disc) and other non-transitory media.
  • processors include, but are not limited to, general purpose processors, central processing units (CPUs), microprocessors, digital signal processors (DSPs), controllers, microcontrollers, state machines, and the like.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Disclosed are an image processing method, an image processing device and a computer-readable storage medium. The image processing method includes: processing an input image by using a trained first neural network to obtain a target output image. The trained first neural network is obtained by training a first neural network to be trained with a first training method, the first training method including: alternately training a second neural network to be trained and a discriminant network to be trained to obtain a trained second neural network and a trained discriminant network; providing a first sample image to the trained second neural network and to the first neural network to be trained respectively, so that the first neural network to be trained outputs a first output image and the trained second neural network outputs a second output image; providing the first output image to the trained discriminant network, so that the discriminant network generates a first discrimination result; and adjusting the parameters of the first neural network according to a total loss to obtain an updated first neural network.

Description

图像处理方法、图像处理设备和可读存储介质 技术领域
本公开涉及图像处理技术领域,具体涉及一种图像处理方法、图像处理设备和可读存储介质。
背景技术
视频在传输前,由于带宽限制,需要进行压缩编码。而压缩后的视频会产生多种压缩噪声,影响人们在显示终端对视频的观感体验。
深度学习技术的兴起,为视频压缩修复方向带来了技术上的突破。其通过对大量视频数据的训练学习,可以很好地提高修复效果。但深度学习的算法模型一般参数量越大,并且,网络结构越深,处理效果越好,这就会导致计算量过大,无法满足视频在显示终端的实时处理要求。
发明内容
本公开的多个方面提供一种图像处理方法、图像处理设备和可读存储介质。
本公开实施例提供了一种图像处理方法,包括:
利用训练好的第一神经网络对输入图像进行处理,得到目标输出图像;所述目标输出图像的清晰度大于所述输入图像的清晰度;
其中,训练好的所述第一神经网络是对待训练的第一神经网络进行第一训练方法训练得到,所述第一训练方法包括:
对待训练的第二神经网络和待训练的判别网络进行交替训练,得到训练好的第二神经网络和训练好的判别网络;其中,训练好的所述第二神经网络的参数多于待训练的所述第一神经网络的参数;训练好的所述第二神经网络配置为将接收到的、具有第一清晰度的图像变换为具有第二清晰度的图像,所述第二清晰度大于所述第一清晰度;待训练的所述第一神经网络包括:多个第一特征提取子网络和位于所述多个第一特征提取子网络之后的第一输出子网络,训练好的所述第二神经网络包括:多个第二特征提取子网络和位于所述多个第二特征提取子网络之后的第二输出子网络,所述第一特征提取子网络与所述 第二特征提取子网络一一对应;
将第一样本图像分别提供给训练好的所述第二神经网络和待训练的所述第一神经网络,以使待训练的所述第一神经网络输出第一输出图像,训练好的所述第二神经网络输出第二输出图像;
将所述第一输出图像提供给训练好的所述判别网络,以使训练好的所述判别网络生成基于所述第一输出图像的第一判别结果;
根据总损失调整所述第一神经网络的参数,以得到更新后的所述第一神经网络;其中,所述总损失包括第一损失、第二损失和第三损失,所述第一损失是基于所述第一输出图像和所述第二输出图像的差异得到的;所述第二损失是基于所述第一判别结果与第一目标结果的差异得到的;所述第三损失是基于至少一个所述第一特征提取子网络的输出图像与相应的所述第二特征提取子网络的输出图像的差异得到的。
在一些实施例中,所述第一特征提取子网络的输出图像的通道数小于相应的第二特征提取子网络的输出图像的通道数;
所述第一训练方法还包括:将多个所述第二特征提取子网络的输出图像一一对应地提供给多个降维层,以使每个降维层生成中间图像;所述中间图像的通道数与所述第一特征提取子网络的输出图像的通道数相同;
根据总损失函数调整所述第一神经网络的参数,包括:对所述第一神经网络和所述降维层的参数都进行调整;其中,所述第三损失是基于每个所述中间图像与相应的所述第一特征提取子网络的输出图像之间的差异的总和得到的。
在一些实施例中,所述总损失还包括:第四损失,所述第四损失是基于所述第一输出图像与所述第二输出图像的感知损失得到的。
在一些实施例中,所述第一输出图像与所述第二输出图像的感知损失
$L_{perc}(y1, y2)$
根据以下公式计算:
$$L_{perc}(y1, y2) = \frac{1}{C_j H_j W_j}\left\|\phi_j(y1)-\phi_j(y2)\right\|_2^2$$
其中,y1为所述第一输出图像,y2为所述第二输出图像,
$\phi_j$
为训练好的所述判别网络中的预设网络层,j为所述预设网络层在所述 判别网络中的层数,C为所述预设网络层的输出图像的通道数,H为所述预设网络层的输出图像的高度,W为所述预设网络层的输出图像的宽度。
在一些实施例中,所述第一损失包括所述第一输出图像与所述第二输出图像的L1损失。
在一些实施例中,所述第二损失包括所述第一判别结果与第一目标结果的交叉熵损失。
在一些实施例中,所述第三损失项包括每个第一特征提取子网络的输出图像与相应的中间图像的L2损失的总和。
在一些实施例中,将所述第一输出图像提供给训练好的所述判别网络,以使训练好的所述判别网络生成基于所述第一输出图像的第一判别结果,包括:
将所述第一输出图像设置为带有真值标签,并将具有真值标签的第一输出图像提供给训练好的所述判别网络,以使所述判别网络输出第一判别结果。
在一些实施例中,对待训练的第二神经网络和待训练的判别网络进行交替训练的步骤中,对待训练的判别网络进行训练,包括:
将第二样本图像提供给当前的所述第二神经网络,以使当前的所述第二神经网络生成第一清晰度提升图像;
将所述第一清晰度提升图像以及与所述第二样本图像对应的原始样本图像提供给当前的所述判别网络,并根据当前的所述判别网络的损失函数来调节当前的所述判别网络的参数,使得调参后的所述判别网络输出能够表征所述判别网络的输入为所述第二神经网络的输出图像还是所述原始样本图像的判别结果。
在一些实施例中,对待训练的第二神经网络和待训练的判别网络进行交替训练的步骤中,对待训练的第二神经网络进行训练,包括:
将第三样本图像提供给当前的所述第二神经网络,以使当前的所述第二神经网络生成第二清晰度提升图像;
将所述第二清晰度提升图像输入调参后的所述判别网络,以使调参后的所述判别网络生成基于所述第二清晰度提升图像的第二判别结果;
基于当前的所述第二神经网络的损失函数,调整当前的所述第 二神经网络的参数,以得到更新后的第二神经网络;当前的所述第二神经网络的损失函数中的第一项基于所述第二清晰度提升图像与其对应的原始样本图像之间的差异,当前的所述第二神经网络的损失函数中的第二项基于所述第二判别结果与第二目标结果之间的差异。
在一些实施例中,当前的所述第二神经网络的损失函数中的第一项为λ 1LossG1,λ 1为预设的权值,LossG1为所述第二清晰度提升图像与其对应的原始样本图像之间的L1损失;
当前的所述第二神经网络的损失函数中的第二项为λ 2L D,λ 2为预设的权值,L D为所述第二判别结果与所述第二目标结果的交叉熵;
当前的所述第二神经网络的损失函数中的第三项为
$$\lambda_3\,\frac{1}{C_j H_j W_j}\left\|\phi_j(y)-\phi_j(\hat y)\right\|_2^2$$
λ 3为预设的权值,y为所述第二清晰度提升图像所对应的原始样本图像,
$\hat y$
为所述第二清晰度提升图像;
$\phi_j$
为预设优化网络中的预设网络层,j为预设网络层在所述预设优化网络中的层数,C为所述预设网络层的输出图像的通道数,H为所述预设网络层的输出图像的高度,W为所述预设网络层的输出图像的宽度;所述预设优化网络采用VGG-19网络。
在一些实施例中,待训练的所述第一神经网络包括:多个第一上采样层、多个第一下采样层和多个单层卷积层,每个所述第一上采样层和每个所述第一下采样层均位于两个所述单层卷积层之间;倒数第i个所述单层卷积层的输入数据包括倒数第i个所述第一上采样层的输出数据和正数第i个所述单层卷积层的输出数据的叠加;其中,所述单层卷积层的数量为偶数,i大于0且小于所述单层卷积层的数量的一半;
训练好的所述第二神经网络包括:多个第二上采样层、多个第二下采样层和多个残差块,所述多个第二上采样层与所述多个第一上采样层一一对应,所述多个第二下采样层与所述多个第一下采样层一一对应,所述多个残差块与所述多个单层卷积层一一对应;倒数第i个所述残差块的输入数据为倒数第i个所述第二上采样层的输出数据和正数第i个所述残差块的输出数据的叠加;
所述第一特征提取子网络包括:所述第一上采样层、或者所述第一下采样层、或者所述单层卷积层;所述第一输出子网络包括所述单层卷积层;所述第二特征提取子网络包括:所述第二上采样层、或者所述第二下采样层、或者所述残差块;所述第二输出子网络包括所述残差块。
本公开实施例还提供一种图像处理设备,包括存储器和处理器,所述存储器上存储有计算机程序,其中,所述计算机程序被所述处理器执行时实现上述的图像处理方法。
本公开实施例还提供一种计算机可读存储介质,其上存储有计算机程序,其中,该计算机程序被处理器执行时实现上述的图像处理方法。
附图说明
附图是用来提供对本公开的进一步理解,并且构成说明书的一部分,与下面的具体实施方式一起用于解释本公开,但并不构成对本公开的限制。在附图中:
图1为一种卷积神经网络的示意图。
图2为本公开实施例中提供的图像处理方法示意图。
图3为本公开实施例中提供的第一训练方法的示意图。
图4为本公开实施例中提供的包括第一神经网络和第二神经网络的网络架构示意图。
图5为残差块的示例图。
图6为本公开实施例中提供的训练好的判别网络的结构示意图。
图7为本公开实施例中提供的步骤S21的一种可选实现方式流程图。
图8为采用本公开实施例的图像处理方法对图像处理前后的效果图。
具体实施方式
以下结合附图对本公开的具体实施方式进行详细说明。应当理解的是,此处所描述的具体实施方式仅用于说明和解释本公开,并不用于限制本公开。
视频在传输前,由于带宽限制,需要进行压缩编码。而压缩后的视频会产生多种压缩噪声,影响人们在显示终端对视频的观感体验。基于深度学习的视频压缩修复技术,能够提高对视频压缩噪声的修复效果。但是,深度学习的算法模型的参数量较大,从而造成显示终端的计算量过大。
深度学习系统的主要组成部分是卷积神经网络,图1为一种卷积神经网络的示意图。该卷积神经网络可以用于图像处理,其使用图像作为输入和输出,并通过滤波器(即,卷积)替代标量权重。图1中仅示出了具有3层结构的卷积神经网络,本公开的实施例对此不作限制。如图1所示,卷积神经网络包括输入层101、隐藏层102和输出层103。在输入层101输入4个输入图像,在中间的隐藏层102存在3个单元以输出3个输出图像,而在输出层103存在2个单元以输出2个输出图像。
如图1所示,卷积层具有权重w ij k和偏置b i k,权重w ij k表示卷积核,偏置是叠加到卷积层的输出的标量,其中,k是表示输入层101号的标签,i和j分别是输入层101的单元和隐藏层102的单元的标签。例如,第一卷积层201包括第一组卷积核(图1中的w ij 1)和第一组偏置(图1中的b i 1)。第二卷积层202包括第二组卷积核(图1中的w ij 2)和第二组偏置(图1中的b i 2)。通常,每个卷积层包括数十个或数百个卷积核,若卷积神经网络为深度卷积神经网络,则可以包括至少五层卷积层。
如图1所示,该卷积神经网络还包括第一激活层203和第二激活层204。第一激活层203位于第一卷积层201之后,第二激活层204位于第二卷积层202之后。激活层包括激活函数,激活函数用于给卷积神经网络引入非线性因素,以使卷积神经网络可以更好地解决较为复杂的问题。激活函数可以包括线性修正单元(ReLU)函数、S型函数(Sigmoid函数)或双曲正切函数(tanh函数)等。激活层可以单独作为卷积神经网络的一层,或者激活层也可以被包含在卷积层中。
图1的卷积神经网络可以用于提高图像的清晰度,经过训练的卷积神经网络将输入的低清晰度图像进行清晰度的提升,得到高清晰度的图像。卷积神经网络的训练过程即为对卷积神经网络的参数的优化过程。其中,卷积神经网络的损失有助于优化卷积神经网络的参数 (权重),训练过程的目标是通过优化神经网络的参数来最大程度地减小神经网络的损失。其中,神经网络的损失用于衡量网络模型预测的好坏,即,用来表现预测结果与实际数据的差距程度。
本公开实施例提供一种图像处理方法,图2为本公开实施例中提供的图像处理方法示意图,如图2所示,图像处理方法包括:S10、利用训练好的第一神经网络对输入图像进行处理,得到目标输出图像。目标输出图像的清晰度高于输入图像的清晰度。
需要说明的是,本公开的实施例中,“清晰度”例如是指图像中各细部影纹及其边界的清晰程度,清晰度越高,人眼的感观效果越好。目标输出图像的清晰度高于输入图像的清晰度,例如是指采用本公开实施例提供的图像处理方法对输入图像进行处理,例如进行去噪和/或去模糊处理,从而使处理后得到的目标输出图像比输入图像更清晰。
在本公开实施例中,其中,训练好的所述第一神经网络是对待训练的第一神经网络进行第一训练方法训练得到,图3为本公开实施例中提供的第一训练方法的示意图,如图3所示,第一训练方法包括:
S21、对待训练的第二神经网络和待训练的判别网络进行交替训练,得到训练好的第二神经网络和训练好的判别网络。
在本公开实施例中,训练好的第二神经网络的参数多于待训练的第一神经网络的参数。图4为本公开实施例中提供的包括待训练的第一神经网络和训练好的第二神经网络的网络架构示意图,待训练的第一神经网络10包括:多个第一特征提取子网络ML1和位于多个第一特征提取子网络ML1之后的第一输出子网络OL1,训练好的第二神经网络20包括:多个第二特征提取子网络ML2和位于多个第二特征提取子网络ML2之后的第二输出子网络OL2,第一特征提取子网络ML1与第二特征提取子网络ML2一一对应。
在一些实施例中,待训练的第一神经网络10包括:多个第一上采样层13、多个第一下采样层12和多个单层卷积层11,每个第一上采样层13和每个第一下采样层12均位于两个单层卷积层11之间;倒数第i个单层卷积层11的输入数据包括倒数第i个第一上采样层13的输出数据和正数第i个单层卷积层11的输出数据的叠加。其中,单层卷积层11的数量为偶数,i大于0且小于单层卷积层的数量的 一半。第二神经网络20包括:多个第二上采样层23、多个第二下采样层22和多个残差块21,多个第二上采样层23与多个第一上采样层13一一对应,多个第二下采样层22与多个第一下采样层12一一对应,多个残差块21与多个单层卷积层11一一对应;倒数第i个残差块21的输入数据包括倒数第i个第二上采样层23的输出数据和正数第i个残差块21的输出数据的叠加。
单层卷积层11、第一上采样层13和第一下采样层12均采用3*3的卷积核,卷积核数均为128。第二上采样层23的采样倍率与第一上采样层13相同,第二下采样层22的采样倍率与第一下采样层12相同。示例性地,第一上采样层13和第一下采样层12均为2倍采样。第一下采样层12和第二下采样层22可以包括反向Muxout层、条纹卷积(Strided Convolution)、最大池化层(Maxpool Layer)或标准的每通道下采样器(如双三次插值)。第一上采样层13和第二上采样层23可以包括Muxout层、条纹反向卷积(Strided Transposed Convolution)或标准的每通道上采样器(如双三次插值)。
图5为残差块的示例图,如图5所示,每个残差块21包括三个依次连接的子残差块21a。每个子残差块21a采用两个具有3*3卷积核的卷积层,两个卷积层之间连接有激活层,每个子残差块21a中,其输入叠加到最后一个卷积层的输出结果上,从而作为子残差块21a的输出。激活层包括激活函数,激活函数可以包括线性修正单元(ReLU)函数、S型函数(Sigmoid函数)或双曲正切函数(tanh函数)等。激活层可以单独作为卷积神经网络的一层,或者激活层也可以被包含在卷积层中。在第一卷积网络10中,以单层卷积层11代替第二神经网络20中的残差块21,从而减少第一卷积网络10的参数量。
第一特征提取子网络ML1包括:第一上采样层13、或者第一下采样层12、或者所述单层卷积层11;所述第一输出子网络OL1包括单层卷积层21;所述第二特征提取子网络ML2包括:第二上采样层23、或者第二下采样层22、或者残差块21;第二输出子网络OL2包括残差块21。另外,第一特征提取子网络ML1的输出图像的通道数大于第二特征提取子网络ML2的输出图像的通道数。示例性地,第一特征提取子网络ML1的输出图像的通道数为128,第二特征提取 子网络ML2的输出图像的通道数为32。需要说明的是,在神经网络中,输入每个网络层的图像均是以矩阵进行表示的,神经网络中的第一层接收到的图像可以为R、G、B三通道的图像矩阵,即,每个通道的图像矩阵表示图像的红色分量、绿色分量或蓝色分量的数据。而每个网络层用于对图像进行特征提取,经过特征提取后,网络层的输出数据包括多个矩阵,每个矩阵即表示图像的一个通道。
在本公开实施例中,待训练的第二神经网络和待训练的判别网络进行交替训练,从而相互竞争,获得最佳模型。具体地,训练好的第二神经网络配置为将接收到的、具有第一清晰度的图像变换为具有第二清晰度的图像,第二清晰度大于第一清晰度。训练好的判别网络配置为,确定第二神经网络的输出结果与预设标准图像的匹配度,该匹配度在0~1之间。其中,对待训练的第二神经网络进行训练时,通过调整当前的第二神经网络的参数,以使参数调整后的第二神经网络的输出结果输入当前的判别网络后,判别网络输出尽量接近1的匹配度;对待训练的判别网络进行训练时,通过调整当前的判别网络的参数,以使得预设标准图像输入当前的判别网络后,当前的判别网络输出结果尽量接近1(即,判别网络判定其输入为“真”样本),且当前的第二神经网络的输出结果进入判别网络后,判别网络输出结果尽量接近0(即,判别网络判定其输入为“假”样本)。通过第二神经网络和判别网络的交替训练,使得判别网络不断优化,以尽量判别区分开第二神经网络的输出结果与预设标准图像,而第二神经网络不断优化,以使输出结果尽可能接近预设标准图像。这种方法使得两个神经网络在每次训练中基于另一网络越来越好的结果而进行竞争和不断改进,以得到越来越优的网络模型。
图6为本公开实施例中提供的训练好的判别网络的结构示意图,如图6所示,训练好的判别网络30包括多个卷积层31~34和全连接层35,示例性地,每个卷积层31~34采用2倍下采样卷积层,每个卷积层31~34后面连接一个激活层,激活层包括激活函数,激活函数可以包括线性修正单元(ReLU)函数、S型函数(Sigmoid函数)或双曲正切函数(tanh函数)等。每个卷积层31~34中采用3*3的卷积核,卷积层31输出的图像的通道数为32,卷积层32输出的图像的通道数为64,卷积层33输出的图像的通道数为128,卷积层34输出 的图像的通道数为192。全连接层35输出1024*1的向量,之后通过激活层(例如,该激活层采用sigmoid作为激活函数)后,输出0~1之间的值。
应当理解的是,训练好的判别网络和待训练的判别网络的结构(即,卷积层数,卷积层中的卷积核数量)相同,区别在于卷积层中的权重不同。
需要说明的是,图4、图5中的训练好的第一神经网络10和训练好的判别网络30中的网络层的层数仅为示例性说明,在实际应用中,可以根据需要对网络结构进行调整。
S22、将第一样本图像分别提供给训练好的第二神经网络和待训练的第一神经网络,以使待训练的第一神经网络输出第一输出图像,训练好的第二神经网络输出第二输出图像。
在一些示例中,可以对原始视频进行低码率的压缩(例如,压缩码率为1Mbps),得到压缩后的视频,压缩后的视频中的每一帧图像均可以作为带有噪声的第一样本图像,所述噪声可以为高斯噪声。
S23、将第一输出图像提供给训练好的判别网络,以使训练好的判别网络生成基于第一输出图像的第一判别结果。
S24、根据总损失调整第一神经网络的参数,以得到更新后的第一神经网络。其中,总损失包括第一损失、第二损失和第三损失,第一损失是基于第一输出图像和第二输出图像的差异得到的;第二损失是基于第一判别结果与第一目标结果的差异得到的;第三损失是基于至少一个第一特征提取子网络的输出图像与相应的第二特征提取子网络的输出图像的差异得到的。
如上所述,训练好的判别网络30的输出为0~1之间的匹配度,这种情况下,第一目标结果为接近1或等于1的匹配度。
需要说明的是,“根据总损失调整第一神经网络的参数”是指,调整第一神经网络的参数,使得在多次进行第一训练方法时,总损失的值整体上呈减小的趋势。第一训练方法的执行次数可以预先设定,或者,当总损失小于预设值时,不再进行第一训练方法。还需要说明的是,不同次的第一训练方法中,所利用的第一样本图像可以不同。
在本公开实施例中,两个图像之间的差异为两个图像的低频信息的差异,其可以采用L1损失值、均方误差(MSE)、相似度(SSIM) 等来进行表征。
在一些实施例中,第一损失包括第一输出图像与第二输出图像的L1损失,具体可为x 1Loss1,其中,x 1为预设的权值,Loss1为第一输出图像与第二输出图像的L1损失,即,Loss1=||y1-y2|| 1,y1为第一输出图像,y2为第二输出图像。
在一些实施例中,第二损失包括第一判别结果与第一目标结果的交叉熵损失,具体为x 2Loss2,x 2为预设的权值,Loss2为判别网络的第一判别结果与第一目标结果的交叉熵损失。
具体地,Loss2=-[PlogP’+(1-P)log(1-P’)],其中,P为第一目标结果,P’为第一判别结果。在一些实施例中,步骤S23具体包括:将第一输出图像设置为带有真值标签,并将具有真值标签的第一输出图像提供给训练好的判别网络,以使判别网络输出第一判别结果。其中,真值标签用于表示图像为“真”样本,第一目标结果为真值标签对应的概率。例如,第一目标结果为1。
在一些实施例中,第三损失具体是基于至少一个第一特征提取子网络的输出图像的变换图像与相应的第二特征提取子网络的输出图像的差异得到的。例如,包括第一神经网络和第二神经网络的网络架构还包括多个降维层40。降维层40与第一特征提取子网络ML1一一对应,降维层40配置为对相应的第一特征提取子网络的输出图像进行通道降维,生成中间图像;中间图像的通道数与所述第二特征提取子网络的输出图像的通道数相同。
第一训练方法还包括:将多个所述第二特征提取子网络的输出图像一一对应地提供给多个降维层,以使每个降维层生成中间图像;中间图像的通道数与第一特征提取子网络的输出图像的通道数相同。这种情况下,在步骤S24中,对所述第一神经网络和所述降维层的参数都进行调整。其中,第三损失是基于每个所述中间图像与相应的所述第一特征提取子网络的输出图像之间的差异的总和得到的。
在一些实施例中,中间图像与第二特征提取子网络的输出图像之间的差异以二者的L2损失来表示。第三损失为x 3Loss3,其中,Loss3为每个第一特征提取子网络的输出图像与相应的中间图像的L2损失的总和。具体地,Loss3根据以下公式计算:
$$Loss3 = \sum_{n=1}^{T}\left\|S_n(z)-f(G_n(z))\right\|_2$$
其中,x 3为预设的权值;T为第一特征提取子网络的数量,S n(z)为第一神经网络中第n层第二特征提取子网络的输出图像,G n(z)为第二神经网络中第n层第一特征提取子网络的输出图像,f(G n(z))为第二神经网络中第n层第一特征提取子网络所对应的降维层输出的中间图像。
在本公开实施例中,和训练好的第二神经网络相比,训练好的第一神经网络得到了简化,训练好的第一神经网络具有更少的参数和更简单的网络结构,使得训练好的第一神经网络在其运行时占用较少的资源(例如计算资源、存储资源等),因而可以应用于轻量级的终端。并且,在对待训练的第一神经网络训练时所采用的总损失中,第一损失是基于第一神经网络的输出结果与第二神经网络的输出结果的差异得到的,第二损失是基于训练好的判别网络的判别结果与第一目标结果的差异得到的,第三损失是基于至少一个第一特征提取子网络的输出图像与相应的第二特征提取子网络的输出图像的差异得到的,从而使得训练好的第一神经网络的性能尽量与第二神经网络相同。因此,本公开实施例可以在保证图像处理效果的前提下,减少图像处理模型的参数,从而提高图像处理速度。
在一些实施例中,总损失还包括:第四损失,第四损失是基于第一输出图像与第二输出图像的感知损失得到的。其中,感知损失用于表征两个图像高频信息(例如,图像上的纹理、毛发等细节特征)的差异。
可选地,第四损失为:
$x_4\,L_{perc}(y1, y2)$
x 4为预设的权值。
$L_{perc}(y1, y2)$
为第一输出图像与第二输出图像的感知损失,其根据以下公式计算:
$$L_{perc}(y1, y2) = \frac{1}{C_j H_j W_j}\left\|\phi_j(y1)-\phi_j(y2)\right\|_2^2$$
其中,y1为第一输出图像,y2为第二输出图像,
$\phi_j$
为训练好的所述判别网络中的预设网络层,j为所述预设网络层在所述判别网络中的层数,C为所述预设网络层的输出图像的通道数,H为所述预设网络层的输出图像的高度,W为所述预设网络层的输出图像的宽度。 可以理解的是,
$\phi_j(y1)$
为第一输出图像输入至训练好的判别网络后,预设网络层的输出图像;
$\phi_j(y2)$
为第二输出图像输入至预设优化网络后,预设网络层的输出图像。可选地,预设网络层可以输出图像通道数为128的卷积层。
图7为本公开实施例中提供的步骤S21的一种可选实现方式流程图,如图7所示,步骤S21具体包括:交替进行步骤S21a和步骤S21b,直至达到预设训练条件。预设训练条件例如为:步骤S21a和步骤S21b的交替次数达到预设次数。
S21a、将第二样本图像提供给当前的所述第二神经网络,以使第二神经网络生成第一清晰度提升图像。将第一清晰度提升图像以及与第二样本图像对应的原始样本图像提供给当前的判别网络,并根据当前的判别网络的损失函数来调节判别网络的参数,使得调参后的判别网络输出能够表征判别网络的输入为第二神经网络的输出图像还是原始样本图像的判别结果。
S21b、将第三样本图像提供给当前的第二神经网络,以使第二神经网络生成第二清晰度提升图像。将第二清晰度提升图像输入调参后的判别网络,以使调参后的判别网络生成基于第二清晰度提升图像的第二判别结果。基于第二神经网络的损失函数,调整第二神经网络的参数,以得到更新后的第二神经网络。
需要说明的是,将第n次步骤S21a和第n次步骤S21b作为一轮训练过程,那么,在第一轮训练过程中,当前的第二神经网络是待训练的第二神经网络;在第一轮之后的每轮训练过程中,当前的第二神经网络均是上一轮训练过程的步骤S21b中所更新后的第二神经网络。在第一轮训练过程中,当前的判别网络是待训练的判别网络;在第一轮之后的每轮训练过程中,当前的判别网络均是上一轮训练过程的步骤S21a中所调参后的判别网络。
其中,所述第二神经网络的损失函数中的第一项基于所述第二清晰度提升图像与其对应的原始样本图像之间的差异,所述第二神经网络的损失函数中的第二项基于所述第二判别结果与第二目标结果之间的差异。
在一些实施例中,第二神经网络的损失函数LossG中的第一项 为λ 1LossG1,λ 1为预设的权值,LossG1为第二清晰度提升图像与其对应的原始样本图像之间的L1损失。具体地,
$$LossG1 = \left\|y - \hat y\right\|_1$$
y为第二清晰度提升图像所对应的原始样本图像,
$\hat y$
为所述第二清晰度提升图像。
第二神经网络的损失函数中的第二项为λ 2L D,λ 2为预设的权值,L D为第二判别结果与第二目标结果的交叉熵。其中,第二目标结果用于表征判别网络的输入为第二清晰度提升图像对应的原始图像,即,用于表示判别网络的输入为“真”样本。例如,第二目标结果为1。
第二神经网络的损失函数中的第三项为
$$\lambda_3\,\frac{1}{C_j H_j W_j}\left\|\phi_j(y)-\phi_j(\hat y)\right\|_2^2$$
其中,λ 3为预设的权值。
$\phi_j$
为预设优化网络中的预设网络层,j为预设网络层在所述预设优化网络中的层数,C为所述预设网络层的输出图像的通道数,H为所述预设网络层的输出图像的高度,W为所述预设网络层的输出图像的宽度;所述预设优化网络采用VGG-19网络。需要说明的是,在神经网络中,每层网络层所输出的图像并不是视觉可见的图像,而是以矩阵进行表示的,图像的高度可以看做矩阵的行数,图像的宽度可以看做矩阵的列数。
也就是说,在第二神经网络进行多次训练后,更新后的第二神经网络所输出的图像与原始样本图像的L1损失值尽量接近,并且,第二神经网络所输出的图像与原始样本图像的感知损失尽量接近,同时,第二神经网络所输出的图像提供给判别网络后,判别网络输出的结果接近1。
可选地,在同一轮训练过程中的步骤S21a和S21b中,第二样本图像和第三样本图像可以是相同的。而在不同轮的训练过程中的第二样本图像不同,第三样本图像也不同。
需要说明的是,在每一轮训练过程中,可以先进行判别网络的训练步骤,也可以先进行生成网络的训练步骤。
在一些示例中,可以对原始视频进行无损压缩得到无损压缩视频,将无损压缩视频图像中的图像帧作为原始样本图像;对原始视频进行第一码率的压缩得到低损压缩视频,将低损压缩视频中的图像帧 作为第二样本图像或第三样本图像。
在一些示例中,步骤S21的训练过程可以采用Adam优化器,学习率为1e-4。
在本公开实施例中,训练好的第一神经网络具有比第二神经网络更少的参数和更简单的网络结构,使得第一神经网络在其运行时占用较少的资源(例如计算资源、存储资源等),因而可以应用于轻量级的终端。并且,待训练的第一神经网络的训练方法能够使训练好的第一神经网络的性能与训练好的第二神经网络接近,因此,本公开实施例的图像处理方法能够获得清晰度较高的图像的同时,提高图像处理速度。
图8为采用本公开实施例的图像处理方法对图像处理前后的效果图,图8中的左图为处理前的输入图像,右图为处理后的目标输出图像。如图8所示,经过图像处理后,图像的清晰度提高。并且,和第二卷积网络相比,第一卷积网络的参数量压缩倍数大于50倍,处理速度提高15倍左右。
本公开还提供一种图像处理设备,包括存储器和处理器,所述存储器上存储有计算机程序,所述计算机程序被所述处理器执行时实现上述图像处理模型的训练方法。
本公开还提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述图像处理模型的训练方法。
上述存储器和所述计算机可读存储介质包括但不限于以下可读介质:诸如随机存取存储器(RAM)、只读存储器(ROM)、非易失性随机存取存储器(NVRAM)、可编程只读存储器(PROM)、可擦除可编程只读存储器(EPROM)、电可擦除PROM(EEPROM)、闪存、磁或光数据存储、寄存器、磁盘或磁带、诸如光盘(CD)或DVD(数字通用盘)的光存储介质以及其它非暂时性介质。处理器的示例包括但不限于通用处理器、中央处理单元(CPU)、微处理器、数字信号处理器(DSP)、控制器、微控制器、状态机等。
可以理解的是,以上实施方式仅仅是为了说明本公开的原理而采用的示例性实施方式,然而本公开并不局限于此。对于本领域内的普通技术人员而言,在不脱离本公开的精神和实质的情况下,可以做出各种变型和改进,这些变型和改进也视为本公开的保护范围。

Claims (14)

  1. 一种图像处理方法,包括:
    利用训练好的第一神经网络对输入图像进行处理,得到目标输出图像;所述目标输出图像的清晰度大于所述输入图像的清晰度;
    其中,训练好的所述第一神经网络是对待训练的第一神经网络进行第一训练方法训练得到,所述第一训练方法包括:
    对待训练的第二神经网络和待训练的判别网络进行交替训练,得到训练好的第二神经网络和训练好的判别网络;其中,训练好的所述第二神经网络的参数多于待训练的所述第一神经网络的参数;训练好的所述第二神经网络配置为将接收到的、具有第一清晰度的图像变换为具有第二清晰度的图像,所述第二清晰度大于所述第一清晰度;待训练的所述第一神经网络包括:多个第一特征提取子网络和位于所述多个第一特征提取子网络之后的第一输出子网络,训练好的所述第二神经网络包括:多个第二特征提取子网络和位于所述多个第二特征提取子网络之后的第二输出子网络,所述第一特征提取子网络与所述第二特征提取子网络一一对应;
    将第一样本图像分别提供给训练好的所述第二神经网络和待训练的所述第一神经网络,以使待训练的所述第一神经网络输出第一输出图像,训练好的所述第二神经网络输出第二输出图像;
    将所述第一输出图像提供给训练好的所述判别网络,以使训练好的所述判别网络生成基于所述第一输出图像的第一判别结果;
    根据总损失调整所述第一神经网络的参数,以得到更新后的所述第一神经网络;其中,所述总损失包括第一损失、第二损失和第三损失,所述第一损失是基于所述第一输出图像和所述第二输出图像的差异得到的;所述第二损失是基于所述第一判别结果与第一目标结果的差异得到的;所述第三损失是基于至少一个所述第一特征提取子网络的输出图像与相应的所述第二特征提取子网络的输出图像的差异得到的。
  2. 根据权利要求1所述的图像处理方法,其中,所述第一特征提取子网络的输出图像的通道数小于相应的第二特征提取子网络的 输出图像的通道数;
    所述第一训练方法还包括:将多个所述第二特征提取子网络的输出图像一一对应地提供给多个降维层,以使每个降维层生成中间图像;所述中间图像的通道数与所述第一特征提取子网络的输出图像的通道数相同;
    根据总损失函数调整所述第一神经网络的参数,包括:对所述第一神经网络和所述降维层的参数都进行调整;其中,所述第三损失是基于每个所述中间图像与相应的所述第一特征提取子网络的输出图像之间的差异的总和得到的。
  3. 根据权利要求1所述的图像处理方法,其中,所述总损失还包括:第四损失,所述第四损失是基于所述第一输出图像与所述第二输出图像的感知损失得到的。
  4. 根据权利要求3所述的图像处理方法,其中,所述第一输出图像与所述第二输出图像的感知损失
    $L_{perc}(y1, y2)$
    根据以下公式计算:
    $$L_{perc}(y1, y2) = \frac{1}{C_j H_j W_j}\left\|\phi_j(y1)-\phi_j(y2)\right\|_2^2$$
    其中,y1为所述第一输出图像,y2为所述第二输出图像,
    $\phi_j$
    为训练好的所述判别网络中的预设网络层,j为所述预设网络层在所述判别网络中的层数,C为所述预设网络层的输出图像的通道数,H为所述预设网络层的输出图像的高度,W为所述预设网络层的输出图像的宽度。
  5. 根据权利要求1至4中任意一项所述的图像处理方法,其中,所述第一损失包括所述第一输出图像与所述第二输出图像的L1损失。
  6. 根据权利要求1至4中任意一项所述的图像处理方法,其中,所述第二损失包括所述第一判别结果与第一目标结果的交叉熵损失。
  7. 根据权利要求2至4中任意一项所述的图像处理方法,其中,所述第三损失项包括每个第一特征提取子网络的输出图像与相应的 中间图像的L2损失的总和。
  8. 根据权利要求1至4中任意一项所述的图像处理方法,其中,将所述第一输出图像提供给训练好的所述判别网络,以使训练好的所述判别网络生成基于所述第一输出图像的第一判别结果,包括:
    将所述第一输出图像设置为带有真值标签,并将具有真值标签的第一输出图像提供给训练好的所述判别网络,以使所述判别网络输出第一判别结果。
  9. 根据权利要求1至4中任意一项所述的图像处理方法,其中,对待训练的第二神经网络和待训练的判别网络进行交替训练的步骤中,对待训练的判别网络进行训练,包括:
    将第二样本图像提供给当前的所述第二神经网络,以使当前的所述第二神经网络生成第一清晰度提升图像;
    将所述第一清晰度提升图像以及与所述第二样本图像对应的原始样本图像提供给当前的所述判别网络,并根据当前的所述判别网络的损失函数来调节当前的所述判别网络的参数,使得调参后的所述判别网络输出能够表征所述判别网络的输入为所述第二神经网络的输出图像还是所述原始样本图像的判别结果。
  10. 根据权利要求9所述的图像处理方法,其中,对待训练的第二神经网络和待训练的判别网络进行交替训练的步骤中,对待训练的第二神经网络进行训练,包括:
    将第三样本图像提供给当前的所述第二神经网络,以使当前的所述第二神经网络生成第二清晰度提升图像;
    将所述第二清晰度提升图像输入调参后的所述判别网络,以使调参后的所述判别网络生成基于所述第二清晰度提升图像的第二判别结果;
    基于当前的所述第二神经网络的损失函数,调整当前的所述第二神经网络的参数,以得到更新后的第二神经网络;当前的所述第二神经网络的损失函数中的第一项基于所述第二清晰度提升图像与其对应的原始样本图像之间的差异,当前的所述第二神经网络的损失函 数中的第二项基于所述第二判别结果与第二目标结果之间的差异。
  11. 根据权利要求10所述的图像处理方法,其中,当前的所述第二神经网络的损失函数中的第一项为λ 1LossG1,λ 1为预设的权值,LossG1为所述第二清晰度提升图像与其对应的原始样本图像之间的L1损失;
    当前的所述第二神经网络的损失函数中的第二项为λ 2L D,λ 2为预设的权值,L D为所述第二判别结果与所述第二目标结果的交叉熵;
    当前的所述第二神经网络的损失函数中的第三项为
    $$\lambda_3\,\frac{1}{C_j H_j W_j}\left\|\phi_j(y)-\phi_j(\hat y)\right\|_2^2$$
    λ 3为预设的权值,y为所述第二清晰度提升图像所对应的原始样本图像,
    $\hat y$
    为所述第二清晰度提升图像;
    $\phi_j$
    为预设优化网络中的预设网络层,j为预设网络层在所述预设优化网络中的层数,C为所述预设网络层的输出图像的通道数,H为所述预设网络层的输出图像的高度,W为所述预设网络层的输出图像的宽度;所述预设优化网络采用VGG-19网络。
  12. 根据权利要求1至4中任意一项所述的图像处理方法,其中,待训练的所述第一神经网络包括:多个第一上采样层、多个第一下采样层和多个单层卷积层,每个所述第一上采样层和每个所述第一下采样层均位于两个所述单层卷积层之间;倒数第i个所述单层卷积层的输入数据包括倒数第i个所述第一上采样层的输出数据和正数第i个所述单层卷积层的输出数据的叠加;其中,所述单层卷积层的数量为偶数,i大于0且小于所述单层卷积层的数量的一半;
    训练好的所述第二神经网络包括:多个第二上采样层、多个第二下采样层和多个残差块,所述多个第二上采样层与所述多个第一上采样层一一对应,所述多个第二下采样层与所述多个第一下采样层一一对应,所述多个残差块与所述多个单层卷积层一一对应;倒数第i个所述残差块的输入数据为倒数第i个所述第二上采样层的输出数据和正数第i个所述残差块的输出数据的叠加;
    所述第一特征提取子网络包括:所述第一上采样层、或者所述第一下采样层、或者所述单层卷积层;所述第一输出子网络包括所述单层卷积层;所述第二特征提取子网络包括:所述第二上采样层、或者所述第二下采样层、或者所述残差块;所述第二输出子网络包括所述残差块。
  13. 一种图像处理设备,包括存储器和处理器,所述存储器上存储有计算机程序,其中,所述计算机程序被所述处理器执行时实现权利要求1至12中任意一项所述的图像处理方法。
  14. 一种计算机可读存储介质,其上存储有计算机程序,其中,该计算机程序被处理器执行时实现权利要求1至12中任意一项所述的图像处理方法。
PCT/CN2020/121405 2020-10-16 2020-10-16 图像处理方法、图像处理设备和可读存储介质 WO2022077417A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080002356.2A CN114641792A (zh) 2020-10-16 2020-10-16 图像处理方法、图像处理设备和可读存储介质
PCT/CN2020/121405 WO2022077417A1 (zh) 2020-10-16 2020-10-16 图像处理方法、图像处理设备和可读存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/121405 WO2022077417A1 (zh) 2020-10-16 2020-10-16 图像处理方法、图像处理设备和可读存储介质

Publications (1)

Publication Number Publication Date
WO2022077417A1 true WO2022077417A1 (zh) 2022-04-21

Family

ID=81208702

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/121405 WO2022077417A1 (zh) 2020-10-16 2020-10-16 图像处理方法、图像处理设备和可读存储介质

Country Status (2)

Country Link
CN (1) CN114641792A (zh)
WO (1) WO2022077417A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958556A (zh) * 2023-08-01 2023-10-27 东莞理工学院 用于椎体和椎间盘分割的双通道互补脊柱图像分割方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116128768B (zh) * 2023-04-17 2023-07-11 中国石油大学(华东) 一种带有去噪模块的无监督图像低照度增强方法


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network
WO2019197712A1 (en) * 2018-04-09 2019-10-17 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
CN111767979A (zh) * 2019-04-02 2020-10-13 京东方科技集团股份有限公司 神经网络的训练方法、图像处理方法、图像处理装置
CN111192206A (zh) * 2019-12-03 2020-05-22 河海大学 一种提高图像清晰度的方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958556A (zh) * 2023-08-01 2023-10-27 东莞理工学院 用于椎体和椎间盘分割的双通道互补脊柱图像分割方法
CN116958556B (zh) * 2023-08-01 2024-03-19 东莞理工学院 用于椎体和椎间盘分割的双通道互补脊柱图像分割方法

Also Published As

Publication number Publication date
CN114641792A (zh) 2022-06-17

Similar Documents

Publication Publication Date Title
JP7446997B2 (ja) 敵対的生成ネットワークのトレーニング方法、画像処理方法、デバイスおよび記憶媒体
WO2019120110A1 (zh) 图像重建方法及设备
CN110059796B (zh) 卷积神经网络的生成方法及装置
US10034005B2 (en) Banding prediction for video encoding
US10325346B2 (en) Image processing system for downscaling images using perceptual downscaling method
EP4109392A1 (en) Image processing method and image processing device
US9282330B1 (en) Method and apparatus for data compression using content-based features
US11216910B2 (en) Image processing system, image processing method and display device
Seo et al. A novel just-noticeable-difference-based saliency-channel attention residual network for full-reference image quality predictions
US20220335583A1 (en) Image processing method, apparatus, and system
CN110799995A (zh) 数据识别器训练方法、数据识别器训练装置、程序及训练方法
WO2022077417A1 (zh) 图像处理方法、图像处理设备和可读存储介质
CN110717868B (zh) 视频高动态范围反色调映射模型构建、映射方法及装置
CN111489364B (zh) 基于轻量级全卷积神经网络的医学图像分割方法
CN111507910B (zh) 一种单图像去反光的方法、装置及存储介质
US20230177641A1 (en) Neural network training method, image processing method, and apparatus
TW202141358A (zh) 圖像修復方法及裝置、存儲介質、終端
CN110111266B (zh) 一种基于深度学习去噪的近似信息传递算法改进方法
CN112927137A (zh) 一种用于获取盲超分辨率图像的方法、设备及存储介质
CN109871790B (zh) 一种基于混合神经网络模型的视频去色方法
CN114830168A (zh) 图像重建方法、电子设备和计算机可读存储介质
CN117094910A (zh) 基于非线性无激活网络的生成对抗网络图像去模糊方法
CN112634136B (zh) 一种基于图像特征快速拼接的图像超分辨率方法及其系统
CN111861877A (zh) 视频超分变率的方法和装置
CN109741313A (zh) 独立成分分析和卷积神经网络的无参考图像质量评价方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20957195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20957195

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.02.2024)