WO2020063648A1 - Training method for generative adversarial network, image processing method, device, and storage medium - Google Patents

Training method for generative adversarial network, image processing method, device, and storage medium

Info

Publication number
WO2020063648A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
resolution
network
noise
training
Prior art date
Application number
PCT/CN2019/107761
Other languages
English (en)
French (fr)
Inventor
刘瀚文
朱丹
那彦波
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201811155326.6A (CN109345455B)
Priority claimed from CN201811155252.6A (CN109255390B)
Priority claimed from CN201811155147.2A (CN109360151B)
Priority claimed from CN201811155930.9A (CN109345456B)
Application filed by 京东方科技集团股份有限公司
Priority to EP19864756.2A (EP3859655A4)
Priority to JP2020528931A (JP7446997B2)
Priority to KR1020207014462A (KR102389173B1)
Priority to US16/759,669 (US11449751B2)
Publication of WO2020063648A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/20Linear translation of a whole image or part thereof, e.g. panning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T3/4076Super resolution, i.e. output image resolution higher than sensor resolution by iteratively correcting the provisional high resolution image using the original low-resolution image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T5/70
    • G06T5/90
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination

Definitions

  • The present disclosure relates to, but is not limited to, the field of image processing, and in particular to a training method for a generative adversarial network, an image processing method using a generative adversarial network obtained by the training method, a computer device, and a computer-readable storage medium.
  • The convolutional neural network is a common deep learning network and has been widely used in image processing for image recognition, image classification, and image super-resolution reconstruction.
  • A second-resolution image reconstructed from a first-resolution image (where the resolution of the second-resolution image is greater than that of the first-resolution image) often lacks detail, which makes the second-resolution image look unrealistic.
  • The present disclosure aims to solve at least one of the technical problems in the prior art, and proposes a training method for a generative adversarial network, an image processing method using a generative adversarial network obtained by the training method, a computer device, and a computer-readable storage medium.
  • The present disclosure provides a training method for a generative adversarial network.
  • The generative adversarial network includes a generation network and a discrimination network. The generation network is configured to convert a first-resolution image into a second-resolution image, the resolution of the second-resolution image being greater than that of the first-resolution image. The training method includes a generation network training step, which includes:
  • Providing a first input image and a second input image to the generation network, respectively, to generate a first output image based on the first input image and a second output image based on the second input image; wherein the first input image includes a first-resolution sample image and a first noise image corresponding to a noise sample of a first amplitude; the second input image includes the first-resolution sample image and a second noise image corresponding to a noise sample of a second amplitude; the first amplitude is greater than 0 and the second amplitude is equal to 0;
  • Providing the first output image and a second-resolution sample image to the discrimination network, respectively, the discrimination network outputting a first discrimination result based on the first output image and a second discrimination result based on the second-resolution sample image;
  • Adjusting parameters of the generation network to reduce a loss function of the generation network, wherein the loss function includes a first loss, a second loss, and a third loss; the first loss is based on a reconstruction error between the second output image and the second-resolution sample image; the second loss is based on a perceptual error between the first output image and the second-resolution sample image; and the third loss is based on the first discrimination result and the second discrimination result.
  • The reconstruction error between the second output image and the second-resolution sample image is determined according to any one of: the mean square error between the second output image and the second-resolution sample image, and the structural similarity between the second output image and the second-resolution sample image.
  • $L$ is the total number of resolution-increasing steps in the iterative process, with $L \geq 1$;
  • $LR$ is the first-resolution sample image;
  • $HR^l$ is an image obtained by downsampling the second-resolution sample image to the same resolution as the image generated at the end of the $l$-th resolution-increasing step;
  • $\lambda_1$ is a preset weight.
  • $L_{CX}(\cdot)$ is a perceptual loss calculation function;
  • $\lambda_2$ is a preset weight;
  • $HR^{1,2,\ldots,L}$ are images obtained by downsampling the second-resolution sample image, in one-to-one correspondence with, and at the same resolution as, the images generated at the end of each resolution-increasing step;
  • $\lambda_3$ is a preset weight.
  • $\lambda_1 : \lambda_2 : \lambda_3 = 10 : 0.1 : 0.001$.
  • The noise sample is random noise.
  • The training method further includes a discrimination network training step, which includes: providing the first output image and the second-resolution sample image to the discrimination network, respectively, so that the discrimination network outputs a discrimination result based on the first output image and a discrimination result based on the second-resolution sample image; and adjusting parameters of the discrimination network to reduce a loss function of the discrimination network;
  • The discrimination network training step and the generation network training step are performed alternately until a preset training condition is reached.
  • Both the first output image and the second output image are generated by the generation network through iterative processing made up of resolution-increasing steps, the total number of which is $L$; when $L$ is greater than 1, the generation network generates an intermediate image in each of the first $L-1$ resolution-increasing steps when processing the first input image.
  • Each intermediate image generated by the generation network based on the first input image is also provided to the discrimination network;
  • When the second-resolution sample image is provided to the discrimination network, third-resolution sample images, obtained by downsampling the second-resolution sample image and corresponding one-to-one to the intermediate images at the same resolution, are also provided to the discrimination network.
  • The present disclosure also provides an image processing method that uses the generation network of a generative adversarial network obtained by the training method. The image processing method is used to increase the resolution of an image and includes:
  • Providing an input image and a noise image corresponding to reference noise to the generation network, so that the generation network generates a second-resolution image based on the input image.
  • The amplitude of the reference noise is between 0 and the first amplitude.
  • The reference noise is random noise.
  • The present disclosure also provides a computer device including a memory and a processor, where a computer program is stored in the memory, and the computer program implements the above training method when executed by the processor.
  • The present disclosure also provides a computer-readable storage medium having stored thereon a computer program that implements the above training method when executed by a processor.
  • FIG. 1 is a schematic diagram of the relationship between reconstruction distortion and perceived distortion
  • FIG. 2 is a flowchart of the generation network training step in an embodiment of the present disclosure.
  • FIG. 3 is a schematic structural diagram of a generation network in an embodiment of the present disclosure.
  • Image super-resolution reconstruction is a technique to increase the resolution of the initial image to obtain a higher-resolution image.
  • Reconstruction distortion and perceptual distortion are used to evaluate the quality of super-resolution reconstruction.
  • Reconstruction distortion measures the difference between the reconstructed image and the reference image.
  • Specific evaluation criteria include mean square error (MSE), structural similarity (SSIM), and peak signal-to-noise ratio (PSNR).
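  • As an illustration of two of the reconstruction-distortion criteria named above, the following minimal sketch computes MSE and PSNR for small grayscale images stored as lists of rows. The function names, the 8-bit peak value of 255, and the sample pixel values are assumptions made for the example; SSIM is omitted for brevity.

```python
import math


def mse(a, b):
    """Mean square error between two equally sized grayscale images."""
    diffs = [(x - y) ** 2
             for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 2x2 reference and reconstructed images.
reference = [[0, 255], [255, 0]]
reconstructed = [[0, 250], [255, 10]]
print(mse(reference, reconstructed))
print(psnr(reference, reconstructed))
```

  • A lower MSE (equivalently, a higher PSNR) corresponds to smaller reconstruction distortion; neither value says anything about how natural the image looks.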
  • Perceptual distortion focuses more on whether the image looks like a natural image.
  • FIG. 1 is a schematic diagram showing the relationship between reconstruction distortion and perceived distortion. As shown in FIG. 1, when the reconstruction distortion is small, the perceived distortion is large. At this point, the reconstructed image looks smoother and lacks detail. When the perceived distortion is small, the reconstruction distortion is large. At this time, the details of the reconstructed image are richer. Current image super-resolution reconstruction methods often pursue smaller reconstruction distortions. However, in some
  • The present disclosure provides a training method for a generative adversarial network.
  • The generative adversarial network includes a generation network and a discrimination network.
  • The generation network is used to convert a first-resolution image into a second-resolution image having a target resolution, the resolution of the second-resolution image being greater than that of the first-resolution image.
  • The generation network may obtain the second-resolution image through a single resolution-increasing step or through several iterations of the resolution-increasing step. Taking an image to be processed (that is, the first-resolution image) of 128 * 128 and a target resolution of 1024 * 1024 as an example, the generation network can obtain the 1024 * 1024 second-resolution image through one resolution-increasing step with a factor of 8;
  • alternatively, it can obtain images with resolutions of 256 * 256, 512 * 512, and 1024 * 1024 in sequence by performing three iterations of a resolution-increasing step with a factor of 2.
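  • The arithmetic of the 128 to 1024 example can be checked with a short sketch. The helper name `resolution_schedule` is hypothetical; it simply lists the side length reached after each resolution-increasing step for a given factor.

```python
def resolution_schedule(start, target, factor):
    """Return the side length after each resolution-increasing step."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    sizes = []
    size = start
    while size < target:
        size *= factor
        sizes.append(size)
    if size != target:
        raise ValueError("target is not reachable with this factor")
    return sizes


print(resolution_schedule(128, 1024, 8))  # a single step with factor 8
print(resolution_schedule(128, 1024, 2))  # three steps with factor 2
```

  • The length of the returned list is the total number of resolution-increasing steps for that factor.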
  • The training method for the generative adversarial network includes a generation network training step.
  • FIG. 2 is a flowchart of the generation network training step in an embodiment of the present disclosure. As shown in FIG. 2, the generation network training step includes:
  • In step S1, a first-resolution sample image is obtained from a second-resolution sample image, the resolution of the second-resolution sample image being higher than that of the first-resolution sample image.
  • The first-resolution sample image may be obtained by downsampling the second-resolution sample image.
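  • The text does not say which downsampling method is used at this point. As a stand-in, the sketch below averages non-overlapping blocks (box filtering) of a grayscale image stored as a list of rows; the function name and the sample values are assumptions for illustration only.

```python
def downsample(image, factor):
    """Shrink a grayscale image by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + i][c + j]
                     for i in range(factor)
                     for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Hypothetical 4x4 second-resolution sample reduced to a 2x2 first-resolution sample.
hr_sample = [[0, 2, 4, 6],
             [2, 4, 6, 8],
             [8, 8, 0, 0],
             [8, 8, 0, 0]]
lr_sample = downsample(hr_sample, 2)
print(lr_sample)
```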
  • The first input image and the second input image are each provided to the generation network, to generate a first output image based on the first input image and a second output image based on the second input image. The first input image includes the first-resolution sample image and a first noise image corresponding to a noise sample of a first amplitude; the second input image includes the first-resolution sample image and a second noise image corresponding to a noise sample of a second amplitude.
  • The first amplitude is greater than zero and the second amplitude is equal to zero.
  • The amplitude of the noise sample is the average fluctuation amplitude of the noise sample.
  • For example, when the noise sample is random noise,
  • the mean value of the image corresponding to the noise sample is $\mu$
  • and its variance is $\sigma$; that is, most of the pixel values in the image corresponding to the noise sample fluctuate between $\mu - \sigma$ and $\mu + \sigma$;
  • in this case, the noise amplitude is $\mu$. During image processing, an image is represented as a matrix, and the pixel values above are the element values of that matrix. When the amplitude of the noise sample is 0, since no element of the image matrix is less than 0, every element of the image matrix can be taken to be 0.
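  • A minimal sketch of building the two input images follows. It assumes Gaussian noise whose mean equals the amplitude, a hypothetical `spread` parameter for the fluctuation, clamping at 0, and pairing the image with its noise image as two channels; the text only states that the noise is random and that amplitude 0 yields an all-zero matrix.

```python
import random


def make_noise_image(height, width, amplitude, spread=0.1, seed=None):
    """Noise image whose mean is `amplitude`; amplitude 0 gives all zeros."""
    if amplitude == 0:
        return [[0.0] * width for _ in range(height)]
    rng = random.Random(seed)
    # Clamp at 0 because element values are not less than 0.
    return [[max(0.0, rng.gauss(amplitude, spread)) for _ in range(width)]
            for _ in range(height)]


def make_input(lr_image, amplitude, seed=None):
    """Pair an image with its noise image as [image channel, noise channel]."""
    noise = make_noise_image(len(lr_image), len(lr_image[0]), amplitude, seed=seed)
    return [lr_image, noise]


lr = [[0.2, 0.4], [0.6, 0.8]]
first_input = make_input(lr, amplitude=1.0, seed=0)  # first amplitude > 0
second_input = make_input(lr, amplitude=0.0)         # second amplitude == 0
```

  • Both inputs share the same first-resolution sample image and differ only in the noise channel.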
  • The training method includes multiple generation network training steps. Within a single generation network training step, the first-resolution sample image is the same for both inputs, and the generation network uses the same model parameters when processing the first input image and the second input image.
  • The first output image and the second-resolution sample image are each provided to the discrimination network, which outputs a first discrimination result based on the first output image and a second discrimination result based on the second-resolution sample image.
  • The first discrimination result characterizes the degree of matching between the first output image and the second-resolution sample image.
  • For example, the first discrimination result characterizes the probability, as judged by the discrimination network, that the first output image is a second-resolution sample image;
  • the second discrimination result characterizes the probability, as judged by the discrimination network, that the second-resolution sample image is indeed a second-resolution sample image.
  • The discrimination network can be regarded as a classifier with a scoring function.
  • It scores each image it receives, and the score indicates the probability that the image under discrimination (for example, the first output image) is a second-resolution sample image, that is, the degree of matching described above, which lies between 0 and 1.
  • When the output of the discrimination network is 0 or close to 0, the discrimination network has classified the received image as not being a high-resolution (second-resolution) sample image; when the output is 1 or close to 1, it has classified the received image as a second-resolution sample image.
  • The scoring function of the discrimination network can be trained using "true" and "false" samples with predetermined scores.
  • A "false" sample is an image generated by the generation network,
  • and a "true" sample is a second-resolution sample image.
  • Training the discrimination network means adjusting its parameters so that it outputs a score close to 1 when it receives a "true" sample and a score close to 0 when it receives a "false" sample.
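  • The text does not give the loss function of the discrimination network. One common choice that matches the stated goal (scores near 1 for "true" samples, near 0 for "false" samples) is binary cross-entropy, sketched below as an assumption rather than as the disclosed loss; the function name and the example scores are hypothetical.

```python
import math


def discriminator_loss(real_scores, fake_scores, eps=1e-12):
    """Binary cross-entropy: small when real scores are near 1 and fake scores near 0."""
    real_term = sum(-math.log(max(s, eps)) for s in real_scores) / len(real_scores)
    fake_term = sum(-math.log(max(1.0 - s, eps)) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


# Scores in [0, 1] that a discrimination network might output.
good = discriminator_loss(real_scores=[0.95, 0.9], fake_scores=[0.05, 0.1])
bad = discriminator_loss(real_scores=[0.4, 0.5], fake_scores=[0.6, 0.5])
print(good, bad)
```

  • Reducing this value pushes the scores of "true" samples toward 1 and those of "false" samples toward 0.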
  • The loss function of the generation network includes the first loss, the second loss, and the third loss; specifically, the loss function is a superposition of the three, where the first loss is based on the reconstruction error between the second output image and the second-resolution sample image, the second loss is based on the perceptual error between the first output image and the second-resolution sample image, and the third loss is based on the first discrimination result and the second discrimination result.
  • When no noise is added during training of the generation network, the second-resolution image it generates has small reconstruction distortion but large perceptual distortion, and does not look realistic to the naked eye; when noise is added during training, the detailed features (for example, hair and lines) in the reconstructed second-resolution image become apparent, but the reconstruction distortion is large.
  • In the present disclosure, the second input image, which includes a noise image of amplitude 0, and the first input image, which includes a noise image of amplitude 1, are each provided to the generation network for training.
  • The first loss of the loss function reflects the reconstruction distortion of the generation network's output, and the second loss reflects its perceptual distortion; that is, the loss function combines the two distortion evaluation criteria.
  • When an image is processed using the trained generation network, the amplitude of the input noise can be adjusted according to actual needs (that is, whether image details should be emphasized, and to what degree), so that the reconstructed image meets those needs. For example, given a range of reconstruction distortion, the amplitude of the input noise is adjusted to minimize perceptual distortion; or, given a range of perceptual distortion, the amplitude is adjusted to minimize reconstruction distortion.
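  • The first of the two adjustment strategies (bounded reconstruction distortion, minimum perceptual distortion) can be sketched as a simple search over candidate amplitudes. The function name, the candidate grid, and the two toy distortion curves are all assumptions for illustration; in practice the distortions would be measured on images produced by the trained generation network.

```python
def pick_amplitude(candidates, recon_distortion, percep_distortion, max_recon):
    """Among amplitudes whose reconstruction distortion is within max_recon,
    return the one with the smallest perceptual distortion (None if none qualify)."""
    allowed = [a for a in candidates if recon_distortion(a) <= max_recon]
    if not allowed:
        return None
    return min(allowed, key=percep_distortion)


# Toy curves (assumed): more noise raises reconstruction distortion
# and lowers perceptual distortion.
def recon(a):
    return 0.1 + 0.5 * a


def percep(a):
    return 1.0 - 0.8 * a


candidates = [0.0, 0.25, 0.5, 0.75, 1.0]  # between 0 and the first amplitude
best = pick_amplitude(candidates, recon, percep, max_recon=0.4)
print(best)
```

  • Swapping the roles of the two distortion functions gives the second strategy, in which perceptual distortion is bounded and reconstruction distortion is minimized.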
  • In this embodiment, the amplitude of the noise image of the first input image is 1, which is the value obtained by normalizing the amplitude of the noise image.
  • The amplitude of the noise image may also be left unnormalized, in which case the amplitude of the noise image of the first input image may be a value other than 1.
  • The noise sample is random noise, and the mean value of the first noise image is 1.
  • Here, the mean value of the first noise image refers to the mean value of the normalized first noise image.
  • In the embodiments of the present disclosure, an image may be divided into one or more channels for processing.
  • For example, an RGB color image can be divided into three channels: red, green, and blue.
  • A grayscale image is a single-channel image; if a color image is divided according to the HSV color system, the three channels are hue H, saturation S, and value (brightness) V.
  • The loss function of the generation network is shown in the following formula:
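  • A form consistent with the surrounding description (a superposition of the first, second, and third losses, each scaled by its preset weight) is sketched below. The names $L_{\mathrm{rec}}$, $L_{\mathrm{per}}$, $L_{\mathrm{GAN}}$, $Y_{n=0}$ (second output image, noise amplitude 0), $Y_{n=1}$ (first output image, normalized noise amplitude 1), and $HR$ (second-resolution sample image) are notational assumptions, not symbols taken from the text.

```latex
\mathrm{Loss} = \lambda_1 \, L_{\mathrm{rec}}\!\left(Y_{n=0},\, HR\right)
              + \lambda_2 \, L_{\mathrm{per}}\!\left(Y_{n=1},\, HR\right)
              + \lambda_3 \, L_{\mathrm{GAN}}\!\left(D(Y_{n=1}),\, D(HR)\right)
```

  • Here $D(\cdot)$ denotes the output of the discrimination network, so the third term depends on the first and second discrimination results.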
  • $\lambda_1 : \lambda_2 : \lambda_3$ may be set according to the continuity of the local image.
  • $\lambda_1 : \lambda_2 : \lambda_3$ may be set according to a target pixel in the image.
  • The first output image and the second output image are both generated by the generation network through iterative processing made up of resolution-increasing steps; the total number of resolution-increasing steps in the iterative process is $L$, with $L \geq 1$.
  • $LR$ is the first-resolution sample image; each image generated by the generation network is downsampled to obtain an image with the same resolution as the first-resolution sample image.
  • The downsampling method may be the same as the method used in step S1 to obtain the first-resolution sample image from the second-resolution sample image.
  • $E[\cdot]$ denotes the calculation of matrix energy.
  • For example, $E[\cdot]$ may compute the maximum or the average of the elements of the matrix inside the brackets.
  • When calculating the reconstruction error, not only is the L1 norm of the difference image matrix between the second output image and the second-resolution sample image computed, but the L1 norms of the difference image matrices between each third-resolution image generated by the generation network and the third-resolution sample image of the same resolution (that is, $HR^1, HR^2, \ldots, HR^{L-1}$) are also accumulated.
  • In addition, the L1 norms of the difference images between the downsampled third-resolution images and the downsampled second output image, on the one hand, and the first-resolution sample image, on the other, are also accumulated,
  • so that when the noise amplitude is zero, the final output image of the generation network achieves as little reconstruction distortion as possible.
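  • The accumulation described above can be sketched as follows. This is an illustration under assumptions: images are lists of rows, the function names are hypothetical, the downsampling function is supplied by the caller, and the outer $E[\cdot]$ operation is left out.

```python
def l1_norm_of_difference(a, b):
    """L1 norm of the difference image matrix: sum of absolute element differences."""
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def reconstruction_error(outputs, hr_samples, lr_sample, downsample_to_lr):
    """Accumulate L1 terms over all L resolution-increasing steps.

    outputs[l] and hr_samples[l] share a resolution; the last entries are the
    second output image and the second-resolution sample image.
    downsample_to_lr is a caller-supplied (hypothetical) function that shrinks
    an image to the resolution of lr_sample.
    """
    total = 0.0
    for generated, sample in zip(outputs, hr_samples):
        total += l1_norm_of_difference(generated, sample)
        total += l1_norm_of_difference(downsample_to_lr(generated), lr_sample)
    return total


def to_1x1(image):
    """Toy downsampler: average every element into a 1x1 image."""
    values = [v for row in image for v in row]
    return [[sum(values) / len(values)]]


# L = 1: one 2x2 output compared with its 2x2 sample and a 1x1 low-resolution sample.
error = reconstruction_error([[[1, 1], [1, 1]]], [[[1, 2], [1, 1]]], [[1.0]], to_1x1)
print(error)
```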
  • the resolution of the third resolution image is greater than the resolution of the first resolution sample image, and the resolution of the third resolution image is the same as the resolution of the third resolution sample image.
  • The downsampling method may be the same as the method used in step S1 to obtain the first-resolution sample image from the second-resolution sample image.
  • For $HR^l$ and $E[\cdot]$, refer to the description above; it is not repeated here.
  • $L_{CX}(\cdot)$ is the Contextual Loss calculation function.
  • The calculation of the perceptual error not only uses the perceptual loss function to compute the difference between the first output image and the second-resolution sample image, but also accumulates the differences between each third-resolution image generated by the generation network based on the first input image and the third-resolution sample image of the same resolution (that is, $HR^1, HR^2, \ldots, HR^{L-1}$).
  • In addition, the differences between the images obtained by downsampling the generated images, on the one hand, and the first-resolution sample image, on the other, are also accumulated, so that when the generation network increases resolution with noise present, its final output image achieves as little perceptual distortion as possible.
  • The image group is generated by the generation network during iterative processing based on the first input image, and includes the image generated at the end of each resolution-increasing step.
  • $HR^{1,2,\ldots,L}$ are images obtained by downsampling the second-resolution sample image, in one-to-one correspondence with, and at the same resolution as, the images in the image group. Among them, $HR^L$ is the second-resolution sample image itself.
  • $D(HR^{1,2,\ldots,L})$ is the discrimination result of the discrimination network based on $HR^{1,2,\ldots,L}$, that is, the second discrimination result.
  • The discrimination network training step includes: providing the first output image and the second-resolution sample image to the discrimination network, so that the discrimination network outputs a discrimination result based on the first output image and a discrimination result based on the second-resolution sample image; and adjusting the parameters of the discrimination network to reduce the loss function of the discrimination network.
  • The discrimination network training step and the generation network training step are performed alternately until a preset training condition is reached.
  • The preset training condition may be, for example, that the number of alternations reaches a predetermined value.
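  • The alternation with a count-based stopping condition can be sketched as a plain loop. The step functions are passed in as callables and are placeholders here; running the discrimination step first in each round is an assumption, since the text only says the two steps alternate.

```python
def train_alternately(discriminator_step, generator_step, max_rounds):
    """Alternate the two training steps until the round count reaches max_rounds."""
    history = []
    for round_index in range(max_rounds):
        history.append(("D", discriminator_step(round_index)))
        history.append(("G", generator_step(round_index)))
    return history


# Placeholder steps (hypothetical): real ones would update network parameters
# and return the corresponding loss value.
log = train_alternately(lambda i: 1.0 / (i + 1), lambda i: 2.0 / (i + 1), max_rounds=3)
order = [name for name, _ in log]
print(order)
```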
  • The initial parameters of the generation network and the discrimination network are preset or random.
  • the first output image and the second output image are both generated by the generation network through an iterative process of the resolution increasing step, and the total number of iterations is L times.
  • L 1
  • each time an image is provided to the identification network only the first output image or the second resolution sample image may be provided to the identification network.
  • When L > 1, in the first L-1 resolution-increasing steps performed by the generation network based on the first input image, the generation network generates one intermediate image per step; at the L-th iteration, the image generated by the generation network is the first output image.
  • In this case, the discrimination network is configured with multiple inputs so that it can receive multiple images simultaneously, and it determines, from the received images, the degree of matching between the highest-resolution one and the second-resolution sample image.
  • In the discrimination network training step, when the first output image is provided to the discrimination network, each intermediate image generated by the generation network based on the first input image is also provided to the discrimination network.
  • Likewise, when the second-resolution sample image is provided to the discrimination network, the third-resolution sample images, which are obtained by down-sampling the second-resolution sample image and correspond one-to-one to the intermediate images at the same resolutions, are also provided to the discrimination network.
  • In training the generation network, its parameters are adjusted so that, after the output of the generation network is input to the discrimination network, the discrimination network outputs a matching degree as close to 1 as possible as the discrimination result; that is, the discrimination network is made to regard the output of the generation network as a second-resolution sample image.
  • In training the discrimination network, its parameters are adjusted so that, when the second-resolution sample image is input, the discrimination network outputs a matching degree as close to 1 as possible, and, when the output of the generation network is input, it outputs a matching degree as close to 0 as possible; that is, through training, the discrimination network learns to judge whether an image it receives is a second-resolution sample image.
  • Through alternate training of the generation network and the discrimination network, the discrimination network is continuously optimized to improve its discrimination ability, and the generation network is continuously optimized to bring its output as close as possible to the second-resolution sample image.
  • In this way, the two mutually "adversarial" models compete and improve in each training round based on the increasingly good results of the other model, yielding an increasingly good generative adversarial network model.
  • The present disclosure also provides an image processing method using the generative adversarial network obtained by the training method described above.
  • The image processing method uses the generation network of the generative adversarial network to increase the resolution of an image.
  • The image processing method includes: providing an input image and a noise image corresponding to reference noise to the generation network, so that the generation network generates an image with a higher resolution than the input image.
  • The amplitude of the reference noise is between 0 and the first amplitude.
  • Specifically, the reference noise is random noise.
  • When training the generation network of the generative adversarial network, the present disclosure provides the generation network with a noise sample of zero amplitude and a noise sample of the first amplitude, and the loss function of the generation network combines two distortion criteria, reconstruction distortion and perceptual distortion. Consequently, when the generation network is used to increase the resolution of an image, the amplitude of the reference noise can be adjusted to meet actual needs. For example, given a range of reconstruction distortion, the amplitude of the reference noise is adjusted to achieve the smallest perceptual distortion; or, given a range of perceptual distortion, the amplitude of the reference noise is adjusted to achieve the smallest reconstruction distortion.
  • FIG. 3 is a schematic structural diagram of a generation network in an embodiment of the present disclosure.
  • The generation network performs iterative resolution-increasing processing; each resolution-increasing pass increases the resolution of an image to be processed I^{l-1} to obtain the resolution-increased image I^l.
  • When the total number of resolution-increasing iterations is 1, the image to be processed I^{l-1} is the initial input image; when the total number of iterations is L and L > 1, the image to be processed I^{l-1} is the output image obtained after l-1 resolution increases of the initial input image.
  • The following takes an initial input image of 128*128, a magnification of 2 per step, and l = 2 as an example; in this case, the image to be processed I^{l-1} in the figure is the 256*256 image obtained after one resolution increase.
  • The generation network includes a first analysis module 11, a second analysis module 12, a first connection module 21, a second connection module 22, an interpolation module 31, a first up-sampling module 41, a first down-sampling module 51, a superposition module 70, and an iterative residual correction system.
  • The first analysis module 11 is configured to generate a feature image R^{l-1}_μ of the image to be processed I^{l-1}; the number of channels of the feature image R^{l-1}_μ is greater than the number of channels of the image to be processed I^{l-1}.
  • The first connection module 21 is configured to concatenate the feature image R^{l-1}_μ of the image to be processed with a noise image to obtain a first merged image RC^{l-1}_μ; the number of channels of the first merged image RC^{l-1}_μ is the sum of the number of channels of the feature image R^{l-1}_μ and the number of channels of the noise image.
  • Note that the resolution of the noise image is the same as that of the image to be processed I^{l-1}. Therefore, when the generation network performs multiple resolution-increasing iterations, in the generation network training step both the first input image and the second input image provided to the generation network may include the first-resolution sample image and multiple noise sample images of different resolutions; alternatively, both the first input image and the second input image include the first-resolution sample image and one noise sample image, and at the l-th iteration the generation network generates a noise sample image at the required magnification according to the amplitude of the noise sample.
  • The interpolation module 31 is configured to interpolate the image to be processed I^{l-1} to obtain a fourth-resolution image based on I^{l-1}; in this example, the resolution of the fourth-resolution image is 512*512.
  • The interpolation module may use a conventional interpolation method such as bicubic interpolation.
  • The resolution of the fourth-resolution image is greater than the resolution of the image to be processed I^{l-1}.
  • The second analysis module 12 is configured to generate a feature image of the fourth-resolution image, the number of channels of the feature image being greater than the number of channels of the fourth-resolution image.
  • The first down-sampling module 51 is configured to down-sample the feature image of the fourth-resolution image to obtain a first down-sampled feature image.
  • In this example, the resolution of the first down-sampled feature image is 256*256.
  • The second connection module 22 is configured to concatenate the first merged image RC^{l-1}_μ with the first down-sampled feature image to obtain a second merged image.
  • The first up-sampling module 41 is configured to up-sample the second merged image to obtain a first up-sampled feature image R^l_0.
  • The iterative residual correction system performs at least one residual correction on the first up-sampled feature image through back-projection to obtain a residual-corrected feature image.
  • The iterative residual correction system includes a second down-sampling module 52, a second up-sampling module 42, and a residual determination module 60.
  • The second down-sampling module 52 is configured to down-sample the image it receives by a factor of 2; the second up-sampling module 42 is configured to up-sample the image it receives by a factor of 2; and the residual determination module 60 is configured to determine the difference image between the two images it receives.
  • In the first residual correction, the first up-sampled feature image R^l_0 is down-sampled by a factor of 2 by a first one of the second down-sampling modules 52 to obtain a feature image R^l_{01}; the feature image R^l_{01} is down-sampled by a factor of 2 by a second one of the second down-sampling modules to obtain a feature image R^l_{02} with the same resolution as the initial input image. A residual determination module then obtains the difference image between R^l_{02} and the first merged image RC^0_μ of the first resolution-increasing step (that is, the first merged image obtained by merging the feature image of the original input image with the noise image).
  • A second up-sampling module up-samples this difference image, and the superposition module 70 superimposes the up-sampled feature image on R^l_{01} to obtain a feature image R^l_{03} with the same resolution as the first merged image RC^{l-1}_μ. Another residual determination module then obtains the difference image between R^l_{03} and RC^{l-1}_μ; a second up-sampling module 42 up-samples this difference image by a factor of 2, and the result is superimposed on the first up-sampled feature image R^l_0 to obtain the feature image R^l_1 after the first residual correction.
  • The generation network also includes a synthesis module 80 configured to synthesize the feature image R^l_μ obtained after multiple residual corrections into a fifth-resolution image with the same number of channels as the fourth-resolution image.
  • The fifth-resolution image and the fourth-resolution image are superimposed to obtain the output image I^l after the l-th resolution increase.
  • The resolution of the fifth-resolution image is the same as that of the fourth-resolution image.
  • The first analysis module 11, the second analysis module 12, the first up-sampling module 41, the second up-sampling module 42, the first down-sampling module 51, the second down-sampling module 52, and the synthesis module 80 can each be implemented with convolutional layers.
  • The present disclosure also provides a computer device including a memory and a processor.
  • The memory stores a computer program which, when executed by the processor, implements the above training method for the generative adversarial network.
  • The present disclosure also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above training method for the generative adversarial network.
  • The above memory and computer-readable storage medium include, but are not limited to, the following readable media: random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, magnetic disks or tapes, optical storage media such as compact discs (CDs) or DVDs (digital versatile discs), and other non-transitory media.
  • Examples of the processor include, but are not limited to, a general-purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and the like.

Abstract

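A training method for a generative adversarial network, an image processing method, a computer device, and a computer-readable storage medium. The training method includes a generation network training step, which includes: extracting a first-resolution sample image from a second-resolution sample image (S1); providing a first input image and a second input image separately to the generation network to generate a first output image based on the first input image and a second output image based on the second input image (S2); providing the first output image and the second-resolution sample image separately to the discrimination network, which outputs a first discrimination result based on the first output image and a second discrimination result based on the second-resolution sample image (S3); and adjusting the parameters of the generation network to reduce the loss function of the generation network (S4), so as to obtain an image that meets the desired requirements.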

Description

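Training method for generative adversarial network, image processing method, device and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201811155930.9, filed with the China Intellectual Property Office on September 30, 2018 and entitled "Generative adversarial network training method, image processing method, device and storage medium"; to Chinese Patent Application No. 201811155326.6, filed with the China Intellectual Property Office on September 30, 2018 and entitled "Image discrimination method, discriminator and computer-readable storage medium"; to Chinese Patent Application No. 201811155147.2, filed with the China Intellectual Property Office on September 30, 2018 and entitled "Image processing method and system, resolution-increasing method, and readable storage medium"; and to Chinese Patent Application No. 201811155252.6, filed with the China Intellectual Property Office on September 30, 2018 and entitled "Preprocessing method and module for training images, discriminator, and readable storage medium". The disclosures of these applications are incorporated herein by reference.
Technical Field
The present disclosure relates to, but is not limited to, the field of image processing, and in particular to a training method for a generative adversarial network, an image processing method using the generative adversarial network obtained by the training method, a computer device, and a computer-readable storage medium.
Background
Convolutional neural networks are a common type of deep learning network and are now widely used in image processing for image recognition, image classification, image super-resolution reconstruction, and the like.
In current super-resolution reconstruction methods, the second-resolution image reconstructed from a first-resolution image (the resolution of the second-resolution image being greater than that of the first-resolution image) often lacks detail, which makes the second-resolution image look unrealistic.
Summary
The present disclosure aims to solve at least one of the technical problems in the prior art and proposes a training method for a generative adversarial network, an image processing method using the generative adversarial network obtained by the training method, a computer device, and a computer-readable storage medium.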
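To solve one of the above technical problems, the present disclosure provides a training method for a generative adversarial network. The generative adversarial network includes a generation network and a discrimination network; the generation network is used to convert a first-resolution image into a second-resolution image, the resolution of the second-resolution image being greater than that of the first-resolution image. The training method includes a generation network training step, which includes:
extracting a first-resolution sample image from a second-resolution sample image, the resolution of the second-resolution sample image being higher than that of the first-resolution sample image;
providing a first input image and a second input image separately to the generation network to generate a first output image based on the first input image and a second output image based on the second input image; wherein the first input image includes the first-resolution sample image and a first noise image corresponding to a noise sample of a first amplitude; the second input image includes the first-resolution sample image and a second noise image corresponding to a noise sample of a second amplitude; the first amplitude is greater than 0 and the second amplitude is equal to 0;
providing the first output image and the second-resolution sample image separately to the discrimination network, the discrimination network outputting a first discrimination result based on the first output image and a second discrimination result based on the second-resolution sample image;
adjusting the parameters of the generation network to reduce the loss function of the generation network; wherein the loss function of the generation network includes a first loss, a second loss and a third loss; the first loss is based on the reconstruction error between the second output image and the second-resolution sample image; the second loss is based on the perceptual error between the first output image and the second-resolution sample image; and the third loss is based on the first discrimination result and the second discrimination result.
Optionally, the reconstruction error between the second output image and the second-resolution sample image is determined from any one of: the L1 norm of the difference image matrix between the second output image and the second-resolution sample image, the mean square error between the second output image and the second-resolution sample image, and the structural similarity between the second output image and the second-resolution sample image.
Optionally, the first output image and the second output image are both generated by the generation network through iterative processing of a resolution-increasing step; the first loss of the loss function of the generation network is λ1·L_rec(X, Y_{n=0}), where:
Figure PCTCN2019107761-appb-000001
where X is the second-resolution sample image;
Y_{n=0} is the second output image;
L_rec(X, Y_{n=0}) is the reconstruction error between the second output image and the second-resolution sample image;
L is the total number of resolution-increasing steps in the iterative processing, L ≥ 1;
Figure PCTCN2019107761-appb-000002
is the image generated at the end of the l-th resolution-increasing step in the iterative processing performed by the generation network based on the second input image, l ≤ L;
LR is the first-resolution sample image;
Figure PCTCN2019107761-appb-000003
is the image obtained by down-sampling
Figure PCTCN2019107761-appb-000004
to the same resolution as the first-resolution sample image;
HR^l is the image obtained by down-sampling the second-resolution sample image to the resolution of
Figure PCTCN2019107761-appb-000005
so that the two images have the same resolution;
E[] is the computation of matrix energy; and
λ1 is a preset weight.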
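Optionally, the second loss of the loss function of the generation network is λ2·L_per(X, Y_{n=1}), where:
Figure PCTCN2019107761-appb-000006
where Y_{n=1} is the first output image;
L_per(X, Y_{n=1}) is the perceptual error between the first output image and the second-resolution sample image;
Figure PCTCN2019107761-appb-000007
is the image generated at the end of the l-th resolution-increasing step in the iterative processing performed by the generation network based on the first input image;
Figure PCTCN2019107761-appb-000008
is the image obtained by down-sampling
Figure PCTCN2019107761-appb-000009
to the same resolution as the first-resolution sample image;
L_CX() is the perceptual loss calculation function; and
λ2 is a preset weight.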
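Optionally, the third loss of the loss function of the generation network is λ3·L_GAN(Y_{n=1}), where:
Figure PCTCN2019107761-appb-000010
where
Figure PCTCN2019107761-appb-000011
is the image group generated when the generation network performs the iterative processing based on the first input image; the image group includes the image generated at the end of each resolution-increasing step;
HR^{1,2,...,L} are the images obtained by down-sampling the second-resolution sample image; they correspond one-to-one to the images in
Figure PCTCN2019107761-appb-000012
and have the same resolutions as those images;
Figure PCTCN2019107761-appb-000013
is the first discrimination result;
D(HR^{1,2,...,L}) is the second discrimination result; and
λ3 is a preset weight.
Optionally, λ1:λ2:λ3 = 10:0.1:0.001.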
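Optionally, the noise sample is random noise. Optionally, the training method further includes a discrimination network training step, which includes: providing the first output image and the second-resolution sample image separately to the discrimination network, so that the discrimination network outputs a discrimination result based on the first output image and a discrimination result based on the second-resolution sample image; and adjusting the parameters of the discrimination network to reduce the loss function of the discrimination network;
the discrimination network training step and the generation network training step are performed alternately until a preset training condition is reached.
Optionally, the first output image and the second output image are both generated by the generation network through iterative processing of a resolution-increasing step, and the total number of resolution-increasing steps in the iterative processing is L; when L is greater than 1, in the first L-1 resolution-increasing steps of the iterative processing performed by the generation network based on the first input image, the generation network generates one intermediate image each time the resolution is increased.
In the discrimination network training step, when the first output image is provided to the discrimination network, each intermediate image generated by the generation network based on the first input image is also provided to the discrimination network; when the second-resolution sample image is provided to the discrimination network, the third-resolution sample images, which are obtained by down-sampling the second-resolution sample image and correspond one-to-one to the intermediate images at the same resolutions, are also provided to the discrimination network.
Correspondingly, the present disclosure also provides an image processing method using the generation network of the generative adversarial network obtained by the above training method. The image processing method is used to increase the resolution of an image and includes:
providing an input image and a noise image corresponding to reference noise to the generation network, so that the generation network generates a second-resolution image based on the input image.
Optionally, the amplitude of the reference noise is between 0 and the first amplitude.
Optionally, the reference noise is random noise.
Correspondingly, the present disclosure also provides a computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the above training method.
Correspondingly, the present disclosure also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above training method.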
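Brief Description of the Drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification. Together with the detailed description below, they serve to explain the present disclosure but do not limit it. In the drawings:
FIG. 1 is a schematic diagram of the relationship between reconstruction distortion and perceptual distortion;
FIG. 2 is a flowchart of the generation network training step in an embodiment of the present disclosure.
FIG. 3 is a schematic structural diagram of the generation network in an embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present disclosure and are not intended to limit it.
Image super-resolution reconstruction is a technique that increases the resolution of an initial image to obtain a higher-resolution image. In image super-resolution reconstruction, reconstruction distortion and perceptual distortion are used to evaluate the reconstruction result. Reconstruction distortion measures the degree of difference between the reconstructed image and a reference image; specific evaluation criteria include mean square error (MSE), structural similarity (SSIM), peak signal-to-noise ratio (PSNR), and the like. Perceptual distortion focuses on making the image look more like a natural image. FIG. 1 is a schematic diagram of the relationship between reconstruction distortion and perceptual distortion. As shown in FIG. 1, when the reconstruction distortion is small, the perceptual distortion is large; the reconstructed image then looks smoother and lacks detail. When the perceptual distortion is small, the reconstruction distortion is large; the reconstructed image then has richer detail. Current image super-resolution reconstruction methods tend to pursue a small reconstruction distortion, but in some application scenarios a detail-rich reconstructed image is preferred.
The present disclosure provides a training method for a generative adversarial network. The generative adversarial network includes a generation network and a discrimination network. The generation network is used to convert a first-resolution image into a second-resolution image at a target resolution, the resolution of the second-resolution image being greater than that of the first-resolution image. The generation network may obtain the second-resolution image through a single resolution-increasing step or through multiple iterations of the step. Taking an image to be processed (that is, the first-resolution image) of 128*128 and a target resolution of 1024*1024 as an example, the generation network may obtain the 1024*1024 second-resolution image through one resolution-increasing step with a magnification of 8, or through three iterations of a resolution-increasing step with a magnification of 2, which successively yield images of 256*256, 512*512 and 1024*1024.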
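The arithmetic behind the two routes in this example (one 8x step, or three 2x steps) can be sketched as follows. This is only an illustration of the resolution sequence, not part of the disclosed method; the function name `resolution_schedule` and its parameters are hypothetical.

```python
def resolution_schedule(start, target, factor=2):
    """Return the side length reached after each resolution-increasing step."""
    if start <= 0 or factor < 2:
        raise ValueError("start must be positive and factor at least 2")
    sizes = []
    size = start
    while size < target:
        size *= factor          # one resolution-increasing step
        sizes.append(size)
    if size != target:
        raise ValueError("target is not start multiplied by a power of factor")
    return sizes

# Three iterations with magnification 2, as in the example above.
iterative = resolution_schedule(128, 1024, factor=2)
# A single step with magnification 8.
single = resolution_schedule(128, 1024, factor=8)
```

The length of the returned list is the total number of resolution-increasing steps, which the description later calls L.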
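The training method for the generative adversarial network includes a generation network training step. FIG. 2 is a flowchart of the generation network training step in an embodiment of the present disclosure. As shown in FIG. 2, the generation network training step includes:
S1. Extracting a first-resolution sample image from a second-resolution sample image, the resolution of the second-resolution sample image being higher than that of the first-resolution sample image. Specifically, the first-resolution sample image may be obtained by down-sampling the second-resolution sample image.
S2. Providing a first input image and a second input image separately to the generation network to generate a first output image based on the first input image and a second output image based on the second input image; wherein the first input image includes the first-resolution sample image and a first noise image corresponding to a noise sample of a first amplitude; the second input image includes the first-resolution sample image and a second noise image corresponding to a noise sample of a second amplitude. The first amplitude is greater than 0 and the second amplitude is equal to 0.
Here, the amplitude of a noise sample is its average fluctuation amplitude. For example, the noise sample is random noise, and the image corresponding to the noise sample has mean μ and variance σ; that is, most pixel values in the image corresponding to the noise sample fluctuate between μ-σ and μ+σ. In this case, the noise amplitude is μ. It can be understood that, during image processing, every image is represented as a matrix, and the pixel values above are the element values of the image matrix. When the amplitude of the noise sample is 0, since no element of the image matrix is less than 0, every element of the image matrix can be regarded as 0.
It should also be noted that the training method for the generative adversarial network contains multiple generation network training steps; within one generation network training step, the first-resolution sample image is the same, and the generation network that receives the first input image and the second input image has the same model parameters.
S3. Providing the first output image and the second-resolution sample image separately to the discrimination network, the discrimination network outputting a first discrimination result based on the first output image and a second discrimination result based on the second-resolution sample image. The first discrimination result characterizes the degree of matching between the first output image and the second-resolution sample image; for example, it characterizes the probability with which the discrimination network judges the first output image to be the second-resolution sample image. The second discrimination result characterizes the probability with which the discrimination network judges the second-resolution sample image to be indeed the second-resolution sample image.
Here, the discrimination network can be regarded as a classifier with a scoring function. The discrimination network scores each image it receives for discrimination, and the output score indicates the probability that the image to be discriminated (the first output image) is the second-resolution sample image, that is, the matching degree described above, which may lie between 0 and 1. When the output of the discrimination network is 0 or close to 0, the discrimination network classifies the received image as not being a high-resolution sample image; when the output is 1 or close to 1, the received image is classified as the second-resolution sample image.
The scoring function of the discrimination network can be trained with "true" samples and "false" samples whose scores are determined in advance. For example, a "false" sample is an image generated by the generation network, and a "true" sample is a second-resolution sample image. Training the discrimination network means adjusting its parameters so that it outputs a score close to 1 when it receives a "true" sample and a score close to 0 when it receives a "false" sample.
S4. Adjusting the parameters of the generation network to reduce the loss function of the generation network. "Reducing the loss function of the generation network" means that the value of the loss function is smaller than in the previous generation network training step, or that the value of the loss function shows an overall decreasing trend across multiple generation network training steps. Here, the loss function of the generation network includes a first loss, a second loss and a third loss; specifically, the loss function is the sum of the first loss, the second loss and the third loss, where the first loss is based on the reconstruction error between the second output image and the second-resolution sample image, the second loss is based on the perceptual error between the first output image and the second-resolution sample image, and the third loss is based on the first discrimination result and the second discrimination result.
In super-resolution reconstruction, the detail features of the reconstructed second-resolution image (for example, hair and lines) are often related to noise. When no noise is added in training the generation network, the second-resolution image generated by the generation network has a small reconstruction distortion and a large perceptual distortion, and does not look realistic to the naked eye; when noise is added in training the generation network, the detail features of the reconstructed second-resolution image are more pronounced, but the reconstruction distortion is larger. In the training of the generation network in the present disclosure, a second input image including a noise image of amplitude 0 and a first input image including a noise image of amplitude 1 are both provided to the generation network for training; the first loss of the loss function reflects the reconstruction distortion of the generated result, and the second loss reflects its perceptual distortion. In other words, the loss function combines the two distortion criteria. When the trained generation network is used to increase the resolution of an image, the amplitude of the input noise can be adjusted according to actual needs (that is, whether image details need to be emphasized, and to what degree), so that the reconstructed image meets those needs. For example, given a range of reconstruction distortion, the amplitude of the input noise is adjusted to achieve the smallest perceptual distortion; or, given a range of perceptual distortion, the amplitude of the input noise is adjusted to achieve the smallest reconstruction distortion.
It should be noted that the statement in this embodiment that the noise image of the first input image has an amplitude of 1 refers to the amplitude value obtained after the amplitude of the noise image is normalized. In other embodiments of the present application, the amplitude of the noise image need not be normalized, in which case the amplitude of the noise image of the first input image may take a value other than 1.
Optionally, the noise sample is random noise, and the mean of the first noise image is 1. Optionally, the mean of the first noise image is the mean of the normalized first noise image. For example, if the first noise image is a grayscale image, the average of the pixel values in the image obtained by normalizing the first noise image is the mean of the first noise image; if the first noise image is a color image, the average of the pixel values in the image obtained by normalizing each channel of the first noise image is the mean of the first noise image. It should be noted that, in the embodiments of the present disclosure, an image is divided into one or more channels for processing; for example, an RGB color image can be divided into three channels, red, green and blue; a grayscale image has a single channel; and a color image divided according to the HSV color system has three channels: hue H, saturation S and value V.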
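Steps S1 and S2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: down-sampling is assumed to be 2-D average pooling, the random noise is assumed to be Gaussian with mean equal to the amplitude, and the names `downsample`, `noise_image`, `make_inputs` and the `spread` parameter are all hypothetical. The disclosure itself does not fix any of these choices.

```python
import random

def downsample(image, factor):
    """Average-pool a 2-D list of pixel values by an integer factor (assumed method)."""
    return [[sum(image[i + di][j + dj] for di in range(factor) for dj in range(factor))
             / float(factor * factor)
             for j in range(0, len(image[0]), factor)]
            for i in range(0, len(image), factor)]

def noise_image(height, width, amplitude, spread=0.1, seed=0):
    """Random noise whose mean is `amplitude`; amplitude 0 gives an all-zero image."""
    if amplitude == 0:
        return [[0.0] * width for _ in range(height)]
    rng = random.Random(seed)
    # Values are clipped at 0 because image matrix elements are not less than 0.
    return [[max(0.0, rng.gauss(amplitude, spread)) for _ in range(width)]
            for _ in range(height)]

def make_inputs(hr_sample, factor, first_amplitude=1.0):
    """Build the first input (noise of the first amplitude) and second input (zero noise)."""
    lr_sample = downsample(hr_sample, factor)              # step S1
    h, w = len(lr_sample), len(lr_sample[0])
    first_input = (lr_sample, noise_image(h, w, first_amplitude))
    second_input = (lr_sample, noise_image(h, w, 0.0))     # second amplitude equals 0
    return first_input, second_input

hr_sample = [[float((i + j) % 4) for j in range(8)] for i in range(8)]
first_input, second_input = make_inputs(hr_sample, factor=2)
```

Both inputs share the same first-resolution sample image, as the description requires within one training step; only the noise image differs.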
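Optionally, the loss function of the generation network is given by the following formula:
Loss = λ1·L_rec(X, Y_{n=0}) + λ2·L_per(X, Y_{n=1}) + λ3·L_GAN(Y_{n=1})
In the first loss λ1·L_rec(X, Y_{n=0}) of the loss function Loss, L_rec(X, Y_{n=0}) is the reconstruction error between the second output image and the second-resolution sample image. In the second loss λ2·L_per(X, Y_{n=1}), L_per(X, Y_{n=1}) is the perceptual error between the first output image and the second-resolution sample image. In the third loss λ3·L_GAN(Y_{n=1}), L_GAN(Y_{n=1}) is a sum over the first discrimination result and the second discrimination result. λ1, λ2 and λ3 are all preset weights. For example, λ1:λ2:λ3 = 10:0.1:0.001, or λ1:λ2:λ3 = 1:1:0.5, and so on; the ratio can be adjusted according to actual requirements. In some embodiments, λ1:λ2:λ3 may be set according to the continuity of local image regions. In some embodiments, λ1:λ2:λ3 may be set according to target pixels in the image.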
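The weighted sum above can be written as a short sketch. The function name `generator_loss` is hypothetical, and the three error terms are taken as already-computed numbers; only the weighting with the example ratio 10:0.1:0.001 comes from the text.

```python
def generator_loss(l_rec, l_per, l_gan, weights=(10.0, 0.1, 0.001)):
    """Loss = lambda1 * L_rec + lambda2 * L_per + lambda3 * L_GAN."""
    lam1, lam2, lam3 = weights
    return lam1 * l_rec + lam2 * l_per + lam3 * l_gan

# Placeholder error values, chosen only to show the arithmetic.
loss_default = generator_loss(1.0, 2.0, 3.0)
loss_other = generator_loss(1.0, 2.0, 3.0, weights=(1.0, 1.0, 0.5))
```

With the default ratio the reconstruction term dominates unless the perceptual and adversarial terms are numerically much larger, which is one reason the ratio is described as adjustable.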
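Specifically, the reconstruction error L_rec(X, Y_{n=0}) between the second output image Y_{n=0} and the second-resolution sample image X is calculated according to the following formula:
Figure PCTCN2019107761-appb-000014
Here, the first output image and the second output image are both generated by the generation network through iterative processing of the resolution-increasing step; the total number of resolution-increasing steps in the iterative processing is L, L ≥ 1.
Figure PCTCN2019107761-appb-000015
is the image generated at the end of the l-th resolution-increasing step in the iterative processing performed by the generation network based on the second input image, l ≤ L. It can be understood that, when l = L, the generation network generates the second output image Y_{n=0}.
LR is the first-resolution sample image;
Figure PCTCN2019107761-appb-000016
is the image obtained by down-sampling
Figure PCTCN2019107761-appb-000017
to the same resolution as the first-resolution sample image. The down-sampling method may be the same as the method used in step S1 to obtain the first-resolution sample image from the second-resolution sample image.
HR^l is the image obtained by down-sampling the second-resolution sample image to the resolution of
Figure PCTCN2019107761-appb-000018
so that the two images have the same resolution. Note that, when l = L,
Figure PCTCN2019107761-appb-000019
is the second output image Y_{n=0}; in this case, HR^l is the second-resolution sample image itself, which can also be regarded as the image obtained by down-sampling the second-resolution sample image with a factor of 1.
E[] is the computation of matrix energy. For example, E[] may compute the maximum or the average of the elements of the matrix inside "[]".
When the generation network iterates the resolution-increasing step multiple times, the reconstruction error includes not only the L1 norm of the difference image matrix between the second output image itself and the second-resolution sample image, but also the accumulated L1 norms of the difference image matrices between the third-resolution images generated by the generation network (that is,
Figure PCTCN2019107761-appb-000020
) and the third-resolution sample images of the same resolutions (that is, HR^1, HR^2, ..., HR^{L-1}). In addition, the L1 norms of the difference images between the down-sampled versions of the third-resolution images and of the second output image, on the one hand, and the first-resolution sample image, on the other, are accumulated, so that, when the generation network is used to increase resolution and noise of zero amplitude is input, the image finally output by the generation network achieves as little reconstruction distortion as possible. It should be noted that the resolution of a third-resolution image is greater than that of the first-resolution sample image, and the resolution of a third-resolution image is the same as that of the corresponding third-resolution sample image.
In the above embodiment, the reconstruction error L_rec(X, Y_{n=0}) between the second output image and the second-resolution sample image is obtained from the L1 norm of the difference image matrix between the second output image and the second-resolution sample image. The reconstruction error may of course also be obtained from the mean square error (MSE) between the second output image and the second-resolution sample image, or from the structural similarity (SSIM) between the second output image and the second-resolution sample image.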
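The accumulation described above can be sketched as follows. The formula itself appears here only as a figure reference, so this is one reading of the prose, not the formula of the disclosure: E[] is assumed to be the mean of the absolute differences, down-sampling is assumed to be average pooling, and the names `downsample`, `l1_mean` and `reconstruction_error` are hypothetical.

```python
def downsample(image, factor):
    """Average-pool a 2-D list by an integer factor (factor 1 returns a copy)."""
    return [[sum(image[i + di][j + dj] for di in range(factor) for dj in range(factor))
             / float(factor * factor)
             for j in range(0, len(image[0]), factor)]
            for i in range(0, len(image), factor)]

def l1_mean(a, b):
    """Mean absolute difference between two equally sized 2-D lists."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def reconstruction_error(outputs, hr_sample, lr_sample):
    """outputs[l-1] is the image after the l-th step; outputs[-1] is the final output."""
    total = 0.0
    for y in outputs:
        # HR^l: the high-resolution sample down-sampled to the resolution of y.
        hr_l = downsample(hr_sample, len(hr_sample) // len(y))
        # y down-sampled to the resolution of the first-resolution sample image.
        y_down = downsample(y, len(y) // len(lr_sample))
        total += l1_mean(y, hr_l) + l1_mean(y_down, lr_sample)
    return total

hr = [[0, 2, 4, 6], [2, 4, 6, 8], [8, 8, 0, 0], [8, 8, 0, 0]]
lr = downsample(hr, 4)
perfect = [downsample(hr, 2), downsample(hr, 1)]   # ideal outputs for L = 2
shifted = [[[v + 1 for v in row] for row in img] for img in perfect]
error_perfect = reconstruction_error(perfect, hr, lr)
error_shifted = reconstruction_error(shifted, hr, lr)
```

In this toy case every pixel of the shifted outputs is off by 1, so each of the two levels contributes 1 for the full-resolution term and 1 for the down-sampled term.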
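Optionally, the perceptual error L_per(X, Y_{n=1}) between the first output image Y_{n=1} and the second-resolution sample image X is calculated according to the following formula:
Figure PCTCN2019107761-appb-000021
Figure PCTCN2019107761-appb-000022
is the image generated at the end of the l-th resolution-increasing step in the iterative processing performed by the generation network based on the first input image, l ≤ L. It can be understood that, when l = L, the generation network generates the first output image Y_{n=1}.
Figure PCTCN2019107761-appb-000023
is the image obtained by down-sampling
Figure PCTCN2019107761-appb-000024
to the same resolution as the first-resolution sample image LR. The down-sampling method may be the same as the method used in step S1 to obtain the first-resolution sample image from the second-resolution sample image. For the meanings of HR^l and E[], see the description above; they are not repeated here.
L_CX() is the perceptual loss (contextual loss) calculation function.
As with the reconstruction error, the perceptual error not only uses the perceptual loss function to measure the difference between the first output image and the second-resolution sample image, but also accumulates the differences between the third-resolution images generated by the generation network based on the first input image (that is,
Figure PCTCN2019107761-appb-000025
) and the third-resolution sample images of the same resolutions (that is, HR^1, HR^2, ..., HR^{L-1}). In addition, the differences between the down-sampled versions of the third-resolution images and of the first output image, on the one hand, and the first-resolution sample image, on the other, are accumulated, so that, when the generation network is used to increase resolution and noise of the first amplitude is input, the image finally output by the generation network achieves as little perceptual distortion as possible.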
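Optionally, L_GAN(Y_{n=1}) in the third loss of the loss function of the generation network is calculated according to the following formula:
Figure PCTCN2019107761-appb-000026
where
Figure PCTCN2019107761-appb-000027
is the image group generated when the generation network performs the iterative processing based on the first input image; the image group includes the image generated at the end of each resolution-increasing step. When L = 1, the image group includes only the first output image; when L > 1, the image group includes the above
Figure PCTCN2019107761-appb-000028
Figure PCTCN2019107761-appb-000029
and the first output image Y_{n=1}.
HR^{1,2,...,L} are the images obtained by down-sampling the second-resolution sample image; they correspond one-to-one to the images in
Figure PCTCN2019107761-appb-000030
and have the same resolutions as those images. Among them, HR^L is the second-resolution sample image itself.
Figure PCTCN2019107761-appb-000031
is the discrimination result of the discrimination network based on
Figure PCTCN2019107761-appb-000032
that is, the first discrimination result; D(HR^{1,2,...,L}) is the discrimination result of the discrimination network based on HR^{1,2,...,L}, that is, the second discrimination result.
In addition to the generation network training step described above, the training method of the present disclosure includes a discrimination network training step, which includes: providing the first output image and the second-resolution sample image separately to the discrimination network, so that the discrimination network outputs a discrimination result based on the first output image and a discrimination result based on the second-resolution sample image; and adjusting the parameters of the discrimination network to reduce the loss function of the discrimination network.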
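The text does not give the loss function of the discrimination network. As an assumption, the sketch below uses the binary cross-entropy commonly used for GAN discriminators, which is small exactly when the score for the second-resolution sample image is close to 1 and the score for the generated image is close to 0. The name `discriminator_loss` and the `eps` guard are hypothetical.

```python
import math

def discriminator_loss(score_real, score_fake, eps=1e-12):
    """Binary cross-entropy (an assumed choice): small when real -> 1 and fake -> 0."""
    return -(math.log(score_real + eps) + math.log(1.0 - score_fake + eps))

# A discriminator that separates the two images well versus one that cannot decide.
loss_confident = discriminator_loss(score_real=0.99, score_fake=0.01)
loss_undecided = discriminator_loss(score_real=0.5, score_fake=0.5)
```

Reducing this loss pushes the two scores toward 1 and 0 respectively, which matches the training target stated for the "true" and "false" samples.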
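The discrimination network training step and the generation network training step are performed alternately until a preset training condition is reached. The preset training condition may be, for example, that the number of alternations reaches a predetermined value.
At initialization, the parameters of the generation network and the discrimination network are either preset or random.
As described above, the first output image and the second output image are both generated by the generation network through iterative processing of the resolution-increasing step, and the total number of iterations is L. When L = 1, each time an image is provided to the discrimination network, only the first output image or the second-resolution sample image needs to be provided. When L > 1, in the first L-1 resolution-increasing steps performed by the generation network based on the first input image, the generation network generates one intermediate image each time the resolution is increased; at the L-th iteration, the image generated by the generation network is the first output image. In this case, the discrimination network is configured with multiple inputs so that it can receive multiple images simultaneously, and it determines, from the received images, the degree of matching between the highest-resolution one and the second-resolution sample image. In the discrimination network training step, when the first output image is provided to the discrimination network, each intermediate image generated by the generation network based on the first input image is also provided to the discrimination network; when the second-resolution sample image is provided to the discrimination network, the third-resolution sample images, which are obtained by down-sampling the second-resolution sample image and correspond one-to-one to the intermediate images at the same resolutions, are also provided to the discrimination network.
In training the generation network, its parameters are adjusted so that, after the output of the generation network is input to the discrimination network, the discrimination network outputs a matching degree as close to 1 as possible as the discrimination result; that is, the discrimination network is made to regard the output of the generation network as a second-resolution sample image. In training the discrimination network, its parameters are adjusted so that, when the second-resolution sample image is input, the discrimination network outputs a matching degree as close to 1 as possible, and, when the output of the generation network is input, it outputs a matching degree as close to 0 as possible; that is, through training, the discrimination network learns to judge whether an image it receives is a second-resolution sample image. Through alternate training of the generation network and the discrimination network, the discrimination network is continuously optimized to improve its discrimination ability, and the generation network is continuously optimized to bring its output as close as possible to the second-resolution sample image. In this way, the two mutually "adversarial" models compete and improve in each training round based on the increasingly good results of the other model, yielding an increasingly good generative adversarial network model.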
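The alternation can be sketched as a simple loop that stops when the number of alternations reaches a predetermined value. The order within one alternation (discrimination network first) is an assumption, since the text only says the two steps alternate; `train_alternately` and the stand-in step functions are hypothetical, and real steps would update network parameters.

```python
def train_alternately(discriminator_step, generator_step, max_alternations):
    """Alternate one discrimination-network step and one generation-network step."""
    history = []
    for _ in range(max_alternations):
        d_loss = discriminator_step()   # adjust discrimination network parameters
        g_loss = generator_step()       # adjust generation network parameters
        history.append((d_loss, g_loss))
    return history

# Stand-in steps that only record the call order.
calls = []

def fake_discriminator_step():
    calls.append("D")
    return 0.0

def fake_generator_step():
    calls.append("G")
    return 0.0

history = train_alternately(fake_discriminator_step, fake_generator_step,
                            max_alternations=3)
```

Other preset training conditions, such as a loss threshold, would replace the fixed-count `for` loop with a conditional exit.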
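The present disclosure also provides an image processing method using the generative adversarial network obtained by the above training method. The image processing method uses the generation network of the generative adversarial network to increase the resolution of an image and includes: providing an input image and a noise image corresponding to reference noise to the generation network, so that the generation network generates an image with a higher resolution than the input image. The amplitude of the reference noise is between 0 and the first amplitude. Specifically, the reference noise is random noise.
When training the generation network of the generative adversarial network, the present disclosure provides the generation network with a noise sample of zero amplitude and a noise sample of the first amplitude, and the loss function of the generation network combines two distortion criteria, reconstruction distortion and perceptual distortion. Consequently, when the generation network is used to increase the resolution of an image, the amplitude of the reference noise can be adjusted to meet actual needs. For example, given a range of reconstruction distortion, the amplitude of the reference noise is adjusted to achieve the smallest perceptual distortion; or, given a range of perceptual distortion, the amplitude of the reference noise is adjusted to achieve the smallest reconstruction distortion.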
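One way to act on this trade-off is to evaluate a few candidate amplitudes and keep the one with the smallest perceptual distortion among those whose reconstruction distortion stays within the allowed range. The sketch below assumes the distortions have already been measured for each candidate; the function name `pick_amplitude` is hypothetical, and the numbers are placeholders for illustration only, not measured results.

```python
def pick_amplitude(measurements, max_reconstruction_distortion):
    """measurements: (amplitude, reconstruction_distortion, perceptual_distortion) tuples."""
    feasible = [m for m in measurements if m[1] <= max_reconstruction_distortion]
    if not feasible:
        raise ValueError("no amplitude satisfies the reconstruction constraint")
    # Among feasible amplitudes, take the one with the smallest perceptual distortion.
    return min(feasible, key=lambda m: m[2])[0]

# Placeholder numbers for illustration only.
measurements = [(0.0, 1.0, 9.0), (0.5, 2.0, 5.0), (1.0, 4.0, 2.0)]
chosen = pick_amplitude(measurements, max_reconstruction_distortion=2.5)
```

The mirrored case, a bound on perceptual distortion with reconstruction distortion minimized, swaps the roles of the second and third tuple entries.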
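FIG. 3 is a schematic structural diagram of the generation network in an embodiment of the present disclosure. The generation network is described below with reference to FIG. 3. The generation network performs iterative resolution-increasing processing; each resolution-increasing pass increases the resolution of an image to be processed I^{l-1} to obtain the resolution-increased image I^l. When the total number of resolution-increasing iterations is 1, the image to be processed I^{l-1} is the initial input image; when the total number of iterations is L and L > 1, the image to be processed I^{l-1} is the output image obtained after l-1 resolution increases of the initial input image. The generation network is described below using an initial input image of 128*128, a magnification of 2 per step, and l = 2 as an example. In this case, the image to be processed I^{l-1} in the figure is the 256*256 image obtained after one resolution increase.
As shown in FIG. 3, the generation network includes a first analysis module 11, a second analysis module 12, a first connection module 21, a second connection module 22, an interpolation module 31, a first up-sampling module 41, a first down-sampling module 51, a superposition module 70, and an iterative residual correction system.
The first analysis module 11 is configured to generate a feature image R^{l-1}_μ of the image to be processed I^{l-1}; the number of channels of the feature image R^{l-1}_μ is greater than the number of channels of the image to be processed I^{l-1}.
The first connection module 21 is configured to concatenate the feature image R^{l-1}_μ of the image to be processed with the noise image noise to obtain a first merged image RC^{l-1}_μ; the number of channels of the first merged image RC^{l-1}_μ is the sum of the number of channels of the feature image R^{l-1}_μ and the number of channels of the noise image noise.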
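Channel-wise concatenation can be sketched with plain nested lists, where an image is a list of channels and each channel is a 2-D list. The function name `concatenate_channels` and the channel counts (16 and 1) are hypothetical; the text states only that the merged channel count is the sum of the two.

```python
def concatenate_channels(feature, noise):
    """Stack two lists of channels; every channel must have the same height and width."""
    shapes = {(len(c), len(c[0])) for c in feature + noise}
    if len(shapes) != 1:
        raise ValueError("all channels must share one resolution")
    return feature + noise

feature = [[[0.0] * 4 for _ in range(4)] for _ in range(16)]  # 16 channels, 4x4 each
noise = [[[1.0] * 4 for _ in range(4)]]                        # 1 channel, 4x4
merged = concatenate_channels(feature, noise)
```

The resolution check reflects the requirement that the noise image has the same resolution as the image to be processed.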
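Note that the resolution of the noise image noise is the same as that of the image to be processed I^{l-1}. Therefore, when the generation network performs multiple resolution-increasing iterations, in the generation network training step both the first input image and the second input image provided to the generation network may include the first-resolution sample image and multiple noise sample images of different resolutions; alternatively, both the first input image and the second input image include the first-resolution sample image and one noise sample image, and at the l-th iteration the generation network generates a noise sample image at the required magnification according to the amplitude of the noise sample.
The interpolation module 31 is configured to interpolate the image to be processed I^{l-1} to obtain a fourth-resolution image based on I^{l-1}; in this example, the resolution of the fourth-resolution image is 512*512. The interpolation module may use a conventional interpolation method such as bicubic interpolation. The resolution of the fourth-resolution image is greater than the resolution of the image to be processed I^{l-1}.
The second analysis module 12 is configured to generate a feature image of the fourth-resolution image, the number of channels of the feature image being greater than the number of channels of the fourth-resolution image.
The first down-sampling module 51 is configured to down-sample the feature image of the fourth-resolution image to obtain a first down-sampled feature image. In this example, the resolution of the first down-sampled feature image is 256*256.
The second connection module 22 is configured to concatenate the first merged image RC^{l-1}_μ with the first down-sampled feature image to obtain a second merged image.
The first up-sampling module 41 is configured to up-sample the second merged image to obtain a first up-sampled feature image R^l_0.
The iterative residual correction system performs at least one residual correction on the first up-sampled feature image through back-projection to obtain a residual-corrected feature image.
The iterative residual correction system includes a second down-sampling module 52, a second up-sampling module 42, and a residual determination module 60. The second down-sampling module 52 is configured to down-sample the image it receives by a factor of 2; the second up-sampling module 42 is configured to up-sample the image it receives by a factor of 2; and the residual determination module 60 is configured to determine the difference image between the two images it receives.
In the first residual correction, the first up-sampled feature image R^l_0 is down-sampled by a factor of 2 by a first one of the second down-sampling modules 52 to obtain a feature image R^l_{01}; the feature image R^l_{01} is down-sampled by a factor of 2 by a second one of the second down-sampling modules to obtain a feature image R^l_{02} with the same resolution as the initial input image. A residual determination module then obtains the difference image between R^l_{02} and the first merged image RC^0_μ of the first resolution-increasing step (that is, the first merged image obtained by merging the feature image of the original input image with the noise image). A second up-sampling module then up-samples this difference image, and the superposition module 70 superimposes the up-sampled feature image on R^l_{01} to obtain a feature image R^l_{03} with the same resolution as the first merged image RC^{l-1}_μ. Another residual determination module then obtains the difference image between R^l_{03} and the first merged image RC^{l-1}_μ; a second up-sampling module 42 up-samples this difference image by a factor of 2, and the up-sampled image is superimposed on the first up-sampled feature image R^l_0 to obtain the feature image R^l_1 after the first residual correction.
The same process can then be applied to the feature image R^l_1 for a second residual correction, yielding the feature image R^l_2 after the second residual correction; the same process can be applied again to R^l_2 for a third residual correction, and so on. In the figure, μ denotes the number of residual corrections.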
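The idea of a back-projection residual correction can be sketched in a heavily simplified, single-level form: down-sample the current estimate, take the difference from a lower-resolution reference, up-sample that difference, and add it back. In the disclosed network these operations act on multi-channel feature images and may be learned convolutional layers, with two nested levels when l = 2; here they are replaced, as an assumption, by fixed average pooling and pixel repetition on a single channel. All function names are hypothetical.

```python
def down2(img):
    """2x down-sampling by averaging each 2x2 block (stand-in for module 52)."""
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, len(img[0]), 2)]
            for i in range(0, len(img), 2)]

def up2(img):
    """2x up-sampling by repeating each pixel (stand-in for module 42)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def combine(a, b, sign):
    """Element-wise a + sign * b for two equally sized 2-D lists."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def residual_correction(upsampled, reference):
    """One back-projection step: correct `upsampled` so it agrees with `reference`."""
    residual = combine(reference, down2(upsampled), -1)   # residual determination
    return combine(upsampled, up2(residual), +1)          # superposition

r0 = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
rc = [[2, 2], [2, 2]]
r1 = residual_correction(r0, rc)
```

With these fixed operators a single step already makes the down-sampled estimate equal the reference; with learned operators the agreement is approximate, which is why the correction may be repeated several times.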
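The generation network also includes a synthesis module 80 configured to synthesize the feature image R^l_μ obtained after multiple residual corrections into a fifth-resolution image with the same number of channels as the fourth-resolution image; the fifth-resolution image and the fourth-resolution image are superimposed to obtain the output image I^l after the l-th resolution increase. The resolution of the fifth-resolution image is the same as that of the fourth-resolution image.
In the generation network, the first analysis module 11, the second analysis module 12, the first up-sampling module 41, the second up-sampling module 42, the first down-sampling module 51, the second down-sampling module 52 and the synthesis module 80 can each be implemented with convolutional layers.
The above takes l = 2 as an example and describes the second resolution-increasing pass of the iterative processing; the other resolution-increasing passes are similar and are not described in detail here.
The present disclosure also provides a computer device including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the above training method for the generative adversarial network.
The present disclosure also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above training method for the generative adversarial network.
The above memory and computer-readable storage medium include, but are not limited to, the following readable media: random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, magnetic disks or tapes, optical storage media such as compact discs (CDs) or DVDs (digital versatile discs), and other non-transitory media. Examples of the processor include, but are not limited to, a general-purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and the like.
It can be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principles of the present disclosure, and the present disclosure is not limited thereto. Those of ordinary skill in the art can make various modifications and improvements without departing from the spirit and essence of the present disclosure, and such modifications and improvements are also regarded as falling within the scope of protection of the present disclosure.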

Claims (14)

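  1. A training method for a generative adversarial network, the generative adversarial network including a generation network and a discrimination network, the generation network being used to convert a first-resolution image into a second-resolution image, the resolution of the second-resolution image being greater than that of the first-resolution image, the training method including a generation network training step, wherein the generation network training step includes:
    extracting a first-resolution sample image from a second-resolution sample image, the resolution of the second-resolution sample image being higher than that of the first-resolution sample image;
    providing a first input image and a second input image separately to the generation network to generate a first output image based on the first input image and a second output image based on the second input image; wherein the first input image includes the first-resolution sample image and a first noise image corresponding to a noise sample of a first amplitude; the second input image includes the first-resolution sample image and a second noise image corresponding to a noise sample of a second amplitude; the first amplitude is greater than 0 and the second amplitude is equal to 0;
    providing the first output image and the second-resolution sample image separately to the discrimination network, the discrimination network outputting a first discrimination result based on the first output image and a second discrimination result based on the second-resolution sample image;
    adjusting the parameters of the generation network to reduce the loss function of the generation network, wherein the loss function of the generation network includes a first loss, a second loss and a third loss; the first loss is based on the reconstruction error between the second output image and the second-resolution sample image; the second loss is based on the perceptual error between the first output image and the second-resolution sample image; and the third loss is based on the first discrimination result and the second discrimination result.
  2. The training method according to claim 1, wherein the reconstruction error between the second output image and the second-resolution sample image is determined from any one of: the L1 norm of the difference image matrix between the second output image and the second-resolution sample image, the mean square error between the second output image and the second-resolution sample image, and the structural similarity between the second output image and the second-resolution sample image.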
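  3. The training method according to claim 1, wherein the first output image and the second output image are both generated by the generation network through iterative processing of a resolution-increasing step, and the first loss of the loss function of the generation network is λ1·L_rec(X, Y_{n=0}), where:
    Figure PCTCN2019107761-appb-100001
    where X is the second-resolution sample image;
    Y_{n=0} is the second output image;
    L_rec(X, Y_{n=0}) is the reconstruction error between the second output image and the second-resolution sample image;
    L is the total number of resolution-increasing steps in the iterative processing, L ≥ 1;
    Figure PCTCN2019107761-appb-100002
    is the image generated at the end of the l-th resolution-increasing step in the iterative processing performed by the generation network based on the second input image, l ≤ L;
    LR is the first-resolution sample image;
    Figure PCTCN2019107761-appb-100003
    is the image obtained by down-sampling
    Figure PCTCN2019107761-appb-100004
    to the same resolution as the first-resolution sample image;
    HR^l is the image obtained by down-sampling the second-resolution sample image to the resolution of
    Figure PCTCN2019107761-appb-100005
    so that the two images have the same resolution;
    E[] is the computation of matrix energy; and
    λ1 is a preset weight.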
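  4. The training method according to claim 3, wherein the second loss of the loss function of the generation network is λ2·L_per(X, Y_{n=1}), where:
    Figure PCTCN2019107761-appb-100006
    where Y_{n=1} is the first output image;
    L_per(X, Y_{n=1}) is the perceptual error between the first output image and the second-resolution sample image;
    Figure PCTCN2019107761-appb-100007
    is the image generated at the end of the l-th resolution-increasing step in the iterative processing performed by the generation network based on the first input image;
    Figure PCTCN2019107761-appb-100008
    is the image obtained by down-sampling
    Figure PCTCN2019107761-appb-100009
    to the same resolution as the first-resolution sample image;
    L_CX() is the perceptual loss calculation function; and
    λ2 is a preset weight.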
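  5. The training method according to claim 4, wherein the third loss of the loss function of the generation network is λ3·L_GAN(Y_{n=1}), where:
    Figure PCTCN2019107761-appb-100010
    where
    Figure PCTCN2019107761-appb-100011
    is the image group generated when the generation network performs the iterative processing based on the first input image, the image group including the image generated at the end of each resolution-increasing step;
    HR^{1,2,...,L} are the images obtained by down-sampling the second-resolution sample image; they correspond one-to-one to the images in
    Figure PCTCN2019107761-appb-100012
    and have the same resolutions as those images;
    Figure PCTCN2019107761-appb-100013
    is the first discrimination result; D(HR^{1,2,...,L}) is the second discrimination result; and
    λ3 is a preset weight.
  6. The training method according to claim 5, wherein λ1:λ2:λ3 = 10:0.1:0.001.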
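  7. The training method according to claim 1, wherein the noise sample is random noise.
  8. The training method according to claim 1, wherein
    the training method further includes a discrimination network training step, which includes: providing the first output image and the second-resolution sample image separately to the discrimination network, so that the discrimination network outputs a discrimination result based on the first output image and a discrimination result based on the second-resolution sample image; and adjusting the parameters of the discrimination network to reduce the loss function of the discrimination network;
    the discrimination network training step and the generation network training step are performed alternately until a preset training condition is reached.
  9. The training method according to claim 8, wherein
    the first output image and the second output image are both generated by the generation network through iterative processing of a resolution-increasing step, and the total number of resolution-increasing steps in the iterative processing is L; when L is greater than 1, in the first L-1 resolution-increasing steps of the iterative processing performed by the generation network based on the first input image, the generation network generates one intermediate image each time the resolution is increased;
    in the discrimination network training step, when the first output image is provided to the discrimination network, each intermediate image generated by the generation network based on the first input image is also provided to the discrimination network; when the second-resolution sample image is provided to the discrimination network, the third-resolution sample images, which are obtained by down-sampling the second-resolution sample image and correspond one-to-one to the intermediate images at the same resolutions, are also provided to the discrimination network.
  10. An image processing method using the generation network of the generative adversarial network obtained by the training method according to any one of claims 1 to 9, wherein the image processing method is used to increase the resolution of an image and includes:
    providing an input image and a noise image corresponding to reference noise to the generation network, so that the generation network generates a second-resolution image based on the input image.
  11. The image processing method according to claim 10, wherein the amplitude of the reference noise is between 0 and the first amplitude.
  12. The image processing method according to claim 10, wherein the reference noise is random noise.
  13. A computer device including a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, implements the training method according to any one of claims 1 to 9.
  14. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the training method according to any one of claims 1 to 9.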
PCT/CN2019/107761 2018-09-30 2019-09-25 生成对抗网络训练方法、图像处理方法、设备及存储介质 WO2020063648A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP19864756.2A EP3859655A4 (en) 2018-09-30 2019-09-25 TRAINING METHOD, IMAGE PROCESSING METHOD, DEVICE AND STORAGE MEDIUM FOR GENERATIVE ADVERSARY NETWORK
JP2020528931A JP7446997B2 (ja) 2018-09-30 2019-09-25 敵対的生成ネットワークのトレーニング方法、画像処理方法、デバイスおよび記憶媒体
KR1020207014462A KR102389173B1 (ko) 2018-09-30 2019-09-25 생성 적대 네트워크를 위한 트레이닝 방법, 이미지 처리 방법, 디바이스 및 저장 매체
US16/759,669 US11449751B2 (en) 2018-09-30 2019-09-25 Training method for generative adversarial network, image processing method, device and storage medium

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN201811155326.6A CN109345455B (zh) 2018-09-30 2018-09-30 图像鉴别方法、鉴别器和计算机可读存储介质
CN201811155930.9 2018-09-30
CN201811155147.2 2018-09-30
CN201811155252.6A CN109255390B (zh) 2018-09-30 2018-09-30 训练图像的预处理方法及模块、鉴别器、可读存储介质
CN201811155147.2A CN109360151B (zh) 2018-09-30 2018-09-30 图像处理方法及系统、分辨率提升方法、可读存储介质
CN201811155252.6 2018-09-30
CN201811155930.9A CN109345456B (zh) 2018-09-30 2018-09-30 生成对抗网络训练方法、图像处理方法、设备及存储介质
CN201811155326.6 2018-09-30

Publications (1)

Publication Number Publication Date
WO2020063648A1 true WO2020063648A1 (zh) 2020-04-02

Family

ID=69950197

Family Applications (4)

Application Number Title Priority Date Filing Date
PCT/CN2019/083872 WO2020062846A1 (en) 2018-09-30 2019-04-23 Apparatus and method for image processing, and system for training neural network
PCT/CN2019/092042 WO2020062957A1 (en) 2018-09-30 2019-06-20 System, method, and computer-readable medium for image classification
PCT/CN2019/092113 WO2020062958A1 (en) 2018-09-30 2019-06-20 Apparatus, method, and computer-readable medium for image processing, and system for training a neural network
PCT/CN2019/107761 WO2020063648A1 (zh) 2018-09-30 2019-09-25 生成对抗网络训练方法、图像处理方法、设备及存储介质

Country Status (9)

Country Link
US (4) US11615505B2 (zh)
EP (4) EP3857447A4 (zh)
JP (3) JP7415251B2 (zh)
KR (2) KR102661434B1 (zh)
AU (1) AU2019350918B2 (zh)
BR (1) BR112020022560A2 (zh)
MX (1) MX2020013580A (zh)
RU (1) RU2762144C1 (zh)
WO (4) WO2020062846A1 (zh)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7312026B2 (ja) * 2019-06-12 2023-07-20 キヤノン株式会社 画像処理装置、画像処理方法およびプログラム
EP3788933A1 (en) * 2019-09-05 2021-03-10 BSH Hausgeräte GmbH Method for controlling a home appliance
CN114981836A (zh) * 2020-01-23 2022-08-30 三星电子株式会社 电子设备和电子设备的控制方法
US11507831B2 (en) 2020-02-24 2022-11-22 Stmicroelectronics International N.V. Pooling unit for deep learning acceleration
EP3937120B1 (en) * 2020-07-08 2023-12-20 Sartorius Stedim Data Analytics AB Computer-implemented method, computer program product and system for processing images
US11887279B2 (en) * 2020-08-25 2024-01-30 Sharif University Of Technology Machine learning-based denoising of an image
US11455811B2 (en) * 2020-08-28 2022-09-27 Check it out Co., Ltd. System and method for verifying authenticity of an anti-counterfeiting element, and method for building a machine learning model used to verify authenticity of an anti-counterfeiting element
CN112132012B (zh) * 2020-09-22 2022-04-26 中国科学院空天信息创新研究院 基于生成对抗网络的高分辨率sar船舶图像生成方法
CN114830168A (zh) * 2020-11-16 2022-07-29 京东方科技集团股份有限公司 图像重建方法、电子设备和计算机可读存储介质
CN112419200B (zh) * 2020-12-04 2024-01-19 宁波舜宇仪器有限公司 一种图像质量优化方法及显示方法
US11895330B2 (en) * 2021-01-25 2024-02-06 Lemon Inc. Neural network-based video compression with bit allocation
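CN113012064B (zh) * 2021-03-10 2023-12-12 腾讯科技(深圳)有限公司 Image processing method, apparatus, device, and storage medium
CN112884673A (zh) * 2021-03-11 2021-06-01 西安建筑科技大学 Method for reconstructing missing information between blocks of tomb-chamber murals using SinGAN with an improved loss function
CN113962360B (zh) * 2021-10-09 2024-04-05 西安交通大学 GAN-based sample data augmentation method and system
KR102548283B1 (ko) * 2021-12-22 2023-06-27 (주)뉴로컴즈 Convolutional neural network computing device
CN114331903B (zh) * 2021-12-31 2023-05-12 电子科技大学 Image restoration method and storage medium
CN115063492B (zh) * 2022-04-28 2023-08-08 宁波大学 Method for generating adversarial examples resistant to JPEG compression
KR20240033619A (ko) 2022-09-05 2024-03-12 삼성에스디에스 주식회사 Method and apparatus for extracting a region of interest in a document
CN115393242A (zh) * 2022-09-30 2022-11-25 国网电力空间技术有限公司 GAN-based method and apparatus for augmenting image data of foreign objects on power grids
CN115631178B (zh) * 2022-11-03 2023-11-10 昆山润石智能科技有限公司 Automatic wafer defect detection method, system, device, and storage medium
CN117196985A (zh) * 2023-09-12 2023-12-08 军事科学院军事医学研究院军事兽医研究所 Visual rain and fog removal method based on deep reinforcement learning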

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8675999B1 (en) * 2012-09-28 2014-03-18 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Apparatus, system, and method for multi-patch based super-resolution from an image
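CN107767343A (zh) * 2017-11-09 2018-03-06 京东方科技集团股份有限公司 Image processing method, processing apparatus, and processing device
WO2018086354A1 (zh) * 2016-11-09 2018-05-17 京东方科技集团股份有限公司 Image upscaling system, training method therefor, and image upscaling method
CN109255390A (zh) * 2018-09-30 2019-01-22 京东方科技集团股份有限公司 Training image preprocessing method and module, discriminator, and readable storage medium
CN109345456A (zh) * 2018-09-30 2019-02-15 京东方科技集团股份有限公司 Generative adversarial network training method, image processing method, device, and storage medium
CN109345455A (zh) * 2018-09-30 2019-02-15 京东方科技集团股份有限公司 Image discrimination method, discriminator, and computer-readable storage medium
CN109360151A (zh) * 2018-09-30 2019-02-19 京东方科技集团股份有限公司 Image processing method and system, resolution enhancement method, and readable storage medium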

Family Cites Families (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781196A (en) 1990-10-19 1998-07-14 Eidos Plc Of The Boat House Video compression by extracting pixel changes exceeding thresholds
US5754697A (en) 1994-12-02 1998-05-19 Fu; Chi-Yung Selective document image data compression technique
US6766067B2 (en) * 2001-04-20 2004-07-20 Mitsubishi Electric Research Laboratories, Inc. One-pass super-resolution images
US7215831B2 (en) 2001-04-26 2007-05-08 Georgia Tech Research Corp. Video enhancement using multiple frame techniques
AU2002366985A1 (en) 2001-12-26 2003-07-30 Yeda Research And Development Co.Ltd. A system and method for increasing space or time resolution in video
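CN101593269B (zh) 2008-05-29 2012-05-02 汉王科技股份有限公司 Face recognition apparatus and method
CN102770887A (zh) 2010-01-28 2012-11-07 耶路撒冷希伯来大学伊森姆研究发展有限公司 Method and system for generating an output image of increased pixel resolution from an input image
CN101872472B (zh) 2010-06-02 2012-03-28 中国科学院自动化研究所 Face image super-resolution reconstruction method based on sample learning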
US9378542B2 (en) 2011-09-28 2016-06-28 The United States Of America As Represented By The Secretary Of The Army System and processor implemented method for improved image quality and generating an image of a target illuminated by quantum particles
US8737728B2 (en) * 2011-09-30 2014-05-27 Ebay Inc. Complementary item recommendations using image feature data
EP2662824A1 (en) 2012-05-10 2013-11-13 Thomson Licensing Method and device for generating a super-resolution version of a low resolution input data structure
CN102915527A (zh) 2012-10-15 2013-02-06 中山大学 Face image super-resolution reconstruction method based on morphological component analysis
US9948963B2 (en) 2012-11-27 2018-04-17 Lg Electronics Inc. Signal transceiving apparatus and signal transceiving method
CN103514580B (zh) 2013-09-26 2016-06-08 香港应用科技研究院有限公司 Method and system for obtaining super-resolution images optimized for visual experience
EP2908285A1 (en) 2014-02-13 2015-08-19 Thomson Licensing Method for performing super-resolution on single images and apparatus for performing super-resolution on single images
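CN104853059B (zh) 2014-02-17 2018-12-18 台达电子工业股份有限公司 Super-resolution image processing method and apparatus thereof
TWI492187B (zh) 2014-02-17 2015-07-11 Delta Electronics Inc Super-resolution image processing method and apparatus thereof
CN103903236B (zh) * 2014-03-10 2016-08-31 北京信息科技大学 Method and apparatus for face image super-resolution reconstruction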
WO2015143624A1 (en) 2014-03-25 2015-10-01 Spreadtrum Communications(Shanghai) Co., Ltd. Methods and systems for denoising images
US9865036B1 (en) 2015-02-05 2018-01-09 Pixelworks, Inc. Image super resolution via spare representation of multi-class sequential and joint dictionaries
KR102338372B1 (ko) * 2015-09-30 2021-12-13 삼성전자주식회사 Method and apparatus for segmenting an object from an image
RU2694021C1 (ru) 2015-12-14 2019-07-08 Моушен Метрикс Интернешэнл Корп. Method and apparatus for identifying parts of fragmented material within an image
US10360477B2 (en) * 2016-01-11 2019-07-23 Kla-Tencor Corp. Accelerating semiconductor-related computations using learning based models
CN107315566B (zh) * 2016-04-26 2020-11-03 中科寒武纪科技股份有限公司 Apparatus and method for performing vector circular shift operations
FR3050846B1 (fr) * 2016-04-27 2019-05-03 Commissariat A L'energie Atomique Et Aux Energies Alternatives Device and method for distributing convolution data of a convolutional neural network
CN105976318A (zh) 2016-04-28 2016-09-28 北京工业大学 Image super-resolution reconstruction method
CN105975931B (zh) 2016-05-04 2019-06-14 浙江大学 Convolutional neural network face recognition method based on multi-scale pooling
CN105975968B (zh) * 2016-05-06 2019-03-26 西安理工大学 Deep learning license plate character recognition method based on the Caffe framework
RU2635883C1 (ru) * 2016-06-02 2017-11-16 Самсунг Электроникс Ко., Лтд. Image processing method and system for forming super-resolution images
US10319076B2 (en) 2016-06-16 2019-06-11 Facebook, Inc. Producing higher-quality samples of natural images
US11024009B2 (en) 2016-09-15 2021-06-01 Twitter, Inc. Super resolution using a generative adversarial network
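JP2018063504A (ja) * 2016-10-12 2018-04-19 株式会社リコー Generative model learning method, apparatus, and program
KR20180057096A (ko) * 2016-11-21 2018-05-30 삼성전자주식회사 Method and apparatus for performing facial expression recognition and training
CN108229508B (zh) * 2016-12-15 2022-01-04 富士通株式会社 Training apparatus and training method for training an image processing apparatus
KR101854071B1 (ko) * 2017-01-13 2018-05-03 고려대학교 산학협력단 Method and apparatus for generating a region-of-interest image using deep learning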
US10482639B2 (en) * 2017-02-21 2019-11-19 Adobe Inc. Deep high-resolution style synthesis
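KR101947782B1 (ko) * 2017-02-22 2019-02-13 한국과학기술원 Apparatus and method for distance estimation based on thermal images, and neural network learning method therefor
JP2018139071A (ja) * 2017-02-24 2018-09-06 株式会社リコー Generative model learning method, generative model learning apparatus, and program
KR102499396B1 (ko) * 2017-03-03 2023-02-13 삼성전자 주식회사 Neural network device and method of operating a neural network device
RU2652722C1 (ru) * 2017-05-03 2018-04-28 Самсунг Электроникс Ко., Лтд. Data processing for super-resolution
CN107133601B (zh) 2017-05-13 2021-03-23 五邑大学 Pedestrian re-identification method based on generative adversarial network image super-resolution technology
CN107154023B (zh) 2017-05-17 2019-11-05 电子科技大学 Face super-resolution reconstruction method based on a generative adversarial network and sub-pixel convolution
CN107369189A (zh) 2017-07-21 2017-11-21 成都信息工程大学 Medical image super-resolution reconstruction method based on feature loss
CN107527044B (zh) 2017-09-18 2021-04-30 北京邮电大学 Search-based method and apparatus for sharpening multiple license plate images
CN108476291A (zh) 2017-09-26 2018-08-31 深圳市大疆创新科技有限公司 Image generation method, image generation apparatus, and machine-readable storage medium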
US10552944B2 (en) * 2017-10-13 2020-02-04 Adobe Inc. Image upscaling with controllable noise reduction using a neural network
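CN108122197B (zh) * 2017-10-27 2021-05-04 江西高创保安服务技术有限公司 Image super-resolution reconstruction method based on deep learning
CN107766860A (zh) 2017-10-31 2018-03-06 武汉大学 Natural scene image text detection method based on cascaded convolutional neural networks
CN108154499B (zh) 2017-12-08 2021-10-08 东华大学 Woven fabric texture defect detection method based on a K-SVD learned dictionary
CN108052940A (zh) 2017-12-17 2018-05-18 南京理工大学 Deep-learning-based water surface target detection method for SAR remote sensing images
CN107977932B (zh) 2017-12-28 2021-04-23 北京工业大学 Face image super-resolution reconstruction method based on a generative adversarial network with discriminable attribute constraints
CN108268870B (zh) 2018-01-29 2020-10-09 重庆师范大学 Adversarial-learning-based multi-scale feature fusion method for semantic segmentation of ultrasound images
CN108334848B (zh) 2018-02-06 2020-12-25 哈尔滨工业大学 Tiny face recognition method based on a generative adversarial network
CN108416428B (zh) 2018-02-28 2021-09-14 中国计量大学 Robot visual positioning method based on a convolutional neural network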
US11105942B2 (en) * 2018-03-27 2021-08-31 Schlumberger Technology Corporation Generative adversarial network seismic data processor
US10783622B2 (en) * 2018-04-25 2020-09-22 Adobe Inc. Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image
US11222415B2 (en) * 2018-04-26 2022-01-11 The Regents Of The University Of California Systems and methods for deep learning microscopy
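CN108596830B (zh) * 2018-04-28 2022-04-22 国信优易数据股份有限公司 Image style transfer model training method and image style transfer method
KR102184755B1 (ko) * 2018-05-31 2020-11-30 서울대학교 산학협력단 Apparatus and method for training a face-specific super-resolution deep neural network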
US11756160B2 (en) * 2018-07-27 2023-09-12 Washington University ML-based methods for pseudo-CT and HR MR image estimation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8675999B1 (en) * 2012-09-28 2014-03-18 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Apparatus, system, and method for multi-patch based super-resolution from an image
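WO2018086354A1 (zh) * 2016-11-09 2018-05-17 京东方科技集团股份有限公司 Image upscaling system, training method therefor, and image upscaling method
CN107767343A (zh) * 2017-11-09 2018-03-06 京东方科技集团股份有限公司 Image processing method, processing apparatus, and processing device
CN109255390A (zh) * 2018-09-30 2019-01-22 京东方科技集团股份有限公司 Training image preprocessing method and module, discriminator, and readable storage medium
CN109345456A (zh) * 2018-09-30 2019-02-15 京东方科技集团股份有限公司 Generative adversarial network training method, image processing method, device, and storage medium
CN109345455A (zh) * 2018-09-30 2019-02-15 京东方科技集团股份有限公司 Image discrimination method, discriminator, and computer-readable storage medium
CN109360151A (zh) * 2018-09-30 2019-02-19 京东方科技集团股份有限公司 Image processing method and system, resolution enhancement method, and readable storage medium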

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3859655A4 *

Also Published As

Publication number Publication date
JP2022501661A (ja) 2022-01-06
AU2019350918B2 (en) 2021-10-07
JP7446997B2 (ja) 2024-03-11
JP2022501662A (ja) 2022-01-06
US20210334642A1 (en) 2021-10-28
WO2020062846A1 (en) 2020-04-02
WO2020062957A1 (en) 2020-04-02
JP7415251B2 (ja) 2024-01-17
RU2762144C1 (ru) 2021-12-16
BR112020022560A2 (pt) 2021-06-01
MX2020013580A (es) 2021-02-26
EP3859655A1 (en) 2021-08-04
EP3857503A4 (en) 2022-07-20
US11615505B2 (en) 2023-03-28
US20210365744A1 (en) 2021-11-25
US11449751B2 (en) 2022-09-20
US11348005B2 (en) 2022-05-31
US20210342976A1 (en) 2021-11-04
JP2022501663A (ja) 2022-01-06
KR102389173B1 (ko) 2022-04-21
KR102661434B1 (ko) 2024-04-29
US20200285959A1 (en) 2020-09-10
KR20210012009A (ko) 2021-02-02
KR20200073267A (ko) 2020-06-23
US11361222B2 (en) 2022-06-14
EP3857504A1 (en) 2021-08-04
AU2019350918A1 (en) 2020-11-19
EP3857504A4 (en) 2022-08-10
EP3859655A4 (en) 2022-08-10
EP3857447A1 (en) 2021-08-04
WO2020062958A1 (en) 2020-04-02
EP3857447A4 (en) 2022-06-29
EP3857503A1 (en) 2021-08-04
JP7463643B2 (ja) 2024-04-09

Similar Documents

Publication Publication Date Title
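WO2020063648A1 (zh) Generative adversarial network training method, image processing method, device, and storage medium
CN109345456B (zh) Generative adversarial network training method, image processing method, device, and storage medium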
US9405960B2 (en) Face hallucination using convolutional neural networks
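CN109360151B (zh) Image processing method and system, resolution enhancement method, and readable storage medium
CN110046644B (zh) Method and apparatus for certificate anti-counterfeiting, computing device, and storage medium
JP4933186B2 (ja) Image processing apparatus, image processing method, program, and storage medium
CN110059728B (zh) Attention-model-based visual saliency detection method for RGB-D images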
US8861883B2 (en) Image processing apparatus, image processing method, and storage medium storing image processing program
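CN109872305B (zh) No-reference stereoscopic image quality assessment method based on a quality map generation network
CN108875623B (zh) Face recognition method based on image feature fusion and comparison technology
JP2011100395A (ja) Discrimination apparatus, discrimination method, and program
JP4901229B2 (ja) Red-eye detection method, apparatus, and program
CN111784624A (zh) Object detection method, apparatus, device, and computer-readable storage medium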
US20220164601A1 (en) Methods and Apparatuses of Contrastive Learning for Color Constancy
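CN106780333B (zh) Image super-resolution reconstruction method
WO2022077417A1 (zh) Image processing method, image processing device, and readable storage medium
JP4868249B2 (ja) Video signal processing apparatus
CN109410143B (zh) Image enhancement method, apparatus, electronic device, and computer-readable medium
CN115170435A (zh) Image geometric distortion correction method based on a Unet network
CN109685839B (zh) Image alignment method, mobile terminal, and computer storage medium
CN111383187B (zh) Image processing method, apparatus, and smart terminal
CN111383172B (zh) Neural network model training method, apparatus, and smart terminal
CN117527983A (zh) Transformer-based image information hiding method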
US7421147B2 (en) Hybrid template matching for imaging applications
CN112750151A (zh) Clothing color matching method, apparatus, and device based on mathematical statistics

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19864756; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 20207014462; Country of ref document: KR; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 2020528931; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase. Ref country code: DE
WWE WIPO information: entry into national phase. Ref document number: 2019864756; Country of ref document: EP