WO2021073493A1 - 图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质 - Google Patents

图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质 Download PDF

Info

Publication number
WO2021073493A1
WO2021073493A1 PCT/CN2020/120586 CN2020120586W WO2021073493A1 WO 2021073493 A1 WO2021073493 A1 WO 2021073493A1 CN 2020120586 W CN2020120586 W CN 2020120586W WO 2021073493 A1 WO2021073493 A1 WO 2021073493A1
Authority
WO
WIPO (PCT)
Prior art keywords
level
image
output
neural network
processing
Prior art date
Application number
PCT/CN2020/120586
Other languages
English (en)
French (fr)
Inventor
那彦波
陈文彬
刘瀚文
朱丹
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 filed Critical 京东方科技集团股份有限公司
Priority to US17/419,350 priority Critical patent/US11954822B2/en
Publication of WO2021073493A1 publication Critical patent/WO2021073493A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4046Scaling the whole image or part thereof using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T5/92
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T5/60
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the embodiments of the present disclosure relate to an image processing method and device, a neural network training method, an image processing method incorporating a neural network model, a construction method of a merged neural network model, a neural network processor, and a storage medium.
  • CNN Convolutional Neural Network
  • At least one embodiment of the present disclosure provides an image processing method, including: acquiring an input image; and based on the input image, acquiring initial feature images with N levels of resolution arranged from high to low, where N is a positive integer and N>2; Based on the initial feature images of the second to N levels, the initial feature image of the first level is cyclically zoomed to obtain an intermediate feature image; and the intermediate feature image is synthesized to obtain an output image; wherein, the Cyclic scaling processing includes: N-1 levels of nested scaling processing, each level of scaling processing includes down-sampling processing, connection processing, up-sampling processing, and residual link addition processing; down-sampling of the i-th level The processing is based on the down-sampling of the input of the scaling process of the i-th level to obtain the down-sampled output of the i-th level, and the connection processing of the i-th level is based on the down-sampling output of the i-th level and the initial feature image of the i+1-
  • the up-sampling process of the i-th level is based on the joint output of the i-th level to obtain the up-sampling output of the i-th level, and the residual link addition processing of the i-th level will process the i-th level
  • connection processing of the i-th level is based on the down-sampling output of the i-th level and the initial feature image of the i+1-th level to obtain the joint output of the i-th level, including:
  • the down-sampled output of the i level is used as the input of the scaling process of the i+1th level to obtain the output of the scaling process of the i+1th level; and the output of the scaling process of the i+1th level is compared with The initial feature images of the (i+1)th level are connected to obtain the joint output of the i-th level.
  • the scaling process of at least one level is continuously performed multiple times, and the output of the previous scaling process is used as the input of the next scaling process.
  • the scaling process of each level is executed twice in succession.
  • the resolution of the initial feature image of the first level is the highest, and the resolution of the initial feature image of the first level is the same as the resolution of the input image.
  • the resolution of the initial feature image of the previous level is an integer multiple of the resolution of the initial feature image of the next level.
  • obtaining the initial feature images of the N levels arranged from high to low resolution includes: connecting the input image with a random noise image to obtain a joint input image; and The joint input image is analyzed and processed at N different levels to obtain the initial feature images of the N levels arranged from high to low resolution respectively.
  • acquiring the input image includes: acquiring an original input image with a first resolution; and performing resolution conversion processing on the original input image to obtain the input image with a second resolution.
  • the second resolution is greater than the first resolution.
  • one of the bicubic interpolation algorithm, the bilinear interpolation algorithm, and the Lanczos interpolation algorithm is used to perform the resolution conversion processing.
  • the image processing method further includes: cropping the input image to obtain multiple sub-input images with overlapping regions;
  • the obtaining of initial feature images of N levels with resolution from high to low based on the input image specifically includes: obtaining, based on each sub-input image, the initial feature images of N levels with resolution from high to low, where N is Positive integer, and N>2;
  • the performing cyclic scaling processing on the initial feature image of the first level based on the initial feature images of the second to N levels to obtain the intermediate feature image specifically includes: based on the sub-initial feature images of the second to N levels, The first-level sub-initial feature image is cyclically zoomed to obtain the sub-intermediate feature image;
  • the synthesizing the intermediate feature image to obtain an output image specifically includes: synthesizing the sub-intermediate feature image to obtain a corresponding sub-output image; and outputting sub-outputs corresponding to the multiple sub-input images.
  • the image is spliced into the output image.
  • the multiple sub-input images have the same size, the centers of the multiple sub-input images form a uniform and regular grid, and in the row direction and column direction, the size of the overlapping area of two adjacent sub-input images
  • the size is constant, and the pixel value of each pixel in the output image is expressed as:
  • Y p represents the pixel value of any pixel p in the output image
  • T represents the number of sub-output images that include the pixel p
  • Y k represents that the pixel p includes The pixel value in the sub-output image of the pixel p
  • s k represents the sub-output of the pixel p to the k-th sub-output image including the pixel p in the k-th sub-output image including the pixel p The distance from the center of the image.
  • At least one embodiment of the present disclosure further provides an image processing method combining neural network models, wherein the combined neural network model includes multiple neural networks, the multiple neural networks are used to perform the same image processing task, and the multiple neural networks are used to perform the same image processing task.
  • the resolutions of the input images of the two neural networks are the same, the resolutions of the output images of the multiple neural networks are the same, and at least one of the structures and parameters of the multiple neural networks is different;
  • the image processing method includes: inputting an input image into the plurality of neural networks in the merged neural network model to obtain the outputs of the plurality of neural networks respectively; and adding the outputs of the plurality of neural networks to obtain Average value to obtain the output of the merged neural network model.
  • the plurality of neural networks includes a first neural network, the first neural network is used to execute a first image processing method, and the first image processing method includes: obtaining an input image; and obtaining a resolution from the input image based on the input image.
  • the cyclic zoom processing includes: N-1 levels of nested zoom processing, each level of zoom processing includes the following Sampling processing, connection processing, up-sampling processing and residual link addition processing; the down-sampling processing of the i-th level is based on the input of the scaling processing of the i-th level for down-sampling to obtain the down-sampled output of the i-th level, and the connection of the i-th level
  • the processing is
  • the residual link addition process of the i-th level The input of the scaling process of the i-th level and the up-sampling output of the i-th level are subjected to residual link addition to obtain the scaling process of the i-th level
  • the output of, where i 1,2,...,N-1; the scaling process of the j+1 level is nested between the downsampling process of the jth level and the connection process of the jth level, the jth level
  • At least one embodiment of the present disclosure also provides a neural network training method, wherein the neural network includes: an analysis network, a cyclic scaling network, and a synthesis network; the training method includes: obtaining a first training input image; using the The analysis network processes the first training input image to obtain N levels of training initial feature images arranged from high to low resolution, where N is a positive integer, and N>2; using the cyclic scaling network, based on The training initial feature images of the second to N levels are cyclically zoomed on the training initial feature images of the first level to obtain the training intermediate feature images; the synthesis network is used to synthesize the training intermediate feature images to obtain A first training output image; based on the first training output image, calculating the loss value of the neural network through a loss function; and correcting the parameters of the neural network according to the loss value of the neural network; wherein, the Cyclic scaling processing includes: N-1 levels of nested scaling processing, each level of scaling processing includes down-sampling processing, connection processing, up-sampling processing
  • the up-sampling process of the i-th level is based on the joint output of the i-th level to obtain the up-sampling output of the i-th level.
  • the loss function is expressed as:
  • L(Y, X) represents the loss function
  • Y represents the first training output image
  • X represents the first training standard image corresponding to the first training input image
  • S k-1 () represents the first training output image.
  • E[] represents the calculation of matrix energy.
  • using the analysis network to process the first training input image to obtain the N levels of training initial feature images arranged from high to low resolution includes: combining the first training input image with Random noise images are connected to obtain a training joint input image; and N different levels of analysis processing are performed on the training joint input image using the analysis network to obtain the N resolutions arranged from high to low.
  • Hierarchical training initial feature images For example, using the analysis network to process the first training input image to obtain the N levels of training initial feature images arranged from high to low resolution includes: combining the first training input image with Random noise images are connected to obtain a training joint input image; and N different levels of analysis processing are performed on the training joint input image using the analysis network to obtain the N resolutions arranged from high to low. Hierarchical training initial feature images.
  • calculating the loss value of the neural network through the loss function based on the first training output image includes: using a discriminant network to process the first training output image, and based on the first training output image The output of the corresponding discriminant network calculates the loss value of the neural network.
  • the discriminant network includes: M-1 levels of down-sampling sub-networks, M levels of discriminant sub-networks, synthesis sub-networks, and activation layers; the M-1 levels of down-sampling sub-networks are used to The input of the discriminant network is subjected to different levels of down-sampling processing to obtain the output of the M-1 level of down-sampling sub-network; the input of the discriminant network and the output of the M-1 level of down-sampling sub-network are respectively Corresponding to the input of the discriminant network of the M levels; the discriminant network of each level includes the brightness processing sub-network, the first convolution sub-network, and the second convolution sub-network connected in sequence; the discriminant network of the t-th level The output of the second convolutional sub-network in the network is connected with the output of the first convolutional sub-network in the t+1-th level discriminant network and then becomes the second convolution in the t+1-th level discriminant network.
  • the brightness processing sub-network includes a brightness feature extraction sub-network, a normalization sub-network and a translation correlation sub-network
  • the brightness feature extraction sub-network is used to extract brightness feature images
  • the normalization sub-network is used to compare
  • the brightness feature image is normalized to obtain a normalized brightness feature image
  • the translation correlation sub-network is used to perform multiple image translation processing on the normalized brightness feature image to obtain multiple shifts And generate a plurality of correlation images according to the correlation between the normalized brightness characteristic image and each of the shifted images.
  • the loss function is expressed as:
  • L(Y,X) represents the loss function
  • Y represents the first training output image
  • X represents the first training standard image corresponding to the first training input image
  • L L1 (S M (Y W 0 )
  • S M (X)) represents the third contrast loss function
  • S M () represents the standard down-sampling processing of the M-th level
  • E[] represents the calculation of matrix energy
  • S 1 is a constant
  • F ij represents the value of the j-th position in the first content feature map of the first training output image extracted by the i-th convolution kernel in the conv3-4 module of the VGG-19 network
  • P ij Represents the value of the j-th position in the second content feature map of the first training standard image extracted by the i-th convolution kernel in the conv3-4 module of the VGG-19 network.
  • the training method of the neural network further includes: training the discrimination network based on the neural network; and alternately executing the training process of the discrimination network and the training process of the neural network to obtain training A good neural network; wherein, based on the neural network, training the discriminant network includes: obtaining a second training input image; using the neural network to process the second training input image to obtain a second Training output image; based on the second training output image, calculating a discriminant loss value through a discriminant loss function; and correcting the parameters of the discriminant network according to the discriminant loss value.
  • the discriminant loss function is expressed as:
  • U represents the second training standard image corresponding to the second training input image
  • C(U) represents the discriminant output image obtained by using the second training standard image as the input of the discriminant network
  • the neural network training method further includes: prior to training, performing cropping processing and decoding processing on each sample image of the training set to obtain multiple sub-sample images in a binary data format; during training, based on the The sub-sample images in the binary data format train the neural network.
  • the sizes of the multiple sub-sample images are the same.
  • At least one embodiment of the present disclosure further provides a method for constructing a merged neural network model, including: obtaining a plurality of trained neural network models, wherein the plurality of neural network models are used to perform the same image processing task, and the multiple neural network models The resolutions of the input images of the two neural network models are the same, the resolutions of the output images of the multiple neural network models are the same, and at least one of the structures and parameters of the multiple neural network models is different; in the same verification set Obtaining the output of the plurality of neural network models above, determining the evaluation quality of the plurality of neural network models according to a predetermined image quality evaluation standard, and sorting the plurality of neural network models according to the evaluation quality from high to low; Use the neural network model with the highest evaluation quality as the first neural network model in the combined neural network model; and determine whether the current remaining neural network model with the highest evaluation quality can be added to the current combined neural network model, if so, then The current remaining neural network model with the highest evaluation quality is added to the current merged neural network model,
  • the method for constructing the merged neural network model further includes: training the obtained merged neural network model to obtain a trained merged neural network model.
  • the predetermined image quality evaluation standard includes one of mean square error, similarity, and peak signal-to-noise ratio.
  • the plurality of neural network models includes a first neural network model, the first neural network model is used to execute a first image processing method, and the first image processing method includes: acquiring an input image; N levels of initial feature images arranged from high to low resolution, N is a positive integer, and N>2; based on the initial feature images of the 2nd to N levels, the first feature image of the N levels of initial feature images Perform cyclic scaling processing on the initial feature image of level 1 to obtain an intermediate feature image; and perform synthesis processing on the intermediate feature image to obtain an output image; wherein, the cyclic scaling process includes: N-1 levels of layer-by-layer Nested scaling processing, the scaling processing at each level includes down-sampling processing, joining processing, up-sampling processing and residual linking addition processing; the down-sampling processing of the i-th level is based on the input of the i-th level scaling processing.
  • the connection processing of the i-th level is based on the connection of the down-sampling output of the i-th level and the initial feature image of the i+1-th level to obtain the joint output of the i-th level
  • the upper part of the i-th level Sampling processing is based on the joint output of the i-th level to obtain the up-sampled output of the i-th level
  • the residual link addition processing of the i-th level combines the input of the scaling processing of the i-th level with the up-sampling of the i-th level
  • At least one embodiment of the present disclosure further provides a neural network processor, including an analysis circuit, a cyclic scaling circuit, and a synthesis circuit;
  • the analysis circuit is configured to obtain initial N levels of resolution from high to low based on the input image.
  • Feature image, N is a positive integer, and N>2;
  • the cyclic scaling circuit is configured to perform cyclic scaling processing on the initial feature image of the first level based on the initial feature image of the second to N levels to obtain an intermediate feature image;
  • the synthesis circuit is configured to perform synthesis processing on the intermediate feature image to obtain an output image;
  • the cyclic scaling circuit includes N-1 levels of scaling circuits nested layer by layer, and each level of scaling circuit includes Down-sampling circuit, connection circuit, up-sampling circuit, and residual link adding circuit;
  • the down-sampling circuit of the i-th level performs down-sampling based on the input of the scaling circuit of the i-th level to obtain the down-sampled output of the i
  • the up-sampling circuit of the i-th level obtains the joint output of the i-th level based on the joint output of the i-th level.
  • the up-sampling output of the i-level, the residual link addition circuit of the i-th level performs the residual link addition of the input of the scaling circuit of the i-th level and the up-sampled output of the i-th level to obtain the scaling of the i-th level
  • the output of the circuit, where i 1, 2,...,N-1; the scaling circuit of the j+1 level is nested between the downsampling circuit of the jth level and the connection circuit of the jth level, and the jth level
  • At least one embodiment of the present disclosure further provides an image processing device, including: an image acquisition module configured to acquire an input image; an image processing module configured to: based on the input image, obtain N levels of resolution arranged from high to low The initial feature image, where N is a positive integer, and N>2; based on the initial feature images of the second to N levels, the initial feature image of the first level is cyclically zoomed to obtain the intermediate feature image; and the intermediate feature The image is synthesized and processed to obtain an output image; wherein the cyclic zoom processing includes: N-1 levels of nested zoom processing, each level of zoom processing includes down-sampling processing, connection processing, and up-sampling processing And the residual link addition processing; the down-sampling processing of the i-th level is based on the input of the scaling processing of the i-th level to obtain the down-sampled output of the i-th level, and the connection processing of the i-th level is based on the down-sampling of the i-th level The sampled
  • the upsampling process of the i-th level is based on the joint output of the i-th level to obtain the up-sampled output of the i-th level.
  • At least one embodiment of the present disclosure further provides an image processing device, including: a memory for non-transitory storage of computer-readable instructions; and a processor for running the computer-readable instructions, the computer-readable instructions being When the processor is running, it executes the image processing method provided by any embodiment of the present disclosure, or executes the image processing method combining neural network model provided by any embodiment of the present disclosure, or executes the neural network provided by any embodiment of the present disclosure. Training method, or implement the method for constructing a merged neural network model provided by any embodiment of the present disclosure.
  • At least one embodiment of the present disclosure further provides a storage medium for non-transitory storage of computer-readable instructions, wherein when the non-transitory computer-readable instructions are executed by a computer, the image provided by any embodiment of the present disclosure can be executed.
  • the instruction of the processing method, or the instruction of the image processing method incorporating the neural network model provided by any embodiment of the present disclosure, or the instruction of the neural network training method provided by any embodiment of the present disclosure, or the instruction of the neural network training method provided by any embodiment of the present disclosure, or The instructions of the method for constructing a combined neural network model provided by any embodiment are disclosed.
  • Figure 1 is a schematic diagram of a convolutional neural network
  • Figure 2A is a schematic diagram of the structure of a convolutional neural network
  • Figure 2B is a schematic diagram of the working process of a convolutional neural network
  • FIG. 3 is a flowchart of an image processing method provided by some embodiments of the present disclosure.
  • FIG. 4A is a schematic flowchart diagram corresponding to the image processing method shown in FIG. 3 according to some embodiments of the present disclosure
  • FIG. 4B is a schematic flowchart diagram corresponding to the image processing method shown in FIG. 3 according to other embodiments of the present disclosure
  • FIG. 5 is a schematic diagram of a cropping process and a splicing process provided by some embodiments of the present disclosure
  • FIG. 6 is a schematic diagram of a merged neural network model provided by some embodiments of the present disclosure.
  • FIG. 7 is a schematic block diagram of the structure of a neural network provided by an embodiment of the present disclosure.
  • FIG. 8A is a flowchart of a neural network training method provided by an embodiment of the present disclosure.
  • FIG. 8B is a schematic structural block diagram of training the neural network shown in FIG. 7 corresponding to the training method shown in FIG. 8A according to an embodiment of the present disclosure
  • FIG. 9 is a schematic structural diagram of a discrimination network provided by some embodiments of the present disclosure.
  • FIG. 10 is a flow chart of generating adversarial training provided by some embodiments of the present disclosure.
  • FIG. 11A is a flowchart of a method for training a discriminant network provided by some embodiments of the present disclosure
  • FIG. 11B is a schematic structural block diagram of training the neural network shown in FIG. 9 corresponding to the training method shown in FIG. 11A according to some embodiments of the present disclosure
  • FIG. 12 is a flowchart of a method for constructing a merged neural network model provided by some embodiments of the present disclosure
  • FIG. 13A is a schematic block diagram of a neural network processor provided by some embodiments of the present disclosure.
  • FIG. 13B is a schematic block diagram of another neural network processor provided by some embodiments of the present disclosure.
  • FIG. 14A is a schematic block diagram of an image processing apparatus provided by some embodiments of the present disclosure.
  • FIG. 14B is a schematic block diagram of another image processing apparatus provided by some embodiments of the present disclosure.
  • FIG. 15 is a schematic diagram of a storage medium provided by some embodiments of the present disclosure.
  • Image enhancement is one of the research hotspots in the field of image processing. Due to the limitation of various physical factors in the image acquisition process (for example, the size of the image sensor of the mobile phone camera is too small and the limitation of other software and hardware, etc.) and the interference of environmental noise, the image quality will be greatly reduced.
  • the purpose of image enhancement is to improve the grayscale histogram of the image and the contrast of the image through image enhancement technology, thereby highlighting the detailed information of the image and improving the visual effect of the image.
  • CNN Convolutional Neural Network
  • FIG. 1 shows a schematic diagram of a convolutional neural network.
  • the convolutional neural network can be used for image processing, which uses images as input and output, and replaces scalar weights with convolution kernels.
  • FIG. 1 only shows a convolutional neural network with a 3-layer structure, which is not limited in the embodiments of the present disclosure.
  • the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103.
  • the input layer 101 has 4 inputs
  • the hidden layer 102 has 3 outputs
  • the output layer 103 has 2 outputs.
  • the convolutional neural network finally outputs 2 images.
  • the 4 inputs of the input layer 101 may be 4 images, or 4 feature images of 1 image.
  • the three outputs of the hidden layer 102 may be characteristic images of the image input through the input layer 101.
  • the convolutional layer has weights And bias Weights Represents the convolution kernel, bias Is a scalar superimposed on the output of the convolutional layer, where k is the label of the input layer 101, and i and j are the labels of the unit of the input layer 101 and the unit of the hidden layer 102, respectively.
  • the first convolutional layer 201 includes a first set of convolution kernels (in Figure 1 ) And the first set of offsets (in Figure 1 ).
  • the second convolutional layer 202 includes a second set of convolution kernels (in Figure 1 ) And the second set of offsets (in Figure 1 ).
  • each convolutional layer includes tens or hundreds of convolution kernels. If the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
  • the convolutional neural network further includes a first activation layer 203 and a second activation layer 204.
  • the first activation layer 203 is located behind the first convolutional layer 201
  • the second activation layer 204 is located behind the second convolutional layer 202.
  • the activation layer (for example, the first activation layer 203 and the second activation layer 204) includes activation functions, which are used to introduce nonlinear factors to the convolutional neural network, so that the convolutional neural network can better solve more complex problems .
  • the activation function may include a linear correction unit (ReLU) function, a sigmoid function (Sigmoid function), a hyperbolic tangent function (tanh function), and the like.
  • the ReLU function is an unsaturated nonlinear function
  • the Sigmoid function and tanh function are saturated nonlinear functions.
  • the activation layer can be used as a layer of the convolutional neural network alone, or the activation layer can also be included in the convolutional layer (for example, the first convolutional layer 201 can include the first activation layer 203, and the second convolutional layer 202 can be Including the second active layer 204).
  • the first convolution layer 201 For example, in the first convolution layer 201, first, several convolution kernels in the first group of convolution kernels are applied to each input And several offsets in the first set of offsets In order to obtain the output of the first convolutional layer 201; then, the output of the first convolutional layer 201 can be processed by the first activation layer 203 to obtain the output of the first activation layer 203.
  • the second convolutional layer 202 firstly, several convolution kernels in the second set of convolution kernels are applied to the output of the input first activation layer 203 And several offsets in the second set of offsets In order to obtain the output of the second convolutional layer 202; then, the output of the second convolutional layer 202 can be processed by the second activation layer 204 to obtain the output of the second activation layer 204.
  • the output of the first convolutional layer 201 can be a convolution kernel applied to its input Offset
  • the output of the second convolution layer 202 can be a convolution kernel applied to the output of the first activation layer 203 Offset The result of the addition.
  • the convolutional neural network Before using the convolutional neural network for image processing, the convolutional neural network needs to be trained. After training, the convolution kernel and bias of the convolutional neural network remain unchanged during image processing. During the training process, each convolution kernel and bias are adjusted through multiple sets of input/output example images and optimization algorithms to obtain an optimized convolutional neural network.
  • Fig. 2A shows a schematic diagram of the structure of a convolutional neural network
  • Fig. 2B shows a schematic diagram of the working process of a convolutional neural network.
  • FIGS. 2A and 2B after the input image is input to the convolutional neural network through the input layer, it goes through several processing procedures (each level in FIG. 2A) in turn, and then outputs the category identification.
  • the main components of a convolutional neural network can include multiple convolutional layers, multiple down-sampling layers, and fully connected layers.
  • each of these layers refers to the corresponding processing operation, that is, convolution processing, downsampling processing, fully connected processing, etc.
  • the described neural network also refers to the corresponding processing operation, the example standardization layer or layer standardization layer to be described below is similar to this, and the description will not be repeated here.
  • a complete convolutional neural network can be composed of these three layers.
  • FIG. 2A only shows three levels of a convolutional neural network, namely the first level, the second level, and the third level.
  • each level may include a convolution module and a downsampling layer.
  • each convolution module may include a convolution layer. Therefore, the processing procedure of each level may include: convolution and down-sampling of the input image.
  • each convolution module may also include an instance normalization layer or a layer normalization layer, so that the processing process of each level may also include instance normalization processing or layer normalization processing.
  • the instance standardization layer is used to perform instance standardization processing on the feature image output by the convolutional layer, so that the gray value of the pixel of the feature image changes within a predetermined range, thereby simplifying the image generation process and improving the effect of image enhancement.
  • the predetermined range may be [-1, 1] and so on.
  • the instance standardization layer performs instance standardization processing on each feature image according to its own mean and variance.
  • the instance standardization layer can also be used to perform instance standardization processing on a single image.
  • the instance standardization formula of the instance standardization layer can be expressed as follows:
  • x tijk is the value of the t-th patch, the i-th characteristic image, the j-th row, and the k-th column in the set of characteristic images output by the convolutional layer.
  • y tijk represents the result obtained after processing x tijk by the instance standardization layer.
  • ⁇ 1 is a small integer to avoid zero denominator.
  • the layer standardization layer is similar to the example standardization layer. It is also used to perform layer standardization processing on the feature image output by the convolutional layer, so that the gray value of the pixel of the feature image changes within a predetermined range, thereby simplifying the image generation process and improving The effect of image enhancement.
  • the predetermined range may be [-1, 1].
  • the layer standardization layer performs layer standardization processing on each column of the characteristic image according to the mean value and variance of each column of each characteristic image, so as to realize the layer standardization processing of the characteristic image.
  • the layer standardization layer can also be used to perform layer standardization processing on a single image.
  • the model of the feature image is expressed as (T, C, H, W). Therefore, the layer standardization formula of the layer standardization layer can be expressed as follows:
  • x tijk is the value of the t-th patch, the i-th characteristic image, the j-th row, and the k-th column in the set of characteristic images output by the convolutional layer.
  • y tijk represents the result obtained after processing x tijk by the layer standardization layer.
  • ⁇ 2 is a small integer to avoid zero denominator.
  • the convolutional layer is the core layer of the convolutional neural network.
  • a neuron is only connected to some of the neurons in the adjacent layer.
  • the convolutional layer can apply several convolution kernels (also called filters) to the input image to extract multiple types of features of the input image.
  • Each convolution kernel can extract one type of feature.
  • the convolution kernel is generally initialized in the form of a random decimal matrix. During the training process of the convolutional neural network, the convolution kernel will learn to obtain reasonable weights.
  • the result obtained after applying a convolution kernel to the input image is called a feature map, and the number of feature images is equal to the number of convolution kernels.
  • Each feature image is composed of some neurons arranged in a rectangle, and the neurons of the same feature image share weights, and the shared weights here are the convolution kernels.
  • the feature image output by the convolution layer of one level can be input to the convolution layer of the next next level and processed again to obtain a new feature image.
  • the first-level convolutional layer can output a first-level feature image, which is input to the second-level convolutional layer and processed again to obtain a second-level feature image.
  • the convolutional layer can use different convolution kernels to convolve the data of a certain local receptive field of the input image, and the convolution result is input to the activation layer, which is calculated according to the corresponding activation function To get the characteristic information of the input image.
  • the down-sampling layer is arranged between adjacent convolutional layers, and the down-sampling layer is a form of down-sampling.
  • the down-sampling layer can be used to reduce the scale of the input image, simplify the calculation complexity, and reduce over-fitting to a certain extent; on the other hand, the down-sampling layer can also perform feature compression to extract the input image Main features.
  • the down-sampling layer can reduce the size of feature images, but does not change the number of feature images.
  • a 2 ⁇ 2 output image can be obtained, which means that 36 pixels on the input image are merged into the output image. 1 pixel.
  • the last downsampling layer or convolutional layer can be connected to one or more fully connected layers, which are used to connect all the extracted features.
  • the output of the fully connected layer is a one-dimensional matrix, which is a vector.
  • the image processing method includes: obtaining an input image; based on the input image, obtaining initial feature images of N levels of resolution from high to low, where N is a positive integer, and N>2, where the first level with the highest resolution
  • the resolution of the initial feature image of is the same as the resolution of the input image
  • the initial feature image of the first level is cyclically zoomed to obtain the intermediate feature image, and the resolution of the intermediate feature image
  • the rate is the same as the resolution of the input image
  • the intermediate feature image is synthesized to obtain the output image; among them, the cyclic zoom processing includes: N-1 levels of nested zoom processing, each level of zoom processing Including down-sampling processing, connection processing, up-sampling processing and residual link addition processing; the down-sampling processing of the i-th level is based on the input of the scaling processing of the i-th level for down-sampling to obtain the down-sampled output
  • the up-sampling processing of the i-th level is based on the joint output of the i-th level to obtain the i-th level Up-sampling output, the residual link addition process of the i-th level
  • the scaling process of the j+1 level is nested between the downsampling process of the jth level and the connection process of the jth level, and the output of the downsampling process of the jth level is used as the
  • the zoom processing input of the j+1th level, where j 1, 2,...,N-2.
  • Some embodiments of the present disclosure also provide an image processing device corresponding to the above-mentioned image processing method, a training method of a neural network, an image processing method of a merged network model, a construction method of a merged network model, and a storage medium.
  • the image processing method provided by at least one embodiment of the present disclosure obtains multiple initial feature images of different resolutions based on the input image, and combines these initial feature images of different resolutions to perform cyclic scaling processing on the initial feature image with the highest resolution to obtain Higher image fidelity and greatly improve the quality of the output image, while also increasing the processing speed.
  • FIG. 3 is a flowchart of an image processing method provided by some embodiments of the present disclosure
  • FIG. 4A is a schematic flowchart of an image processing method provided by some embodiments of the present disclosure corresponding to the image processing method shown in FIG. 3
  • FIG. 4B is Other embodiments of the present disclosure provide a schematic flowchart diagram corresponding to the image processing method shown in FIG. 3.
  • the image processing method shown in FIG. 3 will be described in detail with reference to FIGS. 4A and 4B.
  • the image processing method includes:
  • Step S110 Obtain an input image.
  • the input image is labeled INP.
  • the input image INP may include photos taken through the camera of a smartphone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a surveillance camera, or a web camera, etc., which may include images of people, animals, plants, or landscapes. Images and the like are not limited in the embodiments of the present disclosure.
  • the input image INP may be a grayscale image or a color image.
  • color images include, but are not limited to, 3-channel RGB images. It should be noted that in the embodiments of the present disclosure, when the input image INP is a grayscale image, the output image OUTP is also a grayscale image; when the input image INP is a color image, the output image OUTP is also a color image.
  • the input image is obtained by obtaining an original input image with a first resolution, and performing resolution conversion processing (for example, image super-resolution reconstruction processing) on the original input image.
  • the input image has a second resolution, and the second resolution is greater than the first resolution.
  • Image super-resolution reconstruction is a technology that improves the resolution of an image to obtain a higher resolution image.
  • super-resolution images are usually generated by interpolation algorithms.
  • commonly used interpolation algorithms include nearest interpolation, bilinear interpolation, bicubic interpolation, Lanczos interpolation, and so on.
  • the image processing method provided by the embodiment of the present disclosure can perform enhancement processing on the super-resolution image generated by the conventional method, thereby improving the quality of the super-resolution image.
  • Step S120 Based on the input image, obtain initial feature images of N levels arranged from high to low resolution, where N is a positive integer, and N>2.
  • the input image INP can be analyzed and processed at N different levels through the analysis network to obtain the initial feature image F01 of the N levels arranged from high to low resolution.
  • ⁇ F0N for example, F01 ⁇ F05 shown in Fig. 4A.
  • the analysis network includes N analysis sub-networks ASNs, and each analysis sub-network ASN is used to perform the above-mentioned different levels of analysis and processing to obtain the initial N levels of resolution from high to low.
  • Feature images F01 to F0N for example, F01 to F05 shown in FIG. 4A).
  • each analysis sub-network ASN can be implemented as a convolutional network module such as convolutional neural network CNN, residual network ResNet, dense network DenseNet, etc.
  • each analysis sub-network ASN can include a convolutional layer, a downsampling layer, Standardization layer, etc., but not limited to this.
  • the input image INP can be first connected with the random noise image noise (concatenate, as shown in CONCAT in the figure) to obtain a joint input image; then the joint input image can be obtained through the analysis network.
  • the image is analyzed and processed at N different levels to obtain initial feature images F01 to F0N of N levels arranged from high to low resolution respectively.
  • the concatenation process CONCAT can be regarded as: stacking each channel image of multiple (for example, two or more) images to be joined, so that the number of channels of the image obtained by the concatenation is the number of channels of the multiple images to be concatenated The sum of the numbers.
  • the image of each channel of the joint input image is the synthesis of the image of each channel of the input image and the image of each channel of the random noise image.
  • the random noise in the random noise image noise may conform to a Gaussian distribution, but is not limited to this.
  • FIG. 4B for the specific process and details of the analysis processing in the embodiment shown in FIG. 4B, reference may be made to the related description of the analysis processing in the embodiment shown in FIG. 4A, which will not be repeated here.
  • the noise amplitude of the random noise image may be zero; for example, in other embodiments, the noise amplitude of the random noise image may not be zero. The embodiment of the present disclosure does not limit this.
  • each level is determined in a top-down direction.
  • the resolution of the initial feature image F01 of the first level with the highest resolution may be the same as the resolution of the input image INP.
  • the input image is obtained by performing resolution conversion processing (for example, image super-resolution reconstruction processing) on the original input image.
  • resolution conversion processing for example, image super-resolution reconstruction processing
  • the initial features of the Nth level with the lowest resolution The resolution of the image may be the same as the resolution of the original input image. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • the resolution of the initial feature image of the previous level is an integer multiple of the resolution of the initial feature image of the next level (for example, the i+1th level), for example 2 times, 3 times, 4 times,..., etc., the embodiment of the present disclosure does not limit this.
  • Step S130 Perform cyclic scaling processing on the initial feature image of the first level based on the initial feature image of the second to N levels to obtain an intermediate feature image.
  • the cyclic scaling processing includes: N-1 levels of nested scaling processing, each level of scaling processing includes down-sampling processing DS, connection processing CONCAT, and up-sampling processing executed in sequence. Sampling processing US and residual link addition processing ADD.
  • the down-sampling process of the i-th level is based on the input of the scaling process of the i-th level for down-sampling to obtain the down-sampled output of the i-th level, and the connection processing of the i-th level is based on the i-th level.
  • the down-sampling output is connected with the initial feature image of the i+1 level to obtain the joint output of the i-th level.
  • the up-sampling process of the i-th level is based on the joint output of the i-th level to obtain the up-sampled output of the i-th level.
  • connection process of the i-th level is based on the down-sampled output of the i-th level and the initial feature image of the i+1th level to obtain the joint output of the i-th level, including:
  • the down-sampled output of the level is used as the input of the scaling process of the i+1th level to obtain the output of the scaling process of the i+1th level; and the output of the scaling process of the i+1th level is compared with the initial
  • the feature images are connected to obtain the joint output of the i-th level.
  • the down-sampling processing DS is used to reduce the size of the feature map, thereby reducing the data amount of the feature map.
  • the down-sampling process can be performed through the down-sampling layer, but is not limited to this.
  • the down-sampling layer can use max pooling, average pooling, strided convolution, decimation, such as selecting fixed pixels, and demultiplexing output (demuxout, Split the input image into multiple smaller images) and other down-sampling methods to achieve down-sampling processing.
  • the downsampling layer can also use interpolation algorithms such as interpolation, bilinear interpolation, bicubic interpolation, and Lanczos interpolation to perform downsampling processing. For example, when the interpolation algorithm is used for down-sampling, only the interpolated value can be retained and the original pixel value can be removed, thereby reducing the size of the feature map.
  • the up-sampling process US is used to increase the size of the feature map, thereby increasing the data volume of the feature map.
  • the up-sampling process can be performed through the up-sampling layer, but is not limited to this.
  • the up-sampling layer may adopt up-sampling methods such as strided transposed convolution and interpolation algorithms to implement up-sampling processing.
  • the interpolation algorithm may include, for example, interpolation, bilinear interpolation, bicubic interpolation, Lanczos interpolation and other algorithms. For example, when an interpolation algorithm is used for up-sampling processing, the original pixel value and the interpolated value can be retained, thereby increasing the size of the feature map.
  • the scaling process at each level can be regarded as a residual network.
  • the residual network can maintain its input in a certain proportion in its output through residual link addition processing, that is, through residual link addition processing ADD, you can add
  • the input of the scaling process of each level is maintained in the output of the scaling process of each level at a certain ratio.
  • the size of the input and output of the residual link addition process ADD is the same.
  • the residual link addition processing may include correspondingly adding the values of each row and each column of the matrix of the two feature images, but it is not limited to this.
  • the downsampling factor of the downsampling process at the same level corresponds to the upsampling factor of the upsampling process, that is, when the downsampling factor of the downsampling process is 1/y .
  • the upsampling factor of the upsampling process is y, where y is a positive integer, and y is usually equal to or greater than 2.
  • the parameters of the downsampling process at different levels may be the same or different;
  • the parameters of the upsampling process of the levels may be the same or different;
  • the parameters of the residual link addition process of different levels may be the same or different.
  • the embodiment of the present disclosure does not limit this.
  • the parameters of the down-sampling process at the same level in different orders may be the same or different; the parameters of the up-sampling process at the same level in different orders It can be the same or different; the parameters of the residual link addition processing at the same level in different orders can be the same or different.
  • the embodiment of the present disclosure does not limit this.
  • the multi-scale cyclic sampling processing may also include: performing instance standardization processing or processing on the output of down-sampling processing, the output of up-sampling processing, etc. Layer standardization processing. It should be noted that the output of the down-sampling processing, the output of the up-sampling processing, etc. can adopt the same standardized processing method (example standardized processing or layer standardized processing), or different standardized processing methods. No restrictions.
  • “nested” means that an object includes another object that is similar or identical to the object, and the object includes but is not limited to a process or a network structure.
  • the scaling process of at least one level may be executed multiple times in succession, that is, each level may include multiple scaling processes, for example, the output of the previous scaling process is used as the input of the next scaling process.
  • the scaling process of each level can be executed twice in succession. In this case, the quality of the output image can be improved and the network structure can be avoided from being complicated. It should be noted that the embodiment of the present disclosure does not limit the specific execution times of the scaling processing at each level.
  • the resolution of the intermediate feature image is the same as the resolution of the input image INP.
  • the initial feature image F01 at the first level may be subjected to the above-mentioned cyclic scaling process based on the initial feature images F01 to F05 at the second to fifth levels to Obtain the intermediate feature image FM.
  • Step S140 Perform synthesis processing on the intermediate feature image to obtain an output image.
  • the intermediate feature image FM may be synthesized through the synthesis network MERG to obtain the output image OUTP.
  • the synthesis network may include convolutional layers and the like.
  • the output image may include a 1-channel grayscale image, or may include, for example, a 3-channel RGB image (ie, a color image). It should be noted that the embodiment of the present disclosure does not limit the structure and parameters of the synthesis network MERG, as long as it can convert the convolution feature dimension (ie, the intermediate feature image FM) into the output image OUTP.
  • the input image may be cropped first to obtain multiple sub-input images with overlapping regions; then, the above-mentioned image processing method (for example, the aforementioned step S110, step S140, etc.) is used. ) The multiple sub-input images are respectively processed to obtain corresponding multiple sub-output images; finally, the multiple corresponding sub-output images are spliced into output images.
  • the above-mentioned image processing method for example, the aforementioned step S110, step S140, etc.
  • FIG. 5 is a schematic diagram of a cropping process and a splicing process provided by some embodiments of the present disclosure.
  • the above-mentioned cropping process and splicing process will be described in detail with reference to FIG. 5.
  • the input image may be cropped into multiple sub-input images with overlapping areas (for example, as shown in FIG. 5, as shown by the four rectangular boxes represented by the respective centers T1 to T4. Show).
  • the multiple sub-input images should cover the entire input image, that is, each pixel in the input image should be included in at least one sub-input image.
  • the size and resolution of the multiple sub-input images are the same, and the centers of the multiple sub-input images form a uniform and regular grid, that is, in the horizontal direction (that is, in the row direction) and in the vertical direction.
  • the distance between adjacent centers in the direction ie, the column direction
  • the size of the overlapping area of two adjacent sub-input images is constant in the row direction or/and the column direction.
  • the row and column positions of the pixels in the input image correspond to the row and column positions of the pixels in the output image
  • the row and column positions of the pixels in each sub-input image correspond to those in the corresponding sub-output image.
  • the row and column positions that is, the four rectangular boxes represented by the respective centers T1 to T4 in FIG. 5 can also represent the corresponding four sub-output images.
  • the pixel value of each pixel in the output image can be calculated by the following formula:
  • Y p represents the pixel value of any pixel p in the output image
  • T represents the number of sub-output images that include the pixel p
  • Y k represents that the pixel p includes the pixel in the k-th frame
  • the pixel value in the sub-output image of point p, s k represents the distance from the pixel point p in the k-th sub-output image including the pixel point p to the center of the k-th sub-output image including the pixel point p .
  • the above-mentioned splicing process can be calculated through the following steps:
  • splicing processing algorithm is exemplary, and the embodiments of the present disclosure do not limit this.
  • splicing processing algorithms as long as they can reasonably process the pixel values of the pixels in the overlapping area and Just meet the actual needs.
  • the output image is a color image, for example, a 3-channel RGB image
  • the object of the above-mentioned cropping process and splicing process should be the image of each channel.
  • the image processing method provided by the embodiments of the present disclosure obtains multiple initial feature images with different resolutions based on the input image, and combines these initial feature images with different resolutions to perform cyclic scaling processing on the initial feature image with the highest resolution, so as to obtain more High image fidelity and greatly improve the quality of the output image, while also increasing the processing speed.
  • FIG. 6 is a schematic diagram of a merged neural network model provided by some embodiments of the present disclosure.
  • the merged neural network model includes a plurality of neural network models.
  • the multiple neural network models are used to perform the same image processing task, the resolution (ie size) of the input images of the multiple neural network models is the same, and the resolution (ie size) of the output images of the multiple neural network models Also the same; at the same time, at least one of the structures and parameters of the multiple neural network models is different (different parameters means that the parameters are at least not completely the same).
  • neural network models with the same structure but different parameters may be trained based on different training configurations.
  • the above-mentioned different training configurations refer to one of different training sets, different initial parameters, different convolution kernel sizes, and different hyperparameters, or any combination thereof.
  • the image processing method of the merged neural network model may include: inputting an input image into multiple neural network models in the merged neural network model to obtain outputs of the multiple neural network models respectively; and The outputs of the multiple neural network models are added together and averaged to obtain the output of the combined neural network model (that is, the output image).
  • the image processing method for merging neural network models provided by the embodiments of the present disclosure can utilize these models with similar or slightly inferior performance, so that the output effect of the merging neural network model is better than that of a single neural network with the best performance. The output effect of the network is better.
  • the plurality of neural network models may include a first neural network model, and the first neural network model is used to execute a first image processing method.
  • the first image processing method is provided by the foregoing embodiment.
  • the image processing method (for example, including the above-mentioned step S110 to step S140, etc.), the embodiment of the present disclosure includes but is not limited to this.
  • FIG. 6 only shows the case where the merged neural network model includes three neural network models NNM1 to NNM3, it should not be regarded as a limitation of the present disclosure, that is, the merged neural network model can include more according to actual needs. More or less neural network models.
  • the construction of the merged neural network model can refer to the related description of the construction method of the merged neural network model which will be explained later, which will not be repeated here.
  • the image processing method for merging neural network models provided by the embodiments of the present disclosure can directly average the output of multiple neural network models to obtain a better output effect, and the merged neural network model is easy to update (that is, adding a new neural network model). , Or replace the poorly performing neural network model in the existing merged neural network model with a new neural network model, etc.).
  • FIG. 7 is a schematic block diagram of a neural network structure provided by an embodiment of the present disclosure
  • FIG. 8A is a flowchart of a neural network training method provided by an embodiment of the present disclosure
  • FIG. 8B is a flowchart provided by an embodiment of the present disclosure
  • the neural network 100 includes an analysis network 110, a cyclic scaling network 120 and a synthesis network 130.
  • the neural network 100 may be used to execute the image processing method provided in the foregoing embodiment (for example, the embodiment shown in FIG. 4A or FIG. 4B).
  • the analysis network 110 can be used to perform step S120 in the aforementioned image processing method, that is, the analysis network 110 can process the input image to obtain initial feature images of N levels arranged from high to low resolution, and N is a positive integer.
  • the cyclic scaling network 120 can be used to perform step S130 in the aforementioned image processing method, that is, the cyclic scaling network 120 can circulate the initial feature images of the first level based on the initial feature images of the 2nd to N levels Scaling processing to obtain an intermediate feature image;
  • the synthesis network 130 may be used to perform step S140 in the aforementioned image processing method, that is, the synthesis network 130 may perform synthesis processing on the intermediate feature image to obtain an output image.
  • the specific structures of the neural network 100, the analysis network 110, the cyclic zoom network 120, and the synthesis network 130 and the corresponding specific processing procedures and details can be referred to the relevant description in the foregoing image processing method, which will not be repeated here.
  • the input image and the output image can also refer to the description of the input image and the output image in the image processing method provided in the foregoing embodiment, which is not repeated here.
  • the training method of the neural network includes step S210 to step S260.
  • Step S210 Obtain a first training input image.
  • the first training input image may also include the camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a surveillance camera, or a web camera.
  • the photos which may include images of people, images of animals and plants, or landscape images, etc., which are not limited in the embodiments of the present disclosure.
  • the first training input image can be a grayscale image or a color image.
  • color images include, but are not limited to, 3-channel RGB images.
  • the first training input image is obtained by obtaining a training original input image and performing resolution conversion processing (for example, image super-resolution reconstruction processing) on the training original input image.
  • resolution conversion processing for example, image super-resolution reconstruction processing
  • super-resolution images are usually generated by interpolation algorithms.
  • interpolation algorithms include nearest interpolation, bilinear interpolation, bicubic interpolation, Lanczos interpolation, and so on.
  • multiple pixels can be generated based on one pixel in the training original input image to obtain the first training input image based on the super-resolution of the training original input image.
  • Step S220 Use the analysis network to process the first training input image to obtain N levels of training initial feature images arranged from high to low resolution, where N is a positive integer and N>2.
  • the analysis network 110 may include N analysis sub-networks, and each analysis sub-network is used to perform analysis processing at different levels, so as to obtain the N analysis sub-networks arranged from high to low resolution.
  • Hierarchical training initial feature images may be implemented as a convolutional network module such as a convolutional neural network CNN, a residual network ResNet, a dense network DenseNet, etc., for example, each analysis sub-network may include a convolutional layer, a down-sampling layer, and a normalization layer. Etc., but not limited to this.
  • the resolution of the training initial feature image of the first level with the highest resolution may be the same as the resolution of the first training input image.
  • the first training input image is obtained by performing resolution conversion processing (for example, image super-resolution reconstruction processing) on the training original input image.
  • the Nth with the lowest resolution The resolution of the hierarchical training initial feature image may be the same as the resolution of the training original input image. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • Step S230 Using the cyclic zoom network, based on the training initial feature images of the second to N levels, perform cyclic zoom processing on the training initial feature images of the first level to obtain the training intermediate feature images.
  • step S230 For example, for the specific process and details of the cyclic scaling processing of the cyclic scaling network 120 in step S230, reference may be made to the relevant description of the cyclic scaling processing in the aforementioned step S130, which will not be repeated here.
  • Step S240 Use a synthesis network to perform synthesis processing on the training intermediate feature image to obtain a first training output image.
  • the synthesis network 130 may also include a convolutional layer and the like.
  • the first training output image may be a grayscale image including 1 channel, or may be an RGB image (ie, a color image) including, for example, 3 channels.
  • the embodiment of the present disclosure does not limit the structure and parameters of the synthesis network 330, as long as it can convert the convolution feature dimension (ie, the training intermediate feature image) into the first training output image.
  • Step S250 Calculate the loss value of the neural network through the loss function based on the first training output image.
  • the parameters of the neural network 100 include the parameters of the analysis network 110, the parameters of the cyclic scaling network 120, and the parameters of the synthesis network 130.
  • the initial parameter of the neural network 100 may be a random number, for example, the random number conforms to a Gaussian distribution, which is not limited in the embodiment of the present disclosure.
  • the initial training feature images of N levels are obtained by directly performing different levels of analysis and processing on the first training input image (not connected to the random noise image) through the analysis network 110 (refer to FIG. 4A).
  • the above loss function can be expressed as:
  • L (Y, X) represents the loss function
  • Y represents the first training output image
  • X represents the first training standard image corresponding to the first training input image
  • S k-1 () represents performing the k-1 level down Sampling processing
  • S k-1 (Y) represents the output obtained by performing the k-1 level downsampling processing on the first training output image
  • S k-1 (X) represents performing the k-th training standard image on the first training standard image.
  • the output obtained by the 1-level down-sampling processing, E[] represents the calculation of the matrix energy. For example, E[] can be used to calculate the maximum value or average value of the elements in the matrix in "[]".
  • the first training standard image X has the same scene as the first training input image, that is, the content of the two is the same, and at the same time, the quality of the first training standard image X is higher than the quality of the first training output image.
  • the first training standard image X is equivalent to the target output image of the neural network 100.
  • image quality evaluation standards include mean square error (MSE), similarity (SSIM), peak signal-to-noise ratio (PSNR), and so on.
  • the first training standard image X may be a photo image taken by, for example, a digital single-lens reflex camera.
  • interpolation algorithms such as bilinear interpolation, bicubic interpolation, Lanczos interpolation, etc. may be used to downsample the first training standard image X to obtain the original training input image, and then perform training
  • the original input image is subjected to resolution conversion processing (for example, image super-resolution reconstruction processing) to obtain the first training input image, so as to ensure that the first training standard image X and the first training input image have the same scene.
  • resolution conversion processing for example, image super-resolution reconstruction processing
  • the down-sampling method used in the down-sampling process of the k-1 level can be the same as the down-sampling process of the k-1 level in the aforementioned cyclic scaling process. The downsampling method is the same.
  • the resolution of S 1 (Y) and S 1 (X) is the same as the resolution of the down-sampling output of the first level in the aforementioned cyclic scaling process
  • S 2 (Y) and S 2 ( The resolution of X) is the same as the resolution of the down-sampled output of the second level in the aforementioned cyclic scaling process
  • the resolutions of S 3 (Y) and S 3 (X) are the same as those of the third level in the aforementioned cyclic scaling process.
  • the resolution of the sampled output is the same,..., and so on, the resolutions of S N-1 (Y) and S N-1 (X) are the same as the resolution of the down-sampled output of the N-1 level in the cyclic scaling process described above the same. It should be noted that the embodiments of the present disclosure do not limit this.
  • the training goal of the neural network 100 is to minimize the loss value.
  • the parameters of the neural network 100 are continuously corrected, so that the first training output image output by the neural network 100 after the parameter correction is continuously close to the first training standard image, thereby continuously Reduce the loss value.
  • the aforementioned loss function provided in this embodiment is exemplary, and the embodiment of the present disclosure includes but is not limited to this.
  • the initial training feature images of N levels are obtained by first concatenating the first training input image with the random noise image (CONCAT) to obtain the training joint input image, and then the training joint input image is obtained through the analysis network 110.
  • the input image is obtained by analyzing and processing N different levels (refer to FIG. 4B).
  • the training process of the neural network 100 needs to be performed jointly with the discriminant network.
  • a discriminant network can be used to process the first training output image, and the loss value of the neural network 100 can be calculated based on the output of the discriminant network corresponding to the first training output image.
  • FIG. 9 is a schematic structural diagram of a discrimination network provided by some embodiments of the present disclosure.
  • the discrimination network 200 includes M-1 levels of down-sampling sub-networks, M levels of discrimination sub-networks, synthesis sub-networks and activation layers, where M is a positive integer and M>1.
  • M is a positive integer and M>1.
  • M N-1.
  • the order of the levels is determined in a top-down direction.
  • the discriminant network when using the discriminant network to process the first training output image, first perform different levels of downsampling processing on the first training output image through M-1 levels of downsampling sub-networks to obtain The output of the M-1 level down-sampling sub-network; then, the first training output image and the output of the M-1 level down-sampling sub-network are respectively used as the input of the M-level discriminant branch network.
  • the resolution of the output of the down-sampling sub-network of the previous level is higher than the resolution of the output of the down-sampling sub-network of the next level.
  • the first training output image is used as the input of the first-level discriminant network
  • the output of the downsampling sub-network of the first level is used as the input of the second-level discriminant network.
  • the output of the sampling sub-network is used as the input of the third-level discriminant network,..., and so on, the output of the M-1 level down-sampling sub-network is used as the input of the M-th discriminant network.
  • the down-sampling sub-network includes a down-sampling layer.
  • the down-sampling sub-network can use max pooling, average pooling, strided convolution, decimation, such as selecting fixed pixels, and demultiplexing output (demuxout). , Split the input image into multiple smaller images) and other down-sampling methods to achieve down-sampling processing.
  • the downsampling layer can also use interpolation algorithms such as interpolation, bilinear interpolation, bicubic interpolation, and Lanczos interpolation to perform downsampling processing.
  • the discriminant branch network of each level includes a brightness processing sub-network (as shown by a dashed box in FIG. 9 ), a first convolution sub-network, and a second convolution sub-network connected in sequence.
  • the brightness processing sub-network may include a brightness feature extraction sub-network, a normalization sub-network and a translation correlation sub-network.
  • the brightness feature extraction sub-network of each level is used to extract the brightness feature image of the input of the discriminant branch network of the level. Since the human eye is more sensitive to the brightness feature of the image but not to other features, by extracting the brightness feature of the training image, some unnecessary information can be removed, thereby reducing the amount of calculation. It should be understood that the brightness feature extraction sub-network can be used to extract the brightness feature image of the color image, that is, the brightness feature extraction sub-network works when the first training output image is a color image; and when the input of the discriminating branch network (ie, the first When a training output image, etc.) is a grayscale image, the brightness feature extraction sub-network may not be required.
  • the output of the M-1 level downsampling sub-network is also a 3-channel RGB image, that is to say .
  • the input of the discriminant network at each level is 3 channel RGB images.
  • the feature extraction sub-network can extract the brightness feature image through the following formula:
  • R, G and B respectively represent the red information (i.e. the data information of the first channel), the green information (i.e. the data information of the second channel) and the blue information (i.e. the data information of the third channel) of the RGB format image, P represents the converted brightness information.
  • the normalization sub-network is used to normalize the above-mentioned brightness feature image to obtain a normalized brightness feature image.
  • the pixel values of the normalized brightness feature image can be unified in the comparison In the small value range, prevent some pixel values from being too large or too small, so as to facilitate the calculation of correlation.
  • the normalized sub-network can be normalized by the following formula:
  • J is the normalized brightness feature image
  • I is the brightness feature image
  • Blur() is the Gaussian blur operation. That is, Blur(I) represents performing Gaussian blur operation on the brightness feature image, and Blur(I 2 ) represents squaring each pixel value in the brightness feature image to obtain a new feature image, and performing Gaussian blur operation on the new feature image.
  • is the image obtained after the Gaussian blur operation of the brightness feature image
  • ⁇ 2 is the local variance image of the brightness feature image (variance normalized image).
  • the translation correlation sub-network is used to perform multiple image translation processing on the normalized brightness feature image to obtain multiple shifted images; and according to the correlation between the normalized brightness feature image and each shifted image Sex, generate multiple correlation images.
  • each image translation process includes: shifting the last a-column pixels of the normalized brightness characteristic image to before the remaining pixels in the row direction to obtain an intermediate image; then, the rear b of the intermediate image The row pixels are shifted in the column direction to before the remaining pixels to obtain a shifted image.
  • 0 ⁇ a ⁇ H, 0 ⁇ b ⁇ W, a and b are integers
  • H is the total number of rows of pixels in the normalized brightness feature image
  • W is the total number of columns of pixels in the normalized brightness feature image
  • the value of at least one of a and b changes.
  • the value of each pixel corresponds to the value of each pixel of the brightness characteristic image in a one-to-one correspondence; and, all the pixels in the i-th row and j-th column in the shifted image The values come from pixels at different positions in the first feature image.
  • the shifted image is the normalized brightness characteristic image itself.
  • the last b-row pixels of the normalized brightness feature image can be shifted in the column direction to before the rest of the pixels to obtain an intermediate image, and then the last a-column pixels of the intermediate image can be shifted in the row direction to Before the remaining pixels, to get a shifted image.
  • the number of image translation processing is H ⁇ W times (wherein, a and b are also counted once when both are 0), so that H ⁇ W correlation images are obtained.
  • generating a plurality of correlation images according to the correlation between the normalized brightness feature image and each shifted image includes: arranging the i-th row and j-th column in the normalized brightness feature image The product of the value of the pixel and the value of the pixel in the i-th row and the j-th column in each shifted image is used as the value of the pixel in the i-th row and j-th column in the corresponding correlation image; where 1 ⁇ i ⁇ H, 1 ⁇ j ⁇ W, i, and j are all integers.
  • the first convolution subnetwork is used to perform convolution processing on multiple correlation images to obtain the first convolution feature image, that is, the first convolution subnetwork may include a convolution layer.
  • the first convolution sub-network may further include a standardization layer, so that the first convolution sub-network may also perform standardization processing. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • the second convolution sub-network may include a convolution layer and a down-sampling layer, so that convolution processing and down-sampling processing can be performed on the input of the second convolution sub-network.
  • the output of the first convolutional sub-network in the first-level discriminant network is used as the input of the second convolutional sub-network in the first-level discriminant network; the t-th-level discriminant network
  • the output of the second convolutional sub-network in the network is connected (CONCAT) with the output of the first convolutional sub-network in the t+1-th level discriminant network, and then becomes the first convolutional sub-network in the t+1-th level discriminant network.
  • CONCAT connection
  • Step S214 Use the third backbone network to process the second intermediate output to obtain multiple first feature maps.
  • the synthesis sub-network is connected to the second convolution sub-network in the M-th discriminant branch network, and the synthesis sub-network is used to connect the second convolution sub-network in the M-th discriminant branch network.
  • the output of the network is synthesized to obtain a discriminative output image.
  • the specific structure of the synthesis sub-network and the specific process and details of the synthesis processing can be referred to the relevant description of the aforementioned synthesis network, which will not be repeated here.
  • the activation layer is connected to the composite sub-network.
  • the activation function of the activation layer may adopt a Sigmoid function, so that the output of the activation layer (that is, the output of the discriminant network 200) is a value within the range of [0, 1].
  • the output of the discriminant network 200 can be used to characterize, for example, the quality of the first training output image.
  • the loss function of the neural network 100 can be expressed as:
  • L(Y,X) represents the loss function
  • X represents the first training standard image corresponding to the first training input image
  • L L1 (S M (Y W 1 )
  • S M ( X)) represents the first contrast loss function
  • L L1 ((Y W 0 )
  • X) represents the second contrast loss function
  • the first training output image obtained when the noise amplitude of the noisy image is 0,
  • S M () represents the first training output image.
  • the aforementioned preset weight can be adjusted according to actual needs.
  • ⁇ 1 : ⁇ 2 : ⁇ 3 : ⁇ 4 : ⁇ 5 0.001:10:0.1:10:10, and the embodiments of the present disclosure include but are not limited to this.
  • E[] represents the calculation of matrix energy.
  • E[] can be used to calculate the maximum value or average value of the elements in the matrix in "[]".
  • a content feature extraction module may be used to provide content features of the first training output image and the first training standard image.
  • the content feature extraction module may be a conv3-4 module in the VGG-19 network, and the embodiments of the present disclosure include but are not limited to this.
  • the VGG-19 network is a type of deep convolutional neural network, which was developed by the Visual Geometry Group of Oxford University and has been widely used in the field of visual recognition.
  • S 1 is a constant
  • F ij represents the value of the j-th position in the first content feature map of the first training output image extracted by the i-th convolution kernel in the content feature extraction module
  • P ij represents the value in the content feature extraction module The value of the j-th position in the second content feature map of the first training standard image extracted by the i-th convolution kernel in the module.
  • the content loss function expressed by the above formula is exemplary.
  • the content loss function can also be expressed as other commonly used formulas, which are not limited in the embodiments of the present disclosure.
  • the specific expression form of the loss function of the aforementioned neural network 100 is exemplary, and the embodiments of the present disclosure do not limit this, that is, the loss function of the neural network 100 may include more or less components according to actual needs. section.
  • Step S260 Correct the parameters of the neural network according to the loss value of the neural network.
  • the training process of the neural network 100 may also include an optimization function (not shown in FIG. 8B).
  • the optimization function may calculate the error value of the parameters of the neural network 100 according to the loss value calculated by the loss function, and according to the error value The parameters of the neural network 100 are corrected.
  • the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, etc., to calculate the error value of the parameters of the neural network 100.
  • SGD stochastic gradient descent
  • BGD batch gradient descent
  • the training method of the neural network 100 may further include: judging whether the training of the neural network satisfies a predetermined condition, and if the predetermined condition is not satisfied, repeating the above training process (ie, step S210 to step S260); if If the predetermined conditions are met, the above-mentioned training process is stopped, and a trained neural network is obtained.
  • the aforementioned predetermined condition is that the loss values corresponding to two consecutive (or more) first training output images no longer decrease significantly.
  • the foregoing predetermined condition is that the number of training times or training periods of the neural network reaches a predetermined number. It should be noted that the embodiments of the present disclosure do not limit this.
  • the first training output image Y output by the trained neural network 100 is close to the first training standard image X in terms of content and quality.
  • FIG. 10 is a flow chart of generating adversarial training provided by some embodiments of the present disclosure.
  • generative adversarial training includes:
  • Step S300 Training the discriminant network based on the neural network
  • Step S400 Training the neural network based on the discriminant network.
  • the training process of the neural network in step S400 can be implemented through the above steps S210 to S260, which will not be repeated here. It should be noted that during the training process of the neural network 100, the parameters of the discriminant network 200 remain unchanged. It should be noted that in the generative adversarial training, the neural network 100 can also be generally referred to as the generative network 100.
  • FIG. 11A is a flowchart of a method for training a discriminative network provided by some embodiments of the present disclosure
  • FIG. 11B is a method for training the nerve shown in FIG. 9 corresponding to the training method shown in FIG. 11A according to some embodiments of the present disclosure.
  • the training process of the discriminant network 200 ie, step S300
  • FIGS. 11A and 11B will be described in detail with reference to FIGS. 11A and 11B.
  • step S300 the training process of the discriminant network 200, namely step S300, includes step S310 to step S340, as follows:
  • Step S310 Obtain a second training input image
  • Step S320 Use a neural network to process the second training input image to obtain a second training output image
  • Step S330 Based on the second training output image, calculate the discriminant loss value through the discriminant loss function
  • Step S340 Correct the parameters of the discrimination network according to the discrimination loss value.
  • the training process of the discriminating network 200 may further include: judging whether the training of the discriminating network 200 satisfies a predetermined condition, if the predetermined condition is not met, repeating the training process of the discriminating network 200; if the predetermined condition is satisfied, then The training process of the discriminant network 200 is stopped, and a trained discriminant network 200 is obtained.
  • the foregoing predetermined condition is that the discriminant loss values corresponding to two consecutive (or more) second training output images and the second training standard image are no longer significantly reduced.
  • the above-mentioned predetermined condition is that it is determined that the number of training times or the training period of the network 200 reaches a predetermined number. It should be noted that the embodiments of the present disclosure do not limit this.
  • the neural network 100 needs to be jointly trained. It should be noted that during the training process of the discriminant network 200, the parameters of the neural network 100 remain unchanged.
  • the above example is only a schematic illustration of the training process of the discriminant network.
  • Those skilled in the art should know that in the training stage, a large number of sample images need to be used to train the discriminant network; at the same time, in the training process of each sample image, multiple iterations can be included to modify the parameters of the discriminant network.
  • the training phase also includes fine-tune the parameters of the discriminant network to obtain more optimized parameters.
  • the initial parameter of the discrimination network 200 may be a random number, for example, the random number conforms to a Gaussian distribution, which is not limited in the embodiment of the present disclosure.
  • the training process of the discriminant network 200 can also include an optimization function (not shown in FIG. 11A).
  • the optimization function can calculate the error value of the parameters of the discriminant network 200 according to the discriminant loss value calculated by the discriminant loss function, and according to the error The value modifies the parameters of the discrimination network 200.
  • the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (batch gradient descent, BGD) algorithm, etc., to calculate the error value of the parameters of the discriminant network 200.
  • SGD stochastic gradient descent
  • BGD batch gradient descent
  • the second training input image may be the same as the first training input image.
  • the second training input image set and the first training input image set are the same image set.
  • the embodiments of the present disclosure include but are not limited to this.
  • the second training input image can refer to the related description of the aforementioned first training input image, which will not be repeated here.
  • the discriminative loss function can be expressed as:
  • U represents the second training standard image corresponding to the second training input image
  • C(U) represents the discriminant output image obtained by using the second training standard image as the input of the discriminant network
  • the second training standard image U has the same scene as the second training input image, that is, the content of the two is the same, and at the same time, the quality of the second training standard image U is higher than the quality of the second training output image.
  • the second training standard image U can refer to the related description of the aforementioned first training standard image X, which will not be repeated here.
  • discriminant loss function expressed by the above formula is exemplary.
  • discriminant loss function can also be expressed as other commonly used formulas, which are not limited in the embodiments of the present disclosure.
  • the training objective of the discriminant network 200 is to minimize the discriminative loss value.
  • the parameters of the discriminant network 200 are continuously modified so that the discriminant network 200 after the parameter correction can accurately distinguish the second training output image and the second training standard image, that is, make
  • the discriminant network 200 determines that the deviation between the second training output image and the second training standard image is getting larger and larger, thereby continuously reducing the discrimination loss value.
  • the training of the neural network 100 and the training of the discriminant network 200 are performed alternately and iteratively.
  • the discriminant network 200 is generally trained in the first stage to improve the discriminating ability of the discriminant network 200, and the discriminant network 200 trained in the first stage is obtained;
  • the discriminant network 200 trained in the first stage performs the first stage training on the neural network 100 to improve the image enhancement processing capability of the neural network 100, and obtain the neural network 100 trained in the first stage.
  • the discriminant network 200 trained in the first stage is trained in the second stage to improve the discriminating ability of the discriminant network 200, and obtain The discriminant network 200 trained in the second stage; then, the neural network 100 trained in the first stage is trained in the second stage based on the discriminant network 200 trained in the second stage to improve the image enhancement processing capability of the neural network 100, and get The neural network 100 trained in the second stage, and so on, and then the discriminant network 200 and the neural network 100 are trained in the third stage, the fourth stage training, ... until the quality of the output of the neural network 100 can be close to the corresponding The quality of the training standard image.
  • the anti-antibody of the neural network 100 and the discriminant network 200 now discriminates the loss function opposite to the generated loss function in the loss function of the neural network.
  • the image output by the neural network 100 obtained after training is a high-quality image (that is, close to the quality of the training standard image), and the discriminant network 200 is generated for the second training standard image and the neural network 100.
  • the output of the second training output image tends to be consistent, that is, the neural network 100 and the discriminant network 200 reach the Nash equilibrium through the confrontation game.
  • the reading operation of a large number of sample images in the training set (including the first/second training input image, the first/second training standard image, etc.) is usually involved.
  • decoding operations refers to the operation of reading the sample image stored in the memory into the processor; for example, in some embodiments, the decoding operation refers to the conversion of the image format (such as PNG, TIFF, The operation of decoding a sample image in a format such as JPEG into a binary data format, and the sample image usually needs to be decoded before it can be processed by a neural network.
  • each read operation and decoding operation will take up a lot of computing resources, which is not conducive to improving the training speed; this problem is especially serious when the number of sample images with higher resolution is large. Therefore, in some embodiments, in order to solve the above problems, before the training is officially started, each sample image of the training set can be cropped and decoded in advance to obtain multiple sub-sample images in binary data format, which can be based on the Multiple sub-sample images in binary data format train the neural network.
  • each sample image in the training set may be cropped into multiple sub-sample images, and then the multiple sub-sample images are decoded into binary data format sub-sample images and stored.
  • each sample image in the training set can be decoded into binary data format, and then the sample image in the binary format is cropped to obtain multiple sub-sample images in the binary data format and saved .
  • multiple sub-sample images corresponding to each sample image may or may not overlap each other, which is not limited in the embodiment of the present disclosure.
  • the sizes of the multiple sub-sample images corresponding to each sample image may be completely equal, may also be partially equal, or may not be equal to each other, which is not limited in the embodiment of the present disclosure.
  • the centers of the multiple sub-sample images corresponding to each sample image may be uniformly distributed or non-uniformly distributed, which is not limited in the embodiment of the present disclosure.
  • each sample image corresponds to a folder, and multiple sub-sample images corresponding to the sample image are stored in the folder in a predetermined naming manner; at the same time, folders corresponding to all sample images may be Stored in a large folder, that is, the training set can correspond to the large folder.
  • each sub-sample image corresponding to each sample image may be named in a naming manner of "sample image name" + "sub-sample image serial number", and the embodiments of the present disclosure include but are not limited to this.
  • the neural network training method provided by the embodiment of the present disclosure can train the neural network used in the image processing method of the embodiment of the present disclosure.
  • the neural network trained by the training method can perform image enhancement processing on the input image. Obtain higher image fidelity and greatly improve the quality of the output image, while also increasing the processing speed.
  • FIG. 12 is a flowchart of a method for constructing a merged neural network model provided by some embodiments of the present disclosure.
  • the method for constructing the merged neural network model includes step S410 to step S450.
  • Step S410 Obtain multiple trained neural network models, where the multiple neural network models are used to perform the same image processing task, the input images of the multiple neural network models have the same resolution, and the multiple neural network models have the same resolution.
  • the resolution of the output images is the same, and at least one of the structures and parameters of the multiple neural network models is different.
  • the multiple neural network models may include neural network models with the same structure but different parameters.
  • the neural network model with the same structure but different parameters may be trained based on different training configurations.
  • different training configurations refer to one of different training sets, different initial parameters, different convolution kernel sizes (for example, 3 ⁇ 3, 5 ⁇ 5, 7 ⁇ 7, etc.), different hyperparameters, etc. or random combination. It should be understood that when there are differences between the specific structures of the neural network models, the training can be performed based on the same training configuration, which is not limited in the embodiments of the present disclosure.
  • the plurality of neural network models may include a first neural network model, and the first neural network model is used to execute a first image processing method.
  • the first image processing method is provided by the foregoing embodiment.
  • the image processing method (for example, including the above-mentioned step S110 to step S140, etc.), the embodiment of the present disclosure includes but is not limited to this.
  • Step S420 Obtain the outputs of the multiple neural network models on the same verification set, determine the evaluation quality of the multiple neural network models according to a predetermined image quality evaluation standard, and set the multiple neural network models from high to high according to the evaluation quality Sort low.
  • the verification set includes a verification input image and a verification standard image corresponding to the verification input image.
  • the verification input image can refer to the related description of the aforementioned training input image (for example, the first training input image, the second training input image)
  • the verification standard image can refer to the aforementioned training standard image (for example, the first training standard image, the second training input image).
  • the relevant description of the training standard image) will not be repeated here.
  • the validation set and the training set are usually not strictly distinguished. For example, in some cases, the validation set can be used as the training set, and a part of the training set can be used as the validation set.
  • the aforementioned verification input image is input to the plurality of neural network models to obtain verification output images of the plurality of neural network models, and then each neural network model is determined based on each verification output image and verification standard image
  • the quality of the assessment For example, generally, the closer the verification output image is to the verification standard image, the higher the evaluation quality of the neural network model.
  • image quality evaluation standards include mean square error (MSE), similarity (SSIM), peak signal-to-noise ratio (PSNR), and so on. Taking the evaluation standard as the mean square error as an example, the mean square error between the verification output image and the verification standard image can be calculated by the following formula:
  • MSE represents the mean square error
  • Y' represents the verification output image
  • X' represents the verification standard image corresponding to the verification output image
  • E[] represents the calculation of matrix energy
  • the multiple neural network models are sorted according to the evaluation quality from high to low, that is, the multiple neural networks are sorted according to the mean square error from small to large.
  • Step S430 Use the neural network model with the highest evaluation quality as the first neural network model in the merged neural network model.
  • the neural network model with the smallest mean square error may be used as the first neural network model in the merged neural network model, and the embodiments of the present disclosure include but are not limited to this.
  • the neural network model with the largest PSNR may be used as the first neural network model in the merged neural network model.
  • Step S440 Determine whether the current remaining neural network model with the highest evaluation quality can be added to the current merged neural network model, if so, add the current remaining neural network model with the highest evaluation quality to the current merged neural network model, if not, Then use the current merged neural network model as the obtained merged neural network model.
  • the verification input image can be input to each neural network model in the current merged neural network model to obtain the output of each neural network model in the current merged neural network model; then, the current merged neural network model The output of each neural network model in is added to obtain an average value to obtain the output of the current merged neural network model, and the evaluation quality of the current merged neural network model is determined based on the output of the current merged neural network model.
  • the verification input image can be input into the current remaining neural network model with the highest evaluation quality (the current remaining neural network model refers to the neural network model that has not yet been incorporated into the merged neural network model) to obtain the current remaining evaluation
  • the output of the neural network model with the highest quality then, the output of the current remaining neural network model with the highest evaluation quality and the output of each neural network model in the current merged neural network model are added together and averaged to obtain the temporary merged neural network.
  • the output of the network model, and the evaluation quality of the temporary merged neural network model is determined based on the output of the temporary merged neural network model.
  • step S440 If the evaluation quality of the temporary merged neural network model is not lower than the evaluation quality of the current merged neural network model, the current remaining neural network model with the highest evaluation quality is added to the current merged neural network model, and the remaining evaluation quality will continue The highest neural network model is judged; if the evaluation quality of the temporary merged neural network model is lower than the evaluation quality of the current merged neural network model, step S440 is ended.
  • step S440 is naturally ended.
  • Step S450 Use the neural network training method to train the obtained merged neural network model to obtain the trained merged neural network model.
  • the specific structure, processing process and details of the neural network models are not required to be completely the same.
  • a neural network model that performs the same image processing task and has other specific structures at this time, there is no requirement on whether the training configuration is the same
  • FIG. 13A is a schematic block diagram of a neural network processor provided by some embodiments of the present disclosure.
  • the neural network processor 50 includes an analysis circuit 60, a cyclic scaling circuit 70, and a synthesis circuit 80.
  • the neural network processor 50 can be used to execute the aforementioned image processing method.
  • the analysis circuit 60 is configured to obtain N levels of initial feature images arranged from high to low resolution based on the input image.
  • N is a positive integer and N>2, that is, the analysis circuit 60 can be used to perform the aforementioned image processing method.
  • step S120 the specific process and details refer to the aforementioned related descriptions, which will not be repeated here.
  • the cyclic scaling circuit 70 is configured to perform a cyclic scaling process on the initial feature image of the first level based on the initial feature image of the second to N levels to obtain an intermediate feature image, that is, the cyclic scaling circuit 70 can be used to perform the aforementioned image processing
  • the specific process and details refer to the foregoing related description, and the details are not repeated here.
  • the synthesis circuit 80 is configured to perform synthesis processing on the intermediate feature image to obtain an output image, that is, the synthesis circuit 80 can be used to perform step S140 of the foregoing image processing method.
  • the synthesis circuit 80 can be used to perform step S140 of the foregoing image processing method.
  • the cyclic zoom circuit 70 may include N-1 levels of nested zoom circuits 75, each level of the zoom circuit 75 includes a down-sampling circuit 751, a connection circuit 752, and an up-sampling circuit 753.
  • the addition circuit 754 is linked with the residual, so that the cyclic scaling circuit 70 can be used to perform the cyclic scaling process in the aforementioned image processing method.
  • the down-sampling circuit of the i-th level performs down-sampling based on the input of the scaling circuit of the i-th level to obtain the down-sampling output of the i-th level
  • the connection circuit of the i-th level is based on the down-sampling output of the i-th level and the i+
  • the initial feature images of level 1 are connected to obtain the joint output of the i-th level.
  • the up-sampling circuit of the i-th level obtains the up-sampling output of the i-th level based on the joint output of the i-th level, and the residual link of the i-th level is added
  • the scaling circuit of the j+1 level is nested between the down-sampling circuit of the j-th level and the connection circuit of the j-th level, and the output of the down-sampling circuit of the j-th level is used as the j+1 level
  • FIG. 13B is a schematic block diagram of another neural network processor provided by some embodiments of the present disclosure.
  • the algorithms of each layer in the convolutional neural network such as FIG. 4A and/or FIG. 4B can be implemented in the neural network processor 10 shown in FIG. 13B.
  • the neural network processor (NPU) 10 can be mounted on a main CPU (not shown in FIG. 13B) as a coprocessor, and the main CPU can allocate tasks.
  • the core part of the NPU is the arithmetic circuit 11, and the controller 12 controls the arithmetic circuit 11 to extract data (for example, an input matrix and a weight matrix, etc.) in the internal memory 13 and perform calculations.
  • the arithmetic circuit 11 may include multiple processing units (Process Engine, PE).
  • PE Process Engine
  • the arithmetic circuit 11 is a two-dimensional systolic array.
  • the arithmetic circuit 11 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • the arithmetic circuit 11 is a general-purpose matrix processor.
  • the arithmetic circuit 11 can read the corresponding data of the weight matrix from the internal memory 12 and cache it on each PE in the arithmetic circuit 11; in addition, the arithmetic circuit 11 can also read from the internal memory 12.
  • the data of the input matrix and the weight matrix are subjected to matrix operations, and partial results or final results of the obtained matrix are stored in the accumulator 14.
  • the vector calculation unit 15 may perform further processing on the output of the arithmetic circuit 11, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • the vector calculation unit 15 can be used for network calculation of the non-convolution/non-fully connected layer in the neural network, such as sampling, standardization, etc. as follows.
  • the vector calculation unit 15 may store the processed output vector in the unified memory 16.
  • the vector calculation unit 15 may apply a nonlinear function to the output of the arithmetic circuit 11, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 15 generates a standardized value, a combined value, or both.
  • the processed output vector can be used as the activation input of the arithmetic circuit 11, for example for use in a subsequent layer in a convolutional neural network.
  • Part or all of the steps of the image processing method and neural network training method provided by the embodiments of the present disclosure may be executed by the arithmetic circuit 11 or the vector calculation unit 15.
  • the neural network processor 10 can write the input data in the external memory (not shown in FIG. 13B) into the internal memory 13 and/or the unified memory 16 through the storage unit access to the controller 17, and also in the unified memory 16.
  • the data is stored in the external memory.
  • the bus interface unit 20 is used to realize the interaction between the main CPU, the storage unit access controller 17, and the fetch memory 18 through the bus.
  • the fetch memory 18 connected to the controller 12 is used to store instructions used by the controller 12.
  • the controller 12 is used to call the instruction cached in the instruction fetch memory 18 to realize the working process of the control arithmetic circuit 11.
  • each layer in the convolutional neural network shown in FIG. 4A and/or FIG. 4B may be executed by the arithmetic circuit 11 or the vector calculation unit 15.
  • FIG. 14A is a schematic block diagram of an image processing apparatus provided by some embodiments of the present disclosure.
  • the image processing device 470 includes an image acquisition module 480 and an image processing module 490.
  • the image processing device 470 may be used to execute the aforementioned image processing method, and the embodiments of the present disclosure include but are not limited to this.
  • the image acquisition module 480 may be used to execute step S110 of the aforementioned image processing method, and the embodiments of the present disclosure include but are not limited to this.
  • the image acquisition module 480 may be used to acquire an input image.
  • the image acquisition module 480 may include a memory, which stores input images; or, the image acquisition module 480 may also include one or more cameras to acquire the input images.
  • the image processing module 490 may be used to execute step S120 to step S140 of the aforementioned image processing method, and the embodiments of the present disclosure include but are not limited to this.
  • the image processing module can: Based on the input image, obtain the initial feature images of N levels arranged from high to low resolution, where N is a positive integer and N>2; based on the initial feature images of the 2nd to N levels, The initial feature image of the first level is cyclically zoomed to obtain an intermediate feature image; and the intermediate feature image is synthesized to obtain an output image.
  • the specific process and details of the cyclic zoom processing can refer to the related description in the foregoing image processing method, and the details will not be repeated here.
  • the image acquisition module 480 and the image processing module 490 may be implemented as hardware, software, firmware, and any feasible combination thereof.
  • FIG. 14B is a schematic block diagram of another image processing apparatus provided by some embodiments of the present disclosure.
  • the image processing apparatus 500 includes a memory 510 and a processor 520.
  • the memory 510 is used for non-transitory storage of computer readable instructions
  • the processor 520 is used for running the computer readable instructions.
  • the image processing method provided by any embodiment of the present disclosure is executed. Or/and the image processing method of the combined neural network model or/and the training method of the neural network or/and the construction method of the combined neural network model.
  • the memory 510 and the processor 520 may directly or indirectly communicate with each other.
  • the image processing apparatus 500 may further include a system bus 530, and the memory 510 and the processor 520 may communicate with each other through the system bus 530.
  • the processor 520 may communicate with each other through the system bus 530.
  • 530 accesses the memory 510.
  • components such as the memory 510 and the processor 520 may communicate through a network connection.
  • the network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • the network may include a local area network, the Internet, a telecommunication network, the Internet of Things (Internet of Things) based on the Internet and/or a telecommunication network, and/or any combination of the above networks, and so on.
  • the wired network may, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication
  • the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi.
  • the present disclosure does not limit the types and functions of the network here.
  • the processor 520 may control other components in the image processing apparatus to perform desired functions.
  • the processor 520 may be a central processing unit (CPU), a tensor processor (TPU), or a graphics processor GPU, and other devices with data processing capabilities and/or program execution capabilities.
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the GPU can be directly integrated on the motherboard alone or built into the north bridge chip of the motherboard.
  • the GPU can also be built into the central processing unit (CPU).
  • the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • Non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • one or more computer instructions may be stored in the memory 510, and the processor 520 may execute the computer instructions to implement various functions.
  • Various application programs and various data can also be stored in the computer-readable storage medium, such as input image, output image, first/second training input image, first/second training output image, first/second training Standard images and various data used and/or generated by the application.
  • one or more steps in the image processing method or the image processing method incorporating the neural network model described above may be executed.
  • one or more steps in the method for training a neural network or the method for constructing a combined neural network model described above may be executed.
  • the image processing apparatus 500 may further include an input interface 540 that allows an external device to communicate with the image processing apparatus 500.
  • the input interface 540 may be used to receive instructions from external computer devices, from users, and the like.
  • the image processing apparatus 500 may further include an output interface 550 that connects the image processing apparatus 500 and one or more external devices to each other.
  • the image processing apparatus 500 may display images and the like through the output interface 550.
  • External devices that communicate with the image processing apparatus 500 through the input interface 1010 and the output interface 1012 may be included in an environment that provides any type of user interface with which a user can interact. Examples of user interface types include graphical user interfaces, natural user interfaces, and so on.
  • a graphical user interface can accept input from a user using an input device such as a keyboard, mouse, remote control, etc., and provide output on an output device such as a display.
  • a natural user interface may enable a user to interact with the image processing apparatus 500 in a manner that does not need to be subject to constraints imposed by input devices such as a keyboard, mouse, remote control, and the like.
  • natural user interfaces can rely on voice recognition, touch and stylus recognition, gesture recognition on and near the screen, air gestures, head and eye tracking, voice and semantics, vision, touch, gestures, and machine intelligence.
  • the image processing apparatus 500 is shown as a single system in FIG. 9, it is understood that the image processing apparatus 500 may also be a distributed system, and may also be arranged as a cloud facility (including a public cloud or a private cloud). Therefore, for example, several devices may communicate through a network connection and may jointly perform tasks described as being performed by the image processing apparatus 500.
  • the processing procedure of the image processing method please refer to the relevant description in the embodiment of the above-mentioned image processing method, and for the detailed description of the processing procedure of the image processing method incorporating the neural network model, please refer to the image of the aforementioned merging neural network model.
  • the relevant description in the embodiment of the processing method for the detailed description of the processing process of the neural network training method, please refer to the relevant description in the above-mentioned neural network training method embodiment, regarding the processing process of the method for constructing a combined neural network model.
  • the image processing device provided by the embodiments of the present disclosure is exemplary rather than restrictive. According to actual application requirements, the image processing device may also include other conventional components or structures, for example, to achieve image processing. For the necessary functions of the device, those skilled in the art can set other conventional components or structures according to specific application scenarios, which are not limited in the embodiments of the present disclosure.
  • the technical effects of the image processing device provided by the embodiments of the present disclosure can refer to the corresponding descriptions of the image processing method, the image processing method incorporating the neural network model, the training method of the neural network, and the construction method of the merged neural network model in the above embodiments. I won't repeat them here.
  • FIG. 15 is a schematic diagram of a storage medium provided by an embodiment of the present disclosure.
  • the storage medium 600 non-transitory stores computer-readable instructions 601.
  • any of the embodiments of the present disclosure can be executed.
  • the instruction of the image processing method or the image processing method incorporating the neural network model or the instruction of the neural network training method or the construction method of the incorporated neural network model provided by any embodiment of the present disclosure can be executed.
  • one or more computer instructions may be stored on the storage medium 600.
  • Some computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps in the above-mentioned image processing method or the image processing method incorporating a neural network model.
  • Other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps in the above-mentioned neural network training method or the construction method of merging neural network models.
  • the storage medium may include the storage components of a tablet computer, the hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), optical disk read-only memory (CD -ROM), flash memory, or any combination of the above storage media, can also be other suitable storage media.
  • RAM random access memory
  • ROM read-only memory
  • EPROM erasable programmable read-only memory
  • CD -ROM optical disk read-only memory
  • flash memory or any combination of the above storage media, can also be other suitable storage media.

Abstract

一种图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质。该图像处理方法包括:基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及对中间特征图像进行合成处理,以得到输出图像。循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括依次执行的下采样处理、联接处理、上采样处理和残差链接相加处理;当前层级的联接处理基于当前层级的下采样处理的输出和下一层级的初始特征图像进行联接得到当前层级的联合处理的输出;下一层级的缩放处理嵌套在当前层级的下采样处理和联接处理之间。

Description

图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质
本申请要求于2019年10月18日递交的中国专利申请第201910995755.2号的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
技术领域
本公开的实施例涉及一种图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器以及存储介质。
背景技术
当前,基于人工神经网络的深度学习技术已经在诸如图像分类、图像捕获和搜索、面部识别、年龄和语音识别等领域取得了巨大进展。深度学习的优势在于可以利用通用的结构以相对类似的系统解决非常不同的技术问题。卷积神经网络(Convolutional Neural Network,CNN)是近年发展起来并引起广泛重视的一种人工神经网络,CNN是一种特殊的图像识别方式,属于非常有效的带有前向反馈的网络。现在,CNN的应用范围已经不仅仅限于图像识别领域,也可以应用在人脸识别、文字识别、图像处理等应用方向。
发明内容
本公开至少一个实施例提供一种图像处理方法,包括:获取输入图像;基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及对所述中间特征图像进行合成处理,以得到输出图像;其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理的输入,其中,j=1,2,…,N-2。
例如,所述第i层级的联接处理基于所述第i层级的下采样输出和所述第i+1层级的初始特征图像进行联接得到所述第i层级的联合输出,包括:将所述第i层级的下采样输出作为所述第i+1层级的缩放处理的输入,以得到所述第i+1层级的缩放处理的输出;以及将所 述第i+1层级的缩放处理的输出与所述第i+1层级的初始特征图像进行联接,以得到所述第i层级的联合输出。
例如,至少一个层级的缩放处理连续执行多次,且前一次缩放处理的输出作为后一次缩放处理的输入。
例如,每个层级的缩放处理连续执行两次。
例如,在所述N个层级的初始特征图像中,第1层级的初始特征图像的分辨率最高,且第1层级的初始特征图像的分辨率与所述输入图像的分辨率相同。
例如,前一层级的初始特征图像的分辨率为后一层级的初始特征图像的分辨率的整数倍。
例如,基于所述输入图像,得到分辨率从高到低排列的所述N个层级的初始特征图像,包括:将所述输入图像与随机噪声图像进行联接,以得到联合输入图像;以及对所述联合输入图像进行N个不同层级的分析处理,以分别得到分辨率从高到低排列的所述N个层级的初始特征图像。
例如,获取所述输入图像,包括:获取具有第一分辨率的原始输入图像;以及对所述原始输入图像进行分辨率转换处理,以得到具有第二分辨率的所述输入图像,所述第二分辨率大于第一分辨率。
例如,采用双立方插值算法、双线性插值算法和兰索斯(Lanczos)插值算法之一进行所述分辨率转换处理。
例如,所述的图像处理方法还包括:对所述输入图像进行裁剪处理,以得到具有交叠区域的多个子输入图像;
所述基于输入图像得到分辨率从高到低排列的N个层级的初始特征图像具体包括:基于每个子输入图像,得到分辨率从高到低排列的N个层级的子初始特征图像,N为正整数,且N>2;
所述基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像具体包括:基于第2~N层级的子初始特征图像,对所述第1层级的子初始特征图像进行循环缩放处理,以得到子中间特征图像;
所述对所述中间特征图像进行合成处理,以得到输出图像具体包括:对所述子中间特征图像进行合成处理,以得到对应的子输出图像;以及将所述多个子输入图像对应的子输出图像拼接为所述输出图像。
例如,所述多个子输入图像的尺寸大小相同,所述多个子输入图像的中心形成均匀规则的网格,且在行方向和列方向上,相邻的两个子输入图像的交叠区域的尺寸大小均是恒定的,所述输出图像中每个像素点的像素值表示为:
Figure PCTCN2020120586-appb-000001
其中,Y p表示所述输出图像中的任意一个像素点p的像素值,T表示包括该像素点p的子输出图像的数量,Y k,(p)表示该像素点p在第k幅包括该像素点p的子输出图像中的像素值,s k表示在所述第k幅包括该像素点p的子输出图像中该像素点p到所述第k幅包括该像素点p的子输出图像的中心的距离。
本公开至少一个实施例还提供一种合并神经网络模型的图像处理方法,其中,所述合并神经网络模型包括多个神经网络,所述多个神经网络用于执行同一图像处理任务,所述多个神经网络的输入图像的分辨率相同,所述多个神经网络的输出图像的分辨率相同,所述多个神经网络两两之间至少结构和参数之一不同;所述合并神经网络模型的图像处理方法,包括:将输入图像输入所述合并神经网络模型中的所述多个神经网络,以分别得到所述多个神经网络的输出;以及将所述多个神经网络的输出相加取平均值,以得到所述合并神经网络模型的输出。
例如,所述多个神经网络包括第一神经网络,所述第一神经网络用于执行第一图像处理方法,所述第一图像处理方法包括:获取输入图像;基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及对所述中间特征图像进行合成处理,以得到输出图像;其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理的输入,其中,j=1,2,…,N-2。
本公开至少一个实施例还提供一种神经网络的训练方法,其中,所述神经网络包括:分析网络、循环缩放网络和合成网络;所述训练方法包括:获取第一训练输入图像;使用所述分析网络对所述第一训练输入图像进行处理,以得到分辨率从高到低排列的N个层级的训练初始特征图像,N为正整数,且N>2;使用所述循环缩放网络,基于第2~N层级的训练初始特征图像,对第1层级的训练初始特征图像进行循环缩放处理,以得到训练中间特征图像;使用所述合成网络对所述训练中间特征图像进行合成处理,以得到第一训练输出图像;基于所述第一训练输出图像,通过损失函数计算所述神经网络的损失值;以及根据所述神经网络的损失值对所述神经网络的参数进行修正;其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括依次执行的下采样处理、联接处理、上采样处理和残差链接相加处理;第i层级的下采样处理基于第i层级的缩放处理的 输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理的输入,其中,j=1,2,…,N-2。
例如,所述损失函数表示为:
Figure PCTCN2020120586-appb-000002
其中,L(Y,X)表示所述损失函数,Y表示所述第一训练输出图像,X表示所述第一训练输入图像对应的第一训练标准图像,S k-1()表示进行第k-1层级的标准下采样处理,E[]表示对矩阵能量的计算。
例如,使用所述分析网络对所述第一训练输入图像进行处理,以得到分辨率从高到低排列的所述N个层级的训练初始特征图像,包括:将所述第一训练输入图像与随机噪声图像进行联接,以得到训练联合输入图像;以及使用所述分析网络对所述训练联合输入图像进行N个不同层级的分析处理,以分别得到分辨率从高到低排列的所述N个层级的训练初始特征图像。
例如,基于所述第一训练输出图像,通过所述损失函数计算所述神经网络的损失值,包括:使用判别网络对所述第一训练输出图像进行处理,并基于所述第一训练输出图像对应的判别网络的输出计算所述神经网络的损失值。
例如,所述判别网络包括:M-1个层级的下采样子网络、M个层级的判别支网络、合成子网络和激活层;所述M-1个层级的下采样子网络用于对所述判别网络的输入进行不同层级的下采样处理,以得到M-1个层级的下采样子网络的输出;所述判别网络的输入和所述M-1个层级的下采样子网络的输出分别对应作为所述M个层级的判别支网络的输入;每个层级的判别支网络包括依次连接的亮度处理子网络、第一卷积子网络和第二卷积子网络;第t层级的判别支网络中的第二卷积子网络的输出与第t+1层级的判别支网络中的第一卷积子网络的输出进行联接后作为第t+1层级的判别支网络中的第二卷积子网络的输入,其中,t=1,2,…,M-1;所述合成子网络用于对第M层级的判别支网络中的第二卷积子网络的输出进行合成处理,以得到判别输出图像;所述激活层用于对所述判别输出图像进行处理,以得到表征所述判别网络的输入的质量的数值。
例如,所述亮度处理子网络包括亮度特征提取子网络、归一化子网络和平移相关子网络,所述亮度特征提取子网络用于提取亮度特征图像,所述归一化子网络用于对所述亮度特征图像进行归一化处理,以得到归一化亮度特征图像,所述平移相关子网络用于对所述归一化亮度特征图像进行多次图像平移处理,以得到多个移位图像,并根据所述归一化亮 度特征图像与每个所述移位图像之间的相关性,生成多个相关性图像。
例如,所述损失函数表示为:
Figure PCTCN2020120586-appb-000003
其中,L(Y,X)表示损失函数,Y表示所述第一训练输出图像,Y包括Y W=1和Y W=0,X表示所述第一训练输入图像对应的第一训练标准图像,L G(Y W=1)表示生成损失函数,Y W=1表示所述随机噪声图像的噪声幅度不为0的情况下得到的第一训练输出图像,L L1(S M(Y W=1),S M(X))表示第一对比损失函数,L cont(Y W=1,X)表示内容损失函数,L L1((Y W=0),X)表示第二对比损失函数,Y W=0表示所述随机噪声图像的噪声幅度为0的情况下得到的第一训练输出图像,L L1(S M(Y W=0),S M(X))表示第三对比损失函数,S M()表示进行第M层级的标准下采样处理,λ 1、λ 2、λ 3、λ 4、λ 5分别表示预设的权值;
所述生成损失函数L G(Y W=1)表示为:
L G(Y W=1)=-E[log(Sigmoid(C(Y W=1)-C(X)))],
其中,C(Y W=1)表示所述随机噪声图像的噪声幅度不为0的情况下得到的判别输出图像,C(X)表示第一训练标准图像作为判别网络的输入得到的判别输出图像;
所述第一对比损失函数L L1(S M(Y W=1),S M(X))、所述第二对比损失函数L L1((Y W=0),X)和所述第三对比损失函数L L1(S M(Y W=0),S M(X))分别表示为:
Figure PCTCN2020120586-appb-000004
其中,E[]表示对矩阵能量的计算;
所述内容损失函数L cont(Y W=1,X)表示为:
Figure PCTCN2020120586-appb-000005
其中,S 1为常数,F ij表示在VGG-19网络的conv3-4模块中第i个卷积核提取的第一训练输出图像的第一内容特征图中第j个位置的值,P ij表示在所述VGG-19网络的conv3-4模块中第i个卷积核提取的第一训练标准图像的第二内容特征图中第j个位置的值。
例如,所述神经网络的训练方法还包括:基于所述神经网络,对所述判别网络进行训练;以及,交替地执行所述判别网络的训练过程和所述神经网络的训练过程,以得到训练好的神经网络;其中,基于所述神经网络,对所述判别网络进行训练,包括:获取第二训练输入图像;使用所述神经网络对所述第二训练输入图像进行处理,以得到第二训练输出图像;基于所述第二训练输出图像,通过判别损失函数计算判别损失值;以及根据所述判别损失值对所述判别网络的参数进行修正。
例如,所述判别损失函数表示为:
L D(V W=1)=-E[log(Sigmoid(C(U)-C(V W=1)))],
其中,L D(V W=1)表示判别损失函数,U表示所述第二训练输入图像对应的第二训练标准图像,V W=1表示所述随机噪声图像的噪声幅度不为0的情况下得到的第二训练输出图像,C(U)表示所述第二训练标准图像作为所述判别网络的输入得到的判别输出图像,C(V W=1)表示所述随机噪声图像的噪声幅度不为0的情况下得到的判别输出图像。
例如,所述神经网络的训练方法还包括:在进行训练之前,对训练集的各个样本图像进行裁剪处理和解码处理,以得到二进制数据格式的多个子样本图像;在进行训练时,基于所述二进制数据格式的子样本图像对所述神经网络进行训练。
例如,所述多个子样本图像的尺寸大小相等。
本公开至少一个实施例还提供一种合并神经网络模型的构建方法,包括:获取多个训练好的神经网络模型,其中,所述多个神经网络模型用于执行同一图像处理任务,所述多个神经网络模型的输入图像的分辨率相同,所述多个神经网络模型的输出图像的分辨率相同,所述多个神经网络模型两两之间至少结构和参数之一不同;在同一验证集上获得所述多个神经网络模型的输出,根据预定的图像质量评估标准确定所述多个神经网络模型的评估质量,并将所述多个神经网络模型按照评估质量从高到低进行排序;将评估质量最高的神经网络模型作为所述合并神经网络模型中的第1个神经网络模型;以及判断当前余下的评估质量最高的神经网络模型能否加入当前的合并神经网络模型,若能,则将当前余下的评估质量最高的神经网络模型加入当前的合并神经网络模型,若不能,则将当前的合并神经网络模型作为获得的合并神经网络模型。
例如,所述的合并神经网络模型的构建方法还包括:对所述获得的合并神经网络模型进行训练,以得到训练好的合并神经网络模型。
例如,所述预定的图像质量评估标准包括均方误差、相似度和峰值信噪比之一。
例如,所述多个神经网络模型包括第一神经网络模型,所述第一神经网络模型用于执行第一图像处理方法,所述第一图像处理方法包括:获取输入图像;基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;基于第2~N层级的初始特征图像,对所述N个层级的初始特征图像中的第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及对所述中间特征图像进行合成处理,以得到输出图像;其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;第j+1层级 的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理的输入,其中,j=1,2,…,N-2。
本公开至少一个实施例还提供一种神经网络处理器,包括分析电路、循环缩放电路和合成电路;所述分析电路配置为基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;所述循环缩放电路配置为基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;所述合成电路配置为对所述中间特征图像进行合成处理,以得到输出图像;其中,所述循环缩放电路包括N-1个层级的逐层嵌套的缩放电路,每个层级的缩放电路包括下采样电路、联接电路、上采样电路和残差链接相加电路;第i层级的下采样电路基于第i层级的缩放电路的输入进行下采样得到第i层级的下采样输出,第i层级的联接电路基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样电路基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加电路将所述第i层级的缩放电路的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放电路的输出,其中,i=1,2,…,N-1;第j+1层级的缩放电路嵌套在第j层级的下采样电路和第j层级的联接电路之间,所述第j层级的下采样电路的输出作为所述第j+1层级的缩放电路的输入,其中,j=1,2,…,N-2。
本公开至少一个实施例还提供一种图像处理装置,包括:图像获取模块,配置为获取输入图像;图像处理模块,配置为:基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及对所述中间特征图像进行合成处理,以得到输出图像;其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理输入,其中,j=1,2,…,N-2。
本公开至少一个实施例还提供一种图像处理装置,包括:存储器,用于非暂时性存储计算机可读指令;以及处理器,用于运行所述计算机可读指令,所述计算机可读指令被所述处理器运行时执行本公开任一实施例提供的图像处理方法,或者执行本公开任一实施例提供的合并神经网络模型的图像处理方法,或者执行本公开任一实施例提供的神经网络的训练方法,或者执行根据本公开任一实施例提供的合并神经网络模型的构建方法。
本公开至少一个实施例还提供一种存储介质,非暂时性地存储计算机可读指令,其中,当所述非暂时性计算机可读指令由计算机执行时可以执行本公开任一实施例提供的图像处理方法的指令,或者可以执行本公开任一实施例提供的合并神经网络模型的图像处理方法的指令,或者可以执行本公开任一实施例提供的神经网络的训练方法的指令,或者可以执行本公开任一实施例提供的合并神经网络模型的构建方法的指令。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例的附图作简单地介绍,显而易见地,下面描述中的附图仅仅涉及本公开的一些实施例,而非对本公开的限制。
图1为一种卷积神经网络的示意图;
图2A为一种卷积神经网络的结构示意图;
图2B为一种卷积神经网络的工作过程示意图;
图3为本公开一些实施例提供的一种图像处理方法的流程图;
图4A为本公开一些实施例提供的一种对应于图3所示的图像处理方法的示意性流程框图;
图4B为本公开另一些实施例提供的一种对应于图3所示的图像处理方法的示意性流程框图;
图5为本公开一些实施例提供的一种裁剪处理和拼接处理的示意图;
图6为本公开一些实施例提供的一种合并神经网络模型的示意图;
图7为本公开一实施例提供的一种神经网络的结构示意框图;
图8A为本公开一实施例提供的一种神经网络的训练方法的流程图;
图8B为本公开一实施例提供的一种对应于图8A中所示的训练方法训练图7所示的神经网络的示意性架构框图;
图9为本公开一些实施例提供的一种判别网络的结构示意图;
图10为本公开一些实施例提供的一种生成对抗式训练的流程图;
图11A为本公开一些实施例提供的一种判别网络的训练方法的流程图;
图11B为本公开一些实施例提供的一种对应于图11A中所示的训练方法训练图9所示的神经网络的示意性架构框图;
图12为本公开一些实施例提供的一种合并神经网络模型的构建方法的流程图;
图13A为本公开一些实施例提供的一种神经网络处理器的示意性框图;
图13B为本公开一些实施例提供的另一种神经网络处理器的示意性框图;
图14A为本公开一些实施例提供的一种图像处理装置的示意性框图;
图14B为本公开一些实施例提供的另一种图像处理装置的示意性框图;以及
图15为本公开一些实施例提供的一种存储介质的示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例的附图,对本公开实施例的技术方案进行清楚、完整地描述。显然,所描述的实施例是本公开的一部分实施例,而不是全部的实施例。基于所描述的本公开的实施例,本领域普通技术人员在无需创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
除非另外定义,本公开使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。“包括”或者“包含”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者物件及其等同,而不排除其他元件或者物件。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置关系,当被描述对象的绝对位置改变后,则该相对位置关系也可能相应地改变。
下面通过几个具体的实施例对本公开进行说明。为了保持本公开实施例的以下说明清楚且简明,可省略已知功能和已知部件的详细说明。当本公开实施例的任一部件在一个以上的附图中出现时,该部件在每个附图中由相同或类似的参考标号表示。
图像增强是图像处理领域的研究热点之一。由于在图像采集过程中存在各种物理因素的限制(例如,手机相机的图像传感器尺寸太小以及其他软件、硬件的限制等)以及环境噪声的干扰,会导致图像质量大大降低。图像增强的目的是通过图像增强技术,改善图像的灰度直方图,提高图像的对比度,从而凸显图像细节信息,改善图像的视觉效果。
最初,卷积神经网络(Convolutional Neural Network,CNN)主要用于识别二维形状,其对图像的平移、比例缩放、倾斜或其他形式的变形具有高度不变性。CNN主要通过局部感知野和权值共享来简化神经网络的复杂性、减少权重的数量。随着深度学习技术的发展,CNN的应用范围已经不仅仅限于图像识别领域,其也可以应用在人脸识别、文字识别、动物分类、图像处理等领域。
图1示出了一种卷积神经网络的示意图。例如,该卷积神经网络可以用于图像处理,其使用图像作为输入和输出,并通过卷积核替代标量的权重。图1中仅示出了具有3层结构的卷积神经网络,本公开的实施例对此不作限制。如图1所示,卷积神经网络包括输入层101、隐藏层102和输出层103。输入层101具有4个输入,隐藏层102具有3个输出,输出层103具有2个输出,最终该卷积神经网络最终输出2幅图像。
例如,输入层101的4个输入可以为4幅图像,或者1幅图像的四种特征图像。隐藏层102的3个输出可以为经过输入层101输入的图像的特征图像。
例如,如图1所示,卷积层具有权重
Figure PCTCN2020120586-appb-000006
和偏置
Figure PCTCN2020120586-appb-000007
权重
Figure PCTCN2020120586-appb-000008
表示卷积核,偏置
Figure PCTCN2020120586-appb-000009
是叠加到卷积层的输出的标量,其中,k是表示输入层101的标签,i和j分别是输入层101的单元和隐藏层102的单元的标签。例如,第一卷积层201包括第一组卷积核(图1中的
Figure PCTCN2020120586-appb-000010
) 和第一组偏置(图1中的
Figure PCTCN2020120586-appb-000011
)。第二卷积层202包括第二组卷积核(图1中的
Figure PCTCN2020120586-appb-000012
)和第二组偏置(图1中的
Figure PCTCN2020120586-appb-000013
)。通常,每个卷积层包括数十个或数百个卷积核,若卷积神经网络为深度卷积神经网络,则其可以包括至少五层卷积层。
例如,如图1所示,该卷积神经网络还包括第一激活层203和第二激活层204。第一激活层203位于第一卷积层201之后,第二激活层204位于第二卷积层202之后。激活层(例如,第一激活层203和第二激活层204)包括激活函数,激活函数用于给卷积神经网络引入非线性因素,以使卷积神经网络可以更好地解决较为复杂的问题。激活函数可以包括线性修正单元(ReLU)函数、S型函数(Sigmoid函数)或双曲正切函数(tanh函数)等。ReLU函数为非饱和非线性函数,Sigmoid函数和tanh函数为饱和非线性函数。例如,激活层可以单独作为卷积神经网络的一层,或者激活层也可以被包含在卷积层(例如,第一卷积层201可以包括第一激活层203,第二卷积层202可以包括第二激活层204)中。
例如,在第一卷积层201中,首先,对每个输入应用第一组卷积核中的若干卷积核
Figure PCTCN2020120586-appb-000014
和第一组偏置中的若干偏置
Figure PCTCN2020120586-appb-000015
以得到第一卷积层201的输出;然后,第一卷积层201的输出可以通过第一激活层203进行处理,以得到第一激活层203的输出。在第二卷积层202中,首先,对输入的第一激活层203的输出应用第二组卷积核中的若干卷积核
Figure PCTCN2020120586-appb-000016
和第二组偏置中的若干偏置
Figure PCTCN2020120586-appb-000017
以得到第二卷积层202的输出;然后,第二卷积层202的输出可以通过第二激活层204进行处理,以得到第二激活层204的输出。例如,第一卷积层201的输出可以为对其输入应用卷积核
Figure PCTCN2020120586-appb-000018
后再与偏置
Figure PCTCN2020120586-appb-000019
相加的结果,第二卷积层202的输出可以为对第一激活层203的输出应用卷积核
Figure PCTCN2020120586-appb-000020
后再与偏置
Figure PCTCN2020120586-appb-000021
相加的结果。
在利用卷积神经网络进行图像处理前,需要对卷积神经网络进行训练。经过训练之后,卷积神经网络的卷积核和偏置在图像处理期间保持不变。在训练过程中,各卷积核和偏置通过多组输入/输出示例图像以及优化算法进行调整,以获取优化后的卷积神经网络。
图2A示出了一种卷积神经网络的结构示意图,图2B示出了一种卷积神经网络的工作过程示意图。例如,如图2A和2B所示,输入图像通过输入层输入到卷积神经网络后,依次经过若干个处理过程(如图2A中的每个层级)后输出类别标识。卷积神经网络的主要组成部分可以包括多个卷积层、多个下采样层和全连接层等。本公开中,应该理解的是,多个卷积层、多个下采样层和全连接层等这些层每个都指代对应的处理操作,即卷积处理、下采样处理、全连接处理等,所描述的神经网络也都指代对应的处理操作,以下将要描述的实例标准化层或层标准化层等也与此类似,这里不再重复说明。例如,一个完整的卷积神经网络可以由这三种层叠加组成。例如,图2A仅示出了一种卷积神经网络的三个层级,即第一层级、第二层级和第三层级。例如,每个层级可以包括一个卷积模块和一个下采样层。例如,每个卷积模块可以包括卷积层。由此,每个层级的处理过程可以包括:对输入图像进行卷积(convolution)以及下采样(sub-sampling/down-sampling)。例如,根据实际需要,每个卷积模块还可以包括实例标准化(instance normalization)层或层标准化(layer normalization)层,从而每个层级的处理过程还可以包括实例标准化处理或层标准化处理。
例如,实例标准化层用于对卷积层输出的特征图像进行实例标准化处理,以使特征图像的像素的灰度值在预定范围内变化,从而简化图像生成过程,改善图像增强的效果。例如,预定范围可以为[-1,1]等。实例标准化层根据每个特征图像自身的均值和方差,对该特征图像进行实例标准化处理。例如,实例标准化层还可用于对单幅图像进行实例标准化处理。
例如,假设小批梯度下降法(mini-batch gradient decent)的尺寸为T,某一卷积层输出的特征图像的数量为C,且每个特征图像均为H行W列的矩阵,则特征图像的模型表示为(T,C,H,W)。从而,实例标准化层的实例标准化公式可以表示如下:
Figure PCTCN2020120586-appb-000022
其中,x tijk为该卷积层输出的特征图像集合中的第t个特征块(patch)、第i个特征图像、第j行、第k列的值。y tijk表示经过实例标准化层处理x tijk后得到的结果。ε 1为一个很小的整数,以避免分母为0。
例如,层标准化层与实例标准化层类似,也用于对卷积层输出的特征图像进行层标准化处理,以使特征图像的像素的灰度值在预定范围内变化,从而简化图像生成过程,改善图像增强的效果。例如,预定范围可以为[-1,1]。与实例标准化层不同的是,层标准化层根据每个特征图像每一列的均值和方差,对该特征图像的每一列进行层标准化处理,从而实现对该特征图像的层标准化处理。例如,层标准化层也可用于对单幅图像进行层标准化处理。
例如,仍然以上述小批梯度下降法(mini-batch gradient decent)为例,特征图像的模型表示为(T,C,H,W)。从而,层标准化层的层标准化公式可以表示如下:
Figure PCTCN2020120586-appb-000023
其中,x tijk为该卷积层输出的特征图像集合中的第t个特征块(patch)、第i个特征图像、第j行、第k列的值。y tijk表示经过层标准化层处理x tijk后得到的结果。ε 2为一个很小的整数,以避免分母为0。
卷积层是卷积神经网络的核心层。在卷积神经网络的卷积层中,一个神经元只与部分相邻层的神经元连接。卷积层可以对输入图像应用若干个卷积核(也称为滤波器),以提取输入图像的多种类型的特征。每个卷积核可以提取一种类型的特征。卷积核一般以随机小数矩阵的形式初始化,在卷积神经网络的训练过程中卷积核将通过学习以得到合理的权值。对输入图像应用一个卷积核之后得到的结果被称为特征图像(feature map),特征图像的数目与卷积核的数目相等。每个特征图像由一些矩形排列的神经元组成,同一特征图像的神经元共享权值,这里共享的权值就是卷积核。一个层级的卷积层输出的特征图像可以被输入到相邻的下一个层级的卷积层并再次处理以得到新的特征图像。例如,如图2A所示,第一层级的卷积层可以输出第一层级特征图像,该第一层级特征图像被输入到第二层级的 卷积层再次处理以得到第二层级特征图像。
例如,如图2B所示,卷积层可以使用不同的卷积核对输入图像的某一个局部感受域的数据进行卷积,卷积结果被输入激活层,该激活层根据相应的激活函数进行计算以得到输入图像的特征信息。
例如,如图2A和2B所示,下采样层设置在相邻的卷积层之间,下采样层是下采样的一种形式。一方面,下采样层可以用于缩减输入图像的规模,简化计算的复杂度,在一定程度上减小过拟合的现象;另一方面,下采样层也可以进行特征压缩,提取输入图像的主要特征。下采样层能够减少特征图像的尺寸,但不改变特征图像的数量。例如,一个尺寸为12×12的输入图像,通过6×6的卷积核对其进行采样,那么可以得到2×2的输出图像,这意味着输入图像上的36个像素合并为输出图像中的1个像素。最后一个下采样层或卷积层可以连接到一个或多个全连接层,全连接层用于连接提取的所有特征。全连接层的输出为一个一维矩阵,也就是向量。
本公开至少一个实施例提供一种图像处理方法。该图像处理方法包括:获取输入图像;基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2,其中,分辨率最高的第1层级的初始特征图像的分辨率与输入图像的分辨率相同;基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像,中间特征图像的分辨率与输入图像的分辨率相同;以及对中间特征图像进行合成处理,以得到输出图像;其中,循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将第i层级的缩放处理的输入与第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,第j层级的下采样处理的输出作为所述第j+1层级的缩放处理输入,其中,j=1,2,…,N-2。
本公开的一些实施例还提供对应于上述图像处理方法的图像处理装置、神经网络的训练方法、合并网络模型的图像处理方法、合并网络模型的构建方法以及存储介质。
本公开至少一个实施例提供的图像处理方法基于输入图像得到多种不同分辨率的初始特征图像,并结合这些不同分辨率的初始特征图像对分辨率最高的初始特征图像进行循环缩放处理,可以获取更高的图像保真度以及大幅提升输出图像的质量,同时还可以提高处理速度。
下面结合附图对本公开的一些实施例及其示例进行详细说明。
图3为本公开一些实施例提供的一种图像处理方法的流程图,图4A为本公开一些实施例提供的一种对应于图3所示的图像处理方法的示意性流程框图,图4B为本公开另一些实 施例提供的一种对应于图3所示的图像处理方法的示意性流程框图。以下,结合图4A和图4B,对图3所示的图像处理方法进行详细说明。
例如,如图3所示,该图像处理方法包括:
步骤S110:获取输入图像。
例如,如图4A和图4B所示,输入图像标记为INP。
例如,输入图像INP可以包括通过智能手机的摄像头、平板电脑的摄像头、个人计算机的摄像头、数码照相机的镜头、监控摄像头或者网络摄像头等拍摄采集的照片,其可以包括人物图像、动植物图像或风景图像等,本公开的实施例对此不作限制。
例如,输入图像INP可以为灰度图像,也可以为彩色图像。例如,彩色图像包括但不限于3个通道的RGB图像等。需要说明的是,在本公开的实施例中,当输入图像INP为灰度图像时,输出图像OUTP也是灰度图像;当输入图像INP是彩色图像时,输出图像OUTP也是彩色图像。
例如,在一些实施例中,输入图像是通过获取具有第一分辨率的原始输入图像,并对原始输入图像进行分辨率转换处理(例如,图像超分辨率重构处理)得到的。例如,在一些实施例中,输入图像具有第二分辨率,且第二分辨率大于第一分辨率。图像超分辨率重构是对图像进行分辨率提升,以获得更高分辨率的图像的技术。在常用的图像超分辨率重构技术的实现方式中,超分辨率图像通常是采用插值算法生成的。例如,常用的插值算法包括最临近插值、双线性插值、双立方插值、兰索斯(Lanczos)插值等等。利用上述插值算法之一,可以基于原始输入图像中的一个像素生成多个像素从而获得基于原始输入图像的超分辨率的输入图像。也就是说,本公开的实施例提供的图像处理方法可以对常规方法生成的超分辨率图像进行增强处理,从而提高该超分辨率图像的质量。
步骤S120:基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2。
例如,在一些实施例中,如图4A所示,可以通过分析网络对输入图像INP进行N个不同层级的分析处理,以分别得到分辨率从高到低排列的N个层级的初始特征图像F01~F0N(例如,图4A所示的F01~F05)。例如,如图4A所示,分析网络包括N个分析子网络ASN,各分析子网络ASN分别用于进行上述不同层级的分析处理,以分别得到分辨率从高到低排列的N个层级的初始特征图像F01~F0N(例如,图4A所示的F01~F05)。例如,各分析子网络ASN可以实现为包括诸如卷积神经网络CNN、残差网络ResNet、密集网络DenseNet等的卷积网络模块,例如,各分析子网络ASN可以包括卷积层、下采样层、标准化层等,但不限于此。
例如,在一些实施例中,如图4B所示,可以先将输入图像INP与随机噪声图像noise联接(concatenate,如图中CONCAT所示),以得到联合输入图像;然后通过分析网络对联合输入图像进行N个不同层级的分析处理,以分别得到分辨率从高到低排列的N个层级的初始特征图像F01~F0N。例如,联接处理CONCAT可以看作:将待联接的多个(例如, 两个或两个以上)图像的各通道图像堆叠,从而使得联接得到的图像的通道数为待联接的多个图像的通道数之和。例如,联合输入图像的各通道图像即为输入图像的各通道图像与随机噪声图像的各通道图像的综合。例如,随机噪声图像noise中的随机噪声可以符合高斯分布,但不限于此。例如,图4B所示的实施例中的分析处理的具体过程和细节可以参考图4A所示的实施例中的分析处理的相关描述,在此不再重复赘述。
需要说明的是,在进行图像增强处理时,输出图像中的细节特征(例如,毛发、线条等)往往会和噪声有关。在应用神经网络进行图像增强处理时,根据实际需要(是否需要突出细节以及细节的突出程度等),来调节输入噪声的幅度,从而使输出图像满足实际需求。例如,在一些实施例中,随机噪声图像的噪声幅度可以为0;例如,在另一些实施例中,随机噪声图像的噪声幅度可以不为0。本公开的实施例对此不作限制。
例如,在图4A和图4B中,各层级的次序是按照自上而下的方向进行确定。
例如,在一些实施例中,分辨率最高的第1层级的初始特征图像F01的分辨率可以与输入图像INP的分辨率相同。例如,在一些实施例中,输入图像是通过对原始输入图像进行分辨率转换处理(例如,图像超分辨率重构处理)得到的,在此情况下,分辨率最低的第N层级的初始特征图像的分辨率可以与原始输入图像的分辨率相同,需要说明的是,本公开的实施例包括但不限于此。
例如,在一些实施例中,前一层级(例如,第i层级)的初始特征图像的分辨率为后一层级(例如,第i+1层级)的初始特征图像的分辨率的整数倍,例如2倍,3倍,4倍,…,等等,本公开的实施例对此不作限制。
需要说明的是,虽然图4A和图4B均示出了得到5个层级的初始特征图像F01~F05(即N=5)的情形,但不应视作对本公开的限制,即N的取值可以根据实际需要进行设置。
步骤S130:基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像。
例如,如图4A和图4B所示,循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括依次执行的下采样处理DS、联接处理CONCAT、上采样处理US和残差链接相加处理ADD。
例如,如图4A和图4B所示,第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将第i层级的缩放处理的输入与第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1。
例如,如图4A和图4B所示,第i层级的联接处理基于第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,包括:将第i层级的下采样输出作为第i+1层级的缩放处理的输入,以得到第i+1层级的缩放处理的输出;以及将第i+1层级 的缩放处理的输出与第i+1层级的初始特征图像进行联接,以得到第i层级的联合输出。
下采样处理DS用于减小特征图的尺寸,从而减少特征图的数据量,例如可以通过下采样层进行下采样处理,但不限于此。例如,下采样层可以采用最大值合并(max pooling)、平均值合并(average pooling)、跨度卷积(strided convolution)、欠采样(decimation,例如选择固定的像素)、解复用输出(demuxout,将输入图像拆分为多个更小的图像)等下采样方法实现下采样处理。例如,下采样层还可以采用内插值、双线性插值、双立方插值(Bicubic Interprolation)、兰索斯(Lanczos)插值等插值算法进行下采样处理。例如,在利用插值算法进行下采样处理时,可以仅保留插入值而去除原始像素值,从而减小特征图的尺寸。
上采样处理US用于增大特征图的尺寸,从而增加特征图的数据量,例如可以通过上采样层进行上采样处理,但不限于此。例如,上采样层可以采用跨度转置卷积(strided transposed convolution)、插值算法等上采样方法实现上采样处理。插值算法例如可以包括内插值、双线性插值、两次立方插值(Bicubic Interprolation)、兰索斯(Lanczos)插值等算法。例如,在利用插值算法进行上采样处理时,可以保留原始像素值和插入值,从而增大特征图的尺寸。
每个层级的缩放处理可以视为一个残差网络,残差网络可以通过残差链接相加处理将其输入以一定的比例保持在其输出中,即通过残差链接相加处理ADD,可以将每个层级的缩放处理的输入以一定的比例保持在每个层级的缩放处理的输出中。例如,残差链接相加处理ADD的输入与输出的尺寸相同。例如,以特征图像为例,残差链接相加处理可以包括将两幅特征图像的矩阵的每一行、每一列的值对应相加,但不限于此。
需要说明的是,在本公开的一些实施例中,同一层级的下采样处理的下采样因子与上采样处理的上采样因子对应,即:当该下采样处理的下采样因子为1/y时,则该上采样处理的上采样因子为y,其中y为正整数,且y通常等于或大于2。从而,可以确保同一层级的上采样处理的输出和下采样处理的输入尺寸相同。
需要说明的是,在本公开的一些实施例(不限于本实施例)中,不同层级的下采样处理的参数(即该下采样处理对应的网络结构的参数)可以相同,也可以不同;不同层级的上采样处理的参数(即该上采样处理对应的网络结构的参数)可以相同,也可以不同;不同层级的残差链接相加处理的参数可以相同,也可以不同。本公开的实施例对此不作限制。
需要说明的是,在本公开的一些实施例(不限于本实施例)中,不同次序的同一层级的下采样处理的参数可以相同,也可以不同;不同次序的同一层级的上采样处理的参数可以相同,也可以不同;不同次序的同一层级的残差链接相加处理的参数可以相同,也可以不同。本公开的实施例对此不作限制。
例如,在本公开的一些实施例中,为了改善特征图像的亮度、对比度等全局特征,多尺度循环采样处理还可以包括:对下采样处理的输出、上采样处理的输出等进行实例标准化处理或层标准化处理。需要说明的是,下采样处理的输出、上采样处理的输出等可以采用相同的标准化处理方法(实例标准化处理或层标准化处理),也可以采用不同的标准化 处理方法,本公开的实施例对此不作限制。
例如,如图4A和4B所示,第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,其中,j=1,2,…,N-2。也就是说,第j层级的下采样处理的输出作为第j+1层级的缩放处理的输入,同时,第j+1层级的缩放处理的输出作为第j层级的联接处理的输入之一(第j+1层级的初始特征图像作为第j层级的联接处理的输入之另一)。
需要说明的是,在本公开中,“嵌套”是指一个对象中包括与该对象相似或相同的另一个对象,所述对象包括但不限于流程或者网络结构等。
例如,在一些实施例中,至少一个层级的缩放处理可以连续执行多次,即每个层级可以包括多次缩放处理,例如,前一次缩放处理的输出作为后一次缩放处理的输入。例如,如图4A和图4B所示,每个层级的缩放处理可以连续执行两次,在此情况下,既可以提升输出图像的质量,又可以避免网络结构复杂化。需要说明的是,本公开的实施例对每个层级的缩放处理的具体执行次数并不作限制。
例如,在一些实施例中,中间特征图像的分辨率与输入图像INP的分辨率相同。
例如,如图4A和图4B所示,在N=5的情形下,可以基于第2~5层级的初始特征图像F01~F05,对第1层级的初始特征图像F01进行上述循环缩放处理,以得到中间特征图像FM。
步骤S140:对中间特征图像进行合成处理,以得到输出图像。
例如,在一些实施例中,如图4A和图4B所示,可以通过合成网络MERG对中间特征图像FM进行合成处理,以得到输出图像OUTP。例如,在一些实施例中,合成网络可以包括卷积层等。例如,该输出图像可以包括1个通道的灰度图像,也可以包括例如3个通道的RGB图像(即彩色图像)。需要说明的是,本公开的实施例对合成网络MERG的结构和参数不作限制,只要其能将卷积特征维度(即中间特征图像FM)转换为输出图像OUTP即可。
需要说明的是,直接使用上述图像处理方法对较高分辨率(例如,分辨率在4k或4k以上等)的输入图像进行处理,对图像处理装置的硬件条件(例如,显存等)的要求较高。因此,在一些实施例中,为了解决上述问题,可以先对输入图像进行裁剪处理,以得到具有交叠区域的多个子输入图像;然后,利用上述图像处理方法(例如前述步骤S110值步骤S140等)对该多个子输入图像分别进行处理,以得到对应的多个子输出图像;最后,将该对应的多个子输出图像拼接为输出图像。
图5为本公开一些实施例提供的一种裁剪处理和拼接处理的示意图。以下,结合图5对上述裁剪处理和拼接处理进行详细说明。
例如,在一些实施例中,如图5所示,可以将输入图像裁剪为具有交叠区域的多个子输入图像(例如,如图5中以各自的中心T1~T4表示的四个矩形框所示)。该多个子输入图像应当覆盖整个输入图像,即输入图像中的每个像素点应包括于至少一个子输入图像中。例如,在一些实施例中,该多个子输入图像的尺寸大小和分辨率都相同,且该多个子输入 图像的中心形成一个均匀规则的网格,即在水平方向(即行方向)和在竖直方向(即列方向)上的相邻的中心之间的距离必须分别恒定。例如,在一些实施例中,在行方向或/和列方向上,相邻的两个子输入图像的交叠区域的尺寸大小是恒定的。
应当理解的是,输入图像中的像素点的行列位置与输出图像中的像素点的行列位置一一对应,各子输入图像中的像素点的行列位置与对应的子输出图像中的像素点的行列位置一一对应,也就是说,图5中以各自的中心T1~T4表示的四个矩形框还可以表示对应的四幅子输出图像。
例如,在将对应的多个子输出图像拼接为输出图像的过程中,可以通过以下公式计算输出图像中每个像素点的像素值:
Figure PCTCN2020120586-appb-000024
其中,Y p表示输出图像中的任意一个像素点p的像素值,T表示包括该像素点p的子输出图像的数量,Y k,(p)表示该像素点p在第k幅包括该像素点p的子输出图像中的像素值,s k表示在第k幅包括该像素点p的子输出图像中该像素点p到该第k幅包括该像素点p的子输出图像的中心的距离。
例如,在一些实施例中,可以通过以下步骤计算实现上述拼接处理过程:
(1)初始化输出图像矩阵,其中,所有像素值设置为零。需要说明的是,当输出图像为灰度图像时,输出图像矩阵具有1个通道;当输出图像为3个通道的RGB图像(即彩色图像)时,输出图像矩阵相应具有3个通道。
(2)初始化计数矩阵,其中,所有元素值设置为零。该计数矩阵的尺寸大小(分辨率)与输出图像矩阵的尺寸大小(分辨率)相同,且该计数矩阵具有1个通道。
(3)将每个子输出图像中的每个像素点到该子输出图像的中心的距离与该像素点对应的计数矩阵的元素的当前值相加后作为该对应的计数矩阵的元素的新值;将每个子输出图像中的每个像素点的像素值乘以该子输出图像中该像素点到该子输出图像的中心的距离后与该像素点对应的输出图像矩阵的当前像素值相加后作为该对应的输出图像矩阵的新像素值。
(4)将输出图像矩阵中的每个像素值除以计数矩阵中与之对应的元素值,得到最终的像素值,从而得到最终的输出图像矩阵,即输出图像。应当理解的是,在上述除法过程中,应该确保技术矩阵中的每个元素值大于零。
需要说明的是,虽然图5中仅示出了T=4的情形,但不应视作对本公开的限制,即T的取值可以根据实际需要进行设置。
需要说明的是,上述拼接处理的算法是示例性的,本公开的实施例对此不作限制,对于其他拼接处理的算法,只要其能够对交叠区域中的像素点的像素值进行合理处理并满足实际需求即可。
应当理解的是,当输出图像为彩色图像,例如3个通道的RGB图像时,上述裁剪处理和拼接处理的对象应当是各个通道的图像。
本公开的实施例提供的图像处理方法基于输入图像得到多种不同分辨率的初始特征图像,并结合这些不同分辨率的初始特征图像对分辨率最高的初始特征图像进行循环缩放处理,可以获取更高的图像保真度以及大幅提升输出图像的质量,同时还可以提高处理速度。
本公开至少一实施例还提供一种合并神经网络模型的图像处理方法。图6为本公开一些实施例提供的一种合并神经网络模型的示意图。例如,如图6所示,该合并神经网络模型包括多个神经网络模型。例如,该多个神经网络模型用于执行同一图像处理任务,该多个神经网络模型的输入图像的分辨率(即尺寸)相同,该多个神经网络模型的输出图像的分辨率(即尺寸)也相同;同时,该多个神经网络模型两两之间至少结构和参数之一不同(参数不同是指参数至少不完全相同)。例如,在一些实施例中,结构相同而参数不同的神经网络模型可以是基于不同的训练配置训练得到。例如,上述不同的训练配置是指不同的训练集、不同的初始参数、不同的卷积核尺寸、不同的超参数等之一或其任意组合。
例如,如图6所示,该合并神经网络模型的图像处理方法可以包括:将输入图像输入该合并神经网络模型中的多个神经网络模型,以分别得到该多个神经网络模型的输出;以及将该多个神经网络模型的输出相加取平均值,以得到该合并神经网络模型的输出(即输出图像)。
需要说明的是,在实际应用中,通常会对多种神经网络结构的模型通过调节超参数来进行训练,从而产生很多训练过的模型;之后,在这些模型之中挑选表现最好(即输出效果最好)的模型作为主要解决方案,在接下来的阶段里主要集中调优该模型,但是那些拥有相似或略差表现却被淘汰的模型往往不会再被利用上。相比之下,本公开的实施例提供的合并神经网络模型的图像处理方法可以将这些拥有相似或略差表现的模型利用起来,从而使合并神经网络模型的输出效果比表现最好的单一神经网络的输出效果更优。
应当理解的是,对于执行相同图像处理任务而具有其它具体结构的神经网络模型(此时,对训练配置是否相同不作要求),如果其输入和输出的尺寸与上述单一神经网络模型的输入和输出的尺寸相同,则可以将其以加入或替换(例如,替换表现较差的模型)等方式并入现有的合并神经网络模型,只要其可以使新的合并神经网络模型具有更优的输出效果即可。
例如,在一些实施例中,该多个神经网络模型可以包括第一神经网络模型,该第一神经网络模型用于执行第一图像处理方法,例如第一图像处理方法即为前述实施例提供的图像处理方法(例如,包括上述步骤S110至步骤S140等),本公开的实施例包括但不限于此。
需要说明的是,虽然图6中仅示出了合并神经网络模型包括三个神经网络模型NNM1~NNM3的情形,但不应视作对本公开的限制,即合并神经网络模型可以根据实际需求包括更多或更少的神经网络模型。
需要说明的是,合并神经网络模型的构建可以参考后续将要说明的合并神经网络模型的构建方法的相关描述,在此不再赘述。
本公开的实施例提供的合并神经网络模型的图像处理方法,可以直接平均多个神经网络模型的输出以获得更优的输出效果,且该合并神经网络模型便于更新(即加入新的神经网络模型,或用新的神经网络模型替换现有合并神经网络模型中表现较差的神经网络模型等)。
本公开的实施例提供的合并神经网络模型的图像处理方法的技术效果可以参考前述实施例中关于图像处理方法的相应描述,在此不再赘述。
本公开至少一实施例还提供一种神经网络的训练方法。图7为本公开一实施例提供的一种神经网络的结构示意框图,图8A为本公开一实施例提供的一种神经网络的训练方法的流程图,图8B为本公开一实施例提供的一种对应于图8A中所示的训练方法训练图7所示的神经网络的示意性架构框图。
例如,如图7所示,该神经网络100包括分析网络110、循环缩放网络120和合成网络130。例如,该神经网络100可以用于执行前述实施例(例如,图4A或图4B所示的实施例)提供的图像处理方法。例如,分析网络110可以用于执行前述图像处理方法中的步骤S120,即分析网络110可以对输入图像进行处理以得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;循环缩放网络120可以用于执行前述图像处理方法中的步骤S130,即循环缩放网络120可以基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;合成网络130可以用于执行前述图像处理方法中的步骤S140,即合成网络130可以中间特征图像进行合成处理,以得到输出图像。例如,神经网络100、分析网络110、循环缩放网络120和合成网络130的具体结构及其对应的具体处理过程和细节可以参考前述图像处理方法中的相关描述,在此不再重复赘述。
例如,输入图像和输出图像也可以参考前述实施例提供的图像处理方法中关于输入图像和输出图像的描述,在此不再重复赘述。
例如,结合图8A和图8B所示,该神经网络的训练方法包括步骤S210至步骤S260。
步骤S210:获取第一训练输入图像。
例如,与前述步骤S110中的输入图像类似,第一训练输入图像也可以包括通过智能手机的摄像头、平板电脑的摄像头、个人计算机的摄像头、数码照相机的镜头、监控摄像头或者网络摄像头等拍摄采集的照片,其可以包括人物图像、动植物图像或风景图像等,本公开的实施例对此不作限制。
例如,第一训练输入图像可以为灰度图像,也可以为彩色图像。例如,彩色图像包括但不限于3个通道的RGB图像等。
例如,在一些实施例中,第一训练输入图像是通过获取训练原始输入图像,并对训练原始输入图像进行分辨率转换处理(例如,图像超分辨率重构处理)得到的。在常用的图像超分辨率重构技术的实现方式中,超分辨率图像通常是采用插值算法生成的。例如,常 用的插值算法包括最临近插值、双线性插值、双立方插值、兰索斯(Lanczos)插值等等。利用上述插值算法之一,可以基于训练原始输入图像中的一个像素生成多个像素从而获得基于训练原始输入图像的超分辨率的第一训练输入图像。
步骤S220:使用分析网络对第一训练输入图像进行处理,以得到分辨率从高到低排列的N个层级的训练初始特征图像,N为正整数,且N>2。
例如,与前述步骤S120中的分析网络类似,分析网络110可以包括N个分析子网络,各分析子网络分别用于进行不同层级的分析处理,以分别得到分辨率从高到低排列的N个层级的训练初始特征图像。例如,各分析子网络可以实现为包括诸如卷积神经网络CNN、残差网络ResNet、密集网络DenseNet等的卷积网络模块,例如,各分析子网络可以包括卷积层、下采样层、标准化层等,但不限于此。
例如,在一些实施例中,分辨率最高的第1层级的训练初始特征图像的分辨率可以与第一训练输入图像的分辨率相同。例如,在一些实施例中,第一训练输入图像是通过对训练原始输入图像进行分辨率转换处理(例如,图像超分辨率重构处理)得到的,在此情况下,分辨率最低的第N层级的训练初始特征图像的分辨率可以与训练原始输入图像的分辨率相同,需要说明的是,本公开的实施例包括但不限于此。
步骤S230:使用循环缩放网络,基于第2~N层级的训练初始特征图像,对第1层级的训练初始特征图像进行循环缩放处理,以得到训练中间特征图像。
例如,步骤S230中的循环缩放网络120的循环缩放处理的具体过程和细节可以参考前述步骤S130中关于循环缩放处理的相关描述,在此不再重复赘述。
步骤S240:使用合成网络对训练中间特征图像进行合成处理,以得到第一训练输出图像。
例如,与前述步骤S140中的合成网络类似,合成网络130也可以包括卷积层等。例如,该第一训练输出图像可以为包括1个通道的灰度图像,也可以为包括例如3个通道的RGB图像(即彩色图像)。需要说明的是,本公开的实施例对合成网络330的结构和参数不作限制,只要其能将卷积特征维度(即训练中间特征图像)转换为第一训练输出图像即可。
步骤S250:基于第一训练输出图像,通过损失函数计算神经网络的损失值。
例如,神经网络100的参数包括分析网络110的参数、循环缩放网络120的参数和合成网络130的参数。例如,神经网络100的初始参数可以为随机数,例如随机数符合高斯分布,本公开的实施例对此不作限制。
例如,在一些实施例中,N个层级的训练初始特征图像是通过分析网络110直接对第一训练输入图像(不与随机噪声图像联接)进行不同层级的分析处理得到的(参考图4A所示)。在此情况下,上述损失函数可以表示为:
Figure PCTCN2020120586-appb-000025
其中,L(Y,X)表示损失函数,Y表示第一训练输出图像,X表示第一训练输入图像对应的第一训练标准图像,S k-1()表示进行第k-1层级的下采样处理,S k-1(Y)表示对第一训练输出图像进行第k-1层级下采样处理得到的输出,S k-1(X)表示对所述第一训练标准图像进行第k-1层级的下采样处理得到的输出,E[]表示对矩阵能量的计算。例如,E[]可以为计算“[]”中的矩阵中元素的最大值或平均值。
例如,第一训练标准图像X具有与第一训练输入图像相同的场景,即二者的内容相同,同时,第一训练标准图像X的质量比第一训练输出图像的质量高。例如,第一训练标准图像X相当于神经网络100的目标输出图像。例如,图像的质量评价标准包括均方误差(MSE)、相似度(SSIM)、峰值信噪比(PSNR)等。例如,第一训练标准图像X可以为例如数码单镜反光相机拍摄的照片图像。例如,在一些实施例中,可以采用双线性插值、双立方插值、兰索斯(Lanczos)插值等插值算法对第一训练标准图像X进行下采样处理以得到训练原始输入图像,然后对训练原始输入图像进行分辨率转换处理(例如,图像超分辨率重构处理)以得到第一训练输入图像,从而可以确保第一训练标准图像X与第一训练输入图像具有相同的场景。需要说明的是,本公开的实施例包括但不限于此。
例如,当k=1时,S k-1()=S 0(),即第0层级的下采样处理,表示不进行下采样处理;当k>1时,第k-1层级的下采样处理的输出的分辨率随着k的增大而减小,例如,第k-1层级的下采样处理采用的下采样方法可以与前述循环缩放处理中的第k-1层级的下采样处理的下采样方法相同。例如,在一些实施例中,S 1(Y)和S 1(X)的分辨率与前述循环缩放处理中的第1层级的下采样输出的分辨率相同,S 2(Y)和S 2(X)的分辨率与前述循环缩放处理中的第2层级的下采样输出的分辨率相同,S 3(Y)和S 3(X)的分辨率与前述循环缩放处理中的第3层级的下采样输出的分辨率相同,…,以此类推,S N-1(Y)和S N-1(X)的分辨率与前述循环缩放处理中的第N-1层级的下采样输出的分辨率相同。需要说明的是,本公开的实施例对此不作限制。
例如,在本实施例中,神经网络100的训练目标是最小化损失值。例如,在神经网络100的训练过程中,神经网络100的参数被不断地修正,以使经过参数修正后的神经网络100输出的第一训练输出图像不断接近于第一训练标准图像,从而不断地减小损失值。需要说明的是,本实施例提供的上述损失函数是示例性的,本公开的实施例包括但不限于此。
例如,在另一些实施例中,N个层级的训练初始特征图像是通过先将第一训练输入图像与随机噪声图像联接(CONCAT)以得到训练联合输入图像,然后再通过分析网络110对训练联合输入图像进行N个不同层级的分析处理得到的(参考图4B所示)。在此情况下,神经网络100的训练过程需要联合判别网络而进行。例如,在一些实施例中,可以使用判别网络对第一训练输出图像进行处理,并基于第一训练输出图像对应的判别网络的输出计算神经网络100的损失值。
图9为本公开一些实施例提供的一种判别网络的结构示意图。如图9所示,该判别网络200包括M-1个层级的下采样子网络、M个层级的判别支网络、合成子网络和激活层, 其中,M为正整数,且M>1。例如,图9中示出了M=3的情形,但不应视作对本公开的限制,即M的取值可以根据实际需要进行设置。例如,在一些实施例中,M=N-1。例如,在图9中,各层级的次序是按照自上而下的方向进行确定。
例如,如图9所示,使用判别网络对第一训练输出图像进行处理时,首先通过M-1个层级的下采样子网络分别对第一训练输出图像进行不同层级的下采样处理,以得到M-1个层级的下采样子网络的输出;然后,将第一训练输出图像和该M-1个层级的下采样子网络的输出分别对应作为M个层级的判别支网络的输入。例如,在一些实施例中,上一个层级的下采样子网络的输出的分辨率高于下一个层级的下采样子网络的输出的分辨率。例如,在一些实施例中,第一训练输出图像作为第1层级的判别支网络的输入,第1层级的下采样子网络的输出作为第2层级的判别支网络的输入,第2层级的下采样子网络的输出作为第3层级的判别支网络的输入,…,以此类推,第M-1层级的下采样子网络的输出作为第M层级的判别支网络的输入。
例如,下采样子网络包括下采样层。例如,下采样子网络可以采用最大值合并(max pooling)、平均值合并(average pooling)、跨度卷积(strided convolution)、欠采样(decimation,例如选择固定的像素)、解复用输出(demuxout,将输入图像拆分为多个更小的图像)等下采样方法实现下采样处理。例如,下采样层还可以采用内插值、双线性插值、双立方插值(Bicubic Interprolation)、兰索斯(Lanczos)插值等插值算法进行下采样处理。
例如,如图9所示,每个层级的判别支网络包括依次连接的亮度处理子网络(如图9中虚线框所示)、第一卷积子网络和第二卷积子网络。例如,在一些实施例中,该亮度处理子网络可以包括亮度特征提取子网络、归一化子网络和平移相关子网络。
例如,每个层级的亮度特征提取子网络用于提取该层级的判别支网络的输入的亮度特征图像。由于人眼对图像的亮度特征比较敏感,而对其他特征并不敏感,因此,通过提取训练图像的亮度特征,能够去除一些不必要的信息,从而减少运算量。应当理解的是,亮度特征提取子网络可以用于提取彩色图像的亮度特征图像,即亮度特征提取子网络在第一训练输出图像为彩色图像时起作用;而当判别支网络的输入(即第一训练输出图像等)为灰度图像时,可以不需要亮度特征提取子网络。
以第一训练输出图像为3个通道的RGB图像(即彩色图像)为例,在情况下,M-1个层级的下采样子网络的输出也均为3个通道的RGB图像,也就是说,每个层级的判别支网络的输入均为3个通道的RGB图像。此时,特征提取子网络可以通过下述公式提取亮度特征图像:
P=0.299R+0.587G+0.114B,
其中,R、G和B分别表示RGB格式图像的红色信息(即第一通道的数据信息)、绿色信息(即第二通道的数据信息)和蓝色信息(即第三通道的数据信息),P表示转换得到的亮度信息。
例如,归一化子网络用于对上述亮度特征图像进行归一化处理,以得到归一化亮度特 征图像,通过归一化处理后,能够将归一化亮度特征图像的像素值统一在比较小的数值范围内,防止某些像素值过大或过小,从而更便于相关性的计算。
例如,归一化子网络可以通过下述公式进行归一化处理:
Figure PCTCN2020120586-appb-000026
其中,J为归一化亮度特征图像,I为亮度特征图像;Blur()为高斯模糊运算。即,Blur(I)表示对亮度特征图像进行高斯模糊运算,Blur(I 2)表示将亮度特征图像中各像素值进行平方得到新的特征图像,并对该新的特征图像进行高斯模糊运算。μ为亮度特征图像经过高斯模糊运算后得到的图像,σ 2为亮度特征图像的局部方差图像(variance normalized image)。
例如,平移相关子网络用于对上述归一化亮度特征图像进行多次图像平移处理,以得到多个移位图像;并根据该归一化亮度特征图像与每个移位图像之间的相关性,生成多个相关性图像。
例如,在一些实施例中,每次图像平移处理包括:将归一化亮度特征图像的后a列像素沿行方向平移至其余像素之前,以得到中间图像;然后,将该中间图像的后b行像素沿列方向平移至其余像素之前,以得到移位图像。其中,0≤a<H,0≤b<W,a、b均为整数,H为归一化亮度特征图像中像素的总行数,W为归一化亮度特征图像中像素的总列数;并且,在任意两次图像平移过程中,a、b中至少一者的取值发生改变。经过这种方式的图像平移处理得到的移位图像中,各像素的值与该亮度特征图像的各像素的值一一对应相同;并且,所有移位图像中第i行第j列的像素的值分别来自于第一特征图像中不同位置的像素。
需要说明的是,当a和b同时为0时,则移位图像即为归一化亮度特征图像本身。另外,每次图像平移也可以先将归一化亮度特征图像的后b行像素沿列方向平移至其余像素之前,以得到中间图像,然后再将中间图像的后a列像素沿行方向平移至其余像素之前,以得到移位图像。例如,在一些实施例中,图像平移处理的次数为H×W次(其中,a和b同时为0时也算一次),从而获得H×W个相关性图像。
例如,在一些实施例中,根据归一化亮度特征图像与每个移位图像之间的相关性,生成多个相关性图像,包括:将归一化亮度特征图像中第i行第j列像素的值与每个移位图像中第i行第j列像素的值的乘积作为对应的相关性图像中第i行第j列像素的值;其中,1≤i≤H,1≤j≤W,i、j均为整数。
例如,第一卷积子网络用于对多个相关性图像进行卷积处理,以得到第一卷积特征图像,即第一卷积子网络可以包括卷积层。例如,在一些实施例中,第一卷积子网络还可以包括标准化层,从而第一卷积子网络还可以进行标准化处理,需要说明的是,本公开的实施例包括但不限于此。
例如,第二卷积子网络可以包括卷积层和下采样层,从而可以对第二卷积子网络的输入进行卷积处理和下采样处理。例如,如图9所示,第1层级的判别支网络中的第一卷积 子网络的输出作为第1层级的判别支网络中的第二卷积子网络的输入;第t层级的判别支网络中的第二卷积子网络的输出与第t+1层级的判别支网络中的第一卷积子网络的输出进行联接(CONCAT)后作为第t+1层级的判别支网络中的第二卷积子网络的输入,其中,t为整数,且1≤t≤M-1。
步骤S214:使用第三主干网络对第二中间输出进行处理,以得到多个第一特征图。
例如,如图9所示,合成子网络连接到第M层级的判别支网络中的第二卷积子网络连接,合成子网络用于对第M层级的判别支网络中的第二卷积子网络的输出进行合成处理,以得到判别输出图像。例如,在一些实施例中,合成子网络的具体结构及其进行合成处理的具体过程和细节可以参考前述合成网络的相关描述,在此不再重复赘述。
例如,如图9所示,激活层连接到合成子网络。例如,在一些实施例中,该激活层的激活函数可以采用Sigmoid函数,从而,该激活层的输出(即判别网络200的输出)为一个在[0,1]的取值范围内的数值。例如,判别网络200的输出可以用于表征例如第一训练输出图像的质量。例如,判别网络200输出的数值越大,例如趋近于1,则表示判别网络200认定第一训练输出图像的质量越高(例如,越接近于第一训练标准图像的质量);例如,判别网络200输出的数值越小,例如趋近于0,则表示判别网络200认定第一训练输出图像的质量越低。
例如,在联合上述判别网络200对神经网络100进行训练的情况下,神经网络100的损失函数可以表示为:
Figure PCTCN2020120586-appb-000027
其中,L(Y,X)表示损失函数,Y表示第一训练输出图像(包括Y W=1和Y W=0),X表示第一训练输入图像对应的第一训练标准图像,L G(Y W=1)表示生成损失函数,Y W=1表示随机噪声图像的噪声幅度不为0的情况下得到的第一训练输出图像,L L1(S M(Y W=1),S M(X))表示第一对比损失函数,L cont(Y W=1,X)表示内容损失函数,L L1((Y W=0),X)表示第二对比损失函数,Y W=0表示随机噪声图像的噪声幅度为0的情况下得到的第一训练输出图像,L L1(S M(Y W=0),S M(X))表示第三对比损失函数,S M()表示进行第M层级的下采样处理,λ 1、λ 2、λ 3、λ 4、λ 5分别表示预设的权值。
例如,上述预设的权值可以根据实际需求进行调整。例如,在一些实施例中,λ 12345=0.001:10:0.1:10:10,本公开的实施例包括但不限于此。
例如,在一些实施例中,生成损失函数L G(Y W=1)可以表示为:
L G(Y W=1)=-E[log(Sigmoid(C(Y W=1)-C(X)))],
其中,E[]表示对矩阵能量的计算。例如,E[]可以为计算“[]”中的矩阵中元素的最大值或平均值。
例如,在一些实施例中,可以采用内容特征提取模块提供第一训练输出图像和第一训练标准图像的内容特征。例如,在一些实施例中,该内容特征提取模块可以为VGG-19网络中的conv3-4模块,本公开的实施例包括但不限于此。需要说明的是,VGG-19网络为深度卷积神经网络的一种,其是由牛津大学视觉几何组(Visual Geometry Group)开发,已经在视觉识别领域得到广泛应用。例如,在一些实施例中,内容损失函数L cont(Y W=1,X)可以表示为:
Figure PCTCN2020120586-appb-000028
其中,S 1为常数,F ij表示在内容特征提取模块中第i个卷积核提取的第一训练输出图像的第一内容特征图中第j个位置的值,P ij表示在内容特征提取模块中第i个卷积核提取的第一训练标准图像的第二内容特征图中第j个位置的值。
需要说明的是,上述公式表示的内容损失函数是示例性的,例如,内容损失函数还可以表示为其他常用的公式,本公开的实施例对此不作限制。
需要说明的是,上述神经网络100的损失函数的具体表达形式是示例性的,本公开的实施例对此不作限制,即神经网络100的损失函数可以根据实际需要包括更多或更少的组成部分。
步骤S260:根据神经网络的损失值对神经网络的参数进行修正。
例如,在神经网络100的训练过程中还可以包括优化函数(图8B中未示出),优化函数可以根据损失函数计算得到的损失值计算神经网络100的参数的误差值,并根据该误差值对神经网络100的参数进行修正。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算神经网络100的参数的误差值。
例如,在一些实施例中,神经网络100的训练方法还可以包括:判断神经网络的训练是否满足预定条件,若不满足预定条件,则重复执行上述训练过程(即步骤S210至步骤S260);若满足预定条件,则停止上述训练过程,得到训练好的神经网络。例如,在一些实施例中,上述预定条件为连续两幅(或更多幅)第一训练输出图像对应的损失值不再显著减小。例如,在另一些实施例中,上述预定条件为神经网络的训练次数或训练周期达到预定数目。需要说明的是,本公开的实施例对此不作限制。
例如,训练好的神经网络100输出的第一训练输出图像Y在内容和质量等方面都接近于第一训练标准图像X。
需要说明的是,在联合判别网络200对神经网络100进行训练的过程中,通常需要进行生成对抗式训练。图10为本公开一些实施例提供的一种生成对抗式训练的流程图。例如,如图10所示,生成对抗式训练包括:
步骤S300:基于神经网络,对判别网络进行训练;
步骤S400:基于判别网络,对神经网络进行训练;以及,
交替地执行上述训练过程,以得到训练好的神经网络。
例如,步骤S400中的神经网络的训练过程可以通过上述步骤S210至步骤S260实现,在此不再重复赘述。需要说明的是,在神经网络100的训练过程中,判别网络200的参数保持不变。需要说明的是,在生成对抗式训练中,神经网络100通常也可以称为生成网络100。
图11A为本公开一些实施例提供的一种判别网络的训练方法的流程图,图11B为本公开一些实施例提供的一种对应于图11A中所示的训练方法训练图9所示的神经网络的示意性架构框图。以下,结合图11A和图11B对判别网络200的训练过程(即步骤S300)进行详细说明。
例如,结合图11A和图11B所示,判别网络200的训练过程,即步骤S300,包括步骤S310至步骤S340,如下所示:
步骤S310:获取第二训练输入图像;
步骤S320:使用神经网络对第二训练输入图像进行处理,以得到第二训练输出图像;
步骤S330:基于第二训练输出图像,通过判别损失函数计算判别损失值;
步骤S340:根据判别损失值对判别网络的参数进行修正。
例如,判别网络200的训练过程,即步骤S400还可以包括:判断判别网络200的训练是否满足预定条件,若不满足预定条件,则重复执行上述判别网络200的训练过程;若满足预定条件,则停止判别网络200的训练过程,得到训练好的判别网络200。例如,在一个示例中,上述预定条件为连续两幅(或更多幅)第二训练输出图像和第二训练标准图像对应的判别损失值不再显著减小。例如,在另一个示例中,上述预定条件为判别网络200的训练次数或训练周期达到预定数目。需要说明的是,本公开的实施例对此不作限制。
例如,如图11A所示,在判别网络200的训练过程中,需要联合神经网络100进行训练。需要说明的是,在判别网络200的训练过程中,神经网络100的参数保持不变。
需要说明的是,上述示例仅是示意性说明判别网络的训练过程。本领域技术人员应当知道,在训练阶段,需要利用大量样本图像对判别网络进行训练;同时,在每一幅样本图像训练过程中,都可以包括多次反复迭代以对判别网络的参数进行修正。又例如,训练阶段还包括对判别网络的参数进行微调(fine-tune),以获取更优化的参数。
例如,判别网络200的初始参数可以为随机数,例如随机数符合高斯分布,本公开的实施例对此不作限制。
例如,判别网络200的训练过程中还可以包括优化函数(图11A中未示出),优化函数可以根据判别损失函数计算得到的判别损失值计算判别网络200的参数的误差值,并根据该误差值对判别网络200的参数进行修正。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算判别网络200的参数的误差值。
例如,第二训练输入图像可以与第一训练输入图像相同,例如,第二训练输入图像的 集合与第一训练输入图像的集合是同一个图像集合,本公开的实施例包括但不限于此。例如,第二训练输入图像可以参考前述第一训练输入图像的相关描述,在此不再重复赘述。
例如,在一些实施例中,判别损失函数可以表示为:
Figure PCTCN2020120586-appb-000029
其中,L D(V W=1)表示判别损失函数,U表示第二训练输入图像对应的第二训练标准图像,V W=1表示随机噪声图像的噪声幅度不为0的情况下得到的第二训练输出图像,C(U)表示第二训练标准图像作为判别网络的输入得到的判别输出图像,C(V W=1)表示随机噪声图像的噪声幅度不为0的情况下得到的判别输出图像。
例如,第二训练标准图像U具有与第二训练输入图像相同的场景,即二者的内容相同,同时,第二训练标准图像U的质量比第二训练输出图像的质量高。例如,第二训练标准图像U可以参考前述第一训练标准图像X的相关描述,在此不再重复赘述。
需要说明的是,上述公式表示的判别损失函数是示例性的,例如,判别损失函数还可以表示为其他常用的公式,本公开的实施例对此不作限制。
例如,判别网络200的训练目标是最小化判别损失值。例如,在神经网络200的训练过程中,判别网络200的参数被不断地修正,以使经过参数修正后的判别网络200能够准确鉴别第二训练输出图像和第二训练标准图像,也就是,使判别网络200认定第二训练输出图像与第二训练标准图像的偏差越来越大,从而不断地减小判别损失值。
需要说明的是,在本实施例中,神经网络100的训练和判别网络200的训练是交替迭代进行的。例如,对于未经训练的神经网络100和判别网络200,一般先对判别网络200进行第一阶段训练,提高判别网络200的鉴别能力,得到经过第一阶段训练的判别网络200;然后,基于经过第一阶段训练的判别网络200对神经网络100进行第一阶段训练,提高神经网络100的图像增强处理能力,得到经过第一阶段训练的神经网络100。与第一阶段训练类似,在第二阶段训练中,基于经过第一阶段训练的神经网络100,对经过第一阶段训练的判别网络200进行第二阶段训练,提高判别网络200的鉴别能力,得到经过第二阶段训练的判别网络200;然后,基于经过第二阶段训练的判别网络200对经过第一阶段训练的神经网络100进行第二阶段训练,提高神经网络100的图像增强处理能力,得到经过第二阶段训练的神经网络100,依次类推,接下来对判别网络200和神经网络100进行第三阶段训练、第四阶段训练、……,直到得到的神经网络100的输出的质量可以接近于对应的训练标准图像的质量。
需要说明的是,在神经网络100和判别网络200的交替训练过程中,神经网络100和判别网络200的对抗体现在判别损失函数与神经网络的损失函数中的生成损失函数相反。还需要说明的是,理想情况下,经过训练得到的神经网络100输出的图像为高质量图像(即接近于训练标准图像的质量),判别网络200针对第二训练标准图像和该神经网络100生成的第二训练输出图像的输出趋近于一致,即神经网络100和判别网络200经过对抗博弈达到纳什均衡。
需要说明的是,在本公开的实施例提供的训练方法中,通常涉及对训练集中的大量样本图像(包括第一/第二训练输入图像、第一/第二训练标准图像等)的读操作和解码操作。例如,在一些实施例中,读操作是指将存储在存储器中的样本图像读取到处理器中的操作;例如,在一些实施例中,解码操作是指将图片格式(例如PNG、TIFF、JPEG等格式)的样本图像解码为二进制数据格式的操作,样本图像通常需要解码之后才能通过神经网络进行处理。
对于分辨率较高的样本图像,每一次读操作和解码操作都会占用大量的计算资源,不利于提高训练速度;当分辨率较高的样本图像的数量较多时,该问题尤为严重。因此,在一些实施例中,为了解决上述问题,在正式开始训练之前,可以提前对训练集的各个样本图像进行裁剪处理和解码处理,以得到二进制数据格式的多个子样本图像,从而可以基于该二进制数据格式的多个子样本图像对神经网络进行训练。
例如,在一些实施例中,可以先将训练集中的每个样本图像裁剪为多个子样本图像,再将该多个子样本图像分别解码为二进制数据格式的子样本图像并进行存储。例如,在另一些实施例中,可以先将训练集中的每个样本图像解码为二进制数据格式的,再对该二进制格式的样本图像进行裁剪以得到多个二进制数据格式的子样本图像并进行保存。
例如,每个样本图像对应的多个子样本图像之间可以相互交叠,也可以互不交叠,本公开的实施例对此不作限制。例如,每个样本图像对应的多个子样本图像的尺寸大小可以完全相等,也可以部分相等,也可以互不相等,本公开的实施例对此不作限制。例如,每个样本图像对应的多个子样本图像的中心可以呈均匀分布,也可以呈非均匀分布,本公开的实施例对此不作限制。
例如,同一个样本图像对应的多个子样本图像可以存储在同一个存储路径(例如同一个文件夹)中,而不同的样本图像对应的子样本图像则分别存储在不同的存储路径中。例如,在一些实施例中,每一个样本图像对应一个文件夹,该样本图像对应的多个子样本图像以预定的命名方式存储在该文件夹中;同时,全部的样本图像对应的文件夹又可以存储在一个大文件夹中,即训练集可以对应该大文件夹。例如,每个样本图像对应的各子样本图像可以按照“样本图像名”+“子样本图像序号”的命名方式进行命名,本公开的实施例包括但不限于此。
例如,在基于上述二进制数据格式的多个子样本图像对神经网络进行训练时,可以随机读取一个文件夹(相当于选择一个样本图像),再随机读取该文件夹中的一个二进制数据格式的子样本图像,然后,以读取的二进制数据格式的子样本图像作为例如训练输入图像等进行训练。由此,可以在训练过程中节省计算资源。
本公开的实施例提供的神经网络的训练方法,可以对本公开实施例的图像处理方法中采用的神经网络进行训练,通过该训练方法训练好的神经网络,可以对输入图像进行图像增强处理,可以获取更高的图像保真度以及大幅提升输出图像的质量,同时还可以提高处理速度。
本公开至少一实施例还提供一种合并神经网络模型的构建方法。图12为本公开一些实施例提供的一种合并神经网络模型的构建方法的流程图。例如,如图12所示,该合并神经网络模型的构建方法包括步骤S410至步骤S450。
步骤S410:获取多个训练好的神经网络模型,其中,该多个神经网络模型用于执行同一图像处理任务,该多个神经网络模型的输入图像的分辨率相同,该多个神经网络模型的输出图像的分辨率相同,该多个神经网络模型两两之间至少结构和参数之一不同。
例如,在步骤S410中,该多个神经网络模型中可以包括结构相同而参数不同的神经网络模型。例如,该结构相同而参数不同的神经网络模型可以是基于不同的训练配置训练得到。例如,不同的训练配置是指不同的训练集、不同的初始参数、不同的卷积核尺寸(例如,3×3,5×5,7×7等)、不同的超参数等之一或其任意组合。应当理解的是,当各神经网络模型的具体结构之间存在差异时,可以基于相同的训练配置进行训练,本公开的实施例对此不作限制。
例如,在一些实施例中,该多个神经网络模型可以包括第一神经网络模型,该第一神经网络模型用于执行第一图像处理方法,例如第一图像处理方法即为前述实施例提供的图像处理方法(例如,包括上述步骤S110至步骤S140等),本公开的实施例包括但不限于此。
步骤S420:在同一验证集上获得该多个神经网络模型的输出,根据预定的图像质量评价标准确定该多个神经网络模型的评估质量,并将该多个神经网络模型按照评估质量从高到低进行排序。
例如,在步骤S420中,验证集包括验证输入图像和与验证输入图像对应的验证标准图像。例如,验证输入图像可以参考前述训练输入图像(例如,第一训练输入图像、第二训练输入图像)的相关描述,验证标准图像可以参考前述训练标准图像(例如,第一训练标准图像、第二训练标准图像)的相关描述,在此不再重复赘述。应当理解的是,验证集和训练集通常不作严格区分,例如,在某些情况下,验证集可以用作训练集,而训练集的一部分可以用作验证集。
例如,在一些实施例中,将上述验证输入图像输入该多个神经网络模型,以得到该多个神经网络模型的验证输出图像,然后基于各验证输出图像和验证标准图像,确定各神经网络模型的评估质量。例如,通常验证输出图像越接近于验证标准图像,则说明神经网络模型的评估质量越高。例如,图像的质量评价标准包括均方误差(MSE)、相似度(SSIM)、峰值信噪比(PSNR)等。以评价标准为均方误差为例,可以通过下述公式计算验证输出图像和验证标准图像之间的均方误差:
MSE=E[(X’-Y’) 2],
其中,MSE表示均方误差,Y’表示验证输出图像,X’表示验证输出图像对应的验证标准图像,E[]表示对矩阵能量的计算。
例如,均方误差MSE越小,表明验证输出图像越接近于验证标准图像,即神经网络模 型的评估质量越高;均方误差MSE越大,表明验证输出图像越偏离于验证标准图像,即神经网络模型的评估质量越低。例如,在此情况下,将该多个神经网络模型按照评估质量从高到低进行排序,即为将该多个神经网络按照均方误差从小到大进行排序。
步骤S430:将评估质量最高的神经网络模型作为合并神经网络模型中的第1个神经网络模型。例如,在一些实施例中,可以将均方误差最小的神经网络模型作为合并神经网络模型中的第1个神经网络模型,本公开的实施例包括但不限于此。例如,在另一些实施例中,可以将PSNR最大的神经网络模型作为合并神经网络模型中的第1个神经网络模型。
步骤S440:判断当前余下的评估质量最高的神经网络模型能否加入当前的合并神经网络模型,若能,则将当前余下的评估质量最高的神经网络模型加入当前的合并神经网络模型,若不能,则将当前的合并神经网络模型作为获得的合并神经网络模型。
例如,一方面,可以将验证输入图像输入当前的合并神经网络模型中的各神经网络模型,以得到当前的合并神经网络模型中的各神经网络模型的输出;然后,将当前的合并神经网络模型中的各神经网络模型的输出相加取平均值,以得到当前的合并神经网络模型的输出,并基于当前的合并神经网络模型的输出确定当前的合并神经网络模型的评估质量。另一方面,可以将验证输入图像输入当前余下的评估质量最高的神经网络模型(当前余下的神经网络模型是指当前还未并入合并神经网络模型的神经网络模型),以得到当前余下的评估质量最高的神经网络模型的输出;然后,将当前余下的评估质量最高的神经网络模型的输出与当前的合并神经网络模型中的各神经网络模型的输出相加取平均值,以得到临时合并神经网络模型的输出,并基于该临时合并神经网络模型的输出确定临时合并神经网络模型的评估质量。如果临时合并神经网络模型的评估质量不低于当前的合并神经网络模型的评估质量,则将当前余下的评估质量最高的神经网络模型加入当前的合并神经网络模型中,并继续对余下的评估质量最高的神经网络模型进行判断;如果临时合并神经网络模型的评估质量低于当前的合并神经网络模型的评估质量,则结束步骤S440。
应当理解的是,在合并神经网络模型仅包括第1个神经网络模型时,直接将该第1个神经网络模型的输出作为合并神经网络模型的输出。还应当理解的是,如果步骤S410中得到的多个神经网络模型均加入了合并神经网络模型,也自然结束步骤S440。
步骤S450:使用神经网络的训练方法对获得的合并神经网络模型进行训练,以得到训练好的合并神经网络模型。
例如,对获得的合并神经网络模型进行训练,即对获得的合并神经网络模型中的各神经网络模型同时进行训练,具体训练过程可以参考前述神经网络的训练方法的相关描述,在此不再重复赘述。
应当理解的是,在公开的实施例提供的合并神经网络模型的构建方法中,并不要求各神经网络模型的具体结构及处理过程和细节等完全相同。例如,对于执行相同图像处理任务而具有其它具体结构的神经网络模型(此时,对训练配置是否相同不作要求),只要其输入和输出的尺寸与上述多个神经网络模型的输入和输出的尺寸相同,就可以将其以加入 或替换(例如,替换表现较差的模型)等方式并入现有的合并神经网络模型,只要其可以使新的合并神经网络模型具有更高的评估质量即可。
本公开的实施例提供的合并神经网络模型的构建方法的技术效果可以参考前述实施例中关于合并神经网络模型的图像处理方法的相应描述,在此不再赘述。
本公开至少一实施例还提供一种神经网络处理器。图13A为本公开一些实施例提供的一种神经网络处理器的示意性框图。例如,如图13A所示,该神经网络处理器50包括分析电路60、循环缩放电路70和合成电路80。例如,该神经网络处理器50可以用于执行前述图像处理方法。
例如,分析电路60配置为基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2,即分析电路60可以用于执行前述图像处理方法的步骤S120,具体过程和细节参考前述相关描述,在此不再重复赘述。
例如,循环缩放电路70配置为基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像,即循环缩放电路70可以用于执行前述图像处理方法的步骤S130,具体过程和细节参考前述相关描述,在此不再重复赘述。
例如,合成电路80配置为对中间特征图像进行合成处理,以得到输出图像,即合成电路80可以用于执行前述图像处理方法的步骤S140,具体过程和细节参考前述相关描述,在此不再重复赘述。
例如,如图13A所示,循环缩放电路70可以包括N-1个层级的逐层嵌套的缩放电路75,每个层级的缩放电路75包括下采样电路751、联接电路752、上采样电路753和残差链接相加电路754,从而该循环缩放电路70可以用于执行前述图像处理方法中的循环缩放处理的过程。例如,第i层级的下采样电路基于第i层级的缩放电路的输入进行下采样得到第i层级的下采样输出,第i层级的联接电路基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样电路基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加电路将所述第i层级的缩放电路的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放电路的输出,其中,i=1,2,…,N-1;第j+1层级的缩放电路嵌套在第j层级的下采样电路和第j层级的联接电路之间,所述第j层级的下采样电路的输出作为所述第j+1层级的缩放电路的输入,其中,j=1,2,…,N-2。
图13B为本公开一些实施例提供的另一种神经网络处理器的示意性框图。例如,如图4A和/或图4B等卷积神经网络中各层的算法均可以在图13B所示的神经网络处理器10中得以实现。
例如,神经网络网络处理器(NPU)10可以作为协处理器挂载到主CPU(图13B中未示出)上,由主CPU分配任务。NPU的核心部分为运算电路11,控制器12控制运算电路11提取内部存储器13中的数据(例如,输入矩阵和权重矩阵等)并进行运算。
例如,在一些实施例中,运算电路11内部可以包括多个处理单元(Process Engine,PE)。 例如,在一些实施例中,运算电路11是二维脉动阵列。运算电路11还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。例如,在一些实施例中,运算电路11是通用的矩阵处理器。
例如,在一些实施例中,运算电路11可以从内部存储器12中读取权重矩阵的相应数据,并缓存在运算电路11中每一个PE上;另外,运算电路11还从内部存储器12中读取输入矩阵的数据与权重矩阵进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器14中。
例如,向量计算单元15可以对运算电路11的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。例如,向量计算单元15可以用于神经网络中非卷积/非全连接层的网络计算,如下采样,标准化等。
例如,在一些实施例中,向量计算单元15可以将经过处理的输出的向量存储到统一存储器16中。例如,向量计算单元15可以将非线性函数应用到运算电路11的输出,例如累加值的向量,用以生成激活值。例如,在一些实施例中,向量计算单元15生成标准化的值、合并值,或二者均有。例如,在一些实施例中,处理过的输出的向量能够用作运算电路11的激活输入,例如用于在卷积神经网络中的后续层中的使用。
本公开的实施例提供的图像处理方法以及神经网络的训练方法的部分或全部步骤可以由运算电路11或向量计算单元15执行。
例如,神经网络处理器10可以通过存储单元访问控制器17将外部存储器(图13B中未示出)中的输入数据等写入到内部存储器13和/或统一存储器16,还将统一存储器16中的数据存入外部存储器。
例如,总线接口单元20,用于通过总线实现主CPU、存储单元访问控制器17和取指存储器18等之间的交互。例如,与控制器12连接的取指存储器18,用于存储控制器12使用的指令。例如,控制器12,用于调用取指存储器18中缓存的指令,实现控制运算电路11的工作过程。
例如,图4A和/或图4B所示的卷积神经网络中各层的运算可以由运算电路11或向量计算单元15执行。
本公开至少一实施例还提供一种图像处理装置。图14A为本公开一些实施例提供的一种图像处理装置的示意性框图。例如,如图14A所示,该图像处理装置470包括图像获取模块480和图像处理模块490。
例如,图像处理装置470可以用于执行前述图像处理方法,本公开的实施例包括但不限于此。
例如,图像获取模块480可以用于执行前述图像处理方法的步骤S110,本公开的实施例包括但不限于此。例如,图像获取模块480可以用于获取输入图像。例如,图像获取模块480可以包括存储器,存储器存储有输入图像;或者,图像获取模块480也可以包括一个或多个摄像头,以获取输入图像。
例如,图像处理模块490可以用于执行前述图像处理方法的步骤S120-步骤S140,本公开的实施例包括但不限于此。例如,图像处理模块可以:基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及对中间特征图像进行合成处理,以得到输出图像。例如,循环缩放处理的具体过程和细节可以参考前述图像处理方法中的相关描述,在此不再重复赘述。
例如,在一些实施例中,图像获取模块480和图像处理模块490可以实现为硬件、软件、固件以及它们的任意可行的组合。
图14B为本公开一些实施例提供的另一种图像处理装置的示意性框图。例如,如图14B所示,该图像处理装置500包括存储器510和处理器520。例如,存储器510用于非暂时性存储计算机可读指令,处理器520用于运行该计算机可读指令,该计算机可读指令被处理器520运行时执行本公开任一实施例提供的图像处理方法或/和合并神经网络模型的图像处理方法或/和神经网络的训练方法或/和合并神经网络模型的构建方法。
例如,存储器510和处理器520之间可以直接或间接地互相通信。例如,在一些示例中,如图14B所示,该图像处理装置500还可以包括系统总线530,存储器510和处理器520之间可以通过系统总线530互相通信,例如,处理器520可以通过系统总线530访问存储器510。例如,在另一些示例中,存储器510和处理器520等组件之间可以通过网络连接进行通信。网络可以包括无线网络、有线网络、和/或无线网络和有线网络的任意组合。网络可以包括局域网、互联网、电信网、基于互联网和/或电信网的物联网(Internet of Things)、和/或以上网络的任意组合等。有线网络例如可以采用双绞线、同轴电缆或光纤传输等方式进行通信,无线网络例如可以采用3G/4G/5G移动通信网络、蓝牙、Zigbee或者WiFi等通信方式。本公开对网络的类型和功能在此不作限制。
例如,处理器520可以控制图像处理装置中的其它组件以执行期望的功能。处理器520可以是中央处理单元(CPU)、张量处理器(TPU)或者图形处理器GPU等具有数据处理能力和/或程序执行能力的器件。中央处理器(CPU)可以为X86或ARM架构等。GPU可以单独地直接集成到主板上,或者内置于主板的北桥芯片中。GPU也可以内置于中央处理器(CPU)上。
例如,存储器510可以包括一个或多个计算机程序产品的任意组合,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、闪存等。
例如,在存储器510上可以存储一个或多个计算机指令,处理器520可以运行所述计算机指令,以实现各种功能。在计算机可读存储介质中还可以存储各种应用程序和各种数据,例如输入图像、输出图像、第一/第二训练输入图像、第一/第二训练输出图像、第一/ 第二训练标准图像以及应用程序使用和/或产生的各种数据等。
例如,存储器510存储的一些计算机指令被处理器520执行时可以执行根据上文所述的图像处理方法或合并神经网络模型的图像处理方法中的一个或多个步骤。又例如,存储器510存储的另一些计算机指令被处理器520执行时可以执行根据上文所述的神经网络的训练方法或合并神经网络模型的构建方法中的一个或多个步骤。
例如,如图14B所示,图像处理装置500还可以包括允许外部设备与图像处理装置500进行通信的输入接口540。例如,输入接口540可被用于从外部计算机设备、从用户等处接收指令。图像处理装置500还可以包括使图像处理装置500和一个或多个外部设备相互连接的输出接口550。例如,图像处理装置500可以通过输出接口550显示图像等。通过输入接口1010和输出接口1012与图像处理装置500通信的外部设备可被包括在提供任何类型的用户可与之交互的用户界面的环境中。用户界面类型的示例包括图形用户界面、自然用户界面等。例如,图形用户界面可接受来自用户采用诸如键盘、鼠标、遥控器等之类的输入设备的输入,以及在诸如显示器之类的输出设备上提供输出。此外,自然用户界面可使得用户能够以无需受到诸如键盘、鼠标、遥控器等之类的输入设备强加的约束的方式来与图像处理装置500交互。相反,自然用户界面可依赖于语音识别、触摸和指示笔识别、屏幕上和屏幕附近的手势识别、空中手势、头部和眼睛跟踪、语音和语义、视觉、触摸、手势、以及机器智能等。
另外,图像处理装置500尽管在图9中被示出为单个系统,但可以理解,图像处理装置500也可以是分布式系统,还可以布置为云设施(包括公有云或私有云)。因此,例如,若干设备可以通过网络连接进行通信并且可共同执行被描述为由图像处理装置500执行的任务。
例如,关于图像处理方法的处理过程的详细说明可以参考上述图像处理方法的实施例中的相关描述,关于合并神经网络模型的图像处理方法的处理过程的详细说明可以参考上述合并神经网络模型的图像处理方法的实施例中的相关描述,关于神经网络的训练方法的处理过程的详细说明可以参考上述神经网络的训练方法的实施例中的相关描述,关于合并神经网络模型的构建方法的处理过程的详细说明可以参考上述合并神经网络模型的构建方法的实施例中的相关描述,重复之处不再赘述。
需要说明的是,本公开的实施例提供的图像处理装置是示例性的,而非限制性的,根据实际应用需要,该图像处理装置还可以包括其他常规部件或结构,例如,为实现图像处理装置的必要功能,本领域技术人员可以根据具体应用场景设置其他的常规部件或结构,本公开的实施例对此不作限制。
本公开的实施例提供的图像处理装置的技术效果可以参考上述实施例中关于图像处理方法、合并神经网络模型的图像处理方法、神经网络的训练方法以及合并神经网络模型的构建方法的相应描述,在此不再赘述。
本公开至少一实施例还提供一种存储介质。图15为本公开一实施例提供的一种存储介 质的示意图。例如,如图15所示,该存储介质600非暂时性地存储计算机可读指令601,当非暂时性计算机可读指令601由计算机(包括处理器)执行时可以执行本公开任一实施例提供的图像处理方法或合并神经网络模型的图像处理方法的指令或者可以执行本公开任一实施例提供的神经网络的训练方法或合并神经网络模型的构建方法的指令。
例如,在存储介质600上可以存储一个或多个计算机指令。存储介质600上存储的一些计算机指令可以是例如用于实现上述图像处理方法或合并神经网络模型的图像处理方法中的一个或多个步骤的指令。存储介质上存储的另一些计算机指令可以是例如用于实现上述神经网络的训练方法或合并神经网络模型的构建方法中的一个或多个步骤的指令。
例如,存储介质可以包括平板电脑的存储部件、个人计算机的硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、光盘只读存储器(CD-ROM)、闪存、或者上述存储介质的任意组合,也可以为其他适用的存储介质。
本公开的实施例提供的存储介质的技术效果可以参考上述实施例中图像处理方法、合并神经网络模型的图像处理方法、神经网络的训练方法以及合并神经网络模型的构建方法的相应描述,在此不再赘述。
对于本公开,有以下几点需要说明:
(1)本公开实施例附图中,只涉及到与本公开实施例涉及到的结构,其他结构可参考通常设计。
(2)在不冲突的情况下,本公开同一实施例及不同实施例中的特征可以相互组合。
以上,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以权利要求的保护范围为准。

Claims (32)

  1. 一种图像处理方法,包括:
    获取输入图像;
    基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;
    基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及
    对所述中间特征图像进行合成处理,以得到输出图像;
    其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;
    第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;
    第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理的输入,其中,j=1,2,…,N-2。
  2. 根据权利要求1所述的图像处理方法,其中,所述第i层级的联接处理基于所述第i层级的下采样输出和所述第i+1层级的初始特征图像进行联接得到所述第i层级的联合输出,包括:
    将所述第i层级的下采样输出作为所述第i+1层级的缩放处理的输入,以得到所述第i+1层级的缩放处理的输出;以及
    将所述第i+1层级的缩放处理的输出与所述第i+1层级的初始特征图像进行联接,以得到所述第i层级的联合输出。
  3. 根据权利要求2所述的图像处理方法,其中,至少一个层级的缩放处理连续执行多次,且前一次缩放处理的输出作为后一次缩放处理的输入。
  4. 根据权利要求3所述的图像处理方法,其中,每个层级的缩放处理连续执行两次。
  5. 根据权利要求1-4任一项所述的图像处理方法,其中,在所述N个层级的初始特征图像中,第1层级的初始特征图像的分辨率最高,且第1层级的初始特征图像的分辨率与所述输入图像的分辨率相同。
  6. 根据权利要求1-5任一项所述的图像处理方法,其中,前一层级的初始特征图像的分辨率为后一层级的初始特征图像的分辨率的整数倍。
  7. 根据权利要求1-6任一项所述的图像处理方法,其中,基于所述输入图像,得到分 辨率从高到低排列的所述N个层级的初始特征图像,包括:
    将所述输入图像与随机噪声图像进行联接,以得到联合输入图像;以及
    对所述联合输入图像进行N个不同层级的分析处理,以分别得到分辨率从高到低排列的所述N个层级的初始特征图像。
  8. 根据权利要求1-7任一项所述的图像处理方法,其中,获取所述输入图像,包括:
    获取具有第一分辨率的原始输入图像;以及
    对所述原始输入图像进行分辨率转换处理,以得到具有第二分辨率的所述输入图像,所述第二分辨率大于第一分辨率。
  9. 根据权利要求8所述的图像处理方法,其中,采用双立方插值算法、双线性插值算法和兰索斯(Lanczos)插值算法之一进行所述分辨率转换处理。
  10. 根据权利要求1-9任一项所述的图像处理方法,还包括:
    对所述输入图像进行裁剪处理,以得到具有交叠区域的多个子输入图像;
    所述基于输入图像得到分辨率从高到低排列的N个层级的初始特征图像具体包括:
    基于每个子输入图像,得到分辨率从高到低排列的N个层级的子初始特征图像,N为正整数,且N>2;
    所述基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像具体包括:
    基于第2~N层级的子初始特征图像,对所述第1层级的子初始特征图像进行循环缩放处理,以得到子中间特征图像;
    所述对所述中间特征图像进行合成处理,以得到输出图像具体包括:
    对所述子中间特征图像进行合成处理,以得到对应的子输出图像;以及
    将所述多个子输入图像对应的子输出图像拼接为所述输出图像。
  11. 根据权利要求10所述的图像处理方法,其中,所述多个子输入图像的尺寸大小相同,所述多个子输入图像的中心形成均匀规则的网格,且在行方向和列方向上,相邻的两个子输入图像的交叠区域的尺寸大小均是恒定的,所述输出图像中每个像素点的像素值表示为:
    Figure PCTCN2020120586-appb-100001
    其中,Y p表示所述输出图像中的任意一个像素点p的像素值,T表示包括该像素点p的子输出图像的数量,Y k,(p)表示该像素点p在第k幅包括该像素点p的子输出图像中的像素值,s k表示在所述第k幅包括该像素点p的子输出图像中该像素点p到所述第k幅包括该像素点p的子输出图像的中心的距离。
  12. 一种合并神经网络模型的图像处理方法,其中,所述合并神经网络模型包括多个神经网络模型,所述多个神经网络模型用于执行同一图像处理任务,所述多个神经网络模 型的输入图像的分辨率相同,所述多个神经网络模型的输出图像的分辨率相同,所述多个神经网络模型两两之间至少结构和参数之一不同;
    所述合并神经网络模型的图像处理方法,包括:
    将输入图像输入所述合并神经网络模型中的所述多个神经网络模型,以分别得到所述多个神经网络模型的输出;以及
    将所述多个神经网络模型的输出相加取平均值,以得到所述合并神经网络模型的输出。
  13. 根据权利要求12所述的合并神经网络模型的图像处理方法,其中,所述多个神经网络模型包括第一神经网络,所述第一神经网络模型用于执行第一图像处理方法,所述第一图像处理方法包括:
    获取输入图像;
    基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;
    基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及
    对所述中间特征图像进行合成处理,以得到输出图像;
    其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;
    第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;
    第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理的输入,其中,j=1,2,…,N-2。
  14. 一种神经网络的训练方法,其中,所述神经网络包括:分析网络、循环缩放网络和合成网络;
    所述训练方法包括:
    获取第一训练输入图像;
    使用所述分析网络对所述第一训练输入图像进行处理,以得到分辨率从高到低排列的N个层级的训练初始特征图像,N为正整数,且N>2;
    使用所述循环缩放网络,基于第2~N层级的训练初始特征图像,对第1层级的训练初始特征图像进行循环缩放处理,以得到训练中间特征图像;
    使用所述合成网络对所述训练中间特征图像进行合成处理,以得到第一训练输出图像;
    基于所述第一训练输出图像,通过损失函数计算所述神经网络的损失值;以及
    根据所述神经网络的损失值对所述神经网络的参数进行修正;
    其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括依次执行的下采样处理、联接处理、上采样处理和残差链接相加处理;
    第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;
    第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理的输入,其中,j=1,2,…,N-2。
  15. 根据权利要求14所述的神经网络的训练方法,其中,所述损失函数表示为:
    Figure PCTCN2020120586-appb-100002
    其中,L(Y,X)表示所述损失函数,Y表示所述第一训练输出图像,X表示所述第一训练输入图像对应的第一训练标准图像,S k-1(Y)表示对所述第一训练输出图像进行第k-1层级下采样处理得到的输出,S k-1(X)表示对所述第一训练标准图像进行第k-1层级的下采样处理得到的输出,E[]表示对矩阵能量的计算。
  16. 根据权利要求14或15所述的神经网络的训练方法,其中,使用所述分析网络对所述第一训练输入图像进行处理,以得到分辨率从高到低排列的所述N个层级的训练初始特征图像,包括:
    将所述第一训练输入图像与随机噪声图像进行联接,以得到训练联合输入图像;以及
    使用所述分析网络对所述训练联合输入图像进行N个不同层级的分析处理,以分别得到分辨率从高到低排列的所述N个层级的训练初始特征图像。
  17. 根据权利要求16所述的神经网络的训练方法,其中,基于所述第一训练输出图像,通过所述损失函数计算所述神经网络的损失值,包括:
    使用判别网络对所述第一训练输出图像进行处理,并基于所述第一训练输出图像对应的判别网络的输出计算所述神经网络的损失值。
  18. 根据权利要求17所述的神经网络的训练方法,其中,所述判别网络包括:M-1个层级的下采样子网络、M个层级的判别支网络、合成子网络和激活层;
    所述M-1个层级的下采样子网络用于对所述判别网络的输入进行不同层级的下采样处理,以得到M-1个层级的下采样子网络的输出;
    所述判别网络的输入和所述M-1个层级的下采样子网络的输出分别对应作为所述M个层级的判别支网络的输入;
    每个层级的判别支网络包括依次连接的亮度处理子网络、第一卷积子网络和第二卷积子网络;
    第t层级的判别支网络中的第二卷积子网络的输出与第t+1层级的判别支网络中的第一卷积子网络的输出进行联接后作为第t+1层级的判别支网络中的第二卷积子网络的输入,其中,t=1,2,…,M-1;
    所述合成子网络用于对第M层级的判别支网络中的第二卷积子网络的输出进行合成处理,以得到判别输出图像;
    所述激活层用于对所述判别输出图像进行处理,以得到表征所述判别网络的输入的质量的数值。
  19. 根据权利要求18所述的神经网络的训练方法,其中,所述亮度处理子网络包括亮度特征提取子网络、归一化子网络和平移相关子网络,
    所述亮度特征提取子网络用于提取亮度特征图像,
    所述归一化子网络用于对所述亮度特征图像进行归一化处理,以得到归一化亮度特征图像,
    所述平移相关子网络用于对所述归一化亮度特征图像进行多次图像平移处理,以得到多个移位图像,并根据所述归一化亮度特征图像与每个所述移位图像之间的相关性,生成多个相关性图像。
  20. 根据权利要求18或19所述的神经网络的训练方法,其中,所述损失函数表示为:
    Figure PCTCN2020120586-appb-100003
    其中,L(Y,X)表示损失函数,Y表示所述第一训练输出图像,Y包括Y W=1和Y W=0,X表示所述第一训练输入图像对应的第一训练标准图像,L G(Y W=1)表示生成损失函数,Y W=1表示所述随机噪声图像的噪声幅度不为0的情况下得到的第一训练输出图像,L L1(S M(Y W=1),S M(X))表示第一对比损失函数,L cont(Y W=1,X)表示内容损失函数,L L1((Y W=0),X)表示第二对比损失函数,Y W=0表示所述随机噪声图像的噪声幅度为0的情况下得到的第一训练输出图像,L L1(S M(Y W=0),S M(X))表示第三对比损失函数,S M()表示进行第M层级的下采样处理,λ 1、λ 2、λ 3、λ 4、λ 5分别表示预设的权值;
    所述生成损失函数L G(Y W=1)表示为:
    L G(Y W=1)=-E[log(Sigmoid(C(Y W=1)-C(X)))],
    其中,C(Y W=1)表示所述随机噪声图像的噪声幅度不为0的情况下得到的判别输出图像,C(X)表示第一训练标准图像作为判别网络的输入得到的判别输出图像;
    所述第一对比损失函数L L1(S M(Y W=1),S M(X))、所述第二对比损失函数L L1((Y W=0),X)和所述第三对比损失函数L L1(S M(Y W=0),S M(X))分别表示为:
    Figure PCTCN2020120586-appb-100004
    其中,E[]表示对矩阵能量的计算;
    所述内容损失函数L cont(Y W=1,X)表示为:
    Figure PCTCN2020120586-appb-100005
    其中,S 1为常数,F ij表示在内容特征提取模块中第i个卷积核提取的第一训练输出图像的第一内容特征图中第j个位置的值,P ij表示在所述内容特征提取模块中第i个卷积核提取的第一训练标准图像的第二内容特征图中第j个位置的值。
  21. 根据权利要求17-20任一项所述的神经网络的训练方法,还包括:
    基于所述神经网络,对所述判别网络进行训练;以及,
    交替地执行所述判别网络的训练过程和所述神经网络的训练过程,以得到训练好的神经网络;
    其中,基于所述神经网络,对所述判别网络进行训练,包括:
    获取第二训练输入图像;
    使用所述神经网络对所述第二训练输入图像进行处理,以得到第二训练输出图像;
    基于所述第二训练输出图像,通过判别损失函数计算判别损失值;以及
    根据所述判别损失值对所述判别网络的参数进行修正。
  22. 根据权利要求21所述的神经网络的训练方法,其中,所述判别损失函数表示为:
    L D(V W=1)=-E[log(Sigmoid(C(U)-C(V W=1)))],
    其中,L D(V W=1)表示判别损失函数,U表示所述第二训练输入图像对应的第二训练标准图像,V W=1表示所述随机噪声图像的噪声幅度不为0的情况下得到的第二训练输出图像,C(U)表示所述第二训练标准图像作为所述判别网络的输入得到的判别输出图像,C(V W=1)表示所述随机噪声图像的噪声幅度不为0的情况下得到的判别输出图像。
  23. 根据权利要求14-20任一项所述的神经网络的训练方法,还包括:
    在进行训练之前,对训练集的各个样本图像进行裁剪处理和解码处理,以得到二进制数据格式的多个子样本图像;
    在进行训练时,基于所述二进制数据格式的子样本图像对所述神经网络进行训练。
  24. 根据权利要求23所述的神经网络的训练方法,其中,所述多个子样本图像的尺寸大小相等。
  25. 一种合并神经网络模型的构建方法,包括:
    获取多个训练好的神经网络模型,其中,所述多个神经网络模型用于执行同一图像处 理任务,所述多个神经网络模型的输入图像的分辨率相同,所述多个神经网络模型的输出图像的分辨率相同,所述多个神经网络模型两两之间至少结构和参数之一不同;
    在同一验证集上获得所述多个神经网络模型的输出,根据预定的图像质量评估标准确定所述多个神经网络模型的评估质量,并将所述多个神经网络模型按照评估质量从高到低进行排序;
    将评估质量最高的神经网络模型作为所述合并神经网络模型中的第1个神经网络模型;以及
    判断当前余下的评估质量最高的神经网络模型能否加入当前的合并神经网络模型,若能,则将当前余下的评估质量最高的神经网络模型加入当前的合并神经网络模型,若不能,则将当前的合并神经网络模型作为获得的合并神经网络模型。
  26. 根据权利要求25所述的合并神经网络模型的构建方法,还包括:
    对所述获得的合并神经网络模型进行训练,以得到训练好的合并神经网络模型。
  27. 根据权利要求25或26所述的合并神经网络模型的构建方法,其中,所述预定的图像质量评估标准包括均方误差、相似度和峰值信噪比之一。
  28. 根据权利要求25-27任一项所述的合并神经网络模型的构建方法,其中,所述多个神经网络模型包括第一神经网络模型,所述第一神经网络模型用于执行第一图像处理方法,所述第一图像处理方法包括:
    获取输入图像;
    基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;
    基于第2~N层级的初始特征图像,对所述N个层级的初始特征图像中的第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及
    对所述中间特征图像进行合成处理,以得到输出图像;
    其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;
    第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;
    第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理的输入,其中,j=1,2,…,N-2。
  29. 一种神经网络处理器,包括分析电路、循环缩放电路和合成电路;
    所述分析电路配置为基于输入图像,得到分辨率从高到低排列的N个层级的初始特征 图像,N为正整数,且N>2;
    所述循环缩放电路配置为基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;
    所述合成电路配置为对所述中间特征图像进行合成处理,以得到输出图像;其中,
    所述循环缩放电路包括N-1个层级的逐层嵌套的缩放电路,每个层级的缩放电路包括下采样电路、联接电路、上采样电路和残差链接相加电路;
    第i层级的下采样电路基于第i层级的缩放电路的输入进行下采样得到第i层级的下采样输出,第i层级的联接电路基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样电路基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加电路将所述第i层级的缩放电路的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放电路的输出,其中,i=1,2,…,N-1;
    第j+1层级的缩放电路嵌套在第j层级的下采样电路和第j层级的联接电路之间,所述第j层级的下采样电路的输出作为所述第j+1层级的缩放电路的输入,其中,j=1,2,…,N-2。
  30. 一种图像处理装置,包括:
    图像获取模块,配置为获取输入图像;
    图像处理模块,配置为:基于输入图像,得到分辨率从高到低排列的N个层级的初始特征图像,N为正整数,且N>2;基于第2~N层级的初始特征图像,对第1层级的初始特征图像进行循环缩放处理,以得到中间特征图像;以及对所述中间特征图像进行合成处理,以得到输出图像;
    其中,所述循环缩放处理包括:N-1个层级的逐层嵌套的缩放处理,每个层级的缩放处理包括下采样处理、联接处理、上采样处理和残差链接相加处理;
    第i层级的下采样处理基于第i层级的缩放处理的输入进行下采样得到第i层级的下采样输出,第i层级的联接处理基于所述第i层级的下采样输出和第i+1层级的初始特征图像进行联接得到第i层级的联合输出,第i层级的上采样处理基于所述第i层级的联合输出得到第i层级的上采样输出,第i层级的残差链接相加处理将所述第i层级的缩放处理的输入与所述第i层级的上采样输出进行残差链接相加得到第i层级的缩放处理的输出,其中,i=1,2,…,N-1;
    第j+1层级的缩放处理嵌套在第j层级的下采样处理和第j层级的联接处理之间,所述第j层级的下采样处理的输出作为所述第j+1层级的缩放处理输入,其中,j=1,2,…,N-2。
  31. 一种图像处理装置,包括:
    存储器,用于非暂时性存储计算机可读指令;以及
    处理器,用于运行所述计算机可读指令,所述计算机可读指令被所述处理器运行时执行根据权利要求1-11任一项所述的图像处理方法,或者执行根据权利要求12或13所述的合并神经网络模型的图像处理方法,或者执行根据权利要求14-24任一项所述的神经网络的 训练方法,或者执行根据权利要求25-28任一项所述的合并神经网络模型的构建方法。
  32. 一种存储介质,非暂时性地存储计算机可读指令,其中,当所述非暂时性计算机可读指令由计算机执行时可以执行根据权利要求1-11任一项所述的图像处理方法的指令,或者可以执行根据权利要求12或13所述的合并神经网络模型的图像处理方法的指令,或者可以执行根据权利要求14-24任一项所述的神经网络的训练方法的指令,或者可以执行根据权利要求25-28任一项所述的合并神经网络模型的构建方法的指令。
PCT/CN2020/120586 2019-10-18 2020-10-13 图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质 WO2021073493A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/419,350 US11954822B2 (en) 2019-10-18 2020-10-13 Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910995755.2 2019-10-18
CN201910995755.2A CN110717851B (zh) 2019-10-18 2019-10-18 图像处理方法及装置、神经网络的训练方法、存储介质

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US17/419,350 A-371-Of-International US11954822B2 (en) 2019-10-18 2020-10-13 Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium
US18/396,866 Continuation US20240135490A1 (en) 2019-10-17 2023-12-27 Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium

Publications (1)

Publication Number Publication Date
WO2021073493A1 true WO2021073493A1 (zh) 2021-04-22

Family

ID=69212868

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/120586 WO2021073493A1 (zh) 2019-10-18 2020-10-13 图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质

Country Status (3)

Country Link
US (1) US11954822B2 (zh)
CN (1) CN110717851B (zh)
WO (1) WO2021073493A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925746A (zh) * 2022-04-19 2022-08-19 淮阴工学院 一种基于Air-Net的目标检测方法
WO2023217270A1 (zh) * 2022-05-13 2023-11-16 北京字跳网络技术有限公司 图像超分方法、超分网络参数调整方法、相关装置及介质

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3966778A1 (en) * 2019-05-06 2022-03-16 Sony Group Corporation Electronic device, method and computer program
CN110717851B (zh) 2019-10-18 2023-10-27 京东方科技集团股份有限公司 图像处理方法及装置、神经网络的训练方法、存储介质
CN113556496B (zh) * 2020-04-23 2022-08-09 京东方科技集团股份有限公司 视频分辨率提升方法及装置、存储介质及电子设备
CN111932474A (zh) * 2020-07-30 2020-11-13 深圳市格灵人工智能与机器人研究院有限公司 基于深度学习的图像去噪方法
CN112164227B (zh) * 2020-08-26 2022-06-28 深圳奇迹智慧网络有限公司 违停车辆告警方法、装置、计算机设备和存储介质
CN112215789B (zh) * 2020-10-12 2022-10-25 北京字节跳动网络技术有限公司 图像去雾方法、装置、设备和计算机可读介质
CN112215751A (zh) * 2020-10-13 2021-01-12 Oppo广东移动通信有限公司 图像缩放方法、图像缩放装置及终端设备
RU2764395C1 (ru) 2020-11-23 2022-01-17 Самсунг Электроникс Ко., Лтд. Способ и устройство для совместного выполнения дебайеризации и устранения шумов изображения с помощью нейронной сети
CN112784897B (zh) * 2021-01-20 2024-03-26 北京百度网讯科技有限公司 图像处理方法、装置、设备和存储介质
WO2022183325A1 (zh) * 2021-03-01 2022-09-09 京东方科技集团股份有限公司 视频块处理方法及装置、神经网络的训练方法和存储介质
US20220286696A1 (en) * 2021-03-02 2022-09-08 Samsung Electronics Co., Ltd. Image compression method and apparatus
US20220301127A1 (en) * 2021-03-18 2022-09-22 Applied Materials, Inc. Image processing pipeline for optimizing images in machine learning and other applications
WO2022267046A1 (zh) * 2021-06-25 2022-12-29 京东方科技集团股份有限公司 非抽取的图像处理方法及装置
CN114494022B (zh) * 2022-03-31 2022-07-29 苏州浪潮智能科技有限公司 模型训练方法、超分辨率重建方法、装置、设备及介质
CN115861662B (zh) * 2023-02-22 2023-05-12 脑玺(苏州)智能科技有限公司 基于组合神经网络模型的预测方法、装置、设备及介质
CN117392009B (zh) * 2023-12-06 2024-03-19 国网山东省电力公司淄博供电公司 图像自动透雾处理方法、系统、终端及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810999A (zh) * 2014-02-27 2014-05-21 清华大学 基于分布式神经网络的语言模型训练方法及其系统
US9659384B2 (en) * 2014-10-03 2017-05-23 EyeEm Mobile GmbH. Systems, methods, and computer program products for searching and sorting images by aesthetic quality
CN106991506A (zh) * 2017-05-16 2017-07-28 深圳先进技术研究院 智能终端及其基于lstm的股票趋势预测方法
CN108537762A (zh) * 2017-12-29 2018-09-14 西安电子科技大学 基于深度多尺度网络的二次jpeg压缩图像取证方法
CN110018322A (zh) * 2019-04-18 2019-07-16 北京先见智控科技有限公司 一种基于深度学习的转速检测方法及系统
CN110188776A (zh) * 2019-05-30 2019-08-30 京东方科技集团股份有限公司 图像处理方法及装置、神经网络的训练方法、存储介质
CN110717851A (zh) * 2019-10-18 2020-01-21 京东方科技集团股份有限公司 图像处理方法及装置、神经网络的训练方法、存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010122502A1 (en) * 2009-04-20 2010-10-28 Yeda Research And Development Co. Ltd. Super-resolution from a single signal
US8917948B2 (en) * 2011-09-16 2014-12-23 Adobe Systems Incorporated High-quality denoising of an image sequence
EP2979242B1 (en) * 2013-03-27 2018-01-10 Thomson Licensing Method and apparatus for generating a super-resolved image from a single image
CN110223224A (zh) * 2019-04-29 2019-09-10 杰创智能科技股份有限公司 一种基于信息过滤网络的图像超分辨实现算法

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810999A (zh) * 2014-02-27 2014-05-21 清华大学 基于分布式神经网络的语言模型训练方法及其系统
US9659384B2 (en) * 2014-10-03 2017-05-23 EyeEm Mobile GmbH. Systems, methods, and computer program products for searching and sorting images by aesthetic quality
CN106991506A (zh) * 2017-05-16 2017-07-28 深圳先进技术研究院 智能终端及其基于lstm的股票趋势预测方法
CN108537762A (zh) * 2017-12-29 2018-09-14 西安电子科技大学 基于深度多尺度网络的二次jpeg压缩图像取证方法
CN110018322A (zh) * 2019-04-18 2019-07-16 北京先见智控科技有限公司 一种基于深度学习的转速检测方法及系统
CN110188776A (zh) * 2019-05-30 2019-08-30 京东方科技集团股份有限公司 图像处理方法及装置、神经网络的训练方法、存储介质
CN110717851A (zh) * 2019-10-18 2020-01-21 京东方科技集团股份有限公司 图像处理方法及装置、神经网络的训练方法、存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925746A (zh) * 2022-04-19 2022-08-19 淮阴工学院 一种基于Air-Net的目标检测方法
CN114925746B (zh) * 2022-04-19 2023-08-01 淮阴工学院 一种基于Air-Net的目标检测方法
WO2023217270A1 (zh) * 2022-05-13 2023-11-16 北京字跳网络技术有限公司 图像超分方法、超分网络参数调整方法、相关装置及介质

Also Published As

Publication number Publication date
CN110717851A (zh) 2020-01-21
CN110717851B (zh) 2023-10-27
US11954822B2 (en) 2024-04-09
US20220084166A1 (en) 2022-03-17

Similar Documents

Publication Publication Date Title
WO2021073493A1 (zh) 图像处理方法及装置、神经网络的训练方法、合并神经网络模型的图像处理方法、合并神经网络模型的构建方法、神经网络处理器及存储介质
WO2020239026A1 (zh) 图像处理方法及装置、神经网络的训练方法、存储介质
WO2020192483A1 (zh) 图像显示方法和设备
WO2020177651A1 (zh) 图像分割方法和图像处理装置
WO2020200030A1 (zh) 神经网络的训练方法、图像处理方法、图像处理装置和存储介质
CN111402143B (zh) 图像处理方法、装置、设备及计算机可读存储介质
WO2021018163A1 (zh) 神经网络的搜索方法及装置
US20210209459A1 (en) Processing method and system for convolutional neural network, and storage medium
EP3933693A1 (en) Object recognition method and device
CN111402130B (zh) 数据处理方法和数据处理装置
EP3923233A1 (en) Image denoising method and apparatus
CN113066017B (zh) 一种图像增强方法、模型训练方法及设备
WO2022134971A1 (zh) 一种降噪模型的训练方法及相关装置
EP4163832A1 (en) Neural network training method and apparatus, and image processing method and apparatus
CN114730456A (zh) 神经网络模型的训练方法、图像处理方法及其装置
CN112257759A (zh) 一种图像处理的方法以及装置
CN113095470A (zh) 神经网络的训练方法、图像处理方法及装置、存储介质
CN113066018A (zh) 一种图像增强方法及相关装置
KR20210029692A (ko) 비디오 영상에 보케 효과를 적용하는 방법 및 기록매체
CN115761258A (zh) 一种基于多尺度融合与注意力机制的图像方向预测方法
CN113096023A (zh) 神经网络的训练方法、图像处理方法及装置、存储介质
WO2020187029A1 (zh) 图像处理方法及装置、神经网络的训练方法、存储介质
CN117151987A (zh) 一种图像增强方法、装置及电子设备
WO2022183325A1 (zh) 视频块处理方法及装置、神经网络的训练方法和存储介质
US20240135490A1 (en) Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20876788

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20876788

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20876788

Country of ref document: EP

Kind code of ref document: A1